Skip to content
TinyAgent: Edge Function Calling for Small Language Models
Source: bair.berkeley.edu

TinyAgent: Edge Function Calling for Small Language Models

Sources: http://bair.berkeley.edu/blog/2024/05/29/tiny-agent, http://bair.berkeley.edu/blog/2024/05/29/tiny-agent/, BAIR Blog

Overview

TinyAgent presents a pathway to deploy capable language-model agents on edge devices by focusing on function calling rather than broad world-knowledge memorization. Large models such as GPT-4o or Gemini-1.5 offer strong cloud-based capabilities but incur privacy, connectivity, and latency challenges when deployed at scale on the edge. TinyAgent argues that by training small language models (SLMs) on specialized, high-quality data that emphasizes function calling and tool orchestration, it is possible to achieve robust, real-time edge performance without relying on remote inference. The project centers on the idea that many practical agent tasks boil down to selecting the right sequence of predefined functions with correct inputs and ordering, rather than recalling general information. To enable this, the authors introduce a pipeline built around a function calling planner and a curated dataset that guides SLMs to generate executable plans rather than free-form text. Key to the approach is the LLMCompiler framework, which directs the model to output a function calling plan—identifying which functions to call, the required input arguments, and the interdependencies among calls. After producing the plan, the system parses it and executes the functions in dependency-respecting order. The original research question asks whether smaller, open-source models can be taught to perform reliable function calling with targeted data, thus achieving edge-ready reasoning and orchestration capabilities. The TinyAgent study demonstrates these ideas in a MacOS driving application context. The platform exposes 16 predefined functions that interface with macOS applications via predefined Apple scripts. The model’s task is to compose a correct function calling plan that uses those scripts to accomplish user objectives (for example, creating calendar items without exposing the user to generic Q&A responses). The authors also explore data generation and fine-tuning strategies to close the gap between small models and large, cloud-based baselines. Synthetic data is created with a capable model (for example GPT-4-Turbo) to produce realistic user queries, function calling plans, and input arguments, followed by sanity checks to ensure the resulting plans form valid graphs. The study reports the construction of 80k training examples, 1k validation examples, and 1k test examples, with an overall data-generation cost of about $500. A DAG-based evaluation metric checks plan isomorphism to ensure the generated plan aligns structurally with the ground-truth plan. The TinyAgent-1B model, paired with Whisper-v3 for on-device speech, is demonstrated on a MacBook M3 Pro. The framework is open source and available at the project’s GitHub repository. In short, TinyAgent advances a targeted, edge-friendly path to agentic AI: it prioritizes accurate function calling over world-knowledge memorization, uses an explicit function calling planner to manage tool orchestration, and demonstrates end-to-end edge deployment with a practical MacOS-driven use case.

Key features

  • Small, open-source language models capable of function calling when trained on curated data
  • LLMCompiler framework that outputs a function calling plan with functions, inputs, and dependencies
  • High-quality synthetic data generation workflow (80k training, 1k validation, 1k test) using a capable LLM, enabling task-specific planning
  • Fine-tuning pathway that can improve SLMs’ function calling performance beyond larger clouds-based baselines in this niche
  • Tool RAG method proposed to further improve efficiency and performance
  • Edge deployment focus: on-device inference for privacy-preserving, low-latency operation
  • 16 predefined macOS functions interface with Apple scripts for local tool orchestration
  • Demo setup with TinyAgent-1B and Whisper-v3 running locally on a MacBook M3 Pro
  • Open source availability at the official repository

Common use cases

  • Semantic edge assistants that interpret natural language queries and orchestrate a sequence of local tool calls (e.g., calendar, contacts, email) without exposing data to the cloud
  • Siri-like workflows where the agent translates user commands into precise API or script invocations (for example, creating calendar invites with specified attendees) using predefined scripts
  • Private, on-device automation scenarios where minimizing data exposure is critical
  • Edge-enabled agents that operate in environments with intermittent or no network connectivity while preserving responsiveness

Setup & installation

Setup and installation details are not specified in the provided source. The work is released as open source, and the repository is available at:

  • https://github.com/SqueezeAILab/TinyAgent For readers seeking practical steps, please consult the repository for installation instructions, dependencies, and usage examples. The source material emphasizes the architectural approach, data-generation workflow, and edge deployment use case rather than a step-by-step setup guide.
Setup and installation details are not specified in the provided source.
See the repository for instructions: https://github.com/SqueezeAILab/TinyAgent

Quick start (minimal runnable example)

The described workflow centers on translating a natural language request into a plan of function calls and then executing those calls in the correct order. A minimal, high-level quick start based on the described approach might look like this (pseudo-workflow):

  • A user provides a natural language command (for example, “Create a calendar invite with attendees A and B for the next Tuesday at 3 PM”).
  • The LLMCompiler-based planner outputs a function calling plan that lists which functions to invoke, their input arguments (e.g., attendee emails, event title, date/time), and the interdependencies among those calls.
  • The system executes each function in order, substituting actual values for placeholders (e.g., $1, $2) as results from prior calls become available.
  • The final result is a created calendar item, with the inputs validated against the predefined Apple scripts. Note: The above represents the intended workflow described in the source. It is not a runnable script provided in the material; refer to the TinyAgent repository for concrete implementation details, function definitions, and integration points with macOS.

Pros and cons

  • Pros
  • Privacy-preserving edge deployment by keeping inference locally on the device
  • Reduced dependence on cloud connectivity and potentially lower latency
  • Targeted learning through curated, high-quality function-calling data
  • Demonstrated capability to surpass a strong baseline in function calling for small models
  • Clear path to scalable edge agents via an explicit function orchestration paradigm
  • Cons (as reported or implied by the study)
  • Off-the-shelf small models exhibit limited function calling without task-specific fine-tuning
  • Requires curated, high-quality data and careful data generation to achieve robust plans
  • The current results focus on a MacOS-driven driving application with predefined 16 functions, potentially limiting generalizability to other domains or tool sets
  • Data-generation costs and process complexity (e.g., sanity checks, plan validation) are non-trivial in practice

Alternatives (brief comparisons)

  • Large LLMs with cloud inference (e.g., GPT-4/GPT-4o): provide strong function calling but raise privacy and connectivity concerns; often rely on vast parametric memory and cloud access
  • ToolFormer and Gorilla: examples cited as related approaches to enabling tool use via LLMs in agentic settings
  • LLaMA-2 70B-based approaches: prior work that considered function calling with large models; TinyAgent explores whether smaller models can reach similar capabilities with curated data and fine-tuning
  • Tool RAG: proposed optimization to improve efficiency and performance in the function-calling workflow

Pricing or License

  • The framework is described as open source; explicit licensing terms are not provided in the source material. Pricing is not specified.

References

More resources