TinyAgent: Edge Function Calling for Small Language Models
Source: http://bair.berkeley.edu/blog/2024/05/29/tiny-agent/ (BAIR Blog)
Overview
TinyAgent presents a pathway to deploy capable language-model agents on edge devices by focusing on function calling rather than broad world-knowledge memorization. Large models such as GPT-4o or Gemini-1.5 offer strong cloud-based capabilities but raise privacy, connectivity, and latency concerns when deployed at scale on the edge. TinyAgent argues that by training small language models (SLMs) on specialized, high-quality data that emphasizes function calling and tool orchestration, it is possible to achieve robust, real-time edge performance without relying on remote inference. The central observation is that many practical agent tasks boil down to selecting the right sequence of predefined functions with correct inputs and ordering, rather than recalling general information.

To enable this, the authors introduce a pipeline built around a function calling planner and a curated dataset that guides SLMs to generate executable plans rather than free-form text. Key to the approach is the LLMCompiler framework, which directs the model to output a function calling plan: which functions to call, the required input arguments, and the interdependencies among calls. After producing the plan, the system parses it and executes the functions in dependency-respecting order. The underlying research question is whether smaller, open-source models can be taught to perform reliable function calling with targeted data, thus achieving edge-ready reasoning and orchestration capabilities.

The study demonstrates these ideas with an agent that drives macOS applications. The platform exposes 16 predefined functions that interface with macOS applications via predefined Apple scripts, and the model's task is to compose a correct function calling plan that uses those scripts to accomplish user objectives (for example, creating calendar items) rather than returning generic Q&A responses.

The authors also explore data generation and fine-tuning strategies to close the gap between small models and large, cloud-based baselines. Synthetic data is created with a capable model (for example, GPT-4-Turbo) that produces realistic user queries, function calling plans, and input arguments, followed by sanity checks to ensure the resulting plans form valid graphs. The study reports 80k training examples, 1k validation examples, and 1k test examples, at an overall data-generation cost of about $500. A DAG-based evaluation metric checks plan isomorphism, i.e., whether the generated plan aligns structurally with the ground-truth plan.

The TinyAgent-1B model, paired with Whisper-v3 for on-device speech recognition, is demonstrated on a MacBook M3 Pro. The framework is open source and available at the project's GitHub repository. In short, TinyAgent advances a targeted, edge-friendly path to agentic AI: it prioritizes accurate function calling over world-knowledge memorization, uses an explicit function calling planner to manage tool orchestration, and demonstrates end-to-end edge deployment in a practical macOS use case.
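The sanity-check step is straightforward to picture in code. The minimal sketch below assumes an illustrative plan schema of (step id, function name, arguments, dependencies) tuples and a hypothetical tool set; neither is the project's actual format. A plan is kept only if every call names a known function and every dependency edge points to an earlier step, i.e., the plan forms a valid DAG:

```python
# Minimal plan sanity check (illustrative schema, hypothetical tools).
# Each step is (step_id, function_name, args_dict, depends_on_list).

KNOWN_FUNCTIONS = {"get_email_address", "create_calendar_event"}  # subset

def is_valid_plan(steps) -> bool:
    seen = set()
    for step_id, function, _args, depends_on in steps:
        if function not in KNOWN_FUNCTIONS:
            return False          # hallucinated tool name
        if any(dep not in seen for dep in depends_on):
            return False          # forward or dangling dependency
        seen.add(step_id)
    return True

good = [(1, "get_email_address", {"name": "A"}, []),
        (2, "create_calendar_event", {"attendees": ["$1"]}, [1])]
bad  = [(1, "create_calendar_event", {"attendees": ["$2"]}, [2])]
assert is_valid_plan(good) and not is_valid_plan(bad)
```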
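The DAG-based evaluation metric can likewise be sketched as a labeled-graph isomorphism test using `networkx`: build a directed graph per plan with nodes labeled by function name, then check whether the generated plan matches the ground truth structurally. This is a plausible realization of the described metric under the same illustrative schema, not the project's evaluation code:

```python
import networkx as nx
from networkx.algorithms.isomorphism import categorical_node_match

def plan_to_dag(steps):
    """steps: iterable of (step_id, function_name, depends_on) tuples."""
    g = nx.DiGraph()
    for step, function, deps in steps:
        g.add_node(step, function=function)
        for dep in deps:
            g.add_edge(dep, step)
    return g

def plans_match(generated, ground_truth):
    """Count a plan as correct iff its DAG is isomorphic to the ground
    truth's, with node labels (function names) required to match."""
    return nx.is_isomorphic(
        plan_to_dag(generated), plan_to_dag(ground_truth),
        node_match=categorical_node_match("function", None),
    )

# Same structure and labels, different step numbering -> still a match:
a = [(1, "get_email_address", []), (2, "create_calendar_event", [1])]
b = [(2, "get_email_address", []), (5, "create_calendar_event", [2])]
assert plans_match(a, b)
```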
Key features
- Small, open-source language models capable of function calling when trained on curated data
- LLMCompiler framework that outputs a function calling plan with functions, inputs, and dependencies
- High-quality synthetic data generation workflow (80k training, 1k validation, 1k test) using a capable LLM, enabling task-specific planning
- Fine-tuning pathway that can improve SLMs' function calling performance beyond larger, cloud-based baselines on this task
- Tool RAG method proposed to further improve efficiency and performance (see the retrieval sketch after this list)
- Edge deployment focus: on-device inference for privacy-preserving, low-latency operation
- 16 predefined macOS functions that interface with applications via predefined Apple scripts for local tool orchestration
- Demo setup with TinyAgent-1B and Whisper-v3 running locally on a MacBook M3 Pro
- Open source availability at the official repository
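The Tool RAG idea flagged above can be pictured as retrieval over tool descriptions: instead of placing all 16 function descriptions in the planner's prompt, retrieve only the few relevant to the query. The sketch below is a minimal stand-in using off-the-shelf sentence embeddings; the tool names and descriptions are hypothetical, and the project's actual Tool RAG method may differ.

```python
# Minimal Tool RAG stand-in: retrieve only the tools relevant to a
# query so the planner's prompt stays short. Tool list is hypothetical.
from sentence_transformers import SentenceTransformer, util

TOOL_DESCRIPTIONS = {
    "create_calendar_event": "Create a calendar event with title, time, attendees.",
    "get_email_address": "Look up a contact's email address by name.",
    "compose_email": "Draft an email to one or more recipients.",
    "open_note": "Open or create a note in the Notes app.",
}

model = SentenceTransformer("all-MiniLM-L6-v2")
tool_names = list(TOOL_DESCRIPTIONS)
tool_vecs = model.encode(list(TOOL_DESCRIPTIONS.values()), convert_to_tensor=True)

def retrieve_tools(query: str, k: int = 2) -> list[str]:
    """Return the k tool names whose descriptions best match the query."""
    query_vec = model.encode(query, convert_to_tensor=True)
    scores = util.cos_sim(query_vec, tool_vecs)[0]
    return [tool_names[i] for i in scores.topk(k).indices.tolist()]

print(retrieve_tools("Invite Alice and Bob to a meeting next Tuesday at 3 PM"))
# e.g. ['create_calendar_event', 'get_email_address']
```

Shrinking the toolset passed to the planner both shortens the prompt (lower on-device latency) and removes distractor functions that a small model might otherwise select.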
Common use cases
- Semantic edge assistants that interpret natural language queries and orchestrate a sequence of local tool calls (e.g., calendar, contacts, email) without exposing data to the cloud
- Siri-like workflows where the agent translates user commands into precise API or script invocations (for example, creating calendar invites with specified attendees) using predefined scripts
- Private, on-device automation scenarios where minimizing data exposure is critical
- Edge-enabled agents that operate in environments with intermittent or no network connectivity while preserving responsiveness
Setup & installation
Setup and installation details are not specified in the provided source. The work is released as open source, and the repository is available at:
- https://github.com/SqueezeAILab/TinyAgent

For readers seeking practical steps, consult the repository for installation instructions, dependencies, and usage examples. The source material emphasizes the architectural approach, data-generation workflow, and edge deployment use case rather than a step-by-step setup guide.
Quick start (minimal runnable example)
The described workflow centers on translating a natural language request into a plan of function calls and then executing those calls in the correct order. A minimal, high-level quick start based on the described approach might look like this (pseudo-workflow):
- A user provides a natural language command (for example, “Create a calendar invite with attendees A and B for the next Tuesday at 3 PM”).
- The LLMCompiler-based planner outputs a function calling plan that lists which functions to invoke, their input arguments (e.g., attendee emails, event title, date/time), and the interdependencies among those calls.
- The system executes each function in order, substituting actual values for placeholders (e.g., $1, $2) as results from prior calls become available.
- The final result is a created calendar item, with the inputs validated against the predefined Apple scripts.

Note: the above represents the intended workflow described in the source; it is not a runnable script provided in the material. Refer to the TinyAgent repository for concrete implementation details, function definitions, and macOS integration points. A minimal sketch of this loop appears below.
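To make the pseudo-workflow concrete, here is a self-contained sketch of the execute-in-dependency-order step: a hypothetical plan for the calendar example above, stub functions standing in for the real Apple-script-backed tools, and placeholder substitution that replaces "$k" with the result of step k. The plan schema, function names, and stub outputs are illustrative assumptions, not the project's actual interfaces.

```python
import re

# Stub tool registry: real implementations would invoke the predefined
# Apple scripts; these stand-ins just return plausible values.
REGISTRY = {
    "get_email_address": lambda name: f"{name.lower()}@example.com",
    "create_calendar_event": lambda title, start, attendees: (
        f"Created '{title}' at {start} with {', '.join(attendees)}"
    ),
}

# Hypothetical plan for the calendar-invite command above. "$k" means
# "the result of step k", as described in the workflow.
PLAN = [
    {"step": 1, "function": "get_email_address",
     "args": {"name": "A"}},
    {"step": 2, "function": "get_email_address",
     "args": {"name": "B"}},
    {"step": 3, "function": "create_calendar_event",
     "args": {"title": "Sync", "start": "Tuesday 3 PM",
              "attendees": ["$1", "$2"]}},
]

PLACEHOLDER = re.compile(r"^\$(\d+)$")

def resolve(value, results):
    """Replace "$k" placeholders (also inside lists) with step results."""
    if isinstance(value, list):
        return [resolve(v, results) for v in value]
    if isinstance(value, str) and (m := PLACEHOLDER.match(value)):
        return results[int(m.group(1))]
    return value

results = {}
for call in PLAN:  # list order already respects the dependencies
    args = {k: resolve(v, results) for k, v in call["args"].items()}
    results[call["step"]] = REGISTRY[call["function"]](**args)

print(results[3])
# Created 'Sync' at Tuesday 3 PM with a@example.com, b@example.com
```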
Pros and cons
- Pros
- Privacy-preserving edge deployment by keeping inference locally on the device
- Reduced dependence on cloud connectivity and potentially lower latency
- Targeted learning through curated, high-quality function-calling data
- Demonstrated ability of small fine-tuned models to surpass a strong baseline on function calling
- Clear path to scalable edge agents via an explicit function orchestration paradigm
- Cons (as reported or implied by the study)
- Off-the-shelf small models exhibit limited function calling without task-specific fine-tuning
- Requires curated, high-quality data and careful data generation to achieve robust plans
- The current results focus on a macOS agent with 16 predefined functions, potentially limiting generalizability to other domains or tool sets
- Data-generation costs and process complexity (e.g., sanity checks, plan validation) are non-trivial in practice
Alternatives (brief comparisons)
- Large LLMs with cloud inference (e.g., GPT-4/GPT-4o): provide strong function calling but raise privacy and connectivity concerns; often rely on vast parametric memory and cloud access
- Toolformer and Gorilla: examples cited as related approaches to enabling tool use via LLMs in agentic settings
- LLaMA-2 70B-based approaches: prior work that considered function calling with large models; TinyAgent explores whether smaller models can reach similar capabilities with curated data and fine-tuning
- Tool RAG: proposed optimization to improve efficiency and performance in the function-calling workflow
Pricing or License
- The framework is described as open source; explicit licensing terms are not provided in the source material. Pricing is not specified.
References
- TinyAgent blog post: http://bair.berkeley.edu/blog/2024/05/29/tiny-agent/
- TinyAgent GitHub repository: https://github.com/SqueezeAILab/TinyAgent