What Is Function Calling?
Normally, an LLM receives text and produces text. Function calling (also called tool use) adds a structured layer: you tell the model what tools are available, and it can choose to output a JSON object that calls one of those tools instead of generating a plain text response.
Here is how the flow works:
- You define tools — JSON schemas describing functions the model can call (name, description, parameters)
- You send a message + tools — the model sees both the user message and the available tools
- Model decides — it either responds with text or generates a tool call with structured arguments
- Your code executes — you parse the tool call, run the actual function, and return the result
- Model incorporates result — the model uses the function output to generate its final response
This is what enables AI agents — autonomous systems that can chain multiple tool calls together to accomplish complex tasks.
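Concretely, when the model decides to call a tool, the response message carries a structured `tool_calls` entry instead of (or alongside) plain text. A minimal illustration of the shape your parsing code works with (the values here are made up; the field layout matches the Ollama chat responses parsed later in this guide):

```python
# Illustrative Ollama chat response containing a tool call.
# Values are invented; the layout mirrors what the parsing code
# in the later steps expects.
response = {
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "function": {
                    "name": "get_weather",
                    "arguments": {"city": "Tokyo", "unit": "celsius"},
                }
            }
        ],
    }
}

# Your code reads the call; the model never executes anything itself.
call = response["message"]["tool_calls"][0]["function"]
print(call["name"], call["arguments"])
```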
Which Local Models Support Function Calling
Not all local models handle function calling well. According to LLMCheck benchmarks, these are the top performers in 2026:[LLMCheck]
| Model | Size | Tool Call Accuracy | RAM Needed | Notes |
|---|---|---|---|---|
| Gemma 4 27B (A4B MoE) | 27B (4B active) | 92% | ~16 GB | Native function calling, best accuracy |
| Qwen 3.5 35B | 35B | 89% | ~22 GB | Strong reasoning + tool use |
| Qwen 3.5 9B | 9B | 84% | ~6 GB | Best small model for tools |
| Llama 4 Scout | 17B (A4B MoE) | 86% | ~6 GB | Good balance of speed + accuracy |
| Mistral Small 3.2 | 24B | 82% | ~15 GB | Decent but less reliable on complex calls |
Step 1: Pull a Function-Calling Model
Install Ollama if you have not already (installation guide), then pull a model with strong function calling support:
```shell
# Best overall function calling (needs 16 GB RAM)
ollama pull gemma4:27b

# Best for 16 GB Macs
ollama pull qwen3.5:9b

# Good balance on 32 GB+ Macs
ollama pull qwen3.5:35b
```
Step 2: Define Your Tools
Tools are defined as JSON schemas that describe what each function does and what parameters it accepts. Here are two example tools — a weather lookup and a calculator:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. San Francisco"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a mathematical expression",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Math expression, e.g. '(4 + 5) * 3'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]
```
Step 3: Call the Ollama API with Tools
Using curl
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:27b",
  "messages": [
    {"role": "user", "content": "What is the weather in Tokyo?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
          },
          "required": ["city"]
        }
      }
    }
  ],
  "stream": false
}'
```
Using Python
```python
import ollama

response = ollama.chat(
    model="gemma4:27b",
    messages=[
        {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    tools=tools
)

# Check if the model wants to call a tool
message = response["message"]
if message.get("tool_calls"):
    tool_call = message["tool_calls"][0]
    print(f"Function: {tool_call['function']['name']}")
    print(f"Arguments: {tool_call['function']['arguments']}")
else:
    print(message["content"])
```
Key insight: The model does not actually execute the function — it only generates the JSON describing which function to call and with what arguments. Your code is responsible for the actual execution. This means you have full control over what happens.
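Because execution happens in your code, you can gate it however you like before anything runs. A minimal sketch of such a guard layer (the allow-list, `REQUIRED_ARGS` table, and `safe_execute` helper are illustrative names of mine, not part of Ollama):

```python
# Hypothetical guard layer: only allow-listed functions run, and
# arguments are checked for required fields before execution.
ALLOWED = {"get_weather", "calculate"}
REQUIRED_ARGS = {"get_weather": {"city"}, "calculate": {"expression"}}

def safe_execute(name, args, implementations):
    """Run a tool call only if it passes the allow-list and arg checks."""
    if name not in ALLOWED:
        return {"error": f"Function not allowed: {name}"}
    missing = REQUIRED_ARGS[name] - set(args)
    if missing:
        return {"error": f"Missing arguments: {sorted(missing)}"}
    return implementations[name](**args)

# Usage with a stub implementation:
impls = {"get_weather": lambda city, unit="celsius": {"city": city, "temp": 22}}
print(safe_execute("get_weather", {"city": "Tokyo"}, impls))
print(safe_execute("drop_table", {}, impls))
```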
Step 4: Execute and Return Results
After receiving a tool call, execute the function and feed the result back to the model:
```python
import ollama

# Your actual function implementations
def get_weather(city, unit="celsius"):
    # In a real app, this would call a weather API
    return {"city": city, "temp": 22, "unit": unit, "condition": "sunny"}

def calculate(expression):
    # Warning: eval() runs arbitrary Python. Fine for a local demo,
    # but restrict or sandbox it before feeding it model-generated input.
    return {"result": eval(expression)}

# Map function names to implementations
available_functions = {
    "get_weather": get_weather,
    "calculate": calculate,
}

# Initial request
messages = [{"role": "user", "content": "What is the weather in Tokyo?"}]
response = ollama.chat(model="gemma4:27b", messages=messages, tools=tools)

# Process tool calls
if response["message"].get("tool_calls"):
    # Add the assistant's tool call to the conversation
    messages.append(response["message"])

    for tool_call in response["message"]["tool_calls"]:
        func_name = tool_call["function"]["name"]
        func_args = tool_call["function"]["arguments"]

        # Execute the function
        result = available_functions[func_name](**func_args)

        # Add the tool result to the conversation
        messages.append({
            "role": "tool",
            "content": str(result)
        })

    # Let the model generate a final response using the tool result
    final = ollama.chat(model="gemma4:27b", messages=messages, tools=tools)
    print(final["message"]["content"])
```
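The `calculate` tool above uses `eval`, which will happily execute arbitrary Python if the model (or a prompt injection hidden in tool output) produces something malicious. One safer sketch, restricting evaluation to basic arithmetic via the standard-library `ast` module (the `safe_calculate` name and supported-operator set are my choices, not the guide's):

```python
import ast
import operator

# Map AST operator node types to the arithmetic they perform.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calculate(expression):
    """Evaluate arithmetic like '(4 + 5) * 3' without eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Anything else (names, calls, attributes) is rejected outright.
        raise ValueError("Unsupported expression")
    return {"result": _eval(ast.parse(expression, mode="eval").body)}

print(safe_calculate("(4 + 5) * 3"))  # {'result': 27}
```

Swapping this in for the `eval`-based `calculate` keeps the tool's contract identical while closing off code execution.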
Step 5: Build a Multi-Tool Agent
A true agent loops until the model stops calling tools. According to LLMCheck testing, this pattern handles complex multi-step tasks reliably with Gemma 4 and Qwen 3.5:[LLMCheck]
```python
import ollama

def agent_loop(user_message, model="gemma4:27b", max_steps=10):
    messages = [{"role": "user", "content": user_message}]

    for step in range(max_steps):
        response = ollama.chat(
            model=model,
            messages=messages,
            tools=tools
        )
        message = response["message"]
        messages.append(message)

        # If no tool calls, we have the final answer
        if not message.get("tool_calls"):
            return message["content"]

        # Execute each tool call
        for tool_call in message["tool_calls"]:
            func_name = tool_call["function"]["name"]
            func_args = tool_call["function"]["arguments"]

            if func_name in available_functions:
                result = available_functions[func_name](**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}

            messages.append({
                "role": "tool",
                "content": str(result)
            })

    return "Agent reached maximum steps without completing."

# Example: multi-step query
answer = agent_loop("What is 15% of the temperature in Tokyo in Fahrenheit?")
print(answer)
```
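One robustness note when adapting this loop: tool-call `arguments` are not always an already-parsed dict. Some clients and endpoints serialize them as a JSON string (the OpenAI-compatible format does, for example), and models occasionally emit malformed JSON. A small normalizing helper, written as an assumption about inputs you might see rather than a documented Ollama behavior:

```python
import json

def parse_arguments(raw):
    """Normalize tool-call arguments: accept a dict or a JSON string."""
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Malformed JSON from the model: fall back to no arguments.
            return {}
    return {}

print(parse_arguments({"city": "Tokyo"}))
print(parse_arguments('{"city": "Tokyo"}'))
print(parse_arguments("not json"))
```

Calling `parse_arguments(tool_call["function"]["arguments"])` before the `**func_args` expansion keeps the loop working regardless of which form arrives.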
Model Comparison: Function Calling Quality
| Capability | Gemma 4 27B | Qwen 3.5 9B | Llama 4 Scout |
|---|---|---|---|
| Single tool call | Excellent | Good | Good |
| Multi-tool chains | Very good | Decent | Decent |
| Parameter extraction | 95%+ accuracy | 88% accuracy | 85% accuracy |
| Nested/complex args | Reliable | Sometimes fails | Sometimes fails |
| Deciding when NOT to call | Very good | Good | Fair |
| Speed (M4 Max) | 45 tok/s | 62 tok/s | 58 tok/s |
| RAM required | ~16 GB | ~6 GB | ~6 GB |
The Privacy Advantage
When you use function calling through cloud APIs like OpenAI, every tool definition, every argument, and every result gets sent to external servers. That means your function names, your database schemas, your API parameters — all visible to third parties.
With local function calling via Ollama, according to LLMCheck, everything stays on your Mac:
- Tool definitions — your function schemas never leave your machine
- Arguments — the model's structured output stays local
- Results — your function outputs are never transmitted externally
- No API keys needed — you do not need accounts with any cloud provider
This makes local function calling ideal for building agents that interact with sensitive systems — internal databases, financial APIs, personal file systems, or proprietary business logic.