What Is Function Calling?

Normally, an LLM receives text and produces text. Function calling (also called tool use) adds a structured layer: you tell the model what tools are available, and it can choose to output a JSON object that calls one of those tools instead of generating a plain text response.

Here is how the flow works:

  1. You define tools — JSON schemas describing functions the model can call (name, description, parameters)
  2. You send a message + tools — the model sees both the user message and the available tools
  3. Model decides — it either responds with text or generates a tool call with structured arguments
  4. Your code executes — you parse the tool call, run the actual function, and return the result
  5. Model incorporates result — the model uses the function output to generate its final response

This is what enables AI agents — autonomous systems that can chain multiple tool calls together to accomplish complex tasks.
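The five-step flow above can be sketched in a few lines of Python. Everything here is illustrative: `chat` is a stand-in for any model API that accepts a `tools` list (stubbed so the loop runs without a server), and `get_time` is a hypothetical tool.

```python
# Illustrative skeleton of the tool-calling round trip.
# `chat` stands in for a model API; this stub always "calls" get_time
# once, then answers in plain text on the next turn.

def chat(messages, tools):
    if not any(m["role"] == "tool" for m in messages):
        return {"tool_calls": [{"name": "get_time", "arguments": {}}]}
    return {"content": "It is 12:00."}

def get_time():
    return "12:00"

tools = [{"name": "get_time", "description": "Get the current time"}]
messages = [{"role": "user", "content": "What time is it?"}]

reply = chat(messages, tools)                      # step 3: model decides
while "tool_calls" in reply:
    for call in reply["tool_calls"]:
        result = get_time()                        # step 4: your code executes
        messages.append({"role": "tool", "content": result})
    reply = chat(messages, tools)                  # step 5: model incorporates result

print(reply["content"])
```

The real Ollama calls follow the same shape; the steps below fill in the details.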

Which Local Models Support Function Calling

Not all local models handle function calling well. According to LLMCheck benchmarks, these are the top performers in 2026:[LLMCheck]

| Model | Size | Tool Call Accuracy | RAM Needed | Notes |
|---|---|---|---|---|
| Gemma 4 27B (A4B MoE) | 27B (4B active) | 92% | ~16 GB | Native function calling, best accuracy |
| Qwen 3.5 35B | 35B | 89% | ~22 GB | Strong reasoning + tool use |
| Qwen 3.5 9B | 9B | 84% | ~6 GB | Best small model for tools |
| Llama 4 Scout | 17B (A4B MoE) | 86% | ~6 GB | Good balance of speed + accuracy |
| Mistral Small 3.2 | 24B | 82% | ~15 GB | Decent but less reliable on complex calls |

Step 1: Pull a Function-Calling Model

Install Ollama if you have not already (installation guide), then pull a model with strong function calling support:

# Best overall function calling (needs 16 GB RAM)
ollama pull gemma4:27b

# Best for 16 GB Macs
ollama pull qwen3.5:9b

# Good balance on 32 GB+ Macs
ollama pull qwen3.5:35b

Step 2: Define Your Tools

Tools are defined as JSON schemas that describe what each function does and what parameters it accepts. Here are two example tools — a weather lookup and a calculator:

tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. San Francisco"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a mathematical expression",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Math expression, e.g. '(4 + 5) * 3'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]
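Because the model's arguments are just generated JSON, it is worth validating them against the schema before executing anything. A minimal hand-rolled sketch is below (a library such as jsonschema would check types and nesting more thoroughly); `validate_args` and `weather_tool` are illustrative names, with `weather_tool` mirroring the `get_weather` entry above.

```python
def validate_args(tool, args):
    """Check model-produced arguments against a tool's JSON schema.

    Minimal checks only: required keys present, no unknown keys,
    enum values respected. Returns a list of error strings."""
    params = tool["function"]["parameters"]
    props = params.get("properties", {})
    errors = []
    for key in params.get("required", []):
        if key not in args:
            errors.append(f"missing required argument: {key}")
    for key, value in args.items():
        if key not in props:
            errors.append(f"unexpected argument: {key}")
        elif "enum" in props[key] and value not in props[key]["enum"]:
            errors.append(f"invalid value for {key}: {value!r}")
    return errors

# Same shape as the get_weather entry above
weather_tool = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "parameters": {
            "type": "object",
            "properties": {
                "city": {"type": "string"},
                "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
            },
            "required": ["city"],
        },
    },
}

errors = validate_args(weather_tool, {"unit": "kelvin"})
print(errors)  # reports the missing city and the invalid unit
```

Rejecting a bad call early lets you return a corrective error message to the model instead of crashing mid-agent-loop.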

Step 3: Call the Ollama API with Tools

Using curl

curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:27b",
  "messages": [
    {"role": "user", "content": "What is the weather in Tokyo?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius","fahrenheit"]}
          },
          "required": ["city"]
        }
      }
    }
  ],
  "stream": false
}'

Using Python

import ollama

response = ollama.chat(
    model="gemma4:27b",
    messages=[
        {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    tools=tools
)

# Check if the model wants to call a tool
message = response["message"]
if message.get("tool_calls"):
    tool_call = message["tool_calls"][0]
    print(f"Function: {tool_call['function']['name']}")
    print(f"Arguments: {tool_call['function']['arguments']}")
else:
    print(message["content"])

Key insight: The model does not actually execute the function — it only generates the JSON describing which function to call and with what arguments. Your code is responsible for the actual execution. This means you have full control over what happens.

Step 4: Execute and Return Results

After receiving a tool call, execute the function and feed the result back to the model:

import ollama

# Your actual function implementations
def get_weather(city, unit="celsius"):
    # In a real app, this would call a weather API
    return {"city": city, "temp": 22, "unit": unit, "condition": "sunny"}

def calculate(expression):
    # WARNING: eval() executes arbitrary Python. Fine for a local demo,
    # but never use it on untrusted model output in production.
    return {"result": eval(expression)}

# Map function names to implementations
available_functions = {
    "get_weather": get_weather,
    "calculate": calculate,
}

# Initial request
messages = [{"role": "user", "content": "What is the weather in Tokyo?"}]
response = ollama.chat(model="gemma4:27b", messages=messages, tools=tools)

# Process tool calls
if response["message"].get("tool_calls"):
    # Add the assistant's tool call to the conversation
    messages.append(response["message"])

    for tool_call in response["message"]["tool_calls"]:
        func_name = tool_call["function"]["name"]
        func_args = tool_call["function"]["arguments"]

        # Execute the function
        result = available_functions[func_name](**func_args)

        # Add the tool result to the conversation
        messages.append({
            "role": "tool",
            "content": str(result)
        })

    # Let the model generate a final response using the tool result
    final = ollama.chat(model="gemma4:27b", messages=messages, tools=tools)
    print(final["message"]["content"])
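The `eval` in `calculate` will happily execute any Python the model emits, not just arithmetic. A safer sketch walks the expression's syntax tree with Python's standard `ast` module and permits only a whitelist of arithmetic operators (`safe_calculate` is an illustrative replacement, not part of the Ollama API):

```python
import ast
import operator

# Whitelisted binary and unary operations
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def safe_calculate(expression):
    """Evaluate arithmetic only; anything else raises ValueError."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        raise ValueError("disallowed expression")
    return {"result": _eval(ast.parse(expression, mode="eval").body)}

print(safe_calculate("(4 + 5) * 3"))  # {'result': 27}
```

Function calls, attribute access, and names all fall through to the `ValueError`, so an injected `__import__('os')` is rejected rather than executed.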

Step 5: Build a Multi-Tool Agent

A true agent loops until the model stops calling tools. According to LLMCheck testing, this pattern handles complex multi-step tasks reliably with Gemma 4 and Qwen 3.5:[LLMCheck]

import ollama

def agent_loop(user_message, model="gemma4:27b", max_steps=10):
    messages = [{"role": "user", "content": user_message}]

    for step in range(max_steps):
        response = ollama.chat(
            model=model,
            messages=messages,
            tools=tools
        )

        message = response["message"]
        messages.append(message)

        # If no tool calls, we have the final answer
        if not message.get("tool_calls"):
            return message["content"]

        # Execute each tool call
        for tool_call in message["tool_calls"]:
            func_name = tool_call["function"]["name"]
            func_args = tool_call["function"]["arguments"]

            if func_name in available_functions:
                result = available_functions[func_name](**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}

            messages.append({
                "role": "tool",
                "content": str(result)
            })

    return "Agent reached maximum steps without completing."

# Example: multi-step query
answer = agent_loop("What is 15% of the temperature in Tokyo in Fahrenheit?")
print(answer)
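One refinement worth considering: the loop above serializes tool results with `str(result)`, which renders dicts as Python repr (single quotes). `json.dumps` produces valid JSON, which models generally parse back more reliably:

```python
import json

# A tool result like the one get_weather returns
result = {"city": "Tokyo", "temp": 22, "unit": "celsius", "condition": "sunny"}

tool_message = {
    "role": "tool",
    "content": json.dumps(result),  # valid JSON, not Python repr
}

print(tool_message["content"])
# {"city": "Tokyo", "temp": 22, "unit": "celsius", "condition": "sunny"}
```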

Model Comparison: Function Calling Quality

| Capability | Gemma 4 27B | Qwen 3.5 9B | Llama 4 Scout |
|---|---|---|---|
| Single tool call | Excellent | Good | Good |
| Multi-tool chains | Very good | Decent | Decent |
| Parameter extraction | 95%+ accuracy | 88% accuracy | 85% accuracy |
| Nested/complex args | Reliable | Sometimes fails | Sometimes fails |
| Deciding when NOT to call | Very good | Good | Fair |
| Speed (M4 Max) | 45 tok/s | 62 tok/s | 58 tok/s |
| RAM required | ~16 GB | ~6 GB | ~6 GB |

The Privacy Advantage

When you use function calling through cloud APIs like OpenAI, every tool definition, every argument, and every result gets sent to external servers. That means your function names, your database schemas, your API parameters — all visible to third parties.

With local function calling via Ollama, according to LLMCheck, everything stays on your Mac: tool definitions, arguments, and results never leave the machine.

This makes local function calling ideal for building agents that interact with sensitive systems — internal databases, financial APIs, personal file systems, or proprietary business logic.