What Is Function Calling?
Normally, an LLM receives text and produces text. Function calling (also called tool use) adds a structured layer: you tell the model what tools are available, and it can choose to output a JSON object that calls one of those tools instead of generating a plain text response.
Here is how the flow works:
- You define tools — JSON schemas describing functions the model can call (name, description, parameters)
- You send a message + tools — the model sees both the user message and the available tools
- Model decides — it either responds with text or generates a tool call with structured arguments
- Your code executes — you parse the tool call, run the actual function, and return the result
- Model incorporates result — the model uses the function output to generate its final response
This is what enables AI agents — autonomous systems that can chain multiple tool calls together to accomplish complex tasks.
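Concretely, when the model decides to call a tool, the response message carries a structured `tool_calls` entry instead of (or alongside) plain text. A minimal illustration of the shape your parsing code works with (the values here are made up; the field layout matches the Ollama chat responses parsed later in this guide):

```python
# Illustrative Ollama chat response containing a tool call.
# Values are invented; the layout mirrors what the parsing code
# in the later steps expects.
response = {
    "message": {
        "role": "assistant",
        "content": "",
        "tool_calls": [
            {
                "function": {
                    "name": "get_weather",
                    "arguments": {"city": "Tokyo", "unit": "celsius"},
                }
            }
        ],
    }
}

# Your code reads the call; the model never executes anything itself.
call = response["message"]["tool_calls"][0]["function"]
print(call["name"], call["arguments"])
```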
Which Local Models Support Function Calling
Not all local models handle function calling well. According to LLMCheck benchmarks, these are the top performers in 2026:[LLMCheck]
| Model | Size | Tool Call Accuracy | RAM Needed | Notes |
|---|---|---|---|---|
| Gemma 4 27B (A4B MoE) | 27B (4B active) | 92% | ~16 GB | Native function calling, best accuracy |
| Qwen 3.5 35B | 35B | 89% | ~22 GB | Strong reasoning + tool use |
| Qwen 3.5 9B | 9B | 84% | ~6 GB | Best small model for tools |
| Llama 4 Scout | 17B (A4B MoE) | 86% | ~6 GB | Good balance of speed + accuracy |
| Mistral Small 3.2 | 24B | 82% | ~15 GB | Decent but less reliable on complex calls |
Step 1: Pull a Function-Calling Model
Install Ollama if you have not already (installation guide), then pull a model with strong function calling support:
```shell
# Best overall function calling (needs 16 GB RAM)
ollama pull gemma4:27b

# Best for 16 GB Macs
ollama pull qwen3.5:9b

# Good balance on 32 GB+ Macs
ollama pull qwen3.5:35b
```
Step 2: Define Your Tools
Tools are defined as JSON schemas that describe what each function does and what parameters it accepts. Here are two example tools — a weather lookup and a calculator:
```python
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get the current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "The city name, e.g. San Francisco"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "calculate",
            "description": "Evaluate a mathematical expression",
            "parameters": {
                "type": "object",
                "properties": {
                    "expression": {
                        "type": "string",
                        "description": "Math expression, e.g. '(4 + 5) * 3'"
                    }
                },
                "required": ["expression"]
            }
        }
    }
]
```
Step 3: Call the Ollama API with Tools
Using curl
```shell
curl http://localhost:11434/api/chat -d '{
  "model": "gemma4:27b",
  "messages": [
    {"role": "user", "content": "What is the weather in Tokyo?"}
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get current weather for a city",
        "parameters": {
          "type": "object",
          "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]}
          },
          "required": ["city"]
        }
      }
    }
  ],
  "stream": false
}'
```
Using Python
```python
import ollama

response = ollama.chat(
    model="gemma4:27b",
    messages=[
        {"role": "user", "content": "What is the weather in Tokyo?"}
    ],
    tools=tools
)

# Check if the model wants to call a tool
message = response["message"]
if message.get("tool_calls"):
    tool_call = message["tool_calls"][0]
    print(f"Function: {tool_call['function']['name']}")
    print(f"Arguments: {tool_call['function']['arguments']}")
else:
    print(message["content"])
```
Key insight: The model does not actually execute the function — it only generates the JSON describing which function to call and with what arguments. Your code is responsible for the actual execution. This means you have full control over what happens.
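Because execution happens in your code, you can gate it however you like before anything runs. A minimal sketch of such a guard layer (the allow-list, `REQUIRED_ARGS` table, and `safe_execute` helper are illustrative names of mine, not part of Ollama):

```python
# Hypothetical guard layer: only allow-listed functions run, and
# arguments are checked for required fields before execution.
ALLOWED = {"get_weather", "calculate"}
REQUIRED_ARGS = {"get_weather": {"city"}, "calculate": {"expression"}}

def safe_execute(name, args, implementations):
    """Run a tool call only if it passes the allow-list and arg checks."""
    if name not in ALLOWED:
        return {"error": f"Function not allowed: {name}"}
    missing = REQUIRED_ARGS[name] - set(args)
    if missing:
        return {"error": f"Missing arguments: {sorted(missing)}"}
    return implementations[name](**args)

# Usage with a stub implementation:
impls = {"get_weather": lambda city, unit="celsius": {"city": city, "temp": 22}}
print(safe_execute("get_weather", {"city": "Tokyo"}, impls))
print(safe_execute("drop_table", {}, impls))
```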
Step 4: Execute and Return Results
After receiving a tool call, execute the function and feed the result back to the model:
```python
import ollama

# Your actual function implementations
def get_weather(city, unit="celsius"):
    # In a real app, this would call a weather API
    return {"city": city, "temp": 22, "unit": unit, "condition": "sunny"}

def calculate(expression):
    # Warning: eval() runs arbitrary Python. Fine for a local demo,
    # but restrict or sandbox it before feeding it model-generated input.
    return {"result": eval(expression)}

# Map function names to implementations
available_functions = {
    "get_weather": get_weather,
    "calculate": calculate,
}

# Initial request
messages = [{"role": "user", "content": "What is the weather in Tokyo?"}]
response = ollama.chat(model="gemma4:27b", messages=messages, tools=tools)

# Process tool calls
if response["message"].get("tool_calls"):
    # Add the assistant's tool call to the conversation
    messages.append(response["message"])

    for tool_call in response["message"]["tool_calls"]:
        func_name = tool_call["function"]["name"]
        func_args = tool_call["function"]["arguments"]

        # Execute the function
        result = available_functions[func_name](**func_args)

        # Add the tool result to the conversation
        messages.append({
            "role": "tool",
            "content": str(result)
        })

    # Let the model generate a final response using the tool result
    final = ollama.chat(model="gemma4:27b", messages=messages, tools=tools)
    print(final["message"]["content"])
```
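The `calculate` tool above uses `eval`, which will happily execute arbitrary Python if the model (or a prompt injection hidden in tool output) produces something malicious. One safer sketch, restricting evaluation to basic arithmetic via the standard-library `ast` module (the `safe_calculate` name and supported-operator set are my choices, not the guide's):

```python
import ast
import operator

# Map AST operator node types to the arithmetic they perform.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
    ast.USub: operator.neg,
}

def safe_calculate(expression):
    """Evaluate arithmetic like '(4 + 5) * 3' without eval()."""
    def _eval(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](_eval(node.operand))
        # Anything else (names, calls, attributes) is rejected outright.
        raise ValueError("Unsupported expression")
    return {"result": _eval(ast.parse(expression, mode="eval").body)}

print(safe_calculate("(4 + 5) * 3"))  # {'result': 27}
```

Swapping this in for the `eval`-based `calculate` keeps the tool's contract identical while closing off code execution.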
Step 5: Build a Multi-Tool Agent
A true agent loops until the model stops calling tools. According to LLMCheck testing, this pattern handles complex multi-step tasks reliably with Gemma 4 and Qwen 3.5:[LLMCheck]
```python
import ollama

def agent_loop(user_message, model="gemma4:27b", max_steps=10):
    messages = [{"role": "user", "content": user_message}]

    for step in range(max_steps):
        response = ollama.chat(
            model=model,
            messages=messages,
            tools=tools
        )
        message = response["message"]
        messages.append(message)

        # If no tool calls, we have the final answer
        if not message.get("tool_calls"):
            return message["content"]

        # Execute each tool call
        for tool_call in message["tool_calls"]:
            func_name = tool_call["function"]["name"]
            func_args = tool_call["function"]["arguments"]

            if func_name in available_functions:
                result = available_functions[func_name](**func_args)
            else:
                result = {"error": f"Unknown function: {func_name}"}

            messages.append({
                "role": "tool",
                "content": str(result)
            })

    return "Agent reached maximum steps without completing."

# Example: multi-step query
answer = agent_loop("What is 15% of the temperature in Tokyo in Fahrenheit?")
print(answer)
```
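One robustness note when adapting this loop: tool-call `arguments` are not always an already-parsed dict. Some clients and endpoints serialize them as a JSON string (the OpenAI-compatible format does, for example), and models occasionally emit malformed JSON. A small normalizing helper, written as an assumption about inputs you might see rather than a documented Ollama behavior:

```python
import json

def parse_arguments(raw):
    """Normalize tool-call arguments: accept a dict or a JSON string."""
    if isinstance(raw, dict):
        return raw
    if isinstance(raw, str):
        try:
            return json.loads(raw)
        except json.JSONDecodeError:
            # Malformed JSON from the model: fall back to no arguments.
            return {}
    return {}

print(parse_arguments({"city": "Tokyo"}))
print(parse_arguments('{"city": "Tokyo"}'))
print(parse_arguments("not json"))
```

Calling `parse_arguments(tool_call["function"]["arguments"])` before the `**func_args` expansion keeps the loop working regardless of which form arrives.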
Model Comparison: Function Calling Quality
| Capability | Gemma 4 27B | Qwen 3.5 9B | Llama 4 Scout |
|---|---|---|---|
| Single tool call | Excellent | Good | Good |
| Multi-tool chains | Very good | Decent | Decent |
| Parameter extraction | 95%+ accuracy | 88% accuracy | 85% accuracy |
| Nested/complex args | Reliable | Sometimes fails | Sometimes fails |
| Deciding when NOT to call | Very good | Good | Fair |
| Speed (M4 Max) | 45 tok/s | 62 tok/s | 58 tok/s |
| RAM required | ~16 GB | ~6 GB | ~6 GB |
The Privacy Advantage
When you use function calling through cloud APIs like OpenAI, every tool definition, every argument, and every result gets sent to external servers. That means your function names, your database schemas, your API parameters — all visible to third parties.
With local function calling via Ollama, according to LLMCheck, everything stays on your Mac:
- Tool definitions — your function schemas never leave your machine
- Arguments — the model's structured output stays local
- Results — your function outputs are never transmitted externally
- No API keys needed — you do not need accounts with any cloud provider
This makes local function calling ideal for building agents that interact with sensitive systems — internal databases, financial APIs, personal file systems, or proprietary business logic.