What AI Agents Are
A chatbot takes a message and returns a response. An AI agent takes a goal and works toward it autonomously — deciding what tools to use, what information to gather, when to iterate, and when the task is complete.
The key capabilities that distinguish agents from simple chatbots:
- Tool use — agents call functions to interact with the real world (search files, query databases, run code, call APIs)
- Planning — agents break complex tasks into steps and decide the order of execution
- Memory — agents maintain context across multiple tool calls and reasoning steps
- Autonomy — agents operate with minimal human intervention, making decisions at each step
- Iteration — agents can evaluate their own output and retry if the result is unsatisfactory
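These capabilities combine into a single control loop: the model chooses an action, the runtime executes it as a tool call, and the observation feeds back in until the model declares the task done. A minimal stdlib sketch of that loop, using a scripted stand-in for the model (every name here is illustrative, not a real framework API):

```python
# Minimal agent loop: decide -> act -> observe -> repeat until done.
# `fake_model` is a scripted stand-in for an LLM choosing the next action.

def fake_model(history):
    # A real agent would ask the LLM; here we script two steps, then finish.
    if not any(step["action"] == "search" for step in history):
        return {"action": "search", "arg": "quarterly report"}
    if not any(step["action"] == "summarize" for step in history):
        return {"action": "summarize", "arg": "findings"}
    return {"action": "done", "arg": None}

TOOLS = {
    "search": lambda q: f"3 documents matching '{q}'",
    "summarize": lambda t: f"summary of {t}",
}

def run_agent(max_steps=10):
    history = []  # the agent's memory across steps
    for _ in range(max_steps):
        decision = fake_model(history)  # planning + autonomy
        if decision["action"] == "done":
            break
        observation = TOOLS[decision["action"]](decision["arg"])  # tool use
        history.append({"action": decision["action"], "observation": observation})
    return history

trace = run_agent()
print([step["action"] for step in trace])  # ['search', 'summarize']
```

Every framework in this article is, at its core, a more capable version of this loop: better planning, richer tool schemas, and persistent state.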
Until recently, building agents required cloud APIs like GPT-4 or Claude. In 2026, local models have crossed the capability threshold where on-device agents are genuinely useful.
Agent Frameworks That Run Locally
CrewAI
CrewAI is the most beginner-friendly agent framework. It uses a "crew" metaphor where you define agents with specific roles and tasks, and they collaborate to achieve a goal. It has first-class Ollama support:
```bash
pip install crewai crewai-tools langchain-ollama
```

```python
from crewai import Agent, Task, Crew
from langchain_ollama import ChatOllama

llm = ChatOllama(model="gemma4:27b", base_url="http://localhost:11434")

researcher = Agent(
    role="Research Analyst",
    goal="Find and summarize key information from documents",
    backstory="Expert at analyzing documents and extracting insights.",
    llm=llm,
    verbose=True,
)

task = Task(
    description="Analyze the quarterly report and identify the top 3 risks.",
    expected_output="A bullet-point list of the top 3 risks with brief explanations.",
    agent=researcher,
)

crew = Crew(agents=[researcher], tasks=[task], verbose=True)
result = crew.kickoff()
print(result)
```
AutoGen
Microsoft's AutoGen excels at multi-agent conversations and code generation. Agents can discuss, debate, and iterate on solutions. It connects to Ollama through its OpenAI-compatible API:
```bash
pip install pyautogen
```

```python
from autogen import AssistantAgent, UserProxyAgent

# Point AutoGen at Ollama's OpenAI-compatible endpoint
config_list = [{
    "model": "gemma4:27b",
    "base_url": "http://localhost:11434/v1",
    "api_key": "ollama",  # Ollama doesn't check the key, but the field is required
}]

assistant = AssistantAgent(
    name="analyst",
    llm_config={"config_list": config_list},
)

user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    code_execution_config={"work_dir": "workspace", "use_docker": False},
)

user_proxy.initiate_chat(
    assistant,
    message="Write a Python script that analyzes CSV sales data and creates a chart.",
)
```
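Under the hood, AutoGen (like any OpenAI-compatible client) sends Ollama a standard chat-completions request. The stdlib-only sketch below builds that request by hand, which is handy for debugging the connection before involving a framework; the endpoint path is Ollama's documented `/v1/chat/completions`, and the message content is illustrative:

```python
import json
import urllib.request

# The same settings AutoGen reads from config_list, expressed as a raw request.
payload = {
    "model": "gemma4:27b",
    "messages": [
        {"role": "user", "content": "Write a Python script that analyzes CSV sales data."}
    ],
}

req = urllib.request.Request(
    "http://localhost:11434/v1/chat/completions",
    data=json.dumps(payload).encode("utf-8"),
    headers={
        "Content-Type": "application/json",
        "Authorization": "Bearer ollama",  # Ollama accepts any token
    },
)

# Actually sending it requires a running Ollama server:
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]

print(req.full_url)
```

If the framework misbehaves, replaying this request directly usually tells you whether the problem is the model, the server, or the framework configuration.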
LangGraph
LangGraph from LangChain provides the most control over agent behavior through a graph-based workflow system. You define nodes (actions) and edges (transitions) to create complex, stateful agent pipelines:
```bash
pip install langgraph langchain-ollama
```

```python
from langgraph.graph import StateGraph, END
from langchain_ollama import ChatOllama
from typing import TypedDict

llm = ChatOllama(model="qwen3.5:35b")

class AgentState(TypedDict):
    messages: list
    next_action: str

def research_node(state):
    # Agent gathers information with the LLM
    response = llm.invoke(state["messages"])
    return {"messages": state["messages"] + [response]}

def decide_next(state):
    # Loop back to research until the model signals completion
    last_message = state["messages"][-1].content
    if "DONE" in last_message:
        return "end"
    return "research"

graph = StateGraph(AgentState)
graph.add_node("research", research_node)
graph.add_conditional_edges("research", decide_next, {
    "research": "research",
    "end": END,
})
graph.set_entry_point("research")
app = graph.compile()
```
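Conceptually, the compiled graph runs a loop: execute the current node, merge its state update, then follow the conditional edge until it reaches END. The plain-Python sketch below mimics that routing logic with a scripted node so it terminates deterministically; it illustrates the control flow, not LangGraph's internals:

```python
# Toy state machine mirroring the research -> decide -> (research | end) graph.
END = "__end__"

def research_node(state):
    # Stand-in for the LLM call: signal DONE after two research passes.
    note = "DONE" if len(state["messages"]) >= 2 else "still researching"
    return {"messages": state["messages"] + [note]}

def decide_next(state):
    return "end" if "DONE" in state["messages"][-1] else "research"

nodes = {"research": research_node}
edges = {"research": {"research": "research", "end": END}}

state = {"messages": ["analyze the report"]}
current = "research"  # entry point
while current != END:
    state.update(nodes[current](state))
    current = edges[current][decide_next(state)]

print(state["messages"][-1])  # DONE
```

The value of the graph formulation is that this loop, the state schema, and the routing rules are all explicit and inspectable, which is what makes LangGraph suited to complex pipelines.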
Which Models Work Best for Agents
Agent performance depends heavily on the underlying model's ability to follow instructions, use tools reliably, and reason through multi-step problems. According to LLMCheck benchmarks:[LLMCheck]
| Model | Agent Strength | RAM | Speed (M4 Max) | Best For |
|---|---|---|---|---|
| Gemma 4 26B-A4B | Function calling (92%) | ~16 GB | 45 tok/s | Tool-heavy agents |
| Qwen 3.5 35B | Reasoning depth | ~22 GB | 32 tok/s | Research/analysis agents |
| Qwen 3.5 9B | Good all-round | ~6 GB | 62 tok/s | Fast agents on 16 GB Macs |
| Llama 4 Scout | Balanced | ~6 GB | 58 tok/s | General-purpose agents |
Key insight: For agents, function calling accuracy matters more than raw speed. A model that correctly identifies and formats tool calls 92% of the time (Gemma 4) produces far better agent behavior than a faster model that fails 20% of tool calls.
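The reason is compounding: an agent task chains many tool calls, so per-call accuracy multiplies across steps. A quick calculation (illustrative, assuming independent calls):

```python
# Probability that every tool call in an n-step task succeeds,
# assuming independent calls at a fixed per-call accuracy.
def task_success(per_call_accuracy, steps):
    return per_call_accuracy ** steps

for acc in (0.92, 0.80):
    print(f"{acc:.0%} per call -> {task_success(acc, 5):.0%} over 5 steps")
# 92% per call -> 66% over 5 steps
# 80% per call -> 33% over 5 steps
```

Over a five-step task, the 92%-accurate model completes cleanly twice as often as the 80% one, and the gap widens with every additional step.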
Example: Build a Research Agent
Here is a complete, working research agent that searches local documents and produces a summary. It uses CrewAI with Ollama, which LLMCheck rates the most accessible starting point for local agents:[LLMCheck]
```python
from crewai import Agent, Task, Crew
from crewai_tools import DirectoryReadTool, FileReadTool
from langchain_ollama import ChatOllama

# Connect to local Ollama
llm = ChatOllama(model="gemma4:27b", base_url="http://localhost:11434")

# Define tools for reading local files
docs_tool = DirectoryReadTool(directory="./research_docs")
file_tool = FileReadTool()

# Research agent
researcher = Agent(
    role="Research Analyst",
    goal="Thoroughly analyze documents and extract key findings",
    backstory="""You are an expert research analyst. You read documents
    carefully, identify key themes, and produce clear summaries.""",
    tools=[docs_tool, file_tool],
    llm=llm,
    verbose=True,
)

# Summary agent
writer = Agent(
    role="Technical Writer",
    goal="Produce clear, well-structured summaries",
    backstory="""You take research findings and create polished
    summaries with key insights and recommendations.""",
    llm=llm,
    verbose=True,
)

# Tasks
research_task = Task(
    description="""Read all documents in the research folder.
    Identify the top 5 key findings across all documents.
    Note any contradictions or gaps in the research.""",
    expected_output="A detailed list of findings with supporting quotes.",
    agent=researcher,
)

summary_task = Task(
    description="""Using the research findings, create an executive
    summary with: 1) Key findings, 2) Implications, 3) Recommendations.""",
    expected_output="A polished executive summary in markdown format.",
    agent=writer,
    context=[research_task],
)

# Run the crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, summary_task],
    verbose=True,
)
result = crew.kickoff()
print(result)
```
This agent will autonomously read every document in the research folder, identify key themes, and produce a structured executive summary — all running on your Mac with no internet connection.
CrewAI vs AutoGen vs LangGraph
| Feature | CrewAI | AutoGen | LangGraph |
|---|---|---|---|
| Learning curve | Easy | Moderate | Steep |
| Multi-agent support | Role-based crews | Conversation-based | Graph-based |
| Tool integration | Built-in tool library | Function decorators | LangChain ecosystem |
| Ollama support | Native | Via OpenAI compat | Native |
| Code execution | Limited | Built-in sandbox | Custom nodes |
| Workflow control | Sequential/parallel | Conversation flow | Full graph control |
| State management | Basic | Conversation history | Custom state schemas |
| Best use case | Simple agent teams | Code gen + debates | Complex pipelines |
Privacy and the On-Device Advantage
When you run agents through cloud APIs, every single step is visible to the provider — your tool definitions, your document contents, your database queries, the agent's reasoning chain. With local agents:
- Documents stay local — the agent reads your files directly from disk, never uploading them
- Reasoning is private — the model's chain-of-thought processing happens on your GPU
- No API keys needed — you do not need accounts with OpenAI, Anthropic, or anyone else
- No usage limits — run as many agent tasks as you want, no rate limits or token quotas
- Works offline — once models are downloaded, agents run without internet
According to LLMCheck, this makes local agents ideal for: legal document analysis, medical record processing, financial data review, proprietary code analysis, and any workflow involving sensitive business data.
RAM Requirements Per Framework
| Setup | LLM RAM | Framework Overhead | Total | Min Mac |
|---|---|---|---|---|
| CrewAI + Qwen 3.5 9B | ~6 GB | ~300 MB | ~6.3 GB | 16 GB Mac |
| AutoGen + Qwen 3.5 9B | ~6 GB | ~500 MB | ~6.5 GB | 16 GB Mac |
| LangGraph + Qwen 3.5 9B | ~6 GB | ~400 MB | ~6.4 GB | 16 GB Mac |
| CrewAI + Gemma 4 27B | ~16 GB | ~300 MB | ~16.3 GB | 24 GB Mac |
| CrewAI + Qwen 3.5 35B | ~22 GB | ~300 MB | ~22.3 GB | 36 GB Mac |
Note: Multi-agent setups in all three frameworks reuse the same Ollama model instance. Running a crew of 3 agents does not require 3x the RAM — they share the single loaded model and take turns generating responses.