What AI Agents Are

A chatbot takes a message and returns a response. An AI agent takes a goal and works toward it autonomously — deciding what tools to use, what information to gather, when to iterate, and when the task is complete.

The key capabilities that distinguish agents from simple chatbots:

- Tool use: calling search, file, or code tools instead of only generating text
- Information gathering: deciding what to look up and incorporating the results
- Iteration: revising the approach when an attempt falls short
- Goal tracking: recognizing when the task is complete and stopping

Until recently, building agents required cloud APIs like GPT-4 or Claude. In 2026, local models have crossed the capability threshold where on-device agents are genuinely useful.
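Every framework below is built around the same core loop: the model picks an action, the runtime executes it, and the result is fed back in until the model decides the goal is met. A minimal stdlib-only sketch of that loop (the `StubModel` class and the `search` tool are illustrative stand-ins, not any framework's API):

```python
# Minimal agent loop: decide -> act -> observe, repeated until done.
# StubModel stands in for a real LLM; a framework would call Ollama here.

class StubModel:
    """Fake model that 'searches' once, then declares the task done."""
    def step(self, goal, observations):
        if not observations:
            return {"action": "search", "input": goal}
        return {"action": "finish", "input": f"Summary of {len(observations)} result(s)"}

def run_agent(model, goal, tools, max_steps=5):
    observations = []
    for _ in range(max_steps):  # hard cap so a confused model cannot loop forever
        decision = model.step(goal, observations)
        if decision["action"] == "finish":
            return decision["input"]
        tool = tools[decision["action"]]
        observations.append(tool(decision["input"]))  # feed the result back in
    return "gave up after max_steps"

tools = {"search": lambda q: f"3 documents matched '{q}'"}
print(run_agent(StubModel(), "quarterly risks", tools))  # -> Summary of 1 result(s)
```

The `max_steps` cap is the one piece every real framework also has in some form; without it, a model that never emits a finish signal runs forever.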

Agent Frameworks That Run Locally

CrewAI

CrewAI is the most beginner-friendly agent framework. It uses a "crew" metaphor where you define agents with specific roles and tasks, and they collaborate to achieve a goal. It has first-class Ollama support:

pip install crewai crewai-tools

from crewai import Agent, Task, Crew
from langchain_ollama import ChatOllama

llm = ChatOllama(model="gemma4:27b", base_url="http://localhost:11434")

researcher = Agent(
    role="Research Analyst",
    goal="Find and summarize key information from documents",
    backstory="Expert at analyzing documents and extracting insights.",
    llm=llm,
    verbose=True
)

task = Task(
    description="Analyze the quarterly report and identify the top 3 risks.",
    expected_output="A bullet-point list of the top 3 risks with brief explanations.",
    agent=researcher
)

crew = Crew(agents=[researcher], tasks=[task], verbose=True)
result = crew.kickoff()
print(result)

AutoGen

Microsoft's AutoGen excels at multi-agent conversations and code generation. Agents can discuss, debate, and iterate on solutions. It connects to Ollama through its OpenAI-compatible API:

pip install pyautogen

from autogen import AssistantAgent, UserProxyAgent

config_list = [{
    "model": "gemma4:27b",
    "base_url": "http://localhost:11434/v1",
    "api_key": "ollama"  # Ollama doesn't need a real key
}]

assistant = AssistantAgent(
    name="analyst",
    llm_config={"config_list": config_list}
)

user_proxy = UserProxyAgent(
    name="user",
    human_input_mode="NEVER",
    # use_docker=False runs generated code directly; the default expects Docker
    code_execution_config={"work_dir": "workspace", "use_docker": False}
)

user_proxy.initiate_chat(
    assistant,
    message="Write a Python script that analyzes CSV sales data and creates a chart."
)
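"OpenAI-compatible" here just means AutoGen posts standard chat-completions JSON to Ollama's `/v1` endpoint. A sketch of the request shape involved (field names follow the OpenAI chat API; the payload values are illustrative):

```python
import json

def chat_request(model, messages, base_url="http://localhost:11434/v1"):
    """Build an OpenAI-style chat-completions request body for Ollama's /v1 endpoint."""
    return {
        "url": f"{base_url}/chat/completions",
        "body": json.dumps({"model": model, "messages": messages}),
    }

req = chat_request("gemma4:27b", [{"role": "user", "content": "Hello"}])
print(req["url"])  # http://localhost:11434/v1/chat/completions
```

Because the shape matches, any client that speaks the OpenAI API, AutoGen included, can point its `base_url` at Ollama; the `api_key` field is required by the client but ignored by the server.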

LangGraph

LangGraph from LangChain provides the most control over agent behavior through a graph-based workflow system. You define nodes (actions) and edges (transitions) to create complex, stateful agent pipelines:

pip install langgraph langchain-ollama

from langgraph.graph import StateGraph, END
from langchain_ollama import ChatOllama
from langchain_core.messages import HumanMessage
from typing import TypedDict

llm = ChatOllama(model="qwen3.5:35b")

class AgentState(TypedDict):
    messages: list

def research_node(state):
    # One research step: ask the model to make progress on the task
    response = llm.invoke(state["messages"])
    return {"messages": state["messages"] + [response]}

def decide_next(state):
    # Loop until the model signals completion with "DONE"
    last_message = state["messages"][-1].content
    if "DONE" in last_message:
        return "end"
    return "research"

graph = StateGraph(AgentState)
graph.add_node("research", research_node)
graph.add_conditional_edges("research", decide_next, {
    "research": "research",
    "end": END
})
graph.set_entry_point("research")
app = graph.compile()

result = app.invoke(
    {"messages": [HumanMessage("Research the topic. Reply DONE when finished.")]},
    config={"recursion_limit": 10}  # safety cap on loop iterations
)

Which Models Work Best for Agents

Agent performance depends heavily on the underlying model's ability to follow instructions, use tools reliably, and reason through multi-step problems. According to LLMCheck benchmarks:[LLMCheck]

| Model | Agent Strength | RAM | Speed (M4 Max) | Best For |
|---|---|---|---|---|
| Gemma 4 26B-A4B | Function calling (92%) | ~16 GB | 45 tok/s | Tool-heavy agents |
| Qwen 3.5 35B | Reasoning depth | ~22 GB | 32 tok/s | Research/analysis agents |
| Qwen 3.5 9B | Good all-round | ~6 GB | 62 tok/s | Fast agents on 16 GB Macs |
| Llama 4 Scout | Balanced | ~6 GB | 58 tok/s | General-purpose agents |

Key insight: For agents, function calling accuracy matters more than raw speed. A model that correctly identifies and formats tool calls 92% of the time (Gemma 4) produces far better agent behavior than a faster model that fails 20% of tool calls.
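The gap compounds because an agent chaining several tool calls must get every one right: with per-call accuracy p, roughly p ** n of n-step chains survive intact. A quick illustration (the 92% and 80% figures echo the discussion above; the 5-step chain length is an assumption):

```python
def chain_success(per_call_accuracy, n_calls):
    """Probability that every tool call in an n-step chain is well-formed,
    assuming failures are independent."""
    return per_call_accuracy ** n_calls

for p in (0.92, 0.80):
    print(f"{p:.0%} per call -> {chain_success(p, 5):.0%} over 5 calls")
# 92% per call -> 66% over 5 calls
# 80% per call -> 33% over 5 calls
```

A 12-point gap per call becomes a 2x gap in whole-chain reliability at five steps, which is why tool-call accuracy dominates agent quality.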

Example: Build a Research Agent

Here is a complete, working research agent that searches local documents and produces a summary. It uses CrewAI with Ollama, which LLMCheck calls the most accessible starting point for local agents:[LLMCheck]

from crewai import Agent, Task, Crew
from crewai_tools import DirectoryReadTool, FileReadTool
from langchain_ollama import ChatOllama

# Connect to local Ollama
llm = ChatOllama(model="gemma4:27b", base_url="http://localhost:11434")

# Define tools for reading local files
docs_tool = DirectoryReadTool(directory="./research_docs")
file_tool = FileReadTool()

# Research agent
researcher = Agent(
    role="Research Analyst",
    goal="Thoroughly analyze documents and extract key findings",
    backstory="""You are an expert research analyst. You read documents
    carefully, identify key themes, and produce clear summaries.""",
    tools=[docs_tool, file_tool],
    llm=llm,
    verbose=True
)

# Summary agent
writer = Agent(
    role="Technical Writer",
    goal="Produce clear, well-structured summaries",
    backstory="""You take research findings and create polished
    summaries with key insights and recommendations.""",
    llm=llm,
    verbose=True
)

# Tasks
research_task = Task(
    description="""Read all documents in the research folder.
    Identify the top 5 key findings across all documents.
    Note any contradictions or gaps in the research.""",
    expected_output="A detailed list of findings with supporting quotes.",
    agent=researcher
)

summary_task = Task(
    description="""Using the research findings, create an executive
    summary with: 1) Key findings, 2) Implications, 3) Recommendations.""",
    expected_output="A polished executive summary in markdown format.",
    agent=writer,
    context=[research_task]
)

# Run the crew
crew = Crew(
    agents=[researcher, writer],
    tasks=[research_task, summary_task],
    verbose=True
)
result = crew.kickoff()
print(result)

This agent will autonomously read every document in the research folder, identify key themes, and produce a structured executive summary — all running on your Mac with no internet connection.

CrewAI vs AutoGen vs LangGraph

| Feature | CrewAI | AutoGen | LangGraph |
|---|---|---|---|
| Learning curve | Easy | Moderate | Steep |
| Multi-agent support | Role-based crews | Conversation-based | Graph-based |
| Tool integration | Built-in tool library | Function decorators | LangChain ecosystem |
| Ollama support | Native | Via OpenAI compat | Native |
| Code execution | Limited | Built-in sandbox | Custom nodes |
| Workflow control | Sequential/parallel | Conversation flow | Full graph control |
| State management | Basic | Conversation history | Custom state schemas |
| Best use case | Simple agent teams | Code gen + debates | Complex pipelines |
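As a rule of thumb, the comparison reduces to one question: which requirement matters most? A sketch of that decision (the keyword arguments are illustrative labels, not any framework's API):

```python
def pick_framework(needs_code_execution=False, needs_custom_state=False):
    """Rule-of-thumb mapping of the comparison to a recommendation."""
    if needs_custom_state:       # complex, stateful pipelines -> full graph control
        return "LangGraph"
    if needs_code_execution:     # built-in sandboxed code execution
        return "AutoGen"
    return "CrewAI"              # easiest start for simple agent teams

print(pick_framework(needs_code_execution=True))  # AutoGen
```

When both flags apply, custom state wins here because LangGraph can run code through custom nodes, while CrewAI's state management cannot be retrofitted as easily.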

Privacy and the On-Device Advantage

When you run agents through cloud APIs, every single step is visible to the provider — your tool definitions, your document contents, your database queries, the agent's reasoning chain. With local agents:

- Tool definitions, document contents, and queries never leave your machine
- The agent's full reasoning chain stays private
- Workflows keep running with no internet connection at all

According to LLMCheck, this makes local agents ideal for: legal document analysis, medical record processing, financial data review, proprietary code analysis, and any workflow involving sensitive business data.

RAM Requirements Per Framework

| Setup | LLM RAM | Framework Overhead | Total | Min Mac |
|---|---|---|---|---|
| CrewAI + Qwen 3.5 9B | ~6 GB | ~300 MB | ~6.3 GB | 16 GB Mac |
| AutoGen + Qwen 3.5 9B | ~6 GB | ~500 MB | ~6.5 GB | 16 GB Mac |
| LangGraph + Qwen 3.5 9B | ~6 GB | ~400 MB | ~6.4 GB | 16 GB Mac |
| CrewAI + Gemma 4 27B | ~16 GB | ~300 MB | ~16.3 GB | 24 GB Mac |
| CrewAI + Qwen 3.5 35B | ~22 GB | ~300 MB | ~22.3 GB | 36 GB Mac |

Note: Multi-agent setups in all three frameworks reuse the same Ollama model instance. Running a crew of 3 agents does not require 3x the RAM — they share the single loaded model and take turns generating responses.
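Because agents share one loaded model, total RAM is just model size plus a small per-framework overhead, independent of agent count. The arithmetic behind the table, as a sketch (overhead figures taken from the rows above):

```python
def total_ram_gb(model_gb, framework_overhead_gb, n_agents=1):
    """Estimate agent-setup RAM. Agents share the single Ollama model
    instance, so n_agents does not multiply the model's footprint."""
    return model_gb + framework_overhead_gb

# CrewAI + Qwen 3.5 9B with a 3-agent crew: still ~6.3 GB, not ~18 GB
print(total_ram_gb(6.0, 0.3, n_agents=3))  # 6.3
```

The `n_agents` parameter exists only to make the point: it never enters the calculation, because the agents take turns generating on the same loaded weights.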