What's the best local model for coding on Mac?

According to LLMCheck benchmarks, Qwen 4 Coder 32B-A3B is the best Mac-runnable open coding model, scoring 82% on SWE-Verified. Its mixture-of-experts design (32B total / 3B active) means it fits in ~18 GB on a 24 GB Mac while running at roughly 58 tok/s on an M4 Pro.

Can I use Cursor with a local model?

Yes. Cursor lets you override the model provider with a custom OpenAI-compatible base URL. Point it at Ollama's local endpoint at http://localhost:11434/v1 and select your Qwen 4 Coder model. Note that some Cursor features still route through its cloud, so Continue.dev or Zed are better choices for a fully offline setup.

Continue.dev vs Zed for local AI?

Continue.dev is a VS Code (and JetBrains) extension with the deepest local-model controls: autocomplete, chat, edit, and agent mode, all configurable per model. Zed is a fast, native Mac editor with AI built in and less setup. Choose Continue.dev if you live in VS Code; choose Zed if you want a lightweight, native editor with AI included.

How much RAM do I need for a local coding assistant?

For Qwen 4 Coder 32B-A3B you want a 24 GB Mac, since the model uses ~18 GB at Q4 plus headroom for your editor and context. On a 16 GB Mac, drop to a smaller coder model like Qwen 4 Coder 7B. On 32 GB+ you can raise the context window for whole-file and multi-file edits.

Is a local assistant as good as GitHub Copilot or GPT-5?

For autocomplete and most day-to-day coding, Qwen 4 Coder is competitive with Copilot and close to frontier cloud models, at 82% SWE-Verified. The biggest cloud models (GPT-5, DeepSeek V4 Pro) still lead on the hardest agentic tasks, but a local assistant wins on privacy, cost (free), and offline use — your code never leaves your Mac.

How to Build a Local AI Coding Assistant on Mac (2026) — Qwen 4 Coder + Continue.dev

You can build a private, offline AI coding assistant on your Mac — autocomplete, inline chat, and agent-style edits — with zero cloud, zero subscription, and zero code leaving your machine. The stack is simple: Ollama serving Qwen 4 Coder locally, plus an editor integration like Continue.dev, Zed, or Cursor. This guide walks you through the whole thing.

The stack at a glance

A local coding assistant has two pieces: a model server that runs the LLM on your hardware, and an editor integration that feeds it your code and shows you suggestions. We use Ollama for the server and Qwen 4 Coder 32B-A3B for the model — Alibaba's Apache 2.0 coding model that, according to LLMCheck benchmarks, hits 82% on SWE-Verified, the best of any open model you can run on a Mac.[1]

Like Qwen 4.1, it is a mixture-of-experts model: 32B total parameters, 3B active per token. That is what lets a near-frontier coder fit in ~18 GB and run at ~58 tok/s on an M4 Pro. For the editor, you have three good choices, and this guide covers all of them: Continue.dev (VS Code), Zed (native Mac editor with built-in AI), and Cursor (pointed at a custom endpoint).

Editor	Best for	Offline?
Continue.dev	Deepest local-model control in VS Code	Fully local
Zed	Fast native editor, least setup	Fully local
Cursor	Already a Cursor user	Mostly (some cloud features)
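The ~18 GB figure quoted above can be sanity-checked with simple arithmetic. The sketch below assumes roughly 4.5 bits per weight, which is what a Q4_0-style block format works out to; the guide only says "Q4", so the exact variant and the resulting size are assumptions rather than measured values. It also shows why the mixture-of-experts split matters: all 32B parameters have to sit in memory, but only about 3B are read for each token.

```python
# Back-of-the-envelope memory estimate for a quantized mixture-of-experts model.
# ASSUMPTION: ~4.5 bits per weight (a Q4_0-style block format). The actual
# qwen4-coder build may use a different Q4 variant with a different size.


def weights_gb(params: float, bits_per_weight: float = 4.5) -> float:
    """Size of the weights in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


TOTAL_PARAMS = 32e9   # every expert has to be resident in memory
ACTIVE_PARAMS = 3e9   # only these weights are read for each generated token

resident = weights_gb(TOTAL_PARAMS)
per_token = weights_gb(ACTIVE_PARAMS)

print(f"resident weights:       ~{resident:.0f} GB")
print(f"weights read per token: ~{per_token:.1f} GB")
```

Under that assumption the resident weights come to about 18 GB, in line with the figure above. The context cache and your editor need memory on top of that, which is why the hardware check below recommends 24 GB.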

Step 1: Hardware Check

Qwen 4 Coder 32B-A3B needs about 18 GB of unified memory at Q4, plus headroom for your editor and the code context you feed it. That makes a 24 GB Mac the recommended minimum. Check yours under Apple menu → About This Mac → Memory.

24 GB+ — ideal. Run the full 32B-A3B model.
16 GB — drop to a smaller coder such as qwen4-coder:7b, which fits comfortably and is still strong for autocomplete and routine edits.
32 GB+ — raise the context window for whole-file and multi-file work.
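If you prefer to script the check, the tiers above can be sketched in a few lines of Python. The function names are illustrative, not part of any tool. `installed_memory_gb` relies on the macOS `sysctl -n hw.memsize` command, which reports installed memory in bytes; the guide does not name a model for Macs below 16 GB, so the sketch raises an error there instead of guessing.

```python
import subprocess

GIB = 1024 ** 3


def installed_memory_gb() -> float:
    """Unified memory in GiB, read from macOS `sysctl -n hw.memsize` (bytes)."""
    result = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip()) / GIB


def pick_model(memory_gb: float) -> str:
    """Map installed memory to the model tiers listed in this guide."""
    if memory_gb >= 24:
        return "qwen4-coder"        # full 32B-A3B build
    if memory_gb >= 16:
        return "qwen4-coder:7b"     # smaller coder for 16 GB Macs
    raise ValueError("below 16 GB: this guide does not name a model for that tier")


def can_raise_context(memory_gb: float) -> bool:
    """32 GB and up leaves room for a larger context window."""
    return memory_gb >= 32
```

On a Mac, `pick_model(installed_memory_gb())` returns the tag to pull in Step 2.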

Buying a Mac for local coding? A 24–32 GB M4 Pro or M5 is the value sweet spot. Our Mac hardware buying hub ranks configurations by real-world tok/s per dollar so you don't overspend on memory you won't use — or under-buy and stall.

Step 2: Install Ollama & Pull Qwen 4 Coder

Install Ollama from ollama.com (full walkthrough in our Install Ollama on Mac guide), then pull the coding model:

ollama pull qwen4-coder

This downloads the Q4 build (~18 GB). To confirm it is ready and check that Ollama's local server is serving it:

ollama list          # should show qwen4-coder
ollama serve         # only if the API is not already up; starts it at http://localhost:11434

Ollama normally runs its server automatically in the background, so ollama serve is only needed if the API isn't already up. You can do a quick smoke test:

curl http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model":"qwen4-coder","messages":[{"role":"user","content":"Write a Python one-liner to flatten a nested list."}]}'
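The same smoke test can be run from Python using only the standard library, which is handy if you want to script checks later. This is a minimal sketch: the helper names are made up for illustration, and it assumes the server answers in the usual OpenAI chat-completions shape, with the text under `choices[0].message.content`.

```python
import json
import urllib.request

OLLAMA_OPENAI_URL = "http://localhost:11434/v1/chat/completions"


def build_chat_request(prompt: str, model: str = "qwen4-coder") -> urllib.request.Request:
    """Build the same POST request as the curl command above."""
    body = {"model": model, "messages": [{"role": "user", "content": prompt}]}
    return urllib.request.Request(
        OLLAMA_OPENAI_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response: dict) -> str:
    """OpenAI-style responses carry the text in choices[0].message.content."""
    return response["choices"][0]["message"]["content"]


def ask(prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the reply text."""
    with urllib.request.urlopen(build_chat_request(prompt), timeout=timeout) as resp:
        return extract_reply(json.load(resp))
```

With Ollama running, `print(ask("Write a Python one-liner to flatten a nested list."))` should print the model's answer. A connection error here usually means the server is not up.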

Step 3: Install Continue.dev (or Zed / Cursor)

Pick the editor that fits how you work. The rest of this guide uses Continue.dev as the primary path because it offers the most control, with Zed and Cursor notes alongside.

Continue.dev (VS Code)

Open VS Code, go to the Extensions panel (Cmd+Shift+X), search for Continue, and click Install. A Continue icon appears in your sidebar. It also works in JetBrains IDEs via their plugin marketplace.

Zed

Download Zed from zed.dev. AI is built in — no extension needed. You'll configure its Ollama provider in the next step.

Cursor

If you already use Cursor, you can keep it and point it at your local model. Open Settings → Models, enable a custom OpenAI-compatible base URL, and you'll wire it up in Step 4.

Step 4: Point the Extension at Your Local Model

Continue.dev config

Open the Continue config (click the gear in the Continue sidebar, or edit ~/.continue/config.yaml) and add Qwen 4 Coder as both your chat model and your autocomplete model:

models:
  - name: Qwen 4 Coder (local)
    provider: ollama
    model: qwen4-coder
    apiBase: http://localhost:11434
    roles:
      - chat
      - edit
      - apply

  - name: Qwen 4 Coder Autocomplete
    provider: ollama
    model: qwen4-coder
    apiBase: http://localhost:11434
    roles:
      - autocomplete

Save the file. Continue picks up the change immediately — you'll see "Qwen 4 Coder (local)" in the model dropdown.

Zed config

In Zed, open settings.json (Cmd+,) and register Ollama as a language-model provider:

{
  "language_models": {
    "ollama": {
      "api_url": "http://localhost:11434"
    }
  },
  "assistant": {
    "default_model": {
      "provider": "ollama",
      "model": "qwen4-coder"
    }
  }
}

Cursor config

In Cursor's Settings → Models, add a custom model with the OpenAI-compatible base URL pointing at Ollama:

Base URL:  http://localhost:11434/v1
API Key:   ollama        (any non-empty string works locally)
Model:     qwen4-coder

Heads-up on Cursor: some Cursor features (like its tab autocomplete and indexing) still route through Cursor's cloud even with a custom model. For a guaranteed fully-offline assistant, prefer Continue.dev or Zed.
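Before pasting those three values into Cursor, you can confirm that the endpoint accepts them. The sketch below queries the OpenAI-compatible model listing at `/v1/models` with the same base URL and placeholder key. The helper names are illustrative, and it assumes the listing reports untagged models with a `:latest` suffix, which may differ on your install.

```python
import json
import urllib.request

BASE_URL = "http://localhost:11434/v1"
API_KEY = "ollama"       # placeholder; any non-empty string works locally
MODEL = "qwen4-coder"


def models_request(base_url: str = BASE_URL, api_key: str = API_KEY) -> urllib.request.Request:
    """GET <base_url>/models with the bearer header an OpenAI-style client sends."""
    return urllib.request.Request(
        base_url.rstrip("/") + "/models",
        headers={"Authorization": f"Bearer {api_key}"},
    )


def model_is_listed(listing: dict, model: str = MODEL) -> bool:
    """True if the listing contains the model, with or without a :latest tag."""
    ids = [entry["id"] for entry in listing.get("data", [])]
    # ASSUMPTION: an untagged name resolves to the ":latest" tag.
    wanted = model if ":" in model else model + ":latest"
    return any(model_id in (model, wanted) for model_id in ids)


def check_endpoint() -> bool:
    """Ask the local server for its models and look for ours."""
    with urllib.request.urlopen(models_request(), timeout=10) as resp:
        return model_is_listed(json.load(resp))
```

If `check_endpoint()` returns True, the base URL, key, and model name are consistent with what the local server exposes.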

Step 5: Use It — Autocomplete, Chat & Agent Mode

With the model wired in, you now have a full coding assistant running on-device. Three things to try:

Tab autocomplete — start typing a function and Qwen 4 Coder suggests the rest inline. Press Tab to accept. Great for boilerplate, tests, and repetitive patterns.
Inline chat & edit — select code and press Cmd+I (Continue) to ask for a refactor, a bug fix, or an explanation. The model rewrites the selection in place and shows a diff you can accept or reject.
Agent mode — in Continue's chat, switch to Agent and give a higher-level task ("add input validation to this endpoint and a test for it"). It reads relevant files, proposes multi-file changes, and applies them with your approval.

A good first prompt to feel it out, with a file open:

Refactor this function to use early returns and add a docstring.
Then write a pytest test that covers the edge cases.

Everything here runs through your local Ollama server — no network calls, no data sent to any vendor, and it keeps working on a plane or behind a firewall.

Tip: For autocomplete that stays snappy, some teams pair a small fast model (e.g. qwen4-coder:7b) for tab completion with the 32B model for chat and agent mode. Continue.dev assigns models per role, so this only requires changing the model field of the autocomplete entry from Step 4 to qwen4-coder:7b.
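To make the split concrete, here is a small Python sketch that prints a `models:` block in the same shape as the Step 4 config, with the 7B model on autocomplete and the 32B model on everything else. The generator functions are hypothetical helpers for illustration; Continue itself only needs the resulting YAML.

```python
def model_entry(name: str, model: str, roles: list[str],
                api_base: str = "http://localhost:11434") -> str:
    """Render one entry of the Continue `models:` list as YAML text."""
    lines = [
        f"  - name: {name}",
        "    provider: ollama",
        f"    model: {model}",
        f"    apiBase: {api_base}",
        "    roles:",
    ]
    lines += [f"      - {role}" for role in roles]
    return "\n".join(lines)


def continue_models_block(chat_model: str, autocomplete_model: str) -> str:
    """Big model for chat, edit and apply; small model for autocomplete."""
    entries = [
        model_entry("Qwen 4 Coder (local)", chat_model, ["chat", "edit", "apply"]),
        model_entry("Qwen 4 Coder Autocomplete", autocomplete_model, ["autocomplete"]),
    ]
    return "models:\n" + "\n\n".join(entries) + "\n"


print(continue_models_block("qwen4-coder", "qwen4-coder:7b"))
```

Paste the printed block into ~/.continue/config.yaml in place of the existing `models:` section. Remember to pull the smaller model first with `ollama pull qwen4-coder:7b`.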

Step 6: Tips & Scaling Beyond Your Mac

Give it more context on bigger Macs

The more of your codebase the model can see, the better its edits. On 32 GB+ Macs, raise the context window so it can hold larger files and more surrounding code. With Ollama you can bake a larger window into a custom model:

# Modelfile
FROM qwen4-coder
PARAMETER num_ctx 32768

ollama create qwen4-coder-32k -f Modelfile
# then reference qwen4-coder-32k in your editor config

On a 24 GB Mac, keep context moderate (8K–16K) so the model doesn't swap and slow down.
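If you call Ollama directly from your own scripts, you can also set the window per request instead of baking it into a custom model. Ollama's native `/api/chat` endpoint accepts an `options` object, and `num_ctx` is one of the parameters it takes. The sketch below picks a value from the guidance above. The helper names and the choice of 8192 as the starting point on smaller Macs are assumptions. This applies to your own API calls only; editors use whatever their own config or the model's Modelfile specifies.

```python
import json
import urllib.request

OLLAMA_CHAT_URL = "http://localhost:11434/api/chat"  # Ollama's native chat endpoint


def context_for_memory(memory_gb: float) -> int:
    """Pick a context size in tokens from the guidance in this section."""
    if memory_gb >= 32:
        return 32768
    # ASSUMPTION: start at the low end of the 8K-16K range on smaller Macs.
    return 8192


def chat_payload(prompt: str, memory_gb: float, model: str = "qwen4-coder") -> dict:
    """Request body that sets num_ctx for this call only."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,  # ask for one JSON object instead of a stream
        "options": {"num_ctx": context_for_memory(memory_gb)},
    }


def chat(prompt: str, memory_gb: float, timeout: float = 300.0) -> str:
    """Send the request to the local server and return the reply text."""
    request = urllib.request.Request(
        OLLAMA_CHAT_URL,
        data=json.dumps(chat_payload(prompt, memory_gb)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)["message"]["content"]
```

For example, `chat("Summarize this module.", memory_gb=32)` requests a 32K window for that single call.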

When you need a coder bigger than your Mac

Some 2026 frontier coders — DeepSeek V4 Pro, Kimi K3 — are simply too large for any Mac's unified memory. When a hard task outruns what Qwen 4 Coder can do locally, the practical option is to rent a GPU by the hour and run the big model there, keeping the same Ollama/OpenAI-compatible workflow — just point your editor at the remote endpoint instead of localhost.

A cost-effective place to do this is Vast.ai, a marketplace for on-demand GPUs where an H100 or another 80 GB card typically rents for a few dollars an hour, far cheaper than buying hardware for an occasional frontier-model task.

Disclosure: the Vast.ai link is a referral; if you sign up through it, LLMCheck may earn a small credit at no extra cost to you. We only recommend it because renting beats buying for occasional big-model jobs.
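In your own scripts, switching between the local server and a rented GPU can be as small as one environment variable. The sketch below is one way to do it; `CODER_REMOTE_URL` is a made-up variable name, and the remote address in the usage note is a documentation placeholder, not a real endpoint.

```python
import os

LOCAL_BASE_URL = "http://localhost:11434/v1"


def resolve_base_url(env=None) -> str:
    """Use the remote endpoint when one is configured, else the local server."""
    env = os.environ if env is None else env
    # ASSUMPTION: CODER_REMOTE_URL is a hypothetical name chosen for this sketch.
    remote = env.get("CODER_REMOTE_URL", "").strip()
    return remote.rstrip("/") if remote else LOCAL_BASE_URL


def chat_completions_url(env=None) -> str:
    """Full chat-completions URL for whichever endpoint is active."""
    return resolve_base_url(env) + "/chat/completions"
```

Setting `CODER_REMOTE_URL=http://203.0.113.7:11434/v1` before running a script sends requests to the rented machine; unsetting it brings everything back to localhost. In the editors, the equivalent change is the base URL field from Step 4.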

You're set. You now have a private coding assistant — autocomplete, chat, and agent edits — running entirely on your Mac, with a clear path to rent extra horsepower only when you actually need it.
