Why Qwen 3.6 Is the New #1
The LLMCheck Score combines capability, speed, accessibility, and license openness into a single 0-100 score. Qwen 3.6 35B-A3B earns a 69, overtaking the previous leader Gemma 4 26B-A4B at 67. The two-point gain comes almost entirely from capability: this model's coding benchmarks are in a different league from anything else you can run locally.
The headline number is 73.4% on SWE-bench Verified -- a benchmark that measures a model's ability to resolve real GitHub issues from popular open-source projects. For context, Gemma 4 31B scores 52.1% and the previous Qwen 3 30B-A3B managed 49.8%. Qwen 3.6 35B-A3B does not just beat the local competition; it narrows the gap to cloud-only frontier models that cost hundreds of dollars per month to access.
Simon Willison noted that Qwen 3.6 even beat Claude Opus at certain drawing and visual generation tasks, suggesting the model's capabilities extend well beyond code completion into creative and spatial reasoning.
Key takeaway: Qwen 3.6 35B-A3B is the first local model to cross 70% on SWE-bench Verified. If you write code on a Mac and want AI assistance without cloud dependency, this is the model to install right now.
Architecture & Specs
Qwen 3.6 35B-A3B uses a mixture-of-experts (MoE) architecture with 35 billion total parameters but only 3 billion active per forward pass. This design gives it the knowledge capacity of a much larger model while keeping inference costs comparable to a small dense model. Here are the key specs:
- Total parameters: 35B (MoE)
- Active parameters: 3B per token
- Context window: 262K tokens, extendable to 1M with YaRN scaling
- License: Apache 2.0 (fully open, commercial use allowed)
- RAM at Q4: ~20 GB (fits on 24 GB Macs)
- Thinking mode: Toggle between fast generation and deep chain-of-thought reasoning
- Training data: Multilingual, with strong English and Chinese coverage
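The "3B active per token" figure comes from sparse expert routing: a small gate scores every expert and only the top-k actually run for each token, so compute scales with the active subset rather than all 35B parameters. A toy sketch of that gating step (illustrative only; the expert count and k below are made up, not Qwen's actual configuration):

```python
import math
import random

def topk_gate(logits, k=2):
    """Keep only the k best-scoring experts; softmax their weights.

    logits: one router score per expert.
    Returns (indices of the k experts that run, their mixing weights).
    """
    topk = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    m = max(logits[i] for i in topk)            # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in topk]
    total = sum(exps)
    return topk, [e / total for e in exps]

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(8)]  # router scores for 8 toy experts
idx, w = topk_gate(scores, k=2)
# Only 2 of the 8 experts fire for this token; their weights sum to 1.
```

Every token can pick a different expert subset, which is why the full 35B must still be held in memory even though each forward pass only computes with a fraction of it.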
The 262K native context is already generous, but the architecture supports extension to 1M tokens using YaRN positional encoding. This makes Qwen 3.6 35B-A3B viable for entire-codebase analysis, long document processing, and multi-file code review workflows that would choke smaller-context models.
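The required YaRN stretch factor is just the ratio of target to native window. A quick sketch, with the `rope_scaling` override written in the HF-style shape other Qwen releases have used (the field names are an assumption here; check the official model card before relying on them):

```python
# YaRN scaling factor needed to stretch the 262,144-token native window toward 1M.
native_ctx = 262_144
target_ctx = 1_000_000
factor = target_ctx / native_ctx        # ~3.81, conventionally rounded up to 4.0

# Shape of an HF-style rope_scaling override (field names follow prior
# Qwen releases; treat them as assumptions, not this model's documented config):
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": native_ctx,
}
```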
Benchmark Results
According to LLMCheck benchmarks, here is how Qwen 3.6 35B-A3B stacks up against the top local models on Apple Silicon:
| Metric | Qwen 3.6 35B-A3B | Gemma 4 26B-A4B | Gemma 4 31B | Qwen 3 30B-A3B |
|---|---|---|---|---|
| LLMCheck Score | 69 | 67 | 64 | 58 |
| SWE-bench Verified | 73.4% | 52.1% | 52.1% | 49.8% |
| HumanEval | 92.1% | 72.0% | 78.5% | 85.4% |
| MMLU | 82.6% | 78.4% | 83.2% | 79.5% |
| RAM (Q4) | ~20 GB | ~18 GB | ~20 GB | ~20 GB |
| Speed (M4 Max) | ~44 tok/s | ~48 tok/s | ~24 tok/s | ~45 tok/s |
| Context | 262K (1M ext.) | 256K | 256K | 128K |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 |
The coding gap is massive: Qwen 3.6 35B-A3B scores 21 percentage points higher than Gemma 4 26B-A4B on SWE-bench Verified and 20 points higher on HumanEval. For developers, this translates to noticeably better code generation, bug fixing, and refactoring suggestions.
How to Run It on Mac
Getting Qwen 3.6 35B-A3B running on your Mac takes about two minutes. You need at least 24 GB of unified memory.
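The 24 GB floor follows from simple arithmetic: even though only 3B parameters are active per token, all 35B expert weights must be resident in memory. A back-of-envelope sketch (the effective bits-per-weight figure for a Q4-class quant is an assumption):

```python
params = 35e9               # total parameters -- every expert is stored in RAM
bits_per_weight = 4.5       # rough effective rate for a Q4-class quant (assumption)

weights_gb = params * bits_per_weight / 8 / 1e9
# ~19.7 GB for the weights alone, before KV cache and runtime overhead --
# which is why 24 GB of unified memory is the practical minimum.
```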
Option 1: Ollama (Easiest)
If you already have Ollama installed, a single command pulls and runs the model:
```shell
ollama run qwen3.6:35b-a3b
```

This downloads the Q4_K_M quantized version (~20 GB). To use thinking mode for harder problems:
```shell
ollama run qwen3.6:35b-a3b "Think step by step: how would you refactor this function to reduce cyclomatic complexity?"
```

Option 2: MLX (Fastest on Apple Silicon)
MLX is Apple's machine learning framework optimized for unified memory. It typically delivers 10-20% faster inference than Ollama on the same hardware:
```shell
pip install mlx-lm
mlx_lm.generate --model mlx-community/Qwen3.6-35B-A3B-4bit --prompt "Write a Python function that..."
```

For interactive chat sessions with MLX:
```shell
mlx_lm.chat --model mlx-community/Qwen3.6-35B-A3B-4bit
```

Option 3: LM Studio
Open LM Studio, search for "Qwen 3.6 35B A3B" in the model browser, and download the Q4_K_M GGUF variant. LM Studio provides a polished chat interface and an OpenAI-compatible API server for integration with other tools.
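To drive that OpenAI-compatible server from your own scripts, you send a standard chat-completions payload. A minimal sketch that only builds the request body, so it runs without a live server (LM Studio's default endpoint is http://localhost:1234/v1; the model id below is a guess, so copy the exact id LM Studio displays after the download):

```python
import json

# Chat-completions payload for LM Studio's OpenAI-compatible endpoint.
# The model id is an assumption -- use the one shown in LM Studio's UI.
payload = {
    "model": "qwen3.6-35b-a3b",
    "messages": [
        {"role": "user", "content": "Refactor this function to reduce nesting: ..."},
    ],
    "temperature": 0.2,
}
body = json.dumps(payload)

# POST `body` with any HTTP client once the server is running, e.g.:
#   curl http://localhost:1234/v1/chat/completions \
#        -H "Content-Type: application/json" -d "$body"
```

Because the API shape matches OpenAI's, most existing OpenAI client libraries work against it by pointing their base URL at the local server.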
Apple Silicon Performance
According to LLMCheck estimates, here is the expected token generation speed for Qwen 3.6 35B-A3B across recent Apple Silicon chips using MLX with Q4 quantization:
| Chip | RAM | Est. tok/s | Usable? |
|---|---|---|---|
| M4 Pro (24 GB) | 24 GB | ~32 | Yes (tight fit) |
| M4 Pro (48 GB) | 48 GB | ~34 | Yes (comfortable) |
| M4 Max (48 GB) | 48 GB | ~44 | Yes (fast) |
| M4 Max (128 GB) | 128 GB | ~46 | Yes (fast) |
| M5 Max (48 GB) | 48 GB | ~52 | Yes (excellent) |
On a 24 GB M4 Pro, the model uses roughly 20 GB of your 24 GB unified memory, leaving about 4 GB for the OS and other apps. It works, but you will want to close memory-heavy applications. On 48 GB or higher machines, the model runs with plenty of headroom for long context windows and concurrent tasks.
Sweet spot: The M4 Max with 48 GB delivers the best balance of speed (~44 tok/s) and headroom for the 262K context window. On M5 Max, expect roughly 52 tok/s -- fast enough for real-time pair programming.
The Verdict
Qwen 3.6 35B-A3B is a generational leap for local coding AI on Mac. Its 73.4% SWE-bench Verified score makes it the first locally-runnable model to genuinely compete with cloud APIs on real-world software engineering tasks. The MoE architecture keeps it fast and RAM-efficient enough to run on a 24 GB Mac, and the Apache 2.0 license means zero restrictions on commercial use.
Why it is #1
- 73.4% SWE-bench Verified and 92.1% HumanEval
- LLMCheck Score 69
- 262K context (1M extended)
- Apache 2.0 license
- Only ~20 GB RAM at Q4
- Thinking mode toggle for deep reasoning
Where Gemma 4 still wins
- Native multimodal input (images + audio)
- Built-in function calling for agentic workflows
- Arena #6 overall chat ranking
- Slightly lower RAM at the MoE tier (~18 GB)
- Better for non-coding general-purpose tasks
According to LLMCheck, developers who primarily use local AI for code generation, debugging, and refactoring should switch to Qwen 3.6 35B-A3B immediately. For general-purpose chat, multimodal tasks, and function calling workflows, Gemma 4 remains the stronger choice. The ideal setup for power users is both: Qwen 3.6 for coding, Gemma 4 for everything else.