Why Qwen 3.6 Is the New #1
The LLMCheck Score combines capability, speed, accessibility, and license openness into a single 0-100 score. Qwen 3.6 35B-A3B earns a 69, overtaking the previous leader Gemma 4 26B-A4B at 67. The two-point gain comes almost entirely from capability: this model's coding benchmarks are in a different league from anything else you can run locally.
The headline number is 73.4% on SWE-bench Verified -- a benchmark that measures a model's ability to resolve real GitHub issues from popular open-source projects. For context, Gemma 4 31B scores 52.1% and the previous Qwen 3 30B-A3B managed 49.8%. Qwen 3.6 35B-A3B does not just beat the local competition; it narrows the gap to cloud-only frontier models that cost hundreds of dollars per month to access.
Simon Willison noted that Qwen 3.6 even beat Claude Opus at certain drawing and visual generation tasks, suggesting the model's capabilities extend well beyond code completion into creative and spatial reasoning.
Key takeaway: Qwen 3.6 35B-A3B is the first local model to cross 70% on SWE-bench Verified. If you write code on a Mac and want AI assistance without cloud dependency, this is the model to install right now.
Architecture & Specs
Qwen 3.6 35B-A3B uses a mixture-of-experts (MoE) architecture with 35 billion total parameters but only 3 billion active per forward pass. This design gives it the knowledge capacity of a much larger model while keeping inference costs comparable to a small dense model. Here are the key specs:
- Total parameters: 35B (MoE)
- Active parameters: 3B per token
- Context window: 262K tokens, extendable to 1M with YaRN scaling
- License: Apache 2.0 (fully open, commercial use allowed)
- RAM at Q4: ~20 GB (fits on 24 GB Macs)
- Thinking mode: Toggle between fast generation and deep chain-of-thought reasoning
- Training data: Multilingual, with strong English and Chinese coverage
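The "3B active per token" figure comes from sparse expert routing: a small gate scores every expert and only the top-k actually run for each token, so compute scales with the active subset rather than all 35B parameters. A toy sketch of that gating step (illustrative only; the expert count and k below are made up, not Qwen's actual configuration):

```python
import math
import random

def topk_gate(logits, k=2):
    """Keep only the k best-scoring experts; softmax their weights.

    logits: one router score per expert.
    Returns (indices of the k experts that run, their mixing weights).
    """
    topk = sorted(range(len(logits)), key=lambda i: logits[i])[-k:]
    m = max(logits[i] for i in topk)            # subtract max for stability
    exps = [math.exp(logits[i] - m) for i in topk]
    total = sum(exps)
    return topk, [e / total for e in exps]

random.seed(0)
scores = [random.gauss(0, 1) for _ in range(8)]  # router scores for 8 toy experts
idx, w = topk_gate(scores, k=2)
# Only 2 of the 8 experts fire for this token; their weights sum to 1.
```

Every token can pick a different expert subset, which is why the full 35B must still be held in memory even though each forward pass only computes with a fraction of it.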
The 262K native context is already generous, but the architecture supports extension to 1M tokens using YaRN positional encoding. This makes Qwen 3.6 35B-A3B viable for entire-codebase analysis, long document processing, and multi-file code review workflows that would choke smaller-context models.
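The required YaRN stretch factor is just the ratio of target to native window. A quick sketch, with the `rope_scaling` override written in the HF-style shape other Qwen releases have used (the field names are an assumption here; check the official model card before relying on them):

```python
# YaRN scaling factor needed to stretch the 262,144-token native window toward 1M.
native_ctx = 262_144
target_ctx = 1_000_000
factor = target_ctx / native_ctx        # ~3.81, conventionally rounded up to 4.0

# Shape of an HF-style rope_scaling override (field names follow prior
# Qwen releases; treat them as assumptions, not this model's documented config):
rope_scaling = {
    "rope_type": "yarn",
    "factor": 4.0,
    "original_max_position_embeddings": native_ctx,
}
```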
Benchmark Results
According to LLMCheck benchmarks, here is how Qwen 3.6 35B-A3B stacks up against the top local models on Apple Silicon:
| Metric | Qwen 3.6 35B-A3B | Gemma 4 26B-A4B | Gemma 4 31B | Qwen 3 30B-A3B |
|---|---|---|---|---|
| LLMCheck Score | 69 | 67 | 64 | 58 |
| SWE-bench Verified | 73.4% | 52.1% | 52.1% | 49.8% |
| HumanEval | 92.1% | 72.0% | 78.5% | 85.4% |
| MMLU | 82.6% | 78.4% | 83.2% | 79.5% |
| RAM (Q4) | ~20 GB | ~18 GB | ~20 GB | ~20 GB |
| Speed (M4 Max) | ~44 tok/s | ~48 tok/s | ~24 tok/s | ~45 tok/s |
| Context | 262K (1M ext.) | 256K | 256K | 128K |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 | Apache 2.0 |
The coding gap is massive: Qwen 3.6 35B-A3B scores 21 percentage points higher than Gemma 4 26B-A4B on SWE-bench Verified and 20 points higher on HumanEval. For developers, this translates to noticeably better code generation, bug fixing, and refactoring suggestions.
How to Run It on Mac
Getting Qwen 3.6 35B-A3B running on your Mac takes about two minutes. You need at least 24 GB of unified memory.
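The 24 GB floor follows from simple arithmetic: even though only 3B parameters are active per token, all 35B expert weights must be resident in memory. A back-of-envelope sketch (the effective bits-per-weight figure for a Q4-class quant is an assumption):

```python
params = 35e9               # total parameters -- every expert is stored in RAM
bits_per_weight = 4.5       # rough effective rate for a Q4-class quant (assumption)

weights_gb = params * bits_per_weight / 8 / 1e9
# ~19.7 GB for the weights alone, before KV cache and runtime overhead --
# which is why 24 GB of unified memory is the practical minimum.
```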
Option 1: Ollama (Easiest)
If you already have Ollama installed, a single command pulls and runs the model:
```shell
ollama run qwen3.6:35b-a3b
```

This downloads the Q4_K_M quantized version (~20 GB). To use thinking mode for harder problems:
```shell
ollama run qwen3.6:35b-a3b "Think step by step: how would you refactor this function to reduce cyclomatic complexity?"
```

Option 2: MLX (Fastest on Apple Silicon)
MLX is Apple's machine learning framework optimized for unified memory. It typically delivers 10-20% faster inference than Ollama on the same hardware:
```shell
pip install mlx-lm
mlx_lm.generate --model mlx-community/Qwen3.6-35B-A3B-4bit --prompt "Write a Python function that..."
```

For interactive chat sessions with MLX:
```shell
mlx_lm.chat --model mlx-community/Qwen3.6-35B-A3B-4bit
```

Option 3: LM Studio
Open LM Studio, search for "Qwen 3.6 35B A3B" in the model browser, and download the Q4_K_M GGUF variant. LM Studio provides a polished chat interface and an OpenAI-compatible API server for integration with other tools.
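To drive that OpenAI-compatible server from your own scripts, you send a standard chat-completions payload. A minimal sketch that only builds the request body, so it runs without a live server (LM Studio's default endpoint is http://localhost:1234/v1; the model id below is a guess, so copy the exact id LM Studio displays after the download):

```python
import json

# Chat-completions payload for LM Studio's OpenAI-compatible endpoint.
# The model id is an assumption -- use the one shown in LM Studio's UI.
payload = {
    "model": "qwen3.6-35b-a3b",
    "messages": [
        {"role": "user", "content": "Refactor this function to reduce nesting: ..."},
    ],
    "temperature": 0.2,
}
body = json.dumps(payload)

# POST `body` with any HTTP client once the server is running, e.g.:
#   curl http://localhost:1234/v1/chat/completions \
#        -H "Content-Type: application/json" -d "$body"
```

Because the API shape matches OpenAI's, most existing OpenAI client libraries work against it by pointing their base URL at the local server.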
Apple Silicon Performance
According to LLMCheck estimates, here is the expected token generation speed for Qwen 3.6 35B-A3B across recent Apple Silicon chips using MLX with Q4 quantization:
| Chip | RAM | Est. tok/s | Usable? |
|---|---|---|---|
| M4 Pro (24 GB) | 24 GB | ~32 | Yes (tight fit) |
| M4 Pro (48 GB) | 48 GB | ~34 | Yes (comfortable) |
| M4 Max (48 GB) | 48 GB | ~44 | Yes (fast) |
| M4 Max (128 GB) | 128 GB | ~46 | Yes (fast) |
| M5 Max (48 GB) | 48 GB | ~52 | Yes (excellent) |
On a 24 GB M4 Pro, the model uses roughly 20 GB of your 24 GB unified memory, leaving about 4 GB for the OS and other apps. It works, but you will want to close memory-heavy applications. On 48 GB or higher machines, the model runs with plenty of headroom for long context windows and concurrent tasks.
Sweet spot: The M4 Max with 48 GB delivers the best balance of speed (~44 tok/s) and headroom for the 262K context window. On M5 Max, expect roughly 52 tok/s -- fast enough for real-time pair programming.
The Verdict
Qwen 3.6 35B-A3B is a generational leap for local coding AI on Mac. Its 73.4% SWE-bench Verified score makes it the first locally-runnable model to genuinely compete with cloud APIs on real-world software engineering tasks. The MoE architecture keeps it fast and RAM-efficient enough to run on a 24 GB Mac, and the Apache 2.0 license means zero restrictions on commercial use.
Why it is #1
- 73.4% SWE-bench Verified and 92.1% HumanEval
- LLMCheck Score 69
- 262K context (1M extended)
- Apache 2.0 license
- Only ~20 GB RAM at Q4
- Thinking mode toggle for deep reasoning
Where Gemma 4 still wins
- Native multimodal input (images + audio)
- Built-in function calling for agentic workflows
- Arena #6 overall chat ranking
- Slightly lower RAM at the MoE tier (~18 GB)
- Better for non-coding general-purpose tasks
According to LLMCheck, developers who primarily use local AI for code generation, debugging, and refactoring should switch to Qwen 3.6 35B-A3B immediately. For general-purpose chat, multimodal tasks, and function calling workflows, Gemma 4 remains the stronger choice. The ideal setup for power users is both: Qwen 3.6 for coding, Gemma 4 for everything else.