When Did Qwen 4 Release?
Qwen 4 released in June 2026. Alibaba launched the generation with three models at once — the flagship Qwen 4 32B-A3B, the coding-specialized Qwen 4 Coder 32B-A3B, and the lightweight Qwen 4 4B — then shipped a point-release refresh, Qwen 4.1 32B-A3B, in July 2026. The refresh is the one to grab today: according to LLMCheck benchmarks it now holds the number-one spot among Mac-runnable models.
Here is the release timeline at a glance:
| Model | Released | Role | License |
|---|---|---|---|
| Qwen 4 32B-A3B | June 2026 | Flagship general model | Apache 2.0 |
| Qwen 4 Coder 32B-A3B | June 2026 | Coding specialist | Apache 2.0 |
| Qwen 4 4B | June 2026 | Lightweight / 8 GB Mac | Apache 2.0 |
| Qwen 4.1 32B-A3B | July 2026 | Refresh — current Mac #1 | Apache 2.0 |
The July refresh, Qwen 4.1 32B-A3B, carries an LLMCheck Score of 80, the highest of any model that runs comfortably on consumer Apple Silicon.
The Full Qwen 4 Family
The defining trait of the 32B models is the mixture-of-experts (MoE) design: they activate only 3 billion parameters per token (the "A3B" suffix), which is why a 32B-class model can hit ~60 tok/s on a laptop. Most of the weights sit idle on any given token, so generation speed tracks closer to a 3B model while quality stays closer to that of the full 32B. All 32B weights still have to be resident in memory, though, which is why these models want ~24 GB of unified memory. The 4B model is the exception: it is a conventional dense model. Here is each variant, what it is for, and how to install it.
Qwen 4 32B-A3B — the flagship
The general-purpose anchor of the family. Hybrid reasoning (it can switch between fast direct answers and deeper step-by-step thinking), 78% SWE-Verified, and roughly 60 tok/s on a 24 GB Mac at Q4. It needs ~24 GB of unified memory to run comfortably.
ollama run qwen4
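As a rough sanity check on the ~24 GB figure, the sketch below estimates the weight footprint of a 32B-parameter model at a 4-bit quant. The 4.5 bits-per-weight average and the function name are assumptions for illustration, not published numbers; real Q4 files vary by quant type, and the runtime adds KV cache and other overhead on top.

```python
def weight_footprint_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the model weights in GB (1 GB = 1e9 bytes)."""
    total_bytes = total_params_billions * 1e9 * bits_per_weight / 8
    return total_bytes / 1e9

# Assumption: roughly 4.5 bits per weight on average for a 4-bit quant.
ASSUMED_Q4_BITS = 4.5

# All 32B weights must be resident in memory, even though only ~3B are used per token.
total_gb = weight_footprint_gb(32, ASSUMED_Q4_BITS)
# Rough size of the weights actually touched for each generated token.
active_gb = weight_footprint_gb(3, ASSUMED_Q4_BITS)

print(f"weights in memory: ~{total_gb:.1f} GB")
print(f"weights used per token: ~{active_gb:.1f} GB")
```

Under that assumption the weights alone come to about 18 GB, which is consistent with a 24 GB machine being the comfortable floor once the KV cache and the OS take their share.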
Qwen 4 Coder 32B-A3B — the best Mac-runnable coder
The coding specialist, tuned for software engineering and fill-in-the-middle completion. It posts 82% SWE-Verified — the best of any Mac-runnable coder — making it the standout pick for a local coding assistant or agentic dev workflow. Same ~24 GB memory footprint as the flagship.
ollama run qwen4-coder
Qwen 4 4B — the lightweight
A compact dense model for tighter machines. It runs at about 135 tok/s and fits comfortably on an 8 GB Mac, making it ideal for autocomplete, quick chat, and on-device assistants where speed and a small footprint matter more than frontier capability.
ollama run qwen4:4b
Qwen 4.1 32B-A3B — the refresh and current #1
The July 2026 point release. It nudges SWE-Verified up to 80%, runs slightly faster at ~62 tok/s on a 24 GB Mac, and earns an LLMCheck Score of 80 — the current Mac #1. Same 32B-A3B architecture, drop-in compatible with anything built for Qwen 4.
ollama run qwen4.1
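To check throughput figures like these on your own machine, one option is to compute tok/s from the timing fields that Ollama's `/api/generate` endpoint returns with a non-streaming response. The sketch below assumes you already have that response parsed into a dictionary; the sample values are hypothetical, not a measurement.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from an Ollama /api/generate response.

    eval_count is the number of generated tokens and eval_duration is
    the time spent generating them, in nanoseconds.
    """
    return response["eval_count"] / response["eval_duration"] * 1e9

# Hypothetical response values, for illustration only.
sample = {"eval_count": 310, "eval_duration": 5_000_000_000}
print(f"{tokens_per_second(sample):.1f} tok/s")
```

Prompt processing is timed separately (`prompt_eval_count` and `prompt_eval_duration`), so this number reflects generation speed only.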
What's New vs Qwen 3.6
Qwen 4 keeps the efficient A3B MoE layout (3 billion active parameters per token) that made Qwen 3.6 such a good Mac fit, trims the total from 35B to 32B, and layers on two big changes: hybrid reasoning and a substantial coding jump. The previous generation, Qwen 3.6-35B-A3B, scored 73.4% on SWE-Verified; Qwen 4 base reaches 78% and Qwen 4 Coder hits 82%. Long-context handling and tool-call formatting also tightened up across the board.
| Spec | Qwen 4 32B-A3B | Qwen 4 Coder | Qwen 3.6-35B-A3B |
|---|---|---|---|
| SWE-Verified | 78% | 82% | 73.4% |
| Architecture | 32B-A3B MoE | 32B-A3B MoE | 35B-A3B MoE |
| Hybrid reasoning | Yes | Yes | No |
| tok/s (24 GB Mac) | ~60 | ~60 | ~58 |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 |
The headline is the coding gain: +4.6 percentage points on SWE-Verified for the base model and +8.6 for Coder, the kind of jump you actually feel in agentic coding. It is the margin between a multi-file patch that applies cleanly and one that needs hand-fixing. Hybrid reasoning is the other standout: Qwen 4 can answer simple prompts instantly but spend extra tokens "thinking" on hard math or code, which is why AIME and SWE scores climbed without tanking everyday latency.
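The gains quoted here follow directly from the SWE-Verified row of the table. A short sketch of the arithmetic, using only the scores above; the helper names are illustrative, and the "unresolved tasks" framing is one possible way to read the same numbers rather than a reported metric.

```python
# SWE-Verified scores quoted in the comparison table (percent).
scores = {
    "Qwen 3.6-35B-A3B": 73.4,
    "Qwen 4 32B-A3B": 78.0,
    "Qwen 4 Coder 32B-A3B": 82.0,
}

baseline = scores["Qwen 3.6-35B-A3B"]

def gain_points(model: str) -> float:
    """Absolute gain over the Qwen 3.6 baseline, in percentage points."""
    return round(scores[model] - baseline, 1)

def unresolved_cut(model: str) -> float:
    """Relative reduction in unresolved tasks versus the baseline, in percent."""
    before = 100 - baseline
    after = 100 - scores[model]
    return round((before - after) / before * 100, 1)

for name in ("Qwen 4 32B-A3B", "Qwen 4 Coder 32B-A3B"):
    print(name, gain_points(name), unresolved_cut(name))
```

Read this way, the base model leaves roughly a sixth fewer tasks unresolved than Qwen 3.6, and Coder roughly a third fewer.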
How to Run Qwen 4 on a Mac
Every Qwen 4 model is one command away via Ollama, and because the 32B variants are MoE models that activate just 3B parameters per token, they run genuinely well on Apple Silicon. The simplest path:
ollama run qwen4.1
ollama run qwen4-coder
ollama run qwen4:4b
In order, these start the current Mac #1, the best local coder, and the build for 8 GB machines.
For a step-by-step walkthrough — picking a quant, raising the context window, and squeezing extra tok/s out of the unified-memory path — see our dedicated guide on how to run Qwen 4.1 on a Mac. If your goal specifically is a coding assistant in your editor, the local AI coding assistant on Mac guide covers wiring Qwen 4 Coder into your IDE. Not sure your hardware is up to it? The best Macs for local LLMs page maps each model to the chip and RAM tier that runs it well.
In LM Studio, search for the model name and pick a Q4_K_M quant for the best quality-to-speed balance on Apple Silicon. MLX users will find community-converted builds that squeeze a few extra tok/s out of the unified-memory path. Whichever runtime you choose, the large context window is supported out of the box, but runners typically start with a shorter default, so raise the context length in your runner's settings to use it.
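If you drive Ollama from a script rather than a GUI, the context length can be set per request through the `num_ctx` option of its local HTTP API. The sketch below is a minimal example under a few assumptions: Ollama is serving on its default local port, the `qwen4.1` tag from this article is already pulled, and 16384 is only an illustrative value, not a recommended setting.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_request(model: str, prompt: str, num_ctx: int) -> urllib.request.Request:
    """Build a non-streaming generate request with an explicit context length."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,
        "options": {"num_ctx": num_ctx},  # context length in tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def generate(model: str, prompt: str, num_ctx: int) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    with urllib.request.urlopen(build_request(model, prompt, num_ctx)) as resp:
        return json.loads(resp.read())["response"]

# Example (needs a running Ollama server with the model pulled):
# print(generate("qwen4.1", "Summarize this file.", num_ctx=16384))
```

A longer context grows the KV cache, so memory use rises with `num_ctx`; on a 24 GB machine it is worth increasing the value gradually.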
Which Variant Should You Pick?
It comes down to your RAM and your workload:
- 8 GB Mac — Run Qwen 4 4B. At ~135 tok/s it is blisteringly fast for chat, autocomplete, and light assistant tasks, and it is the only family member that fits this tier with headroom.
- 24–32 GB Mac, general use — Run Qwen 4.1 32B-A3B. It is the current Mac #1 (LLMCheck Score 80), runs at ~62 tok/s, and handles everything from reasoning to drafting. This is the default recommendation for most people.
- 24–32 GB Mac, coding — Run Qwen 4 Coder 32B-A3B. Its 82% SWE-Verified leads every Mac-runnable coder, making it the pick for agentic dev and IDE assistants.
- Already on Qwen 4 base — It remains excellent (78% SWE-Verified) and fully compatible. Re-pull Qwen 4.1 when convenient for the small capability and speed bump, but there is no urgency for casual use.
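The decision above can be written down as a small lookup. This is a sketch of the article's recommendations, not an official sizing tool: the function name is hypothetical, and treating everything between 8 GB and 24 GB as 4B territory (and everything above 32 GB like the 24–32 GB tier) is an assumption, since only the 8 GB and 24–32 GB tiers are named here.

```python
from typing import Optional

def pick_variant(ram_gb: int, coding: bool = False) -> Optional[str]:
    """Map unified memory and workload to an Ollama tag from this article.

    Returns None for machines below the smallest tier covered.
    """
    if ram_gb >= 24:
        # 32B-A3B tier: Coder for development work, the 4.1 refresh otherwise.
        return "qwen4-coder" if coding else "qwen4.1"
    if ram_gb >= 8:
        # Assumption: anything from 8 GB up to 24 GB falls back to the 4B model.
        return "qwen4:4b"
    return None

print(pick_variant(8))                # 8 GB Mac
print(pick_variant(24))               # general use
print(pick_variant(32, coding=True))  # coding
```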
Pick Qwen 4.1 if…
You have 24 GB+ and want the best all-round Mac model available. LLMCheck Score 80, ~62 tok/s, 80% SWE-Verified, Apache 2.0. It is the current number one for a reason and the safe default for nearly everyone.
Pick a specialist if…
You code all day (Qwen 4 Coder, 82% SWE-Verified) or you are on an 8 GB machine (Qwen 4 4B, ~135 tok/s). The family is built so you can match the exact variant to your hardware and your job.