When Did Qwen 4 Release?

Qwen 4 was released in June 2026. Alibaba launched the generation with three models at once (the flagship Qwen 4 32B-A3B, the coding-specialized Qwen 4 Coder 32B-A3B, and the lightweight Qwen 4 4B), then shipped a point-release refresh, Qwen 4.1 32B-A3B, in July 2026. The refresh is the one to grab today: according to LLMCheck benchmarks, it now holds the number-one spot among Mac-runnable models.

Here is the release timeline at a glance:

| Model | Released | Role | License |
| --- | --- | --- | --- |
| Qwen 4 32B-A3B | June 2026 | Flagship general model | Apache 2.0 |
| Qwen 4 Coder 32B-A3B | June 2026 | Coding specialist | Apache 2.0 |
| Qwen 4 4B | June 2026 | Lightweight / 8 GB Mac | Apache 2.0 |
| Qwen 4.1 32B-A3B | July 2026 | Refresh, current Mac #1 | Apache 2.0 |

According to LLMCheck benchmarks, the July refresh, Qwen 4.1 32B-A3B, carries an LLMCheck Score of 80, the highest of any model that runs comfortably on consumer Apple Silicon.

The Full Qwen 4 Family

The defining trait across the whole family is the mixture-of-experts (MoE) design. The 32B models activate only 3 billion parameters per token (the "A3B" suffix), which is why a 32B-class model can hit ~60 tok/s on a laptop. Most of the weights sit idle on any given token, so generation speed tracks closer to a 3B model while quality tracks the full 32B. All 32B parameters still have to be loaded, though, so the memory requirement is that of a 32B model. Here is each variant, what it is for, and how to install it.
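To make the A3B arithmetic concrete, here is a rough back-of-the-envelope sketch in shell. The figure of about 4.5 bits per weight for a Q4-class quant is an assumption for illustration, not a number from LLMCheck or Alibaba, and the estimate ignores the KV cache and runtime overhead. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Back-of-the-envelope MoE arithmetic. Helper names are illustrative only.

# Percentage of parameters active per token: active / total * 100.
active_percent() {
  awk -v total="$1" -v active="$2" 'BEGIN { printf "%.1f\n", active / total * 100 }'
}

# Approximate weight size in GB: params (in billions) * bits per weight / 8.
# Ignores KV cache and runtime overhead.
weight_gb() {
  awk -v params="$1" -v bits="$2" 'BEGIN { printf "%.1f\n", params * bits / 8 }'
}

echo "Active per token: $(active_percent 32 3)%"
# ASSUMPTION: ~4.5 bits per weight for a Q4-class quant.
echo "Q4 weights: $(weight_gb 32 4.5) GB"
```

Under those assumptions, roughly 9% of the parameters do work on each token, while about 18 GB of weights still have to sit in memory. That is in line with the ~24 GB guidance once context and system overhead are added.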

Qwen 4 32B-A3B — the flagship

The general-purpose anchor of the family. Hybrid reasoning (it can switch between fast direct answers and deeper step-by-step thinking), 78% SWE-Verified, and roughly 60 tok/s on a 24 GB Mac at Q4. It needs ~24 GB of unified memory to run comfortably.

# Flagship general model
ollama run qwen4

Qwen 4 Coder 32B-A3B — the best Mac-runnable coder

The coding specialist, tuned for software engineering and fill-in-the-middle completion. It posts 82% SWE-Verified — the best of any Mac-runnable coder — making it the standout pick for a local coding assistant or agentic dev workflow. Same ~24 GB memory footprint as the flagship.

# Coding specialist — 82% SWE-Verified
ollama run qwen4-coder

Qwen 4 4B — the lightweight

A compact dense model for tighter machines. It runs at about 135 tok/s and fits comfortably on an 8 GB Mac, making it ideal for autocomplete, quick chat, and on-device assistants where speed and a small footprint matter more than frontier capability.

# Lightweight — ~135 tok/s, 8 GB Mac
ollama run qwen4:4b

Qwen 4.1 32B-A3B — the refresh and current #1

The July 2026 point release. It nudges SWE-Verified up from the flagship's 78% to 80%, runs slightly faster at ~62 tok/s on a 24 GB Mac (versus ~60), and earns an LLMCheck Score of 80, the current Mac #1. Same 32B-A3B architecture, drop-in compatible with anything built for Qwen 4.

# Current Mac #1 — LLMCheck Score 80
ollama run qwen4.1

What's New vs Qwen 3.6

Qwen 4 keeps the efficient A3B MoE layout (3 billion active parameters per token) that made Qwen 3.6 such a good Mac fit, trims the total from 35B to 32B, and layers on two big changes: hybrid reasoning and a substantial coding jump. The previous generation, Qwen 3.6-35B-A3B, scored 73.4% on SWE-Verified; the Qwen 4 flagship reaches 78% and Qwen 4 Coder hits 82%. Long-context handling and tool-call formatting also tightened up across the board.

| Spec | Qwen 4 32B-A3B | Qwen 4 Coder | Qwen 3.6-35B-A3B |
| --- | --- | --- | --- |
| SWE-Verified | 78% | 82% | 73.4% |
| Architecture | 32B-A3B MoE | 32B-A3B MoE | 35B-A3B MoE |
| Hybrid reasoning | Yes | Yes | No |
| tok/s (24 GB Mac) | ~60 | ~60 | ~58 |
| License | Apache 2.0 | Apache 2.0 | Apache 2.0 |

The headline is the coding jump: +4.6 percentage points of SWE-Verified on the flagship and +8.6 on Coder are the kind of gain you actually feel in agentic coding, the margin between a multi-file patch that applies cleanly and one that needs hand-fixing. Hybrid reasoning is the other standout: Qwen 4 can answer simple prompts instantly but spend extra tokens "thinking" on hard math or code, which is why AIME and SWE scores climbed without tanking everyday latency.
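The percentage-point gains quoted here follow directly from the SWE-Verified figures in the comparison. A minimal shell sketch of the subtraction, with a hypothetical helper name:

```shell
#!/usr/bin/env bash
# Percentage-point gain over a baseline score, rounded to one decimal.
# pp_gain is an illustrative helper, not part of any tool.
pp_gain() {
  awk -v new="$1" -v old="$2" 'BEGIN { printf "%.1f\n", new - old }'
}

baseline=73.4   # Qwen 3.6-35B-A3B, SWE-Verified
echo "Qwen 4 32B-A3B: +$(pp_gain 78 "$baseline") points"
echo "Qwen 4 Coder:   +$(pp_gain 82 "$baseline") points"
```

These are differences in percentage points, not relative improvements.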

How to Run Qwen 4 on a Mac

Every Qwen 4 model is one command away via Ollama, and because the 32B variants are MoE models that activate just 3B parameters per token, they run genuinely well on Apple Silicon. The simplest path:

# Install Ollama, then pull whichever variant you need
ollama run qwen4.1 # current Mac #1
ollama run qwen4-coder # best local coder
ollama run qwen4:4b # 8 GB machines

For a step-by-step walkthrough — picking a quant, raising the context window, and squeezing extra tok/s out of the unified-memory path — see our dedicated guide on how to run Qwen 4.1 on a Mac. If your goal specifically is a coding assistant in your editor, the local AI coding assistant on Mac guide covers wiring Qwen 4 Coder into your IDE. Not sure your hardware is up to it? The best Macs for local LLMs page maps each model to the chip and RAM tier that runs it well.

In LM Studio, search the model name and pick a Q4_K_M quant for the best quality-to-speed balance on Apple Silicon. MLX users will find community-converted builds that can add a few extra tok/s. Whichever runtime you choose, the model supports a large context window, but runners typically default to a shorter one, so raise the context length in your runner's settings to use it.
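For Ollama specifically, one way to raise the context length is a custom Modelfile that sets the `num_ctx` parameter. The sketch below assumes the `qwen4.1` tag used in this article. The 16384 value, the `qwen4.1-16k` name, and the `write_modelfile` helper are illustrative choices, and the right context size depends on how much memory you have free.

```shell
#!/usr/bin/env bash
# Write a Modelfile that raises the context window for a base model.
# Usage: write_modelfile BASE_TAG NUM_CTX OUTPUT_PATH
# (write_modelfile is a hypothetical helper for this example.)
write_modelfile() {
  local base="$1" ctx="$2" out="$3"
  printf 'FROM %s\nPARAMETER num_ctx %s\n' "$base" "$ctx" > "$out"
}

# ASSUMPTION: 16384 tokens is only an example value.
write_modelfile qwen4.1 16384 ./Modelfile
cat ./Modelfile

# Then build and run the larger-context variant with Ollama:
#   ollama create qwen4.1-16k -f ./Modelfile
#   ollama run qwen4.1-16k
```

A larger context window increases memory use, so on a 24 GB machine it is worth raising the value in steps rather than jumping straight to the maximum.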

Which Variant Should You Pick?

It comes down to your RAM and your workload:

Pick Qwen 4.1 if…

You have 24 GB+ and want the best all-round Mac model available. LLMCheck Score 80, ~62 tok/s, 80% SWE-Verified, Apache 2.0. It is the current number one for a reason and the safe default for nearly everyone.

Pick a specialist if…

You code all day (Qwen 4 Coder, 82% SWE-Verified) or you are on an 8 GB machine (Qwen 4 4B, ~135 tok/s). The family is built so you can match the exact variant to your hardware and your job.
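The decision above can be sketched as a small shell function that maps memory and workload to one of the Ollama tags used in this article. The function name is hypothetical, and treating every machine between 8 GB and 24 GB as a 4B machine is a simplification, since the article only gives figures for 8 GB and 24 GB.

```shell
#!/usr/bin/env bash
# Map unified memory (whole GB) and workload to an Ollama tag from this article.
# Usage: pick_variant RAM_GB [general|coding]
pick_variant() {
  local ram_gb="$1" workload="${2:-general}"
  if [ "$ram_gb" -ge 24 ]; then
    if [ "$workload" = "coding" ]; then
      echo "qwen4-coder"
    else
      echo "qwen4.1"
    fi
  elif [ "$ram_gb" -ge 8 ]; then
    # ASSUMPTION: anything from 8 GB up to 24 GB falls back to the 4B model.
    echo "qwen4:4b"
  else
    echo "no Qwen 4 variant is listed for under 8 GB" >&2
    return 1
  fi
}

pick_variant 32 general   # qwen4.1
pick_variant 24 coding    # qwen4-coder
pick_variant 16 coding    # qwen4:4b
```

On macOS, `sysctl -n hw.memsize` reports installed memory in bytes; dividing by 1073741824 gives the GB figure to pass in.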