What's new in Qwen 4 vs Qwen 3.6?

Qwen 4 keeps the efficient A3B mixture-of-experts design (now 32B total parameters, down from 35B in Qwen 3.6) and adds hybrid reasoning and a big coding jump. SWE-Verified climbs from 73.4% on Qwen 3.6-35B-A3B to 78% on Qwen 4 base and 82% on Qwen 4 Coder. Long-context handling and tool-call formatting also improve, while speed on Apple Silicon stays around 60 tok/s on a 24 GB Mac.

Is Qwen 4 free for commercial use?

Yes. Every model in the Qwen 4 family (Qwen 4, Qwen 4 Coder, Qwen 4 4B, and Qwen 4.1) ships under the Apache 2.0 license, which permits commercial use, modification, and redistribution as long as you retain the license and attribution notices. You can run them entirely offline on your own Mac at no cost.

Qwen 4 vs Qwen 4.1 — which should I use?

Use Qwen 4.1 32B-A3B if you can re-pull the weights — it is the July 2026 refresh with 80% SWE-Verified, ~62 tok/s on a 24 GB Mac, and an LLMCheck Score of 80, making it the current Mac #1. Qwen 4 base (78% SWE-Verified) remains excellent and fully compatible, so stay on it if you have already downloaded it and mainly do chat or drafting.

Can I run Qwen 4 on a Mac, and how fast?

Yes. Because Qwen 4 32B-A3B activates only 3B of its 32B parameters per token, it runs comfortably on Apple Silicon. According to LLMCheck benchmarks it reaches roughly 60 tok/s on a 24 GB M4 Pro Mac at Q4 quantization. Qwen 4 4B is far lighter at about 135 tok/s and fits on an 8 GB Mac.

What is Qwen 4 Coder?

Qwen 4 Coder 32B-A3B is the coding-specialized variant of Qwen 4, tuned for software engineering and fill-in-the-middle completion. It scores 82% on SWE-Verified — the best of any Mac-runnable coder — making it the standout choice for local coding assistants and agentic workflows on Apple Silicon.

Qwen 4: Release Date, What's New & How to Run It on a Mac (2026)

Q: When did Qwen 4 release?

Qwen 4 released in June 2026 from Alibaba, launching with the flagship Qwen 4 32B-A3B alongside Qwen 4 Coder 32B-A3B and the compact Qwen 4 4B. A refresh, Qwen 4.1 32B-A3B, followed in July 2026 and is now the top Mac-runnable model on the LLMCheck leaderboard.

If you have been searching "when is Qwen 4 coming out," the wait is over: Alibaba shipped the Qwen 4 generation in June 2026 and followed it with a Qwen 4.1 refresh in July. This is the complete rundown — the release timeline, the full family, what changed versus Qwen 3.6, and the exact commands to run every variant locally on a Mac.

When Did Qwen 4 Release?

Qwen 4 released in June 2026. Alibaba launched the generation with three models at once — the flagship Qwen 4 32B-A3B, the coding-specialized Qwen 4 Coder 32B-A3B, and the lightweight Qwen 4 4B — then shipped a point-release refresh, Qwen 4.1 32B-A3B, in July 2026. The refresh is the one to grab today: according to LLMCheck benchmarks it now holds the number-one spot among Mac-runnable models.

Here is the release timeline at a glance:

Model	Released	Role	License
Qwen 4 32B-A3B	June 2026	Flagship general model	Apache 2.0
Qwen 4 Coder 32B-A3B	June 2026	Coding specialist	Apache 2.0
Qwen 4 4B	June 2026	Lightweight / 8 GB Mac	Apache 2.0
Qwen 4.1 32B-A3B	July 2026	Refresh — current Mac #1	Apache 2.0

According to LLMCheck benchmarks, the July refresh Qwen 4.1 32B-A3B carries an LLMCheck Score of 80 — the highest of any model that runs comfortably on consumer Apple Silicon, making it the current Mac-runnable champion.

The Full Qwen 4 Family

The defining trait across the whole family is the mixture-of-experts (MoE) design. The 32B models activate only 3 billion parameters per token — the "A3B" suffix — which is why a 32B-class model can hit ~60 tok/s on a laptop. Most of the weights sit idle on any given token, so generation speed tracks closer to a 3B model while quality tracks the full 32B. Here is each variant, what it is for, and how to install it.
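The "A3B" arithmetic can be made concrete with a rough sizing sketch. It assumes about 4.5 bits per weight for a Q4-class quant, which is an illustrative figure and not one published for these models, and it counts weights only, ignoring the KV cache and runtime overhead.

```python
# Rough sizing sketch for a 32B-A3B mixture-of-experts model.
# The parameter counts come from the article; BITS_PER_WEIGHT_Q4 is an
# assumption chosen for illustration, not a published figure.

TOTAL_PARAMS = 32e9        # all experts combined
ACTIVE_PARAMS = 3e9        # parameters used for any single token
BITS_PER_WEIGHT_Q4 = 4.5   # assumed average for a Q4-class quant


def active_fraction(total_params: float, active_params: float) -> float:
    """Share of the weights that take part in generating one token."""
    return active_params / total_params


def weight_memory_gb(params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    share = active_fraction(TOTAL_PARAMS, ACTIVE_PARAMS)
    size = weight_memory_gb(TOTAL_PARAMS, BITS_PER_WEIGHT_Q4)
    print(f"Active per token: {share:.1%} of the weights")
    print(f"Weights at Q4 (assumed {BITS_PER_WEIGHT_Q4} bits): ~{size:.0f} GB")
```

Under these assumptions, all 32B weights still have to sit in memory even though fewer than a tenth of them are used per token, which is consistent with the ~24 GB unified-memory guidance for the 32B variants.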

Qwen 4 32B-A3B — the flagship

The general-purpose anchor of the family. Hybrid reasoning (it can switch between fast direct answers and deeper step-by-step thinking), 78% SWE-Verified, and roughly 60 tok/s on a 24 GB Mac at Q4. It needs ~24 GB of unified memory to run comfortably.

# Flagship general model

ollama run qwen4

Qwen 4 Coder 32B-A3B — the best Mac-runnable coder

The coding specialist, tuned for software engineering and fill-in-the-middle completion. It posts 82% SWE-Verified — the best of any Mac-runnable coder — making it the standout pick for a local coding assistant or agentic dev workflow. Same ~24 GB memory footprint as the flagship.

# Coding specialist — 82% SWE-Verified

ollama run qwen4-coder

Qwen 4 4B — the lightweight

A compact dense model for tighter machines. It runs at about 135 tok/s and fits comfortably on an 8 GB Mac, making it ideal for autocomplete, quick chat, and on-device assistants where speed and a small footprint matter more than frontier capability.

# Lightweight — ~135 tok/s, 8 GB Mac

ollama run qwen4:4b

Qwen 4.1 32B-A3B — the refresh and current #1

The July 2026 point release. It nudges SWE-Verified up to 80%, runs slightly faster at ~62 tok/s on a 24 GB Mac, and earns an LLMCheck Score of 80 — the current Mac #1. Same 32B-A3B architecture, drop-in compatible with anything built for Qwen 4.

# Current Mac #1 — LLMCheck Score 80

ollama run qwen4.1

What's New vs Qwen 3.6

Qwen 4 keeps the efficient A3B MoE design that made Qwen 3.6 such a good Mac fit, trimming total parameters from 35B to 32B, and layers on two big changes: hybrid reasoning and a substantial coding jump. The previous generation, Qwen 3.6-35B-A3B, scored 73.4% on SWE-Verified; Qwen 4 base reaches 78% and Qwen 4 Coder hits 82%. Long-context handling and tool-call formatting also tightened up across the board.

Spec	Qwen 4 32B-A3B	Qwen 4 Coder	Qwen 3.6-35B-A3B
SWE-Verified	78%	82%	73.4%
Architecture	32B-A3B MoE	32B-A3B MoE	35B-A3B MoE
Hybrid reasoning	Yes	Yes	No
tok/s (24 GB Mac)	~60	~60	~58
License	Apache 2.0	Apache 2.0	Apache 2.0

The headline: gains of 4.6 percentage points of SWE-Verified on the base model and 8.6 on Coder, both measured against Qwen 3.6-35B-A3B, are the kind of jump you actually feel in agentic coding. It is the margin between a multi-file patch that applies cleanly and one that needs hand-fixing. Hybrid reasoning is the other standout: Qwen 4 can answer simple prompts instantly but spend extra tokens "thinking" on hard math or code, which is why AIME and SWE scores climbed without tanking everyday latency.
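The percentage-point gains quoted here follow directly from the SWE-Verified figures in the table. A minimal sketch of that arithmetic, using only scores stated in this article (the function name is a hypothetical helper):

```python
# SWE-Verified scores as quoted in this article (percent).
SWE_VERIFIED = {
    "Qwen 3.6-35B-A3B": 73.4,
    "Qwen 4 32B-A3B": 78.0,
    "Qwen 4.1 32B-A3B": 80.0,
    "Qwen 4 Coder 32B-A3B": 82.0,
}


def gain_over_baseline(scores: dict, baseline: str) -> dict:
    """Percentage-point gain of every model over the baseline model."""
    base = scores[baseline]
    return {
        name: round(score - base, 1)  # round away floating-point noise
        for name, score in scores.items()
        if name != baseline
    }


if __name__ == "__main__":
    for name, gain in gain_over_baseline(SWE_VERIFIED, "Qwen 3.6-35B-A3B").items():
        print(f"{name}: +{gain} points")
```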

How to Run Qwen 4 on a Mac

Every Qwen 4 model is one command away via Ollama, and because the 32B variants are MoE models that activate just 3B parameters per token, they run genuinely well on Apple Silicon. The simplest path:

# Install Ollama, then pull whichever variant you need

ollama run qwen4.1 # current Mac #1

ollama run qwen4-coder # best local coder

ollama run qwen4:4b # 8 GB machines
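To check throughput figures like these on your own machine, one option is to compute tok/s from the timing fields Ollama returns with a non-streaming generate call. The sketch below assumes the response carries `eval_count` (generated tokens) and `eval_duration` (nanoseconds), as described in Ollama's API documentation. The sample values are made up for illustration and are not measurements.

```python
def tokens_per_second(response: dict) -> float:
    """Generation speed from the timing fields of an Ollama response.

    Expects `eval_count` (tokens generated) and `eval_duration`
    (time spent generating, in nanoseconds).
    """
    eval_count = response["eval_count"]
    eval_duration_ns = response["eval_duration"]
    if eval_duration_ns <= 0:
        raise ValueError("eval_duration must be positive")
    return eval_count / (eval_duration_ns / 1e9)


if __name__ == "__main__":
    # Illustrative values only: 620 tokens generated in 10 seconds.
    sample = {"eval_count": 620, "eval_duration": 10_000_000_000}
    print(f"{tokens_per_second(sample):.1f} tok/s")
```

Prompt processing is reported separately by Ollama, so this number reflects generation speed only.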

For a step-by-step walkthrough — picking a quant, raising the context window, and squeezing extra tok/s out of the unified-memory path — see our dedicated guide on how to run Qwen 4.1 on a Mac. If your goal specifically is a coding assistant in your editor, the local AI coding assistant on Mac guide covers wiring Qwen 4 Coder into your IDE. Not sure your hardware is up to it? The best Macs for local LLMs page maps each model to the chip and RAM tier that runs it well.

In LM Studio, search the model name and pick a Q4_K_M quant for the best quality-to-speed balance on Apple Silicon. MLX users will find community-converted builds that shave a few extra tok/s out of the unified-memory path. Whichever runtime you choose, the large context window is supported out of the box, but you need to raise the context length in your runner's settings to actually use it.
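With Ollama, the context length can also be set per request instead of in settings. A minimal sketch, assuming Ollama is serving on its default local port (11434) and that the `qwen4.1` tag from this article is already pulled; the 16384-token context is an arbitrary example value, and both function names are hypothetical helpers.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_generate_request(model: str, prompt: str, num_ctx: int) -> urllib.request.Request:
    """Build a non-streaming generate request with an explicit context length."""
    payload = {
        "model": model,
        "prompt": prompt,
        "stream": False,                  # return one JSON object, not a stream
        "options": {"num_ctx": num_ctx},  # context window in tokens
    }
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(model: str, prompt: str, num_ctx: int = 16384) -> str:
    """Send the request to a running Ollama server and return the reply text."""
    request = build_generate_request(model, prompt, num_ctx)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["response"]
```

Calling `generate("qwen4.1", "Summarize this repo's README.")` needs a running server. A larger `num_ctx` also enlarges the KV cache, so memory use rises with it.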

Which Variant Should You Pick?

It comes down to your RAM and your workload:

8 GB Mac — Run Qwen 4 4B. At ~135 tok/s it is blisteringly fast for chat, autocomplete, and light assistant tasks, and it is the only family member that fits this tier with headroom.
24–32 GB Mac, general use — Run Qwen 4.1 32B-A3B. It is the current Mac #1 (LLMCheck Score 80), runs at ~62 tok/s, and handles everything from reasoning to drafting. This is the default recommendation for most people.
24–32 GB Mac, coding — Run Qwen 4 Coder 32B-A3B. Its 82% SWE-Verified leads every Mac-runnable coder, making it the pick for agentic dev and IDE assistants.
Already on Qwen 4 base — It remains excellent (78% SWE-Verified) and fully compatible. Re-pull Qwen 4.1 when convenient for the small capability and speed bump, but there is no urgency for casual use.
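The decision rules above can be sketched as a small helper. The function name and the exact cut-offs (24 GB for the 32B variants, 8 GB for the 4B) are a simplification of this article's guidance and not an official sizing rule; the returned strings are the Ollama tags used earlier.

```python
from typing import Optional


def pick_variant(ram_gb: int, coding: bool = False) -> Optional[str]:
    """Map unified memory and workload to an Ollama tag from this article.

    Returns None when even the 4B model is unlikely to fit comfortably.
    """
    if ram_gb >= 24:
        # 32B-A3B tier: the coder for dev work, the 4.1 refresh otherwise
        return "qwen4-coder" if coding else "qwen4.1"
    if ram_gb >= 8:
        # Below 24 GB the article only recommends the lightweight model
        return "qwen4:4b"
    return None


if __name__ == "__main__":
    for ram, coding in [(8, False), (24, False), (32, True)]:
        print(f"{ram} GB, coding={coding}: {pick_variant(ram, coding)}")
```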

Pick Qwen 4.1 if…

You have 24 GB+ and want the best all-round Mac model available. LLMCheck Score 80, ~62 tok/s, 80% SWE-Verified, Apache 2.0. It is the current number one for a reason and the safe default for nearly everyone.

Pick a specialist if…

You code all day (Qwen 4 Coder, 82% SWE-Verified) or you are on an 8 GB machine (Qwen 4 4B, ~135 tok/s). The family is built so you can match the exact variant to your hardware and your job.
