The Three-Way Verdict at a Glance

Before the data, here is the short version. These three models are not competing for the same crown — they each win a different category, and the right pick depends entirely on whether you care about peak capability, local practicality, or dense-model frontier scores.

Capability King

GLM 5.2

Zhipu's 744B-A40B MoE is the most capable open model on the board — best-in-class 68.5% SWE-Pro, 92% MMLU, 94% AIME. MIT-licensed. Server-class, but GLM 5.2 Air (106B-A12B) brings it to a 64GB Mac at ~30 tok/s.

Most Practical

Qwen 4.1

The 32B-A3B MoE is the only frontier-adjacent model that is also genuinely Mac-native — 80% SWE-Verified, 90% MMLU, ~62 tok/s on a 24GB Mac, Apache 2.0. LLMCheck Score 76. The one most people should actually download.

Frontier, But Heavy

Llama 5 405B

Meta's 405B dense model posts 91% MMLU and 72% SWE-Verified — true frontier quality. But dense means no MoE sparsity: on a Mac you are stuck at Q2 and ~5 tok/s. Great on a cluster, impractical locally.

Capability Comparison: Benchmarks Head to Head

Raw capability is where the three models separate most clearly. GLM 5.2 takes the top of the table thanks to its enormous 744B total-parameter MoE, which posts the highest SWE-Pro coding score of any open model. Qwen 4.1 punches dramatically above its 32B total weight on SWE-Verified. Llama 5 405B holds steady frontier numbers but does not lead any single category.

| Metric | GLM 5.2 | Qwen 4.1 | Llama 5 405B |
| --- | --- | --- | --- |
| SWE-Pro (coding) | 68.5% | n/a | n/a |
| SWE-Verified | n/a | 80% | 72% |
| MMLU | 92% | 90% | 91% |
| AIME (math reasoning) | 94% | n/a | n/a |
| License | MIT | Apache 2.0 | Llama 5 Community |
| Params (total / active) | 744B / 40B | 32B / 3B | 405B / 405B |
| Architecture | MoE | MoE | Dense |
| Mac tier (best fit) | 64GB (Air variant) | 24GB | 128GB (Q2 only) |
| Local speed (tok/s) | ~30 (Air) | ~62 | ~5 |

According to LLMCheck benchmarks, GLM 5.2's 68.5% SWE-Pro is the highest coding score recorded for any open-weights model in our database — but Qwen 4.1's 80% SWE-Verified, achieved with just 3B active parameters, is the single most efficient capability-per-watt result we have measured on Apple Silicon.

A note on benchmark comparability: SWE-Pro and SWE-Verified are different test suites at different difficulty tiers, so the 68.5% and 80% figures are not directly comparable across models; each lab reports the suite it leads on. The one benchmark reported for all three models is MMLU, where GLM 5.2 edges ahead (92% versus 91% and 90%). The table lists an AIME score only for GLM 5.2, so its 94% has no direct counterpart here. The headline still holds on the available numbers: GLM 5.2 posts the strongest raw-capability figures, Qwen 4.1 wins efficiency, and Llama 5 405B is consistently strong without topping any column.

License Three-Way: MIT vs Apache 2.0 vs Llama 5 Community

For anyone deploying commercially, the license is not a footnote — it determines whether you can ship at all. Here the three models fall into two camps: two genuinely permissive licenses and one conditional one.

| License Trait | GLM 5.2 (MIT) | Qwen 4.1 (Apache 2.0) | Llama 5 405B |
| --- | --- | --- | --- |
| Commercial use | Yes, unrestricted | Yes, unrestricted | Yes, with conditions |
| User cap | None | None | 700M MAU |
| Patent grant | No | Yes | Limited |
| Acceptable-use policy | None | None | Yes |
| LLMCheck license score | 10 / 10 | 10 / 10 | 7 / 10 |

If your deployment is commercial and you want zero ambiguity, GLM 5.2 and Qwen 4.1 are both clean. Between them, Apache 2.0's patent grant gives Qwen 4.1 a slight edge for risk-averse legal teams.
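The license comparison can be turned into a quick screening check. The sketch below is a simplification that encodes only the rows of the table above; the `LicenseTerms` type and the `is_candidate` helper are hypothetical names introduced for illustration, and the actual license texts remain the authority.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LicenseTerms:
    name: str
    mau_cap: Optional[int]  # None means the table lists no user cap
    patent_grant: str       # "Yes", "No", or "Limited", as in the table
    acceptable_use_policy: bool


# Values copied from the license table in this article.
LICENSES = [
    LicenseTerms("MIT", None, "No", False),
    LicenseTerms("Apache 2.0", None, "Yes", False),
    LicenseTerms("Llama 5 Community", 700_000_000, "Limited", True),
]


def is_candidate(terms: LicenseTerms,
                 monthly_active_users: int,
                 require_full_patent_grant: bool = False) -> bool:
    """Screen a license against two deployment constraints."""
    # A user cap below the deployment's audience rules the license out here.
    if terms.mau_cap is not None and monthly_active_users > terms.mau_cap:
        return False
    # Only an explicit, unqualified patent grant passes the stricter check.
    if require_full_patent_grant and terms.patent_grant != "Yes":
        return False
    return True


for terms in LICENSES:
    verdict = "ok" if is_candidate(terms, 5_000_000, require_full_patent_grant=True) else "review"
    print(f"{terms.name}: {verdict}")
```

With a full patent grant required, only Apache 2.0 passes, which matches the slight edge given to Qwen 4.1 above.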

The Practicality Axis: What Runs on a Mac

This is the axis that separates a benchmark headline from a usable tool. For local Mac users, the headline parameter count alone is not the decisive number. Two numbers matter: active parameters, which drive inference speed, and total parameters, which set the RAM footprint. MoE models win here decisively; dense models do not.
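The RAM side of that trade-off is simple arithmetic. As a rough sketch, assume the weights take exactly the nominal bit width per parameter and ignore the KV cache and runtime overhead. Real quantization formats store somewhat more than their nominal width, so treat the result as a lower bound. The function name is illustrative.

```python
def weight_footprint_gb(total_params_billions: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in GB (1 GB = 1e9 bytes)."""
    # billions of params * bits per param / 8 bits per byte = gigabytes
    return total_params_billions * bits_per_weight / 8


# Total parameter counts (in billions) as quoted in this article.
models = {
    "GLM 5.2 Air": 106,
    "Qwen 4.1": 32,
    "Llama 5 405B": 405,
}

for name, total_b in models.items():
    q4 = weight_footprint_gb(total_b, 4)
    q2 = weight_footprint_gb(total_b, 2)
    print(f"{name}: {q4:.1f} GB at 4-bit, {q2:.1f} GB at 2-bit")
```

Under these assumptions Llama 5 405B needs about 200 GB at 4-bit, well beyond a 128GB Mac, and about 101 GB at 2-bit. That is why Q2 is the only quantization that loads at all.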

GLM 5.2 — only via the Air variant

The full 744B-A40B flagship is server-class and will not fit on any consumer Mac. The version you actually run is GLM 5.2 Air (106B-A12B), which fits a 64GB Mac at Q4 and runs at roughly 30 tok/s — perfectly usable for interactive coding and reasoning. You give up some of the flagship's peak capability, but you keep Zhipu's architecture and the MIT license.

    # GLM 5.2 Air on a 64GB Mac via Ollama
    ollama run glm-5.2-air:q4_K_M
    # ~106B total / 12B active; needs ~58GB RAM, ~30 tok/s

Qwen 4.1 — the Mac-native frontier model

This is the sweet spot of the whole comparison. Qwen 4.1 32B-A3B activates only 3B parameters per token, so a 24GB Mac runs it at ~62 tok/s — well above reading speed — while the 32B total weight keeps the memory footprint modest. You get frontier-adjacent quality on hardware most developers already own.

    # Qwen 4.1 on a 24GB Mac via Ollama
    ollama run qwen4.1:32b-a3b-q4_K_M
    # 32B total / 3B active; fits 24GB, ~62 tok/s
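To check a tok/s figure on your own machine, one option is to read the timing stats that Ollama returns from its local `/api/generate` endpoint. The sketch below assumes a default Ollama install listening on port 11434. The model tag is the one quoted in this article and may differ from what you have pulled, and the helper names are illustrative.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint (assumed)


def tokens_per_second(stats: dict) -> float:
    """Decode speed from Ollama's response stats (eval_duration is in nanoseconds)."""
    return stats["eval_count"] * 1e9 / stats["eval_duration"]


def measure(model: str, prompt: str) -> float:
    """Send one non-streaming generate request and return the measured tok/s."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return tokens_per_second(json.load(response))


# Example call (needs a running Ollama server and the model already pulled):
#   measure("qwen4.1:32b-a3b-q4_K_M", "Explain MoE routing in two sentences.")
```

A single short prompt gives a noisy reading, so averaging several runs with longer outputs is more representative.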

Llama 5 405B — the dense-model trap

Llama 5 405B is the cautionary tale. Because it is dense, all 405B parameters are active on every token — there is no MoE sparsity to exploit. Even on a 128GB Mac you are forced down to Q2 quantization just to load it, and you land at roughly 5 tok/s, slower than most people can comfortably read and with measurable quality loss from the aggressive quantization.

Llama 5 405B at Q2 on a 128GB Mac runs at approximately 5 tok/s. That is below the threshold for interactive use, and Q2 quantization degrades the very frontier quality you downloaded the model for. On Apple Silicon, a sparse MoE like Qwen 4.1 or GLM 5.2 Air delivers more usable intelligence per gigabyte of RAM.

The lesson is architectural, not brand-specific: in mid-2026, MoE sparsity is what makes large open models runnable on consumer hardware. A 405B dense model and a 744B MoE with 40B active have wildly different local profiles, even though the MoE is nominally "bigger."
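A back-of-the-envelope way to see the gap is to assume token generation is limited by memory bandwidth, with every active weight read once per token. This is a common heuristic, not a measurement, and the 400 GB/s bandwidth below is a hypothetical figure to replace with your own hardware's spec.

```python
def decode_ceiling_tok_s(active_params_billions: float,
                         bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if every active weight is read once per token."""
    gb_read_per_token = active_params_billions * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


ASSUMED_BANDWIDTH_GB_S = 400  # hypothetical; substitute your machine's memory bandwidth

# Same quantization and bandwidth for both, so only active parameters differ.
moe_ceiling = decode_ceiling_tok_s(40, 4, ASSUMED_BANDWIDTH_GB_S)     # 744B MoE, 40B active
dense_ceiling = decode_ceiling_tok_s(405, 4, ASSUMED_BANDWIDTH_GB_S)  # 405B dense

print(f"MoE (40B active):    at most {moe_ceiling:.1f} tok/s")
print(f"Dense (405B active): at most {dense_ceiling:.1f} tok/s")
print(f"Ratio: {moe_ceiling / dense_ceiling:.1f}x")
```

The ratio does not depend on the assumed bandwidth. At equal quantization the MoE's ceiling is about ten times the dense model's because it reads about a tenth of the weights per token, even though its total parameter count is larger.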

Use-Case Picks

Matching the model to the job removes most of the ambiguity. Here is how the three sort out by workload.

The Verdict

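The open frontier in July 2026 has settled into a clear three-way split, and the right answer depends on what you are optimizing for: GLM 5.2 for peak capability, Qwen 4.1 for local practicality, and Llama 5 405B for dense-model frontier scores on cluster hardware.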

If you take one thing away: the most capable open model and the most useful local model are not the same model. GLM 5.2 holds the capability crown, but Qwen 4.1 is the one that turns the open frontier into something you can run on the Mac already on your desk. Llama 5 405B reminds us that parameter count alone no longer predicts what you can actually deploy — architecture does.