The Three-Way Verdict at a Glance
Before the data, here is the short version. These three models are not competing for the same crown — they each win a different category, and the right pick depends entirely on whether you care about peak capability, local practicality, or dense-model frontier scores.
GLM 5.2
Zhipu's 744B-A40B MoE is the most capable open model on the board — best-in-class 68.5% SWE-Pro, 92% MMLU, 94% AIME. MIT-licensed. Server-class, but GLM 5.2 Air (106B-A12B) brings it to a 64GB Mac at ~30 tok/s.
Qwen 4.1
The 32B-A3B MoE is the only frontier-adjacent model that is also genuinely Mac-native — 80% SWE-Verified, 90% MMLU, ~62 tok/s on a 24GB Mac, Apache 2.0. LLMCheck Score 76. The one most people should actually download.
Llama 5 405B
Meta's 405B dense model posts 91% MMLU and 72% SWE-Verified — true frontier quality. But dense means no MoE sparsity: on a Mac you are stuck at Q2 and ~5 tok/s. Great on a cluster, impractical locally.
Capability Comparison: Benchmarks Head to Head
Raw capability is where the three models separate most clearly. GLM 5.2 takes the top of the table thanks to its enormous 744B total-parameter MoE, which posts the highest SWE-Pro coding score of any open model. Qwen 4.1 punches dramatically above its 32B total weight on SWE-Verified. Llama 5 405B holds steady frontier numbers but does not lead any single category.
| Metric | GLM 5.2 | Qwen 4.1 | Llama 5 405B |
|---|---|---|---|
| SWE-Pro (coding) | 68.5% | — | — |
| SWE-Verified | — | 80% | 72% |
| MMLU | 92% | 90% | 91% |
| AIME (math reasoning) | 94% | — | — |
| License | MIT | Apache 2.0 | Llama 5 Community |
| Params (total / active) | 744B / 40B | 32B / 3B | 405B / 405B |
| Architecture | MoE | MoE | Dense |
| Mac tier (best fit) | 64GB (Air variant) | 24GB | 128GB (Q2 only) |
| Local speed (tok/s) | ~30 (Air) | ~62 | ~5 |
According to LLMCheck benchmarks, GLM 5.2's 68.5% SWE-Pro is the highest coding score recorded for any open-weights model in our database — but Qwen 4.1's 80% SWE-Verified, achieved with just 3B active parameters, is the single most efficient capability-per-watt result we have measured on Apple Silicon.
A note on benchmark comparability: SWE-Pro and SWE-Verified are different test suites at different difficulty tiers, so the 68.5% and 80% figures are not directly comparable across models; each lab reports the suite it leads on. The only metric in the table reported for all three models is MMLU, where GLM 5.2 edges ahead at 92% against 91% for Llama 5 405B and 90% for Qwen 4.1. AIME is reported for GLM 5.2 alone, so its 94% stands without a head-to-head comparison. With that caveat, the headline holds: GLM 5.2 leads on raw capability wherever scores are reported, Qwen 4.1 wins efficiency, and Llama 5 405B is consistently strong without topping any column.
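To make the comparability point concrete, here is a minimal sketch that keeps only the metrics every model reports and then names the leader on each. The scores are copied from the table above; the function names are illustrative, and `None` stands for a figure that was not reported.

```python
# Scores copied from the comparison table; None marks an unreported figure.
SCORES = {
    "SWE-Pro": {"GLM 5.2": 68.5, "Qwen 4.1": None, "Llama 5 405B": None},
    "SWE-Verified": {"GLM 5.2": None, "Qwen 4.1": 80.0, "Llama 5 405B": 72.0},
    "MMLU": {"GLM 5.2": 92.0, "Qwen 4.1": 90.0, "Llama 5 405B": 91.0},
    "AIME": {"GLM 5.2": 94.0, "Qwen 4.1": None, "Llama 5 405B": None},
}


def comparable_metrics(scores):
    """Return the metrics for which every model has a reported score."""
    return [
        metric
        for metric, row in scores.items()
        if all(value is not None for value in row.values())
    ]


def leader(scores, metric):
    """Return the model with the highest reported score on one metric."""
    reported = {m: v for m, v in scores[metric].items() if v is not None}
    return max(reported, key=reported.get)


for metric in comparable_metrics(SCORES):
    print(metric, "->", leader(SCORES, metric))  # prints: MMLU -> GLM 5.2
```

Only MMLU survives the filter, which is why cross-model rankings on the coding and math rows should be read with care.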
License Three-Way: MIT vs Apache 2.0 vs Llama 5 Community
For anyone deploying commercially, the license is not a footnote — it determines whether you can ship at all. Here the three models fall into two camps: two genuinely permissive licenses and one conditional one.
- GLM 5.2 — MIT. The most permissive option. Use it commercially, modify it, redistribute it, embed it in a closed product. No user caps, no acceptable-use policy bolted on. MIT's only notable omission is an explicit patent grant.
- Qwen 4.1 — Apache 2.0. Equally permissive for practical purposes, and it adds the explicit patent grant that MIT lacks — meaningful protection if you are shipping at scale and worried about patent exposure. This is the gold standard for enterprise open-source adoption.
- Llama 5 405B — Llama 5 Community License. Commercial use is allowed, but with strings: an acceptable-use policy, attribution requirements, and a clause requiring a separate license from Meta once your product crosses 700 million monthly active users. For most teams that cap never bites — but it is a categorically different license from MIT or Apache 2.0.
| License Trait | GLM 5.2 (MIT) | Qwen 4.1 (Apache 2.0) | Llama 5 405B |
|---|---|---|---|
| Commercial use | Yes, unrestricted | Yes, unrestricted | Yes, with conditions |
| User cap | None | None | 700M MAU |
| Patent grant | No | Yes | Limited |
| Acceptable-use policy | None | None | Yes |
| LLMCheck license score | 10 / 10 | 10 / 10 | 7 / 10 |
If your deployment is commercial and you want zero ambiguity, GLM 5.2 and Qwen 4.1 are both clean. Between them, Apache 2.0's patent grant gives Qwen 4.1 a slight edge for risk-averse legal teams.
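The license table can also be treated as a filter. The sketch below encodes the traits from the table and shortlists models against a few deployment requirements. The class, field names, and function are hypothetical, and the rules are a simplification of the summary above, not a reading of the license texts themselves.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LicenseTraits:
    model: str
    license: str
    mau_cap: Optional[int]        # None means no user cap
    patent_grant: str             # "yes", "no", or "limited", as in the table
    acceptable_use_policy: bool


LICENSES = [
    LicenseTraits("GLM 5.2", "MIT", None, "no", False),
    LicenseTraits("Qwen 4.1", "Apache 2.0", None, "yes", False),
    LicenseTraits("Llama 5 405B", "Llama 5 Community", 700_000_000, "limited", True),
]


def shortlist(licenses: List[LicenseTraits], expected_mau: int,
              need_patent_grant: bool = False,
              allow_aup: bool = True) -> List[str]:
    """Return the models whose license traits satisfy the given requirements."""
    picks = []
    for traits in licenses:
        if traits.mau_cap is not None and expected_mau > traits.mau_cap:
            continue  # product would cross the user cap
        if need_patent_grant and traits.patent_grant != "yes":
            continue  # an explicit grant is required
        if not allow_aup and traits.acceptable_use_policy:
            continue  # team cannot accept an acceptable-use policy
        picks.append(traits.model)
    return picks


print(shortlist(LICENSES, expected_mau=1_000_000, need_patent_grant=True))
```

With a requirement for an explicit patent grant, only Qwen 4.1 remains, which matches the slight edge described above.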
The Practicality Axis: What Runs on a Mac
This is the axis that separates a benchmark headline from a usable tool. For local Mac users, two numbers matter more than the headline size: active parameters, which drive inference speed, and total parameters, which set the RAM footprint. Sparse MoE models keep the first small even when the second is large; dense models cannot.
GLM 5.2 — only via the Air variant
The full 744B-A40B flagship is server-class and will not fit on any consumer Mac. The version you actually run is GLM 5.2 Air (106B-A12B), which fits a 64GB Mac at Q4 and runs at roughly 30 tok/s — perfectly usable for interactive coding and reasoning. You give up some of the flagship's peak capability, but you keep Zhipu's architecture and the MIT license.
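The fit claim follows from simple arithmetic: weight memory is roughly total parameters times bits per weight. A minimal sketch, assuming a nominal 4 bits per weight for Q4 (real quantization formats carry some extra overhead, and the KV cache and operating system need room as well); the function names are illustrative.

```python
def weight_footprint_gb(total_params_b: float, bits_per_weight: float) -> float:
    """Approximate weight memory in GB (1 GB = 1e9 bytes).

    total_params_b is the total parameter count in billions.
    """
    return total_params_b * bits_per_weight / 8


def fits(total_params_b: float, bits_per_weight: float, ram_gb: float) -> bool:
    """Crude check: do the weights alone fit in the given RAM?"""
    return weight_footprint_gb(total_params_b, bits_per_weight) < ram_gb


for name, total_b in [("GLM 5.2 (744B)", 744), ("GLM 5.2 Air (106B)", 106)]:
    gb = weight_footprint_gb(total_b, 4)
    print(f"{name}: ~{gb:.0f} GB at 4 bits, fits 64GB: {fits(total_b, 4, 64)}")
```

Under these assumptions the flagship needs about 372 GB for weights alone, while the Air variant needs about 53 GB, which leaves limited headroom on a 64GB machine.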
Qwen 4.1 — the Mac-native frontier model
This is the sweet spot of the whole comparison. Qwen 4.1 32B-A3B activates only 3B parameters per token, so a 24GB Mac runs it at ~62 tok/s — well above reading speed — while the 32B total weight keeps the memory footprint modest. You get frontier-adjacent quality on hardware most developers already own.
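One common back-of-envelope model for why active parameters drive speed: single-stream decoding tends to be memory-bandwidth bound, because each generated token has to read the active weights once. The sketch below computes that ceiling. The 4-bit quantization and the 120 GB/s memory bandwidth are illustrative assumptions, not figures from this comparison, and the function name is hypothetical.

```python
def decode_ceiling_tok_s(active_params_b: float, bits_per_weight: float,
                         bandwidth_gb_s: float) -> float:
    """Upper bound on decode speed if reading active weights is the bottleneck.

    active_params_b: parameters used per token, in billions.
    bandwidth_gb_s: memory bandwidth in GB/s (assumed, hardware dependent).
    """
    gb_read_per_token = active_params_b * bits_per_weight / 8
    return bandwidth_gb_s / gb_read_per_token


# Qwen 4.1 activates 3B parameters per token.
# Assumed for illustration: 4-bit weights and 120 GB/s of memory bandwidth.
ceiling = decode_ceiling_tok_s(active_params_b=3, bits_per_weight=4,
                               bandwidth_gb_s=120)
print(f"ceiling: {ceiling:.0f} tok/s")  # prints: ceiling: 80 tok/s
```

Under those assumptions the ceiling is 80 tok/s, so a measured ~62 tok/s would sit plausibly below it. The same formula shows why a model that activates far more parameters per token slows down in proportion.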
Llama 5 405B — the dense-model trap
Llama 5 405B is the cautionary tale. Because it is dense, all 405B parameters are active on every token — there is no MoE sparsity to exploit. Even on a 128GB Mac you are forced down to Q2 quantization just to load it, and you land at roughly 5 tok/s, slower than most people can comfortably read and with measurable quality loss from the aggressive quantization.
Llama 5 405B at Q2 on a 128GB Mac runs at approximately 5 tok/s. That is below the threshold for interactive use, and Q2 quantization degrades the very frontier quality you downloaded the model for. On Apple Silicon, a sparse MoE like Qwen 4.1 or GLM 5.2 Air delivers more usable intelligence per gigabyte of RAM.
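The lesson is architectural, not brand-specific: in mid-2026, MoE sparsity is what makes large open models runnable on consumer hardware. A 405B dense model computes with all 405B parameters on every token, while a 744B MoE with 40B active computes with roughly a tenth of that, even though the MoE is nominally "bigger." Total size still has to fit in memory, which is why the MoE models that actually run on a Mac are the smaller GLM 5.2 Air and Qwen 4.1.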
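The gap between total and active parameters can be made concrete with a few lines of arithmetic. This sketch uses the parameter counts quoted in this comparison; the dictionary layout and function name are illustrative.

```python
# (total, active) parameter counts in billions, as quoted in this comparison.
MODELS = {
    "GLM 5.2": (744, 40),
    "GLM 5.2 Air": (106, 12),
    "Qwen 4.1": (32, 3),
    "Llama 5 405B": (405, 405),
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the weights that take part in computing each token."""
    return active_b / total_b


for name, (total, active) in MODELS.items():
    share = active_fraction(total, active)
    print(f"{name}: {active}B of {total}B active ({share:.1%})")

# Per-token active parameters: dense Llama 5 405B versus the GLM 5.2 flagship.
dense_vs_moe = MODELS["Llama 5 405B"][1] / MODELS["GLM 5.2"][1]
print(f"dense model uses {dense_vs_moe:.1f}x the active parameters per token")
```

The dense model is the only one at 100%, and it works with roughly ten times the active parameters of the larger MoE on every token.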
Use-Case Picks
Matching the model to the job removes most of the ambiguity. Here is how the three sort out by workload.
- Coding agents and large refactors: GLM 5.2 (or GLM 5.2 Air locally). Its best-in-class 68.5% SWE-Pro makes it the strongest open coder available, and the Air variant brings the same architecture to a 64GB Mac at some cost in peak capability.
- Day-to-day coding on the hardware you own: Qwen 4.1. 80% SWE-Verified at ~62 tok/s on 24GB is the best practical coding experience for local users, full stop.
- Hard math and multi-step reasoning: GLM 5.2, with its 94% AIME. If you need that capability locally and have a 64GB Mac, run the Air variant.
- Commercial deployment with legal scrutiny: Qwen 4.1. Apache 2.0's patent grant and zero user caps make it the cleanest model to ship in a product. GLM 5.2's MIT is a close second.
- On-device / low-RAM Macs (24GB): Qwen 4.1, no contest. It is the only one of the three that fits comfortably and runs fast on mainstream Apple Silicon.
- Frontier scores via API or a GPU cluster: Llama 5 405B. Off the Mac, where its dense weights can spread across server GPUs, it is a legitimate frontier model. Locally, it is the wrong tool.
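The picks above can be sketched as a small lookup with one hardware rule layered on top. The workload keys and function name are hypothetical, and the fallback to Qwen 4.1 on Macs below 64GB is an assumption added for illustration, not something the list states.

```python
# Workload -> pick, following the list above. Keys are illustrative labels.
PICKS = {
    "coding_agent": "GLM 5.2",
    "daily_coding": "Qwen 4.1",
    "math_reasoning": "GLM 5.2",
    "commercial": "Qwen 4.1",
    "low_ram_mac": "Qwen 4.1",
    "cluster_or_api": "Llama 5 405B",
}

AIR_MIN_RAM_GB = 64  # GLM 5.2 Air is described as fitting a 64GB Mac


def pick_model(workload, mac_ram_gb=None):
    """Return a model name; mac_ram_gb=None means a server or hosted endpoint."""
    choice = PICKS[workload]
    if choice == "GLM 5.2" and mac_ram_gb is not None:
        if mac_ram_gb >= AIR_MIN_RAM_GB:
            return "GLM 5.2 Air"  # the flagship itself is server-class
        # Assumption (not from the list): on smaller Macs, fall back to Qwen 4.1.
        return "Qwen 4.1"
    return choice


print(pick_model("coding_agent"))          # prints: GLM 5.2
print(pick_model("coding_agent", 64))      # prints: GLM 5.2 Air
print(pick_model("math_reasoning", 24))    # prints: Qwen 4.1
```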
The Verdict
The open frontier in July 2026 has settled into a clear three-way split, and the right answer depends on what you are optimizing for:
- GLM 5.2 wins raw capability. Best open coding score, the top MMLU of the three, a 94% AIME that neither rival reports a figure against, and an MIT license. If you want the most capable open model and can either run the Air variant locally or hit the flagship via an endpoint, this is it.
- Qwen 4.1 wins practicality. It is the only model here that is simultaneously frontier-adjacent and runnable on a normal Mac — 80% SWE-Verified, ~62 tok/s on 24GB, Apache 2.0, and an LLMCheck Score of 76. For the overwhelming majority of local users, this is the model to download.
- Llama 5 405B is frontier-but-impractical locally. Strong dense-model scores, but the dense architecture makes it a cluster-or-API model. On a Mac it is a benchmark you can read about, not a tool you will use.
If you take one thing away: the most capable open model and the most useful local model are not the same model. GLM 5.2 holds the capability crown, but Qwen 4.1 is the one that turns the open frontier into something you can run on the Mac already on your desk. Llama 5 405B reminds us that parameter count alone no longer predicts what you can actually deploy — architecture does.