According to LLMCheck benchmarks, the most capable open model is Zhipu AI's GLM 5.2 (68.5% on SWE-Bench Pro, MIT) — but it's server-class. The best you can actually run on a Mac is Alibaba's Qwen 4.1 32B-A3B: 80% on SWE-Verified at ~62 tok/s on a 24 GB Mac. All 79 models are scored on capability, Apple Silicon speed, RAM, and license openness.
Every major open-source and frontier AI model ranked by LLMCheck Score — capability, speed on Apple Silicon (M5 Max tok/s), minimum RAM, and license openness. Filter by what fits in your Mac's memory, from 8 GB MacBook Air to 192 GB Mac Studio.
Top pick per RAM tier
The model you can run is decided by unified memory — more RAM, bigger models. Here's a current Apple Silicon Mac for each tier below:
As an Amazon Associate, LLMCheck earns from qualifying purchases. Affiliate links — no extra cost to you, and they keep our benchmarks free. Rankings are never influenced by affiliate relationships.
| # | Model | Params | Context | License | Min RAM | M5 Max tok/s | Score |
|---|---|---|---|---|---|---|---|
The LLMCheck Leaderboard ranks 79 large language models specifically for Mac users on Apple Silicon. According to LLMCheck benchmarks, every model is evaluated on its practical utility for local inference — how capable it is, how fast it runs, how much memory it needs, and how freely it can be used.
LLMCheck Score (0–100) is a composite metric: Capability (50 pts), sourced from Arena AI Elo ratings, MMLU, and coding benchmarks; Mac Speed (25 pts), based on tokens/sec on an M5 Max 128 GB; Accessibility (15 pts), inversely scaled to minimum RAM; and License Openness (10 pts). The full formula and per-model source citations are available on our methodology page.
All benchmark data is available for download at llmcheck.net/data/ under a CC BY 4.0 license. Real-world speeds vary by quantization, context length, and software. Models requiring more than 128 GB of unified memory are marked Server Only. All data was last updated in July 2026. Compare models interactively at the model comparator.
Based on LLMCheck's July 2026 leaderboard data, GLM 5.2 (Zhipu AI, MIT) is the most capable open model ever released — the first to beat GPT-5 and Claude Opus 4.6 on SWE-Bench Pro (68.5%), though server-class. The top model you can actually run on a Mac is Qwen 4.1 32B-A3B (score 80, Apache 2.0) — 80% SWE-Verified at ~62 tok/s on 24 GB Macs. GLM 5.2 Air brings the flagship's reasoning to a 64 GB Mac at ~30 tok/s, Phi-5 Large 28B (MIT) tops the 32 GB tier, and Qwen 4 Coder (Apache 2.0) still leads pure coding at 82% SWE-V. The fastest model is Gemma 4 E2B at ~155 tok/s.
On a 16 GB Mac (M3, M4, or M5 chip), models up to ~12 GB in INT4 quantization run comfortably. Top choices include Qwen 3 14B (~55 tok/s), Gemma 3 12B (~65 tok/s), and Qwen 2.5 14B (~55 tok/s). All 8 GB models — Qwen 3.5 9B, Qwen 3 8B, DeepSeek R1 8B, Llama 3.1 8B, Mistral 7B, Phi-4 Mini — also run extremely fast on 16 GB hardware.
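As a rough illustration of the sizing rule above, the sketch below estimates a model's INT4 weight footprint and checks it against the ~12 GB budget quoted for a 16 GB Mac. The function names and the 0.5 bytes-per-parameter figure are assumptions made for illustration, not LLMCheck's method; real GGUF files vary with the quantization scheme, and the KV cache needs memory on top of the weights.

```python
def int4_weight_gb(params_billions: float, bytes_per_param: float = 0.5) -> float:
    """Approximate weight size in GB (1 GB = 1e9 bytes) for a quantized model.

    Assumes roughly half a byte per parameter for INT4; this ignores
    per-block scales, embeddings kept at higher precision, and the KV cache.
    """
    return params_billions * bytes_per_param


def fits(params_billions: float, budget_gb: float) -> bool:
    """True if the estimated weights fit inside the given memory budget."""
    return int4_weight_gb(params_billions) <= budget_gb


BUDGET_16GB_MAC = 12.0  # the "~12 GB in INT4" figure from the text

# Parameter counts are read off the model names mentioned on this page.
for name, params in [("Qwen 3 14B", 14), ("Gemma 3 12B", 12), ("DeepSeek R1 32B", 32)]:
    size = int4_weight_gb(params)
    verdict = "fits" if fits(params, BUDGET_16GB_MAC) else "too large"
    print(f"{name}: ~{size:.1f} GB of weights, {verdict} in {BUDGET_16GB_MAC:.0f} GB")
```

Under these assumptions the two 12B to 14B models land well inside the budget, while a 32B model does not, which matches the tiers described on this page.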
Yes. DeepSeek R1 has distilled variants for every Mac tier: DeepSeek R1 8B runs on 8 GB RAM (~95 tok/s, MIT), DeepSeek R1 32B requires 32 GB (~25 tok/s, MIT), and DeepSeek R1 70B needs 64 GB. The full DeepSeek R1 671B model is server-only (350 GB+ RAM). All variants are MIT licensed and available as GGUF files on Hugging Face.
Estimated speeds on an M5 Max 128 GB: Phi-4 Mini ~135 tok/s, Mistral 7B ~118 tok/s, Llama 3.1 8B ~112 tok/s, Qwen 3.5 9B ~100 tok/s, Qwen 3 8B ~95 tok/s. Speed scales with Apple Silicon's unified memory bandwidth — the M5 Ultra 192 GB (~800 GB/s) is roughly 30% faster than M5 Max for the same model.
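To make the scaling claim concrete, here is a minimal sketch that applies the "roughly 30% faster" figure to the M5 Max estimates listed above. It assumes the speedup applies uniformly to every model, which is a simplification; the `ULTRA_SPEEDUP` constant and the function name are illustrative, not part of any LLMCheck tool.

```python
# Estimated M5 Max 128 GB speeds from the text, in tokens per second.
M5_MAX_TOK_S = {
    "Phi-4 Mini": 135,
    "Mistral 7B": 118,
    "Llama 3.1 8B": 112,
    "Qwen 3.5 9B": 100,
    "Qwen 3 8B": 95,
}

ULTRA_SPEEDUP = 1.30  # "roughly 30% faster", assumed uniform across models


def estimate_ultra_tok_s(max_tok_s: float, speedup: float = ULTRA_SPEEDUP) -> float:
    """Scale an M5 Max estimate by the assumed M5 Ultra speedup factor."""
    return max_tok_s * speedup


for name, tok_s in M5_MAX_TOK_S.items():
    print(f"{name}: ~{tok_s} tok/s on M5 Max, ~{estimate_ultra_tok_s(tok_s):.0f} tok/s on M5 Ultra")
```

These are projections from an already estimated baseline, so treat them as ballpark figures rather than measurements.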
Mixture-of-Experts (MoE) models have a large total parameter count but activate only a small fraction per token — this makes them faster and more RAM-efficient than a dense model of comparable quality. For example, Qwen 3 30B-A3B has 30B total parameters but only ~3B are active per token, running at ~58 tok/s on a 24 GB Mac. In contrast, a dense 30B model would need 64 GB and run at ~15 tok/s. MoE models are marked with a "MoE" badge in the leaderboard.
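The total-versus-active distinction can be sketched in a few lines. The `ModelShape` class below is a hypothetical helper written for this illustration; the parameter counts and speeds are the ones quoted in the paragraph above, and the dense 30B entry is the same hypothetical comparison model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelShape:
    name: str
    total_params_b: float   # all parameters, in billions (these must sit in RAM)
    active_params_b: float  # parameters used per token, in billions

    @property
    def active_fraction(self) -> float:
        """Share of the weights that take part in generating each token."""
        return self.active_params_b / self.total_params_b


moe = ModelShape("Qwen 3 30B-A3B", total_params_b=30, active_params_b=3)
dense = ModelShape("dense 30B (hypothetical)", total_params_b=30, active_params_b=30)

# Speeds quoted in the text above, in tokens per second.
quoted_speedup = 58 / 15

print(f"{moe.name}: {moe.active_fraction:.0%} of weights active per token")
print(f"{dense.name}: {dense.active_fraction:.0%} of weights active per token")
print(f"Quoted MoE speedup over dense: {quoted_speedup:.1f}x")
```

The quoted speedup is smaller than the tenfold drop in active weights, which suggests that factors other than active parameter count also affect throughput.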
No. Kimi K2.5 is a 1-trillion-parameter MoE model from Moonshot AI that requires approximately 600 GB of RAM in INT4 quantization, far beyond any current Mac, including the 192 GB Mac Studio. Mac users can reach it only through Moonshot AI's API at kimi.ai; the weights are MIT licensed for those with server infrastructure.
Top coding models by Mac tier: 8 GB — DeepSeek R1 8B (MIT, strong chain-of-thought reasoning) or Qwen 3.5 9B (Apache 2.0, fast); 16–24 GB — Qwen 3 14B or Qwen 3 30B-A3B (Apache 2.0, excellent code generation); 128 GB — GPT-oss 120B (Apache 2.0, OpenAI's first open-weight model). For server-scale, DeepSeek V3 685B MoE is considered one of the best coding models in the world.
The LLMCheck Score (0–100) ranks models specifically for Mac users. It combines: Capability (50 pts) from benchmark results; Mac Speed (25 pts) from tok/s on M5 Max 128 GB; Accessibility (15 pts) — models needing ≤8 GB RAM score 15, server-only models score 0; License Openness (10 pts) — MIT = 10, Apache 2.0 = 8, others score lower. Scores ≥60 are Excellent, 45–59 are Good, below 45 are Limited.
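The composition described above can be sketched as a simple sum with rating bands. This is a minimal sketch, not the official formula: the text does not say how raw benchmarks, tok/s, or RAM map to points, so the function takes already-computed component points as inputs, and all names here are hypothetical.

```python
# Maximum points per component, as stated in the text above.
MAX_POINTS = {"capability": 50, "mac_speed": 25, "accessibility": 15, "license": 10}

# License points given in the text; other licenses "score lower" (values not stated).
LICENSE_POINTS = {"MIT": 10, "Apache 2.0": 8}


def llmcheck_score(capability: float, mac_speed: float,
                   accessibility: float, license_pts: float) -> float:
    """Sum the four component scores after checking each against its cap."""
    parts = {
        "capability": capability,
        "mac_speed": mac_speed,
        "accessibility": accessibility,
        "license": license_pts,
    }
    for name, value in parts.items():
        if not 0 <= value <= MAX_POINTS[name]:
            raise ValueError(f"{name} must be between 0 and {MAX_POINTS[name]}")
    return sum(parts.values())


def rating(score: float) -> str:
    """Map a 0-100 score to the bands given in the text."""
    if score >= 60:
        return "Excellent"
    if score >= 45:
        return "Good"
    return "Limited"


# Hypothetical component values for an Apache 2.0 model (not real LLMCheck data).
example = llmcheck_score(40, 20, 12, LICENSE_POINTS["Apache 2.0"])
print(example, rating(example))
```

Keeping the per-component caps in one table makes it easy to check that the weights still add up to 100 if the methodology changes.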
RAM decides which of these models you can run. Our Mac buying guide ranks every Apple Silicon Mac for local AI by RAM, bandwidth, tok/s and price — best value is the Mac mini M4 Pro from $1,399.
See the best Mac for local LLMs →