Every major open-source and frontier AI model ranked by LLMCheck Score — capability, speed on Apple Silicon (M5 Max tok/s), minimum RAM, and license openness. Filter by what fits in your Mac's memory, from 8 GB MacBook Air to 192 GB Mac Studio.
The LLMCheck Leaderboard ranks 34 large language models specifically for Mac users on Apple Silicon. Unlike cloud-focused benchmarks, every model is evaluated on its practical utility for local inference — how capable it is, how fast it runs, how much memory it needs, and how freely it can be used.
LLMCheck Score (0–100) is a composite metric:

- Capability (50 pts): derived from public benchmark results across reasoning, coding, and instruction-following tasks
- Mac Speed (25 pts): based on estimated tokens/sec on an M5 Max 128 GB (≈600 GB/s memory bandwidth)
- Accessibility (15 pts): inversely scaled to minimum INT4 RAM; 8 GB models score 15, server-only models score 0
- License Openness (10 pts): MIT = 10, Apache 2.0 = 8, Gemma = 6, Meta Custom = 5
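As a back-of-the-envelope check, the composite can be sketched in Python. The 50/25/15/10 weights and the license points come from the methodology above; the linear accessibility falloff between 8 GB and 192 GB is an assumption, since only the endpoints (8 GB = 15 pts, server-only = 0) are stated.

```python
# Sketch of the LLMCheck Score composite. The capability and speed
# sub-scores are taken as inputs here because their exact normalization
# is not published; the falloff curve for accessibility is an assumption.

LICENSE_POINTS = {"MIT": 10, "Apache 2.0": 8, "Gemma": 6, "Meta Custom": 5}

def llmcheck_score(capability_pts, mac_speed_pts, min_ram_gb, license_name,
                   server_only=False):
    """Combine the four components into a 0-100 score."""
    capability = min(capability_pts, 50)    # benchmark-derived, capped at 50
    speed = min(mac_speed_pts, 25)          # M5 Max tok/s-derived, capped at 25
    if server_only:
        accessibility = 0.0                 # will not fit on any current Mac
    else:
        # 8 GB models get the full 15 pts; linear decline to 0 at 192 GB
        # is an assumed interpolation between the two stated endpoints.
        accessibility = max(0.0, 15 * (1 - (min_ram_gb - 8) / (192 - 8)))
    openness = LICENSE_POINTS.get(license_name, 4)  # other licenses score lower
    return round(capability + speed + accessibility + openness, 1)
```

For example, a model with 40 capability points, 20 speed points, an 8 GB minimum, and an Apache 2.0 license would land at 40 + 20 + 15 + 8 = 83.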
Token-per-second figures are estimates based on model weight size and Apple Silicon memory bandwidth. Real-world speeds vary by quantization level, context length, and software (LM Studio, Ollama, llama.cpp). Models requiring more than 128 GB unified memory are marked Server Only and show no tok/s value. All data updated March 2026.
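The estimation approach described above can be sketched in a few lines: in bandwidth-bound decoding, every generated token streams the full weight set through memory once, so tok/s is roughly effective bandwidth divided by weight size. The 0.5 bytes/parameter (INT4) and the 0.7 efficiency factor below are illustrative assumptions, not LLMCheck's published constants.

```python
def estimate_tok_per_s(params_b, bandwidth_gb_s=600, bytes_per_param=0.5,
                       efficiency=0.7):
    """Rough bandwidth-bound decode speed for a dense model.

    params_b: parameter count in billions. bandwidth_gb_s defaults to the
    M5 Max's ~600 GB/s from the methodology. The efficiency factor is an
    assumed fudge covering KV-cache traffic and software overhead.
    """
    weight_gb = params_b * bytes_per_param  # INT4 weights: ~0.5 bytes/param
    return bandwidth_gb_s * efficiency / weight_gb
```

For an 8B model this gives 600 × 0.7 / 4 = 105 tok/s, in the same range as the ~95–112 tok/s figures quoted for the 8B models; raising the bandwidth to the M5 Ultra's ~800 GB/s scales the estimate by 800/600, about a third faster.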
By LLMCheck Score, Qwen 3.5 9B (score 66) is the top-ranked model — it runs on any Mac with 8 GB RAM at ~100 tok/s and is Apache 2.0 licensed. For the best capability-speed balance on a 24–32 GB Mac, Qwen 3 30B-A3B (MoE, ~58 tok/s) and Qwen 3.5 35B (~45 tok/s) are excellent choices. The fastest small model is Phi-4 Mini at ~135 tok/s with an MIT license.
On a 16 GB Mac (M3, M4, or M5 chip), models up to ~12 GB in INT4 quantization run comfortably. Top choices include Qwen 3 14B (~55 tok/s), Gemma 3 12B (~65 tok/s), and Qwen 2.5 14B (~55 tok/s). All 8 GB models — Qwen 3.5 9B, Qwen 3 8B, DeepSeek R1 8B, Llama 3.1 8B, Mistral 7B, Phi-4 Mini — also run extremely fast on 16 GB hardware.
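The "fits in RAM" rule of thumb above can be made concrete: INT4 weights take roughly half a byte per parameter, and some headroom must stay free for macOS and the inference app. The 4 GB overhead below is an assumed figure, not an LLMCheck constant.

```python
def fits_in_ram(params_b, ram_gb, bytes_per_param=0.5, os_overhead_gb=4):
    """Check whether an INT4-quantized dense model fits in unified memory.

    Weight footprint is ~0.5 bytes/parameter at INT4; the 4 GB reserved
    for the OS and app is an assumed rule of thumb.
    """
    return params_b * bytes_per_param + os_overhead_gb <= ram_gb
```

A 14B model (~7 GB of weights) comfortably fits a 16 GB Mac, while a 70B model (~35 GB) needs the 64 GB tier, consistent with the tiers quoted in this guide.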
Yes. DeepSeek R1 runs on a Mac via its distilled variants, which cover every tier: DeepSeek R1 8B runs on 8 GB RAM (~95 tok/s), DeepSeek R1 32B requires 32 GB (~25 tok/s), and DeepSeek R1 70B needs 64 GB. The full DeepSeek R1 671B model is server-only (350 GB+ RAM). All variants are MIT licensed and available as GGUF files on Hugging Face.
Estimated speeds on an M5 Max 128 GB: Phi-4 Mini ~135 tok/s, Mistral 7B ~118 tok/s, Llama 3.1 8B ~112 tok/s, Qwen 3.5 9B ~100 tok/s, Qwen 3 8B ~95 tok/s. Speed scales with Apple Silicon's unified memory bandwidth — the M5 Ultra 192 GB (~800 GB/s) is roughly 30% faster than M5 Max for the same model.
Mixture-of-Experts (MoE) models have a large total parameter count but activate only a small fraction per token, which makes them faster and more RAM-efficient than a dense model of comparable quality. For example, Qwen 3 30B-A3B has 30B total parameters but only ~3B active per token, running at ~58 tok/s on a 24 GB Mac; a dense model of similar capability (roughly 70B parameters) would need 64 GB and run at ~15 tok/s. MoE models are marked with a "MoE" badge in the leaderboard.
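The MoE trade-off can be sketched numerically: resident RAM scales with total parameters (every expert must stay loaded), while per-token memory traffic, and hence the decode-speed ceiling, scales with active parameters. This is a bandwidth upper bound under assumed INT4 weights and a 0.7 efficiency factor; real speeds such as the ~58 tok/s quoted for Qwen 3 30B-A3B sit well below it because of expert routing and cache traffic.

```python
def moe_profile(total_params_b, active_params_b, bandwidth_gb_s=600,
                bytes_per_param=0.5, efficiency=0.7):
    """Return (min weight RAM in GB, bandwidth-bound tok/s ceiling) for an MoE.

    RAM is driven by TOTAL parameters (all experts resident); the speed
    ceiling is driven by ACTIVE parameters per token. Constants are
    illustrative assumptions, not measured values.
    """
    ram_gb = total_params_b * bytes_per_param
    ceiling = bandwidth_gb_s * efficiency / (active_params_b * bytes_per_param)
    return ram_gb, ceiling
```

For Qwen 3 30B-A3B this gives ~15 GB of weights with a speed ceiling set by only ~1.5 GB of active weights per token; the key point is the ratio, since ten times fewer active parameters means roughly ten times less memory traffic per generated token than a comparable dense model.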
No. Kimi K2.5 is a 1-trillion-parameter MoE model from Moonshot AI that requires approximately 600 GB of RAM even in INT4 quantization, far beyond any current Mac, including the M5 Ultra with 192 GB. For Mac users it is available only via Moonshot AI's API at kimi.ai; the weights themselves are MIT licensed for those with server infrastructure.
Top coding models by Mac tier: 8 GB — DeepSeek R1 8B (MIT, strong chain-of-thought reasoning) or Qwen 3.5 9B (Apache 2.0, fast); 16–24 GB — Qwen 3 14B or Qwen 3 30B-A3B (Apache 2.0, excellent code generation); 128 GB — GPT-oss 120B (Apache 2.0, OpenAI's first open-weight model). For server-scale, DeepSeek V3 685B MoE is considered one of the best coding models in the world.
The LLMCheck Score (0–100) ranks models specifically for Mac users. It combines:

- Capability (50 pts): from benchmark results
- Mac Speed (25 pts): from tok/s on an M5 Max 128 GB
- Accessibility (15 pts): models needing ≤8 GB RAM score 15, server-only models score 0
- License Openness (10 pts): MIT = 10, Apache 2.0 = 8, others score lower

Scores of 60 or above are rated Excellent, 45–59 Good, and below 45 Limited.
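The rating bands are a direct transcription of the thresholds above:

```python
def score_tier(score):
    """Map an LLMCheck Score (0-100) to its rating band."""
    if score >= 60:
        return "Excellent"
    if score >= 45:
        return "Good"
    return "Limited"
```

By these bands, the top-ranked Qwen 3.5 9B at a score of 66 rates Excellent.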