📊 Updated July 2026 · 79 Models

Open LLM Leaderboard for Mac

According to LLMCheck benchmarks, the most capable open model is Zhipu AI's GLM 5.2 (68.5% on SWE-Bench Pro, MIT) — but it's server-class. The best you can actually run on a Mac is Alibaba's Qwen 4.1 32B-A3B: 80% on SWE-Verified at ~62 tok/s on a 24 GB Mac. All 79 models are scored on capability, Apple Silicon speed, RAM, and license openness.

Every major open-source and frontier AI model ranked by LLMCheck Score — capability, speed on Apple Silicon (M5 Max tok/s), minimum RAM, and license openness. Filter by what fits in your Mac's memory, from 8 GB MacBook Air to 192 GB Mac Studio.


⚡ Top pick per RAM tier

8 GB Mac: Qwen 4 4B · ~135 tok/s · Apache 2.0 · Score 70
16 GB Mac: Phi-5 Medium 14B · ~65 tok/s · MIT · Score 66
24–32 GB Mac: Qwen 4.1 32B-A3B · ~62 tok/s · Apache 2.0 · Score 80
64 GB Mac: GLM 5.2 Air · ~30 tok/s · MIT · Score 63
128 GB Mac: Llama 5 70B · ~18 tok/s · Llama 5 · Score 60
Server Only: GLM 5.2 · 390 GB+ RAM · MIT · 68.5% SWE-Pro

LLMCheck Score = Capability (50 pts) + Mac Speed on M5 Max (25 pts) + Accessibility (15 pts) + License Openness (10 pts).
Ratings: ≥ 60 Excellent · 45–59 Good · < 45 Limited.
Speed estimates (tok/s) are based on M5 Max 128 GB memory bandwidth (~600 GB/s). Models requiring >128 GB show "Server Only".
# · Model · Params · Context · License · Min RAM · M5 Max tok/s · Score

About This Leaderboard

The LLMCheck Leaderboard ranks 79 large language models specifically for Mac users on Apple Silicon. Every model is evaluated on its practical utility for local inference: how capable it is, how fast it runs, how much memory it needs, and how freely it can be used.

LLMCheck Score (0–100) is a composite metric: Capability (50 pts), sourced from Arena AI Elo ratings, MMLU, and coding benchmarks; Mac Speed (25 pts), based on tokens/sec on M5 Max 128 GB; Accessibility (15 pts), inversely scaled to minimum RAM; and License Openness (10 pts). The full formula and per-model source citations are available on our methodology page.

All benchmark data is available for download at llmcheck.net/data/ under a CC BY 4.0 license. Real-world speeds vary by quantization, context length, and software. Models requiring more than 128 GB of unified memory are marked Server Only. All data was last updated in July 2026. Compare models interactively in the model comparator.

Frequently Asked Questions

What is the best LLM for Mac in 2026?

Based on LLMCheck's July 2026 leaderboard data, GLM 5.2 (Zhipu AI, MIT) is the most capable open model released to date and the first to beat GPT-5 and Claude Opus 4.6 on SWE-Bench Pro (68.5%), but it is server-class. The top model you can actually run on a Mac is Qwen 4.1 32B-A3B (score 80, Apache 2.0), which reaches 80% on SWE-Verified at ~62 tok/s on 24 GB Macs. GLM 5.2 Air brings the flagship's reasoning to a 64 GB Mac at ~30 tok/s, Phi-5 Large 28B (MIT) tops the 32 GB tier, and Qwen 4 Coder (Apache 2.0) still leads pure coding at 82% on SWE-Verified. The fastest model is Gemma 4 E2B at ~155 tok/s.

Which LLMs can run on a Mac with 16 GB RAM?

On a 16 GB Mac (M3, M4, or M5 chip), models up to ~12 GB in INT4 quantization run comfortably. Top choices include Qwen 3 14B (~55 tok/s), Gemma 3 12B (~65 tok/s), and Qwen 2.5 14B (~55 tok/s). All 8 GB models — Qwen 3.5 9B, Qwen 3 8B, DeepSeek R1 8B, Llama 3.1 8B, Mistral 7B, Phi-4 Mini — also run extremely fast on 16 GB hardware.

Can DeepSeek R1 run on a Mac?

Yes. DeepSeek R1 has distilled variants for every Mac tier: DeepSeek R1 8B runs on 8 GB RAM (~95 tok/s, MIT), DeepSeek R1 32B requires 32 GB (~25 tok/s, MIT), and DeepSeek R1 70B needs 64 GB. The full DeepSeek R1 671B model is server-only (350 GB+ RAM). All variants are MIT licensed and available as GGUF files on Hugging Face.
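As a rough illustration of picking a variant by memory tier, the sketch below filters the DeepSeek R1 variants by the minimum RAM figures quoted above. The `Variant` class and both helper functions are hypothetical names invented for this example, and 350 is used as a stand-in for "350 GB+".

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Variant:
    name: str
    min_ram_gb: int  # minimum RAM quoted in the answer above


# Figures copied from the answer above; 350 stands in for "350 GB+".
R1_VARIANTS = [
    Variant("DeepSeek R1 8B", 8),
    Variant("DeepSeek R1 32B", 32),
    Variant("DeepSeek R1 70B", 64),
    Variant("DeepSeek R1 671B", 350),
]


def runnable_on(mac_ram_gb: int, variants=R1_VARIANTS) -> list[str]:
    """Names of all variants whose minimum RAM fits in the given Mac."""
    return [v.name for v in variants if v.min_ram_gb <= mac_ram_gb]


def largest_for(mac_ram_gb: int, variants=R1_VARIANTS) -> Optional[str]:
    """Name of the most demanding variant that still fits, or None."""
    fitting = [v for v in variants if v.min_ram_gb <= mac_ram_gb]
    if not fitting:
        return None
    return max(fitting, key=lambda v: v.min_ram_gb).name


if __name__ == "__main__":
    for ram in (8, 16, 32, 64, 128):
        print(f"{ram} GB Mac -> {largest_for(ram)}")
```

The comparison treats the quoted minimum as a hard threshold. In practice, context length and other running apps also consume unified memory, so a model that exactly matches the tier may feel tight.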

What is the fastest LLM on Apple Silicon?

Estimated speeds on an M5 Max 128 GB: Phi-4 Mini ~135 tok/s, Mistral 7B ~118 tok/s, Llama 3.1 8B ~112 tok/s, Qwen 3.5 9B ~100 tok/s, Qwen 3 8B ~95 tok/s. Speed scales with Apple Silicon's unified memory bandwidth — the M5 Ultra 192 GB (~800 GB/s) is roughly 30% faster than M5 Max for the same model.
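The bandwidth relationship can be sketched as a simple proportion. This is a minimal illustration, assuming token generation is purely memory-bandwidth bound so that tok/s scales linearly with GB/s; the function names are made up for this example, and the ~600 GB/s (M5 Max) and ~800 GB/s (M5 Ultra) figures are the estimates quoted on this page.

```python
def scale_tok_s(tok_s_ref: float, bw_ref_gb_s: float, bw_target_gb_s: float) -> float:
    """Scale a tok/s estimate linearly with memory bandwidth.

    Assumption: decoding is bandwidth bound, so speed is proportional to GB/s.
    """
    if bw_ref_gb_s <= 0 or bw_target_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return tok_s_ref * bw_target_gb_s / bw_ref_gb_s


def speedup_percent(bw_ref_gb_s: float, bw_target_gb_s: float) -> float:
    """Relative speedup, in percent, implied by the bandwidth ratio alone."""
    if bw_ref_gb_s <= 0 or bw_target_gb_s <= 0:
        raise ValueError("bandwidth must be positive")
    return (bw_target_gb_s / bw_ref_gb_s - 1.0) * 100.0


if __name__ == "__main__":
    m5_max, m5_ultra = 600.0, 800.0  # GB/s, estimates quoted on this page
    print(f"Implied speedup: {speedup_percent(m5_max, m5_ultra):.0f}%")
    print(f"Phi-4 Mini on M5 Ultra: ~{scale_tok_s(135, m5_max, m5_ultra):.0f} tok/s")
```

Under this linear assumption the ratio works out to about 33%, in line with the "roughly 30%" figure above. Measured gains are usually somewhat lower because compute and software overhead do not scale with bandwidth.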

What is the difference between MoE and dense LLMs?

Mixture-of-Experts (MoE) models have a large total parameter count but activate only a small fraction per token — this makes them faster and more RAM-efficient than a dense model of comparable quality. For example, Qwen 3 30B-A3B has 30B total parameters but only ~3B are active per token, running at ~58 tok/s on a 24 GB Mac. In contrast, a dense 30B model would need 64 GB and run at ~15 tok/s. MoE models are marked with a "MoE" badge in the leaderboard.
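To make the total-versus-active distinction concrete, here is a back-of-the-envelope sketch. It assumes 4-bit weights and counts weights only, ignoring the KV cache, runtime overhead, and layers shared across experts. The `Model` class and its methods are hypothetical names for this example; the parameter counts come from the Qwen 3 30B-A3B example above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Model:
    name: str
    total_params_b: float   # total parameters, in billions
    active_params_b: float  # parameters used per token, in billions

    def weights_gb(self, bits_per_weight: float = 4.0) -> float:
        """Approximate size of all weights that must be resident in memory."""
        return self.total_params_b * bits_per_weight / 8.0

    def read_per_token_gb(self, bits_per_weight: float = 4.0) -> float:
        """Approximate weight data read from memory to generate one token."""
        return self.active_params_b * bits_per_weight / 8.0


if __name__ == "__main__":
    moe = Model("Qwen 3 30B-A3B", total_params_b=30, active_params_b=3)
    dense = Model("dense 30B (illustrative)", total_params_b=30, active_params_b=30)

    for m in (moe, dense):
        print(f"{m.name}: ~{m.weights_gb():.1f} GB of weights, "
              f"~{m.read_per_token_gb():.1f} GB read per token")

    ratio = dense.read_per_token_gb() / moe.read_per_token_gb()
    print(f"Per-token memory traffic ratio (dense / MoE): {ratio:.0f}x")
```

The per-token traffic ratio is an idealized upper bound on the speed advantage. The measured gap quoted above (~58 vs ~15 tok/s) is smaller, which is expected once routing, attention, and cache reads are included.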

Can Kimi K2.5 run on a Mac?

No. Kimi K2.5 is a 1-trillion-parameter MoE model from Moonshot AI that requires approximately 600 GB of RAM in INT4 quantization, far beyond any current Mac, including the 192 GB Mac Studio. For Mac users it is effectively available only through Moonshot AI's API at kimi.ai. The weights are MIT licensed for those with server infrastructure.

Which open source LLMs are best for coding on Mac?

Top coding models by Mac tier: 8 GB — DeepSeek R1 8B (MIT, strong chain-of-thought reasoning) or Qwen 3.5 9B (Apache 2.0, fast); 16–24 GB — Qwen 3 14B or Qwen 3 30B-A3B (Apache 2.0, excellent code generation); 128 GB — GPT-oss 120B (Apache 2.0, OpenAI's first open-weight model). For server-scale, DeepSeek V3 685B MoE is considered one of the best coding models in the world.

What is the LLMCheck Score?

The LLMCheck Score (0–100) ranks models specifically for Mac users. It combines: Capability (50 pts) from benchmark results; Mac Speed (25 pts) from tok/s on M5 Max 128 GB; Accessibility (15 pts) — models needing ≤8 GB RAM score 15, server-only models score 0; License Openness (10 pts) — MIT = 10, Apache 2.0 = 8, others score lower. Scores ≥60 are Excellent, 45–59 are Good, below 45 are Limited.
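The composite can be sketched as a capped sum plus a rating lookup. This is only an illustration of the structure described above, not LLMCheck's actual formula: how capability, speed, and accessibility are scaled into points is not given on this page, so the sketch takes those component points as inputs. The function names and the example component values are hypothetical; the license points and rating thresholds are the ones stated above.

```python
# License points stated above; other licenses "score lower" by an
# unspecified amount, so they must be passed in explicitly.
LICENSE_POINTS = {"MIT": 10, "Apache 2.0": 8}

# Maximum points per component, as stated above.
MAX_POINTS = {"capability": 50, "speed": 25, "accessibility": 15, "license": 10}


def llmcheck_score(capability: float, speed: float,
                   accessibility: float, license_pts: float) -> float:
    """Sum the four component scores after checking each against its cap."""
    parts = {
        "capability": capability,
        "speed": speed,
        "accessibility": accessibility,
        "license": license_pts,
    }
    for key, value in parts.items():
        if not 0 <= value <= MAX_POINTS[key]:
            raise ValueError(f"{key} must be between 0 and {MAX_POINTS[key]}")
    return sum(parts.values())


def rating(score: float) -> str:
    """Map a score to the rating bands: >= 60, 45 to 59, below 45."""
    if score >= 60:
        return "Excellent"
    if score >= 45:
        return "Good"
    return "Limited"


if __name__ == "__main__":
    # Hypothetical component values, chosen only to show the arithmetic.
    total = llmcheck_score(40, 20, 12, LICENSE_POINTS["Apache 2.0"])
    print(total, rating(total))
```

Keeping the component caps in one table makes it easy to check that the maximum possible total is 100.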

🖥️ What Mac runs this?

RAM decides which of these models you can run. Our Mac buying guide ranks every Apple Silicon Mac for local AI by RAM, bandwidth, tok/s and price — best value is the Mac mini M4 Pro from $1,399.
