Run Qwen 2.5 72B on M4 Ultra

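Yes. Qwen 2.5 72B runs at an estimated 15 tok/s on an M4 Ultra with 192 GB of RAM, using Q4_K_M quantization via Ollama. First-token latency is about 2.5 s. This is Alibaba's 72B-parameter Qwen 2.5 model: powerful, but it needs a Mac with 64 GB or more of unified memory, and the figures on this page are for the 192 GB configuration.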

Speed: 15 tok/s
First Token: 2.5 seconds
RAM Needed: 192 GB minimum
Engine: Ollama

Benchmark Details

The LLMCheck index estimates Qwen 2.5 72B performance on M4 Ultra using our published methodology: Q4_K_M quantization, memory-bandwidth scaling, and cross-referenced third-party benchmarks where available. These figures are estimates, not measurements. If you own this configuration, you can submit a real benchmark.

Metric	Value
Tokens per second	15 tok/s
Time to first token	2.5s
Quantization	Q4_K_M
Minimum RAM	192 GB
Recommended engine	Ollama
Parameters	72B
Benchmark date	2026-02
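As a rough illustration of what memory-bandwidth scaling means, the sketch below estimates decode speed from the idea that generating each token reads every weight once. This is not LLMCheck's actual formula. The function names, the bits-per-weight figure, the efficiency factor, and any bandwidth value you pass in are assumptions for illustration only.

```python
def model_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the weights in GB (1 GB = 1e9 bytes)."""
    return params_billion * bits_per_weight / 8.0


def decode_tok_per_s(bandwidth_gb_s: float, size_gb: float,
                     efficiency: float = 1.0) -> float:
    """Bandwidth-bound estimate: tokens/s ~ efficiency * bandwidth / model size.

    `efficiency` is a hypothetical fudge factor (0..1) for real-world overhead.
    """
    if size_gb <= 0:
        raise ValueError("size_gb must be positive")
    return efficiency * bandwidth_gb_s / size_gb


def scale_from_reference(ref_tok_s: float, ref_bandwidth_gb_s: float,
                         target_bandwidth_gb_s: float) -> float:
    """Scale a measured speed on one chip to another by bandwidth ratio."""
    return ref_tok_s * target_bandwidth_gb_s / ref_bandwidth_gb_s


# Example call (all inputs are assumptions, not published specifications):
# size = model_size_gb(72, 4.8)            # ~4.8 bits/weight assumed for Q4_K_M
# est = decode_tok_per_s(800.0, size, 0.8)  # 800 GB/s is a placeholder bandwidth
```

An estimate like this ignores prompt processing, context length, and thermal limits, which is one reason a real benchmark can differ from the figures in the table.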


Setup Guide: Run Qwen 2.5 72B on M4 Ultra

The recommended engine for Qwen 2.5 72B on M4 Ultra is Ollama. Install Ollama, then pull and run the model with a single command:

ollama run qwen2.5:72b

No manual quantization step is needed: the default qwen2.5:72b tag is already quantized, so Ollama downloads the Q4_K_M variant (roughly 47 GB) and starts an interactive chat session.
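Once the model is running, you can measure your own tok/s instead of relying on the estimate. The sketch below assumes Ollama is serving on its default local port (11434) and uses its /api/generate endpoint, whose non-streaming response reports eval_count and durations in nanoseconds. The helper names generation_stats and run_once are made up for this example, and prompt_eval_duration is only a rough proxy for time to first token.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default local endpoint


def generation_stats(resp: dict) -> dict:
    """Derive speed figures from an Ollama /api/generate response dict."""
    eval_seconds = resp["eval_duration"] / 1e9  # durations are in nanoseconds
    return {
        "tok_per_s": resp["eval_count"] / eval_seconds,
        "prompt_eval_s": resp.get("prompt_eval_duration", 0) / 1e9,
        "load_s": resp.get("load_duration", 0) / 1e9,
    }


def run_once(prompt: str, model: str = "qwen2.5:72b") -> dict:
    """Send one non-streaming generation request and return the parsed JSON."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req) as r:
        return json.load(r)


# Usage (requires a running Ollama server with the model pulled):
# print(generation_stats(run_once("Explain unified memory in two sentences.")))
```

Run it a few times and discard the first result, since the first request also includes the time to load the model into memory.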

Performance on Other Apple Silicon Chips

Chip	Speed	First Token	Min RAM	Engine
M5 Max	10 tok/s	3.2s	128 GB	Ollama

System Requirements

To run Qwen 2.5 72B on M4 Ultra you need:

• Mac with M4 Ultra chip (or newer)
• 192 GB unified memory minimum
• macOS 13 Ventura or later
• About 50 GB of free disk space for the Q4_K_M model file (roughly 47 GB)
• Ollama installed (see our Ollama install guide)
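The memory and disk items in the list above can be checked from a script before starting a large download. This is a minimal sketch, assuming macOS (it reads total memory through sysctl hw.memsize). The function names are hypothetical, and the thresholds are passed in by the caller rather than hard-coded, so take them from the list above.

```python
import shutil
import subprocess


def total_ram_gb() -> float:
    """Total physical memory via `sysctl -n hw.memsize` (macOS only).

    Divides by 2**30 so a machine sold as 192 GB reports about 192.
    """
    out = subprocess.run(
        ["sysctl", "-n", "hw.memsize"],
        capture_output=True, text=True, check=True,
    )
    return int(out.stdout.strip()) / 2**30


def free_disk_gb(path: str = "/") -> float:
    """Free space on the volume containing `path`, in decimal GB."""
    return shutil.disk_usage(path).free / 1e9


def check_requirements(ram_gb: float, disk_gb: float,
                       min_ram_gb: float, min_disk_gb: float) -> list[str]:
    """Return a list of human-readable problems; empty means all checks pass."""
    problems = []
    if ram_gb < min_ram_gb:
        problems.append(f"RAM: {ram_gb:.0f} GB found, {min_ram_gb:.0f} GB needed")
    if disk_gb < min_disk_gb:
        problems.append(f"Disk: {disk_gb:.0f} GB free, {min_disk_gb:.0f} GB needed")
    return problems


# Usage on a Mac (thresholds are the caller's choice):
# print(check_requirements(total_ram_gb(), free_disk_gb(), 192, 50))
```

Keeping the comparison in check_requirements separate from the system calls makes the logic easy to test on any machine.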

