Yes: Llama 3.3 70B runs at 18 tok/s on an M4 Ultra with 192 GB RAM, using Q4_K_M quantization via MLX, with a first-token latency of 2.0s.
LLMCheck measured Llama 3.3 70B on the M4 Ultra using its standard methodology: Q4_K_M quantization, a 256-token input, a 512-token output, and 3 runs averaged on a freshly booted system.
| Metric | Value |
|---|---|
| Tokens per second | 18 tok/s |
| Time to first token | 2.0s |
| Quantization | Q4_K_M |
| Minimum RAM | 192 GB |
| Recommended engine | MLX |
| Parameters | 70B |
| Benchmark date | 2026-02 |
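The headline numbers can be reproduced with mlx-lm's bundled generation CLI, which prints prompt and generation throughput after each run. A minimal sketch, assuming the mlx-community 4-bit build named below and a prepared ~256-token prompt file (`prompt_256tok.txt`, hypothetical); installation steps follow in the next section:

```bash
# Average 3 runs, per the methodology above; mlx_lm.generate prints
# tokens-per-sec stats for both the prompt and the generation phase.
# The model identifier and prompt file below are assumptions.
for run in 1 2 3; do
  mlx_lm.generate \
    --model mlx-community/Llama-3.3-70B-Instruct-4bit \
    --max-tokens 512 \
    --prompt "$(cat prompt_256tok.txt)"
done
```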
The recommended engine for Llama 3.3 70B on M4 Ultra is MLX. Install the mlx-lm package with pip and pull the model; the sketch below assumes the mlx-community/Llama-3.3-70B-Instruct-4bit build on Hugging Face (MLX uses its own 4-bit quantization rather than GGUF's Q4_K_M, so this is the closest equivalent to the benchmarked setup):
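```bash
# Install Apple's MLX LM runtime (requires Apple silicon and Python 3.9+)
pip install mlx-lm

# The first run downloads the weights from Hugging Face, then generates;
# the model identifier is an assumption, not confirmed by this page
mlx_lm.generate \
  --model mlx-community/Llama-3.3-70B-Instruct-4bit \
  --prompt "Explain unified memory on Apple silicon in one paragraph." \
  --max-tokens 512
```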
Alternatively, you can use Ollama for a simpler setup:
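```bash
# Pull Meta's Llama 3.3 70B; Ollama's default build is a 4-bit GGUF quant
ollama pull llama3.3:70b

# Start an interactive chat session
ollama run llama3.3:70b
```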
For comparison, here is how Llama 3.3 70B performs across chips and engines (the M4 Ultra row repeats the benchmark above):

| Chip | Speed | First Token | Min RAM | Engine |
|---|---|---|---|---|
| M4 Ultra | 18 tok/s | 2.0s | 192 GB | MLX |
| M5 Max | 15 tok/s | 2.4s | 128 GB | MLX |
| M5 Max | 12 tok/s | 2.8s | 128 GB | Ollama |
To run Llama 3.3 70B on M4 Ultra you need:

- An M4 Ultra Mac with 192 GB of unified memory (the benchmarked configuration)
- The MLX engine (recommended) or Ollama
- The Q4_K_M-quantized weights, roughly 40 GB on disk
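As a quick sanity check before downloading roughly 40 GB of weights, macOS reports installed memory via sysctl:

```bash
# Print installed RAM in GB (hw.memsize is reported in bytes)
echo "$(($(sysctl -n hw.memsize) / 1024 / 1024 / 1024)) GB"
```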
See how Llama 3.3 70B stacks up against other models on your specific Mac hardware.