Yes: Ministral 8B runs at 55 tok/s on an M3 with 16 GB RAM using Q4_K_M quantization via LM Studio, with a first-token latency of 0.7 s. Ministral 8B is Mistral's 8-billion-parameter model, optimized for edge inference with strong instruction following.
LLMCheck measured Ministral 8B on M3 using the standard methodology: Q4_K_M quantization, 256-token input, 512-token output, 3 runs averaged on a freshly booted system.
| Metric | Value |
|---|---|
| Tokens per second | 55 tok/s |
| Time to first token | 0.7s |
| Quantization | Q4_K_M |
| Minimum RAM | 16 GB |
| Recommended engine | LM Studio |
| Parameters | 8B |
| Benchmark date | 2026-01 |
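The two latency numbers above combine into a simple end-to-end estimate: time to first token plus decode time for the full 512-token output. A quick sketch using the benchmarked figures:

```python
# End-to-end latency estimate from the benchmark numbers above:
# 0.7 s to first token, then 512 output tokens at 55 tok/s.
ttft_s = 0.7
tok_per_s = 55
output_tokens = 512

total_s = ttft_s + output_tokens / tok_per_s
print(f"{total_s:.1f} s")  # ≈ 10.0 s for a full 512-token response
```

In practice, generating a full-length response on the M3 takes on the order of ten seconds.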
The recommended engine for Ministral 8B on M3 is LM Studio; if you prefer the command line, the model can also be run with Ollama. Install Ollama, then pull the model:
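A minimal sketch of the Ollama workflow. The model tag `ministral-8b` is an assumption here (check the Ollama model library for the exact name before pulling):

```shell
# Pull the model, then start an interactive chat session.
# Model tag is assumed -- verify it in the Ollama library first.
ollama pull ministral-8b
ollama run ministral-8b
```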
Ollama handles quantization automatically: it will download the Q4_K_M variant (roughly 5 GB on disk) and start an interactive chat session.
| Chip | Speed | First Token | Min RAM | Engine |
|---|---|---|---|---|
| M5 Max | 98 tok/s | 0.4s | 64 GB | Ollama |
| M4 | 72 tok/s | 0.5s | 16 GB | MLX |
| M3 | 55 tok/s | 0.7s | 16 GB | LM Studio |
To run Ministral 8B on M3 you need:

- At least 16 GB of RAM
- The Q4_K_M quantized model (roughly 5 GB on disk)
- An inference engine: LM Studio (recommended) or Ollama
See how Ministral 8B stacks up against other models on your specific Mac hardware.