Yes: Ministral 14B runs at 40 tok/s on an M4 Pro with 24 GB RAM using Q4_K_M quantization via Ollama, with a first-token latency of 0.9 s. It is Mistral's 14B model, balancing size and reasoning quality for 16–24 GB Macs.
LLMCheck measured Ministral 14B on M4 Pro using the standard methodology: Q4_K_M quantization, 256-token input, 512-token output, 3 runs averaged on a freshly-booted system.
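The measurement above can be reproduced locally. The sketch below uses Ollama's HTTP API, whose `/api/generate` response includes `eval_count` (output tokens) and `eval_duration` (nanoseconds); the model tag `ministral:14b` and the prompt are assumptions, since the source does not name them:

```shell
# Sketch of the 3-run throughput average, assuming `ollama serve` is running
# locally and the model has already been pulled. The tag "ministral:14b" is
# an assumption -- check the Ollama library for the published name.
total=0
for run in 1 2 3; do
  resp=$(curl -s http://localhost:11434/api/generate -d '{
    "model": "ministral:14b",
    "prompt": "Describe unified memory on Apple Silicon.",
    "stream": false,
    "options": { "num_predict": 512 }
  }')
  # eval_duration is reported in nanoseconds
  tps=$(echo "$resp" | jq '.eval_count / (.eval_duration / 1e9)')
  echo "run $run: $tps tok/s"
  total=$(echo "$total + $tps" | bc -l)
done
echo "average: $(echo "$total / 3" | bc -l) tok/s"
```

Running on a freshly booted system, as LLMCheck does, avoids memory pressure from other apps skewing the numbers.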
| Metric | Value |
|---|---|
| Tokens per second | 40 tok/s |
| Time to first token | 0.9s |
| Quantization | Q4_K_M |
| Minimum RAM | 24 GB |
| Recommended engine | Ollama |
| Parameters | 14B |
| Benchmark date | 2026-02 |
The recommended engine for Ministral 14B on M4 Pro is Ollama. Install Ollama, then pull the model:
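A minimal install-and-run sequence might look like the following; the Homebrew formula and the `ministral:14b` tag are assumptions, as the source does not give the exact commands:

```shell
# Install Ollama on macOS (Homebrew formula assumed)
brew install ollama

# Download the model; Ollama selects the default quantization (Q4_K_M here).
# The tag "ministral:14b" is an assumption -- confirm via the Ollama library.
ollama pull ministral:14b

# Start an interactive chat session in the terminal
ollama run ministral:14b
```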
Ollama handles quantization automatically: it downloads the Q4_K_M variant (roughly 8–9 GB on disk for a 14B model; the 24 GB figure is the RAM needed at runtime, not the download size) and starts an interactive chat session.
Performance across Apple Silicon chips:

| Chip | Speed | First Token | Min RAM | Engine |
|---|---|---|---|---|
| M5 Max | 58 tok/s | 0.7s | 64 GB | Ollama |
| M4 Pro | 40 tok/s | 0.9s | 24 GB | Ollama |
| M3 | 30 tok/s | 1.2s | 16 GB | LM Studio |
To run Ministral 14B on an M4 Pro you need:

- At least 24 GB of RAM (unified memory)
- Ollama installed (the recommended engine)
- Enough free disk space for the Q4_K_M download (roughly 8–9 GB)
See how Ministral 14B stacks up against other models on your specific Mac hardware.