Yes. Gemma 4.5 27B runs at 26 tok/s on an M4 Pro with 32 GB of RAM, using Q4_K_M quantization via Ollama. Time to first token is 1.0 s. It is an open-source LLM with 27B parameters.
LLMCheck measured Gemma 4.5 27B on M4 Pro using its standard methodology: Q4_K_M quantization, a 256-token input, a 512-token output, and the average of 3 runs on a freshly booted system.
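
The averaging step in that methodology can be sketched as follows. This is a minimal illustration, not LLMCheck's actual harness: it assumes each run is a response from Ollama's `/api/generate` endpoint, which reports `eval_count`, `eval_duration`, and `prompt_eval_duration` (durations in nanoseconds). Treating prompt processing time as time to first token is a simplifying assumption, and the three runs below are synthetic placeholders rather than measured data.

```python
from statistics import mean

NS_PER_S = 1_000_000_000  # Ollama reports durations in nanoseconds


def tokens_per_second(resp: dict) -> float:
    """Decode speed for one run: generated tokens divided by generation time."""
    return resp["eval_count"] / (resp["eval_duration"] / NS_PER_S)


def first_token_seconds(resp: dict) -> float:
    """Assumption: approximate time to first token by prompt processing time."""
    return resp["prompt_eval_duration"] / NS_PER_S


def summarize(runs: list[dict]) -> dict:
    """Average the per-run metrics, as in a '3 runs averaged' methodology."""
    return {
        "tok_per_s": mean([tokens_per_second(r) for r in runs]),
        "ttft_s": mean([first_token_seconds(r) for r in runs]),
    }


# Synthetic stand-ins for three 512-token runs (not LLMCheck's raw data).
runs = [
    {"eval_count": 512, "eval_duration": 20_000_000_000, "prompt_eval_duration": 1_000_000_000},
    {"eval_count": 512, "eval_duration": 19_500_000_000, "prompt_eval_duration": 1_100_000_000},
    {"eval_count": 512, "eval_duration": 19_700_000_000, "prompt_eval_duration": 900_000_000},
]

summary = summarize(runs)
print(f"{summary['tok_per_s']:.1f} tok/s, first token in {summary['ttft_s']:.2f} s")
```

Averaging per-run rates (rather than dividing total tokens by total time) weights every run equally, which is one reasonable reading of "3 runs averaged".
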
| Metric | Value |
|---|---|
| Tokens per second | 26 tok/s |
| Time to first token | 1.0s |
| Quantization | Q4_K_M |
| Minimum RAM | 32 GB |
| Recommended engine | Ollama |
| Parameters | 27B |
| Benchmark date | 2026-07 |
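The recommended engine for Gemma 4.5 27B on M4 Pro is Ollama. Install Ollama, then fetch and start the model with `ollama run` followed by the model's tag.
Ollama does not quantize anything locally: it downloads a pre-quantized Q4_K_M build and starts an interactive chat session. For 27B parameters at roughly 4.8 bits per weight, the weights are about 16 GB on disk; the 32 GB figure is the unified memory to budget for running the model, not the download size.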
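
As a rough sanity check on sizes, the weight footprint of a quantized model can be estimated from its parameter count and average bits per weight. The sketch below assumes about 4.85 bits per weight for Q4_K_M, which is an approximation that varies by architecture, and it covers the weights only: the KV cache, the runtime, and macOS itself all need memory on top of this.

```python
def weight_size_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of quantized weights in GB (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


# Assumption: Q4_K_M averages roughly 4.85 bits per weight.
Q4_K_M_BITS = 4.85
N_PARAMS = 27e9        # 27B parameters, from the benchmark table
MIN_RAM_GB = 32        # minimum RAM, from the benchmark table

weights_gb = weight_size_gb(N_PARAMS, Q4_K_M_BITS)
headroom_gb = MIN_RAM_GB - weights_gb

print(f"weights: about {weights_gb:.1f} GB")
print(f"left for KV cache, runtime, and OS: about {headroom_gb:.1f} GB")
```

Under these assumptions the weights take roughly half of a 32 GB machine, which is consistent with 32 GB being listed as the minimum rather than a comfortable fit.
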
Gemma 4.5 27B on other Apple Silicon chips:

| Chip | Speed | First Token | Min RAM | Engine |
|---|---|---|---|---|
| M5 Max | 42 tok/s | 0.6s | 128 GB | MLX |
| M4 Max | 36 tok/s | 0.7s | 48 GB | MLX |
| M3 Max | 32 tok/s | 0.9s | 64 GB | Ollama |
| M5 Pro | 30 tok/s | 0.9s | 32 GB | Ollama |
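
To turn the speed and first-token figures into something more tangible, the time for a full 512-token reply can be approximated as time to first token plus output tokens divided by decode speed. The sketch below applies that to the figures reported on this page. It assumes a constant decode rate for the whole reply, which real runs only approximate, and the `ChipResult` type and `reply_seconds` helper are names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChipResult:
    chip: str
    tok_per_s: float  # reported decode speed
    ttft_s: float     # reported time to first token


def reply_seconds(result: ChipResult, output_tokens: int = 512) -> float:
    """Estimated wall time for one reply, assuming constant decode speed."""
    return result.ttft_s + output_tokens / result.tok_per_s


# Figures as reported in the tables on this page.
RESULTS = [
    ChipResult("M4 Pro", 26, 1.0),
    ChipResult("M5 Max", 42, 0.6),
    ChipResult("M4 Max", 36, 0.7),
    ChipResult("M3 Max", 32, 0.9),
    ChipResult("M5 Pro", 30, 0.9),
]

for r in sorted(RESULTS, key=reply_seconds):
    print(f"{r.chip}: about {reply_seconds(r):.1f} s for a 512-token reply")
```

Because decode time dominates at this output length, differences in tok/s matter far more than the few tenths of a second separating the first-token figures.
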
To run Gemma 4.5 27B on M4 Pro you need about 32 GB of unified memory, Ollama installed, and the Q4_K_M build of the model.