Run Gemma 4.5 27B on M3 Max

Yes. Gemma 4.5 27B runs at 32 tok/s on an M3 Max with 64 GB of RAM, using Q4_K_M quantization via Ollama. First-token latency is 0.9 s. It is an open-source LLM with 27B parameters.

Speed: 32 tok/s
First Token: 0.9 seconds
RAM Needed: 64 GB minimum
Engine: Ollama (recommended)

Benchmark Details

LLMCheck measured Gemma 4.5 27B on M3 Max using the standard methodology: Q4_K_M quantization, 256-token input, 512-token output, 3 runs averaged on a freshly-booted system.

Metric | Value
Tokens per second | 32 tok/s
Time to first token | 0.9 s
Quantization | Q4_K_M
Minimum RAM | 64 GB
Recommended engine | Ollama
Parameters | 27B
Benchmark date | 2026-07
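
For readers who want to reproduce a measurement like this locally, the sketch below queries a running Ollama server and averages three runs. It is not LLMCheck's harness. It assumes Ollama's default local endpoint and relies on the timing fields Ollama returns with a non-streaming response (`eval_count`, `eval_duration`, `prompt_eval_duration`, `load_duration`, with durations in nanoseconds). The prompt is a placeholder of roughly 256 tokens, and treating load time plus prompt evaluation time as first-token latency is an approximation.

```python
import json
import statistics
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # assumed default local endpoint
MODEL_TAG = "gemma-45-27b"  # tag taken from the setup command on this page


def tokens_per_second(eval_count: int, eval_duration_ns: int) -> float:
    """Decode speed: generated tokens divided by generation time in seconds."""
    return eval_count / (eval_duration_ns / 1e9)


def first_token_seconds(response: dict) -> float:
    """Approximate first-token latency as load time plus prompt evaluation time."""
    load_ns = response.get("load_duration", 0)
    prompt_ns = response.get("prompt_eval_duration", 0)
    return (load_ns + prompt_ns) / 1e9


def summarize(responses: list) -> dict:
    """Average speed and first-token latency over several runs."""
    speeds = [tokens_per_second(r["eval_count"], r["eval_duration"]) for r in responses]
    latencies = [first_token_seconds(r) for r in responses]
    return {
        "tok_per_s": statistics.mean(speeds),
        "first_token_s": statistics.mean(latencies),
    }


def run_once(prompt: str, max_tokens: int = 512) -> dict:
    """Send one non-streaming generation request and return the parsed JSON reply."""
    payload = json.dumps({
        "model": MODEL_TAG,
        "prompt": prompt,
        "stream": False,
        "options": {"num_predict": max_tokens},  # cap output at 512 tokens
    }).encode()
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


if __name__ == "__main__":
    prompt = "benchmark " * 256  # placeholder input, not LLMCheck's actual prompt
    print(summarize([run_once(prompt) for _ in range(3)]))
```

Results from a sketch like this will vary with background load, context length, and whether the model is already resident in memory, so they may not match the published figures exactly.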


Setup Guide: Run Gemma 4.5 27B on M3 Max

The recommended engine for Gemma 4.5 27B on M3 Max is Ollama. Install Ollama, then pull the model:

ollama run gemma-45-27b

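Ollama does not quantize the model on your machine. It downloads a pre-quantized Q4_K_M build and starts an interactive chat session. The 64 GB figure on this page is the memory requirement, not the download size: at roughly 4 to 5 bits per weight, a 27B model is on the order of 16 GB on disk.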
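
As a rough cross-check on size, the weights of a quantized model can be estimated from the parameter count and the average bits per weight. The sketch below does that arithmetic for a 27B model. The 4.8 bits-per-weight value is an assumed average for Q4_K_M, not a figure from this page, and the estimate covers weights only. It excludes the KV cache and runtime overhead, which is why the memory requirement is higher than the file size.

```python
def quantized_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight size in GB (10^9 bytes): parameters x bits per weight / 8."""
    return params_billion * bits_per_weight / 8


# 4.8 bits per weight is an assumed average for Q4_K_M (hypothetical input).
print(f"Q4_K_M: ~{quantized_size_gb(27, 4.8):.1f} GB")
# 16 bits per weight corresponds to unquantized FP16 weights, for comparison.
print(f"FP16:   ~{quantized_size_gb(27, 16):.1f} GB")
```

Under these assumptions the Q4_K_M weights come to about 16 GB, well below 64 GB.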

Performance on Other Apple Silicon Chips

Chip | Speed | First Token | Min RAM | Engine
M5 Max | 42 tok/s | 0.6 s | 128 GB | MLX
M4 Max | 36 tok/s | 0.7 s | 48 GB | MLX
M5 Pro | 30 tok/s | 0.9 s | 32 GB | Ollama
M4 Pro | 26 tok/s | 1.0 s | 32 GB | Ollama
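
To see what these figures mean in practice, the sketch below estimates how long a 512-token reply would take on each chip, as first-token latency plus output length divided by speed. The speed and latency values are copied from this page. The formula is a simplification that assumes a constant decode rate, and `response_seconds` is a name introduced here for illustration.

```python
# (speed in tok/s, first-token latency in s), copied from the figures on this page
CHIPS = {
    "M5 Max": (42, 0.6),
    "M4 Max": (36, 0.7),
    "M3 Max": (32, 0.9),
    "M5 Pro": (30, 0.9),
    "M4 Pro": (26, 1.0),
}


def response_seconds(speed_tok_s: float, first_token_s: float, output_tokens: int = 512) -> float:
    """Estimated wall-clock time: first-token latency plus steady-state decode time."""
    return first_token_s + output_tokens / speed_tok_s


for chip, (speed, ttft) in CHIPS.items():
    print(f"{chip}: {response_seconds(speed, ttft):.1f} s for 512 tokens")
```

For the M3 Max this works out to 0.9 + 512 / 32 = 16.9 seconds.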

System Requirements

To run Gemma 4.5 27B on M3 Max you need at least 64 GB of unified memory and Ollama installed.
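
A quick way to check a machine against the 64 GB memory figure is sketched below. It reads physical memory through POSIX `sysconf`, which is assumed here to be available on macOS and Linux, and compares it with the requirement counted in GiB. The helper names are illustrative, not part of any LLMCheck tool.

```python
import os

REQUIRED_GB = 64  # minimum memory figure from this page


def meets_requirement(total_bytes: int, required_gb: int = REQUIRED_GB) -> bool:
    """True if installed memory is at least required_gb, counted in GiB (2**30 bytes)."""
    return total_bytes >= required_gb * 2**30


def installed_memory_bytes() -> int:
    """Physical memory via POSIX sysconf (assumed available on macOS and Linux)."""
    return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")


if __name__ == "__main__":
    total = installed_memory_bytes()
    print(f"{total / 2**30:.0f} GiB installed; enough: {meets_requirement(total)}")
```

This checks installed memory only. Memory already used by other applications reduces what is available to the model.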

