Yes. GLM 5.2 Air, an open-source LLM with 106B parameters, runs at 26 tok/s on an M4 Max with 128 GB of RAM, using Q4_K_M quantization via MLX. Time to first token is 1.3 s.
LLMCheck measured GLM 5.2 Air on M4 Max using the standard methodology: Q4_K_M quantization, 256-token input, 512-token output, 3 runs averaged on a freshly-booted system.
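
The averaging step in this methodology can be sketched as follows. This is a minimal illustration, not LLMCheck's actual harness: the `Run` type, the `summarize` helper, and the timing values are hypothetical, and it assumes throughput counts only the decode phase (output tokens divided by the time elapsed after the first token).

```python
from dataclasses import dataclass
from statistics import mean

OUTPUT_TOKENS = 512  # output length stated in the methodology


@dataclass
class Run:
    ttft_s: float   # time to first token, in seconds
    total_s: float  # wall-clock time for the whole generation, in seconds


def decode_tok_per_s(run: Run, output_tokens: int = OUTPUT_TOKENS) -> float:
    """Decode throughput, assuming prefill time (TTFT) is excluded."""
    return output_tokens / (run.total_s - run.ttft_s)


def summarize(runs: list[Run]) -> tuple[float, float]:
    """Return (mean tok/s, mean TTFT in seconds) across all runs."""
    speeds = [decode_tok_per_s(r) for r in runs]
    ttfts = [r.ttft_s for r in runs]
    return mean(speeds), mean(ttfts)


# Hypothetical timings for three runs, made up for illustration only.
runs = [Run(1.2, 21.2), Run(1.3, 21.3), Run(1.4, 21.4)]
tok_s, ttft = summarize(runs)
print(f"{tok_s:.1f} tok/s, {ttft:.1f} s to first token")
```

Averaging several runs smooths out run-to-run variation such as thermal state or background activity, which is presumably also why the methodology specifies a freshly booted system.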
| Metric | Value |
|---|---|
| Tokens per second | 26 tok/s |
| Time to first token | 1.3s |
| Quantization | Q4_K_M |
| Minimum RAM | 128 GB |
| Recommended engine | MLX |
| Parameters | 106B |
| Benchmark date | 2026-07 |
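
To turn these two figures into a rough wall-clock estimate for a full response, add the time to first token to the output length divided by the decode speed. The sketch below assumes a steady decode rate for the whole response, which is a simplification; the function name is hypothetical.

```python
def estimated_response_seconds(
    output_tokens: int,
    tok_per_s: float = 26.0,  # measured decode speed from the table
    ttft_s: float = 1.3,      # measured time to first token from the table
) -> float:
    """Rough end-to-end time: prefill latency plus steady-rate decoding."""
    return ttft_s + output_tokens / tok_per_s


# A 512-token reply, the output length used in the benchmark.
seconds = estimated_response_seconds(512)
print(f"{seconds:.1f} s")
```

At the measured figures, a 512-token reply takes about 21 seconds.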
The recommended engine for GLM 5.2 Air on M4 Max is MLX, which installs with pip. Ollama is an alternative with a simpler setup.
| Chip | Speed | First Token | Min RAM | Engine |
|---|---|---|---|---|
| M4 Ultra | 38 tok/s | 0.8s | 192 GB | MLX |
| M5 Max | 34 tok/s | 0.9s | 128 GB | MLX |
| M5 Max | 30 tok/s | 1.1s | 64 GB | Ollama |
To run GLM 5.2 Air on M4 Max you need about 128 GB of unified memory, the Q4_K_M build of the model, and MLX as the inference engine.
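
For a sense of where the memory goes, the weights alone can be estimated from the parameter count and the average bits stored per weight. The sketch below is a back-of-the-envelope calculation, not a figure from LLMCheck: the 4.8 bits per weight is an assumed, approximate average for a 4-bit K-quant, and the function name is hypothetical. The KV cache, runtime buffers, and macOS itself come on top of this.

```python
def estimate_weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the quantized weights in GB (1 GB = 1e9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


# 106B parameters from the table; 4.8 bits/weight is an assumed average.
weights_gb = estimate_weight_gb(106, 4.8)
print(f"about {weights_gb:.0f} GB of weights")
```

Under that assumption the weights take roughly half of a 128 GB machine, leaving the rest for context and the operating system.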