Run GLM 5.2 Air on M4 Ultra

Yes — GLM 5.2 Air (106B) runs at 38 tok/s on M4 Ultra with 192 GB RAM using Q4_K_M quantization via MLX. First token latency is 0.8s. A capable open-source LLM with 106B parameters.

Speed
38
tok/s
First Token
0.8
seconds
RAM Needed
192
GB minimum
Engine
MLX
recommended

Benchmark Details

LLMCheck measured GLM 5.2 Air on M4 Ultra using the standard methodology: Q4_K_M quantization, 256-token input, 512-token output, 3 runs averaged on a freshly-booted system.

MetricValue
Tokens per second38 tok/s
Time to first token0.8s
QuantizationQ4_K_M
Minimum RAM192 GB
Recommended engineMLX
Parameters106B
Benchmark date2026-07

Q4_K_M 106B MLX M4 Ultra

Setup Guide: Run GLM 5.2 Air on M4 Ultra

The recommended engine for GLM 5.2 Air on M4 Ultra is MLX. Install with pip and pull the model:

pip install mlx-lm
mlx_lm.generate --model mlx-community/glm-52-air-q4_k_m --prompt "Hello!"

Alternatively, you can use Ollama for a simpler setup:

ollama run glm-52-air

Performance on Other Apple Silicon Chips

ChipSpeedFirst TokenMin RAMEngine
M5 Max 34 tok/s 0.9s 128 GB MLX
M5 Max 30 tok/s 1.1s 64 GB Ollama
M4 Max 26 tok/s 1.3s 128 GB MLX

System Requirements

To run GLM 5.2 Air on M4 Ultra you need:

🛒 Get a Mac that runs GLM 5.2 Air

GLM 5.2 Air needs about 192 GB of unified memory. These current Apple Silicon Macs have the headroom to run it comfortably:

Not sure which Mac fits your budget? See the best Mac for running this →

As an Amazon Associate, LLMCheck earns from qualifying purchases. These affiliate links cost you nothing extra and help keep our benchmarks free.

Compare More Models

See how GLM 5.2 Air stacks up against other models on your specific Mac hardware.

Open Compare Tool Full Leaderboard