What We Do

LLM Check is a free, independent compatibility tool for Apple Silicon Macs. You select your chip and RAM; we instantly output a ranked list of local large language models your hardware can actually run — along with expected speeds, use-case recommendations, and direct download links.

Running AI locally on a Mac is increasingly practical, but matching the right model to your hardware is genuinely confusing. Quantization levels, parameter counts, memory bandwidth requirements, and MoE architectures all interact in ways that aren't obvious. We do that work so you don't have to.

Our Methodology

Every compatibility recommendation on LLM Check is based on three inputs: the model's file size at a given quantization level, the estimated system overhead (OS + background apps ≈ 2–3GB), and the available Unified Memory bandwidth of the target chip.
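The three inputs above combine into a simple fit-and-speed estimate. Here is a minimal sketch of that kind of check; the overhead default and the bandwidth-bound speed heuristic are illustrative assumptions, not LLM Check's exact formulas:

```python
def can_run(model_file_gb: float, total_ram_gb: float,
            overhead_gb: float = 3.0) -> bool:
    """A model fits if its weight file plus OS/background overhead
    fits in Unified Memory."""
    return model_file_gb + overhead_gb <= total_ram_gb

def est_tokens_per_sec(model_file_gb: float, bandwidth_gb_s: float) -> float:
    """Token generation is roughly memory-bandwidth-bound: each new
    token streams approximately the whole weight file from memory,
    so tokens/sec ~ bandwidth / file size."""
    return bandwidth_gb_s / model_file_gb

# Hypothetical example: a ~40GB 4-bit 70B model on a 64GB, 400 GB/s chip
print(can_run(40, 64))                        # fits
print(est_tokens_per_sec(40, 400))            # ballpark tokens/sec
```

The real database also accounts for quantization quality and MoE active-parameter counts, which this sketch ignores.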

We test and update our database as new models are released — typically within days of a major open-source release like Qwen 3.5, Llama 3, or Mistral. Speed estimates (tokens per second) are sourced from community benchmarks on Apple Silicon and cross-referenced with our own testing.

Our Principles

🔓

Always Free

The compatibility checker and all guides are free. No account, no paywall, no subscription.

⚖️

Independent

We are not affiliated with Apple, Ollama, LM Studio, or any model creator. Recommendations are unsponsored.

🎯

Mac-Focused

We specialise exclusively in Apple Silicon. Every recommendation is tested and validated for Unified Memory architectures.

🔄

Kept Current

The AI model landscape moves fast. We update our database within days of major new model releases.


Why Local AI on Mac?

Apple Silicon's Unified Memory architecture is uniquely suited to LLM inference. Unlike PCs with discrete GPUs, where the CPU and GPU have separate memory pools, a Mac with 128GB of Unified Memory lets the GPU address most of that 128GB as "VRAM"; matching that capacity with dedicated GPU hardware would cost tens of thousands of dollars.

This means a Mac Studio with 128GB can run Llama 3 70B at near-lossless 8-bit quantization (roughly 75GB of weights) entirely in memory, a task that previously required datacenter-class GPU hardware; the full unquantized FP16 weights, at around 140GB, call for a still larger configuration. The combination of high memory bandwidth (~600 GB/s on M5 Max), Apple's Neural Engine, and optimised frameworks like MLX and Ollama makes Macs the best consumer hardware for private, offline AI in 2026.
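The arithmetic behind these figures is simple: weight memory is parameter count times bytes per weight. A back-of-envelope sketch, using approximate bits-per-weight values (real GGUF quantizations add small per-block scaling overheads):

```python
def weight_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate weight-file size: params x bits / 8, in GB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1e9

# Approximate effective bits-per-weight for common formats
for name, bits in [("FP16", 16), ("8-bit", 8.5), ("4-bit", 4.8)]:
    print(f"70B @ {name}: ~{weight_gb(70, bits):.0f} GB")
```

This is why quantization level, not just parameter count, decides which models your RAM can hold: the same 70B model spans roughly 140GB down to ~42GB depending on format.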

Find Your Perfect Local LLM

Select your Mac's chip and RAM. Get a personalised list of models in seconds.

→ Run the Free Checker