Best Model for Your Air, by RAM

On a MacBook Air, unified memory is everything. The chip generation (M1 vs M4 vs M5) affects speed, but the amount of RAM decides which models will even fit without forcing macOS to swap to disk — which tanks performance. Find your RAM tier below and start with the matching top pick.

RAM   | Top pick         | Also great                        | Speed (Q4)
8 GB  | Qwen 4 4B        | Gemma 4 E2B, Phi-5 Mini           | ~45–65 tok/s
16 GB | Phi-5 Mini       | Gemma 4.5 12B (Q4), Llama 5 8B    | ~35–55 tok/s
24 GB | Qwen 4.1 32B-A3B | Phi-5 Medium 14B                  | ~40 tok/s

8 GB Air — small but mighty

An 8 GB Air is the entry point, and it is genuinely useful. Stick to 2B–4B models that leave headroom for the OS. Qwen 4 4B is the best all-rounder at roughly 45–65 tok/s; Gemma 4 E2B is the speed champion at ~58 tok/s and barely touches memory; Phi-5 Mini is the sharpest at reasoning for its size. According to LLMCheck benchmarks, Gemma 4 E2B is among the fastest models on Apple Silicon overall.[1]

16 GB Air — the sweet spot

16 GB is where a MacBook Air becomes a serious local-AI laptop. You can comfortably run models up to roughly 14B: Phi-5 Mini for crisp reasoning, Gemma 4.5 12B at Q4 for well-rounded chat and writing, and Llama 5 8B for broad general knowledge. This tier handles real coding help, summarization, and RAG over local documents without swapping.

24 GB Air — punching above its weight

A 24 GB Air can run the king of Mac-runnable models: Qwen 4.1 32B-A3B, the current #1 on the LLMCheck leaderboard. Its mixture-of-experts design keeps only ~3B parameters active per token, so it fits in ~18 GB at Q4 and runs near 40 tok/s even on the Air — tight, but very usable. Phi-5 Medium 14B is the roomier alternative if you want more memory headroom for long contexts.
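The ~18 GB figure can be sanity-checked with rough arithmetic. The sketch below assumes about 4.5 bits per weight for a Q4 quantization (an assumption; the exact figure depends on the quantization variant) and counts weights only, not the context cache or runtime overhead. The function name est_weights_gb is hypothetical. Note that a mixture-of-experts model still has to hold all 32B parameters in memory, even though only ~3B are active per token.

```shell
# Rough weight footprint in GB: parameters (in billions) x bits per weight / 8.
# Assumption: ~4.5 bits per weight for Q4. Excludes context cache and overhead.
est_weights_gb() {
  awk -v p="$1" -v b="$2" 'BEGIN { printf "%.1f\n", p * b / 8 }'
}

# 32B total parameters at ~4.5 bits per weight
est_weights_gb 32 4.5
```

Under these assumptions the result is about 18 GB, which is why the model is a tight fit on a 24 GB Air once macOS and the context cache take their share.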

No MLX penalty on the Air. A MacBook Air and a MacBook Pro with the same chip deliver the same tok/s at the start of a run, because the GPU, Neural Engine, and MLX framework are identical. The only difference between them is thermal throttling on long workloads, covered next.

Is the MacBook Air Good Enough?

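For most people, yes. The thing to understand is the one trade-off Apple made to keep the Air thin and silent: there is no fan. The chip cools passively through the aluminum chassis. That has a clear, predictable effect on local LLMs: short sessions run at full speed, but long, sustained generation heats the chassis until the chip throttles and tok/s drops.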

In practice, the Air is excellent for chat, short coding bursts, drafting, and on-the-go private AI. If your workflow is occasional questions rather than hours of nonstop batch generation, the fanless design will rarely get in your way.

How to Run an LLM on Your Air (5 Steps)

Step 1 — Check your MacBook Air's RAM

Click the Apple menu → About This Mac and read the Memory line. That number (8, 16, or 24 GB) is what decides your model. Note it before going further — it maps directly to the table above.
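If you prefer Terminal, the same number can be read from the command line. This is a minimal sketch: it relies on the macOS sysctl key hw.memsize, which reports installed memory in bytes, and the helper name bytes_to_gib is hypothetical.

```shell
# Convert a byte count to whole GiB using integer division.
bytes_to_gib() {
  echo $(( $1 / 1024 / 1024 / 1024 ))
}

# hw.memsize is a macOS-only sysctl key, so query it only on Darwin.
if [ "$(uname -s)" = "Darwin" ]; then
  echo "Unified memory: $(bytes_to_gib "$(sysctl -n hw.memsize)") GB"
fi
```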

Step 2 — Install Ollama

Ollama is the simplest way to run models on a Mac. Download it from ollama.com, open the .dmg, drag Ollama to Applications, and launch it once so it installs its command-line tool. Then open Terminal and confirm:

ollama --version

You should see a line like ollama version is 0.7.x. For a full walkthrough with screenshots, see our Install Ollama on Mac guide.

Step 3 — Pull the right model for your RAM

Run the single command that matches your Air's memory:

# 8 GB Air
ollama run qwen4:4b

# 16 GB Air
ollama run phi5:mini

# 24 GB Air
ollama run qwen4.1

The first run downloads the model (roughly 2–3 GB for a 4B model at Q4, up to ~18 GB for Qwen 4.1), then caches it locally so future launches take seconds.
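The mapping from RAM to model can also be written as a small helper, which may be handy in a setup script. This is only a sketch: the function name pick_model is hypothetical, the tags are the ones used in this guide, and the treatment of in-between or larger memory sizes (anything at or above a tier gets that tier's pick) is an assumption.

```shell
# Map installed RAM (in GB) to this guide's top-pick Ollama tag.
# Assumption: sizes at or above a tier use that tier's pick.
pick_model() {
  ram_gb=$1
  if [ "$ram_gb" -ge 24 ]; then
    echo "qwen4.1"
  elif [ "$ram_gb" -ge 16 ]; then
    echo "phi5:mini"
  else
    echo "qwen4:4b"
  fi
}

# Print the command for a 16 GB Air without running it.
echo "ollama run $(pick_model 16)"
```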

Step 4 — Run and chat

When you see the >>> prompt, you are talking to the model entirely on-device — nothing leaves your Air. Type a question, press Enter, and watch it generate. Type /bye or press Ctrl+D to exit.
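To compare your own speed with the tok/s figures in this guide, Ollama can print timing statistics when run with its --verbose flag. The sketch below pulls out the generation speed; it assumes the statistics include a line of the form "eval rate: N tokens/s", and the helper name eval_rate and the sample numbers are illustrative only, not measurements.

```shell
# Extract the generation speed from Ollama's --verbose statistics.
# Assumption: the stats contain a line beginning with "eval rate:".
eval_rate() {
  awk '/^eval rate:/ { print $3 }'
}

# Demo on a made-up sample of the assumed format (not a real measurement).
printf 'prompt eval rate:     200.00 tokens/s\neval rate:            52.10 tokens/s\n' | eval_rate
```

With a model already pulled, something like ollama run --verbose qwen4:4b "Say hello" 2>&1 | eval_rate should print a single tok/s number, provided the output format matches the assumption above.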

Step 5 — Manage heat & throttling

To keep your fanless Air running at its best during longer sessions:

When to Choose a Pro or Studio Instead

The MacBook Air is the right machine for most local-AI users. But if your work is sustained and heavy — running an LLM as a coding agent all day, batch-processing documents, or serving a local API for hours — the fanless design becomes a real limit, and a model that needs more than 24 GB simply won't fit. That is when active cooling and more memory pay off.

Our Mac hardware buying hub breaks down exactly which configuration gives the best tok/s per dollar for local LLMs, across the whole lineup.

Just want the best Air? A MacBook Air M4 or M5 with 24 GB of unified memory clears the bar for Qwen 4.1 32B-A3B while staying silent and light. You can grab one here: MacBook Air M4 (24 GB) on Amazon.

As an Amazon Associate, LLMCheck earns from qualifying purchases. This does not affect our rankings.

Bottom line. Match the model to your RAM, run it through Ollama, and the MacBook Air handles real local AI for free and offline. Buy the Air for portability and silence; step up to a Pro or Studio only when you genuinely need sustained, heavy throughput.