Hardware Requirements

Llama 4 comes in two variants, and the hardware requirements are very different:

For this guide, we are focusing entirely on Llama 4 Scout, the variant that is practical for local Mac use.

Important: The 64GB requirement is tight. macOS itself uses 8-12GB of RAM, so a 64GB Mac leaves roughly 52-56GB for the model. Scout at Q4_K_M fits, but you should close all other applications before running it. A 96GB or 128GB Mac gives you much more breathing room.

Install Ollama

If you do not have Ollama installed yet, follow our complete Ollama installation guide. It takes under five minutes.

If Ollama is already installed, make sure you are running the latest version. Llama 4 support requires Ollama 0.6 or later. Check your version in Terminal:

ollama --version

If you need to update, download the latest installer from ollama.com and install it over your existing version. Your models will be preserved.

Pull Llama 4 Scout

Download Llama 4 Scout with a single command:

ollama pull llama4-scout

This pulls the default Q4_K_M quantization, which is approximately 58GB. On a 100 Mbps connection, expect the download to take around 45-60 minutes. Make sure you have at least 70GB of free disk space to account for temporary files during download.

You can verify the download completed successfully:

ollama show llama4-scout

This displays the model's parameter count, quantization level, context window size, and total file size on disk.

Choose Your Quantization

The default Q4_K_M quantization offers the best balance of speed and quality for most users, but you have options:

To pull a specific quantization:

# Higher quality, needs 96GB+ Mac
ollama pull llama4-scout:q5_K_M

For a deeper dive into quantization trade-offs, see our quantization guide.

Run and Benchmark

Start Llama 4 Scout with:

ollama run llama4-scout

The first run takes 30-60 seconds as the model loads into memory. Subsequent runs are faster if the model is still cached. Once loaded, you will see a chat prompt where you can start typing.

To benchmark your performance, try a consistent prompt and observe the token generation speed displayed at the bottom of the output. Expected speeds by hardware:

For detailed benchmarks across many models and Mac configurations, visit our benchmarks page.

Tip: If you see significantly lower speeds than expected, check Activity Monitor for memory pressure. A yellow or red indicator means the model is swapping to disk, which destroys performance. Close other applications and try again.

Performance Tips

Getting the best possible performance from Llama 4 Scout on your Mac requires some optimization:

For more model recommendations tailored to your specific Mac, read our Llama 4 Scout & Maverick analysis.