Step 1: Download Ollama

Head to ollama.com and click the download button for macOS. The installer is a standard .dmg file, around 90 MB. It supports every Apple Silicon Mac — M1, M2, M3, M4, and M5 — as well as Intel Macs (though performance will be significantly slower on Intel).

Once the download finishes, open the .dmg file and drag the Ollama icon into your Applications folder, just like any other Mac app.

Step 2: Install and Verify

Launch Ollama from your Applications folder. On first launch, it will ask for permission to install its command-line tool. Click Allow — this adds the ollama command to your Terminal so you can interact with it from the command line.

Open Terminal (search for it in Spotlight with Cmd + Space) and verify the installation:

ollama --version

You should see something like ollama version 0.6.x. If you see this, Ollama is installed and ready to go.

Tip: Ollama runs as a background service on your Mac. You will see a small llama icon in your menu bar. It uses virtually no resources until you actually load a model.
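Because the service listens on a local port, you can also confirm it is up from Terminal. A minimal sketch, assuming the default port 11434 (`ollama_up` is a helper name introduced here, not an Ollama command):

```shell
# Check whether the Ollama background service is reachable (default port 11434).
# ollama_up is a local helper function, not part of Ollama itself.
ollama_up() {
  curl -sf http://localhost:11434/api/version > /dev/null 2>&1
}

if ollama_up; then
  echo "Ollama service is running"
else
  echo "Ollama service is not reachable"
fi
```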

Step 3: Pull Your First Model

Ollama uses a Docker-like pull system to download models. For your first model, we recommend Qwen 3.5 9B — it is one of the best small models available in 2026, with strong reasoning and coding ability.

ollama pull qwen3.5:9b

This downloads the Q4_K_M quantized version (about 5.5 GB). Download time depends on your connection — a few minutes on a fast link, and correspondingly longer on slower broadband.

If you have an 8 GB Mac and want something even lighter, try:

ollama pull qwen3.5:4b

For 32 GB+ Macs looking for maximum capability:

ollama pull qwen3.5:32b

Step 4: Start Chatting

Run the model with a single command:

ollama run qwen3.5:9b

You will see a prompt appear in your terminal. Type any question or task and press Enter. The model runs entirely on your Mac — nothing is sent to the internet.

Try a few simple prompts to get started, for example:

"Explain the difference between RAM and unified memory in simple terms."
"Write a Python function that reverses a string."
"Summarize the plot of Moby-Dick in three sentences."

To exit the chat, type /bye or press Ctrl+D.
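Beyond the interactive chat, `ollama run` also accepts a prompt as an argument and reads piped input, which makes it easy to script. A sketch (the guard and the `notes.txt` file are illustrative assumptions, not part of the tutorial's steps):

```shell
# Non-interactive, one-shot usage. The guard keeps the script harmless on
# machines where Step 2 has not been completed yet.
if command -v ollama > /dev/null 2>&1; then
  # Prompt as an argument -- prints the answer and exits:
  ollama run qwen3.5:9b "Explain quantization in two sentences."
  # Pipe a file in as context (notes.txt is a hypothetical example file):
  # ollama run qwen3.5:9b "Summarize this text." < notes.txt
else
  echo "ollama is not on PATH -- complete Step 2 first"
fi
```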

Step 5: Essential Commands Cheat Sheet

Here are the commands you will use most often with Ollama:

# List all downloaded models
ollama list

# Pull a specific model
ollama pull llama4-scout

# Run a model (downloads it first if needed)
ollama run mistral

# Remove a model to free disk space
ollama rm qwen3.5:9b

# Show model details (size, parameters, quantization)
ollama show qwen3.5:9b

# Start the API server manually (only needed if the menu bar app is not running)
ollama serve

# Check running models
ollama ps

Pro tip: Ollama automatically exposes an OpenAI-compatible API at http://localhost:11434/v1 (its native API lives at http://localhost:11434). Any app that supports the OpenAI API format can connect to your local Ollama models.
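You can exercise that endpoint directly with curl. A minimal sketch, assuming the default port and the model from Step 3 (`chat_body` is a helper name introduced here, not an Ollama command):

```shell
# Build the JSON body for a one-shot chat request to the OpenAI-compatible endpoint.
# chat_body is a local helper function, not part of Ollama.
chat_body() {
  printf '{"model": "%s", "messages": [{"role": "user", "content": "%s"}]}' "$1" "$2"
}

# Send it while Ollama is running (uncomment to try):
# curl -s http://localhost:11434/v1/chat/completions \
#   -H "Content-Type: application/json" \
#   -d "$(chat_body "qwen3.5:9b" "Say hello in five words.")"
```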

Connecting Other Apps to Ollama

Ollama works as a backend for many popular AI apps, from chat interfaces to coding assistants and note-taking tools. Once Ollama is running, any of these apps can connect to it through its local API.

For a complete list of compatible apps, check our software directory. To see how your Mac performs with different models, visit the leaderboard.
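Many OpenAI-compatible clients read their server address from environment variables, so one common way to point such an app at Ollama is the following sketch (whether your particular app honors these variables is an assumption to verify in its settings):

```shell
# Point OpenAI-compatible clients at the local Ollama server.
# Ollama ignores the API key, but many clients refuse to start without one set.
export OPENAI_BASE_URL="http://localhost:11434/v1"
export OPENAI_API_KEY="ollama"

echo "Clients will use: $OPENAI_BASE_URL"
```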