1. "Error: model not found"

What it means

The model you are trying to run has not been downloaded to your Mac yet. Ollama requires models to be pulled (downloaded) before they can be used.

How to fix it

# Download the model first
ollama pull qwen3.5:9b

# Then run it
ollama run qwen3.5:9b

# Check what models you have downloaded
ollama list

Tip: ollama run model_name automatically pulls the model if it is not downloaded yet. But if you are offline, you need to have pulled it beforehand.
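If you plan to work offline, you can confirm beforehand that the model is already cached locally. This sketch assumes the ollama CLI is on your PATH and uses the qwen3.5:9b example from above:

```shell
# Offline pre-flight: is the example model already downloaded?
model="qwen3.5:9b"
if ollama list 2>/dev/null | grep -q "$model"; then
  have="yes"
else
  have="no"   # run 'ollama pull' while still online
fi
echo "model cached locally: $have"
```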

2. "Error: insufficient memory"

What it means

Your Mac does not have enough available RAM to load the model. According to LLMCheck, this is the most common error on 8 GB and 16 GB Macs trying to run models that are too large.

How to fix it

  1. Close memory-hungry apps — check Activity Monitor for RAM hogs (Docker, Chrome, Xcode)
  2. Use a smaller quantization:
    # Instead of the default, explicitly use Q4
    ollama pull qwen3.5:9b-q4_K_M
  3. Switch to a smaller model:
    # If 9B is too large, try 4B
    ollama pull qwen3.5:4b
  4. Check your available RAM:
    # Open Activity Monitor → Memory tab
    # Or use terminal:
    sysctl -n hw.memsize | awk '{print $1/1073741824 " GB total"}'

See our "model too large" guide for the complete RAM tier breakdown.
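Before pulling a model, you can sanity-check whether it will fit in RAM. This is a rough rule of thumb, not an official formula: the 4.5 effective bits per weight for Q4_K_M and the ~1.5 GB runtime overhead are assumptions for illustration.

```shell
# Back-of-envelope RAM estimate: weights ≈ params × bits/8, plus overhead
params_b=9    # model size in billions of parameters (9B example above)
bits=4.5      # effective bits/weight for Q4_K_M (rough assumption)
est=$(awk -v p="$params_b" -v b="$bits" \
  'BEGIN { printf "%.1f", p * b / 8 + 1.5 }')   # +~1.5 GB runtime overhead
echo "needs roughly ${est} GB of free RAM"
```

Compare the result against the free (not total) memory shown in Activity Monitor.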

3. "Error: connection refused"

What it means

The Ollama server is not running or something else is using port 11434. This typically happens when you try to use the Ollama API or run a model but the background service has not started.

How to fix it

# Start the Ollama server
ollama serve

# If port is already in use, check what's on it
lsof -i :11434

# Kill the conflicting process (replace PID)
kill -9 PID

# Restart Ollama
ollama serve

Note: According to LLMCheck, this error often appears when using third-party apps (Open WebUI, Continue.dev) that connect to Ollama's API. Make sure Ollama is running before launching these apps.
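Before launching an API client, you can test whether anything is listening on Ollama's default port. This sketch uses bash's /dev/tcp feature (bash-specific, no extra tools needed):

```shell
# Probe Ollama's default port (11434) before starting API clients
if (exec 3<>/dev/tcp/127.0.0.1/11434) 2>/dev/null; then
  status="up"
else
  status="down"   # start the server with 'ollama serve'
fi
echo "Ollama port 11434 is $status"
```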

4. "Metal: error loading model"

What it means

The Metal GPU framework failed to initialize or load the model's compute kernels. This prevents GPU acceleration, causing either a crash or fallback to much slower CPU-only mode.

How to fix it

  1. Update Ollama to the latest version:
    brew upgrade ollama
    # Or re-download from ollama.com
  2. Check your macOS version — Metal compute for LLMs requires macOS 13 Ventura or later:
    sw_vers -productVersion
  3. Verify you have Apple Silicon — Intel Macs do not support Metal for LLM inference:
    uname -m
    # Should show "arm64" for Apple Silicon
  4. Try a different model — some model formats have compatibility issues with specific Metal versions
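Steps 2 and 3 can be combined into one pre-flight check. A small sketch, assuming you run it on a Mac (on other systems it simply reports both checks as failed):

```shell
# Metal readiness pre-flight: CPU architecture + macOS major version
arch=$(uname -m)
os_major=$(sw_vers -productVersion 2>/dev/null | cut -d. -f1)
[ "$arch" = "arm64" ] && echo "Apple Silicon: OK" \
  || echo "Intel/other CPU: no Metal LLM acceleration"
[ "${os_major:-0}" -ge 13 ] && echo "macOS version: OK" \
  || echo "need macOS 13 (Ventura) or later"
```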

5. "GGUF parse error"

What it means

The model file on disk is corrupted, usually from an interrupted download or a full disk. Ollama stores models in GGUF format and needs the complete file to parse model weights.

How to fix it

# Remove the corrupted model
ollama rm qwen3.5:9b

# Re-download it
ollama pull qwen3.5:9b

# Check disk space first (need enough for the model)
df -h ~
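To automate the disk check before re-pulling, you can compare the free space in your home directory against the expected download size. The 6 GB figure below is an assumed rough size for a 9B Q4 model; adjust it for yours:

```shell
# Verify there is room for the download before re-pulling
need_gb=6   # assumed approximate size of the model file
avail_gb=$(df -Pk "$HOME" | awk 'NR==2 { printf "%d", $4 / 1048576 }')
if [ "$avail_gb" -ge "$need_gb" ]; then
  echo "ok: ${avail_gb} GB free"
else
  echo "low disk: only ${avail_gb} GB free, need ~${need_gb} GB"
fi
```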

6. "Context length exceeded"

What it means

The conversation or input text exceeds the model's configured context window. By default, Ollama sets context to 2048-8192 tokens depending on the model, but some prompts or long conversations can exceed this.

How to fix it

# Option 1: Set context length via environment variable
export OLLAMA_CONTEXT_LENGTH=8192
ollama serve

# Option 2: Create a Modelfile with custom context
cat > Modelfile << 'EOF'
FROM qwen3.5:9b
PARAMETER num_ctx 8192
EOF

ollama create qwen3.5-8k -f Modelfile
ollama run qwen3.5-8k

Important: Increasing context length uses more RAM. According to LLMCheck, each doubling of context (e.g., 4096 to 8192) adds roughly 500 MB-1 GB of memory usage. Only increase context if you truly need it and have the RAM to spare.
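Before raising num_ctx, it helps to know roughly how many tokens your input actually uses. The ~4 characters per token figure is a rough heuristic for English text, not an exact tokenizer count:

```shell
# Rough token estimate for a prompt (~4 chars/token heuristic)
prompt='hello world, this is a test prompt'   # stand-in for your real prompt
chars=${#prompt}
tokens=$((chars / 4))
echo "~${tokens} tokens; num_ctx must cover this plus the model's reply"
```

If the estimate is well under the current context window, shortening the input is cheaper than raising num_ctx.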

Quick Reference: All Errors at a Glance

| Error | Cause | Fix | Time |
|---|---|---|---|
| model not found | Not downloaded | ollama pull model | 2 min |
| insufficient memory | Model too large for RAM | Close apps, smaller quant/model | 5 min |
| connection refused | Server not running | ollama serve | 30 sec |
| Metal: error loading | Outdated version / Intel Mac | Update Ollama + macOS | 5 min |
| GGUF parse error | Corrupted download | ollama rm then ollama pull | 5 min |
| context length exceeded | Input too long | Shorten input or raise num_ctx | 1 min |

Sources