1. "Error: model not found"
What it means
The model you are trying to run has not been downloaded to your Mac yet. Ollama requires models to be pulled (downloaded) before they can be used.
How to fix it
# Download the model first
ollama pull qwen3.5:9b
# Then run it
ollama run qwen3.5:9b
# Check what models you have downloaded
ollama list
Common causes:
- Typo in the model name (use exact names from ollama.com/library)
- Missing the size tag (use qwen3.5:9b, not just qwen3.5)
- Model does not exist in the Ollama library (check the website first)
Tip: ollama run model_name automatically pulls the model if it is not downloaded yet. But if you are offline, you need to have pulled it beforehand.
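The pull-before-run flow above can be wrapped into a small helper that only downloads a model when `ollama list` does not already show it. A minimal sketch; `ensure_model` is a hypothetical name, and the model tag is illustrative:

```shell
# Pull a model only if it is not already local.
ensure_model() {
  local model="$1"
  # Column 1 of `ollama list` (after the header row) is the model name.
  if ollama list | awk 'NR > 1 { print $1 }' | grep -Fqx "$model"; then
    echo "already present: $model"
  else
    ollama pull "$model"
  fi
}

# Usage: ensure_model "qwen3.5:9b"
```

This is handy in setup scripts, where re-pulling an already-downloaded model would waste time and bandwidth.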
2. "Error: insufficient memory"
What it means
Your Mac does not have enough available RAM to load the model. According to LLMCheck, this is the most common error on 8 GB and 16 GB Macs trying to run models that are too large.
How to fix it
- Close memory-hungry apps — check Activity Monitor for RAM hogs (Docker, Chrome, Xcode)
- Use a smaller quantization:
# Instead of the default, explicitly use Q4
ollama pull qwen3.5:9b-q4_K_M
- Switch to a smaller model:
# If 9B is too large, try 4B
ollama pull qwen3.5:4b
- Check your available RAM:
# Open Activity Monitor → Memory tab
# Or use terminal:
sysctl -n hw.memsize | awk '{print $1/1073741824 " GB total"}'
See our model too large guide for the complete RAM tier breakdown.
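The RAM check above can be turned into a quick pass/fail estimate before you pull a model. A rough sketch; `fits_in_ram` is a hypothetical helper, and the 25% headroom left for macOS and other apps is an assumption, not an Ollama rule:

```shell
# Estimate whether a model of a given size (in GB) is likely to fit in RAM.
fits_in_ram() {
  local model_gb="$1"
  # Total physical RAM in GB, via sysctl (bytes / 1073741824).
  local total_gb=$(( $(sysctl -n hw.memsize) / 1073741824 ))
  # Leave roughly 25% of RAM for macOS and other apps.
  local budget=$(( total_gb * 3 / 4 ))
  if [ "$model_gb" -le "$budget" ]; then
    echo "likely fits ($model_gb GB model, $budget GB budget)"
  else
    echo "too large ($model_gb GB model, $budget GB budget)"
  fi
}

# Usage: fits_in_ram 6
```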
3. "Error: connection refused"
What it means
The Ollama server is not running or something else is using port 11434. This typically happens when you try to use the Ollama API or run a model but the background service has not started.
How to fix it
# Start the Ollama server
ollama serve
# If port is already in use, check what's on it
lsof -i :11434
# Kill the conflicting process (replace PID)
kill -9 PID
# Restart Ollama
ollama serve
Other solutions:
- Launch Ollama from Applications — this starts the menu bar agent which manages the server
- Restart your Mac — clears stuck processes on port 11434
- Check firewall settings — make sure localhost connections to port 11434 are not blocked
Note: According to LLMCheck, this error often appears when using third-party apps (Open WebUI, Continue.dev) that connect to Ollama's API. Make sure Ollama is running before launching these apps.
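Before launching a client app, you can confirm the server is reachable with a quick probe of the API port. A minimal sketch; `check_ollama` is a hypothetical name, and it relies on Ollama answering requests on its default port 11434:

```shell
# Probe the local Ollama server; exit status of curl tells us if it is up.
check_ollama() {
  if curl -sf --max-time 2 http://localhost:11434/ > /dev/null; then
    echo "ollama is up"
  else
    echo "ollama is down - start it with: ollama serve"
  fi
}

# Usage: check_ollama
```

Running this in a pre-launch script for Open WebUI or Continue.dev avoids the confusing "connection refused" errors inside those apps.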
4. "Metal: error loading model"
What it means
The Metal GPU framework failed to initialize or load the model's compute kernels. This prevents GPU acceleration, causing either a crash or fallback to much slower CPU-only mode.
How to fix it
- Update Ollama to the latest version:
brew upgrade ollama
# Or re-download from ollama.com
- Check your macOS version (Metal compute for LLMs requires macOS 13 Ventura or later):
sw_vers -productVersion
- Verify you have Apple Silicon (Intel Macs do not support Metal for LLM inference):
uname -m
# Should show "arm64" for Apple Silicon
- Try a different model; some model formats have compatibility issues with specific Metal versions
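The two prerequisite checks above can be combined into one script. A sketch under the assumptions stated in this section (Apple Silicon plus macOS 13 or later); `metal_ready` is a hypothetical helper name:

```shell
# Check both Metal prerequisites: CPU architecture and macOS major version.
metal_ready() {
  local arch=$(uname -m)
  local major=$(sw_vers -productVersion | cut -d. -f1)
  if [ "$arch" = "arm64" ] && [ "$major" -ge 13 ]; then
    echo "Metal prerequisites met (macOS $major, $arch)"
  else
    echo "Metal unavailable: need Apple Silicon and macOS 13+ (got macOS $major, $arch)"
  fi
}

# Usage: metal_ready
```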
5. "GGUF parse error"
What it means
The model file on disk is corrupted, usually from an interrupted download or a full disk. Ollama stores models in GGUF format and needs the complete file to parse model weights.
How to fix it
# Remove the corrupted model
ollama rm qwen3.5:9b
# Re-download it
ollama pull qwen3.5:9b
# Check disk space first (need enough for the model)
df -h ~
Prevention tips:
- Make sure you have enough free disk space before pulling large models (70B models need 40+ GB)
- Do not interrupt downloads with Ctrl+C — let them complete or they may corrupt
- According to LLMCheck, keep at least 20 GB free beyond the model size to avoid disk-full corruptions
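The disk-space advice above can be checked automatically before a pull. A rough sketch; `enough_disk` is a hypothetical helper, and the 20 GB safety margin follows the guideline in this section rather than any Ollama requirement:

```shell
# Warn when free space in the home filesystem is below
# the model size plus a 20 GB safety margin.
enough_disk() {
  local model_gb="$1"
  # Field 4 of `df -k` (second line) is available space in 1K blocks.
  local free_gb=$(( $(df -k ~ | awk 'NR == 2 { print $4 }') / 1048576 ))
  local needed=$(( model_gb + 20 ))
  if [ "$free_gb" -ge "$needed" ]; then
    echo "ok: $free_gb GB free, $needed GB needed"
  else
    echo "low: $free_gb GB free, $needed GB needed"
  fi
}

# Usage: enough_disk 40   # before pulling a ~40 GB model
```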
6. "Context length exceeded"
What it means
The conversation or input text exceeds the model's configured context window. By default, Ollama sets context to 2048-8192 tokens depending on the model, but some prompts or long conversations can exceed this.
How to fix it
# Option 1: Set context length via environment variable
export OLLAMA_CONTEXT_LENGTH=8192
ollama serve
# Option 2: Create a Modelfile with custom context
cat > Modelfile << 'EOF'
FROM qwen3.5:9b
PARAMETER num_ctx 8192
EOF
ollama create qwen3.5-8k -f Modelfile
ollama run qwen3.5-8k
Important: Increasing context length uses more RAM. According to LLMCheck, each doubling of context (e.g., 4096 to 8192) adds roughly 500 MB-1 GB of memory usage. Only increase context if you truly need it and have the RAM to spare.
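Before raising num_ctx, it helps to know roughly how many tokens your prompt actually uses. A crude sketch using the common "about 4 characters per token" rule of thumb (an approximation, not Ollama's actual tokenizer); `estimate_tokens` is a hypothetical helper name:

```shell
# Rough token estimate for a prompt file: character count divided by 4.
estimate_tokens() {
  local chars=$(wc -c < "$1")
  echo $(( chars / 4 ))
}

# Usage: estimate_tokens prompt.txt
```

If the estimate is well under your current context window, the problem is more likely conversation history than the prompt itself.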
Quick Reference: All Errors at a Glance
| Error | Cause | Fix | Time |
|---|---|---|---|
| model not found | Not downloaded | ollama pull model | 2 min |
| insufficient memory | Model too large for RAM | Close apps, smaller quant/model | 5 min |
| connection refused | Server not running | ollama serve | 30 sec |
| Metal: error loading | Outdated version / Intel Mac | Update Ollama + macOS | 5 min |
| GGUF parse error | Corrupted download | ollama rm then ollama pull | 5 min |
| context length exceeded | Input too long | Reduce num_ctx or shorten input | 1 min |
Sources
- Ollama GitHub repository — Official docs and issue tracker
- Ollama Issues — Community-reported bugs and fixes
- LLMCheck Ollama Install Guide — Complete setup walkthrough
- LLMCheck Troubleshooting Hub — More troubleshooting guides
- LLMCheck Leaderboard — Model sizes and RAM requirements