Troubleshooting Guides
Select the issue that matches your problem. Each guide includes diagnostic steps, root cause analysis, and verified fixes. According to LLMCheck testing, most local AI issues on Mac can be resolved in under 10 minutes.
Slow Inference — 7 Fixes for Faster Speed
Your local LLM is running at 2 tok/s when it should be doing 20+. Diagnose the bottleneck and fix it.
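Before opening the full guide, you can measure your actual throughput. A minimal sketch, assuming Ollama's local API on its default port 11434 (the model name llama3.2 is only an example — substitute one you have pulled); the /api/generate response includes eval_count (tokens generated) and eval_duration (nanoseconds):

```shell
# Rough tokens/second check. Assumes Ollama's local API on the default
# port; "llama3.2" is an example model name -- substitute your own.
if resp=$(curl -sf http://localhost:11434/api/generate \
      -d '{"model":"llama3.2","prompt":"Hi","stream":false}'); then
  # tok/s = eval_count / (eval_duration in seconds)
  python3 -c 'import json,sys; r=json.loads(sys.argv[1]); print(round(r["eval_count"]/(r["eval_duration"]/1e9),1),"tok/s")' "$resp"
else
  echo "Ollama is not reachable on localhost:11434"
fi
```

If the number is in single digits on Apple Silicon, the guide's bottleneck diagnosis is worth running in full.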
GPU Not Used — Enable Metal Acceleration
Your Mac has a powerful GPU but the model is running on CPU only. Force Metal acceleration on.
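A quick way to see which processor a loaded model is actually using: recent Ollama releases report it in the PROCESSOR column of ollama ps (e.g. "100% GPU" vs "100% CPU"). A small sketch, guarded so it degrades gracefully if Ollama is not installed:

```shell
# Show where loaded models are running. In recent Ollama versions,
# "ollama ps" includes a PROCESSOR column (e.g. "100% GPU" vs "100% CPU").
if command -v ollama >/dev/null 2>&1; then
  ollama ps
else
  echo "ollama not found in PATH"
fi
```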
Model Too Large for RAM — Fit Big LLMs
The model you want needs more RAM than your Mac has. Quantize, switch architectures, or offload.
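A back-of-envelope way to see whether quantization will make a model fit: weight size is roughly parameters times bits per weight, divided by 8. Real GGUF files add some overhead for metadata and embeddings, so treat this as a floor, not an exact figure; the numbers below are illustrative:

```shell
# Rough size estimate: params (billions) * bits-per-weight / 8 = GB.
# Illustrative numbers; real model files carry extra overhead.
params_b=7   # a 7B-parameter model
bits=4       # ~4 bits/weight for a q4 quantization
size_gb=$(python3 -c "print(round($params_b * $bits / 8, 1))")
echo "~${size_gb} GB at ${bits}-bit (vs ~$((params_b * 2)) GB at FP16)"
```

So a 7B model that needs about 14 GB at FP16 shrinks to roughly 3.5 GB at 4-bit, which is why quantization is the first fix the guide recommends.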
Common Ollama Errors — Quick Fixes
Decoding cryptic Ollama error messages. Connection refused, model not found, memory errors, and more.
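For the most common of these, "connection refused", the first check is whether the server is running at all. A minimal sketch against Ollama's default port (the root endpoint answers when the server is live):

```shell
# "connection refused" almost always means the server is not running.
if curl -sf --max-time 2 http://localhost:11434/ >/dev/null; then
  echo "Ollama server is up"
else
  echo "Ollama server is down -- start it with: ollama serve"
fi
```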
Quick Diagnostic Checklist
Before diving into a specific guide, run through this quick checklist. According to LLMCheck data, these five checks resolve about 60% of all local AI issues on Mac:
- Check your macOS version — Metal acceleration requires macOS 13 Ventura or later
- Check available RAM — Open Activity Monitor and look at Memory Pressure (green is good, red means trouble)
- Update your inference engine — Run ollama --version and compare to the latest release
- Verify model size vs. RAM — The model file size should not exceed 75% of your total RAM
- Close background apps — Docker, Chrome with many tabs, and Xcode are the worst memory offenders
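The checks above can be scripted in one pass. A minimal sketch, assuming macOS (sw_vers and sysctl are standard macOS tools) with Ollama installed; the 75% figure is the rule of thumb from the checklist:

```shell
# Quick-check script for the checklist above. macOS-specific commands
# are guarded so the script degrades gracefully elsewhere.
if command -v sw_vers >/dev/null 2>&1; then
  echo "macOS version: $(sw_vers -productVersion)"   # Metal needs 13+
fi
if ram_bytes=$(sysctl -n hw.memsize 2>/dev/null); then
  ram_gb=$((ram_bytes / 1073741824))
  echo "Total RAM: ${ram_gb} GB"
  # Rule of thumb: model file should stay under 75% of total RAM
  echo "Largest safe model file: ~$((ram_gb * 75 / 100)) GB"
fi
if command -v ollama >/dev/null 2>&1; then
  ollama --version
fi
```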
Tip: If you are not sure which issue you have, start with the slow inference guide — it covers the broadest range of problems and includes a diagnostic flowchart.
Sources
- Ollama GitHub repository — Official documentation and issue tracker
- Apple Metal documentation — GPU acceleration framework
- MLX GitHub repository — Apple's machine learning framework
- LLMCheck Leaderboard — Benchmark data for 42+ models on Apple Silicon