Why Offline AI Matters

The push toward offline-capable AI is driven by three overlapping needs that affect more users than you might expect.

Setup: Download Once, Use Forever

The setup process requires internet connectivity exactly once: to download the application and your chosen model. After that, everything runs locally.

Option A: Ollama (Recommended for Terminal Users)

  1. Install Ollama while connected to the internet: curl -fsSL https://ollama.com/install.sh | sh
  2. Download a model: ollama pull qwen3.5:9b (5.5 GB download, takes 5-10 minutes on broadband)
  3. Disconnect from the internet. Turn off Wi-Fi, enable airplane mode, or unplug your ethernet cable.
  4. Run the model offline: ollama run qwen3.5:9b — it works exactly the same as when connected.

Ollama stores downloaded models in ~/.ollama/models/. These files persist across restarts and never need re-downloading unless you explicitly delete them.
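A quick way to confirm the models really are sitting on local disk (path from above; uses only `du` and `ls`, no network access):

```shell
# Inspect Ollama's local model store; nothing here touches the network
MODELS_DIR="$HOME/.ollama/models"
if [ -d "$MODELS_DIR" ]; then
  du -sh "$MODELS_DIR"                      # total disk used by downloaded models
  ls "$MODELS_DIR/manifests" 2>/dev/null    # one entry per model registry
else
  echo "no models downloaded yet"
fi
```

If the directory exists, the `du` total should roughly match the download sizes quoted above.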

Option B: LM Studio (Recommended for GUI Users)

  1. Download LM Studio from lmstudio.ai while online. It is a native macOS app with a visual interface.
  2. Browse and download models from the built-in model browser. Select your model and click download.
  3. Go offline. LM Studio's chat interface works identically without a connection.

Pro tip: Download multiple models of different sizes while you have internet access. This gives you options for different tasks offline — a small fast model for quick Q&A and a larger model for complex reasoning.
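The pro tip above can be scripted so the pulls run back-to-back while you still have a connection. The model tags below are illustrative, not guaranteed registry names; the script falls back to a dry run if `ollama` is not installed:

```shell
# Stock up on several model sizes while online (tags are illustrative examples)
MODELS="phi-4-mini qwen3.5:9b"
for model in $MODELS; do
  if command -v ollama >/dev/null 2>&1; then
    ollama pull "$model"            # real download; sizes per the table below
  else
    echo "would pull: $model"       # dry run: ollama not on this machine
  fi
done
```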

Best Models for Offline Use

Not all models are equally suited for offline work. According to LLMCheck benchmarks, here are the best choices organized by use case:

| Use case | Model | Size | RAM | Speed (tok/s) | Why |
| --- | --- | --- | --- | --- | --- |
| General assistant | Qwen 3.5 9B | 5.5 GB | 16 GB | ~100 | Best quality/speed balance |
| Coding | Qwen 3.5 35B MoE | 20 GB | 32 GB | ~45 | Near-frontier code generation |
| Quick Q&A | Phi-4 Mini | 2.4 GB | 8 GB | ~135 | Fastest responses, tiny footprint |
| Legal/medical writing | Llama 3.1 8B | 4.7 GB | 8 GB | ~120 | Strong instruction following |
| Deep reasoning | DeepSeek R1 70B | 40 GB | 64 GB | ~10 | Frontier-class thinking |
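Before pulling a large model, it is worth checking the RAM column against your own machine. A rough sketch (integer GB, so a nominally 16 GB machine may report 15; adjust `NEED_GB` to the row you care about):

```shell
# Approximate check: does this machine meet a model's RAM requirement?
NEED_GB=16   # e.g. the RAM column for Qwen 3.5 9B in the table above
if [ "$(uname)" = "Darwin" ]; then
  TOTAL_GB=$(( $(sysctl -n hw.memsize) / 1024 / 1024 / 1024 ))
else
  TOTAL_GB=$(( $(awk '/MemTotal/ {print $2}' /proc/meminfo) / 1024 / 1024 ))
fi
if [ "$TOTAL_GB" -ge "$NEED_GB" ]; then
  echo "fits: ${TOTAL_GB} GB available"
else
  echo "too big: only ${TOTAL_GB} GB available"
fi
```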

Real-World Use Cases

Offline AI is already being used in scenarios where cloud AI simply cannot operate.

Verifying Zero Network Activity

Trust but verify. Here is how to confirm your local AI setup makes absolutely no network connections during inference.
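One way to spot-check this, assuming the standard `pgrep` and `lsof` utilities are available: list any network sockets held by the running server while a prompt is being processed. Empty socket output during inference is what you want to see.

```shell
# Show any network sockets owned by a running ollama process
PID=$(pgrep -x ollama | head -n 1)
if [ -z "$PID" ]; then
  echo "ollama is not running"
else
  # -i: network files only; -a: AND with -p (restrict to this PID)
  lsof -i -a -p "$PID" || echo "no network connections"
fi
```

Note that you will always see one listening socket on localhost (the API the CLI talks to); what should be absent is any connection to an outside address.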

Note: Ollama checks for updates on launch when connected. To prevent even this, set the environment variable OLLAMA_NOCHECK_UPDATE=1 before starting the server. According to LLMCheck testing, this makes Ollama fully silent on the network.
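Applying the note above looks like this (the variable name is taken from the text; set it in the same shell that launches the server):

```shell
# Disable the launch-time update check before starting the server
export OLLAMA_NOCHECK_UPDATE=1
echo "OLLAMA_NOCHECK_UPDATE=$OLLAMA_NOCHECK_UPDATE"
# then, in this same shell:
#   ollama serve
```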