Download LM Studio
Head to lmstudio.ai and click the download button for macOS. The installer is approximately 150MB — much smaller than you might expect for an AI application because the models are downloaded separately.
Once the download finishes, open the .dmg file and drag LM Studio into your Applications folder. That is the entire installation. No Homebrew, no terminal commands, no dependencies to manage.
LM Studio supports both Apple Silicon (M1/M2/M3/M4/M5) and Intel Macs, though Apple Silicon delivers dramatically better performance. On Intel, you will be limited to smaller models and slower generation speeds.
First Launch Walkthrough
Open LM Studio from your Applications folder. On first launch, the app automatically detects your Mac's hardware — chip type, memory amount, and GPU capabilities. This information is used to recommend models that will run well on your specific machine.
The main interface has several tabs along the left sidebar:
- Discover — browse and download models from a curated catalog
- Chat — have conversations with your downloaded models
- My Models — manage your downloaded model files
- Developer — run a local API server for connecting other apps
The interface is clean and intuitive. If you have ever used ChatGPT or any other AI chat interface, LM Studio will feel immediately familiar — except everything runs on your machine.
Tip: LM Studio stores downloaded models in ~/.cache/lm-studio/models/ by default. If you have limited space on your boot drive, you can change this location in Settings before downloading your first model.
Download Your First Model
Click the Discover tab to browse available models. LM Studio shows you a curated list of the most popular and highest-quality models, sorted by compatibility with your hardware.
For your first model, we recommend Qwen 3.5 9B — it offers an excellent balance of quality and speed on Apple Silicon. Here is how to download it:
- Type "Qwen 3.5" in the search bar at the top of the Discover tab.
- Find the 9B Instruct version in the results.
- LM Studio will show you available quantization options. Select Q4_K_M (the default, about 5.5GB).
- Click the Download button. The progress bar shows the download status.
The download size depends on the model and quantization level. A 9B model at Q4 is typically 5-6GB. On a fast broadband connection (roughly 150 Mbps or better), expect a 3-5 minute download; slower connections will take proportionally longer.
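If you want to estimate a download before starting it, file size is roughly the parameter count times the average bits per weight, divided by 8. A quick Python sketch (the ~4.8 bits/weight figure for Q4_K_M is an approximation, since some tensors stay at higher precision):

```python
def estimate_gguf_size_gb(params_billion: float, bits_per_weight: float) -> float:
    """Rough download-size estimate for a quantized GGUF model.

    bits_per_weight is an approximation: Q4_K_M averages roughly
    4.8 bits because some tensors are kept at higher precision.
    """
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9  # decimal gigabytes

# A 9B model at Q4_K_M lands near the 5-6GB range quoted above.
print(round(estimate_gguf_size_gb(9, 4.8), 1))  # ~5.4
```

The same arithmetic explains the 8GB-Mac advice below: a 4B model at the same quantization is under 3GB, leaving far more headroom for the rest of the system.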
If you have an 8GB Mac, try Qwen 3.5 4B or Phi-4 Mini instead — these smaller models run comfortably with limited memory.
Start Chatting
Once your model finishes downloading, switch to the Chat tab. Select your downloaded model from the dropdown at the top of the chat window. The model takes a few seconds to load into memory the first time.
Type any message in the text box at the bottom and press Enter or click Send. The model processes your input and streams its response in real-time — just like ChatGPT, but running entirely on your Mac. No internet connection is needed after the initial model download.
Try a few different types of prompts to see what the model can do:
- General questions — ask about any topic and get detailed explanations
- Coding help — paste code and ask for explanations, debugging, or improvements
- Writing assistance — draft emails, summarize documents, or brainstorm ideas
- Analysis — paste text and ask the model to extract key points or compare options
Tip: You can create multiple chat threads and switch between them. Each thread maintains its own conversation history, so you can have separate chats for different projects or topics.
Enable Local API Server
One of LM Studio's most powerful features is its built-in API server. This lets other applications connect to your local model using the same request format as OpenAI's Chat Completions API — meaning any tool that supports the OpenAI API can work with your local model.
To start the server:
- Switch to the Developer tab in the left sidebar.
- Select which model to serve from the dropdown.
- Click Start Server.
The server runs at http://localhost:1234 by default and is fully OpenAI-compatible. You can test it with curl:
# Test the API with curl
curl http://localhost:1234/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen3.5-9b",
    "messages": [
      {"role": "user", "content": "Hello!"}
    ]
  }'
Many popular tools support this out of the box, including Continue.dev (VS Code AI coding), Open WebUI, and various Python libraries. Just point them at http://localhost:1234 as the API base URL.
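You can also drive the server from your own scripts using nothing but Python's standard library. A minimal sketch, assuming the default port and that the model name matches whatever you have loaded in LM Studio:

```python
import json
import urllib.request

BASE_URL = "http://localhost:1234/v1"  # LM Studio's default server address

def build_payload(prompt: str, model: str = "qwen3.5-9b") -> dict:
    """OpenAI-style chat-completion request body."""
    return {
        "model": model,  # must match the model loaded in LM Studio
        "messages": [{"role": "user", "content": prompt}],
    }

def chat(prompt: str) -> str:
    """Send one message to the local server and return the reply."""
    req = urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(build_payload(prompt)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    # Responses follow the OpenAI schema: choices[0].message.content
    return body["choices"][0]["message"]["content"]

# chat("Hello!")  # requires the server to be running
```

Because the request and response shapes match OpenAI's, swapping between a cloud model and your local one is usually just a matter of changing the base URL.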
Tips and Customization
Once you are comfortable with the basics, here are ways to get more from LM Studio:
- Model management. The My Models tab shows all downloaded models with their sizes and quantization levels. You can delete models you no longer need to free up disk space. Models are stored as GGUF files, so they are easy to back up or move between machines.
- Chat presets. LM Studio lets you save system prompts and generation settings as presets. Create presets for different tasks — one for coding with lower temperature, one for creative writing with higher temperature, one for analysis with specific instructions.
- GPU layer settings. In advanced settings, you can control how many model layers are offloaded to the GPU. On Apple Silicon, the default "Auto" setting usually works well. If you experience out-of-memory issues, reduce the GPU layers to keep more of the model in regular RAM.
- Compare with Ollama. LM Studio and Ollama serve different needs. LM Studio excels at visual model browsing and interactive chatting. Ollama is better as a lightweight background service and for terminal-based workflows. Many users run both — use whatever fits the moment. For a detailed comparison, read our LM Studio vs Ollama breakdown.
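If you drive the model through the local API server rather than the chat UI, those preset-style generation settings map directly onto request fields. A small sketch (the preset names and system prompts here are invented for illustration; temperature and system messages are standard OpenAI-compatible fields):

```python
import json

# Hypothetical presets mirroring the suggestions above: low temperature
# for coding, higher temperature for creative writing.
PRESETS = {
    "coding": {"system": "You are a precise coding assistant.", "temperature": 0.2},
    "creative": {"system": "You are an imaginative writing partner.", "temperature": 0.9},
}

def payload_for(preset: str, prompt: str, model: str = "qwen3.5-9b") -> str:
    """Build a chat-completion request body using a named preset."""
    p = PRESETS[preset]
    return json.dumps({
        "model": model,
        "temperature": p["temperature"],
        "messages": [
            {"role": "system", "content": p["system"]},
            {"role": "user", "content": prompt},
        ],
    })
```

Inside the app, the chat presets feature saves you from re-entering these values by hand; over the API, you pick them per request.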
For a complete list of local AI tools and how they compare, visit our software directory.
Pro tip: LM Studio can also import GGUF files you have downloaded from Hugging Face. Just drag the .gguf file into the LM Studio window or place it in the models directory. This gives you access to the full range of community-quantized models beyond what appears in the Discover tab.