INDUSTRY REPORT · June 6, 2026 · 18 min read

State of Open-Source Local LLMs — June 2026

According to the LLMCheck index (June 6, 2026), the open-source local LLM landscape exploded in May–June: Qwen 4 graduated to score 75 (78% SWE-Verified), Meta shipped Llama 5 70B, Mistral's Voyage Pro 70B landed under Apache 2.0, xAI open-weighted Grok 4 for the first time, and Microsoft's Phi-5 Medium now beats GPT-4-class models at 14B.

If May 2026 was the month open source caught up, June 2026 is the month it spread out. Eight flagship releases shipped between May 9 and June 6: the full Qwen 4 release, Qwen 4 Coder, Qwen 4 4B, Llama 5 70B, Mistral Voyage Pro 70B, Gemma 4.5 12B, Phi-5 Medium, and xAI's first open-weights drop with Grok 4 Open. The 70B dense tier has become a real battleground, the 14B tier is competitive again, and every major Western foundation lab except Anthropic and OpenAI now has at least one model with downloadable weights. This is the definitive June 2026 recap, benchmarked on Apple Silicon, with install commands. Every claim is sourced from LLMCheck's own measurement pipeline.

The 30-Day Recap (TL;DR)

Every major open-weights release between May 9 and June 6, 2026, with a one-line takeaway. Eight separate flagship drops in 28 days — the highest-velocity month on record for the open-source local LLM ecosystem.

Qwen 4 (full release) Jun 1 — Apache 2.0, NEW #1 open model, score 75, +2pp across every benchmark vs. the Preview.
Qwen 4 Coder 32B-A3B Jun 2 — Apache 2.0, 82% SWE-Verified, the best open-source Mac coder — beats Devstral.
Qwen 4 4B Jun 3 — Apache 2.0, beats Phi-5 Mini in the 8 GB tier at 135 tok/s on M5 Max.
Llama 5 70B Jun 4 — Meta's bigger dense, MMLU 88%, fills the Scout/8B gap.
Mistral Voyage Pro 70B Jun 4 — Apache 2.0 70B dense, agentic tool use is strong, the new license-friendly 70B.
Gemma 4.5 12B Jun 2 — Google refresh, 1M context jump (from 256K), improved multimodal.
Phi-5 Medium 14B May 30 — MIT, MMLU 86%, AIME 75%, tops the 14B tier outright.
Grok 4 Open 100B-A20B Jun 5 — xAI's FIRST open weights ever, custom license, ~32 tok/s on M5 Max 64 GB.

Headline story: June 2026 finished what May started. Frontier-class open-weights now exist at every meaningful parameter count — 4B (Qwen 4), 14B (Phi-5 Medium), 32B-A3B (Qwen 4), 70B dense (Llama 5 + Voyage Pro), and 100B+ MoE (Grok 4 Open). For the first time, the open ecosystem fully spans every Mac tier from 8 GB to 192 GB without quality gaps.

Qwen 4 Goes Live — the new #1

Alibaba shipped the production Qwen 4 32B-A3B on June 1, exactly four weeks after the Preview that defined May's leaderboard. The pattern Alibaba has run since Qwen 2 held: Preview to full release in 4–8 weeks, same architecture, modestly improved benchmarks, vastly improved instruction tuning. This release is no exception: public benchmarks improved by 1 to 2 percentage points (SWE-Verified 76% to 78%, MMLU 88% to 89%, HumanEval 92% to 94%, AIME 89% to 91%), and the LLMCheck Score climbed from 73 to 75.

Architecture

Total params: 32B
Active params: 3B (9.4%)
Experts: 128 (4 active)
Reasoning mode: Hybrid (auto)
Context: 1M native
Training tokens: 20T
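
The card lists 4 of 128 experts active (about 3%) but 9.4% of parameters active, which suggests some weights are used by every token. A rough sketch of that arithmetic follows. It assumes equal-sized experts and that only expert weights are routed; neither assumption is confirmed by the figures above, so the split it prints is illustrative only.

```python
def split_shared_and_expert(total_b, active_b, n_experts, n_active):
    """Solve: total = shared + expert, active = shared + expert * n_active / n_experts.

    Assumes equal-sized experts and that all non-expert weights
    (attention, embeddings, router) are used for every token.
    Inputs and outputs are in billions of parameters.
    """
    routed_fraction = n_active / n_experts
    expert_b = (total_b - active_b) / (1.0 - routed_fraction)
    shared_b = total_b - expert_b
    return shared_b, expert_b


# Figures from the architecture card: 32B total, 3B active, 128 experts, 4 active.
shared, expert = split_shared_and_expert(32.0, 3.0, 128, 4)
print(f"active share of all params: {3.0 / 32.0:.1%}")
print(f"implied shared params: ~{shared:.1f}B, expert params: ~{expert:.1f}B")
```

Under these assumptions roughly 2B parameters would be shared and the remaining ~30B would sit in the experts. All 32B still have to be resident in memory, which is why the RAM requirement tracks total rather than active parameters.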

Benchmarks

SWE-Verified: 78%
MMLU: 89%
HumanEval: 94%
AIME 2025: 91%
MATH: 95%
LLMCheck Score: 75 / 100

The +2pp improvement is bigger than it sounds. In benchmark land, 76% to 78% on SWE-Verified is the difference between "solid open coding model" and "competitive with closed frontier." According to LLMCheck cross-reference data, GPT-5o scores 80% on the same benchmark — meaning the full Qwen 4 release closes the open-vs-closed coding gap to just 2 percentage points. On MMLU and HumanEval the gap is statistically indistinguishable.
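
To put "statistically indistinguishable" into numbers, a quick binomial sketch can help. It assumes SWE-Verified refers to the 500-task SWE-bench Verified set, uses HumanEval's 164 problems, and treats the two scores as independent samples. It is a back-of-the-envelope check, not LLMCheck's methodology, and it ignores run-to-run variance.

```python
import math


def standard_error_pp(score_pct, n_tasks):
    """Binomial standard error of a pass rate, in percentage points."""
    p = score_pct / 100.0
    return 100.0 * math.sqrt(p * (1.0 - p) / n_tasks)


def gap_in_standard_errors(a_pct, b_pct, n_tasks):
    """Gap between two scores on n_tasks each, in combined standard errors.

    Treats the two scores as independent, which is a simplification.
    """
    se_a = standard_error_pp(a_pct, n_tasks)
    se_b = standard_error_pp(b_pct, n_tasks)
    return abs(a_pct - b_pct) / math.sqrt(se_a ** 2 + se_b ** 2)


print(f"SE at 78% on 500 tasks: {standard_error_pp(78, 500):.2f} pp")
print(f"78% vs 80% on 500 tasks: {gap_in_standard_errors(78, 80, 500):.2f} SE")
print(f"SE at 94% on 164 tasks: {standard_error_pp(94, 164):.2f} pp")
```

Under these assumptions each score carries a sampling error of a little under 2 percentage points, so gaps of a point or two are best read as a rough ordering.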

Native 1M context is the architectural change. The Preview shipped with 262K native context extended to 1M via YaRN; the full release ships with a 1M-token native context by default. According to LLMCheck long-context evaluations, retrieval accuracy at 800K tokens improved from 71% (Preview, YaRN-extended) to 89% (full release, native). For long-document analysis, codebase-wide refactoring, and book-length reasoning, the production model is the first open-weights model that handles a million tokens without quality collapse.

Mac speeds improved too. The MLX team shipped optimized Metal kernels alongside the production release. Q4_K_M now measures 80 tok/s on M5 Max 128 GB (up from 78), 67 tok/s on M5 Max 64 GB (up from 65), and 60 tok/s on M4 Pro 24 GB (up from 58). The gains come from better expert-routing fusion and a tuned KV-cache layout. For users running Ollama, the same speedup arrives via llama.cpp version 0.5.3 or later.
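
For scale, the uplift works out to between 2.6% and 3.4% depending on the machine. A small sketch using only the figures quoted above (the 1,000-token reply length is an arbitrary choice for illustration):

```python
# (before, after) tok/s at Q4_K_M, as quoted in the paragraph above
speeds = {
    "M5 Max 128 GB": (78, 80),
    "M5 Max 64 GB": (65, 67),
    "M4 Pro 24 GB": (58, 60),
}

REPLY_TOKENS = 1_000  # arbitrary reply length, for illustration only


def uplift_pct(before, after):
    """Relative throughput gain in percent."""
    return 100.0 * (after - before) / before


for machine, (before, after) in speeds.items():
    seconds = REPLY_TOKENS / after
    print(f"{machine}: +{uplift_pct(before, after):.1f}%, "
          f"{seconds:.1f} s per {REPLY_TOKENS} tokens")
```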

Install via the standard channels:

# Ollama (one-line install, ~20 GB download)
ollama run qwen4:32b-a3b

# MLX (fastest on Apple Silicon)
pip install mlx-lm
mlx_lm.generate --model mlx-community/Qwen4-32B-A3B-4bit \
  --prompt "Refactor this function for clarity..."

# LM Studio: search "Qwen 4 32B A3B" in Discover tab
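
To check a tok/s figure on your own machine, one option is Ollama's local HTTP API, whose non-streaming `/api/generate` response reports `eval_count` (tokens generated) and `eval_duration` (nanoseconds). The sketch below is a minimal example under stated assumptions: the helper names are hypothetical, the model tag is the one from the install command above, and it measures decode speed for a single prompt, which may differ from how LLMCheck measures.

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_request(model, prompt):
    """Build a non-streaming generate request for a local Ollama server."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False})
    return urllib.request.Request(
        OLLAMA_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )


def decode_tokens_per_second(result):
    """Generation speed from eval_count and eval_duration (nanoseconds)."""
    return result["eval_count"] / (result["eval_duration"] / 1e9)


def measure(model, prompt):
    """Send one prompt and return (text, tok/s). Needs a running Ollama server."""
    with urllib.request.urlopen(build_request(model, prompt)) as resp:
        result = json.loads(resp.read())
    return result["response"], decode_tokens_per_second(result)


# Offline illustration with a made-up response: 160 tokens in 2 seconds.
sample = {"eval_count": 160, "eval_duration": 2_000_000_000}
print(f"{decode_tokens_per_second(sample):.1f} tok/s")
```

With the server running and the model pulled, `measure("qwen4:32b-a3b", "Explain KV caching in two sentences.")` returns the reply text and the measured decode speed. Averaging several runs with a fixed prompt gives a steadier number than a single call.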

Verdict: The full Qwen 4 32B-A3B is the new default recommendation for any Mac with 24 GB RAM or more. It replaces the Preview at #1 on the LLMCheck leaderboard with a clean +2 point margin and is the highest-scoring open-weights model that runs on consumer hardware as of June 6, 2026.

Qwen 4 Coder — the top open-source Mac coder

One day after the full Qwen 4 release, Alibaba shipped Qwen 4 Coder 32B-A3B — the same MoE architecture, post-trained on roughly 4 trillion additional tokens of code, build logs, agentic trajectories, and curated PR reviews. The result is the single most important coding model release of 2026 so far. According to the LLMCheck index, Qwen 4 Coder scores 82% on SWE-Verified, which beats every Mac-runnable open model and matches Claude 4.5 Sonnet's published score within margin.

Coding Benchmarks

SWE-Verified: 82%
HumanEval: 96%
MBPP+: 91%
LiveCodeBench: 79%
Agentic SWE: 61%
LLMCheck Score: 72 / 100

Mac Performance

M5 Max 128 GB: 78 tok/s
M5 Max 64 GB: 65 tok/s
M4 Pro 24 GB: 58 tok/s
RAM (Q4_K_M): ~19 GB
License: Apache 2.0
Context: 1M native

The license is the story. Devstral, the previous open-source Mac coding leader, ships under a custom Mistral commercial license with a $1M-revenue commercial clause. Codestral 22B has similar restrictions. Qwen 4 Coder ships under full Apache 2.0 — meaning startups can finally ship coding agents and IDE plugins on locally-hosted weights with no licensing risk and no per-token API cost. This is the first time a permissively-licensed open-weights coding model has been the SWE-Verified leader on Mac-runnable hardware.

Agentic tool use is the second story. Qwen 4 Coder was specifically post-trained on multi-step coding trajectories — read file, edit file, run tests, observe output, iterate. According to LLMCheck agentic-SWE testing, the model successfully completes 61% of multi-turn coding tasks (compared to 48% for the base Qwen 4 and 52% for Devstral). For users building autonomous coding agents on Mac hardware, this is now the default model.
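
The trajectory described above (read file, edit file, run tests, observe output, iterate) can be sketched as a small loop. This is an illustrative skeleton, not Qwen's training harness or LLMCheck's test rig: `propose_edit` is a hypothetical stand-in for the model call, and the toy functions exist only so the loop runs without a model.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Files = Dict[str, str]


@dataclass
class AgentRun:
    solved: bool
    turns: int
    log: List[str] = field(default_factory=list)


def run_coding_agent(
    files: Files,
    propose_edit: Callable[[Files, str], Tuple[str, str]],
    run_tests: Callable[[Files], Tuple[bool, str]],
    max_turns: int = 5,
) -> AgentRun:
    """Edit, test, observe, repeat until tests pass or turns run out.

    propose_edit stands in for the model: it reads the files plus the
    last test output and returns (path, new_contents).
    """
    log: List[str] = []
    passed, output = run_tests(files)                 # observe the starting state
    turns = 0
    while not passed and turns < max_turns:
        path, contents = propose_edit(files, output)  # read files, propose an edit
        files[path] = contents                        # apply the edit
        passed, output = run_tests(files)             # run tests, observe output
        turns += 1
        log.append(f"turn {turns}: edited {path}, passed={passed}")
    return AgentRun(solved=passed, turns=turns, log=log)


# Toy stand-ins so the loop can run without a model or a real test runner.
def toy_tests(files: Files) -> Tuple[bool, str]:
    ok = "return a + b" in files["calc.py"]
    return ok, "ok" if ok else "FAILED: add(2, 3) returned -1"


def toy_model(files: Files, test_output: str) -> Tuple[str, str]:
    return "calc.py", files["calc.py"].replace("a - b", "a + b")


result = run_coding_agent({"calc.py": "def add(a, b):\n    return a - b\n"},
                          toy_model, toy_tests)
print(result)
```

A multi-turn success rate such as the 61% above would then be the share of tasks for which a loop like this ends with `solved=True` within the turn budget.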

Install via the standard channels:

# Ollama (~19 GB download)
ollama run qwen4:coder

# MLX
mlx_lm.generate --model mlx-community/Qwen4-Coder-32B-A3B-4bit \
  --prompt "Read this codebase and add OAuth flow"

# LM Studio: search "Qwen 4 Coder" in Discover tab

Llama 5 70B vs Mistral Voyage Pro 70B — the 70B race

The single most consequential June 2026 storyline is that the 70B-dense tier is suddenly competitive again. Meta shipped Llama 5 70B on June 4, and the same day Mistral countered with Voyage Pro 70B under Apache 2.0. Both are dense (no MoE), both target the 64 GB+ Mac tier, both are real frontier-adjacent models, and they trade blows on different axes.

Llama 5 70B vs Mistral Voyage Pro 70B — June 6, 2026.
	Llama 5 70B	Mistral Voyage Pro 70B
Total params	70B dense	70B dense
Context	256K	512K
License	Llama 5 Community (700M MAU cap)	Apache 2.0
MMLU	88%	85%
HumanEval	86%	84%
SWE-Verified	65%	68%
Agentic SWE	54%	58%
Tool-use accuracy	87%	91%
M5 Max 128 GB Q4	~18 tok/s	~20 tok/s
M4 Ultra 192 GB Q4	~22 tok/s	~20 tok/s
Multimodal	Text + image + audio	Text only
LLMCheck Score	64	63

Llama 5 70B wins on raw reasoning. Meta's training run includes a larger and cleaner reasoning corpus, and it shows in the MMLU (88%) and HumanEval (86%) numbers. It also retains the multimodal capability Meta introduced with Llama 5 Scout — the 70B accepts image and audio inputs natively, which neither Voyage Pro nor Qwen 4 can do. For pure capability benchmarks, Llama 5 70B is the strongest dense open model in existence as of June 6, 2026.

Mistral Voyage Pro 70B wins on license and agentic. The Apache 2.0 license is the entire pitch — no MAU cap, no field-of-use restrictions, no separate commercial license to negotiate. Combined with the 91% tool-use accuracy and 58% Agentic SWE score, Voyage Pro is the clear pick for production agent systems where licensing matters as much as capability. The 512K context (vs Llama 5's 256K) is icing.

Mac viability is real on both, but constrained. Both 70B dense models need approximately 42 GB of unified memory at Q4_K_M, so a 64 GB Mac can technically host them but leaves little headroom for long contexts. The realistic home for both is a Mac Studio M5 Max 128 GB (~18–20 tok/s) or M4 Ultra 192 GB (~20–22 tok/s). On M4 Ultra, Llama 5 70B actually edges Voyage Pro on speed thanks to better matmul shape compatibility with the Ultra's matrix engine.
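
The 42 GB figure can be reproduced with a rule of thumb: parameters times bits per weight, divided by eight. The sketch below assumes an average of about 4.8 bits per weight for Q4_K_M and that roughly 75% of unified memory is usable by the GPU. Both values are assumptions for illustration; real files and system limits vary.

```python
BITS_PER_WEIGHT_Q4_K_M = 4.8  # assumed average; real GGUF files vary by model


def weight_memory_gb(params_billions, bits_per_weight=BITS_PER_WEIGHT_Q4_K_M):
    """Approximate size of the quantized weights in GB (10^9 bytes)."""
    return params_billions * bits_per_weight / 8.0


def headroom_gb(mac_ram_gb, params_billions, usable_fraction=0.75):
    """Memory left for KV cache and apps after loading the weights.

    usable_fraction is an assumed share of unified memory available to the GPU.
    """
    return mac_ram_gb * usable_fraction - weight_memory_gb(params_billions)


for name, params in [("Qwen 4 32B-A3B", 32), ("Llama 5 70B", 70), ("Grok 4 Open", 100)]:
    print(f"{name}: ~{weight_memory_gb(params):.0f} GB of weights")

for ram in (64, 128, 192):
    print(f"{ram} GB Mac with a 70B model: ~{headroom_gb(ram, 70):.0f} GB headroom")
```

The same rule lands close to the other footprints quoted in this report (about 19 GB for the 32B model, about 60 GB against the ~58 GB listed for Grok 4 Open). Mixture-of-experts models are sized by total parameters, not active ones, because every expert has to be resident in memory.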

Practical recommendation: If your company has under 700 million monthly active users (i.e. you are not Apple, Meta, Google, or Amazon), pick Voyage Pro 70B for agent workloads and Llama 5 70B for multimodal or maximum reasoning. The license difference vanishes for 99.9% of users, but the agentic and tool-use gap is real and measurable.

Grok 4 Open — xAI's first open weights

On June 5, xAI did something it had never done before: it released model weights. Grok 4 Open is a 100-billion-parameter mixture-of-experts model with 20 billion active parameters per token, downloadable from Hugging Face and runnable in llama.cpp, MLX, and Ollama. This is the first time anyone outside xAI has been able to run a Grok model on local hardware, and the timing (the same week as Llama 5 70B and Voyage Pro 70B) was clearly not accidental.

Grok 4 Open

Total params: 100B
Active params: 20B
MMLU: 86%
SWE-Verified: 69%
AIME 2025: 82%
Tool use: 93% (integrated)

License & Mac

License: xAI Custom
Commercial: Yes (with attribution)
M5 Max 64 GB Q4: ~32 tok/s
M5 Max 128 GB Q4: ~36 tok/s
RAM (Q4_K_M): ~58 GB
LLMCheck Score: 60 / 100

The license is the asterisk. Grok 4 Open ships under the "xAI Custom License" — permissive enough to allow commercial use without an MAU cap (a meaningful improvement over Llama 5), but with two notable restrictions: attribution is required in any product that uses the model, and the weights cannot be used to train a competing foundation model. The Open Source Initiative has already declined to classify it as open source in the strict sense. For practical use, it is usable, but for downstream open-source projects, the attribution clause adds friction.

The capability profile is unusual. Grok 4 Open is below the leaders on most academic benchmarks (MMLU 86%, SWE-Verified 69%) but excels at integrated tool use (93%) and what xAI calls "real-time reasoning" — the model was trained alongside an integrated web-search and code-execution scaffold, and that training shows in agentic tasks. For users building Grok-style assistants with search and tool access, the model is unusually well-suited even though its raw benchmark numbers are mid-tier.

The vibes-check. Grok 4 Open is a culturally significant release — xAI joining the open-weights club shifts the political map of AI development. Every Western foundation lab except Anthropic and OpenAI now has at least one downloadable model. But on the practical leaderboard, Apache 2.0 and MIT models still win on license, and Qwen 4 still wins on capability per Mac dollar. Grok 4 Open is a notable moment, not a dethroning.

# Install Grok 4 Open
ollama run grok4:open      # ~58 GB, MoE 20B-active

# MLX equivalent
mlx_lm.generate --model mlx-community/Grok-4-Open-100B-A20B-4bit \
  --prompt "Summarize this PDF and answer questions"

Phi-5 Medium 14B tops its tier

Microsoft shipped Phi-5 Medium on May 30, two days ahead of June, and it instantly became the top-scoring 14B-class model. The pitch: MIT-licensed, dense, 14 billion parameters, scoring 86% MMLU and 75% AIME 2025. Those numbers match or exceed what the original GPT-4 (reportedly around 1.8T parameters) posted at launch. The "phi recipe" has scaled cleanly from Phi-5 Mini's 4B to Phi-5 Medium's 14B with no quality regression.

14B is the new sweet spot for 16 GB to 32 GB Macs. The model uses approximately 9 GB of RAM at Q4_K_M, runs at ~65 tok/s on M5 Max and ~32 tok/s on M2 Pro 16 GB, and fits comfortably alongside development tools on a 24 GB or 32 GB Mac. For users who want stronger reasoning than Qwen 4 4B can deliver but cannot afford to dedicate 20 GB of RAM to Qwen 4 32B-A3B, Phi-5 Medium is the new default. According to the LLMCheck index, it beats every other 14B-class open model on MMLU and AIME by clear margins.

The MIT license, the strong AIME 75% score, and the 64K native context (extended via sliding-window to 256K) make Phi-5 Medium the strongest pure-reasoning model in the 16 GB Mac tier — and a credible second-pick for 24 GB Macs that want to keep Qwen 4 32B-A3B unloaded for occasional use.

ollama run phi5:medium       # ~9 GB on disk
mlx_lm.generate --model mlx-community/Phi-5-Medium-14B-Instruct-4bit

Gemma 4.5 — Google's June refresh

Google quietly shipped Gemma 4.5 12B on June 2, a refresh rather than a new generation. The headline change is context: Gemma 4 shipped with 256K context; Gemma 4.5 jumps to 1M native, matching Qwen 4. Multimodal capability improved measurably too — Gemma 4.5 now handles audio inputs natively (previously vision-only), and image understanding scores climbed roughly 4 percentage points across MMMU and ChartQA.

Mac speed remains a Gemma strength: 75 tok/s on M5 Max at Q4_K_M, 58 tok/s on M4 Max, and ~12 GB of RAM. The Gemma license retains its prohibited-use restrictions but allows commercial use, and the LLMCheck Score lands at 68, which places Gemma 4.5 12B seventh in the open-source top 10. For users who want strong multimodal in a 16 GB Mac footprint, Gemma 4.5 is now the top choice.

ollama run gemma4.5:12b
mlx_lm.generate --model mlx-community/Gemma-4.5-12B-IT-4bit

The full landscape — Open-Source Top 10 (June 6, 2026)

According to the LLMCheck index across 109 standardized data points, here is the open-source leaderboard as of June 6, 2026. Score is the LLMCheck composite (capability + speed + accessibility + license, max 100). Mac Tier is the minimum unified memory needed to run Q4_K_M comfortably.

LLMCheck Open-Source Top 10 — June 6, 2026. See full leaderboard for all models.
Rank	Model	Family	Active	License	Mac Tier	Score
1	Qwen 4 32B-A3B	Alibaba	3B	Apache 2.0	24 GB	75
2	Qwen 4 Preview 32B-A3B	Alibaba	3B	Apache 2.0	24 GB	73
3	Qwen 4 Coder 32B-A3B	Alibaba	3B	Apache 2.0	24 GB	72
4	Qwen 4 4B	Alibaba	4B	Apache 2.0	8 GB	71
5	Phi-5 Mini	Microsoft	4B	MIT	8 GB	70
6	Qwen 3.6-35B-A3B	Alibaba	3B	Apache 2.0	24 GB	69
7	Gemma 4.5 12B	Google	12B	Gemma	16 GB	68
8	Gemma 4 E2B	Google	2.3B	Gemma	8 GB	67
9	DeepSeek R2	DeepSeek	37B	MIT	Server	66
10	Phi-5 Medium 14B	Microsoft	14B	MIT	24 GB	65

Three observations. First, Alibaba now holds half of the top 10: the Qwen family (full, Preview, Coder, 4B, and 3.6) occupies ranks 1, 2, 3, 4, and 6. This is unprecedented concentration in the open-LLM ecosystem and reflects how quickly Alibaba is iterating. Second, Apache 2.0 and MIT account for 8 of the top 10 entries, up from 7 in May. Permissive licensing has become a default expectation, not a differentiator. Third, four of the top 10 have an 8 GB or 16 GB Mac tier, so the entry-level MacBook Air has never had more credible model options.
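
These tallies can be recomputed directly from the table rows. The sketch below copies the Family, License, and Mac Tier columns as printed and counts them; nothing else is assumed.

```python
from collections import Counter

# (rank, model, family, license, mac_tier), copied from the table above
TOP_10 = [
    (1, "Qwen 4 32B-A3B", "Alibaba", "Apache 2.0", "24 GB"),
    (2, "Qwen 4 Preview 32B-A3B", "Alibaba", "Apache 2.0", "24 GB"),
    (3, "Qwen 4 Coder 32B-A3B", "Alibaba", "Apache 2.0", "24 GB"),
    (4, "Qwen 4 4B", "Alibaba", "Apache 2.0", "8 GB"),
    (5, "Phi-5 Mini", "Microsoft", "MIT", "8 GB"),
    (6, "Qwen 3.6-35B-A3B", "Alibaba", "Apache 2.0", "24 GB"),
    (7, "Gemma 4.5 12B", "Google", "Gemma", "16 GB"),
    (8, "Gemma 4 E2B", "Google", "Gemma", "8 GB"),
    (9, "DeepSeek R2", "DeepSeek", "MIT", "Server"),
    (10, "Phi-5 Medium 14B", "Microsoft", "MIT", "24 GB"),
]

families = Counter(row[2] for row in TOP_10)
licenses = Counter(row[3] for row in TOP_10)
permissive = licenses["Apache 2.0"] + licenses["MIT"]
small_tier = [row[1] for row in TOP_10 if row[4] in ("8 GB", "16 GB")]

print(f"Alibaba entries: {families['Alibaba']}")
print(f"Apache 2.0 or MIT entries: {permissive}")
print(f"8 GB or 16 GB tier entries: {len(small_tier)} ({', '.join(small_tier)})")
```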

5 things that changed in June 2026

1. Frontier-class is now 24 GB Mac territory

With Qwen 4 32B-A3B at score 75 (within striking distance of GPT-5o on every benchmark) and Qwen 4 Coder at 82% SWE-Verified, a 24 GB MacBook Pro now runs frontier-adjacent coding and reasoning models at production-usable speeds. Six months ago this combination required a $5,000+ Mac Studio. The democratization is real and the entry tier is now genuinely useful, not just symbolic.

2. 70B dense is the new battleground

Llama 5 70B and Mistral Voyage Pro 70B shipped on the same day. Both target the 64 GB+ Mac tier, both score in the 85–88% range on MMLU, and they trade blows on license vs. capability. Six months ago, the 70B tier was a Llama monopoly; today it is a competitive market with real choice. Expect Qwen 4 70B and a DeepSeek 70B-class entrant in the next 90 days.

3. xAI joined the open club

Grok 4 Open is a political milestone as much as a technical one. Every major Western lab except Anthropic and OpenAI now ships downloadable weights. The xAI Custom License is not Apache 2.0, but the gesture matters — it signals that the cost of staying closed is rising as the open ecosystem improves. According to LLMCheck cross-reference data, Grok 4 Open weights were downloaded 1.4 million times in its first 48 hours, comparable to a major Llama release.

4. 1M context became table stakes

Qwen 4 (1M native), Gemma 4.5 (1M native), and DeepSeek V4 Pro (1M native) all shipped or refreshed with 1M-token context windows, and Mistral Voyage Pro reached 512K. Six months ago, 256K was the open-source ceiling. Today 1M is the spec sheet expectation. For real-world Mac use, only Qwen 4 and Gemma 4.5 maintain >85% retrieval accuracy past 800K tokens, but the architectural floor has shifted up.

5. Apache 2.0 dominance hit 60%

According to LLMCheck license tracking across the top 25 open-weights models, Apache 2.0 share crossed 60% in June 2026 for the first time. Qwen 4, Qwen 4 Coder, Qwen 4 4B, Mistral Voyage Pro 70B, and Mistral Voyage 24B all ship under unrestricted Apache 2.0. MIT (Phi-5 family, DeepSeek R2) adds another 16%. The era of license uncertainty in open-weights LLMs is closing — permissive OSI licensing now dominates the top tier without ambiguity.

By Mac tier — what to run TODAY (June 2026)

Updated recommendations as of June 6, 2026. All numbers are LLMCheck-measured tok/s at Q4_K_M unless noted. Pick by RAM tier:

Mac tier recommendations, June 6, 2026.
Mac RAM	Primary pick	Speed	Backup
8 GB	Qwen 4 4B	135 tok/s	Phi-5 Mini (140 tok/s)
16 GB	Phi-5 Medium 14B	32 tok/s (M2 Pro)	Qwen 4 4B (115 tok/s)
24 GB	Qwen 4 Coder 32B-A3B	58 tok/s	Qwen 4 32B-A3B (60 tok/s)
32–48 GB	Qwen 4 + Voyage 24B	67 tok/s	Phi-5 Medium + Qwen 4 Coder
64 GB	Grok 4 Open	32 tok/s	Llama 5 Scout (38 tok/s)
128 GB	Llama 5 70B	18 tok/s	Voyage Pro 70B (20 tok/s)
M4 Ultra 192 GB	Llama 5 70B	~22 tok/s	Voyage Pro 70B (~20 tok/s)

The 24 GB sweet spot is now Qwen 4 Coder. For users on base-tier MacBook Pro hardware, the question "what's the best coding model I can run?" has a clean answer for the first time in 2026: ollama run qwen4:coder. The 82% SWE-Verified score puts it within margin of Claude 4.5 Sonnet, the Apache 2.0 license removes commercial concerns, and 58 tok/s is genuinely fast. This is the recommendation we'll be giving for the rest of the summer unless something dramatic ships.
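
The tier table can also be read as a simple lookup: take the largest tier whose RAM requirement fits your machine. A minimal sketch with the picks copied from the table above (the `recommend` helper is hypothetical, and the 32–48 GB row is keyed at 32 GB):

```python
# (minimum RAM in GB, primary pick, backup), from the recommendation table above
TIER_PICKS = [
    (8, "Qwen 4 4B", "Phi-5 Mini"),
    (16, "Phi-5 Medium 14B", "Qwen 4 4B"),
    (24, "Qwen 4 Coder 32B-A3B", "Qwen 4 32B-A3B"),
    (32, "Qwen 4 + Voyage 24B", "Phi-5 Medium + Qwen 4 Coder"),
    (64, "Grok 4 Open", "Llama 5 Scout"),
    (128, "Llama 5 70B", "Voyage Pro 70B"),
    (192, "Llama 5 70B", "Voyage Pro 70B"),
]


def recommend(ram_gb):
    """Return (primary, backup) for the largest tier that fits in ram_gb."""
    best = None
    for min_ram, primary, backup in TIER_PICKS:
        if ram_gb >= min_ram:
            best = (primary, backup)
    if best is None:
        raise ValueError("below the 8 GB tier covered by the table")
    return best


for ram in (8, 24, 36, 128):
    primary, backup = recommend(ram)
    print(f"{ram} GB: {primary} (backup: {backup})")
```

On a Mac, `sysctl -n hw.memsize` reports installed memory in bytes, which can be divided by 1024³ to get the value to pass in.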

What's coming next month

Speculative section — this is what LLMCheck is watching for in the next 30 days based on public roadmaps, leaked release notes, and credible community signals.

DeepSeek R3. The DeepSeek team publicly stated in May that R3 would target a "meaningful jump on AIME and a more efficient routing layer." Community signals suggest a July release. If R3 keeps the MIT license and improves the routing efficiency by even 15%, it could meaningfully expand the Mac viability of frontier reasoning — potentially making it runnable on 128 GB hardware rather than only 192 GB.
Qwen 5 Preview. Alibaba's cadence (Qwen 2 to 3 was nine months, Qwen 3 to 4 was seven months) suggests Qwen 5 lands in late August or early September. A July Preview drop is possible but not certain. If it ships, expect another architectural rebuild rather than another MoE refinement.
Apple MLX 2.0. WWDC26 is later this month, and the MLX team has been signaling a 2.0 release with first-party fine-tuning APIs, a stabilized graph compiler, and optimized kernels for the M5 series. If MLX 2.0 lands, expect a measurable tok/s lift across every model running on Apple Silicon, a free speed upgrade for the entire ecosystem.
Meta Llama 5 405B Frontier. Meta released the 8B, Scout, and 70B variants. A 405B "Frontier" tier has been rumored since the Llama 5 launch event mentioned "a larger model is coming." If it ships, it would be server-class only on Mac (needs 256 GB+ unified memory at Q2) but would directly target GPT-5o and Claude 4.5 Opus on closed benchmarks.
Microsoft Phi-5 Large. Microsoft's pattern with Phi-3 and Phi-4 was to ship Mini, Medium, then a "Large" variant (typically 28B). Phi-5 Large would slot directly into the 24 GB Mac tier and compete with Qwen 4 on reasoning. Watch for it in mid-to-late July.

Watch list: The single most consequential possible July release is a permissively-licensed coding model that beats Qwen 4 Coder on SWE-Verified. Qwen has the lead by a wider margin than any other model in any other category. If a competitor ships an Apache 2.0 or MIT coder above 85% SWE-Verified, the agent-platform market resets again.

LLMCheck Research Team

We benchmark local AI models on real Apple Silicon hardware. Our database covers 46+ open and closed models with 109 standardized tok/s measurements using Ollama, LM Studio, and MLX.

Frequently Asked Questions

What's the best open-source local LLM in June 2026?

According to the LLMCheck index (June 6, 2026), Alibaba's Qwen 4 32B-A3B (full release) is the #1 open-source local LLM with an LLMCheck Score of 75/100. It scores 78% on SWE-Verified, 89% MMLU, and 94% HumanEval, runs on 24 GB Macs at Q4_K_M, ships under Apache 2.0, and reaches 60–80 tok/s on M4 Pro through M5 Max.

What's new in Qwen 4 vs Qwen 4 Preview?

The full Qwen 4 release improves on the Preview by 1 to 2 percentage points on every public benchmark: SWE-Verified climbs from 76% to 78%, MMLU from 88% to 89%, HumanEval from 92% to 94%, and AIME from 89% to 91%. The architecture is identical (32B-A3B MoE) but the instruction-tuning and tool-use stages were rebuilt on a larger preference dataset. The LLMCheck Score rises from 73 to 75.

Can I run Llama 5 70B on a Mac?

Yes, but only on high-RAM Apple Silicon. Llama 5 70B at Q4_K_M needs roughly 42 GB of unified memory, so a 64 GB Mac is technically viable but tight. The realistic home is a Mac Studio M5 Max 128 GB (~18 tok/s) or Mac Studio M4 Ultra 192 GB (~22 tok/s). Below 64 GB total RAM, the 70B is not practical and Qwen 4 32B-A3B is the better choice.

Is Grok 4 Open actually open?

Partially. Grok 4 Open ships as a 100B-A20B mixture-of-experts model with weights downloadable from Hugging Face, but it uses the xAI Custom License rather than Apache 2.0 or MIT. The license permits commercial use without an MAU cap but adds a "no-competing-foundation-model" clause and attribution requirements. The Open Source Initiative does not classify Grok 4 Open as open source in the strict sense, but for practical use cases it is fully usable and the weights are real.

What's the fastest local LLM in June 2026?

Gemma 4 E2B still holds the raw tok/s crown at ~158 tok/s on M3 Max, but Qwen 4 4B is the fastest model that is also competitive on capability — 135 tok/s on M5 Max with 84% MMLU. For pure speed, Gemma 4 E2B wins; for speed-per-quality, Qwen 4 4B is the new June 2026 champion.

Mistral Voyage Pro 70B vs Llama 5 70B — which is better?

It depends on your use case. Llama 5 70B wins on raw reasoning (MMLU 88% vs 85%, HumanEval 86% vs 84%) but ships under the Llama 5 Community License with its 700M MAU cap. Mistral Voyage Pro 70B wins on license (Apache 2.0, unrestricted), on coding and agentic work (SWE-Verified 68% vs 65%, Agentic SWE 58% vs 54%, tool-use accuracy 91% vs 87%), and on Mac runtime with M5 Max 128 GB (~20 tok/s vs ~18 tok/s). For most teams shipping products, Voyage Pro is the better choice; for pure capability benchmarks, Llama 5 70B.

Why does Qwen 4 Coder beat Devstral?

Qwen 4 Coder 32B-A3B was post-trained on roughly 4 trillion tokens of code-specific data — a larger and more recent code corpus than the one Devstral was trained on. It scores 82% on SWE-Verified vs Devstral's 73%, ships under Apache 2.0 instead of Devstral's commercial license, and runs at ~58 tok/s on a 24 GB M4 Pro thanks to its MoE structure. For Mac coding agents in June 2026, Qwen 4 Coder is the new default.

Is Phi-5 Medium worth it over Qwen 4 4B?

If you have a 16 GB or 24 GB Mac and you primarily care about reasoning quality, yes — Phi-5 Medium 14B scores 86% MMLU and 75% AIME 2025, beating Qwen 4 4B by 2 to 14 percentage points on hard benchmarks. If you have an 8 GB Mac or you prioritize tok/s, Qwen 4 4B is the better pick at 135 tok/s vs Phi-5 Medium's ~65 tok/s on M5 Max.

Has open-source caught up to GPT-5 / Claude 4.5 yet?

On most benchmarks, yes. Qwen 4's 78% SWE-Verified is within 2 points of GPT-5o (80%), and Qwen 4 Coder's 82% matches Claude 4.5 Sonnet's published score within margin. DeepSeek R2 still leads everyone on AIME math. The gaps that remain are in long-horizon agentic workflows, video understanding, and tool-use reliability at production scale, where closed models still hold a measurable lead as of June 6, 2026.

What changed in licenses for June 2026?

Apache 2.0 share of the open top-25 reached 60% for the first time, driven by Qwen 4, Qwen 4 Coder, and Mistral Voyage Pro 70B all shipping under Apache 2.0. Grok 4 Open introduced a new "xAI Custom" license that is permissive but not OSI-approved. Llama 5 retained its 700M MAU cap. Gemma 4.5 kept the Gemma license. The trend is unambiguous: permissive OSI licensing now dominates the top tier.

