OpenRouter’s monthly leaderboard is one of the cleaner signals in AI: it tracks token consumption across thousands of developers and apps, which means it reflects actual usage, not just benchmark hype. The June 2026 rankings tell a clear story: Chinese open-source models have taken over the top of the chart, Anthropic’s Claude family holds firm in third and fourth place, and the rest of the field is scrambling for scraps.
Here’s a breakdown of every model in the top 10.

1. DeepSeek V4 Flash — 10.9T tokens (+995%)
The runaway leader. DeepSeek V4 Flash is an MoE model built around DeepSeek Sparse Attention (DSA) and token-wise compression, making 1M-context inference practical at scale, and DeepSeek offers it as the default. The +995% growth, roughly an elevenfold increase in token consumption, reflects a model that hit production pipelines hard and stayed there.
It does come with a caveat: V4 Flash hallucinates 96% of the time when it doesn’t know an answer, preferring a confident wrong response over abstention. For production deployments where accuracy matters more than throughput, that’s a meaningful risk. But for developers optimizing for speed and volume, the price-to-performance ratio is hard to beat.
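To make the growth figures concrete, here is a minimal sketch of the arithmetic behind a percent-change label. It assumes the leaderboard percentages are month-over-month changes in the same token count, which the rankings imply but do not spell out; the function names are illustrative only.

```python
def growth_multiplier(pct_change: float) -> float:
    """Convert a percent change (995 for +995%) into a multiplier on the prior value."""
    return 1.0 + pct_change / 100.0


def implied_previous(current_tokens_t: float, pct_change: float) -> float:
    """Back out the prior-period volume (in trillions of tokens).

    Assumes pct_change is measured against the immediately preceding period.
    """
    return current_tokens_t / growth_multiplier(pct_change)


# DeepSeek V4 Flash: 10.9T tokens at +995%
flash_multiplier = growth_multiplier(995)     # about 10.95x the prior volume
flash_previous = implied_previous(10.9, 995)  # just under 1T tokens

# DeepSeek V3.2: 4.31T tokens at -14%
v32_previous = implied_previous(4.31, -14)    # about 5.0T tokens

print(f"V4 Flash: {flash_multiplier:.2f}x, implied prior volume {flash_previous:.2f}T")
print(f"V3.2: implied prior volume {v32_previous:.2f}T")
```

Under that assumption, a +995% label means the current volume is about 10.95 times the previous one, not 9.95 times.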
2. Hy3 Preview — 10.7T tokens (+>999%)
Tencent’s Hy3 Preview is the biggest surprise on the leaderboard. Released in late April 2026 — less than three months after Tencent rebuilt its pre-training infrastructure from scratch — it went from zero to near-parity with DeepSeek V4 Flash in a single month.
Hy3 is a 295B-parameter MoE model with only 21B active parameters per inference pass. It’s optimized for agentic workflows, long-context understanding, and instruction following. On BrowseComp, a benchmark for complex web research, it reached 67.1%, a dramatic jump from Hy2’s 28.7%. Its pricing — $0.063 per million input tokens — makes it one of the most accessible models on the market. The >999% growth figure tells you everything: this model essentially didn’t exist on OpenRouter last month.
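Two of the numbers above are easy to put in perspective with a few lines of arithmetic. The sketch below uses only the figures quoted in this section (295B total parameters, 21B active, $0.063 per million input tokens); the helper names and the one-billion-token workload are hypothetical, chosen for illustration.

```python
def active_fraction(active_params_b: float, total_params_b: float) -> float:
    """Share of an MoE model's parameters used on a single inference pass."""
    return active_params_b / total_params_b


def input_cost_usd(input_tokens: int, price_per_million_usd: float) -> float:
    """Cost of the input side of a workload at a flat per-million-token price."""
    return input_tokens / 1_000_000 * price_per_million_usd


hy3_active = active_fraction(21, 295)                     # roughly 7% of the weights
hy3_billion_cost = input_cost_usd(1_000_000_000, 0.063)   # about $63

print(f"Hy3 active share: {hy3_active:.1%}")
print(f"Input cost for 1B tokens: ${hy3_billion_cost:.2f}")
```

Only about one parameter in fourteen is active per pass, which helps explain how a model of this size can be priced so low. Output-token pricing is not quoted here, so the cost figure covers input only.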
3. Claude Opus 4.7 — 7.48T tokens (+197%)
Claude Opus 4.7 is Anthropic’s flagship publicly available model, and it’s the highest-ranked closed-source model on the leaderboard. The 197% growth isn’t a viral surprise; it’s steady production adoption.
Opus 4.7 leads GPT-5.4 and Gemini 3.1 Pro on most key agentic benchmarks, including SWE-bench Pro (64.3%) and SWE-bench Verified (87.6%). Anthropic’s data shows it tops the Artificial Analysis GDPval-AA benchmark for general agentic performance across 44 occupations. Claude Code now accounts for roughly 4% of all public GitHub commits — and Opus 4.7 is what’s powering most of those workflows. For enterprise developers who need reliability alongside raw capability, Opus 4.7 is the benchmark.
4. Claude Sonnet 4.6 — 7.45T tokens (+34%)
Claude Sonnet 4.6 is the workhorse of the Claude lineup — fast, cost-efficient, and capable enough for most production use cases that don’t require Opus-level reasoning. The more modest 34% growth reflects a model that’s already deeply embedded in workflows rather than one riding a launch spike. Sitting just 30 billion tokens behind Opus 4.7 suggests many teams are running both, routing simpler tasks to Sonnet and harder ones to Opus.
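The routing pattern described above can be sketched as a small selection function. This is a hypothetical illustration, not how any particular team or OpenRouter itself routes traffic: the model identifiers are placeholder strings rather than confirmed API slugs, and the difficulty heuristic (an explicit flag plus a prompt-length threshold) is an assumption.

```python
from dataclasses import dataclass

# Placeholder identifiers for illustration; check your provider for the real names.
SONNET = "claude-sonnet-4.6"
OPUS = "claude-opus-4.7"


@dataclass
class Task:
    prompt: str
    needs_multistep_reasoning: bool = False  # set by the caller or an upstream classifier


def pick_model(task: Task, long_prompt_chars: int = 8000) -> str:
    """Send hard or very long tasks to the larger model, everything else to the cheaper one."""
    if task.needs_multistep_reasoning or len(task.prompt) > long_prompt_chars:
        return OPUS
    return SONNET


simple = Task("Summarize this paragraph in one sentence.")
hard = Task("Refactor this service and fix the failing tests.", needs_multistep_reasoning=True)

print(pick_model(simple))
print(pick_model(hard))
```

A setup like this would produce roughly the split the leaderboard shows: both models carrying similar volume, with the cheaper one absorbing routine calls.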
5. Owl Alpha — 5.03T tokens (+>999%)
OpenRouter’s own model, built specifically for the platform. The >999% growth suggests it’s functioning as a default fallback or routing layer for traffic that doesn’t specify a model. Without external benchmarks to evaluate it against, it’s difficult to assess on capability alone — but its position here says something about the value of platform-native distribution.
6. Gemini 3 Flash Preview — 4.6T tokens (+3%)
Google’s Gemini 3 Flash is designed for high-frequency, latency-sensitive workflows — the kind of production pipelines where you’re calling a model thousands of times a day and raw intelligence is less important than speed and cost. The near-flat 3% growth puts it in a different category from the models above it: this is a mature, stable choice for teams that have already built around it. Gemini 3 Pro dominates the intelligence benchmarks in Google’s lineup, but Flash is where the volume lives.
7. DeepSeek V4 Pro — 4.54T tokens (+739%)
The more powerful sibling to V4 Flash. DeepSeek V4 Pro runs 1.6 trillion total parameters with 49 billion active, more than double V3’s total parameter count, and scores 52 on the Artificial Analysis Intelligence Index, making it the #2 open-weights model behind Kimi K2.6. The 739% growth is high, but V4 Flash’s 995% suggests developers are preferring speed over peak capability for most workloads. V4 Pro carries the same hallucination caveat as Flash, at a rate of 94%.
8. DeepSeek V3.2 — 4.31T tokens (-14%)
The only model in the top 10 with declining usage. DeepSeek V3.2 is being cannibalized by its successors — both V4 Flash and V4 Pro offer better performance at competitive prices, and the 14% drop is a predictable consequence of the V4 launch. V3.2 remains a capable reasoning-first model built for agentic tasks, and it will likely retain a long tail of users who’ve built stable pipelines around it. But the trajectory is clear.
9. Kimi K2.6 — 3.72T tokens (+1%)
Moonshot AI’s Kimi K2.6 is the top-ranked open-weights model on the Artificial Analysis Intelligence Index at 54 — just three points behind the closed-source trio of Claude Opus 4.7, GPT-5.4, and Gemini 3.1 Pro, all tied at 57. It’s a 1-trillion-parameter MoE model with 32B active parameters and native support for image and video input.
The nearly flat growth (+1%) despite strong benchmark performance is interesting. Kimi K2.6 may be a developers’ model — widely respected, heavily evaluated, but not yet embedded in the kind of high-volume pipelines that drive token counts into the trillions. The Chinese open-source surge is real, but token leadership still correlates with price and infrastructure availability, not just benchmark rank.
10. Nemotron 3 Super (free) — 2.65T tokens (+3%)
Nvidia’s entry into the open-weights race. Nemotron 3 Super is a 120B-parameter hybrid Mamba-Transformer MoE model that scores 48 on the Artificial Analysis Intelligence Index — the highest of any US open-weights model, though still behind the Chinese-led frontier. Nvidia offers it free on OpenRouter, which explains its position here. The 3% growth is modest but steady. Its real advantage isn’t intelligence — it’s inference speed, serving over 300 tokens per second compared to 50–100 for comparable Chinese models. For latency-sensitive workloads, that matters.
The bigger picture
The June 2026 leaderboard makes the structural shift in AI hard to ignore. Chinese open-source models occupy five of the top ten spots, and the two at the very top each grew by roughly 1,000% or more in a single month. Chinese models have already displaced US open models as the developer community’s default choice, not because of benchmark games, but because they’re fast, cheap, and available.
Anthropic is the exception among Western labs, holding positions three and four through genuine production utility. The Claude family’s staying power comes from agentic performance and enterprise trust — qualities that take longer to build but are harder to displace.
The decline of DeepSeek V3.2 is also worth noting as a structural signal: in this market, even strong models become obsolete within a few months. The labs releasing new architectures fastest are winning the leaderboard — and right now, that race is being run largely out of China.
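For readers who want to check the overall picture, the token counts listed above can be tallied by group. This is a rough sketch: the token figures are copied from the top 10, while the three group labels are this article's reading of each model's origin rather than an official OpenRouter classification.

```python
# (model, trillions of tokens, group) as listed in the top 10 above.
LEADERBOARD = [
    ("DeepSeek V4 Flash", 10.9, "chinese_open"),
    ("Hy3 Preview", 10.7, "chinese_open"),
    ("Claude Opus 4.7", 7.48, "anthropic"),
    ("Claude Sonnet 4.6", 7.45, "anthropic"),
    ("Owl Alpha", 5.03, "other"),
    ("Gemini 3 Flash Preview", 4.6, "other"),
    ("DeepSeek V4 Pro", 4.54, "chinese_open"),
    ("DeepSeek V3.2", 4.31, "chinese_open"),
    ("Kimi K2.6", 3.72, "chinese_open"),
    ("Nemotron 3 Super (free)", 2.65, "other"),
]


def token_share_by_group(rows):
    """Return each group's share of the total tokens across the listed models."""
    total = sum(tokens for _, tokens, _ in rows)
    shares = {}
    for _, tokens, group in rows:
        shares[group] = shares.get(group, 0.0) + tokens / total
    return shares


shares = token_share_by_group(LEADERBOARD)
for group, share in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{group}: {share:.1%}")
```

These shares cover only the ten models listed here, not all traffic on the platform.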