Kimi K2.6 Ranks 4th On Artificial Analysis Intelligence Index, Right Behind Top Three US Frontier Models

Top Chinese models are now breathing down the necks of their frontier US lab counterparts — all while being open-source.

Moonshot AI’s Kimi K2.6 has debuted at #4 on the Artificial Analysis Intelligence Index v4.0 with a score of 54, just three points behind the trio of Anthropic (Claude Opus 4.7), Google (Gemini 3.1 Pro), and OpenAI (GPT-5.4), all of which are tied at 57. It’s a remarkable placement for an open-weights model that anyone can download, run, and build on: no API key required, no access waitlist, no proprietary lock-in.

A New High-Water Mark For Open-Source AI

The Artificial Analysis Intelligence Index v4.0 is one of the more rigorous composite rankings in the industry, incorporating 10 evaluations across general knowledge, agentic performance, coding, and reasoning: GDPval-AA, τ²-Bench Telecom, Terminal-Bench Hard, SciCode, AA-LCR, AA-Omniscience, IFBench, Humanity’s Last Exam, GPQA Diamond, and CritPt.

Against that battery, Kimi K2.6 doesn’t just hold its own — it outpaces every other open-weights model by a significant margin. The next best open model on the index sits at 51 (GLM-5.1), a full three points behind. More strikingly, Kimi K2.6 at 54 now exceeds where many of last year’s closed frontier models sat. For context, models that were considered state-of-the-art less than six months ago are now being outranked by an open-source Chinese release.
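The placement arithmetic can be checked with a short sketch. It uses only the five scores quoted in this article; the full leaderboard contains more models, so the table here is an illustrative subset, and the helper name `competition_rank` is hypothetical.

```python
# Index scores as quoted in this article (illustrative subset of the leaderboard).
scores = {
    "Claude Opus 4.7": 57,
    "Gemini 3.1 Pro": 57,
    "GPT-5.4": 57,
    "Kimi K2.6": 54,
    "GLM-5.1": 51,
}


def competition_rank(name, table):
    """Rank is 1 plus the number of models with a strictly higher score,
    so tied models share a rank and the next model skips ahead."""
    return 1 + sum(1 for s in table.values() if s > table[name])


kimi_rank = competition_rank("Kimi K2.6", scores)
gap_to_leaders = max(scores.values()) - scores["Kimi K2.6"]
lead_over_glm = scores["Kimi K2.6"] - scores["GLM-5.1"]

print(f"Kimi K2.6 rank: {kimi_rank}")
print(f"Gap to the leaders: {gap_to_leaders} points")
print(f"Lead over GLM-5.1: {lead_over_glm} points")
```

With three models tied at 57, the next model lands in fourth place rather than second, which is why a three-point deficit corresponds to the #4 position.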

Chinese open-source models have already displaced US open models as the developer community’s go-to choice. Kimi K2.6 is the latest, and one of the most dramatic, demonstrations of that shift.

Agentic Performance: The Sharpest Gain

The most significant jump in K2.6 is on GDPval-AA, Artificial Analysis’ benchmark for general agentic performance — the kind of sustained, multi-step knowledge work that enterprises actually care about: preparing presentations, running analysis, synthesising research. Kimi K2.6 achieves an Elo of 1520 on this metric, up sharply from Kimi K2.5’s score of 1309.
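To get a feel for what a 211-point Elo difference means, here is a minimal sketch. It assumes GDPval-AA ratings follow the conventional Elo logistic scale (base 10, 400-point divisor); the article does not say which scale Artificial Analysis uses, so treat the result as indicative only.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the conventional Elo formula.

    Assumption: base-10 logistic with a 400-point divisor. The benchmark's
    actual rating scale is not stated in the article.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


K2_6_ELO = 1520  # Kimi K2.6 on GDPval-AA, as quoted above
K2_5_ELO = 1309  # Kimi K2.5 on GDPval-AA, as quoted above

elo_gap = K2_6_ELO - K2_5_ELO
expected = elo_expected_score(K2_6_ELO, K2_5_ELO)

print(f"Elo gap: {elo_gap}")
print(f"Expected head-to-head score for K2.6: {expected:.2f}")
```

Under that assumption, K2.6 would be expected to be preferred over K2.5 in roughly three out of four head-to-head comparisons.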

On tool use, K2.6 maintains a 96% score on τ²-Bench Telecom, placing it firmly among frontier models. Kimi K2 Thinking had already beaten Grok 4 and Gemini 2.5 Pro on the Artificial Analysis rankings. K2.6 continues that trajectory.

Low Hallucination Rate: A Differentiator Worth Noting

Hallucination is often the silent killer of enterprise AI adoption. On the AA-Omniscience Index — which measures both factual accuracy and the willingness to abstain rather than fabricate — Kimi K2.6 scores 6, driven primarily by a hallucination rate of just 39%, down from Kimi K2.5’s 65%. That’s a significant reduction across a single model generation.
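The size of that drop can be expressed both in percentage points and as a relative reduction. The sketch below uses only the two rates quoted above; the function name `reduction` is an illustrative choice.

```python
def reduction(old_rate: float, new_rate: float):
    """Return (absolute drop in percentage points, relative drop as a fraction)."""
    absolute = old_rate - new_rate
    relative = absolute / old_rate
    return absolute, relative


K2_5_HALLUCINATION = 65  # percent, as quoted above
K2_6_HALLUCINATION = 39  # percent, as quoted above

abs_points, rel_fraction = reduction(K2_5_HALLUCINATION, K2_6_HALLUCINATION)

print(f"Absolute drop: {abs_points} percentage points")
print(f"Relative drop: {rel_fraction:.0%}")
```

The drop is 26 percentage points, or 40% relative to K2.5's rate.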

This puts K2.6 in comparable territory to Claude Opus 4.7 (36% hallucination rate) and MiniMax-M2.7 (34%) — closed, proprietary models from established frontier labs. For an open-weights model to reach that standard is notable. Kimi K2.5 had already topped the Artificial Analysis Intelligence Index as the strongest open model before K2.6 raised the bar further.

Token Usage: Thinking At Scale

Kimi K2.6’s intelligence comes at a computational cost. To run the full Artificial Analysis Intelligence Index, the model consumed approximately 160 million reasoning tokens — substantially more than its predecessor Kimi K2.5. That positions it between Claude Sonnet 4.6 (~190M reasoning tokens) and GPT-5.4 (~110M reasoning tokens) in terms of compute intensity. The model thinks hard, and that thinking shows up in results.

It’s worth noting the broader pattern here: OpenRouter data shows that Chinese models triggered usage spikes that held well beyond initial launch periods — a sign of genuine production adoption, and not just launch-week curiosity.

Architecture and Access

Kimi K2.6 is a Mixture-of-Experts model with 1 trillion total parameters and 32 billion active — the same architecture as Kimi K2 Thinking and K2.5 before it. It supports image and video input with text output natively, and carries a 256k token context window. The model is accessible through Moonshot’s first-party API as well as third-party providers Novita, Baseten, Fireworks, and Parasail.
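The Mixture-of-Experts figures imply that only a small share of the weights is used for each token. The sketch below works that out, along with a rough weight-storage estimate. The bit widths are hypothetical examples, not the precision of the released checkpoint, and the estimate ignores activations, KV cache, and any serving overhead.

```python
TOTAL_PARAMS = 1_000_000_000_000  # 1 trillion total parameters, as quoted above
ACTIVE_PARAMS = 32_000_000_000    # 32 billion active per token, as quoted above

# Share of the model's weights that participate in each forward pass.
active_fraction = ACTIVE_PARAMS / TOTAL_PARAMS


def weight_storage_gb(params: int, bits_per_param: int) -> float:
    """Raw weight storage in gigabytes (10^9 bytes) at a given precision."""
    return params * bits_per_param / 8 / 1e9


print(f"Active fraction per token: {active_fraction:.1%}")
for bits in (16, 8, 4):  # hypothetical precisions, for illustration only
    print(f"{bits}-bit weights: ~{weight_storage_gb(TOTAL_PARAMS, bits):,.0f} GB")
```

All 1 trillion parameters must be stored even though only about 3.2% are active per token, so "anyone can run it" still assumes substantial hardware.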

Andreessen Horowitz has estimated an 80% chance that startups they encounter are using Chinese AI models. Kimi K2.6’s combination of frontier-class intelligence, open weights, and broad third-party availability makes it easy to see why.

The Bigger Picture

The Artificial Analysis Intelligence Index rankings tell a clean story: the gap between Chinese open-source and US proprietary frontier models is now three index points. Six months ago, the gap was of a different kind entirely: open-source models were not even in the same conversation as Claude, GPT, or Gemini at their peaks.

The Kimi model family has iterated rapidly across K2, K2 Thinking, K2.5, and now K2.6, with each release pushing further into territory previously held exclusively by closed US labs. At some point — perhaps sooner than expected — three index points stops being a gap and starts being noise.