China's Open Kimi K2 Thinking Model Beats GPT-5, Sonnet 4.5 On Humanity's Last Exam & Agentic Tasks, Tops Benchmarks

An open model from China has beaten state-of-the-art frontier models on many benchmarks.

Moonshot AI has released Kimi K2 Thinking, an open-weights reasoning model that has outperformed OpenAI’s GPT-5 and Anthropic’s Claude Sonnet 4.5 across multiple challenging benchmarks. The model achieved a state-of-the-art score of 44.9% on Humanity’s Last Exam, a rigorous test featuring expert-level questions across various subjects, surpassing GPT-5 and finishing well ahead of Claude Sonnet 4.5. On BrowseComp, a benchmark measuring agentic search and browsing capabilities, K2 Thinking scored 60.2%, again topping both competing models.

The model demonstrates particularly strong performance in agentic tasks requiring extended autonomous operation. K2 Thinking can execute between 200 and 300 sequential tool calls without human intervention, showcasing sophisticated reasoning capabilities that allow it to complete complex multi-step workflows independently. This capability translated into strong showings on coding benchmarks as well, with the model achieving 61.1% on SWE-Multilingual and 71.3% on SWE-bench Verified, both measuring agentic coding ability. On LiveCodeBench V6, a competitive programming benchmark, K2 Thinking scored 83.1%, demonstrating robust performance across both research-oriented and practical coding challenges.
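For readers unfamiliar with what "sequential tool calls" means in practice, the loop below is a minimal sketch of the general pattern, not Moonshot AI's implementation. The names `run_agent`, `scripted_model`, and `TOOLS` are hypothetical, and a scripted stand-in replaces the real model so the example runs on its own. The `max_calls` cap of 300 simply mirrors the upper figure quoted above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical types for illustration: a tool call is (tool_name, argument).
ToolCall = Tuple[str, str]
ModelStep = Callable[[List[str]], Optional[ToolCall]]


def run_agent(model_step: ModelStep,
              tools: Dict[str, Callable[[str], str]],
              max_calls: int = 300) -> List[str]:
    """Repeatedly ask the model for the next tool call until it stops
    or the call budget is exhausted. Returns the transcript of calls."""
    history: List[str] = []
    for _ in range(max_calls):
        call = model_step(history)
        if call is None:  # the model signals it is finished
            break
        name, arg = call
        result = tools[name](arg)  # execute the tool with no human in the loop
        history.append(f"{name}({arg}) -> {result}")
    return history


def scripted_model(history: List[str]) -> Optional[ToolCall]:
    """Stand-in for a real model: follows a fixed two-step plan."""
    plan = [("search", "Kimi K2"), ("summarize", "results")]
    return plan[len(history)] if len(history) < len(plan) else None


# Toy tools (assumptions for the sketch, not real APIs).
TOOLS: Dict[str, Callable[[str], str]] = {
    "search": lambda query: f"3 hits for {query}",
    "summarize": lambda text: f"summary of {text}",
}

if __name__ == "__main__":
    for line in run_agent(scripted_model, TOOLS):
        print(line)
```

In a real deployment the scripted function would be replaced by a call to the model's API, and each step's result would be fed back into the model's context so it can decide what to do next.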

Built with a 256K context window, K2 Thinking represents Moonshot AI’s latest work in test-time scaling, expanding both thinking tokens and tool-calling capabilities. The model is currently available on kimi.com in chat mode, with full agentic mode launching soon, and can also be accessed via API. Kimi K2 Thinking is open-weights but not open source. It’s been released under a modified MIT license.

Breaking the Frontier Lab Monopoly

K2 Thinking’s performance marks a significant milestone in AI development: it’s rare for an open model to outperform closed, proprietary systems from leading frontier labs like OpenAI and Anthropic. Historically, the most capable models have remained behind API walls, with companies citing safety concerns and competitive advantages as reasons to restrict access to model weights and architectures. An open model that posts superior results on difficult benchmarks undercuts this paradigm and suggests that the performance gap between open and closed models may be narrowing faster than many anticipated.

The implications extend beyond benchmark numbers. Open models enable researchers, developers, and organizations to inspect, modify, and deploy AI systems without depending on commercial API providers. This accessibility can accelerate innovation, reduce costs, and provide greater control over AI deployments, particularly for applications requiring on-premises hosting or customization. K2 Thinking’s ability to handle complex agentic workflows while remaining openly available could prove especially valuable for enterprises seeking to build sophisticated AI-powered automation without vendor lock-in.

China’s Rising AI Influence

Kimi K2 Thinking’s performance underscores China’s growing prominence in the global AI landscape. Moonshot AI, the company behind the Kimi series of models, was founded in March 2023 and is headquartered in Beijing, with a mission to develop Artificial General Intelligence (AGI) through advanced large language models (LLMs). The company was co-founded by Yang Zhilin, a Tsinghua University and Carnegie Mellon University alumnus with experience at Google Brain and Meta AI.

While American companies have dominated AI development headlines in recent years, Chinese firms and research institutions have made substantial investments in both foundational research and practical applications. Moonshot AI joins a cohort of Chinese AI companies, including DeepSeek, Alibaba, and Baidu, that have released increasingly competitive models, often matching or exceeding the capabilities of Western counterparts.

The rise of Kimi and other Chinese models reflects several factors driving Chinese AI development. Significant government support through industrial policy and research funding has created an ecosystem conducive to rapid advancement. Chinese tech companies have also invested heavily in computational infrastructure, with access to substantial GPU clusters and training resources. Additionally, China’s large domestic market provides extensive opportunities for data collection and real-world testing of AI systems.

The release of K2 Thinking as an open model is particularly noteworthy in the context of ongoing technological competition between the United States and China. While U.S. export controls have restricted China’s access to cutting-edge AI chips, Chinese companies have demonstrated an ability to achieve strong results despite hardware constraints, often through algorithmic innovation and efficient training techniques. The decision to release K2 Thinking openly may also reflect a strategic calculation: by making the model widely available, Moonshot AI can accelerate adoption, build ecosystem support, and establish technical standards that could influence the broader direction of AI development.

There has been concern in the US over China’s rapid AI progress. NVIDIA CEO Jensen Huang has warned that China will win the AI race if the US doesn’t redouble its AI efforts, and former Google CEO Eric Schmidt and VC Marc Andreessen have said that winning the AI race is a geostrategic imperative for the US. Just two weeks ago, Chinese open-source models overtook US open-source models in downloads. And with an open Chinese model now beating the best closed US models on several benchmarks, it does appear that China’s dominance in AI might come a lot sooner than most people expect.