Moonshot AI Releases Open Kimi K2.5 Model, Beats All US Models On Humanity’s Last Exam, BrowseComp Benchmarks

Chinese labs continue to release top-quality AI models, and they’re now besting the top US models on some benchmarks.

Chinese AI startup Moonshot AI has launched Kimi K2.5, a powerful multimodal model that has achieved state-of-the-art results across multiple challenging benchmarks. Most notably, K2.5 has scored 50.2% on Humanity’s Last Exam (HLE-Full) with tools enabled, outperforming GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro. This marks a significant achievement for Chinese AI companies and underscores the growing competitiveness of open-source alternatives to proprietary US models.

A Native Multimodal Powerhouse

Kimi K2.5 builds on its predecessor, Kimi K2, through continued pretraining on approximately 15 trillion mixed visual and text tokens. As a native multimodal model, K2.5 delivers exceptional coding and vision capabilities, making it particularly strong for tasks that require reasoning across images, video, and text.

What sets K2.5 apart is its innovative self-directed agent swarm paradigm. For complex tasks, the model can orchestrate up to 100 sub-agents working in parallel, executing workflows across up to 1,500 coordinated tool calls. This parallel execution reduces completion time by up to 4.5× compared to traditional single-agent approaches, with the entire swarm automatically created and managed by K2.5 without predefined workflows.
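To make those two limits concrete, here is a minimal Python sketch of an orchestrator that caps concurrent sub-agents and shares one tool-call budget across the swarm. It is an illustration under stated assumptions, not Moonshot AI's implementation: the names ToolBudget, sub_agent, and run_swarm are hypothetical, and each tool call is simulated with a no-op await. Only the two caps (100 and 1,500) come from the article.

```python
import asyncio

MAX_SUB_AGENTS = 100   # concurrency cap quoted in the article
MAX_TOOL_CALLS = 1500  # tool-call cap quoted in the article


class ToolBudget:
    """Hypothetical shared counter that keeps the swarm within one budget."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def try_spend(self) -> bool:
        # asyncio runs coroutines on a single thread and there is no await
        # between the check and the increment, so no lock is needed here.
        if self.used >= self.limit:
            return False
        self.used += 1
        return True


async def sub_agent(calls_wanted: int, budget: ToolBudget,
                    slots: asyncio.Semaphore) -> int:
    """Hypothetical sub-agent: makes up to `calls_wanted` simulated tool calls."""
    async with slots:  # at most MAX_SUB_AGENTS hold a slot at once
        made = 0
        for _ in range(calls_wanted):
            if not budget.try_spend():
                break  # shared budget exhausted; stop early
            await asyncio.sleep(0)  # stand-in for a real tool call
            made += 1
        return made


async def run_swarm(num_tasks: int, calls_per_task: int) -> tuple[int, int]:
    """Run all sub-agents and return (calls made, budget used)."""
    budget = ToolBudget(MAX_TOOL_CALLS)
    slots = asyncio.Semaphore(MAX_SUB_AGENTS)
    results = await asyncio.gather(
        *(sub_agent(calls_per_task, budget, slots) for _ in range(num_tasks))
    )
    return sum(results), budget.used


if __name__ == "__main__":
    # 120 tasks x 20 calls would need 2,400 calls; the budget stops at 1,500.
    print(asyncio.run(run_swarm(num_tasks=120, calls_per_task=20)))
```

In this sketch the semaphore bounds how many sub-agents are active at once, while the shared counter bounds total work, so the two limits are enforced independently.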

Kimi K2.5 Benchmarks

K2.5 posts its headline results on agentic benchmarks, where AI systems must use tools and navigate complex multi-step tasks. There it leads the field with 50.2% on HLE-Full with tools, 74.9% on BrowseComp with context management, and 77.1% on DeepSearchQA.

In vision tasks, K2.5 achieves 78.5% on MMMU Pro and an impressive 86.6% on VideoMMMU, showcasing its ability to reason over both images and video. For coding, the model scores 76.8% on SWE-Bench Verified and 73.0% on SWE-Bench Multilingual, making it the strongest open-source coding model to date.

Particularly striking is K2.5’s performance with its Agent Swarm feature enabled. On BrowseComp, the swarm mode pushes performance to 78.4%, and on WideSearch, it achieves 79.0% item-f1, demonstrating the power of parallel, coordinated execution for complex research tasks.

Coding Capabilities

K2.5 excels at visual coding tasks, with particularly strong front-end development capabilities. The model can transform simple conversational prompts into complete, interactive websites with rich animations and scroll-triggered effects. More impressively, it can reconstruct functional websites from video demonstrations and debug code by visually inspecting outputs.

This breakthrough in autonomous visual debugging enables K2.5 to iterate on its own work, using visual inputs and documentation to refine outputs without human intervention. Moonshot AI has paired K2.5 with Kimi Code, an open-source terminal tool that integrates with popular IDEs including VS Code, Cursor, and Zed, supporting both image and video inputs.

China’s Open-Source AI Surge

Kimi K2.5’s launch comes amid a remarkable acceleration in Chinese AI development. Over recent months, Chinese companies including DeepSeek, Alibaba’s Qwen team, and now Moonshot AI have released increasingly capable models that rival or surpass leading US offerings.

Crucially, many of these Chinese models are fully open-source, contrasting sharply with the closed, proprietary nature of top US models like GPT-5.2, Claude Opus 4.5, and Gemini 3 Pro. This open approach democratizes access to cutting-edge AI technology and enables researchers and developers worldwide to build upon these foundations.

The implications extend beyond technical metrics. By achieving competitive or superior performance at a fraction of the cost, models like Kimi K2.5 challenge the assumption that frontier AI requires massive proprietary investments. On the HLE, BrowseComp, and SWE-Bench Verified benchmarks, K2.5 delivers strong results while remaining accessible to the broader developer community.

Scaling Out with Agent Swarms

K2.5’s Agent Swarm represents a shift from single-agent scaling to coordinated parallel execution. Trained using Parallel-Agent Reinforcement Learning (PARL), the system learns to decompose complex tasks into parallelizable subtasks, each handled by dynamically instantiated sub-agents.

The orchestrator agent evaluates performance using Critical Steps, a latency-oriented metric inspired by the critical path in parallel computation. Because a critical-path measure tracks the longest chain of dependent steps rather than total work, spawning additional subtasks is rewarded only when it meaningfully shortens execution time. In practice, Agent Swarm reduces critical steps by 3× to 4.5× compared to single-agent execution for complex search scenarios.
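The critical-path idea behind such a metric can be sketched in a few lines of Python. This is a generic longest-chain computation over a task graph, offered as an assumption about the general technique and not as Moonshot AI's exact definition of Critical Steps. The function name critical_steps, the task names, and the step counts are all made up for illustration.

```python
from functools import lru_cache


def critical_steps(steps: dict[str, int], deps: dict[str, list[str]]) -> int:
    """Return the step count of the longest chain of dependent subtasks.

    `steps` maps each subtask to the steps it needs; `deps` maps a subtask
    to the subtasks that must finish before it can start.
    """

    @lru_cache(maxsize=None)
    def finish(node: str) -> int:
        # A subtask starts once its slowest prerequisite has finished.
        start = max((finish(d) for d in deps.get(node, [])), default=0)
        return start + steps[node]

    return max(finish(node) for node in steps)


# Illustrative plan: one planning stage, six independent searches, one merge.
steps = {"plan": 2, "merge": 2, **{f"search_{i}": 5 for i in range(6)}}
deps = {f"search_{i}": ["plan"] for i in range(6)}
deps["merge"] = [f"search_{i}" for i in range(6)]

serial = sum(steps.values())           # single agent runs every step in turn
parallel = critical_steps(steps, deps)  # swarm is bound by the longest chain
print(serial, parallel)
```

With these made-up numbers a single agent needs 34 steps while the longest chain is 9, which shows why adding sub-agents helps only when the work actually splits into independent branches.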

Moonshot AI demonstrated this capability by tasking K2.5 Agent Swarm with identifying the top three YouTube creators across 100 niche domains. The system autonomously created 100 specialized sub-agents, conducted parallel searches, and aggregated 300 creator profiles into a structured spreadsheet, showcasing both scale and coordination.
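The shape of that demonstration, one worker per domain followed by a single aggregation step, can be approximated with Python's standard library. The sketch below is hypothetical throughout: find_top_creators returns placeholder rows instead of performing a real search, the domain names are invented, and a CSV string stands in for the spreadsheet.

```python
import csv
import io
from concurrent.futures import ThreadPoolExecutor

FIELDS = ["domain", "rank", "creator"]


def find_top_creators(domain: str, k: int = 3) -> list[dict[str, str]]:
    """Hypothetical sub-agent: returns placeholder rows, no real search."""
    return [
        {"domain": domain, "rank": str(rank), "creator": f"{domain}-creator-{rank}"}
        for rank in range(1, k + 1)
    ]


def build_sheet(domains: list[str]) -> str:
    """Fan out one worker per domain, then merge all rows into CSV text."""
    with ThreadPoolExecutor(max_workers=max(1, len(domains))) as pool:
        # pool.map yields results in input order, so rows stay grouped by domain.
        per_domain = list(pool.map(find_top_creators, domains))

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    for rows in per_domain:
        writer.writerows(rows)
    return out.getvalue()


domains = [f"niche-{i:03d}" for i in range(100)]  # invented domain names
sheet = build_sheet(domains)
print(len(sheet.splitlines()) - 1)  # data rows, excluding the header
```

One hundred domains with three creators each yields the 300 rows mentioned in the article; the aggregation step is deliberately sequential because it is cheap compared with the searches.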

Professional Productivity Applications

Beyond research and coding, K2.5 brings agentic intelligence to real-world knowledge work. The model handles high-density office tasks end-to-end, producing documents, spreadsheets, PDFs, and presentations directly through conversation.

On Moonshot AI’s internal productivity benchmarks, K2.5 shows a 59.3% improvement on AI Office tasks and a 24.3% improvement on General Agent workflows compared to K2 Thinking. The model supports advanced features including Word annotations, financial modeling with pivot tables, and LaTeX equations in PDFs, while scaling to outputs like 10,000-word papers or 100-page documents.

Access and Availability

Kimi K2.5 is now available via Kimi.com, the Kimi App, and through an API. The platform offers four modes: K2.5 Instant, K2.5 Thinking, K2.5 Agent, and K2.5 Agent Swarm (currently in beta). Agent Swarm is initially available to high-tier paid users, who receive free credits.

For developers, Kimi Code provides command-line access to K2.5’s agentic coding capabilities, with automatic discovery and migration of existing skills and Model Context Protocol (MCP) integrations into the working environment.

Looking Forward

Kimi K2.5 represents a meaningful milestone for both Chinese AI development and the open-source community. By achieving state-of-the-art performance on Humanity’s Last Exam and other challenging benchmarks while remaining fully open-source, Moonshot AI demonstrates that frontier AI capabilities need not be locked behind proprietary walls.

As Chinese AI companies continue releasing increasingly capable models, the competitive landscape shifts. The choice is no longer simply between closed US models and less capable alternatives: open-source options now compete directly on performance while offering greater accessibility, transparency, and opportunity for innovation. For researchers, developers, and organizations weighing AI deployment options, K2.5’s combination of strong benchmark results, agent swarm architecture, and open availability makes it a serious contender.
