Chinese models are now regularly beating frontier US models on some benchmarks.
Minimax, one of China’s leading AI companies, has unveiled its latest large language model, M2.5, which is already making headlines for surpassing top systems from OpenAI, Anthropic, and Google on several competitive AI benchmarks. According to the company, M2.5 delivers state-of-the-art (SOTA) performance in software engineering, reasoning, and agentic tasks while maintaining remarkable efficiency and scalability.

On the SWE-Bench Verified coding benchmark, a key industry measure of code generation and software reasoning, Minimax M2.5 achieved 80.2%, edging out OpenAI’s GPT-5.2 (80.0%) and Google’s Gemini 3 Pro (78%), while coming within touching distance of Anthropic’s Claude Opus 4.6 (80.8%).
Beyond coding, M2.5 also demonstrated leading performance across other practical productivity and agentic evaluation suites, achieving 76.3% on BrowseComp (web search and context), 76.8% on BFCL Multi-Turn (tool-use reasoning), 74.4% on MEWC (multi-expert workflow coordination), and 54.2% on VIBE-Pro (office productivity).
Minimax claims that M2.5 is 37% faster at complex tasks than its predecessor, while offering enterprise-ready throughput of 100 tokens per second (TPS) at a cost of just $1 per hour. The company positions this as a breakthrough for scalable long-horizon AI agents, enabling continuous, low-cost execution for complex workflows such as research assistants, customer service automation, and software maintenance.
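To put the quoted cost in more familiar terms, the short sketch below converts an hourly running cost and a sustained throughput into a cost per million tokens. It assumes that TPS means tokens per second (the usual sense for language-model throughput) and that the model runs at that rate for the whole hour. The function name is purely illustrative, and the result is a back-of-envelope conversion of the figures above, not Minimax's published pricing.

```python
def cost_per_million_tokens(cost_per_hour: float, tokens_per_second: float) -> float:
    """Convert an hourly cost and a sustained throughput into cost per million tokens.

    Assumes the throughput is held constant for the full hour (an idealisation).
    """
    tokens_per_hour = tokens_per_second * 3600  # 3600 seconds in an hour
    return cost_per_hour / tokens_per_hour * 1_000_000


if __name__ == "__main__":
    # Figures quoted in the article: $1 per hour at 100 TPS.
    price = cost_per_million_tokens(cost_per_hour=1.0, tokens_per_second=100)
    print(f"${price:.2f} per million tokens")  # prints "$2.78 per million tokens"
```

At 100 tokens per second the model would produce 360,000 tokens in an hour, so $1 per hour works out to roughly $2.78 per million tokens under these assumptions.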
A separate longitudinal comparison shows how rapidly Minimax has caught up. Over the past year, its M-series has climbed from 56% on SWE-Bench Verified (M1) to 80.2% (M2.5), outpacing even the steady improvements from OpenAI, Anthropic, and Google.

The M2.5 model is yet another indication of just how close China is to the US in AI progress. Just yesterday, Chinese company Z.ai released GLM-5, which scored higher than Google’s flagship Gemini 3 Pro on the Artificial Analysis Intelligence Index. Minimax’s M2.5 has now performed better than both the Google and OpenAI flagship models on a popular coding benchmark. And while there has never been a better time to be a consumer of AI, China’s rise could also herald a change in the global order and lead to it emerging as an equal challenger to the US on the world stage.