MiniMax Releases MiniMax M3, Is Competitive With Frontier Models On Many Benchmarks

Chinese open-weights models keep nibbling away at the leads of their US closed-source counterparts.

The latest entrant is MiniMax M3, a new flagship model from Shanghai-based AI lab MiniMax that the company claims is the first open-weights model to simultaneously deliver frontier-level coding, million-token context, and native multimodal capabilities. Until now, that combination had been the exclusive preserve of closed-source models such as GPT-5 and Gemini.

MiniMax M3 Benchmarks

The benchmark sheet is striking. On SWE-Bench Pro, a demanding real-world software engineering test, M3 scores 59.0%, behind Claude Opus 4.7 (64.3%) but narrowly ahead of GPT-5.5 (58.6%) and well ahead of Gemini 3.1 Pro (54.2%). On Terminal Bench 2.1, it scores 66.0%, trailing Gemini 3.1 Pro’s 70.0% and, by a wider margin, GPT-5.5’s 78.2%.

The picture is more nuanced across other evaluations. M3 leads SVG-Bench at 63.7%, ahead of Opus 4.7 (62.3%) and Gemini (59.2%), and at 83.5% it is ahead of Opus 4.7 on BrowseComp. On BankerToolBench, it is ahead of GPT-5.5 and Gemini 3.1 Pro. On MCP Atlas, it scores 74.2% against Opus 4.7’s 77.0%.

Three Capabilities, One Open Model

MiniMax describes M3’s value proposition in terms of three pillars that have historically been siloed.

Frontier coding and agentic performance. In one test, M3 autonomously reproduced an ICLR 2025 Outstanding Paper, running for nearly 12 hours and producing 18 commits and 23 experimental figures without human intervention. That is a serious demonstration of long-horizon execution, not just benchmark point-scoring.

MiniMax Sparse Attention (MSA). The architectural centrepiece of M3 is a new sparse attention mechanism that the company says enables 15.6x faster decoding and 9.7x faster prefill than its predecessor, M2, at million-token contexts. Unlike DeepSeek’s Multi-head Latent Attention, MSA works on uncompressed key-values, sidestepping precision-loss issues in long-context inference. If the numbers hold under independent evaluation, this is a meaningful infrastructure contribution for enterprises running agentic workloads over huge codebases or document sets.

Native multimodality. M3 supports image and video input from the ground up, not as a bolt-on. It can also operate a computer desktop, which puts it squarely in the territory of computer-use agents.

The Broader Pattern

MiniMax M3 does not arrive in a vacuum. Chinese labs now dominate the top of the open-source rankings, and 80% of startups using open-source models are using Chinese models, according to an Andreessen Horowitz partner. The trend started with DeepSeek R1, continued through Kimi K2 and its successors, and now M3 adds another data point. These models are not racing to match US labs on every benchmark — they are carving out positions where open-weights and low inference cost are decisive advantages.

It is worth noting where the gap persists. Earlier this year, Chinese models scored below 12% on ARC-AGI-2, a benchmark designed to test generalised fluid intelligence, well behind models from leading US frontier labs. Abstract reasoning and true generalisation remain areas where the picture is less flattering.

MiniMax M3 Pricing and Access

MiniMax M3 is available now via API at platform.minimax.io. Pricing is $0.60 per million input tokens and $2.40 per million output tokens for contexts up to 512K, with prompt-cache reads at $0.12 per million. Usage beyond 512K, up to 1M tokens, is billed at double these rates. The company is offering 50% off standard pricing for the first seven days, and model weights with a full technical report are expected within ten days of launch.
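
To make the pricing concrete, here is a minimal Python sketch that estimates the cost of a single request from the rates quoted above. Several details are assumptions rather than anything MiniMax has confirmed: that "512K" means 512,000 tokens and "1M" means 1,000,000, that the pricing tier is chosen by total prompt size (uncached plus cached tokens), that the doubled rate applies to the whole request and to cache reads as well, and that the launch discount applies to every rate. The function name estimate_cost_usd is hypothetical and is not part of any MiniMax SDK.

```python
# Rates quoted in the article, in USD per million tokens (contexts up to 512K).
INPUT_RATE = 0.60
OUTPUT_RATE = 2.40
CACHE_READ_RATE = 0.12

# Assumptions: "512K" = 512,000 tokens and "1M" = 1,000,000 tokens.
LONG_CONTEXT_THRESHOLD = 512_000
MAX_CONTEXT = 1_000_000
LONG_CONTEXT_MULTIPLIER = 2.0  # rates double beyond the threshold
LAUNCH_DISCOUNT = 0.5          # 50% off during the first seven days


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      cached_tokens: int = 0,
                      launch_discount: bool = False) -> float:
    """Estimate the cost of one request in USD.

    input_tokens  -- uncached prompt tokens
    output_tokens -- generated tokens
    cached_tokens -- prompt tokens served from the prompt cache
    """
    prompt_tokens = input_tokens + cached_tokens
    if prompt_tokens > MAX_CONTEXT:
        raise ValueError("prompt exceeds the assumed 1M-token context limit")

    # Base cost at the standard (up to 512K) rates.
    cost = (input_tokens * INPUT_RATE
            + cached_tokens * CACHE_READ_RATE
            + output_tokens * OUTPUT_RATE) / 1_000_000

    # Assumption: the long-context tier doubles the rate for the whole request.
    if prompt_tokens > LONG_CONTEXT_THRESHOLD:
        cost *= LONG_CONTEXT_MULTIPLIER

    if launch_discount:
        cost *= LAUNCH_DISCOUNT
    return cost


if __name__ == "__main__":
    # A 100K-token prompt with a 10K-token reply at standard rates.
    print(f"${estimate_cost_usd(100_000, 10_000):.3f}")
    # A 600K-token prompt falls into the doubled long-context tier.
    print(f"${estimate_cost_usd(600_000, 10_000):.3f}")
```

Under these assumptions, a 100K-token prompt with a 10K-token reply costs about $0.084, while a 600K-token prompt with the same reply costs about $0.768 because the doubled tier applies. Actual billing may differ, so check the figures against the provider's own pricing page before relying on them.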

A dedicated coding interface, MiniMax Code, is live at code.minimax.io.

Takeaway

MiniMax M3 is a credible frontier-grade open model, not a pretender. On the benchmarks where it competes, it often beats or closely matches models from Anthropic, Google, and OpenAI. The sparse attention architecture, if it performs as claimed in production, makes long-context agentic use cases economically viable for the first time in open-weights form. The pricing, even before the launch discount, undercuts its closed-source rivals by a substantial margin.

The US labs retain edges in areas like abstract reasoning and certain coding tasks. But M3 narrows the territory in which closed-source models can claim a clear premium — and that territory keeps shrinking.