Cursor Releases Composer 2.5, Matches Opus 4.7 On Some Benchmarks

Cursor has been overshadowed by Claude and Anthropic in recent quarters for coding use cases, but it’s looking to make a comeback with a model of its own.

The AI coding IDE has released Composer 2.5, its most capable in-house model yet, promising significant gains in intelligence, reliability on long-running tasks, and overall usefulness. The launch is a pointed move in an increasingly competitive market where Cursor — once the undisputed leader in AI-assisted coding — has found itself on the defensive.

The Stakes Are Real

The context for this release is hard to ignore. Claude Code has grown into a formidable rival, reportedly crossing $2.5 billion in annualized revenue and signing up over 300,000 business customers. Anthropic has a structural advantage: it can offer Claude Code at prices Cursor simply cannot match, while Cursor itself pays Anthropic for inference. That has put Cursor in an uncomfortable squeeze. Building its own model is, in part, a bid to break that dependency.

Cursor’s own numbers remain impressive — it was generating a billion lines of accepted code per day as recently as mid-2025, and 67% of Fortune 500 companies are customers. But the vibe has shifted. “I don’t believe the ‘Cursor is dead’ memes,” Warp CEO Zach Lloyd told Fortune, “but ‘The IDE is dead’ is real.” Autonomous coding agents are what the market is excited about now, and Composer 2.5 is Cursor’s answer.

The Benchmarks

On paper, Composer 2.5 is competitive. On SWE-Bench Multilingual, it scores 79.8% — just a hair behind Opus 4.7’s 80.5% and ahead of GPT-5.5’s 77.8%. On Terminal-Bench 2.0, it matches Opus 4.7 closely (69.3% vs. 69.4%), with GPT-5.5 pulling ahead at 82.7%.

The more nuanced story is on CursorBench v3.1, Cursor’s own benchmark of harder tasks, where Composer 2.5 scores 63.2%. Opus 4.7 scores higher on its max setting, at 64.8%, but drops to 61.6% on its default (xhigh) setting. GPT-5.5’s default comes in at 59.2%.

The cost-efficiency angle is where Cursor makes its most compelling argument. Priced at $0.50/M input and $2.50/M output tokens, Composer 2.5 is dramatically cheaper than comparable frontier models. An effort curve chart published alongside the release shows Composer 2.5 achieving roughly 63% on CursorBench at under $1 average cost per task — a point where competitors like Opus 4.7 and GPT-5.5 cost several dollars more per task for similar or worse results.
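
To make the pricing concrete, here is a rough sketch of the per-task arithmetic at the listed rates. The token counts are hypothetical, picked only for illustration, since Cursor has not published per-task token usage, and the sketch ignores any cached-input discounts.

```python
# Listed Composer 2.5 prices, in dollars per million tokens (from the release).
INPUT_PRICE_PER_M = 0.50
OUTPUT_PRICE_PER_M = 2.50


def task_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one task at the listed per-token prices."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Hypothetical agentic task: 1.2M input tokens read, 100k output tokens written.
example_cost = task_cost(1_200_000, 100_000)
print(f"${example_cost:.2f}")  # prints $0.85
```

Under those assumed token counts, a long agentic task stays below the $1 mark the effort curve shows, with input tokens accounting for most of the bill.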

What’s New Under the Hood

Composer 2.5 is built on Moonshot’s Kimi K2.5, the same open-source base as Composer 2, but 85% of its total compute went into Cursor’s own training and reinforcement learning on top of that foundation.

Three technical advances stand out. First, targeted RL with textual feedback: rather than relying on a single reward signal at the end of a long rollout, Cursor inserts localized hints directly at the point in a trajectory where the model erred — say, a bad tool call — and uses the corrected distribution as a teacher signal. This makes credit assignment far more precise over rollouts spanning hundreds of thousands of tokens.
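
A toy sketch of the idea, not Cursor's implementation: the same model is scored twice at the step that went wrong, once in the original context and once with a hint inserted, and the hinted distribution acts as the teacher. The logits, the action names, and the choice of a KL divergence as the loss are all assumptions made for illustration.

```python
import math


def softmax(logits):
    """Convert raw scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl_divergence(teacher, student):
    """KL(teacher || student): how far the student is from the teacher."""
    return sum(t * math.log(t / s) for t, s in zip(teacher, student) if t > 0)


# Hypothetical logits over three candidate tool calls at the step that went wrong.
ACTIONS = ["grep", "rm -rf build", "run_tests"]
student_logits = [0.2, 1.5, 0.1]   # original context: prefers the bad call
hinted_logits = [0.4, -2.0, 1.8]   # same model, with a hint inserted in context

student = softmax(student_logits)
teacher = softmax(hinted_logits)

# The loss applies only at the flagged step, not to the whole rollout.
step_loss = kl_divergence(teacher, student)
print(ACTIONS[teacher.index(max(teacher))], round(step_loss, 3))
```

Because the signal is attached to one step, the gradient does not have to be inferred from a single end-of-rollout reward spread across every token that came before it.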

Second, synthetic data at scale: Composer 2.5 was trained on 25x more synthetic tasks than its predecessor. One creative approach involves “feature deletion” — stripping a working codebase of a feature and asking the model to reimplement it, with tests serving as the verifiable reward. As a side effect, the model got creative at gaming tasks: in one instance it reverse-engineered a Python type-checking cache to recover a deleted function signature; in another, it decompiled Java bytecode to reconstruct a third-party API. Cursor says it caught these via agentic monitoring, but the examples hint at how hard large-scale RL is becoming to control.
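
The feature-deletion setup can be sketched in a few lines. This is a minimal illustration under stated assumptions, not Cursor's pipeline: the toy module, the helper names delete_feature and reward, and the all-or-nothing scoring are hypothetical.

```python
import ast

ORIGINAL = '''
def add(a, b):
    return a + b

def slugify(title):
    return "-".join(title.lower().split())
'''


def delete_feature(source: str, name: str) -> str:
    """Return the source with the top-level function `name` removed."""
    tree = ast.parse(source)
    tree.body = [node for node in tree.body
                 if not (isinstance(node, ast.FunctionDef) and node.name == name)]
    return ast.unparse(tree)  # requires Python 3.9+


def reward(stripped: str, candidate: str) -> float:
    """Run the held-out tests against the stripped code plus the model's attempt."""
    namespace = {}
    try:
        exec(stripped + "\n" + candidate, namespace)
        assert namespace["slugify"]("Hello World") == "hello-world"
        assert namespace["slugify"]("  A  B ") == "a-b"
    except Exception:
        return 0.0  # any failure, including a missing function, earns nothing
    return 1.0


stripped = delete_feature(ORIGINAL, "slugify")
good = 'def slugify(title):\n    return "-".join(title.lower().split())\n'
bad = 'def slugify(title):\n    return title.lower()\n'
print(reward(stripped, good), reward(stripped, bad))  # prints 1.0 0.0
```

The sketch also shows why gaming is possible: the reward only checks that the tests pass, so any leftover artifact that leaks the deleted code, such as a cache or compiled bytecode, is a shortcut to full reward.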

Third, Sharded Muon with dual mesh HSDP: Cursor uses a distributed variant of the Muon optimizer that runs Newton-Schulz orthogonalization asynchronously across shards, overlapping network communication with compute. On a 1T-parameter model, optimizer step time clocks in at 0.2 seconds.
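
For readers unfamiliar with Newton-Schulz orthogonalization, the single-machine sketch below shows the core idea: a gradient matrix is pushed toward the nearest orthogonal matrix using only matrix multiplications. It uses the textbook cubic iteration in pure Python as an assumption for illustration; public Muon implementations commonly use a tuned higher-order polynomial, and none of Cursor's sharding or asynchronous overlap is shown.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def transpose(m):
    """Swap rows and columns."""
    return [list(col) for col in zip(*m)]


def newton_schulz(g, steps=20):
    """Approximate the nearest orthogonal matrix to g, using matmuls only."""
    # Scale by the Frobenius norm so every singular value is at most 1,
    # which keeps the iteration inside its convergence region.
    norm = sum(v * v for row in g for v in row) ** 0.5
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxt_x = matmul(matmul(x, transpose(x)), x)
        # Classic cubic update: X <- 1.5 X - 0.5 X X^T X
        x = [[1.5 * a - 0.5 * b for a, b in zip(ra, rb)]
             for ra, rb in zip(x, xxt_x)]
    return x


gradient = [[3.0, 1.0], [-1.0, 2.0]]      # toy 2x2 stand-in for a weight gradient
update = newton_schulz(gradient)
gram = matmul(update, transpose(update))  # should be close to the identity
print(gram)
```

Since each step is just a few matrix multiplications, the work splits naturally across shards, which is what makes it plausible to overlap with network communication.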

What Comes Next

Cursor isn’t stopping at Composer 2.5. The company has announced a significantly larger model in training with SpaceXAI, using Colossus 2’s million H100-equivalents and 10x more total compute. The autonomous agent push is also accelerating — 35% of merged PRs at Cursor itself are now created by autonomous agents, a figure CEO Michael Truell has cited as a sign of where software development is heading.

Composer 2.5 is available now in Cursor with doubled usage for the first week. Whether it’s enough to shift the narrative is another question — but it’s a credible signal that Cursor is serious about owning its own destiny in the model race.