Cursor's Composer 2.5 Places 3rd In Artificial Analysis Coding Agent Index, Is 10-60x Cheaper Than Variants Above It

Claude Code and Codex are the most popular coding agents at the moment, but Cursor’s in-house model is now making a serious bid to join them.

Cursor has released Composer 2.5, and the numbers from Artificial Analysis tell a clear story. The model scores 63 on the Coding Agent Index, a composite average of pass@1 across SWE-Bench-Pro-Hard-AA, Terminal-Bench v2, and SWE-Atlas-QnA. That lands it in third place, behind only Claude Code with Opus 4.7 (Max) at 67 and Codex with GPT-5.5 (XHigh) at 65. What makes the result stand out isn’t just the ranking but the cost: Composer 2.5 standard runs at $0.07 per task and the Fast variant at $0.44. The agents above it cost $4.14 (Claude Code Opus 4.7 Max) and $4.33 (Codex GPT-5.5 XHigh) per task, roughly 10x the cost of Fast and 60x the cost of standard.
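The headline multiples follow directly from the per-task prices. The Python sketch below reproduces them; the `PRICE_PER_TASK` table and the `cost_multiple` helper are illustrative names, not anything published by Artificial Analysis or Cursor, and the prices are simply the figures quoted above.

```python
# Per-task prices in USD, as quoted from the Artificial Analysis figures.
# The labels are illustrative keys, not official identifiers.
PRICE_PER_TASK = {
    "Composer 2.5 (standard)": 0.07,
    "Composer 2.5 (Fast)": 0.44,
    "Claude Code Opus 4.7 (Max)": 4.14,
    "Codex GPT-5.5 (XHigh)": 4.33,
}


def cost_multiple(pricier: str, cheaper: str) -> float:
    """Return how many times more `pricier` costs per task than `cheaper`."""
    return PRICE_PER_TASK[pricier] / PRICE_PER_TASK[cheaper]


# Compare each of the two higher-ranked agents against both Composer variants.
for leader in ("Claude Code Opus 4.7 (Max)", "Codex GPT-5.5 (XHigh)"):
    for variant in ("Composer 2.5 (Fast)", "Composer 2.5 (standard)"):
        print(f"{leader} vs {variant}: {cost_multiple(leader, variant):.1f}x")
```

This prints multiples of about 9.4x and 9.8x against Fast and about 59x and 62x against standard, which round to the 10x and 60x figures.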

The Results

The headline gain over Composer 2 is on SWE-Bench-Pro-Hard-AA, where Composer 2.5 jumped 35 points — from 12% to 47%. That puts it in the same territory as Claude Code with Opus 4.7 (Max) on that specific benchmark. Terminal-Bench v2 improved by 2 points (64% → 66%) and SWE-Atlas-QnA by 3 points (69% → 72%).

Cursor serves Composer 2.5 in two variants, standard and Fast. Fast completes tasks in an average of 6.7 minutes, making it the third-fastest agent on the index, while standard averages 9.3 minutes per task, about 40% longer (equivalently, Fast cuts task time by roughly 28%). The price gap is consistent whether measured per token or per task: Fast costs 6x more per token ($3.00/$15.00 vs. $0.50/$2.50 per million input/output tokens) and about 6x more per task ($0.44 vs. $0.07).
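The variant comparison reduces to a few ratios. This Python sketch recomputes them from the published prices and average task times; the `STANDARD` and `FAST` dictionaries are hypothetical containers for those figures, not a Cursor API.

```python
# Figures quoted for the two Composer 2.5 variants (USD and minutes).
# These dictionaries are illustrative containers, not a Cursor API.
STANDARD = {"input_per_m": 0.50, "output_per_m": 2.50, "per_task": 0.07, "minutes": 9.3}
FAST = {"input_per_m": 3.00, "output_per_m": 15.00, "per_task": 0.44, "minutes": 6.7}

# Price premium of Fast over standard.
input_ratio = FAST["input_per_m"] / STANDARD["input_per_m"]     # exactly 6.0
output_ratio = FAST["output_per_m"] / STANDARD["output_per_m"]  # exactly 6.0
task_ratio = FAST["per_task"] / STANDARD["per_task"]            # about 6.3

# The speed gap expressed two ways.
extra_time = STANDARD["minutes"] / FAST["minutes"] - 1  # extra time standard needs
time_saved = 1 - FAST["minutes"] / STANDARD["minutes"]  # share of time Fast removes

print(f"Per-token premium: {input_ratio:.1f}x input, {output_ratio:.1f}x output")
print(f"Per-task premium: {task_ratio:.1f}x")
print(f"Standard takes {extra_time:.0%} longer; Fast saves {time_saved:.0%} of task time")
```

The last line shows why the speed gap can be quoted two ways: standard needs about 39% more time than Fast, while Fast removes about 28% of standard’s time per task.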

On the cost-quality curve, Composer 2.5 clears 60 on the index at a price point no other agent comes close to. Medium-effort peers cost between $1.24 and $2.21 per task, while higher-effort variants score 2–4 index points higher at $4.10–$4.82. That puts Cursor’s model on the Pareto frontier: no agent on the index is both cheaper and higher-scoring.
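A Pareto frontier keeps every option that no other option beats on both axes at once. The Python sketch below applies that dominance test to the three agents whose index score and per-task cost are both stated above; the other agents are left out because their individual scores are not given here. The `Agent`, `dominates`, and `pareto_frontier` names are illustrative, not part of any Artificial Analysis tooling.

```python
from typing import NamedTuple


class Agent(NamedTuple):
    name: str
    index_score: int      # Coding Agent Index score (higher is better)
    cost_per_task: float  # USD per task (lower is better)


# Only agents whose score and per-task cost are both quoted above.
AGENTS = [
    Agent("Composer 2.5 (standard)", 63, 0.07),
    Agent("Claude Code Opus 4.7 (Max)", 67, 4.14),
    Agent("Codex GPT-5.5 (XHigh)", 65, 4.33),
]


def dominates(a: Agent, b: Agent) -> bool:
    """True if `a` is no worse than `b` on both axes and strictly better on one."""
    no_worse = a.index_score >= b.index_score and a.cost_per_task <= b.cost_per_task
    strictly_better = a.index_score > b.index_score or a.cost_per_task < b.cost_per_task
    return no_worse and strictly_better


def pareto_frontier(agents: list[Agent]) -> list[Agent]:
    """Keep every agent that no other agent dominates, preserving input order."""
    return [a for a in agents if not any(dominates(b, a) for b in agents if b is not a)]


for agent in pareto_frontier(AGENTS):
    print(agent.name, agent.index_score, agent.cost_per_task)
```

With these three points, Composer 2.5 standard and Claude Code Opus 4.7 (Max) both land on the frontier, while Codex GPT-5.5 (XHigh) is dominated, since it costs more per task than Claude Code yet scores lower. Extending the picture to the medium-effort peers would require their index scores as well.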

Composer 2.5 is not available via a public API — it runs exclusively inside Cursor IDE and Cursor CLI. Like its predecessor Composer 2, it is built on continued training atop Moonshot AI’s open-weight Kimi K2.5, with Cursor reporting that roughly 85% of total compute came from its own additional training and reinforcement learning.

What This Means for the Cursor-xAI Deal

The timing of this release matters beyond benchmarks. SpaceX, which owns xAI, and Cursor recently announced a partnership to build what they described as the “world’s best coding and knowledge work AI,” combining Cursor’s developer distribution with Colossus, SpaceX’s supercomputer with a claimed million H100-equivalent GPUs. SpaceX secured the right to acquire Cursor for $60 billion later this year, roughly double its current valuation of about $30 billion. If the collaboration doesn’t lead to a deal, Cursor has agreed to pay SpaceX $10 billion for compute access.

Cursor CEO Michael Truell specifically cited scaling up Composer as a goal of the partnership. Composer 2.5 arriving as a legitimate top-three coding agent — not just a competitive one — validates that bet. It gives Cursor a proprietary model it can actually point to when making the case that the xAI compute investment will pay off.

For xAI, this is a potentially positive signal too. xAI’s own coding tools have struggled, and Musk publicly acknowledged the company needed to “catch up and exceed” competitors on coding. The embarrassment of xAI teams reportedly using Anthropic’s Claude via Cursor — before Anthropic cut off that access — made the coding gap especially visible. A Cursor with a proven in-house model is a more strategically valuable acquisition target than one dependent on third-party inference. If the deal closes, xAI inherits a coding agent that can compete at the top of the market — and do it cheaply.

The Bigger Picture

Cursor had been on the defensive heading into this release. Claude Code crossed $2.5 billion in annualized revenue with over 300,000 business customers. Codex usage surged 10x in weeks. Cursor’s structural problem — paying retail for the models that power its competitors — was getting harder to ignore.

Composer 2.5 doesn’t fully solve that problem, but it changes the shape of it. At $0.07 per task for a third-ranked coding agent, Cursor now has a model that makes a genuine cost-quality case. Whether the xAI compute deal can turn that into a sustainable advantage is the next question.