GLM 5.2 Places 4th On Artificial Analysis Intelligence Index, Becomes Most Capable Open Model

Z.AI has released GLM-5.2, and the model has gone straight to the top of the open weights category on the Artificial Analysis Intelligence Index, scoring 51 on the v4.1 version of the benchmark. That places it fourth overall on the entire leaderboard, behind only Claude Fable 5 (60), Claude Opus 4.8 (56), and OpenAI’s GPT-5.5 at xhigh reasoning (55).

The context matters here. Claude Fable 5 sits at the top of the chart but is currently unavailable to anyone outside Anthropic after US export controls pulled it offline worldwide. That leaves Claude Opus 4.8 and GPT-5.5 as the only proprietary models ahead of GLM-5.2 that anyone can actually call through an API today, and both come at a steep price. GLM-5.2 isn’t just the best open model in the world right now; it holds that position within striking distance of the most expensive frontier systems money can buy.

GLM-5.2 leads the open weights pack by a wide margin. MiniMax-M3 and DeepSeek V4 Pro (max) trail at 44 apiece, with Kimi K2.6 close behind at 43. The jump from GLM-5.1, which sits at 40 on the same index, is eleven points, a leap Z.AI attributes to gains across nearly every evaluation in the suite. Scientific reasoning saw some of the sharpest improvements, with CritPt up 16 points to 21% and Humanity’s Last Exam climbing 12 points to 40%. Terminal-Bench v2.1 matched CritPt’s 16-point gain to reach 78%, τ3-Banking jumped 15 points to 27%, AA-LCR rose 9 points to 71%, SciCode gained 7 points to 50%, and GPQA Diamond edged up 3 points to 89%.
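Subtracting each reported gain from the new score gives the implied GLM-5.1 baseline on each evaluation. The snippet below is a small sketch of that arithmetic using only the figures quoted above. It assumes the gains are percentage points and ignores any rounding in the published numbers, so the baselines are approximations rather than reported results.

```python
# (new score in %, gain in points) as quoted in the article.
results = {
    "CritPt": (21, 16),
    "Humanity's Last Exam": (40, 12),
    "AA-LCR": (71, 9),
    "τ3-Banking": (27, 15),
    "SciCode": (50, 7),
    "Terminal-Bench v2.1": (78, 16),
    "GPQA Diamond": (89, 3),
}


def implied_baselines(scores):
    """Return the implied previous score for each benchmark (new minus gain)."""
    return {name: new - gain for name, (new, gain) in scores.items()}


baselines = implied_baselines(results)
for name, old in baselines.items():
    new, gain = results[name]
    print(f"{name}: {old}% -> {new}% (+{gain})")
```

On these numbers CritPt starts from an implied 5%, so its gain is large in relative terms, while GPQA Diamond moves from an implied 86%, where little headroom remains.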

What’s notable is that none of this came from scaling the model up. GLM-5.2 keeps the same architecture as its predecessor, with 744 billion total parameters and 40 billion active, so the gains come entirely from training, not size. The context window has been extended from 200K to a full 1 million tokens, with Z.AI saying it specifically strengthened long-context training for coding agents handling large-scale implementation, automated research, and complex debugging.

On GDPval-AA v2, Artificial Analysis’s benchmark for real-world economic task performance, GLM-5.2 posts a score of 1524, ahead of MiniMax-M3 (1418) and DeepSeek V4 Pro max (1328), and close enough to GPT-5.5 at xhigh reasoning (1514) that the two are effectively tied. That’s a meaningful result for an open weights model going up against a closed frontier system on tasks designed to mirror actual paid work.

The model isn’t cheap to run relative to its predecessor, though. GLM-5.2 burns through roughly 43,000 output tokens per Intelligence Index task, of which 37,000 are spent on reasoning alone. That’s up sharply from GLM-5.1’s 26,000, and higher than MiniMax-M3 (24,000), Kimi K2.6 (35,000), and DeepSeek V4 Pro max (37,000). On the Intelligence vs Output Tokens chart, GLM-5.2 sits outside what Artificial Analysis calls the most attractive quadrant, trading token efficiency for raw capability. The cost works out to about $0.46 per Intelligence Index task, against $0.25 for GLM-5.1, $0.31 for Kimi K2.6, $0.18 for MiniMax-M3, and a striking $0.05 for DeepSeek V4 Pro max.
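To put those token counts in proportion, the sketch below works out how much of GLM-5.2's output is reasoning and how its usage compares with the other models named. It uses only the figures quoted above; the variable names are illustrative.

```python
# Output tokens per Intelligence Index task, as quoted in the article.
output_tokens = {
    "GLM-5.2": 43_000,
    "GLM-5.1": 26_000,
    "MiniMax-M3": 24_000,
    "Kimi K2.6": 35_000,
    "DeepSeek V4 Pro max": 37_000,
}
glm52_reasoning_tokens = 37_000

# Share of GLM-5.2's output tokens spent on reasoning.
reasoning_share = glm52_reasoning_tokens / output_tokens["GLM-5.2"]

# Relative increase in output tokens over GLM-5.1.
increase_vs_glm51 = output_tokens["GLM-5.2"] / output_tokens["GLM-5.1"] - 1

print(f"Reasoning share of output: {reasoning_share:.0%}")
print(f"Increase over GLM-5.1: {increase_vs_glm51:.0%}")

# GLM-5.2's token usage as a multiple of each other model's.
for name, tokens in output_tokens.items():
    if name != "GLM-5.2":
        print(f"vs {name}: {output_tokens['GLM-5.2'] / tokens:.2f}x")
```

By this arithmetic roughly 86% of GLM-5.2's output tokens go to reasoning, and the model emits about 65% more tokens per task than GLM-5.1.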

Despite that, Artificial Analysis still places GLM-5.2 on the Pareto frontier of its Intelligence vs Cost chart, meaning no other model on the chart is both at least as intelligent and cheaper per task. Pricing on Z.AI’s first-party API stays unchanged from GLM-5.1 at $1.40 per million input tokens, $0.26 for cache hits, and $4.40 per million output tokens.
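The Pareto idea can be illustrated with the five open models whose index scores and per-task costs are quoted in this article. The following is a minimal sketch, not Artificial Analysis's own method: it treats a model as dominated when another model scores at least as high and costs no more, with at least one of the two strictly better. The real chart includes many more models, so its frontier will differ.

```python
from typing import NamedTuple


class Model(NamedTuple):
    name: str
    score: int    # Intelligence Index score quoted in the article
    cost: float   # USD per Intelligence Index task quoted in the article


MODELS = [
    Model("GLM-5.2", 51, 0.46),
    Model("GLM-5.1", 40, 0.25),
    Model("Kimi K2.6", 43, 0.31),
    Model("MiniMax-M3", 44, 0.18),
    Model("DeepSeek V4 Pro max", 44, 0.05),
]


def dominates(a: Model, b: Model) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    return (
        a.score >= b.score
        and a.cost <= b.cost
        and (a.score > b.score or a.cost < b.cost)
    )


def pareto_frontier(models: list[Model]) -> list[Model]:
    """Keep only the models that no other model dominates."""
    return [m for m in models if not any(dominates(o, m) for o in models)]


frontier = [m.name for m in pareto_frontier(MODELS)]
print(frontier)
```

Within this five-model subset, only GLM-5.2 and DeepSeek V4 Pro max survive: GLM-5.2 because nothing matches its score, and DeepSeek V4 Pro max because nothing is cheaper at its score or above.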

Hallucination behaviour also improved. On the AA-Omniscience Index, GLM-5.2 scores 4, up from GLM-5.1’s 2, driven by both higher accuracy (25.1% versus 24.2%) and a lower hallucination rate (28.1% versus 29.4%), with the attempt rate holding steady at 47%.
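The reported index values can be roughly reproduced from the accuracy and hallucination figures. The sketch below rests on two assumptions that are not stated above: that the hallucination rate is the share of non-correct responses that were wrong answers rather than abstentions, and that the index is the percentage of correct answers minus the percentage of incorrect ones. The function name is illustrative.

```python
def omniscience_index(accuracy_pct: float, hallucination_rate_pct: float) -> float:
    """Estimate the index as correct % minus incorrect %.

    Assumption: hallucination rate = incorrect / (all non-correct responses).
    """
    non_correct = 100.0 - accuracy_pct
    incorrect = non_correct * hallucination_rate_pct / 100.0
    return accuracy_pct - incorrect


glm52 = omniscience_index(25.1, 28.1)
glm51 = omniscience_index(24.2, 29.4)
print(f"GLM-5.2: {glm52:.2f}, GLM-5.1: {glm51:.2f}")
```

Under those assumptions the estimates come out at about 4.1 for GLM-5.2 and 1.9 for GLM-5.1, which round to the reported 4 and 2.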

GLM-5.2 ships with two reasoning effort levels: a “max” setting aimed at squeezing out the highest possible scores, and a “high” setting Z.AI positions as the better balance between performance and token usage. The model is released under an MIT license, continuing Z.AI’s run of open-weight launches since GLM-5, and is live on Z.AI’s own API as well as third-party platforms including DeepInfra, Novita, Nebius, Parasail, SiliconFlow, GMI Cloud, Baseten, and Fireworks.

The release keeps up a pace that’s become a pattern for the Beijing-based lab. GLM-5 first crossed the 50-point threshold on the older v4.0 index back in February, and GLM-5.1 followed with a SWE-Bench Pro lead over GPT-5.4 and Claude Opus 4.6 in April. GLM-5.2 extends that streak, and does it while undercutting every closed model above it on price by a significant margin.