Z.AI’s GLM-5 Displaces Kimi K2.5 Thinking To Become Top Open Model, Scores Higher Than Gemini 3 Pro On Artificial Analysis Intelligence Index

Chinese open models aren’t just leapfrogging one another with regularity in the AI race; they’re now surpassing the best models from some US frontier labs.

Z.AI’s newly released GLM-5 has achieved a milestone in the open-weights AI landscape, becoming the first model to score 50 or above on the new Artificial Analysis Intelligence Index v4.0. The achievement marks a significant narrowing of the performance gap between proprietary and open-weights models, with GLM-5 now outranking Google’s Gemini 3 Pro (scoring 48) and matching the performance of several established commercial offerings.

Leading the Open Weights Pack

GLM-5’s Intelligence Index score of 50 represents an impressive eight-point jump from its predecessor, GLM-4.7, which scored 42. This improvement was driven by gains across agentic performance and knowledge capabilities, along with a dramatic reduction in hallucination rates. The model now leads other frontier open-weights models including Moonshot’s Kimi K2.5 (scoring 47), MiniMax 2.1 (40), and DeepSeek V3.2 (42).

The model’s performance is particularly notable in agentic tasks. GLM-5 posted the highest Artificial Analysis Agentic Index score among open-weights models at 63, ranking third overall across all models tested. On GDPval-AA, Artificial Analysis’s benchmark for evaluating agentic performance on economically valuable work tasks, GLM-5 achieved an Elo rating of 1,412. This places it behind only Anthropic’s Claude Opus 4.6 (1,606 Elo) and OpenAI’s GPT-5.2 (1,462 Elo), and ahead of numerous proprietary models.

Architectural Advances and Reduced Hallucinations

GLM-5 represents Z.AI’s first new architecture since GLM-4.5. While the GLM-4.5, 4.6, and 4.7 models were all 355B total parameter / 32B active parameter mixture-of-experts models, GLM-5 scales up to 744B total parameters with 40B active parameters. The new architecture integrates DeepSeek Sparse Attention, aligning it more closely with the parameter counts of DeepSeek V3 (671B total / 37B active) and Moonshot’s Kimi K2 family (1T total / 32B active).

However, GLM-5 distinguishes itself through its release in BF16 precision, resulting in a total size of approximately 1.5TB. This makes it larger than DeepSeek V3 and recent Kimi K2 models, which have been released natively in FP8 and INT4 precision respectively. For organizations looking to self-deploy GLM-5, the model will require approximately 1,490GB of memory to store the weights in native BF16 precision.
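The memory figure follows directly from the precision: BF16 stores each parameter in 2 bytes. A back-of-the-envelope check (a sketch covering weights only; a real deployment also needs memory for KV cache and activations):

```python
def weight_memory_gb(total_params_billions: float, bytes_per_param: float) -> float:
    """Memory needed just to hold the model weights, in GB (1 GB = 10^9 bytes)."""
    return total_params_billions * 1e9 * bytes_per_param / 1e9

# GLM-5 at native BF16 (2 bytes per parameter)
print(weight_memory_gb(744, 2.0))  # 1488.0 GB, matching the ~1,490GB figure

# The same weights quantized to FP8 (1 byte per parameter)
print(weight_memory_gb(744, 1.0))  # 744.0 GB
```

This also shows why third-party hosts serve the model in FP8: halving the bytes per parameter halves the weight footprint.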

One of GLM-5’s most significant improvements comes in hallucination reduction. The model scored -1 on the AA-Omniscience Index, a 35-point improvement compared to GLM-4.7 Reasoning’s score of -36. This was driven by a 56 percentage point reduction in hallucination rate, achieved through more frequent abstention when the model lacks confidence. GLM-5 now has the lowest level of hallucination among all models tested on the benchmark.

Efficiency Gains and Market Position

Despite achieving higher scores across most evaluations, GLM-5 demonstrated improved efficiency, using approximately 110 million output tokens to complete the Intelligence Index evaluations compared to GLM-4.7’s 170 million tokens. This is a roughly 35 percent decrease in token usage while delivering superior performance, though the model remains less token-efficient than Claude Opus 4.6.
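The efficiency gain is easy to quantify from the reported token counts:

```python
glm5_tokens = 110e6   # output tokens to complete the Intelligence Index evals
glm47_tokens = 170e6  # GLM-4.7's count on the same evals

# Relative reduction in output-token usage from GLM-4.7 to GLM-5
reduction = (glm47_tokens - glm5_tokens) / glm47_tokens
print(f"{reduction:.1%}")  # 35.3%
```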

Z.AI also indicated that it has increased pre-training data volume from 23 trillion to 28.5 trillion tokens for GLM-5, contributing to the model’s improved performance across benchmarks.

Availability and Specifications

GLM-5 is released under the MIT License, offering flexibility for commercial deployment. The model supports a 200,000-token context window, matching GLM-4.7’s capacity, though it currently only supports text input and output. This means Kimi K2.5 retains its position as the leading open-weights model supporting image input.

At the time of this analysis, GLM-5 is available through Z.AI’s first-party API and several third-party providers including Novita ($1/$3.20 per million input/output tokens), GMI Cloud ($1/$3.20), and DeepInfra ($0.80/$2.56), with deployment in FP8 precision on these platforms to reduce memory requirements.
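At scale, the per-token price differences add up. A sketch comparing what a single hypothetical agentic request (assumed here to be 50k input and 10k output tokens, figures not from the source) would cost at the listed rates:

```python
# (input $/M tokens, output $/M tokens) as listed above
providers = {
    "Novita":    (1.00, 3.20),
    "GMI Cloud": (1.00, 3.20),
    "DeepInfra": (0.80, 2.56),
}

def request_cost(in_tokens: int, out_tokens: int,
                 in_price: float, out_price: float) -> float:
    """Cost in dollars for one request at per-million-token prices."""
    return in_tokens / 1e6 * in_price + out_tokens / 1e6 * out_price

for name, (in_price, out_price) in providers.items():
    cost = request_cost(50_000, 10_000, in_price, out_price)
    print(f"{name}: ${cost:.4f}")
```

At these rates DeepInfra comes out about 20 percent cheaper per request than the other two listed providers, though quantization, throughput, and reliability also factor into provider choice.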

The Broader Implications

GLM-5’s achievement of a 50+ Intelligence Index score represents more than just a benchmark victory. It signals that the gap between proprietary frontier models and open-weights alternatives continues to narrow, with Chinese AI labs driving much of this convergence. The model’s strong performance on economically valuable work tasks, as measured by GDPval-AA, suggests that open-weights models are increasingly viable for real-world enterprise applications. Perhaps most strikingly, an open model from China now outscores Google’s best model on this index, a remarkable milestone in itself.

For organizations evaluating AI deployment options, GLM-5’s combination of strong agentic capabilities, low hallucination rates, and open licensing presents a compelling alternative to proprietary models, particularly for use cases where model control and customization are priorities. However, the model’s substantial size and current limitation to text-only processing remain considerations for potential adopters.

As the AI landscape continues to evolve rapidly, GLM-5’s performance demonstrates that the competitive dynamics extend beyond the US frontier labs, with Chinese open-weights models now setting new standards for what’s achievable outside proprietary ecosystems.

Posted in AI