OpenAI has finally released two brand-new open-weights models, but China still appears to have the most powerful open-source models available today.
OpenAI’s open-weights models have scored below China’s open-weight Qwen3 and DeepSeek models on the Artificial Analysis Intelligence Index. OpenAI’s larger open-weights model, gpt-oss-120b (high), scored 58.27 on the index. In comparison, China’s DeepSeek R1 0528, released in May, scored 58.74, while Qwen3 235B 2507 (Reasoning), released last month, scored an even higher 63.5.

The Artificial Analysis Intelligence Index combines several benchmark scores into a single number that denotes a model’s intelligence. The Index uses results from MMLU-Pro, GPQA Diamond, Humanity’s Last Exam, LiveCodeBench, SciCode, AIME, IFBench and AA-LCR, which span a variety of intelligence tasks including coding, math, science and general understanding.
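As a rough illustration of how such a composite works, the sketch below averages per-benchmark scores into one number. The equal weighting and the example scores are assumptions made for this sketch, not Artificial Analysis’s published methodology.

```python
# A minimal sketch of a composite intelligence index.
# Assumption: scores are combined as an equal-weighted average of
# per-benchmark results (each on a 0-100 scale); the provider's exact
# weighting scheme is not documented in this article.

BENCHMARKS = [
    "MMLU-Pro", "GPQA Diamond", "Humanity's Last Exam",
    "LiveCodeBench", "SciCode", "AIME", "IFBench", "AA-LCR",
]

def intelligence_index(scores: dict[str, float]) -> float:
    """Average the per-benchmark scores into a single index value."""
    return sum(scores[name] for name in BENCHMARKS) / len(BENCHMARKS)

# Hypothetical per-benchmark scores, for illustration only.
example_scores = {
    "MMLU-Pro": 80.0, "GPQA Diamond": 70.0, "Humanity's Last Exam": 20.0,
    "LiveCodeBench": 65.0, "SciCode": 40.0, "AIME": 85.0,
    "IFBench": 55.0, "AA-LCR": 50.0,
}
print(f"Index: {intelligence_index(example_scores):.2f}")  # Index: 58.12
```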
OpenAI’s models, however, deliver comparable performance with far fewer parameters. OpenAI’s larger model has about 120 billion parameters (5.1 billion active), compared to 671 billion (37 billion active) for DeepSeek’s R1 and 235 billion (22 billion active) for the Qwen model. The smaller parameter counts make OpenAI’s open-weights models far easier to run locally: gpt-oss-120b fits on a single 80 GB GPU, while its smaller sibling, gpt-oss-20b, can run on high-end laptops and desktops with as little as 16 GB of memory. The larger DeepSeek and Qwen models need specialized hardware and generally can’t be run on consumer devices. For its size, OpenAI’s gpt-oss-120b is arguably the most capable model in the world.
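To see why parameter count translates into hardware requirements, here is a back-of-the-envelope estimate of weight memory. The precisions used below (4-bit for gpt-oss-120b, which ships with MXFP4-quantized weights, and 8-bit for the others) are simplifying assumptions, and activations, KV cache and runtime overhead are ignored.

```python
# Back-of-the-envelope memory estimate for model weights.
# Assumption: memory is dominated by the weights themselves
# (total parameters x bits per parameter); real deployments vary
# with quantization choices and runtime overhead.

def weight_memory_gb(total_params_b: float, bits_per_param: float) -> float:
    """Approximate weight storage in GB at a given precision."""
    return total_params_b * 1e9 * bits_per_param / 8 / 1e9

print(f"gpt-oss-120b @ 4-bit: ~{weight_memory_gb(120, 4):.0f} GB")  # ~60 GB
print(f"DeepSeek R1 @ 8-bit:  ~{weight_memory_gb(671, 8):.0f} GB")  # ~671 GB
print(f"Qwen3 235B @ 8-bit:   ~{weight_memory_gb(235, 8):.0f} GB")  # ~235 GB
```

On these rough numbers, gpt-oss-120b’s weights fit within a single 80 GB GPU, while the DeepSeek and Qwen models need multi-GPU servers.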
China, however, still seems to be leading the open-source model race. DeepSeek R1-0528, released in May, and Qwen3 235B 2507, released last month, are both stronger than the open-weights models OpenAI released in August. OpenAI’s gpt-oss-120b has, however, become the best US open-source model, edging out NVIDIA’s Llama Nemotron Super, released last month. But while the US has taken steps toward the open-source crown with the release of OpenAI’s open-weights models and the subsequent open-sourcing of Grok 2, it appears that, at least for now, it hasn’t managed to knock China off its perch.