Open-Source Models Currently Lag Proprietary Models By Just 4 Months: Epoch AI

The top US labs have touched some mouth-watering valuations based on their AI models, but open alternatives aren’t all that far behind.

New research from Epoch AI shows that open-weight models trail the state-of-the-art closed models by just four months on the Epoch Capabilities Index, an aggregate measure of model performance across a broad range of tasks. The finding is striking, given that Anthropic recently crossed a $965 billion valuation and OpenAI raised $122 billion at an $852 billion valuation, numbers that implicitly price in a durable edge for proprietary models. The data suggests that edge is real, but it’s measured in months, not years.

The Gap Has Grown — Slightly

Epoch AI’s previous analysis, covering January 2023 through October 2025, put the average lag at three months. The updated figure of four months represents a modest widening. The trend line, however, tells a more nuanced story: for most of the past two-plus years, open-weight models have tracked closely behind their closed counterparts, improving steadily even as the frontier moved rapidly.

Epoch AI’s chart maps the progression clearly. On the closed side, the staircase runs from GPT-4 in early 2023 through o1-mini, o1, o3, GPT-5 Pro, GPT-5.3 Codex, and GPT-5.5 Pro, with each jump upward representing a meaningful capability leap. The open-weight line mirrors this trajectory with a visible delay: Llama 2-70B, Mixtral, Llama 3.1-405B, DeepSeek-R1 and DeepSeek-V3, and most recently Qwen3-235B-A22B and Kimi K2.6, which now sits near the top of the open-weight rankings.
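For a feel of what a figure like "four months" means mechanically, here is a minimal Python sketch of one plausible way to measure such a lag: take the best open-weight score, find the earliest date a closed model reached at least that score, and count the days in between. This is an assumption for illustration, not Epoch AI's published methodology. The function name, the dates, and the scores below are all hypothetical placeholders, not real index values.

```python
from datetime import date

def lag_in_days(closed_releases, open_releases):
    """Days between the best open-weight release and the first closed
    release that scored at least as high. Returns None if no closed
    model has matched that score.

    Each release is a (release_date, score) tuple. Illustrative
    definition only; not Epoch AI's actual method.
    """
    # Best open-weight model = highest score among open releases.
    open_date, open_score = max(open_releases, key=lambda r: r[1])
    # Dates of every closed release that met or beat that score.
    matching_dates = [d for d, s in closed_releases if s >= open_score]
    if not matching_dates:
        return None
    first_closed = min(matching_dates)
    return (open_date - first_closed).days

# Hypothetical placeholder data: (release date, capability score).
closed = [
    (date(2024, 1, 1), 100),
    (date(2024, 6, 1), 120),
    (date(2024, 12, 1), 140),
]
open_weight = [
    (date(2024, 5, 1), 100),
    (date(2024, 10, 1), 118),
]

print(lag_in_days(closed, open_weight))  # 122 days, roughly four months
```

In this made-up example, the best open model (score 118, released October 1) is first matched on the closed side by the June 1 release, so the lag comes out to 122 days. A real analysis would have to average this kind of comparison over time and account for uncertainty in the scores.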

Chinese Labs Are Driving the Open-Weight Frontier

The open-weight leaderboard has become a Chinese-dominated space. Kimi K2.6 currently holds the top open-weights spot on the Artificial Analysis Intelligence Index, with GLM-5.1 and MiniMax-M2.7 close behind. DeepSeek, whose R1 release in early 2025 forced the world to take Chinese AI seriously, has returned to the top tier with V4 Pro. Western open-weight models, including OpenAI’s gpt-oss-120B, have been pushed further down.

This is a significant shift. Until about 18 months ago, the open-weight conversation was largely a Meta story: Llama was the benchmark, and everything else followed. Today, it’s a multi-horse race led by labs that most Western enterprise buyers have never heard of.

What Four Months Actually Means

Four months is simultaneously a long time and a very short one, depending on what you’re building.

For enterprise buyers evaluating which model to deploy in production, a four-month lag can translate to meaningful capability differences — particularly on tasks that require frontier-level reasoning, coding, or agentic performance. Anthropic’s explosive revenue growth, driven significantly by enterprise adoption of Claude, suggests that businesses are willing to pay for that edge.

For developers, researchers, and companies in cost-sensitive markets, the calculus looks different. Open-weight models can be self-hosted, fine-tuned, and deployed without API dependency. The capability gap being measured in months — not years — makes that trade-off increasingly rational.

There’s also the question of trajectory. If open-weight models have kept the lag to roughly three to four months even as the frontier accelerated dramatically, that’s actually a sign of strength. The gap hasn’t blown out; it has held roughly constant despite a relentless pace of proprietary model development.

The Business Model Question

The open-source vs. closed debate was once primarily philosophical, centered on safety, access, and the distribution of AI power. It is increasingly a commercial one. If capable open-weight models continue to stay within a few months of the frontier, the premium that proprietary labs can charge for API access faces long-term pressure.

That’s a problem the biggest labs are well aware of. The valuations being assigned to OpenAI and Anthropic are bets that their closed models will retain a durable advantage — in capability, safety, reliability, and enterprise trust — even as the open-weight tier climbs. Four months is enough of a gap to justify a premium today. Whether it stays that way is the question worth watching.
