Coding has already emerged as the killer use case for AI. AI adoption is higher among programmers than in any other profession, and the models themselves are getting steadily better at writing code. There are fears that millions of programmers could eventually be replaced by AI. One way to track these rapid developments is to watch how much better AI models have become at competitive programming contests.
OpenAI’s coding models have improved dramatically over the past two years, as evidenced by their Codeforces Elo ratings. Codeforces is a competitive programming platform, and contestants earn an Elo rating based on their contest performance, much like chess players: the higher the rating, the stronger the contestant. The exponential increase in AI model performance is particularly striking, with each new iteration significantly surpassing its predecessor. The following analysis traces this rapid evolution, detailing each model’s Elo rating and the implications of these advances.
![](https://officechai.com/wp-content/uploads/2025/02/GjXPk0Ja8AAFfZE-2-1024x970.jpg)
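For readers unfamiliar with the rating system, here is a minimal sketch of the classic Elo formulas from chess: an expected score derived from the rating gap, and an update that nudges ratings toward observed results. Codeforces uses its own rating variant, so treat this as an illustration of the idea rather than the exact Codeforces formula; the function names and the K-factor of 32 are illustrative choices, not anything from the article.

```python
# A minimal sketch of the classic (chess-style) Elo rating system.
# Codeforces uses its own rating formula, so this is illustrative only.

def expected_score(rating_a: float, rating_b: float) -> float:
    """Probability that player A beats player B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))

def update(rating: float, expected: float, actual: float, k: float = 32.0) -> float:
    """Move a rating toward the observed result; k controls the step size."""
    return rating + k * (actual - expected)

# Example: a 400-point gap gives the stronger player ~91% win odds.
p = expected_score(2000, 1600)
print(f"expected win probability: {p:.3f}")  # ≈ 0.909

# If the stronger player wins anyway, their rating barely moves.
print(f"new rating after a win: {update(2000, p, 1.0):.1f}")  # ≈ 2002.9
```

The key intuition is that each 400-point gap multiplies the odds by ten, which is why the jump from 808 to 2727 described below represents a vast difference in playing strength, not a linear one.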
The Early Days: GPT-3.5 and GPT-4
When OpenAI introduced GPT-3.5 in November 2022, its Codeforces Elo rating was effectively 0, meaning it had no measurable competitive coding ability. This changed with the release of GPT-4 in early 2023, which achieved a modest 392 Elo. While this was a step forward, it was still far from competing with human experts in programming competitions.
Breakthroughs with GPT-4o
By mid-2024, OpenAI launched GPT-4o, which more than doubled GPT-4’s rating to 808 Elo. This marked the beginning of a steep upward trend in coding capabilities, showing that OpenAI was investing heavily in its models’ problem-solving and programming skills.
The O1 Series: A Leap in Performance
In late 2024, OpenAI introduced the O1-preview model, which jumped to 1258 Elo. Shortly after, O1-mini followed with 1650 Elo, and O1 reached 1891 Elo. These models demonstrated a major leap in performance, surpassing many human coders and approaching the skill level of expert programmers.
The O3 Series: Near-Expert Levels
The trend continued with O3-mini, whose low, medium, and high reasoning settings achieved 1687, 1997, and an impressive 2073 Elo respectively. This progression showed that OpenAI’s models were not just improving incrementally but exponentially, quickly reaching the level of highly skilled competitive programmers.
The O3 (full) model, released later, reached 2727 Elo, firmly placing it among the highest-ranked coders on Codeforces. It was ranked 175th, meaning that it was now better than all but 174 human competitive programmers. This was a groundbreaking milestone, as it demonstrated that AI could rival top human programmers in competitive coding.
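As a rough sanity check on the “exponential” framing, the sketch below fits a log-linear trend to the released-model ratings quoted in this article. The release dates are assumptions at month-level precision (not from the article), so the fitted doubling time is indicative only.

```python
# Back-of-the-envelope check of the exponential-growth claim, using the
# Elo figures quoted in this article. Dates are assumed approximations.
import math

# (years since GPT-3.5's Nov 2022 launch, Codeforces Elo)
points = [
    (0.3, 392),    # GPT-4, early 2023
    (1.6, 808),    # GPT-4o, mid-2024
    (2.0, 1258),   # O1-preview, late 2024
    (2.0, 1650),   # O1-mini
    (2.1, 1891),   # O1
    (2.2, 2073),   # O3-mini (high)
    (2.3, 2727),   # O3 (full)
]

# Least-squares fit of log(Elo) = a*t + b, i.e. Elo ≈ e^b · e^(a·t)
n = len(points)
xs = [t for t, _ in points]
ys = [math.log(e) for _, e in points]
mx, my = sum(xs) / n, sum(ys) / n
a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - a * mx

print(f"fitted growth rate: {a:.2f} log-points/year")
print(f"rating doubling time: {math.log(2) / a:.2f} years")
```

Under these assumed dates the fit gives a doubling time of well under a year, which is consistent with the article’s characterization of the trend.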
The Future: O4 and Beyond
The projected Elo for O4 is a staggering 3045, suggesting that OpenAI’s models will soon surpass all but the most elite human coders. If this trajectory continues, AI models could dominate competitive programming and be widely used for advanced software development tasks in the near future.
Conclusion
The rapid and exponential improvement in OpenAI’s coding models highlights the incredible pace of AI advancements. From GPT-3.5’s 0 Elo in 2022 to a predicted 3045 Elo for O4 in 2025, the progress is nothing short of astonishing. As these models continue to improve, they will redefine the landscape of programming, automating complex problem-solving tasks and enhancing human-AI collaboration in software development.