OpenAI Releases GPT-5.3-Codex For Agentic Coding, Says It Was Its First Model Instrumental In Creating Itself

The frontier AI labs continue releasing models that are better than their predecessors, but there’s an interesting change in how they operate — some of the models are helping create themselves.

OpenAI announced GPT-5.3-Codex on Thursday, calling it “the most capable agentic coding model to date” and revealing that it was the company’s first model instrumental in creating itself. The Codex team used early versions of GPT-5.3-Codex to debug its own training, manage its deployment, and diagnose test results and evaluations during development.

The new model represents a significant expansion of what Codex can do, moving beyond writing and reviewing code to handling nearly any task developers and professionals perform on a computer. It combines the frontier coding performance of GPT-5.2-Codex with the reasoning and professional knowledge capabilities of GPT-5.2, while also operating 25% faster than previous versions.

gpt 5.3 codex

Benchmark Performance

GPT-5.3-Codex achieves state-of-the-art results across multiple industry benchmarks. On SWE-Bench Pro, a rigorous evaluation of real-world software engineering that spans four programming languages, the model scored 56.8% accuracy. This represents an improvement over GPT-5.2-Codex (56.4%) and GPT-5.2 (55.6%), while using fewer tokens than any prior model.

The performance gains are even more pronounced on Terminal-Bench 2.0, which measures the terminal skills coding agents need. GPT-5.3-Codex achieved 77.3% accuracy compared to 64.0% for GPT-5.2-Codex and 62.2% for GPT-5.2.

On OSWorld-Verified, an agentic computer-use benchmark where agents complete productivity tasks in a visual desktop environment, GPT-5.3-Codex scored 64.7% accuracy. This substantially exceeds the 38.2% achieved by GPT-5.2-Codex and 37.9% by GPT-5.2, approaching the approximately 72% human baseline.

The model also matched GPT-5.2’s performance on GDPval, scoring 70.9% on this evaluation measuring performance across 44 occupations on well-specified knowledge-work tasks including creating presentations, spreadsheets, and other work products.

GPT-5.3-Codex benchmarks

Beyond Software Engineering

While GPT-5.3-Codex excels at coding tasks, OpenAI emphasized that it’s designed to support the full software development lifecycle including debugging, deploying, monitoring, writing product requirement documents, editing copy, user research, tests, and metrics analysis. The model’s capabilities extend beyond software development entirely, helping users create slide decks, analyze data in spreadsheets, and complete various professional knowledge work.

To demonstrate its web development capabilities, OpenAI had GPT-5.3-Codex autonomously build two complex games over millions of tokens: a racing game with different racers, eight maps, and power-up items, and a diving game featuring multiple reefs, fish collection, and resource management mechanics like oxygen and pressure.

The company also showed improved understanding of user intent for day-to-day website creation. When asked to build landing pages with underspecified prompts, GPT-5.3-Codex now defaults to more functional designs with sensible defaults, creating production-ready pages that require less iteration.

Interactive Collaboration

A key feature of GPT-5.3-Codex is its enhanced interactivity. Rather than waiting for final outputs, users can engage with the model in real time as it works. The model provides frequent updates on key decisions and progress, allowing users to ask questions, discuss approaches, and steer toward solutions throughout the development process.

“As model capabilities become more powerful, the gap shifts from what agents are capable of doing to how easily humans can interact with, direct and supervise many of them working in parallel,” OpenAI stated in its announcement.

Self-Improvement in Action

The recursive nature of GPT-5.3-Codex’s development highlights a new phase in AI model creation. OpenAI’s research team used Codex to monitor and debug the training run, track patterns throughout training, analyze interaction quality, and propose fixes. The engineering team employed it to optimize the model’s harness, identify context rendering bugs, and root cause low cache hit rates.

During alpha testing, researchers used GPT-5.3-Codex to analyze its own performance improvements, building regex classifiers to estimate the frequency of clarifications, user responses, and task progress across all session logs. Data scientists worked with the model to create new data pipelines and visualization tools, with Codex concisely summarizing key insights over thousands of data points in under three minutes.

“People building with Codex were happier as the agent was better understanding their intent and made more progress per turn, with fewer clarifying questions,” OpenAI reported.

Cybersecurity Considerations

GPT-5.3-Codex is the first model OpenAI has classified as “High capability” for cybersecurity-related tasks under its Preparedness Framework, and the first it has directly trained to identify software vulnerabilities. While the company states it doesn’t have definitive evidence the model can automate cyber attacks end-to-end, it’s taking a precautionary approach with comprehensive safety measures including safety training, automated monitoring, trusted access for advanced capabilities, and threat intelligence enforcement pipelines.

To support defensive cybersecurity efforts, OpenAI is launching “Trusted Access for Cyber,” a pilot program to accelerate cyber defense research. The company is also committing $10 million in API credits through its Cybersecurity Grant Program to support good-faith security research, particularly for open source software and critical infrastructure systems.

Availability

GPT-5.3-Codex is available now with paid ChatGPT plans across the Codex app, command-line interface, IDE extensions, and web interface. OpenAI says it is working to safely enable API access soon. The model was co-designed for, trained with, and served on NVIDIA GB200 NVL72 systems.

The release positions Codex as evolving from a specialized coding tool into a more general-purpose collaborator capable of operating computers and completing end-to-end work across a broad spectrum of professional tasks. OpenAI frames this as expanding both who can build with Codex and what’s possible with the platform. Codex has seem plenty of momentum in recent days, and a new agentic model should help it get even more users, and compete against Claude Code and other competitors.

Posted in AI