Companies staffed by humans are divided into managers and employees, and it seems that AI organizations could be structured the same way.
Cursor, the AI-powered code editor, has revealed the architecture that let autonomous AI agents build a web browser from scratch in under a week. The system, which coordinated hundreds of concurrent agents writing over a million lines of code, succeeded by abandoning the notion of egalitarian AI collaboration in favor of a hierarchical structure that mirrors traditional organizational design.

The company’s experiments began with what seemed like an intuitive approach: giving all agents equal status and allowing them to self-coordinate through a shared file system. Each agent would check what others were doing, claim tasks, and update their status. The reality proved far messier than the theory.
Agents would hold locks for too long or forget to release them entirely, creating bottlenecks that reduced the effective throughput of twenty agents to just two or three. The system proved brittle in ways that highlighted the gap between human and AI coordination. Agents would fail while holding locks, attempt to acquire locks they already possessed, or update coordination files without proper synchronization. When the team replaced locks with optimistic concurrency control, the technical problems diminished but deeper issues emerged.
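To make the contrast concrete, here is a minimal Python sketch of how optimistic concurrency can replace lock-based task claiming. The TaskBoard, Task, and compare_and_swap names are illustrative assumptions, not Cursor’s actual code: instead of holding a lock, an agent re-reads a task’s version and commits its claim only if nothing changed in between, so a crashed agent can never strand a held lock.

```python
# Minimal sketch, assuming an in-memory task board; Cursor's real coordination
# layer is not public. Each task carries a version counter, and a claim only
# commits if the version is unchanged since it was read, so no lock is ever held.
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class Task:
    id: str
    state: str           # "open", "claimed", or "done"
    owner: str | None
    version: int

class TaskBoard:
    def __init__(self, tasks):
        self._tasks = {t.id: t for t in tasks}

    def read(self, task_id: str) -> Task:
        return self._tasks[task_id]

    def compare_and_swap(self, expected: Task, updated: Task) -> bool:
        """Commit the update only if nobody changed the task since it was read."""
        current = self._tasks[expected.id]
        if current.version != expected.version:
            return False  # lost the race; the caller re-reads and retries or moves on
        self._tasks[expected.id] = replace(updated, version=expected.version + 1)
        return True

    def claim(self, task_id: str, agent_id: str) -> bool:
        snapshot = self.read(task_id)
        if snapshot.state != "open":
            return False
        return self.compare_and_swap(snapshot, replace(snapshot, state="claimed", owner=agent_id))

board = TaskBoard([Task("implement-tab-bar", "open", None, 0)])
print(board.claim("implement-tab-bar", "worker-7"))   # True: first claim wins
print(board.claim("implement-tab-bar", "worker-12"))  # False: task is no longer open
```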
Without hierarchy, the agents became risk-averse in ways that paralleled human group dynamics. They avoided difficult tasks, made small and safe changes, and refused to take responsibility for hard problems or end-to-end implementation. Work churned for extended periods without meaningful progress, a phenomenon familiar to anyone who has witnessed committees fail to make decisions.
The breakthrough came when Cursor implemented what they call a planner-worker architecture, separating agents into distinct roles with different responsibilities. Planners continuously explore the codebase and create tasks, capable of spawning sub-planners for specific areas to make planning itself parallel and recursive. Workers, by contrast, focus entirely on completing assigned tasks without coordinating with other workers or concerning themselves with the broader picture. They simply execute their assigned work until completion, then push changes and move on.
At the end of each cycle, a judge agent determines whether to continue, and the next iteration starts fresh. This structure solved the coordination problems that had plagued the flat organizational model and allowed the system to scale to massive projects without individual agents developing tunnel vision.
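As a rough illustration of that loop, the following Python sketch shows planners emitting tasks, workers executing them in parallel without talking to one another, and a judge deciding whether another fresh cycle is needed. The plan, work, and judge functions are stand-ins for LLM-backed agents, an assumption for readability rather than Cursor’s implementation.

```python
# Minimal sketch of a planner/worker/judge cycle, assuming plain functions in
# place of LLM-backed agents; the real system plans recursively and runs for days.
from concurrent.futures import ThreadPoolExecutor

def plan(codebase: dict[str, bool]) -> list[str]:
    """Planner: survey the codebase and emit independent tasks for this cycle."""
    return [feature for feature, done in codebase.items() if not done]

def work(feature: str, codebase: dict[str, bool]) -> None:
    """Worker: finish one assigned task end to end, push the change, and stop."""
    codebase[feature] = True

def judge(codebase: dict[str, bool]) -> bool:
    """Judge: decide whether another fresh cycle is worth starting."""
    return not all(codebase.values())

codebase = {"tab bar": False, "renderer": False, "history": False}
while True:
    tasks = plan(codebase)
    with ThreadPoolExecutor(max_workers=8) as pool:  # workers run in parallel, never coordinating
        list(pool.map(lambda t: work(t, codebase), tasks))
    if not judge(codebase):
        break  # otherwise the next iteration starts from a fresh plan
print(codebase)  # {'tab bar': True, 'renderer': True, 'history': True}
```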
The results dramatically validated the hierarchical approach. The browser project ran for nearly a week, producing over a million lines of code across a thousand files. Other experiments have proven equally ambitious: a three-week effort to migrate Cursor’s own codebase from Solid to React produced over 266,000 additions and 193,000 deletions that the team believes can be merged into production. Another long-running agent improved video rendering performance by 25 times through an efficient Rust implementation while adding smooth zoom, pan, spring transitions, and motion blur effects.
Additional experiments still running include a Java language server with 7,400 commits and 550,000 lines of code, a Windows 7 emulator with 14,600 commits and 1.2 million lines of code, and a spreadsheet application with 12,000 commits and 1.6 million lines of code.
The company spent billions of tokens running these agents, learning lessons that challenge common assumptions about AI coordination. Model selection proved crucial for long-running tasks, with OpenAI’s GPT-5.2 demonstrating a superior ability to follow instructions, maintain focus, avoid drift, and implement features completely compared to other models. Interestingly, different models excelled at different roles: GPT-5.2 proved a better planner than GPT-5.1-codex, despite the latter being specifically trained for coding tasks.
Many improvements came from removing complexity rather than adding it. An integrator role initially designed for quality control and conflict resolution created more bottlenecks than it solved, as workers proved capable of handling conflicts independently. The most effective system proved simpler than the distributed computing and organizational design theories the team initially tried to apply.
The right amount of structure, Cursor discovered, lies somewhere in the middle ground. Too little structure leads to conflict, duplicated work, and drift. Too much creates fragility. And surprisingly, the system’s behavior depended heavily on how the agents were prompted, requiring extensive experimentation to achieve proper coordination, avoid pathological behaviors, and maintain focus over extended periods.
The current system remains far from optimal. Planners should ideally wake up when their tasks complete to plan next steps. Agents occasionally run far too long. Periodic fresh starts remain necessary to combat drift and tunnel vision. But the core question of whether autonomous coding can scale by adding more agents has found a more optimistic answer than expected.
Hundreds of agents can collaborate on a single codebase for weeks, making genuine progress on ambitious projects. The techniques being developed will eventually inform Cursor’s agent capabilities for end users, potentially transforming how software development teams operate when augmented by AI colleagues that understand organizational structure just as well as they understand code.
For now, the experiment demonstrates that when it comes to coordinating artificial intelligence at scale, human organizational wisdom may prove more relevant than we imagined. Sometimes the most advanced AI systems work best when they’re organized like a traditional company, complete with managers, workers, and clear divisions of responsibility.