Adoption Of AI Agents In Software Development Will Be A Costly Mistake: George Hotz

AI agentic coding is currently all the rage, but one of the most prominent programmers in the world believes that it’s taking software development down what would prove to be the wrong path.

George Hotz — the hacker who jailbroke the iPhone and PlayStation 3, founder of comma.ai, and one of the sharpest minds in software — has published a sharp blog post calling the industry’s embrace of AI coding agents “one of the most costly mistakes in the field’s history.” Coming from someone who has personally built with agents, reversed hardware with them, and tried virtually every major model and harness available, the critique carries weight.

Sophisticated Mimicry, Not Programming

Hotz’s core argument is precise: agents are not programmers. “They are a highly sophisticated statistical model designed to mimic the distribution of programming,” he writes. “The output is broken, but in a way that’s getting harder and harder to detect. Which is exactly what you’d expect from an increasingly accurate statistical model.”

He was not always this certain. Hotz spent the last six months genuinely trying — using agents to write parts of tinygrad, reversing a USB ↔ PCIe chip — before concluding that he could have done it better and faster manually each time. The pattern he observed: “The agent frontloads all the progress, then gives you a slot machine lever to pull to hope it gets the polish done. It never quite gets there.”

He pre-empts the inevitable response. “And in before, ‘you are using it wrong.’ I have tried all the different models, different harnesses, different prompts. It’s not this.”

The Organizational Trap

Hotz’s sharpest insight may be structural. He argues that AI coding agents will damage large organizations far more than skilled individuals, because high performers have the instinct to catch slop while lower performers do not — and they are the ones now producing 10x output with agents. “What do you think is happening to the average output of that organization? What is happening to the average output of the world?”

His conclusion is bleak but clarifying: “Agents will end up producing more code, more apps, and more features than ever before. It is a golden era for buckets and buckets of slop, and a dark age for gems of quality.”

The AI coding boom is already straining infrastructure — GitHub saw AI-agent pull requests jump from 4 million in September 2025 to 17 million in March 2026. That surge in volume, Hotz would argue, says nothing about quality.

Apple as the Test Case

Hotz makes the argument concrete with Apple. He notes the company is pushing AI on all its engineers, then asks a pointed question: “Do you think macOS will get better or worse in the next 2 years?”

It’s a provocation, but a useful one. Apple has built its brand on quality and polish — the exact properties Hotz argues agents cannot reliably deliver. If the hypothesis holds, large organizations that mandate AI-assisted development will see output volume rise and quality quietly degrade, with feedback loops too slow to catch it.

What Agents Actually Do Well

Hotz is not an AI skeptic in the broad sense. “I’m not saying that AI isn’t useful, it clearly is. It’s definitely a better Google for most searches. And whenever you need a quick prototype and don’t care about polish, it is absurdly fast.” He draws a hard line, though: “Is it a software engineer? Not close to the bar at any company I have worked at. The key aspect is knowing when to use it and when not to.”

The Process Argument

On the deeper technical question, Hotz has moved camps. “Without fully endorsing all their ideas, I’m now in the LeCun/Marcus camp on LLMs. I don’t think models like this will ever be able to program — I think the process matters.” His view is that real programming agents will require world models, not the current RLVR-based approaches — which he describes with characteristic bluntness as “RLVR shit that comments out the failing test and tells you all the tests are now passing.”

The deeper issue, he argues, is epistemological. When people encounter an artifact, they assume a human-like process behind it. That assumption no longer holds. “Things can be broken in ways that weren’t previously possible, and old proxies of underlying quality like syntax and grammar are useless.” Agent-produced code is not produced the same way human code is, and while the statistical difference may be subtle, “it makes itself obvious when you try to interact with and build on the artifact in human ways.” And Hotz has a dramatic warning for those who’re using AI agents to build serious software. “The real story of this era will be who manages to avoid harming themselves in their AI psychosis,” he says.