Sufficiently Advanced Agentic Coding Is 'Machine Learning', Classic ML Issues Will Become Problems For Agentic Coding Too: Francois Chollet

Machine learning led to the creation of modern AI systems, and they could create machine-learning-like systems when they’re used to write code.

That’s the implication buried in a recent post by Francois Chollet, the AI researcher and creator of Keras, who draws a striking parallel between agentic coding systems and machine learning — one that should give every engineering leader pause.

Chollet puts it plainly. “Sufficiently advanced agentic coding is essentially machine learning: the engineer sets up the optimization goal as well as some constraints on the search space (the spec and its tests), then an optimization process (coding agents) iterates until the goal is reached,” he said on a post on X.

The analogy doesn’t stop at workflow. Chollet pushes it to its logical conclusion at the artifact level. “The result is a blackbox model (the generated codebase): an artifact that performs the task, that you deploy without ever inspecting its internal logic, just as we ignore individual weights in a neural network.”

This is where the observation sharpens into a warning. “This implies that all classic issues encountered in ML will soon become problems for agentic coding: overfitting to the spec, Clever Hans shortcuts that don’t generalize outside the tests, data leakage, concept drift, etc.”

Chollet closes with a forward-looking design question: “I would also ask: what will be the Keras of agentic coding? What will be the optimal set of high-level abstractions that allow humans to steer codebase ‘training’ with minimal cognitive overhead?”

The framing is deceptively simple, but the implications are significant. ML practitioners spent years learning, often through costly production failures, that a model which aces a benchmark can collapse in the wild. Clever Hans shortcuts — where a system learns to exploit statistical artifacts in test data rather than the underlying logic it’s supposed to master — are notoriously hard to detect until deployment exposes them. If agentic coding systems are now subject to the same failure modes, the software industry is about to inherit a set of problems it has no established tooling or culture to handle. A codebase that passes all its tests but quietly generalizes poorly isn’t a software bug in any traditional sense. It’s a model failure — and most engineering teams aren’t equipped to diagnose it that way.

Chollet’s closing question about a “Keras for agentic coding” points to the organizational and tooling gap that will need to close fast. Keras mattered because it gave practitioners meaningful control over complex systems without demanding they reason at the level of raw tensor operations. Agentic coding today largely lacks an equivalent abstraction layer — one that lets engineers specify intent, constraints, and generalization requirements in terms the optimization process can reliably act on. Whoever builds that interface, whether a startup, an incumbent dev tools company, or one of the frontier labs, will likely define how the industry governs AI-generated codebases for years to come. The question Chollet is really asking isn’t academic: it’s about who controls the steering wheel when the code writes itself.