AI is moving so fast that even those at the cutting edge are being blindsided.
Andrej Karpathy — co-founder of OpenAI, former head of AI at Tesla, and one of the most influential voices in the field — recently described a moment that stopped him cold: watching a single Gemini prompt do everything his own app was built to do.

Speaking at Sequoia Capital’s AI Ascent 2026 event, Karpathy recounted the story of building Menugen. “Menugen is this idea where you come to a restaurant, they give you a menu, there’s no pictures usually. So I don’t know what any of these things are — usually about 30% to 50% of the things I’d have no idea what they are. So I wanted to take a photo of the restaurant menu and get pictures of what those things might look like in a generic sense.”
He then described what he built. “I vibe coded this app that basically lets you upload a photo and it does all this stuff. It runs on Vercel, it re-renders the menu, it gives you all the items, and it gives you a picture — it uses an image generator to basically OCR all the different titles, use the image generator to get pictures of them, and then shows it to you.”
Then came the gut punch. Karpathy saw what he calls the Software 3.0 version of the same idea. “I saw the Software 3.0 version of this, which blew my mind — which is literally just take your photo, give it to Gemini and say, use Nano Banana to overlay the things onto the menu. And Nano Banana basically returned an image that is exactly the picture of the menu that I took, but it actually put into the pixels, it rendered the different things in the menu. This blew my mind because actually all of my Menugen is spurious. It’s working in the old paradigm. That app shouldn’t exist.”
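The contrast Karpathy describes can be sketched in a few lines of Python. Every function below is a hypothetical stand-in rather than a real API; the point is the shape of the two approaches, not the implementation:

```python
# Old paradigm: the app orchestrates a pipeline of specialised steps.
# New paradigm: one multimodal prompt, and the model does every step internally.
# All functions are illustrative stubs, not real library calls.

def ocr_menu_items(photo: bytes) -> list[str]:
    """Stand-in for an OCR step that extracts dish names from the photo."""
    return ["coq au vin", "cassoulet"]  # placeholder result

def generate_dish_image(name: str) -> bytes:
    """Stand-in for a per-item image-generation call."""
    return f"<image of {name}>".encode()

def menugen_pipeline(photo: bytes) -> dict[str, bytes]:
    """The Menugen-style app: OCR, then one image per dish, then re-render."""
    return {name: generate_dish_image(name) for name in ocr_menu_items(photo)}

def call_multimodal_model(prompt: str, image: bytes) -> bytes:
    """Stand-in for a single call to a capable multimodal model."""
    return b"<menu photo with dish images rendered into the pixels>"

def software_3_version(photo: bytes) -> bytes:
    """The Software 3.0 version: the entire pipeline collapses into a prompt."""
    prompt = "Overlay a picture of each dish onto this menu photo."
    return call_multimodal_model(prompt, photo)
```

The intermediate functions in `menugen_pipeline` exist only because earlier models couldn't do the whole task in one step; `software_3_version` makes them, in Karpathy's word, spurious.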
Karpathy says this was symbolic of a broader shift. “The Software 3.0 paradigm is a lot more raw. Your neural network is doing more and more of the work, and your prompt or context is just the image, and the output is an image — and there’s no need to have any of the [intermediate layers].”
The moment crystallises something significant happening across the AI industry: entire product categories are collapsing into a prompt. The layered architecture of an app — the OCR pipeline, the image generator, the UI, the backend — existed to compensate for what models couldn’t do natively. As those limitations disappear, so do the apps built around them.
This is not an isolated anecdote. It reflects a broader pattern that is accelerating. Karpathy himself coined the term “vibe coding” to describe the practice of building software by describing what you want and accepting what the model produces. At the time, it seemed like an expansion of what was possible. Now, even vibe-coded apps are being outpaced — not by better vibe coding, but by models capable enough to skip the app entirely.
Gemini’s rapid capability gains are a key part of this story. Google’s Gemini has been on a remarkable run, with Gemini 3.1 Pro recently taking the top spot on the Artificial Analysis Intelligence Index — leading in six of ten categories, including agentic coding and multimodal reasoning. Nano Banana, the image generation and editing capability that rendered Menugen unnecessary, is part of that same expanding stack. It is not a specialised tool; it is a general capability that happens to make specialised tools redundant.
Karpathy’s framework for understanding this — Software 1.0 (explicit code), Software 2.0 (neural networks trained on data), Software 3.0 (LLMs as programmable interpreters) — suggests the displacement will continue. In Software 3.0, the context window is the program. The more capable the model, the fewer intermediate layers you need between a user’s intent and a result. That is bad news for a large class of AI wrapper apps that have been built on the assumption that model limitations are a stable foundation to build on.
The implications for founders and product teams are pointed. As Karpathy put it: don’t just ask what AI can help you build faster — ask what AI makes unnecessary. The most dangerous place to be right now is in the middle: building an app that routes between a user and a capable model, when that model is rapidly becoming capable enough to handle the routing itself.