People are spending lots of time and effort trying to figure out exactly how AI progress will pan out in the coming years, and Andrej Karpathy has an interesting paradigm for approaching the question.
Former Tesla Director of AI Andrej Karpathy believes that AI will soon excel at “verifiable” tasks, which have a correct answer that can be checked automatically. This includes areas like code and math. In contrast, it could take longer for AI to surpass humans at creative or strategic tasks, which don’t necessarily have a correct answer to verify against.

“If you were to forecast the impact of computing on the job market in ~1980s, the most predictive feature of a task/job you’d look at is to what extent the algorithm of it is fixed, i.e. are you just mechanically transforming information according to rote, easy to specify rules (e.g. typing, bookkeeping, human calculators, etc.)? Back then, this was the class of programs that the computing capability of that era allowed us to write (by hand, manually),” Karpathy wrote in a post on X.
“With AI now, we are able to write new programs that we could never hope to write by hand before. We do it by specifying objectives (e.g. classification accuracy, reward functions), and we search the program space via gradient descent to find neural networks that work well against that objective. In this new programming paradigm then, the new most predictive feature to look at is verifiability. If a task/job is verifiable, then it is optimizable directly or via reinforcement learning, and a neural net can be trained to work extremely well. It’s about to what extent an AI can “practice” something. The environment has to be resettable (you can start a new attempt), efficient (a lot of attempts can be made), and rewardable (there is some automated process to reward any specific attempt that was made),” he explains.
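Those three properties are concrete enough to sketch in code. The toy environment below is a hypothetical illustration (Karpathy’s post contains no code, and the class and method names here are invented for the example): a task a model can “practice” is one that can be reset, attempted cheaply, and scored automatically.

```python
import random

class VerifiableTask:
    """A toy task with the three properties Karpathy lists:
    resettable, efficient, and rewardable."""

    def reset(self) -> str:
        # Resettable: every call starts a fresh, independent attempt.
        self.a, self.b = random.randint(0, 99), random.randint(0, 99)
        return f"What is {self.a} + {self.b}?"

    def reward(self, answer: str) -> float:
        # Rewardable: an automated check scores the attempt,
        # with no human judgment in the loop.
        return 1.0 if answer.strip() == str(self.a + self.b) else 0.0

env = VerifiableTask()
prompt = env.reset()
# Efficient: millions of attempts like this are cheap, so a model can
# be optimized directly against the reward signal.
score = env.reward("42")  # a model's (probably wrong) answer -> 0.0 or 1.0
```

A CEO’s quarter, by contrast, fails all three checks: it can’t be replayed, each attempt takes months, and there is no automated scorer.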
“The more a task/job is verifiable, the more amenable it is to automation in the new programming paradigm. If it is not verifiable, it has to fall out from neural net magic of generalization, fingers crossed, or via weaker means like imitation. This is what’s driving the “jagged” frontier of progress in LLMs. Tasks that are verifiable progress rapidly, including possibly beyond the ability of top experts (e.g. math, code, amount of time spent watching videos, anything that looks like puzzles with correct answers), while many others lag by comparison (creative, strategic, tasks that combine real-world knowledge, state, context and common sense),” Karpathy predicts.
“Software 1.0 easily automates what you can specify. Software 2.0 easily automates what you can verify,” he says.
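One way to make that contrast concrete is a sketch like the following (an illustrative toy, not from Karpathy’s post; the rule and the numbers are invented for the example). Software 1.0 is a rule a human writes by hand; Software 2.0 states an objective and lets gradient descent find the program.

```python
# Software 1.0: a human specifies the algorithm by hand.
def f_1_0(x: float) -> float:
    return 2 * x + 1

# Software 2.0: specify an objective (error on examples) and search
# parameter space with gradient descent until the objective is met.
data = [(0.0, 1.0), (0.5, 2.0), (1.0, 3.0)]  # samples of y = 2x + 1
w, b, lr = 0.0, 0.0, 0.1
for _ in range(5_000):
    for x, y in data:
        err = (w * x + b) - y  # verifiable: the error is computable
        w -= lr * err * x      # gradient step for the weight
        b -= lr * err          # gradient step for the bias
print(round(w, 3), round(b, 3))  # ~2.0 and ~1.0: found, not hand-written
```

The second approach only works because the error is checkable by machine, which is exactly the verifiability Karpathy is pointing at.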
It’s an interesting argument, and it mirrors the progress of AI so far. AI has gotten quite good at writing code, which is highly verifiable: code either compiles or it doesn’t, and it either produces the desired output or it doesn’t. AI is also making progress in fields like math, which again have correct answers that can be checked.
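A minimal sketch of what such an automated check could look like (a hypothetical harness; the entry-point name solution is an assumption, not a real API):

```python
def verify(candidate_source: str, tests: list[tuple[int, int]]) -> bool:
    """Score a model-written function automatically: does it compile,
    and does it give the desired output on every test case?"""
    try:
        namespace: dict = {}
        exec(compile(candidate_source, "<candidate>", "exec"), namespace)
        solution = namespace["solution"]  # assumed entry-point name
        return all(solution(x) == expected for x, expected in tests)
    except Exception:
        return False  # any crash or wrong answer means reward zero

# e.g. a model-proposed squaring function, checked with no human review:
candidate = "def solution(x):\n    return x * x"
print(verify(candidate, [(2, 4), (3, 9), (-1, 1)]))  # True
```

Because a harness like this can score millions of attempts without a human in the loop, code generation is exactly the kind of task the new paradigm optimizes well.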
On the other hand, no CEOs have been replaced with AI yet. A CEO’s job is far more subjective, and it involves making hundreds of decisions over a long period of time. Under Karpathy’s paradigm, while the job has a measurable reward (maximizing shareholder value), it isn’t resettable (a CEO can’t walk back decisions and start over) or efficient (it would take far too long to iterate the running of a company hundreds of times). It remains to be seen whether these predictions play out, but they offer a persuasive lens on AI’s impact on the job market, at least in the near term.