Andrej Karpathy Explains Different Parts Of Training An LLM Through The Example Of A Textbook

Andrej Karpathy is not only one of the top minds in AI, but he also has a knack for simplifying difficult concepts and making them accessible to the lay person.

Former Tesla Director of AI and OpenAI executive Andrej Karpathy has explained the different stages of training an LLM through the example of a school textbook. “We have to take the LLMs to school,” he wrote on X. “When you open any textbook, you’ll see three major types of information:

1. Background information / exposition. The meat of the textbook that explains concepts. As you attend over it, your brain is training on that data. This is equivalent to pretraining, where the model is reading the internet and accumulating background knowledge.

2. Worked problems with solutions. These are concrete examples of how an expert solves problems. They are demonstrations to be imitated. This is equivalent to supervised finetuning, where the model is finetuning on “ideal responses” for an Assistant, written by humans.

3. Practice problems. These are prompts to the student, usually without the solution, but always with the final answer. There are usually many, many of these at the end of each chapter. They are prompting the student to learn by trial & error – they have to try a bunch of stuff to get to the right answer. This is equivalent to reinforcement learning,” he explained.

This is a great way to explain how LLMs learn, comparing each stage of the LLM training process to a part of a school textbook. But Karpathy extended the analogy to point out which of these kinds of training LLMs need more of.

“We’ve subjected LLMs to a ton of 1 (background information/pretraining) and 2 (worked problems with solutions/supervised finetuning), but 3 (practice problems/reinforcement learning) is a nascent, emerging frontier. When we’re creating datasets for LLMs, it’s no different from writing textbooks for them, with these 3 types of data. They have to read, and they have to practice,” he wrote.
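To make the mapping concrete, here is a minimal, illustrative sketch in Python of what those three kinds of data might look like when assembling a training set. The corpus snippets, the prompts, and the `check_answer` verifier are hypothetical stand-ins for illustration, not any particular lab's pipeline.

```python
# Illustrative sketch: the three kinds of "textbook" data Karpathy describes,
# expressed as toy Python data structures. All examples are made up.

# 1. Background information / exposition -> pretraining data: raw text,
#    used for next-token prediction over a large corpus.
pretraining_corpus = [
    "Addition is commutative: a + b equals b + a for any numbers a and b.",
    "The derivative of x squared with respect to x is 2x.",
]

# 2. Worked problems with solutions -> supervised finetuning data:
#    (prompt, ideal response) pairs written by humans, imitated token by token.
sft_examples = [
    {
        "prompt": "What is 17 + 25?",
        "ideal_response": "17 + 25 = 42. First add the tens (10 + 20 = 30), "
                          "then the ones (7 + 5 = 12), so the total is 42.",
    },
]

# 3. Practice problems -> reinforcement learning data: a prompt plus only the
#    final answer. The model must find its own solution path; a verifier
#    scores the outcome, not the reasoning.
rl_problems = [
    {"prompt": "Compute 13 * 7.", "final_answer": "91"},
    {"prompt": "What is 144 / 12?", "final_answer": "12"},
]


def check_answer(model_output: str, final_answer: str) -> float:
    """Toy verifier: reward 1.0 if the expected final answer appears in the
    model's output, else 0.0. Real RL setups use richer, task-specific checks."""
    return 1.0 if final_answer in model_output else 0.0


if __name__ == "__main__":
    # Pretend the model answered one practice problem; only the outcome is scored.
    sampled_output = "13 * 7 = 91"
    reward = check_answer(sampled_output, rl_problems[0]["final_answer"])
    print(f"reward = {reward}")  # -> reward = 1.0
```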

It was advances in reinforcement learning that helped DeepSeek stun the world with its R1 model, which held its own against OpenAI’s top models. Anthropic CEO Dario Amodei has said that reinforcement learning is still a largely untapped axis for LLMs to improve along, and that progress could be rapid if LLMs were trained with better reinforcement learning. Karpathy, for his part, has urged researchers to pitch in on the reinforcement learning side. “For friends of open source: imo the highest leverage thing you can do is help construct a high diversity of RL environments that help elicit LLM cognitive strategies. To build a gym of sorts. This is a highly parallelizable task, which favors a large community of collaborators,” he said on X.
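What an “RL environment” for an LLM could look like is easiest to see with a small example. The sketch below assumes a gymnasium-style reset/step interface and a hypothetical arithmetic task; it is only an illustration of the “gym of sorts” idea, not code from Karpathy or any existing project.

```python
import random

# Minimal sketch of a gym-style RL environment for an LLM "agent".
# The task (mental arithmetic) and reward scheme are hypothetical examples
# of the kind of environment Karpathy suggests the community could build.

class ArithmeticEnv:
    """Each episode: the environment emits a prompt, the LLM replies with
    free-form text, and the environment returns a reward based only on
    whether the final answer is correct."""

    def reset(self, seed=None):
        rng = random.Random(seed)
        self.a, self.b = rng.randint(2, 99), rng.randint(2, 99)
        # The observation is the prompt the LLM will see.
        return f"Compute {self.a} * {self.b}. End with 'Answer: <number>'."

    def step(self, llm_output):
        expected = str(self.a * self.b)
        answer = llm_output.rsplit("Answer:", 1)[-1].strip()
        reward = 1.0 if answer == expected else 0.0
        done = True  # single-turn task
        return reward, done


if __name__ == "__main__":
    env = ArithmeticEnv()
    prompt = env.reset(seed=0)
    # Stand-in for a model call; a real loop would sample from an LLM here.
    fake_llm_output = f"Let me work it out. Answer: {env.a * env.b}"
    reward, done = env.step(fake_llm_output)
    print(prompt)
    print(f"reward = {reward}, done = {done}")
```

A large, diverse collection of such environments, each with its own tasks and verifiers, is what would make the trial-and-error stage parallelizable across a community of contributors, as Karpathy describes.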
