When Google DeepMind released the Gemini 3 Pro model, it indicated that it was still seeing gains in pre-training, and it turns out that even small gains at that stage can have an outsized impact on the overall quality of the final model.
Logan Kilpatrick, Group Product Manager at Google DeepMind, recently addressed mounting speculation about whether pre-training—the foundational phase of AI model development—has hit a wall. His response cuts through the noise with a clear message: the gains are real, and more importantly, they compound in unexpected ways throughout the model development pipeline.

When asked directly about whether pre-training scaling has plateaued, Kilpatrick framed the issue not as a matter of belief but of empirical fact. “It’s not whether or not you believe it, it’s whether or not it’s true,” he said. “And I think the truth is this: pre-training continues to deliver real gains.”
But Kilpatrick’s most compelling insight came when he explained the mechanics of how these gains translate into final model performance. “The most important part is that the gains that you see in pre-training, even if the exponential is slightly shallower than it was before where we were in the early days of scaling up, principally every other step in the model development process is an amplification of the underlying pre-trained model,” he explained.
He used a vivid metaphor to drive the point home: “So if you have a beast of a pre-trained model, and even if you’re only eking out a couple of percentage points increases in overall capability during pre-training, the other steps of the process amplify and are exponential on top of that.”
The implications of Kilpatrick's comments are significant for the AI industry. His explanation suggests that even modest improvements in pre-training, say a 2-3% gain in base capabilities, can translate into substantially larger improvements in the final model once post-training techniques such as reinforcement learning from human feedback, supervised fine-tuning, and instruction tuning are applied. This multiplicative effect means that the returns on pre-training investments may be far greater than raw benchmark numbers suggest, challenging the narrative that scaling laws have broken down. The perspective also helps explain why major AI labs continue to invest heavily in larger training runs despite seemingly diminishing returns: the downstream amplification makes even incremental pre-training gains worthwhile. And while there have been concerns that companies had already used up the data available for pre-training, Kilpatrick's comments suggest that even small gains at this step can still lead to significant improvements in final model quality.
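To make the compounding concrete, here is a back-of-the-envelope sketch, a toy model of our own rather than anything Kilpatrick or Google has published. Suppose each post-training stage improves the model in proportion to the quality of the base it starts from, so final capability scales roughly like base quality raised to an amplification exponent k greater than 1. The exponent k and the 2-4 values below are purely illustrative assumptions:

```python
# Toy model of downstream amplification (illustrative assumption only,
# not a published formula): final capability ~ base_quality ** k, k > 1.

def relative_final_gain(base_gain: float, k: float) -> float:
    """Relative final-model gain if final capability scales as base ** k."""
    return (1.0 + base_gain) ** k - 1.0

# A 2-3% pre-training gain, compounded through hypothetical exponents.
for base_gain in (0.02, 0.03):
    for k in (2, 3, 4):
        print(f"{base_gain:.0%} base gain, k={k}: "
              f"{relative_final_gain(base_gain, k):.1%} final gain")
```

Under these assumed numbers, a 3% base gain with k = 3 compounds to roughly a 9% final gain, which is the flavor of amplification Kilpatrick describes: small movements at the foundation become much larger movements at the top of the stack.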