It’s now widely accepted that Chinese AI labs have been distilling American models to build their own, but they may no longer need to do so to keep improving their offerings.
That’s the argument Patrick C Toulme, an engineer at Google, made in a post on X. According to him, there’s a widespread misreading of how GLM 5.2 was trained. Yes, Zhipu AI distilled from Claude and GPT-5.5, but distillation wasn’t what got it to Opus-level quality. It was what made Opus-level quality possible in the first place.
The distinction matters. Toulme explains that reinforcement learning (RL), the technique labs use to push models up the capability curve on agentic tasks like coding, requires something to work with. Specifically, it needs trajectories: rollouts where the model actually completed a task successfully. If the model can’t solve a problem at all, there are no successful trajectories, and therefore no gradient signal. RL has nothing to learn from. This is what’s called the cold start problem.
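The cold start problem can be illustrated with a toy example. The sketch below is not Zhipu’s or Toulme’s method. It assumes a hypothetical one-parameter policy that solves a task with probability sigmoid(theta), a binary reward of 1 for success and 0 for failure, and a plain REINFORCE gradient estimate with no baseline. The function names are made up for illustration.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def reinforce_gradient(theta, num_rollouts, rng):
    """Monte Carlo REINFORCE estimate for a toy one-parameter policy.

    Assumption: the policy "solves the task" with probability sigmoid(theta),
    and the reward is 1.0 for a success and 0.0 for a failure.
    Returns (gradient estimate, number of successful rollouts).
    """
    p = sigmoid(theta)
    grad = 0.0
    successes = 0
    for _ in range(num_rollouts):
        solved = rng.random() < p
        reward = 1.0 if solved else 0.0
        # d/dtheta of log pi(outcome): (1 - p) for a success, -p for a failure
        score = (1.0 - p) if solved else -p
        # Failed rollouts have reward 0, so they contribute nothing
        grad += reward * score
        successes += int(solved)
    return grad / num_rollouts, successes


if __name__ == "__main__":
    rng = random.Random(0)
    # A model that essentially never solves the task: no successes, no gradient
    print(reinforce_gradient(-40.0, 1000, rng))
    # A model that sometimes solves it: a positive gradient to climb
    print(reinforce_gradient(-2.0, 1000, rng))
```

In this toy setup, a policy with a negligible success rate sees only zero-reward rollouts, so the estimated gradient is zero however many rollouts are sampled. A policy that succeeds even occasionally gets a positive gradient.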
Distillation is the fix for that cold start. You take a weaker model and seed it with knowledge from a stronger one — in GLM 5.2’s case, Claude and GPT — specifically on the tasks where it’s currently failing. Once it starts producing some successful outputs on those hard tasks, RL takes over, and the model can begin climbing on its own. The distillation was scaffolding, not the foundation.
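The seed-then-climb pattern can be sketched with a toy model. This is an illustration under stated assumptions, not a description of how GLM 5.2 was actually trained: the policy has a single parameter theta and solves a task with probability sigmoid(theta), "distillation" is modeled as a supervised step toward a teacher’s successful demonstration, and RL is a plain REINFORCE update on the student’s own rollouts. All names and hyperparameters are hypothetical.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def distill_step(theta, lr):
    """Supervised step toward a teacher demonstration of a solved task.

    The gradient of log sigmoid(theta) is (1 - p), which is non-zero even
    when the student's own success probability p is tiny.
    """
    return theta + lr * (1.0 - sigmoid(theta))


def rl_step(theta, lr, num_rollouts, rng):
    """One REINFORCE update from the student's own rollouts (reward 1 or 0)."""
    p = sigmoid(theta)
    successes = sum(1 for _ in range(num_rollouts) if rng.random() < p)
    # Only successful rollouts carry reward; each contributes (1 - p)
    grad = successes * (1.0 - p) / num_rollouts
    return theta + lr * grad


def run_demo(seed=0, rl_steps=200, distill_steps=38):
    rng = random.Random(seed)

    # Cold start: RL alone on a student that essentially never succeeds
    theta = -40.0
    for _ in range(rl_steps):
        theta = rl_step(theta, lr=1.0, num_rollouts=64, rng=rng)
    cold_p = sigmoid(theta)

    # Seeded start: a short distillation phase, then the same RL loop
    theta = -40.0
    for _ in range(distill_steps):
        theta = distill_step(theta, lr=1.0)
    seeded_p = sigmoid(theta)
    for _ in range(rl_steps):
        theta = rl_step(theta, lr=1.0, num_rollouts=64, rng=rng)
    final_p = sigmoid(theta)

    return cold_p, seeded_p, final_p


if __name__ == "__main__":
    cold_p, seeded_p, final_p = run_demo()
    print(f"RL only:             {cold_p:.3g}")
    print(f"after distillation:  {seeded_p:.3g}")
    print(f"distillation + RL:   {final_p:.3g}")
```

In this toy, the RL-only run stays stuck because it never samples a success, while the seeded run starts with a modest success rate and improves from its own rollouts with no further teacher input.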
What Toulme is pointing out is that GLM 5.2 has already cleared that hurdle. It’s generating enough positive trajectories in agentic coding environments that it has plenty of signal to train on going forward. And once a model is in that position, it doesn’t need to keep pulling from American models. It can hill-climb through RL on its own outputs.
The performance data backs this up. GLM 5.2 has been posting strong results across several coding benchmarks, trailing Claude Opus 4.8 by small margins on SWE-Bench and holding its own against GPT-5.5 on Terminal-Bench. Zhipu’s own training pipeline involved sequential RL stages — first reasoning, then agentic, then general — with on-policy distillation used between stages to prevent catastrophic forgetting. That’s internal distillation from its own checkpoints, not from American models.
Toulme also offers an interesting observation about the trajectory of difficulty. Getting from zero to Claude Opus 4.8 quality is hard. The cold start problem is real, compute is required, and you need access to strong models to bootstrap from. But going from Opus 4.8 toward Mythos-tier quality? He argues that’s actually easier, because the model is already generating high-quality signal, and RL can keep compounding from there.
That’s a significant claim in the context of the US-China AI competition. The US has been tightening chip export controls and limiting access to frontier models partly on the theory that cutting off China’s ability to train on American model outputs would slow its progress. Toulme’s analysis suggests the window in which that could have worked may have already closed. GLM 5.2 is past the point where it needed that crutch.
The visible evidence of the earlier distillation is still there — observers have noted that GLM 5.2 often identifies itself as Claude and carries something of Claude’s voice, which is a characteristic artifact of heavy distillation. But that’s history now. The model has enough of its own capability that it can keep improving without looking over the fence.
Whether it actually reaches Mythos-level quality remains to be seen. But the argument that it needs American models to get there is looking increasingly difficult to sustain.