Meta’s Llama 4 ended up being one of the biggest flops in the AI space last year, though that wasn’t immediately apparent from the benchmarks released alongside the model. Former Meta Chief Scientist Yann LeCun has now offered some insight into why the benchmarks looked better than the model turned out to be.
LeCun has revealed that Meta “fudged” Llama 4’s benchmarks “a little bit” when it announced the model. He told the Financial Times in an interview that the team, which he wasn’t leading, used different versions of the model for different benchmarks to get better results. He added that once this came to light, CEO Mark Zuckerberg decided to sideline the people and teams responsible for the subterfuge.

“Mark was really upset and basically lost confidence in everyone who was involved in this. And so basically sidelined the entire GenAI organization. A lot of people have left, a lot of people who haven’t yet left will leave,” LeCun said.
Using different versions of a model for different benchmarks is a way to make a model appear more capable than it is. Models can be fine-tuned for specific use cases, and Meta’s team likely fine-tuned separate versions of Llama 4 for different benchmarks, then reported each version’s results on the benchmark it had been tuned for. This would have made the benchmark results look impressive, but people who actually used the model would have been underwhelmed, since they were working with a single version rather than one optimized for their particular task.
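To see why this inflates the headline numbers, consider a rough sketch in Python. The variant names and scores below are invented purely for illustration and are not Llama 4 results; the point is only that picking the best variant per benchmark produces a scorecard that no single released model can match.

```python
# Hypothetical scores, invented for illustration only; not real Llama 4 data.
scores = {
    "variant_a": {"coding": 70, "math": 55, "chat": 60},
    "variant_b": {"coding": 58, "math": 72, "chat": 57},
    "variant_c": {"coding": 56, "math": 54, "chat": 75},
}
benchmarks = ["coding", "math", "chat"]


def mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def cherry_picked(scores, benchmarks):
    """For each benchmark, report the best score achieved by any variant."""
    return {b: max(variant[b] for variant in scores.values()) for b in benchmarks}


# The scorecard you get by reporting a different variant per benchmark.
reported = cherry_picked(scores, benchmarks)
reported_avg = mean(reported.values())

# The best average any one variant achieves across all benchmarks,
# which is closer to what a user of a single released model experiences.
best_single_avg = max(
    mean(variant[b] for b in benchmarks) for variant in scores.values()
)

print(f"Reported (best variant per benchmark): {reported_avg:.1f}")
print(f"Best single variant across all tasks:  {best_single_avg:.1f}")
```

In this made-up example the reported average sits well above what any one variant delivers, which is the gap users would notice in practice.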
LeCun’s revelation also explains the events that followed Llama 4’s botched release. Once the user community established that Llama 4 didn’t live up to its benchmarks, Meta didn’t announce or release a follow-up model. Instead, Meta acquired Scale AI and made its CEO Alexandr Wang the head of its AI efforts. Wang set about building a new AI team, poaching top researchers from rival labs like OpenAI. Meta simultaneously laid off 600 engineers and researchers, many from its GenAI organization. Meta hasn’t released a major model since Llama 4, and the gaming of benchmarks seems to have set off a series of events that has led to the complete overhaul of Meta’s AI operations.