Grok4 has beaten top OpenAI and Google models in most benchmarks, and Elon Musk says that humans are running out of questions that AI still can’t answer.
In a statement marking the release of xAI’s new model, Grok4, which has surpassed its primary competitors from OpenAI and Google on most performance benchmarks, Elon Musk has declared a pivotal shift in the race for artificial intelligence. According to Musk, the era of testing AI with theoretical, human-designed questions is rapidly drawing to a close, with a much more profound and consequential benchmark emerging: reality itself.

“We are actually running out of actual test questions to ask,” Musk stated, highlighting the incredible pace of AI development. “Even questions that are ridiculously hard, if not essentially impossible for humans that are written down, are swiftly becoming trivial for AI.” This assertion suggests that the abstract, knowledge-based hurdles we once used to measure machine intelligence are becoming obsolete as AI’s cognitive abilities begin to saturate and exceed human capacity in these domains.
Grok4 has performed well on most benchmarks. Its Grok Heavy version scored a perfect 100% on the AIME 2025 benchmark, which is drawn from a qualifying exam on the path to the International Mathematical Olympiad. On the HMMT benchmark, which is based on a prestigious math competition for high school students, Grok4 scored 96.7%. With scores like these, few hard written questions remain that AI can’t answer.
But Musk argues that the true measure of advanced AI will not be its ability to answer complex questions, but its capacity to solve real-world problems governed by the inflexible laws of the physical world. “But the one thing that is an excellent judge of things is reality. Because physics is the law; ultimately everything else is a recommendation. You can’t break physics,” he explained. “So, the ultimate test, the ultimate reasoning test for whether an AI… is reality.”
He elaborates on this new benchmark with tangible examples of success or failure. “You invent a new technology, say, improve the design of a car or a rocket, or create a new medication. Does it work? Does the rocket get to orbit? Does the car drive? Does the medicine work? Whatever the case may be,” Musk posited. For him, the proof of superior AI reasoning will be in its observable, verifiable impact on the world. “Reality is the ultimate judge here. It is going to be a reinforcement learning closing loop around reality.”
Reinforcement learning is a technique in which an AI model is told which of its results are good and which are bad, and the model improves based on this feedback. Musk now says AI models have already received enough feedback from artificial systems. The next iteration, he suggests, is that AI models start receiving feedback from the real world. And for this to happen, AI systems will have to be deployed at scale in the real world, which could change both our lifestyles and the economy at large forever.
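For readers who want to see that feedback loop in miniature, here is a rough sketch in Python. It is not how Grok4 or any xAI system is trained. It is a textbook-style "epsilon-greedy bandit" in which a program tries several actions, is told only whether each attempt worked, and gradually favors the action that works most often. The function name `run_bandit`, the success probabilities, and the step count are all made up for illustration; the simulated environment stands in for whatever "reality" would report back.

```python
import random


def run_bandit(success_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which action succeeds most often using only good/bad feedback.

    success_probs is a hypothetical stand-in for the environment: the chance
    that each action "works". The learner never sees these numbers directly.
    """
    rng = random.Random(seed)
    counts = [0] * len(success_probs)    # how often each action was tried
    values = [0.0] * len(success_probs)  # running estimate of each action's success rate

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: occasionally try a random action.
            action = rng.randrange(len(success_probs))
        else:
            # Exploit: pick the action that currently looks best.
            action = max(range(len(values)), key=values.__getitem__)

        # Feedback from the environment: 1.0 if it worked, 0.0 if it did not.
        reward = 1.0 if rng.random() < success_probs[action] else 0.0

        # Update the running average for the action that was tried.
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]

    return values, counts


values, counts = run_bandit([0.2, 0.5, 0.8])
best = max(range(len(values)), key=values.__getitem__)
print("estimated success rates:", [round(v, 2) for v in values])
print("times each action was tried:", counts)
print("action the learner prefers:", best)
```

The learner is never told the right answer in advance; it only finds out after acting whether the attempt succeeded. Musk's framing swaps the simulated success probabilities here for outcomes in the physical world, such as whether the rocket reaches orbit or the medicine works.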