Companies are spending vast amounts of time and resources trying to ensure that AI models are safe, but it may not be possible to know with certainty how they'll behave before they're made available to millions of users.
This unsettling truth was recently highlighted by Anthropic CEO Dario Amodei, who addressed the inherent unpredictability of AI models by drawing a parallel between their behavior and that of humans. His observation underscores a critical challenge in AI development: the difficulty of fully anticipating the potential misuse of these powerful tools, even with rigorous testing. Amodei emphasizes the continuous learning process involved in understanding AI capabilities, suggesting that each iteration of a model can offer clues to potential issues with future versions.

“Just as every time we release a new model, there are positive applications for it that people find that we weren’t expecting,” Amodei explains, “I expect there also to be negative applications. We always monitor the models for different use cases in order to discover this, so that we have a continuous process where we don’t get taken by surprise.” This proactive approach to monitoring aims to identify potential misuse early on. He continues:
“If we’re worried that someone will do something evil with Model 6, hopefully, some early signs of that can be seen in Model 5 when we monitor it. But this is the fundamental problem of the models: you don’t really know what they’re capable of. You don’t truly know what they’re capable of until they’re deployed to a million people.”
Amodei acknowledges the limitations of pre-deployment testing: “You can test ahead of time. You can have your researchers bash against them. You can even – we collaborate with the government – have the government AI safety teams test them. But the hard truth is that there’s no way to be sure. They’re not like code where you can do formal verification. What they can do is unpredictable.”
He further illustrates this point with a striking analogy: “It’s just like, if I think of you or me, if I’m like the quality assurance engineer for you or me, can I give a guarantee that a particular kind of bad behavior you are logically not capable of will never happen? People don’t work that way.”
Amodei’s statements highlight the inherent risks of deploying increasingly sophisticated AI models. His comparison of AI to humans illustrates how difficult it is to predict behavior, even with rigorous testing and monitoring. While pre-release evaluations are essential, they cannot fully capture the emergent properties that arise from interactions with millions of users in real-world scenarios. This uncertainty poses significant challenges for developers and policymakers alike, requiring ongoing vigilance and adaptive strategies to mitigate potential harms. Developers can do their best to make sure the models they create are safe, but there may be no way to be completely certain that they are.