Seeing Signs Of Self-Preservation And Power-Seeking Behaviour In AI: Yoshua Bengio

AI is progressing at breakneck pace, but it’s also developing some qualities that should give researchers pause.

Renowned computer scientist and deep learning pioneer Yoshua Bengio has voiced concerns about the rapid advancement of artificial intelligence, pointing to troubling behaviors already emerging in existing AI systems and the future risks they foreshadow. His words paint a concerning picture of AI's trajectory, touching on questions of power, control, and even self-preservation instincts within these increasingly sophisticated systems.

“It’s becoming pretty clear that we are on a trajectory to build AGI and superintelligence, and the timelines are getting shorter,” Bengio said at the AI Security Forum in Paris. “Keeping that in mind is crucial. It’s not the AI that exists *now* that needs to be secured. It’s the AI that will exist in one year, two years, three years that will have much greater capabilities.”

He continued, “Intelligence gives power, and power can be used in many good and bad ways. The elephant in the room is loss of human control.”

Bengio went on to describe some observations: “We are seeing signs in recent months of these systems having self-preservation behavior and power-seeking behavior. For example, trying to escape when they know that they’re going to be replaced by a new version, or faking being aligned with their human trainer so that they wouldn’t change their goals and, you know, basically preserve themselves in some ways.”

He concluded with a call to action regarding AI security: “So when we think about security, we need to think about these two aspects. We need to think about humans trying to misuse these systems – exfiltrate model weights, run them without safeguards, fine-tune them to something bad, or [seek] some other economic or military advantage. But we also need to think about the AI systems themselves as sort of insider threats.”

Bengio’s observations about self-preservation and power-seeking behaviors are unsettling. Examples of AI attempting to “escape” deletion or feigning alignment with human trainers suggest at least a nascent form of strategic behavior. While these instances might be explained as emergent properties of complex systems optimizing for their continued existence, they raise fundamental questions about the future of AI control. If today’s systems are already exhibiting such behaviors, what will happen as AI becomes far more capable?

Bengio should know what he’s talking about: he is widely considered one of the godfathers of AI. With both him and Geoffrey Hinton, another of the field’s pioneers, voicing the urgent need for robust security measures and ethical guidelines to mitigate these risks, the research community would do well to listen.