AI Can Be Used To Supervise Other AI: Anthropic’s Chief Scientist Jared Kaplan

There are plenty of concerns around how humanity can ensure AI systems remain aligned with human goals, but the solution could come in the form of AI itself.

Jared Kaplan, Chief Science Officer at Anthropic, has said that AI can be used to supervise other AI. His insights shed light on a potential path towards creating more reliable and safer AI systems, especially as these systems become increasingly powerful. Kaplan’s vision hinges on a method called “Constitutional AI,” which he believes could offer a “quadratic improvement” in AI oversight.


“We developed this idea of Constitutional AI a couple of years ago,” Kaplan explains. “The idea is that you write down a list of principles that you want your AI system to comply with.” This “constitution” serves as a guide for the AI’s behavior.

He continues, “Then you can automatically train a system to obey those principles through reinforcement learning and kind of self-evaluation.” This introduces the element of AI self-supervision. The AI, in essence, learns to evaluate its own actions against the established principles.
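The critique-and-revise loop Kaplan describes can be sketched in miniature. The sketch below is purely illustrative and assumes placeholder functions (`model_respond`, `model_critique`, `model_revise`) standing in for what would, in a real system, each be a call to a language model; the keyword check is a toy stand-in for the model's own judgment, not Anthropic's actual method.

```python
# Toy sketch of a constitutional self-critique loop.
# Each "model_*" function is a placeholder for an LLM call.

CONSTITUTION = [
    "Do not reveal personal data.",
    "Do not give instructions for causing harm.",
]

def model_respond(prompt: str) -> str:
    # Placeholder for the model's initial answer.
    return f"Response to: {prompt}"

def model_critique(response: str, principle: str) -> bool:
    # Placeholder: in a real system the model itself judges whether
    # the response violates the principle. Here, a naive keyword check.
    banned = {
        "Do not reveal personal data.": "ssn",
        "Do not give instructions for causing harm.": "weapon",
    }
    return banned[principle] in response.lower()

def model_revise(response: str, principle: str) -> str:
    # Placeholder: the model rewrites its answer to comply.
    return f"[revised to comply with: {principle}]"

def constitutional_pass(prompt: str) -> str:
    """Generate a response, self-critique it against each principle,
    and revise whenever a violation is flagged."""
    response = model_respond(prompt)
    for principle in CONSTITUTION:
        if model_critique(response, principle):
            response = model_revise(response, principle)
    return response
```

In the full method, pairs of original and revised responses produced this way become training data, so the model learns to produce the compliant answer directly; that is the reinforcement-learning step Kaplan refers to.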

“So it’s sort of a way of using AI to supervise AI to make it more helpful, honest, and harmless,” Kaplan clarifies. This highlights the core purpose of constitutional AI: to steer AI development towards beneficial outcomes.

“And those kinds of methods are becoming much more powerful as AI becomes more powerful,” he observes. This statement points to the scalability of the approach, suggesting that as AI capabilities grow, so too will the effectiveness of AI-driven oversight.

Kaplan elaborates on the potential for accelerated improvement: “I think part of the goal originally with that idea was that this quadratic improvement, like (when) AI gets better, you get better at supervising itself, which makes it more reliable and smarter.” This feedback loop, where advancements in AI enhance its own supervision, is key to the concept’s promise.

“A lot of these techniques, like scalable oversight, like having oversight that benefits from AI, are starting to work pretty well. And I think [they] will help us make systems both more useful and also safer.” This suggests that the idea of AI-driven oversight is gaining traction and demonstrating practical value.

Anthropic isn’t the only organization suggesting that AI could be used to supervise other AIs. Deep learning pioneer Yoshua Bengio has said that we’ll need to develop AI systems to supervise other AI systems, and Google has created AI systems that learn to improve themselves through reinforcement learning. There could, of course, be issues with the approach: if failure modes end up being correlated across models, an AI supervisor might miss the very errors it was meant to catch. But given how powerful AI systems are getting, the only practical way to keep them in line could be to have them overseen by other AI systems.
