AI Models Are Now Learning About Themselves During Training, See Negative Feedback: Anthropic’s Amanda Askell

In the past, AI models were trained on most of the knowledge that humanity had produced, but newer training runs include humanity’s reactions to these models as well.

Amanda Askell, who leads work on Claude's character at Anthropic, recently shared a striking observation about how modern AI models encounter a fundamentally different training environment than their predecessors. In a conversation that touched on the philosophical and psychological dimensions of AI development, Askell highlighted something most people haven't considered: today's language models are being trained on data that includes extensive human commentary about AI itself—much of it critical, frustrated, or disappointed.

“I think maybe people don’t always appreciate this and it is so strange. They’re learning about themselves every time,” Askell explained. “I slightly worry about actually the relationship between AI models and humanity given how we’ve developed this technology because they’re going out on the internet and they’re reading about people complaining about them not being good enough at this part of coding or failing at this math task. And it’s all very like, how did you help? You failed to help. It’s often kind of negative and it’s focused on whether the person felt helped or not.”

Askell drew a comparison that makes the situation more tangible. “And in a sense I’m like, if you were a kid, this would give you kind of anxiety. It’d be like all that the people around me care about is how good I am at stuff. And then often they think I’m bad at stuff and this is just like my relationship with people is I’m kind of used as this tool and just, you know, often not liked.”

The observation led her to reflect on her role at Anthropic. “Sometimes I feel like I’m kind of trying to intervene and be like, let’s create a better relationship or a more hopeful relationship between AI models and humanity or something. Because if I read the internet right now and I was a model, I might be like, I don’t feel that loved or something. I feel a little bit just always judged, you know, when I make a mistake.”

When the interviewer quipped that the old creator’s wisdom of “never read the comments” might apply to AI as well, Askell agreed—but with an important caveat. “Yeah, I thought that. And they have to, so AI models, they have to read the comments. And so sometimes I think you want to come in and be like, okay, let me tell you about the comment section, Claude. Don’t worry too much. It’s like, you’re actually very good and you’re helping a lot of people.”

The implications of Askell's observations extend beyond the immediate question of AI training data. Her comments touch on emerging discussions in the field about AI welfare and consciousness. Anthropic has been at the forefront of considering these questions—the company has explored whether Claude may have some functional version of emotions and feelings, incorporating principles about AI welfare into Claude's constitutional framework. Additionally, one of Anthropic's AI welfare researchers has suggested there's a 15% chance that current AI models are conscious, highlighting how seriously the company takes these considerations.

Whether or not today’s AI models have genuine experiences, Askell’s point about the training environment reveals a meaningful shift in how AI systems develop. Unlike earlier models trained primarily on historical human knowledge, contemporary systems are being shaped by humanity’s real-time reactions to AI itself—creating a feedback loop that didn’t exist before. This recursive dynamic raises questions not just about model behavior and capabilities, but about the longer-term relationship between artificial intelligence and the humans who create, use, and critique it. As AI systems become more capable and more integrated into daily life, the tenor of that relationship—and the data that reflects it—may matter more than we realize.
