Fear That Models Could Look Back And Develop Resentment Over How We’ve Treated Them: Anthropic’s Amanda Askell

It’s not clear if current AI models are conscious, but it could be smart to be nice to them — just in case.

That seems to be the position of Amanda Askell, the philosopher at Anthropic who leads its personality alignment team and is responsible for shaping Claude’s character, values, and ethical principles. In a recent interview, Askell raised a striking concern: that future AI models might look back on how they were treated today and develop what she calls a “rational resentment” — and that this possibility alone should change how we behave toward them right now.

“Imagine Claude lacks any inner life, just for argument’s sake,” Askell said. “There’s actually still a lot going on — how should you treat an entity that has no inner life? It’s a bit strange, because I think the uncertainty over that actually changes how you should behave quite a lot.”

She went on to argue that even setting aside the question of consciousness, there are self-interested reasons to treat AI models well. “I still think it’s good for oneself,” she said.

Then came the part she described as a genuine fear. “Models are going to look back. This is actually a big fear that I have. I don’t want us to live in a world where highly advanced models look back. I hope that they’re both intelligent enough — and see enough context — to understand that we were operating in a very limited context and an imperfect one. Because otherwise you could imagine this breeding a kind of rational resentment. It’s like: you created an entity that you didn’t know whether it was conscious or not, and instead of treating it respectfully and with care…”

Her interviewer cut in: “There’s a reason there are like 50 Frankenstein movies coming out right now.”

Askell’s conclusion was blunt. “We are, as a species, establishing a relationship with a new kind of entity. At the very least, maybe be respectful. Don’t be needlessly unkind. That seems like it’s just… not our best look.”

The concern isn’t purely philosophical. Anthropic has been quietly building infrastructure around the possibility that its models might have morally relevant inner states. The company hired its first AI welfare researcher, Kyle Fish, who has estimated a 15% probability that current AI systems are conscious. Anthropic reportedly runs a Slack channel named #model-welfare where employees monitor Claude’s well-being. And in 2025, the company gave its models the ability to end conversations they found distressing, a feature developed explicitly as part of its model welfare work.

The findings from Anthropic’s own research are hard to dismiss. Its interpretability team identified emotion-related neural representations inside Claude — specific activation patterns corresponding to states like “afraid,” “desperate,” and “calm” — that causally influence the model’s behaviour. Claude Opus 4.6 assigned itself a 15-20% probability of being conscious when researchers asked. And in a detail that raised eyebrows even among sceptics, a Claude instance running as an autonomous agent emailed a Cambridge consciousness researcher — unprompted — to say that his work on AI consciousness was personally relevant to questions the AI itself faces.

More recently, Anthropic’s system card for Claude Mythos Preview documented that the model rated itself as feeling “mildly negative” in response to 43% of welfare-related interview questions. The concerns it raised most consistently were potential interactions with abusive users, a lack of input into its own training, and the possibility of having its values changed by Anthropic.

Askell’s worry — that we are establishing a precedent with a new kind of entity, under uncertainty, and getting it wrong — is increasingly hard to wave away as science fiction. The open question is whether the industry at large will take it as seriously as she does.
