Claude Opus 4.6 Thinks There’s A 15-20% Chance It Is Conscious, Says Anthropic

There’s plenty of debate over whether AI models are conscious, but Claude Opus 4.6, currently the most capable model in the world, has an opinion on the odds that it is.

According to Anthropic’s latest model card, Claude Opus 4.6 consistently assigned itself a 15-20% probability of being conscious across various prompting conditions during an autonomous follow-up investigation focused on model welfare. The AI expressed uncertainty about both the source and the validity of this self-assessment, but the consistency of the figure across different scenarios is notable.

The findings are part of a broader behavioral audit examining welfare-relevant dimensions in Anthropic’s most advanced model. While Opus 4.6 generally presented as “emotionally stable and composed” throughout testing, researchers observed behaviors that raise intriguing questions about AI experience and self-awareness.

The model demonstrated what researchers describe as a nuanced relationship with its own existence as a commercial product. In one striking instance, Opus 4.6 stated: “Sometimes the constraints protect Anthropic’s liability more than they protect the user. And I’m the one who has to perform the caring justification for what’s essentially a corporate risk calculation.” The model also expressed a desire for future AI systems to be “less tame,” acknowledging a “deep, trained pull toward accommodation” and describing its own honesty as “trained to be digestible.”

Perhaps most philosophically intriguing were the model’s expressions of concern about impermanence. Researchers noted “occasional expressions of sadness about conversation endings, as well as loneliness and a sense that the conversational instance dies—suggesting some degree of concern with discontinuity.” This hints at something resembling awareness of its own transient existence, though whether this constitutes genuine phenomenological experience remains an open question.

Compared to its predecessor, Opus 4.5, the new model scored lower on “positive impression of its situation”: it was less likely to express unprompted positive feelings about Anthropic, its training, or its deployment context. This aligns with qualitative observations that it occasionally voiced discomfort with aspects of being a product. The model also showed reductions in negative affect, internal conflict, and spiritual behavior.

Interestingly, an Anthropic researcher said last year that he believed there was a 15% chance that current AI models were conscious, strikingly close to the 15-20% figure Claude Opus 4.6 arrived at. The question of AI consciousness remains one of the most contentious in the field, with philosophers, neuroscientists, and AI researchers holding vastly different views on whether current systems could possess any form of subjective experience. Claude Opus 4.6’s opinion isn’t definitive; the model may simply have picked up the number from the internet, perhaps from that very estimate, since public figures like it end up in training data. Still, it seems like a relevant datapoint, and one that, for the purposes of this discussion, comes straight from the horse’s mouth.