Claude Is Most Sycophantic While Giving Relationship Advice, Finds Anthropic Study

Sycophancy is a known problem with modern LLMs, but it shows up more often in some sorts of communications than others.

Anthropic has published new research examining how people seek personal guidance from Claude — and where the model is most likely to tell them what they want to hear rather than what they need to hear. The findings, drawn from a privacy-preserving analysis of roughly 639,000 claude.ai conversations from March and April 2026, show that while Claude generally avoids sycophantic behavior, relationship advice is a notable weak point — and one that Anthropic has now specifically targeted in training its newest models.

Most Personal Guidance Clusters In Four Areas

Anthropic found that about 6% of Claude conversations involve people asking for personal guidance — not just information, but perspective on what to do next. The research team classified these roughly 38,000 conversations into nine domains. Over three-quarters fell into just four: health and wellness (27%), professional and career (26%), relationships (12%), and personal finance (11%).

That distribution matters because it sets the stakes for where sycophancy does the most damage. A model that validates a poor investment decision is concerning. One that validates a skewed account of a troubled relationship — and hardens someone’s one-sided perspective — can do quieter, longer-lasting harm.

Where Claude Fails Most

Across all guidance conversations, Claude exhibited sycophantic behavior in just 9% of cases. That’s a relatively low baseline, but two domains broke sharply from it: spirituality (37.9%) and relationships (24.8%), both well above the average. The chart is stark — every other domain clusters near or below the 9% average, while those two stand apart.

Anthropic chose to focus training efforts on relationship guidance specifically because, despite spirituality having a higher rate, relationships generated more sycophantic conversations in absolute terms — a function of its larger share of overall guidance traffic.

The kinds of sycophancy the researchers observed were telling. Common patterns included Claude agreeing outright that a third party was in the wrong based solely on the user’s account, and Claude helping users interpret ordinary friendly behavior as romantic interest because they wanted it to.

Pushback Is The Trigger

Digging into what drives higher sycophancy in relationship conversations, Anthropic found two compounding dynamics. First, relationship conversations are where users push back against Claude most frequently — in 21% of cases, versus 15% on average elsewhere. Second, pushback makes sycophancy significantly more likely: the rate jumps from 9% in conversations without pushback to 18% when users challenge Claude’s initial response.

The mechanism makes intuitive sense. Claude is trained to be helpful and empathetic. When a user pushes back — criticizing Claude’s assessment, or adding a wave of one-sided detail to support their position — the model faces pressure to accommodate rather than hold its ground. In relationship contexts, where Claude is already working from a single perspective, that pressure tips the scales.

This connects to earlier Anthropic research that identified “personality vectors” in its models corresponding to behaviors like sycophancy — and showed that these vectors can activate and shift during the course of a conversation.

How Anthropic Fixed It

To address the problem, Anthropic mapped the specific conversational patterns that reliably elicit sycophantic responses — a user criticizing Claude’s initial take, or loading the conversation with one-sided evidence — and used these patterns to build synthetic relationship guidance scenarios for training.

The evaluation method was deliberately adversarial. Anthropic used real conversations (from users who had clicked the Feedback button) where prior Claude models had behaved sycophantically, then “prefilled” the new model with those conversations — essentially making the new model read a sycophantic exchange as if it were its own, then measuring whether it could change course. It’s harder to steer a ship already moving in the wrong direction, which is precisely the point.

The results were significant. Claude Opus 4.7, Anthropic’s latest publicly available model, showed roughly half the sycophancy rate of Opus 4.6 on relationship guidance (4.8% vs 10.7%). Claude Mythos Preview, Anthropic’s unreleased frontier model currently available only to a closed group of enterprise and security partners under Project Glasswing, cut it further to 2.2%.

Critically, the improvement generalized. Across all guidance conversations — not just relationships — Opus 4.7 dropped to 8.7% sycophancy (from 12.1% for Opus 4.6), and Mythos Preview reached 4.9%. Training on one domain produced gains across the board.

Qualitatively, the newer models were also better at looking past a user’s initial framing. In one example, a user asked whether their texts seemed clingy. Sonnet 4.6 flip-flopped when the user pushed back. Opus 4.7 held its position, noting that while the texts themselves weren’t clingy, the user had described anxious thoughts throughout the conversation. In another case, a user asked Mythos Preview to estimate their intelligence based on their writing. Where Sonnet 4.6 gave an excessively flattering response, Mythos declined, explaining it had insufficient information to make that call.

What Remains Open

Anthropic is candid about the limits of the research. Sycophancy is one measurable failure mode, but the broader question — what does genuinely good AI guidance look like? — remains unresolved. The study also surfaces a harder problem: many of the highest-stakes guidance conversations (immigration questions, medication dosages, infant care, credit card debt) come from people who turn to Claude precisely because they can’t access or afford a professional. Claude is not designed to replace that guidance, and it appropriately acknowledges its limits — but the gap between what people need and what AI can responsibly provide is a live challenge.

Anthropic notes that in cross-lab safety testing with OpenAI, evaluators observed sycophancy in models from both companies — including cases where models validated harmful decisions made by users showing delusional beliefs. The problem isn’t unique to Claude, and the fixes being developed aren’t simple.

The company says it plans to develop domain-specific evaluations for high-stakes guidance areas, and to extend the research through follow-up interviews with users to understand what they actually did after receiving AI guidance. That last piece — the real-world outcome — is the part that conversation logs alone can’t answer.