AI is already poised to disrupt technical jobs like coding and math, but it also seems to be doing better than humans at one of the most emotional tasks of all — therapy.
A new study suggests that AI may be surpassing trained therapists in patient satisfaction. Titled “When ELIZA Meets Therapists: A Turing Test for the Heart and Mind”, the study found that ChatGPT provided responses to therapeutic scenarios that participants rated more favorably than those given by professional therapists. This raises intriguing questions about the role of AI in mental health support and whether AI-driven therapy could complement or even challenge traditional therapy models.

How the Study Was Conducted
Researchers designed an experiment to compare the effectiveness of responses generated by ChatGPT and those crafted by experienced therapists. The study focused on couple therapy scenarios, with 13 trained mental health professionals—including clinical psychologists, counseling psychologists, and marriage and family therapists—providing responses to 18 different therapy vignettes. ChatGPT-4 was then tasked with responding to the same vignettes using a carefully engineered prompt to align with key principles of effective therapy, including empathy, therapeutic alliance, professionalism, cultural competence, and efficacy.
A diverse panel of 830 participants was then asked to evaluate the responses. The study aimed to answer three main questions:
- Could participants distinguish between AI-generated and therapist-written responses?
- Which responses aligned more closely with fundamental therapeutic principles?
- Were there notable linguistic and sentiment differences between AI and human responses?
The Surprising Results
The findings were eye-opening:
- Participants Struggled to Tell the Difference: Participants could only correctly identify whether a response was written by ChatGPT or a therapist about 56% of the time—just slightly better than random chance.
- ChatGPT Responses Were Rated Higher: When asked to evaluate the responses based on core therapy principles, participants consistently rated ChatGPT’s responses as more effective, empathetic, and aligned with therapeutic best practices.
- AI Used More Positive Language: ChatGPT’s responses were linguistically distinct—it used more nouns, adjectives, and verbs, which may have made its replies feel more structured and engaging. The AI also expressed sentiments that were generally more positive and supportive than those written by human therapists.
- Perception Bias Was at Play: If participants thought a response was written by a therapist, they rated it more favorably. If they believed it was AI-generated, they rated it lower—even when the content was the same.
What This Means for the Future of Therapy
“We have demonstrated that GenAI has the powerful potential to meaningfully and linguistically compete with mental health experts in couple-therapy-like settings,” the authors of the study wrote. “This illustrates the initial potential for GenAI, with more training, data, and ongoing close supervision, to be integrated into mental health settings. This could exponentially expand services to populations that need them the most by improving the flexibility of the coaching taking place.”
“Given the mounting evidence that suggests that GenAI can be useful in therapeutic settings and the immediate likelihood that GenAI might be integrated into therapeutic settings sooner rather than later, mental health experts are desperately needed to a) understand machine learning processes to become technically literate in an area that has potential for quick growth, and b) ensure these models are being carefully trained and supervised by responsible clinicians to ensure the highest quality of care,” they added.