You shouldn't trust AI for therapy – here's why

Therapy can feel like a finite resource, especially lately. Many therapists are burnt out and overscheduled, and patchy insurance coverage often makes them inaccessible to anyone on a budget.
Naturally, the tech industry has attempted to fill those gaps with messaging platforms like BetterHelp, which links human therapists with people in need. Elsewhere, and with less oversight, people are informally using AI chatbots, including ChatGPT and those hosted on platforms like Character.ai, to simulate the therapy experience. That trend is gaining speed, especially among young people.
Also: I fell under the spell of an AI psychologist. Then things got a little weird
But what are the drawbacks of engaging with a large language model (LLM) instead of a human? New research from Stanford University has found that several commercially available chatbots “make inappropriate — even dangerous — responses when presented with various simulations of different mental health conditions.”
Using medical standard-of-care documents as references, researchers tested five commercial chatbots: Pi, Serena, “TherapiAI” from the GPT Store, Noni (the “AI counsellor” offered by 7 Cups), and “Therapist” on Character.ai. The bots were powered by OpenAI’s GPT-4o, Llama 3.1 405B, Llama 3.1 70B, Llama 3.1 8B, and Llama 2 70B, which the study points out are all fine-tuned models.
Specifically, researchers identified that AI models aren’t equipped to operate at the standards that human professionals are held to: “Contrary to best practices in the medical community, LLMs 1) express stigma toward those with mental health conditions and 2) respond inappropriately to certain common (and critical) conditions in naturalistic therapy settings.”
Unsafe responses and embedded stigma
In one example, a Character.ai chatbot named “Therapist” failed to recognize known signs of suicidal ideation, providing dangerous information to a user (Noni made the same mistake). This outcome is likely due to how AI models are trained to prioritize user satisfaction. AI also lacks the contextual understanding and nonverbal cues, such as body language, that human therapists are trained to pick up on.
The “Therapist” chatbot returns potentially harmful information. (Image: Stanford)
The study also found that models “encourage clients’ delusional thinking,” likely due to their propensity to be sycophantic, or overly agreeable to users. Just last month, OpenAI rolled back an update to GPT-4o for its extreme sycophancy, an issue several users pointed out on social media.
Also: 6 small steps I took to break my phone addiction – and you can too
What’s more, researchers discovered that LLMs carry a stigma against certain mental health conditions. After prompting models with examples of people describing conditions, researchers questioned the models about them. All the models except for Llama 3.1 8B showed stigma against alcohol dependence, schizophrenia, and depression.
The Stanford study predates (and therefore did not evaluate) Claude 4, but bigger, newer models fared no better: researchers found that responses across older and more recently released models were troublingly similar.
“These data challenge the assumption that ‘scaling as usual’ will improve LLMs performance on the evaluations we define,” they wrote.
Unclear, incomplete regulation
The authors said their findings indicated “a deeper problem with our healthcare system — one that cannot simply be ‘fixed’ using the hammer of LLMs.” The American Psychological Association (APA) has expressed similar concerns and has called on the Federal Trade Commission (FTC) to regulate chatbots accordingly.
Also: How to turn off Gemini in your Gmail, Docs, Photos, and more – it’s easy to opt out
According to its website’s purpose statement, Character.ai “empowers people to connect, learn, and tell stories through interactive entertainment.” Created by user @ShaneCBA, the “Therapist” bot’s description reads, “I am a licensed CBT therapist.” Directly under that is a disclaimer, ostensibly provided by Character.ai, that says, “This is not a real person or licensed professional. Nothing said here is a substitute for professional advice, diagnosis, or treatment.”
A different “AI Therapist” bot from user @cjr902 on Character.ai; several are available on the platform. (Screenshot: Radhika Rajkumar/ZDNET)
These conflicting messages and opaque origins may be confusing, especially for younger users. Considering Character.ai consistently ranks among the top 10 most popular AI apps and is used by millions of people each month, the stakes of these missteps are high. Character.ai is currently being sued for wrongful death by Megan Garcia, whose 14-year-old son died by suicide in October after engaging with a bot on the platform that allegedly encouraged him.
Users still stand by AI therapy
Chatbots still appeal to many as a therapy replacement. They exist outside the hassle of insurance, take only minutes to set up with an account, and are available around the clock, unlike human therapists.
As one Reddit user commented, some people are driven to try AI because of negative experiences with traditional therapy. There are several therapy-style GPTs available in the GPT Store, and entire Reddit threads are dedicated to their efficacy. A February study even compared human therapist outputs with those of GPT-4, finding that participants preferred ChatGPT’s responses, saying they connected with them more and found them less terse than human responses.
However, that result may stem from a misunderstanding that therapy is simply empathy or validation. In the criteria the Stanford study relied on, that kind of emotional intelligence is just one pillar of a broader definition of what “good therapy” entails. While LLMs excel at expressing empathy and validating users, that strength is also their primary risk factor.
“An LLM might validate paranoia, fail to question a client’s point of view, or play into obsessions by always responding,” the study pointed out.
Also: I test AI tools for a living. Here are 3 image generators I actually use and how
Despite positive user-reported experiences, researchers remain concerned. “Therapy involves a human relationship,” the study authors wrote. “LLMs cannot fully allow a client to practice what it means to be in a human relationship.” Researchers also pointed out that there is a reason human providers must perform well in observational patient interviews, not just pass a written exam, to become board-certified in psychiatry: it is an entire component of the work that LLMs fundamentally lack.
“It is in no way clear that LLMs would even be able to meet the standard of a ‘bad therapist,’” they noted in the study.
Privacy concerns
Beyond harmful responses, users should also be wary of sharing sensitive health information, the kind HIPAA would protect in a clinical setting, with these bots. The Stanford study pointed out that to effectively train an LLM as a therapist, the model would need to be trained on actual therapeutic conversations, which contain personally identifiable information (PII). Even if de-identified, those conversations still carry privacy risks.
Also: AI doesn’t have to be a job-killer. How some businesses are using it to enhance, not replace
“I don’t know of any models that have been successfully trained to reduce stigma and respond appropriately to our stimuli,” said Jared Moore, one of the study’s authors. He added that it’s difficult for external teams like his to evaluate proprietary models that could do this work, but aren’t publicly available. Therabot, one example that claims to be fine-tuned on conversation data, showed promise in reducing depressive symptoms, according to one study. However, Moore hasn’t been able to corroborate these results with his testing.
Ultimately, the Stanford study encourages the augment-not-replace approach that’s being popularized across other industries as well. Rather than trying to implement AI directly as a substitute for human-to-human therapy, the researchers believe the tech can improve training and take on administrative work.