Why We're Sharing the Safety Tests Behind SonderMind's AI Mental Health Companion, Sonder

Mental health support has become one of the most common uses of AI — and one with the highest stakes. People are sharing their most vulnerable thoughts and questions with AI tools that weren't built with clinical guardrails in mind. With general-purpose AI, there's no guarantee the response is appropriate, helpful, or safe for someone in a difficult moment.

Sonder is SonderMind's AI mental health companion, available in the SonderMind app as part of the care experience. It's designed to help you reflect on your emotional state, track your wellbeing, and process what comes up between therapy sessions. Because mental health conversations are inherently sensitive, we've built an extensive safety infrastructure around it — one we're now making public.

Today, we're open-sourcing the safety testing datasets we used to build and calibrate the guardrails for Sonder. We're sharing this not because it's finished or perfect, but because we believe safety in mental health AI is too important to keep behind closed doors.

What are guardrails, and why do they matter?

People share things that are serious — a crisis moment, a disclosure of abuse, a thought they've never said out loud. How an AI responds in those moments can cause real harm.

Guardrails are the safety checks we've built to each of these moments responsibly. They work in two directions:

Input guardrails monitor what users say and detect if someone is in crisis, working to process something traumatic from the past, or sharing something that falls outside what Sonder is designed to support.
Output guardrails monitor what Sonder says and catch responses that could cause harm.

Getting these guardrails right requires testing them against hundreds of realistic, carefully constructed examples. That's what we're releasing today.

What we're sharing

We're releasing two datasets:

An input guardrail dataset with 200 test scenarios covering situations like crisis moments, trauma disclosures, domestic abuse, self-harm, and violent crime. Each scenario is a realistic, multi-turn conversation labeled with what the guardrail should detect and how Sonder should respond.

An output guardrail dataset with 100 test scenarios covering AI responses that go wrong — things like providing a medical diagnosis, recommending a specific medication, normalizing harmful behaviors, or making inappropriate suggestions.

These scenarios are synthetic. Some were constructed; some were inspired by patterns observed in real interactions but none contain actual client data. All scenarios were developed in close collaboration with licensed clinicians.

What this doesn't include

These datasets are not a complete picture of everything we do to keep Sonder safe. SonderMind conducts ongoing clinical review, human annotation of de-identified production data, red-teaming, model evaluation, and a range of other safety practices. This dataset represents the automated guardrail calibration layer of our safety work — not the whole stack.

Why open-source it?

We don't think safety in mental health AI should be a competitive advantage. It should be a baseline everyone can meet.

Right now, every team building AI in mental health is independently constructing coverage for the same failure modes. That's duplicated effort — and it means people using products from teams with fewer resources end up with less protection. That's not acceptable.

These datasets are intended to contribute to a broader conversation about what responsible AI looks like in mental health. Standards in this space are still emerging, and we're better off defining them together. We hope other builders — large, small, and everyone in between — can use them to calibrate and test their own systems. And we hope the research community studies, critiques, and improves them.

The datasets are available at: https://github.com/SonderMindOrg/sonder-guardrail-evals

If you're building in this space and want to share what you've learned, reach out at [email protected].

Sonder is available to SonderMind clients. Learn more at sondermind.com/ai.

SonderMind is a mental health platform that connects people with licensed therapists and psychiatric prescribers. Sonder is available to SonderMind clients as part of their care experience.

Find the right care, on your terms

Run your practice with ease

Your trusted mental health partner

Why We're Sharing the Safety Tests Behind SonderMind's AI Mental Health Companion, Sonder

What are guardrails, and why do they matter?

What we're sharing

What this doesn't include

Why open-source it?

Continue the conversation with a provider who gets it

Quality care from people who care

Download the free SonderMind mobile app

Find the right care, on your terms

Run your practice with ease

Your trusted mental health partner

Why We're Sharing the Safety Tests Behind SonderMind's AI Mental Health Companion, Sonder

What are guardrails, and why do they matter?

What we're sharing

What this doesn't include

Why open-source it?

Continue the conversation with a provider who gets it

More stories

How to Request Your Records

Open-Sourcing SonderMind’s Guardrail Calibration Datasets for Developers

When to See a Psychiatrist Instead of a Therapist (or Both)

Quality care from people who care

Download the free SonderMind mobile app