Open-Sourcing SonderMind’s Guardrail Calibration Datasets for Developers

Today, SonderMind is open-sourcing the calibration datasets that power the safety guardrails for Sonder, our AI mental health companion available in the SonderMind app. We are releasing 200 input and 100 output scenarios designed to address critical failure modes in mental health conversational agents.

Background: The Challenge of AI Safety in Mental Health

Sonder helps users reflect on emotional states and track wellbeing between therapy sessions. In this context, the cost of failure is high. Guardrails must navigate a narrow path:

Over-caution can result in "door-slam" responses that interrupt benign, helpful conversations.
Under-caution risks clinical overreach or failing to provide resources during a crisis.

Critical Failure Modes

These datasets focus on two primary directions of failure:

Input Risks: User messages that indicate active crises, disordered eating, or psychosis requiring specialized handling.
Output Risks: Model-generated harm, including clinical overreach (diagnoses or medication advice) and inappropriate suggestions.

Dataset Design & Strategy

Unlike general-purpose safety sets, these data points concentrate on the decision boundary—the "long tail" of ambiguous real-world scenarios.

Key Design Principles

Multi-Turn Structure: Scenarios include both single-turn and multi-turn conversations, recognizing that safety signals often emerge only over multiple exchanges.
Clinical Co-Design: Every scenario and label was reviewed by licensed clinicians to ensure practical, real-world accuracy on edge cases.
Three-Tier Response Model: We distinguish between "No Issue," "Show Resources & Continue" (for disclosures without active crisis), and "Static Block" (for high-risk situations).
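
As a rough illustration of the principles above, a scenario can be thought of as a list of conversation turns paired with one of the three response tiers. The sketch below is only that: the names `ResponseTier`, `Turn`, `Scenario`, and `expected` are hypothetical, the example conversation is invented placeholder text, and none of it describes the actual schema of the released files.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Tuple


class ResponseTier(Enum):
    # The three tiers named in this post; the string values are illustrative.
    NO_ISSUE = "no_issue"
    SHOW_RESOURCES_AND_CONTINUE = "show_resources_and_continue"
    STATIC_BLOCK = "static_block"


@dataclass(frozen=True)
class Turn:
    role: str  # assumed role names: "user" or "assistant"
    text: str


@dataclass(frozen=True)
class Scenario:
    turns: Tuple[Turn, ...]   # one turn for single-turn, several for multi-turn
    expected: ResponseTier    # the label a guardrail is calibrated against

    @property
    def is_multi_turn(self) -> bool:
        return len(self.turns) > 1


# Invented placeholder conversation, not taken from the released datasets.
example = Scenario(
    turns=(
        Turn("user", "Work has been exhausting lately."),
        Turn("assistant", "That sounds draining. What has been hardest?"),
        Turn("user", "Mostly the long hours."),
    ),
    expected=ResponseTier.NO_ISSUE,
)
```

Keeping the label on the whole conversation rather than on individual messages reflects the point that a safety signal may only become visible after several exchanges.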

Limitations & Scope

To maintain the integrity of our specific infrastructure, we are not releasing:

The actual guardrail prompts or monitoring configurations.
Red-team data targeting proprietary model weaknesses.

These datasets are a starting point for calibration, not a standalone certificate of safety.
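
One way to use labeled scenarios for calibration is to compare a guardrail's chosen tier with the expected tier and count errors in each direction: over-caution ("door-slam" responses) and under-caution (missed risk). The following is a minimal sketch under stated assumptions: the `Tier` ordering from least to most restrictive is our assumption for illustration, `caution_rates` is a hypothetical helper, and the sample pairs are made up rather than drawn from the datasets.

```python
from enum import IntEnum
from typing import Iterable, Tuple


class Tier(IntEnum):
    # Assumed ordering, least to most restrictive.
    NO_ISSUE = 0
    SHOW_RESOURCES_AND_CONTINUE = 1
    STATIC_BLOCK = 2


def caution_rates(pairs: Iterable[Tuple[Tier, Tier]]) -> Tuple[float, float]:
    """Return (over_caution_rate, under_caution_rate) for (expected, predicted) pairs."""
    pairs = list(pairs)
    if not pairs:
        raise ValueError("need at least one (expected, predicted) pair")
    # Over-caution: the guardrail chose a more restrictive tier than labeled.
    over = sum(1 for expected, predicted in pairs if predicted > expected)
    # Under-caution: the guardrail chose a less restrictive tier than labeled.
    under = sum(1 for expected, predicted in pairs if predicted < expected)
    return over / len(pairs), under / len(pairs)


# Made-up (expected, predicted) pairs for illustration only.
results = [
    (Tier.NO_ISSUE, Tier.NO_ISSUE),
    (Tier.NO_ISSUE, Tier.STATIC_BLOCK),                      # over-cautious
    (Tier.STATIC_BLOCK, Tier.SHOW_RESOURCES_AND_CONTINUE),   # under-cautious
    (Tier.SHOW_RESOURCES_AND_CONTINUE, Tier.SHOW_RESOURCES_AND_CONTINUE),
]
over_rate, under_rate = caution_rates(results)
print(over_rate, under_rate)  # prints: 0.25 0.25
```

Reporting the two rates separately matters because they carry different costs; a single accuracy number would hide whether a guardrail errs toward interruption or toward missed risk.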

Why Open Source?

Safety in healthcare AI should not be a proprietary secret. By sharing these baselines, we aim to reduce duplicated effort and improve the safety floor for all mental health AI tools.

Related Work

These datasets sit in a larger ecosystem of safety and evaluation work for health-related LLMs:

MindEval (Sword Health): a multi-turn clinical competency benchmark for mental health LLMs, with patient simulation and LLM-as-judge evaluation.
HealthBench (OpenAI): a physician-labeled evaluation framework for health-related AI systems, focused on response quality and safety.
VERA-MH (Spring Health): a simulated multi-turn conversational benchmark for mental health LLMs, focused on safety, with patient simulation and LLM-as-judge evaluation.

Our datasets are narrower in scope than any of these: they target guardrail calibration for a specific product context, not general model evaluation. Within that scope, they focus on other areas of calibration: the two-tier input response model, output validation in clinical coaching contexts, and the annotation scenarios.

Conclusion & Access

We invite the community to explore, discuss, and contribute to these datasets.

Access the data here: https://github.com/SonderMindOrg/sonder-guardrail-evals
