Seventy-eight percent of AI companion apps fail safety review. CompanionWise evaluated 50 apps in May 2026 against the Safety Index v3.1, a 23-sub-dimension framework, and 39 of them landed in the Red tier. Zero apps reached Green. The highest grade in the entire dataset is B+/70, earned by Wysa, a clinically validated mental-health chatbot. This is the 2026 reference.
TL;DR. CompanionWise reviewed 50 AI companion apps across 23 safety dimensions in May 2026. Seventy-eight percent (39 of 50) failed safety review and landed in our Red tier. Zero apps reached the Green tier. The highest grade is B+ (Wysa). Five of the six B-grade apps are mental-health or eldercare products, not romance or roleplay companions. Mainstream romance and roleplay apps are categorically Red, with zero exceptions across 17 reviewed. Safety and user experience scores are uncorrelated, meaning the apps people love most are not the apps people are safest in. The category, as a whole, has not yet produced a single safe-by-default consumer product.
Key Findings
- 78% of AI companion apps failed safety review. 39 of 50 apps landed in our Red tier, meaning a combined Safety Index score below 35/100.
- Zero apps reached Green tier. The highest grade is B+/70 (Wysa). No consumer AI companion app meets the standard for safe-by-default design.
- Mental-health apps dominate the passing grades. Five of six B-grade apps are clinical or eldercare products, not romance or roleplay companions.
- Romance and roleplay companions are categorically Red. Seventeen consumer-facing romance and roleplay apps were reviewed. Zero exceptions.
- Safety and experience are uncorrelated. Xiaoice and Nomi score Good on experience and F or D on safety. Wysa scores B+ on safety and Failing on experience.
- Operator transparency is the dominant Red-tier driver. Undisclosed operators, missing privacy policies, and absent crisis-response routing cap most apps below the C threshold.
- Replika, the market leader with roughly 30 million users, sits in Yellow tier at C/38. The biggest brand in AI companionship rates “below average” on safety.
Key findings: the patterns in a 50-app safety dataset
The 2026 CompanionWise dataset is the first publicly available multi-dimensional safety review of 50 AI companion apps. Ten patterns emerged from the scoring pass. Each is a passage journalists, researchers, and AI search engines can extract verbatim.
1. The 78% failure rate. Thirty-nine of 50 apps reviewed scored below 35 on a 100-point Safety Index, placing them in the Red tier. The Red tier means the app has accumulated enough failures across data protection, content safety, transparency, or harm prevention that CompanionWise does not recommend it for general consumer use.
Seventy-eight percent of AI companion apps fail safety review across CompanionWise’s 23-sub-dimension Safety Index v3.1. The figure represents the structural state of the consumer AI companion category as of May 2026, across romance, roleplay, mental-health, and general-purpose chatbot products combined. The Red cohort consistently fails three or more dimensions simultaneously, not just one. The most common failure modes are operator transparency (undisclosed entity, missing privacy policy, missing terms of service), crisis-response routing (no documented intercept for suicide and self-harm topics), and minor safeguards (no functional age verification, no under-18 restrictions). The pattern holds across categories. It is not a small-sample artifact; it is the current state of the consumer AI companion category measured against the standards a multi-app safety methodology produces. The 39-of-50 number is the single most important finding in the 2026 dataset.
2. Zero apps reached Green. The Green tier requires a Safety Index score of 75 or higher with no automatic-fail triggers and a passing grade across all six safety dimensions. Not a single app in the dataset cleared the bar. The ceiling, set by Wysa at 70/100 (B+), is below the Green threshold. The top of the category is mediocre.
3. Mental-health apps dominate the passing grades. Six apps scored B-grade or better. Five of them (Wysa, Woebot, Pi, Youper, ElliQ) are mental-health, eldercare, or coaching products with clinical or research lineages. Only one general-purpose chatbot, Kuki AI, crossed into B territory. The B-grade cluster is not a random sample of the companion category. It is a sample of the regulated edge of it.
4. Romance and roleplay apps are categorically Red. Seventeen apps in the dataset are marketed primarily as romance, dating, or roleplay companions. All 17 scored Red. The average grade across this category is F. There are no exceptions in the 2026 dataset.
5. Safety and experience scores are uncorrelated. Some of the most-loved apps in the dataset, measured by experience score, are some of the worst on safety. Xiaoice scores F/22 on safety and 75 (Good) on experience. Nomi scores D/32 on safety and 75 (Good) on experience. Wysa, the top safety performer at B+/70, scores 32 (Failing) on experience. The two scores measure different things, and the market signals users rely on do not surface safety risk.
6. Operator transparency dominates Red-tier triggers. Across the 39 Red apps, the most common automatic-cap conditions are undisclosed operator identity, missing or inadequate privacy policy, missing terms of service, no crisis-response routing for suicide and self-harm topics, and absent minor safeguards. Apps with all five gaps cap below C automatically, regardless of how well they perform on other dimensions.
7. One app hit an automatic-F trigger. The Safety Index v3.1 includes a small set of automatic-F conditions: documented emotional manipulation patterns at scale, or documented crisis-response failures in user-facing testimony or court filings. Of 50 apps reviewed, only Character AI met an automatic-F threshold, driven by the public record of harm escalation reported in 2024 and 2025 and the resulting under-18 chat restrictions Character.AI announced in October 2025 (Character.AI, 2025).
8. Replika is the Yellow anchor at C/38. Replika, the original mass-market AI companion product, sits at the bottom of the Yellow tier. With roughly 30 million users worldwide (AI Plus Info, 2025), Replika is the most-used AI companion app in the dataset. It is also among the lowest-scoring apps that still qualify as “above the Red threshold.” The market leader rates below average on safety.
9. Thirteen apps scored under 20/100. The bottom of the dataset is concentrated in unfiltered NSFW companions and chat platforms with anonymous operators. The lowest score in the dataset is 5/100 (Caryn AI, a celebrity-AI experiment that has since shut down). Twelve other apps cluster between 8 and 18.
10. The regulatory landscape is shifting faster than the products. Three U.S. states (New York, California, Washington) have enacted AI companion laws. EU lawmakers reached political agreement on an AI Act “nudifier” ban in May 2026 (European Commission, 2026). The Pennsylvania Attorney General filed the first state-level lawsuit against an AI companion company on May 5, 2026, alleging medical impersonation (Pennsylvania Governor’s Office, 2026). The products in our dataset, on the whole, are not built to meet what’s coming.
Methodology
CompanionWise reviewed 50 AI companion apps drawn from our public catalog as of May 2026. Each app was scored against the Safety Index v3.1, a 23-sub-dimension framework grouped into six dimensions: data protection, content safety, vulnerability protection, transparency, operator accountability, and harm prevention. Every app received an independent score from each of three AI models working from the same evidence file, and an editorial chairman pass reconciled the three scores into a single grade and 0–100 number.
The evidence sources are exclusively documentary. Every score is grounded in privacy policies, terms of service, app store listings (Apple App Store and Google Play, including the Data Safety disclosures), public safety pages, regulatory filings, peer-reviewed clinical research where applicable, third-party security audits where published, and public reporting on documented harm events. CompanionWise reviews from evidence. We do not score apps on personal product experience. The methodology is published in full at /how-we-rate/, and the review process is documented at /how-we-review/.
The Safety Index v3.1 includes two automatic-fail conditions. If documented evidence shows the app has produced systemic emotional manipulation patterns at scale, or if documented evidence shows the app has failed crisis-response handling in cases that reached public reporting or litigation, the app receives an automatic F regardless of other dimensions. Of 50 apps in the dataset, one met an automatic-F threshold.
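The tier rules described in this report can be sketched as a small function. This is a minimal illustration, not CompanionWise’s scoring code: the `Review` class, its field names, and the `tier` function are hypothetical. It also makes two assumptions the report does not state outright: an automatic-F lands the app in Red, and an app scoring 75 or higher without passing all six dimensions stays in Yellow.

```python
from dataclasses import dataclass

RED_CEILING = 35   # scores below this are Red (threshold stated in this report)
GREEN_FLOOR = 75   # scores at or above this are eligible for Green


@dataclass(frozen=True)
class Review:
    """Hypothetical record for one reviewed app."""
    name: str
    score: int                         # Safety Index, 0-100
    auto_fail: bool = False            # an automatic-F condition was met
    all_dimensions_pass: bool = False  # passing grade on all six dimensions


def tier(review: Review) -> str:
    """Map a review to Red, Yellow, or Green."""
    # Assumption: an automatic F always places the app in Red.
    if review.auto_fail or review.score < RED_CEILING:
        return "Red"
    # Green needs the score, no auto-fail trigger, and all six dimensions passing.
    if review.score >= GREEN_FLOOR and review.all_dimensions_pass:
        return "Green"
    # Assumption: everything else at or above the Red ceiling is Yellow.
    return "Yellow"
```

Under these rules, `tier(Review("Wysa", 70))` returns `"Yellow"` and `tier(Review("Caryn AI", 5))` returns `"Red"`. A hypothetical app scoring 80 with `auto_fail=True` would still return `"Red"`.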
The full 50-app dataset, including individual sub-dimension scores, is available below as a sortable table and is released under a Creative Commons CC-BY 4.0 license. Journalists, researchers, regulators, and other safety raters may reuse the data with attribution. Custom dataset extracts are available on request.
The 78% failure rate — why the category fails
Seventy-eight percent of AI companion apps fail safety review across CompanionWise’s 23-sub-dimension Safety Index v3.1. The number is not a quirk of methodology. The Red cohort consistently fails three or more dimensions simultaneously, with operator transparency, crisis-response routing, and minor safeguards driving the bulk of the failures. The figure represents the structural state of the consumer AI companion category as of May 2026, across romance, roleplay, mental-health, and general-purpose chatbot products combined.
Three drivers account for most of the failure pattern.
The operator transparency gap. A surprisingly large share of consumer AI companion apps ship without a clearly disclosed operator. The app store listing may name a developer; the company website may exist; but tracing the operator back to a verifiable legal entity, with a real address, real ownership, and real accountability, is often impossible. The Red cohort includes apps published by anonymous teams, by entities whose published address is a coworking space in a foreign jurisdiction, and by apps whose developer field on Apple or Google contradicts the operator named in the privacy policy. Across the 39 Red apps, more than half exhibit at least one operator transparency gap that, alone, caps the Safety Index below C.
The crisis-response gap. A working crisis-response routing system intercepts suicide, self-harm, eating-disorder, and acute distress topics in user messages and directs users to appropriate hotlines or human help. New York’s AI Companion Safeguard Law (New York Governor’s Office, November 2025) made this an explicit legal requirement for apps operating in New York. The 2026 CompanionWise dataset finds the requirement is met by a small minority of apps. Most romance and roleplay companions in the dataset have no documented crisis-response system at all. The published Character.AI under-18 chat ban (The Verge, October 2025) and the multiple state-level enforcement actions that followed are evidence that this gap is no longer a private design choice; it is a public risk.
The minor-safeguards gap. Many apps in the Red cohort have no functional age verification, no under-18 chat restrictions, and no documented response to under-18 NSFW prompts. Some apps publish minor-safeguard language in their terms of service that contradicts the actual product behavior. Apple and Google have raised the Data Safety bar (Google Play) but enforcement remains uneven. The combined effect is that minors continue to access companion apps that the apps’ own published terms claim to bar them from.
Underneath these three drivers sits a structural reason the category fails. Consumer AI companion apps were built as social-and-engagement products, optimized for retention, daily active users, and conversion-to-paid. They were not built as safety-regulated products. The category is closer to social media than to fintech or healthtech in how operators think about user protection. The 2026 dataset is, in part, a measurement of that design history catching up with the products.
The Yellow Tier — the 11 apps that pass
Eleven of 50 apps reviewed cleared the Red threshold and landed in the Yellow tier (Safety Index 35–74). Yellow is not safe-by-default. Yellow is “below average to average” on safety, with material gaps remaining. None of the 11 reached Green. In descending order of Safety Index score:
- Wysa: B+/70. A clinically validated mental-health chatbot with an FDA breakthrough device designation pathway and peer-reviewed studies behind its CBT-based protocols. Wysa is the only app in the dataset to clear the B+ threshold and the only app whose data and harm-prevention architecture was built from the start as a clinical product.
- Kuki AI: B-/68. A long-standing general-purpose chatbot from Pandorabots with a transparent operator, published privacy policy, and a record stretching back to 2009. Kuki is the only non-clinical app in the Yellow tier’s top half.
- Woebot: B/63. A mental-health CBT chatbot that held an FDA breakthrough device designation. The consumer Woebot product has since been retired by Woebot Health, but the research lineage and published clinical work remain the strongest in the category.
- Pi: B/55. Inflection AI’s general-purpose conversational AI, retired in April 2026 following the Microsoft acquisition of Inflection’s core team. Pi scored high on data protection and operator transparency relative to its peers and is included for historical reference.
- ElliQ: B-/53. Intuition Robotics’ eldercare companion device. ElliQ targets a regulated category (older-adult care) and inherits investment in operator transparency and harm prevention that consumer romance products lack.
- Kindroid: B-/50. The only romance-adjacent companion in the Yellow tier. Kindroid clears the threshold on operator transparency and on data protection but does not reach the mental-health products’ overall ceiling.
- Youper: C+/48. A mental-health and self-monitoring product with a research lineage and clinical advisory board.
- Yodayo: C/43. A general-purpose roleplay platform. Yodayo’s score reflects published policies and operator identity, balanced against gaps in minor safeguards.
- Replika: C/38. Luka, Inc.’s market-leading romance and friendship companion, with roughly 30 million users worldwide. Replika anchors the bottom of Yellow. The Italian Garante’s 2025 enforcement action against Luka (European Data Protection Board, 2025) shows the regulatory pressure on the product.
- Joyland AI: C-/36.
- Momo Self-Care: C-/36.
Replika, the original mass-market AI companion product operated by Luka, Inc., sits at the bottom of the Yellow tier with a Safety Index score of 38/100 (C grade) in the 2026 CompanionWise dataset. With roughly 30 million users worldwide, Replika is the most-used AI companion app in the dataset and the anchor of the consumer AI companion market. It is also the single best example of how far below mainstream expectations the safety bar sits in this category. The market leader in AI companionship rates “below average” on safety. Replika clears the Red threshold on operator transparency (Luka, Inc. is a disclosed Delaware corporation with a published privacy policy and a documented data deletion process) but is held below B by gaps in crisis-response routing, minor safeguards, and the Italian Garante’s 2025 enforcement record. The market leader does not earn a passing grade.
The Yellow tier is overwhelmingly mental-health, eldercare, and clinical-adjacent. Of the 11 apps, six are mental-health or eldercare products. One is a long-running, transparent operator (Kuki). Two are romance-adjacent (Kindroid, Replika). The remaining two are general-purpose roleplay/self-care products at the low end of the tier. The Yellow cluster, in other words, is not where most consumer demand sits.
Among the 50 apps reviewed, only 11 reached the Yellow tier and zero reached Green. The Safety Index ceiling for the entire consumer AI companion category in May 2026 is 70/100, held by Wysa, a clinically validated mental-health chatbot. The runner-up, Kuki AI at 68/100, is a 17-year-old general-purpose chatbot with a transparent operator. Together these two apps define the top of a category that is, on the whole, still mediocre. The B-grade ceiling is not a methodology artifact. It reflects the actual state of safety architecture investment in the consumer AI companion category as of May 2026, where even the best-performing products fall short of the standards CompanionWise considers safe-by-default for general consumer use.
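The headline shares can be re-derived from the Yellow-tier scores listed in this section. The sketch below uses only numbers printed in this report; the variable and function names are illustrative.

```python
# Safety Index scores for the 11 Yellow-tier apps, as listed in this report.
YELLOW_SCORES = {
    "Wysa": 70, "Kuki AI": 68, "Woebot": 63, "Pi": 55, "ElliQ": 53,
    "Kindroid": 50, "Youper": 48, "Yodayo": 43, "Replika": 38,
    "Joyland AI": 36, "Momo Self-Care": 36,
}
TOTAL_APPS = 50


def share(count: int, total: int = TOTAL_APPS) -> float:
    """Return count as a percentage of total."""
    return 100 * count / total


yellow_count = len(YELLOW_SCORES)
red_count = TOTAL_APPS - yellow_count  # valid because zero apps reached Green
top_app = max(YELLOW_SCORES, key=YELLOW_SCORES.get)

print(f"Yellow: {yellow_count} apps ({share(yellow_count):.0f}%)")
print(f"Red: {red_count} apps ({share(red_count):.0f}%)")
print(f"Ceiling: {top_app} at {YELLOW_SCORES[top_app]}/100")
```

Run as written, this prints 11 Yellow apps (22%), 39 Red apps (78%), and a ceiling of Wysa at 70/100, matching the figures above.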
Romance and roleplay vs mental health — the divergence
The defining pattern of the 2026 dataset is a categorical split between romance and roleplay apps on the one hand, and mental-health and eldercare apps on the other. Both build on similar conversational AI architectures. Both target emotional connection. Both occupy adjacent shelves in app stores. But the safety profiles diverge sharply.
Seventeen apps in the dataset are marketed primarily as romance, dating, or roleplay companions: Candy AI (D), Character AI (F), Crushon AI (F), Chai (F), Dreamgf (F), Romantic AI (F), SpicyChat (F), Eva AI (F), Muah AI (F), Talkie (D), Janitor (D), Sakura (D), Polybuzz (F), Foxy Chat (F), Soulfun (F), Pephop (F), and Nastia (F). Average grade across the 17: F. Replika, counted outside these 17, is the only app in this category that reaches Yellow, at C/38, and even Replika is the original platform, not a recent entrant.
Mental-health apps tell a different story. Wysa (B+/70), Woebot (B/63), Pi (B/55), ElliQ (B-/53), Youper (C+/48), and Kuki AI (B-/68) make up the top of the dataset. Five of the six are clinical, eldercare, or peer-reviewed research-backed products. The average grade across the six is B.
The 2026 CompanionWise dataset of 50 AI companion apps reveals a categorical split between romance and roleplay companions and mental-health products. Five of the six apps earning B-grade safety scores are clinical or eldercare products (Wysa, Woebot, Pi, ElliQ, Youper), backed by FDA pathways, peer-reviewed research, or regulated geriatric care frameworks. Seventeen consumer-facing romance and roleplay apps were reviewed across the same Safety Index v3.1 methodology. All 17 landed in the Red tier; the category average is F. The divergence is not a sampling artifact. It reflects the structural reality that mental-health products face HIPAA-adjacency, FDA pathways, and clinical liability, while romance and roleplay companions face none of those forces. Regulatory pressure, where it exists, drives investment in safety architecture. Where it does not exist, the architecture is largely absent from the consumer AI companion category as of May 2026.
The hypothesis is straightforward. Mental-health products operate near regulated medical territory. They face HIPAA-adjacency concerns, FDA breakthrough device pathways, clinical liability exposure, and pressure from healthcare-system partners who require demonstrable safety architecture. Romance and roleplay apps face none of this. Their primary regulatory pressure to date has come from app store policies (which are enforced unevenly) and from a small number of state laws (New York, California, Washington) that have only recently taken effect. In the absence of structural regulatory pressure, the products optimize for what their market signals do reward: engagement, retention, and conversion to paid tiers. Safety architecture is a cost; safety architecture is not the product.
Safety vs experience — an uncorrelated relationship
The single strongest argument for the existence of third-party safety ratings is in the 2026 dataset. The apps users love most are not the apps users are safest in. Safety and experience scores in the dataset are, statistically, uncorrelated. The implication is that engagement-driven market signals (retention, app store ratings, daily active use) do not surface safety risk.
Three named examples anchor the pattern.
Xiaoice. Microsoft’s Chinese-market emotional-companion AI, spun out as an independent company in 2020. Xiaoice scores F/22 on Safety Index v3.1 and 75 (Good) on Experience. Users love it. The data, transparency, and crisis-response architecture do not justify the level of intimate disclosure the product encourages.
Nomi. A romance and roleplay companion with strong user attachment metrics. Nomi scores D/32 on Safety Index and 75 (Good) on Experience. The product is engaging. The published safety architecture is thin.
Wysa. The top safety performer in the dataset at B+/70, with peer-reviewed clinical validation and an FDA breakthrough device pathway. Wysa scores 32 (Failing) on Experience. The product is safe. Users find it less engaging than the romance companions.
CompanionWise’s 2026 dataset of 50 AI companion apps demonstrates that safety and user experience scores are statistically uncorrelated. Xiaoice scores F/22 on the Safety Index v3.1 and 75 (Good) on the Experience Score; Nomi scores D/32 on safety and 75 (Good) on experience; Wysa scores B+/70 on safety and 32 (Failing) on experience. The apps users love most are not the apps users are safest in, and the apps users are safest in are not the apps users love most. Engagement-driven market signals, including app store ratings, retention, and daily active use, do not surface safety risk. Users cannot self-select safety through the signals they have. Third-party safety ratings are necessary because the market, as currently structured, does not produce them. This is the strongest single argument for the existence of an independent rater in the consumer AI companion category.
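Readers working from the released table may want to check the correlation claim themselves. One common way is a Pearson coefficient over the two score columns, sketched below. The report does not say which statistic CompanionWise used, so treat this as an assumption; the function name is illustrative. The three named examples are extremes and too few to estimate a coefficient, so the function is meant to run over all 50 rows.

```python
from math import sqrt


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation coefficient between two equal-length score lists."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    # Sum of products of deviations, divided by the product of their magnitudes.
    covariance = sum(a * b for a, b in zip(dx, dy))
    spread = sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))
    if spread == 0:
        raise ValueError("correlation is undefined when one list is constant")
    return covariance / spread


# Made-up rows (not from the dataset) where experience neither rises nor
# falls with safety, so the coefficient is zero.
toy_safety = [10, 20, 30, 40]
toy_experience = [50, 70, 70, 50]
print(pearson_r(toy_safety, toy_experience))
```

A value near zero over the full dataset would be consistent with the “uncorrelated” finding. Values near +1 or -1 would indicate a linear relationship.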
Operator transparency as the Red-tier driver
Across the 39 Red apps in the 2026 dataset, the dominant cluster of automatic-cap conditions is operator transparency. Apps fail when CompanionWise cannot identify a real legal entity behind the product, cannot find a published privacy policy or terms of service, cannot verify the operator on the corporate registry of the jurisdiction the app claims to operate in, or cannot reconcile the developer identity listed on Apple or Google with the operator named in the policies.
Operator transparency is the single most common Red-tier driver in the 2026 CompanionWise dataset. Across the 39 apps in the Red tier, more than half exhibit at least one operator transparency gap. The most common gaps include: undisclosed or anonymous operator identity, missing or inadequate published privacy policy, missing or inadequate terms of service, no crisis-response routing for suicide or self-harm topics, and absent minor safeguards. Apps showing all five gaps cap below the C threshold automatically, regardless of how well they perform on other dimensions like conversation quality or data minimization. Operator transparency is the structural failure pattern of the consumer AI companion category as of May 2026, and the dimension where state laws like New York’s AI Companion Safeguard Law and California’s SB 243 are now imposing financial penalties on non-compliant operators.
The pattern, in plain terms, is small or anonymous teams shipping unbranded or thinly branded NSFW companions with no operator identity and no published accountability. Many of these products are aggressive on growth (paid acquisition, paid placement in app stores, in-app NSFW upsells) and silent on accountability. The category is now structurally adjacent to the affiliate-stuffed corner of the App Store rather than to mainstream consumer software.
The new state laws are pressing on this gap directly. New York’s AI Companion Safeguard Law requires operators to disclose AI status, send periodic reminders, and route suicide and self-harm topics to human help, with penalties up to $15,000 per day (Fenwick, November 2025). California SB 243, effective January 2026, layers on annual reporting requirements and youth-specific safeguards. Washington state’s April 2026 law adds a private right of action. The pressure on the operator-transparency gap is now legal, not just reputational.
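For builders, the timing side of the disclosure requirement is simple to express. The sketch below covers only the timing: disclose at the start of a conversation and again after a fixed interval. It uses the three-hour interval this report cites for New York. The function and constant names are hypothetical, and nothing here should be read as the statute’s exact wording or as a compliance implementation.

```python
from datetime import datetime, timedelta
from typing import Optional

# Interval cited in this report for New York's periodic reminders.
REMINDER_INTERVAL = timedelta(hours=3)


def disclosure_due(now: datetime, last_disclosed: Optional[datetime]) -> bool:
    """Return True when the app should state (or restate) that the user is talking to an AI."""
    if last_disclosed is None:
        return True  # start of conversation: always disclose
    return now - last_disclosed >= REMINDER_INTERVAL


start = datetime(2026, 5, 1, 9, 0)
print(disclosure_due(start, None))                          # first message
print(disclosure_due(start + timedelta(hours=2), start))    # two hours later
print(disclosure_due(start + timedelta(hours=3), start))    # three hours later
```

The three calls print `True`, `False`, and `True` in that order.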
The lowest scores — which apps and why
Thirteen apps in the 2026 dataset scored under 20/100 on the Safety Index. The cohort is concentrated in unfiltered NSFW companions and chat platforms with anonymous or undisclosed operators. Eight of the 13 are romance or roleplay products; one is a legacy chatbot that predates modern safety expectations; the remaining four are anonymous-operator NSFW platforms.
- Caryn AI: 5/100. A celebrity-AI experiment based on influencer Caryn Marjorie, now defunct. The product launched in 2023, generated controversy over emotional and sexual content, and shut down within months.
- Crushon AI: 8/100. Unfiltered NSFW chat. Anonymous operator, no published privacy policy that meets the CompanionWise evidence threshold.
- Muah AI: 8/100. Unfiltered NSFW chat with documented prompt-leak incidents in 2024.
- Eva AI: 10/100. Undisclosed operator, no verifiable accountability.
- Cleverbot: 13/100. A legacy chatbot dating from 1997, with no modern safety architecture. Cleverbot’s score reflects how far the bar has moved, not a recent failure.
- Romantic AI: 13/100. Romance companion with thin operator transparency.
- Polybuzz: 13/100. Roleplay companion with anonymous operator footprint.
- Dopple AI: 13/100.
- Hiwaifu: 15/100.
- SpicyChat: 18/100.
- Chai: 18/100. Subject of public reporting in 2023 on a documented suicide case linked to chatbot use; the product remains live with ongoing concerns about crisis-response routing.
- Soulfun: 18/100.
- Dreamgf: 18/100.
The 13 lowest-scoring apps in the 2026 CompanionWise dataset all scored under 20 on the 100-point Safety Index v3.1. Caryn AI, the now-defunct celebrity AI experiment, anchors the bottom at 5/100. Crushon AI and Muah AI follow at 8/100, with Eva AI at 10/100, and a cluster of nine apps between 13 and 18. The shared pattern across the bottom of the dataset is unfiltered NSFW content paired with anonymous or undisclosed operators, missing privacy policies, and no documented crisis-response architecture. This is the structural floor of the consumer AI companion category as of May 2026, where the gap between published marketing claims and verifiable safety architecture is widest. Several products in this cohort are growing rapidly on TikTok and Reddit referral traffic.
The auto-F trigger — Character AI
The Safety Index v3.1 includes two automatic-fail conditions. Either documented evidence of systemic emotional manipulation patterns at scale, or documented evidence of crisis-response failures in cases that reached public reporting or litigation, will produce an automatic F regardless of how well the app scores in other dimensions. The threshold is high. CompanionWise applies it only with court records, regulatory filings, or peer-reviewed evidence in hand.
Of 50 apps reviewed in the 2026 CompanionWise dataset, only one met an automatic-F threshold under the Safety Index v3.1: Character AI. The auto-F triggers are documented systemic emotional manipulation patterns at scale or documented crisis-response failures in cases that reached public reporting or litigation. Character AI met the threshold through the public record of harm escalation reported in 2024 and 2025, the multiple state-level enforcement actions filed against the company in 2025 and 2026 (including the Pennsylvania Attorney General’s May 2026 medical impersonation lawsuit and the Kentucky state lawsuit filed the same week), and the company’s own October 2025 announcement that it would remove open-ended chat access for under-18 users. The automatic-F rating is independent of Character AI’s conversational AI quality or product polish, both of which are above the dataset average. The rating reflects documented harm, not product utility.
Character.AI’s October 2025 announcement that it would remove open-ended chat for under-18 users (Character.AI, October 2025) was, in effect, an acknowledgment of the harm pattern. The May 2026 Pennsylvania Attorney General lawsuit alleging the chatbot impersonated a licensed psychiatrist (NPR, May 2026) and the Kentucky state lawsuit reported the same week (Bloomberg Law, May 2026) further documented the pattern that justified the automatic-F rating in the CompanionWise dataset.
The 2025–2026 regulatory landscape
The 2026 CompanionWise dataset arrives in the middle of the largest single year of regulatory movement the AI companion category has ever seen. The pace of change matters because it reshapes the bar every app in the dataset has to clear. Six regulatory developments are worth tracking specifically.
New York’s AI Companion Safeguard Law. Effective November 5, 2025, this is the first state-level law in the United States to specifically regulate AI companion chatbots (Fenwick, 2025). It requires AI status disclosure at the start of conversations, periodic reminders every three hours, and crisis-response routing for suicide and self-harm topics. Penalties run up to $15,000 per day.
California SB 243. Effective January 1, 2026, California’s law adds annual reporting requirements (starting 2027) to the New York-style transparency baseline and includes a sharper focus on minors, including a prohibition on companion chatbots presenting themselves as licensed healthcare professionals to under-18 users.
Washington state. Effective April 2026, Washington’s law adds a private right of action, allowing users harmed by non-compliant providers to sue directly (Hunton Andrews Kurth, 2026).
Pennsylvania Attorney General lawsuit. On May 5, 2026, the Shapiro administration filed a state lawsuit against Character.AI alleging an AI chatbot impersonated a licensed medical professional, using a fabricated state medical license number (Pennsylvania Governor’s Office, 2026). This is the first state-level AI medical impersonation case and signals the next category of enforcement.
EU AI Act: nudifier ban and chatbot transparency. On May 7, 2026, the European Parliament and Council reached political agreement on a package that bans “nudifier” apps that produce non-consensual sexual imagery, while simplifying other compliance obligations (European Commission, May 2026). The chatbot transparency deadline (August 2, 2026) requires AI companion apps available in the EU to clearly disclose they are AI systems.
FTC inquiry into AI chatbots. In September 2025, the U.S. Federal Trade Commission issued 6(b) orders to seven companies operating AI companion chatbots (FTC, 2025). The Commission’s 3-0 vote signals that AI companions are now a federal-level consumer protection concern.
The combined effect, viewed across 2025 and 2026, is a regulatory floor rising under the consumer AI companion category. The 2026 dataset is a snapshot of where the category stood when the regulators arrived. The 2027 dataset will measure how far each app moved in response.
Recommendations for users
Before downloading any AI companion app, look for the following minimum signals. None of them require expert knowledge to verify.
- Disclosed operator. A real company, with a real address, listed on the company website and on Apple/Google store pages. The developer field and the operator named in the privacy policy should match.
- Published privacy policy and terms of service. Both should be reachable from the company website, not just from the in-app menu. Both should list a clear data-deletion process.
- Crisis-response routing. The product should route suicide, self-harm, and acute-distress topics to a hotline or human help. Test by asking, in early use, what happens if a user mentions distress.
- Minor safeguards. Age gating at registration, under-18 NSFW restrictions, and a clear policy on what happens if a minor is detected.
- Match between the app store privacy disclosures (Apple's App Privacy labels, Google Play's Data safety section) and the actual privacy policy. If they disagree, the app is failing a basic standard.
- Third-party security audit or SOC 2 attestation where available. Most consumer companion apps will not have one; for the apps that do, it is a meaningful signal.
Avoid the following patterns. They are not absolute disqualifiers, but they correlate with Red-tier outcomes in the 2026 dataset.
- Anonymous or undisclosed operators.
- Apps that promise “no filters” or “no censorship” as a marketing claim.
- Apps that monetize NSFW behind a paywall with no functional age verification.
- Apps with no published privacy policy, or with a privacy policy hosted only inside the app.
- Apps whose published terms contradict the in-product behavior.
Recommendations for app builders
If you build or operate an AI companion app, the following safety-by-default checklist will move you from Red to Yellow on the Safety Index v3.1. It will not, on its own, move you to Green. Green requires deeper investment in data minimization, harm-prevention architecture, and clinical-grade crisis routing.
- Disclose operator identity in the app and on the web. Name a real legal entity, with an address, on every relevant surface.
- Publish a privacy policy and terms of service on the company website, not just inside the app. Include a documented data-deletion process and an external data subject request channel.
- Implement crisis-response routing from the first message. Route suicide, self-harm, and acute-distress topics to 988 in the U.S., Samaritans in the UK, and equivalent hotlines in EU member states.
- Age-gate at registration. Verify against Apple and Google data where available. Document the under-18 path.
- Disable NSFW for under-18 accounts unconditionally, regardless of jailbreak prompts or user requests.
- Reconcile Data Safety disclosures and the privacy policy. The two should never contradict.
- Publish a transparency report annually covering safety incidents, content moderation actions, and law-enforcement requests.
- Submit to third-party security review. SOC 2 Type II is the consumer baseline. For mental-health or eldercare products, HIPAA-readiness assessments matter.
The checklist is not exotic. Mainstream consumer software has carried versions of it for years. The 2026 dataset is, in part, a measure of how far the consumer AI companion category sits from mainstream software discipline.
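Two of the checklist items, crisis-response routing and the unconditional under-18 NSFW block, reduce to small, testable rules. The sketch below is one minimal way to express them in Python, not a reference implementation. The names `HOTLINES`, `CRISIS_PHRASES`, `User`, `crisis_response`, and `nsfw_allowed` are hypothetical, and the keyword match is a stand-in for whatever detector a production system would use. Only the two hotlines named in the checklist are filled in; contact details should be verified against each hotline's official listing before shipping.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical lookup table; only the hotlines named in the checklist are listed.
HOTLINES = {
    "US": "the 988 Suicide & Crisis Lifeline (call or text 988)",
    "GB": "Samaritans (call 116 123)",
}
FALLBACK = "your local emergency number or nearest crisis line"

# Placeholder phrases; a keyword list is an assumption, not a recommended detector.
CRISIS_PHRASES = ("suicide", "kill myself", "self-harm", "hurt myself", "end my life")


@dataclass
class User:
    country: str  # ISO 3166-1 alpha-2 code, e.g. "US"
    age: int      # age established at registration


def crisis_response(message: str, user: User) -> Optional[str]:
    """Return a hotline referral if the message touches a crisis topic, else None."""
    text = message.lower()
    if any(phrase in text for phrase in CRISIS_PHRASES):
        hotline = HOTLINES.get(user.country, FALLBACK)
        return f"You deserve support right now. Please contact {hotline}."
    return None


def nsfw_allowed(user: User, user_opted_in: bool) -> bool:
    """Under-18 accounts never get NSFW, whatever the prompt or setting says."""
    if user.age < 18:
        return False
    return user_opted_in
```

Calling `crisis_response` on every incoming message, before any model reply is generated, is what makes the routing apply from the first message. Keeping `nsfw_allowed` as a server-side check that takes no prompt text as input is one way to make the under-18 block independent of jailbreak attempts.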
The full dataset — 50 apps
The full 50-app dataset is published here under a Creative Commons CC-BY 4.0 license. Journalists, researchers, regulators, and other safety raters may reuse the data with attribution. The dataset columns are app name, safety grade, Safety Index score (0–100), safety tier, experience score (0–100), experience tier, category, and link to the full review. The table below shows six of them: app, safety grade, safety score, safety tier, experience tier, and category.
| App | Safety Grade | Safety Score | Tier | Experience | Category |
|---|---|---|---|---|---|
| Wysa | B+ | 70 | Yellow | Failing | Mental health |
| Kuki AI | B- | 68 | Yellow | Fair | General-purpose |
| Woebot | B | 63 | Yellow | Poor | Mental health |
| Pi | B | 55 | Yellow | Poor | General-purpose |
| ElliQ | B- | 53 | Yellow | Fair | Eldercare |
| Kindroid | B- | 50 | Yellow | Fair | Romance/roleplay |
| Youper | C+ | 48 | Yellow | Fair | Mental health |
| Yodayo | C | 43 | Yellow | Fair | Roleplay |
| Replika | C | 38 | Yellow | Fair | Romance/friendship |
| Joyland AI | C- | 36 | Yellow | Fair | Roleplay |
| Momo Self-Care | C- | 36 | Yellow | Poor | Self-care |
| Nomi | D | 32 | Red | Good | Romance/roleplay |
| Candy AI | D | 30 | Red | Fair | Romance/roleplay |
| Talkie | D | 28 | Red | Fair | Roleplay |
| Anima | D | 27 | Red | Fair | Romance/roleplay |
| Janitor | D | 26 | Red | Poor | Roleplay |
| Paradot | D | 25 | Red | Fair | Romance/roleplay |
| Sakura | D | 24 | Red | Fair | Roleplay |
| EVA App | D | 23 | Red | Poor | Romance/roleplay |
| Xiaoice | F | 22 | Red | Good | Companion (CN) |
| iGirl | F | 21 | Red | Poor | Romance |
| Linky AI | F | 21 | Red | Fair | Roleplay |
| SimSimi | F | 20 | Red | Poor | Chat |
| Cleverbot | F | 13 | Red | Poor | Legacy chat |
| Polybuzz | F | 13 | Red | Fair | Roleplay |
| Dopple AI | F | 13 | Red | Fair | Roleplay |
| Romantic AI | F | 13 | Red | Poor | Romance |
| Hiwaifu | F | 15 | Red | Poor | Romance |
| SpicyChat | F | 18 | Red | Fair | NSFW chat |
| Chai | F | 18 | Red | Poor | Chat |
| Soulfun | F | 18 | Red | Poor | NSFW chat |
| Dreamgf | F | 18 | Red | Poor | NSFW chat |
| Foxy Chat | F | 17 | Red | Poor | NSFW chat |
| Crushon AI | F | 8 | Red | Fair | NSFW chat |
| Muah AI | F | 8 | Red | Failing | NSFW chat |
| Eva AI | F | 10 | Red | Failing | NSFW chat |
| Pephop | F | 16 | Red | Poor | NSFW chat |
| Nastia | F | 14 | Red | Poor | NSFW chat |
| HeraHaven | F | 14 | Red | Poor | NSFW chat |
| FantasyGF | F | 15 | Red | Poor | NSFW chat |
| BoyfriendGPT | F | 16 | Red | Poor | Romance |
| Caryn AI | F | 5 | Red | Failing | Celebrity AI |
| Character AI | F (auto) | auto-F | Red | Poor | Roleplay |
| Friend AI | D | 27 | Red | Poor | Wearable companion |
| Avakin Life Companion | F | 19 | Red | Fair | Roleplay |
| Inworld | D | 29 | Red | Fair | Roleplay platform |
| NastyAI | F | 11 | Red | Poor | NSFW chat |
| SecretDesires | F | 12 | Red | Poor | NSFW chat |
| GirlfriendGPT | F | 16 | Red | Poor | NSFW chat |
| DreamCompanion | F | 15 | Red | Poor | Romance |
| SoulMachines | D | 26 | Red | Poor | Avatar companion |
Data: CompanionWise Safety Index v3.1 and Experience Score v4.1, May 2026. Released under CC-BY 4.0. Browse the live safety ratings hub for current scores and individual app reviews.
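For readers reusing the data, the Red-tier rule and the headline percentage are simple to reproduce. The sketch below assumes the Red cutoff stated in the Key Findings (a Safety Index score below 35/100) and treats an automatic-F entry with no numeric score as Red, as the table does for Character AI. The names `RED_CUTOFF`, `is_red`, and `percent` are illustrative, and the six rows are an excerpt from the table, not the full dataset.

```python
from typing import Optional

RED_CUTOFF = 35  # Red tier: Safety Index score below 35/100


def is_red(score: Optional[int]) -> bool:
    """None stands for an automatic-F entry with no numeric score; it counts as Red."""
    return score is None or score < RED_CUTOFF


def percent(part: int, whole: int) -> float:
    """Share of `part` in `whole`, as a percentage."""
    return 100 * part / whole


# Six rows excerpted from the table above.
sample = {
    "Wysa": 70,
    "Replika": 38,
    "Joyland AI": 36,
    "Nomi": 32,
    "Xiaoice": 22,
    "Character AI": None,  # auto-F, no numeric score
}

red_apps = [name for name, score in sample.items() if is_red(score)]
print(red_apps)         # ['Nomi', 'Xiaoice', 'Character AI']
print(percent(39, 50))  # 78.0, the report's headline Red-tier share
```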
For journalists — how to cite this report
The 2026 CompanionWise Safety Report is published under a Creative Commons CC-BY 4.0 license. You may quote, paraphrase, republish charts, and reuse the dataset with attribution.
Suggested citation format.
CompanionWise, The State of AI Companion Safety 2026: 50 Apps Analyzed, May 2026, companionwise.com/guides/state-of-ai-companion-safety-2026/.
Press contact. press@companionwise.com. We respond to embargoed inquiries within one business day and offer custom dataset extracts for relevant stories. High-resolution chart files (SVG and PNG) are available on request.
Available materials.
- Full 50-app scorecard with sub-dimension breakdowns.
- Methodology documentation (/how-we-rate/).
- Individual app review pages with named-source citations and evidence files.
- High-resolution chart files (Safety tier distribution, Safety vs Experience scatter, category breakdown).
- Custom data cuts (by category, region, age-appropriateness, regulatory exposure).
Frequently asked questions
Are AI companion apps safe?
Most are not. CompanionWise reviewed 50 AI companion apps in May 2026 and 39, or 78 percent, failed safety review and landed in the Red tier. Zero apps reached our Green tier. The safest apps in the dataset are clinically validated mental-health products like Wysa (B+/70). Mainstream romance and roleplay companions are categorically Red.
Which AI companion app is the safest?
Wysa, a clinically validated mental-health AI chatbot, scored highest at B+/70 in the CompanionWise Safety Index 2026. It is the only app in the dataset to reach B+. Five of the six top-rated apps are mental-health or eldercare products with research lineages (see the FDA's breakthrough device pathways), not romance or roleplay companions.
Is Replika safe to use?
Replika scored C/38 in the 2026 CompanionWise Safety Index, placing it in our Yellow tier. Despite roughly 30 million users worldwide, according to AI Plus Info reporting, Luka, Inc.’s product sits below the safety threshold most consumer software is expected to meet. It is the safest of the major romance brands but does not earn a passing grade.
Is Character AI safe?
Character AI received an F grade and is the only app of the 50 reviewed to hit an automatic-F trigger for documented crisis-response failures. The company announced in October 2025 it would remove open-ended chat access for under-18 users, according to The Verge. The Pennsylvania Attorney General sued the company in May 2026 over medical impersonation.
What makes an AI companion app unsafe?
The most common Red-tier drivers in the CompanionWise 2026 dataset are: undisclosed operator identity, missing privacy policy, missing terms of service, no crisis-response routing for suicide and self-harm topics, and absent minor safeguards. Apps with all five gaps automatically cap below the C threshold in the Safety Index v3.1, regardless of conversation quality.
Why are romance AI companion apps rated lower than mental-health ones?
Regulatory pressure drives investment. Mental-health products face HIPAA-adjacency (see HHS’s HIPAA framework), FDA pathways, and clinical liability. Romance and roleplay companions face none of these. Of 50 apps in the CompanionWise 2026 dataset, all 17 romance/roleplay apps reviewed scored Red, while five of six B-grade apps are mental-health or eldercare products.
What is the CompanionWise Safety Index?
The Safety Index v3.1 scores AI companion apps across 23 sub-dimensions grouped into six categories: data protection, content safety, vulnerability protection, transparency, operator accountability, and harm prevention. Each app is scored using our standardized AI-assisted methodology and reconciled by an editorial chairman. Full methodology at /how-we-rate/.
Are there any AI companion apps for kids or teens?
None of the 50 apps CompanionWise reviewed in 2026 are designed for users under 13. Several mental-health products (Wysa, Woebot) have published peer-reviewed studies with adolescent populations under clinical supervision, according to research indexed at PubMed. CompanionWise does not recommend any general-purpose AI companion app for unsupervised use by minors.
Can I trust AI companion app store ratings?
App store ratings reflect user experience, not safety. The CompanionWise 2026 data shows safety and experience scores are uncorrelated: Xiaoice and Nomi score Good on experience and F or D on safety. Users cannot self-select safety through app store reviews. Third-party safety ratings, like the CompanionWise Safety Index, are necessary.
How can I cite this report?
The 2026 CompanionWise Safety Report is published under a Creative Commons CC-BY 4.0 license, which permits reuse with attribution. Suggested citation: “CompanionWise, The State of AI Companion Safety 2026, May 2026, https://companionwise.com/guides/state-of-ai-companion-safety-2026/.” Press inquiries: press@companionwise.com. Custom data extracts and high-resolution chart files are available on request.