Earlier this week, buried in the middle of a lengthy blog post addressing ChatGPT's propensity for severe mental health harms, OpenAI admitted that it's scanning users' conversations and reporting to police any interactions that a human reviewer deems sufficiently threatening.
"When we detect users who are planning to harm others, we route their conversations to specialized pipelines where they are reviewed by a small team trained on our usage policies and who are authorized to take action, including banning accounts," it wrote. "If human reviewers determine that a case involves an imminent threat of serious physical harm to others, we may refer it to law enforcement."
The announcement raised immediate questions. Don't human moderators judging tone, for instance, undercut the entire prem