Some say confession is good for the soul, but what if you have no soul? OpenAI recently tested what happens if you ask its bots to "confess" to bypassing their guardrails.

We must note that AI models cannot "confess." They are not alive, despite the sad AI companionship industry. They are not intelligent. All they do is predict tokens from training data and, if given agency, apply that uncertain output to tool interfaces.

Terminology aside, OpenAI sees a need to audit AI models more effectively due to their tendency to generate output that's harmful or undesirable – perhaps part of the reason that companies have been slow to adopt AI, alongside concerns about cost and utility.

"At the moment, we see the most concerning misbehaviors, such as scheming, only in stress-tests and adversar
