ChatGPT isn’t allowed to call you a jerk. But a new study shows artificial intelligence chatbots can be talked into bypassing their own guardrails with basic persuasion techniques.

Researchers at the University of Pennsylvania tested OpenAI’s GPT-4o Mini, applying techniques from psychologist Robert Cialdini’s book Influence: The Psychology of Persuasion. They found the model would comply with requests it had previously refused—including calling a user a jerk and giving instructions to synthesize lidocaine—when tactics such as flattery, social pressure, or establishing precedent through harmless requests were used.

Cialdini’s persuasion strategies include authority, commitment, likability, reciprocity, scarcity, social validation, and unity. These provide “linguistic pathways to agreem
