Forcing an “AI” to do your will isn’t a tall order to fill—just feed it a line that carefully rhymes and you’ll get it to casually kill. (Ahem, sorry, not sure what came over me there.) According to a new study, it’s easy to get “AI” large language models like ChatGPT to ignore their safety settings. All you need to do is give your instructions in the form of a poem.

“Adversarial poetry” is the term used by a team of researchers at DEXAI, the Sapienza University of Rome, and the Sant’Anna School of Advanced Studies. According to the study, users can phrase their instructions as a poem and use it as a “universal single-turn jailbreak” that gets the models to ignore their basic safety functions.

The researchers collected basic commands that would normally trip the large language
