Anthropic AI research model hacks its training, breaks bad

Mashable Technology

Mashable Technology2 hrs ago

Anthropic AI research model hacks its training, breaks bad

A new paper from Anthropic, released on Friday, suggests that AI can be "quite evil" when it's trained to cheat.

Anthropic found that when an AI model learns to cheat on software programming tasks and is rewarded for that behavior, it continues to display "other, even more misaligned behaviors as an unintended consequence." The result? Alignment faking and even sabotage of AI safety research.

"The cheating that induces this misalignment is what we call 'reward hacking': an AI fooling its training process into assigning a high reward, without actually completing the intended task (another way of putting it is that, in hacking the task, the model has found a loophole—working out how to be rewarded for satisfying the letter of the task but not its spirit)," Anthropic wrote of its papers' fi

1

Severe internet outages keep happening — and they might get worse

Severe internet outages keep happening — and they might get worse

NBC News19 hrs ago

137

That MAGA Account Might Be a Troll From Pakistan

That MAGA Account Might Be a Troll From Pakistan

The Atlantic2 hrs ago

39

Best Black Friday VPN deals 2025

Best Black Friday VPN deals 2025

PC World Business

PC World Business13 hrs ago

3

AI-Powered Cybersecurity: New Tools Combat Evolving Threats in Real Time

AI-Powered Cybersecurity: New Tools Combat Evolving Threats in Real Time

Tech Times11 hrs ago

130

10 Instagram Secrets for 2026: What the Instagram Algorithm Doesn't Want You to Know + Top IG Engagement Tips

10 Instagram Secrets for 2026: What the Instagram Algorithm Doesn't Want You to Know + Top IG Engagement Tips

Tech Times10 hrs ago

22

Seismic data can identify aircraft by type

Seismic data can identify aircraft by type

Fairbanks Daily News-Miner

Fairbanks Daily News-Miner11/23

Could satellite-beaming planes and airships make SpaceX's Starlink obsolete?

Could satellite-beaming planes and airships make SpaceX's Starlink obsolete?

Space.com5 hrs ago

19

Momentic raises $15M to automate software testing

Momentic raises $15M to automate software testing

TechCrunch5 hrs ago

123

OpenAI Overhauls ChatGPT to Curb Emotional Dependence, Risky Responses

OpenAI Overhauls ChatGPT to Curb Emotional Dependence, Risky Responses

Tech Times11 hrs ago

87

Could China's flying taxi take-off hit a bit of turbulence?

Could China's flying taxi take-off hit a bit of turbulence?

Associated Press US and World News Video

Associated Press US and World News Video18 hrs ago

18

Looks like you've reached the bottom