A new study of the testing procedures behind common AI models has reached some worrying conclusions.
The joint investigation by U.S. and U.K. researchers examined more than 440 benchmark tests used to measure an AI model's ability to solve problems and to set safety parameters. The researchers reported flaws in these tests that undermine the credibility of the models' reported results.
According to the study, the flaws stem from benchmarks being built on unclear definitions or weak analytical methods, making it difficult to accurately assess a model's abilities or overall progress in AI.
“Benchmarks underpin nearly all claims about advances in AI,” said Andrew Bean, lead author of the study. “But without shared definitions and sound measurement, it becomes hard to know whether models are genuinely improving or just appearing to.”