Everything ever said on the internet was just the start of teaching artificial intelligence about humanity. Tech companies are now tapping into an older repository of knowledge: the library stacks.

Nearly one million books published as early as the 15th century, and in 254 languages, are part of a Harvard University collection released to AI researchers in June. Also coming are troves of old newspapers and government documents held by Boston’s public library.

Cracking open the vaults to centuries-old tomes could be a data bonanza for tech companies battling lawsuits from living novelists, visual artists and others whose creative works have been scooped up without their consent to train AI chatbots.

“It is a prudent decision to start with public domain data because that’s less controversi
