Global news organisations block Internet Archive crawlers over AI training fears

15 articles · Updated · Euronews · May 1

About 245 outlets across nine countries are involved: 241 news sites block at least one of the Archive’s four bots, and more than 20 major publishers already block its main crawler.
Publishers say AI firms, including OpenAI and Perplexity, are using archived articles without permission or payment. Blocks by Gannett, the owner of USA Today, have effectively removed hundreds of local titles from the historical record.
The Internet Archive says it is collateral damage and has limited some automated extraction. Some publishers are seeking partial-access compromises, and journalists have backed a petition defending the Wayback Machine’s preservation role.

Could blocking the Internet Archive to prevent AI training end up erasing vital parts of our public record forever?

Will new licensing deals and regulations truly protect news publishers, or just shift the balance of power to tech giants in a different way?