Updated · Straight Arrow News · May 22
More Than 340 U.S. Publishers Block Wayback Machine Over AI Training Fears

2 articles · Updated · Straight Arrow News · May 22
  • More than 340 U.S. news publishers have cut off the Wayback Machine from archiving their stories, a sharp expansion from earlier reports that only some major outlets had done so.
  • The publishers say archived copies could give AI companies unauthorized access to subscriber-only, proprietary journalism, though the report says there is no evidence the Internet Archive is being used to train models.
  • Five conglomerates account for most of the blocks, with 80% of the affected local newspaper sites owned by USA Today’s parent company; outlets named include The New York Times and The Idaho Statesman.
  • The nonprofit Internet Archive says the fears are misplaced and warns that blocking its crawlers could damage the public record, especially given a 2024 Pew study that found 38% of webpages that existed in 2013 had vanished within a decade.
  • More than 200 journalists have signed a petition urging employers to restore access, underscoring a widening clash between publishers’ AI defenses and the archive’s role in preserving online history.

The Shrinking Digital Archive: Hundreds of Publishers Block Wayback Machine Over AI Copyright Fears

Overview

A growing number of publishers are blocking the Internet Archive’s Wayback Machine, shrinking the digital historical record and limiting access to past web content. Publishers fear that AI companies are using archived, copyrighted material to train their models without permission or compensation, which they see as a threat to the business models that fund content creation. As a result, they are restricting access even for nonprofit archival efforts, intensifying the conflict between digital preservation and copyright protection. Widespread blocking risks eroding the public record and makes it harder to access and verify historical information online.

...