Updated · Hackaday · Jun 8
News Sites Block Wayback Machine Crawlers as AI Scraping Fears Threaten a Key Free Archive

3 articles · Updated · Hackaday · Jun 8

Summary

  • News outlets are increasingly blocking the Internet Archive’s Wayback Machine crawlers, creating gaps in a free archive widely used to preserve pages that later change or disappear.
  • AI scraping fears are the main public rationale: The Baltimore Banner said it wants to prevent LLM chatbots from improperly citing its work, while The Atlantic adopted a broader anti-scraping policy.
  • Paid archivers such as ProQuest and LexisNexis are still generally allowed to index the same content, pointing to a commercial incentive behind some restrictions.
  • Researchers bear the immediate cost: when bankruptcies, buyouts or site migrations erase older pages, they are pushed toward paid databases, and news coverage in the Wayback Machine grows increasingly spotty.
  • SaveTheArchive.com is hosting a petition as the dispute raises broader questions about who controls long-term access to the public record online.

Insights

Will new AI regulations save journalism or inadvertently erase our online history?
Is our digital history now being locked behind a corporate paywall?