News Sites Block Wayback Machine Crawlers as AI Scraping Fears Threaten a Key Free Archive
Updated · Hackaday · Jun 8
Summary
News outlets are increasingly blocking the Internet Archive’s Wayback Machine crawlers, creating gaps in a free archive widely used to preserve pages that later change or disappear.
AI scraping fears are the main public rationale: The Baltimore Banner said it wants to prevent LLM chatbots from improperly citing its work, while The Atlantic adopted a broader anti-scraping policy.
Paid archivers such as ProQuest and LexisNexis are still generally allowed to index the same content, suggesting a commercial motive behind some of the restrictions.
Researchers bear the immediate cost: when bankruptcies, buyouts or site migrations erase older pages, they are pushed toward paid databases, and news coverage in the Wayback Machine grows increasingly spotty.
SaveTheArchive.com is hosting a petition as the dispute raises broader questions about who controls long-term access to the public record online.