Meta faces class action over alleged copyright infringement in Llama training
Updated · The Verge · May 5
Macmillan, McGraw-Hill, Elsevier, Hachette, Cengage and author Scott Turow accuse Meta of copying books and journal articles from LibGen, Anna's Archive, Sci-Hub and Common Crawl.
The suit alleges that Llama can reproduce near-verbatim passages and seeks damages, an order halting the allegedly unlawful conduct, and disclosure of all copyrighted works used to train the models.
The case adds to a widening wave of AI copyright litigation. Meta says training can be fair use, while earlier rulings left open whether using pirated material for AI training is lawful.
Meta Faces Class Action Over Pirated Data: 81 Terabytes of Copyrighted Works Allegedly Used for Llama AI
Overview
In April 2026, five major publishers and author Scott Turow filed a class-action lawsuit against Meta and Mark Zuckerberg, accusing them of using more than 81 terabytes of pirated books and articles from shadow libraries such as LibGen and Sci-Hub to train Meta's Llama AI models. According to the suit, internal evidence shows that Meta researchers knowingly downloaded and shared this copyrighted content and that Zuckerberg approved the practice despite the risks. The plaintiffs also argue that Llama can mimic the styles of specific authors, threatening the market for original works. The lawsuit highlights growing legal risks for AI companies and adds pressure on the industry to move toward licensing agreements that protect creators and reduce litigation.