The Atlantic Opens AI Music Database Covering 12 Million Tracks as 9 Million More Surface
Updated · The Verge · Jun 20
Summary
The Atlantic made four music-training datasets searchable: two large sets of 12 million and 9 million tracks, plus two others with more than 100,000 songs each.
Alex Reisner reported the collections have been downloaded thousands of times, and Google and Stability have acknowledged using them in research papers.
Three datasets are distributed as link lists to YouTube or Spotify tracks, with developers typically pulling the audio through automated tools that can bypass logins, ads and creator monetization mechanisms.
The database surfaces artists from Lady Gaga and Radiohead to Wu-Tang Clan and Bruce Springsteen, highlighting how widely commercial and personal-use-only music has entered AI training pipelines.
Lawsuits are mounting against AI music companies. Is licensed data their only path to survival?
Will new US legislation finally end the 'fair use' debate for AI music training?
21 Million Songs Scraped: The Legal, Ethical, and Economic Fallout of AI Music Datasets
Overview
In late 2024, a major investigation revealed that AI developers had amassed over 21 million music recordings, including both popular and independent tracks, to train their models. These datasets were often collected through automated scraping, sometimes without consent or compensation. The disclosure caused immediate outrage, especially once sacred recordings from Aboriginal, Torres Strait Islander, and Māori artists were found to have been included without permission. The findings raised serious copyright concerns and highlighted a deep violation of Indigenous cultural rights, sparking urgent debates about ethics and the future of AI in music.