Nvidia faces allegations over AI training data from shadow library

Nvidia is facing serious accusations: the company allegedly tried to negotiate paid access to Anna's Archive, one of the largest 'shadow libraries' hosting pirated books and academic materials, in order to train AI models. TorrentFreak reported the allegations, citing documents that emerged as part of a U.S. court proceeding.

According to the published materials, representatives from Nvidia's data strategy team allegedly discussed paying for 'high-speed access' to Anna's Archive, whose collection is estimated at roughly 500 terabytes of data. The correspondence also allegedly shows that Nvidia's leadership approved the plan just one week after initial contact, despite warnings about the content's illegal origins.

These documents surfaced during a class-action lawsuit accusing Nvidia of copyright infringement in training language models on the Books3 dataset. This dataset has previously been linked to pirated sources, including the site Bibliotik. Nvidia insists it used the materials under 'fair use,' but the new evidence prompted the plaintiffs to expand the lawsuit to include the Anna's Archive episode.

The case is especially striking given that major AI companies fiercely protect their own developments while allegedly turning a blind eye to authors' rights when training their models. There is no confirmation yet that a deal with Anna's Archive was finalized or that money changed hands, but the mere fact of such negotiations could seriously damage Nvidia's reputation.