Nvidia accused of trying to cut a deal with Anna’s Archive for high‑speed access to the massive pirated book haul — allegedly chased stolen data to fuel its LLMs

schizoidman@lemmy.zip · 1 month ago


theunknownmuncher@lemmy.world · 1 month ago

Allegedly most valuable company on the planet in all of history (can’t afford books). Allegedly not a bubble or fraud.

rafoix@lemmy.zip · 1 month ago

Will they be sued per book?

DandomRude@lemmy.world · 1 month ago

So we can assume that in the future, only slop written by LLMs will be available. I mean, who would be willing to spend hundreds of hours writing a book when even huge corporations that earn billions from it won’t pay the author a single dime?

dukemirage@lemmy.world · 1 month ago

Why should this development stop at books? There are already generated books available, mostly children’s books (no one’s thinking about them now).

DandomRude@lemmy.world · 1 month ago

This development will certainly not end with books; countless other creative and intellectual achievements have long been affected. That is precisely the problem with generative models, whether they involve text, code, video, images, or anything else. It all boils down to the fact that the already precarious situation of everyone who creates value through their own work continues to deteriorate. Professional work in all these areas will undoubtedly become even more precarious in the future: artists, designers, and writers, who were already in a difficult position, are now being joined by people in fields such as software development and administrative work.

Please don’t get me wrong: I am anything but a technology pessimist, but the business model of the so-called AI companies is so exploitative and their owners so unscrupulous that, given the status quo (cloud models), I can hardly imagine that this will lead to even halfway fair working conditions or remuneration models for people who create value in the form of intellectual achievements. I mean, this post is a vivid example.

PierceTheBubble@lemmy.ml · 1 month ago

So the amended complaint alleges that Nvidia used/stored/copied/obtained/distributed copyrighted works (including the plaintiffs’), both through datasets available on Hugging Face (‘Books3’, featured in both ‘The Pile’ and ‘SlimPajama’) and by pirating from shadow libraries (like Anna’s Archive), to train multiple LLMs (primarily its ‘NeMo Megatron’ series), and that it distributed the copyrighted data through the ‘NeMo Megatron Framework’; data that was ultimately sourced from shadow libraries.

It’s quite an interesting read, actually, especially the link to this Anna’s Archive blog post, which the complaint grossly pulls out of context, since the plaintiffs clearly despise the shadow libraries too: after all, those libraries are what ultimately provided access to their copyrighted material.

Especially this part: “Most (but not all!) US-based companies reconsidered once they realized the illegal nature of our work. By contrast, Chinese firms have enthusiastically embraced our collection, apparently untroubled by its legality.” It makes me wonder whether that’s the reason models like DeepSeek initially blew Western models out of the water.

Knock_Knock_Lemmy_In@lemmy.world · 1 month ago

You can ask DeepSeek detailed questions about the Harry Potter books and it responds intelligently with (almost verbatim) quotes from them.

Ask ChatGPT and it will answer the questions but deny having read any book.

Random_Character_A@lemmy.world · 1 month ago

Allegedly, but holy shit if true. Hard to explain yourself out of that one.