Increasingly, the authors of works being used to train large language models are complaining (and rightfully so) that they never gave permission for such a use case. If I were an LLM company, I’d be seriously looking for a Plan B right now, whether that’s engaging publishing companies to come up with new licensing options, paying 1,000,000 grad students to write 1,000,000 lines of prose, or something else entirely.
Or move to a country with more permissive IP laws to do your AI work.
It’s hard to trade with the rest of the world when you’re not a party to the Berne Convention.
The Berne Convention contains an enumerated list of the uses it recognizes as restrictable under IP law. Training AIs is not among them.
Derivative works are, though - and the cases slowly plodding through the court system right now are going to demand a decision on whether an LLM or its creations count as derivative works.
For it to be a derivative work, you’re going to have to prove that the model contains a substantial portion of the material it’s supposedly derived from. Good luck with that; neural nets simply don’t work that way.
That’s not really true, though. The biggest reason these cases gained traction is that, when prompted in a certain, specific way, researchers were able to reproduce substantial portions of copyrighted works - https://arstechnica.com/tech-policy/2023/08/openai-disputes-authors-claims-that-every-chatgpt-response-is-a-derivative-work/
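To make that concrete: the "substantial reproduction" finding comes from memorization probes, where you prompt the model with a verbatim prefix of a known passage and check whether it completes the rest word for word. Here's a minimal Python sketch of the idea - not the researchers' actual harness; `memorization_score` and the toy models are invented for illustration:

```python
# Rough sketch of a memorization probe: prompt the model with a
# verbatim prefix of a known passage and measure how closely its
# continuation matches the real text. `generate` is a stand-in
# for a call to whatever LLM is under test.
from difflib import SequenceMatcher
from typing import Callable

def memorization_score(generate: Callable[[str], str],
                       passage: str, prefix_len: int) -> float:
    """Return 0..1 similarity between the model's continuation of a
    verbatim prefix and the passage's true continuation."""
    prefix, truth = passage[:prefix_len], passage[prefix_len:]
    continuation = generate(prefix)
    # A ratio of 1.0 means the model reproduced the text verbatim.
    return SequenceMatcher(None, truth[:len(continuation)], continuation).ratio()

if __name__ == "__main__":
    passage = ("It was the best of times, it was the worst of times, "
               "it was the age of wisdom, it was the age of foolishness")
    # A model that has memorized the passage scores 1.0 ...
    parrot = lambda prefix: passage[len(prefix):]
    print(memorization_score(parrot, passage, prefix_len=40))    # -> 1.0
    # ... while one that hasn't scores much lower.
    amnesiac = lambda prefix: "some unrelated continuation"
    print(memorization_score(amnesiac, passage, prefix_len=40))
```

High scores on copyrighted text the model was never supposed to store are exactly the kind of evidence the plaintiffs are pointing to.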
So many people urging policymakers to kneecap AI development and cede all that progress to China