- cross-posted to:
- hackernews@lemmy.smeargle.fans
- hackernews@derp.foo
- cross-posted to:
- hackernews@lemmy.smeargle.fans
- hackernews@derp.foo
Increasingly, the authors of works being used to train large language models are complaining (and rightfully so) that they never gave permission for such a use-case. If I were an LLM company, I’d be seriously looking for a Plan B right now, whether that’s engaging publishing companies to come up with new licensing options, paying 1,000,000 grad students to write 1,000,000 lines of prose, or something else entirely.
That’s not really true, though. The biggest reason why these cases were able to get traction was because when prompted a certain, specific way, researchers were able to reproduce substantial portions of copyrighted works - https://arstechnica.com/tech-policy/2023/08/openai-disputes-authors-claims-that-every-chatgpt-response-is-a-derivative-work/