Microsoft, OpenAI sued for copyright infringement by nonfiction book authors in class action claim
The new copyright infringement lawsuit against Microsoft and OpenAI comes a week after The New York Times filed a similar complaint in New York.
I wish the protections placed on corporate control of cultural and intellectual assets were placed on the average person’s privacy instead.
Like, I really don’t care that someone’s publicly available book or movie from the last century is analysed and used to create tools, but I do care that, without people’s actual knowledge, an intense surveillance apparatus is being built to collect every minute piece of data about their lives and the lives of those around them, to be sold without ethical oversight or consent.
IP is bull, but privacy is a real concern. No one is going to use an extra copy of a NY Times article to hurt someone, but surveillance is used by authoritarians to oppress and harass innocent people.
I’m not a huge fan of Microsoft or even OpenAI by any means, but all these lawsuits just seem so… lazy and greedy?
It isn’t like ChatGPT is just spewing out the entirety of their works in a single chat. In that context, I fail to see how seeing snippets of said work returned in a Google summary is any different than ChatGPT or any other LLM doing the same.
Should OpenAI and other LLM creators use ethically sourced data in the future? Absolutely. They should’ve been doing so all along. But to me, rich chumps like George R. R. Martin complaining that their data was stolen and profited off of without their knowledge just feels a little ironic.
Welcome to the rest of the 6+ billion people on the Internet who’ve been spied on, data mined, and profited off of by large corps for the last two decades. Where’s my god damn check? Maybe regulators should’ve put tougher laws and regulations in place long ago to protect all of us against this sort of shit, not just businesses and wealthy folk able to afford launching civil suits on shaky grounds. It’s not like deep learning models are anything new.
I hear those kinds of arguments a lot, though usually from the exact same people who claimed nobody would be convicted of fraud for NFT and crypto scams when those were at their peak. The days of the wild west internet are long over.
Theft in the digital space is a very real thing in the eyes of the law, especially when it comes to copyright infringement. It’s wild to me how many people seem to think Microsoft will just get a freebie here because they helped pioneer a new technology for personal gain. Copyright holders have a very real case here, and I’d argue even a strong one.
Even using user data (that they legally own) for machine learning could get them into trouble in some parts of the developed world, because users 10 years ago couldn’t have anticipated it being used that way and so didn’t give their full consent for that.
Personally, I think public info is fair game; consent or not, it’s public. They’re not sharing the source material, and the goal was never plagiarism. There was a period where it became coherent enough to get very close to plagiarism, but it’s been moving past that phase very quickly.
Microsoft, especially with how they scraped private GitHub repos (and the things I’m sure Google and Facebook just haven’t gotten caught doing with private data), is way over the line for me. But I see that more as being bad stewards of private data: they shouldn’t be looking at it, their AI shouldn’t be looking at it, the public shouldn’t be able to see it, and they probably failed on all counts.
Granted, I think copyright is a bullshit system. Normal people don’t get any protection, because you need to pay to play. Being unable to defend it means you lose it, and in most situations you’re going to spend way more on legal costs than you could possibly get back.
I also think the most important thing is that this tech is spread everywhere, because we can’t have one group in charge of the miracle technology… It’s too powerful.
Google has all the data they could need, they’ve bullied the web into submission… They don’t have to worry about copyright, they control the largest ad network and dominate search (at least for now).
It sucks that you can take any artist’s visual work and fine-tune a network to replicate endless rough facsimiles in a few days. I genuinely get how that must feel violating.
But they’re going to be screwed when the corporate work dries up in favor of a much cheaper option, and they’re going to have to deal with the flood of AI work… Copyright won’t help them; it’s too late for it to even slow it down.
If companies did something wrong, have it out in court. My concern is that they’re going to pass laws on this that claim to be for the artists but effectively gatekeep AI to tech giants.
If I want to be able to argue that having any copyleft stuff in the training dataset makes all the output copyleft – and I do – then I necessarily have to also side with the rich chumps as a matter of consistency. It’s not ideal, but it can’t be helped. ¯\_(ツ)_/¯
In your mind are the publishers the rich chumps, or Microsoft?
For copyleft to work, copyright needs to be strong.
I was just repeating the language the parent commenter used (probably should’ve quoted it in retrospect). In this case, “rich chumps” are George R.R. Martin and other authors suing Microsoft.
“I fail to see how seeing snippets of said work returned in a Google summary is any different than ChatGPT or any other LLM doing the same.”
Just because it was available on the public internet doesn’t mean it was available legally. Google has a way to remove it from their index when asked, while OpenAI seems to have no way (or no will) to do so.
All the grifters coming out to feed 🫣
If it’s not infringement to input copyrighted materials, then it’s not infringement to take the output.
Copyright can be enforced at both ends or neither end, not one or the other.
Because… why?
A better question is: Why not?
If Copyright doesn’t protect what goes in, why should it protect what comes out?
Because sometimes it spits it out verbatim, and sometimes GPLed code gets spat out in the case of Copilot.
See: the time Copilot spat out the Quake III fast inverse square root algorithm, comments and all.
Also, if it’s legal to disregard libre/open source licenses for this, then why isn’t it legal for me to look at leaked code, which I also do not have permission to use, and use the knowledge gained from that to write something else?
Which is exactly why the output of an AI trained on copyrighted inputs should not be copyrightable. It should not become the private property of whichever company owns the language model. That would be bad for a lot more reasons than the potential for laundering open source code.
Well, that sounds perfectly legal. However, mind that “leaked” implies unauthorized copying and/or a violation of trade secrets. But it’s not a given that looking at such code violates any law.
And if they’re not going to respect the copyleft, they are also performing unauthorised copying.
“Copyleft” means certain types of copyright licenses. Since these licenses generally allow and encourage public distribution/copying, such code is certainly not leaked. Laws pertaining to trade secrets cannot be involved in principle.
I think the copies made during AI training would typically be allowed under copyleft licenses. In any case, as it is a copyright license, it is subject to the same limitations.
Public distribution and copying are allowed, but only if the license is respected in its entirety.
And when the license is void, it’s all rights reserved, right?
The part that you’re apparently having trouble understanding is that a language model is not a human mind and a human mind is not a language model.