Training "AI" On Public Data Is Totally Fine And Not Stealing.

31337@sh.itjust.works · 5 months ago

istanbullu · 5 months ago

I don’t get the AI hate.

Railcar8095@lemm.ee · 5 months ago

As someone who doesn’t hate AI, I hate a few things about how it’s happening:

If I want to write a book, and I want to use other books for reference, I need to obtain them legally. Purchase, rent, loan… Else I’m a pirate. Multimillion-dollar companies say that for them it’s fine as long as somebody posted it on the internet. Their version of annas-archive is suddenly legal and moral, while I’m harming the authors if I use it.
They are stuffing everything with AI, which generally means an internet connection and sending unknown data.
It’s an annoying marketing gimmick. While incredibly useful in some places, the insistence that it solves all problems makes it seem like a failure.

Specal@lemmy.world · 5 months ago

I think your issue lies more with copyright laws than with where the LLM datasets come from, then. Which I completely understand, I hate copyright laws.

There are TV shows that I can’t stream and the only legal way to watch them is to buy the box set for £90. Get fucked I’m not paying that, I’ll just download it for free.

evasive_chimpanzee@lemmy.world · 5 months ago

There are a lot of problems with it. Lots of people could probably tell you about security concerns and wasted energy. Also there’s the whole comically silly concept of marketing AI to write your texts and emails for you, and then to summarize the texts and emails you get. Just needlessly complicating things.

Conceptually, though, most people aren’t too against it. In my opinion, all the stuff they are labeling “generative AI” isn’t really “AI” or “generative”. There are lots of ways that people define AI, and without being too pedantic about definitions, the main reason I think they call it that, other than marketing, is that they are really trying to sway public opinion by controlling language. Scraping all sorts of copyrighted material, and re-jumbling it to spit out something similar, is arguably something we should prohibit as copyright infringement. It’s enough of a gray area to get away with in the short term. By using language that convinces people they aren’t just putting other people’s material in a mixer, but are “generating new content”, they hope to have us roll over and sign off on what they’ve been doing.

Saying that humans create stories by jumbling together previous stories is a BS cop out, too. Obviously, we do, but humans have not given, and do not have to give, computers that same right. Also, LLMs are very complex, but they are also way, way less complex than human minds. The way they put together text is closer to running a story through Google Translate 10 times than it is to a human using a story for inspiration.

There are real, definite benefits of using LLMs, but selling it as AI and trying to force it into everything is a gimmick.