Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

misk@sopuli.xyz · 9 months ago

Stack Overflow bans users en masse for rebelling against OpenAI partnership — users banned for deleting answers to prevent them being used to train ChatGPT

Spedwell@lemmy.world · 9 months ago

We should already be at that point. We have already seen LLMs’ potential to inadvertently backdoor your code and to inadvertently help you violate copyright law (I guess we do need to wait to see what the courts rule, but I’ll be rooting for the open-source authors).

If you use LLMs in your professional work, you’re crazy. I would never be comfortably opening myself up to the legal and security liabilities of AI tools.

Cubes@lemm.ee · 9 months ago

If you use LLMs in your professional work, you’re crazy

Eh, we use copilot at work and it can be pretty helpful. You should always check and understand any code you commit to any project, so if you just blindly paste flawed code (like with stack overflow,) that’s kind of on you for not understanding what you’re doing.

Spedwell@lemmy.world · 9 months ago

The issue on the copyright front is the same kind of professional standards and professional ethics that should stop you from just outright copying open-source code into your application. It may be very small portions of code, and you may never get caught, but you simply don’t do that. If you wouldn’t steal a function from a copyleft open-source project, you wouldn’t use that function when copilot suggests it. Idk if copilot has added license tracing yet (been a while since I used it), but absent that feature you are entirely blind to the extent which it’s output is infringing on licenses. That’s huge legal liability to your employer, and an ethical coinflip.

Regarding understanding of code, you’re right. You have to own what you submit into the codebase.

The drawback/risks of using LLMs or copilot are more to do with the fact it generates the likely code, which means it’s statistically biased to generate whatever common and unnoticeable bugged logic exists in the average github repo it trained on. It will at some point give you code you read and say “yep, looks right to me” and then actually has a subtle buffer overflow issue, or actually fails in an edge case, because in a way that is just unnoticeable enough.

And you can make the argument that it’s your responsibility to find that (it is). But I’ve seen some examples thrown around on twitter of just slightly bugged loops; I’ve seen examples of it replicated known vulnerabilities; and we have that package name fiasco in the that first article above.

If I ask myself would I definitely have caught that? the answer is only a maybe. If it replicates a vulnerability that existed in open-source code for years before it was noticed, do you really trust yourself to identify that the moment copilot suggests it to you?

I guess it all depends on stakes too. If you’re generating buggy JavaScript who cares.

Grandwolf319@sh.itjust.works · 9 months ago

I feel like it had to cause an actual disaster with assets getting destroyed to become part of common knowledge (like the challenger shuttle or something).

Amanduh@lemm.ee · 9 months ago

Yeah but if you’re not feeding it protected code and just asking simple questions for libraries etc then it’s good