Originality.AI looked at 8,885 long-form Facebook posts made over the past six years.

Key Findings

  • 41.18% of current Facebook long-form posts are classified as Likely AI, as of November 2024.
  • Between 2023 and November 2024, the average percentage of monthly AI posts on Facebook was 24.05%.
  • This reflects a 4.3x increase in monthly AI Facebook content since the launch of ChatGPT. In comparison, the monthly average was 5.34% from 2018 to 2022.
Comments

  • Don_alForno@feddit.org
    4 hours ago

    I see no reason why “post right-wing propaganda” and “write so you don’t sound like ‘AI’” should be conflicting goals.

    The actual reason I don’t find such results credible is that the “creator” is trained to sound like a human, so the “detector” has to be trained to find text that does not sound like a human. This means both basically have to solve the same task: decide whether something sounds like a human.

    To be able to find the “AI” content, the “detector” would have to be better than the “creator” at deciding what sounds like a human. So for the results to have any kind of accuracy, you’re already banking on the “detector” company having more processing power, better training data, or more money than, say, OpenAI or Google.

    But also, if the “detector” were better at the job, it could itself be used as a better “creator”. Then how would we distinguish the content it created?

    • xor@lemmy.blahaj.zone
      2 hours ago

      I’m not necessarily saying they’re conflicting goals, merely that they’re not the same goal.

      The incentive for the generator becomes “generate propaganda that doesn’t have the language characteristics of typical LLMs”, so the incentive is split between those goals. As a simplified example, if the additional incentive were “include the word bamboo in every response”, I think we would both agree that it would do a worse job at its original goal, since the constraint means that outputs that would previously have been optimal are now considered poor responses.

      Meanwhile, the detector network has a far simpler task: given some input string, give back a value representing the confidence that it was output by a system rather than a person.
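
      To make the shape of that task concrete, here is a minimal Python sketch. The model id is a placeholder I made up, not a real published checkpoint, and real detectors are obviously more elaborate than this:

      from transformers import pipeline

      # Placeholder model id: stands in for whatever classifier a detector
      # company has fine-tuned to label text as human- or machine-written.
      detector = pipeline("text-classification", model="some-org/ai-text-detector")

      def ai_confidence(text: str) -> float:
          # Return the detector's confidence that `text` was machine-generated.
          result = detector(text, truncation=True)[0]
          # Orient the score so that higher always means "more likely AI".
          return result["score"] if result["label"] == "AI" else 1.0 - result["score"]

      print(ai_confidence("Some suspiciously polished long-form post."))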

      I think it’s also worth considering that LLMs don’t “think” in the same way people do: where a person constructs an abstract thought and then finds the best combination of words to express it, an LLM generates words that are likely to follow the preceding ones (including the prompt). That does leave some room for detecting the two approaches at better than chance, even though it’s impossible to do so reliably.
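
      As a rough illustration of how that difference can be exploited, here is a simple perplexity heuristic in Python: because a model picks tokens that are probable under its own distribution, machine-written text tends to score lower perplexity under a similar language model than human-written text does. GPT-2 is used purely for illustration; this is nowhere near a reliable detector.

      import torch
      from transformers import GPT2LMHeadModel, GPT2TokenizerFast

      tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
      model = GPT2LMHeadModel.from_pretrained("gpt2")
      model.eval()

      def perplexity(text: str) -> float:
          enc = tokenizer(text, return_tensors="pt")
          with torch.no_grad():
              # Passing the input ids as labels makes the model return the mean
              # cross-entropy of predicting each token from the ones before it.
              loss = model(**enc, labels=enc["input_ids"]).loss
          return torch.exp(loss).item()

      # Lower perplexity means the text looks more "expected" to the model,
      # which is weak evidence it was generated rather than written by a person.
      print(perplexity("The results of the study were both significant and robust."))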

      But I guess the really important thing is that the people running these bots don’t care whether it’s possible to tell that the content is likely generated, so long as it’s not so obvious that the content gets removed. That means they’re not really incentivised to spend money training models to avoid detection.