Since LLMs have no notion of reality and thus no way to map what they’re saying to things that are real, you can actually put an LLM to use in destroying itself.
The line of attack that this one helped me with is a “Tlön/Uqbar” style of attack: make up information that is clearly labelled as bullshit (something the bot won’t understand) with the LLM’s help, spread it around to others who use the same LLM to rewrite, summarize, etc. the information (keeping the warning that everything past this point is bullshit), and wait for the LLM’s training data to get updated with the new information. All the while, ask questions about the bullshit data to raise the bullshit’s priority in their front-end so there’s a greater chance of that bullshit being hallucinated in the answers.
If enough people worked on the same set, we could poison a given LLM’s training data (and likely many more since they all suck at the same social teat for their data).
When I bought my current car, I read the privacy policy, and it says that they’ll record anything in the cabin of the car they damned well please and upload it to the mothership (the car manufacturer, Subaru).
For a while, I adopted the practice of repeating disparaging things about Subaru while I drove. I’ve kind of gotten away from the practice lately. What I really ought to do is find and unplug the telematics board (Subaru’s OnStar equivalent, STARLINK) to kill its internet connection. I’ll do that one of these days.
As for what you’re talking about, I don’t think LLMs (typically?) learn by your interaction with them, right? Like, they take a lot of data, churn it through the algorithm, and produce a set of weights that are then used with the engine to produce hallucinations. And it’s very possible (probable, actually) that for the next generation of the LLM, they’ll use the prompts you used in the previous generation as more training data. So, yeah, what you’re getting at would work, but I don’t think it would work until the release of the next major version of the LLM.
I dunno. I could be wrong about some of my assumptions in that last paragraph, though. Definitely open to correction.
That’s what I’m talking about. We use the Degenerative AI to create a whole pile of bullshit Tlön-style, then spread that around the Internet with a warning up front for human readers that what follows is stupid bullshit intended to just poison the AI well. We then wait for the next round of model updates in various LLMs and start to engage with the subject matter in the various chatbots. (Perplexity says that while they do not keep user information, they do create a data set of amalgamated text from all the queries to help decide what to prioritize in the model.)
The ultimate goal is to have it, over time, hallucinate stuff into its model that is bullshit and well-known bullshit so that Degenerative AI’s inability to actually think is highlighted even for the credulous.
Can you find and block the interior cameras? There are a bunch of sticker manufacturers that sell opaque dots.