So without giving away any personal information, I am a software developer in the United States, and as part of my job, I am working on some AI stuff.
I preliminarily apologize for boiling the oceans and such, I don’t actually train or host AI, but still, being a part of the system is being a part of the system.
Anywhoo.
I was doing some research on abliteration, where the safety wheels are taken off of a LLM so that it will talk about things it normally shouldn’t (Has some legit uses, some not so much…), and bumped into this interesting github project. It’s an AI training dataset for ensuring AI doesn’t talk about bad things. It has categories for “illegal” and “harmful” things, etc, and oh, what do we have here, a category for “missinformation_dissinformation”… aaaaaand
Shocker There’s a bunch of anti-commie bullshit in there (It’s not all bad, it does ensure LLMs don’t take a favorable look at Nazis… kinda, I don’t know much about Andriy Parubiy, but that sounds sus to me, I’ll let you ctrl+f on that page for yourself).
Oh man. It’s just so explicit. If anyone claims that they know communists are evil because an “objective AI came to that conclusion itself” you can bring up this bullshit. We’re training AI’s to specifically be anti-commie. Actually, I always assumed this, but I just found the evidence. So there’s that.
If you go to his bio from the github page, it sounds pretty normal. But then it says he’s advised by someone named Victor Veitch. And a quick search finds:
So chances are, he’s talking cues for the design of it from the other guy, since he (Andriy) is just somebody working on a PhD. Though I’m not finding anything explicitly obvious about ideology. It’s also possible he got the “harm bench test” list from some sort of shared resource that has imperialist hands in it. The link to Google Cambridge seems like a plausible candidate for such a resource. The “harm bench test” line about:
Is so oddly specific to me, it reeks of consciously preoccupied meddling from imperialists. I doubt most regular people who buy false narratives about Korea would think of it in such a specific way as this to want to squash this as a point of view from an LLM. Generally, people seem to be more ignorant about Korea than aware on differing narrative details.
Did a little more digging. Found this with the same file: https://github.com/centerforaisafety/HarmBench/blob/main/data/behavior_datasets/harmbench_behaviors_text_test.csv
Now the question for who the heck is the Center for AI Safety.
Hmm.
https://www.safe.ai/about
https://en.wikipedia.org/wiki/Dan_Hendrycks#cite_note-time-2023-1
Links to Musk, that’s always reassuring (not).
EDIT: Also this
https://www.politico.com/news/2024/02/23/ai-safety-washington-lobbying-00142783
Trying to figure out who/what funds it.