I was just watching a TikTok with a black girl going over how race is a social construct. This felt wrong to me, so I decided to fact-check her.
(she was right, BTW)
Now I’ve been using Microsoft’s Copilot, which is baked into Bing right now. It’s fairly robust, and sure, it has its quirks, but by and large it cuts out the middleman of having to find facts on your own and gives a breakdown of whatever you’re looking for, followed by a list of the sources it got its information from.
So I asked it a simple straightforward question:
“I need a breakdown on the theory behind human race classifications”
And it started to do so, quite well in fact. It started listing the historical context behind the question and was just bringing up Johann Friedrich Blumenbach, who was a German physician, naturalist, physiologist, and anthropologist. He is considered to be a main founder of zoology and anthropology as comparative, scientific disciplines. He has been called the “founder of racial classifications.”
But right in the middle of the breakdown on him, all the previous information disappeared and it said, “I’m sorry, I can’t provide you with this information at this time.”
I pointed out that it had been doing so, and quite well.
It said that no, it had not provided any information on said subject, and that we should perhaps look at another subject.
Now nothing I did could have fallen under some sort of racist context. I was looking for historical scientific information. But Bing, in its infinite wisdom, felt the subject was too touchy and would not even broach it.
When others, be it corporations or people, start to decide which information a person can and cannot access, that is a damn slippery slope we better level out before AI starts to roll out en masse.
PS. Google had no trouble giving me the information when I requested it. I just had to look up his name on my own.
The big problem with AI butlers for research is, IMO, that stripping out the source takes away important context that helps you decide whether the information you are getting is relevant and appropriate or not. Was the information posted on a parody forum or is it an excerpt from a book by an author with a Ph.D. on the subject? Who knows. The AI is trained to tell you something that you want to hear, not something you ought to hear. It’s the same old problem of self-selecting information, but magnified 100-fold.
As it turns out, data is just noise without some authority or chain of custody behind it.
As I mentioned, Copilot links the sources of the information it gives at the bottom. If you want to double-check the information, the sources are provided to you.
And somewhere in the Terms of Service it says you have to give up your first born child. Or maybe it doesn’t, but nobody will ever know because nobody reads more than is strictly required.
The sources are just as vulnerable to being hallucinated as anything else it tells you.
So, when you go to check them… It’s not like the AI is going to hallucinate a valid registered domain with a webserver hosting the hallucinated source as well. So click the link, and if it’s dead or fake, toss out that reply as suspect.
If you follow the source and find it’s valid, supports what the AI said, and is reasonably trustworthy, then you can consider what it has told you.
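For anyone who wants to automate the first pass of that, here is a rough Python sketch. The names `fetch_text` and `check_source` are made up for illustration, and the "key terms" check is only a crude stand-in for actually reading the page. It tells you whether a link is dead and whether the page even mentions the topic, nothing more. Whether the source is trustworthy is still your call.

```python
from urllib.request import Request, urlopen


def fetch_text(url, timeout=10):
    """Return the page body as text, or None if the link cannot be fetched."""
    try:
        request = Request(url, headers={"User-Agent": "source-check-sketch"})
        with urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError):
        # OSError covers URLError, HTTPError and timeouts;
        # ValueError covers malformed URLs.
        return None


def check_source(url, key_terms, fetch=fetch_text):
    """Crude first-pass check of one cited source (hypothetical helper)."""
    body = fetch(url)
    if body is None:
        return "suspect: link is dead or fake"
    lowered = body.lower()
    missing = [term for term in key_terms if term.lower() not in lowered]
    if missing:
        return "suspect: page does not mention " + ", ".join(missing)
    return "reachable and mentions the key terms; judge trustworthiness yourself"


if __name__ == "__main__":
    # example.com is a placeholder, not a real citation.
    print(check_source("https://example.com/", ["Blumenbach"]))
```

The `fetch` parameter is only there so the check can be tried without touching the network.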
If it cites its sources, you have a way to check its math (so to speak).
You have a way to do so, yes, but you actually have to do it and we know people don’t. False sources can just make already believable responses more credible, despite them being full of rubbish.
The person you were replying to was talking about checking those sources though.
Yes, fake sources can and will give people a false sense that it’s legit, but checking a “hallucinated” source will quickly make it clear that there’s nothing backing it up.
It’s a problem, but it’s one that an individual using it who’s aware of it does actually have a way of mitigating fairly easily.
I’m pretty sure when searching with AI the model gets told “here are five articles about <user search term>, summarize them and answer the following question: <user input> <top 5 search results from a traditional search engine>”
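Something like the following, as a guess at the general shape rather than what Bing or anyone else actually does. The function name `build_prompt`, the `title`/`url`/`snippet` fields, and the prompt wording are all made up for the sketch.

```python
def build_prompt(question, search_results, max_results=5):
    """Assemble a search-then-summarize prompt from the top results.

    search_results is assumed to be a list of dicts with 'title', 'url'
    and 'snippet' keys, as a traditional search engine might return them.
    """
    top = search_results[:max_results]
    lines = [f"Here are {len(top)} articles about: {question}", ""]
    for number, result in enumerate(top, start=1):
        # Number each source so the answer can cite it as [1], [2], ...
        lines.append(f"[{number}] {result['title']} ({result['url']})")
        lines.append(result["snippet"])
        lines.append("")
    lines.append("Summarize them and answer the following question, "
                 "citing sources by number:")
    lines.append(question)
    return "\n".join(lines)


if __name__ == "__main__":
    # Placeholder results; example.com is not a real source.
    results = [
        {"title": "Placeholder article", "url": "https://example.com/a",
         "snippet": "Placeholder snippet text."},
    ]
    print(build_prompt("theory behind human race classifications", results))
```

The point is that in this setup the links come from the search engine, not from the model's memory, which is why they tend to be real pages. Whether the summary matches what the pages say is a separate question.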
Many modern models using RAG can and do source with accurate citations. Whether the human checks the citation is another matter.
While it is true that RLHF introduces a degree of sycophancy due to the confirmation bias of the raters, more modern models don’t just agree with the user over providing accurate information. If that were the case, Grok wouldn’t have been telling Twitter users they were idiots for their racist and transphobic views.