I wanted to extract some crime statistics broken down by type of crime and by different populations, all of course normalized by population size. I got a nice set of tables summarizing the data for each year that I requested.

When I shared these summaries, I was told they are entirely unreliable due to hallucinations. So my question to you is: how common a problem is this?

I compared results from ChatGPT-4, Copilot and Grok, and the results are the same (Gemini says the data is unavailable, btw :)

So are LLMs reliable for research like that?

  • ViaFedi
    21 days ago

    Solutions exist where you give the LLM a set of files, e.g. PDFs, and it then bases its answers solely on them.

    • jet@hackertalks.com
      21 days ago

      It’s still a probabilistic token generator; you’re just training it on your local data. Hallucinations will absolutely happen.

      • slacktoid
        21 days ago

        This isn’t training; it’s called a RAG (retrieval-augmented generation) workflow, as there is no training step per se.
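
To make the "no training step" point concrete, here is a minimal sketch of the retrieval half of a RAG workflow. It is not how any particular product works: the function names (`retrieve`, `build_prompt`), the word-overlap scoring, and the sample chunks are all hypothetical placeholders, and real systems typically use embedding models instead of word counts. The point is only that the model's weights are never touched; the relevant text is looked up and pasted into the prompt.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question, chunks, k=2):
    """Return the k chunks most similar to the question (hypothetical scoring)."""
    q = Counter(tokenize(question))
    ranked = sorted(
        chunks, key=lambda c: cosine(q, Counter(tokenize(c))), reverse=True
    )
    return ranked[:k]


def build_prompt(question, chunks, k=2):
    """Paste the retrieved chunks into the prompt; nothing is trained."""
    context = "\n".join(f"- {c}" for c in retrieve(question, chunks, k))
    return (
        "Answer using only the context below. "
        "If the answer is not there, say so.\n"
        f"Context:\n{context}\n\nQuestion: {question}"
    )


# Placeholder chunks standing in for text extracted from your own files.
chunks = [
    "Table 3 lists burglary counts per 100,000 residents for each year.",
    "The appendix describes how survey respondents were sampled.",
    "Chapter 2 covers the history of the statistics office.",
]

prompt = build_prompt(
    "How many burglary cases per 100,000 residents?", chunks, k=1
)
print(prompt)
```

The resulting prompt would then be sent to whatever LLM you use. Because the answer is still generated token by token, grounding it this way narrows what the model draws on but does not by itself rule out hallucinations.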