I asked Google Bard whether it thought Web Environment Integrity was a good or bad idea. Surprisingly, not only did it respond that it was a bad idea, it even went on to urge Google to drop the proposal.

  • koper@feddit.nl · 1 year ago

    For the last time: these language models are just regurgitating what people have said. They don’t analyze or reason.

    • localhost@beehaw.org · 1 year ago

      That’s not entirely true.

      LLMs are trained to predict the next word given the context, yes. But to do that, they develop an internal model that minimizes error across a wide range of contexts, and an emergent feature of this process is that the model DOES perform more than pure compression of the training data.

      For example, GPT-3 can solve addition and subtraction problems that didn't appear in its training dataset. This suggests that the model learned how to perform addition and subtraction, likely because that was easier or more efficient than storing every example from the training data separately.

      This is a simple, easy-to-measure example, but it's enough to suggest that LLMs can extrapolate from the training data and do more than just stitch relevant parts of the dataset together.
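      A toy illustration of the memorize-vs-generalize distinction above (not a model of what GPT-3 does internally): a lookup table built from "training" examples scores nothing on unseen problems, while a digit-by-digit rule with carries handles all of them. The training-set size, the held-out grid, and the `add_by_digits` helper are assumptions made up for this sketch.

```python
import random

def add_by_digits(a: int, b: int) -> int:
    """Schoolbook addition: add column by column from the right, carrying a 1 on overflow."""
    result, carry, place = 0, 0, 1
    while a or b or carry:
        column = a % 10 + b % 10 + carry
        result += (column % 10) * place
        carry = column // 10
        a, b, place = a // 10, b // 10, place * 10
    return result

# A pure "memorizer": a lookup table of the 3-digit training examples it has seen.
rng = random.Random(0)
train = {(rng.randrange(100, 1000), rng.randrange(100, 1000)) for _ in range(2000)}
table = {(a, b): a + b for a, b in train}

# Held-out problems: 3-digit pairs that do not appear in the training set.
held_out = [(a, b) for a in range(100, 1000, 37) for b in range(100, 1000, 41)
            if (a, b) not in train]

# table.get returns None for unseen pairs, so the memorizer gets them all wrong.
memorizer_acc = sum(table.get(p) == p[0] + p[1] for p in held_out) / len(held_out)
algorithm_acc = sum(add_by_digits(a, b) == a + b for a, b in held_out) / len(held_out)
print(f"memorizer: {memorizer_acc:.0%}, digit-by-digit rule: {algorithm_acc:.0%}")
# → memorizer: 0%, digit-by-digit rule: 100%
```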

      • fuzzzerd@programming.dev · 1 year ago

        That’s interesting, I’d be curious to read more about that. Do you have any links to get started with? Searching this type of stuff on Google yields less than ideal results.

        • localhost@beehaw.org · 1 year ago

          GPT-3 is pretty bad at it compared to the alternatives (although it's hard to compete with calculators in that field), but if it were just repeating its training dataset, it would be way worse. From the study I linked in my other comment (https://arxiv.org/pdf/2005.14165.pdf):

          On addition and subtraction, GPT-3 displays strong proficiency when the number of digits is small, achieving 100% accuracy on 2 digit addition, 98.9% at 2 digit subtraction, 80.2% at 3 digit addition, and 94.2% at 3-digit subtraction. Performance decreases as the number of digits increases, but GPT-3 still achieves 25-26% accuracy on four digit operations and 9-10% accuracy on five digit operations, suggesting at least some capacity to generalize to larger numbers of digits.

          To spot-check whether the model is simply memorizing specific arithmetic problems, we took the 3-digit arithmetic problems in our test set and searched for them in our training data in both the forms "<NUM1> + <NUM2> =" and "<NUM1> plus <NUM2>". Out of 2,000 addition problems we found only 17 matches (0.8%) and out of 2,000 subtraction problems we found only 2 matches (0.1%), suggesting that only a trivial fraction of the correct answers could have been memorized. In addition, inspection of incorrect answers reveals that the model often makes mistakes such as not carrying a “1”, suggesting it is actually attempting to perform the relevant computation rather than memorizing a table.
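          A minimal sketch of the kind of spot-check the quoted passage describes, assuming plain exact string matching of the two forms against a text corpus. The corpus, the test problems, and the `count_contaminated` helper are hypothetical; this is not the paper's actual search code.

```python
import re

def count_contaminated(problems, corpus):
    """Count (a, b) test problems that appear in the corpus as "a + b =" or "a plus b"."""
    hits = 0
    for a, b in problems:
        patterns = (
            rf"(?<!\d){a} \+ {b} =",        # "<NUM1> + <NUM2> =" form
            rf"(?<!\d){a} plus {b}(?!\d)",  # "<NUM1> plus <NUM2>" form
        )
        # The digit lookarounds stop a number from matching inside a longer number.
        if any(re.search(p, corpus) for p in patterns):
            hits += 1
    return hits

# Made-up corpus and test set, purely for illustration.
corpus = "Quick check: 123 + 456 = 579. Also, 700 plus 200 is 900. And 1321 + 654 = 1975."
problems = [(123, 456), (700, 200), (321, 654)]

hits = count_contaminated(problems, corpus)
print(f"{hits}/{len(problems)} test problems found in corpus ({hits / len(problems):.1%})")
# → 2/3 test problems found in corpus (66.7%)
```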

    • d3Xt3r@beehaw.org (OP) · 1 year ago

      I know. I just thought it was a bit ironic seeing such a strongly worded response from it.

      • MJBrune@beehaw.org · 1 year ago

        What do you mean, source? It's a language model that learned from what people said. No source is needed, just an understanding of how LLMs actually work. When you ask an LLM what the answer to a math question is, it doesn't run a calculation of that question. Instead, it gives you back what it thinks you want to hear. Some LLMs have been given additional abilities, like actually running these calculations, but in the most basic implementation it's telling you what you want to hear, based on a series of tests in which you told it whether it was right or wrong.

        So you teach it what you want to hear and it repeats it.
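        To make the "it doesn't run a calculation" point concrete, here is a minimal sketch of greedy next-token selection: the model assigns a score to each candidate token, the scores are turned into probabilities with a softmax, and the most probable token is emitted. The candidate tokens and their scores are invented for illustration; a real LLM scores its whole vocabulary.

```python
import math

def softmax(scores):
    """Turn raw scores (logits) into a probability distribution."""
    m = max(scores.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(s - m) for tok, s in scores.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical scores a model might assign to the next token after "2 + 2 =".
logits = {"4": 6.0, "5": 2.5, "four": 3.0, "22": 1.0}
probs = softmax(logits)

# Greedy decoding: emit the most probable token; no arithmetic is evaluated here.
next_token = max(probs, key=probs.get)
print(next_token, round(probs[next_token], 3))
```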

        • novibe · 1 year ago

          That ignores all the papers on emergent features of LLMs and the fact that they are basically black boxes. Yes, we “trained” them to write what we want to hear. But we don’t really understand what happens inside them. We can’t categorically claim things like “they are only regurgitating what they heard”, because that is not a scientific or even a philosophical statement.

          If you think about it for a second, it’s also applicable to human beings…

          • MJBrune@beehaw.org · 1 year ago

            To assume otherwise would be incorrect given the data we currently have. You shouldn’t assume something is doing more than it is until that can be proven. Otherwise, you get rocks that keep tigers away.

            • novibe · 1 year ago

              I think assuming what you assume is also incorrect given current data.

              And that’s my entire point. What is it doing? How is what it’s doing different from a mind or intelligence?

              Our brains and minds also evolved to “fill in the blank” in many situations, through survival pressure and millions of years of selection. So what is the actual difference?

              I’m not saying it’s “conscious”, but why is it not a mind?

        • Elise@beehaw.org · 1 year ago

          I’ve actually developed quite a bit with GPT-4, have beta access, and have written some fairly fancy prompts, if I do say so myself.

          Telling me ‘isn’t it obvious’ doesn’t make it more obvious to me.

      • graham1@gekinzuku.com · 1 year ago

        Large language models literally do subspace projections on text to break it into contextual chunks, and then memorize the chunks. That’s how they’re defined.

        Source: the paper that defined the transformer architecture and formulas for large language models, which has been cited 85,000 times in academic sources alone: https://arxiv.org/abs/1706.03762
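        For reference, the "subspace projections" in that paper are the per-head linear maps in multi-head attention, where each head computes Attention(QW_i^Q, KW_i^K, VW_i^V) with W_i^Q of shape d_model × d_k. The sketch below shows only that projection step, with random weights standing in for learned ones and small illustrative dimensions (the paper's base model uses d_model = 512 and h = 8); it takes no position on whether the result amounts to "memorizing chunks".

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 16, 4
d_k = d_model // n_heads  # each head works in a d_k-dimensional subspace

X = rng.normal(size=(seq_len, d_model))          # one embedding per token (random stand-ins)
W_Q = rng.normal(size=(n_heads, d_model, d_k))   # per-head projections (random here, learned in practice)

# Project every token into each head's lower-dimensional subspace:
# Q[h] == X @ W_Q[h] for each head h.
Q = np.einsum("td,hdk->htk", X, W_Q)
print(Q.shape)  # (4, 5, 4): heads × tokens × d_k
```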

        • notfromhere@lemmy.one · 1 year ago

          Hey, that comment’s a bit off the mark. Transformers don’t just memorize chunks of text, they’re way more sophisticated than that. They use attention mechanisms to figure out what parts of the text are important and how they relate to each other. It’s not about memorizing, it’s about understanding patterns and relationships. The paper you linked doesn’t say anything about these models just regurgitating information.
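          A minimal single-head sketch of the attention mechanism mentioned here, following the paper's definition Attention(Q, K, V) = softmax(QK^T / √d_k)V, without masking and with random inputs in place of real projected embeddings. The `weights` matrix is what "figuring out which parts of the text are important" refers to: each row is a probability distribution over the tokens being attended to.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for a single head, no mask."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                # similarity of each query to every key
    scores -= scores.max(axis=-1, keepdims=True)   # subtract row max for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True) # row-wise softmax
    return weights @ V, weights

rng = np.random.default_rng(1)
Q, K, V = (rng.normal(size=(3, 4)) for _ in range(3))  # 3 tokens, d_k = 4
out, weights = scaled_dot_product_attention(Q, K, V)
print(weights.round(2))     # attention weights, one row per query token
print(weights.sum(axis=1))  # each row sums to (approximately) 1
```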

          • graham1@gekinzuku.com · 1 year ago

            I believe your “They use attention mechanisms to figure out which parts of the text are important” is just a restatement of my “break it into contextual chunks”, no?

      • Pfnic@feddit.ch · 1 year ago

        Yes, because online discussions usually aren’t inherently subjective and are instead backed by sourceable knowledge. Sorry for the cynicism, but one can always find a source that supports any point, so everything should be taken with a grain of salt.

        I’d personally argue that the way generative AI works lends itself to producing answers that fit the general consensus of the internet on the topic of the prompt, because it calculates the most likely response based on the information available. Since most information relevant to “Google Web DRM” is critical of it (Google doesn’t call it DRM themselves), it makes sense that a prompt asking the AI for opinions on Web DRM will result in a rather negative response, unless Google tampers with it to their advantage.
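        A deliberately crude illustration of that argument: a count-based "model" that returns whichever word most often follows a prefix in its training text. The corpus is invented, and real LLMs learn a smoothed distribution and may be further tuned after pretraining, so this only shows the majority-vote tendency the comment describes, not how Bard actually works.

```python
from collections import Counter
from typing import List, Optional

# Hypothetical toy "training data": invented opinions about a proposal.
corpus = [
    "web drm is bad for the open web",
    "web drm is bad for users",
    "web drm is bad and should be dropped",
    "web drm is fine for fraud prevention",
]

def most_likely_next(prefix: str, docs: List[str]) -> Optional[str]:
    """Return the word that most often follows `prefix` in `docs`, or None if it never occurs."""
    target = prefix.split()
    n = len(target)
    counts = Counter()
    for doc in docs:
        words = doc.split()
        for i in range(len(words) - n):
            if words[i:i + n] == target:
                counts[words[i + n]] += 1
    return counts.most_common(1)[0][0] if counts else None

print(most_likely_next("web drm is", corpus))  # bad
```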