• mo_ztt ✅@lemmy.world · 3 points · 1 year ago

    LLMs doing a really bad job at things outside the scope of their language model (spatial relationships, the alphabet, math) isn’t new. That said, I think Google letting an LLM into its search results more than it should is important news.

    This is ChatGPT 3.5:

    If I face north, then turn left, and left again, then 180 degrees to the right, then left and left and left, which way am I facing?

    If you start facing north and turn left, you’ll be facing west. If you turn left again, you’ll be facing south. Turning 180 degrees to the right from south will also make you face south. If you then turn left three more times, you’ll still be facing south. So, after all these turns, you would be facing south.

    GPT-4 is actually able to solve this, but most other modern language models can’t. It’s just an easy way to show the cracks in the “because it can speak means it understands” facade.

    • lloram239@feddit.de · 2 points · 1 year ago

      You can have GPT-3.5 solve it by simply adding “Make a list of each step and print the current direction”.

      This is less a problem of understanding and more a problem of it having no internal buffer and no ability to loop, meaning it simply can’t solve large problems in one go. It can’t think for itself and just tell you the answer. The text output is its thinking space, so questions need to be phrased in such a way that they let it go through the individual steps.
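
      As a concrete illustration, here’s a minimal sketch of that comparison using the OpenAI Python client (assumes the openai package is installed and OPENAI_API_KEY is set in the environment; the wording is just a placeholder):

      ```python
      # Sketch: compare GPT-3.5 with and without the "list each step" suffix.
      # Assumes `pip install openai` and OPENAI_API_KEY in the environment.
      from openai import OpenAI

      client = OpenAI()

      PUZZLE = ("If I face north, then turn left, and left again, then 180 degrees "
                "to the right, then left and left and left, which way am I facing?")
      SUFFIX = " Make a list of each step and print the current direction."

      def ask(prompt: str) -> str:
          # One chat completion call; the model's only working memory is the text itself.
          resp = client.chat.completions.create(
              model="gpt-3.5-turbo",
              messages=[{"role": "user", "content": prompt}],
          )
          return resp.choices[0].message.content

      print("Without steps:\n", ask(PUZZLE))
      print("With steps:\n", ask(PUZZLE + SUFFIX))
      ```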

      • mo_ztt ✅@lemmy.world · 1 point · 1 year ago

        But it does have internal buffer space. You can give it a massive chunk of text and it can parrot it right back to you perfectly. It just can’t make use of that internal buffer space for tracking and reasoning about the problem, which I would attribute to it not being able to reason in the first place.

        I mean, you are correct that adding that instruction leads to it succeeding:

        If I face north, then turn left, and left again, then 180 degrees to the right, then left and left and left, which way am I facing? Make a list of each step and print the current direction.

        Sure, let’s break down each step and track the direction you’re facing:

        • Start facing north.
        • Turn left (90 degrees counterclockwise) - Now facing west.
        • Turn left again (90 degrees counterclockwise) - Now facing south.
        • Turn 180 degrees to the right (180 degrees clockwise) - Now facing north again.
        • Turn left (90 degrees counterclockwise) - Now facing west.
        • Turn left (90 degrees counterclockwise) - Now facing south.
        • Turn left (90 degrees counterclockwise) - Now facing east.

        So, after following these steps, you would be facing east.

        But I would say that if it had understanding, it would be able to do that for itself, because it would know that it was required in order to correctly answer the question. Or it would say “I’m sorry, I can’t quite make sense of that situation without talking through it one step at a time.” Or something like that. Here, take a look at these – does it seem like these are an issue of buffer space instead of understanding?

        What are the countries in Africa that start with a “K”?

        There are three countries in Africa whose names start with the letter “K”:

        • Kenya
        • Kiribati
        • Comoros

        Please note that Kiribati is not entirely located in Africa; it is a Pacific island nation. It seems there might be some confusion, as there are no other African countries that start with the letter “K” as of my last knowledge update in September 2021.

        Or:

        What are the countries in Africa that start with a “K”? Make a list of each step and print the current direction.

        Sure, here’s a list of African countries that start with the letter “K,” along with each step and the current direction:

        • Start
        • Algeria
        • Angola
        • Benin
        • Botswana
        • Burkina Faso
        • Burundi
        • Cape Verde
        • Cameroon
        • Central African Republic
        • Chad
        • Comoros
        • Congo, Democratic Republic of the
        • Congo, Republic of the
        • Cote d’Ivoire
        • Djibouti
        • Egypt
        • Equatorial Guinea
        • Eritrea
        • Eswatini
        • Ethiopia
        • Gabon
        • Gambia
        • Ghana
        • Guinea
        • Guinea-Bissau
        • Kenya
        • Lesotho
        • Liberia
        • Libya
        • Madagascar
        • Malawi
        • Mali
        • Mauritania
        • Mauritius
        • Morocco
        • Mozambique
        • Namibia
        • Niger
        • Nigeria
        • Rwanda
        • Sao Tome and Principe
        • Senegal
        • Seychelles
        • Sierra Leone
        • Somalia
        • South Africa
        • South Sudan
        • Sudan
        • Tanzania
        • Togo
        • Tunisia
        • Uganda
        • Zambia
        • Zimbabwe
        • End

        Please note that the information provided is based on my knowledge cutoff date in September 2021, and there might have been changes since then.

        • lloram239@feddit.de · 1 point · 1 year ago

          You can give it a massive chunk of text and it can parrot it right back to you perfectly. It just can’t make use of that internal buffer space for tracking and reasoning about the problem, which I would describe as because it can’t reason in the first place.

          That’s not the internal buffer, that’s the external buffer, also known as the text prompt. The point is that there is no internal buffer that is hidden from you (apart from some control instructions that drive the chat GUI, which OpenAI hides). GPT gets the last ~2000 words of the text prompt, which includes both your questions and its own answers, and predicts the next word. This is done by the text prompt taking one round through the network, which is plain matrix multiplication, with no loops or other complicated control structure. It fundamentally can’t loop or write down hidden notes, meaning the current state of its “thinking” has to be written down in the text prompt, as that’s the only dynamic state that exists in the system. That means if you force GPT to give a short answer, it will have less “thinking time” than when you let it give a long answer. GPT is fundamentally unable to think a bit longer for itself and then give a short answer when it is done. The length of the answer is directly related to how much time it spends thinking about it, which in turn is why forcing it to break the problem down into steps will often lead to better answers.
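
          To make that concrete, here’s a toy sketch of the generation loop in Python (the next_token function is a made-up stand-in for one forward pass through the network, not anything real):

          ```python
          # Toy sketch of autoregressive decoding. `next_token` stands in for one
          # forward pass through the network: a pure function of the visible text,
          # with no hidden notes and no loops inside a single step.
          def next_token(context: str) -> str:
              # Placeholder: a real model would score every possible next token here.
              return "south." if context.endswith("facing") else "<end>"

          def generate(prompt: str, max_tokens: int = 50) -> str:
              text = prompt
              for _ in range(max_tokens):      # the ONLY loop is over output tokens
                  token = next_token(text)     # one pass: matrix multiplications only
                  if token == "<end>":
                      break
                  text += " " + token          # the appended text is the only "memory"
              return text

          print(generate("If I face north and turn left twice, I am facing"))
          ```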

          What are the countries in Africa that start with a “K”?

          Everything involving letters and digits is problematic with GPT, as GPT does not operate on letters, but on tokens, i.e. words or word fragments. It has no direct access to the starting letter of a word, and while it can guess them with some level of success, it very easily gets them wrong when embedded in larger questions.
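
          You can see this directly with OpenAI’s tiktoken library (a sketch; assumes `pip install tiktoken`): the model is fed integer token IDs, not letters:

          ```python
          # Sketch: how GPT-3.5 actually "sees" country names, as token IDs rather
          # than letters. Assumes `pip install tiktoken`.
          import tiktoken

          enc = tiktoken.encoding_for_model("gpt-3.5-turbo")

          for word in ["Kenya", "Kiribati", "Comoros"]:
              ids = enc.encode(word)
              pieces = [enc.decode([i]) for i in ids]
              # The model receives the integer IDs; the spelling "K..." is never
              # directly visible to it.
              print(f"{word!r} -> ids {ids} -> pieces {pieces}")
          ```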

          • mo_ztt ✅@lemmy.world · 2 points · 1 year ago

            I do know the basics of how LLMs work, yes. My point was that the process it uses for next-word prediction is inherently capable of, in effect, iterating over a sequence of tokens and processing it step by step. For example, it’s got the capability to look back in its context and see that:

            QUERY_TOKEN “And so I told him the story.” Repeat that quotation back to me. ANSWER_TOKEN "And so I told him

            Needs to complete with “the”. That’s trivial for good LLMs and they can get it perfect every time. There’s nothing that would prevent that same logic from completing:

            QUERY_TOKEN If I face north, then turn left, and left again, then 180 degrees to the right, then left and left and left, which way am I facing? ANSWER_TOKEN If you start facing north and turn left, you’ll be facing west. If you turn left again, you’ll be facing south. Turning 180 degrees to the right from south will make you face

            … as “north.” The problem is not the need for some internal buffer separate from the context. The problem is that it can’t handle directions and spatial orientations with the same fluency that it handles language; if it could, it’d solve the second problem just as readily as the first.
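
            If you want to check that side by side, here’s a rough sketch of the experiment with the OpenAI Python client (the prompt wording, model name, and “one word” instruction are my own assumptions, purely for illustration):

            ```python
            # Sketch: ask the model to continue each partial answer with just the
            # next word, comparing the quote-copying case with the direction case.
            # Assumes the openai package and OPENAI_API_KEY are available.
            from openai import OpenAI

            client = OpenAI()

            PREFIXES = [
                # Case 1: the next word can be copied straight out of the context.
                '"And so I told him the story." Repeat that quotation back to me.\n'
                'Answer so far: "And so I told him',
                # Case 2: the next word has to be derived from the turn instructions.
                "If I face north, then turn left, and left again, then 180 degrees to the "
                "right, then left and left and left, which way am I facing?\n"
                "Answer so far: If you start facing north and turn left, you'll be facing "
                "west. If you turn left again, you'll be facing south. Turning 180 degrees "
                "to the right from south will make you face",
            ]

            for prefix in PREFIXES:
                resp = client.chat.completions.create(
                    model="gpt-3.5-turbo",
                    messages=[{"role": "user",
                               "content": f"Continue this with exactly one word:\n\n{prefix}"}],
                    max_tokens=3,
                )
                print(resp.choices[0].message.content.strip())
            ```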

            • lloram239@feddit.de · 1 point · 1 year ago

              There’s nothing that would prevent that same logic from completing

              The whole text prompt is one step. It doesn’t iterate on the input tokens, it only iterates on the output tokens, which are generated one by one. So when a problem requires multiple steps, GPT will struggle to solve it in a single step.

              • mo_ztt ✅@lemmy.world · 1 point · edited · 1 year ago

                I’m not sure how to distill the point I’m trying to make down any further. The basics of what you’re saying are 100% accurate, yes, but look back at the two specific examples I gave. Are you asserting that an LLM inherently can’t process the second example, because it would all have to be done in one step, but that it can process the first (also in one step)? Can’t you see my point that the two examples are identical in the respect that the LLM needs to identify which part of the input sequence applies to the place it’s currently at in the output sequence?

                Edit: Actually, second counterpoint: How, if you’re saying that this is just an inherent limitation of LLMs, can GPT-4 do it?

                • lloram239@feddit.de · 1 point · 1 year ago

                  The difference is that repeating a quote does not need new information; it’s all already in the text prompt. The current direction, on the other hand, is not in the text; it has to be derived from the instructions. If you ask GPT to break the problem down into steps, you shrink the size of the problem dramatically. It can handle one or two turns in one step; it’s only when you increase the number of turns that it gets it wrong and can’t answer in one step.

                  It’s really not much different from humans here. If I read all those turn instructions, I have no idea where things will end up either. I have to break the problem down and keep track of the direction at each step.
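
                  That bookkeeping is trivial once it’s made explicit; a few lines of Python (a made-up helper, just to show the per-step state a solver has to carry) do the same job:

                  ```python
                  # Made-up helper: track the facing direction step by step, the way
                  # the "list each step" prompt forces the model to do.
                  HEADINGS = ["north", "east", "south", "west"]

                  def follow_turns(start: str, turns: list[int]) -> str:
                      """Apply turns given in degrees (positive = right/clockwise,
                      negative = left/counterclockwise) and return the final heading."""
                      idx = HEADINGS.index(start)
                      for degrees in turns:
                          idx = (idx + degrees // 90) % 4
                      return HEADINGS[idx]

                  # The puzzle from this thread: left, left, 180 right, then left three times.
                  print(follow_turns("north", [-90, -90, 180, -90, -90, -90]))  # -> east
                  ```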

                  How, if you’re saying that this is just an inherent limitation of LLMs, can GPT-4 do it?

                  GPT-4 is just bigger, meaning it can handle larger problems in one step. It should still fail if you take the same simple problem and make it longer.

  • AutoTL;DR@lemmings.world [bot] · 1 point · 1 year ago

    This is the best summary I could come up with:


    And to make matters worse, this featured claim actually comes from a bizarre, seemingly AI-generated blog post about a conversation between a human user and ChatGPT.

    Google’s AI-infused “SGE” search went as far as to vacuum up the ChatGPT-spun garbage and paraphrase it for users — a clear sign that the much-hyped feature doesn’t yet know how to filter out blatantly false information.

    Here’s what SGE told us when we asked it to provide us with “countries in Africa that start with the letter K,” again referencing that sleazy Emergent Mind post:

    When asked to provide a list of “countries in North America that start with the letter M,” for instance, Google SGE was deeply confused:

    “We’ve built in strong protections to mitigate against inaccuracies, but like all LLM-based experiences, generative AI in Search is experimental and can make mistakes,” the spokesperson told us over email.

    Asking ChatGPT to repeat letters, for example, breaks its tiny little digital brain, and when prompted with similar geography-by-alphabet questions, the OpenAI-built bot didn’t fare much better than its Google-made counterpart.


    I’m a bot and I’m open source!