• mo_ztt ✅@lemmy.world

    I do know the basics of how LLMs work, yes. My point was that the process it uses for next-word prediction is inherently capable of, in effect, iterating over a sequence of tokens and processing it step by step. For example, it can look back in its context and see that:

    QUERY_TOKEN “And so I told him the story.” Repeat that quotation back to me. ANSWER_TOKEN "And so I told him

    needs to complete with “the”. That’s trivial for good LLMs, and they can get it perfect every time. There’s nothing that would prevent that same logic from completing:

    QUERY_TOKEN If I face north, then turn left, and left again, then 180 degrees to the right, then left and left and left, which way am I facing? ANSWER_TOKEN If you start facing north and turn left, you’ll be facing west. If you turn left again, you’ll be facing south. Turning 180 degrees to the right from south will make you face

    … as “north.” The problem is not the need for some internal buffer separate from the context. The problem is that it can’t handle directions and spatial orientation as thoroughly as it handles language; if it could, it’d solve the second problem just as readily as the first.
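
    To make the contrast concrete, here’s a rough Python sketch (my own toy illustration, not anything the model literally runs) of the bookkeeping the second example asks for. The turn sequence comes straight from the prompt above; the direction-tracking logic is just my assumption about how you’d keep score:

    ```python
    # Toy state-tracking for the turning puzzle: the "current facing" is a
    # piece of state that has to be updated once per instruction.
    DIRECTIONS = ["north", "east", "south", "west"]  # clockwise order

    def turn(facing: str, move: str) -> str:
        """Apply one move ('left', 'right', or '180') and return the new facing."""
        step = {"left": -1, "right": 1, "180": 2}[move]
        return DIRECTIONS[(DIRECTIONS.index(facing) + step) % 4]

    facing = "north"
    for move in ["left", "left", "180", "left", "left", "left"]:
        facing = turn(facing, move)
        print(move, "->", facing)

    print("answer:", facing)
    ```

    Repeating a quotation needs none of that bookkeeping: every output token is already sitting verbatim in the context.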

    • lloram239@feddit.de

      There’s nothing that would prevent that same logic from completing

      The whole text prompt is one step. It doesn’t iterate on the input tokens; it only iterates on the output tokens, which are generated one by one. So when a problem has multiple steps, GPT will struggle to answer it in a single step.
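
      As a sketch of what I mean (greedy decoding, schematically; `next_token` is a stand-in for the real model and just replays a canned answer here so the loop actually runs):

      ```python
      # The prompt arrives as a single block of input; all iteration happens over
      # the output tokens, which are produced one per decoding step.
      PROMPT = "If I face north, then turn left, ... which way am I facing?".split()
      CANNED = ["If", "you", "start", "facing", "north", "...", "<eos>"]  # fake model output

      def next_token(context):
          # Stand-in for running the model over `context` and taking the top token.
          return CANNED[len(context) - len(PROMPT)]

      context = list(PROMPT)            # the whole question is consumed at once
      while True:
          tok = next_token(context)     # one step = one new output token
          if tok == "<eos>":
              break
          context.append(tok)           # its only working memory is the growing context

      print(" ".join(context[len(PROMPT):]))
      ```

      Any multi-step reasoning either has to fit inside one of those single-token steps or has to be spelled out in the output itself.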

      • mo_ztt ✅@lemmy.world

        I’m not sure how to distill the point I’m trying to make down any further. The basics of what you’re saying are 100% accurate, yes, but look back at the two specific examples I gave. Are you asserting that an LLM inherently can’t process the second example, because it would all have to be done in one step, but at the same time it can process the first (in one step)? Do you see what I’m saying: that the two examples are identical in the sense that the LLM has to identify which part of the input sequence applies to where it currently is in the output sequence?

        Edit: Actually, second counterpoint: How, if you’re saying that this is just an inherent limitation of LLMs, can GPT-4 do it?

        • lloram239@feddit.de

          The difference is that repeating a quote doesn’t need new information; it’s all already in the text prompt. The current direction, on the other hand, is not in the text; it has to be derived from the instructions. If you ask GPT to break the problem down into steps, you shrink the size of the problem dramatically. One or two turns it can handle in one step; it’s only when you increase the number of turns that it gets it wrong and can’t answer in a single step.

          It’s really not much different from humans here. If I read all those turn instructions, I have no idea where things will end up either. I have to break the problem down and keep track of the direction at each step.
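
          Again as a toy (same made-up turn logic as the sketch further up the thread): once each intermediate direction is written out, it sits in the context for the next step to read, which is what breaking the problem down buys you.

          ```python
          # Writing out every intermediate facing "externalizes" the state: each later
          # step only has to read the previous line, not re-derive the whole chain.
          DIRECTIONS = ["north", "east", "south", "west"]

          def turn(facing, move):
              step = {"left": -1, "right": 1, "180": 2}[move]
              return DIRECTIONS[(DIRECTIONS.index(facing) + step) % 4]

          facing = "north"
          transcript = ["You start facing north."]
          for move in ["left", "left", "180", "left", "left", "left"]:
              facing = turn(facing, move)
              transcript.append(f"After turning {move}, you are facing {facing}.")

          print("\n".join(transcript))
          ```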

          How, if you’re saying that this is just an inherent limitation of LLMs, can GPT-4 do it?

          GPT-4 is just bigger, meaning it can handle larger problems in one step. It should still fail if you take the same simple problem and just make it longer.

          • mo_ztt ✅@lemmy.world

            Hm… yeah, I see what you’re saying. It’s not capable of maintaining “hidden” state as it goes step by step through the output, but if you have it talk its way through the hidden part of the state, it can do it. I can agree with that.