• LifeInMultipleChoice
    1 day ago

    So if you give a human and a system 10 tasks, and the human completes 3 correctly, 5 incorrectly and fails to complete 3 altogether… and then you give those same 10 tasks to the software and it does 9 correctly and fails to complete 1, what does that mean? In general I’d say the tasks need to be defined, since I could give people plenty of tasks right now that language models can solve and they can’t. But in my opinion, language models aren’t “AGI”.

      • notfromhere
        12 hours ago

        “Any” is very hard to benchmark, and it’s also not how humans are tested.

    • hendrik@palaver.p3x.de
      1 day ago

      Agree. And these tasks can’t be tailored to the AI just to give it a chance. It needs to drive to work, fix the computers/plumbing/whatever there, earn a decent salary, return with some groceries and cook dinner. Or at least do something comparable to what a human does. Just wording emails and writing boilerplate computer code isn’t enough in my eyes, especially since it still struggles even with that. It’s the “general” part that is missing.

      • NeverNudeNo13@lemmings.world
        11 hours ago

        On the same hand… “Fluently translate this email into 10 random and distinct languages” is a task that 99.999% of humans would fail but that a language model should be able to handle.

        • hendrik@palaver.p3x.de
          10 hours ago

          Agree. That’s a super useful thing LLMs can do. I’m still waiting for Mozilla to integrate Japanese and a few other languages (distant to me) into my browser. And LLM translation is a huge step up from Google Translate: to a degree, it can handle proverbs, nuance, tone… There are a few things AI or machine learning can do very well, outperforming any human by a decent margin.

          On the other hand, we’re talking about general intelligence here, and translating is just one niche task. By definition, that’s narrow intelligence. But it’s indeed very useful to have, and I hope it will connect people and broaden their (and my) horizons.

      • Free_Opinions@feddit.uk
        21 hours ago

        It needs to drive to work, fix the computers/plumbing/whatever there, earn a decent salary and return with some groceries and cook dinner.

        This is more about robotics than AGI. A system can be generally intelligent without having a physical body.

        • hendrik@palaver.p3x.de
          14 hours ago

          You’re, of course, right. Though I’m always a bit unsure about exactly that. We also don’t attribute intelligence to books, for example an encyclopedia or Wikipedia. They have a lot of knowledge stored, yet they aren’t intelligent. That makes me believe being intelligent has something to do with being able to apply knowledge and do something with it. And outputting text is just one very limited way of interacting with the world.

          And since we’re using humans as the benchmark for the “general” part in AGI… Humans have several senses and can interact with their environment in lots of ways, and 90% of that isn’t drawing or communicating with words. That makes me wonder: where exactly is the boundary between an encyclopedia and an intelligent entity? Is intelligence a useful metric if we exclude being able to do anything useful with it? And how much do we leave out by not factoring in parts of the environment/world?

          And is there a difference between being book-smart and being intelligent? Because LLMs certainly get all of their information second-hand and filtered in some way. They can’t really see the world itself, smell it, touch it, manipulate something and observe the consequences… They only get a textual description of what someone did and put into words in some book or text on the internet. Is that a minor or a major limitation, and do we know for sure it doesn’t matter?

          (Plus, I think we need to get “hallucinations” under control. That’s also not 100% “intelligence”, but it also cuts into actual use if that intelligence isn’t reliably there.)