So-called “emergent” behavior in LLMs may not be the breakthrough that researchers think.

  • UraniumBlazer@lemm.ee · 49 points · 8 months ago

    TLDR: Let’s say you want to teach an LLM a new skill. You give it training data pertaining to that skill. Currently, researchers believe that this skill development shows up suddenly, in a breakthrough fashion. They think so because of the metrics they use to measure the skill: measured skill levels remain very low until they unpredictably jump up like crazy. This is the “breakthrough”.

    BUT, the paper that this article references points out flaws in those methods of measuring skills. It suggests that breakthrough behavior doesn’t really exist and that skill development is actually quite predictable.
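    The measurement argument can be sketched in a few lines: if a model’s per-token accuracy improves smoothly with scale, an all-or-nothing metric like exact match over a multi-token answer still looks like a sudden jump. The numbers below are illustrative, not taken from the paper:

    ```python
    # Sketch: a smoothly improving per-token accuracy looks like a sudden
    # "breakthrough" under an all-or-nothing metric. Illustrative numbers only.

    def exact_match_rate(per_token_acc: float, answer_len: int) -> float:
        """Probability of getting every token of an answer right,
        assuming independent per-token errors."""
        return per_token_acc ** answer_len

    # Per-token accuracy improving smoothly across hypothetical model scales
    for p in [0.50, 0.70, 0.85, 0.95, 0.99]:
        em = exact_match_rate(p, answer_len=10)
        print(f"per-token acc {p:.2f} -> exact match {em:.4f}")
    ```

    Exact match stays near zero for most of the range and then shoots up at the end, even though the underlying per-token accuracy never jumps — which is roughly the paper’s point about metric choice.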

    Also, uhhh I’m not AI (I see that TLDR bot lurking everywhere, which is what made me specify this).

  • Norgur@fedia.io · 27 points · 8 months ago

    What always irks me about those “emergent behavior” articles: no one ever really defines what those amazing “skills” are supposed to be.

  • kromem@lemmy.world · 6 points · 8 months ago

    I’m not sure why they are describing it as “a new paper” - this came out in May of 2023 (and as such notably only used GPT-3 and not GPT-4, which was where some of the biggest leaps to date have been documented).

    For those interested in the debate on this, the rebuttal by Jason Wei (from the original emergent abilities paper, and also the guy behind the CoT prompting paper) is interesting: https://www.jasonwei.net/blog/common-arguments-regarding-emergent-abilities

    In particular, I find his argument at the end compelling:

    Another popular example of emergence which also underscores qualitative changes in the model is chain-of-thought prompting, for which performance is worse than answering directly for small models, but much better than answering directly for large models. Intuitively, this is because small models can’t produce extended chains of reasoning and end up confusing themselves, while larger models can reason in a more-reliable fashion.
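    The contrast Wei describes comes down to the shape of the prompt itself. A minimal sketch of few-shot direct prompting versus chain-of-thought prompting (the exemplar and question here are made up for illustration, not taken from the paper):

    ```python
    # Sketch: direct vs. chain-of-thought (CoT) few-shot prompting.
    # The exemplar and question are illustrative.

    EXEMPLAR_Q = ("Roger has 5 balls. He buys 2 cans of 3 balls each. "
                  "How many balls does he have now?")
    EXEMPLAR_A_DIRECT = "11"
    EXEMPLAR_A_COT = ("Roger started with 5 balls. 2 cans of 3 balls is 6 balls. "
                      "5 + 6 = 11. The answer is 11.")

    def build_prompt(question: str, chain_of_thought: bool) -> str:
        """Few-shot prompt: one worked exemplar, then the new question.
        With CoT, the exemplar shows intermediate reasoning steps."""
        answer = EXEMPLAR_A_COT if chain_of_thought else EXEMPLAR_A_DIRECT
        return f"Q: {EXEMPLAR_Q}\nA: {answer}\n\nQ: {question}\nA:"

    q = "A jug holds 4 liters. How many full jugs are needed for 11 liters?"
    print(build_prompt(q, chain_of_thought=False))
    print(build_prompt(q, chain_of_thought=True))
    ```

    The only difference is that the CoT exemplar demonstrates reasoning steps before the answer — which, per Wei’s argument, hurts small models (they derail mid-chain) but helps large ones.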

    If you’ve been following the evolution of prompting research lately, there’s definitely a pattern of reliance on increased inherent capabilities.

    Whether that’s using analogy to solve similar problems (https://openreview.net/forum?id=AgDICX1h50) or self-determining the optimal strategy for a given problem (https://arxiv.org/abs/2402.03620), there are double-digit performance gains in state-of-the-art models from having them perform actions that less sophisticated models simply cannot achieve.

    The compounding effects of competence alone mean that progress here isn’t going to be a linear trajectory.

  • spujb@lemmy.cafe · +1 / −1 · 8 months ago

    what material benefit does having a cutesy representation of phrenology, a pseudoscience used to justify systematic racism, bring to this article or discussion?