• everett
    link
    fedilink
    arrow-up
    9
    arrow-down
    3
    ·
    5 months ago

    What am I missing here?

    “L” “M” “M”

      • blindsight@beehaw.org
        link
        fedilink
        arrow-up
        6
        ·
        5 months ago

        Correct.

        large language models (LLM) vs. large multi-modal models (LMM)

        Regardless, they both use an LLM as the main driver. Multi modal just means that the LLM is interfaced with generative and/or predictive AIs for other types of content like images, sound, video, etc.

        This is using a generalist tool for a specialized job. I’d expect the limit for LMMs is telling you if your picture is a heart or a kidney… Maybe. With low accuracy. Diagnosing? lol, hell no.