• Alex

    Does this use the same attention architecture as traditional tokenisation? As far as I understood it, each token has a bunch of meaning associated with it, encoded in a vector. (A quick sketch of that token-embedding setup is below the thread.)

    • hendrik@palaver.p3x.deOP

      Uh, I’m not sure. I haven’t had the time yet to read those papers. I suppose the Byte Latent Transformer does; it’s still some kind of transformer architecture. With the Large Concept Models, I’m not so sure. They encode whole sentences, and the researchers explore something like three different (diffusion) architectures. The paper calls itself a “proof of feasibility”, so it’s more basic research into that approach than one single, specific model architecture.
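
A minimal sketch of the setup Alex describes, i.e. the usual tokeniser-based pipeline: each token id looks up a learned embedding vector, and self-attention runs over those vectors. This is purely illustrative and not taken from either paper; the vocabulary size, model width and token ids are made up.

```python
# Illustrative only: classic token-embedding + self-attention pipeline.
import torch
import torch.nn as nn

vocab_size, d_model, n_heads = 32_000, 512, 8   # made-up sizes

embed = nn.Embedding(vocab_size, d_model)        # one learned vector per token id
attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

token_ids = torch.tensor([[17, 4021, 9, 311]])   # ids from a traditional tokeniser
x = embed(token_ids)                             # (1, 4, 512): the per-token "meaning" vectors
out, weights = attn(x, x, x)                     # self-attention over the token vectors
print(out.shape, weights.shape)                  # (1, 4, 512) and (1, 4, 4)
```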
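And a toy sketch of the sentence-level idea hendrik mentions for the Large Concept Models: encode each whole sentence into a single “concept” vector and let the model attend over sentences instead of tokens. The mean-pooling encoder below is a stand-in I made up; the actual paper presumably uses a proper pretrained sentence encoder and explores diffusion-based variants on top, so treat this only as an illustration of the change in granularity.

```python
# Toy sketch, not the actual Large Concept Model pipeline.
import torch
import torch.nn as nn

vocab_size, d_model = 32_000, 512                # made-up sizes
embed = nn.Embedding(vocab_size, d_model)

def encode_sentence(token_ids: list[int]) -> torch.Tensor:
    """Collapse a whole sentence of token ids into one vector (toy encoder)."""
    vecs = embed(torch.tensor(token_ids))        # (sentence_len, 512)
    return vecs.mean(dim=0)                      # (512,): one vector per sentence

sentences = [[17, 4021, 9], [311, 88, 5120, 7], [42]]
concepts = torch.stack([encode_sentence(s) for s in sentences]).unsqueeze(0)  # (1, 3, 512)

# The model now attends over 3 sentence vectors instead of 8 token vectors.
attn = nn.MultiheadAttention(d_model, num_heads=8, batch_first=True)
out, _ = attn(concepts, concepts, concepts)
print(out.shape)                                 # (1, 3, 512)
```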