• ☆ Yσɠƚԋσʂ ☆ (OP) · 2 days ago

    The reason they can charge less is that it’s a more efficient algorithm, which means it uses less power. They leveraged a mixture-of-experts architecture to get far better performance than traditional dense models. While it has 671 billion parameters overall, it only activates 37 billion at a time for any given token, making it very efficient. For comparison, Meta’s Llama 3.1 uses all 405 billion of its parameters at once. You can read all about it here: https://arxiv.org/abs/2405.04434
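
    To make the sparse-activation idea concrete, here is a minimal NumPy sketch of top-k expert routing. The layer sizes, number of experts, and routing details are made up for illustration and are not DeepSeek’s actual configuration (the paper linked above describes their much more elaborate fine-grained routing); the point is only that each token touches a small fraction of the total weights.

    ```python
    # Illustrative top-k mixture-of-experts routing (toy sizes, not DeepSeek's).
    import numpy as np

    rng = np.random.default_rng(0)

    d_model, d_hidden = 64, 256
    n_experts, top_k = 8, 2          # only top_k experts run per token

    # Each expert is a small feed-forward net; most sit idle for any given token.
    experts = [
        (rng.standard_normal((d_model, d_hidden)) * 0.02,
         rng.standard_normal((d_hidden, d_model)) * 0.02)
        for _ in range(n_experts)
    ]
    router = rng.standard_normal((d_model, n_experts)) * 0.02

    def moe_forward(x):
        """Route one token through only its top_k experts."""
        logits = x @ router                          # score every expert
        top = np.argsort(logits)[-top_k:]            # pick the best k
        weights = np.exp(logits[top])
        weights /= weights.sum()                     # softmax over chosen experts
        out = np.zeros_like(x)
        for w, idx in zip(weights, top):
            w1, w2 = experts[idx]
            out += w * (np.maximum(x @ w1, 0) @ w2)  # only these experts compute
        return out

    token = rng.standard_normal(d_model)
    print(moe_forward(token).shape)  # (64,) -- only 2 of 8 experts actually ran
    ```

    In the sketch, the model holds 8 experts’ worth of weights but only multiplies through 2 of them per token, which is why the active-parameter count (37B for DeepSeek) rather than the total parameter count (671B) is what drives the compute and power cost.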

    • melroy@kbin.melroy.org · 2 days ago

      I see, ok. I only want to add that DeepSeek is not the first or the only model using mixture-of-experts (MoE).