• ☆ Yσɠƚԋσʂ ☆ (OP) · +36 / -4 · 2 days ago

    Because it’s an open source project that’s destroying the whole closed source subscription AI model.

    • The Octonaut@mander.xyz · +11 / -6 · 2 days ago

      I don’t think you or that Medium writer understand what “open source” means. Being able to run a stripped-down version locally for free puts it on par with Llama, a Meta product. Privacy-first indeed. Unless you can train your own from scratch, it’s not open source.

      Here’s the OSI’s helpful definition for your reference https://opensource.org/ai/open-source-ai-definition

      • ☆ Yσɠƚԋσʂ ☆ (OP) · +11 / -8 · 2 days ago

        You can run the full version if you have the hardware, the weights are published, and importantly the research behind it is published as well. Go troll somewhere else.

        • The Octonaut@mander.xyz · +6 / -7 · 2 days ago

          All that is true of Meta’s products too. It doesn’t make them open source.

          Do you disagree with the OSI?

          • Grapho · +7 / -2 · 1 day ago

            What makes it open source is that the source code is open.

            My grandma is as old as my great aunts; that doesn’t transitively make her my great aunt.

            • The Octonaut@mander.xyz · +2 / -4 · 1 day ago

              A model isn’t an application. It doesn’t have source code, any more than an image or a movie does. That’s why the OSI’s definition of an “open source” model is controversial in itself.

              • Grapho · +3 / -1 · 1 day ago

                It’s clear you’re being disingenuous. A model is its dataset and its weights, and the weights are open too. If the source code were as irrelevant as you say, DeepSeek wouldn’t be this much more performant, and “Open” AI would have published theirs instead of closing the whole release.

            • The Octonaut@mander.xyz · +10 / -3 · edited · 2 days ago

              The data part, i.e. the very first part of the OSI’s definition.

              It’s not available in their papers: https://arxiv.org/html/2501.12948v1 https://arxiv.org/html/2401.02954v1

              Nor on their GitHub: https://github.com/deepseek-ai/DeepSeek-LLM

              Note that the OSI only asks for transparency about what the dataset was - a name and the fee paid will do - not that full access to it be free and Free.

              It’s worth mentioning too that they’ve used the MIT license for the “code” included with the model (a few YAML files to feed it to software), but they have created their own unrecognised non-free license for the model itself. Why they put this misleading label on their GitHub page would only be speculation.

              Without making the dataset available, nobody can accurately recreate, modify, or learn from the model they’ve released. This is the only sane definition of open source available for an LLM, since a model is not in itself code with a “source”.

                • The Octonaut@mander.xyz · +9 / -2 · 2 days ago

                  That’s the “prover” dataset, i.e. the evaluation dataset mentioned in the articles I linked you to. It’s for checking the output; it is not the training data.

                  It’s also 20 MB, which is minuscule not just for a training dataset but even for what you seem to think is a “huge data file” in general.

                  You really need to stop digging and admit this is one more thing you have only a surface-level understanding of.

    • Hnery@feddit.org · +2 / -7 · 2 days ago

      So… as far as I understand from this thread, it’s basically a finished model (Llama or Qwen) that’s then fine-tuned on an unknown dataset? That would explain the claimed $6M training cost, hiding the fact that the heavy lifting was done by others (the USA’s Meta, in this case). Nothing revolutionary to see here, I guess. Small improvements are nice to have, though. I wonder how their smallest models perform; are they any better than llama3.2:8b?

      • ☆ Yσɠƚԋσʂ ☆ (OP) · +6 / -2 · 1 day ago

        What’s revolutionary here is the use of a mixture-of-experts approach to get far better performance. While the model has 671 billion parameters overall, it only activates 37 billion at a time, making it very efficient. For comparison, Meta’s Llama 3.1 405B uses all 405 billion of its parameters at once. It does as well as GPT-4o in benchmarks and excels at advanced mathematics and code generation. Its 128K-token context window means it can process and understand very long documents, and it generates text at 60 tokens per second, twice as fast as GPT-4o.
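
        To make the mixture-of-experts idea concrete, here’s a minimal sketch in PyTorch. The sizes (8 experts, top-2 routing, 512 dimensions) are made up for illustration and are not DeepSeek’s actual configuration; the point is just that a router activates only a few experts per token, which is how a model can hold 671 billion parameters while running only about 37 billion of them per step.

        ```python
        # Minimal mixture-of-experts layer (PyTorch). Hypothetical sizes,
        # not DeepSeek's real config; this only illustrates the routing idea:
        # a router picks the top-k experts per token, so only a small
        # fraction of the layer's total parameters runs on any forward pass.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class MoELayer(nn.Module):
            def __init__(self, dim=512, n_experts=8, top_k=2):
                super().__init__()
                self.top_k = top_k
                self.router = nn.Linear(dim, n_experts)  # scores every expert for each token
                self.experts = nn.ModuleList([
                    nn.Sequential(nn.Linear(dim, 4 * dim), nn.GELU(), nn.Linear(4 * dim, dim))
                    for _ in range(n_experts)
                ])

            def forward(self, x):  # x: (n_tokens, dim)
                weights, idx = self.router(x).topk(self.top_k, dim=-1)  # top-k experts per token
                weights = F.softmax(weights, dim=-1)  # renormalise the chosen experts' scores
                out = torch.zeros_like(x)
                for slot in range(self.top_k):
                    for e, expert in enumerate(self.experts):
                        mask = idx[:, slot] == e  # tokens whose slot-th pick is expert e
                        if mask.any():
                            out[mask] += weights[mask, slot, None] * expert(x[mask])
                return out

        x = torch.randn(16, 512)  # a batch of 16 token embeddings
        y = MoELayer()(x)  # each token only ran through 2 of the 8 experts
        ```

        Scale the same routing up from 8 toy experts to the hundreds used in production-scale models and you get the 671B-total / 37B-active split described above.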