• FooBarrington@lemmy.world
    English
    arrow-up
    11
    ·
    15 hours ago

    The cool thing about this is that they also published a bunch of details about their approach, as well as tooling around it!

  • Deceptichum@quokk.au
    English
    arrow-up
    51
    arrow-down
    1
    ·
    edit-2
    1 day ago

    Fuck it, I use local LLMs enough, will give this a crack.

    Edit: it’s doing 6 paragraphs in 8.2 seconds; the last model I used was doing about 1 paragraph in 12 seconds. Crazy fast in my experience.
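For a rough sense of scale, the two timings quoted above can be turned into throughput figures. This is only a back-of-the-envelope sketch: it assumes the paragraphs are of comparable length, which the comment does not say, and the function name is made up for illustration.

```python
def paragraphs_per_second(paragraphs: float, seconds: float) -> float:
    """Return generation throughput in paragraphs per second."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return paragraphs / seconds


# Figures quoted in the comment above.
new_rate = paragraphs_per_second(6, 8.2)
old_rate = paragraphs_per_second(1, 12)
speedup = new_rate / old_rate

print(f"new: {new_rate:.2f} paragraphs/s")  # new: 0.73 paragraphs/s
print(f"old: {old_rate:.2f} paragraphs/s")  # old: 0.08 paragraphs/s
print(f"speedup: {speedup:.1f}x")           # speedup: 8.8x
```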

    • yeehaw@lemmy.ca
      English
      arrow-up
      11
      arrow-down
      1
      ·
      1 day ago

      How are they to run, how useful are they, and any you can recommend?

      • Deceptichum@quokk.au
        English
        arrow-up
        33
        ·
        1 day ago

        Dead simple to run. I use Ollama to run local models, and it’s like 3 words to set up from the command line.

        Useful is entirely relative. I use mine personally and somewhat professionally, but I only use it to draft text and manually alter it. AI is amazing, but it’s also crap. You gotta work it a bit.

        As for a recommendation: this model, from what I can see. I’m using the 8B model and it’s fast to generate. Time will tell how good the quality is, but I’m impressed after a few minutes of play.
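For anyone who wants to script the drafting workflow rather than type into the terminal, here is a minimal sketch that talks to a locally running Ollama server over its HTTP API. It assumes Ollama is listening on its default port (11434) and that the 8B model has already been pulled under the tag `deepseek-r1:8b` (on the command line that would be something like `ollama run deepseek-r1:8b`). The tag and the helper names are assumptions for illustration, not something stated in the thread.

```python
import json
import urllib.request

# Assumption: Ollama's default local endpoint for one-shot generation.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_request(prompt: str, model: str = "deepseek-r1:8b") -> urllib.request.Request:
    """Build the POST request for a single, non-streamed completion."""
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        OLLAMA_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate(prompt: str, model: str = "deepseek-r1:8b", timeout: float = 120.0) -> str:
    """Send the prompt to the local server and return the generated text."""
    with urllib.request.urlopen(build_request(prompt, model), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))["response"]


# Example (needs a running Ollama server with the model pulled):
# print(generate("Draft a two-sentence summary of what a distilled model is."))
```

The request building is kept separate from the network call so the payload can be inspected without a server running.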

        • chiisana@lemmy.chiisana.net
          English
          arrow-up
          11
          ·
          1 day ago

          The 8B parameter tag is the distilled Llama 3.1 model, which should be great for general writing. 7B is distilled Qwen 2.5 Math, and 14B is distilled Qwen 2.5 (general purpose but good at coding). They have the entire table laid out on their Hugging Face page, which is handy for knowing which one to use for a specific purpose.
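The three tags mentioned above can be written down as a small lookup. This is just a sketch covering the sizes named in this comment; the task labels and the `pick_tag` helper are made up for illustration, and the full table on the Hugging Face page lists more variants.

```python
# Size tag -> (base model it was distilled into, tasks suggested in the comment).
DISTILLS = {
    "7b": ("Qwen 2.5 Math", ("math",)),
    "8b": ("Llama 3.1", ("writing",)),
    "14b": ("Qwen 2.5", ("general", "coding")),
}


def pick_tag(task: str) -> str:
    """Return the first size tag whose suggested tasks include `task`."""
    task = task.strip().lower()
    for tag, (_base, uses) in DISTILLS.items():
        if task in uses:
            return tag
    raise KeyError(f"no distilled tag listed for task {task!r}")


print(pick_tag("coding"))   # 14b
print(pick_tag("writing"))  # 8b
```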

          The full model is 671B and unfortunately is not going to work on most consumer hardware, so it is still tethered to the cloud for most people.
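To see why 671B is out of reach for most home machines, a rough estimate of the memory needed for the weights alone helps. The bit widths below are assumptions picked for illustration (16-bit, 8-bit and 4-bit storage), and the figure ignores everything else a running model needs, such as the context cache.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (1 GB = 10**9 bytes)."""
    return params_billions * bits_per_param / 8


for bits in (16, 8, 4):
    print(f"671B at {bits}-bit: ~{weight_memory_gb(671, bits):.1f} GB")
# 671B at 16-bit: ~1342.0 GB
# 671B at 8-bit: ~671.0 GB
# 671B at 4-bit: ~335.5 GB

# For comparison, the 8B distilled model at 4-bit:
print(f"8B at 4-bit: ~{weight_memory_gb(8, 4):.1f} GB")  # 8B at 4-bit: ~4.0 GB
```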

          Also, since it is a made-in-China model, some degree of censorship is mandated. Depending on the use case, this may be a point of consideration, too.

          Overall, it’s super cool to see something at this level become generally available, especially with all the technical details out in the open. Hopefully we’ll see more models with this level of capability become available so there are even more choices and competition.

          • cyd@lemmy.world
            English
            arrow-up
            6
            ·
            1 day ago

            Also, the release of R1 under the MIT license means that in principle anyone can use R1 to generate synthetic training sets for improving other (non-reasoning) models. This may be a real game changer.

            The one fly in the ointment is that DeepSeek didn’t deign to share details of their synthetic data generation procedure. But they are already way more transparent than any other non-academic AI lab, so it’s hard to get mad at them over this.
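To make the synthetic-training-set idea concrete, here is one way the basic loop could look. This is emphatically not DeepSeek's procedure, which as noted above is unpublished. It is a generic sketch in which a "teacher" function (a stand-in here; in practice a call to R1) answers a list of prompts and the pairs are written out as JSONL. The record layout and all names are assumptions for illustration.

```python
import json
from typing import Callable, Iterable, Iterator


def build_records(prompts: Iterable[str], generate: Callable[[str], str]) -> Iterator[dict]:
    """Ask the teacher model for a completion per prompt and yield training pairs."""
    for prompt in prompts:
        completion = generate(prompt).strip()
        if completion:  # skip prompts where the teacher returned nothing usable
            yield {"prompt": prompt, "completion": completion}


def to_jsonl(records: Iterable[dict]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)


# Hypothetical stand-in teacher so the sketch runs without a model.
def fake_teacher(prompt: str) -> str:
    return f"(answer to: {prompt})"


print(to_jsonl(build_records(["What is 2 + 2?", "Name a prime above 10."], fake_teacher)))
```

A real pipeline would also need filtering and deduplication of the generated answers, which is presumably where most of the unpublished detail lives.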

        • yeehaw@lemmy.ca
          English
          arrow-up
          2
          ·
          16 hours ago

          This is cool. Are there any decent ones that run in Docker and have a web UI?

          • rebelsimile@sh.itjust.works
            English
            arrow-up
            1
            ·
            13 hours ago

            I’ve been using Open WebUI (search for it with those terms) for the last few months to run local models in a Docker container served from Ollama, and I love it.

  • jimmy90@lemmy.world
    English
    arrow-up
    8
    ·
    1 day ago

    So what of its reasoning? Can it deduce? Can it follow specific logic or equations, in mathematical notation or in plain language?