cross-posted from: https://lemmy.ml/post/24102825

DeepSeek V3 is a big deal for a number of reasons.

At a reported training cost of only $5.5 million, it costs a fraction of what models from OpenAI, Google, or Anthropic do, which often run into the hundreds of millions of dollars.

It breaks the whole AI-as-a-service business model that OpenAI and Google have been pursuing, by making state-of-the-art language models accessible to smaller companies, research institutions, and even individuals.

The code is publicly available, allowing anyone to use, study, modify, and build upon it. Companies can integrate it into their products without paying for usage, making it financially attractive. The open-source nature fosters collaboration and rapid innovation.

The model goes head-to-head with, and often outperforms, models like GPT-4o and Claude-3.5-Sonnet on various benchmarks. It excels in areas that are traditionally challenging for AI, like advanced mathematics and code generation. Its 128K token context window means it can process very long documents. Meanwhile, it generates text at 60 tokens per second, twice as fast as GPT-4o.

The Mixture-of-Experts (MoE) approach used by the model is key to its performance. While the model has a massive 671 billion parameters, only about 37 billion are active for any given token, making it very efficient. Compared to Meta’s Llama 3.1 (405 billion parameters, all used at once), DeepSeek V3 activates roughly 11 times fewer parameters per token yet performs better.
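
To make the MoE idea concrete, here is a minimal sketch of top-k expert routing together with the parameter arithmetic from the paragraph above. It is a generic illustration, not DeepSeek's actual router: the function names, the toy scalar "experts", and the router scores are all made up for the example. Only the 671B, 37B, and 405B figures come from the post.

```python
import math

# Figures quoted in the post (billions of parameters).
TOTAL_PARAMS_B = 671    # DeepSeek V3, all experts combined
ACTIVE_PARAMS_B = 37    # DeepSeek V3, active for a given token
LLAMA_DENSE_B = 405     # Llama 3.1 405B, dense (everything is active)


def top_k_route(router_scores, k):
    """Pick the k highest-scoring experts and softmax-normalize their weights."""
    top = sorted(range(len(router_scores)),
                 key=lambda i: router_scores[i], reverse=True)[:k]
    peak = max(router_scores[i] for i in top)  # subtract max for numerical stability
    exps = {i: math.exp(router_scores[i] - peak) for i in top}
    total = sum(exps.values())
    return {i: e / total for i, e in exps.items()}


def moe_layer(x, experts, router_scores, k=2):
    """Run only the selected experts and mix their outputs by router weight."""
    weights = top_k_route(router_scores, k)
    return sum(w * experts[i](x) for i, w in weights.items())


# Hypothetical toy setup: four "experts" that just scale their input.
toy_experts = [lambda x, s=s: s * x for s in (1.0, 2.0, 3.0, 4.0)]
toy_scores = [0.1, 2.0, 0.3, 2.0]   # made-up router scores for one token

print(top_k_route(toy_scores, k=2))                 # which experts fire, and how strongly
print(moe_layer(1.0, toy_experts, toy_scores, k=2)) # mixed output of the two chosen experts
print(round(ACTIVE_PARAMS_B / TOTAL_PARAMS_B, 3))   # fraction of the model active per token
print(round(LLAMA_DENSE_B / ACTIVE_PARAMS_B, 1))    # dense Llama vs. active DeepSeek params
```

With the quoted numbers, roughly 5.5% of the parameters are active per token, and the dense 405B model touches about 10.9 times as many parameters per token as the 37B active slice. Fewer active parameters is only a rough proxy for compute cost, not a benchmark.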

DeepSeek V3 can be seen as a significant technological achievement by China in the face of US attempts to limit its AI progress. China once again demonstrates that resourcefulness can overcome limitations.

  • Daemon Silverstein@thelemmy.club · 4 upvotes · 5 hours ago

    Finally got to sign up. Last time I tried, it complained about “abnormalities in the browser”; perhaps it wasn’t available for Brazilian IP addresses back when I found out about it.

    I find the way it tries to do “reasoning” interesting. I mean, of course LLMs can’t “reason”, but DeepSeek seems to build a “chain of thought”, which brings interesting insights regarding the conversation.

    • ☆ Yσɠƚԋσʂ ☆ (OP) · 3 upvotes · 4 hours ago

      I’ve been playing with it a bit too, and it’s pretty impressive. Incidentally, I saw a couple of promising approaches to help with the reasoning aspect of LLMs.

      The first method, called the consensus game, addresses the issue of models giving different answers to the same question depending on how it’s phrased. The trick here is to align the generator, which answers open-ended questions, with the discriminator, which evaluates multiple-choice questions. By incentivizing them to agree on answers through a scoring system, the game improves the model’s consistency and accuracy without requiring retraining. https://www.wired.com/story/game-theory-can-make-ai-more-correct-and-efficient/
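
      As a very rough sketch of the "make the two sides agree" idea: score each candidate answer by how much both the generator and the discriminator like it, then pick the answer with the highest joint score. This is a one-shot simplification, not the iterative game described in the article, and every name and probability below is made up for illustration.

```python
def consensus_rank(generator_probs, discriminator_probs):
    """Rank candidate answers by the product of generator and discriminator scores.

    generator_probs:     answer -> probability the generator would produce it
    discriminator_probs: answer -> probability the discriminator judges it correct
    """
    joint = {
        answer: generator_probs[answer] * discriminator_probs.get(answer, 0.0)
        for answer in generator_probs
    }
    # Highest joint score first: answers both sides favor rise to the top.
    return sorted(joint.items(), key=lambda item: item[1], reverse=True)


# Hypothetical scores for three candidate answers to one question.
generator_probs = {"A": 0.45, "B": 0.40, "C": 0.15}
discriminator_probs = {"A": 0.30, "B": 0.90, "C": 0.50}

ranking = consensus_rank(generator_probs, discriminator_probs)
print(ranking[0][0])  # the answer both sides agree on most
```

      In this toy case the generator alone would pick A, but the joint score favors B because the discriminator rates it much higher.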

      The second method is to use neurosymbolic systems that combine deep learning to identify patterns in data with reasoning based on knowledge using symbolic logic. It has the potential to outperform systems relying either solely on neural networks or symbolic logic while providing clear explanations for decisions. This involves encoding symbolic knowledge into a format compatible with neural networks, and mapping data from neural patterns back to symbolic representations.

      https://arxiv.org/abs/2305.00813

      The neurosymbolic approach in particular looks like a very promising way to get actual reasoning to start happening in these systems. It’s gonna be interesting to see where this all goes in a few years.

      • Daemon Silverstein@thelemmy.club · 2 upvotes · 4 hours ago

        > The first method, called the consensus game, addresses the issue of models giving different answers to the same question depending on how it’s phrased

        Although humans can reason and, therefore, reply in a more coherent manner (according to one’s own cosmos, which contains personality traits, knowledge, mood, etc.), this phenomenon kind of also happens with humans. Depending on how multifaceted the question or statement is, a slightly different phrasing can “induce” an answer. Actually, it’s a fundamental principle behind mesmerism, gaslighting and social engineering: inducing someone toward a certain reply, action, behavior or thought, sometimes relying on repetition, sometimes relying on complexity.

        Artificial automatons are particularly susceptible to this because their underlying principles are purely algorithmic. We aren’t exactly algorithmic, although we have physical components of “determinism” (e.g. muscles contracting when in contact with electricity, the body always seeking homeostasis, etc.).

        However, I understood what you meant by it. It’d be akin to a human trying to think twice or thrice when faced with complex and potentially mischievous or misleading questions and statements. “Thinking” before “acting” through the consensus game.

        > The second method is to use neurosymbolic systems that combine deep learning to identify patterns in data with reasoning based on knowledge using symbolic logic. It has the potential to outperform systems relying either solely on neural networks or symbolic logic while providing clear explanations for decisions. This involves encoding symbolic knowledge into a format compatible with neural networks, and mapping data from neural patterns back to symbolic representations.

        Yeah. I see great potential in it, too. “Signs and symbols rule the world, not words or laws” (unfortunately this Confucian quote is often misused by people, but it captures the essence of how symbols are a fundamental piece of the cosmos).

        • ☆ Yσɠƚԋσʂ ☆ (OP) · 2 upvotes · 4 hours ago

          For sure, and I think it’s a really important thing to keep in mind that our own logic is far from being infallible. Humans easily fall for all kinds of logical fallacies, and we find formal reasoning to be very difficult. It takes scientists years of training to develop this mindset, and they are still unable to eliminate the problem of biases and other fallacies. This is why we rely on concepts like peer review to mitigate these problems.

          An artificial reasoning system should be held to a standard similar to our own reasoning instead of some ideal of rational thought. I think the key aspects to focus on are consistency, the ability to explain the steps, and the ability to integrate feedback to correct mistakes. If we can get that going, then we’d have systems that can improve themselves over time and that can be taught the way we teach humans.

    • ☆ Yσɠƚԋσʂ ☆ (OP) · 2 upvotes · 6 hours ago

      It’s a real game changer, and the trick of using a window into a larger token space is pretty clever. This kind of stuff is precisely why I don’t take arguments that LLMs are inherently wasteful and useless very seriously. We’re just starting to figure out different techniques for using and improving them, and nobody knows what the actual limits are. I’m also very optimistic that open source models are consistently catching up to and surpassing closed ones, meaning that the tech continues to stay available to the public. This was a pretty fun write-up from a little while back, but it still holds up well today: https://steve-yegge.medium.com/were-gonna-need-a-bigger-moat-478a8df6a0d2

      • JoeByeThen [he/him, they/them]@hexbear.net · 1 upvote · 60 minutes ago

        Ah yea, I remember when the We Have no Moats article dropped. It’s wild because for years I was on the cutting edge of what was going on: tinkering with Java-based neural network apps, then Python-based tensors, and right around when Transformers dropped I was pulled away from my hobbies for familial reasons and I’ve been playing catch-up ever since. Everything is happening very fast and I’ve got so much to do that I just can’t find the time to stay on top of it all. Or the money, tbh. But, yeah, there’s a lot of potential that the Left (in these parts) has plugged its ears about. Especially as resistance is moving in a more physical way :luigi-dance:, but the infrastructure of our oppression is built on the cloud.

        I saw this interesting video the other day. Basically, since some of these mini-PCs share their memory with the onboard GPU, they can load up the 70B models. Slow as hell, but if you’re running everything through a queue it’d be pretty handy.

        https://www.youtube.com/watch?v=xyKEQjUzfAk
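
        For a rough sense of why shared memory matters here, this back-of-the-envelope sketch estimates how much memory the weights of a 70B model need at different precisions. The 20% overhead factor and the 64 GiB machine are assumptions picked for illustration, not figures from the video.

```python
def weights_gib(params_billion, bits_per_weight):
    """Memory needed for the weights alone, in GiB."""
    total_bytes = params_billion * 1e9 * bits_per_weight / 8
    return total_bytes / 2**30


def fits(params_billion, bits_per_weight, memory_gib, overhead=1.2):
    """Rough check: weights plus an assumed 20% for KV cache, activations, etc."""
    return weights_gib(params_billion, bits_per_weight) * overhead <= memory_gib


SHARED_MEMORY_GIB = 64  # hypothetical mini-PC with 64 GiB of unified memory

for bits in (16, 8, 4):
    size = round(weights_gib(70, bits), 1)
    print(f"70B at {bits}-bit: {size} GiB, fits: {fits(70, bits, SHARED_MEMORY_GIB)}")
```

        Under these assumptions, only the 4-bit quantized weights (about 32.6 GiB) fit in a 64 GiB shared pool, which is well beyond what most consumer discrete GPUs offer in VRAM.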

  • AtmosphericRiversCuomo [none/use name]@hexbear.net · 10 upvotes · 22 hours ago

    These types of posts go over like a lead balloon here because people don’t want to accept the material reality of what’s happening with this tech, but it’s undoubtedly a stroke of luck for all of us that ghoulish companies like OpenAI don’t have any special sauce here. Open source models have consistently been able to keep up with, or at least get really close to, frontier model performance from companies that spend billions only to see their efforts replicated by these absolute Chads from China.

    • ☆ Yσɠƚԋσʂ ☆ (OP) · 13 upvotes · 21 hours ago

      The amount of hate this tech gets is phenomenal, and most of it is completely misdirected. The problems that people ascribe to it aren’t inherent in the technology, but are simply symptoms of underlying social problems in a capitalist society.

      For example, people complain that it takes jobs away, but the whole idea that we have to work for the sake of work is idiotic to begin with. Technology that frees people from work should create more free time for them to enjoy. The reason that’s not happening is that capitalism is not a rational economic system.

      Another common argument is that it’s very resource intensive and wastes energy. This is true, but there’s no reason to believe this won’t be optimized. In fact, we’ve already seen a lot of optimizations happen in just a few years, and they now make it possible to run on a laptop models that used to require a data centre.

      However, more fundamentally, wasting energy is once again an aspect of the capitalist system itself. Before AI we saw stuff like crypto, NFTs, and so on. Much of the technology that’s developed under capitalism ends up being frivolous or even actively harmful. So, it’s not generative AI that’s the problem, but the social system that guides allocation of labour and resources.

      In particular, artists are still clinging to an artisan model focusing on individual exceptionalism and intellectual property rights. These reactions, rooted in petty-bourgeois ideology, ultimately serve to reinforce inequality and empower corporations rather than protect artists.

      The core contradiction here is between the increasingly socialized nature of artistic production in a globalized, digital world and the continued emphasis on private ownership. It’s a symptom of capitalist development that leads to the proletarianization of artists as they are displaced by industrial competition.

      The real solution lies in worker solidarity, unionization, and ultimately, the socialization of property. The enemy is not AI itself but the capitalist market that shapes its deployment, a system that already produces formulaic, profit-driven art. Focusing on the underlying class struggle is how we get to a future where technology serves the collective good rather than further entrenching existing power structures. Incidentally, this is a brilliant write-up on the subject: https://redsails.org/artisanal-intelligence/

  • Inui [comrade/them]@hexbear.net · 3 upvotes · edited 19 hours ago

    I skimmed the article so maybe I missed it, but how is it for language learning? It vaguely mentions it in a few places. I’ve used different Chinese models specifically because I assumed they’d be trained on more native Chinese content, but as your post alludes to, many of them are still poor in this area and very heavily focused on mathematics and programming.

    I just want something that even vaguely understands concepts like a “question word” to help correct my grammar as I learn, instead of trying to end every sentence in 吗? like ChatGPT.

    Edit: they have benchmarks on their GitHub that look better than or comparable to similar models. https://github.com/deepseek-ai/DeepSeek-V3?tab=readme-ov-file

    • ☆ Yσɠƚԋσʂ ☆ (OP) · 3 upvotes · 18 hours ago

      Ah that’s a neat use case actually. I’ve been using an app on the phone for learning, but haven’t tried using a model to practice chatting with.