• ☆ Yσɠƚԋσʂ ☆OP

    I never disagreed that you can run Meta’s model with the same level of privacy, so I don’t know why you keep bringing that up as some sort of gotcha. The point about DeepSeek is its efficiency. The OSI definition of open source is good, and it does look like you’re right that the full data set is not available. However, the real question is why you’d be so hung up on that.

    Given that the code for training a new model is released, and it can be applied to open data sets, it’s perfectly possible to make a version trained on open data that would check off the final requirement you keep bringing up. Also, adapting the model does not require the original training set, since adaptation is done by tuning the weights in the network itself. Go read up on how LoRA works, for example.
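
    A quick sketch of what I mean, using the Hugging Face transformers and peft libraries (the model id and adapter settings here are just placeholders, not a recommendation): LoRA freezes the released weights and trains small adapter matrices on top of them, so your own data is all you need.

    ```python
    # Minimal LoRA sketch: attach small trainable adapters to a frozen base model.
    # No access to the original training data is required, only the weights.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import LoraConfig, get_peft_model

    base = "deepseek-ai/DeepSeek-R1"  # placeholder id; any causal LM you can load works
    model = AutoModelForCausalLM.from_pretrained(base)
    tokenizer = AutoTokenizer.from_pretrained(base)

    lora = LoraConfig(
        r=8,                                  # rank of the low-rank update matrices
        lora_alpha=16,
        target_modules=["q_proj", "v_proj"],  # attach adapters to the attention projections
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)       # base weights stay frozen
    model.print_trainable_parameters()        # only the tiny adapter weights get trained
    # From here you train on whatever dataset you like with any standard trainer.
    ```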

    • The Octonaut@mander.xyz

      I know how LoRA works, thanks. You still need the original model to use a LoRA. As mentioned, adding open stuff to closed stuff doesn’t make it open - that’s a principle applicable to pretty much anything software related.

      You could use their training method on another dataset, but you’d be creating your own model at that point. You also wouldn’t get the same results - you can read in their article that their “Zero” version would have made this possible, but they found it would often produce a gibberish mix of English, Mandarin and code. For R1 they adapted their pure “we’ll only give it feedback” efficiency training method by starting with a base dataset before feeding it more - a compromise to their plan, but a necessary one, and with the right dataset - great! It eliminated the gibberish.
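
      Roughly, the recipe they describe looks like this (a sketch using the Hugging Face trl library; the base model, dataset names and reward function are stand-ins for illustration, not DeepSeek’s actual code):

      ```python
      # Stage 1: supervised "cold start" on a small curated dataset.
      # Stage 2: reinforcement-style training driven only by a reward signal.
      from datasets import load_dataset
      from trl import SFTConfig, SFTTrainer, GRPOConfig, GRPOTrainer

      base = "Qwen/Qwen2-0.5B-Instruct"  # small stand-in for the real base model

      # Cold start: plain supervised fine-tuning, which is what fixed the gibberish.
      cold_start = load_dataset("trl-lib/Capybara", split="train")  # placeholder data
      sft = SFTTrainer(model=base, train_dataset=cold_start,
                       args=SFTConfig(output_dir="sft-out"))
      sft.train()
      sft.save_model("sft-out")

      # Reward-only stage: the model is never shown answers, just scored.
      def toy_reward(completions, **kwargs):
          # Trivial stand-in reward; the real one checks correctness and formatting.
          return [float("therefore" in c.lower()) for c in completions]

      prompts = load_dataset("trl-lib/tldr", split="train")  # placeholder data
      rl = GRPOTrainer(model="sft-out", reward_funcs=toy_reward,
                       train_dataset=prompts,
                       args=GRPOConfig(output_dir="rl-out"))
      rl.train()
      ```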

      Without that specific dataset - and this is what makes them a company, not a research paper - you cannot recreate DeepSeek yourself (which would be open source), and you can’t guarantee that you would get anything near the same results (in which case why even relate it to this model anymore). That’s why both of those matter to the OSI, who define Open Source in all regards as the principle of having all the information you need to recreate the software or asset locally from scratch. If it were truly Open Source, by the way, that wouldn’t be the disaster you think it would be, as then OpenAI could just literally use it themselves. Or not - that’s the difference between Open and Free I alluded to. It’s perfectly possible for something to be Open Source and still require a license and a fee.

      Anyway, it does sound like an exciting new model and I can’t wait to make it write smut.

      • ☆ Yσɠƚԋσʂ ☆OP

        I didn’t say that using LoRA makes it more open; I was pointing out that you don’t need the original data to extend the model.

        Basically, what you’re talking about is being able to replicate the original model from scratch given the code and the data. Since the data component is missing, you can’t replicate the original model. I personally don’t find that to be much of a problem, because people could create a comparable model from scratch if they really wanted to, using an open data set.

        The actual innovation with DeepSeek lies in its use of a mixture-of-experts approach to get far better performance. While the model has 671 billion parameters overall, it only activates 37 billion of them at a time, which makes it very efficient. For comparison, Meta’s Llama 3.1 405B uses all 405 billion of its parameters at once. That’s the really interesting part of the whole thing, and it’s the part where openness really matters.
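
        To make the idea concrete, here’s a toy top-k routing layer (the sizes and routing details are made up for illustration, not DeepSeek’s actual architecture): every token gets scored against all the experts, but only a couple of expert networks actually run for it, so the active parameter count per token is a small fraction of the total.

        ```python
        # Toy mixture-of-experts layer with top-k routing.
        import torch
        import torch.nn as nn
        import torch.nn.functional as F

        class ToyMoELayer(nn.Module):
            def __init__(self, d_model=64, n_experts=8, top_k=2):
                super().__init__()
                self.top_k = top_k
                self.router = nn.Linear(d_model, n_experts)  # scores each expert per token
                self.experts = nn.ModuleList(
                    nn.Sequential(nn.Linear(d_model, 4 * d_model),
                                  nn.GELU(),
                                  nn.Linear(4 * d_model, d_model))
                    for _ in range(n_experts)
                )

            def forward(self, x):                                     # x: (tokens, d_model)
                gate_logits = self.router(x)
                weights, idx = gate_logits.topk(self.top_k, dim=-1)   # keep top-k experts per token
                weights = F.softmax(weights, dim=-1)
                out = torch.zeros_like(x)
                for slot in range(self.top_k):
                    for e in range(len(self.experts)):
                        mask = idx[:, slot] == e                      # tokens routed to expert e
                        if mask.any():
                            out[mask] += weights[mask, slot, None] * self.experts[e](x[mask])
                return out

        tokens = torch.randn(10, 64)
        print(ToyMoELayer()(tokens).shape)  # (10, 64); only 2 of the 8 experts ran per token
        ```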

        And I fully expect that OpenAI will incorporate this idea into their models. The disaster for OpenAI is that their whole business model of selling subscriptions is now dead in the water. When models were really expensive to run, only a handful of megacorps could do it. Now it turns out that you can get the same results at a fraction of the cost.