• melroy@kbin.melroy.org · 2 days ago

    DeepSeek is not that great. I run it locally, but the answers are often still wrong, and I get Chinese characters in my English output.

      • melroy@kbin.melroy.org · 2 days ago

        Yes, that is true… now the question I have back is: how is this price calculated? The price can be low because they simply charge less, or it can be low because inference costs less time/energy. You might answer that the latter is true, but where is the source for that?

        Again, since I can run it locally, my price is $0 per million tokens; I only pay for the electricity at home.

        EDIT: The link you gave me also says “API costs” at the top of the article. So that just means they charge less money. The model itself might use the same amount of energy as (or even more than) other existing models.

        • ☆ Yσɠƚԋσʂ ☆ OP · 2 days ago

          The reason they charge less is that the model is more efficient, which means it uses less power. They leveraged a mixture-of-experts architecture to get far better performance than traditional dense models. While it has 671 billion parameters overall, it only activates 37 billion at a time, making it very efficient. For comparison, Meta’s Llama 3.1 uses all 405 billion of its parameters at once. You can read all about it here: https://arxiv.org/abs/2405.04434
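
          To illustrate the general idea, here is a toy sketch of top-k expert routing. This is not DeepSeek’s actual code or configuration; the expert count, hidden size, and top-k value are made-up toy numbers. The point is that the router scores all experts for each token but only runs the few best-scoring ones, so compute per token scales with the active experts rather than the total parameter count.

          ```python
          # Toy sketch of top-k mixture-of-experts routing (illustrative only;
          # all sizes are made-up values, not DeepSeek's real configuration).
          import numpy as np

          rng = np.random.default_rng(0)

          d_model = 16     # hidden size (toy value)
          n_experts = 8    # total experts available
          top_k = 2        # experts actually run per token

          # Each expert is a small feed-forward weight matrix; the router scores experts.
          experts = [rng.standard_normal((d_model, d_model)) * 0.1 for _ in range(n_experts)]
          router = rng.standard_normal((d_model, n_experts)) * 0.1

          def moe_layer(x):
              """Route one token vector x through only top_k of the n_experts."""
              logits = x @ router                   # score every expert
              chosen = np.argsort(logits)[-top_k:]  # keep the k best-scoring experts
              weights = np.exp(logits[chosen])
              weights /= weights.sum()              # softmax over the chosen experts only
              # Only the chosen experts' weights are ever touched for this token,
              # so compute scales with top_k, not with n_experts.
              return sum(w * (x @ experts[i]) for w, i in zip(weights, chosen))

          token = rng.standard_normal(d_model)
          out = moe_layer(token)
          print(out.shape)  # (16,) -- same shape as the input, but only 2 of 8 experts ran
          ```

          Only the selected experts’ weights are read and multiplied per token, which is where the compute and power saving over a dense model comes from.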

          • melroy@kbin.melroy.org · 2 days ago

            I see, ok. I only want to add that DeepSeek is neither the first nor the only model using a mixture-of-experts (MoE) architecture.