Of course DeepSeek lied about its training costs, as we had strongly suspected. SemiAnalysis has been following DeepSeek for the past several months. High Flyer, DeepSeek’s owner, was buying Nvidia…
I’m very confused by this, I had the same discussion with my coworker. I understand what the benchmarks are saying about these models, but have any of y’all actually used deepseek? I’ve been running it since it came out and it hasn’t managed to solve a single problem yet (70b param model, I have downloaded the 600b param model but haven’t tested it yet). It essentially compares to gpt-3 for me, which only cost OpenAI like $4-9 million to train (can’t remember the exact number right now).
I just do not see the “efficiency” here.
what if none of it’s good, all of it’s fraud (especially the benchmarks), and having a favorite grifter in this fuckhead industry is just too precious
well, it’s free to download and run locally so i struggle to see what the grift is
Customer acquisition cost for a future service, which is ~fixed after training costs, assuming we can consider distribution costs as marginal. Reasonably impressive accomplishment, if one is taking the perspective of SV SaaS-financing brain.*
*I don’t recommend you do this for too long, that’s how some of the people currently prominent in the news got to be the way that they are
fuck off promptfan
The 70b model is a distillation of DeepSeek-R1 into Llama 3.3, that is to say Llama 3.3 is fine-tuned on R1’s outputs, so it keeps the Llama architecture while picking up R1’s reasoning behaviour. So a lot of any criticism of that model’s capability is really criticism of Llama 3.3 and not of the full deepseekR1.
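For anyone unclear on what “distillation” means here, a minimal sketch of the classic logit-matching form: a student model is trained to match a teacher’s softened output distribution. (Note DeepSeek’s distilled checkpoints were reportedly produced the simpler way, supervised fine-tuning on R1-generated samples, so this is an illustration of the general technique, not their exact recipe; all function names below are made up.)

```python
import math

def softmax(logits, temperature=1.0):
    """Turn raw logits into a probability distribution; higher
    temperature softens it, exposing the teacher's 'dark knowledge'."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def kl_divergence(p, q):
    """KL(p || q): how far the student distribution q is from teacher p."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """Soft-target distillation loss: the student is penalised for
    diverging from the teacher's softened distribution. Scaled by T^2
    so gradient magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return (temperature ** 2) * kl_divergence(p, q)

# A student that already matches the teacher incurs zero loss;
# a mismatched one is penalised.
matched = distillation_loss([2.0, 1.0, 0.1], [2.0, 1.0, 0.1])
mismatched = distillation_loss([2.0, 1.0, 0.1], [0.1, 1.0, 2.0])
```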
Thank you for shedding light on the matter. I never realized that 69b model is a pisstillation of Lligma peepee point poopoo, that is to say it complicates the outpoop of Lligma4.20 while using the creepbleakR1 house design for better processing deficiency. Now I finally realize that any criticism of Kraftwerk’s 1978 hit Das Model is just criticism of Sugma80085 and not deepthroatR1.
[to the tune of Fort Minor’s Remember The Name]
10% senseless, 20% post, 15% concentrated spirit of boast, 5% reading, 50% pain, and a 100% reason to not post here again
i haven’t seen another reasoning model that’s open and works as well… its LLM base is for sure about GPT-3 levels (maybe a bit better?) but like the “o” in GPT-4o
the “thinking” part definitely works for me - ask it to do maths for example, and it’s fascinating to see it break down the problem into simple steps and then solve each step
[bites tongue, tries really hard to avoid the obvious riposte]
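The step-by-step “thinking” described a couple of posts up is visible in R1’s raw output: the model emits its reasoning inside `<think>…</think>` tags before the final answer, so a local harness can split the two. A minimal sketch (the tag convention is R1’s; the helper name and example reply are made up):

```python
import re

def split_reasoning(raw: str):
    """Separate a DeepSeek-R1-style response into its chain-of-thought
    and final answer. R1 wraps its step-by-step reasoning in
    <think>...</think> before emitting the answer."""
    match = re.search(r"<think>(.*?)</think>", raw, flags=re.DOTALL)
    if not match:
        # No reasoning block found: treat the whole reply as the answer.
        return "", raw.strip()
    thinking = match.group(1).strip()
    answer = raw[match.end():].strip()
    return thinking, answer

reply = "<think>12 * 7: 10*7=70, 2*7=14, 70+14=84.</think>The answer is 84."
steps, answer = split_reasoning(reply)
```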