Meet DeepSeek: the Chinese start-up that is changing how AI models are trained

☆ Yσɠƚԋσʂ ☆@lemmygrad.ml · 6 days ago


RedClouds@lemmygrad.ml · 6 days ago

That’s an important distinction, yes: it uses a lot of smaller models added up. I haven’t been able to test it yet, as I’m working with downstream tools and the raw stuff just isn’t something I’ve set up (plus, I have like 90 gigs of RAM, not… well). I read in one place that you need 500 GB+ of RAM to run it, so I think all 600+ billion params need to be in memory at once, and you need to use a quantized model to get it to fit in even that space, which kinda sucks. However, that’s how it is for Mistral’s mixture-of-experts models too, so no difference there. MoEs are pretty promising.
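For a rough sense of the memory arithmetic above, here is a back-of-the-envelope sketch. It assumes a round 600 billion parameters (the comment only says "600+ billion") and counts weights only, ignoring activations, KV cache, and runtime overhead. The function name `weights_memory_gb` is made up for illustration.

```python
def weights_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    return num_params * bits_per_param / 8 / 1e9


# Assumption: a round figure standing in for "600+ billion params".
PARAMS = 600e9

for bits in (16, 8, 4):
    print(f"{bits}-bit weights: ~{weights_memory_gb(PARAMS, bits):.0f} GB")
```

Under those assumptions the weights come to about 1200 GB at 16-bit, 600 GB at 8-bit, and 300 GB at 4-bit, so a 500 GB+ requirement would only be reachable with a quantized model, which matches what the comment describes.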

☆ Yσɠƚԋσʂ ☆@lemmygrad.ml · 5 days ago

Exactly, it’s the approach itself that’s really valuable. Now that we know the benefits, those will translate to all other models too.