ByteDance has officially launched its latest Doubao large model, Doubao 1.5 Pro (Doubao-1.5-pro), which demonstrates outstanding comprehensive capabilities across a range of fields, surpassing the well-known GPT-4o and Claude 3.5 Sonnet. The release of this model marks an important step forward for ByteDance in artificial intelligence. Doubao 1.5 Pro adopts a novel sparse MoE (Mixture of Experts) architecture, using a smaller set of activated parameters during pre-training. This design's innovation...
What do I need to run this? I saw people on Xiaohongshu build an eight-MacBook cluster, presumably networked over Thunderbolt, and I'm thinking that might actually be the most economical way to do it right now.
It depends on the model size. Here's a guide to getting DeepSeek running locally: https://dev.to/shayy/run-deepseek-locally-on-your-laptop-37hl
According to this page, running the full model takes about 1.4 TB of memory, or roughly 16 A100 GPUs. That's still prohibitively expensive for an individual enthusiast, but yes, you can run a simplified model locally with Ollama. It will still probably need a GPU with a lot of memory.
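A back-of-envelope check of those figures, assuming the full DeepSeek-R1 model's roughly 671B parameters stored at fp16 (2 bytes each; the parameter count and precision are assumptions, not stated on the linked page):

```python
# Rough sanity check of the ~1.4 TB / 16x A100 figures.
# Assumption: ~671B parameters at fp16 (2 bytes per parameter).
params = 671e9
bytes_per_param = 2  # fp16

weights_tb = params * bytes_per_param / 1e12
print(f"weights alone: ~{weights_tb:.2f} TB")

# Sixteen 80 GB A100s give:
a100_total_tb = 16 * 80 / 1000
print(f"16x A100 80GB: {a100_total_tb:.2f} TB total")
```

The weights alone come to about 1.34 TB; KV cache and activations push the working footprint toward the quoted 1.4 TB, which is why sixteen 80 GB A100s (1.28 TB) is roughly the right cluster size, not a single box.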
I got deepseek-r1:14b-qwen-distill-fp16 running locally with 32 GB of RAM and a GPU, but yeah, you do need a fairly beefy machine to run even medium-sized models.
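The 32 GB figure checks out on the same kind of estimate. A sketch, assuming the 14b distill has roughly 14.8B parameters (the Qwen-14B base; the exact count and the quant bit-widths here are assumptions):

```python
# Rough weight-memory estimates for a ~14.8B-parameter model
# at a few precisions, to see what hardware it fits.
params = 14.8e9

for name, bits in [("fp16", 16), ("8-bit quant", 8), ("4-bit quant", 4)]:
    gb = params * bits / 8 / 1e9  # bytes per param = bits / 8
    print(f"{name}: ~{gb:.1f} GB of weights")
```

At fp16 the weights alone are around 29.6 GB, so 32 GB of RAM is about the floor for that tag; a 4-bit quantized variant drops the weights to roughly 7–8 GB and fits far more modest GPUs.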