
  • 13 Posts
  • 367 Comments
Joined 1 year ago
Cake day: June 15th, 2023


  • AdrianTheFrog@lemmy.world to 196@lemmy.blahaj.zone · The Rule · 4 points · 1 day ago

    I don’t have access to Llama 3.1 405B, but I can see that Llama 3 70B takes up ~145 GB, so 405B would probably take ~840 GB just to download the uncompressed fp16 (16 bits / weight) model. With 8-bit quantization it would probably be closer to 420 GB, and with 4-bit closer to 210 GB. 4-bit quantization is really going to start harming the model outputs, and it’s still probably not going to fit in your RAM, let alone VRAM.

    So yes, it is a crazy model. You’d probably need at least 3 or 4 A100s to have a good experience with it.
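
    For a rough sense of where those numbers come from, here’s a quick back-of-the-envelope sketch in Python (my own illustration, not from any Llama tooling): size ≈ parameters × bits per weight ÷ 8 bits per byte.

    ```python
    # Rough model download size: parameters * bits per weight / 8 -> bytes
    # (pure weights only; real checkpoint files add a little overhead,
    # and actually running the model needs extra memory for the KV cache etc.)
    def approx_size_gb(params_billion, bits_per_weight):
        return params_billion * 1e9 * bits_per_weight / 8 / 1e9  # GB

    for bits in (16, 8, 4):
        print(f"405B at {bits}-bit: ~{approx_size_gb(405, bits):.0f} GB")
    # -> ~810 GB, ~405 GB, ~202 GB
    ```

    Scaling the measured 145 GB for the 70B model instead (145 × 405 / 70 ≈ 840 GB) gives the slightly higher fp16 figure above, since the real download includes some overhead beyond the raw weights.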


  • Unions would probably work, as long as you get some people the company doesn’t want to replace in there too

    Maybe also federal regulations, although those would probably just slow it down, since models are being made all around the world, including in places like Russia and China that the US and EU don’t have legal influence over

    Also, it might just be me, but it feels like generative AI progress has really slowed. It almost feels like we’re approaching the point where we’ve squeezed the most out of the hardware we have, and now we just have to wait for the hardware to get better