Is there a good reason why AMD APUs just aren't used with massive amounts of (V)RAM just like the Mac M2 is?

kelvie@lemmy.ca · 1 year ago

Is there a good reason why AMD APUs just aren't used with massive amounts of (V)RAM just like the Mac M2 is?

j4k3@lemmy.world · 1 year ago

Ultimately, it is all about data throughput to the CPU caches because tensors are so large. The M2 claims a 128-bit bus. The instruction support for ARM built into llama.cpp is weak compared to x86. If you want to run big models that require lots of memory without spending five figures, find an Intel chip that supports AVX-512 and has support for 96+ GB of RAM. AVX-512 and its related extensions are directly supported in llama.cpp, and that gets you 512-bit instructions. Apple can’t match that.

If you want a laptop, get something with a 3080 Ti. It needs to specifically be the Ti version. This has 16 GB of VRAM and came in several 2022 models.

Run Fedora with it. They have Nvidia support including a slick script that builds the GPU driver from source with every kernel update automatically, and keeps secure boot working all the time.

Atemu · 1 year ago

The instruction support for ARM built into llama.cpp is weak compared to x86.

I don’t know about you, but my M1 Pro is a hellovalot faster than my 5800X in llama.cpp.

These CPUs benchmark similarly across a wide range of other tasks.

j4k3@lemmy.world · edit-2 1 year ago

deleted by creator

Atemu · 1 year ago

No consumer AMD hardware is on that list.

*No consumer Intel hardware is on that list.

The only widely available consumer hardware with AVX-512 support is AMD’s Zen 4 (7000 series).

I think just about the only Apple computer that supports AVX-512 is the 2019 Mac Pro.
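
For anyone who wants to check their own machine rather than a list: on Linux, the kernel reports CPU feature flags in `/proc/cpuinfo`, and the AVX-512 foundation subset shows up as `avx512f`. Here is a minimal sketch, assuming a Linux x86 system; the helper names `avx512_flags` and `has_avx512f` are made up for illustration and are not part of llama.cpp.

```python
import os


def avx512_flags(cpuinfo_text: str) -> list[str]:
    """Return the sorted AVX-512 feature flags found in /proc/cpuinfo-style text."""
    found = set()
    for line in cpuinfo_text.splitlines():
        # On x86 Linux the feature list is on lines like "flags\t\t: fpu vme ..."
        if line.startswith("flags"):
            _, _, values = line.partition(":")
            found.update(f for f in values.split() if f.startswith("avx512"))
    return sorted(found)


def has_avx512f(cpuinfo_text: str) -> bool:
    """True if the AVX-512 Foundation flag (avx512f) is present."""
    return "avx512f" in avx512_flags(cpuinfo_text)


if __name__ == "__main__":
    path = "/proc/cpuinfo"  # Linux only; other platforms need a different method
    if os.path.exists(path):
        with open(path, encoding="utf-8") as fh:
            text = fh.read()
        print("AVX-512F supported:", has_avx512f(text))
        print("AVX-512 flags:", " ".join(avx512_flags(text)) or "(none)")
    else:
        print("No /proc/cpuinfo here; this sketch only covers Linux.")
```

This only tells you what the CPU advertises. Whether a given llama.cpp build actually uses those instructions depends on how it was compiled.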

j4k3@lemmy.world · edit-2 1 year ago

deleted by creator

kelvie@lemmy.ca · 1 year ago

I run exllama on a 24GB GPU right now, just seeing what’s feasible for larger models – so an Intel CPU with lots of RAM would in theory outperform an AMD iGPU with the same amount of RAM allocated as VRAM? (I’m looking at APU/iGPUs solely because you can configure the amount of VRAM allocated to them.)

j4k3@lemmy.world · 1 year ago

I’m pretty sure it is not super relevant. The amount of VRAM in a GPU is different from the amount in a CPU. The system memory with x86 is mostly virtual bits. I haven’t played in this space in a while, so my memory is rusty. The system memory is not directly accessible by an address bus. It creates a major bottleneck when you need to access a lot of information at once. It is more of a large storage system that is made to move chunks of data that are limited in size. If you want more info, read about address buses and physical/virtual buses: https://en.m.wikipedia.org/wiki/Physical_Address_Extension

In a GPU, the goal is to move data in parallel where most of the memory is available at the same time. This doesn’t have the extra overhead of complicated memory management systems. Each small processor is directly addressing the memory it needs. With a GPU, more memory usually means more physical compute hardware.

If you ever feel motivated to build vintage computing hardware like Ben Eater’s 8-bit breadboard computer project on YouTube, or his 6502 stuff, you’ll see a lot of this first hand. In the early 8-bit era, the memory bus and address space were a major design aspect, and they are much easier to understand there because they are manually configured in hardware external to the processor.

kelvie@lemmy.ca · 1 year ago

As per the link (YouTube) in the other thread, it seems like iGPU + increased allocation of VRAM is better than using the CPU, though it also seems APUs max out at 16GB. Maybe something AMD can improve in the future then…

rufus@discuss.tchncs.de · 1 year ago

What’s the memory bandwidth on the AMD platform?
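
For comparison purposes, peak theoretical bandwidth is just bus width in bytes times transfer rate. Below is a rough sketch of that arithmetic. The 128-bit bus is the M2 figure mentioned earlier in the thread; the LPDDR5-6400 and dual-channel DDR5-5600 rates are assumptions picked for illustration, not confirmed specs for any particular AMD APU. The tokens-per-second bound assumes the common rule of thumb that generating one token reads every weight once, so treat it as a ceiling, not a benchmark.

```python
def peak_bandwidth_gbs(bus_width_bits: int, mega_transfers_per_s: float) -> float:
    """Peak theoretical memory bandwidth in GB/s (1 GB = 1e9 bytes)."""
    bytes_per_transfer = bus_width_bits / 8
    # MT/s * bytes per transfer = MB/s; divide by 1000 for GB/s
    return bytes_per_transfer * mega_transfers_per_s / 1000


def max_tokens_per_s(bandwidth_gbs: float, model_size_gb: float) -> float:
    """Upper bound on tokens/s if each token requires reading all weights once.

    This is a rule-of-thumb assumption; real throughput will be lower.
    """
    return bandwidth_gbs / model_size_gb


if __name__ == "__main__":
    # Illustrative configurations (memory speeds are assumptions, see above)
    configs = {
        "128-bit LPDDR5-6400": (128, 6400),
        "128-bit (dual-channel) DDR5-5600": (128, 5600),
    }
    model_size_gb = 40.0  # hypothetical quantized model size
    for name, (bits, rate) in configs.items():
        bw = peak_bandwidth_gbs(bits, rate)
        bound = max_tokens_per_s(bw, model_size_gb)
        print(f"{name}: {bw:.1f} GB/s, at most {bound:.2f} tokens/s")
```

The point of the sketch is that the amount of memory you can allocate to an iGPU matters less than how fast that memory can be read, which is why the bandwidth question decides whether a big-RAM APU is worthwhile.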