If it’s actually High Bandwidth Memory, it’s the VRAM they use for some video cards/SoCs.
It might be mostly the same components, but the high bandwidth part is important and harder to do. They get the much higher throughput by physically stacking the memory dies on top of each other and putting the stack right next to the processor on the same package. The much shorter distance the signals have to travel, combined with a much wider bus (a lot more pins to send signals through), lets it move far more data than traditional RAM can.
I get that this is expensive. However, it should also work with regular RAM if you accept slower speeds, I guess. The question, of course, is whether it’s still usable then.
Most current locally hosted software has some option to offload to RAM, CPU, and disk. VRAM is fastest, but RAM and CPU offloading lets you cut down to less than 4GB VRAM for certain applications, at plenty reasonable speed.
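To put rough numbers on the offloading trade-off, here is a back-of-the-envelope sketch. It assumes generation is limited purely by how fast the weights can be read from memory, which is a simplification, and every figure in it (layer count, layer size, bandwidths) is a made-up placeholder rather than a measurement. The function names are hypothetical and not taken from any particular tool.

```python
def split_layers(n_layers, layer_gb, vram_budget_gb):
    """Put as many whole layers as fit into VRAM; the rest go to system RAM."""
    on_gpu = min(n_layers, int(vram_budget_gb // layer_gb))
    return on_gpu, n_layers - on_gpu


def est_seconds_per_token(on_gpu, on_cpu, layer_gb, vram_gbps, ram_gbps):
    """Assumption: each token reads every layer's weights once,
    so time is roughly (bytes read) / (memory bandwidth)."""
    return on_gpu * layer_gb / vram_gbps + on_cpu * layer_gb / ram_gbps


# Placeholder numbers, for illustration only.
gpu_layers, cpu_layers = split_layers(n_layers=32, layer_gb=0.25, vram_budget_gb=4.0)
seconds = est_seconds_per_token(gpu_layers, cpu_layers, 0.25,
                                vram_gbps=400.0, ram_gbps=50.0)
print(gpu_layers, cpu_layers, seconds)
```

With these placeholders, the layers left in system RAM account for most of the estimated time per token, which is the "slower but maybe still usable" situation being discussed.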
GPT-4 is already kinda slow - it works best as a “conversational” tool where you ask follow up questions and clarify things that have already been said. That’s painful when you have to wait 10 seconds for a response. I couldn’t imagine it being useful if it was minutes.
Having to wait 10 seconds for a response is “painful”?