- cross-posted to:
- selfhost
- cross-posted to:
- selfhost
I’m currently getting a lot of timeout errors and delays processing the analysis. What GPU can I add to this? Please advise.
Unfortunately Nvidia is, by fair, the best choice for local LLM coder hosting, and there are basically two tiers:
-
Buy a used 3090, limit the clocks to like 1400 Mhz, and then host Qwen 2.5 coder 32B.
-
Buy a used 3060, host Arcee Medius 14B.
Both these will expose an OpenAI endpoint.
Run tabbyAPI instead of ollama, as it’s far faster and more vram efficient.
You can use AMD, but the setup is more involved. The kernel has to be compatible with the rocm package, and you need a 7000 card and some extra hoops for TabbyAPI compatibility.
Aside from that, an Arc B570 is not a terrible option for 14B coder models.
-
The i7-6700 has an Intel iGPU that will handle heavy transcoding just fine using Quicksync.
It will even do really fast object detection with OpenVINO, with minimal CPU usage. At least in Frigate both of those things work extremely well.
I bought that desktop exactly for that reason. The video recording itself seems to work fine, but the ai model seems to be struggling sometimes, and even when it works, it takes about half a second or more to make a classification. That’s what I want to improve with the gpu. I’m reading up on openvio, and it seems impressive, but only on frigate. Do you have any experience with Frigate vs Blue iris? What are your thoughts?
So I run a debian server, it’s a shitty little i5 4th gen of some sort with 8GB of RAM. It runs BI in a docker of Windows using Dockur, as well as docker stacks for mosquitto and deepstack, and other containers for my solar array and inverters.
I have BI AI pointing to the debian host machine’s IP with the port I used for deepstack container. This seems to be pretty good at object detection without any GPU and on a shitty little I3 processor that’s about a decade or more old.
I use BI because Frigate and any other FOSS just don’t have anything approaching the usefulness and ease of setup of BI. I’d love if there were an alternative, because I fucking loathe having a Windows install in my network, even if it is running as a docker container. But that’s not the case, so I pay for BI and use some mitigations for having to deal with Windows.
I can post the compose files if you think you might want to give this a try.
Yeah I used deepstack before, and it had a lot better detection times, but recently BI switched to using CodeProjectAI as the supported ai, so I moved over to that. It’s not as performant as Deepstack. Maybe I should try going back to deepstack even if it’s not officially supported.
That’s what I noticed and went back to deepstack. It integrates with no issues at all, just specify the port and let it go.
I’ve never used anything else so I can’t really compare, but frigate works well.
Blue Iris is windows only and really resource heavy, so thats why I’ve never used it for more than a quick test.
I’m glad you posted this because I need similar advice. I want a GPU for Jellyfin transcoding and running Ollama (for a local conversation agent for Home Assistant), splitting access to the single GPU between two VMs in Proxmox.
I would also prefer it to be AMD as a first choice or Intel as a second, because I’m still not a fan of Nvidia for their hostile attitude towards Linux and for proprietary CUDA.
(The sad thing is that I probably could have accomplished the transcoding part with just integrated graphics, but my AMD CPU isn’t an APU.)
The problem with AMD graphics cards is that the performance that CUDA, xformers and pytorch provide for nVidia cards blows anything AMD has away by a significantly high order of magnitude.
I have no idea why AMD gpus are so trash when it comes near anything involving generative AI/LLMs, DLSS, Jellyfin transcoding, or even raytracing; i would recommend waiting until their upcoming new GPU announcements.
There’s a CUDA emulator called ZLUDA that fixes a lot of that.
Development for that stopped almost a year ago because the performance difference is so much that no one used it and even AMD themselves dropped all funding to that project.
It was started again and is close to where it was before it was dropped.
It’s not functional yet.
Is that still true though? My impression is that AMD works just fine for inference with ROCm and llama.cpp nowadays. And you get much more VRAM per dollar, which means you can stuff a bigger model in there. You might get fewer tokens per second compared with a similar Nvidia, but that shouldn’t really be a problem for a home assistant. I believe. Even an Arc a770 should work with IPEX-LLM. Buy two Arc or Radeon with 16 GB VRAM each, and you can fit a Llama 3.2 11B or a Pixtral 12B without any quantization. Just make sure that ROCm supports that specific Radeon card, if you go for team red.
I’m also curious. I have also heard good things this past year about AMD and ROCm. Obviously not as close to Nvidia yet (or maybe ever) but considering the price I’ve been considering trying.
You can try a Coral TPU with CodeProjectAI. I used it for a bit but I have the USB version and it has heating/disconnect issues at times.
I don’t like using CPU for AI so plan to offload the AI to an unRAID server with a nVidia 2070 super to combine Plex decoding and AI tasks to that box.
Ryzen 5600g or any higher G model its an APU or 6700xt for a real GPU 6700 is best bang for buck but it fluctuates all the time. All amd systems are much less finnicky than nvidia. Especially on Linux.