Your Lemmy Crash Course to Free Open-Source AI

Blaed@lemmy.world · edit-2 1 year ago

Blaed@lemmy.world · edit-2 2 years ago

Great question. I suggest visiting UnderstandGPT for the full table, but here’s a brief breakdown of current home GPU/VRAM recommendations (as of June 2023):

| Model Size | Required VRAM (4bit) | Required VRAM (8bit) | Recommended GPU (4bit) |
|---|---|---|---|
| 7B | 6GB | 10GB | GTX 1660, 2060, AMD 5700 XT, RTX 3050, 3060 |
| 13B | 10GB | 20GB | AMD 6900 XT, RTX 2060 12GB, 3060 12GB, 3080, A2000 |
| 30B | 20GB | 40GB | RTX 3080 20GB, A4500, A5000, 3090, 4090, 6000, Tesla V100 |
| 65B | 40GB | 80GB | A100 40GB, 2x3090, 2x4090, A40, RTX A6000, 8000 |
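If you want a feel for where numbers like these come from, here is a rough back-of-the-envelope sketch. It only counts the memory needed to hold the weights (parameters × bits ÷ 8) and then adds a flat allowance for everything else. The function names and the 2GB `overhead_gb` default are my own assumptions for illustration, not a formula from UnderstandGPT, so don't expect it to reproduce the table exactly.

```python
def weight_memory_gb(params_billion: float, bits: int) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 1e9 bytes).

    params_billion * 1e9 parameters, each taking bits / 8 bytes,
    divided by 1e9 bytes per GB, simplifies to params_billion * bits / 8.
    """
    return params_billion * bits / 8


def estimate_vram_gb(params_billion: float, bits: int, overhead_gb: float = 2.0) -> float:
    """Weights plus a flat allowance for context and runtime buffers.

    overhead_gb is an assumed placeholder; real overhead depends on the
    loader, the context length, and the quantization format.
    """
    return weight_memory_gb(params_billion, bits) + overhead_gb


if __name__ == "__main__":
    for size in (7, 13, 30, 65):
        for bits in (4, 8):
            print(f"{size}B @ {bits}bit: ~{estimate_vram_gb(size, bits):.1f} GB")
```

The takeaway is that going from 8bit to 4bit halves the weight footprint, which is why quantized models fit on much smaller cards.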

In terms of CPU requirements, you can run inference on all sorts of hardware. People have run AI/LLM models on laptops and on machines with little to no GPU whatsoever, although for best results a strong GPU will be important. NVIDIA cards have an edge here: most AI software is built on CUDA, NVIDIA’s GPU compute platform, and the RTX 2xxx/3xxx/4xxx series cards add Tensor cores, dedicated hardware that speeds up the matrix math these models rely on. That’s an important detail to keep in mind if you’re choosing a card to run models on.

For my fellow gamers - the Tensor cores on NVIDIA RTX cards are what power DLSS (ray tracing runs on separate RT cores), a setting I’m sure you’ve explored turning on or off to boost your FPS. That same hardware is part of what gives you an advantage running AI on an NVIDIA GPU at home.

AMD’s counterpart to CUDA, ROCm, is not yet as widely supported by AI software, but AMD has recently partnered with HuggingFace to explore how to offer more competition in this space.

Storage is up to you. Make sure you read file sizes before downloading. I learned that the hard way: a single model can run to many gigabytes, and a handful of them can easily fill a drive. Consider dedicating a disk or large folder to all of your AI tinkering and workloads. I personally dedicate a 1TB drive to archiving the many models I experiment with, but that’s overkill for most. You could get away with 128GB/250GB/500GB of storage if you stay organized. If you plan to only run the small models, 8GB - 24GB should be plenty of room.
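Since I got burned by this myself, here is a small sketch of the kind of check you can run before a big download. It uses Python’s standard `shutil.disk_usage`; the `has_room` helper name and the 10GB `reserve_bytes` safety margin are just my own illustrative choices.

```python
import shutil


def has_room(path: str, download_bytes: int, reserve_bytes: int = 10 * 1024**3) -> bool:
    """Return True if `path` can hold the download and still keep a reserve free.

    reserve_bytes is an assumed safety margin (10 GiB by default) so the
    drive is not filled to the brim.
    """
    free = shutil.disk_usage(path).free  # free bytes on the drive holding `path`
    return free - download_bytes >= reserve_bytes


if __name__ == "__main__":
    model_size = 8 * 1024**3  # hypothetical 8 GiB model file
    if has_room(".", model_size):
        print("Enough space, go ahead and download.")
    else:
        print("Not enough free space, clean up first.")
```

Point `path` at the disk or folder you dedicate to models, and plug in the file size listed on the download page.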

For RAM, it’s suggested to have 16GB+, but it’s not as important as GPU + CPU power (a compute combination possible for GGML models - a popular format that lets you combine the power of both your graphics card and your processor). It’s worth noting RAM might help you load and unload models a little faster, especially for the larger parameter variants - but your CPU & GPU are far more important at the moment. In my opinion, 32GB/64GB of RAM is the sweet spot.
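To make the GPU + CPU split a bit more concrete, here is a minimal sketch of how you might guess how many of a model’s layers fit on your graphics card, with the rest left to the CPU and system RAM. It assumes all layers are the same size and holds back a fixed `reserve_gb` of VRAM, both simplifications of mine; the function name is hypothetical and not part of any GGML tool.

```python
def gpu_layers_that_fit(model_gb: float, n_layers: int,
                        free_vram_gb: float, reserve_gb: float = 1.0) -> int:
    """Estimate how many layers can be offloaded to the GPU.

    Assumes every layer takes model_gb / n_layers of memory and that
    reserve_gb of VRAM is held back for context and scratch buffers.
    Layers that do not fit stay on the CPU.
    """
    if n_layers <= 0:
        raise ValueError("n_layers must be positive")
    per_layer_gb = model_gb / n_layers
    usable_gb = max(free_vram_gb - reserve_gb, 0.0)
    return min(n_layers, int(usable_gb // per_layer_gb))


if __name__ == "__main__":
    # Hypothetical 8 GB quantized model with 32 layers on a 6 GB card.
    on_gpu = gpu_layers_that_fit(model_gb=8.0, n_layers=32, free_vram_gb=6.0)
    print(f"{on_gpu} layers on the GPU, {32 - on_gpu} on the CPU")
```

Treat the result as a starting point and adjust down if you run out of VRAM in practice.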

If you don’t have access to powerful GPUs, you should check out runpod.io and vast.ai. They are great cloud compute platforms that let you rent a GPU for relatively cheap (typically a few bucks an hour). They’re worth looking into if you want to tinker with the larger models. The other route is getting a GGML / quantized model running on your local hardware at home, which is 100% doable if you have at least a 1660 (or newer) graphics card. I haven’t had a lot of time to look at AMD benchmarks, so I’d love to hear how it goes for anyone running one of those cards. I’ll be doing a thorough benchmark later this month once I finish setting up the server.
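If you’re weighing renting against buying, the arithmetic is simple enough to sketch. The prices below are made-up placeholders, not quotes from runpod.io or vast.ai, and the function names are mine; it also ignores electricity and resale value.

```python
def rental_cost(hourly_rate: float, hours: float) -> float:
    """Total cost of renting a GPU for a given number of hours."""
    return hourly_rate * hours


def break_even_hours(gpu_price: float, hourly_rate: float) -> float:
    """Hours of rental after which buying the card would have been cheaper."""
    if hourly_rate <= 0:
        raise ValueError("hourly_rate must be positive")
    return gpu_price / hourly_rate


if __name__ == "__main__":
    rate = 2.0     # hypothetical price per hour
    price = 800.0  # hypothetical price of a card
    print(f"10 hours of tinkering: ${rental_cost(rate, 10):.2f}")
    print(f"Buying pays off after about {break_even_hours(price, rate):.0f} rented hours")
```

For occasional experiments with the larger models renting usually wins; if you tinker daily, the break-even point arrives quickly.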

What’s great about all of this is that the compute needed to run AI on consumer hardware keeps going down. I wouldn’t be surprised if we started to see people running models that have 65B+ parameters somewhat casually before the end of the year.
