LocalAI is the free, open source locally run drop-in replacement REST API for OpenAI

☆ Yσɠƚԋσʂ ☆@lemmygrad.ml · 2 days ago

LocalAI is the free, open source locally run drop-in replacement REST API for OpenAI

JucheStalin@lemmygrad.ml · 1 day ago

So it’s a fancy proxy to existing AI offerings?

☆ Yσɠƚԋσʂ ☆@lemmygrad.ml · 1 day ago

It’s a way to run models on your local machine and provide an API that’s compatible with OpenAI that can be used by apps that normally rely on that.

Kultronx@lemmygrad.ml · 5 hours ago

what is the advantage of doing something like this? i am a layperson

☆ Yσɠƚԋσʂ ☆@lemmygrad.ml · 4 hours ago

Privacy and ability to generate content you want. Using commercial services like OpenAI means your data is sent to their servers, so anything you query is known to the company, and their models are often restricted in terms of content they will allow you to generate. For example, Google’s Gemini will refuse to deal with many political subjects.

Kultronx@lemmygrad.ml · 4 hours ago

is there a guide how to use this? i downloaded the zip file but i have no idea what to do with it

☆ Yσɠƚԋσʂ ☆@lemmygrad.ml · 4 hours ago

Oh haha it’s a bit tricky if you’re not technical, the easiest way is to use docker, but that’s a whole thing of itself. If you want just an app you can run, one of these is easier to get going with

Kultronx@lemmygrad.ml · 4 hours ago

Thanks for the tips, I will check those out

JucheStalin@lemmygrad.ml · 5 hours ago

Hm so it downloads fixed models and works without an internet connection? Interesting.

☆ Yσɠƚԋσʂ ☆@lemmygrad.ml · 4 hours ago

Right, you can download any publicly available model and run it without using the internet. Caveat is that you do need a relatively fast machine to make it performant.

FuckBigTech347@lemmygrad.ml · 3 hours ago

For reference the oldest card I have that Vulkan supports is an RX 560 that I bought in 2017 (I’m on GNU/Linux w/ amdgpu and the RADV mesa driver aka. “The Default”). Most medium models on it run at around 6 - 10 Tokens/s. Some crawl to below 6 Tokens/s though and become slower the longer the answer they output is, probably because parts of the model is in RAM since that card has “only” 4GB of VRAM. Models that fully fit in VRAM are a lot faster.