Everyone is so thrilled with llama.cpp, but I want to do GPU-accelerated text generation and interactive writing. What’s the state of the art here? Will KoboldAI now download LLaMA for me?
There’s a bit more setup involved, but I would look into https://github.com/oobabooga/text-generation-webui
Hi, I’m happy to see you are willing to give llama a try! If you want GPU-accelerated processing, what you can do depends on your OS and hardware. If you have an NVIDIA card, you can use cuBLAS; instructions are here: https://github.com/ggerganov/llama.cpp#cublas . I don’t have experience with other cards, but I’ll try to help if issues arise!
Also, for more ease-of-use, try text-generation-webui (https://github.com/oobabooga/text-generation-webui). Well, ease-of-use until you want GPU acceleration, because then you’ll need to follow https://github.com/oobabooga/text-generation-webui/blob/main/docs/llama.cpp-models.md#gpu-acceleration to get that working with LLaMA.
33B and 65B models seem to be the best for storytelling and writing.
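If it helps with planning hardware for those sizes, here is a rough back-of-the-envelope sketch of how much memory the weights alone would need. The function name, the round parameter counts (33e9, 65e9), and the 4-bit figure are my own assumptions for illustration. The estimate ignores quantization metadata, the KV cache, and activations, so real usage will be higher.

```python
def weight_gib(n_params: float, bits_per_weight: float) -> float:
    """Rough size of the model weights in GiB.

    Assumption: every parameter is stored at exactly `bits_per_weight` bits.
    Real quantized formats add per-block scales, so this is a lower bound.
    """
    total_bytes = n_params * bits_per_weight / 8  # bits -> bytes
    return total_bytes / 2**30                    # bytes -> GiB


if __name__ == "__main__":
    # Nominal sizes only; the actual parameter counts differ slightly.
    for name, n_params in [("33B", 33e9), ("65B", 65e9)]:
        for bits in (4, 16):
            size = weight_gib(n_params, bits)
            print(f"{name} at {bits}-bit: ~{size:.1f} GiB for weights")
```

By this estimate, 4-bit weights come to roughly 15 GiB for 33B and 30 GiB for 65B, which is why whether the whole model fits on a single card (or only some layers can be offloaded) depends heavily on your VRAM.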