• 164 Posts
  • 81 Comments
Joined 1 year ago
Cake day: June 10th, 2023

  • I am actively exploring this question.

    So far, it’s been the best-performing 7B model I’ve been able to get my hands on. Anyone on consumer hardware can get a GGUF version running on almost any dedicated GPU/CPU combo.
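
    For anyone curious what that looks like in practice, here’s a minimal sketch using llama-cpp-python. The GGUF filename is just an example; use whichever quantization fits your VRAM:

    ```python
    # Minimal GGUF inference sketch with llama-cpp-python.
    # The model path is an example; point it at whatever quant you downloaded.
    from llama_cpp import Llama

    llm = Llama(
        model_path="./mistral-7b-instruct-v0.1.Q4_K_M.gguf",
        n_gpu_layers=-1,  # offload all layers to the GPU; set to 0 for CPU-only
        n_ctx=4096,       # context window
    )

    out = llm("Q: What is a GGUF file? A:", max_tokens=128, stop=["Q:"])
    print(out["choices"][0]["text"])
    ```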

    I am a firm believer there is more performance and better response quality to be found in smaller-parameter models. Not to mention the interesting use cases you could unlock by fine-tuning an ensemble of them (rough sketch below).
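
    To make the ensemble idea concrete, here’s a hypothetical majority-vote sketch. The `models` here are assumed to be any callables that take a prompt and return a string (e.g., several fine-tuned 7B variants):

    ```python
    # Hypothetical majority-vote ensemble over several small models.
    from collections import Counter

    def ensemble_answer(models, prompt):
        """Query every model, then return the most common answer."""
        answers = [model(prompt) for model in models]
        return Counter(answers).most_common(1)[0][0]

    # Toy usage with stand-in "models":
    models = [lambda p: "4", lambda p: "4", lambda p: "5"]
    print(ensemble_answer(models, "What is 2 + 2?"))  # -> "4"
    ```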

    A lot of people sleep on 7B, but I think Mistral is a little different. There’s a lot of exploring to be done to find these use cases, but I think they’re out there waiting to be discovered.

    I’ll definitely report back on how my first attempt at fine-tuning this goes. Until then, I suppose it would be great for roleplay or basic chat interaction. Given its low resource requirements, it’s much more lightweight to prototype with than the larger model families and sizes.
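
    For reference, what I have in mind is roughly a QLoRA-style setup. A minimal sketch with Hugging Face transformers + peft; the hyperparameters and target modules are illustrative, not tuned:

    ```python
    # A minimal QLoRA-style fine-tuning sketch for Mistral 7B.
    # Hyperparameters here are illustrative, not recommendations.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
    from peft import LoraConfig, get_peft_model

    model_id = "mistralai/Mistral-7B-v0.1"

    # Load the base model in 4-bit so it fits on a single consumer GPU.
    bnb_config = BitsAndBytesConfig(
        load_in_4bit=True,
        bnb_4bit_compute_dtype=torch.bfloat16,
    )
    model = AutoModelForCausalLM.from_pretrained(
        model_id, quantization_config=bnb_config, device_map="auto"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_id)

    # Attach small trainable LoRA adapters; the base weights stay frozen.
    lora = LoraConfig(
        r=16, lora_alpha=32, target_modules=["q_proj", "v_proj"],
        task_type="CAUSAL_LM",
    )
    model = get_peft_model(model, lora)
    model.print_trainable_parameters()  # typically well under 1% of total params
    ```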

    If anyone else has a particular use case for 7B models, let us know here. Curious to know what others are doing with smaller-parameter models.








  • I have learned everything I know about AI through AI mentors.

    Having the ability to ask endless amounts of seemingly stupid questions does a lot for me.

    Not to mention some of the analogies and abstractions you can utilize to build your own learning process.

    I’d love to see schools start embracing the power of personalized mentors for each and every student. I think some of the first universities to embrace this methodology will produce some incredible minds.

    You should try fine-tuning that legalese model! I know I’d use it. Could be a great business idea or generally helpful for anyone you release it to.








  • I have come to believe Moore’s law is finite, and we’re starting to see the end of its exponential curve. This leads me to believe (or want to believe) there are other looming breakthroughs for compute, optimization, and/or hardware on the horizon. That, or crazy powerful GPUs are about to be a common household investment.

    I keep thinking about what George Hotz is doing in regards to this. He explained on his podcast with Lex Fridman that there is much to be explored in optimization, both with quantization of software and acceleration of hardware.

    His idea of ‘commoditize the petaflop’ is really cool. I think it’s worth bringing up here, especially given that one of his biggest goals right now is solving the at-home compute problem, in a way that would let you actually run something like a 180B model in-house, no problem.

    George Hotz’ tinybox

    ($15,000)

    • 738 FP16 TFLOPS
    • 144 GB GPU RAM
    • 5.76 TB/s RAM bandwidth
    • 30 GB/s model load bandwidth (big llama loads in around 4 seconds)
    • AMD EPYC CPU
    • 1600W (one 120V outlet)
    • Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks)
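
    That ~4-second load figure checks out with some back-of-the-envelope math:

    ```python
    # Back-of-the-envelope check of the "big llama loads in ~4 seconds" claim.
    params = 65e9           # LLaMA 65B parameters
    bytes_per_param = 2     # FP16
    size_gb = params * bytes_per_param / 1e9   # ~130 GB of weights
    print(size_gb / 30)     # 30 GB/s load bandwidth -> ~4.3 seconds
    ```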

    You can pre-order one now. You have $15k lying around, right? Lol.

    It’s definitely not easy (or cheap) now, but I think it’s going to get significantly easier to build and deploy large models for all kinds of personal use cases in the near future.

    If you’re serving/hosting models, it’s also worth checking out vLLM if you haven’t already: https://github.com/vllm-project/vllm
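
    A minimal vLLM sketch, assuming you just want batched offline generation (the model name is an example):

    ```python
    # Minimal offline batched generation with vLLM.
    from vllm import LLM, SamplingParams

    llm = LLM(model="mistralai/Mistral-7B-v0.1")  # example model
    params = SamplingParams(temperature=0.8, max_tokens=128)

    outputs = llm.generate(["Tell me about GGUF files."], params)
    for output in outputs:
        print(output.outputs[0].text)
    ```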