I was pleasantly surprised by many models in the DeepSeek family. Verbose, but in a good way? At least that was my experience. Love to see it mentioned here.
This is on the horizon - I will definitely be making a post on the workflow and process once it is figured out.
I am actively exploring this question.
So far - it’s been the best performing 7B model I’ve been able to get my hands on. Anyone running consumer hardware could get a GGUF version running on almost any dedicated GPU/CPU combo.
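For a rough sense of why a 7B model fits on consumer hardware, here is a minimal sketch of the weights-only arithmetic. The bit widths are nominal assumptions on my part: real GGUF quantization formats store some extra scaling data per block, and the KV cache and runtime buffers need memory on top of this.

```python
# Rough weights-only memory estimate for a model at different precisions.
# Assumption: nominal bits per weight. Real GGUF files are somewhat larger,
# and inference also needs room for the KV cache and runtime buffers.

def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Approximate size of the weights alone, in gigabytes (1 GB = 1e9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


if __name__ == "__main__":
    n_params = 7e9  # a "7B" model, rounded
    for label, bits in [("FP16", 16), ("8-bit", 8), ("4-bit", 4)]:
        print(f"{label:>5}: ~{weight_footprint_gb(n_params, bits):.1f} GB")
```

At roughly 4 bits per weight the weights alone come in under 4 GB, which is why even modest GPUs (or plain system RAM) can hold them.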
I am a firm believer there is more performance and better quality of responses to be found in smaller parameter models. Not to mention the interesting use cases you could unlock by combining fine-tuning with an ensemble approach.
A lot of people sleep on 7B, but I think Mistral is a little different - there’s a lot of exploring to be had finding these use cases but I think they’re out there waiting to be discovered.
I’ll definitely report back on how my first attempt at fine-tuning this goes. Until then, I suppose it would be great for any roleplay or basic chat interaction. Given its low overhead, it’s much more lightweight to prototype with than the other families and model sizes.
If anyone else has a particular use case for 7B models - let us know here. Curious to know what others are doing with smaller params.
What I find interesting is how useful these tools are (even with the imperfections that you mention). Imagine a world where this level of intelligence has a consistent low error rate.
Semantic computation and agentic function calling with this level of accuracy will revolutionize the world. It’s only a matter of time, adoption, and availability.
I respect your honesty.
Google has absolutely tanked for me these last few years. It revolutionized the world by revolutionizing search. But ChatGPT has done the same, now better - and in a much more interesting way.
I’ll take a 10 second prompt process over 20 minutes of hunting down (advertised) paged results any day of the week.
I have learned everything I know about AI through AI mentors.
Having the ability to ask endless amounts of seemingly stupid questions does a lot for me.
Not to mention some of the analogies and abstractions you can utilize to build your own learning process.
I’d love to see schools start embracing the power of personalized mentors for each and every student. I think some of the first universities to embrace this methodology will produce some incredible minds.
You should try fine-tuning that legalese model! I know I’d use it. Could be a great business idea or generally helpful for anyone you release it to.
I cannot overstate how nice it is having a coding assistant 24/7.
I’m curious to see how projects like ChatDev evolve over time. I think agentic tooling is going to take us to some very sci-fi looking territory.
Semantic computation is the future.
I never considered 8 - 11. Those are really interesting use cases. I’m with you on every other point. I’m particularly interested in solving the messy unstructured notes scenario. I really feel you on that one. I’ll see what I can do!
What I find particularly exciting is that we’re seeing this evolution in real-time.
Can you imagine what these models might look like in 2 years? 5? 10?
There is a remarkable future on the horizon. I hope everyone gets an equal chance to be a part of it.
I could not agree more. I really enjoy Andrej Karpathy’s model where in the future AGI does 99% of the technical work and the human in the loop does the creative and critical 1%.
Mistral seems to be the popular choice. I think it’s the most open-source friendly out of the bunch. I will keep function calling in mind as I design some of our models! Thanks for bringing that up.
I have come to believe Moore’s law is finite, and we’re starting to see the exponential end of it. This leads me to believe (or want to believe) there are other looming breakthroughs for compute, optimization, and/or hardware on the horizon. That, or crazy powerful GPUs are about to be a common household investment.
I keep thinking about what George Hotz is doing with regard to this. He explained in his podcast episode with Lex Fridman that there is much to be explored in optimization, both with quantization of software and acceleration of hardware.
His idea of ‘commoditize the petaflop’ is really cool. I think it’s worth bringing up here, especially since one of his biggest goals right now appears to be solving the at-home compute problem, in a way that lets you actually run something like a 180B model in-house, no problem.
George Hotz’s tinybox ($15,000):
738 FP16 TFLOPS
144 GB GPU RAM
5.76 TB/s RAM bandwidth
30 GB/s model load bandwidth (big llama loads in around 4 seconds)
AMD EPYC CPU
1600W (one 120V outlet)
Runs 65B FP16 LLaMA out of the box (using tinygrad, subject to software development risks)
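As a back-of-the-envelope check, the listed numbers hang together. This is only a sketch: every input is taken from the spec list above, and the decode ceiling assumes (my assumption, not a tinybox claim) that single-stream generation is limited purely by reading every weight once per token at the full aggregate bandwidth.

```python
# Sanity check of the listed tinybox specs using simple arithmetic.
# Inputs come from the spec list; nothing here is measured.

GPU_RAM_GB = 144
RAM_BANDWIDTH_GB_S = 5.76 * 1000  # 5.76 TB/s expressed in GB/s
LOAD_BANDWIDTH_GB_S = 30


def fp16_weights_gb(n_params: float) -> float:
    """FP16 stores 2 bytes per parameter."""
    return n_params * 2 / 1e9


def load_seconds(size_gb: float, bandwidth_gb_s: float) -> float:
    """Time to stream the weights at the given bandwidth."""
    return size_gb / bandwidth_gb_s


def decode_ceiling_tokens_per_s(size_gb: float, bandwidth_gb_s: float) -> float:
    """Idealized upper bound: one full pass over the weights per token."""
    return bandwidth_gb_s / size_gb


if __name__ == "__main__":
    size = fp16_weights_gb(65e9)
    print(f"65B FP16 weights: {size:.0f} GB (fits in {GPU_RAM_GB} GB: {size < GPU_RAM_GB})")
    print(f"Load time: ~{load_seconds(size, LOAD_BANDWIDTH_GB_S):.1f} s")
    print(f"Decode ceiling: ~{decode_ceiling_tokens_per_s(size, RAM_BANDWIDTH_GB_S):.0f} tokens/s")
```

The 65B FP16 weights work out to about 130 GB, which fits in 144 GB, and loading them at 30 GB/s takes a little over 4 seconds, matching the "around 4 seconds" figure. Real throughput would land below the idealized ceiling.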
You can pre-order one now. You have $15k lying around, right? Lol.
It’s definitely not easy (or cheap) now, but I think it’s going to get significantly easier to build and deploy large models for all kinds of personal use cases in our near and distant futures.
If you’re serving/hosting models, it’s also worth checking out vLLM if you haven’t already: https://github.com/vllm-project/vllm
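If you want a feel for what talking to a vLLM deployment looks like, here is a minimal sketch that uses only the Python standard library against vLLM’s OpenAI-compatible completions endpoint. The host, port, and model name are assumptions for illustration; swap in whatever your own server was started with.

```python
import json
import urllib.request

# Assumptions (not from the post): a vLLM OpenAI-compatible server is already
# running locally on port 8000, and MODEL matches the model it is serving.
BASE_URL = "http://localhost:8000/v1"
MODEL = "mistralai/Mistral-7B-Instruct-v0.1"  # placeholder model name


def build_completion_request(prompt: str, max_tokens: int = 64) -> urllib.request.Request:
    """Build (but do not send) a POST request for the /completions endpoint."""
    payload = {"model": MODEL, "prompt": prompt, "max_tokens": max_tokens}
    return urllib.request.Request(
        f"{BASE_URL}/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def complete(prompt: str) -> str:
    """Send the request and return the generated text (needs a running server)."""
    with urllib.request.urlopen(build_completion_request(prompt)) as resp:
        body = json.load(resp)
    return body["choices"][0]["text"]
```

With a server up, `complete("Explain GGUF in one sentence.")` would return the generated text. Because the endpoint follows the OpenAI API shape, existing OpenAI client code can usually be pointed at it by changing the base URL.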
Loved reading everyone’s comments on this one. If you’re here and reading this post now, check out this related thread - you might be interested!
I totally forgot to include vLLM!
If you’re building, deploying, or hosting LLMs, you should definitely check this out.
REDACTED
🥚
This was really nice to celebrate with the (17?) people who saw this post and shared this moment with me. I have no idea what this future will hold, but I’m glad I started this community. I’m going to delete (or archive?) this post now. It feels too selfish to keep up. If you caught this comment, you found an easter egg! Congrats, digital sleuth.
This is all very good feedback! I appreciate everyone who has commented so far. I will leave this post pinned for the remainder of the year for anyone (new member or old) to share their thoughts and what else they (you) think we should explore next with !fosai@lemmy.world.
What sort of tokens per second are you seeing with your hardware? Mind sharing some notes on what you’re running there? Super curious!