Context: Pony Diffusion v6 is one of the most popular SD models, and the upcoming v7 has the potential for similar popularity. An interesting aspect is their controversial decision to use AuraFlow as the base model, rather than Flux or SD3. The creator of Pony Diffusion (AstraliteHeart) was interviewed on a Civitai stream two weeks ago, where they discussed this choice further. I don’t use Discord, so if you have more visibility and insight into the details, I’d like to hear it.
As mentioned in the stream, as of Nov 2024 some of the big drawbacks with AuraFlow are the high VRAM usage (apparently 24GB of VRAM to generate a 1024x1024 image) and the lack of tooling (afaik there are no ControlNets or training scripts for making LoRAs, and many generation UIs like A1111 don’t even support it yet). These sound like big issues, although the stream host points out the recent release of Mochi:
Mochi, the video model released two weeks ago: on release, the developers said you’re going to need three H100s [80GB each] to run this model. And now, two weeks later, you can run it on 12GB of VRAM. So I wouldn’t be too worried about this.
There has been a long-standing claim that the missing tools will be built and optimized by the community once there is a decent user base around AuraFlow, and it’s reassuring to have real examples of these rapid leaps in accessibility and efficiency to point to. And I believe the Pony project is one of the things that has real potential to bring in that kind of rapid development activity.
Which brings to mind another side of the choice to use AuraFlow, something I would casually call an activist aspect. And I don’t mean ‘activist’ in a melodramatic way; I mean it just as much as me saying ‘You should help make Lemmy more active because reddit abuses its users’ is activism: I believe one tool is better for our communities, and therefore I choose to use my small influence to promote it.

I’m also not saying ‘activist’ as a solid claim, accusation or glorification, because AstraliteHeart’s actual reasons for choosing AuraFlow could simply be ‘I prefer their commerce-enabling license’ or ‘I think this base model is more effective for this specific project’. I honestly don’t know. On the other hand, I notice they praise Simo and their team for this open project, and whether or not it’s intentional, Pony shines a big spotlight on their admirable work. Beyond that, upon launch, Pony could even be the catalyst that enables AuraFlow to receive major community support and remain competitive with the venture capital-fueled Flux, SD and others.
If PonyFlow is deemed a groundbreaking finetune, with results strong enough to bring its huge audience from SDXL over to AuraFlow, that’s a powerful force, and one big enough to drive technical development, just like the reddit API exodus brought a wave of devs into Lemmy development, resulting in important improvements in a relatively short time. When I say a powerful force, here are a few stats from Civitai shared on the stream:
468,000 downloads, 160 million on-site generations. Out of the 3,500 LoRAs that we train every 24 hours […] the vast majority are Pony-based.
If PonyFlow can show those people it’s worth crying out for, generation services like Civitai would be crazy not to try and support it, and there will be significant demand for other open-source tools like generators and trainers to support AuraFlow. So if Pony can bring those kinds of boosts to an open project, then I say good on them for it, and I think anyone wary of venture-capital enshittification should support this push towards a more open tool.
edit: just found this
I had no idea this took so much VRAM. I assumed it was around SDXL in size. Also, I don’t feel like the super artists thing is going to solve the problem of ugly, generic styles. It’s just going to make several boring styles you’re going to want to LoRA away. Which is a problem on a model with such high requirements already.
I had no idea this took so much VRAM
fwiw, an update I didn’t see before:
I did some experiments, 16GB should be doable right now with just weight unloading, so the comfy workflow should just work.
From /r/StableDiffusion/comments/1gpa65w/v7_updates_on_civitai_twitch_stream_tomorrow_nov/lwxc7jj/
It’s still a lot, and restrictive for consumer GPUs, although that’s already a one-third reduction from the original 24GB, and optimistically they could be right about the number continuing to drop as optimizations happen (see the Mochi quote).
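For a concrete picture of what ‘weight unloading’ means in practice outside ComfyUI, here’s a minimal sketch using the Hugging Face diffusers library, which does ship an AuraFlow pipeline. The fal/AuraFlow model ID and the exact memory behaviour are my assumptions for illustration, not anything confirmed on the stream; enable_model_cpu_offload() simply keeps only the component currently running on the GPU and parks the rest in system RAM, trading speed for a lower VRAM peak.

```python
# Minimal sketch of weight offloading with Hugging Face diffusers.
# Assumes a diffusers version with AuraFlow support and the "fal/AuraFlow"
# checkpoint; actual VRAM usage will vary by GPU, driver and library version.
import torch
from diffusers import AuraFlowPipeline

pipe = AuraFlowPipeline.from_pretrained(
    "fal/AuraFlow",
    torch_dtype=torch.float16,  # half precision roughly halves weight memory
)

# Keep only the component currently running (text encoder, transformer, VAE)
# on the GPU and move the rest to system RAM; slower, but a much lower VRAM peak.
pipe.enable_model_cpu_offload()

image = pipe(
    prompt="a pony standing in a sunlit meadow",
    height=1024,
    width=1024,
    num_inference_steps=28,
).images[0]
image.save("auraflow_test.png")
```

Offloading like this is the ‘works today, just slower’ kind of optimization; the bigger VRAM drops (as Mochi got) usually come later from quantization and better kernels, which depends on tooling catching up.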
I assumed it was around SDXL in size.
I just assumed that since AuraFlow is competing with FLUX and SD3.5, it would be closer to them in size. Honestly, Pony v6 will probably stay alive and active until higher-VRAM GPUs become more and more normal.