"I can just use a depth map to get perfect hands!" he said...

FactorSD@lemmy.dbzer0.com · 11 months ago

It does seem to work fairly well, although I will say that it doesn’t fit my workflow at all so I haven’t done a lot of testing. I do think there are some UI things that you could look at though. Engine and Dimensions shouldn’t be minimizable lists, because the fields only take up as much space as the label does. Also, your tooltips are outrageously large, covering about 75% of the width of a 1080p monitor, which makes them quite hard to actually read.

FactorSD@lemmy.dbzer0.com · 11 months ago

It’s hard to give precise figures, because there are always tricks to getting by with a little more or less, but from my (admittedly limited) testing SDXL is significantly more demanding, and 10+GB of VRAM is probably going to be the minimum to run it. I don’t remember exactly what I was doing, but I run on an RTX A4500 card and I managed to max out the 20GB of VRAM with just one SDXL process, where I can normally run a LORA training and 512x768 size images at the same time.

FactorSD@lemmy.dbzer0.com · 11 months ago

Protip - If an image is good but not quite perfect, stick to the same seed and use the X/Y script to run the image lots of times at different CFG levels.
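
If you would rather script that sweep than click through the X/Y script, here is a minimal sketch. It assumes the txt2img field names used by the AUTOMATIC1111 web API (`prompt`, `seed`, `cfg_scale`, `steps`); the prompt, seed and CFG values are made up for illustration, and it only builds the request bodies rather than sending them anywhere.

```python
def cfg_sweep(prompt, seed, cfg_values, steps=30):
    """Build one request body per CFG value, all sharing the same seed."""
    return [
        {"prompt": prompt, "seed": seed, "cfg_scale": cfg, "steps": steps}
        for cfg in cfg_values
    ]

# Hypothetical values: the seed stays fixed, only the CFG changes.
payloads = cfg_sweep(
    "portrait of a knight in chainmail",
    seed=1234567,
    cfg_values=[3, 5, 7, 9, 11],
)
for body in payloads:
    print(body["seed"], body["cfg_scale"])
```

Because the seed never changes, any difference between the outputs comes from the CFG value alone, which is the whole point of the trick.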

FactorSD@lemmy.dbzer0.com · 11 months ago

A lot of the time I try to just let images come out as the AI imagines them - Just running img2img prompts, often in big batches, then picking the pictures that best reflect what I wanted.

But I do also have another process when I want something specific, which involves doing img2img to generate a pose and general composition, flipping that image into both a controlnet (for composition) and a segmentanything mask (for latent couple) and then respinning the same image with the same seed with those new constraints. When you run with the controlnet and the mask you can turn the CFG way down (3 or 4) but keep the coherence in the image so you get much more naturalistic outputs.

This is also a good way to work with LORAs that are either poorly made or don’t work well together - The initial output might look really burned, but when you have the composition locked in you can run the LORAs at much lower strength and with lower CFG so they sit together better.

FactorSD@lemmy.dbzer0.com · 11 months ago

The real value of SDXL isn’t the higher native resolution, it’s the improvements in rendering fingers and text and so on. But honestly I have not yet been super impressed by SDXL. It’s like when a sequel comes out and you would rather keep playing the old game with all its DLC and mods. SDXL is good, but until we have the same depth of resources available I am staying with 1.5.

FactorSD@lemmy.dbzer0.com · 11 months ago

The community will decide what is best by which model they support.

FactorSD@lemmy.dbzer0.com · 11 months ago

I am planning on cooking a LORA today - I’ll give this a go and report back.

FactorSD@lemmy.dbzer0.com · 1 year ago

I guess YMMV on whether focused is boring or not. I agree that I never really found stimulants to be super interesting, but that’s partly because it was too expensive to do coke just to work on whatever project was on my mind.

FactorSD@lemmy.dbzer0.com · 1 year ago

Most SD stuff requires specific versions of everything, and as you say the documentation is poor even on Windows. Try other forks, and you may get lucky.

FactorSD@lemmy.dbzer0.com · 1 year ago

How is it meaningfully different to the existing Scribble and Lineart controlnets that are already working in Automatic1111?

FactorSD@lemmy.dbzer0.com · 1 year ago

Prompt was presumably “Shaq to the moon!”

FactorSD@lemmy.dbzer0.com · 1 year ago

You really think people would spend a lifetime writing books if they couldn’t make money from it?

Things which are free have no value, either economic or societal. Even when we pirate stuff, at least our society encourages creative labour.

FactorSD@lemmy.dbzer0.com · 1 year ago

Controlnet of some sort is your best bet. You’ll probably need to render an imperfect scene, spit it out as a lineart PNG, take it into Photoshop and draw the lines in more correctly, stick it back into controlnet (without the pre-processor) and re-gen with the same seed and prompt.

You can do multiple rounds to get it just so.

You can just doodle a guide image for SD, but I am not good at it so I spin through until I find something I can fix.

FactorSD@lemmy.dbzer0.com · 1 year ago

I just wanted to come back and update my earlier post because HOLY SHIT I have radically improved my LORAs over the past couple of days.

Firstly; I was flat out wrong about the need to heavily tag, at least for things like garments. The guide I was following talked about styles and objects as the two types of LORA. I thought I was doing one type when actually I should have been treating it like the other. I tore the tags apart so that almost my whole training set was just trained on the single key concept, some had one or two extra terms at most. So, Mr Shiimiish can tag his helmets in a much more chill style.

Secondly; at least in my experience you need to twiddle with the Kohya settings just slightly to get genuinely good results. I was getting burned out LORAs by generation 6 or 7 before, but I turned down the learning rate to 0.00005 (half the default) with 7 repeats per epoch (instead of the laughably high 40) and it’s much much better now. The jumps between each epoch are much smaller and you can get a much more granular picture of what is actually going on. I also turned the rank, network and module dropouts to 0.1. Kohya says they are the minimum recommended values, but by default they are set to zero. I have no idea what those do, but I am definitely getting better results. I’ll do a codeblock with my full settings at the end.
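
To see why the repeats setting makes such a difference, here is some rough arithmetic as a sketch. It assumes the usual Kohya-style accounting where steps per epoch is images times repeats divided by batch size, ignoring regularization images; the 40-image training set is a made-up example, while the repeats and the batch size of 2 match the numbers in this post.

```python
import math

def steps_per_epoch(num_images, repeats, batch_size):
    # Each image is seen `repeats` times per epoch; round up for a partial last batch.
    return math.ceil(num_images * repeats / batch_size)

num_images = 40  # hypothetical training set size
batch_size = 2
epochs = 30

for repeats in (40, 7):
    per_epoch = steps_per_epoch(num_images, repeats, batch_size)
    print(f"{repeats} repeats: {per_epoch} steps per epoch, {per_epoch * epochs} over {epochs} epochs")
```

Fewer steps per epoch means each saved epoch is a smaller jump from the last one, which is why the progression looks so much more granular.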

Finally; and very very very importantly - CHECK THE FUCKING FOLDERS. If you, like me, got a bad LORA, reloaded the old config to change it around a bit then set it running again you will discover that Kohya kept the old training set, and if you changed the number of repeats the old and the new training set will now be in differently named folders but BOTH will be used in the new run, so it’ll just fuck up again and also take much much longer on your second run.
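
A quick way to check the folders before you hit go, as a sketch. It assumes the Kohya convention of naming each training subfolder `<repeats>_<name>` inside the image directory; the function names are hypothetical and you would point it at your own path.

```python
import re
from pathlib import Path

def find_training_sets(image_dir):
    """Return (repeats, name) for every subfolder that looks like '<repeats>_<name>'."""
    found = []
    for sub in sorted(Path(image_dir).iterdir()):
        match = re.fullmatch(r"(\d+)_(.+)", sub.name)
        if sub.is_dir() and match:
            found.append((int(match.group(1)), match.group(2)))
    return found

def warn_on_leftovers(image_dir):
    """Print a warning if more than one training folder would be picked up."""
    sets = find_training_sets(image_dir)
    if len(sets) > 1:
        print(f"WARNING: {len(sets)} training folders will all be used: {sets}")
    return sets
```

More than one folder is fine if you are deliberately training several concepts; the warning is only there to catch a stale folder left over from a run with a different repeat count.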

I have also been told to use regularization images - Not to download them, to make some then use them. So, go to your SD, plug in the model you are training the LORA on, give a prompt that will create “things without the LORA item on them”. So, if you are training for dudes in chainmail, put in “30 year old athletic man standing up”. Have SD generate like 500 pictures worth of that (I have no idea why that many, that’s what I was told). The idea is that you have your training data of dudes in armour, and then you have the regularization of similarly constructed dudes not in armour, and SD will look back and forth to help it figure out what you are trying to train it on. You need to use the model you are training on to do the rendering though, so make your own and don’t download other people’s even if it seems similar. Just be patient and let it run.

Since I mention models - It probably goes without saying, but don’t use the base SD1.5 model unless you actually intend to generate images with it, because it is… It’s not wonderful. It’s alright, but get something that’s more appropriate to your needs. There are plenty of models that have been trained on LOTR and GOT and are at least better at understanding that a cuirass isn’t a type of dress shirt. If you are doing anime, use an anime model, etc etc.

Getting all this stuff right has radically improved the LORAs. They are significantly more transparent; the styling in the original image will change somewhat unless you prompt against it (it’s an inevitable result of imperfectly balanced training data, it’s why so many LORAs make your people look unexpectedly Japanese) but they don’t make people’s faces melt. With the last LORA run I did today I was testing epoch 29 at CFG11 and still getting good clean images with no distortion. Previously I would be running epoch 3 or 4 at CFG3 to not get a Daliesque nightmare. Huge improvement all around, and I no longer have to take the earliest epoch that reproduces the item, there’s a big range to test and see which best preserves detail and plays well with others.

Here are the Kohya settings that were actually successful - You can’t quite just copy and paste because you will need to set up your own folders correctly yourself, and choose your model and sample prompt and all that. But you can at least run your eye down the values and copy those across. I’d say you want to run 30ish epochs from this, based on 30 to 50 good pictures. One LORA I ran today was good at about 20, the other at about 30. That might take a while, apologies about that, but I am running on a Shadow PC with an A4500 20GB, and it took about 1hr 45 to do 30 epochs, which is pretty reasonable.

Settings

"LoRA_type": "Standard",
"adaptive_noise_scale": 0,
"additional_parameters": "",
"block_alphas": "",
"block_dims": "",
"block_lr_zero_threshold": "",
"bucket_no_upscale": true,
"bucket_reso_steps": 64,
"cache_latents": true,
"cache_latents_to_disk": false,
"caption_dropout_every_n_epochs": 0.0,
"caption_dropout_rate": 0,
"caption_extension": "",
"clip_skip": "1",
"color_aug": false,
"conv_alpha": 1,
"conv_alphas": "",
"conv_dim": 1,
"conv_dims": "",
"decompose_both": false,
"dim_from_weights": false,
"down_lr_weight": "",
"enable_bucket": true,
"epoch": 30,
"factor": -1,
"flip_aug": false,
"full_fp16": false,
"gradient_accumulation_steps": "1",
"gradient_checkpointing": false,
"keep_tokens": "0",
"learning_rate": 5e-05,
"lora_network_weights": "",
"lr_scheduler": "cosine",
"lr_scheduler_num_cycles": "",
"lr_scheduler_power": "",
"lr_warmup": 10,
"max_data_loader_n_workers": "0",
"max_resolution": "512,512",
"max_token_length": "75",
"max_train_epochs": "",
"mem_eff_attn": false,
"mid_lr_weight": "",
"min_snr_gamma": 0,
"mixed_precision": "fp16",
"model_list": "custom",
"module_dropout": 0.1,
"multires_noise_discount": 0,
"multires_noise_iterations": 0,
"network_alpha": 128,
"network_dim": 128,
"network_dropout": 0.1,
"no_token_padding": false,
"noise_offset": 0,
"noise_offset_type": "Original",
"num_cpu_threads_per_process": 2,
"optimizer": "AdamW8bit",
"optimizer_args": "",
"persistent_data_loader_workers": false,
"prior_loss_weight": 1.0,
"random_crop": false,
"rank_dropout": 0.1,
"resume": "",
"sample_every_n_epochs": 1,
"sample_every_n_steps": 0,
"sample_sampler": "k_dpm_2",
"save_every_n_epochs": 1,
"save_every_n_steps": 0,
"save_last_n_steps": 0,
"save_last_n_steps_state": 0,
"save_model_as": "safetensors",
"save_precision": "fp16",
"save_state": false,
"scale_v_pred_loss_like_noise_pred": false,
"scale_weight_norms": 0,
"seed": "",
"shuffle_caption": false,
"stop_text_encoder_training": 0,
"text_encoder_lr": 1e-05,
"train_batch_size": 2,
"train_on_input": false,
"unet_lr": 5e-05,
"unit": 1,
"up_lr_weight": "",
"use_cp": false,
"use_wandb": false,
"v2": false,
"v_parameterization": false,
"vae_batch_size": 0,
"wandb_api_key": "",
"weighted_captions": false,
"xformers": true
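
If you already have a saved Kohya config and just want to see how it differs on the handful of values called out in this post, something like this sketch might help. It assumes the config is saved as a JSON object with the same key names as the settings listed here; the function names are hypothetical.

```python
import json

# The values this post singles out; everything else in the config is left alone.
KEY_SETTINGS = {
    "learning_rate": 5e-05,
    "unet_lr": 5e-05,
    "network_dropout": 0.1,
    "rank_dropout": 0.1,
    "module_dropout": 0.1,
    "epoch": 30,
}

def diff_settings(config, wanted=KEY_SETTINGS):
    """Return {key: (current, wanted)} for every key that is missing or different."""
    return {
        key: (config.get(key), value)
        for key, value in wanted.items()
        if config.get(key) != value
    }

def load_config(path):
    """Read a saved config file (assumed to be plain JSON)."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)
```

Usage would be along the lines of `diff_settings(load_config("my_lora_config.json"))`, where the file name is a placeholder for your own config. An empty result means those values already match.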

I figured I’d at least post this up so any future garment enthusiasts could at least learn a bit from my monkey-at-a-typewriter approach.

FactorSD@lemmy.dbzer0.com · edit-2 1 year ago

Spotify sucks, but the whole music industry has sucked like that for literally a hundred years - A very very few artists make bank, about 5% make a little, everyone else makes zero.

FactorSD@lemmy.dbzer0.com · 1 year ago

Does it still use the prompt+++ syntax over (prompt:1.2)?

FactorSD@lemmy.dbzer0.com · 1 year ago

Presumably “beta” in the sense of “some things won’t work, take it or leave it”

FactorSD@lemmy.dbzer0.com · 1 year ago

There’s a weird modern military turn-based strategy game where you fight invading orcs. It’s called Spellcross and until recently it was only available through Hall of the Underdogs. Great game, very Xcom, balls hard.

FactorSD@lemmy.dbzer0.com · 1 year ago

Royalties are part of the music business. In TV, everyone gets paid per episode.

FactorSD@lemmy.dbzer0.com · 1 year ago

Most artists never make any money at all…

FactorSD@lemmy.dbzer0.com · 1 year ago

"I can just use a depth map to get perfect hands!" he said...