LoRA causes strong style dilution / override on Anima – workarounds or fixes?

#60
by darask0 - opened

Hi everyone,

Anima-preview is fantastic for its super-strong prompt adherence - especially Danbooru-style artist tags. Just adding "@artistname", the plain artist name, or even partial tags often nails the exact style perfectly without any LoRA needed. That's honestly one of its biggest selling points for me.

The problem: Whenever I add almost any LoRA (character, style, concept - even ones that work flawlessly on Illustrious or similar anime bases), the Danbooru artist tags lose most or all of their influence, no matter the LoRA weight I try. The artist-specific look that normally dominates becomes very weak, generic, or completely gone/suppressed.

What I've consistently seen in tests (using ComfyUI with standard LoRA loader; same in A1111/Forge):

  • Even at very low weights (~0.4–0.7): Artist tags are already heavily diluted or basically ineffective - the LoRA starts taking over and the output drifts away from the tagged artist style.
  • At medium weights (~0.8–1.0): Artist tags do almost nothing at all - the image is dominated by the LoRA's baked-in style, often looking generic/mismatched with Anima's base aesthetic.

This happens pretty consistently across different LoRAs, so it's hard to add specific characters/concepts via LoRA while still relying on Anima's excellent artist tag control. The artist tag strength seems to get overridden or nullified in a weight-independent way (or at best, only survives meaningfully at near-zero LoRA strength, where the LoRA itself becomes useless).
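
For reference, this is roughly the kind of weight sweep I'm describing, written against the generic diffusers LoRA API purely as an illustration. It is not confirmed to work with the Anima preview (my actual tests were in ComfyUI/A1111/Forge, and a diffusers-format pipeline for Anima is an assumption here); the repo/path, LoRA file, and adapter name are placeholders.

```python
# Sketch only: hold the artist-tag prompt and seed fixed, sweep the LoRA weight,
# and compare how much of the tagged artist style survives at each weight.
# Assumes a diffusers-loadable Anima checkpoint exists; paths are placeholders.
import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained(
    "path/or/repo-to-anima",            # placeholder: local checkpoint or repo id
    torch_dtype=torch.bfloat16,
).to("cuda")

pipe.load_lora_weights("character_lora.safetensors", adapter_name="char")

prompt = "@artistname, 1girl, solo, looking at viewer"  # artist tag held constant
seed = 12345

for weight in (0.0, 0.4, 0.7, 1.0):     # 0.0 = effectively no LoRA, as a baseline
    pipe.set_adapters(["char"], adapter_weights=[weight])
    image = pipe(
        prompt=prompt,
        num_inference_steps=28,
        generator=torch.Generator("cuda").manual_seed(seed),  # same seed per run
    ).images[0]
    image.save(f"artist_tag_vs_lora_{weight:.1f}.png")
```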

Has anyone else experienced this strong suppression of artist tags when using LoRAs on Anima?

  • Is this expected / by design because of Anima's extremely powerful baked-in tag following and LLM-style text encoder aggressively overriding external LoRA adaptations?
  • Are there Anima-specific training configs or techniques (e.g. anima_train_network.py, networks.lora_anima in kohya_ss forks, special dataset captioning with multi-variant artist tags + detailed captions, avoiding certain triggers, etc.) that let LoRAs coexist better with artist tags without killing them?
  • Any solid workarounds people have found? (e.g. extremely low LoRA weights + heavy artist tag boosting like (artistname:1.4–1.6), custom negative prompts to fight suppression, clip skip tweaks, tag-only or hybrid captioning during LoRA training, avoiding DoRA entirely, etc.)

I'd really love to hear from folks who've managed to train or use character/style LoRAs successfully on Anima while keeping robust Danbooru artist tag responsiveness - without the tags getting wiped out like this.

Thanks a ton in advance for any tips, experiences, or confirmations!

What you're experiencing is what's being discussed in this thread: https://huggingface.co/circlestone-labs/Anima/discussions/44
Basically, all CLIP-less models (everything after XL) can't mix artists/styles/mediums the way XL can, due to their more precise method of generating images. The best solution I can think of is to just use LoRAs for the mixing, but I haven't tested that myself.

Not sure if we're misunderstanding each other, but LoRA training right now seems to make the model catastrophically forget.
I'm mostly hoping it forgets so easily just because the model needs more training in general.

OP is reporting that Anima suffers from catastrophic forgetting with a style LoRA. If a style dataset only has small breasts, for example, prompting for huge or gigantic gets nowhere near the sizes that Anima itself can do, and the same goes for other physical features. The latent artists not working with a LoRA is just a by-product of this, since strictly speaking it isn't a mixing problem: if you train a triggerless style LoRA, apply it, prompt for a latent artist style that works fine without the LoRA, and the LoRA absolutely nukes that latent artist knowledge, then something is wrong here.

In summary, yes, the model suffers from catastrophic forgetting with simple LoRAs.

Ah, I see. That is a problem.

CircleStone Labs org

I've already made some changes to the training that should make the final version more robust to finetuning, and more training in general will also help. For the preview version, there are two things you can do that will help the model not degrade:

  1. Don't train the LLM adapter. The example config in diffusion-pipe already does this by default, but I don't know what other training scripts are defaulting to.
  2. Use a low learning rate. For most concepts you really don't need much, especially if it's something the model already partially knows. (A rough sketch of both points in code is below.)
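
A minimal sketch of what those two points can look like in plain PyTorch, for orientation only: the module-name filters ("dit", "llm_adapter"), the loader call, and the rank/LR values are assumptions for illustration, not Anima's actual internals or the official training code.

```python
# Sketch only: LoRA-wrap the diffusion transformer's Linear layers, leave the
# LLM adapter (and everything else) frozen, and train with a low learning rate.
# "dit" / "llm_adapter" are placeholder name filters; adjust them to whatever
# the real checkpoint's module names are.
import math
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """A frozen nn.Linear plus a trainable low-rank residual branch."""
    def __init__(self, base: nn.Linear, rank: int = 32, alpha: float = 32.0):
        super().__init__()
        self.base = base
        self.base.requires_grad_(False)                      # base stays frozen
        self.scale = alpha / rank
        self.down = nn.Linear(base.in_features, rank, bias=False)
        self.up = nn.Linear(rank, base.out_features, bias=False)
        nn.init.kaiming_uniform_(self.down.weight, a=math.sqrt(5))
        nn.init.zeros_(self.up.weight)                       # starts as a no-op

    def forward(self, x):
        return self.base(x) + self.scale * self.up(self.down(x))

def attach_lora(model: nn.Module, include: str = "dit", exclude: str = "llm_adapter"):
    # Collect targets first, then swap them in, so the wrappers we insert
    # are not revisited and double-wrapped.
    targets = [
        name for name, mod in model.named_modules()
        if isinstance(mod, nn.Linear) and include in name and exclude not in name
    ]
    for name in targets:
        parent_name, _, child_name = name.rpartition(".")
        parent = model.get_submodule(parent_name) if parent_name else model
        setattr(parent, child_name, LoRALinear(getattr(parent, child_name)))
    return model

# model = load_anima_preview(...)   # placeholder: however you load the weights
# model.requires_grad_(False)       # freeze everything, including the LLM adapter
# attach_lora(model)
# lora_params = [p for p in model.parameters() if p.requires_grad]
# optimizer = torch.optim.AdamW(lora_params, lr=1e-5)   # low LR, per point 2
```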

Training a LoRA for a style without triggers, at 1e-5 or 2e-5 LR until the style fits, and without the LLM adapter still results in concepts diluting. The examples I gave, such as breast sizes diluting toward the ones in the style, were from my own experiments. What are your recommendations in this case?

If you can, training with extra, not-strictly-related data helps the model forget a bit less. It's not perfect, though.
I did not train the text encoder; I'm using sd_scripts and I assume that doesn't train the adapter either. Settings: 32 dim / 32 alpha / 0.000122 LR.
Evidently, the extra data doesn't strictly need to contain the styles you want the model to remember.
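
For concreteness, this is roughly what "mix in extra, loosely related data" means, sketched in plain PyTorch rather than through sd_scripts (which handles this in its dataset config); the folder paths, image counts, and the roughly 50/50 sampling split are made-up examples.

```python
# Illustration only: mix a small character dataset with a larger pool of
# unrelated Danbooru-style images so the character set isn't all the model sees.
from pathlib import Path
from PIL import Image
from torch.utils.data import ConcatDataset, DataLoader, Dataset, WeightedRandomSampler

class ImageCaptionFolder(Dataset):
    """Minimal image + sidecar .txt caption loader (stand-in for the trainer's own dataset)."""
    EXTS = {".png", ".jpg", ".jpeg", ".webp"}

    def __init__(self, root: str):
        self.paths = sorted(p for p in Path(root).iterdir() if p.suffix.lower() in self.EXTS)

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, i):
        path = self.paths[i]
        caption = path.with_suffix(".txt").read_text().strip()
        return Image.open(path).convert("RGB"), caption

character_ds = ImageCaptionFolder("data/character")        # e.g. ~200 curated images
extra_ds = ImageCaptionFolder("data/extra_danbooru")       # e.g. ~600 unrelated images

mixed = ConcatDataset([character_ds, extra_ds])
# Weight samples so about half of each batch still comes from the character set,
# even though the extra pool is larger.
weights = [1.0 / len(character_ds)] * len(character_ds) + [1.0 / len(extra_ds)] * len(extra_ds)
sampler = WeightedRandomSampler(weights, num_samples=len(mixed), replacement=True)
# Return raw (image, caption) pairs; a real trainer would transform/tokenize here.
loader = DataLoader(mixed, batch_size=4, sampler=sampler, collate_fn=lambda batch: batch)
```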

[Image grid: no LoRA / LoRA / LoRA with more data / reference]

Top character: 4000 steps on ~200 images vs. (almost) 12000 steps with 2 more characters added (~600 extra images). The model starts to seriously lose the watercolor style at around 3000-4000 steps, but also learns how to do her stirrups properly and more consistently at around 4000/12000 steps... None of the ~800 images is tagged "watercolor (medium)".

[Image grid: forgetgrid2]

Bottom character is a worst-case-scenario dataset of purely anime screenshots, which is likely why the character is noticeably worsened, sadly. 2000 steps, then 2000 random Danbooru images added (4000 steps total).
The sword is on the wrong side because I trained with flips, in desperation for more data. Whoops!
The extra 2000 images did contain a single cutesexyrobutts-tagged image (middle row), but none tagged setz (bottom row). I think most artist styles trend toward worsening the character the way the setz style does. Well, that's one reason why you want a better dataset.

Of course, this means training for longer; if you're not doing a multipurpose LoRA, it can be 2x longer...

Thank you for all the hard work you and the team are doing. Very much appreciated, and I'm glad you're listening to the community as well. I can't wait for the full release; it's definitely my favourite image model right now.

Thank you all for your comments.
I'll take everyone's suggestions into account and try out various things myself as well.
Thank you so much!
