Inference

from diffusers import StableDiffusionPipeline
from safetensors.torch import load_model

from gemma_encoder import Encoder

if __name__ == '__main__':
    pipeline = StableDiffusionPipeline.from_single_file('rosaceae_inkRose.safetensors', vae=...)
    pipeline.enable_model_cpu_offload()

    # adapter_model: the adapter module whose weights live in adapter.safetensors
    # (construct it before this point; its definition is not shown in this card)
    encoder = Encoder(adapter_model, 'google/t5gemma-2b-2b-ul2-it', device='cpu')
    load_model(adapter_model, 'adapter.safetensors')

    text = 'your prompt here'
    image = pipeline(
        None,  # prompt is None because precomputed embeddings are passed instead
        prompt_embeds=encoder.encode(pipeline, text).to('cpu'),
        negative_prompt='bad quality, low quality, worst quality'
    ).images[0]
    image.save('preview.png')
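SD1.5's UNet cross-attends to text embeddings of shape (batch, seq_len, 768), so whatever the Gemma encoder produces must be projected down to that width before being passed as prompt_embeds. A minimal sketch of such a projection adapter, assuming a simple two-layer MLP (the layer shapes and names here are illustrative assumptions, not this repository's actual architecture):

```python
import torch
import torch.nn as nn

class ProjectionAdapter(nn.Module):
    """Hypothetical adapter: maps Gemma hidden states (e.g. 2048-d)
    into the 768-d space that SD1.5's UNet cross-attention expects."""
    def __init__(self, gemma_dim=2048, clip_dim=768):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(gemma_dim, clip_dim),
            nn.GELU(),
            nn.Linear(clip_dim, clip_dim),
        )
        self.norm = nn.LayerNorm(clip_dim)

    def forward(self, hidden_states):
        # hidden_states: (batch, seq_len, gemma_dim)
        return self.norm(self.proj(hidden_states))

adapter = ProjectionAdapter()
dummy = torch.randn(1, 77, 2048)   # stand-in for Gemma encoder output
embeds = adapter(dummy)
print(embeds.shape)                # torch.Size([1, 77, 768])
```

The output tensor can be passed directly as `prompt_embeds` to the pipeline call above.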

SD1.5 and Gemma

  • Text conditioning with spatial positional encoding: the sequence can include both image and text tokens, but the adapter was trained only on text tokens (similar to OneDiffusion, Qwen Image, etc.).
  • Supports long captions; the dataset emphasized a mix of booru tags and natural language.
  • Unlike similar T5-based setups, users don't need to write a novel or run prompts through a second LLM; it works with human-written text.
  • Character appearance and actions are prioritized over ImageNet-1K categories.
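The "spatial" positional encoding mentioned above gives each token a 2-D (row, column) position rather than a single index, which is what lets the same sequence carry both image-grid tokens and text tokens. A common way to build such an encoding is 2-D sinusoidal embeddings, where half the channels encode the row and half the column; whether this model uses exactly this variant is an assumption:

```python
import numpy as np

def sincos_2d(h, w, dim):
    """2-D sinusoidal positional encoding over an h-by-w grid.
    Returns an array of shape (h * w, dim): the first dim/2 channels
    encode the row index, the last dim/2 the column index."""
    assert dim % 4 == 0
    half = dim // 2
    omega = 1.0 / 10000 ** (np.arange(half // 2) / (half // 2))
    ys, xs = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")

    def encode(pos):
        angles = pos.reshape(-1, 1) * omega           # (h*w, half/2)
        return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

    return np.concatenate([encode(ys), encode(xs)], axis=1)

pe = sincos_2d(2, 3, 8)
print(pe.shape)   # (6, 8)
```

Under this scheme, text tokens can simply occupy one row of the grid, so a text-only training run (as described above) still produces positions compatible with later image tokens.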

Datasets

  • alfredplpl/artbench-pd-256x256
  • anime-art-multicaptions (multicharacter interactions)
  • danbooru2023-florence2-caption (verb, action clauses)
  • spatial-caption
  • SPRIGHT-T2I/spright_coco
  • colormix (synthetic color, fashion dataset)
  • trojblue/danbooru2025-metadata
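Since the card says the dataset emphasized a mix of booru tags and natural language, a caption-mixing step is the natural way to combine sources like danbooru metadata with the natural-language caption sets above. A hypothetical sketch of such a step (the drop/shuffle/ordering policy is an illustrative assumption; the actual pipeline is not published in this card):

```python
import random

def mix_caption(tags, natural, p_tags_first=0.5, tag_drop=0.2, rng=None):
    """Combine booru tags with a natural-language caption:
    randomly drop some tags, shuffle the rest, and join the two
    parts in either order."""
    rng = rng or random.Random()
    kept = [t for t in tags if rng.random() > tag_drop]
    rng.shuffle(kept)
    tag_str = ", ".join(kept)
    if not tag_str:
        return natural
    if rng.random() < p_tags_first:
        return f"{tag_str}. {natural}"
    return f"{natural} {tag_str}"

caption = mix_caption(
    ["1girl", "red_dress", "outdoors"],
    "A woman in a red dress walks through a garden.",
    rng=random.Random(0),
)
print(caption)
```

Training on both caption styles in one pass is what lets the model accept either booru tags or plain sentences at inference time.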