TinyFlux

A roughly 1/12-scale Flux architecture for experimentation and research. TinyFlux keeps the core MMDiT (Multimodal Diffusion Transformer) design of Flux while dramatically reducing the parameter count for faster iteration and lower resource requirements.

Model Description

TinyFlux is a miniaturized version of FLUX.1-schnell that preserves the essential architectural components:

  • Double-stream blocks (MMDiT style) - separate text/image pathways with joint attention
  • Single-stream blocks - concatenated text+image with shared weights
  • AdaLN-Zero modulation - adaptive layer norm with gating (see the sketch after this list)
  • 3D RoPE - rotary position embeddings for temporal + spatial positions
  • Flow matching - rectified flow training objective
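
As a concrete illustration of the modulation mechanism, here is a minimal PyTorch sketch of an AdaLN-Zero layer. The class and layer names are illustrative assumptions, not TinyFlux's actual code (see model.py for that):

import torch
import torch.nn as nn

class AdaLNZero(nn.Module):
    # Minimal AdaLN-Zero sketch: the conditioning vector predicts a
    # per-block shift, scale, and gate; zero-init makes each block
    # start out as an identity mapping.
    def __init__(self, dim: int):
        super().__init__()
        self.norm = nn.LayerNorm(dim, elementwise_affine=False)
        self.proj = nn.Linear(dim, 3 * dim)
        nn.init.zeros_(self.proj.weight)
        nn.init.zeros_(self.proj.bias)

    def forward(self, x, cond):
        # x: (batch, tokens, dim); cond: (batch, dim)
        shift, scale, gate = self.proj(nn.functional.silu(cond)).chunk(3, dim=-1)
        x = self.norm(x) * (1 + scale.unsqueeze(1)) + shift.unsqueeze(1)
        return x, gate  # caller multiplies the block output by gate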

Architecture Comparison

| Component            | Flux | TinyFlux | Scale     |
|----------------------|------|----------|-----------|
| Hidden size          | 3072 | 256      | /12       |
| Attention heads      | 24   | 2        | /12       |
| Head dimension       | 128  | 128      | preserved |
| Double-stream layers | 19   | 3        | /6        |
| Single-stream layers | 38   | 3        | /12       |
| VAE channels         | 16   | 16       | preserved |
| Total params         | ~12B | ~8M      | /1500     |
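
For reference, the table maps onto a configuration object roughly like the following. The field names are assumptions for illustration; the authoritative definition is the TinyFluxConfig class in model.py:

from dataclasses import dataclass

@dataclass
class TinyFluxConfig:  # illustrative field names, not necessarily the repo's
    hidden_size: int = 256          # 2 heads x 128 head dim
    num_attention_heads: int = 2
    head_dim: int = 128
    num_double_layers: int = 3      # MMDiT double-stream blocks
    num_single_layers: int = 3      # single-stream blocks
    in_channels: int = 16           # VAE latent channels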

Text Encoders

TinyFlux uses smaller text encoders than standard Flux:

| Role             | Flux              | TinyFlux               |
|------------------|-------------------|------------------------|
| Sequence encoder | T5-XXL (4096 dim) | flan-t5-base (768 dim) |
| Pooled encoder   | CLIP-L (768 dim)  | CLIP-L (768 dim)       |

Training

Dataset

Trained on AbstractPhil/flux-schnell-teacher-latents:

  • 10,000 samples
  • Pre-computed VAE latents (16, 64, 64) from 512×512 images, flattened to a token sequence as sketched below
  • Diverse prompts covering people, objects, scenes, styles
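
Given the (16, 64, 64) latent layout, a single latent maps to the model's token sequence by flattening the spatial grid, assuming no 2×2 patchification (consistent with the shapes used in the inference example below):

import torch

latent = torch.randn(16, 64, 64)                       # (C, H, W) as stored in the dataset
tokens = latent.permute(1, 2, 0).reshape(64 * 64, 16)  # (H*W, C) sequence fed to the model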

Training Details

  • Objective: Flow matching (rectified flow)
  • Timestep sampling: Logit-normal with Flux shift (s=3.0); see the sketch after this list
  • Loss weighting: Min-SNR-γ (γ=5.0); see the loss sketch under Flow Matching Formulation
  • Optimizer: AdamW (lr=1e-4, β=(0.9, 0.99), wd=0.01)
  • Schedule: Cosine with warmup
  • Precision: bfloat16
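
A minimal sketch of the timestep sampler described above, assuming the standard form of the Flux time shift (train_colab.py is authoritative):

import torch

def sample_timesteps(batch_size: int, shift: float = 3.0) -> torch.Tensor:
    # Logit-normal: sigmoid of a standard normal concentrates t away from 0 and 1
    t = torch.sigmoid(torch.randn(batch_size))
    # Flux-style time shift (s = 3.0); depending on whether t=0 or t=1 is
    # treated as pure noise, the shift may be applied to t or to (1 - t)
    return shift * t / (1 + (shift - 1) * t)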

Flow Matching Formulation

Interpolation: x_t = (1 - t) * noise + t * data
Target velocity: v = data - noise
Loss: MSE(predicted_v, target_v) * min_snr_weight(t)
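
Putting the three lines above together, here is a minimal loss sketch, assuming SNR(t) = (t / (1 - t))^2 for this interpolation; the exact Min-SNR adaptation in train_colab.py may differ:

import torch
import torch.nn.functional as F

def flow_matching_loss(v_pred, data, noise, t, gamma: float = 5.0):
    # v_pred, data, noise: (B, ...) tensors; t: (B,) timesteps in (0, 1)
    v_target = data - noise
    snr = (t / (1 - t).clamp(min=1e-4)) ** 2              # signal coeff t, noise coeff (1 - t)
    weight = torch.minimum(snr, torch.full_like(snr, gamma)) / snr.clamp(min=1e-8)
    mse = F.mse_loss(v_pred, v_target, reduction="none")
    per_sample = mse.reshape(mse.shape[0], -1).mean(dim=1)
    return (weight * per_sample).mean()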

Usage

Installation

pip install torch transformers diffusers safetensors huggingface_hub sentencepiece

(sentencepiece is required by the SentencePiece-based T5Tokenizer used below.)

Inference

import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL

# Load model (TinyFluxConfig and TinyFlux are defined in model.py in this
# repo; copy or import that file first)
config = TinyFluxConfig()
model = TinyFlux(config).to("cuda").to(torch.bfloat16)

weights = load_file(hf_hub_download("AbstractPhil/tiny-flux", "model.safetensors"))
model.load_state_dict(weights)
model.eval()

# Load encoders
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")

# Encode prompt
prompt = "a photo of a cat"
t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
t5_out = t5_enc(**t5_in).last_hidden_state
clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
clip_out = clip_enc(**clip_in).pooler_output

# Euler sampling: integrate the velocity field from t=0 (noise) to t=1 (data)
x = torch.randn(1, 64*64, 16, device="cuda", dtype=torch.bfloat16)  # (batch, H*W tokens, C)
img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
timesteps = torch.linspace(0, 1, 21, device="cuda")
guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)  # constant, so build it once

for i in range(20):
    t = timesteps[i].unsqueeze(0)
    dt = timesteps[i+1] - timesteps[i]
    
    v = model(
        hidden_states=x,
        encoder_hidden_states=t5_out,
        pooled_projections=clip_out,
        timestep=t,
        img_ids=img_ids,
        guidance=guidance,
    )
    x = x + v * dt

# Decode: unflatten tokens back to (batch, C, H, W) latents
latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
latents = latents / vae.config.scaling_factor
# Note: the Flux VAE config also defines a shift_factor; if the training latents
# were stored as (z - shift_factor) * scaling_factor, undo both here instead:
# latents = latents / vae.config.scaling_factor + vae.config.shift_factor
image = vae.decode(latents.float()).sample
image = (image / 2 + 0.5).clamp(0, 1)
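
To write the decoded tensor to disk (not part of the original snippet), one option:

import numpy as np
from PIL import Image

arr = (image[0].permute(1, 2, 0).cpu().numpy() * 255).round().astype(np.uint8)
Image.fromarray(arr).save("sample.png")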

Full Inference Script

See inference_colab.py in this repo for a complete generation pipeline with:

  • Classifier-free guidance (a generic sketch follows this list)
  • Batch generation
  • Image saving
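
For classifier-free guidance specifically, the usual pattern is two forward passes per step with the predicted velocities mixed linearly. This is a generic sketch using the model signature from the example above, not necessarily how inference_colab.py implements it:

def cfg_velocity(model, x, t, img_ids, guidance,
                 cond, pooled, uncond, pooled_uncond, cfg_scale=4.0):
    # Conditional and unconditional passes share everything but the text embeddings
    v_cond = model(hidden_states=x, encoder_hidden_states=cond,
                   pooled_projections=pooled, timestep=t,
                   img_ids=img_ids, guidance=guidance)
    v_uncond = model(hidden_states=x, encoder_hidden_states=uncond,
                     pooled_projections=pooled_uncond, timestep=t,
                     img_ids=img_ids, guidance=guidance)
    return v_uncond + cfg_scale * (v_cond - v_uncond)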

Files

AbstractPhil/tiny-flux/
├── model.safetensors      # Model weights (~32MB)
├── config.json            # Model configuration
├── README.md              # This file
├── model.py               # Model architecture definition
├── inference_colab.py     # Inference script
├── train_colab.py         # Training script
├── checkpoints/           # Training checkpoints
│   └── step_*.safetensors
├── logs/                  # TensorBoard logs
└── samples/               # Generated samples during training

Limitations

  • Resolution: Trained on 512×512 only
  • Quality: Significantly lower than full Flux due to reduced capacity
  • Text understanding: Limited by smaller T5 encoder (768 vs 4096 dim)
  • Fine details: May struggle with complex scenes or fine-grained details
  • Experimental: Intended for research and learning, not production use

Intended Use

  • Understanding Flux/MMDiT architecture
  • Rapid prototyping and experimentation
  • Educational purposes
  • Resource-constrained environments
  • Baseline for architecture modifications

Citation

If you use TinyFlux in your research, please cite:

@misc{tinyflux2025,
  title={TinyFlux: A Miniaturized Flux Architecture for Experimentation},
  author={AbstractPhil},
  year={2025},
  url={https://huggingface.co/AbstractPhil/tiny-flux}
}

License

MIT License - See LICENSE file for details.


Note: This is an experimental research model. For high-quality image generation, use the full FLUX.1-schnell or FLUX.1-dev models.
