# TinyFlux-Deep

An expanded TinyFlux architecture that increases depth and width while preserving learned representations. TinyFlux-Deep is ported from TinyFlux with strategic layer expansion and attention head doubling.
## Model Description

TinyFlux-Deep extends the base TinyFlux model by:

- Doubling attention heads (2 → 4) with an expanded hidden dimension (256 → 512)
- 5× more double-stream layers (3 → 15)
- ~8× more single-stream layers (3 → 25)
- Preserving learned weights from TinyFlux in frozen anchor positions
## Architecture Comparison
| Component | TinyFlux | TinyFlux-Deep | Flux |
|---|---|---|---|
| Hidden size | 256 | 512 | 3072 |
| Attention heads | 2 | 4 | 24 |
| Head dimension | 128 | 128 | 128 |
| Double-stream layers | 3 | 15 | 19 |
| Single-stream layers | 3 | 25 | 38 |
| VAE channels | 16 | 16 | 16 |
| Total params | ~8M | ~85M | ~12B |
## Layer Mapping (Ported from TinyFlux)

The original TinyFlux weights are strategically distributed and frozen:

**Single blocks (3 → 25):**

| TinyFlux Layer | TinyFlux-Deep Position | Status |
|---|---|---|
| 0 | 0 | Frozen |
| 1 | 8, 12, 16 | Frozen (3 copies) |
| 2 | 24 | Frozen |
| — (new) | 1-7, 9-11, 13-15, 17-23 | Trainable |

**Double blocks (3 → 15):**

| TinyFlux Layer | TinyFlux-Deep Position | Status |
|---|---|---|
| 0 | 0 | Frozen |
| 1 | 4, 7, 10 | Frozen (3 copies) |
| 2 | 14 | Frozen |
| — (new) | 1-3, 5-6, 8-9, 11-13 | Trainable |
Trainable ratio: ~70% of parameters
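A minimal sketch of how this anchor layout could be applied, assuming the transformer blocks live in `nn.ModuleList`s; the attribute names `single_blocks` and `double_blocks` are assumptions about model.py, and the mapping dicts mirror the tables above:

```python
import torch.nn as nn

# Anchor positions from the tables above: {TinyFlux layer: TinyFlux-Deep positions}
SINGLE_ANCHORS = {0: [0], 1: [8, 12, 16], 2: [24]}
DOUBLE_ANCHORS = {0: [0], 1: [4, 7, 10], 2: [14]}

def freeze_anchor_blocks(blocks: nn.ModuleList, anchor_map: dict) -> None:
    """Freeze blocks at anchor positions; leave interpolating blocks trainable."""
    frozen = {pos for positions in anchor_map.values() for pos in positions}
    for idx, block in enumerate(blocks):
        for p in block.parameters():
            p.requires_grad = idx not in frozen

# Hypothetical usage, assuming these attribute names:
# freeze_anchor_blocks(model.single_blocks, SINGLE_ANCHORS)
# freeze_anchor_blocks(model.double_blocks, DOUBLE_ANCHORS)
```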
## Attention Head Expansion

The original 2 heads are copied into the same slots of the wider projection, and 2 new heads are randomly initialized:

- Old head 0 → New head 0
- Old head 1 → New head 1
- Heads 2-3 → Xavier initialized (scaled 0.02×)
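A minimal sketch of this expansion for a single projection matrix, assuming a `[heads × head_dim, in_features]` weight layout; the released port_to_deep.py may differ in detail:

```python
import torch
import torch.nn as nn

def expand_heads(old_w: torch.Tensor, new_heads: int = 4, head_dim: int = 128,
                 new_in: int = 512) -> torch.Tensor:
    """Copy old heads into slots 0..1; Xavier-init heads 2..3 scaled by 0.02."""
    new_w = torch.empty(new_heads * head_dim, new_in)
    nn.init.xavier_uniform_(new_w)
    new_w *= 0.02  # new heads start near-silent so the ported heads dominate early on
    # Old heads occupy the first rows; the input dim also grew 256 -> 512,
    # so the old weights fill only the first 256 columns.
    new_w[: old_w.shape[0], : old_w.shape[1]] = old_w
    return new_w
```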
## Text Encoders
Same as TinyFlux:
| Role | Model |
|---|---|
| Sequence encoder | flan-t5-base (768 dim) |
| Pooled encoder | CLIP-L (768 dim) |
## Training

### Strategy

- Port TinyFlux weights with dimension expansion
- Freeze ported layers as "anchor" knowledge
- Train new layers to interpolate between anchors
- Optional: unfreeze all layers and fine-tune at a lower learning rate (a sketch of this two-phase schedule follows below)
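A minimal sketch of the two-phase schedule, assuming `model` is an instantiated TinyFlux-Deep with anchors already frozen; the phase-2 learning rate is not specified in this card, so 1e-5 below is illustrative:

```python
import torch

# Phase 1: optimize only the new interpolating layers (frozen anchors excluded)
trainable = [p for p in model.parameters() if p.requires_grad]
opt = torch.optim.AdamW(trainable, lr=5e-5, betas=(0.9, 0.99), weight_decay=0.01)

# Optional phase 2: unfreeze everything and fine-tune at a lower learning rate
for p in model.parameters():
    p.requires_grad = True
opt = torch.optim.AdamW(model.parameters(), lr=1e-5, betas=(0.9, 0.99), weight_decay=0.01)
```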
### Dataset

Trained on AbstractPhil/flux-schnell-teacher-latents:

- 10,000 samples
- Pre-computed VAE latents (16, 64, 64) from 512×512 images
- Diverse prompts covering people, objects, scenes, styles
### Training Details

- Objective: Flow matching (rectified flow)
- Timestep sampling: Logit-normal with Flux shift (s=3.0)
- Loss weighting: Min-SNR-γ (γ=5.0)
- Optimizer: AdamW (lr=5e-5, β=(0.9, 0.99), wd=0.01)
- Schedule: Cosine with warmup
- Precision: bfloat16
- Batch size: 32 (16 × 2 gradient accumulation)
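A minimal sketch of a training objective consistent with the settings above. The exact Min-SNR-γ formulation for rectified flow is not given in this card, so the SNR definition below (SNR(t) = ((1−t)/t)²) and the weight min(SNR, γ)/SNR are assumptions:

```python
import torch

def flux_shift(t: torch.Tensor, s: float = 3.0) -> torch.Tensor:
    return s * t / (1 + (s - 1) * t)

def sample_timesteps(batch: int, device: str = "cuda") -> torch.Tensor:
    # Logit-normal sampling: sigmoid of a standard normal draw, then the Flux shift
    return flux_shift(torch.sigmoid(torch.randn(batch, device=device)))

def flow_matching_loss(pred_v: torch.Tensor, latents: torch.Tensor,
                       noise: torch.Tensor, t: torch.Tensor,
                       gamma: float = 5.0) -> torch.Tensor:
    # Rectified flow: x_t = (1 - t) * x0 + t * noise, so the target velocity
    # is noise - x0, and the model predicts it from (x_t, t).
    target = noise - latents
    snr = ((1 - t) / t.clamp(min=1e-5)) ** 2
    weight = torch.minimum(snr, torch.full_like(snr, gamma)) / snr.clamp(min=1e-8)
    per_sample = ((pred_v - target) ** 2).mean(dim=tuple(range(1, pred_v.ndim)))
    return (weight * per_sample).mean()
```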
## Usage

### Installation

```bash
pip install torch transformers diffusers safetensors huggingface_hub
```
### Inference

```python
import torch
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file
from transformers import T5EncoderModel, T5Tokenizer, CLIPTextModel, CLIPTokenizer
from diffusers import AutoencoderKL

# Load model (copy the TinyFlux class definition first; use TinyFluxDeepConfig)
config = TinyFluxDeepConfig()
model = TinyFlux(config).to("cuda").to(torch.bfloat16)
weights = load_file(hf_hub_download("AbstractPhil/tiny-flux-deep", "model.safetensors"))
model.load_state_dict(weights, strict=False)  # strict=False for precomputed buffers
model.eval()

# Load text encoders and VAE
t5_tok = T5Tokenizer.from_pretrained("google/flan-t5-base")
t5_enc = T5EncoderModel.from_pretrained("google/flan-t5-base", torch_dtype=torch.bfloat16).to("cuda")
clip_tok = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
clip_enc = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14", torch_dtype=torch.bfloat16).to("cuda")
vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-schnell", subfolder="vae", torch_dtype=torch.bfloat16).to("cuda")

# Encode the prompt: T5 for the sequence embedding, CLIP for the pooled embedding
prompt = "a photo of a cat sitting on a windowsill"
t5_in = t5_tok(prompt, max_length=128, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
clip_in = clip_tok(prompt, max_length=77, padding="max_length", truncation=True, return_tensors="pt").to("cuda")
with torch.no_grad():
    t5_out = t5_enc(**t5_in).last_hidden_state
    clip_out = clip_enc(**clip_in).pooler_output

# Euler sampling over a Flux-shifted timestep schedule
def flux_shift(t, s=3.0):
    return s * t / (1 + (s - 1) * t)

x = torch.randn(1, 64 * 64, 16, device="cuda", dtype=torch.bfloat16)  # latent tokens
img_ids = TinyFlux.create_img_ids(1, 64, 64, "cuda")
timesteps = flux_shift(torch.linspace(0, 1, 21, device="cuda"))
guidance = torch.tensor([3.5], device="cuda", dtype=torch.bfloat16)

with torch.no_grad():
    for i in range(20):
        t = timesteps[i].unsqueeze(0)
        dt = timesteps[i + 1] - timesteps[i]
        v = model(
            hidden_states=x,
            encoder_hidden_states=t5_out,
            pooled_projections=clip_out,
            timestep=t,
            img_ids=img_ids,
            guidance=guidance,
        )
        x = x + v * dt  # Euler step along the predicted velocity

    # Decode: unpack tokens to (B, C, H, W), undo VAE scaling, decode to pixels
    latents = x.reshape(1, 64, 64, 16).permute(0, 3, 1, 2)
    latents = latents / vae.config.scaling_factor
    image = vae.decode(latents.float()).sample
image = (image / 2 + 0.5).clamp(0, 1)
```
## Configuration

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class TinyFluxDeepConfig:
    hidden_size: int = 512
    num_attention_heads: int = 4
    attention_head_dim: int = 128
    in_channels: int = 16
    joint_attention_dim: int = 768
    pooled_projection_dim: int = 768
    num_double_layers: int = 15
    num_single_layers: int = 25
    mlp_ratio: float = 4.0
    axes_dims_rope: Tuple[int, int, int] = (16, 56, 56)
    guidance_embeds: bool = True
```
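One quick consistency check worth noting: the hidden size must factor as heads × head_dim, matching the architecture table above:

```python
cfg = TinyFluxDeepConfig()
assert cfg.hidden_size == cfg.num_attention_heads * cfg.attention_head_dim  # 4 * 128 == 512
```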
## Files

```
AbstractPhil/tiny-flux-deep/
├── model.safetensors       # Model weights (~340MB)
├── config.json             # Model configuration
├── frozen_params.json      # List of frozen parameter names
├── README.md               # This file
├── model.py                # Model architecture (includes TinyFluxDeepConfig)
├── inference_colab.py      # Inference script
├── train_deep_colab.py     # Training script with layer freezing
├── port_to_deep.py         # Porting script from TinyFlux
├── checkpoints/            # Training checkpoints
│   └── step_*.safetensors
├── logs/                   # Tensorboard logs
└── samples/                # Generated samples during training
```
## Porting from TinyFlux

To create a new TinyFlux-Deep from scratch, run port_to_deep.py, which:

1. Downloads AbstractPhil/tiny-flux weights
2. Creates the TinyFlux-Deep model (512 hidden, 4 heads, 25 single, 15 double)
3. Expands attention heads (2 → 4) and hidden dimension (256 → 512)
4. Distributes layers to anchor positions
5. Saves to AbstractPhil/tiny-flux-deep
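A minimal sketch of the width expansion in step 3: the same top-left embedding pattern as the head-expansion sketch above, generalized to any linear weight. The scaled Xavier init for the new slice is an assumption consistent with that note, not a confirmed detail of port_to_deep.py:

```python
import torch
import torch.nn as nn

def expand_linear(old_w: torch.Tensor, new_out: int, new_in: int) -> torch.Tensor:
    """Embed an old [out, in] weight in the top-left of a larger matrix."""
    new_w = torch.empty(new_out, new_in)
    nn.init.xavier_uniform_(new_w)
    new_w *= 0.02  # new rows/columns start near-silent
    new_w[: old_w.shape[0], : old_w.shape[1]] = old_w  # preserve learned weights
    return new_w

# e.g. expanding a 256 -> 512 hidden projection:
# new_proj = expand_linear(old_proj, new_out=512, new_in=512)
```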
## Comparison with TinyFlux
| Aspect | TinyFlux | TinyFlux-Deep |
|---|---|---|
| Parameters | ~8M | ~85M |
| Memory (bf16) | ~16MB | ~170MB |
| Forward pass | ~15ms | ~60ms |
| Capacity | Limited | Moderate |
| Training | From scratch | Ported + fine-tuned |
## Limitations

- Resolution: Trained on 512×512 only
- Quality: Better than TinyFlux, still below full Flux
- Text understanding: Limited by the smaller T5 encoder (768 vs 4096 dim)
- Early training: The model is actively being trained
- Experimental: Intended for research, not production
## Intended Use
- Studying model scaling and expansion techniques
- Testing layer freezing and knowledge transfer
- Rapid prototyping with moderate capacity
- Educational purposes
- Baseline for architecture experiments
## Citation

```bibtex
@misc{tinyfluxdeep2026,
  title={TinyFlux-Deep: Expanded Flux Architecture with Knowledge Preservation},
  author={AbstractPhil},
  year={2026},
  url={https://huggingface.co/AbstractPhil/tiny-flux-deep}
}
```
## Related Models
- AbstractPhil/tiny-flux - Base model (8M params)
- black-forest-labs/FLUX.1-schnell - Original Flux
## Acknowledgments
- Black Forest Labs for the original Flux architecture
- Hugging Face for diffusers and transformers libraries
## License

MIT License - see the LICENSE file for details.

Note: This is an experimental research model under active development. Training is ongoing and weights may be updated frequently.