Ambient Diffusion Omni (Ambient-o): Training Good Models with Bad Data
Model Description
Ambient Diffusion Omni (Ambient-o) is a framework for using low-quality, synthetic, and out-of-distribution images to improve the quality of diffusion models. Unlike traditional approaches that rely on highly curated datasets, Ambient-o extracts valuable signal from all available images during training, including data typically discarded as "low-quality."
This model card is for a text-to-image diffusion model trained for only two days on 8 H100 GPUs. The key innovation is the use of synthetic data as "noisy" samples.
Architecture
Ambient-o builds upon the MicroDiffusion codebase: we use a Mixture-of-Experts Diffusion Transformer totaling ~1.1B parameters.
Text-to-Image Results
Ambient-o demonstrates improvements in text-to-image generation. Compared to the two baselines of 1) filtering out low-quality samples and 2) treating all data equally, Ambient-o achieves higher diversity than 1) and higher quality than 2). In other words, it improves visual quality without sacrificing diversity.
Training Data Composition
The model was trained on a diverse mixture of datasets:
- Conceptual Captions (CC12M): 12M image-caption pairs
- Segment Anything (SA1B): 11.1M high-resolution images with LLaVA-generated captions
- JourneyDB: 4.4M synthetic image-caption pairs from Midjourney
- DiffusionDB: 10.7M quality-filtered synthetic image-caption pairs from Stable Diffusion
Data from DiffusionDB were treated as noisy samples.
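As a rough illustration of how this mixture might be organized, each source dataset could carry a per-sample quality flag before mixing. The wrapper below is a hypothetical sketch (the `is_noisy` flag and the `QualityAnnotatedDataset` name are illustrative, not part of the released code):

from torch.utils.data import Dataset, ConcatDataset

class QualityAnnotatedDataset(Dataset):
    # Wraps an image-caption dataset and tags every sample as clean or noisy.
    # Illustrative only; the actual training code may organize this differently.
    def __init__(self, base_dataset, is_noisy: bool):
        self.base = base_dataset
        self.is_noisy = is_noisy

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        image, caption = self.base[idx]
        return image, caption, self.is_noisy

# e.g., mixed = ConcatDataset([QualityAnnotatedDataset(cc12m, False),
#                              QualityAnnotatedDataset(diffusiondb, True)])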
Technical Approach
High Noise Regime
At high diffusion times, the model leverages the theoretical insight that noise contracts distributional differences, reducing the mismatch between the high-quality target distribution and the mixed-quality training data. This creates a beneficial bias-variance trade-off: the low-quality samples introduce some bias, but they also increase the sample size and reduce the variance of the estimator.
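A minimal sketch of one way this could be implemented: samples flagged as noisy only draw diffusion times above a cutoff, so they never contribute in the low-noise regime. The cutoff `t_min` and the uniform time sampling are illustrative assumptions, not the paper's exact schedule:

import torch

def sample_diffusion_times(is_noisy: torch.Tensor, t_min: float = 0.4) -> torch.Tensor:
    # Clean samples draw t ~ U[0, 1]; noisy samples draw t ~ U[t_min, 1],
    # so they are only seen where added noise has contracted the mismatch.
    t = torch.rand(is_noisy.shape[0])
    return torch.where(is_noisy, t_min + (1.0 - t_min) * t, t)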
Low Noise Regime
At low diffusion times, the model exploits the locality properties of natural images: it trains on small image crops, which makes it possible to borrow high-frequency details from out-of-distribution or synthetic images whenever their crop-level marginal distributions match the target data.
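A hedged sketch of the crop extraction this describes (the crop size, and taking one crop per image, are assumptions made for illustration):

import torch

def random_crops(images: torch.Tensor, crop_size: int = 32) -> torch.Tensor:
    # images: (B, C, H, W) -> one random crop per image, (B, C, crop_size, crop_size).
    # Small crops expose local high-frequency statistics, which can match the
    # target distribution even when the full images are out-of-distribution.
    b, _, h, w = images.shape
    tops = torch.randint(0, h - crop_size + 1, (b,))
    lefts = torch.randint(0, w - crop_size + 1, (b,))
    return torch.stack([
        images[i, :, tops[i]:tops[i] + crop_size, lefts[i]:lefts[i] + crop_size]
        for i in range(b)
    ])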
Usage
import torch
from micro_diffusion.models.model import create_latent_diffusion
from huggingface_hub import hf_hub_download
from safetensors import safe_open
# Init model
params = {
    'latent_res': 64,
    'in_channels': 4,
    'pos_interp_scale': 2.0,
}
model = create_latent_diffusion(**params).to('cuda')
# Download weights from HF
model_dict_path = hf_hub_download(repo_id="giannisdaras/ambient-o", filename="model.safetensors")
model_dict = {}
with safe_open(model_dict_path, framework="pt", device="cpu") as f:
    for key in f.keys():
        model_dict[key] = f.get_tensor(key)
# Convert parameters to float32 + load
float_model_params = {
    k: v.to(torch.float32) for k, v in model_dict.items()
}
model.dit.load_state_dict(float_model_params)
# Eval mode
model = model.eval()
# Generate images
prompts = [
    "Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumet",
    "An illustration from a graphic novel. A bustling city street under the shine of a full moon.",
    "A giant cobra snake made from corn",
    "A fierce garden gnome warrior, clad in armor crafted from leaves and bark, brandishes a tiny sword.",
    "A capybara made of lego sitting in a realistic, natural field",
    "a close-up of a fire spitting dragon, cinematic shot.",
    "Panda mad scientist mixing sparkling chemicals, artstation"
]
images = model.generate(prompt=prompts, num_inference_steps=30, guidance_scale=5.0, seed=42)
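If `generate` returns a batch of image tensors in [0, 1] (an assumption here; check the MicroDiffusion codebase for the exact return type), the samples can be saved with torchvision:

from torchvision.utils import save_image

# Assumes `images` is a (B, C, H, W) float tensor in [0, 1].
for i, img in enumerate(images):
    save_image(img, f"sample_{i:02d}.png")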
Citation
@article{daras2025ambient,
  title={Ambient Diffusion Omni: Training Good Models with Bad Data},
  author={Daras, Giannis and Rodriguez-Munoz, Adrian and Klivans, Adam and Torralba, Antonio and Daskalakis, Constantinos},
  journal={arXiv preprint},
  year={2025},
}
License
The model follows the license of the MicroDiffusion repo.