Ambient Diffusion Omni (Ambient-o): Training Good Models with Bad Data
Model Description
Ambient Diffusion Omni (Ambient-o) is a framework for using low-quality, synthetic, and out-of-distribution images to improve the quality of diffusion models. Unlike traditional approaches that rely on highly curated datasets, Ambient-o extracts valuable signal from all available images during training, including data typically discarded as "low-quality."
This model card is for a text-to-image diffusion model trained for only two days on 8 H100 GPUs. The key innovation is the use of synthetic data as "noisy" samples.
Architecture
Ambient-o builds upon the MicroDiffusion codebase: we use a Mixture-of-Experts Diffusion Transformer totaling ~1.1B parameters.
Text-to-Image Results
Ambient-o demonstrates improvements in text-to-image generation. Compared to the two baselines of 1) filtering out low-quality samples and 2) treating all data equally, Ambient-o achieves higher diversity than 1) and higher quality than 2). In other words, it improves visual quality without sacrificing diversity.
Training Data Composition
The model was trained on a diverse mixture of datasets:
- Conceptual Captions (CC12M): 12M image-caption pairs
- Segment Anything (SA1B): 11.1M high-resolution images with LLaVA-generated captions
- JourneyDB: 4.4M synthetic image-caption pairs from Midjourney
- DiffusionDB: 10.7M quality-filtered synthetic image-caption pairs from Stable Diffusion
Data from DiffusionDB were treated as noisy samples.
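As a rough illustration of how this mixture might be organized, each source dataset could carry a per-sample quality flag before mixing. The wrapper below is a hypothetical sketch (the `is_noisy` flag and the `QualityAnnotatedDataset` name are illustrative, not part of the released code):

from torch.utils.data import Dataset, ConcatDataset

class QualityAnnotatedDataset(Dataset):
    # Wraps an image-caption dataset and tags every sample as clean or noisy.
    # Illustrative only; the actual training code may organize this differently.
    def __init__(self, base_dataset, is_noisy: bool):
        self.base = base_dataset
        self.is_noisy = is_noisy

    def __len__(self):
        return len(self.base)

    def __getitem__(self, idx):
        image, caption = self.base[idx]
        return image, caption, self.is_noisy

# e.g., mixed = ConcatDataset([QualityAnnotatedDataset(cc12m, False),
#                              QualityAnnotatedDataset(diffusiondb, True)])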
Technical Approach
High Noise Regime
At high diffusion times, the model leverages the theoretical insight that noise contracts distributional differences, reducing the mismatch between the high-quality target distribution and the mixed-quality training data. This creates a beneficial bias-variance trade-off: the low-quality samples introduce some bias, but they also increase the sample size and reduce the variance of the estimator.
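A minimal sketch of one way this could be implemented: samples flagged as noisy only draw diffusion times above a cutoff, so they never contribute in the low-noise regime. The cutoff `t_min` and the uniform time sampling are illustrative assumptions, not the paper's exact schedule:

import torch

def sample_diffusion_times(is_noisy: torch.Tensor, t_min: float = 0.4) -> torch.Tensor:
    # Clean samples draw t ~ U[0, 1]; noisy samples draw t ~ U[t_min, 1],
    # so they are only seen where added noise has contracted the mismatch.
    t = torch.rand(is_noisy.shape[0])
    return torch.where(is_noisy, t_min + (1.0 - t_min) * t, t)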
Low Noise Regime
At low diffusion times, the model exploits the locality properties of natural images: it trains on small image crops, which makes it possible to borrow high-frequency details from out-of-distribution or synthetic images whenever their crop-level marginal distributions match the target data.
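A hedged sketch of the crop extraction this describes (the crop size, and taking one crop per image, are assumptions made for illustration):

import torch

def random_crops(images: torch.Tensor, crop_size: int = 32) -> torch.Tensor:
    # images: (B, C, H, W) -> one random crop per image, (B, C, crop_size, crop_size).
    # Small crops expose local high-frequency statistics, which can match the
    # target distribution even when the full images are out-of-distribution.
    b, _, h, w = images.shape
    tops = torch.randint(0, h - crop_size + 1, (b,))
    lefts = torch.randint(0, w - crop_size + 1, (b,))
    return torch.stack([
        images[i, :, tops[i]:tops[i] + crop_size, lefts[i]:lefts[i] + crop_size]
        for i in range(b)
    ])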
Usage
import torch
from micro_diffusion.models.model import create_latent_diffusion
from huggingface_hub import hf_hub_download
from safetensors import safe_open
# Init model
params = {
    'latent_res': 64,
    'in_channels': 4,
    'pos_interp_scale': 2.0,
}
model = create_latent_diffusion(**params).to('cuda')
# Download weights from HF
model_dict_path = hf_hub_download(repo_id="giannisdaras/ambient-o", filename="model.safetensors")
model_dict = {}
with safe_open(model_dict_path, framework="pt", device="cpu") as f:
    for key in f.keys():
        model_dict[key] = f.get_tensor(key)
# Convert parameters to float32 + load
float_model_params = {
    k: v.to(torch.float32) for k, v in model_dict.items()
}
model.dit.load_state_dict(float_model_params)
# Eval mode
model = model.eval()
# Generate images
prompts = [
    "Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumet",
    "An illustration from a graphic novel. A bustling city street under the shine of a full moon.",
    "A giant cobra snake made from corn",
    "A fierce garden gnome warrior, clad in armor crafted from leaves and bark, brandishes a tiny sword.",
    "A capybara made of lego sitting in a realistic, natural field",
    "a close-up of a fire spitting dragon, cinematic shot.",
    "Panda mad scientist mixing sparkling chemicals, artstation"
]
images = model.generate(prompt=prompts, num_inference_steps=30, guidance_scale=5.0, seed=42)
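If `generate` returns a batch of image tensors in [0, 1] (an assumption here; check the MicroDiffusion codebase for the exact return type), the samples can be saved with torchvision:

from torchvision.utils import save_image

# Assumes `images` is a (B, C, H, W) float tensor in [0, 1].
for i, img in enumerate(images):
    save_image(img, f"sample_{i:02d}.png")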
Citation
@article{daras2025ambient,
  title={Ambient Diffusion Omni: Training Good Models with Bad Data},
  author={Daras, Giannis and Rodriguez-Munoz, Adrian and Klivans, Adam and Torralba, Antonio and Daskalakis, Constantinos},
  journal={arXiv preprint},
  year={2025},
}
License
The model follows the license of the MicroDiffusion repo.