---
license: cc-by-nc-4.0
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusion
- text-to-image
- ambient diffusion
- low-quality data
- synthetic data
---

# Ambient Diffusion Omni (Ambient-o): Training Good Models with Bad Data

![](ambient_logo.png)

## Model Description

Ambient Diffusion Omni (Ambient-o) is a framework for using low-quality, synthetic, and out-of-distribution images to improve the quality of diffusion models. Unlike traditional approaches that rely on highly curated datasets, Ambient-o extracts valuable signal from all available images during training, including data typically discarded as "low-quality."

This model card is for a text-to-image diffusion model trained on 8 H100 GPUs for only two days. The key innovation is the use of synthetic data as "noisy" samples.

## Architecture

Ambient-o builds upon the [MicroDiffusion](https://github.com/SonyResearch/micro_diffusion) codebase -- we use a Mixture of Experts Diffusion Transformer totaling ~1.1B parameters.

## Text-to-Image Results

Ambient-o demonstrates improvements in text-to-image generation. Compared to the two baselines of 1) filtering out low-quality samples and 2) treating all data as equal, Ambient-o achieves higher diversity than 1) and higher quality than 2) -- visual improvements without sacrificing diversity.

### Training Data Composition

The model was trained on a diverse mixture of datasets:

- **Conceptual Captions (CC12M)**: 12M image-caption pairs
- **Segment Anything (SA1B)**: 11.1M high-resolution images with LLaVA-generated captions
- **JourneyDB**: 4.4M synthetic image-caption pairs from Midjourney
- **DiffusionDB**: 10.7M quality-filtered synthetic image-caption pairs from Stable Diffusion

Data from DiffusionDB were treated as noisy samples.
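Treating a dataset as "noisy" can be sketched as restricting those samples to sufficiently high diffusion times, where added noise shrinks the gap between the low-quality and target distributions. The sketch below is illustrative only -- the `ambient_loss` function, the `is_noisy` flag, and the `sigma_min` threshold are assumptions for exposition, not the actual API of the MicroDiffusion/Ambient-o codebase:

```python
import torch

def ambient_loss(denoiser, x, sigma, is_noisy, sigma_min=0.2):
    """Illustrative sketch: samples flagged as "noisy" (e.g., DiffusionDB)
    only contribute to the denoising loss at noise levels >= sigma_min;
    clean samples contribute at all noise levels."""
    noise = torch.randn_like(x)
    x_t = x + sigma.view(-1, 1) * noise            # diffuse to level sigma
    pred = denoiser(x_t, sigma)                    # predict the clean sample
    per_sample = ((pred - x) ** 2).mean(dim=1)     # per-sample MSE
    # Mask out noisy data at low diffusion times
    mask = (~is_noisy) | (sigma >= sigma_min)
    return (per_sample * mask.float()).mean()
```

Under this scheme, a synthetic sample drawn at a diffusion time below the threshold simply contributes zero loss, so the model never learns low-noise (high-frequency) detail from it.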
### Technical Approach

#### High Noise Regime

At high diffusion times, the model leverages the theoretical insight that added noise contracts distributional differences, reducing the mismatch between the high-quality target distribution and the mixed-quality training data. This creates a beneficial bias-variance trade-off: low-quality samples increase the effective sample size and reduce estimator variance.

#### Low Noise Regime

At low diffusion times, the model exploits the locality properties of natural images, training on small image crops. This allows it to borrow high-frequency details from out-of-distribution or synthetic images whenever their marginal (crop-level) distributions match the target data.

## Usage

```python
import torch
from micro_diffusion.models.model import create_latent_diffusion
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Init model
params = {
    'latent_res': 64,
    'in_channels': 4,
    'pos_interp_scale': 2.0,
}
model = create_latent_diffusion(**params).to('cuda')

# Download weights from HF
model_dict_path = hf_hub_download(repo_id="giannisdaras/ambient-o", filename="model.safetensors")
model_dict = {}
with safe_open(model_dict_path, framework="pt", device="cpu") as f:
    for key in f.keys():
        model_dict[key] = f.get_tensor(key)

# Convert parameters to float32 + load
float_model_params = {k: v.to(torch.float32) for k, v in model_dict.items()}
model.dit.load_state_dict(float_model_params)

# Eval mode
model = model.eval()

# Generate images
prompts = [
    "Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumet",
    "An illustration from a graphic novel. A bustling city street under the shine of a full moon.",
    "A giant cobra snake made from corn",
    "A fierce garden gnome warrior, clad in armor crafted from leaves and bark, brandishes a tiny sword.",
    "A capybara made of lego sitting in a realistic, natural field",
    "a close-up of a fire spitting dragon, cinematic shot.",
    "Panda mad scientist mixing sparkling chemicals, artstation",
]
images = model.generate(prompt=prompts, num_inference_steps=30, guidance_scale=5.0, seed=42)
```

## Citation

```bibtex
@article{daras2025ambient,
  title={Ambient Diffusion Omni: Training Good Models with Bad Data},
  author={Daras, Giannis and Rodriguez-Munoz, Adrian and Klivans, Adam and Torralba, Antonio and Daskalakis, Constantinos},
  journal={arXiv preprint},
  year={2025},
}
```

## License

The model follows the [license](https://github.com/SonyResearch/micro_diffusion/blob/main/LICENSE) of the MicroDiffusion repo.