---
license: cc-by-nc-4.0
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusion
- text-to-image
- ambient diffusion
- low-quality data
- synthetic data
---
# Ambient Diffusion Omni (Ambient-o): Training Good Models with Bad Data
![](ambient_logo.png)
## Model Description
Ambient Diffusion Omni (Ambient-o) is a framework for using low-quality, synthetic, and out-of-distribution images to improve the quality of diffusion models. Unlike traditional approaches that rely on highly curated datasets, Ambient-o extracts valuable signal from all available images during training, including data typically discarded as "low-quality."
This model card is for a text-to-image diffusion model trained on only 8 H100 GPUs for two days. The key innovation is the use of synthetic data as "noisy" samples.
## Architecture
Ambient-o builds upon the [MicroDiffusion](https://github.com/SonyResearch/micro_diffusion) codebase: we use a Mixture-of-Experts Diffusion Transformer totaling ~1.1B parameters.
## Text-to-Image Results
Ambient-o demonstrates improvements in text-to-image generation. Compared to two baselines, 1) filtering out low-quality samples and 2) treating all data equally, Ambient-o generates more diverse images than the former and higher-quality images than the latter: visual improvements without sacrificing diversity.
### Training Data Composition
The model was trained on a diverse mixture of datasets:
- **Conceptual Captions (CC12M)**: 12M image-caption pairs
- **Segment Anything (SA1B)**: 11.1M high-resolution images with LLaVA-generated captions
- **JourneyDB**: 4.4M synthetic image-caption pairs from Midjourney
- **DiffusionDB**: 10.7M quality-filtered synthetic image-caption pairs from Stable Diffusion
Data from DiffusionDB were treated as noisy samples.
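To make the mixture concrete, here is a hypothetical sketch (not from the Ambient-o codebase) that tags each source above with whether its samples are treated as clean or as "noisy"; the `noisy` flag and the dictionary layout are illustrative assumptions:

```python
# Illustrative sketch: the four data sources above, with DiffusionDB
# flagged as the source treated as "noisy" samples during training.
DATA_SOURCES = {
    "cc12m":       {"num_pairs": 12_000_000, "noisy": False},
    "sa1b":        {"num_pairs": 11_100_000, "noisy": False},
    "journeydb":   {"num_pairs":  4_400_000, "noisy": False},
    "diffusiondb": {"num_pairs": 10_700_000, "noisy": True},  # treated as noisy
}

def total_pairs(sources):
    """Total number of image-caption pairs across all sources."""
    return sum(s["num_pairs"] for s in sources.values())

def noisy_fraction(sources):
    """Fraction of the training mixture treated as noisy samples."""
    noisy = sum(s["num_pairs"] for s in sources.values() if s["noisy"])
    return noisy / total_pairs(sources)

print(f"{total_pairs(DATA_SOURCES):,} pairs, "
      f"{noisy_fraction(DATA_SOURCES):.0%} treated as noisy")
```

Roughly 38.2M pairs in total, with a bit under a third of the mixture coming from the source treated as noisy.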
### Technical Approach
#### High Noise Regime
At high diffusion times, the model leverages the theoretical insight that noise contracts distributional differences, reducing mismatch between high-quality target distribution and mixed-quality training data. This creates a beneficial bias-variance trade-off where low-quality samples increase sample size and reduce estimator variance.
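The contraction effect can be illustrated in closed form for one-dimensional Gaussians, where the total variation distance between N(μ₁, σ²) and N(μ₂, σ²) equals erf(|μ₁ − μ₂| / (2√2 σ)): adding diffusion noise inflates σ, which shrinks the distance between the two distributions. A minimal sketch, not from the Ambient-o codebase:

```python
import math

def tv_gaussians(mu1, mu2, sigma):
    """Total variation distance between N(mu1, sigma^2) and N(mu2, sigma^2)."""
    return math.erf(abs(mu1 - mu2) / (2 * math.sqrt(2) * sigma))

def tv_after_noise(mu1, mu2, sigma_data, sigma_noise):
    """TV distance after adding independent N(0, sigma_noise^2) diffusion noise.

    Convolving both distributions with the same Gaussian noise kernel
    only widens them, so their overlap grows and the TV distance shrinks.
    """
    sigma_t = math.sqrt(sigma_data**2 + sigma_noise**2)
    return tv_gaussians(mu1, mu2, sigma_t)

# The distance decreases monotonically as the diffusion noise level grows:
for sigma_noise in (0.0, 1.0, 3.0, 10.0):
    print(sigma_noise, tv_after_noise(0.0, 1.0, 1.0, sigma_noise))
```

At high noise levels the "clean" and "low-quality" distributions become nearly indistinguishable, which is why low-quality samples can safely enlarge the training set there.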
#### Low Noise Regime
At low diffusion times, the model exploits locality properties of natural images, using small image crops that allow borrowing high-frequency details from out-of-distribution or synthetic images when their marginal distributions match the target data.
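A minimal sketch of the crop-based idea, assuming images as nested lists; the helper below is hypothetical and only illustrates that a small crop discards global (out-of-distribution) structure while keeping local high-frequency statistics:

```python
import random

def random_crop(image, crop_size):
    """Take a random square crop from an image given as a nested H x W list.

    At low noise levels, training on such small crops lets the model borrow
    local texture statistics even from images whose global content is
    out-of-distribution, as long as the crop-level marginals match.
    """
    h, w = len(image), len(image[0])
    top = random.randint(0, h - crop_size)
    left = random.randint(0, w - crop_size)
    return [row[left:left + crop_size] for row in image[top:top + crop_size]]

# Example: an 8x8 "image" yields a 4x4 crop.
crop = random_crop([[0] * 8 for _ in range(8)], 4)
print(len(crop), len(crop[0]))
```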
## Usage
```python
import torch
from micro_diffusion.models.model import create_latent_diffusion
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Initialize the latent diffusion model
params = {
    'latent_res': 64,
    'in_channels': 4,
    'pos_interp_scale': 2.0,
}
model = create_latent_diffusion(**params).to('cuda')

# Download the weights from the Hugging Face Hub
model_dict_path = hf_hub_download(repo_id="giannisdaras/ambient-o", filename="model.safetensors")
model_dict = {}
with safe_open(model_dict_path, framework="pt", device="cpu") as f:
    for key in f.keys():
        model_dict[key] = f.get_tensor(key)

# Convert parameters to float32 and load them into the DiT backbone
float_model_params = {k: v.to(torch.float32) for k, v in model_dict.items()}
model.dit.load_state_dict(float_model_params)

# Switch to evaluation mode
model = model.eval()

# Generate images
prompts = [
    "Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumet",
    "A illustration from a graphic novel. A bustling city street under the shine of a full moon.",
    "A giant cobra snake made from corn",
    "A fierce garden gnome warrior, clad in armor crafted from leaves and bark, brandishes a tiny sword.",
    "A capybara made of lego sitting in a realistic, natural field",
    "a close-up of a fire spitting dragon, cinematic shot.",
    "Panda mad scientist mixing sparkling chemicals, artstation",
]
images = model.generate(prompt=prompts, num_inference_steps=30, guidance_scale=5.0, seed=42)
```
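Assuming `generate` returns a list of PIL images (as in the MicroDiffusion codebase), the samples can be written to disk as follows; the helper name and output directory are illustrative, and the placeholder images below only stand in for the real `images` list:

```python
from pathlib import Path
from PIL import Image

def save_images(images, out_dir="samples"):
    """Save a list of PIL images with zero-padded filenames."""
    out = Path(out_dir)
    out.mkdir(exist_ok=True)
    for i, img in enumerate(images):
        img.save(out / f"sample_{i:02d}.png")

# Placeholder demo; in practice pass the `images` returned by model.generate(...)
save_images([Image.new("RGB", (64, 64)) for _ in range(2)])
```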
## Citation
```bibtex
@article{daras2025ambient,
  title={Ambient Diffusion Omni: Training Good Models with Bad Data},
  author={Daras, Giannis and Rodriguez-Munoz, Adrian and Klivans, Adam and Torralba, Antonio and Daskalakis, Constantinos},
  journal={arXiv preprint},
  year={2025},
}
```
## License
The model follows the [license](https://github.com/SonyResearch/micro_diffusion/blob/main/LICENSE) of the MicroDiffusion repo.