|
|
--- |
|
|
license: cc-by-nc-4.0 |
|
|
library_name: diffusers |
|
|
pipeline_tag: text-to-image |
|
|
tags: |
|
|
- diffusion |
|
|
- text-to-image |
|
|
- ambient diffusion |
|
|
- low-quality data |
|
|
- synthetic data |
|
|
--- |
|
|
|
|
|
# Ambient Diffusion Omni (Ambient-o): Training Good Models with Bad Data |
|
|
 |
|
|
|
|
|
## Model Description |
|
|
|
|
|
Ambient Diffusion Omni (Ambient-o) is a framework for using low-quality, synthetic, and out-of-distribution images to improve the quality of diffusion models. Unlike traditional approaches that rely on highly curated datasets, Ambient-o extracts valuable signal from all available images during training, including data typically discarded as "low-quality." |
|
|
|
|
|
This model card is for a text-to-image diffusion model trained on 8 H100 GPUs for only two days. The key innovation is the use of synthetic data as "noisy" samples. |
|
|
|
|
|
## Architecture |
|
|
|
|
|
Ambient-o builds upon the [MicroDiffusion](https://github.com/SonyResearch/micro_diffusion) codebase -- we use a Mixture-of-Experts Diffusion Transformer totaling ~1.1B parameters. |
|
|
|
|
|
## Text-to-Image Results |
|
|
|
|
|
|
|
|
Ambient-o demonstrates improvements in text-to-image generation. Compared to the two baselines of 1) filtering out low-quality samples and 2) treating all data equally, Ambient-o achieves increased diversity relative to 1) and improved quality relative to 2). In other words, it achieves visual improvements without sacrificing diversity. |
|
|
|
|
|
|
|
|
### Training Data Composition |
|
|
|
|
|
The model was trained on a diverse mixture of datasets: |
|
|
- **Conceptual Captions (CC12M)**: 12M image-caption pairs |
|
|
- **Segment Anything (SA1B)**: 11.1M high-resolution images with LLaVA-generated captions |
|
|
- **JourneyDB**: 4.4M synthetic image-caption pairs from Midjourney |
|
|
- **DiffusionDB**: 10.7M quality-filtered synthetic image-caption pairs from Stable Diffusion |
|
|
|
|
|
Data from DiffusionDB were treated as noisy samples. |
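
 |
The treatment of DiffusionDB can be sketched as a per-dataset annotation that marks synthetic samples as usable only above a minimum diffusion noise level. The dataset entries and `sigma_min` values below are illustrative assumptions, not the released training configuration. |
 |
```python |
# Minimal sketch of a data-mixture annotation (illustrative values, not the released config). |
DATA_MIXTURE = [ |
    {"name": "cc12m",       "num_samples": 12_000_000, "sigma_min": 0.0},  # treated as clean |
    {"name": "sa1b",        "num_samples": 11_100_000, "sigma_min": 0.0},  # treated as clean |
    {"name": "journeydb",   "num_samples": 4_400_000,  "sigma_min": 0.0},  # treated as clean |
    {"name": "diffusiondb", "num_samples": 10_700_000, "sigma_min": 0.2},  # treated as noisy |
] |
 |
def usable_datasets(sigma: float) -> list[str]: |
    """Datasets whose samples may contribute to the loss at noise level `sigma`.""" |
    return [d["name"] for d in DATA_MIXTURE if sigma >= d["sigma_min"]] |
``` |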
|
|
|
|
|
|
|
|
### Technical Approach |
|
|
|
|
|
#### High Noise Regime |
|
|
At high diffusion times, the model leverages the theoretical insight that noise contracts distributional differences, reducing the mismatch between the high-quality target distribution and the mixed-quality training data. This creates a beneficial bias-variance trade-off: low-quality samples increase the effective sample size and reduce the variance of the estimator. |
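
 |
A minimal sketch of this idea, assuming a sigma-parameterized denoiser that predicts the clean image: samples flagged as low-quality only contribute to the loss when the sampled noise level exceeds a threshold. The threshold and the weighting are illustrative assumptions, not the exact Ambient-o objective. |
 |
```python |
import torch |
 |
def high_noise_masked_loss(denoiser, x0, sigma, is_clean, sigma_min=0.2): |
    """Denoising loss where noisy/synthetic samples count only at high noise levels. |
 |
    denoiser: callable(x_t, sigma) -> predicted clean image (assumed interface) |
    x0:       clean latents/images, shape (B, C, H, W) |
    sigma:    per-sample noise levels, shape (B,) |
    is_clean: boolean mask, True for high-quality samples, shape (B,) |
    """ |
    noise = torch.randn_like(x0) |
    x_t = x0 + sigma.view(-1, 1, 1, 1) * noise            # diffuse to noise level sigma |
    pred = denoiser(x_t, sigma)                           # predicted clean image |
    per_sample = ((pred - x0) ** 2).flatten(1).mean(dim=1) |
    use = is_clean | (sigma >= sigma_min)                 # noisy data only above the threshold |
    return (per_sample * use.float()).sum() / use.float().sum().clamp(min=1.0) |
``` |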
|
|
|
|
|
#### Low Noise Regime |
|
|
At low diffusion times, the model exploits the locality of natural images: training on small crops allows it to borrow high-frequency details from out-of-distribution or synthetic images whenever their crop-level marginal distributions match the target data. |
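
 |
A corresponding sketch for the low-noise regime, under the assumption that training at small noise levels operates on random crops: local patches from synthetic or out-of-distribution images stand in for target-data patches when their crop-level statistics match. The crop size and the placeholder tensors are illustrative choices. |
 |
```python |
import torch |
 |
def random_crop(x: torch.Tensor, crop_size: int) -> torch.Tensor: |
    """Random square crop from a batch of images/latents with shape (B, C, H, W).""" |
    _, _, h, w = x.shape |
    top = torch.randint(0, h - crop_size + 1, (1,)).item() |
    left = torch.randint(0, w - crop_size + 1, (1,)).item() |
    return x[:, :, top:top + crop_size, left:left + crop_size] |
 |
# At low sigma, crops from out-of-distribution/synthetic images can be mixed into the |
# batch, because only local high-frequency structure has to match the target data. |
low_noise_batch = torch.cat([ |
    random_crop(torch.randn(2, 4, 64, 64), 32),  # placeholder for target-data latents |
    random_crop(torch.randn(2, 4, 64, 64), 32),  # placeholder for OOD/synthetic latents |
]) |
``` |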
|
|
|
|
|
## Usage |
|
|
|
|
|
```python |
|
|
import torch |
|
|
from micro_diffusion.models.model import create_latent_diffusion |
|
|
from huggingface_hub import hf_hub_download |
|
|
from safetensors import safe_open |
|
|
|
|
|
# Init model |
|
|
params = { |
|
|
'latent_res': 64, |
|
|
'in_channels': 4, |
|
|
'pos_interp_scale': 2.0, |
|
|
} |
|
|
model = create_latent_diffusion(**params).to('cuda') |
|
|
|
|
|
# Download weights from HF |
|
|
model_dict_path = hf_hub_download(repo_id="giannisdaras/ambient-o", filename="model.safetensors") |
|
|
model_dict = {} |
|
|
with safe_open(model_dict_path, framework="pt", device="cpu") as f: |
|
|
for key in f.keys(): |
|
|
model_dict[key] = f.get_tensor(key) |
|
|
|
|
|
# Convert parameters to float32 + load |
|
|
float_model_params = { |
|
|
k: v.to(torch.float32) for k, v in model_dict.items() |
|
|
} |
|
|
model.dit.load_state_dict(float_model_params) |
|
|
|
|
|
# Eval mode |
|
|
model = model.eval() |
|
|
|
|
|
# Generate images |
|
|
prompts = [ |
|
|
"Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumet", |
|
|
"A illustration from a graphic novel. A bustling city street under the shine of a full moon.", |
|
|
"A giant cobra snake made from corn", |
|
|
"A fierce garden gnome warrior, clad in armor crafted from leaves and bark, brandishes a tiny sword.", |
|
|
"A capybara made of lego sitting in a realistic, natural field", |
|
|
"a close-up of a fire spitting dragon, cinematic shot.", |
|
|
"Panda mad scientist mixing sparkling chemicals, artstation" |
|
|
] |
|
|
images = model.generate(prompt=prompts, num_inference_steps=30, guidance_scale=5.0, seed=42) |
|
|
``` |
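
 |
Depending on the MicroDiffusion version, `generate` may return a float tensor in [0, 1] with shape (batch, channels, height, width); the snippet below assumes that and simply writes each sample to disk. |
 |
```python |
from torchvision.utils import save_image |
 |
# Assumes `images` is a float tensor in [0, 1]; adjust if your version returns PIL images. |
for i, img in enumerate(images): |
    save_image(img, f"sample_{i:02d}.png") |
``` |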
|
|
|
|
|
## Citation |
|
|
|
|
|
```bibtex |
|
|
@article{daras2025ambient, |
|
|
title={Ambient Diffusion Omni: Training Good Models with Bad Data}, |
|
|
author={Daras, Giannis and Rodriguez-Munoz, Adrian and Klivans, Adam and Torralba, Antonio and Daskalakis, Constantinos}, |
|
|
journal={arXiv preprint}, |
|
|
year={2025}, |
|
|
} |
|
|
``` |
|
|
|
|
|
## License |
|
|
|
|
|
The model follows the [license](https://github.com/SonyResearch/micro_diffusion/blob/main/LICENSE) of the MicroDiffusion repo. |
|
|
|