---
license: cc-by-nc-4.0
library_name: diffusers
pipeline_tag: text-to-image
tags:
- diffusion
- text-to-image
- ambient diffusion
- low-quality data
- synthetic data
---

# Ambient Diffusion Omni (Ambient-o): Training Good Models with Bad Data

![](ambient_logo.png)

## Model Description

Ambient Diffusion Omni (Ambient-o) is a framework for using low-quality, synthetic, and out-of-distribution images to improve the quality of diffusion models. Unlike traditional approaches that rely on highly curated datasets, Ambient-o extracts valuable signal from all available images during training, including data typically discarded as "low-quality."

This model card is for a text-to-image diffusion model trained on 8 H100 GPUs for only two days. The key innovation is the use of synthetic data as "noisy" samples.

## Architecture

Ambient-o builds upon the [MicroDiffusion](https://github.com/SonyResearch/micro_diffusion) codebase -- we use a Mixture of Experts Diffusion Transformer totaling ~1.1B parameters.

## Text-to-Image Results

Ambient-o demonstrates improvements in text-to-image generation. Compared to the two baselines of 1) filtering out low-quality samples and 2) treating all data as equal, Ambient-o achieves higher diversity than 1) and higher quality than 2) -- visual improvements without sacrificing diversity.

### Training Data Composition

The model was trained on a diverse mixture of datasets:

- **Conceptual Captions (CC12M)**: 12M image-caption pairs
- **Segment Anything (SA1B)**: 11.1M high-resolution images with LLaVA-generated captions
- **JourneyDB**: 4.4M synthetic image-caption pairs from Midjourney
- **DiffusionDB**: 10.7M quality-filtered synthetic image-caption pairs from Stable Diffusion

Data from DiffusionDB were treated as noisy samples.
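Treating a dataset as "noisy" can be sketched as restricting those samples to sufficiently high diffusion times, where added noise shrinks the gap between the low-quality and target distributions. The sketch below is illustrative only -- the `ambient_loss` function, the `is_noisy` flag, and the `sigma_min` threshold are assumptions for exposition, not the actual API of the MicroDiffusion/Ambient-o codebase:

```python
import torch

def ambient_loss(denoiser, x, sigma, is_noisy, sigma_min=0.2):
    """Illustrative sketch: samples flagged as "noisy" (e.g., DiffusionDB)
    only contribute to the denoising loss at noise levels >= sigma_min;
    clean samples contribute at all noise levels."""
    noise = torch.randn_like(x)
    x_t = x + sigma.view(-1, 1) * noise            # diffuse to level sigma
    pred = denoiser(x_t, sigma)                    # predict the clean sample
    per_sample = ((pred - x) ** 2).mean(dim=1)     # per-sample MSE
    # Mask out noisy data at low diffusion times
    mask = (~is_noisy) | (sigma >= sigma_min)
    return (per_sample * mask.float()).mean()
```

Under this scheme, a synthetic sample drawn at a diffusion time below the threshold simply contributes zero loss, so the model never learns low-noise (high-frequency) detail from it.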
### Technical Approach

#### High Noise Regime

At high diffusion times, the model leverages the theoretical insight that added noise contracts distributional differences, reducing the mismatch between the high-quality target distribution and the mixed-quality training data. This creates a beneficial bias-variance trade-off: low-quality samples increase the effective sample size and reduce estimator variance.

#### Low Noise Regime

At low diffusion times, the model exploits the locality properties of natural images, training on small image crops. This allows it to borrow high-frequency details from out-of-distribution or synthetic images whenever their marginal (crop-level) distributions match the target data.

## Usage

```python
import torch
from micro_diffusion.models.model import create_latent_diffusion
from huggingface_hub import hf_hub_download
from safetensors import safe_open

# Init model
params = {
    'latent_res': 64,
    'in_channels': 4,
    'pos_interp_scale': 2.0,
}
model = create_latent_diffusion(**params).to('cuda')

# Download weights from HF
model_dict_path = hf_hub_download(repo_id="giannisdaras/ambient-o", filename="model.safetensors")
model_dict = {}
with safe_open(model_dict_path, framework="pt", device="cpu") as f:
    for key in f.keys():
        model_dict[key] = f.get_tensor(key)

# Convert parameters to float32 + load
float_model_params = {k: v.to(torch.float32) for k, v in model_dict.items()}
model.dit.load_state_dict(float_model_params)

# Eval mode
model = model.eval()

# Generate images
prompts = [
    "Pirate ship trapped in a cosmic maelstrom nebula, rendered in cosmic beach whirlpool engine, volumet",
    "An illustration from a graphic novel. A bustling city street under the shine of a full moon.",
    "A giant cobra snake made from corn",
    "A fierce garden gnome warrior, clad in armor crafted from leaves and bark, brandishes a tiny sword.",
    "A capybara made of lego sitting in a realistic, natural field",
    "a close-up of a fire spitting dragon, cinematic shot.",
    "Panda mad scientist mixing sparkling chemicals, artstation",
]
images = model.generate(prompt=prompts, num_inference_steps=30, guidance_scale=5.0, seed=42)
```

## Citation

```bibtex
@article{daras2025ambient,
  title={Ambient Diffusion Omni: Training Good Models with Bad Data},
  author={Daras, Giannis and Rodriguez-Munoz, Adrian and Klivans, Adam and Torralba, Antonio and Daskalakis, Constantinos},
  journal={arXiv preprint},
  year={2025},
}
```

## License

The model follows the [license](https://github.com/SonyResearch/micro_diffusion/blob/main/LICENSE) of the MicroDiffusion repo.