FlowBlending: Stage-Aware Multi-Model Sampling for Fast and High-Fidelity Video Generation
Abstract
In this work, we show that the impact of model capacity varies across timesteps: it is crucial in the early and late stages but largely negligible in the intermediate stage. Accordingly, we propose FlowBlending, a stage-aware multi-model sampling strategy that runs a large model during the capacity-sensitive early and late stages and a small model during the intermediate stage. We further introduce simple criteria for choosing the stage boundaries and provide a velocity-divergence analysis as an effective proxy for identifying capacity-sensitive regions. Across LTX-Video (2B/13B) and WAN 2.1 (1.3B/14B), FlowBlending achieves up to 1.65x faster inference with 57.35% fewer FLOPs while maintaining the visual fidelity, temporal coherence, and semantic alignment of the large models. FlowBlending is also compatible with existing sampling-acceleration techniques, enabling up to 2x additional speedup. The project page is available at: https://jibin86.github.io/flowblending_project_page.
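The core idea — switching between a large and a small velocity predictor depending on the sampling stage — can be sketched as a simple Euler sampling loop. This is a minimal illustrative sketch, not the paper's implementation: the function and parameter names (`flowblending_sample`, `early_frac`, `late_frac`) and the fixed fractional stage boundaries are assumptions; the paper instead selects boundaries via a velocity-divergence criterion, and the toy lambdas stand in for the actual 2B/13B or 1.3B/14B models.

```python
import numpy as np

def flowblending_sample(x, large_model, small_model, num_steps=50,
                        early_frac=0.2, late_frac=0.2):
    """Euler flow-matching sampler over t in [0, 1) that switches models
    by stage. `early_frac`/`late_frac` are illustrative boundaries; the
    paper chooses them with a velocity-divergence analysis."""
    dt = 1.0 / num_steps
    schedule = []  # record which model handled each step, for inspection
    for i in range(num_steps):
        t = i / num_steps
        # Capacity-sensitive stages (early and late) use the large model;
        # the intermediate stage falls back to the cheaper small model.
        if t < early_frac or t >= 1.0 - late_frac:
            v, used = large_model(x, t), "large"
        else:
            v, used = small_model(x, t), "small"
        x = x + dt * v  # Euler step along the predicted velocity field
        schedule.append(used)
    return x, schedule

# Toy stand-ins for the large/small velocity predictors (not real models).
large = lambda x, t: -x          # dummy: pulls the sample toward zero
small = lambda x, t: -0.9 * x    # dummy: cheaper, slightly coarser field

x0 = np.ones(4)                  # initial "noise" sample
x1, schedule = flowblending_sample(x0, large, small, num_steps=10)
print(schedule)
```

With 10 steps and 20% boundaries on each side, the first two and last two steps use the large model and the six intermediate steps use the small one, which is where the FLOP savings come from: most steps run the cheap model while the capacity-sensitive endpoints keep the large model's fidelity.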
