hecui102 committed on
Commit 9a3bcb8 · verified · 1 parent: 21e2143

Update README.md

Files changed (1)
  1. README.md +1 -1
README.md CHANGED
@@ -3,7 +3,7 @@ license: apache-2.0
 ---
 In this work, we present **AMD Hummingbird-XT**, an efficient **DiT-based** video generative model designed for high-quality video generation on client-grade GPUs with **5B parameters** .
 
-Hummingbird-XT is trained based on Wan2.2-5B-TI2V using **DMD step distillation** with carefully designed **data curation**, enabling **3-step generation** while preserving high visual fidelity and motion quality. To reduce the computational overhead of high-resolution video decoding in 3D convolution–based VAE decoders, we introduce a **lightweight and efficient VAE decoder** by replacing part of the 3D convolutions with depthwise separable convolutions. Additionally, to further extend the length of generated videos, we introduce **Hummingbird-XTX**, an efficient **autoregressive model** for **long-video generation** based on Wan-2.1-1.3B, which is capable of generating long videos with up to **XXX frames**.
+Hummingbird-XT is trained based on Wan2.2-5B-TI2V using **DMD step distillation** with carefully designed **data curation**, enabling **3-step generation** while preserving high visual fidelity and motion quality. To reduce the computational overhead of high-resolution video decoding in 3D convolution–based VAE decoders, we introduce a **lightweight and efficient VAE decoder** by replacing part of the 3D convolutions with depthwise separable convolutions. Additionally, to further extend the length of generated videos, we introduce **Hummingbird-XTX**, an efficient **autoregressive model** for **long-video generation** based on Wan-2.1-1.3B, which is capable of generating long videos.
 
 As a result, Hummingbird-XT achieves a **33×** speedup on Strix Halo iGPU and a **40×** speedup on AMD Instinct™ MI325, and supports generating **121-frame** videos at **720×1280** resolution across both server-grade (AMD Instinct™ MI300 and AMD Instinct™ MI325) and client-grade (Strix Halo and Navi48) devices. Quantitative results on the VBench-T2V and VBench-I2V benchmarks show that Hummingbird-XT achieves competitive performance compared to the original **Wan2.2-5B-TI2V** model.
 
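The README paragraph in this diff says the lightweight VAE decoder replaces part of the 3D convolutions with depthwise separable convolutions. As a rough illustration of why that cuts decoder cost, the sketch below counts the weights of a standard 3D convolution versus a depthwise 3D convolution followed by a 1×1×1 pointwise convolution. The helper names, the 256-channel width, and the 3×3×3 kernel are hypothetical choices for illustration, not values taken from the Hummingbird-XT decoder or its code.

```python
def conv3d_params(c_in: int, c_out: int, k: tuple[int, int, int], bias: bool = False) -> int:
    """Weights of a standard 3D convolution: every output channel sees every input channel."""
    kt, kh, kw = k
    return c_out * c_in * kt * kh * kw + (c_out if bias else 0)


def depthwise_separable_conv3d_params(
    c_in: int, c_out: int, k: tuple[int, int, int], bias: bool = False
) -> int:
    """Depthwise 3D conv (one kt*kh*kw filter per input channel) + 1x1x1 pointwise conv."""
    kt, kh, kw = k
    depthwise = c_in * kt * kh * kw + (c_in if bias else 0)  # spatial-temporal filtering
    pointwise = c_out * c_in + (c_out if bias else 0)        # channel mixing
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer shape, NOT taken from the Hummingbird-XT VAE decoder.
    c_in, c_out, k = 256, 256, (3, 3, 3)
    full = conv3d_params(c_in, c_out, k)
    sep = depthwise_separable_conv3d_params(c_in, c_out, k)
    print(full, sep, round(full / sep, 1))
```

For this hypothetical layer the script prints `1769472 72448 24.4`, i.e. roughly 24× fewer weights; in general the ratio is 1 / (1/c_out + 1/(kt·kh·kw)). Assuming stride 1 and same-size outputs, each weight is used once per output voxel, so multiply-accumulates per voxel shrink by the same factor. How much of that turns into wall-clock speedup depends on the hardware and kernels, and the README does not specify which layers were replaced.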