amd/HummingbirdXT

hecui102 committed 6 days ago

Commit

05774c0

verified ·

1 Parent(s): 97e5f4e

Update README.md

Files changed (1)

README.md CHANGED

@@ -1,6 +1,16 @@
 ---
 license: apache-2.0
 ---
+<p align="center"><h1 align="center">
+Bridging the Last Mile: Deploying Hummingbird-XT for Efficient Video Generation on AMD Consumer-Grade Platforms
+</h1>
+</p>
+<p align="center">
+<h3 align="center"><a href="https://rocm.blogs.amd.com/artificial-intelligence/hummingbirdxt/README.html">Blog</a> | <a href="https://github.com/AMD-AGI/HummingbirdXT">Code</a></h3>
+</p>
 In this work, we present **AMD Hummingbird-XT**, an efficient **DiT-based** video generative model with **5B parameters**, designed for high-quality video generation on client-grade GPUs.
 Hummingbird-XT is trained from Wan2.2-5B-TI2V using **DMD step distillation** with carefully designed **data curation**, enabling **3-step generation** while preserving high visual fidelity and motion quality. To reduce the computational overhead of high-resolution video decoding in 3D convolution–based VAE decoders, we introduce a **lightweight and efficient VAE decoder** by replacing part of the 3D convolutions with depthwise separable convolutions. Additionally, to extend the length of generated videos, we introduce **Hummingbird-XTX**, an efficient **autoregressive model** for **long-video generation** based on Wan-2.1-1.3B.
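
To give a rough sense of why swapping 3D convolutions for depthwise separable ones lightens a VAE decoder, the sketch below compares the parameter counts of a single layer in both forms. It is plain arithmetic, not the Hummingbird-XT implementation: the function names, the 256-channel width, and the 3x3x3 kernel are assumptions chosen for illustration, and the README does not say which layers were replaced or with what shapes.

```python
def conv3d_params(c_in, c_out, kernel=(3, 3, 3), bias=True):
    """Parameter count of a standard 3D convolution layer."""
    kt, kh, kw = kernel
    # Every output channel has one kt*kh*kw filter per input channel.
    return c_in * c_out * kt * kh * kw + (c_out if bias else 0)


def depthwise_separable_conv3d_params(c_in, c_out, kernel=(3, 3, 3), bias=True):
    """Parameter count of a depthwise 3D conv followed by a 1x1x1 pointwise conv."""
    kt, kh, kw = kernel
    # Depthwise step: one kt*kh*kw filter per input channel, no channel mixing.
    depthwise = c_in * kt * kh * kw + (c_in if bias else 0)
    # Pointwise step: a 1x1x1 convolution that mixes channels.
    pointwise = c_in * c_out + (c_out if bias else 0)
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer width, not taken from the model.
    c_in, c_out = 256, 256
    standard = conv3d_params(c_in, c_out)
    separable = depthwise_separable_conv3d_params(c_in, c_out)
    print(f"standard 3D conv:         {standard:,} parameters")
    print(f"depthwise separable conv: {separable:,} parameters")
    print(f"reduction factor:         {standard / separable:.1f}x")
```

Multiply-accumulate counts per output position scale the same way as the weights, so under these assumptions the compute saving for the replaced layer is of similar magnitude. The end-to-end speedup of a decoder depends on how many layers are replaced and on memory bandwidth, which this sketch does not model.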