update README
- README.md +78 -2
- README_CN.md +78 -3

README.md CHANGED

@@ -57,7 +57,9 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
 </p>
 
 ## 🔥🔥🔥 News
+* 🚀 Dec 05, 2025: **New Release**: We now release the 480p I2V step-distilled model, which generates videos in 8 or 12 steps (recommended)! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate a video within 75 seconds. The step-distilled model maintains quality comparable to the original model while achieving a significant speedup. For even faster generation, you can also try 4 steps (faster speed with slightly reduced quality). See [Step Distillation Comparison](./assets/step_distillation_comparison.md) for detailed quality comparisons. 🔥🔥🔥🆕
 * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
+* 🎉 **Diffusers Support**: HunyuanVideo-1.5 is now available on Hugging Face Diffusers! Check out the [Diffusers collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) for easy integration. 🔥🔥🔥🆕
 * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), achieving significant speedup! Pull the latest code to try it. 🔥🔥🔥🆕
 * 🚀 Nov 24, 2025: We now support deepcache inference.
 * 👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
@@ -72,6 +74,8 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
 
 If you develop/use HunyuanVideo-1.5 in your projects, welcome to let us know.
 
+- **Diffusers** - [HunyuanVideo-1.5 Diffusers](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15): Official Hugging Face Diffusers integration for HunyuanVideo-1.5. Use HunyuanVideo-1.5 with the Diffusers library for seamless integration into your projects. See the [Usage with Diffusers](#usage-with-diffusers) section for details.
+
 - **ComfyUI** - [ComfyUI](https://github.com/comfyanonymous/ComfyUI): A powerful and modular diffusion model GUI with a graph/nodes interface. ComfyUI supports HunyuanVideo-1.5 with various engineering optimizations for fast inference. We provide a [ComfyUI Usage Guide](./ComfyUI/README.md) for HunyuanVideo-1.5.
 
 - **Community-implemented ComfyUI Plugin** - [comfyui_hunyuanvideo_1.5_plugin](https://github.com/yuanyuan-spec/comfyui_hunyuanvideo_1.5_plugin): A community-implemented ComfyUI plugin for HunyuanVideo-1.5, offering both simplified and complete node sets for quick usage or deep workflow customization, with built-in automatic model download support.
@@ -88,7 +92,7 @@ If you develop/use HunyuanVideo-1.5 in your projects, welcome to let us know.
 - [x] Inference Code and checkpoints
 - [x] ComfyUI Support
 - [x] LightX2V Support
-- [
+- [x] Diffusers Support
 - [ ] Release all model weights (Sparse attention, distill model, and SR models)
 
 ## 📋 Table of Contents
@@ -103,6 +107,8 @@ If you develop/use HunyuanVideo-1.5 in your projects, welcome to let us know.
 - [🧱 Download Pretrained Models](#-download-pretrained-models)
 - [📝 Prompt Guide](#-prompt-guide)
 - [🔑 Usage](#-usage)
+  - [Inference with Source Code](#inference-with-source-code)
+  - [Usage with Diffusers](#usage-with-diffusers)
   - [Prompt Enhancement](#prompt-enhancement)
   - [Text to Video](#text-to-video)
   - [Image to Video](#image-to-video)
@@ -209,7 +215,8 @@ Prompt enhancement plays a crucial role in enabling our model to generate high-q
 For users seeking to optimize prompts for other large models, it is recommended to consult the definition of `t2v_rewrite_system_prompt` in the file `hyvideo/utils/rewrite/t2v_prompt.py` to guide text-to-video rewriting. Similarly, for image-to-video rewriting, refer to the definition of `i2v_rewrite_system_prompt` in `hyvideo/utils/rewrite/i2v_prompt.py`.
 
 ## 🔑 Usage
-
+
+### Inference with Source Code
 
 For prompt rewriting, we recommend using Gemini or models deployed via vLLM. This codebase currently only supports models compatible with the vLLM API. If you wish to use Gemini, you will need to implement your own interface calls.
 
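As a rough illustration of the vLLM-compatible rewrite call described above, a minimal sketch of sending a prompt to a vLLM-served model through its OpenAI-compatible endpoint might look like the following. The endpoint URL, the served model name, and the reuse of `t2v_rewrite_system_prompt` as the system message are illustrative assumptions, not settings taken from this repository.

```python
# Illustrative sketch only: rewrite a prompt via a vLLM server's OpenAI-compatible API.
# The base_url, api_key, and model name are assumptions; point them at your own deployment.
from openai import OpenAI

# The README says this system prompt is defined in hyvideo/utils/rewrite/t2v_prompt.py;
# importing it here assumes the module exposes it as a plain string.
from hyvideo.utils.rewrite.t2v_prompt import t2v_rewrite_system_prompt

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

response = client.chat.completions.create(
    model="your-rewrite-model",  # whatever model your vLLM server is serving
    messages=[
        {"role": "system", "content": t2v_rewrite_system_prompt},
        {"role": "user", "content": "A cat walks on the grass, realistic style."},
    ],
)
rewritten_prompt = response.choices[0].message.content
print(rewritten_prompt)
```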
@@ -241,6 +248,7 @@ OUTPUT_PATH=./outputs/output.mp4
 REWRITE=true # Enable prompt rewriting. Please ensure rewrite vLLM server is deployed and configured.
 N_INFERENCE_GPU=8 # Parallel inference GPU count
 CFG_DISTILLED=true # Inference with CFG distilled model, 2x speedup
+ENABLE_STEP_DISTILL=true # Enable step distilled model for 480p I2V, recommended 8 or 12 steps, 75% speedup on RTX 4090
 SPARSE_ATTN=false # Inference with sparse attention (only 720p models are equipped with sparse attention). Please ensure flex-block-attn is installed
 SAGE_ATTN=true # Inference with SageAttention
 OVERLAP_GROUP_OFFLOADING=true # Only valid when group offloading is enabled, significantly increases CPU memory usage but speeds up inference
@@ -257,6 +265,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 --seed $SEED \
 --rewrite $REWRITE \
 --cfg_distilled $CFG_DISTILLED \
+--enable_step_distill $ENABLE_STEP_DISTILL \
 --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN \
 --enable_cache $ENABLE_CACHE --cache_type $CACHE_TYPE \
 --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
@@ -294,6 +303,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | `--save_pre_sr_video` | bool | No | `false` | Save original video before super resolution (use `--save_pre_sr_video` or `--save_pre_sr_video true` to enable, only effective when super resolution is enabled) |
 | `--rewrite` | bool | No | `true` | Enable prompt rewriting (use `--rewrite false` or `--rewrite 0` to disable, may result in lower quality video generation) |
 | `--cfg_distilled` | bool | No | `false` | Enable CFG distilled model for faster inference (~2x speedup, use `--cfg_distilled` or `--cfg_distilled true` to enable) |
+| `--enable_step_distill` | bool | No | `false` | Enable step distilled model for 480p I2V (recommended 8 or 12 steps, ~75% speedup on RTX 4090, use `--enable_step_distill` or `--enable_step_distill true` to enable) |
 | `--sparse_attn` | bool | No | `false` | Enable sparse attention for faster inference (~1.5-2x speedup, requires H-series GPUs, auto-enables CFG distilled, use `--sparse_attn` or `--sparse_attn true` to enable) |
 | `--offloading` | bool | No | `true` | Enable CPU offloading (use `--offloading false` or `--offloading 0` to disable for faster inference if GPU memory allows) |
 | `--group_offloading` | bool | No | `None` | Enable group offloading (default: None, automatically enabled if offloading is enabled. Use `--group_offloading` or `--group_offloading true/1` to enable, `--group_offloading false/0` to disable) |
@@ -323,6 +333,7 @@ The following table provides the optimal inference configurations (CFG scale, em
 | 720p I2V | 6 | None | 7 | 50 |
 | 480p T2V CFG Distilled | 1 | None | 5 | 50 |
 | 480p I2V CFG Distilled | 1 | None | 5 | 50 |
+| 480p I2V Step Distilled | 1 | None | 7 | 8 or 12 (recommended) |
 | 720p T2V CFG Distilled | 1 | None | 9 | 50 |
 | 720p I2V CFG Distilled | 1 | None | 7 | 50 |
 | 720p T2V CFG Distilled Sparse | 1 | None | 9 | 50 |
@@ -332,6 +343,70 @@ The following table provides the optimal inference configurations (CFG scale, em
 
 **Please note that the CFG distilled model we provide must use 50 inference steps to generate correct results.**
 
+### Usage with Diffusers
+
+HunyuanVideo-1.5 is available on Hugging Face Diffusers! You can easily use it with the Diffusers library:
+
+**Basic Usage:**
+
+```python
+import torch
+
+from diffusers import HunyuanVideo15Pipeline
+from diffusers.utils import export_to_video
+
+dtype = torch.bfloat16
+device = "cuda:0"
+seed = 42  # example seed
+prompt = "A cat walks on the grass, realistic style."  # example prompt
+
+pipe = HunyuanVideo15Pipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v", torch_dtype=dtype)
+pipe.enable_model_cpu_offload()
+pipe.vae.enable_tiling()
+
+generator = torch.Generator(device=device).manual_seed(seed)
+
+video = pipe(
+    prompt=prompt,
+    generator=generator,
+    num_frames=121,
+    num_inference_steps=50,
+).frames[0]
+
+export_to_video(video, "output.mp4", fps=24)
+```
+
+**Optimized Usage with Attention Backend:**
+
+HunyuanVideo-1.5 uses attention masks with variable-length sequences. For best performance, we recommend using an attention backend that handles padding efficiently.
+
+We recommend installing the `kernels` package (`pip install kernels`) to access prebuilt attention kernels.
+
+```python
+import torch
+
+from diffusers import HunyuanVideo15Pipeline, attention_backend
+from diffusers.utils import export_to_video
+
+dtype = torch.bfloat16
+device = "cuda:0"
+seed = 42  # example seed
+prompt = "A cat walks on the grass, realistic style."  # example prompt
+
+pipe = HunyuanVideo15Pipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v", torch_dtype=dtype)
+pipe.enable_model_cpu_offload()
+pipe.vae.enable_tiling()
+
+generator = torch.Generator(device=device).manual_seed(seed)
+
+with attention_backend("_flash_3_hub"):  # or "flash_hub" if you are not on H100/H800
+    video = pipe(
+        prompt=prompt,
+        generator=generator,
+        num_frames=121,
+        num_inference_steps=50,
+    ).frames[0]
+
+export_to_video(video, "output.mp4", fps=24)
+```
+
+For more details, please visit the [HunyuanVideo-1.5 Diffusers Collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15).
+
 
 ## 🧱 Models Cards
 |ModelName| Download |
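If the offloading shown in the Diffusers snippets above is still not enough for your GPU, diffusers pipelines also provide a more aggressive (but slower) option; this is a general diffusers feature, and its interaction with `HunyuanVideo15Pipeline` specifically is an assumption here:

```python
# Swap enable_model_cpu_offload() for sequential offloading: lower peak VRAM, slower inference.
pipe.enable_sequential_cpu_offload()
```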
@@ -340,6 +415,7 @@ The following table provides the optimal inference configurations (CFG scale, em
 |HunyuanVideo-1.5-480P-I2V |[480P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
 |HunyuanVideo-1.5-480P-T2V-cfg-distill | [480P-T2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
 |HunyuanVideo-1.5-480P-I2V-cfg-distill |[480P-I2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
+|HunyuanVideo-1.5-480P-I2V-step-distill |[480P-I2V-step-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_step_distilled) |
 |HunyuanVideo-1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
 |HunyuanVideo-1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
 |HunyuanVideo-1.5-720P-T2V-cfg-distill| Coming soon |
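For readers who prefer to pull only one of the transformer variants listed in the Models Cards table, a small illustrative sketch with `huggingface_hub` follows; the local directory and the choice of the 480P I2V step-distill weights are assumptions:

```python
# Illustrative sketch only: download a single transformer variant from the
# tencent/HunyuanVideo-1.5 repository referenced in the Models Cards table.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="tencent/HunyuanVideo-1.5",
    allow_patterns=["transformer/480p_i2v_step_distilled/*"],  # e.g. the 480P I2V step-distill weights
    local_dir="./ckpts/HunyuanVideo-1.5",                      # assumed destination directory
)
```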
README_CN.md CHANGED

@@ -40,7 +40,9 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 </p>
 
 ## 🔥🔥🔥 最新动态
+* 🚀 Dec 05, 2025: **新模型发布**:我们现已发布 480p I2V 步数蒸馏模型,建议使用 8 或 12 步生成视频!在 RTX 4090 上,端到端生成耗时减少 75%,单卡 RTX 4090 可在 75 秒内生成视频。步数蒸馏模型在保持与原模型相当质量的同时实现了显著加速。如需更快的生成速度,您也可以尝试 4 步推理(速度更快,质量略有下降)。详细的质量对比请参见[步数蒸馏对比文档](./assets/step_distillation_comparison.md)。 🔥🔥🔥🆕
 * 📚 训练代码即将发布。HunyuanVideo-1.5 使用 Muon 优化器进行训练,我们已在 [Training](#-training) 部分将其开源。**如果您希望继续训练我们的模型,或使用 LoRA 进行微调,请使用 Muon 优化器。**
+* 🎉 **Diffusers 支持**:HunyuanVideo-1.5 现已支持 Hugging Face Diffusers!查看我们的 [Diffusers 集合](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) 以便轻松集成。 🔥🔥🔥🆕
 * 🚀 Nov 27, 2025: 我们现已支持 cache 推理(deepcache, teacache, taylorcache),可极大加速推理!请 pull 最新代码体验。 🔥🔥🔥🆕
 * 🚀 Nov 24, 2025: 我们现已支持 deepcache 推理。
 * 👋 Nov 20, 2025: 我们开源了 HunyuanVideo-1.5 的推理代码和模型权重。
@@ -54,6 +56,8 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 
 如果您在项目中使用或开发了 HunyuanVideo-1.5,欢迎告知我们。
 
+- **Diffusers** - [HunyuanVideo-1.5 Diffusers](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15): HunyuanVideo-1.5 的官方 Hugging Face Diffusers 集成。使用 Diffusers 库即可将 HunyuanVideo-1.5 无缝集成到您的项目中。详情请参阅[使用 Diffusers](#使用-diffusers) 部分。
+
 - **ComfyUI** - [ComfyUI](https://github.com/comfyanonymous/ComfyUI): 一个强大且模块化的扩散模型图形界面,采用节点式工作流。ComfyUI 支持 HunyuanVideo-1.5,并提供多种工程加速优化以实现快速推理。
 我们提供了一个 [ComfyUI 使用指南](./ComfyUI/README.md) 用于 HunyuanVideo-1.5。
 - **社区实现的 ComfyUI 插件** - [comfyui_hunyuanvideo_1.5_plugin](https://github.com/yuanyuan-spec/comfyui_hunyuanvideo_1.5_plugin): 社区实现的 HunyuanVideo-1.5 ComfyUI 插件,提供简化版和完整版节点集,支持快速使用或深度工作流定制,内置自动模型下载功能。
@@ -70,7 +74,7 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 - [x] 推理代码和模型权重
 - [x] 支持 ComfyUI
 - [x] 支持 LightX2V
-- [
+- [x] 支持 Diffusers
 - [ ] 发布所有模型权重(稀疏注意力、蒸馏模型和超分辨率模型)
 
 
@@ -86,7 +90,8 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 - [🧱 下载预训练模型](#-下载预训练模型)
 - [📝 提示词指南](#-提示词指南)
 - [🔑 使用方法](#-使用方法)
-- [
+  - [使用源代码推理](#使用源代码推理)
+  - [使用 Diffusers](#使用-diffusers)
   - [命令行参数](#命令行参数)
   - [最优推理配置](#最优推理配置)
 - [🧱 模型卡片](#-模型卡片)
@@ -194,7 +199,8 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
 
 
 ## 🔑 使用方法
-
+
+### 使用源代码推理
 
 对于提示词重写,我们推荐使用 Gemini 或通过 vLLM 部署的大模型。当前代码库仅支持兼容 vLLM 接口的模型,如果您希望使用 Gemini,需自行实现相关接口调用。
 
@@ -227,6 +233,7 @@ OUTPUT_PATH=./outputs/output.mp4
 REWRITE=true # 启用提示词重写。请确保 rewrite vLLM server 已部署和配置。
 N_INFERENCE_GPU=8 # 并行推理 GPU 数量
 CFG_DISTILLED=true # 使用 CFG 蒸馏模型进行推理,2倍加速
+ENABLE_STEP_DISTILL=true # 启用 480p I2V 步数蒸馏模型,推荐 8 或 12 步,在 RTX 4090 上可提速 75%
 SPARSE_ATTN=false # 使用稀疏注意力进行推理(仅 720p 模型配备了稀疏注意力)。请确保 flex-block-attn 已安装
 SAGE_ATTN=true # 使用 SageAttention 进行推理
 OVERLAP_GROUP_OFFLOADING=true # 仅在组卸载启用时有效,会显著增加 CPU 内存占用,但能够提速
@@ -243,6 +250,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 --seed $SEED \
 --rewrite $REWRITE \
 --cfg_distilled $CFG_DISTILLED \
+--enable_step_distill $ENABLE_STEP_DISTILL \
 --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN \
 --enable_cache $ENABLE_CACHE --cache_type $CACHE_TYPE \
 --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
@@ -279,6 +287,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | `--save_pre_sr_video` | bool | 否 | `false` | 保存超分辨率处理前的原始视频(使用 `--save_pre_sr_video` 或 `--save_pre_sr_video true` 来启用,仅在启用超分辨率时有效) |
 | `--rewrite` | bool | 否 | `true` | 启用提示词重写(使用 `--rewrite false` 或 `--rewrite 0` 来禁用,禁用可能导致视频生成质量降低) |
 | `--cfg_distilled` | bool | 否 | `false` | 启用 CFG 蒸馏模型以加速推理(约 2 倍加速,使用 `--cfg_distilled` 或 `--cfg_distilled true` 来启用) |
+| `--enable_step_distill` | bool | 否 | `false` | 启用 480p I2V 步数蒸馏模型(推荐 8 或 12 步,在 RTX 4090 上可提速约 75%,使用 `--enable_step_distill` 或 `--enable_step_distill true` 来启用) |
 | `--sparse_attn` | bool | 否 | `false` | 启用稀疏注意力以加速推理(约 1.5-2 倍加速,需要 H 系列 GPU,会自动启用 CFG 蒸馏,使用 `--sparse_attn` 或 `--sparse_attn true` 来启用) |
 | `--offloading` | bool | 否 | `true` | 启用 CPU 卸载(使用 `--offloading false` 或 `--offloading 0` 来禁用,如果 GPU 内存允许,禁用后速度会更快) |
 | `--group_offloading` | bool | 否 | `None` | 启用组卸载(默认:None,如果启用了 offloading 则自动启用。使用 `--group_offloading` 或 `--group_offloading true/1` 来启用,`--group_offloading false/0` 来禁用) |
@@ -309,6 +318,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | 720p I2V | 6 | None | 7 | 50 |
 | 480p T2V cfg 蒸馏 | 1 | None | 5 | 50 |
 | 480p I2V cfg 蒸馏 | 1 | None | 5 | 50 |
+| 480p I2V 步数蒸馏 | 1 | None | 7 | 8 或 12(推荐) |
 | 720p T2V cfg 蒸馏 | 1 | None | 9 | 50 |
 | 720p I2V cfg 蒸馏 | 1 | None | 7 | 50 |
 | 720p T2V cfg 蒸馏稀疏 | 1 | None | 9 | 50 |
@@ -318,6 +328,70 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 
 **请注意:我们提供的 CFG 蒸馏模型需要 50 步推理才能获得正确的结果。**
 
+### 使用 Diffusers
+
+HunyuanVideo-1.5 现已支持 Hugging Face Diffusers!您可以通过 Diffusers 库轻松使用:
+
+**基础使用:**
+
+```python
+import torch
+
+from diffusers import HunyuanVideo15Pipeline
+from diffusers.utils import export_to_video
+
+dtype = torch.bfloat16
+device = "cuda:0"
+seed = 42  # 示例随机种子
+prompt = "一只猫走在草地上,写实风格。"  # 示例提示词
+
+pipe = HunyuanVideo15Pipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v", torch_dtype=dtype)
+pipe.enable_model_cpu_offload()
+pipe.vae.enable_tiling()
+
+generator = torch.Generator(device=device).manual_seed(seed)
+
+video = pipe(
+    prompt=prompt,
+    generator=generator,
+    num_frames=121,
+    num_inference_steps=50,
+).frames[0]
+
+export_to_video(video, "output.mp4", fps=24)
+```
+
+**使用注意力后端优化:**
+
+HunyuanVideo-1.5 使用可变长度序列的注意力掩码。为了获得最佳性能,我们建议使用能够高效处理填充的注意力后端。
+
+我们建议安装 kernels(`pip install kernels`)以访问预构建的注意力内核。
+
+```python
+import torch
+
+from diffusers import HunyuanVideo15Pipeline, attention_backend
+from diffusers.utils import export_to_video
+
+dtype = torch.bfloat16
+device = "cuda:0"
+seed = 42  # 示例随机种子
+prompt = "一只猫走在草地上,写实风格。"  # 示例提示词
+
+pipe = HunyuanVideo15Pipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v", torch_dtype=dtype)
+pipe.enable_model_cpu_offload()
+pipe.vae.enable_tiling()
+
+generator = torch.Generator(device=device).manual_seed(seed)
+
+with attention_backend("_flash_3_hub"):  # 如果您不在 H100/H800 上,可以使用 "flash_hub"
+    video = pipe(
+        prompt=prompt,
+        generator=generator,
+        num_frames=121,
+        num_inference_steps=50,
+    ).frames[0]
+
+export_to_video(video, "output.mp4", fps=24)
+```
+
+更多详情,请访问 [HunyuanVideo-1.5 Diffusers 集合](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15)。
+
 
 ## 🧱 模型卡片
 |模型名称| 下载链接 |
@@ -326,6 +400,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 |HunyuanVideo-1.5-480P-I2V |[480P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
 |HunyuanVideo-1.5-480P-T2V-cfg-distill | [480P-T2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
 |HunyuanVideo-1.5-480P-I2V-cfg-distill |[480P-I2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
+|HunyuanVideo-1.5-480P-I2V-step-distill |[480P-I2V-step-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_step_distilled) |
 |HunyuanVideo-1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
 |HunyuanVideo-1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
 |HunyuanVideo-1.5-720P-T2V-cfg-distill| Coming soon |