KevinNg99 committed on
Commit 3dda2be · 1 Parent(s): 83b3d50

update README

Files changed (2)
  1. README.md +78 -2
  2. README_CN.md +78 -3
README.md CHANGED
@@ -57,7 +57,9 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
57
  </p>
58
 
59
  ## 🔥🔥🔥 News
 
60
  * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
 
61
  * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), achieving significant speedup! Pull the latest code to try it. 🔥🔥🔥🆕
62
  * 🚀 Nov 24, 2025: We now support deepcache inference.
63
  * 👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
@@ -72,6 +74,8 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
72
 
73
  If you develop or use HunyuanVideo-1.5 in your projects, feel free to let us know.
74
 
 
 
75
  - **ComfyUI** - [ComfyUI](https://github.com/comfyanonymous/ComfyUI): A powerful and modular diffusion model GUI with a graph/nodes interface. ComfyUI supports HunyuanVideo-1.5 with various engineering optimizations for fast inference. We provide a [ComfyUI Usage Guide](./ComfyUI/README.md) for HunyuanVideo-1.5.
76
 
77
  - **Community-implemented ComfyUI Plugin** - [comfyui_hunyuanvideo_1.5_plugin](https://github.com/yuanyuan-spec/comfyui_hunyuanvideo_1.5_plugin): A community-implemented ComfyUI plugin for HunyuanVideo-1.5, offering both simplified and complete node sets for quick usage or deep workflow customization, with built-in automatic model download support.
@@ -88,7 +92,7 @@ If you develop/use HunyuanVideo-1.5 in your projects, welcome to let us know.
88
  - [x] Inference Code and checkpoints
89
  - [x] ComfyUI Support
90
  - [x] LightX2V Support
91
- - [ ] Diffusers Support
92
  - [ ] Release all model weights (Sparse attention, distill model, and SR models)
93
 
94
  ## 📋 Table of Contents
@@ -103,6 +107,8 @@ If you develop/use HunyuanVideo-1.5 in your projects, welcome to let us know.
103
  - [🧱 Download Pretrained Models](#-download-pretrained-models)
104
  - [📝 Prompt Guide](#-prompt-guide)
105
  - [🔑 Usage](#-usage)
 
 
106
  - [Prompt Enhancement](#prompt-enhancement)
107
  - [Text to Video](#text-to-video)
108
  - [Image to Video](#image-to-video)
@@ -209,7 +215,8 @@ Prompt enhancement plays a crucial role in enabling our model to generate high-q
209
  For users seeking to optimize prompts for other large models, it is recommended to consult the definition of `t2v_rewrite_system_prompt` in the file `hyvideo/utils/rewrite/t2v_prompt.py` to guide text-to-video rewriting. Similarly, for image-to-video rewriting, refer to the definition of `i2v_rewrite_system_prompt` in `hyvideo/utils/rewrite/i2v_prompt.py`.
210
 
211
  ## 🔑 Usage
212
- ### Video Generation
 
213
 
214
  For prompt rewriting, we recommend using Gemini or models deployed via vLLM. This codebase currently only supports models compatible with the vLLM API. If you wish to use Gemini, you will need to implement your own interface calls.
215
 
@@ -241,6 +248,7 @@ OUTPUT_PATH=./outputs/output.mp4
241
  REWRITE=true # Enable prompt rewriting. Please ensure the rewrite vLLM server is deployed and configured.
242
  N_INFERENCE_GPU=8 # Parallel inference GPU count
243
  CFG_DISTILLED=true # Inference with CFG distilled model, 2x speedup
 
244
  SPARSE_ATTN=false # Inference with sparse attention (only 720p models are equipped with sparse attention). Please ensure flex-block-attn is installed
245
  SAGE_ATTN=true # Inference with SageAttention
246
  OVERLAP_GROUP_OFFLOADING=true # Only valid when group offloading is enabled, significantly increases CPU memory usage but speeds up inference
@@ -257,6 +265,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
257
  --seed $SEED \
258
  --rewrite $REWRITE \
259
  --cfg_distilled $CFG_DISTILLED \
 
260
  --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN \
261
  --enable_cache $ENABLE_CACHE --cache_type $CACHE_TYPE \
262
  --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
@@ -294,6 +303,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
294
  | `--save_pre_sr_video` | bool | No | `false` | Save original video before super resolution (use `--save_pre_sr_video` or `--save_pre_sr_video true` to enable, only effective when super resolution is enabled) |
295
  | `--rewrite` | bool | No | `true` | Enable prompt rewriting (use `--rewrite false` or `--rewrite 0` to disable, may result in lower quality video generation) |
296
  | `--cfg_distilled` | bool | No | `false` | Enable CFG distilled model for faster inference (~2x speedup, use `--cfg_distilled` or `--cfg_distilled true` to enable) |
 
297
  | `--sparse_attn` | bool | No | `false` | Enable sparse attention for faster inference (~1.5-2x speedup, requires H-series GPUs, auto-enables CFG distilled, use `--sparse_attn` or `--sparse_attn true` to enable) |
298
  | `--offloading` | bool | No | `true` | Enable CPU offloading (use `--offloading false` or `--offloading 0` to disable for faster inference if GPU memory allows) |
299
  | `--group_offloading` | bool | No | `None` | Enable group offloading (default: None, automatically enabled if offloading is enabled. Use `--group_offloading` or `--group_offloading true/1` to enable, `--group_offloading false/0` to disable) |
@@ -323,6 +333,7 @@ The following table provides the optimal inference configurations (CFG scale, em
323
  | 720p I2V | 6 | None | 7 | 50 |
324
  | 480p T2V CFG Distilled | 1 | None | 5 | 50 |
325
  | 480p I2V CFG Distilled | 1 | None | 5 | 50 |
 
326
  | 720p T2V CFG Distilled | 1 | None | 9 | 50 |
327
  | 720p I2V CFG Distilled | 1 | None | 7 | 50 |
328
  | 720p T2V CFG Distilled Sparse | 1 | None | 9 | 50 |
@@ -332,6 +343,70 @@ The following table provides the optimal inference configurations (CFG scale, em
332
 
333
  **Please note that the CFG distilled model we provide must use 50 steps to generate correct results.**
334

335
 
336
  ## 🧱 Models Cards
337
  |ModelName| Download |
@@ -340,6 +415,7 @@ The following table provides the optimal inference configurations (CFG scale, em
340
  |HunyuanVideo-1.5-480P-I2V |[480P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
341
  |HunyuanVideo-1.5-480P-T2V-cfg-distill | [480P-T2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
342
  |HunyuanVideo-1.5-480P-I2V-cfg-distill |[480P-I2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
 
343
  |HunyuanVideo-1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
344
  |HunyuanVideo-1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
345
  |HunyuanVideo-1.5-720P-T2V-cfg-distill| Coming soon |
 
57
  </p>
58
 
59
  ## 🔥🔥🔥 News
60
+ * 🚀 Dec 05, 2025: **New Release**: We now release the 480p I2V step-distilled model, which generates videos in 8 or 12 steps (recommended)! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate videos within 75 seconds. The step-distilled model maintains comparable quality to the original model while achieving significant speedup. For even faster generation, you can also try 4 steps (faster speed with slightly reduced quality). See [Step Distillation Comparison](./assets/step_distillation_comparison.md) for detailed quality comparisons. 🔥🔥🔥🆕
61
  * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
62
+ * 🎉 **Diffusers Support**: HunyuanVideo-1.5 is now available on Hugging Face Diffusers! Check out the [Diffusers collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) for easy integration. 🔥🔥🔥🆕
63
  * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), achieving significant speedup! Pull the latest code to try it. 🔥🔥🔥🆕
64
  * 🚀 Nov 24, 2025: We now support deepcache inference.
65
  * 👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
 
74
 
75
  If you develop or use HunyuanVideo-1.5 in your projects, feel free to let us know.
76
 
77
+ - **Diffusers** - [HunyuanVideo-1.5 Diffusers](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15): Official Hugging Face Diffusers integration for HunyuanVideo-1.5. Easily use HunyuanVideo-1.5 with the Diffusers library for seamless integration into your projects. See [Usage with Diffusers](#usage-with-diffusers) section for details.
78
+
79
  - **ComfyUI** - [ComfyUI](https://github.com/comfyanonymous/ComfyUI): A powerful and modular diffusion model GUI with a graph/nodes interface. ComfyUI supports HunyuanVideo-1.5 with various engineering optimizations for fast inference. We provide a [ComfyUI Usage Guide](./ComfyUI/README.md) for HunyuanVideo-1.5.
80
 
81
  - **Community-implemented ComfyUI Plugin** - [comfyui_hunyuanvideo_1.5_plugin](https://github.com/yuanyuan-spec/comfyui_hunyuanvideo_1.5_plugin): A community-implemented ComfyUI plugin for HunyuanVideo-1.5, offering both simplified and complete node sets for quick usage or deep workflow customization, with built-in automatic model download support.
 
92
  - [x] Inference Code and checkpoints
93
  - [x] ComfyUI Support
94
  - [x] LightX2V Support
95
+ - [x] Diffusers Support
96
  - [ ] Release all model weights (Sparse attention, distill model, and SR models)
97
 
98
  ## 📋 Table of Contents
 
107
  - [🧱 Download Pretrained Models](#-download-pretrained-models)
108
  - [📝 Prompt Guide](#-prompt-guide)
109
  - [🔑 Usage](#-usage)
110
+ - [Inference with Source Code](#inference-with-source-code)
111
+ - [Usage with Diffusers](#usage-with-diffusers)
112
  - [Prompt Enhancement](#prompt-enhancement)
113
  - [Text to Video](#text-to-video)
114
  - [Image to Video](#image-to-video)
 
215
  For users seeking to optimize prompts for other large models, it is recommended to consult the definition of `t2v_rewrite_system_prompt` in the file `hyvideo/utils/rewrite/t2v_prompt.py` to guide text-to-video rewriting. Similarly, for image-to-video rewriting, refer to the definition of `i2v_rewrite_system_prompt` in `hyvideo/utils/rewrite/i2v_prompt.py`.
216
 
217
  ## 🔑 Usage
218
+
219
+ ### Inference with Source Code
220
 
221
  For prompt rewriting, we recommend using Gemini or models deployed via vLLM. This codebase currently only supports models compatible with the vLLM API. If you wish to use Gemini, you will need to implement your own interface calls.
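For reference, a rewrite request can go through vLLM's OpenAI-compatible endpoint. The sketch below is illustrative only: the host, port, and model name are placeholders, and the actual system prompt should be taken from `hyvideo/utils/rewrite/t2v_prompt.py` (or `i2v_prompt.py` for image-to-video), as described above.

```bash
# Illustrative sketch, not the repo's built-in rewrite client.
# Assumes a server started with `vllm serve <your-rewrite-model>` on localhost:8000.
curl http://localhost:8000/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
        "model": "your-rewrite-model",
        "messages": [
          {"role": "system", "content": "<paste t2v_rewrite_system_prompt here>"},
          {"role": "user", "content": "A cat walks on the grass, realistic style."}
        ]
      }'
```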
222
 
 
248
  REWRITE=true # Enable prompt rewriting. Please ensure the rewrite vLLM server is deployed and configured.
249
  N_INFERENCE_GPU=8 # Parallel inference GPU count
250
  CFG_DISTILLED=true # Inference with CFG distilled model, 2x speedup
251
+ ENABLE_STEP_DISTILL=true # Enable step distilled model for 480p I2V, recommended 8 or 12 steps, 75% speedup on RTX 4090
252
  SPARSE_ATTN=false # Inference with sparse attention (only 720p models are equipped with sparse attention). Please ensure flex-block-attn is installed
253
  SAGE_ATTN=true # Inference with SageAttention
254
  OVERLAP_GROUP_OFFLOADING=true # Only valid when group offloading is enabled, significantly increases CPU memory usage but speeds up inference
 
265
  --seed $SEED \
266
  --rewrite $REWRITE \
267
  --cfg_distilled $CFG_DISTILLED \
268
+ --enable_step_distill $ENABLE_STEP_DISTILL \
269
  --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN \
270
  --enable_cache $ENABLE_CACHE --cache_type $CACHE_TYPE \
271
  --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
 
303
  | `--save_pre_sr_video` | bool | No | `false` | Save original video before super resolution (use `--save_pre_sr_video` or `--save_pre_sr_video true` to enable, only effective when super resolution is enabled) |
304
  | `--rewrite` | bool | No | `true` | Enable prompt rewriting (use `--rewrite false` or `--rewrite 0` to disable, may result in lower quality video generation) |
305
  | `--cfg_distilled` | bool | No | `false` | Enable CFG distilled model for faster inference (~2x speedup, use `--cfg_distilled` or `--cfg_distilled true` to enable) |
306
+ | `--enable_step_distill` | bool | No | `false` | Enable step distilled model for 480p I2V (recommended 8 or 12 steps, ~75% speedup on RTX 4090, use `--enable_step_distill` or `--enable_step_distill true` to enable) |
307
  | `--sparse_attn` | bool | No | `false` | Enable sparse attention for faster inference (~1.5-2x speedup, requires H-series GPUs, auto-enables CFG distilled, use `--sparse_attn` or `--sparse_attn true` to enable) |
308
  | `--offloading` | bool | No | `true` | Enable CPU offloading (use `--offloading false` or `--offloading 0` to disable for faster inference if GPU memory allows) |
309
  | `--group_offloading` | bool | No | `None` | Enable group offloading (default: None, automatically enabled if offloading is enabled. Use `--group_offloading` or `--group_offloading true/1` to enable, `--group_offloading false/0` to disable) |
 
333
  | 720p I2V | 6 | None | 7 | 50 |
334
  | 480p T2V CFG Distilled | 1 | None | 5 | 50 |
335
  | 480p I2V CFG Distilled | 1 | None | 5 | 50 |
336
+ | 480p I2V Step Distilled | 1 | None | 7 | 8 or 12 (recommended) |
337
  | 720p T2V CFG Distilled | 1 | None | 9 | 50 |
338
  | 720p I2V CFG Distilled | 1 | None | 7 | 50 |
339
  | 720p T2V CFG Distilled Sparse | 1 | None | 9 | 50 |
 
343
 
344
  **Please note that the CFG distilled model we provide must use 50 steps to generate correct results.**
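Putting the fragments above together, a minimal sketch of a step-distilled 480p I2V run could look like the following. It is illustrative only: the seed and GPU count are example values, and the prompt, input image, output path, caching, and offloading arguments from the full script are omitted.

```bash
# Illustrative sketch combining only the variables and flags quoted above; not a complete command.
N_INFERENCE_GPU=8
SEED=42                    # example value
REWRITE=true               # requires a deployed rewrite vLLM server
CFG_DISTILLED=true
ENABLE_STEP_DISTILL=true   # 480p I2V step-distilled model; use 8 or 12 inference steps
SPARSE_ATTN=false          # sparse attention is only available for the 720p models
SAGE_ATTN=true

torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
    --seed $SEED \
    --rewrite $REWRITE \
    --cfg_distilled $CFG_DISTILLED \
    --enable_step_distill $ENABLE_STEP_DISTILL \
    --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN
    # ...remaining arguments (prompt, image, output, caching, offloading) as in the full script
```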
345
 
346
+ ### Usage with Diffusers
347
+
348
+ HunyuanVideo-1.5 is available on Hugging Face Diffusers! You can easily use it with the Diffusers library:
349
+
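As a setup note, the `HunyuanVideo15Pipeline` class used below assumes a sufficiently recent `diffusers` installation, for example:

```bash
# Assumed prerequisite: a diffusers version that ships HunyuanVideo15Pipeline.
pip install -U diffusers transformers accelerate
# If the latest PyPI release does not include it yet, install from source:
# pip install git+https://github.com/huggingface/diffusers
```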
350
+ **Basic Usage:**
351
+
352
+ ```python
353
+ import torch
354
+
355
+ dtype = torch.bfloat16
356
+ device = "cuda:0"
357
+
358
+ from diffusers import HunyuanVideo15Pipeline
359
+ from diffusers.utils import export_to_video
360
+
361
+ pipe = HunyuanVideo15Pipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v", torch_dtype=dtype)
362
+ pipe.enable_model_cpu_offload()
363
+ pipe.vae.enable_tiling()
364
+
365
+ prompt = "A cat walks on the grass, realistic style."  # example prompt; replace with your own
+ seed = 42  # example seed
+ generator = torch.Generator(device=device).manual_seed(seed)
366
+
367
+ video = pipe(
368
+ prompt=prompt,
369
+ generator=generator,
370
+ num_frames=121,
371
+ num_inference_steps=50,
372
+ ).frames[0]
373
+
374
+ export_to_video(video, "output.mp4", fps=24)
375
+ ```
376
+
377
+ **Optimized Usage with Attention Backend:**
378
+
379
+ HunyuanVideo-1.5 uses attention masks with variable-length sequences. For best performance, we recommend using an attention backend that handles padding efficiently.
380
+
381
+ We recommend installing kernels (`pip install kernels`) to access prebuilt attention kernels.
382
+
383
+ ```python
384
+ import torch
385
+
386
+ dtype = torch.bfloat16
387
+ device = "cuda:0"
388
+
389
+ from diffusers import HunyuanVideo15Pipeline, attention_backend
390
+ from diffusers.utils import export_to_video
391
+
392
+ pipe = HunyuanVideo15Pipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v", torch_dtype=dtype)
393
+ pipe.enable_model_cpu_offload()
394
+ pipe.vae.enable_tiling()
395
+
396
+ prompt = "A cat walks on the grass, realistic style."  # example prompt; replace with your own
+ seed = 42  # example seed
+ generator = torch.Generator(device=device).manual_seed(seed)
397
+
398
+ with attention_backend("_flash_3_hub"): # or `"flash_hub"` if you are not on H100/H800
399
+ video = pipe(
400
+ prompt=prompt,
401
+ generator=generator,
402
+ num_frames=121,
403
+ num_inference_steps=50,
404
+ ).frames[0]
405
+ export_to_video(video, "output.mp4", fps=24)
406
+ ```
407
+
408
+ For more details, please visit [HunyuanVideo-1.5 Diffusers Collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15).
409
+
410
 
411
  ## 🧱 Models Cards
412
  |ModelName| Download |
 
415
  |HunyuanVideo-1.5-480P-I2V |[480P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
416
  |HunyuanVideo-1.5-480P-T2V-cfg-distill | [480P-T2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
417
  |HunyuanVideo-1.5-480P-I2V-cfg-distill |[480P-I2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
418
+ |HunyuanVideo-1.5-480P-I2V-step-distill |[480P-I2V-step-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_step_distilled) |
419
  |HunyuanVideo-1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
420
  |HunyuanVideo-1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
421
  |HunyuanVideo-1.5-720P-T2V-cfg-distill| Coming soon |
README_CN.md CHANGED
@@ -40,7 +40,9 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
40
  </p>
41
 
42
  ## 🔥🔥🔥 最新动态
 
43
  * 📚 训练代码即将发布。HunyuanVideo-1.5 使用 Muon 优化器进行训练,我们在[Training](#-training) 部分开源。**如果您希望继续训练我们的模型,或使用 LoRA 进行微调,请使用 Muon 优化器。**
 
44
  * 🚀 Nov 27, 2025: 我们现已支持 cache 推理(deepcache, teacache, taylorcache),可极大加速推理!请 pull 最新代码体验。 🔥🔥🔥🆕
45
  * 🚀 Nov 24, 2025: 我们现已支持 deepcache 推理。
46
  * 👋 Nov 20, 2025: 我们开源了 HunyuanVideo-1.5的代码和推理权重
@@ -54,6 +56,8 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
54
 
55
  如果您在项目中使用或开发了 HunyuanVideo-1.5,欢迎告知我们。
56
 
 
 
57
  - **ComfyUI** - [ComfyUI](https://github.com/comfyanonymous/ComfyUI): 一个强大且模块化的扩散模型图形界面,采用节点式工作流。ComfyUI 支持 HunyuanVideo-1.5,并提供多种工程加速优化以实现快速推理。
58
  我们提供了一个 [ComfyUI 使用指南](./ComfyUI/README.md) 用于 HunyuanVideo-1.5。
59
  - **社区实现的 ComfyUI 插件** - [comfyui_hunyuanvideo_1.5_plugin](https://github.com/yuanyuan-spec/comfyui_hunyuanvideo_1.5_plugin): 社区实现的 HunyuanVideo-1.5 ComfyUI 插件,提供简化版和完整版节点集,支持快速使用或深度工作流定制,内置自动模型下载功能。
@@ -70,7 +74,7 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
70
  - [x] 推理代码和模型权重
71
  - [x] 支持 ComfyUI
72
  - [x] 支持 LightX2V
73
- - [ ] Diffusers 支持
74
  - [ ] 发布所有模型权重(稀疏注意力、蒸馏模型和超分辨率模型)
75
 
76
 
@@ -86,7 +90,8 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
86
  - [🧱 下载预训练模型](#-下载预训练模型)
87
  - [📝 提示词指南](#-提示词指南)
88
  - [🔑 使用方法](#-使用方法)
89
- - [视频生成](#视频生成)
 
90
  - [命令行参数](#命令行参数)
91
  - [最优推理配置](#最优推理配置)
92
  - [🧱 模型卡片](#-模型卡片)
@@ -194,7 +199,8 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
194
 
195
 
196
  ## 🔑 使用方法
197
- ### 视频生成
 
198
 
199
  对于提示词重写,我们推荐使用 Gemini 或通过 vLLM 部署的大模型。当前代码库仅支持兼容 vLLM 接口的模型,如果您希望使用 Gemini,需自行实现相关接口调用。
200
 
@@ -227,6 +233,7 @@ OUTPUT_PATH=./outputs/output.mp4
227
  REWRITE=true # 启用提示词重写。请确保 rewrite vLLM server 已部署和配置。
228
  N_INFERENCE_GPU=8 # 并行推理 GPU 数量
229
  CFG_DISTILLED=true # 使用 CFG 蒸馏模型进行推理,2倍加速
 
230
  SPARSE_ATTN=false # 使用稀疏注意力进行推理(仅 720p 模型配备了稀疏注意力)。请确保 flex-block-attn 已安装
231
  SAGE_ATTN=true # 使用 SageAttention 进行推理
232
  OVERLAP_GROUP_OFFLOADING=true # 仅在组卸载启用时有效,会显著增加 CPU 内存占用,但能够提速
@@ -243,6 +250,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
243
  --seed $SEED \
244
  --rewrite $REWRITE \
245
  --cfg_distilled $CFG_DISTILLED \
 
246
  --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN \
247
  --enable_cache $ENABLE_CACHE --cache_type $CACHE_TYPE \
248
  --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
@@ -279,6 +287,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
279
  | `--save_pre_sr_video` | bool | 否 | `false` | 保存超分辨率处理前的原始视频(使用 `--save_pre_sr_video` 或 `--save_pre_sr_video true` 来启用,仅在启用超分辨率时有效) |
280
  | `--rewrite` | bool | 否 | `true` | 启用提示词重写(使用 `--rewrite false` 或 `--rewrite 0` 来禁用,禁用可能导致视频生成质量降低) |
281
  | `--cfg_distilled` | bool | 否 | `false` | 启用 CFG 蒸馏模型以加速推理(约 2 倍加速,使用 `--cfg_distilled` 或 `--cfg_distilled true` 来启用) |
 
282
  | `--sparse_attn` | bool | 否 | `false` | 启用稀疏注意力以加速推理(约 1.5-2 倍加速,需要 H 系列 GPU,会自动启用 CFG 蒸馏,使用 `--sparse_attn` 或 `--sparse_attn true` 来启用) |
283
  | `--offloading` | bool | 否 | `true` | 启用 CPU 卸载(使用 `--offloading false` 或 `--offloading 0` 来禁用,如果 GPU 内存允许,禁用后速度会更快) |
284
  | `--group_offloading` | bool | 否 | `None` | 启用组卸载(默认:None,如果启用了 offloading 则自动启用。使用 `--group_offloading` 或 `--group_offloading true/1` 来启用,`--group_offloading false/0` 来禁用) |
@@ -309,6 +318,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
309
  | 720p I2V | 6 | None | 7 | 50 |
310
  | 480p T2V cfg 蒸馏 | 1 | None | 5 | 50 |
311
  | 480p I2V cfg 蒸馏 | 1 | None | 5 | 50 |
 
312
  | 720p T2V cfg 蒸馏 | 1 | None | 9 | 50 |
313
  | 720p I2V cfg 蒸馏 | 1 | None | 7 | 50 |
314
  | 720p T2V cfg 蒸馏稀疏 | 1 | None | 9 | 50 |
@@ -318,6 +328,70 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
318
 
319
  **请注意,我们提供的 CFG 蒸馏模型需要 50 步推理才能获得正确的结果。**
320

321
 
322
  ## 🧱 模型卡片
323
  |模型名称| 下载链接 |
@@ -326,6 +400,7 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
326
  |HunyuanVideo-1.5-480P-I2V |[480P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
327
  |HunyuanVideo-1.5-480P-T2V-cfg-distill | [480P-T2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
328
  |HunyuanVideo-1.5-480P-I2V-cfg-distill |[480P-I2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
 
329
  |HunyuanVideo-1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
330
  |HunyuanVideo-1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
331
  |HunyuanVideo-1.5-720P-T2V-cfg-distill| Coming soon |
 
40
  </p>
41
 
42
  ## 🔥🔥🔥 最新动态
43
+ * 🚀 Dec 05, 2025: **新模型发布**:我们现已发布 480p I2V 步数蒸馏模型,建议使用 8 或 12 步生成视频!在 RTX 4090 上,端到端生成耗时减少 75%,单卡 RTX 4090 可在 75 秒内生成视频。步数蒸馏模型在保持与原模型相当质量的同时实现了显著的加速。如需更快的生成速度,您也可以尝试使用4步推理(速度更快,质量略有下降)。详细的质量对比请参见[步数蒸馏对比文档](./assets/step_distillation_comparison.md)。 🔥🔥🔥🆕
44
  * 📚 训练代码即将发布。HunyuanVideo-1.5 使用 Muon 优化器进行训练,我们在[Training](#-training) 部分开源。**如果您希望继续训练我们的模型,或使用 LoRA 进行微调,请使用 Muon 优化器。**
45
+ * 🎉 **Diffusers 支持**:HunyuanVideo-1.5 现已支持 Hugging Face Diffusers!查看我们的 [Diffusers 集合](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) 以便轻松集成。 🔥🔥🔥🆕
46
  * 🚀 Nov 27, 2025: 我们现已支持 cache 推理(deepcache, teacache, taylorcache),可极大加速推理!请 pull 最新代码体验。 🔥🔥🔥🆕
47
  * 🚀 Nov 24, 2025: 我们现已支持 deepcache 推理。
48
  * 👋 Nov 20, 2025: 我们开源了 HunyuanVideo-1.5的代码和推理权重
 
56
 
57
  如果您在项目中使用或开发了 HunyuanVideo-1.5,欢迎告知我们。
58
 
59
+ - **Diffusers** - [HunyuanVideo-1.5 Diffusers](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15): HunyuanVideo-1.5 的官方 Hugging Face Diffusers 集成。使用 Diffusers 库轻松使用 HunyuanVideo-1.5,无缝集成到您的项目中。详情请参阅[使用 Diffusers](#使用-diffusers) 部分。
60
+
61
  - **ComfyUI** - [ComfyUI](https://github.com/comfyanonymous/ComfyUI): 一个强大且模块化的扩散模型图形界面,采用节点式工作流。ComfyUI 支持 HunyuanVideo-1.5,并提供多种工程加速优化以实现快速推理。
62
  我们提供了一个 [ComfyUI 使用指南](./ComfyUI/README.md) 用于 HunyuanVideo-1.5。
63
  - **社区实现的 ComfyUI 插件** - [comfyui_hunyuanvideo_1.5_plugin](https://github.com/yuanyuan-spec/comfyui_hunyuanvideo_1.5_plugin): 社区实现的 HunyuanVideo-1.5 ComfyUI 插件,提供简化版和完整版节点集,支持快速使用或深度工作流定制,内置自动模型下载功能。
 
74
  - [x] 推理代码和模型权重
75
  - [x] 支持 ComfyUI
76
  - [x] 支持 LightX2V
77
+ - [x] Diffusers 支持
78
  - [ ] 发布所有模型权重(稀疏注意力、蒸馏模型和超分辨率模型)
79
 
80
 
 
90
  - [🧱 下载预训练模型](#-下载预训练模型)
91
  - [📝 提示词指南](#-提示词指南)
92
  - [🔑 使用方法](#-使用方法)
93
+ - [使用源代码推理](#使用源代码推理)
94
+ - [使用 Diffusers](#使用-diffusers)
95
  - [命令行参数](#命令行参数)
96
  - [最优推理配置](#最优推理配置)
97
  - [🧱 模型卡片](#-模型卡片)
 
199
 
200
 
201
  ## 🔑 使用方法
202
+
203
+ ### 使用源代码推理
204
 
205
  对于提示词重写,我们推荐使用 Gemini 或通过 vLLM 部署的大模型。当前代码库仅支持兼容 vLLM 接口的模型,如果您希望使用 Gemini,需自行实现相关接口调用。
206
 
 
233
  REWRITE=true # 启用提示词重写。请确保 rewrite vLLM server 已部署和配置。
234
  N_INFERENCE_GPU=8 # 并行推理 GPU 数量
235
  CFG_DISTILLED=true # 使用 CFG 蒸馏模型进行推理,2倍加速
236
+ ENABLE_STEP_DISTILL=true # 启用 480p I2V 步数蒸馏模型,推荐 8 或 12 步,在 RTX 4090 上可提速 75%
237
  SPARSE_ATTN=false # 使用稀疏注意力进行推理(仅 720p 模型配备了稀疏注意力)。请确保 flex-block-attn 已安装
238
  SAGE_ATTN=true # 使用 SageAttention 进行推理
239
  OVERLAP_GROUP_OFFLOADING=true # 仅在组卸载启用时有效,会显著增加 CPU 内存占用,但能够提速
 
250
  --seed $SEED \
251
  --rewrite $REWRITE \
252
  --cfg_distilled $CFG_DISTILLED \
253
+ --enable_step_distill $ENABLE_STEP_DISTILL \
254
  --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN \
255
  --enable_cache $ENABLE_CACHE --cache_type $CACHE_TYPE \
256
  --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
 
287
  | `--save_pre_sr_video` | bool | 否 | `false` | 保存超分辨率处理前的原始视频(使用 `--save_pre_sr_video` 或 `--save_pre_sr_video true` 来启用,仅在启用超分辨率时有效) |
288
  | `--rewrite` | bool | 否 | `true` | 启用提示词重写(使用 `--rewrite false` 或 `--rewrite 0` 来禁用,禁用可能导致视频生成质量降低) |
289
  | `--cfg_distilled` | bool | 否 | `false` | 启用 CFG 蒸馏模型以加速推理(约 2 倍加速,使用 `--cfg_distilled` 或 `--cfg_distilled true` 来启用) |
290
+ | `--enable_step_distill` | bool | 否 | `false` | 启用 480p I2V 步数蒸馏模型(推荐 8 或 12 步,在 RTX 4090 上可提速约 75%,使用 `--enable_step_distill` 或 `--enable_step_distill true` 来启用) |
291
  | `--sparse_attn` | bool | 否 | `false` | 启用稀疏注意力以加速推理(约 1.5-2 倍加速,需要 H 系列 GPU,会自动启用 CFG 蒸馏,使用 `--sparse_attn` 或 `--sparse_attn true` 来启用) |
292
  | `--offloading` | bool | 否 | `true` | 启用 CPU 卸载(使用 `--offloading false` 或 `--offloading 0` 来禁用,如果 GPU 内存允许,禁用后速度会更快) |
293
  | `--group_offloading` | bool | 否 | `None` | 启用组卸载(默认:None,如果启用了 offloading 则自动启用。使用 `--group_offloading` 或 `--group_offloading true/1` 来启用,`--group_offloading false/0` 来禁用) |
 
318
  | 720p I2V | 6 | None | 7 | 50 |
319
  | 480p T2V cfg 蒸馏 | 1 | None | 5 | 50 |
320
  | 480p I2V cfg 蒸馏 | 1 | None | 5 | 50 |
321
+ | 480p I2V 步数蒸馏 | 1 | None | 7 | 8 或 12(推荐) |
322
  | 720p T2V cfg 蒸馏 | 1 | None | 9 | 50 |
323
  | 720p I2V cfg 蒸馏 | 1 | None | 7 | 50 |
324
  | 720p T2V cfg 蒸馏稀疏 | 1 | None | 9 | 50 |
 
328
 
329
  **请注意,我们提供的 CFG 蒸馏模型需要 50 步推理才能获得正确的结果。**
330
 
331
+ ### 使用 Diffusers
332
+
333
+ HunyuanVideo-1.5 现已支持 Hugging Face Diffusers!您可以通过 Diffusers 库轻松使用:
334
+
335
+ **基础使用:**
336
+
337
+ ```python
338
+ import torch
339
+
340
+ dtype = torch.bfloat16
341
+ device = "cuda:0"
342
+
343
+ from diffusers import HunyuanVideo15Pipeline
344
+ from diffusers.utils import export_to_video
345
+
346
+ pipe = HunyuanVideo15Pipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v", torch_dtype=dtype)
347
+ pipe.enable_model_cpu_offload()
348
+ pipe.vae.enable_tiling()
349
+
350
+ prompt = "一只猫在草地上行走,写实风格。"  # 示例提示词,可替换为您自己的提示词
+ seed = 42  # 示例随机种子
+ generator = torch.Generator(device=device).manual_seed(seed)
351
+
352
+ video = pipe(
353
+ prompt=prompt,
354
+ generator=generator,
355
+ num_frames=121,
356
+ num_inference_steps=50,
357
+ ).frames[0]
358
+
359
+ export_to_video(video, "output.mp4", fps=24)
360
+ ```
361
+
362
+ **使用注意力后端优化:**
363
+
364
+ HunyuanVideo-1.5 使用可变长度序列的注意力掩码。为了获得最佳性能,我们建议使用能够高效处理填充的注意力后端。
365
+
366
+ 我们建议安装 kernels(`pip install kernels`)以访问预构建的注意力内核。
367
+
368
+ ```python
369
+ import torch
370
+
371
+ dtype = torch.bfloat16
372
+ device = "cuda:0"
373
+
374
+ from diffusers import HunyuanVideo15Pipeline, attention_backend
375
+ from diffusers.utils import export_to_video
376
+
377
+ pipe = HunyuanVideo15Pipeline.from_pretrained("hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v", torch_dtype=dtype)
378
+ pipe.enable_model_cpu_offload()
379
+ pipe.vae.enable_tiling()
380
+
381
+ prompt = "一只猫在草地上行走,写实风格。"  # 示例提示词,可替换为您自己的提示词
+ seed = 42  # 示例随机种子
+ generator = torch.Generator(device=device).manual_seed(seed)
382
+
383
+ with attention_backend("_flash_3_hub"): # 如果您不在 H100/H800 上,可以使用 `"flash_hub"`
384
+ video = pipe(
385
+ prompt=prompt,
386
+ generator=generator,
387
+ num_frames=121,
388
+ num_inference_steps=50,
389
+ ).frames[0]
390
+ export_to_video(video, "output.mp4", fps=24)
391
+ ```
392
+
393
+ 更多详情,请访问 [HunyuanVideo-1.5 Diffusers 集合](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15)。
394
+
395
 
396
  ## 🧱 模型卡片
397
  |模型名称| 下载链接 |
 
400
  |HunyuanVideo-1.5-480P-I2V |[480P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v) |
401
  |HunyuanVideo-1.5-480P-T2V-cfg-distill | [480P-T2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_t2v_distilled) |
402
  |HunyuanVideo-1.5-480P-I2V-cfg-distill |[480P-I2V-cfg-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_distilled) |
403
+ |HunyuanVideo-1.5-480P-I2V-step-distill |[480P-I2V-step-distill](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_step_distilled) |
404
  |HunyuanVideo-1.5-720P-T2V|[720P-T2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_t2v) |
405
  |HunyuanVideo-1.5-720P-I2V |[720P-I2V](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/720p_i2v) |
406
  |HunyuanVideo-1.5-720P-T2V-cfg-distill| Coming soon |