KevinNg99 committed
Commit 66b1147 · Parent(s): f11ffa6

update README

Files changed (2):
  1. README.md +43 -10
  2. README_CN.md +43 -9
README.md CHANGED
@@ -57,7 +57,9 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
 </p>
 
 ## 🔥🔥🔥 News
-* 🚀 Nov 24, 2025: We now support cache inference, achieving approximately 2x speedup! Pull the latest code to try it. 🔥🔥🔥🆕
+* 📚 Training code is coming soon. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
+* 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), achieving significant speedup! Pull the latest code to try it. 🔥🔥🔥🆕
+* 🚀 Nov 24, 2025: We now support deepcache inference.
 * 👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
 
 
@@ -78,6 +80,8 @@ If you develop/use HunyuanVideo-1.5 in your projects, welcome to let us know.
 
 - **Wan2GP v9.62** - [Wan2GP](https://github.com/deepbeepmeep/Wan2GP): WanGP is a very low VRAM app (as low as 6 GB of VRAM for Hunyuan Video 1.5) that supports a LoRA accelerator for 8-step generation and offers tools to facilitate video generation.
 
+- **ComfyUI-MagCache** - [ComfyUI-MagCache](https://github.com/Zehong-Ma/ComfyUI-MagCache): MagCache is a training-free caching approach that accelerates video generation by estimating fluctuating differences among model outputs across timesteps. It achieves a 1.7x speedup for HunyuanVideo-1.5 with 20 inference steps.
+
 
 ## 📑 Open-source Plan
 - HunyuanVideo-1.5 (T2V/I2V)
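A note for readers comparing the cache options in this commit: MagCache, TeaCache, and the new `--cache_type` modes all share one idea, namely skipping the expensive transformer evaluation at timesteps where its output is predicted to barely change. The sketch below is a toy illustration of such a policy; the class name and threshold are hypothetical and do not reflect the actual API of this repository or of ComfyUI-MagCache.

```python
import torch

class ResidualCache:
    """Toy timestep-caching policy in the spirit of TeaCache/MagCache:
    reuse the last output when the block input has changed little since
    the previous full evaluation. All names here are hypothetical."""

    def __init__(self, rel_threshold: float = 0.1):
        self.rel_threshold = rel_threshold
        self.last_input = None
        self.last_output = None

    def step(self, x: torch.Tensor, forward_fn) -> torch.Tensor:
        if self.last_input is not None:
            # Relative L1 change of the block input across timesteps.
            rel_change = (x - self.last_input).abs().mean() / (
                self.last_input.abs().mean() + 1e-8
            )
            if rel_change < self.rel_threshold:
                return self.last_output  # cheap path: skip the transformer
        out = forward_fn(x)  # expensive path: full forward pass
        self.last_input, self.last_output = x.detach(), out.detach()
        return out
```

The real methods differ in what they measure (MagCache tracks magnitude ratios of successive outputs; deepcache skips interior blocks rather than whole steps, which is what `--no_cache_block_id` below controls), but the skip-or-recompute structure is the same.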
@@ -105,6 +109,7 @@ If you develop/use HunyuanVideo-1.5 in your projects, welcome to let us know.
 - [Command Line Arguments](#command-line-arguments)
 - [Optimal Inference Configurations](#optimal-inference-configurations)
 - [🧱 Models Cards](#-models-cards)
+- [🎓 Training](#-training)
 - [🎬 More Examples](#-more-examples)
 - [📊 Evaluation](#-evaluation)
 - [📚 Citation](#-citation)
@@ -226,20 +231,22 @@ export I2V_REWRITE_MODEL_NAME="<your_model_name>"
 
 PROMPT='A girl holding a paper with words "Hello, world!"'
 
-IMAGE_PATH=./data/reference_image.png # Optional, 'none' or <image path>
+IMAGE_PATH=none # Optional, none or <image path> to enable i2v mode
 SEED=1
 ASPECT_RATIO=16:9
 RESOLUTION=480p
 OUTPUT_PATH=./outputs/output.mp4
 
 # Configuration
+REWRITE=true # Enable prompt rewriting. Please ensure the rewrite vLLM server is deployed and configured.
 N_INFERENCE_GPU=8 # Parallel inference GPU count
 CFG_DISTILLED=true # Inference with CFG distilled model, 2x speedup
 SPARSE_ATTN=false # Inference with sparse attention (only 720p models are equipped with sparse attention). Please ensure flex-block-attn is installed
 SAGE_ATTN=true # Inference with SageAttention
-REWRITE=true # Enable prompt rewriting. Please ensure the rewrite vLLM server is deployed and configured.
 OVERLAP_GROUP_OFFLOADING=true # Only valid when group offloading is enabled; significantly increases CPU memory usage but speeds up inference
 ENABLE_CACHE=true # Enable feature cache during inference. Significantly speeds up inference.
+CACHE_TYPE=deepcache # Supported: deepcache, teacache, taylorcache
+ENABLE_SR=true # Enable super resolution
 MODEL_PATH=ckpts # Path to pretrained model
 
 torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
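As the new `IMAGE_PATH` comment notes, the same script covers both modes; a minimal sketch of the toggle (the reference path is the illustrative one from the previous revision):

```bash
# t2v mode (default in this revision): no reference image.
IMAGE_PATH=none
# i2v mode: point IMAGE_PATH at a reference image to condition generation.
IMAGE_PATH=./data/reference_image.png
```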
@@ -248,14 +255,13 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
     --resolution $RESOLUTION \
     --aspect_ratio $ASPECT_RATIO \
     --seed $SEED \
-    --cfg_distilled $CFG_DISTILLED \
-    --sparse_attn $SPARSE_ATTN \
-    --use_sageattn $SAGE_ATTN \
-    --enable_cache $ENABLE_CACHE \
     --rewrite $REWRITE \
-    --output_path $OUTPUT_PATH \
+    --cfg_distilled $CFG_DISTILLED \
+    --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN \
+    --enable_cache $ENABLE_CACHE --cache_type $CACHE_TYPE \
     --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
-    --save_pre_sr_video \
+    --sr $ENABLE_SR --save_pre_sr_video \
+    --output_path $OUTPUT_PATH \
     --model_path $MODEL_PATH
 ```
 
@@ -295,8 +301,9 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | `--dtype` | str | No | `bf16` | Data type for transformer: `bf16` (faster, lower memory) or `fp32` (better quality, slower, higher memory) |
 | `--use_sageattn` | bool | No | `false` | Enable SageAttention (use `--use_sageattn` or `--use_sageattn true/1` to enable, `--use_sageattn false/0` to disable) |
 | `--sage_blocks_range` | str | No | `0-53` | SageAttention blocks range (e.g., `0-5` or `0,1,2,3,4,5`) |
-| `--enable_torch_compile` | bool | No | `false` | Enable torch compile for transformer (use `--enable_torch_compile` or `--enable_torch_compile true/1` to enable, `--enable_torch_compile false/0` to disable) |
 | `--enable_cache` | bool | No | `false` | Enable cache for transformer (use `--enable_cache` or `--enable_cache true/1` to enable, `--enable_cache false/0` to disable) |
+| `--cache_type` | str | No | `deepcache` | Cache type for transformer (one of `deepcache`, `teacache`, `taylorcache`) |
+| `--no_cache_block_id` | str | No | `53` | Blocks to exclude from deepcache (e.g., `0-5` or `0,1,2,3,4,5`) |
 | `--cache_start_step` | int | No | `11` | Start step to skip when using cache |
 | `--cache_end_step` | int | No | `45` | End step to skip when using cache |
 | `--total_steps` | int | No | `50` | Total inference steps |
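Putting the new cache arguments together: a cache-enabled invocation consistent with the table above might look like the sketch below. It reuses the flag spellings shown elsewhere in this diff; the `--prompt` flag name is an assumption, since the opening lines of the `torchrun` command fall outside these hunks.

```bash
# Hypothetical single-GPU run with TeaCache-style caching enabled.
# Cache flags and defaults are taken from the argument table above;
# --prompt is assumed (that part of the script is not shown in this diff).
torchrun --nproc_per_node=1 generate.py \
    --prompt 'A girl holding a paper with words "Hello, world!"' \
    --resolution 480p --aspect_ratio 16:9 --seed 1 \
    --enable_cache true --cache_type teacache \
    --cache_start_step 11 --cache_end_step 45 --total_steps 50 \
    --output_path ./outputs/output.mp4 \
    --model_path ckpts
```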
@@ -344,6 +351,32 @@ The following table provides the optimal inference configurations (CFG scale, em
 
 
 
+## 🎓 Training
+
+> 💡 Training code is coming soon. We will release the complete training pipeline in the future.
+
+HunyuanVideo-1.5 is trained using the **Muon optimizer**, which accelerates convergence and improves training stability. The Muon optimizer combines momentum-based updates with Newton-Schulz orthogonalization for efficient optimization of large-scale video generation models.
+
+### Creating a Muon Optimizer
+
+Here's how to create a Muon optimizer for your model:
+
+```python
+from hyvideo.optim.muon import get_muon_optimizer
+
+# Create Muon optimizer for your model
+optimizer = get_muon_optimizer(
+    model=your_model,
+    lr=lr,                        # Learning rate
+    weight_decay=weight_decay,    # Weight decay
+    momentum=momentum,            # Momentum coefficient
+    adamw_betas=adamw_betas,      # AdamW betas for 1D parameters
+    adamw_eps=adamw_eps           # AdamW epsilon
+)
+```
+
+> 📝 **To be continued**: More training details and the complete training pipeline will be released soon. Stay tuned!
+
 ## 🎬 More Examples
 |Features|Demo1|Demo2|
 |------|------|------|
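The Training hunk names momentum-based updates plus Newton-Schulz orthogonalization without showing the mechanics. Below is a minimal sketch of one Muon update on a single 2D weight, using the quintic Newton-Schulz coefficients from the publicly available reference implementation of Muon; the internals of `hyvideo.optim.muon` may differ, and `muon_step` is a hypothetical helper, not part of this repository.

```python
import torch

# Quintic Newton-Schulz coefficients from the public Muon reference code.
NS_COEFFS = (3.4445, -4.7750, 2.0315)

def newton_schulz_orthogonalize(G: torch.Tensor, steps: int = 5) -> torch.Tensor:
    """Approximately replace a 2D update matrix with the nearest
    semi-orthogonal matrix via an iterated quintic polynomial."""
    a, b, c = NS_COEFFS
    X = G / (G.norm() + 1e-7)  # spectral norm <= Frobenius norm <= 1
    transposed = G.size(0) > G.size(1)
    if transposed:
        X = X.T  # iterate on the wide orientation so X @ X.T stays small
    for _ in range(steps):
        A = X @ X.T
        X = a * X + (b * A + c * (A @ A)) @ X
    return X.T if transposed else X

def muon_step(p: torch.Tensor, buf: torch.Tensor, lr: float = 0.02,
              momentum: float = 0.95, weight_decay: float = 0.0) -> None:
    """One Muon update for a single 2D weight: momentum accumulation,
    orthogonalization, then a decoupled-weight-decay parameter step.
    (Reference implementations also rescale the update by a
    shape-dependent factor, omitted here for brevity.)"""
    buf.mul_(momentum).add_(p.grad)
    update = newton_schulz_orthogonalize(buf)
    p.data.mul_(1.0 - lr * weight_decay)
    p.data.add_(update, alpha=-lr)
```

This also explains the constructor signature in the hunk: 1D parameters (biases, norms) cannot be orthogonalized as matrices, so `get_muon_optimizer` routes them to AdamW, hence the `adamw_betas` and `adamw_eps` arguments.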
 
README_CN.md CHANGED
@@ -40,7 +40,9 @@ As a lightweight video generation model, HunyuanVideo-1.5 needs only 8.3B parameters to
 </p>
 
 ## 🔥🔥🔥 News
-* 🚀 Nov 24, 2025: We now support cache inference, achieving roughly 2x speedup! Pull the latest code to try it. 🔥🔥🔥🆕
+* 📚 Training code is coming soon. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
+* 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), which greatly accelerates inference! Pull the latest code to try it. 🔥🔥🔥🆕
+* 🚀 Nov 24, 2025: We now support deepcache inference.
 * 👋 Nov 20, 2025: We open-sourced the code and inference weights of HunyuanVideo-1.5
 
 ## 🎥 Demo Videos
@@ -60,6 +62,8 @@ As a lightweight video generation model, HunyuanVideo-1.5 needs only 8.3B parameters to
 
 - **Wan2GP v9.62** - [Wan2GP](https://github.com/deepbeepmeep/Wan2GP): Wan2GP is a very low-VRAM app (as little as 6 GB of VRAM for Hunyuan Video 1.5) that supports a LoRA accelerator for 8-step generation and provides a variety of video generation helper tools.
 
+- **ComfyUI-MagCache** - [ComfyUI-MagCache](https://github.com/Zehong-Ma/ComfyUI-MagCache): MagCache is a training-free caching approach that accelerates video generation by estimating fluctuating differences among model outputs across timesteps. With 20 inference steps, it achieves a 1.7x speedup for HunyuanVideo-1.5.
+
 
 ## 📑 Open-source Plan
 - HunyuanVideo-1.5 (T2V/I2V)
@@ -86,6 +90,7 @@ As a lightweight video generation model, HunyuanVideo-1.5 needs only 8.3B parameters to
 - [Command Line Arguments](#命令行参数)
 - [Optimal Inference Configurations](#最优推理配置)
 - [🧱 Model Cards](#-模型卡片)
+- [🎓 Training](#-训练)
 - [🎬 More Examples](#-更多示例)
 - [📊 Evaluation](#-性能评估)
 - [📚 Citation](#-引用)
@@ -212,20 +217,22 @@ export I2V_REWRITE_MODEL_NAME="<your_model_name>"
 
 PROMPT='A girl holding a paper with words "Hello, world!"'
 
-IMAGE_PATH=./data/reference_image.png # Optional, 'none' or <image path>
+IMAGE_PATH=none # Optional, none or <image path> to enable i2v mode
 SEED=1
 ASPECT_RATIO=16:9
 RESOLUTION=480p
 OUTPUT_PATH=./outputs/output.mp4
 
 # Configuration
+REWRITE=true # Enable prompt rewriting. Please ensure the rewrite vLLM server is deployed and configured.
 N_INFERENCE_GPU=8 # Parallel inference GPU count
 CFG_DISTILLED=true # Inference with the CFG-distilled model, 2x speedup
 SPARSE_ATTN=false # Inference with sparse attention (only 720p models are equipped with sparse attention). Please ensure flex-block-attn is installed
 SAGE_ATTN=true # Inference with SageAttention
-REWRITE=true # Enable prompt rewriting. Please ensure the rewrite vLLM server is deployed and configured.
 OVERLAP_GROUP_OFFLOADING=true # Only valid when group offloading is enabled; significantly increases CPU memory usage but speeds up inference
 ENABLE_CACHE=true # Enable feature cache during inference. Significantly speeds up inference
+CACHE_TYPE=deepcache # Supported: deepcache, teacache, taylorcache
+ENABLE_SR=true # Enable super resolution
 MODEL_PATH=ckpts # Path to pretrained model
 
 torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
@@ -234,14 +241,13 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
     --resolution $RESOLUTION \
     --aspect_ratio $ASPECT_RATIO \
     --seed $SEED \
-    --cfg_distilled $CFG_DISTILLED \
-    --sparse_attn $SPARSE_ATTN \
-    --use_sageattn $SAGE_ATTN \
-    --enable_cache $ENABLE_CACHE \
     --rewrite $REWRITE \
-    --output_path $OUTPUT_PATH \
+    --cfg_distilled $CFG_DISTILLED \
+    --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN \
+    --enable_cache $ENABLE_CACHE --cache_type $CACHE_TYPE \
     --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
-    --save_pre_sr_video \
+    --sr $ENABLE_SR --save_pre_sr_video \
+    --output_path $OUTPUT_PATH \
     --model_path $MODEL_PATH
 ```
 
@@ -282,6 +288,8 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | `--sage_blocks_range` | str | No | `0-53` | SageAttention blocks range (e.g., `0-5` or `0,1,2,3,4,5`) |
 | `--enable_torch_compile` | bool | No | `false` | Enable torch compile for transformer (use `--enable_torch_compile` or `--enable_torch_compile true/1` to enable, `--enable_torch_compile false/0` to disable) |
 | `--enable_cache` | bool | No | `false` | Enable cache for transformer (use `--enable_cache` or `--enable_cache true/1` to enable, `--enable_cache false/0` to disable) |
+| `--cache_type` | str | No | `deepcache` | Cache type for transformer (one of `deepcache`, `teacache`, `taylorcache`) |
+| `--no_cache_block_id` | str | No | `53` | Blocks to exclude from deepcache (e.g., `0-5` or `0,1,2,3,4,5`) |
 | `--cache_start_step` | int | No | `11` | Start step to skip when using cache |
 | `--cache_end_step` | int | No | `45` | End step to skip when using cache |
 | `--total_steps` | int | No | `50` | Total inference steps |
@@ -329,6 +337,32 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 
 
 
+## 🎓 Training
+
+> 💡 Training code is coming soon. We will release the complete training pipeline in the future.
+
+HunyuanVideo-1.5 is trained using the **Muon optimizer**, which accelerates convergence and improves training stability. The Muon optimizer combines momentum-based updates with Newton-Schulz orthogonalization for efficient optimization of large-scale video generation models.
+
+### Creating a Muon Optimizer
+
+Here's how to create a Muon optimizer for your model:
+
+```python
+from hyvideo.optim.muon import get_muon_optimizer
+
+# Create Muon optimizer for your model
+optimizer = get_muon_optimizer(
+    model=your_model,
+    lr=lr,                        # Learning rate
+    weight_decay=weight_decay,    # Weight decay
+    momentum=momentum,            # Momentum coefficient
+    adamw_betas=adamw_betas,      # AdamW betas for 1D parameters
+    adamw_eps=adamw_eps           # AdamW epsilon
+)
+```
+
+> 📝 **To be continued**: More training details and the complete training pipeline will be released soon. Stay tuned!
+
 ## 🎬 More Examples
 |Features|Demo1|Demo2|
 |------|------|------|
 