update README

- README.md: +43 −10
- README_CN.md: +43 −9
README.md CHANGED

````diff
@@ -57,7 +57,9 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
 </p>
 
 ## 🔥🔥🔥 News
-*
+* 📚 Training code is coming soon. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
+* 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), achieving significant speedup! Pull the latest code to try it. 🔥🔥🔥🆕
+* 🚀 Nov 24, 2025: We now support deepcache inference.
 * 👋 Nov 20, 2025: We release the inference code and model weights of HunyuanVideo-1.5.
 
 
````
````diff
@@ -78,6 +80,8 @@ If you develop/use HunyuanVideo-1.5 in your projects, welcome to let us know.
 
 - **Wan2GP v9.62** - [Wan2GP](https://github.com/deepbeepmeep/Wan2GP): WanGP is a very low-VRAM app (as low as 6 GB of VRAM for HunyuanVideo-1.5) that supports a LoRA accelerator for 8-step generation and offers tools to facilitate video generation.
 
+- **ComfyUI-MagCache** - [ComfyUI-MagCache](https://github.com/Zehong-Ma/ComfyUI-MagCache): MagCache is a training-free caching approach that accelerates video generation by estimating the fluctuating differences among model outputs across timesteps. It achieves a 1.7x speedup for HunyuanVideo-1.5 with 20 inference steps.
+
 
 ## 📑 Open-source Plan
 - HunyuanVideo-1.5 (T2V/I2V)
````
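For context on how this family of training-free caches operates, below is a minimal illustrative sketch of the magnitude-based skip idea, assuming a generic per-step denoiser. All names (`denoise_with_cache`, `model`, `threshold`) are placeholders, not the extension's API; see the MagCache repository above for the actual policy, which calibrates these ratios per model.

```python
import torch

def denoise_with_cache(model, latents, timesteps, threshold=0.05):
    # Illustrative magnitude-based caching: when the step-to-step change in the
    # model output is small relative to the output itself, reuse the cached
    # residual instead of paying for a full forward pass.
    # `model(x, t)` abstracts one denoising step (network + scheduler update).
    prev_out, residual = None, None
    for t in timesteps:
        if residual is not None and residual.norm() < threshold * prev_out.norm():
            out = prev_out + residual       # cheap path: reuse cached residual
        else:
            out = model(latents, t)         # expensive path: full forward pass
            if prev_out is not None:
                residual = out - prev_out   # cache the step-to-step change
        prev_out, latents = out, out
    return latents
```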
````diff
@@ -105,6 +109,7 @@ If you develop/use HunyuanVideo-1.5 in your projects, welcome to let us know.
 - [Command Line Arguments](#command-line-arguments)
 - [Optimal Inference Configurations](#optimal-inference-configurations)
 - [🧱 Models Cards](#-models-cards)
+- [🎓 Training](#-training)
 - [🎬 More Examples](#-more-examples)
 - [📊 Evaluation](#-evaluation)
 - [📚 Citation](#-citation)
````
````diff
@@ -226,20 +231,22 @@ export I2V_REWRITE_MODEL_NAME="<your_model_name>"
 
 PROMPT='A girl holding a paper with words "Hello, world!"'
 
-IMAGE_PATH
+IMAGE_PATH=none                # Optional: none, or <image path> to enable i2v mode
 SEED=1
 ASPECT_RATIO=16:9
 RESOLUTION=480p
 OUTPUT_PATH=./outputs/output.mp4
 
 # Configuration
+REWRITE=true                   # Enable prompt rewriting. Please ensure the rewrite vLLM server is deployed and configured.
 N_INFERENCE_GPU=8              # Parallel inference GPU count
 CFG_DISTILLED=true             # Inference with the CFG-distilled model, 2x speedup
 SPARSE_ATTN=false              # Inference with sparse attention (only 720p models are equipped with sparse attention). Please ensure flex-block-attn is installed
 SAGE_ATTN=true                 # Inference with SageAttention
-REWRITE=true                   # Enable prompt rewriting. Please ensure the rewrite vLLM server is deployed and configured.
 OVERLAP_GROUP_OFFLOADING=true  # Only valid when group offloading is enabled; significantly increases CPU memory usage but speeds up inference
 ENABLE_CACHE=true              # Enable feature cache during inference. Significantly speeds up inference.
+CACHE_TYPE=deepcache           # Supported: deepcache, teacache, taylorcache
+ENABLE_SR=true                 # Enable super resolution
 MODEL_PATH=ckpts               # Path to pretrained model
 
 torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
````
````diff
@@ -248,14 +255,13 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
     --resolution $RESOLUTION \
     --aspect_ratio $ASPECT_RATIO \
     --seed $SEED \
-    --cfg_distilled $CFG_DISTILLED \
-    --sparse_attn $SPARSE_ATTN \
-    --use_sageattn $SAGE_ATTN \
-    --enable_cache $ENABLE_CACHE \
     --rewrite $REWRITE \
-    --output_path $OUTPUT_PATH \
+    --cfg_distilled $CFG_DISTILLED \
+    --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN \
+    --enable_cache $ENABLE_CACHE --cache_type $CACHE_TYPE \
     --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
-    --save_pre_sr_video \
+    --sr $ENABLE_SR --save_pre_sr_video \
+    --output_path $OUTPUT_PATH \
     --model_path $MODEL_PATH
 ```
 
````
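To make the cache flags concrete: with the defaults documented in the table below (`--cache_start_step 11`, `--cache_end_step 45`, `--total_steps 50`), the window of steps eligible to reuse cached features can be computed directly. This is an illustrative calculation only; the actual skipping policy is implemented in the repo.

```python
# Illustrative only: which denoising steps fall in the cache window, assuming
# the defaults --cache_start_step 11, --cache_end_step 45, --total_steps 50.
total_steps, start, end = 50, 11, 45
cached = [s for s in range(total_steps) if start <= s <= end]
print(f"{len(cached)}/{total_steps} steps may reuse cached features "
      f"(steps {cached[0]}..{cached[-1]})")  # 35/50 steps (steps 11..45)
```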
````diff
@@ -295,8 +301,9 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | `--dtype` | str | No | `bf16` | Data type for transformer: `bf16` (faster, lower memory) or `fp32` (better quality, slower, higher memory) |
 | `--use_sageattn` | bool | No | `false` | Enable SageAttention (use `--use_sageattn` or `--use_sageattn true/1` to enable, `--use_sageattn false/0` to disable) |
 | `--sage_blocks_range` | str | No | `0-53` | SageAttention blocks range (e.g., `0-5` or `0,1,2,3,4,5`) |
-| `--enable_torch_compile` | bool | No | `false` | Enable torch compile for transformer (use `--enable_torch_compile` or `--enable_torch_compile true/1` to enable, `--enable_torch_compile false/0` to disable) |
 | `--enable_cache` | bool | No | `false` | Enable cache for transformer (use `--enable_cache` or `--enable_cache true/1` to enable, `--enable_cache false/0` to disable) |
+| `--cache_type` | str | No | `deepcache` | Cache type for transformer (`deepcache`, `teacache`, or `taylorcache`) |
+| `--no_cache_block_id` | str | No | `53` | Blocks to exclude from deepcache (e.g., `0-5` or `0,1,2,3,4,5`) |
 | `--cache_start_step` | int | No | `11` | Start step to skip when using cache |
 | `--cache_end_step` | int | No | `45` | End step to skip when using cache |
 | `--total_steps` | int | No | `50` | Total inference steps |
````
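A side note on the conventions in this table: boolean flags accept `true/1` and `false/0` (or appear bare), and block lists come as ranges like `0-5` or comma-separated IDs. The snippet below is a small self-contained sketch of how such values are commonly parsed with `argparse`; it is illustrative, not the repository's actual parser.

```python
import argparse

def str2bool(v: str) -> bool:
    # Accepts the true/1 and false/0 spellings described in the table above.
    if v.lower() in ("true", "1", "yes"):
        return True
    if v.lower() in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError(f"expected a boolean, got {v!r}")

def parse_blocks(spec: str) -> list[int]:
    # "0-5" -> [0, 1, 2, 3, 4, 5]; "0,1,2" -> [0, 1, 2]; "53" -> [53]
    if "-" in spec:
        start, end = map(int, spec.split("-"))
        return list(range(start, end + 1))
    return [int(x) for x in spec.split(",")]

parser = argparse.ArgumentParser()
# nargs="?" with const=True lets a bare `--enable_cache` mean true.
parser.add_argument("--enable_cache", type=str2bool, nargs="?", const=True, default=False)
parser.add_argument("--cache_type", default="deepcache",
                    choices=["deepcache", "teacache", "taylorcache"])
# argparse applies `type` to string defaults, so "53" becomes [53].
parser.add_argument("--no_cache_block_id", type=parse_blocks, default="53")

args = parser.parse_args(["--enable_cache", "--no_cache_block_id", "0-5"])
print(args.enable_cache, args.no_cache_block_id)  # True [0, 1, 2, 3, 4, 5]
```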
````diff
@@ -344,6 +351,32 @@ The following table provides the optimal inference configurations
 
 
 
+## 🎓 Training
+
+> 💡 Training code is coming soon. We will release the complete training pipeline in the future.
+
+HunyuanVideo-1.5 is trained with the **Muon optimizer**, which accelerates convergence and improves training stability. Muon combines momentum-based updates with Newton-Schulz orthogonalization for efficient optimization of large-scale video generation models.
+
+### Creating a Muon Optimizer
+
+Here is how to create a Muon optimizer for your model:
+
+```python
+from hyvideo.optim.muon import get_muon_optimizer
+
+# Create a Muon optimizer for your model
+optimizer = get_muon_optimizer(
+    model=your_model,
+    lr=lr,                      # Learning rate
+    weight_decay=weight_decay,  # Weight decay
+    momentum=momentum,          # Momentum coefficient
+    adamw_betas=adamw_betas,    # AdamW betas for 1D parameters
+    adamw_eps=adamw_eps         # AdamW epsilon
+)
+```
+
+> 📝 **To be continued**: More training details and the complete training pipeline will be released soon. Stay tuned!
+
 ## 🎬 More Examples
 |Features|Demo1|Demo2|
 |------|------|------|
````
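To make the training snippet above concrete, here is a minimal end-to-end sketch of a single optimization step. Only `get_muon_optimizer` comes from this repo (and requires it on `PYTHONPATH`); the tiny model, synthetic batch, and hyperparameter values are illustrative placeholders rather than the released recipe.

```python
import torch
from hyvideo.optim.muon import get_muon_optimizer  # repo import from the snippet above

# Stand-in model: the real use case is the video transformer.
model = torch.nn.Sequential(
    torch.nn.Linear(64, 128), torch.nn.GELU(), torch.nn.Linear(128, 64)
)
optimizer = get_muon_optimizer(
    model=model,
    lr=2e-4,                  # illustrative values, not the released settings
    weight_decay=0.01,
    momentum=0.95,
    adamw_betas=(0.9, 0.95),  # AdamW path covers 1D params (biases, norms)
    adamw_eps=1e-8,
)

x = torch.randn(8, 64)        # synthetic batch
loss = torch.nn.functional.mse_loss(model(x), x)
optimizer.zero_grad()
loss.backward()
optimizer.step()              # Muon update for 2D weights, AdamW for the rest
```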
README_CN.md CHANGED

````diff
@@ -40,7 +40,9 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 </p>
 
 ## 🔥🔥🔥 最新动态
-*
+* 📚 训练代码即将发布。HunyuanVideo-1.5 使用 Muon 优化器进行训练,我们已在[训练](#-训练)部分开源。**如果您希望继续训练我们的模型,或使用 LoRA 进行微调,请使用 Muon 优化器。**
+* 🚀 Nov 27, 2025: 我们现已支持 cache 推理(deepcache, teacache, taylorcache),可极大加速推理!请 pull 最新代码体验。🔥🔥🔥🆕
+* 🚀 Nov 24, 2025: 我们现已支持 deepcache 推理。
 * 👋 Nov 20, 2025: 我们开源了 HunyuanVideo-1.5 的代码和推理权重
 
 ## 🎥 演示视频
````
````diff
@@ -60,6 +62,8 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 
 - **Wan2GP v9.62** - [Wan2GP](https://github.com/deepbeepmeep/Wan2GP): Wan2GP 是一款对显存要求非常低的应用(在 Hunyuan Video 1.5 下最低仅需 6GB 显存),支持 LoRA 加速器实现 8 步生成,并且提供多种视频生成辅助工具。
 
+- **ComfyUI-MagCache** - [ComfyUI-MagCache](https://github.com/Zehong-Ma/ComfyUI-MagCache): MagCache 是一种无需训练的缓存方法,通过估计模型输出在不同时间步之间的波动差异来加速视频生成。在 20 步推理下,可为 HunyuanVideo-1.5 实现 1.7 倍加速。
+
 
 ## 📑 开源计划
 - HunyuanVideo-1.5 (文生视频/图生视频)
````
````diff
@@ -86,6 +90,7 @@ HunyuanVideo-1.5作为一款轻量级视频生成模型,仅需83亿参数即
 - [命令行参数](#命令行参数)
 - [最优推理配置](#最优推理配置)
 - [🧱 模型卡片](#-模型卡片)
+- [🎓 训练](#-训练)
 - [🎬 更多示例](#-更多示例)
 - [📊 性能评估](#-性能评估)
 - [📚 引用](#-引用)
````
````diff
@@ -212,20 +217,22 @@ export I2V_REWRITE_MODEL_NAME="<your_model_name>"
 
 PROMPT='A girl holding a paper with words "Hello, world!"'
 
-IMAGE_PATH
+IMAGE_PATH=none                # 可选,none 或 <图像路径> 以启用 i2v 模式
 SEED=1
 ASPECT_RATIO=16:9
 RESOLUTION=480p
 OUTPUT_PATH=./outputs/output.mp4
 
 # 配置
+REWRITE=true                   # 启用提示词重写。请确保 rewrite vLLM server 已部署和配置。
 N_INFERENCE_GPU=8              # 并行推理 GPU 数量
 CFG_DISTILLED=true             # 使用 CFG 蒸馏模型进行推理,2倍加速
 SPARSE_ATTN=false              # 使用稀疏注意力进行推理(仅 720p 模型配备了稀疏注意力)。请确保 flex-block-attn 已安装
 SAGE_ATTN=true                 # 使用 SageAttention 进行推理
-REWRITE=true                   # 启用提示词重写。请确保 rewrite vLLM server 已部署和配置。
 OVERLAP_GROUP_OFFLOADING=true  # 仅在组卸载启用时有效,会显著增加 CPU 内存占用,但能够提速
 ENABLE_CACHE=true              # 启用特征缓存进行推理。显著提升推理速度
+CACHE_TYPE=deepcache           # 支持:deepcache, teacache, taylorcache
+ENABLE_SR=true                 # 启用超分辨率
 MODEL_PATH=ckpts               # 预训练模型路径
 
 torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
````
````diff
@@ -234,14 +241,13 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
     --resolution $RESOLUTION \
     --aspect_ratio $ASPECT_RATIO \
     --seed $SEED \
-    --cfg_distilled $CFG_DISTILLED \
-    --sparse_attn $SPARSE_ATTN \
-    --use_sageattn $SAGE_ATTN \
-    --enable_cache $ENABLE_CACHE \
     --rewrite $REWRITE \
-    --output_path $OUTPUT_PATH \
+    --cfg_distilled $CFG_DISTILLED \
+    --sparse_attn $SPARSE_ATTN --use_sageattn $SAGE_ATTN \
+    --enable_cache $ENABLE_CACHE --cache_type $CACHE_TYPE \
     --overlap_group_offloading $OVERLAP_GROUP_OFFLOADING \
-    --save_pre_sr_video \
+    --sr $ENABLE_SR --save_pre_sr_video \
+    --output_path $OUTPUT_PATH \
     --model_path $MODEL_PATH
 ```
 
````
````diff
@@ -282,6 +288,8 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 | `--sage_blocks_range` | str | 否 | `0-53` | SageAttention 块范围(例如:`0-5` 或 `0,1,2,3,4,5`) |
 | `--enable_torch_compile` | bool | 否 | `false` | 启用 torch compile 以优化 transformer(使用 `--enable_torch_compile` 或 `--enable_torch_compile true/1` 来启用,`--enable_torch_compile false/0` 来禁用) |
 | `--enable_cache` | bool | 否 | `false` | 启用 transformer 缓存(使用 `--enable_cache` 或 `--enable_cache true/1` 来启用,`--enable_cache false/0` 来禁用) |
+| `--cache_type` | str | 否 | `deepcache` | Transformer 的缓存类型(`deepcache`、`teacache` 或 `taylorcache`) |
+| `--no_cache_block_id` | str | 否 | `53` | 从 deepcache 中排除的块(例如:`0-5` 或 `0,1,2,3,4,5`) |
 | `--cache_start_step` | int | 否 | `11` | 使用缓存时跳过的起始步数 |
 | `--cache_end_step` | int | 否 | `45` | 使用缓存时跳过的结束步数 |
 | `--total_steps` | int | 否 | `50` | 总推理步数 |
````
````diff
@@ -329,6 +337,32 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
 
 
 
+## 🎓 训练
+
+> 💡 训练代码即将发布。我们将在未来发布完整的训练流程。
+
+HunyuanVideo-1.5 使用 **Muon 优化器**进行训练,该优化器能够加速收敛并提高训练稳定性。Muon 优化器结合了基于动量的更新和 Newton-Schulz 正交化方法,可高效优化大规模视频生成模型。
+
+### 创建 Muon 优化器
+
+以下是如何为您的模型创建 Muon 优化器:
+
+```python
+from hyvideo.optim.muon import get_muon_optimizer
+
+# 为您的模型创建 Muon 优化器
+optimizer = get_muon_optimizer(
+    model=your_model,
+    lr=lr,                      # 学习率
+    weight_decay=weight_decay,  # 权重衰减
+    momentum=momentum,          # 动量系数
+    adamw_betas=adamw_betas,    # 1D 参数的 AdamW betas
+    adamw_eps=adamw_eps         # AdamW epsilon
+)
+```
+
+> 📝 **未完待续**:更多训练细节和完整的训练流程即将发布,敬请期待!
+
 ## 🎬 更多示例
 |特性|示例1|示例2|
 |------|------|------|
````
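As a closing reference for the Newton-Schulz orthogonalization mentioned in both Training sections: the textbook cubic iteration below drives a matrix toward its nearest orthogonal factor without an explicit SVD, which is the core primitive that Muon-style optimizers apply to each 2D weight update. This is a generic sketch, not the repository's implementation (Muon uses a tuned variant applied to the momentum buffer).

```python
import torch

def newton_schulz_orthogonalize(g: torch.Tensor, steps: int = 25) -> torch.Tensor:
    # Cubic Newton-Schulz: x <- 1.5*x - 0.5*x x^T x converges to the orthogonal
    # polar factor when all singular values lie in (0, sqrt(3)).
    x = g / (g.norm() + 1e-8)            # Frobenius scaling keeps the iteration stable
    for _ in range(steps):
        x = 1.5 * x - 0.5 * x @ x.T @ x  # one Newton-Schulz step
    return x

u = newton_schulz_orthogonalize(torch.randn(32, 32))
print((u @ u.T - torch.eye(32)).norm())  # small: u is near-orthogonal
```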