KevinNg99 committed
Commit b71eee2 · 1 Parent(s): 3dda2be

update README

Files changed (2)
  1. README.md +14 -10
  2. README_CN.md +13 -11
README.md CHANGED
@@ -57,7 +57,7 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
  </p>
 
  ## 🔥🔥🔥 News
- * 🚀 Dec 05, 2025: **New Release**: We now release the 480p I2V step-distilled model, which generates videos in 8 or 12 steps (recommended)! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate videos within 75 seconds. The step-distilled model maintains comparable quality to the original model while achieving significant speedup. For even faster generation, you can also try 4 steps (faster speed with slightly reduced quality). See [Step Distillation Comparison](./assets/step_distillation_comparison.md) for detailed quality comparisons. 🔥🔥🔥🆕
  * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
  * 🎉 **Diffusers Support**: HunyuanVideo-1.5 is now available on Hugging Face Diffusers! Check out the [Diffusers collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) for easy integration. 🔥🔥🔥🆕
  * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), achieving a significant speedup! Pull the latest code to try it. 🔥🔥🔥🆕
@@ -218,6 +218,7 @@ For users seeking to optimize prompts for other large models, it is recommended
 
  ### Inference with Source Code
 
  For prompt rewriting, we recommend using Gemini or models deployed via vLLM. This codebase currently only supports models compatible with the vLLM API. If you wish to use Gemini, you will need to implement your own interface calls.
 
  For models with a vLLM API, note that T2V (text-to-video) and I2V (image-to-video) have different recommended models and environment variables:
@@ -230,6 +231,18 @@ For models with a vLLM API, note that T2V (text-to-video) and I2V (image-to-vide
 
  Example: Generate a video (works for both T2V and I2V; set `IMAGE_PATH=none` for T2V or provide an image path for I2V)
 
  ```bash
  export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
  export T2V_REWRITE_MODEL_NAME="<your_model_name>"
@@ -274,15 +287,6 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
  --model_path $MODEL_PATH
  ```
 
- > **Tips:** If your GPU memory is > 14GB but you encounter OOM (Out of Memory) errors during generation, you can try setting the following environment variable before running:
- > ```bash
- > export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
- > ```
- >
- > **Tips:** If you have limited CPU memory and encounter OOM during inference, you can try disable overlapped group offloading by adding the following argument:
- > ```bash
- > --overlap_group_offloading false
- > ```
 
 
  ### Command Line Arguments
 
  </p>
 
  ## 🔥🔥🔥 News
+ * 🚀 Dec 05, 2025: **New Release**: We now release the [480p I2V step-distilled model](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_step_distilled), which generates videos in 8 or 12 steps (recommended)! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate videos within 75 seconds. The step-distilled model maintains comparable quality to the original model while achieving a significant speedup. See [Step Distillation Comparison](./assets/step_distillation_comparison.md) for detailed quality comparisons. For even faster generation, you can also try 4 steps (faster speed with slightly reduced quality). **To enable the step-distilled model, run `generate.py` with the `--enable_step_distill` parameter.** See [Usage](#-usage) for detailed usage instructions. 🔥🔥🔥🆕
  * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
  * 🎉 **Diffusers Support**: HunyuanVideo-1.5 is now available on Hugging Face Diffusers! Check out the [Diffusers collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) for easy integration. 🔥🔥🔥🆕
  * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), achieving a significant speedup! Pull the latest code to try it. 🔥🔥🔥🆕
 
 
  ### Inference with Source Code
 
+
  For prompt rewriting, we recommend using Gemini or models deployed via vLLM. This codebase currently only supports models compatible with the vLLM API. If you wish to use Gemini, you will need to implement your own interface calls.
 
  For models with a vLLM API, note that T2V (text-to-video) and I2V (image-to-video) have different recommended models and environment variables:
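
The `*_REWRITE_BASE_URL` and `*_REWRITE_MODEL_NAME` variables used in the example further below point at an OpenAI-compatible vLLM endpoint. As a minimal sketch of hosting one locally (the model name, port, and `/v1` base path are illustrative placeholders, not the project's recommended settings):

```bash
# Serve a vLLM-compatible rewrite model behind the OpenAI-style API
# that this codebase expects (model name and port are placeholders).
vllm serve <your_model_name> --port 8000

# Point the rewrite settings at the locally hosted endpoint.
export T2V_REWRITE_BASE_URL="http://localhost:8000/v1"
export T2V_REWRITE_MODEL_NAME="<your_model_name>"
```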
 
 
  Example: Generate a video (works for both T2V and I2V; set `IMAGE_PATH=none` for T2V or provide an image path for I2V)
 
+ > 💡 **Tip:** For faster inference, you can enable the step-distilled model with the `--enable_step_distill` parameter. The step-distilled model (480p I2V) generates videos in 8 or 12 steps (recommended), achieving up to a 75% speedup on RTX 4090 while maintaining comparable quality (a sketch follows the example command below).
+ >
+ > **Tip:** If your GPU memory is > 14GB but you encounter OOM (Out of Memory) errors during generation, you can try setting the following environment variable before running:
+ > ```bash
+ > export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
+ > ```
+ >
+ > **Tip:** If you have limited CPU memory and encounter OOM during inference, you can try disabling overlapped group offloading by adding the following argument:
+ > ```bash
+ > --overlap_group_offloading false
+ > ```
+
  ```bash
  export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
  export T2V_REWRITE_MODEL_NAME="<your_model_name>"
 
  --model_path $MODEL_PATH
  ```
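
The release note above introduces `--enable_step_distill` as the switch for the 480p I2V step-distilled checkpoint. A minimal sketch of appending it to the same command, assuming every other `generate.py` argument stays exactly as in the example above:

```bash
# Same invocation as the example command, with the step-distilled model enabled.
# --enable_step_distill comes from the release note; all other arguments are
# unchanged from the example above and omitted here for brevity.
torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
    --model_path $MODEL_PATH \
    --enable_step_distill
```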
 
  ### Command Line Arguments
README_CN.md CHANGED
@@ -40,7 +40,7 @@ As a lightweight video generation model, HunyuanVideo-1.5 needs only 8.3B parameters to
  </p>
 
  ## 🔥🔥🔥 Latest News
- * 🚀 Dec 05, 2025: **New Model Release**: We have released the 480p I2V step-distilled model; 8 or 12 steps are recommended for generation! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate a video within 75 seconds. The step-distilled model achieves a significant speedup while maintaining quality comparable to the original model. For even faster generation, you can also try 4-step inference (faster, with slightly reduced quality). See the [Step Distillation Comparison](./assets/step_distillation_comparison.md) for detailed quality comparisons. 🔥🔥🔥🆕
  * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained with the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you wish to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
  * 🎉 **Diffusers Support**: HunyuanVideo-1.5 now supports Hugging Face Diffusers! Check out our [Diffusers collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) for easy integration. 🔥🔥🔥🆕
  * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), which greatly speeds up inference! Pull the latest code to try it. 🔥🔥🔥🆕
@@ -215,6 +215,18 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
 
  Example: Generate a video (supports both T2V and I2V; set `IMAGE_PATH=none` for T2V or provide an image path for I2V)
 
  ```bash
  export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
  export T2V_REWRITE_MODEL_NAME="<your_model_name>"
@@ -259,16 +271,6 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
  --model_path $MODEL_PATH
  ```
 
- > **Tips:** If your GPU memory is > 14GB but you encounter OOM (Out of Memory) errors during generation, you can try setting the following environment variable before running:
- > ```bash
- > export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
- > ```
- >
- > **Tips:** If you have limited CPU memory and encounter OOM during inference, you can try disabling overlapped group offloading by adding the following argument:
- > ```bash
- > --overlap_group_offloading false
- > ```
-
  ### Command Line Arguments
 
  | Argument | Type | Required | Default | Description |
 
  </p>
 
  ## 🔥🔥🔥 Latest News
+ * 🚀 Dec 05, 2025: **New Model Release**: We have released the [480p I2V step-distilled model](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_step_distilled); 8 or 12 steps are recommended for generation! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate a video within 75 seconds. The step-distilled model achieves a significant speedup while maintaining quality comparable to the original model. See the [Step Distillation Comparison](./assets/step_distillation_comparison.md) for detailed quality comparisons. For even faster generation, you can also try 4-step inference (faster, with slightly reduced quality). **To enable the step-distilled model, run `generate.py` with the `--enable_step_distill` parameter.** See [Usage](#-使用方法) for detailed instructions. 🔥🔥🔥🆕
  * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained with the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you wish to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
  * 🎉 **Diffusers Support**: HunyuanVideo-1.5 now supports Hugging Face Diffusers! Check out our [Diffusers collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) for easy integration. 🔥🔥🔥🆕
  * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), which greatly speeds up inference! Pull the latest code to try it. 🔥🔥🔥🆕
 
 
  Example: Generate a video (supports both T2V and I2V; set `IMAGE_PATH=none` for T2V or provide an image path for I2V)
 
+ > 💡 **Tip:** For faster inference, you can enable the step-distilled model with the `--enable_step_distill` parameter. The step-distilled model (480p I2V) generates videos in 8 or 12 steps (recommended), achieving up to a 75% speedup on RTX 4090 while maintaining comparable quality.
+ >
+ > **Tip:** If your GPU memory is > 14GB but you encounter OOM (Out of Memory) errors during generation, you can try setting the following environment variable before running:
+ > ```bash
+ > export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
+ > ```
+ >
+ > **Tip:** If you have limited CPU memory and encounter OOM during inference, you can try disabling overlapped group offloading by adding the following argument:
+ > ```bash
+ > --overlap_group_offloading false
+ > ```
+
  ```bash
  export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
  export T2V_REWRITE_MODEL_NAME="<your_model_name>"
 
  --model_path $MODEL_PATH
  ```
 
  ### Command Line Arguments
 
  | Argument | Type | Required | Default | Description |