KevinNg99 committed
Commit b71eee2 · 1 Parent(s): 3dda2be

update README

Files changed (2)
  1. README.md +14 -10
  2. README_CN.md +13 -11
README.md CHANGED
@@ -57,7 +57,7 @@ HunyuanVideo-1.5 is a video generation model that delivers top-tier quality with
  </p>
 
  ## 🔥🔥🔥 News
- * 🚀 Dec 05, 2025: **New Release**: We now release the 480p I2V step-distilled model, which generates videos in 8 or 12 steps (recommended)! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate videos within 75 seconds. The step-distilled model maintains comparable quality to the original model while achieving significant speedup. For even faster generation, you can also try 4 steps (faster speed with slightly reduced quality). See [Step Distillation Comparison](./assets/step_distillation_comparison.md) for detailed quality comparisons. 🔥🔥🔥🆕
  * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
  * 🎉 **Diffusers Support**: HunyuanVideo-1.5 is now available on Hugging Face Diffusers! Check out the [Diffusers collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) for easy integration. 🔥🔥🔥🆕
  * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), achieving a significant speedup! Pull the latest code to try it. 🔥🔥🔥🆕
@@ -218,6 +218,7 @@ For users seeking to optimize prompts for other large models, it is recommended
 
  ### Inference with Source Code
 
  For prompt rewriting, we recommend using Gemini or models deployed via vLLM. This codebase currently only supports models compatible with the vLLM API. If you wish to use Gemini, you will need to implement your own interface calls.
 
  For models with a vLLM API, note that T2V (text-to-video) and I2V (image-to-video) have different recommended models and environment variables:
@@ -230,6 +231,18 @@ For models with a vLLM API, note that T2V (text-to-video) and I2V (image-to-vide
 
  Example: Generate a video (works for both T2V and I2V; set `IMAGE_PATH=none` for T2V or provide an image path for I2V)
 
  ```bash
  export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
  export T2V_REWRITE_MODEL_NAME="<your_model_name>"
@@ -274,15 +287,6 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
  --model_path $MODEL_PATH
  ```
 
- > **Tips:** If your GPU memory is > 14GB but you encounter OOM (Out of Memory) errors during generation, you can try setting the following environment variable before running:
- > ```bash
- > export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
- > ```
- >
- > **Tips:** If you have limited CPU memory and encounter OOM during inference, you can try disable overlapped group offloading by adding the following argument:
- > ```bash
- > --overlap_group_offloading false
- > ```
 
 
  ### Command Line Arguments
 
  </p>
 
  ## 🔥🔥🔥 News
+ * 🚀 Dec 05, 2025: **New Release**: We now release the [480p I2V step-distilled model](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_step_distilled), which generates videos in 8 or 12 steps (recommended)! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate videos within 75 seconds. The step-distilled model maintains comparable quality to the original model while achieving a significant speedup. See [Step Distillation Comparison](./assets/step_distillation_comparison.md) for detailed quality comparisons. For even faster generation, you can also try 4 steps (faster speed with slightly reduced quality). **To enable the step-distilled model, run `generate.py` with the `--enable_step_distill` parameter.** See [Usage](#-usage) for detailed usage instructions. 🔥🔥🔥🆕
  * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained using the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you would like to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
  * 🎉 **Diffusers Support**: HunyuanVideo-1.5 is now available on Hugging Face Diffusers! Check out the [Diffusers collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) for easy integration. 🔥🔥🔥🆕
  * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), achieving a significant speedup! Pull the latest code to try it. 🔥🔥🔥🆕
 
 
  ### Inference with Source Code
 
+
  For prompt rewriting, we recommend using Gemini or models deployed via vLLM. This codebase currently only supports models compatible with the vLLM API. If you wish to use Gemini, you will need to implement your own interface calls.
 
  For models with a vLLM API, note that T2V (text-to-video) and I2V (image-to-video) have different recommended models and environment variables:
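
The `*_REWRITE_BASE_URL` and `*_REWRITE_MODEL_NAME` variables used in the example further below point at an OpenAI-compatible vLLM endpoint. As a minimal sketch of hosting one locally (the model name, port, and `/v1` base path are illustrative placeholders, not the project's recommended settings):

```bash
# Serve a vLLM-compatible rewrite model behind the OpenAI-style API
# that this codebase expects (model name and port are placeholders).
vllm serve <your_model_name> --port 8000

# Point the rewrite settings at the locally hosted endpoint.
export T2V_REWRITE_BASE_URL="http://localhost:8000/v1"
export T2V_REWRITE_MODEL_NAME="<your_model_name>"
```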
 
 
  Example: Generate a video (works for both T2V and I2V; set `IMAGE_PATH=none` for T2V or provide an image path for I2V)
 
+ > 💡 **Tip:** For faster inference, you can enable the step-distilled model with the `--enable_step_distill` parameter. The step-distilled model (480p I2V) generates videos in 8 or 12 steps (recommended), achieving up to a 75% speedup on RTX 4090 while maintaining comparable quality (a sketch follows the example command below).
+ >
+ > **Tip:** If your GPU memory is > 14GB but you encounter OOM (Out of Memory) errors during generation, you can try setting the following environment variable before running:
+ > ```bash
+ > export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
+ > ```
+ >
+ > **Tip:** If you have limited CPU memory and encounter OOM during inference, you can try disabling overlapped group offloading by adding the following argument:
+ > ```bash
+ > --overlap_group_offloading false
+ > ```
+
  ```bash
  export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
  export T2V_REWRITE_MODEL_NAME="<your_model_name>"
 
  --model_path $MODEL_PATH
  ```
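
The release note above introduces `--enable_step_distill` as the switch for the 480p I2V step-distilled checkpoint. A minimal sketch of appending it to the same command, assuming every other `generate.py` argument stays exactly as in the example above:

```bash
# Same invocation as the example command, with the step-distilled model enabled.
# --enable_step_distill comes from the release note; all other arguments are
# unchanged from the example above and omitted here for brevity.
torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
    --model_path $MODEL_PATH \
    --enable_step_distill
```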
 
  ### Command Line Arguments
README_CN.md CHANGED
@@ -40,7 +40,7 @@ As a lightweight video generation model, HunyuanVideo-1.5 needs only 8.3B parameters to
  </p>
 
  ## 🔥🔥🔥 Latest News
- * 🚀 Dec 05, 2025: **New Model Release**: We have released the 480p I2V step-distilled model; 8 or 12 steps are recommended for generation! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate a video within 75 seconds. The step-distilled model achieves a significant speedup while maintaining quality comparable to the original model. For even faster generation, you can also try 4-step inference (faster, with slightly reduced quality). See the [Step Distillation Comparison](./assets/step_distillation_comparison.md) for detailed quality comparisons. 🔥🔥🔥🆕
  * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained with the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you wish to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
  * 🎉 **Diffusers Support**: HunyuanVideo-1.5 now supports Hugging Face Diffusers! Check out our [Diffusers collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) for easy integration. 🔥🔥🔥🆕
  * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), which greatly speeds up inference! Pull the latest code to try it. 🔥🔥🔥🆕
@@ -215,6 +215,18 @@ pip install -i https://mirrors.tencent.com/pypi/simple/ --upgrade tencentcloud-s
 
  Example: Generate a video (supports both T2V and I2V; set `IMAGE_PATH=none` for T2V or provide an image path for I2V)
 
  ```bash
  export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
  export T2V_REWRITE_MODEL_NAME="<your_model_name>"
@@ -259,16 +271,6 @@ torchrun --nproc_per_node=$N_INFERENCE_GPU generate.py \
  --model_path $MODEL_PATH
  ```
 
- > **Tips:** If your GPU memory is > 14GB but you encounter OOM (Out of Memory) errors during generation, you can try setting the following environment variable before running:
- > ```bash
- > export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
- > ```
- >
- > **Tips:** If you have limited CPU memory and encounter OOM during inference, you can try disabling overlapped group offloading by adding the following argument:
- > ```bash
- > --overlap_group_offloading false
- > ```
-
  ### Command Line Arguments
 
  | Argument | Type | Required | Default | Description |
 
  </p>
 
  ## 🔥🔥🔥 Latest News
+ * 🚀 Dec 05, 2025: **New Model Release**: We have released the [480p I2V step-distilled model](https://huggingface.co/tencent/HunyuanVideo-1.5/tree/main/transformer/480p_i2v_step_distilled); 8 or 12 steps are recommended for generation! On RTX 4090, end-to-end generation time is reduced by 75%, and a single RTX 4090 can generate a video within 75 seconds. The step-distilled model achieves a significant speedup while maintaining quality comparable to the original model. See the [Step Distillation Comparison](./assets/step_distillation_comparison.md) for detailed quality comparisons. For even faster generation, you can also try 4-step inference (faster, with slightly reduced quality). **To enable the step-distilled model, run `generate.py` with the `--enable_step_distill` parameter.** See [Usage](#-使用方法) for detailed instructions. 🔥🔥🔥🆕
  * 📚 Training code is coming soon. HunyuanVideo-1.5 is trained with the Muon optimizer, which we have open-sourced in the [Training](#-training) section. **If you wish to continue training our model or fine-tune it with LoRA, please use the Muon optimizer.**
  * 🎉 **Diffusers Support**: HunyuanVideo-1.5 now supports Hugging Face Diffusers! Check out our [Diffusers collection](https://huggingface.co/collections/hunyuanvideo-community/hunyuanvideo-15) for easy integration. 🔥🔥🔥🆕
  * 🚀 Nov 27, 2025: We now support cache inference (deepcache, teacache, taylorcache), which greatly speeds up inference! Pull the latest code to try it. 🔥🔥🔥🆕
 
 
  Example: Generate a video (supports both T2V and I2V; set `IMAGE_PATH=none` for T2V or provide an image path for I2V)
 
+ > 💡 **Tip:** For faster inference, you can enable the step-distilled model with the `--enable_step_distill` parameter. The step-distilled model (480p I2V) generates videos in 8 or 12 steps (recommended), achieving up to a 75% speedup on RTX 4090 while maintaining comparable quality.
+ >
+ > **Tip:** If your GPU memory is > 14GB but you encounter OOM (Out of Memory) errors during generation, you can try setting the following environment variable before running:
+ > ```bash
+ > export PYTORCH_CUDA_ALLOC_CONF=expandable_segments:True,max_split_size_mb:128
+ > ```
+ >
+ > **Tip:** If you have limited CPU memory and encounter OOM during inference, you can try disabling overlapped group offloading by adding the following argument:
+ > ```bash
+ > --overlap_group_offloading false
+ > ```
+
  ```bash
  export T2V_REWRITE_BASE_URL="<your_vllm_server_base_url>"
  export T2V_REWRITE_MODEL_NAME="<your_model_name>"
 
  --model_path $MODEL_PATH
  ```
 
  ### Command Line Arguments
 
  | Argument | Type | Required | Default | Description |