---
license: other
tags:
- text-to-video
---
Hunyuan1.5 uses attention masks with variable-length sequences. For best performance, we recommend using an attention backend that handles padding efficiently.
We recommend installing [kernels](https://github.com/huggingface/kernels) (`pip install kernels`) to access prebuilt attention kernels.
You can check our [documentation](https://huggingface.co/docs/diffusers/main/en/optimization/attention_backends) to learn more about all the different attention backends we support.
```py
import torch

from diffusers import HunyuanVideo15Pipeline, attention_backend
from diffusers.utils import export_to_video

dtype = torch.bfloat16
device = "cuda:0"

pipe = HunyuanVideo15Pipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v", torch_dtype=dtype
)
# Offload idle submodules to CPU to reduce VRAM usage
pipe.enable_model_cpu_offload()
# Decode the video in tiles to lower peak memory during VAE decoding
pipe.vae.enable_tiling()

prompt = "A cat walks on the grass, realistic style."  # example prompt
seed = 42  # example seed
generator = torch.Generator(device=device).manual_seed(seed)

with attention_backend("_flash_3_hub"):  # or `"flash_hub"` if you are not on H100/H800
    video = pipe(
        prompt=prompt,
        generator=generator,
        num_frames=121,
        num_inference_steps=50,
    ).frames[0]

export_to_video(video, "output.mp4", fps=24)
```
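If you would rather pin the backend once instead of scoping it with the context manager, the [attention backends documentation](https://huggingface.co/docs/diffusers/main/en/optimization/attention_backends) also describes a `set_attention_backend` method on the model. A minimal sketch, reusing `pipe`, `prompt`, and `generator` from above:

```py
# Set the backend on the transformer once; it stays active for subsequent calls.
# Accepts the same backend names as the context manager ("_flash_3_hub", "flash_hub", ...).
pipe.transformer.set_attention_backend("flash_hub")

video = pipe(
    prompt=prompt,
    generator=generator,
    num_frames=121,
    num_inference_steps=50,
).frames[0]
```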
To run inference with the default attention backend:
```py
import torch

from diffusers import HunyuanVideo15Pipeline
from diffusers.utils import export_to_video

dtype = torch.bfloat16
device = "cuda:0"

pipe = HunyuanVideo15Pipeline.from_pretrained(
    "hunyuanvideo-community/HunyuanVideo-1.5-Diffusers-720p_t2v", torch_dtype=dtype
)
# Offload idle submodules to CPU to reduce VRAM usage
pipe.enable_model_cpu_offload()
# Decode the video in tiles to lower peak memory during VAE decoding
pipe.vae.enable_tiling()

prompt = "A cat walks on the grass, realistic style."  # example prompt
seed = 42  # example seed
generator = torch.Generator(device=device).manual_seed(seed)

video = pipe(
    prompt=prompt,
    generator=generator,
    num_frames=121,
    num_inference_steps=50,
).frames[0]

export_to_video(video, "output.mp4", fps=24)
```