Update README.md
README.md
CHANGED
@@ -1,10 +1,6 @@
 ---
 license: apache-2.0
 ---
----
-license: apache-2.0
----
-
 # FastVideo FastWan2.1-T2V-14B-480P-Diffusers
 <p align="center">
 <img src="https://raw.githubusercontent.com/hao-ai-lab/FastVideo/main/assets/logo.jpg" width="200"/>
@@ -40,7 +36,7 @@ This model is jointly finetuned with [DMD](https://arxiv.org/pdf/2405.14867) and
 ### Training Infrastructure
 
 Training was conducted on **8 nodes with 64 H200 GPUs** in total, using a `global batch size = 64`.
-We enable `gradient checkpointing`, set `HSDP_shard_dim = 8`,
+We enable `gradient checkpointing`, set `HSDP_shard_dim = 8`, `sequence_parallel_size = 4`, and use `learning rate = 1e-5`.
 We set **VSA attention sparsity** to 0.9, and training runs for **3000 steps (~52 hours)**
 The detailed training example script is available [here](https://github.com/hao-ai-lab/FastVideo/blob/main/examples/distill/Wan-Syn-480P/distill_dmd_VSA_t2v_14B_480P.slurm).
 