FastVideo/FastWan2.1-T2V-14B-Diffusers

BrianChen1129 committed on Jul 28, 2025

Commit 06c3198 (verified) · 1 parent: cfa0832

Update README.md

Files changed (1)

README.md +1 -5

README.md CHANGED

@@ -1,10 +1,6 @@
 ---
 license: apache-2.0
 ---
----
-license: apache-2.0
----
 # FastVideo FastWan2.1-T2V-14B-480P-Diffusers
 <p align="center">
   <img src="https://raw.githubusercontent.com/hao-ai-lab/FastVideo/main/assets/logo.jpg" width="200"/>
@@ -40,7 +36,7 @@ This model is jointly finetuned with [DMD](https://arxiv.org/pdf/2405.14867) and
 ### Training Infrastructure
 Training was conducted on **8 nodes with 64 H200 GPUs** in total, using a `global batch size = 64`.
-We enable `gradient checkpointing`, set `HSDP_shard_dim = 8`, 'sequence_parallel_size = 4', and use `learning rate = 1e-5`.
+We enable `gradient checkpointing`, set `HSDP_shard_dim = 8`, `sequence_parallel_size = 4`, and use `learning rate = 1e-5`.
 We set **VSA attention sparsity** to 0.9, and training runs for **3000 steps (~52 hours)**
 The detailed training example script is available [here](https://github.com/hao-ai-lab/FastVideo/blob/main/examples/distill/Wan-Syn-480P/distill_dmd_VSA_t2v_14B_480P.slurm).
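
The training settings in this hunk imply a particular split of the 64 GPUs. The following is a minimal sketch of that arithmetic, not FastVideo's own code. It assumes 8 GPUs per node, that HSDP shards parameters across `HSDP_shard_dim` GPUs and replicates them across the remaining ones, that each group of `sequence_parallel_size` GPUs processes one sample jointly, that there is no gradient accumulation, and that the ~52 hours cover only the 3000 steps. The class `ParallelLayout` and the helper `seconds_per_step` are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParallelLayout:
    """Hypothetical container for the README's parallelism settings."""
    world_size: int          # total GPUs (assumed 8 nodes x 8 GPUs)
    hsdp_shard_dim: int      # GPUs that jointly hold one copy of the parameters
    sp_size: int             # GPUs that jointly process one sample (sequence parallel)
    global_batch_size: int   # samples per optimizer step

    def hsdp_replicate_dim(self) -> int:
        # Number of full parameter replicas under the assumed HSDP layout.
        if self.world_size % self.hsdp_shard_dim:
            raise ValueError("world_size must be divisible by hsdp_shard_dim")
        return self.world_size // self.hsdp_shard_dim

    def data_parallel_groups(self) -> int:
        # Each sequence-parallel group consumes a distinct stream of samples.
        if self.world_size % self.sp_size:
            raise ValueError("world_size must be divisible by sp_size")
        return self.world_size // self.sp_size

    def samples_per_dp_group(self) -> int:
        # Samples each sequence-parallel group handles per step (no grad accumulation assumed).
        groups = self.data_parallel_groups()
        if self.global_batch_size % groups:
            raise ValueError("global_batch_size must be divisible by the number of groups")
        return self.global_batch_size // groups


def seconds_per_step(total_hours: float, steps: int) -> float:
    # Average wall-clock time per optimizer step.
    return total_hours * 3600 / steps


layout = ParallelLayout(world_size=8 * 8, hsdp_shard_dim=8, sp_size=4, global_batch_size=64)
print(layout.hsdp_replicate_dim())            # 8
print(layout.data_parallel_groups())          # 16
print(layout.samples_per_dp_group())          # 4
print(round(seconds_per_step(52, 3000), 1))   # 62.4
```

Under these assumptions, each of the 16 sequence-parallel groups handles 4 samples per step, and one step takes roughly 62 seconds on average. If the actual script uses gradient accumulation or a different mesh layout, the per-group numbers change while the totals stay the same.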