Update README.md
README.md CHANGED
@@ -4,4 +4,49 @@ language:
base_model:
- Wan-AI/Wan2.1-T2V-14B
pipeline_tag: text-to-image
---
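
# MMPL: Planning-based T2V Model (finetuned from Wan2.1-T2V-14B)

> **Overview**: **MMPL** is a planning-based text-to-video generation model obtained by fine-tuning the **Wan2.1-T2V-14B** base model on **50k private, high-quality video samples**. We **leave the original architecture unchanged**; using only high-quality planning-style instructions and supervision, MMPL markedly improves subject/background consistency, motion smoothness, and human-rated text-visual alignment.

---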
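
## 🔍 Model Introduction

- **Base model**: Wan2.1-T2V-14B
- **Fine-tuning data**: 50k private, high-quality video-text instruction samples
- **Training objective**: improve subject consistency, background consistency, motion smoothness, and human-rated controllability while keeping the original architecture unchanged

---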
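
## 🏗️ Model Architecture

MMPL **reuses, without modification,** the original network structure and inference paradigm of Wan2.1-T2V-14B:

- **Overall structure**: identical to Wan2.1 (decoder-only / stacked diffusion blocks)
- **Attention, activation, normalization, and positional encoding**: identical to the base model (compatible with accelerations such as FlashAttention2)
- **Spatiotemporal resolution and sampling schedule**: identical to the base model's configuration
- **No submodule is modified** (number of heads, channel width, FFN dimension, positional encoding, etc.); only the parameters are fine-tuned.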
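
Because only the parameters are fine-tuned, one way to sanity-check a checkpoint against this claim is to compare its parameter names and shapes with those of the base model. The following is a minimal sketch under stated assumptions: the helper `diff_architecture` and the toy parameter names are hypothetical, and each checkpoint is represented as a plain mapping from parameter name to shape (for a PyTorch checkpoint, such a mapping could be built from its state dict).

```python
from typing import Dict, List, Tuple

Shape = Tuple[int, ...]


def diff_architecture(base: Dict[str, Shape], tuned: Dict[str, Shape]) -> List[str]:
    """Return human-readable differences between two name -> shape mappings.

    An empty list means both checkpoints expose the same parameters with the
    same shapes, which is what "only the parameters are fine-tuned" implies.
    """
    problems = []
    for name in sorted(base.keys() - tuned.keys()):
        problems.append(f"missing in fine-tuned checkpoint: {name}")
    for name in sorted(tuned.keys() - base.keys()):
        problems.append(f"unexpected in fine-tuned checkpoint: {name}")
    for name in sorted(base.keys() & tuned.keys()):
        if base[name] != tuned[name]:
            problems.append(f"shape mismatch for {name}: {base[name]} vs {tuned[name]}")
    return problems


# Toy shapes for illustration only; these are not the real Wan2.1 parameters.
base_shapes = {"blocks.0.attn.q.weight": (64, 64), "blocks.0.ffn.0.weight": (256, 64)}
tuned_shapes = dict(base_shapes)  # same names and shapes, different values

# An empty list means no structural change was detected.
print(diff_architecture(base_shapes, tuned_shapes))
```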

---

## 📊 Evaluation Results

> The table below compares MMPL with common methods. **Bold** marks the best value in each column. The first five columns are VBench dimensions; the remaining three come from the human evaluation.

### VBench Evaluation & Human Evaluation

| Model | Subject Consistency | Background Consistency | Motion Smoothness | Aesthetic Quality | Imaging Quality | Text-Visual Alignment | Content Consistency | Color Shift |
|---|---:|---:|---:|---:|---:|---:|---:|---:|
| **Causal** | | | | | | | | |
| FIFO (Kim et al., 2024) | 0.956 | 0.960 | 0.949 | 0.588 | 0.603 | – | – | – |
| **Distilled Causal** | | | | | | | | |
| CausVid (Yin et al., 2025) | 0.969 | **0.980** | 0.981 | 0.606 | 0.661 | 34.7 | 33.0 | 25.0 |
| SF (Huang et al., 2025a) | 0.967 | 0.958 | 0.980 | 0.593 | **0.689** | 52.0 | 46.1 | 50.5 |
| **DF Causal** | | | | | | | | |
| SkyReels (Chen et al., 2025) | 0.956 | 0.966 | 0.991 | 0.600 | 0.581 | 47.9 | 51.4 | 51.3 |
| MAGI-1 (Teng et al., 2025) | 0.979 | 0.970 | 0.991 | 0.604 | 0.612 | 34.7 | 40.4 | 39.5 |
| **Planning** | | | | | | | | |
| **MMPL (ours)** | **0.980** | 0.968 | **0.992** | **0.628** | 0.661 | **80.0** | **79.2** | **83.1** |
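
The bolding rule (best value per column) can be checked mechanically. The sketch below copies the scores from the table above and reports the top-scoring model for each column. It assumes that a higher score is better in every column, which the README does not state explicitly, and the names `COLUMNS`, `SCORES`, and `best_per_column` are illustrative.

```python
from typing import Dict, List, Optional

COLUMNS = [
    "Subject Consistency", "Background Consistency", "Motion Smoothness",
    "Aesthetic Quality", "Imaging Quality",
    "Text-Visual Alignment", "Content Consistency", "Color Shift",
]

# Scores copied from the table; None stands for a missing entry.
SCORES: Dict[str, List[Optional[float]]] = {
    "FIFO":     [0.956, 0.960, 0.949, 0.588, 0.603, None, None, None],
    "CausVid":  [0.969, 0.980, 0.981, 0.606, 0.661, 34.7, 33.0, 25.0],
    "SF":       [0.967, 0.958, 0.980, 0.593, 0.689, 52.0, 46.1, 50.5],
    "SkyReels": [0.956, 0.966, 0.991, 0.600, 0.581, 47.9, 51.4, 51.3],
    "MAGI-1":   [0.979, 0.970, 0.991, 0.604, 0.612, 34.7, 40.4, 39.5],
    "MMPL":     [0.980, 0.968, 0.992, 0.628, 0.661, 80.0, 79.2, 83.1],
}


def best_per_column(scores: Dict[str, List[Optional[float]]]) -> Dict[str, str]:
    """Map each column name to the model with the highest reported score."""
    best = {}
    for i, column in enumerate(COLUMNS):
        # Skip models that report no value for this column.
        reported = {m: row[i] for m, row in scores.items() if row[i] is not None}
        best[column] = max(reported, key=reported.get)
    return best


for column, model in best_per_column(SCORES).items():
    print(f"{column}: {model}")
```

Under that assumption, MMPL has the highest score in six of the eight columns; CausVid leads Background Consistency and SF leads Imaging Quality.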