hexuan21 committed (verified) · Commit 623b576 · Parent(s): 368fab4

Update README.md

Files changed (1): README.md (+9 −1)
README.md CHANGED

@@ -19,6 +19,12 @@ pipeline_tag: visual-question-answering
 
 
 ## Introduction
+We present VideoScore2, a multi-dimensional, interpretable, and human-aligned framework that explicitly evaluates visual quality, text-to-video alignment, and physical/common-sense consistency while producing detailed chain-of-thought rationales.
+
+Our model is trained on VideoFeedback2, a large-scale dataset containing 27,168 human-annotated videos with both scores and reasoning traces across the three dimensions, using a two-stage pipeline of supervised fine-tuning followed by reinforcement learning with Group Relative Policy Optimization (GRPO) to enhance analytical robustness.
+
+Extensive experiments demonstrate that VideoScore2 achieves superior performance, with 44.35 (+5.94) accuracy on our in-domain benchmark VideoScore-Bench-v2 and 50.37 (+4.32) average performance across four out-of-domain benchmarks (VideoGenReward-Bench, VideoPhy2, etc.).
+
 
 ## Usage
 
@@ -193,6 +199,7 @@ see [VideoScore2/training](https://github.com/TIGER-AI-Lab/VideoScore2/tree/main
 see [VideoScore2/evaluation](https://github.com/TIGER-AI-Lab/VideoScore2/tree/main/eval) for details
 
 ## Citation
+```bibtex
 @misc{he2025videoscore2thinkscoregenerative,
 title={VideoScore2: Think before You Score in Generative Video Evaluation},
 author={Xuan He and Dongfu Jiang and Ping Nie and Minghao Liu and Zhengxuan Jiang and Mingyi Su and Wentao Ma and Junru Lin and Chun Ye and Yi Lu and Keming Wu and Benjamin Schneider and Quy Duc Do and Zhuofeng Li and Yiming Jia and Yuxuan Zhang and Guo Cheng and Haozhe Wang and Wangchunshu Zhou and Qunshu Lin and Yuanxing Zhang and Ge Zhang and Wenhao Huang and Wenhu Chen},
@@ -201,4 +208,5 @@ see [VideoScore2/evaluation](https://github.com/TIGER-AI-Lab/VideoScore2/tree/ma
 archivePrefix={arXiv},
 primaryClass={cs.CV},
 url={https://arxiv.org/abs/2509.22799},
-}
+}
+```
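
The Introduction text added by this commit mentions a reinforcement-learning stage with Group Relative Policy Optimization (GRPO). As an illustrative aside only (this is not VideoScore2's training code), the group-relative advantage at the heart of GRPO — scoring each sampled response against the mean and standard deviation of its own group, with no learned value network — can be sketched in a few lines:

```python
import statistics

def grpo_advantages(rewards):
    """Group-relative advantages as used in GRPO: normalize each sampled
    response's reward by its group's mean and population std."""
    mean = statistics.mean(rewards)
    std = statistics.pstdev(rewards) or 1.0  # guard against zero std
    return [(r - mean) / std for r in rewards]

# One prompt, a group of four sampled rationales scored by a reward model;
# the resulting advantages are centered (they sum to zero).
print(grpo_advantages([1.0, 0.5, 0.0, 0.5]))
```

Responses scoring above their group's mean get positive advantages and are reinforced; below-mean responses are penalized, which is what makes the group itself act as the baseline.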