Model Zoo
Pretraining
For $\text{InternVideo2}_{s2}$, we load the $\text{InternVideo2}_{s1}$ checkpoints and further pretrain them on multi-modality datasets.
For $\text{InternVideo2}_{clip}$, we load the $\text{InternVideo2}_{s2}$ checkpoints.
| Model | Setting | Checkpoint | Pretraining Script |
|---|---|---|---|
| $\text{InternVideo2}_{s2}$-1B | IV-25.5M | :hugs: HF link | script |
| $\text{InternVideo2}_{clip}$-1B | IV-25.5M | TBD | script |
| $\text{InternVideo2}_{s2}$-6B | IV-400M | TBD | script |
| $\text{InternVideo2}_{clip}$-6B | IV-400M | TBD | script |
Zero-shot Evaluation
Zero-Shot Video-Text Retrieval
| Model | Dataset | T2V | V2T | Evaluation Script |
|---|---|---|---|---|
| $\text{InternVideo2}_{s2}$-1B | MSRVTT | 51.9 | 50.9 | script |
| | LSMDC | 32.0 | 27.3 | script |
| | DiDeMo | 57.0 | 54.3 | script |
| | MSVD | 58.1 | 83.3 | script |
| | ANet | 60.4 | 54.8 | script |
| | VATEX | 70.4 | 85.4 | script |
| $\text{InternVideo2}_{s2}$-6B | MSRVTT | 55.9 | 53.7 | TBD |
| | LSMDC | 33.8 | 30.1 | TBD |
| | DiDeMo | 57.9 | 57.1 | TBD |
| | MSVD | 59.3 | 83.1 | TBD |
| | ANet | 63.2 | 56.5 | TBD |
| | VATEX | 71.5 | 85.3 | TBD |

| Model | Dataset | T2V | V2T | Evaluation Script |
|---|---|---|---|---|
| $\text{InternVideo2}_{clip}$-1B | MSRVTT | 50.0 | 48.4 | script |
| | LSMDC | 26.4 | 23.1 | script |
| | DiDeMo | 47.8 | 46.4 | script |
| | ANet | 49.4 | 46.2 | script |
| | VATEX_en | 63.5 | 81.2 | script |
| | VATEX_ch | 54.9 | 76.4 | script |
| $\text{InternVideo2}_{clip}$-6B | MSRVTT | 50.9 | 50.6 | script |
| | LSMDC | 29.4 | 26.3 | script |
| | DiDeMo | 50.5 | 46.8 | script |
| | ANet | 50.2 | 47.5 | script |
| | VATEX_en | 64.1 | 82.6 | script |
| | VATEX_ch | 54.6 | 76.9 | script |
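
The tables above do not name the metric behind the T2V (text-to-video) and V2T (video-to-text) columns. As an illustration only, the sketch below assumes they are Recall@1 percentages and shows how such numbers could be derived from a text-by-video similarity matrix. The function names, the toy matrix, and the one-caption-per-video pairing are all assumptions made for this example; they are not taken from the evaluation scripts.

```python
def recall_at_k(sim, k=1):
    """Percentage of queries (rows) whose paired item (same index)
    appears among the k highest-scoring candidates (columns)."""
    hits = 0
    for i, row in enumerate(sim):
        # Rank candidate indices by descending similarity.
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        if i in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(sim)


def transpose(sim):
    """Swap the query and candidate roles of a similarity matrix."""
    return [list(col) for col in zip(*sim)]


# Toy similarity matrix (hypothetical values): sim[t][v] scores caption t
# against video v, and caption i is assumed to describe video i.
sim = [
    [0.9, 0.1, 0.3],
    [0.2, 0.4, 0.8],
    [0.1, 0.3, 0.9],
]

t2v_r1 = recall_at_k(sim, k=1)             # text queries, video candidates
v2t_r1 = recall_at_k(transpose(sim), k=1)  # video queries, text candidates
print(f"T2V R@1 = {t2v_r1:.1f}, V2T R@1 = {v2t_r1:.1f}")
```

Datasets that pair several captions with each video would need a many-to-one ground-truth mapping instead of the index match used here.
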
Zero-Shot Action Recognition
| Model | Dataset | top-1 | AVG | Script |
|---|---|---|---|---|
| $\text{InternVideo2}_{clip}$-1B | K400 | 73.1 | 82.4 | script |
| | K600 | 72.8 | 81.8 | script |
| | K700 | 64.9 | 75.2 | script |
| | UCF101 | 88.8 | - | script |
| | HMDB51 | 53.9 | - | script |
| | MiT | 31.6 | - | script |
| | SSv2-MC | 61.5 | - | script |
| $\text{InternVideo2}_{clip}$-6B | K400 | 72.7 | 82.2 | script |
| | K600 | 71.7 | 81.2 | script |
| | K700 | 64.2 | 75.2 | script |
| | UCF101 | 89.5 | - | script |
| | HMDB51 | 56.7 | - | script |
| | MiT | 32.9 | - | script |
| | SSv2-MC | 63.5 | - | script |
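
For readers unfamiliar with the columns above, here is a minimal sketch of how a top-1 accuracy could be computed from per-video class scores, such as similarities between a video embedding and the text embeddings of the class names. The table does not define AVG; the sketch assumes it is the mean of top-1 and top-k accuracy (top-5 is the common choice on Kinetics). The function name and the toy scores are hypothetical, and with only three toy classes the example uses k=2 in place of 5.

```python
def topk_accuracy(scores, labels, k=1):
    """Percentage of samples whose true label is among the k top-scoring classes."""
    hits = 0
    for row, label in zip(scores, labels):
        # Rank class indices by descending score.
        ranked = sorted(range(len(row)), key=lambda c: row[c], reverse=True)
        if label in ranked[:k]:
            hits += 1
    return 100.0 * hits / len(labels)


# Toy scores (hypothetical): scores[i][c] is the score of class c for video i.
scores = [
    [0.7, 0.2, 0.1],  # true class 0: correct at top-1
    [0.3, 0.5, 0.2],  # true class 0: missed at top-1, found at top-2
    [0.1, 0.2, 0.7],  # true class 2: correct at top-1
    [0.6, 0.3, 0.1],  # true class 2: missed at top-1 and top-2
]
labels = [0, 0, 2, 2]

top1 = topk_accuracy(scores, labels, k=1)
top2 = topk_accuracy(scores, labels, k=2)
avg = (top1 + top2) / 2  # assumed meaning of AVG, with k=2 standing in for 5
print(f"top-1 = {top1:.1f}, top-2 = {top2:.1f}, AVG = {avg:.1f}")
```
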