---
library_name: transformers
tags:
- multimodal
- reasoning
- sft
- rl
datasets:
- multimodal-reasoning-lab/Zebra-CoT
- ModalityDance/Omni-Bench
base_model:
- GAIR/Anole-7b-v0.1
pipeline_tag: any-to-any
---
# Omni-R1
Omni-R1 is trained with multimodal interleaved supervision. It first applies PeSFT for stable functional image generation, then PeRPO for reinforcement-learning (RL) refinement on unified multimodal tasks.
<p align="center">
<a href="https://arxiv.org/abs/2601.09536"><b>Paper</b>👁️</a> ·
<a href="https://github.com/ModalityDance/Omni-R1"><b>Code</b>🐙</a> ·
<a href="https://huggingface.co/datasets/ModalityDance/Omni-Bench"><b>Omni-Bench</b>🧪</a>
</p>
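## Usage

A minimal inference sketch. The repo id `ModalityDance/Omni-R1` is inferred from the organization in the links above, and the auto classes are an assumption; check the repository's config for the exact processor and model classes this checkpoint expects.

```python
# Minimal generation sketch. Assumptions (verify against the repo):
# - the checkpoint loads through the standard transformers auto classes
# - the repo id is "ModalityDance/Omni-R1"
import torch
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "ModalityDance/Omni-R1"  # assumed repo id
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",  # requires the `accelerate` package
)

# Text-only prompt; the processor also accepts `images=` for multimodal inputs.
prompt = "Reason step by step about the problem, producing intermediate sketches as needed."
inputs = processor(text=prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(processor.batch_decode(outputs, skip_special_tokens=True)[0])
```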
## Citation
```bibtex
@misc{cheng2026omnir1unifiedgenerativeparadigm,
      title={Omni-R1: Towards the Unified Generative Paradigm for Multimodal Reasoning},
      author={Dongjie Cheng and Yongqi Li and Zhixin Ma and Hongru Cai and Yupeng Hu and Wenjie Wang and Liqiang Nie and Wenjie Li},
      year={2026},
      eprint={2601.09536},
      archivePrefix={arXiv},
      primaryClass={cs.AI},
      url={https://arxiv.org/abs/2601.09536},
}
```