| | --- |
| | base_model: |
| | - deepseek-ai/Janus-Pro-7B |
| | datasets: |
| | - Franklin0/ReasonGen-R1-RL-Geneval-12k |
| | - Franklin0/ReasonGen-R1-RL-DPG-5k |
| | - Franklin0/ReasonGen-R1-RL-T2I-11k |
| | library_name: transformers |
| | license: apache-2.0 |
| | pipeline_tag: text-to-image |
| | --- |
| | |
| | # Model Card for ReasonGen-R1: Chain-of-Thought Reasoning for Autoregressive Image Generation |
| |
|
| | ReasonGen-R1 is an autoregressive image generation model incorporating chain-of-thought reasoning. |
| | Official checkpoint for the paper "[ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL](https://huggingface.co/papers/2505.24875)". |
| |
|
| | Website: https://aka.ms/reasongen |
| |
|
| | Code: https://github.com/Franklin-Zhang0/Image-RL |
| |
|
| | <!-- markdownlint-disable first-line-h1 --> |
| | <!-- markdownlint-disable html --> |
| | <!-- markdownlint-disable no-duplicate-header --> |
| |
|
| | <div align="center"> |
| | <h1> 🚀 ReasonGen-R1: <br> CoT for Autoregressive Image generation models through SFT and RL</h1> |
| |
|
| | </div> |
| |
|
| | <div align="center"> |
| |
|
| | <a href="https://aka.ms/reasongen" target="_blank"> |
| | <img alt="Homepage" src="https://img.shields.io/badge/HomePage-blue" /> |
| | </a> |
| | <a href="https://huggingface.co/collections/Franklin0/reasongen-r1-6836ed61fc4f6db543c0d368" target="_blank"> |
| | <img alt="Hugging Face" src="https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-ReasonGen%20R1-ffc107?color=ffc107&logoColor=white" /> |
| | </a> |
| | |
| | </div> |
| |
|
| | <p align="center"> |
| | <a href="#2-model-download"><b>📥 Model Download</b></a> | |
| | <a href="#3-quick-start"><b>⚡ Quick Start</b></a> | |
| | <a href="#4-acknowledgements"><b>📜 Acknowledgements</b></a> | |
| | <a href="#5-citation"><b>📖 Citation</b></a> <br> |
| | 📄 <a href="https://arxiv.org/abs/2505.24875"><b>arXiv Link</b></a> |
| | </p> |
| |
|
| | ## 1. Introduction |
| |
|
| | Although chain-of-thought (CoT) reasoning and reinforcement learning (RL) have driven breakthroughs in NLP, their integration into generative vision models remains underexplored. We introduce ReasonGen-R1, a two-stage framework that first imbues an autoregressive image generator with explicit text-based "thinking" skills via supervised fine-tuning (SFT) on a newly generated reasoning dataset of written rationales, and then refines its outputs using Group Relative Policy Optimization (GRPO). |
| | To enable the model to reason through text before generating images, we automatically generate and release a corpus of model-crafted rationales paired with visual prompts, enabling controlled planning of object layouts, styles, and scene compositions. |
| | During GRPO training, a pretrained vision–language model scores the overall visual quality of each generated image, and these scores serve as the reward signal for every policy update. |
| | Evaluations on GenEval, DPG, and the T2I benchmark demonstrate that ReasonGen-R1 consistently outperforms strong baselines and prior state-of-the-art models. We will open-source our generated reasoning dataset and training code to accelerate further advances in text-based reasoning–driven image generation. |
| | <div align="center"> |
| | <img alt="image" src="images/model_structure_white_bg.png" style="width:90%;"> |
| | <br> |
| | <img alt="image" src="images/benchmark_and_comparison_white_bg.png" style="width:90%; margin-top: 10px;"> |
| | </div> |
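
The group-relative step that gives GRPO its name can be illustrated with a short sketch. It assumes the standard formulation, in which the rewards of several images sampled for the same prompt are normalized by the group mean and standard deviation. The function name, the `eps` constant, the choice of the population standard deviation, and the reward values are illustrative assumptions, not details taken from the ReasonGen-R1 training code.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize rewards within one group of samples drawn for the same prompt.

    Assumption: advantage_i = (r_i - mean(r)) / (std(r) + eps), using the
    population standard deviation. The actual training code may differ.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical reward-model scores for four images generated from one prompt.
rewards = [0.2, 0.9, 0.5, 0.4]
advantages = group_relative_advantages(rewards)

# Images scoring above the group mean get a positive advantage.
for r, a in zip(rewards, advantages):
    print(f"reward={r:.2f} advantage={a:+.3f}")
```

Samples that score above the group average receive positive advantages and are reinforced, while below-average samples are pushed down, which is why this scheme needs no separate value network.
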
| | |
| |
|
| | ## 2. Model Download |
| | ### Hugging Face |
| |
|
| | | Model | Download | |
| | |-----------------------|-----------------------------------------------------------------------------| |
| | | ReasonGen-R1 | [🤗 Hugging Face](https://huggingface.co/Franklin0/ReasonGen-R1) | |
| | | ReasonGen-R1-SFT-Only | [🤗 Hugging Face](https://huggingface.co/Franklin0/ReasonGen-R1-SFT) | |
| |
|
| | | Dataset | Download | |
| | |-----------------------|-----------------------------------------------------------------------------| |
| | | ReasonGen-R1-Datasets | [🤗 Hugging Face](https://huggingface.co/collections/Franklin0/reasongen-r1-6836ed61fc4f6db543c0d368) | |
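
As a minimal sketch, the checkpoints listed above can also be fetched programmatically with `snapshot_download` from the `huggingface_hub` package. The helper name `download_model` and the `MODEL_REPOS` mapping are hypothetical conveniences; only the repository IDs come from the table.

```python
# Repository IDs copied from the model table in this card.
MODEL_REPOS = {
    "ReasonGen-R1": "Franklin0/ReasonGen-R1",
    "ReasonGen-R1-SFT-Only": "Franklin0/ReasonGen-R1-SFT",
}


def download_model(name, local_dir=None):
    """Download all files of the named checkpoint and return the local path.

    `download_model` is a hypothetical helper, not part of the ReasonGen-R1 repo.
    """
    # Imported lazily so this module loads even without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    # With local_dir=None the files land in the default Hugging Face cache.
    return snapshot_download(repo_id=MODEL_REPOS[name], local_dir=local_dir)
```

Calling `download_model("ReasonGen-R1")` would pull the full RL checkpoint, so expect a multi-gigabyte download for a 7B-parameter model.
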
| |
|
| |
|
| | ## 3. Quick Start |
| |
|
| | ### Installation |
| |
|
| | You can install the necessary dependencies by running the following command: |
| |
|
| | ```shell |
| | # Create a working directory and a fresh conda environment |
| | cd ~ |
| | mkdir -p project |
| | cd project |
| | conda create -n image_rl python=3.12 -y |
| | conda activate image_rl |
| | pip3 install torch==2.6.0 torchvision --index-url https://download.pytorch.org/whl/cu124 |
| | pip3 install flash-attn --no-build-isolation |
| | git clone https://github.com/Franklin-Zhang0/ReasonGen-R1.git |
| | cd ReasonGen-R1 |
| | pip install -r requirements.txt |
| | pip install -e . |
| | pip install -e ./Janus |
| | ``` |
| |
|
| | <details> |
| | <summary><h3>Evaluation Environment Installation (Optional)</h3></summary> |
| | If you want to run the evaluation code, you can install the evaluation environment by running the following commands: |
| |
|
| | ```shell |
| | # Geneval |
| | cd ~ |
| | mkdir -p project |
| | cd project |
| | git clone https://github.com/djghosh13/geneval.git |
| | cd geneval |
| | conda deactivate |
| | conda create -n geneval python=3.9 -y |
| | conda activate geneval |
| | pip install torch==1.13.1 torchvision==0.14.1 torchaudio==0.13.1 |
| | pip install mmcv-full==1.7.0 -f https://download.openmmlab.com/mmcv/dist/cu117/torch1.13/index.html |
| | pip install mmengine==0.7.3 |
| | |
| | pip install pandas |
| | pip install numpy==1.23.1 |
| | |
| | pip install open-clip-torch |
| | pip install clip-benchmark |
| | |
| | git clone https://github.com/open-mmlab/mmdetection.git |
| | cd mmdetection; git checkout 2.x |
| | pip install -v -e . |
| | |
| | cd ../ |
| | bash ./evaluation/download_models.sh "./models" |
| | ``` |
| |
|
| | ```shell |
| | # DPG |
| | cd ~ |
| | cd project |
| | git clone https://github.com/TencentQQGYLab/ELLA.git |
| | cd ELLA |
| | cp ~/project/ReasonGen-R1/benchmark/requirements-for-dpg_bench.txt . |
| | conda deactivate |
| | conda create -n dpg_test python=3.9 -y |
| | conda activate dpg_test |
| | conda install conda-forge::fairseq -y |
| | pip install -r requirements-for-dpg_bench.txt |
| | ``` |
| |
|
| | Once the evaluation environment is set up, you can use the following commands to run the evaluation: |
| | ```shell |
| | bash -i benchmark/geneval.sh |
| | bash -i benchmark/dpg_eval.sh |
| | ``` |
| | </details> |
| |
|
| | ### Inference |
| | To run inference with the ReasonGen-R1 model, you can use the following command: |
| | ```shell |
| | python ReasonGen-R1/Janus/cot_generate_inference.py |
| | ``` |
| |
|
| | ### SFT Training |
| | To train the SFT model from the Janus-Pro-7B model on the ReasonGen-R1-SFT-200k dataset, you can use the following command: |
| | ```shell |
| | bash ReasonGen-R1/examples/janus_sft.sh |
| | ``` |
| |
|
| | ### RL Training |
| | To train the RL model from the ReasonGen-R1-SFT model, you can use the following command: |
| | ```shell |
| | python ReasonGen-R1/Janus/janus_rl.py |
| | ``` |
| |
|
| |
|
| | ## 4. Acknowledgements |
| | We would like to thank <a href="https://github.com/volcengine/verl">Verl</a>, upon which our repo is built. |
| |
|
| | ## 5. Citation |
| |
|
| | ```bibtex |
| | @misc{zhang2025reasongenr1cotautoregressiveimage, |
| | title={ReasonGen-R1: CoT for Autoregressive Image generation models through SFT and RL}, |
| | author={Yu Zhang and Yunqi Li and Yifan Yang and Rui Wang and Yuqing Yang and Dai Qi and Jianmin Bao and Dongdong Chen and Chong Luo and Lili Qiu}, |
| | year={2025}, |
| | eprint={2505.24875}, |
| | archivePrefix={arXiv}, |
| | primaryClass={cs.CV}, |
| | url={https://arxiv.org/abs/2505.24875}, |
| | } |
| | ``` |