internlm/CapRL-3B

yuhangzang committed on Sep 28, 2025

Commit 6d287df · verified · 1 parent: 23c8441

Update README.md

Files changed (1)

README.md +4 -0

@@ -25,6 +25,10 @@ curation pipeline to ensure the quality of the questions and answers used for th
 By employing CapRL training framework, initializing with the Qwen2.5-VL-3B model, and using a carefully
 filtered 75K QA dataset as the training set, we obtained a highly capable captioner, CapRL-3B.
+<p align="center">
+    <img src="https://Cooperx521@github.com/InternLM/CapRL/blob/main/assets/teaser.png" width="80%"/>
+<p>
 ## Key Features
 * **Remarkable visual understanding for Chart, Infographics and Document**: CapRL-3B achieves perception accuracy and visual information coverage comparable to Qwen2.5-VL-72B.
 * **Well-organized output**: The outputs of CapRL-3B are relatively well-structured, making them clear and easy to understand.