weihongliang/RC-Qwen2VL-7b

personalized_multimodal_understanding

weihongliang committed on Aug 23

Commit e257e8f · verified · 1 Parent(s): 7d54721

Update README.md

Files changed (1)

README.md +1 -0

@@ -21,6 +21,7 @@ tags:
 * **Objective**: To solve the key challenge of integrating the visual content of **specific objects** in an image with their **associated text** for comprehensive understanding.
 * **Method**: Trained with the **RCVIT(Region-level Context-aware Visual Instruction Tuning)** method and **RCMU** dataset, it uses **bounding boxes** to precisely link visual content with text.
 * **Performance & Applications**: It achieves outstanding performance on RCMU tasks and is successfully applied in advanced scenarios like **multimodal RAG** and **personalized conversation**.
+![](./qwen-exam.jpg)
 ## Refer to Qwen2-VL for the requirements: