language:
- zh
- en
---
# VLM-FO1: Qwen2.5-VL-3B-v01

This repository contains the VLM-FO1_Qwen2.5-VL-3B-v01 model, an implementation of the [VLM-FO1](https://github.com/om-ai-lab/VLM-FO1) framework built on the Qwen2.5-VL-3B base model.

VLM-FO1 is a novel plug-and-play framework designed to bridge the gap between the high-level reasoning abilities of Vision-Language Models (VLMs) and the fine-grained visual perception that region-level tasks demand.

## Model Details

### Model Description

VLM-FO1 endows pre-trained VLMs with superior fine-grained perception without compromising their inherent high-level reasoning and general understanding capabilities. It operates as a plug-and-play module that can be integrated with any existing VLM, establishing an effective and flexible paradigm for building the next generation of perception-aware models.
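
As a concrete starting point, here is a minimal loading sketch. It assumes the checkpoint exposes a standard `transformers` remote-code entry point and uses a hypothetical hub id; the [VLM-FO1 repository](https://github.com/om-ai-lab/VLM-FO1) defines the supported inference pipeline.

```python
# Minimal loading sketch -- not the official pipeline. The hub id and the
# remote-code path are assumptions; see the VLM-FO1 GitHub repository for
# the supported entry point.
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "om-ai-lab/VLM-FO1_Qwen2.5-VL-3B-v01"  # hypothetical hub id
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    trust_remote_code=True,
    torch_dtype="auto",
    device_map="auto",
)
```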

VLM-FO1 excels at a wide range of fine-grained perception tasks, including Object Grounding, Region Generative Understanding, Visual Region Reasoning, and more.

🧩 **Plug-and-Play Modularity:** Our framework is designed as a set of enhancement modules that can be seamlessly integrated with any pre-trained VLM, preserving its original weights and capabilities.

🧠 **Hybrid Fine-grained Region Encoder (HFRE):** We introduce a novel Dual-Vision Encoder architecture that fuses semantic-rich features with perception-enhanced features, creating powerful region tokens that capture both high-level meaning and fine-grained spatial detail (see the schematic sketch after this list).

🎯 **State-of-the-Art Performance:** VLM-FO1 achieves state-of-the-art results across a diverse suite of fine-grained perception benchmarks.

✅ **Preserves General Abilities:** Our two-stage training strategy ensures that fine-grained perception is gained without causing catastrophic forgetting of the base model's powerful general visual understanding abilities (see the illustrative schedule after this list).
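
To make the HFRE idea concrete, the sketch below pools each region of interest from two feature maps (one semantic-rich, one perception-enhanced) and projects the concatenation into a single region token. All module names, dimensions, and the 1x1 RoI-Align pooling are illustrative assumptions, not the repository's actual implementation.

```python
import torch
import torch.nn as nn
from torchvision.ops import roi_align

class HybridRegionEncoderSketch(nn.Module):
    """Illustrative fusion of two vision streams into region tokens.

    Dimensions and pooling choices are assumptions for exposition only.
    """

    def __init__(self, sem_dim=1024, perc_dim=768, token_dim=2048):
        super().__init__()
        self.proj = nn.Linear(sem_dim + perc_dim, token_dim)

    def forward(self, sem_feats, perc_feats, boxes, scale):
        # sem_feats:  (B, sem_dim, H, W)  semantic-rich features
        # perc_feats: (B, perc_dim, H, W) perception-enhanced features
        #             (assumed here to share the same stride as sem_feats)
        # boxes: list of (N_i, 4) xyxy tensors in image coordinates
        sem = roi_align(sem_feats, boxes, output_size=1,
                        spatial_scale=scale, aligned=True)
        perc = roi_align(perc_feats, boxes, output_size=1,
                         spatial_scale=scale, aligned=True)
        fused = torch.cat([sem.flatten(1), perc.flatten(1)], dim=-1)
        return self.proj(fused)  # one token per region: (sum N_i, token_dim)
```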
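
And an illustrative view of the two-stage schedule, assuming the added region modules carry a recognizable parameter-name prefix (the prefix here is hypothetical):

```python
# Stage 1: train only the newly added region modules; the base VLM stays
# frozen, so its original weights (and general abilities) are untouched.
for name, param in model.named_parameters():
    param.requires_grad = name.startswith("region_")  # hypothetical prefix

# ...stage-1 training on region-level perception data...

# Stage 2: unfreeze the model and train on a mixture of region-level and
# general instruction data, guarding against catastrophic forgetting.
for param in model.parameters():
    param.requires_grad = True
```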

### Model Sources

- **Repository:** [https://github.com/om-ai-lab/VLM-FO1](https://github.com/om-ai-lab/VLM-FO1)
- **Paper:** [https://arxiv.org/pdf/2509.25916](https://arxiv.org/pdf/2509.25916)

## Citation

```bibtex
@article{liu2025vlm,
  title={VLM-FO1: Bridging the Gap Between High-Level Reasoning and Fine-Grained Perception in VLMs},
  author={Liu, Peng and Shen, Haozhan and Fang, Chunxin and Sun, Zhicheng and Liao, Jiajia and Zhao, Tiancheng},
  journal={arXiv preprint arXiv:2509.25916},
  year={2025}
}
```