CoVT: Chain-of-Visual-Thought
Enrich VLMs’ vision-centric reasoning capabilities via Chain-of-Visual-Thought!
Checkpoint for the paper at https://huggingface.co/papers/2511.19418.
This CoVT checkpoint is aligned with 4 Depth tokens, based on LLaVA-v1.5-13B.
These task-specific tokens are integrated into the model’s embedding space to enhance 3D-awareness.
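
As a rough illustration of what integrating task-specific tokens into an embedding space can mean, the sketch below appends four new rows to a toy embedding table, one per added token. It is not the CoVT implementation: the token names (`<depth_0>` to `<depth_3>`), the helper `add_task_tokens`, and the mean-plus-noise initialisation are all assumptions made for this example, and a real checkpoint would do this inside the model's tokenizer and embedding layer.

```python
import random


def add_task_tokens(vocab, embeddings, new_tokens, seed=0):
    """Return copies of vocab and embeddings extended with new_tokens.

    Hypothetical helper: each new row starts at the mean of the existing
    rows plus small Gaussian noise (an assumed initialisation, not CoVT's).
    """
    rng = random.Random(seed)
    dim = len(embeddings[0])
    # Column-wise mean of the existing embedding rows.
    mean_row = [
        sum(row[j] for row in embeddings) / len(embeddings) for j in range(dim)
    ]
    # Work on copies so the caller's table is left untouched.
    vocab = dict(vocab)
    embeddings = [list(row) for row in embeddings]
    for token in new_tokens:
        if token in vocab:
            raise ValueError(f"token already in vocabulary: {token}")
        vocab[token] = len(embeddings)  # next free row index
        embeddings.append([m + rng.gauss(0.0, 0.01) for m in mean_row])
    return vocab, embeddings


# Toy vocabulary and 4-dimensional embedding table.
vocab = {"<pad>": 0, "hello": 1, "world": 2}
embeddings = [
    [0.0, 0.0, 0.0, 0.0],
    [0.1, 0.2, 0.3, 0.4],
    [0.4, 0.3, 0.2, 0.1],
]

# Four hypothetical depth tokens, matching the count stated in this card.
depth_tokens = [f"<depth_{i}>" for i in range(4)]
new_vocab, new_embeddings = add_task_tokens(vocab, embeddings, depth_tokens)
```

After the call, the new tokens occupy the last four rows of the table, and those rows can then be trained like any other embedding.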