CoVT Checkpoint (Depth Aligned)

Checkpoint for the CoVT (Chain-of-Visual-Thought) paper: https://huggingface.co/papers/2511.19418.

Model Description

This CoVT checkpoint is based on LLaVA-v1.5-13B and is aligned with 4 depth tokens.
These task-specific tokens are integrated into the model's embedding space to enhance its 3D awareness.
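
As a rough illustration of what adding task-specific tokens to an embedding space can look like in general, the toy sketch below appends one new embedding row per token to a small vocabulary. It is not the CoVT implementation: the token names (`<depth_0>` … `<depth_3>`), the helper `extend_embeddings`, and the initialization scale are all hypothetical and chosen only for the example.

```python
import random


def extend_embeddings(vocab, table, new_tokens, seed=0):
    """Return copies of vocab/table with one new embedding row per new token.

    Toy helper (hypothetical, not part of any library): new rows are
    drawn from a small Gaussian, a common but not universal choice.
    """
    rng = random.Random(seed)
    dim = len(table[0])
    vocab = dict(vocab)                # copy so the inputs stay untouched
    table = [row[:] for row in table]
    for tok in new_tokens:
        if tok in vocab:               # skip tokens that already exist
            continue
        vocab[tok] = len(table)        # next free row index
        table.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return vocab, table


# Toy vocabulary with a 3-dimensional embedding table.
vocab = {"<s>": 0, "a": 1, "cat": 2}
table = [[0.0, 0.0, 0.0], [0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]

# Assumed placeholder names for the 4 depth tokens.
depth_tokens = [f"<depth_{i}>" for i in range(4)]
new_vocab, new_table = extend_embeddings(vocab, table, depth_tokens)

print(len(table), "->", len(new_table))
print(new_vocab["<depth_0>"], new_vocab["<depth_3>"])
```

In a real model the new rows would then be trained (aligned) rather than left at their random initial values.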

Safetensors

Model size: 13B params
Tensor types: F32, F16, BF16

Collection including Wakals/CoVT-LLaVA-13B-depth

CoVT: Chain-of-Visual-Thought (7 items): Enrich VLMs' vision-centric reasoning capabilities via Chain-of-Visual-Thought!