JinaVDR (Visual Document Retrieval) Collection Max. ~1000 images, with OCR text included • 42 items • Updated Jul 20 • 7
OpenVision: A Fully-Open, Cost-Effective Family of Advanced Vision Encoders for Multimodal Learning Paper • 2505.04601 • Published May 7 • 29
SpatialScore: Towards Unified Evaluation for Multimodal Spatial Understanding Paper • 2505.17012 • Published May 22 • 12
SpaceThinker Collection Test-Time Compute for Quantitative Spatial Reasoning using synthetic reasoning traces from 3D scene graphs • 7 items • Updated Oct 23 • 2
Cosmos-Transfer1 Collection Multimodal Conditional World Generation for World2World Transfer • 6 items • Updated 4 days ago • 29
PixMo Collection A set of vision-language datasets built by Ai2 and used to train the Molmo family of models. Read more at https://molmo.allenai.org/blog • 10 items • Updated 8 days ago • 81
LLM-Neo Collection Model hub for LLM-Neo, including Llama3.1-Neo-1B-100w and Minitron-4B-Depth-Neo-10w. • 3 items • Updated Nov 20, 2024 • 6
VLM Judge Distillation Collection Distilling the 13B SpaceLLaVA VLM-as-a-Judge into a Florence-2 model to efficiently quality-filter spatial VQA datasets like OpenSpaces • 4 items • Updated Nov 14, 2024 • 1
DepthPro Models Collection Depth Pro: Sharp Monocular Metric Depth in Less Than a Second • 4 items • Updated Aug 25 • 10
OpenSpaces VLMs Collection VLMs fine-tuned for spatial VQA using the OpenSpaces dataset. • 5 items • Updated Mar 30 • 2
SpaceVLMs Collection Features VLMs fine-tuned for enhanced spatial reasoning using a synthetic data pipeline similar to SpatialVLM. • 11 items • Updated Feb 13 • 6
SpatialVLM: Endowing Vision-Language Models with Spatial Reasoning Capabilities Paper • 2401.12168 • Published Jan 22, 2024 • 29
LEAP Hand: Low-Cost, Efficient, and Anthropomorphic Hand for Robot Learning Paper • 2309.06440 • Published Sep 12, 2023 • 11
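Listings like the one above can be enumerated programmatically through the huggingface_hub Collections API (list_collections, get_collection). Below is a minimal sketch; the owner name and collection slug are placeholders, not the real identifiers behind these entries.

```python
from huggingface_hub import get_collection, list_collections

# List a profile's public collections; "some-user" is a placeholder owner,
# not the actual account behind the collections above.
for collection in list_collections(owner="some-user", limit=5):
    print(collection.title, "->", collection.slug)

# Fetch a single collection by its slug (placeholder) and walk its items,
# which can be models, datasets, papers, or Spaces.
collection = get_collection("some-user/spacevlms-0123456789abcdef")
for item in collection.items:
    print(f"{item.item_type}: {item.item_id}")
```

Each item's item_id can then be passed to the usual loaders, e.g. datasets.load_dataset for dataset entries or transformers for model entries.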