metadata
title: Spatial-SSRL Spatial Reasoning
emoji: π
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Spatial reasoning with vision-language models
Spatial-SSRL: Spatial Reasoning with Vision-Language Models
This demo showcases the spatial reasoning capabilities of vision-language models trained to understand 3D spatial relationships from 2D images.
Features
- 3D Location Understanding: Determine which objects are closer to or farther from the camera
- Orientation Analysis: Understand which direction objects are facing
- Relative Positioning: Answer questions about object positions relative to each other
- Step-by-step Reasoning: The model provides detailed reasoning before answering
How to Use
- Upload an image
- Ask a question about spatial relationships in the image
- The model will provide a detailed answer with reasoning
Example Questions
- "Which object is further away from the camera? A. boat B. fire hydrant"
- "Are the kid and the teddy bear facing same or similar directions?"
- "If I stand at the recreational vehicle's position facing where it is facing, is the dog in front of me or behind me?"
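The first example question above appends lettered options to the question text. A minimal sketch of building such a prompt programmatically, assuming the "A. ... B. ..." layout shown in that example is what you want to send; the helper name `format_choice_question` is hypothetical and not part of the demo's code:

```python
from string import ascii_uppercase
from typing import Sequence


def format_choice_question(question: str, options: Sequence[str]) -> str:
    """Append lettered options (A., B., ...) to a question.

    Hypothetical helper: it mirrors the layout of the example question,
    not any API exposed by the demo.
    """
    if len(options) > len(ascii_uppercase):
        raise ValueError("at most 26 options are supported")
    # Pair each option with the next capital letter: "A. boat", "B. fire hydrant", ...
    labelled = " ".join(
        f"{letter}. {option}" for letter, option in zip(ascii_uppercase, options)
    )
    return f"{question} {labelled}" if labelled else question


if __name__ == "__main__":
    print(
        format_choice_question(
            "Which object is further away from the camera?",
            ["boat", "fire hydrant"],
        )
    )
```

Open-ended questions, such as the second and third examples, can be passed to the model unchanged.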
The model is trained to provide answers in a structured format with reasoning enclosed in <think> tags and final answers in \boxed{}.
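If you post-process the model's text yourself, the structured format described above can be split into reasoning and final answer. The following is a minimal sketch, assuming the response contains at most one <think>...</think> block and that the content of \boxed{} has no nested braces; the function name `parse_response` and the sample response are hypothetical, not output from the model:

```python
import re
from typing import Optional, Tuple

# Reasoning is whatever sits between <think> and </think> (may span lines).
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)
# Final answer is the content of \boxed{...}; nested braces are not handled.
BOXED_RE = re.compile(r"\\boxed\{([^{}]*)\}")


def parse_response(text: str) -> Tuple[Optional[str], Optional[str]]:
    """Return (reasoning, answer); either is None if it is not found."""
    think = THINK_RE.search(text)
    boxed = BOXED_RE.findall(text)
    reasoning = think.group(1).strip() if think else None
    # If several boxed answers appear, take the last one as the final answer.
    answer = boxed[-1].strip() if boxed else None
    return reasoning, answer


if __name__ == "__main__":
    # Hypothetical response text, written by hand for illustration only.
    sample = (
        "<think>The fire hydrant is in the foreground; "
        "the boat is near the horizon.</think> \\boxed{A}"
    )
    reasoning, answer = parse_response(sample)
    print("Reasoning:", reasoning)
    print("Answer:", answer)
```

Returning None rather than raising keeps the caller in control when a response is truncated or does not follow the format.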