Spatial-SSRL / README.md
yuhangzang
Add Gradio Space for Spatial-SSRL spatial reasoning demo
1e5cd04

metadata
title: Spatial-SSRL Spatial Reasoning
emoji: 🌍
colorFrom: blue
colorTo: indigo
sdk: gradio
sdk_version: 5.49.1
app_file: app.py
pinned: false
license: apache-2.0
short_description: Spatial reasoning with vision-language models

🌍 Spatial-SSRL: Spatial Reasoning with Vision-Language Models

This demo showcases the spatial reasoning capabilities of vision-language models trained to understand 3D spatial relationships from 2D images.

Features

  • 3D Location Understanding: Determine which objects are closer to or farther from the camera
  • Orientation Analysis: Understand which direction objects are facing
  • Relative Positioning: Answer questions about object positions relative to each other
  • Step-by-step Reasoning: The model provides detailed reasoning before answering

How to Use

  1. Upload an image
  2. Ask a question about spatial relationships in the image
  3. The model will provide a detailed answer with reasoning

Example Questions

  • "Which object is further away from the camera? A. boat B. fire hydrant"
  • "Are the kid and the teddy bear facing same or similar directions?"
  • "If I stand at the recreational vehicle's position facing where it is facing, is the dog in front of me or behind me?"
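Multiple-choice questions like the first example list their options inline with letter labels. A minimal sketch of building such a prompt string in Python follows; the helper name `format_choice_question` is hypothetical and not part of this demo's code, and the "A. ... B. ..." layout is simply inferred from the example above.

```python
from string import ascii_uppercase


def format_choice_question(question: str, options: list[str]) -> str:
    """Append lettered options (A., B., ...) to a question.

    Hypothetical helper: the layout mirrors the first example question,
    not a format documented by the demo itself.
    """
    if len(options) > len(ascii_uppercase):
        raise ValueError("at most 26 options are supported")
    # Pair each option with the next uppercase letter: "A. boat B. fire hydrant"
    lettered = " ".join(
        f"{letter}. {option}" for letter, option in zip(ascii_uppercase, options)
    )
    question = question.strip()
    return f"{question} {lettered}" if lettered else question


prompt = format_choice_question(
    "Which object is further away from the camera?",
    ["boat", "fire hydrant"],
)
print(prompt)
```

The resulting string can be pasted into the question box as-is; open-ended questions (like the second and third examples) need no options at all.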

The model is trained to answer in a structured format: its reasoning is enclosed in <think> tags, and the final answer is given in \boxed{}.
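If you want to post-process the model's text output yourself, the structured format can be split into reasoning and answer. The sketch below is not taken from this Space's `app.py`; it assumes the reasoning is closed with a `</think>` tag and that the last `\boxed{...}` in the text holds the final answer. The function names `extract_boxed` and `parse_response` are illustrative.

```python
import re
from typing import Optional


def extract_boxed(text: str) -> Optional[str]:
    """Return the contents of the last \\boxed{...}, or None if absent/unbalanced."""
    marker = "\\boxed{"
    start = text.rfind(marker)
    if start == -1:
        return None
    depth = 1
    chars = []
    # Walk forward, tracking brace depth so nested braces are kept intact.
    for ch in text[start + len(marker):]:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return "".join(chars).strip()
        chars.append(ch)
    return None  # closing brace never found


def parse_response(text: str) -> dict:
    """Split a response into its reasoning and final answer.

    Assumes (not confirmed by the demo) that reasoning is wrapped as
    <think>...</think>; either field is None when it cannot be found.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    reasoning = match.group(1).strip() if match else None
    return {"reasoning": reasoning, "answer": extract_boxed(text)}


sample = "<think>The hydrant appears larger and lower in the frame.</think>\n\\boxed{A}"
result = parse_response(sample)
print(result["reasoning"])
print(result["answer"])
```

Returning `None` for a missing field, rather than raising, keeps the parser usable when a response is truncated or omits one of the two parts.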