Uplifting Table Tennis: A Robust, Real-World Application for 3D Trajectory and Spin Estimation
Abstract
A two-stage pipeline with a front-end perception task and a back-end 2D-to-3D uplifting task is proposed for accurate 3D motion analysis of a table tennis ball using monocular video.
Obtaining the precise 3D motion of a table tennis ball from standard monocular videos is a challenging problem, as existing methods trained on synthetic data struggle to generalize to the noisy, imperfect ball and table detections of the real world. This is primarily due to the inherent lack of 3D ground truth trajectories and spin annotations for real-world video. To overcome this, we propose a novel two-stage pipeline that divides the problem into a front-end perception task and a back-end 2D-to-3D uplifting task. This separation allows us to train the front-end components with abundant 2D supervision from our newly created TTHQ dataset, while the back-end uplifting network is trained exclusively on physically-correct synthetic data. We specifically re-engineer the uplifting model to be robust to common real-world artifacts, such as missing detections and varying frame rates. By integrating a ball detector and a table keypoint detector, our approach transforms a proof-of-concept uplifting method into a practical, robust, and high-performing end-to-end application for 3D table tennis trajectory and spin analysis.
Community
The authors present a robust end-to-end pipeline for analyzing table tennis matches, which can, for the first time, derive the precise 3D trajectory and spin of the ball directly from common TV broadcasts. The solution overcomes the central problem of missing 3D ground truth using a two-stage framework:
- Front-End (2D Perception): Segformer++ detectors localize the 2D ball position and the table keypoints in every frame. This module is trained on the new, high-resolution TTHQ dataset.
- Back-End (2D-to-3D Uplifting): A specialized transformer network is trained exclusively with synthetic, physically correct 3D data to lift the 2D detections into the 3D world.
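The two-stage separation can be sketched as follows. This is a minimal illustration of the data flow only: the class and function names (`Frame2D`, `run_pipeline`, the detector/uplifting callables) are hypothetical stand-ins, not the authors' actual components.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Frame2D:
    t: float                              # frame timestamp in seconds
    ball: Optional[Tuple[float, float]]   # 2D ball center in pixels, None if missed
    table: List[Tuple[float, float]]      # 2D table keypoints in pixels

def run_pipeline(video_frames, ball_det: Callable, table_det: Callable,
                 uplift_net: Callable):
    """Front-end: per-frame 2D perception (trainable on real 2D labels).
    Back-end: sequence-level 2D-to-3D uplifting (trainable on synthetic 3D data)."""
    seq = [Frame2D(t=f.t, ball=ball_det(f), table=table_det(f))
           for f in video_frames]
    # Missing detections remain as None; the uplifting network must tolerate such gaps.
    return uplift_net(seq)   # -> 3D trajectory and spin estimate
```

The point of the split is that only the front-end ever sees real images, so the back-end never needs real 3D ground truth.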
Thanks to architectural adaptations, such as a time-proportional positional embedding, the pipeline is robust to real-world problems like variable frame rates and faulty detections.
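The idea behind a time-proportional positional embedding can be sketched as follows: instead of indexing sinusoids by frame number, they are evaluated at each frame's timestamp in seconds, so videos of different frame rates spanning the same wall-clock interval receive comparable encodings. The function name, dimension, and period range below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def time_proportional_pos_embedding(timestamps, dim=64, max_period=10.0):
    """Sinusoidal positional embedding driven by real time (seconds),
    not frame index. Names and the period range are illustrative.

    timestamps: 1-D array of frame times in seconds.
    Returns an array of shape (len(timestamps), dim).
    """
    timestamps = np.asarray(timestamps, dtype=np.float64)
    half = dim // 2
    # Geometrically spaced frequencies with periods from 1 s up to max_period.
    freqs = (2.0 * np.pi) / (max_period ** (np.arange(half) / half))
    angles = timestamps[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Key property: embeddings depend on elapsed time, not on frame count.
pe_50fps = time_proportional_pos_embedding(np.arange(10) / 50.0)  # 10 frames @ 50 fps
pe_25fps = time_proportional_pos_embedding(np.arange(5) / 25.0)   # 5 frames @ 25 fps
# Frame 2k at 50 fps and frame k at 25 fps share a timestamp, hence an embedding.
assert np.allclose(pe_50fps[::2], pe_25fps)
```

A frame-index embedding would assign different codes to these two sequences, which is exactly the frame-rate sensitivity this design avoids.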
This work provides a practical, ready-to-use tool for detailed technique and performance analysis in table tennis. More details are available at https://kiedani.github.io/WACV2026/index.html
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis (2025)
- C4D: 4D Made from 3D through Dual Correspondences (2025)
- SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering (2025)
- WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting (2025)
- Ego-Exo 3D Hand Tracking in the Wild with a Mobile Multi-Camera Rig (2025)
- Visual Odometry with Transformers (2025)
- 3D Ground Truth Reconstruction from Multi-Camera Annotations Using UKF (2025)