Uplifting Table Tennis: A Robust, Real-World Application for 3D Trajectory and Spin Estimation
Abstract
A two-stage pipeline with a front-end perception task and a back-end 2D-to-3D uplifting task is proposed for accurate 3D motion analysis of a table tennis ball using monocular video.
Obtaining the precise 3D motion of a table tennis ball from standard monocular videos is a challenging problem, as existing methods trained on synthetic data struggle to generalize to the noisy, imperfect ball and table detections of the real world. This is primarily due to the inherent lack of 3D ground truth trajectories and spin annotations for real-world video. To overcome this, we propose a novel two-stage pipeline that divides the problem into a front-end perception task and a back-end 2D-to-3D uplifting task. This separation allows us to train the front-end components with abundant 2D supervision from our newly created TTHQ dataset, while the back-end uplifting network is trained exclusively on physically-correct synthetic data. We specifically re-engineer the uplifting model to be robust to common real-world artifacts, such as missing detections and varying frame rates. By integrating a ball detector and a table keypoint detector, our approach transforms a proof-of-concept uplifting method into a practical, robust, and high-performing end-to-end application for 3D table tennis trajectory and spin analysis.
Community
The authors present a robust end-to-end pipeline for analyzing table tennis matches, which can, for the first time, derive the precise 3D trajectory and spin of the ball directly from common TV broadcasts. The solution overcomes the central problem of missing 3D ground truth using a two-stage framework:
- Front-End (2D Perception): Segformer++ detectors localize the 2D ball position and the table keypoints in every frame. This module is trained on the new, high-resolution TTHQ dataset.
- Back-End (2D-to-3D Uplifting): A specialized transformer network is trained exclusively with synthetic, physically correct 3D data to lift the 2D detections into the 3D world.
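The two-stage separation can be sketched as follows. This is a minimal illustration of the data flow only: the class and function names (`Frame2D`, `run_pipeline`, the detector/uplifting callables) are hypothetical stand-ins, not the authors' actual components.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple

@dataclass
class Frame2D:
    t: float                              # frame timestamp in seconds
    ball: Optional[Tuple[float, float]]   # 2D ball center in pixels, None if missed
    table: List[Tuple[float, float]]      # 2D table keypoints in pixels

def run_pipeline(video_frames, ball_det: Callable, table_det: Callable,
                 uplift_net: Callable):
    """Front-end: per-frame 2D perception (trainable on real 2D labels).
    Back-end: sequence-level 2D-to-3D uplifting (trainable on synthetic 3D data)."""
    seq = [Frame2D(t=f.t, ball=ball_det(f), table=table_det(f))
           for f in video_frames]
    # Missing detections remain as None; the uplifting network must tolerate such gaps.
    return uplift_net(seq)   # -> 3D trajectory and spin estimate
```

The point of the split is that only the front-end ever sees real images, so the back-end never needs real 3D ground truth.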
Thanks to architectural adaptations, such as a time-proportional positional embedding, the pipeline is robust to real-world problems like variable frame rates and faulty detections.
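The idea behind a time-proportional positional embedding can be sketched as follows: instead of indexing sinusoids by frame number, they are evaluated at each frame's timestamp in seconds, so videos of different frame rates spanning the same wall-clock interval receive comparable encodings. The function name, dimension, and period range below are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def time_proportional_pos_embedding(timestamps, dim=64, max_period=10.0):
    """Sinusoidal positional embedding driven by real time (seconds),
    not frame index. Names and the period range are illustrative.

    timestamps: 1-D array of frame times in seconds.
    Returns an array of shape (len(timestamps), dim).
    """
    timestamps = np.asarray(timestamps, dtype=np.float64)
    half = dim // 2
    # Geometrically spaced frequencies with periods from 1 s up to max_period.
    freqs = (2.0 * np.pi) / (max_period ** (np.arange(half) / half))
    angles = timestamps[:, None] * freqs[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

# Key property: embeddings depend on elapsed time, not on frame count.
pe_50fps = time_proportional_pos_embedding(np.arange(10) / 50.0)  # 10 frames @ 50 fps
pe_25fps = time_proportional_pos_embedding(np.arange(5) / 25.0)   # 5 frames @ 25 fps
# Frame 2k at 50 fps and frame k at 25 fps share a timestamp, hence an embedding.
assert np.allclose(pe_50fps[::2], pe_25fps)
```

A frame-index embedding would assign different codes to these two sequences, which is exactly the frame-rate sensitivity this design avoids.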
This work provides a practical, ready-to-use tool for detailed technique and performance analysis in table tennis. More details are available at https://kiedani.github.io/WACV2026/index.html
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- RacketVision: A Multiple Racket Sports Benchmark for Unified Ball and Racket Analysis (2025)
- C4D: 4D Made from 3D through Dual Correspondences (2025)
- SkelSplat: Robust Multi-view 3D Human Pose Estimation with Differentiable Gaussian Rendering (2025)
- WorldMirror: Universal 3D World Reconstruction with Any-Prior Prompting (2025)
- Ego-Exo 3D Hand Tracking in the Wild with a Mobile Multi-Camera Rig (2025)
- Visual Odometry with Transformers (2025)
- 3D Ground Truth Reconstruction from Multi-Camera Annotations Using UKF (2025)