arXiv:2601.06336

Future-as-Label: Scalable Supervision from Real-World Outcomes

Published on Jan 9

Abstract

Reinforcement learning with verifiable rewards is extended to temporal prediction settings, enabling language models to make probabilistic forecasts from causally masked information that are evaluated retrospectively with proper scoring rules.

AI-generated summary

Many real-world prediction problems lack labels observable at prediction time, creating a temporal gap between prediction and outcome that yields supervision only after events resolve. To address this setting, we extend reinforcement learning with verifiable rewards to temporally resolved real-world prediction, which we call Foresight Learning, and use it to train language models to make probabilistic forecasts from causally masked information, evaluated retrospectively with proper scoring rules. Supervision is derived solely from post-resolution outcomes, preserving delayed-reward semantics. On real-world forecasting benchmarks, Qwen3-32B trained with Foresight Learning improves its Brier score by 27% and halves its calibration error relative to the pretrained baseline, and it outperforms Qwen3-235B on both constructed future-event prediction tasks and the Metaculus benchmark despite a 7x parameter disadvantage.
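
To make the training signal concrete, the sketch below shows how a proper scoring rule can serve as a verifiable reward assigned after an event resolves, and how calibration error can be measured. This is a minimal illustration under assumptions: the 1 - Brier reward mapping, the equal-width-bin calibration estimate, and all function names are illustrative, not the paper's implementation.

```python
def brier_score(p: float, y: int) -> float:
    """Brier score for a binary forecast: (p - y)^2, lower is better."""
    return (p - y) ** 2

def brier_reward(p: float, y: int) -> float:
    """Illustrative reward in [0, 1]: 1 - Brier score, higher is better.

    The Brier score is a strictly proper scoring rule, so expected
    reward is maximized by reporting the true outcome probability;
    overconfident forecasts are penalized in expectation.
    """
    return 1.0 - brier_score(p, y)

def expected_calibration_error(probs, outcomes, n_bins=10):
    """Equal-width-bin ECE: per-bin |mean confidence - empirical frequency|,
    weighted by the fraction of forecasts in each bin (an assumed variant)."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(p for p, _ in b) / len(b)
        emp_freq = sum(y for _, y in b) / len(b)
        ece += (len(b) / len(probs)) * abs(avg_conf - emp_freq)
    return ece

# Forecasts are made from causally masked context (no post-cutoff
# information); rewards arrive only after the events resolve.
probs    = [0.8, 0.3, 0.55, 0.9]   # model's stated probabilities
outcomes = [1,   0,   1,    1]     # realized outcomes, observed later
rewards = [brier_reward(p, y) for p, y in zip(probs, outcomes)]
print(rewards)  # [0.96, 0.91, 0.7975, 0.99]
print(expected_calibration_error(probs, outcomes))
```

Because the Brier score is strictly proper, a forecaster maximizes expected reward only by reporting its actual belief, which is why optimizing against it can improve calibration rather than encourage overconfidence.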
