Reinforcement Learning Foundations for Deep Research Systems: A Survey Paper • 2509.06733 • Published Sep 8 • 32
Understanding R1-Zero-Like Training: A Critical Perspective Paper • 2503.20783 • Published Mar 26 • 57