DAComp: Benchmarking Data Agents across the Full Data Intelligence Lifecycle • Paper • 2512.04324 • Published 4 days ago • 128
ImpossibleBench: Measuring LLMs' Propensity of Exploiting Test Cases • Paper • 2510.20270 • Published Oct 23 • 6
EditScore: Unlocking Online RL for Image Editing via High-Fidelity Reward Modeling • Paper • 2509.23909 • Published Sep 28 • 31
The Landscape of Agentic Reinforcement Learning for LLMs: A Survey • Paper • 2509.02547 • Published Sep 2 • 225
TreePO: Bridging the Gap of Policy Optimization and Efficacy and Inference Efficiency with Heuristic Tree-based Modeling • Paper • 2508.17445 • Published Aug 24 • 80
VideoDeepResearch: Long Video Understanding With Agentic Tool Using • Paper • 2506.10821 • Published Jun 12 • 19
MegaPairs: Massive Data Synthesis For Universal Multimodal Retrieval • Paper • 2412.14475 • Published Dec 19, 2024 • 55