The Best of N Worlds: Aligning Reinforcement Learning with Best-of-N Sampling via max@k Optimisation Paper • 2510.23393 • Published Oct 27 • 20
The Complexity Trap: Simple Observation Masking Is as Efficient as LLM Summarization for Agent Context Management Paper • 2508.21433 • Published Aug 29 • 7
🦫 PIPer Collection All the resources for our paper "PIPer: On-Device Environment Setup via Online Reinforcement Learning"! • 9 items • Updated Oct 1 • 3
PIPer: On-Device Environment Setup via Online Reinforcement Learning Paper • 2509.25455 • Published Sep 29 • 37
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30 • 50
ImageReFL: Balancing Quality and Diversity in Human-Aligned Diffusion Models Paper • 2505.22569 • Published May 28 • 56
Absolute Zero: Reinforced Self-play Reasoning with Zero Data Paper • 2505.03335 • Published May 6 • 188
SWE-rebench: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents Paper • 2505.20411 • Published May 26 • 91
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28 • 131
Reinforcement Learning for Reasoning in Large Language Models with One Training Example Paper • 2504.20571 • Published Apr 29 • 98
MLGym: A New Framework and Benchmark for Advancing AI Research Agents Paper • 2502.14499 • Published Feb 20 • 192
DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence Paper • 2406.11931 • Published Jun 17, 2024 • 67
Long Code Arena: a Set of Benchmarks for Long-Context Code Models Paper • 2406.11612 • Published Jun 17, 2024 • 25
XLand-100B: A Large-Scale Multi-Task Dataset for In-Context Reinforcement Learning Paper • 2406.08973 • Published Jun 13, 2024 • 89
Linear Transformers with Learnable Kernel Functions are Better In-Context Models Paper • 2402.10644 • Published Feb 16, 2024 • 81