DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 5 days ago • 172
Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 6 days ago • 77
Olmo 3 Post-training Collection • All artifacts for post-training Olmo 3. Datasets follow the model that resulted from training on them. • 32 items • Updated 6 days ago • 38
DR Tulu: Reinforcement Learning with Evolving Rubrics for Deep Research Paper • 2511.19399 • Published 13 days ago • 54
nvidia/Nemotron-RL-instruction_following-structured_outputs Dataset • Updated 23 days ago • 9.44k • 705 • 17
Training Long-Context, Multi-Turn Software Engineering Agents with Reinforcement Learning Paper • 2508.03501 • Published Aug 5 • 59
ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning Paper • 2510.12693 • Published Oct 14 • 26