-
The Path Not Taken: RLVR Provably Learns Off the Principals
Paper • 2511.08567 • Published • 31 -
Reasoning with Sampling: Your Base Model is Smarter Than You Think
Paper • 2510.14901 • Published • 47 -
When Models Lie, We Learn: Multilingual Span-Level Hallucination Detection with PsiloQA
Paper • 2510.04849 • Published • 113
Collections
Collections including paper arxiv:2511.08567
-
lusxvr/nanoVLM-222M
Image-Text-to-Text • 0.2B • Updated • 529 • 98 -
Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning
Paper • 2503.09516 • Published • 36 -
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time
Paper • 2505.24863 • Published • 97 -
QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning
Paper • 2505.17667 • Published • 88
-
Controlled Decoding from Language Models
Paper • 2310.17022 • Published • 15 -
Unleashing the Power of Pre-trained Language Models for Offline Reinforcement Learning
Paper • 2310.20587 • Published • 18 -
Inpainting-Guided Policy Optimization for Diffusion Large Language Models
Paper • 2509.10396 • Published • 15 -
The Path Not Taken: RLVR Provably Learns Off the Principals
Paper • 2511.08567 • Published • 31
-
NousResearch/Hermes-4-70B-FP8
Text Generation • 71B • Updated • 300 • 25 -
NousResearch/Hermes-4-405B-FP8
Text Generation • 406B • Updated • 1.04k • 21 -
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models
Paper • 2508.21365 • Published • 29 -
BTL-UI: Blink-Think-Link Reasoning Model for GUI Agent
Paper • 2509.15566 • Published • 14
-
Diffusion Augmented Agents: A Framework for Efficient Exploration and Transfer Learning
Paper • 2407.20798 • Published • 24 -
Offline Reinforcement Learning for LLM Multi-Step Reasoning
Paper • 2412.16145 • Published • 38 -
REINFORCE++: A Simple and Efficient Approach for Aligning Large Language Models
Paper • 2501.03262 • Published • 103 -
SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution
Paper • 2502.18449 • Published • 75