Contrastive Decoding Improves Reasoning in Large Language Models Paper • 2309.09117 • Published Sep 17, 2023 • 39
Prometheus: Inducing Fine-grained Evaluation Capability in Language Models Paper • 2310.08491 • Published Oct 12, 2023 • 55
Language Models are Hidden Reasoners: Unlocking Latent Reasoning Capabilities via Self-Rewarding Paper • 2411.04282 • Published Nov 6, 2024 • 37
Insight-V: Exploring Long-Chain Visual Reasoning with Multimodal Large Language Models Paper • 2411.14432 • Published Nov 21, 2024 • 25
Ensembling Large Language Models with Process Reward-Guided Tree Search for Better Complex Reasoning Paper • 2412.15797 • Published Dec 20, 2024 • 18
Search-o1: Agentic Search-Enhanced Large Reasoning Models Paper • 2501.05366 • Published Jan 9, 2025 • 102
Token Assorted: Mixing Latent and Text Tokens for Improved Language Model Reasoning Paper • 2502.03275 • Published Feb 5, 2025 • 18
Scaling up Test-Time Compute with Latent Reasoning: A Recurrent Depth Approach Paper • 2502.05171 • Published Feb 7, 2025 • 151
One Example Shown, Many Concepts Known! Counterexample-Driven Conceptual Reasoning in Mathematical LLMs Paper • 2502.10454 • Published Feb 12, 2025 • 7
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published Feb 12, 2025 • 38
What Are Step-Level Reward Models Rewarding? Counterintuitive Findings from MCTS-Boosted Mathematical Reasoning Paper • 2412.15904 • Published Dec 20, 2024
R1-Searcher: Incentivizing the Search Capability in LLMs via Reinforcement Learning Paper • 2503.05592 • Published Mar 7, 2025 • 27
GoT: Unleashing Reasoning Capability of Multimodal Large Language Model for Visual Generation and Editing Paper • 2503.10639 • Published Mar 13, 2025 • 53
GTR: Guided Thought Reinforcement Prevents Thought Collapse in RL-based VLM Agent Training Paper • 2503.08525 • Published Mar 11, 2025 • 17
OpenVLThinker: An Early Exploration to Complex Vision-Language Reasoning via Iterative Self-Improvement Paper • 2503.17352 • Published Mar 21, 2025 • 24
What, How, Where, and How Well? A Survey on Test-Time Scaling in Large Language Models Paper • 2503.24235 • Published Mar 31, 2025 • 54
Exploring Data Scaling Trends and Effects in Reinforcement Learning from Human Feedback Paper • 2503.22230 • Published Mar 28, 2025 • 45
Open-Reasoner-Zero: An Open Source Approach to Scaling Up Reinforcement Learning on the Base Model Paper • 2503.24290 • Published Mar 31, 2025 • 62
VAPO: Efficient and Reliable Reinforcement Learning for Advanced Reasoning Tasks Paper • 2504.05118 • Published Apr 7, 2025 • 26
Think Only When You Need with Large Hybrid-Reasoning Models Paper • 2505.14631 • Published May 20, 2025 • 20
Enigmata: Scaling Logical Reasoning in Large Language Models with Synthetic Verifiable Puzzles Paper • 2505.19914 • Published May 26, 2025 • 45
VerIPO: Cultivating Long Reasoning in Video-LLMs via Verifier-Gudied Iterative Policy Optimization Paper • 2505.19000 • Published May 25, 2025 • 42
The Entropy Mechanism of Reinforcement Learning for Reasoning Language Models Paper • 2505.22617 • Published May 28, 2025 • 131
AlphaOne: Reasoning Models Thinking Slow and Fast at Test Time Paper • 2505.24863 • Published May 30, 2025 • 97
From Token to Action: State Machine Reasoning to Mitigate Overthinking in Information Retrieval Paper • 2505.23059 • Published May 29, 2025 • 13
OThink-R1: Intrinsic Fast/Slow Thinking Mode Switching for Over-Reasoning Mitigation Paper • 2506.02397 • Published Jun 3, 2025 • 35
Comment on The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity Paper • 2506.09250 • Published Jun 10, 2025 • 27
ReasonFlux-PRM: Trajectory-Aware PRMs for Long Chain-of-Thought Reasoning in LLMs Paper • 2506.18896 • Published Jun 23, 2025 • 29
SPIRAL: Self-Play on Zero-Sum Games Incentivizes Reasoning via Multi-Agent Multi-Turn Reinforcement Learning Paper • 2506.24119 • Published Jun 30, 2025 • 50
KV Cache Steering for Inducing Reasoning in Small Language Models Paper • 2507.08799 • Published Jul 11, 2025 • 40
Reasoning or Memorization? Unreliable Results of Reinforcement Learning Due to Data Contamination Paper • 2507.10532 • Published Jul 14, 2025 • 89
MUR: Momentum Uncertainty guided Reasoning for Large Language Models Paper • 2507.14958 • Published Jul 20, 2025 • 46
NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model Paper • 2508.14444 • Published Aug 20, 2025 • 40
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21, 2025 • 261
InMind: Evaluating LLMs in Capturing and Applying Individual Human Reasoning Styles Paper • 2508.16072 • Published Aug 22, 2025 • 4
ThinkDial: An Open Recipe for Controlling Reasoning Effort in Large Language Models Paper • 2508.18773 • Published Aug 26, 2025 • 16
Think in Games: Learning to Reason in Games via Reinforcement Learning with Large Language Models Paper • 2508.21365 • Published Aug 29, 2025 • 29
Reasoning Vectors: Transferring Chain-of-Thought Capabilities via Task Arithmetic Paper • 2509.01363 • Published Sep 1, 2025 • 58
MMR1: Enhancing Multimodal Reasoning with Variance-Aware Sampling and Open Resources Paper • 2509.21268 • Published Sep 25, 2025 • 104
Understanding the Thinking Process of Reasoning Models: A Perspective from Schoenfeld's Episode Theory Paper • 2509.14662 • Published Sep 18, 2025 • 13
Thinking Sparks!: Emergent Attention Heads in Reasoning Models During Post Training Paper • 2509.25758 • Published Sep 30, 2025 • 22
Large Reasoning Models Learn Better Alignment from Flawed Thinking Paper • 2510.00938 • Published Oct 1, 2025 • 58
MITS: Enhanced Tree Search Reasoning for LLMs via Pointwise Mutual Information Paper • 2510.03632 • Published Oct 4, 2025 • 41
SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs Paper • 2510.05069 • Published Oct 6, 2025 • 12
Meta-Awareness Enhances Reasoning Models: Self-Alignment Reinforcement Learning Paper • 2510.03259 • Published Sep 26, 2025 • 57
Reasoning with Sampling: Your Base Model is Smarter Than You Think Paper • 2510.14901 • Published Oct 16, 2025 • 47
Scaling Latent Reasoning via Looped Language Models Paper • 2510.25741 • Published Oct 29, 2025 • 221
When Visualizing is the First Step to Reasoning: MIRA, a Benchmark for Visual Chain-of-Thought Paper • 2511.02779 • Published Nov 4, 2025 • 58
Tiny Model, Big Logic: Diversity-Driven Optimization Elicits Large-Model Reasoning Ability in VibeThinker-1.5B Paper • 2511.06221 • Published Nov 9, 2025 • 132
Nemotron Elastic: Towards Efficient Many-in-One Reasoning LLMs Paper • 2511.16664 • Published Nov 20, 2025 • 26
SO-Bench: A Structural Output Evaluation of Multimodal LLMs Paper • 2511.21750 • Published Nov 23, 2025 • 6
DualVLA: Building a Generalizable Embodied Agent via Partial Decoupling of Reasoning and Action Paper • 2511.22134 • Published Nov 27, 2025 • 21
SpeContext: Enabling Efficient Long-context Reasoning with Speculative Context Sparsity in LLMs Paper • 2512.00722 • Published Nov 30, 2025 • 15
Native Parallel Reasoner: Reasoning in Parallelism via Self-Distilled Reinforcement Learning Paper • 2512.07461 • Published Dec 8, 2025 • 76
Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning Paper • 2512.15687 • Published Dec 17, 2025 • 18
LongVideoAgent: Multi-Agent Reasoning with Long Videos Paper • 2512.20618 • Published 30 days ago • 53
Multi-hop Reasoning via Early Knowledge Alignment Paper • 2512.20144 • Published about 1 month ago • 6
Reasoning Palette: Modulating Reasoning via Latent Contextualization for Controllable Exploration for (V)LMs Paper • 2512.17206 • Published Dec 19, 2025 • 19
Schoenfeld's Anatomy of Mathematical Reasoning by Language Models Paper • 2512.19995 • Published about 1 month ago • 15
RelayLLM: Efficient Reasoning via Collaborative Decoding Paper • 2601.05167 • Published 14 days ago • 28
DiffCoT: Diffusion-styled Chain-of-Thought Reasoning in LLMs Paper • 2601.03559 • Published 15 days ago • 12
PaCoRe: Learning to Scale Test-Time Compute with Parallel Coordinated Reasoning Paper • 2601.05593 • Published 13 days ago • 78
Language of Thought Shapes Output Diversity in Large Language Models Paper • 2601.11227 • Published 6 days ago • 4