Stabilizing Reinforcement Learning with LLMs: Formulation and Practices Paper • 2512.01374 • Published 6 days ago • 77
DeepKE: A Deep Learning Based Knowledge Extraction Toolkit for Knowledge Base Population Paper • 2201.03335 • Published Jan 10, 2022 • 1
Contrastive Demonstration Tuning for Pre-trained Language Models Paper • 2204.04392 • Published Apr 9, 2022 • 1
Qwen2.5-Math Technical Report: Toward Mathematical Expert Model via Self-Improvement Paper • 2409.12122 • Published Sep 18, 2024 • 4
Disentangling Reasoning Tokens and Boilerplate Tokens For Language Model Fine-tuning Paper • 2412.14780 • Published Dec 19, 2024
Beyond Turn Limits: Training Deep Search Agents with Dynamic Context Window Paper • 2510.08276 • Published Oct 9 • 9
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 187
Demons in the Detail: On Implementing Load Balancing Loss for Training Specialized Mixture-of-Expert Models Paper • 2501.11873 • Published Jan 21 • 66