Reflect, Retry, Reward: Self-Improving LLMs via Reinforcement Learning Paper • 2505.24726 • Published May 30 • 276
Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning Paper • 2506.01939 • Published Jun 2 • 187
DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning Paper • 2504.07128 • Published Apr 2 • 86
Transformer Explainer: Interactive Learning of Text-Generative Models Paper • 2408.04619 • Published Aug 8, 2024 • 172