TIR-Bench: A Comprehensive Benchmark for Agentic Thinking-with-Images Reasoning Paper • 2511.01833 • Published Nov 3 • 15
Mini-o3: Scaling Up Reasoning Patterns and Interaction Turns for Visual Search Paper • 2509.07969 • Published Sep 9 • 59
Symbolic Graphics Programming with Large Language Models Paper • 2509.05208 • Published Sep 5 • 46
SimpleTIR: End-to-End Reinforcement Learning for Multi-Turn Tool-Integrated Reasoning Paper • 2509.02479 • Published Sep 2 • 83
Intern-S1: A Scientific Multimodal Foundation Model Paper • 2508.15763 • Published Aug 21 • 256
MCP-Universe: Benchmarking Large Language Models with Real-World Model Context Protocol Servers Paper • 2508.14704 • Published Aug 20 • 43
Part I: Tricks or Traps? A Deep Dive into RL for LLM Reasoning Paper • 2508.08221 • Published Aug 11 • 49
Zebra-CoT: A Dataset for Interleaved Vision Language Reasoning Paper • 2507.16746 • Published Jul 22 • 35
Scientists' First Exam: Probing Cognitive Abilities of MLLM via Perception, Understanding, and Reasoning Paper • 2506.10521 • Published Jun 12 • 73
MiniMax-M1: Scaling Test-Time Compute Efficiently with Lightning Attention Paper • 2506.13585 • Published Jun 16 • 272
No GPU left behind: Unlocking Efficiency with Co-located vLLM in TRL Article • Jun 3 • 96
🧠 Reasoning datasets Collection • Datasets with reasoning traces for math and code released by the community • 24 items • Updated May 19 • 175
DeepResearchGym: A Free, Transparent, and Reproducible Evaluation Sandbox for Deep Research Paper • 2505.19253 • Published May 25 • 32
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs Paper • 2504.11536 • Published Apr 15 • 63