From Code Foundation Models to Agents and Applications: A Practical Guide to Code Intelligence Paper • 2511.18538 • Published 14 days ago • 239
AutoEnv: Automated Environments for Measuring Cross-Environment Agent Learning Paper • 2511.19304 • Published 13 days ago • 89
VisJudge-Bench: Aesthetics and Quality Assessment of Visualizations Paper • 2510.22373 • Published Oct 25 • 14
Reasoning via Video: The First Evaluation of Video Models' Reasoning Abilities through Maze-Solving Tasks Paper • 2511.15065 • Published 18 days ago • 74
InteractComp: Evaluating Search Agents With Ambiguous Queries Paper • 2510.24668 • Published Oct 28 • 97
Concise Reasoning, Big Gains: Pruning Long Reasoning Trace with Difficulty-Aware Prompting Paper • 2505.19716 • Published May 26 • 4
You Don't Know Until You Click: Automated GUI Testing for Production-Ready Software Evaluation Paper • 2508.14104 • Published Aug 17 • 1
VeritasFi: An Adaptable, Multi-tiered RAG Framework for Multi-modal Financial Question Answering Paper • 2510.10828 • Published Oct 12 • 1
ReCode: Unify Plan and Action for Universal Granularity Control Paper • 2510.23564 • Published Oct 27 • 120
MedAgentsBench: Benchmarking Thinking Models and Agent Frameworks for Complex Medical Reasoning Paper • 2503.07459 • Published Mar 10 • 16
Alpha-SQL: Zero-Shot Text-to-SQL using Monte Carlo Tree Search Paper • 2502.17248 • Published Feb 24 • 1
A Survey of Self-Evolving Agents: On Path to Artificial Super Intelligence Paper • 2507.21046 • Published Jul 28 • 82