Mitigating Intra- and Inter-modal Forgetting in Continual Learning of Unified Multimodal Models Paper • 2512.03125 • Published Dec 2, 2025 • 1
Synthesizing Agentic Data for Web Agents with Progressive Difficulty Enhancement Mechanisms Paper • 2510.13913 • Published Oct 15, 2025 • 3
EgoVLM: Policy Optimization for Egocentric Video Understanding Paper • 2506.03097 • Published Jun 3, 2025
Hard2Verify: A Step-Level Verification Benchmark for Open-Ended Frontier Math Paper • 2510.13744 • Published Oct 15, 2025 • 5
ClinicalBench: Can LLMs Beat Traditional ML Models in Clinical Prediction? Paper • 2411.06469 • Published Nov 10, 2024 • 17
SFR-DeepResearch: Towards Effective Reinforcement Learning for Autonomously Reasoning Single Agents Paper • 2509.06283 • Published Sep 8, 2025 • 17
Arch-Router: Aligning LLM Routing with Human Preferences Paper • 2506.16655 • Published Jun 19, 2025 • 17
Sasha: Creative Goal-Oriented Reasoning in Smart Homes with Large Language Models Paper • 2305.09802 • Published May 16, 2023
Flavors of Moonshine: Tiny Specialized ASR Models for Edge Devices Paper • 2509.02523 • Published Sep 2, 2025 • 7
Concentration of Measure for Distributions Generated via Diffusion Models Paper • 2501.07741 • Published Jan 13, 2025
CodeUpdateArena: Benchmarking Knowledge Editing on API Updates Paper • 2407.06249 • Published Jul 8, 2024
FaithEval: Can Your Language Model Stay Faithful to Context, Even If "The Moon is Made of Marshmallows" Paper • 2410.03727 • Published Sep 30, 2024 • 2
PhD Knowledge Not Required: A Reasoning Challenge for Large Language Models Paper • 2502.01584 • Published Feb 3, 2025 • 9
Rethinking Addressing in Language Models via Contexualized Equivariant Positional Encoding Paper • 2501.00712 • Published Jan 1, 2025 • 6
Moonshine: Speech Recognition for Live Transcription and Voice Commands Paper • 2410.15608 • Published Oct 21, 2024
The GEM Benchmark: Natural Language Generation, its Evaluation and Metrics Paper • 2102.01672 • Published Feb 2, 2021 • 1