Terminal-Bench: Benchmarking Agents on Hard, Realistic Tasks in Command Line Interfaces Paper • 2601.11868 • Published 8 days ago • 21
Scaling Text-to-Image Diffusion Transformers with Representation Autoencoders Paper • 2601.16208 • Published 2 days ago • 44
Stable-DiffCoder: Pushing the Frontier of Code Diffusion Large Language Model Paper • 2601.15892 • Published 2 days ago • 43
The Flexibility Trap: Why Arbitrary Order Limits Reasoning Potential in Diffusion Language Models Paper • 2601.15165 • Published 3 days ago • 57
EvoCUA: Evolving Computer Use Agents via Learning from Scalable Synthetic Experience Paper • 2601.15876 • Published 2 days ago • 68
Scaling Behavior Cloning Improves Causal Reasoning: An Open Model for Real-Time Video Game Playing Paper • 2601.04575 • Published 16 days ago • 8
Implicit Neural Representation Facilitates Unified Universal Vision Encoding Paper • 2601.14256 • Published 4 days ago • 5
FlashLabs Chroma 1.0: A Real-Time End-to-End Spoken Dialogue Model with Personalized Voice Cloning Paper • 2601.11141 • Published 8 days ago • 16
Numina-Lean-Agent: An Open and General Agentic Reasoning System for Formal Mathematics Paper • 2601.14027 • Published 4 days ago • 10
Render-of-Thought: Rendering Textual Chain-of-Thought as Images for Visual Latent Reasoning Paper • 2601.14750 • Published 3 days ago • 14
Rethinking Video Generation Model for the Embodied World Paper • 2601.15282 • Published 3 days ago • 40
MMDeepResearch-Bench: A Benchmark for Multimodal Deep Research Agents Paper • 2601.12346 • Published 6 days ago • 45
FantasyVLN: Unified Multimodal Chain-of-Thought Reasoning for Vision-Language Navigation Paper • 2601.13976 • Published 4 days ago • 18