SIMA 2: A Generalist Embodied Agent for Virtual Worlds Paper • 2512.04797 • Published 2 days ago • 11
Thinking with Programming Vision: Towards a Unified View for Thinking with Images Paper • 2512.03746 • Published 3 days ago • 15
SpaceTools: Tool-Augmented Spatial Reasoning via Double Interactive RL Paper • 2512.04069 • Published 3 days ago • 21
RELIC: Interactive Video World Model with Long-Horizon Memory Paper • 2512.04040 • Published 3 days ago • 20
Video4Spatial: Towards Visuospatial Intelligence with Context-Guided Video Generation Paper • 2512.03040 • Published 4 days ago • 5
Revisiting the Necessity of Lengthy Chain-of-Thought in Vision-centric Reasoning Generalization Paper • 2511.22586 • Published 9 days ago • 6
DeepSeek-V3.2: Pushing the Frontier of Open Large Language Models Paper • 2512.02556 • Published 5 days ago • 165
SO-Bench: A Structural Output Evaluation of Multimodal LLMs Paper • 2511.21750 • Published 13 days ago • 5
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer Paper • 2511.22699 • Published 9 days ago • 145
Inferix: A Block-Diffusion based Next-Generation Inference Engine for World Simulation Paper • 2511.20714 • Published 12 days ago • 45
Downscaling Intelligence: Exploring Perception and Reasoning Bottlenecks in Small Multimodal Models Paper • 2511.17487 • Published 15 days ago • 9
WorldGen: From Text to Traversable and Interactive 3D Worlds Paper • 2511.16825 • Published 16 days ago • 21
RynnVLA-002: A Unified Vision-Language-Action and World Model Paper • 2511.17502 • Published 15 days ago • 24