MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory Paper • 2511.22609 • Published 10 days ago • 47
MG-Nav: Dual-Scale Visual Navigation via Sparse Spatial Memory Paper • 2511.22609 • Published 10 days ago • 47 • 2
UltraFlux: Data-Model Co-Design for High-quality Native 4K Text-to-Image Generation across Diverse Aspect Ratios Paper • 2511.18050 • Published 16 days ago • 37
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13 • 176
Paper2Video: Automatic Video Generation from Scientific Papers Paper • 2510.05096 • Published Oct 6 • 116
Easier Painting Than Thinking: Can Text-to-Image Models Set the Stage, but Not Direct the Play? Paper • 2509.03516 • Published Sep 3 • 11
Beyond Fixed: Variable-Length Denoising for Diffusion Large Language Models Paper • 2508.00819 • Published Aug 1 • 62
SPEED: Scalable, Precise, and Efficient Concept Erasure for Diffusion Models Paper • 2503.07392 • Published Mar 10 • 2
On the Generalization of SFT: A Reinforcement Learning Perspective with Reward Rectification Paper • 2508.05629 • Published Aug 7 • 180
KRIS-Bench: Benchmarking Next-Level Intelligent Image Editing Models Paper • 2505.16707 • Published May 22 • 45
Geometry Forcing: Marrying Video Diffusion and 3D Representation for Consistent World Modeling Paper • 2507.07982 • Published Jul 10 • 33
Number it: Temporal Grounding Videos like Flipping Manga Paper • 2411.10332 • Published Nov 15, 2024 • 14
Number it: Temporal Grounding Videos like Flipping Manga Paper • 2411.10332 • Published Nov 15, 2024 • 14