Sana Collection ⚡️Sana: Efficient High-Resolution Image Synthesis with Linear Diffusion Transformer • 22 items • Updated 1 day ago • 98
GDPO: Group reward-Decoupled Normalization Policy Optimization for Multi-reward RL Optimization Paper • 2601.05242 • Published 13 days ago • 201
MC#: Mixture Compressor for Mixture-of-Experts Large Models Paper • 2510.10962 • Published Oct 13, 2025
Learning to Reason in 4D: Dynamic Spatial Understanding for Vision Language Models Paper • 2512.20557 • Published 29 days ago • 49
DC-AR: Efficient Masked Autoregressive Image Generation with Deep Compression Hybrid Tokenizer Paper • 2507.04947 • Published Jul 7, 2025 • 1
Seed-Prover 1.5: Mastering Undergraduate-Level Theorem Proving via Learning from Experience Paper • 2512.17260 • Published Dec 19, 2025 • 49
FoundationMotion: Auto-Labeling and Reasoning about Spatial Movement in Videos Paper • 2512.10927 • Published Dec 11, 2025 • 5
CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training Paper • 2504.13161 • Published Apr 17, 2025 • 93
NaVILA: Legged Robot Vision-Language-Action Model for Navigation Paper • 2412.04453 • Published Dec 5, 2024
EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos Paper • 2507.12440 • Published Jul 16, 2025
3D Aware Region Prompted Vision Language Model Paper • 2509.13317 • Published Sep 16, 2025 • 14
Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations Paper • 2508.18132 • Published Aug 25, 2025
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs Paper • 2510.11696 • Published Oct 13, 2025 • 178
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM Paper • 2510.15870 • Published Oct 17, 2025 • 90