Region-Constraint In-Context Generation for Instructional Video Editing • Paper • 2512.17650 • Published 15 days ago • 49
Live Avatar: Streaming Real-time Audio-Driven Avatar Generation with Infinite Length • Paper • 2512.04677 • Published 30 days ago • 167
OmniConsistency: Learning Style-Agnostic Consistency from Paired Stylization Data • Paper • 2505.18445 • Published May 24, 2025 • 63
Data Mixing Made Efficient: A Bivariate Scaling Law for Language Model Pretraining • Paper • 2405.14908 • Published May 23, 2024 • 15
iVideoGPT: Interactive VideoGPTs are Scalable World Models • Paper • 2405.15223 • Published May 24, 2024 • 17
Automatic Data Curation for Self-Supervised Learning: A Clustering-Based Approach • Paper • 2405.15613 • Published May 24, 2024 • 17
CraftsMan: High-fidelity Mesh Generation with 3D Native Generation and Interactive Geometry Refiner • Paper • 2405.14979 • Published May 23, 2024 • 19
Grokked Transformers are Implicit Reasoners: A Mechanistic Journey to the Edge of Generalization • Paper • 2405.15071 • Published May 23, 2024 • 41
ConvLLaVA: Hierarchical Backbones as Visual Encoder for Large Multimodal Models • Paper • 2405.15738 • Published May 24, 2024 • 46
Meteor: Mamba-based Traversal of Rationale for Large Language and Vision Models • Paper • 2405.15574 • Published May 24, 2024 • 55
Tele-Aloha: A Low-budget and High-authenticity Telepresence System Using Sparse RGB Cameras • Paper • 2405.14866 • Published May 23, 2024 • 9
Neural Directional Encoding for Efficient and Accurate View-Dependent Appearance Modeling • Paper • 2405.14847 • Published May 23, 2024 • 10
NeRF-Casting: Improved View-Dependent Appearance with Consistent Reflections • Paper • 2405.14871 • Published May 23, 2024 • 10
Semantica: An Adaptable Image-Conditioned Diffusion Model • Paper • 2405.14857 • Published May 23, 2024 • 11
AlignGPT: Multi-modal Large Language Models with Adaptive Alignment Capability • Paper • 2405.14129 • Published May 23, 2024 • 14
Improved Distribution Matching Distillation for Fast Image Synthesis • Paper • 2405.14867 • Published May 23, 2024 • 15
Distributed Speculative Inference of Large Language Models • Paper • 2405.14105 • Published May 23, 2024 • 18
ReVideo: Remake a Video with Motion and Content Control • Paper • 2405.13865 • Published May 22, 2024 • 25