CaptionQA: Is Your Caption as Useful as the Image Itself? • arXiv:2511.21025 • Published 11 days ago • 25 upvotes
DeepSeekMath-V2: Towards Self-Verifiable Mathematical Reasoning • arXiv:2511.22570 • Published 10 days ago • 63 upvotes
Z-Image: An Efficient Image Generation Foundation Model with Single-Stream Diffusion Transformer • arXiv:2511.22699 • Published 10 days ago • 146 upvotes
Black-Box On-Policy Distillation of Large Language Models • arXiv:2511.10643 • Published 24 days ago • 46 upvotes
PAN: A World Model for General, Interactable, and Long-Horizon World Simulation • arXiv:2511.09057 • Published 25 days ago • 75 upvotes
Revisiting Multimodal Positional Encoding in Vision-Language Models • arXiv:2510.23095 • Published Oct 27 • 20 upvotes
ROVER: Benchmarking Reciprocal Cross-Modal Reasoning for Omnimodal Generation • arXiv:2511.01163 • Published Nov 3 • 31 upvotes
Scaling Instruction-Based Video Editing with a High-Quality Synthetic Dataset • arXiv:2510.15742 • Published Oct 17 • 50 upvotes
OmniVinci: Enhancing Architecture and Data for Omni-Modal Understanding LLM • arXiv:2510.15870 • Published Oct 17 • 89 upvotes
QeRL: Beyond Efficiency -- Quantization-enhanced Reinforcement Learning for LLMs • arXiv:2510.11696 • Published Oct 13 • 176 upvotes
StreamingVLM: Real-Time Understanding for Infinite Video Streams • arXiv:2510.09608 • Published Oct 10 • 50 upvotes
Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models • arXiv:2510.05034 • Published Oct 6 • 48 upvotes
VideoNSA: Native Sparse Attention Scales Video Understanding • arXiv:2510.02295 • Published Oct 2 • 9 upvotes
Learning to See Before Seeing: Demystifying LLM Visual Priors from Language Pre-training • arXiv:2509.26625 • Published Sep 30 • 43 upvotes
InfLLM-V2: Dense-Sparse Switchable Attention for Seamless Short-to-Long Adaptation • arXiv:2509.24663 • Published Sep 29 • 13 upvotes