Skywork-R1V4: Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch Paper • 2512.02395 • Published 5 days ago
Skywork-R1V4 Collection Toward Agentic Multimodal Intelligence through Interleaved Thinking with Images and DeepResearch • 2 items • Updated 2 days ago
MiroThinker: Pushing the Performance Boundaries of Open-Source Research Agents via Model, Context, and Interactive Scaling Paper • 2511.11793 • Published 23 days ago
HuMo: Human-Centric Video Generation via Collaborative Multi-Modal Conditioning Paper • 2509.08519 • Published Sep 10
EmbodiedOneVision: Interleaved Vision-Text-Action Pretraining for General Robot Control Paper • 2508.21112 • Published Aug 28
InternVL3.5: Advancing Open-Source Multimodal Models in Versatility, Reasoning, and Efficiency Paper • 2508.18265 • Published Aug 25
Skywork-UniPic2 Collection A Unified DiT Multimodal Model for Image Generation, Editing, and Understanding • 8 items • Updated Aug 22
SVDQuant Collection Models and datasets for "SVDQuant: Absorbing Outliers by Low-Rank Components for 4-Bit Diffusion Models" • 20 items • Updated May 29
Skywork UniPic: Unified Autoregressive Modeling for Visual Understanding and Generation Paper • 2508.03320 • Published Aug 5
Kimi k1.5: Scaling Reinforcement Learning with LLMs Paper • 2501.12599 • Published Jan 22
Skywork-UniPic Collection Unified Autoregressive Modeling for Visual Understanding and Generation • 2 items • Updated Aug 13
I Think, Therefore I Diffuse: Enabling Multimodal In-Context Reasoning in Diffusion Models Paper • 2502.10458 • Published Feb 12
Multimodal DeepResearcher: Generating Text-Chart Interleaved Reports From Scratch with Agentic Framework Paper • 2506.02454 • Published Jun 3
CSVQA: A Chinese Multimodal Benchmark for Evaluating STEM Reasoning Capabilities of VLMs Paper • 2505.24120 • Published May 30