Zuhao Yang

mwxely

https://mwxely.github.io/

AI & ML interests

Large Multimodal Models

Recent Activity

upvoted a paper 4 days ago

XR: Cross-Modal Agents for Composed Image Retrieval

Paper • 2601.14245 • Published 5 days ago • 9

updated a dataset 9 days ago

mwxely/lmms-eval-test

Updated 9 days ago • 29

published a dataset 9 days ago

mwxely/lmms-eval-test

Updated 9 days ago • 29

upvoted a paper 11 days ago

DeepResearchEval: An Automated Framework for Deep Research Task Construction and Agentic Evaluation

Paper • 2601.09688 • Published 11 days ago • 123

liked a dataset 20 days ago

veggiebird/MATPO-data

Viewer • Updated Oct 8, 2025 • 20k • 217 • 2

upvoted a paper 22 days ago

On the Role of Discreteness in Diffusion LLMs

Paper • 2512.22630 • Published 29 days ago • 18

upvoted a paper 23 days ago

mHC: Manifold-Constrained Hyper-Connections

Paper • 2512.24880 • Published 25 days ago • 279

upvoted a paper 26 days ago

Wan-Move: Motion-controllable Video Generation via Latent Trajectory Guidance

Paper • 2512.08765 • Published Dec 9, 2025 • 132

upvoted a paper 27 days ago

EgoX: Egocentric Video Generation from a Single Exocentric Video

Paper • 2512.08269 • Published Dec 9, 2025 • 119

upvoted 5 papers about 1 month ago

Robust-R1: Degradation-Aware Reasoning for Robust Visual Understanding

Paper • 2512.17532 • Published Dec 19, 2025 • 67

The Prism Hypothesis: Harmonizing Semantic and Pixel Representations via Unified Autoencoding

Paper • 2512.19693 • Published Dec 22, 2025 • 64

Agent Learning via Early Experience

Paper • 2510.08558 • Published Oct 9, 2025 • 272

Kandinsky 5.0: A Family of Foundation Models for Image and Video Generation

Paper • 2511.14993 • Published Nov 19, 2025 • 230

Thinking with Video: Video Generation as a Promising Multimodal Reasoning Paradigm

Paper • 2511.04570 • Published Nov 6, 2025 • 212

New activity in mwxely/TransitBench about 1 month ago

[bot] Conversion to Parquet

#1 opened 7 months ago by parquet-converter

When do you release the code?

#2 opened 5 months ago by zhangzb

authored a paper about 1 month ago

A Comprehensive Study on Visual Token Redundancy for Discrete Diffusion-based Multimodal Large Language Models

Paper • 2511.15098 • Published Nov 19, 2025

updated 3 datasets about 2 months ago