YukangChen

Yukang

https://scholar.google.com/citations?user=6p0ygKUAAAAJ&hl=en

AI & ML interests

Efficient and Long AI

Recent Activity

upvoted a paper 17 days ago

V-ReasonBench: Toward Unified Reasoning Benchmark Suite for Video Generation Models

updated a model 19 days ago

Perflow-Shuai/streaming_vlm_e1_lr2e-5_dt_rebuttal_stage2_ps512_pw512_from_qwen_run2-checkpoint-42-model

published a model 19 days ago

Perflow-Shuai/streaming_vlm_e1_lr2e-5_dt_rebuttal_stage2_ps512_pw512_from_qwen_run2-checkpoint-42-model

authored 20 papers about 2 months ago

VoxelNeXt: Fully Sparse VoxelNet for 3D Object Detection and Tracking

Paper • 2303.11301 • Published Mar 20, 2023

Spherical Transformer for LiDAR-based 3D Recognition

Paper • 2303.12766 • Published Mar 22, 2023

Denoising Diffusion Step-aware Models

Paper • 2310.03337 • Published Oct 5, 2023 • 1

LISA: Reasoning Segmentation via Large Language Model

Paper • 2308.00692 • Published Aug 1, 2023 • 1

Grounded SAM: Assembling Open-World Models for Diverse Visual Tasks

Paper • 2401.14159 • Published Jan 25, 2024 • 6

FocalFormer3D: Focusing on Hard Instance for 3D Object Detection

Paper • 2308.04556 • Published Aug 8, 2023 • 9

Mask-Attention-Free Transformer for 3D Instance Segmentation

Paper • 2309.01692 • Published Sep 4, 2023 • 1

Focal Sparse Convolutional Networks for 3D Object Detection

Paper • 2204.12463 • Published Apr 26, 2022

RL-GPT: Integrating Reinforcement Learning and Code-as-policy

Paper • 2402.19299 • Published Feb 29, 2024 • 2

MR-BEN: A Comprehensive Meta-Reasoning Benchmark for Large Language Models

Paper • 2406.13975 • Published Jun 20, 2024

LongVILA: Scaling Long-Context Visual Language Models for Long Videos

Paper • 2408.10188 • Published Aug 19, 2024 • 52

NVILA: Efficient Frontier Visual Language Models

Paper • 2412.04468 • Published Dec 5, 2024 • 59

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Paper • 2505.13031 • Published May 19 • 4

Scaling RL to Long Videos

Paper • 2507.07966 • Published Jul 10 • 159

TTS-VAR: A Test-Time Scaling Framework for Visual Auto-Regressive Generation

Paper • 2507.18537 • Published Jul 24 • 17

3D Aware Region Prompted Vision Language Model

Paper • 2509.13317 • Published Sep 16 • 14

LongLive: Real-time Interactive Long Video Generation

Paper • 2509.22622 • Published Sep 26 • 184

SANA-Video: Efficient Video Generation with Block Linear Diffusion Transformer

Paper • 2509.24695 • Published Sep 29 • 45

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Paper • 2510.09608 • Published Oct 10 • 50