ranpox (Yiheng Xu)

authored 7 papers 2 months ago

Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents

Paper • 2510.24702 • Published Oct 28, 2025 • 28

authored a paper 8 months ago

Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis

Paper • 2505.13227 • Published May 19, 2025 • 45

authored a paper 11 months ago

Qwen2.5-VL Technical Report

Paper • 2502.13923 • Published Feb 19, 2025 • 212

authored 3 papers about 1 year ago

AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials

Paper • 2412.09605 • Published Dec 12, 2024 • 30

Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction

Paper • 2412.04454 • Published Dec 5, 2024 • 71

OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

Paper • 2404.07972 • Published Apr 11, 2024 • 51

authored 6 papers over 2 years ago

Lemur: Harmonizing Natural Language and Code for Language Agents

Paper • 2310.06830 • Published Oct 10, 2023 • 33

DiT: Self-supervised Pre-training for Document Image Transformer

Paper • 2203.02378 • Published Mar 4, 2022 • 3

LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding

Paper • 2012.14740 • Published Dec 29, 2020 • 3

LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding

Paper • 2104.08836 • Published Apr 18, 2021

MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding

Paper • 2110.08518 • Published Oct 16, 2021 • 2

LayoutLM: Pre-training of Text and Layout for Document Image Understanding

Paper • 1912.13318 • Published Dec 31, 2019 • 5

OpenAgents: An Open Platform for Language Agents in the Wild

DocBank: A Benchmark Dataset for Document Layout Analysis

LayoutReader: Pre-training of Text and Layout for Reading Order Detection

In-Context Learning with Many Demonstration Examples

OpenCUA: Open Foundations for Computer-Use Agents

VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos
