OpenAgents: An Open Platform for Language Agents in the Wild Paper • 2310.10634 • Published Oct 16, 2023 • 9
LayoutReader: Pre-training of Text and Layout for Reading Order Detection Paper • 2108.11591 • Published Aug 26, 2021 • 1
VideoAgentTrek: Computer Use Pretraining from Unlabeled Videos Paper • 2510.19488 • Published Oct 22, 2025 • 19
Agent Data Protocol: Unifying Datasets for Diverse, Effective Fine-tuning of LLM Agents Paper • 2510.24702 • Published Oct 28, 2025 • 27
Scaling Computer-Use Grounding via User Interface Decomposition and Synthesis Paper • 2505.13227 • Published May 19, 2025 • 45
AgentTrek: Agent Trajectory Synthesis via Guiding Replay with Web Tutorials Paper • 2412.09605 • Published Dec 12, 2024 • 30
Aguvis: Unified Pure Vision Agents for Autonomous GUI Interaction Paper • 2412.04454 • Published Dec 5, 2024 • 72
OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments Paper • 2404.07972 • Published Apr 11, 2024 • 50
Lemur: Harmonizing Natural Language and Code for Language Agents Paper • 2310.06830 • Published Oct 10, 2023 • 34
DiT: Self-supervised Pre-training for Document Image Transformer Paper • 2203.02378 • Published Mar 4, 2022 • 2
LayoutLMv2: Multi-modal Pre-training for Visually-Rich Document Understanding Paper • 2012.14740 • Published Dec 29, 2020 • 2
LayoutXLM: Multimodal Pre-training for Multilingual Visually-rich Document Understanding Paper • 2104.08836 • Published Apr 18, 2021
MarkupLM: Pre-training of Text and Markup Language for Visually-rich Document Understanding Paper • 2110.08518 • Published Oct 16, 2021 • 2
LayoutLM: Pre-training of Text and Layout for Document Image Understanding Paper • 1912.13318 • Published Dec 31, 2019 • 4