Open Agent Evaluation Laboratory

university

https://boxiyu.github.io/

boxi-yu-194b63279

AI & ML interests

Code Agent, Benchmark Augmentation

Recent Activity

CWCY updated a dataset 1 day ago

OpenAgentLab/SWE-ABS

Bertsekas authored a paper 8 months ago

How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs

Bertsekas authored a paper 8 months ago

UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench

models 0

None public yet

datasets 1

OpenAgentLab/SWE-ABS

Updated 1 day ago • 500 • 12