Open Agent Evaluation Laboratory

university

https://boxiyu.github.io/

boxi-yu-194b63279

AI & ML interests

Code Agent, Benchmark Augmentation

Recent Activity

CWCY updated a dataset 2 days ago

OpenAgentLab/SWE-ABS

Bertsekas authored a paper 8 months ago

How Should I Build A Benchmark? Revisiting Code-Related Benchmarks For LLMs

Bertsekas authored a paper 8 months ago

UTBoost: Rigorous Evaluation of Coding Agents on SWE-Bench

OpenAgentLab's models

None public yet