SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks Paper • 2602.12670 • Published 9 days ago
ChromouVQA: Benchmarking Vision-Language Models under Chromatic Camouflaged Images Paper • 2512.05137 • Published Nov 30, 2025