I'm bad at posting things on time! (My excuse is that I'm teaching at AddisCoder again this year.) The poster session for this paper is happening RIGHT NOW! Session 5: V-Gather Find, 7/28/2025, 18:00-19:30. Say hi to @ChuxuanHu :)
Daniel Kang, 29.7.2025
Can AI agents assess the reproducibility of research findings? Our #ACL2025 paper shows that they fall short with REPRO-Bench, a new benchmark that evaluates agents on real-world social science reproducibility tasks drawn from 112 papers, with full PDFs, code, and data. Our highest-performing agent scores <40%! 1/6