We’re releasing BrowseComp, which stands for Browsing Competition. 🏎️ Think of it like coding or math competitions — while these contests may not perfectly reflect real-world SWE or mathematical research, they do capture a spark of intelligence. This is THE benchmark we should care about when evaluating the intelligence of deep research-like browsing agents.
OpenAI, 11.4.2025
We’re open-sourcing BrowseComp (“Browsing Competition”), a new, challenging benchmark designed to test how well AI agents can browse the internet to find hard-to-locate information. It’s like an online scavenger hunt…but for browsing agents.