Great benchmark. The first AI to do this will be very impressive. I particularly like the requirement for experimentation: you can't really solve any of these until you test hypotheses and learn from them.
ARC Prize · 19 Jul, 01:26
Today, we're announcing a preview of ARC-AGI-3, the Interactive Reasoning Benchmark with the widest gap between easy for humans and hard for AI.

We're releasing:
* 3 games (environments)
* $10K agent contest
* AI agents API

Starting scores: Frontier AI 0%, Humans 100%