Just got off work and tried Grok-4 on an undergrad topology problem. It took 9 minutes to think and then confidently gave a clean, plausible, but totally wrong answer 😅 Don’t think this one qualifies as “skillfully adversarial.” AI models are crushing benchmarks, but there's still a long way to go before real math AGI.
Elon Musk, 10 Jul at 16:47
Grok 4 is at the point where it essentially never gets math/physics exam questions wrong, unless they are skillfully adversarial. It can identify errors or ambiguities in questions, then fix the error in the question or answer each variant of an ambiguous question.