Grok 4 is at the point where it essentially never gets math/physics exam questions wrong, unless they are skillfully adversarial. It can identify errors or ambiguities in questions, then fix the error in the question or answer each variant of an ambiguous question.
Deedy, 10 July, 14:07
Insane that Elon Musk has pulled it off again, absolutely crushing the AI wars with Grok 4. Summarizing the core announcements:
- Post-training RL spend == pretraining spend
- $3/M input toks, $15/M output toks, 256k context, price 2x beyond 128k
- #1 on Humanity’s Last Exam (general hard problems) 44.4%, #2 is 26.9%
- #1 on GPQA (hard graduate problems) 88.9%, #2 is 86.4%
- #1 on AIME 2025 (Math) 100%, #2 is 98.4%
- #1 on Harvard MIT Math 96.7%, #2 is 82.5%
- #1 on USAMO25 (Math) 61.9%, #2 is 49.4%
- #1 on ARC-AGI-2 (easy for humans, hard for AI) 15.9%, #2 is 8.6%
- #1 on LiveCodeBench (Jan-May) 79.4%, #2 is 75.8%
Grok 4 is “potentially better than PhD level in every subject no exception”... and it’s pretty cheap. Massive moment in the AI wars and Elon has come to play.
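To make the quoted pricing concrete, here is a rough cost sketch in Python. It uses only the figures from the post ($3/M input tokens, $15/M output tokens, 256k context, 2x price beyond 128k). The post does not say how the 2x rule is applied, so this assumes the multiplier covers the whole request once input plus output tokens exceed 128k. That billing rule and the function name `estimate_cost_usd` are illustrative assumptions, not xAI's documented behavior.

```python
# Figures quoted in the post.
INPUT_USD_PER_M = 3.0          # dollars per million input tokens
OUTPUT_USD_PER_M = 15.0        # dollars per million output tokens
MAX_CONTEXT = 256_000          # context window, in tokens
LONG_CONTEXT_THRESHOLD = 128_000
LONG_CONTEXT_MULTIPLIER = 2.0  # "price 2x beyond 128k"


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of one request in US dollars (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")

    total_tokens = input_tokens + output_tokens
    if total_tokens > MAX_CONTEXT:
        raise ValueError("request exceeds the 256k context window")

    # ASSUMPTION: the 2x price applies to the entire request once the
    # total token count passes 128k. The post does not specify this.
    multiplier = (
        LONG_CONTEXT_MULTIPLIER if total_tokens > LONG_CONTEXT_THRESHOLD else 1.0
    )

    base = input_tokens * INPUT_USD_PER_M + output_tokens * OUTPUT_USD_PER_M
    return multiplier * base / 1_000_000


if __name__ == "__main__":
    print(estimate_cost_usd(100_000, 10_000))  # below the 128k threshold
    print(estimate_cost_usd(200_000, 10_000))  # above the 128k threshold
```

If the real billing only doubles the tokens past the threshold, or counts input tokens alone, the `multiplier` line is the one place that would need to change.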