Not Even Bronze: Evaluating LLMs on 2025 International Math Olympiad 🥉 Nice blog post from the team behind MathArena: Evaluating LLMs on Uncontaminated Math Competitions, providing independent analysis of LLM performance on the IMO.
It looks like an advanced version of Gemini with Deep Think just solved 5 out of the 6 IMO problems, earning 35 total points and officially achieving gold-medal-level performance. Congrats on the achievement @lmthang❗️ Can’t wait to play with this model.