Static benchmarks are super important, and @grok 4’s performance is no doubt impressive. I’m still waiting to see what people think about the model when it’s in their hands. Will it live up to expectations? At the end of the day, it’s about real utility for individual users. Grok 4 is in the hands of our millions of users at @lmarena_ai. Can’t wait to see what they think as a first step! On my end, I’ve tried asking some hard math questions, and @grok seems to do great. It’s concise and factual. Seems super smart and I like discussing research with this model. Check out the exchangeability-related proof in the attached image! Looks like no bugs...