X = (active params * avg tokens per response), Y = score
the lines between thinking and non-thinking are blurring. let’s just measure intelligence vs ballpark FLOPs
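
a rough sketch of what that X axis could look like in code. the ~2 FLOPs per active param per token factor is a common rule of thumb, not something from the chart, and the model names and numbers below are made up for illustration. it ignores prompt tokens and attention cost.

```python
def ballpark_flops(active_params: float,
                   avg_tokens_per_response: float,
                   flops_per_param_token: float = 2.0) -> float:
    """Ballpark forward-pass compute per response, in FLOPs.

    X = active params * avg tokens per response, scaled by an assumed
    ~2 FLOPs per active parameter per generated token. The constant is
    an assumption; it shifts every point equally and does not change
    the ordering of models along the X axis.
    """
    return flops_per_param_token * active_params * avg_tokens_per_response


# Hypothetical models: (active params, avg tokens per response, score)
models = {
    "small-thinking": (8e9, 4000, 60.0),
    "big-non-thinking": (64e9, 500, 60.0),
}

for name, (params, tokens, score) in models.items():
    x = ballpark_flops(params, tokens)
    print(f"{name}: x={x:.2e} FLOPs, y={score}")
```

in this made-up case both models land on the same X: a small model that writes a lot of tokens and a big model that writes few cost about the same per response, so the thinking / non-thinking label drops out of the comparison.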