when you think about it stochastic gradient descent at fp8 is just yahtzee at scale