I remember being so impressed when I added Yang Zhilin's Transformer-XL model as the third model in the transformers library back in early 2019. Not surprised he's now shipping crazy impressive stuff with Moonshot AI. It's a long story I should tell one day, but in a way this small Google team was among the first to really understand the power of scaling training data.