Good RL environments are the bottleneck
Mechanize
July 12, 01:39
Despite being trained with more compute than GPT-3, AlphaGo Zero could only play Go, while GPT-3 could write essays and code, translate languages, and assist with countless other tasks. That gap shows that what a model is trained on matters, not just how much compute goes into training it. Rich RL environments are now the bottleneck.