2023: at OpenAI, some key figures in post-training were against scaling RL. 2022: Eric/Yuhuai (now xAI) wrote STaR and I wrote "LLM can self-improve". It was clear that RL on clean signals would unlock the next leap. The pre/post-training divide may have been a big slowdown for AI.