Algorithmic changes like GSPO for Qwen mostly reflect the subtly different needs of a new base model + RL dataset combo, rather than a major innovation in fundamentals. Infra and data matter much more than minor RL algorithm tweaks.