wake up babe new RL algo dropped
Chujie Zheng ✈️ ICLR
25 July at 18:35
Proud to introduce Group Sequence Policy Optimization (GSPO), our stable, efficient, and performant RL algorithm that powers the large-scale RL training of the latest Qwen3 models (Instruct, Coder, Thinking) 🚀 📄