Haven't tweeted a lot in these last two months as I spent time learning and experimenting with various RL techniques. Excited to share some WIP soon:

1. Compute-optimal recipe for GRPO training
2. RL-powered tool to enhance privacy in LLM interactions

The experiments have been promising 👀