Super neat pipeline. To summarize, they: 1. rephrase the user-specified task prompt, 2. generate a small set (n=25) of diverse synthetic training examples (leveraging in-context learning with a large context window to keep them diverse), 3. train the model with GRPO + LoRA, using RULER rubric scores as the reward. A rough sketch of these steps is below.
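Here's a minimal Python sketch of the three stages. Everything here is a hypothetical stand-in, not the actual AutoRL/ART API: `chat()` is an assumed LLM-call helper you'd supply yourself, the prompts are invented, and the real weight update (GRPO + LoRA via ART) is omitted. The `grpo_advantages` function just shows the standard group-relative normalization GRPO uses over a group of rollout rewards.

```python
# Hypothetical sketch of the AutoRL-style pipeline; not the real ART/AutoRL API.
import statistics
from typing import Callable


def rephrase_task(task_sentence: str, chat: Callable[[str], str]) -> str:
    """Step 1: expand the one-sentence task description into a fuller prompt."""
    return chat(
        f"Rewrite this task description as a detailed system prompt:\n{task_sentence}"
    )


def generate_examples(task_prompt: str, chat: Callable[[str], str], n: int = 25) -> list[str]:
    """Step 2: synthesize n training inputs. Feeding already-generated examples
    back into the context (ICL) nudges the model away from duplicates."""
    examples: list[str] = []
    for _ in range(n):
        seen = "\n".join(f"- {e}" for e in examples)
        examples.append(chat(
            f"{task_prompt}\n\n"
            f"Already generated (produce something different):\n{seen}\n"
            "Write one new training input."
        ))
    return examples


def grpo_advantages(group_rewards: list[float]) -> list[float]:
    """Step 3 (core of GRPO): given rewards for a group of rollouts of the same
    input (here, RULER rubric scores), compute group-relative advantages."""
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards) or 1.0  # avoid division by zero
    return [(r - mean) / std for r in group_rewards]


if __name__ == "__main__":
    # Example: rubric scores for 4 rollouts of one synthetic example.
    print(grpo_advantages([0.2, 0.9, 0.5, 0.4]))
```

The key design point is that the reward never needs ground-truth labels: RULER-style rubric judging scores each rollout, and GRPO only cares about how rollouts rank within their group.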
Matt Shumer, 30.7.2025
Introducing `AutoRL` 📈 The world's simplest way to train a task-specific LLM with RL. *Just write a SENTENCE describing the model you want.* A chain of AI systems will generate data + rubrics and train a model for you. Powered by ART, it's open source. Link in thread: