Reinforcement learning enables LLMs to beat humans on programming/math competitions and has driven recent advances (OpenAI's o-series, Anthropic's Claude 4). Will RL enable broad generalization in the same way that pretraining does? Not with current techniques 🧵 1/7
🔗Links here and thread below: Paper: Medium: Substack: 2/7
Existing evaluations of LLMs primarily assess in-domain performance: reinforcement post-training (RPT) models are trained on mixed-domain data and evaluated on benchmarks closely aligned with their training domains. These setups introduce confounding factors that obscure the true extent of RPT's generalization ability 3/7
We introduce a unified evaluation framework that isolates and tests RPT’s cross-domain generalization using 16 benchmarks across math, code, and knowledge-intensive reasoning. Within this framework, we evaluate various combinations of base models and RPT strategies 4/7
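For concreteness, the framework can be pictured as a train-domain × eval-domain grid: post-train on one domain at a time, then score every resulting model on benchmarks from all domains, so in-domain and cross-domain gains are separated rather than confounded by mixed-domain training. A minimal sketch (function and variable names are hypothetical, not the paper's code):

```python
# Hypothetical sketch of a cross-domain transfer grid; not the paper's actual implementation.
DOMAINS = ["math", "code", "knowledge"]

def evaluate(model, benchmark):
    """Placeholder: return the model's score on a single benchmark (e.g., accuracy)."""
    raise NotImplementedError

def transfer_grid(base_model, rpt_train, benchmarks_by_domain):
    """rpt_train(base_model, domain) -> post-trained model.
    Returns {train_domain: {eval_domain: mean score}} so in-domain vs. cross-domain gains can be compared."""
    grid = {}
    for train_dom in DOMAINS:
        model = rpt_train(base_model, train_dom)  # RPT on a single domain, avoiding mixed-domain confounds
        grid[train_dom] = {
            eval_dom: sum(evaluate(model, b) for b in bench_list) / len(bench_list)
            for eval_dom, bench_list in benchmarks_by_domain.items()
        }
    return grid
```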
📌 Our key findings:
1️⃣ RPT gains are mostly in-domain
2️⃣ Math & code generalize well to each other
3️⃣ Structured skills do not transfer to unstructured, knowledge-intensive tasks
5/7
The takeaway? RPT is powerful but narrow. It improves performance where it's trained, but generalizes poorly 6/7
This work is joint with @ChuxuanHu, @maxYuxuanZhu, @aokellermann, Caleb Biddulph, @PunWai, and @jasoncbenn 7/7