Reinforcement learning enables LLMs to beat humans on programming/math competitions and has driven recent advances (OpenAI's o-series, Anthropic's Claude 4)
Will RL enable broad generalization in the same way that pretraining does? Not with current techniques
🧵 1/7
🔗Links here and thread below:
Paper:
Medium:
Substack:
2/7
Existing evaluations for LLMs primarily assess in-domain performance: reinforcement post-training (RPT) models are trained on mixed-domain data and evaluated on benchmarks closely aligned with their training domains. These setups introduce confounding factors that obscure the true extent of RPT's generalization ability.
3/7
We introduce a unified evaluation framework that isolates and tests RPT’s cross-domain generalization using 16 benchmarks across math, code, and knowledge-intensive reasoning. Within this framework, we evaluate various combinations of base models and RPT strategies
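The framework described above can be pictured as a train-domain × eval-domain grid. Below is a minimal, hypothetical sketch (the domain names match the thread; the benchmark names and `evaluate` scorer are placeholders, not the paper's actual harness):

```python
# Hypothetical sketch of a cross-domain evaluation grid: score each
# domain-trained RPT model on benchmarks from every domain, so in-domain
# vs. cross-domain gains can be separated.

DOMAINS = ["math", "code", "knowledge"]

# Placeholder benchmark names; the paper uses 16 benchmarks across these domains.
BENCHMARKS = {
    "math": ["math_bench_a", "math_bench_b"],
    "code": ["code_bench_a", "code_bench_b"],
    "knowledge": ["knowledge_bench_a", "knowledge_bench_b"],
}

def evaluate(model, benchmark):
    """Stand-in scorer; a real harness would run the model on the benchmark."""
    return 0.0

def transfer_matrix(models_by_train_domain):
    """Return {train_domain: {eval_domain: mean benchmark score}}."""
    matrix = {}
    for train_domain, model in models_by_train_domain.items():
        row = {}
        for eval_domain in DOMAINS:
            scores = [evaluate(model, b) for b in BENCHMARKS[eval_domain]]
            row[eval_domain] = sum(scores) / len(scores)
        matrix[train_domain] = row
    return matrix
```

Off-diagonal entries (train domain ≠ eval domain) measure cross-domain generalization; comparing each row against a base model's scores isolates the effect of RPT itself.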
4/7

📌 Our key findings:
1️⃣ RPT gains are mostly in-domain
2️⃣ Math & code generalize well to each other
3️⃣ Structured skills do not transfer to unstructured, knowledge-intensive tasks
5/7

The takeaway? RPT is powerful but narrow
It improves performance where it’s trained, but generalizes poorly
6/7
This work is joint with @ChuxuanHu, @maxYuxuanZhu, @aokellermann, Caleb Biddulph, @PunWai, and @jasoncbenn
7/7