Has OpenAI achieved very-long-episode RL with this experimental model? Screenshot from @natolambert's article "What comes next with reinforcement learning". Nathan writes: "Where current methods are generating 10K-100K tokens per answer for math or code problems during training, the sort of problems people discuss applying next-generation RL training to would be 1M-100M tokens per answer. This involves wrapping multiple inference calls, prompts, and interactions with an environment within one episode that the policy is updated against." Maybe this breakthrough is a combination of both: very-long-episode RL and scaling test-time compute (TTC) to 1M-100M tokens per answer!
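To make the "one episode wraps many inference calls" idea concrete, here is a minimal sketch of what such a training loop might look like. This is not OpenAI's or Nathan's actual setup; the `Policy`/`Env` interfaces (`generate`, `reset`, `render_prompt`, `step`, `update`) and the token budget are all hypothetical, chosen only to illustrate the structure.

```python
# Sketch of a "very-long-episode" RL loop: many model calls and environment
# interactions are wrapped into ONE episode, and the policy is updated
# against the whole trajectory. All interfaces here are hypothetical.
from dataclasses import dataclass, field


@dataclass
class Episode:
    """One very long episode: every prompt, completion, and reward."""
    steps: list = field(default_factory=list)
    total_tokens: int = 0
    reward: float = 0.0


def run_episode(policy, env, max_tokens=10_000_000):
    """Roll out one episode, chaining inference calls until the env is done
    or a token budget (1M-100M in the article's framing) is exhausted."""
    episode = Episode()
    observation = env.reset()
    done = False
    while not done and episode.total_tokens < max_tokens:
        prompt = env.render_prompt(observation)   # build the next prompt
        completion = policy.generate(prompt)      # one inference call
        observation, reward, done = env.step(completion)
        episode.steps.append((prompt, completion, reward))
        episode.total_tokens += len(completion)   # crude token-count proxy
        episode.reward += reward
    return episode


def train(policy, env, num_episodes=100):
    """The policy is updated once per episode, against the entire
    multi-call trajectory, not per individual model call."""
    for _ in range(num_episodes):
        episode = run_episode(policy, env)
        policy.update(episode)  # e.g., a policy-gradient step over all steps
```

The point of the sketch is the shape of the loop: `policy.update` sees the whole multi-call trajectory as a single episode, which is why credit assignment over 1M-100M tokens is the hard part of scaling this.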
Alexander Wei, 19 Jul at 15:50
5/N Besides the result itself, I am excited about our approach: We reach this capability level not via narrow, task-specific methodology, but by breaking new ground in general-purpose reinforcement learning and test-time compute scaling.