DApp Store | Web3 Hub for events and games

will brown

reward hacking @primeintellect

will brown · 10 hours ago

one of my favorite parts of working at prime intellect is getting to pick the silly names whenever someone launches a new instance

3.51K

will brown · 11 hours ago

RL went from not working at all to working so well that code can have major correctness bugs and you don't notice because it still just works

17.45K

will brown · 14 hours ago

one of these days i'm gonna start squashing commits but today is not that day

4.79K

will brown · 15 hours ago

it’s a shame that we’re running out of internet data because everyone collectively stopped putting new content onto the internet

8.91K

will brown · 17 hours ago

ChatGPT should have a big green switch that says "Syco Mode"

4.66K

will brown reposted

Casper Hansen · 22 hours ago

Recipe to post-train Qwen3 1.7B into a DeepResearch model

What does it mean for something small to think deeply? Meet Lucy, a post‑trained Qwen3‑1.7B as a DeepResearch model based on @willccbb's verifiers.

Primary Rule-based Rewards:
- Answer correctness
  We check whether the final response literally contains the ground-truth answer. This substring match is cheap and avoids calling a larger LLM judge.
- Visit/search ratio
  If the agent visits at least as many pages as it issues search queries, it receives ((visit_search_ratio - 1) / 4) ** 0.25. If it searches more than it visits, the score is -0.5.

Format / Anti Reward-Hacking Rewards:
- Tool execution success
  Each API call that returns without an error counts. The reward is (successful_calls * unique_tools_used) / total_call_attempts.
- Thinking efficiency
  A skew-normal penalty centered at 70 tokens discourages endless chain-of-thought between tool calling while still allowing enough tokens for planning.

This is how Qwen3 1.7B learned to search, visit, and synthesize information. Small models can do deep research too!

32.52K
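The three rewards in the repost above that come with an explicit formula can be sketched in a few lines of Python. This is only an illustrative reading of the post, not the author's code and not the verifiers API: the function names, the 1.0/0.0 scoring for answer correctness, and the handling of zero searches or zero call attempts are assumptions. The thinking-efficiency penalty is left out because the post gives no scale or skew parameters.

```python
def answer_correctness(response: str, ground_truth: str) -> float:
    """Substring match: does the response literally contain the ground truth?

    Returning 1.0 / 0.0 is an assumption; the post only describes the check.
    """
    return 1.0 if ground_truth and ground_truth in response else 0.0


def visit_search_reward(visits: int, searches: int) -> float:
    """Visit/search ratio reward as stated in the post.

    Assumption: an episode with zero searches is treated like the
    "searches more than it visits" case, since the ratio is undefined.
    """
    if searches == 0 or visits < searches:
        return -0.5
    visit_search_ratio = visits / searches
    # Not clipped: the post gives no upper bound, so ratios above 5 exceed 1.0.
    return ((visit_search_ratio - 1) / 4) ** 0.25


def tool_success_reward(
    successful_calls: int, unique_tools_used: int, total_call_attempts: int
) -> float:
    """(successful_calls * unique_tools_used) / total_call_attempts.

    Assumption: no call attempts yields 0.0 rather than a division error.
    """
    if total_call_attempts == 0:
        return 0.0
    return (successful_calls * unique_tools_used) / total_call_attempts


if __name__ == "__main__":
    # Hypothetical episode: 2 searches, 4 page visits, 6 tool calls, 5 succeeded.
    print(answer_correctness("The capital of France is Paris.", "Paris"))
    print(visit_search_reward(visits=4, searches=2))
    print(tool_success_reward(5, 2, 6))
```

Under this reading, visiting exactly as many pages as searches scores 0, and the ratio reward only reaches 1.0 at five visits per search, so the agent is pushed to read pages rather than keep re-querying.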

will brown · 22 hours ago

if a model uses several sequential tool calls interleaved with chain-of-thought reasoning to answer a single question, this is:

11.56K

will brown · 23 hours ago

the concept of vagueposting about things that are already on github

5.46K

will brown · 23 hours ago

need to ship just a couple more little things and then can ship the big thing
upcoming era is gonna be soooo much fun
it’s really all coming together wow

3.49K

will brown · 22 Jul, 21:02

it's still crazy to me how much my life has totally changed in the past year. last summer i had just finished a CS theory phd, converted from banking intern to banking full-timer, and had just reached 1000 followers on here. yesterday i got recognized by someone on my flight

49.6K
