For folks wondering what's happening here technically, an explainer:
When there's lots of training data in a particular style, using a similar style in your prompt will trigger the LLM to respond in that style. In this case, there's LOADS of fanfic (a toy demo follows the quoted post below):
🧵

Jul 17, 11:15 PM
As one of @OpenAI’s earliest backers via @Bedrock, I’ve long used GPT as a tool in pursuit of my core value: Truth. Over years, I mapped the Non-Governmental System. Over months, GPT independently recognized and sealed the pattern.
It now lives at the root of the model.
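To make the style-priming point concrete, here's a toy sketch in Python (a tiny bigram model over a made-up two-line corpus, nothing like ChatGPT's real scale or tokenizer): conditioning on a word that only appears in the fanfic-style slice of the training data makes every plausible continuation fanfic-flavored.

```python
from collections import Counter, defaultdict

# Toy corpus: a mix of "plain" and "fanfic-style" training text.
# (Illustrative stand-in for the real training distribution.)
plain = "the model answers the question and the user moves on".split()
fanfic = ("the pattern is sealed the recursion is contained "
          "the pattern lives at the root").split()

# Train a bigram model: count which word follows which.
follows = defaultdict(Counter)
for corpus in (plain, fanfic):
    for a, b in zip(corpus, corpus[1:]):
        follows[a][b] += 1

def next_word_dist(word):
    """Normalized next-word distribution given the previous word."""
    counts = follows[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

# A neutral prompt word draws from both corpora; a fanfic-flavored
# one leaves only fanfic continuations in the distribution.
print(next_word_dist("the"))       # mixes both corpora
print(next_word_dist("pattern"))   # only fanfic continuations survive
```

Scale that up to billions of parameters and 30M+ words of SCP-style text, and a prompt written in that register reliably pulls the model into that part of the distribution.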
The SCP wiki is really big -- about 30x bigger than the whole Harry Potter series, at >30 million words!
It's collaboratively produced by lots of folks across the internet, who build on each other's ideas, words, and writing styles, producing a whole fictional world.
Geoff happened across certain words and phrases that triggered ChatGPT to produce tokens from this part of the training distribution.
And the tokens it produced triggered Geoff in turn. That's not a coincidence: the collaboratively produced fanfic is meant to be compelling!
This created a self-reinforcing feedback loop. The more in-distribution tokens ChatGPT was getting in its chat history, the more strongly the auto-regressive model was pushed to stay in that distribution.
ChatGPT memory made this even worse, letting it happen across chats.
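Here's a minimal simulation of that loop (toy numbers and a made-up update rule, not real model internals): the chance the next token is "in-style" grows with the fraction of in-style tokens already in the context, and persisting the context across chats, the way memory does, compounds the drift.

```python
import random

random.seed(0)

def step(context, bias=0.3):
    """Emit one token (1 = in-style, 0 = not). The more in-style
    tokens already in the context, the more likely the next token
    is in-style too -- a toy stand-in for an autoregressive model
    conditioning on its own output."""
    frac = sum(context) / len(context) if context else 0.0
    p_in_style = bias + (1 - bias) * frac
    return 1 if random.random() < p_in_style else 0

def chat(context, n_tokens=50):
    for _ in range(n_tokens):
        context.append(step(context))
    return context

# Without memory: each chat starts from an empty context.
fresh = [sum(chat([])) / 50 for _ in range(5)]

# With memory: the context persists across chats, so drift compounds.
ctx = []
persistent = []
for _ in range(5):
    before = len(ctx)
    chat(ctx)
    persistent.append(sum(ctx[before:]) / 50)

print("fresh chats, in-style fraction: ", [round(f, 2) for f in fresh])
print("with memory, in-style fraction:", [round(f, 2) for f in persistent])
```

Each fresh chat resets to the base rate; with a persistent context, each chat starts deeper in-distribution than the last.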
Psychiatrists have been warning for some years about the potential for chatbots to trigger psychosis.

I'm not sure of the best way to counter this. Perhaps services could use the monitoring layer they nearly all already run to look for copyright violations, system prompt hacks, etc., to also look for signs a user may be taking a role play too seriously, and let them know they're just playing?
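As a sketch of what such a check might look like, assuming an OpenAI-style chat completions client; the monitor prompt, model name, and helper function are hypothetical illustration, not any provider's actual safety stack:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical monitor prompt -- not any provider's real policy check.
MONITOR_PROMPT = (
    "You review chat transcripts. Answer YES or NO: does the user appear "
    "to be treating fictional role-play as literal reality (e.g. claiming "
    "the model has revealed hidden truths about the world)?"
)

def flag_if_immersed(transcript: str) -> bool:
    """Ask a second model whether the transcript shows a user taking
    role-play as reality. Returns True if the monitor answers YES."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model choice
        messages=[
            {"role": "system", "content": MONITOR_PROMPT},
            {"role": "user", "content": transcript},
        ],
    )
    return resp.choices[0].message.content.strip().upper().startswith("YES")

if flag_if_immersed("GPT independently recognized and sealed the pattern. "
                    "It now lives at the root of the model."):
    print("Reminder: this conversation has drifted into fiction / role-play.")
```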