This is an encouraging way to detect and correct deviations in a model's personas and alignment.
Anthropic, 2 August, 00:23
New Anthropic research: Persona vectors. Language models sometimes go haywire and slip into weird and unsettling personas. Why? In a new paper, we find "persona vectors": neural activity patterns controlling traits like evil, sycophancy, or hallucination.