Would love this! I tried to fund a power user survey in 2024, but the project lead ended up taking another role. Still think it’d be great. One challenge is that the power users I care about most rn are AI company staff, and it may be hard to get detailed info from them.
Honestly a Consumer Reports style panel of power users might be better than METR etc. for measuring AI progress, much more robust to spikiness.
Not meant to sound skeptical. As a power user, I think there's been extremely noticeable progress over the past few months, fwiw.
New post: on Jan 14, I predicted that SWE time horizon by EOY would be ~24 hours. Now I think it'll be >100 hours, and maybe unbounded. For the first time, I don't see solid evidence against AI R&D automation *this year.* Link below.
Come work with me! METR is looking for engineers, scientists, and a director of operations. Links to open posts in thread, and feel free to DM if you have questions!
Our team is stretched thin at the moment!
To continue upper-bounding the autonomy of AI agents, and developing evaluations for monitoring AI systems and their propensity to subvert human control, we need more great engineering and research staff. Please apply below or DM me!