Honestly, a Consumer Reports-style panel of power users might be better than METR etc. for measuring AI progress, since it would be much more robust to spikiness. Not meant to sound skeptical: as a power user, I think there's been extremely noticeable progress over the past few months, fwiw.