It’s been really fun seeing Naman work. Evals are such a hard and interesting research space now that models are getting so good, and they're central to what Cursor does.