
An $800M company exists because evals were so broken that the founder had to build the same internal tool twice, at two different companies, before anyone would pay for it.

First at his own startup. Then again leading the AI team at Figma. Same problem both times: teams shipping AI features had no structured way to know if the outputs were getting better or worse. They were vibe-checking. Manually reading outputs. Guessing.

That's how BrainTrust started. And now Vercel, Replit, Ramp, Zapier, Notion, and Airtable all use it.

The number that reframes this: the companies whose AI products actually work are running 12.8 eval experiments per day. Think about that cadence. Most AI teams I talk to aren't running 12.8 per month.

The framework is simpler than people expect. Every eval is three things: a set of inputs your product handles, a task that generates outputs, and a scoring function that produces a number between 0 and 1.

In this episode, we built one from scratch on camera. Score went from 0 to 0.75 in under 20 minutes.

Evals are becoming the new PRD. The PMs who build eval infrastructure now are going to compound product quality in a way that PMs who keep vibe-checking simply cannot match. The gap is already opening.
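To make the three-part framework concrete, here is a rough sketch in plain Python. It is not BrainTrust's SDK and not the eval we built in the episode. The sentiment task, the five sample reviews, and the names `Case`, `task`, `exact_match`, and `run_eval` are all made up for illustration. A real eval would call your model inside `task` and would likely use a richer scorer.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Case:
    """One eval example: an input the product handles and the answer we want."""
    input: str
    expected: str


def task(text: str) -> str:
    """Hypothetical stand-in for the AI feature: a naive keyword sentiment rule."""
    lowered = text.lower()
    if "love" in lowered or "great" in lowered:
        return "positive"
    return "negative"


def exact_match(output: str, expected: str) -> float:
    """Scoring function: 1.0 when the output matches the expected label, else 0.0."""
    return 1.0 if output == expected else 0.0


def run_eval(
    cases: List[Case],
    task_fn: Callable[[str], str],
    scorer: Callable[[str, str], float],
) -> float:
    """Run the task on every input and return the mean score, a number in [0, 1]."""
    scores = [scorer(task_fn(case.input), case.expected) for case in cases]
    return sum(scores) / len(scores)


# Illustrative data only: the last two cases are ones the keyword rule gets wrong.
CASES = [
    Case("I love this app", "positive"),
    Case("Great onboarding", "positive"),
    Case("Crashes every time", "negative"),
    Case("Not great, would not recommend", "negative"),
    Case("Works fine", "positive"),
]

if __name__ == "__main__":
    print(f"score: {run_eval(CASES, task, exact_match):.2f}")
```

The point of the structure is that the inputs and the scorer stay fixed while you change `task`. Rerun it after every prompt or model change and the score tells you whether the change helped or hurt, instead of a vibe check.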
