You're in an ML Engineer interview at Stripe. The interviewer asks: "People often dispute transactions they actually made. How would you build a model that predicts these fake disputes without any labeled data?"

You: "I'll flag cards with high dispute rates."

Interview over.

Here's what you missed: there's a technique called active learning that lets you build supervised models when almost none of your data is labeled. It's far cheaper and faster than manually annotating the whole dataset. The idea is simple: get human feedback only on the examples where the model struggles most.

Here's how it works (a code sketch follows the steps):

↳ Start small: Manually label 1-2% of your data. Build your first model on this tiny dataset. It won't be good, but that's the point.

↳ Generate predictions: Run the model on the unlabeled data and capture confidence scores. Probabilistic models work well here: look at the gap between the top two predicted classes.

↳ Label strategically: Rank predictions by confidence. Have humans label only the lowest-confidence examples. There's no point labeling what the model already knows.

↳ Repeat and improve: Feed the newly labeled data back into the model. Train again. The model gets smarter about exactly what it doesn't know. Stop when performance meets your requirements.

...
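To make the loop concrete, here's a minimal sketch of one way to wire those steps together with margin-based uncertainty sampling. It assumes NumPy arrays, uses logistic regression as a stand-in base model, and treats `label_fn`, the batch size, and the 0.85 F1 stopping threshold as illustrative placeholders, not a prescribed setup.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score

def margin_scores(probs: np.ndarray) -> np.ndarray:
    """Gap between the top two class probabilities; a small gap means low confidence."""
    top_two = np.sort(probs, axis=1)[:, -2:]
    return top_two[:, 1] - top_two[:, 0]

def active_learning_loop(X_labeled, y_labeled, X_pool, X_val, y_val,
                         label_fn, batch_size=100, target_f1=0.85, max_rounds=10):
    """label_fn stands in for the human annotators (the 'oracle')."""
    model = None
    for round_idx in range(max_rounds):
        # 1) Train on whatever labels we have so far (the 1-2% seed at first).
        model = LogisticRegression(max_iter=1000)
        model.fit(X_labeled, y_labeled)

        # Stop once validation performance meets the requirement.
        f1 = f1_score(y_val, model.predict(X_val))
        print(f"round {round_idx}: F1={f1:.3f}, labeled={len(y_labeled)}")
        if f1 >= target_f1 or len(X_pool) == 0:
            break

        # 2) Score the unlabeled pool and rank by the top-two probability gap.
        probs = model.predict_proba(X_pool)
        uncertain_idx = np.argsort(margin_scores(probs))[:batch_size]

        # 3) Send only the lowest-confidence examples to human annotators.
        new_labels = label_fn(X_pool[uncertain_idx])

        # 4) Fold the new labels back in and repeat.
        X_labeled = np.vstack([X_labeled, X_pool[uncertain_idx]])
        y_labeled = np.concatenate([y_labeled, new_labels])
        X_pool = np.delete(X_pool, uncertain_idx, axis=0)

    return model
```

The only design choice that matters here is the ranking: by always spending the annotation budget on the smallest top-two gaps, each round of human labels goes exactly where the current model is least certain.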