AI agents are taking actions for us: booking flights, writing code, making decisions. But how do we know they're actually doing what they're supposed to? 🤔