Holy shit… Stanford just showed why LLMs sound smart but still fail the moment reality pushes back.

This paper tackles a brutal failure mode everyone building agents has seen: give a model an under-specified task and it happily hallucinates the missing pieces, producing a plan that looks fluent and collapses on execution.

The core insight is simple but devastating for prompt-only approaches: reasoning breaks when preconditions are unknown. And most real-world tasks are full of unknowns.

Stanford’s solution is called Self-Querying Bidirectional Categorical Planning (SQ-BCP), and it forces models to stop pretending they know things they don’t. Instead of assuming missing facts, every action explicitly tracks its preconditions as:

• Satisfied
• Violated
• Unknown

Unknown is the key. When the model hits an unknown, it’s not allowed to proceed. It must either:

1. Ask a targeted question to resolve the missing fact, or
2. Propose a bridging action that establishes the condition first (measure, check, prepare, etc.)

Only after all preconditions are resolved can the plan continue (see the first sketch below).

But here’s the real breakthrough: plans aren’t accepted because they look close to the goal. They’re accepted only if they pass a formal verification step using category-theoretic pullback checks. Similarity scores are used only for ranking, never for correctness (see the second sketch below).

...
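To make the three-state precondition rule concrete, here’s a minimal Python sketch. The post doesn’t show the paper’s actual data structures, so every name here (`Status`, `Action`, `next_step`, the toy preconditions) is hypothetical; it only illustrates the rule that planning halts on Unknown until a targeted question or bridging action resolves it.

```python
from enum import Enum, auto
from dataclasses import dataclass, field

class Status(Enum):
    SATISFIED = auto()
    VIOLATED = auto()
    UNKNOWN = auto()   # the crucial third state: "I don't know" is first-class

@dataclass
class Action:
    name: str
    # map: precondition name -> current status
    preconditions: dict[str, Status] = field(default_factory=dict)

def next_step(action: Action):
    """Decide what the planner may do next for this action.

    SQ-BCP-style rule (as described in the post): an action may only
    execute once every precondition is SATISFIED. On UNKNOWN the planner
    must ask a targeted question or insert a bridging action; on VIOLATED
    it must insert a bridging action that establishes the condition.
    """
    for cond, status in action.preconditions.items():
        if status is Status.UNKNOWN:
            return ("ask_or_bridge", cond)   # two legal moves, no guessing
        if status is Status.VIOLATED:
            return ("bridge", cond)          # e.g. measure, check, prepare
    return ("execute", action.name)

# Toy example: one unresolved fact blocks execution until it is resolved.
pour = Action("pour_coffee", {"cup_is_clean": Status.UNKNOWN,
                              "pot_is_full": Status.SATISFIED})
print(next_step(pour))   # ('ask_or_bridge', 'cup_is_clean')

pour.preconditions["cup_is_clean"] = Status.SATISFIED   # resolved via query
print(next_step(pour))   # ('execute', 'pour_coffee')
```

The point of the enum is that Unknown is not collapsed into False: the planner is structurally forbidden from treating a missing fact as either satisfied or violated.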
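The post also doesn’t show the actual pullback construction, so the following is a minimal sketch of one plausible reading in the category of finite sets: a plan verifies only when its observed final state agrees exactly with an observed goal state (a non-empty pullback over a shared observation map), and similarity merely orders the plans that already passed. All names (`pullback`, `verify`, `rank`, `observe`) are illustrative assumptions, not the paper’s API.

```python
from itertools import product

def pullback(f, g, A, B):
    """Pullback of f: A -> C and g: B -> C in the category of finite sets:
    all pairs (a, b) with f(a) == g(b). Non-empty means the two maps
    genuinely agree somewhere in the shared codomain C."""
    return [(a, b) for a, b in product(A, B) if f(a) == g(b)]

def verify(plan_states, goal_states, observe):
    """A plan passes only if some reachable state and some goal state
    become literally equal under the shared observation map.
    Exact agreement, not 'close enough'."""
    return len(pullback(observe, observe, plan_states, goal_states)) > 0

def rank(verified_plans, similarity):
    """Similarity only orders plans that already passed verification;
    it can never promote an unverified plan."""
    return sorted(verified_plans, key=similarity, reverse=True)

# Toy example: states are dicts; we observe only the fields the goal mentions.
observe = lambda s: (s["door"], s["light"])
goal = [{"door": "open", "light": "on"}]

plan_a = [{"door": "open", "light": "on", "battery": 0.4}]   # verifies
plan_b = [{"door": "open", "light": "off", "battery": 0.9}]  # "close", but fails

candidates = [p for p in (plan_a, plan_b) if verify(p, goal, observe)]
best = rank(candidates, similarity=lambda p: p[0]["battery"])
print(best)   # only plan_a survives; similarity never rescued plan_b
```

Note the separation of concerns: `verify` is a hard gate, `rank` is a soft preference, and no similarity score can push `plan_b` past the gate.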