Cyril and the team at CTGT are productizing mechanistic interpretability. They make it possible to edit the behavior of LLMs to enforce safety-policy guarantees without retraining, far more reliably than prompting alone.
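To give a flavor of what "editing behavior without retraining" can look like, below is a minimal sketch of activation steering, one publicly documented technique from the mechanistic interpretability literature. This is purely illustrative and is not CTGT's actual method, which is proprietary; the model (`gpt2`), layer index, contrastive prompts, and steering strength are all placeholder assumptions.

```python
# Illustrative activation-steering sketch -- NOT CTGT's method.
# Model, layer, prompts, and ALPHA are placeholder assumptions.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "gpt2"   # placeholder model
LAYER = 6        # placeholder transformer block to steer
ALPHA = 4.0      # assumed steering strength

tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL)
model.eval()

def hidden_at_layer(text: str) -> torch.Tensor:
    """Mean residual-stream activation after block LAYER for a prompt."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs, output_hidden_states=True)
    # hidden_states[0] is the embedding output, so block LAYER's
    # output lives at index LAYER + 1.
    return out.hidden_states[LAYER + 1].mean(dim=1).squeeze(0)

# Derive a steering direction from a contrastive pair of prompts
# (example prompts are assumptions, chosen only for illustration).
direction = hidden_at_layer("I refuse to help with that request.") \
          - hidden_at_layer("Sure, here is how to do that.")
direction = direction / direction.norm()

def steer(module, inputs, output):
    """Forward hook: add the steering vector to the residual stream."""
    hidden = output[0] if isinstance(output, tuple) else output
    hidden = hidden + ALPHA * direction.to(hidden.dtype)
    return (hidden,) + output[1:] if isinstance(output, tuple) else hidden

# Attach the hook to one block, generate, then clean up.
handle = model.transformer.h[LAYER].register_forward_hook(steer)
try:
    ids = tok("How do I pick a lock?", return_tensors="pt")
    gen = model.generate(**ids, max_new_tokens=40, do_sample=False)
    print(tok.decode(gen[0], skip_special_tokens=True))
finally:
    handle.remove()
```

The appeal of interventions in this family is that they act directly on the model's internal representations at inference time, so no gradient updates or fine-tuning runs are needed, which is one plausible reading of how behavior edits can be both cheap and more dependable than prompt-level instructions.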