I got my multi-agent game from 83% successful executions to a little over 92%. Huge improvement! Here is the most important lesson I learned: move as much functionality as possible into tools and let the LLM call them. The less the LLM has to do itself, the better. An LLM is much better at deciding which tool to use than at doing that tool's work, even when the work is simple. I cut a 100-line prompt down to 25 lines plus 3 Python functions that implement all the functionality. The downside is more code, but the upside is that reliability is up by almost 10 percentage points.
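
A minimal sketch of what this pattern can look like, assuming a hypothetical turn-based game. The post does not show its code, so the game state, the three function names (`apply_damage`, `next_turn`, `winner`), and the shape of the tool call dict are all made up for illustration. The point is that the rules live in plain Python, and the LLM only has to emit a tool name and arguments:

```python
from typing import Any, Callable, Dict, Optional

State = Dict[str, Any]


def new_game() -> State:
    """Hypothetical game state: two players with hit points and a turn marker."""
    return {"hp": {"alice": 10, "bob": 10}, "turn": "alice"}


def apply_damage(state: State, target: str, amount: int) -> int:
    """Subtract damage, clamp at zero, and return the target's remaining hp."""
    state["hp"][target] = max(0, state["hp"][target] - amount)
    return state["hp"][target]


def next_turn(state: State) -> str:
    """Advance to the next player in order and return their name."""
    players = list(state["hp"])
    i = players.index(state["turn"])
    state["turn"] = players[(i + 1) % len(players)]
    return state["turn"]


def winner(state: State) -> Optional[str]:
    """Return the only player still standing, or None if the game goes on."""
    alive = [p for p, hp in state["hp"].items() if hp > 0]
    return alive[0] if len(alive) == 1 else None


# Registry of tools the model is allowed to call (names are illustrative).
TOOLS: Dict[str, Callable[..., Any]] = {
    "apply_damage": apply_damage,
    "next_turn": next_turn,
    "winner": winner,
}


def run_tool_call(state: State, call: Dict[str, Any]) -> Any:
    """Execute one tool call of the assumed form {"name": ..., "arguments": {...}}."""
    name = call["name"]
    if name not in TOOLS:
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](state, **(call.get("arguments") or {}))


if __name__ == "__main__":
    game = new_game()
    # In a real run this dict would come from the model's tool-call output.
    call = {"name": "apply_damage", "arguments": {"target": "bob", "amount": 3}}
    print(run_tool_call(game, call))  # prints 7
```

With this split, the prompt only needs to describe when to use each tool. Arithmetic, turn order, and win detection never pass through the model, so it cannot get them wrong.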