One of the major themes in how I architected the agent that's building this: it spends way more time refactoring, testing, and benchmarking the existing code than it does adding new features. Only ~1 in 10 context windows adds a new feature.
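
As a rough illustration of that ratio, here is a minimal sketch of a scheduling policy that hands out one feature window per ten and cycles through maintenance work the rest of the time. Everything in it is an assumption for illustration: the names `plan_windows`, `MAINTENANCE_TASKS`, and `FEATURE_EVERY` are hypothetical, and the fixed every-tenth-window rule is just one simple way to hit a roughly 1-in-10 split, not necessarily how the actual agent decides.

```python
from itertools import cycle

# Hypothetical task labels; the real agent may categorize work differently.
MAINTENANCE_TASKS = ("refactor", "test", "benchmark")

# Assumed from the ~1-in-10 ratio; a real policy could be adaptive or random.
FEATURE_EVERY = 10


def plan_windows(n_windows: int, feature_every: int = FEATURE_EVERY) -> list[str]:
    """Assign one task type to each context window.

    Every `feature_every`-th window is a feature window; all other windows
    rotate through the maintenance tasks in order.
    """
    maintenance = cycle(MAINTENANCE_TASKS)
    plan = []
    for i in range(1, n_windows + 1):
        if i % feature_every == 0:
            plan.append("feature")
        else:
            plan.append(next(maintenance))
    return plan


if __name__ == "__main__":
    for window, task in enumerate(plan_windows(20), start=1):
        print(f"window {window:2d}: {task}")
```

The point of the sketch is only that feature work is the exception in the schedule: nine of every ten windows go to hardening what already exists.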