We are sharing an early preview of our ongoing SWE-1.6 training run. It significantly improves on SWE-1.5 while being post-trained on the same pre-trained model, and it runs just as fast at 950 tok/s. On SWE-Bench Pro it exceeds top open-source models. The preview model still exhibits some undesirable behaviors, like overthinking and excessive self-verification, which we aim to improve. We are rolling out early access to a small subset of users in Windsurf.
We refined our RL recipe and scaled our infrastructure to unlock two orders of magnitude more compute than was used to train SWE-1.5. We significantly scaled the number of RL environments and see continued improvements with further RL training.
It has been fun to observe the model learning to think harder and iterate for more turns on hard SWE-Bench Pro problems. On the flip side, we observe overthinking and excessive self-verification in our own dogfooding. Figuring out the right balance between interactivity and long-horizon thinking is an active area of research.
We optimized our training stack to run 6x faster than 3 months ago. For example, our algorithm now tolerates higher staleness, allowing us to fully utilize our inference engines. In our blog post, we share more details about our training optimizations and how we manage GPU allocation for async RL.
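To illustrate what staleness tolerance buys in async RL, here is a minimal sketch (an illustrative assumption, not our actual training code): inference workers tag each rollout with the policy version that produced it, and the learner accepts any rollout whose version lag is within a `max_staleness` budget, so workers never block waiting for the newest weights.

```python
# Hypothetical sketch of staleness-tolerant async RL scheduling.
# Not the SWE-1.6 training stack; names and thresholds are illustrative.
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Rollout:
    policy_version: int          # version of the weights that produced this rollout
    trajectory: list = field(default_factory=list)  # placeholder for tokens/actions


class StalenessTolerantLearner:
    def __init__(self, max_staleness: int):
        self.max_staleness = max_staleness
        self.version = 0         # current learner policy version
        self.buffer = deque()    # rollouts streaming in from inference workers

    def submit(self, rollout: Rollout) -> None:
        self.buffer.append(rollout)

    def next_batch(self, batch_size: int) -> list:
        """Pop up to batch_size rollouts, discarding ones that are too stale."""
        batch = []
        while self.buffer and len(batch) < batch_size:
            r = self.buffer.popleft()
            if self.version - r.policy_version <= self.max_staleness:
                batch.append(r)  # fresh enough to train on (with off-policy correction)
            # else: silently dropped; workers keep generating without resyncing
        return batch

    def step(self, batch: list) -> None:
        if batch:                # each gradient update bumps the policy version
            self.version += 1
```

With `max_staleness=0` the learner is fully on-policy and inference engines idle between updates; raising the budget lets them generate continuously at the cost of slightly off-policy data.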