Want to get an LLM agent to succeed in an out-of-distribution (OOD) environment? We tackle the hardest case with SPA (Self-Play Agent): no extra data, tools, or stronger models, just pure self-play. We first internalize a world model via self-play, then learn how to win with RL, like a child playing with the environment simply to learn “what if I do this?” Below, we share our findings on: What goes wrong in OOD environments? What are the key factors that let self-play succeed? (1/8)