RL is a powerful mechanism for training company-specific models on a company's unique work and data. This is what we do at Applied Compute. A key challenge is making RL efficient, because we need runs to be fast (delivered in days), cheap (scalable unit economics), and predictable (not just fast, but reliably fast).

Here are some takeaways:

• Synchronous RL is wasteful with time and compute.
• Asynchronous RL is more efficient but introduces staleness, which causes learning instabilities.
• Modeling and simulation can analytically solve for the configuration that yields optimal efficiency. This lets us rapidly prototype training configurations without burning expensive compute cycles on trial runs (a toy version of such a model is sketched below).

Two of our co-founders, @rhythmrg and @lindensli, discussed some of this research at @aiDotEngineer recently, with a focus on the following subproblem: what is the highest-throughput way to do RL given a maximum staleness and compute budget?
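As a toy illustration of that modeling approach, here is a minimal discrete-event sketch in Python. It assumes a deliberately simplified setup: G generator workers each take t_gen seconds to produce one rollout batch, the trainer takes t_train seconds per step, and a batch is usable only if the policy has advanced at most max_staleness versions since the rollout began (stale batches are discarded as wasted compute). All names and numbers are illustrative assumptions, not our actual system.

```python
import heapq

def simulate(num_generators, t_gen, t_train, max_staleness, num_steps):
    """Return wall-clock time to complete `num_steps` trainer steps."""
    clock = 0.0
    version = 0  # current policy version at the trainer
    ready = []   # heap of (finish_time, policy_version_when_started)
    # Every generator starts one rollout immediately with version 0.
    for _ in range(num_generators):
        heapq.heappush(ready, (t_gen, version))
    steps_done = 0
    while steps_done < num_steps:
        finish, gen_version = heapq.heappop(ready)
        clock = max(clock, finish)  # wait for the next batch to arrive
        # Staleness = policy versions elapsed since the rollout began.
        # Batches past the cap are discarded: compute spent for nothing.
        if version - gen_version <= max_staleness:
            clock += t_train  # trainer consumes the batch
            version += 1
            steps_done += 1
        # The freed generator immediately starts a new rollout against
        # the latest policy version.
        heapq.heappush(ready, (clock + t_gen, version))
    return clock

if __name__ == "__main__":
    # Sweep the generator count to see where throughput saturates under
    # a staleness cap -- the knob the subproblem above asks us to tune.
    for g in (1, 2, 4, 8, 16):
        t = simulate(g, t_gen=30.0, t_train=5.0,
                     max_staleness=4, num_steps=200)
        print(f"{g:2d} generators: {200 / t:.3f} steps/sec")
```

Even this crude model shows the trade-off: too few generators starve the trainer, while too many push batches past the staleness cap and burn compute on discarded rollouts. A real analysis would optimize over this trade-off analytically rather than by sweeping.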