In our 2020 paper, we defined deployment efficiency for RL algorithms. The conclusion: performance is bounded more by the frequency of deployments than by the number of samples. Online learning is the key, and it's exactly how "post-training" was popularized for LLMs. Sunday is 💯 #schmidhubering