some slides from my talk at @PyTorch conf earlier this week about the design choices of verifiers and how we've been building the flagship ecosystem for open RL environments :)
notably:

- we think that the right encapsulation for an environment is an installable Python package which implements a factory function, and which can manage external resources either via a library of prebuilt components or via its own custom launchers
- we think that the OpenAI Chat Completions API is the right level of abstraction for most developers building environments, with OpenAI Completions as an option for the fraction of cases requiring more fine-grained control
- we think that trainer and environment framework developers should bear the burden of exposing clean and familiar primitives to environment builders, which mirror the development experience of building static agents or evals
- we think that RL environments for LLMs bring unique challenges vs previous eras of RL, and that abstractions should evolve to account for this
- we think that containers are important for lots of environments, but shouldn't be mandatory for environments that don't need them
- we think that building this ecosystem is a global challenge, requiring nuanced and open discussions amongst interested stakeholders to ensure that everyone can benefit

we spend a lot of time thinking about this stuff, debating tradeoffs, iterating, and experimenting. if there's something you need that we don't yet support, or suggestions on how we can improve, we're all ears :)
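to make the first two points concrete, here's a minimal sketch of what "an installable package implementing a factory function" could look like, with Chat Completions-style message lists as the interface. all names here (`load_environment`, `Environment`, `rollout_prompt`, `score`) are illustrative assumptions for this sketch, not the actual verifiers API:

```python
from dataclasses import dataclass, field


@dataclass
class Environment:
    """Sketch of an environment: builds prompts and scores completions.

    Hypothetical shape for illustration -- a real environment would also
    manage external resources (sandboxes, services) via prebuilt
    components or custom launchers.
    """
    system_prompt: str
    dataset: list = field(default_factory=list)

    def rollout_prompt(self, example: dict) -> list[dict]:
        # The Chat Completions message-list format as the abstraction
        # boundary between environment and model.
        return [
            {"role": "system", "content": self.system_prompt},
            {"role": "user", "content": example["question"]},
        ]

    def score(self, example: dict, completion: str) -> float:
        # Toy exact-match reward; real environments would use richer
        # rubrics or verifier functions.
        return 1.0 if example["answer"] in completion else 0.0


def load_environment(**kwargs) -> Environment:
    """Factory function: the single entry point a trainer or eval
    harness imports from the installed package."""
    return Environment(
        system_prompt=kwargs.get("system_prompt", "Answer concisely."),
        dataset=[{"question": "What is 2+2?", "answer": "4"}],
    )
```

the nice property of the factory-function shape is that a trainer only needs `pip install` plus one import to get a fully configured environment, and the same package doubles as a static eval harness.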