some people say that an RL environment is just a docker container. others say it's just step() + reset(). why not make everyone happy?
i think it’s pretty clear that neither is the whole story, and this is the design challenge that verifiers aims to solve: anything that someone might reasonably consider an RL environment should be supported *naturally*, and the low-level primitives are built with this in mind
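to make the "both views can coexist" point concrete, here's a minimal sketch. it is not the verifiers API: `Env`, `CountdownEnv`, `SandboxedEnv`, and `rollout` are all hypothetical names made up for illustration, and the "sandbox" is just a wrapper object standing in for something that would really talk to a docker container.

```python
from dataclasses import dataclass, field
from typing import Callable, Protocol


class Env(Protocol):
    """hypothetical minimal interface: the "just step() + reset()" view."""

    def reset(self) -> str: ...
    def step(self, action: str) -> tuple[str, float, bool]: ...


@dataclass
class CountdownEnv:
    """in-process toy env: send "dec" until the counter hits zero."""

    start: int = 3
    n: int = field(default=0, init=False)

    def reset(self) -> str:
        self.n = self.start
        return f"n={self.n}"

    def step(self, action: str) -> tuple[str, float, bool]:
        if action == "dec":
            self.n -= 1
        done = self.n <= 0
        # reward is 1.0 only on the step that finishes the episode
        return f"n={self.n}", (1.0 if done else 0.0), done


@dataclass
class SandboxedEnv:
    """stand-in for the "it's a container" view.

    a real version would forward calls to a running container; this one
    (an assumption for the sketch) just wraps another env and logs calls.
    """

    inner: Env
    log: list[str] = field(default_factory=list)

    def reset(self) -> str:
        self.log.append("reset")
        return self.inner.reset()

    def step(self, action: str) -> tuple[str, float, bool]:
        self.log.append(action)
        return self.inner.step(action)


def rollout(env: Env, policy: Callable[[str], str], max_steps: int = 10) -> float:
    """run one episode against anything that satisfies Env; return total reward."""
    obs, total = env.reset(), 0.0
    for _ in range(max_steps):
        obs, reward, done = env.step(policy(obs))
        total += reward
        if done:
            break
    return total
```

the same `rollout` loop runs against either class, which is the whole point: the training side only sees the interface, and whether there's a sandbox behind it is an implementation detail.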
s/o @hallerite + @kcoopm for their work on these 🫡