A project I'm very happy to see released, led by @couplefire12 during his internship at Together 🔥 If you're curious about reasoning with RL in non-verifiable setups, do take a look!