🤔 How do we train LLMs on real-world tasks where it’s hard to define a single verifiable answer?
Our work at @scale_AI introduces Rubrics as Rewards (RaR) — a framework for on-policy post-training that uses structured, checklist-style rubrics as interpretable reward signals. 🧵
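To make the core idea concrete, here is a minimal sketch of turning a checklist-style rubric into a scalar reward. All names (`Criterion`, `rubric_reward`, `toy_judge`) are illustrative, not from the RaR paper, and the keyword-matching judge is a hypothetical stand-in for an LLM judge scoring each criterion:

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    """One checklist item in a rubric, with an importance weight."""
    description: str
    weight: float


def rubric_reward(
    response: str,
    rubric: List[Criterion],
    judge: Callable[[str, Criterion], bool],
) -> float:
    """Aggregate per-criterion pass/fail judgments into a reward in [0, 1]."""
    total = sum(c.weight for c in rubric)
    if total == 0:
        return 0.0
    earned = sum(c.weight for c in rubric if judge(response, c))
    return earned / total


def toy_judge(response: str, criterion: Criterion) -> bool:
    # Hypothetical stand-in for an LLM judge: simple substring check.
    return criterion.description.lower() in response.lower()


rubric = [
    Criterion("dosage", 2.0),        # essential criterion, higher weight
    Criterion("side effects", 1.0),  # secondary criterion
]

reward = rubric_reward(
    "The typical dosage is 10mg; watch for side effects.", rubric, toy_judge
)
```

The weighted sum makes the reward interpretable: you can read off exactly which checklist items a response satisfied and how much each contributed.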