🚨 China's DeepSeek just dropped the only open-source model good enough at math to win IMO Gold, and a must-read report!

The key idea draws from things Karpathy and others have spoken about: move beyond “final answer RL” into a generator–verifier–meta-verifier loop in pure language.

– A verifier is RL-trained to score proofs.
– A meta-verifier checks the verifier’s critiques.
– A generator is RL-trained on verifier reward signals to write and self-check better proofs.

Because everything lives in natural language (no Lean), this recipe SHOULD extend to many verifiable domains: science, code, anywhere checking is easier than solving!
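
To make the three roles concrete, here is a toy sketch of the loop. It is not DeepSeek's code or training recipe: the "proofs" are just lists of arithmetic steps like `1+2=3`, the verifier and meta-verifier are hand-written rules rather than RL-trained language models, and best-of-n selection stands in for RL on the generator. Every name in it (`Critique`, `verifier`, `sloppy_verifier`, `meta_verifier`, `generator_reward`, `best_of_n`) is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Critique:
    score: float   # fraction of steps the verifier accepts
    flagged: list  # indices of steps the verifier claims are wrong


def step_is_valid(step: str) -> bool:
    """Ground-truth check for a toy step of the form 'a+b=c' (assumed format)."""
    lhs, rhs = step.split("=")
    a, b = lhs.split("+")
    return int(a) + int(b) == int(rhs)


def verifier(proof: list) -> Critique:
    """Toy verifier: flags the steps it believes are wrong and scores the proof."""
    flagged = [i for i, step in enumerate(proof) if not step_is_valid(step)]
    return Critique(score=1 - len(flagged) / len(proof), flagged=flagged)


def sloppy_verifier(proof: list) -> Critique:
    """Deliberately bad verifier: always blames step 0, whether or not it is wrong."""
    return Critique(score=1 - 1 / len(proof), flagged=[0])


def meta_verifier(proof: list, critique: Critique) -> bool:
    """Toy meta-verifier: accept a critique only if every flagged step is really wrong."""
    return all(not step_is_valid(proof[i]) for i in critique.flagged)


def generator_reward(proof: list, verify=verifier) -> float:
    """The generator's reward is the verifier's score for its proof."""
    return verify(proof).score


def best_of_n(candidates: list) -> list:
    """Stand-in for RL on the generator: keep the candidate the verifier likes best."""
    return max(candidates, key=generator_reward)


good = ["1+2=3", "3+4=7"]
bad = ["1+2=4", "3+4=7"]

print(generator_reward(good))                       # verifier score for a fully correct proof
print(generator_reward(bad))                        # verifier score when one of two steps is wrong
print(meta_verifier(good, sloppy_verifier(good)))   # does the sloppy critique survive the check?
print(best_of_n([bad, good]))                       # candidate picked by verifier reward
```

The point of the extra layer: a verifier that hands out plausible scores with made-up critiques (like `sloppy_verifier` here) gets caught by the meta-verifier, so the generator's reward signal is harder to game.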