im only 5m in and it seems like obviously reasoning models need an amygdala current approach (afaik) is like alphago with a policy network only doesn't have to be a separate model (later alphazero used a combined value+policy model) but gotta train that "am i winning" output