RLHF (reinforcement learning from human feedback)