The co-founders of @flappyairplanes call the current RL paradigm for model training "environment slop." They explain: "The reinforcement paradigms of today are shockingly inefficient. You don't really get much generalization across tasks; you teach a model through one kind of learning and then you teach it the next one. It's kind of like whack-a-mole. We look at this and think it's kind of crazy. The next paradigm of AI will not be environment slop."

They add: "Human-level intelligence is not the ceiling; it is merely the floor on what is possible. If you can train models with vastly less data and possibly more compute in very different ways, what is going to happen? We actually don't know. But I do think they'll be different and weird, and they'll have interesting capabilities that we'll find really valuable ways to use."