The real win from being able to train a 1T-parameter model on a “shoestring” budget isn’t the cost savings. It’s the efficiency gain that lets you iterate faster. Pay attention to the slope. For as long as I can remember, the best deep learning models have come from the labs that iterate the fastest.