Let's adapt this to the 5 min and fine-tune and train a vision model for this lmaoooo