Diffusion for everything! We share a recipe that starts from a pretrained autoregressive VLM and, with very little training compute and some nice annealing tricks, turns it into a SOTA diffusion VLM. Research on diffusion for language is progressing very quickly and, in my mind, offers as promising a path to unifying modalities as the 'omni' autoregressive models. Amazing work led by @mariannearr @ServerProcessor over the summer.