“Next-token prediction” just got a serious rival 🤯

Ant Group just dropped LLaDA 2.1, and it challenges the dominant paradigm of LLMs. Unlike most models, which generate one token at a time, LLaDA 2.1 uses diffusion to generate blocks of text in parallel.

Why this changes everything:

→ Global Planning: it effectively sees the "future" while writing the "past"
→ Parallel Generation: it decodes chunks in parallel rather than sequentially, making it much faster
→ Massive Efficiency: a 16B MoE architecture that activates only ~1.4B parameters per step

100% Open Source.
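For intuition, here's a minimal toy sketch of the block-diffusion idea in plain PyTorch. All names are hypothetical (this is not LLaDA's actual API, and the "model" is a random stand-in): start with a fully masked block, predict every position in parallel at each step, and commit the most confident tokens until the block is filled.

```python
import torch

# Toy block-diffusion decoding sketch (hypothetical, NOT LLaDA's real API):
# start fully masked, predict every position in parallel, and commit the
# most confident tokens each step until the block is filled.

VOCAB, BLOCK, STEPS, MASK = 100, 8, 4, -1

def dummy_denoiser(tokens: torch.Tensor) -> torch.Tensor:
    # Stand-in for the real model: random logits for every position at once.
    return torch.randn(len(tokens), VOCAB)

block = torch.full((BLOCK,), MASK)                 # fully masked block
for step in range(STEPS):
    probs = dummy_denoiser(block).softmax(dim=-1)  # all positions in ONE pass
    conf, pred = probs.max(dim=-1)
    conf[block != MASK] = -1.0                     # never re-commit filled slots
    for i in conf.topk(BLOCK // STEPS).indices:    # keep the most confident picks
        block[i] = pred[i]
    print(f"step {step}: {block.tolist()}")
```

The contrast with autoregressive decoding: that needs one forward pass per token, while here the whole block shares each pass. That shared pass is where the parallel speedup, and the "seeing the future while writing the past," comes from.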