🚨 NVIDIA just did the impossible. They trained a 12B-parameter language model on 10 trillion tokens entirely in 4-bit precision. It's called NVFP4, and it might redefine how frontier AI models are trained.

Here's why this matters:

• NVFP4 delivers 2–3× faster math throughput and 50% less memory vs FP8
• Accuracy? Practically identical (MMLU-Pro: FP8 = 62.62%, NVFP4 = 62.58%)
• Stability issues? Solved using random Hadamard transforms, stochastic rounding, and 2D scaling (rough sketch below)
• Trained entirely on NVIDIA Blackwell GPUs: the first 4-bit run to stay stable across 10T tokens

This is the first successful demonstration of large-scale 4-bit pretraining without losing accuracy. The next generation of frontier models will be faster, cheaper, and greener, without compromise.
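
For the technically curious, here's a rough Python sketch of the core ideas: a random Hadamard rotation to spread outliers across a block, per-block scaling into the FP4 range, and stochastic rounding to the FP4 (E2M1) grid. Everything here is illustrative. The block size of 16, the value grid, and all function names are my assumptions, not NVIDIA's actual kernels, and only one scaling axis is shown rather than the paper's 2D scheme.

```python
import numpy as np

rng = np.random.default_rng(0)

# Representable magnitudes of the FP4 E2M1 format; with sign, the full grid.
FP4_LEVELS = np.array([0.0, 0.5, 1.0, 1.5, 2.0, 3.0, 4.0, 6.0])
FP4_GRID = np.unique(np.concatenate([-FP4_LEVELS, FP4_LEVELS]))

def sylvester_hadamard(n):
    """Orthonormal n x n Hadamard matrix (n a power of two)."""
    H = np.ones((1, 1))
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

def stochastic_round(x, grid):
    """Round each value to one of its two grid neighbors with probability
    proportional to proximity, so the quantizer is unbiased in expectation."""
    x = np.clip(x, grid[0], grid[-1])
    idx = np.clip(np.searchsorted(grid, x) - 1, 0, len(grid) - 2)
    lo, hi = grid[idx], grid[idx + 1]
    p_up = np.where(hi > lo, (x - lo) / (hi - lo), 0.0)
    return np.where(rng.random(np.shape(x)) < p_up, hi, lo)

def quantize_block(block, signs):
    """Rotate a block with a random Hadamard transform to spread outliers,
    scale it into the FP4 range, and stochastically round to the grid."""
    H = sylvester_hadamard(block.shape[-1])
    rotated = (block * signs) @ H          # random signs + Hadamard rotation
    scale = max(np.abs(rotated).max() / FP4_GRID[-1], 1e-12)
    return stochastic_round(rotated / scale, FP4_GRID), scale

def dequantize_block(q, scale, signs):
    """Invert the scaling and rotation (H is orthonormal; signs are +/-1)."""
    H = sylvester_hadamard(q.shape[-1])
    return (q * scale) @ H.T * signs

# Round-trip one 16-element block (small-block scaling; 16 is an assumption).
block = rng.normal(size=16)
signs = rng.choice([-1.0, 1.0], size=16)
q, scale = quantize_block(block, signs)
print("max abs error:", np.abs(dequantize_block(q, scale, signs) - block).max())
```

The rotation is what makes 4 bits workable: after a random Hadamard transform, a single outlier gets smeared across the whole block, so one extreme value no longer forces a huge scale that crushes everything else to zero. Stochastic rounding then keeps the quantization error zero-mean, which matters when gradients accumulate over trillions of tokens.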