With nmoe, I finally feel like my expertise in PyTorch, CUDA, CuteDSL and B200s matches my old expertise in JAX, Pallas, Mosaic and TPU v4. Took almost 2 years to do it.