Open models show the tradeoff: 2.5x faster per user, about 6x more expensive.
Lower batch size, speculative decoding harder.
The Pareto-optimal curve for Deepseek shows this.
Claude Opus 4.6 is 100 tok/s/user.
Deepseek at 100 tok/s/user is 6k tok/s/GPU.
At 250 tok/s/user it's closer to 1k tok/s/GPU.
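
The arithmetic behind the 2.5x and 6x figures can be sketched as follows. This is a rough illustration, not a pricing model: it uses the approximate Deepseek numbers from the notes, and it assumes that GPU cost per hour is fixed, so cost per token scales inversely with tok/s/GPU. All variable names are made up for the example.

```python
# Approximate figures taken from the notes (Deepseek Pareto curve).
baseline_speed = 100        # tok/s/user
fast_speed = 250            # tok/s/user
baseline_throughput = 6000  # tok/s/GPU at 100 tok/s/user
fast_throughput = 1000      # tok/s/GPU at 250 tok/s/user ("closer to 1k")

# Per-user speedup when moving along the curve.
speedup = fast_speed / baseline_speed

# ASSUMPTION: GPU cost per hour is constant, so cost per token is
# proportional to 1 / (tok/s/GPU).
cost_multiplier = baseline_throughput / fast_throughput

# Implied concurrent users served per GPU at each operating point.
users_per_gpu_baseline = baseline_throughput / baseline_speed
users_per_gpu_fast = fast_throughput / fast_speed

print(f"speedup: {speedup}x")
print(f"cost per token: {cost_multiplier}x")
print(f"users per GPU: {users_per_gpu_baseline} -> {users_per_gpu_fast}")
```

Under these assumptions, the drop in users per GPU is what the lower batch size looks like in practice: each GPU serves far fewer users at once, so each token carries a larger share of the hardware cost.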