a 1 quadrillion parameter language model is not entirely out of the question (setting aside where you'd get all that training data from). 100,000 H100s could probably hold the model state, though you'd likely need about 25% more GPUs than that to cover context and the KV cache
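A quick back-of-envelope sketch of where those numbers could come from, under assumed values: ~8 bytes of model state per parameter (e.g. fp16 weights + fp16 gradients + an fp32 master copy, before optimizer moments) and 80 GB of HBM per H100. These byte counts are illustrative assumptions, not figures from the original claim.

```python
params = 1_000_000_000_000_000   # 1 quadrillion parameters
bytes_per_param = 8              # assumed: fp16 weights + fp16 grads + fp32 master copy
hbm_per_gpu = 80e9               # H100 SXM: 80 GB HBM per GPU

model_state_bytes = params * bytes_per_param        # 8e15 bytes = 8 PB total
gpus_for_state = model_state_bytes / hbm_per_gpu    # GPUs just to hold model state
gpus_with_headroom = gpus_for_state * 1.25          # +25% for activations / KV cache

print(f"{gpus_for_state:,.0f} GPUs for model state")        # 100,000
print(f"{gpus_with_headroom:,.0f} GPUs with KV headroom")   # 125,000
```

At 8 bytes/param the arithmetic lands exactly on 100,000 GPUs for the weights alone, which is why the extra 25% for context and KV cache pushes the estimate to roughly 125,000.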