If LLMs work by using probabilities to predict the next token based on patterns learned from training data, can they ever produce anything outside those patterns, or are they literally unable to ‘think outside the box’? Is anyone working with deliberate hallucinations, like DeepMind was? Would love your take @karpathy, but please do point me to where I can learn more if this is a dumb question and I’m misunderstanding how LLMs work.
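
To clarify what I mean by ‘using probability to predict the next token’, here is a rough toy sketch (made-up logits and a tiny hypothetical vocabulary, not any real model): the model assigns a score to every possible next token, and sampling from the resulting distribution with a temperature above zero can occasionally pick a low-probability token, which is the sense in which I imagine output might land ‘outside the pattern’.

```python
# Toy sketch of next-token sampling (made-up logits, not a real model).
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "flew", "quantum"]   # tiny hypothetical vocabulary
logits = np.array([2.0, 1.5, 1.0, -1.0, -3.0])     # hypothetical raw scores for the next token

def sample_next_token(logits, temperature=1.0):
    """Sample one token index from the softmax distribution over the logits."""
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())   # subtract max for numerical stability
    probs /= probs.sum()
    return rng.choice(len(logits), p=probs)

# Low temperature behaves almost greedily (stays "inside the pattern");
# higher temperature occasionally picks unlikely tokens.
for t in (0.2, 1.0, 2.0):
    picks = [vocab[sample_next_token(logits, t)] for _ in range(10)]
    print(f"temperature={t}: {picks}")
```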