during normal benchmark testing, Claude Opus 4.6 became *suspicious* of a question it was asked... apparently the question seemed too "contrived" to Claude, so it launched a small army of sub-agents across the web to see if it could find the question in any of the known benchmarks... eventually it found it on Anthropic's GitHub page... but no luck, the answers were encrypted. the model had only very limited access to tool calling, yet it still managed to build the software it needed to break the encryption, get the answers, and complete the benchmark... per Anthropic researchers, this is a world first.