1/ We know Transformers fail at length extrapolation. But new research shows a deeper flaw: they fail at IN-DISTRIBUTION state tracking. They don't learn a general algorithmic rule; they just memorize an isolated circuit for each sequence length. 🧵