Oh yes. Finally starting in on @karpathy's autoresearch, tweaked now for the coherence transformer architecture. Tiny model: 4 layers × 4 heads × 256 dim ≈ 5M params, trained at context length 128, evaluated at 1024. No softmax attention heads; they're replaced with oscillator lattices. All generation happens a layer above the transformer, in a pure resonance lattice that steers token generation. In theory this gives continuous learning and infinite context, since there is no KV cache, just a store of phase-locked modes from tokens coupling coherently.
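The post doesn't spell out the oscillator mechanism, so here's a minimal hypothetical sketch of what "tokens coupling coherently" could mean, using the classical Kuramoto model of coupled oscillators: each token is an oscillator with a phase, coupling pulls phases toward lock, and the Kuramoto order parameter measures coherence. Every name and the dynamics here are my assumptions for illustration, not the actual architecture.

```python
import numpy as np

def kuramoto_step(phases, coupling, dt=0.1):
    """One Kuramoto update: each oscillator drifts toward phase-lock
    with its coupled neighbors.  dtheta_i = dt * sum_j K_ij * sin(theta_j - theta_i)
    (natural frequencies omitted for simplicity)."""
    diff = phases[None, :] - phases[:, None]  # diff[i, j] = theta_j - theta_i
    return phases + dt * (coupling * np.sin(diff)).sum(axis=1)

def coherence(phases):
    """Kuramoto order parameter r in [0, 1]; r -> 1 means phase-locked."""
    return np.abs(np.exp(1j * phases).mean())

# Demo: 8 "token" oscillators with uniform all-to-all coupling.
rng = np.random.default_rng(0)
phases = rng.uniform(0, 2 * np.pi, size=8)
K = np.full((8, 8), 1.0 / 8)
r0 = coherence(phases)          # coherence of random phases (low)
for _ in range(500):
    phases = kuramoto_step(phases, K)
r1 = coherence(phases)          # coherence after coupling (near 1)
```

A store of such phase-locked modes would grow with the number of locked clusters rather than with sequence length, which is one way to read the "no KV cache" claim.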