What’s the most thorough investigation there’s been into a single backprop step with NN interp? I want a paper like “the life of a single backprop step” that walks me through it layer by layer, so I can follow exactly what happens.
I want to see every weight that gets adjusted, every off-target effect of learning a new concept, etc.
I basically want the Building Blocks of Interpretability, but for backprop.
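To make the first ask concrete, here’s a minimal sketch of the kind of single-step trace I mean (my own toy example, not from any paper): snapshot every weight, run one forward/backward/update, then diff. The two-layer MLP and the random batch are placeholders.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, 4))
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 8)
y = torch.randint(0, 4, (32,))

# Snapshot every parameter before the step.
before = {n: p.detach().clone() for n, p in model.named_parameters()}

loss = nn.functional.cross_entropy(model(x), y)
opt.zero_grad()
loss.backward()
opt.step()

# "The life of one backprop step": per layer, which weights moved and by how much.
for name, p in model.named_parameters():
    delta = p.detach() - before[name]
    print(f"{name}: grad norm {p.grad.norm():.4f}, "
          f"update norm {delta.norm():.4f}, "
          f"largest single-weight change {delta.abs().max():.4f}")
```

A real version of this would diff every individual weight, not just norms, and probe the model on held-out concepts before and after the step to catch the off-target effects.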
I also want a paper on the archeology of a model: train InceptionV1 (or whatever) from scratch and visualize the whole thing, plot the different grokking steps, have a monitor watching every circuit, and keep track of the growth of Deutschian knowledge.
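For the archeology ask, something like this is the level of instrumentation I imagine (a hypothetical setup, with a tiny CNN and random data standing in for InceptionV1 and a real dataset): checkpoint the model throughout training and log per-layer activation stats, so the growth of each circuit can be replayed later.

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 8, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(8, 10),
)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

step = 0
log = []  # (step, layer name, mean activation): the dig-site record

def make_hook(name):
    def hook(module, inp, out):
        log.append((step, name, out.detach().mean().item()))
    return hook

for name, module in model.named_modules():
    if isinstance(module, nn.ReLU):
        module.register_forward_hook(make_hook(name))

for step in range(100):
    x = torch.randn(16, 3, 32, 32)
    y = torch.randint(0, 10, (16,))
    loss = nn.functional.cross_entropy(model(x), y)
    opt.zero_grad()
    loss.backward()
    opt.step()
    if step % 10 == 0:
        # Checkpoint for later "excavation": reload and visualize any
        # layer at any point in training.
        torch.save(model.state_dict(), f"ckpt_{step:04d}.pt")
```

The paper I want would run feature visualization over every checkpoint and stitch the results into a timeline of when each circuit is born, grows, and stabilizes.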