$A^3$-Bench: a new benchmark that evaluates memory-driven mechanisms in scientific reasoning. Rather than checking only final answers, it measures how models activate "anchors" (core formulas) and "attractors" (schemas or examples) during inference.