so openzeppelin audited evmbench, that ai security benchmark from openai and paradigm. turns out ai auditors are basically just remembering bugs from their training data 🧵 /1
every audit report published before the training cutoff is baked into the model. so when it "finds" a reentrancy bug it's not reasoning about your code, it's matching patterns against hundreds of reports it already memorized
show it a textbook erc-777 reentrancy and it lights up instantly. critical finding, here's the cve, here's 12 similar incidents. show it a novel accounting bug in a mechanism it's never seen? it flags missing events and moves on
the scary part isn't that it misses stuff. the scary part is it's completely confident while missing stuff. zero hesitation. just vibes and pattern matching. undoubtedly useful but also insufficient
this is exactly why we went with human auditors for alchemix v3. our earmark/redeem system, packed epoch+index math, survival accumulators. none of this exists in any training corpus. zero prior examples. ai literally cannot see it