sorry, but i keep seeing posts of this nature so i need to clarify: we've known LMs are invertible for TWO YEARS. i showed this during my PhD. the quoted paper adds some sophisticated extensions, but "Language Model Inversion" (Morris et al., ICLR 2024) did it first :)
Alex Imas · Oct 29, 10:59
Holy s*&t. This paper is insane. You can recover input text from an LLM through inversion. Huge implications for how we understand these models, as well as for things like privacy.
- you can recover prompts from outputs alone, given enough sampling time
- you can recover them faster by binary-searching the API if it allows a 'logit bias' parameter (see the toy sketch below)
- there's a cool extension in (Finlayson et al., 2024): you can recover the *last layer of the model itself*
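to make the logit-bias point concrete, here's a minimal toy sketch of the binary search idea. everything is simulated: `FakeAPI` is a made-up stand-in for a greedy-decoding endpoint that only returns the argmax token but accepts a logit_bias dict, not any real client library. the point is just that biasing a token until it becomes the top choice tells you its logit relative to the original top token.

```python
import numpy as np

class FakeAPI:
    """Hides a logit vector; only reveals the argmax token id,
    like a greedy-decoding API that accepts a logit_bias parameter."""
    def __init__(self, logits):
        self._logits = np.asarray(logits, dtype=float)

    def top_token(self, logit_bias=None):
        biased = self._logits.copy()
        for tok, b in (logit_bias or {}).items():
            biased[tok] += b
        return int(np.argmax(biased))

def recover_relative_logit(api, target, lo=0.0, hi=100.0, iters=40):
    """Binary-search the smallest bias that makes `target` the argmax.
    That bias equals logit(top) - logit(target), so we recover the target's
    logit relative to the top token without ever seeing logprobs.
    `hi` just needs to exceed the largest plausible logit gap."""
    top = api.top_token()
    if target == top:
        return 0.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if api.top_token({target: mid}) == target:
            hi = mid  # bias was big enough: target wins, shrink upper bound
        else:
            lo = mid  # not enough yet, raise lower bound
    return -hi  # logit(target) - logit(top), up to search precision

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    true_logits = rng.normal(size=50)   # the "secret" next-token logits
    api = FakeAPI(true_logits)
    top = api.top_token()
    for tok in range(5):
        est = recover_relative_logit(api, tok)
        true_rel = true_logits[tok] - true_logits[top]
        print(f"token {tok}: estimated {est:.4f}, true {true_rel:.4f}")
```

repeat this over the vocabulary and you have the full (shifted) logit vector from argmax-only access, which is the building block the logprob-recovery and last-layer-extraction results rest on.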
Language Model Inversion
Logits of API-Protected LLMs Leak Proprietary Information