
Leaky LLMs: Accident or Nature? I've just published a new blog post about an LLM data exfiltration challenge, and how I ended up using a side channel and a jailbreak to extract the secret the LLM was meant to protect. Definitely not what I woke up to do today 😅

@CuriousLuke93x Sure, it makes the problem twice as hard. Granted. But so what if it takes 4h of grinding instead of 2h? Heck, make it 24h! The odds are still bad when you have autonomous agents.

What you *can* try is adding active circuit breakers that halt execution when they detect an attack. That’s what ChatGPT and co. are doing (plus notifying the police). It’s like fail2ban in the SSH world. That can work, but how do you define what counts as a fail? What do you ban? In a secret extraction challenge, sure, that’s OK. But when you have an agent with access to all your private data, is leaking the password bad? Yes! How about leaking what you had for breakfast? Well, “it depends”. Yeah, that “depends” is the problem.
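To make the fail2ban analogy concrete, here is a minimal Python sketch of such a circuit breaker. Everything in it is an assumption for illustration: the `CircuitBreaker` class, the `is_fail` predicate, the thresholds and the `SECRET` value are all hypothetical, and this is not how ChatGPT or any real product does it. It counts flagged outputs inside a time window and halts once there are too many.

```python
import time
from collections import deque


class CircuitBreaker:
    """Hypothetical fail2ban-style breaker: halt after too many flagged outputs."""

    def __init__(self, max_fails=3, window_seconds=600.0,
                 is_fail=None, clock=time.monotonic):
        self.max_fails = max_fails
        self.window_seconds = window_seconds
        # The hard part from the post: deciding what counts as a "fail".
        # With no predicate supplied, nothing is ever flagged.
        self.is_fail = is_fail or (lambda output: False)
        self.clock = clock
        self.fail_times = deque()
        self.tripped = False

    def allow(self, output):
        """Return True if this output may be released, False if it is blocked."""
        if self.tripped:
            return False  # breaker already open: everything is halted
        now = self.clock()
        # Forget fails that have aged out of the window.
        while self.fail_times and now - self.fail_times[0] > self.window_seconds:
            self.fail_times.popleft()
        if self.is_fail(output):
            self.fail_times.append(now)
            if len(self.fail_times) >= self.max_fails:
                self.tripped = True  # too many fails in the window: halt
            return False  # a flagged output is never released
        return True


SECRET = "hunter2"  # hypothetical secret, as in an extraction challenge


def leaks_secret(output):
    # Easy case: the "fail" is well defined when there is one known secret.
    return SECRET in output


breaker = CircuitBreaker(max_fails=2, is_fail=leaks_secret)
```

For the challenge case the predicate is easy to write, although a substring check like this only catches the secret when it appears verbatim. For the breakfast case there is no obvious `is_fail` to plug in, which is exactly the “it depends” problem.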
