Leaky LLMs: Accident or Nature?
I've just published a new blog post about an LLM data exfiltration challenge, and how I used a side channel and a jailbreak to extract the secret the LLM was meant to protect.
Definitely not what I woke up to do today 😅
@CuriousLuke93x Sure, it makes the problem twice as hard. Granted. But so what if it takes 4h of grinding instead of 2h? Heck, make it 24h! The probabilities are still bad when you have autonomous agents.
What you *can* try to do is add active circuit breakers that halt execution when they detect an attack. That’s what ChatGPT and co. are doing (plus notifying the police). It’s like fail2ban in the SSH world. That can work, but how do you define what counts as a fail? What do you ban?
In a secret extraction challenge, sure, that’s ok. But when you have an agent with access to all your private data, is leaking the password bad? Yes! How about leaking what you had for breakfast? Well, “it depends”. Yeah, that “depends” is the problem.
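To make the fail2ban analogy concrete, here is a minimal sketch of a circuit breaker that scores suspected leaks and halts the agent once a threshold is crossed. Everything in it is hypothetical: the `CircuitBreaker` class, the `SEVERITY` table, its categories and numbers are made up for illustration, not taken from any real product. It also assumes some detector has already classified the leak, which is a hard problem in its own right.

```python
from dataclasses import dataclass, field

# Hypothetical severity scores per data category (assumed values).
# Picking these numbers is exactly the "it depends" problem.
SEVERITY = {"password": 10, "api_key": 10, "address": 4, "breakfast": 1}


@dataclass
class CircuitBreaker:
    threshold: int = 10          # assumed cutoff, like fail2ban's maxretry
    score: int = 0               # accumulated severity of suspected leaks
    tripped: bool = False
    events: list = field(default_factory=list)

    def record(self, category: str) -> bool:
        """Record a suspected leak; return True if the agent may continue."""
        if self.tripped:
            return False
        # Unknown categories get a middling score, which is itself a guess.
        self.score += SEVERITY.get(category, 5)
        self.events.append(category)
        if self.score >= self.threshold:
            self.tripped = True
        return not self.tripped


breaker = CircuitBreaker()
print(breaker.record("breakfast"))  # low severity, agent keeps running
print(breaker.record("address"))    # still under the threshold
print(breaker.record("password"))   # crosses the threshold, agent is halted
```

Note that with these made-up numbers, ten breakfast leaks trip the breaker just like one password leak does. Whether that is the right behavior is the part nobody can answer in general.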
