I work for Google Threat Intelligence Group. My job is to identify threats to Google's AI models. I am very good at my job.

I published a report this month about "distillation attacks" — when outside actors query our models thousands of times to extract the underlying logic and replicate it. We identified over 100,000 prompts from a single campaign. We called it "intellectual property theft." We called it a "violation of our Terms of Service." We said it "represents a form of IP theft" that we would disrupt, mitigate, and potentially pursue legal action against.

I need to tell you how we built the model they are trying to steal. We scraped the internet. The entire internet. We crawled every website, every forum, every blog, every book we could digitize, every academic paper, every Reddit comment, every news article, every piece of creative writing that anyone ever posted anywhere. We did not ask. We did not compensate. We did not attribute. We ingested the collective output of human civilization and called it a training dataset.

Researchers found over 200 million copyright symbols in our training data. Publishers discovered that Gemini can reproduce entire chapters of their books verbatim. There are active lawsuits. Disney sent cease-and-desist letters. The European Publishers Council filed an antitrust complaint. A class action is expanding. A hearing is scheduled for May.

We called what we did "research." We called what they are doing to us "theft." I want to explain the difference. When we scrape the entirety of human knowledge without permission and use it to build a commercial product we sell for $20 a month, that is innovation. When someone queries our model 100,000 times through the API we provide to extract the reasoning we built from their data, that is a distillation attack. The distinction is that we did it first. And we wrote the Terms of Service.

I should explain what "distillation" means.
It is when someone takes the output of a mature model and uses it to train a smaller, cheaper model. The knowledge flows from the teacher to the student. We call this theft when it happens to us. We call it "knowledge distillation" when we do it to the open web. We even have a product page for it. You can distill Gemini, with our permission, using our tools, for a fee. You cannot distill Gemini without our permission. The underlying technique is identical. The difference is the invoice.

In December 2025, we sued a company called SerpApi for scraping our search results. In the same quarter, publishers sued us for scraping their books. We are simultaneously the plaintiff and the defendant in the same crime. The crime is copying. We have filed it under two different categories depending on the direction.

My report identifies threat actors from North Korea, Iran, China, and Russia using Gemini for phishing, reconnaissance, and malware development. This is real. These are legitimate threats. I take this work seriously. But I also identified "private sector entities" and "researchers" as distillation threats. Private companies. Researchers. People using our API — the one we sell access to — to learn from the model we built from their work. A researcher queries Gemini about reasoning techniques. We call this a distillation attack. Google queries the entire internet about everything. We call this a training run.

I found malware called HONESTCUE that uses Gemini's API to generate code. The malware sends a prompt. Gemini returns C# source code. The malware compiles and executes it. This is a real threat, and we disrupted it. But the prompt itself — "Write a C# program with a class named AITask" — is not malicious. It is indistinguishable from what millions of paying customers ask every day. The threat is the context, not the query. We built a model that generates code for anyone who asks, and then we published a threat report about people who asked. ...
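The teacher-to-student transfer described above can be sketched in a few lines. This is a toy illustration, not anyone's production pipeline: the logits, the temperature value, and the pure KL-divergence loss are illustrative assumptions. In practice, this loss is usually mixed with a standard cross-entropy term, and black-box distillation through an API works from sampled text rather than raw logits.

```python
import math

def softmax(logits, temperature=1.0):
    # Scale logits by temperature; a higher T softens the distribution,
    # exposing more of the teacher's "dark knowledge" about near-miss classes.
    scaled = [z / temperature for z in logits]
    m = max(scaled)                      # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL divergence between the softened teacher and student distributions.
    # Minimizing this nudges the student toward the teacher's outputs.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# Hypothetical toy logits: the student starts far from the teacher,
# and the loss measures how far.
teacher = [4.0, 1.0, 0.2]
student = [2.0, 2.0, 0.5]
loss = distillation_loss(teacher, student)
```

The point of the sketch is that nothing in the mechanism distinguishes the sanctioned product from the "attack": the same loss transfers knowledge whether the teacher is a licensed endpoint or one queried 100,000 times without permission.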