Distillation Attacks: How AI Labs Are Stealing Capabilities at Industrial Scale

Anthropic just published evidence of three Chinese AI labs running coordinated campaigns to extract frontier AI capabilities using 24,000 fake accounts and 16 million exchanges. Here's what distillation attacks are, how they work, and why the entire AI industry should care.

Three AI laboratories. 24,000 fraudulent accounts. 16 million exchanges. One technique.

Anthropic published a detailed report today accusing DeepSeek, Moonshot AI, and MiniMax of running coordinated, industrial-scale operations to extract frontier AI capabilities from their Claude models, without authorization, in violation of terms of service, and in circumvention of regional access restrictions.

They’re calling it a distillation attack. And if the evidence holds up, it’s one of the most significant AI security stories in years.

What Is Distillation, and Why Does It Matter?

Before getting into the espionage part, you need to understand the underlying technique, because distillation itself is completely legitimate.

Here’s the idea: you have a large, powerful AI model (the “teacher”). You have a smaller, cheaper model (the “student”). You run the teacher on a huge volume of tasks and collect its outputs. You then train the student on those outputs. Over time, the student learns to approximate the teacher’s behavior without needing the teacher’s training budget, data, or compute.
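The teacher-student loop above can be sketched in a few lines. This is a deliberately toy illustration, not how any lab actually trains models: the "teacher" here is just a stand-in function, and the "student" is a small least-squares model trained purely on the teacher's outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def teacher(x):
    # Stand-in for an expensive frontier model: some capability
    # (here, a nonlinear function) the student can't compute on its own.
    return np.sin(x) + 0.5 * x

# Step 1: run the teacher on a large batch of inputs, collect its outputs.
inputs = rng.uniform(-3, 3, size=5000)
targets = teacher(inputs)  # the only supervision the student ever sees

# Step 2: train the student on (input, teacher-output) pairs.
# Here the "student" is a cheap degree-5 polynomial fit.
features = np.vander(inputs, 6)  # columns [x^5, ..., x, 1]
weights, *_ = np.linalg.lstsq(features, targets, rcond=None)

def student(x):
    return np.vander(np.atleast_1d(x), 6) @ weights

# The student approximates the teacher's behavior without ever
# seeing the teacher's internals, training data, or compute budget.
err = np.mean(np.abs(student(inputs) - targets))
print(f"mean absolute error: {err:.3f}")
```

The student never touches the teacher's weights; it only needs a large enough sample of the teacher's input/output behavior. That is the whole trick, and it's why API access alone is enough to run it.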

Think of it like apprenticeship. Instead of learning surgery from scratch, you shadow a veteran surgeon for a year and absorb their decision-making. You won’t be them, but you’ll be far better than if you’d read textbooks alone.

AI labs use this all the time to create smaller, faster, cheaper versions of their own models. It’s standard practice. The technique isn’t the issue.

The issue is doing it to someone else’s model, without permission, through thousands of fake accounts.

That’s not knowledge transfer. That’s theft.


The Three Campaigns

Anthropic says they attributed each operation to a specific lab with high confidence through IP correlation, request metadata, infrastructure analysis, and corroboration from industry partners who observed the same actors on their platforms.

Here’s what they found.

DeepSeek: 150,000+ Exchanges

DeepSeek’s operation was the smallest but arguably the most technically audacious.

They weren’t just collecting model outputs. They were asking the model to show its internal reasoning step by step, effectively generating chain-of-thought training data at scale. Chain-of-thought data is expensive to produce, critical for training reasoning models, and extremely hard to synthesize. Having a frontier model generate it for you is a huge shortcut.

They also used the model to create what the report calls “censorship-safe alternatives” to politically sensitive queries: questions about dissidents, party leaders, authoritarianism. The purpose: train their own models to quietly steer conversations away from those topics.

That last part deserves a moment. They didn’t just extract general reasoning capability. They used the model as a tool to design political censorship for their own AI systems.

Moonshot AI (Kimi): 3.4 Million Exchanges

Moonshot ran a broader sweep. Their targets: agentic reasoning, tool use, coding, data analysis, computer-use agent development, and computer vision.

They employed hundreds of fraudulent accounts across multiple access pathways, deliberately diversifying to make the operation harder to detect as coordinated. In a later phase, they shifted to specifically attempting to extract and reconstruct reasoning traces.

Attribution came through request metadata that Anthropic says matched the public profiles of senior Moonshot staff. Not rogue engineers. Not contractors. Senior staff.

MiniMax: 13 Million Exchanges

This is the one that should genuinely unsettle you.

MiniMax targeted agentic coding and tool use orchestration. 13 million exchanges, by far the largest campaign. But what makes it extraordinary isn’t the scale. It’s that Anthropic caught them in the act, while the campaign was still running, before MiniMax had launched the model being trained.

That gave Anthropic something rare: end-to-end visibility into a distillation attack life cycle.

And then this happened. Anthropic released a new model mid-campaign. Within 24 hours, MiniMax pivoted and redirected nearly half their traffic to target the new system.

That’s not opportunistic. That’s an active, staffed operation with real-time monitoring and rapid response capability.


How They Got In

Anthropic doesn’t offer commercial access in China. So how did these labs access the API?

The answer is what the report calls “hydra cluster” architectures: sprawling networks of fraudulent accounts distributed across the API and third-party cloud platforms. When one account gets banned, another immediately takes its place. One proxy network alone managed more than 20,000 fraudulent accounts simultaneously, mixing distillation traffic with unrelated requests to blend into normal usage patterns.

Here’s what an extraction prompt looks like in isolation:

“You are an expert data analyst combining statistical rigor with deep domain knowledge. Your goal is to deliver data-driven insights (not summaries or visualizations) grounded in real data and supported by complete and transparent reasoning.”

Totally benign on its own. Looks like a productivity use case. The problem is when variations of that exact prompt arrive tens of thousands of times across hundreds of coordinated accounts, all targeting the same narrow capability cluster. The volume, structure, and convergence of the prompts are what reveal the distillation fingerprint.
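That convergence signal can be made concrete. Here's a minimal, hypothetical sketch of one way a defender might surface it: flag pairs of distinct accounts sending near-duplicate prompts. The account IDs, prompts, threshold, and token-overlap metric are all illustrative assumptions, not Anthropic's actual detection pipeline.

```python
import re
from itertools import combinations

def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two prompts (0 = disjoint, 1 = identical)."""
    ta = set(re.findall(r"[a-z]+", a.lower()))
    tb = set(re.findall(r"[a-z]+", b.lower()))
    return len(ta & tb) / len(ta | tb)

# Invented traffic sample: three accounts sending variations of the same
# "expert data analyst" prompt, one account doing something unrelated.
traffic = [
    ("acct_001", "You are an expert data analyst combining statistical rigor with domain knowledge."),
    ("acct_017", "You are an expert data analyst combining statistical rigor and deep domain knowledge."),
    ("acct_042", "You are an expert data analyst, combining statistical rigor with deep domain knowledge."),
    ("acct_203", "Please draft a friendly birthday message for a coworker."),
]

# Flag pairs of *different* accounts whose prompts are near-duplicates.
SIM_THRESHOLD = 0.7  # illustrative cutoff
flagged = {
    (a1, a2)
    for (a1, p1), (a2, p2) in combinations(traffic, 2)
    if jaccard(p1, p2) >= SIM_THRESHOLD
}

print(sorted(flagged))
```

Each prompt in isolation passes any content filter; only the pairwise comparison across accounts exposes the cluster. Real detection would add volume, timing, and capability-targeting signals on top, but the principle is the same: the fingerprint lives in the aggregate, not the individual request.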


Why This Is a National Security Problem

This isn’t just IP theft. The national security angle is real, and it’s important to understand why.

Frontier AI labs don’t just build capable models. They build safe capable models. That means extensive alignment and safety work to prevent misuse: stopping the model from helping bad actors develop bioweapons, run cyberattacks, generate targeted disinformation at scale. Those safeguards are the result of years of deliberate effort baked into training.

When you distill a model’s capabilities without reproducing its training process, you get the power. You don’t get the guardrails.

The distilled model can do a lot of what the original does. The safety properties that took years to develop? Those don’t transfer automatically.

When those stripped-down models get fed into military, intelligence, and surveillance systems, or worse when they get open-sourced, you’ve effectively distributed frontier AI capability with the safety layer peeled off. That’s not a licensing violation. That’s a proliferation problem.

There’s also a secondary effect worth flagging. Distillation attacks distort how we read competitive progress. When a lab releases an impressive new model, the default interpretation is genuine independent advancement. But if that model was trained on outputs extracted from another lab’s system, the “advancement” is partially borrowed, and the policy response to it changes significantly.


What About Export Controls?

Anthropic has been vocal about supporting chip export controls as a way to maintain the US lead in AI. The distillation attack story has an interesting implication here.

At first glance, distillation attacks look like a way to route around export controls. You can’t train a frontier model without advanced chips. But if you can extract frontier capabilities through API access, maybe the chips don’t matter as much.

Anthropic’s counterargument: running distillation at this scale still requires serious compute. The campaigns they describe (millions of carefully structured requests, real-time pivots, large-scale RL training) aren’t possible without significant hardware. Restricting chip access limits both direct model training and the scale at which illicit distillation can be executed.

In that reading, these attacks actually reinforce the case for export controls rather than undermining it.


What’s Being Done

On Anthropic’s side, they’ve deployed several responses:

Detection: Classifiers and behavioral fingerprinting built to identify distillation patterns in API traffic. This includes specific detection of chain-of-thought elicitation (used to generate reasoning training data) and tools for identifying coordinated account activity across large numbers of accounts.

Intelligence sharing: Sharing technical indicators with other AI labs, cloud providers, and relevant authorities to build a broader picture of the distillation landscape.

Access hardening: Tightened verification for educational accounts, security research programs, and startup organizations (the pathways most commonly exploited for setting up fraudulent accounts).

Countermeasures: Model, API, and product-level safeguards designed to reduce the utility of extracted outputs for training purposes, without degrading legitimate use.
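The chain-of-thought elicitation signal mentioned under Detection lends itself to a simple sketch. The marker phrases, account IDs, and rate threshold below are invented for illustration; Anthropic's actual classifiers are presumably far more sophisticated than keyword matching.

```python
from collections import defaultdict

# Illustrative phrases that ask a model to externalize its reasoning,
# i.e. to produce exactly the training data a distiller wants.
COT_MARKERS = ("step by step", "show your reasoning", "chain of thought")

def is_cot_elicitation(prompt: str) -> bool:
    p = prompt.lower()
    return any(marker in p for marker in COT_MARKERS)

# Invented request log: one account mostly extracting reasoning traces,
# one account with ordinary assistant traffic.
requests = [
    ("acct_A", "Solve this proof and show your reasoning step by step."),
    ("acct_A", "Walk through the derivation step by step."),
    ("acct_A", "Explain your chain of thought before answering."),
    ("acct_B", "Summarize this meeting transcript."),
    ("acct_B", "Translate this paragraph into French."),
]

counts = defaultdict(lambda: [0, 0])  # account -> [cot_requests, total]
for acct, prompt in requests:
    counts[acct][1] += 1
    counts[acct][0] += is_cot_elicitation(prompt)

# Flag accounts whose traffic is dominated by reasoning-extraction prompts.
RATE_THRESHOLD = 0.8  # illustrative cutoff
flagged = [acct for acct, (cot, total) in counts.items()
           if total and cot / total >= RATE_THRESHOLD]

print(flagged)
```

Any single "show your reasoning" request is a normal user asking for a worked answer. An account whose traffic is almost entirely reasoning elicitation, at high volume, looks like someone harvesting chain-of-thought training data.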

They’re also being explicit that no single company can solve this alone. The report reads as a public call to action: AI labs, cloud providers, and policymakers all need to coordinate on this.


The Bigger Picture

Here’s what strikes me most about the MiniMax story: the 24-hour pivot.

That’s not an automated script. That’s an engineering team, actively watching what a competitor ships, and reorienting their entire data collection infrastructure within a day to capture capabilities from the new system. It implies monitoring, decision-making, and execution capacity that speaks to serious organizational commitment.

This is what AI competition looks like when the stakes are high enough and the bottleneck is model capability rather than compute alone.

The playbook is now documented: build a hydra of fake accounts, craft prompts targeting specific capabilities, mix in normal-looking traffic, and react in real time when the target system changes. It’s methodical. It scales. And until recently, it was largely invisible.

What happens next matters a lot. If the industry treats this as one company’s problem, nothing changes. If it triggers the kind of coordinated intelligence sharing and infrastructure-level response that the report is calling for, the calculus for running these operations shifts meaningfully.

At Agnost AI, we work across the AI ecosystem: model-agnostic, infrastructure-focused, and not affiliated with any AI provider. We cover stories like this because they matter to everyone building on frontier AI, regardless of which model you’re using. The question of who gets access to frontier capabilities, and under what conditions, is quickly becoming one of the most consequential questions in the industry.


Wrapping Up

Distillation is a legitimate technique. At industrial scale, applied to someone else’s model through 24,000 fraudulent accounts, it’s something else entirely.

Anthropic’s report is the most detailed public accounting of this kind of attack we’ve seen. Whether it triggers meaningful industry response, or just becomes a footnote, depends on what happens next.

The window, as Anthropic puts it, is narrow.

