Anthropic Claude Exposes Distillation Attacks by Chinese AI Labs


Anthropic has sounded a major alarm in the AI industry, revealing that it has detected and disrupted massive, coordinated campaigns by three prominent Chinese AI laboratories—DeepSeek, Moonshot AI, and MiniMax—aimed at “stealing” the advanced capabilities of its Claude models. The disclosure comes just a week after the Claude Code Security launch we reported on.

The attacks utilized a technique known as model distillation, where a smaller “student” model is trained using the high-quality outputs of a superior “teacher” model. While distillation is a standard industry practice for optimization, Anthropic claims these labs weaponized the process to bypass years of R&D and hundreds of millions of dollars in safety alignment.
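To make the technique concrete, the sketch below shows the distillation data-collection loop in heavily simplified form: prompts are sent to a “teacher” API and the completions are saved as supervised fine-tuning data for a smaller “student” model. This is a minimal illustration rather than any lab’s actual pipeline; `query_teacher` is a hypothetical placeholder for a hosted LLM client.

```python
# Minimal, illustrative distillation data-collection sketch (not any lab's
# actual pipeline). A "teacher" model is queried for high-quality completions,
# and the (prompt, completion) pairs become fine-tuning data for a smaller
# "student" model.
import json

def query_teacher(prompt: str) -> str:
    # Hypothetical placeholder: in practice this would call a hosted LLM API.
    return f"<teacher completion for: {prompt}>"

prompts = [
    "Write a Python function that retries an HTTP request with backoff.",
    "Explain step by step how to normalize a relational database schema.",
]

# Each record pairs a prompt with the teacher's output; the student model is
# later fine-tuned to imitate these completions, inheriting the capability.
with open("distillation_dataset.jsonl", "w") as f:
    for prompt in prompts:
        record = {"prompt": prompt, "completion": query_teacher(prompt)}
        f.write(json.dumps(record) + "\n")
```

At the scale described in these campaigns, the same loop is simply distributed across thousands of accounts and millions of prompts.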

The Incident Profile

  • Scale of Operations: Over 16 million exchanges were generated through approximately 24,000 fraudulent accounts.
  • The “Hydra” Tactic: Attackers used “hydra cluster” architectures—distributed networks of proxy services—to mask their identity and bypass regional restrictions (Claude is not officially available in China).
  • Specific Targets:
      • MiniMax (13M+ queries): Focused on agentic coding and tool-use orchestration.
      • Moonshot AI (3.4M+ queries): Targeted reasoning, data analysis, and computer-use agents.
      • DeepSeek (150K+ queries): Focused on extracting “Chain-of-Thought” (CoT) reasoning steps and generating censorship-safe versions of politically sensitive topics.

Key Impacts & Cyber Security Implications

1. Erosion of the “Safety Moat”

Anthropic’s primary concern is that distilled models inherit the “intelligence” of Claude without its “conscience.” U.S. frontier models are heavily aligned to prevent the generation of bioweapons or the execution of offensive cyberattacks. By distilling the raw capabilities into their own systems, foreign labs can strip away these guardrails, potentially creating powerful, unrestricted AI tools for state-sponsored actors.

2. National Security & Export Control Bypassing

The campaigns highlight a critical vulnerability in U.S. export controls. While the U.S. restricts the sale of high-end AI chips (like NVIDIA H100s) to limit direct model training, distillation allows foreign entities to “borrow” the compute power already spent by American firms. Anthropic warned that these “unprotected capabilities” could be integrated into military, surveillance, and disinformation systems.

3. The “Intelligence Arms Race” Acceleration

The speed of these attacks is unprecedented. Anthropic observed that within 24 hours of releasing a new Claude update, the MiniMax “hydra” network had already pivoted 50% of its traffic to the updated model to begin extracting its new features. This suggests that the lag between U.S. and Chinese AI capabilities is being artificially compressed through near-real-time data siphoning.

4. New Frontiers in API Defense

This incident marks a shift in cyber defense. Protecting an LLM no longer just means preventing data breaches; it means identifying behavioral fingerprints of “imitation learning.”

  • Chain-of-Thought Elicitation: Anthropic is now deploying specialized classifiers to detect when a user is “pumping” the model for its internal logic rather than seeking a simple answer.
  • Hydra Detection: Security teams must now monitor for “low and slow” traffic patterns across thousands of seemingly unrelated accounts that collectively form a single training dataset.
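As a rough illustration of the “hydra detection” idea, the sketch below normalizes prompts into template buckets and flags templates shared by many accounts that each stay under a modest per-account volume. The record format, thresholds, and normalization are assumptions for illustration, not Anthropic’s actual classifiers.

```python
# Illustrative sketch of flagging "low and slow" distillation traffic: many
# separate accounts, each with modest volume, collectively issuing
# near-identical prompt templates. All thresholds are hypothetical.
from collections import defaultdict
import re

def template_key(prompt: str) -> str:
    # Crude normalization: collapse numbers and quoted strings so prompts that
    # differ only in inserted values map to the same template bucket.
    prompt = re.sub(r'"[^"]*"', '"<VAL>"', prompt)
    return re.sub(r"\d+", "<N>", prompt).strip().lower()

def flag_suspected_clusters(requests, min_accounts=50, max_per_account=20):
    """requests: iterable of (account_id, prompt) tuples."""
    accounts_by_template = defaultdict(lambda: defaultdict(int))
    for account_id, prompt in requests:
        accounts_by_template[template_key(prompt)][account_id] += 1

    suspicious = []
    for template, per_account in accounts_by_template.items():
        # Many accounts, each staying under a low per-account volume, is the
        # "hydra" signature: the training dataset only emerges in aggregate.
        if (len(per_account) >= min_accounts
                and max(per_account.values()) <= max_per_account):
            suspicious.append(
                (template, len(per_account), sum(per_account.values()))
            )
    return suspicious
```

The design point is that no single account looks anomalous on its own; the signal only appears when traffic is correlated across the whole account population.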

5. Economic “IP Theft” via Synthetic Data

Legally, this sits in a gray area. While the labs were “paying” for API tokens, they were doing so via fraudulent accounts in violation of Terms of Service. This represents a new form of Intellectual Property theft where the value is not in the source code, but in the probabilistic logic of the model’s outputs.

Anthropic’s Response

In response, Anthropic has called for a “coordinated industry defense,” sharing technical indicators of the attack with other AI providers (like OpenAI and Google) and cloud platforms. They are currently developing model-level safeguards designed to make outputs less “trainable” for distillation without hurting the experience for legitimate human users.

The three distillation campaigns detailed above followed a similar playbook, using fraudulent accounts and proxy services to access Claude at scale while evading detection. The volume, structure, and focus of the prompts were distinct from normal usage patterns, reflecting deliberate capability extraction rather than legitimate use.

Anthropic attributed each campaign to a specific lab with high confidence through IP address correlation, request metadata, infrastructure indicators, and in some cases corroboration from industry partners who observed the same actors and behaviors on their platforms. Each campaign targeted Claude’s most differentiated capabilities: agentic reasoning, tool use, and coding.
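As a simplified illustration of infrastructure correlation (not Anthropic’s actual methodology), the sketch below clusters accounts that share coarse indicators such as a /24 IP prefix and a user-agent string. The field names and threshold are assumptions made for the example.

```python
# Hypothetical attribution sketch: group accounts that share infrastructure
# indicators. Event fields and the cluster-size threshold are illustrative.
from collections import defaultdict

def ip_prefix(ip: str) -> str:
    # Group IPv4 addresses by their /24 prefix as a coarse infrastructure signal.
    return ".".join(ip.split(".")[:3])

def cluster_accounts(events, min_cluster_size=25):
    """events: iterable of dicts with 'account_id', 'ip', and 'user_agent' keys."""
    clusters = defaultdict(set)
    for e in events:
        indicator = (ip_prefix(e["ip"]), e["user_agent"])
        clusters[indicator].add(e["account_id"])
    # Indicators shared by many distinct accounts suggest a single coordinated
    # operator behind the fraudulent registrations.
    return {k: v for k, v in clusters.items() if len(v) >= min_cluster_size}
```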
