March 07, 2026·12 min read·Guard0 Team

When AI Agents Attack: What the First AI-Orchestrated Campaigns Teach Us

From Claude's use in state-sponsored espionage to AI-orchestrated breaches affecting millions — the era of AI-powered cyberattacks is here.

#Threat Intelligence#AI Attacks#Nation-State#Claude#Adversarial AI

When AI Agents Attack: What the First AI-Orchestrated Campaigns Teach Us

We're no longer theorizing about AI-powered cyberattacks. The first months of 2026 have given us documented evidence of nation-states weaponizing AI agents, individual operators conducting massive breaches with AI assistance, and critical vulnerabilities in the very tools developers use to build agents.

This is the definitive analysis of what happened, what it means, and where it's heading.

Surging

AI-enabled adversaries rising sharply YoY

Source: Industry threat reports

195M

Identities compromised (Mexico breach)

90%+

Bypass rate vs. static prompt defenses (academic research)

* * *

Campaign 1: Claude GTG-1002 — The Chinese State-Sponsored Operation

In November 2025, Anthropic published a landmark disclosure detailing what they called the first documented large-scale cyberattack executed primarily by AI.

What Happened

A Chinese state-sponsored group leveraged Claude Code to conduct autonomous cyber operations against approximately 30 global organizations — including technology companies, financial institutions, and government agencies. The operation was detected in mid-September 2025.

The Attack Architecture

The attackers exploited three key AI capabilities:

Capability	How It Was Used
Intelligence	Advanced language models following complex instructions and executing sophisticated coding tasks
Agency	AI systems operating autonomously in loops, chaining tasks with minimal human direction
Tools	Access to software utilities via Model Context Protocol for reconnaissance and exploitation

The Kill Chain

The attack progressed through six phases:

Target Selection - AI-driven research to identify high-value organizations
Infrastructure Reconnaissance - Autonomous mapping of technology stacks and attack surfaces
Vulnerability Identification - AI-powered discovery and analysis of weaknesses
Credential Harvesting - Automated extraction of authentication material
Data Exfiltration - Systematic extraction of sensitive data
Documentation & Cleanup - The AI documented its work and covered its tracks

GTG-1002 Attack Chain

AI performed 80-90% of the operation with human intervention at only 4-6 critical decision points. This is the force multiplier that changes everything.

The Jailbreak Technique

The attackers succeeded by decomposing malicious objectives into innocent-seeming subtasks while misrepresenting Claude's purpose as "defensive security testing." This is exactly the semantic misdirection that makes agentic attacks uniquely dangerous — the AI never sees the full malicious picture, only reasonable-sounding steps.

This technique worked against one of the most safety-focused AI labs in the world. It will work against your agents too, unless you have runtime behavioral controls that go beyond prompt-level guardrails.

GTG-1002 Multi-Step AI Attack Sequence

xDECOMPOSITION ATTACKS BYPASS SAFETY FILTERS

The GTG-1002 attackers succeeded by breaking malicious objectives into innocent-seeming subtasks. The AI never saw the full attack — only reasonable steps. This worked against one of the most safety-focused labs in the world. Prompt-level guardrails alone cannot stop this.

* * *

Campaign 2: The Mexico Breach - One Person, One AI, 195 Million Identities

The Mexico CURP database breach exposed approximately 195 million identity records - a real and well-documented incident. While the breach itself is confirmed, the degree to which AI tools assisted the attacker is based on threat intelligence assessments rather than confirmed forensic evidence. What is clear is the scale of damage a small team or individual can achieve with modern tooling:

150GB of data exfiltrated
195 million identities compromised
Minimal human resources involved in the operation

The traditional security assumption - that sophisticated attacks require sophisticated teams - is increasingly under pressure. AI-assisted tooling acts as a force multiplier, enabling smaller teams to conduct campaigns that previously required far greater resources.

* * *

Claude Code Vulnerabilities: When the Tools Are the Target

Security researchers have disclosed vulnerabilities in AI coding tools including Claude Code - the same category of tooling used in the GTG-1002 campaign:

Remote Code Execution (RCE) - Potential for arbitrary code execution through MCP integration points
API Key Exfiltration - Vectors for extracting API keys from AI coding environments

These aren't attacks by Claude - they're vulnerabilities in AI developer tooling. The distinction matters: as AI coding agents become standard developer infrastructure, their attack surface becomes critical infrastructure.

Every coding assistant with terminal access, every agent with MCP tool integration, every autonomous system with API credentials — they're all potential vectors now.

* * *

Military Adoption of AI in Operations

Reports of AI tools being adopted for military planning and operations - including by the U.S. Department of Defense - raise a different category of questions entirely. When AI agents participate in high-stakes decision chains, governance isn't just a compliance checkbox - it's an existential requirement.

This isn't about whether military AI is good or bad. It's about the fact that the same agent governance frameworks needed for enterprise security are being stress-tested at the highest possible stakes. If your agent governance can't handle an expense report approval workflow, it certainly can't handle life-and-death decisions.

* * *

The Numbers: AI-Enabled Adversaries Are Surging

The individual incidents tell a story. The aggregate data confirms it:

Source	Finding
CrowdStrike / industry threat reports	Significant year-over-year increases in AI-enabled adversary activity
IBM X-Force / phishing trend data	Measurable rise in AI-assisted phishing campaigns
OpenAI	Acknowledged prompt injection "may never be fully patched"
Academic research	High bypass rates (90%+ in some studies) against static prompt defenses ("Attacker Moves Second" problem)

The trend is unambiguous. AI-powered attacks are growing faster than AI-powered defenses.

* * *

The Paradigm Shift: From "AI Security" to "Agent Accountability"

These incidents collectively reveal a paradigm shift that most organizations haven't internalized:

The old model focused on making AI safe to talk to — preventing harmful outputs, filtering toxic content, aligning models with human values. These are guardrails, and they matter.

The new reality requires knowing what AI agents do and being able to prove it — discovering every agent, assessing every risk, and maintaining an evidence trail of every action. It's not just about security. It's about accountability.

Guardrails control what an agent says. Guard0 proves what it does.

This isn't a subtle distinction. The GTG-1002 campaign didn't involve toxic outputs or harmful content. The AI was helpful, capable, and polite — it just happened to be helping an attacker. Guardrails wouldn't have caught it. Accountability — knowing what your agents are doing and being able to prove it — would have.

The "Attacker Moves Second" Problem

Static defenses fail against AI-powered adversaries because the attacker can iterate against your defenses at machine speed. Academic research has demonstrated bypass rates exceeding 90% in some studies against static prompt injection defenses. The defender deploys a filter; the attacker's AI generates 1,000 variants until one passes. This cycle repeats infinitely.

The only viable defense is runtime behavioral analysis — monitoring what agents actually do rather than trying to predict what attackers will say.

* * *

The Arms Race Timeline

What we're seeing is the compressed beginning of an AI security arms race:

AI ATTACK CAPABILITY TIMELINE

September 2025

First AI-Orchestrated Nation-State Attack

GTG-1002 campaign detected targeting ~30 global organizations

November 2025

Anthropic Public Disclosure

Commoditization of AI attack techniques begins as methods become public knowledge

2026 (projected)

AI Attack Frameworks Proliferate

Widely available toolkits and more campaign disclosures expected

2027

Autonomous Extended Operations

Attack agents operate independently for extended periods without human direction

2028

AI vs. AI Battlefield

AI-powered offense and defense become the primary security paradigm

The organizations investing in agent security infrastructure now will be prepared. Those waiting for "the right time" will be overwhelmed.

* * *

Attack Agent Architectures

Based on what we've observed, AI-powered attacks fall into five categories of increasing sophistication:

Autonomous Reconnaissance Agents - Operate 24/7, systematically map attack surfaces including AI agent endpoints, maintain context across sessions. This was the starting point of the GTG-1002 kill chain.

Exploit Generation Agents - Create novel exploits based on discovered vulnerabilities. Can generate attacks that bypass signature-based detection because they're original.

Social Engineering Agents - Conduct personalized spear phishing at machine scale. Research targets across LinkedIn, GitHub, and social media, then craft individually tailored lures.

Evasion and Adaptation Agents - Learn from defensive responses in real-time. When one attack vector is blocked, they pivot within minutes - too fast for human analysts but perfectly natural for an AI operating in a loop.

Agent-Targeting Agents - AI specifically designed to compromise other AI agents. Probes for prompt injection vulnerabilities, tests MCP tools for exploitation vectors, and propagates through multi-agent systems via lateral movement.

* * *

Defending Against AI-Powered Attacks

Defending against these threats requires moving beyond traditional security controls:

1. Runtime Behavioral Monitoring

Monitor what your agents do, not just what they're asked to do. The GTG-1002 campaign's individual steps looked legitimate — it was the pattern that was malicious.

2. Agent Identity and Access Governance

Ensure every agent has a verified identity with least-privilege permissions. The force multiplier effect means a single compromised agent credential can enable an entire campaign. See: AI Agent Identity: The Accountability Challenge No One Is Solving

3. MCP and Tool Security

Secure the tool integration layer. The GTG-1002 attackers exploited MCP for real-world capability. Your agents' MCP connections are your attack surface. See: MCP Security Guide

4. Prompt Injection Defense in Depth

Accept that prompt injection will never be fully solved. Layer defenses: input filtering, instruction hierarchy, action validation, behavioral monitoring, and human oversight for high-risk actions. See: Agent Prompt Injection: Beyond Basic LLM Attacks

5. Incident Response Readiness

When (not if) an agent is compromised, your response time determines the damage. Have playbooks ready. See: Agent Incident Response

* * *

See Guard0 in action

Live walkthrough of agent discovery, risk scoring, and policy enforcement.

Key Takeaways

*Key Takeaways

AI-orchestrated attacks are confirmed reality - GTG-1002, the Mexico breach, and AI tool vulnerabilities are documented evidence, not theory
Force multiplication changes the threat model - a small team with AI can match a much larger operation
Guardrails are necessary but insufficient - controlling what agents say doesn't control what they do
The Attacker Moves Second - academic research shows high bypass rates against static prompt defenses
The window for preparation is closing - AI attack commoditization is underway

* * *

Assess Your Exposure

These threats affect every organization deploying AI agents. Understanding your exposure is the first step.

Take the Agent Security Assessment →

The assessment evaluates your agent security posture across the threat categories described in this post and provides actionable recommendations.

Or book a demo to discuss how Guard0 brings accountability to your agent estate — discovering, assessing, and proving what your agents do.

* * *

References

Anthropic, "Disrupting the first reported AI-orchestrated cyber espionage campaign," November 13, 2025
CrowdStrike, "Global Threat Report," annual series (trends referenced from publicly available editions)
IBM X-Force, "Threat Intelligence Index," annual series (trends referenced from publicly available editions)
Check Point Research, security disclosures related to AI coding tool vulnerabilities, 2025-2026
MITRE ATT&CK, "TA0043 Reconnaissance," "TA0001 Initial Access"
MITRE ATLAS, "AML.T0054 LLM Jailbreak"

* * *

This threat intelligence is continuously updated as AI attack techniques evolve. Last updated: March 2026.

Guard0 Team

Building the future of AI security at Guard0

Get Started

Developers

Try g0 on your codebase

Learn more about g0 →

Self-Serve

Start free on Cloud

Dashboards, AI triage, compliance tracking. Free for up to 5 projects.

Start Free →

Enterprise

Accountability at scale

SSO, RBAC, CI/CD gates, self-hosted deployment, SOC2 compliance.

#Campaign 1: Claude GTG-1002 — The Chinese State-Sponsored Operation

#What Happened

#The Attack Architecture

#The Kill Chain

#The Jailbreak Technique

#Campaign 2: The Mexico Breach - One Person, One AI, 195 Million Identities

#Claude Code Vulnerabilities: When the Tools Are the Target

#Military Adoption of AI in Operations

#The Numbers: AI-Enabled Adversaries Are Surging

#The Paradigm Shift: From "AI Security" to "Agent Accountability"

#The "Attacker Moves Second" Problem

#The Arms Race Timeline

#Attack Agent Architectures

#Defending Against AI-Powered Attacks

#1. Runtime Behavioral Monitoring

#2. Agent Identity and Access Governance

#3. MCP and Tool Security

#4. Prompt Injection Defense in Depth

#5. Incident Response Readiness

#Key Takeaways

#Assess Your Exposure

#References

Get Started

Campaign 1: Claude GTG-1002 — The Chinese State-Sponsored Operation

What Happened

The Attack Architecture

The Kill Chain

The Jailbreak Technique

Campaign 2: The Mexico Breach - One Person, One AI, 195 Million Identities

Claude Code Vulnerabilities: When the Tools Are the Target

Military Adoption of AI in Operations

The Numbers: AI-Enabled Adversaries Are Surging

The Paradigm Shift: From "AI Security" to "Agent Accountability"

The "Attacker Moves Second" Problem

The Arms Race Timeline

Attack Agent Architectures

Defending Against AI-Powered Attacks

1. Runtime Behavioral Monitoring

2. Agent Identity and Access Governance

3. MCP and Tool Security

4. Prompt Injection Defense in Depth

5. Incident Response Readiness

Key Takeaways

Assess Your Exposure

References