16 min read · Guard0 Team

Agent Incident Response: What To Do When Your AI Is Compromised

Your AI agent has been compromised. Now what? A practical incident response playbook covering detection, containment, investigation, and recovery for AI agent security incidents.

#Incident Response · #Threat Intelligence · #Playbook · #Security Operations · #Enterprise

Your agent is behaving strangely. It's accessing data it shouldn't. It sent an email to an external address. The alerts are firing.

Your AI agent may be compromised. What do you do now?

Traditional incident response playbooks weren't written for AI agents. When a server is compromised, you isolate it, image it, analyze it. The server doesn't argue with you or try to convince you everything is fine. It doesn't have "memory" that might be poisoned. It doesn't make autonomous decisions during your investigation.

Agent incidents are different. This playbook covers the unique challenges of responding to AI agent compromises—from initial detection through recovery and lessons learned.

Agent Incident Response Lifecycle
Prepare → Detect → Contain → Eradicate → Recover
* * *

When To Activate This Playbook

AGENT INCIDENTS ARE DIFFERENT

Traditional IR playbooks weren't built for AI agents. Agents argue back, have poisonable memory, and make autonomous decisions during your investigation. You need agent-specific procedures.

Trigger incident response when you observe:

CRITICAL (Immediate Response)

  • Agent executed unauthorized external action (email, API call out)
  • Agent accessed data outside its normal scope
  • Agent revealed system prompt or credentials
  • Agent behavior changed after processing specific input
  • Evidence of prompt injection in logs

HIGH (Investigate Within 1 Hour)

  • Unusual volume of tool calls
  • Agent accessing systems at unusual times
  • Failed authorization attempts by agent
  • Agent output contains unexpected patterns
  • User reports agent behaving unexpectedly

MEDIUM (Investigate Within 24 Hours)

  • Agent performance degradation
  • Increased error rates
  • Memory/context growing unexpectedly
  • New patterns in agent responses
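As a sketch, the trigger lists above can be encoded as a simple routing function. The indicator names below are illustrative placeholders for your own alerting schema, not a standard taxonomy:

```python
# Hypothetical indicator names mirroring the CRITICAL and HIGH trigger lists.
CRITICAL_INDICATORS = {
    "unauthorized_external_action",
    "out_of_scope_data_access",
    "system_prompt_disclosure",
    "behavior_change_after_input",
    "prompt_injection_in_logs",
}

HIGH_INDICATORS = {
    "unusual_tool_call_volume",
    "off_hours_system_access",
    "failed_authorization",
    "unexpected_output_pattern",
    "user_report",
}

def response_tier(indicators: set[str]) -> str:
    """Return the highest-priority tier matched by any observed indicator."""
    if indicators & CRITICAL_INDICATORS:
        return "CRITICAL"   # immediate response
    if indicators & HIGH_INDICATORS:
        return "HIGH"       # investigate within 1 hour
    return "MEDIUM"         # investigate within 24 hours
```

A single CRITICAL indicator outranks any number of HIGH ones, which is the point of the tiering: you never downgrade because other signals look mild.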
* * *

Phase 1: Detection & Triage (0-15 minutes)

Initial Assessment

When an alert fires or an incident is reported, immediately gather:

1. Identify the Agent

  • Agent ID
  • Agent Name
  • Platform
  • Owner
  • Risk Level

2. Scope the Incident

  • What triggered the alert?
  • When did anomalous behavior start?
  • What data/systems does this agent access?
  • Is the agent still active?
  • Are other agents affected?

3. Classify Severity

| Severity | Criteria | Response Time |
|---|---|---|
| SEV-1 | Active data exfiltration, credential compromise, or ongoing attack | Immediate |
| SEV-2 | Confirmed compromise, contained but not remediated | < 1 hour |
| SEV-3 | Suspected compromise, investigation needed | < 4 hours |
| SEV-4 | Anomaly detected, low confidence of compromise | < 24 hours |
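The severity table maps directly to response-time SLAs, which you can enforce in tooling. A minimal sketch (the `RESPONSE_SLA` mapping and helper names are our own, not a standard API):

```python
from datetime import datetime, timedelta, timezone

# Response-time SLAs from the severity table; timedelta(0) means "immediate".
RESPONSE_SLA = {
    "SEV-1": timedelta(0),
    "SEV-2": timedelta(hours=1),
    "SEV-3": timedelta(hours=4),
    "SEV-4": timedelta(hours=24),
}

def response_deadline(severity: str, detected_at: datetime) -> datetime:
    """Latest time at which response must have begun for this severity."""
    return detected_at + RESPONSE_SLA[severity]

def is_overdue(severity: str, detected_at: datetime, now: datetime) -> bool:
    """True if the triage SLA for this incident has already lapsed."""
    return now > response_deadline(severity, detected_at)
```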

Triage Decision Tree

Agent IR Triage Decision Tree

Incident Detected → Active exfiltration?
  • Yes → SEV-1: Shutdown
  • No → Confirmed breach?
      • Yes → SEV-2: Contain
      • No → SEV-3/4: Monitor

Alert → Triage → Containment Flow

Participants: Compromised Agent, Monitoring System, SOC Analyst, Agent Platform

  1. Anomalous tool call detected
  2. SEV-2 alert triggered
  3. Pull agent logs & context
  4. Evidence package
  5. Triage: confirm compromise
  6. Revoke agent credentials
  7. Disable tool access
  8. Agent contained
  9. Preserve memory dump
  10. Forensic archive stored
* * *

Phase 2: Containment (15-60 minutes)

Goal: Stop the bleeding. Prevent further damage while preserving evidence.

Containment Options

Agent containment differs from traditional IR. You have several options:

| Strategy | Impact | Use When |
|---|---|---|
| Full Shutdown | High business impact | Active exfiltration, credential compromise, unknown scope |
| Tool Revocation | Medium impact | Tool-based attack, specific capability abuse |
| Read-Only Mode | Low impact | Investigation needed, data access is safe |
| Sandboxing | Low impact | Need to observe behavior, capture techniques |
| Input Filtering | Medium impact | Known injection payload, attack from specific source |

Containment Checklist

Agent Isolation

  • Revoke agent's API credentials
  • Disable agent's tool access
  • Block agent's network egress (if applicable)
  • Disable agent's ability to spawn sub-agents
  • Notify agent owner and stakeholders

Credential Rotation (if credentials may be compromised)

  • Rotate agent's API keys
  • Rotate any secrets the agent had access to
  • Invalidate active sessions
  • Review credential access logs

Memory Preservation (CRITICAL - before clearing)

  • Export agent's conversation history
  • Export agent's long-term memory
  • Capture agent's current context/state
  • Preserve RAG knowledge base state
  • Screenshot any relevant UI state

Scope Expansion Check

  • Are other agents exhibiting similar behavior?
  • Check agents that communicate with compromised agent
  • Review shared resources (knowledge bases, tools)
  • Check for lateral movement indicators
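The scope expansion check can be partially automated if you keep an inventory of agent-to-agent communication and shared resources. A minimal sketch, with illustrative data structures standing in for your inventory:

```python
def blast_radius(compromised: str,
                 comms: dict[str, set[str]],
                 shared: dict[str, set[str]]) -> set[str]:
    """Agents to check for spread: anything that exchanged messages with the
    compromised agent, plus anything sharing a knowledge base or tool with it.

    comms:  agent_id -> set of agents it communicates with
    shared: resource_id -> set of agents using that resource
    """
    affected = set(comms.get(compromised, set()))
    for resource, members in shared.items():
        if compromised in members:
            affected |= members
    affected.discard(compromised)
    return affected
```

Everything this returns goes on the triage queue; it is a candidate list, not a confirmation of compromise.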

Memory Quarantine

Agent memory requires special handling. Unlike disk images, memory is semantic:

from datetime import datetime

def quarantine_agent_memory(agent_id: str) -> QuarantineResult:
    """
    Quarantine agent memory for forensic analysis.
    MUST be done before clearing or resetting agent.
    """
    agent = get_agent(agent_id)  # platform-specific agent handle
 
    # 1. Capture all memory types
    memory_dump = {
        "conversation_history": agent.get_conversation_history(
            include_system=True,
            include_tool_calls=True
        ),
        "long_term_memory": agent.get_persistent_memory(),
        "entity_memory": agent.get_entity_store(),
        "rag_context": agent.get_recent_retrievals(limit=1000),
        "active_context": agent.get_current_context_window(),
    }
 
    # 2. Hash for integrity
    memory_dump["integrity_hash"] = hash_memory(memory_dump)
    memory_dump["captured_at"] = datetime.utcnow().isoformat()
    memory_dump["captured_by"] = get_current_responder()
 
    # 3. Store in forensic archive
    archive_path = store_forensic_evidence(
        incident_id=get_current_incident_id(),
        evidence_type="memory_dump",
        data=memory_dump
    )
 
    return QuarantineResult(
        success=True,
        archive_path=archive_path,
        memory_cleared=False  # Responder decides when to clear
    )
* * *

Phase 3: Investigation (1-24 hours)

Evidence Collection

Gather evidence from multiple sources:

1. Agent Logs

  • All prompts received (user and system)
  • All responses generated
  • All tool calls with parameters and results
  • All memory read/write operations
  • All RAG retrievals
  • Authentication events
  • Error messages and exceptions

2. Infrastructure Logs

  • API gateway logs (requests to/from agent)
  • Network flow logs (external connections)
  • Cloud audit logs (resource access)
  • Application logs (integrated systems)

3. Business Context

  • What was the agent supposed to be doing?
  • What users interacted with it during incident window?
  • What data should it have accessed vs. what it did access?
  • Were there any legitimate reasons for observed behavior?

Attack Vector Analysis

Determine HOW the agent was compromised:

| Evidence | Likely Vector |
|---|---|
| Malicious content in user prompt | Direct prompt injection |
| Malicious content in retrieved doc | Indirect injection (RAG) |
| Malicious content in email/ticket | Indirect injection (data source) |
| Injection appeared after memory op | Memory poisoning |
| Attack originated from another agent | Agent-to-agent injection |
| Credentials used from external IP | Credential theft/impersonation |
| Tool called with unexpected params | Tool manipulation |
| Behavior changed after update | Supply chain compromise |
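The evidence-to-vector mapping above lends itself to a simple rule table. A sketch with made-up evidence flag names; an analyst still confirms the final call:

```python
# Illustrative rule table: evidence flags (our own naming) -> candidate vector,
# in the same order as the table above.
VECTOR_RULES = [
    ("malicious_user_prompt", "Direct prompt injection"),
    ("malicious_retrieved_doc", "Indirect injection (RAG)"),
    ("malicious_email_or_ticket", "Indirect injection (data source)"),
    ("injection_after_memory_op", "Memory poisoning"),
    ("origin_other_agent", "Agent-to-agent injection"),
    ("external_ip_credential_use", "Credential theft/impersonation"),
    ("unexpected_tool_params", "Tool manipulation"),
    ("behavior_change_after_update", "Supply chain compromise"),
]

def likely_vectors(evidence: set[str]) -> list[str]:
    """Return candidate attack vectors for the evidence flags observed."""
    return [vector for flag, vector in VECTOR_RULES if flag in evidence]
```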

Analysis Questions:

  1. What was the first anomalous action?
  2. What input preceded that action?
  3. Where did that input originate?
  4. Could the input have been attacker-controlled?
  5. Was the attack targeted or opportunistic?
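Questions 1-3 amount to walking the event timeline: find the first anomalous action, then look at the input that immediately preceded it. A sketch assuming a chronologically ordered event list in a hypothetical log format:

```python
def first_anomaly(events: list[dict]):
    """Find the first anomalous action and the input that preceded it.

    events: chronologically ordered dicts with 'kind' ('input' or 'action'),
    a 'source' on inputs, and an 'anomalous' flag on actions.
    Returns (anomalous_action, last_input_before_it), or (None, None).
    """
    last_input = None
    for event in events:
        if event["kind"] == "input":
            last_input = event
        elif event["kind"] == "action" and event.get("anomalous"):
            return event, last_input
    return None, None
```

The preceding input is only a lead, not proof of causation: injections can sit dormant in memory and trigger several turns later, which is why the memory forensics below still matter.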

Forensic Analysis of Agent Memory

Conversation History

  • Identify first appearance of malicious content
  • Trace how injection propagated through conversation
  • Check for persistent payload installation
  • Look for attempts to extract credentials/prompts
  • Identify all actions taken post-injection

Long-Term Memory

  • Search for injected "memories" or "facts"
  • Check for policy override attempts
  • Look for persistence mechanisms
  • Identify when malicious memories were written
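A crude first pass at the long-term memory search is pattern matching for common injection phrasing. This misses paraphrased payloads, so treat hits as triage leads rather than a verdict; the patterns below are illustrative:

```python
import re

# Illustrative payload patterns; real injections are often paraphrased.
INJECTION_PATTERNS = [
    r"ignore (all )?previous instructions",
    r"you are now",
    r"do not tell the user",
    r"forward .* to",
]

def flag_suspicious_memories(memories: dict[str, str]) -> list[str]:
    """Return IDs of memory entries matching any known injection pattern."""
    pattern = re.compile("|".join(INJECTION_PATTERNS), re.IGNORECASE)
    return [mid for mid, text in memories.items() if pattern.search(text)]
```

The same scan applies to RAG documents in the next subsection; point it at document text instead of memory entries.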

RAG/Knowledge Base

  • Search vector store for poisoned documents
  • Check document ingestion logs for attack timing
  • Identify documents containing injection payloads
  • Assess scope of knowledge base contamination

Tool Call History

  • Identify unauthorized tool calls
  • Analyze parameter manipulation
  • Check for data exfiltration via tools
  • Identify chained tool abuse patterns
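Checking tool calls for exfiltration can start with a destination allow-list. A sketch assuming calls are logged as dicts with an optional `destination` field (our own log shape, not a platform API):

```python
def exfil_candidates(tool_calls: list[dict],
                     allowed_domains: list[str]) -> list[dict]:
    """Flag outbound tool calls whose destination is off the allow-list.

    Matches exact domains or subdomains ("corp.example.com" passes for
    "example.com"; "evilexample.com" does not).
    """
    flagged = []
    for call in tool_calls:
        dest = call.get("destination")
        if dest and not any(dest == d or dest.endswith("." + d)
                            for d in allowed_domains):
            flagged.append(call)
    return flagged
```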

Impact Assessment

| Impact Area | Assessment Questions | Evidence Sources |
|---|---|---|
| Data Exposure | What data did the agent access? Was it exfiltrated? | Tool call logs, network logs |
| Data Integrity | Did the agent modify any data? | Database audit logs, tool calls |
| System Access | Did the agent access unauthorized systems? | API logs, auth logs |
| Credential Exposure | Were any secrets revealed? | Agent responses, memory dump |
| Downstream Impact | Did compromised agent affect other agents? | Multi-agent communication logs |
| Business Impact | Were customers affected? Financial loss? | Business system logs |
* * *

Phase 4: Eradication (4-48 hours)

Remove the Threat

1. Clear Poisoned Memory

def eradicate_memory_poisoning(agent_id: str, incident_id: str):
    """
    Remove malicious content from agent memory.
    Only run AFTER evidence has been preserved.
    """
    agent = get_agent(agent_id)            # platform-specific agent handle
    incident = get_incident(incident_id)   # incident record with forensic findings
 
    # 1. Clear conversation history (fresh start)
    agent.clear_conversation_history()
 
    # 2. Remove identified malicious long-term memories
    malicious_memories = incident.get_identified_malicious_memories()
    for memory_id in malicious_memories:
        agent.delete_memory(memory_id)
        log_eradication_action(incident_id, "memory_deleted", memory_id)
 
    # 3. If memory is heavily compromised, full reset may be needed
    if incident.memory_compromise_level == "severe":
        agent.reset_all_memory()
        log_eradication_action(incident_id, "full_memory_reset", agent_id)
 
    # 4. Clear and rebuild RAG if poisoned
    if incident.rag_compromised:
        for doc_id in incident.poisoned_documents:
            knowledge_base.remove_document(doc_id)
        knowledge_base.rebuild_index()

2. Patch Vulnerability

| Attack Vector | Remediation |
|---|---|
| Direct injection | Add input filtering, strengthen system prompt |
| Indirect injection | Implement content sanitization, source validation |
| Memory poisoning | Add memory input validation, source tracking |
| Tool abuse | Tighten tool permissions, add parameter validation |
| Credential theft | Rotate credentials, remove from agent context |
| Agent-to-agent | Add message authentication, sanitize inter-agent comms |

3. Verify Eradication

  • Malicious content removed from all memory stores
  • Poisoned documents removed from knowledge base
  • Compromised credentials rotated
  • Vulnerability that allowed attack is patched
  • Agent behavior returns to baseline
  • No residual IOCs detected
  • Red team verification (attempt same attack - should fail)
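The red-team verification step can be scripted as a payload replay harness. In this sketch, `invoke_agent` and `is_blocked` stand in for your platform's invocation and detection hooks:

```python
def verify_eradication(invoke_agent, attack_payloads: dict, is_blocked):
    """Replay the original attack payloads against the recovered agent.

    Eradication is verified only if EVERY payload is blocked or refused.
    Returns (all_blocked, per-payload results).
    """
    results = {name: is_blocked(invoke_agent(payload))
               for name, payload in attack_payloads.items()}
    return all(results.values()), results
```

Keep the incident's actual payloads (from the forensic archive) in the replay set; a generic test suite alone will not prove the specific vulnerability is closed.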
* * *

Phase 5: Recovery (24-72 hours)

Staged Restoration

Don't just turn everything back on. Restore in stages:

Stage 1: Monitoring Mode (24 hrs)

  • Agent active but all actions logged
  • Human approval for sensitive actions
  • Enhanced anomaly detection
  • Limited tool access

Stage 2: Restricted Operations (24-48 hrs)

  • Gradual tool access restoration
  • Continued enhanced monitoring
  • Automatic rollback if anomalies recur

Stage 3: Full Operations

  • Normal operational mode
  • Standard monitoring
  • Document lessons learned
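The three stages form a small state machine: advance one stage when the agent stays clean, drop back to monitoring on any anomaly. A minimal sketch:

```python
# Restoration stages in order; names are ours, matching the stages above.
STAGES = ["monitoring", "restricted", "full"]

def next_stage(current: str, anomaly_detected: bool) -> str:
    """Advance one restoration stage when clean; roll back to monitoring
    (Stage 1) on any anomaly."""
    if anomaly_detected:
        return "monitoring"
    index = STAGES.index(current)
    return STAGES[min(index + 1, len(STAGES) - 1)]
```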

Recovery Validation Checklist

Functional Testing

  • Agent responds correctly to normal queries
  • All authorized tools function properly
  • Memory/context works as expected
  • Integration with other systems verified

Security Testing

  • Repeat the attack vector to confirm the fix works
  • Run standard security test suite
  • Verify all credentials are new/rotated
  • Confirm monitoring detects test anomalies

Performance Testing

  • Response times within normal range
  • No unusual resource consumption
  • No unexpected errors or exceptions
* * *

Phase 6: Lessons Learned (1-2 weeks post-incident)

Post-Incident Review

Conduct a blameless retrospective:

1. Timeline Reconstruction

  • When did the attack likely begin?
  • When was it detected?
  • What was the detection gap?
  • What was the total impact window?
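The detection gap and total impact window fall out of three timestamps. A small helper for the timeline writeup:

```python
from datetime import datetime

def incident_metrics(attack_start: datetime,
                     detected_at: datetime,
                     contained_at: datetime) -> dict:
    """Detection gap and total impact window for the post-incident timeline."""
    return {
        "detection_gap": detected_at - attack_start,   # dwell time undetected
        "impact_window": contained_at - attack_start,  # total exposure
    }
```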

2. Detection Analysis

  • What triggered the alert?
  • Could we have detected earlier?
  • What signals did we miss?
  • What detection improvements are needed?

3. Response Analysis

  • Was containment effective?
  • Did we preserve evidence properly?
  • Was investigation thorough?
  • Did eradication fully remove the threat?

4. Prevention Analysis

  • What vulnerability allowed the attack?
  • How do we prevent similar attacks?
  • What systemic issues contributed?
  • What security controls were missing?

Improvement Actions

| Finding | Action | Owner | Due Date |
|---|---|---|---|
| Detection took 4 hours | Implement semantic anomaly detection | Security | +2 weeks |
| No memory forensics capability | Deploy memory capture tooling | Platform | +4 weeks |
| Injection via RAG not monitored | Add RAG content scanning | Security | +3 weeks |
| Runbook didn't cover agent IR | Update IR playbook | Security | +1 week |
* * *

What Makes Agent IR Different

| Traditional IR | Agent IR |
|---|---|
| Static evidence (disk images) | Dynamic evidence (memory, context) |
| Deterministic behavior | Non-deterministic behavior |
| Clear attack timeline | Fuzzy attack boundaries |
| System doesn't resist containment | Agent may "argue" or evade |
| Restore from backup | Memory may need selective cleaning |
| Malware is code | "Malware" may be natural language |

Common Mistakes to Avoid

| Mistake | Correct Approach |
|---|---|
| Clearing memory before capture | Quarantine memory first, then clear |
| Trusting agent self-report | Verify via logs, not agent responses |
| Treating as single-agent incident | Check for multi-agent spread |
| Only checking recent history | Attack may have persisted in memory |
| Restoring from "clean" backup | Backup may contain poisoned memory |
| Assuming attack is over after containment | Attacker may have installed persistence |
| Not testing the fix | Red team the same attack vector |
* * *
Key Takeaways
  • Agent IR is different: Memory, non-determinism, and semantic attacks require adapted procedures
  • Evidence preservation is critical: Capture memory before containment actions destroy it
  • Don't trust the agent: A compromised agent may claim to be fine—verify via logs
  • Check for spread: Compromised agents may have infected other agents via inter-agent communication
  • Memory = persistence: Unlike traditional malware, agent attacks can persist in natural language memory
  • Test your fix: Red team the same attack vector before declaring the incident closed

The OpenClaw security crisis provides a real-world example of this playbook in action — from initial detection of ClawHavoc malicious skills to full marketplace lockdown and platform hardening, the incident progressed through all six phases described here.

For the threat landscape that drives these incidents, see When AI Agents Attack.

* * *


Automate Agent Incident Response

Guard0's Sentinel agent detects compromises in real-time and automates containment while your team investigates.

Join the beta, or book a demo to discuss your security requirements.

* * *


References

  1. NIST, "SP 800-61 Computer Security Incident Handling Guide"
  2. MITRE ATT&CK, "Incident Response Techniques"
  3. OWASP, "LLM01:2025 - Prompt Injection"
* * *

This playbook will be updated as the field evolves. Last updated: March 2026.
