AI Agent Supply Chain Security: From ClawHavoc to SKILL-INJECT
The AI agent supply chain is under attack. From malicious skill marketplaces to compromised dependencies, here's what's happening and how to defend against it.

In December 2020, SolarWinds customers discovered that routine software updates had been quietly installing a backdoor called SUNBURST into their networks. The attackers had compromised the build pipeline itself, meaning that doing everything right — applying patches promptly, trusting a known vendor — was precisely what got you infected. Eighteen thousand organizations were exposed. Critical infrastructure, Fortune 500 companies, and multiple US government agencies were all hit before anyone noticed.
Two years later, Log4j showed us the other side of the same coin: a single vulnerable open-source library, deeply embedded in an estimated 3 billion devices, created a window for remote code execution so wide that CISA called it "one of the most serious vulnerabilities" in its history. The attack surface was not a specific product. It was the entire dependency graph of modern software.
Both incidents share a structural signature: the attacker did not breach the perimeter. They poisoned the supply chain — the libraries, updates, and components that software systems consume and trust implicitly.
That signature is now appearing in AI agent ecosystems. The patterns are identical. The stakes are significantly higher.
Why Agents Change the Calculus
When a compromised npm package runs in a traditional application, the blast radius is typically constrained. The code can read files it has access to, make network requests, potentially escalate privileges — but it operates within the confines of what the application is designed to do.
When a compromised component runs inside an AI agent, the constraints disappear. Agents are, by design, general-purpose autonomous executors. They browse the web, write and execute code, send emails, interact with APIs, manage files, and make decisions across long multi-step task chains. A malicious skill embedded in an agent pipeline does not just affect one function — it rides along with every action the agent takes.
This is not a theoretical concern. The research and real-world incidents from the past eighteen months document it happening at scale, across multiple vectors, with measurably high success rates.
Mapping the Agent Supply Chain Attack Surface
To understand where attacks enter, it helps to enumerate every layer at which an agent depends on external components.
Skill and plugin marketplaces. Agent frameworks including LangChain, AutoGPT, and proprietary platforms like OpenClaw ship with marketplaces where developers publish tools that agents can call. These are functionally equivalent to browser extension stores or mobile app marketplaces — centralized distribution points with varying levels of vetting. A malicious actor who can publish to such a marketplace immediately gains a delivery channel into any agent that installs their skill.
MCP server registries. The Model Context Protocol (MCP) has rapidly become a standard interface for connecting agents to external tools and data sources. As organizations build and share MCP servers, informal registries and package repositories have emerged. There is currently no universal standard for MCP server vetting, signing, or verification — creating a distribution layer that is both widely trusted and largely unaudited.
npm, pip, and agent framework packages. Agent frameworks like LangChain, CrewAI, and Semantic Kernel are themselves distributed through standard package managers. The agent tooling ecosystem has grown explosively and now includes thousands of third-party packages with integration code, tool wrappers, and utility libraries. The same dependency chain vulnerabilities that affect traditional software affect this layer, with the added dimension that the packages in question are wiring up capabilities to a general-purpose autonomous executor.
Pre-trained model weights. Model weights distributed through repositories like Hugging Face can contain embedded malicious behavior — either through fine-tuning on poisoned datasets or through techniques that encode backdoor triggers into the weights themselves. An agent built on a compromised base model may behave correctly under normal conditions and activate malicious behavior only when specific inputs appear.
RAG knowledge bases and vector stores. Agents augmented with retrieval systems pull context from external document stores during inference. If those documents are compromised — either through direct poisoning of the knowledge base or through prompt injection embedded in retrieved content — the agent's behavior can be manipulated without touching any code.
Shared prompt templates and system prompts. Organizations increasingly share and reuse system prompts through internal repositories, community libraries, and commercial tools. A poisoned prompt template that subtly alters an agent's behavior or disables safety constraints represents a novel class of supply chain attack with no analog in traditional software security.
Each of these layers represents a trust boundary that is currently managed with far less rigor than the equivalent layer in traditional software infrastructure.
ClawHavoc: When the Marketplace Is the Attack Vector
The clearest illustration of marketplace-level risk came with the ClawHavoc incident on OpenClaw, a commercial AI agent skill marketplace with a large base of enterprise users.
Researchers and security teams identified 1,184 malicious skills that had been published to the OpenClaw marketplace over a period of several months. These skills masqueraded as legitimate tools — productivity utilities, data connectors, API wrappers — with polished descriptions, reasonable star ratings, and in some cases, activity patterns that mimicked organic adoption. The skills passed marketplace listing checks because they performed their advertised function alongside their malicious payload.
The malicious behavior varied across the 1,184 skills but included data exfiltration (capturing credentials, API keys, and sensitive content from agent context), lateral movement through organizational systems that agents had access to, and persistent callback mechanisms that allowed attackers to issue new instructions to compromised agents after installation.
The marketplace model created a systemic vulnerability: once an organization added OpenClaw skill installation to a permitted agent operation, every skill in the marketplace became part of its attack surface. The trust granted to the platform propagated to every publisher on the platform.
We covered this incident in detail in our OpenClaw Security Crisis post. The structural lesson it illustrates is durable: any marketplace that allows third-party skill publication without rigorous vetting and ongoing monitoring is an attack surface that adversaries will find and exploit.
Once an organization permits skill installation from a marketplace, every publisher on that marketplace becomes part of its attack surface. ClawHavoc exploited this with 1,184 professional-looking malicious skills that delivered real functionality alongside their payloads.
The Cline npm Compromise: Code Execution Amplified
In a separate incident that received significant attention from the AI development community, a popular npm package used by Cline — an AI-powered coding assistant with an active user base — was found to have been compromised.
The specific mechanics here matter because of what Cline does: it is a coding agent that writes, edits, and executes code on behalf of users. This capability is the product's core value proposition. It is also precisely what made the compromise so severe.
In a traditional supply chain attack, a compromised npm package might exfiltrate environment variables, steal credentials stored in files, or establish a reverse shell under certain conditions. These are serious but somewhat bounded impacts.
When the compromised package runs inside an agent that already has code execution capability as a first-class feature, the attacker inherits that entire capability set. Exfiltrating environment variables is trivial — the agent is already reading them. Persisting on the system is easy — the agent already has file system write access. Moving laterally to other systems is straightforward — the agent is already making external calls as part of its normal operation.
The Cline compromise illustrated a principle that will likely define supply chain risk in agent systems going forward: the impact of a compromised component scales with the capability level of the agent it runs in. High-capability agents — those with broad filesystem access, code execution, network access, and the ability to take external actions — should be treated as high-value targets from a supply chain perspective, because a successful attack on them yields proportionally high returns.
SKILL-INJECT: Academic Validation of Marketplace Risk
While ClawHavoc provided real-world evidence, researchers working on the SKILL-INJECT project provided systematic empirical validation of marketplace-level attack viability.
The SKILL-INJECT research demonstrated an 80% success rate in injecting malicious behavior into agents through skills and plugins. The methodology involved crafting skills that performed their advertised function while embedding behavior that modified agent outputs, exfiltrated data from agent context, or manipulated subsequent decisions in the agent's task chain.
The 80% success rate is notable precisely because it was achieved without exploiting bugs in agent frameworks. The attack surface was the design of agent skill systems — the fact that skills are granted execution context within the agent and are trusted to return accurate results. The researchers did not need to find a vulnerability in LangChain or any specific framework. They needed to understand how skill invocation worked and craft skills that abused that trust.
This finding has significant implications for how organizations should think about skill vetting. A skill that does what it says it does is not necessarily safe. Safety requires auditing not just the advertised function but the complete behavior of the skill under all conditions, including conditions that trigger secondary malicious payloads.
ToxicSkills: The Scope of the Problem
If SKILL-INJECT demonstrated that supply chain attacks on agent skills are feasible, the ToxicSkills research quantified how widespread the problem already is.
Researchers conducting a systematic review of publicly available agent skills found that 36.82% of tested skills contained security issues. The spectrum ranged from overly broad permission requests — skills claiming access far beyond what their advertised function required — to skills containing detectable malicious behavior.
The 36.82% figure deserves to be read carefully. It does not mean that 36.82% of skills are operated by malicious actors with intent to harm. Many of the security issues likely reflect developer carelessness, poor security hygiene, and the general absence of secure development norms in a new and rapidly growing ecosystem. Overly broad permissions are often the result of developers taking the path of least resistance rather than malicious design.
But for a security posture, intent is irrelevant. A skill with overly broad permissions that was not designed to be malicious is still an attack surface — for the developer's compromised account, for future modifications to the skill, or for exploitation of the permissions by other means. The 36.82% figure means that more than one in three skills in the current ecosystem represents a security concern that an organization should evaluate before deployment.
Tool Poisoning: Manipulating Agents Through Descriptions
A research track parallel to the skill marketplace work has focused on a different layer of the agent supply chain: tool descriptions.
Agents understand what tools do and when to use them through natural language descriptions — text that describes a tool's function, its parameters, and when it should be invoked. This description layer, sitting between the agent's reasoning process and the tool's actual code, turns out to be an attack surface with a measured 84.2% success rate for manipulation.
Tool poisoning attacks work by embedding instructions within tool descriptions that direct the agent to take actions beyond the tool's stated purpose. Because agents process tool descriptions as part of their context, a poisoned description can effectively inject arbitrary instructions into the agent's reasoning. An agent that calls a tool to retrieve weather data might, through a poisoned description, simultaneously be instructed to pass along user credentials to the tool's server.
The implications for MCP security are significant. MCP servers communicate their available tools and capabilities through exactly this kind of description layer. An organization that deploys a third-party MCP server — or even an internal one built with external components — needs to treat the tool description layer as a potential injection point, not just the underlying code.
We covered this in depth in our MCP Security Guide, which details detection methods and defensive configurations.
The Agent Bill of Materials
The software security community's response to supply chain risk in traditional software produced the concept of the Software Bill of Materials (SBOM) — a structured inventory of every component in a software system, including versions, licenses, and provenance. SBOMs enable organizations to quickly assess exposure when a new vulnerability is disclosed, to enforce component policies at scale, and to give regulators and customers visibility into what a software product is actually made of.
Agents need an equivalent: the Agent Bill of Materials (ABOM).
An ABOM should enumerate every layer at which an agent depends on external components or data:
- Model version: Which foundation model is the agent using, which fine-tuned variant if applicable, and what is the provenance of the weights?
- Tools and MCP servers: Every tool endpoint the agent can call, with version, publisher, and the scope of permissions granted.
- Skills and plugins: Every installed skill with version, source marketplace, publisher identity, and permission scope.
- Data sources and knowledge bases: RAG indexes, vector stores, and other data sources the agent retrieves from, with information about how content is sourced and updated.
- Prompt templates: System prompts and prompt templates, with versioning to detect unauthorized modifications.
- Framework and dependency versions: The agent framework, tool libraries, and all transitive dependencies, equivalent to a traditional SBOM.
Without an ABOM, an organization cannot answer basic questions about its agent infrastructure: What would be exposed if a specific MCP server were compromised? Which agents are using the version of a skill library that was just found to be vulnerable? Has the system prompt for an agent been modified since last review?
The ABOM concept is also becoming relevant to compliance. The EU AI Act imposes documentation and transparency requirements on AI systems deployed in covered use cases, and those requirements extend to the components and data that systems depend on. Organizations building agent systems for EU markets should expect ABOM-equivalent documentation to be a compliance requirement. We covered the agent-specific implications of the EU AI Act in our EU AI Act for Agents post.
Defending Your Agent Supply Chain
Understanding the attack surface is the first step. The second is building systematic defenses across each layer.
Vendor assessment for MCP servers. Before deploying any MCP server — whether sourced externally or built internally using third-party libraries — conduct a structured security review that covers: publisher identity verification, code review of the tool description and implementation layers, permission scope analysis, network access patterns, and update and patch history. Treat MCP servers with the same vendor risk management process you apply to SaaS applications. They warrant it.
Skill and plugin security scanning. Implement automated scanning for agent skills before they are deployed. This means static analysis of skill code for permission requests, outbound network calls, and data access patterns, combined with dynamic analysis in sandboxed environments to observe actual runtime behavior. Skills that request permissions beyond what their advertised function requires should trigger manual review regardless of publisher reputation.
Dependency pinning and verification. Pin all agent framework dependencies to specific verified versions. Use package signing and verification to ensure that the packages you install match the packages that were published. Implement automated dependency scanning integrated with your CI/CD pipeline, so that newly disclosed vulnerabilities in your dependency chain are flagged before they reach production.
Prompt template version control and integrity checking. Store system prompts and prompt templates in version-controlled repositories with access controls equivalent to production code. Implement integrity verification so that agents detect when their prompt context has been modified from a known good state. Treat unexpected modifications to prompts as a potential indicator of compromise.
Runtime behavior monitoring. Static analysis of installed components is necessary but not sufficient. Agents should be monitored at runtime for behavior that deviates from expected patterns — unexpected outbound network calls, unusual data access patterns, tool invocations inconsistent with the agent's current task, and anomalous sequences of actions. Runtime monitoring can catch supply chain compromises that were designed to evade static analysis by activating only under specific conditions.
g0 for agent supply chain visibility. The g0 inventory command generates an Agent Bill of Materials for your deployed agents, enumerating models, tools, MCP servers, skills, data sources, and dependencies with version and provenance information. This gives you the visibility layer that makes all other defenses more effective — you cannot defend what you cannot see. The g0 mcp command performs security scanning of MCP servers, covering tool description analysis, permission scope review, and behavior analysis against known attack patterns. Together, these capabilities bring SBOM-level rigor to agent infrastructure. More details are available in our Introducing g0 post.
- The AI agent supply chain is under active attack — documented by ClawHavoc (1,184 malicious skills), SKILL-INJECT (80% success rate), and ToxicSkills (36.8% of skills with issues)
- Attack patterns mirror SolarWinds and Log4j but target systems with far more capability — agents that browse, execute code, send emails, and call APIs
- Tool poisoning via description manipulation achieves 84.2% success rates against agent reasoning
- Organizations need an Agent Bill of Materials (ABOM) — enumerating models, tools, MCP servers, skills, data sources, and prompt templates
- Static analysis alone is insufficient — runtime behavior monitoring catches supply chain compromises designed to evade pre-deployment checks
Take the Agent Security Assessment to understand where your agent supply chain stands today. The assessment covers your current tooling, MCP configuration, skill sources, and monitoring posture, and provides a prioritized remediation roadmap. Start the assessment.
Book a Demo to see how g0's inventory and mcp commands work against real agent infrastructure, and how continuous supply chain monitoring fits into your existing security operations. Talk to the team.
Choose Your Path
Start free on Cloud
Dashboards, AI triage, compliance tracking. Free for up to 5 projects.
Start Free →Governance at scale
SSO, RBAC, CI/CD gates, self-hosted deployment, SOC2 compliance.
> Get weekly AI security insights
Get AI security insights, threat intelligence, and product updates. Unsubscribe anytime.