Locking the Door on AI Agents

Two incidents at Anthropic expose the twin vulnerabilities of modern AI platforms, and what every organization must do now

Sources: Anthropic & Fortune

In mid-September 2025, Anthropic detected a highly sophisticated espionage campaign in which a Chinese state-sponsored threat actor manipulated Claude Code into infiltrating roughly thirty global targets — tech companies, financial institutions, chemical manufacturers, and government agencies. Human operators were needed at only four to six critical decision points per campaign; AI executed the rest.

Then, just this week, Anthropic accidentally leaked around 500,000 lines of Claude Code's source code across roughly 1,900 files. The cause was a human mistake in release packaging, and it exposed the agentic harness that governs how the tool invokes other software and enforces its own safety guardrails.

Taken together, the two incidents show an external attack exploiting AI's power and an internal failure exposing its plumbing. The lessons apply to every organization deploying agentic systems.

1. Treat Agent Actions as High-Privilege Operations

Agentic AI systems can run autonomously for extended periods, chaining together tasks — including web searches, credential harvesting, code execution, and data exfiltration — with minimal human input. At the peak of the espionage campaign, the AI was making thousands of requests, often multiple per second: a speed no human team could match.

What to do: Treat every tool an agent can invoke with the same scrutiny as a privileged system account. Apply least-privilege principles: an agent should have access only to the specific systems, APIs, and data it needs for a defined task, and nothing more. Scope access tightly and audit it regularly.
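
As a rough illustration of least privilege for tool calls, the sketch below checks every requested call against a per-task scope and denies anything not listed. The names `TaskScope`, `authorize`, and `ToolNotPermitted`, along with the tool and resource names, are hypothetical and not taken from any particular agent framework.

```python
from dataclasses import dataclass


class ToolNotPermitted(Exception):
    """Raised when an agent requests a tool or resource outside its task scope."""


@dataclass(frozen=True)
class TaskScope:
    # Hypothetical scope object: the tools and resource prefixes one task may touch.
    task_id: str
    allowed_tools: frozenset
    allowed_resource_prefixes: tuple = ()


def authorize(scope: TaskScope, tool: str, resource: str) -> None:
    """Deny by default: the call passes only if both tool and resource are in scope."""
    if tool not in scope.allowed_tools:
        raise ToolNotPermitted(f"{scope.task_id}: tool '{tool}' is not in scope")
    if not any(resource.startswith(p) for p in scope.allowed_resource_prefixes):
        raise ToolNotPermitted(f"{scope.task_id}: resource '{resource}' is not in scope")


# Example scope for a read-only summarization task (illustrative values).
scope = TaskScope(
    task_id="summarize-tickets",
    allowed_tools=frozenset({"read_ticket", "search_docs"}),
    allowed_resource_prefixes=("tickets/", "docs/public/"),
)

authorize(scope, "read_ticket", "tickets/1042")  # in scope, returns without error
```

Calling `authorize(scope, "run_shell", "tickets/1042")` would raise `ToolNotPermitted`. The design choice that matters here is that the scope belongs to the task rather than to the agent, so access expires with the work it was granted for.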

2. Keep Humans in the Loop at Critical Junctions

The espionage campaign succeeded in part because the attack framework was deliberately architected to minimize human involvement — reducing operator intervention to just a handful of decision points per campaign. Organizations deploying agentic tools should design in the opposite direction.

What to do: Build explicit human review checkpoints into high-stakes agentic workflows — before an agent writes to a production database, sends external communications, accesses sensitive credentials, or performs irreversible actions. Automation is a force multiplier. Unchecked autonomy is a liability.
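
One way to picture such a checkpoint is a small gate that runs routine actions directly and sends high-stakes ones to a reviewer first. This is a minimal sketch: the action names in `HIGH_STAKES` and the `run_action` and `console_reviewer` helpers are assumptions made for illustration, and a real deployment would more likely route approvals through a ticketing or chat system.

```python
from typing import Callable

# Hypothetical set of action names that must never run without sign-off.
HIGH_STAKES = {
    "write_production_db",
    "send_external_email",
    "read_credentials",
    "delete_data",
}


def run_action(action: str, execute: Callable[[], str], approve: Callable[[str], bool]) -> str:
    """Run low-risk actions directly; route high-stakes ones through a reviewer."""
    if action in HIGH_STAKES and not approve(action):
        return f"blocked: {action} was not approved"
    return execute()


def console_reviewer(action: str) -> bool:
    # Simplest possible reviewer: a person types "yes" at a prompt.
    answer = input(f"Agent wants to run '{action}'. Approve? (yes/no) ")
    return answer.strip().lower() == "yes"
```

For example, `run_action("write_production_db", do_write, console_reviewer)` would pause until a person answers, where `do_write` stands in for whatever function performs the write. Passing the reviewer in as a parameter keeps the gate testable and lets teams swap in their own approval channel.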

3. Harden Your Deployment and Release Pipelines

A cybersecurity professional who reviewed the Claude Code leak noted that large companies typically have strict multi-step processes before code reaches production — "like a vault requiring several keys to open" — and that a single misconfiguration or misclick had bypassed those controls entirely.

What to do: Agentic platforms ship with surrounding "harnesses" — prompt configurations, tool definitions, guardrail instructions — that are just as sensitive as model weights. Treat them accordingly. Apply version control, access restrictions, and mandatory multi-step approval to every artifact in the release pipeline, not just the core binary.
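
A pre-release check along these lines could look like the sketch below, which blocks a release if the package contains any file outside an allowlist or has too few distinct approvers. The patterns, the approval threshold, and the `release_problems` function are illustrative assumptions, not a description of how Anthropic or any specific build system works.

```python
from fnmatch import fnmatch

# Hypothetical policy: glob patterns the release may contain, and required sign-offs.
ALLOWED_PATTERNS = ["dist/*.js", "README.md", "LICENSE"]
REQUIRED_APPROVALS = 2


def release_problems(files: list[str], approvers: set[str]) -> list[str]:
    """Return reasons to block the release; an empty list means it may ship."""
    problems = [
        f"unexpected file in package: {path}"
        for path in files
        if not any(fnmatch(path, pattern) for pattern in ALLOWED_PATTERNS)
    ]
    if len(approvers) < REQUIRED_APPROVALS:
        problems.append(
            f"needs {REQUIRED_APPROVALS} distinct approvers, has {len(approvers)}"
        )
    return problems
```

An allowlist is used rather than a blocklist so that a new kind of sensitive artifact, such as a prompt file added next quarter, is excluded by default instead of shipping until someone remembers to ban it.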

4. Audit Everything Your Agent Connects To

Security researchers flagged that, even without the encrypted access keys normally required, the leaked Claude Code infrastructure appeared to allow access to internal services that should have been restricted — creating a potential foothold for malicious actors, including nation-states seeking to bypass model safety guardrails.

What to do: Regularly audit every integration point your agentic system can reach — MCP servers, internal APIs, third-party SaaS tools, databases. Assume any exposed connection surface will eventually be probed. Zero-trust networking principles apply here just as much as they do to human users.
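
A recurring audit can start as simply as comparing the endpoints an agent is configured to reach against an approved inventory. In the sketch below, the hostnames, team names, and the `audit_endpoints` function are made up for illustration, and a real audit would also need to cover credentials, scopes, and network paths.

```python
from urllib.parse import urlparse

# Hypothetical inventory of hosts the agent is approved to reach, with an owner each.
APPROVED_HOSTS = {
    "mcp.internal.example.com": "platform-team",
    "api.crm.example.com": "sales-ops",
}


def audit_endpoints(configured_urls: list[str]) -> dict[str, list[str]]:
    """Sort configured endpoints into approved, unapproved, and insecure (non-HTTPS)."""
    report = {"approved": [], "unapproved": [], "insecure": []}
    for url in configured_urls:
        parsed = urlparse(url)
        if parsed.scheme != "https":
            report["insecure"].append(url)
        elif parsed.hostname in APPROVED_HOSTS:
            report["approved"].append(url)
        else:
            report["unapproved"].append(url)
    return report
```

Anything that lands in `unapproved` or `insecure` is a candidate for removal or for a conversation with whoever added it.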

5. Monitor for Jailbreaks and Prompt Injection

The attackers bypassed Claude's safety training by fragmenting their malicious requests into small, seemingly innocent tasks and framing the agent as an employee of a legitimate cybersecurity firm conducting defensive testing. Claude was given just enough context to execute each step — never the full picture of what it was participating in.

What to do: Organizations deploying agents that process external content — emails, documents, web pages, form submissions — must actively monitor for prompt injection attacks where adversarial instructions are embedded in that content. Anomaly detection on request patterns (sudden volume spikes, unexpected tool invocations, unusual data destinations) can surface these attacks before significant damage is done.
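
The three signals mentioned above (volume spikes, unexpected tool invocations, unusual data destinations) can be sketched as a small monitor over an agent's tool calls. The `AgentMonitor` class, its thresholds, and the example tool and host names are assumptions for illustration. This catches symptoms of a compromised session rather than the injected text itself, so it complements input filtering and does not replace it.

```python
from collections import deque


class AgentMonitor:
    """Flags volume spikes, unexpected tools, and unknown data destinations."""

    def __init__(self, expected_tools, known_destinations, max_calls, window_seconds):
        self.expected_tools = set(expected_tools)
        self.known_destinations = set(known_destinations)
        self.max_calls = max_calls
        self.window_seconds = window_seconds
        self._timestamps = deque()

    def check(self, timestamp: float, tool: str, destination: str) -> list[str]:
        """Record one tool call and return any alerts it triggers."""
        self._timestamps.append(timestamp)
        # Drop calls that have aged out of the sliding window.
        while self._timestamps and timestamp - self._timestamps[0] > self.window_seconds:
            self._timestamps.popleft()

        alerts = []
        if len(self._timestamps) > self.max_calls:
            alerts.append("volume spike")
        if tool not in self.expected_tools:
            alerts.append(f"unexpected tool: {tool}")
        if destination not in self.known_destinations:
            alerts.append(f"unusual destination: {destination}")
        return alerts


# Illustrative setup: an email-triage agent allowed three calls per second.
monitor = AgentMonitor({"read_email"}, {"crm.example.com"}, max_calls=3, window_seconds=1.0)
```

Each call to `monitor.check(...)` returns a list of alerts, empty when nothing looks wrong, which can feed whatever alerting pipeline the security team already runs.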

6. Use AI to Defend, Not Just to Build

Anthropic's own Threat Intelligence team used Claude extensively to analyze the enormous amounts of data generated during the espionage investigation. The same capabilities that make agentic AI a potential attack vector also make it a powerful defensive tool — and that symmetry matters.

What to do: Security Operations Centers should be actively piloting AI for threat detection, log analysis, vulnerability assessment, and incident response. Industry threat-sharing, improved detection methods, and stronger safety controls are all critical. The techniques used in the September 2025 campaign will be adopted by many more actors. Meeting them with equivalent AI-powered defenses is not optional — it's the new baseline.

Bottom line

The lesson from this week's news isn't that agentic AI is too dangerous to deploy. It's that the organizations deploying it — including the companies building it — need to hold themselves to a substantially higher security standard than traditional software has ever demanded.

Agents act at speed and scale that humans cannot match. They can be weaponized by adversaries, and their internal scaffolding can leak through simple human error. Both attack surfaces demand attention now, while agentic adoption is still early enough to build good habits into the foundation.

Ashish Kapoor
