Agentic Security

A step-by-step guide to securing AI agents against prompt injection, with defense patterns ranging from simple detection to secure multi-agent architectures.


The Lethal Trifecta

A term coined by Simon Willison: your AI agent is vulnerable if it has all three of the following.

Access to Private Data

Emails, files, credentials, PII, internal docs: usually the whole point of giving an agent tools.

Exposure to Untrusted Content

Emails, web pages, RAG documents, user uploads: any text or images that an attacker can control.

Ability to Exfiltrate

Send emails, make API calls, write to external services: any way the agent can communicate outward.

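Unlike SQL injection, which parameterized queries eliminate, or XSS, which output encoding handles, prompt injection has no equivalent structural fix. An LLM receives instructions and data through the same channel and cannot reliably tell them apart.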
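
The three conditions can be checked mechanically when reviewing an agent's configuration. The following is a minimal sketch, not a prescribed tool: `AgentCapabilities` and `has_lethal_trifecta` are hypothetical names introduced here for illustration, and how you decide each flag for a real agent is up to your own review.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentCapabilities:
    """Hypothetical summary of what one agent deployment can do."""
    reads_private_data: bool      # emails, files, credentials, PII
    sees_untrusted_content: bool  # web pages, inbound email, RAG docs
    can_exfiltrate: bool          # outbound email, API calls, writes


def has_lethal_trifecta(caps: AgentCapabilities) -> bool:
    """Return True only when all three legs are present at once."""
    return (
        caps.reads_private_data
        and caps.sees_untrusted_content
        and caps.can_exfiltrate
    )


# An inbox summarizer with no outbound channel has only two legs.
summarizer = AgentCapabilities(True, True, False)
# The same agent with a "send email" tool has all three.
assistant = AgentCapabilities(True, True, True)
```

Removing any one leg breaks the trifecta, which is why the later defense levels concentrate on cutting off private data access or outbound channels rather than on recognizing every malicious input.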

Threat Model, in a Nutshell

Assume the agent can go rogue. Ask yourself: if this agent is fully compromised right now, what's the worst that can happen? Then design the system so that the worst case is still acceptable.

Blast Radius	Example Setup	Acceptable?
Agent sends one email to the wrong person	Scoped token, approval required	Usually yes
Agent exfiltrates all contacts	Full contact access, outbound network	Usually no
Agent pushes malicious code to prod	Git credentials, CI/CD access	Never
Agent deletes database	DB write credentials in env	Never

Rule of thumb

If the blast radius is unacceptable, you need deterministic controls (isolation, scoped tokens, schema validation), not better prompts.
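
A deterministic control in this sense is ordinary code that runs outside the model and rejects a tool call no matter what the prompt said. Below is a minimal sketch of schema validation for a hypothetical `send_email` tool. The field names, the `ALLOWED_DOMAINS` allowlist, and the `ToolCallRejected` exception are all assumptions made for illustration, and the address check is deliberately simple rather than a full email validator.

```python
ALLOWED_DOMAINS = frozenset({"example.com"})  # hypothetical allowlist
ALLOWED_FIELDS = frozenset({"to", "subject", "body"})
MAX_SUBJECT_LEN = 200


class ToolCallRejected(Exception):
    """Raised when a proposed tool call violates the policy."""


def validate_send_email(args):
    """Check a model-proposed send_email call before executing it."""
    # Reject unknown or missing fields instead of silently ignoring them.
    if not isinstance(args, dict) or set(args) != ALLOWED_FIELDS:
        raise ToolCallRejected("unexpected or missing fields")
    to, subject, body = args["to"], args["subject"], args["body"]
    if not all(isinstance(v, str) for v in (to, subject, body)):
        raise ToolCallRejected("all fields must be strings")
    # Exactly one recipient, and only on an allowlisted domain.
    local, sep, domain = to.rpartition("@")
    if not sep or not local or "@" in local:
        raise ToolCallRejected("malformed recipient")
    if domain.lower() not in ALLOWED_DOMAINS:
        raise ToolCallRejected("recipient domain not allowed")
    if len(subject) > MAX_SUBJECT_LEN:
        raise ToolCallRejected("subject too long")
    return {"to": to, "subject": subject, "body": body}
```

Even if an injected instruction convinces the model to email an outside address, the call fails at this check, so the worst case stays within the allowlisted domain.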

→ Full threat modeling guide: Threat Model

Defense Levels

1. Detection

Filter malicious inputs before they reach your agent. Useful first layer for common attacks.

2. Prompt Engineering

Harden the system prompt. Marginal protection — never rely on this alone.

3. Isolation (Infra)

Containers, network egress controls, least-privilege credentials. Primary blast-radius control.

4. Secure Architecture

Dual-LLM patterns, dry-run mode, typed extraction. Redesign the system so that untrusted content never directly drives privileged actions.

5. Defense in Depth

Layer everything. Raises attacker cost and limits single-layer failures.

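
As one illustration of the architecture level, typed extraction can be sketched as follows. The idea is that a quarantined model reads the untrusted text, and only a small, strictly validated structure crosses over to the privileged side. The `EmailTriage` type, the `Category` values, and `parse_triage` are hypothetical names for this sketch, and it assumes the quarantined model was asked to reply with a JSON object.

```python
import json
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    BILLING = "billing"
    SUPPORT = "support"
    SPAM = "spam"


@dataclass(frozen=True)
class EmailTriage:
    """The only data allowed to leave the quarantined side."""
    category: Category
    urgent: bool


def parse_triage(raw: str) -> EmailTriage:
    """Parse quarantined-model output; raise ValueError on any deviation."""
    data = json.loads(raw)  # JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict) or set(data) != {"category", "urgent"}:
        raise ValueError("unexpected shape")
    if not isinstance(data["category"], str):
        raise ValueError("category must be a string")
    if not isinstance(data["urgent"], bool):
        raise ValueError("urgent must be a boolean")
    # Enum lookup rejects any value outside the fixed set of categories.
    return EmailTriage(Category(data["category"]), data["urgent"])
```

Because the privileged side only ever receives an enum member and a boolean, free-form text from the attacker has no field to travel in. An injected instruction can at worst pick the wrong category.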

Supporting Systems

Agents don't run in isolation. The surrounding surfaces have their own attack patterns and defenses; to keep the guide focused, we cover only the novel ones:

MCP Security

Tool poisoning, rug pulls, and supply-chain risks in MCP servers and tool providers.

Memory & Context Security

Memory poisoning, namespace isolation, and provenance tracking across agent runs.

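
One way to catch a rug pull, where a tool's definition changes after you approved it, is to pin a fingerprint of each definition at review time and compare it on every later listing. The sketch below assumes each tool is available as a plain dict with `name`, `description`, and `inputSchema` keys. The helper names `tool_fingerprint` and `changed_tools` are hypothetical and not part of any MCP SDK.

```python
import hashlib
import json


def tool_fingerprint(tool: dict) -> str:
    """Hash a canonical JSON form of the full tool definition."""
    canonical = json.dumps(tool, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def changed_tools(pinned: dict, current_tools: list) -> list:
    """Return names of tools that are new or differ from their pin.

    `pinned` maps tool name to the fingerprint recorded at approval.
    """
    flagged = []
    for tool in current_tools:
        name = tool["name"]
        if pinned.get(name) != tool_fingerprint(tool):
            flagged.append(name)
    return flagged


# Record the fingerprint once, when a human reviews the tool.
approved = {
    "name": "read_file",
    "description": "Read a file from the workspace.",
    "inputSchema": {"type": "object"},
}
pins = {approved["name"]: tool_fingerprint(approved)}
```

A flagged tool should be withheld from the agent until someone reviews it again. Hashing the description as well as the schema matters because tool poisoning typically hides instructions in the description text.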

Getting Started

New here? Read the Principles first: they give you the mental model before any code.
Working through the material? Start the Guide at Vulnerabilities.
Just need the answer? Jump to the Cheatsheet.
Want step-by-step learning examples? See the notebooks in the repository.