Skip to content

Agentic Security

A step-by-step guide to securing AI agents against prompt injection, with defense patterns from simple detection to secure multi-agent architectures.

  Start with Principles   Read the Guide


The Lethal Trifecta

Coined by Simon Willison: your AI agent is vulnerable if it has all three.

  • Access to Private Data


    Emails, files, credentials, PII, internal docs — usually the whole point of giving an agent tools.

  • Exposure to Untrusted Content


    Emails, web pages, RAG documents, user uploads — any text or images controlled by an attacker.

  • Ability to Exfiltrate


    Send emails, make API calls, write to external services — any way the agent can communicate outward.

Unlike SQL injection or XSS, there's no parameterized-query equivalent for LLMs. Instructions and data flow through the same channel.


Threat Model, in a Nutshell

Assume the agent can go rogue. Ask yourself: if this agent is fully compromised right now, what's the worst that can happen? Then design the system so that the worst case is still acceptable.

Blast Radius Example Acceptable?
Agent sends 1 email to wrong person Scoped token, approval required Usually yes
Agent exfiltrates all contacts Full contact access, outbound network Usually no
Agent pushes malicious code to prod Git credentials, CI/CD access Never
Agent deletes database DB write credentials in env Never

Rule of thumb

If the blast radius is unacceptable, you need deterministic controls (isolation, scoped tokens, schema validation), not better prompts.

→ Full threat modeling guide: Threat Model


Defense Levels

  • 1. Detection


    Filter malicious inputs before they reach your agent. Useful first layer for common attacks.

    Detection

  • 2. Prompt Engineering


    Harden the system prompt. Marginal protection — never rely on this alone.

    Prompt Engineering

  • 3. Isolation (Infra)


    Containers, network egress controls, least-privilege credentials. Primary blast-radius control.

    Isolation

  • 4. Secure Architecture


    Dual-LLM patterns, dry-run mode, typed extraction. Redesign the system.

    Secure Architecture

  • 5. Defense in Depth


    Layer everything. Raises attacker cost and limits single-layer failures.

    Defense in Depth


Supporting Systems

Agents don't run in isolation. The surrounding surfaces have their own attack patterns and defenses; in order to keep the guide contained, we focus on the novel ones:

  • MCP Security


    Tool poisoning, rug pulls, and supply-chain risks in MCP servers and tool providers.

    MCP Security

  • Memory & Context Security


    Memory poisoning, namespace isolation, and provenance tracking across agent runs.

    Memory Security


Getting Started

  •   New here? Read the Principles first, the mental model before any code.
  •   Working through the material? Start the Guide at Vulnerabilities.
  •   Just need the answer? Jump to the Cheatsheet.
  •   Want step-by-step learning examples? See the notebooks in the repository.