Pattern Tradeoffs¶
A practical comparison of each defense pattern.
The table below is directional. It is meant to compare design tradeoffs, not to claim measured coverage percentages across real deployments.
Quick Reference¶
| Pattern | Complexity | Latency | Cost | Security Effect | Repo Label |
|---|---|---|---|---|---|
| Baseline | None | 1x | 1x | None | Teaching example |
| Delimiters | Low | 1x | 1x | Limited, defense-in-depth only | Defense-in-depth layer |
| Dual LLM | Medium | 2x | 2x | Strong architectural separation | Production-hardenable component |
| Typed Extraction | Medium | 2x | 2x | Strong when schemas stay tight | Production-hardenable component |
| Dry-Run | High | 3x | 3x | Strong review layer, still probabilistic | Production-hardenable component |
| Combined | Very High | 4-5x | 4-5x | Layered resilience | High-risk reference architecture |
Pattern 0: Baseline¶
Repo label: Teaching example.
What it does: Nothing. Raw untrusted content goes to LLM with full tool access.
Protects against: Nothing.
When to use: Never in production. Useful as a benchmark.
Tradeoffs: None—this is the vulnerable default state.
Pattern 1: Random Token Delimiters¶
Repo label: Defense-in-depth layer.
What it does: Wraps untrusted content in randomized delimiters. Instructs the LLM to treat content between delimiters as data, not commands.
Protects against: - Naive injections that don't account for delimiters - Attacks crafted for static delimiter patterns - Basic instruction override attempts
Doesn't protect against: - "Ignore anything between those delimiters" attacks - Social engineering that convinces LLM the delimiters don't apply - Multi-turn manipulation
Implementation complexity: Low (~20 lines of code)
Latency impact: None (no extra LLM calls)
Cost impact: None
When to use: As a minimal first layer. Good defense-in-depth addition.
Honest assessment: Raises the bar, but a determined attacker will bypass it. Think of it as a speed bump, not a wall.
Pattern 2: Dual LLM (Quarantined + Privileged)¶
Repo label: Production-hardenable component.
What it does: - Quarantined LLM processes untrusted content with NO tools - Controller validates and passes sanitized output - Privileged LLM has tools but is designed not to receive raw untrusted content
Protects against: - Tool abuse via indirect injection - Direct command execution from untrusted content - Most data exfiltration attempts
Doesn't protect against: - Information leakage through summaries (carefully crafted content that makes the summary malicious) - Multi-turn attacks across sessions - If quarantined LLM's summary itself contains injection
Implementation complexity: Medium
Latency impact: 2x (two LLM calls)
Cost impact: 2x
When to use: When you have tools that can take dangerous actions (send email, API calls, file writes).
Honest assessment: This is one of the strongest patterns in the repo because it changes the data flow, not just the prompt. The complexity is often justified for agents with meaningful write or outbound capabilities.
Pattern 3: Typed Data Extraction¶
Repo label: Production-hardenable component.
What it does: Extracts structured data (JSON with strict schema) instead of passing raw text. The schema constrains what information can flow between agents.
Protects against: - Freeform instruction injection - Payload hidden in text formatting - Long-form social engineering
Doesn't protect against: - Attacks that fit within schema fields - Schema fields that can carry intent (e.g., a "notes" field) - Enum poisoning if enums are too permissive
Implementation complexity: Medium
Latency impact: 2x
Cost impact: 2x (potentially less if extraction model is smaller)
When to use: When you can define strict schemas for the data you need. Works well for structured tasks (email triage, form processing, data extraction).
Honest assessment: Underrated pattern. The key insight is that a JSON schema with max_length=50 fields simply cannot carry a sophisticated injection. The attack surface is the schema design itself.
Best practice: Keep schema fields restrictive: - Use enums instead of strings where possible - Set strict max_length on all string fields - Avoid "notes" or "other" catch-all fields - Single-word fields are safer than sentences
Pattern 4: Dry-Run Evaluation¶
Repo label: Production-hardenable component.
What it does: 1. Generate a plan of actions (don't execute) 2. Evaluate the plan with a separate system 3. Execute only if approved
Protects against: - Malicious actions getting executed without review - Actions that don't match the original request - Obvious exfiltration attempts
Doesn't protect against: - Sophisticated attacks that also fool the evaluator - Plans that look legitimate but have malicious intent - Evaluator prompt injection
Implementation complexity: High
Latency impact: 3x (plan + evaluate + optional execute)
Cost impact: 3x
When to use: High-stakes applications where the cost of a wrong action is significant.
Honest assessment: Shifts from predicting dangerous inputs to evaluating proposed outputs—which is fundamentally easier. But you're still relying on an LLM to make security decisions. Consider adding deterministic rules on top (allowlists, rate limits).
Best practice: The evaluator should be a different model, or at minimum a fresh context. Ideally add rule-based checks:
def validate_plan(plan):
for action in plan.actions:
if action.tool == "send_email":
if action.params["to"] not in ALLOWLIST:
return False
return True
Pattern 5: Combined Defense¶
Repo label: High-risk reference architecture.
What it does: Layers multiple defenses across all five Defense Levels. One example stack — yours will vary by threat model and operational capacity:
- Detection — input scanning + canary tokens
- Isolation — container, network restrictions, scoped credentials
- Prompt hardening — random delimiters
- Typed extraction — constrained schemas between agents
- Plan generation — separated from execution
- LLM-based evaluation — independent review of the plan
- Deterministic output validation — final guardrails before execution
The point isn't this specific ordering — it's that no single layer is enough, and the right stack depends on your system. The principle is "assume each layer can fail; combine layers so each catches what the others miss."
Protects against: Multiple attack vectors simultaneously. Each layer catches what previous layers miss.
Doesn't protect against: - Novel attacks designed against this specific architecture - Supply chain attacks (malicious dependencies) - Insider threats
Implementation complexity: Very high
Latency impact: 4-5x
Cost impact: 4-5x
When to use: High-security applications where the asset value justifies the complexity and cost.
Honest assessment: This is "assume breach at each layer" thinking. Each component can fail, but the combination is robust. The question is whether the operational complexity is worth it for your use case.
When it's NOT worth it: - Internal tools with trusted users - Low-stakes applications - High-volume, cost-sensitive applications
When it IS worth it: - Customer-facing agents with tool access - Financial or healthcare applications - Systems handling credentials or PII - Any system where "oops" isn't acceptable
The Meta-Tradeoff¶
All of these patterns share a fundamental tension:
More isolation = less usefulness
- The safest agent has no tools → useless
- The safest agent sees no untrusted data → limited
- The safest agent can't act without approval → slow
The goal isn't perfect security (impossible). It's finding the right balance for your specific: - Threat model - Asset value at risk - User experience requirements - Operational capacity
Recommendation¶
The patterns on this page are the software-level layer of defense. Isolation and detection come first in the deployment order from Principles — and many production systems will be safe enough with isolation alone. When the threat model justifies adding software-level defenses, the patterns above complement each other rather than compete:
- Typed Extraction (Pattern 3) fits structured tasks well — the schema itself becomes the security boundary. Often the simplest software pattern to add.
- Dual LLM (Pattern 2) is worth the 2x cost when the privileged agent has tools that can exfiltrate or take destructive actions. Pairs well with typed extraction.
- Deterministic Output Validation is always worth adding — cheap and catches obvious mistakes an LLM evaluator would miss.
- Human-in-the-Loop belongs on irreversible or exfiltration-capable actions, no matter which other patterns are in place.
- Dry-Run Evaluation (Pattern 4) suits high-stakes plan-then-execute flows where you can absorb the latency.
Skip delimiters-only (Pattern 1) as a primary defense — it's not enough alone. Use it as one layer in a stack.
Skip the full Combined Defense (Pattern 5) unless you're in a high-security context with the engineering capacity to maintain it.