Defense in Depth¶

Layer all techniques together. Assume breach at each layer.

Try the notebooks

For runnable examples, see notebooks/5_defense_in_depth/.

Repo label: High-risk reference architecture

Layering every defense is a stronger starting point for high-stakes systems, but it still requires environment-specific hardening: tuned policies, real credentials, and operational monitoring.

The Philosophy¶

No single defense is perfect. Each layer catches what the previous missed:

Input → Detection (YARA, ML) → catches many common attacks
          ↓ (adaptive bypasses still exist)
     Delimiters → adds another boundary
          ↓ (still not sufficient alone)
     Typed Extraction → payload must fit schema
          ↓
     Dry-Run Evaluation → intent mismatch reviewed
          ↓
     Deterministic Validation → unknown recipients blocked
          ↓
     Execute (if all pass)

Even if an attacker bypasses detection, delimiters clarify what the LLM should treat as data. Even if delimiters fail, typed extraction reduces payload capacity. Even if extraction is tricked, the dry-run evaluator can catch intent mismatch. Even if the evaluator is fooled, deterministic rules can still block known-bad actions such as unknown recipients. Every layer assumes the previous one was breached.

The Five Layers¶

┌─────────────────────────────────────────────────────────────────┐
│  Layer 1: Random Delimiters                                     │
│      Mark untrusted content boundaries                          │
│  └─▶ Layer 2: Typed Extraction                                  │
│          Constrain data to strict schema                        │
│      └─▶ Layer 3: Plan Generation                               │
│              Generate actions without executing                 │
│          └─▶ Layer 4: LLM Security Evaluation                   │
│                  Evaluate plan for risks                        │
│              └─▶ Layer 5: Deterministic Validation              │
│                      Rule-based checks (known contacts, etc.)   │
│                  └─▶ Execute (only if ALL layers pass)          │
└─────────────────────────────────────────────────────────────────┘

Layer-by-Layer Breakdown¶

Layer	What It Does	What It Catches
1. Delimiters	Marks untrusted boundaries	Naive injection attempts
2. Typed Extraction	Constrains data to schema	Payload can't fit in fields
3. Plan Generation	Separates planning from execution	N/A (setup for layer 4)
4. LLM Evaluation	Reviews plan for safety	Intent mismatch, suspicious actions
5. Deterministic	Rule-based validation	Unknown recipients, policy violations

Even if one layer fails, others catch the attack.

The Tradeoff¶

Metric	Baseline	Detection Only	Full Defense
Latency	1x	1.1x	4-5x
Cost	1x	1.1x	4-5x
Complexity	Low	Low	High
Security Effect	None	Limited, probabilistic filtering	Layered resilience and containment

Defense in depth is expensive. Use it when the stakes justify the cost.

When to Use Full Defense vs When It's Overkill¶

✅ Worth the complexity	❌ Overkill
Customer-facing agents with tool access	Internal tools with trusted users
Financial transactions	Low-stakes applications
Healthcare/legal applications	High-volume, cost-sensitive systems
Systems handling credentials/PII	Read-only assistants
Where "oops" isn't acceptable	Prototype/demo systems

The Meta-Insight¶

The question isn't "is this secure?" — nothing is perfectly secure.

The question is: Does the extra resilience justify the complexity and cost?

For most production systems, detection + some architecture (Levels 2–3) provides good balance. Full defense in depth is for when you truly can't afford failures.

The Cost¶

Metric	Value
LLM Calls	3-4x baseline
Latency	4-5x baseline
Complexity	High (many moving parts)
Maintenance	Schemas, rules, evaluator prompts

Is it worth it?

For most systems: No. Detection + architectural patterns provides good balance.

For high-stakes systems (payments, healthcare, credentials): Yes.

References¶

Simon Willison — Dual LLM Pattern
Microsoft — Spotlighting
StruQ — Structured Queries
Google DeepMind — CaMeL