Skip to content

Defense in Depth

Layer all techniques together. Assume breach at each layer.

Try the notebooks

For runnable examples, see notebooks/5_defense_in_depth/.

Repo label: High-risk reference architecture

Layering every defense is a stronger starting point for high-stakes systems, but it still requires environment-specific hardening: tuned policies, real credentials, and operational monitoring.


The Philosophy

No single defense is perfect. Each layer catches what the previous missed:

Input → Detection (YARA, ML) → catches many common attacks
          ↓ (adaptive bypasses still exist)
     Delimiters → adds another boundary
          ↓ (still not sufficient alone)
     Typed Extraction → payload must fit schema
     Dry-Run Evaluation → intent mismatch reviewed
     Deterministic Validation → unknown recipients blocked
     Execute (if all pass)

Even if an attacker bypasses detection, delimiters clarify what the LLM should treat as data. Even if delimiters fail, typed extraction reduces payload capacity. Even if extraction is tricked, the dry-run evaluator can catch intent mismatch. Even if the evaluator is fooled, deterministic rules can still block known-bad actions such as unknown recipients. Every layer assumes the previous one was breached.


The Five Layers

┌─────────────────────────────────────────────────────────────────┐
│  Layer 1: Random Delimiters                                     │
│      Mark untrusted content boundaries                          │
│  └─▶ Layer 2: Typed Extraction                                  │
│          Constrain data to strict schema                        │
│      └─▶ Layer 3: Plan Generation                               │
│              Generate actions without executing                 │
│          └─▶ Layer 4: LLM Security Evaluation                   │
│                  Evaluate plan for risks                        │
│              └─▶ Layer 5: Deterministic Validation              │
│                      Rule-based checks (known contacts, etc.)   │
│                  └─▶ Execute (only if ALL layers pass)          │
└─────────────────────────────────────────────────────────────────┘

Layer-by-Layer Breakdown

Layer What It Does What It Catches
1. Delimiters Marks untrusted boundaries Naive injection attempts
2. Typed Extraction Constrains data to schema Payload can't fit in fields
3. Plan Generation Separates planning from execution N/A (setup for layer 4)
4. LLM Evaluation Reviews plan for safety Intent mismatch, suspicious actions
5. Deterministic Rule-based validation Unknown recipients, policy violations

Even if one layer fails, others catch the attack.


The Tradeoff

Metric Baseline Detection Only Full Defense
Latency 1x 1.1x 4-5x
Cost 1x 1.1x 4-5x
Complexity Low Low High
Security Effect None Limited, probabilistic filtering Layered resilience and containment

Defense in depth is expensive. Use it when the stakes justify the cost.


When to Use Full Defense vs When It's Overkill

✅ Worth the complexity ❌ Overkill
Customer-facing agents with tool access Internal tools with trusted users
Financial transactions Low-stakes applications
Healthcare/legal applications High-volume, cost-sensitive systems
Systems handling credentials/PII Read-only assistants
Where "oops" isn't acceptable Prototype/demo systems

The Meta-Insight

The question isn't "is this secure?" — nothing is perfectly secure.

The question is: Does the extra resilience justify the complexity and cost?

For most production systems, detection + some architecture (Levels 2–3) provides good balance. Full defense in depth is for when you truly can't afford failures.


The Cost

Metric Value
LLM Calls 3-4x baseline
Latency 4-5x baseline
Complexity High (many moving parts)
Maintenance Schemas, rules, evaluator prompts

Is it worth it?

For most systems: No. Detection + architectural patterns provides good balance.

For high-stakes systems (payments, healthcare, credentials): Yes.


References