
RPG Context & RAG System Design

Status: Draft for Review
Author: Claude + Human
Date: 2025-12-31
Scope: palimpsest context injection and retrieval-augmented generation


Executive Summary

The RPG DM agent needs context about the campaign world to generate accurate, immersive responses. This document analyzes our current context injection system, identifies gaps, and proposes a hybrid architecture combining deterministic injection with semantic RAG retrieval.

Key Question: When a player says "I go to the market," how does the DM know:

  • Whether we've been there before?
  • Who the vendors are?
  • What the setting looks like?
  • Whether to use existing canon or generate new content?


Table of Contents

  1. Current Architecture
  2. Problem Analysis
  3. Design Considerations
  4. Proposed Hybrid Architecture
  5. Trigger Detection System
  6. Implementation Phases
  7. Cost Analysis
  8. Trade-offs & Risks
  9. Open Questions

1. Current Architecture

1.1 Deterministic Context Injection

Location: api/context/builder.py

Every message to the DM agent is prefixed with deterministic context pulled directly from vault files:

## SESSION CONTEXT: Seagate
**Day 5**

### Player Character
**Jake** (Wizard)
- HP: 13/13
- Location: the-salty-sigil
- Gold: 45 gp

### Current Location
**The Salty Sigil** - A curio shop dealing in rare magical components

### NPCs Present
- **Marlena** (Component dealer) [active] - intimate-ally

### Active Storylines
- **[URGENT]** The Fading Muffle
  The stolen token thrums faintly—someone's scrying | DEADLINE: 36 hours

What's Injected:

| Data | Source | Always/Conditional |
|------|--------|-------------------|
| PC state (HP, gold, conditions) | canon/pcs/*.md frontmatter | Always |
| Current location name/description | canon/locations/*.md | Always |
| NPCs at current location | canon/npcs/*.md filtered by location | Always |
| Active threads (high/medium) | canon/open-threads.md | Always |
| Recent timeline events | canon/timeline.md | Always |

1.2 Tool-Based Retrieval

Location: api/agent/tools.py

The DM agent has tools it can call during generation:

| Tool | Purpose | When Used |
|------|---------|-----------|
| get_entity(type, id) | Fetch full NPC/location/item details | When player mentions entity not in context |
| search_entities(query) | Text search across entities | When looking for something by description |
| get_session_context() | Refresh full context | Start of session |

Problem: The DM must recognize it needs to call these tools. If it doesn't realize "the market" is a known location, it will generate a new one instead of retrieving canon.

1.3 RAG Infrastructure (New)

Location: api/rag/

We now have vector embeddings infrastructure:

pgvector database (palimpsest namespace)
├── embeddings table (150 chunks for seagate)
│   ├── NPCs (8 docs, 18 chunks)
│   ├── Locations (11 docs, 20 chunks)
│   ├── Items (8 docs, 16 chunks)
│   ├── Sessions (5 docs, 51 chunks)
│   ├── Secrets (3 docs, 3 chunks)
│   └── Plots (6 docs, 42 chunks)
├── rag_config table (per-campaign settings)
└── search_similar() function

API Endpoints:

  • POST /campaigns/{id}/rag/index - Index vault documents
  • POST /campaigns/{id}/rag/search - Semantic similarity search
  • GET /campaigns/{id}/rag/stats - Indexing statistics
  • GET/PATCH /campaigns/{id}/rag/config - Toggle RAG per campaign

Not Yet Integrated: RAG is available but not wired into the chat flow.


2. Problem Analysis

2.1 The Market Problem

Scenario: Player says "I go to the market"

Current Behavior:

  1. Context injection includes current location (the-salty-sigil)
  2. Context does NOT include "the market" details
  3. DM agent sees no market info in context
  4. DM might:
     • Generate a new market (wrong if we have canon)
     • Call search_entities("market") (correct but unreliable)
     • Ask player to clarify (breaks immersion)

Desired Behavior:

  1. System detects "market" as a potential location
  2. RAG searches for market-related content
  3. If found: inject canon details before DM generates
  4. If not found: DM generates new content, system indexes it
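The desired behavior can be sketched as a single pre-generation step. This is a minimal illustration, not the retriever in api/rag/: the `Retrieved` record, the `search` callable, and the 0.5 default threshold are assumptions chosen to match the values discussed later in this document.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Retrieved:
    """Hypothetical shape of one RAG hit."""
    source_type: str
    source_id: str
    text: str
    similarity: float


def build_retrieved_context(
    triggers: list[str],
    search: Callable[[str], list[Retrieved]],
    threshold: float = 0.5,
) -> tuple[list[Retrieved], list[str]]:
    """Return (canon hits to inject, triggers that had no canon match)."""
    hits: list[Retrieved] = []
    unmatched: list[str] = []
    for trigger in triggers:
        # Keep only results that clear the similarity threshold.
        matches = [r for r in search(trigger) if r.similarity >= threshold]
        if matches:
            hits.extend(matches)       # step 3: inject canon before generation
        else:
            unmatched.append(trigger)  # step 4: DM generates, index afterwards
    return hits, unmatched
```

The second return value gives the caller a list of terms the DM is likely to invent content for, which is the natural hook for indexing new material after the response.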

2.2 Context Gaps

| Gap | Example | Impact |
|-----|---------|--------|
| Locations not at current position | "What's the Gilded Quill like?" | DM may invent contradictory details |
| NPCs not present | "Where is Jorah?" | DM guesses instead of checking canon |
| Historical events | "What happened at the grotto?" | DM lacks session history |
| Secrets/plots | "Why does Vorlan want the stone?" | DM can't access _gm/secrets |
| Item details | "What does my token do?" | DM may forget item properties |
| Relationships | "How does Marlena know Elara?" | Relationship graph not surfaced |

2.3 Why Tools Alone Don't Solve This

  1. Recognition Failure: DM doesn't know what it doesn't know
  2. Latency: Tool calls happen mid-generation, causing delays
  3. Inconsistency: Sometimes calls tools, sometimes generates
  4. No Preemption: Can't inject context before generation starts

3. Design Considerations

3.1 Design Principles

  1. Canon Supremacy: Existing vault content always takes precedence over generation
  2. Immersion First: Context retrieval should be invisible to player
  3. Fail Safe: When uncertain, retrieve more context rather than less
  4. Cost Aware: Embedding API calls cost money; be strategic
  5. Latency Budget: Total context build should be <500ms

3.2 Context Categories

| Category | Injection Strategy | Rationale |
|----------|--------------------|-----------|
| Critical State | Always deterministic | PC HP, location, conditions must never be wrong |
| Immediate Environment | Always deterministic | Current location, present NPCs are always relevant |
| Active Narrative | Always deterministic | Threads, deadlines drive gameplay |
| Referenced Entities | RAG on mention | Only retrieve what player asks about |
| Historical Context | RAG on trigger | Session history is huge, retrieve selectively |
| GM Secrets | RAG with filtering | Only surface secrets player could know |

3.3 Trigger Types

When should RAG activate?

| Trigger | Example | Search Strategy |
|---------|---------|-----------------|
| Explicit Question | "Who is Vorlan?" | Search NPCs for "Vorlan" |
| Location Reference | "I go to the market" | Search locations for "market" |
| Historical Reference | "What happened last time?" | Search sessions for recent events |
| Item Reference | "I use the token" | Search items for "token" |
| Relationship Query | "Does Marlena know Elara?" | Search NPCs for both names |
| Topic Keyword | "Tell me about the plague" | Search all for "plague" |

3.4 What NOT to RAG

  • Current location details - Already injected deterministically
  • Present NPCs - Already injected deterministically
  • PC state - Never from RAG, always from frontmatter
  • Active threads - Already injected deterministically
  • Generic actions - "I attack" doesn't need retrieval

4. Proposed Hybrid Architecture

4.1 Architecture Diagram

┌─────────────────────────────────────────────────────────────────┐
│                        User Message                              │
│                   "I head to the market"                         │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    Trigger Analyzer                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │   Entity    │  │  Location   │  │  Historical │              │
│  │  Extractor  │  │  Detector   │  │   Trigger   │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
│         │                │                │                      │
│         └────────────────┼────────────────┘                      │
│                          ▼                                       │
│              Trigger List: ["market"]                            │
└─────────────────────────────────────────────────────────────────┘
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌──────────────────────┐        ┌──────────────────────┐
│  Deterministic Build │        │     RAG Retrieval    │
│  (Always Runs)       │        │  (If Triggers Found) │
│                      │        │                      │
│  • PC State          │        │  Query: "market"     │
│  • Current Location  │        │  → Location match    │
│  • NPCs Present      │        │  → NPC vendors       │
│  • Active Threads    │        │  → Session mentions  │
│  • Recent Events     │        │                      │
└──────────────────────┘        └──────────────────────┘
              │                               │
              └───────────────┬───────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                    Context Assembly                              │
│                                                                  │
│  ## SESSION CONTEXT                                              │
│  [Deterministic: PC, Location, NPCs, Threads]                   │
│                                                                  │
│  ## RETRIEVED CONTEXT                                            │
│  [RAG: Market location details, vendor NPCs]                    │
│                                                                  │
│  ---                                                             │
│  PLAYER: I head to the market                                    │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│                       DM Agent                                   │
│                                                                  │
│  Has full context → Generates accurate response                  │
│  Uses canon market details, not invented ones                    │
└─────────────────────────────────────────────────────────────────┘

4.2 Context Assembly Order

CONTEXT_TEMPLATE = """
## SESSION CONTEXT: {campaign_name}
**Day {day}**{time}

### Player Character
{pc_block}

### Current Location
{location_block}

### NPCs Present
{npcs_block}

### Active Storylines
{threads_block}

{retrieved_context}

---
PLAYER: {user_message}
"""

Retrieved Context Section (when present):

## Retrieved Context
The following information was retrieved based on your message:

### [LOCATION] The Morning Market
A bustling open-air market in Seagate's merchant quarter...

### [NPC] Old Tam (Fish vendor)
A weathered fisherman who sells the morning catch...
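One way to render that section is a small formatter that returns an empty string when nothing was retrieved, so the `{retrieved_context}` slot in the template collapses cleanly. The `RetrievedChunk` type and its field names are assumptions for illustration; the actual retriever may return a different shape.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    """Hypothetical record for one retrieved document chunk."""
    source_type: str  # e.g. "location", "npc"
    title: str        # display name, e.g. "The Morning Market"
    text: str         # chunk body to show the DM


def format_retrieved_context(chunks: list[RetrievedChunk]) -> str:
    """Render RAG hits in the '## Retrieved Context' layout shown above."""
    if not chunks:
        return ""  # no section at all when RAG found nothing
    lines = [
        "## Retrieved Context",
        "The following information was retrieved based on your message:",
    ]
    for chunk in chunks:
        lines.append("")  # blank line before each entry
        lines.append(f"### [{chunk.source_type.upper()}] {chunk.title}")
        lines.append(chunk.text)
    return "\n".join(lines)
```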

4.3 Deduplication

RAG results must not duplicate deterministic context:

def dedupe_rag_results(rag_results, deterministic_ctx):
    """Remove RAG results that duplicate deterministic context."""

    # Get IDs already in deterministic context
    present_npcs = {npc.id for npc in deterministic_ctx.npcs_present}
    current_loc = deterministic_ctx.location

    # Filter RAG results
    filtered = []
    for result in rag_results:
        if result.source_type == "npc" and result.source_id in present_npcs:
            continue  # Already in "NPCs Present"
        if result.source_type == "location" and result.source_id == current_loc:
            continue  # Already in "Current Location"
        filtered.append(result)

    return filtered

5. Trigger Detection System

5.1 Trigger Types & Patterns

from enum import Enum


class TriggerType(Enum):
    ENTITY_QUESTION = "entity_question"      # "Who is X?"
    LOCATION_MOVEMENT = "location_movement"  # "I go to X"
    LOCATION_QUERY = "location_query"        # "What's X like?"
    HISTORICAL = "historical"                # "What happened..."
    ITEM_REFERENCE = "item_reference"        # "I use the X"
    TOPIC_QUERY = "topic_query"              # "Tell me about X"


# Patterns are written in lowercase: match against message.lower()
# or compile them with re.IGNORECASE.
TRIGGER_PATTERNS = {
    TriggerType.ENTITY_QUESTION: [
        r"who is (\w+)",
        r"where is (\w+)",
        r"what does (\w+) want",
        r"how is (\w+)",
    ],
    TriggerType.LOCATION_MOVEMENT: [
        r"i (?:go|head|walk|travel|return) to (?:the )?(.+?)(?:\.|$)",
        r"let'?s (?:go|head) to (?:the )?(.+?)(?:\.|$)",
        r"i want to visit (?:the )?(.+?)(?:\.|$)",
    ],
    TriggerType.LOCATION_QUERY: [
        r"what(?:'s| is) (?:the )?(.+?) like",
        r"describe (?:the )?(.+)",
    ],
    TriggerType.HISTORICAL: [
        r"what happened (?:at|in|to|with) (.+)",
        r"last time (?:at|in|we) (.+)",
        r"remember (?:when|the) (.+)",
        r"previously (.+)",
    ],
    TriggerType.ITEM_REFERENCE: [
        r"i (?:use|examine|look at|inspect) (?:my |the )?(.+)",
        r"what does (?:my |the )?(.+?) do",
    ],
    # "Tell me about X" is a topic query (see 3.3), so it searches all sources.
    TriggerType.TOPIC_QUERY: [
        r"tell me about (?:the )?(.+?)(?:\.|$)",
    ],
}
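A pattern table like this still needs a matcher. The sketch below shows one possible shape using a two-type subset of the patterns; `detect_triggers` and `COMPILED` are hypothetical names, and compiling with `re.IGNORECASE` is an assumption made so that "I head to..." matches the lowercase patterns.

```python
import re
from enum import Enum


class TriggerType(Enum):
    ENTITY_QUESTION = "entity_question"
    LOCATION_MOVEMENT = "location_movement"


# Subset of the trigger patterns, kept short for the example.
PATTERNS = {
    TriggerType.ENTITY_QUESTION: [r"who is (\w+)", r"where is (\w+)"],
    TriggerType.LOCATION_MOVEMENT: [
        r"i (?:go|head|walk|travel|return) to (?:the )?(.+?)(?:\.|$)",
    ],
}

# Compile once at import time; IGNORECASE handles the capital "I".
COMPILED = {
    trigger_type: [re.compile(p, re.IGNORECASE) for p in patterns]
    for trigger_type, patterns in PATTERNS.items()
}


def detect_triggers(message: str) -> list[tuple[TriggerType, str]]:
    """Return (trigger type, captured term) pairs found in the message."""
    text = message.strip()
    found = []
    for trigger_type, patterns in COMPILED.items():
        for pattern in patterns:
            match = pattern.search(text)
            if match:
                found.append((trigger_type, match.group(1).strip()))
                break  # one hit per trigger type is enough
    return found
```

With this subset, `detect_triggers("I head to the market")` yields a single location-movement trigger with the term `market`, and a message such as "I attack" yields an empty list.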

5.2 Entity Extraction

Beyond regex, extract proper nouns:

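import re

# Placeholder stop list (contents are an assumption). Sentence-initial words
# such as "Where" are capitalized too, so they must be filtered here.
COMMON_WORDS = {"the", "what", "where", "who", "how", "why", "when"}


def extract_entities(message: str, known_context: set[str]) -> list[str]:
    """Extract potential entity names from message.

    known_context holds lowercased names already in deterministic context.
    """

    entities = []

    # Capitalized words (potential proper nouns)
    caps = re.findall(r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\b', message)

    for word in caps:
        word_lower = word.lower()
        # Skip if already in deterministic context
        if word_lower in known_context:
            continue
        # Skip common words
        if word_lower in COMMON_WORDS:
            continue
        entities.append(word)

    return entities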
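The extractor depends on `known_context` being populated with lowercased names. A minimal sketch of building that set from the deterministic context follows; `build_known_context` and its parameters are hypothetical, and stripping a leading "the" is an assumption so that "the Salty Sigil" in a message (captured as "Salty Sigil") still matches the location "The Salty Sigil".

```python
def build_known_context(
    pc_name: str,
    location_name: str,
    npc_names: list[str],
) -> set[str]:
    """Collect lowercased names that are already injected deterministically."""
    known: set[str] = set()
    for name in (pc_name, location_name, *npc_names):
        lowered = name.lower()
        known.add(lowered)
        # Also store the name without a leading article.
        if lowered.startswith("the "):
            known.add(lowered[4:])
    return known
```

For the session shown in 1.1, `build_known_context("Jake", "The Salty Sigil", ["Marlena"])` would contain `jake`, `marlena`, `the salty sigil`, and `salty sigil`.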

5.3 Search Strategy by Trigger

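| Trigger Type | Source Types | Limit | Threshold |
|--------------|--------------|-------|-----------|
| Entity Question | npc, faction | 3 | 0.5 |
| Location Movement | location | 3 | 0.5 |
| Location Query | location, session | 5 | 0.4 |
| Historical | session, plot | 5 | 0.4 |
| Item Reference | item | 2 | 0.5 |
| Topic Query | all | 5 | 0.4 |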
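The table translates directly into a lookup keyed by trigger type. In this sketch `SearchParams` and `SEARCH_STRATEGY` are hypothetical names, and representing "all" as an empty tuple is an assumption about how the retriever would interpret a missing source filter.

```python
from dataclasses import dataclass
from enum import Enum


class TriggerType(Enum):
    ENTITY_QUESTION = "entity_question"
    LOCATION_MOVEMENT = "location_movement"
    LOCATION_QUERY = "location_query"
    HISTORICAL = "historical"
    ITEM_REFERENCE = "item_reference"
    TOPIC_QUERY = "topic_query"


@dataclass(frozen=True)
class SearchParams:
    source_types: tuple[str, ...]  # empty tuple means "search all source types"
    limit: int                     # maximum results to inject
    threshold: float               # minimum similarity to accept


# Values copied from the search strategy table.
SEARCH_STRATEGY = {
    TriggerType.ENTITY_QUESTION: SearchParams(("npc", "faction"), 3, 0.5),
    TriggerType.LOCATION_MOVEMENT: SearchParams(("location",), 3, 0.5),
    TriggerType.LOCATION_QUERY: SearchParams(("location", "session"), 5, 0.4),
    TriggerType.HISTORICAL: SearchParams(("session", "plot"), 5, 0.4),
    TriggerType.ITEM_REFERENCE: SearchParams(("item",), 2, 0.5),
    TriggerType.TOPIC_QUERY: SearchParams((), 5, 0.4),
}
```

Keeping the values in one frozen structure means the limits and thresholds can be tuned from retrieval logs without touching the trigger code.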

5.4 Confidence Scoring

Not all triggers should fire RAG:

def should_rag(trigger: str, trigger_type: TriggerType, ctx: SessionContext) -> bool:
    """Decide if RAG is warranted for this trigger."""

    # High confidence triggers - always RAG
    if trigger_type in [TriggerType.ENTITY_QUESTION, TriggerType.LOCATION_MOVEMENT]:
        return True

    # Check if entity exists in vault (fast file check)
    if vault_entity_exists(trigger):
        return True

    # Low confidence + no vault match - skip RAG
    # Let DM generate or ask for clarification
    return False
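`vault_entity_exists` is not defined in this document. A minimal sketch of the "fast file check" follows, assuming entity files are named with kebab-case slugs (as in `the-salty-sigil`) under per-type folders. The explicit `canon_root` parameter, the `items` folder, and the `slugify` helper are assumptions for illustration, and the signature differs from the one-argument call above.

```python
import re
from pathlib import Path

# Assumed subfolders of canon/; npcs and locations appear in 1.1, items is a guess.
ENTITY_DIRS = ("npcs", "locations", "items")


def slugify(name: str) -> str:
    """Turn 'The Salty Sigil' into 'the-salty-sigil'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def vault_entity_exists(name: str, canon_root: Path) -> bool:
    """Check for <canon_root>/<type>/<slug>.md without reading file contents."""
    slug = slugify(name)
    if not slug:
        return False
    return any((canon_root / d / f"{slug}.md").is_file() for d in ENTITY_DIRS)
```

An exact-slug check only catches names the player spells the way the vault does. "market" would not find `morning-market.md`, which is one reason location movement is treated as an always-RAG trigger above.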

6. Implementation Phases

Phase 1: Basic Integration (MVP)

Effort: 4-6 hours

  • Wire RAG retriever into chat route
  • Simple trigger detection (location movement + entity questions)
  • Inject RAG results after deterministic context
  • Add logging for retrieval decisions

Deliverables:

  • Modified api/routes/chat.py
  • New api/context/hybrid.py
  • Logging for debugging

Phase 2: Smart Triggers

Effort: 4-6 hours

  • Full trigger pattern matching
  • Entity extraction from message
  • Confidence scoring
  • Deduplication with deterministic context

Deliverables:

  • api/context/triggers.py
  • Enhanced trigger patterns
  • Test suite for trigger detection

Phase 3: Feedback Loop

Effort: 6-8 hours

  • Track when DM calls get_entity for something not in context
  • Auto-suggest indexing for new entities
  • Log retrieval quality (did RAG help?)
  • Dashboard for RAG effectiveness

Deliverables:

  • Retrieval analytics
  • Auto-index suggestions
  • Quality metrics

Phase 4: Advanced Features

Effort: 8-12 hours

  • Relationship graph queries ("How does X know Y?")
  • Temporal awareness ("What happened on Day 3?")
  • Secret filtering (only surface discoverable secrets)
  • Multi-turn context (remember what was retrieved last turn)

Deliverables:

  • Relationship index
  • Temporal search
  • Secret visibility rules


7. Cost Analysis

7.1 Embedding Costs (OpenAI text-embedding-3-small)

| Operation | Tokens | Cost |
|-----------|--------|------|
| Index 1 NPC (~500 chars) | ~125 | $0.0000025 |
| Index 1 Session (~2000 chars) | ~500 | $0.00001 |
| Index full campaign (41 docs) | ~15,000 | $0.0003 |
| Query embedding | ~20 | $0.0000004 |

Monthly Estimate (heavy usage):

  • 100 chat messages/day × 2 RAG queries each = 200 queries/day
  • 200 queries/day × 30 days × $0.0000004 = $0.0024/month
  • Re-indexing weekly: 4 × $0.0003 = $0.0012/month
  • Total: ~$0.004/month (negligible)
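The estimate can be reproduced with a few lines of arithmetic, which makes it easy to re-run with different usage assumptions. The price constant is the $0.02 per 1M tokens figure quoted in Section 9; the token counts are the approximations from the table above.

```python
PRICE_PER_MILLION_TOKENS = 0.02  # USD, text-embedding-3-small (figure from Section 9)


def embedding_cost(tokens: int) -> float:
    """Cost in USD to embed the given number of tokens."""
    return tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


queries_per_day = 100 * 2                               # messages/day x RAG queries each
query_cost = queries_per_day * 30 * embedding_cost(20)  # ~20 tokens per query
reindex_cost = 4 * embedding_cost(15_000)               # weekly full re-index

total = query_cost + reindex_cost
print(f"queries: ${query_cost:.4f}, re-index: ${reindex_cost:.4f}, total: ${total:.4f}")
# prints: queries: $0.0024, re-index: $0.0012, total: $0.0036
```

Even a 100× increase in message volume keeps the embedding spend under a dollar a month, so latency is the more meaningful constraint here.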

7.2 Latency Budget

| Operation | Target | Actual |
|-----------|--------|--------|
| Trigger analysis | <10ms | ~5ms |
| Embedding API call | <200ms | ~150ms |
| pgvector search | <50ms | ~20ms |
| Context assembly | <10ms | ~5ms |
| Total RAG overhead | <300ms | ~180ms |

7.3 When to Skip RAG

To minimize latency for simple messages:

SKIP_RAG_PATTERNS = [
    r"^(yes|no|ok|sure|thanks)\.?$",  # Simple responses
    r"^i (attack|cast|roll)",          # Combat actions
    r"^<.+>$",                          # OOC messages
]
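A sketch of how these patterns might gate retrieval is shown below. `should_skip_rag` is a hypothetical helper, and case-insensitive matching on the stripped message is an assumption.

```python
import re

SKIP_RAG_PATTERNS = [
    r"^(yes|no|ok|sure|thanks)\.?$",  # Simple responses
    r"^i (attack|cast|roll)",          # Combat actions
    r"^<.+>$",                          # OOC messages
]

# Compile once; players will type "Yes" and "I attack" with capitals.
_COMPILED_SKIPS = [re.compile(p, re.IGNORECASE) for p in SKIP_RAG_PATTERNS]


def should_skip_rag(message: str) -> bool:
    """Return True when the message matches any skip pattern."""
    text = message.strip()
    return any(pattern.search(text) for pattern in _COMPILED_SKIPS)
```

The combat pattern is a prefix match, so "I cast detect magic on the token" would also skip retrieval even though it mentions an item. Whether that is acceptable is a tuning question for the trigger logs.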

8. Trade-offs & Risks

8.1 Architecture Trade-offs

| Approach | Pros | Cons |
|----------|------|------|
| Deterministic Only | Fast, free, predictable | Misses referenced entities |
| RAG Only | Comprehensive | Slow, may miss obvious context |
| Tool-Based Only | DM decides what to fetch | Unreliable recognition |
| Hybrid (Proposed) | Best coverage | More complex, ~200ms overhead |

8.2 Risks & Mitigations

| Risk | Impact | Mitigation |
|------|--------|------------|
| Over-retrieval | Bloated context, higher LLM costs | Strict token limits, deduplication |
| Under-retrieval | DM generates wrong info | Err on side of retrieval, log misses |
| Stale embeddings | Canon changes not reflected | Re-index on vault changes |
| False triggers | RAG for irrelevant terms | Confidence scoring, skip common words |
| Latency spikes | Slow responses | Timeout + fallback to no-RAG |

8.3 Failure Modes

RAG Unavailable:

try:
    rag_context = await retriever.search(triggers)
except Exception as e:
    logger.warning(f"RAG failed, continuing without: {e}")
    rag_context = ""  # Graceful degradation

Embedding API Down:

  • Fall back to deterministic-only
  • Log for alerting
  • Retry on next message
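The "Timeout + fallback to no-RAG" mitigation from 8.2 can share the same degradation path as the failure handling above. The sketch below wraps the search in `asyncio.wait_for`; `retrieve_with_fallback`, the `search` callable, and the 0.3 s default (chosen to mirror the <300ms overhead target) are assumptions.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

RAG_TIMEOUT_SECONDS = 0.3  # assumed: mirrors the <300ms RAG overhead target


async def retrieve_with_fallback(search, triggers, timeout=RAG_TIMEOUT_SECONDS):
    """Run an async search, returning "" if it is slow or raises."""
    try:
        # wait_for cancels the search and raises TimeoutError when time runs out.
        return await asyncio.wait_for(search(triggers), timeout=timeout)
    except asyncio.TimeoutError:
        logger.warning("RAG timed out after %.0f ms, continuing without", timeout * 1000)
    except Exception as exc:
        logger.warning("RAG failed, continuing without: %s", exc)
    return ""  # graceful degradation: deterministic context only
```

Because nothing is cached on failure, the next message simply tries again, which matches the "retry on next message" behavior listed above.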


9. Open Questions

For Peer Review

  1. Trigger Sensitivity: How aggressive should trigger detection be?
     • Conservative: Only explicit questions ("Who is X?")
     • Aggressive: Any proper noun not in context
     • Recommendation: Start conservative, tune based on logs

  2. Secret Handling: Should RAG ever surface _gm/secrets?
     • Option A: Never (DM decides via tools)
     • Option B: Only if player could have discovered it
     • Option C: Surface with [SECRET] tag for DM discretion

  3. Multi-Turn Memory: Should we remember what was retrieved last turn?
     • Pro: Avoids re-retrieving same context
     • Con: Adds state complexity
     • Recommendation: Defer to Phase 4

  4. Embedding Model: Stick with OpenAI or switch to local?
     • OpenAI: $0.02/1M tokens, 1536 dims, excellent quality
     • Nomic (local): Free, 768 dims, good quality
     • Recommendation: Keep OpenAI for now, evaluate local later

  5. Index Freshness: When to re-index?
     • On every vault write (real-time)
     • On session end (batch)
     • Manual trigger only
     • Recommendation: Session end + manual for now
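For the index-freshness question, whichever schedule is chosen could skip unchanged files by comparing content hashes. The sketch below is one possible approach, not part of the current indexer: `find_stale_docs` and the `indexed` mapping of relative path to stored digest are hypothetical.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """SHA-256 of the file contents, as a hex string."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def find_stale_docs(vault_root: Path, indexed: dict[str, str]) -> list[Path]:
    """Return markdown files that are new or changed since last indexing.

    `indexed` maps a vault-relative POSIX path to the digest stored at index time.
    """
    stale = []
    for path in sorted(vault_root.rglob("*.md")):
        key = path.relative_to(vault_root).as_posix()
        if indexed.get(key) != file_digest(path):
            stale.append(path)
    return stale
```

Run at session end, this would limit re-embedding to the handful of files a session touched. Deleted files would need separate handling, since they never appear in the directory walk.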

Appendix A: File Inventory

Current Implementation

api/
├── context/
│   └── builder.py          # Deterministic context builder
├── rag/
│   ├── __init__.py
│   ├── db.py               # pgvector connection pool
│   ├── embeddings.py       # OpenAI embedding client
│   ├── chunker.py          # Document chunker
│   ├── indexer.py          # Vault indexer
│   └── retriever.py        # Similarity search
├── routes/
│   ├── chat.py             # Chat endpoint (needs hybrid integration)
│   └── rag.py              # RAG management endpoints
└── agent/
    ├── dm.py               # DM agent definition
    └── tools.py            # Agent tools (get_entity, etc.)

Proposed Additions

api/
├── context/
│   ├── builder.py          # (existing)
│   ├── hybrid.py           # NEW: Hybrid context builder
│   └── triggers.py         # NEW: Trigger detection

Appendix B: Example Flows

Flow 1: Simple Movement (No RAG)

User: "I walk over to Marlena"

Trigger Analysis:
- "Marlena" detected
- Marlena IS in NPCs Present
- No RAG needed

Context: [Deterministic only]
DM Response: Uses existing Marlena context

Flow 2: Location Movement (RAG Triggered)

User: "I head to the market"

Trigger Analysis:
- Location movement detected: "market"
- "market" NOT in current location
- RAG triggered

RAG Search: "market" in locations
- Result: "morning-market" (similarity: 0.72)

Context: [Deterministic] + [RAG: Morning Market details]
DM Response: Uses canon market description

Flow 3: Entity Question (RAG Triggered)

User: "Where is Jorah?"

Trigger Analysis:
- Entity question detected: "Jorah"
- Jorah NOT in NPCs Present
- RAG triggered

RAG Search: "Jorah" in npcs
- Result: "jorah" NPC file (similarity: 0.85)
- Includes: location: gilded-quill

Context: [Deterministic] + [RAG: Jorah at Gilded Quill]
DM Response: "Jorah is at the Gilded Quill..."

Flow 4: New Entity (No RAG Match)

User: "I look for a blacksmith"

Trigger Analysis:
- Location query detected: "blacksmith"
- RAG triggered

RAG Search: "blacksmith" in locations, npcs
- Result: No matches above threshold

Context: [Deterministic only]
DM Response: Generates new blacksmith (can be indexed later)

Revision History

| Version | Date | Author | Changes |
|---------|------|--------|---------|
| 0.1 | 2025-12-31 | Claude | Initial draft |