RPG Context & RAG System Design¶

Status: Draft for Review
Author: Claude + Human
Date: 2025-12-31
Scope: palimpsest context injection and retrieval augmented generation

Executive Summary¶

The RPG DM agent needs context about the campaign world to generate accurate, immersive responses. This document analyzes our current context injection system, identifies gaps, and proposes a hybrid architecture combining deterministic injection with semantic retrieval (RAG).

Key Question: When a player says "I go to the market," how does the DM know:
- If we've been there before?
- Who the vendors are?
- What the setting looks like?
- Whether to use existing canon or generate new content?

1. Current Architecture¶

1.1 Deterministic Context Injection¶

Location: api/context/builder.py

Every message to the DM agent is prefixed with deterministic context pulled directly from vault files:

## SESSION CONTEXT: Seagate
**Day 5**

### Player Character
**Jake** (Wizard)
- HP: 13/13
- Location: the-salty-sigil
- Gold: 45 gp

### Current Location
**The Salty Sigil** - A curio shop dealing in rare magical components

### NPCs Present
- **Marlena** (Component dealer) [active] - intimate-ally

### Active Storylines
- **[URGENT]** The Fading Muffle
  The stolen token thrums faintly—someone's scrying | DEADLINE: 36 hours

What's Injected:

| Data | Source | Always/Conditional |
|------|--------|-------------------|
| PC state (HP, gold, conditions) | canon/pcs/*.md frontmatter | Always |
| Current location name/description | canon/locations/*.md | Always |
| NPCs at current location | canon/npcs/*.md filtered by location | Always |
| Active threads (high/medium) | canon/open-threads.md | Always |
| Recent timeline events | canon/timeline.md | Always |
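As a rough illustration of how the injected blocks could be rendered once frontmatter has been parsed, here is a minimal sketch. The `PlayerCharacter` and `NpcSummary` dataclasses and both render functions are hypothetical names chosen for this example; the actual structures in api/context/builder.py may differ.

```python
from dataclasses import dataclass


@dataclass
class PlayerCharacter:
    # Hypothetical shape of parsed canon/pcs/*.md frontmatter
    name: str
    char_class: str
    hp: int
    max_hp: int
    location: str
    gold: int


@dataclass
class NpcSummary:
    # Hypothetical shape of an NPC entry filtered by location
    name: str
    role: str
    status: str
    disposition: str


def render_pc_block(pc: PlayerCharacter) -> str:
    """Render the '### Player Character' block."""
    return "\n".join([
        "### Player Character",
        f"**{pc.name}** ({pc.char_class})",
        f"- HP: {pc.hp}/{pc.max_hp}",
        f"- Location: {pc.location}",
        f"- Gold: {pc.gold} gp",
    ])


def render_npcs_block(npcs: list[NpcSummary]) -> str:
    """Render the '### NPCs Present' block, one bullet per NPC."""
    lines = ["### NPCs Present"]
    for npc in npcs:
        lines.append(f"- **{npc.name}** ({npc.role}) [{npc.status}] - {npc.disposition}")
    return "\n".join(lines)


jake = PlayerCharacter("Jake", "Wizard", 13, 13, "the-salty-sigil", 45)
marlena = NpcSummary("Marlena", "Component dealer", "active", "intimate-ally")
print(render_pc_block(jake))
print(render_npcs_block([marlena]))
```

Because these blocks come straight from frontmatter and never from similarity search, they stay exact, which is the property the "Always" column relies on.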

1.2 Tool-Based Retrieval¶

Location: api/agent/tools.py

The DM agent has tools it can call during generation:

Tool	Purpose	When Used
`get_entity(type, id)`	Fetch full NPC/location/item details	When player mentions entity not in context
`search_entities(query)`	Text search across entities	When looking for something by description
`get_session_context()`	Refresh full context	Start of session

Problem: The DM must recognize it needs to call these tools. If it doesn't realize "the market" is a known location, it will generate a new one instead of retrieving canon.

1.3 RAG Infrastructure (New)¶

Location: api/rag/

We now have vector-embedding infrastructure in place:

pgvector database (palimpsest namespace)
├── embeddings table (150 chunks for seagate)
│   ├── NPCs (8 docs, 18 chunks)
│   ├── Locations (11 docs, 20 chunks)
│   ├── Items (8 docs, 16 chunks)
│   ├── Sessions (5 docs, 51 chunks)
│   ├── Secrets (3 docs, 3 chunks)
│   └── Plots (6 docs, 42 chunks)
├── rag_config table (per-campaign settings)
└── search_similar() function

API Endpoints:
- POST /campaigns/{id}/rag/index - Index vault documents
- POST /campaigns/{id}/rag/search - Semantic similarity search
- GET /campaigns/{id}/rag/stats - Indexing statistics
- GET/PATCH /campaigns/{id}/rag/config - Toggle RAG per campaign

Not Yet Integrated: RAG is available but not wired into the chat flow.
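For orientation, a request to the search endpoint might be built as below. This is only a sketch: the base URL and the JSON body fields (`query`, `limit`) are assumptions, since the request schema is not specified in this document. The example constructs the request without sending it.

```python
import json
from urllib.request import Request

BASE_URL = "http://localhost:8000"  # assumption: local development server


def build_search_request(campaign_id: str, query: str, limit: int = 3) -> Request:
    """Build (but do not send) a POST to the RAG search endpoint."""
    # Assumed body shape; check api/routes/rag.py for the real schema.
    body = json.dumps({"query": query, "limit": limit}).encode("utf-8")
    return Request(
        url=f"{BASE_URL}/campaigns/{campaign_id}/rag/search",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_search_request("seagate", "market")
print(req.get_method(), req.full_url)
```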

2. Problem Analysis¶

2.1 The Market Problem¶

Scenario: Player says "I go to the market"

Current Behavior:
1. Context injection includes current location (the-salty-sigil)
2. Context does NOT include "the market" details
3. DM agent sees no market info in context
4. DM might:
   - Generate a new market (wrong if we have canon)
   - Call search_entities("market") (correct but unreliable)
   - Ask player to clarify (breaks immersion)

Desired Behavior:
1. System detects "market" as a potential location
2. RAG searches for market-related content
3. If found: inject canon details before DM generates
4. If not found: DM generates new content, system indexes it
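The four desired steps can be sketched as a single pre-generation decision. The function `prepare_turn` and its two callable parameters are hypothetical stand-ins for the trigger analyzer and the RAG retriever; the stub data below exists only to make the example runnable.

```python
from typing import Callable, Optional


def prepare_turn(
    message: str,
    detect_location: Callable[[str], Optional[str]],
    search_canon: Callable[[str], list[str]],
) -> dict:
    """Decide before generation whether canon is injected or new content is expected."""
    term = detect_location(message)            # step 1: detect a potential location
    if term is None:
        return {"retrieved": [], "index_after": False}
    hits = search_canon(term)                  # step 2: search canon for the term
    if hits:
        return {"retrieved": hits, "index_after": False}  # step 3: inject canon
    return {"retrieved": [], "index_after": True}         # step 4: generate, then index


# Stubs standing in for the real detector and retriever (assumptions)
CANON = {"market": ["The Morning Market: a bustling open-air market"]}
KNOWN_TERMS = ("market", "blacksmith")


def detect(message: str) -> Optional[str]:
    return next((t for t in KNOWN_TERMS if t in message.lower()), None)


def search(term: str) -> list[str]:
    return CANON.get(term, [])


print(prepare_turn("I go to the market", detect, search))
print(prepare_turn("I look for a blacksmith", detect, search))
```

The `index_after` flag marks turns where the DM is expected to invent content that should be written back to the vault and indexed.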

2.2 Context Gaps¶

Gap	Example	Impact
Locations not at current position	"What's the Gilded Quill like?"	DM may invent contradictory details
NPCs not present	"Where is Jorah?"	DM guesses instead of checking canon
Historical events	"What happened at the grotto?"	DM lacks session history
Secrets/plots	"Why does Vorlan want the stone?"	DM can't access _gm/secrets
Item details	"What does my token do?"	DM may forget item properties
Relationships	"How does Marlena know Elara?"	Relationship graph not surfaced

2.3 Why Tools Alone Don't Solve This¶

Recognition Failure: DM doesn't know what it doesn't know
Latency: Tool calls happen mid-generation, causing delays
Inconsistency: Sometimes calls tools, sometimes generates
No Preemption: Can't inject context before generation starts

3. Design Considerations¶

3.1 Design Principles¶

Canon Supremacy: Existing vault content always takes precedence over generation
Immersion First: Context retrieval should be invisible to player
Fail Safe: When uncertain, retrieve more context rather than less
Cost Aware: Embedding API calls cost money; be strategic
Latency Budget: Total context build should be <500ms

3.2 Context Categories¶

Category	Injection Strategy	Rationale
Critical State	Always deterministic	PC HP, location, conditions must never be wrong
Immediate Environment	Always deterministic	Current location, present NPCs are always relevant
Active Narrative	Always deterministic	Threads, deadlines drive gameplay
Referenced Entities	RAG on mention	Only retrieve what player asks about
Historical Context	RAG on trigger	Session history is huge, retrieve selectively
GM Secrets	RAG with filtering	Only surface secrets player could know

3.3 Trigger Types¶

When should RAG activate?

Trigger	Example	Search Strategy
Explicit Question	"Who is Vorlan?"	Search NPCs for "Vorlan"
Location Reference	"I go to the market"	Search locations for "market"
Historical Reference	"What happened last time?"	Search sessions for recent events
Item Reference	"I use the token"	Search items for "token"
Relationship Query	"Does Marlena know Elara?"	Search NPCs for both names
Topic Keyword	"Tell me about the plague"	Search all for "plague"

3.4 What NOT to RAG¶

Current location details - Already injected deterministically
Present NPCs - Already injected deterministically
PC state - Never from RAG, always from frontmatter
Active threads - Already injected deterministically
Generic actions - "I attack" doesn't need retrieval

4. Proposed Hybrid Architecture¶

4.1 Architecture Diagram¶

┌─────────────────────────────────────────────────────────────────┐
│                        User Message                              │
│                   "I head to the market"                         │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Trigger Analyzer                              │
│  ┌─────────────┐  ┌─────────────┐  ┌─────────────┐              │
│  │   Entity    │  │  Location   │  │  Historical │              │
│  │  Extractor  │  │  Detector   │  │   Trigger   │              │
│  └─────────────┘  └─────────────┘  └─────────────┘              │
│         │                │                │                      │
│         └────────────────┼────────────────┘                      │
│                          ▼                                       │
│              Trigger List: ["market"]                            │
└─────────────────────────────────────────────────────────────────┘
                              │
              ┌───────────────┴───────────────┐
              ▼                               ▼
┌──────────────────────┐        ┌──────────────────────┐
│  Deterministic Build │        │     RAG Retrieval    │
│  (Always Runs)       │        │  (If Triggers Found) │
│                      │        │                      │
│  • PC State          │        │  Query: "market"     │
│  • Current Location  │        │  → Location match    │
│  • NPCs Present      │        │  → NPC vendors       │
│  • Active Threads    │        │  → Session mentions  │
│  • Recent Events     │        │                      │
└──────────────────────┘        └──────────────────────┘
              │                               │
              └───────────────┬───────────────┘
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                    Context Assembly                              │
│                                                                  │
│  ## SESSION CONTEXT                                              │
│  [Deterministic: PC, Location, NPCs, Threads]                   │
│                                                                  │
│  ## RETRIEVED CONTEXT                                            │
│  [RAG: Market location details, vendor NPCs]                    │
│                                                                  │
│  ---                                                             │
│  PLAYER: I head to the market                                    │
└─────────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────────┐
│                       DM Agent                                   │
│                                                                  │
│  Has full context → Generates accurate response                  │
│  Uses canon market details, not invented ones                    │
└─────────────────────────────────────────────────────────────────┘

4.2 Context Assembly Order¶

CONTEXT_TEMPLATE = """
## SESSION CONTEXT: {campaign_name}
**Day {day}**{time}

### Player Character
{pc_block}

### Current Location
{location_block}

### NPCs Present
{npcs_block}

### Active Storylines
{threads_block}

{retrieved_context}

---
PLAYER: {user_message}
"""

Retrieved Context Section (when present):

## Retrieved Context
The following information was retrieved based on your message:

### [LOCATION] The Morning Market
A bustling open-air market in Seagate's merchant quarter...

### [NPC] Old Tam (Fish vendor)
A weathered fisherman who sells the morning catch...
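One way to produce that section from retrieval results is sketched below. `RetrievedChunk` and `format_retrieved_context` are hypothetical names for this example; returning an empty string when nothing was retrieved lets the `{retrieved_context}` slot in the template collapse cleanly.

```python
from dataclasses import dataclass


@dataclass
class RetrievedChunk:
    # Assumed minimal shape of one RAG result
    source_type: str  # e.g. "location", "npc"
    title: str
    text: str


def format_retrieved_context(chunks: list[RetrievedChunk]) -> str:
    """Return the '## Retrieved Context' section, or '' when nothing was retrieved."""
    if not chunks:
        return ""
    parts = [
        "## Retrieved Context",
        "The following information was retrieved based on your message:",
    ]
    for chunk in chunks:
        parts.append("")  # blank line before each entry
        parts.append(f"### [{chunk.source_type.upper()}] {chunk.title}")
        parts.append(chunk.text)
    return "\n".join(parts)


section = format_retrieved_context([
    RetrievedChunk("location", "The Morning Market", "A bustling open-air market..."),
    RetrievedChunk("npc", "Old Tam (Fish vendor)", "A weathered fisherman..."),
])
print(section)
```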

4.3 Deduplication¶

RAG results must not duplicate deterministic context:

def dedupe_rag_results(rag_results, deterministic_ctx):
    """Remove RAG results that duplicate deterministic context."""

    # Get IDs already in deterministic context
    present_npcs = {npc.id for npc in deterministic_ctx.npcs_present}
    current_loc = deterministic_ctx.location

    # Filter RAG results
    filtered = []
    for result in rag_results:
        if result.source_type == "npc" and result.source_id in present_npcs:
            continue  # Already in "NPCs Present"
        if result.source_type == "location" and result.source_id == current_loc:
            continue  # Already in "Current Location"
        filtered.append(result)

    return filtered
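A small usage sketch follows, with the filter restated compactly so the example runs on its own. It assumes `deterministic_ctx.location` holds the location ID as a string and uses `SimpleNamespace` objects in place of the real result and context types.

```python
from types import SimpleNamespace


def dedupe_rag_results(rag_results, deterministic_ctx):
    """Same filtering rules as the function above, written as a comprehension."""
    present_npcs = {npc.id for npc in deterministic_ctx.npcs_present}
    current_loc = deterministic_ctx.location
    return [
        r for r in rag_results
        if not (r.source_type == "npc" and r.source_id in present_npcs)
        and not (r.source_type == "location" and r.source_id == current_loc)
    ]


ctx = SimpleNamespace(
    npcs_present=[SimpleNamespace(id="marlena")],
    location="the-salty-sigil",
)
results = [
    SimpleNamespace(source_type="npc", source_id="marlena"),               # dropped
    SimpleNamespace(source_type="location", source_id="the-salty-sigil"),  # dropped
    SimpleNamespace(source_type="location", source_id="morning-market"),   # kept
    SimpleNamespace(source_type="npc", source_id="jorah"),                 # kept
]
kept = [r.source_id for r in dedupe_rag_results(results, ctx)]
print(kept)  # ['morning-market', 'jorah']
```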

5. Trigger Detection System¶

5.1 Trigger Types & Patterns¶

from enum import Enum


class TriggerType(Enum):
    ENTITY_QUESTION = "entity_question"      # "Who is X?"
    LOCATION_MOVEMENT = "location_movement"  # "I go to X"
    LOCATION_QUERY = "location_query"        # "What's X like?"
    HISTORICAL = "historical"                # "What happened..."
    ITEM_REFERENCE = "item_reference"        # "I use the X"
    TOPIC_QUERY = "topic_query"              # "Tell me about X"


# Patterns are lowercase: match against message.lower() or compile
# them with re.IGNORECASE.
TRIGGER_PATTERNS = {
    TriggerType.ENTITY_QUESTION: [
        r"who is (\w+)",
        r"where is (\w+)",
        r"what does (\w+) want",
        r"how is (\w+)",
    ],
    TriggerType.LOCATION_MOVEMENT: [
        r"i (?:go|head|walk|travel|return) to (?:the )?(.+?)(?:\.|$)",
        r"let'?s (?:go|head) to (?:the )?(.+?)(?:\.|$)",
        r"i want to visit (?:the )?(.+?)(?:\.|$)",
    ],
    TriggerType.LOCATION_QUERY: [
        r"what(?:'s| is) (?:the )?(.+?) like",
        r"describe (?:the )?(.+)",
    ],
    TriggerType.HISTORICAL: [
        r"what happened (?:at|in|to|with) (.+)",
        r"last time (?:at|in|we) (.+)",
        r"remember (?:when|the) (.+)",
        r"previously (.+)",
    ],
    TriggerType.ITEM_REFERENCE: [
        r"i (?:use|examine|look at|inspect) (?:my |the )?(.+)",
        r"what does (?:my |the )?(.+?) do",
    ],
    # "Tell me about X" is the TOPIC_QUERY example, so its pattern lives here
    # rather than under LOCATION_QUERY.
    TriggerType.TOPIC_QUERY: [
        r"tell me about (?:the )?(.+?)(?:\.|$)",
    ],
}
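To show how such patterns could be applied to a message, here is a minimal matcher. It carries only a subset of the patterns so that it stays self-contained; `detect_triggers` is a hypothetical name, and stopping at the first hit per trigger type is a design assumption rather than a stated requirement.

```python
import re
from enum import Enum


class TriggerType(Enum):
    ENTITY_QUESTION = "entity_question"
    LOCATION_MOVEMENT = "location_movement"


# Subset of the patterns above, kept short for the example
PATTERNS = {
    TriggerType.ENTITY_QUESTION: [r"who is (\w+)", r"where is (\w+)"],
    TriggerType.LOCATION_MOVEMENT: [
        r"i (?:go|head|walk|travel|return) to (?:the )?(.+?)(?:\.|$)",
    ],
}

# Compile once, case-insensitively, because the patterns are lowercase
COMPILED = {
    ttype: [re.compile(p, re.IGNORECASE) for p in patterns]
    for ttype, patterns in PATTERNS.items()
}


def detect_triggers(message: str) -> list[tuple[TriggerType, str]]:
    """Return (trigger type, captured term) pairs found in the message."""
    found = []
    text = message.strip()
    for ttype, regexes in COMPILED.items():
        for regex in regexes:
            match = regex.search(text)
            if match:
                found.append((ttype, match.group(1).strip()))
                break  # one hit per trigger type is enough
    return found


print(detect_triggers("I head to the market"))
print(detect_triggers("Where is Jorah?"))
```

Compiling with `re.IGNORECASE` matters here: player messages usually start with a capital "I" or "Where", which lowercase patterns would otherwise miss.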

5.2 Entity Extraction¶

Beyond regex, extract proper nouns:

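import re

# Illustrative placeholder: capitalized words that are not entity names
# (sentence starters, question words). Tune against real transcripts.
COMMON_WORDS = {"the", "what", "where", "who", "how", "why", "when", "tell", "let"}


def extract_entities(message: str, known_context: set[str]) -> list[str]:
    """Extract potential entity names from message.

    known_context holds lowercase names already in the deterministic context.
    """

    entities = []

    # Runs of capitalized words (potential proper nouns), e.g. "Gilded Quill"
    caps = re.findall(r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\b', message)

    for word in caps:
        word_lower = word.lower()
        # Skip if already in deterministic context
        if word_lower in known_context:
            continue
        # Skip common words
        if word_lower in COMMON_WORDS:
            continue
        entities.append(word)

    return entities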

5.3 Search Strategy by Trigger¶

Trigger Type	Source Types	Limit	Threshold
Entity Question	npc, faction	3	0.5
Location Movement	location	3	0.5
Location Query	location, session	5	0.4
Historical	session, plot	5	0.4
Item Reference	item	2	0.5
Topic Query	all	5	0.4
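The table could be carried in code as a lookup keyed by trigger type. In this sketch, `SearchParams` and `filter_hits` are hypothetical names, the threshold is treated as a minimum similarity score (inclusive), and an empty `source_types` tuple stands for "all". The two `hypothetical-*` IDs are made up for the demo.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchParams:
    source_types: tuple[str, ...]
    limit: int
    threshold: float


SEARCH_STRATEGY = {
    "entity_question": SearchParams(("npc", "faction"), 3, 0.5),
    "location_movement": SearchParams(("location",), 3, 0.5),
    "location_query": SearchParams(("location", "session"), 5, 0.4),
    "historical": SearchParams(("session", "plot"), 5, 0.4),
    "item_reference": SearchParams(("item",), 2, 0.5),
    "topic_query": SearchParams((), 5, 0.4),  # empty tuple = no source filter
}


def filter_hits(hits: list[tuple[str, float]], params: SearchParams) -> list[tuple[str, float]]:
    """Keep hits at or above the threshold, best first, capped at the limit."""
    kept = [h for h in hits if h[1] >= params.threshold]
    kept.sort(key=lambda h: h[1], reverse=True)
    return kept[: params.limit]


hits = [("morning-market", 0.72), ("hypothetical-a", 0.41), ("hypothetical-b", 0.55)]
print(filter_hits(hits, SEARCH_STRATEGY["location_movement"]))
```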

5.4 Confidence Scoring¶

Not all triggers should fire RAG:

def should_rag(trigger: str, trigger_type: TriggerType, ctx: SessionContext) -> bool:
    """Decide if RAG is warranted for this trigger."""

    # High confidence triggers - always RAG
    if trigger_type in [TriggerType.ENTITY_QUESTION, TriggerType.LOCATION_MOVEMENT]:
        return True

    # Check if entity exists in vault (fast file check)
    if vault_entity_exists(trigger):
        return True

    # Low confidence + no vault match - skip RAG
    # Let DM generate or ask for clarification
    return False
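`vault_entity_exists` is not defined in this document. One possible shape for the "fast file check" is sketched below, assuming entity files are named by slug under canon/npcs and canon/locations. It takes an explicit `vault_root` argument, unlike the one-argument call above, and the `slugify` helper is hypothetical.

```python
from pathlib import Path

# Subset of canon/ directories named in this document
ENTITY_DIRS = ("npcs", "locations")


def slugify(name: str) -> str:
    """'Gilded Quill' -> 'gilded-quill' (assumed file naming convention)."""
    return "-".join(name.lower().split())


def vault_entity_exists(name: str, vault_root: Path) -> bool:
    """Return True if canon/<dir>/<slug>.md exists for any entity directory."""
    slug = slugify(name)
    return any(
        (vault_root / "canon" / subdir / f"{slug}.md").is_file()
        for subdir in ENTITY_DIRS
    )
```

An exact-slug check like this is cheap but literal: "market" would not match a file named morning-market.md, which is one reason location movement is treated as a high-confidence trigger that always goes to RAG.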

6. Implementation Phases¶

Phase 1: Basic Integration (MVP)¶

Effort: 4-6 hours

Wire RAG retriever into chat route
Simple trigger detection (location movement + entity questions)
Inject RAG results after deterministic context
Add logging for retrieval decisions

Deliverables:
- Modified api/routes/chat.py
- New api/context/hybrid.py
- Logging for debugging

Phase 2: Smart Triggers¶

Effort: 4-6 hours

Full trigger pattern matching
Entity extraction from message
Confidence scoring
Deduplication with deterministic context

Deliverables:
- api/context/triggers.py
- Enhanced trigger patterns
- Test suite for trigger detection

Phase 3: Feedback Loop¶

Effort: 6-8 hours

Track when DM calls get_entity for something not in context
Auto-suggest indexing for new entities
Log retrieval quality (did RAG help?)
Dashboard for RAG effectiveness

Deliverables:
- Retrieval analytics
- Auto-index suggestions
- Quality metrics

Phase 4: Advanced Features¶

Effort: 8-12 hours

Relationship graph queries ("How does X know Y?")
Temporal awareness ("What happened on Day 3?")
Secret filtering (only surface discoverable secrets)
Multi-turn context (remember what was retrieved last turn)

Deliverables:
- Relationship index
- Temporal search
- Secret visibility rules

7. Cost Analysis¶

7.1 Embedding Costs (OpenAI text-embedding-3-small)¶

Operation	Tokens	Cost
Index 1 NPC (~500 chars)	~125	$0.0000025
Index 1 Session (~2000 chars)	~500	$0.00001
Index full campaign (41 docs)	~15,000	$0.0003
Query embedding	~20	$0.0000004

Monthly Estimate (heavy usage):
- 100 chat messages/day × 2 RAG queries each = 200 queries/day
- 200 queries/day × 30 days × $0.0000004 = $0.0024/month
- Re-indexing weekly: 4 × $0.0003 = $0.0012/month
- Total: ~$0.004/month (negligible)
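The estimate can be reproduced with a few lines of arithmetic, using the $0.02 per 1M tokens rate and the token counts from the table above. The variable names are illustrative.

```python
PRICE_PER_TOKEN = 0.02 / 1_000_000  # USD, text-embedding-3-small rate quoted in this document

QUERY_TOKENS = 20         # per query embedding
CAMPAIGN_TOKENS = 15_000  # full campaign index (41 docs)

queries_per_month = 100 * 2 * 30  # messages/day x queries/message x days
query_cost = queries_per_month * QUERY_TOKENS * PRICE_PER_TOKEN
reindex_cost = 4 * CAMPAIGN_TOKENS * PRICE_PER_TOKEN  # weekly re-index
total = query_cost + reindex_cost

print(f"queries: ${query_cost:.4f}, re-index: ${reindex_cost:.4f}, total: ${total:.4f}")
```

The exact total is $0.0036, which the estimate rounds to ~$0.004.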

7.2 Latency Budget¶

Operation	Target	Actual
Trigger analysis	<10ms	~5ms
Embedding API call	<200ms	~150ms
pgvector search	<50ms	~20ms
Context assembly	<10ms	~5ms
Total RAG overhead	<300ms	~180ms

7.3 When to Skip RAG¶

To minimize latency for simple messages:

SKIP_RAG_PATTERNS = [
    r"^(yes|no|ok|sure|thanks)\.?$",  # Simple responses
    r"^i (attack|cast|roll)",          # Combat actions
    r"^<.+>$",                          # OOC messages
]
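A minimal check built on these patterns might look like the following. `should_skip_rag` is a hypothetical helper name, and case-insensitive matching is an assumption, since the patterns are lowercase while player messages typically are not.

```python
import re

SKIP_RAG_PATTERNS = [
    r"^(yes|no|ok|sure|thanks)\.?$",  # Simple responses
    r"^i (attack|cast|roll)",          # Combat actions
    r"^<.+>$",                          # OOC messages
]

# Compile once at import time; IGNORECASE so "Yes" and "I attack" match
_SKIP_REGEXES = [re.compile(p, re.IGNORECASE) for p in SKIP_RAG_PATTERNS]


def should_skip_rag(message: str) -> bool:
    """Return True when the message matches any skip pattern."""
    text = message.strip()
    return any(regex.search(text) for regex in _SKIP_REGEXES)


for msg in ["Yes.", "I attack the goblin", "<brb>", "I head to the market"]:
    print(f"{msg!r}: skip={should_skip_rag(msg)}")
```

Running this check before trigger analysis keeps trivial messages off the embedding API path entirely.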

8. Trade-offs & Risks¶

8.1 Architecture Trade-offs¶

Approach	Pros	Cons
Deterministic Only	Fast, free, predictable	Misses referenced entities
RAG Only	Comprehensive	Slow, may miss obvious context
Tool-Based Only	DM decides what to fetch	Unreliable recognition
Hybrid (Proposed)	Best coverage	More complex, ~200ms overhead

8.2 Risks & Mitigations¶

Risk	Impact	Mitigation
Over-retrieval	Bloated context, higher LLM costs	Strict token limits, deduplication
Under-retrieval	DM generates wrong info	Err on side of retrieval, log misses
Stale embeddings	Canon changes not reflected	Re-index on vault changes
False triggers	RAG for irrelevant terms	Confidence scoring, skip common words
Latency spikes	Slow responses	Timeout + fallback to no-RAG

8.3 Failure Modes¶

RAG Unavailable:

try:
    rag_context = await retriever.search(triggers)
except Exception as e:
    logger.warning(f"RAG failed, continuing without: {e}")
    rag_context = ""  # Graceful degradation

Embedding API Down:
- Fall back to deterministic-only
- Log for alerting
- Retry on next message
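The risk table also lists "Timeout + fallback to no-RAG" as the mitigation for latency spikes. A sketch combining the timeout with the graceful degradation above is shown here. `retrieve_with_fallback` is a hypothetical wrapper, the 0.3 s default mirrors the <300ms overhead target, and the two search coroutines are stubs standing in for the real retriever.

```python
import asyncio
import logging

logger = logging.getLogger(__name__)

RAG_TIMEOUT_SECONDS = 0.3  # assumption: mirrors the <300ms RAG overhead target


async def retrieve_with_fallback(search, triggers, timeout=RAG_TIMEOUT_SECONDS):
    """Run an async search; return '' on timeout or error so chat continues."""
    try:
        return await asyncio.wait_for(search(triggers), timeout=timeout)
    except asyncio.TimeoutError:
        logger.warning("RAG timed out after %.0f ms, continuing without", timeout * 1000)
    except Exception as exc:
        logger.warning("RAG failed, continuing without: %s", exc)
    return ""  # Graceful degradation: deterministic context only


async def slow_search(triggers):
    # Stub that simulates a latency spike
    await asyncio.sleep(1)
    return "never returned"


async def fast_search(triggers):
    # Stub that simulates a normal retrieval
    return "### [LOCATION] The Morning Market"


print(repr(asyncio.run(retrieve_with_fallback(slow_search, ["market"], timeout=0.05))))
print(repr(asyncio.run(retrieve_with_fallback(fast_search, ["market"]))))
```

The timeout branch is listed before the generic `except Exception` because the timeout error is itself an `Exception` subclass and would otherwise be logged as a generic failure.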

9. Open Questions¶

For Peer Review¶

1. Trigger Sensitivity: How aggressive should trigger detection be?
   - Conservative: Only explicit questions ("Who is X?")
   - Aggressive: Any proper noun not in context
   - Recommendation: Start conservative, tune based on logs
2. Secret Handling: Should RAG ever surface _gm/secrets?
   - Option A: Never (DM decides via tools)
   - Option B: Only if player could have discovered it
   - Option C: Surface with [SECRET] tag for DM discretion
3. Multi-Turn Memory: Should we remember what was retrieved last turn?
   - Pro: Avoids re-retrieving same context
   - Con: Adds state complexity
   - Recommendation: Defer to Phase 4
4. Embedding Model: Stick with OpenAI or switch to local?
   - OpenAI: $0.02/1M tokens, 1536 dims, excellent quality
   - Nomic (local): Free, 768 dims, good quality
   - Recommendation: Keep OpenAI for now, evaluate local later
5. Index Freshness: When to re-index?
   - On every vault write (real-time)
   - On session end (batch)
   - Manual trigger only
   - Recommendation: Session end + manual for now

Appendix A: File Inventory¶

Current Implementation¶

api/
├── context/
│   └── builder.py          # Deterministic context builder
├── rag/
│   ├── __init__.py
│   ├── db.py               # pgvector connection pool
│   ├── embeddings.py       # OpenAI embedding client
│   ├── chunker.py          # Document chunker
│   ├── indexer.py          # Vault indexer
│   └── retriever.py        # Similarity search
├── routes/
│   ├── chat.py             # Chat endpoint (needs hybrid integration)
│   └── rag.py              # RAG management endpoints
└── agent/
    ├── dm.py               # DM agent definition
    └── tools.py            # Agent tools (get_entity, etc.)

Proposed Additions¶

api/
├── context/
│   ├── builder.py          # (existing)
│   ├── hybrid.py           # NEW: Hybrid context builder
│   └── triggers.py         # NEW: Trigger detection

Appendix B: Example Flows¶

Flow 1: Simple Movement (No RAG)¶

User: "I walk over to Marlena"

Trigger Analysis:
- "Marlena" detected
- Marlena IS in NPCs Present
- No RAG needed

Context: [Deterministic only]
DM Response: Uses existing Marlena context

Flow 2: Location Movement (RAG Triggered)¶

User: "I head to the market"

Trigger Analysis:
- Location movement detected: "market"
- "market" NOT in current location
- RAG triggered

RAG Search: "market" in locations
- Result: "morning-market" (similarity: 0.72)

Context: [Deterministic] + [RAG: Morning Market details]
DM Response: Uses canon market description

Flow 3: Entity Question (RAG Triggered)¶

User: "Where is Jorah?"

Trigger Analysis:
- Entity question detected: "Jorah"
- Jorah NOT in NPCs Present
- RAG triggered

RAG Search: "Jorah" in npcs
- Result: "jorah" NPC file (similarity: 0.85)
- Includes: location: gilded-quill

Context: [Deterministic] + [RAG: Jorah at Gilded Quill]
DM Response: "Jorah is at the Gilded Quill..."

Flow 4: New Entity (No RAG Match)¶

User: "I look for a blacksmith"

Trigger Analysis:
- Location query detected: "blacksmith"
- RAG triggered

RAG Search: "blacksmith" in locations, npcs
- Result: No matches above threshold

Context: [Deterministic only]
DM Response: Generates new blacksmith (can be indexed later)

Revision History¶

Version	Date	Author	Changes
0.1	2025-12-31	Claude	Initial draft