RPG Context & RAG System Design
- Status: Draft for Review
- Author: Claude + Human
- Date: 2025-12-31
- Scope: palimpsest context injection and retrieval-augmented generation

Executive Summary
The RPG DM agent needs context about the campaign world to generate accurate, immersive responses. This document analyzes our current context injection system, identifies gaps, and proposes a hybrid architecture combining deterministic injection with semantic RAG retrieval.
Key Question: When a player says "I go to the market," how does the DM know:
- If we've been there before?
- Who the vendors are?
- What the setting looks like?
- Whether to use existing canon or generate new content?

Table of Contents
- Current Architecture
- Problem Analysis
- Design Considerations
- Proposed Hybrid Architecture
- Trigger Detection System
- Implementation Phases
- Cost Analysis
- Trade-offs & Risks
- Open Questions
1. Current Architecture
1.1 Deterministic Context Injection
Location: api/context/builder.py
Every message to the DM agent is prefixed with deterministic context pulled directly from vault files:
## SESSION CONTEXT: Seagate
**Day 5**
### Player Character
**Jake** (Wizard)
- HP: 13/13
- Location: the-salty-sigil
- Gold: 45 gp
### Current Location
**The Salty Sigil** - A curio shop dealing in rare magical components
### NPCs Present
- **Marlena** (Component dealer) [active] - intimate-ally
### Active Storylines
- **[URGENT]** The Fading Muffle
The stolen token thrums faintly—someone's scrying | DEADLINE: 36 hours
What's Injected:
| Data | Source | Always/Conditional |
|------|--------|-------------------|
| PC state (HP, gold, conditions) | canon/pcs/*.md frontmatter | Always |
| Current location name/description | canon/locations/*.md | Always |
| NPCs at current location | canon/npcs/*.md filtered by location | Always |
| Active threads (high/medium) | canon/open-threads.md | Always |
| Recent timeline events | canon/timeline.md | Always |
1.2 Tool-Based Retrieval
Location: api/agent/tools.py
The DM agent has tools it can call during generation:
| Tool | Purpose | When Used |
|---|---|---|
| get_entity(type, id) | Fetch full NPC/location/item details | When player mentions entity not in context |
| search_entities(query) | Text search across entities | When looking for something by description |
| get_session_context() | Refresh full context | Start of session |
Problem: The DM must recognize it needs to call these tools. If it doesn't realize "the market" is a known location, it will generate a new one instead of retrieving canon.
1.3 RAG Infrastructure (New)
Location: api/rag/
We now have vector embeddings infrastructure:
pgvector database (palimpsest namespace)
├── embeddings table (150 chunks for seagate)
│ ├── NPCs (8 docs, 18 chunks)
│ ├── Locations (11 docs, 20 chunks)
│ ├── Items (8 docs, 16 chunks)
│ ├── Sessions (5 docs, 51 chunks)
│ ├── Secrets (3 docs, 3 chunks)
│ └── Plots (6 docs, 42 chunks)
├── rag_config table (per-campaign settings)
└── search_similar() function
API Endpoints:
- POST /campaigns/{id}/rag/index - Index vault documents
- POST /campaigns/{id}/rag/search - Semantic similarity search
- GET /campaigns/{id}/rag/stats - Indexing statistics
- GET/PATCH /campaigns/{id}/rag/config - Toggle RAG per campaign
Not Yet Integrated: RAG is available but not wired into the chat flow.
2. Problem Analysis
2.1 The Market Problem
Scenario: Player says "I go to the market"
Current Behavior:
1. Context injection includes current location (the-salty-sigil)
2. Context does NOT include "the market" details
3. DM agent sees no market info in context
4. DM might:
- Generate a new market (wrong if we have canon)
- Call search_entities("market") (correct but unreliable)
- Ask player to clarify (breaks immersion)
Desired Behavior:
1. System detects "market" as a potential location
2. RAG searches for market-related content
3. If found: inject canon details before DM generates
4. If not found: DM generates new content, system indexes it
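The desired behavior can be sketched as a single lookup step that either returns canon hits or signals that the DM is free to generate. This is a minimal illustration, not the planned implementation: `Hit`, `resolve_reference`, `fake_search`, the `old-tam` ID, and the 0.58 score are hypothetical stand-ins, and the 0.5 threshold is borrowed from the Location Movement row in section 5.3.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Hit:
    """One search result (hypothetical shape for this sketch)."""
    source_type: str
    source_id: str
    similarity: float


def resolve_reference(
    term: str,
    search: Callable[[str], list[Hit]],
    threshold: float = 0.5,
) -> list[Hit]:
    """Return canon hits for `term`, best first; an empty list means no canon was found."""
    hits = [h for h in search(term) if h.similarity >= threshold]
    return sorted(hits, key=lambda h: h.similarity, reverse=True)


def fake_search(term: str) -> list[Hit]:
    """Stand-in for the real semantic search, returning canned results."""
    canned = {
        "market": [
            Hit("npc", "old-tam", 0.58),
            Hit("location", "morning-market", 0.72),
        ]
    }
    return canned.get(term, [])


canon = resolve_reference("market", fake_search)        # inject these before generation
unknown = resolve_reference("blacksmith", fake_search)  # empty: DM generates, system indexes later
```

An empty result is the signal for step 4: nothing is injected, and whatever the DM invents becomes a candidate for indexing.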
2.2 Context Gaps
| Gap | Example | Impact |
|---|---|---|
| Locations not at current position | "What's the Gilded Quill like?" | DM may invent contradictory details |
| NPCs not present | "Where is Jorah?" | DM guesses instead of checking canon |
| Historical events | "What happened at the grotto?" | DM lacks session history |
| Secrets/plots | "Why does Vorlan want the stone?" | DM can't access _gm/secrets |
| Item details | "What does my token do?" | DM may forget item properties |
| Relationships | "How does Marlena know Elara?" | Relationship graph not surfaced |
2.3 Why Tools Alone Don't Solve This
- Recognition Failure: DM doesn't know what it doesn't know
- Latency: Tool calls happen mid-generation, causing delays
- Inconsistency: Sometimes calls tools, sometimes generates
- No Preemption: Can't inject context before generation starts
3. Design Considerations
3.1 Design Principles
- Canon Supremacy: Existing vault content always takes precedence over generation
- Immersion First: Context retrieval should be invisible to player
- Fail Safe: When uncertain, retrieve more context rather than less
- Cost Aware: Embedding API calls cost money; be strategic
- Latency Budget: Total context build should be <500ms
3.2 Context Categories
| Category | Injection Strategy | Rationale |
|---|---|---|
| Critical State | Always deterministic | PC HP, location, conditions must never be wrong |
| Immediate Environment | Always deterministic | Current location, present NPCs are always relevant |
| Active Narrative | Always deterministic | Threads, deadlines drive gameplay |
| Referenced Entities | RAG on mention | Only retrieve what player asks about |
| Historical Context | RAG on trigger | Session history is huge, retrieve selectively |
| GM Secrets | RAG with filtering | Only surface secrets player could know |
3.3 Trigger Types
When should RAG activate?
| Trigger | Example | Search Strategy |
|---|---|---|
| Explicit Question | "Who is Vorlan?" | Search NPCs for "Vorlan" |
| Location Reference | "I go to the market" | Search locations for "market" |
| Historical Reference | "What happened last time?" | Search sessions for recent events |
| Item Reference | "I use the token" | Search items for "token" |
| Relationship Query | "Does Marlena know Elara?" | Search NPCs for both names |
| Topic Keyword | "Tell me about the plague" | Search all for "plague" |
3.4 What NOT to RAG
- Current location details - Already injected deterministically
- Present NPCs - Already injected deterministically
- PC state - Never from RAG, always from frontmatter
- Active threads - Already injected deterministically
- Generic actions - "I attack" doesn't need retrieval
4. Proposed Hybrid Architecture
4.1 Architecture Diagram
┌─────────────────────────────────────────────────────────────────┐
│ User Message │
│ "I head to the market" │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ Trigger Analyzer │
│ ┌─────────────┐ ┌─────────────┐ ┌─────────────┐ │
│ │ Entity │ │ Location │ │ Historical │ │
│ │ Extractor │ │ Detector │ │ Trigger │ │
│ └─────────────┘ └─────────────┘ └─────────────┘ │
│ │ │ │ │
│ └────────────────┼────────────────┘ │
│ ▼ │
│ Trigger List: ["market"] │
└─────────────────────────────────────────────────────────────────┘
│
┌───────────────┴───────────────┐
▼ ▼
┌──────────────────────┐ ┌──────────────────────┐
│ Deterministic Build │ │ RAG Retrieval │
│ (Always Runs) │ │ (If Triggers Found) │
│ │ │ │
│ • PC State │ │ Query: "market" │
│ • Current Location │ │ → Location match │
│ • NPCs Present │ │ → NPC vendors │
│ • Active Threads │ │ → Session mentions │
│ • Recent Events │ │ │
└──────────────────────┘ └──────────────────────┘
│ │
└───────────────┬───────────────┘
▼
┌─────────────────────────────────────────────────────────────────┐
│ Context Assembly │
│ │
│ ## SESSION CONTEXT │
│ [Deterministic: PC, Location, NPCs, Threads] │
│ │
│ ## RETRIEVED CONTEXT │
│ [RAG: Market location details, vendor NPCs] │
│ │
│ --- │
│ PLAYER: I head to the market │
└─────────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────────┐
│ DM Agent │
│ │
│ Has full context → Generates accurate response │
│ Uses canon market details, not invented ones │
└─────────────────────────────────────────────────────────────────┘
4.2 Context Assembly Order
CONTEXT_TEMPLATE = """
## SESSION CONTEXT: {campaign_name}
**Day {day}**{time}
### Player Character
{pc_block}
### Current Location
{location_block}
### NPCs Present
{npcs_block}
### Active Storylines
{threads_block}
{retrieved_context}
---
PLAYER: {user_message}
"""
Retrieved Context Section (when present):
## Retrieved Context
The following information was retrieved based on your message:
### [LOCATION] The Morning Market
A bustling open-air market in Seagate's merchant quarter...
### [NPC] Old Tam (Fish vendor)
A weathered fisherman who sells the morning catch...
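One way to produce the retrieved-context section is a small renderer that returns an empty string when nothing was retrieved, so the `{retrieved_context}` slot in the template collapses cleanly. This is a sketch under assumptions: `render_retrieved_context` and the dictionary keys `source_type`, `title`, and `text` are illustrative, not an existing interface.

```python
def render_retrieved_context(results: list[dict]) -> str:
    """Render RAG results in the section format shown above; '' when there are none."""
    if not results:
        return ""
    lines = [
        "## Retrieved Context",
        "The following information was retrieved based on your message:",
    ]
    for item in results:
        # Tag each entry with its source type, e.g. [LOCATION] or [NPC]
        lines.append(f"### [{item['source_type'].upper()}] {item['title']}")
        lines.append(item["text"])
    return "\n".join(lines)


section = render_retrieved_context([
    {
        "source_type": "location",
        "title": "The Morning Market",
        "text": "A bustling open-air market in Seagate's merchant quarter...",
    },
    {
        "source_type": "npc",
        "title": "Old Tam (Fish vendor)",
        "text": "A weathered fisherman who sells the morning catch...",
    },
])
```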
4.3 Deduplication
RAG results must not duplicate deterministic context:
def dedupe_rag_results(rag_results, deterministic_ctx):
    """Remove RAG results that duplicate deterministic context."""
    # Get IDs already in deterministic context
    present_npcs = {npc.id for npc in deterministic_ctx.npcs_present}
    current_loc = deterministic_ctx.location

    # Filter RAG results
    filtered = []
    for result in rag_results:
        if result.source_type == "npc" and result.source_id in present_npcs:
            continue  # Already in "NPCs Present"
        if result.source_type == "location" and result.source_id == current_loc:
            continue  # Already in "Current Location"
        filtered.append(result)
    return filtered
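Because documents are indexed as multiple chunks, a single entity can also appear several times within the RAG results themselves. A complementary pass, sketched here as an assumption rather than part of the proposal, keeps only the best-scoring chunk per source document. `RagResult`, its `chunk_index` field, and `best_chunk_per_source` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RagResult:
    """Hypothetical result shape: one row per matching chunk."""
    source_type: str
    source_id: str
    chunk_index: int
    similarity: float


def best_chunk_per_source(results: list[RagResult]) -> list[RagResult]:
    """Collapse multiple chunks of the same document to the highest-scoring one."""
    best: dict[tuple[str, str], RagResult] = {}
    for result in results:
        key = (result.source_type, result.source_id)
        if key not in best or result.similarity > best[key].similarity:
            best[key] = result
    # Best matches first, so later truncation drops the weakest entries
    return sorted(best.values(), key=lambda r: r.similarity, reverse=True)


collapsed = best_chunk_per_source([
    RagResult("location", "morning-market", 1, 0.61),
    RagResult("location", "morning-market", 0, 0.72),
    RagResult("npc", "old-tam", 0, 0.58),
])
```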
5. Trigger Detection System
5.1 Trigger Types & Patterns
from enum import Enum


class TriggerType(Enum):
    ENTITY_QUESTION = "entity_question"      # "Who is X?"
    LOCATION_MOVEMENT = "location_movement"  # "I go to X"
    LOCATION_QUERY = "location_query"        # "What's X like?"
    HISTORICAL = "historical"                # "What happened..."
    ITEM_REFERENCE = "item_reference"        # "I use the X"
    TOPIC_QUERY = "topic_query"              # "Tell me about X"


# Patterns are written in lowercase: match against a lowercased message
# or compile them with re.IGNORECASE.
TRIGGER_PATTERNS = {
    TriggerType.ENTITY_QUESTION: [
        r"who is (\w+)",
        r"where is (\w+)",
        r"what does (\w+) want",
        r"how is (\w+)",
    ],
    TriggerType.LOCATION_MOVEMENT: [
        r"i (?:go|head|walk|travel|return) to (?:the )?(.+?)(?:\.|$)",
        r"let'?s (?:go|head) to (?:the )?(.+?)(?:\.|$)",
        r"i want to visit (?:the )?(.+?)(?:\.|$)",
    ],
    TriggerType.LOCATION_QUERY: [
        r"what(?:'s| is) (?:the )?(.+?) like",
        r"describe (?:the )?(.+)",
    ],
    TriggerType.HISTORICAL: [
        r"what happened (?:at|in|to|with) (.+)",
        r"last time (?:at|in|we) (.+)",
        r"remember (?:when|the) (.+)",
        r"previously (.+)",
    ],
    TriggerType.ITEM_REFERENCE: [
        r"i (?:use|examine|look at|inspect) (?:my |the )?(.+)",
        r"what does (?:my |the )?(.+?) do",
    ],
    # "Tell me about X" belongs to TOPIC_QUERY (see the enum comment and section 3.3)
    TriggerType.TOPIC_QUERY: [
        r"tell me about (?:the )?(.+?)(?:\.|$)",
    ],
}
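A matcher over these patterns might look like the sketch below. It carries only two of the trigger types to stay short, compiles the patterns case-insensitively because player messages are usually capitalized, and reports at most one capture per trigger type. `detect_triggers`, `PATTERNS`, and `COMPILED` are illustrative names, not existing code.

```python
import re
from enum import Enum


class TriggerType(Enum):
    ENTITY_QUESTION = "entity_question"
    LOCATION_MOVEMENT = "location_movement"


# Subset of the patterns above, kept small for the example
PATTERNS = {
    TriggerType.ENTITY_QUESTION: [r"who is (\w+)", r"where is (\w+)"],
    TriggerType.LOCATION_MOVEMENT: [
        r"i (?:go|head|walk|travel|return) to (?:the )?(.+?)(?:\.|$)",
    ],
}

# Compile once at import time; IGNORECASE lets "I head to..." match "i ..."
COMPILED = {
    trigger_type: [re.compile(p, re.IGNORECASE) for p in patterns]
    for trigger_type, patterns in PATTERNS.items()
}


def detect_triggers(message: str) -> list[tuple[TriggerType, str]]:
    """Return (trigger_type, captured_term) pairs, at most one per trigger type."""
    text = message.strip()
    found = []
    for trigger_type, patterns in COMPILED.items():
        for pattern in patterns:
            match = pattern.search(text)
            if match:
                found.append((trigger_type, match.group(1).strip()))
                break  # first matching pattern wins for this type
    return found


movement = detect_triggers("I head to the market")
question = detect_triggers("Where is Jorah?")
combat = detect_triggers("I attack")
```

Note that `(\w+)` captures a single word, so multi-word names such as "Old Tam" would be truncated to "Old"; that limitation is inherited from the patterns as written.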
5.2 Entity Extraction
Beyond regex, extract proper nouns:
import re

# COMMON_WORDS is assumed to be a set of lowercase stopwords defined elsewhere
# (e.g. sentence-initial words such as "where" or "what").


def extract_entities(message: str, known_context: set[str]) -> list[str]:
    """Extract potential entity names from message."""
    entities = []
    # Capitalized words (potential proper nouns), including multi-word names
    caps = re.findall(r'\b([A-Z][a-z]+(?:\s+[A-Z][a-z]+)*)\b', message)
    for word in caps:
        word_lower = word.lower()
        # Skip if already in deterministic context (known_context holds lowercase names)
        if word_lower in known_context:
            continue
        # Skip common words
        if word_lower in COMMON_WORDS:
            continue
        entities.append(word)
    return entities
5.3 Search Strategy by Trigger
| Trigger Type | Source Types | Limit | Threshold |
|---|---|---|---|
| Entity Question | npc, faction | 3 | 0.5 |
| Location Movement | location | 3 | 0.5 |
| Location Query | location, session | 5 | 0.4 |
| Historical | session, plot | 5 | 0.4 |
| Item Reference | item | 2 | 0.5 |
| Topic Query | all | 5 | 0.4 |
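The table translates naturally into a lookup keyed by trigger type. The sketch below encodes the same source types, limits, and thresholds and applies them to already-scored results; `SearchStrategy`, `STRATEGIES`, `select_results`, and the `fish-docks` ID are hypothetical, and an empty `source_types` tuple is this sketch's way of expressing "all".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchStrategy:
    source_types: tuple[str, ...]  # empty tuple means "all source types"
    limit: int
    threshold: float


# Values copied from the table above
STRATEGIES = {
    "entity_question": SearchStrategy(("npc", "faction"), 3, 0.5),
    "location_movement": SearchStrategy(("location",), 3, 0.5),
    "location_query": SearchStrategy(("location", "session"), 5, 0.4),
    "historical": SearchStrategy(("session", "plot"), 5, 0.4),
    "item_reference": SearchStrategy(("item",), 2, 0.5),
    "topic_query": SearchStrategy((), 5, 0.4),
}


def select_results(trigger_type: str, scored: list[tuple[str, str, float]]):
    """Filter (source_type, source_id, similarity) rows by the trigger's strategy."""
    strategy = STRATEGIES[trigger_type]
    kept = [
        row for row in scored
        if (not strategy.source_types or row[0] in strategy.source_types)
        and row[2] >= strategy.threshold
    ]
    kept.sort(key=lambda row: row[2], reverse=True)
    return kept[: strategy.limit]


scored = [
    ("location", "morning-market", 0.72),
    ("npc", "old-tam", 0.58),
    ("location", "fish-docks", 0.45),
]
movement_hits = select_results("location_movement", scored)
topic_hits = select_results("topic_query", scored)
```

In practice the source-type filter and limit would more likely be pushed into the pgvector query itself; filtering in Python here only keeps the example self-contained.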
5.4 Confidence Scoring
Not all triggers should fire RAG:
def should_rag(trigger: str, trigger_type: TriggerType, ctx: SessionContext) -> bool:
    """Decide if RAG is warranted for this trigger."""
    # High confidence triggers - always RAG
    if trigger_type in [TriggerType.ENTITY_QUESTION, TriggerType.LOCATION_MOVEMENT]:
        return True
    # Check if entity exists in vault (fast file check)
    if vault_entity_exists(trigger):
        return True
    # Low confidence + no vault match - skip RAG
    # Let DM generate or ask for clarification
    return False
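`vault_entity_exists` is referenced above but not defined in this document. One plausible shape for the "fast file check" is a slug lookup against the canon directories listed in section 1.1. Everything here is an assumption: the extra `vault_root` parameter, the `slugify` helper, and the idea that entity files are named by slug (as IDs like `the-salty-sigil` and `gilded-quill` suggest).

```python
import re
import tempfile
from pathlib import Path

# Canon subdirectories taken from section 1.1; other entity folders are not assumed here
ENTITY_DIRS = ("npcs", "locations")


def slugify(name: str) -> str:
    """Lowercase and hyphenate a name, e.g. 'Gilded Quill' -> 'gilded-quill'."""
    return re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")


def vault_entity_exists(trigger: str, vault_root: str) -> bool:
    """Return True if canon/<dir>/<slug>.md exists for the trigger text."""
    slug = slugify(trigger)
    if not slug:
        return False
    canon = Path(vault_root) / "canon"
    return any((canon / d / f"{slug}.md").is_file() for d in ENTITY_DIRS)


# Demo against a throwaway vault
with tempfile.TemporaryDirectory() as root:
    npc_dir = Path(root) / "canon" / "npcs"
    npc_dir.mkdir(parents=True)
    (npc_dir / "jorah.md").write_text("---\nlocation: gilded-quill\n---\n", encoding="utf-8")
    found_jorah = vault_entity_exists("Jorah", root)
    found_smith = vault_entity_exists("a blacksmith", root)
```

An exact-slug check is cheap but strict: "market" would not match `morning-market`, which is one reason location movement is treated as a high-confidence trigger that always goes to RAG.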
6. Implementation Phases
Phase 1: Basic Integration (MVP)
Effort: 4-6 hours
- Wire RAG retriever into chat route
- Simple trigger detection (location movement + entity questions)
- Inject RAG results after deterministic context
- Add logging for retrieval decisions
Deliverables:
- Modified api/routes/chat.py
- New api/context/hybrid.py
- Logging for debugging
Phase 2: Smart Triggers
Effort: 4-6 hours
- Full trigger pattern matching
- Entity extraction from message
- Confidence scoring
- Deduplication with deterministic context
Deliverables:
- api/context/triggers.py
- Enhanced trigger patterns
- Test suite for trigger detection
Phase 3: Feedback Loop
Effort: 6-8 hours
- Track when DM calls get_entity for something not in context
- Auto-suggest indexing for new entities
- Log retrieval quality (did RAG help?)
- Dashboard for RAG effectiveness
Deliverables:
- Retrieval analytics
- Auto-index suggestions
- Quality metrics

Phase 4: Advanced Features
Effort: 8-12 hours
- Relationship graph queries ("How does X know Y?")
- Temporal awareness ("What happened on Day 3?")
- Secret filtering (only surface discoverable secrets)
- Multi-turn context (remember what was retrieved last turn)
Deliverables:
- Relationship index
- Temporal search
- Secret visibility rules

7. Cost Analysis
7.1 Embedding Costs (OpenAI text-embedding-3-small)
| Operation | Tokens | Cost |
|---|---|---|
| Index 1 NPC (~500 chars) | ~125 | $0.0000025 |
| Index 1 Session (~2000 chars) | ~500 | $0.00001 |
| Index full campaign (41 docs) | ~15,000 | $0.0003 |
| Query embedding | ~20 | $0.0000004 |
Monthly Estimate (heavy usage):
- 100 chat messages/day × 2 RAG queries each = 200 queries/day
- 200 queries/day × 30 days × $0.0000004 = $0.0024/month
- Re-indexing weekly: 4 × $0.0003 = $0.0012/month
- Total: ~$0.004/month (negligible)
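The estimate can be reproduced directly from the per-token price. The sketch below uses the $0.02 per 1M tokens figure quoted for OpenAI in section 9 and the token counts from the table above; the variable names are illustrative.

```python
# Price for text-embedding-3-small as quoted in this document: $0.02 per 1M tokens
PRICE_PER_TOKEN = 0.02 / 1_000_000

QUERY_TOKENS = 20         # per query embedding (table above)
CAMPAIGN_TOKENS = 15_000  # full campaign index, 41 docs (table above)

queries_per_month = 100 * 2 * 30  # messages/day x queries/message x days
monthly_query_cost = queries_per_month * QUERY_TOKENS * PRICE_PER_TOKEN
monthly_reindex_cost = 4 * CAMPAIGN_TOKENS * PRICE_PER_TOKEN  # weekly re-index
monthly_total = monthly_query_cost + monthly_reindex_cost

print(f"queries: ${monthly_query_cost:.4f}, re-index: ${monthly_reindex_cost:.4f}, "
      f"total: ${monthly_total:.4f}")
```

The exact total is $0.0036, which the estimate rounds to ~$0.004. Even a hundredfold increase in traffic would keep embedding spend well under a dollar a month, so latency, not cost, is the constraint that matters.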
7.2 Latency Budget
| Operation | Target | Actual |
|---|---|---|
| Trigger analysis | <10ms | ~5ms |
| Embedding API call | <200ms | ~150ms |
| pgvector search | <50ms | ~20ms |
| Context assembly | <10ms | ~5ms |
| Total RAG overhead | <300ms | ~180ms |
7.3 When to Skip RAG
To minimize latency for simple messages:
SKIP_RAG_PATTERNS = [
    r"^(yes|no|ok|sure|thanks)\.?$",  # Simple responses
    r"^i (attack|cast|roll)",         # Combat actions
    r"^<.+>$",                        # OOC messages
]
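Applying the skip list could be as simple as the check below, run before trigger analysis. It is a sketch: `should_skip_rag` is a hypothetical helper, and case-insensitive matching is an assumption added so that "Yes." and "I attack" are caught.

```python
import re

# Same patterns as above, compiled once; IGNORECASE is an addition for this sketch
SKIP_RAG = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (
        r"^(yes|no|ok|sure|thanks)\.?$",  # Simple responses
        r"^i (attack|cast|roll)",         # Combat actions
        r"^<.+>$",                        # OOC messages
    )
]


def should_skip_rag(message: str) -> bool:
    """Return True when the message matches any skip pattern."""
    text = message.strip()
    return any(pattern.search(text) for pattern in SKIP_RAG)


skip_simple = should_skip_rag("Yes.")
skip_combat = should_skip_rag("I attack the goblin")
skip_ooc = should_skip_rag("<brb 5 min>")
skip_travel = should_skip_rag("I head to the market")
```

One caveat worth logging in Phase 1: a message like "I cast Detect Magic on the token" is skipped by the combat pattern even though it references an item that might benefit from retrieval.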
8. Trade-offs & Risks
8.1 Architecture Trade-offs
| Approach | Pros | Cons |
|---|---|---|
| Deterministic Only | Fast, free, predictable | Misses referenced entities |
| RAG Only | Comprehensive | Slow, may miss obvious context |
| Tool-Based Only | DM decides what to fetch | Unreliable recognition |
| Hybrid (Proposed) | Best coverage | More complex, ~200ms overhead |
8.2 Risks & Mitigations
| Risk | Impact | Mitigation |
|---|---|---|
| Over-retrieval | Bloated context, higher LLM costs | Strict token limits, deduplication |
| Under-retrieval | DM generates wrong info | Err on side of retrieval, log misses |
| Stale embeddings | Canon changes not reflected | Re-index on vault changes |
| False triggers | RAG for irrelevant terms | Confidence scoring, skip common words |
| Latency spikes | Slow responses | Timeout + fallback to no-RAG |
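The "strict token limits" mitigation for over-retrieval could be a greedy budget over the ranked chunks. The sketch below estimates tokens at roughly 4 characters each, the same ratio the cost table uses (500 chars ≈ 125 tokens); `trim_to_budget` and the 250-token budget are hypothetical, not agreed values.

```python
def trim_to_budget(chunks: list[tuple[str, float]], max_tokens: int) -> list[str]:
    """Keep the highest-similarity chunk texts that fit within max_tokens.

    chunks: (text, similarity) pairs. Token cost is approximated as len(text) // 4.
    """
    kept: list[str] = []
    used = 0
    for text, _similarity in sorted(chunks, key=lambda c: c[1], reverse=True):
        cost = max(1, len(text) // 4)
        if used + cost > max_tokens:
            continue  # too large for the remaining budget; a smaller chunk may still fit
        kept.append(text)
        used += cost
    return kept


kept = trim_to_budget(
    [("a" * 400, 0.72), ("b" * 2000, 0.60), ("c" * 400, 0.55)],
    max_tokens=250,
)
```

Skipping an oversized chunk instead of stopping lets a lower-ranked but shorter chunk still make it in, at the cost of occasionally preferring brevity over relevance.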
8.3 Failure Modes
RAG Unavailable:
try:
    rag_context = await retriever.search(triggers)
except Exception as e:
    logger.warning(f"RAG failed, continuing without: {e}")
    rag_context = ""  # Graceful degradation
Embedding API Down:
- Fall back to deterministic-only
- Log for alerting
- Retry on next message
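The "timeout + fallback to no-RAG" mitigation from section 8.2 can be combined with the error handling above using `asyncio.wait_for`. This is a sketch under assumptions: `retrieve_with_timeout`, the stub search coroutines, and the 0.3 s default (chosen to mirror the <300ms overhead target) are illustrative.

```python
import asyncio
import logging

logger = logging.getLogger("rag")


async def retrieve_with_timeout(search, triggers, timeout_s: float = 0.3):
    """Run an async search; on timeout or any error, fall back to no RAG context."""
    try:
        return await asyncio.wait_for(search(triggers), timeout=timeout_s)
    except asyncio.TimeoutError:
        logger.warning("RAG timed out after %.2fs, continuing without", timeout_s)
    except Exception as exc:
        logger.warning("RAG failed, continuing without: %s", exc)
    return []  # deterministic-only context


async def fast_search(triggers):
    """Stub that answers immediately."""
    return [f"canon:{t}" for t in triggers]


async def slow_search(triggers):
    """Stub that simulates a latency spike."""
    await asyncio.sleep(1.0)
    return ["too late"]


fast_result = asyncio.run(retrieve_with_timeout(fast_search, ["market"]))
slow_result = asyncio.run(retrieve_with_timeout(slow_search, ["market"], timeout_s=0.05))
```

`wait_for` cancels the pending search when the deadline passes, so a slow embedding call does not keep running after the chat response has moved on.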
9. Open Questions
For Peer Review
1. Trigger Sensitivity: How aggressive should trigger detection be?
   - Conservative: Only explicit questions ("Who is X?")
   - Aggressive: Any proper noun not in context
   - Recommendation: Start conservative, tune based on logs
2. Secret Handling: Should RAG ever surface _gm/secrets?
   - Option A: Never (DM decides via tools)
   - Option B: Only if player could have discovered it
   - Option C: Surface with [SECRET] tag for DM discretion
3. Multi-Turn Memory: Should we remember what was retrieved last turn?
   - Pro: Avoids re-retrieving same context
   - Con: Adds state complexity
   - Recommendation: Defer to Phase 4
4. Embedding Model: Stick with OpenAI or switch to local?
   - OpenAI: $0.02/1M tokens, 1536 dims, excellent quality
   - Nomic (local): Free, 768 dims, good quality
   - Recommendation: Keep OpenAI for now, evaluate local later
5. Index Freshness: When to re-index?
   - On every vault write (real-time)
   - On session end (batch)
   - Manual trigger only
   - Recommendation: Session end + manual for now
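Whichever re-index schedule is chosen, a content hash per document would let a session-end batch touch only what changed. The sketch below is one possible approach, not a decided design: `content_hash`, `stale_documents`, and the idea of storing a hash alongside each indexed document are assumptions.

```python
import hashlib


def content_hash(text: str) -> str:
    """Stable fingerprint of a document's current text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stale_documents(vault_docs: dict[str, str], indexed_hashes: dict[str, str]) -> list[str]:
    """Return IDs of documents that are new or whose text changed since indexing."""
    return sorted(
        doc_id
        for doc_id, text in vault_docs.items()
        if indexed_hashes.get(doc_id) != content_hash(text)
    )


# Hashes recorded at the last index run (hypothetical document IDs)
indexed = {
    "npcs/jorah": content_hash("Jorah v1"),
    "locations/morning-market": content_hash("Market v1"),
}
# Current vault state: one edited doc, one unchanged, one brand new
vault = {
    "npcs/jorah": "Jorah v2",
    "locations/morning-market": "Market v1",
    "npcs/old-tam": "Old Tam v1",
}
to_reindex = stale_documents(vault, indexed)
```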
Appendix A: File Inventory
Current Implementation
api/
├── context/
│ └── builder.py # Deterministic context builder
├── rag/
│ ├── __init__.py
│ ├── db.py # pgvector connection pool
│ ├── embeddings.py # OpenAI embedding client
│ ├── chunker.py # Document chunker
│ ├── indexer.py # Vault indexer
│ └── retriever.py # Similarity search
├── routes/
│ ├── chat.py # Chat endpoint (needs hybrid integration)
│ └── rag.py # RAG management endpoints
└── agent/
├── dm.py # DM agent definition
└── tools.py # Agent tools (get_entity, etc.)
Proposed Additions
api/
├── context/
│ ├── builder.py # (existing)
│ ├── hybrid.py # NEW: Hybrid context builder
│ └── triggers.py # NEW: Trigger detection
Appendix B: Example Flows
Flow 1: Simple Movement (No RAG)
User: "I walk over to Marlena"
Trigger Analysis:
- "Marlena" detected
- Marlena IS in NPCs Present
- No RAG needed
Context: [Deterministic only]
DM Response: Uses existing Marlena context
Flow 2: Location Movement (RAG Triggered)
User: "I head to the market"
Trigger Analysis:
- Location movement detected: "market"
- "market" NOT in current location
- RAG triggered
RAG Search: "market" in locations
- Result: "morning-market" (similarity: 0.72)
Context: [Deterministic] + [RAG: Morning Market details]
DM Response: Uses canon market description
Flow 3: Entity Question (RAG Triggered)
User: "Where is Jorah?"
Trigger Analysis:
- Entity question detected: "Jorah"
- Jorah NOT in NPCs Present
- RAG triggered
RAG Search: "Jorah" in npcs
- Result: "jorah" NPC file (similarity: 0.85)
- Includes: location: gilded-quill
Context: [Deterministic] + [RAG: Jorah at Gilded Quill]
DM Response: "Jorah is at the Gilded Quill..."
Flow 4: New Entity (No RAG Match)
User: "I look for a blacksmith"
Trigger Analysis:
- Location query detected: "blacksmith"
- RAG triggered
RAG Search: "blacksmith" in locations, npcs
- Result: No matches above threshold
Context: [Deterministic only]
DM Response: Generates new blacksmith (can be indexed later)
Revision History
| Version | Date | Author | Changes |
|---|---|---|---|
| 0.1 | 2025-12-31 | Claude | Initial draft |