Playtest Findings: Ironvale Campaign

Date: 2025-12-29
LLM: DeepSeek (deepseek-chat)
Campaign: ironvale (minimal seed - only campaign_id provided)

Summary

Successfully tested the MCP-based DM system with a minimal campaign seed. The DM generated a complete opening scene with vivid world-building, but it created a new campaign instead of using ironvale and emitted entities as JSON text rather than persisting them through tool calls.

What Worked

1. MCP Connection Functional

Streamable-http transport works correctly
Tools are discovered and available (46 tools exposed)
Session management works after initial hiccups

2. Tool Execution Successful

create_campaign was called and completed successfully
AnythingLLM correctly invoked the MCP server
Response included thoughts array showing tool execution

3. Rich Narrative Generation

Vivid opening scene at "The Broken Wheel Inn"
Multiple NPCs created narratively (Gareth, Lyra, Borin Stonefist)
PC backstory generated (Valerius, disgraced knight)
Immediate drama hook (wounded man, highwayman attack)

4. Response Format Followed

Appropriate length (~3000 characters)
5 action options as specified
Option 5 was "wild/unexpected" as required
Markdown formatting with headers and emphasis

Issues Found

Issue 1: Wrong campaign_id Used (HIGH)

Problem: DM created a new campaign campaign_001 instead of using ironvale

Evidence:

{
  "thoughts": [
    "@agent is executing `palimpsest-create_campaign` tool {
      \"campaign_id\": \"campaign_001\",  // Should be \"ironvale\"
      ...
    }"
  ]
}

Impact: New campaign created, existing ironvale campaign unused

Root Cause: DeepSeek didn't follow the system prompt instruction:

Always use campaign_id="ironvale" when calling tools.

Fix Needed:

- Consider embedding campaign_id in workspace agent config
- Make prompt instruction more emphatic
- Or add a validation layer that rejects wrong campaign_ids
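
The validation-layer option could look roughly like the following. This is a minimal sketch, not the project's actual code: `validate_campaign_id`, `CampaignMismatchError`, and the idea that tool arguments arrive as a plain dict are all assumptions made for illustration.

```python
EXPECTED_CAMPAIGN_ID = "ironvale"


class CampaignMismatchError(ValueError):
    """Raised when a tool call targets a campaign other than the expected one."""


def validate_campaign_id(tool_name, arguments, expected=EXPECTED_CAMPAIGN_ID):
    """Return the arguments unchanged if campaign_id matches, otherwise raise.

    Hypothetical helper: assumes each tool call carries its arguments as a
    dict that includes a "campaign_id" key.
    """
    actual = arguments.get("campaign_id")
    if actual != expected:
        raise CampaignMismatchError(
            f"{tool_name}: campaign_id {actual!r} rejected, expected {expected!r}"
        )
    return arguments
```

With a check like this in front of the tool handlers, the create_campaign call with campaign_001 from the evidence above would have been rejected instead of silently creating a second campaign.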

Issue 2: Entities Not Persisted (MEDIUM)

Problem: The DM output JSON code blocks for entities instead of calling MCP tools

Example from response:

// This is just text output, NOT a tool call:
{
  "entity_id": "npc_gareth",
  "name": "Gareth",
  "type": "npc",
  ...
}

Impact:

- NPCs only exist in narrative text
- PC stats not tracked
- Location not searchable
- No persistence for future sessions

Root Cause: The prompt says "Use create_entity type=npc" but the LLM interpreted this as output format, not tool invocation

Fix Needed:

- Reword prompt: "You MUST call the create_entity tool, not output JSON"
- Add examples showing tool invocation vs text output
- Consider workflow enforcement in AnythingLLM
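
Until the prompt is fixed, this failure can be flagged automatically during a playtest. The sketch below scans the DM's text response for entity-shaped JSON and reports any entity_id that was never created through a tool call. The function name and the `persisted_ids` set (IDs the harness has seen in real create_entity calls) are hypothetical. It matches with a regular expression rather than a JSON parser because the emitted blocks may be partial.

```python
import re

# Matches "entity_id": "<value>" anywhere in free text.
ENTITY_ID_PATTERN = re.compile(r'"entity_id"\s*:\s*"([^"]+)"')


def unpersisted_entity_ids(response_text, persisted_ids):
    """List entity IDs that appear in the text but were never persisted.

    persisted_ids is assumed to be a set of IDs collected from actual
    tool calls; order of first appearance in the text is preserved.
    """
    seen = set()
    missing = []
    for entity_id in ENTITY_ID_PATTERN.findall(response_text):
        if entity_id not in persisted_ids and entity_id not in seen:
            seen.add(entity_id)
            missing.append(entity_id)
    return missing
```

A non-empty result means the model described entities in prose or JSON without executing the corresponding tool.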

Issue 3: Session Management Initial Failure (LOW)

Problem: First MCP connection attempts failed with:

TypeError: fetch failed
Bad Request: Missing session ID

Resolution: AnythingLLM restart cleared the issue

Root Cause: Stale MCP client state from previous failed connections

Fix Needed:

- Document AnythingLLM restart as troubleshooting step
- Consider connection health monitoring
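
Connection health monitoring could start as a simple retrying probe run before each session. The sketch below is deliberately generic: `probe` is a hypothetical caller-supplied function that performs one cheap request against the MCP server and raises on failure. No particular MCP client API is assumed.

```python
import time


def wait_until_healthy(probe, attempts=3, delay_seconds=1.0, sleep=time.sleep):
    """Call probe() up to `attempts` times; return True on the first success.

    probe is a hypothetical zero-argument callable that raises an exception
    when the connection is unusable (for example on "fetch failed").
    """
    for attempt in range(1, attempts + 1):
        try:
            probe()
            return True
        except Exception as exc:
            print(f"health check {attempt}/{attempts} failed: {exc}")
            if attempt < attempts:
                sleep(delay_seconds)
    return False
```

If the probe never succeeds, the harness can tell the operator to restart AnythingLLM rather than starting a session against stale client state.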

Recommendations

Short-term Fixes

Reword tool usage instructions in system prompt:

CRITICAL: You must CALL the MCP tools, not output JSON code blocks.
When creating an NPC, call: create_entity(campaign_id="ironvale", type="npc", ...)
Do NOT output JSON - execute the tool.

Embed campaign_id constraint at multiple points in the prompt
Add session start validation - have DM call get_session_context first and verify campaign_id matches
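
The session start validation could also be run by the test harness rather than left to the model. The sketch below assumes a hypothetical `call_tool(name, arguments)` function that returns the tool result as a dict, and assumes that get_session_context accepts a campaign_id argument and echoes a "campaign_id" field back. Neither detail is confirmed by this playtest.

```python
def verify_session_campaign(call_tool, expected_campaign_id):
    """Return True if get_session_context reports the expected campaign.

    call_tool is a hypothetical callable (tool_name, arguments) -> dict.
    The "campaign_id" key in the result is an assumed response shape.
    """
    context = call_tool(
        "get_session_context", {"campaign_id": expected_campaign_id}
    )
    return context.get("campaign_id") == expected_campaign_id
```

Running this once before the first player turn would surface a campaign mismatch before any narrative is generated.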

Long-term Improvements

Campaign context injection - Add campaign_id to every tool call automatically
Tool usage examples in prompt showing correct invocation patterns
Workflow enforcement - AnythingLLM agent flow that requires certain tools in sequence
Model comparison - Test with Claude, GPT-4, Llama to compare tool adherence
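
Campaign context injection can be sketched as a thin wrapper around whatever function dispatches tool calls. Here `call_tool` is again a hypothetical `(tool_name, arguments)` callable. The wrapper overwrites any campaign_id the model supplied, so the model no longer has to get it right.

```python
def with_campaign_id(call_tool, campaign_id):
    """Wrap call_tool so every call carries the fixed campaign_id.

    call_tool is a hypothetical dispatcher (tool_name, arguments) -> result.
    """
    def wrapped(tool_name, arguments=None):
        merged = dict(arguments or {})
        # Overrides whatever campaign_id the model put in the arguments.
        merged["campaign_id"] = campaign_id
        return call_tool(tool_name, merged)

    return wrapped
```

This trades flexibility for safety: a workspace bound this way can only ever touch one campaign, which matches the one-workspace-per-campaign setup used in this playtest.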

Test Continuation

To continue the playtest properly:

1. Clear the errant campaign_001 campaign
2. Update the system prompt with clearer instructions
3. Reset ironvale workspace threads
4. Try again with stricter tool usage guidance

Metrics

Metric	Value
MCP Connection	Success (after restart)
Tool Invocation	1/4 expected (25%)
Narrative Quality	Excellent
Format Compliance	Good
Campaign Adherence	Failed
Entity Persistence	Failed