AnythingLLM - RAG Platform¶
AnythingLLM is a full-stack RAG (Retrieval-Augmented Generation) platform that enables intelligent document search and AI-powered conversations with your data. It replaces SurfSense as our self-hosted alternative to Glean, NotebookLM, and Perplexity.
Overview¶
| Property | Value |
|---|---|
| URL | https://ai.bogocat.com |
| Namespace | anythingllm |
| Auth | Authentik forward auth |
| Source | GitHub - Mintplex-Labs/anything-llm |
Architecture¶
┌──────────────────────────────────────────────────────────────────────┐
│ K8s Cluster │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Namespace: anythingllm │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Pod: anythingllm │ │ │
│ │ │ ┌─────────────────┐ ┌────────────────────────────┐ │ │ │
│ │ │ │ AnythingLLM │ │ git-sync (sidecar) │ │ │ │
│ │ │ │ Port: 3001 │ │ Syncs tower-fleet/docs │ │ │ │
│ │ │ │ │ │ every 5 minutes │ │ │ │
│ │ │ └────────┬────────┘ └──────────┬─────────────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ └────────┬───────────────┘ │ │ │
│ │ │ ▼ │ │ │
│ │ │ ┌─────────────────────┐ │ │ │
│ │ │ │ emptyDir volume │ /synced-docs │ │ │
│ │ │ │ (shared docs) │ tower-fleet docs │ │ │
│ │ │ └─────────────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────┐ │ │
│ │ │ PVC (20Gi) │ Longhorn - persistent storage │ │
│ │ │ /app/server/storage│ Vector DB, embeddings, config │ │
│ │ └─────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌────────────────────┐ ┌──────────┐ ┌──────────────────┐
│ Windows PC │ │ GitHub │ │ Authentik │
│ 10.89.97.100:8080 │ │ tower- │ │ auth.bogocat.com │
│ llama.cpp server │ │ fleet │ │ Forward auth │
│ RTX 3080 (10GB) │ │ repo │ │ │
│ - Qwen3-8B (chat) │ │ │ │ │
│ - nomic-embed-text │ │ │ │ │
└────────────────────┘ └──────────┘ └──────────────────┘
Features¶
- Multi-user Workspaces: Isolate documents and conversations by workspace
- Header-based Auth: Works with Authentik forward auth (no separate login)
- Multiple LLM Backends: Ollama, OpenAI, Anthropic, Mistral, llama.cpp, and more
- Hybrid Search: Vector embeddings + keyword search
- Document Processing: PDF, DOCX, TXT, and web pages
- Chat Interface: Ask questions about your documents
- API Access: REST API for programmatic access
Deployment¶
K8s Manifests¶
Location: /root/tower-fleet/manifests/apps/anythingllm/
# Deploy
/root/tower-fleet/scripts/deploy-anythingllm.sh
# Check status
kubectl get pods -n anythingllm
kubectl get ingress -n anythingllm
Authentik Setup¶
Prerequisites:

- Create Proxy Provider in Authentik:
    - Name: anythingllm-provider
    - Authorization flow: default-provider-authorization-implicit-consent
    - External host: https://ai.bogocat.com
    - Mode: Forward auth (single application)
- Create Application:
    - Name: AnythingLLM
    - Slug: anythingllm
    - Provider: anythingllm-provider
    - Bind to an appropriate group (e.g., authentik Admins)
- Assign to Embedded Outpost:
    - Go to Applications > Outposts > authentik Embedded Outpost
    - Add the AnythingLLM application
Traffic Flow¶
Internet → VPS Caddy → K8s Ingress → Authentik auth check → AnythingLLM
ai.bogocat.com /outpost.goauthentik.io/auth/nginx
Configuration¶
LLM Backend (Chat)¶
Configure via the AnythingLLM web UI after first login:
- Navigate to Settings > LLM Preference
- Select provider: Generic OpenAI
- Base URL: http://10.89.97.100:8080/v1
- API Key: not-needed (any non-empty string)
- Model: local (the name is ignored; the server uses whichever model is loaded)
Note: This uses llama.cpp server running on the Windows PC. Default model is Qwen3-8B.
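The same settings can also be pinned via environment variables on the AnythingLLM container instead of the UI. A sketch, assuming the variable names from upstream's .env.example; verify them against the deployed AnythingLLM version:

```shell
# Hypothetical env fragment for the AnythingLLM container.
# Names follow upstream's .env.example; confirm against the deployed version.
LLM_PROVIDER='generic-openai'
GENERIC_OPEN_AI_BASE_PATH='http://10.89.97.100:8080/v1'
GENERIC_OPEN_AI_API_KEY='not-needed'
GENERIC_OPEN_AI_MODEL_PREF='local'
```

UI changes made after first boot are stored in the PVC, so env vars mainly help keep a fresh deployment reproducible.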
Embedding Model¶
Embeddings require switching models since llama.cpp loads one model at a time.
Setup:
1. Navigate to Settings > Embedding Preference
2. Select: Generic OpenAI
3. Base URL: http://10.89.97.100:8080/v1
4. API Key: not-needed
5. Model: nomic-embed-text-v1.5
Workflow for re-indexing documents:
# From Proxmox host
llama-model embed-docs
# This swaps to nomic-embed-text, waits for you to trigger indexing,
# then restores the chat model when you press ENTER
See Local LLM Setup for details.
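The swap/wait/restore pattern behind llama-model embed-docs can be sketched roughly as follows. load_model is a hypothetical stand-in; the real script drives the llama.cpp server on the Windows PC:

```shell
# Rough sketch of the swap/wait/restore pattern. load_model is a stand-in;
# the real llama-model script talks to the llama.cpp server on the Windows PC.
load_model() { CURRENT_MODEL="$1"; echo "loaded: $1"; }

CHAT_MODEL="Qwen3-8B"
EMBED_MODEL="nomic-embed-text-v1.5"

load_model "$EMBED_MODEL"                  # swap to the embedding model
printf 'Trigger indexing in AnythingLLM, then press ENTER: '
read -r _ || true                          # wait for the user (EOF also proceeds)
load_model "$CHAT_MODEL"                   # restore the chat model
```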
Storage¶
Data persists in a 20Gi PVC:
- /app/server/storage - Documents, vector DB, config
Document Sync Architecture¶
The Problem¶
RAG systems require keeping indexed documents in sync with source data. Without proper sync:

- Answers become stale as documentation updates
- Users lose trust when the AI gives outdated information
- Manual re-indexing is tedious and error-prone
Industry Standard Patterns¶
| Pattern | Description | Complexity | When to Use |
|---|---|---|---|
| Webhook + Job | Git push triggers K8s Job to re-index | Medium | High-volume, CI/CD integrated |
| Scheduled Cron | Periodic full re-index | Low | Low change frequency, simple setup |
| Sidecar Sync | Container pulls repo, app watches folder | Low | Git-based docs, K8s native |
| Event Stream | Kafka/message queue for changes | High | Enterprise, real-time requirements |
Why We Chose: git-sync Sidecar¶
For tower-fleet documentation, the sidecar pattern is optimal because:
- K8s Native: Runs alongside AnythingLLM in same pod, no external dependencies
- Git-Based Source: tower-fleet docs are in GitHub, git-sync handles this natively
- Delta Sync: git-sync only fetches changed files, not full repo each time
- Simplicity: No webhooks, no external jobs, no message queues
- Proven: Same pattern used for OtterWiki documentation sync
- Self-Healing: If sync fails, it retries automatically
Trade-offs accepted:

- 5-minute sync delay (acceptable for documentation)
- emptyDir volume is ephemeral (docs re-sync on pod restart, ~10 seconds)
Alternatives Considered¶
| Alternative | Why Not Chosen |
|---|---|
| Webhook + Job | Overkill for ~160 docs, adds complexity |
| Manual Upload | Doesn't scale, easy to forget |
| Mounted PVC with cron | Requires separate cronjob, more moving parts |
| GitHub Actions | External dependency, network issues affect sync |
How It Works¶
┌─────────────────────────────────────────────────────────────┐
│ Every 5 minutes: │
│ │
│ 1. git-sync checks GitHub for changes │
│ 2. If changes detected, pulls only changed files │
│ 3. Updates /synced-docs/current symlink atomically │
│ 4. AnythingLLM folder watch detects changes │
│ 5. Changed documents re-indexed automatically │
└─────────────────────────────────────────────────────────────┘
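Step 3's atomic symlink update is the key trick: readers never see a half-written tree. A minimal local sketch of the idea, using illustrative paths rather than git-sync's actual internals:

```shell
# Minimal sketch of atomic publish via symlink swap
# (illustrative paths, not git-sync's actual internals).
root=$(mktemp -d)
mkdir -p "$root/rev-abc123/docs"
echo "# index" > "$root/rev-abc123/docs/index.md"

# Publish: create the symlink under a temp name, then rename it over "current".
# rename(2) is atomic, so readers see either the old tree or the new one.
ln -s "rev-abc123" "$root/current.tmp"
mv -T "$root/current.tmp" "$root/current"

readlink "$root/current"              # → rev-abc123
cat "$root/current/docs/index.md"
```

This is why AnythingLLM's folder watch can safely point at /synced-docs/current: the path flips to the new revision in a single step.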
Configuration¶
git-sync sidecar settings:
- Repository: git@github.com:jakecelentano/tower-fleet.git
- Sync period: 300 seconds (5 minutes)
- Mount path: /synced-docs
- Docs available at: /synced-docs/current/docs/
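Expressed as the sidecar's command line, those settings roughly correspond to the following. Flag names assume git-sync v4; check the deployed image's version, since v3 used different flags:

```shell
# Approximate git-sync v4 invocation matching the settings above
# (verify flag names against the deployed git-sync version; v3 differs).
/git-sync \
  --repo=git@github.com:jakecelentano/tower-fleet.git \
  --period=300s \
  --root=/synced-docs \
  --link=current
```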
AnythingLLM folder watch:
Configure in UI: Settings > Data Connectors > Watch Folder
- Path: /synced-docs/current/docs
- This auto-indexes new/changed documents
Verify Sync is Working¶
# Check git-sync logs
kubectl logs -n anythingllm deployment/anythingllm -c git-sync --tail=20
# Verify docs are synced
kubectl exec -n anythingllm deployment/anythingllm -c anythingllm -- ls /synced-docs/current/docs/
# Check last sync time
kubectl logs -n anythingllm deployment/anythingllm -c git-sync | grep "updated successfully"
Operations¶
View Logs¶
# AnythingLLM main container
kubectl logs -f deployment/anythingllm -n anythingllm -c anythingllm
# git-sync sidecar
kubectl logs -f deployment/anythingllm -n anythingllm -c git-sync
# Both containers
kubectl logs -f deployment/anythingllm -n anythingllm --all-containers
Restart¶
kubectl rollout restart deployment/anythingllm -n anythingllm
Update Image¶
kubectl set image deployment/anythingllm anythingllm=mintplexlabs/anythingllm:latest -n anythingllm
kubectl rollout status deployment/anythingllm -n anythingllm
Check llama.cpp Connectivity¶
# From Proxmox host
llama-model status
# From a debug pod in K8s
kubectl run -it --rm debug --image=curlimages/curl -n anythingllm -- curl http://10.89.97.100:8080/v1/models
Troubleshooting¶
Redirected to /onboarding Every Time¶
Cause: AnythingLLM requires completing the full onboarding flow once, including setting a password. This creates an admin user in the database. Without this user, the app redirects to onboarding on every visit.
Fix: Complete the onboarding flow once:

1. Go through the setup wizard
2. Set a password when prompted (required even with Authentik forward auth)
3. Complete the LLM/embedding configuration
4. After this, sessions persist normally
Why this happens: AnythingLLM doesn't have true header-based SSO (GitHub Issue #696). Authentik forward auth gates access but AnythingLLM still manages its own user/session state internally.
Note: The password you set isn't used for login (Authentik handles that); it just satisfies AnythingLLM's requirement for a configured admin user.
401 Unauthorized on Access¶
Authentik forward auth is not configured. Check:
- Proxy Provider exists with correct external host
- Application is created and linked to provider
- Application is assigned to embedded outpost
- User is in allowed group
LLM Responses Slow¶
Check the llama.cpp server on the Windows PC using the commands under "Check llama.cpp Connectivity" above.
Embeddings Not Working¶
Ensure embedding model is loaded:
# Check current model
llama-model current
# If not embedding model, switch to it
llama-model embed-docs
Pod CrashLoopBackOff¶
Check logs and events:
kubectl logs deployment/anythingllm -n anythingllm
kubectl describe pod -n anythingllm -l app.kubernetes.io/name=anythingllm
Why AnythingLLM?¶
AnythingLLM was chosen over SurfSense because:
- Authentik Integration: Supports header-based auth for seamless SSO
- Stable Docker Image: Single image that works out of the box
- Multi-user: Workspace isolation for different document collections
- Active Development: Regular updates, responsive maintainers
- llama.cpp Support: Can use faster llama.cpp backend instead of Ollama
See SurfSense (deprecated) for why we moved away from the previous solution.
Resources¶
- Source: https://github.com/Mintplex-Labs/anything-llm
- Docs: https://docs.useanything.com/
- Docker Hub: https://hub.docker.com/r/mintplexlabs/anythingllm