AnythingLLM - RAG Platform

AnythingLLM is a full-stack RAG (Retrieval-Augmented Generation) platform that enables intelligent document search and AI-powered conversations with your data. It replaces SurfSense as our self-hosted alternative to Glean, NotebookLM, and Perplexity.

Overview

Property   Value
URL        https://ai.bogocat.com
Namespace  anythingllm
Auth       Authentik forward auth
Source     GitHub - Mintplex-Labs/anything-llm

Architecture

┌──────────────────────────────────────────────────────────────────────┐
│                         K8s Cluster                                  │
│  ┌────────────────────────────────────────────────────────────────┐  │
│  │  Namespace: anythingllm                                        │  │
│  │                                                                │  │
│  │  ┌─────────────────────────────────────────────────────────┐   │  │
│  │  │                    Pod: anythingllm                     │   │  │
│  │  │  ┌─────────────────┐    ┌────────────────────────────┐  │   │  │
│  │  │  │  AnythingLLM    │    │  git-sync (sidecar)        │  │   │  │
│  │  │  │  Port: 3001     │    │  Syncs tower-fleet/docs    │  │   │  │
│  │  │  │                 │    │  every 5 minutes           │  │   │  │
│  │  │  └────────┬────────┘    └──────────┬─────────────────┘  │   │  │
│  │  │           │                        │                    │   │  │
│  │  │           └────────┬───────────────┘                    │   │  │
│  │  │                    ▼                                    │   │  │
│  │  │         ┌─────────────────────┐                         │   │  │
│  │  │         │  emptyDir volume    │  /synced-docs           │   │  │
│  │  │         │  (shared docs)      │  tower-fleet docs       │   │  │
│  │  │         └─────────────────────┘                         │   │  │
│  │  └─────────────────────────────────────────────────────────┘   │  │
│  │                                                                │  │
│  │  ┌─────────────────────┐                                       │  │
│  │  │  PVC (20Gi)         │  Longhorn - persistent storage        │  │
│  │  │  /app/server/storage│  Vector DB, embeddings, config        │  │
│  │  └─────────────────────┘                                       │  │
│  └────────────────────────────────────────────────────────────────┘  │
└──────────────────────────────────────────────────────────────────────┘
              ┌─────────────┼─────────────┐
              ▼             ▼             ▼
┌────────────────────┐ ┌──────────┐ ┌──────────────────┐
│ Windows PC         │ │ GitHub   │ │ Authentik        │
│ 10.89.97.100:8080  │ │ tower-   │ │ auth.bogocat.com │
│ llama.cpp server   │ │ fleet    │ │ Forward auth     │
│ RTX 3080 (10GB)    │ │ repo     │ │                  │
│ - Qwen3-8B (chat)  │ │          │ │                  │
│ - nomic-embed-text │ │          │ │                  │
└────────────────────┘ └──────────┘ └──────────────────┘

Features

  • Multi-user Workspaces: Isolate documents and conversations by workspace
  • Forward-auth Friendly: Works behind Authentik forward auth with no separate login after a one-time onboarding (see Troubleshooting)
  • Multiple LLM Backends: Ollama, OpenAI, Anthropic, Mistral, llama.cpp, and more
  • Hybrid Search: Vector embeddings + keyword search
  • Document Processing: PDF, DOCX, TXT, and web pages
  • Chat Interface: Ask questions about your documents
  • API Access: REST API for programmatic access
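
The REST API is handy for scripting against workspaces. A minimal sketch, assuming an API key generated in the AnythingLLM settings UI and the standard developer API paths (verify against the instance's own Swagger docs at /api/docs; note that requests to https://ai.bogocat.com also pass through the Authentik forward-auth layer described below):

# List workspaces with a developer API key ($API_KEY is a placeholder)
curl -s https://ai.bogocat.com/api/v1/workspaces \
  -H "Authorization: Bearer $API_KEY"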

Deployment

K8s Manifests

Location: /root/tower-fleet/manifests/apps/anythingllm/

# Deploy
/root/tower-fleet/scripts/deploy-anythingllm.sh

# Check status
kubectl get pods -n anythingllm
kubectl get ingress -n anythingllm

Authentik Setup

Prerequisites:

  1. Create Proxy Provider in Authentik:
     • Name: anythingllm-provider
     • Authorization flow: default-provider-authorization-implicit-consent
     • External host: https://ai.bogocat.com
     • Mode: Forward auth (single application)

  2. Create Application:
     • Name: AnythingLLM
     • Slug: anythingllm
     • Provider: anythingllm-provider
     • Bind to appropriate group (e.g., authentik Admins)

  3. Assign to Embedded Outpost:
     • Go to Applications > Outposts > authentik Embedded Outpost
     • Add the AnythingLLM application

Traffic Flow

Internet → VPS Caddy → K8s Ingress → Authentik auth check → AnythingLLM
           ai.bogocat.com            /outpost.goauthentik.io/auth/nginx
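
A quick sanity check that the chain is wired up: an unauthenticated request should be redirected into the Authentik flow rather than reaching AnythingLLM directly.

# Expect a 302 redirect into the Authentik flow, not a 200 from the app
curl -s -o /dev/null -w "%{http_code}\n" https://ai.bogocat.com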

Configuration

LLM Backend (Chat)

Configure via the AnythingLLM web UI after first login:

  1. Navigate to Settings > LLM Preference
  2. Select provider: Generic OpenAI
  3. Base URL: http://10.89.97.100:8080/v1
  4. API Key: not-needed (any non-empty string)
  5. Model: local (name is ignored, uses loaded model)

Note: This uses the llama.cpp server running on the Windows PC. The default model is Qwen3-8B.
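
Before pointing AnythingLLM at it, a quick smoke test of the OpenAI-compatible endpoint llama.cpp exposes (values match the settings above; the server ignores the model name):

# Should return a short completion from the loaded chat model
curl -s http://10.89.97.100:8080/v1/chat/completions \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer not-needed" \
  -d '{"model": "local", "messages": [{"role": "user", "content": "Say hi"}], "max_tokens": 16}'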

Embedding Model

Embeddings require switching models since llama.cpp loads one model at a time.

Setup:

  1. Navigate to Settings > Embedding Preference
  2. Select: Generic OpenAI
  3. Base URL: http://10.89.97.100:8080/v1
  4. API Key: not-needed
  5. Model: nomic-embed-text-v1.5

Workflow for re-indexing documents:

# From Proxmox host
llama-model embed-docs
# This swaps to nomic-embed-text, waits for you to trigger indexing,
# then restores the chat model when you press ENTER
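
While the embedding model is loaded, a direct check that embeddings are being served (assuming the llama.cpp server was started with embeddings enabled):

# Should return an embedding vector, not an error
curl -s http://10.89.97.100:8080/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-embed-text-v1.5", "input": "sync test"}'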

See Local LLM Setup for details.

Storage

Data persists in a 20Gi PVC:

  • /app/server/storage - Documents, vector DB, config

# Check PVC
kubectl get pvc -n anythingllm
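
To see how much of the 20Gi is actually in use (assuming standard coreutils in the container):

# Disk usage of the persistent storage path inside the pod
kubectl exec -n anythingllm deployment/anythingllm -c anythingllm -- du -sh /app/server/storage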

Document Sync Architecture

The Problem

RAG systems require keeping indexed documents in sync with source data. Without proper sync:

  • Answers become stale as documentation updates
  • Users lose trust when AI gives outdated information
  • Manual re-indexing is tedious and error-prone

Industry Standard Patterns

Pattern         Description                                Complexity  When to Use
Webhook + Job   Git push triggers K8s Job to re-index      Medium      High-volume, CI/CD integrated
Scheduled Cron  Periodic full re-index                     Low         Low change frequency, simple setup
Sidecar Sync    Container pulls repo, app watches folder   Low         Git-based docs, K8s native
Event Stream    Kafka/message queue for changes            High        Enterprise, real-time requirements

Why We Chose: git-sync Sidecar

For tower-fleet documentation, the sidecar pattern is optimal because:

  1. K8s Native: Runs alongside AnythingLLM in same pod, no external dependencies
  2. Git-Based Source: tower-fleet docs are in GitHub, git-sync handles this natively
  3. Delta Sync: git-sync only fetches changed files, not full repo each time
  4. Simplicity: No webhooks, no external jobs, no message queues
  5. Proven: Same pattern used for OtterWiki documentation sync
  6. Self-Healing: If sync fails, it retries automatically

Trade-offs accepted:

  • 5-minute sync delay (acceptable for documentation)
  • emptyDir volume is ephemeral (docs re-sync on pod restart, ~10 seconds)

Alternatives Considered

Alternative            Why Not Chosen
Webhook + Job          Overkill for ~160 docs, adds complexity
Manual Upload          Doesn't scale, easy to forget
Mounted PVC with cron  Requires separate cronjob, more moving parts
GitHub Actions         External dependency, network issues affect sync

How It Works

┌─────────────────────────────────────────────────────────────┐
│  Every 5 minutes:                                           │
│                                                             │
│  1. git-sync checks GitHub for changes                      │
│  2. If changes detected, pulls only changed files           │
│  3. Updates /synced-docs/current symlink atomically         │
│  4. AnythingLLM folder watch detects changes                │
│  5. Changed documents re-indexed automatically              │
└─────────────────────────────────────────────────────────────┘
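
The atomic handoff in step 3 is visible directly: git-sync checks each sync out into its own directory and flips the current symlink to point at it.

# current should be a symlink into the latest checkout
kubectl exec -n anythingllm deployment/anythingllm -c anythingllm -- ls -la /synced-docs/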

Configuration

git-sync sidecar settings:

  • Repository: git@github.com:jakecelentano/tower-fleet.git
  • Sync period: 300 seconds (5 minutes)
  • Mount path: /synced-docs
  • Docs available at: /synced-docs/current/docs/
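
The real manifest lives under /root/tower-fleet/manifests/apps/anythingllm/; as a rough sketch, the sidecar portion looks something like this (env names assume git-sync v4; the image tag and SSH secret wiring are illustrative):

  - name: git-sync
    image: registry.k8s.io/git-sync/git-sync:v4  # illustrative tag
    env:
      - name: GITSYNC_REPO
        value: git@github.com:jakecelentano/tower-fleet.git
      - name: GITSYNC_PERIOD
        value: "300s"          # 5-minute sync period
      - name: GITSYNC_ROOT
        value: /synced-docs
      - name: GITSYNC_LINK
        value: current         # yields /synced-docs/current
      # SSH deploy-key auth omitted; see the manifest for the secret mount
    volumeMounts:
      - name: synced-docs
        mountPath: /synced-docs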

AnythingLLM folder watch: configure in the UI under Settings > Data Connectors > Watch Folder:

  • Path: /synced-docs/current/docs
  • This auto-indexes new/changed documents

Verify Sync is Working

# Check git-sync logs
kubectl logs -n anythingllm deployment/anythingllm -c git-sync --tail=20

# Verify docs are synced
kubectl exec -n anythingllm deployment/anythingllm -c anythingllm -- ls /synced-docs/current/docs/

# Check last sync time
kubectl logs -n anythingllm deployment/anythingllm -c git-sync | grep "updated successfully"

Operations

View Logs

# AnythingLLM main container
kubectl logs -f deployment/anythingllm -n anythingllm -c anythingllm

# git-sync sidecar
kubectl logs -f deployment/anythingllm -n anythingllm -c git-sync

# Both containers
kubectl logs -f deployment/anythingllm -n anythingllm --all-containers

Restart

kubectl rollout restart deployment/anythingllm -n anythingllm

Update Image

kubectl set image deployment/anythingllm anythingllm=mintplexlabs/anythingllm:latest -n anythingllm
kubectl rollout status deployment/anythingllm -n anythingllm

Check llama.cpp Connectivity

# From Proxmox host
llama-model status

# From a debug pod in K8s
kubectl run -it --rm debug --image=curlimages/curl -n anythingllm -- curl http://10.89.97.100:8080/v1/models

Troubleshooting

Redirected to /onboarding Every Time

Cause: AnythingLLM requires completing the full onboarding flow once, including setting a password. This creates an admin user in the database. Without this user, the app redirects to onboarding on every visit.

Fix: Complete the onboarding flow once:

  1. Go through the setup wizard
  2. Set a password when prompted (required even with Authentik forward auth)
  3. Complete the LLM/embedding configuration

After this, sessions persist normally.

Why this happens: AnythingLLM doesn't have true header-based SSO (GitHub Issue #696). Authentik forward auth gates access but AnythingLLM still manages its own user/session state internally.

Note: The password you set isn't used for login (Authentik handles that) - it just satisfies AnythingLLM's requirement for a configured admin user.

401 Unauthorized on Access

Authentik forward auth is not configured. Check:

  1. Proxy Provider exists with correct external host
  2. Application is created and linked to provider
  3. Application is assigned to embedded outpost
  4. User is in allowed group
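
A quick probe for item 3, assuming the embedded outpost's standard health endpoint is routed through the ingress for this host:

# Expect HTTP 204 if the outpost is serving ai.bogocat.com
curl -s -o /dev/null -w "%{http_code}\n" https://ai.bogocat.com/outpost.goauthentik.io/ping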

LLM Responses Slow

Check llama.cpp server on Windows PC:

llama-model status
ssh jakec@10.89.97.100 "nvidia-smi"  # Check VRAM usage

Embeddings Not Working

Ensure embedding model is loaded:

# Check current model
llama-model current

# If not embedding model, switch to it
llama-model embed-docs

Pod CrashLoopBackOff

Check logs and events:

kubectl logs deployment/anythingllm -n anythingllm
kubectl describe pod -n anythingllm -l app.kubernetes.io/name=anythingllm

Why AnythingLLM?

AnythingLLM was chosen over SurfSense because:

  1. Authentik Integration: Supports header-based auth for seamless SSO
  2. Stable Docker Image: Single image that works out of the box
  3. Multi-user: Workspace isolation for different document collections
  4. Active Development: Regular updates, responsive maintainers
  5. llama.cpp Support: Can use faster llama.cpp backend instead of Ollama

See SurfSense (deprecated) for why we moved away from the previous solution.

Resources

  • Source: https://github.com/Mintplex-Labs/anything-llm
  • Docs: https://docs.useanything.com/
  • Docker Hub: https://hub.docker.com/r/mintplexlabs/anythingllm