AnythingLLM - RAG Platform¶
AnythingLLM is a full-stack RAG (Retrieval-Augmented Generation) platform that enables intelligent document search and AI-powered conversations with your data. It replaces SurfSense as our self-hosted alternative to Glean, NotebookLM, and Perplexity.
Overview¶
| Property | Value |
|---|---|
| URL | https://ai.bogocat.com |
| Namespace | anythingllm |
| Auth | Authentik forward auth |
| Source | GitHub - Mintplex-Labs/anything-llm |
Architecture¶
┌──────────────────────────────────────────────────────────────────────┐
│ K8s Cluster │
│ ┌────────────────────────────────────────────────────────────────┐ │
│ │ Namespace: anythingllm │ │
│ │ │ │
│ │ ┌─────────────────────────────────────────────────────────┐ │ │
│ │ │ Pod: anythingllm │ │ │
│ │ │ ┌─────────────────┐ ┌────────────────────────────┐ │ │ │
│ │ │ │ AnythingLLM │ │ git-sync (sidecar) │ │ │ │
│ │ │ │ Port: 3001 │ │ Syncs tower-fleet/docs │ │ │ │
│ │ │ │ │ │ every 5 minutes │ │ │ │
│ │ │ └────────┬────────┘ └──────────┬─────────────────┘ │ │ │
│ │ │ │ │ │ │ │
│ │ │ └────────┬───────────────┘ │ │ │
│ │ │ ▼ │ │ │
│ │ │ ┌─────────────────────┐ │ │ │
│ │ │ │ emptyDir volume │ /synced-docs │ │ │
│ │ │ │ (shared docs) │ tower-fleet docs │ │ │
│ │ │ └─────────────────────┘ │ │ │
│ │ └─────────────────────────────────────────────────────────┘ │ │
│ │ │ │
│ │ ┌─────────────────────┐ │ │
│ │ │ PVC (20Gi) │ Longhorn - persistent storage │ │
│ │ │ /app/server/storage│ Vector DB, embeddings, config │ │
│ │ └─────────────────────┘ │ │
│ └────────────────────────────────────────────────────────────────┘ │
└──────────────────────────────────────────────────────────────────────┘
│
┌─────────────┼─────────────┐
▼ ▼ ▼
┌────────────────────┐ ┌──────────┐ ┌──────────────────┐
│ Windows PC │ │ GitHub │ │ Authentik │
│ 10.89.97.100:8080 │ │ tower- │ │ auth.bogocat.com │
│ llama.cpp server │ │ fleet │ │ Forward auth │
│ RTX 3080 (10GB) │ │ repo │ │ │
│ - Qwen3-8B (chat) │ │ │ │ │
│ - nomic-embed-text │ │ │ │ │
└────────────────────┘ └──────────┘ └──────────────────┘
Features¶
- Multi-user Workspaces: Isolate documents and conversations by workspace
- Header-based Auth: Works with Authentik forward auth (no separate login)
- Multiple LLM Backends: Ollama, OpenAI, Anthropic, Mistral, llama.cpp, and more
- Hybrid Search: Vector embeddings + keyword search
- Document Processing: PDF, DOCX, TXT, and web pages
- Chat Interface: Ask questions about your documents
- API Access: REST API for programmatic access
Deployment¶
K8s Manifests¶
Location: /root/tower-fleet/manifests/apps/anythingllm/
# Deploy
/root/tower-fleet/scripts/deploy-anythingllm.sh
# Check status
kubectl get pods -n anythingllm
kubectl get ingress -n anythingllm
Authentik Setup¶
Prerequisites:

- Create Proxy Provider in Authentik:
    - Name: anythingllm-provider
    - Authorization flow: default-provider-authorization-implicit-consent
    - External host: https://ai.bogocat.com
    - Mode: Forward auth (single application)
- Create Application:
    - Name: AnythingLLM
    - Slug: anythingllm
    - Provider: anythingllm-provider
    - Bind to an appropriate group (e.g., authentik Admins)
- Assign to Embedded Outpost:
    - Go to Applications > Outposts > authentik Embedded Outpost
    - Add the AnythingLLM application
Traffic Flow¶
Internet → VPS Caddy → K8s Ingress → Authentik auth check → AnythingLLM
ai.bogocat.com /outpost.goauthentik.io/auth/nginx
Configuration¶
LLM Backend (Chat)¶
Configure via the AnythingLLM web UI after first login:
- Navigate to Settings > LLM Preference
- Select provider: Generic OpenAI
- Base URL: http://10.89.97.100:8080/v1
- API Key: not-needed (any non-empty string)
- Model: local (the name is ignored; the server uses whichever model is loaded)
Note: This uses llama.cpp server running on the Windows PC. Default model is Qwen3-8B.
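The same settings can also be pinned via environment variables on the AnythingLLM container instead of the UI. A sketch, assuming the variable names from upstream's .env.example; verify them against the deployed AnythingLLM version:

```shell
# Hypothetical env fragment for the AnythingLLM container.
# Names follow upstream's .env.example; confirm against the deployed version.
LLM_PROVIDER='generic-openai'
GENERIC_OPEN_AI_BASE_PATH='http://10.89.97.100:8080/v1'
GENERIC_OPEN_AI_API_KEY='not-needed'
GENERIC_OPEN_AI_MODEL_PREF='local'
```

UI changes made after first boot are stored in the PVC, so env vars mainly help keep a fresh deployment reproducible.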
Embedding Model¶
Embeddings require switching models since llama.cpp loads one model at a time.
Setup:
1. Navigate to Settings > Embedding Preference
2. Select: Generic OpenAI
3. Base URL: http://10.89.97.100:8080/v1
4. API Key: not-needed
5. Model: nomic-embed-text-v1.5
Workflow for re-indexing documents:
# From Proxmox host
llama-model embed-docs
# This swaps to nomic-embed-text, waits for you to trigger indexing,
# then restores the chat model when you press ENTER
See Local LLM Setup for details.
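The swap/wait/restore pattern behind llama-model embed-docs can be sketched roughly as follows. load_model is a hypothetical stand-in; the real script drives the llama.cpp server on the Windows PC:

```shell
# Rough sketch of the swap/wait/restore pattern. load_model is a stand-in;
# the real llama-model script talks to the llama.cpp server on the Windows PC.
load_model() { CURRENT_MODEL="$1"; echo "loaded: $1"; }

CHAT_MODEL="Qwen3-8B"
EMBED_MODEL="nomic-embed-text-v1.5"

load_model "$EMBED_MODEL"                  # swap to the embedding model
printf 'Trigger indexing in AnythingLLM, then press ENTER: '
read -r _ || true                          # wait for the user (EOF also proceeds)
load_model "$CHAT_MODEL"                   # restore the chat model
```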
Storage¶
Data persists in a 20Gi PVC:
- /app/server/storage - Documents, vector DB, config
Document Sync Architecture¶
The Problem¶
RAG systems require keeping indexed documents in sync with source data. Without proper sync:

- Answers become stale as documentation updates
- Users lose trust when the AI gives outdated information
- Manual re-indexing is tedious and error-prone
Industry Standard Patterns¶
| Pattern | Description | Complexity | When to Use |
|---|---|---|---|
| Webhook + Job | Git push triggers K8s Job to re-index | Medium | High-volume, CI/CD integrated |
| Scheduled Cron | Periodic full re-index | Low | Low change frequency, simple setup |
| Sidecar Sync | Container pulls repo, app watches folder | Low | Git-based docs, K8s native |
| Event Stream | Kafka/message queue for changes | High | Enterprise, real-time requirements |
Why We Chose: git-sync Sidecar¶
For tower-fleet documentation, the sidecar pattern is optimal because:
- K8s Native: Runs alongside AnythingLLM in same pod, no external dependencies
- Git-Based Source: tower-fleet docs are in GitHub, git-sync handles this natively
- Delta Sync: git-sync only fetches changed files, not full repo each time
- Simplicity: No webhooks, no external jobs, no message queues
- Proven: Same pattern used for OtterWiki documentation sync
- Self-Healing: If sync fails, it retries automatically
Trade-offs accepted:

- 5-minute sync delay (acceptable for documentation)
- emptyDir volume is ephemeral (docs re-sync on pod restart, ~10 seconds)
Alternatives Considered¶
| Alternative | Why Not Chosen |
|---|---|
| Webhook + Job | Overkill for ~160 docs, adds complexity |
| Manual Upload | Doesn't scale, easy to forget |
| Mounted PVC with cron | Requires separate cronjob, more moving parts |
| GitHub Actions | External dependency, network issues affect sync |
How It Works¶
┌─────────────────────────────────────────────────────────────┐
│ Every 5 minutes: │
│ │
│ 1. git-sync checks GitHub for changes │
│ 2. If changes detected, pulls only changed files │
│ 3. Updates /synced-docs/current symlink atomically │
│ 4. AnythingLLM folder watch detects changes │
│ 5. Changed documents re-indexed automatically │
└─────────────────────────────────────────────────────────────┘
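Step 3's atomic symlink update is the key trick: readers never see a half-written tree. A minimal local sketch of the idea, using illustrative paths rather than git-sync's actual internals:

```shell
# Minimal sketch of atomic publish via symlink swap
# (illustrative paths, not git-sync's actual internals).
root=$(mktemp -d)
mkdir -p "$root/rev-abc123/docs"
echo "# index" > "$root/rev-abc123/docs/index.md"

# Publish: create the symlink under a temp name, then rename it over "current".
# rename(2) is atomic, so readers see either the old tree or the new one.
ln -s "rev-abc123" "$root/current.tmp"
mv -T "$root/current.tmp" "$root/current"

readlink "$root/current"              # → rev-abc123
cat "$root/current/docs/index.md"
```

This is why AnythingLLM's folder watch can safely point at /synced-docs/current: the path flips to the new revision in a single step.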
Configuration¶
git-sync sidecar settings:
- Repository: git@github.com:jakecelentano/tower-fleet.git
- Sync period: 300 seconds (5 minutes)
- Mount path: /synced-docs
- Docs available at: /synced-docs/current/docs/
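Expressed as the sidecar's command line, those settings roughly correspond to the following. Flag names assume git-sync v4; check the deployed image's version, since v3 used different flags:

```shell
# Approximate git-sync v4 invocation matching the settings above
# (verify flag names against the deployed git-sync version; v3 differs).
/git-sync \
  --repo=git@github.com:jakecelentano/tower-fleet.git \
  --period=300s \
  --root=/synced-docs \
  --link=current
```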
AnythingLLM folder watch:
Configure in UI: Settings > Data Connectors > Watch Folder
- Path: /synced-docs/current/docs
- This auto-indexes new/changed documents
Verify Sync is Working¶
# Check git-sync logs
kubectl logs -n anythingllm deployment/anythingllm -c git-sync --tail=20
# Verify docs are synced
kubectl exec -n anythingllm deployment/anythingllm -c anythingllm -- ls /synced-docs/current/docs/
# Check last sync time
kubectl logs -n anythingllm deployment/anythingllm -c git-sync | grep "updated successfully"
Operations¶
View Logs¶
# AnythingLLM main container
kubectl logs -f deployment/anythingllm -n anythingllm -c anythingllm
# git-sync sidecar
kubectl logs -f deployment/anythingllm -n anythingllm -c git-sync
# Both containers
kubectl logs -f deployment/anythingllm -n anythingllm --all-containers
Restart¶
kubectl rollout restart deployment/anythingllm -n anythingllm
Update Image¶
kubectl set image deployment/anythingllm anythingllm=mintplexlabs/anythingllm:latest -n anythingllm
kubectl rollout status deployment/anythingllm -n anythingllm
Check llama.cpp Connectivity¶
# From Proxmox host
llama-model status
# From a debug pod in K8s
kubectl run -it --rm debug --image=curlimages/curl -n anythingllm -- curl http://10.89.97.100:8080/v1/models
Troubleshooting¶
Redirected to /onboarding Every Time¶
Cause: AnythingLLM requires completing the full onboarding flow once, including setting a password. This creates an admin user in the database. Without this user, the app redirects to onboarding on every visit.
Fix: Complete the onboarding flow once:

1. Go through the setup wizard
2. Set a password when prompted (required even with Authentik forward auth)
3. Complete the LLM/embedding configuration
4. After this, sessions persist normally
Why this happens: AnythingLLM doesn't have true header-based SSO (GitHub Issue #696). Authentik forward auth gates access but AnythingLLM still manages its own user/session state internally.
Note: The password you set isn't used for login (Authentik handles that); it just satisfies AnythingLLM's requirement for a configured admin user.
401 Unauthorized on Access¶
Authentik forward auth is not configured. Check:
- Proxy Provider exists with correct external host
- Application is created and linked to provider
- Application is assigned to embedded outpost
- User is in allowed group
LLM Responses Slow¶
Check the llama.cpp server on the Windows PC using the commands under "Check llama.cpp Connectivity" above.
Embeddings Not Working¶
Ensure embedding model is loaded:
# Check current model
llama-model current
# If not embedding model, switch to it
llama-model embed-docs
Pod CrashLoopBackOff¶
Check logs and events:
kubectl logs deployment/anythingllm -n anythingllm
kubectl describe pod -n anythingllm -l app.kubernetes.io/name=anythingllm
Why AnythingLLM?¶
AnythingLLM was chosen over SurfSense because:
- Authentik Integration: Supports header-based auth for seamless SSO
- Stable Docker Image: Single image that works out of the box
- Multi-user: Workspace isolation for different document collections
- Active Development: Regular updates, responsive maintainers
- llama.cpp Support: Can use faster llama.cpp backend instead of Ollama
See SurfSense (deprecated) for why we moved away from the previous solution.
Resources¶
- Source: https://github.com/Mintplex-Labs/anything-llm
- Docs: https://docs.useanything.com/
- Docker Hub: https://hub.docker.com/r/mintplexlabs/anythingllm