Book Downloader & Calibre¶
Overview¶
The book downloader is an IRC-based automation system for acquiring and managing ebooks. It provides a web interface for searching IRC channels, downloading books via DCC, and automatically importing them into a Calibre library.
Current Status: Running on LXC 122 (calibre-web) at http://10.89.97.85:8084
GitHub: Not yet in version control (TODO)
Architecture¶
Current Implementation (LXC 122)¶
┌─────────────────────────────────────────┐
│ LXC 122: calibre-web (10.89.97.85)      │
├─────────────────────────────────────────┤
│ Services:                               │
│   - book-downloader.service (port 8084) │
│   - calibre-web.service (port 8083)     │
│                                         │
│ Components:                             │
│   1. Node.js/Express Web UI             │
│      /opt/book-downloader/server.js     │
│      - Search interface                 │
│      - Download management              │
│      - Auto-import watcher              │
│                                         │
│   2. IRC Automation Scripts             │
│      /root/irc-scripts/                 │
│      - irssi (IRC client)               │
│      - tmux (session management)        │
│      - Bash automation workflows        │
│                                         │
│   3. Calibre CLI                        │
│      - Library management               │
│      - Auto-import via calibredb        │
└─────────────────────────────────────────┘
                     │
                     ▼
Shared Storage (LXC 101):
  - /mnt/media/books_load (downloads)
  - /mnt/media/calibre (library)
Data Flow¶
- Search: User enters query → Web UI sends IRC @search command
- Results: IRC bots respond → Results parsed and displayed with numbers
- Selection: User selects books → DCC download requests sent to IRC
- Download: Books downloaded to /mnt/media/books_load
- Import: File watcher detects new files → Auto-imports to Calibre
- Cleanup: Successful imports trigger file deletion
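The Import and Cleanup steps can be sketched as follows. This is a minimal illustration, not the code in server.js: the function names and the extension allowlist are hypothetical, and only the calibredb add invocation with --library-path is taken from this page. The allowlist matters because the staging directory also holds search-results.txt, search-results.dat, and selected.txt, which must not be imported.

```javascript
const { execFile } = require('child_process');
const fs = require('fs/promises');
const path = require('path');

// Hypothetical allowlist; the real server.js may accept other formats.
const IMPORTABLE_EXTENSIONS = new Set(['.epub', '.mobi', '.azw3', '.pdf']);

// True when the file name carries an extension we are willing to import.
function isImportable(filePath) {
  return IMPORTABLE_EXTENSIONS.has(path.extname(filePath).toLowerCase());
}

// Build the calibredb argument vector (no shell is involved).
function calibredbArgs(filePath, libraryPath) {
  return ['add', filePath, `--library-path=${libraryPath}`];
}

// Default runner: resolves with stdout on exit code 0, rejects otherwise.
function runCalibredb(args) {
  return new Promise((resolve, reject) => {
    execFile('calibredb', args, (error, stdout) => (error ? reject(error) : resolve(stdout)));
  });
}

// Import one file and delete the staged copy only after a successful import.
// `run` and `remove` are injectable so the flow can be exercised without Calibre.
async function importAndCleanup(filePath, libraryPath, run = runCalibredb, remove = fs.unlink) {
  if (!isImportable(filePath)) return 'skipped';
  await run(calibredbArgs(filePath, libraryPath));
  await remove(filePath);
  return 'imported';
}
```

A watcher callback (for example a chokidar 'add' handler) would call importAndCleanup(path, '/mnt/media/calibre'). Because the delete happens after the awaited import, a failed import leaves the file in the staging area for a manual retry.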
Key Files¶
LXC 122 Filesystem:
/opt/book-downloader/
├── server.js # Main Express application
├── public/
│ ├── index.html # Web UI
│ └── style.css
└── package.json
/root/irc-scripts/
├── book-finder.sh # Complete workflow (search → select → download)
├── search-and-capture.sh # IRC search and result parsing
├── browse-and-select.sh # Interactive selection UI
├── download-books.sh # DCC download automation
├── irc-session-manager.sh # Persistent IRC session management
└── monitor-downloads.sh # File monitoring utility
/etc/systemd/system/
└── book-downloader.service # Systemd service definition
/mnt/media/books_load/ # Download staging area
├── search-results.txt # Human-readable results
├── search-results.dat # Machine-readable data
└── selected.txt # User selections
Dependencies¶
System:
- Node.js 20
- irssi (IRC client)
- tmux (session manager)
- Calibre CLI (calibredb)
NPM:
- express ^4.18.2
- chokidar ^3.5.3 (file watching)
Current Issues & Limitations¶
✅ Status: Working¶
The service is currently operational. Previous logs showed restart failures (EADDRINUSE errors) due to systemd restart loops, but the service successfully started and is now stable.
Verification:
# Check service status
pct exec 122 -- systemctl status book-downloader
# Test health endpoint
curl http://10.89.97.85:8084/api/health
# View recent activity
pct exec 122 -- journalctl -u book-downloader --since "1 hour ago"
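The /api/health endpoint queried above is also the target of the liveness and readiness probes in the Kubernetes plan, so it is worth having it fail when the storage mounts are unusable. The sketch below is one possible shape, not the existing handler: the function names and the response format are assumptions.

```javascript
const fs = require('fs');

// Report whether a path exists, is a directory, and is writable by this process.
function isWritableDir(dir) {
  try {
    fs.accessSync(dir, fs.constants.W_OK | fs.constants.X_OK);
    return fs.statSync(dir).isDirectory();
  } catch {
    return false;
  }
}

// Combine individual boolean checks into an HTTP status and a JSON body.
function buildHealthReport(checks) {
  const healthy = Object.values(checks).every(Boolean);
  return {
    httpStatus: healthy ? 200 : 503,
    body: { status: healthy ? 'ok' : 'degraded', checks },
  };
}

// Express-style handler factory; the directory paths are supplied by the caller.
function healthHandler({ downloadDir, calibreLibrary }) {
  return (req, res) => {
    const report = buildHealthReport({
      downloadDir: isWritableDir(downloadDir),
      calibreLibrary: isWritableDir(calibreLibrary),
    });
    res.status(report.httpStatus).json(report.body);
  };
}
```

It could be mounted with app.get('/api/health', healthHandler({ downloadDir: '/mnt/media/books_load', calibreLibrary: '/mnt/media/calibre' })). A check of the IRC session could be added as another boolean entry.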
Known Limitations¶
- Single Instance: IRC session requires exactly one process (cannot scale)
- Port Forwarding: DCC transfers need ports 1024-65535 accessible
- No Input Validation: Search queries passed directly to shell (security risk)
- Hardcoded Paths: Configuration not environment-aware
- No Authentication: Web UI is completely open
Security Concerns¶
⚠️ CRITICAL VULNERABILITIES:
Command Injection (server.js:95):
const searchCmd = `${CONFIG.scriptDir}/search-and-capture.sh "${query.replace(/"/g, '\\"')}"`;
await execCommand(searchCmd, { timeout: 60000 });
File Operations:
- Downloads arbitrary files from IRC (untrusted source)
- No file type validation beyond extension
- Auto-imports without scanning
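One inexpensive way to go beyond the extension check is to inspect the leading bytes of each download before import. The sketch below is illustrative only and is not a malware scan; the function names are hypothetical. It relies on two format facts: PDF files start with "%PDF-", and a well-formed EPUB is a ZIP archive whose first entry is an uncompressed file named mimetype containing application/epub+zip.

```javascript
const fs = require('fs');

// Classify a file by its leading bytes rather than its extension.
// Returns 'epub', 'zip', 'pdf', or 'unknown'.
function sniffFileType(header) {
  // PDF files begin with the ASCII marker "%PDF-".
  if (header.subarray(0, 5).toString('latin1') === '%PDF-') return 'pdf';

  // ZIP local file header signature: "PK\x03\x04".
  const isZip = header.length >= 4 &&
    header[0] === 0x50 && header[1] === 0x4b && header[2] === 0x03 && header[3] === 0x04;
  if (!isZip) return 'unknown';

  // The ZIP local header is 30 bytes, so in a well-formed EPUB the entry name
  // "mimetype" sits at offset 30 and its content starts at offset 38.
  const name = header.subarray(30, 38).toString('latin1');
  const mime = header.subarray(38, 58).toString('latin1');
  return name === 'mimetype' && mime === 'application/epub+zip' ? 'epub' : 'zip';
}

// Read only the first 64 bytes of a file and classify it.
function sniffFile(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(64);
    const bytesRead = fs.readSync(fd, header, 0, header.length, 0);
    return sniffFileType(header.subarray(0, bytesRead));
  } finally {
    fs.closeSync(fd);
  }
}
```

A file named book.epub for which sniffFile returns 'unknown' (for instance a renamed executable) can then be quarantined instead of imported.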
Recommendations:
- Use child_process.spawn() with argument array instead of string interpolation
- Validate search queries against allowlist
- Add file scanning before import
- Implement authentication (Authentik forward auth)
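The first two recommendations can be sketched together. Escaping only double quotes is not enough, because the shell still expands $(...) and backticks inside a double-quoted string. Passing the query as a separate argv entry avoids the shell entirely. The code below is a sketch under assumptions: validateQuery, runSearch, and the allowlist pattern are hypothetical, and the pattern should be adjusted to what the IRC search bots accept.

```javascript
const { spawn } = require('child_process');

// Hypothetical allowlist: starts with a letter or digit, then letters, digits,
// spaces, and a little punctuation, 2 to 100 characters in total.
const QUERY_PATTERN = /^[\p{L}\p{N}][\p{L}\p{N} .,'&:-]{1,99}$/u;

// Return the trimmed query, or throw if it falls outside the allowlist.
function validateQuery(raw) {
  if (typeof raw !== 'string') throw new TypeError('query must be a string');
  const query = raw.trim();
  if (!QUERY_PATTERN.test(query)) throw new RangeError('query contains unsupported characters');
  return query;
}

// Run search-and-capture.sh with the query as a single argv entry.
// No shell parses the string, so $(...), backticks, and ; are inert.
function runSearch(scriptDir, rawQuery, timeoutMs = 60000) {
  const query = validateQuery(rawQuery);
  return new Promise((resolve, reject) => {
    const child = spawn(`${scriptDir}/search-and-capture.sh`, [query], {
      shell: false,
      timeout: timeoutMs,
      stdio: ['ignore', 'pipe', 'inherit'],
    });
    let stdout = '';
    child.stdout.on('data', (chunk) => { stdout += chunk; });
    child.on('error', reject);
    child.on('close', (code) =>
      (code === 0 ? resolve(stdout) : reject(new Error(`search exited with code ${code}`))));
  });
}
```

The script itself still has to quote its positional parameter ("$1") wherever it uses it; otherwise word splitting inside Bash reintroduces part of the problem.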
Migration to Kubernetes¶
Decision: Standalone App or Shared Service?¶
RECOMMENDED: Standalone Kubernetes App
Rationale:
- ✅ Single purpose (book acquisition workflow)
- ✅ Independent data (doesn't integrate with other apps)
- ✅ Different tech stack (Bash + IRC vs Next.js)
- ✅ Security isolation (should be sandboxed from main apps)
- ✅ Optional service (not critical infrastructure)
Integration Strategy:
- Calibre-Web can run as sidecar container or separate deployment
- Expose book catalog via API for home-portal integration
- Link from home-portal dashboard as external service
Architecture: StatefulSet with Sidecar¶
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: book-downloader
  namespace: media
spec:
  serviceName: book-downloader
  replicas: 1  # MUST be 1 (IRC session state)
  # Required by the API: the selector must match the pod template labels
  selector:
    matchLabels:
      app: book-downloader
  template:
    metadata:
      labels:
        app: book-downloader
    spec:
      # REQUIRED: DCC transfers need direct network access
      hostNetwork: true
      containers:
        # Container 1: Book Downloader UI + IRC
        - name: downloader
          image: registry.home.internal/book-downloader:latest
          ports:
            - containerPort: 8084
              name: http
          env:
            - name: IRC_SCRIPTS_DIR
              value: /scripts
            - name: DOWNLOAD_DIR
              value: /data/books_load
            - name: CALIBRE_LIBRARY
              value: /data/calibre
          volumeMounts:
            - name: books-load
              mountPath: /data/books_load
            - name: calibre-library
              mountPath: /data/calibre
            - name: irc-state
              mountPath: /root/.irssi
            - name: irc-scripts
              mountPath: /scripts
              readOnly: true
          livenessProbe:
            httpGet:
              path: /api/health
              port: 8084
            initialDelaySeconds: 30
            periodSeconds: 60
          readinessProbe:
            httpGet:
              path: /api/health
              port: 8084
            initialDelaySeconds: 10
            periodSeconds: 30
        # Container 2: Calibre-Web (Optional)
        - name: calibre-web
          image: lscr.io/linuxserver/calibre-web:latest
          ports:
            - containerPort: 8083
              name: web
          env:
            - name: PUID
              value: "1000"
            - name: PGID
              value: "1000"
          volumeMounts:
            - name: calibre-library
              mountPath: /books
            - name: calibre-config
              mountPath: /config
      volumes:
        - name: irc-scripts
          configMap:
            name: irc-scripts
            defaultMode: 0755
        - name: books-load
          persistentVolumeClaim:
            claimName: books-load-pvc
        - name: calibre-library
          persistentVolumeClaim:
            claimName: calibre-library-pvc
  # Persistent state for IRC session
  volumeClaimTemplates:
    - metadata:
        name: irc-state
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: local-path
        resources:
          requests:
            storage: 1Gi
    - metadata:
        name: calibre-config
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: local-path
        resources:
          requests:
            storage: 2Gi
Storage Requirements¶
---
# books-load PVC (download staging)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: books-load-pvc
  namespace: media
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 20Gi
---
# calibre-library PVC (main library)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: calibre-library-pvc
  namespace: media
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 100Gi  # Adjust based on existing library size
ConfigMap for IRC Scripts¶
apiVersion: v1
kind: ConfigMap
metadata:
  name: irc-scripts
  namespace: media
data:
  book-finder.sh: |
    #!/bin/bash
    # Content from /root/irc-scripts/book-finder.sh
  search-and-capture.sh: |
    #!/bin/bash
    # Content from /root/irc-scripts/search-and-capture.sh
  # ... (add all other scripts)
Dockerfile¶
FROM node:20-bookworm
# Install IRC and Calibre dependencies
RUN apt-get update && apt-get install -y \
    irssi \
    tmux \
    calibre \
    bc \
    && rm -rf /var/lib/apt/lists/*
WORKDIR /app
# Copy application
COPY server.js package*.json ./
COPY public/ ./public/
# Install Node dependencies (--omit=dev replaces the deprecated --only=production)
RUN npm ci --omit=dev
# Create directories
RUN mkdir -p /data/books_load /data/calibre /root/.irssi
# Expose web UI port
EXPOSE 8084
# Health check
HEALTHCHECK --interval=60s --timeout=10s --start-period=30s --retries=3 \
    CMD node -e "require('http').get('http://localhost:8084/api/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1); })"
CMD ["node", "server.js"]
Migration Checklist¶
Phase 1: Preparation¶
- [ ] Backup existing Calibre library
- [ ] Export IRC scripts to tower-fleet
- [ ] Version control the application
Phase 2: Code Refactoring¶
- [ ] Add environment variable support (server.js:11-18)
- [ ] Fix command injection vulnerability (server.js:95)
- [ ] Improve health check endpoint (server.js:45-46)
- [ ] Add input validation middleware
- [ ] Update hardcoded IP addresses
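The environment variable item could take roughly the following shape. This is a sketch, not the current contents of server.js:11-18. The variable names IRC_SCRIPTS_DIR, DOWNLOAD_DIR, and CALIBRE_LIBRARY match the StatefulSet manifest on this page; PORT and the config keys other than scriptDir and selectionFile are assumptions. The defaults are the current LXC paths, so the same code would run unchanged on LXC 122.

```javascript
// Build the runtime configuration from environment variables, falling back
// to the current LXC paths when a variable is not set.
function loadConfig(env = process.env) {
  // Number() rejects trailing garbage such as "8084abc" (yields NaN).
  const port = Number(env.PORT ?? 8084);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid PORT: ${env.PORT}`);
  }
  const downloadDir = env.DOWNLOAD_DIR ?? '/mnt/media/books_load';
  return Object.freeze({
    port,
    scriptDir: env.IRC_SCRIPTS_DIR ?? '/root/irc-scripts',
    downloadDir,
    calibreLibrary: env.CALIBRE_LIBRARY ?? '/mnt/media/calibre',
    // selected.txt lives in the download staging area.
    selectionFile: `${downloadDir}/selected.txt`,
  });
}
```

Calling const CONFIG = loadConfig() once at startup keeps the rest of the code free of direct process.env lookups, and freezing the object guards against accidental mutation at runtime.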
Phase 3: Containerization¶
- [ ] Create Dockerfile
- [ ] Build and test locally
- [ ] Push to registry
Phase 4: Kubernetes Deployment¶
- [ ] Create namespace
- [ ] Create ConfigMap from IRC scripts
- [ ] Create PVCs for storage
- [ ] Deploy StatefulSet
- [ ] Create Services and Ingress
- [ ] Add Authentik forward auth (recommended)
Phase 5: Data Migration¶
- [ ] Copy Calibre library to PVC
- [ ] Verify library integrity in Calibre-Web
Phase 6: Testing¶
- [ ] Test search functionality
- [ ] Test download workflow
- [ ] Test auto-import
- [ ] Verify IRC session persistence across pod restarts
- [ ] Test DCC transfers work with hostNetwork
Phase 7: Cutover¶
- [ ] Update DNS or Ingress to point to k8s service
- [ ] Monitor for 24 hours
- [ ] Stop LXC 122 service
- [ ] Document rollback procedure
Critical Migration Considerations¶
1. IRC Session State¶
Challenge: IRC uses persistent tmux sessions with authentication state
Solutions:
- StatefulSet ensures stable pod identity and persistent volumes
- irc-state PVC preserves /root/.irssi/ config and logs
- Init container can auto-connect to IRC on first startup
Alternative: Use IRC bouncer (ZNC) as separate service for better resilience
2. DCC File Transfers¶
Challenge: DCC requires inbound connections on ports 1024-65535
Solutions:
- hostNetwork: true - Simplest, same as LXC (RECOMMENDED)
- NodePort - Expose DCC port range (complex, not recommended)
- Protocol change - Use XDCC-only without DCC (requires IRC script changes)
Testing DCC:
3. Security Hardening¶
Required before k8s deployment:
- Fix command injection: replace the interpolated shell string with child_process.spawn() and an argument array (see Security Concerns)
- Add NetworkPolicy:
  apiVersion: networking.k8s.io/v1
  kind: NetworkPolicy
  metadata:
    name: book-downloader-policy
  spec:
    podSelector:
      matchLabels:
        app: book-downloader
    policyTypes:
      - Ingress
      - Egress
    ingress:
      - from:
          - namespaceSelector:
              matchLabels:
                name: ingress-nginx  # Only allow ingress
    egress:
      - to:
          - podSelector: {}  # Allow internal
      - to:
          - namespaceSelector: {}
        ports:
          - protocol: TCP
            port: 53  # DNS
          - protocol: UDP
            port: 53
      - to:
          - ipBlock:
              cidr: 0.0.0.0/0  # IRC servers are outside the cluster
        ports:
          - protocol: TCP
            port: 6667  # IRC
- Add authentication via Authentik:
  - See: /root/tower-fleet/docs/infrastructure/authentik-forward-auth.md
  - Create application in Authentik
  - Add middleware to Ingress
4. Calibre Library Migration¶
Options:
A. Migrate existing library:
- Pros: Keep all metadata, covers, reading progress
- Cons: Large transfer (100GB+), potential corruption
- Recommended: Only if library is well-maintained
B. Start fresh library:
- Pros: Clean slate, better organization
- Cons: Lose metadata, need to re-import
- Recommended: If library has duplicate/corrupt entries
Migration command:
# Copy entire library
rsync -av --progress \
/vault/subvol-101-disk-0/media/calibre/ \
/path/to/pvc/mount/
Troubleshooting¶
Service Won't Start¶
Symptom: EADDRINUSE: address already in use 0.0.0.0:8084
Cause: Port conflict or previous process not cleaned up
Fix:
# Find process using port
pct exec 122 -- lsof -i :8084
# Kill stuck process
pct exec 122 -- kill -9 <PID>
# Restart service
pct exec 122 -- systemctl restart book-downloader
Download Fails with "Download failed" Error¶
Symptom: Search works correctly, but clicking "Download" shows "Download failed" error in web UI
Cause: Permission problem with the selected.txt file. In unprivileged LXC containers, files created by the IRC download scripts (running as nobody:nogroup) cannot be overwritten by the Node.js app (running as root), because the UID/GID mappings prevent modification.
Quick Fix:
# Delete the old selected.txt file
pct exec 122 -- rm -f /mnt/media/books_load/selected.txt
# Try download again from web UI - will work immediately
Permanent Code Fix: Update server.js around line 185:
// BEFORE:
await fs.writeFile(CONFIG.selectionFile, selectionContent);
// AFTER: Add retry with delete on permission error
try {
  await fs.writeFile(CONFIG.selectionFile, selectionContent);
} catch (error) {
  if (error.code === 'EACCES') {
    // Permission denied - delete old file and retry
    await fs.unlink(CONFIG.selectionFile).catch(() => {});
    await fs.writeFile(CONFIG.selectionFile, selectionContent);
  } else {
    throw error;
  }
}
Verification:
# Check logs for actual error
pct exec 122 -- journalctl -u book-downloader --since "5 minutes ago" | grep -i error
# Test download endpoint directly
curl -X POST http://10.89.97.85:8084/api/download \
-H "Content-Type: application/json" \
-d '{"selections": [1]}'
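The request body in the curl call above is also a natural place for the input validation middleware listed in Phase 2, since the selections end up in selected.txt and are consumed by the Bash download script. The sketch below assumes that selections are 1-based integers matching the numbered search results; parseSelections, selectionValidator, and getResultCount are hypothetical names.

```javascript
// Validate the JSON body of POST /api/download.
// Expects { "selections": [<positive integers>] }.
// `maxIndex` is the number of rows in the most recent search results.
function parseSelections(body, maxIndex) {
  const raw = body && body.selections;
  if (!Array.isArray(raw) || raw.length === 0) {
    throw new TypeError('selections must be a non-empty array');
  }
  const unique = [...new Set(raw)];
  for (const n of unique) {
    if (!Number.isInteger(n) || n < 1 || n > maxIndex) {
      throw new RangeError(`selection out of range: ${JSON.stringify(n)}`);
    }
  }
  // Deduplicated and sorted ascending for a stable selected.txt.
  return unique.sort((a, b) => a - b);
}

// Express-style middleware wrapper. Assumes express.json() ran earlier and
// that the caller supplies a function returning the current result count.
function selectionValidator(getResultCount) {
  return (req, res, next) => {
    try {
      req.selections = parseSelections(req.body, getResultCount());
      next();
    } catch (error) {
      res.status(400).json({ error: error.message });
    }
  };
}
```

Rejecting anything that is not an in-range integer means strings such as "1; rm -rf /" never reach the shell scripts, and a malformed request returns HTTP 400 with a specific message instead of the generic "Download failed".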
IRC Session Lost¶
Symptom: Search fails with "IRC not connected"
Fix:
# Check IRC session status
pct exec 122 -- /root/irc-scripts/irc-session-manager.sh status
# Restart IRC session
pct exec 122 -- /root/irc-scripts/irc-session-manager.sh restart
Auto-Import Not Working¶
Symptom: Books download but don't appear in Calibre
Debug:
# Check file watcher logs
pct exec 122 -- journalctl -u book-downloader | grep -i "detected\|import"
# Manually import
pct exec 122 -- calibredb add /mnt/media/books_load/book.epub \
--library-path=/mnt/media/calibre
DCC Transfers Fail¶
Symptom: Downloads timeout or never start
Fix:
# Check firewall
iptables -L | grep -i dcc
# Test from IRC client
/dcc get SearchBot test.epub
# Check DCC settings
/set dcc
Related Documentation¶
- Authentik Forward Auth: /root/tower-fleet/docs/infrastructure/authentik-forward-auth.md
- Docker Deployment: /root/tower-fleet/docs/workflows/docker-deployment.md
- Ingress Configuration: /root/tower-fleet/docs/reference/ingress-configuration.md
- Storage Strategy: /root/tower-fleet/docs/decisions/ADR-002-storage-strategy.md
Future Enhancements¶
Short-term¶
- [ ] Add authentication (Authentik)
- [ ] Fix security vulnerabilities
- [ ] Add Prometheus metrics
- [ ] Create backup automation
Long-term¶
- [ ] Replace IRC with library APIs (Library Genesis, Anna's Archive)
- [ ] Add book recommendation engine
- [ ] Integrate with reading tracker apps (Goodreads, etc.)
- [ ] Mobile app for book browsing
- [ ] OCR support for scanned PDFs
- [ ] Automated metadata enrichment
Last Updated: 2025-12-02
Status: LXC deployment (working), k8s migration (planned)