
Book Downloader & Calibre

Overview

The book downloader is an IRC-based automation system for acquiring and managing ebooks. It provides a web interface for searching IRC channels, downloading books via DCC, and automatically importing them into a Calibre library.

Current Status: Running on LXC 122 (calibre-web) at http://10.89.97.85:8084

GitHub: Not yet in version control (TODO)


Architecture

Current Implementation (LXC 122)

┌─────────────────────────────────────────┐
│  LXC 122: calibre-web (10.89.97.85)    │
├─────────────────────────────────────────┤
│  Services:                              │
│  - book-downloader.service (port 8084) │
│  - calibre-web.service (port 8083)     │
│                                         │
│  Components:                            │
│  1. Node.js/Express Web UI              │
│     /opt/book-downloader/server.js     │
│     - Search interface                  │
│     - Download management               │
│     - Auto-import watcher               │
│                                         │
│  2. IRC Automation Scripts              │
│     /root/irc-scripts/                  │
│     - irssi (IRC client)                │
│     - tmux (session management)         │
│     - Bash automation workflows         │
│                                         │
│  3. Calibre CLI                         │
│     - Library management                │
│     - Auto-import via calibredb         │
└─────────────────────────────────────────┘
   Shared Storage (LXC 101):
   - /mnt/media/books_load (downloads)
   - /mnt/media/calibre (library)

Data Flow

  1. Search: User enters query → Web UI sends IRC @search command
  2. Results: IRC bots respond → Results parsed and displayed with numbers
  3. Selection: User selects books → DCC download requests sent to IRC
  4. Download: Books downloaded to /mnt/media/books_load
  5. Import: File watcher detects new files → Auto-imports to Calibre
  6. Cleanup: Successful imports trigger file deletion
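
Steps 5 and 6 can be sketched roughly as follows. This is a minimal sketch, not the code in server.js: the function names, the extension list, and the use of execFile are assumptions made for illustration. Only the calibredb add command and the library path come from this document. In the running service, the chokidar watcher would call something like importAndCleanup for each new file.

```javascript
const { execFile } = require('child_process');
const fs = require('fs/promises');
const path = require('path');

// Library path taken from this document; everything else is illustrative.
const LIBRARY_PATH = '/mnt/media/calibre';

// Assumed set of ebook extensions worth importing (hypothetical list).
const EBOOK_EXTENSIONS = new Set(['.epub', '.mobi', '.azw3', '.pdf']);

// Returns true when the file name looks like an ebook.
function isImportable(fileName) {
  return EBOOK_EXTENSIONS.has(path.extname(fileName).toLowerCase());
}

// Builds the argument array for `calibredb add`; no shell is involved.
function buildImportArgs(filePath, libraryPath = LIBRARY_PATH) {
  return ['add', filePath, `--library-path=${libraryPath}`];
}

// Imports one file and deletes it only if calibredb exits successfully.
function importAndCleanup(filePath, libraryPath = LIBRARY_PATH) {
  return new Promise((resolve, reject) => {
    execFile('calibredb', buildImportArgs(filePath, libraryPath), (error) => {
      if (error) {
        // Import failed: keep the file in the staging area for inspection.
        reject(error);
        return;
      }
      fs.unlink(filePath).then(resolve, reject);
    });
  });
}
```

calibredb may skip duplicates without reporting a failure, so a stricter version might inspect its output before deleting the staged file.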

Key Files

LXC 122 Filesystem:

/opt/book-downloader/
├── server.js              # Main Express application
├── public/
│   ├── index.html         # Web UI
│   └── style.css
└── package.json

/root/irc-scripts/
├── book-finder.sh         # Complete workflow (search → select → download)
├── search-and-capture.sh  # IRC search and result parsing
├── browse-and-select.sh   # Interactive selection UI
├── download-books.sh      # DCC download automation
├── irc-session-manager.sh # Persistent IRC session management
└── monitor-downloads.sh   # File monitoring utility

/etc/systemd/system/
└── book-downloader.service  # Systemd service definition

/mnt/media/books_load/      # Download staging area
├── search-results.txt      # Human-readable results
├── search-results.dat      # Machine-readable data
└── selected.txt            # User selections

Dependencies

System:

  • Node.js 20
  • irssi (IRC client)
  • tmux (session manager)
  • Calibre CLI (calibredb)

NPM:

  • express ^4.18.2
  • chokidar ^3.5.3 (file watching)


Current Issues & Limitations

✅ Status: Working

The service is currently operational. Previous logs showed restart failures (EADDRINUSE errors) due to systemd restart loops, but the service successfully started and is now stable.

Verification:

# Check service status
pct exec 122 -- systemctl status book-downloader

# Test health endpoint
curl http://10.89.97.85:8084/api/health

# View recent activity
pct exec 122 -- journalctl -u book-downloader --since "1 hour ago"

Known Limitations

  1. Single Instance: IRC session requires exactly one process (cannot scale)
  2. Port Forwarding: DCC transfers need ports 1024-65535 accessible
  3. No Input Validation: Search queries passed directly to shell (security risk)
  4. Hardcoded Paths: Configuration not environment-aware
  5. No Authentication: Web UI is completely open

Security Concerns

⚠️ CRITICAL VULNERABILITIES:

Command Injection (server.js:95):

const searchCmd = `${CONFIG.scriptDir}/search-and-capture.sh "${query.replace(/"/g, '\\"')}"`;
await execCommand(searchCmd, { timeout: 60000 });

  • User input is interpolated directly into a shell command string
  • The escaping only handles double quotes; $(...), backticks, and backslashes are still interpreted inside the double-quoted argument
  • Risk: Arbitrary command execution
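
One way to take the shell out of the picture is sketched below, assuming the script accepts the query as its first argument. The allowlist pattern, the SCRIPT_DIR constant, and the function names are hypothetical. The real code would use CONFIG.scriptDir and whatever characters legitimate searches actually need.

```javascript
const { spawn } = require('child_process');
const path = require('path');

// Assumed location of the scripts; the real value lives in CONFIG.scriptDir.
const SCRIPT_DIR = '/root/irc-scripts';

// Hypothetical allowlist: must start with a letter or digit, then letters,
// digits, spaces, and a little punctuation, 200 characters at most.
const QUERY_PATTERN = /^[\p{L}\p{N}][\p{L}\p{N} .,'&:-]{0,199}$/u;

// Returns the trimmed query, or throws if it contains anything unexpected.
function validateQuery(query) {
  if (typeof query !== 'string') {
    throw new TypeError('query must be a string');
  }
  const trimmed = query.trim();
  if (!QUERY_PATTERN.test(trimmed)) {
    throw new RangeError('query contains unsupported characters');
  }
  return trimmed;
}

// Runs search-and-capture.sh with the query as a single argv entry.
function runSearch(query, timeoutMs = 60000) {
  const script = path.join(SCRIPT_DIR, 'search-and-capture.sh');
  return new Promise((resolve, reject) => {
    // A validation error thrown here rejects the promise.
    const child = spawn(script, [validateQuery(query)], {
      shell: false,
      timeout: timeoutMs,
    });
    child.on('error', reject);
    child.on('close', (code) => {
      if (code === 0) resolve();
      else reject(new Error(`search script exited with code ${code}`));
    });
  });
}
```

Passing the query as an argv entry only protects the Node.js side. The Bash script still has to quote "$1" wherever it uses it, including when it forwards the text to irssi.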

File Operations:

  • Downloads arbitrary files from IRC (untrusted source)
  • No file type validation beyond extension
  • Auto-imports without scanning

Recommendations:

  • Use child_process.spawn() with an argument array instead of string interpolation
  • Validate search queries against an allowlist
  • Add file scanning before import
  • Implement authentication (Authentik forward auth)
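
As a first step toward file validation beyond the extension, the importer could compare a file's leading bytes with what its extension claims. The sketch below is illustrative only: the function names and the two-format table are assumptions, and a header check is not a substitute for a real malware scan.

```javascript
const fs = require('fs');

// Leading bytes of two formats the importer might accept (illustrative subset).
const SIGNATURES = [
  { type: 'zip', bytes: Buffer.from([0x50, 0x4b, 0x03, 0x04]) }, // EPUB is a ZIP container
  { type: 'pdf', bytes: Buffer.from('%PDF-', 'ascii') },
];

// Returns the detected container type, or null when nothing matches.
function detectType(header) {
  for (const { type, bytes } of SIGNATURES) {
    if (header.length >= bytes.length && header.subarray(0, bytes.length).equals(bytes)) {
      return type;
    }
  }
  return null;
}

// Reads the first bytes of a file and checks them against its extension.
function extensionMatchesContent(filePath) {
  const fd = fs.openSync(filePath, 'r');
  try {
    const header = Buffer.alloc(8);
    const bytesRead = fs.readSync(fd, header, 0, header.length, 0);
    const type = detectType(header.subarray(0, bytesRead));
    const ext = filePath.toLowerCase().split('.').pop();
    if (ext === 'epub') return type === 'zip';
    if (ext === 'pdf') return type === 'pdf';
    return false; // Unknown extensions are rejected in this sketch.
  } finally {
    fs.closeSync(fd);
  }
}
```

Files that fail the check could be moved to a quarantine directory instead of being passed to calibredb.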


Migration to Kubernetes

Decision: Standalone App or Shared Service?

RECOMMENDED: Standalone Kubernetes App

Rationale:

  • ✅ Single purpose (book acquisition workflow)
  • ✅ Independent data (doesn't integrate with other apps)
  • ✅ Different tech stack (Bash + IRC vs Next.js)
  • ✅ Security isolation (should be sandboxed from main apps)
  • ✅ Optional service (not critical infrastructure)

Integration Strategy:

  • Calibre-Web can run as a sidecar container or a separate deployment
  • Expose book catalog via API for home-portal integration
  • Link from home-portal dashboard as external service

Architecture: StatefulSet with Sidecar

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: book-downloader
  namespace: media
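spec:
  serviceName: book-downloader
  replicas: 1  # MUST be 1 (IRC session state)
  selector:  # Required for apps/v1; must match the template labels
    matchLabels:
      app: book-downloader
  template:
    metadata:
      labels:
        app: book-downloader
    spec:
      # REQUIRED: DCC transfers need direct network access
      hostNetwork: true
      # Keep cluster DNS resolution while on the host network
      dnsPolicy: ClusterFirstWithHostNet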

      containers:
      # Container 1: Book Downloader UI + IRC
      - name: downloader
        image: registry.home.internal/book-downloader:latest
        ports:
        - containerPort: 8084
          name: http
        env:
        - name: IRC_SCRIPTS_DIR
          value: /scripts
        - name: DOWNLOAD_DIR
          value: /data/books_load
        - name: CALIBRE_LIBRARY
          value: /data/calibre
        volumeMounts:
        - name: books-load
          mountPath: /data/books_load
        - name: calibre-library
          mountPath: /data/calibre
        - name: irc-state
          mountPath: /root/.irssi
        - name: irc-scripts
          mountPath: /scripts
          readOnly: true
        livenessProbe:
          httpGet:
            path: /api/health
            port: 8084
          initialDelaySeconds: 30
          periodSeconds: 60
        readinessProbe:
          httpGet:
            path: /api/health
            port: 8084
          initialDelaySeconds: 10
          periodSeconds: 30

      # Container 2: Calibre-Web (Optional)
      - name: calibre-web
        image: lscr.io/linuxserver/calibre-web:latest
        ports:
        - containerPort: 8083
          name: web
        env:
        - name: PUID
          value: "1000"
        - name: PGID
          value: "1000"
        volumeMounts:
        - name: calibre-library
          mountPath: /books
        - name: calibre-config
          mountPath: /config

      volumes:
      - name: irc-scripts
        configMap:
          name: irc-scripts
          defaultMode: 0755
      - name: books-load
        persistentVolumeClaim:
          claimName: books-load-pvc
      - name: calibre-library
        persistentVolumeClaim:
          claimName: calibre-library-pvc

  # Persistent state for IRC session
  volumeClaimTemplates:
  - metadata:
      name: irc-state
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: local-path
      resources:
        requests:
          storage: 1Gi
  - metadata:
      name: calibre-config
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: local-path
      resources:
        requests:
          storage: 2Gi

Storage Requirements

---
# books-load PVC (download staging)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: books-load-pvc
  namespace: media
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 20Gi

---
# calibre-library PVC (main library)
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: calibre-library-pvc
  namespace: media
spec:
  accessModes:
  - ReadWriteOnce
  storageClassName: local-path
  resources:
    requests:
      storage: 100Gi  # Adjust based on existing library size

ConfigMap for IRC Scripts

apiVersion: v1
kind: ConfigMap
metadata:
  name: irc-scripts
  namespace: media
data:
  book-finder.sh: |
    #!/bin/bash
    # Content from /root/irc-scripts/book-finder.sh
  search-and-capture.sh: |
    #!/bin/bash
    # Content from /root/irc-scripts/search-and-capture.sh
  # ... (add all other scripts)

Dockerfile

FROM node:20-bookworm

# Install IRC and Calibre dependencies
RUN apt-get update && apt-get install -y \
    irssi \
    tmux \
    calibre \
    bc \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Copy application
COPY server.js package*.json ./
COPY public/ ./public/

# Install Node dependencies
RUN npm ci --omit=dev

# Create directories
RUN mkdir -p /data/books_load /data/calibre /root/.irssi

# Expose web UI port
EXPOSE 8084

# Health check
HEALTHCHECK --interval=60s --timeout=10s --start-period=30s --retries=3 \
  CMD node -e "require('http').get('http://localhost:8084/api/health', (r) => { process.exit(r.statusCode === 200 ? 0 : 1); })"

CMD ["node", "server.js"]

Migration Checklist

Phase 1: Preparation

  • [ ] Backup existing Calibre library:

    tar -czf /vault/backups/calibre-library-$(date +%Y%m%d).tar.gz \
      /vault/subvol-101-disk-0/media/calibre
    

  • [ ] Export IRC scripts to tower-fleet:

    mkdir -p /root/tower-fleet/apps/book-downloader/irc-scripts
    # -C /root keeps archive paths relative, so files land in .../book-downloader/irc-scripts/
    pct exec 122 -- tar -czf - -C /root irc-scripts | \
      tar -xzf - -C /root/tower-fleet/apps/book-downloader/
    

  • [ ] Version control the application:

    cd /root/tower-fleet/apps/book-downloader
    git init
    git add .
    git commit -m "chore: initial import of book-downloader from LXC 122"
    

Phase 2: Code Refactoring

  • [ ] Add environment variable support (server.js:11-18)
  • [ ] Fix command injection vulnerability (server.js:95)
  • [ ] Improve health check endpoint (server.js:45-46)
  • [ ] Add input validation middleware
  • [ ] Update hardcoded IP addresses
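
For the environment variable item, a configuration loader could look something like the sketch below. The variable names IRC_SCRIPTS_DIR, DOWNLOAD_DIR, and CALIBRE_LIBRARY match the StatefulSet manifest in this document, and the defaults are the current LXC paths. PORT and the loadConfig function itself are assumptions, not existing code.

```javascript
const path = require('path');

// Builds the runtime configuration from environment variables,
// falling back to the paths used on LXC 122.
function loadConfig(env = process.env) {
  // PORT is an assumed variable name; it does not appear in the manifest.
  const port = Number.parseInt(env.PORT ?? '8084', 10);
  if (!Number.isInteger(port) || port < 1 || port > 65535) {
    throw new RangeError(`invalid PORT: ${env.PORT}`);
  }
  const downloadDir = env.DOWNLOAD_DIR ?? '/mnt/media/books_load';
  return {
    port,
    scriptDir: env.IRC_SCRIPTS_DIR ?? '/root/irc-scripts',
    downloadDir,
    libraryPath: env.CALIBRE_LIBRARY ?? '/mnt/media/calibre',
    // selected.txt lives in the download staging area.
    selectionFile: path.posix.join(downloadDir, 'selected.txt'),
  };
}
```

With this shape, the same image runs unchanged on the LXC (no variables set) and in the pod (variables set by the manifest).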

Phase 3: Containerization

  • [ ] Create Dockerfile
  • [ ] Build and test locally:

    docker build -t book-downloader:test .
    docker run --rm -p 8084:8084 \
      -v /tmp/test-downloads:/data/books_load \
      -v /tmp/test-calibre:/data/calibre \
      book-downloader:test
    

  • [ ] Push to registry:

    docker tag book-downloader:test registry.home.internal/book-downloader:latest
    docker push registry.home.internal/book-downloader:latest
    

Phase 4: Kubernetes Deployment

  • [ ] Create namespace:

    kubectl create namespace media
    

  • [ ] Create ConfigMap from IRC scripts:

    kubectl create configmap irc-scripts \
      --from-file=/root/tower-fleet/apps/book-downloader/irc-scripts/ \
      --namespace=media
    

  • [ ] Create PVCs for storage

  • [ ] Deploy StatefulSet
  • [ ] Create Services and Ingress
  • [ ] Add Authentik forward auth (recommended)

Phase 5: Data Migration

  • [ ] Copy Calibre library to PVC:

    kubectl run -it --rm copy-calibre \
      --image=busybox \
      --restart=Never \
      --namespace=media \
      --overrides='...' \
      -- sh -c "cp -a /source/. /target/"  # -a keeps ownership and timestamps; /source/. also copies dotfiles
    

  • [ ] Verify library integrity in Calibre-Web

Phase 6: Testing

  • [ ] Test search functionality
  • [ ] Test download workflow
  • [ ] Test auto-import
  • [ ] Verify IRC session persistence across pod restarts
  • [ ] Test DCC transfers work with hostNetwork

Phase 7: Cutover

  • [ ] Update DNS or Ingress to point to k8s service
  • [ ] Monitor for 24 hours
  • [ ] Stop LXC 122 service:

    pct exec 122 -- systemctl stop book-downloader
    pct exec 122 -- systemctl disable book-downloader
    

  • [ ] Document rollback procedure


Critical Migration Considerations

1. IRC Session State

Challenge: IRC uses persistent tmux sessions with authentication state

Solutions:

  • StatefulSet ensures stable pod identity and persistent volumes
  • irc-state PVC preserves /root/.irssi/ config and logs
  • A startup step (init container or entrypoint script) can configure irssi to auto-connect on first startup

Alternative: Use IRC bouncer (ZNC) as separate service for better resilience

2. DCC File Transfers

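Challenge: DCC transfers use arbitrary TCP ports in the 1024-65535 range. Inbound connections are required when a bot uses passive (reverse) DCC; with a standard DCC SEND the client connects out to the bot.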

Solutions:

  • hostNetwork: true - Simplest, same as LXC (RECOMMENDED)
  • NodePort - Expose DCC port range (complex, not recommended)
  • Protocol change - Use XDCC-only without DCC (requires IRC script changes)
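
To make the port requirement concrete: a DCC SEND offer arrives as a CTCP message carrying the sender's IPv4 address as a 32-bit integer plus a TCP port, and a port of 0 signals a passive (reverse) transfer where the receiver has to listen. The parser below is an illustrative sketch for inspecting such offers in logs. The function name and return shape are hypothetical, and irssi handles all of this internally.

```javascript
// Parses the payload of a CTCP "DCC SEND" offer:
//   DCC SEND <filename> <ip-as-uint32> <port> [<size>]
function parseDccSend(payload) {
  const match = /^DCC SEND (?:"([^"]+)"|(\S+)) (\d+) (\d+)(?: (\d+))?/.exec(payload);
  if (!match) return null;
  const ipInt = Number(match[3]);
  const port = Number(match[4]);
  if (ipInt > 0xffffffff || port > 65535) return null;
  // Convert the integer to dotted-quad notation, most significant byte first.
  const host = [24, 16, 8, 0].map((shift) => (ipInt >>> shift) & 0xff).join('.');
  return {
    fileName: match[1] ?? match[2],
    host,
    port,
    passive: port === 0, // The receiver must accept an inbound connection.
    size: match[5] === undefined ? null : Number(match[5]),
  };
}
```

Either way, the data connection runs on a port other than the IRC port, which is why a policy that only opens 6667 is not enough for downloads.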

Testing DCC:

# From inside the pod, attach to the tmux session
tmux attach -t irc-session

# Then, inside irssi
/dcc get SearchBot book.epub

3. Security Hardening

Required before k8s deployment:

  1. Fix command injection:

    // BEFORE (vulnerable):
    execCommand(`script.sh "${query.replace(/"/g, '\\"')}"`)
    
    // AFTER (safe):
    spawn('script.sh', [query], { shell: false })
    

  2. Add NetworkPolicy:

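    apiVersion: networking.k8s.io/v1
    kind: NetworkPolicy
    metadata:
      name: book-downloader-policy
      namespace: media
    spec:
      # NetworkPolicy behaviour for hostNetwork pods is undefined in Kubernetes;
      # verify that the CNI plugin actually enforces this policy
      podSelector:
        matchLabels:
          app: book-downloader
      policyTypes:
      - Ingress
      - Egress
      ingress:
      - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx  # Only allow the ingress controller
      egress:
      - to:
        - podSelector: {}  # Allow pods in the same namespace
      - to:
        - namespaceSelector: {}
        ports:
        - protocol: TCP
          port: 53  # DNS
        - protocol: UDP
          port: 53
      - to:
        - ipBlock:
            cidr: 0.0.0.0/0  # IRC servers are outside the cluster
        ports:
        - protocol: TCP
          port: 6667  # IRC (DCC data connections use other ports and are not covered here)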
    

  3. Add authentication via Authentik:

     • See: /root/tower-fleet/docs/infrastructure/authentik-forward-auth.md
     • Create application in Authentik
     • Add middleware to Ingress

4. Calibre Library Migration

Options:

A. Migrate existing library:

  • Pros: Keep all metadata, covers, reading progress
  • Cons: Large transfer (100GB+), potential corruption
  • Recommended: Only if library is well-maintained

B. Start fresh library:

  • Pros: Clean slate, better organization
  • Cons: Lose metadata, need to re-import
  • Recommended: If library has duplicate/corrupt entries

Migration command:

# Copy entire library
rsync -av --progress \
  /vault/subvol-101-disk-0/media/calibre/ \
  /path/to/pvc/mount/


Troubleshooting

Service Won't Start

Symptom: EADDRINUSE: address already in use 0.0.0.0:8084

Cause: Port conflict or previous process not cleaned up

Fix:

# Find process using port
pct exec 122 -- lsof -i :8084

# Kill stuck process
pct exec 122 -- kill -9 <PID>

# Restart service
pct exec 122 -- systemctl restart book-downloader

Download Fails with "Download failed" Error

Symptom: Search works correctly, but clicking "Download" shows "Download failed" error in web UI

Cause: Permission issue with the selected.txt file. In unprivileged LXC containers, files created by IRC downloads (running as nobody:nogroup) cannot be overwritten by the Node.js app (running as root), because the UID/GID mappings prevent modification.

Quick Fix:

# Delete the old selected.txt file
pct exec 122 -- rm -f /mnt/media/books_load/selected.txt

# Try download again from web UI - will work immediately

Permanent Code Fix: Update server.js around line 185:

// BEFORE:
await fs.writeFile(CONFIG.selectionFile, selectionContent);

// AFTER: Add retry with delete on permission error
try {
  await fs.writeFile(CONFIG.selectionFile, selectionContent);
} catch (error) {
  if (error.code === 'EACCES') {
    // Permission denied - delete old file and retry
    await fs.unlink(CONFIG.selectionFile).catch(() => {});
    await fs.writeFile(CONFIG.selectionFile, selectionContent);
  } else {
    throw error;
  }
}

Verification:

# Check logs for actual error
pct exec 122 -- journalctl -u book-downloader --since "5 minutes ago" | grep -i error

# Test download endpoint directly
curl -X POST http://10.89.97.85:8084/api/download \
  -H "Content-Type: application/json" \
  -d '{"selections": [1]}'
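
Since the download endpoint takes a JSON body of the form shown above, the input validation listed under Phase 2 could start with a check like the following. This is a sketch under assumptions: the function name, the resultCount parameter (the number of results in the last search), and the de-duplication are all hypothetical and do not describe the current server.js.

```javascript
// Validates the body of POST /api/download and returns a clean,
// sorted list of unique result numbers.
function parseSelections(body, resultCount) {
  const selections = body?.selections;
  if (!Array.isArray(selections) || selections.length === 0) {
    throw new TypeError('selections must be a non-empty array');
  }
  for (const n of selections) {
    // Only whole numbers within the range of the last search are accepted.
    if (!Number.isInteger(n) || n < 1 || n > resultCount) {
      throw new RangeError(`invalid selection: ${n}`);
    }
  }
  return [...new Set(selections)].sort((a, b) => a - b);
}
```

Rejecting anything that is not an in-range integer means the values written to selected.txt can never contain shell or IRC syntax.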

IRC Session Lost

Symptom: Search fails with "IRC not connected"

Fix:

# Check IRC session status
pct exec 122 -- /root/irc-scripts/irc-session-manager.sh status

# Restart IRC session
pct exec 122 -- /root/irc-scripts/irc-session-manager.sh restart

Auto-Import Not Working

Symptom: Books download but don't appear in Calibre

Debug:

# Check file watcher logs
pct exec 122 -- journalctl -u book-downloader | grep -i "detected\|import"

# Manually import
pct exec 122 -- calibredb add /mnt/media/books_load/book.epub \
  --library-path=/mnt/media/calibre

DCC Transfers Fail

Symptom: Downloads timeout or never start

Fix:

# Check firewall
iptables -L | grep -i dcc

# Test from IRC client (run inside irssi)
/dcc get SearchBot test.epub

# Check DCC settings (run inside irssi)
/set dcc


Related Documentation

  • Authentik Forward Auth: /root/tower-fleet/docs/infrastructure/authentik-forward-auth.md
  • Docker Deployment: /root/tower-fleet/docs/workflows/docker-deployment.md
  • Ingress Configuration: /root/tower-fleet/docs/reference/ingress-configuration.md
  • Storage Strategy: /root/tower-fleet/docs/decisions/ADR-002-storage-strategy.md

Future Enhancements

Short-term

  • [ ] Add authentication (Authentik)
  • [ ] Fix security vulnerabilities
  • [ ] Add Prometheus metrics
  • [ ] Create backup automation

Long-term

  • [ ] Replace IRC with library APIs (Library Genesis, Anna's Archive)
  • [ ] Add book recommendation engine
  • [ ] Integrate with reading tracker apps (Goodreads, etc.)
  • [ ] Mobile app for book browsing
  • [ ] OCR support for scanned PDFs
  • [ ] Automated metadata enrichment

Last Updated: 2025-12-02

Status: LXC deployment (working), k8s migration (planned)