The Backup That Was Never Actually a Backup
Imagine your home server dies. Drive failure, power surge, theft, flood — pick your disaster. You have backups. Except when you go to restore them, you discover:
- The backup script has been failing silently for three months
- The backup includes the application directory but not the data directory
- The backup is on the same NAS that also died
- You can restore the files but you have no idea how to reinstall and reconfigure the application
- Everything works, but all your Docker volumes pointed to a path you can’t remember
Backup is not the same as recovery. Having files somewhere is not a disaster recovery strategy. A disaster recovery plan is the full picture: what you back up, how you restore it, in what order, and how you verify it actually works.
The Disaster Scenarios You Need to Plan For
Different disasters require different recovery strategies. Don’t let perfect be the enemy of good — plan for the likely ones:
Single drive failure: Most common. RAID or ZFS RAIDZ handles this automatically. The “disaster” is replacing the failed drive and waiting for the rebuild.
Accidental deletion: “rm -rf /data” moments. File-level backups that support point-in-time recovery. Recycle bin features in your NAS.
Application/OS corruption: Botched update, config error, failed migration. VM snapshots. System-level backups. Ability to roll back.
Full hardware failure: Server won’t boot, motherboard dead. Need to restore to new hardware. Your backup must be hardware-independent.
Ransomware: Backups need to be air-gapped or immutable. If the ransomware can reach your backup destination, it will encrypt that too.
Site-level disaster (fire, flood, theft): Requires off-site backups. Your backup NAS in the same room as your server doesn’t help here.
RTO and RPO: The Two Numbers That Define Your DR Plan
These terms come from enterprise DR planning but are useful at any scale.
RPO — Recovery Point Objective: How much data loss is acceptable? If your RPO is 4 hours, your backup needs to run at least every 4 hours. If your RPO is 0 (no data loss acceptable), you need real-time replication.
RTO — Recovery Time Objective: How long can you be down before it matters? If your RTO is 1 day, you have 24 hours to restore. If your RTO is 1 hour, your restore process needs to be fast and well-practiced.
For a home lab, be honest with yourself:
| Service | My RPO | My RTO | Why |
|---|---|---|---|
| Personal media library | 7 days | 3 days | I can re-download, it’s just annoying |
| Home automation config | 1 day | 4 hours | I like my automations |
| Password manager | 1 hour | 1 hour | This is important |
| Family photos | 0 (no loss) | 2 days | Irreplaceable |
| Minecraft server | 1 day | 2 days | The kids will complain |
Setting explicit RPO/RTO forces prioritization. You can’t treat everything as equally important, and the constraints help you decide backup frequency and restore process complexity.
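An RPO is only useful if something checks it. The sketch below is one hedged way to do that: it treats a backup directory as fresh when it holds at least one file modified within the RPO window. The function name `rpo_ok` and the use of file modification time as the freshness signal are assumptions for illustration, not part of any tool in this article.

```shell
#!/bin/bash
# rpo_ok DIR RPO_HOURS
# Succeeds if DIR contains at least one regular file modified within the
# last RPO_HOURS hours. Fails for an empty or missing directory.
# (Hypothetical helper: modification time stands in for "last good backup".)
rpo_ok() {
  local dir=$1 rpo_hours=$2
  local recent
  # -mmin -N matches files modified less than N minutes ago
  recent=$(find "$dir" -type f -mmin "-$(( rpo_hours * 60 ))" 2>/dev/null | head -n 1)
  [ -n "$recent" ]
}
```

Called from cron or a timer as `rpo_ok /opt/backups/db 24 || echo "backup older than RPO"`, it turns a silently failing job into a visible one. It only proves a recent file exists, not that the file is a usable backup.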
The 3-2-1 Backup Rule
The foundational backup strategy: 3 copies, on 2 different media types, with 1 copy offsite.
3 copies: The original plus two backups. With only one backup, a failure that takes out both the original and that backup (they’re often on the same system) leaves you with nothing.
2 media types: Don’t put both backups on the same type of storage. External hard drives + cloud, or NAS + tape, or NAS + cloud. Different media fails differently.
1 offsite: One backup needs to be physically somewhere else. Cloud storage counts. A hard drive at a family member’s house counts. Your backup NAS in the same rack does not count.
Some add a “1 offline” modifier — 3-2-1-1: one copy offline or air-gapped, protecting against ransomware and network-reachable failures.
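The rule is simple enough to check mechanically. As a rough sketch, the function below takes one `medium:location` argument per copy (the original included) and reports which part of 3-2-1 is not met. The name `check_321` and the `medium:location` notation are made up for this example.

```shell
#!/bin/bash
# check_321 MEDIUM:LOCATION...
# LOCATION is "onsite" or "offsite". Count the original as one copy.
# Hypothetical helper; the argument format is an assumption for illustration.
check_321() {
  local copies=$# media offsite=0 entry ok=0
  [ "$copies" -gt 0 ] || { echo "FAIL: no copies listed"; return 1; }
  # Number of distinct media types (the part before the colon)
  media=$(( $(printf '%s\n' "$@" | cut -d: -f1 | sort -u | wc -l) ))
  for entry in "$@"; do
    if [ "${entry#*:}" = "offsite" ]; then
      offsite=$(( offsite + 1 ))
    fi
  done
  [ "$copies" -ge 3 ]  || { echo "FAIL: $copies copies, need 3"; ok=1; }
  [ "$media" -ge 2 ]   || { echo "FAIL: $media media type(s), need 2"; ok=1; }
  [ "$offsite" -ge 1 ] || { echo "FAIL: no offsite copy"; ok=1; }
  return "$ok"
}
```

For example, `check_321 ssd:onsite nas:onsite cloud:offsite` passes, while `check_321 ssd:onsite nas:onsite` reports too few copies and no offsite copy.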
Proxmox VM Backups
Proxmox has excellent built-in backup capabilities that should be your first layer.
Using Proxmox Backup Server (PBS)
PBS is a dedicated backup application that pairs with Proxmox VE. Deduplication means incremental backups are efficient even for large VMs.
```bash
# On the Proxmox VE host: add PBS as storage
# Datacenter → Storage → Add → Proxmox Backup Server
# Server: pbs.local, Datastore: your-datastore
# Fingerprint: (get from PBS → Dashboard → Show Fingerprint)
```

Configure automated backups in Proxmox VE:
Datacenter → Backup → Add:
- Node: your-node (or All)
- Storage: your PBS storage
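- Schedule: 02:00 (2am daily; current Proxmox VE releases take a systemd-style calendar event here, where a cron-based setup would use 0 2 * * *)
- Selection: All VMs / specific VMs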
- Mode: Snapshot (or Suspend for consistency)
- Retention: Keep last 7 daily, 4 weekly, 3 monthly
Restore from PBS:
```bash
# Via web UI: VM → Backup tab → Restore
# Or CLI (PBS volume IDs include the guest type, here "vm"):
qmrestore PBS:backup/vm/100/2026-04-01T02:00:00Z 100 --storage local-lvm
```

Manual VM Snapshots vs Backups
Snapshots are for short-term protection (before risky updates). Backups are for disaster recovery. Don’t confuse them:
```bash
# Create snapshot before a risky operation
qm snapshot 100 pre-update --description "Before kernel update"

# Roll back if needed
qm rollback 100 pre-update

# Delete snapshot when done
qm delsnapshot 100 pre-update
```

Snapshots live on the same storage as the VM. They protect against oops moments, not hardware failures.
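The snapshot, change, roll back or clean up sequence can be wrapped so that nobody forgets a step. This is a minimal sketch, assuming the risky operation can be expressed as a single command run from the Proxmox host; `with_snapshot` is a hypothetical helper name, and it only uses the `qm snapshot`, `qm rollback`, and `qm delsnapshot` subcommands shown above.

```shell
#!/bin/bash
# with_snapshot VMID NAME COMMAND [ARGS...]
# Takes a snapshot, runs COMMAND, then deletes the snapshot on success
# or rolls the VM back to it on failure. (Hypothetical wrapper.)
with_snapshot() {
  local vmid=$1 name=$2
  shift 2
  qm snapshot "$vmid" "$name" --description "Auto snapshot before: $*" || return 1
  if "$@"; then
    # The change worked; the snapshot is no longer needed
    qm delsnapshot "$vmid" "$name"
  else
    echo "Command failed; rolling back VM $vmid to snapshot $name" >&2
    qm rollback "$vmid" "$name"
    return 1
  fi
}
```

Usage might look like `with_snapshot 100 pre-update ssh root@vm100 'apt-get -y full-upgrade'`, where the SSH target is a placeholder. Deleting the snapshot right away is a design choice; keeping it for a day or two before cleanup is just as reasonable.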
Docker Volume Backup Strategies
Docker volumes are a common backup blind spot. People back up their compose files but not the data directories.
Strategy 1: Bind Mounts to a Backed-Up Path
The simplest approach — use bind mounts that point to a path your backup tool already covers:
```yaml
services:
  postgres:
    image: postgres:16
    volumes:
      - /opt/appdata/postgres:/var/lib/postgresql/data

  nextcloud:
    image: nextcloud:latest
    volumes:
      - /opt/appdata/nextcloud:/var/www/html
```

Now /opt/appdata/ is the single directory to back up for all your Docker data.
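To find containers whose data still lives outside the backed-up path, you can filter the mount list that `docker inspect` reports. The helper below is a sketch under that assumption: `unbacked_mounts` is a hypothetical name, and it reads `TYPE SOURCE` pairs on standard input so it can be tried without Docker.

```shell
#!/bin/bash
# unbacked_mounts BACKUP_ROOT
# Reads "TYPE SOURCE" lines on stdin and prints those whose SOURCE is not
# under BACKUP_ROOT. (Hypothetical helper; blank lines are skipped.)
unbacked_mounts() {
  local root=${1%/} type source
  while read -r type source; do
    [ -n "$source" ] || continue
    case "$source" in
      "$root"|"$root"/*) ;;                      # covered by the backup path
      *) printf '%s %s\n' "$type" "$source" ;;   # not covered
    esac
  done
}

# Example with a container named "nextcloud" (requires Docker):
#   docker inspect --format '{{range .Mounts}}{{println .Type .Source}}{{end}}' nextcloud \
#     | unbacked_mounts /opt/appdata
```

Named volumes show up with a source under Docker's own data directory, so they are flagged. Bind mounts such as the Docker socket are flagged too and can simply be ignored.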
Strategy 2: Database Dump Before Backup
For databases, file-level backup of a running database can produce inconsistent snapshots. Dump first:
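```bash
#!/bin/bash
# Stop on the first failed command, unset variable, or failed pipeline stage
set -euo pipefail

BACKUP_DIR="/opt/backups/db"
DATE=$(date +%Y%m%d-%H%M%S)
mkdir -p "${BACKUP_DIR}"

# Dump PostgreSQL
docker exec postgres pg_dumpall -U postgres > "${BACKUP_DIR}/postgres-${DATE}.sql"

# Dump MySQL/MariaDB (MYSQL_ROOT_PASSWORD must be set in this script's environment)
docker exec mariadb mysqldump -u root -p"${MYSQL_ROOT_PASSWORD}" --all-databases > "${BACKUP_DIR}/mysql-${DATE}.sql"

# Delete dumps older than 7 days
find "${BACKUP_DIR}" -name "*.sql" -mtime +7 -delete
```

Run this before your file backup job.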
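A redirect like `> dump.sql` creates the file even when the dump command fails, so an empty or truncated dump can pass for a good one. As a rough sanity check, the sketch below looks for the trailer comments that `pg_dumpall` and `mysqldump` normally write at the end of a completed dump. The function name `dump_looks_complete` is hypothetical, and the check is a heuristic: options that suppress comments would defeat it.

```shell
#!/bin/bash
# dump_looks_complete FILE
# Succeeds if FILE is non-empty and its last lines contain the completion
# comment written by pg_dumpall or mysqldump. (Heuristic, hypothetical helper.)
dump_looks_complete() {
  local file=$1
  [ -s "$file" ] || return 1
  tail -n 5 "$file" | grep -Eq 'PostgreSQL database cluster dump complete|Dump completed'
}
```

A dump script could call it right after each dump, for example `dump_looks_complete "$file" || exit 1`, so the later file backup never picks up a broken dump unnoticed.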
Strategy 3: docker-volume-backup
```yaml
# docker-compose.yml: add alongside your services
services:
  backup:
    image: offen/docker-volume-backup:latest
    environment:
      BACKUP_CRON_EXPRESSION: "0 2 * * *"
      BACKUP_RETENTION_DAYS: "7"
      AWS_S3_BUCKET_NAME: my-backup-bucket
      AWS_ACCESS_KEY_ID: ${AWS_ACCESS_KEY_ID}
      AWS_SECRET_ACCESS_KEY: ${AWS_SECRET_ACCESS_KEY}
    volumes:
      - myapp_data:/backup/myapp_data:ro
      - /var/run/docker.sock:/var/run/docker.sock:ro
```

Offsite Backup with Restic and Backblaze B2
Restic is an excellent backup tool: deduplication, encryption at rest, incremental, supports many backends.
Installing Restic
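```bash
# Debian/Ubuntu
sudo apt install restic

# Or latest binary (run as root). Release assets normally carry the version
# number in the file name, so check the releases page for the exact name.
curl -L https://github.com/restic/restic/releases/latest/download/restic_linux_amd64.bz2 | bunzip2 > /usr/local/bin/restic
chmod +x /usr/local/bin/restic
```

Configure Backblaze B2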
Create a Backblaze account, create a private B2 bucket, and create an application key with access to that bucket.
```bash
# Environment variables (store in /etc/restic-env, chmod 600)
export B2_ACCOUNT_ID="your-account-id"
export B2_ACCOUNT_KEY="your-application-key"
export RESTIC_REPOSITORY="b2:my-backup-bucket:/homelab"
export RESTIC_PASSWORD="your-strong-encryption-password"

# Initialize repository (first time only)
source /etc/restic-env
restic init

# Backup your data directory
restic backup /opt/appdata --tag docker-data

# Backup with exclusions
restic backup /opt/appdata \
  --exclude /opt/appdata/*/cache \
  --exclude /opt/appdata/*/logs \
  --tag docker-data

# Check backup integrity
restic check

# List snapshots
restic snapshots

# Restore specific snapshot
restic restore latest --target /restore/test

# Restore specific path from latest snapshot
restic restore latest --target /tmp/restore --include /opt/appdata/nextcloud
```

Automated Backup with systemd
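```ini
# /etc/systemd/system/restic-backup.service
[Unit]
Description=Restic backup to Backblaze B2
OnFailure=restic-backup-failure@%n.service

[Service]
Type=oneshot
# EnvironmentFile expects plain KEY=value lines and skips lines that start
# with "export". If /etc/restic-env uses "export" (handy for sourcing in a
# shell), point this at a second file with the same variables minus the prefix.
EnvironmentFile=/etc/restic-env
ExecStart=/usr/local/bin/restic backup /opt/appdata --tag docker-data
ExecStart=/usr/local/bin/restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 12 --prune
ExecStart=/usr/local/bin/restic check --read-data-subset=10%%
```

```ini
# /etc/systemd/system/restic-backup.timer
[Unit]
Description=Run Restic backup daily

[Timer]
OnCalendar=*-*-* 03:00:00
Persistent=true
RandomizedDelaySec=1800

[Install]
WantedBy=timers.target
```

```bash
sudo systemctl enable --now restic-backup.timer
```

Alert on Backup Failure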
```ini
# /etc/systemd/system/restic-backup-failure@.service
[Unit]
Description=Notify on backup failure

[Service]
Type=oneshot
ExecStart=/opt/scripts/notify-backup-failure.sh
```

```bash
#!/bin/bash
# /opt/scripts/notify-backup-failure.sh
curl -X POST https://ntfy.sh/your-backup-alerts \
  -d "Restic backup failed on $(hostname) at $(date)" \
  -H "Title: Backup Failure" \
  -H "Priority: high"
```

Testing Your Backups: The Step Everyone Skips
A backup you’ve never restored is a backup of unknown quality. Test regularly:
Monthly: Spot check restore
```bash
# Restore a single file and verify it matches the live copy
source /etc/restic-env
restic restore latest --target /tmp/restore-test --include /opt/appdata/nextcloud/config/config.php
diff /opt/appdata/nextcloud/config/config.php /tmp/restore-test/opt/appdata/nextcloud/config/config.php
rm -rf /tmp/restore-test
```

Quarterly: Full application restore test
Spin up a VM, restore your backup, start your application, verify it works. This is the only way to know your RTO is achievable.
Annually: Site-level disaster simulation
Assume your entire server is gone. Start from scratch using only your off-site backups and your documentation. What breaks? What’s missing? Update your documentation.
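The monthly spot check above compares one hard-coded file. A slightly broader variant, sketched here, picks a few random files from a restored tree and compares each against the live copy. `spot_check` is a hypothetical helper, it relies on `shuf` from GNU coreutils, and it assumes the restored tree mirrors the live paths, as a restic restore into a target directory does.

```shell
#!/bin/bash
# spot_check LIVE_ROOT RESTORE_ROOT [COUNT]
# Compares COUNT random files (default 5) under RESTORE_ROOT with the file at
# the same relative path under LIVE_ROOT. Fails on any mismatch, or if nothing
# could be checked. (Hypothetical helper; file names must not contain newlines.)
spot_check() {
  local live=${1%/} restored=${2%/} count=${3:-5}
  local list rel checked=0 failed=0
  list=$(cd "$restored" && find . -type f | shuf -n "$count")
  while IFS= read -r rel; do
    [ -n "$rel" ] || continue
    checked=$(( checked + 1 ))
    if cmp -s "$live/$rel" "$restored/$rel"; then
      echo "OK       $rel"
    else
      echo "MISMATCH $rel"
      failed=1
    fi
  done <<EOF
$list
EOF
  [ "$checked" -gt 0 ] && [ "$failed" -eq 0 ]
}
```

After `restic restore latest --target /tmp/restore-test`, a call such as `spot_check / /tmp/restore-test 10` does the comparison. Files that changed since the snapshot will show as mismatches, so treat the output as a prompt to look rather than a verdict.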
A Practical DR Runbook Template
```markdown
# Disaster Recovery Runbook: [Service Name]
Last tested: [date]
Last updated: [date]

## Service Description
[What does this service do? What data does it hold?]

## RPO / RTO
- Recovery Point Objective: X hours
- Recovery Time Objective: X hours

## Backup Locations
- Primary: [PBS datastore / path]
- Off-site: [Backblaze B2 bucket / Restic repo]
- Database dumps: /opt/backups/db/

## Recovery Procedure

### Prerequisites
- [ ] New server/VM provisioned with [OS version]
- [ ] Docker and Docker Compose installed
- [ ] SSH access configured

### Step 1: Restore application data
restic recreates the full original path under the target, so restore to /.

    restic -r b2:my-backup-bucket:/homelab restore latest \
      --target / \
      --include /opt/appdata/[service-name]

### Step 2: Restore database

    docker exec -i postgres psql -U postgres < /opt/backups/db/postgres-[timestamp].sql

### Step 3: Start application

    cd /opt/compose/[service-name]
    docker compose up -d

### Step 4: Verify
- [ ] Application responds at [URL]
- [ ] Check [specific data] is intact
- [ ] Run smoke test: [test command]

## Known Issues
[What might go wrong during recovery]

## Contact
[Who knows this service and can help]
```

Document it before you need it. The 3am disaster recovery session is not the time to be reading documentation for the first time; it’s when you want to be executing a checklist you’ve already validated.
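A runbook goes stale quietly, so it can help to check its age automatically. The sketch below reads the `Last tested:` line from a runbook file and prints how many days ago that was. It assumes the date is written as YYYY-MM-DD and that GNU `date` is available; `runbook_age_days` is a hypothetical name.

```shell
#!/bin/bash
# runbook_age_days FILE
# Prints the number of days since the "Last tested: YYYY-MM-DD" date in FILE.
# Fails if no such line is found. (Hypothetical helper; needs GNU date.)
runbook_age_days() {
  local file=$1 tested
  tested=$(sed -n 's/^Last tested:[[:space:]]*\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\}\).*/\1/p' "$file" | head -n 1)
  [ -n "$tested" ] || return 1
  echo $(( ( $(date +%s) - $(date -d "$tested" +%s) ) / 86400 ))
}
```

A scheduled job could then warn when, say, `[ "$(runbook_age_days runbook.md)" -gt 90 ]` is true, matching the quarterly restore test suggested earlier.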
The backup that’s never been restored might as well not exist. The runbook that’s never been tested will fail when you need it most. Testing is not optional; it’s the only way to know you actually have disaster recovery, rather than the appearance of it.