Your Repo Is 800 GB and check Just Timed Out
It starts innocently. You set up restic, pointed it at Backblaze, wrote a cron job, and forgot about it. Exactly as the gods of backup intended.
Then one day you check on it — because something feels wrong and you’re a responsible sysadmin, or because your bill went up — and you’re staring at a repo that has ballooned to 800 GB. You run restic check to make sure things are healthy before you touch anything. Four hours later, your SSH session times out and you still don’t have an answer.
Welcome to restic repository maintenance. It’s not glamorous, but neither is restoring from a corrupt backup at 2 AM.
Let’s talk about what actually needs to happen, in what order, and how to automate it so future-you doesn’t end up here again.
The Lifecycle: What Actually Happens to Your Data
Restic stores backup data in pack files inside the repo. When you run restic backup, it:
- Chunks your files using content-defined chunking (CDC)
- Deduplicates chunks that already exist in the repo
- Writes new chunks to pack files
- Creates a snapshot — basically a JSON manifest pointing to all the chunks that make up your backup at that point in time
The problem is restic never deletes anything automatically. Every snapshot you’ve ever taken is still there. Every chunk those snapshots reference is still stored. Your repo only grows.
This is where the trio comes in:
- forget: marks old snapshots for deletion based on your retention policy. Doesn’t actually delete any data yet.
- prune: goes through the pack files, figures out which chunks are no longer referenced by any snapshot, and removes them.
- check: verifies repo integrity. Reads the index, checks that all referenced chunks exist, optionally reads and verifies actual data.
The order matters: forget, then prune, then check. Running check before prune is fine for sanity checks, but don’t expect it to tell you what prune will clean up.
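The ordering above can be sketched as a small wrapper that stops at the first failing step. This is a minimal sketch, assuming restic is on your PATH and that RESTIC_REPOSITORY and a password source are already set in the environment. The function name run_maintenance and the retention values are illustrative, not something restic provides.

```shell
# Hypothetical helper: run forget, prune, and check in that order,
# stopping as soon as one step fails.
run_maintenance() {
    restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6 || return 1
    restic prune || return 1
    restic check || return 1
}
```

Stopping early matters here: if forget or prune fails, you want to look at the error before anything else touches the repo.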
forget: Teaching Restic What to Keep
restic forget without any flags does absolutely nothing useful — it lists snapshots. You need to tell it your retention policy:
    restic forget \
      --keep-daily 7 \
      --keep-weekly 4 \
      --keep-monthly 6 \
      --keep-yearly 2 \
      --prune

This keeps:
- One snapshot per day for the 7 most recent days that have a snapshot
- One per week for the 4 most recent weeks that have one
- One per month for the 6 most recent months that have one
- One per year for the 2 most recent years that have one
The --prune flag tells forget to immediately run prune after marking snapshots. Convenient, but you lose some control. I’ll explain why you might want to keep them separate in a minute.
Before you commit to a policy, run it in dry-run mode:
    restic forget \
      --keep-daily 7 \
      --keep-weekly 4 \
      --keep-monthly 6 \
      --keep-yearly 2 \
      --dry-run

You’ll see exactly which snapshots would be removed. Look at this output carefully. If you set --keep-daily 7 but you only run backups weekly, restic will happily keep your last 7 weekly snapshots, not 7 days’ worth. The flags are about snapshot count within time windows, not calendar duration. This trips people up constantly.
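To make the "count, not calendar duration" point concrete, here is a simplified model of --keep-daily in shell. It is an illustration of the rule, not restic’s actual implementation, and the function name keep_daily is made up for this sketch. It reads ISO-8601 timestamps on stdin and keeps the newest snapshot from each of the N most recent days that have at least one snapshot.

```shell
# Simplified model of --keep-daily (an illustration, not restic's code):
# keep the newest snapshot from each of the N most recent calendar days
# that have at least one snapshot. Input: one ISO-8601 timestamp per line.
keep_daily() {
    sort -r | awk -v n="$1" '
        {
            day = substr($0, 1, 10)        # YYYY-MM-DD prefix
            if (!(day in seen)) {
                seen[day] = 1
                if (++kept <= n) print     # newest entry for this day
            }
        }'
}
```

Feed it eight weekly timestamps with `keep_daily 7` and it returns seven of them, spanning seven weeks rather than seven days.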
Tags and Hosts
If you’re backing up multiple machines or datasets to the same repo, scope your forget with --tag or --host:
    restic forget \
      --host homelab-nas \
      --keep-daily 7 \
      --keep-weekly 4 \
      --prune

Without this, your retention policy might eat snapshots from different hosts than you intended. Check restic snapshots to see what tags and hosts you’re working with.
prune: The Part That Actually Frees Space
Once forget has marked snapshots as removed, prune cleans up the orphaned data. This is the expensive operation — it has to read the index, figure out what’s referenced, and rewrite pack files that are only partially used.
Basic prune:
    restic prune

Modern restic (v0.12+) supports --max-unused to control how aggressive the cleanup is:

    restic prune --max-unused 5%

This tells restic to leave up to 5% of the repo as unused data (unreferenced chunks sitting in pack files that aren’t worth rewriting yet). Lower threshold = more thorough cleanup = more time and bandwidth. The default is 5%, which is sane for most setups.
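To get a feel for what that threshold means in absolute terms, a rough integer calculation is enough. This is a back-of-the-envelope sketch: the function name is made up, and restic measures the percentage against the repo size after pruning, so treat the result as an approximation.

```shell
# Rough upper bound on unused data left behind by --max-unused,
# using integer arithmetic: repo size in GB times the percentage.
max_unused_gb() {
    repo_gb=$1
    percent=$2
    echo $(( repo_gb * percent / 100 ))
}
```

For the 800 GB repo from the intro, `max_unused_gb 800 5` gives 40, so a default prune may leave roughly 40 GB of dead data in place.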
v0.17/0.18+ Is a Big Deal
If you’re on an older restic release, upgrade. Seriously.
Restic added compression in v0.14, along with the --repack-uncompressed flag for prune, which rewrites pack files that were created before compression was supported. If you started your repo before that and have since upgraded it to the v2 repository format (restic migrate upgrade_repo_v2), you’ve got a mix of compressed and uncompressed packs. This flag fixes it:

    restic prune --repack-uncompressed

v0.17 and v0.18 also dramatically improved prune performance by reducing the amount of index data that needs to be loaded into memory. If you’ve been avoiding prune because it OOMs your NAS, try again with the latest version.
check: The Right Way to Not Destroy Your Weekend
Here’s what most people do wrong: they run restic check --read-data on an 800 GB repo stored on Backblaze B2. This downloads every byte of your backup to verify it. At typical B2 download speeds, this takes forever and costs you money.
Don’t do that.
Instead, use subset checking:
    restic check --read-data-subset=1/10

This reads the first of 10 roughly equal groups of pack files. The selection is deterministic, so running 1/10 every week re-checks the same tenth. Bump the first number on each run (1/10, 2/10, and so on up to 10/10), and over 10 weeks you’ve verified the entire repo without ever hammering your bandwidth in a single shot.

For the paranoid (and you should be slightly paranoid about backups), there’s also:

    restic check --read-data-subset=10%

Similar idea, but percentage-based: restic picks a random 10% of the pack files on each run, so coverage is probabilistic rather than a guaranteed rotation. Pick whichever makes more intuitive sense to you.
The check without --read-data or --read-data-subset still verifies the index and that all pack files referenced by snapshots actually exist — it just doesn’t verify the contents of those pack files. It’s fast and catches the most common failure modes (missing files, corrupt index). Do this after every prune.
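The n/t form of --read-data-subset always reads the same group for the same n, so rotating through the repo means changing n between runs. One way to do that, sketched below, is to derive n from the ISO week number. The helper name subset_for_week is hypothetical, and the rotation restarts at the year boundary, so it is a convenience rather than a strict guarantee.

```shell
# Hypothetical helper: map an ISO week number (01-53) to an n/t argument
# for --read-data-subset so that consecutive weeks check different groups.
subset_for_week() {
    week=${1#0}        # strip a leading zero so "08" is not parsed as octal
    total=$2
    echo "$(( (week - 1) % total + 1 ))/$total"
}

# Usage (assumes restic and the repo environment are already configured):
#   restic check --read-data-subset="$(subset_for_week "$(date +%V)" 10)"
```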
The Remote Is Slow and Prune Is Painful
If your repo lives on a remote backend — Backblaze B2, S3, SFTP, whatever — prune is going to be painful. It has to download pack files, figure out which chunks to keep, then re-upload modified packs and delete the originals. On a slow connection, forget plus prune on a 500 GB repo can take six hours.
A few ways to make this hurt less:
Option 1: REST Server as a Local Cache
Run rest-server locally and use it as your primary backup target. It’s a lightweight HTTP server that implements the restic REST backend protocol. Fast local I/O, then you sync to remote storage separately.
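    docker run -p 8000:8000 \
      -v /mnt/backup/restic-rest:/data \
      -e DISABLE_AUTHENTICATION=1 \
      -e OPTIONS="--append-only" \
      restic/rest-server

The official image takes its settings from environment variables rather than trailing arguments: DISABLE_AUTHENTICATION is the equivalent of --no-auth, and OPTIONS passes extra flags to rest-server. Note the --append-only flag: this enables append-only mode, which means the server refuses any DELETE operations on pack files. Even if ransomware gets your backup client and knows your restic password, it can’t delete your backup data through the REST API. You need direct server access to run prune. This is the right tradeoff for most home lab setups.

Set your repo URL to rest:http://localhost:8000/myrepo for backups. Because the server is append-only, run forget and prune on the server itself, directly against the on-disk repository (here, /mnt/backup/restic-rest/myrepo), where it’s all fast local I/O.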
Option 2: rclone serve restic
If your data is already on cloud storage and you don’t want to move it, rclone can serve a local REST interface backed by your remote:
    rclone serve restic \
      --addr localhost:8000 \
      b2:my-backup-bucket/restic

Then point restic at rest:http://localhost:8000/. Prune traffic now goes through rclone’s transfer handling and connection reuse. The data still crosses the network, so it’s not as fast as true local storage, but it can be noticeably better than raw SFTP or the native B2 backend.
Option 3: Rethink Your Prune Frequency
Prune doesn’t need to run after every backup. Run forget (without --prune) after every backup to mark stale snapshots, then run prune once a week or month. The repo will grow slightly between prune runs due to unreferenced data, but your daily backup windows stay short.
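Splitting forget from prune can be sketched as a small post-backup hook. This is a minimal sketch under a few assumptions: the function name is made up, the retention values are placeholders, and Sunday as the prune day is an arbitrary choice.

```shell
# Hypothetical post-backup hook: always forget, prune only on one weekday.
# $1 is the ISO weekday (1 = Monday ... 7 = Sunday), e.g. from `date +%u`.
post_backup_maintenance() {
    restic forget --keep-daily 7 --keep-weekly 4 || return 1
    if [ "$1" -eq 7 ]; then
        restic prune || return 1
    fi
}
```

Call it as `post_backup_maintenance "$(date +%u)"` at the end of your backup script. Six days a week it only touches snapshot metadata.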
Cache Directory: The Silent Performance Killer
Restic uses a local cache at ~/.cache/restic/ (or $XDG_CACHE_HOME/restic/) to store index data and pack metadata locally. This cache makes repeated operations on the same repo dramatically faster.
The problem: on remote repos, the cache can get stale, or it can grow unbounded over time. If restic is complaining about mismatched cache data, blow it out:
    restic cache --cleanup

That removes old cache directories that haven’t been used recently. If the cache itself is the problem, delete the cache directory outright (restic rebuilds it on the next run), or bypass it for a single run with the global --no-cache flag:

    restic --no-cache snapshots

When running restic in scripts or containers that don’t have persistent home directories, set RESTIC_CACHE_DIR explicitly:

    export RESTIC_CACHE_DIR=/var/cache/restic

Put this in your systemd service environment or cron environment. Without it, restic will rebuild the cache from scratch on every run, which is expensive on remote repos.
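In a wrapper script, the same setting can be applied with a small helper that also makes sure the directory exists. This is a sketch: the function name is hypothetical, and the owner-only permissions are a precaution I am assuming you want, not a restic requirement.

```shell
# Hypothetical helper: point restic at a persistent cache directory,
# creating it with owner-only permissions if it does not exist yet.
use_restic_cache() {
    dir=${1:-/var/cache/restic}
    mkdir -p "$dir" && chmod 700 "$dir" || return 1
    export RESTIC_CACHE_DIR="$dir"
}
```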
Automation: Systemd Timer + Healthchecks
Cron jobs are fine, but systemd timers give you better logging and don’t silently disappear if the system reboots mid-job.
Here’s a complete example. First, the environment file (keep your restic secrets out of the unit files):
    RESTIC_REPOSITORY=b2:my-backup-bucket:/restic
    RESTIC_PASSWORD_FILE=/etc/restic/password
    AWS_ACCESS_KEY_ID=your_b2_key_id
    AWS_SECRET_ACCESS_KEY=your_b2_app_key
    B2_ACCOUNT_ID=your_b2_account_id
    B2_ACCOUNT_KEY=your_b2_app_key
    HEALTHCHECK_URL=https://hc-ping.com/your-uuid-here

The backup service:
    [Unit]
    Description=Restic backup
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=oneshot
    EnvironmentFile=/etc/restic/env
    # The leading "-" keeps a failed ping from blocking the backup itself
    ExecStartPre=-/bin/sh -c 'curl -fsS -m 10 --retry 5 -o /dev/null "${HEALTHCHECK_URL}/start"'
    ExecStart=/usr/local/bin/restic backup \
      --one-file-system \
      --tag systemd \
      /home /etc /var/lib
    ExecStartPost=/bin/sh -c 'curl -fsS -m 10 --retry 5 -o /dev/null "${HEALTHCHECK_URL}"'
    ExecStopPost=/bin/sh -c 'if [ "$EXIT_STATUS" != "0" ]; then curl -fsS -m 10 --retry 5 -o /dev/null "${HEALTHCHECK_URL}/fail"; fi'

The maintenance service:
    [Unit]
    Description=Restic maintenance (forget, prune, check)
    After=network-online.target
    Wants=network-online.target

    [Service]
    Type=oneshot
    EnvironmentFile=/etc/restic/env
    ExecStart=/usr/local/bin/restic forget \
      --tag systemd \
      --keep-daily 7 \
      --keep-weekly 4 \
      --keep-monthly 6 \
      --keep-yearly 2
    # A literal % must be written as %% inside a unit file
    ExecStart=/usr/local/bin/restic prune --max-unused 5%%
    # Reads a random 10% sample of the pack data on each run
    ExecStart=/usr/local/bin/restic check --read-data-subset=10%%
    ExecStartPost=/bin/sh -c 'curl -fsS -m 10 --retry 5 -o /dev/null "${HEALTHCHECK_URL}"'

The timer to run maintenance weekly:
    [Unit]
    Description=Weekly restic maintenance

    [Timer]
    OnCalendar=Sun 03:00
    RandomizedDelaySec=30min
    Persistent=true

    [Install]
    WantedBy=timers.target

Enable it all:
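    systemctl daemon-reload
    systemctl enable --now restic-backup.timer
    systemctl enable --now restic-maintenance.timer

This assumes you have also created a restic-backup.timer for the backup service, alongside the maintenance timer. Check your timers are registered (quote the pattern so your shell doesn’t expand it):

    systemctl list-timers 'restic*'

The RandomizedDelaySec spreads the start time by up to 30 minutes, which helps if you have multiple machines hitting the same backend.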
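If you would rather keep the ping logic out of the unit files, the same start/success/fail pattern fits in a small shell wrapper. This is a sketch, assuming HEALTHCHECK_URL is set as in the environment file. The function names are made up, and ping failures are deliberately ignored so a monitoring outage never blocks a backup.

```shell
# Hypothetical wrapper: ping a healthchecks-style URL around any command.
# Assumes HEALTHCHECK_URL is set; ping failures never block the command.
hc_ping() {
    curl -fsS -m 10 --retry 5 -o /dev/null "${HEALTHCHECK_URL}$1" || true
}

run_with_healthcheck() {
    hc_ping /start
    if "$@"; then
        hc_ping ""
    else
        status=$?          # exit status of the wrapped command
        hc_ping /fail
        return "$status"
    fi
}
```

Usage would look like `run_with_healthcheck restic backup --one-file-system /home /etc /var/lib`, with the wrapper returning the command’s own exit status.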
Migrating to a New Backend
Your needs change. Maybe you started on SFTP to a Raspberry Pi in your closet and now you want to move to S3-compatible storage (Backblaze B2 with S3 API, Wasabi, Cloudflare R2). Or you’re consolidating repos.
Restic doesn’t have a native repo migration command, but the pattern is simple:
    # Mount the old repo
    restic -r sftp:oldhost:/backup/restic mount /mnt/restic-old &

    # Back up the mounted snapshots into the new repo
    restic -r s3:s3.us-west-004.backblazeb2.com/my-new-bucket backup /mnt/restic-old/snapshots

This is a full re-upload, and it folds your whole history into a single new snapshot, so plan for bandwidth and time accordingly. The better alternative is restic copy, which was introduced in v0.10 (the --from-repo spelling below needs v0.14+; older releases used --repo2):

    restic -r s3:s3.us-west-004.backblazeb2.com/my-new-bucket \
      copy \
      --from-repo sftp:oldhost:/backup/restic \
      --from-password-file /etc/restic/old-password

copy transfers snapshots between repos without fully restoring them, which is much more efficient. It preserves snapshot metadata and history, and the data is deduplicated and compressed according to the destination repo’s settings.
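One detail worth sketching: deduplication between the copied data and later backups works best when the new repo is created with the same chunker parameters as the old one, which restic init supports through --copy-chunker-params. The helper below is a hypothetical wrapper, assuming restic v0.14+ and a destination password already provided through the environment (for example RESTIC_PASSWORD_FILE).

```shell
# Hypothetical helper: create the destination repo with the source repo's
# chunker parameters, then copy all snapshots across (restic v0.14+ flags).
# The destination password must already be available in the environment.
migrate_repo() {
    old_repo=$1
    old_pw_file=$2
    new_repo=$3
    restic -r "$new_repo" init \
        --from-repo "$old_repo" \
        --from-password-file "$old_pw_file" \
        --copy-chunker-params || return 1
    restic -r "$new_repo" copy \
        --from-repo "$old_repo" \
        --from-password-file "$old_pw_file"
}
```

This only applies to a fresh destination: if the new repo already exists, skip the init step and run copy on its own.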
Restic vs Kopia vs Borg: The Maintenance Angle
You’re using restic, but let’s briefly acknowledge the competition since you might be thinking about it.
Kopia handles garbage collection automatically in the background — you don’t have a manual forget + prune cycle. It also has a UI and does maintenance-in-backup rather than as a separate step. If the restic maintenance ceremony annoys you, Kopia is worth a look. The tradeoff is a more complex mental model and a repo format that’s less widely supported by third-party tooling.
Borg (often driven through the borgmatic wrapper, which is a configuration front end rather than a successor) uses borg prune + borg compact (compact was added in Borg 1.2 to actually free space after prune). The compact step trips up people who upgrade from older Borg versions. Borg is often reported to be faster than restic for repos on local disk or over SSH. For object storage like B2 or S3, which Borg 1.x doesn’t talk to natively, restic generally wins.
For most home lab setups with cloud backends: stick with restic. The tooling is mature, the community is large, and v0.17+ has closed most of the performance gaps that used to make people reach for alternatives.
The Bottom Line
Restic is a great backup tool that will absolutely eat your storage budget if you ignore it for a year. The maintenance loop is not optional.
Run it in this order: forget (with your retention policy), prune (to actually free space), check --read-data-subset (to verify without downloading the world). Automate it with a systemd timer, ping healthchecks.io so you know when it fails, and set --append-only on your rest-server if ransomware protection matters to you (it should).
Upgrade to restic v0.17 or newer if you haven’t — faster prune, --repack-uncompressed, and better memory usage are all worth it.
Your 2 AM self will thank you for the 15 minutes you spent setting this up correctly today.