wiki: add restic + B2 fleet backups runbook
Architecture, per-engine DB dump patterns, restore procedure, add-a-host, and gotchas (RESTIC_CACHE_DIR/$HOME, missing sqlite3, docker dump env vars, delete-capable B2 key). Linked in SUMMARY under storage-backup.
parent 2bed2cbae3
commit 4599ed607c
2 changed files with 138 additions and 1 deletion
02-selfhosting/storage-backup/restic-b2-fleet-backups.md (normal file, 136 additions)

@@ -0,0 +1,136 @@
---
title: "App-Consistent Fleet Backups with restic + Backblaze B2"
domain: selfhosting
category: storage-backup
tags: [restic, backblaze, b2, backup, ansible, systemd, postgresql, mysql, sqlite, docker, disaster-recovery]
status: published
created: 2026-06-19
updated: 2026-06-19
---

# App-Consistent Fleet Backups with restic + Backblaze B2

A repeatable pattern for backing up a mixed fleet (Ubuntu + Fedora, VPS + homelab, bare services + Docker) to Backblaze B2 with [restic](https://restic.net): encrypted, deduplicated, and **app-consistent** (databases are dumped before the snapshot, not copied live). Driven by Ansible and a per-host `systemd` timer.

## The Short Answer

Per host, nightly: **dump every database to a staging dir → `restic backup` that staging dir plus the data paths → apply retention → wipe staging.** A monthly timer runs `restic prune`. Any step that fails emails the admin. One B2 bucket holds a separate repo per host at `b2:<bucket>:<hostname>`.

Retention is `--keep-daily 7 --keep-weekly 4 --keep-monthly 6` (~6 months of history). The nightly `restic forget` only drops snapshots that fall outside that window; the monthly `restic prune` is what actually frees space in the bucket.

## Why dump databases first

Copying a live database's files (`/var/lib/mysql`, a running SQLite file, a Postgres data dir) gives you a *crash-consistent* copy at best, restorable only if you're lucky. A logical dump is a consistent point-in-time export:

- **MySQL / MariaDB:** `mysqldump --single-transaction --routines --triggers --databases <db>` (`--single-transaction` gives a consistent snapshot for InnoDB tables only)
- **PostgreSQL:** `pg_dump -Fc <db>` (custom format) via the `postgres` system user (peer auth)
- **SQLite:** `sqlite3 <file> ".backup '<out>'"`, which uses the online backup API and is safe against a running writer
- **Dockerized DBs:** `docker exec <container> sh -c '<dump cmd>'`, letting the container's own shell expand its root-password env var

restic then backs up the dump files. Plain-text SQL dumps dedupe well, so only the changed blocks upload each night; `pg_dump -Fc` output is compressed by default and dedupes less well (`-Z0` turns compression off if that matters).
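
The Dockerized case is the one most easily gotten wrong, so here is a minimal sketch of it. The function names are hypothetical, and the env-var names (`MYSQL_ROOT_PASSWORD`, `POSTGRES_USER`) are assumptions that must be checked per container; newer MariaDB images may also ship the tool as `mariadb-dump` rather than `mysqldump`.

```bash
# Sketch only: function names are illustrative, env-var names vary per image.

# Dump one database from a MySQL/MariaDB container to stdout.
# The single-quoted script is expanded by the container's shell, so the
# root password never appears on the host's command line.
docker_mysql_dump() {
  container=$1 db=$2
  docker exec "$container" sh -c \
    'exec mysqldump --single-transaction --routines --triggers -uroot -p"$MYSQL_ROOT_PASSWORD" --databases "$1"' \
    sh "$db"
}

# Dump one database from a Postgres container to stdout (custom format).
# The superuser is whatever POSTGRES_USER is set to inside the container.
docker_pg_dump() {
  container=$1 db=$2
  docker exec "$container" sh -c 'exec pg_dump -Fc -U "$POSTGRES_USER" "$1"' sh "$db"
}
```

Redirect the output into the staging dir on the host, for example `docker_pg_dump <container> <db> > "$STAGING/pg-<db>.dump"`. The database name is passed as a positional argument (`$1`) rather than spliced into the script, which avoids quoting surprises.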

## Repository layout

- **One private B2 bucket** (e.g. `majorshouse-backups`).
- **One repo per host:** `b2:majorshouse-backups:<hostname>`.
- The application key needs **read + write + delete** for the bucket. restic deletes objects during `forget`/`prune`, so a pure *append-only* key will break retention. (True append-only requires splitting `forget`/`prune` onto a separate maintenance key: a worthwhile hardening step, but not the default.)
- Credentials live in an `EnvironmentFile` (`/etc/restic/restic-env`, mode `0600`, owned by root): `RESTIC_REPOSITORY`, `RESTIC_PASSWORD`, `B2_ACCOUNT_ID`, `B2_ACCOUNT_KEY`. With an application key, `B2_ACCOUNT_ID` holds the key ID and `B2_ACCOUNT_KEY` the key itself.
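
As an illustration of that last bullet, a small helper could lay down the env file skeleton with the right permissions. This is a sketch, not the role's actual task: `write_restic_env` and the `CHANGE_ME` placeholders are hypothetical, and in practice the real values would come from the vault.

```bash
# Sketch only: writes a restic-env skeleton with placeholder secrets.
# Usage: write_restic_env <path> <bucket> <hostname>
write_restic_env() {
  path=$1 bucket=$2 host=$3
  (
    umask 077   # the file is never readable by others, even briefly
    cat > "$path" <<EOF
RESTIC_REPOSITORY=b2:${bucket}:${host}
RESTIC_PASSWORD=CHANGE_ME
B2_ACCOUNT_ID=CHANGE_ME
B2_ACCOUNT_KEY=CHANGE_ME
EOF
  )
  chmod 600 "$path"   # also covers a pre-existing file with looser permissions
}
```

For example, `write_restic_env /etc/restic/restic-env majorshouse-backups "$(hostname -s)"` produces a file whose repository line follows the one-repo-per-host layout above.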

## The backup script (shape)

```bash
#!/usr/bin/env bash
# -e is not set: wrap each step (see below) so a failure mails the admin and aborts.
set -uo pipefail
STAGING=/var/backups/restic-staging
rm -rf "$STAGING"; mkdir -p "$STAGING"; chmod 700 "$STAGING"

# per-engine dumps into $STAGING ...
mysqldump --single-transaction --routines --triggers --databases wordpress > "$STAGING/mysql-wordpress.sql"
sudo -u postgres pg_dump -Fc mastodon_production > "$STAGING/pg-mastodon_production.dump"
sqlite3 /opt/phantombot/config/phantombot.db ".backup '$STAGING/sqlite-phantombot.db'"

# snapshot the dumps plus the data paths
restic backup --tag fleet-backup --host "$(hostname -s)" \
  "$STAGING" /var/www /etc/letsencrypt --exclude /path/to/already-offsite/media

# drop snapshots outside the retention window (space is reclaimed by the monthly prune)
restic forget --keep-daily 7 --keep-weekly 4 --keep-monthly 6
rm -rf "$STAGING"
```

Wrap each step so a failure mails the admin and aborts (don't silently back up a half-state). On hosts where the `mail` CLI is absent, pipe a message to `/usr/sbin/sendmail -t` instead.
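
One way to do that wrapping is sketched below. The names `run_step`, `notify_admin`, and `ADMIN_EMAIL` are hypothetical, not taken from the deployed script; the only behaviour assumed from the text is "mail on failure, then abort" and the `mail`/`sendmail -t` fallback.

```bash
# Sketch only: helper names and ADMIN_EMAIL are illustrative.
ADMIN_EMAIL=${ADMIN_EMAIL:-root}

# Send a short failure notice, preferring mail(1) and falling back to sendmail.
notify_admin() {
  subject=$1 body=$2
  if command -v mail >/dev/null 2>&1; then
    printf '%s\n' "$body" | mail -s "$subject" "$ADMIN_EMAIL"
  else
    printf 'To: %s\nSubject: %s\n\n%s\n' "$ADMIN_EMAIL" "$subject" "$body" \
      | /usr/sbin/sendmail -t
  fi
}

# Run one named step; on failure, notify and return the step's exit status.
run_step() {
  step_name=$1; shift
  "$@" && return 0
  rc=$?
  notify_admin "restic backup FAILED on $(hostname -s 2>/dev/null || uname -n): $step_name" \
    "Step '$step_name' exited with status $rc."
  return "$rc"
}
```

Steps that need a redirect are easiest to wrap as small functions, for example `dump_wordpress() { mysqldump ... > "$STAGING/mysql-wordpress.sql"; }` followed by `run_step "mysqldump wordpress" dump_wordpress || exit 1`. Aborting before `restic backup` is what keeps a half-written dump out of the snapshot.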

## systemd units

A oneshot service + a timer. Stagger `OnCalendar` per host to spread B2 load, and **always set `RESTIC_CACHE_DIR`** (see Gotchas):

```ini
# restic-backup.service
[Service]
Type=oneshot
EnvironmentFile=/etc/restic/restic-env
Environment=RESTIC_CACHE_DIR=/var/cache/restic
ExecStart=/usr/local/sbin/restic-backup.sh
Nice=10
IOSchedulingClass=idle
```

```ini
# restic-backup.timer
[Timer]
OnCalendar=*-*-* 02:30:00
RandomizedDelaySec=20m
Persistent=true
[Install]
WantedBy=timers.target
```

A second `restic-prune.timer` runs `restic prune` monthly (`OnCalendar=*-*-01 04:00:00`).
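
Staggering can be done by hand per host, as above. As a hypothetical alternative, the start minute could be derived from the hostname so that each host gets a stable slot without anyone keeping a table. The helper name and the fixed `02:` hour are assumptions for illustration only.

```bash
# Sketch only: derive a stable per-host OnCalendar value from the hostname.
# cksum gives a deterministic CRC; modulo 60 maps it onto a minute.
stagger_oncalendar() {
  host=$1
  crc=$(printf '%s' "$host" | cksum | cut -d' ' -f1)
  printf '*-*-* 02:%02d:00\n' "$((crc % 60))"
}
```

`stagger_oncalendar "$(hostname -s)"` always yields the same value for the same host. Two hosts can still land on the same minute, which `RandomizedDelaySec` then smooths out.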

## Restore procedure

The whole point. From the target host (or any host with the repo creds):

```bash
# load repo + B2 creds without echoing them
# (assumes restic-env is plain KEY=value lines that the shell can source as-is)
set -a; . /etc/restic/restic-env; set +a

restic snapshots                       # list; note the snapshot ID or use 'latest'

# restore specific paths to a scratch dir (never restore in place blindly)
restic restore latest --target /tmp/restore \
  --include /var/backups/restic-staging \
  --include /var/www/html/wp-config.php

# verify before doing anything with it
ls -la /tmp/restore/var/backups/restic-staging/
head -1 /tmp/restore/var/backups/restic-staging/mysql-wordpress.sql   # e.g. "-- MySQL dump 10.13 ..." (MariaDB's header differs)
```

To recover a database, restore the dump then load it: `mysql <db> < mysql-<db>.sql`, `pg_restore -d <db> pg-<db>.dump` (into an existing, empty database), or stop the app and copy the SQLite file back. Dumps taken with `--databases` carry their own `CREATE DATABASE`/`USE` statements, so they load into the original database name. **Test restores periodically**: a backup you've never restored is a hope, not a backup. Restore the highest-stakes data (password manager, mail) first in any drill.
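
For restore drills, the "verify before doing anything" step can be made a little more mechanical. The sketch below checks each restored dump by the filename prefixes used in this runbook; `dump_looks_valid` is a hypothetical helper. It relies on the `PGDMP` magic at the start of custom-format Postgres dumps, the `SQLite format 3` file header, and the `-- Dump completed` trailer that `mysqldump` writes when comments are not disabled. It is a truncation check, not proof that a dump will load.

```bash
# Sketch only: cheap sanity checks on restored dump files, keyed on filename.
dump_looks_valid() {
  f=$1
  [ -s "$f" ] || return 1                      # missing or empty
  case ${f##*/} in
    pg-*.dump)   [ "$(head -c 5 "$f")" = "PGDMP" ] ;;
    sqlite-*.db) [ "$(head -c 15 "$f")" = "SQLite format 3" ] ;;
    mysql-*.sql) tail -n 1 "$f" | grep -q '^-- Dump completed' ;;
    *)           return 1 ;;                   # unknown naming scheme
  esac
}
```

A drill can then loop over `/tmp/restore/var/backups/restic-staging/*` and report any file for which `dump_looks_valid` fails, before loading the highest-stakes dump into a scratch database.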

## Adding a host

1. Add it to the `backups` inventory group.
2. Give it a `host_vars` scope (which DBs to dump and which paths to back up):

   ```yaml
   restic_backup_oncalendar: "*-*-* 02:40:00" # stagger
   restic_mysql_dbs: [castopod_db]
   restic_paths: [/var/www/html/castopod]
   restic_excludes: [/var/www/html/castopod/public/media] # already offsite
   ```
3. Run the playbook against that host. The role installs restic, deploys the script + units, `restic init`s the repo if absent, and enables the timers.

## Gotchas & Notes

- **`RESTIC_CACHE_DIR` is mandatory under systemd.** System services run without `$HOME` unless the unit sets `User=`, so restic can't find its cache and warns *"unable to locate cache directory: neither $XDG_CACHE_HOME nor $HOME are defined"*. It then runs without a local cache, and every run is slow because repository metadata has to be fetched from B2 again. Point it at `/var/cache/restic` in the unit.
- **`sqlite3` may not be installed.** A host that runs a SQLite-backed app (e.g. a bot) often lacks the `sqlite3` CLI (package `sqlite3` on Ubuntu, `sqlite` on Fedora). Install it where `restic_sqlite_paths` is set, or the `.backup` step fails.
- **Docker DB password env-var names vary.** Don't assume: the MariaDB image may use `MYSQL_ROOT_PASSWORD` (not `MARIADB_ROOT_PASSWORD`), and a Postgres container's superuser is whatever `POSTGRES_USER` is set to, so reference `"$POSTGRES_USER"` rather than hardcoding `postgres`. Check with `docker exec <c> sh -c 'env | grep -oE "^(MYSQL|MARIADB|POSTGRES)_[A-Z_]*"'` (prints names only, not values).
- **B2 key needs delete capability.** Otherwise `forget`/`prune` fail. Scope the key to the bucket; reach for per-host `namePrefix`-restricted keys for blast-radius isolation.
- **Exclude data that's already offsite.** Media already synced to object storage (S3/B2 via the app or `rclone`) should be `--exclude`d so you don't pay to store it twice.
- **First upload is slow, the rest are fast.** The initial snapshot reads and uploads everything; subsequent runs only ship changed blocks. For a large first run, start it detached (for example in a transient unit via `systemd-run`) and have it email you on completion.
- **Keep secrets out of git.** The repo password and B2 key belong in an Ansible vault (committed encrypted), referenced into the role, never in plaintext vars.
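
The first two gotchas are cheap to catch before any dump runs. A minimal preflight sketch, assuming a hypothetical `preflight` helper called at the top of the backup script with the list of CLI tools that host needs:

```bash
# Sketch only: fail early if the cache dir is unset or a required tool is missing.
# Usage: preflight <tool> [<tool> ...]
preflight() {
  rc=0
  if [ -z "${RESTIC_CACHE_DIR:-}" ]; then
    echo "preflight: RESTIC_CACHE_DIR is not set" >&2
    rc=1
  fi
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "preflight: missing tool: $tool" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

On a host with SQLite paths configured this might be called as `preflight restic sqlite3 || exit 1`. It reports every problem it finds rather than stopping at the first, so one failed run is enough to fix a freshly added host.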

## See Also

- [rsync Backup Patterns](rsync-backup-patterns.md)
- [SnapRAID & MergerFS Storage Setup](../../01-linux/storage/snapraid-mergerfs-setup.md)
- [restic documentation](https://restic.readthedocs.io)

@@ -1,6 +1,6 @@
 ---
 created: 2026-04-02T16:03
-updated: 2026-05-15T09:00
+updated: 2026-06-19T10:05
 ---
 * [Home](index.md)
 * [Linux & Sysadmin](01-linux/index.md)

@@ -31,6 +31,7 @@ updated: 2026-05-15T09:00
 * [AWS S3 Cost Management](02-selfhosting/cloud/aws-s3-cost-management.md)
 * [VPS Migration Baseline Checklist](02-selfhosting/cloud/vps-migration-baseline-checklist.md)
 * [rsync Backup Patterns](02-selfhosting/storage-backup/rsync-backup-patterns.md)
+* [Fleet Backups with restic + B2](02-selfhosting/storage-backup/restic-b2-fleet-backups.md)
 * [Tuning Netdata Web Log Alerts](02-selfhosting/monitoring/tuning-netdata-web-log-alerts.md)
 * [Tuning Netdata Docker Health Alarms](02-selfhosting/monitoring/netdata-docker-health-alarm-tuning.md)
 * [Deploying Netdata to a New Server](02-selfhosting/monitoring/netdata-new-server-setup.md)