Most self-hosted infrastructure has backups that were set up once, never tested, and discovered to be broken at the worst possible moment. This post covers a backup strategy that's actually reliable: Restic for deduplicated encrypted snapshots, S3-compatible object storage for offsite retention, retention policies that don't accumulate forever, and, critically, automated restore testing so you find out if something is broken before you need the backup.
Why Restic
Restic is a modern backup tool with three properties that distinguish it from older alternatives (rsync, tar, duplicity):
- Client-side encryption by default. Every backup is encrypted before it leaves your machine using AES-256-CTR with Poly1305-AES authentication. The backup server never sees plaintext data.
- Content-addressed deduplication. Data is chunked using CDC (content-defined chunking) and deduplicated across all snapshots. A 100GB VM image that changes 1GB per day stores roughly 100GB + (1GB × days), not 100GB × days.
- Supports many backends. Local disk, SFTP, S3/MinIO, Backblaze B2, Azure Blob, and more, all through the same CLI interface.
Setting up a repository
# Install Restic (Ubuntu 24.04)
apt install restic
# Initialize a repository on Backblaze B2
export B2_ACCOUNT_ID="your-account-id"
export B2_ACCOUNT_KEY="your-application-key"
export RESTIC_PASSWORD="your-very-long-repository-password"
restic -r b2:your-bucket-name:backups/server-01 init
# Or on S3-compatible storage (Hetzner Object Storage, MinIO, Wasabi, etc.)
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
restic -r s3:https://s3.hetzner.com/your-bucket/server-01 init
Store your repository password safely. If you lose it, the repository is unrecoverable: all the data is encrypted and there is no backdoor. Use a password manager (PassVault works well here) and store a copy in a separate secure location, not on the same server being backed up.
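The scripts below load their credentials from a root-owned environment file rather than hardcoding them. A minimal sketch of that file, with placeholder values:
# /etc/restic/env (chmod 600, owned by root; values are placeholders)
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export RESTIC_PASSWORD="your-very-long-repository-password"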
The backup script
Rather than running Restic directly from cron, wrap it in a script that handles environment variables, logging, and failure notification:
#!/usr/bin/env bash
# /usr/local/bin/backup-server.sh
set -euo pipefail
# Load secrets from a restricted environment file
# chmod 600 /etc/restic/env
source /etc/restic/env
REPO="s3:https://s3.hetzner.com/mybucket/server-01"
mkdir -p /var/log/restic
LOG="/var/log/restic/$(date +%Y-%m-%d).log"
HOSTNAME=$(hostname -s)
log() { echo "[$(date -Iseconds)] $*" | tee -a "$LOG"; }
# Alert on failure (hook into PagerDuty/Alertmanager/simple email).
# The trap must be set before the real work starts, or failures in
# earlier commands would exit silently (set -e).
trap 'log "BACKUP FAILED"; /usr/local/bin/send-alert.sh "Backup failed on $HOSTNAME"' ERR
log "=== Backup started on $HOSTNAME ==="
# Backup application data (excluding caches and temp files)
restic -r "$REPO" backup \
  --tag "app-data" \
  --exclude-caches \
  --exclude "*.log" \
  --exclude "*/tmp/*" \
  --exclude "*/cache/*" \
  /var/lib/app \
  /etc \
  /home \
  2>&1 | tee -a "$LOG"
# Backup PostgreSQL (dump first, then backup the dump)
DUMP="/tmp/pg_backup_$(date +%Y%m%d).sql.gz"
pg_dumpall -U postgres | gzip > "$DUMP"
restic -r "$REPO" backup --tag "postgres" "$DUMP"
rm -f "$DUMP"
# Apply retention policy:
#   --keep-daily 7     keep daily snapshots for 7 days
#   --keep-weekly 5    keep weekly snapshots for 5 weeks
#   --keep-monthly 12  keep monthly snapshots for 12 months
#   --keep-yearly 3    keep yearly snapshots for 3 years
#   --prune            actually remove unreferenced data
log "Applying retention policy..."
restic -r "$REPO" forget \
  --keep-daily 7 \
  --keep-weekly 5 \
  --keep-monthly 12 \
  --keep-yearly 3 \
  --prune \
  2>&1 | tee -a "$LOG"
log "=== Backup complete ==="
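The script calls /usr/local/bin/send-alert.sh, which isn't shown above. One minimal sketch, assuming a Slack-compatible incoming webhook (the URL is a placeholder; swap in whatever your alerting stack expects):
#!/usr/bin/env bash
# /usr/local/bin/send-alert.sh (sketch; adapt to your alerting stack)
set -euo pipefail
# Placeholder webhook URL; substitute your own
WEBHOOK_URL="https://hooks.slack.com/services/XXX/YYY/ZZZ"
MESSAGE="${1:-backup alert (no message given)}"
curl -fsS -X POST -H 'Content-Type: application/json' \
  -d "{\"text\": \"$MESSAGE\"}" "$WEBHOOK_URL"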
Retention policy reasoning
The retention policy above keeps:
- 7 daily snapshots: recover from an accidental deletion or corruption discovered within a week
- 5 weekly snapshots: catch issues discovered after daily retention expires
- 12 monthly snapshots: one year of monthly restore points for longer-running problems (ransomware, silent corruption)
- 3 yearly snapshots: long-term compliance or audit requirements
The --prune flag deletes data chunks no longer referenced by any kept snapshot. Without it, forget only removes snapshot metadata; the actual data stays and your storage costs keep growing. Run --prune regularly, but note it can take significant time on large repositories.
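Before trusting a new policy, it's worth previewing what forget would do. The --dry-run flag prints which snapshots would be kept or removed without deleting anything:
# Preview the policy: lists snapshots to keep/remove, changes nothing
restic -r "$REPO" forget --dry-run \
  --keep-daily 7 --keep-weekly 5 --keep-monthly 12 --keep-yearly 3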
Integrity verification
Restic's check command verifies repository integrity. Run it weekly:
# Verify metadata integrity (fast: seconds to minutes)
restic -r "$REPO" check
# Verify data integrity (reads actual data from storage; slow, but thorough)
# Run monthly, or before a critical restore
restic -r "$REPO" check --read-data
# Read a random 10% of data (good balance of speed vs confidence)
restic -r "$REPO" check --read-data-subset=10%
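A failed check should page someone, not just land in a log. A small wrapper (the script name is illustrative) can reuse the same alert hook as the backup script:
#!/usr/bin/env bash
# /usr/local/bin/check-repo.sh (illustrative wrapper; run weekly)
set -euo pipefail
source /etc/restic/env
REPO="s3:https://s3.hetzner.com/mybucket/server-01"
# Fire an alert if the check fails for any reason
trap '/usr/local/bin/send-alert.sh "restic check FAILED on $(hostname -s)"' ERR
restic -r "$REPO" check --read-data-subset=10%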
Automated restore testing: the part everyone skips
A backup you've never tested restoring is a backup you can't trust. This script runs weekly on a separate server and verifies that the most recent snapshots can be successfully restored and that the restored data passes sanity checks:
#!/usr/bin/env bash
# /usr/local/bin/test-restore.sh (run on a separate test server)
set -euo pipefail
source /etc/restic/env
REPO="s3:https://s3.hetzner.com/mybucket/server-01"
RESTORE_DIR="/tmp/restore-test-$(date +%Y%m%d)"
echo "Starting restore test..."
# Restore the latest snapshot of each tag into the same temp directory.
# Plain "latest" would pick only the newest snapshot (the postgres one,
# since it's taken last), so filter by tag to exercise both backup sets.
restic -r "$REPO" restore latest --tag app-data --target "$RESTORE_DIR"
restic -r "$REPO" restore latest --tag postgres --target "$RESTORE_DIR"
# Sanity checks (adapt these to your application)
if [[ ! -f "$RESTORE_DIR/var/lib/app/config.json" ]]; then
  echo "FAIL: config.json missing from restore"
  exit 1
fi
# Check that the PostgreSQL dump can be parsed
if ! gzip -t "$RESTORE_DIR"/tmp/pg_backup_*.sql.gz 2>/dev/null; then
  echo "FAIL: PostgreSQL dump is not valid gzip"
  exit 1
fi
# Check file count is within expected range
FILE_COUNT=$(find "$RESTORE_DIR/var/lib/app" -type f | wc -l)
if [[ $FILE_COUNT -lt 100 ]]; then
  echo "FAIL: Suspiciously few files in restore ($FILE_COUNT)"
  exit 1
fi
echo "Restore test PASSED. Files: $FILE_COUNT"
rm -rf "$RESTORE_DIR"
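gzip -t only proves the dump survived at the compression layer. For stronger confidence, you can load it into a throwaway PostgreSQL instance and run a sanity query. A sketch using Docker, where the container name, image tag, and password are assumptions (pg_dumpall output can emit harmless "role already exists" errors on a fresh cluster, so the result is verified with a query rather than by aborting on the first error):
# Deeper check: load the dump into a disposable PostgreSQL container
docker run -d --name restore-check -e POSTGRES_PASSWORD=scratch postgres:16
# Crude readiness wait; the official image restarts once during init,
# so give it a little extra time after the first ready signal
until docker exec restore-check pg_isready -U postgres >/dev/null 2>&1; do sleep 1; done
sleep 5
gunzip -c "$RESTORE_DIR"/tmp/pg_backup_*.sql.gz \
  | docker exec -i restore-check psql -U postgres >/dev/null
# Sanity query: at least one non-template database should have come back
docker exec restore-check psql -U postgres -tAc \
  "SELECT count(*) FROM pg_database WHERE NOT datistemplate;"
docker rm -f restore-check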
The 3-2-1 rule applied here: 3 copies of data (live server + local backup + remote object storage), on 2 different media types (local disk + cloud object storage), with 1 copy offsite (Backblaze B2 or Hetzner Object Storage in a different region from your servers). Restic handles the encryption and deduplication; your job is to make sure the destinations exist and the credentials are rotated and tested.
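The local leg doesn't require a second tool: restic copy (restic 0.14+) can mirror snapshots from the remote repository into a second repository on a local disk. A sketch, where the mount point and password file path are assumptions:
# One-time: initialize a second repository on a local disk
restic -r /mnt/backup-disk/restic init
# Regularly: mirror the latest snapshots from the remote repository.
# RESTIC_PASSWORD covers the destination; the source repository's
# password comes from --from-password-file.
restic -r /mnt/backup-disk/restic copy \
  --from-repo "s3:https://s3.hetzner.com/mybucket/server-01" \
  --from-password-file /etc/restic/remote-password \
  latest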
Scheduling with systemd timers (better than cron)
# /etc/systemd/system/restic-backup.service
[Unit]
Description=Restic backup
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup-server.sh
User=root
StandardOutput=journal
StandardError=journal
# /etc/systemd/system/restic-backup.timer
[Unit]
Description=Run Restic backup daily at 02:00
[Timer]
OnCalendar=*-*-* 02:00:00
# Spread load if many servers back up at once
RandomizedDelaySec=30m
# Run missed jobs after downtime
Persistent=true
[Install]
WantedBy=timers.target
# Enable:
systemctl daemon-reload
systemctl enable --now restic-backup.timer
Using systemd timers instead of cron gives you proper dependency ordering (the network must be up), journal logging for all output, and `systemctl status restic-backup.timer` to see the next run time and the last result at a glance.
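The same pattern covers the weekly restore test on the test server. A sketch with assumed unit names, mirroring the backup units:
# /etc/systemd/system/restore-test.service
[Unit]
Description=Weekly Restic restore test
After=network-online.target
Wants=network-online.target
[Service]
Type=oneshot
ExecStart=/usr/local/bin/test-restore.sh
# /etc/systemd/system/restore-test.timer
[Unit]
Description=Run restore test weekly
[Timer]
OnCalendar=Sun *-*-* 04:00:00
Persistent=true
[Install]
WantedBy=timers.target
# Enable:
systemctl daemon-reload
systemctl enable --now restore-test.timer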