Backup strategy for self-hosted infrastructure: Restic + object storage.


Most self-hosted infrastructure has backups that were set up once, never tested, and discovered to be broken at the worst possible moment. This post covers a backup strategy that's actually reliable: Restic for deduplicated, encrypted snapshots; S3-compatible object storage for offsite retention; retention policies that don't accumulate forever; and, critically, automated restore testing so you find out if something is broken before you need the backup.

Why Restic

Restic is a modern backup tool with three properties that distinguish it from older alternatives (rsync, tar, duplicity):

  • Client-side encryption by default. Every backup is encrypted before it leaves your machine using AES-256 in counter mode with Poly1305-AES authentication. The backup server never sees plaintext data.
  • Content-addressed deduplication. Data is chunked using CDC (content-defined chunking) and deduplicated across all snapshots. A 100GB VM image that changes 1GB per day stores roughly 100GB + (1GB × days), not 100GB × days.
  • Supports many backends. Local disk, SFTP, S3/MinIO, Backblaze B2, Azure Blob, and more, all with the same CLI interface.

Setting up a repository

# Install Restic (Ubuntu 24.04)
apt install restic

# Initialize a repository on Backblaze B2
export B2_ACCOUNT_ID="your-account-id"
export B2_ACCOUNT_KEY="your-application-key"
export RESTIC_PASSWORD="your-very-long-repository-password"

restic -r b2:your-bucket-name:backups/server-01 init

# Or on S3-compatible storage (Hetzner Object Storage, MinIO, Wasabi, etc.)
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"

restic -r s3:https://s3.hetzner.com/your-bucket/server-01 init

Store your repository password safely. If you lose it, the repository is unrecoverable: all the data is encrypted and there is no backdoor. Use a password manager (PassVault works well here) and store a copy in a separate secure location, not on the same server being backed up.
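The same goes for the backend credentials. The backup script in the next section sources its secrets from /etc/restic/env; a minimal sketch of that file (placeholder values, keep it root-owned and chmod 600) might look like:

```shell
# /etc/restic/env - sourced by the backup script; chmod 600, owned by root.
# All values below are placeholders.
export AWS_ACCESS_KEY_ID="your-key"
export AWS_SECRET_ACCESS_KEY="your-secret"
export RESTIC_PASSWORD="your-very-long-repository-password"
```

Keeping secrets in one restricted file means the script itself can live in version control without leaking anything.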

The backup script

Rather than running Restic directly from cron, wrap it in a script that handles environment variables, logging, and failure notification:

#!/usr/bin/env bash
# /usr/local/bin/backup-server.sh
set -euo pipefail

# Load secrets from a restricted environment file
# chmod 600 /etc/restic/env
source /etc/restic/env

REPO="s3:https://s3.hetzner.com/mybucket/server-01"
LOG="/var/log/restic/$(date +%Y-%m-%d).log"
HOSTNAME=$(hostname -s)
mkdir -p "$(dirname "$LOG")"

log() { echo "[$(date -Iseconds)] $*" | tee -a "$LOG"; }

# Alert on failure (hook into PagerDuty/Alertmanager/simple email).
# The ERR trap must be installed before the commands it protects,
# not at the end of the script where it would never fire.
trap 'log "BACKUP FAILED"; /usr/local/bin/send-alert.sh "Backup failed on $HOSTNAME"' ERR

log "=== Backup started on $HOSTNAME ==="

# Backup application data (excluding caches and temp files)
restic -r "$REPO" backup \
  --tag "app-data" \
  --exclude-caches \
  --exclude "*.log" \
  --exclude "*/tmp/*" \
  --exclude "*/cache/*" \
  /var/lib/app \
  /etc \
  /home \
  2>&1 | tee -a "$LOG"

# Backup PostgreSQL (dump first, then back up the dump)
DUMP="/tmp/pg_backup_$(date +%Y%m%d).sql.gz"
pg_dumpall -U postgres | gzip > "$DUMP"
restic -r "$REPO" backup --tag "postgres" "$DUMP"
rm -f "$DUMP"

# Apply retention policy: 7 daily, 5 weekly, 12 monthly, 3 yearly
# snapshots; --prune actually removes unreferenced data.
log "Applying retention policy..."
restic -r "$REPO" forget \
  --keep-daily 7 \
  --keep-weekly 5 \
  --keep-monthly 12 \
  --keep-yearly 3 \
  --prune \
  2>&1 | tee -a "$LOG"

log "=== Backup complete ==="
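One housekeeping detail: the script writes one log file per day to /var/log/restic, and those accumulate just like snapshots do. A small helper can prune them from the same job; the path and the 30-day retention below are my choices, not anything Restic requires:

```shell
#!/usr/bin/env bash
# prune_logs DIR DAYS - delete *.log files in DIR older than DAYS days.
prune_logs() {
  local dir="$1" days="$2"
  find "$dir" -name '*.log' -mtime +"$days" -delete
}

# e.g. at the end of backup-server.sh:
# prune_logs /var/log/restic 30
```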

Retention policy reasoning

The retention policy above keeps:

  • 7 daily snapshots: recover from an accidental deletion or corruption discovered within a week
  • 5 weekly snapshots: catch issues discovered after daily retention expires
  • 12 monthly snapshots: one year of monthly restore points for longer-running problems (ransomware, silent corruption)
  • 3 yearly snapshots: long-term compliance or audit requirements

The --prune flag deletes data chunks no longer referenced by any kept snapshot. Without it, forget only removes snapshot metadata; the actual data stays and your storage costs keep growing. You can preview what a policy would remove with restic forget --dry-run before committing to it. Run --prune regularly, but note it can take significant time on large repositories.

Integrity verification

Restic's check command verifies repository integrity. Run it weekly:

# Verify metadata integrity (fast: seconds to minutes)
restic -r "$REPO" check

# Verify data integrity (reads actual data from storage; slow, but thorough)
# Run monthly, or before a critical restore
restic -r "$REPO" check --read-data

# Read a random 10% of data (good balance of speed vs confidence)
restic -r "$REPO" check --read-data-subset=10%

Automated restore testing: the part everyone skips

A backup you've never tested restoring is a backup you can't trust. This script runs weekly on a separate server and verifies that the most recent snapshot can be successfully restored and that the restored data passes sanity checks:

#!/usr/bin/env bash
# /usr/local/bin/test-restore.sh - run on a separate test server

set -euo pipefail
source /etc/restic/env

REPO="s3:https://s3.hetzner.com/mybucket/server-01"
RESTORE_DIR="/tmp/restore-test-$(date +%Y%m%d)"

echo "Starting restore test..."

# Restore the latest snapshot of each tag to a temp directory.
# A bare "latest" would pick whichever snapshot is newest (likely the
# postgres one), so filter by tag to restore both data sets.
restic -r "$REPO" restore latest --tag app-data --target "$RESTORE_DIR"
restic -r "$REPO" restore latest --tag postgres --target "$RESTORE_DIR"

# Sanity checks: adapt these to your application
if [[ ! -f "$RESTORE_DIR/var/lib/app/config.json" ]]; then
  echo "FAIL: config.json missing from restore"
  exit 1
fi

# Check that the PostgreSQL dump can be parsed
if ! gzip -t "$RESTORE_DIR/tmp/pg_backup_"*.sql.gz 2>/dev/null; then
  echo "FAIL: PostgreSQL dump is not valid gzip"
  exit 1
fi

# Check file count is within expected range
FILE_COUNT=$(find "$RESTORE_DIR/var/lib/app" -type f | wc -l)
if [[ $FILE_COUNT -lt 100 ]]; then
  echo "FAIL: Suspiciously few files in restore ($FILE_COUNT)"
  exit 1
fi

echo "Restore test PASSED. Files: $FILE_COUNT"
rm -rf "$RESTORE_DIR"

The 3-2-1 rule applied here: 3 copies of data (live server + local backup + remote object storage), on 2 different media types (local disk + cloud object storage), with 1 copy offsite (Backblaze B2 or Hetzner Object Storage in a different region from your servers). Restic handles the encryption and deduplication โ€” your job is to make sure the destinations exist and the credentials are rotated and tested.
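Rotation is easy to forget, so make the reminder mechanical. A sketch of a staleness check (the 90-day threshold and the path are assumptions, not Restic features) that can run from the same scheduler and feed your alerting:

```shell
#!/usr/bin/env bash
# check_cred_age FILE DAYS - print ROTATE if FILE was last modified
# more than DAYS days ago, OK otherwise.
check_cred_age() {
  local file="$1" max_days="$2" age_days
  age_days=$(( ( $(date +%s) - $(stat -c %Y "$file") ) / 86400 ))
  if (( age_days > max_days )); then
    echo "ROTATE: $file is ${age_days} days old"
  else
    echo "OK: $file is ${age_days} days old"
  fi
}

# e.g.: check_cred_age /etc/restic/env 90
```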

Scheduling with systemd timers (better than cron)

# /etc/systemd/system/restic-backup.service
[Unit]
Description=Restic backup
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/backup-server.sh
User=root
StandardOutput=journal
StandardError=journal

# /etc/systemd/system/restic-backup.timer
# Note: systemd unit files don't allow comments after a value on the
# same line, so comments get their own lines.
[Unit]
Description=Run Restic backup daily at 02:00

[Timer]
OnCalendar=*-*-* 02:00:00
# Spread load if many servers back up at once
RandomizedDelaySec=30m
# Run missed jobs after downtime
Persistent=true

[Install]
WantedBy=timers.target

# Enable:
systemctl daemon-reload
systemctl enable --now restic-backup.timer

Using systemd timers instead of cron gives you proper dependency ordering (the network must be up), journal logging for all output, and `systemctl status restic-backup.timer` to see the next run time and the last result at a glance.
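The same pattern extends to the verification jobs. A sketch of units for the weekly restore test (the unit names and Sunday 04:00 schedule are my choices; test-restore.sh is the script from earlier):

```ini
# /etc/systemd/system/restic-restore-test.service
[Unit]
Description=Weekly Restic restore test
After=network-online.target
Wants=network-online.target

[Service]
Type=oneshot
ExecStart=/usr/local/bin/test-restore.sh

# /etc/systemd/system/restic-restore-test.timer
[Unit]
Description=Run the Restic restore test weekly

[Timer]
OnCalendar=Sun *-*-* 04:00:00
Persistent=true

[Install]
WantedBy=timers.target
```

An analogous pair can run restic check --read-data-subset=10% on a different day, so backup, integrity check, and restore test never contend for the repository at the same time.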


โ† Back to Blog Proxmox for Production โ†’