ZFS is the filesystem we use for all persistent storage in 47Network Studio hardware engagements. It's not the simplest option (ext4 or XFS would work fine for many use cases), but ZFS offers a combination of properties that matter for infrastructure you're expected to operate reliably for years: end-to-end checksums that catch silent data corruption, copy-on-write snapshots with no performance penalty, built-in compression that often makes storage faster, not just smaller, and a replication primitive (send/receive) that makes off-site backup straightforward. This post covers the concepts and the concrete commands for a production-grade setup.
Pool design: choosing your RAID level
A ZFS pool is the top-level storage construct: it contains one or more vdevs (virtual devices), each of which can be a single disk, a mirror, or a RAIDZ array. The pool's available capacity and fault tolerance are determined by the vdev configuration.
- Mirror (RAID-1): two or more disks, all containing identical data. Excellent read performance (reads from any disk), survives any single disk failure, high write performance. Costs 50% of raw capacity for a 2-way mirror. Best for NVMe SSDs where you want maximum IOPS.
- RAIDZ1 (RAID-5 equivalent): N+1 disks. One disk of parity, survives one failure. Acceptable for archival storage. Not recommended for modern large disks: an 18TB disk rebuild takes days, and rebuild reads can trigger another failure on a stressed array.
- RAIDZ2 (RAID-6 equivalent): N+2 disks. Two disks of parity, survives two simultaneous failures. The recommended choice for spinning disks holding important data. Minimum 4 disks.
- RAIDZ3: N+3 disks. Three parity disks. Justified only for very large arrays or extremely high-value data where extended rebuild time is a concern.
# Create a RAIDZ2 pool with 6 disks
zpool create -o ashift=12 datapool raidz2 \
/dev/disk/by-id/ata-WDC_WD180EDAZ_A1B2C3 \
/dev/disk/by-id/ata-WDC_WD180EDAZ_D4E5F6 \
/dev/disk/by-id/ata-WDC_WD180EDAZ_G7H8I9 \
/dev/disk/by-id/ata-WDC_WD180EDAZ_J1K2L3 \
/dev/disk/by-id/ata-WDC_WD180EDAZ_M4N5O6 \
/dev/disk/by-id/ata-WDC_WD180EDAZ_P7Q8R9
# Always use /dev/disk/by-id paths, not /dev/sdX: device letters can change after reboots
# ashift=12 means 4K sector alignment, correct for virtually all modern drives
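As a quick sanity check on the layout above: a RAIDZ2 vdev yields roughly (N - 2) disks' worth of usable space, minus a few percent for metadata and allocation padding. For six 18TB drives:

```shell
# Rough usable capacity of a RAIDZ2 vdev: (disks - 2 parity) * disk size.
# Actual capacity is a few percent lower (metadata, allocation padding).
disks=6
disk_tb=18
usable=$(( (disks - 2) * disk_tb ))
echo "${usable} TB usable (approximate)"   # prints: 72 TB usable (approximate)
```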
Dataset hierarchy: structure before you need it
ZFS datasets are lightweight, filesystem-like containers within a pool. Each dataset can have its own compression, quota, snapshot policy, and mountpoint. Design your dataset hierarchy to match your data access patterns and retention requirements:
# Create datasets for different workloads
zfs create datapool/vms # Proxmox VM disk images
zfs create datapool/containers # LXC containers
zfs create datapool/backups # restic backup repository
zfs create datapool/media # Large media files (different compression)
zfs create datapool/postgres # PostgreSQL data directory
# Per-dataset settings
zfs set compression=lz4 datapool/vms # Fast, good ratio for VM images
zfs set compression=zstd datapool/backups # Better ratio for backup data
zfs set compression=off datapool/media # Already compressed, skip
zfs set compression=lz4 datapool/postgres # DB pages compress well
# Quotas and reservations
zfs set quota=2T datapool/backups # Hard cap at 2TB
zfs set reservation=100G datapool/postgres # Always reserve 100GB
# Disable access time updates for better performance
zfs set atime=off datapool/vms
zfs set atime=off datapool/postgres
Snapshots: free-as-in-beer point-in-time backups
ZFS snapshots are copy-on-write: creating one is instant and costs no disk space until data in the dataset diverges from the snapshot. A snapshot of a 1TB dataset takes less than a second and initially uses zero additional space.
# Take a snapshot (naming convention: dataset@YYYY-MM-DD)
zfs snapshot datapool/postgres@2026-02-25
# List all snapshots
zfs list -t snapshot
# Roll back to a snapshot (DESTRUCTIVE: discards all changes since the snapshot;
# fails if newer snapshots exist unless you add -r, which destroys them too)
zfs rollback datapool/postgres@2026-02-25
# Clone a snapshot to a new dataset (useful for testing migrations)
zfs clone datapool/postgres@2026-02-25 datapool/postgres-test
# Automated snapshots with a cron job or systemd timer
# Using zfs-auto-snapshot or manual:
# /etc/cron.d/zfs-snapshots
0 * * * * root zfs snapshot datapool/postgres@$(date +\%Y-\%m-\%d-\%H\%M) # Hourly
0 2 * * * root zfs snapshot datapool/vms@$(date +\%Y-\%m-\%d) # Daily
# Note: % must be escaped as \% in crontab files, or cron treats it as a newline
# Prune old snapshots: keep the last 24 hourly, 30 daily
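Tools like zfs-auto-snapshot and sanoid implement retention policies for you; if you want to see the mechanics, here is a minimal sketch of the pruning step. The `prune_old` helper is illustrative, not a ZFS command, and `head -n -K` requires GNU coreutils:

```shell
#!/bin/sh
# Emit all but the newest $1 snapshots from a chronologically sorted
# (oldest first) list on stdin; date-stamped names sort correctly.
prune_old() {
  head -n "-$1"   # GNU head: print everything except the last $1 lines
}

# In production you would feed real snapshot names and destroy each one:
#   zfs list -H -t snapshot -o name -s creation datapool/postgres \
#     | prune_old 24 \
#     | xargs -r -n1 zfs destroy

# Demonstration with fake names (keeps the newest 2, prints the rest):
printf '%s\n' pg@h01 pg@h02 pg@h03 pg@h04 | prune_old 2
```

`xargs -r` skips the destroy entirely when there is nothing to prune, which keeps the cron job quiet on fresh datasets.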
Send/receive: replication and off-site backup
ZFS send/receive is the most powerful ZFS feature for backup and replication. It streams an exact ZFS data stream, including snapshot history, to another pool, either locally or over SSH. Incremental sends transfer only the changed blocks since the last snapshot:
# Initial full send to backup host
zfs send datapool/postgres@2026-02-25 | \
ssh backup-host zfs receive backuppool/postgres
# Incremental send (only changes since @2026-02-24)
zfs send -i datapool/postgres@2026-02-24 \
datapool/postgres@2026-02-25 | \
ssh backup-host zfs receive backuppool/postgres
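An incremental send needs a base snapshot that exists on both sides. A small sketch for picking the newest common snapshot; `latest_common` is an illustrative helper, and in production the two lists would come from `zfs list` locally and over SSH:

```shell
#!/bin/sh
# Newest snapshot suffix present in both lists (one name per line).
# Sort both lists, intersect with grep -Fx (exact whole-line match),
# and take the last line; ISO-dated names sort chronologically.
latest_common() {
  printf '%s\n' "$1" | sort > /tmp/snaps.$$
  printf '%s\n' "$2" | sort | grep -Fx -f /tmp/snaps.$$ | tail -n 1
  rm -f /tmp/snaps.$$
}

# In production the lists would be built like:
#   local_list=$(zfs list -H -t snapshot -o name datapool/postgres | sed 's/.*@//')
#   remote_list=$(ssh backup-host zfs list -H -t snapshot -o name backuppool/postgres | sed 's/.*@//')

latest_common "2026-02-23
2026-02-24
2026-02-25" "2026-02-23
2026-02-24"   # prints: 2026-02-24
```

The result is the `@name` to pass to `zfs send -i` as the incremental base.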
# With compression and verbose progress
# (pv shows transfer progress; zstd compresses the stream on the wire;
#  comments can't follow a trailing backslash, so they live up here)
zfs send -v datapool/postgres@2026-02-25 | \
  pv | \
  zstd | \
  ssh backup-host "zstd -d | zfs receive backuppool/postgres"
# Resume an interrupted transfer (large pools over slow links)
# zfs receive -s saves resumable state; the resume token lives on the receiving side
TOKEN=$(ssh backup-host zfs get -H -o value receive_resume_token backuppool/postgres)
zfs send -t "$TOKEN" | ssh backup-host zfs receive -s backuppool/postgres
Maintenance: scrubs and monitoring
# Schedule weekly scrubs (checks every block for corruption)
echo "0 2 * * 0 root /sbin/zpool scrub datapool" >> /etc/crontab
# Check pool health
zpool status datapool
# Monitor ZFS with Prometheus (zfs_exporter)
# Key metrics to alert on:
# - zpool_state != 0 (pool is degraded or faulted)
# - zfs_scrub_errors_total > 0 (checksum errors found)
# - zpool_free_bytes < 10% of total (pool getting full)
# ZFS performance degrades sharply above 80% full โ alert at 75%
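The fullness alert reduces to simple arithmetic on `zpool list -Hp -o size,alloc` output (`-p` prints raw byte counts). A minimal sketch; `check_capacity` is an illustrative helper, not part of ZFS:

```shell
#!/bin/sh
# Warn when allocated bytes exceed a percentage threshold of pool size.
check_capacity() {
  size="$1"; alloc="$2"; threshold="$3"
  pct=$(( alloc * 100 / size ))
  if [ "$pct" -ge "$threshold" ]; then
    echo "WARNING: pool is ${pct}% full (threshold ${threshold}%)"
  else
    echo "OK: pool is ${pct}% full"
  fi
}

# In production, feed it real numbers from the pool:
#   set -- $(zpool list -Hp -o size,alloc datapool)
#   check_capacity "$1" "$2" 75

# Demonstration with fake byte counts (80% full, 75% threshold):
check_capacity 1000 800 75   # prints: WARNING: pool is 80% full (threshold 75%)
```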
In 47Network Studio hardware engagements, ZFS runs on every NAS and storage server. The law firm engagement uses a 6-disk RAIDZ2 pool for all file server storage, with daily snapshots retained for 30 days and weekly send/receive replication to an off-site backup server. The media Proxmox cluster uses ZFS-backed shared storage for all VM disk images, with per-VM snapshots before any maintenance window.
ZFS and ECC RAM: ZFS's data integrity guarantees are only as strong as your RAM. If RAM flips a bit while data passes through it, ZFS will checksum that corrupted data as correct. ECC (error-correcting) RAM catches and corrects these in-flight bit flips. For any storage server where data integrity is critical (and if you're using ZFS, it is), ECC RAM is not optional.