HARDWARE INFRASTRUCTURE

Proxmox HA cluster replacing VMware ESXi for a Bucharest media company.

Confidential client · Bucharest, Romania · Q1 2026


3
NODES DEPLOYED
0
UNPLANNED DOWNTIME
100%
VM MIGRATION VERIFIED
€0
ONGOING LICENSING
6h
HARDWARE SLA (RETAINER)

27
VMs migrated from ESXi
ZFS
RAIDZ2 on NVMe per node
PBS
daily deduplicated backups

The situation.

A Bucharest-based media company running editorial, video editing, and web publishing workloads had been operating on a single VMware ESXi 6.7 host since 2019. The host was approaching end-of-support, and the company had received a renewal quote for vSphere that represented a significant ongoing licensing cost they weren't willing to absorb, particularly for infrastructure they owned outright.

The deeper problem was fragility. All 27 production VMs ran on a single host with no HA capability. The last unplanned outage, caused by a RAID controller failure, had taken the editorial team offline for four hours on a Tuesday afternoon during a breaking news cycle. There was no tested restore procedure: backups existed but had never been fully verified.

The constraint was clear: no single service could be offline for more than 30 minutes during the migration. The company runs 24-hour publishing operations; maintenance windows were possible on weekend nights, but every outage had to be brief and pre-agreed.

What we built.

We proposed a three-node Proxmox VE cluster on new commodity hardware: three identical Dell PowerEdge R650 servers with NVMe drives, replacing the existing ESXi host entirely. Three nodes because Proxmox HA requires quorum, and a two-node cluster with a QDevice is less robust than three full nodes when one of them needs maintenance.
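
For illustration, the cluster formation reduces to a handful of commands; the cluster name, and the 10.10.10.x heartbeat addresses below, are placeholders rather than the client's real naming and addressing:

    # On the first node: create the cluster
    pvecm create media-cluster

    # On nodes 2 and 3: join the cluster, pinning Corosync link0
    # to each node's address on the dedicated heartbeat network
    pvecm add 10.10.10.1 --link0 10.10.10.2   # run on node 2
    pvecm add 10.10.10.1 --link0 10.10.10.3   # run on node 3

    # Verify quorum: expect "Quorate: Yes" with 3 expected votes
    pvecm status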

Storage architecture

Each node has a ZFS RAIDZ2 pool across four 2TB NVMe drives: 4TB usable per node, with two drive failures tolerable before data loss. ZFS gives checksumming (silent corruption protection), copy-on-write snapshots, and LZ4 compression that reduces effective storage consumption by 15–25% for their mixed workloads.
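
For illustration, the per-node pool creation looks roughly like this; the pool name and the /dev/nvme* device paths are placeholders:

    # RAIDZ2 across the four NVMe drives; ashift=12 aligns to 4K sectors,
    # and LZ4 compression is cheap enough to leave on for everything
    zpool create -o ashift=12 \
      -O compression=lz4 -O atime=off \
      tank raidz2 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

    # Sanity checks: pool health and the achieved compression ratio
    zpool status tank
    zfs get compressratio tank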

Proxmox Backup Server runs on a dedicated fifth machine, a smaller server with a ZFS RAIDZ2 pool across six 8TB drives. PBS provides deduplicated daily backups of all 27 VMs. Deduplication means a 500GB VM with a 20GB daily change produces roughly 20GB of new backup data, not 500GB. The PBS node sits on a separate 10GbE network segment from the cluster's management and VM traffic.
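
For illustration, wiring the cluster to the PBS datastore and running the nightly job looks roughly like this; the storage ID, server address, datastore name, and credentials are all placeholders:

    # Register the PBS datastore as backup storage on the PVE cluster
    pvesm add pbs pbs-backup \
      --server 10.20.20.5 \
      --datastore media-vms \
      --username backup@pbs \
      --fingerprint <PBS-certificate-fingerprint> \
      --password <secret>

    # Deduplicated backup of all VMs in snapshot mode (no guest downtime);
    # in practice this runs nightly from the PVE backup scheduler, not by hand
    vzdump --all 1 --storage pbs-backup --mode snapshot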

High availability and fencing

Each node has IPMI configured on a dedicated management VLAN, completely separated from the main network. Corosync heartbeat runs on a dedicated 1GbE link between nodes, separate from management and VM traffic. This means a VM network failure doesn't affect cluster health, and a management network failure doesn't affect VM traffic.
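
For reference, the resulting /etc/pve/corosync.conf takes roughly this shape; the node names and 10.10.10.x addresses are placeholders:

    nodelist {
      node {
        # ring0_addr sits on the dedicated 1GbE heartbeat link
        name: pve1
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 10.10.10.1
      }
      # nodes 2 and 3 follow the same pattern on 10.10.10.2 / 10.10.10.3
    }
    totem {
      cluster_name: media-cluster
      version: 2
      ip_version: ipv4
      interface {
        linknumber: 0
      }
    }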

STONITH fencing is configured via IPMI: if a node loses cluster heartbeat, the other two nodes vote to fence it (force power cycle via IPMI) before restarting its VMs. We tested this explicitly in staging by unplugging the Corosync link on node 3 and verifying the fence fired and VMs restarted within 90 seconds on the remaining nodes.
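
The fencing action itself reduces to an out-of-band power command against the node's BMC. For illustration (the BMC address and credentials are placeholders):

    # Confirm the BMC answers on the management VLAN
    ipmitool -I lanplus -H 10.30.30.13 -U fenceuser -P <secret> chassis power status

    # What the fence ultimately does to a node that lost heartbeat
    ipmitool -I lanplus -H 10.30.30.13 -U fenceuser -P <secret> chassis power cycle

    # During the staged test: watch the HA manager recover the fenced node's VMs
    ha-manager status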

Migration approach

All 27 VMs were migrated using a staged approach over three weekend maintenance windows. For each VM: we exported a full backup from the ESXi host, restored it to the Proxmox cluster, did a trial boot and smoke test, then scheduled a brief maintenance window (5–15 minutes per service) to sync final changes and cut over, keeping divergence from the last backup minimal.
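
For illustration, the per-VM restore step on the Proxmox side looks roughly like this; the VM ID, export path, and storage name are placeholders:

    # Import the OVF exported from the ESXi host as a new VM on local ZFS
    qm importovf 101 /mnt/esxi-export/editorial-cms/editorial-cms.ovf local-zfs

    # Alternatively, attach a copied VMDK disk to an already-prepared VM
    qm importdisk 101 /mnt/esxi-export/editorial-cms/disk0.vmdk local-zfs

    # Trial boot and smoke test before the real cutover window is scheduled
    qm start 101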

The longest any single service was offline was 18 minutes, for a video transcoding VM that required careful storage reconfiguration. Total aggregate downtime across all migrations was 4 hours 22 minutes, spread across three weekend nights, all within the pre-agreed maintenance windows.


"Our team had been dreading this migration for two years. The 47Network Studio ran it methodically โ€” every step tested, every rollback path planned. We're now on infrastructure we understand and own completely."

โ€” Head of Technology, Confidential Media Client
€0
ongoing hypervisor licensing cost
4h22m
total migration downtime across 27 VMs
90s
VM restart time after node failure (tested)
100%
backup restoration tested before handover

The client is now on a 6-month hardware maintenance retainer: quarterly check-ins, firmware update scheduling, storage health monitoring, and a 6-hour response SLA for hardware incidents.
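
The storage-health part of that retainer is largely scriptable. A minimal sketch of the kind of sweep involved (the device list is illustrative):

    #!/bin/sh
    # Storage health sweep: ZFS pool state plus per-drive NVMe SMART status.
    # zpool status -x prints only pools with problems, so any other output is a flag.
    zpool status -x | grep -vq 'all pools are healthy' && echo "WARN: check zpool status"

    # SMART overall-health check for each NVMe drive in the pool
    for dev in /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1; do
      smartctl -H "$dev" | grep -Eq 'PASSED|OK' || echo "WARN: $dev SMART health failed"
    done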

01

Discovery & Planning – Week 1

Full inventory of existing ESXi environment (27 VMs, network topology, storage layout). Hardware specification and procurement. Risk matrix identifying the four VMs with the tightest downtime constraints.

02

Hardware Build & Proxmox Install – Week 2

Rack installation, ZFS pool configuration, Proxmox cluster formation, Corosync and IPMI configuration. PBS node installation and backup job configuration. Full fencing test in isolation.

03

Migration Waves 1 & 2 – Weeks 3–4

First two weekend maintenance windows. 19 of 27 VMs migrated: non-critical services first, then the web publishing stack. All restorations verified with application-level smoke tests.

04

Migration Wave 3 & Final Cutover – Week 5

Remaining 8 VMs, including video transcoding and the editorial CMS. Final ESXi host decommissioned; its storage hardware repurposed as an extension of the PBS backup target.

05

Handover & Training – Week 6

Half-day on-site training for the IT team. Runbook delivery (18 pages: day-to-day operations, common failure modes, backup/restore procedures). Retainer agreement signed.

Proxmox VE 8 · ZFS RAIDZ2 · Proxmox Backup Server · Corosync HA · IPMI Fencing · Dell PowerEdge R650 · 10GbE · VLAN isolation · LZ4 compression