The situation.
A Bucharest-based media company running editorial, video editing, and web publishing workloads had been operating on a single VMware ESXi 6.7 host since 2019. The host was approaching end-of-support and the company had received a renewal quote for vSphere that represented a significant ongoing licensing cost they weren't willing to absorb, particularly for infrastructure they owned outright.
The deeper problem was fragility. All 27 production VMs ran on a single host with no HA capability. The last unplanned outage, caused by a RAID controller failure, had taken the editorial team offline for four hours on a Tuesday afternoon during a breaking news cycle. There was no tested restore procedure: backups existed but had never been fully verified.
The constraint was clear: the migration couldn't take any single service offline for more than 30 minutes. The company runs 24-hour publishing operations; a maintenance window was possible on weekend nights, but total downtime had to be minimal and pre-agreed.
What we built.
We proposed a three-node Proxmox VE cluster on new commodity hardware (three identical Dell PowerEdge R650 servers with NVMe drives), replacing the existing ESXi host entirely. Three nodes because Proxmox HA requires quorum, and a two-node cluster with a QDevice is less robust than three full nodes when one of them needs maintenance.
Storage architecture
Each node has a ZFS RAIDZ2 pool across four 2TB NVMe drives: 4TB usable per node, with two drive failures tolerable before data loss. ZFS gives checksumming (silent corruption protection), copy-on-write snapshots, and LZ4 compression that reduces effective storage consumption by 15–25% for their mixed workloads.
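Per node, the pool setup looks roughly like the following sketch; the pool name `tank`, the device paths, and the storage ID are placeholders, not the deployed names:

```shell
# RAIDZ2 across four 2TB NVMe drives: two-disk parity, ~4TB usable.
# Device names are illustrative; check the node's actual NVMe namespaces first.
zpool create -o ashift=12 tank raidz2 \
    /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

# LZ4 compression is cheap on NVMe and where the 15-25% space savings come from.
zfs set compression=lz4 tank

# Register the pool as a Proxmox storage target (storage ID is illustrative).
pvesm add zfspool local-zfs-tank --pool tank --content images,rootdir
```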
Proxmox Backup Server runs on a dedicated fifth machine, a smaller server with a software RAIDZ2 across six 8TB drives. PBS provides deduplicated daily backups of all 27 VMs. Deduplication means a 500GB VM with a 20GB daily change produces roughly 20GB of new backup data, not 500GB. The PBS node sits on a separate 10GbE network segment from the cluster's management and VM traffic.
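The storage impact is easy to sanity-check with back-of-envelope arithmetic using the figures above; the 30-day retention window here is an assumption for illustration, not the client's actual policy:

```shell
# Figures from the text: 500GB VM, ~20GB daily change. Retention window assumed.
vm_size_gb=500
daily_change_gb=20
retention_days=30

# Naive approach: a complete full copy every day.
full_gb=$((vm_size_gb * retention_days))

# Deduplicated: one full baseline plus only the changed chunks each day.
dedup_gb=$((vm_size_gb + daily_change_gb * (retention_days - 1)))

echo "$full_gb"    # 15000
echo "$dedup_gb"   # 1080
```

Roughly 1TB of backup storage for a month of restore points instead of 15TB, which is why six 8TB drives comfortably cover all 27 VMs.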
High availability and fencing
Each node has IPMI configured on a dedicated management VLAN, completely separated from the main network. Corosync heartbeat runs on a dedicated 1GbE link between nodes, separate from management and VM traffic. This means a VM network failure doesn't affect cluster health, and a management network failure doesn't affect VM traffic.
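Forming the cluster with Corosync pinned to the dedicated link is a short procedure; the cluster name and link addresses below are illustrative:

```shell
# On the first node: create the cluster, binding Corosync to the dedicated 1GbE link.
pvecm create media-cluster --link0 10.10.10.1

# On each additional node: join via the first node, again pinning the Corosync link
# to that node's own address on the heartbeat network.
pvecm add 10.10.10.1 --link0 10.10.10.2

# Verify quorum and ring status once all three nodes have joined.
pvecm status
```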
STONITH fencing is configured via IPMI: if a node loses cluster heartbeat, the other two nodes vote to fence it (force power cycle via IPMI) before restarting its VMs. We tested this explicitly in staging by unplugging the Corosync link on node 3 and verifying the fence fired and VMs restarted within 90 seconds on the remaining nodes.
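For a VM to be restarted on a surviving node after a fence, it has to be registered with the HA manager; a minimal sketch, with illustrative VMIDs:

```shell
# Register critical VMs as HA resources so the cluster restarts them after a fence.
ha-manager add vm:101 --state started
ha-manager add vm:102 --state started --max_relocate 1

# During the staged fencing test: watch resource state and quorum from a surviving node.
ha-manager status
pvecm status
```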
Migration approach
All 27 VMs were migrated using a staged approach over three weekend maintenance windows. For each VM: we exported a full backup from the ESXi host, restored it to the Proxmox cluster, did a trial boot and smoke test, then scheduled a brief maintenance window (5–15 minutes per service) to cut over the final version with minimal divergence from the last backup.
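One common way to run that export/restore step is OVF export plus Proxmox's OVF import; the sketch below shows the shape of it, with hostnames, paths, the VMID, and the storage name all as placeholders rather than the names actually used:

```shell
# 1. Export the VM from ESXi as OVF (ovftool run from an admin workstation).
ovftool vi://root@esxi-host/editorial-cms ./export/

# 2. Import on a Proxmox node: qm importovf creates the VM and converts its disks
#    onto the target storage.
qm importovf 120 ./export/editorial-cms/editorial-cms.ovf local-zfs-tank

# 3. Trial boot and smoke test. At cutover: stop the source VM, resync any
#    data that changed since the export, then start the Proxmox copy for good.
qm start 120
```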
The longest any single service was offline was 18 minutes, for a video transcoding VM that required careful storage reconfiguration. Total aggregate downtime across all migrations was 4 hours 22 minutes, spread across three weekend nights, all within the pre-agreed maintenance windows.
"Our team had been dreading this migration for two years. The 47Network Studio ran it methodically: every step tested, every rollback path planned. We're now on infrastructure we understand and own completely."

-- Head of Technology, Confidential Media Client

The client is now on a 6-month hardware maintenance retainer: quarterly check-ins, firmware update scheduling, storage health monitoring, and a 6-hour response SLA for hardware incidents.

Delivery timeline
Discovery & Planning – Week 1
Full inventory of existing ESXi environment (27 VMs, network topology, storage layout). Hardware specification and procurement. Risk matrix identifying the four VMs with the tightest downtime constraints.
Hardware Build & Proxmox Install – Week 2
Rack installation, ZFS pool configuration, Proxmox cluster formation, Corosync and IPMI configuration. PBS node installation and backup job configuration. Full fencing test in isolation.
Migration Waves 1 & 2 – Weeks 3–4
First two weekend maintenance windows. 19 of 27 VMs migrated: non-critical services first, then web publishing stack. All restorations verified with application-level smoke tests.
Migration Wave 3 & Final Cutover – Week 5
Remaining 8 VMs including video transcoding and editorial CMS. Final ESXi host decommissioned. Storage hardware repurposed as PBS backup target extension.
Handover & Training – Week 6
Half-day on-site training for the IT team. Runbook delivery (18 pages: day-to-day operations, common failure modes, backup/restore procedures). Retainer agreement signed.