HARDWARE INFRASTRUCTURE

Proxmox HA cluster replacing VMware ESXi for a Bucharest media company.

Confidential client · Bucharest, Romania · Q1 2026


3
NODES DEPLOYED
0
UNPLANNED DOWNTIME
100%
VM MIGRATION VERIFIED
€0
ONGOING LICENSING
6h
HARDWARE SLA (RETAINER)

27
VMs migrated from ESXi
ZFS
RAIDZ2 on NVMe per node
PBS
daily deduplicated backups

The situation.

A Bucharest-based media company running editorial, video editing, and web publishing workloads had been operating on a single VMware ESXi 6.7 host since 2019. The host was approaching end-of-support, and the company had received a renewal quote for vSphere that represented a significant ongoing licensing cost they weren't willing to absorb, particularly for infrastructure they owned outright.

The deeper problem was fragility. All 27 production VMs ran on a single host with no HA capability. The last unplanned outage, caused by a RAID controller failure, had taken the editorial team offline for four hours on a Tuesday afternoon during a breaking news cycle. There was no tested restore procedure: backups existed but had never been fully verified.

The constraint was clear: no single service could be offline for more than 30 minutes during the migration. The company runs 24-hour publishing operations; maintenance windows were possible on weekend nights, but every outage had to be brief and pre-agreed.

What we built.

We proposed a three-node Proxmox VE cluster on new commodity hardware: three identical Dell PowerEdge R650 servers with NVMe drives, replacing the existing ESXi host entirely. Three nodes because Proxmox HA requires quorum, and a two-node cluster with a QDevice is less robust than three full nodes when one of them needs maintenance.
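
For illustration, the cluster formation reduces to a handful of commands; the cluster name, and the 10.10.10.x heartbeat addresses below, are placeholders rather than the client's real naming and addressing:

    # On the first node: create the cluster
    pvecm create media-cluster

    # On nodes 2 and 3: join the cluster, pinning Corosync link0
    # to each node's address on the dedicated heartbeat network
    pvecm add 10.10.10.1 --link0 10.10.10.2   # run on node 2
    pvecm add 10.10.10.1 --link0 10.10.10.3   # run on node 3

    # Verify quorum: expect "Quorate: Yes" with 3 expected votes
    pvecm status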

Storage architecture

Each node has a ZFS RAIDZ2 pool across four 2TB NVMe drives: 4TB usable per node, with two drive failures tolerable before data loss. ZFS gives checksumming (silent corruption protection), copy-on-write snapshots, and LZ4 compression that reduces effective storage consumption by 15–25% for their mixed workloads.
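
For illustration, the per-node pool creation looks roughly like this; the pool name and the /dev/nvme* device paths are placeholders:

    # RAIDZ2 across the four NVMe drives; ashift=12 aligns to 4K sectors,
    # and LZ4 compression is cheap enough to leave on for everything
    zpool create -o ashift=12 \
      -O compression=lz4 -O atime=off \
      tank raidz2 /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1

    # Sanity checks: pool health and the achieved compression ratio
    zpool status tank
    zfs get compressratio tank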

Proxmox Backup Server runs on a dedicated fifth machine, a smaller server with a ZFS RAIDZ2 pool across six 8TB drives. PBS provides deduplicated daily backups of all 27 VMs. Deduplication means a 500GB VM with a 20GB daily change produces roughly 20GB of new backup data, not 500GB. The PBS node sits on a separate 10GbE network segment from the cluster's management and VM traffic.
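
For illustration, wiring the cluster to the PBS datastore and running the nightly job looks roughly like this; the storage ID, server address, datastore name, and credentials are all placeholders:

    # Register the PBS datastore as backup storage on the PVE cluster
    pvesm add pbs pbs-backup \
      --server 10.20.20.5 \
      --datastore media-vms \
      --username backup@pbs \
      --fingerprint <PBS-certificate-fingerprint> \
      --password <secret>

    # Deduplicated backup of all VMs in snapshot mode (no guest downtime);
    # in practice this runs nightly from the PVE backup scheduler, not by hand
    vzdump --all 1 --storage pbs-backup --mode snapshot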

High availability and fencing

Each node has IPMI configured on a dedicated management VLAN, completely separated from the main network. Corosync heartbeat runs on a dedicated 1GbE link between nodes, separate from management and VM traffic. This means a VM network failure doesn't affect cluster health, and a management network failure doesn't affect VM traffic.
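
For reference, the resulting /etc/pve/corosync.conf takes roughly this shape; the node names and 10.10.10.x addresses are placeholders:

    nodelist {
      node {
        # ring0_addr sits on the dedicated 1GbE heartbeat link
        name: pve1
        nodeid: 1
        quorum_votes: 1
        ring0_addr: 10.10.10.1
      }
      # nodes 2 and 3 follow the same pattern on 10.10.10.2 / 10.10.10.3
    }
    totem {
      cluster_name: media-cluster
      version: 2
      ip_version: ipv4
      interface {
        linknumber: 0
      }
    }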

STONITH fencing is configured via IPMI: if a node loses cluster heartbeat, the other two nodes vote to fence it (force power cycle via IPMI) before restarting its VMs. We tested this explicitly in staging by unplugging the Corosync link on node 3 and verifying the fence fired and VMs restarted within 90 seconds on the remaining nodes.
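
The fencing action itself reduces to an out-of-band power command against the node's BMC. For illustration (the BMC address and credentials are placeholders):

    # Confirm the BMC answers on the management VLAN
    ipmitool -I lanplus -H 10.30.30.13 -U fenceuser -P <secret> chassis power status

    # What the fence ultimately does to a node that lost heartbeat
    ipmitool -I lanplus -H 10.30.30.13 -U fenceuser -P <secret> chassis power cycle

    # During the staged test: watch the HA manager recover the fenced node's VMs
    ha-manager status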

Migration approach

All 27 VMs were migrated using a staged approach over three weekend maintenance windows. For each VM: we exported a full backup from the ESXi host, restored it to the Proxmox cluster, did a trial boot and smoke test, then scheduled a brief maintenance window (5–15 minutes per service) to sync final changes and cut over, keeping divergence from the last backup minimal.
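
For illustration, the per-VM restore step on the Proxmox side looks roughly like this; the VM ID, export path, and storage name are placeholders:

    # Import the OVF exported from the ESXi host as a new VM on local ZFS
    qm importovf 101 /mnt/esxi-export/editorial-cms/editorial-cms.ovf local-zfs

    # Alternatively, attach a copied VMDK disk to an already-prepared VM
    qm importdisk 101 /mnt/esxi-export/editorial-cms/disk0.vmdk local-zfs

    # Trial boot and smoke test before the real cutover window is scheduled
    qm start 101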

The longest any single service was offline was 18 minutes, for a video transcoding VM that required careful storage reconfiguration. Total aggregate downtime across all migrations was 4 hours 22 minutes, spread across three weekend nights, all within the pre-agreed maintenance windows.


"Our team had been dreading this migration for two years. The 47Network Studio ran it methodically โ€” every step tested, every rollback path planned. We're now on infrastructure we understand and own completely."

โ€” Head of Technology, Confidential Media Client
€0
ongoing hypervisor licensing cost
4h22m
total migration downtime across 27 VMs
90s
VM restart time after node failure (tested)
100%
backup restoration tested before handover

The client is now on a 6-month hardware maintenance retainer: quarterly check-ins, firmware update scheduling, storage health monitoring, and a 6-hour response SLA for hardware incidents.
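
The storage-health part of that retainer is largely scriptable. A minimal sketch of the kind of sweep involved (the device list is illustrative):

    #!/bin/sh
    # Storage health sweep: ZFS pool state plus per-drive NVMe SMART status.
    # zpool status -x prints only pools with problems, so any other output is a flag.
    zpool status -x | grep -vq 'all pools are healthy' && echo "WARN: check zpool status"

    # SMART overall-health check for each NVMe drive in the pool
    for dev in /dev/nvme0n1 /dev/nvme1n1 /dev/nvme2n1 /dev/nvme3n1; do
      smartctl -H "$dev" | grep -Eq 'PASSED|OK' || echo "WARN: $dev SMART health failed"
    done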

01

Discovery & Planning – Week 1

Full inventory of existing ESXi environment (27 VMs, network topology, storage layout). Hardware specification and procurement. Risk matrix identifying the four VMs with the tightest downtime constraints.

02

Hardware Build & Proxmox Install – Week 2

Rack installation, ZFS pool configuration, Proxmox cluster formation, Corosync and IPMI configuration. PBS node installation and backup job configuration. Full fencing test in isolation.

03

Migration Waves 1 & 2 – Weeks 3–4

First two weekend maintenance windows. 19 of 27 VMs migrated: non-critical services first, then the web publishing stack. All restorations verified with application-level smoke tests.

04

Migration Wave 3 & Final Cutover – Week 5

Remaining 8 VMs, including video transcoding and the editorial CMS. Final ESXi host decommissioned; its storage hardware repurposed as an extension of the PBS backup target.

05

Handover & Training – Week 6

Half-day on-site training for the IT team. Runbook delivery (18 pages: day-to-day operations, common failure modes, backup/restore procedures). Retainer agreement signed.

Proxmox VE 8 · ZFS RAIDZ2 · Proxmox Backup Server · Corosync HA · IPMI Fencing · Dell PowerEdge R650 · 10GbE · VLAN isolation · LZ4 compression