Most Proxmox clusters run for months without touching Proxmox HA configuration. The problem is that “never needed it” and “correctly configured” are not the same thing. When a node actually fails, the difference shows up in the next 90 seconds.
Proxmox High Availability is not a single feature. It’s three separate systems working in sequence: Corosync tracks which nodes are alive, the watchdog ensures a failed node actually stops, and the HA manager restarts protected VMs on surviving nodes. Each layer has to work correctly for automatic recovery to happen. If any one fails silently, the cluster looks healthy until a node goes down.
TL;DR
- HA requires shared storage — VMs on local storage cannot be protected
- Minimum 3 nodes for reliable quorum; 2-node clusters need a quorum device
- Fencing happens before failover, not after — by design
- Softdog watchdog is default; hardware fencing (IPMI) is more reliable
- Failover is stop-and-restart, not live migration — expect 2–4 minutes
- Protected VMs must be explicitly added to HA; it is not automatic
What Proxmox HA actually does
Proxmox HA means: if a cluster node becomes unreachable, the VMs that were running on it restart on surviving nodes automatically, without operator intervention. The full specification is in the Proxmox VE High Availability documentation.
The scope is deliberately narrow. HA protects against node failure — hardware crash, kernel panic, power loss, unresponsive host. It does not protect against guest OS problems, storage corruption, or application-level failures inside the VM.
Proxmox HA manages this through two daemons running on every cluster node:
- pve-ha-crm (Cluster Resource Manager) — runs on one node at a time as the active manager. Tracks the state of all HA resources, decides where to restart them, coordinates fencing. If the CRM node itself fails, another node automatically takes over the role — there is no single point of failure in the manager layer.
- pve-ha-lrm (Local Resource Manager) — runs on every node. Executes start/stop commands for resources on the local node. Also responsible for feeding the watchdog timer.
The split is intentional. LRM acts locally and independently of network connectivity. CRM coordinates cluster-wide decisions. This separation is what makes self-fencing possible even when the network to the CRM is gone.
To identify which node currently holds the active CRM role:
ha-manager status
The output labels the active manager node. All other nodes run CRM in standby.
The recovery sequence from node failure to VM restart looks like this:
Node Failure
↓
Corosync Detects Loss (heartbeat timeout)
↓
Quorum Check (majority still online?)
↓
Fencing (watchdog expires / IPMI power-off)
↓
HA Resource Relocation (CRM picks surviving node)
↓
VM Restart on Surviving Node
Each step is a hard dependency. Quorum lost: everything stops. Fencing incomplete: no failover. Shared storage missing: the VM cannot start on the new node.
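The hard dependencies in that sequence can be sketched as a chain of gates. This is a toy model for reasoning about outcomes, not how the HA manager is implemented; the function name `recovery_outcome` and its yes/no arguments are hypothetical.

```shell
#!/bin/sh
# recovery_outcome QUORATE FENCED SHARED   (each argument is "yes" or "no")
# Toy model of the recovery sequence: each gate must pass before the next.
recovery_outcome() {
  quorate=$1; fenced=$2; shared=$3
  if [ "$quorate" != yes ]; then
    echo "blocked: no quorum, cluster makes no decisions"
  elif [ "$fenced" != yes ]; then
    echo "blocked: fencing not confirmed, no failover"
  elif [ "$shared" != yes ]; then
    echo "failed: VM disk not reachable from target node"
  else
    echo "recovered: VM restarted on surviving node"
  fi
}

recovery_outcome yes yes no   # HA resource added, but disk on local storage
```

The order of the checks mirrors the diagram: a later gate is never evaluated if an earlier one fails.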
Quorum: why the majority vote matters
Quorum is the mechanism that prevents split-brain. To understand why it exists, consider what happens without it:
Node A ------ Shared Storage ------ Node B
Network link fails:
Node A thinks: "Node B is dead, I should start B's VMs"
Node B thinks: "Node A is dead, I should start A's VMs"
Both nodes start writing to the same VM disks simultaneously.
Result: disk corruption.
Quorum prevents this by requiring a majority vote before any node can act. In Proxmox HA, Corosync implements quorum: a node can only participate in cluster decisions if the cluster has quorum, meaning more than half of the configured nodes are visible to each other.
In a 3-node cluster: lose one node, quorum is 2/3 — cluster continues. Lose two nodes, quorum is 1/3 — cluster stops accepting decisions entirely. When quorum is lost, Proxmox does not try to continue. Services stop. This is the correct behavior — acting without quorum risks data corruption.
Corosync network stability
Corosync heartbeats are sensitive to network latency and packet loss. A flapping network link can cause false quorum loss — the cluster thinks a node is dead when it isn’t, triggering unnecessary fencing and failover. In real deployments, the most common cause of unexpected HA events is an overloaded or unreliable cluster network. Use a dedicated network interface for Corosync traffic whenever possible, separate from storage and VM traffic. Avoid Wi-Fi. For larger clusters, redundant Corosync links (two separate paths) reduce the risk significantly.
Why 3 nodes is the minimum
With 2 nodes, a single failure drops the surviving node to 1/2 — exactly on the quorum boundary. Proxmox requires strict majority, so 1 surviving node out of 2 has no quorum and cannot act. A 2-node cluster provides zero automatic failover by default.
A quorum device (QDevice) solves this by adding a tiebreaker vote from a lightweight external service — a small VM elsewhere, a Raspberry Pi, any system that can run the QDevice daemon. With QDevice, the surviving node has 2/3 votes and can proceed with failover. Without it, the surviving node is stuck.
QDevice is a workaround, not an equivalent to a third node. The third node contributes compute capacity for VM failover. QDevice only contributes a vote — if one node is down in a 2-node+QDevice setup, the surviving node must absorb all VMs on its own. Plan capacity accordingly.
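The vote arithmetic behind these cases is small enough to sketch. The example assumes one vote per node and one vote for the QDevice; the function names `votes_needed` and `has_quorum` are made up for illustration.

```shell
#!/bin/sh
# votes_needed TOTAL: smallest strict majority of TOTAL votes.
votes_needed() {
  echo $(( $1 / 2 + 1 ))
}

# has_quorum TOTAL ALIVE: prints "yes" if ALIVE votes form a strict majority.
has_quorum() {
  if [ "$2" -ge "$(votes_needed "$1")" ]; then
    echo yes
  else
    echo no
  fi
}

has_quorum 3 2   # 3 nodes, one down: prints yes
has_quorum 2 1   # 2 nodes, one down, no QDevice: prints no
has_quorum 3 2   # 2 nodes + QDevice vote, one node down: prints yes
```

The second call shows why a plain 2-node cluster cannot fail over: one vote out of two is exactly half, not a majority.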
Check quorum state:
pvecm status
The output shows Quorate: Yes or Quorate: No, current vote count, and expected votes. If the cluster has lost quorum during planned maintenance, the expected votes parameter can be adjusted temporarily, but this is a maintenance procedure, not normal operations. The recovery procedure for quorum loss is in the cluster quorum guide.
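For scripted monitoring, the same check can be reduced to an exit code. The sketch below reads `pvecm status` text on standard input; it assumes the output contains lines of the form `Quorate: Yes`, `Expected votes: N` and `Total votes: N`, and the exact layout may differ between versions. The helper names `is_quorate` and `vote_summary` are hypothetical.

```shell
#!/bin/sh
# is_quorate: reads `pvecm status` text on stdin; succeeds if it reports Quorate: Yes.
is_quorate() {
  grep -Eq '^Quorate:[[:space:]]+Yes'
}

# vote_summary: reads the same text and prints "<total votes>/<expected votes>".
# Assumes "Expected votes:" and "Total votes:" lines are present in the input.
vote_summary() {
  awk -F': *' '
    /^Expected votes:/ { expected = $2 }
    /^Total votes:/    { total = $2 }
    END { print total "/" expected }'
}

# On a cluster node:
#   pvecm status | is_quorate && echo "quorum OK"
#   pvecm status | vote_summary
```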
Fencing: the safety switch that makes failover safe
Fencing is the mechanism that guarantees a node is truly stopped before its VMs are started elsewhere. Without confirmed fencing, the Proxmox HA manager cannot know whether the failed node has actually stopped writing to shared storage. Fencing must complete before failover begins — this is not configurable, it’s how the system is designed.
Softdog watchdog (default)
Since Proxmox VE 4.0, the default Proxmox HA fencing mechanism is self-fencing via the Linux watchdog, as documented in the Proxmox HA Manager documentation:
- The pve-ha-lrm daemon on each node continuously writes to /dev/watchdog, resetting a countdown timer.
- As long as LRM is healthy and the node has quorum, it keeps feeding the watchdog.
- If the node loses quorum — network isolation, kernel panic, LRM crash — it stops feeding the watchdog.
- The watchdog timer expires (default: 60 seconds) and forces a reboot.
- The forced reboot is the fencing confirmation. CRM proceeds with failover.
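The feed-and-expire behaviour described in the steps above can be illustrated with a toy simulation. It is not the kernel watchdog, only a model of the countdown: each argument after the timeout represents one second, with 1 meaning the watchdog was fed and 0 meaning it was not. The function name `watchdog_fire_time` is hypothetical.

```shell
#!/bin/sh
# watchdog_fire_time TIMEOUT FEED...
# Prints the second at which the countdown reaches zero, or "never".
watchdog_fire_time() {
  timeout=$1; shift
  remaining=$timeout
  t=0
  for fed in "$@"; do
    t=$((t + 1))
    if [ "$fed" -eq 1 ]; then
      remaining=$timeout            # a feed resets the countdown
    else
      remaining=$((remaining - 1))  # no feed: countdown keeps running
    fi
    if [ "$remaining" -le 0 ]; then
      echo "$t"
      return 0
    fi
  done
  echo "never"
}

watchdog_fire_time 3 1 1 0 0 0   # fed twice, then silence: prints 5
watchdog_fire_time 3 1 0 0 1 0 0 # a late feed resets the timer: prints never
```

With the 60-second default, the same logic means a node that stops feeding at second N is reset at second N + 60.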
The default module is softdog — a pure software implementation in the Linux kernel, no additional hardware required.
lsmod | grep -i softdog
The watchdog device should be present at /dev/watchdog. If it’s missing after a kernel update, check /etc/modules and add softdog if it’s gone. A missing watchdog means fencing won’t work, and the HA manager will stall waiting for a fencing confirmation that never arrives.
Failure scenario
Watchdog not loaded after kernel update. Kernel updates occasionally affect module loading. If softdog isn’t present at boot, the LRM cannot feed the watchdog, and Proxmox disables HA as a safety measure rather than run without fencing capability. The symptom is HA resources not starting after reboot with no obvious error. Verify: lsmod | grep -i softdog. Fix: add softdog to /etc/modules, then modprobe softdog.
Why fencing takes 60 seconds by default
The 60-second watchdog timeout is intentional. The cluster deliberately waits rather than act immediately. A fast failover that risks disk corruption is worse than a slow failover that guarantees safety. In practice, operators notice the delay most when testing HA for the first time — the VM is down and the cluster is sitting there for a minute doing nothing visible. It is working correctly.
With hardware fencing, this delay drops to 10–15 seconds because IPMI confirms power-off directly rather than waiting for a timer.
Hardware fencing (IPMI, iDRAC, iLO)
Softdog has a real limitation: it depends on the failed node’s OS to execute the reboot. A kernel that has hung at the hardware level may not be able to reboot itself, and softdog will not trigger. Hardware fencing bypasses this entirely — the HA manager sends a power-off command through the IPMI interface from outside the failed node.
Hardware fencing is the correct choice for any environment where node hardware failures are a realistic concern. Configure it at: Datacenter → Datacenter Settings → fencing. Each node’s IPMI address, credentials, and fence agent type are set here.
Before relying on hardware fencing, test the IPMI credentials independently:
ipmitool -H [ipmi-address] -U [user] -P [password] chassis power status
Stale credentials are a common cause of fencing failures in environments where IPMI passwords were rotated without updating Proxmox configuration.
How failover actually works
When a cluster node becomes unreachable:
- Corosync detects the loss. Heartbeat packets stop. After the configured timeout, Corosync marks the node as lost and recalculates quorum.
- Quorum check. If surviving nodes have quorum, the cluster can proceed. If not, everything stops here.
- CRM initiates fencing. Waits for confirmation — watchdog timer expiry or IPMI power-off acknowledgment.
- Resources are relocated. CRM evaluates surviving nodes against HA group membership and priorities. Picks the target node for each protected VM.
- LRM starts the VMs. Cold start from last disk state. Whatever was in RAM is gone. Disk state is preserved because VMs must be on shared storage.
The nofailback flag controls what happens after the failed node recovers and rejoins. With nofailback enabled (the default recommendation), VMs stay on the new node. Without it, Proxmox migrates them back — a second disruption for no operational benefit in most cases.
| Scenario | Typical recovery time | Notes |
|---|---|---|
| Softdog fencing | 2–4 minutes | Watchdog timer must expire before fencing is confirmed |
| IPMI hardware fencing | 1–2 minutes | Power-off confirmed directly, no timer wait |
| Live Migration (planned) | Seconds | Not failover — both nodes must be healthy |
| Backup restore | Varies | Last resort; VM data recovered from backup target |
These times assume shared storage is accessible and the VMs start cleanly on the receiving node. A VM with startup issues adds to recovery time regardless of fencing method.
HA vs Live Migration vs Replication
These three features serve different purposes and operators frequently confuse them. The short version: live migration is for planned maintenance, HA failover is for unplanned failures, replication is for DR.
| Feature | Downtime | When it applies | Requires |
|---|---|---|---|
| Live Migration | Seconds (brief pause) | Planned node maintenance | Both nodes healthy, shared or replicated storage |
| HA Failover | 2–4 minutes | Unplanned node failure | Cluster quorum, shared storage, fencing |
| Replication | Minutes (RPO-dependent) | Planned DR / site failure | Proxmox replication job configured |
| Backup Restore | Longest | Data recovery / corruption | External backup target (PBS, NFS) |
HA and live migration are complementary. Before planned maintenance, migrate VMs off a node manually using live migration — no downtime, no fencing event. HA handles the scenario where there’s no opportunity to do that. For the full backup and restore workflow, see the Proxmox backup strategy guide and PBS restore guide.
Prerequisites before enabling HA
Proxmox HA configuration on top of an unstable foundation produces unpredictable results. Verify these before adding any HA resources:
- Cluster is healthy: pvecm status shows all nodes, Quorate: Yes
- Shared storage accessible from all nodes: pvesm status (storage must appear on every node)
- VM disks on shared storage: not local-lvm, not local-zfs; those are not HA-capable regardless of configuration
- Watchdog active: lsmod | grep -E 'softdog|wdt|watchdog' returns softdog or a hardware watchdog module
- HA service running: systemctl status pve-ha-crm pve-ha-lrm on each node
- Time synchronized: Corosync is sensitive to clock drift; verify NTP is active on all nodes
Shared storage for HA
HA only protects VMs on storage accessible from all cluster nodes. This is the constraint that catches operators most often — the cluster is configured, HA resources are added, and the first real failure reveals all VMs are on local storage.
| Storage | HA capable | Notes |
|---|---|---|
| Ceph RBD | Yes | Native Proxmox integration, no external SPOF, recommended for new builds |
| Shared ZFS over iSCSI | Yes | Performant, common in existing infrastructure; ZFS server becomes SPOF unless it’s also clustered |
| NFS | Yes | Simplest to set up; NFS server is a new SPOF unless separately protected |
| local-lvm | No | Local to the Proxmox host — not accessible from other nodes |
| local-zfs | No | Also local storage, despite ZFS being capable of replication |
Recommended order for new builds: Ceph RBD first (no single point of failure, scales with the cluster), shared ZFS over iSCSI second (if Ceph overhead is too much for homelab hardware), NFS third (simplest, acceptable for homelab where NFS server loss is tolerable). Local storage in any form is not HA capable.
pvesm status
Storage listed here must appear on all nodes for HA use. Any storage showing only on one node is local storage, regardless of what the name suggests. For deeper storage architecture decisions, the Proxmox storage guide covers ZFS vs LVM-thin vs Ceph tradeoffs. For a dedicated breakdown of NFS, iSCSI, and Ceph as shared storage options for HA clusters, see the Proxmox shared storage guide.
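One way to catch VMs that sit on local storage before a failure does is to scan the VM configuration for disks on local storage IDs. The sketch below reads `qm config <vmid>` output on standard input. It assumes the common default storage names `local`, `local-lvm` and `local-zfs`; a cluster with differently named local storage would need the pattern adjusted. The function name `local_disks` is hypothetical.

```shell
#!/bin/sh
# local_disks: reads `qm config <vmid>` output on stdin and prints every
# disk that lives on a storage ID assumed to be node-local.
local_disks() {
  awk -F': ' '
    /^(scsi|virtio|sata|ide|efidisk|tpmstate)[0-9]+: / && !/media=cdrom/ {
      split($2, parts, ":")          # "storage:volume,options" -> storage ID
      if (parts[1] ~ /^(local|local-lvm|local-zfs)$/) {
        print $1 " on " parts[1]
      }
    }'
}

# On a cluster node, for the VM used in this article:
#   qm config 100 | local_disks
```

Empty output means no disk matched the local storage names. It does not prove the storage is shared, so the pvesm status check on every node still applies.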
Configuring HA resources
Proxmox HA protection is not automatic. Each VM must be explicitly added as an HA resource.
Via web UI: select the VM → More → Manage HA. Set state to started and assign an HA group if relevant.
Via CLI:
ha-manager add vm:100 --state started
ha-manager status
The output shows each protected resource, its current state (started, stopped, migrating, error), and which node it’s on. Resources in error state are not retried automatically after the configured attempt limit is exhausted. Investigate with journalctl -u pve-ha-crm before manually resetting.
HA groups control which nodes a VM can run on. Assign VMs to groups that match their storage and network requirements. The priority setting within a group determines preferred placement when multiple eligible nodes are available.
Containers (LXC) can also be HA protected using the same workflow — ha-manager add ct:101 --state started — as long as the container’s storage is on shared storage.
Planned maintenance and HA
Shutting down a node without telling Proxmox HA first triggers fencing and failover — the cluster doesn’t know the shutdown was planned. Use maintenance mode instead:
ha-manager crm-command node-maintenance enable NODE
Or set the node to maintenance mode via the web UI (Node → More → Maintenance Mode). When maintenance mode is active, Proxmox migrates HA resources off the node gracefully before allowing the shutdown. No fencing occurs and there is no failover event: the cluster stays healthy and the VMs land cleanly on other nodes. Disable maintenance mode once the work is done:
ha-manager crm-command node-maintenance disable NODE
Skipping this step on a planned maintenance is one of the most common unnecessary failover events in homelab clusters. The 2-minute failover with VMs restarting cold is easily avoided with one command.
Running Proxmox HA in a homelab
Proxmox HA works in homelab environments, but the hardware constraints are real and worth knowing before building the cluster.
Mini PC and NUC clusters are the most common homelab HA setup — three units of the same model, each running Proxmox, connected over a dedicated switch. The MS-01, N100-based mini PCs, and Intel NUCs all work. The practical limitation is memory: a 3-node mini PC cluster with 32GB per node gives 64GB usable (one node’s worth reserved for failover headroom). For most homelab workloads this is fine. For VM-heavy setups it gets tight fast.
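The headroom figure follows from reserving one node's worth of memory. A minimal sketch of that arithmetic, assuming identical nodes and ignoring the RAM used by the hypervisor itself; the function name `failover_capacity` is made up for illustration.

```shell
#!/bin/sh
# failover_capacity NODES GB_PER_NODE
# RAM that can be committed to VMs while still surviving one node failure.
failover_capacity() {
  echo $(( ($1 - 1) * $2 ))
}

failover_capacity 3 32   # three 32GB nodes: prints 64
failover_capacity 2 32   # two nodes plus QDevice: prints 32
```

The second call is the 2-node case: the QDevice adds a vote but no memory, so the surviving node has to hold everything.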
Shared storage in a homelab is the harder problem. Mini PCs don’t have IPMI, which means hardware fencing isn’t available — softdog is the fencing method, which means 60-second failover delays and a dependency on the host OS being able to reboot itself. NFS from a NAS is the simplest shared storage path; iSCSI from a TrueNAS or Synology works well. Ceph is possible across three nodes but adds meaningful RAM overhead (1–2GB per node for the Ceph daemons) and is usually overkill for a 3-node homelab.
2-node homelab clusters with QDevice are a common configuration. A Raspberry Pi running the QDevice daemon is a reliable, inexpensive tiebreaker — a Pi 4 with a good SD card handles the role with no issues. The QDevice itself needs to be reliably online; if it’s also on the same power circuit as the two cluster nodes, a power event takes down the entire quorum. Put it on a UPS or a separate circuit if the setup is meant to handle real failures.
Without IPMI, hardware fencing is unavailable on most homelab hardware. This means softdog is the only fencing method. It works, but it has the limitation noted earlier — a host with a hard hardware hang may not be able to execute the watchdog reboot. In a homelab this is usually acceptable. In a production environment it isn’t.
The homelab HA sweet spot: 3 nodes, NFS or iSCSI shared storage, softdog fencing, QDevice if running 2 nodes. Ceph adds resilience at the cost of complexity and RAM. For hardware recommendations covering mini PCs specifically, see the mini PC for Proxmox guide.
What HA does not cover
Understanding the limits of Proxmox HA prevents over-relying on it for scenarios it wasn’t designed for.
Guest OS crashes. If the VM’s operating system freezes or panics, the host node is fine, quorum is intact, HA sees nothing to act on. The VM shows as running from the cluster’s perspective. For guest-level crash recovery, configure a virtual watchdog device for the VM separately (the watchdog option in the VM configuration, paired with a watchdog daemon inside the guest). This is a per-VM setting, not an HA feature.
Application failures inside the VM. A database that crashes, a web service that stops responding — HA monitors the VM’s running state, not what’s happening inside it.
Storage failure. If shared storage becomes unavailable, HA cannot restart VMs on other nodes because those nodes also cannot reach the storage. Storage failure is a separate failure domain. For ZFS storage failure recovery, see the ZFS recovery guide.
Single-node Proxmox. HA is a cluster feature. There is no HA on a standalone host regardless of configuration.
HA protects the VM, not the service inside it. Workloads that genuinely cannot tolerate several minutes of downtime need application-level clustering — database replication, load-balanced services, active-active setups — in addition to Proxmox HA, not instead of it.
Verification and testing
Configuring Proxmox HA and assuming it works is how operators discover misconfigurations during an actual outage. Run through this before relying on Proxmox HA in production:
HA pre-production verification
- Confirm watchdog loaded: lsmod | grep -E 'softdog|wdt|watchdog'
- Confirm HA services active: systemctl status pve-ha-crm pve-ha-lrm
- Confirm shared storage visible on all nodes: pvesm status
- Confirm HA resources in started state: ha-manager status
- Confirm quorum healthy: pvecm status shows Quorate: Yes
- Test failover on a non-critical VM: shut down one node cleanly (not via maintenance mode), observe VM restart on another node, measure actual recovery time
- Review CRM and LRM logs after the test: journalctl -u pve-ha-crm and journalctl -u pve-ha-lrm
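The non-destructive items in this checklist lend themselves to a small script that can be rerun every few months. The sketch below is one possible shape, not an official tool: `run_check` and `preflight` are hypothetical names, the watchdog pattern matches the softdog module name plus common hardware watchdog naming, and the quorum check assumes `pvecm status` prints a `Quorate: Yes` line. It deliberately leaves out the failover test and the log review, which need a human.

```shell
#!/bin/sh
failures=0

# run_check LABEL COMMAND...: runs the command quietly, prints PASS or FAIL,
# and counts failures in the global variable "failures".
run_check() {
  label=$1; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS  $label"
  else
    echo "FAIL  $label"
    failures=$((failures + 1))
  fi
}

# preflight: the scriptable checks from the list above. Run on each node.
preflight() {
  run_check "watchdog module loaded" sh -c 'lsmod | grep -Eq "softdog|watchdog|wdt"'
  run_check "pve-ha-crm active" systemctl is-active pve-ha-crm
  run_check "pve-ha-lrm active" systemctl is-active pve-ha-lrm
  run_check "cluster quorate" sh -c 'pvecm status | grep -Eq "^Quorate:[[:space:]]+Yes"'
  run_check "storage status readable" pvesm status
  echo "$failures check(s) failed"
}

# Usage on a cluster node:
#   preflight
```

The script only confirms that the commands succeed. Whether each storage appears on every node, and whether HA resources are in the started state, still has to be read from the pvesm status and ha-manager status output.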
Forum threads document a consistent pattern: operators test HA once at setup, it passes, then months later a real failure reveals configuration drift — VMs migrated to local storage, watchdog blacklisted after a kernel update, a node removed from the HA group and never re-added. Scheduled re-verification every few months costs 10 minutes and has caught real misconfigurations before they became outages.
For log analysis during an HA event, the Proxmox logs guide covers what to look for in Corosync and HA daemon output specifically.
Common failure patterns
VM assigned to HA but on local storage. HA manager attempts restart on another node, fails because the disk is inaccessible, marks resource as error. Fix: migrate VM disk to shared storage, then re-enable HA protection.
Quorum lost before fencing completes — 2-node cluster without QDevice. Surviving node cannot fence or recover. Cluster is stuck until the failed node is brought back or quorum is manually adjusted. The cluster quorum guide covers the recovery procedure.
Hardware fencing credentials stale. HA manager cannot power off the failed node, fencing times out, failover is delayed or blocked entirely. Test IPMI credentials independently before any production deployment.
All nodes updated simultaneously. Version mismatch during simultaneous updates can cause Corosync instability. Update one node at a time, validate cluster health after each. Covered in detail in the Proxmox update guide.
VM won’t start after failover. Usually a storage or configuration issue on the receiving node. Check ha-manager status for error state, then journalctl -u pve-ha-lrm on the target node. The VM won’t start diagnostic guide covers individual startup failure modes in detail.
Final thoughts
Proxmox HA is reliable when the prerequisites are in place: shared storage, proper fencing, quorum-capable topology. The configuration itself is not complex. What catches operators is structural — VMs on local storage, watchdog drift after kernel updates, 2-node clusters without QDevice.
The single verification that catches most HA misconfigurations before the first real outage:
ha-manager status
pvecm status
Then confirm every HA-protected VM is stored on shared storage. If both commands return clean output and storage checks out, the cluster will behave as expected when a node actually fails.
FAQ
What is quorum in a Proxmox cluster?
Quorum is the majority-vote mechanism that prevents split-brain. Proxmox requires more than half of configured cluster nodes to be online before the cluster can make decisions. In a 3-node cluster, losing one node leaves 2/3 — quorum intact. Losing two leaves 1/3 — quorum lost, cluster stops acting.
Why is fencing required before failover?
Without confirmed fencing, the HA manager cannot know whether the failed node has stopped writing to shared storage. If both the original node and the recovery node write to the same VM disk simultaneously, the result is data corruption. Fencing — watchdog self-reboot or IPMI power-off — gives the HA manager a confirmed signal before proceeding.
Does Proxmox HA use live migration during failover?
No. Failover is stop-and-restart. The VM starts fresh on a surviving node from its last disk state — whatever was in RAM is lost. Live migration requires both nodes to be healthy and communicating, which is not the case during an unplanned failover. Use live migration for planned maintenance; HA handles the rest.
Can I use Proxmox HA with a 2-node cluster?
With a QDevice configured, yes. Without one, a single node failure drops the survivor to 1/2 — no quorum, no automatic recovery. A QDevice on a third lightweight system (a Raspberry Pi works fine) provides the tiebreaker vote that makes 2-node failover possible. It’s a working solution but not equivalent to a third full node, which also contributes compute capacity for the failed node’s VMs.
Does HA work with local-zfs storage?
No. Local-ZFS is local storage — it exists only on the Proxmox host’s own disks and is not accessible from other nodes. VMs on local-zfs cannot be failed over regardless of HA configuration. Only storage that appears on all cluster nodes qualifies.
Does Proxmox HA require Ceph?
No. Any shared storage accessible from all cluster nodes works — NFS, iSCSI, Ceph RBD, FC. Ceph is recommended for new builds because it has no external single point of failure, but NFS or iSCSI are valid alternatives depending on existing infrastructure.
Can LXC containers be HA protected?
Yes. Containers use the same HA mechanism as VMs — ha-manager add ct:101 --state started — with the same requirement: container storage must be on shared storage accessible from all nodes. Containers on local storage are not HA capable.
What happens if the node running CRM fails?
Another node automatically takes over the CRM role. There is no single point of failure in the HA manager layer. The standby CRM daemons on surviving nodes detect the loss and elect a new active manager. The HA recovery process continues from the new CRM.
Where are the HA manager logs?
Two daemons, two log sources. On the active CRM node: journalctl -u pve-ha-crm. On any node for local execution activity: journalctl -u pve-ha-lrm. CRM logs show cluster-level decisions — fencing, resource assignment, failover sequencing. LRM logs show local execution — start/stop commands, watchdog feed status. Both are needed for post-incident analysis.