Proxmox Logs Explained: Where to Look When Something Breaks (2026)

Proxmox doesn’t have a single log file. That’s the first thing operators get wrong when troubleshooting — they go looking for “the Proxmox log” and end up either in the wrong place or overwhelmed by raw journal output that doesn’t point anywhere useful.

The better mental model isn’t “where are the logs?” It’s “which component failed?” Once that question is answered, the right log becomes obvious. A failed backup points somewhere different than a GUI that won’t load, which points somewhere different than a cluster that lost quorum. The component tells you the log. The log tells you the cause.

This guide maps Proxmox log locations and architecture to the actual failure scenarios operators run into — what each log captures, what it doesn’t, and where evidence actually hides during diagnosis.

TL;DR — Proxmox Logs

There is no single “pve-manager.log” — logs are split by component
On PVE 8+, journald is the primary source; /var/log/syslog may not exist
Failed tasks → Task History first, then system journal
GUI/API problems → pveproxy journal + access.log; backend errors → pvedaemon
Grey question marks → pvestatd journal
Cluster/quorum problems → corosync journal
Host lockup with nothing in journald → investigation moves to IPMI/serial
Volatile journal = lost evidence after reboot — configure persistent journald now

The Failure-Layer Framework

Before opening any log, identify which layer failed. This single step eliminates most of the time operators waste looking in the wrong place during troubleshooting.

Layer	What failed	Where to look first
Operation	Backup, restore, migration, clone, snapshot	Task History / `pvenode task log`
GUI / API frontend	Login failed, 500 error, web UI unreachable	`journalctl -u pveproxy.service` + `access.log`
API backend	VM operation failures, API backend errors, task execution issues	`journalctl -u pvedaemon.service`
Status path	Grey `?`, unknown VM/CT status, slow storage list	`journalctl -u pvestatd`
Cluster integrity	Quorum lost, node disappeared, fencing	`journalctl -b -u corosync`
Storage	ZFS errors, NFS/CIFS mount failures, Ceph OSD down	`journalctl -k` + storage-specific logs
HA	HA not restarting VMs, unexpected fencing	`journalctl -u pve-ha-lrm` / `pve-ha-crm`
Host hardware	Hard lockup, random reboot, complete silence	`journalctl -k -b -1` → IPMI/serial if empty
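
The table can also be kept at hand as a small lookup function. This is only a sketch: the function name first_log_for and the short layer keywords are invented for illustration, while the commands it prints are the ones used elsewhere in this guide.

```shell
# Print the first place to look for a given failure layer.
# The function name and the layer keywords are illustrative, not part of Proxmox.
first_log_for() {
    case "$1" in
        operation) echo "pvenode task list --errors" ;;
        gui)       echo "journalctl -u pveproxy.service" ;;
        backend)   echo "journalctl -u pvedaemon.service" ;;
        status)    echo "journalctl -u pvestatd" ;;
        cluster)   echo "journalctl -b -u corosync" ;;
        storage)   echo "journalctl -k" ;;
        ha)        echo "journalctl -u pve-ha-lrm" ;;
        hardware)  echo "journalctl -k -b -1" ;;
        *)         echo "unknown layer: $1" >&2; return 1 ;;
    esac
}
```

Calling first_log_for status prints journalctl -u pvestatd. The point is less the function itself than the habit: name the layer before typing a journalctl command.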

Proxmox Logs: The System Journal as Primary Source

On Proxmox VE 8 and newer, journald is the canonical logging source. Services like pveproxy, pvedaemon, pvestatd, pve-cluster, and corosync all write to it. The GUI’s “Syslog” view under Node → System → Syslog reads from journald directly.

One thing to verify early: on fresh PVE 8+ installations, /var/log/syslog and /var/log/kern.log may not exist at all. These files only appear if rsyslog is installed and receiving journal output. Many forum threads and older guides still reference them, so operators who upgraded from older Proxmox versions or followed legacy documentation are regularly surprised when ls /var/log/syslog fails with “No such file or directory”. Check journald first; don’t assume the text files are there.

Failure Scenario

Volatile journal after reboot. With the default Storage=auto setting, journald writes to disk only if /var/log/journal exists; otherwise it keeps logs in memory under /run/log/journal. In that case the journal disappears on reboot, and a reboot is exactly what happens after a crash. Post-mortem investigation starts with an empty journal. To verify: ls /var/log/journal. If that directory doesn’t exist and Storage= has not been set explicitly, journald is running in volatile mode. Fix: mkdir -p /var/log/journal && systemctl restart systemd-journald. Do this before you need it.
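
The directory check is easy to script, for example as part of a post-install checklist. A minimal sketch, assuming journald runs with its default Storage=auto (an explicit Storage= line in journald.conf overrides the directory rule); the function name journal_mode and its optional root-prefix argument are hypothetical.

```shell
# Report whether journald would persist logs under the default Storage=auto rule.
# $1 is an optional root prefix (hypothetical, useful for checking a mounted
# image or a test directory); leave it empty to check the running system.
journal_mode() {
    root="${1:-}"
    if [ -d "$root/var/log/journal" ]; then
        echo persistent
    else
        echo volatile
    fi
}
```

Running journal_mode on a node prints either persistent or volatile. If it prints volatile, the mkdir and restart shown above switch journald over to disk.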

Journald does not keep logs for a fixed number of days. It uses size-based rotation: by default roughly 10% of the filesystem, capped at 4 GiB. During a log storm, such as a failing backup generating thousands of entries per minute, older records disappear much faster than expected. When investigating a slow-burning incident, check whether the relevant time window is still present before building a theory. The full configuration options are documented in the systemd journald.conf man page.
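
To get a feel for how small that budget can be, the rule of thumb can be worked out in a few lines of shell. This is a sketch of the arithmetic only (10% of the filesystem, capped at 4 GiB); the function name journal_cap_bytes is made up, and a node with explicit size limits in journald.conf will behave differently.

```shell
# Estimate the default journal size cap:
# about 10% of the filesystem, but never more than 4 GiB.
# $1 = size of the filesystem holding the journal, in bytes
journal_cap_bytes() {
    fs_bytes="$1"
    cap=$((4 * 1024 * 1024 * 1024))   # 4 GiB ceiling
    tenth=$((fs_bytes / 10))          # 10% of the filesystem
    if [ "$tenth" -lt "$cap" ]; then
        echo "$tenth"
    else
        echo "$cap"
    fi
}

# Compare the estimate with what the journal currently occupies:
#   journalctl --disk-usage
```

A 20 GiB root filesystem gives a 2 GiB budget; anything from 40 GiB upward hits the 4 GiB ceiling. On a small boot disk, a single noisy service can push out a week of history in an afternoon.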

There’s also a common trap with systemctl status: after journal rotation, it shows “Journal has been rotated since unit was started. Log output is incomplete or unavailable.” This isn’t a Proxmox problem — it means the journal rotated and systemctl status can only show the recent buffer. Switch to journalctl --since/--until with the specific time window.

Proxmox Task Logs: Operation-Level Diagnosis

For any background operation — backup, restore, migration, clone, snapshot, replication — the Proxmox task log is the first place to look, not the system journal.

Proxmox stores these under /var/log/pve/tasks. The CLI interface is documented in the pvenode man page. The GUI’s Task History panel reads from the same location. Each task has a UPID that identifies it across both interfaces.

# list recent failed tasks
pvenode task list --errors

# filter by VM
pvenode task list --errors --vmid 100

# filter by type
pvenode task list --typefilter vzdump

# read a specific task log
pvenode task log UPID:pve:...

Task logs show the operation step by step: snapshot creation, compression phase, transfer, storage write, cleanup. When a backup fails, the task log usually shows the exact phase and error. Common patterns: I/O errors during the zstd phase pointing to failing storage, “cannot activate storage on node” indicating a mount problem.

Why operators get this wrong: they go straight to journalctl for backup failures, wade through unrelated system messages, and miss the task log that has the answer in three lines. Task log first, journal second — use the journal to understand why the storage failed, not to find the backup failure itself.

One gap: task logs capture operation output, not always root cause at the storage layer. Proxmox support staff regularly asks for both — the task log to see what failed, and the system journal around the same timestamp to see why the underlying resource was unavailable.
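
Lining up the two sources is easier with the task's start time, which can be read from the UPID itself. A minimal sketch, assuming the usual UPID layout in which the fifth colon-separated field is the start time as hexadecimal epoch seconds; the function name upid_starttime is invented for illustration.

```shell
# Extract the task start time (epoch seconds) from a UPID.
# Assumes the fifth colon-separated field is the start time in hex.
upid_starttime() {
    hex=$(printf '%s\n' "$1" | cut -d: -f5)
    printf '%d\n' "0x$hex"
}

# Usage on a node, with a full UPID copied from `pvenode task list`
# stored in the variable upid:
#   start=$(upid_starttime "$upid")
#   journalctl --since "$(date -d "@$start" '+%F %T')"
```

The journal output then begins at the moment the task started, which is usually where the storage or network error that explains the failure shows up.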

pveproxy and pvedaemon: GUI, API, and Backend Operations

These two services cover different layers of the same stack and are worth understanding separately.

pveproxy: The HTTPS Frontend

pveproxy is the HTTPS frontend on port 8006. Every web UI request and API call passes through it. When the GUI won’t load, login fails, or API calls return 401/500/596, start here.

journalctl -u pveproxy.service
journalctl -fu pveproxy          # follow live

The journal shows service startup failures, SSL/certificate problems, permission errors on key files, worker crashes.

Alongside the journal, check the access log:

/var/log/pveproxy/access.log

Every HTTP(S) request is recorded here: IP address, endpoint, response code, timestamp. Useful for spotting patterns — repeated 401s from a specific IP, a specific endpoint consistently returning 500, API integrations that started failing after an update.
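
Spotting those patterns by eye gets tedious quickly. The sketch below counts requests with a given status code per client IP. It assumes the access log follows a common-log-style layout in which the client address is the first whitespace-separated field and the status code is the ninth; verify that against a few lines of your own log before trusting the numbers. The function name count_status_by_ip is hypothetical.

```shell
# Count requests with a given HTTP status per client IP, busiest first.
# $1 = status code to count (for example 401); log lines are read from stdin.
# Assumes field 1 is the client IP and field 9 is the status code.
count_status_by_ip() {
    awk -v code="$1" '$9 == code { n[$1]++ } END { for (ip in n) print n[ip], ip }' |
        sort -rn
}

# Usage on a node:
#   count_status_by_ip 401 < /var/log/pveproxy/access.log
```

A single address at the top of the 401 list usually means a stale API token or a brute-force attempt; a flat spread across many addresses points more toward a server-side authentication problem.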

The access log has a documented gap: for HTTP 500 errors, it records the fact but not the internal reason. This is a recurring complaint from API integrations — the log shows the 500 but doesn’t explain what caused it. For those cases, check pvedaemon.

pvedaemon: The API Backend

pvedaemon is the backend API service — it handles the actual execution of requests that pveproxy forwards. When pveproxy shows a 500 but the access log doesn’t explain why, pvedaemon is where the backend error surfaces.

journalctl -u pvedaemon.service
journalctl -fu pvedaemon         # follow live

This is where to look for: VM operation failures (start/stop/migrate commands that fail at the execution layer), task execution errors that aren’t fully captured in the task log, storage action failures when a specific operation against a storage backend fails, API backend errors that aren’t exposed through pveproxy alone.

Field Note

Forum threads that ask about “GUI works but VM operations fail” or “API returns 500 for specific endpoints” often end with pvedaemon output. Operators who only check pveproxy miss the backend half of the picture. The two services are complementary — pveproxy shows the HTTP layer, pvedaemon shows the execution layer.

pvestatd: The Question Mark Problem

Grey question marks on VMs and containers, “unknown” statuses, storage entries that won’t update — these almost always trace back to pvestatd.

pvestatd is the status daemon. It polls VM and CT status, storage availability, and metric server connections. When it can’t reach something, the status goes grey.

journalctl -u pvestatd
journalctl -fu pvestatd   # live

Common journal entries: got timeout, long status update intervals, metric server send errors, NFS/CIFS/RBD mounts that stopped responding.

Why operators get this wrong: grey question marks look like VMs crashed. Often the VMs are running fine — only the status polling path is broken. A dying NFS mount or an unreachable metric server can make everything appear unknown without affecting actual workloads.

pvestatd is an indicator, not the root cause. It tells you polling failed. The actual problem is in whatever pvestatd was trying to reach.

Kernel Logs: Where Hardware Problems Surface

Kernel messages are where hardware-layer problems leave evidence — and for many homelab crashes, they’re the only evidence that exists before the system goes silent.

journalctl -k             # kernel messages, current boot
journalctl -k -b -1       # kernel messages from previous boot
journalctl -k -p err      # kernel errors only

What shows up here that doesn’t appear elsewhere:

OOM killer events. When the kernel kills a process due to memory pressure, OOM killer messages appear in the kernel log with the process name, PID, and memory stats. These don’t appear in service journals.

Soft lockup and RCU stall warnings. “watchdog: BUG: soft lockup” entries indicate a CPU core isn’t responding to the scheduler. Common with overloaded systems or certain driver bugs.

Storage I/O errors. Block device errors, SCSI error messages, and disk timeout warnings all surface in the kernel log. ata1.00: failed command: READ FPDMA QUEUED or blk_update_request: I/O error, dev sda are typical patterns indicating a failing drive.

NIC resets and driver errors. Intel i226-V link drops, driver reset events, and network interface errors appear here before showing up anywhere in application logs. If pvestatd shows timeouts and the kernel log shows NIC resets around the same time, the cause is clear.

Kernel panics. If the system panicked and rebooted, the previous boot’s kernel log sometimes captures the panic trace. Sometimes it’s empty — which is itself informative.
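
The categories above can be pulled out of a noisy kernel log with one filter. This is a rough sketch: the function name kernel_red_flags is made up, and the pattern list is an assumption based on the message types named in this section, not an exhaustive catalogue. Extend it for the drivers and hardware you actually run.

```shell
# Keep only kernel log lines that match common hardware-trouble patterns.
# Reads log lines from stdin; the pattern list is a starting point, not complete.
kernel_red_flags() {
    grep -iE 'out of memory|oom-kill|soft lockup|rcu.*stall|I/O error|failed command|link is down'
}

# Usage on a node, against the previous boot:
#   journalctl -k -b -1 --no-pager | kernel_red_flags
```

An empty result does not prove the hardware is healthy. It only means none of these particular patterns made it into the journal before the boot ended.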

Field Note

Homelab operators running consumer hardware on recent kernels see more kernel-level noise than production deployments on validated hardware. Intel i226-V NIC instability under kernel 6.8+ is a documented pattern — link drops surface in kernel logs first, then in pvestatd timeouts, then in VM connectivity issues. Tracing backwards from the application symptom to the kernel log usually takes five minutes once you know where to look.

Storage Logs: ZFS, NFS, CIFS, and Ceph

Most homelab Proxmox setups use ZFS, NFS, or CIFS. Ceph is less common outside multi-node clusters. Each storage type leaves evidence in different places.

ZFS

ZFS problems surface in two places: the kernel log and zpool status.

# current pool health
zpool status

# I/O error counts and scrub results
zpool status -v

# kernel messages about ZFS
journalctl -k | grep -i zfs

zpool status shows pool health, disk error counts, and scrub results. A pool in DEGRADED state with checksum errors on a specific vdev points directly to a failing drive — this is more actionable than anything in the system journal. ZFS also logs I/O errors to the kernel log, so journalctl -k with a grep for zfs or the device name often surfaces the same information in a timestamped context.
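
For a quick yes-or-no answer across several pools, the health column can be filtered directly. A small sketch that relies on the tab-separated output of zpool list -H -o name,health; the function name unhealthy_pools is hypothetical.

```shell
# Print every pool whose health is anything other than ONLINE.
# Expects "name<TAB>health" lines on stdin.
unhealthy_pools() {
    awk -F '\t' '$2 != "ONLINE" { print $1, $2 }'
}

# Usage on a node:
#   zpool list -H -o name,health | unhealthy_pools
```

No output means every pool reports ONLINE. Any pool that is listed deserves a full zpool status -v to see which vdev is accumulating errors.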

ARC-related issues (unexpected memory pressure, slow VM performance after RAM is exhausted) don’t generate obvious log entries — arc_summary is more useful than any log for diagnosing ZFS memory behavior.

NFS and CIFS/SMB

NFS and CIFS mount problems typically appear in two places: the kernel log and pvestatd.

# mount errors and NFS timeouts
journalctl -k | grep -iE "nfs|cifs|mount"

# pvestatd polling failures that result from the mount problem
journalctl -u pvestatd

NFS timeouts show up as kernel messages first, typically nfs: server <name> not responding, still trying or nfs: server <name> not responding, timed out. These precede the grey question marks in the GUI. CIFS credential failures and connection drops also surface in the kernel log.

The practical troubleshooting sequence for “storage shows grey in the GUI”: check pvestatd for timeout messages, then check the kernel log for mount-level errors at the same timestamp. The pvestatd entry tells you polling failed; the kernel entry tells you why the mount stopped responding.

Ceph

Ceph has its own logging stack, separate from the main Proxmox journal.

# specific OSD
journalctl -u ceph-osd@3.service

# crash metadata
ceph crash ls
ceph crash info <id>

# full daemon logs
ls /var/log/ceph/

Even with Ceph symptoms, the root cause isn’t always in Ceph logs. Official Proxmox Ceph documentation explicitly recommends checking system logs on the affected node, disk health (SMART), and IPMI or RAID controller logs alongside Ceph daemon logs. There are documented cases where Ceph logs showed nothing obvious and the actual cause was a failing HBA.

Corosync and pve-cluster: Cluster Integrity

Quorum loss, node disappearing from the cluster, fencing, join failures — start with corosync.

journalctl -b -u corosync
pvecm status               # current quorum state
journalctl -u pve-cluster.service

Corosync logs show cluster membership events: nodes joining and leaving, quorum gained and lost, link down events, qdevice connectivity. pvecm status shows the current state — useful for confirming whether quorum is currently healthy before digging through logs.
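
In scripts or monitoring checks, the quorum state can be reduced to a single word. A minimal sketch, assuming pvecm status prints a line of the form "Quorate:" followed by Yes or No; the function name quorate_state is invented, and the parsing should be rechecked if the output layout differs on your version.

```shell
# Print the value of the "Quorate:" line from pvecm status output on stdin.
# Assumes a line that starts with "Quorate:" followed by Yes or No.
quorate_state() {
    awk -F: '/^Quorate/ { gsub(/[ \t]/, "", $2); print $2 }'
}

# Usage on a cluster node:
#   pvecm status | quorate_state
```

If this prints No, read the corosync journal before touching anything else. If it prints Yes while the GUI still looks broken, the problem is probably in a different layer.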

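pve-cluster handles the cluster filesystem under /etc/pve. Authkey rotation problems and pmxcfs errors appear here. For deep pmxcfs debugging, there’s a verbose mode: write 1 to the special file /etc/pve/.debug (echo "1" > /etc/pve/.debug) and write 0 to switch it off again. pmxcfs picks the change up at runtime, so the file does not need to be created by hand.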

Corosync logs often show consequences, not causes. “Node left cluster” or “link down” doesn’t explain whether the physical switch failed, the NIC dropped, or the node locked up. Corosync sees the result; the kernel log or IPMI shows the cause.

HA Logs

If HA isn’t restarting VMs after a node failure, or fencing behavior doesn’t match expectations, the HA stack logs explain the decision chain.

journalctl -u pve-ha-lrm   # local resource manager on the node running the service
journalctl -u pve-ha-crm   # cluster resource manager on the master node

HA logs record every decision: start, stop, recovery attempts, error states, fencing triggers, maintenance mode transitions, watchdog events. These are the authoritative record of what HA decided and why.

These logs don’t replace corosync or storage logs. If HA didn’t fence because quorum was never actually lost, corosync explains that. If HA couldn’t start a VM because storage was unavailable, storage logs explain that.

Cluster-Specific Complications

Two things make log investigation harder in multi-node setups.

pveproxy forwards requests across nodes. When logged into the GUI on node A and performing an operation that actually runs on node B, errors surface in node B’s journal — not node A’s. If investigation of a GUI or API error on node A shows nothing, check the other nodes. The request may have been forwarded silently.

The GUI aggregates tasks from all nodes. The lower Task Log panel shows recent tasks from the entire cluster. This is convenient but creates false confidence — “I can see the logs” may mean aggregated output from multiple nodes. When something specific goes wrong, identify which node was responsible and check that node’s journal directly.

The Complete Command Reference

# current boot overview, reverse chronological
journalctl -b -r

# errors and above only (err, crit, alert, emerg)
journalctl -b -p err

# specific services
journalctl -u pveproxy.service
journalctl -u pvedaemon.service
journalctl -u pvestatd.service
journalctl -b -u corosync
journalctl -u pve-cluster.service
journalctl -u pve-ha-lrm
journalctl -u pve-ha-crm
journalctl -u ceph-osd@3.service

# kernel messages
journalctl -k
journalctl -k -b -1          # previous boot
journalctl -k -p err         # kernel errors only

# time window
journalctl --since "2026-05-30 08:00" --until "2026-05-30 09:00"

# follow live
journalctl -fu pveproxy
journalctl -fu pvestatd

# task investigation
pvenode task list --errors
pvenode task list --errors --vmid 100
pvenode task list --typefilter vzdump
pvenode task log UPID:pve:...

# storage health
zpool status
zpool status -v
ceph crash ls

Proxmox Log Blind Spots: Where Evidence Disappears

Knowing where logs are missing matters as much as knowing where to look.

Blind spot	What it means for diagnosis
Hard lockup, power loss, hardware reset	The journal may simply stop. No error, no warning — just silence where the log ends. Investigation moves to IPMI/iDRAC/iLO event logs, RAID or HBA controller logs, UPS event history, serial console, or kdump if configured.
HTTP 500 without internal detail	`access.log` records the 500 response but not the internal reason. API integrations regularly report this as the only trace available. Check pvedaemon journal for backend errors around the same timestamp.
pvestatd shows symptoms, not causes	Grey question marks mean polling failed. They don’t mean VMs are down, and they don’t identify why polling failed. Look at what pvestatd was trying to reach — the storage backend, the mount, the metric server.
Guest-internal problems	Proxmox host logs cover host services, task workers, kernel, and storage stack. They don’t show what’s happening inside a VM or LXC container. Guest-side investigation requires accessing the guest’s own logs directly.
Volatile journal after reboot	If journald isn’t set to persistent mode, the crash evidence is gone by the time investigation starts. Configure persistent journald before it’s needed.

Investigating a crash with no logs

Check journalctl -k -b -1 — previous boot kernel messages may capture the last events before silence
Check journalctl -b -1 -p err — errors from previous boot across all services
If journal is empty or shows abrupt cutoff — Proxmox-side logging has nothing more to offer
Move to IPMI/iDRAC/iLO event log — hardware-level events survive host crashes
Check RAID or HBA controller logs if storage is involved
Check UPS event history if power loss is possible
Consider configuring kdump for future incidents if the crash is recurring
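
The first decision in that checklist is whether a previous boot is recorded at all. A short sketch that answers it from the output of journalctl --list-boots, where the previous boot carries the index -1; the function name has_previous_boot is hypothetical.

```shell
# Succeed if the boot list on stdin contains an entry with index -1,
# meaning the journal still holds records from the previous boot.
has_previous_boot() {
    awk '$1 == "-1" { found = 1 } END { exit found ? 0 : 1 }'
}

# Usage on a node:
#   if journalctl --list-boots --no-pager | has_previous_boot; then
#       journalctl -k -b -1 -p err --no-pager
#   else
#       echo "no previous boot in the journal: check journald persistence, then IPMI"
#   fi
```

When the check fails, the journal was most likely volatile or already rotated away, and the investigation moves straight to the hardware-side logs.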

Final Thoughts

Proxmox’s logging architecture reflects how the software is built: separate daemons for separate responsibilities, with journald as the common thread. The frustration most operators experience during troubleshooting comes from looking for a unified “Proxmox log” that doesn’t exist.

The practical model that works: identify the failure layer first, go to the corresponding Proxmox log, and treat absence of evidence as its own signal. A hard lockup with nothing in journald isn’t “no logs” — it’s a finding that the failure happened below the logging layer, which points directly toward hardware investigation.

For homelab setups specifically: configure persistent journald before you need it. Make that change today. The post-mortem investigation after an unexpected reboot depends on a decision made before the reboot happened.
