Windows server event log troubleshooting starts with the right mental model. Event ID 41, 6008, 7034, 2019, or 129 rarely proves the root cause on its own — these events tell you which component noticed the problem first, not why the problem happened.
The workflow is simple: export the logs before touching anything, build a timeline around the incident, and use the Event IDs below to decide the next check. Do not start by searching one Event ID in isolation. Start by finding what happened before it.
Most operators get Windows Server event log troubleshooting wrong by treating each event as a verdict instead of a signal. The discipline is correlation across a timeline, not single-event lookup.
- Do not approach Windows Server event log troubleshooting by looking up one Event ID. Build a timeline first.
- Export logs before changing anything: wevtutil epl System C:\IR\System_%COMPUTERNAME%_pre.evtx
- Event 41 means the previous shutdown was unclean. Read BugcheckCode before blaming the power supply.
- Events 129/153 mean storage timeout and retry. Recurring events are early warning before NTFS 55/98 corruption.
- Event 7034 means a service died with no recovery action. Event 7031 means recovery is configured and will run.
- Default log sizes are 20 MB. On a busy server, the System log can overwrite itself in days. Raise the limit before the next incident.
Event IDs Covered in This Windows Server Event Log Troubleshooting Guide
Use these as jump points if you already know the Event ID you are chasing.
| Category | Event IDs |
|---|---|
| Reboots and shutdowns | 41, 1001, 6008, 1074, 6005, 6006, 6013 |
| Service Control Manager | 7000, 7001, 7009, 7011, 7031, 7034, 7036, 7045 |
| Storage and NTFS | 7, 11, 51, 55, 98, 129, 153 |
| Memory pool | 2019, 2020 |
| Windows Update | 19, 20, 21 |
| Application crashes | 1000, 1001 |
| Log clearing and tampering | 104, 1102 |
Fast Triage Map
Start here when a server is sick and you need to decide what to pull first. Windows Server event log troubleshooting always begins with mapping the symptom to the right log and Event ID cluster.
| Symptom | First Events to Check | What to Decide |
|---|---|---|
| Unexpected reboot | 41, 6008, 1001, 1074, 6006 | BSOD, power loss, forced reset, or clean reboot? |
| Service stopped responding | 7031, 7034, 7000, 7001, 7009, 7011 | Crash, dependency failure, timeout, or no recovery configured? |
| Disk warnings or errors | 7, 11, 51, 55, 98, 129, 153 | Removable-media noise or fixed-disk failure path? |
| Server hangs, out of resources | 2019, 2020 | Kernel pool leak — which driver? |
| Update failed or stuck | 19, 20, 21 | Patch installed, failed, or pending reboot? |
| Cleared logs | 1102, 104 | Authorized maintenance or tampering? |
First 5 Minutes: Export Logs and Build the Timeline
Before touching anything, export the relevant logs. Once you start investigating (running commands, restarting services, rebooting), events can roll over, be overwritten, or change context. An exported .evtx file is the baseline for everything that follows.
# Export System and Application logs before investigation.
# %COMPUTERNAME% expands in cmd.exe; in PowerShell use $env:COMPUTERNAME instead.
wevtutil epl System C:\IR\System_%COMPUTERNAME%_pre.evtx
wevtutil epl Application C:\IR\Application_%COMPUTERNAME%_pre.evtx

Then pull the last 6 hours of Critical and Error events across both logs in one pass:
Get-WinEvent -FilterHashtable @{
    LogName   = 'System','Application'
    Level     = 1,2
    StartTime = (Get-Date).AddHours(-6)
} | Sort-Object TimeCreated |
    Select-Object TimeCreated,LogName,Id,ProviderName,LevelDisplayName,Message

Sort by TimeCreated ascending and read the output chronologically. The first anomaly in the timeline is almost always more useful than the loudest event.
Many operators go straight to the loudest Critical event and work from there. The real cause is usually something quieter that appeared several minutes earlier. This is the single most common mistake in Windows Server event log troubleshooting: loudest is not earliest.
Unexpected Reboots and Shutdowns
Event 41 — Kernel-Power (System log, Critical)
Event 41 means the previous shutdown was not clean. Windows logs it on the next boot when it finds no evidence that the prior session ended normally. It is a post-hoc signal — and it does not identify root cause.
The EventData fields inside Event 41 are where Windows Server event log troubleshooting for stability incidents actually starts. Open the Details tab in Event Viewer (XML view) and read BugcheckCode first.
| Event 41 Field | Meaning | Next Step |
|---|---|---|
| BugcheckCode non-zero | A Stop error (BSOD) occurred | Convert decimal to hex, check Event 1001, analyze C:\Windows\MEMORY.DMP |
| BugcheckCode = 0, PowerButtonTimestamp non-zero | Physical power button was held | Confirm who or what triggered it |
| All values zero | Windows could not record the failure | Check UPS, BMC/iDRAC/iLO hardware logs, Event 46, thermals, RAM, host or hypervisor logs |
Event 41 with all-zero values is routinely blamed on the power supply. A server that froze and was hard-reset, a VM that the hypervisor killed, or a system that could not write a crash dump because of Event 46 (volmgr dump init failed) will produce the same all-zero Event 41. Check Event 46, BMC sensor logs, and memory diagnostics before ordering hardware.
The BugcheckCode value is recorded in decimal. Event 41 showing 159 converts to 0x9F — that is DRIVER_POWER_STATE_FAILURE, not a power supply fault. Convert the code and look it up before drawing conclusions. Microsoft’s documentation on unexpected reboots covers the BugcheckCode field in detail.
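As a minimal sketch of the conversion described above, the helper below formats a decimal BugcheckCode as a hex stop code. The function name Convert-BugcheckCode is hypothetical, and the name table is illustrative only: it holds just the one code this guide mentions, so anything else still needs a lookup in Microsoft's bug check reference.

```powershell
# Hypothetical helper: format the decimal BugcheckCode from Event 41 as a hex stop code.
function Convert-BugcheckCode {
    param([Parameter(Mandatory)][uint32]$Decimal)

    $hex = '0x{0:X}' -f $Decimal

    # Illustrative table only: it contains just the code discussed in this guide.
    $known = @{ '0x9F' = 'DRIVER_POWER_STATE_FAILURE' }
    $name  = $known[$hex]

    [pscustomobject]@{
        Decimal = $Decimal
        Hex     = $hex
        Name    = if ($name) { $name } else { 'not in table, look it up' }
    }
}

# The example from the paragraph above: decimal 159
Convert-BugcheckCode -Decimal 159
```

A BugcheckCode of 0 comes back as 0x0, which is the all-zero case from the table above rather than a stop code.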
Reboot and Shutdown Timeline Correlation
A clean, planned reboot leaves a specific footprint in the System log. An unexpected one does not. Recognizing the difference is fundamental to Windows Server event log troubleshooting for stability problems.
| Event ID | Source | Meaning |
|---|---|---|
| 1074 | User32 | A process or user initiated a planned shutdown or restart — includes who and why |
| 6006 | EventLog | Event Log service stopped — the clean shutdown marker |
| 6005 | EventLog | Event Log service started — system is booting |
| 6008 | EventLog | Previous shutdown was unexpected — logged on next boot when 6006 is absent |
| 6013 | EventLog | Daily uptime report in seconds — confirms whether the box rebooted |
| 1001 | BugCheck / WER | System bugcheck record — paired with Event 41 when a Stop code was recorded |
Normal clean reboot pattern: 1074 → 6006 → [down] → 6005 → 6013. If you see 6005 with no preceding 6006, the box went down uncleanly — look for 6008 and 41 to confirm.
Event 6008 has a documented false positive: a forced remote shutdown via shutdown.exe against a locked or screensaver-locked session may log 6008 even though the action was intentional. Correlate with 1074 before treating it as an incident.
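The 6005/6006 rule above can be sketched as a small classifier. This is an illustration, not a complete reboot analyzer: the function name Get-BootClassification is hypothetical, the sample records are synthetic, and it only looks at Id and TimeCreated. It deliberately ignores 1074, 6008, and 41, which still need to be correlated by hand.

```powershell
# Hypothetical helper: label each boot (6005) by whether a clean-shutdown
# marker (6006) was logged since the previous boot.
function Get-BootClassification {
    param([Parameter(Mandatory)][object[]]$Records)

    $sawCleanStop   = $false
    $sawEarlierBoot = $false

    foreach ($r in ($Records | Sort-Object TimeCreated)) {
        if ($r.Id -eq 6006) {
            $sawCleanStop = $true
        }
        elseif ($r.Id -eq 6005) {
            if ($sawCleanStop)       { $verdict = 'clean' }
            elseif ($sawEarlierBoot) { $verdict = 'unclean' }
            else                     { $verdict = 'unknown' }  # history starts before the window

            [pscustomobject]@{ BootTime = $r.TimeCreated; Shutdown = $verdict }

            $sawCleanStop   = $false
            $sawEarlierBoot = $true
        }
    }
}

# Synthetic sample data: boot, clean stop, boot, then a boot with no 6006 before it
$t = Get-Date '2024-01-01T00:00:00'
$sample = @(
    [pscustomobject]@{ Id = 6005; TimeCreated = $t }
    [pscustomobject]@{ Id = 6006; TimeCreated = $t.AddHours(1) }
    [pscustomobject]@{ Id = 6005; TimeCreated = $t.AddHours(2) }
    [pscustomobject]@{ Id = 6005; TimeCreated = $t.AddHours(3) }
)
Get-BootClassification -Records $sample
```

On a live server the same function could be fed from Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 6005,6006 }. A boot labeled unclean is a prompt to look for 6008 and 41 around that time, not a verdict.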
Event 1001 — Two Different Events, Same ID
Event 1001 appears in two logs with completely different meanings. In the System log with source BugCheck: a Stop error was recorded — includes the hex code and the dump path. In the Application log with source Windows Error Reporting: a WER fault-bucket record for an application crash. Do not confuse them — the System 1001 means the OS crashed; the Application 1001 means a user-mode process crashed.
Service Control Manager Failures
SCM events tell you a service crashed, timed out, or failed to start. They do not tell you why. Pair them with Application log Event 1000 and any driver or application-specific logs around the same timestamp.
| Event ID | Level | Meaning |
|---|---|---|
| 7000 | Error | Service failed to start — often error 1053 (did not respond in time) |
| 7001 | Error | Dependency service failed to start — error 1068 |
| 7009 | Error | 30-second timeout waiting for the service to connect at startup |
| 7011 | Error | 30-second timeout waiting for a transaction response mid-operation |
| 7031 | Error | Service crashed — recovery action IS configured, will execute |
| 7034 | Error | Service crashed — no recovery action configured, no remediation |
| 7036 | Information | Service entered running or stopped state — high-volume noise, not an error |
| 7045 | Information | A new service was installed — includes ImagePath and AccountName for triage |
Events 7031 and 7034 share the same opening line: “The [service] service terminated unexpectedly. It has done this [X] time(s).” The difference is what follows. Event 7031 appends the recovery action and delay. Event 7034 stops there: no recovery configured means no automatic remediation. This distinction matters when doing Windows Server event log troubleshooting for service availability incidents.
- Read the error code in the Event 7000 description — common values: 1053 (timeout), 1067 (process terminated), 1069 (logon failure), 1920 (file access denied).
- If Event 7001 accompanies it, identify the dependency service that failed first — that is the real failure point.
- Check whether the service account password expired or the “Log on as a service” right was revoked.
- On overloaded or storage-starved hosts, 7009/7011 timeouts appear. The default SCM timeout is 30 seconds (30,000 ms) and can be overridden with the ServicesPipeTimeout DWORD value, in milliseconds, under HKLM\SYSTEM\CurrentControlSet\Control. Treat a slow-starting service as the real problem, not the timeout itself.
- Confirm the service binary path exists and the account has Read/Execute on it.
Event 7045 (new service installed) is worth alerting on in production. A new auto-start service running from C:\Temp or C:\Users\ under LocalSystem is a red flag. Services installed by directly writing registry keys do not generate a 7045 — that gap matters for forensic completeness.
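The red-flag rule above can be written as a simple predicate. This is a heuristic sketch under stated assumptions: the function name Test-SuspiciousServicePath is hypothetical, and the only locations and account it checks are the ones named in this guide (C:\Temp, C:\Users\, LocalSystem). A real alerting rule would need a broader list and an allowlist.

```powershell
# Hypothetical helper: flag a 7045 record whose binary runs from C:\Temp or
# C:\Users\ under LocalSystem. Heuristic only; the paths are this guide's examples.
function Test-SuspiciousServicePath {
    param(
        [Parameter(Mandatory)][string]$ImagePath,
        [Parameter(Mandatory)][string]$AccountName
    )

    # Allow leading whitespace and an optional opening quote before the drive letter.
    $userWritable = $ImagePath -match '^\s*"?C:\\(Temp|Users)\\'
    $isSystem     = $AccountName -eq 'LocalSystem'

    return ($userWritable -and $isSystem)
}

# Made-up example values, not taken from a real event
Test-SuspiciousServicePath -ImagePath 'C:\Temp\updater.exe' -AccountName 'LocalSystem'
Test-SuspiciousServicePath -ImagePath '"C:\Program Files\Vendor\agent.exe" -k' -AccountName 'LocalSystem'
```

On a live system, ImagePath and AccountName would come from the EventData of each 7045 record. As noted above, services created by writing registry keys directly never produce a 7045, so this check cannot see them.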
Disk, NTFS, and Storage Timeout Events
Storage events form a causal chain. In Windows Server event log troubleshooting for disk problems, catching events 129 and 153 early is prevention. Seeing events 55 and 98 means you are already in recovery territory.
- Single Event 51 tied to removable media or a USB removal — Microsoft documents this as potentially harmless. Ignore if isolated and non-recurring.
- Recurring Event 51 on a fixed system disk — treat as a real I/O issue. Back up immediately, check SMART counts, inspect cables and controller.
- Events 129 or 153 recurring — storage timeouts and retries. Treat as an early-warning incident. Investigate the storage path before corruption occurs.
- Event 55 or 98 following 129/153: NTFS corruption or dirty volume. Schedule repair with chkdsk /r or Repair-Volume in a maintenance window.
- Event 7 recurring with non-zero SMART reallocated or pending sector counts: the disk is failing. Back up and replace.
| Event ID | Source | Meaning |
|---|---|---|
| 7 | disk | Bad block detected on the device |
| 11 | disk / controller driver | Driver detected a controller error — cable, backplane, or HBA issue |
| 51 | Disk | Error during paging operation — can be noise on removable media or a real fixed-disk issue |
| 55 | Ntfs | File system structure is corrupt — run chkdsk; check VSS shadows if chkdsk comes back clean |
| 98 | Ntfs | Volume is flagged dirty — requires offline chkdsk; verify with fsutil dirty query C: |
| 129 | storahci / storvsc / HBA driver | Storport reset issued — storage stopped responding long enough to trigger a reset |
| 153 | disk | I/O operation retried — miniport-level timeout complement to Event 129 |
Event 51 generates the most panic calls relative to its actual severity. Microsoft’s guidance on data corruption and disk errors is explicit that Event 51 is logged on buffered I/O — including removable media events — and that a single occurrence can have no harmful effect. The test is whether it recurs on a fixed disk alongside other storage events.
The 129→153→55→98 chain is the signature of a storage subsystem that is failing or overloaded. Operators who catch 129/153 early have time to act. Operators who ignore them until 55/98 appear are already in a repair window.
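To tell an isolated event from a building pattern, it can help to summarize storage events per Event ID. The sketch below is one way to do that. The function name Get-StorageEventSummary and the threshold of 3 are assumptions for illustration, and the sample records are synthetic.

```powershell
# Hypothetical helper: count storage events per Event ID and flag recurring ones.
function Get-StorageEventSummary {
    param(
        [Parameter(Mandatory)][object[]]$Records,
        [int]$RecurringThreshold = 3   # assumed cutoff, tune per environment
    )

    $Records | Group-Object Id | ForEach-Object {
        # @() keeps a single record as a one-element array so indexing is safe.
        $ordered = @($_.Group | Sort-Object TimeCreated)
        [pscustomobject]@{
            Id        = [int]$_.Name
            Count     = $_.Count
            First     = $ordered[0].TimeCreated
            Last      = $ordered[-1].TimeCreated
            Recurring = ($_.Count -ge $RecurringThreshold)
        }
    } | Sort-Object First
}

# Synthetic sample: three 129 resets and one 153 retry
$t = Get-Date '2024-01-01T00:00:00'
$sample = @(
    [pscustomobject]@{ Id = 129; TimeCreated = $t }
    [pscustomobject]@{ Id = 153; TimeCreated = $t.AddMinutes(1) }
    [pscustomobject]@{ Id = 129; TimeCreated = $t.AddHours(2) }
    [pscustomobject]@{ Id = 129; TimeCreated = $t.AddHours(5) }
)
Get-StorageEventSummary -Records $sample
```

Sorting by the first occurrence keeps the output in the same order as the causal chain, so the earliest warning appears at the top.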
# Quick storage health sweep across relevant Event IDs
Get-WinEvent -FilterHashtable @{
LogName = 'System'
ID = 7,11,51,55,98,129,153
Level = 1,2,3
} | Sort-Object TimeCreated | Format-Table TimeCreated,Id,ProviderName,LevelDisplayName -Auto
# Check whether a volume is flagged dirty
fsutil dirty query C:

Memory Pool Exhaustion: Events 2019 and 2020
Events 2019 and 2020 are logged by the Server service (srv.sys) when the nonpaged pool (2019) or paged pool (2020) runs out. The Server service is almost never the culprit — it is the first component to fail an allocation from a pool that a leaking kernel driver has been consuming for hours or days.
Symptoms look like general server sluggishness, followed by the server stopping all network responses while still responding to ping. A reboot clears the symptom temporarily, and then the leak resumes. Windows Server event log troubleshooting for pool exhaustion requires poolmon, because Event Viewer alone cannot identify which driver holds the memory.
- Open poolmon.exe (from the Windows Driver Kit or Windows Assessment and Deployment Kit).
- Press P to toggle to the pool type showing the high usage (nonpaged for 2019, paged for 2020).
- Press B to sort by bytes consumed. The four-character tag at the top is the likely offender.
- Map the tag to a driver using findstr /s [TAG] C:\Windows\System32\drivers\*.sys or check pooltag.txt in the WDK triage folder.
- Update, roll back, or replace the driver identified by the tag.
The nonpaged pool is RAM that the kernel cannot page to disk — it must stay resident. Third-party network, storage, and filter drivers are the most common leak source. Patching or replacing the offending driver is the only real fix. The full poolmon workflow — including tag-to-driver mapping and common leaking drivers — is covered in the Windows Server performance troubleshooting guide.
Windows Update Events: 19, 20, and 21
Windows Update events live in the Microsoft-Windows-WindowsUpdateClient/Operational log, not the System log. Event 19 is a successful install. Event 20 is a failure. Event 21 signals a restart is required.
# Pull update history: success, failure, restart-required
Get-WinEvent -FilterHashtable @{
LogName = 'Microsoft-Windows-WindowsUpdateClient/Operational'
ID = 19,20,21
} | Select-Object TimeCreated,Id,Message | Sort-Object TimeCreated

The hex error code in Event 20 is the useful part. Common codes on Windows Server: 0x800F0922 (CBS corruption: run DISM /Online /Cleanup-Image /RestoreHealth, then SFC /scannow); 0x80070643 (.NET install failure); 0x80070103 (driver already at an equal or newer version, usually correct behavior rather than an error). For deeper failures, the authoritative source is %windir%\Logs\CBS\CBS.log.
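A small sketch of pulling that hex code out of the event text. The function name Get-UpdateErrorCode is hypothetical, the notes table repeats only the three codes listed above and is not an exhaustive reference, and the sample message is made up for illustration rather than copied from a real log.

```powershell
# Hypothetical helper: extract the first 8-digit hex error code from an Event 20
# message and attach this guide's note for it, if there is one.
function Get-UpdateErrorCode {
    param([Parameter(Mandatory)][string]$Message)

    # Notes copied from this guide, not from an exhaustive Microsoft list.
    $notes = @{
        '0x800F0922' = 'CBS corruption: DISM /Online /Cleanup-Image /RestoreHealth, then SFC /scannow'
        '0x80070643' = '.NET install failure'
        '0x80070103' = 'Driver already at an equal or newer version'
    }

    if ($Message -match '0x[0-9A-Fa-f]{8}') {
        # Normalize to 0x + uppercase digits so output is consistent.
        $code = '0x' + $Matches[0].Substring(2).ToUpper()
        [pscustomobject]@{ Code = $code; Note = $notes[$code] }
    }
}

# Made-up message text for illustration
Get-UpdateErrorCode -Message 'Installation Failure: sample update failed with error 0x800f0922'
```

A code that is not in the table comes back with an empty Note, which is the cue to check CBS.log.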
Many consumer guides suggest disabling Fast Startup, changing power plans, or updating audio drivers. On Windows Server, treat that advice as irrelevant unless the component is actually in the failure path.
Application Crashes: Events 1000 and 1001
Event 1000 (source: Application Error) is the standard application crash record — includes the faulting process, faulting module, and exception code. The faulting module is frequently a shared Windows DLL (ntdll.dll, kernelbase.dll) rather than the real culprit. Those DLLs are victims of invalid memory access, not the source of it.
Exception code 0xc0000005 is an access violation — the most common crash code and not diagnostic on its own. The faulting module offset and a process dump are needed to identify the real cause.
To capture a dump automatically on future crashes:
# Configure WER to save process dumps for a specific application
# Replace app.exe with the crashing process name
$regPath = 'HKLM:\SOFTWARE\Microsoft\Windows\Windows Error Reporting\LocalDumps\app.exe'
New-Item -Path $regPath -Force
Set-ItemProperty -Path $regPath -Name DumpFolder -Value 'C:\CrashDumps'
Set-ItemProperty -Path $regPath -Name DumpType -Value 2 # Full dump
Set-ItemProperty -Path $regPath -Name DumpCount -Value 5

Cleared Logs and Tampering: Events 104 and 1102
Event 1102 appears in the Security log whenever the audit log is cleared. It always records the account that cleared it, regardless of audit policy settings. Event 104 is the equivalent signal for the other classic logs: it is written to the System log when the System or Application log is cleared.
There is rarely a legitimate operational reason to manually clear the Security log on a production server. A 1102 you did not initiate, especially near a 6008 or other anomalies, should be treated as a tamper signal until confirmed otherwise. Clearing Security logs is a standard post-compromise step, and this is where Windows Server event log troubleshooting intersects with incident response.
A 1102 event on a domain controller during an active incident investigation is not always an admin clearing logs for tidiness. Without centralized Windows Event Forwarding, the evidence of what happened before the clear is already gone from the local system. Log centralization is a pre-incident requirement, not a post-incident nice-to-have.
Windows Server Event Log Troubleshooting: Commands Operators Actually Use
Get-WinEvent -FilterHashtable is the operator default for interactive triage. wevtutil is the tool for exports, log configuration, batch scripts, and Server Core. Both use the same underlying XPath query layer.
The critical performance point: use FilterHashtable or FilterXPath to push predicates into the event log engine. Piping to Where-Object retrieves every event first and filters after — dramatically slower on large logs. Microsoft’s FilterHashtable documentation covers the full syntax.
# Last 24h Critical and Error events across System and Application
Get-WinEvent -FilterHashtable @{
LogName = 'System','Application'
Level = 1,2
StartTime = (Get-Date).AddDays(-1)
} | Sort-Object TimeCreated |
Format-Table TimeCreated,Id,ProviderName,LevelDisplayName -Auto
# Kernel-Power 41 events: lists occurrences; BugcheckCode is in EventData (XML), not in Message
Get-WinEvent -FilterHashtable @{
ProviderName = 'Microsoft-Windows-Kernel-Power'
Id = 41
} | Select-Object TimeCreated,Message
# Storage events — sweep for any Level 1-3 disk/NTFS events
Get-WinEvent -FilterHashtable @{
LogName = 'System'
ID = 7,11,51,55,98,129,153
Level = 1,2,3
} | Format-Table TimeCreated,Id,ProviderName,LevelDisplayName -Auto
# Export System log before investigation (run first, always)
wevtutil epl System C:\IR\System_%COMPUTERNAME%_pre.evtx
# Query Event 41 from command line — useful on Server Core
wevtutil qe System /q:"*[System[(EventID=41)]]" /f:text /rd:true /c:10
# Check whether a volume is dirty (companion to Event 98)
fsutil dirty query C:

To read EventData fields from the XML, which is useful for extracting BugcheckCode programmatically:
$evt = Get-WinEvent -FilterHashtable @{ ProviderName='Microsoft-Windows-Kernel-Power'; Id=41 } |
Select-Object -First 1
([xml]$evt.ToXml()).Event.EventData.Data

Use Get-WinEvent -Path C:\IR\System_pre.evtx to query an exported archive file the same way you would query a live log.
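The EventData read shown above returns a list of Data nodes. As a sketch, the helper below turns that list into a name-to-value lookup so individual fields can be read directly. The function name ConvertFrom-EventDataXml is hypothetical, and the sample XML is a trimmed, synthetic stand-in for what ToXml() returns, with only two fields.

```powershell
# Hypothetical helper: turn the <EventData><Data Name="..."> nodes of an event's
# XML into an ordered name -> value lookup.
function ConvertFrom-EventDataXml {
    param([Parameter(Mandatory)][string]$Xml)

    $fields = [ordered]@{}
    foreach ($d in ([xml]$Xml).Event.EventData.Data) {
        # Name is the attribute; '#text' is the element's inner text.
        $fields[$d.Name] = $d.'#text'
    }
    $fields
}

# Synthetic, trimmed sample. A real event carries many more fields.
$sample = @'
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <EventData>
    <Data Name="BugcheckCode">159</Data>
    <Data Name="PowerButtonTimestamp">0</Data>
  </EventData>
</Event>
'@

$data = ConvertFrom-EventDataXml -Xml $sample
$data['BugcheckCode']
```

With a real event, pass $evt.ToXml() as the -Xml argument. Values come back as strings, so cast to a number before comparing against zero.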
Event Viewer Custom Views
Custom Views are the standing dashboards that make Windows Server event log troubleshooting repeatable. Build them once and export the XML to deploy across the fleet, so nobody has to rebuild filters during an incident.
Three views worth having as defaults:
Last 24h Errors and Critical — aggregates Critical and Error events across System and Application logs with a 24-hour time filter. This is the “what’s wrong with this box right now” first look. The built-in Administrative Events view does the same thing but includes Warnings, which generates significant noise on most production servers.
Storage health — System log, Sources disk and Ntfs and the HBA driver, Event IDs 7, 11, 51, 55, 98, 129, 153. Leave the time filter open to capture history. A single glance shows whether storage events are isolated or building into a pattern. For the full diagnostic sequence when these events appear, see Windows Server Storage Troubleshooting: Disk Errors, CHKDSK, and Knowing When a Drive Is Dying.
Service failures — System log, Source Service Control Manager, Event IDs 7000, 7001, 7009, 7011, 7031, 7034, 7045. Filters out the 7036 noise (service state changes) that makes raw SCM filtering hard to read.
Build a filter in the GUI, then switch to the XML tab to copy the generated XPath. That same XPath works directly in wevtutil and Get-WinEvent -FilterXPath.
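For simple Event ID filters, the XPath can also be generated rather than copied from the GUI. The sketch below builds a query in the same shape as the wevtutil example earlier in this guide. The function name New-EventIdXPath is hypothetical, and it covers only the Event ID predicate, not providers, levels, or time ranges.

```powershell
# Hypothetical helper: build an XPath query that matches any of the given Event IDs.
function New-EventIdXPath {
    param([Parameter(Mandatory)][int[]]$Id)

    $clauses = $Id | ForEach-Object { "EventID=$_" }
    '*[System[({0})]]' -f ($clauses -join ' or ')
}

# Service-failure view from this guide, minus the noisy 7036
$xpath = New-EventIdXPath -Id 7000,7001,7009,7011,7031,7034,7045
$xpath
```

The resulting string can be passed to Get-WinEvent -LogName System -FilterXPath $xpath, or placed inside the quotes of wevtutil qe System /q:"...".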
Windows Event Forwarding Baseline
For a single server, Event Viewer and PowerShell are enough. For multiple servers, default local log sizes mean events disappear before you look at them, and a cleared local log (Event 1102) removes your evidence. Windows Server event log troubleshooting across a fleet is only reliable when logs survive local rollover and tampering.
Windows Event Forwarding (WEF) collects events from source machines and writes them to a central Windows Event Collector (WEC) server. Events survive local rollover and local log clearing because they already exist on the collector.
The source-initiated model scales better for more than a handful of machines. Configure it via Group Policy: Computer Configuration → Administrative Templates → Windows Components → Event Forwarding → Configure target Subscription Manager — set the value to Server=http://<collector-FQDN>:5985/wsman/SubscriptionManager/WEC,Refresh=60.
On the collector: run wecutil qc once to configure the service. On sources: winrm quickconfig. Transport is WinRM over port 5985 (HTTP, Kerberos-encrypted in-domain). To forward the Security log, add NETWORK SERVICE to the Event Log Readers group on each source machine.
On any multi-server environment where log history matters for troubleshooting or compliance, WEF is not optional — it is the difference between having evidence and reconstructing from memory.
Recommended Operator Baseline
Before the next incident, raise log sizes. The classic System, Application, and Security logs default to 20,480 KB (20 MB). On a busy Windows Server, the System log overwrites itself in days. The CIS Benchmark and DISA STIG require the Security log at 196,608 KB or higher. Set these via Group Policy: Computer Configuration → Administrative Templates → Windows Components → Event Log Service.
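On a standalone server where Group Policy is not managing log sizes, wevtutil sl can set the limit directly. Its /ms option takes bytes, while the figures above are in KB, so the sketch below does the conversion and prints the command for review instead of running it. The function name Get-SetLogSizeCommand is hypothetical.

```powershell
# Hypothetical helper: build the wevtutil command that sets a log's maximum size.
# The command is returned as text so it can be reviewed before it is run.
function Get-SetLogSizeCommand {
    param(
        [Parameter(Mandatory)][string]$LogName,
        [Parameter(Mandatory)][long]$MaxSizeKB
    )

    # wevtutil sl expects the maximum size in bytes via /ms
    'wevtutil sl {0} /ms:{1}' -f $LogName, ($MaxSizeKB * 1024)
}

# Security log at the 196,608 KB figure cited above
Get-SetLogSizeCommand -LogName Security -MaxSizeKB 196608
```

Run the printed command from an elevated prompt, then confirm the result with wevtutil gl Security. Where a GPO sets the log size, expect the policy value to win.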
Export before you touch. The wevtutil epl command takes seconds. Make it the first step in every investigation, before restarting services, applying patches, or rebooting.
Alert on Event 1102 and 104. A cleared Security or System log on a production server is rare under normal operations. Alerting on these events — via Task Scheduler, WEF subscription filter, or a monitoring tool — costs almost nothing and surfaces both authorized maintenance and potential tampering.
Build three Custom Views and deploy them across the fleet. Last-24h Critical+Error, storage health, and service failures. Export the XML and push via GPO or startup script.
Effective Windows Server event log troubleshooting is not a reactive skill. It depends on infrastructure you put in place before things break: baseline log sizes, centralized collection, and standing Custom Views that make the first 5 minutes of any incident faster. For sustained CPU, memory, and disk analysis beyond what event logs show, see Windows Server Performance Troubleshooting: A Triage Workflow. For disk-related events that escalate to a storage failure requiring recovery, see Windows Server Backup and Bare Metal Recovery with wbadmin.
Frequently Asked Questions
What does Windows Server event log troubleshooting look like for Event ID 41?
Start by reading the BugcheckCode field inside the event (Details tab, XML view). Non-zero means a BSOD occurred — convert the decimal value to hex and check Event 1001 for the dump path. All zeros means Windows could not record the failure: check UPS, BMC/iDRAC/iLO logs, Event 46, thermals, and RAM before assuming a power supply fault. Then correlate with Events 6008, 1074, and 6006 to build the shutdown timeline.
What is the difference between Event ID 7031 and 7034?
Both mean a service terminated unexpectedly. Event 7031 means a recovery action is configured in the Services snap-in Recovery tab — SCM will restart the service, run a program, or reboot after the configured delay. Event 7034 means no recovery action is configured and no automatic remediation will happen. The event text looks identical until you read whether a corrective action is mentioned.
Is Event ID 51 always a sign of a failing disk?
No. A single Event 51 can be completely harmless, particularly when it occurs after inserting blank optical media or removing a USB device without safe removal. Microsoft documents this explicitly. Recurring Event 51 on a fixed internal disk — especially alongside Events 7, 11, 55, 98, 129, or 153 — should be treated as a real storage issue. The pattern matters, not the event in isolation.
What should I check first when a Windows Server reboots unexpectedly?
Export the System log immediately, then look for the reboot timeline cluster: 6005 (boot), 6006 (clean shutdown marker — check whether it appears before the last 6005), 6008 (unexpected shutdown), and 41 (with BugcheckCode). If 6006 is missing before 6005, the shutdown was unclean. Event 1074 confirms a planned shutdown when present. Work backward from the first anomaly in the timeline, not forward from the loudest error.
Why does Get-WinEvent perform better than Get-EventLog?
Get-WinEvent with -FilterHashtable or -FilterXPath pushes the query into the Windows event log engine and filters before returning results. Get-EventLog (deprecated) and Get-WinEvent piped to Where-Object both retrieve all events first and filter afterward — significantly slower on large logs. On a System log with months of data, the difference is seconds versus minutes.
When should I use wevtutil instead of Get-WinEvent?
Use wevtutil for log exports (epl), log configuration (sl / gl), and in Server Core environments or batch scripts where PowerShell is limited. Use Get-WinEvent for interactive triage, object-based output, and remote queries via -ComputerName. Both support the same XPath query syntax — the same filter works in either tool.