Windows Server Storage Troubleshooting: Disk Errors and CHKDSK (2026)

TL;DR

Event ID 7 (bad block) means physical media damage – run SMART immediately, do not wait for CHKDSK
Event ID 51 is logged on buffered I/O only – a single occurrence after a USB removal or media change is not a dying disk
Event IDs 129 and 153 are different: 129 is an adapter-level reset, 153 is a disk-level timeout – the distinction matters before chasing hardware
Check SMART before running CHKDSK – a drive showing Predictive Failure should be imaged, not scanned
Get-PhysicalDisk | Select FriendlyName, HealthStatus, OperationalStatus is the fastest SMART triage on directly attached disks
CHKDSK is a file system tool, not a hardware repair tool – a clean pass on a failing drive only buys time
On hardware RAID systems, Windows event log shows symptoms only – the RAID console gives the actual diagnosis

A disk throwing errors is not always a dying disk. Event ID 51 fires after a USB removal. CHKDSK completes with zero errors on a drive that fails completely two weeks later. The hard part of Windows Server storage troubleshooting is not running the tools – it is interpreting what they tell you and making the right call: investigate further, schedule replacement, or image and replace immediately.

This Windows Server storage troubleshooting guide covers the event IDs that matter for disk health, how to read them correctly, SMART verification with Get-PhysicalDisk, the CHKDSK workflow and its limits, and the judgment calls operators get wrong most often. For network-related storage issues such as iSCSI path failures, see the Windows Server Network Troubleshooting article. For performance counters related to disk latency, see Windows Server Performance Troubleshooting.

Figure: Windows Server storage troubleshooting decision tree – Event ID triage flowchart for disk errors, CHKDSK, and SMART on Windows Server 2025

The Judgment Call at the Core of Windows Server Storage Troubleshooting

Most Windows Server storage troubleshooting guides stop at “run CHKDSK /r and reboot.” That advice is incomplete – and in some cases actively harmful.

CHKDSK repairs file system structures. It can mark bad sectors and recover readable data from them. What it cannot do is fix a drive with deteriorating platters, a failing read head, or growing reallocated sector counts. A CHKDSK pass that completes with zero errors means the file system is intact. It says nothing about whether the drive will survive the next six months.

The judgment call is: distinguish between a file system problem (CHKDSK fixes it) and a hardware problem (image and replace). The event IDs and SMART data tell you which one you are dealing with. Running CHKDSK on a hardware-failing drive wastes the time you could have used to copy the data off.

Event IDs for Windows Server Storage Troubleshooting

Windows logs disk and storage errors in the System event log. The source is typically disk (the disk class driver), Ntfs (file system), or a storage miniport driver running under Storport (for example storahci or stornvme). Each tells a different part of the story.

Event ID 7 – Bad Block (Source: disk)

The device, \Device\Harddisk0\DR0, has a bad block.

This is a hard read failure. The disk returned an unrecoverable read error on a specific sector. Bad blocks on a modern drive indicate physical media damage – the drive’s internal remapping has been exhausted for that area, or the drive is failing to remap fast enough. A single Event ID 7 warrants immediate SMART check. Multiple Event ID 7s across different sessions mean the drive is deteriorating. Do not wait for CHKDSK – image first.

Event ID 11 – Controller Error (Source: disk)

The driver detected a controller error on \Device\Harddisk1\DR1.

Ambiguous by design. Could be the disk controller, the cable, the HBA, or the drive firmware. First step: reseat the SATA/SAS cable and test with a different port on the controller. If Event ID 11 follows the disk to a new port – the disk is the problem. If it stays on the original port – suspect the cable or controller. Event ID 11 on a RAID controller usually means the controller is reporting the error, not the underlying disk – check the RAID event log separately.

Event ID 51 – Paging Operation Error (Source: disk)

An error was detected on device \Device\Harddisk2\DR2 during a paging operation.

This one gets misread constantly. Event ID 51 is logged on buffered I/O only – reads and writes that go through the file system cache, memory-mapped files, and paging file activity. It is not logged on nonbuffered (direct) I/O.

Practical implication: a single Event ID 51 occurrence after inserting blank media, removing a USB drive, or a brief power fluctuation is not a hardware failure. The event fires because the I/O to the paging file or a memory-mapped file was interrupted – not because the disk has bad sectors. When Event ID 51 becomes a concern: multiple occurrences on the same disk during normal operation with no recent hardware changes. At that point, treat it like Event ID 7 and run SMART verification.

Failure scenario: A burst of Event ID 51 entries appearing within a 30-second window during a known maintenance window is almost always a SAN or iSCSI path failover – the multipath driver renegotiating during a path change. These are I/O interruptions during path switch, not disk errors. Check the time correlation against any change window before escalating to a storage hardware issue.
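To make that time correlation concrete, here is a minimal sketch that groups event timestamps into bursts. Get-EventBurst is a hypothetical helper name, and the grouping rule (a new burst starts when the gap to the previous event exceeds GapSeconds) is an assumption chosen for illustration, not a Microsoft-defined threshold.

```powershell
# Hypothetical helper: groups timestamps into bursts. A new burst starts
# whenever the gap to the previous event exceeds $GapSeconds (assumed default: 30).
function Get-EventBurst {
    param(
        [Parameter(Mandatory)] [datetime[]] $Timestamps,
        [int] $GapSeconds = 30
    )
    $bursts = @()
    $start = $null; $prev = $null; $count = 0
    foreach ($t in ($Timestamps | Sort-Object)) {
        if ($null -ne $prev -and ($t - $prev).TotalSeconds -gt $GapSeconds) {
            # Gap is too large: close the current burst and start a new one
            $bursts += [pscustomobject]@{ Start = $start; End = $prev; Count = $count }
            $start = $null; $count = 0
        }
        if ($null -eq $start) { $start = $t }
        $prev = $t
        $count++
    }
    if ($count -gt 0) {
        $bursts += [pscustomobject]@{ Start = $start; End = $prev; Count = $count }
    }
    $bursts
}
```

A possible way to feed it, assuming Event ID 51 entries from the last day are of interest:

```powershell
$events = Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 51; StartTime = (Get-Date).AddDays(-1) } -ErrorAction SilentlyContinue
if ($events) { Get-EventBurst -Timestamps $events.TimeCreated }
```

One short burst that lines up with a change window points toward path failover; many small bursts scattered across normal operation point back toward SMART verification.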

Event IDs 55 and 98 – NTFS File System Corruption (Source: Ntfs)

The file system structure on the disk is corrupt and unusable. Please run the chkdsk utility on the volume.

Event ID 55 means NTFS detected a structural inconsistency – a directory entry, MFT record, or metadata structure does not match expectations. Event ID 98 reports the volume's health state – after an unclean shutdown it typically says the volume needs a scan or must be taken offline for a full CHKDSK, and it reports the volume as healthy again after a successful pass. These are file system events, not hardware events. They often follow a hard power loss or improper shutdown. CHKDSK /f is the correct response. If CHKDSK finds and fixes the issues cleanly – done. If CHKDSK reports uncorrectable errors – the underlying hardware may be the real cause.

Event IDs 129 and 153 – Storport Timeouts (Source: storage miniport driver for 129, disk for 153)

Reset to device, \Device\Scsi\storport1, was issued.                 (Event ID 129)
The IO operation at logical block address X for Disk Y was retried.  (Event ID 153)

Event ID 129 is a Storport-level reset – the storage port driver issued a reset through the adapter because a command did not complete in time. It is logged under the name of the storage miniport driver (for example storahci or stornvme), not under Storport itself. Event ID 153 is a miniport-level timeout – a specific command to a specific disk timed out or failed and the disk class driver retried it, which is why the event carries the source disk and the message says retried. The distinction matters: Event ID 129 on a RAID controller points to the controller or its bus connection. Event ID 153 on a specific disk points to that disk or its cable and port. If you see 153 on one disk and 129 on the controller in the same time window – the disk timeout triggered the controller reset. Under heavy I/O load, occasional 129/153 events on spinning disks are not immediately alarming. Sustained occurrences under light load mean the disk or controller is struggling.
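The "same time window" check can be sketched as a small pairing function. Find-ResetAfterTimeout is a hypothetical helper, and the 60-second window is an assumption for illustration; it only needs objects exposing Id and TimeCreated, which Get-WinEvent output provides.

```powershell
# Hypothetical helper: for each Event ID 129, list the Event ID 153 entries
# logged in the preceding $WindowSeconds (assumed correlation window).
function Find-ResetAfterTimeout {
    param(
        [Parameter(Mandatory)] [object[]] $Events,   # objects with Id and TimeCreated
        [int] $WindowSeconds = 60
    )
    $timeouts = @($Events | Where-Object { $_.Id -eq 153 })
    foreach ($reset in ($Events | Where-Object { $_.Id -eq 129 })) {
        $related = @($timeouts | Where-Object {
            $delta = ($reset.TimeCreated - $_.TimeCreated).TotalSeconds
            $delta -ge 0 -and $delta -le $WindowSeconds
        })
        [pscustomobject]@{
            ResetTime       = $reset.TimeCreated
            PrecedingCount  = $related.Count
            PrecedingEvents = $related
        }
    }
}
```

A reset with a nonzero PrecedingCount suggests starting with the disk named in the 153 message; a reset with none preceding it points more toward the controller or bus.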

Event ID Reference Table

Event ID	Source	What It Means	Urgency
7	disk	Hard read failure – unrecoverable bad block	High – SMART check immediately
11	disk	Controller error – disk, cable, or HBA	Medium – reseat cable, test port
51	disk	Paging I/O error – buffered I/O only	Low (single) / High (repeated)
55	Ntfs	NTFS structure corrupt – run CHKDSK /f	Medium – CHKDSK, monitor hardware
98	Ntfs	Volume health state – may request a scan or an offline CHKDSK	Low – normal after improper shutdown
129	Miniport driver (e.g. storahci, stornvme)	Adapter-level reset – controller or bus	Medium-High – check controller
153	disk	Disk-level command timeout, retried	Medium – check disk and cable

For filtering these events by time window and correlating them with other system activity, see the Windows Server Event Log Troubleshooting article. Microsoft Learn documents the full CHKDSK command reference and exit codes including switch behavior across file system types.

SMART Verification: First Step in Windows Server Storage Troubleshooting

Before running CHKDSK, check SMART status. A drive in Predictive Failure state should not get a CHKDSK – it should get imaged and replaced. This is the step most operators skip, and it is where the worst data loss scenarios begin.

# Quick health overview - run this first
Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus, Size

# Detailed reliability counters
Get-PhysicalDisk | Get-StorageReliabilityCounter |
  Select-Object DeviceId, ReadErrorsTotal, WriteErrorsTotal, Temperature, Wear

HealthStatus values:

Healthy – no issues reported
Warning – SMART thresholds crossed but not failed; schedule replacement
Unhealthy – drive is failing or has failed; do not write new data, image immediately
Unknown – disk is behind a hardware RAID controller that abstracts SMART; check the RAID management console directly

OperationalStatus values to watch:

Predictive Failure – SMART predicts imminent failure; image and replace
Lost Communication – disk disappeared from the bus; check cable and power
Removed – disk was removed or failed; offline state

Get-StorageReliabilityCounter requires the disk to be managed through Windows Storage Spaces or directly attached without a hardware RAID controller abstracting it. On systems with hardware RAID, SMART data is only accessible through the vendor management utility.

SSD and NVMe Considerations

SMART interpretation differs between spinning disks and SSDs. On SSDs, reallocated sector counts are less meaningful – SSDs handle wear leveling internally and expose different attributes. The key indicators for SSD health in Windows Server storage troubleshooting are wear level (reported as Wear in Get-StorageReliabilityCounter, where 0 means new and 100 means end of life on most implementations) and media errors rather than reallocated sectors.

Event ID 157 (disk has been surprise removed) can appear on NVMe drives under driver stress or slot instability – not always a physical removal. NVMe failures also surface through Event ID 129/153 in the System log, same as SAS/SATA. If Get-PhysicalDisk shows Unknown for an NVMe drive, the driver may not be exposing health data – check Device Manager for driver warnings first.

Vendor SMART Tools

When Windows reporting shows Unknown or when hardware RAID abstracts disk identity, vendor tools are the only path to disk-level SMART data. The relevant tools by platform: HP/HPE environments use HPE Smart Storage Administrator (SSA); Dell environments use OpenManage Server Administrator (OMSA) or the iDRAC storage page; Broadcom/LSI RAID controllers use MegaRAID Storage Manager; for directly attached consumer or enterprise SSDs, Samsung Magician and Intel MAS (Memory and Storage Tool) expose manufacturer-specific attributes that Windows does not surface. None of these replace Get-PhysicalDisk for quick triage – they supplement it when Windows reporting is limited.

CHKDSK in Windows Server Storage Troubleshooting: What It Fixes and What It Cannot

CHKDSK operates at the file system layer. It verifies and repairs MFT integrity, directory structure consistency, cluster allocation, and file system metadata. It does not repair physical media. When CHKDSK marks a bad sector, it moves the data elsewhere and flags that sector as unusable. The sector is still physically on the disk – just avoided. If the drive continues to develop bad sectors, CHKDSK will keep marking them until there is nowhere left to move data.

When to Run CHKDSK

After Event ID 55 or 98 following a power loss or improper shutdown
After a clean SMART check with no hardware error events
As part of a scheduled health check on volumes not verified recently

When NOT to Run CHKDSK

When SMART shows Predictive Failure or Unhealthy – image first
When Event ID 7 is firing repeatedly – the drive may fail during a multi-hour CHKDSK scan
When the volume is needed immediately – CHKDSK /r on a system volume requires a reboot and exclusive access

# Read-only scan - reports errors, fixes nothing (can run online)
chkdsk C:

# Fix file system errors - requires dismount or scheduled reboot for system volume
chkdsk C: /f

# Fix errors and scan for bad sectors (slow - hours on large spinning disks)
chkdsk C: /r

# Schedule on next reboot for system volume - answer Y when prompted
chkdsk C: /f /r

On a 4TB spinning disk, chkdsk /r can take 8-12 hours. On SSDs, CHKDSK /f (file system check only) is typically sufficient – CHKDSK /r on an SSD reads every sector but does not recover data the same way spinning disk recovery works.

Reading CHKDSK results after completion:

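# Online runs log under a Chkdsk provider (26214/26226); boot-time runs log under Wininit (1001)
Get-WinEvent -FilterHashtable @{ LogName = 'Application'; Id = 26214, 26226, 1001 } -ErrorAction SilentlyContinue |
  Where-Object { $_.ProviderName -match 'Chkdsk|Wininit' } |
  Select-Object -First 5 | Format-List TimeCreated, ProviderName, Message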

Or in Event Viewer: Applications and Services Logs > Microsoft > Windows > Chkdsk > Operational.

For SMART attribute definitions and OperationalStatus values, the Get-StorageReliabilityCounter reference on Microsoft Learn lists all exposed counters.

The Decision Tree: Investigate, Schedule, or Replace

When Windows Server storage troubleshooting surfaces errors, the question is always timing and risk. The signals map to three responses:

Image and replace immediately:

HealthStatus shows Unhealthy, or OperationalStatus shows Predictive Failure
Multiple Event ID 7 errors across different sessions
ReadErrorsTotal or WriteErrorsTotal climbing across daily Get-StorageReliabilityCounter checks
CHKDSK completes but reports uncorrectable errors
Event ID 11 follows the disk to a different cable and port

Investigate and monitor closely:

Single Event ID 51 with no corroborating events
Event ID 11 that does not follow the disk to a new port
SMART shows Warning but no Predictive Failure
Event ID 129/153 only under sustained heavy I/O

CHKDSK and monitor:

Event ID 55 or 98 after power loss or improper shutdown
SMART shows Healthy, no hardware error events in the log
One-time NTFS inconsistency with no repeat
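
The three lists above can be written down as a small lookup function so the call is made the same way every time. This is only a sketch: Get-DiskRecommendation and all of its parameter names are hypothetical, and two mappings are assumptions beyond the lists (a single Event ID 7 session and repeated Event ID 51 both land in "investigate", following the event ID sections earlier in the article).

```powershell
# Hypothetical helper encoding the three response lists; the most severe match wins.
function Get-DiskRecommendation {
    param(
        [string] $HealthStatus = 'Healthy',
        [string] $OperationalStatus = 'OK',
        [int]    $BadBlockSessions = 0,            # distinct sessions with Event ID 7
        [bool]   $ErrorCountersRising = $false,    # ReadErrorsTotal/WriteErrorsTotal climbing
        [bool]   $ChkdskUncorrectable = $false,
        [bool]   $Event11FollowsDisk = $false,
        [bool]   $Event11StaysOnPort = $false,
        [int]    $Event51Count = 0,
        [bool]   $TimeoutsUnderLoadOnly = $false,  # Event ID 129/153 only under heavy I/O
        [bool]   $NtfsEventAfterPowerLoss = $false # Event ID 55/98 after improper shutdown
    )
    if ($HealthStatus -eq 'Unhealthy' -or $OperationalStatus -eq 'Predictive Failure' -or
        $BadBlockSessions -gt 1 -or $ErrorCountersRising -or
        $ChkdskUncorrectable -or $Event11FollowsDisk) {
        return 'Image and replace immediately'
    }
    # Assumption: one Event ID 7 session or any Event ID 51 count means "investigate"
    if ($HealthStatus -eq 'Warning' -or $Event11StaysOnPort -or $BadBlockSessions -eq 1 -or
        $Event51Count -ge 1 -or $TimeoutsUnderLoadOnly) {
        return 'Investigate and monitor closely'
    }
    if ($NtfsEventAfterPowerLoss -and $HealthStatus -eq 'Healthy') {
        return 'CHKDSK and monitor'
    }
    'No matching signal'
}
```

For example, `Get-DiskRecommendation -HealthStatus 'Warning' -Event51Count 1` lands in the investigate bucket, while adding `-ErrorCountersRising $true` moves the same disk to image and replace.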

Failure scenario: In practice, “investigate and monitor” gets treated as “do nothing.” Monitoring means pulling SMART data on days 1, 3, and 7, watching for new Event ID 7 or 11 entries, and confirming a verified backup exists before the monitoring window closes. A drive that looks borderline-Warning on day 1 can cross into Predictive Failure by day 4. The monitoring window is not passive – it requires active checks.

What Breaks: Windows Server Storage Troubleshooting Failure Modes

CHKDSK passes cleanly but the disk dies shortly after

Run Get-PhysicalDisk | Get-StorageReliabilityCounter and check ReadErrorsTotal and WriteErrorsTotal
Pull the same counters again after 3 days and compare – a rising count is the signal CHKDSK does not expose
Check for Event ID 7 entries in the System log that may have appeared between the CHKDSK run and now
If counters are rising or Event ID 7 appears, treat as hardware failure regardless of the clean CHKDSK result
Image the volume before the next scheduled check – do not wait for Predictive Failure state
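
Comparing counters across days is easier with a saved baseline. The sketch below uses a hypothetical helper, Compare-ReliabilitySnapshot, that reports only counters that went up; it assumes DeviceId stays stable between the two snapshots, which is worth confirming on your hardware.

```powershell
# Hypothetical helper: compares two reliability snapshots (objects with DeviceId
# plus counter properties) and emits one row per counter that increased.
function Compare-ReliabilitySnapshot {
    param(
        [Parameter(Mandatory)] [object[]] $Baseline,
        [Parameter(Mandatory)] [object[]] $Current,
        [string[]] $Counters = @('ReadErrorsTotal', 'WriteErrorsTotal')
    )
    foreach ($now in $Current) {
        $before = $Baseline | Where-Object { $_.DeviceId -eq $now.DeviceId } | Select-Object -First 1
        if ($null -eq $before) { continue }   # disk not present in the baseline
        foreach ($name in $Counters) {
            # Cast handles CSV-imported strings; empty values count as 0
            $delta = [long]$now.$name - [long]$before.$name
            if ($delta -gt 0) {
                [pscustomobject]@{ DeviceId = $now.DeviceId; Counter = $name; Increase = $delta }
            }
        }
    }
}
```

One way to use it, with an example path that is not from the original workflow:

```powershell
# Day 1: save a baseline
Get-PhysicalDisk | Get-StorageReliabilityCounter |
  Select-Object DeviceId, ReadErrorsTotal, WriteErrorsTotal |
  Export-Csv C:\Temp\smart-day1.csv -NoTypeInformation

# Day 3: compare the live counters against it
$current = Get-PhysicalDisk | Get-StorageReliabilityCounter
Compare-ReliabilitySnapshot -Baseline (Import-Csv C:\Temp\smart-day1.csv) -Current $current
```

No output means no tracked counter increased; any row returned is the rising-count signal described above.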

Get-PhysicalDisk shows Unknown for all disks

Confirm whether the server uses a hardware RAID controller – Unknown is expected behavior behind HP Smart Array, Dell PERC, or Broadcom MegaRAID
On hardware RAID, open the vendor management console (HPE SSA, Dell OMSA, MegaRAID Storage Manager) for actual disk health
If the server uses direct-attached disks without RAID and Unknown is showing, check the storage driver in Device Manager for warnings
On NVMe drives showing Unknown, update the NVMe driver and retest – some inbox drivers do not expose health attributes

CHKDSK scheduled but never runs on reboot

Confirm CHKDSK was actually scheduled: reg query "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager" /v BootExecute
The value should contain autocheck autochk * – if empty or modified, AutoChk has been disabled
Restore the default value: reg add "HKLM\SYSTEM\CurrentControlSet\Control\Session Manager" /v BootExecute /t REG_MULTI_SZ /d "autocheck autochk *" /f
Reschedule CHKDSK and reboot again
If AutoChk runs but skips the volume, the dirty bit may have been cleared by another process before boot completed
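
The BootExecute check can also be done from PowerShell. The following is a minimal sketch: Test-AutoChkEnabled is a hypothetical helper that only looks for the default AutoChk entry in the multi-string value and does not validate any additional entries.

```powershell
# Hypothetical helper: returns $true if the BootExecute lines contain the
# default "autocheck autochk *" entry.
function Test-AutoChkEnabled {
    param([string[]] $BootExecute)
    [bool]($BootExecute | Where-Object { $_ -match '^\s*autocheck\s+autochk\s+\*\s*$' })
}
```

Reading the live value and testing it might look like this:

```powershell
$key = 'HKLM:\SYSTEM\CurrentControlSet\Control\Session Manager'
Test-AutoChkEnabled -BootExecute (Get-ItemProperty -Path $key).BootExecute
```

A result of False matches the "empty or modified" case above and means the default value needs restoring before rescheduling.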

Event ID 51 flood with no obvious cause

Check the time window – if all 51 events appear within 30-60 seconds, correlate with any change window or SAN/iSCSI maintenance
If iSCSI or FC multipath is in use, path failover during a switch or storage maintenance generates Event ID 51 bursts – this is expected behavior
If events are ongoing and unpredictable with no maintenance correlation, check Storport for accompanying Event ID 129/153
Run Get-PhysicalDisk | Select HealthStatus, OperationalStatus to confirm the disk is not already in Warning or Unhealthy state
If SMART is clean and 51s are isolated to a specific time pattern matching paging activity under load, check the pagefile disk for I/O saturation in Performance Monitor (Avg. Disk sec/Transfer > 25ms is the threshold for investigation)

Windows Server Storage Troubleshooting: Verification Checklist Before Escalating

Run through this Windows Server storage troubleshooting checklist and document the output before opening a hardware ticket or ordering a replacement disk. Having this data ready shortens vendor escalation significantly.

# 1. SMART health overview
Get-PhysicalDisk | Select-Object FriendlyName, HealthStatus, OperationalStatus

# 2. Reliability counters - run on days 1, 3, and 7 and compare
Get-PhysicalDisk | Get-StorageReliabilityCounter |
  Select-Object DeviceId, ReadErrorsTotal, WriteErrorsTotal, Temperature, Wear

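# 3. Disk error events - last 7 days (event IDs are only unique per provider, so show ProviderName)
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 7, 11, 51, 129, 153; StartTime = (Get-Date).AddDays(-7) } -ErrorAction SilentlyContinue |
  Select-Object TimeCreated, Id, ProviderName, Message | Sort-Object TimeCreated

# 4. NTFS events - last 7 days (wildcard matches both Ntfs and Microsoft-Windows-Ntfs)
Get-WinEvent -FilterHashtable @{ LogName = 'System'; Id = 55, 98; StartTime = (Get-Date).AddDays(-7) } -ErrorAction SilentlyContinue |
  Where-Object { $_.ProviderName -like '*Ntfs*' } |
  Select-Object TimeCreated, Id, Message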

# 5. Volume dirty status
fsutil dirty query C:

# 6. Volume health and free space
Get-Volume | Select-Object DriveLetter, FileSystemLabel, HealthStatus, SizeRemaining, Size

Scope: What This Windows Server Storage Troubleshooting Guide Does Not Cover

Hardware RAID rebuild and degraded array recovery – requires vendor-specific tooling and is out of scope here
Storage Spaces pool degradation and virtual disk repair – separate topic planned for a future article
iSCSI and SAN path troubleshooting – multipath I/O, path failover, iSCSI initiator errors: planned for a future article
Boot disk failure recovery – if the system disk is failing and Windows will not boot, see the Windows Server Boot Failure Recovery article (planned). Microsoft Learn covers disk management and volume troubleshooting for additional reference.
Windows Server Backup and restore – see the Windows Server Backup and Recovery article (planned)

Windows Server Storage Troubleshooting: Frequently Asked Questions

Can CHKDSK repair a failing hard drive?

No. In Windows Server storage troubleshooting, CHKDSK repairs file system structures – MFT entries, directory records, and cluster allocation. It can mark bad sectors as unusable and move data away from them, but it cannot fix deteriorating hardware. A clean CHKDSK pass on a failing drive means the file system is intact. It does not mean the drive is healthy.

What does Event ID 51 actually mean?

It means a buffered I/O operation to a paging file or memory-mapped file was interrupted. Because it only fires on buffered I/O and not on direct I/O, a single occurrence – especially after a USB removal or media change – is not a reliable indicator of a failing disk. Multiple occurrences on the same disk during normal operation are a different story and warrant SMART investigation.

Should I run CHKDSK before checking SMART?

No. Check SMART first. If the drive shows Predictive Failure or Unhealthy, image it before running anything. CHKDSK on a failing drive can take hours and the drive may not survive the scan. SMART takes 30 seconds and tells you whether CHKDSK is even the right tool.

What is the difference between Event ID 129 and 153?

Event ID 129 is an adapter-level reset – the Storport driver reset the entire storage adapter because a command did not complete. Event ID 153 is a disk-level timeout – a specific command to a specific disk timed out. If you see 153 on a disk and 129 on the controller in the same time window, the disk timeout triggered the controller reset. Isolate them by checking which physical disk the 153 event references.

Does Get-PhysicalDisk work on hardware RAID systems?

Partially. Get-PhysicalDisk sees the RAID logical drive, not the individual physical disks behind it. HealthStatus typically shows Unknown because the RAID controller hides the member disks' SMART data from Windows. For actual disk health on hardware RAID, use the vendor management utility – HPE SSA, Dell OMSA, or MegaRAID Storage Manager depending on the platform.

Final Thoughts

Effective Windows Server storage troubleshooting follows a clear hierarchy: event log first, SMART second, CHKDSK third. Running them in the wrong order wastes time at best and destroys data at worst.

The event IDs are precise if you read them correctly. Event ID 7 is a hardware problem. Event ID 51 might not be. Event ID 55 is a file system problem that CHKDSK can fix. The difference between those three outcomes determines whether you are rebooting for a CHKDSK pass or imaging a drive at 2 AM before it dies completely.

CHKDSK gets asked to do hardware triage. It cannot. Know the difference, run SMART before CHKDSK, and treat “image and replace” as the conservative call – not the dramatic one.
