Active Directory Health Check: Full Checklist for Domain Controllers (Windows Server 2025)

Active Directory failures rarely arrive without warning. Replication errors accumulate quietly for days before authentication stops working. DNS misconfiguration sits undetected until a DC promotion fails. SYSVOL backlogs grow in the background while Group Policy keeps applying from cache — until it doesn’t.

A structured active directory health check catches those signals before they become incidents. This checklist covers the full diagnostic surface: services, replication, DNS, SYSVOL, FSMO, time hierarchy, Kerberos, event logs, and the AD database. It’s structured by cadence so you can run the right checks at the right frequency, not the same scan every day.

TL;DR
  • Run dcdiag /q /skip:systemlog and repadmin /replsummary after every change — not just on a schedule
  • A “largest delta” under 60 minutes in repadmin /replsummary is normal — it is not a replication failure
  • Eight critical services must be running on every DC: NTDS, ADWS, DNS, Dnscache, KDC, W32Time, Netlogon, DFSR
  • On Windows Server 2025: wmic is disabled by default — replace with Get-CimInstance in any health scripts
  • Confirm KB5060842 (or a later cumulative update) is installed on WS2025 DCs; without it, a DC can pass local tests yet be unreachable on the domain network after a restart
  • The default Kerberos clock skew tolerance is 5 minutes; a larger offset between a client and a DC breaks authentication

What a Healthy Active Directory Environment Looks Like

Before running any commands, it helps to know what you’re aiming for. A healthy AD environment has a specific, verifiable state — not a vague sense that “things seem fine.”

| Area | Healthy state |
| --- | --- |
| Services | NTDS, ADWS, DNS, Dnscache, KDC, W32Time, Netlogon, DFSR all Running and set to Automatic |
| Replication | No errors in repadmin /showrepl * /errorsonly; largest delta under 60 minutes |
| SYSVOL | SYSVOL and NETLOGON shares present on every DC; no DFSR backlog; no events 2213 or 4012 |
| DNS | All SRV records registered; dcdiag /test:DNS returns no FAIL columns; no stale A records for decommissioned DCs |
| Time | PDC emulator syncs from external NTP; all DCs within ±2 minutes of each other; offset well under the 5-minute Kerberos threshold |
| FSMO | All five role holders online and reachable; all DCs agree on role placement |
| Kerberos | No Event 4769 with failure code 0x25 (clock skew); secure channel valid on all DCs |
| Event logs | No events 1311, 1864, 2042 in Directory Service log; no events 5719, 5783 in System log |
| dcdiag | dcdiag /e /c /q /skip:systemlog returns no failures |
| Database | NTDS volume ≥10% free; system state backup within policy; AD Recycle Bin enabled |

Every check in this article maps to one of those rows.

Before You Start

Required access: Domain Admins or delegated diagnostic permissions. Run all commands from an elevated PowerShell or CMD prompt on a DC. For enterprise-wide dcdiag /e, run from a DC that has network visibility to all sites.

Tools used in this active directory health check — all built into Windows Server or available via RSAT:

  • dcdiag — built into Windows Server, available via RSAT on management workstations
  • repadmin — built into Windows Server
  • netdom, nltest — built into Windows Server
  • w32tm — built into Windows Server
  • PowerShell AD module: Get-ADReplicationFailure, Get-ADReplicationPartnerMetadata — requires RSAT AD DS Tools

Failure scenario

On Windows Server 2025, confirm that KB5060842 or a later cumulative update is installed before trusting any local health check. Before this June 2025 cumulative update, WS2025 DCs could lose the domain firewall profile after a restart and become unreachable on the domain network while local diagnostic tools showed green. Verify with Get-HotFix -Id KB5060842 first. If neither it nor a later cumulative update is installed, local results are not trustworthy.

Check 1: Core Services

Every DC must be running eight services. If any are stopped, nothing else in this active directory health check matters until they’re back.

$services = 'ntds','adws','dns','dnscache','kdc','w32time','netlogon','dfsr'
Get-Service $services | Select-Object Name, Status, StartType

Expected output: Status = Running for all. StartType should be Automatic — a service set to Manual or Disabled will not survive a reboot.
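
The expected state above can be checked mechanically rather than by eye. A minimal sketch, assuming a hypothetical helper named Get-ServiceProblem (it is not part of any module); it accepts any object that exposes Name, Status, and StartType, which is what Get-Service returns on Windows PowerShell 5.1 and later:

```powershell
# Hypothetical helper: emits one line per deviation from "Running + Automatic".
# The function name and message wording are illustrative only.
function Get-ServiceProblem {
    param(
        [Parameter(Mandatory, ValueFromPipeline)]
        [object]$Service
    )
    process {
        if ("$($Service.Status)" -ne 'Running') {
            "$($Service.Name): status is $($Service.Status)"
        }
        if ("$($Service.StartType)" -ne 'Automatic') {
            "$($Service.Name): start type is $($Service.StartType)"
        }
    }
}

# On a DC:
# Get-Service ntds,adws,dns,dnscache,kdc,w32time,netlogon,dfsr | Get-ServiceProblem

# Offline demonstration with a stand-in object:
[pscustomobject]@{ Name = 'kdc'; Status = 'Stopped'; StartType = 'Manual' } | Get-ServiceProblem
```

No output from the pipeline means every service passed both conditions.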

| Service | What it does | What breaks when stopped |
| --- | --- | --- |
| NTDS | Active Directory database engine | Everything: the DC stops functioning entirely |
| ADWS | AD Web Services / PowerShell AD module | PowerShell AD cmdlets fail |
| DNS | Name resolution for clients and DCs | DC locator, domain joins, replication |
| Dnscache | DNS client cache | DC can’t resolve partner DCs by name |
| KDC | Kerberos ticket issuance | Authentication across the domain |
| W32Time | Time synchronization | Kerberos fails when skew exceeds 5 minutes |
| Netlogon | Secure channel, SYSVOL/NETLOGON shares, SRV records | Domain authentication, Group Policy |
| DFSR | SYSVOL replication between DCs | Group Policy replication stops |

If DFSR is not in use in your environment, SYSVOL is still replicating over legacy FRS. Recent Windows Server releases, including Windows Server 2025, cannot be promoted as DCs in a domain that still uses FRS for SYSVOL, so if FRS is still present the SYSVOL migration to DFSR is incomplete.

Check 2: AD Replication

Replication is the most common source of AD failures. Run this first when something feels wrong. The full repadmin command reference on Microsoft Learn covers every flag used below.

repadmin /replsummary

The “largest delta” column shows how long ago the last successful replication occurred per DC. Within a site, a delta under 60 minutes is normal because the default intra-site schedule polls partners once per hour. Operators frequently flag a 45-minute delta as a failure when it’s just the scheduler. Between sites, compare the delta against the site link replication interval (180 minutes by default) instead. A delta beyond the expected interval warrants investigation. A delta measured in many hours or days is an active failure.
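
The delta rule of thumb above can be written down as a small classifier. This is a sketch under stated assumptions: Get-ReplicationDeltaVerdict is a hypothetical helper, the 60-minute default mirrors the intra-site figure in the text, and the 240-minute failure cut-off is purely illustrative, so tune both to your topology:

```powershell
# Hypothetical helper: classifies the age of the last successful replication.
function Get-ReplicationDeltaVerdict {
    param(
        [Parameter(Mandatory)][datetime]$LastSuccess,
        [datetime]$Now = (Get-Date),
        [int]$ExpectedIntervalMinutes = 60,   # figure from the text for intra-site partners
        [int]$FailureAfterMinutes = 240       # illustrative cut-off, not a Microsoft value
    )
    $deltaMinutes = ($Now - $LastSuccess).TotalMinutes
    if ($deltaMinutes -le $ExpectedIntervalMinutes) { 'Normal' }
    elseif ($deltaMinutes -lt $FailureAfterMinutes) { 'Investigate' }
    else { 'Failure' }
}

# On a DC with the ActiveDirectory module (LastReplicationSuccess is a real property):
# Get-ADReplicationPartnerMetadata -Target $env:COMPUTERNAME |
#     ForEach-Object { Get-ReplicationDeltaVerdict -LastSuccess $_.LastReplicationSuccess }

# Offline demonstration: a partner that last replicated 45 minutes ago
Get-ReplicationDeltaVerdict -LastSuccess (Get-Date).AddMinutes(-45)
```

Passing -ExpectedIntervalMinutes explicitly keeps the same function usable for inter-site partners with longer schedules.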

repadmin /showrepl * /errorsonly

If this returns nothing, replication is healthy. If it returns errors, the error code drives the next step — 1722 (RPC unavailable, usually firewall or DNS), 8453 (access denied, permissions or secure channel), 8606/8614 (lingering objects or replication stalled past tombstone lifetime).
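
For scripted triage, the code-to-cause mapping in the previous paragraph fits in a lookup table. A minimal sketch; the variable and function names are hypothetical, and the hint text simply restates the causes listed above:

```powershell
# Hints restate the causes named in the text; extend the table as you meet other codes.
$replErrorHints = @{
    1722 = 'RPC unavailable: check firewall rules and DNS resolution of the partner'
    8453 = 'Access denied: check permissions and the secure channel'
    8606 = 'Lingering objects or replication stalled past tombstone lifetime'
    8614 = 'Lingering objects or replication stalled past tombstone lifetime'
}

# Hypothetical helper: returns the hint for a code, or a neutral fallback.
function Get-ReplErrorHint {
    param([Parameter(Mandatory)][int]$Code)
    if ($replErrorHints.ContainsKey($Code)) { $replErrorHints[$Code] }
    else { "No hint recorded for error $Code; look the code up before acting" }
}

# On a DC (LastError is a property of the objects Get-ADReplicationFailure returns):
# Get-ADReplicationFailure -Target * -Scope Forest |
#     ForEach-Object { '{0} -> {1}: {2}' -f $_.Server, $_.Partner, (Get-ReplErrorHint -Code $_.LastError) }

Get-ReplErrorHint -Code 1722
```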

repadmin /queue

A queue that doesn’t drain over time means replication is stuck, not just delayed. Use this to distinguish “slow” from “broken.”

Get-ADReplicationFailure -Target * -Scope Forest | Sort-Object FailureCount -Descending

For deeper investigation of specific replication failures, see Active Directory Replication Not Working: How to Diagnose and Fix.

Check 3: SYSVOL and NETLOGON

SYSVOL and NETLOGON must be shared and accessible on every DC. When they’re not, Group Policy stops applying — sometimes silently, because clients use cached policy until it expires.

net share

Both SYSVOL and NETLOGON must appear. If either is missing, Netlogon is likely stopped or SYSVOL hasn’t completed initial synchronization.

dfsrdiag replicationstate

Expected output: no backlog or a low queue count. A persistent backlog on a specific DC indicates a DFSR problem — dirty shutdown, database corruption, or staging quota exhaustion are the most common causes.

Get-WinEvent -LogName "DFS Replication" -MaxEvents 50 | Where-Object { $_.Level -le 3 } | Select-Object TimeCreated, Id, Message | Format-Table -AutoSize -Wrap

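Events 2213, 4012, and 5002 are the most operationally significant. Event 2213 indicates a dirty shutdown of the DFSR database with automatic recovery disabled; replication stays paused until it is resumed manually. Event 4012 means DFSR stopped replicating because this DC’s SYSVOL content has been disconnected from its partners for too long, a more severe condition that requires a non-authoritative or authoritative SYSVOL resynchronization (the DFSR counterpart of the FRS D2/D4 procedure). Event 5002 reports an error communicating with a replication partner.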

For detailed DFSR troubleshooting, see SYSVOL Replication Issues in Active Directory: DFSR Troubleshooting.

Check 4: DNS

DNS and AD are tightly coupled. Most replication errors (1722 in particular) trace back to DNS. This active directory health check covers the areas that actually fail in practice.

dcdiag /test:DNS /v

This runs seven sub-tests. Read the summary table at the end of the output — columns labeled Auth/Basc/Forw/Del/Dyn/RReg/Ext, one row per DC. A WARN means dcdiag couldn’t fully validate the configuration. Run with /v to get detail behind any flagged column.

The most common DNS failures in practice: a DC pointing only to itself for DNS (breaks name resolution after a restart before AD is fully started), missing SRV records, stale A records for decommissioned DCs, and scavenging disabled so stale records accumulate over months.

nslookup -type=SRV _ldap._tcp.dc._msdcs.yourdomain.com

If SRV records are missing, force re-registration:

nltest /dsregdns
net stop netlogon && net start netlogon

For comprehensive DNS troubleshooting, see Active Directory DNS Problems: SRV Records, Zones, and Resolution Failures.

Check 5: Time Synchronization

Kerberos authentication fails when the clock offset between a client and a DC exceeds five minutes, the default maximum tolerance. That’s a hard failure: authentication stops working, domain joins fail, and replication can break.

w32tm /query /status

The key fields: Source (should be the PDC emulator or external NTP), Last Successful Sync Time, and Poll Interval. A source of Local CMOS Clock means W32Time has lost contact with its time source — this needs immediate attention.
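
The Source check is easy to automate by parsing the command output. A sketch, assuming English-language w32tm output with a line of the form "Source: ..."; Get-W32TimeSourceVerdict is a hypothetical helper name:

```powershell
# Hypothetical helper: extracts the Source line from w32tm /query /status output
# and flags the Local CMOS Clock case described in the text.
function Get-W32TimeSourceVerdict {
    param(
        [Parameter(Mandatory)]
        [AllowEmptyString()]          # w32tm output contains blank lines
        [string[]]$StatusOutput
    )
    $pattern = '^\s*Source:\s*(.+)$'
    $line = $StatusOutput | Where-Object { $_ -match $pattern } | Select-Object -First 1
    if (-not $line) { return 'UNKNOWN: no Source line found' }

    $null = $line -match $pattern     # re-run the match to populate $Matches
    $source = $Matches[1].Trim()
    if ($source -match 'Local CMOS Clock') { "FAIL: $source" } else { "OK: $source" }
}

# On a DC:
# Get-W32TimeSourceVerdict -StatusOutput (w32tm /query /status)

# Offline demonstration with captured-style lines:
Get-W32TimeSourceVerdict -StatusOutput @('Leap Indicator: 0(no warning)', '', 'Source: Local CMOS Clock')
```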

w32tm /monitor /domain:yourdomain.com

On the PDC emulator, verify Type = NTP with an external server configured. On all other DCs, Type should be NT5DS. If the PDC emulator is syncing from CMOS or from another DC, the whole domain has no external time anchor.

w32tm /query /configuration

For time synchronization recovery procedures, see Active Directory Time Synchronization: Fix PDC Emulator, W32Time, and Kerberos Clock Skew.

Check 6: FSMO Role Holders

Every active directory health check must confirm that all five FSMO roles are held by reachable, functional DCs. If a role holder is offline, specific operations fail — new user accounts can’t be created (RID Master down), time sync breaks down (PDC Emulator down), schema changes are blocked (Schema Master down).

netdom query fsmo
nltest /dsgetdc:yourdomain.com /force
dcdiag /test:KnowsOfRoleHolders /e /v

Run the KnowsOfRoleHolders test with /e to compare knowledge across all DCs. If DCs disagree on role holders, replication is broken between them. For role management — transfer, seize, role holder failure — see FSMO Roles Active Directory: What They Do and How to Manage Them.

Check 7: Kerberos and Secure Channel

Services and event logs can all look normal while Kerberos and the secure channel are broken. These checks catch failures that don’t surface in service state.

Test-ComputerSecureChannel -Verbose
nltest /sc_verify:yourdomain.com

A broken secure channel causes sporadic authentication failures and replication access-denied errors (8453). It usually means the DC’s computer account password is out of sync.

Get-WinEvent -LogName Security -FilterXPath "*[System[(EventID=4769)]]" -MaxEvents 20 | Where-Object { $_.Message -match "0x25|0x18" }

Event 4769 with failure code 0x25 is a clock skew failure (KRB_AP_ERR_SKEW). A flood of 0x25 errors means time synchronization is broken somewhere in the chain — check the PDC emulator configuration before investigating anything else. Event 4769 with 0x18 is a bad password and points to a different problem entirely.

Check 8: Event Logs

Automated checks catch most things. Event logs surface failure patterns that commands don’t expose — especially slow-developing problems like a DC approaching tombstone lifetime.

Get-WinEvent -LogName "Directory Service" -MaxEvents 100 | Where-Object { $_.Level -le 2 } | Select-Object TimeCreated, Id, Message | Format-List

| Event ID | Source | Meaning | Action |
| --- | --- | --- | --- |
| 1311 | NTDS Replication | KCC cannot build a spanning tree; replication path broken | Investigate unreachable DC |
| 1864 | NTDS Replication | DC hasn’t replicated in 14+ days; approaching tombstone lifetime | Investigate immediately |
| 2042 | NTDS Replication | DC hasn’t replicated past tombstone lifetime; lingering objects likely | Urgent: replication blocked |
| 5719 | NETLOGON | DC can’t locate a DC at startup; DNS or timing issue | Check DNS and startup sequence |
| 5783 | NETLOGON | Secure channel failure | Run nltest /sc_verify |
| 40960 | LSASRV | Kerberos authentication failure; time skew or KDC unreachable | Check W32Time and KDC service |

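Events 1864 and 2042 require immediate action. A DC that hasn’t replicated within the tombstone lifetime (180 days in forests created on Windows Server 2003 SP1 or later, 60 days where the value was never set) is blocked from replicating by default, because the data divergence risk is too high for AD to reconcile automatically.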
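
The threshold that matters here is the forest’s actual tombstone lifetime, which is stored in the tombstoneLifetime attribute of the Directory Service object in the configuration partition. A minimal sketch for reading it; Get-EffectiveTombstoneLifetime is a hypothetical helper, and the 60-day fallback reflects how AD treats an unset attribute:

```powershell
# Hypothetical helper: resolves the effective tombstone lifetime in days.
# An unset attribute ($null) means AD falls back to 60 days.
function Get-EffectiveTombstoneLifetime {
    param([Nullable[int]]$AttributeValue)
    if ($null -eq $AttributeValue) { 60 } else { $AttributeValue }
}

# On a DC with the ActiveDirectory module:
# $cfg = (Get-ADRootDSE).configurationNamingContext
# $obj = Get-ADObject "CN=Directory Service,CN=Windows NT,CN=Services,$cfg" -Properties tombstoneLifetime
# Get-EffectiveTombstoneLifetime -AttributeValue $obj.tombstoneLifetime

# Offline demonstration: attribute never set
Get-EffectiveTombstoneLifetime -AttributeValue $null
```

Comparing this number with the oldest LastReplicationSuccess in the forest shows how much margin a lagging DC has left.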

Check 9: dcdiag — Full Diagnostic Pass

dcdiag is the most comprehensive single-command active directory health check available on Windows Server. Run it at least weekly and after any structural change (DC promotion/demotion, site link changes, schema updates).

dcdiag /e /c /q /skip:systemlog /f:dcdiag-output.txt

The SystemLog test fails constantly in production environments because of unrelated OS events — Windows Update, driver messages, anything that generated an error in the last hour. Skip it in routine scans and review the System log separately when investigating specific incidents.

dcdiag /test:Advertising /test:Replications /test:SysVolCheck /test:DFSREvent /test:DNS /v

Run these five tests immediately after any DC promotion. They confirm the new DC is advertising correctly, has replicated, and has a healthy SYSVOL before clients start using it.

For a complete explanation of every dcdiag test, what FAILED means for each, common error codes (1722, 8453, 8606), and the DNS sub-test column breakdown, see DCDIAG Explained: How to Read and Interpret Domain Controller Diagnostics.

Check 10: AD Database

The AD database (NTDS.dit) rarely fails, but when it does the recovery path is painful. These checks take two minutes and are worth running quarterly.

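$ntdsPath = (Get-ItemProperty "HKLM:\System\CurrentControlSet\Services\NTDS\Parameters")."DSA Database file"
# Split-Path returns the qualifier with a colon ("D:"); Get-PSDrive expects the bare name ("D")
$drive = (Split-Path $ntdsPath -Qualifier).TrimEnd(':')
Get-PSDrive $drive | Select-Object Name, Used, Free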

The NTDS volume needs free space for defrag operations and log file growth. Flag any volume below 10% free.
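
Get-PSDrive reports Used and Free in bytes, so the 10% rule needs one small calculation. A sketch with a hypothetical helper name, Get-PercentFree:

```powershell
# Hypothetical helper: percentage of the volume that is free, rounded to one decimal.
function Get-PercentFree {
    param(
        [Parameter(Mandatory)][double]$UsedBytes,
        [Parameter(Mandatory)][double]$FreeBytes
    )
    $total = $UsedBytes + $FreeBytes
    if ($total -le 0) { throw 'Drive reports no capacity' }
    [math]::Round(100 * $FreeBytes / $total, 1)
}

# On a DC, with $drive holding the bare drive letter of the NTDS volume:
# $d = Get-PSDrive $drive
# if ((Get-PercentFree -UsedBytes $d.Used -FreeBytes $d.Free) -lt 10) { Write-Warning 'NTDS volume below 10% free' }

# Offline demonstration: 90 GB used, 10 GB free
Get-PercentFree -UsedBytes 90GB -FreeBytes 10GB
```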

Get-Item $ntdsPath | Select-Object Name, Length, LastWriteTime

NTDS.dit grows over time and never shrinks automatically. Online defrag (runs nightly by default) reclaims whitespace internally but doesn’t reduce file size. Offline defrag with ntdsutil reduces file size but requires stopping AD DS — a maintenance window operation. Don’t run offline defrag unless the file is significantly oversized relative to your object count.

AD should be backed up at least once within the tombstone lifetime (180 days in forests created on Windows Server 2003 SP1 or later, 60 days in older forests where the value was never set). In practice, “once per tombstone lifetime” is not a recovery strategy. The real question is: if a DC failed right now, how old would your backup be? Most shops target daily or weekly system state backups on at least one DC per domain.

Active Directory Health Check PowerShell Script

For environments where you want a repeatable, schedulable active directory health check without running commands manually, this script covers the core diagnostic areas and writes a summary to the console. It’s a starting point — expand it to match your environment. For the full dcdiag switch reference used in Check 9, see the Microsoft Learn dcdiag documentation.

#Requires -Modules ActiveDirectory
# AD Health Check: run elevated on a DC
# Output: console summary per check area

$domain = (Get-ADDomain).DNSRoot
$errors = @()

Write-Host "`n=== AD Health Check: $domain ===" -ForegroundColor Cyan
Write-Host (Get-Date) "`n"

# 1. Core services
Write-Host "--- Services ---"
$svcNames = 'ntds','adws','dns','dnscache','kdc','w32time','netlogon','dfsr'
Get-Service $svcNames | ForEach-Object {
    $status = if ($_.Status -eq 'Running') { "OK" } else { "FAIL" }
    if ($_.Status -ne 'Running') { $errors += "Service not running: $($_.Name)" }
    Write-Host "$status $($_.Name) [$($_.Status)]"
}

# 2. Replication summary
Write-Host "`n--- Replication ---"
$replErrors = repadmin /showrepl * /errorsonly 2>&1
if ($replErrors -match '\S') {
    Write-Host "WARN Replication errors detected - run repadmin /showrepl * /errorsonly for detail"
    $errors += "Replication errors present"
} else {
    Write-Host "OK No replication errors"
}

# 3. SYSVOL shares
Write-Host "`n--- SYSVOL ---"
$shares = net share 2>&1
foreach ($share in @('SYSVOL','NETLOGON')) {
    if ($shares -match $share) {
        Write-Host "OK $share share present"
    } else {
        Write-Host "FAIL $share share missing"
        $errors += "$share share missing"
    }
}

# 4. DFSR backlog
# Note: this assumes the tool output contains the text "No backlog";
# confirm the wording against your own dfsrdiag output before relying on it.
Write-Host "`n--- DFSR ---"
$dfsr = dfsrdiag replicationstate 2>&1
if ($dfsr -match 'No backlog') {
    Write-Host "OK No DFSR backlog"
} else {
    Write-Host "WARN DFSR backlog detected - run dfsrdiag replicationstate for detail"
    $errors += "DFSR backlog present"
}

# 5. Time sync
Write-Host "`n--- Time ---"
$time = w32tm /query /status 2>&1
# Take the first "Source:" line only, and handle the case where none is returned
$sourceLine = $time | Select-String '^Source:' | Select-Object -First 1
if (-not $sourceLine) {
    Write-Host "WARN Could not read the time source from w32tm output"
    $errors += "W32Time source could not be determined"
} elseif ("$sourceLine" -match 'Local CMOS Clock') {
    Write-Host "INFO $sourceLine"
    Write-Host "FAIL W32Time lost external time source"
    $errors += "W32Time source is Local CMOS Clock"
} else {
    Write-Host "INFO $sourceLine"
    Write-Host "OK Time source looks healthy"
}

# 6. FSMO roles
Write-Host "`n--- FSMO ---"
$fsmo = netdom query fsmo 2>&1
Write-Host ($fsmo | Out-String).Trim()

# 7. Critical Directory Service events
Write-Host "`n--- Event Log ---"
$criticalIds = @(1311, 1864, 2042)
$events = Get-WinEvent -LogName "Directory Service" -MaxEvents 500 -ErrorAction SilentlyContinue |
    Where-Object { $_.Id -in $criticalIds }
if ($events) {
    $events | ForEach-Object {
        Write-Host "WARN Event $($_.Id): $($_.Message.Substring(0,[Math]::Min(120,$_.Message.Length)))..."
        $errors += "Directory Service Event $($_.Id) found"
    }
} else {
    Write-Host "OK No critical Directory Service events (1311, 1864, 2042)"
}

# Summary
Write-Host "`n=== Summary ==="
if ($errors.Count -eq 0) {
    Write-Host "All checks passed." -ForegroundColor Green
} else {
    Write-Host "$($errors.Count) issue(s) found:" -ForegroundColor Yellow
    $errors | ForEach-Object { Write-Host " - $_" }
}

Failure scenario

On Windows Server 2025, any WMIC-based checks you add to this script or inherit from older ones must be replaced with Get-CimInstance. Any community script downloaded before mid-2025 that uses wmic for WMI queries will fail silently or throw an error on WS2025 DCs where WMIC is disabled by default. Review any third-party script before running it in a WS2025 environment.

For a more comprehensive PowerShell-based AD health report with HTML output, the community scripts ADxRay by Claudio Merola and ALI TAJRAN’s Get-ADHealth.ps1 are widely deployed in SMB environments and cover additional areas (security posture, object inventory, schema version) that a minimal script doesn’t reach.

Common Active Directory Health Check Mistakes

These are the patterns that cause operators to miss real problems or waste time on non-problems during an active directory health check.

Checking only one DC. Replication failures are asymmetric — a problem between DC01 and DC03 won’t show up when you run diagnostics only on DC01 and DC02. Always run dcdiag /e and repadmin /showrepl * with the asterisk to cover the whole forest.

Treating a 45-minute replication delta as a failure. Within a site, a DC notifies its partners about 15 seconds after a change, and the default connection schedule also polls once per hour when nothing has changed. A “largest delta” of 45 minutes in repadmin /replsummary is the scheduler doing its job. A delta of 6 hours is a problem.

Assuming DNS is healthy because name resolution works. Client-facing name resolution can work fine while SRV record registration is broken, dynamic updates are disabled, or stale DC records are causing RPC failures between DCs. DNS health for clients and DNS health for AD replication are different things. Run dcdiag /test:DNS /v regardless.

Trusting event logs alone without running repadmin. The Directory Service event log shows you that something went wrong. It often doesn’t show you which partner, which naming context, or the exact error code. repadmin /showrepl * /errorsonly gives you the specific replication failure — the event log is a signal to look, not a complete diagnosis.

Not validating SYSVOL. SYSVOL can be shared and accessible while Group Policy content is stale or missing on a specific DC. A client hitting that DC gets old policy from cache and no one notices until the cached policy expires or the cache is cleared. dfsrdiag replicationstate catches active backlogs before they matter to end users.

Running dcdiag without /skip:systemlog and treating every failure as AD-related. The SystemLog test fires on anything in the System event log in the last hour — Windows Update, driver events, hardware messages. In a production environment, it almost always fails. The result is a dcdiag output with ten apparent failures where nine are noise and one is real. Use /skip:systemlog for routine scans.

Windows Server 2025 — Active Directory Health Check Changes

If any DC in your environment runs Windows Server 2025, these changes affect existing active directory health check procedures now.

WMIC disabled by default. Any legacy health-check script using wmic will fail silently or with an error. Replace with Get-CimInstance. This includes popular community scripts and some third-party monitoring agents that haven’t been updated.
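
As an illustration of the replacement, here are two typical wmic queries next to their CIM counterparts. The legacy lines are examples of what an older script might contain, not quotes from any particular script; the classes and properties are standard WMI ones:

```powershell
# Legacy: wmic logicaldisk where "DeviceID='C:'" get FreeSpace,Size
Get-CimInstance -ClassName Win32_LogicalDisk -Filter "DeviceID='C:'" |
    Select-Object DeviceID, Size, FreeSpace

# Legacy: wmic os get Caption,Version,LastBootUpTime
Get-CimInstance -ClassName Win32_OperatingSystem |
    Select-Object Caption, Version, LastBootUpTime
```

Get-CimInstance returns typed objects (LastBootUpTime arrives as a DateTime), so the string parsing that wmic output usually needed can be dropped.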

VBScript deprecated. VBS-based AD monitoring wrappers need to be migrated to PowerShell. VBScript became a Feature-on-Demand in WS2025 and is targeted for removal in a future release.

PowerShell 2.0 engine removed (September 2025 update and later). If any monitoring tool depends on the PS 2.0 engine specifically, it stops working on updated WS2025 systems.

Credential Guard enabled by default. This protects domain credentials but can cause failures with legacy authentication protocols and some monitoring agents that use older NTLM flows. Verify monitoring agent compatibility before deploying WS2025 DCs into production.

New functional level 10 and 32k database pages. Windows Server 2025 introduces functional level 10, which enables 32k database pages — the first ESE page-size change since Windows 2000. This change is forest-wide and irreversible once enabled. All DCs must be on WS2025 first, and the change requires explicitly running Enable-ADOptionalFeature. Don’t enable it under time pressure.

KB5060842 — domain firewall profile bug. The most operationally urgent WS2025 item. Before this June 2025 cumulative update, WS2025 DCs could lose the domain firewall profile after a restart and become unreachable on the domain network while local diagnostic tools reported healthy. Every WS2025 DC needs to be at this patch level or later — verify with Get-HotFix -Id KB5060842.
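
Because cumulative updates supersede one another, a DC patched after June 2025 may no longer list KB5060842 under Get-HotFix. One hedged alternative is to compare the OS build against the build that delivered the fix. The sketch below assumes that build is 26100.4349 (taken to be the KB5060842 build; confirm it against the KB article), and Test-BuildAtLeast is a hypothetical helper name:

```powershell
# Hypothetical helper: true when the current build is at or above the minimum.
function Test-BuildAtLeast {
    param(
        [Parameter(Mandatory)][version]$Current,
        [Parameter(Mandatory)][version]$Minimum
    )
    $Current -ge $Minimum
}

# Assumption: 26100.4349 is the build delivered by KB5060842; verify before use.
$minimumBuild = '26100.4349'

# On a WS2025 DC, the build and update revision come from the registry:
# $cv = Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion'
# Test-BuildAtLeast -Current "$($cv.CurrentBuild).$($cv.UBR)" -Minimum $minimumBuild

# Offline demonstration: an older revision does not pass the check
Test-BuildAtLeast -Current '26100.3775' -Minimum $minimumBuild
```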

Active Directory Health Check — Cadence Reference

Use this as an operational template. The goal is running the right check at the right frequency — not the same full scan every day. Microsoft’s Active Directory replication concepts documentation covers the underlying topology and scheduling model if you want to understand the intervals behind the cadence.

Daily (automate where possible)

| Check | Command | Healthy threshold |
| --- | --- | --- |
| Core services | Get-Service ntds,adws,dns,dnscache,kdc,w32time,netlogon,dfsr | All Running |
| Replication summary | repadmin /replsummary | No errors; largest delta <60 min |
| Critical events | Directory Service log, Level ≤ 2 | No events 1311, 1864, 2042 |
| DC reachability | nltest /dsgetdc:domain /force | Returns a DC without error |

After Every Change

| Check | Command | What to verify |
| --- | --- | --- |
| Replication errors | repadmin /showrepl * /errorsonly | No output = no errors |
| dcdiag core tests | dcdiag /test:Advertising /test:Replications /test:SysVolCheck /test:DNS /v | All PASSED |
| SYSVOL shares | net share on affected DCs | SYSVOL and NETLOGON present |
| FSMO knowledge | dcdiag /test:KnowsOfRoleHolders /e | All DCs agree on role holders |

Weekly

| Check | Command | What to verify |
| --- | --- | --- |
| Full replication detail | repadmin /showrepl * /errorsonly | No output |
| DFSR health | dfsrdiag replicationstate | No persistent backlog |
| DNS test suite | dcdiag /test:DNS /v | All columns PASS or WARN with known cause |
| dcdiag full pass | dcdiag /e /c /q /skip:systemlog /f:dcdiag-weekly.txt | No failures |
| Time hierarchy | w32tm /monitor /domain:yourdomain.com | All DCs within ±2 minutes |
| Secure channel | nltest /sc_verify:domain | Secure channel valid |

Quarterly

| Check | Command / Action | What to verify |
| --- | --- | --- |
| NTDS volume free space | Check drive where NTDS.dit lives | ≥10% free |
| Backup recency | Check backup logs or Windows Server Backup | System state backed up within policy |
| AD Recycle Bin | Get-ADOptionalFeature "Recycle Bin Feature" | Enabled |
| Trust health | nltest /domain_trusts | All trusts verified (if trusts exist) |
| Functional level | Get-ADForest \| Select-Object ForestMode | At target level |
| FSMO role placement | netdom query fsmo | Roles distributed per design |
| WS2025 patch level | Get-HotFix -Id KB5060842 | Installed (WS2025 DCs only) |

FAQ

How often should I perform an Active Directory health check?

Replication and service state should be checked daily — ideally automated. A full active directory health check covering DNS, SYSVOL, time, FSMO, and dcdiag should run weekly and after every structural change. Quarterly checks cover the AD database, backup recency, and trust health. The cadence table above maps the right command to the right frequency.

What is the most important Active Directory health check?

Replication. Everything else in AD depends on replication working correctly — DNS records, Group Policy, Kerberos, SYSVOL. repadmin /showrepl * /errorsonly returning nothing is the single most reliable signal that an AD environment is healthy. If replication has errors, fix those before running any other active directory health check.

Can dcdiag detect all Active Directory problems?

No. dcdiag tests configuration and state across about 20 default areas — services, replication connection objects, SYSVOL, DNS records, FSMO role knowledge. It doesn’t check disk space, backup recency, GPO content correctness, absolute time accuracy, or security hardening posture. A clean dcdiag is a necessary check, not a sufficient one. For what dcdiag specifically misses, see DCDIAG Explained: How to Read and Interpret Domain Controller Diagnostics.

How do I know if SYSVOL replication is healthy?

Three checks: net share confirms SYSVOL and NETLOGON shares are present; dfsrdiag replicationstate shows whether DFSR has an active backlog; and the DFS Replication event log shows whether Events 2213 or 4012 have fired recently. All three should be clean. SYSVOL can be shared while Group Policy content is stale on a specific DC — the event log and dfsrdiag backlog check together catch that scenario.

What causes Active Directory replication failures?

The three most common causes in practice: DNS misconfiguration (DCs can’t resolve each other, breaking RPC — error 1722), firewall blocking the RPC endpoint mapper on TCP 135 or dynamic ports 49152–65535 (also error 1722), and permissions or secure channel problems (error 8453). Longer-standing failures — a DC that missed the tombstone lifetime — produce errors 8606 or 8614 and require more involved recovery. Start with repadmin /showrepl * /errorsonly to get the specific error code, then follow the error to its cause.

Final Thoughts

A structured active directory health check isn’t just diagnostic housekeeping. Replication errors that go unchecked for weeks become lingering object problems. DNS misconfigurations that seem minor cause replication failures on the next DC promotion. SYSVOL backlogs that no one monitors leave some users getting stale Group Policy from cache long after the problem started.

The cadence table above is designed to catch those patterns before they escalate. Daily automated checks on services and replication take seconds. The weekly dcdiag pass catches the slow-developing failures. The quarterly checks cover the areas — database space, backup recency, trust health — that rarely fail but are painful when they do.

When something does fail, the articles below cover the remediation side for each area in this checklist: