Active Directory Health Check: Full Checklist for Domain Controllers (Windows Server 2025)
Active Directory failures rarely arrive without warning. Replication errors accumulate quietly for days before authentication stops working. DNS misconfiguration sits undetected until a DC promotion fails. SYSVOL backlogs grow in the background while Group Policy keeps applying from cache — until it doesn’t.
A structured active directory health check catches those signals before they become incidents. This checklist covers the full diagnostic surface: services, replication, DNS, SYSVOL, FSMO, time hierarchy, Kerberos, event logs, and the AD database. It’s structured by cadence so you can run the right checks at the right frequency, not the same scan every day.
- Run dcdiag /q /skip:systemlog and repadmin /replsummary after every change, not just on a schedule
- A “largest delta” under 60 minutes in repadmin /replsummary is normal; it is not a replication failure
- Eight critical services must be running on every DC: NTDS, ADWS, DNS, Dnscache, KDC, W32Time, Netlogon, DFSR
- On Windows Server 2025, wmic is disabled by default; replace it with Get-CimInstance in any health scripts
- Confirm KB5060842 is installed on WS2025 DCs; without it, a DC can pass local tests yet be unreachable on the domain network after a restart
- The Kerberos clock skew tolerance is 5 minutes by default; a DC or client whose clock drifts beyond that can no longer authenticate
What a Healthy Active Directory Environment Looks Like
Before running any commands, it helps to know what you’re aiming for. A healthy AD environment has a specific, verifiable state — not a vague sense that “things seem fine.”
| Area | Healthy state |
|---|---|
| Services | NTDS, ADWS, DNS, Dnscache, KDC, W32Time, Netlogon, DFSR all Running and set to Automatic |
| Replication | No errors in repadmin /showrepl * /errorsonly; largest delta under 60 minutes |
| SYSVOL | SYSVOL and NETLOGON shares present on every DC; no DFSR backlog; no events 2213 or 4012 |
| DNS | All SRV records registered; dcdiag /test:DNS returns no FAIL columns; no stale A records for decommissioned DCs |
| Time | PDC emulator syncs from external NTP; all DCs within ±2 minutes of each other; offset well under 5-minute Kerberos threshold |
| FSMO | All five role holders online and reachable; all DCs agree on role placement |
| Kerberos | No Event 4769 with failure code 0x25 (clock skew); secure channel valid on all DCs |
| Event logs | No events 1311, 1864, 2042 in Directory Service log; no events 5719, 5783 in System log |
| dcdiag | dcdiag /e /c /q /skip:systemlog returns no failures |
| Database | NTDS volume ≥10% free; system state backup within policy; AD Recycle Bin enabled |
Every check in this article maps to one of those rows.
Before You Start
Required access: Domain Admins or delegated diagnostic permissions. Run all commands from an elevated PowerShell or CMD prompt on a DC. For enterprise-wide dcdiag /e, run from a DC that has network visibility to all sites.
Tools used in this active directory health check — all built into Windows Server or available via RSAT:
- dcdiag: built into Windows Server, available via RSAT on management workstations
- repadmin: built into Windows Server
- netdom, nltest: built into Windows Server
- w32tm: built into Windows Server
- PowerShell AD module (Get-ADReplicationFailure, Get-ADReplicationPartnerMetadata): requires RSAT AD DS Tools
On Windows Server 2025, KB5060842 (or a later cumulative update that supersedes it) must be installed before you trust any local health check. Before this June 2025 cumulative update, WS2025 DCs could lose the domain firewall profile after a restart and become unreachable on the domain network while local diagnostic tools showed green. Verify with Get-HotFix -Id KB5060842 first. Get-HotFix may list only the most recent cumulative update, so a newer update in its place is also acceptable. If neither is installed, local results are not trustworthy.
Check 1: Core Services
Every DC must be running eight services. If any are stopped, nothing else in this active directory health check matters until they’re back.
$services = 'ntds','adws','dns','dnscache','kdc','w32time','netlogon','dfsr'
Get-Service $services | Select-Object Name, Status, StartType

Expected output: Status = Running for all. StartType should be Automatic; a service set to Manual or Disabled will not start on its own after a reboot.
| Service | What it does | What breaks when stopped |
|---|---|---|
| NTDS | Active Directory database engine | Everything — DC stops functioning entirely |
| ADWS | AD Web Services / PowerShell AD module | PowerShell AD cmdlets fail |
| DNS | Name resolution for clients and DCs | DC locator, domain joins, replication |
| Dnscache | DNS client cache | DC can’t resolve partner DCs by name |
| KDC | Kerberos ticket issuance | Authentication across the domain |
| W32Time | Time synchronization | Kerberos fails when skew exceeds 5 minutes |
| Netlogon | Secure channel, SYSVOL/NETLOGON shares, SRV records | Domain authentication, Group Policy |
| DFSR | SYSVOL replication between DCs | Group Policy replication stops |
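If SYSVOL is still replicated by the legacy File Replication Service (FRS) rather than DFSR, fix that before anything else. Windows Server 2019 and later, including Windows Server 2025, cannot be promoted into a domain that still uses FRS for SYSVOL. If any DC runs WS2025, confirm the migration actually finished: dfsrmig /getglobalstate should report the Eliminated state.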
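The service check above can be folded into a small helper that also flags a wrong startup type. This is a minimal sketch rather than a definitive implementation: the function name Get-DcServiceIssue is hypothetical, and it assumes only that each input object exposes the Name, Status, and StartType properties that Get-Service returns.

```powershell
# Hypothetical helper: reports expected services that are missing,
# not running, or not set to start automatically.
function Get-DcServiceIssue {
    param(
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]] $Service,
        [string[]] $Expected = @('ntds','adws','dns','dnscache','kdc','w32time','netlogon','dfsr')
    )
    foreach ($name in $Expected) {
        # -eq is case-insensitive, so 'ntds' matches a service named 'NTDS'.
        $svc = $Service | Where-Object { $_.Name -eq $name } | Select-Object -First 1
        if (-not $svc) {
            "${name}: not found"
            continue
        }
        if ("$($svc.Status)" -ne 'Running') {
            "${name}: status is $($svc.Status)"
        }
        if ("$($svc.StartType)" -ne 'Automatic') {
            "${name}: start type is $($svc.StartType)"
        }
    }
}

# Usage on a DC (assumption: run elevated):
#   $svcs = @(Get-Service ntds,adws,dns,dnscache,kdc,w32time,netlogon,dfsr -ErrorAction SilentlyContinue)
#   Get-DcServiceIssue -Service $svcs
```

No output means every expected service was found, running, and set to Automatic.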
Check 2: AD Replication
Replication is the most common source of AD failures. Run this first when something feels wrong. The full repadmin command reference on Microsoft Learn covers every flag used below.
repadmin /replsummary

The “largest delta” column shows how long ago the last successful replication occurred per DC. A delta under 60 minutes is normal within a site: even when there are no changes to notify, partners still poll on the default once-per-hour schedule. Operators frequently flag a 45-minute delta as a failure when it’s just the scheduler. A delta over 60 minutes warrants investigation (between sites, compare it against the site link interval, which defaults to 180 minutes). A delta measured in hours or days is an active failure.

repadmin /showrepl * /errorsonly

If this lists no failing replication links, replication is healthy. If it returns errors, the error code drives the next step: 1722 (RPC unavailable, usually firewall or DNS), 8453 (access denied, permissions or secure channel), 8606/8614 (lingering objects or replication stalled past tombstone lifetime).

repadmin /queue

A queue that doesn’t drain over time means replication is stuck, not just delayed. Use this to distinguish “slow” from “broken.”

Get-ADReplicationFailure -Target * -Scope Forest | Sort-Object FailureCount -Descending

For deeper investigation of specific replication failures, see Active Directory Replication Not Working: How to Diagnose and Fix.
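The delta thresholds described in this check can be applied mechanically to the CSV form of repadmin output. The sketch below is illustrative only: Get-ReplLinkStatus is a hypothetical name, the 60-minute default mirrors the intra-site guidance above, and it assumes the CSV carries the column names 'Number of Failures' and 'Last Success Time' with timestamps in a yyyy-MM-dd HH:mm:ss style. Confirm both against your own repadmin /showrepl * /csv output.

```powershell
# Hypothetical helper: classifies each replication link from repadmin CSV lines.
# Assumed columns: 'Source DSA', 'Destination DSA', 'Naming Context',
# 'Number of Failures', 'Last Success Time'.
function Get-ReplLinkStatus {
    param(
        [string[]] $CsvLine,
        [datetime] $Now = (Get-Date),
        [int] $WarnMinutes = 60
    )
    foreach ($row in ($CsvLine | ConvertFrom-Csv)) {
        # TryParse keeps malformed rows from throwing; they end up as 'Failing'.
        $failures = 0
        [void][int]::TryParse($row.'Number of Failures', [ref]$failures)
        $last = [datetime]::MinValue
        $parsed = [datetime]::TryParse($row.'Last Success Time', [ref]$last)
        $delta = if ($parsed) { [math]::Round(($Now - $last).TotalMinutes) } else { $null }

        $state = if ($failures -gt 0 -or -not $parsed) { 'Failing' } elseif ($delta -gt $WarnMinutes) { 'Investigate' } else { 'OK' }

        [pscustomobject]@{
            Source        = $row.'Source DSA'
            Destination   = $row.'Destination DSA'
            NamingContext = $row.'Naming Context'
            Failures      = $failures
            DeltaMinutes  = $delta
            State         = $state
        }
    }
}

# Usage on a DC:
#   Get-ReplLinkStatus -CsvLine (repadmin /showrepl * /csv) | Where-Object State -ne 'OK'
```

Keeping the classification separate from the repadmin call makes it easy to raise the threshold for inter-site links, which replicate on a longer schedule.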
Check 3: SYSVOL and NETLOGON
SYSVOL and NETLOGON must be shared and accessible on every DC. When they’re not, Group Policy stops applying — sometimes silently, because clients use cached policy until it expires.
net share

Both SYSVOL and NETLOGON must appear. If either is missing, Netlogon is likely stopped or SYSVOL hasn’t completed initial synchronization.

dfsrdiag replicationstate

Expected output: no backlog or a low queue count. A persistent backlog on a specific DC indicates a DFSR problem; dirty shutdown, database corruption, or staging quota exhaustion are the most common causes.

Get-WinEvent -LogName "DFS Replication" -MaxEvents 50 |
    Where-Object { $_.Level -le 3 } |
    Select-Object TimeCreated, Id, Message |
    Format-Table -AutoSize -Wrap

Events 2213, 4012, and 5002 are the most operationally significant. Event 2213 indicates a dirty shutdown with automatic recovery disabled: replication on that volume stays paused until it is resumed, and a non-authoritative sync may be needed if the database does not recover. Event 4012 means DFSR stopped replicating the folder because this DC was out of contact with its partners for longer than the allowed offline window. That is a more severe condition and requires a non-authoritative or authoritative SYSVOL sync (the DFSR counterpart of the FRS D2/D4 procedure).
For detailed DFSR troubleshooting, see SYSVOL Replication Issues in Active Directory: DFSR Troubleshooting.
Check 4: DNS
DNS and AD are tightly coupled. Most replication errors (1722 in particular) trace back to DNS. This active directory health check covers the areas that actually fail in practice.
dcdiag /test:DNS /v

This runs seven sub-tests. Read the summary table at the end of the output: columns labeled Auth/Basc/Forw/Del/Dyn/RReg/Ext, one row per DC. A WARN means dcdiag couldn’t fully validate the configuration. Run with /v to get detail behind any flagged column.
The most common DNS failures in practice: a DC pointing only to itself for DNS (breaks name resolution after a restart before AD is fully started), missing SRV records, stale A records for decommissioned DCs, and scavenging disabled so stale records accumulate over months.
nslookup -type=SRV _ldap._tcp.dc._msdcs.yourdomain.com

If SRV records are missing, force re-registration:

nltest /dsregdns
net stop netlogon && net start netlogon

The && chaining is CMD syntax. In Windows PowerShell 5.1, use Restart-Service Netlogon instead.

For comprehensive DNS troubleshooting, see Active Directory DNS Problems: SRV Records, Zones, and Resolution Failures.
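To spot a DC whose SRV record is missing without reading nslookup output by eye, the registered targets can be compared against the list of known DCs. This is a rough sketch under stated assumptions: Get-MissingSrvTarget is a hypothetical helper, and the usage comments assume Resolve-DnsName and Get-ADDomainController are available (DnsClient and ActiveDirectory modules).

```powershell
# Hypothetical helper: returns expected DC host names that do not appear
# among the SRV record targets. Comparison ignores case and trailing dots.
function Get-MissingSrvTarget {
    param(
        [string[]] $ExpectedDc,
        [string[]] $SrvTarget
    )
    $registered = foreach ($target in $SrvTarget) {
        if ($target) { $target.TrimEnd('.').ToLowerInvariant() }
    }
    foreach ($dc in $ExpectedDc) {
        if ($dc -and ($registered -notcontains $dc.TrimEnd('.').ToLowerInvariant())) {
            $dc
        }
    }
}

# Usage on a DC or management workstation:
#   $domain  = (Get-ADDomain).DNSRoot
#   $targets = Resolve-DnsName -Type SRV -Name "_ldap._tcp.dc._msdcs.$domain" |
#       Where-Object { $_.NameTarget } | ForEach-Object { $_.NameTarget }
#   $dcs     = (Get-ADDomainController -Filter *).HostName
#   Get-MissingSrvTarget -ExpectedDc $dcs -SrvTarget $targets
```

Any name it returns is a candidate for the nltest /dsregdns re-registration step. Swapping the two arguments gives the opposite view: SRV targets that no longer correspond to a live DC, which points at stale records.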
Check 5: Time Synchronization
Kerberos authentication fails when the clock offset between a client and a DC exceeds five minutes. That’s a hard failure — authentication stops working, domain joins fail, and replication can break.
w32tm /query /status

The key fields: Source (should be the PDC emulator or external NTP), Last Successful Sync Time, and Poll Interval. A source of Local CMOS Clock means W32Time has lost contact with its time source; this needs immediate attention.

w32tm /monitor /domain:yourdomain.com

This lists every DC in the domain with its offset from the PDC emulator, which is where the ±2 minute target is checked.

w32tm /query /configuration

On the PDC emulator, verify Type = NTP with an external server configured. On all other DCs, Type should be NT5DS. If the PDC emulator is syncing from CMOS or from another DC, the whole domain has no external time anchor.

For time synchronization recovery procedures, see Active Directory Time Synchronization: Fix PDC Emulator, W32Time, and Kerberos Clock Skew.
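The two thresholds in this check (the ±2 minute target between DCs and the 5-minute Kerberos limit) can be applied to a measured offset in a few lines. The sketch below is an assumption-laden illustration: Get-TimeOffsetStatus is a hypothetical name, and it assumes sample lines shaped like the output of w32tm /stripchart /dataonly, for example "10:15:23, +00.0005247s". Check the format on your own systems before relying on the pattern.

```powershell
# Hypothetical helper: extracts offsets (in seconds) from stripchart-style
# lines and grades them against warning and failure thresholds.
function Get-TimeOffsetStatus {
    param(
        [string[]] $StripchartLine,
        [double] $WarnSeconds = 120,   # the +/-2 minute target between DCs
        [double] $FailSeconds = 300    # the default Kerberos skew tolerance
    )
    foreach ($line in $StripchartLine) {
        # Only lines ending in a signed offset such as '+00.0005247s' are samples.
        if ($line -match '([+-]\d+\.\d+)s\s*$') {
            $offset = [double]::Parse($Matches[1], [cultureinfo]::InvariantCulture)
            $abs = [math]::Abs($offset)
            $state = if ($abs -ge $FailSeconds) { 'FAIL' } elseif ($abs -ge $WarnSeconds) { 'WARN' } else { 'OK' }
            [pscustomobject]@{ OffsetSeconds = $offset; State = $state }
        }
    }
}

# Usage (dc02.yourdomain.com is a placeholder):
#   Get-TimeOffsetStatus -StripchartLine (w32tm /stripchart /computer:dc02.yourdomain.com /samples:3 /dataonly)
```

Header and error lines do not match the pattern and are skipped, so an empty result means no sample was obtained, which is itself worth investigating.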
Check 6: FSMO Role Holders
Every active directory health check must confirm that all five FSMO roles are held by reachable, functional DCs. If a role holder is offline, specific operations fail — new user accounts can’t be created (RID Master down), time sync breaks down (PDC Emulator down), schema changes are blocked (Schema Master down).
netdom query fsmo
nltest /dsgetdc:yourdomain.com /force
dcdiag /test:KnowsOfRoleHolders /e /v

Run the KnowsOfRoleHolders test with /e to compare knowledge across all DCs. If DCs disagree on role holders, replication is broken between them. For role management (transfer, seize, role holder failure), see FSMO Roles Active Directory: What They Do and How to Manage Them.
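The "all DCs agree on role placement" condition can also be checked from PowerShell by collecting each DC's view and comparing them. This is a sketch, not the method dcdiag uses: Get-FsmoDisagreement is a hypothetical helper, and the usage comments assume the ActiveDirectory module's Get-ADDomain and Get-ADForest cmdlets queried with -Server against each DC.

```powershell
# Hypothetical helper: given each DC's view of the five role holders,
# returns the roles on which the DCs disagree.
# $View maps a DC name to a hashtable of role name -> holder.
function Get-FsmoDisagreement {
    param([Parameter(Mandatory)] [hashtable] $View)

    $roles = 'SchemaMaster','DomainNamingMaster','PDCEmulator','RIDMaster','InfrastructureMaster'
    foreach ($role in $roles) {
        # Collect the distinct holders reported for this role (case-insensitive).
        $holders = @(
            $View.Keys |
                ForEach-Object { $View[$_][$role] } |
                Where-Object { $_ } |
                Sort-Object -Unique
        )
        if ($holders.Count -gt 1) {
            [pscustomobject]@{ Role = $role; Holders = ($holders -join ', ') }
        }
    }
}

# Usage on a DC:
#   $view = @{}
#   foreach ($dc in (Get-ADDomainController -Filter *).HostName) {
#       $d = Get-ADDomain -Server $dc
#       $f = Get-ADForest -Server $dc
#       $view[$dc] = @{
#           SchemaMaster = $f.SchemaMaster; DomainNamingMaster = $f.DomainNamingMaster
#           PDCEmulator = $d.PDCEmulator; RIDMaster = $d.RIDMaster
#           InfrastructureMaster = $d.InfrastructureMaster
#       }
#   }
#   Get-FsmoDisagreement -View $view
```

No output means every queried DC reports the same holder for each role. A DC that cannot be queried is simply absent from the view, so reachability still needs its own check.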
Check 7: Kerberos and Secure Channel
Services and event logs can all look normal while Kerberos and the secure channel are broken. These checks catch failures that don’t surface in service state.
Test-ComputerSecureChannel -Verbose
nltest /sc_verify:yourdomain.com

A broken secure channel causes sporadic authentication failures and replication access-denied errors (8453). It usually means the DC’s computer account password is out of sync.

Get-WinEvent -LogName Security -FilterXPath "*[System[(EventID=4769)]]" -MaxEvents 20 |
    Where-Object { $_.Message -match "0x25|0x18" }

Event 4769 with failure code 0x25 is a clock skew failure (KRB_AP_ERR_SKEW). A flood of 0x25 errors means time synchronization is broken somewhere in the chain; check the PDC emulator configuration before investigating anything else. Failure code 0x18 (KDC_ERR_PREAUTH_FAILED) is a bad password, is more commonly logged under Event 4771, and points to a different problem entirely.
Check 8: Event Logs
Automated checks catch most things. Event logs surface failure patterns that commands don’t expose — especially slow-developing problems like a DC approaching tombstone lifetime.
Get-WinEvent -LogName "Directory Service" -MaxEvents 100 |
    Where-Object { $_.Level -le 2 } |
    Select-Object TimeCreated, Id, Message |
    Format-List

| Event ID | Source | Meaning | Action |
|---|---|---|---|
| 1311 | NTDS Replication | KCC cannot build a spanning tree — replication path broken | Investigate unreachable DC |
| 1864 | NTDS Replication | DC hasn’t replicated in 14+ days — approaching tombstone lifetime | Investigate immediately |
| 2042 | NTDS Replication | DC hasn’t replicated past tombstone lifetime — lingering objects likely | Urgent — replication blocked |
| 5719 | NETLOGON | DC can’t locate a DC at startup — DNS or timing issue | Check DNS and startup sequence |
| 5783 | NETLOGON | Secure channel failure | Run nltest /sc_verify |
| 40960 | LSASRV | Kerberos authentication failure — time skew or KDC unreachable | Check W32Time and KDC service |
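Events 1864 and 2042 require immediate action. A DC that hasn’t replicated within the tombstone lifetime (180 days by default in forests created on Windows Server 2003 SP1 or later, 60 days in older forests) is blocked from replicating by default, because the data divergence risk is too high for AD to reconcile automatically.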
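The event table above translates directly into a lookup that turns raw event IDs into the next action. This is a minimal sketch: Get-AdEventAction is a hypothetical helper, the action text is condensed from the table, and the usage comment assumes Get-WinEvent with a -FilterHashtable query.

```powershell
# Hypothetical helper: maps the event IDs from the table to their follow-up action.
function Get-AdEventAction {
    param([int[]] $EventId)

    $guide = @{
        1311  = 'KCC cannot build a spanning tree: investigate the unreachable DC'
        1864  = 'DC approaching tombstone lifetime: investigate immediately'
        2042  = 'Replication blocked past tombstone lifetime: urgent'
        5719  = 'No DC located at startup: check DNS and startup sequence'
        5783  = 'Secure channel failure: run nltest /sc_verify'
        40960 = 'Kerberos authentication failure: check W32Time and the KDC service'
    }
    # De-duplicate so a flood of one event yields a single line; unknown IDs are skipped.
    foreach ($id in ($EventId | Sort-Object -Unique)) {
        if ($guide.ContainsKey($id)) {
            [pscustomobject]@{ Id = $id; Action = $guide[$id] }
        }
    }
}

# Usage on a DC (last 24 hours, both logs):
#   $ids = Get-WinEvent -FilterHashtable @{
#       LogName = 'Directory Service','System'
#       Id = 1311,1864,2042,5719,5783,40960
#       StartTime = (Get-Date).AddDays(-1)
#   } -ErrorAction SilentlyContinue | ForEach-Object { $_.Id }
#   Get-AdEventAction -EventId $ids
```

Filtering by ID in the query itself avoids the fixed -MaxEvents window, which can miss older events on a busy DC.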
Check 9: dcdiag — Full Diagnostic Pass
dcdiag is the most comprehensive single-command active directory health check available on Windows Server. Run it at least weekly and after any structural change (DC promotion/demotion, site link changes, schema updates).
dcdiag /e /c /q /skip:systemlog /f:dcdiag-output.txt

The SystemLog test fails constantly in production environments because of unrelated OS events: Windows Update, driver messages, anything that generated an error in the last hour. Skip it in routine scans and review the System log separately when investigating specific incidents.

dcdiag /test:Advertising /test:Replications /test:SysVolCheck /test:DFSREvent /test:DNS /v

Run these five tests immediately after any DC promotion. They confirm the new DC is advertising correctly, has replicated, and has a healthy SYSVOL before clients start using it.
For a complete explanation of every dcdiag test, what FAILED means for each, common error codes (1722, 8453, 8606), and the DNS sub-test column breakdown, see DCDIAG Explained: How to Read and Interpret Domain Controller Diagnostics.
Check 10: AD Database
The AD database (NTDS.dit) rarely fails, but when it does the recovery path is painful. These checks take two minutes and are worth running quarterly.
$ntdsPath = (Get-ItemProperty "HKLM:\System\CurrentControlSet\Services\NTDS\Parameters")."DSA Database file"
# Split-Path returns 'C:'; Get-PSDrive expects the bare drive name 'C'
$drive = (Split-Path $ntdsPath -Qualifier).TrimEnd(':')
Get-PSDrive $drive | Select-Object Name, Used, Free

The NTDS volume needs free space for defrag operations and log file growth. Flag any volume below 10% free.

Get-Item $ntdsPath | Select-Object Name, Length, LastWriteTime

NTDS.dit grows over time and never shrinks automatically. Online defrag (runs nightly by default) reclaims whitespace internally but doesn’t reduce file size. Offline defrag with ntdsutil reduces file size but requires stopping AD DS, which makes it a maintenance window operation. Don’t run offline defrag unless the file is significantly oversized relative to your object count.
AD should be backed up at least once within the tombstone lifetime (180 days by default in forests created on Windows Server 2003 SP1 or later, 60 days in older forests). In practice, “once per tombstone lifetime” is not a recovery strategy. The real question is: if a DC failed right now, how old would your backup be? Most shops target daily or weekly system state backups on at least one DC per domain.
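The 10% free-space rule for the NTDS volume is simple arithmetic, and it is easy to script so the quarterly check produces a pass or fail rather than raw byte counts. The helper below is a hypothetical sketch (Get-VolumeFreeStatus is not a built-in cmdlet). The usage comment assumes the Size and FreeSpace properties of the Win32_LogicalDisk CIM class, queried with Get-CimInstance so it also works where WMIC is disabled.

```powershell
# Hypothetical helper: computes the free-space percentage of a volume and
# compares it with a minimum threshold (10% by default, as in the checklist).
function Get-VolumeFreeStatus {
    param(
        [Parameter(Mandatory)] [uint64] $SizeBytes,
        [Parameter(Mandatory)] [uint64] $FreeBytes,
        [double] $MinFreePercent = 10
    )
    if ($SizeBytes -eq 0) { throw 'SizeBytes must be greater than zero.' }

    $percent = [math]::Round(100.0 * $FreeBytes / $SizeBytes, 1)
    [pscustomobject]@{
        FreePercent = $percent
        Healthy     = ($percent -ge $MinFreePercent)
    }
}

# Usage on a DC ($drive holds the bare drive letter of the NTDS volume, e.g. 'D'):
#   $disk = Get-CimInstance Win32_LogicalDisk -Filter "DeviceID='${drive}:'"
#   Get-VolumeFreeStatus -SizeBytes $disk.Size -FreeBytes $disk.FreeSpace
```

For example, a 100 GB volume with 8 GB free yields FreePercent 8 and Healthy False.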
Active Directory Health Check PowerShell Script
For environments where you want a repeatable, schedulable active directory health check without running commands manually, this script covers the core diagnostic areas and writes a summary to the console. It’s a starting point — expand it to match your environment. For the full dcdiag switch reference used in Check 9, see the Microsoft Learn dcdiag documentation.
#Requires -Modules ActiveDirectory
# AD Health Check: run elevated on a DC
# Output: console summary per check area
$domain = (Get-ADDomain).DNSRoot
$errors = @()   # collected issue descriptions (distinct from the automatic $Error variable)
Write-Host "`n=== AD Health Check: $domain ===" -ForegroundColor Cyan
Write-Host (Get-Date) "`n"

# 1. Core services
Write-Host "--- Services ---"
$svcNames = 'ntds','adws','dns','dnscache','kdc','w32time','netlogon','dfsr'
Get-Service $svcNames | ForEach-Object {
    $status = if ($_.Status -eq 'Running') { "OK" } else { "FAIL" }
    if ($_.Status -ne 'Running') { $errors += "Service not running: $($_.Name)" }
    Write-Host "$status $($_.Name) [$($_.Status)]"
}
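
# 2. Replication summary
Write-Host "`n--- Replication ---"
# The CSV form gives one row per replication link. The text form of /errorsonly can
# still print per-DC header lines on a healthy forest, so testing it for any output
# is unreliable. Column names below are the standard repadmin CSV headers; verify locally.
$replRows = repadmin /showrepl * /csv 2>$null | ConvertFrom-Csv
$replErrors = $replRows | Where-Object {
    $_.showrepl_COLUMNS -eq 'showrepl_ERROR' -or [int]$_.'Number of Failures' -gt 0
}
if ($replErrors) {
    Write-Host "WARN Replication errors detected; run repadmin /showrepl * /errorsonly for detail"
    $errors += "Replication errors present"
} else {
    Write-Host "OK No replication errors"
}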

# 3. SYSVOL shares
Write-Host "`n--- SYSVOL ---"
$shares = net share 2>&1
foreach ($share in @('SYSVOL','NETLOGON')) {
    if ($shares -match $share) {
        Write-Host "OK $share share present"
    } else {
        Write-Host "FAIL $share share missing"
        $errors += "$share share missing"
    }
}
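
# 4. DFSR backlog
Write-Host "`n--- DFSR ---"
# NOTE: this test assumes the output contains the text 'No backlog' when DFSR is idle.
# Confirm the exact wording on your DCs; if it differs, adjust the pattern or check a
# specific partner pair with 'dfsrdiag backlog' instead.
$dfsr = dfsrdiag replicationstate 2>&1
if ($dfsr -match 'No backlog') {
    Write-Host "OK No DFSR backlog"
} else {
    Write-Host "WARN DFSR backlog detected; run dfsrdiag replicationstate for detail"
    $errors += "DFSR backlog present"
}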
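
# 5. Time sync
Write-Host "`n--- Time ---"
$time = w32tm /query /status 2>&1
# Take the first matching line and guard against w32tm failing to return one
$sourceLine = $time | Select-String "Source" | Select-Object -First 1
if (-not $sourceLine) {
    Write-Host "FAIL Could not read time source from w32tm"
    $errors += "W32Time status unavailable"
} else {
    $source = $sourceLine.ToString()
    Write-Host "INFO $source"
    if ($source -match 'Local CMOS Clock') {
        Write-Host "FAIL W32Time lost external time source"
        $errors += "W32Time source is Local CMOS Clock"
    } else {
        Write-Host "OK Time source looks healthy"
    }
}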
# 6. FSMO roles
Write-Host "`n--- FSMO ---"
$fsmo = netdom query fsmo 2>&1
Write-Host ($fsmo | Out-String).Trim()

# 7. Critical Directory Service events
Write-Host "`n--- Event Log ---"
$criticalIds = @(1311, 1864, 2042)
$events = Get-WinEvent -LogName "Directory Service" -MaxEvents 500 -ErrorAction SilentlyContinue |
    Where-Object { $_.Id -in $criticalIds }
if ($events) {
    $events | ForEach-Object {
        Write-Host "WARN Event $($_.Id): $($_.Message.Substring(0,[Math]::Min(120,$_.Message.Length)))..."
        $errors += "Directory Service Event $($_.Id) found"
    }
} else {
    Write-Host "OK No critical Directory Service events (1311, 1864, 2042)"
}

# Summary
Write-Host "`n=== Summary ==="
if ($errors.Count -eq 0) {
    Write-Host "All checks passed." -ForegroundColor Green
} else {
    Write-Host "$($errors.Count) issue(s) found:" -ForegroundColor Yellow
    $errors | ForEach-Object { Write-Host " - $_" }
}

The script above makes no WMIC calls, so that restriction does not affect it on Windows Server 2025. The same is not true of many community scripts: anything downloaded before mid-2025 that uses wmic for WMI queries will fail silently or throw an error on WS2025 DCs where WMIC is disabled by default, and those calls must be replaced with Get-CimInstance. Review any third-party script before running it in a WS2025 environment.
For a more comprehensive PowerShell-based AD health report with HTML output, the community scripts ADxRay by Claudio Merola and ALI TAJRAN’s Get-ADHealth.ps1 are widely deployed in SMB environments and cover additional areas (security posture, object inventory, schema version) that a minimal script doesn’t reach.
Common Active Directory Health Check Mistakes
These are the patterns that cause operators to miss real problems or waste time on non-problems during an active directory health check.
Checking only one DC. Replication failures are asymmetric — a problem between DC01 and DC03 won’t show up when you run diagnostics only on DC01 and DC02. Always run dcdiag /e and repadmin /showrepl * with the asterisk to cover the whole forest.
Treating a 45-minute replication delta as a failure. Within a site, change notification triggers replication roughly 15 seconds after a change, and when nothing has changed, partners still poll on the default once-per-hour schedule. A “largest delta” of 45 minutes in repadmin /replsummary is the scheduler doing its job. A delta of 6 hours is a problem.
Assuming DNS is healthy because name resolution works. Client-facing name resolution can work fine while SRV record registration is broken, dynamic updates are disabled, or stale DC records are causing RPC failures between DCs. DNS health for clients and DNS health for AD replication are different things. Run dcdiag /test:DNS /v regardless.
Trusting event logs alone without running repadmin. The Directory Service event log shows you that something went wrong. It often doesn’t show you which partner, which naming context, or the exact error code. repadmin /showrepl * /errorsonly gives you the specific replication failure — the event log is a signal to look, not a complete diagnosis.
Not validating SYSVOL. SYSVOL can be shared and accessible while Group Policy content is stale or missing on a specific DC. A client hitting that DC gets old policy from cache and no one notices until the cached policy expires or the cache is cleared. dfsrdiag replicationstate catches active backlogs before they matter to end users.
Running dcdiag without /skip:systemlog and treating every failure as AD-related. The SystemLog test fires on anything in the System event log in the last hour — Windows Update, driver events, hardware messages. In a production environment, it almost always fails. The result is a dcdiag output with ten apparent failures where nine are noise and one is real. Use /skip:systemlog for routine scans.
Windows Server 2025 — Active Directory Health Check Changes
If any DC in your environment runs Windows Server 2025, these changes affect existing active directory health check procedures now.
WMIC disabled by default. Any legacy health-check script using wmic will fail silently or with an error. Replace with Get-CimInstance. This includes popular community scripts and some third-party monitoring agents that haven’t been updated.
VBScript deprecated. VBS-based AD monitoring wrappers need to be migrated to PowerShell. VBScript became a Feature-on-Demand in WS2025 and is targeted for removal in a future release.
PowerShell 2.0 engine removed (September 2025 update and later). If any monitoring tool depends on the PS 2.0 engine specifically, it stops working on updated WS2025 systems.
Credential Guard enabled by default. This protects domain credentials but can cause failures with legacy authentication protocols and some monitoring agents that use older NTLM flows. Verify monitoring agent compatibility before deploying WS2025 DCs into production.
New functional level 10 and 32k database pages. Windows Server 2025 introduces functional level 10, which enables 32k database pages — the first ESE page-size change since Windows 2000. This change is forest-wide and irreversible once enabled. All DCs must be on WS2025 first, and the change requires explicitly running Enable-ADOptionalFeature. Don’t enable it under time pressure.
KB5060842 — domain firewall profile bug. The most operationally urgent WS2025 item. Before this June 2025 cumulative update, WS2025 DCs could lose the domain firewall profile after a restart and become unreachable on the domain network while local diagnostic tools reported healthy. Every WS2025 DC needs to be at this patch level or later — verify with Get-HotFix -Id KB5060842.
Active Directory Health Check — Cadence Reference
Use this as an operational template. The goal is running the right check at the right frequency — not the same full scan every day. Microsoft’s Active Directory replication concepts documentation covers the underlying topology and scheduling model if you want to understand the intervals behind the cadence.
Daily (automate where possible)
| Check | Command | Healthy threshold |
|---|---|---|
| Core services | Get-Service ntds,adws,dns,dnscache,kdc,w32time,netlogon,dfsr | All Running |
| Replication summary | repadmin /replsummary | No errors; largest delta <60 min |
| Critical events | Directory Service log, Level ≤ 2 | No events 1311, 1864, 2042 |
| DC reachability | nltest /dsgetdc:domain /force | Returns a DC without error |
After Every Change
| Check | Command | What to verify |
|---|---|---|
| Replication errors | repadmin /showrepl * /errorsonly | No output = no errors |
| dcdiag core tests | dcdiag /test:Advertising /test:Replications /test:SysVolCheck /test:DNS /v | All PASSED |
| SYSVOL shares | net share on affected DCs | SYSVOL and NETLOGON present |
| FSMO knowledge | dcdiag /test:KnowsOfRoleHolders /e | All DCs agree on role holders |
Weekly
| Check | Command | What to verify |
|---|---|---|
| Full replication detail | repadmin /showrepl * /errorsonly | No output |
| DFSR health | dfsrdiag replicationstate | No persistent backlog |
| DNS test suite | dcdiag /test:DNS /v | All columns PASS or WARN with known cause |
| dcdiag full pass | dcdiag /e /c /q /skip:systemlog /f:dcdiag-weekly.txt | No failures |
| Time hierarchy | w32tm /monitor /domain:yourdomain.com | All DCs within ±2 minutes |
| Secure channel | nltest /sc_verify:domain | Secure channel valid |
Quarterly
| Check | Command / Action | What to verify |
|---|---|---|
| NTDS volume free space | Check drive where NTDS.dit lives | ≥10% free |
| Backup recency | Check backup logs or Windows Server Backup | System state backed up within policy |
| AD Recycle Bin | Get-ADOptionalFeature "Recycle Bin Feature" | Enabled |
| Trust health | nltest /domain_trusts | All trusts verified (if trusts exist) |
| Functional level | Get-ADForest \| Select-Object ForestMode | At target level |
| FSMO role placement | netdom query fsmo | Roles distributed per design |
| WS2025 patch level | Get-HotFix -Id KB5060842 | Installed (WS2025 DCs only) |
FAQ
How often should I perform an Active Directory health check?
Replication and service state should be checked daily — ideally automated. A full active directory health check covering DNS, SYSVOL, time, FSMO, and dcdiag should run weekly and after every structural change. Quarterly checks cover the AD database, backup recency, and trust health. The cadence table above maps the right command to the right frequency.
What is the most important Active Directory health check?
Replication. Everything else in AD depends on replication working correctly — DNS records, Group Policy, Kerberos, SYSVOL. repadmin /showrepl * /errorsonly returning nothing is the single most reliable signal that an AD environment is healthy. If replication has errors, fix those before running any other active directory health check.
Can dcdiag detect all Active Directory problems?
No. dcdiag tests configuration and state across about 20 default areas — services, replication connection objects, SYSVOL, DNS records, FSMO role knowledge. It doesn’t check disk space, backup recency, GPO content correctness, absolute time accuracy, or security hardening posture. A clean dcdiag is a necessary check, not a sufficient one. For what dcdiag specifically misses, see DCDIAG Explained: How to Read and Interpret Domain Controller Diagnostics.
How do I know if SYSVOL replication is healthy?
Three checks: net share confirms SYSVOL and NETLOGON shares are present; dfsrdiag replicationstate shows whether DFSR has an active backlog; and the DFS Replication event log shows whether Events 2213 or 4012 have fired recently. All three should be clean. SYSVOL can be shared while Group Policy content is stale on a specific DC — the event log and dfsrdiag backlog check together catch that scenario.
What causes Active Directory replication failures?
The three most common causes in practice: DNS misconfiguration (DCs can’t resolve each other, breaking RPC — error 1722), firewall blocking the RPC endpoint mapper on TCP 135 or dynamic ports 49152–65535 (also error 1722), and permissions or secure channel problems (error 8453). Longer-standing failures — a DC that missed the tombstone lifetime — produce errors 8606 or 8614 and require more involved recovery. Start with repadmin /showrepl * /errorsonly to get the specific error code, then follow the error to its cause.
Final Thoughts
A structured active directory health check isn’t just diagnostic housekeeping. Replication errors that go unchecked for weeks become lingering object problems. DNS misconfigurations that seem minor cause replication failures on the next DC promotion. SYSVOL backlogs that no one monitors leave some users getting stale Group Policy from cache long after the problem started.
The cadence table above is designed to catch those patterns before they escalate. Daily automated checks on services and replication take seconds. The weekly dcdiag pass catches the slow-developing failures. The quarterly checks cover the areas — database space, backup recency, trust health — that rarely fail but are painful when they do.
When something does fail, the articles below cover the remediation side for each area in this checklist:
- Replication failures (error 1722, 8453, lingering objects): Active Directory Replication Not Working: How to Diagnose and Fix
- SYSVOL and DFSR failures (dirty shutdown, D2/D4 restore): SYSVOL Replication Issues in Active Directory: DFSR Troubleshooting
- DNS failures (SRV records, zone configuration, domain join errors): Active Directory DNS Problems: SRV Records, Zones, and Resolution Failures
- Time synchronization failures (PDC emulator config, W32Time errors, Kerberos skew): Active Directory Time Synchronization: Fix PDC Emulator, W32Time, and Kerberos Clock Skew
- FSMO role management (transfer, seize, role holder failure): FSMO Roles Active Directory: What They Do and How to Manage Them
- Group Policy troubleshooting (GPO not applying, LSDOU processing): Group Policy in Active Directory: How GPO Processing Works
- dcdiag test-by-test reference and error codes: DCDIAG Explained: How to Read and Interpret Domain Controller Diagnostics