Active Directory Health Check: Full Checklist for Domain Controllers (Windows Server 2025)

Active Directory failures rarely arrive without warning. Replication errors accumulate quietly for days before authentication stops working. DNS misconfiguration sits undetected until a DC promotion fails. SYSVOL backlogs grow in the background while Group Policy keeps applying from cache — until it doesn’t.

A structured active directory health check catches those signals before they become incidents. This checklist covers the full diagnostic surface: services, replication, DNS, SYSVOL, FSMO, time hierarchy, Kerberos, event logs, and the AD database. It’s structured by cadence so you can run the right checks at the right frequency, not the same scan every day.

TL;DR

Run dcdiag /q /skip:systemlog and repadmin /replsummary after every change — not just on a schedule
A “largest delta” under 60 minutes in repadmin /replsummary is normal — it is not a replication failure
Eight critical services must be running on every DC: NTDS, ADWS, DNS, Dnscache, KDC, W32Time, Netlogon, DFSR
On Windows Server 2025: wmic is disabled by default — replace with Get-CimInstance in any health scripts
Confirm KB5060842 is installed on WS2025 DCs — without it, a DC can pass local tests yet be unreachable on the domain network after a restart
The Kerberos clock skew tolerance is 5 minutes by default (configurable through Kerberos policy): any client or DC whose clock drifts beyond it fails authentication

What a Healthy Active Directory Environment Looks Like

Before running any commands, it helps to know what you’re aiming for. A healthy AD environment has a specific, verifiable state — not a vague sense that “things seem fine.”

Area	Healthy state
Services	NTDS, ADWS, DNS, Dnscache, KDC, W32Time, Netlogon, DFSR all Running and set to Automatic
Replication	No errors in `repadmin /showrepl * /errorsonly`; largest delta under 60 minutes
SYSVOL	SYSVOL and NETLOGON shares present on every DC; no DFSR backlog; no events 2213 or 4012
DNS	All SRV records registered; `dcdiag /test:DNS` returns no FAIL columns; no stale A records for decommissioned DCs
Time	PDC emulator syncs from external NTP; all DCs within ±2 minutes of each other; offset well under 5-minute Kerberos threshold
FSMO	All five role holders online and reachable; all DCs agree on role placement
Kerberos	No Event 4769 with failure code 0x25 (clock skew); secure channel valid on all DCs
Event logs	No events 1311, 1864, 2042 in Directory Service log; no events 5719, 5783 in System log
dcdiag	`dcdiag /e /c /q /skip:systemlog` returns no failures
Database	NTDS volume ≥10% free; system state backup within policy; AD Recycle Bin enabled

Every check in this article maps to one of those rows.

Before You Start

Required access: Domain Admins or delegated diagnostic permissions. Run all commands from an elevated PowerShell or CMD prompt on a DC. For enterprise-wide dcdiag /e, run from a DC that has network visibility to all sites.

Tools used in this active directory health check — all built into Windows Server or available via RSAT:

dcdiag — built into Windows Server, available via RSAT on management workstations
repadmin — built into Windows Server
netdom, nltest — built into Windows Server
w32tm — built into Windows Server
PowerShell AD module: Get-ADReplicationFailure, Get-ADReplicationPartnerMetadata — requires RSAT AD DS Tools

Failure scenario

On Windows Server 2025, KB5060842 must be installed before running any local health checks. Before this June 2025 cumulative update, WS2025 DCs could lose the domain firewall profile after a restart and become unreachable on the domain network — while local diagnostic tools showed green. Verify with Get-HotFix -Id KB5060842 first. If it is not installed, local results are not trustworthy.
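
Cumulative updates supersede one another, so a DC patched after June 2025 may no longer list KB5060842 by name. A minimal sketch of a build-level check follows. It assumes KB5060842 corresponds to OS build 26100.4349 (confirm this against the KB article before relying on it), and the function name Test-MinimumBuild is hypothetical.

```powershell
# Sketch: compare a build.revision pair against a minimum patch level.
# Assumption: KB5060842 = OS build 26100.4349 (verify against the KB article).
function Test-MinimumBuild {
    param(
        [Parameter(Mandatory)][int]$Build,
        [Parameter(Mandatory)][int]$Revision,
        [int]$MinBuild = 26100,
        [int]$MinRevision = 4349
    )
    # A different major build decides the result on its own.
    if ($Build -ne $MinBuild) { return $Build -gt $MinBuild }
    # Same build: the update revision (UBR) must be at or above the minimum.
    return $Revision -ge $MinRevision
}

# Usage on a WS2025 DC (reads the running build and update revision from the registry):
# $cv = Get-ItemProperty 'HKLM:\SOFTWARE\Microsoft\Windows NT\CurrentVersion'
# Test-MinimumBuild -Build ([int]$cv.CurrentBuildNumber) -Revision ([int]$cv.UBR)
```

The check only makes sense on WS2025 DCs; older server releases have lower build numbers and return False by design.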

Check 1: Core Services

Every DC must be running eight services. If any are stopped, nothing else in this active directory health check matters until they’re back.

$services = 'ntds','adws','dns','dnscache','kdc','w32time','netlogon','dfsr'
Get-Service $services | Select-Object Name, Status, StartType

Expected output: Status = Running for all. StartType should be Automatic — a service set to Manual or Disabled will not survive a reboot.

Service	What it does	What breaks when stopped
NTDS	Active Directory database engine	Everything — DC stops functioning entirely
ADWS	AD Web Services / PowerShell AD module	PowerShell AD cmdlets fail
DNS	Name resolution for clients and DCs	DC locator, domain joins, replication
Dnscache	DNS client cache	DC can’t resolve partner DCs by name
KDC	Kerberos ticket issuance	Authentication across the domain
W32Time	Time synchronization	Kerberos fails when skew exceeds 5 minutes
Netlogon	Secure channel, SYSVOL/NETLOGON shares, SRV records	Domain authentication, Group Policy
DFSR	SYSVOL replication between DCs	Group Policy replication stops

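If SYSVOL in your environment is still replicated by legacy FRS (the NtFrs service) rather than DFSR, the SYSVOL migration is incomplete. FRS is not an option on current releases: a server running Windows Server 2019 or later cannot be promoted into a domain that still uses FRS for SYSVOL, so any environment with WS2025 DCs should already be fully on DFSR.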

Check 2: AD Replication

Replication is the most common source of AD failures. Run this first when something feels wrong. The full repadmin command reference on Microsoft Learn covers every flag used below.

repadmin /replsummary

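The “largest delta” column shows the longest time since the last successful replication for each DC. A delta under 60 minutes is normal within a site: changes replicate by notification within seconds, but a partition with nothing new to send is only polled on the default once-per-hour schedule. Operators frequently flag a 45-minute delta as a failure when it’s just the scheduler. Between sites, the default site link interval is 180 minutes, so compare inter-site deltas against your own site link schedule. A delta beyond the expected interval warrants investigation. A delta measured in many hours or days is an active failure.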
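
The same threshold logic can be applied per replication link rather than per DC. The sketch below is one possible approach, not the only one: it takes objects shaped like the output of Get-ADReplicationPartnerMetadata and returns the links whose last success is older than a threshold. The function name Get-StaleReplicationLink is hypothetical, and the 60-minute default is an assumption you should adjust for inter-site links.

```powershell
# Sketch: list replication links whose last success is older than a threshold.
# Input objects need Server, Partner, Partition and LastReplicationSuccess,
# which is the shape Get-ADReplicationPartnerMetadata returns.
function Get-StaleReplicationLink {
    param(
        [Parameter(Mandatory)][object[]]$Metadata,
        [timespan]$Threshold = (New-TimeSpan -Minutes 60),
        [datetime]$Now = (Get-Date)
    )
    foreach ($link in $Metadata) {
        $last = $link.LastReplicationSuccess
        # A link that has never succeeded is treated as maximally stale.
        $age = if ($null -ne $last) { $Now - $last } else { [timespan]::MaxValue }
        if ($age -gt $Threshold) {
            [pscustomobject]@{
                Server     = $link.Server
                Partner    = $link.Partner
                Partition  = $link.Partition
                AgeMinutes = [math]::Round($age.TotalMinutes)
            }
        }
    }
}

# Usage on a DC with the ActiveDirectory module:
# $meta = Get-ADReplicationPartnerMetadata -Target * -Partition *
# Get-StaleReplicationLink -Metadata $meta -Threshold (New-TimeSpan -Minutes 180)
```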

repadmin /showrepl * /errorsonly

If the output lists no failures, replication is healthy. If it reports errors, the error code drives the next step: 1722 (RPC unavailable, usually firewall or DNS), 8453 (access denied, permissions or secure channel), 8606/8614 (lingering objects or replication stalled past tombstone lifetime).
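
For scripted checks, the CSV form of the same command is easier to filter than the text report. The sketch below assumes the column names that repadmin /showrepl /csv emits (showrepl_COLUMNS, Number of Failures, Last Failure Status) and that unreachable DCs appear as showrepl_ERROR rows; verify both against real output from your environment. Select-ReplicationFailure is a hypothetical helper name.

```powershell
# Sketch: keep only rows from "repadmin /showrepl * /csv" that report a problem.
function Select-ReplicationFailure {
    param([Parameter(Mandatory)][string[]]$CsvLine)
    $CsvLine | ConvertFrom-Csv | Where-Object {
        # Rows for DCs that could not be queried at all (assumed marker value)
        $_.'showrepl_COLUMNS' -eq 'showrepl_ERROR' -or
        # Links with one or more consecutive failures
        $_.'Number of Failures' -match '^[1-9]\d*$'
    }
}

# Usage on a DC:
# Select-ReplicationFailure -CsvLine (repadmin /showrepl * /csv) |
#     Select-Object 'Destination DSA', 'Source DSA', 'Number of Failures', 'Last Failure Status'
```

The Last Failure Status column carries the numeric error code (1722, 8453, and so on), which feeds directly into the triage described above.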

repadmin /queue

A queue that doesn’t drain over time means replication is stuck, not just delayed. Use this to distinguish “slow” from “broken.”

Get-ADReplicationFailure -Target * -Scope Forest | Sort-Object FailureCount -Descending

For deeper investigation of specific replication failures, see Active Directory Replication Not Working: How to Diagnose and Fix.

Check 3: SYSVOL and NETLOGON

SYSVOL and NETLOGON must be shared and accessible on every DC. When they’re not, Group Policy stops applying — sometimes silently, because clients use cached policy until it expires.

net share

Both SYSVOL and NETLOGON must appear. If either is missing, Netlogon is likely stopped or SYSVOL hasn’t completed initial synchronization.

dfsrdiag replicationstate

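Expected output: zero or few active inbound and outbound updates, with the counts dropping between runs. Updates that stay queued on a specific DC indicate a DFSR problem; dirty shutdown, database corruption, and staging quota exhaustion are the most common causes. For an exact per-partner backlog count, use dfsrdiag backlog with the /rgname, /rfname, /smem, and /rmem parameters.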

Get-WinEvent -LogName "DFS Replication" -MaxEvents 50 |
  Where-Object { $_.Level -le 3 } |
  Select-Object TimeCreated, Id, Message |
  Format-Table -AutoSize -Wrap

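Events 2213, 4012, and 5002 are the most operationally significant. Event 2213 indicates DFSR paused replication on a volume after a dirty shutdown; replication stays paused until it is resumed, and the event text includes the command to run. Event 4012 means DFSR stopped replicating SYSVOL because the DC was disconnected from its partners for longer than MaxOfflineTimeInDays (60 days by default), a more severe condition that requires a D2/D4 restore procedure. Event 5002 reports a communication error with a replication partner.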

For detailed DFSR troubleshooting, see SYSVOL Replication Issues in Active Directory: DFSR Troubleshooting.

Check 4: DNS

DNS and AD are tightly coupled. Most replication errors (1722 in particular) trace back to DNS. This active directory health check covers the areas that actually fail in practice.

dcdiag /test:DNS /v

This runs seven sub-tests. Read the summary table at the end of the output — columns labeled Auth/Basc/Forw/Del/Dyn/RReg/Ext, one row per DC. A WARN means dcdiag couldn’t fully validate the configuration. Run with /v to get detail behind any flagged column.

The most common DNS failures in practice: a DC pointing only to itself for DNS (breaks name resolution after a restart before AD is fully started), missing SRV records, stale A records for decommissioned DCs, and scavenging disabled so stale records accumulate over months.

nslookup -type=SRV _ldap._tcp.dc._msdcs.yourdomain.com

If SRV records are missing, force re-registration:

nltest /dsregdns
net stop netlogon
net start netlogon

For comprehensive DNS troubleshooting, see Active Directory DNS Problems: SRV Records, Zones, and Resolution Failures.

Check 5: Time Synchronization

Kerberos authentication fails when the clock offset between a client and a DC exceeds five minutes. That’s a hard failure — authentication stops working, domain joins fail, and replication can break.

w32tm /query /status

The key fields: Source (should be the PDC emulator or external NTP), Last Successful Sync Time, and Poll Interval. A source of Local CMOS Clock means W32Time has lost contact with its time source — this needs immediate attention.

w32tm /monitor /domain:yourdomain.com

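The /monitor output lists every DC with its offset from the PDC emulator. To verify the hierarchy itself, check the Type value in the configuration output: on the PDC emulator, Type should be NTP with an external server configured; on all other DCs, Type should be NT5DS. If the PDC emulator is syncing from CMOS or from another DC, the whole domain has no external time anchor.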

w32tm /query /configuration
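
To turn the five-minute rule into a measurable check, you can sample a DC's offset with w32tm /stripchart and compare it with the tolerance. The sketch below assumes the /dataonly output format of one "time, signed-offset-in-seconds" pair per sample line and an unlocalized decimal point. Both function names are hypothetical.

```powershell
# Sketch: extract signed offsets (seconds) from w32tm /stripchart /dataonly output
# and compare them with the Kerberos tolerance (300 seconds by default).
function Get-ClockOffsetSeconds {
    param([Parameter(Mandatory)][string[]]$StripchartOutput)
    foreach ($line in $StripchartOutput) {
        # Sample lines look like "12:00:00, +00.0012345s"; header lines are skipped.
        if ($line -match ',\s*([+-]\d+\.\d+)s\s*$') {
            [double]::Parse($Matches[1], [cultureinfo]::InvariantCulture)
        }
    }
}

function Test-WithinKerberosSkew {
    param(
        [Parameter(Mandatory)][double]$OffsetSeconds,
        [double]$ToleranceSeconds = 300
    )
    [math]::Abs($OffsetSeconds) -le $ToleranceSeconds
}

# Usage from any domain member:
# $out = w32tm /stripchart /computer:dc01.yourdomain.com /samples:3 /dataonly
# Get-ClockOffsetSeconds -StripchartOutput $out |
#     ForEach-Object { Test-WithinKerberosSkew -OffsetSeconds $_ }
```

Alerting well before 300 seconds (for example at the ±2 minutes used in the healthy-state table) leaves room to fix drift before authentication breaks.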

For time synchronization recovery procedures, see Active Directory Time Synchronization: Fix PDC Emulator, W32Time, and Kerberos Clock Skew.

Check 6: FSMO Role Holders

Every active directory health check must confirm that all five FSMO roles are held by reachable, functional DCs. If a role holder is offline, specific operations fail: new accounts can’t be created once a DC exhausts its RID pool (RID Master down), time sync breaks down (PDC Emulator down), schema changes are blocked (Schema Master down).

netdom query fsmo

nltest /dsgetdc:yourdomain.com /force

dcdiag /test:KnowsOfRoleHolders /e /v

Run the KnowsOfRoleHolders test with /e to compare knowledge across all DCs. If DCs disagree on role holders, replication is broken between them. For role management — transfer, seize, role holder failure — see FSMO Roles Active Directory: What They Do and How to Manage Them.
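
The agreement check can also be scripted by asking each DC for its own view of the role holders and comparing the answers. This is a sketch under the assumption that every DC answers Get-ADDomain and Get-ADForest queries over ADWS; Test-FsmoAgreement is a hypothetical helper, not a built-in cmdlet.

```powershell
# Sketch: compare each DC's view of the five role holders and report disagreement.
function Test-FsmoAgreement {
    param([Parameter(Mandatory)][object[]]$View)
    $roles = 'SchemaMaster', 'DomainNamingMaster', 'PDCEmulator', 'RIDMaster', 'InfrastructureMaster'
    foreach ($role in $roles) {
        # Distinct holders reported across all DC views for this role
        $holders = @($View | ForEach-Object { $_.$role } | Sort-Object -Unique)
        [pscustomobject]@{
            Role    = $role
            Holders = $holders
            Agreed  = ($holders.Count -eq 1)
        }
    }
}

# Usage on a DC with the ActiveDirectory module (single-domain forest assumed):
# $views = foreach ($dc in (Get-ADDomainController -Filter *).HostName) {
#     $d = Get-ADDomain -Server $dc
#     $f = Get-ADForest -Server $dc
#     [pscustomobject]@{
#         DC = $dc; SchemaMaster = $f.SchemaMaster; DomainNamingMaster = $f.DomainNamingMaster
#         PDCEmulator = $d.PDCEmulator; RIDMaster = $d.RIDMaster; InfrastructureMaster = $d.InfrastructureMaster
#     }
# }
# Test-FsmoAgreement -View $views | Where-Object { -not $_.Agreed }
```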

Check 7: Kerberos and Secure Channel

Services and event logs can all look normal while Kerberos and the secure channel are broken. These checks catch failures that don’t surface in service state.

Test-ComputerSecureChannel -Verbose

nltest /sc_verify:yourdomain.com

A broken secure channel causes sporadic authentication failures and replication access-denied errors (8453). It usually means the DC’s computer account password is out of sync.

Get-WinEvent -LogName Security -FilterXPath "*[System[(EventID=4769)]]" -MaxEvents 20 |
  Where-Object { $_.Message -match "0x25|0x18" }

Event 4769 with failure code 0x25 is a clock skew failure (KRB_AP_ERR_SKEW). A flood of 0x25 errors means time synchronization is broken somewhere in the chain; check the PDC emulator configuration before investigating anything else. Code 0x18 (KDC_ERR_PREAUTH_FAILED) is a bad password and points to a different problem entirely; it is more commonly logged under Event 4771 than 4769.

Check 8: Event Logs

Automated checks catch most things. Event logs surface failure patterns that commands don’t expose — especially slow-developing problems like a DC approaching tombstone lifetime.

Get-WinEvent -LogName "Directory Service" -MaxEvents 100 |
  Where-Object { $_.Level -le 2 } |
  Select-Object TimeCreated, Id, Message |
  Format-List

Event ID	Source	Meaning	Action
1311	NTDS Replication	KCC cannot build a spanning tree — replication path broken	Investigate unreachable DC
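1864	NTDS Replication	DC hasn’t received replication from one or more partners recently; the event counts partners by gap (over 24 hours, a week, a month, two months, a tombstone lifetime)	Investigate immediately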
2042	NTDS Replication	DC hasn’t replicated past tombstone lifetime — lingering objects likely	Urgent — replication blocked
5719	NETLOGON	DC can’t locate a DC at startup — DNS or timing issue	Check DNS and startup sequence
5783	NETLOGON	Secure channel failure	Run `nltest /sc_verify`
40960	LSASRV	Kerberos authentication failure — time skew or KDC unreachable	Check W32Time and KDC service

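Events 1864 and 2042 require immediate action. A DC that hasn’t replicated within the tombstone lifetime (180 days by default in forests created on Windows Server 2003 SP1 or later, 60 days in older forests that never changed it) is blocked from replicating by default, because the data divergence risk is too high for AD to reconcile automatically.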
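
Because the tombstone lifetime differs between forests, it is worth reading the real value rather than assuming one. The sketch below classifies a replication gap against that value. Get-TombstoneRisk is a hypothetical helper, and the "half the lifetime" warning level is an arbitrary assumption chosen for illustration.

```powershell
# Sketch: classify how close a replication gap is to the tombstone lifetime.
function Get-TombstoneRisk {
    param(
        [Parameter(Mandatory)][datetime]$LastSuccess,
        [Nullable[int]]$TombstoneLifetimeDays,   # value read from AD; may be unset
        [datetime]$Now = (Get-Date)
    )
    # An unset tombstoneLifetime attribute means the built-in 60-day default applies.
    $limit = if ($null -ne $TombstoneLifetimeDays) { $TombstoneLifetimeDays } else { 60 }
    $days = ($Now - $LastSuccess).TotalDays
    if ($days -ge $limit) { 'Exceeded' }
    elseif ($days -ge $limit / 2) { 'AtRisk' }   # assumed warning level: half the lifetime
    else { 'OK' }
}

# Usage on a DC with the ActiveDirectory module:
# $cfg = (Get-ADRootDSE).configurationNamingContext
# $tsl = (Get-ADObject "CN=Directory Service,CN=Windows NT,CN=Services,$cfg" -Properties tombstoneLifetime).tombstoneLifetime
# Get-TombstoneRisk -LastSuccess $link.LastReplicationSuccess -TombstoneLifetimeDays $tsl
```

In the usage lines, $link stands for one object returned by Get-ADReplicationPartnerMetadata.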

Check 9: dcdiag — Full Diagnostic Pass

dcdiag is the most comprehensive single-command active directory health check available on Windows Server. Run it at least weekly and after any structural change (DC promotion/demotion, site link changes, schema updates).

dcdiag /e /c /q /skip:systemlog /f:dcdiag-output.txt

The SystemLog test fails constantly in production environments because of unrelated OS events — Windows Update, driver messages, anything that generated an error in the last hour. Skip it in routine scans and review the System log separately when investigating specific incidents.

dcdiag /test:Advertising /test:Replications /test:SysVolCheck /test:DFSREvent /test:DNS /v

Run these five tests immediately after any DC promotion. They confirm the new DC is advertising correctly, has replicated, and has a healthy SYSVOL before clients start using it.

For a complete explanation of every dcdiag test, what FAILED means for each, common error codes (1722, 8453, 8606), and the DNS sub-test column breakdown, see DCDIAG Explained: How to Read and Interpret Domain Controller Diagnostics.

Check 10: AD Database

The AD database (NTDS.dit) rarely fails, but when it does the recovery path is painful. These checks take two minutes and are worth running quarterly.

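# Read the NTDS.dit path from the registry, then report space on its volume
$ntdsPath = (Get-ItemProperty "HKLM:\System\CurrentControlSet\Services\NTDS\Parameters")."DSA Database file"
$drive = (Split-Path $ntdsPath -Qualifier).TrimEnd(':')   # Get-PSDrive expects "C", not "C:"
Get-PSDrive $drive | Select-Object Name, Used, Free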

The NTDS volume needs free space for defrag operations and log file growth. Flag any volume below 10% free.
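
Get-PSDrive reports Used and Free in bytes, so the 10% rule needs one small calculation. A minimal sketch, with the hypothetical helper name Get-VolumeHeadroom and the 10% threshold taken from the guidance above:

```powershell
# Sketch: turn Used/Free byte counts into a percent-free figure and a threshold flag.
function Get-VolumeHeadroom {
    param(
        [Parameter(Mandatory)][double]$FreeBytes,
        [Parameter(Mandatory)][double]$UsedBytes,
        [double]$MinPercentFree = 10
    )
    $total = $FreeBytes + $UsedBytes
    if ($total -le 0) { throw 'Volume size must be greater than zero.' }
    $pct = [math]::Round(100 * $FreeBytes / $total, 1)
    [pscustomobject]@{
        PercentFree    = $pct
        BelowThreshold = ($pct -lt $MinPercentFree)
    }
}

# Usage with the drive that holds NTDS.dit (drive letter is an example):
# $d = Get-PSDrive C
# Get-VolumeHeadroom -FreeBytes $d.Free -UsedBytes $d.Used
```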

Get-Item $ntdsPath | Select-Object Name, Length, LastWriteTime

NTDS.dit grows over time and never shrinks automatically. Online defrag (runs automatically after each garbage-collection pass, every 12 hours by default) reclaims whitespace internally but doesn’t reduce file size. Offline defrag with ntdsutil reduces file size but requires stopping AD DS, which makes it a maintenance window operation. Don’t run offline defrag unless the file is significantly oversized relative to your object count.

AD should be backed up at least once within the tombstone lifetime (180 days by default in forests created on Windows Server 2003 SP1 or later, 60 days in older forests that never changed it). In practice, “once per tombstone lifetime” is not a recovery strategy. The real question is: if a DC failed right now, how old would your backup be? Most shops target daily or weekly system state backups on at least one DC per domain.

Active Directory Health Check PowerShell Script

For environments where you want a repeatable, schedulable active directory health check without running commands manually, this script covers the core diagnostic areas and writes a summary to the console. It’s a starting point — expand it to match your environment. For the full dcdiag switch reference used in Check 9, see the Microsoft Learn dcdiag documentation.

#Requires -Modules ActiveDirectory
# AD Health Check — run elevated on a DC
# Output: console summary per check area

$domain = (Get-ADDomain).DNSRoot
$errors = @()

Write-Host "`n=== AD Health Check: $domain ===" -ForegroundColor Cyan
Write-Host (Get-Date) "`n"

# 1. Core services
Write-Host "--- Services ---"
$svcNames = 'ntds','adws','dns','dnscache','kdc','w32time','netlogon','dfsr'
Get-Service $svcNames | ForEach-Object {
    $status = if ($_.Status -eq 'Running') { "OK" } else { "FAIL" }
    if ($_.Status -ne 'Running') { $errors += "Service not running: $($_.Name)" }
    Write-Host "$status  $($_.Name) [$($_.Status)]"
}

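# 2. Replication failures
# Parse the CSV output so header or status text is never mistaken for an error
Write-Host "`n--- Replication ---"
$replFailures = repadmin /showrepl * /csv | ConvertFrom-Csv |
    Where-Object { $_.showrepl_COLUMNS -eq 'showrepl_ERROR' -or $_.'Number of Failures' -match '^[1-9]' }
if ($replFailures) {
    Write-Host "WARN  Replication errors detected; run repadmin /showrepl * /errorsonly for detail"
    $errors += "Replication errors present"
} else {
    Write-Host "OK    No replication errors"
}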

# 3. SYSVOL shares
Write-Host "`n--- SYSVOL ---"
$shares = net share 2>&1
foreach ($share in @('SYSVOL','NETLOGON')) {
    if ($shares -match $share) {
        Write-Host "OK    $share share present"
    } else {
        Write-Host "FAIL  $share share missing"
        $errors += "$share share missing"
    }
}

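# 4. DFSR replication state
# replicationstate reports active transfers and does not print a "No backlog" line;
# for a per-partner count use dfsrdiag backlog with /rgname /rfname /smem /rmem
Write-Host "`n--- DFSR ---"
$dfsr = dfsrdiag replicationstate 2>&1
if ($LASTEXITCODE -ne 0) {
    Write-Host "WARN  dfsrdiag replicationstate failed; check the DFSR service and event log"
    $errors += "DFSR state query failed"
} else {
    Write-Host "INFO  Review active inbound/outbound updates below"
    Write-Host ($dfsr | Out-String).Trim()
}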

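# 5. Time sync
Write-Host "`n--- Time ---"
$time = w32tm /query /status 2>&1
# Take the first "Source:" line; empty string if w32tm returned no status
$source = "$($time | Select-String '^Source:' | Select-Object -First 1)"
if (-not $source) {
    Write-Host "FAIL  Could not read time source; check that W32Time is running"
    $errors += "W32Time status unavailable"
} elseif ($source -match 'Local CMOS Clock|Free-running System Clock') {
    Write-Host "INFO  $source"
    Write-Host "FAIL  W32Time lost external time source"
    $errors += "W32Time has no upstream time source"
} else {
    Write-Host "INFO  $source"
    Write-Host "OK    Time source looks healthy"
}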

# 6. FSMO roles
Write-Host "`n--- FSMO ---"
$fsmo = netdom query fsmo 2>&1
Write-Host ($fsmo | Out-String).Trim()

# 7. Critical Directory Service events
Write-Host "`n--- Event Log ---"
$criticalIds = @(1311, 1864, 2042)
$events = Get-WinEvent -LogName "Directory Service" -MaxEvents 500 -ErrorAction SilentlyContinue |
    Where-Object { $_.Id -in $criticalIds }
if ($events) {
    $events | ForEach-Object {
        Write-Host "WARN  Event $($_.Id): $($_.Message.Substring(0,[Math]::Min(120,$_.Message.Length)))..."
        $errors += "Directory Service Event $($_.Id) found"
    }
} else {
    Write-Host "OK    No critical Directory Service events (1311, 1864, 2042)"
}

# Summary
Write-Host "`n=== Summary ==="
if ($errors.Count -eq 0) {
    Write-Host "All checks passed." -ForegroundColor Green
} else {
    Write-Host "$($errors.Count) issue(s) found:" -ForegroundColor Yellow
    $errors | ForEach-Object { Write-Host "  - $_" }
}

Failure scenario

On Windows Server 2025, WMIC is disabled by default. The script above doesn’t call it, but any community script written before mid-2025 that uses wmic for WMI queries will fail silently or throw an error on WS2025 DCs; replace those calls with Get-CimInstance. Review any third-party script before running it in a WS2025 environment.
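
When reviewing an older script, a quick scan for wmic calls shows how much work the migration involves. The sketch below is illustrative only: Find-WmicUsage is a hypothetical helper, and its alias table covers just a handful of common wmic aliases, so anything it cannot map is flagged for manual review.

```powershell
# Sketch: flag wmic calls in script text and suggest the CIM class to query instead.
# The alias table is a small, non-exhaustive sample of wmic aliases.
$WmicAliasToCimClass = @{
    os             = 'Win32_OperatingSystem'
    computersystem = 'Win32_ComputerSystem'
    bios           = 'Win32_BIOS'
    cpu            = 'Win32_Processor'
    logicaldisk    = 'Win32_LogicalDisk'
    service        = 'Win32_Service'
    qfe            = 'Win32_QuickFixEngineering'
}

function Find-WmicUsage {
    param([Parameter(Mandatory)][string[]]$ScriptLine)
    $n = 0
    foreach ($line in $ScriptLine) {
        $n++
        # Skip global switches such as /node:dc01, then capture the alias
        if ($line -match '\bwmic(?:\.exe)?\b(?:\s+/\S+)*\s+(\w+)') {
            $wmicAlias = $Matches[1].ToLower()
            $class = $WmicAliasToCimClass[$wmicAlias]
            [pscustomobject]@{
                Line       = $n
                Alias      = $wmicAlias
                Suggestion = if ($class) { "Get-CimInstance -ClassName $class" } else { 'Review manually' }
            }
        }
    }
}

# Usage (the script path is an example):
# Find-WmicUsage -ScriptLine (Get-Content .\legacy-healthcheck.ps1)
```

For example, wmic os get Caption becomes Get-CimInstance -ClassName Win32_OperatingSystem | Select-Object Caption.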

For a more comprehensive PowerShell-based AD health report with HTML output, the community scripts ADxRay by Claudio Merola and Ali Tajran’s Get-ADHealth.ps1 are widely deployed in SMB environments and cover additional areas (security posture, object inventory, schema version) that a minimal script doesn’t reach.

Common Active Directory Health Check Mistakes

These are the patterns that cause operators to miss real problems or waste time on non-problems during an active directory health check.

Checking only one DC. Replication failures are asymmetric — a problem between DC01 and DC03 won’t show up when you run diagnostics only on DC01 and DC02. Always run dcdiag /e and repadmin /showrepl * with the asterisk to cover the whole forest.

Treating a 45-minute replication delta as a failure. Within a site, a DC notifies its partners about 15 seconds after a change, but a partition with no changes is only polled on the default once-per-hour connection schedule. A “largest delta” of 45 minutes in repadmin /replsummary is the scheduler doing its job. A delta of 6 hours is a problem.

Assuming DNS is healthy because name resolution works. Client-facing name resolution can work fine while SRV record registration is broken, dynamic updates are disabled, or stale DC records are causing RPC failures between DCs. DNS health for clients and DNS health for AD replication are different things. Run dcdiag /test:DNS /v regardless.

Trusting event logs alone without running repadmin. The Directory Service event log shows you that something went wrong. It often doesn’t show you which partner, which naming context, or the exact error code. repadmin /showrepl * /errorsonly gives you the specific replication failure — the event log is a signal to look, not a complete diagnosis.

Not validating SYSVOL. SYSVOL can be shared and accessible while Group Policy content is stale or missing on a specific DC. A client hitting that DC gets old policy from cache and no one notices until the cached policy expires or the cache is cleared. dfsrdiag replicationstate catches active backlogs before they matter to end users.

Running dcdiag without /skip:systemlog and treating every failure as AD-related. The SystemLog test fires on anything in the System event log in the last hour — Windows Update, driver events, hardware messages. In a production environment, it almost always fails. The result is a dcdiag output with ten apparent failures where nine are noise and one is real. Use /skip:systemlog for routine scans.

Windows Server 2025 — Active Directory Health Check Changes

If any DC in your environment runs Windows Server 2025, these changes affect existing active directory health check procedures now.

WMIC disabled by default. Any legacy health-check script using wmic will fail silently or with an error. Replace with Get-CimInstance. This includes popular community scripts and some third-party monitoring agents that haven’t been updated.

VBScript deprecated. VBS-based AD monitoring wrappers need to be migrated to PowerShell. VBScript became a Feature-on-Demand in WS2025 and is targeted for removal in a future release.

PowerShell 2.0 engine removed (September 2025 update and later). If any monitoring tool depends on the PS 2.0 engine specifically, it stops working on updated WS2025 systems.

Credential Guard enabled by default. On WS2025 this applies to qualifying domain-joined member servers, not to domain controllers themselves. It protects domain credentials but can cause failures with legacy authentication protocols and some monitoring agents that use older NTLM flows. Verify monitoring agent compatibility before moving monitoring hosts to WS2025 alongside your DCs.

New functional level 10 and 32k database pages. Windows Server 2025 introduces functional level 10, which enables 32k database pages — the first ESE page-size change since Windows 2000. This change is forest-wide and irreversible once enabled. All DCs must be on WS2025 first, and the change requires explicitly running Enable-ADOptionalFeature. Don’t enable it under time pressure.

KB5060842: domain firewall profile bug. The most operationally urgent WS2025 item. Before this June 2025 cumulative update, WS2025 DCs could lose the domain firewall profile after a restart and become unreachable on the domain network while local diagnostic tools reported healthy. Every WS2025 DC needs to be at this patch level or later. Verify with Get-HotFix -Id KB5060842, and if it isn’t listed, confirm that a later cumulative update is installed, since later updates supersede it.

Active Directory Health Check — Cadence Reference

Use this as an operational template. The goal is running the right check at the right frequency — not the same full scan every day. Microsoft’s Active Directory replication concepts documentation covers the underlying topology and scheduling model if you want to understand the intervals behind the cadence.

Daily (automate where possible)

Check	Command	Healthy threshold
Core services	`Get-Service ntds,adws,dns,dnscache,kdc,w32time,netlogon,dfsr`	All Running
Replication summary	`repadmin /replsummary`	No errors; largest delta <60 min
Critical events	Directory Service log, Level ≤ 2	No events 1311, 1864, 2042
DC reachability	`nltest /dsgetdc:domain /force`	Returns a DC without error

After Every Change

Check	Command	What to verify
Replication errors	`repadmin /showrepl * /errorsonly`	No failures listed
dcdiag core tests	`dcdiag /test:Advertising /test:Replications /test:SysVolCheck /test:DNS /v`	All PASSED
SYSVOL shares	`net share` on affected DCs	SYSVOL and NETLOGON present
FSMO knowledge	`dcdiag /test:KnowsOfRoleHolders /e`	All DCs agree on role holders

Weekly

Check	Command	What to verify
Full replication detail	`repadmin /showrepl * /errorsonly`	No failures listed
DFSR health	`dfsrdiag replicationstate`	No persistent backlog
DNS test suite	`dcdiag /test:DNS /v`	All columns PASS or WARN with known cause
dcdiag full pass	`dcdiag /e /c /q /skip:systemlog /f:dcdiag-weekly.txt`	No failures
Time hierarchy	`w32tm /monitor /domain:yourdomain.com`	All DCs within ±2 minutes
Secure channel	`nltest /sc_verify:domain`	Secure channel valid

Quarterly

Check	Command / Action	What to verify
NTDS volume free space	Check drive where NTDS.dit lives	≥10% free
Backup recency	Check backup logs or Windows Server Backup	System state backed up within policy
AD Recycle Bin	`Get-ADOptionalFeature "Recycle Bin Feature"`	Enabled
Trust health	`nltest /domain_trusts`	All trusts verified (if trusts exist)
Functional level	`Get-ADForest | Select-Object ForestMode`	At target level
FSMO role placement	`netdom query fsmo`	Roles distributed per design
WS2025 patch level	`Get-HotFix -Id KB5060842`	Installed (WS2025 DCs only)

FAQ

How often should I perform an Active Directory health check?

Replication and service state should be checked daily — ideally automated. A full active directory health check covering DNS, SYSVOL, time, FSMO, and dcdiag should run weekly and after every structural change. Quarterly checks cover the AD database, backup recency, and trust health. The cadence table above maps the right command to the right frequency.

What is the most important Active Directory health check?

Replication. Everything else in AD depends on replication working correctly: DNS records, Group Policy, Kerberos, SYSVOL. repadmin /showrepl * /errorsonly reporting no failures is the single most reliable signal that an AD environment is healthy. If replication has errors, fix those before running any other active directory health check.

Can dcdiag detect all Active Directory problems?

No. dcdiag tests configuration and state across about 20 default areas — services, replication connection objects, SYSVOL, DNS records, FSMO role knowledge. It doesn’t check disk space, backup recency, GPO content correctness, absolute time accuracy, or security hardening posture. A clean dcdiag is a necessary check, not a sufficient one. For what dcdiag specifically misses, see DCDIAG Explained: How to Read and Interpret Domain Controller Diagnostics.

How do I know if SYSVOL replication is healthy?

Three checks: net share confirms SYSVOL and NETLOGON shares are present; dfsrdiag replicationstate shows whether DFSR has an active backlog; and the DFS Replication event log shows whether Events 2213 or 4012 have fired recently. All three should be clean. SYSVOL can be shared while Group Policy content is stale on a specific DC — the event log and dfsrdiag backlog check together catch that scenario.

What causes Active Directory replication failures?

The three most common causes in practice: DNS misconfiguration (DCs can’t resolve each other, breaking RPC — error 1722), firewall blocking the RPC endpoint mapper on TCP 135 or dynamic ports 49152–65535 (also error 1722), and permissions or secure channel problems (error 8453). Longer-standing failures — a DC that missed the tombstone lifetime — produce errors 8606 or 8614 and require more involved recovery. Start with repadmin /showrepl * /errorsonly to get the specific error code, then follow the error to its cause.

Final Thoughts

A structured active directory health check isn’t just diagnostic housekeeping. Replication errors that go unchecked for weeks become lingering object problems. DNS misconfigurations that seem minor cause replication failures on the next DC promotion. SYSVOL backlogs that no one monitors leave some users getting stale Group Policy from cache long after the problem started.

The cadence table above is designed to catch those patterns before they escalate. Daily automated checks on services and replication take seconds. The weekly dcdiag pass catches the slow-developing failures. The quarterly checks cover the areas — database space, backup recency, trust health — that rarely fail but are painful when they do.

When something does fail, the articles below cover the remediation side for each area in this checklist:

Replication failures (error 1722, 8453, lingering objects): Active Directory Replication Not Working: How to Diagnose and Fix
SYSVOL and DFSR failures (dirty shutdown, D2/D4 restore): SYSVOL Replication Issues in Active Directory: DFSR Troubleshooting
DNS failures (SRV records, zone configuration, domain join errors): Active Directory DNS Problems: SRV Records, Zones, and Resolution Failures
Time synchronization failures (PDC emulator config, W32Time errors, Kerberos skew): Active Directory Time Synchronization: Fix PDC Emulator, W32Time, and Kerberos Clock Skew
FSMO role management (transfer, seize, role holder failure): FSMO Roles Active Directory: What They Do and How to Manage Them
Group Policy troubleshooting (GPO not applying, LSDOU processing): Group Policy in Active Directory: How GPO Processing Works
dcdiag test-by-test reference and error codes: DCDIAG Explained: How to Read and Interpret Domain Controller Diagnostics
