A functioning, modern Windows network is a complex mesh of relationships and dependencies among a variety of systems and services, including AD, DNS, the GC, and operations master servers. Running an effective Windows network means having a handle on every aspect of your network environment at all times.
It's no surprise that the primary monitoring consideration in Windows is AD and its related services and components. This includes responsiveness to DNS and LDAP queries, AD inter-site and intra-site replication, and a special Windows service called the Knowledge Consistency Checker (KCC). The health and availability of services such as DNS, the GC, and Dfs are also important.
(The KCC is a special Windows service that automatically generates AD's replication topology and ensures that all domain controllers on the network participate in replication.)
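Responsiveness to DNS and LDAP queries can be approximated with simple timed probes. The following is a minimal sketch, not a complete monitor: the function names are hypothetical, and the LDAP probe only times a TCP connection to port 389 (the standard LDAP port) rather than issuing a real LDAP query.

```python
import socket
import time
from typing import Callable, Tuple


def timed_probe(probe: Callable[[], object]) -> Tuple[bool, float]:
    """Run a probe and return (succeeded, elapsed_seconds)."""
    start = time.perf_counter()
    try:
        probe()
        ok = True
    except OSError:
        # Name-resolution errors and socket timeouts are OSError subclasses.
        ok = False
    return ok, time.perf_counter() - start


def dns_probe(hostname: str) -> Callable[[], object]:
    """Build a probe that resolves a host name via the system resolver."""
    return lambda: socket.getaddrinfo(hostname, None)


def ldap_port_probe(host: str, port: int = 389,
                    timeout: float = 3.0) -> Callable[[], object]:
    """Build a probe that opens and closes a TCP connection to the LDAP port."""
    def probe() -> None:
        with socket.create_connection((host, port), timeout=timeout):
            pass
    return probe
```

For example, `timed_probe(dns_probe("dc1.example.com"))` returns whether the lookup succeeded and how long it took; the host name here is a placeholder. Recording these timings at regular intervals gives a baseline against which slow or failed responses stand out.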
However, knowing what metrics to monitor is only a first step. By far the most important and complex aspect of monitoring network health and performance isn't determining what to monitor but rather how to digest the raw data collected from the array of metrics and draw useful conclusions from it. For example, although it would be possible to collect data on several dozen metrics (via Performance Monitor) related to AD replication, simply having this information at hand doesn't tell you how to interpret the data or what you should consider acceptable tolerance ranges for each metric. A useful monitoring system not only collects raw data but also understands how the data points relate to one another and how to use that information to identify problems on the network. This kind of artificial intelligence represents the true value of network monitoring software.
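The difference between collecting data and interpreting it can be illustrated with a small sketch. The metric names, tolerance limits, and the single correlation rule below are all assumptions chosen for illustration; real tolerance ranges depend on the environment.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Tolerance:
    """Limits for one metric; values above them need attention."""
    warn_above: float
    crit_above: float


# Hypothetical metrics and limits, not recommended values.
TOLERANCES: Dict[str, Tolerance] = {
    "ldap_bind_ms": Tolerance(warn_above=200, crit_above=1000),
    "replication_latency_min": Tolerance(warn_above=30, crit_above=120),
    "dc_cpu_percent": Tolerance(warn_above=75, crit_above=90),
}


def classify(metric: str, value: float) -> str:
    """Map a raw sample to 'ok', 'warning', or 'critical'."""
    tol = TOLERANCES[metric]
    if value > tol.crit_above:
        return "critical"
    if value > tol.warn_above:
        return "warning"
    return "ok"


def diagnose(samples: Dict[str, float]) -> List[str]:
    """Report out-of-tolerance samples plus one cross-metric hint."""
    statuses = {m: classify(m, v) for m, v in samples.items()
                if m in TOLERANCES}
    findings = [f"{s}: {m}={samples[m]}"
                for m, s in statuses.items() if s != "ok"]
    # Relate two metrics to each other instead of reading each in isolation.
    if (statuses.get("ldap_bind_ms", "ok") != "ok"
            and statuses.get("dc_cpu_percent", "ok") != "ok"):
        findings.append("hint: slow LDAP binds coincide with high CPU; "
                        "the domain controller may be overloaded")
    return findings
```

The `classify` step encodes the tolerance ranges, and the rule at the end of `diagnose` stands in for the kind of cross-metric reasoning that monitoring software applies at much larger scale.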
To ensure the health and availability of AD and other critical Windows network services, organizations need to regularly monitor a number of different services and components.
Category: Potential Problems

AD:
    Low CPU or memory resources on domain controllers
    Low disk space on volumes housing the Sysvol folder, the AD database (NTDS.DIT) file, and/or the AD transactional log files
    Slow or broken connections between domain controllers
    Slow or failed client network logon authentication requests
    Slow or failed LDAP query responses
    Slow or failed Key Distribution Center (KDC) requests
    Slow or failed AD synchronization requests
    NetLogon (LSASS) service not functioning properly
    Directory Service Agent (DSA) service not functioning properly
    KCC not functioning properly
    Excessive number of SMB connections
    Insufficient RID allocation pool size on local server
    Problems with transitive or external trusts to Win2K or down-level NT domains
    Low AD cache hit rate for name resolution queries (as a result of inefficient AD design)

Replication:
    Failed replication (due to domain controller or network connectivity problems)
    Slow replication
    Replication topology invalid/incomplete (lacks transitive closure/consistency)
    Replication using excessive network bandwidth
    Too many properties being dropped during replication
    Update Sequence Number (USN) update failures
    Other miscellaneous replication-related failure events

GC:
    Slow or failed GC query responses
    GC replication failures

DNS:
    Missing or incorrect SRV records for domain controllers
    Slow or failed DNS query responses
    DNS server zone file update failures

Operations masters (FSMOs):
    Inaccessibility of one or more operations master (FSMO) servers
    Forest or domain-centric operations master roles not consistent across domain controllers within domain/forest
    Slow or failed role master responses

Other:
    Low-level network connectivity problems
    TCP/IP routing problems
    DHCP IP address allocation pool shortages
    WINS server query or replication failures (for legacy NetBIOS systems and applications)
    Naming context lost + found items exist
    Application or service failures or performance problems
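As one concrete case from the list above, missing SRV records for domain controllers can be detected by comparing the records a zone actually contains against the names that clients look up to locate domain controllers. The sketch below is illustrative only: the function names are hypothetical, the list covers just a few of the well-known locator records, and how the registered names are gathered (a zone export, a resolver query) is left out.

```python
from typing import Iterable, List


def expected_srv_names(domain: str, forest_root: str) -> List[str]:
    """A subset of the SRV owner names used to locate domain controllers."""
    return [
        f"_ldap._tcp.dc._msdcs.{domain}",        # any domain controller
        f"_kerberos._tcp.dc._msdcs.{domain}",    # KDC on a domain controller
        f"_ldap._tcp.pdc._msdcs.{domain}",       # PDC emulator role holder
        f"_ldap._tcp.gc._msdcs.{forest_root}",   # global catalog server
    ]


def missing_srv_names(expected: Iterable[str],
                      registered: Iterable[str]) -> List[str]:
    """Return expected names absent from the registered names.

    Comparison ignores case and a trailing dot, since DNS names are
    case-insensitive and may be written fully qualified.
    """
    have = {name.rstrip(".").lower() for name in registered}
    return [name for name in expected
            if name.rstrip(".").lower() not in have]
```

Given the names found in a zone, `missing_srv_names(expected_srv_names("corp.example.com", "example.com"), found)` lists whichever locator records are absent; the domain names here are placeholders.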