Active Directory Sites – Best Practices
Incorrect site, subnet or connection object configuration can cause AD replication and authentication to fail. This article discusses some of the common reasons for these failures and best practices for keeping Sites healthy.
Site Links and Replication Connections
Site not contained in a Site Link
All sites need to be contained in at least one Site Link in order to replicate to other sites. Automatic Site Covering and DFS costing will also be affected.
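As an illustration of the check described above, here is a minimal sketch that flags sites which are not a member of any site link. The inventory (site and link names) is hypothetical; in practice it would be exported from the Configuration NC.

```python
def sites_without_link(sites, site_links):
    """Return the sites that are not a member of any site link, sorted by name."""
    linked = set()
    for members in site_links.values():
        linked.update(members)
    return sorted(set(sites) - linked)


# Hypothetical inventory, for illustration only.
sites = ["HQ", "London", "Paris", "Tokyo"]
site_links = {
    "HQ-London": ["HQ", "London"],
    "HQ-Paris": ["HQ", "Paris"],
}

print(sites_without_link(sites, site_links))  # ['Tokyo']
```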
Manual Connection Objects
If you leave the KCC to do its job, it will automatically create the necessary connection objects. However, any manually created connection object (INCLUDING an automatically created object that has been modified) will remain static. “Admin made it, so admin must know something I don’t know” is the general logic behind this. Only create manual connection objects if you know something the KCC doesn’t know. Don’t confuse a connection object with a site link.
If you are cleaning up the connection objects, don’t delete more than 10 connections at a time or a Version Vector Join (vv join) might be required to re-join the DC.
Connection Objects with non-default schedules
By default, connection objects will inherit their schedule based on the site link. However, they can be changed directly. Once you make a change to a connection object, it will no longer be managed by the KCC and will be treated as a manual connection object.
Redundant Site Links
If two site links contain the same two remote sites, a suboptimal replication topology may result.
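One way to spot this condition is to list every pair of sites that appears together in more than one site link. The sketch below assumes the site links have already been exported into a dictionary; the link and site names are hypothetical.

```python
from itertools import combinations


def redundant_site_links(site_links):
    """Return {(site_a, site_b): [link names]} for pairs covered by more than one link."""
    pair_to_links = {}
    for link_name, members in site_links.items():
        for pair in combinations(sorted(set(members)), 2):
            pair_to_links.setdefault(pair, []).append(link_name)
    return {
        pair: sorted(links)
        for pair, links in pair_to_links.items()
        if len(links) > 1
    }


# Hypothetical site links, for illustration only.
site_links = {
    "HQ-London": ["HQ", "London"],
    "Europe": ["HQ", "London", "Paris"],
    "HQ-Tokyo": ["HQ", "Tokyo"],
}

print(redundant_site_links(site_links))
# {('HQ', 'London'): ['Europe', 'HQ-London']}
```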
Inter-Site Change Notification
Replication of AD is always pulled and not pushed. Within a site, when a change occurs, a DC notifies other DCs of the change so that they can pull it. Between sites, change notification is not used by default; replication follows a schedule instead, with a minimum interval of 15 minutes.
This can be changed to work with Change Notification making inter-site replication much faster (but using more bandwidth as a consequence). It is recommended to only enable change notification on a link if it is a high speed link or a dedicated Exchange site.
To enable Change Notification, use adsiedit.msc and update the attribute called “Options” on the site link to a value of 1. You can find this object in the Configuration NC.
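The Options attribute is a bit field, so writing a plain 1 is only safe when no other bits are set. The sketch below shows the arithmetic for turning on the notification bit while preserving the rest. The flag values for two-way sync and compression are stated from memory of the site link option flags and should be treated as assumptions to verify before use.

```python
# Site link option bits. USE_NOTIFY = 1 matches the value given in the article;
# the other two values are assumptions to be verified against documentation.
USE_NOTIFY = 0x1
TWOWAY_SYNC = 0x2
DISABLE_COMPRESSION = 0x4


def enable_change_notification(current_options):
    """Return the new Options value with the notification bit set.

    current_options is None when the attribute is not set.
    """
    value = current_options or 0
    return value | USE_NOTIFY


print(enable_change_notification(None))                 # 1
print(enable_change_notification(DISABLE_COMPRESSION))  # 5
```

If the attribute is currently not set, the result is 1, as described above.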
Site links contain 0 or 1 sites
Site links are logical objects allowing DCs in remote sites to replicate. There must be 2 or more sites associated with a site link. The deletion of a site may require the manual clean-up of the respective site link.
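A quick way to find such leftovers is to count the distinct sites in each link. This is a minimal sketch over a hypothetical export of site links.

```python
def undersized_site_links(site_links):
    """Return the names of site links that contain fewer than two distinct sites."""
    return sorted(
        name for name, members in site_links.items() if len(set(members)) < 2
    )


# Hypothetical site links; "Old-Branch" lost a member when its site was deleted.
site_links = {
    "HQ-London": ["HQ", "London"],
    "Old-Branch": ["HQ"],
    "Empty": [],
}

print(undersized_site_links(site_links))  # ['Empty', 'Old-Branch']
```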
Disabled Connection Objects
This is uncommon but can be difficult to find. A connection object which is disabled will naturally not replicate.
Domain Controller Configuration
Forest Functional level not at 2003
If all of your DCs are 2003 or higher (should be at the time of publishing this…) then ensure that you raise the Forest Functional Level to 2003. This enables the following benefits:
- Renaming domain controllers
- LastLogonTimeStamp attribute
- Replicating group change deltas
- Renaming domains
- Cross forest trusts
- Improved KCC scalability
DCs not in the Domain Controllers OU
DCs should not be moved from the Domain Controllers OU or the Default Domain Controllers GPO won’t apply to them. This can cause replication failure. If you have to move a DC to a different OU (e.g. for delegation purposes), ensure that the Default Domain Controllers GPO is linked to the new OU.
AutoSiteCoverage Enabled on 2003 while RODCs exist
AutoSiteCoverage enables a DC to cover a site where no DCs exist by registering the relevant SRV records for the site in question. Windows 2003 DCs don’t recognise RODCs and if AutoSiteCoverage is enabled on these DCs, they will register their SRV records in this site. This will result in users authenticating to the 2003 DC even though an RODC exists in the site.
To resolve this, either disable AutoSiteCoverage on the 2003 DC or install the RODC Compatibility Pack on the 2003 DCs.
AutoSiteCoverage is a REG_DWORD value under HKLM\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters: 1 enables it, 0 disables it.
Metadata for old DCs Found
In the event that a DC has to be forcibly removed (dcpromo /forceremoval), such as when it has not replicated beyond the TSL, you will need to clean up the DC Metadata on the central DCs. Metadata includes elements such as the Computer Object, NTDS Settings, FRSMember object and DNS Records. Use ntdsutil to perform this:
Site has Universal Group Membership Caching enabled and has a GC
Universal Group Membership Caching is set at the site level and affects all DCs in the site. If one of the DCs is a GC, the remaining DCs will continue to cache Universal Group Membership, resulting in unpredictable authentication failures (dependent on which DC is chosen for authentication by the DC Locator process).
No GC in Site
In order to log on, a user account needs to be evaluated against Universal Group Membership, which is stored on GCs. A site without GCs can cause logon failure as a result. An alternative is to enable Universal Group Membership Caching so that a GC is not required in each site.
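The two GC-related conditions above can be checked together once you know, per site, how many GCs exist and whether Universal Group Membership Caching (UGMC) is enabled. The sketch below assumes that information has already been gathered; the data structure and site names are hypothetical.

```python
def audit_gc_and_caching(sites):
    """Return {site: finding} for sites that hit either GC-related condition.

    sites maps a site name to {"gc_count": int, "ugmc_enabled": bool}.
    """
    findings = {}
    for name, info in sites.items():
        has_gc = info["gc_count"] > 0
        if info["ugmc_enabled"] and has_gc:
            findings[name] = "UGMC enabled although a GC is present"
        elif not info["ugmc_enabled"] and not has_gc:
            findings[name] = "no GC and UGMC not enabled"
    return findings


# Hypothetical inventory, for illustration only.
sites = {
    "HQ": {"gc_count": 2, "ugmc_enabled": False},
    "London": {"gc_count": 1, "ugmc_enabled": True},
    "Paris": {"gc_count": 0, "ugmc_enabled": False},
    "Tokyo": {"gc_count": 0, "ugmc_enabled": True},
}

for site, finding in audit_gc_and_caching(sites).items():
    print(f"{site}: {finding}")
```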
Missing Subnets in AD
Sites consist of one or more subnets and allow clients to log on to a local Domain Controller quickly through the DC Locator Process. If the subnet definition is missing from AD, the client will log on to any generic DC, which may be on the other side of the world. You can easily find subnets not defined in AD by reviewing the Netlogon.log file in the %systemroot%\debug folder. You can look for all DCs with event 5778 using eventcomb and then selectively gather the various netlogon.log files.
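Once the netlogon.log files are collected, the unmapped clients can be summarised by network. The sketch below assumes the entries of interest contain the text NO_CLIENT_SITE followed by the client name and IPv4 address; the exact line layout varies between Windows versions, so the sample lines are illustrative only. Grouping by /24 is an arbitrary choice made for the summary, not something AD requires.

```python
import ipaddress
import re
from collections import Counter

# Assumed entry shape: "... NO_CLIENT_SITE: <client> <ipv4>"
NO_CLIENT_SITE = re.compile(
    r"NO_CLIENT_SITE:\s+(\S+)\s+(\d{1,3}(?:\.\d{1,3}){3})"
)


def unmapped_networks(log_lines, prefix_length=24):
    """Count NO_CLIENT_SITE entries per network of the given prefix length."""
    counts = Counter()
    for line in log_lines:
        match = NO_CLIENT_SITE.search(line)
        if not match:
            continue
        try:
            network = ipaddress.ip_network(
                f"{match.group(2)}/{prefix_length}", strict=False
            )
        except ValueError:
            continue  # skip malformed addresses
        counts[str(network)] += 1
    return counts


# Illustrative sample lines, not taken from a real log.
sample = [
    "03/14 09:15:02 CONTOSO: NO_CLIENT_SITE: WKS001 10.20.30.40",
    "03/14 09:15:05 [1234] CONTOSO: NO_CLIENT_SITE: WKS002 10.20.30.41",
    "03/14 09:16:11 CONTOSO: NO_CLIENT_SITE: WKS107 192.168.5.9",
    "03/14 09:17:00 some unrelated entry",
]

for network, hits in unmapped_networks(sample).most_common():
    print(network, hits)
```

The networks with the most hits are the best candidates for new subnet objects.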
Topology Clean-up Disabled
This option disables the automatic clean-up of unnecessary connection objects and replication links. To re-enable it, run:
repadmin /siteoptions HubServer1 -IS_TOPL_CLEANUP_DISABLED
Detect Stale Topology Disabled
This site option is used by the KCC Branch Office Mode, which tells the KCC to ignore failed replication and not to try to find a path around it. To set it on a branch site:
repadmin /siteoptions BranchServer1 +IS_TOPL_DETECT_STALE_DISABLED
This should not be enabled on Central or Hub Sites or replication failures can result. To undo this:
repadmin /siteoptions HubServer1 -IS_TOPL_DETECT_STALE_DISABLED
KCC Intra-Site Topology Disabled
If the KCC Intra-Site Topology is disabled, all replication connections need to be manually maintained, which carries a high administrative burden. This is not recommended; rather, allow the KCC to dynamically build the topology every 15 minutes. The option that disables it is set as follows:
repadmin /siteoptions HubServer1 +IS_AUTO_TOPOLOGY_DISABLED
For inter-site, you may choose to disable the KCC and create manual connection objects as follows:
repadmin /siteoptions HubServer1 +IS_INTER_SITE_AUTO_TOPOLOGY_DISABLED
Inbound Replication Disabled
Disabling inbound replication should only be used for testing and should be removed once complete. Leaving inbound replication disabled will eventually orphan the DC once the TSL has expired. To re-enable inbound replication, run the following (Note the + and – switches on the Repadmin options to confirm or negate the option):
repadmin /options site:Branch -DISABLE_INBOUND_REPL
Outbound Replication Disabled
Outbound replication is disabled automatically when a DC has not replicated within its tombstone lifetime (180 days). If it has been disabled manually, you need to re-enable it as follows:
repadmin /options site:Branch -DISABLE_OUTBOUND_REPL
If you’ve configured replication on a schedule on a site link, this schedule will be ignored if the “Ignore IP Schedules” option is set on the IP Container.
This is NOT the GUI equivalent of “Options = 1”, which enables inter-site change notification.
Topology Minimum Hops Disabled
By default, the KCC will create the intra-site replication topology so that no replication partner is more than 3 hops away. This 3 hop limit can be disabled as follows:
repadmin /siteoptions server1 +IS_TOPL_MIN_HOPS_DISABLED
To undo this, negate the option (-) as follows:
repadmin /siteoptions server1 -IS_TOPL_MIN_HOPS_DISABLED
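The /siteoptions switches used in the sections above set (+) or clear (-) individual bits in the options value of the site's NTDS Site Settings object. The sketch below models that arithmetic so a raw value can be decoded. The bit values are stated from memory of the site settings option flags and should be treated as assumptions to verify; the helper functions are hypothetical and do not call repadmin.

```python
# Assumed bit values for the site option flags discussed in this article.
SITE_OPTION_FLAGS = {
    0x01: "IS_AUTO_TOPOLOGY_DISABLED",
    0x02: "IS_TOPL_CLEANUP_DISABLED",
    0x04: "IS_TOPL_MIN_HOPS_DISABLED",
    0x08: "IS_TOPL_DETECT_STALE_DISABLED",
    0x10: "IS_INTER_SITE_AUTO_TOPOLOGY_DISABLED",
}
FLAG_BITS = {name: bit for bit, name in SITE_OPTION_FLAGS.items()}


def decode_site_options(value):
    """Return the names of the known flags that are set in value."""
    return [name for bit, name in SITE_OPTION_FLAGS.items() if value & bit]


def apply_switch(value, switch):
    """Apply a '+FLAG' (set) or '-FLAG' (clear) switch to an options value."""
    sign, name = switch[:1], switch[1:]
    if sign not in ("+", "-") or name not in FLAG_BITS:
        raise ValueError(f"unrecognised switch: {switch!r}")
    if sign == "+":
        return value | FLAG_BITS[name]
    return value & ~FLAG_BITS[name]


value = apply_switch(0, "+IS_TOPL_MIN_HOPS_DISABLED")
print(value, decode_site_options(value))  # 4 ['IS_TOPL_MIN_HOPS_DISABLED']
value = apply_switch(value, "-IS_TOPL_MIN_HOPS_DISABLED")
print(value, decode_site_options(value))  # 0 []
```

A switch without a leading + or - is rejected here, which mirrors the need to state explicitly whether a flag is being set or cleared.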
Non-default dSHeuristics Value
The dSHeuristics attribute modifies the behaviour of certain aspects of the Domain Controllers. Examples of behavioural changes include enabling anonymous LDAP operations. The dSHeuristics attribute is located at CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=<forest root domain>
The data is a Unicode string where each value may represent a different possible setting.
The default value is <not set>
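Because each character position in the string carries its own setting, reading a value means indexing into it and treating a missing or short string as the default. The sketch below illustrates this for the anonymous LDAP example mentioned above. It assumes that the seventh character controls anonymous operations and that "2" enables them, which is my understanding of the attribute layout and should be verified before relying on it.

```python
# Assumption: character 7 controls anonymous LDAP operations; "2" enables them.
ANON_OPS_POSITION = 7


def dsheuristics_char(value, position):
    """Return the character at the 1-based position, or '0' if absent.

    value is None when the attribute is not set.
    """
    if not value or len(value) < position:
        return "0"
    return value[position - 1]


def anonymous_ldap_enabled(value):
    """Return True when the assumed anonymous-operations character is '2'."""
    return dsheuristics_char(value, ANON_OPS_POSITION) == "2"


print(anonymous_ldap_enabled(None))       # False
print(anonymous_ldap_enabled("0000002"))  # True
print(anonymous_ldap_enabled("001"))      # False
```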
For more information on dSHeuristics:
Recycle bin deleted object lifetime
Without knowing the Recycle Bin Deleted Object Lifetime, it’s not possible to know if a deleted object will be recoverable. By default, the value is set to Null and it uses the value of the TombStone Lifetime instead. The TSL is also set to Null by default and if it remains null, it uses the hard coded value of 60 days (or 180 days if the forest was deployed on 2003 SP1 or above). If the value is changed, ensure it is longer than your backup schedule to avoid having to do authoritative restores on deleted objects.
The location of the TombStone Lifetime and the Deleted Object Lifetime are both at CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=<forest root domain> with the following Attribute Names:
TombStone Lifetime (TSL): tombstoneLifetime
Deleted Object Lifetime: msDS-DeletedObjectLifetime
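The fallback chain described above (Deleted Object Lifetime, then TombStone Lifetime, then the hard coded default) can be worked through as follows. This is a sketch of the arithmetic only: the function names are hypothetical, None stands for an attribute that is not set, and the hard coded default is passed in so that either 60 or 180 can be used as appropriate for the forest.

```python
def effective_lifetimes(tombstone_lifetime=None,
                        deleted_object_lifetime=None,
                        hard_coded_default=60):
    """Return (effective TSL, effective Deleted Object Lifetime) in days."""
    tsl = tombstone_lifetime if tombstone_lifetime is not None else hard_coded_default
    dol = deleted_object_lifetime if deleted_object_lifetime is not None else tsl
    return tsl, dol


def lifetime_exceeds_backup_interval(backup_interval_days, **attributes):
    """Return True when the effective Deleted Object Lifetime is longer than the backup interval."""
    _, dol = effective_lifetimes(**attributes)
    return dol > backup_interval_days


print(effective_lifetimes())                         # (60, 60)
print(effective_lifetimes(tombstone_lifetime=180))   # (180, 180)
print(effective_lifetimes(180, 30))                  # (180, 30)
print(lifetime_exceeds_backup_interval(7, tombstone_lifetime=180))  # True
```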
Preferred Bridgeheads Exclude NC
Suppose you disable BASL (Bridge All Site Links). On your central site you have a DC in DomA and a DC in DomB. You make the DC in DomA the Preferred Bridgehead for IP.
This will result in remote sites with DCs in DomB being unable to replicate. After the TSL expires you are going to end up with lingering objects even if you fix this problem. This will have highly undesirable implications.
Preferred Bridgehead Configuration
Bridgeheads are created automatically for each NC by the KCC/ISTG. Manually specifying a preferred bridgehead is not recommended.
If the preferred bridgehead becomes unavailable, replication will fail and no automated failover to a non-preferred Bridgehead will take place.
If you need to use preferred bridgeheads instead of random KCC/ISTG generated bridgeheads, ensure that for each NC, there are at least 2 servers defined in the site.
Single Preferred Bridgehead for the Domain
In a scenario where multiple DCs exist in the central site and only one DC is selected as the Preferred Bridgehead, this represents a single point of replication failure.
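The bridgehead conditions in the preceding sections come down to one rule: if a site uses preferred bridgeheads at all, every NC hosted in that site should be covered by at least two of them. The sketch below checks that rule over a hypothetical inventory; the server and NC names are made up for illustration.

```python
def bridgehead_gaps(site_ncs, preferred_bridgeheads, minimum=2):
    """Return {nc: preferred bridgehead count} for NCs below the minimum.

    site_ncs: the NCs hosted by DCs in the site.
    preferred_bridgeheads: {server: set of NCs that server hosts}.
    An empty mapping means the KCC/ISTG selects bridgeheads, so nothing is flagged.
    """
    if not preferred_bridgeheads:
        return {}
    gaps = {}
    for nc in sorted(site_ncs):
        count = sum(1 for ncs in preferred_bridgeheads.values() if nc in ncs)
        if count < minimum:
            gaps[nc] = count
    return gaps


# Hypothetical central site: DCs for DomA and DomB, one preferred bridgehead.
site_ncs = {"DC=DomA", "DC=DomB"}
preferred = {"DC1": {"DC=DomA"}}

print(bridgehead_gaps(site_ncs, preferred))
# {'DC=DomA': 1, 'DC=DomB': 0}
```

A count of 0 corresponds to an NC that is excluded entirely, and a count of 1 to a single point of replication failure.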
Manually created Inbound Replication Connections from an RODC
A manually created inbound replication connection from an RODC will result in failed replication as an RODC will never replicate outbound.
RODC’s lowest cost site link contains only one 2008 RWDC
The Filtered Attribute Set (FAS) is the definition of what an RODC may replicate (some attributes being filtered). The FAS is only enforced when the RODC replicates from a 2008 RWDC. If there is only 1 RWDC at the next hop and it fails, the RODC may replicate with a 2003 DC, including all attributes. It’s important to validate the site links, site link bridges and costs to ensure that there are at least 2 RWDCs each RODC can replicate from.
Multiple RODCs in a Site
RODCs cache users’ passwords. In the event of a disconnection from a RWDC, the users can log on using the password cached on the RODC.
In the event that there are multiple RODCs in the Site for the same domain, it is unpredictable which RODC will respond to an Authentication Request. Therefore, user logon experience will be equally unpredictable.
RWDC and RODC in the same Site
Typically, RODCs are placed in remote branch sites by themselves. In the event that there are both RWDCs and RODCs in a site, the user experience will change noticeably and unpredictably when the RWDC is unavailable. This is especially true during WAN outages where passwords are not cached.
Only one RWDC in a Domain
Although a single RWDC and many RODCs can exist in a domain, this is not recommended. RODCs can’t replicate outbound and in the event of failure of the RWDC an undesirable AD Restore would be required.