Sometimes, before trying to troubleshoot, you might want to restart all of the major components of AD. Copy the following into a small .cmd file and run it (requires PsExec). The command inside the for /F brackets must use straight single quotes. Commands joined with & are wrapped in cmd /c "..." so that both halves run on the remote DC, instead of the second half running locally:
for /F %%i in ('dsquery server -forest -o rdn') do psexec \\%%~i ipconfig /flushdns
for /F %%i in ('dsquery server -forest -o rdn') do psexec \\%%~i nbtstat -R
for /F %%i in ('dsquery server -forest -o rdn') do psexec \\%%~i nbtstat -RR
for /F %%i in ('dsquery server -forest -o rdn') do psexec \\%%~i ipconfig /registerdns
for /F %%i in ('dsquery server -forest -o rdn') do psexec \\%%~i netdiag /fix
for /F %%i in ('dsquery server -forest -o rdn') do psexec \\%%~i dcdiag /fix
for /F %%i in ('dsquery server -forest -o rdn') do psexec \\%%~i cmd /c "net stop netlogon & net start netlogon"
for /F %%i in ('dsquery server -forest -o rdn') do psexec \\%%~i cmd /c "net stop dhcp & net start dhcp"
for /F %%i in ('dsquery server -forest -o rdn') do psexec \\%%~i cmd /c "net stop dns & net start dns"
for /F %%i in ('dsquery server -forest -o rdn') do psexec \\%%~i repadmin /syncall
Incorrect site, subnet or connection object configuration can cause AD replication and authentication to fail. This article discusses some of the common reasons for these failures and best practices for keeping your sites healthy.
Site Links and Replication Connections
Site not contained in a Site Link
All sites need to be contained in at least one Site Link in order to replicate to other sites. Automatic Site Covering and DFS costing will also be affected.
Manual Connection Objects
If you leave the KCC to do its job, it will automatically create the necessary connection objects. However, any manually created connection object (INCLUDING an automatically created object that has been modified) will remain static. “Admin made it, so admin must know something I don’t know” is the general logic behind this. Only create manual connection objects if you know something the KCC doesn’t know. Don’t confuse a connection object with a site link.
If you are cleaning up the connection objects, don’t delete more than 10 connections at a time or a Version Vector Join (vv join) might be required to re-join the DC.
Connection Objects with non-default schedules
By default, connection objects will inherit their schedule based on the site link. However, they can be changed directly. Once you make a change to a connection object, it will no longer be managed by the KCC and will be treated as a manual connection object.
Redundant Site Links
If two site links contain the same two sites, a suboptimal replication topology may result.
Inter-Site Change Notification
AD replication is always pulled, never pushed. Within a site, when a change occurs, a DC notifies its replication partners of the change so that they can pull it. Between sites, change notification is not used by default. Instead, replication follows the site link schedule, with a minimum interval of 15 minutes.
This can be changed to work with Change Notification making inter-site replication much faster (but using more bandwidth as a consequence). It is recommended to only enable change notification on a link if it is a high speed link or a dedicated Exchange site.
To enable change notification, use adsiedit.msc to set the attribute called “Options” on the site link to a value of 1. You can find the site link objects in the Configuration NC under CN=IP,CN=Inter-Site Transports,CN=Sites.
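“Options” is a bit mask. If it already holds a value, typing 1 over it would clear the other bits. This minimal Python sketch only works out the number to enter in ADSI Edit. It assumes that bit 0x1 is the change-notification flag (NTDSSITELINK_OPT_USE_NOTIFY); the function names are my own.

```python
# Assumption: bit 0x1 of a site link's "options" attribute
# (NTDSSITELINK_OPT_USE_NOTIFY) turns on inter-site change notification.
USE_NOTIFY = 0x1


def enable_change_notification(options):
    """Return the options value with the notification bit set.

    options is the current attribute value, or None when ADSI Edit
    shows it as <not set>. Any other bits already present are kept.
    """
    return (options or 0) | USE_NOTIFY


def change_notification_enabled(options):
    """True if the notification bit is set in the options value."""
    return bool((options or 0) & USE_NOTIFY)


print(enable_change_notification(None))  # 1
print(enable_change_notification(4))     # 5: the existing 0x4 bit is preserved
```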
Site links contain 0 or 1 sites
Site links are logical objects allowing DCs in remote sites to replicate. There must be 2 or more sites associated with a site link. The deletion of a site may require the manual clean-up of the respective site link.
Disabled Connection Objects
This is uncommon but can be difficult to find. A connection object which is disabled will naturally not replicate.
Domain Controller Configuration
Forest Functional level not at 2003
If all of your DCs are 2003 or higher (should be at the time of publishing this…) then ensure that you raise the Forest Functional Level to 2003. This enables the following benefits:
- Renaming domain controllers
- LastLogonTimeStamp attribute
- Replicating group change deltas
- Renaming domains
- Cross forest trusts
- Improved KCC scalability
DCs not in the Domain Controllers OU
DCs should not be moved from the Domain Controllers OU or the Default Domain Controllers GPO won’t apply to them. This can cause replication failure. If you have to move a DC to a different OU (e.g. for delegation purposes), ensure that the Default Domain Controllers GPO is linked to the new OU.
AutoSiteCoverage Enabled on 2003 while RODCs exist
AutoSiteCoverage enables a DC to cover a site where no DCs exist by registering the relevant SRV records for the site in question. Windows 2003 DCs don’t recognise RODCs and if AutoSiteCoverage is enabled on these DCs, they will register their SRV records in this site. This will result in users authenticating to the 2003 DC even though an RODC exists in the site.
To resolve this, either disable AutoSiteCoverage on the 2003 DC or install the RODC Compatibility Pack on the 2003 DCs.
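The setting is a REG_DWORD called AutoSiteCoverage under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters (1 = enabled, 0 = disabled).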
Metadata for old DCs Found
In the event that a DC has to be forcibly removed (dcpromo /forceremoval), for example when it has not replicated for longer than the TSL, you will need to clean up the DC metadata on the central DCs. Metadata includes elements such as the computer object, NTDS Settings, FRS member object and DNS records. Use the metadata cleanup option of ntdsutil to perform this.
Site has Universal Group Membership Caching enabled and has a GC
Universal Group Membership Caching is set at the site level and affects all DCs in the site. If one of the DCs is a GC, the remaining DCs will continue to cache Universal Group Membership. This results in unpredictable authentication failures, depending on which DC the DC Locator process chooses for authentication.
No GC in Site
In order to log on, a user account needs to be evaluated against Universal Group Membership, which is stored on GCs. A site without a GC can therefore cause logon failures. Since Windows Server 2003, an alternative is to enable Universal Group Membership Caching so that a GC is not required in each site.
Missing Subnets in AD
Sites consist of one or more subnets and allow clients to log on to a local Domain Controller quickly through the DC Locator process. If the subnet definition is missing from AD, the client will log on to any DC, which may be on the other side of the world. You can easily find subnets not defined in AD by looking for NO_CLIENT_SITE entries in the Netlogon.log file in the %systemroot%\debug folder. You can use EventCombMT to find all DCs logging event 5778, and then selectively gather their netlogon.log files.
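Once you have gathered the netlogon.log files, a short Python sketch can summarise the NO_CLIENT_SITE entries per client IP address and per /24 network. This gives you a starting point for the missing subnet definitions. Two things here are assumptions: the log line format in the sample (it may vary between Windows versions) and the /24 grouping. Confirm the real subnet sizes with your network team.

```python
import ipaddress
import re
from collections import Counter

# Assumed netlogon.log line shape:
#   06/18 12:34:56 CONTOSO: NO_CLIENT_SITE: WS01 192.168.5.20
NO_CLIENT_SITE = re.compile(r"NO_CLIENT_SITE:\s+(\S+)\s+(\S+)")


def unmapped_clients(lines):
    """Count NO_CLIENT_SITE entries per client IP address."""
    hits = Counter()
    for line in lines:
        match = NO_CLIENT_SITE.search(line)
        if match:
            hits[match.group(2)] += 1
    return hits


def by_network(ip_counts, prefix=24):
    """Roll the per-IP counts up into networks of the given prefix length."""
    networks = Counter()
    for ip, count in ip_counts.items():
        net = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
        networks[str(net)] += count
    return networks


sample = [
    "06/18 12:34:56 CONTOSO: NO_CLIENT_SITE: WS01 192.168.5.20",
    "06/18 12:35:10 CONTOSO: NO_CLIENT_SITE: WS02 192.168.5.21",
    "06/18 12:40:02 CONTOSO: NO_CLIENT_SITE: WS01 192.168.5.20",
    "06/18 12:41:00 CONTOSO: DsGetDcName function called",  # ignored
]
clients = unmapped_clients(sample)
print(clients.most_common())
print(by_network(clients))
```

To use it on a real log, pass the lines of the copied file, for example open(path, errors="replace"), to unmapped_clients.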
Topology Clean-up Disabled
This option disables the automatic clean-up of unnecessary connection objects and replication links. To re-enable it, run:
repadmin /siteoptions HubServer1 -IS_TOPL_CLEANUP_DISABLED
Detect Stale Topology Disabled
This site option is used by the KCC Branch Office Mode. It tells the KCC to ignore failed replication and not to try to find a path around it. It is set on branch sites as follows:
repadmin /siteoptions BranchServer1 +IS_TOPL_DETECT_STALE_DISABLED
This should not be enabled on central or hub sites, or replication failures can result. To undo this:
repadmin /siteoptions HubServer1 -IS_TOPL_DETECT_STALE_DISABLED
KCC Intra-Site Topology Disabled
If the KCC intra-site topology is disabled, all replication connections need to be maintained manually, which is a high administrative burden. This is not recommended. Rather, allow the KCC to build the topology dynamically every 15 minutes. To re-enable it, clear the option:
repadmin /siteoptions HubServer1 -IS_AUTO_TOPOLOGY_DISABLED
For inter-site, you may choose to disable the KCC and create manual connection objects as follows:
repadmin /siteoptions HubServer1 +IS_INTER_SITE_AUTO_TOPOLOGY_DISABLED
Inbound Replication Disabled
Disabling inbound replication should only be used for testing, and the setting should be removed once testing is complete. Leaving inbound replication disabled will eventually orphan the DC once the TSL has expired. To re-enable inbound replication, run the following (note the + and - prefixes on the repadmin options: + sets an option and - clears it):
repadmin /options site:Branch -DISABLE_INBOUND_REPL
Outbound Replication Disabled
Outbound replication is disabled automatically when a DC has not replicated within its tombstone lifetime (180 days). If it has been disabled manually, you need to re-enable it as follows:
repadmin /options site:Branch -DISABLE_OUTBOUND_REPL
If you’ve configured a replication schedule on a site link, the schedule will be ignored if the “Ignore IP Schedules” option is set on the IP container.
This option is NOT the GUI equivalent of “Options = 1” on a site link, which enables inter-site change notification.
Topology Minimum Hops Disabled
By default, the KCC builds the intra-site replication topology so that no replication partner is more than 3 hops away. This 3-hop limit can be disabled as follows:
repadmin /siteoptions server1 +IS_TOPL_MIN_HOPS_DISABLED
To undo this, negate the option (-) as follows:
repadmin /siteoptions server1 -IS_TOPL_MIN_HOPS_DISABLED
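The hop limit matters because the KCC builds each site’s intra-site topology as a bidirectional ring. The Python sketch below models only a plain ring, not any of the KCC’s other logic. It shows that in a ring of more than 7 DCs, some partners end up more than 3 hops apart. That is why the KCC adds extra optimizing connections in larger sites unless IS_TOPL_MIN_HOPS_DISABLED is set.

```python
from collections import deque


def max_hops_in_ring(n):
    """Largest shortest-path distance between any two DCs in a two-way ring of n DCs."""
    neighbours = {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}
    worst = 0
    for start in range(n):
        # Breadth-first search from each DC gives its hop count to every other DC
        dist = {start: 0}
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbours[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        worst = max(worst, max(dist.values()))
    return worst


for n in (3, 7, 8, 12):
    print(n, max_hops_in_ring(n))  # 7 DCs -> 3 hops, 8 DCs -> 4 hops
```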
Non-default dSHeuristics Value
The dSHeuristics attribute modifies the behaviour of certain aspects of the Domain Controllers. One example of a behavioural change is enabling anonymous LDAP operations. The dSHeuristics attribute is located at CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=<forest root domain>
The data is a Unicode string in which each character position controls a different setting.
The default value is <not set>
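Because the meaning depends on character position, reading the value by eye is error-prone. The Python sketch below is based on my understanding of Microsoft’s MS-ADTS specification, and both points should be verified there. The 7th character set to 2 permits anonymous LDAP operations. Once the string is long enough, the 10th, 20th, 30th... characters must hold 1, 2, 3... as check characters. The function names are my own.

```python
def heuristic(ds_heuristics, position):
    """Character at a 1-based position, or '0' if the string is unset or too short."""
    if ds_heuristics and len(ds_heuristics) >= position:
        return ds_heuristics[position - 1]
    return "0"


def check_characters_ok(ds_heuristics):
    """Positions 10, 20, 30... must hold '1', '2', '3'... when present.

    Only positions below 100 are checked; real values are far shorter.
    """
    if not ds_heuristics:
        return True
    for pos in range(10, min(len(ds_heuristics), 99) + 1, 10):
        if ds_heuristics[pos - 1] != str(pos // 10):
            return False
    return True


def anonymous_ldap_enabled(ds_heuristics):
    """Assumed: a '2' in position 7 permits anonymous LDAP operations."""
    return heuristic(ds_heuristics, 7) == "2"


print(anonymous_ldap_enabled(None))          # False: <not set> is the default
print(anonymous_ldap_enabled("0000002"))     # True
print(check_characters_ok("0000002000"))     # False: 10th character should be '1'
```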
Recycle bin deleted object lifetime
Without knowing the Recycle Bin Deleted Object Lifetime, it’s not possible to know whether a deleted object will be recoverable. By default, the value is not set (null) and the tombstone lifetime is used instead. The TSL is also null by default. If it remains null, the hard-coded value of 60 days is used (or 180 days if the forest was deployed on 2003 SP1 or above). If you change the value, ensure it is longer than your backup interval to avoid having to do authoritative restores of deleted objects.
The location of the TombStone Lifetime and the Deleted Object Lifetime are both at CN=Directory Service,CN=Windows NT,CN=Services,CN=Configuration,DC=<forest root domain> with the following Attribute Names:
TombStone Lifetime (TSL): tombstoneLifetime
Deleted Object Lifetime: msDS-DeletedObjectLifetime
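The fallback rules above can be expressed as a small Python function that works out which lifetimes are in force. It follows the rules exactly as described in this section. None stands for an attribute that is not set, and the forest_started_on_2003_sp1_or_later parameter is a hypothetical input that you would fill in yourself.

```python
def effective_lifetimes(tombstone_lifetime=None, deleted_object_lifetime=None,
                        forest_started_on_2003_sp1_or_later=True):
    """Return (TSL days, Recycle Bin deleted object lifetime days).

    tombstoneLifetime and msDS-DeletedObjectLifetime are passed in as
    ints, or None when the attribute is not set.
    """
    if tombstone_lifetime is None:
        # Hard-coded fallback described above
        tsl = 180 if forest_started_on_2003_sp1_or_later else 60
    else:
        tsl = tombstone_lifetime
    # A null deleted object lifetime falls back to the TSL
    dol = tsl if deleted_object_lifetime is None else deleted_object_lifetime
    return tsl, dol


print(effective_lifetimes())                                             # (180, 180)
print(effective_lifetimes(forest_started_on_2003_sp1_or_later=False))    # (60, 60)
print(effective_lifetimes(deleted_object_lifetime=365))                  # (180, 365)
```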
Preferred Bridgeheads Exclude NC
Suppose you disable BASL (Bridge All Site Links). On your central site you have a DC in DomA and a DC in DomB. You make the DC in DomA the Preferred Bridgehead for IP.
This will result in remote sites with DCs in DomB being unable to replicate. After the TSL expires you are going to end up with lingering objects even if you fix this problem. This will have highly undesirable implications.
Preferred Bridgehead Configuration
Bridgeheads are selected automatically for each NC by the KCC/ISTG. Manually specifying a preferred bridgehead is not recommended.
If the preferred bridgehead becomes unavailable, replication will fail and no automated failover to a non-preferred Bridgehead will take place.
If you need to use preferred bridgeheads instead of random KCC/ISTG generated bridgeheads, ensure that for each NC, there are at least 2 servers defined in the site.
Single Preferred Bridgehead for the Domain
In a scenario where multiple DCs exist in the central site and only one DC is selected as the Preferred Bridgehead, this represents a single point of replication failure.
Manually created Inbound Replication Connections from an RODC
A manually created inbound replication connection from an RODC will result in failed replication as an RODC will never replicate outbound.
RODC’s lowest cost site link contains only one 2008 RWDC
The Filtered Attribute Set (FAS) defines the attributes that an RODC may not replicate. The FAS is only honoured when the RODC replicates from a 2008 RWDC. If there is only one 2008 RWDC at the next hop and it fails, the RODC may replicate from a 2003 DC, which sends all attributes. It’s important to validate the site links, site link bridges and costs to ensure that there are at least 2 RWDCs each RODC can replicate from.
Multiple RODCs in a Site
RODCs cache users’ passwords. If the RODC loses its connection to an RWDC, users can log on using the password cached on the RODC.
In the event that there are multiple RODCs in the Site for the same domain, it is unpredictable which RODC will respond to an Authentication Request. Therefore, user logon experience will be equally unpredictable.
RWDC and RODC in the same Site
Typically, RODCs are placed in remote branch sites by themselves. In the event that there are both RWDCs and RODCs, there will be a noticeable and unpredictable user experience in the event of the RWDC being unavailable. This is especially true during WAN outages where passwords are not cached.
Only one RWDC in a Domain
Although a single RWDC and many RODCs can exist in a domain, this is not recommended. RODCs can’t replicate outbound and in the event of failure of the RWDC an undesirable AD Restore would be required.
What are Lingering Objects?
Objects that exist on 1 or more DCs but not on others (how bizarre, the DCs are supposed to replicate all objects aren’t they?)
How does it happen?
Well, when you delete an object from AD, it is removed from general visibility and is marked with a Tombstone Flag. This flag is replicated to all DCs and sure enough, the object is removed from visibility on all DCs after full propagation.
After a period of time, the garbage collection process hard-deletes these objects. This period is the Tombstone Lifetime [TSL]: 60 days by default for forests that started with W2K, and 180 days by default if the first DC in the forest was W2K3 SP1.
Ok, scenario – what happens if a DC was unavailable for the period of the tombstone life and after this period, comes back online. Remember, it won’t receive the tombstone. Dadaaahhhhhh, it has an object that no other DC has. The object seems to be lingering about when it was supposed to be deleted.
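To tell whether a DC falls into this scenario, compare the time since its last successful inbound replication with the TSL. Here is a minimal Python sketch. It assumes you have already read the last-success date (for example from repadmin /showrepl) and know the TSL in days; the function name is my own.

```python
from datetime import date


def beyond_tombstone_lifetime(last_successful_replication, today, tsl_days):
    """True if the DC has been out of replication for longer than the TSL.

    Both dates are datetime.date values; tsl_days is the forest's
    tombstone lifetime (60 or 180 days by default).
    """
    return (today - last_successful_replication).days > tsl_days


# A DC that last replicated on 1 January 2024 and comes back on 15 July 2024
print(beyond_tombstone_lifetime(date(2024, 1, 1), date(2024, 7, 15), 180))  # True (196 days)
```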
Other ways this can happen include a System State restore older than the TSL, promoting a DC using Install From Media (IFM) with media older than the TSL, and significant time changes.
How do I prevent it?
Getting a new job isn’t a bad idea. If this isn’t possible, then enable Strict Replication.
Suppose there is a lingering object on a DC that’s been offline for longer than the TSL. Bring it online. We know it’s going to have a few lingering objects. These objects are only replicated when there is a change. Suppose you change any attribute on one of these objects. This will trigger replication of that object to all DCs! The object is then “reanimated” on the other DCs.
By enabling Strict Replication Consistency, a DC won’t accept an update to an object that doesn’t exist in its copy of the naming context.
NOTE: Before enabling strict replication, ensure that all lingering objects have been cleaned up from the forest or you may have some significant replication issues.
The setting is stored in the registry as the Strict Replication Consistency entry under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters. It should be a REG_DWORD with a value of 1 to enable it.
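Here is a toy Python model of the difference the setting makes when a partner sends an update for an object the DC doesn’t hold. It is a simplification for illustration only. The function name and the GUID-to-attributes dictionary are hypothetical, and real DCs also log events and track replication metadata.

```python
def apply_inbound_update(local_objects, object_guid, attributes, strict):
    """Toy model of how a DC handles an inbound update.

    local_objects maps object GUID -> attribute dict on this DC.
    Returns a short status string.
    """
    if object_guid in local_objects:
        local_objects[object_guid].update(attributes)
        return "updated"
    if strict:
        # Strict consistency: refuse the update; replication from that
        # partner stops until the lingering object is dealt with
        return "blocked"
    # Loose consistency: fetch the whole object and recreate it locally
    local_objects[object_guid] = dict(attributes)
    return "reanimated"


dc_objects = {}
print(apply_inbound_update(dc_objects, "guid-1", {"description": "x"}, strict=True))   # blocked
print(apply_inbound_update(dc_objects, "guid-1", {"description": "x"}, strict=False))  # reanimated
```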
Another consideration: are you using virtualization snapshot software? A snapshot is NOT A BACKUP of Active Directory, because it doesn’t take the role of the InvocationID into consideration, so USN rollback can happen.
What is USN Rollback?
This is somewhat related to plastic surgery which allows someone with too much money to roll back time. Except, that someone is a DC in the hands of an incompetent IT Admin.
- InvocationID – ID of a DC
- USN – Number representing the last change that occurred originating from this DC
So, DC1 makes a change to User 1 and registers USN (change number) 5001. DC2 replicates this change. DC2 will now only replicate changes above DC1:5001 because it is already up-to-date to 5001.
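Suppose that you “restore” DC1 from a snapshot taken when the USN was at 4500. Question: why does replication not take place when you create a new object on DC1? DC1 hands out USNs 4501, 4502 and so on again. Because its InvocationID has not changed, DC2 believes it already has every change from DC1 up to USN 5001, so it ignores them.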
You’ve got it! That’s USN Rollback.
Now, you have a new object on the restored DC not replicating to the other DCs. It’s a Lingering object.
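The USN rollback sequence above can be replayed in a toy Python model. The class names and the single high-water mark per InvocationID are simplifications of my own. Real AD also uses per-attribute metadata and an up-to-dateness vector, which this sketch ignores.

```python
import copy


class DomainController:
    """Toy DC: every originating write gets the next USN."""

    def __init__(self, invocation_id):
        self.invocation_id = invocation_id
        self.usn = 0
        self.changes = {}  # USN -> change

    def write(self, change):
        self.usn += 1
        self.changes[self.usn] = change


class ReplicationPartner:
    """Toy partner remembering the highest USN seen per InvocationID."""

    def __init__(self):
        self.high_water = {}  # InvocationID -> highest USN already applied
        self.applied = []

    def pull(self, source):
        seen = self.high_water.get(source.invocation_id, 0)
        for usn in sorted(source.changes):
            if usn > seen:
                self.applied.append(source.changes[usn])
        self.high_water[source.invocation_id] = max(seen, source.usn)


dc1 = DomainController("DC1-ID")
dc2 = ReplicationPartner()

for i in range(4500):
    dc1.write(f"change {i + 1}")
snapshot = copy.deepcopy(dc1)       # hypervisor snapshot taken at USN 4500
for i in range(4500, 5001):
    dc1.write(f"change {i + 1}")
dc2.pull(dc1)                       # DC2 is now up to date to DC1:5001

dc1 = snapshot                      # "restore" the snapshot: USN back to 4500
dc1.write("new user object")        # reuses USN 4501, same InvocationID
before = len(dc2.applied)
dc2.pull(dc1)
print(len(dc2.applied) - before)    # 0: DC2 never sees the new object
```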
So, when your VMware admin says he’s found a new way to quickly back up AD, you’ll need to educate him. Snapshots are NOT backups.
Ok, how do I find Lingering Objects?
Use repadmin /removelingeringobjects with the /advisory_mode switch to find these objects:
repadmin /removelingeringobjects <DestinationDC> <SourceDC DSA GUID> <NC> /advisory_mode
You need a good source DC to start this. Basically, we check whether the destination DC holds any objects that don’t exist on our source DC. An event 1946 will be created for each lingering object identified.
And to remove them?
Basically, the same command as above, except don’t use the /Advisory_Mode switch. Event 1945 will be created for each object removed.
Using repadmin /removelingeringobjects can be a nightmare in large organizations with many DCs. You need to run it from a source DC against each other writable DC individually. Then you start again with a new source, until each DC has been used as a source against all other DCs, i.e. N × 2(N-1) commands. So, if you have 500 DCs, this is, erm, one moment, can it be 499,000 commands?
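For the arithmetic, here is a small Python sketch. It assumes the factor of 2 in N × 2(N-1) covers one /advisory_mode run plus one removal run for every ordered source/target pair. The count is per naming context, and the function name is my own.

```python
def removelingeringobjects_runs(dc_count, runs_per_pair=2):
    """repadmin invocations needed to check every DC against every other DC.

    Each ordered (source, target) pair is visited once; runs_per_pair=2
    assumes an advisory pass and a removal pass per pair.
    """
    return dc_count * runs_per_pair * (dc_count - 1)


print(removelingeringobjects_runs(500))                   # 499000
print(removelingeringobjects_runs(500, runs_per_pair=1))  # 249500
```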
Isn’t there an easier way?
Yes, have a look at repldiag.exe. It automates the removal of lingering objects but requires connectivity to all DCs. It also supports a test-first, run-later methodology, and you only need to run it once. You can find repldiag on CodePlex.
Simply [disclaimer goes here] run it in test mode first:
repldiag /removelingeringobjects /advisorymode
Then run it for real by dropping the /advisorymode switch:
repldiag /removelingeringobjects
I hope this demonstrates the importance of keeping an eye on replication, and of reviewing your backup strategy. If you can’t manage replication consistently to remote branch offices, you are better off not placing a DC there.
There are a large number of posts on the internet about setting up or troubleshooting time in an Active Directory forest. This blog aims to shed some light on the key principles, configuration and useful resources for time in an Active Directory forest.
A common symptom is dcdiag reporting “Warning: SERVERNAME is not advertising as a time server”, or errors related to the server not advertising itself as a Domain Controller. The TIMESERV flag will not be set for that DC if there are any issues with the Windows Time Service.
Firstly, do you really need accurate time? Not really. In fact, Microsoft don’t even support high time accuracy. http://blogs.technet.com/askds/archive/2007/10/23/high-accuracy-w32time-requirements.aspx
However, it is a really nice feature to have all machines on the network showing the correct time. Naturally, there are some instances where accurate time is absolutely necessary, e.g. banking, time logging applications and transport systems. The stratum is the number of hops between a computer’s clock and a reference clock. The lower the value, the higher the expected accuracy, with stratum 1 (a server directly attached to a reference clock) being the most accurate.
So, your time can be a year off and your forest will work fine. PLEASE don’t test this. USN rollback or Certificate expiration may occur if you experiment with this which is a topic for another day…
Next question – do the clocks need to be in sync within the Forest? Yes they do, give or take 5 minutes in order to ensure you don’t break Kerberos as per RFC 1510.
So, it is more important to ensure that the clocks are in sync than to ensure accurate time, although accurate time is nice.
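To make the 5-minute rule concrete, here is a small Python sketch of the comparison. Kerberos compares absolute (UTC) times, so two clocks set to different time zones can still be in sync. The 5-minute figure is the default Windows Kerberos tolerance, and the function name is my own.

```python
from datetime import datetime, timedelta, timezone

KERBEROS_MAX_SKEW = timedelta(minutes=5)  # default Windows Kerberos tolerance


def within_kerberos_skew(client_time, dc_time, max_skew=KERBEROS_MAX_SKEW):
    """True if two timezone-aware datetimes differ by no more than max_skew."""
    return abs(client_time - dc_time) <= max_skew


dc = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(within_kerberos_skew(dc + timedelta(minutes=4), dc))  # True
print(within_kerberos_skew(dc + timedelta(minutes=6), dc))  # False
# 14:00 at UTC+2 is the same instant as 12:00 UTC, so the skew is zero
print(within_kerberos_skew(datetime(2024, 1, 1, 14, 0, tzinfo=timezone(timedelta(hours=2))), dc))
```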
Windows 2000 used SNTP (Simple Network Time Protocol) as the protocol and Net Time as the management tool. This is pretty much outdated, although some Domain Admins still use the legacy mechanisms (net time /setsntp….) to try to configure 2003/2008/2008 R2 Domain Controllers.
From Windows Server 2003 onwards, NTP is used as the protocol (uses Coordinated Universal Time [UTC] agnostic of time zones) on UDP port 123. It uses the Windows Time Service (w32time.dll) to manage time which is in turn configured via the w32tm.exe command line tool.
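To show what actually travels over UDP port 123, here is a bare-bones SNTP client sketched in Python. It is not a replacement for the Windows Time Service, because it ignores round-trip delay and does no filtering. It can, however, help confirm that UDP 123 is open from a given server. The function names are my own. The request layout (a client-mode, version 3 header) and the 1900-based timestamp follow the NTP packet format.

```python
import socket
import struct

NTP_TO_UNIX = 2208988800  # seconds from 1900-01-01 (NTP epoch) to 1970-01-01


def build_request():
    """48-byte client request: leap indicator 0, version 3, mode 3 (client)."""
    return b"\x1b" + 47 * b"\0"


def transmit_time(packet):
    """Server transmit timestamp (bytes 40-47) converted to Unix seconds."""
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_TO_UNIX + fraction / 2**32


def query(server, timeout=5.0):
    """Ask an NTP server for its time over UDP port 123."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_request(), (server, 123))
        data, _ = sock.recvfrom(48)
    return transmit_time(data)
```

For example, comparing query("pool.ntp.org") with time.time() on the PDC Emulator gives a rough idea of its offset, provided outbound UDP 123 is allowed.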
There’s a fundamental difference between w32tm and Net Time command lines. Net time only queries the time of the remote computer using the multi-functional net.exe utility while w32tm specifically queries the Windows Time Service. However, net time is still useful as it can be used to query remote devices which don’t run the Windows Time Service or NTP.
Please note that the command “Net Time” with no switches will query the time of a Domain Controller and not the local machine. This can be overridden with the \\computername switch. To query the time in a different domain, use the /domain:domainname switch. Although the “Net Time” command queries remote computers, the /SET switch only sets the time of the local machine (not the \\computername machine).
When setting up the forest, the best practice is to have the forest root PDC Emulator retrieve its time from a reliable time source (a manual NTP server). All other DCs retrieve their time through the domain hierarchy (DomHier), which leads back to the PDC Emulator in the forest root. Clients retrieve their time from any Domain Controller advertising as a time server.
Let’s look at the steps involved in setting up the PDC Emulator in the forest root domain, on the assumption that you would like it to synchronise its time with a remote reliable NTP server.
PDC Emulator Configuration (Forest Root Domain)
Before starting any configuration, you need to make sure that you can access an external reliable NTP server. If you are struggling to find one, a pool of load-balanced NTP servers is available on the Internet through the NTP Pool project. The project has servers close to you, which give marginally higher accuracy because of reduced round-trip variation. Have a look at http://www.pool.ntp.org to find an NTP pool near you. Remember that your PDC Emulator will need UDP port 123 access to the desired Internet-based NTP server.
Next, find the PDC Emulator. You can find the PDC Emulator for the domain using the “netdom query fsmo” command on any domain controller.
On the PDC Emulator, let’s first clear all the w32tm configuration. This lets us start afresh without worrying about previous, potentially inaccurate configurations. This is optional, but something I usually do to ensure that I am aware of every configuration entry I make. To do this:
Wait a minute or two
Now, to configure the PDC Emulator, run the following:
w32tm /config /manualpeerlist:pool.ntp.org,0x1 /syncfromflags:manual /update
Note: The 0x1 flag tells the Windows Time Service to poll this peer at the interval set by SpecialPollInterval, rather than using its dynamic polling interval.
/syncfromflags:manual tells the PDC Emulator to take its time from the external NTP servers in the manual peer list, and not from the domain.
Remember to restart the Windows Time Service after each configuration change. Use the following command to restart it:
net stop w32time & net start w32time
Once you have done this, you can verify these settings in the Registry in the following location:
You can also use w32tm to check the new configuration:
W32tm /query /configuration
ONLY the PDC Emulator of the Forest Root Domain should have the Type configured as NTP. All other machines in the domain should have this entry set to NT5DS in order to obtain their time from the Domain and not external NTP servers.
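To check this across many machines, you could capture the output of w32tm /query /configuration from each one and compare the Type value. Here is a minimal Python sketch. It assumes the output contains a line such as “Type: NT5DS (Local)”, so verify that against your own output first. The function names are my own.

```python
import re

# Assumed shape of the relevant line in `w32tm /query /configuration` output
TYPE_LINE = re.compile(r"^[ \t]*Type:[ \t]*(\S+)", re.MULTILINE)


def time_source_type(query_output):
    """Return the Type value (e.g. 'NTP' or 'NT5DS'), or None if absent."""
    match = TYPE_LINE.search(query_output)
    return match.group(1) if match else None


def type_is_expected(query_output, is_forest_root_pdc):
    """Only the forest root PDC Emulator should report NTP; everything else NT5DS."""
    expected = "NTP" if is_forest_root_pdc else "NT5DS"
    return time_source_type(query_output) == expected


sample = """[TimeProviders]

NtpClient (Local)
Enabled: 1 (Local)
Type: NT5DS (Local)
"""
print(time_source_type(sample))         # NT5DS
print(type_is_expected(sample, False))  # True
```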
You now need to inform the server to get out there and find what the time is supposed to be using NTP. Use the following command to do this:
W32tm /resync /rediscover
At any time, you can use the following command to monitor the server which is really great for troubleshooting:
You can also check the status of the server as follows:
W32tm /query /status
The following two registry entries specify the maximum time correction, in seconds, that the DC will accept from its peers:
MaxPosPhaseCorrection (default – 172800 seconds)
MaxNegPhaseCorrection (default – 172800 seconds)
Although Microsoft recommends changing these to 900 seconds, others recommend reducing them to 300 seconds to stay in line with the 300-second (5-minute) Kerberos tolerance. Use your discretion here; I always use 300 seconds. The default is 2 days (172800 decimal). If you are 2 days out, it might be the weekend and you are still working…
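Here is a small Python sketch of how these two limits gate a correction. A positive offset means the local clock has to move forward (MaxPosPhaseCorrection), and a negative one means it has to move back (MaxNegPhaseCorrection). The 0xFFFFFFFF special value, which means the service always applies the correction, is from my understanding of the Windows Time Service documentation. The function name is my own.

```python
ACCEPT_ANY = 0xFFFFFFFF  # special value: apply a correction of any size


def correction_accepted(offset_seconds, max_pos=172800, max_neg=172800):
    """Would w32time apply a correction of offset_seconds?

    Positive offset: the local clock must move forward (MaxPosPhaseCorrection).
    Negative offset: the local clock must move back (MaxNegPhaseCorrection).
    """
    if offset_seconds >= 0:
        return max_pos == ACCEPT_ANY or offset_seconds <= max_pos
    return max_neg == ACCEPT_ANY or -offset_seconds <= max_neg


print(correction_accepted(600))               # True with the 2-day defaults
print(correction_accepted(600, max_pos=300))  # False once limited to 300 seconds
```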
Note: If your DC is having difficulty based on any of the above steps, ensure that there are no GPO Time Settings applying to the Domain Controller. You can find this using Resultant Set of Policy in the following GPO Settings path:
Computer Configuration > Administrative Templates > System > Windows Time Service
Client and additional Domain Controller Configuration
On the assumption that no GPO configuration settings have been applied, the clients should work fine under normal circumstances.
All client devices within the domain should receive their time from the domain. To manually tell a client to do this, run the following:
w32tm /config /syncfromflags:domhier /update
This can also be done using Group Policy here:
Computer Configuration > Administrative Templates > System > Windows Time Service
Once you have done this, you can verify these settings on the client in the Registry in the following location: