3/20/2014 6 Tips for Troubleshooting Active Directory -- Redmondmag.com

http://redmondmag.com/Articles/2009/07/01/6-Tips-for-Troubleshooting-Active-Directory.aspx?p=1 1/6

By Gary Olsen

IN-DEPTH

6 Tips for Troubleshooting Active Directory

A few hints you might not be familiar with can help you spot and fix problems in AD.

07/01/2009

I've made a living troubleshooting Active Directory since the Windows 2000 beta, and it's been an interesting ride. While much of my AD work has been smooth, the lack of troubleshooting resources has thrown in a few bumps. Troubleshooting is rarely covered in any standard training course, and administrators are usually left at the mercies of Microsoft's support site, Google, EventID.net and other similar sites. While these resources provide a wealth of data, you really need to do a lot of preliminary work and then search for specific problems on the Web.

Along my journey, I've found a few shortcuts. The following are my top six AD troubleshooting tips. The best thing about these tips is that they're free. That's right: There's no need to buy third-party tools. All the tools noted here are either native to Windows 2003 or in Support Tools or the Resource Kit, and they're surprisingly powerful. These tips are loosely ordered from the most important to the ones that have a narrower focus.

Tip 1: Determining DNS Health

The first thing we want to determine when assessing AD's overall health is DNS. Failing DNS can cause problems such as client authentication failures, application failures, Exchange failures with e-mail or GAL lookups, LDAP query failures, replication failures ... you get the picture. DNS is critical. There's a very powerful option for DCDiag.exe: C:\DCdiag /Test:DNS /e /v can be redirected to a file. The /e option indicates the test will be run on all DNS servers and /v is for verbose output. In a large environment, this may take a while to run, but it's worth the wait. I always read this starting at the bottom of the report, which is a table like that shown in Table 1. DCdiag runs six different tests: Authentication (Auth), Basic Connectivity (Basc), Forwarders (Forw), Delegation (Del), Dynamic registration enabled (Dyn) and Resource Record registration (RReg). The table also lists the External (Ext) test (connection to the Internet), but this command doesn't perform that test.

                           Auth  Basc  Forw  Del   Dyn   RReg  Ext

Domain: Corp.net
  Corp-DC02                PASS  WARN  n/a   n/a   PASS  FAIL  n/a
  Corp-DC03                PASS  WARN  n/a   n/a   PASS  FAIL  n/a
  Corp-DC01                PASS  WARN  n/a   n/a   PASS  FAIL  n/a

Domain: EMEA.Corp.net
  EMEA-DC03                FAIL  FAIL  n/a   n/a   n/a   n/a   n/a
  EMEA-DC02                FAIL  FAIL  n/a   n/a   n/a   n/a   n/a
  EMEA-DC01                FAIL  FAIL  n/a   n/a   n/a   n/a   n/a

Domain: Americas.Corp.net
  AM-DC10                  PASS  PASS  PASS  PASS  PASS  FAIL  n/a
  AM-DC11                  PASS  PASS  PASS  PASS  PASS  FAIL  n/a
  AM-DC12                  PASS  PASS  PASS  PASS  PASS  FAIL  n/a
  AM-DC13                  PASS  PASS  PASS  PASS  PASS  FAIL  n/a

Table 1. Enterprise DNS Infrastructure test results.

In the sample output in Table 1, every DNS server -- which is usually also every domain controller (DC) -- in the forest is listed by domain. The cool thing about that is that it shows the domain configuration of the forest, which is very handy if you're a consultant or support engineer and not familiar with the environment. In reading the data in the table, the results are:

PASS: The DNS server passed this particular test.
FAIL: The DNS server failed the test.
N/A: The test was not run. This is usually due to a previous test failing, so it makes no sense to test a dependent function, which will fail anyway.

In Table 1, we see the value of this test. At a glance, I can see where my trouble spots are. In a multiple-domain forest, you must run this command with Enterprise Admin credentials, or you will get FAIL results on all tests for all DNS servers in domains for which you don't have privileges. This is what happened for the EMEA domain in Table 1.
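Reading the summary by eye works for a handful of servers; in a large forest a short script can pull out the trouble spots. The following is a minimal Python sketch, not part of DCDiag, that assumes each server row looks like the rows in Table 1 (a server name followed by seven status values). The function name problem_cells and the sample text are hypothetical, and a real report may need a different pattern.

```python
# Column order as described for Table 1 (assumed to match the report).
TESTS = ["Auth", "Basc", "Forw", "Del", "Dyn", "RReg", "Ext"]
STATUSES = {"PASS", "FAIL", "WARN", "N/A"}


def problem_cells(summary_text):
    """Return {server: [(test, status), ...]} for every WARN or FAIL cell."""
    problems = {}
    for line in summary_text.splitlines():
        fields = line.split()
        # A server row is a name plus one status per test column.
        if len(fields) != len(TESTS) + 1:
            continue
        server = fields[0]
        cells = [field.upper() for field in fields[1:]]
        if not all(cell in STATUSES for cell in cells):
            continue
        bad = [(test, cell) for test, cell in zip(TESTS, cells)
               if cell in ("WARN", "FAIL")]
        if bad:
            problems[server] = bad
    return problems


# Hypothetical sample in the layout of Table 1.
SAMPLE = """\
Domain: Corp.net
Corp-DC02  PASS WARN n/a n/a PASS FAIL n/a
Domain: Americas.Corp.net
AM-DC10    PASS PASS PASS PASS PASS FAIL n/a
AM-DC11    PASS PASS PASS PASS PASS PASS n/a
"""

if __name__ == "__main__":
    for server, bad in problem_cells(SAMPLE).items():
        print(server, bad)
```

Servers with nothing but PASS and n/a results are left out, so the output is only the list of cells worth looking up in the detailed section of the report.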

For further help, a complete, detailed list of the test results is available earlier in the report. For instance, I can go to the top of the report and search for Corp-DC02, and get details as shown in Figure 1.

There's a lot of good information I've cut for brevity, but you can construct the DNS resolver configuration for each DNS server just from the data here. There's a lot of other data here as well. But the point is that this section shows why the forwarders test had an N/A in the summary in Table 1. Using this method, we can pick our way through all the warnings, failures and N/A results in the summary table. And, of course, the beauty is that you have all DNS servers in the forest in one nice text file generated from one command. This can be run even from a client that has DCDiag on it.

Figure 1. Test results for domain controllers:

DC: Test-DC1.Wtec.adapps.com
Domain: Wtec.adapps.com
  TEST: Authentication (Auth)
    Authentication test: Successfully completed
  TEST: Basic (Basc)
    Microsoft(R) Windows(R) Server 2003, Enterprise Edition (Service Pack level: 2.0) is supported
    NETLOGON service is running
    <snip>
    IP address is static
    IP address: 10.13.62.95
    <snip>
    DNS servers:
      10.13.62.105 (<name unavailable>) [Valid]
      10.13.62.95 (<name unavailable>) [Valid]

Tip 2: Determining AD Replication Health

The Support tools for Windows 2003 Service Pack 1 (SP1) include a new Repadmin option called /replsum. Similar to the DCdiag /Test:DNS command in Tip No. 1, /replsum collects replication information for every DC in every domain in the forest. It will report the last time replication occurred between the DC the command was run on and each other DC in the forest. While there are a number of different options, I've only used these:

Repadmin /replsum /bysrc /bydest /sort:delta

/bysrc collects data for DCs that have replicated from the DC this command is run on.
/bydest collects data for DCs that have replicated to the DC this command is run on.
/sort:delta sorts the results by replication delta, largest first.

A sample output is shown in Table 2. This shows six DCs in the domain and the delta since their last replication. Here we can easily see that the domain is healthy except for WTEC-DC2, which has not replicated for five days with an error 1722. I know this DC is down due to a planned move in the data center. In addition, if a DC has not replicated for longer than the tombstone lifetime, it will be flagged in this report so an administrator can immediately see the danger and take steps to remove it from the network.

Source DC         Largest Delta      Fails/Total  %%   Error
WTEC-DC2          05d.13h:39m:15s    5 /  5       100  (1722) The RPC server is...
WTEC-DC1                  41m:26s    0 / 20         0
DDMCWIN2K8                39m:00s    0 /  4         0
GSE-EXCH3                 08m:59s    0 /  6         0
MRNVMWTEC                 08m:56s    0 /  4         0
WTEC-DC6                  08m:34s    0 /  6         0

Destination DC    Largest Delta      Fails/Total  %%   Error
WTEC-DC1          05d.13h:39m:39s    5 / 25        20  (1722) The RPC server is...
DDMCWIN2K8                41m:50s    0 /  4         0
GSE-EXCH3                 13m:35s    0 /  6         0
MRNVMWTEC                 07m:24s    0 /  4         0
WTEC-DC6                  06m:25s    0 /  6         0

Table 2.
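If the summary is saved to a file, the "largest delta" column can be checked against a threshold automatically. The sketch below is one possible approach in Python, assuming the deltas are printed exactly as in Table 2 (an optional day part, an optional hour part, then minutes and seconds). The names parse_delta and stale_dcs are hypothetical, and the threshold is whatever fits your environment; the tombstone lifetime itself varies by forest and is not hard-coded here.

```python
import re
from datetime import timedelta

# Matches deltas as printed in Table 2, e.g. "05d.13h:39m:15s" or "41m:26s".
DELTA_RE = re.compile(
    r"^(?:(?P<d>\d+)d\.)?(?:(?P<h>\d+)h:)?(?P<m>\d+)m:(?P<s>\d+)s$"
)


def parse_delta(text):
    """Convert a 'largest delta' string into a timedelta."""
    match = DELTA_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized delta: {text!r}")
    parts = {key: int(value or 0) for key, value in match.groupdict().items()}
    return timedelta(days=parts["d"], hours=parts["h"],
                     minutes=parts["m"], seconds=parts["s"])


def stale_dcs(deltas, max_age):
    """Return the DC names whose largest delta exceeds max_age, worst first."""
    parsed = {dc: parse_delta(delta) for dc, delta in deltas.items()}
    late = [dc for dc, age in parsed.items() if age > max_age]
    return sorted(late, key=lambda dc: parsed[dc], reverse=True)


# Values copied from the source-DC half of Table 2.
TABLE_2 = {
    "WTEC-DC2": "05d.13h:39m:15s",
    "WTEC-DC1": "41m:26s",
    "DDMCWIN2K8": "39m:00s",
    "GSE-EXCH3": "08m:59s",
}

if __name__ == "__main__":
    # The one-day threshold is an arbitrary choice for this sketch.
    print(stale_dcs(TABLE_2, timedelta(days=1)))
```

Passing the forest's tombstone lifetime as max_age would give the list of DCs that should be pulled from the network rather than merely investigated.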

Tip 3: Replication Details for All DCs in the Forest

This technique -- very similar to the method used in Tip No. 2 -- will provide more detail. The command is Repadmin /showrepl * /csv >showrepl.csv. This puts the output in .CSV format, as shown in Table 3.

I like this command because it frequently turns up errors in more detail than the Repadmin /replsum command. Additionally, it will often report different errors -- or additional errors, error codes and so on -- and provide the naming context and specific data that /replsum doesn't provide.

Naming Context                  Source DC   Number of Failures  Last Failure Time  Last Success Time  Last Failure Status
DC=Wtec,DC=adapps,DC=hp,DC=com  WTEC-DC2    535                 2/18/2009 21:36    2/13/2009 7:50     1722
DC=Wtec,DC=adapps,DC=hp,DC=com  GSE-EXCH3   0                   0                  2/18/2009 21:37    0
DC=Wtec,DC=adapps,DC=hp,DC=com  WTEC-DC6    0                   0                  2/18/2009 21:37    0
DC=Wtec,DC=adapps,DC=hp,DC=com  MRNVMWTEC   0                   0                  2/18/2009 21:37    0

Table 3.
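Because the output is a .CSV file, it can be filtered with a few lines of script instead of a spreadsheet. The Python sketch below is a rough illustration that keeps only the rows with a non-zero failure count. It assumes the column headers match those of Table 3; the headers in a real showrepl.csv may be spelled differently, so the constants at the top are placeholders to adjust, and the function name failing_links is hypothetical.

```python
import csv
import io

# Column names follow the headers of Table 3; the headers in a real
# showrepl.csv may be spelled differently, so adjust these as needed.
CONTEXT_COL = "Naming Context"
SOURCE_COL = "Source DC"
FAILURES_COL = "Number of Failures"
STATUS_COL = "Last Failure Status"


def failing_links(csv_file):
    """Yield (source, naming context, failures, status) for failing rows."""
    for row in csv.DictReader(csv_file):
        failures = int(row[FAILURES_COL] or 0)
        if failures > 0:
            yield (row[SOURCE_COL], row[CONTEXT_COL],
                   failures, row[STATUS_COL])


# Hypothetical file contents built from two rows of Table 3.
SAMPLE = '''\
Naming Context,Source DC,Number of Failures,Last Failure Time,Last Success Time,Last Failure Status
"DC=Wtec,DC=adapps,DC=hp,DC=com",WTEC-DC2,535,2/18/2009 21:36,2/13/2009 7:50,1722
"DC=Wtec,DC=adapps,DC=hp,DC=com",GSE-EXCH3,0,0,2/18/2009 21:37,0
'''

if __name__ == "__main__":
    # For a real export, open the file instead:
    #     with open("showrepl.csv", newline="") as handle:
    #         rows = list(failing_links(handle))
    for link in failing_links(io.StringIO(SAMPLE)):
        print(link)
```

The naming context contains commas, which is why the sample quotes that field; the csv module handles the quoting, which a simple split on commas would not.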

Tip 4: NTDS Diagnostics

This tip is an absolute essential for getting additional data on Directory Service (DS) events. It's enabled per DC in the registry at HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics. It's fairly straightforward. There are a variety of values that, when enabled, will dump additional events into the event log to assist with troubleshooting. The valid data for these values is an integer from zero to five, inclusive. The default value is zero, meaning minimal verbosity, and a setting of five will dump more than you want. Normally I set it at three and see if I need more. For instance, if I need more verbose details on replication, I'd set the "5 Replication Events" value to three and then reproduce the problem. Make sure to reset the value to zero when troubleshooting is concluded. These settings will fill up the event log quickly.

The most common values I use include:

1 Knowledge Consistency Checker
10 Performance Counters
13 Name Resolution (this is DNS related)
15 Field Engineering
18 Global Catalog
2 Security Events
5 Replication Events
8 Directory Access
9 Internal Processing

The 9 Internal Processing value is handy for getting additional details for DS events that indicate an internal error has occurred. This will often cause additional events that will aid in diagnosing the problem. It's common to set more than one of these values. For instance, in replication troubleshooting, it would be reasonable to enable 1 Knowledge Consistency Checker and 5 Replication Events.
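Since it's easy to forget to set these values back to zero, it can help to generate the enable and reset commands together. The following Python sketch is only an illustration: it builds reg.exe command lines for the two replication-related values named above and prints them rather than running them. The helper names diag_command and replication_session are hypothetical, and the default level of three simply follows the starting point suggested in this tip.

```python
# Registry key named in the article, in the short form reg.exe accepts.
DIAG_KEY = r"HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics"


def diag_command(value_name, level):
    """Build a reg.exe command line that sets one diagnostics value."""
    if not 0 <= level <= 5:
        raise ValueError("diagnostic level must be between 0 and 5")
    return ["reg", "add", DIAG_KEY, "/v", value_name,
            "/t", "REG_DWORD", "/d", str(level), "/f"]


def replication_session(level=3):
    """Return (enable, reset) command lists for replication troubleshooting."""
    names = ["1 Knowledge Consistency Checker", "5 Replication Events"]
    enable = [diag_command(name, level) for name in names]
    reset = [diag_command(name, 0) for name in names]
    return enable, reset


if __name__ == "__main__":
    enable, reset = replication_session()
    for command in enable + reset:
        # Printed rather than executed; on a DC each list could be passed
        # to subprocess.run(command, check=True) by an administrator.
        print(command)
```

Keeping the reset commands next to the enable commands makes it harder to leave verbose logging on after the problem has been reproduced.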

The 15 Field Engineering value will dump several additional events to the DS log. Unlike the other diagnostics, this one needs to be set to five to provide relevant data. Specifically, it will produce events 1644 and 1643, which report inefficient LDAP queries including the client that was the source of the query, the query string and the root of the query. This is important because one of the headaches related to AD is the Local Security Authority Subsystem Service (LSASS) process using up enough resources to hang or crash a DC and cause client log-on delays. Inefficient LDAP queries by a user or by an application -- or even a Linux client log-on -- will put a heavier load on LSASS. Enabling this diagnostic will quickly identify the guilty party by name or IP address. Some admins leave this diagnostic permanently enabled to monitor a busy environment, but again, it will fill up the event logs and possibly hide or overwrite other important events in the DS log.

Tip 5: Group Policy Management Console and HTML Reports

I'm sure nearly every AD admin alive uses this tool, but I thought it would be worth mentioning the value of HTML reports. There are two types of reports I use very frequently because I'm dealing with environments I'm not familiar with, and I usually want proof of the settings of a Group Policy Object (GPO) as well as the results from a particular client or clients.

Getting a report of a GPO is valuable even if you're the admin because it shows exactly what settings are defined -- in fact, it shows only the settings that are defined -- so you don't have to wade through the GP editor to find which ones are set. This is a quick way to see if the GPO is defined as you think it is. It also shows links, filters applied and other details. HTML reports for the Default Domain Policy are easy to read and can be expanded and closed by sections as needed, because they're in HTML format. To get this report, just right-click on any GPO in the domain tree and select "Save Report."

One of the problems with solving a GPO-related issue at a client is pestering the user, who may be hundreds of miles away, to log in and get a GPResult. If the user has logged in at least once on a workstation, Group Policy Management Console (GPMC) can provide you with an HTML-formatted GPResult that is produced when the user logs on. This is obtained in the GPMC console by right-clicking the "Group Policy Results" node and selecting the Group Policy Results Wizard. Of course, GPResult is a necessity in diagnosing client-side issues.

Tip 6: Active Directory Performance Diagnosis

While there are many other troubleshooting tips I could have elaborated on here, this is one that probably isn't well known. In troubleshooting server performance, there's a standard set of objects, including Processor, LogicalDisk, Server, Memory, System and so on. However, there's an NTDS object that provides us with relevant AD counters such as DRA, Kerberos, LDAP and even NTLM-related counters. In addition, we can collect valuable AD data by monitoring the LSASS process. I recommend enabling the following:

Object: Process
Counters: % Processor Time, Working Set, Working Set Peak

Object: NTDS
Counters: (all counters)
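These objects can be added in Performance Monitor by hand, or collected from the command line. As a rough sketch, the Python below assembles counter paths for the LSASS instance of the Process object plus every NTDS counter, and builds a typeperf command line that logs them to a .CSV file. The 15-second interval, the sample count and the output file name are arbitrary assumptions for illustration, and the function names are hypothetical; the command is printed, not run.

```python
def counter_paths(process="lsass"):
    """Return performance counter paths for the objects listed above."""
    process_counters = ["% Processor Time", "Working Set", "Working Set Peak"]
    paths = ["\\Process({})\\{}".format(process, name)
             for name in process_counters]
    paths.append("\\NTDS\\*")  # wildcard: every counter of the NTDS object
    return paths


def typeperf_command(output_csv, interval_seconds=15, samples=240):
    """Build a typeperf command line that logs the counters to a CSV file."""
    return ["typeperf", *counter_paths(),
            "-si", str(interval_seconds),  # seconds between samples
            "-sc", str(samples),           # number of samples to collect
            "-f", "CSV", "-o", output_csv]


if __name__ == "__main__":
    # Interval, sample count and file name are assumptions for this sketch.
    print(typeperf_command("dc-baseline.csv"))
```

With these assumed values the capture runs for one hour (240 samples at 15 seconds each), which is a starting point for a baseline rather than a recommendation.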

Unfortunately, there's little information available on what acceptable thresholds are. The only one I've found that even addresses this is Microsoft's Branch Office Deployment guide. While there are many counters that may or may not be familiar, I've only found a few that are significant:

DRA Pending Replication Synchronizations: These are the directory synchronizations that are queued and are essentially replication backlog. Microsoft only says these values should be "as low as possible" and that "hardware is slowing replication." These could be indications that DC resources are at high utilization.

LDAP Client Sessions: This is the number of sessions opened by LDAP clients at the time the data is taken. This is helpful in determining LDAP client activity and if the DC is able to handle the load. Of course, spikes during normal periods of authentication -- such as first thing in the morning -- are not necessarily a problem, but long sustained periods of high values indicate an overworked DC.

LDAP Bind Time: This is the time in milliseconds needed to complete the last successful LDAP binding. Documentation says that this should be "as low as possible," but if you run the perfmon output through the Performance Analysis of Logs (PAL) tool, it will flag 15 milliseconds as a warning threshold and 30 milliseconds as an error threshold. The fix is more resources: processor, memory and so on. (Note: PAL is an excellent performance-analysis tool, and is available online.)
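The two PAL thresholds quoted for LDAP Bind Time are simple enough to apply to your own samples. The Python sketch below is an illustration only, not PAL's actual logic: it buckets bind-time readings into ok, warning and error using the 15 and 30 millisecond figures above. Treating a reading exactly at a threshold as having crossed it is an assumption, and the function names and sample readings are hypothetical.

```python
WARN_MS = 15   # warning threshold quoted for PAL in the article
ERROR_MS = 30  # error threshold quoted for PAL in the article


def classify_bind_time(milliseconds):
    """Bucket one LDAP Bind Time sample; a value at a threshold counts."""
    if milliseconds >= ERROR_MS:
        return "error"
    if milliseconds >= WARN_MS:
        return "warning"
    return "ok"


def summarize(samples):
    """Count how many samples fall into each bucket."""
    counts = {"ok": 0, "warning": 0, "error": 0}
    for value in samples:
        counts[classify_bind_time(value)] += 1
    return counts


if __name__ == "__main__":
    # Made-up readings in milliseconds, for illustration only.
    print(summarize([3, 15, 29.5, 30, 120]))
```

A single error-level sample during the morning log-on rush matters less than a sustained run of them, so the counts are more useful when computed per time window.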

In diagnosing the LSASS process, as in any performance analysis, a baseline must be established. A note on Microsoft's DS blog indicates that if a baseline is not available, use 80 percent. That is, the LSASS counters shouldn't indicate more than 80 percent consumption. Above 80 percent consumption indicates an overload condition, which could be a high LDAP query demand (see Tip No. 4) or general lack of server resources. The resolution is to increase resources or reduce demand, but be advised this has the potential to cause a performance hit in the domain.

If you really want to solve your LSASS resource issues, put your DCs on x64 platforms with several processors and 32GB of RAM. You might be surprised at how much memory LSASS really can use.

About the Author

Gary is a Solution Architect in Hewlett-Packard's Technology Services organization and lives in Roswell, GA. Gary has worked in the IT industry since 1981 and holds an MS in Computer Aided Manufacturing from Brigham Young University. Gary has authored numerous technical articles for TechTarget (http://searchwindowsserver.techtarget.com), Redmond Magazine (www.redmondmag.com) and TechNet magazine, and has presented numerous times at the HP Technology Forum, TechMentors Conference and at Microsoft TechEd 2011. Gary is a Microsoft MVP for Directory Services and is the founder and President of the Atlanta Active Directory Users Group (http://aadug.org).