Executive Summary:
Active Directory (AD) is an important technology. However, AD presents certain problems that can negate the service’s power and strength. Learn how to troubleshoot common AD annoyances such as time synchronization after hardware replacement, cross-forest authentication, usability of the least privilege model, and 64-bit Windows challenges.

Over the past eight years I’ve helped plan, implement, and operate various Active Directory (AD) infrastructures. And as much as I value AD’s power and strength, I’ve also learned quite a few annoying things about AD that sometimes prevent it from operating as smoothly as possible. In this article I discuss some of these annoyances and explain how to best work around them.

Special Hardware Problems
In general, AD domains are rather tolerant of hardware problems that take down a single domain controller (DC). Of course, this is true only if you follow the best practice of implementing more than one DC per domain and continuously monitor that the DCs are replicating changes among themselves. That way, if one DC fails for some reason, clients that want to authenticate to the domain use DNS to find another DC on the network to connect and authenticate to. For normal operations, no problems occur even if one of the special Flexible Single-Master Operation (FSMO) role-holder DCs goes down for a few hours or even a few days. AD is designed to operate without all the FSMO role holders being available all the time. Obviously, you shouldn’t update your schema or mass-create new objects in your domain when the corresponding FSMO DCs are down. But normal operations, such as users changing their passwords or administrators adding an occasional new object to the domain, will still work. This is one of the key strengths of AD and its multi-master replication model.
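
To make the distinction between everyday operations and role-dependent ones concrete, here is a minimal Python sketch. The operation names and the ROLE_NEEDED table are hypothetical and deliberately incomplete: they encode only the examples mentioned above, plus the well-known dependencies of schema changes on the schema master, of adding or removing domains on the domain naming master, and of new RID pools (which a DC eventually needs when you mass-create objects) on the RID master.

```python
from typing import Optional

# Hypothetical, deliberately incomplete table: which FSMO role holder an
# operation depends on, or None if any writable DC can perform it thanks to
# multi-master replication.
ROLE_NEEDED: dict[str, Optional[str]] = {
    "change user password": None,
    "create an occasional object": None,  # served from the DC's current RID pool
    "mass-create objects": "RID master",  # may exhaust the DC's RID pool and need a new one
    "update schema": "Schema master",
    "add or remove a domain": "Domain naming master",
}


def can_run(operation: str, unavailable_roles: set[str]) -> bool:
    """Return True if the operation doesn't depend on a role holder that is down."""
    role = ROLE_NEEDED[operation]
    return role is None or role not in unavailable_roles
```

For example, with the RID master and schema master offline, can_run("change user password", {"RID master", "Schema master"}) still returns True, while can_run("update schema", {"Schema master"}) returns False.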

But sometimes it isn’t the hardware failure itself that causes a problem. Sometimes problems don’t start until you repair the hardware and reboot the DC, especially if the DC is your domain’s PDC emulator. By default, all DCs in an AD domain synchronize their time with the PDC emulator of their domain. Computers and servers joined to the domain then synchronize their time with the DC they use for authentication, usually a DC within their AD site. For Kerberos authentication to work, all these clients and DCs must be synchronized in time. (In an AD domain, Windows 2000 Server and later clients and servers use Kerberos by default.) If the time skew (difference) between a client and the server hosting a resource it wants to access, such as a file share, is too large, authentication to that resource server fails. The default maximum tolerated time skew is five minutes. So even if a user or computer properly authenticates to a domain, it might fail to access a server because of a time difference.
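
The five-minute rule boils down to a simple comparison. This Python sketch assumes both timestamps are timezone-aware and treats the default value of the Kerberos policy setting "Maximum tolerance for computer clock synchronization" as a constant; the function name is hypothetical, and treating the exact boundary as within tolerance is a simplification.

```python
from datetime import datetime, timedelta, timezone

# Default value of the Kerberos policy setting "Maximum tolerance for computer
# clock synchronization"; administrators can change it.
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(client_time: datetime, server_time: datetime,
                         max_skew: timedelta = DEFAULT_MAX_SKEW) -> bool:
    """Return True if the two clocks differ by no more than max_skew."""
    return abs(client_time - server_time) <= max_skew


server_clock = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
print(within_kerberos_skew(server_clock + timedelta(minutes=3), server_clock))  # True
print(within_kerberos_skew(server_clock - timedelta(minutes=7), server_clock))  # False
```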

What does all of this have to do with the hardware failure of a DC in your domain, potentially even the PDC emulator? Quite simple: If your hardware repair involves replacing a server’s motherboard, you usually also replace the onboard clock. And it’s highly unlikely that the time set on the new motherboard’s system clock is in sync with the rest of your AD forest. If you then simply reboot the PDC emulator while it’s connected to the network, the other DCs will synchronize their time with it when they see that it’s online again. Thus, you might introduce a time skew on various machines in your environment that’s unacceptable to Kerberos. Although your PDC emulator might have been properly configured to synchronize with an external time source, the wrong time has now made its way into your network and will cause problems such as Microsoft Exchange Server servers being unable to use the Global Catalogs (GCs) in their site for LDAP lookups, or users being unable to access file shares. Your environment might not return to normal for hours or days and might even require manual intervention.
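
The following Python sketch models how one wrong clock spreads through the time hierarchy described above. The machine names, the TIME_SOURCE topology, and the two-hour offset are hypothetical. Each sync simply copies the source’s clock offset, which is a simplification of how the Windows Time service actually adjusts clocks.

```python
from datetime import timedelta

MAX_SKEW = timedelta(minutes=5)  # default Kerberos tolerance

# Hypothetical topology: the machine each computer takes its time from.
# The PDC emulator's own source (an external time server) is not modeled.
TIME_SOURCE = {
    "PDC": None,
    "DC2": "PDC",
    "DC3": "PDC",
    "WS01": "DC2",  # workstation authenticating against DC2
    "FS01": "DC3",  # file server authenticating against DC3
}


def propagate(offsets: dict[str, timedelta], sync_order: list[str]) -> dict[str, timedelta]:
    """Apply time syncs in order; each synced machine copies its source's current offset."""
    result = dict(offsets)
    for machine in sync_order:
        source = TIME_SOURCE[machine]
        if source is not None:
            result[machine] = result[source]
    return result


# Everything is on time except the PDC, whose new motherboard clock is 2 hours fast.
offsets = {machine: timedelta(0) for machine in TIME_SOURCE}
offsets["PDC"] = timedelta(hours=2)

# DC2 and then WS01 resync; DC3 and FS01 have not resynced yet.
after = propagate(offsets, ["DC2", "WS01"])
skew = abs(after["WS01"] - after["FS01"])
print(skew, skew > MAX_SKEW)  # 2:00:00 True -> FS01 would reject WS01's Kerberos authenticators
```

Once every machine has resynced, the skew between members disappears, but the whole domain now runs two hours off, which is exactly the "wrong time in the network" situation described above.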

The solution to this problem is as simple as the problem itself: If you need to replace a DC’s motherboard, particularly for a broken DC that hosts the PDC emulator FSMO role, remove the network cable before you reboot the DC. After the DC reboots successfully (which might take longer than normal because the DC can’t find other DCs to replicate with), log on locally and set the correct time on the DC. Afterward, you can plug the network cable back in. Alternatively, if your PDC emulator is still responsive, you can temporarily transfer the PDC emulator role to another DC and transfer it back after replacing the motherboard. These methods prevent the time-synchronization problems that in turn cause network authentication failures.

Cross-Forest Authentication
It’s difficult enough for most AD administrators
to understand how clients leverage DNS to
locate DCs of their own forest or domain within
their own network. So before we look at how this
process might work across different AD forests
and networks, let’s quickly review the DC location
process within an AD domain.

In short, a Win2K or later client that has never authenticated to an AD domain queries DNS for any DC that’s responsible for its own domain. The client does so by asking the DNS server to return the list of all DCs that have registered the generic DC locator record (which by default includes all DCs in an AD domain). To retrieve these records, the client first queries for the generic LDAP service records in the _msdcs zone of the DNS hierarchy. For an AD domain called MyCompany.net, these generic records are located at _ldap._tcp.dc._msdcs.MyCompany.net.
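
As an illustration, the Python sketch below builds the generic locator name for a domain and orders a set of returned SRV records. It assumes the client applies the standard RFC 2782 rules (lowest priority first, then a weighted random choice among records of equal priority). The SrvRecord type, the function names, and the DC host names are hypothetical, and no real DNS query is made.

```python
import random
from dataclasses import dataclass


@dataclass
class SrvRecord:
    priority: int
    weight: int
    port: int
    target: str


def generic_dc_record(domain: str) -> str:
    """Owner name of the generic DC locator records for an AD domain."""
    return f"_ldap._tcp.dc._msdcs.{domain}"


def rfc2782_order(records: list[SrvRecord], rng: random.Random) -> list[SrvRecord]:
    """Order SRV records: ascending priority, weighted random within a priority."""
    ordered: list[SrvRecord] = []
    for priority in sorted({r.priority for r in records}):
        group = [r for r in records if r.priority == priority]
        while group:
            # RFC 2782: put zero-weight records first, then pick by running weight sum.
            group.sort(key=lambda r: r.weight != 0)
            pick = rng.randint(0, sum(r.weight for r in group))
            running = 0
            for i, record in enumerate(group):
                running += record.weight
                if running >= pick:
                    ordered.append(group.pop(i))
                    break
    return ordered


print(generic_dc_record("MyCompany.net"))  # _ldap._tcp.dc._msdcs.MyCompany.net
records = [
    SrvRecord(0, 100, 389, "dc1.mycompany.net"),
    SrvRecord(0, 100, 389, "dc2.mycompany.net"),
    SrvRecord(10, 100, 389, "dc3.mycompany.net"),
]
for r in rfc2782_order(records, random.Random(42)):
    print(r.priority, r.target)
```

The weighted random step spreads clients across equally preferred DCs instead of sending everyone to the first record in the list.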

The client then contacts a few of the DCs in the list returned from the DNS server, notifies them of its intention to be authenticated, and waits for the first DC to respond. However, the DCs do more than simply respond: They check the IP address that the client is using in its request. They see that the client is joined to the domain, and they compare the client’s IP address with the site and subnet data stored in the AD configuration partition. With this data, the DCs determine the client’s proper site and tell the client to go back to the DNS server and query for a DC to authenticate to in its own site. The client then requests the proper site-specific Kerberos service record.
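
The site lookup the DCs perform amounts to matching the client’s IP address against subnet objects, where the most specific (longest-prefix) matching subnet wins. Here is a minimal Python sketch using the standard ipaddress module; the SUBNETS table and the HubSite name are hypothetical stand-ins for the subnet objects in the configuration partition.

```python
import ipaddress
from typing import Optional

# Hypothetical subnet objects (subnet -> AD site), as stored in the configuration partition.
SUBNETS = {
    "10.0.0.0/8": "HubSite",
    "10.20.0.0/16": "BranchSite",
}


def site_for_client(ip: str, subnets: dict[str, str] = SUBNETS) -> Optional[str]:
    """Return the site of the most specific subnet containing ip, or None if none match."""
    address = ipaddress.ip_address(ip)
    best_net, best_site = None, None
    for prefix, site in subnets.items():
        network = ipaddress.ip_network(prefix)
        if network.version != address.version or address not in network:
            continue
        if best_net is None or network.prefixlen > best_net.prefixlen:
            best_net, best_site = network, site
    return best_site


print(site_for_client("10.20.5.7"))    # BranchSite
print(site_for_client("10.1.2.3"))     # HubSite
print(site_for_client("192.168.1.1"))  # None
```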

Let’s assume the client is located in a branch office of the MyCompany.net AD domain and the AD site name is BranchSite. The client would query DNS for the site-specific Kerberos service records registered for this site. These records are located at _kerberos._tcp.BranchSite._sites.dc._msdcs.MyCompany.net. Figure 1 also shows this hierarchy as it appears in the Microsoft Management Console (MMC) DNS snap-in.
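
This short Python sketch (the function name is hypothetical) composes the site-specific record name from a site and a domain name, reproducing the example above.

```python
def site_kerberos_record(site: str, domain: str) -> str:
    """Owner name of the site-specific Kerberos locator records for a domain."""
    return f"_kerberos._tcp.{site}._sites.dc._msdcs.{domain}"


print(site_kerberos_record("BranchSite", "MyCompany.net"))
# _kerberos._tcp.BranchSite._sites.dc._msdcs.MyCompany.net
```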

The DNS server then returns only the DCs that are responsible for the client’s site, which the client in turn uses to authenticate to the domain. Fortunately, the client stores the name of the last AD site it belonged to in the registry and uses this information directly the next time it needs to locate a DC. To find the AD site name that a client has cached, look at the DynamicSiteName value under the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Netlogon\Parameters registry key.
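
If you want to read the cached site name from a script rather than with Registry Editor, Python’s standard winreg module can do it. This is a minimal sketch (the function name is hypothetical); it returns None on non-Windows systems or when the key or value doesn’t exist.

```python
import sys
from typing import Optional

NETLOGON_PARAMS = r"SYSTEM\CurrentControlSet\Services\Netlogon\Parameters"


def cached_site_name() -> Optional[str]:
    """Return the DynamicSiteName value cached by Netlogon, or None if unavailable."""
    if sys.platform != "win32":
        return None  # winreg is only available on Windows
    import winreg

    try:
        with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, NETLOGON_PARAMS) as key:
            value, _value_type = winreg.QueryValueEx(key, "DynamicSiteName")
    except FileNotFoundError:
        return None  # key or value not present
    return str(value)


print(cached_site_name())
```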

Even after a user authenticates to his or her own domain, cross-domain resource access involves a few more steps in multi-domain forests, because part of the DC locator process is repeated when the user accesses resources in another domain in the forest. Although all the domains in a forest trust one another, the Kerberos Ticket Granting Ticket (TGT) that the client received from its own DC at logon is valid only for requesting service tickets from the DCs of the client’s own domain. When a user accesses a resource in another domain in the same forest (e.g., a file server), the client therefore queries DNS to locate a DC of the file server’s domain, from which it requests a Kerberos ticket that’s valid in that domain. The good thing is that the client immediately finds the correct DC to use. Because the client already knows what site it’s in, it uses a site-specific DNS query (such as the one I described earlier) to locate a DC of the other domain.

One reason this process works so efficiently within a forest, without the need to query for the generic DC locator records of the other domain, is that all domains in the same forest replicate and use the same AD configuration partition. This partition also contains the site and subnet information, so DCs from every domain in the forest properly register locator records for their respective AD sites. If an AD site has no DC at all, or doesn’t have a DC for every domain in the forest, the AutoSiteCoverage mechanism ensures that the closest DC (in terms of site-link cost) registers a locator record for that site in DNS. This means that within its forest, a client can always locate a site-specific DC for any domain in the forest. The client doesn’t have to use the generic DC locator records, which might direct the client to a DC on the other side of the world to authenticate with.
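
The idea behind AutoSiteCoverage can be sketched as a shortest-path search over site links. This Python sketch is a simplified model under stated assumptions: site-link costs are treated as transitive (as with the default bridging of all site links), link costs are positive, ties are broken alphabetically by site name, and the site names and costs are hypothetical. The real mechanism has its own tie-breaking rules that this sketch doesn’t reproduce.

```python
import heapq
from typing import Optional


def closest_covering_site(target: str,
                          links: list[tuple[str, str, int]],
                          sites_with_dc: set[str]) -> Optional[str]:
    """Return the site with the lowest total site-link cost from target that
    hosts a DC for the domain (target itself if it has one)."""
    graph: dict[str, list[tuple[str, int]]] = {}
    for a, b, cost in links:  # site links are bidirectional
        graph.setdefault(a, []).append((b, cost))
        graph.setdefault(b, []).append((a, cost))

    best = {target: 0}
    heap = [(0, target)]  # (cost so far, site); equal costs pop alphabetically
    while heap:
        cost, site = heapq.heappop(heap)
        if cost > best[site]:
            continue  # stale heap entry
        if site in sites_with_dc:
            return site
        for neighbor, link_cost in graph.get(site, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                heapq.heappush(heap, (new_cost, neighbor))
    return None


links = [("BranchSite", "HubSite", 100),
         ("BranchSite", "RegionalSite", 50),
         ("RegionalSite", "HubSite", 100)]
print(closest_covering_site("BranchSite", links, {"HubSite", "RegionalSite"}))  # RegionalSite
print(closest_covering_site("BranchSite", links, {"HubSite"}))                  # HubSite
```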

However, remember that this process assumes that the site-specific DCs are available. If they aren’t, the client falls back to the generic DC locator records and might therefore experience slow authentication and Group Policy processing.
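
The fallback behavior can be summarized in a few lines of Python. In this sketch, resolve and is_reachable are hypothetical callables standing in for a DNS SRV lookup and a DC availability check, and the host names are made up. The function tries the site-specific records first (when a site name is cached) and only then the generic ones, which is where a distant DC might be chosen.

```python
from typing import Callable, Optional


def locate_dc(domain: str,
              cached_site: Optional[str],
              resolve: Callable[[str], list[str]],
              is_reachable: Callable[[str], bool]) -> Optional[tuple[str, str]]:
    """Return (dc, record_name) for the first reachable DC, preferring the client's site."""
    names = []
    if cached_site:
        names.append(f"_kerberos._tcp.{cached_site}._sites.dc._msdcs.{domain}")
    names.append(f"_ldap._tcp.dc._msdcs.{domain}")  # generic fallback, may list distant DCs
    for name in names:
        for dc in resolve(name):
            if is_reachable(dc):
                return dc, name
    return None


# Hypothetical DNS data: the branch DC is down, so the generic records are used.
fake_dns = {
    "_kerberos._tcp.BranchSite._sites.dc._msdcs.MyCompany.net": ["branch-dc.mycompany.net"],
    "_ldap._tcp.dc._msdcs.MyCompany.net": ["branch-dc.mycompany.net", "hq-dc.mycompany.net"],
}
down = {"branch-dc.mycompany.net"}
print(locate_dc("MyCompany.net", "BranchSite",
                lambda name: fake_dns.get(name, []),
                lambda dc: dc not in down))
# ('hq-dc.mycompany.net', '_ldap._tcp.dc._msdcs.MyCompany.net')
```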

Suppose that you’ve just acquired another company that has its own AD forest. To allow efficient collaboration between the employees of both companies, you decide to establish a trust between the two forests. Your plan might be to consolidate both forests later; however, a forest trust is often the first step in giving both parts of a merged company access to resources in each other’s forest.