Wednesday, September 28, 2011

dcpromo fails with error 1326: unknown user name or password

Problem:
Received error 1326 (Logon failure: unknown user name or bad password) while attempting a dcpromo.
Here's a screenshot of the error: 


The Solution:
I ended up just setting the server's local Administrator password to match my domain Administrator password, and magically it worked! It made no sense to me. Does anyone know why?


Thanks to http://tsoorad.blogspot.com/2010/01/oh-my-aching-brain-cell-or-dcpromo-u.html for the fix!

Wednesday, September 21, 2011

Event Id 34 and 50 Time-Service errors fixed on Virtual DC

Recently I deployed a Windows Server 2008 R2 domain controller as a VMware vSphere guest with VMware Tools installed. It turned out that VMware Tools had time synchronization with the ESX host enabled, and the ESX hosts did not have NTP configured properly, so the DC's clock was thrown way off. As a note, it is best practice to use only one time synchronization method across your DCs: either all of them sync with the ESX host or all of them use w32time. In my experience w32time works better.




After discovering that time sync with the ESX host was checked, I was getting lots of Time-Service errors with Event ID 34 and 50 in the event logs. The fix was to uncheck the sync with ESX host option and edit the following registry values so that the clock could be corrected by more than the normally allowed thresholds:


[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config]
MaxAllowedPhaseOffset 0xffffffff
MaxPosPhaseCorrection 0xffffffff
MaxNegPhaseCorrection 0xffffffff
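
To see why setting these values to 0xffffffff helps, here is a minimal Python sketch of the phase-correction check as Microsoft's W32Time documentation describes it: a correction larger than MaxPosPhaseCorrection or MaxNegPhaseCorrection (in seconds) is refused and an event is logged instead, while 0xFFFFFFFF means the correction is always made. The function name and the sample limits are made up for illustration; this is a simplified model, not the service's actual code.

```python
# Simplified model of the W32Time phase-correction limits (illustrative only).
ALWAYS_CORRECT = 0xFFFFFFFF  # special value: never refuse a correction


def correction_allowed(offset_seconds, max_pos, max_neg):
    """Return True if a clock correction of offset_seconds would be applied.

    offset_seconds > 0 means the clock must move forward,
    offset_seconds < 0 means it must move backward.
    """
    limit = max_pos if offset_seconds >= 0 else max_neg
    if limit == ALWAYS_CORRECT:
        return True
    return abs(offset_seconds) <= limit


if __name__ == "__main__":
    three_hours_back = -3 * 3600
    # With a hypothetical 1-hour limit the correction is refused...
    print(correction_allowed(three_hours_back, 3600, 3600))
    # ...but with 0xffffffff it goes through.
    print(correction_allowed(three_hours_back, ALWAYS_CORRECT, ALWAYS_CORRECT))
```

A DC whose clock was dragged hours off by the ESX host lands in the "refused" case until the limits are raised.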

After changing those values I ran the following commands and everything was fixed. :)

NET TIME /DOMAIN:<your domain> /SET

w32tm /config /update

w32tm /config /manualpeerlist:<your PDCe DC>

w32tm /config /syncfromflags:domhier /update

net stop w32time && net start w32time

Monday, September 19, 2011

Workstation trust relationship issues and iPhone login prompt issues: RESOLVED

Just recently I worked on an authentication issue that took a month to solve with Microsoft Premier Support Services on a Sev-A case. In the end, the fix was just a hotfix applied to all the remaining 2003 DCs in the domain.

The symptoms were the following:
1) iPhones were randomly prompting for logon and sometimes would not authenticate for up to 30 minutes. This was also logged on the UAG 2010 servers as Event ID 14, mentioning that the "trust relationship" had failed. Android phones had the same issue, but they never prompted on error, so it was quiet on the Android front.


 
2) Workstations would randomly fall off the domain and get errors like "The trust relationship between this workstation and the primary domain failed" or "There are currently no logon servers available to service the logon request". The workaround was to log on with your username in UPN format, like user@user.com, or to rejoin the machine to the domain.

Environment for the affected AD site:

Five Windows Server 2003 R2 DCs, two 2008 R2 DCs, and Exchange 2010 (3 CAS servers) in the site, with iPhones and Android phones coming in through 2 UAG 2010 servers in an array.

Problem and Solution:

We found out that the KRBTGT object had been authoritatively restored, which increased its version number and basically made it unreadable by the 2003 DCs. We also found that the KRBTGT object had been moved to a different OU (possibly by a disabled-users script that moves disabled accounts to the "Disabled Users" OU). Simply applying the hotfix from Microsoft to all the 2003 DCs solved the problem. Another solution would be to upgrade all the remaining 2003 DCs.


Takeaways:
Solving Kerberos authentication issues is sometimes very hard to do. We ended up working on many fronts to solve this problem (UAG, Exchange, AD). Some good debugging techniques are to enable Kerberos logging, enable netlogon.log logging on all the affected DCs, and start netmon traces as soon as you notice the issue occurring.
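
As a rough starting point for the logging mentioned above, this Python sketch builds the two commands Microsoft commonly documents for it: setting the Kerberos `LogLevel` registry value to 1 and turning on verbose Netlogon logging with `nltest /dbflag:0x2080ffff`. The function names are my own, and the script only prints the commands unless you call `run_on_this_dc()` on a Windows DC from an elevated prompt.

```python
import subprocess

KERBEROS_KEY = r"HKLM\SYSTEM\CurrentControlSet\Control\Lsa\Kerberos\Parameters"


def diagnostic_commands(enable=True):
    """Build argv lists that switch Kerberos and Netlogon logging on or off."""
    log_level = "1" if enable else "0"
    netlogon_flag = "0x2080ffff" if enable else "0x0"
    return [
        # Kerberos event logging (events show up in the System event log).
        ["reg", "add", KERBEROS_KEY, "/v", "LogLevel",
         "/t", "REG_DWORD", "/d", log_level, "/f"],
        # Verbose Netlogon logging, written to netlogon.log under the
        # Windows debug folder.
        ["nltest", f"/dbflag:{netlogon_flag}"],
    ]


def run_on_this_dc(enable=True):
    """Run the commands locally; needs Windows and an elevated prompt."""
    for argv in diagnostic_commands(enable):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    for argv in diagnostic_commands():
        print(" ".join(argv))
```

Calling `diagnostic_commands(enable=False)` gives the matching commands to turn the logging back off once the traces are collected.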

Microsoft KB with hotfix: http://support.microsoft.com/kb/939820

Domain Controller Upgrade causes Exchange outage

The other day we experienced an Exchange outage at a customer site while doing DC upgrades from 2003 to 2008 R2. The Exchange box was reporting the error below:





The fix was simply to recycle the Microsoft Exchange Active Directory Topology service and its dependent services. Exchange had been pointing at the DC we were upgrading, so when that DC went down the outage occurred. After we recycled the service, it pointed to the other healthy DC in the site.
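
A minimal sketch of that recycle in Python, assuming the service's short name is `MSExchangeADTopology` and that you supply the list of dependent services yourself (the two names in the example are placeholders for whatever is installed on your server). `net stop ... /y` stops the dependents along with the service, but `net start` does not bring them back, so the sketch starts them explicitly.

```python
import subprocess

TOPOLOGY_SERVICE = "MSExchangeADTopology"  # assumed short service name


def recycle_plan(dependents):
    """Return the net commands: one stop, then the starts in order."""
    # /y answers the prompt so the dependent services are stopped too.
    plan = [["net", "stop", TOPOLOGY_SERVICE, "/y"]]
    # Start the topology service first, then each dependent service.
    plan.append(["net", "start", TOPOLOGY_SERVICE])
    plan.extend(["net", "start", name] for name in dependents)
    return plan


def recycle(dependents):
    """Execute the plan; run on the Exchange server from an elevated prompt."""
    for argv in recycle_plan(dependents):
        subprocess.run(argv, check=True)


if __name__ == "__main__":
    # Example dependents only; check your own server's dependent services.
    for argv in recycle_plan(["MSExchangeIS", "MSExchangeTransport"]):
        print(" ".join(argv))
```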