The Scenario
A recent project was an OCS 2007 R2 migration to Lync 2013. The environment had two central sites. At the main central site, “somebody” had already installed Lync 2010 and migrated most of the users, while the other central site was still R2. Users had moved between pools in both directions. Edge services were supplied by the Lync 2010 pool. Users were homed in a variety of OUs. The CSAdministrator and RTCUniversalServerAdmins groups were not domain admin enabled. The Lync 2013 environment also had two central sites. Users from each site moved to their respective 2013 pool.
Everything went pretty well until we went to move users. All the users eventually moved, but due to a really hosed up Active Directory, we had “issues.” After the user move, the R2 components were decommissioned, as were the Lync 2010 components. THEN we discovered that a small subset (about 200 users) of the total user population could not log in. These users were all coming from domain-joined machines. We tried disabling the users in Lync, waiting 24 hours, then enabling them from scratch. Still had issues. Move the user to the other pool? Nope. Still no good. Different workstation? No. Maybe login from mobile works? Not a chance. The various Lync clients simply said the user account information was no good. Over and over. It was not a certificate issue; it was behaving as if the username or password had been fat-fingered. But the user already had a valid login to the domain. Grrr. Worse yet, a Lync 2010 or R2 client would log in. Around in circles we went.
To be very clear, this was NOT an AD issue, this was an issue with SOME users that moved from an OCS 2007 R2 pool or a Lync 2010 pool to a new Lync 2013 pool.
At which point we decided to nuke the users from the database.
The Fix
Before you go much further: this is a one-way procedure. Once you get to step 4, there is no getting it back without going all the way through. So you might want to consider whether you have exhausted all your options before you take this route. There is also a slight chance of thoroughly borking up the RTC database when you do this.
Proceed at YOUR OWN RISK
You may want to verify that your environment is replicating properly before you go much further, because we are going to force the issue in step 3. Open a PowerShell window, run Get-CsManagementStoreReplicationStatus, and check that every server shows UpToDate as True. If not, you may want to fix that first. You do check for this on a semi-regular basis, don’t you?
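As a quick way to spot a lagging replica, here is a minimal sketch. It assumes you are in the Lync Server Management Shell (or have the Lync module loaded); the choice of columns to display is mine, not part of the original procedure.

```powershell
# Assumes the Lync Server Management Shell (or an imported Lync module).
# Collect every replica that has NOT caught up with the Central Management Store.
$stale = @(Get-CsManagementStoreReplicationStatus | Where-Object { -not $_.UpToDate })

if ($stale.Count -gt 0) {
    # Show which servers are behind and when they last reported in.
    $stale | Format-Table ReplicaFqdn, UpToDate, LastStatusReport -AutoSize
} else {
    Write-Host "All replicas report UpToDate = True."
}
```

If anything shows up in that table, sort it out before touching the users.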
OK. Seven steps to success. Here we go.
Let’s assume this user is totally unable to login to Lync 2013, even though EVERYTHING looks like it should work, and credentials work just fine in the domain for everything else.
Step 1 - Export user data
Run this from either a remote PowerShell session or the Lync Server Management Shell on the server itself. This will save contacts, but not meeting/conference information.
Export-CsUserData -PoolFqdn poolfqdn -UserFilter "username@domain.com" -FileName "whereyouwantthefiletogo"
Like so:
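A filled-in sketch of the export, using hypothetical values (the pool FQDN, the SIP address and the backup path are all made up for illustration). The Test-Path check is my own addition: since step 4 is one-way, it seems worth confirming the file exists before moving on.

```powershell
# Hypothetical values; substitute your own pool FQDN, SIP address and path.
# The target folder must already exist.
$pool   = "lyncpool01.contoso.com"
$user   = "jdoe@contoso.com"
$backup = "C:\LyncBackup\jdoe.zip"

Export-CsUserData -PoolFqdn $pool -UserFilter $user -FileName $backup

# Step 4 is one-way, so confirm the export file was written before going further.
if (-not (Test-Path $backup)) {
    throw "Export file $backup was not created; stop here."
}
```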
Step 2 - Remove the user from Lync Server
You can do this from the Control Panel, or get fancy and one-line it from PowerShell.
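For the PowerShell route, a minimal sketch might look like the following. The SIP address is hypothetical, and the -WhatIf dry run is my own precaution rather than something the original procedure calls for.

```powershell
# Hypothetical SIP address; substitute the affected user.
$user = "sip:jdoe@contoso.com"

# Dry run first: reports what would happen without changing anything.
Disable-CsUser -Identity $user -WhatIf

# The real thing: strips the Lync attributes from the user account.
Disable-CsUser -Identity $user
```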
Step 3 - Invoke replication and wait for replication status to show true.
Let’s hope your replication process is good, eh? Run Invoke-CsManagementStoreReplication, wait a bit, then run Get-CsManagementStoreReplicationStatus until every replica reports True.
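If you would rather not sit there re-running the status cmdlet by hand, the invoke-and-wait part can be sketched as a polling loop. The 30-second interval and the 15-minute cutoff are arbitrary assumptions on my part; tune them to your environment.

```powershell
# Kick off replication to all replicas.
Invoke-CsManagementStoreReplication

# Poll until every replica is up to date, or give up after 15 minutes (arbitrary).
$deadline = (Get-Date).AddMinutes(15)
do {
    Start-Sleep -Seconds 30
    $stale = @(Get-CsManagementStoreReplicationStatus | Where-Object { -not $_.UpToDate })
    Write-Host "$($stale.Count) replica(s) still not up to date"
} while ($stale.Count -gt 0 -and (Get-Date) -lt $deadline)

if ($stale.Count -gt 0) {
    Write-Warning "Replication did not finish in time. Do not continue to step 4."
}
```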
Step 4 - Clear out the entry of user object from the Front End server local database
We used SQL Management Studio for this. If all you have is a Standard Edition (SE) server or two, then you will have to either be a SQL wizard or go install SQL Management Studio somewhere. I have an Enterprise Edition (EE) pool, so I have a SQL back end with Management Studio. If you have an EE pool, then you need to do this on all pool FE servers BEFORE moving to step 5.
Open SQL Management Studio. Once you have that open, connect to each FE server RTCLocal database instance in question. In this case, we have two EE pool servers, so we connected to both, each to the RTCLocal instance.
Now, open up the individual instance, select the ‘rtc’ database…
Now, select “New Query” from the tool bar…
…and you should have something that looks like this – importantly, the database little indicator window thingy should have lit up…notice you have a new tool bar that sprang into being when the mighty admin-mage invoked the Query…
From here, in the SQLQuery pane, we want to enter the following – and this is CASE SENSITIVE…
execute dbo.RtcDeleteResource 'username@domain.com'
Here is my example. Note that the user name is SPECIFIC, it has SINGLE quotes, and it is the SIP ADDRESS, not the sAMAccountName, UPN, or NetBIOS format. SIP ADDRESS! Oh, did I mention the stored procedure is CASE SENSITIVE?
When you get that EXACTLY right, click on the “Execute” button as indicated. Should you have done all of this correctly, you will see this:
Now, what does it say on the shampoo bottle? Lather, rinse, and repeat as necessary. If you have 10 users that are borked, then do all 10. And make sure the process is run against all applicable pool servers. If you have 4 EE servers, then you must do this for each user on each server.
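When there are many users and several Front End servers, the lather-rinse-repeat part can be sketched as a nested loop. This is NOT how we did it (we used Management Studio by hand); it is a sketch that assumes the Invoke-Sqlcmd cmdlet from the SQL Server PowerShell module is installed, that the local instance is reachable as server\RTCLOCAL, and that the server and user names shown are hypothetical. By default it only prints the statements so you can review them.

```powershell
# Hypothetical names: list EVERY Front End server in the pool and every affected SIP address.
$frontEnds    = "fe01.contoso.com", "fe02.contoso.com"
$sipAddresses = "jdoe@contoso.com", "asmith@contoso.com"

# Safety switch: leave $false to print only; set to $true after reviewing the output.
$execute = $false

foreach ($fe in $frontEnds) {
    foreach ($sip in $sipAddresses) {
        # Double any single quote so the address stays a valid T-SQL string literal.
        $query = "EXECUTE dbo.RtcDeleteResource '{0}'" -f $sip.Replace("'", "''")
        Write-Host "$fe\RTCLOCAL [rtc]: $query"

        if ($execute) {
            # Assumption: Invoke-Sqlcmd is available and your account has rights on the instance.
            Invoke-Sqlcmd -ServerInstance "$fe\RTCLOCAL" -Database "rtc" -Query $query
        }
    }
}
```

Printing first is deliberate: this step is the point of no return, so it is worth eyeballing each SIP address before anything runs.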
Step 5 - Enable user
You don’t need me to illustrate this, do you? Use either PowerShell or the Control Panel to enable the user that you just got done (hopefully) nuking from the system.
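For completeness, the PowerShell version can be sketched like this. The account name, pool FQDN and SIP address are hypothetical, and the Get-CsUser check at the end is my own addition; it may take a moment after enabling before the user shows up.

```powershell
# Hypothetical values; substitute the real account, pool and SIP address.
Enable-CsUser -Identity "CONTOSO\jdoe" `
              -RegistrarPool "lyncpool01.contoso.com" `
              -SipAddress "sip:jdoe@contoso.com"

# Confirm the user is back and homed on the expected pool.
Get-CsUser -Identity "sip:jdoe@contoso.com" |
    Select-Object SipAddress, RegistrarPool, Enabled
```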
Step 6 - Import the user data from the backup file
Import-CsUserData -PoolFqdn poolfqdn -UserFilter "username@domain.com" -FileName "sourcefile"
Like so:
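A filled-in sketch of the import, again with hypothetical values for the pool, user and file path. The Test-Path guard is my own addition, just to fail early if the backup file is not where you think it is.

```powershell
# Hypothetical values; use the same pool, user and file as in step 1.
$pool   = "lyncpool01.contoso.com"
$user   = "jdoe@contoso.com"
$backup = "C:\LyncBackup\jdoe.zip"

# Fail early if the export from step 1 is missing.
if (-not (Test-Path $backup)) {
    throw "Backup file $backup not found."
}

Import-CsUserData -PoolFqdn $pool -UserFilter $user -FileName $backup
```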
Step 7 - Restart the FE service
I use a cmd line window and run “net stop rtcsrv && net start rtcsrv”, but you can open services.msc and restart from there. The restart takes about 3 minutes or less, and any user on the server when you do this will get dropped with no warning.
On the nice side, when the server comes back, they should sign back in automatically. If you did the rest of your work correctly (read DNSLB), they might have already signed in to another pool member. Just to get dropped again when you restart the RtcSrv on the other pool members. Maybe you should do this after hours.
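If you prefer to stay in the Lync shell, the restart can also be sketched with the Lync service cmdlets, one Front End at a time. The server names are hypothetical, and doing them sequentially is my own choice so the whole pool is never down at once.

```powershell
# Hypothetical names; list every Front End server in the pool.
$frontEnds = "fe01.contoso.com", "fe02.contoso.com"

foreach ($fe in $frontEnds) {
    # Restart the Front End service (RTCSRV) on this server; connected users get dropped.
    Stop-CsWindowsService  -Name "RTCSRV" -ComputerName $fe
    Start-CsWindowsService -Name "RTCSRV" -ComputerName $fe

    # Confirm the service came back before moving to the next server.
    Get-CsWindowsService -Name "RTCSRV" -ComputerName $fe
}
```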
Summary
We stipulated some users who could not log in, even when we KNEW the domain credentials were good. With all hope for world peace exhausted, we proceeded through a seven-step process that nukes the user from the system, resets the system, and in theory gives you a new user with their old contacts (but not their conference information).
YMMV
2 comments:
Hi.
I have this exact issue and do see what you mention: the user SIP address is still present on one of my FE servers in the RTC DB after deleting the Lync user object.
I haven't tried your fix yet, because I had a further look and am now wondering even more.
The DB entries aren't identical between my 2 FE servers, and one DB has more entries than the other.
I am now wondering whether this is normal or whether it means I have an even bigger issue.
Do you know?
If your user cannot login anyway, you have nothing to lose. And you will most likely be needing to clean out the stale user info anyway...I don't see where you have anything to lose. You can always export their data first.