About Me

TsooRad is John Weber's blog. I am a Lync Server MVP (2010-2014). My day job title is "Principal Consulting Engineer," and I work with an awesome group of people at CDW, LLC. I have been at this gig in one fashion or another since 1988, starting with desktops (remember Z-248s?), and I am now based in Portland, Oregon. I focus on collaboration and infrastructure: Exchange of all flavors, LCS/OCS/Lync, Windows, business process, and learning new stuff. I have a variety of interests, some of which may rear their ugly heads in this forum. My certifications date back to Novell CNE and work up through the Microsoft MCP stack to MCITP (multiple times). FWIW, I am on my third career: ex-USMC, retired US Army. I have a fancy MBA. One of these days, I intend to start teaching. The opinions expressed on this blog are mine and mine alone.

2013/04/22

Lync 2013 EE pool won’t start

Sub-title: We get to explore the Reset-CSPoolRegistrarState cmdlet

Life is tough. Apparently the drunk who took out the power station the other day, and caused a major power outage in my zip code, is going to find that out.

What I found out was that when my lab restarted, the Lync 2013 pool FE servers would not start any service except the Replication Service (Lync Server Replica Replicator Agent; who names this stuff anyway?). Wowzer. This is not good. I know that running only two nodes is living on the edge, but it is only a techie lab with no real users! I have never had a failure this bad before. I have been good and followed the restart instructions for a two-node EE pool. You have read those, right? If not, here you go. Look down at the bottom of the page; you will catch on quickly.

Issues like this are why I always recommend building Lync 2013 EE pools with at least three members. Can't have that quorum deficit biting you! At the very least, I rant and rave at some length about planning for failure, and I make sure the administrators of the impending-doom system understand the risks and the mitigation process.

In our little story, the command

Reset-CsPoolRegistrarState -ResetType QuorumLossRecovery -PoolFqdn ls2013pool.tsoorad.net

(run on each pool FE member) did not bring the pool back to the desired state. At about this time, I am really starting to exercise my military jargon skills.

When the power did come back on, there was no way to follow the recommended practice of starting the servers in the reverse of the order in which they went down: they both went down at the same time! The event logs had enough red errors in them for me to open a paint store.

What to do, what to do? Well, our intrepid hero simply dusted off the command syntax for Reset-CsPoolRegistrarState. Take a gander at this explanation of the reset types:

* ServiceReset: The RtcSrv and fabricHostSvc services are stopped and restarted. A service reset is performed if the ResetType is not specified.

* QuorumLossRecovery: Reloads user data from the backup store for any routing groups currently in quorum loss. (A quorum loss occurs when neither a database nor its replicas are available.) Data not yet written to the database could be lost when you do this type of reset.

* FullReset: Performs the same type of reset as QuorumLossRecovery but, in addition, rebuilds the local Lync Server databases. This type of reset can be long and resource-intensive.

* MachineStateRemoved: Removes the specified server from the pool. This type of reset should be used only when the server in question (or its databases) has been permanently lost.

A short period of pondering ruled out MachineStateRemoved as a valid fix (neither server had been permanently lost), and QuorumLossRecovery had already failed, so I ran the following. (Run the Lync Server Management Shell as Administrator, or you will find out why I always run PowerShell as Administrator.)

Reset-CsPoolRegistrarState -ResetType FullReset -PoolFqdn ls2013pool.tsoorad.net

I then ran "shutdown /r /t 0" to restart. One at a time, mind you, I rebooted both of my EE nodes after running the Reset-CsPoolRegistrarState cmdlet. Success! The sun is in my sky once again.
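If you would rather script the FullReset-and-reboot step, here is a minimal sketch for the Lync Server 2013 Management Shell, run as Administrator. It assumes my lab pool FQDN and that you run it on one Front End at a time, waiting for each node to come back before moving to the next. Restart-Computer -Force stands in for the "shutdown /r /t 0" I used. Treat it as an illustration, not an official Microsoft recovery procedure.

```powershell
# Sketch only: run in an elevated Lync Server 2013 Management Shell
# on ONE Front End server at a time. The pool FQDN below is my lab pool; change it.
$poolFqdn = "ls2013pool.tsoorad.net"

# FullReset = QuorumLossRecovery plus a rebuild of the local Lync Server databases.
# Data not yet written to the database could be lost.
Reset-CsPoolRegistrarState -ResetType FullReset -PoolFqdn $poolFqdn

# Same effect as "shutdown /r /t 0": restart this node immediately.
Restart-Computer -Force
```

Once the node is back up, Get-CsWindowsService | Where-Object { $_.Status -ne "Running" } is a quick way to list any Lync services that still have not started before you move on to the next node.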

Now, this is a little drastic, but what the heck? To quote Alfred E. Neuman, "What, me worry?" In my case, the lab pool was not starting anyway. In a real-life situation, the data lost would be whatever had not been committed to the database prior to the failure. If you run into this and the pool won't start anyway, there is probably a bigger issue than losing a few contacts or a few chats.

YMMV.
