TsooRaD: 2016-02

2016/02/17

1008;reason=Unable to resolve DNS SRV record

ms-diagnostics: 1008;reason="Unable to resolve DNS SRV record";domain="domain.com";dns-srv-result="NegativeResult";dns-source="InternalCache";source="SfBSIP.domain.com"
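If you are staring at a pile of these in a trace, it can help to split the header into named fields. The following is a minimal sketch, not anything from the SfB tooling: `ConvertFrom-MsDiagnostics` is a hypothetical helper name, and it assumes no field value contains a semicolon.

```powershell
# Hypothetical helper: split an ms-diagnostics header value into its fields.
# Assumes values never contain ';' (true for the header shown above).
function ConvertFrom-MsDiagnostics {
    param([Parameter(Mandatory = $true)][string]$Header)

    # Drop the "ms-diagnostics:" prefix if the whole header line was pasted in.
    $value = $Header -replace '^\s*ms-diagnostics:\s*', ''

    # The first token is the error ID; the rest are key="value" pairs.
    $parts  = $value -split ';'
    $result = [ordered]@{ ErrorId = [int]$parts[0].Trim() }

    foreach ($pair in ($parts | Select-Object -Skip 1)) {
        if ($pair -match '^\s*([^=]+)="?([^"]*)"?\s*$') {
            $result[$Matches[1].Trim()] = $Matches[2]
        }
    }
    [pscustomobject]$result
}

$line = 'ms-diagnostics: 1008;reason="Unable to resolve DNS SRV record";domain="domain.com";dns-srv-result="NegativeResult";dns-source="InternalCache";source="SfBSIP.domain.com"'
$diag = ConvertFrom-MsDiagnostics -Header $line
$diag | Format-List
```

Fields with a hyphen in the name need quoting when you read them back, for example `$diag.'dns-srv-result'`.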

Scenario Outline

SfB on-premises patched to November 2015. Split-DNS. Firewalls, networks, and even VLANs are all highly segregated. Classic DMZ in operation with outside firewall, inside firewall, no internet browser access from DMZ servers. Port 53 outbound from DMZ servers is not allowed.

The edge servers are using internal DNS resolution (hello InfoSec!). Everything is testing perfectly. IM/P, WebCon, media flow; the mobile clients are working, and PPT publishing internally and externally is perfect. After working through the expected HLB and firewall issues, we are looking right successful. First time through. Nailed it. But wait!

The organization moves from closed federation to open federation. About a week later we notice that federation is suddenly borked – and one-way presence rears its ugly head. It would appear that federated partner -> internal org can start things, but the opposite does not work so well. However, everything except presence works AFTER the inside person responds to the outside -> inside toast. Screen sharing fails also – unless the outside person starts the screen share, after which the inside person can share. This is a hint for you troubleshooting mavens – we’ll wait while you digest all this information.

TShoot

We traced the above client side errors and see the following:

Subscribe attempt…

…and the resulting 504.

We traced the same errors from the server-side (thank you centralized logging) and see the same set of outcomes. Here is a simple subscribe request from the inside to a federated partner…

…and you can see the 504 – I cannot find out who I am because I cannot resolve my federation SRV record. This is not good.

A side symptom was that we were seeing similar 504 errors on Test-CsFederatedPartner and Test-CsMcxPushNotification.

Hmmm. Does this look like the Edge server cannot find itself? Like there is no _sipfederationtls._tcp.domain.com record? Consider the lock-down environment, and the requirement that all DNS come from the inside…and the inside is going to be authoritative for the zone. Hmm. Lync 2013 documentation (essentially the same for SfB) indicates the SRV record for _sipfederationtls._tcp.domain.com needs to be on the external DNS server. So, go double check that. Yes. We got that part right.
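A quick way to ask the same question the Edge server is asking is to query for the SRV record from the Edge itself. This is only a sketch: it assumes the Edge runs Windows Server 2012 or later (where `Resolve-DnsName` is available), and `domain.com` is the placeholder domain used throughout this post.

```powershell
# Run on each Edge server. The record name comes from the post;
# swap in your own SIP domain.
$srvName = '_sipfederationtls._tcp.domain.com'

try {
    # -DnsOnly keeps the lookup to DNS (no LLMNR/NetBIOS fallback).
    Resolve-DnsName -Name $srvName -Type SRV -DnsOnly -ErrorAction Stop |
        Where-Object { $_.Type -eq 'SRV' } |
        Select-Object Name, NameTarget, Port, Priority, Weight
}
catch {
    # A failure here matches the "Unable to resolve DNS SRV record" symptom.
    Write-Warning "No SRV answer for $srvName from this server's DNS: $($_.Exception.Message)"
}
```

In a split-DNS setup where the Edge resolves against the inside zone, a warning here while the external zone answers correctly points at the gap described above.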

The Fix

Simple. We put the _sipfederationtls._tcp.domain.com SRV record into place on the internal DNS, with the proper target, and gave it a TTL of 5 minutes. Then we modified the hosts file on each Edge server so that the SRV target resolves to that server's public IP. Almost immediate relief. It was like watching Bones cure the planetwide plague with a simple shot of his hyper-injector, and you get to watch the horrible disease be cured before the next commercial break.
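For those who prefer typing to clicking, the fix can be sketched roughly as follows. Everything specific here is an assumption for illustration: the internal zone is taken to live on a Windows DNS server with the DnsServer module, `sip.domain.com` stands in for the Access Edge FQDN, and 203.0.113.10 is a documentation address standing in for its public IP. Port 5061 is the usual federation port.

```powershell
# Assumptions: zone 'domain.com' is hosted on the internal Windows DNS server,
# 'sip.domain.com' is the Access Edge FQDN, and 203.0.113.10 stands in for
# its public IP. None of these values come from the original post.

# On the internal DNS server: federation SRV record with a 5-minute TTL.
Add-DnsServerResourceRecord -Srv -ZoneName 'domain.com' `
    -Name '_sipfederationtls._tcp' -DomainName 'sip.domain.com' `
    -Priority 0 -Weight 0 -Port 5061 `
    -TimeToLive ([TimeSpan]::FromMinutes(5))

# On each Edge server (elevated prompt): map the SRV target to the public IP.
$hostsFile = Join-Path $env:SystemRoot 'System32\drivers\etc\hosts'
Add-Content -Path $hostsFile -Value "203.0.113.10`tsip.domain.com"
```

The short TTL keeps a typo from haunting you for long while you test.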

But WHY?

Why did the transition from closed federation to open federation cause this? And why did “this” take 7 days to manifest itself in failures? Why didn’t the issue show up immediately?

Summary

I can guess at the first; as to the second and third, I am clueless. I am not willing to guess in a public forum, so you will have to draw your own conclusions. But I do know what fixed this issue – the federation SRV record being added to the internal DNS zone and modification of the Edge Server hosts files so that they can find the SRV target by IP.

YMMV

2016/02/09

SfB Patch/Upgrade Outline

Skype for Business (SfB) Server 2015 embodies several server-side enhancements beyond Lync Server 2013. The patching cycle for the SfB environment will need to be modified to allow for these enhancements. SfB contains five layers of servers, each of which will need to have separate handling:

Front End Pool Servers
Persistent Chat
Edge Servers
OWAS (Office Web Apps Servers) Servers
SQL Server and File Shares.

Host Server updates also need consideration – primarily because rebooting SfB servers can cause Windows Fabric errors that can affect the ability of the SfB server to recover into a running state.

Host Servers need to be patched to corporate standards; however, the application host servers cannot just be rebooted at will. Rebooting servers that host SfB services will result in service outages, and potentially in service failures where the servers do not recover services after rebooting.

Accordingly, phase one in the entire setup for patching SfB and related servers is to set Windows Update to download updates but require an administrator to install them. For ORGNAME this may require moving servers away from containers to which GPO applies and controls WUPDATE settings.

This guidance will not apply to the SfB Edge servers as they are not domain members. However, the SfB Edge servers should be checked to ensure that the WUPDATE is set as shown.
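If you would rather check the setting from a prompt than from the GUI, something like the sketch below reads it back. It assumes the classic Windows Update registry locations used on Windows Server 2012 R2 and earlier; the first path only exists when a policy is applied, the second holds the locally chosen setting. It only reads, it changes nothing.

```powershell
# Read-only check of the Automatic Updates setting.
# AUOptions: 2 = notify before download, 3 = download and notify to install,
#            4 = download and install on a schedule.
$paths = @(
    'HKLM:\SOFTWARE\Policies\Microsoft\Windows\WindowsUpdate\AU',                  # set by GPO
    'HKLM:\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate\Auto Update'    # local setting
)

foreach ($path in $paths) {
    $au = Get-ItemProperty -Path $path -ErrorAction SilentlyContinue
    if ($null -ne $au -and $null -ne $au.AUOptions) {
        [pscustomobject]@{
            Source              = $path
            AUOptions           = $au.AUOptions
            DownloadButAskFirst = ($au.AUOptions -eq 3)
        }
    }
}
```

If both locations report a value, the policy location is the one that wins.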

Locate and download the latest SfB server updater from this site: https://technet.microsoft.com/en-us/office/dn788954 - as of this writing, November 2015 is the latest SfB 2015 update. The consolidated server update installer is preferred over the individual updates.

Note that the file name shown is mostly correct, but that I rename them to help me keep track of what is what.

Place the update file in a separate folder on each front end, persistent chat, and edge server. The update process generates log files which are kept in the origination location. Having a separate folder for each updater constrains the log file location and makes the entire thing easy to delete or verify later.

SfB Front End Servers

Reference: https://support.microsoft.com/en-us/kb/3061064

1. Find the section labeled: “Upgrade or update the Enterprise Edition pool that has at least three front-end servers” and READ IT.
2. Read the following TechNet guidance: https://technet.microsoft.com/en-us/library/jj204736.aspx
3. Then execute those instructions ONE SERVER AT A TIME.
4. After each server reboots, wait until ALL indicated services are running before moving to the next server in the pool. Keep in mind that these services are on delayed startup, and there could be a significant (10-15 minutes) delay before the SfB Front-End service starts.
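As a memory jogger only, the per-server rhythm looks roughly like the sketch below. The KB article above remains the authority on the exact steps; the pool and server names are placeholders, not values from this environment.

```powershell
# Placeholders, not values from the post.
$pool   = 'sfbpool.corp.domain.com'
$server = 'sfbfe01.corp.domain.com'

# 1. Confirm the pool can afford to lose a server right now.
Get-CsPoolUpgradeReadinessState -PoolFqdn $pool | Format-List

# 2. Drain and stop the SfB services on this one server.
Invoke-CsComputerFailOver -ComputerName $server

# 3. Run the update installer on that server and reboot if asked.

# 4. Bring the server back into the pool.
Invoke-CsComputerFailBack -ComputerName $server

# 5. List anything that is not running yet (expect a delay after a reboot).
Get-CsWindowsService -ComputerName $server |
    Where-Object { $_.Status -ne 'Running' }
```

Only when step 5 comes back empty do you move on to the next server.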

SfB Persistent Chat Server

SfB Persistent Chat requires only that the services be running after the persistent chat server is patched and rebooted.

SfB Edge Servers

Edge servers are easy. Execute the serverupdateinstaller.exe on one Edge server at a time. Reboot if requested. If a reboot is needed, monitor the reboot process until the SfB services are restarted (about 10 minutes). Otherwise, verify the following services are running.

Do the next edge server.
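Rather than refreshing the services console, you can let a small loop do the watching. This is a sketch that assumes the SfB Management Shell is available on the Edge server; the 15-minute limit and 30-second interval are arbitrary choices.

```powershell
# Poll the local SfB services until all report Running, or give up.
$deadline = (Get-Date).AddMinutes(15)

do {
    $pending = @(Get-CsWindowsService | Where-Object { $_.Status -ne 'Running' })
    if ($pending.Count -eq 0) { break }

    $names = ($pending | ForEach-Object { $_.Name }) -join ', '
    Write-Host "Still waiting on: $names"
    Start-Sleep -Seconds 30
} while ((Get-Date) -lt $deadline)

if ($pending.Count -gt 0) {
    Write-Warning 'Some SfB services are still not running. Do not start the next server yet.'
}
else {
    Write-Host 'All SfB services are running.'
}
```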

Office Web Apps Server (OWAS)

The OWAS requires different handling from the other servers. See the following articles:

Assuming the two OWAS servers are hlbwowasp101 and hlbwowasp102, the following commands will recreate the OWAS farm when the time comes:

1. From server hlbwowasp101.corp.domain.com, open PowerShell as administrator, and execute the following (one line):

New-OfficeWebAppsFarm -InternalUrl https://hlbsfbowas.domain.com -ExternalUrl https://hlbsfbowas.domain.com -CertificateName sfbwebext

2. From server hlbwowasp102.corp.domain.com, open PowerShell as administrator, and execute the following command AFTER the previous command on the other server:

New-OfficeWebAppsMachine -MachineToJoin hlbwowasp101.corp.domain.com

After patching, reboot, and recreation of the WebAppsFarm, verify the following service is running on each server:
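A hedged sketch of that verification from the prompt follows. It assumes the service in question is the Office Web Apps service, whose service name on Office Web Apps Server 2013 is WACSM, and it asks the farm for its own view of the member machines.

```powershell
# Run elevated on each OWAS server.

# The Office Web Apps service itself (service name assumed to be WACSM).
Get-Service -Name WACSM | Select-Object Name, DisplayName, Status

# The farm's view of its members and their health.
(Get-OfficeWebAppsFarm).Machines |
    Select-Object MachineName, HealthStatus
```

Both servers should show up in the machine list once the second one has joined.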

SfB File Shares

ORGNAME runs the SfB file share (\\corp.domain.com\sfb-fileshare) on the OWAS servers. Handle the DFS with care: the entire environment relies on sfb-fileshare for various functions, so downtime on the OWAS servers affects all other servers. Apart from following the update process shown above, update the OWAS servers only one at a time.

SfB SQL

ORGNAME is using a single SQL server. This server should be patched along with the other SfB infrastructure with the following caveat: the SQL server needs to be back online within 30 minutes or there will be impact to the users. The impact will be the clients entering “resiliency mode” due to the SQL server not being available to the front end servers. For more information, see this: https://technet.microsoft.com/en-us/library/jj205184.aspx.

If you have mirrored SQL or perhaps Availability Groups in SQL, then you will need to investigate the SQL patching process from a slightly different aspect – namely, keeping the active node where you want it.

Summary

SfB has changed the patching process from how it was done in Lync 2010 and Lync 2013. Each layer of the system needs something that is just a little different from the other layers of the system.

YMMV

2016/02/03

Netgear ProSAFE and Skype for Business

What are we talking about today?

We all know how strapped the small business can be when it comes to resources. Staff, cash, cash flow, technical skills, and time. Another item is IP space.

When it comes to the trifecta of IP space, cash, and technical skills, and you then throw in the desire to have Skype for Business (SfB) deliver all communication modalities, things can get a bit dicey. Between the need to communicate and the need to conserve cash, what usually gets hammered is the technical skills. So a pricey consultant is called in, who promptly tells you that you need to spend huge wads of cash on a firewall that will accept and deal with larger IP address space. This is counter-productive you might think, and you would be right.

But, Netgear has some answers, and combined with some judicious telecom provider shopping, you may be able to find the answers in the form of a /29 CIDR block.

I won’t try to answer all the questions here, but I do want to illustrate the configuration of the Netgear ProSAFE to support SfB. My issue is that the stock documentation for the Netgear ProSAFE is a bit shy on details. Past the statements that it supports multiple IPs there is not much in the way of “how to do it” and if you think your ITSP is going to help, think again.

Tsoorad to the Rescue

As you might have guessed, Tsoorad.net falls into this category. First off, I am a cheap ba$tid. Second, I have particular needs what with the technical lab and all. Finally, I face the ca$h situation just like everyone else, and when combined with “first off” I decided against purchasing the more expen$ive “indu$try $tandard $olution$.” Still with a bit of reading and learning and poking your nose where the vendors don’t want you to poke it, you can make your system play well with others. I will NOT cover the SIP trunk provider or the reverse proxy stuff here; that you can get elsewhere in other articles.

IP Space

SfB, if you want the full suite of operations and features, is going to require a minimum of two (2) public IP addresses. No way around it. One (1) for the external side of the SfB edge server, and one (1) for the reverse proxy. You can do away with the reverse proxy address, but you will give up inviting outsiders to your web conferences and forget about your mobile clients working. As a side note, you may be able to piggy-back your reverse proxy needs onto an existing IP with 443 and content redirects, but that is usually well past internal SMB technical chops and will have you calling in a provider or con$ultant.

If you want to investigate the minimum IP requirement solution, see this section of the SfB documentation. You can also work up this requirement using the SfB Planning tool. For those tech heads out there, you should be reading this first. Adelante!

The biggest problem with the minimum IP space solution is using port 444 for web conferencing. In my experience, most larger environments will not allow port 444 outbound willy-nilly. Which will mean that crucial web conference that you are hosting on your minimum IP space solution will most likely not have any large corporate attendees.

Enter the /29

Offering 5 usable addresses, the /29 CIDR block is the smallest block that is usually provisioned by providers – other than giving out singles. I use a /29. One IP for general use, one IP for my SIP trunk with Intelepeer (they are great), and three for SfB. For my lab use, I have Office 365 in hybrid, and I also bring port 25 into one of my SfB-assigned addresses and use the firewall to pick off the SMTP traffic and send it to the proper internal server. Works great, less filling. And yes, the Netgear ProSAFE can be configured to support all of this (see how I worked around to the beginning thesis?)
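For anyone wondering where five comes from: a /29 holds eight addresses, two of which are the network and broadcast addresses, and the arithmetic below assumes (as is typical, but check with your provider) that one more goes to the provider's gateway. `Get-UsableHostCount` is a hypothetical helper name.

```powershell
# Hypothetical helper: host addresses in an IPv4 block of the given prefix length.
function Get-UsableHostCount {
    param([Parameter(Mandatory = $true)][ValidateRange(1, 30)][int]$PrefixLength)

    # Total addresses in the block, minus the network and broadcast addresses.
    $total = [math]::Pow(2, 32 - $PrefixLength)
    [int]($total - 2)
}

$hostAddresses = Get-UsableHostCount -PrefixLength 29   # 8 - 2 = 6
$forCustomer   = $hostAddresses - 1                     # minus the provider gateway (assumption) = 5

"Host addresses in a /29: $hostAddresses; left for you: $forCustomer"
```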

ProSAFE Configuration

Ignoring the unboxing and basic connection steps, the first thing you need to do is configure the firewall for your public IP space.

Next, you may wish to change the LAN settings to match the internal subnet to which you are connecting.

Now onto the real fun

SfB has a variety of port requirements. You can refresh your memory here. Assume the following:

IP #1 = SIP
IP #2 = WebCon
IP #3 = AV
IP #4 = reverse proxy

So, IP #1 needs TCP 443, 5061, and 5269. Depending on your DNS source, opening TCP/UDP 53 may be needed – this is for the CRL check. IP #2 needs TCP 443. IP #3 needs TCP 443, UDP 3478, and optionally TCP/UDP 50000-59999. IP #4 needs TCP 443 and 80. Simple, right? If you are confused about the 50k range, see this. Nothing has changed here since Lync 2013.
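The same list can be kept as data and used for a quick inbound TCP check from an outside machine. This is a sketch under assumptions: the 203.0.113.x addresses are documentation placeholders for your four public IPs, `Test-EdgeTcpPorts` is a hypothetical helper, and `Test-NetConnection` (Windows 8.1 / Server 2012 R2 or later) only tests TCP, so UDP 3478 and the optional 50k range are not covered.

```powershell
# Placeholder public IPs; the TCP ports are the ones listed above.
$rules = @(
    [pscustomobject]@{ Role = 'SIP';           Ip = '203.0.113.1'; Tcp = @(443, 5061, 5269) }
    [pscustomobject]@{ Role = 'WebCon';        Ip = '203.0.113.2'; Tcp = @(443) }
    [pscustomobject]@{ Role = 'AV';            Ip = '203.0.113.3'; Tcp = @(443) }
    [pscustomobject]@{ Role = 'Reverse proxy'; Ip = '203.0.113.4'; Tcp = @(443, 80) }
)

# Hypothetical helper: try each TCP port and report whether it answered.
function Test-EdgeTcpPorts {
    param([Parameter(Mandatory = $true)][object[]]$Rules)

    foreach ($rule in $Rules) {
        foreach ($port in $rule.Tcp) {
            $probe = Test-NetConnection -ComputerName $rule.Ip -Port $port -WarningAction SilentlyContinue
            [pscustomobject]@{
                Role = $rule.Role
                Ip   = $rule.Ip
                Port = $port
                Open = $probe.TcpTestSucceeded
            }
        }
    }
}

# Run from OUTSIDE the firewall once the rules are in place:
# Test-EdgeTcpPorts -Rules $rules | Format-Table -AutoSize
```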

The Netgear issue

Netgear, like any firewall, blocks by default. So we need to poke some holes. Netgear, like almost everyone else, also provides some predefined services – and you can use them; like HTTPS or SMTP. No sense in re-inventing wheels, eh? The other issue with the ProSAFE is that ALL traffic, by default, goes OUT on whatever IP you assigned as the device primary in that first screen shot. The solution here is to create outbound bindings so SfB traffic goes out on the IP that is expected.

Create Services

Because we are going to use the pre-defined HTTPS service, here is what is needed. I feel that the construction of these services is very self-explanatory.

Inbound/Outbound Services

So here is where the rubber meets the road, or, to not mix metaphors and to maintain our focus, where the electrons meet the transistors. When creating your new outbound rule, what you need to know in advance is which NAT mapping matches the service to the IP. Enter an IP that is valid for your CIDR block and it will work. Typos count against you.

For the inbound service you need to know the exact same thing, and guess what, it looks pretty much the same. A few things of note here. First, you need the NAT information in advance. Also, see the “WAN Destination IP Address:” thing? This is where the Netgear documentation falls flat. HOW to do multiple IPs is simply not mentioned. But here you go. Enter an IP that is valid for your CIDR block and it will work.

OK. Now that we have that sorted, here is the full outbound rule set…

…and the full inbound rule set.

But wait! The sharp-eyed critic will notice that the rules appear to be doubled-up for the HTTPS and TCP/UDP for WebCon and AV. Yes, they are. Inbound rules are dependent on External DNS resolution, and the outbound rules are dependent on the sending entity having an IP that matches the rule. I keep my lab ready to demonstrate for the unbelievers that doing a single IP SfB (or Lync) edge will result in THEIR environment not being able to connect to a web conference hosted on a single IP Edge server (because of the port 444 thing). Which usually brings them out of the dark into the light which is the greater good. I can live with some topology builder work and some server re-ip – it only takes a few minutes, and saves much teeth-gnashing later. And the firewall don’t care if it has a few rules that aren’t in active use.

Furthermore, the really critical reader will complain that I have said squatoosh about the reverse proxy in all of this. Here’s why: Inbound rule #4 lands on my HLB, which is, by definition, a reverse proxy. I bring ALL miscellaneous web traffic for my domains to this one address, land it on my HLB layer, and use content redirect rules to parcel things out to the appropriate target service/server. Works very nice, is totally less filling, and makes me wonder why anyone does anything else.

Let’s Wrap This Up

Netgear ProSAFE can be a great solution for the smaller SMB. Skype for Business can also be a great solution for the smaller SMB. But, to get all that both offer, you need to dig a little deeper than just throwing things at the wall to see what sticks. Single IP Edge works, but it may not be the best choice. Finally, we demonstrated how to make a ProSAFE work with a CIDR block, and also showed the rules needed to enable all of SfB through a ProSAFE device.

YMMV

2016/02/02

Skype Office 365 File Transfer

We all know and love file transfer within the Lync 2010, Lync 2013, and SfB client. OK, while you may not think it’s so hot, *I* love it. Therefore, YOU all love it too. Simple logic. No?

The Ask

At any rate, I was recently asked to kill two features in an Office 365 Skype for Business deployment. To wit, the conversation history and, surprisingly (for me), the file transfer. As you can guess, some legal issues come into play here, and compliance dictates that these features be off.

OK, the first was pretty easy. If you do some google-fu (bing-fu works also) on “Disable office 365 Auto Archive Skype” you will surely find this article. You should have no problems doing this little magic trick. Even my shadow did it very well.

But, let’s talk about disabling the file transfer. As we all know, file transfer is part of the conferencing policy granted per user. OK, but I don’t want to apply a conferencing policy to each user as we migrate them up into Office 365, or at least I don’t want to do it one at a time via the GUI. Maybe there is another way? It turns out that maybe there is – or not.

My first effort was doing this: In the Skype for Business Office Portal, under the users, and having a user selected, uncheck the indicated item. See this for reference. You can see why I tried this.

Guess what, it don’t work. Don’t know why, but 24 hours later, the users we tried it on were happily doing file transfers and thumbing their noses at the corporate legal hacks. If you read my reference up there, you will also note that this is one user at a time, AND wants a hold via Exchange to be placed on the user. Another solution was needed.

The Solution

To identify which conferencing policy to use, open a remote PowerShell session to your Office 365 tenant and run Get-CsConferencingPolicy. You get the following list from hell – none of which you can change. I changed my font in PowerShell to get as much into the graphic as possible, and I still don’t have them all there. You get the idea.

There must be a better way.

Enter, stage right, the Knower Of All Things, Bob Wille, who pointed out to run

Get-CsConferencingPolicy -ApplicableTo <user identity>

which works out to this for my North American user object:

Way better. Again, font reduced to fit things onto screen, and at least this time they all fit. And, you are welcome, I have highlighted the one to use.
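If the list is still too long to eyeball, you can trim it to the columns that matter for this ask. A small sketch, assuming a remote PowerShell session to the tenant is already open; the user is the one used in this post.

```powershell
# Policies that can be granted to this user, showing only the
# file-transfer-related settings.
$user = 'martin.luther@tsoorad.net'

Get-CsConferencingPolicy -ApplicableTo $user |
    Select-Object Identity, EnableFileTransfer, EnableP2PFileTransfer |
    Sort-Object EnableFileTransfer, EnableP2PFileTransfer |
    Format-Table -AutoSize
```

Sorting on the Boolean columns floats the policies with file transfer turned off to the top of the table.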

Now, a simple

Grant-CsConferencingPolicy -PolicyName Tag:BposSAllModalityNoFT -Identity martin.luther@tsoorad.net

and we are done. For your information, the policy name is typed with no space between the ‘:’ and the “B”.

You can also imagine that doing a Get-CsOnlineUser and piping that into Grant-CsConferencingPolicy would quickly apply this to all of your users.
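Spelled out, that bulk approach might look like the sketch below. It assumes the remote session to the tenant is already open and that you really do want every user returned by Get-CsOnlineUser to get the policy; filter the list first if not.

```powershell
$policy = 'Tag:BposSAllModalityNoFT'

# Grant the no-file-transfer policy to every online user, one at a time.
Get-CsOnlineUser | ForEach-Object {
    Grant-CsConferencingPolicy -PolicyName $policy -Identity $_.UserPrincipalName
}

# Spot-check the result afterwards.
Get-CsOnlineUser |
    Select-Object UserPrincipalName, ConferencingPolicy |
    Format-Table -AutoSize
```

Policy changes in the tenant are not instant, so give it some time before you declare victory or defeat.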

YMMV.

Inter-pool Dial in Conference Transfer (SfB)

The Scenario

Doing an upgrade of the entire environment, Lync 2010 to Skype for Business Server 2015 – on-premises. The existing dial-in conferencing was ITSP to Avaya 6.3 SM to Lync 2010 Mediation server. In this case the Mediation servers were a separate pool from the FE pool.

The Issue

When we moved the first pilot users to the new pool and then tested against use cases, the dial-in conferencing failed. Oooops! Some tracing and sharking of wires, along with a look at the client-side logs, made things pretty clear. The call landed on the Lync 2010 mediation server just fine, the conferencing attendant on the Lync 2010 FE (and the proper one at that) picked up the call, but then promptly dropped the call during the conference identification phase.

The Fix

I have seen this issue before, back in the murky days (daze?). But usually you see this with simul ring or call forwarding out of the system to the PSTN. See this here for some background. I had thought this was fixed in Lync 2013 as I had not seen this for a long time. And then, for some reason I thought it had been fixed in SfB. But, apparently maybe not. And specifically, if you are doing 2010 pools to SfB pools. See this here for the exact error (or at least a reasonable approximation).

So off we went to the trunk configuration, set the refer support to NONE, and magically our dial-in conferencing calls worked as expected. We got by with changing the global trunk configuration; you may need to be a bit more granular.
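From the management shell, the equivalent of that control panel change is, as far as I know, the EnableReferSupport setting on the trunk configuration. A minimal sketch, assuming the global trunk configuration is the one in play:

```powershell
# See what is set today, across all trunk configurations.
Get-CsTrunkConfiguration | Select-Object Identity, EnableReferSupport

# Turn REFER support off on the global trunk configuration.
Set-CsTrunkConfiguration -Identity Global -EnableReferSupport $false
```

If you need to be more granular, point `-Identity` at a site-level or trunk-level configuration instead of Global.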

YMMV
