Many thanks go to Chard Johnston (AudioCodes) and Jeremy Silber (CDW).
The project was Lync 2013 Enterprise: two sites, full HA, DR, and call recording using AudioCodes SmartTap. The Edge servers in both sites used DNS load balancing (DNSLB).
Once we started making more than a few calls to external numbers, we noticed that SmartTap was not recording as expected. This led to a few calls to the helpful AudioCodes support engineers. It turns out that SmartTap does a little call-redirection magic and captures, from the edge servers, all the traffic needed to record both sides of a phone call. When one user landed on Edge1 and the other user landed on Edge2, calls started failing.
We were also failing regular calls between Lync users that used the Edge servers. Same symptoms. Calls would start, then fail when the time came to establish media. Needless to say, this was not good.
Interestingly, this problem has been around for a bit. Jeremy Silber has an excellent article that outlines the problem, the cause, and the fix in explicit detail. Even better, if you talk to Jeremy (I happen to have direct access) he can translate the contents of that blog into English! Highly recommended reading. Having it translated to English so that either of my brain cells could comprehend was priceless. The firewall rules had gone through the change order process at least a month earlier, so we thought we were good there. All previous testing had been good. But we had not tested voice/video yet.
What is going on here?
I had a heck of a time getting those firewall rules in, not at the technical level, but at the level of explaining the “why” in English. So to get my skills up to speed, I discussed the issue (and the fix) with Chard Johnston of AudioCodes, seeing as how he was buried in trying to get SmartTap working correctly. After I showed Chard the hairpin requirement (see the previous reference to Jeremy Silber), Chard went to Microsoft using his channels. Apparently this discussion went on for a bit. Chard came back and created the following diagrams.
Why is this needed? Well, if you look back at Jeremy’s blog, you will see that the candidate pairs exchanged between the users contain IP addresses, not FQDNs. And, done correctly, each IP will be the EXTERNAL PUBLIC IP of the AV service on that edge. So, the firewall must allow traffic from one edge's public AV IP to hairpin back to the other edge's public AV IP.
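To make the hairpin requirement concrete for a firewall team, here is a minimal Python sketch that lists the rules needed between every pair of edge AV public IPs. The function name `hairpin_rules`, the example addresses (taken from the 203.0.113.0/24 documentation range), and the default port list (UDP 3478 and TCP 443) are my assumptions for illustration, not values from this deployment; check the ports against your own edge configuration.

```python
from itertools import permutations


def hairpin_rules(av_public_ips, services=(("udp", 3478), ("tcp", 443))):
    """Return one allow rule per ordered pair of edge AV public IPs.

    av_public_ips: external public IPs of the AV service on each edge.
    services: (protocol, port) pairs to allow. The defaults are an
              assumption; verify them for your own environment.
    """
    rules = []
    # Ordered pairs, because traffic must be allowed in both directions.
    for src, dst in permutations(av_public_ips, 2):
        for proto, port in services:
            rules.append({"src": src, "dst": dst, "proto": proto, "port": port})
    return rules


if __name__ == "__main__":
    # Hypothetical AV public IPs for Edge1 and Edge2.
    edges = ["203.0.113.10", "203.0.113.11"]
    for rule in hairpin_rules(edges):
        print("allow {proto} {src} -> {dst} port {port}".format(**rule))
```

With two edges this produces four rules (two directions times two services). A list like this is easier to hand to a change board than a paragraph of explanation.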
Remember that while SmartTap highlighted the issue, the firewall configuration was the real culprit. If we look at the reference article in the Lync 2013 documentation, we don’t find an explicit requirement for the AV service on one Edge server to talk to the AV service on the other Edge pool server via the public IP address. Once you know the requirement exists, the information is there if you read between the lines a bit.
In the end, the requested firewall rules had not been implemented correctly, which left us with some one-way conversations. Some quick adjustments by the firewall team had everything ironed out.