TsooRaD: 2009-10

2009/10/28

Users unable to join LM session, MEET NOW button not working, LM functions grayed out.

In the middle of a deployment - an OCS R1 to R2 migration - we noticed that the LM functions were not working. LM worked for the individual workstations when connecting to a remote session initiated by a federated partner. The meeting policy at the global level was created, edited, and assigned correctly.

We noticed that we had neglected to run the Web Conferencing validation tests. We had a tick box for it on our checklist; we had just overlooked it. Running the validation wizard revealed that the connection checks were failing to contact the MCU on either of the Front End servers. We double-checked everything and concluded that everything on the OCS side was correct.

“Aha,” says we. “Must be a certificate issue.” Except that we were using good public certificates on all the interfaces for the FE servers. Bummer. Except, I then read this blog article.

Oddly, this was so on target it floored me. I used the solution that changed the usage on the Trusted Root Certificate to “ALL” - voila! problem resolved. I don’t know who exactly wrote this, and what follows is a cut ‘n paste and edit of the relevant parts of that blog that fixed my issue. Many thanks to the unknown CSS engineer (Dave) who took the time to write this up.

Event Type:    Error
Event Source:    OCS MCU Infrastructure
Event ID:    61013
User:        N/A
Computer:    OCS1
Description:
The process DataMCUSvc(2596) failed to send health notifications to the MCU factory at https://OCS1.contoso.com:444/LiveServer/MCUFactory/.
Failure occurrences: 3491, since 3/24/2009 10:05:18 PM.
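
If you have a pile of these events exported as text, a few lines of Python can pull out the fields that matter (which MCU process, which factory URL and port, how many failures). This is only a rough sketch: the function name is made up, and the pattern assumes the description is worded and laid out exactly like the sample above.

```python
import re
from urllib.parse import urlsplit

# Sample text copied from the event description quoted above.
SAMPLE_EVENT = (
    "The process DataMCUSvc(2596) failed to send health notifications to the "
    "MCU factory at https://OCS1.contoso.com:444/LiveServer/MCUFactory/.\n"
    "Failure occurrences: 3491, since 3/24/2009 10:05:18 PM."
)

# Assumes the wording of event 61013 matches the sample; other builds may differ.
_EVENT_61013 = re.compile(
    r"The process (?P<process>\w+)\((?P<pid>\d+)\) failed to send health "
    r"notifications to the MCU factory at (?P<url>https?://\S+)\s+"
    r"Failure occurrences: (?P<failures>\d+), since (?P<since>[^\r\n]+)"
)


def parse_mcu_health_event(description):
    """Return the interesting fields of an event description, or None."""
    match = _EVENT_61013.search(description)
    if match is None:
        return None
    # The sentence ends with a period that is not part of the URL.
    url = match.group("url").rstrip(".")
    parts = urlsplit(url)
    return {
        "process": match.group("process"),
        "pid": int(match.group("pid")),
        "url": url,
        "host": parts.hostname,  # urlsplit lower-cases the host name
        "port": parts.port,
        "failures": int(match.group("failures")),
        "since": match.group("since").rstrip("."),
    }


if __name__ == "__main__":
    print(parse_mcu_health_event(SAMPLE_EVENT))
```

The port is worth a glance: 444 is the MCU factory listener shown in the event, so that is the connection the certificate trust problem is breaking.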

If you run the Web Conferencing validation wizard from the OCS Pool, you may find the following error in the output log:

MCU Type: meeting
URL: https://OCS1.contoso.com:444/LiveServer/MCUFactory/
HTTP Connectivity Error : ReceiveFailure
HTTP Connectivity Error : Receive failure typically indicates that the connection was closed by
the remote host. This can happen if the remote server does not trust the certificate presented by the
Local Server.

HTTP Connectivity Error : Ensure that the certificate of the local server and remote server are both
valid, have not expired, and contain valid subject name. In addition, ensure that the certificate chain
of both Server(s) are valid. Ensure that the certificate chain of the local server is installed
on the remote server and vice-versa. The most up-to date certificate chain that was used to issue
the server certificate must be present.

When you see errors like these, it usually indicates that a certificate-related authentication problem exists with the OCS Pool (or with a particular OCS Front End server). Most of the time, this turns out to be a problem with the certificate from an issuing Certification Authority. To troubleshoot this issue, you would typically perform the following steps:

Log in to the affected OCS 2007 Front End server either locally or remotely using Remote Desktop.
Open the Certificates Management Console for the local computer.
If the issuing CA is a Root CA (the top of the certification path), expand Trusted Root Certification Authorities > Certificates
If the issuing CA is an Intermediate CA (not the top of the certification path), expand Intermediate Certification Authorities > Certificates
From the list of CA certificates, right click on the certificate and choose Properties
Under the General tab, verify that Enable all purposes for this certificate is selected (or, if Enable only the following purposes is selected, verify that both Server Authentication and Client Authentication are enabled)
Click OK to close the properties of the CA certificate.
If this was an Intermediate CA certificate, repeat the checks above for the next certificate up the chain until the settings on all certificates in the trusted certification chain are verified
Close the Certificates Management Console (be sure to restart services if you made any changes)
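
The logic of that walk up the chain can be sketched in a few lines of Python. To be clear, this does not read the Windows certificate store; the `CaCertificate` class and its fields are hypothetical stand-ins for what you see on the General tab of each CA certificate, so treat it as a way to think about the check rather than a tool.

```python
from dataclasses import dataclass, field

SERVER_AUTH = "Server Authentication"
CLIENT_AUTH = "Client Authentication"


@dataclass
class CaCertificate:
    """Hypothetical record of one CA certificate's purpose settings."""
    name: str
    all_purposes: bool = True  # "Enable all purposes for this certificate"
    enabled_purposes: set = field(default_factory=set)


def purposes_ok(cert):
    """True if all purposes are enabled, or both auth purposes are ticked."""
    if cert.all_purposes:
        return True
    return {SERVER_AUTH, CLIENT_AUTH} <= cert.enabled_purposes


def misconfigured(chain):
    """Return the names of CA certificates in the chain that fail the check."""
    return [cert.name for cert in chain if not purposes_ok(cert)]


if __name__ == "__main__":
    # Example chain: the root has been restricted to server authentication only.
    chain = [
        CaCertificate("Example Intermediate CA"),
        CaCertificate("Example Root CA", all_purposes=False,
                      enabled_purposes={SERVER_AUTH}),
    ]
    print(misconfigured(chain))
```

In my case the offender was the Trusted Root certificate, which is why the sample chain flags the root.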

Why this occurred on a brand new R2 installation on Server 2008 SP2 is beyond me. The OCS R1 system (on Server 2003 R2 SP2) did not have this issue, but the brand new setup did. Go figure.

Edge Server Certs and blank Communicator message windows

A client recently changed the certificates on their edge server, replacing them with a single certificate meant to handle everything. However, the SAN construction on the cert was a little wrong.

Symptoms:

Presence worked, but federated contacts could not fully establish an IM session. LM and AV did not work as expected either. If a federated user initiated an IM, the internal user would get the toast, which showed the initial message, but opening the toast produced a window with a blank content pane. If the internal user attempted to initiate an IM session, the reverse would occur. After the blank IM window appeared, any subsequent efforts at IM resulted in a timeout with a 504 error.

What caused this?

Logging on both edges revealed that the initial IM invite was addressed to the proper SIP SRV record, but after the initial ACK, the client system packets were being directed at a different FQDN. Digging into the client’s edge server revealed that the FQDN was the actual server FQDN. It seems the cert had been issued for the FQDN and that SIP, AV, and LM were on the cert, but that SIP was not the FIRST SAN name.

So, what happens is that the federated contact can get to SIP.domain.com (via the _sipfederatedtls._tcp.domain.com SRV record) for the initial invite, but after that the remainder of the conversation looked like it came from the FQDN of the client’s edge server because, according to the certificate on the Access Edge, that is exactly where it came from. The initial SIP invite worked because the traffic arrived at the edge.domain.com Access Edge interface IP address, and the SAN on the new cert did indeed have SIP.domain.com as a valid name. However, the certificate’s common name and first SAN entry are what determine the FQDN that NIC uses when transmitting, as opposed to receiving. The end result is that the federated side of the conversation gets started just fine, but then tries to communicate with an FQDN that is not accessible from the internet.

Clear as the bottom of a well on a dark night, eh?
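
To make it a little less murky, here is the rule as a tiny Python check. It is a sketch only: the function name is mine, and it works on a common name and SAN list that you have already read off the certificate by hand rather than parsing the cert itself.

```python
def check_access_edge_cert(common_name, san_entries, access_edge_fqdn):
    """Return a list of problems with the Access Edge certificate naming.

    The rule being checked is the one described in this post: the published
    Access Edge name must be the certificate's common name AND its first
    SAN entry, not just present somewhere in the SAN list.
    """
    expected = access_edge_fqdn.lower()
    sans = [name.lower() for name in san_entries]
    problems = []
    if common_name.lower() != expected:
        problems.append("common name is %s, expected %s"
                        % (common_name, access_edge_fqdn))
    if expected not in sans:
        problems.append("%s is missing from the SAN list" % access_edge_fqdn)
    elif sans[0] != expected:
        problems.append("%s is in the SAN list but is not the first entry"
                        % access_edge_fqdn)
    return problems


if __name__ == "__main__":
    # The broken layout: server FQDN first, SIP name further down the list.
    # "edge01.domain.local" is a made-up internal name for illustration.
    print(check_access_edge_cert(
        "edge01.domain.local",
        ["edge01.domain.local", "SIP.domain.com", "AV.domain.com", "LM.domain.com"],
        "SIP.domain.com",
    ))
```

The broken layout in the example reports two problems (wrong common name, SIP name not first), which is the combination that produced the blank IM windows.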

The Fix:

Changing the certificates back to the originally installed set fixed the issue:

SIP.domain.com
AV.domain.com
LM.domain.com

This will also work if you use ONE certificate with those three names (or whatever name you choose for each) as long as SIP.domain.com is both the common name of the cert and the first SAN entry. The FQDN of the actual server should only show on the internally-facing Edge interface. For even MORE confusion, see this:

Bon Appetit!

2009/10/14

MS KB 974571 and OCS/LCS

http://communicationsserverteam.com/archive/2009/10/14/632.aspx outlines an issue with applying this security patch to your OCS/LCS servers. The only fix, if this is happening to you, is to uninstall the KB fix.

2009/10/01

Microsoft delivers zero license cost XMPP Gateway for OCS

Today, Microsoft delivers a new gateway. Read about it here. This is GREAT news.

Now you can federate/PIC with Microsoft Live, Google Talk, and Jabber.
