Secure signaling and Media exchange Part I

Introduction

In this first part of a two-part post, I will describe the technology behind secure signalling and media exchange in a Cisco Unified Call Manager deployment: how secure signalling and media exchange works between phones, Call Manager and an H323 gateway. In part II, I will describe the configuration steps required to pull all this off. A lot of UC engineers might never come across a deployment that requires secure signalling and/or media encryption, and sure enough my first secure UC deployment forced me to do a ton of research on this topic. I personally believe there is no better way of successfully deploying something than to gain a full understanding of the mechanics behind it first. Before I started doing research for this particular deployment I was told: “Yeah, we require encrypted voice calls, it’s easy, just distribute some wildcard certificates and Bob’s your uncle!”. Well, not really.

I started off reading up on PKI and the way Cisco implements it, which in most ways is the same as PKI anywhere else. For this I used the CIPT2 study guide, which has some really good chapters on PKI, TLS/SSL and sRTP. Also, the Cisco support community has some excellent posts on the topic. I personally got really interested in the topic, seeing that most engineers I meet only have a very rough understanding of how PKI works, possibly because I mainly deal with voice engineers who are not primarily interested in security. Mate, if you understand PKI on a voice deployment, you understand PKI everywhere, so it is a very good skill to have.

This post is based on a deployment containing 2 call managers (8.6.2), phones and 2 H323 gateways. Please also note that I will not describe how to configure signed phone loads and configurations; I will just stick to secure signalling and media.

1. Secure signalling and media exchange, technology

1.1 Secure media exchange

When used in voice deployments, secure media exchange basically means encrypted and authenticated voice calls. This means RTP is encrypted so that playback is not possible (with Wireshark it is dead easy to play back an unencrypted RTP stream). sRTP does just that: it prevents playback. sRTP uses AES (128 bit) as the cipher for encrypting and decrypting the RTP payload. Please remember that RTP can also contain DTMF data. AES does not ensure message integrity itself. This could potentially allow an attacker to either forge the data or at least to replay previously transmitted data. Because of this, the sRTP standard also provides the means to secure the integrity of data and thus safety from replay. To authenticate the data stream and protect its integrity, HMAC-SHA-1 is used. The HMAC is calculated over the packet payload and material from the packet header, including the packet sequence number, which is why it doesn’t like NAT. All this stuff is documented in RFC3711, so there is nothing Cisco about this so far (apart from maybe some of the guys who worked on the RFC). A more in-depth read on sRTP can be found at:

http://www.cisco.com/web/about/security/intelligence/securing-voip.html
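The integrity check described above can be sketched in a few lines of Python. This is only an illustration, not Cisco's or any library's implementation: the function names, the key and the packet contents are made up for the example. It assumes the RFC 3711 defaults, where the tag is an HMAC-SHA-1 over the RTP header, the (already encrypted) payload and the 32-bit rollover counter (ROC), truncated to 80 bits.

```python
import hashlib
import hmac
import struct

TAG_LEN = 10  # 80-bit authentication tag, the RFC 3711 default


def srtp_auth_tag(auth_key: bytes, rtp_header: bytes, enc_payload: bytes, roc: int) -> bytes:
    """HMAC-SHA-1 over header || payload || ROC, truncated to 80 bits."""
    authenticated = rtp_header + enc_payload + struct.pack("!I", roc)
    return hmac.new(auth_key, authenticated, hashlib.sha1).digest()[:TAG_LEN]


def verify(auth_key: bytes, rtp_header: bytes, enc_payload: bytes, roc: int, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    expected = srtp_auth_tag(auth_key, rtp_header, enc_payload, roc)
    return hmac.compare_digest(expected, tag)


# Hypothetical values, for illustration only.
auth_key = bytes(range(20))  # 160-bit session authentication key
# Minimal 12-byte RTP header: V=2, PT=0, seq=1, timestamp=160, SSRC=0x1234
header = struct.pack("!BBHII", 0x80, 0, 1, 160, 0x1234)
payload = b"\x00" * 160      # stand-in for the encrypted voice payload
tag = srtp_auth_tag(auth_key, header, payload, roc=0)
```

Because the sequence number sits inside the authenticated header, changing it (or any payload byte) makes `verify` fail, which is what gives the receiver a basis for rejecting forged packets.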

On top of all this, a key derivation function is used to derive the different keys used in a crypto context (sRTP and sRTCP encryption keys and salts, sRTP and sRTCP authentication keys) from one single master key in a cryptographically secure way. Thus, the key management protocol needs to exchange only one master key; all the necessary session keys are generated by applying the key derivation function. SDES, ZRTP and MIKEY can all be used to exchange session keys, or at least an initial master key. SDES uses plain text key exchange inside the signalling protocol (could be via SIP in its SDP or via Skinny). Using the plain text master key to deduce the session key is theoretically possible, but is not a simple undertaking. Added security can be laid over the top of signalling (such as SSL and/or IPSEC) to protect the plain text keys. For some external references on key derivation RFCs, please go to:

http://en.wikipedia.org/wiki/Secure_Real-time_Transport_Protocol
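To make the "one master key, many session keys" idea concrete, here is a rough Python sketch. Big caveat: RFC 3711 builds its key derivation function on AES in counter mode, which is not in the Python standard library, so this sketch swaps in HMAC-SHA-256 as a stand-in PRF. It is therefore NOT the real sRTP KDF and will not produce interoperable keys. The label values and key lengths follow the RFC 3711 defaults; the function and dictionary names are my own.

```python
import hashlib
import hmac

# (label, length in bytes) per derived key, following the RFC 3711 defaults
LABELS = {
    "srtp_encryption": (0x00, 16),
    "srtp_authentication": (0x01, 20),
    "srtp_salt": (0x02, 14),
    "srtcp_encryption": (0x03, 16),
    "srtcp_authentication": (0x04, 20),
    "srtcp_salt": (0x05, 14),
}


def derive_session_keys(master_key: bytes, master_salt: bytes) -> dict:
    """Derive six session keys from one master key and salt.

    Stand-in PRF: HMAC-SHA-256 (assumption for illustration; the real
    KDF uses AES in counter mode).
    """
    keys = {}
    for name, (label, length) in LABELS.items():
        prf_out = hmac.new(master_key, bytes([label]) + master_salt, hashlib.sha256).digest()
        keys[name] = prf_out[:length]
    return keys


# Hypothetical master key (128 bit) and master salt (112 bit)
session_keys = derive_session_keys(bytes(16), bytes(range(14)))
```

The point is that both endpoints only need to agree on the master key and salt; each side can then compute the same six session keys locally.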

1.2 Secure signalling and PKI

Secure signalling is aimed at protecting the session key exchange, as described in chapter 1.1. If we consider a “typical” UC deployment with an IOS gateway, CUCM and a bunch of Skinny phones, secure signalling will need to be achieved on 2 levels:

Between the phones and the CUCM (SSL using certificates)
Between the CUCM Cluster and the gateway. (IPSEC)

You obviously need both steps if you are making PSTN calls. If only internal calls between phones within a single cluster are to be secured, then step 2 does not apply. With step 2 and securing PSTN inbound and outbound calls, the demarcation of security is at a PRI/FXO/FXS/BRI port level when using traditional TDM. If you are using a SIP trunk to a provider, you will need to discuss secure RTP with them, but realistically, that would already be outside the sphere of your organisation’s administration, and therefore any security cannot be trusted by definition. Remember we are not talking about a point to point VPN across the internet.

But before I go to our two steps, let me refresh the 4 crucial building blocks of data security in general and discuss PKI. When referring to DATA security and cryptography, the following services can be identified:

Authentication (proof of a party’s identity; certificate, pre-shared key etc.)
Confidentiality (inability for 3rd party to replay a data stream; AES, 3DES)
Integrity (absence of any alteration in data, for instance replay)
Non-Repudiation (undeniable proof of origin of data, PKI does not provide this)

In chapter 1.1 I have pretty much discussed how confidentiality and integrity can be achieved, namely through AES and HMAC. Now, with secure signalling, data encryption and integrity (using hash value calculation) are carried out again, so it is basically a second security perimeter.

Secure signalling between a Call Manager cluster and a Voice gateway can be achieved using IPSEC. I will explain the configuration tasks in part 2 of this series.

Secure signalling between phones and their cluster is a bit trickier. For its security it relies on SSL and with it comes the wonderful world of PKI.

Public Key Infrastructure is an information repository that ties entities to key pairs. These key pairs are suitable for use with an asymmetric algorithm (such as RSA). A PKI facilitates dissemination of information to a wide audience on behalf of those whose information is published in it.

The type of PKI that SSL uses necessitates a third party to issue the certificates used to mediate the authentication between entities interested in engaging in transactions. This third party verifies that the entity requesting a certificate is who or what the entity claims to be and then issues a certificate. Third parties that broker trust in this manner are called Certificate Authorities (CA).

A certificate is the signature of a public key and information that describes the holder of the public key (remember certificates are based on a private and public RSA key). The signature is created using an asymmetric cipher and a private key held by a CA. The CA’s private key has a corresponding public key that is part of a root certificate. A root certificate is a public key and its corresponding information that has been signed by the private key that corresponds to the aforementioned public key. A root certificate is also called a self-signed certificate because the private key in a key pair is used to sign the public key and identifying information, instead of using a private key from a different key pair to sign the public key and its identifying information. I always like to compare CA-signed certificates with passports signed by some official government body recognized by many other countries.
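The difference between a CA-signed certificate and a self-signed root can be shown with a toy example. This is textbook RSA with no padding, built from two well-known Mersenne primes per key pair, so it is hopelessly insecure and only meant to show who signs what. Real certificates are X.509 structures; the `tbs` ("to be signed") helper, the names and the host name below are made up for the illustration.

```python
import hashlib


def toy_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA key pair from two primes. Toy only: no padding."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return (n, e), (n, d)              # (public key, private key)


def digest(data: bytes, n: int) -> int:
    """Hash the data and reduce it into the RSA modulus range."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % n


def sign(data: bytes, private) -> int:
    n, d = private
    return pow(digest(data, n), d, n)


def verify(data: bytes, signature: int, public) -> bool:
    n, e = public
    return pow(signature, e, n) == digest(data, n)


def tbs(subject: str, public) -> bytes:
    """Identity plus public key: the part of a certificate that gets signed."""
    return f"{subject}|{public[0]}|{public[1]}".encode()


ca_public, ca_private = toy_keypair(2**61 - 1, 2**31 - 1)
server_public, server_private = toy_keypair(2**89 - 1, 2**107 - 1)

# Root certificate: the CA signs its OWN public key (self-signed).
root_sig = sign(tbs("Toy Root CA", ca_public), ca_private)
# Server certificate: the CA signs the SERVER's public key.
server_sig = sign(tbs("cucm.example.com", server_public), ca_private)
```

Anyone holding the CA's public key can now check both signatures, which is exactly the role the trust store (or, on a phone, the CTL) plays.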

When running a cluster in mixed mode, SSL will be used to authenticate the phone to the server and vice versa. (Remember SSL uses asymmetric encryption, which is slower than, say, AES and therefore not usable to encrypt RTP.)

Let us have a look at the picture below, which shows how web server authentication works (I know it’s not a phone, but a web browser, but the mechanism is exactly the same).

Server side authentication, phase 1

So when contacting a web server where SSL is required, the first thing that needs to be done on the web browser side is to obtain the public key of the web server (signed by a third party CA). When the web browser receives this certificate, it verifies whether the CA is valid. Macs use Keychain for this, most other normal OSs use a certificate store in the browser, and phones use a CTL file (Certificate Trust List).

Up to this point, authentication is only partially done because the only thing that has happened is the exchange of the public key certificate. The next step (phase 2), is that the client challenges the server with a random string.

Server side authentication, phase 2

The client encrypts that string with the server’s public key, so it can ONLY be decrypted with the private key that the server holds. So if the server echoes the same challenge string back to the client, this proves the server does indeed have the private key belonging to the public key in the certificate.
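The simplified challenge and echo described above can be played through with toy numbers. Again this is textbook RSA without padding, using well-known Mersenne primes, so it is an insecure sketch of the idea and not how a real SSL/TLS handshake is coded. All function names are hypothetical.

```python
import secrets


def toy_keypair(p: int, q: int, e: int = 65537):
    """Textbook RSA key pair from two primes. Toy only: no padding."""
    n = p * q
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse, needs Python 3.8+
    return (n, e), (n, d)              # (public key, private key)


def client_make_challenge(server_public):
    """Pick a random challenge and encrypt it with the server's public key."""
    n, e = server_public
    challenge = secrets.randbelow(n - 2) + 2
    return challenge, pow(challenge, e, n)


def server_answer(ciphertext: int, private) -> int:
    """Only the holder of the matching private key recovers the challenge."""
    n, d = private
    return pow(ciphertext, d, n)


server_public, server_private = toy_keypair(2**89 - 1, 2**107 - 1)
_, impostor_private = toy_keypair(2**61 - 1, 2**31 - 1)

challenge, ciphertext = client_make_challenge(server_public)
genuine = server_answer(ciphertext, server_private) == challenge
fake = server_answer(ciphertext, impostor_private) == challenge
```

Here `genuine` ends up `True` while an impostor without the right private key cannot echo the challenge back.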

The mechanism above only authenticates the server to the client, not the other way around. In a secure voice deployment phones will also authenticate to the CUCM server, so authentication is bidirectional.

Exchange Symmetric session keys for secure media streams (sRTP)

At this stage, we have two-way authentication using asymmetric RSA key pairs. The next step is to securely exchange a symmetric session key for further encryption (AES), using RSA (through SSL). See below.

Session Key exchange

As I stated earlier in this chapter, SDES, ZRTP or MIKEY are used to exchange the session keys, or at least the master key they are derived from. SDES is probably the most common one and is used for instance in SIP by adding an a=crypto attribute to the SDP (see below).

SDES, clear text session key exchange
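To show just how "clear text" the SDES exchange is, here is a small Python sketch that pulls the master key and salt out of an a=crypto line. It assumes the AES_CM_128_HMAC_SHA1_80 suite, where the inline value is the base64 encoding of a 16-byte master key followed by a 14-byte master salt. The parser name and the example key are made up, and optional lifetime/MKI and session parameters are simply ignored.

```python
import base64


def parse_crypto_attribute(line: str) -> dict:
    """Extract tag, suite, master key and master salt from an SDP a=crypto line."""
    prefix = "a=crypto:"
    if not line.startswith(prefix):
        raise ValueError("not an a=crypto attribute")
    fields = line[len(prefix):].split()
    tag, suite, key_params = fields[0], fields[1], fields[2]
    method, _, key_info = key_params.partition(":")
    if method != "inline":
        raise ValueError("only inline keys are handled in this sketch")
    # Anything after '|' is lifetime or MKI information; skipped here.
    key_salt = base64.b64decode(key_info.split("|")[0])
    if len(key_salt) != 30:
        raise ValueError("expected 16-byte key + 14-byte salt")
    return {
        "tag": int(tag),
        "suite": suite,
        "master_key": key_salt[:16],
        "master_salt": key_salt[16:],
    }


# Hypothetical key material, built here so the example is self-consistent.
example_inline = base64.b64encode(bytes(range(30))).decode()
example_line = f"a=crypto:1 AES_CM_128_HMAC_SHA1_80 inline:{example_inline}"
parsed = parse_crypto_attribute(example_line)
```

Anyone who can sniff the SDP can do the same, which is why the signalling itself needs protecting.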

From a PCI perspective, encryption of the SIP signalling traffic is typically not mandated by the PCI QSA, since using that master key to deduce the session key is not a simple undertaking. This means that SRTP comes with a lot of added value even if not coupled with SSL.

CUCM interpretation of PKI

So how does all this theory fit into a CUCM deployment? Well, as you can see in the picture below, TLS and IPSEC will need to be used.

CUCM-phone-VGW, secure signalling

I will focus on the TLS/SSL part of the above picture.

CUCM has some added complexity in terms of PKI compared to, say, web server authentication using a single PKI. The reason is that the CUCM and TFTP servers, as well as CAPF and the MIC (Manufacturing Installed Certificate), all come with their own issuers of certificates (PKI roots).

CUCM PKI Roots

To tie all these certificates together, CUCM uses what is called a CTL (Certificate Trust List). It can be compared to a browser’s certificate store. It is essentially a list containing the public keys of CUCM, TFTP and CAPF. A phone will need to trust all these PKI roots. The CTL can only be created with the CTL client software, and you will need at least two Cisco security tokens for it. Now I have to debunk the popular belief that it is possible to implement secure signalling and media stream encryption without the tokens and a CTL file/client: NOT TRUE. Without it you will not be able to put the cluster into mixed mode, which is required to run sRTP.

The way the CTL file gets created is as follows (see picture below). The CUCM, TFTP and CAPF all introduce their self-signed certificates into the CTL client.

CTL Client CTL file creation

The CTL client will then sign the file using its private key, which is stored in the security token. Additionally, the CTL file will also contain the Cisco Manufacturing CA public key. This is because the MICs and the certificates of the security tokens (storing the keys that are used by the Cisco CTL client) are issued by a manufacturing CA. To allow the phone to verify the certificates issued by this Cisco CA, the phone needs the certificate of the Cisco Manufacturing CA.
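Conceptually, the CTL described above is just a signed list of "who do I trust, and in what role". The Python sketch below models that idea and nothing more: it is NOT the real CTL file format, the signing step with the security token is left out entirely, and the class, the function names, the role names and the certificate bytes are all hypothetical.

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass(frozen=True)
class CtlEntry:
    role: str         # e.g. "CUCM", "TFTP", "CAPF" (hypothetical role names)
    fingerprint: str  # SHA-256 of the certificate bytes


def fingerprint(cert_bytes: bytes) -> str:
    return hashlib.sha256(cert_bytes).hexdigest()


def build_ctl(certs: Dict[str, bytes]) -> List[CtlEntry]:
    """Collect every certificate handed to the (imaginary) CTL client."""
    return [CtlEntry(role, fingerprint(data)) for role, data in certs.items()]


def trusted_role(ctl: List[CtlEntry], cert_bytes: bytes) -> Optional[str]:
    """Return the role a presented certificate is trusted for, or None."""
    fp = fingerprint(cert_bytes)
    for entry in ctl:
        if entry.fingerprint == fp:
            return entry.role
    return None


# Placeholder byte strings standing in for real certificates.
ctl = build_ctl({
    "CUCM": b"cucm-self-signed-cert",
    "TFTP": b"tftp-self-signed-cert",
    "CAPF": b"capf-self-signed-cert",
    "Cisco Manufacturing CA": b"manufacturing-ca-cert",
})
```

A phone holding this list would accept `b"capf-self-signed-cert"` as CAPF and reject any certificate that is not on the list.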

Once this file is created, the phones will do the initial CTL file download at next boot. This initial transfer is not secure, because the phones have no way to verify the authenticity of any of the certs in the CTL file.

After the CTL file is loaded onto the phone, either the MIC or an LSC (Locally Significant Certificate) is used for two-way authentication on the phone. I will go into LSC enrollment in Part 2 of this post.

SUMMARY

OK, if you are still with me, I think we are getting there. So far I have discussed the basic principles of security (authentication, confidentiality, integrity and non-repudiation). I have also explained PKI, which forms the basis of secure signalling between phones and CUCM. Between a CUCM and a VGW (IP leg), IPSEC is used to secure signalling. Secure signalling allows the secure exchange of session keys that are used to secure the media stream.
