ZRTP
Zimmermann
Request for Comments: 6189 Zfone Project
Category: Informational A. Johnston, Ed.
ISSN: 2070-1721 Avaya
J. Callas
Apple, Inc.
April 2011
Abstract
Copyright Notice
Copyright (c) 2011 IETF Trust and the persons identified as the
document authors. All rights reserved.
Table of Contents
1. Introduction
2. Terminology
3. Overview
3.1. Key Agreement Modes
3.1.1. Diffie-Hellman Mode Overview
3.1.2. Preshared Mode Overview
3.1.3. Multistream Mode Overview
4. Protocol Description
4.1. Discovery
4.1.1. Protocol Version Negotiation
4.1.2. Algorithm Negotiation
4.2. Commit Contention
4.3. Matching Shared Secret Determination
4.3.1. Calculation and Comparison of Hashes of Shared Secrets
4.3.2. Handling a Shared Secret Cache Mismatch
4.4. DH and Non-DH Key Agreements
4.4.1. Diffie-Hellman Mode
4.4.1.1. Hash Commitment in Diffie-Hellman Mode
4.4.1.2. Responder Behavior in Diffie-Hellman Mode
4.4.1.3. Initiator Behavior in Diffie-Hellman Mode
4.4.1.4. Shared Secret Calculation for DH Mode
4.4.2. Preshared Mode
4.4.2.1. Commitment in Preshared Mode
4.4.2.2. Initiator Behavior in Preshared Mode
4.4.2.3. Responder Behavior in Preshared Mode
4.4.2.4. Shared Secret Calculation for Preshared Mode
1. Introduction
2. Terminology
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and
"OPTIONAL" in this document are to be interpreted as described in
[RFC2119].
3. Overview
Both ZRTP endpoints begin the ZRTP exchange by sending a ZRTP Hello
message to the other endpoint. The purpose of the Hello message is
to confirm that the endpoint supports the protocol and to see what
algorithms the two ZRTP endpoints have in common.
The Hello message contains the SRTP configuration options and the
ZID. Each instance of ZRTP has a unique 96-bit random ZRTP ID or ZID
that is generated once at installation time. ZIDs are discovered
during the Hello message exchange. The received ZID is used to look
up retained shared secrets from previous ZRTP sessions with the
endpoint.
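A minimal sketch of ZID generation and the cache lookup it enables,
assuming an in-memory dictionary stands in for the persistent
shared-secret cache. The names generate_zid and SecretCache are
hypothetical and not part of ZRTP.

```python
import secrets
from dataclasses import dataclass, field
from typing import Dict, Optional

ZID_LEN = 12  # a 96-bit ZID is 12 octets


def generate_zid() -> bytes:
    """Create a random 96-bit ZID; an endpoint does this once, at installation time."""
    return secrets.token_bytes(ZID_LEN)


@dataclass
class SecretCache:
    """Hypothetical in-memory stand-in for the retained shared-secret cache."""

    entries: Dict[bytes, Dict[str, bytes]] = field(default_factory=dict)

    def store(self, peer_zid: bytes, values: Dict[str, bytes]) -> None:
        if len(peer_zid) != ZID_LEN:
            raise ValueError("a ZID must be exactly 12 octets")
        self.entries[peer_zid] = values

    def lookup(self, peer_zid: bytes) -> Optional[Dict[str, bytes]]:
        """Index the cache with the ZID received in the peer's Hello."""
        return self.entries.get(peer_zid)
```

Because the ZID is random and fixed per installation, it works as a
stable cache key without revealing anything about the user.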
Hello and other ZRTP messages also contain a hash image that is used
to link the messages together. This allows rejection of false ZRTP
messages injected during an exchange.
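The hash images form a chain (Section 9). As a sketch, assuming
SHA-256 and a four-element chain H0..H3 in which H0 is a random
256-bit value and each later element is the hash of the previous one,
an endpoint that first received H3 can authenticate a later message
that reveals H2 by hashing it once. Which message carries which image
is not stated in this excerpt and is left out of the sketch.

```python
import hashlib
import hmac
import secrets
from typing import List


def make_hash_chain() -> List[bytes]:
    """Return [H0, H1, H2, H3] with H0 random and H(n+1) = SHA-256(Hn)."""
    chain = [secrets.token_bytes(32)]
    for _ in range(3):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


def links_to(earlier_image: bytes, revealed: bytes) -> bool:
    """True if the newly revealed value hashes to the image received earlier."""
    return hmac.compare_digest(hashlib.sha256(revealed).digest(), earlier_image)
```

An attacker who has seen only H3 cannot compute its preimage, so a
forged message carrying a wrong H2 fails links_to and can be dropped.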
After both endpoints exchange Hello and HelloACK messages, the key
agreement exchange can begin with the ZRTP Commit message. ZRTP
supports a number of key agreement modes including both Diffie-
Hellman and non-Diffie-Hellman modes as described in the following
sections.
The Commit message may be sent immediately after both endpoints have
completed the Hello/HelloACK discovery handshake, or it may be
deferred until later in the call, after the participants engage in
some unencrypted conversation. The Commit message may be manually
activated by a user interface element, such as a GO SECURE button,
which becomes enabled after the Hello/HelloACK discovery phase. This
emulates the user experience of a number of secure phones in the
Public Switched Telephone Network (PSTN) world [comsec]. However, it
is expected that most simple ZRTP user agents will omit such buttons
and proceed directly to secure mode by sending a Commit message
immediately after the Hello/HelloACK handshake.
An example ZRTP call flow is shown in Figure 1. Note that the order
of the Hello/HelloACK exchanges in F1/F2 and F3/F4 may be reversed.
That is, either Alice or Bob might send the first Hello message.
Note that the endpoint that sends the Commit message is considered
the initiator of the ZRTP session and drives the key agreement
exchange. The Diffie-Hellman public values are exchanged in the
DHPart1 and DHPart2 messages. SRTP keys and salts are then
calculated.
The initiator needs to generate its ephemeral key pair before sending
the Commit, and the responder generates its key pair before sending
DHPart1.
Alice Bob
| |
| Alice and Bob establish a media session. |
| They initiate ZRTP on media ports |
| |
| F1 Hello (version, options, Alice’s ZID) |
|-------------------------------------------------->|
| HelloACK F2 |
|<--------------------------------------------------|
| Hello (version, options, Bob’s ZID) F3 |
|<--------------------------------------------------|
| F4 HelloACK |
|-------------------------------------------------->|
| |
| Bob acts as the initiator. |
| |
| Commit (Bob’s ZID, options, hash value) F5 |
|<--------------------------------------------------|
| F6 DHPart1 (pvr, shared secret hashes) |
|-------------------------------------------------->|
| DHPart2 (pvi, shared secret hashes) F7 |
|<--------------------------------------------------|
| |
| Alice and Bob generate SRTP session key. |
| |
| F8 Confirm1 (MAC, D,A,V,E flags, sig) |
|-------------------------------------------------->|
| Confirm2 (MAC, D,A,V,E flags, sig) F9 |
|<--------------------------------------------------|
| F10 Conf2ACK |
|-------------------------------------------------->|
| SRTP begins |
|<=================================================>|
| |
The ZRTP Confirm1 and Confirm2 messages are sent for a number of
reasons, not the least of which is that they confirm that all the key
agreement calculations were successful and thus the encryption will
work. They also carry other information such as the Disclosure flag
(D), the Allow Clear flag (A), the SAS Verified flag (V), and the
Private Branch Exchange (PBX) Enrollment flag (E). All flags are
encrypted to shield them from a passive observer.
4. Protocol Description
ZRTP MUST be multiplexed on the same ports as the RTP media packets.
In all key agreement modes, the initiator SHOULD NOT send RTP media
after sending the Commit message, and it MUST NOT send SRTP media
before receiving either the Conf2ACK or the first SRTP media (with a
valid SRTP auth tag) from the responder. The responder SHOULD NOT
send RTP media after receiving the Commit message, and MUST NOT send
SRTP media before receiving the Confirm2 message.
4.1. Discovery
The Hello message includes the ZRTP version, Hash Type, Cipher Type,
SRTP authentication tag type, Key Agreement Type, and Short
Authentication String (SAS) algorithms that are supported. The Hello
message also includes a hash image as described in Section 9. In
addition, each endpoint sends and discovers ZIDs. The received ZID
is used later in the protocol as an index into a cache of shared
secrets that were previously negotiated and retained between the two
parties.
Each party declares what version of the ZRTP protocol they support
via the version field in the Hello message (Section 5.2). If both
parties have the same version number in their Hello messages, they
can proceed with the rest of the protocol. To facilitate both
parties reaching this state of protocol version agreement in their
Hello messages, ZRTP should use information provided in the signaling
layer, if available. If a ZRTP endpoint supports more than one
version of the protocol, it SHOULD declare them all in a list of SIP
SDP a=zrtp-hash attributes (defined in Section 8), listing separate
hashes, with separate ZRTP version numbers in each item in the list.
Both parties should inspect the list of ZRTP version numbers supplied
by the other party in the SIP SDP a=zrtp-hash attributes. Both
parties SHOULD choose the highest version number that appears in both
parties’ list of a=zrtp-hash version numbers, and use that version
for their Hello messages. If both parties use the SIP signaling in
this manner, their initial Hello messages will have the same ZRTP
version number, provided they both have at least one supported
protocol version in common. Before the ZRTP key agreement can
proceed, an endpoint MUST have sent and received Hellos with the same
protocol version.
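A sketch of choosing the Hello version from the a=zrtp-hash version
lists, assuming version strings of the form "major.minor" (for
example "1.10"). The function names are hypothetical. Parsing into
integer tuples avoids relying on string ordering.

```python
from typing import Iterable, Optional, Tuple


def parse_version(v: str) -> Tuple[int, int]:
    """Turn a version string such as "1.10" into a comparable (1, 10)."""
    major, minor = v.split(".")
    return int(major), int(minor)


def choose_hello_version(mine: Iterable[str], theirs: Iterable[str]) -> Optional[str]:
    """Highest version present in both lists, or None if they share none."""
    common = set(mine) & set(theirs)
    if not common:
        return None
    return max(common, key=parse_version)
```

For example, choose_hello_version(["1.10", "2.00"], ["1.10", "1.20"])
selects "1.10", the only version the two lists share.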
The above comparisons are iterated until the version numbers match,
or until the endpoints fail to find a common version.
For example, assume that Alice supports protocol versions 1.10 and
2.00, and Bob supports versions 1.10 and 1.20. Alice initially
sends a Hello with version 2.00, and Bob initially sends a Hello
with version 1.20. Bob ignores Alice’s 2.00 Hello and continues
to send his 1.20 Hellos. Alice detects that Bob does not support
2.00 and she stops sending her 2.00 Hellos and starts sending a
stream of 1.10 Hellos. Bob sees the 1.10 Hello from Alice and
stops sending his 1.20 Hellos and switches to sending 1.10 Hellos.
At that point, they have converged on using version 1.10 and the
protocol proceeds on that basis.
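The convergence in this example can be simulated. The sketch below
infers its rules from the example rather than quoting them: an
endpoint ignores a Hello with a higher version than its own, and on
seeing a lower version it drops to the highest version it supports
that does not exceed the received one. Both endpoints react in
lock-step rounds, which is a simplification of real packet timing.
All names are hypothetical.

```python
from typing import Iterable, Optional, Tuple


def parse_version(v: str) -> Tuple[int, int]:
    major, minor = v.split(".")
    return int(major), int(minor)


def next_hello_version(supported: Iterable[str], current: str, received: str) -> Optional[str]:
    """Version to send next after seeing the peer's Hello; None means no common version."""
    if parse_version(received) >= parse_version(current):
        return current  # equal: converged; higher: ignore the peer's Hello
    lower = [v for v in supported if parse_version(v) <= parse_version(received)]
    return max(lower, key=parse_version) if lower else None


def converge(alice: Iterable[str], bob: Iterable[str], max_rounds: int = 10) -> Optional[str]:
    alice, bob = list(alice), list(bob)
    a, b = max(alice, key=parse_version), max(bob, key=parse_version)
    for _ in range(max_rounds):
        if a == b:
            return a
        a_next = next_hello_version(alice, a, b)
        b_next = next_hello_version(bob, b, a)
        if a_next is None or b_next is None:
            return None
        a, b = a_next, b_next
    return None
```

Running converge(["1.10", "2.00"], ["1.10", "1.20"]) reproduces the
Alice and Bob example: Alice drops from 2.00 to 1.10, Bob then drops
from 1.20 to 1.10, and the result is "1.10".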
If both endpoints follow this method, they may each start their DH
calculations as soon as they receive the Hello message, and there
will be no need for either endpoint to discard their DH calculation
if the other endpoint becomes the initiator.
This method is used only to negotiate DH key size. For the rest of
the algorithm choices, it’s simply whatever the initiator selects
from the algorithms in common. Note that the DH key size influences
the Hash Type and the size of the symmetric cipher key, as explained
in Section 5.1.5.
o If the two Commits are both Preshared mode, and one party has set
the MiTM (M) flag in the Hello message and the other has not, the
Commit message from the party who set the (M) flag MUST be
discarded, and the one who has not set the (M) flag becomes the
initiator, regardless of the nonce values. In other words, for
Preshared mode, the phone is the initiator and the PBX is the
responder.
o If the two Commits are either both DH modes or both non-DH modes,
then the Commit message with the lowest hvi (hash value of
initiator) value (for DH Commits), or lowest nonce value (for
non-DH Commits), MUST be discarded and the other side is the
initiator, and the protocol proceeds with the initiator’s Commit.
The two hvi or nonce values are compared as large unsigned
integers in network byte order.
If one Commit is for Multistream mode while the other is for
non-Multistream (DH or Preshared) mode, a software error has occurred
and the ZRTP negotiation should be terminated. This should never
occur.
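The contention rules above can be sketched as a single decision
function. The mode labels "DH", "Prsh", and "Mult" and the Commit
record are assumptions of this sketch. The case of one DH Commit
against one Preshared Commit is not covered by the rules quoted here,
so the sketch raises instead of guessing.

```python
from dataclasses import dataclass


@dataclass
class Commit:
    mode: str             # "DH", "Prsh" or "Mult" (labels assumed for this sketch)
    value: bytes          # hvi for a DH Commit, nonce for a non-DH Commit
    m_flag: bool = False  # MiTM (M) flag from the sender's Hello


def our_commit_survives(ours: Commit, theirs: Commit) -> bool:
    """True if we remain the initiator, False if our Commit must be discarded."""
    if (ours.mode == "Mult") != (theirs.mode == "Mult"):
        raise RuntimeError("Multistream vs non-Multistream Commit: terminate ZRTP")
    if ours.mode == theirs.mode == "Prsh" and ours.m_flag != theirs.m_flag:
        # The party that set (M), i.e. the PBX, loses regardless of nonces.
        return not ours.m_flag
    if (ours.mode == "DH") != (theirs.mode == "DH"):
        raise NotImplementedError("DH vs Preshared contention is not covered here")
    # Compare hvi or nonce values as large unsigned integers in network byte order.
    mine = int.from_bytes(ours.value, "big")
    other = int.from_bytes(theirs.value, "big")
    if mine == other:
        raise ValueError("identical hvi/nonce values")
    return mine > other  # the Commit with the lower value is discarded
```

Both endpoints run the same comparison on the same pair of values, so
they reach opposite, consistent conclusions without exchanging any
further messages.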
In the event that Commit messages are sent by both ZRTP endpoints at
the same time, but are received in different media streams, the same
resolution rules apply as if they were received on the same stream.
The media stream in which the Commit was received or sent will
proceed through the ZRTP exchange while the media stream with the
discarded Commit must wait for the completion of the other ZRTP
exchange.
For both the initiator and the responder, the shared secrets s1, s2,
and s3 will be calculated so that they can all be used later to
calculate s0 in Section 4.4.1.4. Here is how s1, s2, and s3 are
calculated by both parties.
If s1, s2, or s3 have null values, they are assumed to have a zero
length for the purposes of hashing them later during the s0
calculation in Section 4.4.1.4.
From these comparisons, s1, s2, and s3 are calculated per the methods
described in Section 4.3. The secrets corresponding to matching
hashes are kept while the secrets corresponding to the non-matching
ones are replaced with a null, which is assumed to have a zero length
for the purposes of hashing them later. The resulting s1, s2, and s3
values are used later to calculate s0 in Section 4.4.1.4.
For example, consider two ZRTP endpoints who share secrets rs1 and
pbxsecret (defined in Section 7.3.1). During the comparison, rs1ID
and pbxsecretID will match but auxsecretID will not. As a result,
s1 = rs1, s2 will be null, and s3 = pbxsecret.
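The example can be sketched as follows, under stated simplifications:
every secret's ID is computed the same way here (an HMAC-SHA-256 keyed
by the secret over a fixed tag, truncated to 64 bits), the rs2
fallback for s1 is omitted, and secret_id and compute_s1_s2_s3 are
hypothetical names. A null secret is an empty byte string, matching
the zero-length convention above.

```python
import hashlib
import hmac
from typing import Dict, Optional, Tuple

NAMES = ("rs1", "auxsecret", "pbxsecret")  # sources of s1, s2, s3 in this sketch


def secret_id(secret: bytes, tag: bytes) -> bytes:
    """Hypothetical ID: HMAC-SHA-256 keyed by the secret, truncated to 64 bits."""
    return hmac.new(secret, tag, hashlib.sha256).digest()[:8]


def keep_if_matching(local: Optional[bytes], received: Optional[bytes], tag: bytes) -> bytes:
    """Keep the local secret if its ID matches the peer's; otherwise null (zero length)."""
    if local is None or received is None:
        return b""
    return local if hmac.compare_digest(secret_id(local, tag), received) else b""


def compute_s1_s2_s3(local: Dict[str, bytes], peer_ids: Dict[str, bytes],
                     tag: bytes = b"Responder") -> Tuple[bytes, bytes, bytes]:
    s1, s2, s3 = (keep_if_matching(local.get(n), peer_ids.get(n), tag) for n in NAMES)
    return s1, s2, s3
```

With rs1 and pbxsecret shared but auxsecret differing, the result is
(rs1, b"", pbxsecret), as in the example.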
If one party has a cached shared secret and the other party does not,
this indicates one of two possible situations. Either there is a
MiTM attack or one of the legitimate parties has lost their cached
shared secret by some mishap. Perhaps they inadvertently deleted
their cache or their cache was lost or disrupted due to restoring
their disk from an earlier backup copy. The party that has the
surviving cache entry can easily detect that a cache mismatch has
occurred, because they expect their own cached secret to match the
other party’s cached secret, but it does not match. It is possible
for both parties to detect this condition if both parties have
surviving cached secrets that have fallen out of sync, due perhaps to
one party restoring from a disk backup.
If either party discovers a cache mismatch, the user agent who makes
this discovery must treat this as a possible security event and MUST
alert their own user that there is a heightened risk of a MiTM
attack, and that the user should verbally compare the SAS with the
other party to ascertain that no MiTM attack has occurred. If a
cache mismatch is detected and it is not possible to compare the SAS,
either because the user interface does not support it or because one
or both endpoints are unmanned devices, and no other SAS comparison
mechanism is available, the session MAY be terminated.
Even if the user interface does not permit an SAS comparison, the
human user MUST be warned and may elect to proceed with the call at
their own risk.
The next step is the generation of a secret for deriving SRTP keying
material. ZRTP uses Diffie-Hellman and two non-Diffie-Hellman modes,
described in the following subsections.
The initiator's DH public value is pvi = g^svi mod p, where g and p
are determined by the Key Agreement Type value. The DH public value
pvi is formatted as a big-endian octet string and fixed to the
bit-length of the DH prime; leading zeros MUST NOT be truncated.
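A minimal sketch of that encoding, using Python's built-in modular
exponentiation. The group parameters are plain inputs here; a real
implementation would take g and p from the negotiated Key Agreement
Type. The function name is hypothetical.

```python
def dh_public_value(g: int, p: int, secret_value: int) -> bytes:
    """Return g^secret_value mod p as a big-endian string as long as p, keeping leading zeros."""
    pv = pow(g, secret_value, p)
    prime_len = (p.bit_length() + 7) // 8  # octet length of the DH prime
    return pv.to_bytes(prime_len, "big")
```

Sizing the output by the prime rather than by the value itself is
what preserves the leading zero octets.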
Note that the Hello message includes the fields shown in Figure 3.
Upon receipt of the Commit message, the responder generates its own
fresh random DH secret value, svr, and computes the public value.
(Note that to speed up processing, this computation can be done in
advance, with no need to discard this computation if both endpoints
chose the same algorithm via Section 4.1.2.) For guidance on random
number generation, see Section 4.8.
Upon receipt of the DHPart2 message, the responder checks that the
initiator’s DH public value is not equal to 1 or p-1. An attacker
might inject a false DHPart2 message with a value of 1 or p-1 for
g^svi mod p, which would cause a disastrously weak final DH result to
be computed. If pvi is 1 or p-1, the user SHOULD be alerted of the
attack and the protocol exchange MUST be terminated. Otherwise, the
responder computes its own value for the hash commitment using the DH
public value (pvi) received in the DHPart2 message and its own Hello
message and compares the result with the hvi received in the Commit
message. If they are different, a MiTM attack is taking place and
the user is alerted and the protocol exchange terminated.
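A sketch of the responder's checks on DHPart2. Two assumptions: the
hash commitment is taken to be hvi = SHA-256(DHPart2 message ||
responder's Hello message), and valid_public_value also rejects values
outside 2..p-2, which goes slightly beyond the 1 and p-1 test quoted
above. Both function names are hypothetical.

```python
import hashlib
import hmac


def valid_public_value(pv: int, p: int) -> bool:
    """Reject 1 and p-1 (and, as an extra guard, anything outside 2..p-2)."""
    return 1 < pv < p - 1


def check_dhpart2(pvi: int, p: int, dhpart2_msg: bytes,
                  own_hello_msg: bytes, hvi_from_commit: bytes) -> None:
    """Raise if pvi is degenerate or if the recomputed hvi differs from the Commit's."""
    if not valid_public_value(pvi, p):
        raise ValueError("pvi is 1 or p-1: alert the user and terminate")
    expected = hashlib.sha256(dhpart2_msg + own_hello_msg).digest()
    if not hmac.compare_digest(expected, hvi_from_commit):
        raise ValueError("hvi mismatch: possible MiTM, alert the user and terminate")
```

The same valid_public_value test applies on the initiator side to pvr
received in DHPart1.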
Upon receipt of the DHPart1 message, the initiator checks that the
responder’s DH public value is not equal to 1 or p-1. An attacker
might inject a false DHPart1 message with a value of 1 or p-1 for
g^svr mod p, which would cause a disastrously weak final DH result to
be computed. If pvr is 1 or p-1, the user should be alerted of the
attack and the protocol exchange MUST be terminated.
A hash of the received and sent ZRTP messages in the current ZRTP
exchange in the following order is calculated by both parties:
Note that only the ZRTP messages (Figures 3, 5, 8, and 9), not the
entire ZRTP packets, are included in the total_hash.
Key Agreement | Size of DHResult
--------------+-----------------
DH-3072       | 384 octets
DH-2048       | 256 octets
ECDH P-256    | 32 octets
ECDH P-384    | 48 octets
The authors believe the calculation of the final shared secret, s0,
is in compliance with the recommendations in Sections 5.8.1 and
6.1.2.1 of NIST SP 800-56A [NIST-SP800-56A]. This is done by hashing
a concatenation of a number of items, including the DHResult, the
ZIDs of the initiator (ZIDi) and the responder (ZIDr), the
total_hash, and the set of non-null shared secrets as described in
Section 4.3.
Note that temporary values s1, s2, and s3 were calculated per the
methods described in Section 4.3. DHResult, s1, s2, and s3 MUST all
be erased from memory immediately after they are used to calculate
s0.
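A sketch of the DH-mode total_hash and s0 computations, assuming
SHA-256 as the negotiated hash, a total_hash over the responder's
Hello, Commit, DHPart1, and DHPart2 in that order, and the s0 field
layout shown in the docstring. The layout and order are assumptions
taken from the full specification, since this excerpt lists only the
ingredients; the function names are hypothetical.

```python
import hashlib
import struct


def total_hash_dh(hello_r: bytes, commit: bytes, dhpart1: bytes, dhpart2: bytes) -> bytes:
    """Assumed order: responder's Hello, Commit, DHPart1, DHPart2 (messages only)."""
    return hashlib.sha256(hello_r + commit + dhpart1 + dhpart2).digest()


def len_prefixed(value: bytes) -> bytes:
    """32-bit big-endian length, then the value; a null secret contributes 0x00000000."""
    return struct.pack(">I", len(value)) + value


def compute_s0_dh(dh_result: bytes, zidi: bytes, zidr: bytes, total_hash: bytes,
                  s1: bytes, s2: bytes, s3: bytes) -> bytes:
    """s0 = hash(counter || DHResult || "ZRTP-HMAC-KDF" || ZIDi || ZIDr || total_hash
    || len(s1) || s1 || len(s2) || s2 || len(s3) || s3)."""
    counter = struct.pack(">I", 1)
    data = (counter + dh_result + b"ZRTP-HMAC-KDF" + zidi + zidr + total_hash
            + len_prefixed(s1) + len_prefixed(s2) + len_prefixed(s3))
    return hashlib.sha256(data).digest()
```

Python offers no reliable way to wipe immutable bytes objects, so the
erasure requirement is not shown; a real implementation would keep
DHResult, s1, s2, and s3 in buffers it can overwrite.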
The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
use of a KDF Context field (per [NIST-SP800-108] guidelines), which
should include the ZIDi, ZIDr, and a nonce value known to both
parties. The total_hash qualifies as a nonce value, because its
computation included nonce material from the initiator’s Commit
message and the responder’s Hello message.
The Preshared key agreement mode can be used to generate SRTP keys
and salts without a DH calculation, instead relying on a shared
secret from previous DH calculations between the endpoints.
All of the explicit length fields, len(), in the above hash are 32-
bit big-endian integers, giving the length in octets of the field
that follows. Some members of the set of shared secrets (rs1,
auxsecret, and pbxsecret) may have lengths of zero if they are null
(not available), and are each preceded by a 4-octet length field.
For example, if auxsecret is null, len(auxsecret) is 0x00000000, and
auxsecret itself would be absent from the hash calculation, which
means len(pbxsecret) would immediately follow len(auxsecret).
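A sketch of the length-prefixed hash, assuming SHA-256 and the field
order rs1, auxsecret, pbxsecret implied above. A null secret is passed
as an empty byte string, so only its 4-octet zero length reaches the
hash. The function names are hypothetical.

```python
import hashlib
import struct


def len_prefixed(value: bytes) -> bytes:
    """32-bit big-endian octet length, then the value itself (absent when null)."""
    return struct.pack(">I", len(value)) + value


def preshared_key(rs1: bytes, auxsecret: bytes = b"", pbxsecret: bytes = b"") -> bytes:
    """hash(len(rs1) || rs1 || len(auxsecret) || auxsecret || len(pbxsecret) || pbxsecret)."""
    return hashlib.sha256(
        len_prefixed(rs1) + len_prefixed(auxsecret) + len_prefixed(pbxsecret)
    ).digest()
```

The explicit lengths keep the encoding unambiguous: without them, a
null auxsecret followed by pbxsecret could hash the same octets as
some other split of the same material.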
Note: Since the nonce is used to calculate different SRTP key and
salt pairs for each session, a duplicate nonce would result in the
same key and salt being generated for the two sessions, which would
have disastrous security consequences.
The responder uses the received keyID to search for matching key
material in its cache. It does this by computing a preshared_key
value and keyID value using the same formula as the initiator,
depending on what is available in the responder’s local cache. If
the locally computed keyID does not match the received keyID in the
Commit, the responder recomputes a new preshared_key and keyID from a
different subset of shared keys from the cache, dropping auxsecret,
pbxsecret, or both from the hash calculation, until a matching
preshared_key is found or it runs out of possibilities. Note that
rs2 is not included in the process.
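The responder's search can be sketched as trying each subset of the
optional secrets until a keyID matches. Assumptions: keyID is an
HMAC-SHA-256 of the preshared_key over the string "Prsh", truncated to
64 bits, and the trial order shown is arbitrary. As stated above, rs2
is never tried. All names are hypothetical.

```python
import hashlib
import hmac
import struct
from typing import Dict, Optional


def len_prefixed(value: bytes) -> bytes:
    return struct.pack(">I", len(value)) + value


def preshared_key(rs1: bytes, aux: bytes = b"", pbx: bytes = b"") -> bytes:
    return hashlib.sha256(len_prefixed(rs1) + len_prefixed(aux) + len_prefixed(pbx)).digest()


def key_id(psk: bytes) -> bytes:
    """Assumed keyID construction: MAC(preshared_key, "Prsh") truncated to 64 bits."""
    return hmac.new(psk, b"Prsh", hashlib.sha256).digest()[:8]


def find_preshared_key(cache: Dict[str, bytes], received_key_id: bytes) -> Optional[bytes]:
    rs1 = cache.get("rs1")
    if rs1 is None:
        return None
    aux = cache.get("auxsecret", b"")
    pbx = cache.get("pbxsecret", b"")
    # Full set first, then drop auxsecret, pbxsecret, or both.
    for a, p in ((aux, pbx), (b"", pbx), (aux, b""), (b"", b"")):
        psk = preshared_key(rs1, a, p)
        if hmac.compare_digest(key_id(psk), received_key_id):
            return psk
    return None
```

A None result means the responder has run out of possibilities and
cannot use Preshared mode with this peer.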
In Preshared mode, both the DHPart1 and DHPart2 messages are skipped.
After receiving the Commit message from the initiator, the responder
sends the Confirm1 message after calculating this stream’s SRTP keys,
as described below.
Preshared mode requires that the s0 and ZRTPSess keys be derived from
the preshared_key, and this must be done in a way that guarantees
uniqueness for each session. This is done by using nonce material
from both parties: the explicit nonce in the initiator’s Preshared
Commit message (Figure 7) and the H3 field in the responder’s Hello
message (Figure 3). Thus, both parties force the resulting shared
secret to be unique for each session.
A hash of the received and sent ZRTP messages in the current ZRTP
exchange for the current media stream is calculated:
Note that only the ZRTP messages (Figures 3 and 7), not the entire
ZRTP packets, are included in the total_hash.
The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
use of a KDF Context field (per [NIST-SP800-108] guidelines), which
should include the ZIDi, ZIDr, and a nonce value known to both
parties. The total_hash qualifies as a nonce value, because its
computation included nonce material from the initiator’s Commit
message and the responder’s Hello message.
At this point in Preshared mode, the two endpoints proceed to the key
derivations of ZRTPSess and the rest of the keys in Section 4.5.2,
now that there is a defined s0.
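A sketch of the Preshared-mode derivation, assuming SHA-256, a
total_hash over the responder's Hello followed by the Commit, a
KDF_Context of ZIDi || ZIDr || total_hash, and the label "ZRTP PSK".
The label and ordering are assumptions from the full specification,
not quoted in this excerpt. The KDF is the single-iteration HMAC
construction of Section 4.5.1; function names are hypothetical.

```python
import hashlib
import hmac
import struct
from typing import Tuple


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    msg = struct.pack(">I", 1) + label + b"\x00" + context + struct.pack(">I", length_bits)
    return hmac.new(ki, msg, hashlib.sha256).digest()[: length_bits // 8]


def preshared_s0(psk: bytes, zidi: bytes, zidr: bytes,
                 hello_r: bytes, commit: bytes) -> Tuple[bytes, bytes]:
    """Return (s0, KDF_Context) for Preshared mode."""
    total_hash = hashlib.sha256(hello_r + commit).digest()  # Hello, then Commit
    kdf_context = zidi + zidr + total_hash
    return zrtp_kdf(psk, b"ZRTP PSK", kdf_context, 256), kdf_context
```

Because the Commit carries the initiator's nonce and the Hello carries
the responder's H3, a new session yields a new s0 even though the
preshared_key is unchanged.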
The Multistream key agreement mode can be used to generate SRTP keys
and salts for additional media streams established between a pair of
endpoints. Multistream mode cannot be used unless there is an active
SRTP session established between the endpoints, which means a ZRTP
Session key is active. This ZRTP Session key can be used to generate
keys and salts without performing another DH calculation. In this
mode, the retained shared secret cache is not used or updated. As a
result, multiple ZRTP Multistream mode exchanges can be processed in
parallel between two endpoints.
Multistream mode is also used to resume a secure call that has gone
clear using a GoClear message as described in Section 4.7.2.1.
Multistream session, a ZRTP endpoint MUST use the same ZID for all
media streams, matching the ZID used in the first media stream.
Note: Since the nonce is used to calculate different SRTP key and
salt pairs for each media stream, a duplicate nonce would result in
the same key and salt being generated for the two media streams,
which would have disastrous security consequences.
If both sides send Multistream Commit messages at the same time, the
contention is resolved and the initiator/responder roles are settled
according to Section 4.2, and the protocol proceeds.
A hash of the received and sent ZRTP messages in the current ZRTP
exchange for the current media stream is calculated:
This refers to the Hello and Commit messages for the current media
stream, which is using Multistream mode, not the original media
stream that included a full DH key agreement. Note that only the
ZRTP messages (Figures 3 and 6), not the entire ZRTP packets, are
included in the hash.
The ZRTP key derivation function (KDF) (Section 4.5.1) requires the
use of a KDF Context field (per [NIST-SP800-108] guidelines), which
should include the ZIDi, ZIDr, and a nonce value known to both
parties. The total_hash qualifies as a nonce value, because its
computation included nonce material from the initiator’s Commit
message and the responder’s Hello message.
The current stream’s SRTP keys and salts for the initiator and
responder are calculated using the ZRTP Session Key ZRTPSess and the
nonces implicitly included in the total_hash. The nonces also ensure
that KDF_Context will be unique for each media stream, which is
critical for security. For each additional media stream, a separate
s0 is derived from ZRTPSess via the ZRTP key derivation function
(Section 4.5.1):
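A sketch of the per-stream derivation, assuming SHA-256, a total_hash
over this stream's responder Hello followed by its Commit, a
KDF_Context of ZIDi || ZIDr || total_hash, and the label "ZRTP MSK".
The label is an assumption from the full specification; function
names are hypothetical.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    msg = struct.pack(">I", 1) + label + b"\x00" + context + struct.pack(">I", length_bits)
    return hmac.new(ki, msg, hashlib.sha256).digest()[: length_bits // 8]


def multistream_s0(zrtp_sess: bytes, zidi: bytes, zidr: bytes,
                   hello_r: bytes, commit: bytes) -> bytes:
    """s0 for an additional stream, keyed by ZRTPSess; no DH computation needed."""
    total_hash = hashlib.sha256(hello_r + commit).digest()  # this stream's messages only
    kdf_context = zidi + zidr + total_hash
    return zrtp_kdf(zrtp_sess, b"ZRTP MSK", kdf_context, 256)
```

Two streams that share ZRTPSess still get distinct s0 values because
each stream's Hello and Commit carry fresh nonce material.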
Note that the ZRTPSess key was previously derived from material that
also includes a different and more inclusive total_hash from the
entire packet sequence that performed the original DH exchange for
the first media stream in this ZRTP session.
The authors believe the ZRTP KDF is in full compliance with the
recommendations in NIST SP 800-108 [NIST-SP800-108]. Section 7.5 of
the NIST document describes "key separation", which is a security
requirement for the cryptographic keys derived from the same key
derivation key. The keys shall be separate in the sense that the
compromise of some derived keys will not degrade the security
strength of any of the other derived keys or the security strength of
the key derivation key. Strong preimage resistance is provided.
The ZRTP KDF runs the NIST pseudorandom function (PRF) in counter
mode, with only a single iteration of the counter. The NIST PRF is
based on the HMAC function. The ZRTP KDF never has to generate more
than 256 bits (or 384 bits for Suite B applications) of output key
material, so only a single invocation of the HMAC function is needed.
The ZRTP KDF is defined in this manner, per Sections 5 and 5.1 of
[NIST-SP800-108]:

KDF(KI, Label, Context, L) = HMAC(KI, i || Label || 0x00 || Context || L)
The HMAC in the KDF is keyed by KI, which is a secret key derivation
key that is unknown to the wiretapper (for example, s0). The HMAC is
computed on a concatenated set of nonsecret fields that are defined
as follows. The first field is a 32-bit big-endian integer counter
(i) required by NIST to be included in the HMAC each time the HMAC is
computed, which we have set to the fixed value of 0x00000001 because we
only compute the HMAC once. Label is a string of nonzero octets that
identifies the purpose for the derived keying material. The octet
0x00 is a delimiter required by NIST. The NIST KDF formula has a
"Context" field that includes ZIDi, ZIDr, and some optional nonce
material known to both parties. L is a 32-bit big-endian positive
integer, not to exceed the length in bits of the output of the HMAC.
The output of the KDF is truncated to the leftmost L bits. If SHA-
384 is the negotiated hash algorithm, the HMAC would be HMAC-SHA-384;
thus, the maximum value of L would be 384, the negotiated hash
length.
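A sketch of the KDF as described above, using Python's hmac module.
Restricting L to multiples of 8 is a simplification of this sketch
(all ZRTP output lengths used here, such as 112, 128, 256, and 384,
satisfy it); the function name is hypothetical. KDF_Context would
typically be ZIDi || ZIDr || total_hash.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int,
             hash_name: str = "sha256") -> bytes:
    """KDF(KI, Label, Context, L) = HMAC(KI, i || Label || 0x00 || Context || L), truncated."""
    max_bits = hashlib.new(hash_name).digest_size * 8
    if length_bits <= 0 or length_bits > max_bits or length_bits % 8:
        raise ValueError("L must be a positive multiple of 8, at most the hash length")
    if b"\x00" in label:
        raise ValueError("Label must consist of nonzero octets")
    counter = struct.pack(">I", 1)            # i: fixed 32-bit counter, one iteration
    l_field = struct.pack(">I", length_bits)  # L: 32-bit big-endian length in bits
    msg = counter + label + b"\x00" + context + l_field
    # Keep the leftmost L bits of the HMAC output.
    return hmac.new(ki, msg, hash_name).digest()[: length_bits // 8]
```

Since L is part of the HMAC input, a 128-bit output is not simply the
first half of the 256-bit output for the same label and context.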
The ZRTP KDF is not to be confused with the SRTP KDF defined in
[RFC3711].
Both DH mode and Preshared mode (but not Multistream mode) come to
this common point in the protocol to derive ZRTPSess and the SAS from
s0, via the ZRTP Key Derivation Function (Section 4.5.1). At this
point, s0 has been calculated, as well as KDF_Context. These
calculations are done only for the first media stream, not for
Multistream mode.
The ZRTPSess key is used only for these two purposes: 1) to generate
the additional s0 keys (Section 4.4.3.2) for adding additional media
streams to this session in Multistream mode, and 2) to generate the
pbxsecret (Section 7.3.1) that may be cached for use in future
sessions. The ZRTPSess key is kept for the duration of the call
signaling session between the two ZRTP endpoints. That is, if there
are two separate calls between the endpoints (in SIP terms, separate
SIP dialogs), then a ZRTP Session Key MUST NOT be used across the two
call signaling sessions. ZRTPSess MUST be destroyed no later than
the end of the call signaling session.
Note that KDF_Context is unique for each media stream, but only the
first media stream is permitted to calculate ZRTPSess.
Despite the exposure of the SAS to the two parties, the rest of the
keying material is protected by the key separation properties of the
KDF (Section 4.5.1).
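A sketch of deriving ZRTPSess and the SAS material from s0, assuming
SHA-256, the labels "ZRTP Session Key" and "SAS", and an SAS value
taken from the leftmost 32 bits of the SAS hash. These labels and the
truncation come from the full specification and are assumptions here;
rendering the value as displayable characters is omitted.

```python
import hashlib
import hmac
import struct
from typing import Tuple


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    msg = struct.pack(">I", 1) + label + b"\x00" + context + struct.pack(">I", length_bits)
    return hmac.new(ki, msg, hashlib.sha256).digest()[: length_bits // 8]


def derive_session_key_and_sas(s0: bytes, kdf_context: bytes) -> Tuple[bytes, bytes, bytes]:
    zrtp_sess = zrtp_kdf(s0, b"ZRTP Session Key", kdf_context, 256)
    sashash = zrtp_kdf(s0, b"SAS", kdf_context, 256)
    sasvalue = sashash[:4]  # leftmost 32 bits, from which the displayed SAS is rendered
    return zrtp_sess, sashash, sasvalue
```

Different labels keyed by the same s0 give independent outputs, which
is why showing the SAS to users reveals nothing about ZRTPSess.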
MAY be derived and "exported" from the ZRTP protocol and provided as
a shared secret to the VoIP client for these non-VoIP purposes. The
application can use this exported key in application-specific ways,
outside the scope of the ZRTP protocol.
The application may use this exported key to derive other subkeys for
various non-ZRTP purposes, via a KDF using separate KDF label strings
defined by the application. This key or its derived subkeys can be
used for encryption, or used to authenticate other key exchanges
carried out by the application, protected by ZRTP’s MiTM defense
umbrella. The exported key and its descendants may be used for as
long as needed by the application, maintained in a separate crypto
context that may outlast the VoIP session.
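A sketch of exporting a key and deriving an application subkey from
it, assuming SHA-256 and the label "Exported key" for the ZRTP side.
The application label and the empty context passed to app_subkey are
hypothetical choices an application might make, not anything ZRTP
defines.

```python
import hashlib
import hmac
import struct


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    msg = struct.pack(">I", 1) + label + b"\x00" + context + struct.pack(">I", length_bits)
    return hmac.new(ki, msg, hashlib.sha256).digest()[: length_bits // 8]


def exported_key(s0: bytes, kdf_context: bytes) -> bytes:
    """Key handed to the application; label assumed."""
    return zrtp_kdf(s0, b"Exported key", kdf_context, 256)


def app_subkey(exported: bytes, app_label: bytes) -> bytes:
    """Hypothetical application-defined subkey under its own label."""
    return zrtp_kdf(exported, app_label, b"", 256)
```

Giving each application purpose its own label keeps subkeys separate,
so compromising one does not expose the others.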
DH mode, Multistream mode, and Preshared mode all come to this common
point in the protocol to derive a set of keys from s0. It can be
assumed that s0 has been calculated, as well as the ZRTPSess key and
KDF_Context. A separate s0 key is associated with each media stream.
Subkeys are not drawn directly from s0, as done in NIST SP 800-56A.
To enhance key separation, ZRTP uses s0 to key a Key Derivation
Function (Section 4.5.1) based on [NIST-SP800-108]. Since s0 already
included total_hash in its derivation, it is redundant to use
total_hash again in the KDF Context in all the invocations of the KDF
keyed by s0. Nonetheless, NIST SP 800-108 always requires KDF
Context to be defined for the KDF, and nonce material is required in
some KDF invocations (especially for Multistream mode and Preshared
mode), so total_hash is included as a nonce in the KDF Context.
Separate SRTP master keys and master salts are derived for use in
each direction for each media stream. Unless otherwise specified,
ZRTP uses SRTP with no Master Key Identifier (MKI), 32-bit
authentication using HMAC-SHA1, AES-CM 128 or 256-bit key length,
112-bit session salt key length, 2^48 key derivation rate, and SRTP
prefix length 0. Secure RTCP (SRTCP) is also used, deriving the
SRTCP keys from the same master keys and salts as SRTP, using the
mechanisms specified in [RFC3711], without requiring a separate ZRTP
negotiation for RTCP.
The ZRTP initiator encrypts and the ZRTP responder decrypts packets
by using srtpkeyi and srtpsalti, while the ZRTP responder encrypts
and the ZRTP initiator decrypts packets by using srtpkeyr and
srtpsaltr. The SRTP key and salt values are truncated (taking the
leftmost bits) to the length determined by the chosen SRTP profile.
These are generated by:
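A sketch of the per-direction SRTP master keys and salts, assuming
SHA-256 for the KDF and labels of the form "Initiator SRTP master key"
and "Initiator SRTP master salt" (likewise for "Responder"). The
labels are assumptions from the full specification. The key length
follows the negotiated AES key size and the salt is 112 bits.

```python
import hashlib
import hmac
import struct
from typing import Dict, Tuple


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    msg = struct.pack(">I", 1) + label + b"\x00" + context + struct.pack(">I", length_bits)
    return hmac.new(ki, msg, hashlib.sha256).digest()[: length_bits // 8]


def srtp_master_keys(s0: bytes, kdf_context: bytes,
                     aes_key_bits: int = 128) -> Dict[str, Tuple[bytes, bytes]]:
    """Map each role to its (master key, master salt) pair."""
    keys = {}
    for role in ("Initiator", "Responder"):
        key = zrtp_kdf(s0, f"{role} SRTP master key".encode(), kdf_context, aes_key_bits)
        salt = zrtp_kdf(s0, f"{role} SRTP master salt".encode(), kdf_context, 112)
        keys[role] = (key, salt)
    return keys
```

The initiator encrypts with keys["Initiator"] and decrypts with
keys["Responder"]; the responder does the opposite.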
The MAC keys are the same length as the output of the underlying hash
function in the KDF and are thus generated without truncation. They
are used only by ZRTP and not by SRTP. Different MAC keys are needed
for the initiator and the responder to ensure that GoClear messages
in each direction are unique and cannot be cached by an attacker and
reflected back to the endpoint.
ZRTP keys are generated for the initiator and responder to use to
encrypt the Confirm1 and Confirm2 messages. They are truncated to
the same size as the negotiated SRTP key size.
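A sketch of the MAC keys and the Confirm-message encryption keys,
assuming SHA-256 and labels of the form "Initiator HMAC key" and
"Initiator ZRTP key" (likewise for "Responder"). The labels are
assumptions from the full specification; the function name is
hypothetical.

```python
import hashlib
import hmac
import struct
from typing import Dict


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    msg = struct.pack(">I", 1) + label + b"\x00" + context + struct.pack(">I", length_bits)
    return hmac.new(ki, msg, hashlib.sha256).digest()[: length_bits // 8]


def zrtp_internal_keys(s0: bytes, kdf_context: bytes, hash_bits: int = 256,
                       aes_key_bits: int = 128) -> Dict[str, bytes]:
    return {
        # MAC keys: full hash length, no truncation
        "mackeyi": zrtp_kdf(s0, b"Initiator HMAC key", kdf_context, hash_bits),
        "mackeyr": zrtp_kdf(s0, b"Responder HMAC key", kdf_context, hash_bits),
        # Confirm encryption keys: negotiated SRTP (AES) key size
        "zrtpkeyi": zrtp_kdf(s0, b"Initiator ZRTP key", kdf_context, aes_key_bits),
        "zrtpkeyr": zrtp_kdf(s0, b"Responder ZRTP key", kdf_context, aes_key_bits),
    }
```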
4.6. Confirmation
The Confirm1 and Confirm2 messages (Figure 10) contain the cache
expiration interval (defined in Section 4.9) for the newly generated
retained shared secret. The flagoctet is an 8-bit unsigned integer
made up of these flags: the PBX Enrollment flag (E) defined in
Section 7.3.1, the SAS Verified flag (V) defined in Section 7.1, the
Allow Clear flag (A) defined in Section 4.7.2, and the Disclosure
flag (D) defined in Section 11.
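A sketch of packing and unpacking the flagoctet. The bit positions
are an assumption of this sketch (E, V, A, D in the four low-order
bits, with D least significant); check them against Figure 10 before
relying on them.

```python
from typing import Dict

FLAG_D = 0x01  # Disclosure (assumed bit position)
FLAG_A = 0x02  # Allow Clear
FLAG_V = 0x04  # SAS Verified
FLAG_E = 0x08  # PBX Enrollment


def pack_flags(d: bool = False, a: bool = False, v: bool = False, e: bool = False) -> int:
    return (FLAG_D if d else 0) | (FLAG_A if a else 0) | (FLAG_V if v else 0) | (FLAG_E if e else 0)


def unpack_flags(octet: int) -> Dict[str, bool]:
    if not 0 <= octet <= 0xFF:
        raise ValueError("flagoctet is an 8-bit unsigned integer")
    return {
        "D": bool(octet & FLAG_D),
        "A": bool(octet & FLAG_A),
        "V": bool(octet & FLAG_V),
        "E": bool(octet & FLAG_E),
    }
```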
Part of each Confirm1 and Confirm2 message is encrypted using
full-block Cipher Feedback Mode, and each message contains a 128-bit
random Cipher FeedBack (CFB) Initialization Vector (IV). The Confirm1
and Confirm2
messages also contain a MAC covering the encrypted part of the
Confirm1 or Confirm2 message that includes a string of zeros, the
signature length, flag octet, cache expiration interval, signature
type block (if present), and signature (Section 7.2) (if present).
For the responder:
After receiving the Confirm messages, both parties must now update
their retained shared secret rs1 in their respective caches, provided
the following conditions hold:
For DH mode only, before updating the retained shared secret rs1 in
the cache, each party first discards their old rs2 and copies their
old rs1 to rs2. The old rs1 is saved to rs2 because of the risk of
session interruption after one party has updated his own rs1 but
before the other party has enough information to update her own rs1.
If that happens, they may regain cache sync in the next session by
using rs2 (per Section 4.3). This mitigates the well-known Two
Generals’ Problem [Byzantine]. The old rs1 value is not saved in
Preshared mode.
For DH mode and Preshared mode, both parties compute a new rs1 value
from s0 via the ZRTP key derivation function (Section 4.5.1):
Note that KDF_Context is unique for each media stream, but only the
first media stream is permitted to update rs1.
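A sketch of the cache update, assuming SHA-256, the label
"retained secret", and a 256-bit rs1. The label is an assumption from
the full specification. The caller is assumed to have already checked
the update conditions, which this excerpt does not list; the function
name is hypothetical.

```python
import hashlib
import hmac
import struct
from typing import Dict


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int) -> bytes:
    msg = struct.pack(">I", 1) + label + b"\x00" + context + struct.pack(">I", length_bits)
    return hmac.new(ki, msg, hashlib.sha256).digest()[: length_bits // 8]


def update_retained_secrets(cache: Dict[str, bytes], s0: bytes,
                            kdf_context: bytes, dh_mode: bool) -> None:
    if dh_mode:
        cache.pop("rs2", None)           # discard the old rs2
        if "rs1" in cache:
            cache["rs2"] = cache["rs1"]  # keep the old rs1 for cache resync
    # Preshared mode leaves rs2 untouched and does not save the old rs1.
    cache["rs1"] = zrtp_kdf(s0, b"retained secret", kdf_context, 256)
```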
Each media stream has its own s0. At this point in the protocol for
each media stream, the corresponding s0 MUST be erased.
4.7. Termination
Because no key agreement has been reached, the Error message cannot
use the same MAC protection as the GoClear message. A denial of
service is possible by injecting fake Error messages. (However, even
if the Error message were somehow designed with integrity protection,
it would raise other questions. What would a badly formed Error
message mean if it were sent to report a badly formed message? A
good message?)
Both of these MACs are calculated across the 8-octet "GoClear "
Message Type Block, including the trailing space.
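A sketch of the GoClear MAC, assuming HMAC-SHA-256 keyed by the
sender's MAC key (mackeyi or mackeyr) and truncated to 64 bits. The
truncation length is an assumption of this sketch; the function names
are hypothetical.

```python
import hashlib
import hmac

GOCLEAR_BLOCK = b"GoClear "  # 8-octet Message Type Block, trailing space included


def goclear_mac(mac_key: bytes) -> bytes:
    """MAC over the GoClear block, truncated to 64 bits (truncation assumed)."""
    return hmac.new(mac_key, GOCLEAR_BLOCK, hashlib.sha256).digest()[:8]


def verify_goclear(mac_key: bytes, received_mac: bytes) -> bool:
    return hmac.compare_digest(goclear_mac(mac_key), received_mac)
```

Because each direction uses its own MAC key, a GoClear captured from
one endpoint fails verification if reflected back to it.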
After sending a GoClear message, the ZRTP endpoint stops sending SRTP
packets. When a ClearACK is received, the ZRTP endpoint deletes the
crypto context for the SRTP session, as defined in Section 4.7.2.1,
and may then resume sending RTP packets.
After the users have transitioned from SRTP media back to RTP media
(clear mode), they may decide later to return to secure mode by
manual activation, usually by pressing a GO SECURE button. In that
case, a new secure session is initiated by the party that presses the
button, by sending a new Commit message, leading to a new session key
negotiation. It is not necessary to send another Hello message, as
the two parties have already done that at the start of the call and
thus have already discovered each other’s ZRTP capabilities. It is
possible for users to toggle back and forth between clear and secure
modes multiple times in the same session, just as they could in the
old days of secure PSTN phones.
All SRTP session key material MUST be erased by the receiver of the
GoClear message upon receiving a properly authenticated GoClear. The
same key destruction MUST be done by the sender of the GoClear message,
upon receiving the ClearACK. This must be done for the key material
for all of the media streams.
All key material that would have been erased at the end of the SIP
session MUST be erased, as described in Section 4.7.3, with the
single exception of ZRTPSess. In this case, ZRTPSess is destroyed in
a manner different from the other key material. Both parties replace
ZRTPSess with a KDF-derived non-invertible function of itself:
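The one-way replacement of ZRTPSess can be sketched as below. The label "New ZRTP Session", the context ZIDi || ZIDr, and the use of SHA-256 as the negotiated hash are assumptions for illustration. The KDF form repeats the single-HMAC counter-mode construction assumed for Section 4.5.1. The function name is hypothetical.

```python
import hashlib
import hmac


def kdf(ki: bytes, label: bytes, context: bytes, length_bits: int,
        hash_name: str = "sha256") -> bytes:
    # Assumed form: HMAC(KI, i || Label || 0x00 || Context || L), i = 1
    msg = ((1).to_bytes(4, "big") + label + b"\x00" + context
           + length_bits.to_bytes(4, "big"))
    return hmac.new(ki, msg, hash_name).digest()[: length_bits // 8]


def next_zrtp_sess(zrtp_sess: bytes, zidi: bytes, zidr: bytes) -> bytes:
    """Replace ZRTPSess with a non-invertible function of itself."""
    hash_len_bits = hashlib.sha256().digest_size * 8  # negotiated hash length
    return kdf(zrtp_sess, b"New ZRTP Session", zidi + zidr, hash_len_bits)
```

Because HMAC is one-way, an attacker who later captures the new ZRTPSess cannot recover the old value, and therefore cannot recover the SRTP keys that were derived from it.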
All SRTP session key material MUST be erased by both parties at the
end of the call. In particular, the destroyed key material includes
the SRTP session keys and salts, SRTP master keys and salts, and all
material sufficient to reconstruct the SRTP keys and salts, including
ZRTPSess and s0 (although s0 should have been destroyed earlier, in
Section 4.6.1). This must be done for the key material for all of
the media streams. The only exceptions are the cached shared secrets
needed for future sessions, including rs1, rs2, and pbxsecret.
The ZRTP protocol uses random numbers for cryptographic key material,
notably for the DH secret exponents and nonces, which must be freshly
generated with each session. Whenever a random number is needed, all
of the following criteria must be satisfied:
Random numbers MUST be freshly generated, meaning that they must not
have been used in a previous calculation.
Each instance of ZRTP has a unique 96-bit random ZRTP ID, or ZID,
that is generated once at installation time. It is used to look up
retained shared secrets in a local cache. A single global ZID for a
single installation is the simplest way to implement ZIDs. However,
it is specifically not precluded for an implementation to use
multiple ZIDs, up to the limit of a separate one per callee. This
then turns it into a long-lived "association ID" that does not apply
to any other associations between a different pair of parties. It is
a goal of this protocol to permit both options to interoperate
freely. A PBX acting as a trusted man in the middle will also
generate a single ZID and use that ZID for all endpoints behind it,
as described in Section 10.
The ZID should not be hard coded or hard defined in the firmware of a
product. It should be randomly generated by the software and stored
at installation or initialization time. It should be randomly
generated rather than allocated from a preassigned range of ZID
values, because 96 bits should be enough to avoid birthday collisions
in realistic scenarios.
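As a rough check of the birthday argument, the sketch below generates a ZID with Python's `secrets` module. It then estimates the collision probability among n independently generated 96-bit ZIDs using the standard approximation 1 - exp(-n(n-1)/2^97). The function names are hypothetical.

```python
import math
import secrets

ZID_BITS = 96


def new_zid() -> bytes:
    """Generate a random 96-bit ZID once, at installation time."""
    return secrets.token_bytes(ZID_BITS // 8)


def collision_probability(installations: int, bits: int = ZID_BITS) -> float:
    """Birthday-bound estimate that any two of n random IDs collide."""
    n = installations
    return -math.expm1(-n * (n - 1) / 2 ** (bits + 1))  # 1 - exp(-x)
```

For a billion installations the estimate is below 10^-11. A collision becomes likely only around 2^48 installations.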
5. ZRTP Messages
All ZRTP messages use the message format defined in Figure 2. All
word lengths referenced in this specification are 32 bits, or 4
octets. All integer fields are carried in network byte order, that
is, most-significant byte (octet) first, commonly known as big-
endian.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 0 0 1|Not Used (set to zero) | Sequence Number |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Magic Cookie ’ZRTP’ (0x5a525450) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Identifier |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| ZRTP Message (length depends on Message Type) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CRC (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Source Identifier is the SSRC number of the RTP stream to which this
ZRTP packet relates. For cases of forking or forwarding, RTP, and
hence ZRTP, may arrive at the same port from several different
sources -- each of these sources will have a different SSRC and may
initiate an independent ZRTP protocol session. SSRC collisions would
be disruptive to ZRTP. SSRC collision handling procedures are
described in Section 4.1.
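The fixed 12-octet packet header in Figure 2 can be packed and checked with Python's `struct` module, as in this sketch. The function names are hypothetical, and the CRC trailer is left out.

```python
import struct

ZRTP_MAGIC = 0x5A525450          # 'ZRTP'
HEADER = struct.Struct(">HHII")  # 0001 + unused bits, sequence, cookie, SSRC


def pack_header(seq: int, ssrc: int) -> bytes:
    # Leading bits 0001 followed by 12 zero bits give 0x1000
    return HEADER.pack(0x1000, seq & 0xFFFF, ZRTP_MAGIC, ssrc)


def parse_header(packet: bytes) -> tuple[int, int]:
    """Return (sequence number, SSRC); unused bits are not checked."""
    first, seq, magic, ssrc = HEADER.unpack_from(packet)
    if first >> 12 != 0b0001 or magic != ZRTP_MAGIC:
        raise ValueError("not a ZRTP packet")
    return seq, ssrc
```

A receiver demultiplexing several sources on one port could key its per-session ZRTP state on the SSRC returned here.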
The hash algorithm and its related MAC algorithm are negotiated via
the Hash Type Block found in the Hello message (Section 5.2) and the
Commit message (Section 5.4).
Note that the Hash Type refers to the hash algorithm that will be used
throughout the ZRTP key exchange, not the hash algorithm to be used in
the SRTP Authentication Tag.
At the time of this writing, the NIST SHA-3 hashes [SHA-3] are not
yet available. NIST is expected to publish SHA-3 in 2012, as a
successor to the SHA-2 hashes in [FIPS-180-3].
ZRTP makes use of message authentication codes (MACs) that are keyed
hashes based on the negotiated Hash Type. For the SHA-2 and SHA-3
hashes, the negotiated MAC is the HMAC based on the negotiated hash.
This MAC function is also used in the ZRTP key derivation function
(Section 4.5.1).
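A minimal sketch of selecting the HMAC from the negotiated Hash Type follows. Only the SHA-2 types "S256" and "S384" are mapped, and the mapping table and function name are hypothetical.

```python
import hmac

# Hash Type Block -> hashlib algorithm name (SHA-2 types only)
HASH_TYPES = {b"S256": "sha256", b"S384": "sha384"}


def negotiated_mac(hash_type: bytes, key: bytes, data: bytes) -> bytes:
    """HMAC built on the negotiated hash; output is the full digest length."""
    try:
        digestmod = HASH_TYPES[hash_type]
    except KeyError:
        raise ValueError(f"unsupported Hash Type {hash_type!r}") from None
    return hmac.new(key, data, digestmod).digest()
```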
The negotiated Hash Type does not apply to the hash used in the
digital signature defined in Section 7.2. For example, even if the
negotiated Hash Type is SHA-256, the digital signature may use SHA-
384 if an Elliptic Curve Digital Signature Algorithm (ECDSA) P-384
signature key is used. Digital signatures are optional in ZRTP.
A future hash may include its own built-in MAC, not based on the HMAC
construct, for example, the Skein hash function [Skein]. If NIST
chooses such a hash as the SHA-3 winner, Hash Types "N256" and
"N384" will still use the related HMAC as the negotiated MAC. If an
implementer wishes to use Skein and its built-in MAC as the
negotiated MAC, new Hash Types must be used.
While most of the hash and MAC usage in ZRTP is defined by the
negotiated Hash Type (Section 5.1.2), some hashes and MACs must be
precomputed prior to negotiations, and thus cannot have their
algorithms negotiated during the ZRTP exchange. They are implicitly
predetermined to use SHA-256 [FIPS-180-3] and HMAC-SHA-256.
These are the hashes and MACs that MUST use the Implicit hash and MAC
algorithm:
The block cipher algorithm is negotiated via the Cipher Type Block
found in the Hello message (Section 5.2) and the Commit message
(Section 5.4).
All ZRTP endpoints MUST support AES-128 (AES1) and MAY support AES-
192 (AES2), AES-256 (AES3), or other Cipher Types. The Advanced
Encryption Standard is defined in [FIPS-197].
Other block ciphers may be supported that have the same block size
and key sizes as AES. If implemented, they may be used anywhere in
ZRTP or SRTP in place of AES, in the same modes of operation and key
size: notably, in counter mode to replace AES-CM in [RFC3711] and
[RFC6188], and in CFB mode to encrypt a portion of the Confirm
message (Figure 10) and SASrelay message (Figure 16). ZRTP
endpoints MAY support the TwoFish [TwoFish] block cipher.
ZRTP endpoints MAY support 32-bit and 64-bit SRTP authentication tags
based on the Skein hash function [Skein]. The Skein-512-MAC key
length is fixed at 256 bits for this application, and the output
length is adjustable. The Skein MAC is defined in Sections 2.6 and
4.3 of [Skein] and is not based on the HMAC construct. Reference
implementations for Skein may be found at [Skein1]. A Skein-based
MAC is significantly more efficient than HMAC-SHA1, especially for
short SRTP payloads.
The Skein MAC key is computed by the SRTP key derivation function,
which is also referred to as the AES-CM PRF, or pseudorandom
function. This is defined either in [RFC3711] or in [RFC6188],
depending on the SRTP AES key length.
Implementers should be aware that AES-GCM and AES-CCM for SRTP are
expected to become available when [SRTP-AES-GCM] is published as an
RFC. If an implementer wishes to use these modes when they become
available, new Auth Tag Types must be added.
All ZRTP endpoints MUST support DH3k, SHOULD support Preshared, and
MAY support EC25, EC38, and DH2k.
The table below lists the pv length in words and DHPart1 and DHPart2
message length in words for each Key Agreement Type Block.
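The DHPart word counts can be derived from the DHPart1/DHPart2 layouts shown later in this section. Each message has 21 fixed words: 1 for the preamble and length, 2 for the type block, 8 for H1, 8 for the four 2-word secret IDs, and 2 for the MAC. The public value pv is added to that. The pv sizes below assume that the finite-field values are 3072 and 2048 bits. They also assume that each elliptic-curve public value carries both x and y coordinates of 256 or 384 bits. Treat the output as a sketch to check against the specification's table.

```python
WORD_BITS = 32
DHPART_FIXED_WORDS = 1 + 2 + 8 + 4 * 2 + 2  # header, type, H1, 4 IDs, MAC

# Public value sizes in bits (assumption: EC values carry both x and y)
PV_BITS = {
    "DH3k": 3072,
    "DH2k": 2048,
    "EC25": 2 * 256,
    "EC38": 2 * 384,
}


def pv_words(ka_type: str) -> int:
    return PV_BITS[ka_type] // WORD_BITS


def dhpart_words(ka_type: str) -> int:
    return DHPART_FIXED_WORDS + pv_words(ka_type)


for ka in PV_BITS:
    print(ka, pv_words(ka), dhpart_words(ka))
# DH3k 96 117
# DH2k 64 85
# EC25 16 37
# EC38 24 45
```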
The SAS Type determines how the SAS is rendered to the user so that
the user may verbally compare it with his partner over the voice
channel. This allows detection of a MiTM attack.
All ZRTP endpoints MUST support the base32 rendering scheme for the
Short Authentication String and MAY support the base256 and other SAS
rendering schemes. See Section 4.5.2 for how the sasvalue is computed
and Section 7 for how the SAS is used.
For the SAS Type of "B32 ", the most-significant (leftmost) 20 bits
of the 32-bit sasvalue are rendered as a form of base32 encoding.
The leftmost 20 bits of the sasvalue result in four base32
characters that are rendered, most-significant quintet first, to both
ZRTP endpoints. Here is a normative pseudocode implementation of the
base32 function:
This base32 encoding scheme differs from RFC 4648 and was designed
(by Bryce Wilcox-O’Hearn) to represent bit sequences in a form that
is convenient for human users to manipulate with minimal ambiguity.
The unusually permuted character ordering was designed for other
applications that use bit sequences that do not end on quintet
boundaries.
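The normative pseudocode referred to above is not reproduced in this excerpt. The Python sketch below assumes the z-base-32 character ordering "ybndrfg8ejkmcpqxot1uwisza345h769", and it should be checked against the normative pseudocode before use. It takes the most-significant 20 bits of the 32-bit sasvalue and emits four characters, most-significant quintet first. The function name is hypothetical.

```python
# Assumed z-base-32 style alphabet for the "B32 " SAS rendering
ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"


def sas_base32(sasvalue: int) -> str:
    """Render the leftmost 20 bits of a 32-bit sasvalue as 4 characters."""
    chars = []
    for shift in (27, 22, 17, 12):  # quintets from bit 31 down to bit 12
        chars.append(ALPHABET[(sasvalue >> shift) & 0x1F])
    return "".join(chars)


print(sas_base32(0x08000000))  # byyy
```

The low 12 bits of sasvalue do not affect the rendered string.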
Signature    | Meaning
Type Block   |
-------------+----------------------------------
"PGP "       | OpenPGP Signature, per RFC 4880
-------------+----------------------------------
"X509"       | ECDSA, with X.509v3 cert
             | per RFC 5759 and FIPS-186-3
-------------+----------------------------------
All ZRTP messages begin with the preamble value 0x505a, then a 16-bit
length in 32-bit words. This length includes only the ZRTP message
(including the preamble and the length) but not the ZRTP packet
header or CRC. The 8-octet Message Type follows the length field.
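A minimal sketch of reading the common message header follows: the 0x505a preamble, the 16-bit length in words, and the 8-octet Message Type Block. It also shows how to build one of the fixed 3-word acknowledgment messages. The function names are hypothetical.

```python
import struct

PREAMBLE = 0x505A


def parse_message_header(msg: bytes) -> tuple[str, int]:
    """Return (message type, message length in octets)."""
    preamble, length_words = struct.unpack_from(">HH", msg)
    if preamble != PREAMBLE:
        raise ValueError("bad ZRTP preamble")
    msg_type = msg[4:12].decode("ascii")  # 8 octets, space padded
    return msg_type, length_words * 4


def build_simple_message(msg_type: bytes) -> bytes:
    """Build a 3-word message such as HelloACK or Conf2ACK."""
    if len(msg_type) != 8:
        raise ValueError("Message Type Block must be 8 octets")
    return struct.pack(">HH", PREAMBLE, 3) + msg_type
```

The length covers the preamble, the length field, and the message body only, so the 12-octet packet header and the CRC are not counted.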
o The MiTM flag (M) is a Boolean that is set to true if and only if
this Hello message is sent from a device, usually a PBX, that has
the capability to send an SASrelay message (Section 5.13).
The next 8 bits are unused and SHOULD be set to zero when sent and
MUST be ignored on receipt.
The 64-bit MAC at the end of the message is computed across the whole
message, not including the MAC, using the MAC algorithm defined in
Section 5.1.2.2. The MAC key is the sender’s H2 (defined in
Section 9), and thus the MAC cannot be checked by the receiving party
until the sender’s H2 value is known to the receiving party later in
the protocol.
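The deferred check of the Hello MAC can be sketched as follows. Section 9 is not part of this excerpt, so the relation H(n+1) = SHA-256(H(n)) is an assumption here. The implicit MAC is taken as HMAC-SHA-256, per the implicit-algorithm rule in this section. The 64-bit MAC field is assumed to hold the leftmost 64 bits of the HMAC output. Function names are hypothetical.

```python
import hashlib
import hmac
import secrets


def hash_chain() -> list[bytes]:
    """Return [H0, H1, H2, H3], assuming H(n+1) = SHA-256(H(n))."""
    chain = [secrets.token_bytes(32)]  # H0: fresh random 256-bit preimage
    for _ in range(3):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


def hello_mac(h2: bytes, hello_without_mac: bytes) -> bytes:
    """Implicit HMAC-SHA-256 keyed with H2, truncated to 64 bits."""
    return hmac.new(h2, hello_without_mac, hashlib.sha256).digest()[:8]


def verify_hello_late(h3_from_hello: bytes, hello_without_mac: bytes,
                      received_mac: bytes, h2_revealed: bytes) -> bool:
    """Check a stored Hello once the sender's H2 is revealed later on."""
    if not hmac.compare_digest(hashlib.sha256(h2_revealed).digest(),
                               h3_from_hello):
        return False  # revealed H2 does not hash to the Hello's H3
    return hmac.compare_digest(hello_mac(h2_revealed, hello_without_mac),
                               received_mac)
```

The receiver must keep the raw Hello until H2 arrives, because the MAC cannot be checked any earlier.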
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="Hello " (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version="1.10" (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Client Identifier (4 words) |
| |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Hash image H3 (8 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| ZID (3 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0|S|M|P| unused (zeros)| hc | cc | ac | kc | sc |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| hash algorithms (0 to 7 values) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| cipher algorithms (0 to 7 values) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| auth tag types (0 to 7 values) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Key Agreement Types (0 to 7 values) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SAS Types (0 to 7 values) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MAC (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
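The flags-and-counts word of the Hello message above (|0|S|M|P|unused|hc|cc|ac|kc|sc|) can be decoded as in this sketch. It also gives the resulting Hello length in words. The meanings of the S and P flags are defined elsewhere in the specification, so the sketch only extracts them. Names are hypothetical.

```python
from typing import NamedTuple


class HelloFlags(NamedTuple):
    s: bool  # S flag (defined elsewhere in the specification)
    m: bool  # MiTM flag: sender can send SASrelay
    p: bool  # P flag (defined elsewhere in the specification)
    hc: int  # number of hash algorithms listed
    cc: int  # number of cipher algorithms listed
    ac: int  # number of auth tag types listed
    kc: int  # number of key agreement types listed
    sc: int  # number of SAS types listed


def parse_hello_flags(word: int) -> HelloFlags:
    flags = HelloFlags(
        s=bool((word >> 30) & 1),
        m=bool((word >> 29) & 1),
        p=bool((word >> 28) & 1),
        hc=(word >> 16) & 0xF,  # bits 20-27 (unused) are skipped
        cc=(word >> 12) & 0xF,
        ac=(word >> 8) & 0xF,
        kc=(word >> 4) & 0xF,
        sc=word & 0xF,
    )
    if max(flags[3:]) > 7:
        raise ValueError("each algorithm list holds 0 to 7 values")
    return flags


def hello_words(flags: HelloFlags) -> int:
    """Hello length: 22 fixed words plus one word per listed algorithm."""
    fixed = 1 + 2 + 1 + 4 + 8 + 3 + 1 + 2  # header ... flags word, MAC
    return fixed + sum(flags[3:])
```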
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="HelloACK" (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The Commit message contains the Message Type Block, then the 256-bit
hash image H2, which is defined in Section 9. The next parameter is
the initiator’s ZID, the 96-bit-long unique identifier for the ZRTP
endpoint, which MUST have the same value as was used in the Hello
message.
The 64-bit MAC at the end of the message is computed across the whole
message, not including the MAC, using the MAC algorithm defined in
Section 5.1.2.2. The MAC key is the sender’s H1 (defined in
Section 9), and thus the MAC cannot be checked by the receiving party
until the sender’s H1 value is known to the receiving party later in
the protocol.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=29 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="Commit " (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Hash image H2 (8 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| ZID (3 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| hash algorithm |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| cipher algorithm |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| auth tag type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Key Agreement Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SAS Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| hvi (8 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MAC (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=25 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="Commit " (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Hash image H2 (8 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| ZID (3 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| hash algorithm |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| cipher algorithm |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| auth tag type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Key Agreement Type = "Mult" |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SAS Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| nonce (4 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MAC (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=27 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="Commit " (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Hash image H2 (8 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| ZID (3 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| hash algorithm |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| cipher algorithm |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| auth tag type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Key Agreement Type = "Prsh" |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| SAS Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| nonce (4 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| keyID (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MAC (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The 64-bit MAC at the end of the message is computed across the whole
message, not including the MAC, using the MAC algorithm defined in
Section 5.1.2.2. The MAC key is the sender’s H0 (defined in
Section 9), and thus the MAC cannot be checked by the receiving party
until the sender’s H0 value is known to the receiving party later in
the protocol.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=depends on KA Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="DHPart1 " (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Hash image H1 (8 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rs1IDr (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rs2IDr (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| auxsecretIDr (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| pbxsecretIDr (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| pvr (length depends on KA Type) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MAC (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The 64-bit MAC at the end of the message is computed across the whole
message, not including the MAC, using the MAC algorithm defined in
Section 5.1.2.2. The MAC key is the sender’s H0 (defined in
Section 9), and thus the MAC cannot be checked by the receiving party
until the sender’s H0 value is known to the receiving party later in
the protocol.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=depends on KA Type |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="DHPart2 " (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Hash image H1 (8 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rs1IDi (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rs2IDi (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| auxsecretIDi (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| pbxsecretIDi (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| pvi (length depends on KA Type) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MAC (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The first field inside the encrypted region is the hash preimage H0,
which is defined in detail in Section 9.
The next 15 bits are not used and SHOULD be set to zero when sent and
MUST be ignored when received in Confirm1 or Confirm2 messages.
The next 8 bits are used for flags. Undefined flags are set to zero
and ignored. Four flags are currently defined. The PBX Enrollment
flag (E) is a Boolean bit defined in Section 7.3.1. The SAS Verified
flag (V) is a Boolean bit defined in Section 7.1. The Allow Clear
flag (A) is a Boolean bit defined in Section 4.7.2. The Disclosure
Flag (D) is a Boolean bit defined in Section 11. The cache
expiration interval is defined in Section 4.9.
The responder uses the zrtpkeyr to encrypt the Confirm1 message. The
initiator uses the zrtpkeyi to encrypt the Confirm2 message.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=variable |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="Confirm1" or "Confirm2" (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| confirm_mac (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| CFB Initialization Vector (4 words) |
| |
| |
+===============================================================+
| |
| Hash preimage H0 (8 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Unused (15 bits of zeros) | sig len (9 bits)|0 0 0 0|E|V|A|D|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| cache expiration interval (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| optional signature type block (1 word if present) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| optional signature block (variable length) |
| . . . |
| |
| |
+===============================================================+
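The flags word inside the encrypted part of Confirm1/Confirm2 (15 unused bits, 9-bit sig len, 4 zero bits, then E, V, A, D) can be packed and parsed as sketched below. The names are hypothetical. The parser ignores the unused and undefined bits, as the text requires on receipt.

```python
from typing import NamedTuple


class ConfirmFlags(NamedTuple):
    sig_len: int  # signature length in words (9 bits)
    e: bool       # PBX Enrollment (Section 7.3.1)
    v: bool       # SAS Verified (Section 7.1)
    a: bool       # Allow Clear (Section 4.7.2)
    d: bool       # Disclosure (Section 11)


def parse_confirm_flags(word: int) -> ConfirmFlags:
    return ConfirmFlags(
        sig_len=(word >> 8) & 0x1FF,
        e=bool((word >> 3) & 1),
        v=bool((word >> 2) & 1),
        a=bool((word >> 1) & 1),
        d=bool(word & 1),
    )


def pack_confirm_flags(sig_len: int, e: bool, v: bool, a: bool, d: bool) -> int:
    if not 0 <= sig_len < 1 << 9:
        raise ValueError("sig len must fit in 9 bits")
    return (sig_len << 8) | (e << 3) | (v << 2) | (a << 1) | int(d)
```

The SASrelay message uses the same layout with the E bit always zero.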
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="Conf2ACK" (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=4 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="Error " (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Integer Error Code (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Defined hexadecimal values for the Error Code are listed in the table
below.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="ErrorACK" (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=5 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="GoClear " (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| clear_mac (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="ClearACK" (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The next 15 bits are not used and SHOULD be set to zero when sent,
and they MUST be ignored when received in SASrelay messages.
The next 9 bits contain the signature length. The trusted MiTM MAY
compute a digital signature on the SAS hash, as described in
Section 7.2, using a persistent signing key owned by the trusted
MiTM. If no SAS signature is present, all bits are set to zero. The
signature length is in words and includes the signature type block.
If the calculated signature octet count is not a multiple of 4, zeros
are added to pad it out to a word boundary. If no signature block is
present, the overall length of the SASrelay message will be set to 19
words.
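The SASrelay length rule above can be sketched as follows. The 19 fixed words come from the SASrelay layout later in this section: 1 header word, 2 type words, 2 MAC words, 4 IV words, 1 flags word, 1 rendering-scheme word, and 8 sashash words. The sig len value counts the 1-word signature type block plus the signature, padded with zeros to a word boundary. Function names are hypothetical.

```python
SASRELAY_FIXED_WORDS = 1 + 2 + 2 + 4 + 1 + 1 + 8  # 19 words, no signature
MAX_SIG_LEN_WORDS = (1 << 9) - 1                   # sig len is 9 bits


def sig_len_words(signature: bytes) -> int:
    """Signature type block (1 word) plus the zero-padded signature."""
    if not signature:
        return 0  # no signature: all sig len bits are zero
    words = 1 + (len(signature) + 3) // 4  # round octets up to whole words
    if words > MAX_SIG_LEN_WORDS:
        raise ValueError("signature too long for the 9-bit sig len field")
    return words


def pad_signature(signature: bytes) -> bytes:
    """Append zero octets so the signature ends on a word boundary."""
    return signature + b"\x00" * (-len(signature) % 4)


def sasrelay_words(signature: bytes = b"") -> int:
    return SASRELAY_FIXED_WORDS + sig_len_words(signature)
```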
The next 8 bits are used for flags. Undefined flags are set to zero
and ignored. Three flags are currently defined. The Disclosure Flag
(D) is a Boolean bit defined in Section 11. The Allow Clear flag (A)
is a Boolean bit defined in Section 4.7.2. The SAS Verified flag (V)
is a Boolean bit defined in Section 7.1. These flags carry updated
values of the same flags provided earlier in the Confirm message,
reflecting the new flag information relayed by the PBX from the other
party.
The next 32-bit word contains the SAS rendering scheme for the
relayed sashash, which will be the same rendering scheme used by the
other party on the other side of the trusted MiTM. Section 7.3
describes how the PBX determines whether the ZRTP client regards the
PBX as a trusted MiTM. If the PBX determines that the ZRTP client
trusts the PBX, the next 8 words contain the sashash relayed from the
other party. The first 32-bit word of the sashash contains the
sasvalue, which may be rendered to the user using the specified SAS
rendering scheme. If this SASrelay message is being sent to a ZRTP
client that does not trust this MiTM, the sashash will be ignored by
the recipient and should be set to zeros by the PBX.
Depending on whether the trusted MiTM had taken the role of the
initiator or the responder during the ZRTP key negotiation, the
SASrelay message is encrypted with zrtpkeyi or zrtpkeyr.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=variable |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="SASrelay" (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| MAC (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| CFB Initialization Vector (4 words) |
| |
| |
+===============================================================+
| Unused (15 bits of zeros) | sig len (9 bits)|0 0 0 0|0|V|A|D|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| rendering scheme of relayed SAS (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Trusted MiTM relayed sashash (8 words) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| optional signature type block (1 word if present) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| optional signature block (variable length) |
| . . . |
| |
| |
+===============================================================+
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=3 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="RelayACK" (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
The Ping and PingACK messages are unrelated to the rest of the ZRTP
protocol. No ZRTP endpoint is required to generate a Ping message,
but every ZRTP endpoint MUST respond to a Ping message with a PingACK
message.
Although Ping and PingACK messages have no effect on the rest of the
ZRTP protocol, their inclusion in this specification simplifies the
design of "bump-in-the-wire" ZRTP proxies (Section 10) (notably,
[Zfone]). It enables proxies to be designed that do not rely on
assistance from the signaling layer to map out the associations
between media streams and ZRTP endpoints.
Before sending a ZRTP Hello message, a ZRTP proxy MAY send a Ping
message as a means to sort out which RTP media streams are connected
to particular ZRTP endpoints. Ping messages are generated only by
ZRTP proxies. If neither party is a ZRTP proxy, no Ping messages
will be encountered. Ping retransmission behavior is discussed in
Section 6.
The Ping message contains a version number that defines what version
of PingACK is requested. If that version number is supported by the
Ping responder, a PingACK with a format that matches that version
will be received. Otherwise, a PingACK with a lower version number
may be received.
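A Ping responder's side can be sketched as below. It validates a received 6-word Ping and builds the 9-word PingACK from the layouts that follow: version, the responder's own EndpointHash, the EndpointHash echoed from the Ping, and the SSRC of the received Ping. For simplicity, the sketch always answers with version "1.10". How the responder computes its own EndpointHash is not covered here, so `my_endpoint_hash` is a hypothetical input.

```python
import struct

PREAMBLE = 0x505A


def build_pingack(ping_msg: bytes, ping_ssrc: int,
                  my_endpoint_hash: bytes) -> bytes:
    preamble, length = struct.unpack_from(">HH", ping_msg)
    if preamble != PREAMBLE or length != 6 or ping_msg[4:12] != b"Ping    ":
        raise ValueError("not a Ping message")
    if len(my_endpoint_hash) != 8:
        raise ValueError("EndpointHash is 2 words")
    their_hash = ping_msg[16:24]  # EndpointHash of the received Ping
    return (struct.pack(">HH", PREAMBLE, 9) + b"PingACK " + b"1.10"
            + my_endpoint_hash + their_hash + struct.pack(">I", ping_ssrc))
```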
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=6 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="Ping " (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version="1.10" (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EndpointHash (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|0 1 0 1 0 0 0 0 0 1 0 1 1 0 1 0| length=9 words |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Message Type Block="PingACK " (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| version="1.10" (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EndpointHash of PingACK Sender (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| EndpointHash of Received Ping (2 words) |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Source Identifier (SSRC) of Received Ping (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
6. Retransmissions
Practical experience has shown that RTP packet loss at the start of
an RTP session can be extremely high. Since the entire ZRTP message
exchange occurs during this period, the defined retransmission scheme
is deliberately aggressive.
ZRTP endpoints MUST NOT exceed the bandwidth of the resulting media
session as determined by the offer/answer exchange in the signaling
layer.
The Ping message (Section 5.15) may follow the same retransmission
schedule as the Hello message, but this is not required in this
specification. Ping message retransmission is subject to
application-specific ZRTP proxy heuristics.
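The retransmission timer values are not reproduced in this excerpt. A common way to structure such a schedule is a doubling backoff with a cap and a bounded retry count. The sketch below shows only that general shape. The parameter values in the check are hypothetical placeholders, not values from this specification.

```python
from typing import Iterator


def retransmit_schedule(initial_ms: int, cap_ms: int,
                        max_retries: int) -> Iterator[int]:
    """Yield successive retransmission intervals (doubling, capped)."""
    interval = initial_ms
    for _ in range(max_retries):
        yield interval
        interval = min(interval * 2, cap_ms)


print(list(retransmit_schedule(50, 200, 5)))  # [50, 100, 200, 200, 200]
```

Capping the interval keeps a slow or heavily loaded peer within reach without letting the retry rate exceed the media session's bandwidth.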
The retry schedule must handle not only packet loss, but also slow or
heavily loaded peers that need additional time to perform their DH
calculations. The following mitigations are recommended:
the media relay might be converting from TCP to UDP. There have been
empirical observations of this in the wild. In cases where TCP is
used, ZRTP and TCP might together generate some extra
retransmissions. It is tempting to avoid this effect by eliminating
the ZRTP retransmission schedule when connected to a TCP channel, but
that would risk failure of the protocol, because it may not be TCP
all the way to the remote ZRTP endpoint. It only takes a few packets
to complete a ZRTP exchange, so trying to optimize out the extra
retransmissions in that scenario is not worth the risk.
There is only one SAS value computed per call. That is the SAS value
for the first media stream established, which is calculated in
Section 4.5.2. This SAS applies to all media streams for the same
session.
The SAS Verified flag (V) is set based on the user indicating that
SAS comparison has been successfully performed. The SAS Verified
flag is exchanged securely in the Confirm1 and Confirm2 messages
(Figure 10) of the next session. In other words, each party sends
the SAS Verified flag from the previous session in the Confirm
message of the current session. It is perfectly reasonable to have a
ZRTP endpoint that never sets the SAS Verified flag, because it would
require adding complexity to the user interface to allow the user to
set it. The SAS Verified flag is not required to be set, but if it
is available to the client software, it allows for the possibility
that the client software could render to the user that the SAS verify
procedure was carried out in a previous session.
If at any time the users carry out the SAS comparison procedure, and
it actually fails to match, then this means there is a very
resourceful MiTM. If this is the first call, the MiTM was there on
the first call, which is impressive enough. If it happens in a later
call, it also means the MiTM must know the cached shared secret,
because you could not have carried out any voice traffic at all
unless the session key was correctly computed and is also known to
the attacker. This implies the MiTM must have been present in all
the previous sessions, since the initial establishment of the first
shared secret. This is indeed a resourceful attacker. It also means
that if at any time he ceases his participation as a MiTM on one of
your calls, the protocol will detect that the cached shared secret is
no longer valid -- because it was really two different shared secrets
all along, one of them between Alice and the attacker, and the other
between the attacker and Bob. The continuity of the cached shared
secrets makes it possible for us to detect the MiTM when he inserts
himself into the ongoing relationship, as well as when he leaves.
Also, if the attacker tries to stay with a long lineage of calls, but
fails to execute a DH MiTM attack for even one missed call, he is
permanently excluded. He can no longer resynchronize with the chain
of cached shared secrets.
Note that the choice of hash algorithm used in the digital signature
is independent of the hash used in the sashash. The sashash is
determined by the negotiated Hash Type (Section 5.1.2), while the
hash used by the digital signature is separately defined by the
digital signature algorithm. For example, the sashash may be based
on SHA-256, while the digital signature might use SHA-384, if an
ECDSA P-384 key is used.
Both ECDSA and DSA [FIPS-186-3] have a feature that allows most of
the signature calculation to be done in advance of the session,
reducing latency during call setup. This is useful for low-power
mobile handsets.
may result. Some firewalls and NATs may discard fragmented UDP
packets, which would cause the ZRTP exchange to fail. It is
RECOMMENDED that a ZRTP endpoint avoid sending signatures if they
would cause UDP fragmentation. For a discussion on MTU size and PMTU
discovery, see [RFC1191] and [RFC1981].
The first field after the 4-octet Signature Type Block is the OpenPGP
signature. The format of this signature and the algorithms that
create it are specified by [RFC4880]. The signature consists of
a complete OpenPGP version 4 signature in binary form (not Radix-64),
as specified in RFC 4880, Section 5.2.3, enclosed in the full OpenPGP
packet syntax. The length of the OpenPGP signature is parseable from
the signature, and depends on the type and length of the signing key.
The total length of all the material in Figure 20, including the key
server URI, must not exceed 511 32-bit words (2044 octets). This
length, in words, is stored in the signature length field in the
Confirm or SASrelay message containing the signature. It is
desirable to avoid UDP fragmentation, so the URI should be kept
short.
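The following Python sketch shows how the value for the signature
length field might be derived and checked against the 511-word
(2044-octet) limit. It assumes the signature material is padded up to
a whole number of 32-bit words; that rounding rule is an assumption of
this sketch, not something stated above.

```python
MAX_SIG_WORDS = 511  # largest permitted signature length, in 32-bit words (2044 octets)


def signature_length_words(signature_material: bytes) -> int:
    """Length of the signature material (Signature Type Block plus the
    signature and any other material in the block) in 32-bit words.

    Assumption: partial words are rounded up (the block is word-padded).
    """
    words = (len(signature_material) + 3) // 4
    if words > MAX_SIG_WORDS:
        raise ValueError(f"signature material is {words} words; limit is {MAX_SIG_WORDS}")
    return words
```

For instance, a 4-octet "PGP " type block followed by a 100-octet
OpenPGP signature occupies 26 words, well under the limit; keeping the
block far smaller than 2044 octets is what actually avoids UDP
fragmentation.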
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Signature Type Block = "PGP " (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| OpenPGP signature |
| (variable length) |
| . . . |
| |
+===============================================================+
The first field after the 4-octet Signature Type Block is the DER
encoded X.509v3 certificate (the signed public key) of the ECDSA
signing key that created the signature. The format of this
certificate is specified by the NSA’s Suite B Certificate and CRL
Profile [RFC5759].
The total length of all the material in Figure 21, including the
X.509v3 certificate, must not exceed 511 32-bit words (2044 octets).
This length, in words, is stored in the signature length field in the
Confirm or SASrelay message containing the signature. It is
desirable to avoid UDP fragmentation, so the certificate material
should be kept to a much smaller size than this. End user certs
issued for this purpose should minimize the size of extraneous
material such as legal notices.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Signature Type Block = "X509" (1 word) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| Signing key’s X.509v3 certificate |
| (variable length) |
| . . . |
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
| ECDSA P-256 or P-384 signature |
| (16 words or 24 words) |
| . . . |
| |
+===============================================================+
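The word budget in Figure 21 can be worked out directly: one word for
the "X509" type block, 16 or 24 words for the ECDSA signature, and the
remainder for the certificate. The Python sketch below assumes the
16/24-word signature is the fixed-length concatenation r || s (2 x 32
octets for P-256, 2 x 48 octets for P-384) and that the certificate is
padded to a word boundary; both are assumptions of the sketch.

```python
MAX_SIG_WORDS = 511
TYPE_BLOCK_WORDS = 1                          # the "X509" Signature Type Block
ECDSA_SIG_WORDS = {"P-256": 16, "P-384": 24}  # r || s: 64 or 96 octets


def max_certificate_octets(curve: str) -> int:
    """Largest DER certificate that still fits within the 511-word limit."""
    return (MAX_SIG_WORDS - TYPE_BLOCK_WORDS - ECDSA_SIG_WORDS[curve]) * 4


def x509_block_words(cert_der_len: int, curve: str) -> int:
    """Total words for type block + certificate + ECDSA signature."""
    cert_words = (cert_der_len + 3) // 4  # assumption: cert padded to a word boundary
    total = TYPE_BLOCK_WORDS + cert_words + ECDSA_SIG_WORDS[curve]
    if total > MAX_SIG_WORDS:
        raise ValueError(f"{total} words exceeds the {MAX_SIG_WORDS}-word limit")
    return total
```

Under these assumptions a certificate may be at most 1976 octets with
a P-256 key or 1944 octets with a P-384 key, and a 600-octet P-256
certificate yields a 167-word block.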
It’s not strictly necessary to use a PKI to back the public key that
signs the SAS. For example, it is possible to use a self-signed
X.509v3 certificate or an OpenPGP key that is not signed by any other
key. In this scenario, the same key continuity technique used by SSH
[RFC4251] may be used. The public key is cached locally the first
time it is encountered, and when the same public key is encountered
again in subsequent sessions, it’s deemed not to be a MiTM attack.
If there is no MiTM attack in the first session, there cannot be a
MiTM attack in any subsequent session. This is exactly how SSH does
it.
Of course, the security rests on the assumption that the MiTM did not
attack in the first session. That assumption seems to work most of
the time in the SSH world. The user would have to be warned the
first time a public key is encountered, just as in SSH. If possible,
the SAS should be checked before the user consents to caching the new
public key. If the SAS matches in the first session, there is no
MiTM, and it’s safe to cache the public key. If no SAS comparison is
possible, it’s up to the user, or up to the application, to decide
whether to take a leap of faith and proceed. That’s how SSH works
most of the time, because SSH users don’t have the chance to verbally
compare an SAS with anyone.
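The SSH-style key continuity check described above can be sketched in
Python as a small cache of pinned key fingerprints. The peer
identifier (for example, a display name or ZID), the use of a SHA-256
fingerprint, and the class and method names are all hypothetical
choices for illustration.

```python
import hashlib
from enum import Enum


class KeyCheck(Enum):
    NEW = "new"            # first contact: warn the user; cache only after consent
    MATCH = "match"        # same key as a previous session: key continuity holds
    MISMATCH = "mismatch"  # different key for a known peer: possible MiTM


class KeyContinuityCache:
    """Hypothetical local cache of signing-key fingerprints, keyed by peer."""

    def __init__(self) -> None:
        self._pins: dict[str, str] = {}

    @staticmethod
    def fingerprint(public_key: bytes) -> str:
        return hashlib.sha256(public_key).hexdigest()

    def check(self, peer: str, public_key: bytes) -> KeyCheck:
        pinned = self._pins.get(peer)
        if pinned is None:
            return KeyCheck.NEW
        if pinned == self.fingerprint(public_key):
            return KeyCheck.MATCH
        return KeyCheck.MISMATCH

    def accept(self, peer: str, public_key: bytes) -> None:
        """Cache the key, ideally only after the SAS has been checked,
        or after the user or application decides to take a leap of faith."""
        self._pins[peer] = self.fingerprint(public_key)
```

A NEW result is where the user would be warned and, if possible, asked
to compare the SAS before accept() is called.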
For example, imagine that Bob has a ZRTP-enabled VoIP phone that has
been registered with his company’s PBX, so that it is regarded as an
extension of the PBX. Alice, whose phone is not associated with the
PBX, might dial the PBX from the outside, and a ZRTP connection is
negotiated between her phone and the PBX. She then selects Bob’s
extension from the company directory in the PBX. The PBX makes a
call to Bob’s phone (which might be offsite, many miles away from the
PBX through the Internet) and a separate ZRTP connection is
negotiated between the PBX and Bob’s phone. The two ZRTP sessions
have different session keys and different SASs, which would render
the SAS useless for verbal comparison between Alice and Bob. They
might even mistakenly believe that a wiretapper is present because of
the SAS mismatch, causing undue alarm.
ZRTP has a mechanism for solving this problem by having the PBX relay
the Alice/PBX SAS to Bob, sending it through to Bob in a special
SASrelay message as defined in Section 5.13, which is sent after the
PBX/Bob ZRTP negotiation is complete, after the Confirm messages.
Only the PBX, acting as a special trusted MiTM (trusted by the
recipient of the SASrelay message), will relay the SAS. The SASrelay
message protects the relayed SAS from tampering via an included MAC,
similar to how the Confirm message is protected. Bob’s ZRTP-enabled
phone accepts the relayed SAS for rendering only because Bob’s phone
had previously been configured to trust the PBX. This special
trusted relationship with the PBX can be established through a
special security enrollment procedure (Section 7.3.1). After that
enrollment procedure, the PBX is treated by Bob as a special trusted
MiTM. This results in Alice’s SAS being rendered to Bob, so that
Alice and Bob may verbally compare them and thus prevent a MiTM
attack by any other untrusted MiTM.
The trusted MiTM key can be stored in a special cache at the time of
the initial enrollment (which is carried out only once for Bob’s
phone), and Bob’s phone associates this key with the ZID of the PBX,
while the PBX associates it with the ZID of Bob’s phone. After the
enrollment has established and stored this trusted MiTM key, it can
be detected during subsequent ZRTP session negotiations between the
PBX and Bob’s phone, because the PBX and the phone MUST pass the hash
of the trusted MiTM key in the DH message. It is then used as part
of the key agreement to calculate s0.
The PBX can determine whether it is trusted by the ZRTP user agent of
a phone. The presence of a shared trusted MiTM key in the key
negotiation sequence indicates that the phone has been enrolled with
this PBX and therefore trusts it to act as a trusted MiTM. During a
key agreement with two other ZRTP endpoints, the PBX may have a
shared trusted MiTM key with both endpoints, only one endpoint, or
neither endpoint. If the PBX has a shared trusted MiTM key with
neither endpoint, the PBX MUST NOT relay the SAS. If the PBX has a
shared trusted MiTM key with only one endpoint, the PBX MUST relay
the SAS from one party to the other by sending an SASrelay message to
the endpoint with which it shares a trusted MiTM key. If the PBX has
a (separate) shared trusted MiTM key with each of the endpoints, the
PBX MUST relay the SAS to only one endpoint, not both endpoints.
Note: In the case of a PBX sharing trusted MiTM keys with both
endpoints, it does not matter which endpoint receives the relayed
SAS as long as only one endpoint receives it.
The relayed SAS fields contain the SAS rendering type and the
complete sashash. The receiver absolutely MUST NOT render the
relayed SAS if it does not come from a specially trusted ZRTP
endpoint. The security of the ZRTP protocol depends on not rendering
a relayed SAS from an untrusted MiTM, because it may be relayed by a
MiTM attacker. See the SASrelay message definition (Figure 16) for
further details.
To ensure that both Alice and Bob will use the same SAS rendering
scheme after the keys are negotiated, the PBX also sends the SASrelay
message to the unenrolled party (which does not regard this PBX as a
trusted MiTM), conveying the SAS rendering scheme, but not the
sashash, which it sets to zero. The unenrolled party will ignore the
relayed SAS field, but will use the specified SAS rendering scheme.
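The PBX relay rules above reduce to a small decision table. The Python
sketch below names the two endpoints "A" and "B" and returns which
kind of SASrelay each one receives. When both endpoints are enrolled,
the text above does not say what the second enrolled endpoint
receives, so this sketch sends it nothing; that choice, and all names
here, are assumptions for illustration.

```python
def sas_relay_plan(trusted_a: bool, trusted_b: bool) -> dict[str, str]:
    """Map endpoint ("A"/"B") to the SASrelay it should receive.

    "sas"         -> SAS rendering scheme plus the complete sashash
    "scheme-only" -> SAS rendering scheme, with the sashash set to zero
    Endpoints absent from the result receive no SASrelay in this sketch.
    """
    if not trusted_a and not trusted_b:
        return {}                        # MUST NOT relay the SAS
    if trusted_a and trusted_b:
        return {"A": "sas"}              # exactly one endpoint; which one does not matter
    enrolled, other = ("A", "B") if trusted_a else ("B", "A")
    return {enrolled: "sas", other: "scheme-only"}
```

The "scheme-only" entry reflects the rule that the unenrolled party
still learns the rendering scheme, so both users render the SAS the
same way.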
Both the PBX and the endpoint need to know when enrollment is taking
place. One way of doing this is to set up an enrollment extension on
the PBX that a newly configured endpoint would call and establish a
ZRTP session. The PBX would then play audio media that offers the
user an opportunity to configure his phone to trust this PBX as a
trusted MiTM. The PBX calculates and stores the trusted MiTM shared
secret in its cache and associates it with this phone, indexed by the
phone’s ZID. The trusted MiTM PBX shared secret is derived from
ZRTPSess via the ZRTP key derivation function (Section 4.5.1) in this
manner:
The pbxsecret is calculated for the whole ZRTP session, not for each
stream within a session, thus the KDF Context field in this case does
not include any stream-specific nonce material.
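The derivation can be sketched in Python. This sketch assumes the KDF
of Section 4.5.1 (not reproduced in this excerpt) is a single-block
counter-mode construction in the style of NIST SP 800-108, HMAC(KI,
i || Label || 0x00 || Context || L) with i = 1 and L the output length
in bits, using HMAC-SHA-256. The label string "Trusted MiTM key" and
the 256-bit output length are hypothetical stand-ins for the values in
the omitted formula. The Context is only ZIDi || ZIDr, matching the
statement that no stream-specific nonce is included.

```python
import hashlib
import hmac


def zrtp_kdf(ki: bytes, label: bytes, context: bytes, length_bits: int,
             hash_name: str = "sha256") -> bytes:
    """Assumed KDF form: HMAC(KI, i || Label || 0x00 || Context || L),
    with i = 1 and L as 32-bit big-endian values, truncated to L bits.
    Only handles outputs that fit in a single HMAC block."""
    msg = ((1).to_bytes(4, "big") + label + b"\x00" + context
           + length_bits.to_bytes(4, "big"))
    digest = hmac.new(ki, msg, hash_name).digest()
    if length_bits % 8 or length_bits > len(digest) * 8:
        raise ValueError("length_bits must be whole octets within one hash block")
    return digest[: length_bits // 8]


def pbx_secret(zrtp_sess: bytes, zid_i: bytes, zid_r: bytes) -> bytes:
    # Hypothetical label and length; Context is the two ZIDs only, because
    # pbxsecret is computed once per session, not per stream.
    return zrtp_kdf(zrtp_sess, b"Trusted MiTM key", zid_i + zid_r, 256)
```

Because both the PBX and the phone hold ZRTPSess and both ZIDs, each
side can compute the same pbxsecret independently; the ZIDs must be
concatenated in the same (initiator, responder) order on both sides.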
The PBX signals the enrollment process by setting the PBX Enrollment
flag (E) in the Confirm message (Figure 10). This flag is used to
trigger the ZRTP endpoint’s user interface to ask the user whether
the endpoint should trust this PBX and calculate and store the
pbxsecret in the cache. If the user decides to respond by activating the
appropriate user interface element (a menu item, checkbox, or
button), his ZRTP user agent calculates pbxsecret using the same
formula, and saves it in a special cache entry associated with this
PBX.
After this enrollment process, the PBX and the ZRTP-enabled phone
both share a secret that enables the phone to recognize the PBX as a
trusted MiTM in future calls. This means that when a future call
from an outside ZRTP-enabled caller is relayed through the PBX to
this phone, the phone will render a relayed SAS from the PBX. If the
SASrelay message comes from a MiTM that does not know the pbxsecret,
the phone treats it as a bad-guy MiTM, and refuses to render the
relayed SAS. Regardless of which party initiates any future phone
calls through the PBX, the enrolled phone or the outside phone, the
PBX will relay the SAS to the enrolled phone.
PBX. That would enable the malevolent MiTM to wiretap all future
calls without arousing suspicion, because he would appear to be
trusted.
8. Signaling Interactions
This section discusses how ZRTP, SIP, and SDP work together.
Note that ZRTP may be implemented without coupling with the SIP
signaling. For example, ZRTP can be implemented as a "bump in the
wire" or as a "bump in the stack" in which RTP sent by the SIP User
Agent (UA) is converted to ZRTP. In these cases, the SIP UA will
have no knowledge of ZRTP. As a result, the signaling path discovery
mechanisms introduced in this section should not be definitive --
they are a hint. Despite the absence of an indication of ZRTP
support in an offer or answer, a ZRTP endpoint SHOULD still send
Hello messages.
ZRTP endpoints that have control over the signaling path include a
ZRTP SDP attribute in their SDP offers and answers. The ZRTP
attribute, a=zrtp-hash, is used to indicate support for ZRTP and to
convey a hash of the Hello message. The hash is computed according
to Section 8.1.
zrtp-version = token
zrtp-hash-value = 1*(HEXDIG)
v=0
o=bob 2890844527 2890844527 IN IP4 client.biloxi.example.com
s=
c=IN IP4 client.biloxi.example.com
t=0 0
m=audio 3456 RTP/AVP 97 33
a=rtpmap:97 iLBC/8000
a=rtpmap:33 no-op/8000
<allOneLine>
a=zrtp-hash:1.10 fe30efd02423cb054e50efd0248742ac7a52c8f91bc2
df881ae642c371ba46df
</allOneLine>
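A Python sketch of producing and checking this attribute follows. It
assumes the hash is SHA-256 taken over the raw bytes of the Hello
message (the exact coverage is defined in Section 8.1) and that the
attribute has the form "a=zrtp-hash:" zrtp-version SP zrtp-hash-value;
the regular expression only approximates the ABNF token rule.

```python
import hashlib
import re

# Approximates: "a=zrtp-hash:" zrtp-version SP zrtp-hash-value
ZRTP_HASH_RE = re.compile(r"^a=zrtp-hash:(\S+) ([0-9A-Fa-f]+)$")


def zrtp_hash_attribute(hello_message: bytes, version: str = "1.10") -> str:
    """Build the SDP attribute line from the raw Hello message bytes."""
    return f"a=zrtp-hash:{version} {hashlib.sha256(hello_message).hexdigest()}"


def parse_zrtp_hash(line: str) -> tuple[str, str]:
    """Return (zrtp-version, lowercase hex hash) from an attribute line."""
    m = ZRTP_HASH_RE.match(line.strip())
    if m is None:
        raise ValueError("not an a=zrtp-hash attribute")
    return m.group(1), m.group(2).lower()


def hello_matches(line: str, received_hello: bytes) -> bool:
    """True if a Hello received on the media path matches the signaled hash."""
    _, expected = parse_zrtp_hash(line)
    return hashlib.sha256(received_hello).hexdigest() == expected
```

A Hello whose hash does not match the signaled value can then be
rejected as not belonging to this call.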
8.1. Binding the Media Stream to the Signaling Layer via the Hello Hash
Tying the media stream to the signaling channel can help prevent a
third party from inserting false media packets. If the signaling
layer contains information that ties it to the media stream, false
media streams can be rejected.
After the Hello Hash is used to properly identify the ZRTP Hello
message as belonging to this particular SIP call, the rest of the
ZRTP message sequence is protected from false packet injection by
other protection mechanisms, such as the hash chaining mechanism
defined in Section 9.
If and only if the signaling path and the SDP are protected by some
form of end-to-end integrity protection, such as one of the
abovementioned mechanisms, so that it can guarantee delivery of the
a=zrtp-hash attribute without any tampering by a third party, and if
there is good reason to trust the signaling layer to protect the
interests of the end user, it is possible to authenticate the key
exchange and prevent a MiTM attack. This can be done without
requiring the users to verbally compare the SAS, by using the hash
chaining mechanism defined in Section 9 to provide a series of MAC
keys that protect the entire ZRTP key exchange. Thus, an end-to-end
integrity-protected signaling layer automatically enables an
integrity-protected Diffie-Hellman exchange in ZRTP, which in turn
means immunity from a MiTM attack. Here’s how it works.
The Hello message must be assembled before any hash algorithms are
negotiated, so an implicit predetermined hash algorithm and MAC
algorithm (both defined in Section 5.1.2.2) must be used. All of the
aforementioned MACs keyed by the hashes in the aforementioned hash
chain MUST be computed with the MAC algorithm defined in
Section 5.1.2.2, with the MAC truncated to 64 bits.
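The Python sketch below shows a MAC keyed by a hash chain element and
truncated to 64 bits. It assumes the implicit MAC algorithm of Section
5.1.2.2 (not reproduced here) is HMAC-SHA-256; the function names are
hypothetical.

```python
import hashlib
import hmac


def zrtp_mac64(chain_key: bytes, message: bytes) -> bytes:
    """HMAC-SHA-256 keyed with a hash chain element, truncated to 64 bits."""
    return hmac.new(chain_key, message, hashlib.sha256).digest()[:8]


def mac_is_valid(chain_key: bytes, message: bytes, received_mac: bytes) -> bool:
    # Constant-time comparison of the truncated MAC values.
    return hmac.compare_digest(zrtp_mac64(chain_key, message), received_mac)
```

The receiver can only run mac_is_valid() once the peer has revealed
the chain element used as the key in a later message.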
8.2. Deriving the SRTP Secret (srtps) from the Signaling Layer
The shared secret calculations defined in Section 4.3 make use of the
SRTP secret (srtps), if it is provided by the signaling layer.
ZRTP computes srtps from the SRTP master key and salt parameters
provided by the signaling layer in this manner, truncating the result
to 256 bits:
When voice is compressed with a VBR codec, the packet lengths vary
depending on the types of sounds being compressed. This leaks a lot
of information about the content even if the packets are encrypted,
regardless of what encryption protocol is used [Wright1]. It is
RECOMMENDED that VBR codecs be avoided in encrypted calls. It is not
a problem if the codec adapts the bitrate to the available channel
bandwidth. The vulnerable codecs are the ones that change their
bitrate depending on the type of sound being compressed.
The security problems of VBR and VAD are addressed in detail by the
guidelines in [VBR-AUDIO]. It is RECOMMENDED that ZRTP endpoints
follow these guidelines.
An attacker who is not in the media path may attempt to inject false
ZRTP protocol packets, possibly to effect a denial-of-service attack
or to inject his own media stream into the call. VoIP, by its
nature, invites various forms of denial-of-service attacks and
requires protocol features to reject such attacks. While bogus SRTP
packets may be easily rejected via the SRTP auth tag field, that can
only be applied after a key agreement is completed. During the ZRTP
key negotiation phase, other false packet rejection mechanisms are
needed. One such mechanism is the use of the total_hash in the final
shared secret calculation, but that can only detect false packets
after performing the computationally expensive Diffie-Hellman
calculation.
H1 = hash (H0)
H2 = hash (H1)
H3 = hash (H2)
This one-way hash chain MUST use the hash algorithm defined in
Section 5.1.2.2, truncated to 256 bits. Each 256-bit hash image is
the preimage of the next, and the sequence of images is sent in
reverse order in the ZRTP packet sequence. The hash image H3 is sent
in the Hello message, H2 is sent in the Commit message, H1 is sent in
the DHPart1 or DHPart2 messages, and H0 is sent in the Confirm1 or
Confirm2 messages. The initial random H0 nonces that each party
generates MUST be unpredictable to an attacker and unique within a
ZRTP session, which thereby forces the derived hash images H1-H3 to
also be unique and unpredictable.
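A minimal Python sketch of building the chain, assuming the hash
algorithm of Section 5.1.2.2 is SHA-256 (so the 256-bit truncation is
a no-op). The mapping of chain elements to messages restates the
paragraph above.

```python
import hashlib
import secrets


def make_hash_chain() -> list[bytes]:
    """Return [H0, H1, H2, H3]: H0 is a fresh unpredictable 256-bit nonce,
    and each later element is the SHA-256 hash of the previous one."""
    chain = [secrets.token_bytes(32)]
    for _ in range(3):
        chain.append(hashlib.sha256(chain[-1]).digest())
    return chain


# Images are revealed in reverse order of computation.
SENT_IN = {3: "Hello", 2: "Commit", 1: "DHPart1/DHPart2", 0: "Confirm1/Confirm2"}
```

A new chain must be generated for each ZRTP session so that H0, and
therefore H1 through H3, are never reused.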
The recipient checks if the packet has the correct hash preimage, by
hashing it and comparing the result with the hash image for the
preceding packet. Packets that contain an incorrect hash preimage
MUST NOT be used by the recipient, but they MAY be processed as
security exceptions, perhaps by logging or alerting the user. As
long as these bogus packets are not used, and correct packets are
still being received, the protocol SHOULD be allowed to run to
completion, thereby rendering ineffective this denial-of-service
attack.
Note that since H2 is sent in the Commit message, and the initiator
does not receive a Commit message, the initiator computes the
responder’s missing H2 by hashing the responder’s H1. An analogous
interpolation is performed by both parties to handle the skipped
DHPart1 and DHPart2 messages in Preshared (Section 3.1.2) or
Multistream (Section 3.1.3) modes.
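The recipient's check, including the initiator's interpolation of the
responder's missing H2, can be sketched in Python as follows (again
assuming SHA-256 as the chain hash; function names are hypothetical).

```python
import hashlib


def h(x: bytes) -> bytes:
    return hashlib.sha256(x).digest()


def preimage_ok(received: bytes, image_from_previous_packet: bytes) -> bool:
    """True if hashing the value revealed in this packet yields the image
    carried in the peer's preceding packet. Packets failing this check
    must not be used, but the protocol keeps running."""
    return h(received) == image_from_previous_packet


def check_responder_h1(responder_h3: bytes, responder_h1: bytes) -> bool:
    """Initiator's view: the responder sends no Commit, so its H2 is
    interpolated as hash(H1) and then checked against H3 from its Hello."""
    interpolated_h2 = h(responder_h1)
    return preimage_ok(interpolated_h2, responder_h3)
```

The same interpolation idea applies when DHPart1 and DHPart2 are
skipped in Preshared or Multistream mode: the missing image is
recomputed by hashing the next preimage that does arrive.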
Because these hash images alone do not protect the rest of the
contents of the packet they reside in, this scheme assumes the
attacker cannot modify the packet contents from a legitimate party,
which is a reasonable assumption for an attacker who is not in the
media path. This covers an important range of denial-of-service
attacks. For dealing with the remaining set of attacks that involve
packet modification, other mechanisms are used, such as the
total_hash in the final shared secret calculation, and the hash
commitment in the Commit message.
Some ZRTP user agents allow the user to manually switch to clear mode
(via the GoClear message) in the middle of a secure call, and then
later initiate secure mode again. Many consumer client products will
omit this feature, but those that allow it may return to secure mode
again in the same media stream. Although the same chain of hash
images will be reused and thus rendered ineffective the second time,
no real harm is done because the new SRTP session keys will be
derived in part from a cached shared secret, which was safely
protected from the MiTM in the previous DH exchange earlier in the
same session.
In cases where centralized media mixing is taking place, the SAS will
not match when compared by the humans. This situation can sometimes
be known in the SIP signaling by the presence of the isfocus feature
tag [RFC4579]. As a result, when the isfocus feature tag is present,
the DH exchange can be authenticated by the mechanism defined in
Section 8.1.1 or by validating signatures (Section 7.2) in the
Confirm or SASrelay messages. For example, consider an audio
conference call with three participants Alice, Bob, and Carol hosted
on a conference bridge in Dallas. There will be three ZRTP encrypted
media streams, one encrypted stream between each participant and
Dallas. Each will have a different SAS. Each participant will be
able to validate their SAS with the conference bridge by using
Note that the intention here is to have the Disclosure flag identify
products that are designed to disclose their session keys, not to
identify which particular calls are compromised on a call-by-call
basis. This is an important legal distinction, because most
government-sanctioned wiretap regulations require a VoIP service
provider to not reveal which particular calls are wiretapped. But
there is nothing illegal about revealing that a product is designed
to be wiretap-friendly. The ZRTP protocol mandates that such a
product "out" itself.
Of course, a ZRTP implementer can lie about his product having a back
door, but the ZRTP standard mandates that ZRTP-compliant products
MUST adhere to the requirement that a back door be confessed by
sending the Disclosure flag to the other party.
The ZRTP Disclosure flag only governs the ZRTP/SRTP stream itself.
It does not govern the underlying RTP media stream, nor the actual
media itself. Consequently, a PBX that uses ZRTP may provide
conference calls, call monitoring, call recording, voicemail, or
other PBX features and still say that it does not disclose the ZRTP
key material. A video system may provide DVR features and still say
that it does not disclose the ZRTP key material. The ZRTP Disclosure
flag, when not set, means only that the ZRTP cryptographic key
material stays within the bounds of the ZRTP subsystem.
Note also that the ZRTP Disclosure Flag does not require an
implementation to preclude hacking or malware. Malware that leaks
ZRTP cryptographic key material does not create a liability for the
implementer from non-compliance with the ZRTP specification.
The role of the ZID in the management of the local cache of shared
secrets is explained in Section 4.9. A particular ZID is associated
with a particular ZRTP endpoint, typically a VoIP client. A single
If the remote ZID originates from a PBX, the displayed name would be
the name of that PBX, which might be the name of the company who owns
that PBX.
This section discusses how ZRTP meets all RTP security requirements
discussed in the Media Security Requirements [RFC5479] document
without any dependencies on other protocols or extensions, unlike
DTLS-SRTP [RFC5764] which requires additional protocols and
mechanisms.
R-RTP-CHECK is met since the ZRTP packet format does not pass the
RTP validity check.
R-DOS is met since ZRTP does not introduce any new denial-of-
service attacks.
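To illustrate R-RTP-CHECK, the Python sketch below demultiplexes
packets arriving on the media port. It assumes the ZRTP header layout
of Section 5 (not reproduced here): a first nibble of 0001 and the
magic cookie 0x5a525450 ("ZRTP") in octets 4-7. An RTP header carries
version 2 in its top two bits, which the ZRTP header's leading 00 bits
never match, so a standard RTP validity check rejects ZRTP packets.

```python
ZRTP_MAGIC_COOKIE = 0x5A525450  # ASCII "ZRTP" (assumed header layout)


def classify_packet(pkt: bytes) -> str:
    """Return "zrtp", "rtp", or "other" for a packet from the media port."""
    if (len(pkt) >= 8 and pkt[0] >> 4 == 0x1
            and int.from_bytes(pkt[4:8], "big") == ZRTP_MAGIC_COOKIE):
        return "zrtp"
    # Minimal RTP validity check: fixed 12-octet header and version == 2.
    if len(pkt) >= 12 and pkt[0] >> 6 == 2:
        return "rtp"
    return "other"
```

Checking the magic cookie in addition to the leading bits reduces the
chance of misclassifying unrelated non-RTP traffic as ZRTP.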
Some questions have been raised about voice spoofing during the short
authentication string (SAS) comparison. But it is a mistake to think
this is simply an exercise in voice impersonation (perhaps this could
be called the "Rich Little" attack). Although there are digital
signal processing techniques for changing a person’s voice, that does
not mean a MiTM attacker can safely break into a phone conversation
and inject his own SAS at just the right moment. He doesn’t know
exactly when or in what manner the users will choose to read aloud
the SAS, or in what context they will bring it up or say it, or even
which of the two speakers will say it, or if indeed they both will
say it. In addition, some methods of rendering the SAS involve using
a list of words such as the PGP word list [Juola2], in a manner
analogous to how pilots use the NATO phonetic alphabet to convey
information. This can make it even more complicated for the
attacker, because these words can be worked into the conversation in
unpredictable ways. If the session also includes video (an
increasingly common usage scenario), the MiTM may be further deterred
by the difficulty of making the lips sync with the voice-spoofed SAS.
The PGP word list is designed to make each word phonetically
distinct, which also tends to create distinctive lip movements.
Remember that the attacker places a very high value on not being
detected, and if he makes a mistake, he doesn’t get to do it over.
A question has been raised regarding the safety of the SAS procedure
for people who don’t know each other’s voices, because it may allow
an attack from a MiTM even if he lacks voice impersonation
capabilities. This is not as much of a problem as it seems, because
it isn’t necessary that users recognize each other by their voice.
It is only necessary that they detect that the voice used for the SAS
procedure doesn’t match the voice in the rest of the phone
conversation.
sessions all the way back to the first one, which is assumed to be
difficult for the attacker. ZRTP’s key continuity features are
actually better than SSH, at least for VoIP, for reasons described in
Section 15.1. All this is accomplished without resorting to a
centrally managed PKI.
16. Acknowledgments
17. References
[RFC3711] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)",
RFC 3711, March 2004.
[RFC4880] Callas, J., Donnerhacke, L., Finney, H., Shaw, D., and R.
Thayer, "OpenPGP Message Format", RFC 4880, November 2007.
[FIPS-140-2-Annex-A]
"Annex A: Approved Security Functions for FIPS PUB 140-2",
NIST FIPS PUB 140-2 Annex A, January 2011.
[FIPS-140-2-Annex-D]
"Annex D: Approved Key Establishment Techniques for FIPS
PUB 140-2", NIST FIPS PUB 140-2 Annex D, January 2011.
[FIPS-180-3]
"Secure Hash Standard (SHS)", NIST FIPS PUB 180-3, October
2008.
[FIPS-186-3]
"Digital Signature Standard (DSS)", NIST FIPS PUB 186-3,
June 2009.
[FIPS-198-1]
"The Keyed-Hash Message Authentication Code (HMAC)", NIST
FIPS PUB 198-1, July 2008.
[NIST-SP800-38A]
Dworkin, M., "Recommendation for Block Cipher Modes of
Operation", NIST Special Publication 800-38A, 2001
Edition.
[NIST-SP800-56A]
Barker, E., Johnson, D., and M. Smid, "Recommendation for
Pair-Wise Key Establishment Schemes Using Discrete
Logarithm Cryptography", NIST Special Publication
800-56A Revision 1, March 2007.
[NIST-SP800-90]
Barker, E. and J. Kelsey, "Recommendation for Random
Number Generation Using Deterministic Random Bit
Generators", NIST Special Publication 800-90 (Revised),
March 2007.
[NIST-SP800-108]
Chen, L., "Recommendation for Key Derivation Using
Pseudorandom Functions", NIST Special Publication
800-108, October 2009.
[NSA-Suite-B]
"NSA Suite B Cryptography", NSA Information Assurance
Directorate, NSA Suite B Cryptography.
[NSA-Suite-B-Guide-56A]
"Suite B Implementer’s Guide to NIST SP 800-56A", Suite B
Implementer’s Guide to NIST SP 800-56A, 28 July 2009.
[TwoFish] Schneier, B., Kelsey, J., Whiting, D., Hall, C., and N.
Ferguson, "Twofish: A 128-Bit Block Cipher", June 1998,
<http://www.schneier.com/paper-twofish-paper.html>.
[pgpwordlist]
"PGP Word List", December 2010, <http://en.wikipedia.org/
w/index.php?title=PGP_word_list&oldid=400752943>.
[RFC1981] McCann, J., Deering, S., and J. Mogul, "Path MTU Discovery
for IP version 6", RFC 1981, August 1996.
[RFC3824] Peterson, J., Liu, H., Yu, J., and B. Campbell, "Using
E.164 numbers with the Session Initiation Protocol (SIP)",
RFC 3824, June 2004.
[RFC4567] Arkko, J., Lindholm, F., Naslund, M., Norrman, K., and E.
Carrara, "Key Management Extensions for Session
Description Protocol (SDP) and Real Time Streaming
Protocol (RTSP)", RFC 4567, July 2006.
[SRTP-AES-GCM]
McGrew, D., "AES-GCM and AES-CCM Authenticated Encryption
in Secure RTP (SRTP)", Work in Progress, January 2011.
[ECC-OpenPGP]
Jivsov, A., "ECC in OpenPGP", Work in Progress,
March 2011.
[VBR-AUDIO]
Perkins, C. and J. Valin, "Guidelines for the use of
Variable Bit Rate Audio with Secure RTP", Work
in Progress, December 2010.
[SIP-IDENTITY]
Wing, D. and H. Kaplan, "SIP Identity using Media Path",
Work in Progress, February 2008.
[NIST-SP800-57-Part1]
Barker, E., Barker, W., Burr, W., Polk, W., and M. Smid,
"Recommendation for Key Management - Part 1: General
(Revised)", NIST Special Publication 800-57 Part 1
Revised, March 2007.
[NIST-SP800-131A]
Barker, E. and A. Roginsky, "Recommendation for the
Transitioning of Cryptographic Algorithms and Key
Lengths", NIST Special Publication 800-131A, January 2011.
[Byzantine]
"The Two Generals’ Problem", March 2011, <http://
en.wikipedia.org/w/
index.php?title=Two_Generals%27_Problem&oldid=417855753>.
[TESLA] Perrig, A., Canetti, R., Tygar, J., and D. Song, "The
TESLA Broadcast Authentication Protocol", October 2002,
<http://www.ece.cmu.edu/~adrian/projects/tesla-cryptobytes/
tesla-cryptobytes.pdf>.
[comsec] Blossom, E., "The VP1 Protocol for Voice Privacy Devices
Version 1.2", <http://www.comsec.com/vp1-protocol.pdf>.
[Wright1] Wright, C., Ballard, L., Coull, S., Monrose, F., and G.
Masson, "Spot me if you can: Uncovering spoken phrases in
encrypted VoIP conversations", Proceedings of the 2008
IEEE Symposium on Security and Privacy 2008,
<http://cs.jhu.edu/~cwright/oakland08.pdf>.
[Sunshine] Sunshine, J., Egelman, S., Almuhimedi, H., Atri, N., and
L. Cranor, "Crying Wolf: An Empirical Study of SSL Warning
Effectiveness", USENIX Security Symposium 2009,
<http://lorrie.cranor.org/pubs/sslwarnings.pdf>.
Authors’ Addresses
Philip Zimmermann
Zfone Project
Santa Cruz, California
EMail: prz@mit.edu
URI: http://philzimmermann.com
EMail: alan.b.johnston@gmail.com
Jon Callas
Apple, Inc.
EMail: jon@callas.org