RFC 4867 RTP Payload Format and File Storage Format
PROPOSED STANDARD
Audio Codecs
Copyright Notice
Abstract
RFC 4867 RTP Payload Format for AMR and AMR-WB April 2007
Table of Contents
1. Introduction
2. Conventions and Acronyms
3. Background on AMR/AMR-WB and Design Principles
   3.1. The Adaptive Multi-Rate (AMR) Speech Codec
   3.2. The Adaptive Multi-Rate Wideband (AMR-WB) Speech Codec
   3.3. Multi-Rate Encoding and Mode Adaptation
   3.4. Voice Activity Detection and Discontinuous Transmission
   3.5. Support for Multi-Channel Session
   3.6. Unequal Bit-Error Detection and Protection
      3.6.1. Applying UEP and UED in an IP Network
   3.7. Robustness against Packet Loss
      3.7.1. Use of Forward Error Correction (FEC)
      3.7.2. Use of Frame Interleaving
   3.8. Bandwidth-Efficient or Octet-Aligned Mode
   3.9. AMR or AMR-WB Speech over IP Scenarios
4. AMR and AMR-WB RTP Payload Formats
   4.1. RTP Header Usage
   4.2. Payload Structure
   4.3. Bandwidth-Efficient Mode
      4.3.1. The Payload Header
      4.3.2. The Payload Table of Contents
      4.3.3. Speech Data
      4.3.4. Algorithm for Forming the Payload
      4.3.5. Payload Examples
         4.3.5.1. Single-Channel Payload Carrying a Single Frame
         4.3.5.2. Single-Channel Payload Carrying Multiple Frames
         4.3.5.3. Multi-Channel Payload Carrying Multiple Frames
   4.4. Octet-Aligned Mode
      4.4.1. The Payload Header
      4.4.2. The Payload Table of Contents and Frame CRCs
         4.4.2.1. Use of Frame CRC for UED over IP
      4.4.3. Speech Data
      4.4.4. Methods for Forming the Payload
      4.4.5. Payload Examples
         4.4.5.1. Basic Single-Channel Payload Carrying Multiple Frames
         4.4.5.2. Two-Channel Payload with CRC, Interleaving, and
                  Robust Sorting
   4.5. Implementation Considerations
      4.5.1. Decoding Validation
5. AMR and AMR-WB Storage Format
   5.1. Single-Channel Header
   5.2. Multi-Channel Header
1. Introduction
This document obsoletes RFC 3267 and extends that specification with
offer/answer rules. See Section 10 for the changes made to this
format in relation to RFC 3267.
Even though this RTP payload format definition supports the transport
of both AMR and AMR-WB speech, it is important to remember that AMR
and AMR-WB are two different codecs and they are always handled as
different payload types in RTP.
The byte order used in this document is network byte order, i.e., the
most significant byte first. The bit order is also the most
significant bit first. This is presented in all figures as having
the most significant bit leftmost on a line and with the lowest
number. Some bit fields may wrap over multiple lines, in which case
the bits on the first line are more significant than the bits on the
next line.
The AMR codec is a multi-mode codec that supports eight narrow band
speech encoding modes with bit rates between 4.75 and 12.2 kbps. The
sampling frequency used in AMR is 8000 Hz and the speech encoding is
performed on 20 ms speech frames. Therefore, each encoded AMR speech
frame represents 160 samples of the original speech.
Among the eight AMR encoding modes, three are already separately
adopted as standards of their own. Particularly, the 6.7 kbps mode
is adopted as PDC-EFR [18], the 7.4 kbps mode as IS-641 codec in TDMA
[17], and the 12.2 kbps mode as GSM-EFR [16].
With AMR or AMR-WB, mobile radio systems are able to use available
bandwidth as effectively as possible. For example, in GSM it is
possible to dynamically adjust the speech encoding rate during a
session so as to continuously adapt to the varying transmission
conditions by dividing the fixed overall bandwidth between speech
data and error protective coding. This enables the best possible
trade-off between speech compression rate and error tolerance. To
perform mode adaptation, the decoder (speech receiver) needs to
signal the encoder (speech sender) the new mode it prefers. This
mode change signal is called Codec Mode Request or CMR.
For example, the GSM radio link can only use a subset of at most four
different modes in a given session. This subset can be any
combination of the eight AMR modes for an AMR session or any
combination of the nine AMR-WB modes for an AMR-WB session.
Moreover, for better interoperability with GSM through a gateway, the
decoder is allowed to use out-of-band means to set the minimum number
of frames between two mode changes and to limit the mode change among
neighboring modes only.
Both the RTP payload format and the storage format defined in this
document support multi-channel audio content (e.g., a stereophonic
speech session).
The speech bits encoded in each AMR or AMR-WB frame have different
perceptual sensitivity to bit errors. This property has been
exploited in cellular systems to achieve better voice quality by
using unequal error protection and detection (UEP and UED)
mechanisms.
There are at least two basic approaches for carrying AMR and AMR-WB
traffic over bit error tolerant IP networks:
In either approach, at least part of the class B/C bits are left
without error-check and thus bit error tolerance is achieved.
Approach 1 is bit efficient, flexible and simple, but comes with two
disadvantages, namely, a) bit errors in protected speech bits will
cause the payload to be discarded, and b) when transporting multiple
AMR or AMR-WB frames in an RTP payload, there is the possibility that
a single bit error in protected bits will cause all the frames to be
discarded.
The choice between the above two approaches must be made based on the
available bandwidth, and the desired tolerance to bit errors.
Neither solution is appropriate for all cases. Section 8 defines
parameters that may be used at session setup to choose between these
approaches.
--+--------+--------+--------+--------+--------+--------+--------+--
  | f(n-2) | f(n-1) |  f(n)  | f(n+1) | f(n+2) | f(n+3) | f(n+4) |
--+--------+--------+--------+--------+--------+--------+--------+--
The use of this approach does not require signaling at the session
setup. However, a parameter for providing a maximum delay in
transmitting any redundant frame is defined in Section 8. In other
words, the speech sender can choose to use this scheme without
consulting the receiver. This is because a packet containing
redundant frames will not look different from a packet with only new
frames. The receiver may receive multiple copies or versions
(encoded with different modes) of a frame for a certain timestamp if
no packet is lost. If multiple versions of the same speech frame are
received, it is recommended that the mode with the highest rate be
used by the speech decoder.
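The recommendation above can be sketched in a few lines of Python. This is an illustration only: the Frame tuple and the function name are hypothetical, and the sketch assumes AMR, where a higher speech frame type index (FT 0..7) corresponds to a higher bit rate.

```python
from typing import Dict, Iterable, NamedTuple


class Frame(NamedTuple):
    timestamp: int  # RTP timestamp the frame belongs to
    ft: int         # frame type index taken from the ToC entry
    data: bytes     # speech bits, already extracted from the payload


# Speech modes only: FT 0..7 for AMR. For AMR-WB this would be range(0, 9).
AMR_SPEECH_FTS = range(0, 8)


def pick_highest_rate(frames: Iterable[Frame]) -> Dict[int, Frame]:
    """Keep, per timestamp, the speech frame encoded with the highest-rate mode."""
    best: Dict[int, Frame] = {}
    for frame in frames:
        if frame.ft not in AMR_SPEECH_FTS:
            continue  # SID and NO_DATA entries are not ranked in this sketch
        current = best.get(frame.timestamp)
        if current is None or frame.ft > current.ft:
            best[frame.timestamp] = frame
    return best
```

A receiver would typically run such a selection over the frames collected in its jitter buffer just before handing them to the decoder.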
Between the two operation modes, only the octet-aligned mode has the
capability to use the robust sorting, interleaving, and frame CRC to
make the speech transport more robust to packet loss and bit errors.
+----------+                         +----------+
|          |    IP/UDP/RTP/AMR or    |          |
| TERMINAL |<----------------------->| TERMINAL |
|          |    IP/UDP/RTP/AMR-WB    |          |
+----------+                         +----------+
AMR or AMR-WB
over
I.366.{2,3} or +------+                        +----------+
3G Iu or       |      |   IP/UDP/RTP/AMR or    |          |
<------------->|  GW  |<---------------------->| TERMINAL |
GSM Abis       |      |   IP/UDP/RTP/AMR-WB    |          |
etc.           +------+                        +----------+
                  |
GSM/              |        IP network
3GPP UMTS network |
The IP terminal should not set the CMR (see Section 4.3.1), but the
gateway can set the CMR value on frames going toward the encoder in
the non-IP part to optimize speech quality from that encoder to the
gateway. The gateway can alternatively set a lower CMR value, if
desired, as one means to control congestion on the IP network.
Figure 4: GW to GW scenario
This scenario requires the same mechanisms for preserving UED/UEP and
CMR information as in the single gateway scenario. In addition, the
CMR value may be set in packets received by the gateways on the IP
network side. The gateway should forward to the non-IP side a CMR
value that is the minimum of three values:
The AMR and AMR-WB payload formats have identical structure, so they
are specified together. The only differences are in the types of
codec frames contained in the payload. The payload format consists
of the RTP header, payload header, and payload data.
The RTP header marker bit (M) SHALL be set to 1 if the first frame-
block carried in the packet contains a speech frame which is the
first in a talkspurt. For all other packets the marker bit SHALL be
set to zero (M=0).
The assignment of an RTP payload type for this new packet format is
outside the scope of this document, and will not be specified here.
It is expected that the RTP profile under which this payload format
is being used will assign a payload type for this encoding or specify
that the payload type is to be bound dynamically.
+----------------+-------------------+----------------
| payload header | table of contents | speech data ...
+----------------+-------------------+----------------
 0 1 2 3
+-+-+-+-+
|  CMR  |
+-+-+-+-+
The codec mode request received in the CMR field is valid until the
next codec mode request is received, i.e., a newly received CMR value
corresponding to a speech mode or NO_DATA overrides the previously
received CMR value corresponding to a speech mode or NO_DATA.
Therefore, if a terminal continuously wishes to receive frames in the
same mode X, it needs to set CMR=X for all its outbound payloads, and
if a terminal has no preference in which mode to receive, it SHOULD
set CMR=15 in all its outbound payloads.
An IP end-point SHOULD NOT set the codec mode request based on packet
losses or other congestion indications, for several reasons:
The encoder SHOULD follow a received codec mode request, but MAY
change to a lower-numbered mode if it so chooses, for example, to
control congestion.
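A minimal sketch of the CMR handling described above, in Python. The function names and the optional cap argument are hypothetical. The sketch assumes bandwidth-efficient or octet-aligned payloads, where the CMR occupies the four most significant bits of the first payload octet, and it assumes that CMR values that are neither an AMR speech mode (0..7) nor 15 leave the current mode unchanged.

```python
from typing import Optional

NO_MODE_REQUEST = 15            # CMR=15: the sender has no preference
AMR_SPEECH_MODES = range(0, 8)  # for AMR-WB this would be range(0, 9)


def read_cmr(payload: bytes) -> int:
    """Return the 4-bit CMR field from the first octet of the payload."""
    if not payload:
        raise ValueError("empty payload")
    return payload[0] >> 4


def choose_mode(own_mode: int, cmr: int, cap: Optional[int] = None) -> int:
    """Follow the request, optionally dropping to a lower-numbered mode."""
    mode = cmr if cmr in AMR_SPEECH_MODES else own_mode
    if cap is not None:
        mode = min(mode, cap)  # e.g., a local congestion-control limit
    return mode
```

For example, an encoder currently using mode 7 that receives CMR=3 switches to mode 3, while CMR=15 leaves the choice to the encoder.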
 0 1 2 3 4 5
+-+-+-+-+-+-+
|F|  FT   |Q|
+-+-+-+-+-+-+
NO_DATA (FT=15) frame could mean either that no data for that frame
has been produced by the speech encoder or that no data for that
frame is transmitted in the current payload (i.e., valid data for
that frame could be sent in either an earlier or later packet).
If receiving a ToC entry with an FT value in the range 9-14 for AMR or
10-13 for AMR-WB, the whole packet SHOULD be discarded. This is to
avoid the loss of data synchronization in the depacketization
process, which can result in a huge degradation in speech quality.
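The following Python sketch shows one way a receiver might read the 4-bit CMR and the 6-bit ToC entries (F, FT, Q) of a bandwidth-efficient payload, applying the discard rule above. All names are hypothetical, and returning None to signal "discard the packet" is a choice made for this illustration.

```python
from typing import Iterator, List, Optional, Tuple

# FT values that cause the whole packet to be discarded.
INVALID_FT = {"AMR": range(9, 15), "AMR-WB": range(10, 14)}


def iter_bits(payload: bytes) -> Iterator[int]:
    """Yield payload bits, most significant bit of each octet first."""
    for octet in payload:
        for shift in range(7, -1, -1):
            yield (octet >> shift) & 1


def parse_cmr_and_toc(
    payload: bytes, codec: str = "AMR"
) -> Optional[Tuple[int, List[Tuple[int, int]]]]:
    """Return (CMR, [(FT, Q), ...]), or None if the packet should be discarded."""
    bits = iter_bits(payload)

    def take(count: int) -> int:
        value = 0
        for _ in range(count):
            bit = next(bits, None)
            if bit is None:
                raise ValueError("payload ended inside the header or ToC")
            value = (value << 1) | bit
        return value

    cmr = take(4)
    toc: List[Tuple[int, int]] = []
    while True:
        f, ft, q = take(1), take(4), take(1)
        if ft in INVALID_FT[codec]:
            return None
        toc.append((ft, q))
        if f == 0:  # F=0 marks the last ToC entry
            return cmr, toc
```

For the two octets 0xF2 0x40 (CMR=15, F=0, FT=4, Q=1) the function returns (15, [(4, 1)]).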
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|  FT   |Q|1|  FT   |Q|0|  FT   |Q|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Below is an example of how the ToC entries will appear in the ToC of
a packet carrying three consecutive frame-blocks in a session with
two channels (L and R).
+----+----+----+----+----+----+
| 1L | 1R | 2L | 2R | 3L | 3R |
+----+----+----+----+----+----+
|<------->|<------->|<------->|
  Frame-    Frame-    Frame-
  Block 1   Block 2   Block 3
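The ordering shown above can be generated mechanically. The Python sketch below (function name hypothetical) lists the ToC entry labels for a given number of frame-blocks and channels, together with the F bit of each entry, assuming F=1 for every entry except the last one in the payload.

```python
from typing import List, Sequence, Tuple


def toc_layout(
    num_blocks: int, channels: Sequence[str] = ("L", "R")
) -> List[Tuple[str, int]]:
    """Return (label, F bit) pairs in the order they appear in the ToC."""
    labels = [
        f"{block}{channel}"
        for block in range(1, num_blocks + 1)  # frame-blocks in time order
        for channel in channels                # channels within a frame-block
    ]
    last = len(labels) - 1
    return [(label, 0 if index == last else 1) for index, label in enumerate(labels)]
```

Calling toc_layout(3) yields the labels 1L, 1R, 2L, 2R, 3L, 3R, with only the entry for 3R carrying F=0.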
Each speech frame represents 20 ms of speech encoded with the mode
indicated in the FT field of the corresponding ToC entry. The length
of the speech frame is implicitly defined by the mode indicated in
the FT field. The order and numbering notation of the bits are as
specified for Interface Format 1 (IF1) in [2] for AMR and [4] for
AMR-WB. As specified there, the bits of speech frames have been
rearranged in order of decreasing sensitivity, while the bits of
comfort noise frames are in the order produced by the encoder. The
resulting bit sequence for a frame of length K bits is denoted d(0),
d(1), ..., d(K-1).
If speech data is missing for one or more speech frames within the
sequence, because of, for example, DTX, a ToC entry with FT set to
NO_DATA SHALL be included in the ToC for each of the missing frames,
but no data bits are included in the payload for the missing frame
(see Section 4.3.5.2 for an example).
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CMR=15|0| FT=4  |1|d(0)                                       |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                     d(147)|P|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
As shown below, the payload carries a mode request for the encoder on
the receiver's side to change its future coding mode to AMR-WB 8.85
kbps (CMR=1). None of the frames are damaged at IP origin (Q=1).
The encoded speech and SID bits, d(0) to d(131), g(0) to g(39), and
h(0) to h(176), are arranged in the payload in descending sensitivity
order according to [4]. (Note, no speech bits are present for the
third frame.) Finally, seven zero bits are padded to the end to
make the payload octet aligned.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CMR=1 |1| FT=0  |1|1| FT=9  |1|1| FT=15 |1|0| FT=1  |1|d(0)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                         d(131)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|g(0)                                                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|          g(39)|h(0)                                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                           h(176)|P|P|P|P|P|P|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
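The bit counts in this example can be checked with a short calculation. The Python sketch below is illustrative only; the function name is hypothetical, and the frame-size tables are taken from the 3GPP frame format specifications rather than from this section (only the 148-bit, 132-bit, 177-bit, and 40-bit entries are confirmed by the examples here), so they should be verified against [2] and [4] before use.

```python
from typing import Dict, Sequence, Tuple

# Speech/SID bits per frame type (assumed values; NO_DATA carries no bits).
AMR_BITS: Dict[int, int] = {
    0: 95, 1: 103, 2: 118, 3: 134, 4: 148, 5: 159, 6: 204, 7: 244, 8: 39, 15: 0,
}
AMR_WB_BITS: Dict[int, int] = {
    0: 132, 1: 177, 2: 253, 3: 285, 4: 317, 5: 365, 6: 397, 7: 461, 8: 477,
    9: 40, 14: 0, 15: 0,
}


def bandwidth_efficient_size(
    fts: Sequence[int], bits_table: Dict[int, int]
) -> Tuple[int, int, int]:
    """Return (bits before padding, padding bits, total octets)."""
    # 4-bit CMR, one 6-bit ToC entry per frame, then the frame bits.
    used_bits = 4 + 6 * len(fts) + sum(bits_table[ft] for ft in fts)
    padding = -used_bits % 8  # zero bits appended to reach an octet boundary
    return used_bits, padding, (used_bits + padding) // 8
```

For the payload above (FT=0, 9, 15, 1 in AMR-WB) this gives 377 bits, 7 padding bits, and 48 octets, which agrees with the seven padding bits stated in the text.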
In the payload, all speech frames contain the same mode 7.4 kbps
(FT=4) and are not damaged at IP origin. The CMR is set to 15, i.e.,
no specific mode is requested. The two channels are defined as left
(L) and right (R) in that order. The encoded speech bits are
designated dXY(0).. dXY(K-1), where X = block number, Y = channel,
and K is the number of speech bits for that mode. Exemplifying this,
for frame-block 1 of the left channel, the encoded bits are
designated as d1L(0) to d1L(147).
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CMR=15|1|1L FT=4|1|1|1R FT=4|1|1|2L FT=4|1|1|2R FT=4|1|1|3L FT|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|4|1|0|3R FT=4|1|d1L(0)                                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                               |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                               d1L(147)|d1R(0) |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                              ...                              :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                       d1R(147)|d2L(0)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                              ...                              :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|d2L(147|d2R(0)                                                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                              ...                              :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                       d2R(147)|d3L(0)         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                              ...                              :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|               d3L(147)|d3R(0)                                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                              ...                              :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                                                       d3R(147)|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Sjoberg, et al. Standards Track [Page 24]
 0                   1
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5
+-+-+-+-+-+-+-+-+- - - - - - - -
|  CMR  |R|R|R|R|  ILL  |  ILP  |
+-+-+-+-+-+-+-+-+- - - - - - - -
+---------------------+
| list of ToC entries |
+---------------------+
| list of frame CRCs  | (optional)
 - - - - - - - - - - -
The list of ToC entries is organized in the same way as described for
bandwidth-efficient mode in 4.3.2, with the following exception:
when interleaving is used, the frame-blocks in the ToC will almost
never be placed consecutively in time. Instead, the presence and
order of the frame-blocks in a packet will follow the pattern
described in 4.4.1.
Packet #1
---------
ILL=2, ILP=0:
+----+----+----+----+----+----+
| 1L | 1R | 4L | 4R | 7L | 7R |
+----+----+----+----+----+----+
|<------->|<------->|<------->|
  Frame-    Frame-    Frame-
  Block 1   Block 4   Block 7
Packet #2
---------
ILL=2, ILP=1:
+----+----+----+----+----+----+
| 2L | 2R | 5L | 5R | 8L | 8R |
+----+----+----+----+----+----+
|<------->|<------->|<------->|
  Frame-    Frame-    Frame-
  Block 2   Block 5   Block 8
Packet #3
---------
ILL=2, ILP=2:
+----+----+----+----+----+----+
| 3L | 3R | 6L | 6R | 9L | 9R |
+----+----+----+----+----+----+
|<------->|<------->|<------->|
  Frame-    Frame-    Frame-
  Block 3   Block 6   Block 9
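The pattern in the three packets above can be expressed as a small formula: with ILL+1 packets per interleave group, the packet with index ILP carries every (ILL+1)-th frame-block starting at block ILP+1. The Python sketch below illustrates this under two assumptions: frame-blocks are numbered from 1 within the interleave group, and ILP lies between 0 and ILL. The function name is hypothetical.

```python
from typing import List


def blocks_in_packet(ill: int, ilp: int, blocks_per_packet: int) -> List[int]:
    """Return the 1-based frame-block numbers carried by one interleaved packet."""
    if not 0 <= ilp <= ill:
        raise ValueError("ILP is assumed to lie in the range 0..ILL")
    step = ill + 1  # distance between consecutive frame-blocks in a packet
    return [ilp + 1 + k * step for k in range(blocks_per_packet)]
```

With ILL=2 and three frame-blocks per packet, ILP=0, 1, and 2 give blocks (1, 4, 7), (2, 5, 8), and (3, 6, 9), matching the figure.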
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|F|  FT   |Q|P|P|
+-+-+-+-+-+-+-+-+
Note, the number of class A bits for various coding modes in AMR
codec is specified as informative in [2] and is therefore copied
into Table 1 in Section 3.6 to make it normative for this payload
format. The number of class A bits for various coding modes in
AMR-WB codec is specified as normative in Table 2 in [4], and the
SID frame (FT=9) has 40 class A bits. These definitions of class
A bits MUST be used for this payload format.
The receiver of the payload SHOULD examine the data integrity of the
received class A bits by re-calculating the CRC over the received
class A bits and comparing the result to the value found in the
received payload header. If the two values mismatch, the receiver
SHALL consider the class A bits in the received frame damaged and
MUST clear the Q flag of the frame (i.e., set it to 0). This will
subsequently cause the frame to be marked as SPEECH_BAD, if the FT of
the frame is 0..7 for AMR or 0..8 for AMR-WB, or SID_BAD if the FT of
the frame is 8 for AMR or 9 for AMR-WB, before it is passed to the
speech decoder. See [6] and [7] for more details.
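The receiver-side rule above can be summarized in a Python sketch. The function names are hypothetical, and the CRC computation itself is deliberately left out: the sketch takes the received and the re-calculated CRC values as inputs and only shows how a mismatch clears Q and how the frame is then labelled.

```python
from typing import Optional

SPEECH_FTS = {"AMR": range(0, 8), "AMR-WB": range(0, 9)}
SID_FT = {"AMR": 8, "AMR-WB": 9}


def q_after_crc_check(q: int, received_crc: int, computed_crc: int) -> int:
    """Clear the Q flag when the CRC over the class A bits does not match."""
    return q if received_crc == computed_crc else 0


def bad_frame_label(codec: str, ft: int, q: int) -> Optional[str]:
    """Return SPEECH_BAD or SID_BAD for a damaged frame, else None."""
    if q == 1:
        return None  # frame is not marked as damaged
    if ft in SPEECH_FTS[codec]:
        return "SPEECH_BAD"
    if ft == SID_FT[codec]:
        return "SID_BAD"
    return None
```

For example, an AMR frame with FT=5 whose CRC check fails ends up with Q=0 and is handed to the decoder as SPEECH_BAD.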
The following example shows an octet-aligned ToC with a CRC list for
a payload containing 3 speech frames from a single-channel session
(assuming none of the FTs is equal to 14 or 15):
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1| FT#1  |Q|P|P|1| FT#2  |Q|P|P|0| FT#3  |Q|P|P|     CRC#1     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     CRC#2     |     CRC#3     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
  0   1   2   3   4   5   6   7
+---+---+---+---+---+---+---+---+
| c0| c1| c2| c3| c4| c5| c6| c7|
+---+---+---+---+---+---+---+---+
 (MSB)                     (LSB)
- The last octet of each speech frame MUST be padded with zero
bits at the end if all bits in the octet are not used. The
padding bits MUST be ignored on reception. In other words,
each speech frame MUST be octet-aligned.
The use of robust sorting order for a payload type MUST be agreed via
out-of-band means. Section 8 specifies a media type parameter for
this purpose.
Note, robust sorting order MUST only be performed on the frame level
and thus is independent of interleaving, which is at the frame-block
level, as described in Section 4.4.1. In other words, robust sorting
can be applied to either non-interleaved or interleaved payload
types.
The payload begins with the payload header of one octet, or two
octets if frame interleaving is selected. The payload header is
followed by the table of contents consisting of a list of one-octet
ToC entries. If frame CRCs are to be included, they follow the table
of contents with one 8-bit CRC filling each octet. Note that if a
given frame has a ToC entry with FT=14 or 15, there will be no CRC
present.
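As an illustration of the layout just described, the Python sketch below computes where each part of an octet-aligned payload starts and how long the payload is. The function and key names are hypothetical, and the AMR frame-size table comes from the 3GPP frame format specification rather than from this section, so it should be checked against [2].

```python
from typing import Dict, List, Sequence, Union

# Bits per AMR frame type (assumed values; NO_DATA carries no bits).
AMR_BITS: Dict[int, int] = {
    0: 95, 1: 103, 2: 118, 3: 134, 4: 148, 5: 159, 6: 204, 7: 244, 8: 39, 15: 0,
}
NO_CRC_FTS = (14, 15)  # ToC entries with these FT values have no CRC


def octet_aligned_layout(
    fts: Sequence[int],
    bits_table: Dict[int, int],
    interleaved: bool = False,
    with_crc: bool = False,
) -> Dict[str, Union[int, List[int]]]:
    """Return octet offsets and sizes of the parts of an octet-aligned payload."""
    header_octets = 2 if interleaved else 1
    toc_octets = len(fts)  # one octet per ToC entry
    crc_octets = sum(1 for ft in fts if ft not in NO_CRC_FTS) if with_crc else 0
    frame_octets = [(bits_table[ft] + 7) // 8 for ft in fts]  # each frame padded
    data_offset = header_octets + toc_octets + crc_octets
    return {
        "toc_offset": header_octets,
        "crc_offset": header_octets + toc_octets,
        "data_offset": data_offset,
        "frame_octets": frame_octets,
        "total_octets": data_offset + sum(frame_octets),
    }
```

For two AMR 7.95 kbps frames (FT=5, 159 bits each) without interleaving or CRCs, the speech data starts at octet 3 and the payload is 43 octets long.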
The UED and/or UEP is RECOMMENDED to cover at least the RTP header,
payload header, table of contents, and class A bits of a sorted
payload. Exactly how many octets need to be covered depends on the
network and application. If CRCs are used together with robust
sorting, only the RTP header, the payload header, and the ToC SHOULD
be covered by UED/UEP. The means for communicating the number of
octets to be covered to other layers performing UED/UEP is beyond the
scope of this specification.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CMR=6 |R|R|R|R|1|FT#1=5 |Q|P|P|0|FT#2=5 |Q|P|P|   f1(0..7)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   f1(8..15)   |  f1(16..23)   |           ....                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                              ...                              :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         ...   |f1(152..158) |P|   f2(0..7)    |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   f2(8..15)   |  f2(16..23)   |           ....                |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
:                              ...                              :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         ...   |f2(152..158) |P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Note, in the above example, the last octet in both speech frames is
padded with one zero bit to make it octet-aligned.
The two channels are left (L) and right (R) with L coming before R.
In the payload, a codec mode request is also sent (CMR=6), requesting
the encoder at the receiver's side to use AMR 10.2 kbps coding mode.
The first two frames in the payload are the L and R channel speech
frames of frame-block #1, consisting of bits f1L(0..158) and
f1R(0..158), respectively. The next two frames are the L and R
channel frames of frame-block #3, consisting of bits f3L(0..158) and
f3R(0..158), respectively, due to interleaving. For each of the four
speech frames, a CRC is calculated as CRC1L(0..7), CRC1R(0..7),
CRC3L(0..7), and CRC3R(0..7), respectively. Finally, the payload is
robust sorted.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| CMR=6 |R|R|R|R| ILL=1 | ILP=0 |1|FT#1L=5|Q|P|P|1|FT#1R=5|Q|P|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|1|FT#3L=5|Q|P|P|0|FT#3R=5|Q|P|P|     CRC1L     |     CRC1R     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|     CRC3L     |     CRC3R     |   f1L(0..7)   |   f1R(0..7)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|   f3L(0..7)   |   f3R(0..7)   |  f1L(8..15)   |  f1R(8..15)   |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|  f3L(8..15)   |  f3R(8..15)   |  f1L(16..23)  |  f1R(16..23)  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
: ... :
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| f3L(144..151) | f3R(144..151) |f1L(152..158)|P|f1R(152..158)|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|f3L(152..158)|P|f3R(152..158)|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Note, in the above example, the last octet in all four speech frames
is padded with one zero bit to make it octet-aligned.
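The octet ordering visible in this example (first octet of every frame, then the second octet of every frame, and so on) can be sketched in Python as follows. The function name is hypothetical, and skipping frames that have run out of octets is an assumption for frames of unequal length, which the example above does not show.

```python
from typing import List


def robust_sort(frames: List[bytes]) -> bytes:
    """Interleave octet-aligned frames octet by octet, in ToC order."""
    longest = max((len(frame) for frame in frames), default=0)
    out = bytearray()
    for index in range(longest):
        for frame in frames:
            if index < len(frame):  # shorter frames are skipped once exhausted
                out.append(frame[index])
    return bytes(out)
```

With the four 20-octet frames of the example, the output starts with the first octets of f1L, f1R, f3L, and f3R, followed by their second octets, as in the figure.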
See 3GPP TS 26.103 [28] for preferred AMR and AMR-WB configurations
for operation in GSM and 3GPP UMTS networks. In gateway scenarios,
encoders can be requested through the "mode-set" parameter to use a
limited mode-set that is supported by the link beyond the gateway.
Further, to avoid congestion on that link, the encoder SHOULD limit
the initial codec mode for a session to a lower mode, until at least
one frame-block is received with rate control information.
The storage format is used for storing AMR or AMR-WB speech frames in
a file or as an email attachment. Multiple channel content is
supported.
+------------------+
| Header |
+------------------+
| Speech frame 1 |
+------------------+
: ... :
+------------------+
| Speech frame n |
+------------------+
There also exists another storage format for AMR and AMR-WB that is
suitable for applications with more advanced demands on the storage
format, like random access or synchronization with video. This
format is the 3GPP-specified ISO-based multimedia file format 3GP
[31]. Its media type is specified by RFC 3839 [32].
The magic number for single-channel AMR files MUST consist of ASCII
character string:
"#!AMR\n"
(or 0x2321414d520a in hexadecimal).
The magic number for single-channel AMR-WB files MUST consist of
ASCII character string:
"#!AMR-WB\n"
(or 0x2321414d522d57420a in hexadecimal).
Note, the "\n" is an important part of the magic numbers and MUST be
included in the comparison, since, otherwise, the single-channel
magic numbers above will become indistinguishable from those of the
multi-channel files defined in the next section.
+------------------+
| magic number |
+------------------+
| chan-desc field |
+------------------+
The magic number for multi-channel AMR files MUST consist of the
ASCII character string:
"#!AMR_MC1.0\n"
(or 0x2321414d525F4D43312E300a in hexadecimal).
The magic number for multi-channel AMR-WB files MUST consist of the
ASCII character string:
"#!AMR-WB_MC1.0\n"
(or 0x2321414d522d57425F4D43312E300a in hexadecimal).
The version number in the magic numbers refers to the version of the
file format.
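A file reader might identify the four magic numbers as sketched below in Python. The function name and the returned tuple are hypothetical. Because each magic number is compared including its trailing "\n", none of them is a prefix of another, so the order of the comparisons does not matter.

```python
from typing import Optional, Tuple

# (magic number, codec, multi-channel?)
MAGIC_NUMBERS = [
    (b"#!AMR\n", "AMR", False),
    (b"#!AMR-WB\n", "AMR-WB", False),
    (b"#!AMR_MC1.0\n", "AMR", True),
    (b"#!AMR-WB_MC1.0\n", "AMR-WB", True),
]


def identify_storage_format(data: bytes) -> Optional[Tuple[str, bool, int]]:
    """Return (codec, is_multi_channel, magic length), or None if unknown."""
    for magic, codec, multi_channel in MAGIC_NUMBERS:
        if data.startswith(magic):
            return codec, multi_channel, len(magic)
    return None
```

The returned length tells the caller where the chan-desc field (multi-channel) or the first speech frame (single-channel) begins.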
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Reserved bits                     | CHAN  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Each stored speech frame starts with a one-octet frame header with
the following format:
 0 1 2 3 4 5 6 7
+-+-+-+-+-+-+-+-+
|P|  FT   |Q|P|P|
+-+-+-+-+-+-+-+-+
The FT field and the Q bit are defined in the same way as in Section
4.3.2. The P bits are padding and MUST be set to 0, and MUST be
ignored.
Following this one octet header come the speech bits as defined in
4.4.3. The last octet of each frame is padded with zeroes, if
needed, to achieve octet alignment.
The following example shows an AMR frame in 5.9 kbps coding mode
(with 118 speech bits) in the storage format.
 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|P| FT=2  |Q|P|P|                                               |
+-+-+-+-+-+-+-+-+                                               +
|                                                               |
+           Speech bits for frame-block n, channel k            +
|                                                               |
+                                                           +-+-+
|                                                           |P|P|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Comfort noise frames of other types than AMR SID (FT=8) (i.e., frame
type 9, 10, and 11 for AMR) SHALL NOT be used in the AMR file format.
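Reading stored frames from a single-channel AMR file can be sketched in Python as follows. The function name is hypothetical, and the frame-size table is taken from the 3GPP frame format specification rather than from this section (only the 118-bit entry for FT=2 is confirmed by the example above), so it should be verified against [2]. Frame types absent from the table, including 9, 10, and 11, are rejected.

```python
from typing import Dict, Iterator, Tuple

AMR_MAGIC = b"#!AMR\n"
# Bits per AMR frame type allowed in the file format (assumed values).
AMR_BITS: Dict[int, int] = {
    0: 95, 1: 103, 2: 118, 3: 134, 4: 148, 5: 159, 6: 204, 7: 244, 8: 39, 15: 0,
}


def iter_amr_file_frames(data: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (FT, Q, frame octets) for each frame of a single-channel AMR file."""
    if not data.startswith(AMR_MAGIC):
        raise ValueError("not a single-channel AMR file")
    pos = len(AMR_MAGIC)
    while pos < len(data):
        header = data[pos]           # layout: |P| FT (4 bits) |Q|P|P|
        ft = (header >> 3) & 0x0F
        q = (header >> 2) & 0x01
        if ft not in AMR_BITS:
            raise ValueError(f"frame type {ft} is not allowed in an AMR file")
        size = (AMR_BITS[ft] + 7) // 8  # speech bits padded to whole octets
        body = data[pos + 1 : pos + 1 + size]
        if len(body) != size:
            raise ValueError("file ends in the middle of a frame")
        yield ft, q, body
        pos += 1 + size
```

For the 5.9 kbps example above, the frame header octet is 0x14 (FT=2, Q=1) and is followed by 15 octets of speech data, 16 octets in total.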
6. Congestion Control
Another parameter that may impact the bandwidth demand for AMR and
AMR-WB is the number of frame-blocks that are encapsulated in each
RTP payload. Packing more frame-blocks in each RTP payload can
reduce the number of packets sent and hence the overhead from
IP/UDP/RTP headers, at the expense of increased delay.
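The trade-off can be made concrete with a rough calculation. The Python sketch below is an illustration under stated assumptions: bandwidth-efficient mode, a single channel, every frame in the same mode, and 40 octets of uncompressed IPv4/UDP/RTP headers per packet (20 + 8 + 12). The function name and parameters are hypothetical.

```python
def ip_bitrate_kbps(
    frame_bits: int,
    frames_per_packet: int,
    header_octets: int = 40,  # assumed IPv4 (20) + UDP (8) + RTP (12)
    frame_ms: int = 20,
) -> float:
    """Approximate IP-level bit rate for a given packing of frames per packet."""
    # 4-bit CMR plus, per frame, a 6-bit ToC entry and the speech bits.
    payload_bits = 4 + frames_per_packet * (6 + frame_bits)
    payload_octets = (payload_bits + 7) // 8
    packet_bits = 8 * (header_octets + payload_octets)
    return packet_bits / (frames_per_packet * frame_ms)  # bits per ms = kbps
```

Under these assumptions, AMR 12.2 kbps (244 bits per frame) needs about 28.8 kbps with one frame per packet and about 16.6 kbps with four frames per packet, at the cost of 60 ms of additional packetization delay.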
7. Security Considerations
RTP packets using the payload format defined in this specification
are subject to the general security considerations discussed in [8]
and in any used profile, like AVP [12] or SAVP [26].
7.1. Confidentiality
Two separate media type registrations are made, one for AMR and one
for AMR-WB, because they are distinct encodings that must be
distinguished by their own media type.
Data formats are specified for both real-time transport in RTP and
for storage type applications such as email attachments.
8.1. AMR Media Type Registration
The media type for the Adaptive Multi-Rate (AMR) codec is allocated
from the IETF tree since AMR is a widely used speech codec in general
VoIP and messaging applications. This media type registration covers
both real-time transfer via RTP and non-real-time transfers via
stored files.
Optional parameters:
Encoding considerations:
The Audio data is binary data, and must be encoded for non-
binary transport; the Base64 encoding is suitable for email.
When used in RTP context the data is framed as defined in [14].
Security considerations:
See Section 7 of RFC 4867.
Public specification:
RFC 4867
3GPP TS 26.090, 26.092, 26.093, 26.101
Additional information:
The following applies to stored-file transfer methods:
Magic numbers:
single-channel:
ASCII character string "#!AMR\n"
(or 0x2321414d520a in hexadecimal)
multi-channel:
ASCII character string "#!AMR_MC1.0\n"
(or 0x2321414d525F4D43312E300a in hexadecimal)
File extensions: amr, AMR
Macintosh file type code: "amr " (fourth character is space)
AMR speech frames may also be stored in the file format "3GP"
defined in 3GPP TS 26.244 [31], which is identified using the
media types "audio/3GPP" or "video/3GPP" as registered by RFC
3839 [32].
Restrictions on usage:
When this media type is used in the context of transfer over
RTP, the RTP payload format specified in Section 4 SHALL be
used. In all other contexts, the file format defined in Section
5 SHALL be used.
Author:
Magnus Westerlund <magnus.westerlund@ericsson.com>
Ari Lakaniemi <ari.lakaniemi@nokia.com>
Change controller:
IETF Audio/Video Transport working group delegated from the
IESG.
8.2. AMR-WB Media Type Registration
The media type for the Adaptive Multi-Rate Wideband (AMR-WB) codec is
allocated from the IETF tree since AMR-WB is a widely used speech
codec in general VoIP and messaging applications. This media type
registration covers both real-time transfer via RTP and non-real-
time transfers via stored files.
Optional parameters:
Encoding considerations:
The audio data is binary data, and must be encoded for non-
binary transport; the Base64 encoding is suitable for email.
When used in an RTP context, the data is framed as defined in [14].
Security considerations:
See Section 7 of RFC 4867.
Public specification:
RFC 4867
3GPP TS 26.190, 26.192, 26.193, 26.201
Additional information:
The following applies to stored-file transfer methods:
Magic numbers:
single-channel:
ASCII character string "#!AMR-WB\n"
(or 0x2321414d522d57420a in hexadecimal)
multi-channel:
ASCII character string "#!AMR-WB_MC1.0\n"
(or 0x2321414d522d57425F4D43312E300a in hexadecimal)
File extensions: awb, AWB
Macintosh file type code: amrw
Object identifier or OID: none
AMR-WB speech frames may also be stored in the file format "3GP"
defined in 3GPP TS 26.244 [31] and identified using the media
type "audio/3GPP" or "video/3GPP" as registered by RFC 3839
[32].
Restrictions on usage:
When this media type is used in the context of transfer over
RTP, the RTP payload format specified in Section 4 SHALL be
used. In all other contexts, the file format defined in Section
5 SHALL be used.
Author:
Magnus Westerlund <magnus.westerlund@ericsson.com>
Ari Lakaniemi <ari.lakaniemi@nokia.com>
Change controller:
IETF Audio/Video Transport working group delegated from the
IESG.
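As an illustration of the magic numbers listed in the two registrations
above, the following minimal Python sketch maps the leading bytes of a
stored file to a media type and channel layout. The names MAGIC_NUMBERS
and identify_amr_header are hypothetical and are not defined by this
document; the sketch only inspects the file header and does not validate
the speech frames that follow it.

```python
# Magic numbers copied from the AMR and AMR-WB registrations above.
# The table and function names are illustrative assumptions.
MAGIC_NUMBERS = {
    b"#!AMR\n": ("audio/AMR", "single-channel"),
    b"#!AMR_MC1.0\n": ("audio/AMR", "multi-channel"),
    b"#!AMR-WB\n": ("audio/AMR-WB", "single-channel"),
    b"#!AMR-WB_MC1.0\n": ("audio/AMR-WB", "multi-channel"),
}


def identify_amr_header(data: bytes):
    """Return (media_type, layout) if data starts with a known magic number.

    Returns None when no magic number matches. Each magic number ends in a
    line feed, so none of them is a prefix of another and the lookup order
    does not matter.
    """
    for magic, description in MAGIC_NUMBERS.items():
        if data.startswith(magic):
            return description
    return None


if __name__ == "__main__":
    # The hexadecimal form given in the registration decodes to "#!AMR\n".
    header = bytes.fromhex("2321414d520a")
    print(identify_amr_header(header))
```

A caller would typically read the first 15 octets of a file (the length
of the longest magic number) and pass them to the function before
deciding how to parse the remainder.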
- The media type ("audio") goes in SDP "m=" as the media name.
8.3.3. Examples
Offer:
Answer:
Offer:
Answer:
m=audio 49120 RTP/AVP 97
a=rtpmap:97 AMR/8000/1
a=fmtp:97 mode-set=0,2,4,7; mode-change-period=2; \
mode-change-capability=2; mode-change-neighbor=1
a=maxptime:20
Note that the payload format (encoding) names are commonly shown in
upper case. MIME subtypes are commonly shown in lower case. These
names are case-insensitive in both places. Similarly, parameter
names are case-insensitive both in MIME types and in the default
mapping to the SDP a=fmtp attribute.
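To make the case-insensitivity rule concrete, the following minimal
Python sketch splits an a=fmtp line, such as the one in the example
above, into a payload type and a parameter dictionary whose keys are
folded to lower case. The function name parse_fmtp is hypothetical, and
the sketch assumes that a line continued with "\" has already been
joined into a single line; it is not a complete SDP parser.

```python
def parse_fmtp(line: str):
    """Split an 'a=fmtp:<payload type> <params>' line.

    Returns (payload_type, params), where params maps lower-cased
    parameter names to their string values. Parameters without a value
    map to an empty string.
    """
    prefix = "a=fmtp:"
    if not line.startswith(prefix):
        raise ValueError("not an a=fmtp line")
    payload_type, _, param_text = line[len(prefix):].partition(" ")
    params = {}
    for item in param_text.split(";"):
        item = item.strip()
        if not item:
            continue
        name, _, value = item.partition("=")
        # Parameter names are case-insensitive, so fold them to lower case.
        params[name.strip().lower()] = value.strip()
    return int(payload_type), params


if __name__ == "__main__":
    # The fmtp line from the example above, joined into one line.
    fmtp = ("a=fmtp:97 mode-set=0,2,4,7; mode-change-period=2; "
            "mode-change-capability=2; mode-change-neighbor=1")
    payload_type, params = parse_fmtp(fmtp)
    modes = [int(m) for m in params["mode-set"].split(",")]
    print(payload_type, modes, params["mode-change-period"])
```

Folding names at parse time means that later lookups such as
params["mode-set"] succeed regardless of how the sender capitalized the
parameter name.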
9. IANA Considerations
Two media types (audio/AMR and audio/AMR-WB) have been updated; see
Section 8.
The differences between RFC 3267 and this document are as follows:
- Added clarification of the behavior expected from an IP client
with regard to mode-change period and mode-change neighbor; see
Section 4.5.
- The reference list has been updated to cite RFCs that have since
been published: RFC 3448, RFC 3550, RFC 3551, RFC 3711, RFC 3828,
and RFC 4566. A reference to 3GPP TS 26.101 has also been added.
11. Acknowledgements
12. References
12.1. Normative References
[5] Bradner, S., "Key words for use in RFCs to Indicate Requirement
Levels", BCP 14, RFC 2119, March 1997.
[19] Larzon, L-A., Degermark, M., Pink, S., Jonsson, L-E., and G.
Fairhurst, "The Lightweight User Datagram Protocol (UDP-Lite)",
RFC 3828, July 2004.
[21] Handley, M., Floyd, S., Padhye, J., and J. Widmer, "TCP Friendly
Rate Control (TFRC): Protocol Specification", RFC 3448, January
2003.
[22] Li, A., et al., "An RTP Payload Format for Generic FEC with
Uneven Level Protection", Work in Progress.
[26] Baugher, M., McGrew, D., Naslund, M., Carrara, E., and K.
Norrman, "The Secure Real-time Transport Protocol (SRTP)", RFC
3711, March 2004.
[27] Perkins, C., Kouvelas, I., Hodson, O., Hardman, V., Handley, M.,
Bolot, J., Vega-Garcia, A., and S. Fosse-Parisis, "RTP Payload
for Redundant Audio Data", RFC 2198, September 1997.
[28] 3GPP TS 26.103, "Speech codec list for GSM and UMTS", version
5.5.0 (2004-09), 3rd Generation Partnership Project (3GPP).
[29] Schulzrinne, H., Rao, A., and R. Lanphier, "Real Time Streaming
Protocol (RTSP)", RFC 2326, April 1998.
[31] 3GPP TS 26.244, "3GPP file format (3GP)", version 6.1.0 (2004-
09), 3rd Generation Partnership Project (3GPP).
Authors' Addresses
Johan Sjoberg
Ericsson AB
SE-164 80 Stockholm, SWEDEN
Phone: +46 8 7190000
EMail: Johan.Sjoberg@ericsson.com
Magnus Westerlund
Ericsson Research
Ericsson AB
SE-164 80 Stockholm, SWEDEN
Ari Lakaniemi
Nokia Research Center
P.O.Box 407
FIN-00045 Nokia Group, FINLAND
Phone: +358-71-8008000
EMail: ari.lakaniemi@nokia.com
Qiaobing Xie
Motorola, Inc.
1501 W. Shure Drive, 2-B8
Arlington Heights, IL 60004, USA
Phone: +1-847-632-3028
EMail: Qiaobing.Xie@motorola.com
Intellectual Property
The IETF invites any interested party to bring to its attention any
copyrights, patents or patent applications, or other proprietary
rights that may cover technology that may be required to implement
this standard. Please address the information to the IETF at
ietf-ipr@ietf.org.
Acknowledgement