VoIP Book
The information contained in this document is the property of Spirent Communications, and is furnished for use by
recipient only for the purpose stated in the Software License Agreement accompanying the document. Except as
permitted by such License Agreement, no part of this publication may be reproduced, stored in a retrieval system, or
transmitted, in any form or by any means, without the prior written permission of Spirent Communications, Inc.
The information contained in this document is subject to change without notice and does not represent a commitment
on the part of Spirent Communications. The information in this document is believed to be accurate and reliable;
however, Spirent Communications assumes no responsibility or liability for any errors or inaccuracies that may
appear in the document.
All other trademarks and registered trademarks are the property of their respective owners.
Spirent Communications warrants to recipient that hardware which it supplies with this document (“Product”) will be
free from significant defects in materials and workmanship for a period of twelve (12) months from the date of
delivery (the “Warranty Period”), under normal use and conditions.
Defective Product under warranty shall be, at Spirent Communications’ discretion, repaired or replaced or a credit
issued to recipient’s account for an amount equal to the price paid for such Product provided that: (a) such Product is
returned to Spirent Communications after first obtaining a return authorization number and shipping instructions,
freight prepaid, to Spirent Communications’ location in the United States; (b) recipient provide a written explanation of
the defect claimed; and (c) the claimed defect actually exists and was not caused by neglect, accident, misuse, improper
installation, improper repair, fire, flood, lightning, power surges, earthquake or alteration. Spirent Communications
will ship repaired Product to recipient, freight prepaid, within ten (10) working days after receipt of defective Product.
Except as otherwise stated, any claim on account of defective materials or for any other cause whatsoever will
conclusively be deemed waived by recipient unless written notice thereof is given to Spirent Communications within
the Warranty Period. Product will be subject to Spirent Communications’ standard tolerances for variations.
Interest in Voice over IP (VoIP) has increased steadily over the past few years. Enterprises,
ISPs, ITSPs (Internet Telephony Service Providers), and carriers view VoIP as a viable
way to implement packet voice. Reasons for implementing VoIP typically include toll-
bypass, network consolidation, and service convergence. Toll-bypass allows long-distance
calls to be placed without incurring the usual toll charges. Through network consolidation,
voice, video, and data can be carried over a single network infrastructure, thereby
simplifying network management and reducing cost through the use of common
equipment. With service convergence, enhanced functionality can be implemented
through the coupling of multimedia services. This full integration permits new
applications, such as unified messaging and web call center.
However, designing a VoIP network requires careful planning to ensure that voice quality
can be properly maintained. This document examines the factors that affect voice quality
and the test and analysis strategy for a VoIP network.
VoIP Components
Figure 1 on page 2 shows the major components of a VoIP network. The gateway converts
signals from the traditional telephony interfaces (POTS, T1/E1, ISDN, E&M trunks) to
VoIP. An IP phone is a terminal that has native VoIP support and can connect directly to an
IP network. In this paper, the term terminal will be used to refer to either a gateway, an IP
phone, or a PC with a VoIP interface.1 The server provides management and
administrative functions to support the routing of calls across the network. In a system
based on H.323, the server is known as a gatekeeper. In SIP/SDP, the server is a SIP
server. In a system based on MGCP or MEGACO, the server is a call agent. Finally, the IP
network provides connectivity between all the terminals. The IP network can be a private
network, an Intranet, or the Internet.
1. This is consistent with the terminology used in H.323. However, this paper does not assume that
the VoIP system is necessarily based on H.323.
Once a call has been set up, speech will be digitized and then transmitted across the
network as IP frames. Voice samples are first encapsulated in RTP (Real-time Transport
Protocol) and UDP (User Datagram Protocol) before being transmitted in an IP frame.
Figure 2 shows an example of a VoIP frame in both LAN and WAN.
For example, if the CODEC used is G.711 and the packetization period is 20 ms, the
payload will be 160 bytes. This will result in a total frame length of 206 bytes in WAN and
218 bytes in LAN.
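The arithmetic behind these figures can be sketched as follows. This is a minimal illustration, assuming the standard 12-byte RTP, 8-byte UDP, and 20-byte IPv4 headers. The 6-byte PPP and 18-byte Ethernet overheads are assumptions chosen to match the totals quoted above, and the function names are hypothetical.

```python
# Header sizes in bytes. RTP, UDP, and IPv4 (no options) are standard sizes.
RTP_HEADER = 12
UDP_HEADER = 8
IP_HEADER = 20

# Link-layer overheads are assumptions that reproduce the totals in the text.
PPP_OVERHEAD = 6        # assumed PPP framing overhead on the WAN link
ETHERNET_OVERHEAD = 18  # 14-byte header + 4-byte FCS, preamble excluded


def payload_bytes(codec_bit_rate_bps, packetization_ms):
    """Voice payload accumulated during one packetization period."""
    return codec_bit_rate_bps * packetization_ms // 8000


def frame_length(payload, link_overhead):
    """Total frame length: payload plus RTP/UDP/IP and link-layer overhead."""
    return payload + RTP_HEADER + UDP_HEADER + IP_HEADER + link_overhead


g711_payload = payload_bytes(64_000, 20)               # G.711 at 64 kbps, 20 ms
wan_frame = frame_length(g711_payload, PPP_OVERHEAD)
lan_frame = frame_length(g711_payload, ETHERNET_OVERHEAD)
print(g711_payload, wan_frame, lan_frame)
```

Note that the 40 bytes of RTP/UDP/IP headers are fixed per frame, so shorter packetization periods or lower bit-rate coders carry proportionally more overhead.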
Voice Quality
In designing a VoIP network, it is important to consider all the factors that will affect voice
quality. A summary of the major factors follows.
Before analog voice can be transmitted over an IP network, it must first be digitized. The
common coding standards are listed in the following table:
G.726 ADPCM (Adaptive Differential Pulse Code Modulation) 16, 24, 32, 40
There is a general correlation between the voice quality and the data rate: the higher the
data rate, the higher the voice quality. The relationship between the two will be examined
in greater detail in “Mean Opinion Score (MOS)” on page 10.
Frame Loss
VoIP frames have to traverse an IP network, which is unreliable. Frames may be dropped
as a result of network congestion or data corruption. Furthermore, for real-time traffic like
voice, retransmission of lost frames at the transport layer is not practical because of the
additional delays. Hence, voice terminals have to deal with missing voice samples, also
referred to as frame erasures. The effect of frame loss on voice quality depends on how the
terminals handle frame erasures.
In the simplest case, the terminal leaves a gap in the voice stream if a voice sample is
missing. If too many frames are lost, the speech will sound choppy with syllables or words
missing. One possible recovery strategy is to replay the previous voice sample. This works
well if only a few samples are missing. To better cope with burst errors, interpolation is
usually used. Based on the previous voice samples, the decoder will predict what the
missing frames should be. This technique is known as Packet Loss Concealment (PLC).
For example, ITU-T G.711 Appendix I describes a PLC algorithm for PCM. A circular
history buffer consisting of 48.75 ms of the previous voice samples is kept. Once frame
erasure is detected, the contents of the history buffer will be used to estimate the current
pitch period. This information will then be used to generate a synthesized signal to fill in
the gap. With PLC in G.711, the audio output is delayed by an additional 3.75 ms to
provide a smooth transition between the real and synthesized signals. CELP-based speech
coders such as G.723.1, G.728, and G.729 also have PLC algorithms built into their
standards. In general, if the erasures are not too long, and the signal is not changing very
rapidly, the erasures may be inaudible after concealment.
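The simplest recovery strategy mentioned above, replaying the previous voice frame, can be sketched as follows. This is only an illustration of that replay strategy, not the pitch-based algorithm of G.711 Appendix I. The function name, the use of `None` to mark an erased frame, and the use of linear samples with zero as silence are assumptions made for the example.

```python
def conceal_by_replay(frames, frame_size=160):
    """Fill erased frames (None) by repeating the last good frame.

    frames: list of tuples of linear PCM samples, or None for an erasure.
    If no good frame has been received yet, silence (zeros) is used.
    """
    silence = (0,) * frame_size
    last_good = silence
    output = []
    for frame in frames:
        if frame is None:
            # Erasure: replay the most recent good frame.
            output.append(last_good)
        else:
            output.append(frame)
            last_good = frame
    return output


received = [(1,) * 160, None, (3,) * 160]
played = conceal_by_replay(received)
```

Replaying works for isolated erasures. For a burst of losses the same frame is repeated several times, which is why interpolation-based PLC is usually preferred.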
ITU-T G.113 Appendix I provides some provisional planning guidelines on the effect of
frame loss on voice quality. The impact is measured in terms of Ie, the equipment
impairment factor.2 This is a dimensionless number in which 0 means no impairment. The
larger the Ie factor, the more severe the impairment. The following table is derived from
G.113 Appendix I and it shows the impact of frame loss on the Ie factor.
When the frame loss rate is 2%, the equipment impairment is 35 for standard G.711.
However, with PLC, the equipment impairment is reduced to 7. Note that with low bit-rate
coders such as G.729A and G.723.1, there is equipment impairment of 11 and 15
respectively even with no frame loss. A 2% frame loss will increase the impairment to 19
and 24 respectively.
Codec Ie (0% loss) Ie (2% random frame loss) Ie (5% random frame loss)
G.729A 11 19 26*
* The values were for 4% random frame loss. The values for 5% were not provided in the Appendix.
† The values were for 4% random frame loss. The values for 5% were not provided in the Appendix.
2. The Ie factor is derived subjectively using a Mean Opinion Score (MOS). The methodology is
described in G.113 Annex E. The Ie factor can also be used in the E-Model, which is discussed
later in “Transmission Characteristics and the E-Model” on page 12.
Another important consideration in designing a VoIP network is the effect of delay.
Impairments caused by delays include echo and talker overlap. The effect of delay on
voice transmission is discussed in ITU-T G.114.
Sources of Delays
Before assessing the impact of delay, it is useful to first identify the sources of delays.
Algorithmic Delay. This is the delay introduced by the CODEC and is inherent in the
coding algorithm. The following table summarizes the algorithmic delay of common
coding standards.
Coding standard   Algorithmic delay (ms)
G.711 0.125*
G.726 1
G.728 3-5
G.729 15†
G.723.1 37.5‡
Packetization Delay. In RTP, voice samples are often accumulated before putting into a
frame for transmission to reduce the amount of overhead. RFC 1890 specifies that the
default packetization period should be 20 ms. For G.711, this means that 160 samples will
be accumulated and then transmitted in a single frame. On the other hand, G.723.1
generates a voice frame every 30 ms and each voice frame is usually transmitted as a
single RTP packet.
Serialization Delay. This is the time required to transmit the IP packet. For example, if
G.711 is used and the packetization period is 20 ms (i.e., there are 160 bytes in the RTP
payload), then the entire frame will be 206 bytes assuming PPP encapsulation. To transmit
the frame, it will require 1.1 ms on a T1 line, 3.2 ms at 512 kbps, and 25.8 ms at 64 kbps.
Furthermore, the serialization delay is incurred each time the frame passes through a
store-and-forward device such as a router or a switch. Thus, a frame that traverses 10 routers
will incur this delay 10 times.
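The serialization figures above follow directly from the frame length and the link rate. A minimal sketch, assuming a T1 rate of 1.544 Mbps and using a hypothetical function name:

```python
def serialization_delay_ms(frame_bytes, link_bps):
    """Time to clock one frame onto a link, in milliseconds."""
    return frame_bytes * 8 / link_bps * 1000.0


FRAME_BYTES = 206  # G.711, 20 ms packetization, PPP encapsulation

for name, rate_bps in [("T1", 1_544_000), ("512 kbps", 512_000), ("64 kbps", 64_000)]:
    print(f"{name}: {serialization_delay_ms(FRAME_BYTES, rate_bps):.2f} ms")

# The delay is paid once per store-and-forward hop. As an illustration,
# ten hops over 512 kbps links add ten times the single-hop delay.
ten_hops_ms = 10 * serialization_delay_ms(FRAME_BYTES, 512_000)
```

In a real path the links usually run at different speeds, so the per-hop delays would be summed link by link rather than multiplied.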
Propagation Delay. This is the time required for the electrical or optical signal to travel
along a transmission medium and is a function of the geographic distance. The
propagation delay in a cable is approximately 4 to 6 microseconds per kilometer. For
satellite transmission, the delay is 110 ms for a 14000-km altitude satellite and 260 ms for
a 36000-km altitude satellite.
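These figures can be checked with a short calculation. The sketch below assumes 5 microseconds per kilometer as a midpoint for cable and computes only a lower bound for the satellite case (straight up and down at the speed of light). The function names are hypothetical.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458


def cable_delay_ms(distance_km, us_per_km=5.0):
    """Propagation delay over cable; 5 us/km is an assumed midpoint of 4 to 6."""
    return distance_km * us_per_km / 1000.0


def satellite_min_delay_ms(altitude_km):
    """Lower bound for one satellite hop: up and down, directly overhead."""
    return 2 * altitude_km / SPEED_OF_LIGHT_KM_PER_S * 1000.0


print(cable_delay_ms(4000))             # a 4000 km cable route
print(satellite_min_delay_ms(14_000))   # medium-altitude satellite
print(satellite_min_delay_ms(36_000))   # geostationary-altitude satellite
```

The lower bounds come out somewhat below the 110 ms and 260 ms quoted above, which is expected because ground stations are rarely directly beneath the satellite and the real path is longer.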
Component Delay. These are delays caused by the various components within the
transmission system. For example, a frame passing through a router has to move from the
input port to the output port through the backplane. There is some minimum delay due to
the speed of the backplane and some variable delays due to queuing and router processing.
Echo Cancellation
The first impairment caused by delay is the effect of echo. Echo can arise in a voice
network due to poor coupling between the earpiece and the mouthpiece in the handset.
This is known as acoustic echo. It can also arise when part of the electrical energy is
reflected back to the speaker by the hybrid circuit3 in the PSTN (Public Switched
Telephone Network). This is known as hybrid echo.
When the one-way end-to-end delay is short, whatever echo is generated by the voice
circuit will come back to the speaker very quickly and will not be noticeable. In fact, the
guideline is that echo cancellation is not necessary if the one-way delay is less than 25 ms.
In other words, if the echo comes back within 50 ms, it will not be noticeable. However,
the one-way delay in a VoIP network will almost always exceed 25 ms. Therefore, echo
cancellation is always required.
Talker Overlap
Even with perfect echo cancellation, carrying on a two-way conversation becomes difficult
when the delay is too long because of talker overlap. This is the problem that occurs when
one party cuts off the other party’s speech because of the long delay. G.114 provides the
following guidelines regarding the one-way delay limit:
3. The hybrid circuit converts the two-wire PSTN circuit to a four-wire circuit.
In general, jitter will result in clumping and gaps in the incoming data stream. The general
strategy in dealing with jitter is to hold the incoming frames in a playout buffer long
enough to allow the slowest frames to arrive in time to be played in the correct sequence.
The larger the amount of jitter, the longer some of the frames will be held in the buffer,
which introduces additional delay.
To minimize the delay due to buffering, most implementations use an adaptive jitter buffer.
In other words, if the amount of jitter in the network is small, the buffer size will be small.
If the jitter increases due to increased network load, the buffer size will increase
automatically to compensate for it. Therefore, jitter in the network will impair voice
quality to the extent that it increases the end-to-end delay due to the playout buffer.
Sometimes when the jitter is too large, the playout buffer may choose to allow some frame
loss to keep the additional delay from getting too long.
Figure 3. Jitter
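The trade-off between playout delay and frame loss can be illustrated with a small sketch. It is not taken from any particular implementation: the function names, the sample delays, and the rule of sizing the buffer from recent delays with a tolerated late fraction are all assumptions made for the example.

```python
import math


def late_frames(network_delays_ms, playout_delay_ms):
    """Indexes of frames that arrive after their playout deadline."""
    return [i for i, d in enumerate(network_delays_ms) if d > playout_delay_ms]


def choose_playout_delay(recent_delays_ms, max_late_fraction):
    """Smallest playout delay that keeps late frames within the allowed fraction.

    Assumes recent_delays_ms is non-empty.
    """
    ordered = sorted(recent_delays_ms)
    allowed_late = math.floor(len(ordered) * max_late_fraction)
    allowed_late = min(allowed_late, len(ordered) - 1)
    return ordered[len(ordered) - 1 - allowed_late]


# One-way network delay per frame in ms; one frame is badly delayed.
delays = [40, 42, 45, 41, 120, 43, 44, 40, 41, 42]

no_loss_delay = choose_playout_delay(delays, 0.0)   # wait for the slowest frame
tolerant_delay = choose_playout_delay(delays, 0.1)  # accept 10% late frames
dropped = late_frames(delays, tolerant_delay)
```

With these sample values, waiting for every frame requires a much longer playout delay than tolerating a single late frame, which is the choice an adaptive jitter buffer makes when jitter becomes large.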
Delay Budget
Figure 4 shows an example of a VoIP network and the sources of delay. The following
delay budget can be constructed. Assume an end-to-end delay target of 150 ms.
However, this example assumes that you knew the exact topology of the network, and thus
were able to calculate all the delay components. In the next example (Figure 5 on page 9),
we assume that the voice gateways are connected via a VPN service offered by an ISP.
Assume an end-to-end delay target of 150 ms:
MOS is the most relevant test, because it is humans who use the voice network and it is
humans whose opinions count. However, a subjective test that involves human subjects
can be time-consuming to administer. Hence, there is a lot of interest in devising objective
tests that can be used to approximate human perception of voice quality.
Figure 6. PSQM
Using this method, if the input and output signals are identical, the PSQM score will be
zero. The bigger the differences, the higher the score will be up to a maximum of 6.5.
However, unlike more traditional measurements such as signal-to-noise ratio (SNR), the
emphasis of PSQM is on differences that will affect human perception of speech quality.
One criticism of PSQM is that it was originally designed to measure the quality of
coding standards. Therefore, it does not fully take into account the effect of various
transmission impairments. PSQM+ was proposed in December 1997 and accounts for:
• Different perceptions due to volume or loud distortions
• Speech that has dropouts
With PSQM+, the correlation between the objective score and MOS is improved.
4. Based on the transmission impairments, different transmission rating factors can be computed
depending on the coder used. Each coder will result in a different Ie factor and thus a different R
Testing VoIP
The previous section examined the various voice quality measures. In this section, we will
examine the different test configurations and the objectives of the tests.
IP Network Analysis
The main objective of this test is to measure the transmission characteristics of an IP
network to determine if it can support VoIP applications. It is also an important test to
measure the effectiveness of QoS mechanisms. Figure 7 shows the typical test configuration.
To test the effectiveness of QoS, you must be able to simulate a mixture of voice and data
traffic. By measuring the transmission characteristics of each flow, you can test whether
the IP network provides different treatment to voice and data traffic. Note that this test
does not involve the use of any VoIP equipment.
Using the handsets, a voice call can be placed and human subjects can be used to assess
the quality of the voice transmission.
Using a PSQM test system, the same test can be done objectively. This allows different
gateway configurations (changing the CODEC or turning voice activity detection on and
off) to be rapidly tested. The traffic generator can also be used in this configuration to
increase the load on the IP network, thereby testing the QoS of the network.
In addition to testing voice transmission, other types of information may also be
transmitted over the VoIP network. These include DTMF tones, fax, and modem. For
example, if you configure the gateway to use a low bit-rate coder such as G.729 with VAD,
a fax will not transmit properly. However, some gateways can detect the presence of the
fax tone and will switch over to G.711 and turn off VAD automatically. Other gateways
implement fax-relay that extracts the fax data and only transmits the data across the IP
network. These mechanisms need to be tested.
Another common test configuration involves the use of an impairment simulator. When
testing voice terminals, it is often desirable to test their performance under degraded
operating conditions. For example, you may want to determine what happens to voice quality if the
frame loss rate is 1%, 2%, 3%, and so on. To a certain extent, the insertion of data traffic into the
network to cause congestion can achieve that. However, it is difficult to adjust the data
traffic level to cause a frame loss rate of, say, exactly 3%. An impairment simulator can be
used to precisely create a set of degraded operating conditions. The resulting voice quality
can again be measured either subjectively by human listeners or objectively using a PSQM
test system. This test configuration is illustrated in Figure 9.
Case Studies
Quality-of-Service (QoS)
Whether VoIP is implemented using dedicated or shared facilities, QoS is an important
consideration. A QoS-enabled network will differentiate between different types of traffic
and offer different treatments. Using standards-based methods, this can be achieved using
either the TOS (Type Of Service) bits or the DiffServ (Differentiated Services) field in the
IP header, or through the use of signaling protocols such as RSVP (Resource reSerVation
Protocol) and MPLS (Multi-Protocol Label Switching). Routers and switches also support
prioritization based on physical port, protocol, IP addresses, transport addresses, or even
frame length.
In analyzing whether an IP network can support VoIP, the effectiveness of its QoS must be
evaluated. In particular, questions that are of interest include the following:
• How can the network differentiate VoIP traffic from other types of traffic?5
• In the event of network congestion, what is the frame loss rate of voice vis-à-vis data?
• What is the average delay experienced by voice frames?
• What is the average jitter?
• Can the network scale if there is a large number of flows?
The following series of tests was designed to investigate the behavior of common
prioritization schemes in routers. Figure 11 on page 19 shows the test configuration.
Using the SmartBits, a number of traffic flows were defined and injected into the IP
network. These traffic flows represented different types of traffic. They differed from one
another in terms of IP addresses, IP precedence, port numbers, size of packets, or a
combination of these factors. By observing the output of the IP network, we can determine
how the network treated the traffic differently. To see the effect of QoS, there must be
some resource contention. Therefore, the bandwidth of the serial link was configured to be
500 kbps. Because traffic arrived from a 10 Mbps Ethernet port, congestion started to
occur when the input load exceeded about 6%.6 This number was quite arbitrary. If the
WAN bandwidth was changed, the congestion point would also change.
5. The issue is that both data and voice appear as IP frames. Furthermore, RTP does not use well-
known port numbers.
6. The congestion point varies depending on the frame size. For example, using 200 byte frames
(including header and FCS), the maximum frame rate on 10 Mbps Ethernet is 5682 frames per
second. If the load is 6%, the frame rate will be 341 frames per second. Using PPP encapsulation
on the WAN link, the bit rate will be 513 kbps, which is just beyond the capacity of the link. The
congestion point will occur sooner if longer frames are used because the overhead will be less.
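The congestion-point arithmetic in footnote 6 can be reproduced with the following sketch. It assumes 20 bytes of preamble and inter-frame gap per Ethernet frame, 18 bytes of Ethernet header and FCS, and 6 bytes of PPP overhead. These values are assumptions chosen because they reproduce the figures in the footnote, and the function names are hypothetical.

```python
ETHERNET_BPS = 10_000_000
PREAMBLE_AND_GAP = 20     # assumed: 8-byte preamble + 12-byte inter-frame gap
ETHERNET_HEADER_FCS = 18  # assumed: 14-byte header + 4-byte FCS
PPP_OVERHEAD = 6          # assumed PPP framing overhead


def max_ethernet_fps(frame_bytes):
    """Maximum frames per second on 10 Mbps Ethernet for a given frame size."""
    return ETHERNET_BPS / ((frame_bytes + PREAMBLE_AND_GAP) * 8)


def wan_bit_rate(frame_bytes, load_fraction):
    """Bit rate offered to the PPP link at a given Ethernet load."""
    fps = max_ethernet_fps(frame_bytes) * load_fraction
    wan_frame_bytes = frame_bytes - ETHERNET_HEADER_FCS + PPP_OVERHEAD
    return fps * wan_frame_bytes * 8


print(round(max_ethernet_fps(200)))     # maximum frame rate for 200-byte frames
print(round(wan_bit_rate(200, 0.06)))   # offered WAN load at 6%, in bps
print(round(wan_bit_rate(1000, 0.06)))  # same Ethernet load, longer frames
```

The result for 200-byte frames differs slightly from the footnote because the footnote rounds the frame rate to 341 before multiplying. Longer frames offer more WAN traffic at the same Ethernet load, so the link congests sooner.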
Priority Queuing
In priority queuing, traffic is classified into separate queues – for example, high, medium,
normal, and low. The queues are serviced in the strict order of priority. In other words, the
high priority queue must be empty before the medium priority queue is serviced, and both
the high and medium priority queues must be empty before the normal queue is serviced
and so on. Figure 12 on page 20 shows the frame loss behavior of such a queuing strategy
as the input load is increased.
In this example, when the input load exceeded the WAN bandwidth (at around 6%),
frames were discarded. The low priority traffic was discarded until there was none left.
Then the normal priority traffic was discarded followed by medium priority traffic. The
results illustrate a potential problem with this strategy. If there is a high volume of high
priority traffic, then it will squeeze out all traffic of lesser priority.
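The discard behavior of strict priority queuing can be sketched with a simple rate-based model. This is an illustration only: it treats each class as a steady flow instead of simulating individual frames, and the function name and traffic rates are hypothetical.

```python
def strict_priority(offered_kbps, capacity_kbps):
    """Allocate link capacity in strict priority order.

    offered_kbps: list of (class_name, offered_rate) from highest to
    lowest priority. Returns {class_name: (sent_kbps, dropped_kbps)}.
    """
    remaining = capacity_kbps
    result = {}
    for name, rate in offered_kbps:
        sent = min(rate, remaining)   # a class gets only what higher classes left
        result[name] = (sent, rate - sent)
        remaining -= sent
    return result


# Four classes offering 200 kbps each to a 500 kbps link.
moderate = strict_priority(
    [("high", 200), ("medium", 200), ("normal", 200), ("low", 200)], 500)

# High-priority traffic alone exceeds the link rate and starves the rest.
starved = strict_priority(
    [("high", 600), ("medium", 200), ("normal", 200), ("low", 200)], 500)
```

The second case shows the starvation problem described above: once the high-priority class fills the link, every lower class is discarded entirely.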
The test was then repeated. This time, three traffic flows were configured – one for FTP,
one for HTTP, and one for VoIP. Prioritization was done based on frame length. The VoIP
traffic was given high priority whereas FTP and HTTP were given low priority. The
following table shows the results of the test.
Congestion started at 6% and 8%, at which point data traffic was discarded while voice
traffic was preserved. At the same time, voice traffic also experienced much lower delay
compared with data.
flows. Higher priority will typically be given to traffic flows with lower volume (for
example, interactive traffic), while traffic flows with higher volume (for example, bulk
data transfer) will be given lower priority.
The results show that WFQ is equally effective in providing better service to voice over
data while providing the additional benefits of fairer sharing of bandwidth.
IP RTP Priority
One of the problems with flow-based WFQ is scalability. The queuing strategy tends to
break down when there are a large number of flows. This is evident from the following
test. Figure 13 on page 23 shows the results of WFQ with a total of 8 flows. The test was
then repeated with 50 flows per IP precedence instead of one, which resulted in a
total of 400 flows. In theory, the bandwidth allocation should stay the same, i.e., 1/36 for
IP precedence 0, 2/36 for IP precedence 1, and so on. Figure 14 on page 24 shows the test
results. Clearly the scheduling algorithm broke down when there was a large number of flows.
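The theoretical allocation quoted above corresponds to weighting each IP precedence p by p + 1, so that the eight precedence levels share the link in the ratio 1:2:...:8 (a total of 36). A minimal sketch of that calculation, with a hypothetical function name:

```python
from fractions import Fraction


def wfq_shares(precedences=range(8)):
    """Theoretical bandwidth share per IP precedence, weighted by p + 1."""
    levels = list(precedences)
    total = sum(p + 1 for p in levels)
    return {p: Fraction(p + 1, total) for p in levels}


shares = wfq_shares()
LINK_KBPS = 500  # serial link rate used in the tests

for p, share in shares.items():
    print(f"precedence {p}: {share} of the link, {float(share) * LINK_KBPS:.1f} kbps")

# With 50 flows per precedence, each flow should get 1/50 of its class share.
per_flow_share_p0 = shares[0] / 50
```

The test results show the behavior diverging from these shares when the number of flows is large.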
If there is a large number of VoIP flows, IP RTP priority can be used instead. It combines
priority queuing with WFQ. VoIP traffic (as a group as opposed to per flow) can be treated
with higher priority. This can be done by specifying a UDP port range. All other traffic
will be scheduled using WFQ. Figure 15 on page 25 shows the test results.
In this case, absolute priority was given to VoIP. However, to prevent VoIP traffic from
monopolizing the bandwidth, an upper limit was placed on the bandwidth consumption
(although the bandwidth cap was not reached in the test).
IP Wave was configured with the following frame loss rates: 0%, 1%, and 3%. G.711 and
G.729 were used in the gateway to see how each coding algorithm could cope with the
frame loss. The following table summarizes the results.
1% 0 0.4 2.4
3% 0 1.0 2.9
With 0% frame loss, G.711 had a PSQM score of 0, which represented little or no signal
distortion. With G.729, since the bit rate is 8 kbps, it introduced some signal distortion
even without frame loss. However, the impact of frame loss on both coding algorithms
seemed comparable. A 1% frame loss caused a PSQM increase of 0.4 for G.711 and 0.5
for G.729. Similarly, a 3% frame loss caused PSQM increases of 1.0 and 1.1 respectively
for G.711 and G.729.
Figure 17 shows the PSQM values for G.711 with a 3% frame loss.
The same tests were then repeated. This time, there was no frame loss but various delay
impairments were introduced. First, a 50 ms delay was introduced. Then a 50 ms jitter was
added and finally a 100 ms jitter was used. The following is a summary of the results:
These test results show that a 50 ms delay and a 50 ms jitter had minimal impact on the
PSQM score.7 The 100 ms jitter caused an increase of 0.3 and 0.5 on G.711 and G.729
respectively. This could be interpreted to indicate that the routers chose to allow some
frame loss rather than introduce too much delay by fully compensating for the jitter.
VoIP is an IP application that has stringent performance requirements. The performance of
the IP network has a direct impact on voice quality. This document identifies the
transmission impairment factors that should be measured. These include frame loss rate,
delay, and jitter. In particular, QoS is an important component of the IP network. When
there is resource contention, such as network congestion, it is important for the network to
provide better service to real-time traffic such as VoIP at the expense of data traffic. This
document also examines various quality measures including MOS, PSQM, PAMS, PESQ,
and the E-Model. All of these measures are useful and can be used in combination.
When testing VoIP, there are different tests that should be performed. These include IP
network analysis, end-to-end voice testing, DTMF, fax, and modem testing, impairment
simulation, and signaling stress testing. This document discusses the various test
7. Standard PSQM scoring does not include the effect of delay. However, as previously mentioned,
voice quality will degrade if the one-way delay exceeds 150 ms due to talker overlap.
Adaptive Differential Pulse Code Modulation (ADPCM)
Process by which analog voice samples are encoded into high-quality digital signals.
ADPCM
See Adaptive Differential Pulse Code Modulation.
AIN
See Advanced Intelligent Network.
ANI
Automatic Number Identification. Also known as Caller ID. The calling number (number of
calling party).
ARQ
Admission request.
Backbone Network
Core high bandwidth links concentrating traffic from access links.
CCS
See Common Channel Signaling.
CDR
See Call Detail Record.
CELP
See Code-Excited Linear Predictive Coding.
Circuit-Switched Network
A network where a dedicated physical circuit is established, maintained, and terminated for
each communication session. A traditional method for connecting telephone calls.
CLEC
See Competitive Local Exchange Carrier.
Coder/Decoder (CODEC)
Hardware or software that converts analogue signals (e.g., video, audio) to or from digital
form for transmission or storage. May perform compression or other optimizations such as
silence suppression.
Compression
Reducing the size of a data set to lower the bandwidth or space required for transmission
or storage.
Connectivity
The ability to connect physically and logically between devices to exchange data.
Combining multiple small frames for transmission to reduce overhead due to lower
communication layers.
CPE
See Customer Premises Equipment.
CPL
Call Processing Language.
CRTP
Compressed Real-time Transport Protocol. See RTP.
CS-ACELP
See Conjugate Structure Algebraic Code Excited Linear Prediction.
CSM
Call switching module.
CTI
See Computer Telephony Integration.
Dedicated Circuit
A transmission circuit leased by one customer for exclusive use all the time. Also called a
private line or leased line.
Delay
In the context of telephony or circuit switching, the amount of time a call spends waiting to
be processed. In the context of network transfers, the time to traverse a network or network
segment. Differential delay is the difference in transit time between data packets taking
separate transmission paths.
Dial Peer
An addressable call endpoint. In VoIP, there are two types of dial peers: POTS and VoIP.
DNIS
Dialed number identification service. The called number.
DLC
Digital Loop Carrier. A PSTN distribution system with fiber links from the carrier office to
a distribution node from which conventional analogue phone loops emanate to individual
DNS
See Domain Name System.
DS-0
Digital Signal 0. North American Digital Hierarchy signaling standard for transmission at
64 kbps. Also the worldwide standard transmission rate (64 kbps) for PCM digitized voice
DS-1
Digital Signal 1. North American Digital Hierarchy signaling standard for transmissions at
1.544 Mbps. Supports 24 simultaneous DS-0 signals. The term is often used
interchangeably with T-1, although DS-1 signals may be exchanged over other
transmission systems.
DS-3
Digital Signal 3. North American Digital Hierarchy signaling standard for transmissions at
44.736 Mbps.
DSP
See Digital Signal Processor.
DTMF
See Dual Tone Multi-Frequency.
E1
European equivalent of T1 but operating at 2.048 Mbps.
E.164
The international public telecommunications numbering plan. A standard set by ITU-T that
addresses telephone numbers.
Endpoint
An H.323 terminal or gateway. An endpoint can call and be called. It generates and/or
terminates the information stream.
Echo Cancellation
When transmitting a signal, some of the energy may be reflected back to the transmitter.
For some types of full duplex communication, this will interfere with a real signal being
sent to the transmitter. A full duplex device can eliminate some of this noise in a received
signal by applying a correction signal derived from its transmitted signal.
ETSI
See European Telecommunications Standards Institute.
FRF.11
Frame Relay Forum implementation agreement for Voice over Frame Relay (v1.0 May
1997). This specification defines multiplexed data, voice, fax, DTMF digit-relay, and CAS/
Robbed-bit signaling frame formats, but does not include call setup, routing, or
administration facilities.
FRF.12
The FRF.12 Implementation Agreement (also known as FRF.11 Annex C) was developed
to allow long data frames to be fragmented into smaller pieces and interleaved with
real-time frames. In this way, real-time voice and non real-time data frames can be carried
together on lower speed links without causing excessive delay to the real-time traffic.
Frame
A collection of data sent as a unit. Normally used in the context of layer two of the OSI
protocol stack.
G.711
Audio codec over 48, 56, and 64 kbps PCM half-duplex channels (normal telephony).
Encoded voice is already in the correct format for digital voice delivery in the PSTN or
through PBXs; also referred to as “clear channel” coding. Characteristics: high quality,
high bandwidth, and minimum processor load.
G.722
Audio codec over 48, 56, and 64 kbps channels.
G.723, G.723.1
Audio codec over 5.3 and 6.3 kbps channels. Selected by the VoIP Forum for use with VoIP.
Based on CELP. Characteristics: low quality, low bandwidth, and high processor load due
to the compression.
G.726
40/32/24/16 kbps ADPCM codec. Characteristics: good quality, medium bandwidth, and
low processor load due to minimal compression.
G.728
Audio codec over 16 kbps channels using LD-CELP. Characteristics: medium quality,
medium bandwidth, and very high processor load due to greater compression.
G.729, G.729a
Audio codec over 8 kbps channels using CELP. Adopted by the Frame Relay Forum for
voice over Frame Relay. Characteristics: medium quality, low bandwidth, and high
processor load.
Gatekeeper
An H.323 entity on the LAN that maintains a registry of devices (e.g., H.323 terminals,
gateways, and MCUs) to provide address translation services. The devices register with the
gatekeeper at startup and request admission to a call from the gatekeeper.
Gateway
In general, a gateway translates between similar services using different protocols to
support inter-operation. In the VoIP context, a gateway allows H.323 terminals to
communicate with non-H.323 terminals. Different types of gateways may be involved. A
Signaling Gateway may be needed to convert IP signaling protocol (H.323, SIP) to PSTN
signaling protocols (SS7, ISDN-D). A Media Gateway may be needed to convert IP media
protocols (H.323, RTP) to PSTN media protocols (ISDN-B, DS0, DS1). Other types of
gateways may be needed to connect to cellular phone networks, etc.
Call Control. An ITU standard that governs H.323 session establishment and packetization.
H.225 describes several different protocols: RAS, use of Q.931, use of RTP, and message
H.225.0 RAS
Registration, admission, and status. The RAS signaling function performs registration,
admissions, bandwidth changes, status, and disengagement procedures between the VoIP
gateway and the gatekeeper.
An ITU standard that governs H.323 endpoint control, including the opening and closing
of channels for media streams, capability negotiation, and more.
See MeGaCo and MGCP.
Video codec for audio visual services at multiples of 64 kbps.
Specifies a codec for video over the PSTN.
An ITU-T standard that describes packet-based video, audio, and data conferencing over
unreliable networks (i.e., QoS is not guaranteed). H.323 is an umbrella standard that
describes the architecture of the conferencing system and refers to a set of other standards
(H.245, H.225.0, and Q.931) to describe its actual protocol. H.323 is an extension of ITU
standard H.320, which is geared to ISDN. Related standards are:
• H.332: Conferences
• H.235: Security — authentication, encryption, integrity, non-repudiation
• H.246: Interface with PSTN
• H.450.1, .2, .3: Supplementary services, call transfer, call diversion
• H.261, H.263, ...: Video
• H.320, H.321, H.324: ISDN, PRI/ATM, PSTN analogue
H.323 Terminal
Network nodes that provide real-time, two-way communications with another H.323
terminal (e.g., computer-based video conferencing systems).
IA 1.0
VoIP Forum Implementation Agreement 1.0 selecting protocol options for interoperable
See Institute of Electrical and Electronics Engineers.
See Internet Engineering Task Force.
See Internet Group Management Protocol.
(Note the capital “I.”) The largest internet in the world consisting of large national
backbone nets (such as MILNET, NSFNET, and CREN) and a myriad of regional and local
campus networks all over the world. The Internet uses the Internet protocol suite. To be on
the Internet, you must have IP connectivity (i.e., be able to Telnet to or ping other systems).
Networks with only e-mail connectivity are not actually classified as being on the Internet.
Internet Telephony
A generic term used to describe various approaches to running voice telephony over IP.
A collection of networks interconnected by routers that function (generally) as a single
network. Sometimes called an internet, which is not to be confused with the Internet.
A private network inside a company or organization that uses the same kinds of software
that you would find on the public Internet, but that is only for internal use. As the Internet
has become more popular, many of the tools used on the Internet are being used in private
networks. For example, many companies have Web servers that are available only to
IP Device Control (family of protocols, IETF work in progress, see also MGCP).
See International Telecommunication Union.
See Interactive Voice Response.
See Interexchange Carrier.
The delay between the time when a device receives a frame and when the frame is
forwarded out of the destination port.
Lightweight Directory Access Protocol. An Internet standard for accessing Internet
directory services.
See Low-delay CELP.
See Local Exchange Carrier.
Lifeline POTS
A minimal telephone service designed to extend a “lifeline” to the telephone system in case
of emergency, particularly when electric power is lost.
See Logical Link Control (LLC).
Local Number Portability.
Twisted-pair copper telephone line connecting from the PSTN to a client’s premises. Loops
may differ in distance, diameter, age, and transmission characteristics depending on the
See Multipoint Control Unit.
See Media Gateway Control Protocol.
See Management Information Base.
See Mean Opinion Score.
A process of transmitting PDUs from one source to many destinations. The actual
mechanism for this process might be different for different LAN and WAN technologies.
A process of transferring PDUs (Protocol Data Units) where an endpoint sends more than
one copy of a media stream to different endpoints. This might be necessary in networks that
do not support multicast.
Network Address Translation.
An H.323 entity that uses RAS to communicate with the gatekeeper. For example, an
endpoint may be a terminal, proxy, or a gateway.
See Network Time Protocol.
The active condition of a Switched Access or Telephone Exchange Service line.
See Overhead.
The idle condition of a Switched Access or Telephone Exchange Service line.
(OH) Bits in a frame or cell that are required for framing, CRC, routing, etc.
ITU-T Recommendation (1993), Artificial voices.
ITU-T Recommendation (1993), Objective measurement of active speech level.
ITU-T Recommendation (1996), Test signals for use in telephonometry.
ITU-T Recommendation (1996), In-service, non-intrusive measurement devices for voice
service measurement.
ITU-T Recommendation (1996), Methods for the subjective determination of transmission
ITU-T Recommendation (1996), Subjective performance assessment of telephone-band
and wideband digital codecs.
ITU-T Recommendation (1996), Objective quality measurement of telephone-band (300 -
3400 Hz) speech codecs.
A collection of data sent as a unit. Normally used in the context of layer three of the OSI
protocol stack. A packet may be fragmented and sent in multiple frames as required by the
underlying layer two facilities.
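The relationship between a packet and the frames that carry it can be sketched in a few lines. This is a simplified illustration only: it splits the packet bytes into frame-sized pieces and ignores the headers and offset fields that a real fragmentation scheme adds. The function name and the 1400-byte frame payload limit are assumptions made for the example.

```python
from typing import List


def fragment(packet: bytes, max_frame_payload: int) -> List[bytes]:
    """Split one packet into pieces that each fit in a single frame."""
    if max_frame_payload <= 0:
        raise ValueError("max_frame_payload must be positive")
    return [packet[i:i + max_frame_payload]
            for i in range(0, len(packet), max_frame_payload)]


# Assumed sizes: a 3000-byte packet over frames carrying at most 1400 bytes.
packet = bytes(3000)
frames = fragment(packet, 1400)
print([len(f) for f in frames])      # [1400, 1400, 200]

# The receiver recovers the packet by concatenating the frames in order.
assert b"".join(frames) == packet
```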
Packet Switching
A WAN switching method in which network devices share a single point-to-point link to
transport packets from a source to a destination across a carrier network.
An MCNS and Cable Labs initiative principally intended to carry packetized voice and fax
over DOCSIS™-capable cable systems. Services include voice mail, call placement, call
management, PSTN interfaces (SS7), and other functions common to traditional voice
See Private Branch Exchange.
See Pulse Code Modulation.
See Protocol Data Unit.
See Payload Header Suppression.
In the general sense, a proxy is an agent that performs operations on behalf of another
entity. In the context of VoIP, proxies are special gateways that relay one H.323 session to
See Public Switched Telephone Network.
An ITU standard that describes ISDN call signaling and setup. The H.225.0 standard uses
a variant of Q.931 encapsulated within TCP to establish and disconnect H.323 sessions.
See Quality of Service.
A signaling system between a PBX and CO, or between PBXs used to support enhanced
features such as forwarding and follow me.
RAS
Registration, Admission, and Status. A specification within H.323 that allows for session
authentication and authorization. This is what validates the call. See H.225.0 RAS.
Registration request.
See Resource Reservation Protocol.
See RTP Control Protocol.
See Real-Time Transport Protocol.
Real-Time Streaming Protocol. A protocol used to interface to a server that will provide
real-time data.
Session Announcement Protocol. A protocol used by multicast session managers to
distribute a multicast session description to a large group of recipients.
See Session Description Protocol.
Simple Gateway Control Protocol. A simple UDP-based protocol for managing endpoints
and connections between endpoints.
See Session Initiation Protocol.
Signaling System 7. A standard CCS system used with BISDN and ISDN that was
developed by Bellcore. SS7 is a packet signaling network that runs parallel to the circuit-
switched network that transports actual voice traffic. It is used for control / management—
setup, teardown, control calls, etc. Control traffic is out-of-band (i.e., on separate lines, but
within the same devices, which makes it less prone to problems of congestion since it
avoids heavy traffic).
One implementation of DS-1 services utilizing 4 wires and Bipolar Alternate Mark
Inversion (AMI) encoding. Requires repeaters every 6,000 ft.
T.120
An ITU standard that describes data conferencing. H.323 enables the establishment of
T.120 data sessions inside of an existing H.323 session.
Internet standard Transport Layer protocols.
See Type of Service.
See Voice Activity Detection.
See Voice over IP.
Voice Band
The frequency range from 0 to 4 kHz used for analogue (voice, fax, data) signals in
conventional POTS.
See Virtual Private Network.
Voice telephony service provider.
Weighted Early Packet Discard.
Weighted fair queuing. Congestion management algorithm that identifies conversations (in
the form of traffic streams), separates packets that belong to each conversation, and ensures
that capacity is shared fairly between these individual conversations. WFQ is an automatic
way of stabilizing network behavior during congestion and results in increased
performance and reduced retransmission.
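The idea behind weighted fair queuing can be sketched as follows. This is a deliberately simplified illustration, not the algorithm as implemented in any particular router: it assumes every packet is already queued at time zero, so each packet's finish tag is simply the running total of size divided by weight within its own conversation, and packets are sent in order of increasing finish tag. A full implementation also tracks a virtual clock as packets arrive over time. The function name, flow names, packet sizes, and weights are all assumptions made for the example.

```python
import heapq
from collections import defaultdict
from typing import Dict, List, Tuple


def wfq_order(packets: List[Tuple[str, int]],
              weights: Dict[str, float]) -> List[Tuple[str, int]]:
    """Return (flow, size) packets in the order a fair scheduler sends them.

    Simplification: all packets are assumed to be queued at time zero.
    """
    last_finish = defaultdict(float)   # finish tag of each flow's previous packet
    heap = []
    for seq, (flow, size) in enumerate(packets):
        finish = last_finish[flow] + size / weights[flow]
        last_finish[flow] = finish
        # seq breaks ties so equal finish tags keep arrival order.
        heapq.heappush(heap, (finish, seq, flow, size))

    order = []
    while heap:
        _, _, flow, size = heapq.heappop(heap)
        order.append((flow, size))
    return order


# Assumed traffic: two large data packets queued ahead of three small voice packets.
queued = [("data", 1500), ("data", 1500),
          ("voice", 60), ("voice", 60), ("voice", 60)]
print(wfq_order(queued, {"data": 1.0, "voice": 1.0}))
```

Even with equal weights, the small voice packets receive lower finish tags than the large data packets and are sent first, which is how the scheme keeps a low-volume conversation from being starved by a high-volume one.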
Weighted Random Early Detection.
A collection of all terminals, gateways, and Multipoint Control Units managed by a single
gatekeeper. A zone includes at least one terminal and may or may not include gateways or
MCUs. A zone has only one gatekeeper. A zone may be independent of LAN topology and
may be comprised of multiple LAN segments that are connected using routers or other
ITU-T Recommendation G.113 (2/1996), Transmission impairments.
ITU-T Recommendation G.113 Appendix I (9/1999), Provisional planning values for the
equipment impairment factor Ie.
ITU-T Recommendation G.107 (5/2000), The E-Model, a computational model for use in
transmission planning.
ITU-T Recommendation G.114 (2/1996), One-way transmission time.
ITU-T Recommendation G.168 (4/1997), Digital network echo cancellers.
CCITT Recommendation G.711 (1988), Pulse Code Modulation (PCM) of voice
ITU-T Recommendation G.711 Appendix I (9/1999), Appendix I: A high quality low-
complexity algorithm for packet loss concealment with G.711.
ITU-T Recommendation G.723.1 (1996), Speech coders: Dual rate speech coder for
multimedia communications transmitting at 5.3 and 6.3 kbps.
CCITT Recommendation G.726 (1990), 40, 32, 24, 16 kbps Adaptive Differential Pulse
Code Modulation (ADPCM).
CCITT Recommendation G.728 (1992), Coding of speech at 16 kbps using Low Delay
Code Excited Linear Prediction (LD-CELP).
ITU-T Recommendation G.729 (3/1996), Coding of speech at 8 kbps using Conjugate-
Structure Algebraic Code Excited Linear Prediction (CS-ACELP).
ITU-T Recommendation H.323 (9/1999), Packet-based multimedia communications
ITU-T Recommendation P.800 (8/1996), Methods for subjective determination of
transmission quality.
ITU-T Recommendation P.861 (2/1998), Objective quality measurement of telephone-
band (300-3400 Hz) speech codecs.
IETF RFC 1889 (1/1996), RTP: A Transport Protocol for Real-Time Applications.
IETF RFC 1890 (1/1996), RTP Profile for Audio and Video Conferences with Minimal
IETF RFC 2327 (4/1998), SDP: Session Description Protocol.
IETF RFC 2543 (3/1999), SIP: Session Initiation Protocol.
IETF RFC 2705 (10/1999), Media Gateway Control Protocol (MGCP) Version 1.0.
The author would like to thank Spirent Communications for their support during testing,
in particular, Mr. Iain Milnes and Mr. Andrez Chavez for running the PSQM tests.