Real-Time Voice Over Packet-Switched Networks

Thomas J. Kostas, Michael S. Borella, Ikhlaq Sidhu, Guido M. Schuster,
Jacek Grabiec, and Jerry Mahler
3COM
Abstract
We discuss the architecture and technical viability of transporting real-time voice over packet-switched networks such as the Internet. The value of integrating voice and data networks onto a common platform is well known. The telephony industry has proposed the ATM standard as a means of upgrading the Internet to provide both real-time and data services. In contrast, voice services may be added to traditional IP networks that were originally designed for data transmission alone. Here, we consider the feasibility and expected quality of service of audio applications over IP networks such as the Internet. In particular, we examine possible architectures for voice over IP and discuss measured Internet delay and loss characteristics.
The concept of an integrated services network with both real-time and data services is not new. In fact, two alternate schemes are currently contending to provide all the services seen in Fig. 1. From one point of view, a new backbone from the telephony world, built on interexchange carrier (IXC)/asynchronous transfer mode (ATM) or frame relay (FR) technology, would provide all the required quality of service (QoS) levels for integrated services. From the other point of view, the existing Internet and corporate intranets would carry real-time voice traffic in addition to data.

Traditionally, the networking world has been divided along such lines. There has long been a telecommunications network that is circuit-switched and designed for point-to-point communication of real-time audio. It was subsequently adapted to the growing needs of data communication via modem technologies, ISDN, digital carriers, and, most recently, integrated services ATM and FR backbones. In contrast, there is also a data networking world of store-and-forward packet technologies created primarily for data transport over local and wide areas. These networks are the vast collection of small and large IP networks intertwined in the form of the Internet and many partitioned intranets. Data networks comprise links, routers, bridges, and switches in the form of local and wide area networks.

Although these two networking worlds are coming together in a model of shared data, voice, and video, proponents of each view see the future as an extension of their own technology. The telecommunications world has envisioned an integrated network built on a large-scale ATM backbone that supports many levels of QoS, including traditional n x 64 kb/s voice. Because the telecommunications world has always been very QoS-focused, ATM is provisioned with mechanisms to provide different QoS levels. From the IP community, the long-term view is that real-time voice and video services can be multiplexed with existing data traffic. However, QoS has not been considered with the same intensity: the current Internet service model is flat, offering a classless, best-effort delivery service. As such, QoS is an ad hoc extension to the IP infrastructure. The next generation of IP, version 6, includes support for “flows” of packets between one or more hosts [1]. In conjunction with a hop-by-hop resource reservation protocol such as RSVP [2], end-to-end capacity can be set aside for real-time traffic. Much can be said of the trade-offs between IP and ATM solutions for providing integrated services. However, in this article we focus on the technical issues involved in supporting audio in the current Internet.

The dominant standard for transmitting multimedia in packet-switched networks is International Telecommunication Union (ITU) Recommendation H.323 [3, 4], which uses IP/UDP/RTP encapsulation for audio. RTP, the Real-time Transport Protocol [5], is a generic mechanism for supporting the integration of voice, video, and data. RTP headers provide the sequence number and timestamp information needed to reassemble a real-time stream from packets. H.323 does not provide any QoS guarantees, but it does specify that a reliable transport protocol, such as TCP, be used for transmitting control information. The Voice Over IP (VoIP) standards committee is proposing a subset of H.323 for audio over IP [6]. A number of vendors have developed or are developing Internet telephony products that conform to this standard.

This article is organized as follows. The first section provides an overview of Internet telephony architecture and discusses the issues that must be addressed for an implementation to succeed. The second section discusses the factors affecting QoS, including the trade-offs between various end-user technologies and between different coding and error recovery schemes; it also examines an early implementation of Internet telephony. The third section provides an overview of our extensive Internet delay and loss measurements and examines their implications for VoIP deployment.

■ Figure 1. Integrated services architecture. (Diagram labels: Telnet / FTP / Web; PC; Telephony; Central office/ISP; Internet; ATM or frame relay; heterogeneous mix of LAN data and real-time services.)

18 0890-8044/98/$10.00 © 1998 IEEE IEEE Network • January/February 1998
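As a concrete illustration of the RTP sequence number and timestamp fields mentioned in the introduction, the sketch below packs and parses the 12-byte fixed RTP header with Python’s struct module. The field layout follows the RTP specification (RFC 1889, later revised as RFC 3550), and the payload type numbers 4 (G.723.1) and 18 (G.729) are the static assignments listed in RFC 3551. The 8 kHz media clock and the specific sequence, timestamp, and SSRC values are illustrative assumptions.

```python
import struct

RTP_VERSION = 2
PT_G723 = 4    # static payload type for G.723.1 in the RTP audio/video profile
PT_G729 = 18   # static payload type for G.729


def pack_rtp_header(seq: int, timestamp: int, ssrc: int,
                    payload_type: int, marker: bool = False) -> bytes:
    """Build the 12-byte fixed RTP header (no padding, extension, or CSRCs)."""
    byte0 = RTP_VERSION << 6                      # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    """Recover the fields a receiver uses to reorder and time the stream."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "seq": seq,
            "timestamp": timestamp, "ssrc": ssrc}


# Two consecutive G.723.1 packets: each 30 ms frame advances an 8 kHz
# media clock by 240 ticks, while the sequence number advances by 1.
h1 = pack_rtp_header(seq=1000, timestamp=0, ssrc=0x1234, payload_type=PT_G723)
h2 = pack_rtp_header(seq=1001, timestamp=240, ssrc=0x1234, payload_type=PT_G723)
print(len(h1), unpack_rtp_header(h2)["timestamp"])   # 12 240
```

The sequence number lets the receiver detect loss and reordering, while the timestamp tells it when each frame should be played relative to the others, independently of when the packet happened to arrive.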
Internet Telephony Architecture

Description
The current model of Internet telephony, shown in Fig. 2, assumes that two (or more) users have access to multimedia computers connected to the Internet. These computers can be on a LAN, as is the case for many corporate computers, or connected via telephone lines to Internet service providers (ISPs), as is the case for most home computers. All sampling, compression, and packetization of the voice signal occurs in codec hardware and software on the sender’s PC, while playout of the received signal occurs through a sound card on the receiver’s PC. Alternatively, the codec could be implemented in hardware, possibly as part of a modem, network interface card, or sound board. A user places a call by specifying the IP address of the recipient, or by looking up the recipient’s name in a public directory.1

In contrast to the PC-to-PC architecture, the gateway architecture of Fig. 3 lets us use a standard telephone to place and receive a phone call over the Internet. A home or office user calls an Internet telephony gateway located near a central office switch or local hub. Based on caller ID, the user is recognized (for authentication and billing purposes) and asked to enter the phone number of the intended recipient. The sender’s gateway then looks up the IP address of a gateway that is as close as possible to the recipient and initiates an H.323 session to it. The recipient’s gateway places a call to the recipient’s phone, after which end-to-end communication can proceed, with voice sent in IP packets between the two gateways. Encoding and packetization occur in the sender’s gateway, while decoding and reassembly occur in the recipient’s gateway. The central office or hub may digitize the voice before passing it to the gateway, or a central office bypass may pass the analog signal to the gateway for digitization. In each user’s local loop, the signal is analog. The gateways are implemented in hardware, using digital signal processors (DSPs) and ASICs, and are designed for low latency and to handle many simultaneous calls.

Naturally, hybrid schemes will arise in which a gateway user places a call to or receives a call from a PC user. In such a situation, there must be a mapping or translation service between IP addresses and phone numbers. There are four different types of unidirectional paths: PC to PC (PC-PC), gateway to gateway (GW-GW), PC to gateway (PC-GW), and gateway to PC (GW-PC). In all of these architectures, both endpoints must employ the same voice codec.
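The gateway call setup above depends on finding a gateway “as close as possible to the recipient,” but the article does not say how that lookup works (it later notes that no standard mapping technique has emerged). The following is therefore only a hypothetical sketch: a longest-prefix match of the dialed digits against a small gateway table. The table contents, the name closest_gateway, and the assumption that closeness can be expressed by number prefix are all illustrative.

```python
from typing import Dict, Optional

# Hypothetical directory: dialed-number prefix -> IP address of a gateway.
# The addresses are from the 192.0.2.0/24 documentation block.
GATEWAY_DIRECTORY: Dict[str, str] = {
    "1312": "192.0.2.10",
    "1530": "192.0.2.20",
    "1": "192.0.2.1",            # fallback for the whole country code
}


def closest_gateway(phone_number: str, directory: Dict[str, str]) -> Optional[str]:
    """Return the gateway whose prefix is the longest match for the dialed digits."""
    digits = "".join(ch for ch in phone_number if ch.isdigit())
    for length in range(len(digits), 0, -1):       # try the longest prefix first
        gateway = directory.get(digits[:length])
        if gateway is not None:
            return gateway
    return None                                     # no gateway serves this number


print(closest_gateway("+1 (312) 555-0100", GATEWAY_DIRECTORY))   # 192.0.2.10
```

A longest-prefix rule lets a specific local gateway take precedence while a broader entry still catches numbers that no nearby gateway serves.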
■ Figure 2. PC-to-PC architecture. (Diagram labels: Multimedia PC; telephone line or LAN segment; IP router; Internet.)

■ Figure 3. Gateway architecture. (Diagram labels: Telephone; central office switch or local hub; Internet telephony gateway; Internet.)

1 This will be the case if the recipient is using DHCP [7], with which an ISP server dynamically assigns IP addresses to dial-up users.

Implementation Issues
In this section we compare the architectures above by discussing a number of issues that must be addressed for a VoIP implementation to be feasible. In doing so, we also implicitly highlight the advantages and disadvantages of each architecture.

Endpoint Requirements — For a PC to support real-time interactive voice, it must have considerable processing power. The computational requirements of voice codecs increase with the voice compression ratio (see the section on codecs). This is bad news for home users: because even fast modems are limited to about 33.6 kb/s, these users need the most highly compressing, and therefore most computationally demanding, codecs. Corporate users connected to the Internet at T1 or greater speeds, as well as home users with ISDN or ADSL connectivity, may be able to employ codecs with lower compression ratios, and thus lower processor utilization.

Echo Cancellation — There are two types of echo that can impact VoIP. The first is the usual far-end echo caused by the four-wire to two-wire hybrid conversion: end users hear their own voice signal bouncing off the remote central office’s line-card hybrid. Adaptive cancellation is programmed into outgoing long-distance trunks to subtract the echo from the line on the four-wire side. If the hybrid conversion is done in a gateway, the gateway must also implement the echo cancellation. This echo does not exist in the PC-PC architecture. The second form of echo occurs when a free-air microphone and speakers are used, as is the case for most PC endpoints. The remote user’s voice signal produced by the speakers is picked up by the microphone and echoed back to the remote user. The microphone may receive the PC speaker signal2 over multiple paths (e.g., bouncing off the walls and ceiling of the room). This multipath echo is the most difficult type of echo to alleviate. Modern high-end speakerphones do a reasonably good job of echo cancellation with built-in DSPs. In the case of PCs, the echo cancellation can be designed into the codec. These two forms of echo are shown in Fig. 4.

■ Figure 4. Line echo and acoustic feedback. (Diagram labels: two-wire side; hybrid; four-wire side; EC; gateway; Internet; PC; speaker; mic; line echo; acoustic feedback.)

Dual Tone Multifrequency (DTMF) Transmission — Touch-tone phones transmit phone numbers using a combination of two sinusoidal tones for each digit. Either these tones or the phone number itself must be reliably passed through the network in the GW-GW or PC-GW architectures. Most modern telephony hardware is guaranteed to reject tones shorter than 20 ms and to accept tones longer than 40 ms. Tone durations between these thresholds may be either accepted or rejected, depending on the implementation. Some autodialers generate tones with durations close to the 40 ms threshold. Linear predictive codecs, which are the most promising for VoIP, do not handle DTMF very well: they distort the on/off transitions at the beginnings and ends of the tones. This distortion may effectively shorten the tones so that a tone of appropriate length is rejected. Alternatives to tone transmission include using in-band H.245 [8], a control message protocol for multimedia, or initiating a separate RTP control stream for sending an ASCII representation of the phone number. When the number reaches the recipient’s gateway, the gateway regenerates the DTMF signal on the recipient’s local loop.
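The DTMF timing problem described above can be made concrete with a short sketch. It synthesizes the two-tone burst for a key using the standard DTMF row and column frequencies and applies the 20 ms / 40 ms acceptance rule from the text. The 8 kHz sample rate, the function names, and the 5 ms of edge distortion in the example are illustrative assumptions, not measured codec behavior.

```python
import math

# Standard DTMF keypad: each key sums one low (row) and one high (column) tone.
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_pair(key: str) -> tuple:
    """Return the (row, column) frequencies in Hz for one keypad symbol."""
    if len(key) != 1:
        raise ValueError("expected a single keypad symbol")
    for r, row in enumerate(KEYPAD):
        c = row.find(key)
        if c >= 0:
            return ROW_HZ[r], COL_HZ[c]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key: str, duration_ms: int, rate_hz: int = 8000) -> list:
    """Synthesize a tone burst as floats in [-1, 1] at the given sample rate."""
    low, high = dtmf_pair(key)
    n = rate_hz * duration_ms // 1000
    return [0.5 * (math.sin(2 * math.pi * low * i / rate_hz) +
                   math.sin(2 * math.pi * high * i / rate_hz)) for i in range(n)]


def classify_duration(duration_ms: float, reject_below: float = 20.0,
                      accept_above: float = 40.0) -> str:
    """Receiver timing rule from the text: <20 ms rejected, >40 ms accepted."""
    if duration_ms < reject_below:
        return "reject"
    if duration_ms > accept_above:
        return "accept"
    return "implementation-dependent"


# A 45 ms autodialer tone whose edges a codec smears by 5 ms on each side
# (an illustrative figure) falls into the uncertain band.
print(classify_duration(45), classify_duration(45 - 2 * 5))   # accept implementation-dependent
```

This is why sending the digits out of band, as an H.245 message or an ASCII string, is attractive: the far gateway can regenerate clean tones of a safe duration instead of relying on what survives the codec.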
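Adaptive echo cancellation of the kind programmed into long-distance trunks is commonly built on an adaptive FIR filter that learns the echo path and subtracts its estimate. The sketch below uses the normalized least-mean-squares (NLMS) update, a standard choice that we assume here; the text does not say which algorithm trunks, gateways, or PC codecs use. It models only a short line-echo path with no local speech; acoustic multipath echo from PC speakers needs much longer filters plus double-talk handling, both omitted.

```python
import random


def nlms_echo_canceller(far_end, mic, taps=8, mu=0.5, eps=1e-6):
    """Adaptively estimate the echo path and subtract the estimated echo.

    far_end: samples sent toward the hybrid (the remote talker's voice).
    mic:     samples returning from the hybrid (echo, plus any local speech).
    Returns the echo-cancelled residual and the final filter weights.
    """
    weights = [0.0] * taps
    history = [0.0] * taps                  # recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate           # what remains after cancellation
        power = sum(h * h for h in history) + eps
        weights = [w + mu * error * h / power for w, h in zip(weights, history)]
        residual.append(error)
    return residual, weights


# Synthetic check: white-noise far-end signal through a short, hypothetical
# echo path with no local speech, so the residual should shrink toward zero.
rng = random.Random(0)
far = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
echo_path = [0.6, -0.3, 0.1]
mic = [sum(g * far[n - k] for k, g in enumerate(echo_path) if n - k >= 0)
       for n in range(len(far))]
residual, weights = nlms_echo_canceller(far, mic)
print([round(w, 3) for w in weights[:4]])
```

Normalizing the step by the recent input power keeps the adaptation rate roughly independent of how loud the far-end talker is, which is the main reason NLMS is preferred over plain LMS in this setting.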
Clock Synchronization — Whether the communication endpoints are gateways or PCs, low-frequency clock drift between the two can cause the receiver buffer (see the section on packet delays and losses) to overflow or underflow. At the receiver, a clock synchronization mechanism must be in place that corrects for clock drift by comparing the timestamps of the received RTP packets with a local clock [5].

Billing — Gateways support the billing of local users. Thus, the GW-GW architecture supports billing of both the sender and recipient, while the GW-PC and PC-GW architectures support billing of the sender and recipient, respectively. The PC-PC architecture does not explicitly support billing in current IP networks. However, an overall billing paradigm for real-time services has yet to emerge. It is possible that the caller will be billed for the call while the recipient will be charged for airtime, not unlike the current cellular telephony model.

2 In some rooms, the local user’s voice may also be subject to multipath echo.

IP Address/Phone Number Mapping — For a call to be connected in any of the architectures, the sender must supply either an IP address (for a PC-based recipient) or a phone number (for a gateway-based recipient). For PC-PC communication, the caller must specify the IP address of the recipient; no mapping is necessary unless DHCP is employed. For all other architectures, the caller must specify the phone number of the recipient. The network maps the phone number to the IP address of the gateway closest to the recipient. These mappings must be stored in a dynamic distributed database, not unlike the Internet Domain Name System (DNS). No standard mapping technique has yet emerged, although a promising candidate is the H.323 gatekeeper service [3].

Value-Added Services and Human Factors
The PC-PC architecture offers support for a number of value-added services, such as multiparty calls via IP multicast, as well as voice mail, document sharing, and “distributed whiteboard” applications. The latter are particularly attractive to corporate users. Home users with modems are at a disadvantage due to their limited access bandwidth: coding and decoding of voice signals may not leave any spare processor or link capacity for other applications. PC telephony can also be integrated with video telephony when the latter technology matures.

However, the gateway architecture has several important advantages over the PC architecture. First, the telephone is a familiar tool to a huge customer base and does not require the user to buy a PC or learn new skills. Second, unless headphones are used, PC speakers decrease the privacy of calls, especially in cubicle-based corporate environments. Third, to receive a call on a PC, the PC must be turned on and connected to the Internet. Finally, PC telephony does not allow the user the mobility common to cordless telephony customers: the user may walk around the room, but cannot leave it.

Factors Affecting Quality of Service
To transport audio over a nonguaranteed packet-switched network, audio samples must be coded (usually with some form of compression), inserted into packets that have sequence numbers and creation timestamps, transported by the network, received in a playout buffer, decoded in sequential order, and played back, as seen in Fig. 5. A symmetric scheme is used in the other direction for interactive conversation. All real-time transport schemes use this mechanism. In this section, we describe the barriers to the operation of these schemes, including requirements for codecs, bandwidth, delays, and losses.

■ Figure 5. VoIP data flow. (Diagram labels: sender: audio signal, encoder, packetizer; Internet; receiver: dynamic buffer, decoder, audio signal.)

Codecs
Internet telephony services must operate in a bandwidth-, delay-, loss-, and cost-constrained environment. These constraints have been passed down to the codec development efforts of the ITU. Recently, three ITU codecs, G.723.1 [9], G.729 [10], and G.729A [11], have been designed to work well under these constraints. Although they were designed with different applications in mind, they are all candidates for enabling VoIP.

Table 1 lists the characteristics of the codecs [12]. Bit rate refers to the output bit rate of the encoder when the input is standard 64 kb/s pulse code modulated (PCM) voice. Frame size is the duration of the voice signal compressed into each frame. Processing delay is the delay required to run the encoding algorithm on a single frame. Lookahead delay is the amount of the next frame that the coder uses to encode the current frame in order to take advantage of correlation. The effective one-way latency of the encoder is the sum of the frame size, lookahead, and processing delay. Typical decode delays are on the order of half the encode delays. Frame length is the number of bytes in an encoded frame (excluding headers). The DSP MIPS rating specifies the minimum processor speed, in millions of instructions per second, required for a DSP implementation of the encoder. Note that DSP MIPS ratings are not equivalent to the MIPS ratings of general-purpose microprocessors, such as those used in PCs. The latter, not being designed specifically for the task, require greater speeds to encode or decode at the required rate. The required RAM for each encoder is given in 16-bit words.

Codec                 G.723.1          G.729      G.729A
Bit rate              5.3 / 6.3 kb/s   8 kb/s     8 kb/s
Frame size            30 ms            10 ms      10 ms
Processing delay      30 ms            10 ms      10 ms
Lookahead delay       7.5 ms           5 ms       5 ms
Frame length          20 / 24 bytes    10 bytes   10 bytes
DSP MIPS              16               20         10.5
RAM (16-bit words)    2200             3000       2000
■ Table 1. Codec bit rates, delays, and complexity [12].

From Table 1, we find that while G.723.1 provides the lowest bit rate, it also suffers from the largest delays. G.729 trades a slightly higher bit rate and more complexity for a significant decrease in delay. G.729A provides the same performance as G.729, but with about half the complexity.

Bandwidth
A prerequisite for audio transport is, of course, that enough network bandwidth is available. Many users connect to the Internet at 33.6 kb/s or less. The small frame sizes of G.729 and G.729A allow for low-latency encoding, but they also add significant overhead if only one frame is encapsulated in each RTP packet, since 40 bytes of IP, UDP, and RTP headers then accompany every 10-byte frame. This implies that G.723.1 would be favorable to home PC users, who must share what little bandwidth they have with data traffic. Corporate users with direct access to Ethernet or T1 media may prefer G.729A for its favorable delay characteristics.

Packet Delays and Losses
Network bandwidth is not the only requirement for quality audio. Each stage in the data flow pipeline, from coding to transport to reception to decoding, adds delay to the overall transmission. Some delays, such as coding and decoding, are relatively fixed, while others depend on network conditions. Delay due to the transport network is nondeterministic. If network conditions are poor, average packet delay and packet delay variation (jitter) will be high (on the order of 75–300 ms). Receive buffers can hide jitter at the cost of additional delay; however, packets delayed past the point at which they are supposed to be played out (the playout point) are effectively lost.

IP networks do not guarantee delivery of packets. Due to the stringent delay requirements of real-time interactive applications, reliable transport protocols such as TCP cannot be used. Packet loss is unavoidable, but it can be compensated for by codec loss-concealment schemes. For example, G.723.1 interpolates a lost frame by simulating the vocal characteristics of the previous frame and slowly damping the signal [9].
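The drift correction described under Clock Synchronization can be sketched by fitting the receiver’s local arrival times against the sender’s RTP timestamps: if the two clocks agree, the fitted slope is exactly 1. The least-squares fit, the 8 kHz media clock, and the synthetic stream below are assumptions for illustration. A real receiver would also have to decide how to compensate, for example by occasionally dropping or repeating a frame or trimming silence, which the text does not specify.

```python
import random


def estimate_drift_ppm(rtp_timestamps, arrival_times_s, clock_rate_hz=8000):
    """Estimate sender/receiver clock drift from RTP timestamps.

    Fits a least-squares line to local arrival time versus sender media time
    (RTP timestamp divided by the media clock rate). A slope of exactly 1.0
    means the clocks agree; the deviation from 1.0 is reported in parts per
    million. Timestamp wraparound is ignored in this sketch.
    """
    media = [ts / clock_rate_hz for ts in rtp_timestamps]
    n = len(media)
    mean_m = sum(media) / n
    mean_a = sum(arrival_times_s) / n
    cov = sum((m - mean_m) * (a - mean_a) for m, a in zip(media, arrival_times_s))
    var = sum((m - mean_m) ** 2 for m in media)
    return (cov / var - 1.0) * 1e6


# Synthetic stream: 30 ms frames (240 ticks at 8 kHz) for 5 minutes, a
# receiver clock running 100 ppm fast, and up to 20 ms of random queuing.
rng = random.Random(7)
ts = [240 * i for i in range(10_000)]
arrivals = [0.05 + (t / 8000) * (1 + 100e-6) + rng.uniform(0.0, 0.020) for t in ts]
print(round(estimate_drift_ppm(ts, arrivals)))
```

Fitting over many packets averages out network jitter, which is much larger per packet than the drift accumulated between any two consecutive packets.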
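The latency and bandwidth consequences of Table 1 can be computed directly. The sketch below applies the encoder-latency rule stated with the table (frame size plus lookahead plus processing delay) and adds 40 bytes of IPv4, UDP, and RTP headers per packet. It assumes no link-layer framing (such as PPP on a modem link) and no header compression. The nominal G.723.1 rates are 6.3 and 5.3 kb/s; because frames travel as whole bytes, a 24-byte frame every 30 ms corresponds to 6.4 kb/s on the wire.

```python
from dataclasses import dataclass

IP_UDP_RTP_BYTES = 20 + 8 + 12      # IPv4 + UDP + fixed RTP header, no options


@dataclass
class Codec:
    name: str
    frame_ms: int           # frame size (duration of speech per frame)
    lookahead_ms: float
    processing_ms: float
    frame_bytes: int        # frame length in bytes, excluding headers

    def encoder_latency_ms(self) -> float:
        """One-way encoder latency: frame size + lookahead + processing delay."""
        return self.frame_ms + self.lookahead_ms + self.processing_ms

    def ip_rate_bps(self, frames_per_packet: int = 1) -> float:
        """Bit rate including IP/UDP/RTP headers (link-layer framing ignored)."""
        packet_bytes = IP_UDP_RTP_BYTES + frames_per_packet * self.frame_bytes
        interval_s = frames_per_packet * self.frame_ms / 1000
        return packet_bytes * 8 / interval_s


TABLE_1 = [
    Codec("G.723.1 (6.3 kb/s)", 30, 7.5, 30, 24),
    Codec("G.723.1 (5.3 kb/s)", 30, 7.5, 30, 20),
    Codec("G.729", 10, 5, 10, 10),
    Codec("G.729A", 10, 5, 10, 10),
]

for c in TABLE_1:
    print(f"{c.name:<19} encoder {c.encoder_latency_ms():5.1f} ms   "
          f"IP rate {c.ip_rate_bps() / 1000:5.1f} kb/s (1 frame/packet), "
          f"{c.ip_rate_bps(3) / 1000:5.1f} kb/s (3 frames/packet)")
```

Under these assumptions, one G.729 frame per packet needs 40 kb/s of IP bandwidth, more than a 33.6 kb/s modem can carry, while G.723.1 at 6.3 kb/s needs about 17.1 kb/s. Packing three G.729 frames per packet brings it down to about 18.7 kb/s, but each packet then waits for 30 ms of speech instead of 10 ms, adding 20 ms of delay and eroding G.729’s latency advantage (25 ms versus 67.5 ms for the encoder).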
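G.723.1’s concealment works on the coder’s internal parameters, which are not detailed here. The following is therefore only a simplified waveform-domain stand-in for the same idea: repeat the previous frame while damping it a little more for each consecutive loss, and fall back to silence when too many frames in a row are missing. The attenuation factor, the repeat limit, and the use of None to mark a lost frame are assumptions.

```python
def conceal(frames, frame_len, attenuation=0.5, max_repeats=3):
    """Fill lost frames (None) by repeating the last good frame with damping.

    Each consecutive loss multiplies the repeated frame by `attenuation`
    once more; after `max_repeats` losses in a row, silence is inserted.
    """
    out = []
    last_good = None
    run = 0                                   # consecutive losses so far
    for frame in frames:
        if frame is not None:
            out.append(list(frame))
            last_good, run = frame, 0
            continue
        run += 1
        if last_good is None or run > max_repeats:
            out.append([0.0] * frame_len)     # nothing usable: insert silence
        else:
            gain = attenuation ** run
            out.append([gain * s for s in last_good])
    return out


stream = [[1.0, -1.0], None, None, [0.8, 0.2]]
print(conceal(stream, frame_len=2))
# [[1.0, -1.0], [0.5, -0.5], [0.25, -0.25], [0.8, 0.2]]
```

Damping rather than repeating at full level avoids the buzzy artifact of a frozen frame, which is why an isolated loss is nearly inaudible while a long burst still produces a dropout.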

Informal tests in our lab have found that random, independent packet loss rates of up to 10 percent have little noticeable impact on G.723.1 speech transmission.

While single packet losses are of little consequence due to these schemes, loss bursts, like those produced by the Internet [13], can cause noticeable dropouts in the received signal. Forward error correction (FEC) schemes have been proposed to alleviate loss bursts of a small number of packets [14]. The effectiveness of FEC in the presence of loss concealment has not been rigorously studied. The drawback of FEC-based loss recovery is that to recover packet n (where n is the packet’s sequence number), we need, at the very least, to successfully receive packet n + 1. Thus, we will be subjected to at least one extra frame delay in addition to the processing delay of the FEC encoding and decoding. These additional delays may cause the recovered packet to arrive too late (beyond its playout point), so that it is lost anyway.

An alternative loss recovery scheme adds copies of the previous k frames to the packet containing frame n. For example, when k = 2, packet n contains frames n, n – 1, and n – 2. Then, if packet n – 1 is lost, we can still reconstruct frame n – 1 from either packet n or packet n + 1. Like other FEC schemes, this one is most effective when the receiver buffer depth is several frames.

In the Internet, packet delay is highly variable, so it may be to our advantage to dynamically modify the receiver buffer depth. Figure 6 shows the interaction between delay and loss for a representative delay distribution. The vertical line represents the playout point. As we move the line to the left, delays decrease but loss increases. As we move the line to the right, losses are reduced at the expense of higher delays.

■ Figure 6. Delay vs. loss trade-off. (Plot: frequency versus delay; buffer delay ranges from short, giving high loss at playout, to long, giving large interactive delay.)

If the network is not congested, it is possible to satisfy both delay and loss constraints. When network congestion is high enough, one of the two constraints must be broken. User studies [15] indicate that telephony users find round-trip delays of greater than about 300 ms more like a half-duplex connection than a conversation. However, user tolerance of delays varies significantly from user to user and from application to application. The most critical users required delays of 200 ms or less, while more tolerant users were satisfied with delays of 300–800 ms.

Access Delays
In every architecture that includes a PC endpoint, users are vulnerable to significant hardware, operating system, and processing delays at one or more PCs (in the PC-GW and GW-PC architectures, the gateway user is also affected by the PC user’s latencies). Modern PC sound cards typically add 20–180 ms of delay. V.34 modems add a further 20–40 ms of DSP, equalization, and processing delay.3 Modems also incur transmission delays equal to the packet size (including all headers) divided by the bit rate. Processor and operating system delays are highly variable. In particular, computation- or communication-intensive applications will interfere with real-time applications. Gateway delays are also nonnegligible. A realistic design goal for maximum unidirectional gateway latency would be in the range of 20–40 ms, not including codec delays.

3 Modem processing delays vary from vendor to vendor and configuration to configuration. Informal tests in our laboratory have found that 33.6 kb/s V.34 modems with error correction and data compression turned off exhibit delays slightly less than 20 ms.

Trade-offs
For packet-switched audio transport, the various existing solutions can be placed in a trade-off space, as illustrated in Fig. 7. The three axes shown are bandwidth, delay, and computational complexity. Because any given solution for packetized audio can be characterized by its required bandwidth, end-to-end delay, and computational complexity, it can be mapped to a point in this space (note that zero-complexity, zero-bandwidth, and zero-delay solutions are not practically feasible). Each point on the quality surface of Fig. 7 results in decoded speech of the same quality. The surface shown is a simple monotonic function; in reality, we expect it to be more complex.

The high point on the bandwidth axis represents traditional telephony, which requires a large amount of bandwidth (64 kb/s) but little computational effort, and exhibits low delay. The point near the delay axis represents streamed compressed audio over the Internet, which may suffer seconds of delay. For Internet and intranet audio applications, we suggest solutions that lie in the region of low to moderate bandwidth, intermediate delay, and high computational complexity. These solutions are generally based on dedicated DSP hardware, such as that found in a gateway, for lower-bit-rate coding.

■ Figure 7. Bandwidth, delay, and computation trade-offs. (Axes: bandwidth, delay, computation; a quality surface with points labeled telephone, streamed audio, and gateway.)

Implementation Examples
One of the earliest implementations of Internet telephony is the INRIA Free Phone [16]. Free Phone is implemented entirely in software. The current version, 3.5, uses RTP as well as a separate signaling protocol. A number of codecs are supported, including 64 kb/s PCM [17], 32 kb/s and 24 kb/s adaptive differential PCM (ADPCM) [18], 13 kb/s Global System for Mobile Communications (GSM), and 4.8 kb/s linear predictive coding (LPC). Free Phone attempts to keep the loss rate between user-defined watermarks by using adaptive redundancy and loss concealment techniques. In particular, lost frames can be reconstructed from redundant copies in later packets, or interpolated by copying adjacent packets that were successfully received. As a last resort, silence is inserted into the stream when a lost packet cannot be rebuilt. Other freely available software implementations include Berkeley’s VAT [19] and GMD Fokus’s NeVoT [20]. A number of commercial VoIP products have been released as well, most of them software-based. Hardware-based commercial implementations are likely to become widely available in 1998.
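The copy-based redundancy scheme described earlier in this section, with k previous frames piggybacked on each packet, can be sketched as follows. The dictionary packet layout and the function names are illustrative assumptions, not a wire format.

```python
def build_packets(frames, k):
    """Packet n carries frame n plus redundant copies of frames n-1 ... n-k."""
    return [{n - j: frames[n - j] for j in range(k + 1) if n - j >= 0}
            for n in range(len(frames))]


def recover(packets, received, num_frames):
    """Rebuild the frame sequence from the packets that arrived.

    received: sequence numbers of packets that reached the receiver.
    Frames that no received packet carries stay None (unrecoverable).
    """
    out = [None] * num_frames
    for n in sorted(received):
        for index, frame in packets[n].items():
            if out[index] is None:
                out[index] = frame
    return out


frames = ["f0", "f1", "f2", "f3", "f4"]
packets = build_packets(frames, k=2)
print(recover(packets, {0, 2, 3, 4}, 5))   # ['f0', 'f1', 'f2', 'f3', 'f4']
print(recover(packets, {0, 4}, 5))         # ['f0', None, 'f2', 'f3', 'f4']
```

The sketch shows both costs discussed in the text. Frame n – 1 can only be repaired once packet n has arrived, so the playout buffer must already hold at least one extra frame. The payload also grows: for G.723.1 with 24-byte frames and k = 2, each packet grows from 64 to 112 bytes including 40 bytes of IP/UDP/RTP headers, or from about 17.1 to 29.9 kb/s.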
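The delay and loss trade-off of Fig. 6 can be explored numerically. This minimal sketch treats any packet arriving after a fixed playout point as lost, sweeps the playout point over a delay sample, and picks the smallest playout point that meets a loss target. The shifted-exponential delay model is synthetic, not measured data, and the 1 percent target and function names are assumptions. An adaptive receiver might rerun such a choice over a sliding window of recent delays, which is one way to realize the dynamic buffer depth suggested above.

```python
import bisect
import random


def late_loss_rate(delays_ms, playout_ms, lost_in_network=0):
    """Fraction of packets unusable at a fixed playout point.

    delays_ms: one-way delays of packets that arrived.
    lost_in_network: packets that never arrived; they count as lost too.
    """
    total = len(delays_ms) + lost_in_network
    late = sum(1 for d in delays_ms if d > playout_ms)
    return (late + lost_in_network) / total


def playout_for_target(delays_ms, target_loss):
    """Smallest observed delay that keeps late loss at or below target_loss."""
    ordered = sorted(delays_ms)
    n = len(ordered)
    for d in ordered:
        late = n - bisect.bisect_right(ordered, d)
        if late / n <= target_loss:
            return d
    return ordered[-1]


# Synthetic delays (not measured data): a 50 ms floor plus exponential queuing.
rng = random.Random(3)
delays = [50 + rng.expovariate(1 / 20) for _ in range(5000)]
for point in (60, 80, 100, 150):
    print(f"playout at {point} ms -> {late_loss_rate(delays, point):.1%} late")
print(f"1% late-loss target needs about {playout_for_target(delays, 0.01):.0f} ms")
```

With a long-tailed delay distribution, the last few percent of late loss are the most expensive to remove, which is why moderate loss plus concealment is often preferable to a very deep buffer.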
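The access delays listed above can be added into a rough one-way budget. The sketch computes modem transmission delay as packet size over bit rate and sums low and high values for a hypothetical PC-to-gateway path over a 33.6 kb/s V.34 modem. The component list, the decoder taken as half the Table 1 encoder latency, and the omission of PPP framing and Internet transit delay are all assumptions.

```python
def serialization_ms(packet_bytes: int, link_bps: float) -> float:
    """Transmission delay: packet size (all headers included) over the bit rate."""
    return packet_bytes * 8 / link_bps * 1000


def budget_ms(components: dict) -> tuple:
    """Add up (low, high) delay ranges given in milliseconds."""
    return (sum(lo for lo, _ in components.values()),
            sum(hi for _, hi in components.values()))


# 24-byte G.723.1 frame + 40 bytes of IP/UDP/RTP headers; PPP framing ignored.
tx = serialization_ms(24 + 40, 33_600)

# Hypothetical one-way PC-to-gateway budget, excluding Internet transit.
pc_to_gw = {
    "sound card": (20, 180),
    "G.723.1 encoder (Table 1)": (67.5, 67.5),
    "V.34 modem DSP/equalization": (20, 40),
    "modem serialization": (tx, tx),
    "gateway (design goal)": (20, 40),
    "G.723.1 decoder (~half the encoder)": (33.75, 33.75),
}
low, high = budget_ms(pc_to_gw)
print(f"serialization {tx:.1f} ms; access budget {low:.0f}-{high:.0f} ms")
# serialization 15.2 ms; access budget 176-376 ms
```

Under these assumptions, access and codec delays alone come to roughly 176 to 376 ms one way before any Internet transit is added. This is why they matter so much next to the roughly 300 ms round-trip tolerance reported by the user studies.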
Internet Delay and Loss
In this section, we describe a set of sample delay and loss measurements of the Internet. Previous research [21–23] in this area has shown that round-trip Internet delays are often in the hundreds of milliseconds and are usually correlated with packet loss. However, these studies were limited in scope, usually encompassing just a few hours or days of measurement. Our measurements were made on three Internet paths over a six-month period. They are valuable because they show long-term trends as well as short-term and daily characteristics of the Internet. They also provide real-world numbers that can be used as a reference by VoIP implementers.

■ Figure 8. MID architecture. (Diagram labels: Client: transmitter, receiver; Internet; Server: echo server.)

■ Figure 9. Measurement paths. (Diagram labels: UCD, DePaul, UIC.)

Measurement Barriers
To evaluate the impact of Internet delay on real-time applications, knowledge of unidirectional delay, delay jitter, and loss rates is necessary. These metrics must be available to determine the most effective codecs, transmission redundancy rates, and receiver buffer sizes.

Accurately determining one-way packet delay from a client host to a server host in the Internet is difficult because the client and server clocks must be synchronized. The clock resolution on the most popular computer architecture (the Intel Pentium family) is a coarse 10 ms. This, together with the variability of Internet delay, limits the accuracy of clock synchronization techniques to a few tens of milliseconds at best [24]. In theory, unidirectional delays can be measured accurately by equipping each endpoint with a global positioning system (GPS) satellite receiver, but this is an expensive solution that does not scale easily.

The jitter of one-way Internet delay has been measured [25] by comparing the relative difference between client-server and server-client delay. It was shown that the second moments of client-server and server-client delays are usually not symmetrical. This phenomenon is likely due to the asymmetric end-to-end routes common in the Internet [26]. Thus, measuring round-trip delays and simply dividing the results by two does not necessarily give realistic one-way delays. Given that there is no reasonably accurate technique for measuring unidirectional delays, we must use round-trip delays as our network latency metric. While our measurements cannot be used directly to infer one-way delays, they do reveal general short-term and long-term trends in Internet packet delay.
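One way around the synchronization problem for jitter (though not for absolute one-way delay) is to look only at changes in transit time between consecutive packets, so that any fixed clock offset cancels out. The sketch below uses the interarrival jitter estimator defined in the RTP specification (RFC 3550, which revised RFC 1889). It is a swapped-in illustration, not the method of [25]. Clock drift between the hosts and the 10 ms timer resolution noted above would still bias such an estimate.

```python
def interarrival_jitter(send_times, arrival_times):
    """Running jitter estimate in the style of the RTP specification.

    Only differences between consecutive transit times (arrival - send) are
    used, so any constant offset between the two clocks cancels out.
    Inputs are in the same unit (e.g., ms); the result is in that unit.
    """
    jitter = 0.0
    previous_transit = None
    for sent, arrived in zip(send_times, arrival_times):
        transit = arrived - sent
        if previous_transit is not None:
            d = abs(transit - previous_transit)
            jitter += (d - jitter) / 16       # exponential smoothing, gain 1/16
        previous_transit = transit
    return jitter


send = [0, 30, 60, 90]
arrive = [50, 90, 110, 150]          # transit alternates between 50 and 60 ms
print(interarrival_jitter(send, arrive))                          # 1.76025390625
print(interarrival_jitter(send, [a + 12_345 for a in arrive]))    # 1.76025390625
```

Shifting every arrival time by an arbitrary offset, as in the second call, leaves the estimate unchanged, which is exactly the property that makes it usable without synchronized clocks.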
Measurement Architecture
The Internet ping service, which sends timestamped ICMP echo packets, is commonly used to measure round-trip delay. We chose not to measure round-trip delays with ping because some congested routers drop ICMP packets before UDP or TCP packets. As an alternative, we developed a UDP-based measurement package called MID.

■ Figure 10. Round trip delay vs. hop count. (Plot titled “Average delay versus hop count”: average delay (ms) versus hop count; numbers beside the points indicate geographical distance in miles.)

■ Figure 11. Mean delays over a 24-hour period. Data measured on Friday, January 10 on UIC to UCD route. (Plot titled “Mean delays for UIC-to-UCD route”: mean delay (ms) versus hours, where hour 0 is midnight.)

MID consists of a client and a server that run on two different hosts. The client program consists of two processes running in parallel: a transmitter and a receiver. The client’s transmitter process uses the host’s operating system timer to schedule transmission of a stream of UDP packets to the server with regular interdeparture times. Each packet contains a sequence number and a client identifier (CID), which is a random 10-bit string chosen at the beginning of a session. The server saves the last CID it has received. When the server receives a packet with a CID different from the one it has stored, it sets its packet count (PC) register to 0. When the server receives a packet with a CID identical to the one it has stored, it increments its PC by 1. Each packet received by the server is echoed back to the client and, if not lost, is received by the client’s receiver process. The receiver process logs the sequence numbers of all packets it receives. When the transmitter process has completed sending a stream, it transmits an end-of-transmission (EOT) token to the server, which responds with the current contents of its PC register. If either the EOT or the PC packet is lost, the transmitter times out and retransmits the EOT until it successfully receives the server’s response. Using this software, we can measure the client-to-server loss rate and the server-to-client loss rate, as well as the round-trip loss rate.
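The counter bookkeeping that lets MID separate the two directions of loss can be sketched as follows. The class and function names are hypothetical. The default of 6000 packets simply reflects the three-minute, 30 ms trace parameters given under Sample Measurements, and the loss probabilities are synthetic. The key relation is that the server’s PC register splits the round trip: PC divided by packets sent gives client-to-server delivery, and packets echoed back divided by PC gives server-to-client delivery.

```python
import random
from dataclasses import dataclass


@dataclass
class MidServer:
    """Server state described for MID: the last client ID and a packet-count register."""
    last_cid: int = -1
    pc: int = 0

    def on_packet(self, cid: int) -> None:
        if cid != self.last_cid:
            # New session. The text resets PC to 0 here; this sketch also counts
            # the packet that triggered the reset, so PC equals packets received.
            self.last_cid, self.pc = cid, 1
        else:
            self.pc += 1

    def on_eot(self) -> int:
        return self.pc                  # reported back to the client


def loss_rates(sent: int, server_count: int, echoed_back: int) -> dict:
    """Split the round-trip loss rate into its two directions."""
    return {
        "client_to_server": 1 - server_count / sent,
        "server_to_client": 1 - echoed_back / server_count if server_count else 0.0,
        "round_trip": 1 - echoed_back / sent,
    }


def simulate_trace(packets=6000, forward_loss=0.02, reverse_loss=0.01, seed=1):
    """One synthetic trace; EOT retransmission is assumed to succeed eventually."""
    rng = random.Random(seed)
    server = MidServer()
    cid = rng.getrandbits(10)           # random 10-bit client identifier
    echoed = 0
    for _ in range(packets):
        if rng.random() < forward_loss:
            continue                    # lost on the way to the server
        server.on_packet(cid)
        if rng.random() >= reverse_loss:
            echoed += 1                 # the echo reached the receiver process
    return loss_rates(packets, server.on_eot(), echoed)


print(simulate_trace())
```

Because round-trip delivery is the product of the two one-way delivery ratios, the single PC value returned at EOT is enough to attribute losses to each direction without synchronized clocks.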
Sample Measurements
Delay and loss characteristics were measured from DePaul University to the University of Illinois, Chicago (UIC), from the University of California, Davis (UCD) to DePaul University, and from UIC to UCD (Fig. 9). These sites were chosen based on availability and geographical dispersion. Each experiment was conducted as follows: once per hour, the client transmitted to the server for three minutes. The UDP packet size was 80 bytes, and packets were sent 30 ms apart (except for the larger packet size, our experiments conformed with G.723.1). We refer to the data collected from such a run as a trace. The measurements produced 24 traces per path per day, over a collection period starting January 1, 1997, and continuing until mid-June 1997.
Delay and Hop Count Correlation — Observed delays are much more strongly correlated with the "hop distance" between hosts than with geographical distance. As a quick but effective test, we used the ping program to measure delay versus hop distance. The results of this test, shown in Fig. 10, illustrate this phenomenon. The results were obtained by running the ping program 100 times to each site during one hour and averaging the delays, and they clearly show the trend. Notice that although sites may be the same hop distance away, the average delays experienced by packets sent to these sites can differ dramatically. For example, delay between Duke University and DePaul University was measured at 50 ms, while delay between University of California, Davis and DePaul was 110 ms, although both destinations are 13 hops away. The two sites that were 413 and 477 miles from the sender exhibited larger delays than some sites several times more distant. This is likely due to higher levels of congestion on the paths to the closer sites.

■ Figure 12. Weekday mean delay on DePaul to UIC route, January–June 1997. (Axes: mean delay (ms) vs. weekday starting January 1.)
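To put the 413- and 477-mile paths in perspective, the sketch below computes a lower bound on round-trip propagation delay. It rests on two assumptions:

- a signal speed of about 2 × 10^8 m/s, roughly two-thirds of the speed of light and a common figure for optical fiber;
- a straight-line route.

Real routes are longer, so actual propagation delay is higher. Even so, the result is only a few milliseconds for these paths. That is small next to the 50 ms and 110 ms round-trip delays quoted above for 13-hop paths, which is consistent with attributing most of the difference to queuing.

```python
MILE_M = 1609.344          # metres per statute mile
SIGNAL_SPEED_M_S = 2.0e8   # assumed propagation speed in fibre (about 2/3 c)


def propagation_ms(miles: float, round_trip: bool = True) -> float:
    """Lower bound on propagation delay over a straight-line path, in ms."""
    one_way_ms = miles * MILE_M / SIGNAL_SPEED_M_S * 1000.0
    return 2.0 * one_way_ms if round_trip else one_way_ms


for miles in (413, 477, 2000):
    print(f"{miles:>5} miles: {propagation_ms(miles):5.1f} ms round trip")
```

The same bound gives about 16 ms one way for a 2000-mile Chicago-to-California distance. This is a floor that no hardware improvement can remove.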
■ Figure 13. Weekday mean delay on UIC-UCD route, January–June 1997. (Axes: mean delay (ms) vs. weekday starting January 1.)

■ Figure 14. Weekday mean delay on UCD-DePaul route, January–June 1997. (Axes: mean delay (ms) vs. weekday starting January 1.)

■ Figure 16. QoS mapping for unidirectional Internet delay and loss. (Axes: delay, with region boundaries at 100, 150, and 400 ms, and loss, with marks at 5, 10, and 20 percent. Regions: toll quality, good, potentially useful, and poor. Operating points: N, A, B, C, and D.)
■ Table 2. Expected unidirectional delays using G.723.1 and G.729A.

                                          GW-GW                     PC-PC
Source of delay                           G.723.1      G.729A       G.723.1      G.729A
Encode/decode                             82 ms        30 ms        82 ms        30 ms
Access (source and destination)           40–80 ms     40–80 ms     100–340 ms   100–340 ms
Transmission (source and destination)     < 1 ms       < 1 ms       20–40 ms     10–30 ms
Internet (propagation and queuing)        30–100 ms    30–100 ms    30–100 ms    30–100 ms
Totals                                    152–262 ms   100–210 ms   232–562 ms   170–500 ms

NOTE: In this table we do not consider operating system delays, which in the PC-based architectures may cause significant additional delays as well as occasional packet loss due to kernel buffer overflows.

Daily Mean Delay Samples — It has been established that Internet delay follows a diurnal cycle [23]. During the "working hours" of about 8 a.m. to 6 p.m., delays are usually greater than during the "overnight" hours of midnight to 8 a.m. Evening hours of 6 p.m. to midnight exhibit moderate delays. In Fig. 11, note the increase in mean round-trip delay of approximately 20 ms during the middle of the business day. We have also observed a weekly cycle in which delays during a given hour are generally larger on weekdays than on weekends. Thus, directly comparing daytime versus nighttime or weekday versus weekend delays is misleading.

Figures 12–14 show the daily round-trip mean delays for all three paths. The average round-trip packet delays at 9 a.m., 1 p.m., and 9 p.m. are plotted as a function of the number of weekdays since January 1. For example, the mean delay at 4 o'clock on Monday, January 6 would be shown above the fourth position on the x axis, since January 6 is the fourth weekday of the year. For the reasons described above, mean delays for weekend hours are not shown on the same graphs as those for weekdays.

In Figs. 13 and 14, the majority of the mean delays range from 70 ms to 160 ms. This is in sharp contrast to the mean delay points plotted in Fig. 12, the majority of which range between 20 ms and 40 ms. The routes in Figs. 13 and 14 run between California and Illinois. They have more hops and a much greater propagation distance than the route between UIC and DePaul within Chicago. We also notice that the delay at 1 p.m. in each graph is often larger than the delays at 9 p.m. and 9 a.m. The increased delay is due to congestion during heavy-use hours. Poor quality of service is subjectively associated with higher delay means, standard deviations, and loss rates. These figures also show a trend toward smaller mean delay from January to June, especially visible in Fig. 13.

■ Figure 15. Round-trip delay, standard deviation of delay, and loss rate from UIC to UCD at 1:00 p.m. (Panels: mean delay (ms), delay standard deviation (ms), and percent packets lost (%), each vs. weekday starting January 1.)

We gain more insight by further characterizing the Internet through measurements of delay standard deviation and loss rate. A sample of these measurements is shown in Fig. 15, which represents the UIC-to-UCD route at 1 p.m. Note that the standard deviation plot uses a different scale from the mean delay plot. A positive correlation among mean delay, standard deviation, and loss is visible. During hours of higher delay toward the beginning of the year, greater standard deviations and loss rates are evident. Throughout the next few months, periods when delays are higher than on surrounding days are also marked by higher loss rates and standard deviations.

In many of our traces, packet loss is highly correlated. In other words, given that packet n is lost, the probability that packet n + 1 is also lost is higher than the overall loss rate. In [13], packet loss for the month of April is studied in more detail. There we find that packet loss over all three paths is well modeled by a nonlinear Pareto distribution. This indicates that the Internet's packet loss process is highly bursty: a disproportionate share of individual packet losses occurs in a relatively small number of bursts.
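The loss dependence described above can be summarized with a simple first-order statistic. It compares the overall loss rate with the probability that packet n + 1 is lost given that packet n was lost. For independent (Bernoulli) losses the two would be about equal. A conditional probability well above the overall rate indicates bursty loss.

This Python sketch implements only that simple statistic plus a burst-length count. It is not the Pareto modeling used in [13]. The function names and the toy trace are hypothetical.

```python
def loss_indicator(received_seqs: set, sent: int) -> list:
    """True where sequence number n (0..sent-1) never reached the receiver."""
    return [n not in received_seqs for n in range(sent)]


def loss_statistics(lost: list) -> tuple:
    """Return (overall loss rate, P(packet n+1 lost | packet n lost))."""
    if not lost:
        raise ValueError("empty trace")
    overall = sum(lost) / len(lost)
    # Outcomes of the packets that immediately follow a lost packet.
    followers = [lost[n + 1] for n in range(len(lost) - 1) if lost[n]]
    conditional = sum(followers) / len(followers) if followers else float("nan")
    return overall, conditional


def burst_lengths(lost: list) -> list:
    """Lengths of consecutive runs of lost packets."""
    runs, current = [], 0
    for is_lost in lost:
        if is_lost:
            current += 1
        elif current:
            runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


# Toy trace (synthetic): 20 packets with two loss bursts, 5-7 and 14-15.
lost = loss_indicator(set(range(20)) - {5, 6, 7, 14, 15}, sent=20)
overall, conditional = loss_statistics(lost)
print(overall, conditional, burst_lengths(lost))  # 0.25 0.6 [3, 2]
```

In the toy trace, a lost packet is followed by another loss 60 percent of the time, even though only 25 percent of all packets are lost. This is the kind of gap that signals bursty rather than independent loss.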

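The totals in Table 2 are interval sums of the four delay sources. The sketch below reproduces them and places each bound in the delay bands used for Fig. 16:

- toll quality: below 100 ms;
- good: 100 to 150 ms;
- potentially useful: 150 to 400 ms;
- poor: above 400 ms.

Loss is ignored here, although Fig. 16 also depends on it. The `delay_band` helper is hypothetical. Assigning values that fall exactly on a boundary to the lower-quality band is an arbitrary choice.

```python
# One-way delay ranges (low_ms, high_ms) from Table 2, in the order
# encode/decode, access, transmission, Internet. "< 1 ms" is entered as
# (0, 0) because the table's totals treat it as negligible.
TABLE2 = {
    ("GW-GW", "G.723.1"): [(82, 82), (40, 80), (0, 0), (30, 100)],
    ("GW-GW", "G.729A"): [(30, 30), (40, 80), (0, 0), (30, 100)],
    ("PC-PC", "G.723.1"): [(82, 82), (100, 340), (20, 40), (30, 100)],
    ("PC-PC", "G.729A"): [(30, 30), (100, 340), (10, 30), (30, 100)],
}


def total(ranges):
    """Sum independent delay ranges into one (best, worst) interval."""
    return sum(lo for lo, _ in ranges), sum(hi for _, hi in ranges)


def delay_band(ms):
    """Delay-only QoS band (hypothetical helper); loss is not considered."""
    if ms < 100:
        return "toll quality"
    if ms < 150:
        return "good"
    if ms < 400:
        return "potentially useful"
    return "poor"


for (arch, codec), ranges in TABLE2.items():
    best, worst = total(ranges)
    print(f"{arch} {codec}: {best}-{worst} ms "
          f"({delay_band(best)} to {delay_band(worst)})")
```

On delay alone, no configuration in Table 2 reaches the toll-quality band. Only the GW-GW G.729A best case falls in the good band.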
Discussion
Given the factors that affect QoS from the second section and the Internet delays measured in this section, we would like to provide a mapping from delay and loss rate to QoS. As discussed earlier, such a mapping is very complex. It involves a great deal of site- and implementation-dependent variability, such as codec, access, network, operating system, and sound card delays. These factors cannot easily be captured in a parsimonious analytical model. As an alternative, we present an approximate and intuitive representation of the QoS trade-offs involved in implementing VoIP.

Table 2 shows the approximate unidirectional latencies that we expect Internet telephony applications to experience using G.723.1 and G.729A in the GW-GW and PC-PC architectures.

- Encode/decode delays are derived from Table 1, assuming that decode delays are about half of encode delays.
- Access delays in the GW-GW case are estimated from current high-end hardware performance. Access delays in the PC-PC case include sound card delay and four modem processing delays (user and ISP modems at each endpoint).
- Transmission delays are assumed to be nearly negligible in the GW-GW case, since gateways are likely to be attached to Ethernet or T1 lines. In the PC-PC architecture, transmission delays are quite severe and occur at both ends. Our calculations in this case are based on the frame lengths from Table 1, assuming compressed IP/UDP/RTP headers in the best case and uncompressed headers in the worst case.
- Internet delays are based on our measurements from the 2000-mile Chicago-to-California link. Most of the round-trip delays that we measured were between 70 and 160 ms. We chose an expected unidirectional best case of 30 ms but an expected worst case of 100 ms to cover a reasonable amount of delay asymmetry.

Note that although Table 2 may imply that G.729A is more promising than G.723.1, G.729A requires additional bandwidth and processing due to a higher header-to-payload ratio.

Figure 16 shows a hypothetical mapping from delay and loss to QoS. This mapping is meant to give the reader an understanding of the effect that various delay and loss rates, as well as buffer dynamics, have on QoS. It should not be taken to be based on user studies. We show four ranges of QoS based on codec properties and the discussion in [15]:

- toll quality, for delays of less than 100 ms and low loss rates;
- good, for delays of 100–150 ms and slightly higher loss;
- potentially useful, for delays of 150–400 ms and higher loss rates;
- poor, for delays greater than 400 ms and very high loss rates.

We assume that a codec with a reasonably good loss concealment algorithm is being used; otherwise, loss rates of 5–10 percent would result in a poor-quality connection.

Point N in Fig. 16 represents the total delay and loss experienced due to the "network" factors listed in Table 2, including all delays and losses up to the receive buffer. The system will never actually operate at this point. Point B represents a receiver operating under a certain amount of delay and loss, part of which is due to the size of the receiver's buffer.

- Increasing the buffer size brings us to point C, which increases the delay and decreases the loss rate. The loss decreases because, by pushing back our playout point, we drop fewer late packets.
- If we continue to increase the buffer toward point D and on to infinity, we reach the "network" loss rate, since we no longer drop any additional packets.
- If we instead decrease the buffer size toward point A and onward to a zero buffer length, we approach the "network" delay. However, the loss rate approaches 100 percent, because packets would have to arrive exactly at the playout point.

Given our assumptions, we expect that under the best circumstances VoIP implementations can reach a QoS acceptable to many users. Lower QoS may be acceptable to a large user base if appropriate pricing incentives are used (e.g., users may accept lower than toll-quality calls if they pay less per minute than comparable long-distance rates).

Conclusions
Internet telephony promises to combine our separate data and voice networks into a single transport mechanism. The potential benefits for corporate and home users include the reduced cost of needing only a single line to the outside world, as well as lower per-minute telephone rates. However, there are significant barriers to acceptable QoS that must be overcome. Many of these barriers take the form of trade-offs; finding the best combination of codec, access technology, and end-to-end architecture is challenging. The delay constraints for QoS are the most limiting, in the sense that we may be able to build faster hardware in the future, but we cannot increase the speed of light. These constraints can be somewhat mitigated by using Internet telephony on a local or metropolitan basis. In this article we have summarized the implementation and network issues that form these trade-offs and have presented six months of Internet delay and loss measurements. These measurements can be used as a guide by VoIP implementers for evaluating the effectiveness of their schemes.

Acknowledgments
We are grateful to Ying Wu, Tom Gentles, Stan Naudus, and Vijay Nadkarni of 3Com for their insight into the implementation issues of Internet telephony, and to Biswanath Mukherjee of the University of California, Davis for allowing us to use his equipment for measurements. We would also like to thank the anonymous reviewers for many useful recommendations.

References
[1] C. Huitema, IPv6: The New Internet Protocol, Prentice Hall, 1996.
[2] R. Braden et al., "Resource Reservation Protocol (RSVP) — Version 1, Functional Specification," RFC 2205, Sept. 1997.
[3] G. A. Thom, "H.323: The Multimedia Communications Standard for Local Area Networks," IEEE Commun. Mag., Dec. 1996.
[4] ITU Rec. H.323, "Visual Telephone Systems and Equipment for Local Area Networks which Provide a Non-Guaranteed Quality of Service," Nov. 1996.
[5] H. Schulzrinne et al., "RTP: A Transport Protocol for Real-Time Applications," RFC 1889, Jan. 1996.
[6] Hyperlink at http://www.imtc.org/i/activity/i_voip.htm.
[7] R. Droms, "Dynamic Host Configuration Protocol," RFC 2131, Mar. 1997.
[8] ITU Rec. H.245, "Control Protocol for Multimedia Communication," Mar. 1996.
[9] ITU Rec. G.723.1, "Dual Rate Speech Coder for Multimedia Communications Transmitting at 5.3 and 6.3 kbit/s," Mar. 1996.
[10] ITU Rec. G.729, "Coding of Speech at 8 kbit/s Using Conjugate-Structure Algebraic-Code-Excited Linear-Prediction (CS-ACELP)," Mar. 1996.
[11] ITU Rec. G.729 Annex A, "Reduced Complexity 8 kbit/s CS-ACELP Speech Codec," Nov. 1996.
[12] R. V. Cox, "Three New Speech Coders from the ITU Cover a Range of Applications," IEEE Commun. Mag., Sept. 1997.
[13] M. S. Borella et al., "Analysis of End-to-End Internet Packet Loss: Dependence and Asymmetry," Preprint, 1997.
[14] J.-C. Bolot and A. Vega-Garcia, "The Case for FEC-Based Error Control for Packet Audio in the Internet," ACM Multimedia Sys., 1997.
[15] ITU Rec. G.114, "One-Way Transmission Time," Feb. 1996.
[16] Hyperlink at http://zenon.inria.fr/rodeo/fphone/.
[17] ITU Rec. G.711, "Pulse Code Modulation (PCM) of Voice Frequencies," 1988.
[18] ITU Rec. G.726, "40, 32, 24, 16 kbit/s Adaptive Differential Pulse Code Modulation (ADPCM)," June 1990.
[19] Hyperlink at http://www-nrg.ee.lbl.gov/vat/.
[20] Hyperlink at http://www.fokus.gmd.de/step/hgs/nevot/.
[21] D. Sanghi et al., "Experimental Assessment of End-to-End Behavior on Internet," Proc. IEEE INFOCOM '93, Mar. 1993, pp. 867–74.
[22] J.-C. Bolot, "Characterizing End-to-End Packet Delay and Loss in the Internet," J. High Speed Networks, vol. 2, 1993, pp. 305–23.
[23] A. Mukherjee, "On the Dynamics and Significance of Low Frequency Components of Internet Load," Internetworking: Research and Experience, vol. 5, 1994, pp. 163–205.
[24] D. L. Mills, "The Network Time Protocol," RFC 1129, Oct. 1989.
[25] K. Claffy, G. Polyzos, and H.-W. Braun, "Measurement Considerations for Assessing Unidirectional Latencies," Internetworking: Research and Experience, vol. 4, no. 3, Sept. 1993, pp. 121–32.
[26] V. Paxson, "End-to-End Routing Behavior in the Internet," IEEE/ACM Trans. Networking, vol. 5, no. 5, Oct. 1997, pp. 601–15.

Additional Reading
[1] D. Oran, "Voice Quality: An End-to-End Problem," presented at Nat'l. Commun. Forum, Nov. 1997.
[2] V. Paxson, "Measurements and Analysis of End-to-End Internet Dynamics," Ph.D. dissertation, UC-Berkeley, Apr. 1997.

Biographies
THOMAS J. KOSTAS received a B.S. degree in computer engineering and a B.A. degree in economics from Lehigh University, Bethlehem, Pennsylvania, in 1990, and an M.S. degree in electrical engineering from Northwestern University, Evanston, Illinois, in 1993. He is currently working at the 3Com Advanced Technologies Research Center in Mount Prospect, Illinois. He is also pursuing a Ph.D. degree in electrical engineering at Northwestern University.

MICHAEL S. BORELLA (mike_borella@mw.ecom.com) received a B.S. in computer science and technical communications from Clarkson University in 1991, and M.S. and Ph.D. degrees in computer science from UC Davis in 1994 and 1995, respectively. He is currently a senior engineer at the 3Com Advanced Technologies Research Center in Mount Prospect, IL. He has authored over 35 papers and patents in various areas of networking. His research interests include internetworking, multimedia, performance evaluation, traffic analysis, and operating systems.

IKHLAQ SIDHU received his B.S.E.E. degree from the University of Illinois, Urbana-Champaign, and M.S.E.E. and Ph.D. degrees from Northwestern University in Evanston, Illinois. He heads the Advanced Technologies Research Center, which is the applied research program for the Carrier Systems Business Unit of 3Com Corporation. The center's research agenda includes architectures, technologies, and new services for next-generation network access.

GUIDO M. SCHUSTER received an Ing HTL degree in electronics, measurement, and control engineering (Elektronik, Mess- und Regeltechnik) in 1990 from the Neu Technikum Buchs (NTB), Buchs, St.Gallen, Switzerland. He received M.S. and Ph.D. degrees, both in electrical engineering, from Northwestern University, Evanston, Illinois, in 1992 and 1996, respectively. In 1996 he joined the Network Systems Division of U.S. Robotics (now the Carrier Systems Business Unit of 3Com) in Mount Prospect, Illinois, where he cofounded the Advanced Technologies Research Center. He has filed and holds several patents in fields ranging from adaptive control over video compression to Internet telephony. He is also the author of the book Rate-Distortion Based Video Compression (Kluwer, 1997). His current research interests are operational rate-distortion theory, source and channel coding, and networked multimedia.

JACEK GRABIEC received a B.S. degree from the Department of Electrical Engineering, Automation and Electronics of the University of Mining and Metallurgy, Krakow, Poland. He is currently a software engineer at the 3Com Advanced Technologies Research Center in Mount Prospect, Illinois. His research interests include modeling and simulation of communication systems, network performance evaluation, and data traffic analysis.

JERRY MAHLER received B.S. and M.S. degrees in electrical engineering from Northwestern University, Evanston, Illinois, in 1995 and 1997, respectively. He is currently a research engineer at the 3Com Advanced Technologies Research Center in Mount Prospect, Illinois. His research interests include networking, performance evaluation, multimedia, and wireless communications.

18 0890-8044/98/$10.00 © 1998 IEEE IEEE Network • January/February 1998
