Real-Time Voice Over Packet-Switched Networks
Abstract
We discuss the architecture and technical viability of transporting real-time voice
over packet-switched networks such as the Internet. The value of integrating voice
and data networks onto a common platform is well known. The telephony industry
has proposed the ATM standard as a means of upgrading the Internet to provide
both real-time and data services. In contrast, voice services may be added to traditional
IP networks that were originally designed for data transmission alone. Here,
we consider the feasibility and expected quality of service of audio applications
over IP networks such as the Internet. In particular, we examine possible architectures
for voice over IP and discuss measured Internet delay and loss characteristics.
Implementation Issues
In this section we compare the architectures above by discussing a number of issues which must be addressed in order for a VoIP implementation to be feasible. In doing so, we also implicitly emphasize the advantages and disadvantages of each architecture.

■ Figure 2. PC-to-PC architecture. [Diagram labels: multimedia PC, telephone line or LAN segment, IP router, Internet.]

■ Figure 3. Gateway architecture. [Diagram labels: telephone, central office switch or local hub, Internet telephony gateway, Internet.]

[Fig. 4 diagram labels: hybrid, EC, Internet, mic, PC, gateway, line echo, acoustic feedback.]

¹ This will be the case if the recipient is using DHCP [7], with which an ISP server dynamically assigns IP addresses to dial-up users.

Endpoint Requirements: In order for a PC to support real-time interactive voice, it must have considerable processing power. The computational requirements of voice codecs increase with the voice compression ratio (see the section on codecs). This is bad news for home users, who need high compression ratios because even fast modems are limited to transmission rates of about 33.6 kb/s. Corporate users connected to the Internet at T1 or greater speeds, as well as home users with ISDN or ADSL connectivity, may be able to employ codecs with lower compression ratios, and thus lower processor utilization.

Echo Cancellation: There are two types of echo that can impact VoIP. The first is the usual far-end echo caused by the four-wire to two-wire hybrid conversion. End users will hear their own voice signal bouncing off the remote central office’s line-card hybrid. Adaptive cancellation is programmed into outgoing long distance trunks to subtract the echo from the line on the four-wire side. If the hybrid conversion is done in a gateway, the gateway must also implement the echo cancellation. This echo will not exist in the PC-PC architecture. The second form of echo occurs when a free-air microphone and speakers are used, as is the case for most PC endpoints. The remote user’s voice signal produced by the speakers is picked up by the microphone and echoed back to the remote user. The microphone may receive the PC speaker signal² from multiple paths (i.e., bouncing off the walls and ceiling of the room). This multipath echo is the most difficult type of echo to alleviate. Modern high-end speaker phones do a reasonably good job of echo cancellation with built-in DSPs. In the case of PCs, the echo cancellation can be designed into the codec. These two forms of echo are shown in Fig. 4.

Dual Tone Multifrequency (DTMF) Transmission: Touch-tone phones transmit phone numbers using a simple combination of two sinusoidal tones for each digit. Either these tones, or the phone number itself, must be reliably passed through the network in the GW-GW or PC-GW architectures. Most modern telephony hardware is guaranteed to reject tones with a duration of less than 20 ms and to accept tones with a duration of greater than 40 ms. Tone durations between these thresholds may be either accepted or rejected, depending on implementation. Some autodialers generate tones with a duration close to the 40 ms threshold. Linear predictive codecs, which are the most promising for VoIP, do not handle DTMF very well: they distort the on/off transitions at the beginnings and ends of the tones. This distortion may effectively shorten the duration of the tones so that a tone of appropriate length is rejected. Alternatives to tone transmission include using in-band H.245 [8], a control message protocol for multimedia, or initiating a separate RTP control stream for sending an ASCII representation of the phone number. When the number reaches the recipient’s gateway, the gateway will regenerate the DTMF signal on the recipient’s local loop.
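The DTMF discussion above can be illustrated with a minimal sketch. The row and column frequencies are the standard DTMF keypad assignment and are not given in the article; the function names, the 8 kHz sample rate, and the equal tone amplitudes are assumptions made for illustration. The 20 ms and 40 ms thresholds are the ones quoted in the text.

```python
import math

# Standard DTMF keypad frequencies in Hz (assumed here; not listed in the article).
ROW_HZ = (697, 770, 852, 941)
COL_HZ = (1209, 1336, 1477, 1633)
KEYPAD = ("123A", "456B", "789C", "*0#D")


def dtmf_frequencies(key):
    """Return the (row, column) frequency pair for one keypad symbol."""
    if len(key) == 1:
        for row_index, row in enumerate(KEYPAD):
            col_index = row.find(key)
            if col_index != -1:
                return ROW_HZ[row_index], COL_HZ[col_index]
    raise ValueError(f"not a DTMF key: {key!r}")


def dtmf_samples(key, duration_ms, sample_rate=8000):
    """Synthesize one tone burst as a list of floats in [-1, 1]."""
    low, high = dtmf_frequencies(key)
    count = int(sample_rate * duration_ms / 1000)
    return [
        0.5 * math.sin(2 * math.pi * low * i / sample_rate)
        + 0.5 * math.sin(2 * math.pi * high * i / sample_rate)
        for i in range(count)
    ]


def detector_verdict(duration_ms, reject_below_ms=20, accept_above_ms=40):
    """Classify a tone duration using the thresholds quoted in the text."""
    if duration_ms < reject_below_ms:
        return "reject"
    if duration_ms > accept_above_ms:
        return "accept"
    return "implementation-dependent"
```

For example, an autodialer tone of 42 ms that a codec effectively shortens by 5 ms falls to 37 ms, and `detector_verdict(37)` reports that acceptance is no longer guaranteed.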
Clock Synchronization: Whether the communication endpoints are gateways or PCs, low-frequency clock drift between the two can cause the receiver buffer (see the section on packet delays and losses) to overflow or underflow. At the receiver, a clock synchronization mechanism must be in place that corrects for clock drift by comparing the timestamps of the received RTP packets with a local clock [5].
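One way to compare RTP timestamps with a local clock is a least-squares fit of local arrival time against sender timestamp. The sketch below is an illustration under that assumption, not the mechanism of [5]; the function names and the 30 ms frame length are hypothetical. A slope above 1 means the receiver clock counts more time than the sender clock over the same interval, so the receiver plays samples faster than they are produced and the buffer tends to underflow.

```python
def estimate_skew_ppm(send_ts, recv_ts):
    """Estimate relative clock skew in parts per million.

    send_ts: sender timestamps in seconds (for example, RTP timestamps
             divided by the codec clock rate).
    recv_ts: local arrival times in seconds for the same packets.
    Network jitter acts as noise on recv_ts; the fit averages it out.
    """
    n = len(send_ts)
    if n < 2 or n != len(recv_ts):
        raise ValueError("need two or more matched timestamp pairs")
    mean_s = sum(send_ts) / n
    mean_r = sum(recv_ts) / n
    sxx = sum((s - mean_s) ** 2 for s in send_ts)
    sxy = sum((s - mean_s) * (r - mean_r) for s, r in zip(send_ts, recv_ts))
    if sxx == 0:
        raise ValueError("sender timestamps are all identical")
    slope = sxy / sxx
    return (slope - 1.0) * 1e6


def seconds_per_frame_slip(skew_ppm, frame_ms=30):
    """Time until the drift accumulates to one frame of audio."""
    if skew_ppm == 0:
        return float("inf")
    return (frame_ms / 1000) / (abs(skew_ppm) * 1e-6)
```

Under these assumptions, a skew of 100 ppm shifts the buffer occupancy by one 30 ms frame roughly every 300 seconds, which is why even small drift must eventually be corrected.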
Billing: Gateways support the billing of local users. Thus, the GW-GW architecture supports billing of both the sender and recipient, while the GW-PC and PC-GW architectures support billing of the sender and recipient, respectively. The PC-PC architecture does not explicitly support billing in current IP networks. However, an overall billing paradigm for real-time services has yet to emerge. It is possible that the caller will be billed for the

² In some rooms, the local user’s voice may also be subject to multi-path echo.

■ Figure 5. VoIP data flow. [Diagram labels: audio signal, sender, encoder, packetizer, Internet, dynamic buffer, decoder, receiver, audio signal.]
■ Figure 6. Delay vs. loss trade-off. [Plot of a delay distribution: frequency versus delay.]

Informal tests in our lab have found that random, independent packet loss rates of up to 10 percent have little noticeable impact on G.723.1 speech transmission.

While single packet losses are of little consequence due to these schemes, loss bursts, like those produced by the Internet [13], can cause noticeable dropouts in the received signal.

Access Delays
In all proposed end-to-end architectures, users are vulnerable to significant hardware, operating system, and processing delays at one or more PCs (in PC-GW and GW-PC architectures, the gateway user will also be affected by the PC user’s latencies). Modern PC sound cards typically add 20–180 ms of delay. V.34 modems will add a further 20–40 ms of DSP, equalization, and processing delay.³ Modems will also incur transmission delays which are based on the ratio of packet size (including all headers) to bit rate. Processor and operating system delays are highly variable. In particular, computation- or communication-intensive applications will interfere with real-time applications. Gateway delays are also nonnegligible. A realistic design goal for maximum unidirectional gateway latency would be in the range of 20–40 ms, not including codec delays.
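The transmission delay mentioned above, the ratio of packet size (including all headers) to bit rate, can be worked through in a short sketch. The 24-byte payload (one 30 ms G.723.1 frame at 6.3 kb/s) and the 40 bytes of IPv4, UDP, and RTP headers are assumptions for illustration; link-layer framing is ignored, and the function name is hypothetical.

```python
# Assumed header sizes in bytes: IPv4 (20) + UDP (8) + RTP (12), no options.
IP_UDP_RTP_HEADER_BYTES = 20 + 8 + 12


def transmission_delay_ms(payload_bytes, header_bytes, bit_rate_bps):
    """Serialization delay of one packet on a link, in milliseconds."""
    if bit_rate_bps <= 0:
        raise ValueError("bit rate must be positive")
    total_bits = (payload_bytes + header_bytes) * 8
    return total_bits * 1000 / bit_rate_bps


# One G.723.1 frame (assumed 24 bytes) over a 33.6 kb/s modem link.
modem_ms = transmission_delay_ms(24, IP_UDP_RTP_HEADER_BYTES, 33_600)

# The same packet over a T1 link (1.544 Mb/s).
t1_ms = transmission_delay_ms(24, IP_UDP_RTP_HEADER_BYTES, 1_544_000)
```

With these assumed sizes the modem link takes about 15.24 ms per packet while the T1 link takes well under 1 ms, which shows how strongly header overhead weighs on low-rate access links.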
Trade-offs
For packet-switched audio transport, any of the various solutions that exist can be categorized by a trade-off space, as illustrated in Fig. 7. The three axes shown are bandwidth, delay, and computational complexity. Any given solution for packetized audio can be characterized by its required bandwidth, end-to-end delay, and computational complexity. Therefore, any particular solution can also be mapped into the space shown by these axes (note that zero-complexity, zero-bandwidth, and zero-delay solutions are not practically feasible). Each point on the quality surface shown results in decoded speech of the same quality. Note that the surface shown is a simple monotonic function. In reality, we expect it to be more complex.

The high point on the bandwidth axis represents traditional telephony, which requires a large amount of bandwidth (64 kb/s) but low computational effort, and exhibits low delay. The point near the delay axis represents streamed compressed audio over the Internet, which may suffer seconds of delay. For Internet and intranet audio applications, we suggest solutions that lie in the region of low to moderate bandwidth, intermediate delay, and high computational complexity. These solutions are generally based on dedicated DSP hardware, such as that found in a gateway, for lower-bit-rate coding.

Forward error correction (FEC) schemes have been proposed to alleviate loss bursts of a small number of packets [14]. The effectiveness of FEC in the presence of loss concealment has not been rigorously studied. The drawback to FEC-based loss recovery is that in order to recover packet n (where n is the packet’s sequence number), we need, at the very least, to successfully receive packet n + 1. Thus, we will be subjected to at least one extra frame delay in addition to the processing delay of the FEC encoding and decoding. These additional delays may cause the recovered packet to arrive too late (beyond its playout point) so that it is lost anyway.

An alternative loss recovery scheme involves adding copies of the previous k frames in the packet containing frame n. For example, when k = 2, packet n will contain frames n, n - 1, and n - 2. Then, if packet n - 1 is lost, we can still reconstruct frame n - 1 from either packet n or packet n + 1. Like other FEC schemes, this one will be most effective in scenarios in which we have a receiver buffer depth of several frames.

In the Internet, packet delay is highly variable. It may be to our advantage to dynamically modify the receiver buffer depth. Figure 6 shows the interaction between delay and loss for a representative delay distribution. The vertical line represents the playout point. As we move the line to the left, delays decrease but loss increases. As we move the line to the right, losses are reduced at the expense of higher delays.
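The interaction between the playout point and loss can be sketched numerically. The code below is a minimal illustration, assuming a list of measured one-way delays for packets that did arrive; packets dropped inside the network are not modeled, and both function names are hypothetical.

```python
import math


def late_loss_fraction(delays_ms, playout_ms):
    """Fraction of received packets that arrive after the playout point."""
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    late = sum(1 for d in delays_ms if d > playout_ms)
    return late / len(delays_ms)


def smallest_playout_ms(delays_ms, max_loss):
    """Smallest observed delay that keeps late loss at or below max_loss."""
    if not delays_ms:
        raise ValueError("need at least one delay sample")
    if not 0 <= max_loss < 1:
        raise ValueError("max_loss must be in [0, 1)")
    ordered = sorted(delays_ms)
    n = len(ordered)
    # Number of packets that are allowed to miss their playout time.
    allowed_late = math.floor(max_loss * n)
    return ordered[n - allowed_late - 1]
```

Moving the playout point to the right corresponds to raising `playout_ms`: for the sample `[40, 45, 50, 55, 60, 70, 80, 100, 150, 400]`, a 100 ms playout point loses 20 percent of packets as late arrivals, while tolerating 10 percent late loss requires a playout point of 150 ms.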
If the network is not congested, it is possible to satisfy both delay and loss constraints. When network congestion is high enough, one of the two constraints must be broken. User studies [15] indicate that telephony users find round-trip delays of greater than about 300 ms more like a half-duplex connection than a conversation. However, user tolerance of delays varies significantly from user to user and from application to application. The most critical users

³ Modem processing delays vary from vendor to vendor and configuration to configuration. Informal tests in our laboratory have found that 33.6 kb/s V.34 modems with error correction and data compression turned off exhibit delays slightly less than 20 ms.

■ Figure 7. Bandwidth, delay, and computation trade-offs. [Axes: bandwidth, delay, computation. Labels: quality surface, telephone, streamed audio, gateway.]

Implementation Examples
One of the earliest implementations of Internet telephony is the INRIA Free Phone [16]. Free Phone has been implemented entirely in software. The current version, 3.5, utilizes RTP as well as a separate signaling protocol. A number of codecs are supported, including 64 kb/s PCM [17], 32 kb/s and 24 kb/s adaptive differential PCM (ADPCM) [18], 13 kb/s Global System for Mobile Communication (GSM), and 4.8 kb/s linear predictive coding (LPC). Free Phone will attempt to keep the loss rate between user-defined watermarks by using adaptive redundancy and loss concealment techniques. In particular, lost frames can be reconstructed or
Measurement Barriers
In order to evaluate the impact of Internet delay on real-time applications, knowledge of unidirectional delay, delay jitter, and loss rates is necessary. These metrics must be available in order to determine the most effective codecs, transmission redundancy rates, and receiver buffer size.

Accurately determining one-way packet delay from a client host to a server host in the Internet is difficult due to the need for synchronizing the client and server clocks. The clock resolution on the most popular computer architecture (the Intel Pentium family) is poor at 10 ms. This, as well as the variability of Internet delay, limits the accuracy of clock synchronization techniques to a few tens of milliseconds at best [24]. In theory, unidirectional delays can be measured accurately by equipping each endpoint with a global positioning system (GPS) satellite transceiver, but this is an expensive solution that does not easily scale.

Measuring the jitter of one-way Internet delay has been done [25] by comparing the relative difference between client-server and server-client delay. It was shown that the second moments of client-server and server-client delays are usually not symmetrical. This phenomenon is likely due to the asymmetric end-to-end routes common in the Internet [26]. Thus, we know that measuring round-trip delays and simply dividing the results by two does not necessarily give us realistic one-way delays. Given that there is no reasonably accurate technique for measuring unidirectional delays, we must use round-trip delays for our network latency metric. While our measurements cannot be used directly to infer one-way delays, they do provide us with general short-term and long-term trends of Internet packet delay.

■ Figure 9. Measurement paths.
Measurement Architecture
The Internet ping service is commonly used to measure round-trip delay. ICMP echo packets with timestamps are used. We chose not to measure round-trip delays with ping because some routers will drop

[Figure: average delay (ms) versus hop count. Caption not recovered.]
Sample Measurements
Delay and loss characteristics were measured from DePaul University to University of Illinois, Chicago (UIC), University of California, Davis (UCD) to DePaul University, and from UIC to UCD (Fig. 9). These sites were chosen based on availability and geographical dispersion. Each experiment was conducted as follows: once per hour, the client would transmit to a server for three minutes. UDP packet size was 80 bytes, and interdeparture times were 30 ms apart (except for the larger packet size, our experiments conformed with G.723.1). We refer to the data collected from such a run as a trace. The measurements produced 24 traces/path per day, from a collection period starting January 1, 1997, and continuing until mid-June 1997.
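The trace parameters above determine how many probes one run contains, and a trace can be reduced to the mean delay, standard deviation, and loss rate discussed in the results. The sketch below is an illustration only: the function names and the trace representation (one round-trip time per probe, `None` for a probe with no reply) are assumptions, not the authors' measurement tool.

```python
import statistics


def packets_per_trace(duration_s=180, interdeparture_ms=30):
    """Number of probes sent in one run at a fixed interdeparture time."""
    return duration_s * 1000 // interdeparture_ms


def summarize_trace(rtts_ms):
    """Reduce one trace to loss rate, mean delay, and standard deviation.

    rtts_ms holds one entry per probe sent: the round-trip time in ms,
    or None when no reply was received.
    """
    if not rtts_ms:
        raise ValueError("trace is empty")
    received = [r for r in rtts_ms if r is not None]
    loss_rate = 1 - len(received) / len(rtts_ms)
    if not received:
        return {"loss_rate": loss_rate, "mean_ms": None, "std_ms": None}
    return {
        "loss_rate": loss_rate,
        "mean_ms": statistics.mean(received),
        "std_ms": statistics.pstdev(received),
    }
```

With the stated three-minute run and 30 ms spacing, `packets_per_trace()` gives 6000 probes per trace.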
[...] phenomenon. The results, obtained by running the ping program 100 times to each site during one hour and taking the average delay, clearly show the trend. Notice that although sites may be spaced the same hop distance apart, the average delays experienced by packets sent to these sites differ dramatically. For example, delay between Duke University and DePaul University was measured at 50 ms, while delay between University of California, Davis and DePaul was 110 ms, although both destinations are 13 hops away. The two sites that were 413 and 477 miles from the sender exhibited larger delays than some sites several times more distant. This is likely to be due to higher levels of congestion on the paths to the closer sites.

■ Figure 12. Weekday mean delay on DePaul to UIC route, January–June 1997. [Horizontal axis: weekday starting January 1.]
■ Figure 13. Weekday mean delay on UIC-UCD route, January–June 1997. [Axes: mean delay (ms) versus weekday starting January 1.]

■ Figure 14. Weekday mean delay on UCD - DePaul route, January–June 1997. [Axes: mean delay (ms) versus weekday starting January 1.]
■ Figure 15. Round-trip delay, standard deviation of delay, and loss rate from UIC to UCD at 1:00 pm. [Recovered panel titles: delay standard deviation, UIC-to-UCD route; packet loss, UIC-to-UCD route. Horizontal axis: weekday starting January 1.]

■ Figure 16. QoS mapping for unidirectional Internet delay and loss. [Horizontal axis: delay, marked at 100 ms, 150 ms, and 400 ms. Other recovered labels: 5%, toll quality, good, B, C, D, N.]

Daily Mean Delay Samples: It has been established that Internet delay follows a diurnal cycle [23]. During the “working hours” of about 8 a.m. to 6 p.m., delays are usually greater than during “overnight” hours of midnight to 8 a.m. Evening hours of 6 p.m. to midnight exhibit moderate delays. In Fig. 11, note the increase in mean round-trip delay by approximately 20 ms during the middle of the business day. We have also observed a weekly cycle in which delays during a given hour are generally larger during weekdays than on weekends. Thus, directly comparing daytime versus nighttime or weekday versus weekend delays is misleading.

Figures 12–14 show the daily round-trip mean delays for all three paths. The average round-trip packet delays at 9 a.m., 1 p.m., and 9 p.m. are plotted as a function of the number of weekdays since January 1. For example, the mean delay at 4 o’clock on Monday, January 6 would be shown above the fourth position on the x axis since January 6 is the fourth weekday of the year. Mean delays for weekend hours are not shown on the same graphs as those for weekdays for the reasons described above.

In Figs. 13 and 14, the majority of the mean delays range from 70 ms to 160 ms. This is in sharp contrast to the mean delay points plotted in Fig. 12, the majority of which range between 20 ms and 40 ms. Figures 13 and 14 represent routes between California and Illinois, which have more hops and a much greater propagation distance than those between UIC and DePaul within Chicago. We also notice that the delay at 1 p.m. in each graph is often larger than delays at 9 p.m. and 9 a.m. The increased delay is due to congestion during heavy-use hours. Poor quality of service is subjectively described in terms of higher delay means, standard deviations, and loss rates. These figures also show a trend (especially seen in Fig. 13) toward smaller mean delay from January to June.

We gain more insight by further characterizing the Internet by measuring standard deviation and loss rates. A sample of these measurements is seen in Fig. 15, which represents the UIC-to-UCD route at 1 p.m. Note the different scale in the standard deviation measurements from the mean delay. A positive correlation between mean delay, standard deviation, and loss is seen. During hours of higher delay toward the beginning of the year, greater standard deviation and loss rates are evident. Throughout the next few months, periods when delays are higher than their surrounding days are also marked by higher loss rates and standard deviations.

In many of our traces, packet loss is highly correlated. In other words, given that packet n is lost, the probability that packet n + 1 also is lost increases. In [13] packet loss for the month of April is studied in more detail, and we find that packet loss over all three paths is well modeled by a nonlinear Pareto distribution. The latter indicates that the Internet’s packet loss process is highly bursty, with a disproportionate amount of individual packet losses occurring in a relatively small number of bursts.

Source of delay                          GW-GW        GW-GW        PC-PC         PC-PC
                                         G.723.1      G.729A       G.723.1       G.729A
Encode/decode                            82 ms        30 ms        82 ms         30 ms
Access (source and destination)          40–80 ms     40–80 ms     100–340 ms    100–340 ms
Transmission (source and destination)    < 1 ms       < 1 ms       20–40 ms      10–30 ms
Internet (propagation and queuing)       30–100 ms    30–100 ms    30–100 ms     30–100 ms
Totals                                   152–262 ms   100–210 ms   232–562 ms    170–500 ms

NOTE: In this table we do not consider operating system delays which, in the PC-based architectures, may cause significant additional delays as well as occasional packet loss due to kernel buffer overflows.

■ Table 2. Expected unidirectional delays using G.723.1 and G.729A.