Asterisk Curso Completo
IEEE COMMUNICATIONS SURVEYS
The Electronic Magazine of
Original Peer-Reviewed Survey Articles
www.comsoc.org/pubs/surveys
ABSTRACT
This article provides a tutorial overview of voice over the Internet, examining the
effects of moving voice traffic over the packet switched Internet and comparing this
with the effects of moving voice over the more traditional circuit-switched telephone
system. The emphasis of this document is on areas of concern to a backbone service
provider implementing Voice over IP (VoIP). We begin by providing overviews of
the Plain Old Telephone Service (POTS) and VoIP. We then discuss techniques
service providers can use to help preserve service quality on their VoIP networks.
Next, we briefly discuss Voice over ATM (VoATM) as an alternative to VoIP.
Finally, we offer some conclusions.
This article provides a tutorial overview of one of the Internet traffic types likely to see significant future growth, Voice over the Internet (a.k.a. Voice over IP or VoIP). The effects of moving voice traffic over a packet-switched statistically multiplexed network such as the Internet, which was never really designed to handle this type of source, are examined and compared with the effects of moving voice over the more traditional circuit-switched time division multiplexed telephone system, more affectionately known as POTS. The emphasis of this article is on areas of concern to a backbone service provider implementing VoIP.

This article is organized as follows. We first provide an overview of POTS. We then examine a non-traditional, packet-based, voice transport, VoIP. We next survey techniques service providers can use to help preserve quality on their VoIP networks, while the following section considers the effects that design choices and failings in the network have on this quality. We discuss the trade-offs involved in terms of maximizing the number of calls supported while simultaneously preserving end-to-end delivery bounds and meeting minimum quality standards. Comments regarding another alternative voice transport system, ATM, are offered. Our conclusions are then summarized.

POTS

POTS today consists of a mix of the very old and the very new. Figure 1 presents a simplified view of POTS connectivity. End-user telephones typically use twisted pair copper cabling to deliver analog voice signals to the central office (CO). The vast majority of inter-office communications are now carried over fiber optic trunks.

Today, most long-haul voice traffic is digitized. At the CO switch, the analog signals are passed through a narrow-bandwidth bandpass filter, sampled at a rate of 8,000 samples/second and quantized (rounded off) to one of 256 unequally spaced voltage values. A technique known as Pulse Code Modulation (PCM) is then used to assign an 8-bit code word to each voltage, resulting in 64 kb requiring transport every second. This 8-bit code word is the basic transmission entity of POTS. POTS backbones use Time Division Multiplexing (TDM) and circuit switching to efficiently transmit this entity. Circuit switching is used to dedicate sufficient bandwidth to support this bit rate for the duration of the phone call. An end-to-end path is set up, and resources are dedicated, prior to the initiation of the actual voice conversation. As PCM outputs 8 bits every 1/8000th of a second in a predictable, deter-
• FIGURE 3. Sources of VoIP delay. [Labeled elements: voice decoder, receiver buffer, packet switch, StatMux trunks.]

while listening to the other party talking, as well as pauses between sentences, and even pauses between some words. A G.729B coder with silence suppression will output 8 kb/s during talk spurts (40 percent of the time), and nothing or a reduced bit rate during intervening silence intervals (60 percent of the time). The use of silence suppression potentially allows a G.729 coder to reduce its average output from 8 kb/s to 3.2 kb/s (alternating 8 kb/s and 0 kb/s bursts).

PACKET ASSEMBLER

One decision faced by every operator of a VoIP telephony voice switch is how many frames from the voice coder to include in each transmitted packet. A typical VoIP packet requires about 47 bytes of overhead: 8 bytes for the User Datagram Protocol (UDP), 12 bytes for Real Time Transport Protocol (RTP), 20 bytes for the Internet Protocol (IP), and 7 bytes for the Point-to-Point Protocol. Header compression of the UDP and IP headers can reduce the amount of overhead on low-speed links, but is not a standardized option on high-speed carrier backbones. It would not be cost effective to take the one byte frame output of a G.711 coder, packetize it, and immediately transport this 48 byte packet, as 98 percent (47 out of 48) of the bytes transmitted would be overhead. In this case, it is better to place the output of multiple frames into one packet. The disadvantage of this is that time is lost at the transmitter site waiting for the desired number of voice frames to be generated.

The choice of the number of frames in a packet can seriously impact the number of phone calls a VoIP network can support. Too few frames/packet results in a considerable percentage of bandwidth being allocated to the movement of overhead bytes. Too many frames in a packet results in an excessive amount of time being lost assembling a packet at the transmitter, reducing the time remaining to meet the end-to-end delivery time bound. In the latter case, trunk loads may have to be decreased in order to maintain switch

An appropriately sized receiver buffer is critical to the proper operation of VoIP systems. By sizing, we are referring to two parameters: the fill delay mentioned immediately above, and the buffer storage capacity. If the fill delay is chosen too small and a group of packets arrives later than expected, it is possible that the buffer will empty, which will result in a loss of voice output. If the fill delay is chosen too large, then excessive time is lost in the receiver buffer waiting to be played out, which reduces the time allotted to other devices in order that end-to-end delivery goals be met. If the buffer storage capacity is too small then a burst of packets received may result in inadequate storage space being available and dropped packets.

With the use of time lines, and assuming no packets are dropped by intermediate packet switches, it is possible to show that to be 100 percent certain that the receiver buffer does not empty, packet playback should commence no earlier than WCDeliv seconds after assembly, where WCDeliv is equal to the worst case packet end-to-end delivery time. A more complicated statistical analysis would be necessary to determine tolerable receiver buffer delay if packets are dropped by the network, or if a possibility of the receiver buffer emptying is acceptable. If VoIP traffic is routed over an Internet backbone augmented with voice gateways handling connection admission control to limit the number of voice calls, and these gateways are knowledgeable of the end-to-end voice paths through the network and worst-case delays at transited routers, the above technique is, at least in theory, possible.

However, if the traffic is routed over the commodity Internet, where the exact path through the network and the worst case delay through routers is not known, determination of WCDeliv may be difficult, if not impossible, to obtain. In this case the packet playback might be delayed a value large enough such that it is (hopefully) rarely exceeded by the actual packet end-to-end transit time. An alternative option, and one that tends to be favored today, would be to adaptively adjust the de-jitter buffer delay to account for the time varying changes of the voice packet delivery delays.

To summarize, in comparing Fig. 2 with Fig. 3, it should be noted that a VoIP system is more complex than POTS. VoIP has more sources of delay, and the delay through these ele-

[Table column headings: Codec | G.711 | G.723.1 | G.726-32 | G.729]
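The packet assembler trade-off described earlier on this page can be made concrete with a small calculation. The following is a minimal sketch, not the authors' method: the 47-byte overhead and the coder figures come from the text, while the function names and the `activity` parameter are illustrative assumptions. The sketch also assumes that no packets at all are sent during silence, which is the "0 kb/s" case the text mentions.

```python
# Sketch only: header sizes and coder figures are those quoted in the text;
# the function names and the "activity" parameter are illustrative assumptions.

OVERHEAD_BYTES = 7 + 20 + 8 + 12  # PPP + IP + UDP + RTP = 47 bytes


def overhead_fraction(frames_per_packet, frame_bytes):
    """Fraction of each transmitted packet that is header overhead."""
    payload = frames_per_packet * frame_bytes
    return OVERHEAD_BYTES / (OVERHEAD_BYTES + payload)


def assembly_wait_msec(frames_per_packet, frame_msec):
    """Time spent collecting frames for one packet.

    Ignores coder look-ahead and processing time.
    """
    return frames_per_packet * frame_msec


def trunk_rate_bps(frames_per_packet, frame_bytes, frame_msec, activity=1.0):
    """Average bit rate one call places on a trunk, headers included.

    activity is the fraction of time the coder is producing frames
    (1.0 without silence suppression, 0.4 for the talk-spurt figure above).
    Assumes nothing is transmitted during silence.
    """
    packet_bits = 8 * (OVERHEAD_BYTES + frames_per_packet * frame_bytes)
    packets_per_sec = 1000.0 / (frames_per_packet * frame_msec)
    return activity * packet_bits * packets_per_sec


if __name__ == "__main__":
    # G.729: one 10-byte frame every 10 msec.
    for n in (1, 2, 5, 10):
        print(
            n,
            round(overhead_fraction(n, 10), 3),
            assembly_wait_msec(n, 10),
            round(trunk_rate_bps(n, 10, 10)),
            round(trunk_rate_bps(n, 10, 10, activity=0.4)),
        )
```

For the one-byte G.711 frame, `overhead_fraction(1, 1)` evaluates to 47/48, the 98 percent figure quoted in the text. Raising the frame count lowers the overhead fraction and the per-call trunk rate, but lengthens the assembly wait in direct proportion.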
Oklahoma City, then back to Stillwater, traversing a total of 10 routers.

Most of the variability in the inter-arrival time of VoIP packets at the destination is due to varying queue times at the intermediate packet switches. A large amount of resulting jitter will require a large receiver de-jitter buffer delay to smooth things out. Keeping the number of intervening packet switches low can make a significant difference here. This may be difficult to accomplish on the commodity Internet.

Queuing delays in packet switches can be controlled by limiting the load on interconnecting trunk links, by prioritizing the voice traffic, or by some combination of the two. If the trunks are kept lightly loaded, best-effort first in,

• FIGURE 5. Heavy trunk loading. Preferential treatment reduces the delay seen by high-priority traffic and increases delays seen by lower-priority traffic. [Curves: low priority delay, high priority delay, overall delay.]
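The effect summarized in the Figure 5 caption can be illustrated with a textbook queueing model. The sketch below is an assumption-laden illustration, not the analysis behind the figure: it assumes a single trunk modeled as an M/M/1 queue (Poisson arrivals, exponentially distributed service times with the same mean for both classes) and non-preemptive priority for voice. The function and parameter names are hypothetical.

```python
# Sketch only: an M/M/1 queue with two non-preemptive priority classes.
# The traffic model is an assumption made for illustration; it is not
# taken from the article.


def mm1_queue_waits(lam_high, lam_low, mu):
    """Return mean queueing waits (fifo, high_priority, low_priority).

    lam_high, lam_low: packet arrival rates of the two classes (packets/sec)
    mu: trunk service rate (packets/sec), identical for both classes
    """
    rho_high = lam_high / mu
    rho_low = lam_low / mu
    rho = rho_high + rho_low
    if rho >= 1.0:
        raise ValueError("total load must be below 1 for a stable queue")

    # Mean residual service time seen by an arriving packet.
    w0 = rho / mu

    fifo = w0 / (1.0 - rho)                       # no prioritization
    high = w0 / (1.0 - rho_high)                  # prioritized traffic
    low = w0 / ((1.0 - rho_high) * (1.0 - rho))   # everything else
    return fifo, high, low


if __name__ == "__main__":
    for load in (0.3, 0.6, 0.9):
        # One third of the load is treated as high priority.
        fifo, high, low = mm1_queue_waits(load * 100 / 3, load * 200 / 3, 100.0)
        print(load, fifo, high, low)
```

Under these assumptions the high-priority wait stays small even as the total load approaches one, while the low-priority wait grows faster than the first in, first out wait. The load-weighted average of the two class waits equals the first in, first out wait, so prioritization redistributes delay rather than removing it.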
Another choice faced by the VoIP system designer is whether or not to segregate the time-sensitive voice traffic from the data traffic, where the latter refers to man-computer or computer-computer communications. Potentially, carriers will gain the most economy by integrating all types of traffic over a common core. However, given the current state of the commercial Internet, which by and large remains a best-effort network, and traffic increases that strain a carrier's ability to deploy bandwidth fast enough to remain ahead of the growth curve, integrating time-sensitive voice traffic with other traffic on Internet backbones is not necessarily a good idea if the carrier's goal is to offer a VoIP service with quality approaching that of POTS.

Given the state of the art, some carriers have chosen to segregate their VoIP traffic and move it via a dedicated VoIP network separate from the network moving data.

MAINTAINING THE NETWORK: TRAFFIC ENGINEERING

One problem with the Internet today is that, by and large, the system still transmits traffic as datagrams wherein switching devices forward each packet independently of all other packets. As a result there is no guarantee that traffic that is part of the same information transfer will follow the same path. As routers typically update their routing tables several times an hour it is possible that the end-to-end path taken by the voice traffic may shift, possibly to a path with more hops, a larger delay, or more delay variation. Should this happen, the carefully engineered VoIP system could be thrown into disarray and fail to meet specifications.

One advantage of Multi-Protocol Label Switching (MPLS) is that it enables virtual circuits, which allow the path through the system to be designated in advance and fixed in place [9]. Designating the path in advance allows the option of setting aside and reserving switch resources, such as buffer space and bandwidth, for specific traffic flows [10]. Having predictable paths through the system makes traffic engineering simpler and more feasible, increasing the probability that the system can be configured to perform reliably.

Installing voice gateways at access points to the VoIP backbone can allow the implementation of connection admission control (CAC), such that the number of allowed voice calls can be monitored and calls blocked if network resources are insufficient to support a new request.

DEALING WITH DEGRADATION: THE IMPACT OF PACKET LOSS, CODER DISTORTION, AND END-TO-END DELAY ON VOICE QUALITY

The previous section discussed techniques that carriers can use to help ensure that a backbone carrying VoIP traffic is able to provide reliable and timely delivery of voice packets. However, despite careful engineering, end-to-end delivery delay is going to be greater than that on POTS, and packets will be lost. In this section, we discuss how these factors, coupled with distortion in the voice coding algorithms, can affect the perceived quality of the reconstructed voice at the receiver. We also note one important tool that can be used to estimate the overall voice quality.

Coder distortion is frequently assessed via the mean opinion score (MOS), which is a measure that specifies the "perceptual similarity between an uncoded source signal and its coded version" [16]. This score is subjective in nature, and is assigned by a panel of trained listeners awarding values ranging from 1 to 5, with a "5" being no degradation noted and a "4" being perceptible degradation noted, but not annoying. An MOS score of "1" is awarded if the degradation is very annoying. Of the coders listed in Table 1, the POTS mainstay, G.711, has the highest MOS score of 4.3, G.723.1 has the lowest with a 3.9, and the other two lie in between [1]. While the caliber of these coders is very similar, a VoIP system using G.729 does begin its journey from the source coder with a slight quality disadvantage compared to POTS.

PACKET LOSS

Packet loss can severely affect playback quality. One packet can contain one or more speech frames depending on the implementation. While the actual impact of a lost packet will depend on the amount of compression delivered by the voice coder and the number of voice frames per packet, a typical requirement is that the loss rate be one percent or less [15, 17].

Error masking can be used to help overcome the deterioration of quality of the decoded speech due to packet loss. Information in the last correct frame can be used by the receiver decoder to mask missing information. Unfortunately, packet losses encountered in digital transmission over the commodity Internet generally appear in bursts. This tends to reduce the effectiveness of voice decoder error concealment, as the decoder may need to mask the loss of several consecutive packets, each carrying multiple frames [18]. Good CAC procedures at gateways to a carrier VoIP backbone can help reduce the probability of lost packets.

END-TO-END DELIVERY DELAY

Tests have shown that the perceived quality of an interactive voice conversation depends heavily on the time that elapses between voice energy hitting a microphone and playing out on the destination speaker. As this time increases, the quality steadily degrades in that users become more likely to accidentally talk over each other. This time need not be very large before problems commence. The lead author has personally experienced this difficulty during satellite-based phone calls between Korea and the United States back in the early 1980s. The one-way delivery time of approximately 0.3 to 0.4 seconds, coupled with brain processing time at the far end while a reply was formulated, and then the return trip of the audio, was perceived as an unnaturally long time interval that required some practice to adjust to. There are other effects as well. For example, a pause due to a lengthy round-trip time might be mistaken as hesitation on the part of the party with whom you are talking. International Telecommunication Union (ITU) standard G.114 recommends 150 msec as the maximum allowable end-to-end delivery value for VoIP systems, and that will be considered the worst case target value for this article.

ECHO CONTROL [19]

The POTS local loop typically uses two wires which carry both inbound and outbound analog voice traffic, whereas the long distance telephone network uses what are called four wire systems with the inbound audio (ideally) separated from the out-
ITU standard G.107, known as the E-Model, attempts to predict how a typical user would rate the overall quality of a phone call. Based on a large set of subjective experiments, the model can generate a numerical score that accounts for factors such as coder distortion, packet loss, end-to-end delay, and echo, as well as other parameters such as noise. This resulting score is known as the transmission rating factor R,

To maximize resulting revenues, the system engineer will want to maximize the number of calls the network will support, while meeting or exceeding the quality level indicated above. To better understand the impact certain design choices have on the ability of a VoIP system to support calls, it is constructive to look in some detail at a specific example. Fig. 7 shows the network used in this example, where phones are connect-
to a G.729 voice coder that outputs one frame of 10 octets every 10 msec (8 kb/s). To help prevent audible glitches from frame-to-frame, G.729 includes 5 msec of look-ahead information which overlaps the next frame. Hence 15 msec are required to acquire the necessary voice information to code a frame. If silence suppression is being used and the voice source is silent, 0 kb/s is assumed output with any necessary comfort noise generated at the receiver.
• At the gateway, N G.729 frames are acquired for placement in a packet for transmission. This yields a 47 + N*10-byte packet for transmission (7 bytes Point-to-Point Protocol, 20 bytes IP, 8 bytes UDP, 12 bytes RTP, and 10*N bytes of traffic).
• It requires 10*N msec + 5 msec to acquire sufficient voice to generate N G.729 frames for emplacement in a packet. The voice coder will then have up to 10 msec to complete compression before the next following frame is collected and ready for processing. Considering this, we use a voice coding and packet assembly delay of 15 msec + 10*N msec below.
• The PBXs are connected to a gateway with a T-1 line. Not shown are other PBXs or CO switches connected to the gateways.
• The gateways are connected to the VoIP backbone by OC-3s. Packet switching and statistical multiplexing are used on these connections. Not shown are other gateways that are also attached to the backbone routers.
• The VoIP routers are connected by OC-12s. These connections are also packet switched and statistically multiplexed.
• Propagation delays between the PBX and the gateways are inconsequential and are ignored. The propagation delays between the routers are assumed to be 10 msec.
• The network is engineered such that packet discards due to queue overflows or bit errors are negligible and can be ignored. Hence CAC procedures at gateways are in effect, and fiber optic systems are widely deployed.

$$0.015 + 0.010\,N + \mathrm{WCDeliv} + 0.010 \le \mathrm{Target},$$

or

$$0.010\,N + \mathrm{WCDeliv} \le \mathrm{Target} - 0.025 \qquad (3)$$

To maximize profits, a VoIP carrier desires to move as many voice calls as feasible over their VoIP backbone. Equation 3 illustrates two of the key trade-offs that can affect this: the number of frames in a packet and the tolerable delays at intervening switches. As one increases, the other must decrease. As the number of frames in a packet decreases, more time can be allocated to queuing delays, and trunk loads can be increased in order to carry more traffic, but a larger percentage of that traffic is packet overhead, not more phone calls. Conversely, as the number of frames in a packet increases, less time is available for packets to spend in intervening switches. Trunk loads must be decreased, reducing the number of phone calls that can be conveyed. Using a technique outlined in [21], Figs. 8a and 8b show plots of the number of voice calls supportable over a backbone OC-12 trunk for both the fixed-rate (Fig. 8a) and variable-rate (Fig. 8b) G.729 coders, for 100 msec and 150 msec target end-to-end delay bounds, using the network of Fig. 7. A target delivery time of 50 msec is seen to be impractical for this example network, even with one frame per packet, as from Equation 3 it is seen that WCDeliv must be less than 15 msec, an infeasible value when the propagation delay alone is 20 msec. A similar situation occurs at 11 frames per packet with a target delivery time of 150 msec, and at six frames per packet and a target delivery time of 100 msec.

Figure 8 indicates that multiple frames per packet are necessary to maximize the number of voice calls the system can support, and that given some network configuration the choice of the number of voice frames to carry in a packet can significantly impact the number of calls the system is able to support. As a comparison, note that an OC-12 can carry 8,192 POTS phone calls. The combination of compression and silence suppression allows the VoIP system to potentially service a significantly larger number of customers.

• FIGURE 8. a) Fixed rate G.729 VoIP calls supportable over OC-12 trunks for 100 and 150 msec delivery delays; b) Variable rate G.729 VoIP calls supportable over OC-12 trunks for 100 and 150 msec delivery delays. [Horizontal axis of both panels: number of frames per packet. Legend: 100 msec, 150 msec.]

The choice of receiver de-jitter buffer delay also has a considerable impact on the number of calls the network can support. Equation 3 is based on the conservative choice of delaying the initial packet of a call or talk spurt such that it plays back WCDeliv seconds after construction at the transmitter. This choice ensures the buffer will not empty, but it delays playback by an amount of time that is only necessary in the worst case, an event that may be unlikely. If the de-jitter buffer delay is decreased, time can be freed up and transferred to another entity of Equation 1, such as the packet assembly delay (allowing a reduction in the percentage of overhead) or delays spent at packet switches (allowing an increased trunk load). The interested reader is referred to [22] and [23] for further information on these issues.

What kind of quality does the E-Model predict for the network of Fig. 7? We use the following values in this example [20]:
• A base Ro value of 94, corresponding to the value associated with a typical PSTN voice connection.
• A delay impairment value Id of 2, 4, and 6 for echo delays of 50, 100, and 150 msec, respectively. This corresponds to a system using echo cancelers sufficient to generate a TELR of 60 dB in Fig. 6. Perfect echo cancellation requires a TELR of 65 dB.
• An equipment impairment value Ie of 10 for a G.729 fixed-rate coder, and 11 for a variable-rate coder. These values reflect the voice quality degradation of these reduced bit rate coders as compared to the standard G.711 coder.
• A and Is are both set equal to zero, indicating the user expects similar quality on a VoIP phone as on a regular phone, and that there are no speech impairments other than those associated with the voice coder, as reflected in Ie.

Substituting these into Eq. 2 yields R values ranging from 78 to 82 for the fixed-rate coder, and 77 to 81 for the variable-rate coder (with the lower values associated with the higher end-to-end delivery delays). Note that these R values would be reduced further were packet losses of any significance.

If the minimum acceptable R rating is a 70, these all pass muster in terms of quality. But the values calculated do correspond to a perceived voice quality on the borderline between where most users are satisfied and where some of the more critical users will be unsatisfied.

backbones can be used. An analysis of this type of network would differ from the network of Fig. 7 in several significant ways. First, the mobile phone voice coder would take longer than a G.711 coder to generate voice bits for movement over the mobile phone radio frequency (RF) link. This would reduce the time available for the VoIP backbone to move the traffic to the destination. If this destination were another mobile phone, the voice decoder would also require more decoding time than POTS' G.711. Second, the probability of voice frames being lost over the mobile phone's RF link will generally be higher than that of a wired link, adversely affecting the R rating. Third, were trans-coding necessary to convert the mobile phone-compressed voice to a different protocol being used on the VoIP backbone, some additional voice degradation would result compared to the example above, further impacting the R rating. In short, if a mobile telephone is the source or destination of a call being carried by a VoIP backbone, additional engineering attention must be focused on maintaining quality. A reduction in the allowed end-to-end delay (to reduce echo delay impairments) may be necessary to offset quality degradation caused by frame losses over the RF link and trans-coding degradation. A reduction in the number of supportable calls over the backbone would result.

Another type of system likely to be connected to a carrier VoIP backbone has one (or both) of the POTS PBX systems of Fig. 7 replaced with LAN-based VoIP systems. Traffic from the LAN VoIP phone will be mixed in with data traffic on the corporate LAN prior to being segregated and routed to the VoIP network gateway. It will be much more difficult for a carrier to offer and engineer high QoS when a significant part of the end-to-end network is on a LAN, especially if in the LAN the VoIP traffic is not prioritized. This situation will have some of the same characteristics as that of mobile phone voice sources and sinks, specifically higher delays at the LAN voice coders and decoders, possible trans-coding degradation at the gateways, and higher packet losses on the LAN. Engineering steps to offset these would be similar to those mentioned in the paragraph above, but would also include an increase in the receiver de-jitter buffer delay to offset the jitter inherent with LAN traffic. Again, a reduced number of supportable calls over the backbone would likely be required in order to meet QoS goals.