CVSD - A Tutorial
1. Introduction
Virtually all means of wireless and wireline speech communication are, or are becoming, digital. Digital coding
of speech for transmission or storage clearly has advantages over traditional analog methods. In digital data
communication or storage systems, information is transmitted or recorded as a series of binary digits; the
receiver or player needs only to distinguish between a one and a zero to recover the original information exactly. In
digital voice systems the information is the human voice. Digital speech coding algorithms are judged by their
ability to quantize (digitize) speech accurately for transmission and then perform the reverse at the decoder.
In other words, the original analog speech signal must be accurately recovered at the receiver. This sounds
simple; however, without compression, quantized speech can require significantly more bandwidth than analog
speech. For wireline or wireless telecom applications, the speech band extends from 300 Hz to 3300 Hz. If an
analog signal in this band is quantized using a linear analog-to-digital converter (ADC) sampling just above
the Nyquist rate, say at 8 kHz, with 256 quantization levels (8 bits), the resulting data rate is 64 kbit/s.
Without exotic modulation or coding, the bandwidth (BW) of the digitally encoded signal is nearly twenty times
that of the original analog signal. Such BW consumption is not practical, particularly for wireless applications.
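The 64 kbit/s figure is simply the sampling rate multiplied by the number of bits per sample. A minimal Python sketch of that arithmetic (the helper name `pcm_bit_rate` is hypothetical, introduced only for illustration):

```python
def pcm_bit_rate(sample_rate_hz: float, bits_per_sample: int) -> float:
    """Bit rate of uncompressed linear PCM: samples per second times bits per sample."""
    return sample_rate_hz * bits_per_sample

# Telephone-band speech sampled at 8 kHz with 256 levels (2**8, i.e. 8 bits per sample)
rate = pcm_bit_rate(8000, 8)
print(f"{rate / 1000:.0f} kbit/s")  # prints "64 kbit/s"
```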
Engineers working in the field of speech coding have been actively searching for methods to reduce the
bandwidth consumed by quantized speech signals. Early algorithms attempted to take advantage of the
human ear’s adaptive dynamic range. The human ear has a built-in ability to become more or less sensitive to
audible signals: the ear can hear sound pressure levels (SPL) from as low as 0 dB SPL (threshold of human
hearing) up to 120 dB SPL (onset of pain), yet at any one time the dynamic range of human hearing is generally
considered to be about 40 dB. In other words, we have a difficult time hearing someone whispering at a rock
concert. Some speech coding algorithms exploit this phenomenon by using a greater number of progressively
smaller quantization levels for low-amplitude signals and fewer, coarser quantization levels for large-amplitude
signals. This is known as non-uniform quantization. Non-uniform quantization is used in the Public
Switched Telephone Network (PSTN), where it is called µ-law Pulse Code Modulation (PCM). A slightly more
complex approach takes advantage of the strong correlation between adjacent speech samples, quantizing the
amplitude difference (delta) between two samples as opposed to the entire sample amplitude. This difference
signal requires fewer quantization levels for the same signal quality and consequently reduces the required
bandwidth. Algorithms employing this technique are classified under the broad category of differential
quantization or differential PCM (DPCM). Further bandwidth conservation is possible through more complex
algorithms. For example, combining adaptive quantization with DPCM results in the commonly used coding
algorithm adaptive DPCM (ADPCM).
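The claim that the difference signal needs fewer quantization levels than the samples themselves can be checked numerically. The sketch below assumes a pure sine tone as a stand-in for strongly correlated speech (real speech is not a single tone) and compares the peak sample-to-sample difference with the peak sample value; `peak_ratio_of_differences` is a hypothetical helper.

```python
import math

def peak_ratio_of_differences(tone_hz: float, fs_hz: float, num_samples: int = 1000) -> float:
    """Peak |x(n) - x(n-1)| divided by peak |x(n)| for a unit-amplitude sampled sine."""
    x = [math.sin(2 * math.pi * tone_hz * n / fs_hz) for n in range(num_samples)]
    peak_sample = max(abs(v) for v in x)
    peak_delta = max(abs(x[n] - x[n - 1]) for n in range(1, num_samples))
    return peak_delta / peak_sample

# Tones across the telephone band, sampled at 8 kHz
for tone in (200, 800, 3000):
    r = peak_ratio_of_differences(tone, 8000)
    print(f"{tone:5d} Hz: delta peak / sample peak = {r:.3f}")
```

For tones well below the sampling rate the ratio is roughly 2 sin(πf/fs), well under one, so fewer levels cover the difference signal; near fs/2 the ratio exceeds one and differencing no longer helps. The advantage therefore depends on the signal being strongly correlated from sample to sample.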
Delta modulation (DM) and Continuously Variable Slope Delta modulation (CVSD) are differential
waveform quantization techniques. Both employ two-level (one-bit) quantizers. CVSD is basically DM with an
adaptive quantizer. Applying adaptive techniques to a DM quantizer allows for continuous step size
adjustment. By adjusting the quantization step size, the coder is able to represent low-amplitude signals with
greater accuracy (where it is needed) without sacrificing performance on large-amplitude signals.
CVSD is used in tactical communications where “communication quality”¹ is required yet the option for
security must be available. MIL-STD-188-113 (16 kb/s and 32 kb/s) and Federal Standard 1023 (12 kb/s
CVSD) are examples of tactical communication standards that use CVSD. With the tremendous worldwide
growth in wireless technology, secure communication is becoming important to everyone. In addition to point-to-point
communication, CVSD is commonly used in digital voice recording/messaging and audio delay lines.
This paper attempts to describe CVSD quantization, focusing on its application to coding of speech.
Before discussing the details of CVSD, the basics of uniform and non-uniform quantization (non-adaptive) will
be reviewed. Next, the subject of differential quantization will be explained, showing that DM and CVSD are
equivalent to one bit DPCM and ADPCM, respectively. Finally, some application suggestions for MX•COM
CVSD codecs will be presented.
¹ Communication quality is a qualitative expression widely considered synonymous with “acceptable speech
communication.” It is not intended to imply “high-fidelity,” only that intelligible conversation can take place.
©1998 MX-COM Inc. www.mxcom.com Tel: 800 638-5577 336 744-5050 Fax: 336 744-5054 Doc. # 20830070.002
4800 Bethania Station Road, Winston-Salem, NC 27105-1201 USA All trademarks and service marks are held by their respective companies.
CVSD: A Tutorial 2 Application Note
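$$F_S(\omega) = \frac{1}{2\pi}\left[F(\omega) * S(\omega)\right] \qquad (1)$$
or,
$$F_S(\omega) = \frac{1}{2\pi}\int_{-\infty}^{\infty} S(\lambda)\,F(\omega - \lambda)\,d\lambda \qquad (2)$$
$$f_S \geq 2B. \qquad (3)$$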
Figure 1: Fourier transform of (a) continuous time signal and (b) sampled signal (frequency axis f; in (b) the spectrum repeats at 0, ±FS and ±2FS).
Shannon's sampling theorem is mathematically exact; however, in most cases it is not practical to
sample a signal at exactly twice its highest frequency. Band-limiting is necessary to avoid aliasing. For fS = 2B,
the band-limiting filter must have a so-called “brick wall” roll-off at frequency B, and a filter that meets this
requirement is physically unrealizable. Several factors determine the actual sampling frequency used.
Generally there is a compromise between the complexity of the band-limiting filter and the cost of the
analog-to-digital converter (ADC). As fS approaches 2B, the band-limiting filter requires more stages to
give the desired roll-off. As fS increases, the shorter conversion time available requires a faster ADC, and ADC
cost generally rises as conversion time decreases.
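The folding behind aliasing (the spectral replicas of Figure 1) can be sketched in a few lines. Assumptions: a single real tone and ideal impulse sampling; `alias_frequency` is a hypothetical helper.

```python
def alias_frequency(tone_hz: float, fs_hz: float) -> float:
    """Apparent frequency (0 to fs/2) of a real tone after sampling at fs_hz.

    Sampling replicates the spectrum at multiples of fs, so a tone at f is
    indistinguishable from one at |f - k*fs| for any integer k.
    """
    folded = tone_hz % fs_hz
    return min(folded, fs_hz - folded)

fs = 8000.0
for f in (1000.0, 3300.0, 5000.0, 9000.0):
    print(f"{f:6.0f} Hz sampled at {fs:.0f} Hz appears at {alias_frequency(f, fs):6.0f} Hz")
```

At fS = 8 kHz, a 5 kHz component that escapes the band-limiting filter would appear at 3 kHz, inside the 300 Hz to 3300 Hz speech band, where it can no longer be separated from the speech.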
[Figure 2: uniform quantizer transfer characteristic, q(n) versus s(t).]
The noise introduced by PCM is primarily due to rounding each sample to the nearest quantization level. If a signal is
quantized to eight levels (3 bits) using a quantizer transfer characteristic similar to the one shown in Figure 2,
the finest resolution is the full-scale magnitude divided by eight. In general, for a full-scale magnitude A and n bits,
$$\Delta = \frac{A}{2^n}. \qquad (4)$$
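Assuming the quantization error ε is uniformly distributed between −Δ/2 and +Δ/2, its mean-square value is
$$E(\varepsilon^2) = \frac{1}{\Delta}\int_{-\Delta/2}^{\Delta/2} \varepsilon^2\, d\varepsilon, \qquad (5)$$
so the output noise power is
$$N_{OUT}^2 = \frac{\Delta^2}{12}. \qquad (6)$$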
Taking the square root of both sides gives the root mean square (RMS) noise voltage
$$N_{OUT} = \frac{\Delta}{2\sqrt{3}}. \qquad (7)$$
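Equations (4) through (7) can be checked by simulation. The sketch assumes a mid-rise uniform quantizer and an input spread uniformly over the full range, which is the condition under which the quantization error is uniform over ±Δ/2; the function names are hypothetical.

```python
import math
import random

def uniform_quantize(x: float, n_bits: int, full_scale: float = 2.0) -> float:
    """Mid-rise uniform quantizer spanning [-full_scale/2, +full_scale/2)."""
    delta = full_scale / 2 ** n_bits                          # equation (4): A / 2^n
    k = math.floor(x / delta)                                 # index of the cell containing x
    k = max(-2 ** (n_bits - 1), min(2 ** (n_bits - 1) - 1, k))  # clip to the valid cells
    return (k + 0.5) * delta                                  # output the cell midpoint

def rms_noise(n_bits: int, num_samples: int = 100_000, seed: int = 1):
    """Return (measured RMS quantization error, equation (7) prediction)."""
    rng = random.Random(seed)
    delta = 2.0 / 2 ** n_bits
    total = 0.0
    for _ in range(num_samples):
        x = rng.uniform(-1.0, 1.0)
        e = uniform_quantize(x, n_bits) - x
        total += e * e
    return math.sqrt(total / num_samples), delta / (2 * math.sqrt(3))

for bits in (3, 8):
    measured, predicted = rms_noise(bits)
    print(f"{bits} bits: measured {measured:.6f}, equation (7) {predicted:.6f}")
```

The measured RMS error should agree closely with Δ/(2√3) for both word lengths.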
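The maximum signal-to-noise ratio (SNR) for a uniformly quantized signal can be calculated by finding the
ratio of the full-scale quantization level to the noise voltage in equation (7). The full-scale quantization level is
simply the total number of quantization levels multiplied by the minimum quantization increment; for a bipolar
signal the peak amplitude is half of this,
$$S_{OUTMAX} = \frac{2^n \Delta}{2}. \qquad (8)$$
Dividing equation (8) by equation (7) gives
$$SNR = \frac{S_{OUTMAX}}{N_{OUT}} = \sqrt{3}\,2^n. \qquad (9)$$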
In dB,
$$SNR_{dB} = 20\log_{10}\left(\sqrt{3}\,2^n\right) \approx 4.77 + 6.02n. \qquad (10)$$
Equation (10) is an objective measure of quality in systems employing uniform quantization. Note that
the SNR calculated in equation (10) is a theoretical maximum; in practice other factors (e.g. power
supply noise) tend to reduce the final SNR. Also, the human voice is considered to have about 40 dB of
dynamic range; however, during most conversations it is typically about 20 dB below maximum. In other
words, we generally do not shout during normal conversation. Consequently, the average signal-to-noise ratio
for uniformly quantized speech is about 20 dB less than what would be calculated using equation (10).
Another objective measure of quality, which can be derived in a similar manner, is dynamic range (DR). Dynamic
range pertains to the resolution of a quantization scheme. It is the ratio of the full-scale (peak) amplitude to the
smallest quantized amplitude change,
$$DR = \frac{\frac{1}{2}\,2^n \Delta}{\Delta} = 2^{n-1}. \qquad (11)$$
In dB,
$$DR_{dB} = 20\log_{10}\left(2^{n-1}\right) \approx 6.02(n-1). \qquad (12)$$
[Figure 3: quantizer transfer characteristic, q(n) versus s(t).]
A complete digital communication system incorporating non-uniform speech coding compresses the
quantized signal at the transmitter and then expands the signal back to linear form at the receiver. The
compressed signal requires fewer bits and therefore consumes less BW. Ideally, the expander transfer
function is the exact inverse of the compressor and is able to reproduce the original uncoded analog signal. In
literature pertaining to telecommunications, the words compressor and expander are generally combined into
a single word: compandor.
Companding is one of the most popular forms of non-uniform quantization. Compared with uniform
quantization, it allows bandwidth compression without degradation in dynamic range, at the expense of
peak signal-to-noise ratio. For example, a signal uniformly quantized to 14 bits would, using equations (10) and (12),
have a peak SNR and DR of approximately 89 dB and 78 dB, respectively. If the same signal is non-uniformly
quantized (compressed) using only 8 bits (256 levels), the minimum quantization level can be set
such that the 78 dB DR is retained, although the peak SNR will be degraded. Degradation in the peak
SNR is due to the coarse quantization levels used for large-amplitude signals (where fine quantization is
not necessary). Low-amplitude signals are quantized at finer resolution (more steps, where they are necessary).
Consequently, the SNR at low signal levels is improved compared with uniform quantization using the
same number of levels. Essentially, companding strives to make SNR constant over the dynamic range of the
quantizer.
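Equations (10) and (12) reduce to simple functions of n. The sketch below (hypothetical helper names) reproduces the 14-bit figures quoted above.

```python
import math

def peak_snr_db(n_bits: int) -> float:
    """Equation (10): 20*log10(sqrt(3) * 2^n), i.e. about 4.77 + 6.02 n dB."""
    return 20 * math.log10(math.sqrt(3) * 2 ** n_bits)

def dynamic_range_db(n_bits: int) -> float:
    """Equation (12): 20*log10(2^(n-1)), i.e. about 6.02 (n - 1) dB."""
    return 20 * math.log10(2 ** (n_bits - 1))

for n in (8, 14):
    print(f"{n:2d} bits: peak SNR {peak_snr_db(n):5.1f} dB, DR {dynamic_range_db(n):5.1f} dB")
```

For 14 bits it gives about 89.1 dB and 78.3 dB, matching the approximate 89 dB and 78 dB above.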
$$y(x) = \operatorname{sign}(x)\,\frac{\ln(1 + \mu|x|)}{\ln(1 + \mu)}. \qquad (13)$$
Where, −1 ≤ x ≤ 1
x = input signal
y(x) = compressed output signal
sign(x) = polarity of input signal
µ = compression parameter.
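Equation (13) and its inverse can be written directly. This is a sketch of the continuous µ-law curve only; the 8-bit PSTN codec (ITU-T G.711) uses a segmented, piecewise-linear approximation of it, which is not modeled here. The function names are hypothetical.

```python
import math

def mu_law_compress(x: float, mu: float = 255.0) -> float:
    """Equation (13): y = sign(x) * ln(1 + mu*|x|) / ln(1 + mu), for -1 <= x <= 1."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)

def mu_law_expand(y: float, mu: float = 255.0) -> float:
    """Inverse of equation (13): x = sign(y) * ((1 + mu)^|y| - 1) / mu."""
    return math.copysign(math.expm1(abs(y) * math.log1p(mu)) / mu, y)

for x in (0.001, 0.01, 0.1, 1.0):
    y = mu_law_compress(x)
    print(f"x = {x:6.3f} -> y = {y:.3f} -> expanded {mu_law_expand(y):.6f}")
```

Small inputs receive a disproportionately large share of the output range (x = 0.01 maps to roughly 0.23), which is the finer low-level resolution described above.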
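$$SNR_{dB} = 4.77 + 6.02n + 20\log_{10}\left(\frac{S_i}{S_{MAX}}\right). \qquad (14)$$
$$SNR_{dB} = 10\log_{10}\left[\frac{3\left(2^{2n}\right)}{\left[\ln(1+\mu)\right]^2\left(1 + \frac{1.732}{\mu S_i} + \frac{1}{\mu^2 S_i^2}\right)}\right]. \qquad (15)$$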
It should be noted that as µ → 0, equation (15) becomes equivalent to equation (14) (with Si expressed relative to SMAX); thus µ = 0 implies uniform
quantization. Figure 4 compares SNR versus relative input signal power for 256-level non-uniform
quantization (µ = 255) and 256-level uniform quantization.
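Equations (14) and (15) can be evaluated to reproduce the trend of Figure 4. The sketch assumes Si is expressed relative to SMAX = 1 and keeps the 1.732 constant of equation (15); it evaluates the formulas only and does not simulate a quantizer. The function names are hypothetical.

```python
import math

def snr_uniform_db(n_bits: int, rel_level_db: float) -> float:
    """Equation (14): 4.77 + 6.02 n + 20 log10(Si / SMAX), level given in dB re SMAX."""
    return 10 * math.log10(3) + 20 * n_bits * math.log10(2) + rel_level_db

def snr_mu_law_db(n_bits: int, mu: float, rel_level_db: float) -> float:
    """Equation (15) with Si expressed relative to SMAX = 1."""
    s = 10 ** (rel_level_db / 20)                       # relative input level Si
    denom = math.log1p(mu) ** 2 * (1 + 1.732 / (mu * s) + 1 / (mu * s) ** 2)
    return 10 * math.log10(3 * 2 ** (2 * n_bits) / denom)

for level in range(0, -61, -10):
    print(f"{level:4d} dB: uniform {snr_uniform_db(8, level):5.1f} dB, "
          f"mu-law {snr_mu_law_db(8, 255, level):5.1f} dB")
```

At full scale the uniform quantizer wins (about 53 dB versus about 38 dB), but its SNR falls 1 dB for every dB of input reduction while the µ-law SNR stays comparatively flat, so µ-law is well ahead at low levels such as −40 dB.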
Figure 4: SNR versus input level for 8 bit µ-law and 8 bit uniform quantization (curves: 8 bit linear PCM and 8 bit mu-law PCM; SNR (dB) from 0 to 60 versus INPUT (dB) from −60 to 0).
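[Figure 5: block diagram of a differential quantization system: ENCODER containing a LOCAL DECODER (blocks Q, Q⁻¹ and predictor P(z); signals dQ(n), xP(n), xQ(n)) and DECODER.]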
Figure 5 is a block diagram of a differential quantization system. All signals are represented in
discrete-time notation, so x(n) is the discrete-time version of x(t). Notice that a copy of the decoder (the local
decoder) sits in the feedback path of the encoder, and the decoder at the receiver performs the inverse of the encoder. The block labeled
Q converts the difference signal d(n) into a binary representation c(n) suitable for transmission, and the block
labeled Q⁻¹ does the inverse. In reality, the process of converting d(n) to c(n) and back to dQD(n) is a significant
factor in the non-ideal behavior of differential quantization. Nevertheless, fundamental analysis of the system
is simplified by assuming blocks Q and Q⁻¹ cancel. Under this assumption, the transfer function of the
encoder in z-domain terms is
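$$H_{ENC}(z) = \frac{C(z)}{X(z)} = Q(z)\left[1 - P(z)\right] \qquad (16)$$
and the transfer function of the decoder is
$$H_{DEC}(z) = \frac{X_{QD}(z)}{C(z)} = \frac{1}{Q(z)\left[1 - P(z)\right]}. \qquad (17)$$
If DQD(z) ≈ D(z), then XQD(z) ≈ X(z) and the entire system transfer function can be written as
$$H(z) = H_{ENC}(z)\,H_{DEC}(z) = \frac{X_{QD}(z)}{X(z)} \approx 1. \qquad (18)$$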
Although equation (18) is based on several assumptions, it shows that a differential quantization system
patterned after the topology in Figure 5 can produce an output signal that
approximates the original input signal. The quality of the approximation is what distinguishes various
differential quantization schemes. Generally, a high-quality approximation does not come without a price.
A crucial element in differential quantization is the predictor P(z). The output xP(n) of P(z) is a weighted
sum of past samples of its input xQ(n). The general form is equivalent to a finite impulse response (FIR) filter
$$x_P(n) = \sum_{k=1}^{P} a_k\, x_Q(n-k), \qquad (19)$$
$$P(z) = \frac{X_P(z)}{X_Q(z)} = \sum_{k=1}^{P} a_k z^{-k}. \qquad (20)$$
Equations (19) and (20) show that the predictor output is a linear combination of past inputs, giving rise to
the term “linear prediction.” Non-linear predictors (non-linear combinations of past input samples) have been
studied; however, due to complexity and stability issues, their popularity is limited.
The coefficients ak are calculated such that P(z) provides a reasonably accurate model of the behavior
of human speech. In 1966 McDonald [6] published a paper suggesting coefficients, based on the
normalized autocorrelation of human speech samples, for predictors of order one to ten. Later, in 1972, Noll
[4] published similar data. These two papers are generally referenced in the determination of predictor
coefficients. Assuming that P(z) does provide a reasonably accurate model of human speech, xP(n) ≈ x(n) and
d(n) ≈ 0. In other words, a good predictor should minimize the difference signal d(n). This is the basis for
differential quantization. In most DM algorithms the predictor order P is set to one.
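The loop of Figure 5 with the FIR predictor of equation (19) can be sketched in a few lines. Assumptions: Q is a hypothetical uniform quantizer that rounds d(n) to the nearest multiple of `step` with no range limit (a real coder would clip to 2^n levels), and the test signal is an 800 Hz tone sampled at 8 kHz. An empty coefficient list reduces the loop to plain PCM for comparison; all names are hypothetical.

```python
import math

def dpcm_encode(x, coeffs, step):
    """Differential quantizer in the style of Figure 5; coeffs[k-1] holds a_k.

    Returns the integer codes c(n) and the local decoder output x_Q(n).
    """
    history = [0.0] * len(coeffs)      # x_Q(n-1), x_Q(n-2), ... (most recent first)
    codes, recon = [], []
    for sample in x:
        x_p = sum(a * past for a, past in zip(coeffs, history))  # equation (19)
        code = round((sample - x_p) / step)                       # Q applied to d(n)
        x_q = x_p + code * step                                   # local decoder
        history = ([x_q] + history)[:len(coeffs)]
        codes.append(code)
        recon.append(x_q)
    return codes, recon

def dpcm_decode(codes, coeffs, step):
    """Receiver: the same predictor loop driven by the received codes."""
    history = [0.0] * len(coeffs)
    out = []
    for code in codes:
        x_p = sum(a * past for a, past in zip(coeffs, history))
        x_q = x_p + code * step
        history = ([x_q] + history)[:len(coeffs)]
        out.append(x_q)
    return out

tone = [math.sin(2 * math.pi * 800 * n / 8000) for n in range(400)]
step = 0.01
for coeffs in ([], [0.9], [1.0]):
    codes, recon = dpcm_encode(tone, coeffs, step)
    print(f"a_k = {coeffs}: largest |code| = {max(abs(c) for c in codes)}")
```

Because the encoder runs its own copy of the decoder, the receiver reproduces xQ(n) exactly from the codes, and each reconstructed sample stays within step/2 of the input. With a predictor the largest code magnitude drops, so fewer levels (bits) are needed for the same step size.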
The potential for instability in differential quantizers exists in the encoder. As mentioned earlier, equation
(20) is the transfer function of an FIR filter. One of the significant characteristics of FIR filters is that they are, by definition,
stable (the transfer function has only zeros). However, when an FIR filter is placed in a feedback path, as
is the case with the encoder, the zeros of 1 − P(z) become poles; if one of these poles falls outside the unit
circle, the differential quantizer will be unstable.
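One way to check whether a given predictor leaves the loop stable is to test whether all zeros of 1 − P(z) lie strictly inside the unit circle. The sketch below uses the step-down (Schur-Cohn) recursion in pure Python, a standard technique swapped in here for illustration rather than anything the text prescribes; the function name is hypothetical.

```python
def predictor_loop_is_stable(coeffs):
    """True if every root of 1 - sum_k a_k z^-k lies strictly inside the unit circle.

    coeffs[k-1] holds a_k from equation (19). The step-down recursion extracts
    reflection coefficients; the polynomial has all roots inside the unit
    circle exactly when every reflection coefficient has magnitude below one.
    """
    c = [1.0] + [-a for a in coeffs]       # 1 + c1 z^-1 + ... + cP z^-P
    while len(c) > 1:
        k = c[-1]                           # reflection coefficient of this order
        if abs(k) >= 1.0:
            return False
        order = len(c) - 1
        c = [(c[i] - k * c[order - i]) / (1.0 - k * k) for i in range(order)]
    return True

print(predictor_loop_is_stable([0.9]))          # True: single tap, pole at z = 0.9
print(predictor_loop_is_stable([1.0]))          # False: pure integrator, pole on the unit circle
print(predictor_loop_is_stable([1.8, -0.81]))   # True: double pole at z = 0.9
print(predictor_loop_is_stable([2.0, -0.99]))   # False: poles at z = 0.9 and z = 1.1
```

The pure integrator (a = 1) is reported as not strictly stable because its pole sits exactly on the unit circle, a marginal case.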
[Figure 6: delta modulation ENCODER and DECODER; the single-tap predictor loop forms an INTEGRATOR.]
The predictor is shown as a single-tap FIR filter with transfer function P(z) = az⁻¹. In the encoder, the
transfer function between dQ(n) and xP(n) can be expressed in z-domain terms as
$$\frac{X_P(z)}{D_Q(z)} = \frac{a z^{-1}}{1 - a z^{-1}} \qquad (21)$$
and in the decoder as
$$\frac{X_{QD}(z)}{D_{QD}(z)} = \frac{1}{1 - a z^{-1}}. \qquad (22)$$
Equations (21) and (22) represent discrete-time integrators (if a = 1). If a < 1, they become damped or
lossy integrators. McDonald [6] and Noll [4] both suggest a = 1 for optimum prediction gain. However, a pure integrator
(a = 1) causes bit errors to propagate longer than if a < 1, so in practice a value of a < 1 is preferred.
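The bit-error argument can be seen by driving the decoder of equation (22) with an idle 0101... pattern and flipping a single bit. The flipped bit changes dQD(n) by 2Δ, and because the impulse response of 1/(1 − az⁻¹) is aⁿ, that offset decays as aⁿ, or never when a = 1. The helper names and the idle test pattern are assumptions for illustration.

```python
def integrator_decode(bits, delta, a):
    """Decoder of equation (22): x_QD(n) = a * x_QD(n-1) + d_QD(n), d_QD(n) = +/- delta."""
    out, y = [], 0.0
    for b in bits:
        y = a * y + (delta if b else -delta)
        out.append(y)
    return out

def bit_error_residue(a, delta=1.0, error_at=10, look_ahead=100):
    """Decoder output offset `look_ahead` samples after one flipped bit."""
    bits = [n % 2 for n in range(error_at + look_ahead + 1)]  # idle 0101... pattern
    corrupted = list(bits)
    corrupted[error_at] ^= 1                                    # single channel bit error
    clean = integrator_decode(bits, delta, a)
    bad = integrator_decode(corrupted, delta, a)
    return abs(bad[-1] - clean[-1])

for a in (1.0, 0.99, 0.9):
    print(f"a = {a}: offset 100 samples after the error = {bit_error_residue(a):.4f}")
```

With a = 1 the 2Δ offset is still fully present 100 samples later; with a = 0.9 it has essentially vanished.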
The quantizer Q in Figure 6 functions like a comparator. When the input d(n) exceeds zero it outputs a
logic one; when the input is less than zero the output is a logic zero. Hence, the output of Q is a single bit
indicating the sign of d(n). The inverse quantizer Q⁻¹ converts the logic levels to the delta dQ(n) as
shown in the table below.
c(n)    dQ(n)
1       +Δ
0       −Δ
The value of Δ plays an important role in the performance of linear delta modulation (LDM). If Δ is relatively small, tracking of
slowly changing, low-amplitude signals is quite good, at the expense of poor tracking of fast, abruptly
changing signals. When DM is not able to keep up with the input signal, a phenomenon called slope overload
occurs, as shown in Figure 7.
[Figure 7: slope overload; the reconstruction xQ(n) cannot keep up with the input x(t).]
Increasing the value of Δ can lessen the effects of slope overload but creates a new problem: granular
noise. With Δ too large, low-amplitude signals are not quantized at fine enough levels and they appear as
idle channel noise (see Figure 8). The idle channel pattern is simply an alternating one-zero sequence
indicating that the input signal amplitude is not changing. Since an alternating one-zero bit pattern has a mean
value of zero, the signal out of the decoder will integrate to zero.
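A minimal LDM encoder and decoder following Figure 6, assuming P(z) = az⁻¹ and breaking the d(n) = 0 tie toward logic zero (the text does not specify that case). It reproduces both the idle-channel pattern and slope overload; the function names are hypothetical.

```python
def ldm_encode(x, delta, a=1.0):
    """Linear delta modulation encoder with single-tap predictor P(z) = a z^-1.

    Returns the bit stream c(n) and the local decoder output x_Q(n).
    Q outputs 1 when d(n) > 0 and 0 otherwise (d(n) == 0 is sent as 0 here).
    """
    bits, recon = [], []
    x_q = 0.0
    for sample in x:
        x_p = a * x_q                            # prediction x_P(n)
        bit = 1 if sample - x_p > 0 else 0       # comparator Q acting on d(n)
        x_q = x_p + (delta if bit else -delta)   # Q^-1 gives +delta or -delta
        bits.append(bit)
        recon.append(x_q)
    return bits, recon

def ldm_decode(bits, delta, a=1.0):
    """Receiver: the same integrator, driven by +/- delta."""
    out, x_qd = [], 0.0
    for bit in bits:
        x_qd = a * x_qd + (delta if bit else -delta)
        out.append(x_qd)
    return out

# Idle channel: a constant (zero) input produces the alternating one-zero pattern
idle_bits, _ = ldm_encode([0.0] * 8, delta=0.1)
print(idle_bits)  # prints [0, 1, 0, 1, 0, 1, 0, 1]

# Slope overload: a ramp rising 0.5 per sample outruns a 0.1 step
ramp = [0.5 * n for n in range(10)]
_, recon = ldm_encode(ramp, delta=0.1)
print(f"final error = {ramp[-1] - recon[-1]:.1f}")
```

The ramp rises five steps per sample while the coder can climb only one, so the reconstruction falls further behind every sample, which is the slope overload of Figure 7.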
[Figure 8: granular noise; xQ(n) versus x(t).]
[Figure: CVSD encoder and decoder block diagrams (level converter L, principal integrator I1, syllabic integrator I2, two z⁻¹ delays with run-of-three logic, step limits ∆MIN and ∆MAX, Q⁻¹).]
The block labeled Q⁻¹, previously shown in Figure 6, has been replaced by two cascaded one-sample
delays (z⁻¹) and logic to determine when three consecutive ones or zeros have occurred. The minimum and
maximum step heights are set by ∆MIN and ∆MAX, respectively. When strings of three consecutive zeros or ones
have not occurred for long enough for the output of the integrator I2 to decay to near zero, the
algorithm is equivalent to LDM (section 3.1). The time constants for integrators I2 and I1 are
typically 4 ms and 1 ms, respectively. In CVSD literature, integrator I1 is referred to as the principal integrator
and integrator I2 as the syllabic integrator. The so-called syllabic integrator derives its name from the length of a
syllable. A syllable actually lasts about 100 ms; however, pitch changes are on the order of 10 ms.
Consequently, 4 ms seems to work best for the CVSD syllabic (pitch) time constant. The block labeled L
performs simple level conversion (i.e. when c(n) = 1, L outputs 1; when c(n) = 0, L outputs −1). This CVSD algorithm is also
known as “Digitally Controlled DM” [10].
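A minimal sketch of the CVSD algorithm described above. Assumptions not taken from the text: I2 is modeled as a one-pole leaky average of the three-in-a-row detector, the step size is mapped linearly from I2's output onto the range ∆MIN to ∆MAX, and the 32 kHz bit rate, ∆MIN = 0.001 and ∆MAX = 0.1 are arbitrary example values. Only the 4 ms and 1 ms time constants come from the text. This is not the MX•COM circuit, and the class and function names are hypothetical.

```python
import math

class CvsdState:
    """Shared encoder/decoder state (the decoder is the encoder's feedback path)."""

    def __init__(self, fs=32000, delta_min=0.001, delta_max=0.1,
                 tau_syllabic=4e-3, tau_principal=1e-3):
        self.beta_s = math.exp(-1.0 / (fs * tau_syllabic))   # I2 leak per bit period
        self.beta_p = math.exp(-1.0 / (fs * tau_principal))  # I1 leak per bit period
        self.delta_min, self.delta_max = delta_min, delta_max
        self.prev = []        # contents of the two z^-1 delays (oldest first)
        self.syllabic = 0.0   # syllabic integrator I2 output, stays in [0, 1]
        self.x_p = 0.0        # principal integrator I1 output

    def update(self, bit):
        """Advance one bit period; return the step size that was applied."""
        run_of_three = len(self.prev) == 2 and self.prev[0] == self.prev[1] == bit
        # I2: leaky average of the three-in-a-row detector
        self.syllabic = (self.beta_s * self.syllabic
                         + (1.0 - self.beta_s) * (1.0 if run_of_three else 0.0))
        step = self.delta_min + (self.delta_max - self.delta_min) * self.syllabic
        # L maps 1 -> +1 and 0 -> -1; I1 integrates the scaled step with a leak
        self.x_p = self.beta_p * self.x_p + (step if bit else -step)
        self.prev = (self.prev + [bit])[-2:]
        return step

def cvsd_encode(x, **params):
    state = CvsdState(**params)
    bits, recon, steps = [], [], []
    for sample in x:
        bit = 1 if sample > state.x_p else 0   # comparator against x_P(n)
        steps.append(state.update(bit))
        bits.append(bit)
        recon.append(state.x_p)
    return bits, recon, steps

def cvsd_decode(bits, **params):
    state = CvsdState(**params)
    out = []
    for bit in bits:
        state.update(bit)
        out.append(state.x_p)                  # decoder output x_PD(n)
    return out

fs = 32000
tone = [0.5 * math.sin(2 * math.pi * 800 * n / fs) for n in range(800)]
bits, recon, steps = cvsd_encode(tone, fs=fs)
print(f"step size ranged from {min(steps):.4f} to {max(steps):.4f}")
```

Because the decoder is the same state machine driven by the received bits, `cvsd_decode` reproduces the encoder's xP(n) exactly in the absence of bit errors. For an all-zero input the bits alternate and the step stays at ∆MIN, so the coder behaves like LDM; a large 800 Hz tone produces long runs of identical bits and the step size grows toward ∆MAX.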
Figure 11 displays the output of the integrator I1 (xP(n) in the encoder and xPD(n) in the decoder). Compared
with Figure 7 and Figure 8, slope overload and granular noise are reduced.
[Figure 11: integrator I1 output xP(n) versus input x(t).]
Figure 12: Measured SNR versus input level for 32 kb/s CVSD (input = 820 Hz sine wave); SNR (dB) from 0 to 30 versus INPUT (dBm0) from −40 to +10.
SNR is the most widely used method for objectively quantifying the performance of speech coding algorithms. However,
it does not always correspond with perceived quality, particularly for differential and adaptive algorithms using
actual voice as the input. In addition, it is difficult to make reliable SNR measurements in the presence of
random bit errors. In an effort to quantify the perceived quality of a speech coding algorithm, Mean Opinion
Score (MOS) testing was developed [4]. Table 2 summarizes the five-point scale used to judge quality and
impairment.
An MOS rating of 4 to 4.5 is considered Toll Quality (equivalent to commercial telephony), whereas
Communication Quality corresponds to MOS ratings of 3 to 4 (barely perceptible distortion, but no degradation in
intelligibility).
Figure 13 compares MOS ratings for µ-law PCM (the standard for Toll Quality), CVSD and ADPCM.
Notice that CVSD performs as well as or better than both µ-law PCM and ADPCM in the presence of bit errors.
Specifically, CVSD retains quite good MOS ratings at bit error rates exceeding 1%, and at 10% has an MOS
rating of 3 (Communication Quality). It is this robustness to bit errors (channel noise) that makes CVSD an
ideal solution for many wireless speech communication applications.
Figure 13: MOS versus bit error rate for ADPCM, µ-law PCM, and CVSD from [4] (curves: 48 kb/s CVSD, 64 kb/s u-law PCM, 32 kb/s ADPCM; bit error rate from 1e-05 to 0.1).
4. Summary
CVSD has several attributes that make it well suited for digital coding of speech. One-bit words eliminate the
need for complex framing schemes. Robust performance in the presence of bit errors can make error detection
and correction hardware unnecessary. Other speech coding schemes may require a digital signal processing
engine and external analog-to-digital/digital-to-analog converters to convert the analog signal into a form that
can be processed digitally, whereas the entire CVSD codec algorithm, including input and output filters, can be
integrated on a single silicon substrate. Despite this simplicity, CVSD has enough flexibility to allow digital
encryption for secure applications. Finally, CVSD can operate over a wide range of data rates; it has been
successfully used from 9.6 kb/s to 64 kb/s. At 9.6 kb/s audio quality is not particularly good; however, it is
intelligible. At data rates of 24 kb/s to 48 kb/s it is judged as quite acceptable, and above 48 kb/s it is
comparable to toll quality. All of these attributes make CVSD attractive for wireless telecommunication systems
(e.g. digital cordless telephones, digital Land Mobile Radio). The defense industry has been using CVSD for
decades in wireline and wireless systems as specified in MIL-STD-188-113. More recently, Federal Standard
1023 proposed CVSD for 25 kHz channel radios operating above 30 MHz. Figure 14 is a block diagram
showing a CVSD codec in a digital mobile radio system.
This tutorial has attempted to shed some light on the fundamental aspects of CVSD. It has shown that CVSD
is a differential adaptive quantization algorithm with one-bit coding and first-order prediction (one-bit ADPCM).
In addition, objective and subjective methods of measuring signal quality have been presented, showing that CVSD
performs quite well in the presence of bit errors (noisy channels).
5. References
[1] J. C. Bellamy, Digital Telephony, Wiley and Sons, New York, 1982.
[2] J. A. Greefkes and K. Riemens, “Code Modulation with Digitally Controlled Companding for Speech
Transmission,” Philips Tech. Rev., pp. 335-353, 1970.
[3] A. Gersho, "Principles of Quantization," IEEE Transactions on Circuits and Systems, pp. 427-436, July
1978.
[4] N. S. Jayant and P. Noll, Digital Coding of Waveforms: Principles and Applications to Speech and Video,
Prentice-Hall, Englewood Cliffs, N. J., 1984.
[5] A. B. Jerri, "The Shannon Sampling Theorem—Its Various Extensions and applications: A Tutorial
Review," Proceedings of the IEEE, pp. 1565-1596, November 1977.
[6] R. A. McDonald, "Signal-to-Noise and Idle Channel Performance of Differential Pulse Code Modulation
Systems-Particular Applications to Voice Signals," Bell System Technical Journal, pp. 1123-1155,
Sept. 1966.
[7] P. Noll, "A Comparative Study of Various Quantization Schemes for Speech Encoding," Bell System
Technical Journal, pp. 1597-1614, November 1975.
[8] L. R. Rabiner and R. W. Schafer, Digital Processing of Speech Signals, Prentice-Hall, Englewood Cliffs, N.
J., 1978.
[9] M. Schwartz, Information, Transmission, Modulation, and Noise, McGraw Hill, New York, 1980.
[10] R. Steele, Delta Modulation Systems, Pentech Press, London, England, 1975.