Comparative Analysis of Speech Compression Algorithms With Perceptual and LP Based Quality Evaluations
International Journal of Computer Applications (0975 - 8887)
Volume 51-No. 15, August 2012
The summation is computed from k = 1 to k = p. For the LPC coder, p = 10, which means that 10 coefficients are sent for synthesis. Two standard methods are used to compute the coefficients: the autocorrelation method and the covariance method. The autocorrelation method is used here because it places the roots of the denominator polynomial (the poles of the synthesis filter) inside the unit circle, which guarantees a stable system. The Levinson-Durbin recursion is used to compute the predictor coefficients from the autocorrelation values. LPC analysis of each frame also determines whether the frame is voiced or unvoiced. For a voiced segment, an impulse train is used, so an accurate pitch detection algorithm is needed; we used the autocorrelation function to determine the pitch of each segment. For an unvoiced segment, white noise is used instead. Hence either an impulse train or white noise serves as the excitation for the synthesis filter. The mathematical model for speech production is given in figure 4.

[Fig 4: Mathematical model for speech production. Recoverable labels: pitch period, impulse train generator, glottal pulse model, Av (voiced gain), switch, vocal tract parameters, radiation model, E(z), G(z), R(z).]

employs the inverse DCT to recreate the signal that serves as the voice excitation.

4.1 Bits Allocation
A speech segment in the VELP coder contains 88 bits, as an additional 49 bits are needed for the DCT. With a sampling rate of 8000 samples/second, the speech is divided into segments of 180 samples, which gives approximately 44.44 frames/second and a bit rate of approximately 4 kbps. The bit allocation is summarized in Table 2.

Table 2: Bits allocation for VELP coder

Parameters              Bits
DCT                       40
K1 and K2                 10
K3 and K4                 10
K5, K6, K7 and K8         16
K9                         3
K10                        2
Gain                       5
Synchronization            1
Total                     88

Fig 5: VELP vocoder block diagram (recoverable labels: excitation detector, LPC synthesizer, S'(t))
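The analysis steps described above (autocorrelation, the Levinson-Durbin recursion for the p = 10 predictor, autocorrelation-based pitch detection with a voiced/unvoiced decision) and the frame-rate arithmetic of Section 4.1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the predictor sign convention, the Hamming window, the 50 to 400 Hz pitch search range and the 0.3 voicing threshold are all assumptions, and every function name is hypothetical. The reflection coefficients returned by `levinson_durbin` are presumably what K1 to K10 in Table 2 quantize, since those bit counts match the LPC-10 reflection-coefficient allocation, but the paper does not state this.

```python
import math
import random

SAMPLE_RATE = 8000      # samples/second (Section 4.1)
FRAME_LEN = 180         # samples per segment (Section 4.1)
LPC_ORDER = 10          # p = 10 predictor coefficients
BITS_PER_FRAME = 88     # "Total" row of Table 2


def autocorrelation(x, max_lag):
    """Biased autocorrelation r[0..max_lag] of one finite frame."""
    n = len(x)
    return [sum(x[i] * x[i + lag] for i in range(n - lag)) for lag in range(max_lag + 1)]


def levinson_durbin(r, p):
    """Levinson-Durbin recursion for the autocorrelation normal equations.

    Assumed convention: s_hat[n] = sum_{k=1}^{p} a[k] * s[n - k].
    Returns (a[1..p], reflection coefficients k[1..p], final prediction error).
    """
    a = [0.0] * (p + 1)                      # a[0] is unused
    refl = []
    err = r[0]
    for i in range(1, p + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        refl.append(k)
        err *= 1.0 - k * k                   # stays positive while |k| < 1
    return a[1:], refl, err


def hamming(n):
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def lpc_analysis(frame, p=LPC_ORDER):
    """Hamming window (assumed), autocorrelation, then Levinson-Durbin."""
    windowed = [s * w for s, w in zip(frame, hamming(len(frame)))]
    return levinson_durbin(autocorrelation(windowed, p), p)


def detect_pitch(frame, fs=SAMPLE_RATE, fmin=50.0, fmax=400.0, threshold=0.3):
    """Autocorrelation pitch estimate plus a simple voiced/unvoiced decision.

    Returns (voiced, lag_in_samples); fmin, fmax and threshold are illustrative.
    """
    min_lag = int(fs / fmax)
    max_lag = min(int(fs / fmin), len(frame) - 1)
    r = autocorrelation(frame, max_lag)
    if r[0] <= 0.0:
        return False, 0                      # silent frame: treat as unvoiced
    best = max(range(min_lag, max_lag + 1), key=lambda lag: r[lag])
    return r[best] / r[0] > threshold, best


def bit_rate(bits_per_frame=BITS_PER_FRAME, fs=SAMPLE_RATE, frame_len=FRAME_LEN):
    """Bits per frame times frames per second (8000 / 180 is about 44.44)."""
    return bits_per_frame * fs / frame_len


if __name__ == "__main__":
    tone = [math.sin(2 * math.pi * n / 40) for n in range(FRAME_LEN)]  # 200 Hz
    print(detect_pitch(tone))                # (True, 40)
    rng = random.Random(0)
    noisy = [s + 0.1 * rng.uniform(-1, 1) for s in tone]
    a, k, err = lpc_analysis(noisy)
    print(all(abs(x) < 1 for x in k))        # True: stable synthesis filter
    print(round(bit_rate(), 2))              # 3911.11
```

The 88 bits × 44.44 frames/second product gives about 3.9 kbps, which is the "approximately 4 kbps" quoted in Section 4.1. The naive autocorrelation loops keep the sketch dependency-free; a practical coder would use an FFT-based or vectorized implementation.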
It can be clearly observed from figures 6 and 7 that the speech waveforms reconstructed by MELP are closer to the original than those reconstructed by VELP.

6. PERCEPTUAL MEASUREMENT

6.1 PESQ
The aim of the PESQ [6],[7],[8] algorithm is to measure speech quality. The quality evaluation is carried out by comparing the original speech with the speech degraded by compression. For this comparison, PESQ relies on perceptual and cognitive models, and its score is associated with the MOS [16]. The PESQ score is calculated as a linear combination of the average disturbance $D_{\mathrm{ave}}$ and the average asymmetrical disturbance $A_{\mathrm{ave}}$:

$$\mathrm{PESQ} = a_0 + a_1 D_{\mathrm{ave}} + a_2 A_{\mathrm{ave}} \tag{6.1}$$

The PESQ results for MELP are better than those for VELP, as shown in Chart 1.

Chart 1: PESQ computation for coders

estimation is based on the divergence between the power spectra of the original and the reconstructed speech. If the distance is high for a given speech signal, the algorithm is performing poorly; if the distance is low, the algorithm performs well for that speech. For MELP the IS distance is low, while for VELP it is higher, indicating poorer performance. Chart 3 shows the IS computations for both coders.

Chart 3: IS distance for speech coders
[Chart data: Male Speaker and Female Speaker groups; bar value labels 13.2305, 13.5837, 1.1221 and 1.0444; y-axis 0 to 15]
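Equation (6.1) and the IS distance described above can be illustrated with the following Python sketch. It is not a PESQ implementation: computing $D_{\mathrm{ave}}$ and $A_{\mathrm{ave}}$ requires the full ITU-T P.862 perceptual and cognitive model, so the function simply takes them as inputs. The coefficients a0 = 4.5, a1 = -0.1 and a2 = -0.0309 are the values published in ITU-T P.862 and are assumed here to be the ones intended by Eq. (6.1). The IS distance is computed from DFT power spectra of single frames; the paper does not say whether it uses DFT or LPC spectra, so that choice, the FFT size and the small `eps` guard are assumptions, and all function names are hypothetical.

```python
import cmath
import math

# Published ITU-T P.862 coefficients, assumed to be the a0, a1, a2 of Eq. (6.1).
A0, A1, A2 = 4.5, -0.1, -0.0309


def pesq_from_disturbances(d_ave, a_ave):
    """Eq. (6.1): PESQ = a0 + a1 * D_ave + a2 * A_ave (disturbances computed elsewhere)."""
    return A0 + A1 * d_ave + A2 * a_ave


def power_spectrum(frame, n_fft=256):
    """|DFT|^2 of a zero-padded frame, bins 0..n_fft/2 (naive DFT, fine for a sketch)."""
    x = list(frame)[:n_fft] + [0.0] * max(0, n_fft - len(frame))
    return [
        abs(sum(x[n] * cmath.exp(-2j * math.pi * k * n / n_fft) for n in range(n_fft))) ** 2
        for k in range(n_fft // 2 + 1)
    ]


def itakura_saito(p_orig, p_rec, eps=1e-12):
    """Mean Itakura-Saito divergence between two power spectra (0 when identical)."""
    total = 0.0
    for p, q in zip(p_orig, p_rec):
        ratio = (p + eps) / (q + eps)        # eps guards against empty bins
        total += ratio - math.log(ratio) - 1.0
    return total / len(p_orig)


if __name__ == "__main__":
    orig = [math.sin(2 * math.pi * n / 40) for n in range(180)]
    rec = [0.8 * s for s in orig]            # hypothetical "reconstructed" frame
    print(round(pesq_from_disturbances(10.0, 10.0), 3))              # 3.191
    print(itakura_saito(power_spectrum(orig), power_spectrum(orig)))  # 0.0
    print(itakura_saito(power_spectrum(orig), power_spectrum(rec)) > 0)  # True
```

Each IS term has the form r - ln r - 1, which is never negative and is zero only when the two spectra agree at that bin, so a lower distance means the reconstructed spectrum tracks the original more closely, matching the interpretation given for Chart 3.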