Module2 SSP
ECE3028
BY
DR. K. GOWRI
ASSISTANT PROFESSOR,
DEPARTMENT OF ECE,
PRESIDENCY UNIVERSITY,
BANGALORE
MODULE 2: Discrete-Time Speech Signals
Introduction, time-dependent processing of
speech, short-time energy and average magnitude,
short-time average zero-crossing rate, speech vs.
silence discrimination using energy and zero
crossings, pitch period estimation using a parallel
processing approach.
INTRODUCTION
• Objective: to show how digital signal processing methods can be applied to speech
signals to estimate the properties and parameters of models of speech production.
• The first step usually is to obtain a convenient and useful parametric
representation of the information carried by the speech signal.
• This can be achieved by assuming that the speech signal, s[n], is the output of
a parametric synthesis model such as the one shown in Figure 1.
Fig. 1 Model for speech production and synthesis.
• The information carried by the speech signal includes:
• The (time-varying) pitch period Np (in samples), or equivalently the pitch
frequency Fp = Fs/Np, where Fs is the speech sampling frequency, for regions of
voiced speech, possibly including the locations of the pitch excitation impulses
that define the periods between adjacent pitch pulses.
• The glottal pulse model, g[n]
• The time-varying amplitude of voiced excitation, AV
• The time-varying amplitude of unvoiced excitation, AN
• The time-varying excitation type for the speech signal; i.e., quasi-periodic pitch
pulses for voiced sounds or pseudo-random noise for unvoiced sounds
• The time-varying vocal tract model impulse response, v[n], or equivalently, a
set of vocal tract parameters that control a vocal tract model.
• The radiation model impulse response, r[n].
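The parameter list above can be made concrete with a minimal synthesis sketch, assuming the usual cascade of excitation, glottal pulse g[n], vocal tract v[n] and radiation r[n]. The component models below (a Hann-shaped glottal pulse, one damped 500 Hz resonance for v[n], a first difference for r[n]) and the function names are hypothetical stand-ins chosen only for illustration, not the models used in these notes.

```python
import numpy as np

def synthesize_voiced(Np, Av, g, v, r, num_samples):
    """Voiced branch: impulse train (period Np) -> g[n] -> v[n] -> r[n], scaled by Av."""
    e = np.zeros(num_samples)
    e[::Np] = 1.0                                   # pitch pulses every Np samples
    s = Av * np.convolve(np.convolve(np.convolve(e, g), v), r)
    return s[:num_samples]                          # keep the first num_samples outputs

def synthesize_unvoiced(An, v, r, num_samples, seed=0):
    """Unvoiced branch: pseudo-random noise -> v[n] -> r[n], scaled by An."""
    rng = np.random.default_rng(seed)
    e = rng.standard_normal(num_samples)
    s = An * np.convolve(np.convolve(e, v), r)
    return s[:num_samples]

# Hypothetical component models, for illustration only
Fs = 10_000                                         # sampling frequency (Hz)
Np = 100                                            # pitch period in samples
g = np.hanning(40)                                  # stand-in glottal pulse g[n]
k = np.arange(200)
v = 0.97 ** k * np.cos(2 * np.pi * 500 * k / Fs)    # one damped resonance near 500 Hz
r = np.array([1.0, -1.0])                           # radiation as a first difference

s_voiced = synthesize_voiced(Np, 1.0, g, v, r, 2000)
s_unvoiced = synthesize_unvoiced(0.1, v, r, 2000)
print("pitch frequency Fp = Fs/Np =", Fs / Np, "Hz")
```

Once the start-up transient has passed, the voiced output repeats every Np samples, which is exactly the periodicity that pitch estimation later tries to recover.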
Different “representations” of the speech signal are
depicted in Figure 2.
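• $Q_{\hat{n}} = \sum_{m=-\infty}^{\infty} T(x[m])\, w[\hat{n}-m]$   ---(1)
$w_H[n] = \begin{cases} 0.54 - 0.46\cos\big(2\pi n/(L-1)\big), & 0 \le n \le L-1 \\ 0, & \text{otherwise} \end{cases}$   ---(4)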
• Time-domain plots of the rectangular and Hamming windows for L = 21 are
given in Fig. 7.
• If we substitute Eq. (3) for w[n] in Eq. (1), we obtain,
$Q_{\hat{n}} = \sum_{m=\hat{n}-L+1}^{\hat{n}} T(x[m])$   ---(5)
Fig.7 Plots of the time responses of a 21-point (a) rectangular window; (b) Hamming
window.
• The rectangular window corresponds to applying equal weight to all samples
in the interval (nˆ − L + 1) to nˆ.
• If the Hamming window is used, Eq. (5) will incorporate the window
sequence wH[n̂ − m] from Eq. (4) as weighting coefficients, with the same
limits on the summation.
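Eqs. (1), (4) and (5) can be sketched as a single routine: transform the signal with T(·), then weight by the window. The sketch below is one possible implementation, assuming the rectangular window of Eq. (3) has the standard form w_R[n] = 1 for 0 ≤ n ≤ L − 1; the function names and the hop size R (used later to evaluate Q_n̂ only every R samples) are illustrative choices.

```python
import numpy as np

def rectangular_window(L):
    """Rectangular window, assumed form of Eq. (3): w_R[n] = 1 for 0 <= n <= L-1."""
    return np.ones(L)

def hamming_window(L):
    """Eq. (4): w_H[n] = 0.54 - 0.46 cos(2 pi n / (L-1)), 0 <= n <= L-1."""
    n = np.arange(L)
    return 0.54 - 0.46 * np.cos(2 * np.pi * n / (L - 1))

def short_time_analysis(x, w, T, R=1):
    """Q_nhat = sum_m T(x[m]) w[nhat - m]  (Eq. (1)), evaluated every R samples.

    Only frames lying completely inside x are returned, i.e.
    nhat = L-1, L-1+R, L-1+2R, ... <= len(x)-1.
    """
    L = len(w)
    t = T(np.asarray(x, dtype=float))
    q = np.convolve(t, w)          # q[n] = sum_k t[k] w[n - k], i.e. Eq. (1) at nhat = n
    return q[L - 1:len(x):R]

Fs = 10_000
n = np.arange(1000)
x = np.cos(2 * np.pi * 200 * n / Fs)
L = 101
E_rect = short_time_analysis(x, rectangular_window(L), np.square, R=25)  # T(x) = x^2
E_hamm = short_time_analysis(x, hamming_window(L), np.square, R=25)
```

With T(x) = x², the rectangular-window call is exactly the unweighted sum of Eq. (5); the Hamming call uses the same summation limits but weights each term by wH[n̂ − m].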
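• The frequency response of an L-point rectangular window is
$W_R(e^{j\omega}) = \sum_{n=0}^{L-1} e^{-j\omega n} = \dfrac{\sin(\omega L/2)}{\sin(\omega/2)}\, e^{-j\omega(L-1)/2}$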
• The log magnitude (in dB) of $W_R(e^{j\omega})$ is shown in Fig. 8a for a 51-point
window (L = 51).
Fig. 8 Fourier transform of (a) 51-point rectangular window; (b) 51-point Hamming window.
• Note that the first zero of $W_R(e^{j\omega})$ occurs at ω = 2π/L, corresponding to the
analog frequency
$F = \dfrac{F_s}{L} = \dfrac{1}{LT}$,
where Fs = 1/T is the sampling frequency.
Analysis of Rectangular and Hamming windows, Fs = 10 kHz
Rectangular window:
• Nominal cutoff frequency = 2π/L
• F = Fs/L = 10,000/51 ≈ 196 Hz (for L = 51)
• Bandwidth is less (half that of the Hamming window)
• Attenuation (> 14 dB) outside the passband
• Would seem to require only half the sampling rate of the Hamming window,
since its nominal cutoff is Fs/L
Hamming window:
• Nominal cutoff frequency = 4π/L
• F = 2Fs/L = 2 × 10,000/51 ≈ 392 Hz (for L = 51)
• Bandwidth is twice that of the rectangular window
• Much greater attenuation (> 40 dB) outside the passband
• Sampling rate of Qn̂ should be greater than or equal to 4Fs/L, i.e., R ≤ L/4
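The numbers in the comparison above can be checked with a short calculation. This sketch assumes, as in the comparison, that Qn̂ must be sampled at least at twice its nominal cutoff, so Fs/R ≥ 2F; the helper names are hypothetical. It also cross-checks the closed form of W_R(e^{jω}) against a direct sum and confirms the first zero at ω = 2π/L.

```python
import numpy as np

# Nominal cutoff = factor * 2*pi/L rad/sample: factor 1 (rectangular), 2 (Hamming)
BANDWIDTH_FACTOR = {"rectangular": 1, "hamming": 2}

def nominal_cutoff_hz(Fs, L, window):
    """Analog nominal cutoff: Fs/L (rectangular) or 2*Fs/L (Hamming)."""
    return BANDWIDTH_FACTOR[window] * Fs / L

def max_hop(L, window):
    """Largest integer R with Fs/R >= 2 * cutoff, i.e. R <= L/2 (rect.) or L/4 (Hamming)."""
    return L // (2 * BANDWIDTH_FACTOR[window])

def rect_response(L, omega):
    """Closed form W_R(e^{jw}) = sin(wL/2)/sin(w/2) * e^{-jw(L-1)/2}, valid for w != 0."""
    return np.sin(omega * L / 2) / np.sin(omega / 2) * np.exp(-1j * omega * (L - 1) / 2)

def dtft(w, omega):
    """Direct sum_n w[n] e^{-j omega n} at one frequency, as a cross-check."""
    n = np.arange(len(w))
    return np.sum(w * np.exp(-1j * omega * n))

Fs, L = 10_000, 51
for window in ("rectangular", "hamming"):
    print(window, round(nominal_cutoff_hz(Fs, L, window)), "Hz, R <=", max_hop(L, window))
print("|W_R| at 2*pi/L:", abs(dtft(np.ones(L), 2 * np.pi / L)))
```

For L = 51 this reproduces the 196 Hz and 392 Hz figures, with R ≤ 25 for the rectangular window and R ≤ 12 for the Hamming window.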
Analysis of Rectangular and Hamming window
• Increasing the window length, L, simply decreases the bandwidth.
• Attenuation of both these windows is essentially independent of the window
duration.
• For typical finite-duration analysis windows, the short-time representation
Qnˆ has a restricted lowpass bandwidth that is inversely proportional to the
window length L.
• At that lower sampling rate, however, the poor frequency selectivity of the
rectangular window would generally result in significant aliasing.
SHORT-TIME ENERGY AND SHORT-TIME MAGNITUDE
• The energy of a discrete-time signal is defined as,
• $E = \sum_{m=-\infty}^{\infty} x^2[m]$   --(7)
Exponential window for short-time energy computation using a value of α = 0.9: (a) w̃[n] (impulse response of analysis filter)
and (b) 20 log10 |W̃(e^{jω})| (log magnitude frequency response of the analysis filter).
Automatic Gain Control Based on Short-Time Energy
• These are the impulse response and frequency response, respectively, of the
recursive short-time energy analysis filter.
• Note that by including the scale factor (1 − α) in the numerator, we ensure
that $\tilde{W}(e^{j0}) = 1$; i.e., the low-frequency gain is unity (0 dB)
irrespective of the value of α.
• By increasing or decreasing the parameter α, we can make the effective
window longer or shorter respectively.
• The effect on the corresponding frequency response is, of course, opposite.
Increasing α makes the filter more lowpass, and vice versa.
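The effect of α can be checked numerically. The sketch below assumes the exponential window has the form w̃[n] = (1 − α)α^(n−1)u[n−1], i.e. W̃(z) = (1 − α)z⁻¹/(1 − αz⁻¹); this form is an assumption, but the one-sample delay does not change |W̃(e^{jω})|. The helper names are hypothetical.

```python
import numpy as np

def exp_window(alpha, length):
    """Assumed exponential window w~[n] = (1 - alpha) * alpha**(n-1) * u[n-1], n = 0..length-1."""
    n = np.arange(length)
    return np.where(n >= 1, (1 - alpha) * alpha ** (n - 1.0), 0.0)

def exp_window_gain_db(alpha, omega):
    """20 log10 |W~(e^{jw})| with W~(z) = (1 - alpha) z^-1 / (1 - alpha z^-1)."""
    z_inv = np.exp(-1j * omega)
    return 20 * np.log10(np.abs((1 - alpha) * z_inv / (1 - alpha * z_inv)))

for alpha in (0.8, 0.9, 0.99):
    # Gain at w = 0 stays at 0 dB; gain at w = pi/4 drops as alpha grows (more lowpass)
    print(alpha, round(exp_window_gain_db(alpha, 0.0), 6),
          round(exp_window_gain_db(alpha, np.pi / 4), 1))
```

The window samples also sum to (almost exactly) one, which is the time-domain counterpart of the unity DC gain.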
• To anticipate the need for a more convenient notation, we will denote the
short-time energy as $E_{\hat{n}} = \sigma^2[n]$, where we have used the index [n] instead of
the subscript $\hat{n}$.
• We use the notation σ² to denote the fact that the short-time energy is an
estimate of the variance of x[m].
• Now, since σ²[n] is the output of the filter with impulse response w̃[n] in
Eq. (6), it follows that it satisfies a recursive difference equation, which can be
computed as in the block diagram of Fig. 13.
Fig.13 Block diagram of recursive computation of the short-time energy for an exponential window.
• Now we can define an AGC of the form,
• $y[n] = \dfrac{G_0}{\sigma[n]}\, x[n]$   --(9)
• where G0 is a constant gain level to which we attempt to equalize the level of
all frames.
• The capability of the AGC of Eq. (9) to equalize the variance (or, more
precisely, the standard deviation) of a speech waveform is illustrated in Figure
14.
• Larger values of α would introduce more smoothing so that the AGC would act
over a longer (e.g., syllabic) time scale.
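A minimal AGC sketch follows, assuming the recursion σ²[n] = ασ²[n−1] + (1 − α)x²[n−1] (the form that follows from the assumed exponential window w̃[n] = (1 − α)α^(n−1)u[n−1]) and the gain G0/σ[n]. The small constant `eps` is a hypothetical safeguard against dividing by zero at start-up; it is not part of Eq. (9).

```python
import numpy as np

def recursive_energy(x, alpha, sigma2_init=0.0):
    """sigma^2[n] = alpha * sigma^2[n-1] + (1 - alpha) * x^2[n-1] (assumed form, x[-1] = 0)."""
    x = np.asarray(x, dtype=float)
    sigma2 = np.empty(len(x))
    prev_sigma2, prev_x = sigma2_init, 0.0
    for n in range(len(x)):
        prev_sigma2 = alpha * prev_sigma2 + (1 - alpha) * prev_x ** 2
        sigma2[n] = prev_sigma2
        prev_x = x[n]
    return sigma2

def agc(x, alpha, G0=1.0, eps=1e-12):
    """y[n] = G0 * x[n] / sigma[n]; eps (hypothetical) avoids division by zero."""
    sigma = np.sqrt(recursive_energy(x, alpha) + eps)
    return G0 * np.asarray(x, dtype=float) / sigma

def rms(segment):
    """Root-mean-square level of a block of samples."""
    return np.sqrt(np.mean(np.square(segment)))

rng = np.random.default_rng(1)
x = np.concatenate([0.01 * rng.standard_normal(5000),   # quiet segment
                    1.0 * rng.standard_normal(5000)])   # segment 40 dB louder
y = agc(x, alpha=0.99, G0=1.0)
print(rms(x[2000:5000]), rms(x[7000:]))   # input levels differ by about 100x
print(rms(y[2000:5000]), rms(y[7000:]))   # both near G0 once the AGC has settled
```

Using only past samples to form σ[n] keeps the gain causal; the price is a brief gain overshoot right after a sudden level increase.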
• A closely related representation is the short-time magnitude,
$M_{\hat{n}} = \sum_{m=-\infty}^{\infty} |x[m]|\,\tilde{w}[\hat{n}-m]$   --(10)
where the weighted sum of absolute values of the signal is computed instead of
the sum of squares.
• Note that a simplification in arithmetic is achieved by eliminating the squaring
operation in the short-time energy computation.
• Figure 15 shows that Eq. (10) can be implemented as a linear filtering
operation on |x[n]|.
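Both short-time energy and short-time magnitude are linear filtering operations: the window filters x²[n] or |x[n]|, respectively. The sketch below (function names are illustrative) uses a causal rectangular window and a signal that is simply a quiet copy followed by a 100× louder copy, so the energy ratio is 10⁴ while the magnitude ratio is 10², the square root. This anticipates the dynamic-range remark that follows.

```python
import numpy as np

def short_time_energy(x, w):
    """E_nhat = sum_m x^2[m] w[nhat - m]: the window filters x^2[n]."""
    return np.convolve(np.square(x), w)[:len(x)]

def short_time_magnitude(x, w):
    """M_nhat = sum_m |x[m]| w[nhat - m] (Eq. (10)): the same filter applied to |x[n]|."""
    return np.convolve(np.abs(x), w)[:len(x)]

Fs = 10_000
n = np.arange(1000)
s = np.sin(2 * np.pi * 150 * n / Fs)
x = np.concatenate([0.01 * s, s])        # quiet segment followed by a 100x louder copy
w = np.ones(101)                          # rectangular window, L = 101

E = short_time_energy(x, w)
M = short_time_magnitude(x, w)
# Frames ending at nhat = 999 and nhat = 1999 see the same samples, scaled by 0.01 and 1
print(E[1999] / E[999], M[1999] / M[999])
```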
FIGURE 15 Block diagram representation of computation of the short-time magnitude function.
Short-time magnitude functions
FIGURE 16 Short-time magnitude functions for rectangular windows of length L = 51, 101, 201, and 401.
FIGURE 17 Short-time magnitude functions for Hamming windows of length L = 51, 101, 201, and 401.
Comparing Short-Time Energy and Short-Time Magnitude
Short-Time Magnitude
• For the short-time magnitude computation of Eq. (10), the dynamic range
(ratio of maximum to minimum) is approximately the square-root of the
dynamic range for the standard energy computation.
• To conclude our comments on the properties of short-time energy and short-
time magnitude, it is instructive to point out that the window need not be
restricted to rectangular or Hamming form, or indeed to any function
commonly used as a window in spectrum analysis or digital filter design.
• The filter can be either a finite duration impulse response (FIR) or an infinite
duration impulse response (IIR) filter as in the case of the exponential
windows.
• There is an advantage in having the impulse response (window) be always
positive, since this guarantees that the short-time energy or short-time
magnitude will never be negative.
• FIR filters (such as the rectangular or Hamming impulse responses) have the
advantage that the output can easily be computed at a lower sampling rate.
• If we use the exponential window w̃[n], the short-time magnitude can also be
computed recursively, exactly as for the short-time energy but with |x[n]| in
place of x²[n].
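The recursive form can be sketched and checked against the direct weighted sum of Eq. (10). This assumes the same exponential window as before, w̃[n] = (1 − α)α^(n−1)u[n−1], giving M[n] = αM[n−1] + (1 − α)|x[n−1]|; the function name is hypothetical.

```python
import numpy as np

def recursive_magnitude(x, alpha):
    """M[n] = alpha*M[n-1] + (1-alpha)*|x[n-1]|, with zero initial conditions (assumed form)."""
    ax = np.abs(np.asarray(x, dtype=float))
    M = np.zeros(len(ax))
    for n in range(1, len(ax)):
        M[n] = alpha * M[n - 1] + (1 - alpha) * ax[n - 1]
    return M

alpha = 0.9
x = np.random.default_rng(2).standard_normal(300)
n = np.arange(len(x))
w = np.where(n >= 1, (1 - alpha) * alpha ** (n - 1.0), 0.0)  # exponential window, truncated
M_conv = np.convolve(np.abs(x), w)[:len(x)]                   # direct weighted sum, Eq. (10)
print(np.max(np.abs(recursive_magnitude(x, alpha) - M_conv)))  # agreement up to rounding
```

The recursion needs one multiply-add per sample regardless of the effective window length, whereas the direct sum grows with the window length.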
SHORT-TIME ZERO-CROSSING RATE
• The rate (number of crossings per unit of time) at which zero-crossings
occur is a simple (and often highly reliable) measure of the frequency content
of a signal.
• This is particularly true of narrowband signals.
SHORT-TIME ZERO-CROSSING RATE – CONTD…
• For example, a sinusoidal signal of frequency F0, sampled at a rate Fs, has
Fs/F0 samples per cycle of the sine wave.
• Each cycle has two zero-crossings, so that the average rate of zero-crossings
per sample is,
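• $Z^{(1)} = \dfrac{2 \text{ crossings/cycle}}{F_s/F_0 \text{ samples/cycle}} = \dfrac{2F_0}{F_s}$ crossings/sample   --(11)
• $Z^{(M)} = M\,\dfrac{2F_0}{F_s}$ crossings/(M samples)   --(12)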
• where we use the notation Z(M) to denote the number of crossings per M
samples of the waveform.
• The short-time zero-crossing rate gives a reasonable way to estimate the
frequency of a sine wave.
• An alternative form of Eq. (12) is
$F_e = \dfrac{Z^{(1)} F_s}{2}$   ---(13)
• where Fe is the equivalent sinusoidal frequency corresponding to a given zero-
crossing rate, Z(1), per sample.
• If the signal is a single sinusoid of frequency F0, then Fe = F0.
• Thus, Eq. (13) can be used to estimate the frequency of a sinusoid, and if the
signal is not a sinusoid, Eq. (13) can be thought of as an equivalent sinusoidal
frequency for the signal.
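Eqs. (11) to (13) are simple enough to write as three one-line helpers (the names are illustrative). They reproduce the 100 Hz example that follows and show that Z(1) reaches its maximum of one crossing per sample at F0 = Fs/2.

```python
def zcr_per_sample(F0, Fs):
    """Eq. (11): Z(1) = 2 F0 / Fs crossings/sample for a sinusoid of frequency F0."""
    return 2.0 * F0 / Fs

def zcr_per_block(F0, Fs, M):
    """Eq. (12): Z(M) = M * 2 F0 / Fs crossings per M samples."""
    return M * zcr_per_sample(F0, Fs)

def equivalent_frequency(Z1, Fs):
    """Eq. (13): F_e = Z(1) * Fs / 2, the sinusoid frequency that would give Z(1)."""
    return Z1 * Fs / 2.0

Fs = 10_000
for F0 in (100, 1000, 5000):
    Z1 = zcr_per_sample(F0, Fs)
    print(F0, Z1, zcr_per_block(F0, Fs, 100), equivalent_frequency(Z1, Fs))
```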
• Consider the following examples of zero-crossing rates of sinusoidal
waveforms (assume a sampling rate of Fs = 10,000 samples/sec):
• for a 100 Hz sinusoid (F0 = 100 Hz), with Fs/F0 = 10,000/100 = 100 samples
per cycle, we get Z(1) = 2/100 = 0.02 crossings/sample, or Z(100) = (2/100) ×
100 = 2 crossings/10 msec interval (or 100 samples);
• as F0 increases, the number of zero-crossings increases in proportion (e.g.,
F0 = 1000 Hz gives Z(1) = 0.2 crossings/sample).
• There are a number of practical considerations in implementing a
representation based on the short-time zero-crossing rate.
• Although the basic algorithm for detecting a zero-crossing requires only a
comparison of signs of pairs of successive samples, special care must be
taken in the sampling process.
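The basic detection step, comparing the signs of successive samples, can be sketched as follows. The small phase offset in the test tone is an illustrative choice that keeps samples from landing exactly on zero; treating sgn(0) as +1 is also an assumption of this sketch.

```python
import numpy as np

def count_zero_crossings(x):
    """Count sign changes between successive samples (sgn(0) taken as +1)."""
    s = np.where(np.asarray(x) >= 0, 1, -1)
    return int(np.count_nonzero(s[1:] != s[:-1]))

def estimate_frequency(x, Fs):
    """Equivalent sinusoidal frequency F_e = Z(1) * Fs / 2, with Z(1) measured
    as crossings per sample over the whole block."""
    Z1 = count_zero_crossings(x) / len(x)
    return Z1 * Fs / 2.0

Fs = 10_000
n = np.arange(1000)                               # 100 ms of signal
for F0 in (100, 1000):
    x = np.sin(2 * np.pi * F0 * n / Fs + 0.3)     # phase offset avoids exact zeros
    print(F0, count_zero_crossings(x), estimate_frequency(x, Fs))
```

The estimate can be off by roughly one crossing per block, which matters more for short blocks and low frequencies.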
• The zero-crossing rate is strongly affected by any DC offset in the
analog-to-digital converter, 60 Hz hum in the signal, and any noise that may be
present in the digitizing system.
• If the DC offset is greater than the peak value of the signal (as can happen
for small signals), no zero-crossings will be detected.
• Therefore, care must be taken in the analog processing prior to sampling to
minimize these effects.
• A key question about the use and measurement of zero-crossing rates is the
effect of DC offsets on the measurements.
• Figures 19 and 20 show the effects of severe offset (much more than
might be anticipated in real systems) on the waveforms and the resulting
locations and counts of zero-crossings.
FIGURE 19 Plots of waveform for
sinusoid with no DC offset (top
panel) and DC offset of 0.75
times the peak amplitude
(bottom panel).
• Figure 20 shows the waveforms for a Gaussian white noise sequence
(zero-mean, unit variance, flat spectrum) with no DC offset (the top panel)
and with an offset of 0.75 (the bottom panel).
• The locations of the zero-crossings have changed with the DC offset.
• The count of zero-crossings over the 251-sample interval has changed from
124 (for no DC offset) to 82 for the 0.75 DC offset.
FIGURE 20 Plots of waveform for zero-mean, unity variance, Gaussian white
noise signal with no offset (top panel), and 0.75 DC offset.
• This shows that, for zero-crossing counts to be a useful measure of frequency
content, the waveform must be high-pass filtered to ensure that no such DC
component exists in the waveform prior to calculation of zero-crossing rates.
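The effect of a DC offset, and its removal by high-pass filtering, can be reproduced with synthetic noise. This sketch assumes a first-order DC-blocking filter y[n] = x[n] − x[n−1] + a·y[n−1] with a = 0.995 as the high-pass stage; that filter is one common choice, not one specified in these notes. The qualitative result matches the slides: the offset lowers the crossing rate, and removing the DC component restores it.

```python
import numpy as np

def zcr(x):
    """Average zero-crossings per sample pair (sgn(0) taken as +1)."""
    s = np.where(np.asarray(x) >= 0, 1, -1)
    return np.count_nonzero(s[1:] != s[:-1]) / (len(x) - 1)

def dc_blocker(x, a=0.995):
    """Assumed first-order DC-blocking high-pass filter: y[n] = x[n] - x[n-1] + a*y[n-1]."""
    y = np.zeros(len(x))
    prev_x, prev_y = 0.0, 0.0
    for n, xn in enumerate(x):
        prev_y = xn - prev_x + a * prev_y
        y[n] = prev_y
        prev_x = xn
    return y

rng = np.random.default_rng(0)
noise = rng.standard_normal(20_000)    # zero-mean, unit-variance white noise
offset = noise + 0.75                  # same noise with a 0.75 DC offset
filtered = dc_blocker(offset)[2000:]   # skip the filter's start-up transient

print(zcr(noise), zcr(offset), zcr(filtered))
```

Zero-mean white noise crosses zero on about half of all sample pairs; with the 0.75 offset the rate falls to roughly a third, consistent with the 124 to 82 change quoted above.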
• Speech signals are broadband signals and the interpretation of the short-time
zero-crossing rate is therefore much less precise.
• However, rough estimates of spectral properties of the speech signal can be
obtained using a representation based on the short-time average zero-
crossing rate defined as simply the average number of zero-crossings in a
block of L samples.
• If we select a block of L samples, all that is required is to check samples in
pairs to count the number of times the samples change sign within the block
and then compute the average by dividing by L.
• This would give the average number of zero-crossings/sample.
• As in the case of the short-time energy, the window can be moved by R
samples and the process repeated, thus giving the short-time zero-crossing
representation of the speech signal.
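• Defining the short-time average zero-crossing rate (per sample) as,
• $Z_{\hat{n}} = \dfrac{1}{2L_{\mathrm{eff}}} \sum_{m=-\infty}^{\infty} \left|\mathrm{sgn}(x[m]) - \mathrm{sgn}(x[m-1])\right| \tilde{w}[\hat{n}-m]$   --(14)
• where L_eff is the effective window length, $L_{\mathrm{eff}} = \sum_{m} \tilde{w}[m]$, and where the
signum (sgn) operator, defined as,
• $\mathrm{sgn}(x[n]) = \begin{cases} 1, & x[n] \ge 0 \\ -1, & x[n] < 0 \end{cases}$   --(15)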
• Transforms x[n] into a signal that retains only the sign of the samples.
• The terms |sgn(x[m]) − sgn(x[m − 1])| are equal to 2 when the pair of samples
have opposite sign, and 0 when they have the same sign.
• Thus, each zero-crossing would be represented by a sample of amplitude 2 at
the time of the zero-crossing.
• This factor of 2 is taken into account by the factor of 2 in the averaging factor
1/(2Leff) in Eq. (14).
• Typically, the window used to compute the average zero-crossing rate is the
rectangular window,
• $\tilde{w}[n] = \begin{cases} 1, & 0 \le n \le L-1 \\ 0, & \text{otherwise} \end{cases}$   --(16)
• for which Leff = L.
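Eqs. (14) to (16) can be sketched as a frame-by-frame routine, assuming the rectangular window (so L_eff = L) and a hop of R samples between frames; the function names are illustrative. Each sign change contributes |sgn(x[m]) − sgn(x[m−1])| = 2, which the 1/(2L) factor cancels.

```python
import numpy as np

def sgn(x):
    """Eq. (15): +1 for x >= 0 and -1 for x < 0."""
    return np.where(np.asarray(x) >= 0, 1.0, -1.0)

def short_time_zcr(x, L, R=1):
    """Eq. (14) with the rectangular window of Eq. (16), so L_eff = L:
        Z_nhat = 1/(2L) * sum_{m = nhat-L+1}^{nhat} |sgn(x[m]) - sgn(x[m-1])|
    evaluated at nhat = L, L+R, L+2R, ... (so x[m-1] always exists)."""
    d = np.abs(np.diff(sgn(x)))                        # d[k] = |sgn(x[k+1]) - sgn(x[k])|: 0 or 2
    sums = np.convolve(d, np.ones(L))[L - 1:len(d)]    # moving sum of L terms ending at m = nhat
    return sums[::R] / (2.0 * L)

Fs = 10_000
n = np.arange(2000)
x = np.sin(2 * np.pi * 100 * n / Fs + 0.3)   # 100 Hz tone; phase offset avoids exact zeros
Z = short_time_zcr(x, L=100, R=50)            # crossings per sample in each frame
print(Z[:5], Z[:5] * Fs / 2)                  # equivalent frequency F_e via Eq. (13)
```

For the 100 Hz tone every 100-sample frame contains exactly two crossings, so each frame gives Z = 0.02 crossings/sample, in agreement with Eq. (11).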
• Thus, Eq. (14) becomes,