Lec9-10 Speech Processing

The document summarizes a lecture on acoustic phonetics and time domain features of speech. It discusses the vocal tract model, spectrograms, and the classification of phonemes into vowels, diphthongs, semi-vowels and consonants. It describes the acoustic properties and cues used to identify different speech sounds. Key topics include the vocal tract configurations and acoustic features of different types of consonants and vowels that act as cues for auditory perception. Formant frequencies and formant transitions are important cues for identifying vowels and distinguishing consonants.

Uploaded by abhinav kumar
Copyright © All Rights Reserved

Lecture: 9-10

Acoustic Phonetics & Time Domain Features
Dr. Shikha Tripathi, PES, Blr
• Model of Vocal Tract
  - LTI model for speech production
  - Uniform lossless tube model
  - Concatenated tube model
  - Electric circuit model
  - Lip radiation
• Spectrograms
  - Narrowband
  - Wideband
• Acoustic Phonetics
30/08/2019 Dr.Shikha Tripathi@PESU Blr 2
• Acoustic Phonetics
• Auditory Perception
• Time domain methods of speech processing
  - Short-time (ST) energy
  - ST average magnitude function



In American English, the combinations of features give 40 phonemes, as follows:

Phonemes in American English.


• Phoneme classes: vowels, diphthongs, semivowels and consonants
• Vowels (voiced segments)
  - Vowels are voiced (except when whispered).
  - They are the phonemes with the greatest intensity; their duration ranges from about 50 to 400 ms in normal speech.
  - Vowel energy is concentrated primarily below 1 kHz and falls off at about 6 dB per octave as frequency increases.
  - The excitation is periodic, governed by the fundamental frequency (F0) of vocal-cord vibration; the sound is then shaped (filtered) as it passes through the vocal tract.
  - If these vowels are recorded and their power spectral density (PSD) is plotted, the peaks of the spectral envelope give the formant frequencies.
  - As different words containing vowels are uttered, the formant positions change depending on the shape of the vocal tract (VT) and mouth cavity.
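The source-filter picture above (a periodic excitation at F0 shaped by the vocal tract, with formants appearing as peaks in the PSD) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the lecture's method: an 8 kHz sampling rate, F0 = 100 Hz, an impulse train standing in for the glottal source, a cascade of two-pole resonators standing in for the vocal tract, and illustrative formant frequencies and bandwidths for an /a/-like vowel. The helper names (`resonator`, `all_pole`, `synth_vowel`, `power_spectrum`) are hypothetical. Because the impulse train has a flat spectrum, the sketch does not model the glottal pulse shape or lip radiation behind the overall spectral tilt.

```python
import numpy as np

FS = 8000   # sampling rate in Hz (assumed)
F0 = 100    # fundamental frequency in Hz (assumed)
# Illustrative /a/-like formants and bandwidths in Hz (assumed, not from the lecture).
FORMANTS = (730, 1090, 2440)
BANDWIDTHS = (80, 90, 120)

def resonator(freq_hz, bw_hz, fs=FS):
    """Feedback coefficients (a1, a2) of a two-pole resonator."""
    r = np.exp(-np.pi * bw_hz / fs)    # pole radius set by the bandwidth
    theta = 2 * np.pi * freq_hz / fs   # pole angle set by the centre frequency
    return -2 * r * np.cos(theta), r * r

def all_pole(x, a1, a2):
    """Second-order all-pole filter: y[n] = x[n] - a1*y[n-1] - a2*y[n-2]."""
    y = np.zeros(len(x))
    y1 = y2 = 0.0
    for n, xn in enumerate(x):
        yn = xn - a1 * y1 - a2 * y2
        y[n] = yn
        y1, y2 = yn, y1
    return y

def synth_vowel(f0, formants, bandwidths, dur=0.5, fs=FS):
    """Periodic impulse train at f0 (source) filtered by cascaded formant resonators (vocal tract)."""
    x = np.zeros(int(dur * fs))
    x[::int(round(fs / f0))] = 1.0
    for f, bw in zip(formants, bandwidths):
        x = all_pole(x, *resonator(f, bw, fs))
    return x

def power_spectrum(x, fs=FS):
    """Periodogram |X(f)|^2 of the Hamming-windowed signal (non-negative frequencies)."""
    spec = np.abs(np.fft.rfft(x * np.hamming(len(x)))) ** 2
    return np.fft.rfftfreq(len(x), 1 / fs), spec

x = synth_vowel(F0, FORMANTS, BANDWIDTHS)
freqs, spec = power_spectrum(x)

# The PSD is a set of harmonics at k*F0; with 0.5 s at 8 kHz, harmonic k sits on FFT bin 50*k.
k = np.arange(1, int(FS / 2 / F0))
amps = spec[np.round(k * F0 * len(x) / FS).astype(int)]
# Local maxima of the harmonic amplitudes trace the peaks of the spectral envelope,
# which approximate the formant frequencies.
peaks = [int(k[i]) * F0 for i in range(1, len(k) - 1)
         if amps[i] > amps[i - 1] and amps[i] > amps[i + 1]]
print("harmonics at envelope peaks (Hz):", peaks)
```

The formants themselves are not lines in the spectrum; they are visible only as the envelope over the harmonics, so each detected peak lands on the harmonic nearest an assumed formant rather than exactly on it.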



Vowels:

Vocal tract profiles for vowels in American English.


Waveform, wideband spectrogram, and spectral slice of
narrowband spectrogram for two vowels: (a) /i/ as in “eve”;
(b) /a/ as in “father.”



Diphthongs:
• Two sounds or two tones: two adjacent vowel sounds occurring within the same syllable
• Also known as gliding vowels
• For example, in “eye”, “hay”, “low” and “cow” the tongue moves during the vowel; these words are said to contain diphthongs
• American English has 6 diphthongs
• Characterized by a time-varying VT area function that varies between two vowel configurations
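Acoustically, the time-varying area function of a diphthong shows up as formants gliding from one vowel configuration toward another. The sketch below assumes a straight-line glide of F1 and F2 between two targets, which is a simplification (real trajectories are curved and speaker dependent). The /a/ and /i/ target values are approximate illustrative figures, the glide is only roughly that of “eye”, and `formant_glide` is a hypothetical helper.

```python
import numpy as np

def formant_glide(start, end, dur_s, frame_s=0.01):
    """Linearly interpolate formant frequencies between two vowel targets.

    start, end: sequences of formant frequencies in Hz, e.g. (F1, F2).
    Returns an array of shape (n_frames, n_formants), one row per frame.
    """
    n_frames = int(round(dur_s / frame_s)) + 1
    t = np.linspace(0.0, 1.0, n_frames)[:, None]   # normalized time, 0 -> 1
    return (1 - t) * np.asarray(start, float) + t * np.asarray(end, float)

# Illustrative /a/ -> /i/ targets in Hz (approximate, assumed), over 200 ms.
track = formant_glide((730, 1090), (270, 2290), dur_s=0.2)
print(track[0], track[len(track) // 2], track[-1])
```

F1 falls while F2 rises across the glide, which is the kind of formant movement a spectrogram of a diphthong shows.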



Semi-vowels:
• Sounds such as /w/, /r/, /l/ and /y/, which are phonetically similar to vowels but often function as syllable boundaries. They are called semi-vowels because of their vowel-like structure.
Consonants:
• Speech segments articulated with partial or complete closure of the VT. For example: /p/ (lips closed), /t/ (closure by the front part of the tongue), /k/ (closure by the back of the tongue), /h/ (produced in the throat)


• The second large phoneme group is the consonants, which contain several subgroups:
  1. Nasals
  2. Stops/plosives (voiced or unvoiced)
  3. Fricatives (voiced or unvoiced)
  4. Whispers
  5. Affricates



1. Nasals (nasal stops or nasal continuants):
  - Produced with the velum lowered
  - Air flows freely and escapes through the nose
    ▪ /m/ (constriction at the lips)
    ▪ /n/ (constriction just behind the teeth)



Nasals:

Vocal tract configurations for nasal consonants.

Wideband spectrograms of nasal consonants (a) /n/ in “no” and (b) /m/ in
“me.”



2. Stops:
  - Produced by stopping the airflow in the vocal tract
  - Voiced stops: /b/, /d/, /g/ (their properties depend on the vowel that follows them)
  - Unvoiced stops: /p/, /t/, /k/ (the vocal cords do NOT vibrate during the closure)
  - Plosives: stops in which the airflow is released out of the mouth
    ▪ The term sometimes also includes nasal stops (airflow stopped in the mouth and released through the nose)
Plosives:

Vocal tract configurations for unvoiced and voiced plosive pairs.



A schematic representation of (a) unvoiced and (b) voiced plosives.
The Voice Onset Time is denoted by VOT.



Waveform, wideband spectrogram, and narrowband spectral slice of a voiced and unvoiced plosive pair: (a) /g/ as in “go” (voice bar); (b) /k/ as in “key” (aspiration).
3. Fricatives:
  - A special class of consonants produced by forcing air through a narrow channel formed by placing two articulators close together, e.g., the lower lip and upper teeth for /f/.
  - The resulting turbulent airflow is called frication (e.g., /s/, /z/, /f/).
  - Unvoiced: /f/, /s/, /sh/ (produced by turbulence alone)
  - Voiced: /v/, /z/, /zh/. Two excitation sources are involved: a periodic one due to vibration of the vocal cords, and air turbulence generated at the constriction.



Fricatives:

Vocal tract configurations for pairs of voiced and unvoiced fricatives.



Waveform, wideband spectrogram, and narrowband spectral
slice of voiced and unvoiced fricative pair: (a) /v/ as in “vote”;
(b) /f/ as in “for.”
4. Whispers:
  - A special case of speech with no fundamental frequency; the first formant frequency is still perceived.

5. Affricates (combination of a stop and a fricative):
  - Consonants such as /pf/ and /kx/. They start like a stop such as /t/ or /d/ but are released as a fricative such as /s/ or /z/.


Prosody: Melody of speech
  - Long-time variations (changes extending over more than one phoneme) in pitch (intonation), amplitude (loudness) and timing (articulation rate or rhythm) follow the rules of prosody of a language (and are speaker dependent)



• Speech segments:
  - Silence: the waveform looks like “grass”, with no or very little energy, equivalent to low-level noise
  - Unvoiced: somewhat higher amplitude than silence
  - Voiced: higher energy than unvoiced
  E_V > E_U > E_S
• Generally, the log of the energy is computed over frames of about 20 ms
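The energy ordering of voiced, unvoiced and silent segments suggests a simple frame classifier. The sketch below computes the short-time energy E_n (the sum of x[m]^2 over each 20 ms rectangular frame), converts it to log energy in dB, and labels each frame relative to the loudest one. The two dB thresholds, the non-overlapping frames and the synthetic test signal (low-level noise for silence, louder noise for unvoiced speech, a 200 Hz sine for voiced speech) are assumptions chosen for illustration; real recordings need thresholds tuned to the recording conditions, and the function names are hypothetical.

```python
import numpy as np

def short_time_energy(x, fs, frame_ms=20.0, hop_ms=None):
    """E_n = sum of x[m]^2 over each frame (rectangular window)."""
    n = int(fs * frame_ms / 1000)
    hop = n if hop_ms is None else int(fs * hop_ms / 1000)
    starts = range(0, len(x) - n + 1, hop)
    return np.array([np.sum(x[s:s + n] ** 2) for s in starts])

def log_energy_db(energy, eps=1e-12):
    """Log energy in dB; eps avoids log(0) for digital silence."""
    return 10 * np.log10(energy + eps)

def label_frames(log_e, voiced_db=-10.0, silence_db=-35.0):
    """Label frames 'V', 'U' or 'S' by log energy relative to the loudest frame.

    The thresholds are illustrative assumptions, not values from the lecture.
    """
    rel = log_e - np.max(log_e)
    return ["V" if r > voiced_db else "U" if r > silence_db else "S" for r in rel]

fs = 8000
rng = np.random.default_rng(0)
t = np.arange(int(0.1 * fs)) / fs
x = np.concatenate([
    0.001 * rng.standard_normal(len(t)),   # silence: low-level background noise
    0.05 * rng.standard_normal(len(t)),    # unvoiced: noise-like, louder
    0.5 * np.sin(2 * np.pi * 200 * t),     # voiced: periodic, loudest
])
print("".join(label_frames(log_energy_db(short_time_energy(x, fs)))))
# → SSSSSUUUUUVVVVV
```

Working in dB relative to the loudest frame keeps the thresholds independent of the overall recording level.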



• Acoustic properties of speech sounds are essential for phoneme discrimination by the auditory system
  - Acoustic phonetics studies the acoustic aspects of speech sounds that act as perceptual cues
• Goal: preserve such properties in speech processing



• Acoustic cues: the acoustic components of speech that a listener uses to correctly perceive a phoneme
  - Vowels
  - Consonants



• Vowels: characterized by their formant frequencies
  - Vowel identification maps mainly to F1 and F2
  - Higher formants also contribute
• Because formant frequencies scale inversely with vocal tract length, formant locations are normalized in phoneme recognition
  - Alternatively, relative formant spacing is used as the essential feature
• Nasalization: cued primarily by an increase in the bandwidth of F1 and the introduction of zeros
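The scaling argument can be made concrete with the uniform lossless tube model listed in the outline: a tube of length L, closed at the glottis and open at the lips, resonates at F_n = (2n - 1)c / (4L). In this sketch the speed of sound c = 35,000 cm/s and the tract lengths of 17.5 cm and 14 cm are assumed illustrative values, and both function names are hypothetical. Shortening the tract raises every formant by the same factor, so dividing by F1 (one simple form of relative formant spacing) gives identical ratios.

```python
def uniform_tube_formants(length_cm, n_formants=3, c_cm_s=35000.0):
    """Resonances of a uniform lossless tube closed at the glottis, open at the lips:
    F_n = (2n - 1) c / (4 L)."""
    return [(2 * n - 1) * c_cm_s / (4 * length_cm) for n in range(1, n_formants + 1)]

def formant_ratios(formants):
    """Relative formant spacing: each formant divided by F1 (removes the 1/L scale factor)."""
    f1 = formants[0]
    return [f / f1 for f in formants]

adult = uniform_tube_formants(17.5)     # assumed adult tract length
shorter = uniform_tube_formants(14.0)   # assumed shorter tract
print(adult, shorter)                   # → [500.0, 1500.0, 2500.0] [625.0, 1875.0, 3125.0]
print(formant_ratios(adult), formant_ratios(shorter))   # → [1.0, 3.0, 5.0] [1.0, 3.0, 5.0]
```

Real vowels do not come from a uniform tube, so their ratios are not fixed at 1:3:5, but the same cancellation of the 1/L factor is what makes ratio-based features less sensitive to speaker size.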
• Consonants: identification depends on
  - The formants of the consonant
  - The formant transitions into the formants of the following vowel
  - Voicing (or its absence) of the vocal folds during or near the consonant
  - The relative timing of the consonant and the onset of the following vowel



• Consider the voiced plosive consonants /b/, /d/ and /g/ and their unvoiced counterparts /p/, /t/ and /k/
• Each plosive is characterized by
  - Formant locus: the spectrum determined by the vocal tract configuration in front of the closure
  - Formant transition: the movement from the formant-locus spectrum to the spectrum of the following vowel configuration
• Perception of consonants depends on the formant locus and the formant transitions



• Consider discrimination between /b/ and /d/ followed by /a/, as in “ba” and “da”
• Two perceptual cues are
  - F1 of the formant locus is lower for /b/ than for /d/
  - F2 transitions upward from /b/ into the following vowel and downward from /d/ into the following vowel
• As with plosives, for fricative consonants the speech spectrum during the frication noise and the transition into the following vowel are perceptual cues
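The F2-direction cue for “ba” versus “da” can be expressed as the sign of a least-squares slope fitted to the F2 track over the transition. The tracks below are hypothetical values (one per 5 ms frame) chosen only to show a rising and a falling transition into /a/; measuring real tracks would need a formant tracker, which is outside this sketch. `transition_slope` and `guess_b_or_d` are hypothetical helpers that use only this single cue.

```python
import numpy as np

def transition_slope(track_hz, frame_s=0.005):
    """Least-squares slope (Hz/s) of a formant track over the transition region."""
    t = np.arange(len(track_hz)) * frame_s
    slope, _intercept = np.polyfit(t, np.asarray(track_hz, float), 1)
    return slope

def guess_b_or_d(f2_track_hz, frame_s=0.005):
    """Rising F2 into /a/ suggests /b/; falling F2 suggests /d/."""
    return "/b/" if transition_slope(f2_track_hz, frame_s) > 0 else "/d/"

# Hypothetical F2 tracks (Hz) from the release into /a/.
ba = [900, 980, 1040, 1080, 1100, 1100]
da = [1700, 1500, 1340, 1220, 1150, 1100]
print(guess_b_or_d(ba), guess_b_or_d(da))   # → /b/ /d/
```

A fitted slope is less sensitive to a single noisy frame than comparing only the first and last values of the track.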
• The time between the initial burst and the vowel onset (the voice onset time, VOT) discriminates between voiced and unvoiced consonants
• For both plosive and fricative consonants, the presence or lack of voicing also acts as a perceptual cue for voiced vs. unvoiced consonants
• In addition to the direction of a formant transition, its rate also matters
  - Consider /b/ in “be”: if the transition duration increases beyond 30 ms (a slower transition), “be” is perceived as “we” (a semi-vowel)
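Both timing cues reduce to comparing a measured duration with a boundary. In this sketch the 30 ms transition boundary comes from the slide, while the 25 ms VOT boundary is an assumed round figure (reported boundaries vary with place of articulation, speaking rate and language). The burst and voicing-onset times are assumed to have been measured already, and both function names are hypothetical.

```python
VOT_BOUNDARY_MS = 25.0         # assumed voiced/unvoiced VOT boundary (not from the lecture)
TRANSITION_BOUNDARY_MS = 30.0  # transition-duration boundary quoted for "be" vs "we"

def voicing_from_vot(burst_s, voicing_onset_s, boundary_ms=VOT_BOUNDARY_MS):
    """Return ('voiced' or 'unvoiced', VOT in ms) from burst and voicing-onset times in seconds."""
    vot_ms = (voicing_onset_s - burst_s) * 1000.0
    label = "voiced" if vot_ms < boundary_ms else "unvoiced"
    return label, vot_ms

def stop_or_semivowel(transition_ms, boundary_ms=TRANSITION_BOUNDARY_MS):
    """A fast formant transition is heard as /b/ ("be"); a slower one as /w/ ("we")."""
    return "/b/" if transition_ms <= boundary_ms else "/w/"

print(voicing_from_vot(0.100, 0.110)[0])   # short lag, as for /g/ in "go"
print(voicing_from_vot(0.100, 0.170)[0])   # long lag with aspiration, as for /k/ in "key"
print(stop_or_semivowel(20.0), stop_or_semivowel(60.0))
```

Listeners combine these cues with voicing and formant information, so a single threshold like this is only a first approximation.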
• Physiological correlates of acoustic attributes
  - The mechanisms in the brain that detect acoustic features, ultimately leading to the meaning of the message
• Motor theory model
  - Acoustic features are mapped to articulatory features
  - We create in our brains a picture of the basic articulatory movements responsible for an acoustic feature



• Articulatory variability suggests that speakers and listeners rely on an acoustic representation rather than an articulatory representation
• A combination of acoustic features and articulatory mapping works better for speech perception
• Feature detection is carried out by the auditory centers of the brain: formant energy, bursts, voice onset times, formant transitions and the presence (or lack) of voicing
• We need to consider how changes in the temporal and spectral properties of the speech waveform affect perception



• Time domain processing
• Short-time zero crossing rate
• Time domain methods of speech processing
  - Speech and silence discrimination
    ▪ Algorithm to find the start and end of a word
  - Pitch period estimation
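The time-domain methods listed here can be sketched with NumPy: the short-time zero-crossing rate, an energy-threshold endpoint detector for finding the start and end of a word, and an autocorrelation pitch-period estimator. These are minimal illustrations under stated assumptions, not the lecture's algorithms: the 60 to 400 Hz pitch search range, the 10 ms frames and the -30 dB endpoint threshold are assumed values, the test signals are synthetic, and all function names are hypothetical. Practical endpoint detectors usually also use the zero-crossing rate so that weak fricatives at word boundaries are not lost.

```python
import numpy as np

def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose sign differs (sgn(0) is taken as +1)."""
    s = np.where(np.asarray(frame) >= 0, 1, -1)
    return np.mean(np.abs(np.diff(s))) / 2

def word_endpoints(x, fs, frame_ms=10.0, rel_db=-30.0):
    """Start/end times (s) of the span whose frame energy is within rel_db of the peak frame."""
    n = int(fs * frame_ms / 1000)
    frames = np.asarray(x, float)[: len(x) // n * n].reshape(-1, n)
    e_db = 10 * np.log10(np.sum(frames ** 2, axis=1) + 1e-12)
    active = np.flatnonzero(e_db > e_db.max() + rel_db)
    return float(active[0] * n / fs), float((active[-1] + 1) * n / fs)

def pitch_autocorr(frame, fs, fmin=60.0, fmax=400.0):
    """Pitch (Hz) from the largest autocorrelation value between lags fs/fmax and fs/fmin."""
    x = np.asarray(frame, float) - np.mean(frame)
    lo, hi = int(fs / fmax), int(fs / fmin)
    r = [np.dot(x[:-k], x[k:]) for k in range(lo, hi + 1)]
    return fs / (lo + int(np.argmax(r)))   # lo + argmax is the pitch period in samples

fs = 8000
t = np.arange(int(0.2 * fs)) / fs
voiced = np.sin(2 * np.pi * 200 * t)                    # stand-in for a voiced segment
noise = np.random.default_rng(0).standard_normal(320)   # stand-in for an unvoiced frame
word = np.concatenate([np.zeros(800), voiced, np.zeros(800)])

print(zero_crossing_rate(voiced[:320]), zero_crossing_rate(noise))   # low vs. high ZCR
print(word_endpoints(word, fs))                                      # → (0.1, 0.3)
print(pitch_autocorr(voiced[:320], fs))                              # → 200.0
```

The voiced frame crosses zero about twice per pitch period, while the noise frame changes sign roughly every other sample, which is why a high ZCR with modest energy points to unvoiced speech.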
