1.0 Introduction To Speech Processing
Speech Production
Speech Perception
Speech Analysis
Speech Synthesis
Speech Recognition
Speech Coding
Human Factors
Hearing
Texts and Web Pages
Principles of Computer Speech, I. H. Witten, Academic Press, 1982
Speech Processing, ed. Chris Rowden, McGraw-Hill, 1992
Web addresses:
http://www.speech.usyd.edu.au/comp.speech COMP.SPEECH
http://svr-www.eng.cam.ac.uk/comp.speech COMP.SPEECH
http://www.cse.ogi.edu/CSLU/ CSLU
http://www.cstr.ed.ac.uk/projects/festival.html FESTIVAL
http://tcts.fpms.ac.be/synthesis/mbrola.html MBROLA
http://www.speech.psychol.ucl.ac.uk/ UCL
http://www.ISIP.MsState.Edu/resources/ MISSISSIPPI U
SPEECH
What is it?
Linguistics
Acoustics
Physiology
[Figure: The Speech Chain (Denes & Pinson), from speaker to listener]
Data
Samples are typically 8- or 16-bit integers, but other sample types,
including floating point, are supported depending on the file format.
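As a rough illustration of the sample types mentioned above, the sketch below converts one sample to 8-bit and 16-bit signed integers and compares the storage each needs with a 32-bit float. The `quantise` helper, the [-1.0, 1.0] input range and the little-endian packing are assumptions made for this example and are not tied to any particular file format.

```python
import struct


def quantise(sample: float, bits: int) -> int:
    """Map a sample in [-1.0, 1.0] to a signed integer of the given width."""
    full_scale = 2 ** (bits - 1) - 1           # 127 for 8 bits, 32767 for 16 bits
    clipped = max(-1.0, min(1.0, sample))      # out-of-range input is clipped
    return round(clipped * full_scale)


sample = 0.25                                  # hypothetical sample value
as_int8 = quantise(sample, 8)
as_int16 = quantise(sample, 16)

# Bytes needed to store one sample in each representation.
size_int8 = len(struct.pack("<b", as_int8))
size_int16 = len(struct.pack("<h", as_int16))
size_float32 = len(struct.pack("<f", sample))

print(as_int8, as_int16)
print(size_int8, size_int16, size_float32)
```

The wider the integer, the finer the amplitude steps, at the cost of more bytes per sample.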
http://www.phon.ox.ac.uk/~jcoleman/phonation.htm
[Figure: Glottal pulse (Rosenberg, JASA 49, 1971). Amplitude against time (ms). Labelled: opening, closure phase, closing phase.]
Period = 12.5 ms
Fundamental frequency = 1/0.0125 s = 80 Hz
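The relation between the glottal period and the fundamental frequency can be checked with a few lines of Python. The 16 kHz sample rate used for the second calculation is an assumption for illustration only; the period is the value given for the glottal pulse.

```python
def fundamental_frequency(period_s: float) -> float:
    """Fundamental frequency in Hz is the reciprocal of the period in seconds."""
    return 1.0 / period_s


def samples_per_period(period_s: float, sample_rate_hz: int) -> int:
    """Number of samples spanned by one glottal cycle at a given sample rate."""
    return round(period_s * sample_rate_hz)


period_s = 12.5e-3                    # 12.5 ms glottal period
f0_hz = fundamental_frequency(period_s)
n_samples = samples_per_period(period_s, 16000)   # 16 kHz is an assumed rate

print(f"F0 = {f0_hz:.1f} Hz, {n_samples} samples per period")
```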
[Figure: Spectrum of glottal pulse. Intensity against frequency (Hz).]
http://www.exploratorium.edu/exhibits/vocal_vowels/vocal_vowels.html
http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
[Figure: Spectrum of glottal pulse filtered by the vocal tract. Intensity against frequency (Hz).]
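The idea of a source spectrum shaped by the vocal tract can be sketched numerically. The example below is a deliberate simplification: the glottal source is replaced by an impulse train at 80 Hz, and the vocal tract by a single two-pole resonator. The sample rate, the 500 Hz resonance and its 80 Hz bandwidth are assumed values chosen for illustration, not measurements.

```python
import cmath
import math

SAMPLE_RATE_HZ = 8000     # assumed sample rate
F0_HZ = 80                # fundamental frequency of the source
FORMANT_HZ = 500.0        # assumed vocal tract resonance
BANDWIDTH_HZ = 80.0       # assumed resonance bandwidth
N_SAMPLES = 800           # 0.1 s of signal


def impulse_train(n_samples, period_samples):
    """Crude stand-in for the glottal source: one unit impulse per period."""
    return [1.0 if n % period_samples == 0 else 0.0 for n in range(n_samples)]


def resonator(signal, centre_hz, bandwidth_hz, sample_rate_hz):
    """Two-pole resonator: y[n] = x[n] + a1*y[n-1] + a2*y[n-2]."""
    r = math.exp(-math.pi * bandwidth_hz / sample_rate_hz)   # pole radius
    theta = 2.0 * math.pi * centre_hz / sample_rate_hz       # pole angle
    a1, a2 = 2.0 * r * math.cos(theta), -r * r
    out, y1, y2 = [], 0.0, 0.0
    for x in signal:
        y = x + a1 * y1 + a2 * y2
        out.append(y)
        y2, y1 = y1, y
    return out


def magnitude_at(signal, freq_hz, sample_rate_hz):
    """Magnitude of the discrete-time Fourier transform at one frequency."""
    w = 2.0 * math.pi * freq_hz / sample_rate_hz
    return abs(sum(x * cmath.exp(-1j * w * n) for n, x in enumerate(signal)))


source = impulse_train(N_SAMPLES, SAMPLE_RATE_HZ // F0_HZ)
filtered = resonator(source, FORMANT_HZ, BANDWIDTH_HZ, SAMPLE_RATE_HZ)

# Compare two harmonics of 80 Hz: one near the resonance, one far from it.
for harmonic_hz in (480, 2000):
    before = magnitude_at(source, harmonic_hz, SAMPLE_RATE_HZ)
    after = magnitude_at(filtered, harmonic_hz, SAMPLE_RATE_HZ)
    print(f"{harmonic_hz} Hz: source {before:.2f}, filtered {after:.2f}")
```

In this sketch the two harmonics have equal strength in the source, while after filtering the harmonic near the resonance is much stronger. A real glottal pulse has a spectrum that falls with frequency, and a real vocal tract has several resonances.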
The lips
The teeth
The alveolar ridge
The palate
The velum
The tongue
The jaws
2.2 Sounds
The individual sounds of speech are known as phones, but these
sounds differ slightly according to their context.
For example, the words key and caw both begin with a k sound, but the
tongue position is not quite the same in each case.
A native speaker probably doesn't notice the difference. The two k
sounds are therefore said to belong to the same class of sounds, and we
label this class with the phoneme /k/. A phoneme, therefore, is an
abstract linguistic unit, defined as the smallest contrastive unit in
the language: the smallest unit that can distinguish one word from another.
The phones that are grouped together to form a phoneme
are called its allophones.
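The relationship between phones, allophones and phonemes can be pictured as a many-to-one mapping. The sketch below uses the key/caw example from the text; the phone labels `k_key` and `k_caw` are hypothetical names invented for this illustration, not standard phonetic notation.

```python
# Each phone (a concrete sound in context) maps to one phoneme (abstract class).
# The phone labels are hypothetical, made up for this example.
PHONEME_OF = {
    "k_key": "/k/",   # the k sound at the start of "key"
    "k_caw": "/k/",   # the k sound at the start of "caw"
    "t_tea": "/t/",   # a phone from a different phoneme, for contrast
}


def allophones_of(phoneme: str) -> list[str]:
    """Return all phones that are grouped under the given phoneme."""
    return sorted(phone for phone, ph in PHONEME_OF.items() if ph == phoneme)


def same_phoneme(phone_a: str, phone_b: str) -> bool:
    """Two phones are allophones of one phoneme if they map to the same class."""
    return PHONEME_OF[phone_a] == PHONEME_OF[phone_b]


print(allophones_of("/k/"))
print(same_phoneme("k_key", "k_caw"))
print(same_phoneme("k_key", "t_tea"))
```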
What distinguishes sounds?
Approximants
These form a class of sounds which lie on the border between
consonants and vowels. The vocal tract is only marginally constricted.
They can be sub-divided into liquids (l, r) and glides (w, y).
Laterals: l
Approximants: w r y
http://www.chass.utoronto.ca/~danhall/phonetics/sammy.html
Vowels
Diphthongs
ou as in pound    ia as in deer
ie as in tie      ur as in poor
oa as in boat     ei as in their
oi as in toy      ai as in make
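[Figure: Vowel chart. Columns: front, central, back. Rows: high, mid, low. ee is high front, uu is high back, and o is plotted between mid and low.]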
[Figure: Vowel chart. Horizontal axis: front, central, back. Vertical axis: high, mid-high, mid-low, low.]
Vowel /uu/ as in rude
Vowel /ar/ as in car
Vowel /ee/ as in feet (high, front, spread)
Wrong        /r/ /o/ /ng/
Moving       /m/ /uu/ /v/ /i/ /ng/
Southampton  /s/ /ou/ /th/ /aa/ /m/ /p/ /t/ /a/ /n/
Sixteen      /s/ /i/ /k/ /s/ /t/ /ee/ /n/
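The word transcriptions above can be held in a small pronunciation lexicon. The sketch below stores exactly the four example words and looks them up; the `LEXICON` and `transcribe` names are made up for this illustration, and a practical system would need a far larger dictionary plus rules for unknown words.

```python
# Pronunciation lexicon built from the four example transcriptions.
LEXICON = {
    "wrong": ["r", "o", "ng"],
    "moving": ["m", "uu", "v", "i", "ng"],
    "southampton": ["s", "ou", "th", "aa", "m", "p", "t", "a", "n"],
    "sixteen": ["s", "i", "k", "s", "t", "ee", "n"],
}


def transcribe(word: str) -> str:
    """Return the phoneme string for a word, e.g. '/r/ /o/ /ng/'."""
    phonemes = LEXICON[word.lower()]          # raises KeyError for unknown words
    return " ".join(f"/{p}/" for p in phonemes)


for word in ("Wrong", "Sixteen"):
    # Letter count and phoneme count need not match.
    print(word, len(word), "letters,", len(LEXICON[word.lower()]), "phonemes:",
          transcribe(word))
```

Note that "wrong" has five letters but only three phonemes, which is why transcription works from a lexicon or rules rather than letter by letter.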