MFCC Step
www.iosrjen.org
I. INTRODUCTION
Speech signals are naturally occurring signals and are therefore treated as random signals. These
information-carrying signals are functions of an independent variable, time. Speech recognition is the process of
automatically recognizing a word spoken by a particular speaker, based on information contained in the
voice sample. A speech signal conveys information about words, expression, style of speech, accent, emotion,
speaker identity, gender, age, the state of health of the speaker, etc. Speech recognition technology has
advanced considerably, but there is still wide scope for improvement. Speech-based devices find applications
in daily life and are especially beneficial to people with disabilities [3] [4], who may otherwise be
restricted in showing their talent and creativity. Speech-based devices can also be used as a security
measure to reduce cases of fraud and theft [7].
[Figure: block diagram of the speech recognition system. Blocks: Speech Sample, Denoising, Feature Extraction, Pattern matching, Output]
II. FEATURE EXTRACTION
[Figure: block diagram of MFCC feature extraction. Blocks: Speech input, Pre-Emphasis, Framing, Windowing, Fast Fourier Transform, Mel Filter bank, DCT, Delta Energy, Output]
Fig. 5: Pre-Emphasis
2. Frame blocking: The speech signal is segmented into short blocks of 20-30 ms, known as frames.
Each frame contains N samples, and the starts of adjacent frames are separated by M samples (M<N), so
consecutive frames overlap by N-M samples. Typical values are M=100 and N=256. Framing is required
because speech is a time-varying signal, but when examined over a sufficiently short period of time its
properties are fairly stationary. Short-time spectral analysis is therefore performed frame by frame.
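The frame blocking step can be sketched in Python as follows. This is a minimal illustration using only the standard library, not the authors' implementation. The function name `frame_signal` and the choice to drop trailing samples that do not fill a whole frame are assumptions; zero-padding the last frame is an equally common choice.

```python
def frame_signal(signal, frame_len=256, frame_shift=100):
    """Split a sequence of samples into overlapping frames.

    frame_len corresponds to N and frame_shift to M in the text (M < N).
    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    if not 0 < frame_shift < frame_len:
        raise ValueError("expected 0 < frame_shift < frame_len")
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(list(signal[start:start + frame_len]))
        start += frame_shift
    return frames


# Stand-in for real speech samples: the sample index itself.
samples = list(range(1000))
frames = frame_signal(samples)
print(len(frames), len(frames[0]))  # number of frames, samples per frame
```

With N=256 and M=100, consecutive frames share 156 samples, so each sample away from the signal edges contributes to two or three frames.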
3. Hamming Windowing: Each of the above frames is multiplied by a Hamming window in order to preserve
the continuity of the signal. Framing cuts the signal abruptly at the frame edges, and the window function
reduces this discontinuity. Spectral distortion is minimized by using the window to taper the voice sample
toward zero at both the beginning and the end of each frame.
Y (n) = X (n) * W (n)
where X (n) is the input frame, W (n) is the window function, and * denotes sample-by-sample multiplication.
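A minimal Python sketch of the windowing step is given below. The text does not state the window coefficients, so the usual Hamming definition w(n) = 0.54 - 0.46 cos(2πn/(N-1)) is assumed here, and the helper names `hamming_window` and `apply_window` are illustrative.

```python
import math


def hamming_window(n_samples):
    """Hamming window w(n) = 0.54 - 0.46 * cos(2*pi*n / (N - 1))."""
    if n_samples == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_samples - 1))
            for n in range(n_samples)]


def apply_window(frame, window):
    """Y(n) = X(n) * W(n): element-wise product of frame and window."""
    return [x * w for x, w in zip(frame, window)]


# A constant frame makes the shape of the window directly visible.
frame = [1.0] * 256
windowed = apply_window(frame, hamming_window(len(frame)))
print(windowed[0], max(windowed))
```

Under this definition the window falls to 0.08 at the frame edges rather than exactly to zero, which is still enough to suppress most of the edge discontinuity.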
Fast Fourier Transform: The FFT converts each frame from the time domain to the frequency domain. We
perform an FFT on each frame to obtain its magnitude frequency response. The output is a
spectrum, or periodogram.
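To illustrate this step, the following Python sketch implements a recursive radix-2 FFT with the standard library and returns the magnitude of the non-redundant bins. It assumes the frame length is a power of two (as with N=256); in practice a library FFT would normally be used instead. The function names are illustrative.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(frame):
    """Magnitudes of bins 0..N/2 of a real-valued frame."""
    spectrum = fft(frame)
    return [abs(c) for c in spectrum[:len(frame) // 2 + 1]]


# Test tone with exactly 8 cycles in a 256-sample frame.
n = 256
frame = [math.cos(2 * math.pi * 8 * i / n) for i in range(n)]
mags = magnitude_spectrum(frame)
peak_bin = max(range(len(mags)), key=mags.__getitem__)
print(peak_bin)
```

Only bins 0 to N/2 are kept because the spectrum of a real-valued frame is symmetric, so the remaining bins carry no additional magnitude information.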
Triangular band pass filters: We multiply the magnitude frequency response by a set of 20 triangular band
pass filters, spaced on the mel scale, in order to obtain a smooth magnitude spectrum. This also reduces the
number of features involved. The mel frequency corresponding to a linear frequency f (in Hz) is
Mel (f) = 1125 * ln (1 + f/700)
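The mel mapping and a bank of 20 triangular filters can be sketched in Python as below. The text does not give the sampling rate or the filter edges, so `sample_rate=8000`, equal spacing of the filter edges on the mel scale from 0 Hz to half the sampling rate, and unit peak height are all assumptions made for illustration. The function names are hypothetical as well.

```python
import math


def hz_to_mel(f):
    """Mel(f) = 1125 * ln(1 + f / 700), as given in the text."""
    return 1125.0 * math.log(1.0 + f / 700.0)


def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (math.exp(m / 1125.0) - 1.0)


def mel_filterbank(n_filters=20, n_fft=256, sample_rate=8000):
    """Triangular filters over FFT bins 0..n_fft/2 (sample_rate is assumed)."""
    n_bins = n_fft // 2 + 1
    mel_max = hz_to_mel(sample_rate / 2.0)
    # n_filters + 2 edge points, equally spaced on the mel scale.
    edges_hz = [mel_to_hz(mel_max * i / (n_filters + 1))
                for i in range(n_filters + 2)]
    # Edge positions expressed as fractional FFT bin indices.
    edges_bin = [hz * n_fft / sample_rate for hz in edges_hz]
    bank = []
    for j in range(1, n_filters + 1):
        left, center, right = edges_bin[j - 1], edges_bin[j], edges_bin[j + 1]
        filt = []
        for k in range(n_bins):
            if left < k <= center:
                filt.append((k - left) / (center - left))   # rising slope
            elif center < k < right:
                filt.append((right - k) / (right - center))  # falling slope
            else:
                filt.append(0.0)
        bank.append(filt)
    return bank


def apply_filterbank(mags, bank):
    """Weighted sum of the magnitude spectrum under each triangular filter."""
    return [sum(m * w for m, w in zip(mags, filt)) for filt in bank]


bank = mel_filterbank()
flat_spectrum = [1.0] * 129  # stand-in magnitude spectrum for a 256-point FFT
energies = apply_filterbank(flat_spectrum, bank)
print(len(energies))
```

Each frame's 129 magnitude values are reduced to 20 filter outputs, which is the reduction in feature size described above.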
Log energy: The energy within a frame can also be calculated and used as an additional feature alongside the MFCCs.
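A short Python sketch of the log energy feature follows. The text does not specify the logarithm base or how silent frames are handled, so the natural logarithm and a small floor value (`floor`, a hypothetical parameter) are assumed here.

```python
import math


def log_energy(frame, floor=1e-10):
    """Natural log of the sum of squared samples in one frame.

    Assumption: the energy is clamped to `floor` so that an all-zero
    (silent) frame does not produce log(0).
    """
    energy = sum(x * x for x in frame)
    return math.log(max(energy, floor))


print(log_energy([1.0, 1.0, 1.0, 1.0]))
print(log_energy([0.0, 0.0, 0.0, 0.0]))
```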
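Delta cepstrum: Further features can be added by calculating the time derivatives of (energy + MFCC),
which give the velocity and acceleration.
\[ \Delta C_m(t) = \frac{\sum_{\tau=-M}^{M} \tau \, C_m(t+\tau)}{\sum_{\tau=-M}^{M} \tau^{2}} \]
Here C_m(t) is the m-th feature of frame t, and M is the half-width of the derivative window (not the frame
shift used in frame blocking). The value of M is 2. If we add the velocity, the feature dimension is 26, twice
the 13 base features. If we add both velocity and acceleration, the feature dimension is 39.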
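The delta computation can be sketched in Python as a regression over the 2M+1 neighbouring frames, with the acceleration obtained by applying the same operation to the velocity. The handling of the first and last frames is not described in the text, so repeating the edge frames is an assumption, as are the function names. The 13-dimensional base vectors in the example are synthetic stand-ins.

```python
def delta(features, half_width=2):
    """Time derivative of each feature by regression over 2*half_width + 1 frames.

    features: list of frames, each a list of coefficients.
    Assumption: frames beyond the ends are replaced by the nearest edge frame.
    """
    n_frames = len(features)
    denom = sum(tau * tau for tau in range(-half_width, half_width + 1))
    deltas = []
    for t in range(n_frames):
        row = []
        for m in range(len(features[t])):
            num = 0.0
            for tau in range(-half_width, half_width + 1):
                idx = min(max(t + tau, 0), n_frames - 1)
                num += tau * features[idx][m]
            row.append(num / denom)
        deltas.append(row)
    return deltas


def add_deltas(features, half_width=2):
    """Append velocity and acceleration to every frame's base features."""
    velocity = delta(features, half_width)
    acceleration = delta(velocity, half_width)
    return [f + v + a for f, v, a in zip(features, velocity, acceleration)]


# Synthetic example: 10 frames of 13 features that grow by 1 per frame.
base = [[float(t)] * 13 for t in range(10)]
full = add_deltas(base)
print(len(full[0]))
```

For the linearly increasing features in this example, the velocity of an interior frame is 1 and its acceleration is 0, which matches the slope of the input.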
CONCLUSION
In this research we have successfully denoised the input sample, and while extracting the MFCC
coefficients we also took the delta energy function into consideration. We conclude that the number of
MFCC coefficients can be increased according to requirement: adding velocity and acceleration yields 39
MFCC coefficients. The MFCC feature extraction technique is effective and robust, allows the features to
be normalized as well, and is a popular technique for isolated word recognition in the English language.
Features were extracted based on the information contained in the speech signal, and the extracted
features were stored in a .mat file. In future work, we will use these extracted MFCC coefficients to
design a speaker-independent system.
REFERENCES
[1] S. Dhingra, G. Nijhawan and P. Pandit, Isolated Speech Recognition using MFCC and DTW,
International Journal of Advanced Research in Electrical, Electronics and Instrumentation
Engineering, 8(2), 2013.
[2] C. Poonkuzhali, R. Karthiprakash, S. Valarmathy and M. Kalamani, An Approach to Feature Selection
Algorithm based on Ant Colony Optimization for Automatic Speech Recognition, International Journal of
Advanced Research in Electrical, Electronics and Instrumentation Engineering, 11(2), 2013.
[3] V. Sharma and P. Sharma, Discrete and Continuous Mouse Motion using Vocal and Non-Vocal
Characteristics of Human Voice, International Journal of Computer Science and Engineering
Technology, 4, 2013.
[4] C. Ittichaichareon, S. Suksri and T. Yingthawornsuk, Speech Recognition using MFCC, International
Conference on Computer Graphics Simulation and Modeling, 2012.
[5] N.N. Lokhande, N.S. Nehe and P.S. Vikhe, MFCC based Robust Features for English Word Recognition,
IEEE, 2012.
[6] L. Muda, M. Begam and I. Elamvazuthi, Voice Recognition Algorithms using Mel Frequency Cepstral
Coefficient (MFCC) and Dynamic Time Warping (DTW) Techniques, Journal of Computing, 3(2), 2010.
[7] Anjali, A. Kumar and N. Birla, Voice Command Recognition System based on MFCC and DTW,
International Journal of Engineering Science and Technology, 2(12), 2010.