Voice Classification


The International Arab Journal of Information Technology, Vol. 10, No. 5, September 2013, p. 477

Gender Classification in Speech Recognition using Fuzzy Logic and Neural Network
Kunjithapatham Meena¹, Kulumani Subramaniam², and Muthusamy Gomathy³
¹ Shrimathi Indira Gandhi College, Bharathidhasan University, India
² Department of Computer Application, Shrimathi Indira Gandhi College, India
³ Department of Computer Science, Shrimathi Indira Gandhi College, India

Abstract: Gender classification is one of the most important processes in speech processing. It is usually based on pitch as the feature, since the pitch of female speakers is generally higher than that of male speakers, and most recent research works rely on this condition. However, some male speakers have a high pitch and some female speakers have a low pitch, and in such cases pitch-based classification does not produce the required result. To address this problem, we propose a new gender classification method that considers three features. The method uses fuzzy logic and a neural network to identify the gender of the speaker. A training dataset generated from the three features is used to train the fuzzy logic and the neural network. The mean of the outputs obtained from the fuzzy logic and the neural network is then calculated, and this value is compared with a threshold to identify the gender of the speaker. The implementation results show the performance of the proposed technique in gender classification.

Keywords: Gender classification, fuzzy logic, neural network, energy entropy, short time energy, zero crossing rate.

Received July 16, 2011; accepted December 30, 2011; published online August 5, 2012

1. Introduction

In modern civilized societies, speech is one of the most common methods of communication between humans [3]. Ideas formed in the mind of the speaker are communicated by speech in the form of words, phrases, and sentences by applying proper grammatical rules [4]. A speech signal can be represented by a speech production model in which speech is the outcome of passing a glottal excitation waveform through a time-varying linear filter that models the resonant characteristics of the vocal tract [12]. Classifying speech into voiced, unvoiced and silence (V/U/S) segments provides an elementary acoustic segmentation that is essential for speech processing [15]. Human speech is composed of a succession of individual sounds called phonemes, which are almost identical to the sounds of each letter of the alphabet [18].

Speech processing is the study of speech signals and of the various methods used to process them. It is employed in applications such as speech coding, speech synthesis, speech recognition and speaker recognition technologies [25]. Among these, speech recognition is the most important one. The main purpose of speech recognition is to convert the acoustic signal obtained from a microphone or a telephone into a set of words [13, 23]. In order to extract and determine the linguistic information conveyed by a speech wave, we have to employ computers or electronic circuits [6]. This process is performed in several applications such as security devices, household appliances, cellular phones, ATM machines and computers [14].

Gender classification is applied in many fields, for example speech recognition, speaker diarization, speaker indexing, annotation and retrieval of multimedia databases, synthesis, smart human-computer interaction, biometrics and social robots, and it is a difficult and challenging problem [10]. Physiological differences such as vocal fold thickness or vocal tract length, together with differences in speaking style, are partly the reason for gender-based differences in human speech [20, 26]. Normally the formant frequencies and the fundamental frequency (F0) are higher for female speakers, and the F0 differences between male and female groups are larger than the formant frequency differences [25]. For male speakers, speech qualities such as aggressiveness, body size, self-confidence, and assertiveness are related to low F0 [8].

In most previous research works, classification is performed by considering pitch as the feature, and there are certain limitations in relying on this feature alone. To solve this problem, we propose a new method for gender classification that uses three features together with fuzzy logic and a neural network.
The rest of the paper is structured as follows: the related works are briefly reviewed in section 2, and the proposed technique is detailed with mathematical models and illustrations in section 3. The implementation results are discussed in section 4, and section 5 concludes the paper.

2. Related Works

Some of the recent research works related to speech classification are discussed as follows.

Rakesh et al. [16] have proposed two different models built from several speech processing techniques and algorithms. One model produces the formant values of the voice sample and the other produces its pitch value. The gender-biased features and the pitch value of a speaker were extracted by employing these two models. The mean of the formants and pitch over all the samples of a speaker was calculated by a model with loops and counters, which generates the mean of formant 1 and the mean pitch value of the speaker. The speaker is classified as male or female with a nearest neighbor technique, by computing the Euclidean distance between these generated mean values and the mean values for males and females. The algorithm is implemented in real time using NI LabVIEW.

Rao and Prasad [17] have used the different time-varying glottal excitation components of speech for text-independent gender recognition studies. The excitation information in speech was represented by a Linear Prediction (LP) residual. They have used Hidden Markov Models (HMMs) to capture the gender-specific information in the excitation of different voiced speech. The decrease in the error during training and the close to 100% precision in identifying genders during the testing phase have shown that the continuous ergodic HMM can effectively capture the gender-specific information in the excitation component of the speech. In their gender identification study, they have also evaluated the effect of the size of the testing data on the gender recognition performance by using gender-specific features with various numbers of HMM states and mixture components. They have performed the gender recognition studies on the Texas Instruments and Massachusetts Institute of Technology (TIMIT) database.

Devi et al. [1] have discussed how background noise from noisy environments, for example car, bus, babble, factory, helicopter and street noise, reduces the performance of speech-processing systems such as speech coding and speech recognition. Classification of noise is therefore necessary to improve the performance of a speech recognition system. The selection of a good set of features that can efficiently separate the signals in the feature space is an important step in the design of a signal classification system, and noise classification is a crucial process for reducing the effect of environmental noise on speech processing tasks. They have proposed a fuzzy ARTMAP network and a modified fuzzy ARTMAP network to classify the various background noise signals. In addition, their experimental results were compared with both back propagation networks and a Radial Basis Function Network (RBFN).

Sedaaghi [19] has presented a comparative study of gender and age classification algorithms used on speech signals. Experimental results are reported for the Danish Emotional Speech database (DES) and the English Language Speech Database for Speaker Recognition (ELSDSR). The best classifier for gender and age classification of speech signals was identified by experimentally comparing different classifiers: the Bayes classifier using Sequential Floating Forward Selection (SFFS) for feature selection, Probabilistic Neural Networks (PNNs), Support Vector Machines (SVMs), the K-Nearest Neighbor (K-NN) classifier and the Gaussian Mixture Model (GMM). The study has shown that gender classification can be carried out with a precision of approximately 95% by using speech signals either from both genders or from male and female speakers individually.

Sigmund [21] has proposed an approach for automatic identification of gender in a short segment of normally spoken continuous speech. All the vowels were studied separately to observe which phonemes are useful for gender recognition, and two different simple identifiers based on selected Mel-Frequency Cepstral Coefficients (MFCCs) were evaluated. More than 90% accuracy is achieved for gender identification in short-time analysis (20 msec) by using the vowel phonemes. In particular, there was no error for the vowel “a”. For text-independent analysis, a speech duration of 500 msec is enough to recognize male/female speakers with an accuracy of more than 93%. Automatic assessment of a speaker’s gender by her/his voice is an important aspect of achieving high-quality dialogue systems.

Mahdi and Jafer [11] have suggested a wavelet-based algorithm for voiced and unvoiced classification of speech segments. The classification process involves two steps:

1. Statistical analysis of the energy-frequency distribution of the different speech signals by means of wavelet transform.
2. Evaluation of the short-time zero-crossing rate of the signal.

For each time segment of the pre-emphasized speech, they have also calculated the ratio of the average energy in the low-frequency wavelet sub bands in
comparison to that of the highest-frequency wavelet sub band by using a 4-level dyadic wavelet transform, and have then compared it to a pre-determined threshold. An experimentally confirmed criterion based on the result of this comparison was used to obtain the classification decision.

Silovsky and Nouza [22] have presented a set of methods to categorize the audio segments in a system for automatic transcription of broadcast programs. Their task is to decide:

1. Whether the segment should be labeled as speech or as non-speech and, in the former case,
2. Whether the talking person is one of the speakers in the database,
3. Or otherwise, which gender the speaker belongs to.

The result of the classification was used to extend the information obtained from the transcription system and also to improve the performance of the speech recognition module. Like other modern speaker recognition systems, their proposed method is based on GMM. Since the number of database speakers can be large, they have developed a method that accelerates the recognition process significantly.

In reviewing these recent works on the same problem, we find that most of them consider pitch as the feature, while some others consider other statistical features. For testing, some works use emotional speech, some use continuous/real time speech data, and others use speech datasets of arbitrary words.

3. Gender Classification using Fuzzy Logic and Neural Network

Gender classification plays a major role in speech processing. This technique is used to identify the gender of the speaker. There are various methods used for gender classification, but the major problem is that most of them depend on the pitch value. The pitch mainly depends on the frequency of the sound. Normally the pitch of a female speaker is high and the pitch of a male speaker is low. In some cases, however, the pitch of a male speaker is as high as that of a female, or the pitch of a female speaker is as low as that of a male. In this situation, classification using pitch will not produce appropriate results. Considering this drawback, we propose a new method for speech classification using three features, namely energy entropy, short time energy, and zero crossing rate. Initially the three feature values are computed and given as input to the fuzzy logic and the neural network individually, and each gives the percentage of male and female features as output. Then the mean value is taken, and gender classification is done using this value. The process followed in the proposed method is explained briefly in the sections below. We first discuss the features used in our method.

3.1. Feature Analysis for Speech Signals

Feature selection plays an important role in gender classification, which depends fully on the features selected in the proposed method. The three features used in our method are as follows:

• Short Time Energy (STE).
• Zero Crossing Rate (ZCR).
• Energy Entropy (EE).

Among these three features, the most important one is ZCR. These features are explained briefly in [2]. The basic operation of each of the three features is described below.

3.1.1. STE

The STE of a speech signal captures sudden increases in the energy of the signal. To compute STE, the signal is first split into s windows and a window function is applied to each window. The STE is calculated using the equation given below.

$$S = \sum_{r=-\infty}^{\infty} y(r)^2 \, h(s - r) \qquad (1)$$

By using the above equation the STE is calculated. From the testing results we have observed that the energy entropy output for males is low whereas for females it is high and continuous.

3.1.2. ZCR

The ZCR is the most important feature considered in our method. It is defined as the ratio of the number of time-domain zero crossings to the frame length. Equation 2 gives the formula to calculate the zero crossing rate.

$$Z = \frac{1}{2N} \sum_{i=1}^{N-1} \left| \operatorname{sgn}\{x(i)\} - \operatorname{sgn}\{x(i-1)\} \right| \qquad (2)$$

where sgn{x(i)} stands for the sign function, i.e.

$$\operatorname{sgn}\{x(i)\} = \begin{cases} 1, & x(i) > 0 \\ 0, & x(i) = 0 \\ -1, & x(i) < 0 \end{cases} \qquad (3)$$

By using the above equation the ZCR for each signal is calculated. From the testing results we observed that the ZCR for female speech is higher than that of male speech.

3.1.3. EE

EE in a speech signal is defined in terms of the sudden changes in the energy level of the signal. To calculate EE, initially the speech signal is split into k frames and then the normalized energy for each frame
is evaluated. The formula to calculate energy entropy is given below:

$$E = -\sum_{i=0}^{k-1} \sigma_i^2 \, \log_2(\sigma_i^2) \qquad (4)$$

where σ_i^2 is the normalized energy of the i-th frame.

By using the above equation the EE is computed. From the testing results we have observed that the energy entropy for males is low and distributed while for females it is high and remains for a short period.

The features used in our method are explained in the above sections. The next process is to identify the percentage of male and female features present in the given speech signal using fuzzy logic and a neural network.

3.2. Identifying Male and Female Feature using Fuzzy Logic

Fuzzy logic offers several unique properties that produce better results in many control problems [5]. Here it is used to calculate the percentage of male and female features present in the given speech signal. Generally fuzzy logic consists of three important steps: fuzzification, generation of fuzzy rules and defuzzification. In the fuzzification process the system data is converted into fuzzy data, for which a triangular membership function is used. The next process is generating the fuzzy rules. Figure 1 shows the structure of the fuzzy logic used in the proposed method, with 3 input variables and one output variable.

Figure 1. Structure of fuzzy used in our method.

3.2.1. Fuzzy Rules Generation

The inputs to our fuzzy logic are energy entropy (E), short time energy (S) and zero crossing rate (Z), and the output is the percentage of male and female features present in the given speech signal. The input variables are fuzzified into three sets, namely large, medium and small, and the output variable is fuzzified into three sets, namely male, female/male and female. In the female/male set the speech signal may belong to either a male or a female speaker. The fuzzy rules generated are shown in Table 1.

After generation of the fuzzy rules, the next step is to train the fuzzy logic by using the rules shown in Table 1. To train the fuzzy logic, training datasets have to be generated. The input training dataset is generated as {[Emax,Emin], [Smax,Smin], [Zmax,Zmin]}. After completion of training, the fuzzy logic obtained is ready for practical operation. In testing, if E, S and Z values are given as input to the fuzzy logic, it indicates whether the features belong to male or female.

Table 1. Fuzzy rules.

S. No  Fuzzy Rules for Gender Classification
1      if E=high and S=low and Z=low, then Male
2      if E=high and S=low and Z=medium, then Female/Male
3      if E=high and S=low and Z=high, then Female
4      if E=high and S=medium and Z=low, then Female/Male
5      if E=high and S=medium and Z=medium, then Female
6      if E=high and S=medium and Z=high, then Female
7      if E=high and S=high and Z=low, then Female
8      if E=high and S=high and Z=medium, then Female
9      if E=high and S=high and Z=high, then Female
10     if E=medium and S=low and Z=low, then Male
11     if E=medium and S=low and Z=medium, then Male
12     if E=medium and S=low and Z=high, then Female
13     if E=medium and S=medium and Z=low, then Female/Male
14     if E=medium and S=medium and Z=medium, then Female/Male
15     if E=medium and S=medium and Z=high, then Female/Male
16     if E=medium and S=high and Z=low, then Female/Male
17     if E=medium and S=high and Z=medium, then Female/Male
18     if E=medium and S=high and Z=high, then Female
19     if E=low and S=low and Z=low, then Male
20     if E=low and S=low and Z=medium, then Male
21     if E=low and S=low and Z=high, then Male
22     if E=low and S=medium and Z=low, then Male
23     if E=low and S=medium and Z=medium, then Male
24     if E=low and S=medium and Z=high, then Female/Male
25     if E=low and S=high and Z=low, then Female/Male
26     if E=low and S=high and Z=medium, then Female/Male
27     if E=low and S=high and Z=high, then Female

3.3. Identifying Male and Female Feature using Neural Network

The main aim of classification ANNs is to produce an exact output based on the input parameters [7]. Neural networks are used here to calculate the percentage of female and male features present in a given speech signal. Basically a neural network consists of three layers, namely the input layer, the hidden layer and the output layer. In our method the input layer has three variables, the hidden layer has n variables and the output layer has one variable. The inputs to the neural network are energy entropy, short time energy and zero crossing rate. The two stages of operation in the neural network are the training stage and the testing stage. For training of the neural network, a training dataset is generated. The input training dataset is generated as {[Emax,Emin], [Smax,Smin], [Zmax,Zmin]}. Figure 2 shows the structure of the neural network used in the proposed method, with 3 input variables and one output variable.
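Both the fuzzy logic and the neural network take the same three inputs E, S and Z. As a rough illustration of how such an input vector could be computed from raw samples, the sketch below follows Equations 1, 2 and 4. It is not the authors' MATLAB implementation: the function names, the rectangular window, the equal-length framing and the choice of normalizing the frame energies so that they sum to one are all assumptions made here for illustration.

```python
import math


def short_time_energy(frame, window=None):
    """Equation 1 on one finite frame: y(r)^2 weighted by a window h."""
    if window is None:
        # Assumption: rectangular window; the paper does not name one.
        window = [1.0] * len(frame)
    return sum(h * y * y for y, h in zip(frame, window))


def sign(value):
    """Equation 3: 1 for positive, 0 for zero, -1 for negative samples."""
    return (value > 0) - (value < 0)


def zero_crossing_rate(frame):
    """Equation 2: sign changes between neighbours, scaled by 1 / (2N)."""
    n = len(frame)
    if n == 0:
        return 0.0  # assumption: an empty frame has no crossings
    changes = sum(abs(sign(frame[i]) - sign(frame[i - 1])) for i in range(1, n))
    return changes / (2 * n)


def energy_entropy(signal, k):
    """Equation 4: entropy of the normalized energies of k frames."""
    size = len(signal) // k  # assumption: equal frames, remainder dropped
    energies = [
        sum(v * v for v in signal[i * size:(i + 1) * size]) for i in range(k)
    ]
    total = sum(energies)
    if total == 0:
        return 0.0  # silent signal: no energy to distribute
    entropy = 0.0
    for energy in energies:
        p = energy / total  # normalized energy of this frame
        if p > 0:  # the term 0 * log2(0) is taken as 0
            entropy -= p * math.log2(p)
    return entropy


def feature_vector(signal, k=10):
    """Return (E, S, Z), the input order used by the rules in Table 1."""
    return (
        energy_entropy(signal, k),
        short_time_energy(signal),
        zero_crossing_rate(signal),
    )
```

For example, a frame that changes sign on every sample, such as [1, -1, 1, -1], gives Z = 0.75 under Equation 2, because the sum runs over N - 1 neighbouring pairs but is divided by 2N.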
Figure 2. Structure of neural network used in proposed method.

3.3.1. Neural Network Training for Gender Classification

The steps for training the neural network are:

• Step 1: Initialize the input weight of each neuron.
• Step 2: Apply a training dataset to the network. Here E, S and Z are the inputs to the network and M/F is the output of the network.

$$M/F = \sum_{r=1}^{n} W_{2r1} \, y(r) \qquad (5)$$

where,

$$y(r) = \frac{1}{1 + \exp(-w_{11r} \cdot (E + S + Z))} \qquad (6)$$

Equations 5 and 6 represent the activation functions performed in the output and input layers respectively.

• Step 3: Adjust the weights of all neurons.
• Step 4: For each E, S and Z the corresponding male and female feature is computed.
• Step 5: Repeat the iteration process till the output reaches its least value.

After training is complete, the neural network is ready for practical applications. The next step is testing the neural network. During testing, a speech signal is given as input and the network provides the percentage of male and female features present in that signal.

3.4. Gender Classification for the Given Speech Signal

After training the fuzzy logic and the neural network, the next process is to identify the gender of the speaker. The initial step is to compute the mean value of the outputs obtained from the fuzzy logic and the neural network. The mean value is calculated using the equation given below:

$$S_{final} = \frac{S_{fuzzy} + S_{NN}}{2} \qquad (7)$$

where S_fuzzy is the output generated by the fuzzy logic and S_NN is the output obtained from the neural network.

After calculating this mean value, the speech signal is classified as male or female using a threshold value.

$$classification = \begin{cases} \text{female}, & S_{final} < S_{threshold} \\ \text{male}, & S_{final} > S_{threshold} \end{cases} \qquad (8)$$

The above equation determines the gender to which the speaker belongs. The threshold used in our method is 0.5.

4. Result and Discussions

The proposed technique was implemented in MATLAB 7.10 and tested on different speech signals from the Harvard-Haskins database [24]. Here 80 speech signals are taken as input and split into four datasets. Initially the neural network and the fuzzy logic are trained by using some of the speech signals, and testing is performed by giving a set of speech signals as input to the proposed method so that it identifies the gender of the speaker. The results of the proposed technique, i.e., the combination of fuzzy logic and neural network, are compared with Fuzzy Logic (FL) alone, Neural Network (NN) alone, Naive Bayes (NB) and classification using pitch as the feature. From the comparison results, it is clear that our method is better than the other methods.

The performance of the proposed method, fuzzy logic and neural network is explained in the sections below.

4.1. Performance Analysis of Gender Classification

The True Positive (TP), True Negative (TN), False Positive (FP) and False Negative (FN) values are calculated from the results obtained for the proposed method, fuzzy logic and neural network. These four values are used to compute performance parameters such as false positive rate (α), false negative rate (β), sensitivity (SE), specificity (SP), Likelihood Ratio Positive (LRP), Likelihood Ratio Negative (LRN), Accuracy (Acc) and Precision (Pre) using the equations given below:

$$SP = \frac{TN}{FP + TN} \qquad (9)$$

$$SE = \frac{TP}{TP + FN} \qquad (10)$$

$$\alpha = \frac{FP}{FP + TN} \qquad (11)$$

$$\beta = \frac{FN}{TP + FN} \qquad (12)$$

$$LRP = \frac{Sensitivity}{1 - Specificity} \qquad (13)$$

$$LRN = \frac{1 - Sensitivity}{Specificity} \qquad (14)$$

$$Acc = \frac{TP + TN}{TP + FP + TN + FN} \qquad (15)$$
$$Pre = \frac{TP}{TP + FP} \qquad (16)$$

Using the above equations, the performance of the proposed method, FL, NN, NB and classification using pitch is calculated, and the values obtained are displayed in the table below for all four datasets. Here we have tested 80 speech signals, which are divided into 4 datasets with 20 signals each.

Table 2. Performance analysis.

Measure  Data Set  Proposed Method  FL   NN     NB    Using Pitch
SP       1         0.8              1    0.8    0     0.1
SP       2         0.8              1    0.8    0     0.2
SP       3         0.8              1    0.8    0     0.1
SP       4         0.7              1    0.7    1     0.2
SE       1         0.5              0    0.5    0.5   1
SE       2         0.4              0    0.4    1     1
SE       3         0.3              0    0.3    1     1
SE       4         0.3              0    0.3    0     1
TP       1         5                0    5      5     10
TP       2         4                0    4      10    10
TP       3         3                0    3      10    10
TP       4         3                0    3      0     10
TN       1         8                10   8      0     1
TN       2         8                10   8      0     2
TN       3         8                10   8      0     1
TN       4         7                10   7      10    2
FP       1         2                0    2      10    9
FP       2         2                0    2      10    8
FP       3         2                0    2      10    9
FP       4         3                0    3      0     8
FN       1         5                10   5      5     0
FN       2         6                10   6      0     0
FN       3         7                10   7      0     0
FN       4         7                10   7      10    0
α        1         0.2              0    0.2    1     0.9
α        2         0.2              0    0.2    1     0.8
α        3         0.2              0    0.2    1     0.9
α        4         0.3              0    0.3    0     0.8
β        1         0.5              1    0.5    0.5   0
β        2         0.6              1    0.6    0     0
β        3         0.7              1    0.7    0     0
β        4         0.7              1    0.7    1     0
LRP      1         2.5              0    2.5    0.5   1.11
LRP      2         2                0    2      1     1.25
LRP      3         1.5              0    1.5    1     1.11
LRP      4         1                0    1      0     1.25
LRN      1         0.625            1    0.625  0     0
LRN      2         0.75             1    0.75   0     0
LRN      3         0.875            1    0.875  0     0
LRN      4         1                1    1      1     0
Acc      1         0.65             0.5  0.55   0.25  0.5
Acc      2         0.6              0.5  0.6    0.5   0.55
Acc      3         0.55             0.5  0.5    0.5   0.5
Acc      4         0.5              0.5  0.5    0.5   0.6
Pre      1         0.714            0    0.714  0.33  0.526
Pre      2         0.667            0    0.667  0.5   0.55
Pre      3         0.6              0    0.6    0.5   0.526
Pre      4         0.5              0    0.5    0     0.55

Table 2 shows the performance of the proposed method and of the other methods (fuzzy logic, neural network, Naive Bayes and using pitch) for the various performance parameters. From the table it is clear that the accuracy of the proposed method is much better than that of fuzzy logic, neural network, Naive Bayes and using pitch. The graphs of the accuracy, specificity, sensitivity and precision values obtained from each dataset for the proposed method, fuzzy logic, neural network, Naive Bayes and using pitch are shown below.

Figure 3. Comparison graph for accuracy vs dataset.

Figure 4. Comparison graph for specificity vs dataset.

Figure 5. Comparison graph for sensitivity vs dataset.

Figure 6. Comparison graph for precision vs dataset.

Figures 3, 4, 5, and 6 show the accuracy, specificity, sensitivity and precision vs dataset graphs respectively for the proposed method, fuzzy logic, neural network, Naive Bayes and using pitch. From the above graphs it is clear that the proposed method is better than the other methods.

Figure 7 shows the membership function used for training the fuzzy logic, and Figures 8, 9 and 10 show the performance, regression and training graphs obtained during the training of the neural network respectively.
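The decision rule of Equations 7 and 8 and the measures of Equations 9 to 16 are simple enough to trace by hand. The sketch below is one possible way to do so and is not taken from the authors' code. The function names are hypothetical, the tie case (S_final equal to the threshold) is not specified by Equation 8 and returns None here, and any ratio with a zero denominator is also returned as None rather than as a number.

```python
def classify(s_fuzzy, s_nn, threshold=0.5):
    """Equations 7 and 8: average the two outputs, then threshold."""
    s_final = (s_fuzzy + s_nn) / 2
    if s_final < threshold:
        return "female"
    if s_final > threshold:
        return "male"
    return None  # assumption: Equation 8 leaves the tie case open


def ratio(numerator, denominator):
    """Divide, returning None when the ratio is undefined."""
    return None if denominator == 0 else numerator / denominator


def performance(tp, tn, fp, fn):
    """Equations 9 to 16 computed from the four confusion counts."""
    sp = ratio(tn, fp + tn)
    se = ratio(tp, tp + fn)
    have_both = sp is not None and se is not None
    return {
        "SP": sp,
        "SE": se,
        "alpha": ratio(fp, fp + tn),
        "beta": ratio(fn, tp + fn),
        "LRP": ratio(se, 1 - sp) if have_both else None,
        "LRN": ratio(1 - se, sp) if have_both else None,
        "Acc": ratio(tp + tn, tp + fp + tn + fn),
        "Pre": ratio(tp, tp + fp),
    }


# Counts reported in Table 2 for the proposed method on dataset 1.
measures = performance(tp=5, tn=8, fp=2, fn=5)
print({name: round(value, 3) for name, value in measures.items()})
```

With these counts the sketch gives SP = 0.8, SE = 0.5, α = 0.2, β = 0.5, LRP = 2.5, LRN = 0.625, Acc = 0.65 and Pre = 0.714, which agree with the dataset 1 entries for the proposed method in Table 2.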
Figure 7. Fuzzy membership function used in proposed method.

Figure 8. Performance graph obtained during neural network training.

Figure 9. Regression graph obtained during neural network training.

Figure 10. Training graph obtained during neural network training.

5. Conclusions

In this paper, a novel gender classification technique in speech processing using neural network and fuzzy logic was proposed. In this technique gender classification is performed by considering three different features, namely energy entropy, short time energy and zero crossing rate. First, mean values are calculated for the three features by using the training dataset, and the percentage of male and female features present in the speech signal is computed using fuzzy logic and neural network individually. The mean of the two outputs is then taken to identify the gender of the speaker. This approach was implemented in MATLAB for testing, and the proposed method was tested using the Harvard-Haskins database. During testing, if a speech signal is given as input, the method identifies the gender to which the speaker belongs. The results obtained from the proposed method are compared with fuzzy logic, neural network, Naive Bayes and using pitch as the feature. The comparison results have shown that our method is better than the other methods in gender classification.

Reference

[1] Devi M., Kasthuri N., and Natarajan A., “Performance Comparison of Noise Classification using Intelligent Networks,” International Journal of Electronics Engineering, vol. 2, no. 1, pp. 49-54, 2010.
[2] Gomathy M., Meena K., and Subramaniam K., “Gender Grouping in Speech Recognition using Statistical Metrics of Pitch Strength,” European Journal of Scientific Research, vol. 61, no. 4, pp. 524, 2011.
[3] Gudi A., Shreedhar H., and Nagaraj H., “Signal Processing Techniques to Estimate the Speech Disability in Children,” IACSIT International Journal of Engineering and Technology, vol. 2, no. 2, pp. 169-176, 2010.
[4] Gudi A. and Nagaraj H., “Optimal Curve Fitting of Speech Signal for Disabled Children,” International Journal of Computer Science & Information Technology, vol. 1, no. 2, pp. 99-107, 2009.
[5] Haider T. and Yusuf M., “A Fuzzy Approach to Energy Optimized Routing for Wireless Sensor Network,” The International Arab Journal of Information Technology, vol. 6, no. 2, pp. 179-188, 2009.
[6] Haraty R. and Ariss O., “CASRA+: A Colloquial Arabic Speech Recognition Application,” American Journal of Applied Sciences, vol. 4, no. 1, pp. 23-32, 2007.
[7] Haraty H. and Ghaddar C., “Arabic Text Recognition,” The International Arab Journal of Information Technology, vol. 1, no. 2, pp. 156-163, 2004.
[8] Hasegawa Y. and Hata K., “Non-Physiological Differences between Male and Female Speech: Evidence from the Delayed F0 fall Phenomenon in Japanese,” in Proceedings of the International Conference on Spoken Language Processing, Japan, pp. 1179-82, 1994.
[9] Hasegawa Y. and Hata K., “The Function of F0-Peak Delay in Japanese,” in Proceedings of the 21st Annual Meeting of the Berkeley Linguistics Society, pp. 141-151, 1995.
[10] Kotti M. and Kotropoulos C., “Gender Classification in Two Emotional Speech Databases,” in Proceedings of the 19th International Conference on Pattern Recognition, Tampa, pp. 1-4, 2008.
[11] Mahdi A. and Jafer E., “Two-Feature Voiced/Unvoiced Classifier Using Wavelet Transform,” The Open Electrical and Electronic Engineering Journal, vol. 2, no. 1874-3005, pp. 8-13, 2008.
[12] McAulay R. and Quatieri T., “Speech Processing Based on a Sinusoidal Model,” The Lincoln Laboratory Journal, vol. 1, no. 2, pp. 153-168, 1988.
[13] Othman A. and Riadh M., “Speech Recognition using Scaly Neural Networks,” in Proceedings of World Academy of Science, Engineering and Technology, vol. 38, pp. 253-258, 2008.
[14] Patel I. and Rao S., “Speech Recognition using HMM with MFCC- an Analysis using Frequency Specral Decomposion Technique,” Signal & Image Processing : An International Journal, vol. 1, no. 2, pp. 101-110, 2010.
[15] Qi Y. and Hunt B., “Voiced-Unvoiced-Silence Classifications of Speech using Hybrid Features and a Network Classifier,” IEEE Transactions on Speech and Audio Processing, vol. 1, no. 2, pp. 250-255, 1993.
[16] Rakesh K., Dutta S., and Shama K., “Gender Recognition using Speech Processing Techniques in LABVIEW,” International Journal of Advances in Engineering & Technology, vol. 1, no. 2, pp. 51-63, 2011.
[17] Rao R. and Prasad A., “Glottal Excitation Feature Based Gender Identification System using Ergodic HMM,” International Journal of Computer Applications, vol. 17, no. 3, pp. 31-36, 2011.
[18] Rodger J. and Pendharkar P., “A Field Study of the Impact of Gender and User’s Technical Experience on the Performance of Voice-Activated Medical Tracking Application,” International Journal of Human-Computer Studies, vol. 60, no. 2, pp. 529-544, 2004.
[19] Sedaaghi M., “A Comparative Study of Gender and Age Classification in Speech Signals,” Iranian Journal of Electrical & Electronic Engineering, vol. 5, no. 1, pp. 1-12, 2009.
[20] Shue Y. and Iseli M., “The Role of Voice Source Measures on Automatic Gender Classification,” in Proceedings of IEEE International Conference on Acoustics, Speech and Signal Processing, Las Vegas, pp. 4493-4496, 2008.
[21] Sigmund M., “Gender Distinction using Short Segments of Speech Signal,” International Journal of Computer Science and Network Security, vol. 8, no. 10, pp. 159-162, 2008.
[22] Silovsky J. and Nouza J., “Speech, Speaker and Speaker’s Gender Identification in Automatically Processed Broadcast Stream,” Radio Engineering Journal, vol. 15, no. 3, pp. 42-48, 2006.
[23] Singh G., Junghare A., and Chokhani P., “Multi Utility E-Controlled Cum Voice Operated Farm Vehicle,” International Journal of Computer Applications, vol. 1, no. 13, pp. 109-113, 2010.
[24] Vesicle, available at: http://vesicle.nsi.edu/users/patel/download.html, last visited 2002.
[25] Zanuy F., McLaughlin S., Esposito A., Hussain A., Schoentgen J., Kubin G., Kleijn W., and Maragos P., “Non-Linear Speech Processing: Overview and Applications, Control & Intelligent Systems,” ACTA Press, vol. 30, no. 1, pp. 1-10, 2002.
[26] Zengi Y., Wu Z., Falk T., and Chan W., “Robust GMM Based Gender Classification using Pitch and Rasta-PLP Parameters of Speech,” in Proceedings of the 5th International Conference on Machine Learning and Cybernetics, Dalian, pp. 13-16, 2006.

Kunjithapatham Meena received her MSc, M.Phil, ME in computer science and engineering, MIE, PhD. She is the vice-chancellor of Bharathidhasan University. She is the principal and director MBA and MCA of Shrimathi Indira Gandhi College, Trichirapalli. She has rich experience in the development of software tools for the assessment of specially abled children. She also provides consultancy for organizing specific programmes for creating awareness/literacy about the computer and information technology among specific cross-sections of the society (Co-ordinator of the novel project IT ON WHEELS-from Lab to Land), and provides counseling for higher education, career placement and training.

Kulumani Subramaniam received his BSc, MSc degree in maths, MA (English), MEd, MSc (IT) and PhD degree in maths and computer applications from Madras, Annamalai University, Madurai and Bharathidhasan Universities, Tamil Nadu, India in the years 1966, 1969, 1982, 1977, 1983, 2009 and 2003 respectively. From 1969 to 2007 he has been an educationist for mathematics, English, educational technology and computer applications as lecturer and professor. He has headed the Department of Master of Computer Applications, Shrimathi Indira Gandhi College, Trichy-2, from 2007 to 2010.
Muthusamy Gomathy received her BSc (Chemistry) degree from Holy Cross College in the year 1998. She completed her MSIT degree in the year 2001 from Shrimathi Indira Gandhi College, Trichy, Bharathidhasan University. She then completed her M.Phil degree from St.Josephs College, Trichy in the year 2002. From 2003 to 2010, she has been an educationist for computer applications and information technology as lecturer and professor.
