Boosted Convolutional Neural Network For Real Time Facial Expression Recognition

Mahamat Nour Ali Mai (mahamatnouralimai@gmail.com)
Ilham Fadhillah Amka (ilhamfadhillahamka1@gmail.com)
Assoc. Prof. Amelia Ritahani Ismail (amelia@iium.edu.my)

Abstract—Facial expression recognition systems have attracted research interest in the field of artificial intelligence. Facial expression is an important channel for human communication and can be applied in many real-life applications. For this project we developed a facial expression recognition system implemented using a Convolutional Neural Network (CNN). The CNN model is based on combinations of different activation functions chosen to improve performance. The Kaggle facial expression dataset, with seven facial expression labels (happy, sad, surprise, fear, anger, disgust, and neutral), is used in this project. The best combination of activation functions achieved 95.23% accuracy on the training set and 52% on the test set.

Index Terms—Facial Expression Recognition, Machine Learning, Deep Learning, Convolutional Neural Network, Boosted Convolutional Neural Network, Computer Vision.

I. INTRODUCTION

Facial behavior is one of the most important cues for sensing human emotion and intention. Facial expression contains crucial information about human appearance and human activity. It plays an important role in applications of human-centered computing, such as human-machine interfaces, human emotion analysis, and medical care [1]. Facial expression, which plays a vital role in social interaction, is one of the most important nonverbal channels through which human machine interaction (HMI) systems can recognize humans' internal emotions. The face is the most distinctive and widely used key to a person's identity. Face detection and facial feature extraction have attracted considerable attention in the advancement of human-machine interaction, as they provide a natural and efficient way for humans and machines to communicate. The problem of detecting faces and facial parts in image sequences has become a popular area of research due to emerging applications in human-computer interfaces, surveillance systems, secure access control, video conferencing, financial transactions, and forensic applications. Facial emotions are created as a result of distortions of facial features due to the constriction of facial muscles. Facial expressions are inspected to recognize the basic human emotions: anger, disgust, fear, happy, sad, surprise, and neutral.

Many established facial expression recognition (FER) systems use standard machine learning and extracted features, which do not perform well when applied to previously unseen data [2], [3], [4], [5]. Within the past few months, a few papers have been published that use deep learning for FER and have achieved 0.60 accuracy, but without the success of the boosted convolutional neural network (BCNN). Our aim is to train weak classifiers and combine them into a stronger classifier to obtain the highest performance.

With the power of today's computers and current breakthroughs in technology, various methods and algorithms have been developed that enable a machine to perform tasks such as face detection with emotion features. In this paper, the objective is to develop real-time facial expression recognition using a boosted convolutional neural network, in which weak classifiers are combined to form a strong classifier. Facial images are classified into seven facial expression labels, namely angry, fear, disgust, happy, sad, surprise, and neutral. The Kaggle dataset (FER2013) is used to train and test the classifier.

II. LITERATURE REVIEW

In the current study, the seven states of facial expression are recognized using convolutional neural networks [6], [7], [8], which perform the three steps of feature learning, selection, and classification simultaneously. Training networks with more than two layers was a difficult problem in the last decade; with the progress of GPUs, it is now possible to train neural networks with many layers. A convolutional neural network has three alternating types of layers: convolutional, sub-sampling, and fully connected layers.

A. Convolutional Neural Network

The convolutional neural network is one of the most important deep learning techniques in machine vision and image recognition. Convolutional neural networks were inspired by research on the visual cortex of mammals and how they perceive the world using a layered architecture of neurons in the brain. Think of this model of the visual cortex as groups of neurons designed specifically to recognize different shapes. Each group of neurons fires at the sight of an object, and the groups communicate with each other to develop a holistic understanding of the perceived object. Convolutional neural networks (CNNs) [9], which are composed of multiple processing layers that learn representations of data at multiple levels of abstraction, are the most successful machine learning models of recent years. However, these models can have millions of parameters and many layers, which makes them difficult to train, and sometimes several days or weeks are required to tune the parameters.

A convolutional neural network includes six components: the convolutional layer, sub-sampling layers, the rectified linear unit (ReLU), the fully connected layer, the output layer, and the softmax layer [10].

1) Convolutional Layer: A convolutional layer is determined by the number of generated maps and the kernel size. The kernel is moved over the valid area of the given image (performing a convolution) to generate each map.

2) Sub-sampling layers: Sub-sampling layers in a CNN reduce the map size of the previous layer in order to increase the invariance of the kernels. There are two types of sub-sampling, average pooling and max-pooling [10]. Max-pooling applies the maximum function, so each output value is the maximum of the inputs within its pooling window.

3) Rectified linear unit: A rectified linear unit is an activation function that is simply thresholded at zero: $f(x) = \max(0, x)$. ReLU has advantages over the tanh and sigmoid functions: it can be implemented by simple thresholding at zero, whereas tanh and sigmoid require expensive operations such as exponentials. ReLU also prevents the gradient error from being lost and greatly accelerates the convergence of stochastic gradient descent compared with the tanh and sigmoid functions.

4) Fully connected layer: Fully connected layers are similar to the layers of general neural networks: each neuron is fully connected to every neuron in the prior layer. If $x$ is the input of size $k$ and $l$ is the number of neurons in the fully connected layer, the layer can be calculated as $f(x) = \sigma(Wx)$, where $W$ is the $l \times k$ weight matrix and $\sigma$ is the activation function.

5) Output layer: The output layer represents the class of the input image, and its size equals the number of classes. The output vector $x$ produces the resulting class as $C(x) = \arg\max_i x_i$.

6) Softmax layer: The error of the network is propagated back through a softmax layer. If $N$ is the size of the input vector, softmax computes a mapping $S(x): \mathbb{R}^N \to [0, 1]^N$, and each component of the softmax layer is calculated as follows:

$$S(x)_j = \frac{e^{x_j}}{\sum_{i=1}^{N} e^{x_i}}$$

where $1 \le j \le N$.

III. METHODOLOGY

The main aim of this paper is to implement an efficient method to detect the face and the emotion of a person in real time and to improve performance.

A. A Brief Review of CNNs

A CNN consists of many layers, such as convolutional layers, pooling layers, rectification layers, and fully connected (FC) layers. Convolutional layers are comprised of filters and feature maps: the filters are the neurons of the layer, and a feature map is the output of one filter applied to the previous layer. Pooling layers follow a sequence of one or more convolutional layers and are intended to consolidate the features learned and expressed in the previous layer's feature map. Fully connected layers are normal flat feed-forward neural network layers. These layers may have a non-linear activation function or a softmax activation in order to output the probabilities of class predictions.

B. Dataset and Features

In this project, we used the facial expression challenge dataset provided by Kaggle, which consists of about 35,500 well-structured 48x48 pixel grayscale images of faces. The images are processed so that the faces are almost centered and each face occupies about the same amount of space in each image. Each image has to be categorized into one of seven classes that express different facial emotions: 0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, and 6=Neutral. Figure 1 depicts one example for each facial expression category. In addition to the image class number (a number between 0 and 6), the images are divided into three sets: training, validation, and test. There are about 28,500 training images, 3,500 validation images, and 3,500 test images. After reading the raw pixel data, we normalized the images by subtracting the mean of the training images from each image, including those in the validation and test sets.

IV. ANALYSIS

We built and trained our model as a normal convolutional neural network (CNN). This network had four convolutional layers and one FC layer. In the first convolutional layer, we had 32 3x3 filters with a stride of 1, along with batch normalization, dropout, and 2x2 max pooling, and we used the rectified linear unit (ReLU) as the activation function.
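
As a rough illustration of the operations in one convolutional block, the sketch below applies a single 3x3 kernel with stride 1 over the valid area of an image, then ReLU, then 2x2 max pooling. It is written in plain Python for readability and is not the implementation used in this project: the function names are hypothetical, only one filter is shown, no padding is assumed, and batch normalization and dropout are left out.

```python
# Illustrative sketch only: one filter of a convolutional block
# (valid convolution -> ReLU -> 2x2 max pooling). Function names are
# hypothetical; no padding is assumed; batch norm and dropout are omitted.

def conv2d_valid(image, kernel, stride=1):
    """Slide `kernel` over the valid area of `image` (both lists of lists)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = (len(image) - kh) // stride + 1
    out_w = (len(image[0]) - kw) // stride + 1
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            acc = 0.0
            for a in range(kh):
                for b in range(kw):
                    acc += kernel[a][b] * image[i * stride + a][j * stride + b]
            row.append(acc)
        out.append(row)
    return out


def relu(feature_map):
    """Threshold every value at zero."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def max_pool(feature_map, size=2):
    """Keep the maximum of each non-overlapping size x size window."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    out = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            window = [feature_map[i * size + a][j * size + b]
                      for a in range(size) for b in range(size)]
            row.append(max(window))
        out.append(row)
    return out


def conv_block(image, kernel):
    """Convolution, then ReLU, then 2x2 max pooling."""
    return max_pool(relu(conv2d_valid(image, kernel)))


if __name__ == "__main__":
    # Toy 6x6 image with values 0..35 and a kernel that copies the center pixel.
    image = [[float(r * 6 + c) for c in range(6)] for r in range(6)]
    kernel = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
    print(conv_block(image, kernel))  # [[14.0, 16.0], [26.0, 28.0]]
```

Under the no-padding assumption, a 48x48 input yields a 46x46 map after the 3x3 convolution and a 23x23 map after 2x2 pooling. A padded convolution would keep the map at 48x48 before pooling, so the actual sizes depend on a choice the text does not state.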

In the second convolutional layer, we had 64 3x3 filters with a stride of 1, along with batch normalization, dropout, and max-pooling with a 2x2 filter, and the same activation function as the previous layer. In the third convolutional layer, we had 128 5x5 filters. In the fourth layer, we had 256 3x3 filters. In the FC layer, we had a hidden layer with 512 neurons and softmax as the loss function. We trained this network on all of the images in the training set for 15 epochs with a batch size of 256, and cross-validated the hyper-parameters of the model with different values for regularization, learning rate, and the number of hidden neurons. To validate our model in each iteration we used the validation set, and to evaluate the performance of the model we used the test set. This model achieved an accuracy of 95% on the training set, 55% on the validation set, and 52% on the test set.

To compare performance, we trained the same network with different combinations of activation functions. We first trained the model with the hyperbolic tangent (tanh) activation function and sigmoid as the output and loss function of the neurons; with this first combination we achieved very low accuracy: 52% on the training set, 24% on the validation set, and 21% on the test set. We then tried other combinations, such as ReLU with sigmoid and tanh with ReLU; overall, none of these combinations achieved higher performance. The best combination was therefore ReLU with softmax as the output and loss function, which achieved an accuracy of 95% on the training set, 55% on the validation set, and 52% on the test set, and we considered it the best combination for our network.

Fig. 1. Examples of seven facial emotions that we consider in this classification problem. (a) angry, (b) neutral, (c) sad, (d) happy, (e) surprise, (f) fear, (g) disgust

V. RESULTS

To compare the performance of the model with different combinations of activation functions, we plotted the loss history and the accuracy obtained by these models. Figures 4 and 5 exhibit the results. As seen in Figure 4, the combination of ReLU and softmax enabled us to increase the validation accuracy by 31.50% compared to the other combination. Furthermore, the ReLU and softmax combination reduced the overfitting behavior of the learning model by adding more non-linearity and through the hierarchical use of anti-overfitting techniques such as dropout and batch normalization, in addition to L2 regularization. Moreover, we computed the confusion matrices for both models. Figures 2 and 3 present the visualization of the confusion matrices. As demonstrated in these figures, the combination of ReLU and softmax results in more true predictions for most of the labels, such as the happy, sad, angry, and neutral labels. However, the combination of ReLU and sigmoid shows poor results for all labels.

Fig. 2. Confusion Matrix with the combination of ReLU and SoftMax

Fig. 3. Confusion Matrix with the combination of ReLU and Tanh
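
For readers who want to reproduce this kind of evaluation, the sketch below shows one way a confusion matrix, overall accuracy, and per-class recall could be computed from true and predicted class indices. It assumes the 0 to 6 label encoding given in the dataset description; the function names and the toy labels in the example are hypothetical and are not results from this project.

```python
# Illustrative sketch only: confusion matrix and simple metrics for the
# seven-class problem. Function names and example data are hypothetical.

LABELS = ["Angry", "Disgust", "Fear", "Happy", "Sad", "Surprise", "Neutral"]


def confusion_matrix(y_true, y_pred, num_classes=len(LABELS)):
    """matrix[t][p] counts samples of true class t predicted as class p."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def accuracy(matrix):
    """Fraction of samples on the diagonal (correct predictions)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


def per_class_recall(matrix):
    """For each true class, the fraction of its samples predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0
            for i, row in enumerate(matrix)]


if __name__ == "__main__":
    # Made-up labels, used only to show the calling convention.
    y_true = [3, 3, 4, 0, 6, 6]
    y_pred = [3, 4, 4, 0, 6, 3]
    cm = confusion_matrix(y_true, y_pred)
    for name, recall in zip(LABELS, per_class_recall(cm)):
        print(name, recall)
    print("accuracy:", accuracy(cm))
```

Reading the matrix row by row shows which expressions are confused with which; for example, a large count in row Sad, column Neutral would mean that sad faces are often predicted as neutral.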

Fig. 4. Loss and Accuracy of training and validation of the model with combination of ReLU and SoftMax

Fig. 5. Loss and Accuracy of training and validation of the model with combination of ReLU and Tanh

VI. CONCLUSION

We developed various CNN models with many different combinations of activation functions for a facial expression recognition problem and evaluated their performance using different post-processing and visualization techniques. The results demonstrated that a CNN model with the combination of ReLU and softmax activation functions is capable of learning facial characteristics and improving facial expression detection.

VII. FUTURE WORK

In future work on this project, we would like to boost the CNN using XGBoost to see the difference in performance compared with a normal CNN.

REFERENCES

[1] Z. Liu, H. Wang, Y. Yan, and G. Guo. Effective Facial Expression Recognition via the Boosted Convolutional Neural Network. Springer-Verlag, Berlin Heidelberg, 2015.
[2] A. Raghuvanshi and V. Choksi. Facial Expression Recognition with Convolutional Neural Networks. Stanford University, 2016.
[3] D. Yang, T. Kunihiro, H. Shimoda, and H. Yoshikawa. A Study of Real-time Image Processing Method for Treating Human Emotion by Facial Expression. IEEE SMC’99 Conference Proceedings. 1999 IEEE International Conference on Systems, Man, and Cybernetics (Cat. No.99CH37028), Tokyo, Japan, 1999, pp. 360-364 vol. 2. doi: 10.1109/ICSMC.1999.825285
[4] J. Zhu and Z. Chen. Real Time Face detection System Using Adaboost and Haar-like Features. Shanghai, 2015, pp. 404-407. doi: 10.1109/ICISCE.2015.95
[5] Y. Wang, H. Ai, B. Wu, and C. Huang. Real Time Facial Expression Recognition with Adaboost. Proceedings of the 17th International Conference on Pattern Recognition, 2004. ICPR 2004., Cambridge, 2004, pp. 926-929 vol. 3. doi: 10.1109/ICPR.2004.1334680
[6] S. Alizadeh and A. Fazel. Convolutional Neural Networks for Facial Expression Recognition. Stanford University, 2017. arXiv:1704.06756v1
[7] S. Mukherjee, S. Saha, S. Lahiri, A. Das, A. Kumar Bhunia, A. Konwer, and A. Chakraborty. Convolutional Neural Network based Face detection. 2017 1st International Conference on Electronics, Materials Engineering and Nano-Technology (IEMENTech), Kolkata, 2017, pp. 1-5. doi: 10.1109/IEMENTECH.2017.8076987
[8] M. Mohammadpour, S. Mohamad R. Hashemi, H. Khaliliardi, and M. M. AlyanNezhadi. Facial Emotion Recognition using Deep Convolutional Networks. 2017 IEEE 4th International Conference on Knowledge-Based Engineering and Innovation (KBEI). doi: 10.1109/KBEI.2017.8324974
[9] S. Jye Lee, T. Chen, L. Yun, and C. Hui Lai. Image Classification Based on the Boost Convolutional Neural Network. 2017. doi: 10.1109/ACCESS.2018.2796722
[10] Michael A. Nielsen. Neural Networks and Deep Learning. 2015.