
Facial Expression Recognition Using Backpropagation

Indra Adji Sulistijono, Zaqiatud Darojah, Abdurahman Dwijotomo, Dadet Pramadihanto
Electronics Engineering Polytechnic Institute of Surabaya (EEPIS), Indonesia
{indra, zaqiah, dadet}@eepis-its.edu

Naoyuki Kubota
Graduate School of System Design, Tokyo Metropolitan University, Japan
kubota@tmu.ac.jp

Abstract: This paper proposes facial expression recognition using a backpropagation neural network. The procedure consists of image capture, face detection, filtering, facial feature extraction, and recognition with the neural network. We propose the forehead, mid forehead, mouth, and cheek as the inputs for facial feature extraction, and we define six output facial expressions: anger, disgust, surprise, happiness, sadness, and fear. The captured image data are then trained with the backpropagation algorithm. The proposed method recognizes facial expressions from captured images well.

1 INTRODUCTION

Object recognition is one of the developments in image processing with many extensive application areas. One application is recognizing a human face by its features; this can in turn be extended to recognizing human facial expressions. Facial expression is an interesting research topic in human behavior recognition. It is considered one of the most powerful and immediate means for humans to communicate their emotions, intentions, and opinions to each other, which is why much effort has been devoted to its study by cognitive scientists and, lately, computer vision researchers [1]-[3]. In this perspective, a computer can act as if it feels what a human feels, creating an interaction between computer and human like that between friends.

Many experiments on facial expression recognition have been reported [1],[4]. In this work, motion information is used to detect the face in real-time video obtained from a camera. Face detection is used to extract the features that are later useful for recognizing a human expression and emotion. There are many ways to extract facial features as input for facial expression recognition. Most objects used as input are difficult to interpret; for example, the change of the eyebrows [1] shows almost no difference between the happy, sad, and neutral expressions. The scheme we propose in this work is much simpler than previous methods. The facial features extracted in this work are the forehead wrinkle, mid-forehead wrinkle, cheek wrinkle, and mouth length. These features are easy to recognize when a face performs an expression. The method assumes that there is no mustache, beard, or glasses on the face.

Another approach to facial expression recognition combines face detection with a facial feature point detection module [8]. Faces are deformed by their expressions, and the way a face changes can be measured from the change in the locations of facial feature points. The facial expression and its degree of change depend on the displacement of the feature points from the neutral face. The relationship between the degree of expression change and the displacement of the facial feature points is modeled with a B-spline curve. Flexible graph matching is then used to track these facial features in an input image sequence, while the trajectories of the features are matched against the expression change models to determine the category and degree of expression change in the image sequence.

First, it is important that the machine can detect and track the face [9]. This procedure is used for capturing the moving face. The next procedure is extracting features from the images captured by the camera. The method used to recognize the facial expression in this work is backpropagation on a feedforward neural network. We define inputs and targets as training pairs. The inputs are the data from the forehead wrinkle, mid-forehead wrinkle, cheek wrinkle, and mouth length, while six output facial expressions are defined to be recognized: anger, disgust, surprise, happiness, sadness, and fear. After training, the proposed method recognizes facial expressions from captured images well.

2 FACIAL FEATURE EXTRACTION

A human face has several features, such as the mouth, eyes, nose, eyebrows, and forehead. Each of these features has a unique shape and pattern; hence, many experiments on extracting facial features for expression recognition have been reported. A. Geetha et al. (2007) [4] used the locations of the eyes as the visual features of the face. Jyh-Yeong Chang et al. (2001) [1] used the eyebrows, eyes, and mouth for facial expression labeling. In this paper, we extract four main features for facial expression labeling: the forehead, mid forehead, cheek, and mouth. From these we extract the forehead wrinkle, mid-forehead wrinkle, cheek wrinkle, and mouth length, as shown in Figure 1.

Fig. 1: Parts of Facial Feature Extracted

2.1 Line Face Detection

Facial features are extracted using edge detection and a morphology technique to obtain the lines on the face. We used Canny edge detection. Before applying the Canny edge detector, the images should first be optimized by brightness and contrast tuning, as shown in Figure 2. The aim of this optimization is to detect vague lines such as facial wrinkles.

Fig. 2: The image processing: (a) Canny edge detection with no contrast and brightness tuning, (b) Canny edge detection with contrast and brightness tuning
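The paper does not report the tuning parameters it used, so the following is only a minimal sketch of a linear brightness/contrast adjustment in Python with OpenCV; the gain alpha and offset beta are illustrative assumptions that would be tuned per camera:

import cv2

# Load the captured face image in grayscale.
img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)

# Linear tuning: out = alpha * img + beta (clipped to 0-255).
# alpha > 1 raises contrast, beta > 0 raises brightness;
# the values here are illustrative, not from the paper.
alpha, beta = 1.5, 20
tuned = cv2.convertScaleAbs(img, alpha=alpha, beta=beta)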
2.1.1 Canny Edge Detection

The Canny edge detector uses a filter based on the first derivative of a Gaussian, because it is susceptible to noise present in raw, unprocessed image data; therefore, the raw image is first convolved with a Gaussian filter. The result is a slightly blurred version of the original that is not affected by any single noisy pixel to a significant degree [7].

The next step is finding the intensity gradient of the image. An edge in an image may point in a variety of directions, so the Canny algorithm uses four filters to detect horizontal, vertical, and diagonal edges in the blurred image. An edge detection operator (Roberts, Prewitt, or Sobel, for example) returns values for the first derivative in the horizontal direction (Gx) and the vertical direction (Gy). From these, the edge gradient and direction can be determined:

|G| = |Gx| + |Gy|    (1)

The next step is finding the edge direction. The formula for the edge direction is:

θ = arctan(Gy / Gx)    (2)

The edge direction angle θ is rounded to one of four angles representing vertical, horizontal, and the two diagonals (0°, 45°, 90°, and 135°, for example). Once the edge direction is obtained, the next step is to relate it to a direction that can be traced in the image. Finally, hysteresis is used to eliminate streaking. Hysteresis uses two thresholds, a high one and a low one. Any pixel in the image with a value greater than T1 is presumed to be an edge pixel and is marked as such immediately. Then, any pixels that are connected to this edge pixel and have a value greater than T2 are also selected as edge pixels.
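As a concrete illustration, the gradient computation of equations (1) and (2) and the double-threshold step can be sketched with OpenCV and NumPy. The threshold values below are illustrative assumptions, since the paper does not report the T1 and T2 it used:

import cv2
import numpy as np

img = cv2.imread("face.png", cv2.IMREAD_GRAYSCALE)

# Step 1: Gaussian smoothing to suppress single-pixel noise.
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)

# Step 2: first derivatives with the Sobel operator.
gx = cv2.Sobel(blurred, cv2.CV_64F, 1, 0)  # horizontal direction
gy = cv2.Sobel(blurred, cv2.CV_64F, 0, 1)  # vertical direction

# Equations (1) and (2): gradient magnitude and direction.
magnitude = np.abs(gx) + np.abs(gy)
theta = np.arctan2(gy, gx)

# Direction quantization, non-maximum suppression and hysteresis
# are bundled inside OpenCV's Canny; its low/high thresholds play
# the roles of T2 and T1 (values assumed for illustration).
edges = cv2.Canny(blurred, threshold1=50, threshold2=150)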

2.1.2 Morphology Technique

For morphology, we use two operations, dilation and erosion, to obtain a potential area of edge intensity. The dilation and erosion operators for grayscale images are defined conventionally [6]. We define the edge intensity φedge as

φedge = dilation − erosion    (3)

The data from the lines of face features such as the mouth are obtained with the morphology technique, because edge detection there produces a lot of noise. The data from facial wrinkles, on the other hand, are easily obtained with a usual edge detector such as the Canny or Sobel detector. Once the lines are obtained, the next step is counting the number of wrinkles; the result is a percentage expressing how strongly a face performed an expression.
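Equation (3) is the standard morphological gradient, so it can be sketched directly with OpenCV's grayscale dilation and erosion; the 3x3 structuring element is an assumption:

import cv2
import numpy as np

img = cv2.imread("mouth_region.png", cv2.IMREAD_GRAYSCALE)

# Structuring element: a 3x3 square is assumed here.
kernel = np.ones((3, 3), np.uint8)

# Equation (3): edge intensity = dilation - erosion.
dilated = cv2.dilate(img, kernel)
eroded = cv2.erode(img, kernel)
phi_edge = cv2.subtract(dilated, eroded)

# Equivalently, OpenCV provides this as the morphological gradient:
# phi_edge = cv2.morphologyEx(img, cv2.MORPH_GRADIENT, kernel)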

2.2 Forehead Extraction

The forehead is a wide area of the face, located in its upper third. Because of this, we can easily mark the forehead location with a rectangular box. The size of this rectangle is fixed and cannot change. The rectangular box is used to narrow and simplify the forehead wrinkle processing from the whole face. The image is first converted to grayscale, and then the forehead wrinkle data are obtained by Canny edge detection. Figure 3 shows the procedure of forehead extraction.

Fig. 3: The procedures of forehead extraction

Once the forehead wrinkle, represented in column and row pixels, is obtained by edge detection, the number of wrinkles is counted. A forehead wrinkle is always horizontal; hence, other lines are classified as noise. To reduce this noise, the columns of the forehead image can be reduced to a few sampled columns. Each column has one search line, which is used to find the wrinkle vertically, as illustrated in Figure 4. Each detected wrinkle is marked with a red dot. Later, these red dots are counted and scaled to the 0-1 range for the neural network input.

Fig. 4: The procedures of counting the Forehead wrinkle
In this work, 17 search lines are used, set in 17 columns. From the experiment, the maximum number of red dots (RDmaxInForeHead) is about 30. The value of the Forehead Wrinkles (FHW), which has a 0-1 range, is defined by

FHW = TotalRedDotInForeHead / RDmaxInForeHead    (4)

The result of this computation is used as an input to the neural network.
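A minimal sketch of the whole forehead pipeline described above, assuming the reported constants (17 search lines, RDmaxInForeHead = 30); the ROI coordinates, Canny thresholds, and the grouping of edge runs into single "red dots" are illustrative assumptions:

import cv2
import numpy as np

RD_MAX_FOREHEAD = 30  # maximum red-dot count from the experiment

def forehead_wrinkle_value(gray_face):
    # Fixed rectangular box over the upper third of the face;
    # the exact coordinates here are illustrative assumptions.
    h, w = gray_face.shape
    roi = gray_face[0:h // 3, w // 4:3 * w // 4]

    # Canny edge detection on the grayscale ROI (thresholds assumed).
    edges = cv2.Canny(roi, 50, 150)

    # 17 vertical search lines, one per sampled column.
    cols = np.linspace(0, roi.shape[1] - 1, 17).astype(int)
    red_dots = 0
    for c in cols:
        line = edges[:, c] > 0
        # Each run of edge pixels along the search line counts as one
        # crossing of a horizontal wrinkle (one "red dot").
        red_dots += np.count_nonzero(line[1:] & ~line[:-1]) + int(line[0])

    # Equation (4): scale the count to the 0-1 range.
    return min(red_dots / RD_MAX_FOREHEAD, 1.0)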
2.3 Mid Forehead Extraction

The steps for obtaining the data from the mid-forehead wrinkle are almost the same as those for the forehead. First, we find the location of the mid forehead, which lies between the two eyebrows, below the forehead location in the middle area. This location is then marked by a fixed rectangular box of 21 x 29 pixels. The mid-forehead wrinkle is fainter than the forehead wrinkle; hence, an edge detection method such as the Sobel detector is difficult to apply in this case. We use Canny edge detection to obtain the data from the mid-forehead wrinkle, as illustrated in Figure 5.

Fig. 5: The procedures of mid forehead extraction

The steps for counting the mid-forehead wrinkle are also almost the same as those for the forehead wrinkle above. Figure 6 shows how the mid-forehead wrinkle is counted. A mid-forehead wrinkle is always vertical; hence, other lines are classified as noise. To reduce this noise, the rows of the mid-forehead image can be reduced to a few sampled rows. Each row likewise has one search line, which is used to find the wrinkle horizontally. Each detected wrinkle is marked with a red dot. Later, these red dots are counted and scaled to the 0-1 range for the neural network input.

Fig. 6: The procedures of counting the Mid Forehead wrinkle

Seven search lines, one per sampled row, are used to find the mid-forehead wrinkle. From the experiment, the maximum number of red dots (RDmaxInMidForeHead) is about 18. The value of the Mid Forehead Wrinkles (MFHW), which has a 0-1 range, is defined by

MFHW = TotalRedDotInMidForeHead / RDmaxInMidForeHead    (5)

The result of this computation is used as an input to the neural network.
2.4 Mouth Extraction

The first step in extracting the data from the mouth is determining its location. Methods have been proposed in the literature for determining the location of the eyes, and a location-based approach is commonly used for the mouth as well: it lies in the lower third of the face, below the nostrils. The working area of the mouth varies, because the mouth can become both wide and long. After the exact location of the mouth is obtained, a rectangular box of 57 x 66 pixels is marked as the region of interest. Edge detection with either the Canny or the Sobel detector is difficult to apply here, because the color of the edge of the mouth is vague and close to the skin of the face. From experience with several techniques, the morphology technique is the best for extracting the lines at the edge of the mouth, although small noises remain around the mouth. To solve this problem, color thresholding and Gaussian smoothing are used. Figure 7 illustrates the procedure of mouth extraction.

Fig. 7: The procedures of mouth extraction

Fig. 8: The procedures of counting the mouth length

The length of the mouth is obtained by finding the upper dot (Y1) and the lower dot (Y2) of the mouth; the distance between the two dots is obtained from the gaping mouth, as shown in Figure 8. The mouth length is then scaled into the 0-1 range for the neural network input. From our experiments, the maximum value of the mouth length (VMLgmax) is 20. The value of the mouth length (VMLg), which has a 0-1 range, is defined by

VMLg = (Y2 − Y1) / VMLgmax    (6)
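A sketch of the mouth-length measurement under the stated constants (57 x 66 ROI, VMLgmax = 20), combining the morphological gradient of equation (3) with the Gaussian smoothing and thresholding mentioned above; the ROI orientation, threshold value, and ROI origin are illustrative assumptions:

import cv2
import numpy as np

VML_MAX = 20.0  # maximum mouth length from the experiments

def mouth_length_value(gray_face, x0, y0):
    # 57 x 66 pixel region of interest at the detected mouth
    # location (x0, y0); width/height assignment is assumed.
    roi = gray_face[y0:y0 + 66, x0:x0 + 57]

    # Gaussian smoothing to suppress the small noises around the mouth.
    smooth = cv2.GaussianBlur(roi, (5, 5), 0)

    # Morphological gradient (eq. 3) to get the lip lines, then a
    # binary threshold (value assumed) to keep only strong lines.
    kernel = np.ones((3, 3), np.uint8)
    lines = cv2.morphologyEx(smooth, cv2.MORPH_GRADIENT, kernel)
    _, mask = cv2.threshold(lines, 40, 255, cv2.THRESH_BINARY)

    rows = np.where(mask.any(axis=1))[0]
    if rows.size == 0:
        return 0.0
    y1, y2 = rows[0], rows[-1]  # upper and lower dots of the mouth

    # Equation (6): scale the gap to the 0-1 range.
    return min((y2 - y1) / VML_MAX, 1.0)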
2.5 Cheek Extraction

The cheek is located at the side of the mouth. After detecting the location of the cheek, the cheek image is converted to grayscale. The location is then marked by a rectangular box of 15 x 30 pixels. Because there is much noise around the cheek, we use a Gaussian smoothing technique to obtain the cheek wrinkle. Figure 9 shows the procedure of cheek extraction.

Fig. 9: The procedures of cheek extraction

The steps for obtaining the cheek wrinkle are the same as before. Each search line finds the vertical wrinkles, and each detected wrinkle is marked with a red dot. All of these dots are counted and scaled to the 0-1 range for the neural network input. Figure 10 shows the procedure of counting the cheek wrinkle.

Fig. 10: The procedures of counting the Cheek Wrinkle

Seven search lines, set in seven rows, are used to find the cheek wrinkle. From the experiment, the maximum number of red dots (RDmaxInCheek) is about 18. The value of the Cheek Wrinkles (CkW), which has a 0-1 range, is defined by

CkW = TotalRedDotInCheek / RDmaxInCheek    (7)

The result of this computation is used as an input to the neural network.
3 FACIAL EXPRESSION RECOGNITION

We use the backpropagation algorithm with a feedforward architecture to recognize facial expressions. Backpropagation neural networks are the most widely used networks and are considered the workhorse of artificial neural networks [5],[6]. They can be used to model complex relationships between inputs and outputs or to find patterns in data.

The feedforward architecture for backpropagation is designed based on the extracted facial features, as illustrated in Figure 11. It consists of (1) an input layer containing four neurons representing the input variables of the problem, that is, the data extracted from the forehead wrinkle, the mid-forehead wrinkle, the cheek wrinkle, and the mouth length; (2) one hidden layer containing one or more neurons to help capture the nonlinearity in the data; and (3) an output layer containing six nodes representing the output variables of the problem, that is, the facial expressions: anger, disgust, surprise, happiness, sadness, and fear. The neurons between layers are fully interconnected with weights vij and wij.

Fig. 11: Architecture of feedforward backpropagation neural network for facial expression recognition

Training a network by backpropagation involves three stages: the feedforward pass of the input training pattern, the calculation and backpropagation of the associated error, and the adjustment of the weights [6]. The data are fed forward from the input layer, through the hidden layer, to the output layer without feedback. Then, based on the feedforward error-backpropagation learning algorithm, backpropagation searches the error surface using gradient descent. Based on the error, the portion of error correction is computed, and the weights of all layers are adjusted simultaneously.

In many neural network applications, the data (input or target patterns) have the same range of values [6]. We use the binary sigmoid function, which has the range (0,1) and is defined as f(x) = 1/(1 + exp(−x)); this is why the data are also represented in binary form, i.e., in the 0-1 range. The representation of the input data is explained in the sections above. Table 1 shows the training pairs (input and target patterns) for the backpropagation neural network. We use two pairs of training input data for each of the six output expressions; the first row is for the neutral expression.
Table 1: The data of training pairs in backpropagation of neural network

  Input                                    Output
  ForeHead  MidForeHead  Mouth  Cheek      Anger  Disgust  Surprise  Happy  Sadness  Fear
  0         0            0      0          0      0        0         0      0        0
  0         1            0      0          1      0        0         0      0        0
  0         1            1      0          0      1        0         0      0        0
  1         0            1      0          0      0        1         0      0        0
  0         0            0      1          0      0        0         1      0        0
  1         1            0      0          0      0        0         0      1        0
  1         1            1      0          0      0        0         0      0        1
  0         0.5          0      0          0.5    0        0         0      0        0
  0         0.6          0.2    0          0      0.5      0         0      0        0
  1         0.15         0      0          0      0        0.5       0      0        0
  0         0            0      0.3        0      0        0         0.5    0        0
  0.8       0.2          0      0          0      0        0         0      0.5      0
  1         0.5          0.2    0          0      0        0         0      0        0.5
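As an illustration of the training described above, the following is a minimal NumPy sketch of a 4-hidden-6 feedforward network with the binary sigmoid, trained by gradient descent on the Table 1 pairs. The hidden layer size, learning rate, and epoch count are assumptions (the paper does not report them), and biases are omitted for brevity:

import numpy as np

# Training pairs from Table 1: four inputs (FHW, MFHW, mouth, cheek)
# and six targets (anger, disgust, surprise, happy, sadness, fear).
X = np.array([[0, 0, 0, 0], [0, 1, 0, 0], [0, 1, 1, 0], [1, 0, 1, 0],
              [0, 0, 0, 1], [1, 1, 0, 0], [1, 1, 1, 0],
              [0, 0.5, 0, 0], [0, 0.6, 0.2, 0], [1, 0.15, 0, 0],
              [0, 0, 0, 0.3], [0.8, 0.2, 0, 0], [1, 0.5, 0.2, 0]])
T = np.zeros((13, 6))
T[1, 0] = T[2, 1] = T[3, 2] = T[4, 3] = T[5, 4] = T[6, 5] = 1.0
T[7, 0] = T[8, 1] = T[9, 2] = T[10, 3] = T[11, 4] = T[12, 5] = 0.5

def sigmoid(x):
    # Binary sigmoid f(x) = 1 / (1 + exp(-x)), range (0, 1).
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
H = 8                            # hidden neurons (assumed)
v = rng.normal(0, 0.5, (4, H))   # input-to-hidden weights v_ij
w = rng.normal(0, 0.5, (H, 6))   # hidden-to-output weights w_ij
lr = 0.5                         # learning rate (assumed)

for epoch in range(20000):
    # Stage 1: feedforward pass.
    z = sigmoid(X @ v)           # hidden activations
    y = sigmoid(z @ w)           # output activations
    # Stage 2: backpropagate the squared-error gradient.
    dy = (y - T) * y * (1 - y)
    dz = (dy @ w.T) * z * (1 - z)
    # Stage 3: adjust the weights of all layers simultaneously.
    w -= lr * z.T @ dy
    v -= lr * X.T @ dz

# After training, only the feedforward phase is used for recognition.
print(np.round(sigmoid(sigmoid(X @ v) @ w), 2))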

4 EXPERIMENTAL RESULTS

The proposed method was implemented on a personal computer with a Pentium dual-core 3 GHz CPU and 2 GB of RAM. First, we run the training over all the training pairs. The aim is to train the net to achieve a balance between the ability to respond correctly to the input patterns used for training (memorization) and the ability to give good responses to input that is similar to that used in training. After training, the backpropagation neural net is applied using only the feedforward phase of the training algorithm.

Table 2 shows a snapshot of the real-time facial expressions as input together with the output results. There are two persons of paired data: the first person with 6 facial expressions and the second person with 5 facial expressions. The input data range is 0-1, and the output data are percentages. The lowest expression percentage is sadness (32.4%) for the first person and disgust (45.5%) for the second person. The expression percentages show that disgust and anger have high similarity for the first person (the 2nd row from the top), while happy and disgust have high similarity for the second person (the 2nd row from the bottom). The highest expression percentage is happy (83.6%) for the first person and sadness (95.8%) for the second person. This shows that the six identified expressions can be recognized well, based on the memory of the weights learned from the face features, with a higher percentage for the appropriate expression.

Table 2: The experimental result of facial expression using forehead, mid forehead and mouth as input with six facial expressions as output (%)

5 CONCLUSIONS

In this paper, a simple method for facial expression recognition using backpropagation was proposed. The experimental results show that the backpropagation algorithm, together with the facial feature extraction method, recognizes the appropriate facial expression with a higher percentage than the other facial expressions. The expressions of sadness and disgust are more difficult to recognize than the others.

Generally speaking, online and spontaneous expression recognition is a difficult task. We focus on tackling the recognition of subtle, spontaneous facial expressions. Furthermore, we would like to apply unsupervised learning with an online clustering technique, and to estimate the intensity of facial expressions.

References

[1] Jyh-Yeong Chang, Jia-Lin Chen, Automated Facial Expression Recognition System Using Neural Networks, Journal of the Chinese Institute of Engineers, Vol. 24, No. 3, pp. 345-356, 2001.

[2] P. Ekman, W.V. Friesen, The Facial Action Coding System: A Technique for the Measurement of Facial Movement, Consulting Psychologists Press, Palo Alto, CA, 1978.

[3] M. Pantic, L. Rothkrantz, Automatic Analysis of Facial Expressions: The State of the Art, IEEE Transactions on Pattern Analysis and Machine Intelligence, 22 (12), pp. 1424-1445, 2000.

[4] A. Geetha, V. Ramalingam, S. Palanivel, B. Palaniappan, Facial Expression Recognition - A Real Time Approach, Expert Systems with Applications, 36 (2009), pp. 303-308, 2009.

[5] Te-Hsiu Sun, Fang-Chih Tien, Using Backpropagation Neural Network for Face Recognition with 2D + 3D Hybrid Information, Expert Systems with Applications, 35 (2008), pp. 361-372, 2008.

[6] Laurene Fausett, Fundamentals of Neural Networks: Architectures, Algorithms, and Applications, Prentice Hall, 1994.

[7] Rafael C. Gonzalez, Richard E. Woods, Digital Image Processing, Addison-Wesley Publishing Company, Inc., New York, 1993.

[8] D. Pramadihanto, Y. Iwai, M. Yachida, Integrated Face Identification and Facial Expression Recognition, IEICE Transactions on Information and Systems, Vol. E84-D, No. 7, pp. 856-866, 2001.

[9] Indra Adji Sulistijono, Naoyuki Kubota, Human Head Tracking Based on Particle Swarm Optimization and Genetic Algorithm, Journal of Advanced Computational Intelligence and Intelligent Informatics (JACIII), Vol. 11, No. 6, Fuji Technology Press Ltd., July 2007, pp. 681-687.
