
A face recognition system based on a Kinect sensor and Windows Azure cloud technology

Dan-Marius Dobrea(1), Daniel Maxim(1), Stefan Ceparu(1)

(1) Technical University "Gh. Asachi", Faculty of Electronics, Telecommunications and Information Technology, Iaşi, Romania
mdobrea@etti.tuiasi.ro, danielmaxim88@yahoo.com, ceparu.stefan@gmail.com

Abstract— The aim of this paper is to build a system for human identification based on facial recognition. The state-of-the-art face recognition algorithms obtain high recognition rates at demanding costs – computational, energy and memory. Used on an embedded system, these classical algorithms cannot achieve such performances due to the existing constraints on computational power and memory. Our objective is to develop a cheap, real-time embedded system able to recognize faces without any compromise on the system's accuracy. The system is designed for the automotive industry, smart house applications and security systems. To achieve superior performance (higher recognition rates) in real time, an optimum combination of new technologies was used for the detection and classification of faces. The face detection system uses the skeletal-tracking feature of the Microsoft Kinect sensor. The face recognition – more precisely the training of the neural network, the most computing-intensive part of the software – is performed on the Windows Azure cloud technology environment.
I. INTRODUCTION

The ability of a computer or system to sense and respond to the requirements of a specific user helps it to better adapt to the user's needs and also increases the naturalness of human-computer communication.

Face identification and face recognition are some of the toughest and most challenging problems in the computational intelligence field. Sometimes even a human has difficulty in recognizing faces: it is well known that people have trouble recognizing face differences amongst people of races different from their own.

One of the main challenges in face recognition is face detection (i.e. locating the face or faces in an image) – a preliminary, but necessary, step before attempting face recognition. Various face detection algorithms have been proposed [1], [2], [3], [4], [5], [6]; however, almost all of them perform poorly in the presence of scale variation [6], face occlusions [1], [4], rotation [3], variation in illumination [2], [6], orientation, variation in skin color [2], [6], complex backgrounds [5] etc. Another important drawback is the requirement of high computing power in order to run the algorithms in real time.

In the face recognition field, the solutions reported in the literature are of rarely encountered diversity, but all have a common feature: they are computing intensive, requiring powerful and expensive systems. With respect to the face data acquisition methods, face recognition methods can be classified into three main categories [7]: those that operate on intensity images, those that deal with video sequences and those that use other sensory data (such as 3D information).

Comparing the performances obtained in the face recognition field with the ones obtained in the OCR (optical character recognition) area, one can remark that in the OCR field over forty years were necessary to build algorithms of acceptable quality, able to recognize written symbols. Face recognition algorithms are, at this moment, in their childhood stage.

Our goal is to develop a low cost, but reliable, face recognition system. The system was developed and tested for automotive industry applications – a security system (authorization to start the engine) and setting up the car according to the driver's necessities (adjusting the mirrors, the driver seat, the steering wheel etc.).

The classical algorithms (for face detection and recognition) are very complex and, additionally, are data and computation intensive, requiring powerful systems and large, fast memories. In order to solve these problems we have used the most recent advances in computer vision techniques, computer design, sensors and distributed computing.

As a result, a low cost solution was devised (without any compromise in the system's reliability and performance) based on two cutting-edge technologies: Windows Azure and the Kinect sensor. By moving the computational load and the complexity of the algorithms onto the Windows Azure cloud and the Kinect sensor, we are able to build an embedded system that works in real time, with maximum performance for face recognition.

Our project has three main sections: face detection, face identification and the training of the neural network (used to recognize faces).

The system operates in two modes (see Fig. 1): learning mode and recognition mode.

Since the training of the neural network requires substantial computational resources (it is impossible to perform on a low cost embedded system) and, additionally, must be done only a few times, we have decided to let Windows Azure (Microsoft's cloud platform) handle this job.

The novelty of our system does not rely on the algorithms used, but on the main concepts of the system: how to use and integrate several new technologies, allowing us to obtain results previously unattainable under the same conditions. The resulting system is very accurate, with a very competitive price.

II. THE HARDWARE SYSTEM

The requirements of the face detection and recognition algorithms are supported by a PC104+ embedded system. The embedded system is a MOPS-PM board produced by Kontron. The MOPS-PM is an "all in one" PC/104+ board built around an Intel® Celeron® M processor (1 GHz, 512 KB L2 cache, Dothan core), with 1 GByte of RAM, 2 serial ports, one 10/100BaseT Ethernet LAN port, 1 parallel port, 2 USB 2.0 ports, a watchdog timer etc.

The link between the PC104+ system and the Windows Azure cloud infrastructure was a 3G wireless connection.

[Figure 1 is a flowchart. On the PC104+ system: interface with the Kinect sensor through the USB port; detect the face/faces; acquire the image with the user's face; "normalize" and store the face image (size, resolution etc.); in learning mode, keep acquiring until the database is complete, then send the images to the cloud. On the Windows Azure cloud infrastructure: train the neural network and send the neural network weights back. The neural network decision, on the embedded side, gives the identity of the subject.]
Figure 1. The software modules and the hardware allocated to support them

The Windows Embedded Standard 7 operating system runs on the PC104+ system and supports all the developed software modules, see Fig. 1. Windows Embedded Standard 7 is a componentized form of the Windows 7 operating system that allows developers to select only the components required to support their applications.

III. KINECT SENSOR AND FACE DETECTION

The Kinect sensor uses an RGB video camera (to capture the image), an IR projector and an infrared camera (used to build a "depth map" of the area in front of it), four microphones and dedicated signal processing hardware able to preprocess all the data coming from the sensors; in this way, part of the potential computational load of a system that uses the Kinect sensor is carried by the sensor itself.

One of the main applications of the Kinect sensor is to recognize and track people standing in front of it. The SDK (Software Development Kit) provided by Microsoft, running on the embedded device, is able to decode the information from the sensors, recognize human elements in the images and map them onto a body skeleton drawn in the RGB space, Fig. 2(a). In the end we have access to the coordinates of the different skeleton joints (e.g. head, left hand, right hand, elbow etc.) in a Cartesian system having the Kinect sensor as reference.

Figure 2. (a). Kinect skeleton and joints, (b). the detected face
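To make the skeleton-based face localization concrete, the following minimal C# sketch reads the head joint of the first tracked skeleton and projects it into the RGB frame. It is our illustration written against the Kinect for Windows SDK v1.x managed API (CoordinateMapper requires v1.6 or later); the class and variable names are our own, not a listing of the project's code:

```csharp
using System;
using System.Linq;
using Microsoft.Kinect;

class HeadTracker
{
    private readonly KinectSensor sensor;

    public HeadTracker()
    {
        // Use the first Kinect sensor connected to the embedded system.
        sensor = KinectSensor.KinectSensors
                             .First(s => s.Status == KinectStatus.Connected);
        sensor.SkeletonStream.Enable();                 // skeletal tracking
        sensor.ColorStream.Enable(ColorImageFormat.RgbResolution640x480Fps30);
        sensor.SkeletonFrameReady += OnSkeletonFrameReady;
        sensor.Start();
    }

    private void OnSkeletonFrameReady(object s, SkeletonFrameReadyEventArgs e)
    {
        using (SkeletonFrame frame = e.OpenSkeletonFrame())
        {
            if (frame == null) return;
            var skeletons = new Skeleton[frame.SkeletonArrayLength];
            frame.CopySkeletonDataTo(skeletons);

            // Take the first fully tracked skeleton (one driver in front of the sensor).
            Skeleton user = skeletons.FirstOrDefault(
                sk => sk.TrackingState == SkeletonTrackingState.Tracked);
            if (user == null) return;

            // Head joint in the Kinect Cartesian space (meters); Z is the distance.
            SkeletonPoint head = user.Joints[JointType.Head].Position;

            // Project the head position into the 640x480 RGB frame: this gives
            // the center of gravity of the face in image coordinates.
            ColorImagePoint headInRgb = sensor.CoordinateMapper
                .MapSkeletonPointToColorPoint(head,
                    ColorImageFormat.RgbResolution640x480Fps30);

            Console.WriteLine("Head at ({0},{1}) px, {2:0.00} m away",
                              headInRgb.X, headInRgb.Y, head.Z);
        }
    }
}
```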
Detection of the face is very simple mainly because we have the head coordinates (the center of gravity of the head) in the RGB frame, Fig. 2(a). Even if face detection seems trivial (from the point of view of a user of the Kinect SDK), a large part of the face recognition technology existing in the Kinect is based on two powerful algorithms: a learning-based descriptor and a pose-adaptive matching technique [8].

The classical techniques for face detection are confronted with another problem: false positive detections of faces. For example, the OneVision face detection algorithm developed by Innovation Labs, Microsoft's R&D center in Israel, detects all the faces in a frame even if the faces are on a poster [9] – resulting in low detection accuracy.

Compared with the classical face detection algorithms (e.g. Viola-Jones or other methods), the Kinect face detection approach brings the following advantages: (1) the embedded system uses the computational resources of the Kinect sensor to detect faces, (2) 2D faces (from pictures, posters etc.) are not detected – see Fig. 2(b) – and (3) the system is invariant to head rotation – the explanation will follow.

Having the head coordinates, in the next step we must detect and save only the face information. The face dimension in an image is proportional to the distance between the Kinect sensor and the human body. As a result, depending on the distance from the sensor, the dimension of the rectangle that frames the face is changed accordingly.

Figure 3. Samples with the obtained faces

In the last step, in order to obtain a good face image for the recognition process, the face images were resampled to the desired size (32 x 32 pixels) and converted to grayscale. Fig. 3 presents some samples of the obtained face images, all having the same resolution, even though the distances from the Kinect sensor were different.
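A possible implementation of this cropping and normalization step is sketched below. The distance-scaling constant and the helper names are our illustrative assumptions; only the distance-proportional rectangle and the 32 x 32 grayscale output follow the paper:

```csharp
using System;
using System.Drawing;
using System.Drawing.Drawing2D;

static class FaceNormalizer
{
    // Approximate face width in pixels observed at 1 m, scaled by 1/Z.
    // This constant is an illustrative assumption, not a value from the paper.
    const double FaceWidthAtOneMeter = 140.0;

    // Crop the face around the head point and normalize it to 32 x 32 grayscale.
    public static byte[] Normalize(Bitmap rgbFrame, int headX, int headY, double headZ)
    {
        // The face rectangle shrinks proportionally with the distance to the sensor.
        int side = Math.Max(16, (int)(FaceWidthAtOneMeter / headZ));
        var crop = new Rectangle(headX - side / 2, headY - side / 2, side, side);
        crop.Intersect(new Rectangle(0, 0, rgbFrame.Width, rgbFrame.Height));

        // Resample the cropped face to the desired 32 x 32 size.
        using (var face = new Bitmap(32, 32))
        {
            using (var g = Graphics.FromImage(face))
            {
                g.InterpolationMode = InterpolationMode.HighQualityBicubic;
                g.DrawImage(rgbFrame, new Rectangle(0, 0, 32, 32),
                            crop, GraphicsUnit.Pixel);
            }

            // Convert to grayscale: one luminance byte per pixel (1024 values),
            // ready to be sent to the cloud / fed to the neural network.
            var pixels = new byte[32 * 32];
            for (int y = 0; y < 32; y++)
                for (int x = 0; x < 32; x++)
                {
                    Color c = face.GetPixel(x, y);
                    pixels[y * 32 + x] =
                        (byte)(0.299 * c.R + 0.587 * c.G + 0.114 * c.B);
                }
            return pixels;
        }
    }
}
```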
IV. WINDOWS AZURE CLOUD PLATFORM

The facial recognition algorithm used in this project is based on an artificial neural network (ANN) structure. In an ANN the training process is necessary only a few times (the first time, when the face database is created, and whenever a user must be added to or removed from the database); otherwise the system works in "recognition mode". In the "recognition mode" the system is able to identify a person based on his/her face image and on the neural network weights (which embed the knowledge acquired in the training stage). As a result, almost all the time (when the system is in the "recognition mode") the computational load is very low, with very rare spikes – when the system goes into "learning mode" (e.g. for 4 users and 13 images/user, the training process of the ANN requires around 140 hours on the PC104+ system), see Fig. 1. These computational characteristics fit the cloud elasticity feature – the ability to elastically provision and, respectively, free resources, in order to scale them rapidly outward and inward in direct relation with the requirements of the application [10].

In a practical consumer scenario (automotive industry, smart house applications and security systems) a new user must be recognized by the system; as a result, (a) several images are acquired and (b) the embedded system must obtain the ANN weights in order to reach the "recognition mode". Based on the Azure cloud infrastructure, the training process of the ANN is reduced from hundreds of hours to about 5 minutes.

A. Neural network

In this research we used an auto-associative neural network (AANN) [11]. The algorithm used in the classification process directly affects the recognition rate. In our project the AANN was used only to demonstrate the feasibility of the system concepts. Other neural network structures with superior performances (e.g. a support vector machine – SVM) can be used without any system changes, by directly replacing the AANN.

In an AANN the input layer is linked to the output layer through the associated weights. In the training process, the input face image and the output face image are the same. After "storing" a set of prototype patterns in the memory (the weights) through the auto-associative learning process, an unknown face image is presented at the neural network inputs. The recognition phase is based on computing the differences (errors) between the reconstructed image (obtained by passing the unknown input image through the neural network weights) and each image prototype (7 for each user of the system). The selected candidate is found based on the greatest degree of similarity. After a large number of tests we established a recognition threshold of 60%; the subjects who were not in the database obtained a similarity of about 45%. As a result, any picture that receives a similarity value smaller than 60% is classified as an intruder.
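This decision rule can be illustrated by the following C# sketch. The linear reconstruction pass and the "1 minus normalized error" similarity measure are our assumptions about the implementation; the 60% threshold is the experimentally established value from the paper:

```csharp
using System;

static class Recognizer
{
    const double Threshold = 0.60;   // experimentally established recognition threshold

    // Reconstruct the input through the auto-associative weights: y = W * x.
    static double[] Reconstruct(double[,] w, double[] x)
    {
        int n = x.Length;
        var y = new double[n];
        for (int i = 0; i < n; i++)
            for (int j = 0; j < n; j++)
                y[i] += w[i, j] * x[j];
        return y;
    }

    // Similarity between the reconstructed image and a stored prototype,
    // expressed as 1 - normalized error (our assumption about the metric).
    static double Similarity(double[] reconstructed, double[] prototype)
    {
        double err = 0, norm = 0;
        for (int i = 0; i < prototype.Length; i++)
        {
            double d = reconstructed[i] - prototype[i];
            err += d * d;
            norm += prototype[i] * prototype[i];
        }
        return 1.0 - Math.Sqrt(err / norm);
    }

    // The best matching prototype decides the identity; below 60% => intruder.
    public static int Classify(double[,] w, double[] face,
                               double[][] prototypes, int[] prototypeOwner)
    {
        double[] y = Reconstruct(w, face);
        double best = double.MinValue;
        int owner = -1;
        for (int p = 0; p < prototypes.Length; p++)
        {
            double s = Similarity(y, prototypes[p]);
            if (s > best) { best = s; owner = prototypeOwner[p]; }
        }
        return best >= Threshold ? owner : -1;   // -1 = intruder
    }
}
```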
Figure 4. SQL Azure database view

B. Windows Azure

In order to train the neural network in the cloud we need the following components: (1) a Cloud SQL Database – to store the training face images, Fig. 4, (2) a Cloud Worker – used to train the neural network, Fig. 5, and (3) Cloud Blob Storage – to hold the results of the training process.

The face image database is located in the cloud (our choice, in order to speed up the training process).

A Cloud Worker (a worker role inside a Cloud Service) is similar to a Windows Service and represents a container able to host any application. Essentially it can run any code – our program is developed in C#. It can react to outside stimuli (e.g. by polling the Azure Queue service), but it can also open communication channels, query databases etc.
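In the classic Azure SDK such a worker is a class derived from RoleEntryPoint. The following minimal sketch shows its general shape; the polling interval and the TrainPendingRequests placeholder are our assumptions, not the actual project code:

```csharp
using System.Net;
using System.Threading;
using Microsoft.WindowsAzure.ServiceRuntime;

public class TrainingWorkerRole : RoleEntryPoint
{
    public override bool OnStart()
    {
        // Allow several simultaneous client connections.
        ServicePointManager.DefaultConnectionLimit = 12;
        return base.OnStart();
    }

    public override void Run()
    {
        // The worker loops forever, reacting to outside stimuli:
        // here, hypothetically, polling for newly uploaded face image sets.
        while (true)
        {
            // 1. check the SQL database for a complete, untrained face set;
            // 2. if one is found, train the AANN (backpropagation);
            // 3. store the resulting weights in Blob storage.
            // TrainPendingRequests is an illustrative placeholder method.
            TrainPendingRequests();
            Thread.Sleep(1000);
        }
    }

    private void TrainPendingRequests() { /* training pipeline goes here */ }
}
```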
Figure 5. Cloud Worker states: (a) when a client just connected and (b) training the neural network

In our project the Cloud Worker handles several main functions, as follows. First, it handles the client-server communication: we have chosen asynchronous communication in order to minimize the costs; in this mode, the worker can deal with multiple clients at the same time. Second, the Cloud Worker manages the server-database communication, used to get the pictures from the database and, when the training process is completed, to store the best weights of the auto-associative neural network. Third, the Cloud Worker trains the neural network based on the face images retrieved from the database, Fig. 5(b); the algorithm used to train the network was backpropagation. Fourth, when the training process is completed, the neural network is saved into a Blob storage container (a sketch of these last two functions is given below).
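As referenced above, the following sketch illustrates these last two functions under stated assumptions: one backpropagation epoch for the single-layer AANN (for auto-association the target equals the input, so the update reduces to the delta rule; the learning rate is an arbitrary choice) and the upload of the serialized weights with the classic Azure storage client library (connection string, container and blob names are illustrative):

```csharp
using System.IO;
using Microsoft.WindowsAzure.Storage;
using Microsoft.WindowsAzure.Storage.Blob;

static class AannTraining
{
    // One backpropagation epoch for the single-layer AANN: for auto-association
    // the target output is the input image itself (delta rule update).
    public static double TrainEpoch(double[,] w, double[][] faces, double eta)
    {
        int n = w.GetLength(0);           // n = 1024 for 32 x 32 face images
        double epochError = 0;
        foreach (double[] x in faces)
        {
            // Forward pass: y = W * x.
            var y = new double[n];
            for (int i = 0; i < n; i++)
                for (int j = 0; j < n; j++)
                    y[i] += w[i, j] * x[j];

            // Backward pass: e = x - y; dW[i,j] = eta * e[i] * x[j].
            for (int i = 0; i < n; i++)
            {
                double e = x[i] - y[i];
                epochError += e * e;
                for (int j = 0; j < n; j++)
                    w[i, j] += eta * e * x[j];
            }
        }
        return epochError;                // monitored across epochs until convergence
    }

    // Persist the best weights so the embedded system can download them.
    public static void SaveWeights(byte[] serializedWeights, string connectionString)
    {
        CloudStorageAccount account = CloudStorageAccount.Parse(connectionString);
        CloudBlobClient client = account.CreateCloudBlobClient();

        // Container and blob names are illustrative assumptions.
        CloudBlobContainer container = client.GetContainerReference("ann-weights");
        container.CreateIfNotExists();

        CloudBlockBlob blob = container.GetBlockBlobReference("aann-weights.bin");
        using (var stream = new MemoryStream(serializedWeights))
            blob.UploadFromStream(stream);
    }
}
```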
V. RESULTS

To test the ability of the system to correctly recognize different faces, we used 4 subjects, each of them having 7 face image prototypes based on which we trained the auto-associative neural network. Even if the number of subjects seems small, for an automotive application this number is adequate – it represents the potential car/truck drivers. The recognition rate was 93%. To obtain such a high correct classification rate we used some parameters provided by the SDK that help to find the head orientation – the head rotation angle. As a result, the system grabbed the face images only when the face position was suitable for a good recognition – frontal faces, less than 20 degrees off position. All the tests took place in a room with normal natural light – not directly exposed to sunlight – between 9:00 AM and 1:00 PM. A main remaining problem is the influence of the ambient illumination on the face recognition process.

As we mentioned previously, the goal of this paper was not to obtain the highest classification rates (for this objective a more powerful neural network can be used and, additionally, image preprocessing algorithms that make the face recognition invariant to the illumination conditions), but to prove the feasibility of the main concepts of the system.

The time necessary for the system to identify a user or an intruder, and to act accordingly, varies depending on the face position and orientation. The system continuously estimates the face orientation and acquires a face image only in the frontal case. In a normal situation the system responds, in the worst case, in approximately 30 seconds. The response time of the system, once a face image has been acquired, is less than 200 ms.

Power consumption tests were performed during normal system operation and reflect the power consumption only of the embedded system, to which we connected the Kinect sensor, a mouse and a keyboard, while the software performed a normal face recognition cycle. The power consumption was between 12.5 W and 14 W – these values correspond to a current of 2.5 A up to 2.8 A at a supply voltage of 5 V DC. For comparison, on an Intel i7 personal computer the measured standby and active power consumptions were 33.03 W and 102.2 W respectively (for only one core – the system has 4 cores) [12].

VI. DISCUSSIONS AND CONCLUSIONS

In a classical system, growing the complexity and the computational costs of the software compromises the execution efficiency and increases the power consumption. Exploring and identifying solutions that accommodate increasing computational demands without too much degradation in the overall performance and energy consumption of the system are necessary steps if facial recognition applications are to become as popular as personal computers. The Windows Azure cloud technology can represent a solution to this problem; our project and the obtained results sustain this conclusion.

By exploiting the Kinect technology we can detect faces in real time faster, easier and more accurately, Fig. 2, than other similar systems [9].

The 3G data link is sufficient: the maximum quantity of data exchanged in one session is around 200 KByte.

As a final conclusion, the previously presented face recognition concepts can be used in a large field of applications (as a human-computer interface), obtaining high classification rates without any significant computational burden on the embedded system. As a result of the implementation of the concepts described above, the final system is low-cost and able to provide all the functionality of similar high-end facial recognition systems.

ACKNOWLEDGMENT

The authors are grateful to Microsoft Romania for the donation of a set of 2 Kinect devices.

REFERENCES

[1] D. Xingjing, L. Chunmei, and Y. Yongxia, "Analysis of detection and track on partially occluded face," Int. Forum on Inf. Tech. and App., vol. 3, 2009, pp. 158-161
[2] U. Yang, M. Kang, K. A. Toh, and K. Sohn, "An illumination invariant skin-color model for face detection," 4th IEEE Int. Conf. on Biometrics: Theory, App. and Syst., 2010, pp. 1-6
[3] H. Chang, A. Haizhou, L. Yuan, and L. Shihong, "High-performance rotation invariant multiview face detection," IEEE Trans. on Patt. Analysis and Mach. Intell., vol. 29, no. 4, pp. 671-686, 2007
[4] L. Goldmann, U. J. Monich, and T. Sikora, "Components and their topology for robust face detection in the presence of partial occlusions," IEEE Trans. on Inf. Forens. and Sec., vol. 2, no. 3, part 2, pp. 559-569, 2007
[5] W. Yanjiang, and Y. Baozong, "Segmentation method for face detection in complex background," Elect. Lett., vol. 36, no. 3, pp. 213-214, 2000
[6] C. N. R. Kumar, and A. Bindu, "An efficient skin illumination compensation model for efficient face detection," 32nd Annual Conf. on IEEE Ind. Elect., Paris, France, 6-10 Nov. 2006, pp. 3444-3449
[7] R. Jafri, and H. R. Arabnia, "A survey of face recognition techniques," J. of Inf. Proc. Syst., vol. 5, no. 2, pp. 41-68, June 2009
[8] C. Zhimin, Y. Qi, T. Xiaoou, and S. Jian, "Face recognition with learning-based descriptor," IEEE Conf. on Comp. Vision and Patt. Recogn., 13-18 June 2010, San Francisco, CA, pp. 2707-2714
[9] "Big brother? Microsoft unveils technology to recognize faces in video," http://nocamels.com/2011/03/big-brother-microsoft-unveils-technologyto-recognize-faces-in-video/, date of access: 23 Feb. 2013
[10] P. Marshall, K. Keahey, and T. Freeman, "Elastic site: using clouds to elastically extend site resources," 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 17-20 May 2010, Melbourne, Australia, pp. 43-52
[11] C. Bishop, Neural Networks for Pattern Recognition, New York, NY, USA, Oxford University Press, 1995
[12] Y. C. Wang, B. Donyanavard, and K. T. Cheng, "Energy-aware real-time face recognition system on mobile CPU-GPU platform," Int. Workshop on Comp. Vision on GPU, 2010, Crete, Greece, Vol. Part II, pp. 411-422
