A Face Recognition System Based On A Kinect Sensor and Windows Azure Cloud Technology
Abstract— The aim of this paper is to build a system for human detection based on facial recognition. The state-of-the-art face recognition algorithms obtain high recognition rates at the price of demanding costs – computational, energy and memory. Using these classical algorithms on an embedded system cannot achieve such performance because of the existing constraints on computational power and memory. Our objective is to develop a cheap, real-time embedded system able to recognize faces without any compromise on the system's accuracy. The system is designed for the automotive industry, smart house applications and security systems. To achieve superior performance (higher recognition rates) in real time, an optimum combination of new technologies was used for the detection and classification of faces. The face detection system uses the skeletal-tracking feature of the Microsoft Kinect sensor. The face recognition, more precisely the training of the neural network, the most computing-intensive part of the software, is carried out in the Windows Azure cloud technology environment.

I. INTRODUCTION

The ability of a computer or system to sense and respond to the requirements of a specific user helps it to better adapt to the user's needs and also improves the naturalness of human-computer communication.

Face identification and face recognition are among the toughest and most challenging problems in the computational intelligence field. Sometimes even a human has difficulty in recognizing faces; it is well known that people have trouble recognizing differences between the faces of people of races other than their own.

One of the main challenges in face recognition algorithms is face detection (i.e. locating the face or faces in an image) – a preliminary, but necessary, step before attempting face recognition. Various face detection algorithms have been proposed [1], [2], [3], [4], [5], [6]; however, almost all of them perform poorly in the presence of scale variation [6], face occlusions [1], [4], rotation [3], variation in illumination [2], [6], orientation, variation in skin colors [2], [6], complex backgrounds [5], etc. Another important drawback is the requirement of high computing power in order to run the algorithm in real time.

In the face recognition field, the solutions reported in the literature are of a rarely encountered diversity, but all have a common feature: they are computing intensive, requiring powerful and expensive systems. With respect to the face data acquisition method, face recognition methods can be classified into three main categories [7]: those that operate on intensity images, those that deal with video sequences and those that use other sensory data (such as 3D information).

Comparing the performance obtained in the face recognition field with that obtained in the OCR (optical character recognition) area, one can remark that over forty years were necessary to build OCR algorithms of acceptable quality, able to recognize written symbols. Face recognition algorithms are, at this moment, still in their childhood stage.

Our goal is to develop a low-cost, but reliable, face recognition system. The system was developed and tested for automotive industry applications – a security system (granting the authorization to start the engine) and setting up the car according to the driver's necessities (adjusting the mirrors, the driver seat, the steering wheel, etc.).

The classical algorithms (for face detection and recognition) are very complex and, additionally, data and computationally intensive, requiring powerful systems and large, fast memories. In order to solve these problems we have used the most recent advances in computer vision techniques, computer design, sensors and distributed computing.

As a result, a low-cost solution was devised (without any compromise in the system's reliability and performance) based on two edge technologies: Windows Azure and the Kinect sensor. By moving the computational load and the complexity of the algorithms onto the Windows Azure cloud and the Kinect sensor, we are able to build an embedded system that works in real time, with maximum performance for face recognition.

Our project has three main sections: face detection, face identification and the training of the neural network (used to recognize faces). The system operates in two modes (see Fig. 1): learning mode and recognizing mode.

Since the training of the neural network requires a lot of computational resources (it is impossible to do on a low-cost embedded system) and, additionally, must be done only a few times, we have decided to let Windows Azure (Microsoft's cloud platform) handle this job.

The novelty of our system does not rely on the algorithms used, but on the main concepts of the system: how to use and integrate several new technologies that allow us to obtain results previously unattainable under the same conditions. The resulting system is very accurate, with a very competitive price.
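As a rough illustration of how the skeletal-tracking output can drive face detection, the following minimal Python sketch projects a head-joint position (3D camera-space coordinates, in meters) onto the color image with a pinhole camera model and sizes the crop by distance. The intrinsic parameters, the assumed 22 cm head width and the function names are illustrative assumptions; the sketch does not reproduce the Kinect SDK API.

```python
# Minimal sketch (not the Kinect SDK): given the 3D head-joint position reported
# by skeletal tracking, estimate a face bounding box in the color image.
from dataclasses import dataclass

@dataclass
class Intrinsics:
    fx: float = 525.0   # focal length in pixels (assumed)
    fy: float = 525.0
    cx: float = 319.5   # principal point (assumed, 640x480 image)
    cy: float = 239.5

def face_box_from_head_joint(x, y, z, intr=Intrinsics(), face_size_m=0.22):
    """Project the head joint (x, y, z in meters, camera space) to pixel
    coordinates and size the crop by distance: closer heads appear larger."""
    if z <= 0:
        raise ValueError("head joint must be in front of the camera")
    u = intr.fx * x / z + intr.cx            # pinhole projection
    v = intr.fy * y / z + intr.cy
    half = 0.5 * face_size_m * intr.fx / z   # ~22 cm head width in pixels
    return (int(u - half), int(v - half), int(u + half), int(v + half))

# Example: a head 1.2 m in front of the sensor, slightly left of center.
print(face_box_from_head_joint(-0.10, -0.05, 1.20))
```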
Figure 5. Cloud Worker states: (a) when a client just connected and (b) training the neural network
In this mode, the worker can deal with multiple clients at the same time. Second, the Cloud Worker manages the server-database communication used to get the pictures from the database and, finally, when the training process is completed, to store the best weights of the auto-associative neural networks. Third, the Cloud Worker trains the neural network based on the face images retrieved from the database, Fig. 5(b). The algorithm used to train the network was backpropagation. Fourth, when the training process is completed, the neural network is saved into a blob storage container.
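As a rough sketch of the training step described above, the following NumPy code trains a one-hidden-layer auto-associative network with backpropagation on a set of flattened face prototypes and serializes the resulting weights; in the real pipeline the serialized weights would then be stored in the blob container. The network size, learning rate, assumed 32x32 image size, file name and the reconstruction-error recognition criterion are illustrative assumptions, not the parameters of the implemented system.

```python
import numpy as np

def train_autoassociative(faces, hidden=64, epochs=500, lr=0.01, seed=0):
    """Backpropagation training of a one-hidden-layer auto-associative network.
    `faces` is an (n_samples, n_pixels) array of flattened, normalized faces."""
    rng = np.random.default_rng(seed)
    n, d = faces.shape
    W1 = rng.normal(0, 0.01, (d, hidden)); b1 = np.zeros(hidden)
    W2 = rng.normal(0, 0.01, (hidden, d)); b2 = np.zeros(d)
    for _ in range(epochs):
        h = np.tanh(faces @ W1 + b1)           # forward pass: encode
        out = h @ W2 + b2                      # decode (reconstruct the input)
        err = out - faces                      # auto-associative target = input
        gW2 = h.T @ err / n; gb2 = err.mean(0)         # backward pass
        dh = (err @ W2.T) * (1 - h ** 2)
        gW1 = faces.T @ dh / n; gb1 = dh.mean(0)
        W1 -= lr * gW1; b1 -= lr * gb1
        W2 -= lr * gW2; b2 -= lr * gb2
    return W1, b1, W2, b2

def reconstruction_error(face, params):
    """A low reconstruction error suggests the face matches the subject whose
    prototypes trained this network (assumed recognition criterion)."""
    W1, b1, W2, b2 = params
    out = np.tanh(face @ W1 + b1) @ W2 + b2
    return float(np.mean((out - face) ** 2))

# After training, the weights are serialized; the Cloud Worker would upload
# this archive to the blob storage container.
params = train_autoassociative(np.random.rand(7, 32 * 32))  # 7 prototypes per subject
np.savez("subject_weights.npz", *params)
```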
V. RESULTS

To test the ability of the system to recognize different faces correctly, we have used 4 subjects, each of them having 7 face image prototypes, based on which we have trained the auto-associative neural networks. Even if the number of subjects seems small, for an automotive application this number is adequate – it represents the potential car/truck drivers. The recognition rate was 93%. To obtain such a high correct classification rate we have used some parameters provided by the SDK that help to find the head orientation – the head rotation angle. As a result, the system grabbed face images only when the face position was suitable for a good recognition – frontal faces, less than 20 degrees off position. All the tests took place in a room with normal natural light – not directly exposed to sunlight – between 9.00 AM and 1.00 PM. A main problem remains the influence of the ambient illumination on the face recognition process.
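The pose gate described above can be summarized in a few lines of code. The 20 degree threshold mirrors the description in the text, while the angle names and helper signatures are illustrative assumptions rather than the SDK's own identifiers.

```python
# Minimal sketch of the frontal-pose gate: a face image is accepted for
# recognition only when the head rotation is within ~20 degrees of frontal.
MAX_OFF_FRONTAL_DEG = 20.0

def is_frontal(yaw_deg: float, pitch_deg: float, roll_deg: float) -> bool:
    """True when every rotation axis is within the allowed deviation."""
    return all(abs(a) <= MAX_OFF_FRONTAL_DEG for a in (yaw_deg, pitch_deg, roll_deg))

def maybe_grab_face(frame, head_pose, crop_fn):
    """Grab a face crop only for near-frontal poses; otherwise skip the frame."""
    if is_frontal(*head_pose):
        return crop_fn(frame)    # hand the crop to the recognition stage
    return None                  # keep waiting for a usable pose
```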
As we have mentioned previously, in this paper the goal was not to obtain higher classification rates (for this objective a more powerful neural network could be used together with some image preprocessing algorithms that make face recognition invariant to the illumination conditions), but to prove the feasibility of the main concepts of the system.
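As one example of the kind of preprocessing step mentioned above (not part of the implemented system), a plain histogram equalization of the grayscale face crop reduces the dependence on the overall illumination level:

```python
import numpy as np

def equalize_histogram(gray):
    """Spread the intensity histogram of an 8-bit grayscale face crop so that
    recognition depends less on the overall illumination level."""
    hist = np.bincount(gray.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf_min = cdf[cdf > 0][0]
    lut = np.round((cdf - cdf_min) / (cdf[-1] - cdf_min) * 255).astype(np.uint8)
    return lut[gray]

# Example: a synthetic dark image spans the full range after equalization.
dark = (np.random.rand(64, 64) * 60).astype(np.uint8)
print(dark.max(), equalize_histogram(dark).max())  # e.g. 59 -> 255
```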
The time necessary for the system to identify a user or an intruder and to act accordingly varies depending on the face position and orientation. The system continuously estimates the face orientation and acquires a face image only in the frontal case. In a normal situation the system will respond, in the worst case, in approximately 30 seconds. The response time of the system, once a face image has been acquired, is less than 200 ms.
Power consumption tests were performed during normal system operation and reflect the power consumption only of the embedded system to which we have connected the Kinect sensor, a mouse and a keyboard, while the software runs a normal cycle of face image recognition. The measured power consumption was 12.5 W up to 14 W – these values correspond to a current of 2.5 A up to 2.8 A at a supply voltage of 5 V DC. For comparison, on an Intel i7 personal computer the measured standby and active power consumption were 33.03 W and 102.2 W respectively (for only one core – the system has 4 cores) [12].
VI. DISCUSSIONS AND CONCLUSIONS
In a classical system, growing the complexity and the computational cost of the software compromises the execution efficiency and increases the power consumption. Exploring and identifying solutions that can absorb the increasing computational expense without too much degradation of the overall performance and energy consumption of the system is a necessary step for facial recognition applications in order to make them as popular as personal computers. The Windows Azure cloud technology can represent a solution for this problem. Our project and the obtained results sustain this conclusion.

Using and exploiting the Kinect technology we can detect faces in real time faster, easier and more accurately (Fig. 2) than other similar systems [9].

A 3G data link connection is sufficient; the maximum quantity of exchanged data is around 200 Kbytes in one session.

As a final conclusion, the previously presented face recognition concepts can be used in a large field of applications (as a human-computer interface), obtaining high classification rates without any significant computational burden on the embedded system. As a result of the implementation of the above described concepts, the final system is low-cost and able to provide all the functionalities of similar high-end facial recognition systems.

ACKNOWLEDGMENT

The authors are grateful to Microsoft Romania for the donation of a set of 2 Kinect devices.

REFERENCES

[1] D. Xingjing, L. Chunmei, and Y. Yongxia, "Analysis of detection and track on partially occluded face," Int. Forum on Inf. Tech. and App., vol. 3, 2009, pp. 158–161.
[2] U. Yang, M. Kang, K. A. Toh, and K. Sohn, "An illumination invariant skin-color model for face detection," 4th IEEE Int. Conf. on Biometrics: Theory, App. and Syst., 2010, pp. 1–6.
[3] H. Chang, A. Haizhou, L. Yuan, and L. Shihong, "High-performance rotation invariant multiview face detection," IEEE Trans. on Patt. Analysis and Mach. Intell., vol. 29, no. 4, pp. 671–686, 2007.
[4] L. Goldmann, U. J. Monich, and T. Sikora, "Components and their topology for robust face detection in the presence of partial occlusions," IEEE Trans. on Inf. Forens. and Sec., vol. 2, no. 3, part 2, pp. 559–569, 2007.
[5] W. Yanjiang, and Y. Baozong, "Segmentation method for face detection in complex background," Elect. Lett., vol. 36, no. 3, pp. 213–214, 2000.
[6] C. N. R. Kumar, and A. Bindu, "An efficient skin illumination compensation model for efficient face detection," 32nd Annual Conf. on IEEE Ind. Elect., Paris, France, 6-10 Nov. 2006, pp. 3444–3449.
[7] R. Jafri, and H. R. Arabnia, "A survey of face recognition techniques," J. of Inf. Proc. Syst., vol. 5, no. 2, pp. 41–68, June 2009.
[8] C. Zhimin, Y. Qi, T. Xiaoou, and S. Jian, "Face recognition with learning-based descriptor," IEEE Conf. on Comp. Vision and Patt. Recogn., 13-18 June 2010, San Francisco, CA, pp. 2707–2714.
[9] "Big brother? Microsoft unveils technology to recognize faces in video," http://nocamels.com/2011/03/big-brother-microsoft-unveils-technologyto-recognize-faces-in-video/, date of access: 23 Feb. 2013.
[10] P. Marshall, K. Keahey, and T. Freeman, "Elastic site: using clouds to elastically extend site resources," 10th IEEE/ACM International Conference on Cluster, Cloud and Grid Computing, 17-20 May 2010, Melbourne, Australia, pp. 43–52.
[11] C. Bishop, Neural Networks for Pattern Recognition, New York, NY, USA: Oxford University Press, 1995.
[12] Y. C. Wang, B. Donyanavard, and K. T. Cheng, "Energy-aware real-time face recognition system on mobile CPU-GPU platform," Int. Workshop on Comp. Vision on GPU 2010, Crete, Greece, Vol. Part II, pp. 411–422.