A Robust, Low-Cost Approach To Face Detection and Face Recognition
CiiT International Journal of Digital Image Processing, ISSN 0974 9691 (Print) & ISSN 0974 9586 (Online)
Vol. 15, No. 10, October 2011
Most electronic imaging applications desire, and often require, high-resolution images. High resolution means that the pixel density within an image is high, and therefore a high-resolution (HR) image can offer more details and subtle transitions that may be critical in various applications [19]. For instance, high-resolution medical images could be very helpful for a doctor to make an accurate diagnosis. It may be easy to distinguish an object from similar ones using high-resolution satellite images, and the performance of pattern recognition in computer vision can easily be improved if such images are provided. Over the past few decades, charge-coupled device (CCD) and CMOS image sensors have been widely used to capture digital images. Although these sensors are suitable for most imaging applications, the current resolution level and consumer price will not satisfy the future demand [19].

Past studies by researchers and scientists investigating the challenging task of face detection and recognition have therefore typically used high-resolution images. Moreover, most standard face databases, such as the MIT-CBCL Face Recognition Database [21], CMU Multi-PIE [22] and The Yale Face Database [23], which researchers use as standard test data sets to benchmark their results, also employ high-quality images.

Results obtained by solutions proposed by researchers are therefore, in most cases, relevant for a theoretical understanding of face detection and identification. Practical conditions being rarely optimal, a number of factors play an important role in hampering system performance. Image degradation, i.e., loss of resolution caused mainly by large viewing distances as demonstrated in [4], and the lack of specialized high-resolution image capturing equipment such as commercial cameras are the underlying factors for the poor performance of face detection and recognition systems in practical situations. There are two paradigms to alleviate this problem, but both have clear disadvantages. One option is to use super-resolution algorithms to enhance the image as proposed in [20], but as resolution decreases, super-resolution becomes more vulnerable to environmental variations, and it introduces distortions that affect recognition performance. A detailed analysis of super-resolution constraints has been presented in [3]. On the other hand, it is also possible to match in the low-resolution domain by downsampling the training set, but this is undesirable because features important for recognition depend on high-frequency details that are erased by downsampling. These features are permanently lost upon downsampling and cannot be recovered with upsampling [24].

The proposed system has been designed keeping these critical factors in view and to address such bottlenecks.

II. IDEA OF THE PROPOSED SOLUTION

The database consists of a set of face samples of 50 people. There are 5 test images and 5 training or reference images. Frontal face images are detected and then extracted. DWT is applied to the entire image so as to obtain the global features, which include approximate coefficients (low frequency coefficients) and detail coefficients (high frequency coefficients). The approximate coefficients thus obtained are stored and the detail coefficients are discarded. Various levels of DWT are realized and their corresponding accuracy rates are determined.

A. Frontal Face Image Detection and Extraction

The face detection problem can be defined as follows: given an arbitrary input image, which could be a digitized video signal or a scanned photograph, determine whether or not there are any human faces in the image and, if there are, return a code corresponding to their location. Face detection as a computer vision task has many applications. It has direct relevance to the face recognition problem, because the first and foremost step of an automatic human face recognition system is usually identifying and locating the faces in an unknown image [5].

For our purpose, face detection is actually a face localization problem in which the image position of a single face has to be determined [6]. The goal of our facial feature detection is to detect the presence of features such as eyes, nose, nostrils, eyebrows, mouth, lips and ears, with the assumption that there is only one face in an image [7]. The system should also be robust to human affective states such as happy, sad and disgusted. Variations in image appearance such as pose, scale, image rotation and orientation, illumination and facial expression make face detection a difficult pattern recognition problem. Hence, the following problems need to be taken into account for face detection [5]:

1) Size: A face detector should be able to detect faces of different sizes. Thus, the scaling factor between the reference and the face image under test needs to be given due consideration.
2) Expressions: The appearance of a face changes considerably for different facial expressions, which makes face detection more difficult.
3) Pose variation: Face images vary due to relative camera-face pose, and some facial features such as an eye or the nose may become partially or wholly occluded. Another source of variation is the distance of the face from the camera, changes in which can result in perspective distortion.
4) Lighting and texture variation: Changes in the light source in particular can change a face's appearance and can also cause a change in its apparent texture.
5) Presence or absence of structural components: Facial features such as beards, moustaches and glasses may or may not be present. There may also be variability among these components, including shape, colour and size.

The proposed system employs global feature matching for face recognition. However, all computation takes place only on the frontal face, with the hair and background eliminated, as these may vary from one image to another. All systems therefore need frontal face extraction. One approach to achieve this is by manually cropping the test image to the required region, or by precisely aligning the user's face with the camera before the test sample is captured. Both these methods
may introduce a high degree of human error and so they have been avoided. Instead, automated frontal face detection and extraction is put to use. A robust automatic face recognition system should be capable of handling the above problem with no need for human intervention. Thus, it is practical and advantageous to realize automatic face detection in a functional face recognition system. Commonly used methods for skin detection include Gabor filters, neural networks and template matching.

It has been shown that Gabor filters give optimum output for a wide range of variations in the test image with respect to the user image, but this is the most time-intensive procedure [8]. Moreover, it is unlikely that the test image would be severely out of sync for on-the-spot face recognition, so this method is not used. Most neural-network based algorithms [18],[19] are tedious and require training samples for different skin types, which add to the already vast reference image database; hence, this approach does not fit the program's requirements either. Template matching also has severe drawbacks: it has a high computational cost [10] and fails to work as expected when the user's face is positioned at an angle in the test image.

After considering all the above factors, a classical appearance-based methodology is applied to extract the frontal face. The default sRGB colour space is transformed to the L*a*b* gamut, because L*a*b* separates intensity from the a and b colour components [11]. L*a*b* colour is designed to approximate human vision, in contrast to the RGB and CMYK colour models. It aspires to perceptual uniformity, and its L component closely matches human perception of lightness. It can thus be used to make accurate colour balance corrections by modifying output curves in the a and b components, or to adjust the lightness contrast using the L component. In RGB or CMYK spaces, which model the output of physical devices rather than human visual perception, these transformations can only be done with the help of appropriate blend modes in the editing application [10]. This distinction makes L*a*b* space more perceptually uniform than sRGB, and thus identifying tones, and not just a single colour, can be accomplished using L*a*b* space.

Fig. 1 shows the RGB colour model (B) relating to the CMYK model (C). The larger gamut of L*a*b* (A) gives more of a spectrum to work with, thus making the gamut of the device the only limitation.

Tone identification is applied to detect skin by calculating the gray threshold of the a and b colour components and then converting the image to pure black and white (BW) using the obtained threshold. Thus, RGB colour space can separate out only specific pigments, but L*a*b* space can separate out tones. Fig. 2 shows skin colour differentiation in the form of white colour using L*a*b* space, whereas Fig. 4 shows no such differentiation using RGB. The extracted frontal face is shown in Fig. 3.

Fig. 2. Reference image (left), image in black and white a plane (middle) and image in black and white b plane (right)

Fig. 3. Image in L*a*b* colour space (left), frontal face extracted image (middle) and grayscale resized image (right)

Fig. 4. B&W image of red colour space (left), B&W image of green colour space (middle) and B&W image of blue colour space (right)

There is a higher probability of the skin surface being the lighter part of the image as compared to the gray threshold [12]. This may happen due to illumination and natural skin colour (in most cultures), so pure white regions in the black and white image correspond to skin. It is assumed that a face will have at least one hole, i.e., a small patch of absolute black due to eyes, chin, dimples etc. [11], and on the basis of the presence of holes the frontal face is separated from other skin surfaces like hands. A bounding box is created around the frontal face and, after cropping the excess area, frontal face extraction is complete.
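The hole test and bounding-box step described above can be sketched as follows. This is a minimal illustration rather than the authors' implementation: it assumes the thresholded image is already available as a list of rows of 0/1 values (1 = skin), and the function names (`label_regions`, `has_hole`, `find_face_boxes`) and the choice of 8-connectivity for skin and 4-connectivity for holes are assumptions made for the example.

```python
from collections import deque

N4 = ((-1, 0), (1, 0), (0, -1), (0, 1))
N8 = N4 + ((-1, -1), (-1, 1), (1, -1), (1, 1))


def label_regions(grid, target, neighbours):
    """Return connected regions of cells equal to `target` as lists of (row, col)."""
    rows, cols = len(grid), len(grid[0])
    seen = [[False] * cols for _ in range(rows)]
    regions = []
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] != target or seen[r][c]:
                continue
            seen[r][c] = True
            queue, cells = deque([(r, c)]), []
            while queue:  # breadth-first flood fill
                y, x = queue.popleft()
                cells.append((y, x))
                for dy, dx in neighbours:
                    ny, nx = y + dy, x + dx
                    if (0 <= ny < rows and 0 <= nx < cols
                            and not seen[ny][nx] and grid[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            regions.append(cells)
    return regions


def bounding_box(cells):
    """Return (top, left, bottom, right), inclusive, for a list of cells."""
    ys = [y for y, _ in cells]
    xs = [x for _, x in cells]
    return min(ys), min(xs), max(ys), max(xs)


def has_hole(cells, box):
    """True if the skin blob encloses at least one black patch."""
    top, left, bottom, right = box
    h, w = bottom - top + 1, right - left + 1
    blob = [[0] * w for _ in range(h)]
    for y, x in cells:
        blob[y - top][x - left] = 1
    # A background region that never touches the crop border is enclosed.
    for region in label_regions(blob, 0, N4):
        if all(0 < y < h - 1 and 0 < x < w - 1 for y, x in region):
            return True
    return False


def find_face_boxes(mask):
    """Bounding boxes of skin blobs that contain at least one hole."""
    boxes = []
    for cells in label_regions(mask, 1, N8):
        box = bounding_box(cells)
        if has_hole(cells, box):
            boxes.append(box)
    return boxes


# Toy mask: a "face" blob with two holes (left) and a solid "hand" blob (right).
mask = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 1, 0, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 1, 1, 0, 1, 1, 0],
    [0, 1, 1, 0, 1, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
print(find_face_boxes(mask))  # [(1, 1, 5, 4)]
```

Only the blob with enclosed black patches is reported, and the returned box can be used directly to crop the frontal face from the original image. Selecting a single box when several blobs contain holes is left open here, since the text does not specify a rule.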
Fig. 1. RGB, CMYK and L*a*b* colour models

The above technique has been tested extensively on images obtained from standalone VGA cameras, webcams and camera-equipped mobile devices having a resolution of 640 × 480. Even for resolutions as low as 320 × 200, where the test image is poorly illuminated or extremely grainy, the algorithm was able to successfully extract the frontal face from test images. Thus, the proposed system is robust enough to achieve the desired result even when low-cost equipment like CCTVs and low-resolution webcams is used. Also, since equipment with inferior picture quality like CCTVs and low-resolution webcams is used, the
known wavelet and was proposed in 1909 by Alfred Haar. The term wavelet was coined much later. The Haar wavelet is also the simplest possible wavelet. Wavelets are mathematical functions developed for the purpose of sorting data by frequency. Translated data can then be sorted at a resolution which matches its scale. Studying data at different levels allows for the development of a more complete picture. Both small features and large features are discernible because they are studied separately. Unlike the Discrete Cosine Transform (DCT), the wavelet transform is not Fourier-based and hence does a better job of handling discontinuities in data [16].

For an input represented by a list of $2^n$ numbers, the Haar wavelet transform may be considered to simply pair up input values, storing the difference and passing the sum. This process is repeated recursively, pairing up the sums to provide the next scale, finally resulting in $2^n - 1$ differences and one final sum.

Each step in the forward Haar transform calculates a set of wavelet coefficients and a set of averages. If a data set $s_0, s_1, \ldots, s_{N-1}$ contains $N$ elements, there will be $N/2$ averages and $N/2$ coefficient values. The averages are stored in the lower half of the $N$-element array and the coefficients are stored in the upper half. The averages become the input for the next step in the wavelet calculation, where for iteration $i+1$, $N_{i+1} = N_i/2$. The Haar wavelet operates on data by calculating the sums and differences of adjacent elements. The Haar equations to calculate an average ($a_i$) and a wavelet coefficient ($c_i$) from an odd and even element in the data set can be given as:

\[ a_i = \frac{s_i + s_{i+1}}{2} \qquad (2) \]

\[ c_i = \frac{s_i - s_{i+1}}{2} \qquad (3) \]

In wavelet terminology, the Haar average is calculated by the scaling function while the coefficient is calculated by the wavelet function.

D. Inverse Discrete Wavelet Transform

The data input to the forward transform can be perfectly reconstructed using the following equations:

\[ s_i = a_i + c_i \qquad (4) \]

\[ s_{i+1} = a_i - c_i \qquad (5) \]

After applying DWT, we take the approximate coefficients, i.e., the output coefficients of the low pass filters. High pass coefficients are discarded since they provide detail information which serves no practical use for our application. Various levels of DWT are used to reduce the number of coefficients.

III. IMPLEMENTATION STEPS

The image size used in the project work is 128 × 128 pixels. On applying the wavelet transform, the image is divided into approximate coefficients and detail coefficients. Level 1 yields 64 × 64 = 4096 approximate coefficients. The approximate coefficients (low frequency coefficients) are stored and the detail coefficients (high frequency coefficients) are discarded. These approximate coefficients are used as inputs to the next level. Level 2 yields 32 × 32 = 1024 approximate coefficients. These steps are repeated until an improvement in the recognition rate is observed. At each level the detail coefficients are neglected and the approximate coefficients are used as inputs to the next level. These approximate coefficients of the input image and the registered image are extracted. Each set of coefficients belonging to the test image is compared with those of the registered image by taking the Euclidean distance, and the recognition rate is calculated. Table I shows the comparison carried out at each level and its recognition rate.

TABLE I
COMPARISON OF VARIOUS LEVELS OF DWT

Level      Coefficients   Recognition rate without   Recognition rate with
                          normalized image           normalized image
Level 1    4096           85.1%                      91.4%
Level 2    1024           91.4%                      91.4%
Level 3    256            93.6%                      95.7%
Level 4    64             89%                        93.6%
Level 5    16             87.6%                      93.6%

Upon inspecting the results obtained, we can infer that Level 3 offers better performance in comparison to other levels. Hence the images are subjected to decomposition only up to Level 3.

IV. FUTURE WORK

The proposed face detection algorithm is time-efficient, with an execution time of less than 1.75 seconds on an Intel Core 2 Duo 2.2 GHz processor. Due to its speed and robustness, it can further be extended for real-time face detection and identification in video systems. Also, the proposed face recognition method can be coupled with recognition using local features, thus leading to an improvement in accuracy.

V. CONCLUSION

The proposed face detection and extraction scheme was able to successfully extract the frontal face for resolutions as low as 320 × 200, even when the original image was poorly illuminated or extremely grainy. Thus, images obtained from low-cost equipment like CCTVs and low-resolution webcams could be processed by the algorithm.

For face recognition, the recognition rate for global features using various levels of DWT was calculated. Generally, the recognition rate was found to improve upon normalization. Level 3 DWT decomposition gives a superior recognition rate as compared to other decomposition levels.

REFERENCES

[1] Z. Hafed, "Face Recognition Using DCT," International Journal of Computer Vision, 2001, pp. 167-188.
[2] W. Zhao, R. Chellappa, "Face Recognition: A Literature Survey," ACM Computing Surveys, Vol. 35, No. 4, December 2003, pp. 399-458, p. 9.
[3] S. Baker, T. Kanade, "Limits on super-resolution and how to break them," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 24, No. 9, pp. 1167-1183, September 2002.
[4] A. Braun, I. Jarudi and P. Sinha, "Face Recognition as a Function of Image Resolution and Viewing Distance," Journal of Vision, September 23, 2011, Vol. 11, No. 11, Article 666.
[5] L. S. Sayana, M. Tech Dissertation, "Face Detection," Indian Institute of Technology (IIT) Bombay, p. 5, pp. 10-15.
[6] C. Schneider, N. Esau, L. Kleinjohann, B. Kleinjohann, "Feature based Face Localization and Recognition on Mobile Devices," Intl. Conf. on Control, Automation, Robotics and Vision, Dec. 2006, pp. 1-6.
[7] L. V. Praseeda, S. Kumar, D. S. Vidyadharan, "Face detection and localization of facial features in still and video images," IEEE Intl. Conf. on Emerging Trends in Engineering and Technology, 2008, pp. 1-2.
[8] K. Chung, S. C. Kee, S. R. Kim, "Face Recognition Using Principal Component Analysis of Gabor Filter Responses," International Workshop on Recognition, Analysis, and Tracking of Faces and Gestures in Real-Time Systems, 1999, p. 53.
[9] T. Kawanishi, T. Kurozumi, K. Kashino, S. Takagi, "A Fast Template Matching Algorithm with Adaptive Skipping Using Inner-Subtemplates' Distances," Vol. 3, 17th International Conference on Pattern Recognition, 2004, pp. 654-65.
[10] D. Margulis, Photoshop Lab Colour: The Canyon Conundrum and Other Adventures in the Most Powerful Colourspace, Pearson Education, ISBN 0321356780, 2006.
[11] J. Cai, A. Goshtasby, and C. Yu, "Detecting human faces in colour images," Image and Vision Computing, Vol. 18, No. 1, 1999, pp. 63-75.
[12] Singh, D. Garg, Soft computing, Allied Publishers, 2005, p. 222.
[13] S. Jayaraman, S. Esakkirajan, T. Veerakumar, Digital Image Processing, McGraw-Hill, 2008.
[14] S. Assegie, M.S. thesis, Department of Electrical and Computer Engineering, Purdue University, "Efficient and Secure Image and Video Processing and Transmission in Wireless Sensor Networks," pp. 5-7.
[15] P. Goyal, "NUC algorithm by calculating the corresponding statistics of the decomposed signal," International Journal on Computer Science and Technology (IJCST), Vol. 1, Issue 2, pp. 1-2, December 2010.
[16] Y. Ma, C. Liu, H. Sun, "A Simple Transform Method in the Field of Image Processing," Proceedings of the Sixth International Conference on Intelligent Systems Design and Applications, 2006, pp. 1-2.
[17] H. A. Rowley, S. Baluja and T. Kanade, "Neural network-based face detection," IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 20, No. 1, pp. 23-38, January 1998.
[18] P. Latha, L. Ganesan and S. Annadurai, "Face Recognition Using Neural Networks," Signal Processing: An International Journal (SPIJ), Volume 3, Issue 5, pp. 153-160.
[19] S. Park, M. Park and M. Kang, "Super-resolution image reconstruction: a technical overview," Signal Processing Magazine, pp. 21-36, May 2003.
[20] P. Hennings-Yeomans, S. Baker, and B.V.K. Vijaya Kumar, "Recognition of Low-Resolution Faces Using Multiple Still Images and Multiple Cameras," Proceedings of the IEEE International Conference on Biometrics: Theory, Systems, and Applications, pp. 1-6, September 2008.
[21] MIT-CBCL Face Recognition Database, Center for Biological & Computational Learning (CBCL), Massachusetts Institute of Technology, Available: http://cbcl.mit.edu/software-datasets/heisele/facerecognition-database.html, July 2011.
[22] Multi-PIE Database, Carnegie Mellon University, Available: http://www.multipie.org, July 2011.
[23] The Yale Face Database, Department of Computer Science, Yale University, Available: http://cvc.yale.edu/projects/yalefacesB/yalefacesB.html, June 2011.
[24] T. Frajka, K. Zeger, "Downsampling dependent upsampling of images," Signal Processing: Image Communication, Vol. 19, No. 3, pp. 257-265, March 2004.

Divya P. Jyoti (M2008) was born in Bhopal (M.P.) in India on April 24, 1990. She is currently pursuing her undergraduate studies in the Electronics and Telecommunication Engineering discipline at Thadomal Shahani Engineering College, Mumbai. Her fields of interest include Image Processing, and Human-Computer Interaction. She has 4 papers in International Conferences and Journals to her credit.

Aman R. Chadha (M2008) was born in Mumbai (M.H.) in India on November 22, 1990. He is currently pursuing his undergraduate studies in the Electronics and Telecommunication Engineering discipline at Thadomal Shahani Engineering College, Mumbai. His special fields of interest include Image Processing, Computer Vision (particularly, Pattern Recognition) and Embedded Systems. He has 4 papers in International Conferences and Journals to his credit.

Pallavi P. Vaidya (M2006) was born in Mumbai (M.H.) in India on March 18, 1985. She graduated with a B.E. in Electronics & Telecommunication Engineering from Maharashtra Institute of Technology (M.I.T.), Pune in 2006, and completed her post-graduation (M.E.) in Electronics & Telecommunication Engineering from Thadomal Shahani Engineering College (TSEC), Mumbai University in 2008. She is currently working as a Senior Engineer at a premier shipyard construction firm. Her special fields of interest include Image Processing and Biometrics.

M. Mani Roja (M1990) was born in Tirunelveli (T.N.) in India on June 19, 1969. She has received B.E. in Electronics & Communication Engineering from GCE Tirunelveli, Madurai Kamraj University in 1990, and M.E. in Electronics from Mumbai University in 2002. Her employment experience includes 21 years as an educationist at Thadomal Shahani Engineering College (TSEC), Mumbai University. She holds the post of an Associate Professor in TSEC. Her special fields of interest include Image Processing and Data Encryption. She has over 20 papers in National / International Conferences and Journals to her credit. She is a member of IETE, ISTE, IACSIT and ACM.