ABSTRACT
Sign language is used by deaf and hard-of-hearing people to exchange information within their own community and with other people. Computer recognition of sign language spans the whole pipeline, from sign gesture acquisition to text/speech generation. Sign gestures can be classified as static and dynamic. Although static gesture recognition is simpler than dynamic gesture recognition, both kinds of recognition system are important to the human community. The sign language recognition steps are described in this survey, and the data acquisition, data preprocessing and transformation, feature extraction, classification and results obtained are examined. Some future directions for research in this area are also suggested.

Keywords: sign language recognition, hand tracking, hand gesture recognition, gesture analysis, face recognition.
1. INTRODUCTION
Sign language (SL) [1] is a visual-gestural language used by deaf and hard-of-hearing people for communication. Three-dimensional space and movements of the hands (and of other parts of the body) are used to convey meaning. It has its own vocabulary and syntax, which are quite different from those of spoken/written language. Spoken languages use the oratory faculties to produce sounds mapped to specific words and grammatical combinations that convey meaningful information; the oratory elements are then received by the auditory faculties and processed accordingly. Sign language, by contrast, uses the visual faculties. Just as spoken language applies rules to produce comprehensible messages, sign language is also governed by a complex grammar. A sign language recognition system provides an easy, efficient and accurate mechanism to transform sign language into text or speech. Computerized digital image processing and a wide variety of classification methods are used to recognize the flow of alphabets and to interpret sign language words and phrases. Sign language information can be conveyed through hand gestures and the position of the head and other body parts. The four essential components of a gesture recognition system are gesture modeling, gesture analysis, gesture recognition and gesture-based application systems [2].

1.1. Indian sign language: history
Professionals in India report an acute shortage of special schools for deaf people. Very few schools use sign language as a medium of instruction, and there is also a lack of proper and effective audio-visual support for oral education in these schools. This results in inadequate communication and language skills in the majority of deaf children, leading to poor literacy skills in the deaf community. The reality is that deaf schools mostly do not use ISL, and nearly 5% of deaf people [3] attend deaf schools. The use of ISL is restricted to vocational programs and short-term courses. ISL was partly influenced by British Sign Language in its finger-spelling system and some other signs, but most signs are unrelated to the European sign system.
There was no formal ISL until 1978. Banerjee [4] compared the signs used in some schools for the deaf located in West Bengal and Assam and concluded that the gestures used in each school were not the same. He believed that signing started in India in the 18th century but that its use was strongly discouraged. Madan Vasishta [5] sent a questionnaire to the heads of more than a hundred schools for the deaf in India in 1975. Almost all the respondents agreed that there was no ISL, but they also acknowledged that deaf children used some kind of gestures. A similar survey was conducted 20 years later, using a set of questionnaires sent to deaf schools. Some of the responses showed the same misconceptions about sign language: that signing is "based on spoken language", or "based on English", or that it is "difficult to provide a sign for every spoken word". Other statements showed a more positive attitude towards manual communication, and here respondents talked about sign language rather than gestures. Increasing awareness of the nature of sign languages was verified later on.
Observing the advantages of work on sign language recognition in other countries in helping deaf people to communicate in public places and to access modern devices such as telephones and computers, linguistic studies on Indian Sign Language started in 1978 in India. These works established ISL and showed that it is a language in its own right, with specific syntax, grammar, phonology and morphology. Static gestures for isolated ISL numerals are shown in Figure-1.
Figure-1. ISL static gestures for isolated numerals (0 to 9).
While significant progress has already been made in computer recognition of the sign languages of other countries, only very limited work has been done on ISL computerization [6].
This paper summarizes the work carried out by various researchers worldwide. The domain is isolated sign language, but continuous sign language recognition is also discussed because of its similarity to isolated sign language recognition. We mainly select research papers in which no special image acquisition devices are required. The reason is that special image acquisition devices are not available in public places at all times, deaf/mute/hard-of-hearing persons may be unable to afford them because of their economic conditions, and in most cases such devices are cumbersome to carry and wear. We also select a few research papers in which special wearable devices are used as inputs, because of their better performance, for comparison purposes.
The organization of the paper is as follows. We summarize the research papers from various authors according to the following characteristics:

a) Sign language used.
b) The domain of the sign language used.
c) Data acquisition methods employed.
d) Data transformation techniques.
e) Feature extraction methods.
f) Classification techniques.
g) Results obtained.
h) Conclusion.

2. SIGN LANGUAGES USED
It is reported that about 5% of the world population consists of deaf, mute and hard-of-hearing people. They use some kind of hand, head and body gestures to exchange their feelings and ideas, so almost every nation has its own sign language. The development of sign language differs for each country or sub-continent.
Table-1 lists the sign languages of the influencing countries/sub-continents. It indicates that the most dominant research is on ASL, followed by CSL and others. The reason is that a large number of standard databases of ASL gestures are publicly available. Developing countries are currently focusing on research in this field. Although two research papers from India are reported in this survey, the work was performed on ASL. We also include two survey papers on ISL.

3. THE DOMAIN OF THE SL USED
SL is an independent language which is entirely different from spoken/written language. It has its own set of alphabets, numerals, words/phrases/sentences and so on. The basic difference is that it has a limited vocabulary compared to written/spoken language.
Also, in most developing and under-developed countries it is in the initial phase, and the development of sign language in these countries will take years to become an independent language. However, computer recognition of sign language for these countries has started and significant works are reported in the literature.
A sign language has a set of alphabets which corresponds to that of the written/spoken language of the country it belongs to. In the case of ASL or BSL it is simply the alphabet set A to Z. Similarly, the numerals 0 to 9 are communicated by every sign language [2, 7, 8 and 9]. Secondly, the words/phrases of any sign language belong to a particular domain. Examples are "Why?", "Award", "What for?", "How much?" [10]; a coin, cigarette, flower; reluctantly, row, take, immediately, understand, hate, left, seven, moon, eight, walk, conscience [11]; and another set consisting of Friend, To Eat, Neighbor, To sleep, Guest, To Drink, Gift, To wake up, Enemy, To listen, Peace upon you, To stop talking, Welcome, To smell, Thank you, To help, Come in, Yesterday, Shame, To go, House, To come and I/me [12]. The main point is that when a researcher wants to build a sign language recognition system, he/she uses a set of words/phrases in a particular domain such as banking, railways, public telephone booths, or something that covers very general conversations in public places. Thirdly, combinations of sign gestures for simple sentences/phrases are used in the recognition of sign languages.
The databases used by various researchers are classified according to:

- Availability of standard database
- Creating own database

3.1. Availability of standard database
The standard databases used by various researchers are available in various libraries. The library data in Table-2 are included in this survey.

Table-2. Standard databases for sign language/face.

Library database - Sign language
Lifeprint Fingerspell Library - ASL
CAS-PEAL - CSL
Extended Yale B / Yale B frontal (subset of Extended Yale B) - Face database
Weizmann face database - Face database
American Sign Language Linguistic Research Project with transcription using SignStream - ASL
ASL Lexicon Video Dataset - ASL
eNTERFACE - ASL
PETS 2002 - Similar to PSL
RWTH-BOSTON-104 Database - ASL
3.1.1. Lifeprint Fingerspell Library
American Sign Language University has provided [9, 13] online sign language instruction since 1997. The program is an effort to support parents and relatives of deaf-mute children living in rural areas where access to sign language programs is limited. Technical details of how the database was acquired are not available in the literature. However, it is a rich library with all types of datasets, ranging from static alphabets to simple and complex phrases, including medical terms and up to advanced phrases.

3.1.2. CAS-PEAL database
The CAS-PEAL face database [14] was developed by the Joint Research and Development Laboratory (JDL) for Advanced Computer and Communication Technologies of the Chinese Academy of Sciences (CAS), with the support of the Chinese National Hi-Tech Program and ISVISION Tech. Co. Ltd. The CAS-PEAL face database was constructed to provide researchers with a large-scale Chinese face database for studying, developing and evaluating their algorithms. The CAS-PEAL large-scale face images with different sources of variation, namely Pose, Expression, Accessories and Lighting (PEAL), were used to advance the state of the art in face recognition technologies.
The database contains 99,594 images of 1,040 individuals (595 males and 445 females). For each subject, nine equally spaced cameras mounted in a horizontal semicircular layer were set up to capture images across different poses in one shot. The subject was also asked to look up and down so that 18 further images could be captured in another two shots. The developers also considered five kinds of expressions, six kinds of accessories (three goggles and three caps), fifteen lighting directions, and varying backgrounds, distances from the cameras and aging.
A specially designed photographic room was built in the JDL Lab of CAS. The room size was about 4 m × 5 m × 3.5 m. Special apparatus was configured in the room, including multiple digital cameras, all kinds of lamps and accessories, in order to capture faces with different poses, expressions, accessories and lighting.
The camera system consists of nine digital cameras and a specially designed computer. All nine cameras were placed in a horizontal semicircular layer with radius and height of 0.8 m and 1.1 m, respectively. The cameras used in the experiments were web-eye PC631 cameras with 370,000-pixel CCDs. They were all pointed at the center of the semicircular layer and labeled 0 to 8 from the subject signer's right to left.
All nine cameras were connected to and controlled by the same computer through USB interfaces; the computer had been specially designed to support nine USB ports. A software package was designed to control the cameras and capture images simultaneously in one shot. In each shot, the software package obtained nine images of the subject across different poses within no more than two seconds and stored these images on the hard disk using uniform naming conventions.
A lighting system was designed using multiple lamps and lanterns. To simulate ambient lighting, two high-power photographic sunlamps covered with ground glass were used to imitate the indoor lighting environment.
Fluorescent lamps were roughly arranged as 'lighting sources' to form the varying lighting conditions. The lamps were arranged in a spherical coordinate system whose origin is the center of the circle matching the semicircular layer. Fifteen fluorescent lamps were used at the 'lamp' positions, which are uniformly located at five specific azimuths (-90°, -45°, 0°, +45°, +90°) and three elevations (-45°, 0°, +45°). By turning each lamp on or off, different lighting conditions are replicated. To reduce labor cost and time, a multiplex switch circuit was used to control the on/off state of these lamps.
Several types of glasses and hats were used as accessories to further increase the diversity of the database. The glasses consisted of dark-framed glasses, thin white-framed glasses and frameless glasses. The hats also had brims of different sizes and shapes.
Face images were captured with a blue cloth as the default background. In practical applications, however, many cameras work under auto-white-balance mode, which changed the face appearance.

3.1.3. Yale B frontal
To capture the images in this database [15], a geodesic lighting rig with 64 computer-controlled xenon strobes at known positions in spherical coordinates was used. The illumination produced by the rig was modified at frame rate, and images were captured under variable illumination and pose. Images of ten persons were acquired under sixty-four different lighting conditions in nine poses (a frontal pose, five poses at 12° and three poses at 24° from the camera axis). The sixty-four images of a face in a particular pose were acquired in two seconds.
The original size of the images was 640 × 480 pixels. In the experiments [16], all images were manually cropped to include only the face, with as little hair and background as possible.

3.1.4. Extended Yale B
In this database [15, 17], forty-five images were captured under different lighting directions on average. The authors randomly generated configurations of five lighting directions among these forty-five different lighting directions, and the corresponding five images were taken to form the basis vectors of a subspace. For each randomly generated lighting configuration, there were five images for training and forty images for testing. They randomly generated 16,000 different configurations of five lighting positions, and this number corresponds to roughly 1.5 percent of the total number of configurations C(45, 5) = 1,221,759.
By randomly picking lighting conditions to form subspaces, the expected error rate should be an order of magnitude larger than for the chosen configuration. The most impressive fact is that there were only three lighting configurations (out of the 16,000 tested configurations) that performed better than the chosen configuration. These three configurations all share the same basic pattern as the chosen configuration in the spatial distribution of their lighting directions, namely a frontal lighting direction coupled with four lateral directions.
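As a quick arithmetic check of the figures above, the total number of 5-out-of-45 lighting configurations and the fraction covered by the 16,000 random samples can be computed directly; the snippet below is only a verification of that count.

```python
from math import comb

total = comb(45, 5)                      # all possible 5-direction configurations
sampled = 16_000                         # randomly generated configurations
print(total)                             # 1221759
print(f"{100 * sampled / total:.2f}%")   # about 1.31%, i.e. "roughly 1.5 percent"
```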
3.1.5. Weizmann face database
To acquire face image statistics [17], about 1,000 random coordinates were selected in a single acquired image. Sixteen different, overlapping 5×5 patches around each coordinate were then used to produce a subspace by taking their four principal components. These were stored in the subspace database. Additionally, all sixteen patches were stored in the patch database. A novel test image was subdivided into a grid of non-overlapping 5×5 patches. For each patch, the database was searched for a similar patch using exact sequential and ANN-based search. The selected database patch was then used as an approximation to the original input patch. Similarly, both exact sequential and ANN searches were used to select a matching subspace in the subspace database for each patch. The point on the selected subspace closest to the query patch was then taken as its approximation.
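The patch-and-subspace search described above can be sketched in a few lines. The fragment below is a minimal illustration rather than the authors' implementation: it fits a 4-principal-component subspace to sixteen flattened 5×5 patches and measures how far a query patch lies from that subspace; the random data are placeholders.

```python
import numpy as np

def patch_subspace(patches, n_components=4):
    """Fit a PCA subspace (mean + basis) to a stack of flattened 5x5 patches."""
    X = patches.reshape(len(patches), -1).astype(float)      # 16 x 25
    mean = X.mean(axis=0)
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)  # principal directions
    return mean, vt[:n_components]                            # basis: 4 x 25

def distance_to_subspace(patch, mean, basis):
    """Distance between a patch and its projection onto the fitted subspace."""
    v = patch.ravel().astype(float) - mean
    return np.linalg.norm(v - basis.T @ (basis @ v))

rng = np.random.default_rng(0)
patches = rng.integers(0, 256, size=(16, 5, 5))   # 16 overlapping patches (toy data)
query = rng.integers(0, 256, size=(5, 5))         # one non-overlapping test patch
mean, basis = patch_subspace(patches)
print(distance_to_subspace(query, mean, basis))
```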
3.1.6. SignStream
SignStream [18, 19] is a multimedia database tool distributed on a nonprofit basis to educators, students and researchers. It provides a single computing environment within which to view, annotate and analyze digital video and/or audio data. It also provides direct on-screen access to video and/or audio files and facilitates detailed and accurate annotation, making it valuable for linguistic research on signed languages and the gestural component of spoken languages. The database is useful in different domains involving the annotation and analysis of digital video data.
Each facility was equipped with multiple synchronized digital cameras to capture different views of the subject data. The facilities were able to capture four simultaneous digital video streams at up to 85 frames per second while storing the video to disk for editing and annotation. A substantial corpus of ASL video data from native signers was collected and is now publicly available for research purposes. The data acquisition setup consists of the following:

a. Four PCs, each with a 500-MHz Pentium III processor, 256 MB RAM, 64 GB of hard drive storage and Bit Flow Road Runner video capture cards.
b. Four Kodak ES-310 digital video cameras. Each camera was connected to one of the PCs.
c. A video sync generator, which was used to synchronize the cameras. Videos were captured at 30, 60 or 85 frames per second.
d. An Ethernet switch that allows the four PCs to communicate with each other efficiently.
e. IO Industries 'Video Savant' software installed on all PCs in order to synchronize video capture across the four cameras.
f. Various illumination sources, dark (black) backgrounds, chairs for subjects, and so on.

Of the four PCs, one was designated as a server and the other three acted as clients. To capture a video sequence, the appropriate program was executed on the server PC and the corresponding client programs ran on the client PCs. Instructions were given to the server about how many frames were to be captured and the starting time of the recording. The captured frames were then stored on the hard drives in real time. With 64 GB of hard drive storage available, video could be recorded continuously for 60 min, at 60 frames per second, on all four machines simultaneously, at an image resolution of 648×484 (width × height).
Video sequences were collected with the four video cameras configured in two different ways:

a) All cameras were focused on a single ASL subject signer. Two cameras form a stereo pair, facing toward the signer and covering the upper half of the signer's body. One camera faces toward the signer and zooms in on the signer's head. One camera was placed to the side of the signer and covers the upper half of the subject signer's body.
b) The cameras were focused on two ASL subject signers engaged in conversation, facing each other. In this setup, the cameras stand low on tripods placed in the middle (between the two subject signers, but not obstructing their conversation).

One pair of cameras was focused so as to give a close-up facial view of each subject signer. The other pair of cameras faced one toward each subject signer, covering the upper half of the subject signer's body. The video data are now available in both uncompressed and compressed formats. Significant portions of the collected data have also been linguistically annotated using SignStream, and these data and the associated SignStream annotations are publicly available through the Internet (refer to http://www.bu.edu/asllrp/ncslgr.html).

3.1.7. eNTERFACE
A single web camera with 640×480 resolution and a rate of 25 frames per second was used for the recordings of signs in ASL. The camera was placed in front of the subject signer. Eight subject signers performed five repetitions of each sign, and the video data were collected [20] in the database. The database was divided into training and test sets: 532 examples were used for training (28 examples per sign) and 228 examples for reporting the test results (12 examples per sign). The subjects in the training and test sets were different, except for one subject whose examples are divided between the training and test sets. Equal numbers of sign classes were used in the training and testing sets. The authors applied stratified seven-fold cross validation (CV) on the training sets where a validation set was needed. Sign language features were extracted from both manual signs (hand motion, hand shape, hand position with respect to the face) and non-manual signs (head motion). The center of mass (CoM) of each hand was tracked and filtered by a Kalman filter for hand motion analysis. Appearance-based shape features were calculated on the binary hand images for hand shape features; these include the parameters of an ellipse fitted to the binary hand and statistics from a rectangular mask placed on top of the binary hand. The system detects rigid head motions such as head rotations and head nods for head motion analysis [21]. The orientation and velocity information of the head and the quantity of motion were also used as head motion features. More details can be found in [20].
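The hand-motion features above rely on Kalman filtering of the per-frame centre of mass. A minimal constant-velocity filter of that kind is sketched below; the state is (x, y, vx, vy), and the process/measurement noise levels are illustrative assumptions, not values from [20].

```python
import numpy as np

class CoMKalman:
    """Constant-velocity Kalman filter for a 2D hand center-of-mass track."""

    def __init__(self, dt=1.0, q=1e-2, r=4.0):
        self.F = np.array([[1, 0, dt, 0],
                           [0, 1, 0, dt],
                           [0, 0, 1, 0],
                           [0, 0, 0, 1]], dtype=float)   # state transition
        self.H = np.array([[1, 0, 0, 0],
                           [0, 1, 0, 0]], dtype=float)   # only (x, y) is observed
        self.Q = q * np.eye(4)                            # process noise (assumed)
        self.R = r * np.eye(2)                            # measurement noise (assumed)
        self.x = np.zeros(4)
        self.P = np.eye(4) * 100.0

    def update(self, measurement):
        # predict
        self.x = self.F @ self.x
        self.P = self.F @ self.P @ self.F.T + self.Q
        # correct with the measured center of mass
        z = np.asarray(measurement, dtype=float)
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)
        self.x = self.x + K @ (z - self.H @ self.x)
        self.P = (np.eye(4) - K @ self.H) @ self.P
        return self.x[:2]                                 # filtered (x, y)

kf = CoMKalman()
for com in [(100, 120), (103, 118), (108, 117)]:          # raw per-frame CoM estimates
    print(kf.update(com))
```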
3.1.8. ASL Lexicon Video Dataset
The authors introduced a new large-scale dataset, the ASL Lexicon Video Dataset [11, 18, 22, 23 and 24], containing video sequences of a large number of distinct sign classes of ASL. This dataset is publicly available and expanding rapidly. The authors believe that this dataset will be an important resource for researchers in sign language recognition and human activity analysis, by providing a large amount of data that can be used for training and testing, and by providing a public benchmark dataset on which different methods can be evaluated. The dataset is currently part of a computer vision system that allows users to look up the meaning of a sign automatically: the signer performs the sign in front of a camera (or, possibly, in a multi-camera setup), and the computer retrieves and displays the most similar signs in the lexicon dataset. The authors suggested that the dataset can also be used for testing a wide variety of computer vision, machine learning and database indexing algorithms.
The video sequences were captured simultaneously from four different cameras, providing four views, namely a side view, two frontal views and a view zoomed in on the face of the signer. The upper body occupies a relatively large part of the visible scene in the side view and the two frontal views. A frontal view of the face occupies a large part of the image in the face view. Video was captured at 60 frames per second, non-interlaced, at a resolution of 640×480 pixels per frame for the side view, first frontal view and face view. For the second frontal view, video was captured at 30 frames per second, non-interlaced, at a resolution of 1600×1200 pixels per frame. This high-resolution frontal view facilitates the application of existing hand pose estimation and hand tracking systems to the dataset, by displaying the hand in significantly more detail than in the 640×480 views.
The authors applied a motion energy method, using a test set of 206 video sequences belonging to 108 distinct glosses (used as class labels) and a training set of 999 video sequences belonging to 992 distinct glosses. For almost all sign classes the authors had only one training example. The worst possible ranking result for the correct class of any test sign was therefore 992. The test sequences were signed by two signers, and the training sequences were signed by another signer, who did not sign in any of the test sequences. Thus, the experiments are user-independent.

3.1.9. PETS 2002 dataset
The database [25, 26 and 27] consists of 1,000 color images of 128×128 pixels showing 12 hand postures performed by 19 persons against simple and complex backgrounds with varying amounts of skin color. The images of three subjects signing against uniform light and dark backgrounds formed the training set, giving six training images per posture; the remaining images formed the test set used for the experiments. For the images in the training set, the authors constructed graphs of 15 nodes. All 15 nodes were manually placed at anatomically significant points. The number of training images is quite small, but because preliminary experiments were very encouraging, and because of the amount of manual work involved in creating the model graphs, they chose not to add more images to the training set. Gabor jets of the three different feature types were extracted at the node positions of every training image.

3.1.10. The RWTH-BOSTON-104 database
The National Center for Sign Language and Gesture Resources of Boston University published this database of ASL sentences [26, 27, 28 and 29]. The database consists of 201 annotated video streams of ASL sentences. The images were captured simultaneously by four standard stationary cameras, three of which were black/white and one a color camera. Two of the black/white cameras were placed towards the signer's face to form a stereo pair, and another camera was installed at the side of the signer. The color camera was placed between the stereo camera pair and was zoomed in to capture only the face of the signer. The videos published on the Internet are at 30 frames per second; the size of the videos is 366×312 pixels, and the size of the frames without any additional recording information is 312×242 pixels.
To use the RWTH-BOSTON-104 database for ASL sentence recognition, the authors separated the video streams into a training and a test set. The training set consists of 161 sign language sentences and the test set includes the 40 remaining sign language sentences. The training set was further split into a smaller training set with 131 sequences and a development set with 30 sequences, in order to tune the parameters of the system. The database is freely available at http://www-i6.informatik.rwth-aachen.de/~dreuw/database.php.

3.2. Creating own database
Most researchers create their own database for sign language recognition. These databases can also be classified into digits, alphabets and phrases (simple or complex). Table-3 describes the characteristics of the datasets created by various researchers.

4. DATA ACQUISITION METHODS EMPLOYED
In creating the standard databases, sets of digital cameras/video cameras at different positions/places in front of the subject are used by different researchers. They also employ different lighting illuminations, background selections and other equipment such as hats/caps, dresses and spectacles to acquire data in the form of static gestures (photographs) or dynamic gestures (videos).
The same kinds of procedures are also followed by other researchers to acquire their own datasets. Some researchers use specially designed input devices to capture gesture data. Although specially designed devices like the CyberGlove® [30] may be expensive, such a device is self-sufficient for acquiring the desired data, and no other supporting input devices are required. In the following sections we explain each of them. In general, digital cameras are used by various researchers to acquire static gestures (signs).

4.1. Digital still camera
Two digital cameras [7, 31] with their optical axes parallel to the Z axis were used for data acquisition. The distance between the lenses of the cameras, called the baseline, was set to 10 centimeters. The world coordinate system {X, Y, Z} was chosen parallel to the camera coordinate system (x, y, z). The origin of the coordinate system was located exactly between the two cameras.
An image of each sign was taken [2] by a camera. Portable Document Format (PDF) was used for image preprocessing, so that the system would be able to deal with images that have a uniform background. Images of signs were resized to 80×64. The authors used the 'bicubic' method to reduce aliasing. The default filter size used was 11×11.
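The resizing step just described maps onto a single library call; a minimal sketch with OpenCV is shown below. The file name is hypothetical, and whether 80×64 means width×height or rows×columns is not stated in the source, so the ordering here is an assumption.

```python
import cv2

img = cv2.imread("sign.png", cv2.IMREAD_GRAYSCALE)     # hypothetical input image
# cv2.resize takes (width, height); 80x64 is assumed here to mean width x height
small = cv2.resize(img, (80, 64), interpolation=cv2.INTER_CUBIC)
print(small.shape)                                      # (64, 80) -> rows x columns
```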
In [8, 32], images of 30 letters from the Chinese manual alphabet were collected with a camera. A total of 195 images were captured for each letter, and therefore 5,850 images in all.
In the experiment in [33], the authors used 26 hand gestures to express 26 letters. The hand gesture images were captured from five different views, so five cameras were mounted at approximately 2/3 body height with viewing directions parallel to the ground plane. The 130 hand gestures from all the views were cropped by the minimum exterior rectangle of the hand region and then resized to 80×80 pixels. All images were transformed to gray-scale images and binary images, respectively.
A digital camera was used for image acquisition and a colored glove was used [34] to allow image processing using the HSI (Hue, Saturation, Intensity) color system. A total of 900 colored images were used to represent the 30 different hand gestures.
The segmentation process divides the image into six layers representing the five fingertips and the wrist.
The required images were acquired using a digital camera by the authors of [35]. The backgrounds of all images were kept black for uniformity. Five persons (males and females) with a mean age of 25 years were recruited for the study. They were familiarized with the experimental procedure before the experimental data collection. By varying the hand orientation and its distance from the camera, 30 images were captured for each sign.
A camera was used [36] to collect the gesture data. Because the main focus was on the adaptation module, the vocabulary and the feature extraction were simplified. The vocabulary consists of 10 gestures, each of which is a number with 3 connected digits. The digits were signed in the Chinese spelling way. The authors extracted eight features from these gestures: the area, the circumference, the lengths of the two axes of the ellipse fitted to the gesture region, and their derivatives. The experimental data set consists of 1,200 samples over 10 gestures and 5 subjects. Among the 5 subjects, 4 were selected as training subjects, and each of them signed each gesture 5 times.
In [37], 32 signs of the PSL alphabet were captured to train and test the proposed system. The selected static signs were collected from one hand. The required images were obtained using a digital camera. The background of all images was black for uniformity, and the experimental data were collected by varying the hand orientation and its distance from the camera. 640 images of the selected signs were prepared; among these, 416 images were used as the training set and the remaining 224 images were employed as the test set.

4.2. Video camera
For the identification of the image areas with hand and face color, several points belonging to the corresponding areas in the image were obtained from a web camera. The proposed hand shape comparison algorithm [10] was tested on a set of 240 images taken from 12 signs. Locating the user's face and hands was solved in two stages: first, determination of the image areas with skin color and, secondly, segmentation of the obtained area for face and hand recognition. The face segment was taken to be the larger part of the image, and the two smaller ones were the hands. The user should wear a long-sleeved garment that differs in color from his/her skin.
The video sequence of the subject signer was obtained using a camera. In this work, the authors [38] suggested that the camera face towards the signer in order to capture the front view of the hand gestures. The initiation of the acquisition was carried out manually. A camera sensor was required in order to capture the image frames from the video sequence of the signer.
In the experiments of [11], the subject signers were asked to wear dark clothes with long sleeves and white gloves and to stand before dark curtains under normal lighting illumination. 15 different gestures from TSL were captured with the help of a video camera. Each input gesture consists of a sequence of 30 image frames captured using a single hand moving in different directions with constant or time-varying hand shape. Gestures had similar or different movement trajectories. The shape patterns and trajectory patterns were used in the recognition phase to find the resemblance between the input model and the stored model. Four different hand motion directions were available in the database; accordingly, the direction of the hand's motion was used to classify a sign into a group of motion directions. A motion history image is used for finding the motion direction. After this coarse classification, key frames in a sign were selected by means of Fourier descriptors [39]. Two frames with differences corresponding to the maximum FD coefficients were selected as key frames, so each sign had two key frames. The features were extracted from each key frame with GCD and stored in a vector, so for each sign there was one vector. Sign language recognition was then conducted according to the GCD features of the hand shapes in the key frames and a Euclidean distance classifier.
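A motion history image of the kind used above for the coarse direction classification stamps each moving pixel with a maximal value and lets it decay over time, so the brightest regions mark the most recent motion. The sketch below uses simple frame differencing with an assumed threshold; it is not the exact procedure of [11].

```python
import numpy as np

def update_mhi(mhi, prev_frame, frame, tau=30, diff_thresh=25):
    """Update a motion history image from two consecutive grayscale frames."""
    moved = np.abs(frame.astype(int) - prev_frame.astype(int)) > diff_thresh
    mhi = np.maximum(mhi - 1, 0)      # fade previously recorded motion
    mhi[moved] = tau                  # stamp pixels that moved in this frame
    return mhi

# toy usage over a short sequence of frames (arrays of identical shape)
frames = [np.zeros((120, 160), dtype=np.uint8) for _ in range(3)]
frames[1][40:60, 50:70] = 255         # a bright patch appears in frame 1
mhi = np.zeros((120, 160), dtype=int)
for prev, cur in zip(frames, frames[1:]):
    mhi = update_mhi(mhi, prev, cur)
```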
The database in [12] consists of 23 Arabic gestured words/phrases collected from 3 different signers. The list of words comprises those most commonly used in communication between the deaf and the hearing society. Each signer was asked to repeat each gesture 50 times over 3 different sessions, so a total of 150 repetitions of the 23 gestures results in a grand total of 3,450 video segments. The signers were videotaped using a digital camcorder without imposing any restrictions on clothing or image background, for user-independent recognition applications. The three signer participants (one male and two females) were quite diverse in terms of size and height.
The authors of [40] decided to select a vision-based system for this project, as they felt data gloves were very expensive and inconvenient for the signer. Sri Lankan Tamil finger spellers' average finger-spelling speed is forty-five (45) letters per minute, which implies that a signer will sign only about 0.75 signs per second. A standard web camera is capable of capturing fifteen frames per second, and processing fifteen frames per second is computationally very expensive. So, the authors proposed that the video capturing module capture only three frames per second, which allows the system to speed up the recognition process.
The vision-based sign language recognition system in the Intelligent Building (IB) [41] was able to capture images of the sign language user with a video camera. An integrated algorithm (AdaBoost) was incorporated in the system for face and hand detection to address the problem of real-time recognition. The system extracts features of sign language gestures, facial expressions and lip movements separately after pretreatment. These features were matched with a sign language database and a facial expressions database. Simultaneously, lip movements are extracted through image edge detection and matched with a mouth shape database.
After semantic disambiguation, all of the recognition results are integrated to translate the sign language into speech.
Video sequences in [42] were captured from a CCD camera for the proposed system. Since hand images are two-dimensional, the 2-D HMM, an extension of the standard HMM, offers great potential for analyzing and recognizing gesture patterns and was used here. Because fully connected 2-D HMMs lead to an algorithm of exponential complexity, the connectivity of the network has been reduced in several ways, two of which are Markov random fields (and their variants) and pseudo 2-D HMMs. The latter model, called the P2-DHMM, is a very simple and efficient 2-D model that retains all of the useful HMM features. This paper focused on the real-time construction of hand gesture P2-DHMMs. The proposed P2-DHMMs use observation vectors that are composed of two-dimensional Discrete Cosine Transform (2-D DCT) coefficients. The gesture recognition system uses both the temporal and the spatial characteristics of the gesture for recognition. The system was also robust to background clutter, did not require a special glove to be worn, and runs in real time.
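Observation vectors built from 2-D DCT coefficients, as in the P2-DHMM system above, are typically taken block-by-block and truncated to the low-frequency corner. The sketch below is a generic illustration; the block size and the number of retained coefficients are assumptions, not the settings of [42].

```python
import numpy as np
from scipy.fft import dctn

def block_dct_observations(gray, block=8, keep=4):
    """Non-overlapping blocks -> top-left keep x keep 2-D DCT coefficients each."""
    h, w = gray.shape
    obs = []
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            coeffs = dctn(gray[r:r + block, c:c + block].astype(float), norm="ortho")
            obs.append(coeffs[:keep, :keep].ravel())   # low-frequency corner only
    return np.array(obs)                               # one observation vector per block

print(block_dct_observations(np.zeros((64, 64))).shape)   # (64, 16)
```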
In the experiments, which were performed in a user-independent manner [23], the database contains 933 examples of signs, corresponding to 921 unique sign classes. The persons performing signs in the query videos did not appear in the database videos. All test images were obtained from video sequences of a native ASL signer either performing individual hand shapes in isolation or signing in ASL. The test images were obtained from the original frames by extracting the sub-window corresponding to the hand region and performing the same normalization that had been applied to the database images, so that each image was resized to 256×256 pixels and the minimum enclosing circle of the hand region is centered at pixel (128, 128) with radius 120.
In the experiments, 20 different hand shapes were included; these 20 hand shapes are all commonly used in ASL. For each hand shape, the authors synthetically generated a total of 4,032 database images that correspond to different 3D orientations of the hand. The 3D orientation depends on the viewpoint and on the image-plane rotation. A sample of 84 different viewpoints from the viewing sphere was collected, so that the viewpoints were approximately 22.5 degrees apart. Also, a sample of 48 image-plane rotations was used, so that rotations were spaced 7.5 degrees apart. A total of 80,640 images were collected for the experiment. Each image was normalized to a size of 256×256 pixels, and the hand region in the image was normalized so that the minimum enclosing circle of the hand region is centered at pixel (128, 128) and has radius 120. All database images were generated with computer graphics using the Poser 5 software.
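The image counts quoted above are internally consistent: 84 viewpoints × 48 image-plane rotations gives 4,032 renderings per hand shape, and 20 hand shapes give 80,640 images in total.

```python
viewpoints, rotations, hand_shapes = 84, 48, 20
per_shape = viewpoints * rotations            # 4032 images per hand shape
print(per_shape, per_shape * hand_shapes)     # 4032 80640
```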
For the purposes of a typical gesture recognition system, the proposed [14] system has the following prominent characteristics:

(i) Hand gestures were recognized without resorting to any special marks, limited or uniform backgrounds, or particular illumination.
(ii) Only one un-calibrated video camera was utilized.
(iii) The user was allowed to perform sign language letters within the view field of the camera.
(iv) The proposed system observes the user and gives feedback in real time.

4.3. Specially designed input devices
Specially designed data acquisition devices are also used by some researchers in order to acquire input signs. These input devices are:

- CyberGlove®
- Sensor Glove
- Polhemus FASTRAK

4.3.1. CyberGlove®
The CyberGlove® [30, 43] consists of two bend sensors per finger, four abduction sensors, and some additional sensors to measure thumb crossover, palm arch, wrist flexion and wrist abduction. The 18 sensors are based on a linear, resistive bend-sensing technology, which was used in the experiments to transform hand and finger configurations into real-time joint-angle data that were converted and digitized to 8 bits. The data from the 18 sensors were captured at a rate of 112 samples per second and provide 18-D feature vectors describing the handshape. In the experiments, the authors attached one receiver to the chest area of the signer to serve as a reference, and attached another to the wrist of the dominant signing hand, to obtain hand tracking data at 60 Hz. The hand and glove data were simultaneously acquired at a synchronized rate of 60 Hz.

4.3.2. Sensor glove
For the experiments in [44], the authors used MEMS ADXL202 accelerometers (www.analog.com). The ADXL202 is a low-cost, low-power, complete two-axis accelerometer on a single IC chip with a measurement range of ±2 g. The ADXL202 was used to measure both dynamic acceleration (e.g., vibration) and static acceleration (e.g., gravity).
Surface micromachining technology is used to fabricate the accelerometer. It is composed of a small mass suspended by springs. Capacitive sensors distributed along two orthogonal axes provide a measurement proportional to the displacement of the mass with respect to its rest position. The sensor is able to measure absolute angular position because the mass is displaced from the center either by acceleration or by an inclination with respect to the gravitational vector. The outputs produced by the sensor glove are digital signals whose duty cycles (ratio of pulse width to period) are proportional to the acceleration along each of the two sensitive axes. The output period can be adjusted from 0.5 to 10 ms via a single resistor, RSET. If a voltage output is required, a voltage proportional to the acceleration can be obtained from the XFILT and YFILT pins, or can be reconstructed by filtering the duty-cycle outputs.
The bandwidth of the ADXL202 can be set from 0.01 Hz to 5 kHz via capacitors CX and CY, if required. The typical noise floor is 500 µg/√Hz, which allows signals below 5 mg to be resolved for bandwidths below 60 Hz. The sensing device used for the experiments consists of six ADXL202 accelerometers attached to a glove: five on the fingers and one on the back of the palm. The Y axis of the sensor on each finger points toward the fingertip, which provides a measure of joint flexion. The Y axis of the sensor located on the back of the palm measures the flexing angle of the palm. The X axis of the sensor on the back of the palm can be used to extract information about hand roll, and the X axis of the sensor on each finger can provide information about individual finger abduction.
Data can be collected by measuring the duty cycle of a train of pulses at 1 kHz. The duty cycle is 50% when a sensor is in its horizontal position. When it is tilted from +90° to -90°, the duty cycle varies from 37.5% (0.375 ms) to 62.5% (0.625 ms), respectively. The duty cycle is measured using a BASIC Stamp microcontroller in the device. The Parallax BASIC Stamp module is a small, low-cost, general-purpose I/O computer that is programmed in a simple form of BASIC (refer to www.parallax.com for details). The pulse-width-modulated output of the ADXL202 can be read directly by the BASIC Stamp module, so no ADC is required. Twelve pulse widths are read sequentially by the microcontroller, beginning with the X axes followed by the Y axes, thumb first. The data are then sent through the serial port to a PC for analysis.
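Given the figures above (50% duty cycle when horizontal, 37.5% to 62.5% across ±90° of tilt, i.e. ±1 g along the sensing axis), a measured duty cycle can be converted back to an acceleration and a tilt angle. The sketch below assumes the nominal ADXL202 scale of 12.5% duty-cycle change per g; the sign convention and the exact calibration of the glove in [44] are not specified in the source.

```python
import numpy as np

def duty_cycle_to_tilt(duty, zero_g=0.50, per_g=0.125):
    """Convert a duty cycle (0..1) to acceleration in g and tilt angle in degrees."""
    accel_g = (duty - zero_g) / per_g                        # 0.375 and 0.625 -> +/-1 g
    tilt_deg = np.degrees(np.arcsin(np.clip(accel_g, -1.0, 1.0)))
    return accel_g, tilt_deg

for d in (0.375, 0.50, 0.625):
    print(d, duty_cycle_to_tilt(d))                          # +/-90 degrees at the extremes
```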
4.3.3. Polhemus FASTRAK
The Polhemus FASTRAK [43, 45] provides real-time 6-degree-of-freedom (6DOF) tracking with virtually no latency. It may be used for head, hand and instrument tracking in biomedical analysis, graphics and cursor control, digitizing and pointing, stereotaxic localization, telerobotics and other applications. It is an electromagnetic motion tracking system that captures data with high accuracy and maximum reliability.
It tracks the position (X, Y and Z coordinates) and orientation (azimuth, elevation and roll) of a small sensor as it moves through space. The tracking system's near-zero latency makes it well suited to virtual reality interfacing, simulators and other real-time response applications. It also converts the acquired data so that they can be used in popular computer graphics programs. Data are captured simply by pressing buttons. It provides exceptional stability under power grid fluctuations. The system setup is very easy, as it can be connected to a PC's USB/RS-232 port.
5. DATA TRANSFORMATION
There are several reference points [7] that can be used for image analysis. In sign language recognition, where the motion of the hand and its location in consecutive frames are key features for the classification of different signs, a fixed reference point must be chosen. The hand's contour was chosen to obtain information on the shape of the hand, and the hand's center of gravity (COG) was used as the reference point, which alleviated the bias introduced by other reference points. After defining the reference point, the distances between all the points of the contour and the COG of the hand were estimated. The location of the tip of the hand was easily extracted by finding the local maximum of the distance vector. To reduce the noise introduced by the quantization of the image and the contour extraction methods, a moving average filter was used in the experiments to smooth the distance vector.
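The reference-point features just described amount to computing the distance of every contour point from the hand's centre of gravity, smoothing that distance vector with a moving average and reading fingertip candidates off its local maxima; a minimal sketch, assuming the contour is an ordered array of (x, y) points, is given below.

```python
import numpy as np

def fingertip_candidates(contour, win=5):
    """contour: (N, 2) ordered boundary points of the segmented hand region."""
    cog = contour.mean(axis=0)                               # centre of gravity
    dist = np.linalg.norm(contour - cog, axis=1)             # distance vector
    smooth = np.convolve(dist, np.ones(win) / win, mode="same")  # moving average
    peaks = [i for i in range(1, len(smooth) - 1)            # local maxima = fingertips
             if smooth[i] > smooth[i - 1] and smooth[i] > smooth[i + 1]]
    return cog, smooth, peaks
```

Using the COG as the reference keeps the distance vector unchanged when the hand translates within the frame.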
The RGB (Red, Green and Blue) color space [8, 37] was converted to a gray-scale image and then to a binary image. Binary images are images whose pixels have only two possible intensity values; they are normally displayed as black and white, and numerically the two values are often 0 for black and either 1 or 255 for white. Binary images can be produced by thresholding a gray-scale or color image (with a threshold of 0.25 in the case of [37]) in order to separate an object in the image from the background. The color of the object (usually white) is referred to as the foreground color, and the rest (usually black) is referred to as the background color. However, depending on the image to be thresholded, this polarity may be inverted, in which case the object is displayed with zero and the background with a non-zero value. This conversion results in sharp and clear details in the image.
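The gray-scale to binary conversion described above (with a threshold of 0.25 on a 0-1 intensity scale in [37], and optional polarity inversion) reduces to one comparison per pixel, for example:

```python
import numpy as np

def to_binary(gray, thresh=0.25, invert=False):
    """gray: float image scaled to 0..1; returns a 0/1 foreground mask."""
    mask = (gray > thresh).astype(np.uint8)
    return 1 - mask if invert else mask
```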
background, camera parameters, and viewpoint or camera
5. DATA TRANSFORMATION location were used to address the scene complexity in the
There are several reference points [7] which can research [38]. These scene conditions affect images of the
be used for image analysis. In sign language recognition same object dramatically. The first step of preprocessing
where the motion of the hand and its location in block was filtering. A moving average or median filter
consecutive frames is a key feature in the classification of was used to remove the unwanted noise from the image
different signs, a fixed reference point must be chosen. scenes. Background subtraction forms the next major step
The running Gaussian average method [46] is used to obtain the background subtraction, as it is very fast and consumes little memory compared with other similar methods.
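A running Gaussian average keeps a per-pixel mean and variance, updates them with a small learning rate, and flags pixels that deviate strongly from the mean as foreground. The sketch below is a generic illustration; the learning rate, initial variance and deviation threshold are assumptions, not the values used in [46].

```python
import numpy as np

class RunningGaussianBackground:
    """Per-pixel running Gaussian average for background subtraction."""

    def __init__(self, first_frame, alpha=0.02, k=2.5):
        self.mean = first_frame.astype(float)
        self.var = np.full(first_frame.shape, 25.0)   # initial variance (assumed)
        self.alpha, self.k = alpha, k

    def apply(self, frame):
        diff = frame.astype(float) - self.mean
        foreground = np.abs(diff) > self.k * np.sqrt(self.var)
        bg = ~foreground                               # update the model only on background
        self.mean[bg] += self.alpha * diff[bg]
        self.var[bg] = (1 - self.alpha) * self.var[bg] + self.alpha * diff[bg] ** 2
        return foreground
```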
The hand gesture image sequence was analyzed for key frame selection after global motion analysis. As the hand shapes between two consecutive view models were very similar to each other, the authors [46] selected some key frames for the stored model generation and the input model generation. The closed boundary of the segmented hand shape was described by a Fourier Descriptor (FD) vector with the first 25 coefficients. Thanks to the rotation, translation and dilation invariance of these descriptors, the database space of the stored models was reduced.
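Fourier descriptors of a closed contour are obtained by treating the ordered boundary points as complex numbers, taking their FFT and keeping the leading coefficients; dropping the DC term and normalizing by the first harmonic provides the translation, scale and (via magnitudes) rotation invariance mentioned above. A sketch keeping the first 25 coefficients:

```python
import numpy as np

def fourier_descriptor(contour, n_coeffs=25):
    """contour: (N, 2) ordered boundary points of the segmented hand shape."""
    z = contour[:, 0] + 1j * contour[:, 1]     # boundary as a complex signal
    F = np.fft.fft(z)
    F[0] = 0.0                                 # drop DC term -> translation invariance
    F = F / np.abs(F[1])                       # divide by first harmonic -> scale invariance
    return np.abs(F[1:n_coeffs + 1])           # magnitudes -> rotation/start-point invariance
```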
The video sequences of a given gesture were segmented in the RGB color space prior to feature extraction [12]. This step took advantage of the colored gloves worn by the signers. Samples of pixel vectors representative of the glove's color were used to estimate the mean and covariance matrix of the color to be segmented, so the segmentation process was automated with no user intervention. The Mahalanobis distance was used as the measure of pixel similarity: a pixel vector that falls within the locus of points describing the 3D ellipsoid was classified as a glove pixel. The threshold used to define the locus of points was set to the maximum standard deviation of the three color components. Once the images were segmented, a 5×5 median filter was used to counteract any imperfections resulting from the segmentation process.
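The glove-colour segmentation described above reduces to a Mahalanobis-distance test on every pixel followed by 5×5 median filtering. The sketch below is a schematic rendering of that procedure, with the threshold taken, as stated, as the maximum standard deviation of the three colour components; the variable names are illustrative.

```python
import numpy as np
from scipy.ndimage import median_filter

def glove_mask(image, glove_samples):
    """image: (H, W, 3) RGB frame; glove_samples: (M, 3) sample pixels of the glove colour."""
    mean = glove_samples.mean(axis=0)
    cov = np.cov(glove_samples, rowvar=False)
    thresh = np.sqrt(np.diag(cov)).max()                 # max std of the colour components
    d = image.reshape(-1, 3).astype(float) - mean
    maha = np.sqrt(np.einsum("ij,jk,ik->i", d, np.linalg.inv(cov), d))
    mask = (maha <= thresh).reshape(image.shape[:2]).astype(np.uint8)
    return median_filter(mask, size=5)                   # 5x5 median clean-up
```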
In the proposed work [35], color images were first resized to 250×250 pixels and then the RGB (Red, Green and Blue) images were converted to gray-scale images. Users were not required to use any gloves or visual markings; instead, the system uses only images of the bare hand taken by a digital camera.
In the color object tracking method of [47], the video frames were converted into the HSV (Hue-Saturation-Value) color space. The pixels with the tracked color were then identified and marked, and the resultant images were converted to binary (gray-scale) images. In image preprocessing, all the images were cropped and their eye-points were manually aligned.
All the image vectors were then normalized to unity [17].
The system of [42] identifies image regions corresponding to human skin by binarizing the input image with a proper threshold value. Small regions are then removed from the binarized image by applying a morphological operator, and the remaining regions are selected to obtain an image as a hand candidate.
In the first step of the image processing phase in [25, 48], hand region extraction was performed. The experiments were done in front of a simple background and under constant lighting conditions. Three well-known models, namely normalized RGB, a Gaussian distribution model of skin color and morphological image processing, were used for this purpose.

6. FEATURE EXTRACTION
Refer to Table-4 for details.

7. CLASSIFICATION
The various classification techniques used by researchers to recognize sign language gestures are summarized in Table-5.

8. RESULTS
The results obtained by the various research papers are summarized in Table-6. Table-6(b) shows the results obtained on the standard datasets available for research work, which we described in section 3.1. Similarly, the results on the researchers' own datasets are summarized in Table-6(c). The results include parameters such as the input sign language, dataset size, training set, testing set, standard dataset or researcher's own dataset, classification methods and finally the recognition rate [49, 50].
The tables indicate that neural networks and HMM variations [51] are widely used by researchers owing to their popularity in terms of recognition percentage.

9. CONCLUSIONS
After thorough analysis, the following conclusions are drawn for future research in sign language recognition:

- Current systems are mainly focused on static signs/manual signs/alphabets/numerals.
- Standard datasets are not available for all countries/sub-continents/languages.
- A large-vocabulary database is the demand of the current scenario.
- Focus should be on continuous or dynamic signs and nonverbal types of communication.
- Sign language recognition systems should support data acquisition in any situation (not restricted to laboratory data).
- Systems should be able to distinguish the face, hands (right/left) and other parts of the body simultaneously.
- Systems should perform the recognition task in a convenient and fast manner.
[14] Xiaolong T., Bian W., Weiwei Y. and Chongqing Liu. 2005. A hand gesture recognition system based on local linear embedding. Journal of Visual Languages and Computing. 16(5): 442-454.

[15] Kuang-Chih L., Ho J. and Kriegman D. J. 2005. Acquiring linear subspaces for face recognition under variable lighting. Pattern Analysis and Machine Intelligence, IEEE Transactions on. pp. 684-698.

[16] Georghiades A. S., Belhumeur P. N. and Kriegman D. J. 2001. From few to many: illumination cone models for face recognition under variable lighting and pose. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 23(6): 643-660.

[17] Rana S., Liu W., Lazarescu M. and Venkatesh S. 2009. A unified tensor framework for face recognition. Pattern Recognition. 42(11): 2850-2862.

[18] Thangali A., Nash J. P., Sclaroff S. and Neidle C. 2011. Exploiting phonological constraints for handshape inference in ASL video. Computer Vision and Pattern Recognition (CVPR), IEEE Conference on. pp. 521-528.

[19] Tsechpenakis G., Metaxas D. and Neidle C. 2006. Learning-based dynamic coupling of discrete and continuous trackers. Computer Vision and Image Understanding. 104(2-3): 140-156.

[20] Aran O., Ari I., Benoit A., Campr P., Carrillo A. H., Fanard F. X., Akarun L., Caplier A., Rombaut M. and Sankur B. 2006. Sign language tutoring tool. In: eNTERFACE, The Summer Workshop on Multimodal Interfaces, Dubrovnik, Croatia. pp. 23-33.

[21] Benoit A. and Caplier A. 2005. Head nods analysis: Interpretation of non verbal communication gestures. In: International Conference on Image Processing, ICIP, Genova, Italy. 3: 425-428.

[22] Antunes D. R., Guimaraes C., Garcia L. S., Oliveira L. and Fernandes S. 2011. A framework to support development of Sign Language human-computer interaction: Building tools for effective information access and inclusion of the deaf. Research Challenges in Information Science (RCIS), 5th International Conference on. pp. 1-12.

[23] Athitsos V., Wang H. and Stefan A. 2010. A database-based framework for gesture recognition. Personal and Ubiquitous Computing. pp. 511-526.

[24] Athitsos V., Neidle C., Sclaroff S., Nash J., Stefan A., Quan Yuan and Thangali A. 2008. The American Sign Language Lexicon Video Dataset. Computer Vision and Pattern Recognition Workshops (CVPRW '08), IEEE Computer Society Conference on. pp. 1-8.

[25] Flasiński M. and Myśliński S. 2010. On the use of graph parsing for recognition of isolated hand postures of Polish Sign Language. Pattern Recognition. 43(6): 2249-2264.

[26] Triesch J. and von der Malsburg C. 2002. Classification of hand postures against complex backgrounds using elastic graph matching. Image and Vision Computing. 20(13-14): 937-943.

[27] Triesch J. and von der Malsburg C. 2001. A system for person-independent hand posture recognition against complex backgrounds. Pattern Analysis and Machine Intelligence, IEEE Transactions on. pp. 1449-1453.

[28] Philippe D., David R., Thomas D., Zahedi M. and Ney H. 2007. Speech recognition techniques for a sign language recognition system. In: INTERSPEECH-2007. pp. 2513-2516.

[29] ftp://wasserstoff.informatik.rwth-aachen.de/pub/rwth-boston-104/readme.info (accessed on 10 March 2012).

[30] Kong W. W. and Ranganath S. 2008. Signing Exact English (SEE): Modeling and recognition. Pattern Recognition. 41(5): 1638-1652.

[31] Rezaei A., Vafadoost M., Rezaei S. and Daliri A. 2008. 3D Pose Estimation via Elliptical Fourier Descriptors for Deformable Hand Representations. Bioinformatics and Biomedical Engineering (ICBBE), The 2nd International Conference on. pp. 1871-1875.

[32] Quan Y., Jinye P. and Yulong L. 2009. Chinese Sign Language Recognition Based on Gray-Level Co-Occurrence Matrix and Other Multi-features Fusion. Industrial Electronics and Applications, 4th IEEE Conference on. pp. 1569-1572.

[33] Wang S., Zhang D., Jia C., Zhang N., Zhou C. and Zhang L. 2010. A Sign Language Recognition Based on Tensor. Multimedia and Information Technology (MMIT), Second International Conference on. 2: 192-195, 24-25 April.

[34] Maraqa M. and Abu-Zaiter R. 2008. Recognition of Arabic Sign Language (ArSL) using recurrent neural networks. Applications of Digital Information and Web Technologies (ICADIWT 2008), First International Conference on the. pp. 478-481.

[35] Kiani Sarkaleh A., Poorahangaryan F., Zanj B. and Karami A. 2009. A Neural Network based system for Persian sign language recognition. Signal and Image Processing Applications (ICSIPA), IEEE International Conference on. pp. 145-149.
[36] Zhou Y., Yang X., Lin W., Xu Y. and Xu L. 2011. Hypothesis comparison guided cross validation for unsupervised signer adaptation. Multimedia and Expo (ICME), IEEE International Conference on. pp. 1-4.

[37] Karami A., Zanj B. and Sarkaleh A. K. 2011. Persian sign language (PSL) recognition using wavelet transform and neural networks. Expert Systems with Applications. 38: 2661-2667.

[38] Mekala P., Gao Y., Fan J. and Davari A. 2011. Real-time sign language recognition based on neural network architecture. System Theory (SSST), IEEE 43rd Southeastern Symposium on. pp. 195-199.

[39] Bourennane S. and Fossati C. 2012. Comparison of shape descriptors for hand posture recognition in video. Signal, Image and Video Processing. 6(1): 147-157.

[40] Vanjikumaran S. and Balachandran G. 2011. An automated vision based recognition system for Sri Lankan Tamil sign language finger spelling. Advances in ICT for Emerging Regions (ICTer), International Conference on. pp. 39-44.

[41] Yang Quan and Peng Jinye. 2008. Application of improved sign language recognition and synthesis technology in IB. Industrial Electronics and Applications (ICIEA 2008), 3rd IEEE Conference on. pp. 1629-1634.

[42] Nguyen D. B., Enokida S. and Toshiaki E. 2005. Real-Time Hand Tracking and Gesture Recognition System. IGVIP05 Conference, CICC. pp. 362-368.

[43] Maebatake M., Suzuki I., Nishida M., Horiuchi Y. and Kuroiwa S. 2008. Sign Language Recognition Based on Position and Movement Using Multi-Stream HMM. 2nd International Symposium on Universal Communication. pp. 478-481.

[44] Bui T. D. and Nguyen L. T. 2007. Recognizing Postures in Vietnamese Sign Language with MEMS Accelerometers. Sensors Journal, IEEE. 7(5): 707-712.

[47] … Information Systems (ICIIS), 6th IEEE International Conference on. pp. 169-174.

[48] Mahmoudi F. and Parviz M. 2006. Visual Hand Tracking Algorithms. Geometric Modeling and Imaging - New Trends. pp. 228-232.

[49] Ong S. and Ranganath S. 2005. Automatic sign language analysis: a survey and the future beyond lexical meaning. Pattern Analysis and Machine Intelligence, IEEE Transactions on. 27(6): 873-891.

[50] Mitra S. and Acharya T. 2007. Gesture Recognition: A Survey. Systems, Man, and Cybernetics, Part C: Applications and Reviews, IEEE Transactions on. 37(3): 311-324.

[51] Moni M. A. and Ali A. B. M. S. 2009. HMM based hand gesture recognition: A review on techniques and approaches. Computer Science and Information Technology (ICCSIT 2009), 2nd IEEE International Conference on. pp. 433-437.

[52] Montiel E., Aguado A. S. and Nixon M. S. 2000. On Resolving Ambiguities in Arbitrary-Shape extraction by the Hough Transform. Proc. of the British Machine Vision Conference (BMVC'00).

[53] Grobel K. and Assan M. 1997. Isolated sign language recognition using hidden Markov models. Systems, Man, and Cybernetics: Computational Cybernetics and Simulation, IEEE International Conference on. 1: 162-167.

[54] Nadgeri S. M., Sawarkar S. D. and Gawande A. D. 2010. Hand Gesture Recognition Using CAMSHIFT Algorithm. Emerging Trends in Engineering and Technology (ICETET), 3rd International Conference on. pp. 37-41.

[55] Aran O. and Akarun L. 2010. A multi-class classification strategy for Fisher scores: Application to signer independent sign language recognition. Pattern Recognition. 43(5): 1776-1788.