Hand Gesture Vocalizer for Dumb and Deaf People
SCITECH Nepal, Vol. 14, No. 1, September 2019
Sanjeev Karki is a student of BE Computer Engineering, Batch 2014, at Nepal Engineering College and recently received his bachelor's degree from Pokhara University. He is currently working as a web developer in a prominent software company.

Ashish Kumar Jha received his Master's degree in Computer Science with a specialization in Networking from Sharda University in 2017. He has been involved in software development and the teaching profession since 2013 and is currently working as an assistant professor at Nepal Engineering College. His research interests include the Internet of Things, Image Processing, and Pattern Recognition.

I. Introduction

Humans possess the capability of voice for interaction and communication with each other. Unfortunately, not everybody has the capability of speaking and hearing. Sign language is used as the means of communication among the community of people who cannot speak or hear. Sign language is a gestural representation that simultaneously combines hand shapes, the orientation and movement of the hands, arms, or body, and facial expressions to express a speaker's thoughts fluently. People who cannot speak make use of sign language to communicate with other vocally impaired persons and with hearing people who know the meanings of the signs; otherwise, an interpreter is needed to translate the sign language for people who can speak but do not know its meanings. However, it is not always possible for an individual to be around all the time to interpret the sign language,
and not everybody can learn sign language. Thus, another alternative is to use a computer or a smartphone as a mediator: the computer or smartphone takes an input from the vocally impaired person and gives its output in textual as well as audio form.

II. Related Work

A Lexicostatistic Survey of the Signed Languages in Nepal is a thesis by Hope M. Hurlbut based on a field study of the variety of sign languages used in Nepal and their similarity or dissimilarity to American Sign Language (ASL) and British Sign Language (BSL), carried out through the collection of word lists and interviews. The results of the word-list comparisons showed that there are at least three signed languages in the country [1]. In more recent years, NpSL has been introduced widely, and Hearing Impaired Associations have undertaken to open classes for hearing impaired children in many places where the government does not yet have a school for the hearing impaired. In the course of this work, research conducted in the Nepali cities of Kathmandu, Surkhet, Jumla, Pokhara, Ghandruk, Dharan, and Rajbiraj discovered that there is one national sign language, named NpSL, used by all the hearing impaired. The survey further concluded that there were three village sign languages, with influences from Indian Sign Language (ISL), ASL, and BSL, and that the most widely used technique was sign writing, which somewhat imitates the objects referred to in spoken language.

In this area, a lot of work has been done implementing different sign languages and technologies; a few works are discussed below.

A. Kinect sign language translator [2]

This project was a result of a collaboration, facilitated by Microsoft Research, between the Chinese Academy of Sciences, Beijing Union University, and Microsoft Research Asia, in which all the organizations made crucial contributions. The system understands the gestures of sign language and converts them to spoken and written language and vice versa. It captures a conversation from both sides, displays the signer, and renders a written and spoken translation of the sign language in real time. It also takes the spoken words of a non-signer and turns them into accurate, understandable sign language.

B. Sign Language Recognition System [3] [4]

Yang Quan, a Chinese student, defined a Basic Sign Language Recognition system that is able to translate a sequence of signs into the commonly used speech language and vice versa. The sign language/speech bidirectional translation (from signs to speech and from speech to signs) focused on the Chinese Manual Alphabet, where every single sign belongs to a single letter of the alphabet. The system was composed of a camera, a video display terminal (VDT), a speaker, a microphone, and a keyboard. Two kinds of data were used: a vector of hand gestures and a vector of lip actions. In order to characterize these vectors, they used the Normalized Moment of Inertia (NMI) algorithm and Hu moments. The former attempts to solve translation invariance, size invariance, anti-gradation distortion, and so on, while the latter is a set of algebraic invariants that combines regular moments and is useful for describing the information of an image. Hu moments do not vary under changes of size, translation, and rotation; they have been widely used in pattern recognition and successfully proven in sign-language letter recognition. As said before, the system combines hand gesture recognition with a lip-movement reader in order to make recognition more accurate. Using a multi-feature SVM classifier trained with a linear kernel, the 30 letters of the Chinese manual alphabet were recognized with an average accuracy of 95.55%.

C. Microcontroller and Sensors Based Gesture Vocalizer [5]

Sensors Based Gesture Vocalizer describes the design and working of a system that is useful for dumb, deaf, and blind people to communicate with one another and with normal people as well. Gesture Vocalizer is a large-scale, multi-microcontroller-based system designed to facilitate communication between the dumb, deaf, and blind communities and normal people. In the project, a data glove is used that can detect almost all the movements of a hand, and a microcontroller-based system converts certain specified movements into a human-recognizable voice. The data glove is equipped with two types of sensors: bend sensors and accelerometers used as tilt sensors. This system is beneficial for dumb people, as their hands will
speak once they have worn the gesture vocalizer data glove. This project is similar to our proposed project, but it does not make use of any smartphone; instead, it uses external speakers and an LCD to display the outputs.

Figure 1: Block diagram of Microcontroller and Sensors Based Gesture Vocalizer

D. Hand Gesture Recognition Using KINECT [6]

This is an automatic gesture recognition system which uses the Microsoft Kinect for capturing the raw image of the gesture. Originally developed for gaming, the Kinect's sensors read a user's body position and movements and, with the help of a computer, the gesture recognition system translates them into words. This system thus has tremendous potential for understanding complex gestures and for translating signs into spoken or written words. It checks for the textual character in the training data and displays it as output if present. If no match is found, the system allows the user to store the new gesture and its corresponding textual form; in this way the user can auto-create a dictionary. The main motive of this project is to make interaction possible between an unimpaired and a hearing-impaired individual.

III. Hardware Details: Flex Sensors

The flex sensor, also referred to as the bend sensor, is a type of sensor that measures the amount of bending, deflection, or flexing, as the name itself suggests. As the sensor is bent, it provides an electrical resistance value: the more the sensor is bent, the higher the resistance. When measured with a multimeter, the flex sensor at a flat and steady position gives a resistance value near 25 kΩ, whereas when fully bent it gives a resistance value near 72 kΩ. Flex sensors usually come in the form of a thin strip and are very comfortable to use, as they are very light in weight and easily bent [8].

One side of the sensor is printed with a polymer ink that has conductive particles embedded in it. Note that the flex sensor gives a proper change in resistance value only when it is bent away from the ink side, and provides very little change in resistance value when it is bent in the reverse direction.

The flex sensor comes in various sizes and is made by various companies; in our research, we have used 10 flex sensors of 2.2 inches each, placed on the five fingers. Each flex sensor consists of two strips: a thin strip connected to the high voltage (which can be 5 V or 3.3 V) and a fat strip consisting of the polymer ink, connected to ground through a resistor, which also provides the resistance output.

By connecting the flex sensor with a static resistor to create a voltage divider, we obtain a variable voltage that can be read by a microcontroller's analog-to-digital converter [9].

The accelerometer can be used to measure the static acceleration of gravity in tilt-sensing applications, and it can also measure dynamic acceleration resulting from motion, shock, or vibration. By measuring the amount of static acceleration due to gravity, we can find out the angle at which the device is tilted with respect to the earth, and by sensing the amount of dynamic acceleration, we can analyze the way the device is moving. There are many different ways to make an accelerometer.
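To make the divider and tilt arithmetic concrete, here is a minimal illustrative sketch in Python (not code from the paper). The 5 V supply, the 10 kΩ static resistor, and the 10-bit ADC range are assumed values for the example; the conversion simply inverts the voltage-divider relation Vout = VCC * R_fixed / (R_fixed + R_flex).

```python
# Illustrative helpers (not from the paper): recover the flex-sensor
# resistance from an ADC reading of the voltage divider, and derive
# tilt angles from static accelerometer readings.
import math

VCC = 5.0           # assumed supply voltage (the paper allows 5 V or 3.3 V)
R_FIXED = 10_000.0  # assumed static resistor in the divider, in ohms
ADC_MAX = 1023      # 10-bit ADC, as on the Arduino Mega 2560

def flex_resistance(adc_value: int) -> float:
    """Invert the divider: Vout = VCC * R_FIXED / (R_FIXED + R_flex)."""
    adc_value = max(1, min(adc_value, ADC_MAX - 1))  # avoid the rails
    v_out = adc_value * VCC / ADC_MAX
    return R_FIXED * (VCC / v_out - 1.0)

def tilt_angles(ax: float, ay: float, az: float) -> tuple[float, float]:
    """Pitch and roll in degrees, using gravity as the static reference."""
    pitch = math.degrees(math.atan2(ax, math.hypot(ay, az)))
    roll = math.degrees(math.atan2(ay, math.hypot(ax, az)))
    return pitch, roll

# Example: a mid-scale reading (~2.5 V) corresponds to R_flex ~ R_FIXED.
print(flex_resistance(512))         # ~10 kOhm, a lightly bent sensor
print(tilt_angles(0.0, 0.5, 0.87))  # glove rolled roughly 30 degrees
```

On the actual glove these conversions would run on the Arduino itself; the sketch only illustrates the relationships the divider and the accelerometer rely on.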
Random forest prediction pseudo-code: take the test features, use the rules of each randomly created decision tree to predict the outcome, and store the predicted outcome (target); the target voted for by the most trees becomes the final prediction. The random forest algorithm can also be used for feature engineering, which means identifying the most important features out of the available features in the training dataset. On the other hand, generating predictions from many trees is time-consuming, and the model is difficult to interpret compared to a decision tree, where you can easily make a decision by following the path in the tree.

Figure 6: Feature Extraction Diagram

When a user makes a hand gesture, eight user inputs from the glove controller are given to the system: the three-axis accelerometer signal and five flex sensor signals. The three axis values and five flex sensor values taken from the user act as the features for recognizing a particular gesture. These features are passed through the random forest classifier, the gesture is classified according to the features, and the output is recognized.
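As a concrete illustration of this classification step (a sketch, not the authors' code), the following uses scikit-learn's RandomForestClassifier on 8-value feature vectors, three accelerometer axes plus five flex readings. The toy data, label names, and hyperparameters are invented for the example.

```python
# Minimal sketch of gesture classification with a random forest:
# each row is [ax, ay, az, flex1..flex5], with one gesture label per row.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 8))                          # toy glove readings
y = rng.choice(["HELLO", "OKAY", "STOP"], size=300)    # toy gesture labels

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)

# Each tree predicts independently; the forest reports the majority vote.
print(clf.predict(X_test[:1]))

# Supports the feature-engineering use noted above: ranks how much the
# forest relied on each of the eight glove signals.
print(clf.feature_importances_)
```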
Once the machine has been trained with those datasets, for the real-time application of the trained machine, the flex sensor and accelerometer data for a particular gesture are collected from the glove movement and processed by the Arduino, which converts the raw data into meaningful data. These values help in the feature extraction process. The data are sent from the Arduino to the serial port, collected by Python through the serial interface, and saved as a temporary dataset in CSV format. Finally, the temporary dataset is passed through the machine, which extracts the features of the dataset and predicts the appropriate output for that gesture with reference to the Random Forest Classifier model. Once the output has been recognized by the model, the laptop screen and speaker are used to present the output to the user.

V. Algorithms

A. Algorithm for dataset preparation at Arduino Mega 2560

Step 1: Start
Step 2: Input: For a particular gesture, 8 sensor data inputs (5 flex sensor values and 3 accelerometer values).
Step 3: Convert the raw sensor data to meaningful data.
Step 4: Set the baud rate.
Step 5: Send the data to the serial port.
Step 6: Is the Arduino buffer flushed?
If yes, go to step 7.
Else, wait for the buffer to flush, then continue.
Step 7: Add a delay of 10 milliseconds.
Step 8: Go to step 2.

B. Algorithm for real-time application

Step 1: Start
Step 2: Train the model using the training dataset.
Step 3: Set the baud rate equal to the Arduino serial baud rate and open the serial connection.
Step 4: Is there data at the serial port?
If no, wait until data is available, then continue.
Else, go to step 5.
Step 5: Input: Collect data from the serial port in real time.
Step 6: Open a new file in write mode.
Step 7: Write the collected data to the new file in CSV format.
Step 8: Flush the Arduino buffer.
Step 9: Close the serial port connection.
Step 10: Pass the temporary dataset through the model.
Step 11: The trained machine predicts the output.
Step 12: Display the prediction on the screen and play the audio of the predicted word.
Step 13: Add a 1-second delay.
Step 14: End

A Python sketch of this real-time loop is given below, after the Results section.

VI. Results

Finally, we have obtained a system that can read the values for a particular gesture made by the user, predict the output for the gesture, display it on the laptop screen, and provide an audio output via a desktop GUI. Datasets for all the alphabet letters and for frequently used words were created. Using those datasets, the machine was trained and the model created. The datasets for all the letters were then combined and shuffled in order to train the machine with reduced variance and to make sure that the model remains general and overfits less. When the input for the sign of a letter or word is taken, it is passed through the trained machine, the closest predicted value is displayed, and its audio is played as output. The correlation plot of each letter was then compared with those of the other letters to check the level of closeness of the letters to each other and to estimate the accuracy of the model. The accuracy of our model was found to be 96.8%.

Figure 7: Glove Controller
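The following is a hedged Python sketch of the host side of Algorithm B. The serial port name, the 9600 baud rate, the pickled model file, and the comma-separated line format are all assumptions made for illustration (the paper does not specify them); it requires the pyserial package.

```python
# Illustrative host-side loop for Algorithm B (assumed: port name, baud
# rate, comma-separated sensor lines, and a pickled scikit-learn model
# saved as 'model.pkl'; none of these specifics come from the paper).
import csv
import pickle
import time

import serial  # pyserial

with open("model.pkl", "rb") as f:          # Step 2: load the trained model
    model = pickle.load(f)

port = serial.Serial("/dev/ttyACM0", 9600, timeout=1)  # Step 3

while True:
    raw = port.readline().decode(errors="ignore").strip()  # Steps 4-5
    if not raw:
        continue  # no data at the serial port yet; keep waiting
    values = [float(v) for v in raw.split(",")]  # 3 axes + 5 flex readings

    # Steps 6-7: save the sample as a temporary dataset in CSV format.
    with open("temp_dataset.csv", "w", newline="") as f:
        csv.writer(f).writerow(values)

    port.reset_input_buffer()  # Step 8: drop stale bytes from the Arduino
    # (Step 9, closing and reopening the port, is skipped in this sketch.)

    prediction = model.predict([values])[0]  # Steps 10-11
    print(prediction)  # Step 12: display (audio playback omitted here)

    time.sleep(1)  # Step 13: 1-second delay before the next gesture
```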
Figure: Dataset Values for the Word Hello
Figure: Dataset Values for Alphabet A
Figure: Some Dataset Values for Some Gestures of our dataset (the gestures shown include A, B, C, D, E, AND, NO, EAT, I NEED TO GO TO WASHROOM, I, OKAY, STOP, HELLO, THANK YOU, CAN, BYE)
VII. Conclusion
The smart glove takes input from the user's gestures and gives a textual output. It helps mute people to communicate with one another and with normal people. Mute people use their standard sign language to communicate, which is not easily understandable by common people without knowledge of the sign language. Thus, this system converts the sign language into audio, pictorial, and text form, improving their lives in a significant way. As future work, image processing can be applied to take in new gestures from the users themselves and store them in the dataset, and the system can be made into a proper, well-finished, wireless wearable technology to bridge the communication gap with accurate and precise results.