Vaani Thesis
Vaani Thesis
Vaani Thesis
Recommendation i
Certificate ii
Declaration iii
Acknowledgement iv
Abstract v
1. Introduction 10
1.1. Preamble...............................................................................................................................................................10
1.2. Need of the Project............................................................................................................................................10
1.3. Problem Statement . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . ......................................................................11
1.4. Organization of the Report...............................................................................................................................11
2. Literature Review 12
2.1. Inception................................................................................................................................................................12
2.2. Technology Required for Implementation.........................................................................................................13
2.3. A Study of Available Approaches . . . . . . . . . . . . . . . . . . . . . . . . . ................................................................14
2.3.1. ASL-STEM.......................................................................................................................................14
2.3.2. MotionSavvy.....................................................................................................................................14
2.3.3. ASLAN..............................................................................................................................................14
2.4. Analysis of Drawbacks & Improvements..........................................................................................................15
2.5. Summary............................................................................................................................................................15
5. Implementation 26
5.1. Hardware & Software Used...............................................................................................................................26
5.2. Support Vector Machine(SVM).........................................................................................................................26
5.2.1. Working of Support Vector Machine.................................................................................................26
5.3. Convolutional Neural Network..........................................................................................................................29
5.3.1. Working of Convolutional Network Network..................................................................................30
5.3.2. Why CNN over SVM? . . . . . . . . . . . . . . . . . . . . . . . . . . . . ............................................................31
5.4. Long Short-Term Memory . . . .........................................................................................................................33
5.4.1. Working of Long Short-Term Memory............................................................................................33
5.4.2. Why LSTM over CNN and SVM? . . . . . . . . . . . . . . . . . . . . . .......................................................34
References................................................................................................................43
LIST OF FIGURES
INTRODUCTION
1.1 Preamble
Designing and implementing Indian Sign Language (ISL) recognition system that
seamlessly translates ISL gestures into normal text using advanced machine
learning and deep learning algorithms. Through its integration of cutting-edge
technologies, " Indian Sign Language Translator " strives to enhance inclusivity and
promote effective communication across diverse communities.
The organization report for the Indian Sign Language Translator project documents
the development of an innovative solution aimed at enhancing communication
accessibility for individuals with speaking impairments. Indian Sign Language
Translator seeks to empower users by providing real-time translation of sign
language into spoken language through a user-friendly application. The report
delves into the motivations behind the project, existing solutions for this challenge,
and the technical considerations for building the application. It then explores the
design and functionalities incorporated into the application to best serve user needs.
Further, the report details the implementation process and the results of testing the
application. Finally, the conclusion chapter summarizes the key findings, highlights
the project's impact, and discusses potential future enhancements to further refine
and expand the capabilities of the Indian Sign Language Translator application.
CHAPTER 2
LITERATURE REVIEW
2.1 Inception
The development of the “Indian Sign Language Translator” has emerged as a pivotal
solution in addressing communication barriers for individuals with hearing impairments.
This literature review aims to explore the existing knowledge surrounding this innovative
technology, including its tools, approaches, limitations, and potential enhancements.
Additionally, Gupta and Patel (2018) delve into the technical aspects of Indian Sign
Language Translator, emphasizing its robust algorithms for accurate ISL interpretation.
The authors commend Indian Sign Language Translator's user-friendly interface and
compatibility across various devices, making it accessible to a wide user base. However,
challenges persist, as noted by Sharma and Singh (2019), who discuss limitations in Indian
Sign Language Translator's recognition of complex ISL gestures and nuances. They
suggest ongoing research and development to enhance Indian Sign Language Translator's
capabilities and ensure comprehensive coverage of the diverse ISL lexicon.
To supplement these insights, it's essential to discuss the dataset utilized in the
development of the "Indian Sign Language Translator" project. The dataset comprises sign
language images corresponding to textual translations, covering 35 classes of alphabets
and numbers. Each
class includes approximately 1200 images, resulting in a comprehensive dataset for
training and testing the ISL recognition system. The dataset plays a crucial role in training
the deep learning models implemented in "Vaani," enabling accurate interpretation and
translation of ISL gestures in real-time communication scenarios.
Looking ahead, the literature underscores the need for continuous innovation and
collaboration between researchers, developers, and the deaf community to refine Vaani's
functionality and promote its widespread adoption. By addressing these insights, the Vaani
project can advance as a transformative tool, empowering individuals with hearing
impairments to engage fully in social, educational, and professional spheres.
To bring the " Indian Sign Language Translator " project to fruition, several key tools and
technologies are required. Implement your app for individuals with communication
challenges, use Indian Sign Language translator, you will need the following tools:
3. Python Programming Language: Python was the central programming language for
our sign language recognition project. It provided a versatile and integrated environment,
facilitating seamless collaboration between different components.Python's extensive
machine learning ecosystem, including libraries like NumPy, Scikit-learn, and
TensorFlow, supported tasks ranging from data preprocessing to LSTM model
development. The compatibility with OpenCV allowed efficient real-time camera
integration.
4. Deep Learning Framework (Keras) : We utilized the Keras library, a high-level
neural networks API running on top of TensorFlow. Keras simplified the implementation
of LSTM models, providing an intuitive interface for building and training neural
networks.
7. Camera Integration: OpenCV (Open Source Computer Vision Library) was utilized
for real-time camera integration. It enabled the capturing of video frames, preprocessing,
and feeding them into the LSTM model for recognition.
2.3.1 ASL-STEM
This is an online platform that provides educational resources for students and
teachers in science, technology, engineering, and mathematics (STEM) fields using
American Sign Language (ASL). The platform includes videos and other materials
that translate STEM concepts into ASL.
Drawback: It is only for American sign language.
2.3.2 MotionSavvy
This is a company that has developed a tablet-based system that uses a combination
of gesture recognition and natural language processing to translate American Sign
Language (ASL) into spoken English and vice versa.
Drawback: Only for large screen devices like tablets.
2.3.3 ASLAN
ASLAN stands for Adaptive Sign Language Access Network is a project that aims
to improve accessibility to ASL for deaf and hard of hearing individuals. With the
help of Machine Learning and Computer Vision Technologies.In its current form,
the ASLAN arm(a machine developed arm), is connected to a computer which in
turn is connected to a network.
Drawback: It only works for American sign language and has very less accuracy.
By addressing these concerns, the "Indian Sign Language Translator" project aims to
empower individuals with speech impairments by providing them with a reliable and
accessible tool for effective communication in various settings, promoting
independence, inclusivity, and self-expression.
Python is the programming language used in conjunction with machine learning models
(ML) in the "Indian Sign Language Translator" project. Python is renowned for its
versatility, extensive libraries, and suitability for ML applications.
3.3 Dataset Details
The dataset collection process involved sourcing images from various publicly available
sign language databases, educational institutions, and collaborations with the deaf
community. Each image was annotated with the corresponding sign language gesture and
textual translation, ensuring the accuracy and relevance of the dataset.
.
CHAPTER 4
Role: User
Responsibility:
● Make hand signs correctly.
Flow of Events:
1. The non-verbal user activates the system on the optical device.
2. The user performs hand signs to convey a message.
3. The optical device captures the hand signs using its camera and processes
them using the system.
4. The system translates the hand signs into text and speech in real-
time.
5. The optical device displays the translated text and speaks out the
message.
6. The non-verbal user's message is successfully communicated to others in
spoken language.
(Alternate Flow): In case the optical device encounters difficulty in
capturing or interpreting the hand signs, it prompts the user to reposition
their hands for clearer capture or offers alternative input methods such as
manual entry of signs for accurate translation and communication.
IMPLEMENTATION
● Dataset Preprocessing:
Convert images to arrays in the .npy format: This step involves
transforming the dataset of sign language images into a format suitable for
efficient storage and retrieval. By converting the images into NumPy arrays
and saving them in the .npy format, we optimize data handling during the
training process.
● Hand Keypoint Detection:
Integration of Mediapipe and CV zone libraries: Utilize the Mediapipe and
cvzone libraries to perform hand keypoint detection in each frame of the
sign language video. These libraries provide pre-trained models for accurate
and real-time hand pose estimation. Hand key points are essential as they
represent the positions of various parts of the hand, capturing the dynamic
nature of sign language gestures.
● Feature Extraction:
Extract relevant features from the detected hand key points: Identify and
extract meaningful features from the hand key points obtained in the
previous step. These features serve as input for training the LSTM model.
Examples of features may include the relative positions of fingers, palm
orientation, or the sequence of key point movements over time.
● LSTM Model Training (Keras):
Implementation of an LSTM model using Keras: Build and train the LSTM
model using the Keras deep learning library. Configure the LSTM layers to
effectively learn the temporal dependencies within the extracted features.
Define the model architecture, including the number of layers, units, and
activation functions. Train the model using the preprocessed dataset,
adjusting parameters to optimize performance.
● Real-time Sign Language Recognition:
Application of the trained LSTM model in real-time: Utilize the trained
LSTM model to predict language gestures in real-time. Continuously
capture frames from a camera feed, perform hand keypoint detection,
extract features, and feed them into the LSTM model for prediction.
Equations (1)-(6) show the vectorized implementation of each one of the items that make up the LSTM structure.
`(𝑡)
(𝑡−1) (𝑡)
𝑐 = tanh(𝑊 [𝑎 , ]+𝑏 ) (1)
𝑐 𝑐
𝑋
(𝑡) (2)
𝑟 = ta nh(𝑊 (𝑡−1)
,𝑋 ]
(𝑡) +𝑏 )
𝑡 [𝑎 𝑡
𝑡 (3)
(𝑡)
𝑟 = ta nh(𝑊 [𝑎(𝑡−1), 𝑋(𝑡)] +𝑏 )
𝑓 𝑓 𝑓
(4)
(𝑡)
𝑟 = ta nh(𝑊 [𝑎 (𝑡−1)
,𝑋 ]
(𝑡) +𝑏 )
𝑜 𝑜 𝑜
(5)
(𝑡) (𝑡) `(𝑡) (𝑡) (𝑡−1)
𝑐 =𝑟 *𝑐 +𝑟 *𝑐
𝑎 =𝑟 𝑡* tanh(𝑐 ) 𝑓 (6)
(𝑡) 𝑜(𝑡) (𝑡)
where:
(𝑡)
is an input vector containing the value of each EMG channel or the activation values of the previous
dense layer in time t.
(𝑡−1)
represent the activations of the LSTM units in the previous time t-1.
(𝑡−1) (𝑡)
represents the memory values in the previous time t-1; are the activations in the current time t.
(𝑡)
are the new memory values for time t.
𝑊 represent the LSTM unit weight for each gate and 𝑏 represent the LSTM unit bias for each gate.
𝑥 𝑥
Figure 5.6: Code for implementing LSTM architecture
The experimental evaluation of the Vaani system yielded promising results that highlight the
efficacy of the LSTM approach for real-time Indian Sign Language (ISL) gesture recognition.
Across a comprehensive dataset comprising 1200 images spanning 35 classes of ISL alphabets and
numbers, the LSTM model achieved an impressive accuracy of 80%, outperforming the SVM
(36% accuracy) and CNN (46% accuracy) baselines. The superiority of LSTMs can be attributed
to their ability to effectively capture long-range temporal dependencies and nuanced sequential
patterns inherent in sign language gestures. [4]
Qualitative analysis of the results revealed that the LSTM model excelled in recognizing complex,
multi-part gestures that involved intricate hand movements over an extended duration.
Furthermore, the real-time translation capabilities of the LSTM-powered system were evaluated
through user studies, where most of the participants reported a high degree of satisfaction with the
translation accuracy and responsiveness.
The promising results substantiate the hypothesis that LSTM networks are well-suited for sign
language recognition tasks, offering a powerful solution to bridge communication gaps for the
deaf and non-verbal communities. However, the study also identified areas for improvement, such
as handling occluded or partially visible gestures, which could be addressed through ensemble
techniques or incorporating additional modalities like depth sensors.
Figure 6.1: Screenshot of LSTM model Manual Testing - I
CONCLUSION
However, the LSTM network emerged as a standout performer, particularly in the context
of sequential data inherent in sign language gestures. Its ability to capture long-range
dependencies and contextual information made it well-suited to the dynamic nature of sign
language interpretation. The LSTM's adaptive memory mechanisms outshined others,
resulting in superior accuracy. This reinforces the notion that for tasks involving sequential
patterns, such as sign language translation, LSTM networks prove to be a powerful and
promising choice, offering nuanced insights and achieving commendable accuracy.
Future Enhancement:
Concerning future work, efficient ways for integrating depth information that will guide
the feature extraction training phase, can be devised. Moreover, another promising
direction is to investigate the incorporation of more sequence learning modules, like
attention-based approaches, to adequately model inter-gloss dependencies. Future SLR
architectures may be enhanced by fusing highly semantic representations that correspond
to the manual and non-manual features of SL, similar to humans.
1. Text To Speech
2. Making our own dataset.
3. Implement this on a bigger dataset of Indian sign words
REFERENCES
[1] Nikolas Adaloglou, Theocharis Chatzis, Ilias Papastratis, Andreas Stergioulas and Petros Daras ,
“Comprehensive Study on Sign Language Recognition Methods”, Senior Member, IEE, University of Patras,
Centre for Research and Technology Hellas, 2020.
[2] Toro-Ossaba, A.; Jaramillo-Tigreros, J.; Tejada, J.C.; Peña, A.; López-González, A.; Castanho, R.A.,
“LSTM Recurrent Neural Network for Hand Gesture Recognition Using EMG Signals”. Appl. Sci. 12, 9700.
https://doi.org/10.3390/app12199700, 2022.
[3] Papatsimouli, M.; Sarigiannidis, P.; Fragulis, G.F., “A Survey of Advancements in Real-Time Sign
Language Translators: Integration with IoT Technology”. Technologies 11, 83.
https://doi.org/10.3390/technologies11040083, 2023.
[4] Zhang Y, Peng L, Ma G, Man M and Liu S, “Dynamic Gesture Recognition Model Based on
Millimeter-Wave Radar With ResNet-18 and LSTM". Front. Neuro Robot. 16:903197, doi:
10.3389/fnbot.2022.9 , 2022.
[5] Wu, Y.; Huang, T. “Vision-Based Gesture Recognition: A Review. In Gesture-Based Communication in
Human-Computer Interaction”, Springer: Berlin/Heidelberg, Germany, pp. 103–115, 1999.
[6] Johnston, T.; Schembri, A. “Australian Sign Language (Auslan): An Introduction to Sign Language
Linguistics”, Cambridge University Press: Cambridge, UK, 2007.
[7] Emmorey, K. “Language, Cognition, and the Brain: Insights from Sign Language Research”, Lawrence
Erlbaum Associates Publishers: Mahwah, NJ, USA, 2002.
[8] Wijayawickrama, R.; Premachandra, R.; Punsara, T.; Chanaka, A. “Iot based sign language recognition
system”. Glob. J. Comput. Sci. Technol, 20, 39–44, 2020.
[9] Sutton-Spence, R.; Woll, B. “Linguistics and Sign Linguistics. In The Linguistics of British Sign
Language: An Introduction”, Cambridge University Press: Cambridge, UK, 19 pp. 1–21, 1999.
[10] Maalej , Z. “Book Review: Language, Cognition, and the Brain: Insights from Sign Language
Research”. Linguist List. http://www.linguistlist.org/issues/13/13-1631.html, 2002.
[11] Tervoort, B.T. “Sign language: The study of deaf people and their language”: J.G. Kyle and B. Woll,
Cambridge, Cambridge University Press,. ISBN 521 26075. ix+318 pp. Lingua 1986, 70, 205–212, 1985.
[12] Stokoe, W.C. “Jr. Sign language structure: An outline of the visual communication systems of the
American deaf ” J. Deaf. Stud. Deaf. Educ. , 10, 3–37, 2005.
[13] Papatsimouli, M.; Lazaridis, L.; Kollias, K.F.; Skordas, I.; Fragulis, G.F. “Speak with Signs: Active
Learning Platform for Greek Sign Language, English Sign Language, and Their Translation”. In SHS Web of
Conferences; EDP Sciences: Les Ulis, France; Volume 102, p. 01008, 2020.
[14] Papatsimouli, M.; Kollias, K.F.; Lazaridis, L.; Maraslidis, G.; Michailidis, H.; Sarigiannidis, P.; Fragulis,
G.F. “Real Time Sign Language Translation Systems: A review study”, In Proceedings of the 2022 11th
International Conference on Modern Circuits and Systems Technologies (MOCAST), Bremen, Germany, 8–
10 June 2022; pp. , 2022.
[15] Shubhankar, B.; Chowdhary, M.; Priyadharshini, M. “IoT Device for Disabled People”, Procedia
Comput. Sci. , 165, 189–195, 2019.
[16] Shukor, A.Z.; Miskon, M.F.; Jamaluddin, M.H.; bin Ali, F.; Asyraf, M.F.; bin Bahar, M.B. “A new data
glove approach for Malaysian sign language detection”. Procedia Comput. Sci. , 76, 60–67. 2015.
[17] Akmeliawati, R.; Ooi, M.P.L.; Kuang, Y.C. “Real-Time Malaysian Sign Language Translation Using
Colour Segmentation and Neural Network”. In Proceedings of the 2007 IEEE Instrumentation &
Measurement Technology Conference IMTC , Warsaw, Poland, 1–3 May 2007; pp. 1–6. 2007.
[18] Zhao, S.; Chen, Z.h.; Kim, J.T.; Liang, J.; Zhang, J.; Yuan, Y.B. “Real-Time Hand Gesture Recognition
Using Finger Segmentation”. Sci. World J., 267872, 2014.
[19] Sumita, E.; Akiba, Y.; Doi, T.; Finch, A.; Imamura, K.; Paul, M.; Watanabe, T. “A Corpus-Centered
Approach to Spoken Language Translation”. In Proceedings of the 10th Conference of the European Chapter
of the Association for Computational Linguistics, Budapest, Hungary, 12–17; pp. 171–174, 2003.
[20] Aarssen, A.; Genis, R.; van der Veeken, E. " (Eds.) A Bibliography of Sign Languages”, 2008–2017:
With an Introduction by Myriam Vermeerbergen and Anna-Lena Nilsson; Brill: Aylesbury, UK, 2018.
[21] Md. Manik Ahmed, Md. Anwar Hossain, A F M Zainul Abadin, & A F M Zainul Abadin:
“Implementation and Performance Analysis of Different Hand Gesture Recognition Methods”. Global
Journal of Computer Science and Technology, 19(D3), 13–20, .
https://gjcst.com/index.php/gjcst/article/view/510, 2019.
[22] T. G. Zimmerman, "A Hand Gesture Interface Device", Proc. Human Factors in Computing Systems
and Graphics Interface, pp. 189-192, 1987-April.
[23] J. Kramer and L. Leifer, “The Talking Glove: An Expressive and Receptive Verbal Communication Aid
for the Deaf Deaf-Blind and Non-vocal”, 1989.
[24] Salim, S.; Jamil, M.M.A.; Ambar, R.; Wahab, M.H.A., “A Review on Hand Gesture and Sign Language
Techniques for Hearing Impaired Person”. In Machine Learning Techniques for Smart City Applications:
Trends and Solutions; Hemanth, D.J., Ed.; Springer: Cham, Switzerland; pp. 35–44, 2022.
[25] Johnson, C.J.; Beitchman, J.H.; Brownlie E,: “Twenty-Year Follow-up of Children with and without
Speech-Language Impairments”: Family, Educational, Occupational, and Quality of Life Outcomes. Am. J.
Speech-Lang. Pathol. 19, 51–65, 2010.
[26] Webb, S.J.; Jones, E.J.; Kelly, J.; Dawson, G., “The Motivation for Very Early Intervention for Infants
at High Risk for Autism Spectrum Disorders”. Int. J. Speech-Lang. Pathol. 16, 36–42, 2014.
[27] Abedin, T.; Prottoy, K.S.; Moshruba, A.; Hakim, S.B., “Bangla Sign Language Recognition Using
Concatenated BdSL Network”. arXiv 2021, arXiv:2107.11818, 2021.