Automated Service Assistances To The Visually Impaired People Using Android Application

S. Hemalatha
Computer Science and Engineering, Panimalar Institute of Technology, Chennai, India
Nripendra Narayan Das
Department of Information Technology, School of Computing and Information Technology,
Manipal University Jaipur, Jaipur, India
Jayanthy Ramasamy
School of Computer Science and Engineering, Vellore Institute of Technology, Vellore, India
Suman Madan
Department of Information Technology, Jagan Institute of Management Studies, Delhi, India, and
P.C. Senthil Mahesh
Department of CSE, Excel Engineering College, Komarapalayam, India

Purpose – Internet of Things (IoT) involves connecting physical objects to the internet to provide opportunities to build smart systems or
applications. The IoT paradigm assumes many devices connected over a conventional internet network. These devices usually have restricted
resources, so moving part of the service implementation to a cloud infrastructure is a prominent solution. This study proposes the
human voice as a potential interface for one or more devices in an IoT ecosystem, enabling issuing commands and receiving information via voice messages.
Design/methodology/approach – System design is the process of defining the elements of a system such as the architecture, modules and
components, the different interfaces of those components and the data that goes through that system. It is meant to satisfy specific needs and
requirements of a business or organization through the engineering of a coherent and well-running system.
Findings – The main aim of this proposed work is to develop a ticket booking application that performs all of its operations through speech recognition,
so that visually impaired people can make use of it. Several applications already help visually impaired people; this application
adds extra features to the available software. Using it, visually impaired people can book tickets without the help of personal assistants.
For future research, this study hopes to extend the application to other operations that help visually impaired people carry out their
daily activities independently, without the help of personal assistants, for example, making a phone call, sending text messages, booking a taxi
and easy navigation.
Originality/value – System design involves the identification of classes, their relationships and their collaboration. In Objectory, classes are
divided into entity classes and control classes.
Keywords Internet of Things, Mobile application development, Visually impaired, Ticket booking
Paper type Research paper

1. Introduction
Visually impaired people face several problems in doing their regular daily activities such as reading, driving and walking. Vision problems include loss of visual acuity, loss of visual field (the inability to see as wide an area as a person with normal vision), photophobia (the inability to look at light), diplopia (double vision), visual distortion and visual perceptual difficulties. Visual acuity means the clarity of vision. It depends on optical and neural factors such as the sharpness of the retinal focus within the eye, the health and functioning of the retina and the sensitivity of the interpretative faculty of the brain.
Visual impairment is due to damage in the eyes or the failure of the brain to receive and read the visual cues sent by the eyes. Common causes of visual impairment are diabetic retinopathy, age-related macular degeneration, the formation of cataracts and increased pressure within the eyes, which causes glaucoma. Visual impairment has many forms and varying
such as reading or searching for any information from the Web. To overcome this problem, we can use screen readers, but reading the information this way takes more time. A screen reader converts the text displayed on the computer into a form that visually impaired people can understand, such as tactile or auditory output. Most screen readers do not help blind people to navigate between different Web pages, but if some additional features are added, the user can work independently on a computer. Some screen readers have a synthetic voice that reads the text on the screen aloud, and others have a braille display. Such displays use crystals that expand when exposed to particular voltage levels, which helps blind people to read the text using their fingers. Screen reading hardware is much
In this paper, we use speech recognition for booking tickets. Speech recognition is the field that develops technologies for converting speech into text. It was long dominated by approaches such as the hidden Markov model (HMM) combined with artificial neural networks (ANNs). Today, many aspects of speech recognition have been taken over by a deep learning method called long short-term memory (LSTM). LSTM can store values for both long and short periods, which is achieved by using the identity activation function for the memory cell. LSTM is well suited to time series in which important events are separated by time lags of unknown size and duration.

Figure 1 Architecture diagram

Figure 2 First page

Figure 3 Shake and speak

1.1 Objective
• Embedded computers are indeed the foundation of modern technology. Every modern electrical appliance has some processing unit; hence, it is a computer-based system.
• Computer technology is moving from embedded devices to pervasive or ubiquitous computing, which means that more and more computers are embedded everywhere and in everything.
• With the advancement of the internet in both coverage and speed, such computers often have connectivity, forming what is now popularly called the Internet of Things (IoT). The IoT paradigm assumes many devices connected over the conventional internet network.
• These devices usually have restricted resources, so moving part of the service implementation to a cloud infrastructure is a prominent solution. On the other hand, having to interface with many devices could be very cumbersome.
• For IoT deployment in everyday life, devices need to be designed ergonomically. The main objective of this project is to use the human voice as a potential interface for one or more devices in an IoT ecosystem, enabling issuing commands and receiving information via voice messages.

2. Literature survey
Pichotta and Mooney (2016) illustrate an approach to learning statistical scripts with LSTM recurrent neural networks. In this paper, scripts encode knowledge of prototypical sequences of events. It describes a recurrent neural network model for statistical script learning using LSTM, an architecture that has been demonstrated to work well on a range of artificial intelligence tasks. They evaluated the system on two tasks, inferring held-out events from text and inferring novel events from text, and compared it to a number of baselines, including the previous best-published system, demonstrating substantial improvements on both tasks.
Gaikwad et al. (2010) presented a review of speech recognition. They stated that speech is the most prominent and primary mode of communication among human beings and has the potential to be an important mode of interaction with the computer. The paper gives an overview of the major technological perspectives and the fundamental progress of speech recognition, as well as the techniques developed in each stage of speech recognition. It helps in choosing a technique by setting out the relative merits and demerits of each, with a comparative study of the different techniques done stage by stage. The paper concludes with a decision on the future direction for developing techniques in human–computer interface systems. Through this review, it is found that MFCC is widely used for feature extraction of speech, and that GHM and HMM are the best among the modeling techniques.
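To make the MFCC feature extraction named in this review concrete, the following is a minimal Python sketch of one common textbook pipeline for a single frame: Hamming window, power spectrum, triangular mel filterbank, logarithm and discrete cosine transform. It is an illustration under stated assumptions, not the procedure of any surveyed paper. The function names, the filter count, the number of coefficients, the mel formula variant and the sine-tone test signal are all choices made here, and a naive DFT stands in for an FFT so that the block needs only the standard library.

```python
import math


def hz_to_mel(hz):
    """Map frequency in Hz to mels (one widely used formula; others exist)."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def power_spectrum(frame):
    """Power spectrum (bins 0..N/2) of a Hamming-windowed frame via a naive DFT."""
    n = len(frame)
    windowed = [
        x * (0.54 - 0.46 * math.cos(2.0 * math.pi * i / (n - 1)))
        for i, x in enumerate(frame)
    ]
    spectrum = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        im = sum(x * math.sin(2.0 * math.pi * k * i / n) for i, x in enumerate(windowed))
        spectrum.append((re * re + im * im) / n)
    return spectrum


def mel_filterbank(num_filters, frame_len, sample_rate):
    """Triangular filters whose edges are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2.0)
    mels = [top * i / (num_filters + 1) for i in range(num_filters + 2)]
    bins = [int((frame_len + 1) * mel_to_hz(m) / sample_rate) for m in mels]
    bank = []
    for j in range(1, num_filters + 1):
        left, center, right = bins[j - 1], bins[j], bins[j + 1]
        row = [0.0] * (frame_len // 2 + 1)
        for k in range(left, center):      # rising edge
            row[k] = (k - left) / (center - left)
        for k in range(center, right):     # peak and falling edge
            row[k] = (right - k) / (right - center)
        bank.append(row)
    return bank


def mfcc_frame(frame, sample_rate, num_filters=12, num_coeffs=6):
    """MFCCs of one frame: log mel-band energies followed by a DCT-II."""
    spectrum = power_spectrum(frame)
    log_energies = []
    for row in mel_filterbank(num_filters, len(frame), sample_rate):
        energy = sum(w * p for w, p in zip(row, spectrum))
        log_energies.append(math.log(max(energy, 1e-10)))  # floor avoids log(0)
    return [
        sum(e * math.cos(math.pi * c * (j + 0.5) / num_filters)
            for j, e in enumerate(log_energies))
        for c in range(num_coeffs)
    ]


def tone(freq_hz, sample_rate=8000, length=128):
    """Hypothetical test signal: a pure sine tone."""
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate) for i in range(length)]


low = mfcc_frame(tone(300.0), 8000)
high = mfcc_frame(tone(2500.0), 8000)
print([round(c, 2) for c in low])
print([round(c, 2) for c in high])
```

The two tones produce different coefficient vectors, which is the property a recognizer relies on when such vectors are used as training input. A practical system would process overlapping frames of real speech and use an FFT.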
Figure 4 Voice ticket confirming
Figure 5 Ticket verification
Arora et al. (2014) explained the common problems faced by visually impaired people. Visual impairment has long been treated as a deterrent to normal functioning in human beings, especially in the participation and economic productivity domains. The visually impaired face difficulties in ambulation on pavements because of uneven surfaces, open manholes, parked cars, vendors, etc. The authors recommend approaching the government to improve infrastructure in terms of the quality of roads: footpaths or pavements should be wide enough, and railings should be provided between the roads and pavements. Awareness can be raised through short films about the problems faced by blind people, posters, etc., so that more and more people are aware, especially in rural areas where religious myths against blindness persist.
Preeti and Parneet (2013) discuss automatic speech recognition (ASR). They describe how the accuracy of ASR remains one of the most important research challenges, owing to factors such as speaker and language variability, vocabulary size and domain, and noise. The design of a speech recognition system requires careful attention to issues such as the various types of speech classes, speech representation, feature extraction techniques, database and performance evaluation. The paper presents a study of basic approaches to speech recognition, and their results show better accuracy. It also presents the research that has been done to deal with the problem of ASR.

Figure 6 Cloud data

Srivastava (2014) presented a paper, Speech Recognition using Artificial Neural Network. This paper uses a neural network (NN) and mel-frequency cepstrum coefficients (MFCC) for speech recognition. MFCC is used for the feature extraction of speech, as it generates the training vectors by transforming speech signals into the frequency domain. An ANN is a computational model for information processing that combines artificial neurons and solves the problem by self-learning. The weights of an artificial neuron are adjusted to obtain a particular output from a particular input. A total of 120 samples were recorded in MATLAB at a sampling frequency of 44,100 Hz, of which 60 samples were used for training and 60 for testing. For training, the data train algorithm is used. The function melcepst from the VOICEBOX speech processing toolbox is used for calculating the MFCC coefficients. Finally, the NN toolbox of MATLAB was used to create, train and simulate the networks, and mean square error was used to evaluate their performance.
Yadav et al. (2014) presented a paper, Online Reservation System Using quick response (QR) Code based Android Application System. This paper presents a new seat allocation system using a QR code image that contains information about tickets and passengers in the form of a 2D image, which reduces the time of scanning. The main idea of this research paper is to make the journey of waiting list passengers more convenient on the Indian Railway. Wireless standards are used for connectivity between hand-held terminal (HHT) devices and dynamic seat allocation (DSA) servers, by which authentication is provided to every ticket. The automatic upgradation procedure of the DSA server makes it possible to make a reservation while the train is running and also provides transparency in berth or seat booking, whether online or at the counter. The QR code is scanned by HHT devices, which decode the uniform resource locator during the check-in process, redirect to the passenger reservation system (PRS) server and fetch the stored data to verify the passenger. The check-in process updates the information of all passengers and lets the DSA server mark each seat as reserved or vacant. The DSA server allots the seats of absent passengers to waitlisted passengers. The check-out process allows a passenger to break the journey. The booking interface provides the capability to book tickets for passengers on board.

Figure 7 Data logs calculation

Ghosal et al. (2015) presented a paper, Android Application for Ticket Booking and Ticket Checking in Suburban Railways. This paper deals with the development and implementation of a smartphone application that is more effective than the current ticketing system. The “Android Suburban Ticket (ASR)” can be bought easily anytime, anywhere, and the ticket will be present in the customer’s phone in the form of a “Quick Response Code”. The global positioning system facility is used for validation of the ticket at the source and deletion at the destination. The information for each user is stored in a cloud database for security purposes, which is unavailable in the current suburban railway system. Also, the ticket checker is provided with an application to search for the user’s ticket with the ticket number in the cloud. The platform-independent language Java is used for the implementation. Along with that, SQLite and a cloud database are used as databases for the user and ticket information, respectively. Hypertext preprocessor (PHP) is also used as a development framework.
Singh and Bathla (2013) presented a paper, A Survey on Speech Recognition. This paper gives an overview of the speech recognition system and its recent progress. The primary objective of this paper is to compare and summarize some of the well-known methods used in various stages of the speech recognition system. Speech is the most common mode of communication among humans. Communication between humans and computers takes place through a human–computer interface. Speech recognition is the process by which the computer identifies human speech to generate a string of words or commands. The output of speech recognition systems can be applied in various fields. Many artificial intelligence techniques are available for ASR development. The paper compares the performance of ASR systems based on the adopted feature extraction technique and the speech recognition approach for the particular language.
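ASR performance comparisons of the kind surveyed in this section are usually reported as word error rate (WER), the quantity that the word error rate calculator in Section 3.2.2 of this work also produces. A minimal Python sketch follows. The function name and the lowercase, whitespace-based tokenization are assumptions made for illustration; the paper does not specify how its calculator is implemented.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words.

    Computed as the word-level Levenshtein distance between the two
    transcripts, divided by the length of the reference transcript.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the processed reference
    # prefix and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1        # reference word missing from hypothesis
            insertion = curr[j - 1] + 1   # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


# One substitution ("a" -> "the") and one deletion ("to") over 5 reference words.
print(word_error_rate("book a ticket to chennai", "book the ticket chennai"))  # 0.4
```

Because insertions are counted, WER can exceed 1.0 when the recognizer outputs many spurious words, so it should not be read as a percentage of words that were wrong.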
Vrinda and Shekhar (2013) presented a paper, Speech recognition system for English language. This paper presents an overview of speech recognition technology. It describes how speech recognition systems work and the level of accuracy that can be expected. The main idea behind this paper is to develop a speech recognition system for physically challenged people who cannot operate a computer through a keyboard and mouse. In this paper, HMM is used to recognize speech samples, giving excellent results for isolated words. The paper studies how to capture the human voice in a digital computer and decode it into the corresponding text, that is, converting speech to text. The system can be used on a very large scale with very few modifications. During the experimental work, a medium-size vocabulary system was implemented. The system can be extended to continuous word recognition with a large vocabulary based on a phone acoustic model, using the HMM technique or other emerging techniques such as ANN.
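The idea behind HMM-based isolated word recognition can be sketched in a few lines of Python: one HMM is kept per vocabulary word, the forward algorithm scores an observation sequence under each model, and the word whose model gives the highest likelihood is chosen. The sketch below is a toy under stated assumptions and is not the system of the paper discussed above. The two-word vocabulary, the three discrete observation symbols (standing in for quantized acoustic features) and all probabilities are hypothetical.

```python
def forward_likelihood(obs, start, trans, emit):
    """P(obs | model) for a discrete HMM, computed with the forward algorithm.

    start[s]    : probability of starting in state s
    trans[p][s] : probability of moving from state p to state s
    emit[s][o]  : probability that state s emits symbol o
    """
    n_states = len(start)
    alpha = [start[s] * emit[s][obs[0]] for s in range(n_states)]
    for symbol in obs[1:]:
        alpha = [
            sum(alpha[p] * trans[p][s] for p in range(n_states)) * emit[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)


def recognize(obs, word_models):
    """Return the vocabulary word whose HMM best explains the observations."""
    return max(word_models, key=lambda w: forward_likelihood(obs, *word_models[w]))


# Hypothetical two-state, left-to-right models over symbols 0, 1 and 2.
LEFT_TO_RIGHT = [[0.6, 0.4], [0.0, 1.0]]
WORD_MODELS = {
    "yes": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1]]),
    "no": ([1.0, 0.0], LEFT_TO_RIGHT, [[0.1, 0.1, 0.8], [0.8, 0.1, 0.1]]),
}

print(recognize([0, 0, 1, 1], WORD_MODELS))  # yes
print(recognize([2, 2, 0, 0], WORD_MODELS))  # no
```

In a real recognizer the model parameters would be trained from recorded samples of each word, and the observations would be feature vectors such as MFCCs. For long utterances the computation is normally done in the log domain to avoid numerical underflow.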
Anusuya and Katti (2009) gave a brief description of ASR. After years of research and development, the accuracy of ASR remains one of the important research challenges. The design of a speech recognition system requires careful attention to the following issues: definition of various types of speech classes, speech representation and feature extraction. The problems that exist in ASR and the various techniques developed by research workers to solve them are presented in chronological order. The objective of the paper is to summarize and compare some of the well-known methods used in various stages of speech recognition systems and to identify research topics and applications that are at the forefront of this exciting and challenging field. Speech recognition has attracted scientists as an important discipline and has created a technological impact on society, and it is expected to flourish further in this area of human–machine interaction.

3. System design
System design is the process of defining the elements of a system such as the architecture, modules and components, the different interfaces of those components and the data that goes through that system. It is meant to satisfy specific needs and requirements of a business or organization through the engineering of a coherent and well-running system. System design involves the identification of classes, their relationships and their collaboration. In Objectory, classes are divided into entity classes and control classes. The computer-aided software engineering (CASE) tools that are available commercially do not provide any assistance in this transition. CASE tools take advantage of metamodeling, which is helpful only after the construction of the class diagram shown in Figure 1.

Figure 8 Train booking

3.1 System overall architecture
The overall architecture diagram is in Figure 1.

3.2 Modules
The system is built from three main modules:
1 Android-based voice-enabled ticketing system using Google speech application program interface (API).
2 Data analytics through LSTM using Python and TensorFlow.
3 PHP-based cloud communication to interface IoT board.

3.2.1 Android-based voice-enabled ticketing system using Google speech API
Google cloud speech API enables developers to convert audio to text by applying powerful NN models in an easy-to-use API. The API recognizes over 110 languages and variants to support global users. It can transcribe the speech of users dictating to an application’s microphone, enable command-and-control through voice or transcribe audio files, among many other use cases. It recognizes audio uploaded in the request and integrates with audio storage on Google cloud storage, using the same technology Google uses to power its own products. Speech API can stream text results, returning partial recognition results as they become available, with the recognized text appearing immediately while speaking. Alternatively, speech API can return recognized text from audio stored in a file. ASR is powered by deep learning NNs and can drive applications such as voice search or speech transcription.
In our project, this module performs the following functions. When the user sends a voice input and output (I/O) request (speaks) to the developed android application, the application sends the request to the speech recognition process to recognize the particular voice. The recognized voice is sent to the cloud communication module as training logs, and the module sends the predicted logs back to the speech recognition process. After successful recognition, it sends the response back to the android application. The android application sends the available menu to the screen reader to perform the text-to-speech operation, which reads the contents on the screen. The android app sends the voice I/O response to the user. The screen readers also communicate with the cloud. After successful ticket booking, the generated ticket is stored in the device data store. Users can access the ticket from the device data store.

Figure 9 TensorFlow

3.2.2 Data analytics through long short-term memory using Python and TensorFlow
Data analytics (DA) is the process of examining data sets to draw conclusions about the information they contain, increasingly with the aid of specialized systems and software. DA technologies and techniques are widely used in commercial industries to enable organizations to make more-informed business decisions and by scientists and researchers to verify or disprove scientific models, theories and hypotheses.
TensorFlow is based on graph-based computation, an alternative way of conceptualizing mathematical calculations. The output of the hidden layer in a recurrent NN is passed through a conceptual delay block and then fed back into itself. Recurrent NNs are very flexible, but their main problem is the vanishing gradient problem. Recurrent NNs need long memories so that the network can connect data relationships at significant distances in time. The most popular way of dealing with this issue is by using LSTM networks. An LSTM does so by creating an internal memory state that is simply added to the processed input, which greatly reduces the multiplicative effect of small gradients. The time-dependence effects of previous inputs are controlled by a forget gate that determines which states are remembered or forgotten. Two other gates, the input gate and the output gate, are also featured in LSTM cells.

Figure 10 Histograms

In this project, this module performs the following functions. Testing and training of the data sets is performed by the Google speech recognition process, which collects the audio stream (voice) from the user and stores the collected text in the training data logs. The LSTM takes the texts from the training data logs and stores them in the results database, with which the Google speech recognition process communicates directly. From the results database, the text is sent to the word error rate calculator and the loss functions. The LSTM NN sends the process input–output operations to the TensorFlow implementation process, which provides the graphical visualization to the

There are several applications that help visually impaired people. This application adds extra features to the available software. Using it, visually impaired people can book tickets without the help of personal assistants.
For future research, we hope to extend this application to perform various other operations that will help visually impaired people to carry out their daily activities independently, without the help of personal assistants, for example, making a phone call, sending text messages, booking a taxi and easy navigation.

References
