Pfe Python
IMPAIRED
Submitted by
Yashu Chauhan(1613101861)
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
The Internet has become a basic necessity of daily life, and almost everyone relies on the
information available on it. Blind people, however, face difficulty in accessing text
resources. Advances in computer-based accessible systems have opened up many avenues
for the visually impaired across much of the globe. Audio-feedback-based virtual
environments, such as screen readers, have helped blind people immensely in accessing
Internet applications. Even so, visually challenged people find this technology very
difficult to use, because operating it still requires some visual perception. Although
many advancements have been made to help them use computers efficiently, a novice user
who is visually challenged cannot use this technology as efficiently as a sighted novice
can; unlike sighted users, they need some practice before they can use the available
technologies. This project presents a voicemail system architecture that a blind person
can use to access e-mail easily and efficiently. The project enables blind people to send
and receive voice-based e-mail messages. The GUI of the proposed system was evaluated
against the GUI of a traditional mail server, and the proposed architecture was found to
perform much better than the existing GUIs. The project uses speech-to-text and
text-to-speech techniques to provide access for blind people. The system can also be used
by other people, for example those who are not able to read. The system is
ABBREVIATIONS MEANING
IVR Interactive Voice Response
DTMF Dual Tone Multi Frequency
ASR Automatic Speech Recognition
STT Speech To Text
TTS Text To Speech
GUI Graphical User Interface
PCA Principal Component Analysis
HTML Hypertext Markup Language
VRU Voice Response Unit
ADC Analog To Digital Converter
API Application Programming Interface
gTTS Google Text To Speech
SMTP Simple Mail Transfer Protocol
IMAP Internet Message Access Protocol
TCP Transmission Control Protocol
Chapter 1
INTRODUCTION
The introduction of the Internet has revolutionized many fields. It has made life so much
easier that people today can readily access almost any information they want.
Communication is one of the fields most changed by the Internet. E-mail is among the
most dependable ways of communicating over the Internet for sending and receiving
important information. But using the Internet in the usual way presumes that the user is
able to see. There are differently abled people in our society who do not have this
ability: visually impaired or blind people cannot see the computer screen or the
keyboard.
A survey has shown that there are more than 240 million visually impaired people
around the globe. Many of them are therefore unable to use the Internet or e-mail on
their own. Often, the only way a visually challenged person can send an e-mail is to
dictate the entire content of the mail to another, sighted person, who then composes and
sends the mail on their behalf. This is not a satisfactory solution, because a visually
impaired person is unlikely to find someone to help every time. These limitations also
leave visually impaired people at a disadvantage in society.
So, for the betterment of society and to give equal status to such specially abled people,
we have come up with this project, which gives the user the ability to send mail using
voice commands without the need for a keyboard or any other visual aid.
1.6 MOTIVATION
It is estimated that nearly 285 million people in the world are visually impaired, and the
idea is to provide a suitable communication system for them. This was the driving force
behind developing the given system. One of the major disadvantages of existing systems
is that all operations are based on mouse clicks and keyboard events, and each operation
depends on a particular type of click. Remembering keyboard shortcuts can also be
difficult. As a result, existing systems are of limited use to blind and visually
impaired people. There is a strong need for a system that removes these drawbacks and
remains simple to use. The idea focuses on providing basic functionality such as
composing, sending, and receiving e-mail, along with advanced features such as
voice-based operation, mail search, and support for both voice and text-based e-mail,
with added ease and simplicity.
Related Work
Earlier, user interaction with such systems was based on screen-reader technology or on
mouse-click operations, in which every operation has an associated mouse click; for
example, composing an e-mail might require, say, two left clicks. Interaction with such a
system is therefore hard, and the user must keep the click events in mind. This paper
focuses on developing an e-mail system that helps blind people use communication
services. The system is based on IVR, and the main idea is to do away with keyboard and
mouse operations. The Internet is a rich source of knowledge and information, but blind
people face difficulties in accessing text-based material. The idea is to develop an
audio-feedback-based virtual environment using technologies such as screen readers and
text-to-speech. A voice mail architecture helps blind people access information in the
form of audio, text, and self-reading systems. The idea focuses on helping visually
impaired and illiterate people access technology by reducing cognitive load. In
conventional interfaces, decision making depends on eyesight, that is, on whatever
happens or appears on the screen.
Chapter 2
LITERATURE REVIEW
2.1 “Voice Based System in Desktop and Mobile Devices for Blind People”. In
International Journal of Emerging Technology and Advanced Engineering
(IJETAE), 2014
This paper deals with a voice-based system for desktop and mobile devices for blind
people. A voice mail architecture helps blind people access e-mail and other multimedia
functions of the operating system (songs, text). In the mobile application, SMS messages
can also be read out by the system itself. Nowadays, advances in computer technology
have opened up platforms for visually impaired people across the world. It has been
observed that nearly 60% of the total blind population of the world lives in India. The
paper describes the voice mail architecture used by blind people to access e-mail and
multimedia functions of the operating system easily and efficiently. This architecture
also reduces the cognitive load on blind users of remembering and typing characters on
a keyboard. There is a large body of information available on technological advances for
visually impaired people. This includes the development of text-to-Braille systems,
screen magnifiers, and screen readers. Recently, attempts have been made to develop
tools and technologies that help blind people access Internet technologies. Among the
early attempts, voice-based input for surfing was adopted for blind people. IBM's Home
Page Reader offers an easy-to-use interface and converts text to speech, using different
gender voices for reading text and links. However, its disadvantage is that the developer
has to design a complex new interface for complex graphical web pages to be browsed and
for the screen reader to recognize.
Another approach is a simple browsing solution that divides a web page into two
dimensions. This greatly simplifies a web page's structure and makes it easier to browse.
Another web browser generated a tree structure from the HTML document by analyzing
links. Although it attempted to structure linked pages to enhance navigability, it did
not prove very efficient for surfing. Moreover, it did not address the navigability and
usability of the current page itself. Another browser developed for visually handicapped
people was eGuideDog, which had an integrated TTS engine. This system applies an
advanced text extraction algorithm to represent the page in a user-friendly manner.
However, it still did not meet the standards required for commercial use.
In the Indian scenario, ShrutiDrishti and WebBrowser for Blind are the two web browser
frameworks used by blind people to access the Internet, including e-mail. Both systems
are integrated with Indian-language ASR and TTS systems. However, the available systems
are not portable to small devices such as mobile phones.
2.2 “Voice Based Search Engine and Web page Reader”. In International Journal of
Computational Engineering Research (IJCER)
This paper aims to develop a search engine that supports man-machine interaction purely
in the form of voice. It introduces a novel voice-based search engine and web page reader
that allows users to command and control the web browser through their voice. Existing
search engines receive requests from the user as text and respond by retrieving the
relevant documents from the server and displaying them as text. Even though existing web
browsers can play audio and video, the user has to request content by typing text in the
search box and then play the desired audio or video with the help of a Graphical User
Interface (GUI). The proposed voice-based search engine aspires to serve users,
especially the blind, in browsing the Internet. The user can speak to the computer, and
the computer will respond in the form of voice. The computer will also assist the user in
reading documents.
A voice-enabled interface with additional support for gesture-based input and output
approaches has been developed for the “Social Robot Maggie”, converting it into an aloud
reader. Voice recognition and synthesis can be affected by a number of factors, such as
voice pitch, speed, and volume. The system is based on the Loquendo ETTS (Emotional
Text-To-Speech) software. The robot also expresses its mood through gestures based on a
gestionary.
Speech recognition accuracy can be improved by removing noise. One proposed iterative
speech enhancement algorithm applies a Bayesian scheme in the wavelet domain to separate
the speech and noise components. The method is developed in the wavelet domain to
exploit selected features in the time-frequency representation. It involves two stages: a
noise estimation stage and a signal separation stage. Another work uses a Principal
Component Analysis (PCA) based HMM for the visual modality of audio-visual recordings,
drawing on PCA and PDF (Probabilistic Density Analysis). Another approach to speech
recognition uses fuzzy modelling and decision making, and ignores noise instead of
detecting and removing it. There, the speech spectrogram is converted into a fuzzy
linguistic description, and this description is used instead of precise acoustic
features.
A voice recognition technique combined with facial feature interaction has been used to
assist virtual artists with upper limb disabilities in creating visual art in a digital
medium while preserving the individuality and authenticity of the artwork. Techniques to
recover phenomena such as sentence boundaries, filler words, and disfluencies, referred
to as structural metadata, have also been discussed, along with an approach that
automatically adds information about the location of sentence boundaries and speech
disfluencies in order to enrich speech recognition output. Clarissa is a voice-enabled
procedure browser deployed on the International Space Station (ISS). The main components
of the Clarissa system are a speech recognition module, a classifier for executing the
open-microphone accept/reject decision, a semantic analyser, and a dialog manager.
Another work focuses mainly on expressions. To build a prosody model for each expressive
state, an end pitch and a delta pitch for each syllable are predicted from a set of
features gathered from the text. The expression-tagged units are then pooled with the
neutral data. In a TTS system, such paralinguistic events efficiently provide clues as to
the state of a transaction, and markup specifying these events is a convenient way for a
developer to achieve these types of events in the audio coming from the TTS engine.
The main features of this approach are that smooth and natural-sounding speech can be
synthesized, the voice characteristics can be changed, and it is “trainable”. A
limitation of the basic system is that the synthesized speech sounds “buzzy”, since it
is based on a vocoding technique; this has been overcome by a high-quality vocoder and
hidden semi-Markov model based acoustic modelling. Speech synthesis falls into three
categories: concatenative synthesis, articulatory synthesis, and formant synthesis.
Another work focuses mainly on formant synthesis: an array of phonemes or syllables with
formant frequencies is given as input, the frequencies of the input are processed in
combination with Thai tonal accent rules, and the formant frequency format is converted
to wave format so that audio can be output via the sound card.
2.3 “Voice Based Services for Blind People”. In International Journal of Advance
Research, Ideas and Innovations in Technology (IJARIIT)
Advances in computer-based accessible systems have opened up many avenues for the
visually impaired across much of the globe. Audio-feedback-based virtual environments,
such as screen readers, have helped blind people immensely in accessing Internet
applications. However, a large section of visually impaired people in different
countries, in particular on the Indian subcontinent, could not benefit much from such
systems. This was primarily due to the difference between the technology required for
Indian languages and that required for other popular languages of the world. In this
paper, we describe a voicemail system architecture that can be used by a blind person to
access e-mail easily and efficiently. The contribution made by this research has enabled
blind people to send and receive voice-based e-mail messages in their native language
with the help of a mobile device. Our proposed system GUI has been evaluated against the
GUI of a traditional mail server. We found that our proposed architecture performs much
better than the existing GUIs. In this project, we use speech-to-text and text-to-speech
techniques to provide access for blind people.
The navigation system uses TTS (Text-to-Speech) to provide blind users with a navigation
service through voice. The suggested system, as an independent program, is fairly cheap
and can be installed on the smartphones that blind people already carry, which gives them
easy access to the program. An increasing number of studies have used technology to help
blind people integrate more fully into a global world. We present software that lets
blind users operate mobile devices. The software includes an instant messenger to
support interaction between blind users and any other user connected to the network.
Nowadays, advances in computer technology have opened up platforms for visually impaired
people across the world. It has been observed that nearly 60% of the total blind
population of the world lives in India. In this paper, we describe the voice mail
architecture used by blind people to access e-mail and multimedia functions of the
operating system easily and efficiently. This architecture also reduces the cognitive
load on blind users of remembering and typing characters on the keyboard. It also helps
handicapped and illiterate people. In previous work, blind people could not send e-mail
using the system. The multitude of e-mail types, along with their configurable settings,
enables their use in nomadic daily contexts. But these e-mail systems are not usable by
everyone; blind people, for example, cannot send e-mail with them. Audio-based e-mail is
preferable for blind people, because they can easily respond to audio instructions. Such
systems are, however, very rare, so audio-based e-mail is seldom available to blind
people. We describe a voicemail system architecture that can be used by a blind person
to access e-mail easily and efficiently. The contribution made by this research has
enabled blind people to send and receive voice-based e-mail messages in their native
language with the help of a computer or a mobile device. Our proposed system GUI has
been evaluated against the GUI of a traditional mail server. We found that our proposed
architecture performs much better than the existing GUIs.
It involves the development of the following modules:
SPEECH_TO_TEXT Converter: The system acquires speech at run time through a microphone
and processes the sampled speech to recognize the uttered text. The recognized text can
be stored in a file. We are developing this on the Android platform using the Eclipse
workbench. Our speech-to-text system directly acquires and converts speech to text. It
can supplement other, larger systems, giving users a different choice for data entry. A
speech-to-text system can also improve system accessibility by providing data entry
options for blind, deaf, or physically handicapped users. A speech recognition system
can be divided into several blocks: feature extraction, an acoustic model database built
from the training data, a dictionary, a language model, and the speech recognition
algorithm. The analog speech signal must first be digitized, that is, sampled in time
and quantized in amplitude. Samples of the speech signal are analyzed at even intervals.
This interval is usually 20 ms, because the signal within it is considered stationary.
Speech feature extraction involves the formation of equally spaced discrete vectors of
speech characteristics. Feature vectors from the training database are used to estimate
the parameters of the acoustic models. The acoustic model describes the properties of
the basic elements that can be recognized. The basic element can be a phoneme for
continuous speech or a word for isolated word recognition.
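The 20 ms analysis interval described above can be illustrated with a minimal sketch. It assumes the speech is already available as a plain list of samples at a known sample rate; the function names and the use of non-overlapping frames are illustrative simplifications (practical recognizers typically use overlapping windows and richer features than energy).

```python
def split_into_frames(samples, sample_rate, frame_ms=20):
    """Split samples into consecutive, non-overlapping frames of frame_ms milliseconds.

    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    if frame_len <= 0:
        raise ValueError("sample_rate and frame_ms must give at least one sample per frame")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, frame_len)]


def frame_energy(frame):
    """Mean squared amplitude of one frame: a very simple per-frame feature."""
    return sum(s * s for s in frame) / len(frame)
```

At a sample rate of 16 kHz, each 20 ms frame holds 320 samples, so one second of speech yields 50 feature values.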
TEXT_TO_SPEECH Converter: This module converts text to voice output using speech
synthesis techniques. Although initially used by the blind to listen to written
material, text-to-speech is now used extensively to convey financial data, e-mail
messages, and other information to everyone via telephone. Text-to-speech is also used
on handheld devices such as portable GPS units to announce street names when giving
directions. Our Text-to-Speech Converter accepts a string of 50 characters of text
(letters and/or digits) as input. For this, we have interfaced the keyboard with the
controller and defined all the letter and digit keys on it. The speech processor has an
unlimited dictionary and can speak almost any text provided at the input most of the
time; hence, it has an accuracy of above 90%. It is microcontroller-based hardware coded
in Embedded C. Further research is to be done to optimize various methods of inputting
the text, for example reading the text with an optical sensor and converting it to
speech, so that almost all of the physical challenges people face while communicating
are overcome.
WORD RECOGNITION: Voice recognition software (also known as speech-to-text software)
allows an individual to use their voice instead of typing on a keyboard. Voice
recognition may be used to dictate text into the computer or to give commands to the
computer. It offers a quick method of writing on a computer, and it is also useful for
people with disabilities who find it difficult to use the keyboard. This software can
also assist those who have difficulty transferring ideas onto paper, as it takes the
focus away from the mechanics of writing. Word recognition is measured as a matter of
speed, such that a word with a high level of recognition is read faster than a novel
one. This manner of testing suggests that comprehension of the meaning of the words
being read is not required, but rather the ability to recognize them in a way that
allows proper pronunciation. Therefore, context is unimportant, and word recognition is
often assessed with words presented in isolation, in formats such as flash cards.
Nevertheless, ease in word recognition, as in fluency, enables proficiency that fosters
comprehension of the text being read.
2.4 “Voice based e-mail System for Blinds”. In International Journal of Research
Studies in Computer Science and Engineering (IJRSCSE)
The Internet plays a vital role in today's world of communication, and much of the world
now runs on it. Very little work can be done without the use of the Internet. Electronic
mail, or e-mail, is one of the most important parts of day-to-day life. But some people
in today's world do not know how to make use of the Internet, because they are blind or
illiterate. It is therefore very difficult for them to live in this world of the
Internet. Nowadays there are various technologies available, such as screen readers,
ASR, TTS, and STT, but these are not very efficient for them. Around 39 million people
are blind and 246 million people have low vision, and about 82% of people living with
blindness are aged 50 and above. We have to provide some Internet facilities for them so
that they can use the Internet. Therefore we came up with our project, a voice-based
e-mail system for the blind, which will greatly help visually impaired and illiterate
people to send their mail. The users of this system do not need to remember any
keyboard shortcuts or the location of the keys. Simple mouse click operations are used
for functions, making the system easy to use for users of any age group. Our system
announces through voice where the user currently is, so that the user doesn't have to
worry about remembering which mouse click operation
Visually challenged people find it very difficult to use this technology, because using
it requires visual perception. Not all people can use the Internet, because in order to
access it you need to know what is written on the screen. If the screen is not visible,
it is of no use. This makes the Internet a practically useless technology for visually
impaired and illiterate people.
In this system mainly three types of technologies are used, namely:
STT (speech-to-text): Here, whatever we speak is converted to text. There will be a small
microphone icon; on clicking it, the user speaks, and his or her speech is converted to
text format, which sighted people can also see and read.
TTS (text-to-speech): This method is the exact opposite of STT. It converts the text of
the e-mails to synthesized speech. A text-to-speech (TTS) system converts normal
language text into speech; other systems render symbolic linguistic representations.
Synthesized speech can be created by concatenating pieces of recorded speech that are
stored in a database.
Chapter 3
SYSTEM DEVELOPMENT
Increase professionalism :
You can use an IVR system to greet your customers in a very professional manner and to
make it appear that you have more departments and employees than you actually have.
Figure-4: System Block Diagram for Speech Recognition
synthesizers since the early 1990s.
A text-to-speech system consists of two parts: a front-end and a back-end. The front-end
has two major tasks. First, it converts raw text containing symbols such as numbers and
abbreviations into the equivalent written-out words. This process is commonly known as
text normalization, or pre-processing. The front-end then assigns phonetic
transcriptions to every word, and divides and marks the text into prosodic units, such
as phrases, clauses, and sentences.
The process of assigning phonetic transcriptions to words is called text-to-phoneme or
grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information
together make up the symbolic linguistic representation that is output by the front-end.
The back-end—often referred to as the synthesizer—then converts the symbolic
linguistic representation into sound. In certain systems, this part includes the
computation of the target prosody (pitch contour, phoneme durations), which is then
imposed on the output speech.
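The text normalization step of the front-end can be sketched as follows. This is a toy illustration only: the abbreviation table is hypothetical, numbers from 0 to 999 are spelled out and larger numbers are read digit by digit, and a real TTS front-end would handle far more cases (dates, currency, ordinals).

```python
import re

_ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight",
         "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
         "sixteen", "seventeen", "eighteen", "nineteen"]
_TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
         "eighty", "ninety"]

# Hypothetical abbreviation table, for illustration only.
ABBREVIATIONS = {"Dr.": "Doctor", "etc.": "et cetera"}


def number_to_words(n):
    """Spell out a non-negative integer; 1000 and above is read digit by digit."""
    if n < 20:
        return _ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return _TENS[tens] + ("" if ones == 0 else "-" + _ONES[ones])
    if n < 1000:
        hundreds, rest = divmod(n, 100)
        tail = "" if rest == 0 else " " + number_to_words(rest)
        return _ONES[hundreds] + " hundred" + tail
    return " ".join(_ONES[int(d)] for d in str(n))


def normalize(text):
    """Expand known abbreviations, then replace each digit run with words."""
    for abbr, full in ABBREVIATIONS.items():
        text = text.replace(abbr, full)
    return re.sub(r"\d+", lambda m: number_to_words(int(m.group())), text)
```

The normalized string is what would then be passed on for phonetic transcription.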
Chapter 4
DESIGN
B. Phase-2:
In phase 2 of our program, the user gives speech input to the system.
This speech input is handled by the speech_recognition module, a Python library that
handles voice requests and converts speech into text.
After receiving input from the user, the speech-to-text converter saves the response in
the respective variables used in the script and, based on their values, the program
enters the respective modules.
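The step of branching into a module based on the recognized text might look like the sketch below. The keyword lists and module names are assumptions made for illustration; the report does not specify the exact phrases the script listens for.

```python
# Hypothetical keyword table: module name -> phrases that select it.
COMMAND_KEYWORDS = {
    "login": ("login", "log in", "sign in"),
    "send": ("send", "compose"),
    "read": ("read", "inbox"),
}


def choose_module(recognized_text):
    """Return the name of the module matching the spoken command, or 'unknown'."""
    text = (recognized_text or "").lower()
    for module, keywords in COMMAND_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return module
    return "unknown"
```

An "unknown" result would typically make the program repeat the spoken prompt rather than guess.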
C. Phase-3:
In this phase our program will handle the requests by the user. Based on the speech input
given by the user it will launch the modules.
• Login to G-mail account:- This module handles the user's request to log in to their
Gmail account. It makes the connection with the user's Gmail account based on the
credentials provided through voice input. The module's script is designed to prompt the
user for their Gmail username and password; it then uses Selenium WebDriver to automate
the task for the user, and as a result the connection is made.
• Send E-mail through G-mail:- This module handles the user's request to send e-mail
through their Gmail account. The Python script for this module prompts the user for
their credentials and then connects to their account.
After the connection has been made, it prompts the user for the receiver's e-mail
address, lets the user speak their message, and repeats the message back to them; when
the user says "ok", it sends the mail.
The smtplib library in Python is used for the above task.
• Read E-mail through G-mail:- This module handles the user's request to read e-mail
through their Gmail account. The Python script for this module prompts the user for
their credentials and then connects to their account.
After the connection has been made, it fetches the user's unread mails and speaks them
aloud, using the pyttsx3 or gTTS library in Python for text-to-speech conversion.
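Fetching unread mail could be sketched with Python's standard imaplib and email modules, as below. This is an assumption about one possible approach, not the project's actual script: the host name imap.gmail.com, the INBOX folder, and the function names are illustrative, and Gmail normally requires IMAP access to be enabled and an app password to be used.

```python
import email
import imaplib
from email import policy


def extract_plain_text(raw_bytes):
    """Parse one raw message and return (sender, subject, plain-text body)."""
    msg = email.message_from_bytes(raw_bytes, policy=policy.default)
    body = msg.get_body(preferencelist=("plain",))
    text = body.get_content().strip() if body is not None else ""
    return str(msg["From"]), str(msg["Subject"]), text


def fetch_unread(user, password, host="imap.gmail.com"):
    """Return (sender, subject, body) for every unread message in the inbox."""
    results = []
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.select("INBOX")
        _, data = conn.search(None, "UNSEEN")
        for num in data[0].split():
            _, parts = conn.fetch(num, "(RFC822)")
            results.append(extract_plain_text(parts[0][1]))
    return results
```

Each returned tuple can then be handed to the text-to-speech step so that the sender, subject, and body are read out in turn.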
Figure-6: Illustration of Sending and Receiving E-mails
Chapter 5
IMPLEMENTATION
A handful of packages for speech recognition exist on PyPI. A few of them include:
• google-cloud-speech
• watson-developer-cloud
• pocketsphinx
• wit
• apiai
• SpeechRecognition
SpeechRecognition is a library that acts as a wrapper for many popular speech APIs and
is thus very flexible to use. One of these is the Google Web Speech API, for which a
default API key is hard-coded into the SpeechRecognition library.
The flexibility and ease of use of the SpeechRecognition package make it a very good
choice for developers working on any Python project. However, it does not guarantee
support for every feature of each API it wraps. You will have to spend some time
looking into the available options to find out whether SpeechRecognition is going to
work in your particular case.
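A minimal sketch of using the wrapper with the Google Web Speech API on an existing audio file is shown below. The function names and the language default are illustrative assumptions; the library is imported inside the function so that the helper can be defined even where SpeechRecognition is not installed.

```python
def normalise_transcript(text):
    """Turn a recognizer result (possibly None) into a clean lowercase string."""
    return "" if text is None else " ".join(text.lower().split())


def transcribe_file(path, language="en-US"):
    """Transcribe a WAV/AIFF/FLAC file; return None if the speech is unintelligible."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.AudioFile(path) as source:
        audio = recognizer.record(source)  # read the whole file into memory
    try:
        return recognizer.recognize_google(audio, language=language)
    except sr.UnknownValueError:
        return None  # audio was captured but could not be understood
    except sr.RequestError as exc:
        raise RuntimeError("speech service could not be reached") from exc
```

Separating "not understood" from "service unreachable" lets a voice interface ask the user to repeat in the first case and report a network problem in the second.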
5.1.1 REQUIRED INSTALLATIONS
SpeechRecognition is compatible with Python 2.6, 2.7 and 3.3+, but it requires some
additional installation steps for Python 2. For our project we have used Python 3.
1. $ pip install SpeechRecognition
2. $ pip install pyaudio
SpeechRecognition works out of the box if you only need to work with existing audio
files. The PyAudio package is needed when you want to capture microphone input.
The main class used in this package is the Recognizer class. The purpose of a Recognizer
instance is to recognize speech. Every instance of this class comes with various
settings and functionality for recognizing speech from an audio source.
The Microphone class used in this Python program lets the user use the default
microphone of their system instead of audio files as a source.
If the user's system has no default microphone, or if they want to use a different
microphone, they need to specify which one to use by giving a device index. The list of
available devices can be obtained by calling list_microphone_names(), which is a static
method of the Microphone class.
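Choosing a device index from the list of microphone names might be done as in the sketch below. The helper names and the keyword-matching rule are assumptions for illustration; only list_microphone_names() and the device_index argument come from the SpeechRecognition library.

```python
def pick_device_index(names, keyword):
    """Return the index of the first device whose name contains keyword, else None."""
    keyword = keyword.lower()
    for index, name in enumerate(names):
        if name and keyword in name.lower():
            return index
    return None


def open_microphone(keyword=None):
    """Create a Microphone for the matching device, or the system default."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    if keyword is None:
        return sr.Microphone()
    names = sr.Microphone.list_microphone_names()
    # device_index=None makes the library fall back to the default microphone.
    return sr.Microphone(device_index=pick_device_index(names, keyword))
```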
Every instance of the Recognizer class has seven methods for recognizing speech from an
audio source using various APIs:-
• recognize_bing(): Microsoft Bing Speech
• recognize_google(): Google Web Speech API
• recognize_google_cloud(): Google Cloud Speech (requires installation of the
google-cloud-speech package)
• recognize_houndify(): Houndify by SoundHound
• recognize_ibm(): IBM Speech to Text
• recognize_sphinx(): CMU Sphinx (requires installing PocketSphinx)
• recognize_wit(): Wit.ai
listen():- Microphone input is captured with the listen() method of the Recognizer
class. Like AudioFile, Microphone is a context manager. The first argument to listen()
is an audio source, and the method keeps recording input until silence is detected.
The audio input is generally mixed with ambient noise, which can be handled by using the
built-in adjust_for_ambient_noise() method of the Recognizer class.
You need to wait a second or two for adjust_for_ambient_noise() to do its work, then
speak into the microphone and wait a moment for the recognizer to return the recognized
speech. By default the method analyzes the audio source for one second; the duration
keyword argument lets you change this.
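Capturing one spoken phrase with noise calibration could look like the sketch below. The timeout values and function names are illustrative assumptions, and is_confirmation() is a hypothetical helper reflecting the "say ok to send" step described in the design chapter.

```python
def is_confirmation(text):
    """True if the recognized text is a spoken confirmation such as 'ok'."""
    return text is not None and text.strip().lower() in {"ok", "okay", "yes"}


def listen_once(timeout=5, phrase_time_limit=10, calibration_seconds=1):
    """Record one phrase from the default microphone and return its text, or None."""
    import speech_recognition as sr  # third-party: pip install SpeechRecognition

    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        # Sample background noise first so the energy threshold fits the room.
        recognizer.adjust_for_ambient_noise(source, duration=calibration_seconds)
        try:
            audio = recognizer.listen(source, timeout=timeout,
                                      phrase_time_limit=phrase_time_limit)
        except sr.WaitTimeoutError:
            return None  # nobody started speaking within `timeout` seconds
    try:
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        return None  # speech was captured but not understood
```

A caller could loop on listen_once() until is_confirmation() returns True before a message is actually sent.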
5.2.1 Pyttsx
Pyttsx is a platform-independent text-to-speech library; it works with the speech
engines available on Windows, Linux, and macOS. It offers a rich set of functionality
and features.
The user can set voice metadata, that is, information about the voice such as gender
(male or female), pitch, age, name, and language. It supports a large set of voices.
The package to install on Windows depends on which version of Python you are using.
For example, if you are using Python 3, you need to install pyttsx3:
$ pip install pyttsx3
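A minimal sketch of speaking a string with pyttsx3 follows. The speak() wrapper, its default rate, and the optional engine parameter (which allows a stand-in engine to be passed for testing) are assumptions for illustration; init(), setProperty(), say(), and runAndWait() are the pyttsx3 calls used.

```python
def speak(text, rate=150, engine=None):
    """Speak text aloud; rate is in words per minute."""
    if engine is None:
        import pyttsx3  # third-party: pip install pyttsx3
        engine = pyttsx3.init()
    engine.setProperty("rate", rate)
    engine.say(text)       # queue the utterance
    engine.runAndWait()    # block until the queued speech has been spoken
```

Because pyttsx3 drives the speech engine installed on the machine, this works offline, which suits prompts that must be spoken before any network connection is made.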
5.2.2 gTTS
Another module that can be used in Python for text-to-speech conversion is gTTS, the
Google Text-to-Speech library for Python. gTTS is platform independent, that is, it
works on Windows, Linux, and macOS, but it needs an Internet connection because the
speech is generated by Google's online service. It offers a useful set of functionality
and features.
To install this library on Windows:
$ pip install gTTS
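A minimal sketch of producing an MP3 file with gTTS is given below. The save_speech() wrapper and its tts_factory parameter (which lets a stand-in class replace gTTS for testing) are assumptions for illustration; playing the saved file back is left to whatever audio player the system provides.

```python
def save_speech(text, path, lang="en", tts_factory=None):
    """Convert text to speech and save it as an MP3 file at path."""
    if tts_factory is None:
        from gtts import gTTS  # third-party: pip install gTTS (needs Internet access)
        tts_factory = gTTS
    tts_factory(text=text, lang=lang).save(path)
    return path
```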
5.3 SIMPLE MAIL TRANSFER PROTOCOL(SMTP)
Email is rising because the one among the foremost valuable service in net nowadays.
Most of the web systems use SMTP as a technique to transmit mail from one client to
different. SMTP may be a thrust set of rules and is employed to send the mail whereas
POP (post workplace protocol) or IMAP (internet message access protocol) square
measure accustomed retrieve those mails at the receiver’s aspect.
SMTP is Associated with the application layer protocol of OSI model of network.
A user who wants to send mail opens a TCP (Transmission Control Protocol) connection to
the SMTP server and then sends the mail across that connection. The SMTP server is
always in listening mode. As soon as it detects a TCP connection from a client, the SMTP
process initiates a session, usually on port number 25. Once the TCP connection has been
established successfully, the client can send the mail.
The two processes, the sender process and the receiver process, carry out a simple
request-response dialogue defined by the SMTP protocol, in which the client process
transmits the mail addresses of the originator and the recipient of a message. Once the
server process accepts these mail addresses, the client process transmits the e-mail
message. The message must include a message header and message text (“body”) formatted
in accordance with RFC 822.
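The request-response dialogue described above can be sketched with the low-level methods of Python's smtplib.SMTP class (ehlo, mail, rcpt, and data). This is only an illustration of the order of the exchange. The function name smtp_dialogue is hypothetical, and the server argument is assumed to be an already connected smtplib.SMTP object.

```python
def smtp_dialogue(server, sender, recipient, message):
    """Run the basic SMTP exchange and return the (code, reply) pairs."""
    replies = [server.ehlo()]                # client identifies itself
    replies.append(server.mail(sender))      # MAIL FROM: originator address
    code, reply = server.rcpt(recipient)     # RCPT TO: recipient address
    replies.append((code, reply))
    if code not in (250, 251):
        # Recipient refused: stop before sending the message text.
        return replies
    replies.append(server.data(message))     # DATA: header, blank line, body
    return replies
```

Each step returns a numeric reply code from the server. The message text is sent only after the server has accepted both addresses.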
The following example illustrates a message in the RFC 822 message format:
From: yashuchauhan@example.com
To: sauravmishra@example.com
Subject: An RFC 822 formatted message

This is a simple text body of the message.
The blank line separates the header and body of the message.
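A message of this shape can be produced with Python's standard email package. The sketch below is illustrative only: build_message is a hypothetical helper name, and EmailMessage adds a few MIME headers (such as Content-Type) that do not appear in the example above.

```python
from email.message import EmailMessage


def build_message(sender, recipient, subject, body):
    """Return an e-mail message with headers and a plain-text body."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


message = build_message(
    "yashuchauhan@example.com",
    "sauravmishra@example.com",
    "An RFC 822 formatted message",
    "This is a simple text body of the message.",
)
raw = message.as_string()
# The first blank line marks the end of the headers.
headers, _, body = raw.partition("\n\n")
print(headers)
print("---")
print(body)
```

Splitting on the first blank line recovers the header block and the body separately, which is the same rule a mail server applies.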
The SMTP model is of two types:
1. End-to-end method
2. Store-and-forward method
The SMTP model supports both end-to-end (no intermediate message transfer agents) and
store-and-forward mail delivery methods. The end-to-end method is used between
organizations, and the store-and-forward method is chosen for sending mail within
organizations that have TCP/IP and SMTP-based networks.
End-to-End
In this method, an SMTP client contacts the destination host's SMTP server directly to
deliver the mail. It keeps the mail item until it has been successfully copied to the
recipient's SMTP server.
Store-and-Forward
In this method, a mail can pass through a number of intermediary hosts before reaching
its final destination.
A successful transmission from a host signifies only that the mail has been handed to
the next host, which then forwards it onward.
Sending mail can be automated in Python with the smtplib module. smtplib contains the
class SMTP, which defines an SMTP client session object that can be used to send mail to
any Internet-connected machine that runs an SMTP server.
The SMTP class is normally used to connect to a mail server and transmit messages.
The mail server host name and port can be passed to the constructor, or you can call
connect() explicitly.
Once connected, call sendmail() with the envelope arguments and the body of the
message.
The message text should be a complete RFC 822-compliant message, since smtplib does
not alter the contents or the headers.
We have to add the headers, including the sender and receiver addresses, ourselves.
S1 = smtplib.SMTP(h, p, l)
where h = host name, p = port number, l = local host name.
Host - The host that runs the SMTP server. We can specify the IP address of the host or
a domain name such as smtp.gmail.com. It is not a compulsory argument.
Port - If the host name is provided, we also have to give the port number on which the
SMTP server listens for requests; normally this port number is 25.
Local hostname - The name the client uses to identify itself to the server in the
HELO/EHLO command. If it is omitted, smtplib uses the fully qualified domain name of the
local machine.
An SMTP object has a method called sendmail(), which is used to send the mail.
It takes the following parameters:
The sender - E-mail ID of the sender.
The receivers - E-mail IDs of the receivers.
The message - A message formatted according to RFC 822.
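The constructor and sendmail() parameters described above can be put together as in the following sketch. It is a minimal illustration, not the project's own code: the function name send_mail and the smtp_factory parameter are hypothetical, and the use of STARTTLS before login is an assumption that suits servers such as smtp.gmail.com on port 587.

```python
import smtplib
from email.message import EmailMessage


def send_mail(host, port, user, password, recipient, subject, body,
              smtp_factory=smtplib.SMTP):
    """Send one plain-text mail over STARTTLS and return the message sent."""
    msg = EmailMessage()
    msg["From"] = user
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)

    with smtp_factory(host, port) as server:
        server.starttls()                 # upgrade to an encrypted channel
        server.login(user, password)
        # sendmail(sender, receivers, message), as listed above
        server.sendmail(user, [recipient], msg.as_string())
    return msg
```

The smtp_factory argument exists only so that the function can be tried without a real server. In normal use it is left at its default, smtplib.SMTP.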
Reading mail is a very important part of our system, and Python again offers great
flexibility here. We can automate the process of reading e-mail from a Gmail account,
which is very useful for people who cannot see: the system fetches the unread e-mail
from their Gmail account and reads it aloud with the help of the text-to-speech
converter.
To achieve this task we need the following:
1. A mail server, a username, and its password.
2. A login to the Gmail account (logging in from a Python script has already been
discussed).
3. Servers such as imap.gmail.com and smtp.gmail.com.
4. The most important server is imap.gmail.com.
imaplib - IMAP4 PROTOCOL
This is the main module used to read e-mail from a Gmail account with a Python script.
The module consists of three classes:
1. IMAP4
2. IMAP4_SSL
3. IMAP4_stream
These classes are contained in the imaplib module, and IMAP4 is the base class.
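The following sketch shows one way IMAP4_SSL could be used to fetch the newest unread message and extract the parts that are later read aloud. The function names summarize and fetch_latest_unseen are hypothetical, and the host imap.gmail.com with port 993 is taken as an assumption.

```python
import email
import imaplib


def summarize(raw_bytes):
    """Parse a raw RFC 822 message and return (sender, subject, body text)."""
    msg = email.message_from_bytes(raw_bytes)
    if msg.is_multipart():
        # Use the first text/plain part as the body.
        parts = [p for p in msg.walk() if p.get_content_type() == "text/plain"]
        payload = parts[0].get_payload(decode=True) if parts else b""
    else:
        payload = msg.get_payload(decode=True) or b""
    text = payload.decode("utf-8", errors="replace").strip()
    return msg["From"], msg["Subject"], text


def fetch_latest_unseen(user, password, host="imap.gmail.com"):
    """Return a summary of the newest unread message, or None if there is none."""
    conn = imaplib.IMAP4_SSL(host, 993)
    try:
        conn.login(user, password)
        conn.select("INBOX")
        status, data = conn.search(None, "UNSEEN")
        numbers = data[0].split()          # message numbers of unread mails
        if not numbers:
            return None
        status, fetched = conn.fetch(numbers[-1], "(RFC822)")
        return summarize(fetched[0][1])
    finally:
        conn.logout()
```

Keeping the parsing in summarize separate from the network code makes it possible to check the parsing on a stored message without logging in.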
Figure-12: IMAPLIB Demonstration
5.4 Software Requirements:
Tools Used:
• Python IDLE.
• Interpreters for scripts.
• Selenium WebDriver in Python.
• Google Speech-to-text and text-to-speech Converters.
• pyttsx text-to-speech API in Python.
5.5 Hardware Requirements:
• Windows Desktop
Chapter 6
TESTING AND RESULTS
6.1 CODE
import email
import imaplib
import os
import smtplib
import time

import pyglet
import speech_recognition as sr  # sr.Microphone needs PyAudio to be installed
from bs4 import BeautifulSoup
from gtts import gTTS


def speak(text, filename):
    """Convert text to speech with gTTS, play it with pyglet, then delete the file."""
    ttsname = "path/" + filename  # replace "path" with a writable directory
    gTTS(text=text, lang='en').save(ttsname)
    music = pyglet.media.load(ttsname, streaming=False)
    music.play()
    time.sleep(music.duration)
    os.remove(ttsname)


def listen(prompt):
    """Record one utterance from the microphone; return the recognized text or None."""
    r = sr.Recognizer()  # recognize
    with sr.Microphone() as source:
        print(prompt)
        audio = r.listen(source)
        print("ok done!!")
    try:
        text = r.recognize_google(audio)
        print("You said : " + text)
        return text
    except sr.UnknownValueError:
        print("Google Speech Recognition could not understand audio.")
    except sr.RequestError as e:
        print("Could not request results from Google Speech Recognition service; {0}".format(e))
    return None


print("-" * 60)
print(" Project: Voice based Email for blind")
print(" <--Created by Yashu Chauhan-->")
print("-" * 60)

# project name
speak("Project: Voice based Email for blind", "name.mp3")

# login from os
print("You are logging from : " + os.getlogin())

# choices
print("1. compose a mail.")
speak("option 1. compose a mail.", "hello.mp3")
print("2. Check your inbox")
speak("option 2. Check your inbox", "second.mp3")

# this is for input choices
speak("Your choice ", "hello.mp3")
text = listen("Your choice:")
choice = int(text) if text and text.isdigit() else None

# choices details
if choice == 1:
    msg = listen("Your message :")
    if msg:
        # Assumed send step: Gmail's SMTP server with STARTTLS on port 587;
        # emailID, pswrd and victimID are placeholders.
        mail = smtplib.SMTP('smtp.gmail.com', 587)
        mail.ehlo()
        mail.starttls()
        mail.login('emailID', 'pswrd')
        mail.sendmail('emailID', 'victimID', msg)
        mail.quit()
        print("Your mail has been sent.")

if choice == 2:
    mail = imaplib.IMAP4_SSL('imap.gmail.com', 993)  # host and port, SSL security
    unm = 'your mail/ victim mail'  # username
    psw = 'pswrd'  # password
    mail.login(unm, psw)  # login

    stat, total = mail.select('Inbox')  # total[0] is the number of mails in inbox
    total_count = total[0].decode()
    print("Number of mails in your inbox :" + total_count)
    speak("Total mails are :" + total_count, "total.mp3")  # voice out

    # unseen mails
    stat, unseen = mail.search(None, 'UNSEEN')  # unseen[0] lists the message numbers
    unseen_count = str(len(unseen[0].split()))
    print("Number of UnSeen mails :" + unseen_count)
    speak("Your Unseen mail :" + unseen_count, "unseen.mp3")

    # search mails
    result, data = mail.uid('search', None, "ALL")
    inbox_item_list = data[0].split()
    new = inbox_item_list[-1]  # UID of the newest mail
    result2, email_data = mail.uid('fetch', new, '(RFC822)')  # fetch
    email_message = email.message_from_bytes(email_data[0][1])  # parse
    sender = str(email_message['From'])
    subject = str(email_message['Subject'])
    print("From: " + sender)
    print("Subject: " + subject)
    speak("From: " + sender + " And Your subject: " + subject, "mail.mp3")

    # Body part of mails
    stat, data1 = mail.fetch(total[0], "(UID BODY[TEXT])")  # newest mail by number
    msg = data1[0][1]
    soup = BeautifulSoup(msg, "html.parser")
    txt = soup.get_text()
    print("Body :" + txt)
    speak("Body: " + txt, "body.mp3")

    mail.close()
    mail.logout()
6.2 MODIFICATIONS
If you want to save the mp3 files in another directory, follow the instructions below;
otherwise do not modify anything.
Replace the word path with your desktop directory wherever it is used in the code. If
you do not know your desktop directory, open the command prompt and run the command
below; it prints a directory such as C:\Users\yashu\Desktop (this is my desktop
directory).
echo %userprofile%\Desktop
Also enter your e-mail ID, the receiver's ID, and your password wherever the
placeholders for them (such as pswrd) are written.
If an "invalid or unsupported audio file" error occurs (recall that only FLAC, AIFF, and
RIFF WAV files are supported), read the file with librosa, convert it to a temporary
.wav file, and then read it back with the wave package.
6.3 OUTPUT
Once the user hears the ‘Your choice’ prompt, he or she can state the desired action
through a voice command, and the system then executes the given command.
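A speech recognizer may return the choice as a digit ("1") or as a word ("one"), so the mapping from recognized text to a menu option benefits from a small normalization step. The sketch below is one possible approach and is not taken from the project code. The names NUMBER_WORDS and parse_choice are hypothetical.

```python
NUMBER_WORDS = {"one": 1, "two": 2}  # spoken forms of the two menu options


def parse_choice(recognized, valid=(1, 2)):
    """Map recognized speech such as "1", "one" or "option two" to a menu number."""
    if not recognized:
        return None
    for token in recognized.lower().split():
        token = token.strip(".,!?")
        number = int(token) if token.isdecimal() else NUMBER_WORDS.get(token)
        if number in valid:
            return number
    return None  # nothing usable was said; the caller can prompt again
```

Returning None instead of raising an error lets the system repeat the prompt when the command is not understood.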
Chapter 7
CONCLUSION
This e-mail system can be used with ease by users of any age group. It offers
speech-to-text as well as text-to-speech conversion with a speech reader, which allows
the proposed system to be operated by visually impaired persons as well.
Visually impaired people can now send and receive mail with ease through voice commands
alone, without using a keyboard or mouse. It helps remove the difficulties that blind
people face and lets them use e-mail much as sighted users do.
It eliminates the need for keyboard shortcuts alongside screen readers, which reduces
the cognitive load of remembering those shortcuts. Also, any non-sophisticated user who
does not know the position of keys on the keyboard need not worry, as keyboard usage is
eliminated. The user only needs to follow the instructions given by the IVR to get the
respective services offered.
7.2 ADVANTAGES
•The disabilities of visually impaired people are overcome.
•This system makes disabled people feel like a normal user.
•Completely voice based; it eliminates the use of keyboard and mouse.
•Efficient and robust
•This design also reduces the cognitive load on blind users of remembering and typing
characters using the keyboard.
•User friendly
REFERENCES
[1] Jagtap Nilesh, Pawan Alai, Chavhan Swapnil and Bendre M.R., “Voice Based System in
Desktop and Mobile Devices for Blind People”, International Journal of Emerging
Technology and Advanced Engineering (IJETAE), vol. 4, issue 2, pp. 404-407, 2014.
[2] Ummuhanysifa U. and Nizar Banu P K, “Voice Based Search Engine and Web page
Reader”, International Journal of Computational Engineering Research (IJCER), pp. 1-5.
[3] The Radicati website. [Online]. Available:
http://www.radicati.com/wp/wp-content/uploads/2014/01/EmailStatistics-Report-2014-2018-Executive-Summary.pdf
[4] Geeks for geeks. [Online]. Available:
https://www.geeksforgeeks.org/project-idea-voice-based-email-visually-challenged/
[5] K. Jayachandran and P. Anbumani, “Voice Based Email for Blind People”,
International Journal of Advance Research, Ideas and Innovations in Technology
(IJARIIT), pp. 1065-1071, 2017.
[6] Pranjal Ingle, Harshada Kanade and Arti Lanke, “Voice based e-mail system for
blinds”, International Journal of Research Studies in Computer Science and Engineering
(IJRSCSE), vol. 3, issue 1, pp. 25-30, 2016.
[7] G. Broll, S. Keck, P. Holleis and A. Butz, “Improving the Accessibility of
NFC/RFID-based Mobile Interaction through Learnability and Guidance”, International
Conference on Human-Computer Interaction with Mobile Devices and Services, vol. 11,
2009.