Chapter 1 - Introduction: Dept. of Electronics and Communication Engineering 1
CHAPTER 1 - INTRODUCTION
As we stand on the brink of the third wave of computing, the concept of a virtual
assistant (a digital service looking after a range of our needs) is fast becoming a reality. As artificial
intelligence and machine learning progress at pace, virtual assistants are set to become our gateway
to the internet and know more about us than we do ourselves. A virtual assistant is a software
agent that can perform tasks or services for an individual. As of 2017, the capabilities and usage
of virtual assistants are expanding rapidly, with new products entering the market and a strong
emphasis on voice user interfaces. The most widely used are Apple's Siri, Google
Assistant, Amazon Alexa and Microsoft Cortana.
Virtual assistants can provide a wide variety of services, and the range offered by
Amazon Alexa and Google Assistant in particular grows by the day. These include:
• Providing information such as weather and facts from sources such as Wikipedia or IMDb,
setting alarms, and making to-do lists and shopping lists.
• Playing music from streaming services such as Spotify and Pandora, playing radio stations
and reading audiobooks.
• Playing videos, TV shows or movies on televisions, streaming from services such as Netflix.
• Conversational commerce.
• Complementing or replacing customer service by humans. One report estimated that an
automated online assistant produced a 30% decrease in the workload of a human-staffed
call center.
Home automation is also an application of virtual assistants. Home
automation or domotics is building automation for a home, called a smart home or smart house. A
home automation system will control lighting, climate, entertainment systems, and appliances. It
may also include home security such as access control and alarm systems. When connected with
the Internet, home devices are an important constituent of the Internet of Things. A home
automation system typically connects controlled devices to a central hub or "gateway". The user
interface for control of the system uses either wall-mounted terminals, tablet or desktop computers,
a mobile phone application, or a Web interface, that may also be accessible off-site through the
Internet.
The virtual assistant in this project is named ‘Green’. The system is named after
the Raspberry Pi board which is green in colour.
When the program is run, the system introduces itself through the output transducer (the
speaker):
'Hi. I'm your virtual assistant. I'm called Green. Just call my name whenever you need me.'
It then waits for input from the user. If the keyword ‘green’ is not present in the user input, the
system keeps listening in a loop until the keyword is heard. Once the user speaks the word ‘green’,
the system is enabled to perform a task. After each task is completed, the system waits for the
keyword ‘green’ again and the whole process repeats. The system may be addressed as:
“hey green” or
“hello green”
Here, the only keyword of importance is ‘green’; the other associated words are ignored.
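The keyword check described above can be sketched in a few lines of Python. This is a minimal illustration and not the project's own code: the function name heard_wake_word is hypothetical, and it matches ‘green’ as a whole word, which is slightly stricter than a plain substring test.

```python
import re

WAKE_WORD = "green"  # the keyword used in this project


def heard_wake_word(transcript, wake_word=WAKE_WORD):
    """Return True if wake_word occurs as a whole word in the transcript."""
    # Lower-case the text and split it into words, dropping punctuation.
    words = re.findall(r"[a-z']+", transcript.lower())
    return wake_word in words


print(heard_wake_word("hey green"))        # True
print(heard_wake_word("Hello Green!"))     # True
print(heard_wake_word("the greenhouse"))   # False
```

Matching whole words keeps the assistant from waking on words that merely contain the keyword.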
2.1 - Raspbian
Raspbian is a Debian-based computer operating system for Raspberry Pi [1]. There are
several versions of Raspbian including Raspbian Stretch and Raspbian Jessie. Raspbian is highly
optimized for the Raspberry Pi line's low-performance ARM CPUs.
Python, created by Guido van Rossum, has a design philosophy that emphasizes code readability,
notably using significant whitespace. It provides constructs that enable clear programming on
both small and large scales. In July 2018, Van Rossum stepped down as the leader of the language
community after 30 years.
Python features a dynamic type system and automatic memory management. It supports multiple
programming paradigms, including object-oriented, imperative, functional and procedural, and has
a large and comprehensive standard library. Some of its salient features are listed below [3].
• Interpreted: There are no separate compilation and execution steps as in C and
C++. Internally, Python converts the source code into an intermediate form called
bytecode, which is then translated into the native language of the specific computer to run it.
There is no need for linking and loading with libraries.
• High-level language: In Python there is no need to take care of low-level details
such as managing the memory used by the program.
• Simple: It is close to the English language and easy to learn. It places more emphasis on the
solution to the problem than on the syntax.
• Embeddable: Python can be used within a C/C++ program to give scripting capabilities
to the program’s users.
• Robust: It has exception handling features and built-in memory management.
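The "Interpreted" point can be observed directly from Python itself. The sketch below is only an illustration, assuming the standard CPython interpreter: it compiles a one-line program to a code object, shows that the bytecode is a bytes object, runs it, and prints a readable listing with the standard dis module. In CPython the bytecode is executed by the interpreter's virtual machine.

```python
import dis

source = "total = 2 + 3"
code = compile(source, "<example>", "exec")   # source text -> code object
print(type(code.co_code))                     # the raw bytecode is a bytes object

namespace = {}
exec(code, namespace)                         # the interpreter runs the bytecode
print(namespace["total"])                     # 5

dis.dis(code)                                 # human-readable listing of the bytecode
```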
• gTTS (Google Text-to-Speech): A Python library and CLI tool that interfaces with
Google Translate’s text-to-speech API. It writes spoken mp3 data to a file or to a file-like
object (byte string). gTTS supports several languages including English, Hindi, Tamil,
French, German and many more. The speech can be delivered at either of the two
available audio speeds, fast or slow. However, as of the latest update, it is not possible to
change the voice of the generated audio [4].
• Wikipedia: Wikipedia is a Python library that makes it easy to access and parse data
from Wikipedia. It can search Wikipedia, get article summaries, get data such as links and
images from a page, and more. The library was designed for simple use and wraps the
MediaWiki API [4].
• requests: Requests allows HTTP/1.1 requests to be sent using Python. With it, content
such as headers, form data, multipart files and parameters can be added via simple Python
libraries, and the response data can be accessed in the same way [4].
• PyAudio: With PyAudio, Python can be used to play and record audio on a
variety of platforms, such as GNU/Linux, Microsoft Windows and Apple Mac OS [4].
• FLAC (Free Lossless Audio Codec): FLAC is an audio coding format for lossless
compression of digital audio, and is also the name of the free software project producing
the FLAC tools, the reference software package that includes a codec implementation.
Digital audio compressed by FLAC's algorithm can typically be reduced to between 50 and
70 percent of its original size and decompresses to an identical copy of the original audio
data [4].
• mpg321: A lightweight command-line audio player for the MP3 format [4].
• LAME: A software encoder that converts audio to the MP3 file format [4].
The first step in getting started with the Raspberry Pi is to format the microSD card that will
hold the operating system [5]. Steps to format the SD card:
The next step is to get Raspbian Stretch onto the microSD card. Once it is loaded, the
card can be plugged into the Raspberry Pi and the operating system configured. The microSD
card should already be connected to your computer at this time. Steps to install the OS:
Insert the microSD card into the card slot on the underside of the Raspberry Pi.
Plug the USB keyboard into one of the USB ports.
Plug the USB mouse into one of the USB ports.
Turn on your monitor or TV set and make sure it is set to the proper input (e.g. HDMI 1 or
Component).
Plug the HDMI or video component cable into the monitor or TV set.
Connect the other end of the cable into the Raspberry Pi.
Plug the power supply into the power outlet. This will turn on and boot up Raspberry Pi.
A power indicator light will begin to glow, letting you know that you are connected.
When Raspbian begins to load, lines of boot messages will scroll past until the boot process
has completed. Then the Raspbian home screen will appear. Configure the Raspberry Pi system
to set your location, date and time.
Create a file named “wpa_supplicant.conf” that contains the Wi-Fi network name and the Wi-Fi
password. Its contents should be as follows:
network={
    ssid="name of the network"
    psk="password of the network"
}
Eject the microSD card and put it back into the Raspberry Pi.
Power on the Pi board.
Download and install the PuTTY software.
Open PuTTY, enter the IP address or host name of the Pi and click “Open”.
After logging in, install tightvncserver by running the command sudo apt-get install tightvncserver,
and install xrdp by running the command sudo apt-get install xrdp.
Open Remote Desktop Connection, enter the IP address of the Raspberry Pi 3B and click Connect.
Now enter “pi” as the username and “raspberry” as the password. Click OK. If everything
works, the Raspberry Pi desktop will be loaded on the laptop.
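The Wi-Fi configuration file described in the steps above can also be generated rather than typed by hand. The following is a minimal sketch, not part of the project: the function name wpa_network_block and the sample network name and password are hypothetical, and the sketch does not escape quotation marks inside the name or password.

```python
def wpa_network_block(ssid, psk):
    """Return a wpa_supplicant network block for one Wi-Fi network."""
    # wpa_supplicant expects no spaces around '=' and plain double quotes.
    return (
        "network={\n"
        f'    ssid="{ssid}"\n'
        f'    psk="{psk}"\n'
        "}\n"
    )


# Hypothetical network name and password, for illustration only.
text = wpa_network_block("HomeWiFi", "correct-horse")
print(text)
```

The returned text can be written to the configuration file on the microSD card before it is put back into the Pi.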
Phase 1
Phase 2
Phase 3
speech recognition, response, core, search, weather, train details and hardware controller.
[Figure: block diagram showing the input transducer and the output transducer]
an input transducer
a central control unit
an output transducer
resistors
LEDs
a diode
a transistor
a DC motor
Figure 5.2 – Circuit for interfacing DC motor
The input transducer (a microphone) is connected to a sound card which is plugged into one of
the USB ports. The voice commands captured through it are used to control the GPIO pins.
The GPIOs are used to control the LEDs and the DC motor. The pin numbers given here are
physical (BOARD) pin numbers, as selected in the code with GPIO.setmode(GPIO.BOARD). The
LEDs are named KITCHEN_LIGHT at pin 8, BEDROOM_LIGHT at pin 12, BATHROOM_LIGHT at
pin 16 and HALL_LIGHT (the living room light) at pin 22. Each LED is connected to ground
through a 1 kΩ resistor.
The DC motor is named FAN and is driven from pin 26 through a 1 kΩ resistor
and an MPS2222A NPN transistor. The GPIO output is applied to the base of the transistor. The DC
motor, with a 1N4148 diode connected across it, is connected to the collector terminal of the
transistor. The other end of the DC motor goes to the positive terminal of a 9 V battery. When a
GPIO pin is made HIGH, the corresponding LED glows or the motor rotates.
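As a rough check on the 1 kΩ resistor values, the currents drawn from a GPIO pin can be estimated with Ohm's law. The sketch below is only an estimate: the LED forward voltage and the base-emitter voltage are assumed typical figures, not values measured in this project.

```python
GPIO_HIGH_V = 3.3      # GPIO output level on the Raspberry Pi (3V3)
RESISTOR_OHM = 1000    # 1 kΩ, as used in the circuit

# Assumed typical values, not taken from the report or measured:
LED_FORWARD_V = 2.0    # forward drop of a red LED
V_BE_ON = 0.7          # base-emitter drop of a conducting silicon transistor


def current_ma(supply_v, drop_v, resistance_ohm):
    """Ohm's law: current in mA through a resistor in series with a fixed drop."""
    return (supply_v - drop_v) / resistance_ohm * 1000


led_ma = current_ma(GPIO_HIGH_V, LED_FORWARD_V, RESISTOR_OHM)
base_ma = current_ma(GPIO_HIGH_V, V_BE_ON, RESISTOR_OHM)
print(f"LED current  ~ {led_ma:.1f} mA")
print(f"Base current ~ {base_ma:.1f} mA")
```

Under these assumptions each pin supplies only a few milliamperes, and the motor current itself flows from the 9 V battery through the transistor rather than from the GPIO pin.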
The output transducer (a speaker) is connected to the 3.5 mm jack. It is used to convey a
message to the user regarding the status of the LEDs or the DC motor.
6.2 - Code
6.2.1 – initial.py
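import RPi.GPIO as GPIO
import speech_recognition as sr

import main
import text_to_speech as ts

GPIO.setwarnings(False)
GPIO.setmode(GPIO.BOARD)
BLINK = 7  # indicator LED, lit while the system is listening
GPIO.setup(BLINK, GPIO.OUT)


def loop():
    """Wait for the keyword 'green', then hand over to main.askme()."""
    r = sr.Recognizer()
    while True:
        # The line that opens the microphone is missing from the original
        # listing; sr.Microphone() is assumed here.
        with sr.Microphone() as source:
            print("Speak")
            GPIO.output(BLINK, True)
            audio = r.listen(source)
            GPIO.output(BLINK, False)
        try:
            text = r.recognize_google(audio)  # recognize once and reuse the text
            print(text)
            if 'green' in text:
                main.askme()
        except Exception:
            # Nothing intelligible was heard; keep listening.
            continue


ts.voice("Hi. I'm your virtual assistant. I'm called Green. Just call my name whenever you need me.")
loop()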
6.2.2 – iopin.py
import RPi.GPIO as GPIO

import text_to_speech as ts  # the spoken responses are missing from this listing

GPIO.setmode(GPIO.BOARD)
GPIO.setwarnings(False)

# Physical (BOARD) pin numbers of the LEDs that stand in for the room lights.
KITCHEN_LIGHT = 8
BEDROOM_LIGHT = 12
BATHROOM_LIGHT = 16
HALL_LIGHT = 22

# Keyword expected in the spoken command, paired with the pin it controls.
LIGHTS = [
    ('bedroom', BEDROOM_LIGHT),
    ('kitchen', KITCHEN_LIGHT),
    ('bath', BATHROOM_LIGHT),
    ('living', HALL_LIGHT),
]
for _, pin in LIGHTS:
    GPIO.setup(pin, GPIO.OUT)


def devices(st):
    """Switch the light named in the command ON if it contains 'on', else OFF."""
    for keyword, pin in LIGHTS:
        if keyword in st:
            GPIO.output(pin, 'on' in st)
            return
    # No room keyword matched. The body of the final else branch is missing
    # from the original listing, so nothing is done here.
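The test 'on' in st is a substring test, so a command such as "turn off the second bedroom light" would count as ON because "second" contains "on". The sketch below shows one possible way to avoid this by matching whole words. It is an illustration only: the function name parse_light_command is hypothetical, and the room keywords and pin numbers are copied from iopin.py.

```python
import re

# Room keywords as used in iopin.py; the pin numbers are BOARD numbers.
ROOM_PINS = {"bedroom": 12, "kitchen": 8, "bath": 16, "living": 22}


def parse_light_command(command):
    """Return (pin, state) for a spoken command, or None if it is not understood."""
    text = command.lower()
    words = set(re.findall(r"[a-z]+", text))
    if "on" in words:
        state = True
    elif "off" in words:
        state = False
    else:
        return None  # neither ON nor OFF was requested
    for room, pin in ROOM_PINS.items():
        if room in text:
            return pin, state
    return None  # no known room was mentioned


print(parse_light_command("turn on the kitchen light"))          # (8, True)
print(parse_light_command("turn off the second bedroom light"))  # (12, False)
```

Because the function only returns a pin and a state, it can be tested on a laptop without the RPi.GPIO library.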
6.2.3 – main.py
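from datetime import datetime

import RPi.GPIO as GPIO
import speech_recognition as sr

import iopin
import text_to_speech as ts
import train
import weather
import wiki

GPIO.setwarnings(False)
GPIO.setmode(GPIO.BOARD)
FAN = 26    # drives the DC motor through the transistor
BLINK = 7   # indicator LED, lit while the system is listening
GPIO.setup(FAN, GPIO.OUT)
GPIO.setup(BLINK, GPIO.OUT)


def askme():
    """Listen for one command and pass it to the matching module."""
    r = sr.Recognizer()
    # The line that opens the microphone is missing from the original listing;
    # sr.Microphone() is assumed here.
    with sr.Microphone() as source:
        print("Speak")
        GPIO.output(BLINK, True)
        audio = r.listen(source)
        GPIO.output(BLINK, False)
    try:
        st = r.recognize_google(audio)  # recognize once and reuse the text
        print(st)
        now = datetime.now()  # read the clock for each request
        # Only the "time" test survives in the original listing. The other
        # keywords and the time message are assumptions based on the module names.
        if 'time' in st:
            print('The time is ' + now.strftime('%H:%M') + '.')
        elif 'date' in st:
            print('Today is ' + now.strftime('%d') + ' of ' + now.strftime('%B %Y') + '.')
        elif 'weather' in st:
            weather.Weather()
        elif 'train' in st:
            train.details()
        elif 'search' in st:
            wiki.Wiki()
        elif 'help' in st:
            ts.voice('I can help you with a few things. You can ask me questions related to the '
                     'current date, time, weather or queries related to train data. '
                     'Just tell me to search for information on wikipedia. I can also control '
                     'all the lights and fans of a building.')
        elif 'light' in st:
            iopin.devices(st)
        elif 'fan' in st:
            if 'on' in st:
                GPIO.output(FAN, True)
            elif 'off' in st:
                GPIO.output(FAN, False)
    except Exception:
        # Recognition or one of the modules failed; return to the keyword loop.
        return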
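The chain of keyword tests in askme() can also be written as a table of keywords and handlers, which makes it easier to add a new module. The sketch below only illustrates that idea and is not the project's implementation: the function dispatch, the keywords and the stand-in handlers are all hypothetical.

```python
def dispatch(command, handlers, fallback=None):
    """Call the handler of the first keyword found in command."""
    text = command.lower()
    for keyword, handler in handlers:
        if keyword in text:
            return handler(text)
    return fallback(text) if fallback else None


# Stand-in handlers; in the project these would call weather.Weather(),
# train.details() and iopin.devices() instead of returning a label.
handlers = [
    ("weather", lambda text: "weather module"),
    ("train", lambda text: "train module"),
    ("light", lambda text: "hardware controller"),
]

print(dispatch("What is the weather today", handlers))   # weather module
print(dispatch("sing a song", handlers))                 # None
```

The order of the list decides which module wins when a command contains more than one keyword.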
6.2.4 - text_to_speech.py
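import os

from gtts import gTTS


def voice(audio):
    """Convert the text in audio to speech and play it through the speaker."""
    # The line that creates the gTTS object is missing from the original
    # listing; English output is assumed.
    tts = gTTS(text=audio, lang='en')
    tts.save('audio.mp3')
    os.system("mpg321 audio.mp3")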
6.2.5 - train.py
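import requests
import speech_recognition as sr

import text_to_speech as ts

r = sr.Recognizer()


def details():
    """Ask for a train number and print the live status of that train."""
    # The line that opens the microphone is missing from the original listing;
    # sr.Microphone() is assumed here.
    with sr.Microphone() as source:
        print("Speak")
        audio = r.listen(source)
    train_number = r.recognize_google(audio)
    api_key = "d2mh5mt66z"
    base_url = "https://api.railwayapi.com/v2/live/train/"
    current_date = "06-07-2018"  # hard-coded in the original listing
    # The line that builds complete_url is missing from the original listing;
    # the path layout below is an assumption about the Railway API.
    complete_url = (base_url + train_number + "/date/" + current_date
                    + "/apikey/" + api_key + "/")
    print(complete_url)
    result = requests.get(complete_url).json()
    if result["response_code"] != 200:
        return  # the body of the else branch is missing from the original listing
    train_name = result["train"]["name"]
    y = result["route"]
    source_station = y[0]["station"]["name"]
    destination_station = y[-1]["station"]["name"]
    position = result["position"]
    # Only the end of this statement survives in the original; the labels are assumed.
    print("Train name: " + str(train_name)
          + "\nSource station: " + str(source_station)
          + "\nDestination station: " + str(destination_station)
          + "\nPosition: " + str(position))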
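The date in train.py is a fixed string, so the query always refers to the same day. A possible way to build it from the system clock is sketched below. The helper name api_date is hypothetical, and the day-month-year order is an assumption, since the hard-coded value "06-07-2018" does not settle which field comes first.

```python
from datetime import date


def api_date(day=None):
    """Format a date as DD-MM-YYYY, the assumed layout of the value in train.py."""
    day = day or date.today()
    return day.strftime("%d-%m-%Y")


print(api_date(date(2018, 7, 6)))   # 06-07-2018
```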
6.2.6 - weather.py
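import requests

import text_to_speech as ts


def Weather():
    """Fetch the current weather for Bangalore and read it out."""
    url = ('http://api.openweathermap.org/data/2.5/weather'
           '?appid=561470f211323497f57d99a4ae889ffd&q=bangalore')
    json_data = requests.get(url).json()
    forecast = json_data['weather'][0]['description']
    # Temperatures are in kelvin because the request sets no units parameter.
    temperature = json_data['main']['temp']
    humidity = json_data['main']['humidity']
    tmax = json_data['main']['temp_max']
    tmin = json_data['main']['temp_min']
    # The line that speaks the result is missing from the original listing;
    # the wording below is an assumption.
    ts.voice('The forecast is ' + forecast + '. The temperature is ' + str(temperature)
             + ' kelvin, between ' + str(tmin) + ' and ' + str(tmax)
             + ', and the humidity is ' + str(humidity) + ' percent.')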
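The fields read in weather.py can be turned into a sentence for the speaker without touching the network, which makes the formatting easy to test. The sketch below is an illustration only: the function describe_weather and the sample dictionary are hypothetical, and it assumes the temperature arrives in kelvin, which is the OpenWeatherMap default when no units parameter is given.

```python
def describe_weather(json_data):
    """Build a spoken sentence from an OpenWeatherMap-style response dictionary."""
    forecast = json_data["weather"][0]["description"]
    temp_c = json_data["main"]["temp"] - 273.15   # kelvin -> degrees Celsius
    humidity = json_data["main"]["humidity"]
    return (f"The forecast is {forecast}, with a temperature of "
            f"{temp_c:.1f} degrees Celsius and {humidity} percent humidity.")


# Made-up sample with the same fields that weather.py reads.
sample = {
    "weather": [{"description": "light rain"}],
    "main": {"temp": 298.15, "humidity": 70, "temp_max": 300.15, "temp_min": 296.15},
}
print(describe_weather(sample))
```

The returned string could then be passed to ts.voice() in place of the raw values.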
6.2.7 - wiki.py
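import RPi.GPIO as GPIO
import speech_recognition as sr
import wikipedia

import text_to_speech as ts

GPIO.setwarnings(False)
GPIO.setmode(GPIO.BOARD)
GPIO.setup(7, GPIO.OUT)  # indicator LED, lit while the system is listening


def Wiki():
    """Listen for a search term and read out a two-sentence Wikipedia summary."""
    r = sr.Recognizer()
    # The line that opens the microphone is missing from the original listing;
    # sr.Microphone() is assumed here.
    with sr.Microphone() as source:
        GPIO.output(7, True)
        audio = r.listen(source)
        GPIO.output(7, False)
    try:
        query = r.recognize_google(audio)  # recognize once and reuse the text
        print(query)
        s = wikipedia.summary(query, sentences=2)
        print(s)
        ts.voice(s)
    except Exception:
        # The handler body is missing from the original listing; give up quietly.
        return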
CHAPTER 7 – RESULT
• Test case 6: Turn the fan ON.
• Output 6: Turning the fan ON. (DC motor is turned ON)
• Test case 7: Turn the ‘___ROOM’ light ON/OFF.
• Output 7: Turning the ‘___ROOM’ light ON/OFF. (LED is turned ON/OFF)
CHAPTER 8 - DISCUSSION
8.1 - Challenges
The challenges that were encountered during the project tenure and the solutions
to those challenges are as follows:
• Usage of mpg321 instead of VLC player: Initially VLC was used to play the
output responses. For this, the user had to open VLC and click play manually, because
audio could not be played on VLC directly from Python using the os package. mpg321, an
mp3 player, is used in place of VLC. Its main advantage is that audio can be played directly
while the program is running, using the os package in Python.
• Lack of a codec for mpg321: mpg321 does not come with a built-in codec,
so any audio file played raised an exception while the program was running. FLAC, an
audio coding format for lossless compression of digital audio, had to be installed for
mpg321 to work as a regular audio player.
• Usage of a sound card for the microphone: Initially a USB microphone was used as
the input transducer for the user's voice commands. This microphone did not have a good
SNR, so the background noise was more evident than the actual voice command. A
microphone with a 3.5 mm jack is now used with a sound card. It has better noise
cancellation than the USB microphone, and the captured audio is converted into text more
accurately.
• Connecting the Pi to a laptop: Initially many devices such as a keyboard, mouse and
monitor had to be interfaced with the Raspberry Pi, so the system was not portable. To
make the system portable, the Pi was connected to a laptop using the Remote Desktop
Connection application, which shows the Pi’s display on the laptop.
CHAPTER 9 – CONCLUSION
The prototype developed during the project performed with satisfying results. The accuracy
reached with users who have distinct pronunciation resulted in a limited need to repeat commands.
The chosen hardware and framework were not fully compatible, which meant that an
optimal result could not be reached. To increase performance the prototype would need more
computational power, which would incur a higher cost. This increased cost would counter the
affordability property defined in the project purpose. It might have been better to use a
speech recognition framework based on a lower-level language, for example C, and a more
lightweight framework with less demanding search algorithms. These changes would probably
optimize the execution of the software on the Raspberry Pi, which could lead to a more satisfying
result.
The properties defined by the project purpose were fulfilled. The prototype has a relatively
low cost and is easy to use. The implemented functionality corresponds well to the questions from
the user.
APPENDIX A
COMPONENTS
Raspberry Pi 3 model B
Sound Card
Microphone
Speakers
DC motor
MPS2222A (Transistor)
1N4148 (Diode)
LED
Resistors
Bread Board
Connecting wires
A.1 – Raspberry Pi
The Raspberry Pi is a series of small single-board computers developed in the United
Kingdom by the Raspberry Pi Foundation to promote the teaching of basic computer science in
schools and in developing countries. It does not include peripherals (such as keyboards and mice)
and cases.
The Raspberry Pi 3 Model B is the earliest model of the third-generation Raspberry Pi. It
replaced the Raspberry Pi 2 Model B in February 2016. The specifications of Raspberry Pi 3B are:
Any of the GPIO pins can be designated (in software) as an input or output pin and
used for a wide range of purposes.
Voltages:
Two 5V pins and two 3.3V pins are present on the board, as well as a number of ground
pins (0V), which are unconfigurable. The remaining pins are all general purpose 3V3 pins,
meaning outputs are set to 3V3 and inputs are 3V3-tolerant.
Outputs:
A GPIO pin designated as an output pin can be set to high (3V3) or low (0V).
Inputs:
A GPIO pin designated as an input pin can be read as high (3V3) or low (0V). This is
made easier with the use of internal pull-up or pull-down resistors. Pins GPIO2 and GPIO3 have
fixed pull-up resistors, but for other pins this can be configured in software.
Software PWM is available on all pins and hardware PWM is available on GPIO12,
GPIO13, GPIO18 and GPIO19.
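The code in this project numbers pins by their physical position on the header (GPIO.BOARD), while the GPIO names in this section use the Broadcom (BCM) numbering. The sketch below relates the two for the pins used in the project and flags which of them offer hardware PWM. It is an illustration only, the function name describe_pin is hypothetical, and the mapping covers just these six pins of the 40-pin header.

```python
# Physical (BOARD) pin -> BCM GPIO number, for the pins used in this project.
BOARD_TO_BCM = {7: 4, 8: 14, 12: 18, 16: 23, 22: 25, 26: 7}

# GPIOs with hardware PWM, as listed in the text above.
HARDWARE_PWM = {12, 13, 18, 19}


def describe_pin(board_pin):
    """Return a short description of a BOARD pin used in the project."""
    bcm = BOARD_TO_BCM[board_pin]
    pwm = "hardware PWM" if bcm in HARDWARE_PWM else "software PWM only"
    return f"pin {board_pin} = GPIO{bcm} ({pwm})"


for pin in sorted(BOARD_TO_BCM):
    print(describe_pin(pin))
```

Of the pins used here, only physical pin 12 (GPIO18) supports hardware PWM, which would matter if the fan speed were to be varied.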
Driver Support:
The Foundation will not include a GPIO driver in the initial release; standard Linux
GPIO drivers should work with minimal modification. The community implemented SPI and I²C
drivers, which will be integrated with the new Linux pin control concept in a later version of the
kernel. The I²C and SPI drivers use the hardware modules of the microcontroller and interrupts for
low CPU usage, whereas the 1-wire support uses bit banging on the GPIO ports, which results in
higher CPU usage.
On the production board, the Raspberry Pi Foundation design brings out the MIPI CSI-2
(Camera Serial Interface) to a 15-way flat flex connector S5, between the Ethernet and HDMI
connectors. A compatible camera with 5 megapixels and 1080p video resolution has been released.
On the production board, the Raspberry Pi Foundation design brings out the DSI
(Display Serial Interface) to a 15-way flat flex connector labelled S2, next to the Raspberry Pi
logo. It has two data lanes and a clock lane, to drive a possible future LCD screen device. Some
smartphone screens use DSI.
A.3 – Microphone
A microphone is a transducer that converts sound into an electric signal. Microphones are
used in many applications such as telephones, hearing aids, public address systems for concert
halls and public events, motion picture production, live and recorded audio engineering, sound
recording, two-way radios, megaphones, radio and television broadcasting, and in computers for
recording voice, speech recognition, VoIP, and for non-acoustic purposes such as ultrasonic
sensors or knock sensors.
In the dynamic microphone, sound waves cause movement of a thin metallic diaphragm
and an attached coil of wire. A magnet produces a magnetic field which surrounds the coil, and
motion of the coil within this field causes current to flow. The principles are the same as those that
produce electricity at the utility company, realized in a pocket-sized scale. It is important to
remember that current is produced by the motion of the diaphragm, and the current in the coil is
channeled from the microphone along wires.
A.4 – Speaker
A speaker is an electroacoustic transducer which converts an electric
signal into sound energy. The dynamic speaker operates on the same basic principle as a dynamic
microphone, but in reverse, to produce sound from an electrical signal. When an alternating current
electrical audio signal is applied to its voice coil, a coil of wire suspended in a circular gap between
the poles of a permanent magnet, the coil is forced to move rapidly back and forth due to Faraday's
law of induction, which causes a diaphragm (usually conically shaped) attached to the coil to move
back and forth, pushing on the air to create sound waves. Besides this most common method, there
are several alternative technologies that can be used to convert an electrical signal into sound. The
sound source (e.g., a sound recording or a microphone) must be amplified or strengthened with an
audio power amplifier before the signal is sent to the speaker.
A.5 – DC motor
A DC motor is any of a class of rotary electrical machines that converts direct
current electrical energy into mechanical energy. The most common types rely on the forces
produced by magnetic fields. Nearly all types of DC motors have some internal mechanism, either
electromechanical or electronic, to periodically change the direction of current flow in part of the
motor.
The DC (direct current) motor works on the principle that ‘when
a current-carrying conductor is placed in a magnetic field, it experiences a torque and tends to
move’. This is known as motoring action. If the direction of current in the wire is reversed, the
direction of rotation also reverses. When the magnetic field and the current interact, they produce
a mechanical force, and the working principle of the DC motor is based on this.
The MPS2222A transistor can be viewed as two PN diodes connected back to back. It has
three terminals, namely emitter, base and collector. The base is the middle section, which is made
up of a thin layer. The right part is called the emitter-base diode and the left part is called the
collector-base diode. These names are given as per the common terminal of the transistor. The
emitter-base junction of the transistor is forward biased and the collector-base
junction is reverse biased, which offers a high resistance.
The 1N4148 is a standard silicon switching signal diode. It is one of the most
popular and long-lived switching diodes because of its dependable specifications and low cost. Its
name follows the JEDEC nomenclature. The 1N4148 is useful in switching applications up to
about 100 MHz with a reverse-recovery time of no more than 4 ns.
LEDs have many advantages over incandescent light sources, including lower
energy consumption, longer lifetime, improved physical robustness, smaller size, and faster
switching. Light-emitting diodes are used in applications as diverse as aviation lighting,
automotive headlamps, advertising, general lighting, traffic signals, camera flashes, lighted
wallpaper and medical devices. They are also significantly more energy efficient and, arguably,
have fewer environmental concerns linked to their disposal.
REFERENCES
[1] Raspbian, [online]. Available: https://www.raspberrypi.org/downloads/raspbian/.
[2] Python, [online]. Available: https://www.python.org/.
[3] Python, [online]. Available: https://www.w3schools.com/python/.
[4] GIT, [online]. Available: https://github.com/.
[5] Install Raspbian OS, [online]. Available: https://maker.pro/raspberry-pi/tutorial/how-to-install-raspbian-on-your-raspberry-pi-sd-card.
[6] Weather API, [online]. Available: https://openweathermap.org/api.
[7] Railway API, [online]. Available: https://railwayapi.com/.