
Implementation of Google Assistant on Raspberry Pi

Septimiu Mischie, Liliana Mățiu-Iovan, Gabriel Găspăresc
Faculty of Electronics, Telecommunications and Information Technologies
Politehnica University of Timisoara, Timisoara, Romania
septimiu.mischie@upt.ro, liliana.matiu-iovan@upt.ro, gabriel.gasparesc@upt.ro
Abstract—This paper investigates the implementation of the Google voice assistant on a Raspberry Pi microcomputer. First, the Google Voice Kit is presented. This device can be attached to a Raspberry Pi to obtain a voice assistant. The necessary steps are then presented for implementing the voice assistant on a system that contains, besides the Raspberry Pi, only a microphone and a speaker instead of the Voice Kit. In this way a Google voice assistant can be built more easily and flexibly. This newly created device has several working modes, which are analyzed. Finally, a speech recognition system that works in the Romanian language is presented and evaluated.

Keywords—google assistant, voice, Raspberry Pi, speech recognition, Python.

978-1-5836-5925-0/18/$31.00 ©2018 IEEE

I. INTRODUCTION

Spoken dialogue systems or voice-controlled assistants are devices that can respond to multiple voices, regardless of accent, can execute several commands or can provide an answer, thus imitating a natural conversation [1-3]. Such a system, Fig. 1, contains several elements: an automatic speech recognition (ASR) part, a spoken language understanding (SLU) part, a dialogue manager (DM), a knowledge data base (KDB), a natural language generator (NLG) and a text-to-speech (TTS) synthesis system [7]. One of the most important parts of such an assistant is speech recognition, also called speech-to-text translation, because it transforms the human voice into a string of data that can be interpreted by the computer. There are some open-source software packages that allow speech recognition, such as Kaldi [4] or PocketSphinx [5]. However, in recent years cloud-based speech recognition systems have developed considerably. In this approach, all the elements of a voice-controlled assistant are placed in the cloud. The most important assistants in this category are Apple Siri, Google Assistant and Amazon Alexa. They are present in most smartphones and are based on artificial intelligence elements such as deep learning and neural networks. Apart from voice digital assistants, other systems that have been developed using this idea are call centers [6], systems for self-management of diabetes patients [7] and shopping assistant robots [8].

Fig. 1. The structure of the voice-controlled assistant (microphone → ASR → SLU → DM ↔ KDB; DM → NLG → TTS → speaker → User)

Another field where speech recognition is widely used is the automotive industry [13]. It is estimated that 55% of cars will contain a voice recognition system by 2019. An important part of a car's functions can be controlled by voice, especially those in the entertainment area.

In May 2017, Google launched a project called AIY (Artificial Intelligence Yourself) Voice Kit [9-10]. This device mainly consists of a so-called hardware on top (HAT) board that can be connected to the GPIO (General Purpose Input Output) connector of a Raspberry Pi 3 microcomputer, a microphone, a speaker and an arcade-style button. The last three components must be connected to the HAT by suitable wires and connectors. The functions of this combined device, AIY Voice Kit plus Raspberry Pi 3, correspond to those of a voice assistant. Thus, by using Google's cloud facilities, it can answer questions asked by the user. It can be further extended to execute commands, because the GPIO pins of the Raspberry Pi are available on the HAT and it can therefore be interfaced with other devices such as remote controls, actuators and sensors. The software part of this project consists of several Python files and is included in the image of the Raspbian operating system (OS) that is available on the AIY Voice Kit site [10].

We have acquired, installed and used the AIY Voice Kit. However, in this paper we present the implementation of a Google assistant on a Raspberry Pi without the AIY Voice Kit. That means we used an ordinary USB microphone and a speaker on the 3.5 mm audio output of the microcomputer. Other applications, such as speech recognition in different languages, especially Romanian, are also explored. All of these require knowledge of the operating system, such as accessing the audio devices, and knowledge of the Python language to modify the initial form of the software. In this way, the Google assistant can be embedded in a hardware device. This is an advantage in comparison with [11], where a smartphone is used as an additional element in an assistive device for visually impaired individuals: the smartphone receives voice commands and sends them via Bluetooth to an Arduino-based system that drives a speaker or a motor to carry out tasks depending on the voice commands.

The paper is organized as follows. In the second section, the necessary changes to implement the Google assistant on the Raspberry Pi without the AIY Voice Kit are presented. The third section presents the applications that can be executed using the initial form of the software as well as the newly introduced ones, and the last section concludes this work.
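The pipeline of Fig. 1 can be illustrated with a minimal Python sketch. Every stage below is a hypothetical stub of our own (a real assistant replaces them with cloud services); the function names and the toy knowledge data base are illustrative only:

```python
# Minimal sketch of the spoken-dialogue pipeline of Fig. 1.
# All stages are hypothetical stubs, not part of the AIY software.

def asr(audio: bytes) -> str:
    """Automatic speech recognition: audio -> text (stub)."""
    return audio.decode("utf-8")  # pretend the audio is already text

def slu(text: str) -> dict:
    """Spoken language understanding: text -> intent (stub)."""
    if "on" in text.split():
        return {"intent": "turn_on"}
    return {"intent": "unknown"}

def dialogue_manager(intent: dict, kdb: dict) -> str:
    """Choose an answer by looking up the knowledge data base (stub)."""
    return kdb.get(intent["intent"], "Sorry, I did not understand.")

def nlg_tts(answer: str) -> bytes:
    """Natural language generation + text-to-speech synthesis (stub)."""
    return answer.encode("utf-8")

def assistant(audio: bytes) -> bytes:
    kdb = {"turn_on": "Turning the LED on."}
    return nlg_tts(dialogue_manager(slu(asr(audio)), kdb))

print(assistant(b"turn the led on"))  # prints b'Turning the LED on.'
```

The chain asr → slu → dialogue_manager → nlg_tts mirrors the blocks of Fig. 1; in the cloud-based assistants discussed above, all four stages run on remote servers.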
II. IMPLEMENTATION OF GOOGLE ASSISTANT ON RASPBERRY PI WITHOUT THE AIY VOICE KIT

The software part of the AIY project is included in the image of a Raspbian OS that can be downloaded from the project site [10]. All of the following explanations regarding the software are valid for the release called aiyprojects-2017-09-11.img.xz. Thus, after installing the Raspbian OS, the Google assistant can be used. It is available in /home/pi/AIY-voice-kit-python. Essentially, the entire software of this project is designed to use an input to record or capture the speech signal and an output to generate the audio answer. By default, these two elements correspond to the Voice Kit. This device contains two digital microphones (only one is used) that communicate with the Raspberry Pi via a serial interface, and a serially controlled DAC (digital-to-analog converter) that drives the speaker. Due to the HAT driver, the two digital devices are seen by the OS as two ordinary audio devices, as will be seen in the following paragraphs.

In order to find the available audio devices, the following Linux command can be executed: cat /proc/asound/modules. If this command is executed on the initial form of the software, the answer is

    0 snd_soundcard_googleVoice_hat

A similar response can be obtained with the commands aplay -l and arecord -l, which list the available playback and recording devices, respectively. Furthermore, a right click on the speaker icon in the status bar of the Raspbian OS shows the same information. From all of these it follows that both recording and playback are performed with the same device, soundcard_googleVoice_hat, which corresponds to the Voice Kit.

In order to implement the Google assistant on the Raspberry Pi without the Voice Kit, we must disable the audio devices of the Voice Kit and enable the onboard audio output of the Raspberry Pi. Then we need to verify whether an ordinary USB microphone is recognized.

Thus, in the file /boot/config.txt, the lines dtoverlay=i2s-map and dtoverlay=googlevoicehat-soundcard must be removed, while the line #dtparam=audio=on must be uncommented to enable loading of the bcm2835 driver that corresponds to the onboard audio output. After a reboot, the answer to the command cat /proc/asound/modules is 0 snd_bcm2835, as expected. Then, if a USB microphone is connected (we actually used a Logitech C270 USB webcam, which also includes a microphone), the answer to the same command becomes

    0 snd_bcm2835
    1 snd_usb_audio

The numbers 0 and 1 represent so-called cards. If the commands arecord -l and aplay -l are executed again, it follows that card 0 is used for playback while card 1 is used for recording. On the other hand, a right click on the speaker icon now shows HDMI (checked) and Analog. In order to send the sound to the onboard audio output, the Analog option must be checked.

Then the file /etc/asound.conf must be modified to have the following structure:

    pcm.!default {
        type asym
        playback.pcm "speaker"
        capture.pcm "mic"
    }
    pcm.mic {
        type plug
        slave { pcm "hw:1,0" }
    }
    pcm.speaker {
        type plug
        slave { pcm "hw:0,0" }
    }

In this way the default devices are those having card number 1 (the microphone) and card number 0 (the onboard audio output). Thus, recording and playback can be performed using the appropriate commands. Moreover, there is an icon on the desktop of the Raspbian OS, Check audio, that can be used for an interactive test of playback and recording. Double-clicking this icon executes a Python file (AIY-voice-kit-python/checkpoints/checkaudio.py), but it must first be modified: the line VOICEHAT_ID = 'googlevoicehat' must be replaced by VOICEHAT_ID = 'bcm2835'.

If everything is all right, one can move on to executing applications such as those in the following section. Only two differences remain before testing the AIY project, both regarding the arcade-style button, which also contains a lamp. In our implementation of the Google assistant without the Voice Kit, the function of the arcade-style button is taken over by a push button that connects GND to the GPIO23 pin, and the lamp by an LED connected to the GPIO25 pin.

III. TESTING AND IMPROVEMENT OF AIY APPLICATIONS

The experimental setup that allowed us to implement the Google assistant is presented in Fig. 2. It is based on a Raspberry Pi 3 system that contains a WiFi adapter. This is absolutely necessary because the Google assistant needs an Internet connection, which must be available at all times. The software part of the AIY project has two categories of applications: the first is based on the Google Assistant SDK and the other on the Cloud Speech API. The first is completely free, while the second requires a minimal fee. Both require a Google account and several registration steps that include information such as the name of the project. The registration is finalized by downloading two .json files. The first, named assistant.json, is used for Google Assistant SDK applications. The other, named cloud_speech.json, is used for Cloud Speech API applications.

Fig. 2. The experimental setup
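The card numbering that the configuration above relies on can be cross-checked programmatically. The sketch below parses text in the format produced by cat /proc/asound/modules; the sample string is hard-coded here as an assumption of the expected post-reboot output, since the real file exists only on the Pi (there one would read /proc/asound/modules instead):

```python
# Sketch: map ALSA card numbers to driver modules by parsing the
# output format of /proc/asound/modules. The sample output below is
# what is expected after the /boot/config.txt changes described above.

SAMPLE = """\
 0 snd_bcm2835
 1 snd_usb_audio
"""

def parse_cards(text: str) -> dict:
    """Return {card_number: module_name} from /proc/asound/modules text."""
    cards = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0].isdigit():
            cards[int(parts[0])] = parts[1]
    return cards

cards = parse_cards(SAMPLE)
# Card 0 should be the onboard output, card 1 the USB microphone,
# matching the "speaker" (hw:0,0) and "mic" (hw:1,0) entries above.
print(cards)  # prints {0: 'snd_bcm2835', 1: 'snd_usb_audio'}
```

A startup script could use such a check to warn the user when the USB microphone was not detected before launching the assistant.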
The AIY project applications are included in the AIY-voice-kit-python/src folder and are written in the Python language. The applications assistant_grpc_demo.py and assistant_library_demo.py need assistant.json to run, while cloudspeech_demo.py needs cloud_speech.json.

When any of these applications is run for the first time, the Google website is accessed and the user name and password are required. On subsequent runs, no further login information is required; however, Internet access is still necessary because all the components of the Google assistant reside on cloud servers. First, assistant_grpc_demo.py is presented. It asks the user to press the connected push button, and then the Listening… message is displayed, Fig. 3. That means the user can say a few words. The spoken text is immediately displayed and the answer of the assistant is then played on the speaker; this, of course, cannot be seen in Fig. 3. The connected LED turns ON after the user presses the button and remains in this state until the end of the assistant's answer. It can be seen in Fig. 3 that the spoken text "Good bye" has the effect of ending the application.

Fig. 3. A screenshot from the terminal while running the assistant_grpc_demo.py application

By studying the source of the application, we observed that there is a very powerful function, text, audio = assistant.recognize(), that waits for input speech and then returns the recognized text in the variable text and the answer in the variable audio. The latter can be played by the function aiy.audio.play_audio(audio).

According to [10], this application, assistant_grpc_demo.py, can be used for a dialogue with the assistant, while the application cloudspeech_demo.py can be used to implement new commands that do not need answers. However, because the variable text is available, some simple tests of this variable can be used to generate commands.

The application assistant_library_demo.py is broadly similar to the previous application. The differences are the following: the trigger of the application is the phrase "OK Google", and the decoded speech is not displayed on the screen. Furthermore, the connected LED turns ON when the trigger is detected and blinks during the answer of the assistant.

As we previously mentioned, the application cloudspeech_demo.py uses the Cloud Speech API and is not free. However, the cost of using it is negligible; when we performed the operations to obtain the cloud_speech.json file, we received a credit that could be used for 12 months. This application does not return any audio answer. It only returns the spoken text, through the function text = recognizer.recognize(). Thus, according to the content of the variable text, some actions can be executed, as we previously presented for the application assistant_grpc_demo.py. The main advantage of this application is that it can perform speech recognition in about 80 languages.

To change the language used for speech recognition, the following program line in the file AIY-voice-kit-python/src/i18n.py must be modified:

    DEFAULT_LANGUAGE_CODE = 'ro-RO'

where ro-RO is the code for the Romanian language. All the codes are available at [12]. Fig. 4 shows a capture obtained while running the cloudspeech_demo.py application. It can be seen that there is the same trigger (the button) as in the assistant_grpc_demo.py application. On the other hand, the application recognizes not only the selected language (Romanian in this example; we tried German as well) but also English.

Fig. 4. A screenshot from the terminal while running the cloudspeech_demo.py application

In order to create an application, we connected three LEDs, coloured blue, red and yellow, respectively, to appropriate GPIO pins. This application allows the control of the status of the three LEDs through Romanian phrases such as "Led albastru aprins" ("blue LED on") and "Led albastru stins" ("blue LED off") for the blue LED, and similarly for the other two LEDs. As described in the previous example, this function is built by testing the variable text. A short part of the program follows:

    if text == 'led albastru aprins':
        print('albastru ON')
        GPIO.output(24, 1)
    if text == 'led albastru stins':
        print('albastru OFF')
        GPIO.output(24, 0)

Here we present an experiment that examines the response time of the implemented Google assistant. The previous application was modified so that it contains a counter that is incremented for each spoken text, and it also displays the current time stamp when the string of text is received. In addition, this application no longer requires the user to press the button in order to speak, as in Fig. 4. Instead, the user can speak whenever the message "Listening…" is displayed. While this application is running, the user does the following:

1. Says the text "albastru aprins"
2. Waits until the blue LED turns ON
3. Says the text "albastru stins"
4. Waits until the blue LED turns OFF
5. Goes to step 1.
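The command dispatch on the variable text and the counter/timestamp bookkeeping described above can be sketched together as one loop. The FakeGPIO class and the hard-coded phrase list below are stand-ins of our own for RPi.GPIO and recognizer.recognize(), which are only available on the Pi; the pin numbers are illustrative:

```python
import time

# Sketch of the modified cloudspeech loop: dispatch recognized Romanian
# phrases to LED actions and log a counter plus a timestamp per phrase.
# FakeGPIO and the phrase list replace RPi.GPIO and recognizer.recognize().

LED_PINS = {"albastru": 24}  # blue LED; red/yellow would be added here

class FakeGPIO:
    """Stand-in for RPi.GPIO that just records the last pin states."""
    def __init__(self):
        self.state = {}
    def output(self, pin, value):
        self.state[pin] = value

def handle(text, gpio, log, counter):
    """Dispatch one recognized phrase and log (counter, timestamp)."""
    log.append((counter, time.time()))
    for colour, pin in LED_PINS.items():
        if text == f"led {colour} aprins":   # "<colour> LED on"
            gpio.output(pin, 1)
        elif text == f"led {colour} stins":  # "<colour> LED off"
            gpio.output(pin, 0)

gpio, log = FakeGPIO(), []
for i, text in enumerate(["led albastru aprins", "led albastru stins"], 1):
    handle(text, gpio, log, i)

print(gpio.state)  # prints {24: 0} - the blue LED ends OFF
```

The differences between successive timestamps in log are exactly the quantities plotted in Fig. 6; on the Pi, the loop body would instead be driven by each return of recognizer.recognize().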
We executed this experiment a few times, until the counter reached different values, for instance 21. Fig. 5 presents a screenshot from the terminal during this experiment, while Fig. 6 presents the differences between successive time stamps. It can be seen that the minimum value is about 2 s and the maximum is 2.6 s. We computed the average of these time differences and obtained 2.3 s. This value represents the period of time necessary to execute a voice command. Of course, this value includes the time to say the text (about 1 s), to send the speech samples to the cloud, and to find and receive the spoken text. It also depends on the user's attentiveness in observing the changes of the LED's state and on correct pronunciation. Nevertheless, this value is acceptable for a practical application. Even though we did not perform an intensive test of the accuracy of this assistant in recognizing spoken words, we can say that during our tests an accuracy of at least 90 percent was obtained.

Fig. 5. A screenshot of the terminal during the experiment for measuring the response time

Fig. 6. The differences between time stamps as a function of the counter value

IV. CONCLUSIONS

This paper presented the implementation of the Google assistant on a Raspberry Pi microcomputer. Starting from scratch, all the details, software and hardware, were presented. The Google assistant is available on smartphones, but using it on a Raspberry Pi has the advantage that this microcomputer can be interfaced with other hardware devices. We presented a simple example that turns some LEDs on and off by voice commands. This application can be extended to any smart home or assistive device for impaired individuals. Most importantly, this application can work in practically any language, although only one at a time. We conducted a large number of tests in Romanian using text strings of several words, and the system responded in real time and with good accuracy. The system requires Internet access; however, this is no longer a problem these days.

In terms of future work, we would like to implement an application that uses voice commands to interact with a DC motor that can be integrated in a smart home device. We also want to conduct a speech recognition experiment that does not require Internet access. In addition, we would like to incorporate a text-to-speech function on the Raspberry Pi that uses Romanian as its default language.

ACKNOWLEDGMENT

The work of the first author was supported by a grant of the Romanian Ministry of Research and Innovation, CCDI-UEFISCDI, project number PN-III-P1-1.2-PCCDI-2017-0917 / contract number 21PCCDI/2018, within PNCDI III.

REFERENCES

[1] P. Milhorat, S. Schlögl, G. Chollet et al., "Building the Next Generation of Personal Digital Assistants", 1st International Conference on Advanced Technologies for Signal and Image Processing (ATSIP 2014), March 17-19, 2014, Sousse, Tunisia, pp. 458-463.
[2] V. Kepuska and G. Bohouta, "Next Generation of Virtual Personal Assistants (Microsoft Cortana, Apple Siri, Amazon Alexa and Google Home)", 2018 IEEE 8th Annual Computing and Communication Workshop and Conference, 8-10 Jan. 2018, Las Vegas, USA, pp. 99-103.
[3] P. J. Young, J. H. Jin, S. Woo and D. H. Lee, "BadVoice: Soundless Voice-control Replay Attack on Modern Smartphones", 2016 Eighth International Conference on Ubiquitous and Future Networks (ICUFN), Vienna, Austria, pp. 882-887.
[4] Kaldi Toolkit for Speech Recognition, http://kaldi-asr.org/index.html, accessed July 7, 2018.
[5] Open Source Speech Recognition Toolkit, https://cmusphinx.github.io/, accessed July 7, 2018.
[6] M. Ceaparu, S. A. Toma, S. Segărceanu and I. Gavăt, "Voice-based User Interaction System for Call-Centers, Using a Small Vocabulary for Romanian", The 12th International Conference on Communications (COMM 2018), 14-16 June 2018, Bucharest, Romania.
[7] A. Cheng, V. Raghavaraju, J. Kanugo et al., "Development and Evaluation of a Healthy Coping Voice Interface Application Using the Google Home for Elderly Patients with Type 2 Diabetes", 2018 15th IEEE Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, USA.
[8] Chi Zhao, "Text Labeling Applied in Shopping Assistant Robot Using Long Short-Term Memory", 2018 International Conference on Intelligent Transportation, Big Data & Smart City, Xiamen, China.
[9] The MagPi, the official Raspberry Pi magazine, issue 57, May 2017, pp. 14-33, raspberrypi.org/magpi, accessed July 7, 2018.
[10] Google official website of artificial intelligence projects, https://aiyprojects.withgoogle.com/voice/, accessed July 7, 2018.
[11] D. Munteanu and R. Ionel, "Voice-Controlled Smart Assistive Device for Visually Impaired Individuals", 2016 12th International Symposium on Electronics and Telecommunications (ISETC 2016), pp. 186-189.
[12] Google Cloud official site, https://cloud.google.com/speech-to-text/docs/languages, accessed July 7, 2018.
[13] F. Weng, P. Angkititrakul, E. Shriberg et al., "Conversational In-Vehicle Dialog Systems: The Past, Present, and Future", IEEE Signal Processing Magazine, vol. 33, issue 6, pp. 49-60, 2016.
