Mikkonen Tiia
The topics of artificial intelligence, machine learning, deep learning, neural networks,
artificial neural networks, convolutional neural networks, computer vision, YOLO,
Raspberry Pi, Python, TensorFlow and OpenCV are also discussed to help the reader
understand how they work and how everything is interconnected.
Finally, the process of setting up a Raspberry Pi with a camera, installing the OS, and
installing all of the required libraries, along with YOLO, TensorFlow, and OpenCV, is
gone through in detail.
List of Abbreviations
1 Introduction 1
2 Theoretical Background 3
3 Methodology 27
4 Discussion 39
5 Conclusion 40
References 41
List of Abbreviations
COVID-19: Coronavirus disease 2019.
1 Introduction
In this thesis, a solution for capacity monitoring that makes use of algorithms
that detect and recognise diverse objects in real time will be discussed. Among
the object detection solutions that will be discussed, "You Only Look Once" will
be the main point of emphasis. There are a variety of additional techniques that
may be used for object detection, but not all of them can be discussed in detail
here due to space constraints. This thesis will provide a brief overview of
the following technologies: Histogram of Oriented Gradients, Region-based
Convolutional Neural Networks (including Fast R-CNN and Faster R-CNN),
Region-based Fully Convolutional Networks, Single Shot Detector, and
Spatial Pyramid Pooling, which are all examples of machine learning
techniques. It is widely acknowledged that "You Only Look Once", often known
as YOLO, is the most widely used object detection technique in use today.
During March 2020, the world went into a global lockdown in order to slow down
the spread of COVID-19. The WHO started announcing recommendations of
social distancing, self-isolation/quarantine, adequate hand hygiene, and the use
of face masks [1]. Medical faculties across the globe also started developing
vaccines to help prevent infections and/or serious illness. Many governments
started imposing country-wide lockdowns along with remote work
recommendations to slow down the spread. Once establishments were allowed
to open, capacity limitations were put in place for restaurants, event venues,
private gatherings, outdoor gatherings, and, in some cases, public transit.
As of the 18th of April 2022, more than 504 million cases of COVID-19 have
been confirmed across the globe with about 6.22 million deaths [2]. Europe
itself has gone through at least three waves of lockdowns since the start of the
pandemic, with varying restrictions imposed by each country's government.
This is where object detection algorithms come in handy: they help
businesses and patrons follow restrictions and make informed decisions
based on the capacity of an establishment and the mask use within that
facility.
With the use of YOLO, a business can easily comply with government
restrictions put in place. Another possible use is to see how prevalent the use
of face masks is in the establishment. The technology also has other potential
uses, such as predicting rush hours, allowing patrons to see how busy the
establishment is before arriving, letting patrons know whether there are
any animals in the establishment if they suffer from allergies or asthma, and
analysing the popularity of each entrance/exit.
2 Theoretical Background
In order to get a better understanding of the solution described in this thesis, the
following topics should be discussed beforehand: machine learning (ML), deep
learning (DL), object detection (OD), neural networks (NN), computer vision,
Raspberry Pi and Python. The purpose of the following subchapters is to give a
brief explanation of these topics and how they relate to the subject at hand.
Figure 2. AI vs ML vs DL [9].
On the other hand, DL is a subset of ML that tries to replicate the structure of the
human brain, and it is not a new concept. Deep learning algorithms aim to form
conclusions comparable to those reached by humans by continuously
examining data with a predetermined logical framework. Deep learning does this
by using neural networks, which are multi-layered structures of algorithms [9]. DL
should be applied when domain experts are not available or when they are
unable to explain their decisions, as in cases such as stock preferences or price
predictions. In Figure 3, the differences between ML and DL can be seen.
Deep learning can be found in self driving cars, news aggregation along with
fake news detection, virtual assistants, visual recognition, health care, fraud
detection, along with various other uses.
Neural networks, which underlie deep learning techniques, are used to model data.
Their structure and name originate from the brain, where neurons convey data.
They are also known as artificial neural networks (ANNs). A neural network can
be seen in Figure 4 below.
Many different types of neural networks exist, such as: deep feed-forward
(DFF), feed forward (FF), perceptron (P), radial basis network (RBN), recurrent
neural network (RNN), along with various other types seen in Figure 5 below
[10].
Artificial neural networks (ANNs) consist of various node layers: an input
layer, one or more hidden layers, and an output layer [10]. Each node,
otherwise known as an artificial neuron, is connected to the others and has a
threshold and weights. The defined threshold is the minimum requirement that
determines whether the node activates and passes data on to the next layer of
the network. If the defined threshold is not met, then no data is sent
on to the next layer in the hierarchy.
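The threshold behaviour described above can be illustrated with a minimal sketch in Python (the function name and the weights and threshold values are illustrative, not taken from the source):

```python
def neuron_output(inputs, weights, threshold):
    """Return 1 (the node fires and passes data on) if the weighted sum
    of the inputs meets the threshold, otherwise 0 (nothing is sent on)."""
    weighted_sum = sum(x * w for x, w in zip(inputs, weights))
    return 1 if weighted_sum >= threshold else 0

# two active inputs whose combined weight meets the threshold: fires
print(neuron_output([1, 0, 1], [0.5, 0.9, 0.4], threshold=0.8))  # 1
# a single weaker input below the threshold: stays silent
print(neuron_output([1, 0, 0], [0.5, 0.9, 0.4], threshold=0.8))  # 0
```

In a trained network the weights are learned rather than hand-picked, and smooth activations usually replace the hard threshold, but the gating idea is the same.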
“When given an input of Win × Win × Din and a number of kernels of spatial size F,
with stride S and padding P, the output volume may be calculated as follows” [14]:

Wout = (Win − F + 2P) / S + 1
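The output width Wout = (Win − F + 2P) / S + 1 can be checked with a few lines of Python (the helper below is an illustration of the formula, not code from the source):

```python
def conv_output_size(w_in, kernel, stride, padding):
    """Spatial output size of a convolution: (Win - F + 2P) / S + 1."""
    return (w_in - kernel + 2 * padding) // stride + 1

# e.g. a 227x227 input with an 11x11 kernel, stride 4 and no padding
print(conv_output_size(227, 11, 4, 0))  # 55
# "same" padding: a 32x32 input with a 5x5 kernel, stride 1, padding 2
print(conv_output_size(32, 5, 1, 2))  # 32
```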
Fully connected layers are flat feed-forward neural network layers [15]. They
might use a non-linear activation function or softmax activation in order to
produce class prediction probabilities [15]. Once the pooling and
convolutional layers have extracted and combined features, the fully connected
layers are applied. The network uses these to produce final non-linear
combinations of features and to make predictions [15].
• LeNet (1998)
• AlexNet (2012)
• GoogLeNet (2014)
• VGGNet (2014)
• ResNet (2015)
The invention of LeNet marked the beginning of the history of deep CNNs. At
the time, CNNs were only capable of performing handwritten digit recognition
tasks, which means they couldn't be applied to other picture classes. AlexNet is
highly regarded in the field of deep CNN architecture since it has produced
groundbreaking achievements in the fields of image recognition and
classification. Initially presented by Alex Krizhevesky, Ilya Sutskever, and Geoff
Hinton, AlexNet has since been enhanced in terms of learning ability by
increasing the depth of the network and using a number of parameter
optimization algorithms. Using the AlexNet architecture as an example, Figure 8
depicts the fundamental design [16].
It was the ResNet CNN architecture developed by Kaiming He et al. that took
first place in the ImageNet Large Scale Visual Recognition Challenge (ILSVRC) 2015
[16]. It achieved a top-five error of just 3.57 percent in the classification challenge.
With 152 layers and more than one million parameters, the network is
considered deep even for CNNs [17]. It would have taken more than 40 days on
32 GPUs to train the network on the ILSVRC 2015 dataset, which is an
extremely long period of time. CNNs are most typically employed to handle
picture classification tasks with 1000 classes; however, ResNet illustrates that
CNNs may also be used to address natural language processing issues such as
sentence completion and machine understanding [17].
ResNet is shown in block diagram form in Figure 10.
Training data helps neural networks learn and increase accuracy over the
course of time. When refined, these learning algorithms are important tools in
artificial intelligence as well as computer science in general, allowing scientists
to rapidly categorise and cluster data. Tasks in speech or image recognition can
take minutes rather than hours when compared to manual identification by human
experts. Google's search algorithm is one well-known example of a neural network.
ANNs can be used for predicting weather, predictive analysis for businesses, for
example SCM forecasting, speech to text transcription, facial and handwriting
recognition, and spam email monitoring. ANNs are adaptive, have an enhanced
learning capacity, degrade gracefully, and use distributed storage across an entire
network rather than a single database. However, ANNs suffer from requiring
profuse quantities of training data, and the architecture does not allow for clear
explanations as to how a result was reached [18].
Using CNNs in computer vision applications has several advantages over other
types of classical neural networks, which are stated as follows [16]:
Neural networks are a vast and growing field with various architectures. For the
sake of this thesis, only the most accurate and commonly used ones were discussed.
2.4 Object Detection and Computer Vision
The most notable one-stage object detection algorithms include the following:
• YOLO (2016)
• SSD (2016)
• RetinaNet (2017)
• YOLOv3 (2018)
• YOLOv4 (2020)
• YOLOR (2021)
The impact of the latest object detection techniques, including the
"Bag-of-Freebies" and the "Bag-of-Specials", on the training of detectors has
also been verified [22].
In the R-CNN procedure, the input image is first split into approximately
two thousand region proposals, and a convolutional neural network is then
applied to each region of the image in turn. Using this information, the
size of the regions is estimated, and the correct region is then fed into
the neural network. It may be reasoned that such an exhaustive approach
imposes a limit on speed. Because R-CNN classifies and constructs bounding
boxes on an individual basis, and because the neural network is applied to
one region at a time, the training duration is considerably longer than with
YOLO [23].
Fast R-CNN, which was developed in 2015 with the objective of greatly
decreasing training time, was then presented. Compared to the original R-CNN,
which computed neural network features on each of up to two thousand regions
of interest separately, Fast R-CNN runs the neural network just once on the
whole image. This is very similar to the design of YOLO, yet YOLO remains a
quicker choice than Fast R-CNN because of the simplicity of its code [23].
2.4.3 Fast R-CNN
The Fast R-CNN algorithm is superior to the original R-CNN algorithm in that it
runs the neural network only once on the entire picture rather than on each of
up to two thousand areas of interest separately, as the previous R-CNN
algorithm did. The final result includes a novel approach known as ROI
Pooling, which slices each ROI out of the network's output tensor, then
reshapes and classifies each ROI according to the category into which it was
classified. Like the original R-CNN, Fast R-CNN generates its region
proposals through a process known as Selective Search.
While Fast R-CNN generated ROIs through the use of Selective Search, Faster
R-CNN incorporates the process of ROI formation within the neural network
itself [24].
Using the object detector, scores are generated for each default box depending
on the presence of each object category in that box, and the default box
is adjusted to better match the shape of the object. In addition,
predictions from a number of feature maps with varied resolutions are
incorporated into the network in order to cope with objects of varying sizes and
shapes [25].
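Matching default boxes against ground-truth objects is commonly done with the intersection-over-union (IoU) measure; a minimal sketch follows (the helper is an illustration of the general technique, not code from the cited source):

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    # overlap rectangle, clamped to zero width/height when boxes are disjoint
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

# two 2x2 boxes overlapping in a 1x1 corner: 1 / (4 + 4 - 1)
print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1/7 ≈ 0.1429
```

A default box whose IoU with a ground-truth box exceeds some threshold (0.5 is typical) is treated as a positive match during training.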
As the name suggests, the spatial pyramid pooling network (SPP-net) is a kind
of CNN that uses spatial pyramid pooling to overcome the fixed-size
restriction on the input and output dimensions of the network. More
specifically, the SPP layer is added on top of the last convolutional layer,
before the final fully connected layers. The SPP layer pools the features and
produces fixed-length outputs, which are then fed into the fully connected
layers; the SPP layer thus sits between the convolutional layers and the fully
connected layers. As a result, information aggregation is performed at a
deeper level of the network's structure, avoiding the need for cropping or
warping at the start of the network's learning stage [27].
2.5 Raspberry Pi
• Pi 1 Model B (2012)
• Pi 1 Model A (2013)
• Pi 1 Model B+ (2014)
• Pi 1 Model A+ (2014)
• Pi 2 Model B (2015)
• Pi Zero (2015)
• Pi 3 Model B (2016)
• Pi Zero W (2017)
• Pi 3 Model B+ (2018)
• Pi 3 Model A+ (2018)
• Pi 4 Model B (2019)
• Pi 400 (2020)
2.6 Python
• Automation
• Data analysis
• Everyday tasks
• Machine learning
• Scripting
• Software prototyping
• Software testing
• Web development
2.7 NumPy
2.8 TensorFlow
For Python and C++, TensorFlow provides a stable API, as well as a non-
guaranteed backward compatible API for other programming languages.
3 Methodology
The setup process will include the following steps, which will be discussed in
their own subchapters.
Now the SD card can be inserted into the Raspberry Pi. Finally, connect the power
supply (at least 3.0 amps) to the Raspberry Pi 4 Model B. Once the device
powers on, a screen similar to Figure 18 should be seen.
Click next. Select your country, language, and time-zone, and then click on Next
once again to proceed.
Enter a new username and password for your Raspberry Pi and then press the
Next button.
Configure your screen such that the Desktop takes up the entire width of your
display.
Connect to your wireless network by selecting the network's name, entering the
password, and clicking Next.
Click on Next, and the process will check for and install any necessary updates
to the Raspberry Pi OS (this might take a little while).
Once the device starts up, open the terminal and then run the commands below
to update the OS.
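The update commands referenced above can be sketched as follows on Raspberry Pi OS (a typical sequence, not the exact listing from the original):

```shell
# refresh the package index and upgrade all installed packages
sudo apt update
sudo apt full-upgrade -y
# reboot so the updated firmware and kernel take effect
sudo reboot
```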
Once those commands have completed, the device should be restarted again.
This ensures that all libraries, firmware and the OS are up-to-date.
pip3 --version
This command shows the installed pip version; the preinstalled version might not
be the most recent one, and it can be upgraded by using the following command:
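Typically this is pip's self-upgrade (a sketch of the usual invocation, not the exact command from the original):

```shell
# upgrade pip itself to the latest release for the Python 3 interpreter
python3 -m pip install --upgrade pip
```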
Before proceeding to the next stage, a number of packages must first be installed
using the command listed below [35]:
The issue arises because the current version of OpenCV is
incompatible with the Raspberry Pi. There are a number of prerequisites that
must be installed via apt-get in order for OpenCV to function properly on the
Raspberry Pi. If running "sudo apt-get update" does not work, try again with the
following command [35]:
If you have already attempted to install OpenCV, simply run the following
command to remove the unwanted package.
pip3 uninstall opencv-python
If there are any remaining issues, check OpenCV's website for which versions are
compatible with the Raspberry Pi that is being used.
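As a sketch, the prerequisite installation and version pinning described above often look like the following on Raspberry Pi OS (package names and the pinned OpenCV version are those commonly listed in community guides, not the exact list from [35]; verify them against the guide being followed):

```shell
# libraries OpenCV's Python wheels commonly depend on
sudo apt-get install -y libhdf5-dev libhdf5-serial-dev libatlas-base-dev \
    libjpeg-dev libtiff5-dev libpng-dev
# pin a release known to work on the Pi (example version)
pip3 install opencv-python==4.5.3.56
```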
Check the Python 3 version on the device, as a different wheel is required for
each version. The Raspberry Pi 64-bit operating system currently ships with
Python 3.7.3. Version 2.2.0 of TensorFlow for Linux should be downloaded and
installed from the official website. Python will release new versions over time,
and the wheel will then require an update. In order to ensure all the
correct libraries are present, enter the following commands in the terminal [36] [37].
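A typical sequence looks like the following; the wheel filename depends on the exact TensorFlow release, Python version and architecture, so the name below is illustrative rather than the exact file:

```shell
# check the interpreter version so a matching wheel can be chosen
python3 --version
# install the downloaded TensorFlow wheel (illustrative filename)
python3 -m pip install tensorflow-2.2.0-cp37-none-linux_aarch64.whl
```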
After completing the installation process successfully, you should see the
screen dump shown below in Figure 19.
34 (44)
TensorFlow Lite is not included in the standard repositories. Instead, Google's
package repository must be used. On the Raspberry Pi, it is required to
add the Google package repository containing TensorFlow Lite. Because the
package sources on the Raspberry Pi were amended, the package list must be
updated to include the newly added repository. Once those steps have
been completed, TensorFlow Lite can be installed using the following
commands in the terminal.
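Per Google's Coral documentation, these steps usually look like the following (repository name and key URL are taken from that documentation; verify them against the current instructions):

```shell
# add Google's package repository and its signing key
echo "deb https://packages.cloud.google.com/apt coral-edgetpu-stable main" | \
    sudo tee /etc/apt/sources.list.d/coral-edgetpu.list
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
# refresh the package list and install the TensorFlow Lite runtime
sudo apt-get update
sudo apt-get install -y python3-tflite-runtime
```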
python3
from tflite_runtime.interpreter import Interpreter
The pre-trained weight file can be downloaded via the command below [38].
wget https://pjreddie.com/media/files/yolov3.weights
Darknet logs the items it detects, its confidence in them, and the time it took to
locate them. Darknet was not compiled with OpenCV, so it cannot display the
detections directly. Instead, it stores them in predictions.png, and the
identified items can be viewed by opening the image. Because Darknet is being
run on the CPU, each image takes around 6-12 seconds to process. It would be
significantly quicker with the GPU version. In Figure 20, the prediction image
generated by the program can be seen.
Then, using the tiny configuration file and weights, run the detector:
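Assuming Darknet was built in the current directory and the tiny weights were downloaded alongside it, the invocation typically looks like this (file paths are illustrative):

```shell
# run the YOLOv3-tiny detector on a sample image
./darknet detect cfg/yolov3-tiny.cfg yolov3-tiny.weights data/dog.jpg
```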
The next step is to configure the Raspberry Pi camera along with setting up
real-time object detection.
First the camera needs to be turned on, which can be seen below in Figure 21.
Run sudo raspi-config and navigate to the main menu of the Raspberry Pi
Software Configuration Tool. Select Interfacing Options.
In the subsequent menu, hit Enter after selecting Enable with the right arrow
key.
In order to initialize the camera and generate a reference for the raw camera
capture, the following code could be used.
from picamera import PiCamera
from picamera.array import PiRGBArray
import time

cam = PiCamera()
cam.resolution = (640, 480)
cam.framerate = 32
rawCap = PiRGBArray(cam, size=(640, 480))  # reference to the raw capture
time.sleep(0.1)  # give the sensor time to warm up
The final steps include writing code that utilizes the object detection algorithm of
choice. One important aspect to consider is the thresh value, which determines
which objects are detected based on the confidence level. Along with
adding capacity tracking, it is possible to also identify persons with masks or
animals and generate statistics from those findings. Those statistics can then be
stored in a database, and visual representations can be generated based on
that data.
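As a sketch of how the thresh value acts on a detector's output (the data and function names are illustrative, not from a specific YOLO implementation):

```python
def filter_detections(detections, thresh=0.5):
    """Keep only detections whose confidence meets the threshold."""
    return [(label, conf) for label, conf in detections if conf >= thresh]

# raw detections as (label, confidence) pairs
raw = [("person", 0.92), ("dog", 0.31), ("person", 0.57)]
kept = filter_detections(raw, thresh=0.5)
print(kept)  # [('person', 0.92), ('person', 0.57)]

# capacity tracking then amounts to counting the surviving "person" boxes
occupancy = sum(1 for label, _ in kept if label == "person")
print(occupancy)  # 2
```

A lower threshold catches more objects at the cost of more false positives; the occupancy counts could then be written to a database for the statistics described above.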
4 Discussion
The original intention of this thesis was to create a prototype that could be used
to monitor the capacity of a selected area. The primary blocker of the project
was the lack of power or resources of the original Raspberry Pi that was used.
After installing YOLOv3 along with YOLOv3 Tiny, it was clear that the resources
of the Raspberry Pi 3 model B (1 GB RAM) were exhausted. After numerous
attempts, a Raspberry Pi 4 model B (4 GB RAM) was borrowed in order to test
whether YOLOv3 could run on it. At this point, there was some success but
nothing viable. YOLOv3 Tiny thankfully produced results and was able to read
an image along with real-time video. The processing quality and speed were
lacking, however. YOLOv4 was also installed, and some tests were run,
but again, the Raspberry Pi did not have enough resources to run it. At this
point, it was clear that the current Raspberry Pis could not run YOLO efficiently.
One option is to utilize an Intel® Neural Compute Stick 2 (Intel® NCS2). The
Intel® NCS2 is based on the Intel® Movidius™ Myriad™ X VPU, which
features 16 configurable SHAVE cores and a specialised neural compute engine
for accelerating deep neural network inference in hardware. This would,
however, be a much more complex solution for capacity monitoring using object
detection algorithms, as well as require a different setup. The setup with the
greatest performance includes wired or wireless cameras that feed real-time
video to a central server or computer(s) equipped with an NVIDIA GPU, on
which the scripts and YOLO models run.
5 Conclusion
Object identification utilising deep learning and neural networks has made
huge strides in the last several years, and the topic is currently quite popular.
New research articles, algorithms, and bug fixes are being published
more frequently than ever before.
The purpose of this thesis was to explore various object detection algorithms,
discuss artificial intelligence, machine learning, neural networks and
convolutional neural networks, and explain how to set up YOLOv3 on a
Raspberry Pi. With all this collective theory and the described solution, one
should be able to effectively monitor traffic and the capacity of an
establishment. A business may
easily comply with government limitations by utilising YOLO. Another
conceivable application is to determine how prevalent the usage of face masks
is in the establishment. Other possible applications for the technology include rush
hour prediction, which allows patrons to see how busy the establishment is
before arriving, as well as knowing whether there are any animals in the
establishment if they suffer from respiratory issues and analysing the popularity
of each entrance/exit for safety planning. With all these possibilities, the
technology can be utilized by both the business and the patrons.
Even though the project was not entirely successful, valuable conclusions were
still drawn from the theory and research done during the thesis. One can use
those learnings to implement a better, faster solution for capacity monitoring in
the future.
References
3 Koza JR, Bennett FH, Andre D, Keane MA. Automated design of both the
topology and sizing of analog electrical circuits using genetic
programming. Artificial Intelligence in Design ’96. 1996;:151–70.
4 Edgar TW, Manz DO. Research methods for cyber security. Cambridge,
MA: Syngress, an imprint of Elsevier; 2017.
10 Team TAIE. Main types of neural networks and its applications - tutorial
[Internet]. Towards AI. 2022 [cited 2022Apr30]. Available from:
https://towardsai.net/p/machine-learning/main-types-of-neural-networks-
and-its-applications-tutorial-734480d7ec8e
11 van Veen F. Fjodor van Veen, author at the Asimov Institute [Internet]. The
Asimov Institute. 2017 [cited 2022Apr30]. Available from:
https://www.asimovinstitute.org/author/fjodorvanveen/
37 EdjeElectronics. Tensorflow-lite-object-detection-on-android-and-
raspberry-pi [Internet]. GitHub. 2020 [cited 2022May1]. Available from:
https://github.com/EdjeElectronics/TensorFlow-Lite-Object-Detection-on-
Android-and-Raspberry-Pi/blob/master/Raspberry_Pi_Guide.md