A Project Report on
BACHELOR OF ENGINEERING
IN
COMPUTER SCIENCE AND ENGINEERING
Submitted by,
Jyothsna P 1JS16CS045
Cauvery A 1JS16CS031
Bhargav Hegde 1JS16CS027
Akshitha Y V 1JS16CS010
CERTIFICATE
This is to certify that the project work entitled “Garbage Segregation System using Image Processing” has been carried out by JYOTHSNA P (1JS16CS045), CAUVERY A (1JS16CS031), BHARGAV HEGDE (1JS16CS027), and AKSHITHA Y V (1JS16CS010) in partial fulfilment of the requirements for the degree of Bachelor of Engineering in Computer Science and Engineering of Visvesvaraya Technological University, Belagavi, during the academic year 2019-2020. It is certified that all corrections and suggestions indicated during internal assessment have been incorporated in the report. The project report has been approved as it satisfies the academic requirements in respect of project work prescribed for the award of the degree of Bachelor of Engineering.
1. …………………………… ………………………………
2. …………………………… ………………………………
JSS MAHAVIDYAPEETHA, MYSURU
DECLARATION
We hereby declare that the entire work embodied in this project report titled “Garbage Segregation System using Image Processing”, submitted to Visvesvaraya Technological University, Belagavi, was carried out at the Department of Computer Science and Engineering, JSS Academy of Technical Education, Bengaluru, under the guidance of Dr. Naveen N C, Professor & Head. This report has not been submitted for the award of any Diploma or Degree of this or any other University.
To solve this problem, this project aims to automate the task of waste segregation. The attempt is to segregate waste using machine learning algorithms and image processing to identify the different types of waste automatically, and to send the waste to the appropriate bin with the help of an IoT-based robotic model. [2]
ACKNOWLEDGEMENT
We express our humble gratitude to His Holiness Jagadguru Sri Sri Sri Shivarathri Deshikendra Mahaswamiji, who has showered his blessings on us for shaping our career successfully.
The completion of any project involves the efforts of many people. We have been lucky
enough to have received a lot of help and support from all quarters during the making of
this report, so with gratitude, we take this opportunity to acknowledge all those whose
guidance and encouragement helped us emerge successful.
We are thankful to the Karnataka State Council for Science and Technology (KSCST) for giving us the opportunity to showcase the project and for funding the robotic arm. We are grateful to the JSS Management for their support in financing this endeavour and for helping the Department of Computer Science and Engineering, JSS Academy of Technical Education, Bangalore, make a distinct impression.
We are thankful for the resourceful guidance, timely assistance and gracious support of our guide, Dr. Naveen N C, Head of the Department of Computer Science and Engineering, who has helped us in every aspect of our project, and for the facilities and support extended towards us. We express our sincere thanks to our beloved principal, Dr. Mrityunjaya V Latte, for having supported us in our academic endeavours.
Last but not least, we express our heartfelt thanks to all the teaching and non-teaching staff of the CSE department and to our friends who have rendered their help, motivation and support.
JYOTHSNA P
CAUVERY A
BHARGAV HEGDE
AKSHITHA Y V
TABLE OF CONTENTS
Abstract
Acknowledgement
List of Figures
Abbreviations Used
Chapter 1 Introduction
1.1 Overview
1.2 Motivation
1.3 Objectives
1.4 Organization
2.9 You Only Look Once: Unified, Real-time Object Detection
4.1.2 Pi Camera
4.2.2 Darknet
4.2.3 YOLO
4.2.4 OpenCV
4.2.5 PIGPIO
5.2 Data Flow Diagram
6.1 Introduction
6.3.1 Installation
6.3.3 Tiny-YOLOv3
6.4 Pseudocode
6.5 Training
6.6 Testing
8.1 Conclusion
References
Appendix
LIST OF FIGURES
2.11 Darknet-53
4.3 Robotic Arm
LIST OF TABLES
ABBREVIATIONS
Chapter 1
INTRODUCTION
1.1 Overview
Garbage is waste material that is discarded by humans, usually due to a perceived lack of utility. It does not include bodily waste products, purely liquid or gaseous wastes, or toxic waste products. Garbage is commonly sorted and classified for specific kinds of disposal; technically, the term refers to putrescent organic matter. Garbage should be discarded in ways that allow it to end up in the environment in as eco-friendly a manner as possible. Garbage segregation means dividing waste into wet and dry. Garbage collected in each area is to be separated before treating or reusing it for any other purpose.
1.2 Motivation
Bengaluru has a major garbage problem, and the municipal corporation mandates that households segregate their waste into wet and dry waste. Unfortunately, this basic waste segregation is not being followed. There is a need for an effective system that can ensure proper segregation of waste. The system must be able to perform the task of separating garbage appropriately in an automated fashion.
1.3 Objectives
The project aims to develop a garbage segregation technology using image processing with the following objectives:
1.4 Organization
The report consists of 8 chapters. Chapter 2 consists of Literature Survey which
includes the abstract of papers studied before starting the project. Chapter 3 consists of
Comparative Research for the selection of system requirements. Chapter 4 contains System
Requirements which consists of both software and hardware components used. Chapter 5
explains the System Architecture along with the block diagrams. Chapter 6 gives the detailed
explanation of the implementation of the project. Chapter 7 contains the results of the project
and Chapter 8 gives the conclusion and future enhancements on the project.
Chapter 2
LITERATURE SURVEY
During 2014, the common method of waste disposal was unplanned and uncontrolled open dumping at landfill sites. This method is injurious to human health and to plant and animal life. There is a dependency on rag pickers for collecting the recyclable waste from the dump, which is not an efficient way of segregating. When waste is segregated into basic streams such as wet, dry and metallic, there is a higher potential for recovery, recyclability and reusability. Large-scale industrial waste segregators were already present; in these, larger items were removed by manual sorting, and the rest was passed through rotating drums with perforated holes of a certain size.
In the proposed system, the Automated Waste Segregator (AWS), proximity sensors were used to detect incoming waste and start the entire system. The waste falls on a metal detection system which detects metallic waste. After this, the object falls into a capacitive sensing module, which distinguishes between wet and dry waste. After identification, the circular base which holds the containers is rotated to collect the corresponding waste.
The differentiation of wet and dry waste is done using the dielectric constant. When a dielectric is introduced between the plates of a capacitor, the capacitance increases. Wet waste has a higher relative dielectric constant than dry waste because of the moisture, oil and fat content present in organic waste. A threshold is set; if the value is greater than the threshold, the waste is inferred to be wet, otherwise it is dry.
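As a minimal illustrative sketch of this threshold rule (not code from the surveyed paper; the reading scale and threshold value are assumptions), the decision could be expressed as:

```python
# Hypothetical relative-capacitance threshold; the units and value are
# assumptions, not taken from the surveyed paper.
WET_THRESHOLD = 1.5

def classify_by_capacitance(reading: float) -> str:
    """Wet waste raises the capacitance (higher dielectric constant),
    so readings above the threshold are inferred to be wet waste."""
    return "wet" if reading > WET_THRESHOLD else "dry"

print(classify_by_capacitance(2.1))  # -> wet
print(classify_by_capacitance(1.1))  # -> dry
```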
The dataset was created by hand by the authors. It contains images of recycled objects like paper, glass, plastic, metal, cardboard, etc. Data augmentation techniques were performed due to the small size of each class. SVM was used for the classification of trash into categories; the features used for the SVM were SIFT features. The Torch7 framework was used to construct the CNN, an eleven-layer network very similar to AlexNet. The CNN was trained with a train-val-test split of 70-13-17 for 60 epochs with a batch size of 32. Only single objects were supposed to be given as input for classification.
Test accuracy with the SVM was 63% and that of CNN was only 22%. To improve
the CNN results, more data should have been collected and used. The system with 22%
accuracy wasn’t practical. Identification of multiple objects from a single image or video is
required for practical implementation.
The proposed system consists of two discs, a rotating disc and a stationary disc. Bins are placed between the discs. All the sensors are placed on the rotating disc at specified spots. An IR proximity sensor detects the waste when it is placed and starts the system. A moisture sensor detects wet waste. If the waste is not wet, it is passed to the next sensor, i.e., an inductive proximity sensor for metal detection. The last module has a laser-LDR circuit that distinguishes plastic from paper: if the laser passes through the trash, it is detected as plastic; otherwise it is paper. Once the detection is done, the waste is pushed to the respective bin with the help of servo motors.
Even though this system could be used on a large scale, there are limitations. These include the size of the trash, which should have a minimum width of 30 mm. The system can also segregate only one type of waste at a time. Further improvements are required for it to be used in public places: the system would have to be constructed to suit that scale, and more sensors would be needed to detect more kinds of objects and multiple objects at the same time.
The proposed idea was to create an autonomous system which segregates the waste using a CNN algorithm in machine learning. The algorithm detects and classifies the waste according to the dataset provided to the CNN. The algorithm classifies the waste as biodegradable or non-biodegradable. The waste material is recognized based on the shape and size of the objects. A Raspberry Pi is used to capture the image and to push the classified object to the respective bin.
The proposed solution empowers waste-monitoring personnel by notifying them when the fill level or safe gas emission levels are surpassed. The proposed smart bin consists of a number of sensors. Once the bin is filled, it automatically moves to the garbage collecting area, disposes of the waste and returns to its place. A rain sensor is used to sense rain and close the bin automatically; a gas sensor is used to detect the gas level and to alert passers-by with a buzzer. All these sensors are connected to an Arduino UNO. A DC motor is used to move around, and the process is repeated by means of a microcontroller.
The paper proposes an IoT-based smart waste segregation and management device which detects the waste using sensors; the information is directly transferred to a cloud database via IoT. A microcontroller is used as a mediator between the sensors and the IoT system. A number of sensors were used to detect the waste class: an ultrasonic sensor to detect the presence of waste, a moisture sensor to detect the moisture content in the waste, and a metal sensor to separate metal items from the rest of the waste. Image processing is used to identify other plastics and degradable items. The dustbin data were uploaded to a cloud database.
The SURF algorithm is used for feature extraction from the images, and the KNN algorithm is used to compare the test image with the dataset images. The accuracy of this system was 99%. The limitation of the system was that only one waste material could be put into the dustbin at a time. The quality of the image decides the classification accuracy, and if the placed material is not part of the database, the result is affected. Due to these limitations, this system was not fit to be used on a large scale.
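To illustrate the feature-matching idea in code, the sketch below uses OpenCV's freely available ORB detector with a brute-force kNN matcher as a stand-in for the SURF + KNN pipeline described in the paper (SURF itself is only available in the opencv-contrib xfeatures2d module); the image paths are placeholders:

```python
import cv2

# ORB features with a brute-force kNN matcher, used here as a stand-in
# for the SURF + KNN pipeline; image paths are placeholders.
query = cv2.imread("test_waste.jpg", cv2.IMREAD_GRAYSCALE)
reference = cv2.imread("dataset_item.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=500)
kp1, des1 = orb.detectAndCompute(query, None)
kp2, des2 = orb.detectAndCompute(reference, None)

# k = 2 nearest neighbours per descriptor, filtered with Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des1, des2, k=2)
good = [m for m, n in matches if m.distance < 0.75 * n.distance]

print(len(good), "good matches against this reference image")
```

The reference image with the most good matches would then be taken as the predicted class, mirroring the nearest-neighbour comparison described above.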
The proposed system is based on a robotic assembly and machine learning based classification. The robotic arm is used to move the object from one place to the classifier platform. The robot senses the object using ultrasonic sensors to calculate the distance to the target object. It uses a CNN-based classifier for identifying the class, and a robotic arm driven by servo motors for handling the waste. An Arduino board is used for controlling the assembly.
The proposed idea was to use an STM32 controller to create an eco-friendly waste
segregator that would segregate between biodegradable, metal and plastic waste. The waste
that is being dropped into the smart dustbin is segregated right at the source, thus preventing
any improper waste management in the future. This method is proposed to be used in primary
waste generation locations. This model is equipped into the dustbin itself.
Compared with previously used waste segregation technologies, the model shows the following advantages:
The model segregates any waste dropped into the bin at the panel with the help of sensors; the corresponding valves on the segment are opened and the waste is dumped into its respective segment. Furthermore, alarming levels of microbe activity in the biodegradable segment are controlled with sensors and chemical treatment. The sensors in this segment include gas sensors for the methane gas produced, and the system even controls the odour of the gas. The system is equipped with a Wi-Fi module for connectivity with a data service that continuously monitors the threshold levels of the waste in the bin. Once a threshold level is reached, it alerts that the garbage needs to be disposed of. Similarly, for metal and plastic waste, inductive and capacitive sensors are used. Metal objects dumped are coated with an acrylic coating to prevent reactivity. Overall, the bins are equipped with level sensors to indicate the collected waste levels and send status messages by means of the Wi-Fi module. Further appropriate disposal of each segment can then be carried out.
The system is as shown above, and while it considers many aspects of waste collection and segregation in a timely manner, this model is not practical with different mixes of waste at the same time, as it can only detect one type of waste in a segment at a time.
2.9 You Only Look Once: Unified, Real-Time Object Detection [19]
This paper was presented by Joseph Redmon, Santosh Divvala, Ross Girshick, Ali
Farhadi from University of Washington and published in 2016.
This paper develops a new approach to object detection in the form of an algorithm of
the name You Only Look Once (YOLO) for real-time object detection using a 24-layered
convolutional neural network. YOLO frames the object detection as a regression problem to
spatially separated bounding boxes and associated class probabilities instead of repurposing
classifiers. By this method, the neural network passes the image through it only once and
performs the detection by predicting bounding boxes and class probabilities directly from the
images. It is a single network pipeline and therefore optimized for fast detection performance.
The paper also analyses the errors made by the detector by examining the top detections for each category at test time. Each prediction is either correct or it is classified based on the type of error:
The breakdown of each error type is shown in the figure, across all 20 classes of VOC-2007.
Most of the errors committed by YOLO are localization errors, while Fast R-CNN makes fewer of these but far more background errors: 13.6% of its top detections are false positives that do not contain any objects. This is almost completely avoided in YOLO, which is therefore much more effective for the use case of waste detection in a heap pile. Fast R-CNN is almost 3 times more likely to predict background detections than YOLO.
In terms of real-time detection, YOLO has only two other competitors, both of which are Deformable Parts Models (DPM), as shown in the table below.
Fast YOLO is the fastest detector on record for PASCAL VOC detection and is still twice as accurate as any other real-time detector. YOLO is 10 mAP more accurate than the fast version while still well above real-time in speed. YOLO achieves a 6 times faster speed than the VGG-16 version of Faster R-CNN, and 2.5 times higher speed along with more accuracy against the Zeiler-Fergus Faster R-CNN.
Many design changes made to the original version of YOLO resulted in a much more accurate YOLO network. Here, a newly trained feature-extraction network proves to be the biggest change. At 320×320, YOLOv3 runs in 22 ms at 28.2 mAP, which is as accurate as the Single Shot Detector (SSD) but 3x faster.
One of the many design changes include the way that bounding boxes are predicted.
An objectness score is predicted for each bounding box using logistic regression, and it will
maximise at 1 if the bounding box prior overlaps a ground truth object more than any other
bounding box prior. Another design change would be class prediction which now uses binary
cross-entropy loss. Independent logistic classifiers replaced softmax for better performance.
Another feature to this version would be prediction across 3 different scales and feature
extraction across these scales, similar to feature pyramid networks. This resulted in addition
of several convolutional layers. The final layer in these predicts a 3-d tensor encoding
bounding box, objectness and class predictions. For feature extraction, a new network is used which takes a hybrid approach combining the network used in YOLOv2 (Darknet-19) and residual networks. This new network is comprised of successive 3 x 3 and 1 x 1 convolutional layers with some shortcut connections as well. This makes the network much larger, at 53 convolutional layers, so it is called Darknet-53.
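As an illustrative sketch of the building block described above (not the actual Darknet source), a Darknet-53-style residual unit, a 1 × 1 reduction followed by a 3 × 3 convolution with a shortcut connection, could be written in PyTorch as:

```python
import torch
import torch.nn as nn

class DarknetResidual(nn.Module):
    """One Darknet-53-style residual unit: 1x1 reduce, 3x3 expand, shortcut add."""
    def __init__(self, channels: int):
        super().__init__()
        self.reduce = nn.Sequential(
            nn.Conv2d(channels, channels // 2, kernel_size=1, bias=False),
            nn.BatchNorm2d(channels // 2),
            nn.LeakyReLU(0.1),
        )
        self.expand = nn.Sequential(
            nn.Conv2d(channels // 2, channels, kernel_size=3, padding=1, bias=False),
            nn.BatchNorm2d(channels),
            nn.LeakyReLU(0.1),
        )

    def forward(self, x):
        # Shortcut connection: add the block input back to its output.
        return x + self.expand(self.reduce(x))

block = DarknetResidual(64)
out = block(torch.randn(1, 64, 52, 52))   # shape is preserved: (1, 64, 52, 52)
```

Darknet-53 stacks such units between strided convolutions that halve the spatial resolution.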
This network proves to be much more powerful than Darknet-19 but still more
efficient than ResNet-101 or ResNet-152. The results shown are measured on ImageNet in
terms of accuracy, Billions of Operations, Billion floating-point operations per second and
FPS for various networks.
All networks are trained with identical settings and tested at 256 x 256, single crop
accuracy. Thus Darknet-53 performs on par with state-of-the-art classifiers but with fewer
floating-point operations and more speed. This network structure better utilizes the GPU,
making it more efficient to evaluate and thus faster.
Thus, with these changes made, and a larger network with more efficient designs,
YOLOv3 runs significantly faster than other detection methods with comparable
performance. Times are from either an M-40 or Titan X which are similar GPUs.
A smaller version of the more efficient YOLOv3 is also constructed in order to avoid usage
of GPUs when testing, and faster detection with lower usage of disk space. This is the version
the proposed garbage detection model uses that is loaded into the Raspberry Pi and detects
the waste.
Chapter 3
COMPARATIVE RESEARCH
The motivation behind the CNN is that it is based on the way the visual cortex functions, where one object in the scene is in focus while the rest is blurred; similarly, the CNN takes one section or window of the input image at a time for classification. Each time, the CNN produces a feature map for each section in the convolutional layer. In the pooling layer, it removes the excess features and takes only the most important features for that section, thereby performing feature extraction. Hence, with the use of CNNs we do not have to perform an additional feature extraction technique.
There are two types of Pooling: Max Pooling and Average Pooling. Max
Pooling returns the maximum value from the portion of the image covered by the Kernel. On
the other hand, Average Pooling returns the average of all the values from the portion of the
image covered by the Kernel. Adding a Fully-Connected layer is a (usually) cheap way of
learning non-linear combinations of the high-level features as represented by the output of the
convolutional layer. The Fully-Connected layer is learning a possibly non-linear function in
that space.
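A small NumPy sketch makes the difference concrete, pooling a 4 × 4 feature map with a 2 × 2 kernel and stride 2:

```python
import numpy as np

# A 4x4 feature map pooled with a 2x2 kernel and stride 2.
fmap = np.array([[1, 3, 2, 0],
                 [5, 6, 1, 2],
                 [7, 2, 9, 4],
                 [0, 1, 3, 8]], dtype=float)

def pool(x, op):
    h, w = x.shape
    return np.array([[op(x[i:i + 2, j:j + 2])
                      for j in range(0, w, 2)]
                     for i in range(0, h, 2)])

print(pool(fmap, np.max))   # max pooling:     [[6. 2.] [7. 9.]]
print(pool(fmap, np.mean))  # average pooling: [[3.75 1.25] [2.5 6.]]
```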
The selective search method is used to generate region proposals. This algorithm is based on computing a hierarchical grouping of similar regions based on colour, texture, size and shape compatibility. Selective search starts by over-segmenting the image based on the intensity of the pixels. These over-segments are used as initial seeds; all bounding boxes corresponding to the segmented parts are added to the list of region proposals, and adjacent segments are grouped based on similarity. These steps are repeated to form larger segments. This selective search method creates nearly 2000 different regions.
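OpenCV ships an implementation of this algorithm in its contrib modules; a minimal sketch, assuming the opencv-contrib-python package and a placeholder image path, looks like this:

```python
import cv2

# Requires the opencv-contrib-python package (ximgproc module).
img = cv2.imread("waste_scene.jpg")        # placeholder image path

ss = cv2.ximgproc.segmentation.createSelectiveSearchSegmentation()
ss.setBaseImage(img)
ss.switchToSelectiveSearchFast()           # faster but coarser grouping strategy

rects = ss.process()                       # each proposal is (x, y, w, h)
print(len(rects), "region proposals generated")   # typically on the order of 2000
```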
3.2.2 CNN
Each region proposal is taken individually and a feature vector representing this
image is created using Convolutional Neural Network (CNN).
3.2.3 SVM
Once the feature vector is created, we need to identify the classes each object belongs
to. SVM is a classifier used for this purpose. SVM gives the output as confidence levels.
In the paper “Fast R-CNN” by R. B. Girshick, the author tries to solve these problems. The different parts of R-CNN are combined into one architecture.
1. The whole image is processed by the CNN, which gives the feature map as output.
2. Using this feature map, the corresponding part, called the region proposal feature map, is extracted for each region proposal. A pooling layer is used to resize all region proposal feature maps to a fixed size.
3. This fixed-size region proposal map is flattened into a feature vector whose size does not change.
This part has two fully connected layers with two outputs: the first is a softmax layer which decides the object class, and the second is a bounding box regressor, whose output is the bounding box coordinates for each object class.
The paper “Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks”, by Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Sun, tries to solve this problem by using a different approach from selective search. Faster R-CNN lets the network learn the region proposals.
Like Fast R-CNN, the feature map is provided by the CNN. To identify the region proposals, a separate network is used to predict them. The predicted region proposals are reshaped using an RoI (Region of Interest) pooling layer and then used to classify the image within the proposed region and to predict values for the bounding boxes.
The overall process of YOLO is very simple. YOLO takes an input image. Then the
image is divided into grids. Image classification and localization are applied on each grid.
A single neural network which predicts bounding boxes and class probabilities
directly from full images in one evaluation is being explained.
3.5.1 Model
The input image is divided into an S × S grid. B bounding boxes are defined in each grid cell, each with a confidence score. The confidence is the probability that an object exists in the box:
C = Pr(Object) * IOU(truth, pred)
Where IOU is Intersection Over Union, truth is the ground truth and pred is the prediction. The intersection is the overlapping area between the predicted box and the truth box, and the union is the total area covered by both the predicted and truth boxes. The IOU should be close to 1, which indicates that the predicted bounding box is close to the truth.
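A minimal sketch of computing the IOU for two axis-aligned boxes, assuming (x1, y1, x2, y2) corner co-ordinates (a format chosen here for clarity):

```python
def iou(box_a, box_b):
    """Intersection over Union for two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping region.
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])

    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0

print(iou((0, 0, 10, 10), (5, 5, 15, 15)))   # 25 / 175 ≈ 0.143
```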
These predictions are encoded as a tensor of fixed dimension, defined as: S × S × (B * 5 + C)
This network is inspired by GoogLeNet model for image classification. This network
has 24 convolutional layers followed by 2 fully connected layers.
For pre-training we use the first 20 convolutional layers, average-pooling layer and a
fully connected layer. Final layer predicts class probabilities and bounding box coordinates.
Bounding box height and width are to be normalized by image height and width so that they
fall between 0 and 1.
A linear activation function is used for the final layer, and all other layers use the leaky rectified linear activation, which is defined as:
φ(x) = x,    if x > 0
φ(x) = 0.1x, otherwise
YOLO predicts multiple bounding boxes per cell. One predictor is assigned to predict an
object based on the highest current IOU with ground truth.
The loss function is used to correct the center and bounding box of each prediction. λcoord
and λnoobj variables are used to increase emphasis on boxes with objects and lower the
emphasis on boxes with no objects. C refers to confidence and p(c) refers to classification
prediction.
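For reference, the multi-part sum-of-squared-errors loss from the original YOLO paper, with the λcoord and λnoobj weights described above, can be written as:

```latex
\begin{aligned}
\mathcal{L} ={}& \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[(x_i-\hat{x}_i)^2 + (y_i-\hat{y}_i)^2\right] \\
 &+ \lambda_{coord} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj}
      \left[\left(\sqrt{w_i}-\sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i}-\sqrt{\hat{h}_i}\right)^2\right] \\
 &+ \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{obj} \left(C_i-\hat{C}_i\right)^2
  + \lambda_{noobj} \sum_{i=0}^{S^2}\sum_{j=0}^{B} \mathbb{1}_{ij}^{noobj} \left(C_i-\hat{C}_i\right)^2 \\
 &+ \sum_{i=0}^{S^2} \mathbb{1}_{i}^{obj} \sum_{c \in \text{classes}} \left(p_i(c)-\hat{p}_i(c)\right)^2
\end{aligned}
```

Here the indicator 𝟙_ij^obj is 1 when predictor j in cell i is responsible for an object, and the hatted quantities are the network's predictions.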
YOLO has been improved with different versions such as YOLOv2 or YOLOv3.
Condensed versions of these versions are also available which are smaller and faster than
their original versions.
Arduino is very easy to get started for real-time applications. Not much programming
practice and experience is necessary to get started with working with Arduino. With RPi,
accessing hardware is not real-time. Sometimes there is a delay. There is no built-in Analog
to Digital converter.
RPi hardware is not open source. On the other hand, the Arduino is not as powerful as the RPi, only languages like C/C++ can be used to program it, and connecting it to the Internet is difficult. If a project requires Internet connectivity and less hardware interaction, and is slightly complex on the software side, then the RPi is the best option. If we need or want to use languages other than C/C++, then the RPi is the better option. [27]
The input/output (I/O) interface is also a crucial part of the CPU; it is sometimes included in the control unit. [28]
The CPU provides address, data and control signals, while it receives instructions, data, status signals and interrupts, all of which are carried with the help of the system bus. A system bus is a group of various buses such as the address, control and data buses. The CPU assigns more hardware to fast caches and less to computation, unlike the GPU.
Modern GPUs are capable of performing research and analysis tasks, often surpassing CPUs because of their extreme parallel processing. In the GPU, several processing units are stripped together, where no cache coherency exists.
Chapter 4
SYSTEM REQUIREMENTS
System requirements are the hardware and software configuration that a system must have for the proposed idea to run easily and proficiently.
Specifications
Processor: 64-bit SoC @ 1.4GHz
Memory: 1GB SDRAM
Connectivity: 2.4GHz and 5GHz wireless LAN
4 × USB 2.0 ports
Access: Extended 40-pin GPIO header
SD card support: Micro SD format
Camera modules which are official products from Raspberry Pi Foundation are called
Pi cameras. Camera module is used to take high-definition video and photographs. Raspberry
Pi Module v2 is the camera module used in this project. The v2 camera module has a Sony
IMX219 8-megapixel sensor. It supports 1080p30, 720p60 and VGA90 video modes and still
captures.
The robotic arm used in this project is “Aluminium Alloy 4 DOF Manipulator
Steering Gear Bracket Mechanical Paws”. It is a 4-DOF manipulator designed from multiple
servo sets.
Arm Parameters
Servo motors are DC motors that allow for precise control of the angular position.
The servo motors have a revolution cut-off from 90˚ to 180˚. The servo motors used in this
project are MG996R. It is a metal gear servo motor with features mentioned below:
Current: 2.5A(6V)
Rotation: 0˚ - 180˚
It has 3 wires:
1. Brown – the ground wire, connected to the ground of the supply
2. Red – the power wire, connected to the +5V supply
3. Orange – the PWM signal is given through this wire to drive the motor
Colab supports Python 2.7 and 3.6. There is a limit to the session duration and size of Colab notebooks. It is easy to work with deep learning libraries like PyTorch, Keras, TensorFlow and OpenCV using Google Colab.
Google Colab provides a free NVIDIA Tesla K80 GPU with about 12 GB of memory. The GPU provides 1.8 TFlops and has 12 GB of RAM. GPU allocation is restricted to 12 hours at a time per user. We can connect the session to Google Drive as external storage.
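For example, mounting Google Drive inside a Colab notebook uses the google.colab helper module:

```python
# Mount Google Drive as external storage inside a Colab session.
from google.colab import drive

drive.mount('/content/drive')
# The Drive contents then appear under the mounted folder, where datasets,
# configuration files and trained weights can be read and written.
```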
Darknet displays information as it loads the config file and weights then it classifies
the image and prints top classes for the image. This framework can be used to run RNN.
RNNs are powerful models for representing data that changes over time and Darknet can
handle them without using CUDA or OpenCV. The framework also allows its users to
venture into game-playing neural networks.
YOLO makes predictions with a single network evaluation. It is more than 1000x
faster than R-CNN and 100x faster than Fast R-CNN.
OpenCV is an open source computer vision and machine learning software library. It
was built to provide a common infrastructure for computer vision applications. OpenCV was
originally developed by Intel. OpenCV is written in C++ and its primary interface is in C++.
There are bindings in Python, Java and MATLAB.
OpenCV runs on Windows, Linux, macOS, etc. It can also run on mobile operating
systems like Android, iOS and BlackBerry 10. The OpenCV library has more than 2500 optimized
algorithms. It includes a comprehensive set of both classic and state-of-the-art computer
vision and machine learning algorithms. These algorithms can be used to detect and
recognize faces, identify objects, track camera movements, track moving objects, etc.
pigpio is a Python module for the Raspberry Pi to control the GPIO. The pigpio Python module can run on Windows, Mac, or Linux machines and can control one or more Pis. It is useful for creating and transmitting precisely timed waveforms, reading and writing GPIO, and setting their modes. Transmitted waveforms are accurate to a microsecond. This module uses the services of the C pigpio library, and the pigpio daemon must be running on the Pi(s) whose GPIO are to be manipulated.
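A minimal sketch of driving one of the arm's servos with pigpio is shown below; the GPIO pin number and pulse widths are assumptions for illustration, and the pigpio daemon must already be running on the Pi:

```python
import time
import pigpio

SERVO_GPIO = 18                  # assumed BCM pin wired to the servo signal line

pi = pigpio.pi()                 # connect to the local pigpio daemon
if not pi.connected:
    raise RuntimeError("pigpio daemon is not running")

# Typical hobby-servo pulse widths: ~500 us (0 deg) to ~2500 us (180 deg).
pi.set_servo_pulsewidth(SERVO_GPIO, 500)     # move towards 0 degrees
time.sleep(1)
pi.set_servo_pulsewidth(SERVO_GPIO, 1500)    # move to the mid position
time.sleep(1)

pi.set_servo_pulsewidth(SERVO_GPIO, 0)       # a width of 0 switches the pulses off
pi.stop()
```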
Chapter 5
SYSTEM ARCHITECTURE
Data flow diagrams can be of two types: a logical DFD and a physical DFD. Figure 4.2 shows the physical DFD of the proposed system. It describes the processes in detail and shows all processes regardless of whether they are manual or automated.
Here the user, system and arm are considered as actors that interact with each other
for the final output.
Chapter 6
IMPLEMENTATION
6.1 Introduction
This chapter covers the methodology and the practical implementation of our proposed
model. In order to bring the thesis to life, we have several factors to consider before we can
implement the system. The mapping out of these stages is known as the methodology. It can
be dissected into many modules or steps, such as:
The implementation stage consists of proper planning, detailed study of the existing
system and its constraints and designing of any alternative methods and their evaluation.
For this project, the selection of the programming language is crucial, as we require a language that is optimal for both remote computation of the YOLO weights and Raspberry Pi configuration. The ideal choice was therefore Python.
6.2.1 Python
The main reasons for using Python in our proposed system are:
2. OpenCV library & Darknet frameworks are crucial features and effortlessly accessible in
Python.
As mentioned earlier, Darknet can be used with OpenCV and CUDA. For this project, we have made use of both features. The framework is written in C, but it integrates well with Python.
6.3.1 Installation
The process of installation is straightforward and done through an open source git
repository created by Joseph Redmon. For our needs however, an updated version of the
repository created by AlexyAB has been used as this supports Windows platforms as well as
streamlining the training and validation process.
Once the base system framework of Darknet has been compiled, we can make use of
the tiny yolo model present.
In order to use the tiny-yolov3 algorithm, a dataset of the types of waste needed to be
segregated has to be compiled. This dataset consists of at least 1000 different images of one
type or class of waste. We have compiled our dataset by scraping the web as well as making
use of Kaggle, a hub of open source datasets for the world of Machine Learning.
Our model attempts to segregate the recyclable waste into 5 classes, namely: Wood,
Metal, Paper, Plastic and Glass. The waste in the dataset of over 7000 images is accurately
labeled and saved for the model to read. Organic waste is for the most part ignored and not segregated; hence we have no use for images of that kind.
This dataset, once labeled is split into training and test sets at a ratio of 90-10 and
loaded into the darknet framework for training.
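A minimal sketch of this 90-10 split, producing the plain-text image lists that the darknet framework reads during training (the directory layout and file names are assumptions):

```python
import glob
import random

# Collect all labeled images; the directory name is an assumption.
images = glob.glob("dataset/images/*.jpg")
random.seed(42)
random.shuffle(images)

split = int(0.9 * len(images))               # 90% train, 10% test
train, test = images[:split], images[split:]

# Darknet expects plain-text files with one image path per line.
with open("train.txt", "w") as f:
    f.write("\n".join(train))
with open("test.txt", "w") as f:
    f.write("\n".join(test))

print(len(train), "training images,", len(test), "test images")
```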
6.3.3 Tiny-YOLOv3
From the many algorithms available on the darknet framework, we have chosen Tiny-
Yolov3 as it is a succinct neural network of 24 convolutional layers and 2 yolo layers.
The tiny version of YOLO allows us to make garbage detection on a small processing
unit such as a Raspberry Pi 3 and therefore ideal for the project.
YOLO allows the detection and segregation to be made quickly and accurately, since
the images will be passing through a single neural network. This network divides the image
into regions and predicts bounding boxes and probabilities for each region. These bounding
boxes are weighted by the predicted probabilities.
The default mAP of the YOLO system is 57.9% on the widely used COCO dataset, which is mainly used to detect large, well-defined objects. However, we are utilizing this system to detect less well-defined objects like waste, with a considerably smaller dataset of JPEG images.
6.4 Pseudocode
The working model can be differentiated into various modules. The algorithm of the
complete sequence is presented below in Figure 6.1. This sequence continues to function as
long as the number of objects presented in front of the system is greater than zero.
The captured image is passed to the tiny-YOLO network, which processes it in real time. The network returns the updated number of objects and the midpoints of the objects as well as their classes, so as to dispose of them appropriately. The ‘predictions.jpg’ image
contains the final output of the software module complete with bounding boxes and their
labels. The function Pick_Up_Recyclable_Waste() uses the co-ordinates returned to
move the robotic arm to the location of the recyclable objects while the
Move_Arm_to_Bin() function uses the default bin position to throw the recyclable
waste. This bin position can be configured to different locations depending on the class
predictions given to that particular object. After each object, the arm resets using the Reset_Arm() function to come to rest. This continues for all obtained co-ordinates.
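A compact Python paraphrase of the sequence in Figure 6.1 is given below; the arm functions are the ones named above, while capture_image() and yolo_detect() are placeholders for the camera capture and network call:

```python
# Paraphrase of the main segregation loop in Figure 6.1. capture_image()
# and yolo_detect() are placeholder names for the camera and network calls.
def segregate_waste():
    num_objects = 1                              # assume waste is present initially
    while num_objects > 0:
        image = capture_image()
        num_objects, midpoints, classes = yolo_detect(image)

        for midpoint, cls in zip(midpoints, classes):
            Pick_Up_Recyclable_Waste(midpoint)   # move the arm to the object
            Move_Arm_to_Bin(cls)                 # drop it at the configured bin position
            Reset_Arm()                          # return the arm to rest
```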
The detection of recyclable objects takes place within the YOLO function where the image
is partitioned into 13x13 cells in which it is capable of detecting up to five objects per cell.
The image is only processed once through the tiny-YOLO network and each cell is examined
carefully where a class prediction and bounding box prediction is done and appended to the
final predictions of the entire image. This is demonstrated in the algorithm for YOLO in
Figure 6.2.
As explained, the number of cells the image is divided into and number of classes are
taken into consideration. For each grid cell in the 13x13 division, up to 5 objects can be
detected and this detection is made by using the step size to traverse the image until all cells
are analysed. The predictionClass variable stores the values predicted by the class_predictor() function for that particular cell. These predictions are then given (multiple) bounding boxes, which are stored in the predictionsBox array. The best box is determined by choosing the bounding box with the highest confidence. The most probable class is stored in predictionClass; this value is then attached to the bounding box (the prediction variable) and appended to the final_predictions array that stores all final predictions of the objects in the image.
Since these final predictions are stored in the array, the size of it should inform the
program about the number of objects it has detected which can be used to keep the loop of
segregation continuous. An additional function CalculateMidpoints()is used to find
the correct co-ordinates where the robotic arm needs to travel to pick up the waste. The
predictions, midpoints and number of objects are returned.
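The grid traversal of Figure 6.2 can be paraphrased as follows; class_predictor() and CalculateMidpoints() are named in the text, while image_cell(), box_predictor(), best_box() and the confidence threshold are assumed helper names added for illustration:

```python
# Paraphrase of the detection loop in Figure 6.2. Helper names not mentioned
# in the text (image_cell, box_predictor, best_box, THRESHOLD) are assumptions.
GRID = 13           # the image is divided into 13x13 cells
MAX_PER_CELL = 5    # up to five detections per cell
THRESHOLD = 0.5     # assumed minimum confidence for keeping a detection

def detect(image):
    final_predictions = []
    for row in range(GRID):
        for col in range(GRID):
            cell = image_cell(image, row, col)               # traverse the grid
            predictionClass = class_predictor(cell)
            predictionsBox = box_predictor(cell, MAX_PER_CELL)
            box, confidence = best_box(predictionsBox)       # highest-confidence box
            if confidence > THRESHOLD:
                prediction = (predictionClass, box)
                final_predictions.append(prediction)

    midpoints = CalculateMidpoints(final_predictions)
    return final_predictions, midpoints, len(final_predictions)
```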
Once the co-ordinates of the recyclable objects are obtained, they are mapped to the workspace, and the objects are picked up and segregated by the robotic arm. This is repeated until no recyclable objects remain and new waste can be dumped.
6.5 Training
For the training of the system on the dataset of images, a default weights file is
installed and loaded initially. This allows the model to work with more accurate numbers
rather than a random value and provides a faster training experience and greater drops in loss
values.
The initial weights used for our project is the tiny-yolov3.conv.15 file which is
optimized for tiny-yolov3. Moreover, further configuration is required for the network: the image resolution, number of classes, filters, maximum iterations, steps, batch size and learning rate must be specified to obtain the most accurate result.
These parameters are set and the model is allowed to train. For our project, the
parameter values are shown as below:
Number of Classes: 5
Filters: 15
Steps: 8000,9000
Batch Sizes: 64
Every 100 and 1000 iterations, the training process maps out the progress and saves the trained weights for safe-keeping. Once the maximum number of iterations is reached, the final weights file is saved for detection purposes.
Here the main indication of rising accuracy is the decrease in average loss with the
escalation of the number of images trained. Object confidence and class prediction
confidence also increases with each iteration. The ideal number of iterations to train is 2000
or more per the number of classes. In this project, the model is trained over 10000 iterations.
The training is only done on the training set.
In order to ensure the continued decline in average loss, the learning rate is monitored and decreased exponentially every time the loss stalls or experiences an upturn. The initial learning rate specified is 0.01.
Training is stopped when maximum iterations are reached, or when average loss does
not decrease after more than 100 steps, and the weights are saved.
6.6 Testing
The trained model of Tiny-YOLOv3 is validated or tested using the test set that was
separated out initially. These images are tested on and the predicted boxes are matched with
the labeled values to calculate mAP, Precision, Recall, F1 score, and the Intersection over
Union (IOU) percentage.
The results of each of the weights files saved every 1000 iterations are compared, since the final weights are not necessarily the best suited for the dataset. Sometimes an early stopping point can have the ideal weights that generalize well to both the training and
validation set. The figure below shows the progression of average loss values when the
different weights are used on both training and validation sets.
In that case, the weights file that gives the lowest loss with the highest mAP or IOU is
considered for detection in future stages. For this project, the weights file generated at 9000
iterations gives the highest mAP for the dataset and therefore is considered for detection.
The hardware model is constructed so that the robotic arm may move freely over the
mapped workspace where the garbage is placed for segregation, with the camera suspended at
an appropriate distance to capture the status of this grid with clarity.
The Raspberry Pi is used to control the 4 servo motors to guide the arm to these points
on the grid. The pre-trained software tiny-yolov3 model is loaded into the raspberry pi unit so
that it may perform the detection locally. The camera is configured to capture the image of
the grid and transfer the knowledge to the processing unit.
A program is necessary to convert the co-ordinates of the waste in an image into co-
ordinates on the grid.
Once the hardware has been set up, it is ready to detect the waste and perform real
time waste segregation.
For the process of segregating waste, the model goes through the following steps:
Capture image of waste on the grid & transfer image to Raspberry Pi processor
Detect the waste type and position in the image using tiny-yolov3
Obtain & Transfer the Co-ordinates of the objects to the robotic arm
Segregate the waste into the appropriate bin using robotic arm
The waste detected is segregated into the categories of Wood, Metal, Paper, Plastic and
Glass. Multiple types of waste and objects are detected simultaneously, but the segregation is
done linearly. Once all the detected waste is segregated, the camera captures another image
and the next set of waste goes through the same process.
Chapter 7
RESULTS
Every 1000 iterations, a weights file is saved as mentioned above. We need to choose the best weights file for detection: the weights file with the highest mAP or IOU should be considered for detection purposes.
Precision is the ratio of true positives to the total number of predicted positives, where a true positive is a prediction that is positive when the ground truth is also positive.
Recall is another evaluation metric, defined as the ratio of true positives to the total number of ground-truth positives. It gives a basic idea of how well the model detects all the objects in the data.
mAP is the mean of the average precisions for each class. The average precision can also be defined as the area under the precision-recall curve. A high mAP means that the model detects all the objects in the image as precisely as possible.
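In formula form, with TP, FP and FN denoting true positives, false positives and false negatives, and N the number of classes (N = 5 here):

```latex
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
\text{mAP} = \frac{1}{N}\sum_{k=1}^{N} AP_k
```

where AP_k is the area under the precision-recall curve for class k.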
After training for 9000 iterations, the evaluation metrics values are as follows:
After this command is run, the model gives an output consisting of the IOU, scale, time taken to detect the object and the confidence of the class of the detected object, which is saved in a file called “result.txt”. The model detects only the recyclable objects, as it has been trained to do. Along with this, an image is saved as “predictions.jpg” and displayed on the screen with a bounding box around each object detected.
If the image consists of multiple objects, multiple bounding boxes will be drawn over
the image indicating multiple objects and the confidence of each class of each object will be
displayed on the screen.
The complete image is divided into a 12 × 8 grid. Each grid cell has its own function written for the servo motors. The values left_x, top_y, width and height written in “result.txt” correspond to the bounding box drawn over each object and are used to calculate the midpoint of the object. Using the midpoint of the object, we check which grid cell the object is present in. The function corresponding to that cell is then called, and the object is picked up by the robotic arm and thrown into a bin kept beside it. The remaining waste objects which lie beside these recyclable objects are left behind and have to be picked out and put into another bin altogether.
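A minimal sketch of this mapping is shown below; the 12 × 8 grid comes from the text, while the capture resolution and the example values are assumptions:

```python
GRID_COLS, GRID_ROWS = 12, 8
IMG_W, IMG_H = 416, 416              # assumed capture resolution in pixels

def grid_cell(left_x, top_y, width, height):
    """Map a bounding box from result.txt to its (row, col) grid cell."""
    mid_x = left_x + width / 2       # midpoint of the bounding box
    mid_y = top_y + height / 2
    col = min(int(mid_x / (IMG_W / GRID_COLS)), GRID_COLS - 1)
    row = min(int(mid_y / (IMG_H / GRID_ROWS)), GRID_ROWS - 1)
    return row, col

print(grid_cell(left_x=120, top_y=80, width=60, height=40))   # -> (1, 4)
```

The (row, col) pair then selects the servo routine written for that cell, which picks the object up and drops it into the bin.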
Chapter 8
8.1 Conclusion
Our proposed system effectively detects the garbage in the sightline of the camera,
identifies the different categories of recyclable waste and segregates it using the robotic arm’s
reach. Using YOLO method to completely scan the workspace ensures a quick and reliable
detection of the waste. There may be some latency in transference of knowledge between the
camera, processing unit and the servo-motors, but this can be streamlined to be more
efficient. Multiple types of waste are detected simultaneously and the co-ordinates
transmitted allow the robotic-arm to queue the segregation of these objects.
The pre-trained tiny-yolo model allows the testing of the system on a small embedded processing unit such as the Raspberry Pi; however, the much larger version of YOLOv3, with 74 layers instead of 25, would be beneficial if more memory, RAM and a GPU were made available. This model can be implemented realistically at various scales with a couple of real-world tweaks.
The project has tremendous potential for implementation in many facets and at many scales, provided the processing power is increased along with GPUs to validate the garbage detected. This can be facilitated with the latest Raspberry Pi model 4 as well as by enabling high-speed internet, which would allow the use of cloud graphics processing units that can compute the results at rates at least ten times faster.
Further improvement can be made to the accuracy of the model by the addition of a
feature that would allow real time dataset creation. If the software could add more items to its
catalogue every iteration that it detects an unknown object, it would learn with each new test
and automatically improve its accuracy. This would also require high speed internet facilities
to be able to retrain the model multiple times while also allowing for data augmentation.
Another improvement that can be made to this system is to integrate it with a conveyor belt. Objects can be placed on the conveyor belt and transported to the system, which detects and picks out the recyclable items and lets the decomposable objects move along the belt directly to a decomposable pit. The recyclable objects, which are collected in a different bin, can be further divided into separate categories and sent to their respective recycling stations.
With a more accurate model and larger scale hardware, this model may be
implemented at both household/residential locations as well as garbage segregation centers
for much larger quantities of waste. Automating this process would benefit the streamlining
of disposal in urban as well as rural areas.
REFERENCES
[1] https://www.ijraset.com/fileserve.php?FID=14058
[2] https://www.ijraset.com/fileserve.php?FID=17556
[4] “Classification of Trash for Recyclability Status”. Gary Thung, Mingxiang Yang.
[6] “Waste segregation using Machine Learning”. Yesha Desai, Asmita Dalvi,
Pruthviraj Jadhav, Abhilash Baphna. 2018 International Journal for Research in
Applied Science & Engineering Technology.
[7] “Smart Bin for Waste Management System”. Sreejith S, Sanjay Kumar A, Ramya
R, Roja R. 2019 5th International Conference on Advanced Computing &
Communication Systems
[8] “Smart Garbage Segregation & Management System Using Internet of Things (Iot)
& Machine Learning (ML)”. Shamin N, P Mohamed Fathimal, Raghvendran R and
Kamalesh Prakash. 2019 1st International Conference on Innovations in Information
and Technology.
[10] https://www.raspberrypi.org/products/raspberry-pi-3-model-b/
[11] https://www.raspberrypi.org/products/camera-module-v2/
[12] https://thinkrobotics.in/products/aluminum-4-dof-manipulator-steering-mechanical-
paws?_pos=3&_sid=7e06b0e75&_ss=r
[13] https://colab.research.google.com/notebooks/intro.ipynb
[14] https://pjreddie.com/darknet/
[15] https://pjreddie.com/darknet/yolo/
[16] https://opencv.org/
[17] http://abyz.me.uk/rpi/pigpio/python.html
[19] “You Only Look Once: Unified, Real-time Object Detection”, Joseph Redmon,
Santosh Divvala, Ross Girshick, Ali Farhadi, 2016
[21] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik, “Rich feature hierarchies for
accurate object detection and semantic segmentation,” CoRR, vol. abs/1311.2524,
2013. [Online]. Available: http://arxiv.org/abs/1311.2524
[23] S. Ren, K. He, R. B. Girshick, and J. Sun, “Faster R-CNN: towards real-time object
detection with region proposal networks,” CoRR, vol. abs/1506.01497, 2015.
[Online]. Available: http://arxiv.org/abs/1506.01497
[24] J. Redmon and A. Farhadi, “YOLO9000: Better, faster, stronger,” arXiv preprint,
2017.
[25] S. Han, H. Mao, and W. J. Dally, “Deep compression: Compressing deep neural
networks with pruning, trained quantization and huffman coding,” arXiv preprint
arXiv:1510.00149, 2015
[26] https://www.geeksforgeeks.org/difference-between-arduino-and-raspberry-pi/
[27] https://www.elprocus.com/difference-between-arduino-and-raspberry-pi/
[28] https://techdifferences.com/difference-between-cpu-and-gpu.html
[29] https://www.geeksforgeeks.org/difference-between-cpu-and-gpu/
[30] https://www.datenreise.de/en/raspberry-pi-3b-and-3b-in-comparison/
APPENDIX