Computational Intelligence in Data Mining
Proceedings of ICCIDM 2021
Smart Innovation, Systems and Technologies
Volume 281
Series Editors
Robert J. Howlett, Bournemouth University and KES International,
Shoreham-by-Sea, UK
Lakhmi C. Jain, KES International, Shoreham-by-Sea, UK
The Smart Innovation, Systems and Technologies book series encompasses the topics of knowledge, intelligence, innovation and sustainability. The aim of the series is to provide a platform for the publication of books on all aspects of single and multi-disciplinary research on these themes, in order to make the latest results available in a readily accessible form. Volumes on interdisciplinary research combining two or more of these areas are particularly sought.
The series covers systems and paradigms that employ knowledge and intelligence
in a broad sense. Its scope is systems having embedded knowledge and intelligence,
which may be applied to the solution of world problems in industry, the environment
and the community. It also focusses on the knowledge-transfer methodologies and
innovation strategies employed to make this happen effectively. The combination
of intelligent systems tools and a broad range of applications introduces a need
for a synergy of disciplines from science, technology, business and the humanities.
The series will include conference proceedings, edited collections, monographs,
handbooks, reference books, and other relevant types of book in areas of science and
technology where smart systems and technologies can offer innovative solutions.
High quality content is an essential feature for all book proposals accepted for the
series. It is expected that editors of all accepted volumes will ensure that contributions
are subjected to an appropriate level of reviewing process and adhere to KES quality
principles.
Indexed by SCOPUS, EI Compendex, INSPEC, WTI Frankfurt eG, zbMATH,
Japanese Science and Technology Agency (JST), SCImago, DBLP.
All books published in the series are submitted for consideration in Web of Science.
Computational Intelligence
in Data Mining
Proceedings of ICCIDM 2021
Editors
Janmenjoy Nayak
Department of Computer Science
Maharaja Sriram Chandra BhanjaDeo (MSCB) University
Baripada, Odisha, India

H. S. Behera
Department of Information Technology
Veer Surendra Sai University of Technology
Sambalpur, Odisha, India

Bighnaraj Naik
Department of Computer Application
Veer Surendra Sai University of Technology
Sambalpur, Odisha, India

S. Vimal
Department of AI & DS
Ramco Institute of Technology
Rajapalayam, Tamil Nadu, India

Danilo Pelusi
Faculty of Communication Sciences
University of Teramo
Teramo, Italy
© The Editor(s) (if applicable) and The Author(s), under exclusive license to Springer Nature
Singapore Pte Ltd. 2022
This work is subject to copyright. All rights are solely and exclusively licensed by the Publisher, whether
the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse
of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and
transmission or information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed.
The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication
does not imply, even in the absence of a specific statement, that such names are exempt from the relevant
protective laws and regulations and therefore free for general use.
The publisher, the authors and the editors are safe to assume that the advice and information in this book
are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or
the editors give a warranty, expressed or implied, with respect to the material contained herein or for any
errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional
claims in published maps and institutional affiliations.
This Springer imprint is published by the registered company Springer Nature Singapore Pte Ltd.
The registered company address is: 152 Beach Road, #21-01/04 Gateway East, Singapore 189721,
Singapore
ICCIDM Committee
Chief Patron
Patrons
General Chairs
Convenors
Program Chairs
Dr. Weiping Ding, Deputy Dean of School of Information Science and Technology,
Nantong University
Dr. Florin Popentiu-Vladicescu, University of Politehnica, Romania
Dr. Vincenzo Piuri, University of Milan
Dr. Shaikh A. Fattah, Bangladesh
Organizing Chairs
Publicity Chairs
Publication Chairs
Sponsorship Chairs
Registration Chairs
Web Chair
Reviewer Committee
Preface
In this sixth edition, two new themes, deep learning and big data, receive specific focus with a wider range of applications. Moreover, a separate theme of proposals was invited on applications of computational intelligence to text and video recognition, sentiment analysis, advanced image processing and COVID data analysis. The articles are subdivided into four tracks: advanced computational intelligent techniques and their applications; computation with modeling; nature-inspired computation; and neural network and data mining applications. This edition of ICCIDM contains good-quality articles based on the major and minor thematic areas of the conference, each specialized in its respective domain. The volume collects a wide range of articles on applications of computational intelligence, including multi-sensor data fusion for occupancy detection, sentiment analysis, an automated system for facial mask detection and social distancing during the COVID-19 pandemic, detection of insider threats, advanced persistent threat detection, diagnosing plant diseases, multimodal MRI analysis, Alzheimer's disease classification, robot motion using gradient generalized artificial potential fields with obstacles, analysis of a kidney disease dataset, deepfakes for video conferencing, face recognition, impact of UV-C treatment on fruits and vegetables, modeling and forecasting stock closing prices, software reliability prediction, disaster event detection, breast cancer prediction, customer segmentation, solar radiation prediction, QCM sensor-based alcohol classification, breast cancer mammography identification, and more.
All accepted papers were double-blind peer-reviewed by subject experts. The papers went through a strict reviewing process performed by qualified national and international reviewers, with the requirement that contributions be highly informative and insightful in terms of research quality. At least two subject-expert reviewers assessed each submission to ensure a sound selection of papers. The editors enjoyed working in collaboration with the international advisory, program and technical committee members, and we are very pleased to report that the quality of the submissions this year turned out to be very high.
ICCIDM 2021 aims to provide an interactive forum for the presentation and discussion of research in computational intelligence and related fields. In addition, this volume of proceedings offers readers a selection of refereed papers that highlight the role of intelligent computing methods and algorithms in the data sciences. On behalf of the Organizing Committee, we would like to thank the reviewers, the technical committee members, the international and national advisory board members and the organizers for their valuable efforts during this recurring pandemic challenge. We are indebted to their great effort and professionalism. Furthermore, we wish to express our heartfelt gratitude to the conference keynote speakers. Moreover, the support and cooperation of the Springer technical team in the timely production of this volume
are deeply acknowledged. Finally, we wish to thank all the authors of submitted
technical papers, demos and exhibitions, for contributing to these great proceedings.
Editors
Baripada, India Janmenjoy Nayak
Sambalpur, India H. S. Behera
Sambalpur, India Bighnaraj Naik
Rajapalayam, India S. Vimal
Teramo, Italy Danilo Pelusi
Acknowledgements
After five successful editions of ICCIDM, the sixth edition placed even greater emphasis on the quality of computational intelligence-based research and development across the various disciplines of data mining. Although the COVID-19 pandemic has affected a wide variety of researchers, interest in applying computational intelligence to useful data mining problems has remained the same; in fact, many novel solutions for combating COVID-19 in different dimensions were presented. This edition attracted researchers and academicians from across the globe to submit their work, further strengthening the reputation of the ICCIDM conference for research findings and knowledge sharing among national and international experts. The ICCIDM team is truly thankful to all the prospective authors whose valuable research findings made this event exceptional.
We have been fortunate to work with brilliant international and national advisory, technical and program committee members. It was our great pleasure to work with such eminent members of the program and technical committees, whose suggestions made it possible to filter good-quality articles out of all the submitted papers. We convey our sincere thanks to the benevolent reviewers for sparing their precious time, reviewing the papers within the stipulated time, and providing important insights on the presentation, content and quality of the proceedings.
We are especially thankful to the organizing team of ICCIDM for their enormous support in every form, which made this international event a success. The efforts of a young yet experienced and dynamic organizing committee led the way to the success of this year's event. Moreover, the inputs and efforts of the program chair(s) in enhancing the quality of each accepted article are highly appreciated.
We are highly thankful to the Management of Aditya Institute of Technology and Management (AITAM), Tekkali, Srikakulam, AP, India, especially Prof. V. V. Nageswar Rao (Director), the Principal, the Director of R&D, the Deans and Associate Deans and the Heads of Departments, for their constant support and motivation in making the conference successful. The editors would also like to thank the Springer editorial members for their constant support in publishing the proceedings on time. The ICCIDM conference and proceedings owe their acknowledgments to a huge congregation of people.
About the Editors
Dr. Danilo Pelusi received his Ph.D. in Computational Astrophysics from the University of Teramo, Italy, where he is currently Associate Professor at the Faculty of Communication Sciences. He is an Associate Editor of IEEE Transactions on Emerging Topics in Computational Intelligence, IEEE Access, the International Journal of Machine Learning and Cybernetics (Springer) and Array (Elsevier), and he has served as Guest Editor for Elsevier, Springer and Inderscience journals, as a program member of many conferences and as an editorial board member of many journals. A reviewer for reputed journals such as IEEE Transactions on Fuzzy Systems and IEEE Transactions on Neural Networks and Learning Systems, his research interests include intelligent computing, communication systems, fuzzy logic, neural networks, information theory and evolutionary algorithms.
Multi-Sensor Data Fusion for Occupancy
Detection Using Dempster–Shafer
Theory
Abstract A data fusion technique has been proposed in this paper for detecting the
presence and absence of individuals in a room. It involves the utilization of data
from a series of sensors mainly temperature and humidity sensors, for detecting
the presence of a person. From the perspective of evidence theory, data collected
from every sensor can be viewed as a matter of evidence. A Dempster–Shafer (D-S)
theory-based data fusion model is established for modeling and consolidating the
pieces of evidence and hence generating an overall speculation of the temperature
and humidity level of a room. Testing has been carried out with a dataset that has two classes. At first, detection is done using various well-known classifiers: logistic regression shows an accuracy of 94%, K-nearest neighbors 93%, support vector machines 94%, and the decision tree and random forest classifiers 92% and 93%, respectively. A subset of the data is used to create class membership
probabilities for every attribute for training, and hence, a mass function is created.
Finally, D-S rule is applied, and the outcomes suggest that the data fusion method
gives a higher accuracy level compared to the only involvement of classifiers.
1 Introduction
Occupancy of a person in a room is one of the most useful pieces of information for smart homes and for places where security is of utmost importance, such as bank locker rooms and jewelry stores. In the current COVID-19 situation, gatherings of large crowds are not allowed, and such gatherings can be monitored using this system. Human presence inside a room can be identified using temperature and humidity sensors [1]. Several temperature and humidity sensors were placed in a room, and
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
J. Nayak et al. (eds.), Computational Intelligence in Data Mining, Smart Innovation,
Systems and Technologies 281, https://doi.org/10.1007/978-981-16-9447-9_1
the corresponding readings were taken both in the absence and presence of humans.
Then the data was processed using five types of classifiers, namely logistic regression,
K-nearest neighbors, support vector machines, decision tree classifier and random
forest classifier, and their accuracy level was noted.
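As a hedged illustration (not the authors' code), the five baseline classifiers can be compared with scikit-learn as follows; the CSV file name and label column are assumptions for the sketch:

```python
# A minimal sketch of the five-classifier comparison described above;
# "occupancy.csv" and the "Occupied" label column are hypothetical names.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

data = pd.read_csv("occupancy.csv")
X, y = data.drop(columns=["Occupied"]), data["Occupied"]
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42)      # 70-30 split, as in the paper

classifiers = {
    "Logistic regression": LogisticRegression(max_iter=1000),
    "KNN": KNeighborsClassifier(),
    "SVM": SVC(),
    "Decision tree": DecisionTreeClassifier(),
    "Random forest": RandomForestClassifier(),
}
for name, clf in classifiers.items():
    clf.fit(X_train, y_train)                  # train each classifier
    print(name, accuracy_score(y_test, clf.predict(X_test)))
```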
Multi-sensor data fusion is a technology that enables the fusion of evidence from a series of different sensors in order to form a unified picture [2]. It is a significant technique for producing more consistent, accurate and efficient information than that provided by an individual sensor. The sheer number of applications of the multi-sensor data fusion technique shows that it is in tremendous demand, particularly in artificial intelligence, autonomous vehicle control, target recognition, image fusion and gear fault diagnosis in strong noise [3–8]. In our work, multi-sensor data fusion has been applied to occupancy detection in a room.
Different degrees of uncertainty and inconsistency are associated with different sensors [9]. Inconsistency in data collected from sensors is often the consequence of inherent limitations in the precision with which the sensed data is obtained, or of constraints such as the accuracy and age of the sensors. Even when sensor readings are consistent and precise, there is still a possibility of imperfection and uncertainty, for example, if one or more sensors are suspected of being defective. This uncertainty may lead to contradictory conclusions. Since information derived from the sources can sometimes be inconsistent and uncertain, a fusion mechanism is used to reduce inconsistency and imprecision.
Dempster–Shafer evidence theory and Bayesian theory are the two most used statistical theories for evidence combination [10]. Though Bayesian theory is widely agreed to be correct, many researchers criticize it for its subjectiveness, and it struggles to represent complete ignorance. D-S evidence theory can be viewed as an augmentation of traditional probability theory that deals with data uncertainty without requiring prior probabilities; Bayesian theory is a special case of D-S theory. In this paper, data from different temperature and humidity sensors was first classified using the classifiers alone, then a data fusion mechanism using D-S theory was applied, and the results were compared. It was observed that the accuracy level was higher after applying the data fusion mechanism.
Section 2 describes the proposed method, where preliminary concepts of the five classifiers used to perform classification, namely logistic regression, K-nearest neighbors, support vector machines, decision tree and random forest, are presented along with the method used for multi-sensor data fusion. Preliminary concepts of Dempster–Shafer evidence theory are described briefly in Sect. 3. Section 4 deals with the results and discussion, where the effectiveness of the proposed data fusion method is demonstrated. Section 5 provides the conclusion.
2 Proposed Method
In the current work, five different classifiers were used to perform classification before
performing data fusion, namely logistic regression, K-nearest neighbors, support
vector machines, decision tree classifier and random forest classifier.
Logistic regression has been utilized as one of the classifiers in this multi-sensor data fusion study for occupancy detection. It is one of the simplest machine learning algorithms and can be easily implemented. As logistic regression yields well-calibrated probabilities along with classification results, it is well suited to the information obtained from the sensors. Logistic regression does not need a large number of computational resources, is highly interpretable, is not difficult to regularize and does not need tuning. It works best when attributes that are irrelevant to the output variable, or that are similar to one another, are eliminated.
The KNN classifier has also been utilized in this study of the multi-sensor data fusion technique. The aim of utilizing this classifier is to show how much the accuracy and precision metrics change after applying the proposed data fusion method. This classifier is much quicker than many other classifiers, and KNN has been found to outperform many classifiers in occupancy detection studies.
The SVM classifier has been utilized to analyze information for classification and regression analysis. SVM has been applied both before and after the multi-sensor data fusion technique in this occupancy detection study, in order to show whether there is any change in the accuracy or precision metrics. SVM works admirably when there is a clear separation between classes, is generally memory efficient, and is less prone to overfitting.
A hierarchical classifier, the decision tree, has also been utilized in this multi-sensor data fusion study for occupancy detection. It is a flowchart-like structure in which each inner node denotes a test on a feature, each leaf node denotes a class label, and branches denote conjunctions of features that lead to those class labels. The paths from root to leaf represent classification rules. No feature scaling is needed for decision trees, and the training period is likewise short. Decision trees are easy to explain, as they follow a methodology similar to the one people follow when making choices.
Another classifier, random forest, has been utilized in this occupancy detection study. It builds multiple decision trees on the provided data, obtains predictions from all of these trees, and then picks the optimal arrangement through polling; it is an ensemble method. It has been found to work more precisely and optimally than a single decision tree. It can automatically deal with missing values, and it can detect outliers and handle them naturally.
Figure 1 depicts the proposed fusion process for the series of DHT11 temperature and humidity sensors used for occupancy detection. The data collected from the sensors is fused into a unified picture using the D-S combination rule, and the hypothesis with the highest mass value is then selected. If the hypothesis contains a singleton class, that class is returned; otherwise, a feature selection value (FSV) is calculated, the attribute with the smallest FSV is selected, the differences for the selected attribute are calculated, and finally the class with the smallest absolute difference is returned.
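A sketch of this decision flow in Python is given below; the fused-mass dictionary, FSV values and class means are hypothetical stand-ins for quantities the paper computes from the training data, since no code is given in the paper:

```python
# A sketch of the decision flow in Fig. 1; all inputs are hypothetical
# stand-ins (fused masses keyed by frozensets of class labels, per-attribute
# FSVs, and per-class attribute means).
def decide(fused_mass, sample, fsv, class_means):
    best = max(fused_mass, key=fused_mass.get)  # hypothesis with highest mass
    if len(best) == 1:                          # singleton class: return it
        return next(iter(best))
    attr = min(fsv, key=fsv.get)                # attribute with smallest FSV
    # class whose stored mean for that attribute is nearest to the sample
    return min(best, key=lambda c: abs(sample[attr] - class_means[c][attr]))
```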
The Dempster–Shafer evidence theory is derived from the theory proposed by Dempster [11] and later modified by Shafer [12]. It is a
generalization of classical probability theory. It can deal with uncertain data more
readily. Some mathematical terminologies used in D-S theory are as follows.
Each subset of the power set is assigned a mass value between 0 and 1 by a mass function

f : 2^ω → [0, 1]

with f(∅) = 0 and Σ_{B⊆ω} f(B) = 1,

where f(B) represents the proportion of available evidence supporting the declaration that the actual state belongs to B itself and not to any particular subset of B [13].
The lower and upper bounds of a probability interval can be computed from two measures evaluated from the mass: the belief degree (bl) and the plausibility degree (pla) [13].
To evaluate the belief degree of B, bl(B), all the non-empty subsets of B are found, their mass values are computed, and these values are summed as shown in Eq. (1):

bl(B) = Σ_{C⊆B} f(C)    (1)
To evaluate the plausibility degree of B, pla(B), all sets that intersect B are found, their masses are evaluated, and a summation is done as shown in Eq. (2):

pla(B) = Σ_{C∩B≠∅} f(C)    (2)
The relationship between the plausibility degree and the belief degree is shown in Eq. (3):

pla(B) = 1 − bl(B̄)    (3)

where B̄ denotes the complement of B.
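As a hedged illustration of Eqs. (1)–(3) (not from the paper), the following sketch computes belief and plausibility from a mass function stored as a dictionary keyed by frozensets; the occupancy labels are illustrative:

```python
# A small illustration of Eqs. (1)-(3); the frozenset-keyed dict is a
# hypothetical representation of the mass function.
def belief(B, mass):
    # bl(B): sum of masses of all non-empty subsets of B, Eq. (1)
    return sum(m for C, m in mass.items() if C and C <= B)

def plausibility(B, mass):
    # pla(B): sum of masses of all sets intersecting B, Eq. (2)
    return sum(m for C, m in mass.items() if C & B)

mass = {frozenset({"Occupied"}): 0.6,
        frozenset({"NotOccupied"}): 0.1,
        frozenset({"Occupied", "NotOccupied"}): 0.3}
B = frozenset({"Occupied"})
print(belief(B, mass), plausibility(B, mass))  # 0.6 and 0.9, matching Eq. (3)
```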
Information has been collected from two independent, non-identical sensors over the same frame of discernment ω. Let f₁ and f₂ be the two mass functions established from their readings. The resultant mass function f₁ ⊕ f₂ is given by Eqs. (4) and (5):

(f₁ ⊕ f₂)(∅) = 0    (4)

and

(f₁ ⊕ f₂)(B) = (1 / (1 − Q)) Σ_{C∩D=B, B≠∅} f₁(C) f₂(D)    (5)
where Q measures the conflict between the two mass functions and is evaluated using the formula in Eq. (6):

Q = Σ_{C∩D=∅} f₁(C) f₂(D),  Q ≠ 1    (6)
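A minimal sketch of the combination rule in Eqs. (4)–(6), using the same hypothetical frozenset-keyed representation as above (not the authors' code):

```python
# A minimal sketch of Dempster's combination rule, Eqs. (4)-(6).
def combine(f1, f2):
    Q = sum(m1 * m2
            for C, m1 in f1.items() for D, m2 in f2.items()
            if not (C & D))                      # conflict mass, Eq. (6)
    if Q == 1:
        raise ValueError("totally conflicting evidence; rule undefined")
    fused = {}
    for C, m1 in f1.items():
        for D, m2 in f2.items():
            B = C & D
            if B:                                # empty set gets no mass, Eq. (4)
                fused[B] = fused.get(B, 0.0) + m1 * m2 / (1 - Q)
    return fused
```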
Consider a series of temperature and humidity sensors used for occupancy detection. The dataset collected from the sensors was split into training and testing data.
Suppose ω = {d₁, d₂, …, dₜ} is the frame of discernment, where t represents the number of classes, and assume there are n sensors. First, the power set 2^ω is found for each set of attributes. For each attribute of the training data, the minimum and maximum values of each class are evaluated; overlaps are identified and boundary information is obtained. This procedure is used to group the data items, although in some cases an attribute value does not map to a singleton class. If the value of a data item is greater than or equal to the minimum of class 1 and less than the minimum of class 2, then class 1 is the possible class for that data item. If its value is also greater than or equal to the minimum of class 2, then there are two possible classes for that item, so it can be assigned to either class 1 or class 2.
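A small sketch of this boundary-based grouping for one attribute follows; the class names and ranges are illustrative, not taken from the dataset:

```python
# A sketch of the boundary-based grouping described above; class names
# and (min, max) ranges are illustrative placeholders.
def possible_classes(value, bounds):
    # bounds maps each class to its (min, max) range for the attribute
    return {c for c, (lo, hi) in bounds.items() if lo <= value <= hi}

bounds = {"class1": (20.0, 25.0), "class2": (24.0, 30.0)}  # overlapping ranges
print(possible_classes(23.0, bounds))   # {'class1'}: a singleton class
print(possible_classes(24.5, bounds))   # both classes are possible
```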
According to Dempster–Shafer theory, the temperature and humidity classes are evaluated through mass values. Let fₙ(d₁), fₙ(d₂), …, fₙ(dₜ) be the mass function values of the classes d₁, d₂, …, dₜ, respectively, where each d consists of the two parameters temperature and humidity. If a data item cannot be assigned to any single class, then

fₙ(ω) = 1,

where the number of classes is represented by t, a natural number.
The mean value of the absolute difference d between the features with the smallest FSV has been evaluated for each class using Eq. (8), as shown in Fig. 1:

dᵢ = |aᵢ − āᵢ|,  i = 1, 2, …, t    (8)
A series of sensors has been used for occupancy detection. The sensor used in our project is the DHT11, a commonly used temperature and humidity sensor. A series of DHT11 sensors was connected to an Arduino and placed in a room. Data from the sensors was collected for a week, and the temperature and humidity values for all the sensors were observed.
The dataset has 2000 data items with two classes and the following numeric attributes: Temperature 1, Temperature 2, Temperature 3, Humidity 1, Humidity 2 and Humidity 3. The classes are Not Occupied and Occupied, and each class contains 1000 instances. The data has been apportioned into a training set and a test set with a 70–30% split.
The current section reports the experimental results obtained after carrying out the experiments following the procedure depicted in Sect. 2.6. Tables 1, 2, 3, 4 and 5 depict the performance for the two classes (Not Occupied and Occupied) after applying the various classification algorithms. The performance shown is based on three metrics: precision, recall and F1-score.
In Tables 6 and 7, the experimental outcomes of the proposed models are shown. The results show that logistic regression before applying data fusion has an accuracy of 94%, precision 91%, F1-score 94% and recall 97%, while after applying data fusion, the accuracy increases to 99%, precision 97%, F1-score 98% and recall 99%. After applying data fusion, the accuracy of KNN increases from 93 to 97%, precision from 92 to 96%, recall from 95 to 97% and F1-score from 93 to 97%. After applying data fusion, the accuracy of SVM increases from 94 to 97%, precision from 91 to 96%, recall from 98 to 99% and F1-score from 95 to 97%. After applying data fusion, the accuracy of decision tree increases from 92 to 98%, precision from 90 to 97%, recall from 94 to 98% and F1-score from 92 to 98%. After applying data fusion, the accuracy of random forest increases from 93 to 99%, precision from 91 to 97%, recall from 95 to 99% and F1-score from 93 to 98%. Hence, before applying data fusion, logistic regression and SVM show the highest performance, while after applying data fusion, logistic regression and random forest show the highest performance. The results also suggest that the Dempster–Shafer-based data fusion method has increased the performance of all the models to a great extent.

Table 6 Performance of various classifiers under the study before applying the proposed data fusion technique
Classifier Accuracy Precision F1-score Recall
Logistic regression 0.94 0.91 0.94 0.97
KNN 0.93 0.92 0.93 0.95
SVM 0.94 0.91 0.95 0.98
Decision tree 0.92 0.90 0.92 0.94
Random forest 0.93 0.91 0.93 0.95

Table 7 Performance of various classifiers under the study after applying the proposed data fusion technique
Classifier Accuracy Precision F1-score Recall
Logistic regression 0.99 0.97 0.98 0.99
KNN 0.97 0.96 0.97 0.97
SVM 0.97 0.96 0.97 0.99
Decision tree 0.98 0.97 0.98 0.98
Random forest 0.99 0.97 0.98 0.99
Figure 2 compares the accuracy of each classification algorithm on the collected data before and after applying the proposed data fusion algorithm, and does the same for precision, recall and F1-score. It reveals that data fusion improves the performance of all the classifiers under the current study.

Fig. 2 Comparison of performance measures of logistic regression, SVM, KNN, decision tree and random forest before and after data fusion
5 Conclusion
The multi-sensor data fusion technique has been applied for occupancy detection within this paper, and we have applied Dempster–Shafer evidence theory for the multi-sensor data fusion. In our work, a technique for computing the mass function using temperature and humidity as parameters has been proposed. The dataset was classified into two possible classes using a two-step approach that utilized the class boundary values in the training data for every feature. Furthermore, the data items that were not assigned to a singleton class were classified further using standard
deviation measures for choosing an attribute for ultimate classification. The decision
rule to ascertain the class based on the fusion mass values has been proposed. Our
experiment has demonstrated that the proposed data fusion method enhances the
performance of classifiers under our study to a greater extent.
References
1. J. Han, A. Shah, M. Luk, A. Perrig, Don’t sweat your privacy using humidity to detect human
presence (2007)
2. B. Khaleghi, A. Khamis, F.O. Karray, S.N. Razavib, Multisensor data fusion: a review of the
state-of-the-art. Inf. Fusion 14(1), 28–44 (2013)
3. G. Fortino, S. Galzarano, R. Gravina, W. Li, A framework for collaborative computing and
multi-sensor data fusion in body sensor networks. Inf. Fusion 22, 50–70 (2015)
4. A. Ardeshir Goshtasby, S. Nikolov, Image fusion: advances in the state of the art. Inf. Fusion
8(2), 114–118 (2007)
5. M. Panicker, T. Mitha, K. Oak, A.M. Deshpande, C. Ganguly, Multisensor data fusion for an
autonomous ground vehicle, in 2016 Conference on Advances in Signal Processing (CASP)
(Pune, India, 2016), pp. 507–512
6. G. Dong, G. Kuang, Target recognition via information aggregation through dempster–shafer’s
evidence theory. IEEE Geosci. Remote Sens. Lett. 12(6), 1247–1251 (2015)
7. F. Xiao, Multi-sensor data fusion based on the belief divergence measure of evidences and the
belief entropy. Inf. Fusion 46, 23–32 (2019)
8. G. Cheng, X.-H. Chen, X.-L. Shan, H.-G. Liu, C.-F. Zhou, A new method of gear fault diagnosis
in strong noise based on multi-sensor information fusion. J. Vib. Control 22(6), 1504–1515
(2016)
9. A. Ranganathan, J. Al-Muhtadi, R.H. Campbell, Reasoning about uncertain contexts in
pervasive computing environments. IEEE Pervasive Comput. 3(2), 62–70 (2004)
10. J. Zhou, L. Liu, J. Guo, L. Sun, Multisensor data fusion for water quality evaluation using
dempster-shafer theory (2013)
11. A.P. Dempster, Upper and lower probabilities induced by a multivalued mapping. Ann. Math. Statist. 38, 325–339 (1967)
12. G. Shafer, A Mathematical Theory of Evidence (Princeton University Press, Princeton and
London, 1976)
13. Q. Chen, A. Whitbrook, U. Aickelin, C. Roadknight, Data classification using dempster-shafer
theory (2014)
Sentiment Analysis: A Recent Survey
with Applications and a Proposed
Ensemble Algorithm
Abstract With the dramatic increase in the massive amount of data on e-commercial
sites, it becomes difficult to analyze the sentiment manually. Sentiment analysis auto-
matically analyzes the user sentiment in terms of positive, negative, or neutral using
natural language processing. This paper presents a survey on sentiment analysis
with its various approaches and algorithms in detail based on supervised and unsu-
pervised learning. Further, this paper analyzes the pros and cons of some current
studies with their approaches and compares their results on various datasets. Also,
this paper discusses the uses of sentiment analysis in real life. Finally, this paper
proposes a new hybrid algorithm carrying various features for sentiment analysis
using AdaBoost and majority voting ensemble techniques.
1 Introduction
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
J. Nayak et al. (eds.), Computational Intelligence in Data Mining, Smart Innovation,
Systems and Technologies 281, https://doi.org/10.1007/978-981-16-9447-9_2
The primary goal of sentiment analysis is to solve a classification problem: determining whether a review is positive, negative, or neutral. Sentiment analysis can be done on many levels: word level, sentence level, document level, and aspect (feature) level [1]. Many researchers are experimenting with different natural language processing approaches and algorithms to better understand sentiment analysis based on machine learning, which is categorized further into supervised and unsupervised learning approaches. This paper therefore presents an overview of recent research based on machine learning models, and it also proposes an ensemble model that trains multiple classifiers using ensemble methods such as AdaBoost and majority voting.
The structure of this survey paper is as follows. Section 2 discusses sentiment analysis and its methods, together with a description of the findings. Section 3 examines sentiment analysis applications, Sect. 4 presents the proposed methodology, and Sect. 5 gives an illustrative example. Finally, Sect. 6 presents the conclusion.
Sentiment analysis determines the views of users on social media using machine learning techniques, which are further divided into the approaches described below.
Naïve Bayes (NB) is a probabilistic classifier that is simple to implement and can be used in text classification. Seref and Bostanci [2] used Hadoop tools to perform sentiment analysis on various dataset sizes, classifying reviews as positive, negative, or neutral using NB and complement Naïve Bayes (CNB), and discovered that CNB provides the best overall accuracy. Abbas et al. [3] used multinomial Naïve Bayes (MNB) and the term frequency-inverse document frequency (TF-IDF) approach to classify movie reviews into positive and negative categories; they used TF-IDF to capture both the frequency and the uniqueness of words. To enhance the efficiency of CNB, Cahya et al. [4] used a feature weighting method based on a genetic algorithm (GA).
SVM is a text classification method based on supervised machine learning. Classification is performed by finding the hyperplane that maximizes the margin between the separated classes. It is capable of solving both linear and nonlinear problems, and the kernel trick is used to transform a non-separable problem into a separable one; linear, polynomial, and radial basis function (RBF) kernels are three common kernel functions. An improved RBF kernel of SVM (SVM-IRBF) was used by Gopi et al. [5]. Kumar and Subba [6] proposed a TF-IDF vectorizer and SVM-based sentiment analysis at the document level to analyze the polarity of documents in a text data corpus.
The ME classifier is a probabilistic classifier that falls under the exponential model category. ME modeling calculates the probability distribution with the highest entropy given the available evidence. Xie et al. [8] suggest a novel maximum entropy model based on probabilistic latent semantic analysis (PLSA). The above results for supervised learning approaches are summarized in Table 1.
Unsupervised learning is a machine learning technique in which we train the machine using unlabeled data. The main aim is to group unsorted data based on similarities and patterns, without the need for prior training. Clustering is one type of unsupervised machine learning; the types of unsupervised clustering-based learning algorithms are as follows.
K-means clustering is a partitional clustering method used to divide the dataset into a predefined number k of clusters. Orkphol and Yang [15] improve K-means clustering for high-dimensional, highly sparse datasets: they used TF-IDF to select appropriate features and the singular value decomposition (SVD) technique to reduce the dataset's high dimensionality, and they suggested the artificial bee colony (ABC) approach for finding the best initial state of the centroids. Rehioui and Idrissi [16] proposed a new algorithm merging two clustering algorithms, K-means and density-based clustering (DENCLUE), and its variants to analyze the sentiments of tweets.
The FCM algorithm is a popular fuzzy clustering method whose main aim is to minimize an objective function. Trupthi et al. [20] proposed a new topic modeling methodology based on latent Dirichlet allocation (LDA) to identify topics, with possibilistic fuzzy c-means (PFCM) for classification. The above results for unsupervised learning approaches are summarized in Table 2.
Sentiment analysis helps companies and service providers understand their buyers' and users' mindsets so that they can adapt their products and services to meet demand. Some applications are given below with brief explanations.
(1) Product analysis: Before purchasing a product, a new consumer may use sentiment analysis to decide whether the reviewers are in favor of or against it. A graph-based approach is utilized in a study [31] to categorize user reviews of the Redmi Note1 smartphone as positive or negative.
(2) Movie review analysis: In the paper [32], SA is performed on the movie review
dataset using various machine learning classifiers to classify the dataset into
positive and negative.
(3) Customer feedback: In paper [33], SA is used to analyze customer tweets on US airline services using supervised machine learning algorithms, with TF-IDF and bag-of-words (BOW) feature vector representations used with bigrams.
(4) Stock market prediction: In paper [34], the sentiment analysis for stock price
prediction is taken into account using various classifiers such as Naïve Bayes,
SVM, and logistic regression.
(5) Government intelligence: Sentiment analysis has been used to check the impact of demonetization in India using machine learning methods such as KNN and the support vector classifier (SVC) [35].
4 Proposed Methodology
Fig. 1 Proposed architecture for sentiment analysis (stages include data collection, data preprocessing, and performance evaluation)
4.1 Preprocessing
In this first stage, data that does not contain any emotion is removed, to keep the problem manageable in the later stages of sentiment analysis, and the preprocessed data is stored in the F1_Data file. The steps, sketched in code after this list, are as follows:
• Tokenize the data so that it can be expressed in words.
• Convert all words to lower case.
• Remove all web addresses and all stop words that do not carry any emotion, by comparing them against a stop-word dictionary [36].
• Replace all acronyms with their true meanings by comparison with an acronym dictionary, e.g., 2day → today [37].
• Stemming.
• POS (part-of-speech) tagging, e.g., "This mobile is good and perfect." becomes "This/DT mobile/NN is/VBZ very/RB good/JJ and/CC perfect/JJ."
• Do not remove punctuation and emoticons.
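A minimal sketch of these preprocessing steps, assuming NLTK is available; the acronym dictionary holds an illustrative entry only, not the dictionary of [37]:

```python
# A minimal sketch of the preprocessing stage described above.
import re
import nltk
from nltk.corpus import stopwords
from nltk.stem import PorterStemmer

for pkg in ("stopwords", "punkt", "averaged_perceptron_tagger"):
    nltk.download(pkg, quiet=True)

ACRONYMS = {"2day": "today"}                     # illustrative entry, see [37]
STEMMER = PorterStemmer()
STOP_WORDS = set(stopwords.words("english"))

def preprocess(text):
    text = re.sub(r"https?://\S+", "", text)         # drop web addresses
    tokens = nltk.word_tokenize(text.lower())        # tokenize and lowercase
    tokens = [ACRONYMS.get(t, t) for t in tokens]    # expand acronyms
    tokens = [t for t in tokens if t not in STOP_WORDS]  # remove stop words
    tokens = [STEMMER.stem(t) for t in tokens]       # stemming
    return nltk.pos_tag(tokens)                      # POS tags; punctuation kept

print(preprocess("This mobile is very good and perfect 2day!"))
```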
After preprocessing, various feature extraction methods, such as TF-IDF and BOW, are applied to the dataset; these techniques create the feature vector of the data. The following features are included (a sketch of this stage and the classifier-training stage appears after the list):
• Unigram and bigram features, considering one or two consecutive terms, respectively.
• Bi-tagged features extracted from a POS tagging pattern in which one word is either an adjective or an adverb [38, 39].
• The total count of positive expressions, including positive words, positive emoticons, and positive exclamations [16].
• The total count of negative expressions, including negative words, negative emoticons, and negative exclamations [16].
• The total count of neutral expressions, including neutral words, neutral emoticons, and neutral exclamations [16].
• The emoticon lexicon [40], used to extract the sentiment associated with emoticons.
• SentiWordNet [41].
Next, feature selection techniques such as the filter method [42] are used to select the best possible set of features for constructing a machine learning model.
In the third stage, machine learning classifiers are trained on the selected features; MNB, CNB, SVM-IRBF, ME, random forest (RF), and KNN classifiers are used for sentiment classification.
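A hedged scikit-learn sketch of the feature extraction, filter-based selection, and training stages; the two sample reviews, labels, and the value of k are placeholders, not from the paper:

```python
# A sketch of stages two and three: TF-IDF features with unigrams and
# bigrams, filter-method feature selection, and training one classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.naive_bayes import MultinomialNB

reviews = ["this mobile is very good and perfect",
           "worst battery ever , totally disappointed"]  # placeholder corpus
labels = [1, 0]                                           # 1 = positive

tfidf = TfidfVectorizer(ngram_range=(1, 2))     # unigram and bigram features
X = tfidf.fit_transform(reviews)
selector = SelectKBest(chi2, k=min(10, X.shape[1]))  # filter-method selection
X_sel = selector.fit_transform(X, labels)
clf = MultinomialNB().fit(X_sel, labels)        # one of the listed classifiers
```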
In the last stage, we apply ensemble techniques in which multiple algorithms are used for classification, and we evaluate the performance using accuracy, precision, recall, and F1 score.
AdaBoost is a boosting algorithm that transforms weak classifiers into strong ones: at each round, instance weights are reassigned, with higher weights given to wrongly categorized instances. Algorithm 1 shows the detailed steps of the proposed approach, and a minimal sketch is given below.
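A minimal sketch of the ensemble stage, assuming scikit-learn estimators as stand-ins for the classifiers listed above; SVM-IRBF and ME are approximated here by a standard RBF SVC and logistic regression, so this is an illustration rather than the paper's exact algorithm:

```python
# AdaBoost-boosted base learner combined with other classifiers by
# majority voting; X_sel and labels come from the previous stage.
from sklearn.ensemble import (AdaBoostClassifier, RandomForestClassifier,
                              VotingClassifier)
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import MultinomialNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC

members = [
    ("mnb", AdaBoostClassifier(MultinomialNB(), n_estimators=50)),
    ("svm", SVC(kernel="rbf")),                  # stand-in for SVM-IRBF
    ("me", LogisticRegression(max_iter=1000)),   # maximum entropy analogue
    ("rf", RandomForestClassifier()),
    ("knn", KNeighborsClassifier()),
]
ensemble = VotingClassifier(estimators=members, voting="hard")  # majority vote
# ensemble.fit(X_sel, labels); predictions = ensemble.predict(X_test)
```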
5 Illustrative Example
6 Conclusion
References
1. A. Kumar, T.M. Sebastian, Sentiment analysis: a perspective on its past, present and future. I.J. Intell. Syst. Appl. 10, 1–14 (2012). Published online September 2012 in MECS (http://www.mecs-press.org/). https://doi.org/10.5815/ijisa.2012.10.01
2. B. Seref, E. Bostanci, Sentiment analysis using Naive Bayes and complement Naive Bayes
classifier algorithms on Hadoop framework, in 2018 2nd International Symposium on Multi-
disciplinary Studies and Innovative Technologies (ISMSIT) (Ankara, Turkey, 2018), pp. 1–7.
https://doi.org/10.1109/ISMSIT.2018.8567243
3. M. Abbas, K. Ali Memon, A. Aleem Jamali, Multinomial Naive Bayes classification model
for sentiment analysis. IJCSNS Int. J. Comput. Sci. Netw. Secur. 19(3), 62 (2019). https://doi.
org/10.13140/RG.2.2.30021.40169
4. R.A. Cahya, D. Adimanggala, A.A. Supianto, Deep feature weighting based on genetic algo-
rithm and Naïve Bayes for Twitter sentiment analysis, in 2019 International Conference on
Sustainable Information Engineering and Technology (SIET) (Lombok, Indonesia, 2019),
pp. 326–331. https://doi.org/10.1109/SIET48054.2019.8986107
5. A.P. Gopi, R.N.S. Jyothi, V.L. Narayana et al., Classification of tweets data based on polarity
using improved RBF kernel of SVM. Int. J. Inf. Technol. (2020). https://doi.org/10.1007/s41
870-019-00409-4
6. V. Kumar, B. Subba, A Tfidf Vectorizer and SVM based sentiment analysis framework for
text data corpus, in 2020 National Conference on Communications (NCC) (Kharagpur, India,
2020), pp. 1–6. https://doi.org/10.1109/NCC48643.2020.9056085
7. C. Jiang, Y. Li, L. Li, A. Liu, C. Liu, News readers’ sentiment analysis based on fused-
KNN algorithm, in 2019 4th International Conference on Computational Intelligence and
Applications (ICCIA) (Nanchang, China, 2019), pp. 21–29. https://doi.org/10.1109/ICCIA.
2019.00012
8. X. Xie, S. Ge, F. Hu et al., An improved algorithm for sentiment analysis based on maximum
entropy. Soft. Comput. 23, 599–611 (2019). https://doi.org/10.1007/s00500-017-2904-0
9. https://snap.stanford.edu/data/web-Movies.html. Accessed on 21 May 2021
10. http://qwone.com/~jason/20Newsgroups/. Accessed on 22 May 2021
11. Airline-twitter-sentiment, 2015. [Online]. Available: https://www.crowdflower.com/data/air
line-twitter-sentiment/. Accessed on 15 May 2021
12. C. Sindhu, D. Rajkakati, C. Shelukar, Context-based sentiment analysis on amazon product
customer feedback data, ed. by D. Hemanth, G. Vadivu, M. Sangeetha, V. Balas. Artificial
Intelligence Techniques for Advanced Computing Applications. Lecture Notes in Networks
and Systems, vol 130. (Springer, Singapore, 2021). https://doi.org/10.1007/978-981-15-5329-
5_48
13. S. Brody, N. Elhadad, Restaurant review corpus (2009). http://people.dbmi.columbia.edu/noe
mie/ursa. Accessed on 18 May 2021
Abstract Since December 2019, the world has started getting affected by a widely
spreading virus which we all call the coronavirus. This virus has spread all across the
globe, causing many severe health problems and deaths too. COVID-19 is spread
when a healthy person comes in contact with the droplets generated when an infected
person coughs or sneezes. So, the WHO has suggested some precautionary measures
against the spread of this disease. These measures include wearing a mask in public,
maintaining social distancing, and avoiding mass gatherings. To help reduce the virus' spread, in this paper, we propose a system that detects unmasked people, identifies them, checks whether social distancing is followed, and also provides a contact tracing feature. The proposed system consists of two main modules: face mask
detection and social distancing. There are two more modules which include face
recognition and contact tracing. We used two datasets to train our models: the first, for detecting masks on faces, was collected from GitHub and Kaggle, and the second, for face recognition, consisted of our own images taken for training purposes. It is hoped that our model contributes
toward reducing the spread of this disease. Along with COVID-19, this model can
also help reduce the spread of similar communicable disease scenarios.
1 Introduction
Up to December 2019, we were living normally. But after December 2019, a deadly virus emerged that changed our lives: the most widespread virus, the coronavirus. It causes cough, fever, serious lung damage, and similar effects, and it was and continues to be so prevalent that the World Health Organization declared it a global pandemic. As per the worldometers.info Web site, up to August 10, 2021, there were approximately 203,456,760 cases of COVID-19 reported worldwide. Of those,
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
J. Nayak et al. (eds.), Computational Intelligence in Data Mining, Smart Innovation,
Systems and Technologies 281, https://doi.org/10.1007/978-981-16-9447-9_3
2 Related Works
Since the outbreak of COVID-19, many researchers have studied the symptoms,
preventive measures, etc. of COVID-19. Many have developed various models for
various purposes which would help in controlling the spread of the virus.
Many systems have been proposed for mask detection. Rahman et al. [1] presented a system for smart cities which was useful in reducing the spread of the virus; it only determined whether people were masked or unmasked, and it had some issues, such as not being able to differentiate a masked face from a face covered with hands.
Another model, by Chavda et al. [2], used a two-stage face mask detector: the first stage used a RetinaFace model (Jiang et al. [3]; Ejaz et al. [4]) for robust face detection, and the second stage involved training three different lightweight face mask classifier models on the dataset, of which the NASNetMobile-based model was finally selected for classifying masked and unmasked faces. Ge et al. [5] used locally linear embedding for detecting masked faces. They trained on the MAFA dataset, which includes three types of locations (face, eyes, and mask) along with face orientations, namely left, right, front, left-front, and right-front; in addition, various occlusion degrees were taken into consideration when training the masked faces to the model.
There are numerous systems used for face recognition. The MobileNetV2 architecture and computer vision are used to help maintain a safe environment and monitor public places to ensure individuals' safety; along with cameras for monitoring, a microcontroller has been used in Yadav [6]. Masked face recognition is done using the pretrained FaceNet model in Ejaz and Islam [7], where three datasets are used to train the model.
A method for distance measurement using only a single target image was proposed by Xu et al. [8]; the target image contains known measurements, so that an image-pixel-to-real-distance ratio can be obtained for distance calculation. Distance information is drawn by integrating a moving target detection method based on a Gaussian mixture model (GMM) with the Hue-Saturation-Intensity (HSI) color space. Punn et al. [9] used formulae based on the camera's focal length along with Euclidean distance to calculate the distance between people. They proposed a deep learning approach for automated monitoring of social distancing from surveillance video, making use of the YOLOv3 architecture to distinguish human beings from the background.
3 Methodology
We propose a system in which people are monitored automatically. For creating this system, we initially used a webcam; CCTV can also be used for the same purpose. With these cameras, real-time video is captured and given as input to the proposed system. The input video can be of any duration; for testing purposes, we gave a 2-minute real-time video as input. The system then detects whether a person is masked or unmasked and whether social distancing is maintained in each frame of the given input video. If any person is found without a face mask, his/her face is recognized, this information is stored in an Excel sheet generated for further action, and a notification is sent to that individual as well as the respective authority.
We require image processing, as proposed by Hire et al. [10], for detecting face masks, recognizing a person's identity, and differentiating people from the background. The first and most crucial step in our system is therefore image processing, for which we take images from the live video provided as input to our system, captured using a webcam. The captured video is processed as continuous frames, and these frames need to be preprocessed, as they contain a large amount of data that is not useful for our purpose. The frames are in RGB color format and are converted to grayscale, as stated by Mane and Shinde [11]; after this transformation, all the unnecessary information is eliminated. Following the transformation to grayscale, the frames are reshaped uniformly and normalized to the range 0 to 1. Normalization makes image processing faster, as the frames then contain only the useful features.
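A sketch of this frame preprocessing with OpenCV is given below; the 100×100 target size is an assumption for illustration, not the system's stated configuration:

```python
# A minimal sketch of the frame preprocessing described above.
import cv2
import numpy as np

cap = cv2.VideoCapture(0)                            # webcam input
ok, frame = cap.read()
if ok:
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # drop color information
    resized = cv2.resize(gray, (100, 100))           # uniform reshaping
    normalized = resized.astype(np.float32) / 255.0  # scale to [0, 1]
cap.release()
```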
We have made use of a deep learning architecture. Deep learning algorithms learn different types of features from the given input (Somvanshi et al. [12]; Patil and Shinde [13]): important features such as edges, vertical and horizontal lines, and blobs are detected from the input image, and the trained architecture is used to predict previously unseen inputs or samples. Among the deep learning architectures, we chose the convolutional neural network. CNN is widely used for classification and recognition and gives good accuracy; in addition, no human supervision is needed, because the architecture automatically detects the important features.
Getting a dataset to train our models was crucial. We obtained the required images from various sources, namely GitHub, Kaggle, and Google Images [14]. The image dataset downloaded for the face mask detection model is from Kaggle and contains 3833 images; of these, 1915 are of masked faces and the rest are of unmasked faces.
For face recognition, images of our own faces were taken and made into a dataset. This was done through a user interface providing a button to capture images of the person in front of the camera, along with an identity number and other details such as name, email ID, and contact number. After saving the face images, we trained our face recognition model on them using the Haar cascade.
3.2.2 Architecture
this results in a simplified computation network. After this layer, a flatten layer is applied, which transforms the data into a one-dimensional array. This one-dimensional array is fed to a dense layer, which learns parameters helpful for classification; it consists of a series of neurons that learn nonlinear features. After this, a dropout layer is applied, which prevents overfitting by dropping some of the units.
For training our face mask detection model, we used MobileNetV2 with 53 layers. We used a learning rate of 1e-4 to minimize the loss. Sigmoid and ReLU are the activation functions used in this model, and we used the Adam optimizer.
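A hedged Keras sketch of such a MobileNetV2-based mask detector follows; the input size, layer widths, and pooling size are illustrative assumptions, not the authors' exact configuration:

```python
# A sketch of a MobileNetV2 base with the maxpooling / flatten / dense /
# dropout head described above; layer sizes are assumptions.
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import (MaxPooling2D, Flatten, Dense,
                                     Dropout, Input)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

base = MobileNetV2(weights="imagenet", include_top=False,
                   input_tensor=Input(shape=(224, 224, 3)))
x = MaxPooling2D(pool_size=(7, 7))(base.output)  # maxpooling layer
x = Flatten()(x)                                 # flatten to a 1-D array
x = Dense(128, activation="relu")(x)             # dense layer (ReLU)
x = Dropout(0.5)(x)                              # dropout against overfitting
out = Dense(1, activation="sigmoid")(x)          # mask / no-mask output
model = Model(inputs=base.input, outputs=out)
model.compile(optimizer=Adam(learning_rate=1e-4),  # learning rate from the paper
              loss="binary_crossentropy", metrics=["accuracy"])
```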
Figure 1 is the block diagram of the proposed model, which helps in understanding the flow of the system proposed in this paper.
To detect masked and unmasked people, we trained our model with the dataset discussed in Sect. 3.2.1 above. In this module, we use CNN along with the MobileNetV2 architecture for image preprocessing; compared with other architectures, MobileNetV2 models are faster at the same accuracy. We give the input image parameters to the MobileNetV2 architecture and then set the following maxpooling layer and fully connected layer as in a CNN. This is depicted in Fig. 2.
A pretrained face detector, FaceNet, is used for detecting the faces in the given input image. We imported a couple of necessary files that define the model architecture and the pretrained weights for face detection.
The dataset of masked and unmasked images was taken from Kaggle, Google Images, and open-source image libraries. Figures 3 and 4 show training images of the face mask detection module.
To perform face recognition, we first need to detect whether the person is wearing a face mask. If not, we have to identify that person. In this module, we use the Haar cascade algorithm for identifying faces. The dataset used for face recognition consists of images of our faces, given in real time using a webcam.
To recognize people, we assign an ID number to each person and capture 100 face images per person; this count can be increased to improve the accuracy of the model. Figure 5 shows the dataset of face images created to train the model.
To recognize images, we use the Haar cascade feature classifier, an object detection algorithm used to identify faces in an image or a real-time video. It can detect faces as well as features such as eyes and lips, because the algorithm uses line detection and edge detection features.
The line features shown in Fig. 6 help us recognize the people who are detected unmasked. The "Haarcascade_frontalface_default.xml" file is imported using the cascade classifier for detecting and locating faces in the input image. We used the LBPH recognizer for training on the face images created earlier, together with their respective IDs.

Fig. 6 Haar cascade line/edge detection using two rectangle, three rectangle, and four rectangle features
The model trained using the Haar cascade algorithm detects and identifies a person's name if he or she is not wearing a mask, with the help of a list of names corresponding to the respective IDs. The identified person's information is stored, and a notification is sent by email for further action. This is how we implemented face recognition in our system.
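A sketch of this recognition step follows, assuming a trained LBPH model saved as "trainer.yml" and an ID-to-name mapping (both illustrative names); cv2.face requires the opencv-contrib-python package:

```python
# A sketch of Haar cascade face detection plus LBPH recognition.
import cv2

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
recognizer = cv2.face.LBPHFaceRecognizer_create()
recognizer.read("trainer.yml")                 # hypothetical trained model
names = {1: "Swapnil"}                         # illustrative ID-to-name mapping

gray = cv2.cvtColor(cv2.imread("frame.jpg"), cv2.COLOR_BGR2GRAY)
for (x, y, w, h) in detector.detectMultiScale(gray, 1.3, 5):
    face_id, confidence = recognizer.predict(gray[y:y + h, x:x + w])
    print(names.get(face_id, "Unknown"), confidence)
```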
To verify the practice of social distancing among people at crowded places or
workplaces, we propose a mechanism that detects whether people are maintaining
social distance. In this mechanism, we used an object detection approach named
YOLOv3 to detect each person as an object in the input image. This object detector
identifies all persons in the image and puts bounding boxes around each of them [9].
Here, we imported the three files required for the YOLOv3 setup. Data from the
following three files is used to set up this object detector.
(i) "yolov3.weights" contains the pretrained weights of the neural network.
(ii) "yolov3.cfg" contains the neural network model architecture.
(iii) "coco.names" lists the 80 object classes that the model can detect.
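A minimal sketch of this setup with OpenCV's DNN module is shown below; the confidence threshold and input size are illustrative:

```python
# A sketch of the YOLOv3 person-detector setup using the three files listed
# above, assuming OpenCV's DNN module.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov3.cfg", "yolov3.weights")
with open("coco.names") as f:
    classes = [line.strip() for line in f]
person_id = classes.index("person")

def detect_people(image, conf_threshold=0.5):
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    boxes = []
    for output in net.forward(net.getUnconnectedOutLayersNames()):
        for det in output:
            scores = det[5:]                     # per-class confidences
            if scores[person_id] > conf_threshold:
                cx, cy, bw, bh = det[:4] * [w, h, w, h]  # rescale to pixels
                boxes.append((int(cx - bw / 2), int(cy - bh / 2),
                              int(bw), int(bh)))
    return boxes
```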
If a person tests positive for COVID-19, it is necessary to trace all the people who
came in contact with the infected person in the 2–3 days before the positive test. For
this task, we need a list of people who were in contact with the infected person.
Before creating the list, we should also keep in mind that wearing a mask and
maintaining appropriate distance from the infected individual decreases the chances
of a healthy person getting infected. The infected person cannot be singled out by
our system in advance, so the list must be reviewed manually. Therefore, the people
whose details belong in the file are those who are not following the norms and
preventive measures.
For this task, we maintain an Excel sheet that saves the information of people who
were not wearing masks. This list is helpful for contact tracing.
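A simple sketch of this logging step, assuming pandas with an Excel engine such as openpyxl installed; the column names and file name are illustrative:

```python
# A sketch of the contact-tracing log: append one row per violation to an
# Excel sheet. Column names and the file name are illustrative.
from datetime import datetime
import pandas as pd

LOG_FILE = "violations.xlsx"

def log_violation(name, email, violation):
    entry = {"Name": name, "Email": email, "Violation": violation,
             "Timestamp": datetime.now().isoformat(timespec="seconds")}
    try:
        df = pd.read_excel(LOG_FILE)
        df = pd.concat([df, pd.DataFrame([entry])], ignore_index=True)
    except FileNotFoundError:
        df = pd.DataFrame([entry])          # first entry creates the sheet
    df.to_excel(LOG_FILE, index=False)

log_violation("Swapnil", "swapnil@example.com", "No mask")
```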
4 Results
We assume that, before our system is used for monitoring, it has been trained on
every person's face and that we have each person's contact information, such as an
email ID. This can be done using the user interface that we have provided
(mentioned in the methodology, collection of datasets).
Figure 7 shows the results of our system when a person is masked and unmasked,
respectively.
In Fig. 7a, since the person is wearing a mask, our model detects him as a masked
image and draws a green bounding box with the "Mask" tag. As we have to notify
the authorities about a violator's identity, we need to identify people who are not
following the preventive measures; this is done in the face recognition module. The
same person appears without a mask in Fig. 7b and is recognized as "Swapnil"
because we trained our model with images of Swapnil's face under that name tag.
In this way, we can identify unmasked persons and notify them by email for further
action.
In Fig. 8a, two persons are standing far away from each other. Our social distancing
module, which uses the YOLOv3 architecture, detects them and calculates the
Euclidean distance between them. As the calculated distance is greater than the
threshold value, the bounding boxes are drawn in green. In Fig. 8b, the same two
persons stand closer to each other; the distance between them falls below the
threshold, and their bounding boxes turn red, denoting that these two people are
closer than allowed.
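The distance check can be sketched as follows; the pixel threshold is illustrative and would need calibration to the camera geometry in practice:

```python
# A sketch of the pairwise social-distancing check described above.
from itertools import combinations
import math

def check_distances(boxes, threshold=150):
    """boxes: list of (x, y, w, h); returns indices of too-close people."""
    centroids = [(x + w / 2, y + h / 2) for (x, y, w, h) in boxes]
    violators = set()
    for (i, a), (j, b) in combinations(enumerate(centroids), 2):
        if math.dist(a, b) < threshold:  # Euclidean distance (Python >= 3.8)
            violators.update((i, j))     # draw these boxes in red
    return violators
```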
Both persons are identified, and their names are shown in Fig. 9. In this way, persons
are identified, and their information is maintained in the Excel sheet for contact
tracing. Information such as the name of the person who is not following the
government norms, along with his/her other contact details, is stored in this Excel
sheet.
5 Conclusion
In this paper, we proposed a model that uses MobileNetV2 and the Haar cascade
algorithm for face mask detection and face recognition. Social distancing is
monitored by camera using YOLOv3. In the COVID-19 situation, it is important to
follow all the guidelines given by the government. For that purpose, we developed
this system to detect whether people are wearing masks and whether they are
maintaining social distance. People who are not doing so are detected and identified,
and their details are reported to the respective authorities. We successfully developed
a module that detects whether people are wearing masks and maintaining social
distance; if not, they are identified and their information is stored, both to keep track
of violations and to support contact tracing in case a person later tests positive for
COVID-19. This module can help reduce the spread of disease and can be used in
private organizations, schools, colleges, etc.
References
1. M.M. Rahman, M.M.H. Manik, M.M. Islam, S. Mahmud, J.-H. Kim, An automated system to
limit COVID-19 using facial mask detection in smart city network, in 2020 IEEE International
IOT, Electronics and Mechatronics Conference (IEMTRONICS), pp. 1–5 (2020). https://doi.
org/10.1109/IEMTRONICS51293.2020.9216386
2. A. Chavda, J. Dsouza, S. Badgujar, A. Damani, Multi-stage CNN architecture for face mask
detection, in 2021 6th International Conference for Convergence in Technology (I2CT), pp. 1–8
(2021). https://doi.org/10.1109/I2CT51068.2021.9418207
3. M. Jiang, X. Fan, H. Yan, RetinaMask: a face mask detector (2020). [Online]. Available: http://
arxiv.org/abs/2005.03950
4. M.S. Ejaz, M.R. Islam, M. Sifatullah, A. Sarker, Implementation of principal component anal-
ysis on masked and non-masked face recognition, in 2019 1st International Conference on
Advances in Science, Engineering and Robotics Technology (ICASERT) (Dhaka, Bangladesh,
2019), pp. 1–5. https://doi.org/10.1109/ICASERT.2019.8934543
5. S. Ge, J. Li, Q. Ye, Z. Luo, Detecting masked faces in the wild with LLE-CNNs, in 2017
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (Honolulu, HI, 2017),
pp. 426–434. https://doi.org/10.1109/CVPR.2017.53
6. S. Yadav, Deep learning based safe social distancing and face mask detection in public areas
for COVID-19 safety guidelines adherence. Int. J. Res. Appl. Sci. Eng. Technol. 8, 1368–1375
(2020). https://doi.org/10.22214/ijraset.2020.30560
7. M.S. Ejaz, M.R. Islam, Masked face recognition using convolutional neural network, in 2019
International Conference on Sustainable Technologies for Industry 4.0 (STI), pp. 1–6 (2019).
https://doi.org/10.1109/STI47673.2019.9068044
8. Z. Xu, L. Wang, J. Wang, A method for distance measurement of moving objects in a monocular
image, in 2018 IEEE 3rd International Conference on Signal and Image Processing (ICSIP),
pp. 245–249 (2018). https://doi.org/10.1109/SIPROCESS.2018.8600495
9. N. Punn et al., Monitoring COVID-19 social distancing with person detection and tracking via
fine-tuned YOLO v3 and deep-sort techniques. 2020. [Online]. Available: https://arxiv.org/abs/
2005.01385
10. M. Hire, S. Shinde, Ant colony optimization based exudates segmentation in retinal fundus
images and classification, in 2018 Fourth International Conference on Computing Communi-
cation Control and Automation (ICCUBEA) (Pune, India, 2018), pp. 1–6. https://doi.org/10.
1109/ICCUBEA.2018.8697727
11. S. Mane, S. Shinde, A method for melanoma skin cancer detection using dermoscopy images, in
2018 Fourth International Conference on Computing Communication Control and Automation
(ICCUBEA) (Pune, India, 2018), pp. 1–6. https://doi.org/10.1109/ICCUBEA.2018.8697804
12. M. Somvanshi, P. Chavan, S. Tambade, S.V. Shinde, A review of machine learning techniques
using decision tree and support vector machine, in 2016 International Conference on Computing
Communication Control and automation (ICCUBEA) (Pune, 2016), pp. 1–7. https://doi.org/
10.1109/ICCUBEA.2016.7860040
13. C. Patil, S. Shinde, Leaf detection by extracting leaf features with convolutional neural network
(May 18, 2019), in Proceedings of International Conference on Communication and Infor-
mation Processing (ICCIP) 2019. Available at SSRN: https://ssrn.com/abstract=3419766 or
https://doi.org/10.2139/ssrn.3419766
14. Dataset: https://www.kaggle.com/andrewmvd/face-mask-detection, https://github.com/balajisrinivas/Face-Mask-Detection/tree/master/dataset. Accessed on 23 Apr 2021
15. S. Deshmukh, S. Shinde, Diagnosis of lung cancer using pruned fuzzy min-max neural network,
in 2016 International Conference on Automatic Control and Dynamic Optimization Techniques
(ICACDOT) (Pune, 2016), pp. 398–402. https://doi.org/10.1109/ICACDOT.2016.7877616
Detection of Insider Threats Using Deep
Learning: A Review
Abstract A massive number of cyberattacks exist on the Internet, among which the
insider threat is one of the most challenging malicious threats in cyberspace. The
identification of insiders (attackers) is a very difficult job within an organization,
and discriminating between benign employees and insiders is crucial. Hence,
automating insider threat detection using machine learning and deep learning
techniques improves detection performance and helps in analyzing the characteristics
of an insider. Several learning models have been developed, of which deep learning
techniques are promising, as they offer high-quality results and do not require feature
engineering. Assorted deep learning techniques have been employed to discriminate
insiders from benign employees, and this review article articulates the deep learning
techniques presented so far in the literature for effective insider threat detection. The
performance of the deep learning techniques, their discrimination ability, and the
commonalities and differences among cybersecurity researchers based on the
reported metrics are summarized to provide clear insight to budding cybersecurity
researchers.
1 Introduction
Cybersecurity protects digitalized data and restricts unauthorized users from
accessing the network, using advanced technology to safeguard cyberspace.
Increased cybersecurity challenges lead to cyberattacks, which constitute a criminal
offense that harms the CIA factors (confidentiality, integrity, and availability) of a
system. Recently, there has been an abundant increase in cyberattacks due to the
frequent use of online services. The three different forms of cyberattack are targeted
attacks, untargeted attacks, and insider attacks. Among them, insider attacks are the
most dangerous and very difficult to detect, owing to the following reasons: insiders
have authorized access, which they misuse to perform unauthorized activities; an
illegitimate employee can insert malicious code into the system, disturbing its
normal function; and insiders have a practice of stealing intellectual property [2].
The insider threat is one of the malicious threats in which employees within an
organization become the attackers. The term insider is defined as a person with
authorized entitlements who has the power to illegally access all confidential
information of the institution [2, 3]. The technical report from the CERT division
states that more than 1500 insider threat incidents have been investigated across both
public and private industries [4]. Though the process of detecting insider threats is
complex, cybersecurity researchers have designed different approaches, namely
anomaly-based approaches, role-based access control, scenario-based approaches,
and psychological factors-based approaches [5], to safeguard the digital
environment. Most of the aforementioned approaches rely on deep learning
techniques, as they detect attack vectors automatically without human intervention.
Among these techniques, deep learning has evolved within the detection process for
its higher-level feature extraction on complex data [6, 7]. The benefits of deep
learning are that it provides excellent results for unstructured data, given well-labeled
training data. Owing to these advantages, deep learning enables insider threat
detection to achieve a high level of accuracy. This manuscript focuses on the study
of insider threat detection approaches and the classification of the different deep
learning techniques used in them. The review is organized as follows: Sect. 2 includes
a detailed description of the insider threat, the types and categories of insiders, and
the challenges of insider threats. Section 3 describes the different types of insider
threat detection approaches. Section 4 deals with the study of insider threat detection
approaches based on deep learning techniques. Section 5 discusses insider threat
datasets, and Sect. 6 concludes the paper.
2 Insider Threat
An insider threat is a security threat that originates inside the targeted organization.
This does not imply that the insider or attacker must be a current worker or official
of the organization; they could be a consultant, former worker, business partner, or
board member.
Malicious insider threats are classified into four types: sabotage, theft, fraud, and
espionage [8]. Sabotage is one of the most sophisticated attacks and typically results
in considerable harm to people and organizations. These insiders are normally
dissatisfied employees with technical knowledge who have authorized access. A
well-known example of IT sabotage is website defacement. The second type of
insider profiling is theft: the stealing of intellectual assets within a company, where
insiders can access the data throughout the day and transfer it to other private
companies. For instance, an employee, whether technical or non-technical, may
transfer important information from one company to another. Fraud is the third type:
the process of illegally accessing a company's financial data through authorized
privileges, for example, misusing one's privileges to steal something of value from
the company. The last type is espionage, a kind of threat in which corporate data is
systematically extracted and information that benefits the insider's planning is
delivered to outside parties [9].
Insiders are of three types, namely traitors, masqueraders, and unintentional insiders.
The types of insiders are depicted in Fig. 1.
Insiders are also classified into four categories depending on their freedom in
accessing the system within an organization: pure insider, insider affiliate, insider
associate, and outside affiliate (Table 1).
Depending on the type of insider threat, the categories of insider threat discussed
below are self-motivated insiders, recruited insiders, and planted insiders (Table 2).
Some of the challenges faced by researchers in generating datasets and in detecting
insider threats are discussed below:
1. The process of detecting the insider threat is complex because the behavior of a
benign user and that of an insider closely resemble each other.
2. Real data are insufficient, and encrypted data cannot be examined.
Insider threat detection approaches describe the techniques for identifying insiders
and quantify the uncertainty of a user being an insider. They are based on attributes
of the user or insider such as behavior, character, working profile, and situation.
Once insiders are determined from these attributes, it becomes easier to identify the
insider threat and prevent the insider attack. The different types of insider threat
detection approaches are described below.
the hours a user spends on the PC), and (2) there are hardly any labels available in
advance to indicate the "good" and "bad" audit data instances [12].
The scenario-based approach is based on datasets and sets of user activities
constructed at different levels of scenarios. Different types of scenario-based threats
are contingent upon particular sequences of malicious behaviors, and formal
definitions for this approach are still being developed. Basic scenarios in the
detection process for the CMU dataset include changes in a user's behavior; avoiding
removable drives during working hours but using them once working hours are over;
an employee who is about to move to another company using office credentials to
steal confidential information; and issues in maintaining passwords. All these
scenarios can be handled with both deep learning and machine learning techniques
[15].
The CNN-based insider threat detection methods are presented here. For
unauthorized access under the role-based access control approach on a synthetic
dataset, a learning classifier system (LCS)-based CNN called CN-LCS has been
used; it optimizes the feature selection rules, performs modeling, and achieves a
classification accuracy of 92% [12, 22]. An anomalous behavior approach based on
the mouse-clicking behavior of users in an organization used a CNN to convert
mouse activities into images, providing both feature extraction and abstraction in
the training phase; it classifies the images with 85% accuracy on an open-source
dataset [23]. Another variant of CNN is the graph convolutional network (GCN),
which is used for detecting malicious teams under the anomalous behavior approach;
GCN maps the data from the spatial domain to a graph and achieves an accuracy of
92% on the CMU CERT dataset [24].
An RNN takes sequential data as input and operates via feedback loops. It processes
the input in portions and addresses the problem with a highly flexible deep learning
concept [25]. The back-propagation process begins when an error is encountered;
the foremost issues of this process are vanishing and exploding gradients. Changes
in the character or behavioral activities of a user fall under the psychological
factors-based approach, which has been addressed using a deep neural network
(DNN) and a recurrent neural network (RNN) on the CERT v6.2 dataset, producing
an accuracy of 93% [26].
The gated recurrent unit (GRU), a gated feedback recurrent neural network, resolves
the vanishing and exploding gradient problems. It has two kinds of gates, update and
reset gates, whose functions manage the flow of information between the previous
state and the current input [31]. GRU has been used to detect the psychological
behavior of an employee, which falls under the psychological factors-based
approach, using the Enron email and tweets datasets, providing accuracies of 68%
and 71%, respectively [32].
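For reference, the standard GRU gate computations [31] can be written as follows, where x_t is the current input, h_{t-1} the previous hidden state, sigma the sigmoid function, and W, U, b learned parameters (a textbook formulation, not taken from [32]):

```latex
\begin{aligned}
z_t &= \sigma\left(W_z x_t + U_z h_{t-1} + b_z\right) && \text{(update gate)}\\
r_t &= \sigma\left(W_r x_t + U_r h_{t-1} + b_r\right) && \text{(reset gate)}\\
\tilde{h}_t &= \tanh\left(W_h x_t + U_h (r_t \odot h_{t-1}) + b_h\right)\\
h_t &= (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t
\end{aligned}
```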
The output of a DBN is produced from both probabilities and unsupervised learning.
It has two types of layers, directed and undirected, and is composed of binary latent
variables. An adaptive optimization deep belief network (AODBN) technique has
been proposed for the role-based insider threat detection approach; the DBN
combines and regularizes the progress logs by absorbing both regular and irregular
characteristics of malicious threats, producing 98% accuracy on the CERT-IT dataset
[33]. Insiders occur in many different domains, and user behavior is difficult to
model. A DBN has been used to manage audit logs, removing hidden features, with
a One-Class Support Vector Machine (OC-SVM) used for training on the features;
this produces a training accuracy of 88% on the CMU CERT r4.2 dataset [34].
4.7 Autoencoder
Long short-term memory (LSTM) and convolutional neural network techniques are
applicable in the anomalous behavior approach to insider threat detection. LSTM
operates on known user actions, taking a portion of the sorted temporal features,
while the CNN classifies the feature matrices of user actions; on the public CMU
CERT dataset, this combination gives an accuracy of 94% [2]. Another method
proposed for detecting malicious activities under the anomalous behavior approach
combines a neural temporal point process with a recurrent neural network, known
as hierarchical neural temporal point processes; it achieves accuracies of 90% on
CERT and 91% on the UMD Wikipedia dataset [37]. A neural network-based graph
embedding adaptively learns discriminative embeddings from an account-device
graph based on two basic weaknesses of attackers, developed to prevent illegal
access to users' accounts; on an Alipay mobile payment gateway dataset, it attains
an accuracy of 94% [38]. Thus, the study of the different deep learning techniques
for insider threat detection clearly indicates that LSTM and CNN are more effective
than the other deep learning techniques, because LSTM overcomes the limitations
of long-term dependencies and CNN handles high-dimensional features and large
volumes of data.
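As a concrete illustration of the autoencoder family named in this section's heading, the sketch below shows an LSTM autoencoder that scores user-action sequences by reconstruction error, assuming TensorFlow/Keras; the sequence length, feature count, and layer sizes are hypothetical, and this is not the exact model of any cited work:

```python
# An illustrative LSTM autoencoder for insider threat detection: train on
# benign sessions only; high reconstruction error flags a possible insider.
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, models

SEQ_LEN, N_FEATURES = 50, 16  # actions per session, features per action

model = models.Sequential([
    layers.Input(shape=(SEQ_LEN, N_FEATURES)),
    layers.LSTM(32),                        # encode the session
    layers.RepeatVector(SEQ_LEN),           # expand back to sequence length
    layers.LSTM(32, return_sequences=True),
    layers.TimeDistributed(layers.Dense(N_FEATURES)),  # reconstruct actions
])
model.compile(optimizer="adam", loss="mse")

def anomaly_scores(sessions):
    """Mean squared reconstruction error per session."""
    recon = model.predict(sessions)
    return np.mean((sessions - recon) ** 2, axis=(1, 2))
```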
The overall architecture of insider threat detection using deep learning, shown in
Fig. 2, describes the entire working process. When a dataset is deployed,
preprocessing techniques, namely data cleaning and data normalization, are applied.
From among the insider threat detection approaches and the different deep learning
techniques, researchers can select the approach and the method based on the dataset.
The dataset is then used in the two phases of training and testing, with performance
evaluated by precision, recall, F-score, ROC, and accuracy with respect to the
detection of normal users and insiders.
Fig. 2 Insider threat detection using deep learning architecture. CNN convolutional neural network,
LSTM long short-term memory, GRU gated recurrent unit, AE autoencoder, RNN recurrent neural
network, DBN deep belief network
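As a brief illustration, the evaluation step just described could be computed as in the following sketch, assuming scikit-learn; y_true and y_score are hypothetical test labels and model scores:

```python
# A sketch of the evaluation protocol described above, assuming scikit-learn.
# y_true: ground-truth labels (1 = insider, 0 = normal); y_score: model scores.
import numpy as np
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

def evaluate(y_true, y_score, threshold=0.5):
    y_pred = (np.asarray(y_score) >= threshold).astype(int)
    return {
        "precision": precision_score(y_true, y_pred),
        "recall":    recall_score(y_true, y_pred),
        "f1":        f1_score(y_true, y_pred),
        "roc_auc":   roc_auc_score(y_true, y_score),  # threshold-free
        "accuracy":  accuracy_score(y_true, y_pred),
    }
```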
5 Datasets
Insider threat datasets can be grouped into five different types based on two
strategies, malicious and benign. The malicious strategy exists in two forms: one
involves violating the rules of the organization through the use of authorized access,
which is called a traitor-based dataset, and the other involves illegal access to
sensitive data, known as a masquerader-based dataset. When these two forms of
malicious strategy are combined, the result is called miscellaneous malicious, as
shown in Table 3.
The benign strategy distinguishes whether malicious behavior was actually
expressed by the contributors of a dataset or not. The substituted-masquerader and
authentication-based datasets are the two types of benign strategies. When a
dataset's samples carry externally supplied malicious labels, it is said to be a
substituted-masquerader dataset; the authentication-based type supports user
identification through a labeled dataset [39]. Apart from these five types, insider
threat datasets originate either from real-world settings or from laboratory
environments.
The insider threat datasets discussed above come from many different universities
and laboratories. Among them, the most widely used dataset is CERT, followed by
Enron. Comparing all the datasets, CERT plays an important role in most proposed
insider threat detection solutions because of the characteristics of insiders present
in it. The applicability of different insider threat detection approaches to CERT
makes it easy to detect whether an insider's activity is anomaly-based, role-based,
or scenario-based, with minimal time complexity. This article also discusses the
performance metrics used for the evaluation of deep learning models.
Figure 3 presents a survey-based statistical report of deep learning in insider threat
detection, showing the different deep learning techniques and their reported
performance metrics by year. The year-wise rises and falls of each technique can
guide researchers in this domain.
Table 3 (continued)

CERT (miscellaneous malicious dataset): CERT is a computer emergency response team for major computer security incidents in its constituency. The dataset is a group of synthetic datasets for insider threat, available from the CERT insider threat center, which shows the characteristics of insider threats [49]. The common attributes of the CERT datasets are HTTP, email, logon activity, file, and device records, including identity, user information, date, and PC [1].

TWOS (miscellaneous malicious dataset): TWOS, "The Wolf of SUTD", comes from the Singapore University of Technology and Design (2017). It includes six different types of data: keystroke, mouse, host monitor, network traffic, SMTP logs (email), and logon. It is a real-time dataset capturing simple interactions between users and a host-based system, with authorized user information and some threat cases [40].

Schonlau (substituted masquerader dataset): Given by Schonlau in 2001; it has 15,000 Unix commands for each user and Unix system logs for 50 users, covering different kinds of activity within organizations. Here, the masquerader data are assembled from varying unknown users and carry no actual malicious intent. Its features are time, user, process, registry, and file access [50].

Greenberg (authentication-based dataset): Composed of Unix csh (C shell) command-line entries from 168 users, holding the related information of executed Unix commands. Based on the users' programming knowledge and skills, the dataset is divided into four sets: (1) novice programmers, (2) experienced programmers, (3) computer scientists, and (4) non-programmers. The features of Greenberg's dataset are the start and end times of a particular session, history, and its error updates. The dataset is provided in two forms, plain and enriched [51, 52].

Purdue University (authentication-based dataset): Given by Lane and Brodley in 1997; it includes Unix shell entries of eight users over two years. It is an enriched dataset consisting of Unix command names and arguments.

MITRE OWL (authentication-based dataset): MITRE provides an organization-wide learning (OWL) dataset. It is based purely on data that give a statistical representation of specific user feedback and user training, and it includes logs of 24 Mac system users working in the area of artificial intelligence and technology. This dataset is well suited to graphical user interface (GUI)-based applications used for user authentication [51].

LANL (authentication-based dataset): Taken from Los Alamos National Laboratory in 2015; it covers 12,425 users' system, process, network, and domain name server (DNS) data, along with red team logs [52].
Fig. 3 Survey-based statistical report: reported accuracies (0–100%) of CNN, RNN, DBN, LSTM, GRU, and AE techniques by publication year (2016–2020)
6 Conclusion
The insider threat is one of the most challenging threats in cybersecurity: neither
insider threats nor the presence of insiders can be identified easily within an
infrastructure, resulting in enormous losses (financial and of confidential data) in
many organizations. Different types of deep learning-based insider threat detection
approaches have been developed to detect insider threats and insiders. This review
article presents the categories of insider threat and the types of insiders, gives a
glimpse of the insider threat detection approaches implemented so far, and provides
deep insight into the detection approaches deployed based on deep learning
techniques. This study helps cybersecurity researchers understand the importance
and workings of deep learning techniques in insider threat detection.
Acknowledgements This work was supported by The Department of Science and Technology-
Interdisciplinary Cyber-Physical System (T-615).
References
1. M.R.G. Raman, N. Somu, K. Kirthivasan, V.S. Shankar Sriram, A hypergraph and arithmetic
residue-based probabilistic neural network for classification in intrusion detection systems.
Neural Netw. 92, 89–97 (2017)
2. F. Yuan, Y. Cao, Y. Shang, Y. Liu, J. Tan, B. Fang, Insider threat detection with deep neural
network, in International Conference on Computational Science (Springer, Cham, 2018),
pp. 43–54
3. R. Chinchani, D. Ha, A. Iyer, H.Q. Ngo, S. Upadhyaya, Insider threat assessment: model,
analysis and tool, in Network security (Springer, Boston, MA, 2010), pp. 143–174
4. Y. Wu, D. Wei, J. Feng, Network attacks detection methods based on deep learning techniques:
a survey. Secur. Commun. Netw. 2020 (2020). https://doi.org/10.1155/2020/8872923
28. J. Lu, R.K. Wong, Insider threat detection with long short-term memory, in Proceedings of the
Australasian Computer Science Week Multiconference, pp. 1–10 (2019)
29. F. Meng, F. Lou, Y. Fu, Z. Tian, Deep learning based attribute classification insider threat
detection for data security, in 2018 IEEE Third International Conference on Data Science in
Cyberspace (DSC) (IEEE, 2018), pp. 576–581
30. D. Zhang, Y. Zheng, Y. Wen, Y. Xu, J. Wang, Y. Yu, D. Meng, Role-based log analysis applying
deep learning for insider threat detection, in Proceedings of the 1st Workshop on Security-
Oriented Designs of Computer Architectures and Processors, pp. 18–20 (2018)
31. R. Dey, F.M. Salemt, Gate-variants of gated recurrent unit (GRU) neural networks, in 2017
IEEE 60th international midwest symposium on circuits and systems (MWSCAS) (IEEE, 2017),
pp. 1597–1600
32. C. Soh, Y. Sicheng, A. Narayanan, S. Duraisamy, L. Chen, Employee profiling via aspect-based
sentiment and network for insider threats detection. Expert Syst. Appl. 135, 351–361 (2019)
33. M. Yousefi-Azar, V. Varadharajan, L. Hamey, U. Tupakula, Autoencoder-based feature learning
for cybersecurity applications, in 2017 International Joint Conference on Neural Networks
(IJCNN) (IEEE, 2017), pp. 3854–3861
34. J. Zhang, Y. Chen, J. Ankang, Insider threat detection of adaptive optimization DBN for
behavior logs. Turk. J. Electr. Eng. Comput. Sci. 26(2), 792–802 (2018)
35. G. Dong, G. Liao, H. Liu, G. Kuang, A review of the autoencoder and its variants: a comparative
perspective from target recognition in synthetic-aperture radar images. IEEE Geosci. Remote
Sens. Mag. 6(3), 44–68 (2018)
36. L. Liu, O. De Vel, C. Chen, J. Zhang, Y. Xiang, Anomaly-based insider threat detection
using deep autoencoders, in 2018 IEEE International Conference on Data Mining Workshops
(ICDMW) (IEEE, 2018), pp. 39–48
37. S. Yuan, P. Zheng, X. Wu, Q. Li, Insider threat detection via hierarchical neural temporal
point processes, in 2019 IEEE International Conference on Big Data (Big Data) (IEEE, 2019),
pp. 1343–1350
38. Z. Liu, C. Chen, J. Zhou, X. Li, F. Xu, T. Chen, L. Song, Poster: neural network-based
graph embedding for malicious accounts detection, in Proceedings of the 2017 ACM SIGSAC
Conference on Computer and Communications Security, pp. 2543–2545 (2017)
39. A. Harilal, F. Toffalini, J. Castellanos, J. Guarnizo, I. Homoliak, M. Ochoa, Twos: a dataset of
malicious insider threat behavior based on a gamified competition, in Proceedings of the 2017
International Workshop on Managing Insider Security Threats, pp. 45–56 (2017)
40. M.B. Salem, S.J. Stolfo, Modeling user search behavior for masquerade detection, in Inter-
national Workshop on Recent Advances in Intrusion Detection (Springer, Berlin, Heidelberg,
2011), pp. 181–200
41. J.B. Camina, C. Hernández-Gracidas, R. Monroy, L. Trejo, The windows-users and-intruder
simulations Logs dataset (WUIL): an experimental framework for masquerade detection
mechanisms. Expert Syst. Appl. 41(3), 919–930 (2014)
42. J.B. Camina, R. Monroy, L.A. Trejo, M.A. Medina-Pérez, Temporal and spatial locality: an
abstraction for masquerade detection. IEEE Trans. Inf. Forensics Secur. 11(9), 2036–2051
(2016)
43. M. Miao, J. Wang, S. Wen, J. Ma, Publicly verifiable database scheme with efficient keyword
search. Inf. Sci. 475, 18–28 (2019)
44. C. Thomas, V. Sharma, N. Balakrishnan, Usefulness of DARPA dataset for intrusion detection
system evaluation, in Data Mining, Intrusion Detection, Information Assurance, and Data
Networks Security, vol. 6973, p. 69730G (2008)
45. S. Terry, B.J. Chow, An assessment of the DARPA IDS evaluation dataset using snort. UCDAVIS
Department of Computer Science, vol. 1, p. 22 (2007)
46. J. Shetty, J. Adibi, The enron email dataset database schema and brief statistical report. Inf.
Sci. Inst. Tech. Rep. Univ. South. Calif. 4(1), 120–128 (2004)
47. E. Santos, H. Nguyen, F. Yu, K.J. Kim, D. Li, J.T. Wilkinson, A. Olson, J. Russell, B. Clark,
Intelligence analyses and the insider threat. IEEE Trans. Syst. Man Cybern. Part A: Syst.
Humans 42(2), 331–347 (2011)
48. M. Collins, Common sense guide to mitigating insider threats. CARNEGIE—MELLON UNIV
PITTSBURGH PA PITTSBURGH United States (2016)
49. P.A. Legg, Visualizing the insider threat: challenges and tools for identifying malicious user
activity, in 2015 IEEE Symposium on Visualization for Cyber Security (VizSec) (IEEE, 2015),
pp. 1–7
50. M.B. Salem, S.J. Stolfo, A comparison of one-class bag-of-words user behavior modeling
techniques for masquerade detection. Secur. Commun. Netw. 5(8), 863–872 (2012)
51. S. Greenberg, Using unix: collected traces of 168 users (1988). https://doi.org/10.11575/
PRISM/30806
52. A. El Masri, H. Wechsler, P. Likarish, B. ByungHoon Kang, Identifying users with application-
specific command streams, in 2014 Twelfth Annual International Conference on Privacy,
Security and Trust (IEEE, 2014), pp. 232–238
53. A. Bushuev, Modern methods of protection against insider threats, in Language in the Sphere of Professional Communication (Ekaterinburg, 2020), pp. 458–461
54. R.A. Alsowail, T. Al-Shehari, Empirical detection techniques of insider threat incidents. IEEE
Access 8, 78385–78402 (2020)
55. M. Canham, C. Posey, P.S. Bockelman, Confronting information security’s elephant, the
unintentional insider threat, in International Conference on Human-Computer Interaction
(Springer, Cham, 2020), pp. 316–334
An Incisive Analysis of Advanced
Persistent Threat Detection Using
Machine Learning Techniques
Abstract The upsurge in security threats and cyberattacks is due to the enormous
growth of Internet-based services. One such multi-stage security threat, which is
especially serious, hard to discover, and complicated, is the Advanced Persistent
Threat (APT). Discovering an APT attack is a foremost challenge for the research
community, as the attack vectors of an APT persist for a long period. Persistent
efforts by researchers in APT detection using machine learning models improve
detection efficiency and provide a better understanding of the APT stages. This
review article summarizes the various machine learning-based detection techniques
presented so far in the literature to alleviate the impact of APT, and it guides
interested researchers in designing computationally attractive, reliable, and robust
machine learning-based systems for efficient APT detection.
1 Introduction
Security threats in Internet-based applications have grown rapidly with the
development of technology. To protect industries and organizations from
vulnerabilities, the defender needs technology-based visibility over its assets, which
provides security within the infrastructure [1]. In recent years, most outbreaks in
cyberspace have led to confidential data loss and considerable capital loss in both
public and private organizations [2, 3]. These large-scale outbreaks are recognized
as a national threat, induced by cybercriminals (individuals or groups) who rely on
malware to breach otherwise safe networks. Nowadays, many different types of
cyberattacks
have existed in network applications. Among these attacks, one of the most complex
and customized is the Advanced Persistent Threat (APT).
An APT is a targeted cyberattack, characterized by its slow and steady attack
process. It utilizes standard frameworks to conceal itself inside genuine network
traffic [4, 5] and persists there for a long period. The APT attack process is more
intricate and has a greater impact than other attacks [6]. It uses zero-day
vulnerabilities, spear phishing, watering hole attacks, remote access control tools,
and other advanced tools and techniques to harm the targeted systems and move
laterally inside the network. After reaching the targeted network, APT attack groups
steal sensitive information and monitor the activities of the system without being
detected by intrusion detection systems and other traditional detection mechanisms
[7]. Attackers generate network traffic inside the organization, which makes the
detection process even more challenging [8]. Many sectors are affected by APTs,
including industry, the military, healthcare, and education.
Real-world APT attacks exhibit unique characteristics such as abnormal behavior in
client account maintenance, the occurrence of secondary-channel Trojans, irrelevant
information groups, confused data streams, and deviations in outbound information.
Several malware analysis reports [9–11] are discussed here; among them, the
FireEye reports [12, 13] provide abundant sources related to cybersecurity,
presenting an overview of various APT attacks, suspected attributions, target sectors,
attack vectors, and suspected countries. Cybersecurity professionals [14, 15] focus
on understanding the features of APT, attempting to uncover hidden attacker
practices in the organization's network traffic, and accessing security log data using
standard/traditional detection techniques. However, these methods have failed to
achieve high accuracy. To further improve APT detection accuracy, Machine
Learning (ML) algorithms have been proposed by various researchers [16, 17]; they
analyze the data and perform the APT detection process systematically. This
manuscript focuses on a review of recent research into various machine learning
techniques for preventing large-scale APT attacks.
The rest of the paper is structured as follows. Section 2 presents a detailed
description of APT and its stages. Section 3 discusses the various datasets used for
the evaluation of APT attacks. Section 4 focuses on the traditional detection and
defense mechanisms proposed by various authors and identifies the challenges in
detecting APT attacks. Section 5 illustrates the importance of using ML to detect
APT attacks, describes the architecture of the machine learning model, and
summarizes some of the research articles related to APT detection using ML
techniques. Section 6 concludes the review article.
An Advanced Persistent Threat (APT) can be described as "a holistic, human-driven
infiltration in which a meticulous group of attackers uses advanced tools and
techniques to attack well-established sectors for the exfiltration and exploitation of
sensitive information from those sectors. The main feature of APT is that the attacker
can persist over a very long period in an organization without being detected."
The characteristics of APT can be described as follows.
Advanced—APT attackers are well funded and have access to advanced tools and
techniques. The methods and attack vectors are customized into several stages based
on their targets.
Persistent—Attackers maintain access to the targeted network, impersonate ordinary
traffic to avoid detection by traditional detectors, and stay unobserved for a long
period using deceptive techniques. APT attackers try to remain inside the targeted
network without being explicitly noticed.
Threat—The goal of the attackers is to devastate and disrupt the organization
through irreplaceable data loss, damaging the organization's future economic
growth.
Many characteristics of APT are the same as those of other malware; however, some
are quite different in intent and motivation. To differentiate between ordinary
malware and APT, Chen et al. [18] identified the distinguishing traits presented in
Table 1.
The exploration of specific APT campaigns shows that no two APT attacks are
exactly alike; each is explicitly customized for its target. However, the various steps
involved in the APT process closely resemble one another. The upcoming section
elaborates on the stages of an APT attack in detail.
An Advanced Persistent Threat involves multiple stages of the attack process,
depending on the targeted environment. Most research considers the following six
stages (Fig. 1): Reconnaissance, Delivery, Exploitation, Operational, Data
collection, and Exfiltration. These are the loopholes most commonly used by
attackers and are based on the intrusion kill chain. Table 2 exemplifies the stages of
APT used so far by attackers, which are seized upon for the detection process.
Table 2 summarizes the APT stages and paves the way for upcoming researchers to
gain broad knowledge of new techniques in APT detection based on its various
stages and their respective functions.
Creating and collecting a dependable benchmark dataset and training and testing on
labeled APT features remain a very big challenge due to the vague nature of such
datasets. Most APT features are similar to those used for IDSs, so IDS datasets such
as NSL-KDD, UNSW-NB15, NGIDS-DS, and CICIDS2017 have also been
evaluated to detect traces of APT in a network environment. These existing
approaches only moderately capture APT attack signatures, since detecting the
advanced techniques used by attackers is very difficult. To meet these challenges,
datasets of the synthetic, semi-synthetic, and realistic kinds are used for the detection
process, because these datasets are composed of attacks injected manually into the
network traffic flow and thus include the features of APT. The following section
elaborates on various analyses of datasets related to APT detection.
Ghafir et al. [15] proposed a novel approach for threat detection using various ML
methods; the dataset used was a synthetic one generated from random and related
alerts in the network environment, with eight important APT features. The model
resulted in better threat prediction with a low false positive rate. Paul et al. [28]
evaluated their proposed approach using a synthetic dataset created in a general
organizational network and correlated the evaluated results with sample data
collected from several security fields.
Table 2 (continued)
• Point of entry: The attackers start with spear phishing messages sent to targeted
employees in the organization
• C&C server: Starts compromising the targeted network using the C&C protocol,
establishing long-term access and downloading additional malware executables
• Lateral movement: Compromises additional systems to gain access and to pave
the way to vulnerable hosts
• Data discovery: Obtains information about the internal Web servers and hosts
• Data exfiltration: Exfiltrates sensitive information
6 Stages of APT [23]
• Reconnaissance and weaponization: Information gathering for the attack process
is done via OSINT and social engineering
• Delivery: Two ways of delivering the exploits to the victim's system:
the direct way uses spear phishing and watering hole techniques to invade the
system;
the indirect way compromises a trusted third party of the targeted network to
deliver the exploit
• Initial intrusion: Exploiting vulnerabilities in Adobe, Excel, or Internet Explorer
within the targeted network
• Command and control: Attackers use social networking sites, the Tor anonymity
network, and remote access control to control the targeted network and exploit
the users' behaviors
• Lateral movement: This phase of the attack lasts for a long period, establishing
the attack vectors needed to completely compromise the targeted network
• Data exfiltration: Stealing data to gain the strategic benefits held by the
organization
7 Stages of APT [24, 25]
• Research: Collects basic information and loopholes about the targeted network
• Preparation: Extracts specific information about the victim and tests the tools
and techniques used to exploit vulnerabilities
• Intrusion: Delivering malicious emails into the targeted systems
• Conquering the network: Once the victim systems have been compromised,
attackers deploy additional malware to access specific hosts in the targeted
network
• Hiding presence: Attackers conceal their presence to avoid detection by
traditional methods
• Gathering data: Stealing confidential information
• Maintaining access: The main goal of the APT attackers is to maintain access to
the targeted network for a long period
8 Stages of APT [26]
• Initial reconnaissance: The initial process
• Initial compromise: A spear phishing email technique is used to penetrate the
target network
• Establishing foothold: Gaining control over the target
• Privilege escalation: Predominant usage of tools to gain credential information
• Internal reconnaissance: Collects all the information about the target needed to
initiate the attack
• Lateral movement: Accessing systems using legitimate credentials
• Maintain presence: Staying in the network for a long time
• Complete mission: Stealing the information using C&C servers
11 Stages of APT [27]
• Initial access: The initial phase of the attack
• Persistence: More effort is put into persistence, aimed at long-lasting access
within the organization
• Privilege escalation: Used to install malware or establish persistence footholds
• Discovery: Finding the systems, users, and information important to the mission
• Lateral movement: Moving across the organization toward the information
important to the mission
• Collection: Target network data are collected
• Exfiltration: Exfiltration of personal data
• Execution: One of the main requirements for attackers, executing the malicious
code on the victim system
• Defense evasion: Bypassing the traditional detectors and staying undetected for
a long time
• Credential access: The key step for the attackers to gain access to victims'
systems using valid credentials
• C&C: Gaining control over the target network by establishing communication
with various internal hosts
Micro et al. [29] analyzed high-dimensional network traffic flows constituting data
records for five months, ranked the features using a ranking approach, and evaluated
their proposed model on a dataset containing log data gathered from an enterprise
network over three days. This particular dataset was released by the visual analytics
community for a mini challenge; it was injected with various kinds of attacks
monitored by an intrusion detection approach to detect anomalous behavior.
Siddiqui et al. [30] proposed a fractal dimension-based machine learning
classification model, evaluated using pcap files obtained from the Contagio malware
dataset and from DARPA scalable network monitoring program traffic, to
classify normal and malicious behavioral features; the features related to the APT
attack were categorized and labeled for further processing. Friedberg et al. [31]
proposed an anomaly detection model to distinguish the actions and behaviors of
an APT within the network traffic data generated by Skopik et al. [32]; it divides
the observed data into training and attack phases based on the anomalous behavior
of the system. The datasets discussed above reveal the signatures of APT, but
advanced techniques to mitigate the attack are still lacking. Even though there is a
surplus of benchmark datasets for the evaluation of advanced models, the generated
synthetic datasets [4, 33] and the Contagio malware dataset [34] yielded better
performance compared with the IDS-based datasets.
APTs use advanced tools and techniques to penetrate large enterprises such as
industrial control systems, banking sectors, institutions, and nuclear power plants,
compromising the security infrastructure completely [35]. Owing to the diverse
features of APT, it can persist over a long period without being detected and
exfiltrate confidential information. To diminish these attacks, most researchers have
focused on detecting APT using standard detection methods such as anomaly
detection, rule-based detection, signature-based detection, traffic data analysis, and
pattern recognition [36–38]. These existing detection methods have been used to
isolate affected systems and extract threat features based on network traffic, data
flow, average packet size, packet transmission intervals, etc.; they remain among the
most promising techniques and are widely used for intrusion detection systems and
malware identification [9, 39, 40].
APT attacks are discovered through their anomalous behavior, and most researchers
have focused on anomaly detection [41]. Niu et al. [11] proposed a model called
Global Abnormal Forest (GAF) to detect APT features in C&C domains based on
the DNS log data of mobile devices. To find unusual actors accessing the system
and to achieve higher efficiency, Berrada et al. [42] demonstrated an anomaly-based
detection algorithm on provenance traces. Normally, extracting provenance data
enables prediction of the anomalous behavior of a system and makes it easier to find
the traces of attackers who open a pathway for the exfiltration of information. The
proposed model recorded traces on four operating systems to detect APT traces
using categorical anomaly detection, with effective results.
Detection of APT based on techniques other than traditional methods has been
reported by some researchers. A hot-booting PHC-based APT detection scheme has
been proposed for dynamic games; it (i) improves APT detection performance in
dynamic games, with the protection level increased by 18.1%, and (ii) increases the
utility of the cloud by 8.8% compared with the Q-learning procedure [14].
A sandboxing execution method has been proposed for advanced malware detection
based on malware activities in VM infrastructure; this method also includes
sandbox-evasion techniques [43–45] for finding the presence of APT. Two big data
techniques have also been applied: (1) a large-scale distributed computing
framework based on MapReduce is used to absorb the changes in the normal
behavior of the victim at each stage of APT, and (2) Hadoop is used to expose
possible targets based on identified APT targets [46, 47]. When an APT attack exists
during the transfer of information from source to destination, data loss can occur.
To avoid this kind of data loss, Data Loss Prevention (DLP) has been proposed,
which involves two operations: systematically collecting and analyzing data, and
blocking data in real time [48–50].
A few APT researchers have pointed out detection solutions for malware in
industrial control systems (ICS) suited to mitigating APT attacks. Such a solution
can be accomplished using security tools, for example, vulnerability scanners, SIEM
systems, IDS/IPS systems, or security orchestration devices [2, 22, 51, 52]. This
knowledge base is easily upgradeable and can be incorporated into all Cisco
intrusion detection systems [39].
However, there is no specific method for defending against an APT attack, and the
traditional methods have failed to track the threat; their misclassification of advanced
malware attacks [10, 53] leaves the overall detection method with a high false
positive rate and unable to handle large volumes of data [54].
The existing detection process relies on detecting changes in the network traffic and
log data, examining the network ports, checking for pattern matching, etc. Much
malware nevertheless remains undetected because of the advanced tools and
techniques used to invade the system. Some of the important challenges discussed
previously are the following:
(i) lack of a benchmark dataset
Machine learning methods can be utilized in different situations and applied to
various applications in cyberspace, such as malware detection, intrusion detection,
and insider threat detection [2, 55], as well as in monitoring internal user systems
by analyzing spam and phishing emails from unknown individuals or groups.
Managing machine identities has become a critical security competency. ML models
are trained to handle enormous amounts of real-time data within a certain period and
require some degree of automation and compactness when used for the detection of
threat models. Further, these models, once well trained, yield higher accuracy when
adapted to other high-dimensional data [56–58]. Although several methodologies
focus on the detection and analysis of APT attacks, there remain shortcomings
related to maintaining the trade-off between false positives and false negatives and
to correlating the alerts from various APT cycles for the identification of the exact
APT scenario [59, 60]. Figure 2 explains the APT detection process carried out by
a trained ML model.
Hence, the accurate detection of existing APT attacks with minimal time complexity
remains an open challenge for researchers [16]. The necessity of using ML methods
in the detection of APT is therefore that they (i) provide deep insight into the features
of malware datasets, (ii) provide better classification accuracy and more flexibility
than other traditional detection methods [54], and (iii) achieve high specificity with
a low false positive rate [25, 46, 61] while attaining early alerts for unfamiliar threats
such as APT. To illustrate these concerns, Table 3 elucidates APT detection using
several machine learning algorithms.
Table 3 (continued)

Lamprakis et al. [65]: Random forest classifiers; dataset: Web request data; metrics: precision, recall; merits: high precision in predicting C&C traffic

Moon et al. [66]: DTB-IDS; dataset: synthetic dataset; metric: accuracy; merits: 84.7% accuracy

Adams et al. [62]: Neural networks, decision tree, one-class SVM, K-means clustering; dataset: synthetic data; metrics: precision, recall, accuracy, MCC; merits: more viable in approaching APT attacks

Shang et al. [67]: Convolutional neural network, long short-term memory, PCA, SVM, decision tree, random forest, K-NN, naïve Bayes, logistic regression; dataset: Contagio malware database; metrics: precision, recall, false alarm rate, F1-score; merits: higher efficiency in detecting C&C channels of unknown APT attacks

Tan et al. [34]: Entropy-based detection using SVM; dataset: Contagio malware database; metrics: precision, accuracy, recall, F1-score; merits: reduces computation complexity, generates alerts on traffic data, efficient method

Ghafir et al. [33]: MLAPT; dataset: synthetic dataset; metrics: accuracy, TPR, FPR; merits: 84.8% prediction accuracy in the early stage of APT, low false positive rate
SVM performs well in classifying normal and APT signatures under high traffic
flow. The features are labeled based on the APT attack, and the learning model
classifies the malicious behavior [34, 47, 59, 62].
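As an illustrative sketch (in the spirit of the SVM-based entries in Table 3, not the exact pipeline of any cited work), an SVM can be trained on labeled traffic features as follows, assuming scikit-learn; the placeholder data stand in for real flow features:

```python
# An illustrative SVM classifier on labeled traffic features; feature
# extraction itself is omitted and the data below are random placeholders.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# X: per-flow feature vectors (e.g. packet-size entropy, timing statistics)
# y: 1 for APT-related flows, 0 for normal flows
X, y = np.random.rand(1000, 8), np.random.randint(0, 2, 1000)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))
```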
The statistical report in Fig. 3 presents a detailed survey comparing machine learning
and other computational detection models based on their performance metrics.
Fig. 3 Statistical report on performance analysis of computational and machine learning techniques
6 Conclusion
Acknowledgements This work was supported by The Department of Science and Technology-Interdisciplinary Cyber-Physical System (T-615).
References
2. B. Stojanović, K. Hofer-Schmitz, U. Kleb, APT datasets and attack modeling for automated
detection methods: a review. Comput. Secur. 92, 101734 (2020)
3. Swisscom, Targeted Attacks Cyber Security Report 2019; Technical report (Swisscom
(Switzerland) Ltd. Group Security, Bern, 2019)
4. A. Alshamrani, S. Myneni, A. Chowdhary, D. Huang, A survey on advanced persistent threats:
techniques, solutions, challenges, and research opportunities. IEEE Commun. Surv. Tutorials
21(2), 1851–1877 (2019)
5. W. Niu, X. Zhang, G.W. Yang, J. Zhu, Z. Ren, Identifying APT malware domain based on
mobile DNS logging. Math. Probl. Eng. (2017)
6. CISCO Systems. CISCO: Protecting ICS with Industrial Signatures. https://www.cisco.com/
c/en/us/products/security/index.html. Accessed on 5 June 2021
7. Solid State System LLC, http://solidsystemsllc.com/advanced-persistent-threat-protection. Accessed on 24 Mar 2021
8. R. Zhang, Y. Huo, J. Liu, F. Weng, Constructing APT attack scenarios based on intrusion kill
chain and fuzzy clustering. Secur. Commun. Netw. 7536381 (2017)
9. Malware Capture Facility Project. http://mcfp.weebly.com. Accessed on 28 Aug 2021
10. Malware-Traffic-Analysis Blog. http://www.malware-traffic-analysis.net. Accessed on 27 Aug 2021
11. Trend Micro technical report, Targeted attacks and how to defend against them, http://www.trendmicro.co.uk/media/misc/targeted-attacks-and-how-to-defendagainst-them-en.pdf. Accessed on 9 July 2021
12. FireEye Report, https://content.fireeye.com/apt-41/rpt-apt41/. Accessed 10 Jan 2021
13. FireEye Report, https://www.fireeye.com/current-threats/apt-groups.html. Accessed 10 Jan 2021
14. Attivo Networks. BOTsink. https://attivonetworks.com/product/attivo-botsink. Accessed 12
Jan 2021.
15. I. Ghafir, V. Prenosil, Proposed approach for targeted attacks detection, in Advanced Computer
and Communication Engineering Technology (Springer, Cham, 2016), pp. 73–80
16. H.A. Glory, C. Vigneswaran, S.S. Jagtap, R. Shruthi, G. Hariharan, V.S. Shankar Sriram, AHW-
BGOA-DNN: a novel deep learning model for epileptic seizure detection. Neural Comput.
Appl. 1–29 (2020)
17. J. Vukalović, D. Delija, Advanced persistent threats-detection and defense, in 2015 38th
International Convention on Information and Communication Technology, Electronics and
Microelectronics (MIPRO) (IEEE, 2015), pp. 1324–1330
18. P. Chen, L. Desmet, C. Huygens, A study on advanced persistent threats, in IFIP International
Conference on Communications and Multimedia Security (Springer, Berlin, 2014), pp. 63–72
19. C. Vigneswaran, V.S. Shankar Sriram, Unsupervised bin-wise pre-training: a fusion of
information theory and hypergraph. Knowl. Based Syst. 195, 105650 (2020)
20. Z. Guan, L. Bian, T. Shang, J. Liu, When machine learning meets security issues: a survey, in 2018 IEEE International Conference on Intelligence and Safety for Robotics (ISR) (IEEE, 2018), pp. 158–165
21. P.K. Sharma, S.Y. Moon, D. Moon, J.H. Park, DFA-AD: a distributed framework architecture
for the detection of advanced persistent threats. Cluster Comput. 20(1), 597–609 (2017)
22. D. Moon, H. Im, I. Kim, J.H. Park, DTB-IDS: an intrusion detection system based on decision
tree using behavior analysis for preventing APT attacks. J. Supercomput. 73(7), 2881–2895
(2017)
23. M. Ussath, D. Jaeger, F. Cheng, C. Meinel, Advanced persistent threats: behind the scenes, in
2016 Annual Conference on Information Science and Systems (CISS) (IEEE, 2016), pp. 181–
186
24. E.M. Hutchins, J.C. Michael, R.M. Amin, Intelligence-driven computer network defense
informed by analysis of adversary campaigns and intrusion kill chains. Leading Issues Inf.
Warfare Secur. Res. 1(1), 80 (2011)
25. Mandiant. The Advanced Persistent Threat. https://www.fireeye.com/content/dam/fireeye-
www/services/pdfs/mandiant-apt1-report.pdf. Accessed on 30 Mar 2021
An Incisive Analysis of Advanced Persistent Threat … 73
26. W. Tounsi, H. Rais, A survey on technical threat intelligence in the age of sophisticated cyber-
attacks. Comput. Secur. 72, 212–233 (2018)
27. Trend Micro, The Custom Defense Against Targeted Attacks. Technical report (Trend Micro,
Tokyo, 2013)
28. F. Skopik, G. Settanni, R. Fiedler, I. Friedberg, Semi-synthetic data set generation for security
software evaluation, in 2014 Twelfth Annual International Conference on Privacy, Security and Trust
(IEEE, 2014), pp. 156–163
29. W. Matsuda, M. Fujimoto, T. Mitsunaga, Detecting APT attacks against active directory using
machine learning, in 2018 IEEE Conference on Application, Information and Network Security
(AINS). IEEE (2018), pp. 60–65
30. S. Singh, P.K. Sharma, S.Y. Moon, D. Moon, J.H. Park, A comprehensive study on APT attacks
and countermeasures for future networks and communications: challenges and solutions. J.
Supercomput. 75(8), 4543–4574 (2019)
31. A. Bohara, U. Thakore, W.H. Sanders, Intrusion detection in enterprise systems by combining
and clustering diverse monitor data, in Proceedings of the Symposium and Bootcamp on the
Science of Security (2016), pp. 7–16
32. I. Friedberg, F. Skopik, G. Settanni, R. Fiedler, Combating advanced persistent threats: from
network event correlation to incident detection. Comput. Secur. 48, 35–57 (2015)
33. I. Ghafir, M. Hammoudeh, V. Prenosil, L. Han, R. Hegarty, K. Rabie, F.J. Aparicio-Navarro,
Detection of advanced persistent threat using machine-learning correlation analysis. Futur.
Gener. Comput. Syst. 89, 349–359 (2018)
34. K. Krithivasan, S. Pravinraj, V.S. Shankar Sriram, Detection of cyberattacks in industrial control
systems using enhanced principal component analysis and hypergraph-based convolution
neural network (EPCA-HG-CNN). IEEE Trans. Ind. Appl. 56(4), 4394–4404 (2020)
35. M. Salem, M. Mohammed, Feasibility approach based on SecMonet framework to protect
networks from advanced persistent threat attacks, in International Conference on Emerging
Internetworking, Data & Web Technologies (Springer, Cham, 2019), pp. 333–343
36. R.P. Baksi, S.J. Upadhyaya, A comprehensive model for elucidating advanced persistent threats
(APT), in Proceedings of the International Conference on Security and Management (SAM)
(2018), pp. 245–251
37. G. Berrada, J. Cheney, S. Benabderrahmane, W. Maxwell, H. Mookherjee, A. Theriault, R.
Wright, A baseline for unsupervised advanced persistent threat detection in system-level
provenance. Futur. Gener. Comput. Syst. 108, 401–413 (2020)
38. T. Schindler, Anomaly detection in log data using graph databases and machine learning to
defend advanced persistent threats. arXiv preprint arXiv:1802.00259 (2018)
39. C. Wen-Lin, C.-J. Lin, K.-N. Chang, Detection and classification of advanced persistent threats
and attacks using the support vector machine. Appl. Sci. 9(21), 4579 (2019)
40. J. Tan, J. Wang, Detecting advanced persistent threats based on entropy and support vector
machine, in International Conference on Algorithms and Architectures for Parallel Processing
(Springer, Cham, 2018), pp. 153–165
41. D.X. Cho, H.H. Nam, A method of monitoring and detecting apt attacks based on unknown
domains. Procedia Comput. Sci. 150, 316–323 (2019)
42. P. Giura, W. Wang, Using large scale distributed computing to unveil advanced persistent
threats. Science 1(3), 93 (2013)
43. A. Singh, Z. Bu, Hot knives through butter: Evading file-based sandboxes. Threat Research
Blog. Accessed on 20 Apr 2021 (2013)
44. F.M. Al-Matarneh, Advanced persistent threats and its role in network security vulnerabilities.
Int. J. Adv. Res. Comput. Sci. 11(1) (2020)
45. J. Sexton, C. Storlie, B. Anderson, Subroutine based detection of APT malware. J. Comput.
Virol. Hacking Technol. 12(4), 225–233 (2016)
46. M. Marchetti, F. Pierazzi, M. Colajanni, A. Guido, Analysis of high volumes of network traffic
for advanced persistent threat detection. Comput. Netw. 109, 127–141 (2016)
47. T. Micro, Countering the advanced persistent threat challenge with deep discovery. Retrieved
10(10) (2013)
74 M. K. Vishnu Priya and V. S. S. Sriram
48. M.R.G. Raman, N. Somu, K. Kirthivasan, R. Liscano, V.S. Shankar Sriram, An efficient intru-
sion detection system based on hypergraph-genetic algorithm for parameter optimization and
feature selection in support vector machine. Knowl.-Based Syst. 134, 1–12 (2017)
49. J. Sexton, C. Storlie, J. Neil, Attack chain detection. Stat. Anal. Data Min. ASA Data Sci. J. 8(5–6), 353–363 (2015)
50. F. Skopik, G. Settanni, R. Fiedler, A problem shared is a problem halved: a survey on the
dimensions of collective cyber defense through security information sharing. Comput. Secur.
60, 154–176 (2016)
51. AlertEnterprise. Sentry CyberSCADA. http://www.alertenterprise.com/products-EnterpriseSe
ntryCybersecuritySCADA.php. Accessed 12 Jan 2021
52. X. Wang, K. Zheng, X. Niu, B. Wu, C. Wu, Detection of command and control in advanced
persistent threat based on independent access, in 2016 IEEE International Conference on
Communications (ICC) (IEEE, 2016), pp. 1–6
53. O.I. Adelaiye, S. Aminat, S.A. Faki, Evaluating advanced persistent threats mitigation effects:
a review. Int. J. Inf. Secur. Sci. 7(4), 159–171 (2018)
54. M.Z. Rafique, P. Chen, C. Huygens, W. Joosen, Evolutionary algorithms for classification of
malware families through different network behaviors, in Proceedings of the 2014 Annual
Conference on Genetic and Evolutionary Computation (2014), pp. 1167–1174
55. L. Xiao, D. Xu, N.B. Mandayam, H. Vincent Poor, Attacker-centric view of a detection game
against advanced persistent threats. IEEE Trans. Mobile Comput. 17(11), 2512–2523 (2018)
56. M.A.M. Hasan, M. Nasser, S. Ahmad, K.I. Molla, Feature selection for intrusion detection
using random forest. J. Inf. Secur. 7(3), 129–140 (2016)
57. A.M. Lajevardi, M. Amini, A semantic-based correlation approach for detecting hybrid and
low-level APTs. Fut. Gener. Comput. Syst. 96, 64–88 (2019)
58. P. Giura, W. Wang, A context-based detection framework for advanced persistent threats, in
2012 International Conference on Cyber Security (IEEE, 2012), pp. 69–74
59. L. Shang, D. Guo, Y. Ji, Q. Li, Discovering unknown advanced persistent threat using shared
features mined by neural networks. Comput. Netw. 189, 107937 (2021)
60. Y. Shi, G. Chen, J. Li, Malicious domain name detection based on extreme machine learning.
Neural Process. Lett. 48(3), 1347–1357 (2018)
61. M. Schmid, F. Hill, A.K. Ghosh, Protecting data from malicious software, in 18th Annual
Computer Security Applications Conference, 2002. Proceedings (IEEE, 2002), pp. 199–208
62. C. Adams, A.A. Tambay, D. Bissessar, R. Brien, J. Fan, M. Hezaveh, J. Zahed, Using machine
learning to detect APTs on a user workstation. Int. J. Sens. Netw. Data Commun. 8(2), (2019)
63. I. Jeun, Y. Lee, D.A. Won, A practical study on advanced persistent threats. Computer
applications for security. Control Syst. Eng. 144–152 (2012)
64. Ş. Bahtiyar, B.Y. Mehmet, C.Y. Altıniğne, A multi-dimensional machine learning approach to
predict advanced malware. Comput. Netw. 160, 118–129 (2019)
65. P. Lamprakis, R. Dargenio, D. Gugelmann, V. Lenders, M. Happe, L. Vanbever, Unsupervised
detection of APT C&C channels using web request graphs, in International Conference on
Detection of Intrusions and Malware, and Vulnerability Assessment (Springer, Cham, 2017),
pp. 366–387
66. C. Neasbitt, R. Perdisci, K. Li, T. Nelms, Clickminer: towards forensic reconstruction of user-
browser interactions from network traces, in Proceedings of the ACM CCS 2014 (ACM, 2014),
pp. 1244–1255
67. S. Siddiqui, M.S. Khan, K. Ferens, W. Kinsner, Detecting advanced persistent threats using
fractal dimension based machine learning classification, in Proceedings of the 2016 ACM on
International Workshop on Security and Privacy Analytics (2016), pp. 64–69
Intelligent Computing Systems
for Diagnosing Plant Diseases
Abstract In this paper, various image processing approaches for identifying, evaluating, and organizing plant diseases are discussed. Various parts of a plant, such as seeds, stems, leaves, fruits, and flowers, can be used to assess the health of the plant or to identify diseases. This paper specifically focuses on methods that use plant leaves to detect disease. The objective of the paper is to review various approaches for detecting diseases and to classify them with convolutional neural networks using the concept of transfer learning. This knowledge will be useful for further exploration of methods to identify, detect, and quantify diseases irrespective of the plant, and for further investigation of image processing techniques applied to parts of the plant other than leaves.
1 Introduction
Agriculture is the means to feed the world's rising population. Besides feeding the world, plants contribute to reducing global warming. Different countries practice agriculture with different methods, and the agriculture sector faces various challenges. Traditional farming relies mostly on human guessing; observing crops with the naked eye is not as effective as modern techniques. Some plants do not show major symptoms, and some show them only when it is too late. Powerful microscopes are used to detect plant diseases in such cases; other cases involve parts of the electromagnetic spectrum that are not visible to the naked eye. Modern techniques such as digital image processing are efficient, have higher accuracy, and are feasible too. Most diseases can be revealed in the visible spectrum. Trained people can tell the health of a crop, but their efficiency may not always be high
and their services are costly. If samples collected in the field are damaged while being transported to the laboratory for testing, visual rating becomes judgmental. Many such problems are reduced through digital image processing. In this paper, we discuss various architectures of convolutional neural networks. We apply the concept of transfer learning and use four different architectures to find a solution to our problem. We have implemented this project as a graphical user interface built with Python Flask, HTML, and CSS. We also report the accuracy measures obtained with each of these architectures, compare the outcomes, and conclude the paper by recommending the architecture that is most appropriate for a particular input.
2 Literature Review
Barbedo [1] reviewed various digital image processing approaches that sense, quantify, and categorize plant diseases from digital images in the visible range. As methods that deal with roots, seeds, and fruits need specialized techniques, the paper explores only the diseases and symptoms occurring on stems and leaves, and not on other parts of the plant. The survey tried to cover as many problems as possible.
Sladojevic et al. [2] developed "Deep neural networks based recognition of plant diseases by leaf image classification", which explored a new deep learning approach for automatically detecting and categorizing the diseases of various plants from images of their leaves shot with a phone camera. The developed model could find the leaf against the background images and could differentiate between thirteen different types of diseases and healthy leaves. The researchers created a new plant disease image database of 30,000 images, obtained by applying the required transformations to the original 3,000 publicly available images.
Toda et al. [3] discussed "How convolutional neural networks diagnose plant disease". They assessed an array of neuron-wise and layer-wise visualization methods applied to plant diseases diagnosed using convolutional neural networks (CNN). Here, the CNN was trained on a publicly available dataset. It was found that neural networks can capture the colors and textures specific to the diseases, which is similar to human decision-making. Experimental outcomes signify that straightforward methods such as naïve visualization of the hidden layer outputs are not sufficient for visualizing plant diseases. Visualization features and semantic dictionaries can be used to find the visual features required for disease identification.
Mohanty et al. [4] highlighted the work of Krizhevsky et al. [5], which showed that, instead of using outdated methods, it is practical to use supervised training of a deep convolutional neural network for image classification problems with an enormous number of classes. They trained a deep convolutional neural network on a huge number of high-resolution plant leaf images to classify diseases that the model had not come across.
There exist many methods to detect plant diseases. Some diseases do not show any symptoms that can be identified visually; in some cases, symptoms appear only when it is too late. In such cases, the symptoms can be identified using microscopes. Sometimes signs can be identified in parts of the electromagnetic spectrum that cannot be seen by humans; there, we can use remote sensing methods that give us hyperspectral and multispectral images. Detailed information on this subject can be found in "Plant disease severity estimated visually, by digital photography and image analysis, and by hyperspectral imaging" by Bock et al. [6], "Recent advances in sensing plant diseases for precision crop protection" by Mahlein et al. [7], and "A review of advanced techniques for detecting plant diseases" by Sankaran et al. [8].
Barbedo [1] explains that plant disease detection research has three stages: the first stage is detection of the disease, the second is quantification of disease severity, and the last is classification of the disease. Many diseases generate changes in the visible spectrum, and most of the time the first guess is made by trained people. These trained people may get their guesses correct at times, but there are many pitfalls in doing this. Bock et al. [6] listed a few observations:
• The trained people may get tired, which may affect their levels of concentration,
resulting in decreased accuracy.
• The guesses can vary from person to person.
• Standard area diagrams are required which will help the assessment.
• The people are required to undergo training and re-training to increase the quality.
• These guesses can be incorrect if the pictures taken from the fields are assessed
later in the laboratory.
• The trained people can misinterpret the area of infection, the number of lesions, and their size.
• Some plants or crops may stretch across large areas; thus, monitoring becomes a tedious and difficult task.
Hence, by the use of image processing techniques, we can get rid of these disadvantages, increase accuracy, and introduce uniformity in the disease identification process.
An application can monitor the plant at all times and ring an alarm as soon as a disease is detected, as described in "Fall armyworm damaged maize plant identification using digital images" by Sena et al. [10] and "Lettuce calcium deficiency detection with machine vision computed plant features in controlled environments" by Story et al. [11].
Abdullah et al. [9] put forth a method for differentiating a disease called corynespora from various other diseases affecting the leaves of the rubber tree. The algorithm does not use segmentation; rather, it applies principal component analysis to the red, green, and blue values of low-resolution leaf images. The first two principal components are then fed to a multi-layer perceptron (MLP) neural network with a single hidden layer. The output is simply whether or not the sample contains the disease.
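A minimal sketch of this PCA-plus-MLP idea follows (the feature shapes and data are illustrative assumptions, not the authors' implementation):

```python
# Sketch: PCA on per-image mean RGB values, then a single-hidden-layer MLP.
# Data here is synthetic; a real pipeline would load labeled leaf images.
import numpy as np
from sklearn.decomposition import PCA
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(1)
rgb_features = rng.uniform(0, 255, size=(200, 3))   # mean R, G, B per image (assumed)
labels = (rgb_features[:, 1] < 128).astype(int)     # toy labels: 1 = diseased

pca = PCA(n_components=2).fit(rgb_features)         # keep the first two components
components = pca.transform(rgb_features)

mlp = MLPClassifier(hidden_layer_sizes=(16,), max_iter=2000, random_state=1)
mlp.fit(components, labels)
print("training accuracy:", mlp.score(components, labels))
```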
2.1.4 Thresholding
Sena et al. [10] proposed this method to distinguish images of maize plants with fall armyworm damage from healthy plants. The algorithm has two parts: image processing and image analysis. The processing stage transforms the image to greyscale, filters it, and thresholds it to discard spurious elements. The image is then divided into 12 blocks, and blocks whose area is less than 5% of the total area are set aside. For the remaining blocks, the number of diseased blocks is counted, and if this count is above a threshold, the plant is considered diseased. On empirical evaluation, this threshold value was set to ten.
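A hedged sketch of this two-part pipeline follows (OpenCV-based; the filter choice, Otsu thresholding, and the per-block diseased test are illustrative assumptions):

```python
# Sketch: greyscale -> filter -> threshold, then block-wise analysis with the
# count-of-diseased-blocks rule (threshold of ten, as described above).
import cv2
import numpy as np

def is_diseased(img_bgr, block_grid=(3, 4), count_threshold=10):
    grey = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2GRAY)
    grey = cv2.medianBlur(grey, 5)  # filter spurious noise (assumed filter)
    _, mask = cv2.threshold(grey, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)

    h, w = mask.shape
    bh, bw = h // block_grid[0], w // block_grid[1]  # 12 blocks, as described
    diseased_blocks = 0
    for i in range(block_grid[0]):
        for j in range(block_grid[1]):
            block = mask[i * bh:(i + 1) * bh, j * bw:(j + 1) * bw]
            frac = (block > 0).mean()
            if frac < 0.05:          # set aside blocks under 5% of the area
                continue
            diseased_blocks += 1     # simplified diseased test (assumption)
    return diseased_blocks >= count_threshold
```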
2.2 Quantification
Here the objective is to quantify or measure the severity of the particular disease. This can be done in two ways (a sketch follows this list):
• Measuring the area of the affected leaves.
• Measuring how deeply the disease has affected the plant, with the help of texture and color.
As described earlier, manual methods have the disadvantages mentioned previously.
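As a minimal illustration of the area-based route (the HSV color ranges are assumptions; a real system would calibrate them per crop):

```python
# Sketch: estimate severity as the fraction of leaf pixels that look diseased.
# The HSV ranges for "leaf" and "lesion" pixels are illustrative assumptions.
import cv2
import numpy as np

def severity_percentage(img_bgr):
    hsv = cv2.cvtColor(img_bgr, cv2.COLOR_BGR2HSV)
    leaf = cv2.inRange(hsv, (25, 40, 40), (95, 255, 255))    # greenish leaf pixels
    lesion = cv2.inRange(hsv, (10, 40, 40), (25, 255, 255))  # brownish lesion pixels
    leaf_area = np.count_nonzero(leaf) + np.count_nonzero(lesion)
    if leaf_area == 0:
        return 0.0
    return 100.0 * np.count_nonzero(lesion) / leaf_area      # affected-area percentage
```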
2.3 Classification
Saad Albawi et al. [13] explained the fundamental concepts of neural networks. The way a neural network works was demonstrated in this paper, which also states the parameters that affect the efficiency of neural networks.
Meunkaewjinda et al. [14] propose a technique for identifying and classifying the diseases affecting grapevines. The method makes use of various color spaces (HSI, L*a*b*, UVL, and YCbCr). A multi-layer perceptron neural network performs the separation between leaves and background, and a color library is constructed using an unsupervised, untrained self-organizing map. In each case, the number of clusters to adopt is decided by a genetic algorithm. A support vector machine (SVM) then separates the diseased and healthy parts. After some further manipulation, the image is put through a multiclass SVM, which classifies the sample into the respective class.
Another approach was put forth by Hairuddin et al. [15]. This research classifies various nutritional deficiencies of oil palm plants. In the first phase, the image is segmented with respect to color. Once segmentation is finished, feature extraction takes place, and the various color and texture features are given to a fuzzy classifier. The output does not report the deficiencies present; rather, it gives recommendations on what should be done based on the deficiency.
Murk et al. [16] proposed a model based on deep learning, which they named a plant disease detector. As per the researchers, the model detects various plant diseases using pictures of their leaves. To increase the sample size, augmentation was applied, and a CNN with multiple convolution and pooling layers was used. The PlantVillage dataset was used to train the model, and 15% of its data, containing images of healthy and diseased plants, was used for testing. The testing accuracy achieved by the model is 98.3%.
3 Requirements
4 Implementation
4.1 Techniques
Deep convolutional neural networks take a lot of time to train; they might take days or even weeks on very large datasets. To combat this and save time, we can use the concept of transfer learning. Transfer learning is a technique for re-using the weights of pretrained models, which were developed for standard computer vision datasets. In our model, we have used weights pretrained on ImageNet. Transfer learning means using a model trained for one problem on another, similar problem; the developer can add or remove layers as required by the new problem statement.
For this purpose, Keras applications provide neural network architectures that vary in size, number of parameters, depth, etc. Some of these popular models are listed below, followed by a small sketch of the approach:
• Xception
• VGG16
• VGG19
• ResNet50
• ResNet101V2
• InceptionV3
• MobileNet
• DenseNet201

Fig. 1 Photograph of healthy Okra leaf
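A minimal transfer-learning sketch follows (the input size, head layers, and optimizer are assumptions for illustration, not our exact configuration):

```python
# Sketch: reuse ImageNet-pretrained DenseNet201 weights and add a new head.
from tensorflow.keras.applications import DenseNet201
from tensorflow.keras import layers, models

base = DenseNet201(weights="imagenet", include_top=False,
                   input_shape=(224, 224, 3))
base.trainable = False  # use the pretrained network as a fixed feature extractor

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dense(6, activation="softmax"),  # six leaf classes in this study
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])
```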
4.2 Dataset
The dataset consists of cotton and okra leaves, both healthy and diseased. We have used 6 classes for prediction as the output, namely:
• Fresh cotton leaf: 427 images
• Diseased cotton leaf: 288 images
• Fresh cotton plant: 427 images
• Diseased cotton plant: 815 images
• Fresh okra leaf: 196 images
• Diseased okra leaf: 125 images
Figures 1, 2 and 3 show the sample data set images.
The credit for the cotton leaves dataset goes to Mr. Akash Zade. The okra leaves
dataset has been created by us. Also, we have divided the dataset into folders of
training and testing.
In this section, we discuss the complete process flow of our implementation. We have implemented our application by creating a user interface, deployed on the localhost. We executed our source code in the Spyder IDE, making use of Anaconda Navigator.
Fig. 2 Photograph of
diseased Okra leaf
First, we need to download and install all the software requirements mentioned in the requirements section. After opening Anaconda Navigator, we created our TensorFlow environment and downloaded all the packages mentioned previously. Once all the packages are downloaded, we import the packages and modules into our code. We had already run our neural network model on Google Colab and saved it as a ".h5" file, which we load using the "load_model()" function. We then define our predict function, which takes an input image and returns the prediction with the help of the "predict()" function. Finally, we created a Flask instance and deployed our application on the local system (localhost).
For the implementation, we just need to choose any image on the localhost. On clicking the submit button, we are redirected to the corresponding output classification page, which gives us a prediction about the plant's health.
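A hedged sketch of this deployment follows (the route, file names, preprocessing, and class list are assumptions; only load_model() and predict() come from the description above):

```python
# Sketch: Flask app on localhost that loads a saved .h5 model and predicts.
import numpy as np
from flask import Flask, request
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

app = Flask(__name__)
model = load_model("plant_disease.h5")  # model trained earlier on Google Colab
CLASSES = ["fresh cotton leaf", "diseased cotton leaf", "fresh cotton plant",
           "diseased cotton plant", "fresh okra leaf", "diseased okra leaf"]

def predict_leaf(img_path):
    img = image.load_img(img_path, target_size=(224, 224))
    x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)
    return CLASSES[int(np.argmax(model.predict(x)))]

@app.route("/predict", methods=["POST"])
def predict_route():
    f = request.files["file"]   # image chosen by the user
    f.save("upload.jpg")
    return predict_leaf("upload.jpg")

if __name__ == "__main__":
    app.run()  # serves on localhost, as described above
```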
Algorithm:
Step 1: Import all the necessary packages.
Step 2: Mount drive on Colab.
Step 3: Create model object.
Step 4: Compile the model.
Step 5: Perform image augmentation on training images.
Step 6: Train the convolutional neural network model.
Step 7: Plot training and validation curve using the function.
Step 8: Load an image for prediction and convert it to an array.
Step 9: Test using the predict() function.
The flowchart of the whole process is shown in Fig. 4; a small code sketch of these steps is given below.
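The algorithm above can be sketched in Keras as follows (the directory layout, image size, augmentation settings, and the tiny model are illustrative assumptions):

```python
# Sketch of the algorithm steps: build and compile a model, augment the
# training images, train, then predict on a single image (paths assumed).
import numpy as np
from tensorflow.keras import layers, models
from tensorflow.keras.preprocessing.image import ImageDataGenerator, load_img, img_to_array

model = models.Sequential([
    layers.Conv2D(32, 3, activation="relu", input_shape=(224, 224, 3)),
    layers.MaxPooling2D(2),
    layers.Flatten(),
    layers.Dense(6, activation="softmax"),  # six classes, as in our dataset
])
model.compile(optimizer="adam", loss="categorical_crossentropy",
              metrics=["accuracy"])                                    # step 4

train_gen = ImageDataGenerator(rescale=1 / 255.0, rotation_range=20,
                               zoom_range=0.2, horizontal_flip=True)   # step 5
train_data = train_gen.flow_from_directory("dataset/train",            # assumed layout
                                           target_size=(224, 224),
                                           class_mode="categorical")
model.fit(train_data, epochs=10)                                       # step 6

img = img_to_array(load_img("leaf.jpg", target_size=(224, 224))) / 255.0  # step 8
print(model.predict(np.expand_dims(img, axis=0)))                          # step 9
```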
5 Results
The result is that our application is able to correctly predict a disease among the six classes. After clicking the predict button, the application correctly redirects to the page for the predicted disease.
Table 1 summarizes the different CNN architectures used. We used InceptionV3, DenseNet201, and ResNet50, all of which are convolutional neural network architectures. Here, the number of parameters indicates the total number of parameters used by the particular CNN architecture. As can be seen, the parameters used by any one architecture are in the millions, and a single layer of an architecture can use up to a million parameters.
Figure 5 depicts the user interface of our application. Based on our research, we
have come up with the following aspects that can be considered while diagnosing
plant diseases:
• The solution should be easily available to the farmers, which can be achieved by designing a mobile application or website.
Fig. 4 Flowchart of the whole process: input image → augmentation process → testing → output result
Table 1 Comparative analysis of different models

Model name | Accuracy (%) | Number of parameters
InceptionV3 | 95.23 | 23,851,784
DenseNet201 | 98.41 | 20,242,984
ResNet50 | 72.38 | 25,636,712
• The algorithm can be tested with images of mixed light conditions to determine
whether the required accuracy is accomplished.
• To increase the effectiveness of the detection technique, a combination of various
features and learning methods can be used.
• A feature that determines the severity of the disease should be considered, to help farmers take corrective action in time.
• The model should be trained and tested with more varied data, maybe considering
more types of plants.
• The model should be extended to detect additional diseases which may not be so
common.
6 Conclusion
Thus, we are able to successfully predict plant disease through a web application running on a local host. Two factors must be taken into consideration when selecting an appropriate neural network architecture. The first factor is accuracy; the DenseNet201 architecture gave us the highest accuracy. The second factor is the number of parameters, which must also be considered because it increases the execution time. Hence, our recommendation is to strike the right balance between these two factors depending on the user's hardware.
We conclude this review with the hope that deeper research in this area will create a positive impact on the agricultural industry through disease-free crops.
Acknowledgements All four researchers would like to thank their guide Prof. Dr. Manjiri
Ranjanikar for her constant support and guidance in writing this paper.
References
1. J.G.A. Barbedo, Digital image processing techniques for detecting, quantifying and classifying
plant diseases. SpringerPlus 2(1), 1–12 (2013)
2. S. Sladojevic, et al., Deep neural networks based recognition of plant diseases by leaf image
classification. Comput. Intell. Neurosci. 2016 (2016). https://doi.org/10.1155/2016/3289801
3. Y. Toda, F. Okura, How convolutional neural networks diagnose plant disease. Plant Phenomics
2019 (2019). https://doi.org/10.34133/2019/9237136
4. S.P. Mohanty, D.P. Hughes, M. Salathé, Using deep learning for image-based plant disease
detection. Front. Plant Sci. 7, 1419 (2016)
5. A. Krizhevsky, I. Sutskever, G.E. Hinton, ImageNet classification with deep convolutional
neural networks. Commun. ACM 60(6), 84–90 (2017)
6. C.H. Bock, G.H. Poole, P.E. Parker, T.R. Gottwald, Plant disease severity estimated visually,
by digital photography and image analysis, and by hyperspectral imaging. Crit. Rev. Plant Sci.
29(2), 59–107 (2010)
7. A.K. Mahlein, E.C. Oerke, U. Steiner, H.W. Dehne, Recent advances in sensing plant diseases
for precision crop protection. Eur. J. Plant Pathol. 133(1), 197–209 (2012)
Abstract Automatic segmentation of brain tumors plays an important role in the diagnosis of cancer. This work explores CNN-based auto-segmentation of high-grade glioma using two models. First, the basic 3-dimensional VNet model is applied to 2D images using the same architecture. Second, the original WNet model is enhanced by making it deeper, with additional convolutional layers on the encoder-decoder paths of both UNet-like segments. In total, 31 and 44 convolutional layers are used with 2D-VNet and modified WNet, respectively, in experiments on BraTS 2018 MRI data. The models generate multi-region segmentation with three classes corresponding to the internal structures of the tumor, namely enhancing, non-enhancing/necrosis, and edema. Test accuracies of 99.52% and 99.49%, dice scores of 0.9957 and 0.9958, and dice losses of 0.425 and 0.414 are obtained by 2D-VNet and WNet, respectively. The training time taken by 2D-VNet and WNet is 44 and 77 s per epoch, respectively. The modified WNet exhibits more complexity than the 2D-VNet model, whereas the performance of both is almost similar.
1 Introduction
A brain tumor, like any other neoplasm in the body, occurs when cells grow at an abnormal rate to form a mass of abnormal cells. These neoplasms grow over a varying period of time depending on whether they are benign or malignant (cancerous) in nature. The brain is enclosed within the skull, so when a neoplasm starts growing inside the brain, the pressure within the closed space of the skull increases and gives rise to symptoms indicating that the person is suffering from a brain tumor. Some tumors are non-cancerous (benign), while cancerous tissues are malignant. There are two main brain tumor types: primary tumors and metastatic/secondary
neoplasms. A primary neoplasm originates within the brain and may be either benign or malignant. Secondary tumors are those that originate in other parts of the human body, such as the breasts or lungs, and later spread to the brain through the blood. Secondary or metastatic brain tumors are always cancerous and malignant. According to a study [1], the incidence of brain tumor cases in the Indian population ranges from 5 to 10 per 100,000 people, with an increasing trend; 40% of all cancers spread to the brain. Brain cancer is also the second most common cancer in children, accounting for nearly 26% of childhood cancers.
Recently, automatic tumor segmentation methods have achieved great advances using deep neural networks. However, manual or semi-automatic segmentation methods still dominate clinical practice, which is cumbersome and requires the doctor's expertise. Annotating medical scans requires expert opinion, which is costly, time consuming, and susceptible to inter-rater variability. Auto-segmentation definitely helps to reduce the impact of these consequences on tumor diagnosis. Glioma segmentation has clear clinical relevance and importance for treatment planning after immediate assessment of its severity and type.
To tackle this problem, two deep neural networks are proposed here, namely a deep WNet with residual blocks and a 2D-VNet, for accurate auto-segmentation of brain tumor tissues into three different regions. There is a strong need to segment the tumor area at the micro level, which further helps to identify the different regions of the tumor area as enhancing, non-enhancing or necrotic, and peri-tumoral edema. These regions carry different characteristics, which contribute differently to severity, treatment planning, and further consequences for the quality of life of glioma patients. The original VNet model was proposed to operate on 3D MRI/CT volumes; in our proposed system, a VNet with a similar architecture is applied to 2D images. The original WNet model consists of two cascaded UNet-based models, which are further enhanced by making them deeper, with additional convolutional layers on the encoder and decoder paths of both UNet-like segments of the WNet network.
2 Related Work
The UNet architecture [4] combines a contracting path to capture context and an exactly symmetric expansive path for precise localization. Its authors demonstrated that such a network can be trained end to end using just a few images. Shreyas et al. [3] proposed a deep learning-based architecture for segmentation of brain tumors from MRI scans; the work built upon the UNet architecture proposed in [4] and used a fully convolutional neural network. They experimented with a UNet model with changes such as using empirically derived class weights instead of pixel-wise weight maps, which reduced the need for high memory usage. Batch-normalization layers were also included in the network after every convolution step in order to speed up training and to reduce the effect of internal covariate shift [5]. A few studies have used preprocessing techniques such as intensity normalization and bias correction to prepare MRI images for final processing [6, 7]. As the present study works with WNet and VNet [12], we studied a method that uses a cascade of WNet and UNet. That model designed UNet in conjunction with WNet, with the intent that WNet would segment the complete tumor from normal brain tissue and the resulting bounding box would be fed as input to UNet to further segment the tumor into its internal sub-regions. This approach was taken by Reji et al. [24] in order to increase the accuracy of UNet predictions. Another notable work is the cascaded framework proposed by Wang et al. [8], where three fully convolutional networks (FCNs) were employed in a hierarchical and sequential fashion to segment the internal structures of a brain tumor, each FCN dealing with a binary classification problem within the segmentation task. First, a multimodal 3D image is fed as input to a WNet model that segments the complete tumor, yielding its cropped region (the bounding box of the whole tumor). The cropped region from WNet [9] is then provided as input to a second network, TNet [10], to segment the core region specifically. Similarly, the region inside the bounding box generated by TNet is used as input for ENet [11] to segment the enhancing core region of the tumor in particular. Another work, by Casamitjana et al. [13], implemented a cascaded VNet architecture that redesigned the residual connections and utilized region-of-interest masks in order to further segment the relevant regions of the brain; this helped to solve the class imbalance problem that is generally inherent in tumor segmentation tasks. A two-step process was used: the first step localizes the tumorous tissues, and the second step distinguishes sub-regions within the tumor by ignoring the set of unwanted background voxels. The two steps are carried out using two CNN (convolutional neural network) models, in which the prediction of the first model is provided to the second as input. Chen et al. [14] proposed a bridging design between two UNet
architectures, which connects each layer of the expansive path of the first UNet with its corresponding layer in the contracting path of the second UNet and directly feeds the auto-learnt feature maps of earlier layers to later layers. This technique reduced the training cost and exhibited better performance than a single UNet architecture. A fully automated tumor identification and volume estimation method was introduced in the work of Ogretmenoglu et al. [15] on FLAIR images, in which a mean/area-variance-dependent analysis is used to assess whether or not there is a tumor in any part of the hemisphere. Gadolinium-injected T1-weighted scans are used to exclude non-brain areas such as fat tissue and skull. The clustering using the fuzzy
C-means method is used to detect edema region in FLAIR scans, while threshold
segmentation is used to detect tumor in T1 post-Gadolinium (T1CE) images. Tumor
volume is measured using tumor area and MRI slice thickness data. Kermi et al. [16]
tested a UNet-based model with modifications such as residual blocks with large
padding and the absence of a pooling layer. It used data augmentation on BraTS
2018 challenge data and obtained mean dice coefficient values of 0.868, 0.783, and
0.805 for the whole tumor, enhancing, and core regions, respectively. Segmentation in the presence of a dynamic context and a fuzzy boundary is one of the most challenging tasks in biomedical image segmentation. To address this problem, Huang et al. [17] proposed WNet, a double-U-shaped architecture capable of exact localization along with sharpening of the inter-regional margins. The study constructed an atlas-based segmentation network [18] to generate position-aware segmentation maps using prior knowledge of human anatomy. Furthermore, other work developed a refinement network for boundary enhancement [19] to produce a consistent boundary. The experimental findings demonstrated that the presented WNet model reliably catches the desired body part with sharpened detail and thus increased efficiency on two datasets, with dice scores of 0.9661 and 0.9625 for liver and spleen, respectively. Therefore, research in deep learning for segmentation of medical images has helped greatly to improve the ability to treat ailments. MRI-based automatic segmentation of brain tumors plays a crucial role in tumor diagnosis and in surgical or other suitable treatment planning for brain cancer patients, and convolutional neural networks have been extensively employed for such segmentation tasks.
Hence, our work explores the segmentation of high-grade glioma tumors using two deep learning models. The first method uses the VNet model, and the second uses the WNet model with residual blocks. The original VNet model, which was proposed for 3D volumes, is applied to 2D images in our proposed system using the same architecture as the basic VNet. The original WNet model consists of two bridged UNet-based models, which are further enhanced by making them deeper, with additional convolutional layers on the encoder and decoder paths of both UNet-like segments of the original WNet network.
The data released under the BraTS 2018 challenge (Multimodal Brain Tumor Segmentation Challenge), provided by the Medical Image Computing and Computer Assisted Intervention (MICCAI) society, is used in our work [20, 21]. The data consists of clinically acquired presurgical multimodal MRI scans of 210 high-grade glioma (HGG) patients; each contains volumes of four MRI modalities, including T1-weighted (T1W), gadolinium-enhanced T1-weighted (T1-GD), T2-weighted (T2W), and fluid attenuated inversion recovery (FLAIR), and it also contains the ground truth volume segmented by an expert team of neuro-oncologists and radiologists.
4 Method
The method first preprocesses the images by converting every 3D MRI volume into a NumPy array using the SimpleITK library available in Python. Each 3D NumPy array consists of 155 two-dimensional slices, each of size 240 × 240. These slices are further cropped to a size of 192 × 192 to remove unwanted background pixels. With 7750 2D images for each of the 4 MRI modalities, a total of 31,000 images are finally included in our experimentation. These images are then standardized using z-score normalization, and the normalized images are fed as input to the deep models (2D-VNet and modified WNet) for automatic feature extraction. The model learns from the auto-extracted feature set and finally generates the segmented output with three classes: enhancing, non-enhancing, and edema sub-regions. The typical flow of the proposed system is presented in Fig. 1, which follows the steps in the order described above to preprocess and prepare the data.
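A minimal sketch of this preprocessing follows (a single-modality volume path is assumed, and the crop arithmetic is our reading of the description):

```python
# Sketch: load one 3D MRI volume, crop each slice to 192 x 192, z-score normalize.
import numpy as np
import SimpleITK as sitk

def preprocess_volume(path):
    vol = sitk.GetArrayFromImage(sitk.ReadImage(path))  # shape (155, 240, 240)
    c = (240 - 192) // 2
    vol = vol[:, c:c + 192, c:c + 192]   # crop unwanted background pixels
    mean, std = vol.mean(), vol.std()
    return (vol - mean) / (std + 1e-8)   # z-score normalization
```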
Figure 3 shows a schematic representation of the modified WNet with residual blocks and a deep architecture. The WNet model consists of two cascaded UNet-based networks [4]. Each contracting and expanding path of the modified WNet contains 5 blocks; each encoder/decoder block consists of two consecutive convolution layers with kernels of size 3 × 3 and rectified linear unit (ReLU) activation, and each block is followed by a maximum pooling layer and then by a residual block. The residual block helps to preserve the location information of pixels through each convolutional layer of the down-sampling path; it learns from the residue of the true output and the input. The architecture of the residual block is shown in Fig. 4. The numbers of filters used are 32, 64, 128, 256, and 512 along the encoder/decoder paths, with 1024 filters at the connection point of encoder and decoder. At the end, a softmax activation function is applied to the output layer. A dropout layer with a dropout rate of 20% is applied at the end of each block. The ADAM optimizer (with a learning rate of 1e−5) is used to train the model.
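A plausible Keras sketch of one such encoder block with its residual block follows (this is our reading of the description, not the exact implementation):

```python
# Sketch: an encoder block of the modified WNet, two 3x3 ReLU convolutions,
# max pooling, a residual block, and 20% dropout (per the description above).
from tensorflow.keras import layers

def residual_block(x, filters):
    shortcut = layers.Conv2D(filters, 1, padding="same")(x)  # match channel count
    y = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    y = layers.Conv2D(filters, 3, padding="same")(y)
    return layers.Activation("relu")(layers.Add()([shortcut, y]))

def encoder_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    x = layers.MaxPooling2D(2)(x)
    x = residual_block(x, filters)
    return layers.Dropout(0.2)(x)  # 20% dropout at the end of each block
```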
Two different deep learning models, namely 2D-VNet and WNet with residual blocks, are trained using 60% of the images, validated using a separate cohort of 20%, and tested with a further separate group of 20%. The performance of these two models is measured by a modified dice coefficient metric, which evaluates both the accuracy of the automated probabilistic segmentation (the spatial overlap between actual and predicted) and the reproducibility of manual segmentation of MRI images. The original dice coefficient formula is used with a slight modification: the denominator contains the sum of squares of the actual and predicted outputs plus an epsilon value of 1e−6, instead of the sum of absolute values of the actual and predicted outputs, as shown in Eq. (1).
$$\text{Dice Coefficient} = \frac{2 \times \mathrm{SEG}(gt) \cap \mathrm{SEG}(pr)}{\mathrm{SEG}^{2}(gt) + \mathrm{SEG}^{2}(pr) + \varepsilon} \tag{1}$$
where SEG(gt) is the ground truth segmentation of the tumor with three-class annotation, SEG(pr) is the predicted segmentation of the tumorous region with the predicted three class labels, and ε is 1e−6. Dice loss can then be simply calculated as per Eq. (2):

$$\text{Dice Loss} = 1 - \text{Dice Coefficient} \tag{2}$$
The experimentation is carried out in Python, with Keras and TensorFlow 2.4 as the backend. All experiments are conducted on Google Colaboratory with 25 GB of RAM. The experimentation uses 31,000 brain tumor slices obtained from 50 HGG patients' data across four different modalities. The data is split in the ratio 3:1:1, that is, 60% for training, 20% for validation, and 20% for testing. Both models are trained with a batch size of 8 for 30 epochs. The performance of both models is measured by dice coefficient, dice loss, and accuracy. Table 1 shows the performance metrics obtained during the training, validation, and testing phases.
Table 1 Performance metrics obtained during training, validation, and testing

Method | Phase | Accuracy | Dice coefficient | Dice loss
2D-VNet | Training | 99.5549 | 99.6391 | 0.3608
2D-VNet | Validation | 99.5115 | 99.5663 | 0.4336
2D-VNet | Testing | 99.5210 | 99.5744 | 0.4255
Modified WNet | Training | 99.5712 | 99.6459 | 0.3541
Modified WNet | Validation | 99.5044 | 99.5901 | 0.4099
Modified WNet | Testing | 99.4973 | 99.5856 | 0.4143

Bold in the original indicates the highest accuracy and dice coefficient and the lowest dice loss obtained during the training, validation, and testing phases
Fig. 5 a Dice loss and accuracy graph for 2D-VNet, b graph for modified WNet
It is observed from Table 1 that the modified WNet model with residual blocks gives slightly higher accuracy and dice coefficient, and lower dice loss, than 2D-VNet during training. The proposed WNet model also gives slightly better results in terms of dice coefficient than 2D-VNet during the testing phase. The loss and accuracy graphs of training and validation for 2D-VNet and modified WNet are given in Fig. 5a, b, respectively.
A qualitative comparison of the results on the FLAIR modality between 2D-VNet and the modified WNet with additional residual blocks is presented in Fig. 6. It shows the original 2D FLAIR slice, its ground truth segmentation, and the predicted segmented tumorous regions divided into 3 classes: enhancing tumor (yellow), edema (green), and non-enhancing or necrotic tumor (dark green).
The segmentation maps are obtained from 2D-VNet and the modified deep WNet with residual blocks, with each tumor region highlighted in a different color. It should be highlighted that the segmentation map produced by the modified deep WNet with residual blocks gives more accurate results than 2D-VNet by maintaining sharp boundaries even for small objects. The WNet-based model performed better than the 2D-VNet-based model, with a slightly higher dice coefficient. The dice coefficients achieved by both models signify a very high similarity between the ground truth and predicted segmentation maps.
6 Conclusion
This work has demonstrated effective performance in segmenting tumorous tissues from non-tumorous ones and, further, in classifying the different regions of tumorous tissue using deep neural networks, namely 2D-VNet and the deep modified WNet with residual blocks. Segmentation of the whole brain tumor and the intra-tumoral regions was carried out for high-grade glioma. The proposed deep neural networks were tested and evaluated quantitatively on the BraTS 2018 dataset. The 2D-VNet model learns at the rate of 1.4 ms per 2D slice, whereas WNet takes comparatively more time at 2.4 ms per 2D slice. The tests carried out showed that the segmentation results obtained by our models are very similar to those obtained manually by experts. In comparison, the modified WNet architecture yielded a dice score of 99.58% during testing and showed slightly better results than 2D-VNet, with minute edge differences in the predicted segmentation output. Therefore, both architectures are equally effective for glioma segmentation with classification of intra-tumoral regions, but 2D-VNet is comparatively more time efficient than the WNet model. These models can be further trained to segment low-grade glioma tumors along with high-grade glioma to make them more generalized.
References
1. A. Dasgupta, T. Gupta, R. Jalali, Indian data on central nervous tumors: a summary of published
work, in South Asian J. Cancer 5(3), 147–153 (2016)
2. J. Long, E. Shelhamer, T. Darrell, Fully convolutional networks for semantic segmentation, in
IEEE Conference on Computer Vision and Pattern Recognition (2015)
3. V. Shreyas, V. Pankajakshan, A deep learning architecture for brain tumor segmentation in
MRI Images, in IEEE 19th International Workshop on Multimedia Signal Processing (MMSP)
(2017), pp. 1–6. https://doi.org/10.1109/MMSP.2017.8122291
4. O. Ronneberger, P. Fischer, T. Brox, U-net: convolutional networks for biomedical image segmentation, in International Conference on Medical Image Computing and Computer-Assisted Intervention. Lecture Notes in Computer Science, vol. 9351 (Springer, Cham, 2015), pp. 234–241
5. S. Ioffe, C. Szegedy, Batch normalization: accelerating deep network training by reducing
internal covariate shift, in CoRR, vol. abs/1502.03167 (2015)
6. N.J. Tustison, J.C. Gee, N4ITK: Nick’s N3 ITK implementation for MRI bias field correction.
IEEE Trans. Med. Imaging 29(6), 1310–20 (2010)
7. L. Nyul, J. Udupa, On standardizing the MR image intensity scale. Magnet. Resonance Med.
42(6), 1072–1081 (1999)
8. G. Wang, W. Li, S. Ourselin, T. Vercauteren, Automatic brain tumor segmentation using
cascaded anisotropic convolutional neural networks, in Brainlesion: Glioma, Multiple Scle-
rosis, Stroke and Traumatic Brain Injuries. Lecture Notes in Computer Science, vol. 10670
(Springer, Cham, 2017)
9. X. Xia, B. Kulis, W-Net: a deep model for fully unsupervised image segmentation, in Computer
Vision and Pattern Recognition. arXiv:1711.08506 (2017)
10. T.J. Jun, J. Kweon, Y.H. Kim, D. Kim, T-Net: nested encoder decoder architecture for the main
vessel segmentation in coronary angiography, in Neural Networks, vol. 128 (2020)
11. P. Adam, A. Chaurasia, K. Sangpil, C. Eugenio, ENet: A deep neural network architecture for
real-time semantic segmentation, in Computer Vision and Pattern Recognition, arXiv:1606.
02147 (2016)
12. F. Milletari, N. Navab, S.A. Ahmadi, V-net: fully convolutional neural networks for volumetric
medical image segmentation, in Computer Vision and Pattern Recognition. arXiv:1606.04797
(2016)
13. A. Casamitjana, M. Cata, I. Sánchez, M. Combalia, V. Vilaplana, Cascaded V-Net using ROI
masks for brain tumor segmentation, in Brain Lesion: Glioma, Multiple Sclerosis, Stroke and
Traumatic Brain Injuries. Lecture Notes in Computer Science, vol. 10670 (Springer, Cham,
2018)
14. W. Chen, Y. Zhang, J. He, Y. Qiao, Y. Chen, H. Shi, X. Tang, Prostate segmentation using 2D
bridged U-net, in International Joint Conference on Neural Networks (2019), pp. 1–7. https://
doi.org/10.1109/IJCNN.2019.8851908
15. C. Ogretmenoglu Fiçici, O. Erogul, Z. Telatar, Fully automated brain tumor segmentation and
volume estimation based on symmetry analysis in MR images, in CMBEBIH 2017. IFMBE
Proceedings, vol. 62 (Springer, Singapore, 2017)
16. A. Kermi, I. Mahmoudi, M. Khadir, Deep Convolutional neural networks using U-Net for
automatic brain tumor segmentation in multimodal MRI volumes. In: BrainLes 2018, LNCS,
vol. 11384 (Springer, Berlin, 2019), pp. 37–48
17. H. Huang, L. Lin, R. Tong, H. Hu, Q. Zhang, Y. Iwamoto, et al., WNET: an end-to-end
atlas-guided and boundary-enhanced network for medical image segmentation, in IEEE 17th
International Symposium on Biomedical Imaging (ISBI), 3–7 Apr 2020, Iowa City, Iowa, USA
18. G. Gindi, A. Rangarajan, I. Zubal, Atlas-guided segmentation of brain images via opti-
mizing neural networks, in Proceedings of SPIE Biomedical Image Processing and Biomedical
Visualization, vol. 1905 (1993). https://doi.org/10.1117/12.148668
19. C. Zhuang, X. Yuan, W. Wang, Boundary enhanced network for improved semantic segmen-
tation, in Urban Intelligence and Applications (2020), pp. 172–184
Multimodal MRI Analysis for Segmentation … 101
20. B. Menze, A. Jakab, S. Bauer, J. Kalpathy-Cramer, K. Farahani, J. Kirby et al., The multimodal
brain tumor image segmentation benchmark (BRATS). IEEE Trans. Med. Imaging 34(10),
1993–2024 (2015)
21. S. Bakas, H. Akbari, A. Sotiras, M. Bilello, M. Rozycki, J. Kirby, et al., Advancing the Cancer
Genome Atlas glioma MRI collections with expert segmentation labels and radiomic features.
Nat. Sci. Data 4, 170117 (2017)
22. K. He, X. Zhang, S. Ren, J. Sun, Delving deep into rectifiers: surpassing human-level performance on ImageNet classification, in Proceedings of International Conference on Computer Vision (ICCV) (IEEE Computer Society, 2015), pp. 1026–1034
23. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in IEEE Conference on Computer Vision and Pattern Recognition (2016), pp. 770–778. https://doi.org/10.1109/CVPR.2016.90
24. S. Reji, E. Earley, M. Basak, Brain tumor segmentation, in CS230: Deep Learning (Standford
University, CA, 2018)
Early Onset Alzheimer Disease
Classification Using Convolution Neural
Network
Abstract Alzheimer's disease is one of the major causes of death. Treatment of the disease is highly recommended in its early stage, as it is difficult to treat in later stages. Diagnosis of this slow-growing disease is difficult because it does not show any symptoms in the early stage. As deep neural networks have shown success in processing medical images, this paper uses a convolution neural network for early detection of Alzheimer's disease using binary classification. The network model uses T2 magnetic resonance images from the Alzheimer's Disease Neuroimaging Initiative dataset. The preprocessing extracts the slices with the hippocampal region from the three-dimensional images and removes the non-brain region of each slice. The proposed method achieves 71.13% accuracy and performs better than AlexNet in terms of loss and prediction time.
1 Introduction
Data used in preparation of this article were obtained from the Alzheimer’s Disease Neuroimaging
Initiative (ADNI) database (adni.loni.usc.edu). As such, the investigators within the ADNI con-
tributed to the design and implementation of ADNI and/or provided data but did not participate in
analysis or writing of this report. A complete listing of ADNI investigators can be found at: http://
adni.loni.usc.edu/wp-content/uploads/how_to_apply/ADNI_Acknowledgement_List.pdf.
and society, the cost required to care for AD patients, and the increase in the death rate between 2000 and 2017. The report stated that, in spite of intensive care of AD and other dementia patients, the death rate increases every year. The World Health Organization [2] published information about the leading causes of death in the world. Figure 1 shows a pictorial view of the death rate of AD compared to other diseases. AD held the seventh position among the top ten causes of death globally in 2019 and had the second highest death rate increase between 2000 and 2019.
AD causes the brain to shrink and neurons to die. In the early stage of AD, the patient does not show any physical changes, as these changes are minor and unnoticeable. As the disease proceeds, neurons in the brain are destroyed for two reasons: (1) the abnormal growth of the protein fragment beta-amyloid and its accumulation outside neurons, which results in neuron death and the prevention of neuron-to-neuron communication, and (2) the abnormal growth of tau protein tangles inside the neuron, which prevents the flow of nutrients and other molecules within the neurons. After around 20 years or more, these slowly growing changes bring physical symptoms such as memory loss, language problems, and inability to remember things at
a level where they need a full-time assistant. Symptoms of AD differ from person to person. Once neurons are dead, there is no way to get them back; because of this, no treatment is available to cure AD.
To detect AD, doctors use questionnaires and tests of attention, memory, and language. If these tests suggest AD at a higher stage, experts such as neurologists and radiologists use MRI to diagnose the structural changes in the brain caused by Alzheimer's. Researchers have tried to address early stage AD with various automated and semi-automated approaches; such approaches are discussed in Sect. 2. This paper uses MRI to diagnose AD with a deep neural network approach. The dataset description and preprocessing mechanism are covered in Sect. 3, and the proposed method and results are discussed in Sects. 4 and 5. Section 6 concludes the work and outlines future work.
2 Literature Survey
In this section, the literature survey carried out for this research work is explained in various subsections.
Over the past several years, many techniques have been developed to diagnose AD and classify its stages. Because adequate datasets are scarce, most of the research work is done on the publicly available ADNI and OASIS datasets. This research work can be categorized into techniques based on variations of CNN models and other conventional methodologies. Figure 2 shows the survey taxonomy.
Various CNN models have been developed to address AD diagnosis and classification. These benchmark CNN models have less depth but still provide good accuracy. Another variation of CNN is the transfer learning approach for the detection and classification of AD, which gives higher accuracy than other solutions. In transfer learning, pretrained weights, i.e., parameters learned on a particular dataset, are reused for AD classification. A description of these research works is given in Table 2.
A CNN is a deep neural network (DNN) architecture that tries to mimic the natural visual perception mechanism [12]. From the invention of the first CNN framework to the present, significant development has been made on CNNs. Researchers have developed benchmark CNN models such as LeNet, AlexNet, InceptionNet, VGGNet, ResNet, and GoogLeNet. CNNs address a wide variety of problems, including image classification, text recognition, object detection, natural language processing, and many more. CNNs have gained popularity for their learning capability based on automatic feature extraction without any human intervention. The fundamental components of a CNN are the convolutional layer, pooling layer, fully connected (FC) layer, activation function, loss function, regularization, and optimization.
Details of the dataset used for this research work and the preprocessing method are explained in the following subsections.
To diagnose AD, 715 MRIs with an axial view of PD/T2-weighted FSE/TSE sequences in NIfTI format are selected. These images are split into 160 AD, 343 MCI, and 212 normal aging/CN subjects. To classify subjects into two classes, the MCI and AD subjects are counted as the AD class and the rest of the subjects as the CN class; hence, 503 subjects are AD and 212 are CN. The data are taken from the ADNI database [13, 14]. Table 3 contains information about the dataset used, such as the age range and gender of all subjects.
The FSL-BET tool [15, 16] removes the non-brain region from the input 256×256×104 3D MRIs. In the next step, the nii2png Python package [17] converts the 3D MRIs into 2D slices. From this collection of 2D slices, the slices containing the hippocampal region are selected, as the hippocampus is the first and most affected brain region showing AD effects. Finally, these 2D slices are scaled to 227×227. The entire preprocessing pipeline, with an example image, is illustrated in Fig. 3.
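As an illustration, these steps can be scripted as follows. This is a minimal sketch, assuming FSL's `bet` command is installed and that the axial slice range covering the hippocampus (HIPPO_SLICES below) is known for this acquisition protocol; it uses nibabel and Pillow in place of the nii2png package for the slice export.

```python
# Hypothetical preprocessing sketch: skull stripping, slice extraction,
# and rescaling to the 227 x 227 network input size.
import subprocess
import numpy as np
import nibabel as nib
from PIL import Image

HIPPO_SLICES = range(40, 56)   # assumed slice indices; dataset-specific

def preprocess(nii_path, out_prefix):
    # 1. Remove non-brain tissue with FSL-BET.
    stripped = out_prefix + "_brain.nii.gz"
    subprocess.run(["bet", nii_path, stripped], check=True)

    # 2. Load the 3D volume (e.g., 256 x 256 x 104) and take axial slices.
    volume = nib.load(stripped).get_fdata()
    for z in HIPPO_SLICES:
        slice_2d = volume[:, :, z]
        # Normalize intensities to 0-255 for image export.
        lo, hi = slice_2d.min(), slice_2d.max()
        slice_2d = (255 * (slice_2d - lo) / (hi - lo + 1e-8)).astype(np.uint8)
        # 3. Rescale to the 227 x 227 input dimension.
        Image.fromarray(slice_2d).resize((227, 227)).save(
            f"{out_prefix}_slice{z:03d}.png")
```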
The proposed CNN model addresses early onset AD classification. As shown in Fig. 4, it consists of three convolutional layers. The convolutional layer is an essential component of the CNN framework; it performs feature extraction using a combination of linear and nonlinear operations. The first layer has 16 filters with kernel size 4 and a softmax activation function (AF). The second layer uses 32 filters with kernel size 5 and a LeakyReLU AF. The third layer has 64 filters with kernel size 3, a ReLU AF, and valid padding. Convolutional layers have learnable parameters in the form of filters and kernels: in the forward pass, each filter is convolved over the entire input volume, computing the dot product between the input and the filter values. The prominent features of these layers are extracted by the max pooling layers that follow them. Pooling layers have no learnable parameters; a pooling layer divides the input into non-overlapping portions, and each sub-portion produces an output as required, such as the maximum or minimum value. Accordingly, the three most used variations of pooling are min pooling, average pooling, and max pooling; max pooling is used in the proposed architecture.
The first and last pairs of convolution and pooling layers are followed by a dropout of 0.1. The second pair of these layers is followed by batch normalization with epsilon = 0.2, momentum = 0.99, renorm momentum = 0.99, axis = −1, and the scale parameter set to False. The output of the last layer is flattened into a one-dimensional vector. This flattened layer is connected to dense layers, called FC layers, as each input is connected to every output through learnable
weights. Activation functions and the number of nodes can be defined as parameters of the FC layers. At the end of the model, there are four FC layers with sigmoid, ReLU, ReLU, and sigmoid AFs, respectively. The last FC layer has the same number of nodes as the number of classes. Due to the small network depth, no special hardware is required for training the model. The model is implemented using TensorFlow and Keras. The training and test MRI datasets are split in a ratio of 77% and 23%, respectively. Accordingly, the training dataset consists of 550 subjects (387 AD and 163 CN), and the test dataset consists of 165 subjects (116 AD and 49 CN).
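The description above can be collected into a minimal Keras sketch. This is an illustration only, not the authors' exact code: the pool sizes, dense layer widths, and single-channel input are assumptions, and the renorm momentum setting from the text is a tf.keras 2.x BatchNormalization option noted here only in a comment.

```python
# A minimal sketch of the proposed three-block CNN under assumed
# hyperparameters (pool sizes, dense widths, grayscale slices).
from tensorflow.keras import layers, models

model = models.Sequential([
    # Block 1: 16 filters, kernel 4, softmax AF, max pooling, dropout 0.1.
    layers.Conv2D(16, 4, activation="softmax", input_shape=(227, 227, 1)),
    layers.MaxPooling2D(2),
    layers.Dropout(0.1),
    # Block 2: 32 filters, kernel 5, LeakyReLU AF, then batch normalization
    # (the paper also sets renorm momentum = 0.99, a tf.keras 2.x option).
    layers.Conv2D(32, 5),
    layers.LeakyReLU(),
    layers.MaxPooling2D(2),
    layers.BatchNormalization(epsilon=0.2, momentum=0.99,
                              axis=-1, scale=False),
    # Block 3: 64 filters, kernel 3, ReLU AF, valid padding, dropout 0.1.
    layers.Conv2D(64, 3, activation="relu", padding="valid"),
    layers.MaxPooling2D(2),
    layers.Dropout(0.1),
    # Flatten, then four FC layers with sigmoid, ReLU, ReLU, sigmoid AFs;
    # the last layer has as many nodes as classes (2, one-hot labels).
    layers.Flatten(),
    layers.Dense(128, activation="sigmoid"),   # width assumed
    layers.Dense(64, activation="relu"),       # width assumed
    layers.Dense(16, activation="relu"),       # width assumed
    layers.Dense(2, activation="sigmoid"),
])
model.compile(optimizer="sgd", loss="binary_crossentropy",
              metrics=["accuracy"])
model.summary()
```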
5 Results
The parameters for the training phase are as follows: loss function = binary cross-entropy, optimizer = stochastic gradient descent, epochs = 20, batch size = 128, and steps per epoch = 1. The model achieves 71% average training accuracy and 71.13% average testing accuracy.
These results are compared with the well-known AlexNet model on the same input. For the AlexNet model, the training parameters are the Adam optimizer with a 0.001 learning rate, binary cross-entropy loss, steps per epoch = 1, and epochs = 20. AlexNet achieves 63.69% average training accuracy and 69.53% average testing accuracy.
A comparison of the proposed CNN model and AlexNet is shown in Table 4 for the parameters average testing accuracy, average testing loss, and time taken by the models. The AlexNet model has more depth and hence requires more time for the training process. In comparison, the proposed CNN model takes less time for the training phase as it has fewer layers; thus, the CNN model is more time efficient and less complex in structure than the AlexNet model. In the proposed CNN model, batch
6 Conclusion
As the output of this research work, a CNN model is proposed that can be used for the early diagnosis of AD. While most of the existing research on AD diagnosis uses AlexNet, the proposed CNN model is more time efficient than AlexNet. The model is trained and tested on the ADNI dataset, which is used by the majority of researchers in this area. A network that is more efficient in terms of time and space, deeper, and more accurate is still required for the early diagnosis of AD. Clinical assessment data can be used along with image data for better results; after satisfactory results there, pathological and genetic data can also be used collaboratively. As this area of research is in its initial stage, there are many open directions for researchers to contribute. The presented model gives 71.13% accuracy on the given dataset, while AlexNet gives lower accuracy on the same input. In the future, the proposed model can be improved to increase accuracy and to make use of larger datasets.
References
1. A. Association, Alzheimer’s disease facts and figures. Alzheimer’s Dementia 15(3), 321–387
(2019)
2. W.H. Organization, The top 10 causes of death (2021). Accessed 15 Feb 2021. https://www.
who.int/news-room/fact-sheets/detail/the-top-10-causes-of-death
3. D. Manzak, G. Çetinel, A. Manzak, Automated classification of Alzheimer’s disease using
deep neural network (DNN) by random forest feature elimination, in 2019 14th International
Conference on Computer Science & Education (ICCSE). IEEE (2019), pp. 1050–1053
4. F. Ahmad, W. Dar, Classification of Alzheimer’s disease stages: an approach using PCA-based
algorithm, vol. 33 (2018), p. 153331751879003. https://doi.org/10.1177/1533317518790038
5. H.I. Suk, D. Shen, Deep learning-based feature representation for AD/MCI classification, in
International Conference on Medical Image Computing and Computer-Assisted Intervention
(Springer, Berlin, 2013), pp. 583–590
6. W. Lin, T. Tong, Q. Gao, D. Guo, X. Du, Y. Yang, G. Guo, M. Xiao, M. Du, X. Qu et al., Con-
volutional neural networks-based MRI image analysis for the Alzheimer’s disease prediction
from mild cognitive impairment. Front. Neurosci. 12, 777 (2018)
7. M. Maqsood, F. Nazir, U. Khan, F. Aadil, H. Jamal, I. Mehmood, O.Y. Song, Transfer learning
assisted classification and detection of Alzheimer’s disease stages using 3d MRI scans. Sensors
19(11), 2645 (2019)
8. M. Puranik, H. Shah, K. Shah, S. Bagul, Intelligent Alzheimer’s detector using deep learn-
ing, in 2018 Second International Conference on Intelligent Computing and Control Systems
(ICICCS). IEEE (2018), pp. 318–323
9. M.D. Chitradevi, P. Sathees, Analysis of brain sub regions using optimization techniques and deep learning method in Alzheimer disease, vol. 86 (2019), p. 105857. https://doi.org/10.1016/j.asoc.2019.105857
10. S. Afzal, M. Maqsood, F. Nazir, U. Khan, F. Aadil, K. Awan, I. Mehmood, O.Y. Song, A
data augmentation-based framework to handle class imbalance problem for Alzheimer’s stage
detection, vol. 7 (2019), pp. 1. https://doi.org/10.1109/ACCESS.2019.2932786
11. N.M. Khan, N. Abraham, M. Hon, Transfer learning with intelligent training data selection for
prediction of Alzheimer’s disease. IEEE Access 7, 72726–72735 (2019)
12. J. Gu, Z. Wang, J. Kuen, L. Ma, A. Shahroudy, B. Shuai, T. Liu, X. Wang, G. Wang, J. Cai
et al., Recent advances in convolutional neural networks. Pattern Recogn. 77, 354–377 (2018)
13. Access data and samples. Available at http://adni.loni.usc.edu/data-samples/access-data/
14. A secure online resource for sharing, visualizing, and exploring neuroscience data. Available
at https://ida.loni.usc.edu/login.jsp
15. FMRIB Software Library v6.0. Available at https://fsl.fmrib.ox.ac.uk/fsl/fslwiki/
16. M. Jenkinson, C.F. Beckmann, T.E. Behrens, M.W. Woolrich, S.M. Smith, FSL. Neuroimage,
62, 782–90 (2012)
17. A.A. Laurence, NIfTI-Image-Converter (2021). Accessed 30 Jan 2021. https://alexlaurence.
github.io/NIfTI-Image-Converter/
18. MRI scans. Available at https://www.physio-pedia.com/MRI_Scans
19. T. Tapiola, I. Alafuzoff, S.K. Herukka, L. Parkkinen, P. Hartikainen, H. Soininen, T. Pirttilä,
Cerebrospinal fluid β-amyloid 42 and tau proteins as biomarkers of Alzheimer-type pathologic
changes in the brain. Archiv. Neurol. 66(3), 382–389 (2009)
A Study on Evaluating the Performance
of Robot Motion Using Gradient
Generalized Artificial Potential Fields
with Obstacles
Abstract Motion planning (MP) is a specialized version of the more general Artificial Intelligence (AI) planning problem. The goal of a general purpose planning problem is to come up with a sequence of actions that accomplishes a given goal. There are two approaches that utilize random samples. One is the Probabilistic Road Map algorithm, which seeks to construct a road map of the free space. The second is the rapidly exploring random tree procedure, which grows ever-evolving trees to explore the free space and forge paths between the start and the goal. Both algorithms have the pleasing property that they work quite well in practice, even on high-dimensional configuration spaces. Finally, the behavior of generalized potential fields with obstacles is discussed. The gradient of the artificial potential field helps to steer the robot through configuration space. A strength of these potential field methods is that they are relatively simple to implement, and they can often be driven directly by sensory input. The finding of the present research is a description of the challenges faced in motion planning for robots and in controlling their speed using the gradient in generalized artificial potential fields with obstacles.
1 Introduction
The motion planning problem is specifically concerned with coming up with plans to move a robot from one location to another while avoiding all the obstacles in the environment. This basic approach can be applied to a wide variety of robotic systems, ranging from relatively simple robots that roll around on the ground to robotic arms with multiple degrees of freedom.
In Fig. 1, the robots are constrained to move around on a grid of cells. The limitations are that a robot cannot go outside the playing area or enter any of the black grid cells; these correspond to obstacles in the real world. Here, the goal is to come up with a sequence of steps that will take the robot from the starting location (green cell) to the goal location (red cell). Typically, this yields a mathematical structure called a graph. In this context, a path is simply a sequence of consecutive edges that lead from one node to another, and there are many different paths that would solve the above-mentioned problem.
The Grassfire algorithm [1] begins by marking the destination node with a distance value of 0. Then, all nodes that are one step away from the destination are found and labeled 1; then all nodes that are two steps away are labeled 2, and so on, until the start node is encountered. For every cell in the grid, the distance value it gets marked with indicates the minimum number of steps it would take to go from that point to the destination node, as shown in Fig. 2.
The red line indicates the shortest path, i.e., the list of nodes that has to be visited to reach the source node. The pseudo code of the Grassfire algorithm is presented in Algorithm 1, and a runnable sketch is given below.
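As a concrete illustration, the grassfire labeling can be written as a breadth-first search; this is a minimal sketch, assuming a 4-connected occupancy grid where grid[r][c] is True for free cells.

```python
# Grassfire labeling: breadth-first expansion of distance values from the
# goal, exactly the wave-front process described in the text.
from collections import deque

def grassfire(grid, goal):
    rows, cols = len(grid), len(grid[0])
    dist = {goal: 0}                  # the destination is marked with 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if (0 <= nr < rows and 0 <= nc < cols
                    and grid[nr][nc] and (nr, nc) not in dist):
                dist[(nr, nc)] = dist[(r, c)] + 1   # one step farther away
                queue.append((nr, nc))
    return dist   # dist[start], if present, is the fewest-steps distance
```

The shortest path itself is recovered by walking from the start to any neighbor whose distance value is one smaller, repeating until the goal is reached.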
Note that if two neighbors have the same distance value, one can be chosen arbitrarily; this happens when the shortest path to the goal is not unique. The grassfire algorithm has the following desirable properties. If a path exists between the start and the destination node, it will find one with the fewest number of edges. If no path exists, the algorithm will discover that fact and report it to the user. In this
sense, we say that the grassfire algorithm is complete. More formally, the amount of computational effort that needs to be expended to run the grassfire algorithm on a grid grows linearly with the number of nodes.
By using priority queues to store a sorted list of nodes, as in Dijkstra's algorithm, the complexity can be reduced as shown in Eq. (1),

O((|V| + |E|) log |V|) (1)

where |V| denotes the number of nodes in the graph and |E| denotes the number of edges.
1.3 A* Algorithm
Heuristic functions are used to map every node in the graph to a non-negative value. The criteria for the heuristic function are shown in Eq. (2):

H(goal) = 0, and H(X) ≤ H(Y) + d(X, Y) for any two adjacent nodes X and Y, (2)

where d(X, Y) is the weight/length of the edge from X to Y. For path planning on a grid, the most used heuristic functions (the Euclidean and Manhattan distances) are expressed in Eqs. (3) and (4):

H(x_n, y_n) = √((x_n − x_g)² + (y_n − y_g)²) (3)

H(x_n, y_n) = |x_n − x_g| + |y_n − y_g| (4)
Algorithm 3: A* Algorithm
Step 1: Start
Step 2: For each node n in the graph
2.1: n.f = infinity
2.2: n.g = infinity
Step 3: Create an empty list
Step 4: start.g = 0
4.1: start.f = H(start)
4.2: Add start to the list
Step 5: While the list is not empty
5.1: Let current = the node in the list with the smallest f value
5.2: Remove current from the list
Step 6: If (current == goal node), report success
Step 7: For each node n adjacent to current
7.1: If n.g > (current.g + cost of edge from n to current)
7.1.1: n.g = current.g + cost of edge from n to current
7.1.2: n.f = n.g + H(n)
7.1.3: n.parent = current
7.1.4: Add n to the list if it is not there already
Step 8: Stop
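The listing translates directly into code. The following is a minimal sketch, assuming a neighbors(n) callable that yields (neighbor, edge cost) pairs and an admissible heuristic H such as Eq. (3) or Eq. (4).

```python
# A* search over a weighted graph, following the steps of Algorithm 3.
import heapq

def a_star(start, goal, neighbors, H):
    g = {start: 0.0}                    # best known cost-to-come (Step 4)
    parent = {start: None}
    open_list = [(H(start), start)]     # f = g + H (Step 4.1)
    while open_list:                    # Step 5
        _, current = heapq.heappop(open_list)   # smallest f value (5.1)
        if current == goal:             # Step 6: success, rebuild the path
            path = []
            while current is not None:
                path.append(current)
                current = parent[current]
            return path[::-1]
        for n, cost in neighbors(current):       # Step 7
            tentative = g[current] + cost
            if tentative < g.get(n, float("inf")):    # Step 7.1
                g[n] = tentative                      # 7.1.1
                parent[n] = current                   # 7.1.3
                heapq.heappush(open_list, (tentative + H(n), n))  # 7.1.2/7.1.4
    return None                          # no path exists
```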
In the present research work, we have discussed planning paths for robots that move on a grid. For grid-based problems, the Breadth First Search or Grassfire
Algorithm searches for the shortest path by exploring outward from a start location. Finally, the A* Algorithm is a way to speed up the search for the shortest path when an informative heuristic is available to guide the search procedure. The objective of the present research work is to evaluate the performance of robot motion using gradient generalized artificial potential fields with obstacles, with the focus mainly on the algorithms used. The methodology followed in the present research work is the sequence of steps needed to understand the motion behavior of a robot.
2 Methodology
The graph-based algorithms are important, since they serve as a basis for a wide range of path planning procedures. The notion of configuration space allows us to think about the motion of the robot in terms of the motion of a point moving through the configuration space while avoiding the configuration space obstacles.
In the context of configuration space, we discussed collision-checking functions that can be used to decide whether or not a given configuration would collide with the workspace obstacles, thus providing an implicit description of the configuration space obstacles and the complementary free space. Paths through continuous configuration spaces are planned using methods like the visibility graph, the trapezoidal decomposition, and the grid-based approach. Each of these methods represents a different approach to capturing the structure of the continuous configuration space with a discrete graph, so that standard graph-based techniques like Dijkstra's algorithm can be applied. Another important class of methods for solving these path planning problems is based on the idea of random sampling, in which a graph is built from randomly chosen samples in the configuration space, connected by edges that represent collision-free trajectories. These are the Probabilistic Road Map algorithm, which seeks to construct a road map or skeleton of the free space, and the rapidly exploring random tree procedure, which grows ever-evolving trees to explore the free space and forge paths between the start and the goal state. Both algorithms have the pleasing property that they work quite well in practice, even on high-dimensional configuration spaces. The gradient of an artificial potential field helps to steer the robot through configuration space. A strength of these potential field methods is that they are relatively simple to implement, and they can often be driven directly by sensory input. The motion planning complexity increases with the degrees of freedom of the system.
The position of the robot can be traced with the help of a tuple of two numbers, (t_x, t_y), which denote the coordinates of a particular reference point on the robot (red) with respect to a fixed coordinate frame of reference, as shown in Fig. 3.
On the right-hand side of this figure, we plot the configuration space obstacle corresponding to the geometric obstacle shown on the left side of the figure. In this case, the configuration space obstacle is defined by the Minkowski sum of the obstacle and the robot shape [4]. The presence of multiple obstacles in space can be visualized as the union of all of the configuration space obstacles. All of the geometry of the robot and the obstacles is captured by the configuration space obstacles.
The configuration space of the robot can be represented with a tuple (t_x, t_y, θ), where t_x and t_y still denote the position of a reference point in the plane, and θ denotes the applied rotation angle in degrees. In this case, the configuration space has three dimensions, and the configuration space obstacles can be thought of as three-dimensional regions in this space. The vertical axis corresponds to the rotation θ, while the other two horizontal axes correspond to the translational parameters t_x and t_y. This is a conceptual framework for thinking about a wide range of motion planning problems framed on a continuous configuration space, which are then reformulated in terms of a graph using various approaches. This makes it possible to apply searching algorithms like Grassfire, Dijkstra, and A*. In the visibility graph, the configuration space obstacles are modeled as polygons [5], and a node is associated with every configuration space obstacle vertex. The resulting problem can be readily solved using the searching algorithms discussed in the introduction section.
Algorithm 4: PRM Algorithm
Step 1: Start
Step 2: For each node n in the graph
2.1: Generate a random point X in configuration space
Step 3: If (X is in free space)
3.1: Find the K closest points in the roadmap to X according to the Dist function
3.2: Connect the new random sample to each of the K neighbours using the LocalPlanner procedure
3.3: Each successful connection forms a new edge in the graph
Step 4: Stop
The PRM procedure relies upon a distance function, which is used to gauge the distance between two points in configuration space. This function takes as input the coordinates of the two points and returns a real number, as shown in Eq. (5):

Dist(x, y) ∈ ℝ. (5)
The common choices for distance functions are shown in Eq. (6):

Dist_1(x, y) = Σ_i |x_i − y_i|, Dist_2(x, y) = √(Σ_i (x_i − y_i)²). (6)
There are often cases where some of the coordinates of the configuration space correspond to angular rotations. In this situation, care must be taken to ensure that the Dist function correctly reflects distances in the presence of wraparound, as shown in Eq. (7):

Dist(θ_1, θ_2) = min(|θ_1 − θ_2|, 2π − |θ_1 − θ_2|). (7)
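In code, one sketch of such a wraparound-aware distance, assuming angles expressed in radians, is:

```python
# Angular distance with wraparound, per Eq. (7): two headings are never
# more than half a turn apart.
import math

def angular_dist(theta1, theta2):
    raw = abs(theta1 - theta2) % (2 * math.pi)
    return min(raw, 2 * math.pi - raw)
```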
In the probabilistic roadmap procedure, the basic idea is to construct a roadmap of the free space consisting of random samples and edges between them. Once that has been constructed, the desired start and end points are connected to this graph, and a path is planned from one end to the other. In the first phase, a generic roadmap of the entire free space is constructed without considering any particular pair of start and end points. The advantage of this approach is that the roadmap can be reused over and over again to answer multiple planning problems [7].
Here, the red node depicts the new random configuration that the system generates, while Y depicts the closest existing node in the tree. The same LocalPlanner procedure is used to decide whether two points in configuration space can be linked by a collision-free trajectory. It turns out that this procedure for generating random samples is very effective at growing trees that explore and span the free space.
Algorithm 5: RRT Algorithm
Step 1: Start
Step 2: Add the start node to the tree
Step 3: Repeat n times
3.1: Generate a random configuration X and check that X is in free space using the CollisionCheck function
3.2: Find Y, the closest node in the tree to the random configuration X
3.3: If (Dist(X, Y) > δ)
3.3.1: X is too far from Y
3.3.2: Find a configuration Z along the path from X to Y such that Dist(Z, Y) ≤ δ
3.3.3: X = Z
3.4: If (LocalPlanner(X, Y))
3.4.1: A collision-free local path exists from X to Y
3.4.2: Add X to the tree with Y as its parent
Step 4: Stop
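A compact sketch of this procedure follows; the step-size parameter delta, the 2D point configurations, and the sampler, collision_check, and local_planner callables are assumptions for illustration.

```python
# Rapidly exploring random tree (RRT) sketch following Algorithm 5.
import math
import random

def rrt(start, sampler, collision_check, local_planner, n=1000, delta=0.5):
    tree = {start: None}                      # node -> parent
    for _ in range(n):
        x = sampler()                         # random configuration (3.1)
        if not collision_check(x):
            continue
        y = min(tree, key=lambda p: math.dist(p, x))   # closest node (3.2)
        if math.dist(x, y) > delta:           # too far: step toward y (3.3)
            t = delta / math.dist(x, y)
            x = (y[0] + t * (x[0] - y[0]), y[1] + t * (x[1] - y[1]))
        if local_planner(x, y):               # collision-free link (3.4)
            tree[x] = y                       # add x with parent y (3.4.2)
    return tree
```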
In this section, we discuss another approach for guiding robots through obstacle-filled environments, based on artificial potential fields [8].
An attractive potential function F_a(X) is constructed by considering the distance between the current position of the robot, X = (x_1, x_2)ᵀ, and the desired goal location, X_g = (x_1^g, x_2^g)ᵀ, as shown in Eq. (8):

F_a(X) = ξ‖X − X_g‖². (8)
Here, ξ is a constant scaling parameter. Note that the function value is zero at the goal and increases rapidly as the robot moves away from it.
A repulsive potential function in the plane, F_r(X), can be constructed based on a function ρ(X) that returns the distance to the closest obstacle from a given point X in configuration space, as shown in Eq. (9):

F_r(X) = η(1/ρ(X) − 1/d_0)² if ρ(X) ≤ d_0, and F_r(X) = 0 if ρ(X) > d_0. (9)
A really attractive feature of these artificial potential field based schemes is that they are relatively simple to implement. In fact, they can be incorporated into real-time control schemes running at tens of hertz using local sensor data. However, a downside of these methods is that it can be very difficult to ensure that they will always work. Ideally, the artificial potential function would have only a single global minimum, located at the desired configuration. In practice, there are situations where the attractive and
repulsive forces conspire to produce local minima at locations other than the desired location, and it turns out that these kinds of local minima are very hard to eliminate. One way to view these artificial potential field based schemes is as a useful heuristic. In many cases, they will successfully guide the robot to the desired configuration, but they can get stuck in dead ends, so it is often necessary to use a backtracking procedure to detect these situations and to switch to a different planning strategy such as Generalized Potential Fields (GPF) [9], in which a control law moves the robot by considering the gradient of the potential field with respect to the configuration space parameters, as shown in Eq. (11):
v ∝ −∇f(X) = −(∂f(X)/∂x_1, …, ∂f(X)/∂x_n)ᵀ (11)
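Putting Eqs. (8), (9), and (11) together, gradient descent on the combined potential can be sketched as follows; the scaling parameters and the numerical gradient are illustrative assumptions, and rho is any callable returning the distance to the closest obstacle.

```python
# Steering by descending the combined potential F(X) = F_a(X) + F_r(X).
import numpy as np

XI, ETA, D0, STEP = 1.0, 100.0, 2.0, 0.05   # assumed scaling parameters

def potential(X, X_goal, rho):
    attract = XI * np.sum((X - X_goal) ** 2)               # Eq. (8)
    d = rho(X)                                             # obstacle distance
    repulse = ETA * (1/d - 1/D0) ** 2 if d <= D0 else 0.0  # Eq. (9)
    return attract + repulse

def steer(X, X_goal, rho, iters=1000, eps=1e-5):
    F = lambda P: potential(P, X_goal, rho)
    for _ in range(iters):
        # Numerical gradient: one partial per parameter, as in Eq. (11).
        grad = np.zeros_like(X)
        for i in range(len(X)):
            dX = np.zeros_like(X)
            dX[i] = eps
            grad[i] = (F(X + dX) - F(X - dX)) / (2 * eps)
        X = X - STEP * grad            # move along -grad F
        if np.linalg.norm(X - X_goal) < 1e-2:
            break                      # goal reached (or a local minimum)
    return X
```

As noted above, such a controller can stall in local minima, which is where the backtracking or GPF strategies come in.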
4 Conclusion
The graph-based algorithms are important, since they serve as a basis for a wide range of path planning procedures. The notion of configuration space allows us to think about the motion of the robot in terms of the motion of a point moving through the configuration space while avoiding the configuration space obstacles. In the context of configuration space, we discussed collision-checking functions that can be used to decide whether or not a given configuration would collide with the workspace obstacles, thus providing an implicit description of the configuration space obstacles and the complementary free space. Paths are planned through continuous configuration spaces using methods like the visibility graph, the trapezoidal decomposition, and the grid-based approach. Each of these methods represents a different approach to capturing the structure of the continuous configuration space with a discrete graph, so that standard graph-based techniques like Dijkstra's algorithm can be applied to solve these path planning problems. Another important class of methods is based on the idea of random sampling, in which a graph is built from randomly chosen samples in the configuration space, connected by edges that represent collision-free trajectories. The
two approaches for utilizing random samples are the Probabilistic Road Map algorithm, which seeks to construct a road map or skeleton of the free space, and the rapidly exploring random tree procedure, which grows an ever-evolving tree to explore the free space and forge paths between the start and the goal. Both algorithms have the pleasing property that they work quite well in practice, even on high-dimensional configuration spaces. The gradient of an artificial potential field helps to steer the robot through configuration space. A strength of these potential field methods is that they are relatively simple to implement, and they can often be driven directly by sensory input.
References
1. D. Sutherland, J.J. Sharples, K.A. Moinuddin, The effect of ignition protocol on grassfire
development. Int. J. Wildland Fire 29(1), 70–80 (2020)
2. A. Bozyiğit, G. Alankuş, E. Nasiboğlu, Public transport route planning: modified Dijkstra's
algorithm, in 2017 International Conference on Computer Science and Engineering (UBMK)
(IEEE, 2017), pp. 502–505, 5 Oct 2017
3. H. Wang, Computer and cyber security: Principles, algorithm, applications, and perspectives.
https://doi.org/10.1201/9780429424878
4. N. Eckenstein, M. Yim, Modular robot connector area of acceptance from configuration space
obstacles, in 2017 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)
(IEEE, 2017), pp. 3550–3555, 24 Sep 2017
5. E. Taheri, M.H. Ferdowsi, M. Danesh, Fuzzy greedy RRT path planning algorithm in a complex
configuration space. Int. J. Control Autom. Syst. 16(6), 3026–3035 (2018)
6. G. Bitar, A.B. Martinsen, A.M. Lekkas, M. Breivik, Two-stage optimized trajectory planning
for ASVs under polygonal obstacle constraints: Theory and experiments. IEEE Access 2(8),
199953–199969 (2020)
7. J. Denny, R. Sandström, A. Bregger, N.M. Amato, Dynamic region-biased rapidly-exploring
random trees, in Algorithmic Foundations of Robotics XII 2020 (Springer, Cham, 2020), pp. 640–
655
8. F. Bounini, D. Gingras, H. Pollart, D. Gruyer, Modified artificial potential field method for
online path planning applications, in 2017 IEEE Intelligent Vehicles Symposium (IV) (IEEE,
2017), pp. 180–185, 11 Jun 2017
9. N.K. Al-Shammari, T.H. Syed, M.B. Syed, An edge–IoT framework and prototype based on
blockchain for smart healthcare applications. Eng. Technol. Appl. Sci. Res. 11(4), 7326–7331
(2021)
Exploratory Analysis of Kidney Disease
Data Set—A Comparative Study
Keywords Data mining · Chronic kidney disease · Decision tree · Random forest
1 Introduction
Nowadays, the number of kidney disease patients in India is increasing, and the functioning of the kidney and the causes associated with its failure are observed on a large scale in the nearby area. The main interest is to identify the major parameters that can cause chronic pain to patients. Therefore, developing a model to identify the disease can help people take precautions by recognizing symptoms and overcome the chronic disease at an earlier stage.
Various researchers have worked on this problem from different perspectives. Some of the reviews are as follows. Ahmed et al. [1] discovered that chronic kidney disease (CKD) is generally found more in South Asian and black-skinned people as compared to the general population. They observed that this is due to diabetes in South Asia carrying maximum risk. Apart from that, other issues, viz. blood pressure, heart problems, and family members suffering from the same disease, are more frequently observed in the age group of 60 and above. Data mining can be defined as the process of extracting previously unidentified, compelling, and actionable information from big data and then using the information so derived for vital tasks in industry and strategic judgment [2].
A. Muley (B)
School of Mathematical Sciences, Swami Ramanand Teerth Marathwada University, Nanded,
M.S. 431606, India
S. Joshi
Nutan Maharashtra Institute of Engineering and Technology, Talegaon Dabhade, M.S. 410507b,
India
Priyadharshini et al. [3] used K-nearest neighbor (KNN) and logistic regression models to diagnose CKD. The aims of their study were to identify and estimate the missing values and qualities in the data set. They employed six AI algorithms, viz. logistic regression, random forest, support vector machine, KNN, naive Bayes classifier, and neural networks, for the proposed models. Selvarathi et al. [4] aimed to detect and diagnose CKDs, mainly including kidney stones, cystic kidney disease, and suspected renal carcinoma. Histogram of oriented gradients features and the KNN algorithm were used to identify the chronic kidney diseases, and a multi-layered convolutional neural network (CNN) architecture was applied for kidney disease classification. Further, a batch prediction approach was tested for CKD forecasting. The studies [5–11] used an open-source data set which contained numerous health-related characteristics associated with fitness and results of the analytical investigation. They analyzed it with the help of a prediction model for creatinine and the risk of CKD. Further, various machine learning algorithms were applied to evaluate the achieved model accuracy.
Rezapour [12] applied supervised data mining algorithms for the classification of variables, helping to identify the influential factors. The decision tree algorithm was the main focus and was applied to the input data. The study results reveal that the risk of stroke in patients whose vascular access surgery was performed by catheterization before fistula was 84.21%. Padmanaban and Partiban [13] aimed at the detection of patients through classification algorithms; naive Bayes and decision tree methods were applied for getting accurate results and gave better performance in measuring parameters and sensitivity. Chetty et al. [14] proposed classification models for the prediction and categorization of CKD and non-CKD cases with high accuracy, and comparative analysis was performed among the selected classification models. Sinha and Sinha [15] performed a comparative study of CKD cases and classified them with support vector machine (SVM) and KNN techniques to compare accuracy, precision, and execution time on the selected data set.
Chorasia et al. [5], Pasadana et al. [8], and Senan et al. [10] forecasted kidney disease issues by performing preprocessing operations on variables and handling missing values. Comparative analysis was performed among the selected models, and the predictive analytics models were assessed on the basis of forecasting accuracy [16]. Gharbdousti et al. [17] applied several machine learning-based classification algorithms on 400 observations and 24 attributes. The data set was preprocessed: missing values were filled with the mean for numerical features and the mode for categorical features, and the data set was then normalized to a unit scale. Algorithms, viz. decision tree, linear regression, SVM, naive Bayes, and neural network, were compared on the basis of correlation matrix features. Tazin et al. [18] predicted the presence of kidney-related issues with machine learning algorithms on a data set collected from the UCI repository. These algorithms were analyzed on the basis of their model precision measures and receiver operating characteristic curves with the Weka data mining tool [19].
The major objectives of our study are: (1) to identify the best classification technique for the CKD data set between decision tree and random forest, and (2) to identify the factors most responsible for causing a person to suffer from CKD.
2 Methodology
In this study, secondary data associated with CKD were taken from the Kaggle website [20–22]. The parameters used are gathered in Table 1, and the respective features are explored in Table 2. Further, data mining techniques and a neural network are implemented on the data set. Microsoft Excel (2016) and the free and open-source R software (3.2.2) with the rattle package are used to analyze the data.
Here, missing values are replaced by the arithmetic mean or mode of the respective attributes. For further analysis, we have considered the mean value for missing observations, and only numerical attributes have been used. Classification is an important issue in the decision-making process, and appropriate classification gives accuracy to multi-stage decision-making processes. Here, decision tree and random forest methods were used for the classification of the data. In a decision tree, the main part is the root node, which divides the data into two or more homogeneous parts and splits the data into sub-nodes. A decision node splits into subsequent nodes until the splitting stops at a leaf node.
Random forest is a well-known ensemble method used to develop predictive models for classification as well as regression problems. It minimizes correlation issues by selecting subsamples of the features at every division, which enhances the classification prediction results. Further, the decision tree and random forest results were compared using their area under the curve (AUC) for the training set, test set, validation set, and whole data set. The performance of a model depends on a consistent AUC, which gives a more precise classification. A sketch of this comparison is given below.
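The paper carries out this comparison in R with the rattle package; as an equivalent illustration only, the same experiment can be sketched with scikit-learn in Python, where the file name, column names, and split ratio are assumptions based on the Kaggle CKD data set described above.

```python
# Decision tree vs. random forest on the CKD data set, compared by AUC.
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

df = pd.read_csv("kidney_disease.csv")           # assumed file name
X = df.select_dtypes("number")
X = X.fillna(X.mean())                           # mean-impute numeric values
y = (df["classification"] == "ckd").astype(int)  # assumed label column

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

for model in (DecisionTreeClassifier(random_state=0),
              RandomForestClassifier(n_estimators=500,  # 500 trees, as in text
                                     oob_score=True, random_state=0)):
    model.fit(X_tr, y_tr)
    auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
    print(type(model).__name__, "test AUC:", round(auc, 3))
```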
Here, we dealt with the kidney disease data using decision tree and random forest exploratory models in data mining. Initially, we used five ways to impute the missing observations in the collected data; these are compared with one another in Tables 3 and 4. The results are as follows.
Table 3 shows that, for the decision tree, mean imputation gives the minimum error for numerical attributes, so we choose it as the best option. The results obtained from the decision tree with mean imputation are as follows:
Table 3 Missing observation errors obtained through different ways for decision tree
Test Mean Mode Nominal R nominal R numerical
Validation 0 1.6 5 6.6 1.6
Training 3.6 2.9 2.5 3.6 2.1
Testing 8.3 8.4 6.7 6.6 8.3
Full 3.7 3.5 3.6 4.5 3
Table 4 Missing observation errors obtained through different ways for random forest
Test Mean Mode Nominal R nominal R numerical
Validation 0 0 1.6 1.6 1.6
Training 0 0 0 0.7 0
Testing 5 3.4 5 5 6.6
Full 0.7 0.5 1 1.5 1.3
Figure 1 represents the decision tree and illustrates that hemoglobin is the factor most responsible for causing CKD. Based on the observations, if the hemoglobin level in the blood is less than 13, there is a 56% chance of CKD. Of the remaining 44% of the patients, the 3% having blood glucose random (bgr) content ≥157 may suffer from CKD; among the rest (41% of the samples), those with serum creatinine (sc) level ≥1.3 may suffer from CKD, and 37% do not have the CKD problem.
Table 4 explores the results obtained through the various comparisons to determine the most suitable way to get more accurate results.
Figure 2 represents the features of the variables and their importance in causing CKD. The important variables that play a key role in the classification of patients, i.e., the relative importance of each variable, can be identified: hemoglobin, bgr, sc, pcv, sod, and rc are the most important factors causing CKD.
Figure 3 explores the error rates obtained through the random forest method for the CKD, non-CKD, and out-of-bag (OOB) bootstrap samples. The CKD sample gives the least error compared to the others, and the curves progressively help to determine the optimal number of trees; the plot clearly shows that 500 trees were generated.
Fig. 2 Relative variable importance (MeanDecreaseAccuracy and MeanDecreaseGini) for the attributes hemo, rc, pcv, sod, bgr, sc, bu, age, pot, bp, and wc
Fig. 3 Random forest error rates for the OOB, CKD, and notckd samples versus the number of trees
Figure 4 represents the OOB ROC curve for the random forest. From this curve, we get the AUC value for the random forest, which is 0.992; hence, the accuracy of the random forest is 99.2%. This ROC plot is based on OOB predictions for each observation in the training data.
Table 5 presents the confusion matrix, the overall error during evaluation, and the values obtained through receiver operating characteristic (ROC) curves for the decision tree and random forest methodologies, giving the corresponding area under the curve (AUC) values for validation, training, and testing as well as the full data. The results reveal that the random forest gives the better result, with an accuracy value of 100% for almost all data splits as well as the overall data, and its overall error is less than that of the decision tree model. Hence, the random forest gives the most precise result, with an overall accuracy of 1, i.e., 100%.
4 Conclusions
Our study observed that hemoglobin, blood glucose random, serum creatinine, packed cell volume, sodium, and red blood cell count are the attributes most responsible for causing CKD. This can be useful in predicting the presence of CKD for a patient and will be helpful in reducing the number of other medical tests of blood content. For the full data, the decision tree gives 97% accuracy and the random forest gives 100% accuracy; on OOB predictions, the random forest gives 99.2% accuracy, which is more accurate than the decision tree. The results were compared with [5, 7–10, 16], and the most suitable method is observed to be the random forest. Most researchers used Weka as the analytical tool for their studies, but we have preferred R software, which gives more accurate classification of our results. The exploration of the results through visualization in R is a special feature that helps in understanding the analysis of the data. In a nutshell, the random forest gives more accurate results; hence, in the future, this model can be used to classify data sets of chronic and non-chronic disease patients.
References
Abstract The Covid-19 pandemic led to remote working, resulting in more video conferences across all sectors. Even important international conferences between nations are being conducted on online video conferencing platforms. Hence, a methodology capable of performing real-time end-to-end speech translation has become a necessity. In this paper, we propose a complete pipeline methodology wherein real-time video conferencing becomes interactive; it can also be used in the educational sector for generating videos of instructors from just their images and textual notes. We use automatic voice translation (AVT), text-to-stream machine translation (MT), and a text-to-voice generator for voice cloning and translation in real time. For video generation, we use generative adversarial networks (GANs), encoder-decoders, and various other previously implemented generative models. The proposed methodology has been implemented and tested with some raw data and is quite effective for the specified application.
1 Introduction
Eventually, Covid-19 shifted us toward an online (remote) working mode, with a drastic rise in remote meetings and educational events. There may be many language barriers in these processes, so removing these barriers is also a key task. For example, consider a remote interaction between two employees from different countries: they need translators to understand each other's language, which is a time-consuming process. A crucial aspect of removing these translators is translating the speaker's sentences into the listener's language and then correcting the lipsync in the translated video.
The first attempts at creating a talking face with desired audio relied heavily on lip landmarks for the speech representation of a single speaker, which took hours to
train. For our problem statement, these works are ideal for a single speaker, but we need a more generic model to produce real-time lipsynced videos.
By exploring such speaker-independent approaches, we found the SyncNet discriminator network, which performs quite well at syncing lips with audio [1]. This lipsync expert was previously well trained on various videos of news anchors, which is useful for error correction while training our deepfake generator network [1].
Generally, a sequential model is used for speech-to-text conversion, but since such models are computationally intensive, we attempt to use deep convolutional neural networks based on a spectrogram of the audio data provided. Recurrent neural networks (RNNs) take a long time to train, whereas DCNNs are usually very fast and require less computation power and time [2]. An alternative neural text-to-speech (TTS) system based only on convolutional neural networks (CNNs) alleviates these economic costs of training. Due to advanced features like high parallelizability, the performance of the CNN-based TTS model is much better than that of RNN or gated-unit models [2]. We also concentrate on the issue of recovering the original signal over the noise generated by the system in the subsequent downsampling and upsampling processes; the recovery is from 15% downsampled audio to 50% of the original audio.
Initially, for generating spectrograms, the main requirements are the available data and related datasets, so the first step is preprocessing. Generating natural speech from text (TTS) is a computationally heavy task, even with powerful machines available to process the incoming data, because of the complex time-series nature of the data [3]. Generating the spectrogram network is difficult because none of the algorithms guarantees a globally optimal solution at low computational complexity. By using various models, we obtain better audio results.
Keeping in mind the future scope of this idea, these limitations can be eliminated by working on a model of a multi-speaker acoustic space [4]. This enables the generation of speakers' voices that have never been heard during the training process. Also, a reduction in the training data sample can significantly save computing resources that can be used for other purposes [3]. That is why training the network with few samples can be helpful, but it is a tradeoff that comes at the cost of the accuracy of the output generated by the model.
2 Literature Survey
A lot of work has been done recently in this area after the introduction of generative adversarial networks. Before that, some successful experiments were also done using encoder-decoder networks.
One of the most interesting approaches to this problem is the Speech2Vid model [5], which successfully generates a talking face from just an audio input and an image. It consists of two encoders, namely an audio encoder and a face encoder, with an image decoder and a deblurring module that refines the generated output [5]. The deblurring CNN module works similarly to those used for medical images, as suggested in [6, 7]. The model uses skip connections in the network to preserve identity. Since there is no discriminator network to aid in improving the encoders, the learning of the model is not robust.
After the Speech2Vid model [5], many other approaches were made to accurately lipsync videos, but many of them are trained on a limited set of vocabulary and identities. This restricts their use in real-time applications such as the live video conferences we are discussing.
The introduction of generative adversarial networks (GANs) gave a new direction to this research area, leading to one of the most effective models, called Wav2Lip [8]. The advantage of having a SyncNet discriminator helps the model in the learning process, which is a big difference from the previous Speech2Vid model [5]. Even when compared with the most effective models, Wav2Lip gave far better results on standard datasets used for performance measures, such as LRW, LRS2, and LRS3.
The discriminator works as a classifier in the GAN setup; it is used to distinguish real data from the data created by the generator.
Supasorn et al. [9] focused on the different steps of learning to create a deepfake of Barack Obama: audio of Obama is passed to the model, and the model creates a talking video of Barack Obama. They trained their model for hours on videos of Barack Obama's weekly address footage. However, when it comes to making a deepfake of another person, this approach falls short because the model needs to be trained again on the new person's videos and images for hours. Recently [10], Prajwal K R and Rudrabha Mukhopadhyay used a discriminator in a GAN setup, resulting in 56% accuracy on the LRS dataset; they used a single frame for checking the lipsync with the respective audio given as input.
Joon Son Chung and Andrew Zisserman [5] created their own dataset for checking lip synchronization; it consists of several hundred hours of speech from the BBC news channel. The SyncNet architecture takes 0.2-s clips of both the audio and video and divides them into two streams, i.e., a video stream and an audio stream. While training the model, both streams were trained simultaneously, which gave 92% accuracy in detecting lipsync errors.
The most recent work on generating deepfakes used SyncNet for training the generator network. This approach generates impressive deepfakes because the previously well-trained SyncNet network helps the generator to detect lipsync errors more efficiently [8].
2.3 Translation
Following our analysis, we discovered that we were able to achieve good results using an end-to-end text-to-speech method called Tacotron [12], which directly estimates a spectrogram from an input text. In a text-to-speech system, mel spectrograms need to be converted into exact audible representations. Following [13], we combine the spectrograms using two kinds of networks: Text2Mel, which creates a mel spectrogram from text input, and the spectrogram super-resolution network (SSRN), which creates a whole short-term Fourier transform spectrogram from the coarse mel data.
Sharma et al. [14] proposed a fast Griffin-Lim algorithm (FGLA) approach, which uses a vocoder in the speech synthesis phase. The FGLA approach was tested on the LJSpeech, Blizzard, and Tatoeba datasets, and its performance was evaluated on synthesis delay and speech quality.
3 Proposed System
3.1 Generator
The generator is used to generate deepfakes of the user and has mainly two components: an audio encoder and an identity encoder. A deblurring module can also be used to clean the generated video frames.
Fig. 1 Our approach performs real-time deepfake generation with accurate lipsync and multilingual voice cloning. The proposed system takes Person P1's image and a sentence in language La. It extracts text from the given audio, followed by a language translator that translates the text from language La into language Lb. The translated sentence is then cloned into Person P1's voice using super-resolution spectrograms. The speech-to-speech output is given to the deepfake generation stage along with Person P1's image/video, and a realistic real-time talking video of Person P1 is created using a generator trained with the SyncNet model [1], a model well trained for several hours on recorded videos from the BBC news channel; the same applies in reverse for Person P2
A VGG-M network with 112 × 112 × 3 input is used to extract features from images; the encoded features can also be used for dimensionality reduction. As specified in [5], an image deblurring module is integrated with the pipeline; it is a CNN inspired by those used for medical image deblurring [6, 7].
The face encoder input consists of pose information together with the ground truth [8], concatenated channel-wise. The final output of the face encoder is passed to the discriminator network, which sends back the reconstruction loss as feedback to the generator network, which then constructs the new facial frames of dimensions H × H × 6.
3.2 Discriminator
E = (1/2N) Σ_{n=1}^{N} [y_n d_n² + (1 − y_n) max(margin − d_n, 0)²] (1)
In Fig. 2, based on the contrastive loss between the video and audio streams, the model decides whether a pair is valid or not. If the distance between the streams is small, the loss is calculated and helps the model learn better; if the pair of streams are far from each other, the pair is simply treated as false [1].
The training time for the lipsync discriminator is 29 h (optimizer = Adam, batch size = 64, Tv = 5, initial learning rate = 1e-3) [1].
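For illustration, Eq. (1) can be sketched directly in code; this minimal version assumes the audio and video embeddings a and v are arrays of the same shape and that y is 1 for a genuine (in-sync) pair and 0 for a false pair.

```python
# Contrastive loss of Eq. (1) over a batch of audio/video embedding pairs.
import numpy as np

def contrastive_loss(a, v, y, margin=1.0):
    d = np.linalg.norm(a - v, axis=1)                   # d_n per pair
    genuine = y * d**2                                  # pull matches together
    impostor = (1 - y) * np.maximum(margin - d, 0)**2   # push mismatches apart
    return np.sum(genuine + impostor) / (2 * len(d))    # 1/(2N) * sum
```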
Text2Mel is one of the best techniques to generate spectrograms from text input, which can in turn be converted into audio.
3.3.1 Text2Mel
This network has four main core components: a text (input) encoder, an audio (waveform) encoder, an attention mechanism for audio tuning, and an audio (waveform) decoder. It synthesizes coarse spectrograms from the input provided by the user.
• TextEnc: takes text as input, I = (i_1, …, i_N) ∈ Char^N, where N is the number of characters, and produces two matrices, K (key) and V (value) ∈ R^(d×N). Therefore, (K, V) = TextEnc(I).
• AudioEnc: encodes the coarse mel spectrogram. The result is Q = AudioEnc(S_{1:F,1:T}), where T is the length of the speech.
• Attention: the attention matrix A = softmax_{n-axis}(KᵀQ/√d) evaluates how strongly the n-th character i_n and the t-th mel spectrum frame are related.
• AudioDec: estimates the mel spectrogram from the seed matrix R′ = [R, Q]. The result is Y_{1:F,2:T+1} = AudioDec(R′), and then D_bin can be calculated as shown in Eq. (3).
• The final result is the binary divergence,

D_bin(Y|S) := E_ft[−S_ft log(Y_ft/S_ft) − (1 − S_ft) log((1 − Y_ft)/(1 − S_ft))] (3)
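As a small illustration of the attention step above, the following sketch computes A = softmax over the character axis of (KᵀQ/√d), with K and Q assumed to be d × N and d × T arrays.

```python
# Text2Mel-style scaled dot-product attention over the character axis.
import numpy as np

def attention(K, Q):
    d = K.shape[0]                               # K: d x N, Q: d x T
    scores = K.T @ Q / np.sqrt(d)                # N x T compatibility scores
    scores -= scores.max(axis=0, keepdims=True)  # numerical stability
    A = np.exp(scores)
    return A / A.sum(axis=0, keepdims=True)      # softmax along the n-axis

# The attended text representation R = V @ A is concatenated with Q to form
# the seed matrix R' = [R, Q] consumed by AudioDec.
```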
The overall purpose of generating highly accurate spectrograms with the SSRN network is to pass the SSRN data through a batch sampler of downsampling and upsampling, which affects the quality of the training data. Although the idea of backpropagation-based refinement has been circulating recently, our model is feedforward. In the second step, the SSRN model is trained on pairs of low-quality and high-quality mel-coarse spectrograms.
In the audio super-resolution work [13], by generating the mel spectrogram using the spectrogram super-resolution network (SSRN), we can create coarse spectrograms that are sharper than the original waveform. Upsampling in frequency is achieved by quadrupling the length of the sequence from T to 4T = T′, by twice applying "deconvolution" layers of stride two.
Voice cloning involves three components: encoder, vocoder, and synthesizer. Generally, the audio waveform is represented by the function s(t): [0, T] → R, where T is the time period of the signal (in seconds) and s(t) is the amplitude at t [2]. Sampling must be done in a discretized space, so any continuous waveform is first discretized at points 1, 2, …, RT, where R is the sampling rate (in Hz), ranging from 4 to 44 kHz. To represent the pitch and modulation of one wave relative to others, short-term Fourier transforms are used along with the inverse and baseline long-term Fourier transforms, respectively, which can further be used as distinction metrics for the sample input training [2]. By using deep convolutional neural networks (DCNNs), we can eliminate the disadvantage of RNNs in the immediate synthesis of high-resolution data. This is achieved by staged synthesis of the data, as compared to sequential synthesis in the case of RNNs.
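As a sketch of the discretization and spectrogram steps above, the snippet below loads a waveform at a chosen sampling rate R and computes a log-mel spectrogram with librosa; the file name and all parameter values are illustrative assumptions.

import librosa
import numpy as np

# s(t) is discretized at R = 22,050 Hz (within the 4-44 kHz range noted above).
y, sr = librosa.load("speaker_sample.wav", sr=22050)

# Short-term Fourier transform followed by a mel filter bank.
mel = librosa.feature.melspectrogram(y=y, sr=sr, n_fft=1024,
                                     hop_length=256, n_mels=80)
log_mel = np.log(np.maximum(mel, 1e-5))  # log compression for training
print(log_mel.shape)                     # (80, T)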
Each convolution, batch normalization, and ReLU non-linearity block in our model performs a dilated convolution, dense backpropagation, and ReLU activation at the end layer. Due to frequent downsampling and dimensionality reduction of the original data, the fixed-dimensional spectrograms are trained intensively on different parts of the attention module, which further helps in quickly training the model over a small dataset by separating the generated noise and using it as part of the feature set.
The diameters and diagonals are halved, and the beta mask from the previous spectrogram attention size is multiplied in during a downsampling operation; this is reversed during an upsampling step [13]. Auto-encoders inspired this bottleneck architecture, which is considered to allow the model to learn a hierarchy of features.
Input text is taken and converted into a mel-coarse spectrogram and fed to the deep convolutional text-to-speech (DCTTS) network. The model then returns the output waveform, which is time series data, and this gets further encoded as the audio of the targeted user.
4 Results
These are some of the experimental results after a rough implementation of our idea. The system takes an image of a speaker and then generates video frames according to the audio that is passed as input, as shown in Figs. 3, 4 and 5; the results are presented in Tables 1 and 2.
Fig. 3 Generating video frames from a single image and audio spectrograms from language La
Fig. 4 Generating video frames from a single image and audio spectrograms from language La
Fig. 5 Generating deepfake video with accurate lipsync on language Lb using input video and
audio of language La
Table 1 "Lip-Sync Error-Distance" (LSE-D, lower is better) and "Lip-Sync Error-Confidence" (LSE-C) are two new metrics suggested in [8], with which the lipsync accuracy in unconstrained videos can be accurately measured
LRS2 [16] LRS3 [17]
Approaches LSE-D LSE-C FID LSE-D LSE-C FID
SPEECH2VID [5] 14.23 1.587 12.32 13.97 1.681 11.91
LIPGAN [10] 10.33 3.199 4.861 10.65 3.193 4.732
WAV2LIP [8] 6.469 7.781 4.446 6.986 7.574 4.350
Realistic videos 6.736 7.838 – 6.956 7.592 –
Notice that we only train on the LRS2 train set [1], but we can generalize to any dataset without
difficulty
Table 2 Results over training time [2]: accuracy rises sharply up to a break-even point (somewhere after 40 h in this case), after which the attention module starts noising the whole sample and overall accuracy keeps decreasing; however, the variance of the sample also keeps decreasing, which suggests that the confidence validation of the speech-note prediction is right
Time for training (TFT)   Tacotron (RNN based) MOS (95% CI)   Wavenet (DCNN based) MOS base confidence   Variance (%)
12 days (~270+ h)         207                                 –                                          15
2 h                       –                                   174                                        92
7 h                       –                                   261                                        84
15 h                      –                                   271                                        37
40 h                      –                                   254                                        41
5 Conclusion
This work explored the use of deepfakes for applications such as video conferencing and corporate meetings. The techniques used previously have been studied, such as the Speech2Vid model with encoders and decoders, and then Wav2Lip with GANs, which gives better results. Also, deep convolutional neural networks (DCNNs) give better results for voice cloning than sequential models such as LSTMs and RNNs with GRUs, with the help of mel spectrograms [18]. We have created a pipeline from these existing effective techniques. We have also analyzed our results on real-time data samples, which show promising results. There is considerable scope to improve these techniques further and to reduce the required computational power, since the video conferencing application discussed requires real-time, fast results.
References
1. M. Baldonado, C.-C.K. Chang, L. Gravano, A. Paepcke, The Stanford digital library metadata architecture. Int. J. Digit. Libr. 1, 108–121 (1997)
2. H. Tachibana, K. Uenoyama, S. Aihara, Efficiently trainable text-to-speech system based on deep convolutional networks with guided attention, in IEEE ICASSP (2018). arXiv:1710.08969
3. S.O. Arik, J. Chen, K. Peng, W. Ping, Y. Zhou, Neural voice cloning with a few samples (2018). arXiv:1802.06006
4. G. Ruggiero, E. Zovato, L. Di Caro, V. Pollety, Voice cloning: a multi-speaker text-to-speech synthesis approach based on transfer learning, in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP). arXiv:2102.05630
5. J.S. Chung, A. Jamaludin, A. Zisserman, You said that? arXiv preprint (2017). arXiv:1705.02966
6. S. Shinde, U. Kulkarni, D. Mane, A. Sapkal, Deep learning-based medical image analysis using transfer learning, in Health Informatics: A Computational Perspective in Healthcare. Studies in Computational Intelligence, vol. 932, ed. by R. Patgiri, A. Biswas, P. Roy (Springer, Singapore)
7. S.S. Mane, S.V. Shinde, Different techniques for skin cancer detection using dermoscopy images. Int. J. Comput. Sci. Eng. (2017). ISSN 2394-5125
8. K.R. Prajwal, R. Mukhopadhyay, V.P. Namboodiri, C.V. Jawahar, A lip sync expert is all you need for speech to lip generation in the wild, in Proceedings of the 28th ACM International Conference on Multimedia (MM '20) (2020). arXiv:2008.10010
Abstract In the modern context, as data generation grows exponentially, finding meaningful patterns in large datasets is an urgent need. A 'Topic Evolution Model' can generate the evolutions related to a topic of user interest and assist in the exploration of patterns. In a generic setting, the proposed 'topic evolution model' assists researchers and domain experts in extracting relevant information on scientific field progress and technological innovations from large archives. The evolution patterns uncover the emerging, decaying/fading, peculiar, and long-lasting research topics and subtopics. The performance evaluation on coherence metrics asserts that the proposed model significantly reduces the domain expert's effort in topic analysis, as evolving patterns easily reveal the underlying statistical and machine learning details. The perplexity metric highlights the capability of the topic model toward the cognitive view of the user, i.e., the change of ideas and knowledge over a period of time, reducing the citation bias.
1 Introduction
In recent years, the increasing demand from domain experts and researchers for tools that can help them extract information about scientific discoveries and new innovations from a large corpus of documents has driven research on evolution patterns. Revealing these meaningful evolution patterns from corpora of documents, research papers, and scientific articles has many applications and helps synthesize datasets across domains such as scientific research, historical events, and works of literature [1]. While searching, navigating, and seeking specific information in a document corpus of research papers and articles, the ability to find and identify topics with their time
of emergence and see their evolution pattern over time could be of significant help
to the system user.
Example: Consider a scientific paper corpus and a researcher or domain expert who starts research in a specific area or field. They would want to quickly overview the areas, determine how the topics in the area/domain have evolved, and find important ideas and the papers that introduced them. After finding a specific concept in a paper, they want to learn whether there were previous papers or articles that discussed the same concept, or whether the topic is a new and emerging one.
An information search task is often induced by the changes happening in the pattern of the information objects. Therefore, the goal of a topic evolution model and evolution graph generation is to track the changes in search-related topics, either discovered or retrieved at different times, applying temporal similarity between topics to align the topics discovered in different time periods [2].
The generated science evolution patterns will help philosophers and historians of science test their theories against actual data patterns; researchers can also direct their work toward emerging scientific areas [3]. Policy makers will be able to support innovative ideas and obtain key indications for their decision-making processes. Understanding how different topics, ideas, and innovations in a scientific domain and its literature evolve, diversify, or integrate over a period of time is a very interesting and important problem that leads to the generation of interesting evolution patterns. Also, at present, there is an increasing demand from domain experts and researchers for tools that help them extract information about scientific discoveries and new innovations from large document corpora [2].
The evolution of scientific literature through the years has two views, i.e., the cognitive view and the social view on evolution patterns. The cognitive view refers to the change of ideas and shared knowledge, while the social view deals with authorship and social interactions. In existing research works, researchers adopted both the document content as a bag of words and the citations in it, i.e., the impact of one author's work on another, as discussed in the cognitive and social views. However, this philosophy has the limitation that the work of individuals who are not affiliated with large institutes gets ignored, creating a 'social' bias. To reduce this social bias, the proposed algorithm considers the cognitive view to model the interactions between ideas independently, rather than social interactions.
Topic evolution graphs generated through the proposed model efficiently track the evolution of scientific literature by identifying and analyzing evolution patterns, e.g., the emergence, persistence, and fading of research topics through the ages, or the split of one topic into several subtopics within a domain or across domains. The aim of generating these topic evolution graphs is to outline complex temporal changes by discovering each topic and finding similarities between different topics across time epochs.
An information search task is often induced by the changes happening in the pattern of the information objects. Therefore, the goal of a topic evolution model is to track the changes in search-related topics, either discovered or retrieved (i.e., extracted) at different times, applying temporal similarity to align the topics discovered in different, equally spaced time periods. The design challenges faced by this 'topic evolution pattern' generation process are non-trivial and need to be considered critically so that a generic model can be effectively designed. The following research questions (RQs) were conceptually designed to conduct the work:
RQ-I: How does the 'topic evolution model' steer the user's search on untraceable topics?
RQ-II: What are the inherent parameters of topic modeling that affect the generation of the topic evolution model?
The key contribution is a generic framework to assist domain experts' or researchers' information search and exploration over voluminous data archives. Further, we investigate the evaluation metrics and criteria to identify the evolving, fading, and long-lasting topics during different time periods.
To tackle these challenges and achieve the goals proposed for this research contribution, we considered several topic modeling methods: latent Dirichlet allocation (LDA), latent semantic analysis (LSA), probabilistic latent semantic analysis (PLSA), and the hierarchical Dirichlet process (HDP). All of these are powerful topic modeling tools with their own advantages and disadvantages. LDA is an unsupervised learning model for topic modeling. For our experiments, we chose LDA as the most reliable and suitable technique: one of its advantages is that we can control how many topics are generated, so we can scale the size and pattern complexity of the topic evolution system.
The document corpus of research articles is extracted from different sources, e.g., Google Scholar [4] and arXiv.org [5]. The relevant information, e.g., abstract and publication year, is also acquired. The topics generated by LDA are aligned over the different time zones using the Jaccard similarity measure, as it can be directly adapted to LDA for model generation, as sketched below.
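A minimal sketch with gensim follows, assuming the documents of each time period have already been tokenized; representing each topic by its top-weighted terms and aligning by Jaccard similarity matches the description above, while all parameter values are illustrative.

from gensim.corpora import Dictionary
from gensim.models import LdaModel

def topics_for_period(docs, num_topics=10, topn=10):
    # docs: list of tokenized documents for one time period.
    dictionary = Dictionary(docs)
    corpus = [dictionary.doc2bow(d) for d in docs]
    lda = LdaModel(corpus=corpus, id2word=dictionary,
                   num_topics=num_topics, passes=10, random_state=0)
    # Represent each topic by the set of its top-weighted terms.
    return [set(w for w, _ in lda.show_topic(k, topn=topn))
            for k in range(num_topics)]

def jaccard(a, b):
    return len(a & b) / len(a | b)

# Align topics of consecutive periods when similarity crosses a threshold.
def align(topics_t1, topics_t2, threshold=0.15):
    return [(i, j, jaccard(a, b))
            for i, a in enumerate(topics_t1)
            for j, b in enumerate(topics_t2)
            if jaccard(a, b) >= threshold]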
2 Related Works
Topic evolution has attracted fast-growing interest in the information retrieval community, with different evolution models of great efficiency. Existing research efforts have adopted both the document as a bag of terms (words) and author citations for the purpose of topic evolution.
In recent work, Andrei et al. [11] generated topics with a hierarchical Dirichlet process (HDP) model and used 'Bhattacharyya similarity,' representing gradual speciation and convergence similar to biological evolution, to identify topic alignments. Jo et al. [12] built a model on the premise that the words relevant to a topic are distributed over documents such that the distribution is correlated with the underlying document network, such as a citation network.
Tong et al. [13] used LDA topic modeling and the Jensen–Shannon divergence for text mining research, whose main tasks are effectively searching terms, managing text patterns, and exploring retrieved text data. They applied this to Wikipedia articles and users' tweets to identify the important topics, so that the system can be optimized for relevant information search. Similarly, Salatino et al. [14] developed the idea of using the 'Computer Science Ontology' to model research topics, introducing a new approach for early detection of research topics using the Rexplore system. They applied an advanced clique percolation method (ACPM) developed for analyzing and evaluating the dynamics or changes happening between existing topics.
Topic evolution is an interesting field for information search in the era of the data deluge, and topic evolution from a document corpus can optimize information search. He et al. [6] implemented a topic evolution model in scientific literature with the help of citations and explained how it helps the system on the basis of the citation network. They used a citation-aware approach to generate a new and unique inheritance topic model. Chaudhuri et al. [15] created a research paper and article recommendation system which, along with topic modeling, used hidden feature identification methods to further improve the search system. They extracted hidden features such as keyword diversity, sentence complexity, citation analysis, and scientific quality measurements, which helped to further improve the recommendation system.
Currently, no known systems exist with the capability to generate topic evolution patterns that may help researchers and domain experts in their work. The main objective of this paper is to demonstrate the system designed for the needs discussed. We assert that the proposed system will be a pivotal work enabling knowledge discovery using the evolving, long-lasting, and fading information patterns in topic evolution.
The model has two aims: first, to uncover the evolving, long-lasting, and fading topics of research, and second, to highlight the split of research topics into related subtopics. Eventually, this model will steer the domain expert's topic analysis without requiring them to delve into the underlying statistics and machine learning.
The designed model depends solely on the cognitive view of scientific literature; it is not affected by social bias (i.e., citations), ensuring that the evolution pattern depends only on the change of ideas and innovations over time. We have developed a generic topic evolution model for information search whose pivot evolution graphs elaborate on the particular topic of interest to the seeker.
The goal of the 'topic evolution model for information search' is to generate topic evolution graphs and to represent and filter them so that researchers and domain experts can use them as they like. The proposed generic framework enables the extraction of meaningful topic evolution patterns for periods of time and potential co-topics. Each topic is divided into long-lasting, peculiar, evolving, and fading terms on the basis of topic labeling. Figure 1 illustrates the workflow of the proposed 'topic evolution model': a document corpus as input, first split into several time periods, with evolution patterns as output. Before splitting, each document falling under a specific duration is preprocessed and cleaned of irregularities.
The elementary processing is achieved using LDA [8], with the objective of generating the topic set, which is subsequently aligned by Jaccard similarity [9] to produce the topic evolution graph Gλ for alignment threshold λ. This threshold λ reduces the global evolution graph to a graph containing only topic alignments above the threshold. Further, these graphs can be split into pivot topic evolution graphs on the basis of topics t_1, t_2, …, t_n. The proposed topic evolution model for information search consists of two major phases: topic extraction, and topic evolution and graph generation.
Topic extraction is a prerequisite for topic modeling; Fig. 2 outlines the inherent steps of the proposed topic extraction from the document corpus. The document corpus is acquired from Google Scholar and similar sources. Preprocessing is applied to de-noise the data. We consider a corpus C of time-stamped documents, a set of periods P, and a set of terms V (vocabulary or dictionary). We divide the whole document corpus between different time periods.
As the document corpus needs to be split into different document sets for different time periods, the documents are split on the basis of the frequency of documents published in a particular period, observing the distribution of documents per year using bar graphs. Once the distribution of documents is known, we can decide the split period such that the distribution of documents per period is sufficient and evenly distributed to train the model effectively and efficiently. As for selecting the period: a user who wants to see small evolution changes in topics will select a small split period of about 2–3 years, while a user who wants to explore larger evolution changes will select a larger split period of about 5–6 years, because the larger the period, the more documents related to a particular topic it contains, ensuring that more dominant changes and topics are extracted by the topic evolution model. A minimal splitting sketch follows.
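The sketch below assumes each document record carries a 'year' field, and that the start year and span have been chosen by the user after inspecting the per-year distribution; all names are our own.

from collections import defaultdict

def split_into_periods(documents, start_year=2002, span=6):
    # Groups documents into equally spaced periods, e.g., 2002-2007, 2008-2013.
    periods = defaultdict(list)
    for doc in documents:
        k = (doc["year"] - start_year) // span
        key = (start_year + k * span, start_year + (k + 1) * span - 1)
        periods[key].append(doc)
    return periods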
The different topics are extracted from the document corpus by the topic extraction method, LDA. The extracted topics are stored as weighted term vectors with their time periods. Each topic description and period tuple is a weighted vector, where each topic description consists of terms extracted from corpus C.
Let t.k_p ⊆ t.k and t.k_f ⊆ t.k be the subsets of past and future terms which appear, respectively, in the ancestor (parent) topics and in the descendant (child) topics of t.
Each document within a duration passes through the topic extraction method to identify potential topics, which are aligned into a single graph Gλ with threshold λ. The threshold λ strongly influences the evolution graph complexity: higher λ values generate linear graphs with isolated topics, while lower values result in complex graphs with interesting topics.
The number of topics generated is vital, as a large document corpus may yield too many topics, duplicate topics, or too few topics. The coherence score helps optimize the number of potential topics. Optimizing a topic model for perplexity may not generate human-interpretable topics, though perplexity served as the motivation for topic coherence. The topic coherence score indicates the semantic similarity between the highly weighted terms in a topic and the relative inter-topic distance. A coherence-based selection sketch is given below.
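A sketch of selecting the number of topics by coherence, reusing the gensim objects from the earlier sketch; the candidate range and the 'c_v' coherence measure are our own choices, not the paper's stated configuration.

from gensim.models import CoherenceModel, LdaModel

def best_num_topics(corpus, dictionary, docs, candidates=range(5, 30, 5)):
    scores = {}
    for k in candidates:
        lda = LdaModel(corpus=corpus, id2word=dictionary,
                       num_topics=k, passes=10, random_state=0)
        cm = CoherenceModel(model=lda, texts=docs,
                            dictionary=dictionary, coherence="c_v")
        scores[k] = cm.get_coherence()
    # Pick the candidate with the highest coherence score.
    return max(scores, key=scores.get), scores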
A coherence measure of 0.9 or 1 means that the words are identical or bigrams of one another. In Fig. 4, the coherence score estimates for potential topics on the published scientific documents (during 2008–2013) indicate two candidate values: one near 10 topics and one close to 25 topics. A higher peak value often leads to a set of duplicate topics; therefore, the optimal value of 10 is chosen for the experimentation.
Figure 5 outlines the topics coherent to a pivot topic, e.g., topic-0 is related to subtopics of 'Internet of Things,' and topics 1, 2, and 3 are coherent to 'Computer Architecture & Security,' 'Cognitive Radio Networks,' and 'Cloud and Mobile Computing,' respectively.
Figure 6 illustrates the topic visualization for the duration 2002–2007. Here, a bubble represents a pivot topic, and its size (%) emphasizes its importance. These topics are generated by LDA and visualized in axis and list views. In the axis view, the farther bubbles are from each other, the more different they are. In the list view, red and blue bars indicate the presence of the topics in the corpus: red bars estimate the coherent pivot topic frequency, whereas blue bars give the overall frequency of each term. The frequent terms are visualized if the user submits nothing.
The similarity matrix is a key criterion for capturing topic evolution and modeling the alignment for a pair of durations with a threshold value. The alignment threshold value is chosen by optimality conditions on sparse and dense patterns using the similarity matrix shown in Fig. 7, which shows the similarity values between topics so that users can decide the alignment threshold according to whether they want to generate dense, medium, or sparse patterns. Here, the alignment threshold (λ) for the model is chosen as 0.15. The more topics there are, the larger the matrix will be; Fig. 7 illustrates the similarity matrix of 2008–2013 and 2014–2019 with threshold value λ = 0.15. Thus, the threshold value must be chosen in an optimal way.
The topic evolution graph represents each topic as a node and each edge as an inter-topic similarity at or above the threshold, as shown in Fig. 8, with topics of years 2002–07, 2008–13, and 2014–19 in the first, second, and third rows, respectively. Here, each topic is colored to indicate its importance: peculiar terms are red, fading terms are blue, evolving terms are green, and long-lasting terms are pink. Each instance of the evolution graph reveals interesting patterns and attracts the user's search interest. A user query transforms the current graph, which evolves as unnecessary edges and nodes are discarded; a construction sketch is given below.
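A sketch of the graph construction with networkx, assuming 'alignments' is a dict mapping (topic_a, topic_b) node pairs to similarity values, as produced by the alignment step sketched earlier; the node naming scheme is our own.

import networkx as nx

def evolution_graph(alignments, threshold=0.15):
    # Nodes are (period, topic) identifiers; edges carry the similarity weight.
    G = nx.Graph()
    for (topic_a, topic_b), sim in alignments.items():
        G.add_node(topic_a)
        G.add_node(topic_b)
        if sim >= threshold:
            G.add_edge(topic_a, topic_b, weight=sim)
    return G

def pivot_subgraph(G, keyword):
    # Retain only nodes whose label mentions the query term (e.g., 'spectrum').
    keep = [n for n in G.nodes if keyword in str(n)]
    return G.subgraph(keep).copy()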
For a user query on 'spectrum,' the topic evolution graph is transformed as shown in Fig. 9. Here, all nodes containing 'spectrum' and the relevant edges satisfying the threshold are retained, to further assist the user in exploring the emerging, decaying/fading, peculiar, and long-lasting subtopic terms from the graphs.
Fig. 8 Instance of generated ‘Topic Evolution Graph’ {years: 2002–07, 2008–13, 2014–19}
Topic-0 has terms like 'spectrum,' 'cognit,' 'radio,' 'future,' and 'technolog.' The designed model is capable of placing unseen data into the appropriate topic class.
Fig. 10 Perplexity and Coherence score (for each duration 1995–2001, 2001–07, etc.)
4.5 Analysis
The topic generation cohesively affects the overall system performance: a small number of topics may be generated in polynomial time, while a large number of topics makes the problem NP-hard. LDA for topic extraction has O(mnt + t³) time complexity [17] and requires O(mn + mt + nt) space, where m denotes the number of tokens (samples) to be processed, n is the number of topics (features) to be extracted, and t = min(m, n). For y periods, the time complexity is O(y(mnt + t³)). Dividing the terms into four subcategories takes O(4n²) = O(n²) time, so the overall time complexity for this step over the total period is O(y·n²). After topic modeling, the topics must be aligned per period; each alignment takes n² time, making the overall time complexity for the total period O(y·n²), which is used for graph generation. Therefore, the overall time complexity of the work is O(y(mnt + t³)) for LDA topic modeling and O(y·n²) for evolution graph and topic graph generation, as all the steps are evaluated procedurally one after another.
5 Conclusion
The topic evolution model steers a user's information search and adapts to the user's cognitive perception of the information stored in the corpus. The proposed model depends on the cognitive view of scientific literature and adapts to temporal changes. The topic evolution graph enhances the information seeker's capability to search within topics and subtopics by submitting pivot topics and viewing similarity scores. The metrics, e.g., similarity score, coherence score, and perplexity, assert that the designed topic evolution model is an adaptive framework for interactive information search that balances exploitation and exploration during the search.
In the future, similar topic evolution patterns may be adapted for information retrieval and Web search platforms; more specifically, scholarly search platforms are the key application area. Here, for the alignment of temporal changes in topic evolution, other document corpora, e.g., Wiley and Web of Science, may be integrated. For topic extraction, advanced techniques like deep learning and artificial neural networks can be used, which will help increase the automation of the topic evolution implementation.
References
13. Z. Tong, H. Zhang, A text mining research based on LDA topic modelling, in International Conference on Computer Science, Engineering and Information Technology (2016), pp. 201–210. https://doi.org/10.5121/csit.2016.60616
14. A. Salatino, F. Osborne, E. Motta, AUGUR: forecasting the emergence of new research topics, in ACM/IEEE Joint Conference on Digital Libraries (ACM, New York, 2018), pp. 303–312. https://doi.org/10.1145/3197026.3197052
15. A. Chaudhuri, N. Sinhababu, M. Sarma, D. Samanta, Hidden features identification for designing an efficient research article recommendation system. Int. J. Digit. Libr. 1–17 (2021). https://doi.org/10.1007/s00799-021-00301-2
16. D. Cai, X. He, J. Han, Training linear discriminant analysis in linear time, in 2008 IEEE 24th International Conference on Data Engineering (2008), pp. 209–217. https://doi.org/10.1109/ICDE.2008.4497429
17. Graphviz Homepage, https://graphviz.org. Last accessed 15 Sept 2021
A Novel Automated Human Face Recognition and Temperature Detection System Using Deep Neural Networks—FRTDS
Abstract This paper proposes a novel FRTDS (Face Recognition and Temperature Detection System) that is contactless and performs real-time face recognition. The system has proved to be fast, low-cost, and efficient in user authentication. FRTDS combines numerous algorithms and techniques, implemented with the help of deep neural networks, to improve the performance of the entire system. FRTDS can capture images from a video stream and can detect faces 70–90 cm away from the camera. An interactive front-end recognizes and displays the identity of the person. FRTDS also includes a temperature sensor to monitor the health of the person before they enter any premises. The recognized face, along with the temperature data, is stored at the back-end with the current time and date. This paper also presents a novel customized tool that eases the process of dataset creation and augmentation, and a novel prediction discrepancy elimination algorithm.
1 Introduction
In any organization, the login and logout credentials of all employees need to be
maintained. Traditionally, attendance is tracked using biometric fingerprint scanners
which are quick and easy to use. COVID-19 pandemic situation demands a contact-
less system that would aid in automatic attendance tracking. Due to the cost incurred
to develop such systems, most organizations reverted to manual attendance and this
is very time-consuming. Face recognition is a computer vision-based task that aims
to identify faces that its system has previously been trained with. It is widely used
V. M (B) · P. Durai
Cambridge Institute of Technology, Bangalore 560036, India
e-mail: varalatchoumy.cse@cambridge.edu.in
P. Durai
e-mail: pranav.19cs033@cambridge.edu.in
2 Literature Survey
3 Proposed Methodology
Face recognition and temperature sensing are the two major modules of the novel FRTDS. The tasks involved are pre-processing of the input image, face detection, transformation of face data, feature extraction using a deep neural network, and classification. With respect to temperature sensing, the tasks are hand detection, reading the temperature, and pushing the metrics to the back-end database. These two main modules can be developed individually and integrated later, before deployment. Figure 1 depicts the overall design of the proposed integrated system.
3.1 Pre-Processing
A Haar-cascade classifier is used to detect faces, and only the frames that contain a face are selected and extracted. Face frames are resized to maintain a uniform size for all face classes in the dataset. Extracted frames are passed through a series of image processing filters, such as grayscale conversion and histogram equalization. This part of pre-processing is required to elevate the clarity of the features in face images. The grayscale filter removes color variation, so it is used to eliminate colors from the images in each face class. Similarly, the histogram equalization filter equalizes and normalizes the bright and dark parts of the image. To increase the amount of face data in each class, a data augmentation technique is used: it applies random rotations to the left and right to the face frames. Another advantage of this technique is that the system can learn how a person's face would look if it were tilted. A minimal pre-processing sketch is given below.
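The OpenCV sketch below covers the detection, resize, grayscale, and histogram-equalization steps described above; the frame size and detector parameters are illustrative assumptions.

import cv2

# Haar cascade shipped with OpenCV for frontal face detection.
detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def preprocess(frame, size=(96, 96)):
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)   # grayscale filter
    faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    processed = []
    for (x, y, w, h) in faces:
        face = cv2.resize(gray[y:y + h, x:x + w], size)  # uniform size
        processed.append(cv2.equalizeHist(face))         # histogram equalization
    return processed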
Finally, the face frames are stored in a folder with the previously entered personal
identification number as its name. All the folders will be saved under the main
working directory. Each face class would have 30 face frames. The number of faces
per class is dynamic and can vary. This was done to drastically decrease the size of
the dataset, while not compromising on the amount of data required. Light affects a person's face from all directions: the trajectory of light rays introduces highlights and shadows on various parts of the face, making the face seem different in appearance. How the direction of light affects the facade of a person's face in real-time data is depicted in Fig. 3. It becomes a challenge for the deep neural network (DNN) to extract facial features from a face under varying lighting conditions. For this experiment, 30-s clips of each employee's face, with lighting variation in each, were collected. The clips were then processed using the novel "Dataset Creation and Augmentation Tool" to extract face frames.
The DNN was trained with images of different employees, using face images with varying lighting conditions. Figure 4 depicts the massive improvement in prediction performance after training the DNN with face images that had variation in lighting.
Before a face can be recognized and validated, it needs to be detected in the image frame. Multiple techniques facilitate this process. Effective detection of various objects using Haar feature-based cascade classifiers is well established [2]. This is mainly a machine-learning-based approach in which a cascade function is trained and tested with positive and negative image data. Feature representation with the Haar-cascade-based classifier is illustrated in Fig. 5: any image can be represented with three feature patterns, namely edge features, line features, and four-rectangle features, which helps the proposed system accurately track faces in the extracted frames using the Haar-based classifier [2] in real-time.
Detected faces are transformed into a uniform template in which the eyes and bottom lip are aligned at the same location in all images. This is done to maintain uniformity across all images, and any slight changes in the orientation of the images are corrected in this module. In Fig. 6, the picture on the left shows a face image without orientation transformation, and the picture on the right shows the same face after the orientation correction; it can be observed that after the transformation the image is straight.
A Deep Neural Network (DNN) is an Artificial Neural Network (ANN) with multiple layers between the input and output layers. A DNN is trained to recognize the features in a face from the given images and to calculate the probability that the features match a new image given to it. The user can review the results and select which probabilities (above a certain threshold) the network should display and return as the proposed label. Each mathematical manipulation is considered a layer, and complex DNNs have many layers, hence the name "deep" networks.
Transformed images are fed into a dense, multi-layer DNN. The novel FRTDS mainly takes advantage of the Keras implementation of OpenFace. The DNN consists of feature maps with multiple convolution layers, a pooling layer, and linear classification layers. In these layers, specifically the convolution layers, dominant features of the person's face are extracted. The features are then passed to the pooling layers, where the representation's spatial size is gradually reduced; further convolutions are applied to the generated feature maps. The proposed neural network is illustrated diagrammatically in Fig. 7, and a schematic sketch follows.
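The sketch below is not the exact OpenFace topology but illustrates the shape of such a network in Keras: stacked convolution and pooling blocks followed by a 128-unit dense layer whose L2-normalized output is the face embedding. All layer sizes and the input shape are assumptions.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

def build_embedder(input_shape=(96, 96, 1)):
    inputs = keras.Input(shape=input_shape)
    x = inputs
    for filters in (32, 64, 128):               # convolution + pooling blocks
        x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
    x = layers.Flatten()(x)
    x = layers.Dense(128)(x)                    # 128-D face embedding
    # L2 normalization places the embedding on the unit hypersphere.
    outputs = layers.Lambda(lambda t: tf.math.l2_normalize(t, axis=1))(x)
    return keras.Model(inputs, outputs)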
A person's face can be represented on a 128-D unit hypersphere. The points in this hypersphere representation define the face embeddings of a particular person's facial features. The facial feature embeddings are the output of the deep neural network illustrated in Fig. 7. Furthermore, Fig. 8 highlights the parts of the human face and the map of the various facial features.
3.5 Classification
Embeddings are points on a hypersphere that represent a person's face. The problem is that, once the embeddings of multiple faces are extracted with the help of the DNN, there is a chance of overlap: one person's face embeddings might interfere with another person's. To tackle this, labeled embeddings of individual people are fed into a classification algorithm. Because the method is supervised, each face class can be labeled and given to the algorithm, which can then learn from these classes, separate each class from the others, and increase the gap between them. Finally, it can classify across all the classes it has learned from. The novel FRTDS uses the highly efficient Support Vector Machine algorithm for the classification task. A Support Vector Machine (SVM) [6] works by using a nonlinear mapping to transform the original training data into a higher dimension.
As far as the parameters of the SVM classifier are concerned, the kernel is set to polynomial. The C-parameter is set to 1 to limit overfitting. Regarding the degree parameter, as the polynomial degree increases, the training time increases linearly with it. A minimal classifier sketch is given below.
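A minimal scikit-learn sketch matching the parameters described (polynomial kernel, C = 1); the degree value, the random stand-in data, and all names are our own assumptions, and in the real system the embeddings would come from the network sketched earlier.

import numpy as np
from sklearn.svm import SVC

# Toy stand-ins for 128-D face embeddings and their class labels.
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(60, 128))
labels = np.repeat(["A", "B"], 30)

# Polynomial kernel and C = 1, as stated above; degree 3 is an assumption.
classifier = SVC(kernel="poly", C=1.0, degree=3, probability=True)
classifier.fit(embeddings, labels)

new_embedding = rng.normal(size=(1, 128))
print(classifier.predict(new_embedding)[0])   # predicted face class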
The trained SVM-based classifier is used to classify and then recognize the face when a person stands in front of the camera. The name of the person is printed on the feed as an alert. Even with large amounts of training data, no machine learning model can return 100% prediction accuracy. In highly complex problems such as face recognition, the system should be as fault-free as possible in its prediction performance.
This paper presents a novel Prediction Discrepancy Elimination Algorithm, a fairly lightweight, custom-designed algorithm that aims to remove errors while prediction takes place. Two classes, "A" and "B", are used to illustrate the working of the novel algorithm, as depicted in Fig. 9. The trained SVM classifier makes multiple predictions as soon as a person arrives in front of the camera. These predictions are made in real-time, and the most frequently occurring person class across the predictions is returned as the final result, as sketched below.
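A sketch of the majority-vote idea behind the discrepancy elimination step, under the assumption that the algorithm keeps the most frequent class over a window of consecutive predictions; the window size and names are our own.

from collections import Counter

def eliminate_discrepancies(predictions, window=15):
    # predictions: per-frame class labels from the SVM, e.g., ["A", "A", "B", ...]
    recent = predictions[-window:]
    label, count = Counter(recent).most_common(1)[0]
    # Keep the majority class; stray single-frame mispredictions are discarded.
    return label, count / len(recent)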
4 Temperature Detection
5 Integrated System—FRTDS
Parameters such as the name, ID, current date, time, and temperature are stored in a database at the backend and can be reviewed at any time. The preferred database is MongoDB, a NoSQL database that integrates very well with Python. Temperature data can be used to track the health of employees in any given month. A web application was designed and developed that is completely automated and non-intrusive, thus avoiding any physical contact. When a face is detected, the camera screen opens in the middle of the screen and displays the face being recognized, with text above the detected face showing the prediction. The real-time implementation of the product and the result obtained for an employee are depicted in Fig. 11.
We wanted to keep the security and privacy of the users in mind while building this system. For this, the camera is only triggered when motion is detected in the frame. Recognition takes around 1–2 s, and the system immediately displays the details of the recognized face as an alert. As soon as the employee's face is recognized, the name of the person, in-time and out-time, date, and wrist temperature data are sent to the database with the help of the database schema, as sketched below.
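A minimal pymongo sketch of this write path; the connection string, database, and field names are illustrative assumptions, not the authors' actual schema.

from datetime import datetime
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
records = client["frtds"]["attendance"]

def log_entry(name, employee_id, temperature_c):
    # One document per recognition event, reviewable at any time.
    records.insert_one({
        "name": name,
        "employee_id": employee_id,
        "temperature_c": temperature_c,
        "timestamp": datetime.now(),
    })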
6 Experimental Results
The system was tested with face data of each particular person in different lighting conditions. The novel FRTDS was able to effectively recognize all 20 people who took the test, with a prediction accuracy of 84.54%, as depicted in Fig. 12. The novel FRTDS is developed as a dynamic, real-time face recognition system.
Prediction accuracy was checked during development and is depicted in Fig. 13. In each iteration, weights were tweaked and evaluated until an optimal performance metric was observed. As an end result, we were able to achieve a maximum prediction accuracy of 84.54%.
In Table 2, the proposed FRTDS has been compared with existing systems.
7 Conclusion
At present, there is a constant increase in the number of COVID-19 cases across all countries. A system such as this is needed to flatten the curve of spread as much as possible in places with public gatherings, educational institutions, and workplaces. There is demand for the system proposed in this paper because there are many places where it can be deployed.
The novel FRTDS can be built and deployed on walls outside building entrances to effectively monitor the temperature of the people entering and leaving a facility. In hospitals, patients can have periodic temperature checks, and only authorized personnel can be allowed into restricted rooms or labs. In educational institutions, both the attendance and body temperature of students can be monitored as they enter and leave the premises. In organizations, employees can be monitored for their work hours. At airports, authorities can keep track of all passengers and their temperatures. The temperature detection module of this system is effective not only for the novel coronavirus but for malaria as well, since body temperature plays a major role in the diagnosis of malaria. Hence, the very motive of this project is satisfied. As part of future research on the novel FRTDS, the collected face and temperature data can be routed to a separate database deployed in the cloud, where data analytics can be performed to observe patterns. This would give researchers and doctors new insights into the characteristics of the virus and help them learn underlying details if any exist. Developers can also leverage the novel FRTDS on handheld devices such as smartwatches.
References
2. R. Liao, S.Z. Li, Face recognition based on multiple facial features, in Proceedings Fourth IEEE
International Conference on Automatic Face and Gesture Recognition (Cat. No. PR00580)
(2000) pp. 239–244. https://doi.org/10.1109/AFGR.2000.840641
3. W. Zhao, R. Chellappa, A. Krishnaswamy, Discriminant analysis of principal components for
face recognition, in Proceedings Third IEEE International Conference on Automatic Face and
Gesture Recognition (1998), pp 336–341, https://doi.org/10.1109/AFGR.1998.670971
4. B.T. Chinimilli, T. Anjali, A. Kotturi, V.R. Kaipu, J.V. Mandapati, Face recognition based
attendance system using Haar cascade and local binary pattern histogram algorithm, in 2020
4th International Conference on Trends in Electronics and Informatics (ICOEI)(48184) (2020),
pp. 701–704. https://doi.org/10.1109/ICOEI48184.2020.9143046
5. C. Chen, Y. Zhan, C. Wen, Hierarchical face recognition based on SVDD and SVM. Int. Conf.
Environ. Sci. Inf. Appl. Technol. 2009, 692–695 (2009). https://doi.org/10.1109/ESIAT.2009.139
6. R. Senthilkumar, R.K. Gnanamurthy, Performance improvement in classification rate of appearance-based statistical face recognition methods using SVM classifier, in 2017 4th International Conference on Advanced Computing and Communication Systems (ICACCS) (2017), pp. 1–7. https://doi.org/10.1109/ICACCS.2017.8014584
7. G. Guo, S.Z. Li, K. Chan, Face recognition by support vector machines, in Proceedings
Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat. No.
PR00580) (2000), pp. 196–201. https://doi.org/10.1109/AFGR.2000.840634
8. S. Nasr, K. Bouallegue, M. Shoaib, H. Mekki, Face recognition system using bag of features
and multi-class SVM for robot applications, in 2017 International Conference on Control,
Automation and Diagnosis (ICCAD) (2017), pp. 263–268. https://doi.org/10.1109/CADIAG.2017.8075668
9. C. Ding, D. Tao, Trunk-Branch ensemble convolutional neural networks for video-based face
recognition, in IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 40, no.
4 (2018), pp. 1002–1014. 1 Apr 2018. https://doi.org/10.1109/TPAMI.2017.2700390
10. D. Wang, H. Yu, D. Wang, G. Li, Face recognition system based on CNN. Int. Conf. Comput.
Inf. Big Data Appl. (CIBDA) 2020, 470–473 (2020). https://doi.org/10.1109/CIBDA50819.2020.00111
11. T. Mantoro, M.A. Ayu, Suhendi, Multi-faces recognition process using Haar cascades and
Eigenface methods, in 2018 6th International Conference on Multimedia Computing and
Systems (ICMCS) (2018), pp. 1–5, https://doi.org/10.1109/ICMCS.2018.8525935
12. M.S.I. Sameem, T. Qasim, K. Bakhat, Real time recognition of human faces. Int. Conf. Open-
Source Syst. Technol. (ICOSST) 2016, 62–65 (2016). https://doi.org/10.1109/ICOSST.2016.7838578
13. A.K. Jain, B. Klare, U. Park, Face recognition: Some challenges in forensics. IEEE Int. Conf.
Autom. Face Gesture Recognit. (FG) 2011, 726–733 (2011). https://doi.org/10.1109/FG.2011.5771338
A Novel BFS and CCDS-Based Efficient Sleep Scheduling Algorithm for WSN
B. Srinivasa Rao
Abstract The main aim of the present research is to propose a novel BFS- and CCDS-based efficient sleep scheduling algorithm for WSN using two popular search techniques, Breadth First Search (BFS) and Color Connected Dominated Set (CCDS), to reduce energy consumption and delay when a message is broadcast in the WSN. Breadth First Search is implemented to find the minimum-distance path from a sensor node and reduce the delay in transmitting the message. The Color Connected Dominated Set is used to transmit messages to all nodes without collision and hence minimize energy consumption. An analysis is made between the two algorithms with the same set of nodes.
1 Introduction
The most important aspect of wireless sensor networks (WSNs) is the need for long-term operation of sensor node batteries without charging while monitoring critical events, which affects the efficiency of the WSN. Hence, intelligent techniques are required for effective conservation of the energy of the power sources. Based on where energy is wasted in a WSN, various methods, such as data reduction, control reduction, energy-efficient routing, duty cycling, and topology control, have been reported in the literature [1]. Analytically, all these techniques have their advantages and disadvantages to a wide extent [2]. It has been observed that it is essential to design energy-efficient scheduling algorithms to enhance the lifetime of the power source and, in turn, of the sensor nodes [3]. In that context, sleep scheduling algorithms significantly reduce wireless sensor networks' energy consumption and time delay [4]. In general, sleep scheduling algorithms take the form of synchronous, semi-synchronous, and asynchronous mechanisms. In a synchronous scheduling mechanism, all sleeping nodes wake up for communication
B. S. Rao (B)
Gokaraju Rangaraju Institute of Engineering and Technology, Bachupally, Hyderabad 500090,
India
This section describes the working of BFS and CCDS scheduling algorithms.
The Breadth First Search scheduling is designed with inspiration from the Breadth First Search algorithm of graph theory. The procedure begins by visiting a sensor node and all of its neighboring sensors. In the subsequent step, the nearest neighbors of those nodes are visited, and the same procedure continues in the following steps. The algorithm visits all adjacent sensor nodes of every sensor node in the network and ensures that each sensor is visited exactly once [9]. By implementing the BFS algorithm, a BFS tree can be constructed that describes the uplink path in the following steps.
Step 1. A sensor node is selected as the central node.
Step 2. All the sensor nodes are categorized into node levels L1, L2, L3, etc.
Step 3. Each level is depicted by a different color.
Step 4. In this process, the neighboring sensors of each sensor are computed, as shown in Table 1.
From Table 1, a BFS tree is constructed for each node, as shown in Fig. 1. Then a routing table is built for each node in the WSN, as shown in Table 2. Each route in the BFS tree routing table defines the uplink paths for that particular sensor node.
The CCDS scheduling is used to construct a downlink path in the WSN when a critical event occurs. The scheduling design is inspired by the CCDS concept proposed for WSNs that implements self-regulation among the nodes [1]. In WSNs, downlink path construction differs from uplink paths, as the communication is from a long-distance node to the central node, with the possibility of multiple paths from which an optimal one must be selected. This process requires internal self-regulation among the sensor nodes. In the present CCDS scheduling, different algorithms are combined to construct a Connected Dominating Set that serves as a backbone to the WSN. The main objectives of the backbone are the reduction of communication overhead, enhancement of bandwidth efficiency, minimization of overall energy consumption, and improvement of the effective network lifetime in a WSN [1].
2.3 Construction
For the construction of the CCDS, we follow the methods proposed earlier in [10, 11]. The construction process involves (i) a Maximum Independent Set (MIS) in G, (ii) a Connected Dominated Set (CDS), and (iii) the Internal Model Controlling (IMC) algorithm [12].
3 Related Work
Different sleep scheduling methods for extending the lifetime of WSNs have been reviewed recently [1–8]. In general, most of the proposed sleep scheduling mechanisms involve components such as target prediction, reduction of awakened sensors, and control of the sensors' active time. The energy-efficient TDMA sleep scheduling algorithm has the advantage of maximizing the lifetime of the WSN, but this mechanism has disadvantages such as delay, data overlap, and reduced channel utilization [13]. Balanced-Energy Scheduling used WSN sensor redundancy to increase the lifetime and network load balance to improve efficiency [14], but it has the disadvantage of long-distance communication in the WSN. The DESS algorithm reduces energy utilization and communication delay but has issues with communication delay and latency [15]. Recently, a new energy-efficient reference node selection (EERS) algorithm was proposed [16] for time synchronization in industrial WSNs. EERS achieves large savings in energy consumption but is applicable to networks with many nodes at the industrial level. A multilevel sleep scheduling algorithm was developed, adopting the clustering concept for wireless
sensor networks [17]. Even though the model increases the network lifetime, a lack of synchronization among the clusters is an issue to consider. An efficient sleep scheduling mechanism was developed for WSNs using similarity measures [18]. This model reduces energy consumption by scheduling the nodes into active or sleeping modes, but it is not effective in the case of a sparse distribution of sensor nodes. A heuristic-based delay-tolerance and energy-saving model was developed for WSNs [19], which performed better than other models but is confined to a mobile base station scenario. A sensor node scheduling algorithm was proposed for heterogeneous WSNs [20] to improve network lifetime and regional coverage rate; however, this model is more suitable for static-node WSNs and may not be effective for mobile nodes. Recently, Mhatre and Khot [21] proposed an energy-saving opportunistic routing sleep scheduling algorithm that effectively reduces energy dissipation but is confined to one-dimensional topology networks. Sinde et al. [22] proposed energy-efficient scheduling using deep reinforcement learning, which increases the lifetime, reduces the network delay, and has shown better performance than previous models; the main issue is the complexity of deep reinforcement learning. The ant colony optimization algorithm was used for energy optimization of WSNs with better energy efficiency [6]. Another scheduling algorithm was proposed by Manikandan [23] using game theory and a wake-up approach for energy efficiency in WSNs; its main disadvantage is the many approximations that make the model unrealistic. Recently, a metric routing protocol was proposed for the evolution and stability of IoT networks [7], and a heuristic ant colony optimization multipath routing algorithm was proposed for vehicular ad hoc networks to optimize relay bus and route selection [8]. These two models suggest new approaches for efficient scheduling mechanisms in WSNs. Also, minimizing energy consumption is a good security-providing aspect in WSNs [24]. From the above discussion, we infer that the above-mentioned scheduling algorithms are good on some points and have drawbacks in other aspects, stressing the need for further research into new and novel efficient algorithms to enhance the lifetime of the sensors. In all the above algorithms and models, no appropriate attempt has been made to balance energy consumption and delay time simultaneously. The main advantage of Breadth First Search in WSNs is that during its level-by-level traversal of the tree, it classifies tree edges and cross edges, and its time complexity is O(N); this property is very important for efficient routing in wireless sensor networks [9]. Also, Breadth First Search (BFS) finds the minimum-distance path from a sensor node and reduces the delay in transmitting the message. On the other hand, Connected Dominating Set (CDS)-based routing is a hierarchical method that has received much attention for reducing routing overhead; the Color Connected Dominated Set is used to transmit messages to all nodes without collision and hence minimize energy consumption [1]. The ensemble of these two techniques balances both energy consumption and delay time in the wireless sensor network. In the present research work, we propose an efficient sleep scheduling procedure for WSNs using the two popular search techniques, Breadth First Search and Color Connected Dominated Set, to reduce energy consumption and delay when the message is broadcast in the WSN. In the next section, the proposed model is presented.
In the present paper, we propose an efficient sleep scheduling procedure for wireless sensor networks using two popular search techniques, Breadth First Search and Color Connected Dominated Set, to reduce energy consumption and delay when a message is broadcast in the WSN. The proposed algorithm is considered in two phases for scheduling: (i) the uplink phase and (ii) the downlink phase. The uplink phase is scheduled by Breadth First Search scheduling, inspired by graph theory's Breadth First Search algorithm. The downlink phase is scheduled by Color Connected Dominated Set-inspired scheduling. The combination of BFS scheduling and CCDS scheduling forms the proposed novel efficient sleep scheduling algorithm to reduce energy consumption and delay when a message is broadcast in the WSN. The proposed scheduling is briefly described as follows. A model wireless sensor network deployed for the detection of a critical event is shown in Fig. 2. It consists of a central node (black shaded), also known as the center node, which is capable of communicating with all the network nodes. When a critical or disaster event is detected by any node of the network (denoted by the gray shaded node), the gray node sends an alarm to the central node, thus constructing the uplink path. For the construction of the uplink path, BFS scheduling is implemented: the shortest path from any node to the central node is computed by BFS scheduling.
Now the central node transmits the alert received from the gray node during the uplink phase to all the other sensors in the network. For the construction of the downlink path, Color Connected Dominated Set scheduling is implemented: the CCDS is constructed using the Internal
Fig. 2 Construction of uplink path
Fig. 3 Construction of downlink path
Model Control (IMC) algorithm [1, 12]. The Internal Model Control algorithm is
self-regulating process and characterizes the downlink path while transmitting the
alert from the central node to all other sensor nodes of WSN, as shown in Fig. 3.
5 Methodology
This section discusses the methodology for both uplink and downlink phases of the
present scheduling algorithm. Initially, deployment of nodes is performed. In order
to have communication in the WSN, route discovery is done using route table entries
of all the sensor nodes in WSN. A node initiates route discovery by sending a request
to its neighboring first hop nodes to know whether they are located in its path and
waits for a route reply from the first hop neighbors. Based on the first hop nodes’
reply messages, the broadcasting node updates its routing table entry destination ID.
The sequence number and battery status are updated for the latest information about
the nodes' fresh routes and energy levels. After identifying the neighbor nodes, the
BFS algorithm is implemented to construct the BFS tree, which divides the nodes into
different levels. Using these levels, a CCDS is constructed; also, a Maximum
Independent Set (MIS) is constructed from the independent nodes of each level. For
the comparative study, both scheduling algorithms are given the same set of nodes.
The methodology diagram for the proposed work is shown in Fig. 4.
The pseudocode for the BFS scheduling phase and for the MIS construction used
in building the CCDS is presented below; the pseudocodes follow the notations of
the cited references.
_____________________________________________________________________
Pseudocode-1: BFS-Scheduling Procedure for WSN denoted by V [9] (for notations,
refer to [9])
_____________________________________________________________________
begin
    for each node n ∈ V do
        Distance[n] := infinity; Predecessor[n] := -1;
        Color[n] := White;
    Distance[s] := 0; Color[s] := Gray;
    Q := empty queue; Enqueue(Q, s);
    while Q is not empty do
        u := Head(Q);
        for each neighbor n of u do
            if Color[n] is White then
                Distance[n] := Distance[u] + 1;
                Predecessor[n] := u;
                Color[n] := Gray;
                Enqueue(Q, n);
        Dequeue(Q);
        Color[u] := Black;
end;
____________________________________________________________________________
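For illustration only, the following is a minimal Python sketch of the level-by-level BFS traversal described in Pseudocode-1, assuming the network is available as an adjacency list; the node names and topology are hypothetical stand-ins, not the experimental network.
_____________________________________________________________________
from collections import deque

def bfs_levels(adj, s):
    # Distance (BFS level) and predecessor of every node reachable
    # from the centre node s, mirroring Pseudocode-1.
    distance = {n: float('inf') for n in adj}
    predecessor = {n: None for n in adj}
    distance[s] = 0
    q = deque([s])
    while q:
        u = q.popleft()                      # Head(Q) + Dequeue(Q)
        for n in adj[u]:
            if distance[n] == float('inf'):  # node is still "white"
                distance[n] = distance[u] + 1
                predecessor[n] = u
                q.append(n)                  # Enqueue(Q, n)
    return distance, predecessor

# Hypothetical WSN with centre node 'N1'
adj = {'N1': ['S2', 'S3'], 'S2': ['N1', 'S4'], 'S3': ['N1'], 'S4': ['S2']}
dist, pred = bfs_levels(adj, 'N1')           # dist['S4'] == 2
_____________________________________________________________________
The predecessor map directly yields a minimum-hop uplink path from any node back to the centre node.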
____________________________________________________________________________
Pseudocode-2: Construction of Maximum Independent Set [10] (For notations refer [10])
____________________________________________________________________________
Function MIS(W)
begin
    if not connected(W) then
    begin
        X := SCC(W);
        if |X| <= 2 then P := 1 else P := MIS(X);
        return MIS(W - X) + P;
    end;
    if |W| <= 1 then return |W|;
    Select Y, Z of W such that
        (i) d(Y, W) is minimal, and
        (ii) (Y, Z) is an edge of W and d(Z, W) is maximal among the neighbors of Y;
    if d(Y, W) = 1 then return 1 + MIS(W - M(Y));
    if d(Y, W) = 2 then
    begin
        Z' := M(Y) - Z;
        if Edge(Z, Z') then return 1 + MIS(W - M(Y));
        return Maximum(2 + MIS(W - M(Z) - M(Z')), 1 + MIS²(W - M(Y), M²(Y)));
    end;
    if d(Y, W) = 3 then return Maximum(MIS²(W - Y, M(Y)), 1 + MIS(W - M(Y)));
    if Y dominates Z then return MIS(W - Z);
    return Maximum(MIS(W - Z), 1 + MIS(W - M(Z)))
end;
____________________________________________________________________
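Pseudocode-2 is an exact branch-and-bound procedure; as a simpler illustration of the same notion, the sketch below computes a maximal (not necessarily maximum) independent set greedily. This is a stand-in alternative for exposition, not the algorithm of [10].
_____________________________________________________________________
def greedy_mis(adj, nodes):
    # Greedy maximal independent set: repeatedly take a node of
    # lowest degree and block all of its neighbours.
    mis, blocked = set(), set()
    for v in sorted(nodes, key=lambda v: len(adj[v])):
        if v not in blocked:
            mis.add(v)
            blocked.add(v)
            blocked.update(adj[v])
    return mis

adj = {'S2': ['S3'], 'S3': ['S2', 'S4'], 'S4': ['S3']}
independent = greedy_mis(adj, adj.keys())    # {'S2', 'S4'}
_____________________________________________________________________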
6 Experiment
The software requirements are: (a) backend: Python 3; (b) frontend: HTML, CSS
and Bootstrap 4. The hardware requirements are: RAM: more than 32 GB advised;
graphics card: Nvidia GTX 1070; processor: Intel Core i7-8750H; storage: 512 GB
SSD. Simulator: network simulator.
A novel efficient sleep scheduling algorithm for WSN was proposed and experimented
with on network simulators in the present research work. The information on the
source node and its neighbors is given in Table 3. The results of the BFS scheduling
algorithm are presented in Table 4; similarly, the experimental results of the CCDS
scheduling algorithm are given in Table 5. From Table 4, it can be observed that
during BFS scheduling, some parent nodes have children while others do not, as they
are leaf nodes. It is well understood that no energy is dissipated by leaf nodes.
Similarly, from Table 5, it is understood that during CCDS scheduling, the parent
nodes not having children are not involved in message transmission. Comparing
Tables 4 and 5, it can be seen that the numbers of child nodes of the parent nodes
under BFS scheduling and CCDS scheduling are not the same; also, childless nodes
are more numerous under CCDS. Therefore, it can finally be understood that the
childless parents of BFS are not involved in the dissipation of energy, as they are leaf
nodes, and the childless nodes of CCDS are not involved in message transmission and
in turn conserve energy, which is the main objective of the present research work.
Table 6 lists the paths from each sensor node to the central node in WSN, with the
bold-lettered nodes revealing the differences in the number of hops. It can be observed
that the sensor nodes S12, S14 and S8 have fewer hops on the path to the central node
N1 under BFS scheduling in comparison with CCDS scheduling. At the same time,
the nodes S3, S4, S5, S6 and NS are at one hop distance from S1, while other nodes
such as S2, S3, S5, S6 and S7 are one hop away from the center node. It is obvious
that the hop count varies with the levels of the nodes in the WSN.
8 Conclusion
In the present paper, the proposed efficient sleep scheduling algorithm has been
successfully designed and implemented. From the experimental results, it can be
concluded that the algorithm successfully balances BFS and CCDS scheduling to
achieve both better energy saving and faster transmission of a message in WSN.
Further research is required to improve the model. In the future, we will extend this
model to both homogeneous and heterogeneous networks of sufficiently large size
and compare it with other models numerically and analytically.
Acknowledgements The author is thankful to the management of GRIET for their encouragement.
References
1. Z. Liu, B. Wang, L. Guo, A survey on connected dominating set construction algorithm for
wireless sensor networks. Inf. Technol. J. 9(6), 1081–1092 (2010)
2. R. Soua, P. Minet, A survey on energy efficient techniques in wireless sensor networks. in 2011
4th Joint IFIP Wireless and Mobile Networking Conference (WMNC 2011) (2011), pp. 1–9,
https://doi.org/10.1109/WMNC.2011.6097244
3. A.R. Pagar, D.C. Mehetre, A survey of energy efficient sleep scheduling in WSN. semanticscholar.org, Corpus ID: 212548604 (2015)
4. M. Karthihadevi, S. Pavalarajan, Sleep scheduling strategies in wireless sensor network. Adv.
Nat. Appl. Sci. 11(7), 635–641 (2017)
5. Z. Zhang, L. Shu, C. Zhu, M. Mukherjee, A short review on sleep scheduling mechanism in
wireless sensor networks, in QShine 2017 eds. by L. Wang et al. (LNICST 234, 2018), p. 66
6. J. I. Z. Chen, K. Lai, Machine learning based energy management at internet of things network
nodes. J. Trends Comput. Sci. Smart Technol. 2(3), 127–133 (2020)
7. S. Smys, C. Vijesh Joe, Metric routing protocol for detecting untrustworthy nodes for packet
transmission. J. Inf. Technol. 3(2), 67 (2021)
8. R. Dhaya, Kanthavel R., Bus-based VANET using ACO multipath routing algorithm. J. Trends
Comput. Sci. Smart Technol. (TCSST) 3(1), 40 (2021)
9. V.K. Akram, O. Dagdeviren, Breadth-first search-based single-phase algorithms for bridge
detection in wireless sensor networks. Sensors (Basel, Switz.) 13(7), 8786–8813 (2013). https://
doi.org/10.3390/s130708786
10. J.M. Robson, Algorithms for Maximum independent Sets. J. Algorithms 7, 425–440 (1986)
11. G. Peng, J. Tao, Z. Qian, Z. Kui, Sleep scheduling for critical event monitoring in wireless
sensor networks. IEEE Trans. Parallel Distrib. Syst. 23(2), Feb (2012)
12. D. Rivera, M. Morari, S. Skogestad, Internal model control—PID controller design. Ind Eng.
Chem Process Des Dev. 25, 252–265 (1986)
13. P. Laxman, P. Rajeev, Comparative analysis of TDMA scheduling algorithms in wireless sensor
networks. https://www.semanticscholar.org/, Corpus ID: 61778529 (2014)
14. J. Feng, H. Zhao, Energy balanced multisensory scheduling for target tracking in WSN.
Sensors (Basel) 18(10), 3585 (2018)
15. S. Soumyadip, D. Swagatam, M. Nasir, V.V. Athanasios, P. Witold, An evolutionary multiob-
jective sleep-scheduling scheme for differentiated coverage in wireless sensor networks. IEEE
Trans. Syst. Man Cybern. Part C (Appl. Rev.) 42(6) (2012)
16. E. Mahmoud, M.A. Abd El-Gawad, K. Haneul, P. Sangheon, EERS: Energy-efficient reference
node selection algorithm for synchronization in industrial wireless sensor networks. Sensors
20, 4095 (2020) https://doi.org/10.3390/s20154095
17. S. Hassan, M.S. Nisar, H. Jiang, Multilevel sleep scheduling for heterogeneous wireless sensor
networks. Comput. Sci. Technol. Appl. 227 (2016)
18. R. Wan, N. Xiong, N.T. Loc, An energy efficient sleep scheduling mechanism with similarity
measure for wireless sensor networks. Hum. Cent. Comput. Inf. Sci. 8, 18 (2018)
19. O. Jerew, N. Bassan, Delay tolerance and energy saving in WSN in mobile base station. Wirel.
Commun. Mob. Comput. 2019 (2019) Article ID 3929876
20. Z. Wang, Y. Chen, B. Liu, H. Yang, Z. Su, Y. Zhu, A sensor node scheduling algorithm for
heterogeneous wireless sensor networks. Int. J. Distrib. Sens. Netw. 15, 1 (2019)
21. K.P. Mhatre, U.P. Khot, Wireless Personal Communications, 112(1243) (2020)
22. R. Sinde, F. Begum, K. Njau, S. Kaijage, Refining network life time of wireless sensor networks
using energy efficient clustering and DRL based sleep scheduling. Sensors, 20(5), 1540 (2020)
23. K.B. Manikandan, Game theory and wake up approach scheduling in WSN for energy efficiency.
Turk. J. Comput. Math. Educ. 12(10), 2922 (2021)
24. R.B. Gudivada, R.C. Hansdah, Energy efficient secure communication in wireless sensor
networks, in 2018 IEEE 32nd International Conference on Advanced Information Networking
and Applications (AINA) (2018), pp. 311–319. https://doi.org/10.1109/AINA.2018.00055
Face Recognition: A Review and Analysis
Abstract Face recognition is the process of identifying and verifying a person from
an image or video. A significant number of contributions have already been made by
researchers in the field of face identification and recognition techniques. In this paper,
we further explore and investigate the evolution of face recognition methods, from
low-level features extracted by global methods such as PCA, eigenface and SVM-
based approaches to high-level features extracted by deep learning models such as
DeepFace and VGGFace. We also discuss challenges such as illumination and pose
variation and available standard data sets such as the LFW and Yale data sets in the
field of face recognition.
1 Introduction
The field of computer vision has produced several important subfields such as face
detection [40, 59], activity recognition [72, 80, 84, 85] and medical image processing
[7, 18]. Facial recognition is one of the crucial subfields, which plays an important
role in biometric identification [3, 81]. A facial recognition system matches a human
face from an image or a video against a database of faces and tries to verify the user.
Facial recognition systems find application in various fields: in security and
surveillance (finding missing children, kinship verification, tracking criminals, etc.),
in health care (patient medication, detecting genetic diseases, etc.) and in banking
and retail (customer verification, KYC, mobile users, etc.).
Global features normally attempt to combine low-level structural and geometric
statistics of the complete object or the entire region of interest as a whole for face
recognition. A few examples of global feature-based methods are linear subspaces
[10, 26, 50], manifolds [25, 36, 90] and sparse representation [22, 23, 88, 94].
However, these features are sensitive to noise, and the uncontrolled behaviour of
facial changes cannot be handled by these global methods. Later, in the early 2000s,
local feature-based methods were introduced. Local representations describe the
extraction and collection of local features, specifically in the spatial–temporal
domain, obtained in a bottom-up structure. Feature detectors such as Gabor [45] and
LBP [4] and their extensions [16, 24, 95] obtained improved results in the task of
facial recognition. Later, the concept of bag-of-visual-words (BOW) was utilized for
various visual classification applications like texture classification, object/scene
retrieval, image categorization and object localization. For facial recognition, this
codebook representation was used by researchers in [14, 15, 44], providing better
distinctiveness. Both global and local representations delivered significant progress;
however, these low-level features need a lot of manual labour, and their generalization
capability under complex nonlinear facial appearance variations is limited.
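As a concrete illustration of such a local representation, the sketch below computes a uniform LBP histogram for one face image with scikit-image; the image here is a random stand-in for a cropped grayscale face.
_____________________________________________________________________
import numpy as np
from skimage.feature import local_binary_pattern

def lbp_histogram(gray_face, P=8, R=1):
    # Uniform LBP codes pooled into a normalized histogram,
    # a typical local-feature face descriptor.
    codes = local_binary_pattern(gray_face, P, R, method='uniform')
    hist, _ = np.histogram(codes, bins=P + 2, range=(0, P + 2),
                           density=True)
    return hist

face = (np.random.rand(64, 64) * 255).astype(np.uint8)  # stand-in face
descriptor = lbp_histogram(face)                         # length P + 2
_____________________________________________________________________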
A deep convolutional neural network (DCNN) is able to learn robust, high-level
feature representations of an image or video. The supervised deep learning network
of Krizhevsky et al. [43] achieved a remarkable performance enhancement on a
large-scale visual classification data set, ImageNet [60], using a DCNN [33].
Recently, many deep architectures such as VGG [68], ResNet [35] and GoogleNet
[77] have been proposed with improved classification performance on various data
sets.
The hierarchical features extracted from the various layers enable the network to
tackle variation due to face pose and expression changes [15, 74, 78]. The initial
convolution layers extract features similar to Gabor and SIFT, whereas the higher
layers obtain more complex features for learning facial recognition. A landmark deep
learning model was DeepFace [78], which achieved recognition accuracy on the LFW
benchmark [37] close to human vision. After this, many deep learning-based
approaches were introduced in the field of facial recognition, such as the DeepID
series [73–76], VGGFace [54], FaceNet [64] and VGGFace2 [13].
A significant number of contributions have already been made by researchers in the
field of face identification and recognition techniques, and comprehensive reviews
of the related work can be found in [2, 12, 96]. In this paper, we further explore
and investigate the evolution of face recognition methods from low-level features
extracted by global feature-based methods to high-level features extracted by deep
learning models.
Since the development of one of the first facial recognition systems by Sakai et al.
[61], many algorithms have been developed for the task. In this section, we discuss
some global and local feature-based methods; further, we also discuss some recent
deep learning models for facial recognition.
The eigenface method [83] is considered to be an important facial recognition
technique. It computes the variance of faces in order to represent a face image as an
eigenvector; the eigenvectors [67] and eigenvalues are calculated from the covariance
matrix of the face images. Principal component analysis (PCA) [65, 70, 82] is then
used to project the high-dimensional space onto a lower-dimensional subspace of
principal components. This enables us to operate with a small set of features even for
a huge database, which also reduces the computational complexity. The approach
was first introduced by Sirovich et al. [69]. The method computes the difference
between features corresponding to different parts of the face, and many researchers
have also used it with non-frontal faces [79].
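A minimal sketch of the eigenface projection with scikit-learn follows, assuming each face arrives as a flattened grayscale image; the random matrix stands in for a real face database.
_____________________________________________________________________
import numpy as np
from sklearn.decomposition import PCA

# X: one flattened 64x64 grayscale face per row (stand-in data)
X = np.random.rand(100, 64 * 64)

pca = PCA(n_components=50)               # keep 50 principal components
weights = pca.fit_transform(X)           # each face as a 50-dim vector
eigenfaces = pca.components_.reshape(-1, 64, 64)
approx = pca.inverse_transform(weights)  # low-rank reconstruction
_____________________________________________________________________
Recognition then reduces to comparing the 50-dimensional weight vectors, e.g., by nearest neighbour, instead of comparing raw pixels.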
Gabor functions were first introduced by Dennis Gabor [9] as a tool for signal
detection in noise. Later, these Gabor filters were redefined as 2D functions, called
Gabor wavelets, by Daugman [21]. To perform facial recognition, the Gabor wavelets
utilize local features [20] computed at different facial parts, termed Gabor features
[66]. These local features are computed at different scales; hence, there is a high
probability of redundancy among the features, and researchers have utilized many
feature reduction algorithms [42, 91] to minimize it. As a wavelet function, Gabor
features can extract both spatial and frequency information; hence, a face can be
represented as a combination of both [86, 97].
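For illustration, a multi-scale, multi-orientation Gabor descriptor can be sketched with scikit-image as below; the frequencies and orientation count are arbitrary example values.
_____________________________________________________________________
import numpy as np
from skimage.filters import gabor

def gabor_features(gray_face, frequencies=(0.1, 0.2, 0.3), n_orient=4):
    # Mean and variance of the Gabor magnitude response at several
    # scales (frequencies) and orientations.
    feats = []
    for f in frequencies:
        for k in range(n_orient):
            real, imag = gabor(gray_face, frequency=f,
                               theta=k * np.pi / n_orient)
            mag = np.hypot(real, imag)
            feats += [mag.mean(), mag.var()]
    return np.array(feats)

face = np.random.rand(64, 64)          # stand-in grayscale face
descriptor = gabor_features(face)      # 3 scales x 4 orientations x 2
_____________________________________________________________________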
In the meantime, ANNs were gaining significant attention and being utilized in many
classification tasks such as age and gender classification. Several face recognition
methods also apply an ANN [5]. The multi-layer architecture of an ANN enables the
network to learn from past experience and also reduces the disturbances due to
illumination and face pose variation. It is to be noted that the initial features were still
extracted by local feature detectors, and the ANN was used for classification only. In
[87], the authors utilized an ANN with frontal faces for face recognition; in other
works [42, 91], the authors utilized a self-organizing map (SOM) neural network.
Although the use of ANNs provided a significant boost in face recognition accuracy,
the training time was very high due to the multiple layers of the network. The output
of a single neuron is given by Eq. (1):
$y = \sum_i w_i x_i + b$    (1)
where $w_i$ and $b$ represent the weights and the bias, and $x_i$ and $y$ represent
the inputs and the output.
The hidden Markov model (HMM) [62] is one of the statistical models used for face
recognition. It consists of two processes: first, a Markov chain consisting of a set of
states; second, a probability density function associated with each state. It has been
utilized in a variety of pattern recognition applications. In [63], the authors used an
HMM for face recognition; however, the approach was limited to one-dimensional
data. Later, five-state HMMs [52] were developed for facial recognition problems,
the five states corresponding to the nose, eyes, chin, mouth and forehead. HMMs
were subsequently combined with other algorithms such as wavelets [11] and were
also used for facial recognition from temporal data [19]. Further, researchers utilized
advanced HMMs such as the structural hidden Markov model (SHMM) [53] and the
adaptive hidden Markov model (AHMM) [48] for face recognition.
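A minimal sketch of a five-state face HMM, in the spirit of [52], is given below using the third-party hmmlearn package; the strip-based observation sequence and all sizes are illustrative assumptions.
_____________________________________________________________________
import numpy as np
from hmmlearn.hmm import GaussianHMM

def face_to_sequence(gray_face, strip_height=8):
    # Slice the face top-to-bottom into horizontal strips; the strip
    # means form one observation each, so the sequence runs roughly
    # forehead -> eyes -> nose -> mouth -> chin.
    h = (gray_face.shape[0] // strip_height) * strip_height
    strips = gray_face[:h].reshape(-1, strip_height, gray_face.shape[1])
    return strips.mean(axis=1)               # (n_strips, width)

# One five-state HMM per enrolled person; classify a probe face by
# the model giving the highest log-likelihood.
seq = face_to_sequence(np.random.rand(64, 64))   # stand-in face
model = GaussianHMM(n_components=5, covariance_type='diag', n_iter=50)
model.fit(seq)
log_likelihood = model.score(seq)
_____________________________________________________________________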
Similar to neural networks, the support vector machine (SVM) [41] has been utilized
for many classification tasks. SVM is an important learning-based method that can be
effectively utilized to design a classifier, as shown in Eq. (2), for facial recognition
problems. The primary features are extracted by a local feature detector [6]; based on
the feature properties, an SVM classifier is trained on the face features. This kind of
hybrid approach was utilized by many researchers, for example ICA with SVM [27]
and a binary tree with SVM [32]. In order to perform feature extraction and reduction,
various algorithms such as PCA and LDA [51, 71] were utilized before classification
with SVM. In comparison with ANN, SVM provides faster training and less
computation; however, the accuracy is comparatively lower.

$w^T x_i + b \geq +1$ for all $i$ such that $y_i = +1$
$w^T x_i + b \leq -1$ for all $i$ such that $y_i = -1$    (2)

where the feature vectors $x_i \in \mathbb{R}^n$ and the output labels
$y_i \in \{+1, -1\}$; $w$ and $b$ represent the parameters of the separating
hyperplane.
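A minimal scikit-learn sketch of this hybrid pipeline, with PCA for feature reduction feeding a linear SVM, is shown below; the random faces and identities are stand-in data.
_____________________________________________________________________
import numpy as np
from sklearn.decomposition import PCA
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# X: flattened face images; y: person identities (stand-in data)
X = np.random.rand(200, 64 * 64)
y = np.random.randint(0, 10, size=200)

# Feature reduction with PCA before the SVM, as described above
clf = make_pipeline(PCA(n_components=50), SVC(kernel='linear'))
clf.fit(X, y)
predicted_ids = clf.predict(X[:5])
_____________________________________________________________________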
DCNNs enable the extraction of a wide range of features from images and videos. In
2012, Krizhevsky et al. [43] achieved a remarkable performance enhancement on a
large data set, ImageNet [60], using a DCNN [33]. Recently, many state-of-the-art
deep architectures such as VGG [68], ResNet [35] and GoogleNet [77] have been
proposed with improved classification performance on various object recognition
data sets.
Significant progress has also been made by researchers in facial recognition, who
utilized the transfer learning approach to adapt state-of-the-art DCNN models for
facial recognition. DeepFace [78] was proposed in 2014 using the AlexNet
architecture and achieved a very high accuracy on the LFW data set. DeepID [75, 76]
is a series of systems (e.g., DeepID, DeepID2) developed for both identification and
verification. VGGFace [54] and FaceNet [64] were proposed in 2015, utilizing the
VGGNet and GoogleNet architectures, respectively; both models utilized the triplet
loss function and surpassed the accuracy of the DeepFace model. Later, in 2017,
SphereFace [47] was proposed using the ResNet architecture. The timeline of
progress of facial recognition models is shown in Fig. 2.
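The transfer learning recipe can be sketched as follows in PyTorch (assuming a recent torchvision): a pretrained ResNet backbone is re-headed to emit face embeddings, and a triplet loss of the kind used by FaceNet pulls same-identity embeddings together. The embedding size and margin are illustrative choices, not values from any cited model.
_____________________________________________________________________
import torch
import torch.nn as nn
import torchvision.models as models

# Pretrained backbone; the classifier head is replaced so the
# network outputs a 128-d face embedding instead of class scores.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
backbone.fc = nn.Linear(backbone.fc.in_features, 128)

faces = torch.randn(4, 3, 224, 224)        # stand-in batch of crops
embeddings = backbone(faces)               # shape (4, 128)

# Triplet loss: anchor and positive share an identity, the negative
# does not; training pushes the negative at least `margin` away.
loss_fn = nn.TripletMarginLoss(margin=0.2)
loss = loss_fn(embeddings[0:1], embeddings[1:2], embeddings[2:3])
_____________________________________________________________________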
These deep architectures perform significantly well; however, their computational
requirements are very high due to the large size of the networks, making it difficult
to fit them into small embedded devices. Hence, a set of small architectures has been
developed, such as MobiFace [29] and SqueezeNet [39]. The manual selection of
layers and tuning of parameters are still time consuming and error-prone; hence, an
adaptive network architecture model is required. Recently, neural architecture search
(NAS) [101] has performed outstandingly well in object classification, and it was
used for face recognition in [100] to achieve an optimal architecture.
It is also important to have an end-to-end learning model able to perform all the tasks
required in a face recognition system, i.e., detection, alignment and recognition [17,
34, 89, 98]. These models represent more robust and optimal architectures for face
recognition. In [34], the authors were able to register and represent faces at the same
time; in [89], the CNN model performed the alignment and verification tasks in the
same model. Apart from the alignment issue, other factors such as illumination and
pose variation affect the performance of face recognition. Researchers recognized
this problem, and various models have been proposed that can perform multitasking
[55, 58, 93]; in [57], the authors presented a task-specific network.
In order to extract a variety of features from a single image, image augmentation has
been utilized. To process each feature, individual networks are assembled together to
form a multi-input network [28, 46, 99]. These multi-input networks utilize
differently cropped images at different scales to learn a DCNN for the recognition
task. Later, generative models such as the autoencoder [92] and the generative
adversarial network (GAN) [38] were also utilized for image augmentation. GAN,
first proposed by Goodfellow et al. [31], has gained a lot of popularity in facial
recognition problems due to its effective generation property, and many researchers
have utilized GANs for face processing applications [8, 36]. It also alleviates the
limited availability of the face data sets required by deep learning architectures. A
few of the local feature-based and deep learning face recognition models are shown
in Table 1.
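As a simple illustration of the classical (non-generative) augmentation mentioned above, differently cropped and scaled views of one face can be produced with torchvision transforms; the parameter values are arbitrary examples.
_____________________________________________________________________
from torchvision import transforms

# Multi-crop / multi-scale augmentation of a single face image, of
# the kind fed to multi-input networks.
augment = transforms.Compose([
    transforms.RandomResizedCrop(224, scale=(0.6, 1.0)),
    transforms.RandomHorizontalFlip(),
    transforms.ColorJitter(brightness=0.2, contrast=0.2),
    transforms.ToTensor(),
])
# views = [augment(pil_face) for _ in range(5)]  # five random views
_____________________________________________________________________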
4 Data Sets
In this section, we will discuss some important and commonly used data sets for
facial recognition problems.
4.1 FERET
The FERET data set is one of the benchmark data sets for face recognition [56]. It
contains 14,126 images from 1199 participants for both training and validation,
recorded in 15 different sessions. It includes some duplicate images of the same user
in order to capture facial change with respect to time.
4.2 ORL
The ORL data set is another benchmark data set for face recognition [1]. It contains
400 images from 40 participants for both training and validation, recorded over a
span of two years. Compared to other data sets, it is relatively less challenging and is
considered suitable for beginners.
4.3 Extended Yale B
The Extended Yale B data set is another benchmark data set for face recognition [30].
It contains 2414 images from 38 participants for both training and validation and
consists of two sets. The data set covers various challenging conditions such as
illumination changes, different expressions and occlusion.
4.4 AR
The AR face data set is another standard data set for face recognition [49]. It contains
4000 images from 126 participants for both training and validation, collected in a
controlled environment with slight variations in illumination and expression.
4.5 LFW
The LFW data set [37] has been one of the favourite data sets of deep learning
researchers; many of the standard deep face recognition architectures [47, 54, 64, 78]
were trained on it. It consists of 13,233 images from 5749 users.
5 Challenges
There has been significant progress in the field of facial recognition; nonetheless, it
remains a very challenging task in computer vision. Many factors affect the accuracy
of face recognition, such as age, illumination, changes in expression and pose
variation.
5.1 Illumination
The illumination effect is also called the lighting effect. Different environments and
backgrounds, such as day or night, affect the illumination; this may cause extra light
or dark patches in the region of interest, which can significantly affect the accuracy
of facial recognition.
5.2 Pose Variation
Any slight change in head pose affects the accuracy of a face recognition system. The
head pose has three degrees of freedom, i.e., roll, pitch and yaw, and head movement
combines these three motions. Hence, the view angle and eyesight cause significant
variation in face pose, which ultimately results in poor accuracy.
5.3 Expression Variation
Facial parts such as the mouth, nose, eyes and chin, and their internal relationships,
constitute important features for facial recognition. Hence, any change in expression
during the registration or verification of a human face may directly affect the
performance of the face verification system.
5.4 Ageing
Age is an important factor in building a good quality face recognition system. The
face is a combination of skin, tissues and muscles, and with ageing the facial skin and
muscles change for each person. Hence, it is important to keep the database of each
user updated in a timely manner.
5.5 Occlusion
As mentioned earlier, facial parts such as the mouth, nose, eyes and chin, and their
internal relationships, constitute important features for facial recognition. If any of
these parts is occluded by another object or not clearly visible, the accuracy of the
facial recognition system can be affected.
Recent face recognition systems, especially deep learning models, are trained as large
networks that require high computational resources. Hence, it is important to have
sufficient computational architecture to operate these models in a real-time interface.
Also, different models are trained with different data sets and tuned with different
parameters; a good system should therefore incorporate a variety of data sets and be
sufficiently trained to avoid underfitting or overfitting.
Due to this number of challenges, it is difficult to have an ideal or 100% accurate
face recognition system. However, we can expect to have a system which minimizes
the effect of these challenges to a significant level.
6 Conclusion
References
8. J. Bao, D. Chen, F. Wen, H. Li, G. Hua, Cvae-gan: fine-grained image generation through
asymmetric training, in 2017 IEEE International Conference on Computer Vision (ICCV)
(2017), pp. 2764–2773. https://doi.org/10.1109/ICCV.2017.299
9. T. Barbu, Gabor filter-based face recognition technique. Proceedings of the Romanian
Academy, Series A: Mathematics, Physics, Technical Sciences, Information Science 11, 277–
283 (2010)
10. P. Belhumeur, J. Hespanha, D. Kriegman, Eigenfaces vs. fisherfaces: recognition using class
specific linear projection. IEEE Trans. Pattern Anal. Mach. Intel. 19(7), 711–720 (1997).
https://doi.org/10.1109/34.598228
11. M. Bicego, U. Castellani, V. Murino, Using hidden markov models and wavelets for face
recognition, in 12th International Conference on Image Analysis and Processing, 2003 Pro-
ceedings (2003), pp. 52–56. https://doi.org/10.1109/ICIAP.2003.1234024
12. K. Bowyer, J.K. Chang, P. Flynn, A survey of approaches and challenges in 3d and multi-
modal 3d+2d face recognition. Comput. Vis. Image Underst. 101, 1–15 (2006). https://doi.
org/10.1016/j.cviu.2005.05.005
13. Q. Cao, L. Shen, W. Xie, O.M. Parkhi, A. Zisserman, Vggface2: a dataset for recognising
faces across pose and age, in 2018 13th IEEE International Conference on Automatic Face
Gesture Recognition (FG 2018) (2018), pp. 67–74. https://doi.org/10.1109/FG.2018.00020
14. Z. Cao, Q. Yin, X. Tang, J. Sun, Face recognition with learning-based descriptor, in 2010
IEEE Computer Society Conference on Computer Vision and Pattern Recognition (2010), pp.
2707–2714. https://doi.org/10.1109/CVPR.2010.5539992
15. T.H. Chan, K. Jia, S. Gao, J. Lu, Z. Zeng, Y. Ma, Pcanet: a simple deep learning baseline for
image classification? IEEE Trans. Image Proces. 24(12), 5017–5032 (2015). https://doi.org/
10.1109/TIP.2015.2475625
16. D. Chen, X. Cao, F. Wen, J. Sun, Blessing of dimensionality: high-dimensional feature and
its efficient compression for face verification, in 2013 IEEE Conference on Computer Vision
and Pattern Recognition (2013), pp. 3025–3032. https://doi.org/10.1109/CVPR.2013.389
17. J.C. Chen, R. Ranjan, A. Kumar, C.H. Chen, V.M. Patel, R. Chellappa, An end-to-end sys-
tem for unconstrained face verification with deep convolutional neural networks, in 2015
IEEE International Conference on Computer Vision Workshop (ICCVW) (2015), pp. 360–
368. https://doi.org/10.1109/ICCVW.2015.55
18. N. Cherukuri, N.R. Bethapudi, V.S.K. Thotakura, P. Chitturi, C.Z. Basha, R.M. Mummidi,
Deep learning for lung cancer prediction using nscls patients ct information, in 2021 Interna-
tional Conference on Artificial Intelligence and Smart Systems (ICAIS) (2021), pp. 325–330.
https://doi.org/10.1109/ICAIS50930.2021.9395934
19. J.T. Chien, C.P. Liao, Maximum confidence hidden markov modeling for face recogni-
tion. IEEE Trans. Pattern Anal. Mach. Intel. 30(4), 606–616 (2008). https://doi.org/10.1109/
TPAMI.2007.70715
20. H. Cho, R. Roberts, B. Jung, O. Choi, S. Moon, An efficient hybrid face recognition algorithm
using pca and gabor wavelets. Int. J. Adv. Rob. Syst. 11 (2014)
21. J. Daugman, Two-dimensional spectral analysis of cortical receptive field profiles. Vis. Res.
20, 847–856 (1980)
22. W. Deng, J. Hu, J. Guo, Extended SRC: undersampled face recognition via intraclass variant
dictionary. IEEE Trans. Pattern Anal. Mach. Intel. 34(9), 1864–1870 (2012). https://doi.org/
10.1109/TPAMI.2012.30
23. W. Deng, J. Hu, J. Guo, Face recognition via collaborative representation: its discriminant
nature and superposed representation. IEEE Trans. Pattern Anal. Mach. Intel. 40(10), 2513–
2521 (2018). https://doi.org/10.1109/TPAMI.2017.2757923
24. W. Deng, J. Hu, J. Guo, Compressive binary patterns: Designing a robust binary face descriptor
with random-field eigenfilters. IEEE Trans. Pattern Anal. Mach. Intel. 41(3), 758–767 (2019).
https://doi.org/10.1109/TPAMI.2018.2800008
25. W. Deng, J. Hu, J. Guo, H. Zhang, C. Zhang, Comments on globally maximizing, locally min-
imizing: Unsupervised discriminant projection with application to face and palm biometrics.
IEEE Trans. Pattern Anal. Mach. Intel. 30(8), 1503–1504 (2008). https://doi.org/10.1109/
TPAMI.2007.70783
26. W. Deng, J. Hu, J. Lu, J. Guo, Transform-invariant pca: a unified approach to fully automatic
face alignment, representation, and recognition. IEEE Trans. Pattern Anal. Mach. Intel. 36(6),
1275–1284 (2014). https://doi.org/10.1109/TPAMI.2013.194
27. O. Déniz, M. Castrillón, M. Hernández, Face recognition using independent component anal-
ysis and support vector machines. Pattern Recogn. Lett. 24(13), 2153–2157 (2003). https://
doi.org/10.1016/s0167-8655(03)00081-3
28. C. Ding, D. Tao, Robust face recognition via multimodal deep face representation. IEEE
Trans. Multimedia 17(11), 2049–2058 (2015). https://doi.org/10.1109/TMM.2015.2477042
29. C.N. Duong, K.G. Quach, I. Jalata, N. Le, K. Luu, Mobiface: a lightweight deep learning
face recognition on mobile devices, in 2019 IEEE 10th International Conference on Bio-
metrics Theory, Applications and Systems (BTAS) (2019), pp. 1–6. https://doi.org/10.1109/
BTAS46853.2019.9185981
30. A. Georghiades, P. Belhumeur, D. Kriegman, From few to many: illumination cone models
for face recognition under variable lighting and pose. IEEE Trans. Pattern Anal. Mach. Intel.
23(6), 643–660 (2001)
31. I.J. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville,
Y. Bengio, Generative adversarial nets, in Proceedings of the 27th International Conference
on Neural Information Processing Systems, vol. 2, pp. 2672–2680 (NIPS’14, MIT Press,
Cambridge, 2014)
32. G. Guo, S. Li, K. Chan, Face recognition by support vector machines, in Proceedings
Fourth IEEE International Conference on Automatic Face and Gesture Recognition (Cat.
No. PR00580) (2000), pp. 196–201. https://doi.org/10.1109/AFGR.2000.840634
33. Y. Guo, Y. Liu, A. Oerlemans, S. Lao, S. Wu, M.S. Lew, Deep learning for visual under-
standing. Neurocomputer 187(C), 27–48 (2016). https://doi.org/10.1016/j.neucom.2015.09.
116
34. M. Hayat, S.H. Khan, N. Werghi, R. Goecke, Joint registration and representation learning for
unconstrained face identification, in 2017 IEEE Conference on Computer Vision and Pattern
Recognition (CVPR) (2017), pp. 1551–1560. https://doi.org/10.1109/CVPR.2017.169
35. K. He, X. Zhang, S. Ren, J. Sun, Deep residual learning for image recognition, in 2016
IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2016), pp. 770–778.
https://doi.org/10.1109/CVPR.2016.90
36. X. He, S. Yan, Y. Hu, P. Niyogi, H.J. Zhang, Face recognition using laplacianfaces. IEEE
Trans. Pattern Anal. Mach. Intel. 27(3), 328–340 (2005). https://doi.org/10.1109/TPAMI.
2005.55
37. G.B. Huang, M. Mattar, T. Berg, E. Learned-Miller, Labeled faces in the wild: a database for
studying face recognition in unconstrained environments, in Workshop on Faces in ’Real-Life’
Images: Detection, Alignment, and Recognition. Erik Learned-Miller and Andras Ferencz and
Frédéric Jurie, Marseille, France (2008), https://hal.inria.fr/inria-00321923
38. R. Huang, S. Zhang, T. Li, R. He, Beyond face rotation: global and local perception gan for
photorealistic and identity preserving frontal view synthesis, in 2017 IEEE International Con-
ference on Computer Vision (ICCV) (2017), pp. 2458–2467. https://doi.org/10.1109/ICCV.
2017.267
39. F.N. Iandola, M. Moskewicz, K. Ashraf, S. Han, W. Dally, K. Keutzer, Squeezenet: Alexnet-
level accuracy with 50x fewer parameters and <1mb model size. ArXiv abs/1602.07360
(2016)
40. M. Jaya Bhaskar, Y. Venkatesh, R. Sai Bhaskar Pranai, M. Rohith, Face recognition for
attendance management. Int. J. Emerg. Trends Eng. Res. 8(4), 964–968 (2020). https://doi.
org/10.30534/ijeter/2020/04842020
41. K. Jonsson, J. Kittler, Y.P Li, J. Matas, Support vector machines for face authentication. Image
Vis. Comput. 20(5–6), 369–375 (2002). https://doi.org/10.1016/s0262-8856(02)00009-4
42. T. Kathirvalavakumar, Jebakumari, J. Beulah Vasanthi, Face representation using combined
method of gabor filters, wavelet transformation and DCV and recognition using RBF. J. Intel.
Learn. Syst. Appl. 04(04), 266–273 (2012). https://doi.org/10.4236/jilsa.2012.44027
43. A. Krizhevsky, I. Sutskever, G.E. Hinton, Imagenet classification with deep convolutional
neural networks. Commun. ACM 60(6), 84–90 (2017). https://doi.org/10.1145/3065386
44. Z. Lei, M. Pietikainen, S.Z. Li, Learning discriminant face descriptor. IEEE Trans. Pattern
Anal. Mach. Intel. 36(2), 289–302 (2014). https://doi.org/10.1109/TPAMI.2013.112
45. C. Liu, H. Wechsler, Gabor feature based classification using the enhanced fisher linear dis-
criminant model for face recognition. IEEE Trans. Image Proces. 11(4), 467–476 (2002).
https://doi.org/10.1109/TIP.2002.999679
46. J. Liu, Y. Deng, T. Bai, C. Huang, Targeting ultimate accuracy: face recognition via deep
embedding. CoRR abs/1506.07310 (2015), http://arxiv.org/abs/1506.07310
47. W. Liu, Y. Wen, Z. Yu, M. Li, B. Raj, L. Song, Sphereface: deep hypersphere embedding
for face recognition, in 2017 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR) (2017), pp. 6738–6746. https://doi.org/10.1109/CVPR.2017.713
48. X. Liu, T. Cheng, Video-based face recognition using adaptive hidden markov models, in
2003 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2003
Proceedings, vol. 1 (2003), pp. I–I. https://doi.org/10.1109/CVPR.2003.1211373
49. A. Martinez, R. Benavente, The ar face database. Tech. Rep. 24 CVC Technical Report (1998)
50. B. Moghaddam, W. Wahid, A. Pentland, Beyond eigenfaces: probabilistic matching for face
recognition, in Proceedings Third IEEE International Conference on Automatic Face and
Gesture Recognition (1998), pp. 30–35. https://doi.org/10.1109/AFGR.1998.670921
51. M. Murtaza, M. Sharif, M. Raza, J. Shah, Face recognition using adaptive margin fisher's
criterion and linear discriminant analysis. Int. Arab J. Inform. Technol. 11, 149–158 (2014)
52. A. Nefian, M. Hayes, Face detection and recognition using hidden Markov models, in
Proceedings 1998 International Conference on Image Processing (ICIP), vol. 1 (1998), pp.
141–145. https://doi.org/10.1109/ICIP.1998.723445
53. P. Nicholl, A. Amira, D. Bouchaffra, R.H. Perrott, A statistical multiresolution approach for
face recognition using structural hidden markov models. EURASIP J. Adv. Signal Process
2008 (2008). https://doi.org/10.1155/2008/675787
54. O.M. Parkhi, A. Vedaldi, A. Zisserman, Deep face recognition, in Proceedings of the British
Machine Vision Conference (BMVC), ed. by M.W.J. Xianghua Xie, G.K.L. Tam (BMVA
Press, 2015), pp. 41.1–41.12. https://doi.org/10.5244/C.29.41
55. X. Peng, X. Yu, K. Sohn, D.N. Metaxas, M. Chandraker, Reconstruction-based disentangle-
ment for pose-invariant face recognition, in 2017 IEEE International Conference on Computer
Vision (ICCV) (2017), pp. 1632–1641. https://doi.org/10.1109/ICCV.2017.180
56. P.J. Phillips, H. Wechsler, J. Huang, P.J. Rauss, The feret database and evaluation procedure
for face-recognition algorithms. Image Vis. Comput. 16(5), 295–306 (1998), http://dblp.uni-
trier.de/db/journals/ivc/ivc16.html
57. Y. Qian, W. Deng, J. Hu, Task specific networks for identity and face variation, in 2018 13th
IEEE International Conference on Automatic Face Gesture Recognition (FG 2018) (2018),
pp. 271–277. https://doi.org/10.1109/FG.2018.00047
58. R. Ranjan, S. Sankaranarayanan, C.D. Castillo, R. Chellappa, An all-in-one convolutional
neural network for face analysis, in 2017 12th IEEE International Conference on Automatic
Face Gesture Recognition (FG 2017) (2017), pp. 17–24. https://doi.org/10.1109/FG.2017.
137
59. L. Rao, C. Harshitha, C.Z. Basha, N. Parveen, Hybrid computerized face recognition system
using bag of visual words and mlp-based bpnn, in 2020 4th International Conference on
Electronics, Communication and Aerospace Technology (ICECA) (2020), pp. 1113–1117.
https://doi.org/10.1109/ICECA49313.2020.9297499
60. O. Russakovsky, J. Deng, H. Su, J. Krause, S. Satheesh, S. Ma, Z. Huang, A. Karpathy, A.
Khosla, M. Bernstein, A.C. Berg, L. Fei-Fei, Imagenet large scale visual recognition challenge.
Int. J. Comput. Vis. 115(3), 211–252 (2015). https://doi.org/10.1007/s11263-015-0816-y
61. T. Sakai, T. Kanade, M. Nagao, Y. Ichi Ohta, Picture processing system using a computer
complex. Comput. Gr. Image Proces. 2(3–4), 207–215 (1973). https://doi.org/10.1016/0146-
664x(73)90002-6
62. A. Salah, M. Bicego, L. Akarun, E. Grosso, M. Tistarelli, Hidden markov model-based face
recognition using selective attention - art. no. 649214, in Proceedings of SPIE—The Interna-
tional Society for Optical Engineering (2007). https://doi.org/10.1117/12.707333
63. F. Samaria, S. Young, Hmm-based architecture for face identification. Image Vis. Comput.
12(8), 537–543 (1994). https://doi.org/10.1016/0262-8856(94)90007-8
64. F. Schroff, D. Kalenichenko, J. Philbin, Facenet: a unified embedding for face recognition and
clustering, in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
(2015), pp. 815–823. https://doi.org/10.1109/CVPR.2015.7298682
65. F. Shamrat, P. Ghosh, Z. Tasnim, A. Khan, M. Uddin, T. Chowdhury, Human face recognition
using eigenface, surf methods (2021). https://doi.org/10.1109/ICPCSN.2021.0908305
66. M. Sharif, S. Mohsin, M. Jamal, M. Javed, M. Raza, Face recognition for disguised variations
using gabor feature extraction. Aust. J. Basic Appl. Sci. 5, 1648–1656 (2011)
67. M. Sharif, S. Mohsin, M.Y. Javed, A survey: face recognition techniques. Res. J. Appl. Sci.
Eng. Technol. 4(23), 4979–4990 (2012)
68. K. Simonyan, A. Zisserman, Very deep convolutional networks for large-scale image recog-
nition. CoRR abs/1409.1556 (2014), http://arxiv.org/abs/1409.1556
69. L. Sirovich, M. Kirby, Low-dimensional procedure for the characterization of human faces.
J. Opt. Soc. Am. A, Opt. Image Sci. 4, 519–24 (1987). https://doi.org/10.1364/JOSAA.4.
000519
70. M. Slavkovic, D. Jevtic, Face recognition using eigenface approach. Serb. J. Electric. Eng. 9,
121–130 (2012). https://doi.org/10.2298/SJEE1201121S
71. R. Smith, J. Kittler, M. Hamouz, J. Illingworth, Face recognition using angular lda and svm
ensembles, in 18th International Conference on Pattern Recognition (ICPR’06), vol. 3 (2006),
pp. 1008–1012. https://doi.org/10.1109/ICPR.2006.529
72. D. Srihari, P. Kishore, K. Eepuri, D. Anil Kumar, T. Maddala, M. Prasad, R. Prasad, A four-
stream convnet based on spatial and depth flow for human action classification using rgb-d
data. Multimed. Tools Appl. 79 (2020). https://doi.org/10.1007/s11042-019-08588-9
73. Y. Sun, D. Liang, X. Wang, X. Tang, Deepid3: face recognition with very deep neural networks.
ArXiv abs/1502.00873 (2015)
74. Y. Sun, Y. Chen, X. Wang, X. Tang, Deep learning face representation by joint identification-
verification, in Proceedings of the 27th International Conference on Neural Information Pro-
cessing Systems, vol. 2. NIPS 14 (MIT Press, Cambridge, 2014), pp. 1988–1996
75. Y. Sun, X. Wang, X. Tang, Deep learning face representation from predicting 10,000 classes, in
2014 IEEE Conference on Computer Vision and Pattern Recognition (2014), pp. 1891–1898.
https://doi.org/10.1109/CVPR.2014.244
76. Y. Sun, X. Wang, X. Tang, Deeply learned face representations are sparse, selective, and robust,
in 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR) (2015), pp.
2892–2900. https://doi.org/10.1109/CVPR.2015.7298907
77. C. Szegedy, W. Liu, Y. Jia, P. Sermanet, S. Reed, D. Anguelov, D. Erhan, V. Vanhoucke, A.
Rabinovich, Going deeper with convolutions, in Computer Vision and Pattern Recognition
(CVPR) (2015), http://arxiv.org/abs/1409.4842
78. Y. Taigman, M. Yang, M. Ranzato, L. Wolf, Deepface: closing the gap to human-level per-
formance in face verification, in 2014 IEEE Conference on Computer Vision and Pattern
Recognition (2014), pp. 1701–1708. https://doi.org/10.1109/CVPR.2014.220
79. Y. Tayal, P. Pandey, D.B.V. Singh, Face recognition using eigenface. Int. J. Emerg. Technol.
Comput. Appl. Sci. (IJETCAS) 3, 50–55 (2013)
80. M. Teja Kiran Kumar, P. Kishore, M. Prasad, Cnn-lstm hybrid model based human action
recognition with skeletal representation using joint movements based energy maps. Int. J.
Emerg. Trend. Eng. Res. 8(7), 3502–3508 (2020). https://doi.org/10.30534/ijeter/2020/100872020
81. P. Tumuluru, L. Burra, D. Bhavanidasari, C. Saibaba, B. Revathi, B. Venkateswarlu, A
Novel Privacy Preserving Biometric Authentication Scheme Using Polynomial Time Key
Algorithm in Cloud Computing (2021), pp. 1330–1335. https://doi.org/10.1109/ICAIS50930.
2021.9395964
82. M. Turk, A. Pentland, Face recognition using eigenfaces, in Proceedings of 1991 IEEE Com-
puter Society Conference on Computer Vision and Pattern Recognition (1991), pp. 586–591.
https://doi.org/10.1109/CVPR.1991.139758
83. M. Turk, Eigenfaces and Beyond (Advanced Modeling and Methods, Face Processing, 2005)
84. A. Verma, T. Meenpal, B. Acharya, Human interaction recognition in videos with body pose
traversal analysis and pairwise interaction framework. IETE J. Res. 1–13 (2020). https://doi.
org/10.1080/03772063.2020.1802355
85. A. Verma, T. Meenpal, B. Acharya, Multiperson interaction recognition in images: a body
keypoint based feature image analysis. Comput. Intel. 37(1), 461–483 (2021). https://doi.org/
10.1111/coin.12419
86. A. Vinay, S. Vinay, K.N. Balasubramanya, S. Natarajan, Face recognition using gabor wavelet
features with PCA and KPCA–a comparative study. Procedia Comput. Sci. 57, 650–659
(2015). https://doi.org/10.1016/j.procs.2015.07.434
87. K.V. Vinitha, G.S. Kumar, Face recognition using probabilistic neural networks, in 2009
World Congress on Nature Biologically Inspired Computing (NaBIC) (2009), pp. 1388–1393.
https://doi.org/10.1109/NABIC.2009.5393716
88. J. Wright, A.Y. Yang, A. Ganesh, S.S. Sastry, Y. Ma, Robust face recognition via sparse
representation. IEEE Trans. Pattern Anal. Mach. Intel. 31(2), 210–227 (2009). https://doi.
org/10.1109/TPAMI.2008.79
89. W. Wu, M. Kan, X. Liu, Y. Yang, S. Shan, X. Chen, Recursive spatial transformer (rest) for
alignment-free face recognition, in 2017 IEEE International Conference on Computer Vision
(ICCV) (2017), pp. 3792–3800. https://doi.org/10.1109/ICCV.2017.407
90. S. Yan, D. Xu, B. Zhang, H.J. Zhang, Q. Yang, S. Lin, Graph embedding and extensions:
a general framework for dimensionality reduction. IEEE Trans. Pattern Anal. Mach. Intel.
29(1), 40–51 (2007). https://doi.org/10.1109/TPAMI.2007.250598
91. M. Yang, L. Zhang, Gabor feature based sparse representation for face recognition with gabor
occlusion dictionary. 6316, 448–461 (2010)
92. J. Yim, H. Jung, B. Yoo, C. Choi, D. Park, J. Kim, Rotating your face using multi-task
deep neural network, in 2015 IEEE Conference on Computer Vision and Pattern Recognition
(CVPR) (2015), pp. 676–684. https://doi.org/10.1109/CVPR.2015.7298667
93. X. Yin, X. Liu, Multi-task convolutional neural network for pose-invariant face recogni-
tion. IEEE Trans. Image Proces. 27(2), 964–975 (2018). https://doi.org/10.1109/TIP.2017.
2765830
94. L. Zhang, M. Yang, X. Feng, Sparse representation or collaborative representation: which
helps face recognition?, in 2011 International Conference on Computer Vision (2011), pp.
471–478. https://doi.org/10.1109/ICCV.2011.6126277
95. W. Zhang, S. Shan, W. Gao, X. Chen, H. Zhang, Local gabor binary pattern histogram sequence
(lgbphs): a novel non-statistical model for face representation and recognition, in Tenth IEEE
International Conference on Computer Vision (ICCV’05), vol. 1 (2005), pp. 786–791. https://
doi.org/10.1109/ICCV.2005.147
96. W. Zhao, R. Chellappa, P.J. Phillips, A. Rosenfeld, Face recognition: a literature survey. ACM
Comput. Surv. 35(4), 399–458 (2003). https://doi.org/10.1145/954339.954342
97. Z. Zheng, J. Zhao, J. Yang, Gabor feature based face recognition using supervised locality
preserving projection, in Advanced Concepts for Intelligent Vision Systems (Springer, Berlin,
2006), pp. 644–653
98. Y. Zhong, J. Chen, B. Huang, Toward end-to-end face recognition through alignment learn-
ing. IEEE Signal Proces. Lett. 24(8), 1213–1217 (2017). https://doi.org/10.1109/LSP.2017.
2715076
99. E. Zhou, Z. Cao, Q. Yin, Naive-deep face recognition: touching the limit of LFW benchmark
or not? CoRR abs/1501.04690 (2015). http://arxiv.org/abs/1501.04690
100. N. Zhu, Z. Yu, C. Kou, A new deep neural architecture search pipeline for face recognition.
IEEE Access 8, 91303–91310 (2020). https://doi.org/10.1109/ACCESS.2020.2994207
101. B. Zoph, Q.V. Le, Neural architecture search with reinforcement learning. CoRR
abs/1611.01578 (2016). http://arxiv.org/abs/1611.01578
COVID-19 Time Series Prediction
and Lockdown Effectiveness
Abstract The origin of the COVID-19 pandemic lies in the wet market of Wuhan,
China, where it reportedly began with a person's consumption of a wild animal that
was already infected with the disease. Since then, the virus has spread worldwide
like wildfire and poses a major threat to the entirety of the human species.
Coronavirus causes respiratory tract infections that can range from mild to lethal.
This paper discusses the use of data analysis and machine learning to draw on the
implications of the growth patterns of previous pandemics in general and to predict
future scenarios of COVID-19 specifically. It also compares and measures some of
the present pandemic's short- and long-span predictions against the equivalent
real-world data observed during and after the said span. Finally, it attempts to analyze
how effective the lockdown has been across various countries and what India
specifically must do to prevent a catastrophic outcome.
1 Introduction
in the air for a longer duration. A person may also get infected by touching a
contaminated surface and then touching their face [2]. The virus is most contagious
during the first three days after the onset of symptoms, but spread is also possible
before symptoms appear and from persons who never show any symptoms, known
as asymptomatic carriers.
Many factors can influence the way a pandemic grows and evolves, including but not
limited to the ways the virus itself mutates over time and geographical area and how
various populations interact with and spread the virus. Since these factors cannot be
enumerated exhaustively, it is hard to conclude which of them might be useful
features for a machine learning model.
This paper aims to highlight how it is possible to train a univariate model and still
accurately assess the short- and long-term outcomes of a pandemic of this nature. The
study commenced in early 2020, taking records of the infection rates from the
beginning of the pandemic; the model was finalized and trained in June of that year,
and the accuracy assessment continued well into the following year, up until the
submission of this paper in August 2021.
(not yet a pandemic, as of then) had started to display the characteristics of
exponential growth. This was a matter of substantial discussion, since pandemics are
exceptionally harder to control and contain once the numbers reach such extents [8].
To mitigate the spread of the virus, nearly all countries, whether already affected or
anticipating a serious outbreak, started to implement what is now colloquially known
as The Great Lockdown [9], the effectiveness of which is the next major matter of
discussion for this paper.
This discussion would be incomplete without mentioning the major ongoing global
economic recession that arose as a consequence of the COVID-19 pandemic. The
first significant sign of the recession was the stock market crash on the 20th of
February 2020, and on the 14th of April the International Monetary Fund (IMF)
informed the public that each one of the G7 countries had already descended, or was
descending, into a "deep recession" and that there had already been a major stunting
of growth in countries with emerging economies [10]. IMF projections imply that
this recession will be the severest global economic slowdown since the Great
Depression and that it will be much worse than even the Great Recession of 2009.
The 2019–2020 COVID-19 pandemic is projected to have a substantial negative
effect on the worldwide economy, likely for years to come, with profound drops in
GDP accompanied by spikes in unemployment noted around the globe [11].
4 Proposed Methodology
It is expected that the COVID-19 pandemic, like any other pandemic, will follow an
exponential growth curve; if not, a cubic curve can be expected at a minimum. There
are several traditional curve fitting methods that could be used, but in order to help
the model learn the best representation of the available time series data, it is
worthwhile to train a few machine learning algorithms. Given the at-least-cubic
nature of exponential growth, it makes sense to express the polynomial function with
a degree of at least three. We experimented with the degree of the polynomial: fits of
degrees three and five were tried, and demonstrably, the mean value of four was the
degree that was settled on.
In the first 20–30 days of the pandemic, the growth was very uneven, making it
difficult to assess whether the predictions being made were at all accurate; hence,
there was little point in dividing the data into a training set and a test set. Instead, we
trained the models with the entire dataset available at hand and projected the next 15
or 20 days of predictions from the trained model. After the said number of days, we
compared the real-world observations to the predictions, giving us a fair enough
metric. Even as of the writing of this paper, for India we have only 170 data points,
i.e., 170 days' worth of data since January 30th, the date on which the first confirmed
COVID-19 case was reported in India. Even considering other countries, we have at
most around 230 data points, which either way gives us a very constrained
environment to discuss solely in the context of COVID-19. The process that we
conducted and some of the results that we obtained are as follows. In Fig. 2, the
x-axis represents the number of days passed since the detection of the first COVID-19
case in India, and the y-axis depicts the number of cases in lakhs. This is actual data,
and no predictions are made yet (data source: api.covid19india.org).
We use a univariate supervised model whose input feature (independent variable) is
the number of days passed since the first case was confirmed in the country and
whose label (dependent variable) is the daily cumulative number of cases. Note that
common media, including internet visual infographics, use the daily number of new
cases as the y-axis parameter, but we avoid it due to the chaotic nature of that
representation, since we want to help our model learn the best possible fit. Also, the
exponential nature means that each new data point is the cumulation of the last
observed data point plus the new information, so it makes sense for the model to
learn the best possible representation from this smoother time series.
Most importantly, and perhaps as the one central takeaway of this paper, the
understanding is that we will always prefer to overestimate the pandemic rather than
underestimate it. The reason is simply preparation for the worst-case scenario instead
of always trying to be metrically accurate. It would be nice if we could closely
represent the pandemic using enough features, but given the univariate nature of the
model, we take overestimation as fairly acceptable and underestimation as strictly
not so. Overestimation should, however, be reasonable enough so as not to represent
something unrealistic.
We aim to fit this data to a polynomial regression model of degree 4. The implementation that we chose is from Python's sklearn library (scikit-learn.org): linear regression from the linear_model module, and since we required the features to be polynomial in nature, we preprocessed our input features with the PolynomialFeatures transformer. The loss that is minimized is the default regression loss, the mean squared error, also known as quadratic loss; note that scikit-learn's LinearRegression fits it with a direct ordinary least-squares solver rather than iterative gradient descent. The following is what happens when you train the model with the first 40 days of the data and try to predict 170 days. In Fig. 3, the x-axis represents the number of days passed since the detection of the first COVID-19 case in India, whereas the y-axis denotes the number of cases in lakhs. The blue curve shows the number of actual cases, and the red curve the number of cases the model estimates after being trained with 40 days of data.
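To make the setup concrete, here is a minimal sketch of that pipeline; the synthetic cumulative counts below are placeholders for the api.covid19india.org series, not the actual data.

```python
# A minimal sketch of the degree-4 polynomial regression described above.
# The cumulative-case series here is synthetic, standing in for real data.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures

days = np.arange(1, 41).reshape(-1, 1)           # first 40 days since the first case
cases = np.cumsum(np.random.poisson(50, 40))     # placeholder cumulative counts

poly = PolynomialFeatures(degree=4)              # x -> [1, x, x^2, x^3, x^4]
model = LinearRegression()                       # minimizes mean squared error
model.fit(poly.fit_transform(days), cases)

future = np.arange(1, 171).reshape(-1, 1)        # project out to day 170
predicted = model.predict(poly.transform(future))
```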
The model is taught that the number of cases stays really low to begin with (the red line), but things in the real world start spiking around the 75-day mark. Further, training the model with the first 60 days of data produces the plot in Fig. 4, where the x-axis again represents the number of days passed since the detection of the first COVID-19 case in India and the y-axis represents the number of cases in lakhs.
In Fig. 5, we can note that the prediction gets closer to the real-life data, but even
the first 60 days (two months) are not enough to be able to tell what we are looking
at in the case of a potential pandemic. The next figure is of training the model with
the first 75 days of data.
Here, we get an interesting plot where our prediction actually overshoots the figures later seen in the real world around the 100th-day mark. This has some intriguing implications, discussed further down in this section. The next plot (Fig. 6) is obtained by training the same model with the first 90 days (nearly half of the duration so far) of data.
This yet again has interesting implications, to be discussed in the summary below. Finally, Fig. 7 shows the plot obtained with 120 days of data, i.e., about 70% of all the data that is available to us. Now, the model has started to closely estimate what the pandemic situation is going to look like in the near future. This has given us the interesting insight that it is nearly impossible to tell which direction a pandemic is going to go judging by the first few weeks or even the first three months of its duration. All of this growth is the organic growth of the virus unhindered by any human intervention: in India, neither had any medical advances occurred, nor had any proper social distancing norms been followed, leading to growth of this nature.
About the overestimation with 75-day training: it indeed looked like India had been making significant improvement, but that soon changed on extending training to 90 days of observation, where the estimate was again overshot. This tells us that India has been through phases of improvement, only for the improvement to be followed by a steady incline in the curve that marks neither improvement nor the other way.
Sticking to the linear regression model with polynomial features of degree 4 made nearly accurate predictions of the:
1. 50,000 mark to be on the 6th of May.
2. 100,000 mark to be on the 18th of May.
3. 500,000 mark to be on the 26th of June.
4. 1,000,000 mark to be on the 16th of July.
All of these predictions were made 10–15 days prior to the predicted date, largely implying that this pandemic, too, follows a very predictable near-exponential curve.
It is a point of note that the model was retained, and predictions made from it were recorded on a daily basis over the course of a year, continuing to date (as of August 2021). The plot in Fig. 8 was made a week prior to the finalized submission of this paper, and it records roughly 550 days of the growth of the pandemic (in blue). It is interesting to note that despite the model being trained with only the initial 55 days of data (i.e., 10% of the total existing data), it succeeds at closely depicting what the pandemic will look like well into the future (in orange).
This highlights how less than two months of data can reveal the nature of the pandemic one and a half years into the future, especially when trained with only a single feature, the total number of cases, which is a very easily obtainable metric. Of course, it clearly underestimates the pandemic's growth in the short term, but medical infrastructure and government are expected to view this as a long-term problem (over the course of years). Longer-term worst-case scenarios need to be considered: even though the curve might seem to flatten a few months into the pandemic, it can just as easily spike again, unexpectedly, due to the unknown nature of the underlying factors.
Other popular time series prediction models were also tried and tested, none of which significantly improved on polynomial regression for any of the predictions. However, a point of note is that tree-based algorithms like random forest regression, decision tree regression, and XGBoost turned out not to be useful for time series prediction. Even though these can be very good algorithms in other use cases, they only perform well when the test points lie within the upper and lower bounds of the training set, making them unusable for time series prediction, the entire premise of which is to talk about points that are outside of the training set (i.e., in the future). Here is an example of how a tree-based algorithm fails when fed with data of only the first 140 days: notice how the prediction flattens out after the 140-day mark, since that part is beyond the training set (Fig. 8). This demonstrates how tree-based prediction algorithms fail to perform outside of their training domain; we keep this to just one example, since any further discussion is beyond the scope of this paper.
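A minimal code sketch of this failure mode, on a synthetic near-exponential series rather than the actual case counts, is:

```python
# A minimal sketch of why tree-based regressors fail to extrapolate.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

days = np.arange(1, 141).reshape(-1, 1)          # first 140 days only
cases = np.exp(0.05 * days.ravel()) * 100        # synthetic near-exponential growth

forest = RandomForestRegressor(n_estimators=100, random_state=0)
forest.fit(days, cases)

future = np.arange(1, 171).reshape(-1, 1)
pred = forest.predict(future)
# Beyond day 140, every tree keeps returning values it saw in training,
# so the forecast flattens at roughly the last observed level.
print(pred[138:144])
```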
Since the COVID-19 pandemic grew in exponential terms, as presented above, the world witnessed an unprecedented surge in the need for health care. Even in countries with some of the best health care systems, like Italy and the United States, the primary problems that nations faced were due to overcrowding at hospitals. Although mild cases of COVID-19 require no hospital treatment, people got admitted to hospitals, taking up beds, which soon led to overcrowding. Even individuals with only mild symptoms, in the face of panic, got themselves admitted, which left less care for seriously ill but treatable individuals, and these countries saw a spike in the death rate. Individuals with other medical issues, like accidents and other diseases, could not get health care on time either, and that contributed to the death rate even more. Lack of public awareness did not help decrease the contamination rate, and more and more individuals kept getting infected to the point of requiring medical treatment.
"Flatten the curve" is a public health strategy to alleviate the spread of the virus during the pandemic. The curve in question is the epidemic curve, a visual representation of the number of infected people requiring health care, plotted over time. During a pandemic, a health care system is prone to breaking down when the number of contaminated people exceeds the system's capacity to treat them. Flattening the curve means slowing the spread of the virus so that the peak number of people requiring medical attention at any single time is minimized and the health care system does not exceed its maximum capacity. Flattening the curve heavily depends on mitigation techniques, especially social distancing.
Warnings about the potential risk of a pandemic were made repeatedly throughout the 2000s and the 2010s by prominent international organizations, including the World Health Organization (WHO) and the World Bank, especially after the 2002 SARS outbreak [12]. Governments, including those of the United States and France, both before the 2009 swine flu pandemic and during the years following it, strengthened their health care capacities but then weakened them again [13]. At the time of the COVID-19 pandemic, health care systems in many countries were compelled to function near their maximum capacities. In situations like these, when a sizable new epidemic emerges, the number of infected and symptomatic individuals causes a spike in demand for health care that can only be predicted statistically, since neither the onset of the epidemic nor its infectivity and lethality are known in advance. If the demand exceeds the capacity line in the infections-per-day curve, the existing health care facilities cannot fully handle the surge of patients, resulting in higher mortality rates than would occur had preparations been made [14].
States of America is actually doing even worse with their lockdown than they started with, although, contrary to popular belief, this did not have anything to do with the Black Lives Matter protests, which were largely a consequence of the death of George Floyd. Looking at India next, we have some obvious conclusions.
The model observed (Fig. 13) only the first 100 days of data, yet it could nearly accurately predict what the situation was going to be. This implies that the implemented lockdowns barely had any effect, and even less so when they were eventually lifted on and off, which is still going on as of today. India initially implemented a 21-day lockdown not to control the pandemic but rather to buy some time for the authorities to brace for impact before the actual widespread transmission of the virus began. In a country like India, mass spreading was inevitable, especially without any control interventions: being a country with a population of 1.3 billion, looking at the current statistics, if 50% of the population gets infected, that is 650 million people, and if 0.5 percent of this count dies, we are looking at about 3 million deaths, all of this without taking into account the arrival of a vaccine or treatment.
All of this leads to the conclusion that the lockdowns implemented in India did slow down the spread of the virus. India had the lead time to learn from other countries, and the steps taken from these observations were effective in making the climb less steep compared to other countries. But India has to be aware that it is still a steep climb nonetheless and will continue to grow in the same fashion. The number of cases will keep rising in a near-exponential fashion unless controlled by public means of social distancing, and even more so without the intervention of medical advances in the form of treatments and vaccinations. The government should do better at containing this pandemic; the medical industry needs to be more considerate in handling patients, considering the economy of the country; and the general public needs to be more aware of the situation. If so, we might be looking toward a better future. The means of attaining any of the above, however, are beyond the scope of this paper.
References
1. A.K. Sahai, et al., ARIMA modelling & forecasting of COVID-19 in top five affected countries.
Diab. Metab. Synd. Clin. Res. Rev. 14(5), 1419–1427 (2020)
2. Pénurie de masques: une responsabilité partagée par les gouvernements successifs [Mask shortage: a responsibility shared by successive governments]. Public Sénat (2020). https://www.publicsenat.fr/article/politique/penurie-de-masques-une-responsabilite-partagee-par-les-gouvernements-successifs. Accessed on June 20, 2021
3. D. Kopecki, B. Lovelace Jr., World Health Organization declares the coronavirus outbreak a global pandemic. CNBC (2020). https://www.cnbc.com/2020/03/11/who-declares-the-coronavirus-outbreak-a-global-pandemic.html. Accessed on June 25, 2021
4. Imperial College London, Report 9—Impact of Non-pharmaceutical Interventions
(Npis) to Reduce COVID-19 Mortality and Healthcare Demand. Imperial College
London. https://www.imperial.ac.uk/mrc-global-infectious-disease-analysis/covid-19/report-
9-impact-of-npis-on-covid-19/. Accessed on June 27, 2021
5. A.-Q. Mohammed, et al., Optimization method for forecasting confirmed cases of COVID-19 in China. J. Clin. Med. 9(3), 674 (2020)
6. N. Sharma, India's swiftness in dealing with Covid-19 will decide the world's future, says WHO. Quartz. https://qz.com/india/1824041/who-says-indias-action-on-coronavirus-critical-for-the-world/
7. D. Fanelli, F. Piazza, Analysis and forecast of COVID-19 spreading in China, Italy and France. Chaos, Solitons Fractals 134, 109761 (2020)
8. O.-D. Ilie, et al., Forecasting the spreading of COVID-19 across nine countries from Europe,
Asia, and the American continents using the ARIMA models. Microorganisms 8(8), 1158
(2020)
9. S.P. Stawicki, et al., The 2019–2020 novel coronavirus (severe acute respiratory syndrome coronavirus 2) pandemic: a joint American College of Academic International Medicine-World Academic Council of Emergency Medicine multidisciplinary COVID-19 working group consensus paper. J. Glob. Infect. Diseases 12(2), 47 (2020)
10. C. Anastassopoulou, et al. Data-based analysis, modelling and forecasting of the COVID-19
outbreak. PloS One 15(3), e0230405 (2020)
11. F. Islam, Coronavirus recession not yet a depression. BBC News, Mar. 2020. https://www.bbc.com/news/business-51984470. Accessed on June 20, 2021
12. Centers for Disease Control and Prevention, Covid-19, in Centers for Disease Control and Prevention (2021). https://www.cdc.gov/media/dpk/diseases-and-conditions/coronavirus/coronavirus-2020.html. Accessed on June 25, 2021
13. B.C. Archibald, A.B. Koehler, Normalisation of seasonal factors in winters’ methods. Int. J.
Forecast. 19(1), 143–148 (2003)
14. A. Tarsitano, I.L. Amerise, Short-term load forecasting using a two-stage sari-max model.
Energy 133, 108–114 (2017)
15. V. Stadnytskyi, et al., The airborne lifetime of small speech droplets and their potential
importance in SARS-CoV-2 transmission. Proc. Natl. Acad. Sci. 117(22), 11875–11877 (2020)
Performance Evaluation
of Electrogastrogram (EGG) Signal
Compression for Telemedicine Using
Various Wavelet Transform
Abstract This paper discusses the recording and compression analysis of the Electrogastrogram (EGG), a non-invasive recording that visually represents the electrical activity of the stomach and is used to diagnose stomach illnesses. The EGG signal's compression
is important in the diagnosis, prognosis, and survival analysis of all stomach-related
disorders, especially in telemedicine applications where the patient is geographically
isolated. Over the years, several signal compression algorithms have been presented.
High cost, signal degradation, and a low compression ratio are just a few drawbacks
that result in an inefficient signal at the receiver's end. The advantages of EGG compression in the digital domain for telemedicine applications are the effective utilization of storage, a reduced data transmission rate, and efficient use of transmission bandwidth.
width. Various wavelet transformations such as biorthogonal, coiflet, daubechies,
haar, reverse biorthogonal, and symlet wavelet transforms are applied to EGG signals
and examined using MATLAB software in this paper. The wavelet’s performance
was evaluated to select the best wavelet for telemedicine. This is accomplished by
a quantitative analysis of the recovery ratio, percent root mean square difference
(PRD), and compression ratio (CR) measurements. The findings of this study in
terms of determining the optimal signal compression performance can undoubtedly
become a valuable asset in the telemedicine area for the transmission of quantitative
biological signals.
M. Gokul (B)
Department of Biomedical Engineering, School of Bio and Chemical Engineering, Kalasalingam
Academy of Research and Education, Krishnankoil, TN, India
M. Sameera Fathimal
Department of Biomedical Engineering, SRM University, Chennai, Tamilnadu, India
S. Jothiraj
Department of Biomedical Engineering, Rajalakshmi Engineering College, Chennai, Tamilnadu,
India
P. Murugesan
Department of Biomedical Engineering, Faculty of Engineering, Vrije University, Brussel,
Belgium
1 Introduction
The EGG acquired with a surface electrode was amplified, filtered, and converted to a digital signal. The performance of the wavelet transform for data compression was analyzed using the MATLAB software.
Signal acquisition chain (block diagram): surface electrode (sensor) → instrumentation amplifier (amplifies 100–500 microvolts) → bandpass filter (0.02–0.20 Hz) → analog-to-digital converter → MATLAB analysis (at a PC/laptop).
Wavelets are ideal for processing biological signals. The wavelet transform outperforms Discrete Cosine Transform (DCT) compression because its non-uniform frequency spectral characteristics support multi-scale analysis and multiresolution properties, reduced distortion, and sophisticated compression strategies. Additionally, there are drawbacks to processing bio-signals with other transforms such as the Fourier Transform, where time information is completely lost [6]. There, the frequency axis is uniformly divided, and the resolution can be made accurate only by integrating along the whole time axis. In the Short Time Fourier Transform (STFT), information in the time domain is taken into consideration by the addition of a window; the frequency resolution depends on the time resolution and on the size of the window considered. Due to the uniform box-shaped window, it is not possible to enhance a particular frequency range in STFT. However, the wavelet can capture useful and effective details of a bio-signal through localized functions [7]. Efficient
time and frequency localization can be produced when the signal is portrayed as a time
function. Compared to Fourier transform, the wavelet method has greater flexibility
and enables choosing the characterization for specific biomedical applications. The
Discrete Wavelet Transform (DWT) is more potent as it yields adequate information
to evaluate and integrate a bio-signal. DWT is calculated utilizing the filterbanks in
which filters with different cutoff frequencies analyze the signal at different scales.
It is feasible to alter the resolution by upsampling and downsampling of a signal as
it passes through the high and low pass filters [6, 7].
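As an illustration of this filterbank view, the following sketch performs a multi-level DWT decomposition and reconstruction in Python with the PyWavelets package; the synthetic input and the db4 wavelet are illustrative choices, not the MATLAB workflow or recordings used in this paper.

```python
# A minimal DWT analysis/synthesis sketch using PyWavelets on a
# synthetic slow wave (not a recorded EGG).
import numpy as np
import pywt

t = np.linspace(0, 600, 6000)
signal = np.sin(2 * np.pi * 0.05 * t) + 0.1 * np.random.randn(t.size)

# Decompose into one approximation (low-pass) and four detail (high-pass)
# sub-signals; each level halves the analyzed frequency band.
coeffs = pywt.wavedec(signal, 'db4', level=4)

# Perfect reconstruction from the full coefficient set.
reconstructed = pywt.waverec(coeffs, 'db4')[:signal.size]
print(np.max(np.abs(signal - reconstructed)))  # numerically ~0
```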
Percentage root mean square difference (PRD) is a quantity that measures the distortion between the reconstructed and the original input signal. The PRD establishes a point-wise comparison with the original data, providing a measure of reconstruction fidelity [8, 9]. The mathematical formula for the PRD is shown in Eq. (1), where $x(n)$ is the original signal and $\hat{x}(n)$ the reconstructed signal:

$$\mathrm{PRD} = \sqrt{\frac{\sum_{n=1}^{N}\left[x(n) - \hat{x}(n)\right]^{2}}{\sum_{n=1}^{N} x(n)^{2}}} \times 100 \quad (1)$$
Compressed data must retain good fidelity even when the compression ratio is high [8]. The compression ratio (CR) is computed as in Eq. (2), expressed here as a percentage, consistent with the values reported in Table 1:

$$\mathrm{CR}(\%) = \left(1 - \frac{B_{\text{compressed}}}{B_{\text{original}}}\right) \times 100 \quad (2)$$

where $B_{\text{original}}$ is the bit rate of the original signal and $B_{\text{compressed}}$ is the bit rate of the compressed signal.
The recovery ratio parameter indicates how well the signal can be recovered at a given decomposition level: if the decomposition level is very high, the recovery ratio is low, and vice versa.
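A minimal sketch of these two metrics, assuming the percentage form of CR given in Eq. (2), is:

```python
# PRD (Eq. 1) and CR (Eq. 2) as small helper functions.
import numpy as np

def prd(original, reconstructed):
    """Percent root mean square difference between two signals."""
    num = np.sum((original - reconstructed) ** 2)
    den = np.sum(original ** 2)
    return float(np.sqrt(num / den) * 100)

def cr_percent(bits_original, bits_compressed):
    """Compression ratio expressed as the percentage of bits saved."""
    return (1 - bits_compressed / bits_original) * 100

x = np.sin(np.linspace(0, 10, 1000))
x_hat = x + 1e-8 * np.random.randn(x.size)   # near-perfect reconstruction
print(prd(x, x_hat))                          # very small PRD
print(cr_percent(16000, 234))                 # ~98.5 %
```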
Signal decomposition involves downsampling and filtering. The decomposed signal consists of a detail part (high frequency) and an approximation part (low frequency). The sub-signal generated by the lowpass filter has a maximum frequency equal to half the frequency of the signal, in agreement with the Nyquist principle [10]. The signal can therefore be perfectly reconstructed from only half of the originally stored and transmitted samples. Downsampling, when employed, eliminates every second sample. The lowpass-filtered approximation sub-signal is then passed through the filters again, and the process continues until the desired degree of decomposition is obtained [11].
The objective of signal compression is to minimize the number of bits in the data while storing the data with acceptable quality. Even though numerous compression techniques have been suggested in the literature, few aim at perfect reconstruction of the original data [12]. When an application needs only limited bitrates, methods that allow a supervised loss of information can be used. The loss is very small, and such a "lossy" method combines high compression ratios with acceptable visual quality. Accordingly, the compression ratio, percent root mean square difference, and recovery ratio are calculated [13]. In DWT, the reconstruction process consists of upsampling and filtering. Filter selection is essential for the reconstructed signal to be near-perfect [12, 14]. The downsampling of the biological signal performed during the decomposition phase produces an artifact or distortion called aliasing. This becomes a key factor when selecting decomposition and reconstruction filters that are closest (though not necessarily identical) so as to cancel the effects of aliasing [14].
Orthogonality characteristics and multiresolution analysis (MRA) should be used to construct the base wavelet. The orthogonal wavelet describes the information contained in an image and leads to the creation of a multiresolution analysis [15, 16]. The spline method is used to construct the base wavelet [17, 18].
All the wavelets used have orthogonality properties. Biorthogonal and Haar wavelets are symmetric, Coiflet and Symlet are near symmetric, and Daubechies wavelets are asymmetric. The input signal processed with the biorthogonal (left) and reverse biorthogonal (right) wavelets is shown in Fig. 4, with a clear view of the original EGG signal, compressed signal, reconstructed signal, and error signal. The Haar and Daubechies wavelet transforms (see Fig. 5) are applied to the EGG signal, which is then compressed and reconstructed, and the error signal is computed [19]. The input signal is also processed with the Coiflet and Symlet wavelet transforms (see Fig. 6) [20, 21].
Fig. 4 EGG signal processing using biorthogonal and reverse biorthogonal wavelet transform
Fig. 5 EGG signal processing using haar and daubechies wavelet transform
Fig. 6 EGG signal processing using coiflet and symlet wavelet transform
The results obtained from the different wavelet transforms on the various datasets are tabulated in Table 1. For good transmission, the PRD should be low; among the wavelets tested, the Haar wavelet has a very low PRD [8].
Table 1 Performance analysis with various wavelet transforms

Wavelets               CR (%)    PRD          Recovery ratio
Haar                   98.5371   6.1854e−15   67.8443
Symlet                 98.5371   8.3502e−13   67.8443
Daubechies             98.5271   1.8939e−11   67.7987
Coiflet                98.5540   2.6572e−08   67.4655
Biorthogonal           98.5492   1.4608e−12   67.6211
Reverse biorthogonal   98.5540   1.4701e−12   67.4655
Comparing the corresponding CR values, the Coiflet and reverse biorthogonal wavelets have a better CR than the other transforms. In terms of overall performance, the reverse biorthogonal wavelet performs well for compression, offering a better CR with a low PRD.
After finding the best wavelet for transmission, the most suitable order of the wavelet should be chosen to achieve the lowest reconstruction error; to find it, MRA is applied [22]. The best low-average-error order for EGG signal processing is 1.3 of the reverse biorthogonal wavelet: among the average errors calculated for all levels of the reverse biorthogonal wavelet, rbior 1.3 (reverse biorthogonal 1.3) has the lowest error rate.
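A minimal sketch of such a compression/reconstruction cycle with rbior 1.3, written in Python with PyWavelets on a synthetic signal (this paper's actual MATLAB workflow is not reproduced), is:

```python
# Lossy wavelet compression with the reverse biorthogonal 1.3 wavelet.
import numpy as np
import pywt

signal = np.cumsum(np.random.randn(4096))     # placeholder for an EGG trace

coeffs = pywt.wavedec(signal, 'rbio1.3', level=5)
arr, slices = pywt.coeffs_to_array(coeffs)

# Keep only the largest-magnitude coefficients (lossy compression).
threshold = np.percentile(np.abs(arr), 98.5)
arr[np.abs(arr) < threshold] = 0.0

reconstructed = pywt.waverec(
    pywt.array_to_coeffs(arr, slices, output_format='wavedec'),
    'rbio1.3')[:signal.size]

prd = np.sqrt(np.sum((signal - reconstructed) ** 2) / np.sum(signal ** 2)) * 100
print(f"PRD after zeroing 98.5% of coefficients: {prd:.4f}%")
```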
4 Conclusion
According to the principal objective of our paper, the best-performing wavelet for compression and the most suitable order for the lowest reconstruction error were found. The results obtained proved that the reverse biorthogonal wavelet is the most suitable for the Electrogastrogram in telemedicine. By incorporating this wavelet in the compression technique, faster and less expensive transmissions become possible on a daily basis. Future work on this research is to apply these wavelets to all kinds of biological signals and to add further analysis features, such as automatic diagnosis at the receiver end. These advancements could be very useful in the fields of gastroenterology and telemedicine when the subject is affected by a severe gastric illness like stomach cancer, ulcers, etc.
References
9. C.L. Tseng, C.C. Hsiao, I.C. Chou, C.J. Hsu, Y.J. Chang, R.G. Lee, Design and implementation
of ECG compression algorithm with controllable percent root-mean-square difference. Biomed.
Eng. Appl. Basis Commun. 19(04), 259–268 (2007)
10. J. Kevric, A. Subasi, Comparison of signal decomposition methods in classification of EEG
signals for motor-imagery BCI system. Biomed. Signal Process. Control 31, 398–406 (2017)
11. A. Cicone, J. Liu, H. Zhou, Adaptive local iterative filtering for signal decomposition and
instantaneous frequency analysis. Appl. Comput. Harmon. Anal. 41(2), 384–411 (2016)
12. M. Gokul, N. Durgadevi, B. Sukita, Medical product aspects of antenatal wellbeing belt—the
consolidated analysis on product design and specification. World J. Pharm. Res. 9 (2020)
13. S.O. Rajankar, S.N. Talbar, An electrocardiogram signal compression technique: a compre-
hensive review. Analog Integr. Circ. Sig. Process 98(1), 59–74 (2019)
14. C.A.E. Kothe, T.P. Jung, Artifact removal techniques with signal reconstruction. U.S. Patent
Application 14/895,440 (2016)
15. A. Sake, R. Tirumala, Bi-orthogonal wavelet transform based video watermarking using
optimization techniques. Mater. Today: Proc. 5(1), 1470–1477 (2018)
16. P.M.K. Prasad, D.Y.V. Prasad, G.S. Rao, Performance analysis of orthogonal and biorthogonal
wavelets for edge detection of X-ray images. Procedia Comput. Sci. 87, 116–121 (2016)
17. M. Sharma, A. Dhere, R.B. Pachori, U.R. Acharya, An automatic detection of focal EEG signals
using new class of time–frequency localized orthogonal wavelet filter banks. Knowl.-Based
Syst. 118, 217–227 (2017)
18. K. Mourad, B.R. Fethi, Efficient automatic detection of QRS complexes in ECG signal based on
reverse biorthogonal wavelet decomposition and nonlinear filtering. Measurement 94, 663–670
(2016)
19. D. Zhang, Wavelet transform, in Fundamentals of image data mining (Springer, Cham, 2019),
pp. 35–44
20. A. Zaeni, T. Kasnalestari, U. Khayam, Application of wavelet transformation symlet type and
coiflet type for partial discharge signals denoising, in 2018 5th International Conference on
Electric Vehicular Technology (ICEVT). IEEE (2018), pp. 78–82
21. P.P.S. Saputra, R. Firmansyah, D. Irawan, Various and multilevel of coiflet discrete wavelet
transform and quadratic discriminant analysis for classification misalignment on three phase
induction motor. J. Phys. Conf. Ser. 1367(1), 012049 (2019)
22. A. Gudigar, U. Raghavendra, T.R. San, E.J. Ciaccio, U.R. Acharya, Application of multireso-
lution analysis for automated detection of brain abnormality using MR images: a comparative
study. Futur. Gener. Comput. Syst. 90, 359–367 (2019)
The Impact of UV-C Treatment on Fruits
and Vegetables for Quality and Shelf Life
Improvement Using Internet of Things
Abstract The objective of this research is to develop a device that helps sanitize consumable goods and materials, making them free from bacteria and viruses. Materials bought from outside are exposed to UV-C rays (254 nm) for a duration of 5–15 min so that they can be used fearlessly without altering the quality of the goods. The device consists of UV-C lights installed inside a chamber delivering a calibrated UV-C dosage. The type of UV used for the purpose is UV-C, the third UV band, which is effective against microorganisms, including new viruses. The items to be disinfected are carried into the chamber on a specialized tray that provides complete penetration of UV-C light inside the tray, or by placing the items directly; the process begins with scanning and sanitization of the items, covering the whole surface area over 360°. In this study, short UV-C treatment of whole tomato and banana and of freshly sliced apple, guava, and grapes was chosen, and its efficiency was assessed over various exposure durations with multiple methods to check for degradation of nutrient levels. The sliced fruits treated with UV-C (254 nm) showed lower oxidation relative to the control, improving fruit quality by increasing shelf life. The results show that the comparative changes under 5–15 min of the 254 nm UV-C dose were important in retaining quality. Subsequent experiments with standardized High Performance Liquid Chromatography (HPLC), Bradford assay, and dinitrosalicylic acid assay methods were used to test changes in nutrient levels and to compare fruits treated for 5, 10, and 15 min. The test results showed that the device inactivates the microorganisms on the object's surface and proved that it does not deteriorate the nutrient levels.
1 Introduction
During the current COVID-19 pandemic breakout, we all know the precautions we take for ourselves and our surroundings. The safest practices are applying sanitizer to our hands and wearing a mask to keep us safe from the virus spreading in and around us. Another critical thing during this period is to safeguard against the materials brought in from outside, like groceries from marketplaces, shops, etc., since they may carry the virus or bacteria spreading all over the world like wildfire or a sandstorm. People stopped going out, even for basic necessities, and ran their lives in fear for many days or even months, from March till May 2020. The spread of COVID-19 all over the country made people fearful; a few people went into depression for fear of touching any objects around them. According to research, the virus can last up to 48 h on surfaces like masks, money, bills, mobiles, clothes, and packages, so an individual needs to be careful in contact with these materials. Sanitization of the materials [1] is required before they are used in daily activities at home, in the office, or outside.
The need for food for humans and animals is increasing drastically, and there is a scarcity of food. Yet food wastage has not stopped at some locations, events, and in the food industry [4]. Food wastage during processing should be taken care of well so that food can be served to a few more living beings. The processing of food aims to turn raw materials into value-added food for human and animal consumption, and it should ensure food safety for safe use. The preparation and processing of food also include drying, cooling, and salting, which guarantee the shelf life of perishable food products. Some microbes, such as yeast and bacteria, are used to increase the quality of food products by enhancing their taste, flavor, and texture; this accounts for an improvement in the food item's value. All these activities include automated processes that keep the system working smoothly to complete the food processing activities, using technologies such as IoT, machine learning [5], and image processing [6]. These technologies include identifying fruit and vegetable conditions such as ripened, rotten, or raw, the percentage of microorganisms on the surface, etc.
2 Literature Survey
Koutchma et al. [7] give a comparative analysis of 92 studies published between 2004 and 2015 on ultraviolet (UV) light and high-pressure processing (HPP). One of the limitations to their more extensive commercial application lies in the lack of comparative data on effects on nutritional and quality-related compounds in juice products. Minimal-processing nonthermal techniques such as UV light and HPP are expected to extend shelf life while retaining physicochemical, nutritional, and sensory characteristics with reduced microbial loads. Moreno et al. [8] used a short UV-C treatment for fresh-cut carambola. The experiment was carried out for control fruit and UV-C-exposed fruit, and the nutritional values were checked on the first day and after 21 days. They proved that UV-C exposure reduces the yeast and bacterial counts and controls spoilage. Prestorage UV-C exposure was highly effective in controlling fruit browning through polyphenol oxidase (PPO) inhibition and improved maintenance of tissue integrity.
Yang et al. [9] proposed a model used at hospitals to disinfect or sterilize a patient room before a new patient is admitted. They conducted a study of the effectiveness of a mobile, automated device, a hyperlight disinfection robot, which utilized UV-C to kill Pseudomonas aeruginosa, Acinetobacter baumannii, etc., and observed a significant reduction in the bacteria count after UV-C exposure in an uncleaned hospital room. UV-C exposure has also been reported as feasibly involved in the activities of cell-wall-degrading enzymes and hence in fruit and vegetable firmness. Weaver
et al. [10] provide details on the sterilization of surgical masks and N95 respirators, giving a procedure for sterilizing N95 respirators in a biosafety cabinet. Because of the worldwide shortage of masks during the coronavirus pandemic, and the vital role masks play in protecting individuals, the main intention behind that paper is to reuse masks after exposing them to UV-C rays for 15–20 min; this helps, to some extent, to avoid shortages of surgical and N95 masks. Raeiszadeh et al. [11] present a review of applications of UV-C light in various places to help sanitize frequently touched surfaces; UV-C lights were installed in places like shopping malls, airports, etc., to kill bacteria. Apart from the applications, the authors also discussed the safety considerations for UV-C light usage. Jiang et al. [12] give a survey of the utilization of UV light for the sterilization of fruits and vegetables; this study gives insight into how useful UV light is at killing bacteria or fungi and at improving the shelf life of fruits and vegetables. Multiple devices have been developed recently for the sterilization of groceries, fruits, vegetables, etc.; their common limitation is that exposure of the bottom surfaces is not taken care of, which is addressed in our device.
3 Methodology
The device is developed using the HC-05 Bluetooth module, a relay, an L298N motor driver, and UV lights, as shown in Fig. 2. The major part of the device lies in the UV-C light (254 nm), which is more effective in reducing microbial growth and cleaning surfaces than the commonly used chlorine, hydrogen peroxide, or ozone, which can leave residue and ultimately reduce quality. After exposure, the quality of the goods is not changed, as they retain their nutritional level. The device can be operated through the app or manually, based on the customer's requirements. The goods are kept inside the box, and the action can be controlled using the buttons ON/OFF, RESET, SET Time, and Water (ON/OFF) through a mobile phone or the external switches. Users have the option of selecting either a water-sprinkling mode or a UV-exposure mode. After cleaning is complete, the system automatically beeps to signal completion.
The device works on bulbs that produce the precise dosage of UV-C light required to destroy DNA and RNA viruses. It uses dual lights: one light tube is placed at the top and another at the bottom, 180° apart, which, together with aluminum foils [13] that reflect light in all directions, makes sure the vegetables and fruits are exposed over the full 360°. The device also has a safety door that prevents light rays from escaping the box. The device can be used at home or in industry, depending on the quantity of the materials; the food industry must clean huge quantities of vegetables and fruits to prepare packed foods like juices and jams, and food materials need to be thoroughly washed when received from the market or directly from the farm. Some research has focused on disinfecting surface areas through such methods [14].
Depending on the quantity, the device can be modified as shown in Fig. 3. This version consists of a conveyor belt attached to the UV-C light device, which keeps scrolling and pushing the vegetables/fruits through the device for continuous exposure to the UV-C light for a few minutes. The device can also incorporate a water sprinkler, depending on the type of vegetables being exposed, like potatoes, carrots, etc.
UV radiation is generally used for the treatment of drinking and waste water [15], air disinfection, and fruit and vegetable juice treatment. UV has for years been the choice within research facilities when procuring biological safety cabinets and can even be used within the laboratory. The 100–280 nm band of UV is identified as UV-C, with the peak of germicidal action at a wavelength of 265 nm; this germicidal action comes from absorption by the DNA and RNA of microorganisms. Sources of UV light include sunlight and electrically powered germicidal lamps designed to generate light at the required dosage within the 100–400 nm wavelength range. A light sample is shown in Fig. 4.
The plant materials used for testing under the device are tomato, guava, apple, grapes, and banana. Tomato, botanical name Solanum lycopersicum [17], is a vegetable, the edible berry of the tomato plant, rich in vitamin C, potassium, folate, and the simple sugars glucose and fructose. The plants are widely affected by the tobacco mosaic virus, curly top, Pyrenochaeta lycopersici, Didymella stem rot, early blight, Alternaria solani, etc. Banana, of the botanical genus Musa, is a fruit, the edible berry of the banana plant, rich in magnesium, vitamins C and B6, carbohydrates, and fiber. The fruit is widely affected by Panama disease, tropical race 4, etc. Guava, botanical name Psidium guajava [18], is rich in vitamin C, dietary fiber, folic acid, and smaller amounts of vitamin A and potassium. Grapes, of the botanical family Vitaceae [19], are fruits rich in vitamins C, K, and E and carbohydrates. Apple, botanical name Malus domestica [20], is an edible fruit rich in carbohydrates and fiber.
Figure 5 shows the circuit diagram of the device, consisting of an Arduino, Bluetooth, relay, step-down transformer, UV sensor, fan, fuse, UV light, and connections. The device can be operated using an app designed for it; to communicate with the device, the app is paired through Bluetooth. Figure 6 shows the working flow of the device: once the connection between the device and the app succeeds, the user can start the device. The device will not turn on unless the door is closed. Once the door is closed, the user sets the timer (e.g., 5 min) and starts the device. If the door is opened in between, the timer pauses and the UV light is turned off; later, when the door is closed, the timer resumes.
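The door/timer logic can be summarized by the following illustrative sketch; the device itself is Arduino-based, so the Python function and its callback names (door_closed, uv_on, uv_off, beep) are hypothetical stand-ins for the actual firmware routines.

```python
# Illustrative control flow for Fig. 6: the lamp runs only while the
# door is shut, and opening the door pauses the countdown.
import time

def run_cycle(duration_s, door_closed, uv_on, uv_off, beep):
    remaining = duration_s
    while remaining > 0:
        if door_closed():
            uv_on()                # exposure continues with the door shut
            time.sleep(1)
            remaining -= 1
        else:
            uv_off()               # opening the door pauses the timer
            time.sleep(0.1)
    uv_off()
    beep()                         # signal completion to the user
```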
Once food products such as fruits and vegetables are harvested from the farm, they need to be thoroughly cleaned before being sent to the processing industry, or the industry should take care of cleaning before they can be used in packed food. The products need to be stored in proper storage units for preservation for future processing, and the quality of food may decrease once it goes into cold storage units. Finding the deterioration level [21] of fruits and vegetables can be done using systematic visual image analysis, with the damage level rated from 1 to 5 or from 10 to 100%, where 1 is poor quality and 5 is best quality, and 10 indicates the rotten stage while 100% indicates freshness. The deterioration level of fruits and vegetables or any food products can be calculated using Eq. (1).
$$\text{Deterioration level} = \frac{\sum_{i=1}^{n} r_i s_i}{\text{Total Samples}} = \frac{r_1 s_1 + r_2 s_2 + r_3 s_3 + \cdots + r_n s_n}{n} \quad (1)$$

where $r_i$ is the rating scale and $s_i$ the number of samples with that rating.
Case 1: If 10 samples of fruit are considered for quality assessment, and among them 2 fruits are rated 3 and 8 fruits are rated 5, then the deterioration of the 10 samples is DI = (3 × 2 + 5 × 8)/10 = 4.6, a weighted rating of 4.6. The conclusion is a rating of about 4.6 for the 10 samples, so they are fresh to use.
Case 2: If 100 samples of fruit are considered for quality assessment, and among them 16 fruits are rated 1, 56 fruits are rated 2, 23 fruits are rated 3, and 5 fruits are rated 4, then the deterioration of the 100 samples is DI = (1 × 16 + 2 × 56 + 3 × 23 + 4 × 5)/100, a weighted average of 2.17. The conclusion is that samples with a 2.17 rating are not consumable.
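A minimal sketch of Eq. (1), reproducing Case 2, is:

```python
# Deterioration level (Eq. 1): weighted average rating over all samples.
def deterioration_level(ratings_and_counts):
    """ratings_and_counts: list of (rating r_i, sample count s_i) pairs."""
    total = sum(count for _, count in ratings_and_counts)
    weighted = sum(rating * count for rating, count in ratings_and_counts)
    return weighted / total

# Case 2: 16 fruits rated 1, 56 rated 2, 23 rated 3, 5 rated 4.
print(deterioration_level([(1, 16), (2, 56), (3, 23), (4, 5)]))  # 2.17
```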
The device, UV-C Based Sterilization for Vegetables and Fruits, is shown in Fig. 7. The objective is to disinfect the vegetables and fruits from bacteria and viruses by exposing the materials to UV light (254 nm, UV-C). Three samples were taken: Sample 1, untreated; Sample 2, exposed to UV for 10 min; Sample 3, exposed for varying durations. After exposure, the samples were tested for carbohydrate and protein concentrations.
A sample consists of one vegetable and one fruit. The testing results are shown in Figs. 8 and 9. Figure 8 shows the carbohydrate content determined using the dinitrosalicylic acid assay [22], a standard spectrophotometric biochemical assay. The values indicate a higher quantity of carbohydrates with increasing treatment time, indicating possible hydrolysis. Figure 9 shows the protein content determined using the Bradford assay [23], a standard spectrophotometric biochemical assay. Banana: the values indicate no presence of free proteins, possibly due to the low protein content of the banana fruit. In tomatoes, there is an increase in the concentration of proteins with increasing treatment time, possibly due to increased protein release. High-performance liquid chromatography (HPLC) [24], also named high-pressure liquid chromatography, is a method used to separate, identify, and quantify each component in a mixture.
The HPLC method is used for many applications, such as during the pharmaceutical product manufacturing process and the separation of complex biological samples. The analysis was carried out on an instrument equipped with a binary pump (LC-20 AD), a variable-wavelength UV–VIS detector (SPD-M20A), a Shimadzu C18 column (250 × 4.6 mm, 5 µ), and a manual injection valve fitted with a 20 µl sample loop. The instrument was controlled by LC Solution software. Apple, grape, and guava samples were exposed in a UV incubator chamber for 5 and 10 min and then macerated in water for malic acid, citric acid, and vitamin C analysis.
Vitamin C was analyzed using a Shimadzu C18 column with a mobile phase of acetonitrile and 10 mM potassium di-hydrogen orthophosphate buffer mixed in a ratio of 40:60 (pH = 2.1), at a wavelength of 268 nm with a UV detector. The flow of the mobile phase was maintained at a speed of 1 ml/min. The vitamin C content remained unchanged in all three fruits after 5 and 10 min of exposure in the chamber. The maximum vitamin C was observed in guava (4.0 mg/gm FW), followed by grape and apple.

Fig. 8 Carbohydrate concentration results using the dinitrosalicylic acid assay
Citric acid and malic acid were isocratically separated using a Shimadzu C18 column at a 1 ml/min flow rate. The mobile phase used was 2% (w/v) ammonium di-hydrogen orthophosphate (NH4H2PO4) buffer (pH 2.18). A UV–VIS detector at a wavelength of 214 nm was used for detection. Both citric acid and malic acid remained unaffected by the exposure in the UV chamber for 5 and 10 min. The maximum citric acid was observed in guava (2.31 mg/gm FW), followed by grape and apple. A similar trend was observed for malic acid as well. The variation results are shown in Fig. 10, and sample images during testing are shown in Fig. 11.
Applications: The UV-based sterilization system/device can be used in homes, the food industry, hostel messes, hotels, restaurants, etc. The device assures cleanliness and microbial-free use for individuals and for processing vegetables and fruits for packaged goods, juices, jams, pickles, snacks, instant foods, etc. The device can be enhanced with deep learning and high-end IoT devices, along with the cloud, to perform more analysis and to recognize the quality of the samples used in industry.
Fig. 10 Variations of citric acid, malic acid, vitamin C in guava, grapes, and apple when exposed
to untreated, 5 and 10 min
5 Conclusion
During this pandemic, washing fruits and vegetables will not guarantee the destruction of bacteria. So, we developed a model that consists of UV lights at the top and bottom of the device. The two lights help expose the items from above and below, and aluminum foils help reflect light across all 360°. After exposure, the device destroys the microorganisms on the surface and preserves shelf life for a further few days compared to cold storage. The device can be used as per the load, and device exposure can be followed by cold storage to preserve the items a few more days. After exposure, experiments were conducted using microbiological and HPLC methods on different samples of tomato, banana, guava, grapes, and apple. The results were evaluated to show the changes in the concentrations of carbohydrates, protein, malic acid, citric acid, and vitamins. These methods proved that after exposure to UV light, the quality of the items does not change, and the shelf life is extended.
Acknowledgements The authors express their sincere gratitude to our Honorable Chancellor Dr.
P. Shyama Raju Sir, Our Director Dr. S. Senthil Sir, Dr. Rajeev Ranjan Sir, School of Computer
Science and Applications, and Dr. Shilpa BR, Dr. Jayashree, School of Applied Science and REVA
family for giving constant encouragement, support to carry out research at REVA University. The
implementation work was carried out along with BCA Students Mr. Yashwanth, Mr. Vinay, Mr.
Uthej, and Mr. Darshan from the School of Computer science and Applications.
References
1. M.H. Khan, H. Yadav, Sanitization during and after COVID-19 pandemic: a short review. Trans.
Indian Natl. Acad. Eng. 5, 617–627 (2020). https://doi.org/10.1007/s41403-020-00177-9
2. S.Z. Li, A. Jain (eds.), Electromagnetic spectrum, in Encyclopedia of Biometrics (Springer, Boston, 2009). https://doi.org/10.1007/978-0-387-73003-5_504
3. https://uvceco.com/why-uv-c-cannot-produce-ozone/
4. Y. Gu, W. Han, L. Zheng, B. Jin, Using IoT technologies to resolve the food safety problem—
an analysis based on Chinese food standards, in Web Information Systems and Mining. WISM
2012, ed. by F.L. Wang, J. Lei, Z. Gong, X. Luo, Lecture Notes in Computer Science, vol. 7529
(Springer, Berlin, 2012). https://doi.org/10.1007/978-3-642-33469-6_50
5. S.K. Behera, A.K. Rath, A. Mahapatra et al., Identification, classification & grading of fruits
using machine learning & computer intelligence: a review. J. Ambient. Intell. Human Comput.
(2020). https://doi.org/10.1007/s12652-020-01865-8
6. D. Yogesh, A.K. Dubey, R. Ratan, et al., Computer vision based analysis and detection of
defects in fruits causes due to nutrients deficiency. Cluster Comput. 23, 1817–1826 (2020).
https://doi.org/10.1007/s10586-019-03029-6
7. T. Koutchma, V. Popović, V. Ros-Polski, A. Popielarz, Effects of ultraviolet light and high-
pressure processing on quality and health-related constituents of fresh juice products. Compr.
Rev. Food Sci. Food Saf. 15, 844–867 (2016). https://doi.org/10.1111/1541-4337.12214
8. C. Moreno, M.J. Andrade-Cuvi, M.J. Zaro, M. Darre, A.R. Vicente, A. Concellón, Short UV-C
treatment prevents browning and extends the shelf-life of fresh-cut carambola. J. Food Qual.
Article ID 2548791, 9 (2017). https://doi.org/10.1155/2017/2548791
9. H.-H. Yang, U.-I. Wu, H.-M. Tai, W.-H. Sheng, Effectiveness of an ultraviolet-C disinfection
system for reduction of healthcare-associated pathogens. J. Microbiol. Immunol. Infect. 52(3),
487–493 (2019). ISSN 1684-1182. https://doi.org/10.1016/j.jmii.2017.08.017
10. D.T. Weaver, B.D. McElvany, V. Gopalakrishnan, K.J. Card, D. Crozier, A. Dhawan, J.G. Scott,
UV decontamination of personal protective equipment with idle laboratory biosafety cabinets
during the COVID-19 pandemic. Plos One 16(7), e0241734 (2021)
11. M. Raeiszadeh, B. Adeli, A critical review on ultraviolet disinfection systems against COVID-
19 outbreak: applicability, validation, and safety considerations. ACS Photonics 7(11), 2941–
2951 (2020)
12. Q. Jiang, M. Zhang, B. Xu, Application of ultrasonic technology in postharvested fruits and
vegetables storage: a review. Ultrason. Sonochem. 105261 (2020)
13. E.V. Grabovski, P.V. Sasorov, A.P. Shevelko et al., Radiative heating of thin Al foils by intense
extreme ultraviolet radiation. Jetp Lett. 103, 350–356 (2016). https://doi.org/10.1134/S00213
64016050040
14. S. Bredholt, J. Maukonen, K. Kujanpää et al., Microbial methods for assessment of cleaning
and disinfection of food-processing surfaces cleaned in a low-pressure system. Eur. Food Res.
Technol. 209, 145–152 (1999). https://doi.org/10.1007/s002170050474
15. J.P. Chen, L. Yang, L.K. Wang, B. Zhang, Ultraviolet radiation for disinfection, in Advanced
Physicochemical Treatment Processes, ed. by L.K. Wang, Y.T. Hung, N.K. Shammas. Hand-
book of Environmental Engineering, vol. 4 (Humana Press, 2006). https://doi.org/10.1007/978-
1-59745-029-4_10
16. S. Wang, Q. Liu, S. Chen, Y. Xue, Design and application of distance measure ultrasonic sensor,
in Advances in Mechanical and Electronic Engineering, ed. by D. Jin, S. Lin. Lecture Notes in
Electrical Engineering, vol. 178 (Springer, Berlin, 2013). https://doi.org/10.1007/978-3-642-
31528-2_18
17. M. Dorais, D.L. Ehret, A.P. Papadopoulos, Tomato (Solanum lycopersicum) health compo-
nents: from the seed to the consumer. Phytochem. Rev. 7, 231 (2008). https://doi.org/10.1007/
s11101-007-9085-x
18. A. Vijaya Anand, S. Velayuthaprabhu, R.L. Rengarajan, P. Sampathkumar, R. Radhakrishnan,
Bioactive compounds of Guava (Psidium guajava L.), in Bioactive Compounds in Underutilized
Fruits and Nuts, ed. by H. Murthy, V. Bapat. Reference Series in Phytochemistry (Springer,
Cham, 2020). https://doi.org/10.1007/978-3-030-30182-8_37
19. J. Wen, Vitaceae, in Flowering Plants · Eudicots. The Families and Genera of Vascular Plants,
vol. 9, ed. by K. Kubitzki (Springer, Berlin, 2007). https://doi.org/10.1007/978-3-540-32219-
1_54
20. E. Szücs, T. Kállay, Determination of fruiting capacity of apple trees (Malus domestica) by
DRIS, in Plant Nutrition—Physiology and Applications. Developments in Plant and Soil
Sciences, vol. 41, ed. by M.L. van Beusichem (Springer, Dordrecht, 1990). https://doi.org/
10.1007/978-94-009-0585-6_120
21. P. Li, X. Yu, B. Xu, Effects of UV-C light exposure and refrigeration on phenolic and antioxidant
profiles of subtropical fruits (Litchi, Longan, and Rambutan) in different fruit forms. J. Food
Qual. 2017, 12 (2017), Article ID 8785121. https://doi.org/10.1155/2017/8785121
22. M.J. Bailey, A note on the use of dinitrosalicylic acid for determining the products of enzymatic
reactions. Appl. Microbiol. Biotechnol. 29, 494–496 (1988). https://doi.org/10.1007/BF0026
9074
23. C.G. Jones, J. Daniel Hare, S.J. Compton, Measuring plant protein with the Bradford assay. J.
Chem. Ecol. 15, 979–992 (1989). https://doi.org/10.1007/BF01015193
24. V.R. Meyer, High-performance liquid chromatography (HPLC), in Practical Methods in
Cardiovascular Research, ed. by S. Dhein, F.W. Mohr, M. Delmar (Springer, Berlin, 2005).
https://doi.org/10.1007/3-540-26574-0_35
Modeling and Forecasting Stock Closing
Prices with Hybrid Functional Link
Artificial Neural Network
S. Das (B)
Department of Computer Science and Engineering, KL University, Hyderabad, India
e-mail: subhranginee.das@klh.edu.in
S. C. Nayak
Department of Computer Science and Engineering, CMR College of Engineering and Technology,
Hyderabad, India
B. Sahoo
School of Computer Engineering, KIIT University, Bhubaneswar, India
e-mail: bsahoofcs@kiit.ac.in
1 Introduction
The economy of a country has a direct link to the stock market. Stock prices change arbitrarily depending upon international law, the global market scenario, gold prices, petrol prices, exchange rates, various socio-economic and political factors, etc. [1, 2]. Due to such random movement, the trend follows a nonlinear curve, and predicting a point on such a highly nonlinear curve is a tough job. Even a nominal variation in stock prices can have an impact on the international economy. Consequently, an efficient prediction mechanism for the stock market is desired. Based on the linear association of past and current data, several statistical models were recommended in the early days to forecast financial data. However, these methods were not found to forecast stock series efficiently. Therefore, advances in intelligence techniques such as artificial neural networks (ANNs) have been considered a better substitute for statistical methods, and ANNs have been used successfully in the literature for stock price forecasting [3–6]. ANNs need many parameter selections, such as the number of hidden layers and the number of nodes in each hidden layer. A higher-order neural network, having fewer parameter requirements along with impressive computational, storage, and learning capabilities, can overcome these demerits of the ANN. This paper uses one such higher-order neural network, the FLANN, for better prediction efficiency.
Finding the optimal weights and biases of the FLANN structure is a crucial aspect that otherwise needs human skill. Usually, gradient-based methods are used to achieve this, but they come with a few drawbacks, such as slow convergence and getting trapped in local optima. Later, many evolutionary optimization techniques, inspired by natural phenomena, came forward and have been used as a better substitute for gradient-based methods [7]. Evolutionary learning methods such as the genetic algorithm (GA), particle swarm optimization (PSO), and differential evolution (DE) are more proficient at searching for the optimal FLANN parameters. Since no single technique has been found suitable for solving all sorts of problems, researchers have continuously improved existing methods, through enhancements of an algorithm [8, 9] or hybridization [10–15]. Recently, AEFA has been proposed as an optimization method inspired by the principle of electrostatic force [16]. AEFA is based on a robust theoretical concept of charged particles, the electric field, and the attraction/repulsion force between two charged particles in an electric field. The learning capacity, convergence rate, and acceleration updates of AEFA were established in [16] by solving some benchmark optimization problems.
This work is an initiative toward investigating the potential of AEFA for fine-tuning the parameters of a FLANN, thus designing a hybrid model called AEFA + FLANN. The proposed AEFA + FLANN is assessed by forecasting the stock prices of DJIA, NASDAQ, BSE, and HSI. The data pre-processing, input selection, and model design steps are also explained.
2 FLANN
For easier naming, we use $z_j(t)$ to denote the expanded input at the $t$th iteration, where $1 \le j \le J$ and $J = k \times I$.
These nonlinear outcomes of the input layer are multiplied with the corresponding
randomly initialized weight values chosen from the range [−1,1], and the summed
result is then passed through the activation function to produce output. Here, we have
taken one neuron in the output layer. The computed output from the output layer is
compared with the desired output to compute error. The resulting error for the given
Here, y(t) is the target output and y (t) is the computed output for an input pattern
J
y (t) = z j (t).w j (t) (3)
j=0
where $z_j(t)$ is the functionally expanded input of node $j$ at the $t$th iteration, $z_0(t) = 1$, $w_j(t)$ is the weight assigned from node $j$ to the output node at the $t$th iteration, and $w_0(t)$ is the bias, initialized with a random value from the range $[-0.5, 0.5]$. The computed error from Eq. (4) is used to calculate the change in weight $\Delta w_j^d(t)$, where $\mu$ is the convergence coefficient and $\Delta w_j^d(t)$ is the change in the weight value at the $t$th iteration for input pattern $d$. If there are $p$ patterns in total in the training set, then the average change in each weight is given by Eq. (5):
$\Delta w_j(t) = \frac{1}{p} \sum_{d=1}^{p} \Delta w_j^d(t)$  (5)
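As an illustration of the expansion and output computation above, the following minimal Python sketch assumes the commonly used trigonometric basis for the FLANN expansion; the function names and the choice of sigmoid activation are illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def functional_expansion(x, k=5):
    """Expand each of the I inputs into k features using a trigonometric
    basis (an assumption; the paper does not fix the basis), so J = k * I."""
    feats = []
    for xi in x:
        feats.extend([xi,
                      np.sin(np.pi * xi), np.cos(np.pi * xi),
                      np.sin(2 * np.pi * xi), np.cos(2 * np.pi * xi)][:k])
    return np.array(feats)

def flann_output(x, w, b):
    """Weighted sum of the expanded inputs plus bias, passed through a
    sigmoid activation (one output neuron, as in the paper)."""
    z = functional_expansion(x)
    s = b + np.dot(w, z)
    return 1.0 / (1.0 + np.exp(-s))

# Example: I = 3 inputs, k = 5 expansions -> J = 15 expanded features
rng = np.random.default_rng(0)
x = rng.random(3)
w = rng.uniform(-1, 1, size=15)   # weights initialized in [-1, 1]
b = rng.uniform(-0.5, 0.5)        # bias initialized in [-0.5, 0.5]
print(flann_output(x, w, b))
```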
In AEFA, we simulate a potential solution of the FLANN as a charged particle and its fitness function as the quantity of charge associated with that particle. The velocity and position of a particle at time instant $t$ are updated as per Eqs. (7) and (8), respectively.
The following steps present the AEFA + FLANN-based forecasting method, and the overall AEFA + FLANN steps are shown in Fig. 2.
For AEFA + FLANN, the collected data are divided into two datasets: training data and test data. The training dataset is used to train the FLANN network. Once the network is trained and the network parameters are computed, the test dataset is used to assess the model's accuracy. For both datasets, the rolling window method is used to select the inputs. The input data are normalized using sigmoid normalization. Then, from the normalized data, $I$ data points are selected randomly. These $I$ inputs pass through the expansion functions of the given FLANN and get expanded: each input feature is expanded into $k$ features, so a total of $I \times k$ inputs are created in each step. These expanded inputs are assigned random weights and biases. The weighted sum then goes through the activation function to produce the output. The computed output is denormalized and compared with the actual output to calculate the error. The generated error is used to determine the fitness value. From the fitness values, the best and worst fits are selected; then, using the AEFA steps, the best-fit weights are calculated. Once the best possible weights are computed, the FLANN model is built using these weights to test new patterns. A sketch of this fitness evaluation appears below.
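The fitness evaluation used inside the AEFA search can be sketched as follows; the mean-absolute-error fitness and the `predict` callable are assumptions for illustration, since the paper does not spell out the exact fitness formula.

```python
import numpy as np

def fitness(candidate, windows, targets, predict):
    """Fitness of one AEFA particle (a candidate FLANN weight/bias vector):
    the mean absolute forecast error over all training patterns.
    `predict(x, candidate)` is any FLANN forward pass, e.g. the sketch above."""
    errors = [abs(y - predict(x, candidate)) for x, y in zip(windows, targets)]
    return float(np.mean(errors))

# AEFA treats low-error candidates as highly charged particles and
# attracts the population toward them; the best candidate's weights
# are finally used to build the forecasting FLANN.
```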
We conducted different experiments on four real stock price series, namely NASDAQ, DJIA, BSE, and HSI, to measure the predictability of the proposed approach and the comparative methods. The actual daily stock prices for one financial year are collected from https://finance.yahoo.com/quote/history. There are approximately 252 data points in each series. From a series, inputs are chosen through the rolling window method. The raw data go through a normalization process using the sigmoid method and are then fed to the models separately [17] (see the sketch below). The FLANN is trained through AEFA-based learning and approximates an output. The estimated output is denormalized and compared with the observed value. The variation from the actual output is measured as the error caused by the model. Six comparative models, namely gradient descent FLANN (GD-FLANN), genetic algorithm-based FLANN (GA-FLANN), differential evolution-based FLANN (DE-FLANN), PSO-FLANN, the multilayer perceptron (MLP), and the autoregressive integrated moving average (ARIMA), are developed similarly and used for fair comparison.
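A minimal sketch of the rolling-window input selection and sigmoid normalization step follows; the exact sigmoid variant (here a mean/std-scaled logistic) is an assumption, since the paper only names the method.

```python
import numpy as np

def rolling_windows(series, window=5):
    """Slide a window over the price series: each window of consecutive
    prices is an input pattern, and the next price is the target."""
    X = [series[i:i + window] for i in range(len(series) - window)]
    y = series[window:]
    return np.array(X), np.array(y)

def sigmoid_normalize(x, mu, sigma):
    """Squash values into (0, 1); mu/sigma come from the training data."""
    return 1.0 / (1.0 + np.exp(-(x - mu) / sigma))

def sigmoid_denormalize(v, mu, sigma):
    """Inverse mapping, used before comparing forecasts with actual prices."""
    return mu + sigma * np.log(v / (1.0 - v))

prices = np.cumsum(np.random.default_rng(1).normal(0, 1, 252)) + 100
X, y = rolling_windows(prices)
mu, sigma = prices.mean(), prices.std()
Xn, yn = sigmoid_normalize(X, mu, sigma), sigmoid_normalize(y, mu, sigma)
```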
To account for the stochastic nature of the neural network-based models, we simulated each model twenty times, and the mean error over the twenty experiments is summarized in Table 1. The best average errors are shown in boldface. For all datasets, AEFA + FLANN produced the best average errors. Although a few ties are found with GA-FLANN and PSO-FLANN, AEFA + FLANN achieved the lowest average error for all datasets. The AEFA + FLANN estimated prices are plotted against the actual prices in Figs. 3, 4, 5 and 6.
Table 1 Error statistics from all forecasts
Closing price series  Error statistic  MLP  ARIMA  GD-FLANN  GA-FLANN  DE-FLANN  PSO-FLANN  AEFA + FLANN
DJIA Average 0.84293 1.00372 0.01568 0.00773 0.00977 0.00862 0.00773
Std 0.02018 0.04728 0.00875 0.00888 0.00835 0.00875 0.00299
NASDAQ Average 0.97930 0.99529 0.08580 0.04522 0.04407 0.04377 0.02165
Std 0.04283 0.05227 0.03269 0.05364 0.03465 0.00560 0.00935
BSE Average 0.86115 0.99274 0.03805 0.00835 0.02377 0.00835 0.00835
Std 0.03228 0.07274 0.04283 0.02623 0.02113 0.01757 0.02113
This article presented a hybrid prediction model called AEFA + FLANN for modeling stock market price movements. AEFA is used to determine the optimal parameters of the FLANN, thus constructing the forecast evolutionarily. The resulting hybrid forecast is applied to predict the future closing prices of four real stock data series. The model inputs are extracted from the original data series using the rolling window method. The model output is finally denormalized to get the predicted price. Six comparative forecasts are developed. The model performances are measured in terms of average error and standard deviation. From exhaustive simulation studies, it is observed that the AEFA + FLANN model is more efficient at capturing the hidden patterns in the stock prices than the others and generated the lowest error signals. This model can be used for other time series data prediction. Therefore, the present study can be extended with improvised versions of AEFA and by adopting other neural models.
References
1. C.J. Huang, D.X. Yang, Y.T. Chuang, Application of wrapper approach and composite classifier
to the stock trend prediction. Expert Syst. Appl. 34(4), 2870–2878 (2008)
2. H.-C. Liu, Y.-H. Lee, M.-C. Lee, Forecasting China stock markets volatility via GARCH models under skewed-GED distribution. J. Money Invest. Bank. 5–14 (2009)
3. S. Soni, Applications of ANNs in stock market prediction: a survey. Int. J. Comput. Sci. Eng.
Technol. 2(3), 71–83 (2011)
4. A. Rao, S. Hule, H. Shaikh, E. Nirwan, P.M. Daflapurkar, Survey: stock market prediction
using statistical computational methodologies and artificial neural networks. Int. Res. J. Eng.
Technol. 08, 2395–2456 (2015)
5. V. Rajput, S. Bobde, Stock market forecasting techniques: literature survey. Int. J. Comput.
Sci. Mob. Comput. 5(6), 500–506 (2016)
6. A. Sharma, D. Bhuriya, U. Singh, Survey of stock market prediction using machine learning
approach, in 2017 International conference of electronics, communication and aerospace
technology (ICECA), Vol. 2, pp. 506–509. IEEE (2017, April)
7. P.B. Rana, J.L. Patel, D.I. Lalwani, Parametric optimization of turning process using evolu-
tionary optimization techniques—a review (2000–2016). Soft Comput. Probl. Solv. 165–180
(2019)
8. N. Shadbolt, From the Editor in Chief: Nature-inspired computing. IEEE Intell. Syst. 19(01),
2–3 (2004)
9. K. Opara, J. Arabas, Comparison of mutation strategies in differential evolution—a proba-
bilistic perspective. Swarm Evol. Comput. 39, 53–69 (2018)
10. S. Jiang, Y. Wang, Z. Ji, Convergence analysis and performance of an improved gravitational
search algorithm. Appl. Soft Comput. 24, 363–384 (2014)
11. S.C. Nayak, B.B. Misra, A Chemical Reaction Optimization based Neuro-Fuzzy hybrid Network
for Stock Closing Prices Prediction, Financial Innovation (Springer, Berlin, 2019)
12. S.C. Nayak, M.D. Ansari, COA-HONN: cooperative optimization algorithm based higher
order neural networks for stock forecasting, in Recent Advances in Computer Science and
Communications (Bentham Science, 2019)
13. S. C. Nayak, S. Das, Md. Ansari, TLBO-FLN: teaching learning-based optimization of func-
tional link neural networks for stock closing price prediction. Int. J. Sens. Wirel. Commun.
Control Bentham Sci. (2019)
14. S. Das, S.C. Nayak, B. Sahoo, Towards crafting optimal functional link artificial neural
networks with RAO algorithms for stock closing prices prediction. Comput. Econ. 1–23 (2021)
15. S.C. Nayak, A fireworks algorithm based Pi-Sigma neural network (FWA-PSNN) for modelling
and forecasting chaotic crude oil price time series. EAI Trans. Energy Web (2020)
16. A. Yadav, AEFA: artificial electric field algorithm for global optimization. Swarm Evol.
Comput. 48, 93–108 (2019)
17. S.C. Nayak, B.B. Misra, H.S. Behera, ACFLN: artificial chemical functional link network for
prediction of stock market index. Evol. Syst. 10(4), 567–592 (2019)
Whale Optimization Algorithm Based
Optimal Power Flow to Reduce
Generation Cost
1 Introduction
Electrical power utilization is increasing day by day, and utilities are also looking for economical operation by reducing the generation cost. In recent years, one of the predominant tools applied to realize the optimal planning process of a practical system is optimal power flow (OPF). The function of OPF is highly significant in modern power system operation and control. The OPF problem optimizes the regulated variables by considering the minimization of fuel cost to reduce the generation cost.
T. Papi Naidu
Lendi Institute of Engineering and Technology, Vijayanagaram, AP 535005, India
Annamalai University, Chidambaram, Tamil Nadu, India
B. Venkateswararao (B)
V. R. Siddhartha Engineering College, Vijayawada, AP 520007, India
G. Balasubramanian
Government College of Engineering, Tirunelveli, TN 627001, India
In this paper, the authors present the well-known whale optimization algorithm (WOA) and apply it to the IEEE 30-bus system. With this technique, improved convergence characteristics are obtained, and the generation cost is reduced.
where $P_{Gi}$ is the active power of the generator at the $i$th bus, $V_{Gi}$ is the voltage magnitude at the $i$th PV bus, $T_j$ is the tap setting of the transformer on the $j$th branch, and $Q_{Ck}$ is the shunt capacitor at the $k$th bus. The dependent variables are presented in Eq. (3).
The goal is to reduce the cost. The total fuel cost function ($F_1$) for a number of thermal generating units can be illustrated by Eq. (4):
$F_1 = \sum_{i=1}^{N_{TG}} \left( \alpha_i + \beta_i P_{TGi} + \gamma_i P_{TGi}^2 \right)\ \$/\mathrm{hr}$  (4)
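For illustration, Eq. (4) can be evaluated as in the short Python sketch below; the coefficient values are hypothetical placeholders, not IEEE 30-bus data.

```python
def total_fuel_cost(alpha, beta, gamma, p_tg):
    """Quadratic fuel cost F1 = sum(a_i + b_i*P_i + g_i*P_i^2) in $/hr (Eq. 4)."""
    return sum(a + b * p + g * p**2
               for a, b, g, p in zip(alpha, beta, gamma, p_tg))

# Hypothetical two-generator example (coefficients are illustrative only)
print(total_fuel_cost(alpha=[0.0, 0.0],
                      beta=[2.0, 1.75],
                      gamma=[0.00375, 0.0175],
                      p_tg=[150.0, 50.0]))
```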
2.2 Constraints
$Q_{Gi} - Q_{Di} - Q_l = 0$  (6)

Inequality constraints.
The OPF inequality restrictions are as follows:
(a) Generator restrictions, given in Eqs. (7)–(9):

$V_{Gi}^{\min} \le V_{Gi} \le V_{Gi}^{\max};\quad i \in N_g$  (7)

$P_{Gi}^{\min} \le P_{Gi} \le P_{Gi}^{\max};\quad i \in N_g$  (8)

$Q_{Gi}^{\min} \le Q_{Gi} \le Q_{Gi}^{\max};\quad i \in N_g$  (9)

$Q_{ci}^{\min} \le Q_{ci} \le Q_{ci}^{\max};\quad i \in N_c$  (11)
$\vec{B} = 2 \cdot \vec{a} \cdot \vec{r}_1 - \vec{a}$  (14)

$\vec{D} = 2 \cdot \vec{r}_2$  (15)

In this stage, the whale identifies its prey with Eqs. (16) and (17):

$\vec{H} = \left| \vec{D} \times \vec{Z}_p(\mathrm{iter}) - \vec{Z}(\mathrm{iter}) \right|$  (16)

where $p \in [0, 1]$.
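A minimal sketch of the WOA position update corresponding to these equations follows, based on the standard WOA of Mirjalili and Lewis [17]; the mapping of the paper's symbols (B and D as coefficient vectors, H as the distance to the prey) and the spiral constant are assumptions, not the authors' exact code.

```python
import numpy as np
rng = np.random.default_rng(42)

def woa_step(Z, Z_best, a, b=1.0):
    """One WOA position update per agent: encircling the prey (p < 0.5)
    or the spiral bubble-net move (p >= 0.5), as in Eqs. (14)-(17)."""
    new = np.empty_like(Z)
    for i, z in enumerate(Z):
        B = 2 * a * rng.random(z.size) - a      # Eq. (14)
        D = 2 * rng.random(z.size)              # Eq. (15)
        if rng.random() < 0.5:                  # p in [0, 1]
            H = np.abs(D * Z_best - z)          # Eq. (16), distance to prey
            new[i] = Z_best - B * H             # encircle the prey
        else:
            H = np.abs(Z_best - z)
            l = rng.uniform(-1, 1)
            new[i] = H * np.exp(b * l) * np.cos(2 * np.pi * l) + Z_best
    return new

# 'a' decreases linearly from 2 to 0 over the iterations
Z = rng.uniform(0.9, 1.1, size=(30, 19))   # 30 whales, 19 control variables
Z = woa_step(Z, Z_best=Z[0], a=2.0)
```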
4 Simulation Results
Table 1 shows the minimum and maximum limits of the control variables of the IEEE 30-bus system.
The system involves 30 buses with 24 load buses and 6 generators. Tap-changing transformers are connected between lines 6–10, 4–12, 6–9, and 27–28. Shunt capacitors are positioned at 9 buses, so in total there are 19 regulated variables. The outcomes of the regulated variables are given in Table 2, and the convergence characteristic is displayed in Fig. 1. The true power generation cost obtained with WOA is compared with MSA and GWO in terms of control variables in Table 2. From Table 2, it is identified that the 19 regulated variables are optimized effectively and that WOA produced better results than GWO and MSA. With GWO, the cost is 801.41 $/h, the losses are 9.30 MW, and the total generation is 292.7 MW; with MSA, these are reduced to a total generation of 292.4345 MW, losses of 9.0345 MW, and a cost of 800.5099 $/h. With WOA, these values are further reduced: the losses are 8.8140 MW, the cost is 800.3196 $/h, and the total generation is 292.214 MW.
Table 3 presents the lowest, highest, mean, and standard deviation values of the true power generation cost over 20 runs. All 20 run values are shown in Fig. 2. From this, it is witnessed that the objective function value is more or less similar over all 20 runs, which indicates that WOA produces consistent values across all trials. It is also observed that the best value is achieved at trial 9, with a value of 800.3196 $/hr, while the worst value, 801.3277 $/hr, is obtained at the third trial. The mean value over the 20 runs is 800.658 $/hr, and the standard deviation over the 20 trials is 0.2838. A comparison with various algorithms available in the literature, such as MPSO, MFO, MGBICA, GWO, HSFLA-SA, and TLO, is shown in Table 4. It is found that the true power generation cost is lower with the implementation of WOA compared to the other techniques.
Table 1 Boundaries of regulated variables

Variables                      Min in p.u.   Max in p.u.
Voltages of generator buses    0.90          1.10
Transformers tap locations     0.95          1.05
Size of shunt capacitors       0.0           0.2
Table 2 Results for finest WOA for the reduction of fuel cost in IEEE 30 bus system
Control variables and parameters MSA [13] GWO [5] WOA
PTG1 177.2131 177.06 176.0386
PTG2 48.7326 49.02 48.5459
PTG5 21.4572 21.25 21.2817
PTG8 21.0638 21.71 21.6116
PTG11 11.9657 11.64 12.5939
PTG13 12.0021 12.02 12.1423
V TG1 1.0848 0.9910 1.1
V TG2 1.0653 1.0518 1.1
V TG5 1.03386 1.0665 1.1
V TG8 1.03823 0.9621 1.08869
V TG11 1.0927 0.9600 1.1
V TG13 1.04533 1.0262 1.1
QC10 2.37123 1.9793 4.32262
QC12 2.57918 4.7467 0
QC15 4.20734 2.9839 0
QC17 5 1.2097 2.57489
QC20 3.68771 4.2109 4.11584
QC21 4.95747 2.1081 2.5457
QC23 3.08148 3.6728 1.75619
QC24 4.98767 4.1593 3.97527
QC29 2.48706 3.2265 1.86436
T 11 1.04907 0.9875 0.983227
T 12 0.938762 0.9125 1.00358
T 15 0.970177 0.9875 0.992703
T 36 0.97498 0.9500 1.00521
Total power generation PG (MW) 292.4345 292.7 292.214
Total cost ($/h) 800.5099 801.41 800.3196
Ploss (MW) 9.0345 9.30 8.8140
Q_C in MVAR, V_TG in p.u., and P_TG in MW
5 Conclusion
In this work, WOA-based OPF is applied with the reduction of true power cost as the objective function. WOA is used for the optimization and is found useful when compared with different procedures such as GWO, ABC, MSA, and DE, owing to its randomization capability and fast convergence. It has been executed on the IEEE 30-bus system, and the cost has been reduced. The authors implemented WOA for OPF as a preliminary study.
268 T. Papi Naidu et al.
820
815
805
800
795
790
26
51
76
1
101
126
151
176
201
226
251
276
301
326
351
376
401
426
451
476
No of itera ons
Fig. 1 Convergence for the reduction of generation cost for IEEE 30 bus system using WOA
Table 3 Standard deviation and mean of true power generation cost in the IEEE 30-bus system

Measure                      Min in $/hr   Max in $/hr   Mean value in $/hr   Standard deviation
True power generation cost   800.3196      801.3277      800.658              0.2838
Fig. 2 True power generation cost ($/hr) obtained in each of the 20 trial runs
In future work, hybrid algorithms with FACTS devices are planned to be used for OPF problems, which may give improved outcomes.
References
1. D. Asija, P.V. Astick, P. Choudekar, Minimizing fuel cost of generators using GA-OPF, in
Proceedings of First International Conference on Smart System, Innovations and Computing.
Smart Innovation, Systems and Technologies, vol. 79, ed. by A. Somani, S. Srivastava, A.
Mundra, S. Rawat (Springer, Singapore, 2018). https://doi.org/10.1007/978-981-10-5828-8_32
2. H.R.E.H. Bouchekara, Optimal power flow using black-hole-based optimization approach.
Appl. Soft. Comput. 24, 879–888 (2014). https://doi.org/10.1016/j.asoc.2014.08.056
3. H.R.E.H. Bouchekara, M.A. Abido, M. Boucherma, Optimal power flow using teaching-
learning-based optimization technique. Electr. Power Syst. Res. 114, 49–59 (2014). https://
doi.org/10.1016/j.epsr.2014.03.032
4. S. Duman, U. Güvenç, Y. Sönmez, N. Yörükeren, Optimal power flow using gravitational
search algorithm. Energy Convers. Manage. 59, 86–95 (2012). https://doi.org/10.1016/j.enc
onman.2012.02.024
5. A.A. El-Fergany, H.M. Hasanien, Single and multi-objective optimal power flow using grey
wolf optimizer and differential evolution algorithms. Electr. Power Components Syst. 43(13),
1548–1559 (2015). https://doi.org/10.1080/15325008.2015.1041625
6. H.-H. Bouchekara, M.A. Abido, A.E. Chaib, Optimal power flow using an improved
electromagnetism-like mechanism method. Electr. Power Compon. Syst. 44(4), 434–449
(2016). https://doi.org/10.1080/15325008.2015.1115919
7. M. Ghasemi, S. Ghavidel, S. Rahmani, A. Roosta, H. Falah, A novel hybrid algorithm of
imperialist competitive algorithm and teaching learning algorithm for optimal power flow
problem with non-smooth cost functions. Eng. Appl. Artif. Intell. 29, 54–69 (2014). https://
doi.org/10.1016/j.engappai.2013.11.003
8. A. Ramesh Kumar, L. Premalatha, Optimal power flow for a deregulated power system using
adaptive real coded biogeography-based optimization. Int. J. Electr. Power Energy Syst. 73,
393–399 (2015). https://doi.org/10.1016/j.ijepes.2015.05.011
9. R. Roy, H.T. Jadhav, Optimal power flow solution of power system incorporating stochastic
wind power using Gbest guided artificial bee colony algorithm. Int. J. Electr. Power Energy
Syst. 64, 562–578 (2015). https://doi.org/10.1016/j.ijepes.2014.07.010
10. S. Surender Reddy, P.R. Bijwe, A.R. Abhyankar, Faster evolutionary algorithm based optimal
power flow using incremental variables. Int. J. Electr. Power Energy Syst. 54, 198–210 (2014).
https://doi.org/10.1016/j.ijepes.2013.07.019
11. A.E. Chaib, H.R.E.H. Bouchekara, R. Mehasni, M.A. Abido, Optimal power flow with emission
and non-smooth cost functions using backtracking search optimization algorithm. Int. J. Electr.
Power Energy Syst. 81, 64–77 (2016). ISSN 0142-0615. https://doi.org/10.1016/j.ijepes.2016.
02.004
12. P.P. Biswas, P.N. Suganthan, R. Mallipeddi, G.A.J. Amaratunga, Optimal power flow solutions
using differential evolution algorithm integrated with effective constraint handling techniques.
Eng. Appl. Artif. Intell. 68, 81–100 (2018). https://doi.org/10.1016/j.engappai.2017.10.019
13. A.-A.A. Mohamed, Y.S. Mohamed, A.A.M. El-Gaafary, Optimal power flow using moth swarm
algorithm, in Electr. Power Syst. Res. 142, 190–206 (2017). https://doi.org/10.1016/j.epsr.2016.
09.025
14. S. Surender Reddy, R.C. Srinivasa, Optimal power flow using glowworm swarm optimization.
Int. J. Electr. Power Energy Syst. 80, 128–139 (2016). https://doi.org/10.1016/j.ijepes.2016.
01.036
15. B. Venkateswara Rao, R. Devarapalli, H. Malik, S.K. Bali, F.P.G. Márquez, T. Chiranjeevi,
Wind integrated power system to reduce emission: an application of Bat algorithm. J. Intel.
Fuzzy Syst. Preprint (Preprint) 1–9 (2021). https://doi.org/10.3233/JIFS-189770
16. D.L. Pravallika, B.V. Rao, Flower pollination algorithm based optimal setting of TCSC to
minimize the transmission line losses in the power system. Procedia Comp. Sci. 92, 30–35
(2016). https://doi.org/10.1016/j.procs.2016.07.319
17. S. Mirjalili, A. Lewis, The whale optimization algorithm. Adv. Eng. Softw. 95, 51–67 (2016).
https://doi.org/10.1016/j.advengsoft.2016.01.008
18. R. Devarapalli, B. Venkateswara Rao, B. Dey, K. Vinod Kumar, H. Malik, F.P.G. Márquez,
An approach to solve OPF problems using a novel hybrid whale and sine cosine optimization
algorithm. J. Intel. Fuzzy Syst. Preprint (Preprint) 1–11 (2021). https://doi.org/10.3233/JIFS-
189763
An Artificial Electric Field Algorithm
and Artificial Neural Network-Based
Hybrid Model for Software Reliability
Prediction
1 Introduction
With increasing size and complexity, it is a big challenge for software developers to develop good-quality software systems within a short time. As the size increases, the number of possible failures in a software product also increases, which impacts its quality. Software reliability is defined as "the probability of a computer program operating without failure for a particular period of time in a specified environment" [1]. Software reliability is a crucial feature of software quality and an important consideration when determining the length of time required for software testing. Consequently, an efficient prediction instrument is desired for the prediction of software reliability. Assuming a linear association between past and current data, several parametric models were recommended in the early days for forecasting software failures. Parametric models are statistical methods based on certain assumptions, so these methods have not been found promising in forecasting the relationship between successive failure times of software. On the other hand, non-parametric models such as ANNs have been considered a better substitute for parametric models, and successful ANN applications exist in the literature for forecasting software reliability [2–6].
The parameter fine-tuning (i.e., finding the optimal weights and biases) of an ANN structure is a crucial aspect of ANN application that needs human expertise. Usually, gradient-based methods are used to accomplish this, but they are associated with a few drawbacks, such as slow convergence and the possibility of landing at local optima. Later, many evolutionary optimization techniques inspired by natural phenomena came forward and have been used as better substitutes for gradient-based methods [7, 8]. Evolutionary learning methods such as GA, PSO, and DE are proficient methods for searching for the optimal parameters of an ANN. Since no single technique is found suitable for solving all sorts of problems, continuous improvements to existing methods have been carried out by researchers through enhancements to an algorithm [9, 10] or hybridization of them [11–13]. Recently, AEFA has been proposed as an optimization method inspired by the principle of electrostatic force [14]. AEFA is based on a strong theoretical concept of charged particles, the electric field, and the force of attraction between two charged particles in the electric field. The learning capacity, convergence rate, and acceleration updates of AEFA have been established in [14] through solving some benchmark optimization problems.
This work is an initiative toward investigating the potential of AEFA in fine-tuning the parameters of an ANN, thus designing a hybrid model called AEFA-ANN. It is worth mentioning that, along with the parameters (weights and biases), another crucial factor, deciding the optimal number of hidden neurons of the ANN, is also handled by AEFA. The proposed AEFA-ANN is assessed through forecasting successive failure times of software. The data pre-processing, input selection, and model design steps are also explained.
The article is structured into four parts: Sect. 2 presents a short description of the methods and materials, Sect. 3 discusses the experimental outcomes, and Sect. 4 gives the concluding remarks along with the future scope.
2 ANN
For each neuron $j$ of the first hidden layer, the input is computed as in Eq. (1):

$y_j = f\left(b_j + \sum_i w_{ij} \cdot x_i\right)$  (1)

where $x_i$ is the $i$th input component, $w_{ij}$ is the weight value between the $i$th input neuron and the $j$th hidden neuron, and $b_j$ is the bias; $f$ is a nonlinear activation function. Suppose there are $m$ nodes in this hidden layer; then, for the next hidden layer, these $m$ outputs become the input. For each neuron $j$ of the next hidden layer, the input is as in Eq. (2):

$y_j = f\left(b_j + \sum_{i=1}^{m} w_{ij} \cdot y_i\right)$  (2)
This signal flows in the forward direction through each hidden layer until it reaches the output layer. The output $y_{est}$ is calculated using Eq. (3):

$y_{est} = f\left(b_o + \sum_{j=1}^{m} v_j \cdot y_j\right)$  (3)
where $v_j$ is the weight between the $j$th hidden neuron and the output neuron, $y_j$ is the hidden-layer output obtained in Eq. (1), and $b_o$ is the output bias. Given a set of training samples $S = \{x_i, y_i\}_{i=1}^{N}$ to train the ANN, let $y_i$ be the desired output of the $i$th input sample and $y_{est}$ the computed output for the same $i$th input; then, the error is calculated using Eq. (4).
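A compact forward-pass sketch of Eqs. (1)–(3) follows; the logistic activation and the single hidden layer are assumptions for illustration (the paper uses one hidden layer whose size AEFA selects, but it does not fix the activation).

```python
import numpy as np

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

def ann_forward(x, W, b, v, b_o):
    """Single-hidden-layer forward pass:
    Eq. (1): hidden outputs y_j = f(b_j + sum_i w_ij * x_i)
    Eq. (3): output y_est = f(b_o + sum_j v_j * y_j)."""
    y_hidden = sigmoid(b + W @ x)     # W has shape (m, n_inputs)
    return sigmoid(b_o + v @ y_hidden)

rng = np.random.default_rng(0)
n, m = 4, 6                            # 4 inputs, m = 6 hidden neurons
x = rng.random(n)
y_est = ann_forward(x, rng.normal(size=(m, n)), rng.normal(size=m),
                    rng.normal(size=m), rng.normal())
print(y_est)
```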
3 AEFA-ANN-Based Forecasting
AEFA is designed on the principle of Coulomb's law of electrostatic force [14]. It simulates charged particles as agents and measures their strength in terms of their charges. The particles move through the search domain under the electrostatic force of attraction/repulsion among them. The charges possessed by the particles are used for interaction, and the positions of the charges are considered the potential solutions of the problem. According to AEFA, the particle with the highest charge is the best individual; it attracts particles with lower charges as it moves through the search domain. The mathematical justification of AEFA is illustrated in [14]. Here, we simulate a potential solution of the ANN as a charged particle and its fitness function as the quantity of charge associated with that particle. The velocity and position of a particle at time instant $t$ are updated as per Eqs. (5) and (6), respectively.
The overall AEFA steps are shown in Fig. 2, and Algorithm 1 presents the high-level AEFA-ANN-based forecasting.
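Since Eqs. (5) and (6) are the standard AEFA velocity and position updates from [14], they can be sketched as below; the fitness-to-charge mapping and the force scaling are simplified assumptions for illustration.

```python
import numpy as np
rng = np.random.default_rng(7)

def aefa_step(pos, vel, fitness, K=1.0):
    """One AEFA iteration (after Yadav [14], simplified): charge from
    normalized fitness, a Coulomb-like attraction force -> acceleration,
    then velocity (Eq. 5) and position (Eq. 6) updates."""
    fit = np.asarray(fitness)
    q = np.exp((fit.min() - fit) / (fit.max() - fit.min() + 1e-12))
    q /= q.sum()                                      # normalized charges
    acc = np.zeros_like(pos)
    for i in range(len(pos)):
        for j in range(len(pos)):
            if i != j:
                diff = pos[j] - pos[i]
                r = np.linalg.norm(diff) + 1e-12
                # attraction toward more highly charged (fitter) particles
                acc[i] += rng.random() * K * q[j] * diff / r
    vel = rng.random(pos.shape) * vel + acc           # Eq. (5)
    pos = pos + vel                                   # Eq. (6)
    return pos, vel

pos = rng.uniform(-1, 1, size=(10, 25))   # 10 particles, 25 ANN weights
vel = np.zeros_like(pos)
fitness = rng.random(10)                  # e.g., training error per particle
pos, vel = aefa_step(pos, vel, fitness)
```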
$\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| \mathrm{Observed}_i - \mathrm{Estimated}_i \right|}{\mathrm{Observed}_i} \times 100\%$  (7)

$\mathrm{ARV} = \frac{\sum_{i=1}^{N} \left( \mathrm{Observed}_i - \mathrm{Estimated}_i \right)^2}{\sum_{i=1}^{N} \left( \mathrm{Observed}_i - \bar{X} \right)^2}$  (8)

where $\bar{X}$ is the mean of the observed values.
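These two error measures can be computed directly, as in the short NumPy-based sketch below; the sample values are illustrative only.

```python
import numpy as np

def mape(observed, estimated):
    """Mean absolute percentage error, Eq. (7)."""
    observed, estimated = np.asarray(observed), np.asarray(estimated)
    return np.mean(np.abs(observed - estimated) / observed) * 100.0

def arv(observed, estimated):
    """Average relative variance, Eq. (8): squared error relative to the
    variance of the observed series around its mean."""
    observed, estimated = np.asarray(observed), np.asarray(estimated)
    return np.sum((observed - estimated) ** 2) / np.sum(
        (observed - observed.mean()) ** 2)

obs, est = [10.0, 12.0, 15.0], [9.5, 12.5, 14.0]
print(mape(obs, est), arv(obs, est))
```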
5 Conclusions
In this article, an AEFA-ANN-based hybrid model is proposed to predict the successive failure times of software. AEFA is used to discover the most feasible ANN parameters along with the number of hidden neurons of a single-hidden-layer ANN, thus crafting an optimal ANN structure on the fly. Four comparative forecasts are developed in a similar manner. To evaluate the proposed and comparative models, experiments
are conducted on real software failure datasets considering different forecasting horizons. From exhaustive simulation studies, it is observed that the AEFA-ANN model is more efficient at capturing the hidden patterns in the software failure series data than the others. The present study can be extended with improvised versions of AEFA and by adopting other neural models.
Fig. 4 Error histogram plots by AEFA + ANN for the Musa-01, Musa-02, and Lee datasets
References
1. M.R. Lyu, Handbook of Software Reliability Engineering, vol. 222 (IEEE Computer Society
Press, McGraw-Hill, 1996)
2. A.K. Behera, S.C. Nayak, C.S.K. Dash, S. Dehuri, M. Panda, Improving software relia-
bility prediction accuracy using CRO-based FLANN, in Innovations in Computer Science
and Engineering. (Springer, Singapore, 2019), pp. 213–220
3. A.K. Behera, M. Panda, Software reliability prediction with ensemble method and virtual
data point incorporation, in International Conference on Biologically Inspired Techniques in
Many-Criteria Decision Making. (Springer, Cham, 2019), pp. 69–77
4. M.K. Bhuyan, D.P. Mohapatra, S. Sethi, Software reliability assessment using neural networks
of computational intelligence based on software failure data. Baltic J. Modern Comput. 4(4),
1016–1037 (2016)
5. K. Juneja, A fuzzy-filtered neuro-fuzzy framework for software fault prediction for inter-
version and inter-project evaluation. Appl. Soft Comput. 77, 696–713 (2019)
6. W.D. van Driel, J.W. Bikker, M. Tijink, Prediction of software reliability. Microelectron. Reliab.
119, 114074 (2021)
7. N. Shadbolt, Nature-inspired computing. IEEE Intell. Syst. 19(1), 2–3 (2004)
8. S.C. Nayak, B.B. Misra, Extreme learning with chemical reaction optimization for stock
volatility prediction. Financ. Innov. 6(1), 1–23 (2020)
9. K. Opara, J. Arabas, Comparison of mutation strategies in differential evolution–a probabilistic
perspective. Swarm Evol. Comput. 39, 53–69 (2018)
10. S. Jiang, Y. Wang, Z. Ji, Convergence analysis and performance of an improved gravitational
search algorithm. Appl. Soft. Comput. 24, 363–384 (2014)
11. S. Nayak, M. Ansari, COA-HONN: cooperative optimization algorithm based higher order neural networks for stock forecasting. Recent Adv. Comput. Sci. Commun. 13(1) (2020)
12. S.C. Nayak, A fireworks algorithm based Pi-Sigma neural network (FWA-PSNN) for modelling
and forecasting chaotic crude oil price time series. EAI Endorsed Trans. Energy Web 7(28)
(2020).
13. A.K. Behera, M. Panda, S. Dehuri, Software reliability prediction by recurrent artificial
chemical link network. Int. J. Syst. Assur. Eng. Manage. 12, 1–14 (2021)
14. A. Yadav, AEFA: artificial electric field algorithm for global optimization. Swarm Evol.
Comput. 48, 93–108 (2019)
Disaster Event Detection from Text:
A Survey
Abstract With the advent of increasing online information, detecting and monitoring disaster events from textual data is challenging. The moment some disaster event happens, social media and the online web are flooded with information about the event. Afterward, the quantity of articles about the event decreases exponentially. In order to monitor the successive development and after-effects of disaster events, detecting these events from online documents and tracking the documents reporting similar events becomes crucial. The information mined can be utilized to gain insight into the causes of the events and to prepare for their aftermath. In this paper, a survey of the utility of text published in social media and online news articles for disaster event detection has been carried out. This survey aims to present the applicable machine learning approaches and an analysis of research studies focused on disaster event detection from social media and online news articles.
1 Introduction
Event detection is similar to topic detection. It helps in text differentiation and text categorization. An event is something that happens at a particular place and time. The detection of events from an input source is called event detection. It has many practical applications. Automatic event detection faces many challenges due to the different perceptions of the same event by different people and the unavailability of a proper definition of an event [1]. This paper deals with the detection of disaster-related events. Disasters are of two types: natural and man-made. In natural disasters, nature is the causal agent, whereas man-made disasters result from human activities.
The input source plays a great role in event detection. The first and foremost step in any event detection task is to collect data. When we consider textual input sources, there are two major sources of input: social media and news articles. Each of these has its own importance and use cases.
Social media is a great source for data collection. It helps in the speedy exchange of information and the reporting of events. Several social media platforms, such as Twitter, Facebook, and microblogging services such as Sina Weibo, are available. These social media networks are used to exchange information that becomes a source of input for researchers. Twitter is widely used by researchers because of its worldwide availability, popularity, and easy accessibility. Messages sent on Twitter are called tweets; their character limit was previously 140 characters, but since November 2017 it has doubled to 280 characters [7].
Yun [8] used a bag of disaster event words to detect disaster events from tweets. Li et al. [9] focused on the detection of crime and disaster-related events (CDE) and created a system named TEDAS. They developed a CDE-focused crawler to classify and rank tweets and to estimate the locations of tweets. Guan and Chen [10] confirmed the use of social media in damage assessment. According to Xiao et al. [11], social media helps in the real-time collection of information, the establishment of situational awareness, and the support of informal public communication. They examined the relationship between the dependent variable, the number of tweets generated during a disaster situation, and the independent variables mass, material, access, and motivation (MMAM model) during hurricane Sandy.
Work done on social media datasets for event detection can be broadly classified into three categories. Researchers have worked on any one of these or a combination of them.
The main task of disaster event detection is to detect disaster-related events in the whole dataset. In [4, 12, 13], disaster-related messages are separated from negative (non-disaster) ones. Olteanu et al. [14] presented a method to generate and expand queries effectively. They collected keyword-based and location-based samples of tweets related to six disasters, and their task was to distinguish tweets into relevant and non-relevant classes. This dataset is publicly available.1
1 https://crisislex.org/data-collections.html.
In this category, researchers try to find out what types of tweets users post in a disastrous situation. Different researchers predict different information from the datasets. Olteanu et al. [15] created a dataset of 26 disasters that happened during 2012 and 2013 and made it publicly available [Ref. Footnote 1]. They viewed content from three dimensions, informativeness, information type, and sources, to gain insight into the situation during different disasters. Informativeness is a binary classification task; information type and sources have six categories each. The labeling of tweets was done using crowdsourcing. This dataset has been used further by many researchers. Pekar et al. [16] used the above dataset and explained four classification problems, namely relatedness, informativeness, eyewitness, and topics. The first three are binary classification problems, while the last one has six labels. Zahra et al. [17] further gained insights into eyewitness-related tweets for three natural disasters.
Imran et al. [18] created a dataset using tweets from 19 disaster events that happened from 2013 to 2015 and performed multiclass classification for topic categorization. Data annotation was performed through manual labeling as well as crowdsourcing. Word embeddings of crisis-related tweets were created, and out-of-vocabulary words were identified from tweets and annotated using crowdsourcing. This dataset2 is available for future research work. Alam et al. [19] created a human-labeled multimodal (text and images) dataset for seven different disasters for three classification tasks, namely informativeness, humanitarian categories, and damage severity assessment. They collected tweets by using keywords and performed classification by manual annotation using crowdsourcing.
In [20, 21], the authors used different approaches to classify tweets into 25 classes, a task provided by the 2018 Text REtrieval Conference Incident Streams track (TREC-IS).3 Yu et al. [22] classified tweets into five information classes, namely caution and advice, casualties and damage, information sources, infrastructure and resources, and donation and aid, for three different disaster events: hurricanes Sandy, Harvey, and Irma. They performed two types of experiments: training and testing on the same disaster, and training and testing on different disasters.
To Extract Location
2 https://crisisnlp.qcri.org/
3 http://dcs.gla.ac.uk/~richardm/TREC_IS/
Kumar and Singh [23] used a CNN model to find location-indicative words. Sakaki et al. [24] first separated earthquake-related tweets from others and then obtained the longitude and latitude values of tweets from GPS data and the registered location of the user. They created a real-time earthquake reporting system after using particle filtering for the location estimation of events from tweets.
In [3], the authors address the problem of finding the location of disaster victims. If the user mentions an address in a tweet, that location is used; else, if the user geo-tagged the tweet, that location is used; otherwise, a Markov model is created to uncover the user's location from the user's previous tweets.
Unankard et al. [25] created a system named location-based emerging event detection (LEED), in which they found the correlation between the user location and the event location by calculating a score value to detect a strong correlation between locations and emerging events.
Social media datasets have many limitations that have to be tackled by researchers: non-standard English words, grammatical mistakes, spelling mistakes, a mixture of different dialects, non-standard abbreviations, improper sentence structure, a mixture of languages, abnormal spaces and characters, etc. [12, 23, 26].
News is a good source of information. According to Nugent et al. [27], news channels help people understand the situation by reporting in two phases, of which the first is the breaking news phase and the second covers the situations that arise in the aftermath. Also, people tend to trust news channels more. In that paper, the authors used a news dataset to classify news articles into seven natural disaster and critical event types. Although news channels are an authentic source of information, only a small amount of research work uses news datasets as input. Ahmad et al. [28] created a news dataset in three languages, proposed a separate multi-layer perceptron (MLP) layer for each language, used an ensemble of CNN and Bi-LSTM, and observed that the proposed method increased performance in multi-lingual disaster event identification.
Lee et al. [29] constructed a news dataset to detect bursty terms from text and identify disaster events. They developed a term weighting scheme to score the burstiness of a term, which helped increase situational awareness during ongoing events. Tanev et al. [30] presented an automatic grammar learning algorithm to detect micro-events from a news corpus. They manually created a dataset and then annotated it with the micro-events present. In the same paper, they also classified eyewitness-related tweets by extracting various features from them. Min et al. [31] extracted
fire-related sentences from Chosun news articles and estimated location from these
sentences using named entity recognition.
In a supervised approach, the machine learns from a labeled dataset. The main objective of this approach is to learn from previous data and apply the knowledge to new data. This approach has two main categories: regression and classification. In regression, an equation between dependent and independent variables is created. In classification, class labels are learned from the labeled input dataset, and classes are then predicted for new data. In disaster event detection, classification is mainly used. Various machine learning techniques, such as the support vector machine (SVM), naïve Bayes (NB), k-nearest neighbors (KNN), decision trees (DT), and random forest (RF), and deep learning techniques, such as the convolutional neural network (CNN), bidirectional long short-term memory (Bi-LSTM), and the hierarchical attention network (HAN), come under this category. Different researchers use different techniques and compare them. Table 1 lists the papers that used a supervised approach; a short classification sketch follows.
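As a concrete illustration of such supervised pipelines, the following scikit-learn sketch classifies tweets as disaster-related or not, using TF-IDF features and a linear SVM; the tiny inline dataset is fabricated for demonstration only and stands in for the labeled corpora surveyed above.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

# Toy examples only; real studies use thousands of labeled tweets
tweets = ["Massive earthquake hits the city, buildings collapsed",
          "Flood waters rising near the river, evacuation under way",
          "Enjoying a sunny day at the beach with friends",
          "New phone launch event scheduled for next week"]
labels = [1, 1, 0, 0]   # 1 = disaster-related, 0 = not related

model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LinearSVC())
model.fit(tweets, labels)
print(model.predict(["Wildfire spreading fast, homes evacuated"]))
```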
In an unsupervised approach, the machine is trained with an unlabeled dataset. The machine itself finds the similarities between data items and clusters them accordingly. Clustering and association are the two types of unsupervised learning, but in disaster event detection clustering is mostly used. There are many clustering techniques, for instance hierarchical, density-based, and k-means clustering. In disaster event detection, clusters can be made based on keyword or topic similarity, or by bounding events in space and time, which is called spatio-temporal clustering. In clustering, topic modeling techniques such as latent Dirichlet allocation (LDA) and latent semantic analysis (LSA) are also used, because they help in finding the hidden topics in a textual document that best represent the information in it (see the sketch below). Table 2 lists the papers that used an unsupervised approach. Since clustering approaches do not use a labeled dataset, their performance cannot be measured quantitatively.
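A short sketch of LDA-based topic discovery over a toy corpus is shown below using scikit-learn; the corpus and parameter choices are illustrative assumptions, not drawn from any surveyed study.

```python
from sklearn.decomposition import LatentDirichletAllocation
from sklearn.feature_extraction.text import CountVectorizer

docs = ["earthquake tremor buildings damage rescue",
        "flood rain river water evacuation",
        "flood water rescue boats damage",
        "earthquake aftershock rescue teams"]

counts = CountVectorizer().fit(docs)
X = counts.transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

# Print the top words per discovered topic
terms = counts.get_feature_names_out()
for t, comp in enumerate(lda.components_):
    top = comp.argsort()[-4:][::-1]
    print(f"topic {t}:", [terms[i] for i in top])
```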
Table 1 Supervised techniques used by different researchers

Madichetty and Sridevi [32]. Application/Event detected and dataset: detection of tweets related to the need and availability of resources (NAR) by organizations and victims during a disaster, from tweets related to the Nepal and Italy earthquakes of 2015 and 2016. Event detection technique/Approach: stacked CNN with traditional classifiers (SVM, KNN, NB, DT). Performance: the combination of CNN and KNN at the first (base) level and SVM at the second (meta) level gained the highest average accuracy of 77.05 percent. Pros: proposed approach with domain-specific features; improved performance. Cons: used domain-specific features.

Azlan et al. [33]. Application/Event detected and dataset: predict disaster events from social media (Twitter). Event detection technique/Approach: KNN, SVM, NB. Performance: KNN achieved an accuracy of 0.79. Pros: introduced a fuzzy-logic-based … Cons: the severity of a disaster event is based on no. of …
4 Discussion
This paper presents a review of the techniques used for disaster event detection. The main challenges in disaster event detection are correctly distinguishing disaster events from others, gaining insights into them, and finding location information. Experimental results by different researchers show that there is a strong relation between disaster events and Twitter activity. Many labeled Twitter datasets are available for use. News datasets differ from social media datasets in terms of content, length, etc. Social media data contains many mistakes in the form of spelling, sentence formation, abbreviations, grammar, etc., which researchers have to take care of when using social media as a dataset. The performance of techniques depends largely on the datasets used, the parameters chosen, etc. Before the rise of deep learning, traditional machine learning classifiers such as SVM, KNN, and RF were very popular, but over time researchers' interest in state-of-the-art deep learning techniques has increased. Techniques such as CNN show better results than SVM, but they require more time because the network has to learn more parameters. Many research works used a combination of different techniques, resulting in increased performance. The use of word embeddings, rather than bag-of-words, tf-idf, etc., has also helped increase performance. Also, if the dataset is balanced, these techniques yield good performance values, but in real-world applications we encounter unbalanced data, and there is a need to develop good techniques to tackle this problem.
5 Conclusion
The rapid spread of information during catastrophes has increased many researchers' interest in disaster event detection. This information is used to distinguish disaster-related events from others, to gain situational awareness during disasters, and to predict the locations of disasters or victims. In this study, a survey of disaster event detection from text data is presented. The analysis of previous works is done based on the input source and the techniques used. This study shows that work done on Twitter datasets is more prominent than on other social media platforms and news datasets. To obtain situational information, such as need- and help-related messages during disasters, social media datasets are used. Location information extraction from text is also important, both to help someone and to locate disasters. Several classification and clustering techniques are used to detect disaster-related events, with classification being the most widely used. The main limitation of classification techniques is that they perform well only for those events that are in the training dataset. Thus, there is a need to put more emphasis on clustering techniques or to develop techniques that will help in finding events that are not known in advance. Clustering events within the space and time dimensions is a good idea, but using only geo-tagged data limits the dataset size, and this may produce results that differ from the actual situation. So, there is a need to discover more techniques to detect location and temporal expressions from text data.
References
1. C.-C. Pan, P. Mitra, Event detection with spatial latent Dirichlet allocation, in Proceedings of
the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (Association
for Computing Machinery, 2011). https://doi.org/10.1145/1998076.1998141
2. W.Z. Aldyani, F.K. Ahmad, S.S. Kamaruddin, A survey on event detection models for text data
streams. J. Comput. Sci. 16(07), 916–935 (2020)
3. J.P. Singh, Y.K. Dwivedi, N.P. Rana, A. Kumar, K.K. Kapoor, Event classification and location
prediction from tweets during disasters. Ann. Oper. Res. 283(12), 21 (2019)
4. M. Sreenivasulu, M. Sridevi, Comparative study of statistical features to detect the target event
during disaster. Big Data Min. Analytics 3, 121–130 (2020)
5. T. Sakaki, M. Okazaki, Y. Matsuo, Earthquake shakes Twitter users: Real-time event detection
by social sensors, in Proceedings of the 19th International Conference on World Wide Web,
WWW ’10, vol. 01 (2010), pp. 851–860
6. A.H. Hossny, L. Mitchell, N. Lothian, G. Osborne, Feature selection methods for event detection
in Twitter: A text mining approach. Soc. Netw. Anal. Min. 10, 12 (2020)
7. Y. Huang, Y. Li, J. Shan, Spatial-temporal event detection from geo-tagged Tweets. ISPRS Int.
J. Geo-Inf. 7(04), 150 (2018)
8. H. Yun, Disaster events detection using Twitter data. J. Inf. Commun. Convergence Eng. 9, 02
(2011)
9. R. Li, K.H. Lei, R. Khadiwala, K.C.-C. Chang, TEDAS: A Twitter-based event detection and
analysis system, in 2012 IEEE 28th International Conference on Data Engineering (2012),
pp. 1273–1276
10. X. Guan, C. Chen, Using social media data to understand and assess disasters. Nat. Hazards
74, 11 (2014)
11. Y. Xiao, Q. Huang, K. Wu, Understanding social media data for disaster management. Nat.
Hazards, 79(09), 17 (2015)
12. Z. Lin, H. Jin, B.F. Robinson, X.G. Lin, Towards an accurate social media disaster event
detection system based on deep learning and semantic representation 12, 6–8 (2016)
13. Y.A. Ameen, K. Bahnasy, A. Elmahdy, Classification of Arabic tweets for damage event
detection (2020)
14. A. Olteanu, C. Castillo, F. Diaz, S. Vieweg, CrisisLex: A lexicon for collecting and filtering
Microblogged communications in crises, in Proceedings of the 8th International Conference
on Weblogs and Social Media, ICWSM 2014 (2014), pp. 376–385
15. A. Olteanu, S. Vieweg, C. Castillo, What to expect when the unexpected happens: Social
media communications across crises, in Proceedings of the 18th ACM Conference on Computer
Supported Cooperative Work & Social Computing (Association for Computing Machinery,
2015), pp. 994–1009
16. V. Pekar, J. Binner, H. Najafi, C. Hale, V. Schmidt, Early detection of heterogeneous disaster
events using social media. J. Am. Soc. Inf. Sci. 71, 03 (2019)
17. K. Zahra, M. Imran, F. Ostermann, Understanding eyewitness reports on twitter during disasters
05 (2018). https://doi.org/10.5167/uzh-161922
18. M. Imran, P. Mitra, C. Castillo, Twitter as a lifeline: Human-annotated Twitter Corpora for
NLP of crisis-related messages. CoRR abs/1605.05894 (2016)
19. F. Alam, F. Ofli, M. Imran, Crisismmd: Multimodal twitter datasets from natural disasters, in
Proceedings of the International AAAI Conference on Web and Social Media, vol. 12(1) (2018)
20. A. Kruspe et al., Classification of incident-related tweets: Tackling imbalanced training data
using hybrid CNNs and translation-based data augmentation, in Proceedings of the 27th Text
Retrieval Conference (TREC 2018) vol. 16, (Gaithersburg, Maryland, 2018), Nov 14
21. W.G. Choi, S.-H. Jo, K.-S. Lee, CBNU at TREC 2018 incident streams track, in TREC (2018)
22. M. Yu, Q. Huang, H. Qin, C. Scheele, C. Yang, Deep learning for real-time social media
text classification for situation awareness—using Hurricanes Sandy, Harvey, and Irma as case
studies. Int. J. Digit. Earth, 12(02), 1–18 (2019)
23. A. Kumar, J.P. Singh, Location reference identification from tweets during emergencies: A
deep learning approach. Int. J. Disaster Risk Reduction 33, 01 (2019)
24. T. Sakaki, M. Okazaki, Y. Matsuo, Tweet analysis for real-time event detection and earthquake
reporting system development. IEEE Trans. Knowl. Data Eng. 99, 11 (2013)
25. S. Unankard, X. Li, M.A. Sharaf, Location-based emerging event detection in social networks, in Web Technologies and Applications (Springer, Berlin, 2013). https://doi.org/10.1007/978-3-642-37401-2_29
26. J. Kersten, A. Kruspe, M. Wiegmann, F. Klan, Robust filtering of crisis-related Tweets 05
(2019). https://elib.dlr.de/127586/
27. T. Nugent, F. Petroni, N. Raman, L. Carstens, J.L. Leidner, A comparison of classification models for natural disaster and critical event detection from news, in 2017 IEEE International Conference on Big Data (Big Data) (2017), pp. 3750–3759
28. Z. Ahmad, D. Varshney, A. Ekbal, P. Bhattacharyya, Multi-lingual event identification in
disaster domain 4 (2019)
29. S. Lee, S. Lee, K. Kim, J. Park, Bursty event detection from text streams for disaster manage-
ment, in Proceedings of the 21st International Conference on World Wide Web (Association
for Computing Machinery, 2012). https://doi.org/10.1145/2187980.2188179
30. H. Tanev, V. Zavarella, J. Steinberger, Monitoring disaster impact: detecting micro-events and
eyewitness reports in mainstream and social media, in ISCRAM (2017)
31. K. Min, J. Lee, K. Yu, J. Kim, Geotagging location information extracted from unstructured
data, in 10th International Conference on Geographic Information Science (GIScience 2018)
(2018). https://doi.org/10.4230/LIPIcs.GISCIENCE.2018.49
32. S. Madichetty, M. Sridevi, A stacked convolutional neural network for detecting the resource
tweets during a disaster. Multimedia Tools Appl 80 (2021)
33. F.A. Azlan, A. Ahmad, S. Yussof, A.A. Ghapar, Analyzing algorithms to detect disaster events
using social media, in 2020 8th International Conference on Information Technology and
Multimedia (ICIMU) (2020), pp. 384–389
34. E. Spiliopoulou et al., Event-related bias removal for real-time disaster events. arXiv preprint
arXiv:2011.00681 (2020)
35. A. Kuila, S. Bussa, S. Sudeshna, A neural network based event extraction system for Indian languages, in FIRE (2018), pp. 291–301
36. V. Nguyen, T.N. Anh, H.-J. Yang, Real-time event detection using recurrent neural network in
social sensors. Int. J. Distrib. Sens. Netw. (2019). https://doi.org/10.1177/1550147719856492
37. T. Cheng, T. Wicks, Event detection using Twitter: A spatio-temporal approach. PLoS ONE 9,
06 (2014)
38. Z. Wang, X. Ye, M.-H. Tsou, Spatial, temporal, and content analysis of Twitter for wildfire
hazards. Nat. Hazards 83 (2016)
Context-Adaptive Content-Based
Filtering Recommender System Based
on Weighted Implicit Rating Approach
Abstract Recommender systems’ job is to churn and filter out desired relevant
information from an unorganized pile of data available. Ratings score data are the
key parameter to recommender engine computations. Rating score data can be either
explicit where users directly give the preference score or implicit where user behav-
iors are to be captured to compute the score. In many applications where user’s explicit
rating is not possible, the implicit rating is computed. Computing implicit rating needs
to be carefully modeled to achieve the best relevant recommendations. A context-
aware feature is needed in some applications to tune the filtering model to extract rele-
vant details to address the context. This paper proposes a recommendation application
to address the mapping of call for research projects published by various funding
agencies to the aspiring researchers who wait and look for applying to such call for
proposals relevant to their domain interest. This paper proposes a context-adaptive
content-based filtering recommendation system supported by service-oriented on-
demand cloud-based architecture. The proposed recommendation system algorithm
is adaptive to address the different call for proposals published to achieve the level
of mapping of relevant research paper abstracts by allowing it to tune appropriate
weight parameters.
1 Introduction
2.1 Algorithm
This section describes the data set, the experimental setup, and the evaluation metrics used in measuring the performance of the recommendation system.
A data set of 500 mock proposal documents was collected from research abstracts submitted for internal research grants in five disciplines, namely 'smart farming,' 'innovative data science projects,' 'health care for TB program,' 'cyber security projects,' and 'machine learning applications.' Keyword phrases for each call for proposals were framed, and weight values for the key phrases were set according to the domain context, i.e., specific to the call for proposal area, for evaluation purposes. For example, a sample call for proposal with the title 'smart farming' had keyword phrases framed like 'agriculture,' 'IoT,' 'sensors,' 'temperature sensors,' and 'humidity sensors,' which have higher weight values, while key phrases like 'data collection' and 'analytics' have lower weight values. Key phrases with similar contextual meaning, such as 'sensors,' 'temperature sensors,' and 'humidity sensors,' were given the same weight values. The dictionary corpus was also enriched with adequate reserved words for popular key phrases. A sketch of this weighted key-phrase scoring appears below.
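The weighted key-phrase matching described above can be sketched as follows; the scoring function and the phrase weights are hypothetical illustrations of the weighted implicit-rating idea, not the authors' exact algorithm.

```python
# Hypothetical key-phrase weights for the 'smart farming' call for proposals
weights = {"agriculture": 1.0, "iot": 1.0, "sensors": 1.0,
           "temperature sensors": 1.0, "humidity sensors": 1.0,
           "data collection": 0.3, "analytics": 0.3}

def implicit_score(abstract: str, phrase_weights: dict) -> float:
    """Sum the weights of the key phrases found in a proposal abstract;
    the total acts as the implicit rating of the abstract for this call."""
    text = abstract.lower()
    return sum(w for phrase, w in phrase_weights.items() if phrase in text)

abstract = ("An IoT platform for agriculture using humidity sensors "
            "and analytics for irrigation planning.")
# agriculture + iot + sensors + 'humidity sensors' + analytics = 4.3
print(implicit_score(abstract, weights))
```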
For the sample of five calls for proposals, 100 relevant proposal documents were allotted to each call, totaling 500. The recommendation system was tested for each call for proposals with a data set of 100 proposal documents containing abstracts portraying the proposal ideas. Identifying false selections and false rejections of proposal documents serves as the performance measure used to evaluate the proposed recommendation system. The recommendation effectiveness factor (REF) is based on the F1 score, since the class distribution is imbalanced [20, 21]; it is calculated to provide a metric for the effectiveness of the recommendation system. Equations (1)–(3) are used to measure the effectiveness of the recommendation system, where
TRP → total relevant selected papers of the recommendation system,
FSP → falsely selected papers of the recommendation system,
FRP → falsely rejected papers of the recommendation system.
$\mathrm{Precision} = \frac{\mathrm{TRP}}{\mathrm{TRP} + \mathrm{FSP}}$  (1)

$\mathrm{Recall} = \frac{\mathrm{TRP}}{\mathrm{TRP} + \mathrm{FRP}}$  (2)

$F1\ \mathrm{score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$  (3)
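These metrics map directly onto code, as the following small sketch shows (a straightforward transcription of Eqs. (1)–(3), with illustrative counts).

```python
def ref_metrics(trp: int, fsp: int, frp: int):
    """Precision, recall, and F1 from selected/rejected counts (Eqs. 1-3)."""
    precision = trp / (trp + fsp)
    recall = trp / (trp + frp)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Illustrative counts: 90 correctly selected, 5 falsely selected, 10 missed
print(ref_metrics(trp=90, fsp=5, frp=10))
```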
The testing of the recommendation engine was carried out with a given set of sample data. The results imply that a minimum threshold of five to ten key phrases per call for proposal is required for the minimally effective functioning of the recommendation system. Similarly, it is observed that 15 to 20 key phrases per call for proposal is the upper threshold; beyond that, little impact on performance is observed. The results show that too few keyword phrases led to false rejections (FRP) of relevant papers by the recommendation engine, while too many keyword phrases led to false selections (FSP) of irrelevant papers. This implies that too few keyword phrases cause key information to be missed during processing, resulting in more false rejection cases, whereas more keyword phrases imply the inclusion of overlapping and less impactful information, resulting in a slight increase in false selection cases.
It is observed from the results that the performance of the recommendation engine is affected by irrelevant settings of keyword phrases and improper settings of the key-phrase weights. From the prescribed evaluation process, it is observed that the distribution of weights for the key phrases plays a part in designing the required evaluation metrics for each call for proposals. If the distribution of weights from primary key phrases to secondary key phrases follows a monotonically decreasing model, then both the false rejection count and the false selection count can be kept optimally minimal, and the F1 score can be maintained high, with high values for precision and recall. For the sample call for proposal 'smart farming,' which is a narrow domain, the intention was to retrieve more accurate recommendation papers, so the key-phrase weight distribution followed the monotonically decreasing model.
Table 1 and Fig. 1a, b show the performance metrics; precision, recall, and F1
score are relatively high, reflecting accurate retrieval of papers. Table 2 and
Fig. 1c, d show the performance metrics (precision, recall, and F1 score) for the
sample call for proposals ‘innovative data science projects,’ which is a broad domain.
Here, the distribution of weights from primary key phrases to secondary key phrases
is more or less like a right-skewed, exponentially decreasing model. The false rejection
count is kept almost null, and the false selection count is kept low.
Fig. 1 Performance of the recommender engine through precision, recall and F1 score on a given
data set with a varying number of key phrases for various call for proposals
4 Implementation Model
Figure 2 shows the architecture model for implementing the research proposals
recommendation system. It follows service-oriented architecture (SOA) design
approach implemented using resources of cloud service providers [22, 23]. Web
modules are hosted in Amazon Web Services (AWS) [24, 25] and Elastic Compute
Cloud (EC2) Bean Stack with LAMP stack [20]. The database for the web portal is
maintained in scalable Amazon Relational Database Service (RDS). The Elastic Bean
stack handles auto-scaling and load balancing of web applications. Amazon RDS
gives a scalable database to store entries taken from the form that is stored in DB and
listed in the queue module. The files submitted through fund seeker’s web portal and
fund providers’ web portals are stored in Amazon S3 scalable cloud storage to canter
the growing need for accommodating submitted proposals [24]. Amazon Elastic
Container Service (ECS) is suitable for long-running tasks, and batch jobs are used
to host the REST API [26]-based NLP recommendation system service module which
is deployed into Docker container. Each submitted fund seekers’ proposal and newly
fed call for proposals are fetched from cloud storage and compared by the recom-
mendation system to compute and send recommendation results to the cloud DB.
Amazon Simple Queue Service (SQS) serves as the bridge for connecting between
web application module and service-oriented recommendation engine module to
implement batch processing of computing recommendations for newly submitted
call for proposals and newly submitted proposals. Google firebase [27] is used for
push notifications to notify mobile clients as well as desktop clients.
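As an illustration of the SQS-driven batch step described above, the following Python sketch long-polls a queue and scores each newly submitted proposal; it is a minimal sketch assuming boto3, with a hypothetical queue URL, message format, and scoring/storage helpers, not the authors' implementation.

import json
import boto3

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/proposal-jobs"  # hypothetical
sqs = boto3.client("sqs", region_name="us-east-1")
s3 = boto3.client("s3")

def score_proposal(abstract, cfp_id):
    # Hypothetical stand-in for the REST-based NLP recommendation service.
    raise NotImplementedError

def store_recommendation(key, cfp_id, score):
    # Hypothetical write of the recommendation result to the cloud DB.
    raise NotImplementedError

def poll_and_recommend():
    # Long-poll SQS for newly submitted proposals, fetch each abstract from
    # S3, score it against the call for proposals, then delete the message.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=10,
                               WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        job = json.loads(msg["Body"])  # e.g. {"bucket": ..., "key": ..., "cfp_id": ...}
        obj = s3.get_object(Bucket=job["bucket"], Key=job["key"])
        abstract = obj["Body"].read().decode("utf-8")
        store_recommendation(job["key"], job["cfp_id"],
                             score_proposal(abstract, job["cfp_id"]))
        sqs.delete_message(QueueUrl=QUEUE_URL,
                           ReceiptHandle=msg["ReceiptHandle"])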
The proposed recommendation system, a tool for researchers to map their research
interests to relevant calls for proposals, employs a context-aware content-based
technique that caters to the needs of collaborating researchers and funding institutions.
The proposed implementation model, featuring a service-oriented architecture-based
design approach, can easily be adapted to similar recommendation system applications
such as bibliography reference mapping and e-portal systems. Future work will involve
designing a more sophisticated recommendation system in which training of the
system and deep learning play a role in fine-tuning the algorithm to achieve the best
possible results.
Acknowledgements Thanks to the Department of Computer Science, SRM IST, for providing the
data sets of research abstracts used to test our recommender engine.
References
12. K. Haruna, M.A. Ismail, D. Damiasih, J. Sutopo, T. Herawan, A collaborative approach for research paper recommender system. PLoS ONE 12(10), e0184516 (2017). https://doi.org/10.1371/journal.pone.0184516
13. U. Javed et al., A review of content-based and context-based recommendation systems. Int. J. Emerg. Technol. Learn. (IJET) 16(03), 274 (2021). https://doi.org/10.3991/ijet.v16i03.18851
14. C. Nascimento et al., A source independent framework for research paper recommendation, in Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries (ACM, 2011), pp. 297–306. https://doi.org/10.1145/1998076.1998132
15. S. Philip et al., Application of content-based approach in research paper recommendation system for a digital library. Int. J. Adv. Comput. Sci. Appl. 5(10) (2014). https://doi.org/10.14569/IJACSA.2014.051006
16. F. Ferrara, N. Pudota, C. Tasso, A keyphrase-based paper recommender system, in Digital Libraries and Archives. IRCDL 2011. Communications in Computer and Information Science, vol. 249, eds. by M. Agosti, F. Esposito, C. Meghini, N. Orio (Springer, Berlin, 2011)
17. M.K. Najafabadi et al., A survey on data mining techniques in recommender systems. Soft Comput. 23(2), 627–654 (2019). https://doi.org/10.1007/s00500-017-2918-7
18. J. Shu, X. Shen, H. Liu, B. Yi, Z. Zhang, A content-based recommendation algorithm for learning resources. Multimedia Systems (Springer, 2017). https://doi.org/10.1007/s00530-017-0539-8
19. Y. Gu et al., Learning global term weights for content-based recommender systems, in Proceedings of the 25th International Conference on World Wide Web (International World Wide Web Conferences Steering Committee, 2016), pp. 391–400. https://doi.org/10.1145/2872427.2883069
20. A. Gunawardana, G. Shani, A survey of accuracy evaluation metrics of recommendation tasks. J. Mach. Learn. Res. 10, 2935–2962 (2009)
21. M. Ge, C. Delgado-Battenfeld, D. Jannach, Beyond accuracy: evaluating recommender systems by coverage and serendipity, in Proceedings of the Fourth ACM Conference on Recommender Systems (2010), pp. 257–260
22. T. Lorido-Botran et al., A review of auto-scaling techniques for elastic applications in cloud environments. J. Grid Comput. 12(4), 559–592 (2014). https://doi.org/10.1007/s10723-014-9314-7
23. A. Biswas, S. Majumdar, B. Nandy, A. El-Haraki, An auto-scaling framework for controlling enterprise resources on clouds, in Proceedings of the 15th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid) (C4BIE workshop, Shenzhen, 2015), pp. 971–980
24. S. Afzal, G. Kavitha, Load balancing in cloud computing: a hierarchical taxonomical classification. J. Cloud Comput. 8(1), 22 (2019). https://doi.org/10.1186/s13677-019-0146-7
25. Deploying a high-availability PHP application with an external Amazon RDS database to Elastic Beanstalk, AWS Elastic Beanstalk documentation. https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/php-ha-tutorial.html. Accessed 31 Jan 2021
26. V. Padghan, Amazon S3 tutorial: everything about S3 bucket storage, Great Learning Blog, 10 Sept 2020. https://www.mygreatlearning.com/blog/amazon-s3/
27. Firebase documentation. https://firebase.google.com/docs. Accessed 31 Jan 2021
A Deep Learning-Based Classifier
for Remote Sensing Images
1 Introduction
Classification is the act or the process to group something depending upon its char-
acteristics or features. Classification in image processing plays an important role to
characterize and group the pixels based on their specific features. With the emer-
gence of computer science technology, the classification methods can be applied
in various fields such as remote sensing images and under water images. Remote
sensing images are the part of earth surface which are taken from space, which may
be analog or digital. Classifications in remote sensing images means labeling the
images according to the semantic classes like grass, farm, industry, and resident
[1]. Feature extraction is the important part in image classification problem. Image
classification dataset contains two types of data, train data and test data. The train
dataset is used for learning purpose. In learning process, feature extraction is done
and the extracted features are fed to the machine. With the advancement of machine
learning technology, feature extraction has typically been carried out prior to
classification. But it is quite a lengthy process and also consumes much memory. It is
also difficult to extract features from high-dimensional datasets.
To overcome the high-dimensionality problem, deep learning has been introduced as an
effective method for multi-layer feature extraction [2]. Classification with deep learning
methods is considerably better than with traditional methods such as minimum distance
supervised classification [3], iterative self-organization (ISO) cluster unsupervised
classification [4], support vector machines [5], and random forest classification [6].
Deep learning is based on neural networks, which mimic the human brain. Deep
learning-based applications can also relate to human behavior, predict outcomes, and
are capable of making decisions. Deep learning offers various neural network models
in the field of high-spatial-resolution remote sensing image applications [2]. With the
development of deep learning, an important research demand in the remote sensing
field is the observation of the Earth through remote sensing, which can identify and
classify land use and land cover (LULC) scenes from space [7, 8]. Using the developed
techniques, many Earth surfaces have been brought into focus for research purposes.
The technology can be utilized in many remote sensing applications like urban green
space detection [9], hard target detection [10], urban flood modeling [11], and so on.
The convolutional neural network (CNN) [12] is the best-known and most popular
model used in classification problems. Besides this, the deep belief network (DBN)
[13] and recurrent neural network (RNN) [14] are also popular deep learning
mechanisms. These are the basics of deep learning algorithms, which underpin neural
network architectures like VGG16, ResNet-50, MobileNetV2, etc. These architectures
are capable of learning features from input data automatically and are heavily used in
detection and classification problems.
In this paper, we demonstrate the classification of the remote sensing image dataset
RSSCN7 [15] using deep learning concepts. Deep learning is an advanced technology
that works on the principles of neural networks, whose concepts are derived from the
neurons of the human brain. In the human brain, a neuron consists of multiple dendrites
that provide the input signal to the neuron. Inside the neuron there is a cell body
containing a nucleus, which acts as the functional element, and through the axon the
signal reaches the nerve endings; thus a desired output is generated. An artificial
neuron works in a similar manner. Generally, it consists of three layers: an input layer,
a hidden layer, and an output layer. The input layer provides the input data to the
network, the hidden layer performs the calculations, and the output layer generates
the expected output.
2 Related Works
Scene classification and detection is an important and interesting research area in
remote sensing. The main motive of classification is to assign labels correctly to the
classes to which images belong, according to their features. Many datasets, such as the
UC Merced land use dataset [16], the RSSCN7 dataset [2], the Aerial Image Dataset
(AID) [5], and the SpaceNet dataset [17], are considered in this research field for
classification. Since deep learning is an advanced technology, it can fulfill the
requirements of feature extraction and accuracy performance better than traditional
methods. In the current decade, a large amount of deep learning research has been
done in the area of detection and classification.
Zou et al. [15] proposed a method based on feature selection via deep learning for
remote sensing scene classification using the RSSCN7 dataset. They trained for 100
epochs and obtained a training accuracy of 77%. In the classification test, the confusion
matrix clearly shows that the maximum accuracy of 93.5% is obtained for the forest
class and the minimum of 65% for the river class. Tayara et al. [18] used the NWPU
VHR-10 dataset and presented a uniform one-stage model based on a CNN architecture
to detect objects, comparing it with VGG-16, ResNet-50, and ResNet-101; they
obtained better mean average precision than these models. Cheng et al. [7]
demonstrated different transfer learning methods, such as AlexNet, VGGNet-16, and
GoogLeNet, to determine classification performance using the RESISC45 remote
sensing dataset. The dataset consists of 31,500 images in 45 different classes, each
class containing 700 images of size 256 × 256 pixels. Cheng et al. [19] designed a new
rotation-invariant layer learned with a CNN on the NWPU VHR-10 dataset to improve
the performance of multi-class object detection.
In another work, Scott et al. [20] employed CaffeNet-derived and GoogLeNet-derived
DCNNs on the UCM dataset to evaluate land cover classification performance. They
reported a classification accuracy of 95% for both models at a 90% confidence level
when using the augmented dataset with fine-tuned feature extraction. Zhai et al. [21]
developed two object detection methods: a position-sensitive balancing (PSB)
framework, and a residual network working with a fully connected network that can
detect ten classes of objects. Li et al. [2] focused on classification of urban built-up
areas. They developed a hybrid model named Same Model with a Different Training
Rounding (SMDTR) using CNN and CapsNet, and compared the training, testing, and
classification accuracy of SMDTR-CNN and SMDTR-CapsNet with CNN and CapsNet
on a High Spatial Resolution Remote Sensing (HSRRS) image dataset. They concluded
that the accuracy of SMDTR-CNN is 0.2% higher than CNN, while the accuracy of
SMDTR-CapsNet is 0.6% lower than CapsNet. Pal [6] used a random forest classifier
to classify agricultural areas using a Landsat-7 Enhanced Thematic Mapper (ETM+)
dataset consisting of 7 classes with 2700 training and 2037 testing pixels. The model
achieved a maximum accuracy of 97.31% for the wheat class and a minimum of 81.9%
for the lettuce class. The overall testing accuracy was found to be 88.37%. Yang et al.
[22] proposed a hierarchical deep learning framework to classify land use objects in a
geospatial database. They used an encoder–decoder CNN to classify land use at
multiple levels, hierarchically and simultaneously, and introduced a joint optimizer
that predicts by selecting the hierarchical tuple over all levels for which the joint class score
is maximum. It can provide consistent results across different levels. As a result, they
achieved an overall accuracy of up to 92%.
Many researchers have thus investigated remote sensing using both traditional and
new technologies with different kinds of remote sensing datasets, and some have
developed new hybrid methods that perform more accurately than existing ones. Our
research work is based on deep learning methods, which have been a very advanced
and popular technology for some years.
3 Methods
In this section, we briefly explain the principles, procedures, and methodology used
to achieve the expected results and outcomes of the research, including a flowchart
and the architecture of the entire research work. The procedure of our research work
covers the description of the dataset, the data augmentation procedure, and the design
of the model used to train the data.
In our study, we used the RSSCN7 remote sensing dataset, which was released by
Wuhan University in 2015 [23] and collected from Google Earth (Google Inc.) [7]. A
total of 2800 images are present in the dataset, categorized into 7 different remote
sensing image classes: grass, farm, forest, industry, river, parking, and residential.
Each class has 400 images of size 400 × 400 pixels. Figure 1 shows the different types
of images in the RSSCN7 dataset. From the whole dataset, we used an 80:20 ratio for
training and testing.
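As a sketch of how such an 80:20 split can be realized, assuming the RSSCN7 images are organized into one folder per class (the path name here is an assumption), Keras can perform the split while loading:

import tensorflow as tf

# Assumes RSSCN7/ contains one sub-folder per class (grass, farm, ...).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "RSSCN7", validation_split=0.2, subset="training", seed=42,
    image_size=(400, 400), batch_size=32)
test_ds = tf.keras.utils.image_dataset_from_directory(
    "RSSCN7", validation_split=0.2, subset="validation", seed=42,
    image_size=(400, 400), batch_size=32)
print(train_ds.class_names)  # the 7 RSSCN7 categories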
The convolutional layer is the first layer in a CNN. This layer is responsible for
extracting the features of the input data with the help of a weight matrix. The weight
matrix is also known as a kernel of a given size. The kernel can be thought of as a
slider, because it slides or moves over the whole image at least once to extract its
features. The convolutional layer convolves the input data, calculates the result, and
passes it to the next layer. Consider an input image I(M, N)
where M represents the number of rows and N the number of columns. For
convolution, we need a weight matrix W(m, n), where m and n are the numbers of
rows and columns, respectively. The size of the extracted feature matrix is
(M − m + 1, N − n + 1). Let R(i, j) be the extracted feature matrix. The extracted
component for each (i, j) can be calculated by the formula shown in Eq. (1):
R(i, j) = Σ (m = −a to a) Σ (n = −b to b) W(m, n) · I(i − m, j − n) (1)
where a and b are constant integers. For example, if we have an input image of size
6 × 6 and a weight matrix of size 3 × 3, the kernel fits on the image from the starting
coordinate and calculates the feature for that masked area. It moves over the entire
image, and the process stops after all the features have been calculated. As a result,
we get a 4 × 4 matrix of extracted features of the image. After the convolution
operation is done, the extracted features are passed to the next layer for classification.
Figure 3 shows the function of the convolution operation.
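A minimal NumPy sketch of the sliding-window operation described above: a 3 × 3 kernel over a 6 × 6 image yields the expected 4 × 4 feature map (the kernel values here are arbitrary, and, as in most deep learning frameworks, the kernel is applied without flipping).

import numpy as np

def conv2d_valid(image, kernel):
    # Valid 2D convolution: output shape is (M - m + 1, N - n + 1).
    M, N = image.shape
    m, n = kernel.shape
    out = np.zeros((M - m + 1, N - n + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + m, j:j + n] * kernel)
    return out

image = np.arange(36, dtype=float).reshape(6, 6)  # 6 x 6 input
kernel = np.ones((3, 3)) / 9.0                    # 3 x 3 averaging kernel
print(conv2d_valid(image, kernel).shape)          # (4, 4)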
The pooling layer is placed between the convolutional layer and the fully connected
layer. Pooling is an operation that reduces the image dimensions, i.e., height and
width, while retaining their important characteristics. Sometimes the input images are
so large that we need to reduce the number of trainable parameters; the pooling layer
is responsible for reducing the number of parameters and calculations used in the
network model. Basically, the pooling layer is of two types: max pooling and average
pooling. Max pooling operates faster and has better accuracy than average pooling;
it is a superior operation for selecting invariant features and improving generalization.
Max pooling is the most popular pooling operation due to its better performance and
lower computational cost compared with average pooling. In our proposed work, we
used the max pooling operation for smooth calculation and good performance.
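A minimal NumPy sketch of non-overlapping max pooling, which keeps the strongest activation in each window and, for a 2 × 2 window, halves the feature map's height and width:

import numpy as np

def max_pool2d(x, size=2):
    # Non-overlapping max pooling over size x size windows.
    H, W = x.shape
    x = x[:H - H % size, :W - W % size]  # crop to a multiple of the window size
    return x.reshape(H // size, size, W // size, size).max(axis=(1, 3))

feat = np.array([[1, 3, 2, 4],
                 [5, 6, 1, 2],
                 [7, 2, 9, 1],
                 [3, 4, 5, 6]], dtype=float)
print(max_pool2d(feat))  # [[6. 4.] [7. 9.]]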
The fully connected layer is the last phase in a CNN architecture, following the
convolutional layers. As the name suggests, its neurons are fully connected with the
neurons of the previous layer. This layer uses an activation function, which helps the
network learn even when the model is complexly designed. The activation function in
a neural network model is similar to the human brain's capacity for deciding and
predicting things: it helps decide which data are to be accepted and which data are to
be fired to the next neuron. Every neuron is connected by a link that is assigned a
weight. It accepts the output signal from the previous layer and converts the signal
into an understandable form, which is again fed as input to the next neuron. If a
neuron has not received the proper signal, the network can go back and update its
weights; this method is known as back propagation.
Activation functions have the ability to add non-linearity to a neural network
model. The most commonly used activation functions in neural models are sigmoid,
ReLU, and softmax. Sigmoid is used for binary classification. Softmax is a multi-class
classifier that helps classify when there are more than two classes. ReLU stands for
rectified linear unit; it is a piecewise linear function that outputs the input directly if
it is positive and outputs zero otherwise.
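The three activation functions named above can be sketched in a few lines of NumPy:

import numpy as np

def relu(x):
    # ReLU: pass positive values through, clamp negatives to zero.
    return np.maximum(0.0, x)

def sigmoid(x):
    # Sigmoid: squashes values into (0, 1); used for binary classes.
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x):
    # Softmax: turns scores into a probability distribution over classes.
    e = np.exp(x - np.max(x))  # subtract the max for numerical stability
    return e / e.sum()

scores = np.array([2.0, -1.0, 0.5])
print(relu(scores), sigmoid(scores), softmax(scores))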
4 LeNet-5
LeNet-5 [24] is a classic CNN model, introduced by LeCun in 1998. The model
consists of three convolution layers, designed in such a way that an average pooling
layer is placed after each of the first two convolution layers, while the last convolution
layer has no pooling layer. The first, second, and third convolution layers have 6, 12,
and 120 filters, respectively. Then, fully connected layers with 84 and 7 nodes are
added after the flatten layer. The last fully connected layer uses the softmax function
for classification of the dataset. The architecture of LeNet-5 is shown in Fig. 4.
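A minimal Keras sketch following the description above (6, 12, and 120 filters, average pooling after the first two convolution layers, and fully connected layers of 84 and 7 nodes with softmax); the 5 × 5 kernel size, tanh activations, and 400 × 400 × 3 input shape are assumptions, not details given in the text.

from tensorflow.keras import layers, models

def build_lenet5(input_shape=(400, 400, 3), num_classes=7):
    # LeNet-5-style network as described in the text.
    return models.Sequential([
        layers.Conv2D(6, 5, activation="tanh", input_shape=input_shape),
        layers.AveragePooling2D(2),
        layers.Conv2D(12, 5, activation="tanh"),
        layers.AveragePooling2D(2),
        layers.Conv2D(120, 5, activation="tanh"),  # last conv layer, no pooling
        layers.Flatten(),
        layers.Dense(84, activation="tanh"),
        layers.Dense(num_classes, activation="softmax"),  # 7 RSSCN7 classes
    ])

model = build_lenet5()
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.summary()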
In this section, we divide the results into three parts. The first part of our work is to
design a robust model which can train on the data and give accurate results. In the
second part, we show the classification result with the help of a confusion matrix. In
the third part, we discuss the accuracy of our model. The confusion matrix shows the
percentage of matching classes between train data and test data.
All the implementations are done using Python 3.8.3 and the Jupyter Notebook 6.0.3
environment on a Dell Intel(R) Core(TM) i3-6100U processor with 4 GB RAM.
We briefly discussed our proposed CNN model in the previous section. Figure 6a
shows the training and validation accuracy of the LeNet-5 model, and Fig. 6b shows
its training and validation loss. The highest training accuracy of the LeNet-5 model is
86%, and the highest validation accuracy is 87%. Figure 7 shows the accuracy and
loss curves of our proposed model. The graphs clearly indicate that we obtained a
highest training accuracy of 90% and a highest validation accuracy of 94% over 100
epochs. However, the validation accuracy fluctuated and at one point dropped to 84%.
Such fluctuations of the validation curves occur because of over-fitting of a model or
because of the small size of the validation dataset.
Fig. 6 Accuracy and loss curves of the LeNet-5 model: a training versus validation accuracy, b training versus validation loss
Fig. 7 Accuracy and loss curves of the proposed model: a training versus validation accuracy, b training versus validation loss
The accuracy of the pre-trained LeNet-5 model was found to be 84%, whereas our
proposed method achieved an accuracy of 89%. The accuracy of a model can be
determined from four parameters, i.e., true positive (TP), true negative (TN), false
positive (FP), and false negative (FN). We can also determine precision, recall, and
F1-score from these parameters.
Fig. 8 Confusion matrix of RSSCN7 dataset a using LeNet-5, b using our proposed CNN model
True positive (TP): the correctly predicted positive value, i.e., the model predicted
the actual class correctly as yes.
True negative (TN): the correctly predicted negative value, i.e., the model predicted
the actual class correctly as no.
False positive (FP): a falsely predicted class, where the model predicts a class that
does not belong to the actual class.
False negative (FN): the case where the actual class is yes but the predicted class
is no.
The accuracy of a model determines its testing performance; how effectively our
model works can be known from the accuracy metric. The accuracy of a model is
calculated using the formula shown in Eq. (2):
Accuracy = (TP + TN) / (TP + TN + FP + FN) (2)
However, there are still some problems, such as misclassification of data. In the
future, we will design a model which can give more accurate results and can also
detect objects. In addition, we can use different types of datasets to check our testing
performance.
References
1. Y. Gao, J. Shi, J. Li, R. Wang, Remote sensing scene classification based on high-order graph
convolutional network. Eur. J. Remote. Sens. 54(S1), 141–155 (2021)
2. W. Li, H. Liu, Y. Wang, Z. Li, Y. Jia, G. Gui, Deep learning-based classification methods for
remote sensing images in urban built-up areas. IEEE Access 7, 36274–36284 (2019)
3. M.E. Hodgson, Reducing the computational requirements of the minimum-distance classifier.
Remote Sens. Environ. 25(1), 117–128 (1988)
4. K.-Y. Huang, The use of a newly developed algorithm of divisive hierarchical clustering for
remote sensing image analysis. Int. J. Remote Sens. 23(16), 149–168 (2006)
5. A. Chambolle, An algorithm for total variation minimization and applications. J. Math. Imag.
Vis. 20(1), 89–97 (2004)
6. M. Pal, Random forest classifier for remote sensing classification. Int. J. Remote Sens. 26(1),
217–222 (2007)
7. G. Cheng, J. Han, X. Lu, Remote sensing image scene classification: Benchmark and state of
the art. Proc. IEEE 105(10), 1865–1883 (2017)
8. L. Gómez-Chova, D. Tuia, G. Moser, G. Camps-Valls, Multimodal classification of remote
sensing images: a review and future directions. Proc. IEEE 103(9), 1560–1584 (2015)
9. A. Canetti, M.C. Gárrastazu, P.P. de Mattos, E.M. Braz, S.P. Netto, Understanding multi-
temporal urban forest cover using high resolution images. Urban For Urban Greening 29,
106–112 (2018)
10. A. Milan, An integrated framework for road detection in dense urban area from high-resolution
satellite imagery and Lidar data. J. Geograph. Inf. Syst. 10(2), 175–192 (2018)
11. Y. Wang, A.S. Chen, G. Fu, S. Djordjevi, C. Zhang, D.A. Savić, An integrated framework
for high-resolution urban flood modelling considering multiple information sources and urban
features. Environ. Model. Softw, 107, 85–95 (2018)
12. S.-C.B. Lo, H.-P. Chan, J.-S. Lin, H. Li, M.T. Freedman, S.K. Mun, Artificial convolution neural
network for medical image pattern recognition. Neural Netw. 8(7–8), 1201–1214 (1995)
13. R. Zand, K.Y. Çamsarı, I. Ahmed, S.D. Pyle, C. Kim, S. Datta, R. Demara, R-DBN: A resistive
deep belief network architecture leveraging the intrinsic behavior of probabilistic devices, in
Proceeding of the ACM Great Lakes Symposium VLSI (GLSVLSI) (2018), abs/1710.00249,
pp. 1–8
14. M.Y. Miao, M. Gowayyed, EESEN: End-to-end speech recognition using deep RNN models
and WFST-based decoding, in Proceeding of the Automatic Speech Recognition Understand
(2016), pp. 167–174, Dec 2016
15. Q. Zou, L. Ni, T. Zhang, Q. Wang, Deep learning based feature selection for remote sensing
scene classification. IEEE Geosci. Remote Sens. Lett 12(11), 2321–2325 (2015)
16. Y. Yang, S. Newsam, Bag-of-visual-words and spatial extensions for land-use classification.
in Proceeding of the 18th SIGSPATIAL International Conference on Advances in Geographic
Information Systems (2010), pp 270–279
17. D. Lindenbaum, T. Bacastow, SpaceNet: A remote sensing dataset and challenge series, in
Proceeding of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)
(2018), pp. 1–10, Jul 2018
18. H. Tayara, K.T. Chong, Object detection in very high-resolution aerial images using one-stage
densely connected feature pyramid network. Sensors 18(10), 3341, (2018)
19. G. Cheng, P. Zhou, J. Han, Learning rotation-invariant convolutional neural networks for object
detection in VHR optical remote sensing images. IEEE Trans. Geosci. Remote Sens. 54(12),
7405–7415 (2016)
20. G.J. Scott, M.R. England, W.A. Starms, R.A. Marcum, C.H. Davis, Training deep convolutional
neural networks for land-cover classification of high-resolution imagery. IEEE Geosci. Remote
Sens. Lett. 14(4), 549–553 (2017)
21. H. Zhai, H. Zhang, L. Zhang, P. Li, Cloud/shadow detection based on spectral indices for
multi/hyperspectral optical remote sensing imagery. ISPRS J. Photogramm. Remote Sens.
144, 235–253 (2018)
22. C. Yang, F. Rottensteiner, C. Heipke, A hierarchical deep learning framework for the consistent
classification of land use objects in geospatial databases. ISPRS J. Photogramm. Remote. Sens.
177, 38–56 (2021)
23. S.-C. Hung, H.-C. Wu, M.H. Tseng, Remote sensing scene classification and explanation
using RSSCNet and LIME. Appl. Sci. 10, 6151 (2020)
24. Y. LeCun, L. Bottou, Y. Bengio, P. Haffner, Gradient-based learning applied to document
recognition. Proc. IEEE 86(11), 2278–2324 (1998)
Performance Evaluation of Machine
Learning Algorithms to Predict Breast
Cancer
Abstract Breast cancer is increasing all over the world year by year [2, 13], and it
is a dominant cancer worldwide. Due to the lack of medical facilities, many cases are
not diagnosed early, although early detection would help to lower death rates. This
paper applies machine learning (ML) methods to foretell whether a person is suffering
from the disease or not. It compares the results of various ML algorithms: decision
tree (DT), logistic regression (LR), random forest classifier (RF), LGBM classifier,
support vector machine classifier (SVC), and K-nearest neighbor (KNN). The
above-mentioned ML algorithms are applied to the data after filling missing values,
removing outliers, applying correlation, and applying the SMOTE technique, in order
to find the best model.
1 Introduction
Cancer is a generalized term for a group of diseases which can affect any part of the
body. The rapid growth of abnormal cells which can form a tumor is called cancer;
it slowly affects adjacent parts of the body and spreads to other organs [1]. Breast
cancer (BC) is one of the major cancers among women throughout the world; it
represents the majority of new cases and deaths according to the World Health
Organization (WHO) report, making it a severe health issue in present society [2].
WHO reported nearly 10 million cancer deaths in 2020, with the most common
cancers by new cases being breast (22.6 lakh cases), lung (22.1 lakh cases), colon and
rectum (19.3 lakh cases), prostate (14.1 lakh cases), skin (non-melanoma) (12 lakh
cases), and stomach (10.9 lakh cases) [3]. Among the other types of cancer, breast
cancer is considered to be the main cause of death of women in most countries [4].
It is difficult to overestimate the importance of appropriate breast cancer diagnosis,
as the disease ranks second among all cancers that lead to death in women [5].
Family history, reproductive factors, radiation, obesity, and lifestyle are the factors
that most influence the risk of breast cancer in women. Early diagnosis saves the
patient's life through advanced treatment. As one of the most dangerous cancers, it
has always had a high mortality rate. As per recent statistics, 25% of all new cancer
cases and 15% of total cancer deaths among women throughout the world are due to
BC alone. Figure 1 shows the worldwide distribution of different types of cancer,
shown in different colors by WHO [1]; pink indicates breast cancer. Early detection
of the disease can improve the chance of survival of patients, and it can also enable
timely treatment.
As the population increases continuously, different types of health issues are also
rising exponentially. Nowadays, huge numbers of patients visit hospitals for different
types of treatment. In today's world, the majority of hospitals maintain a hospital
management system to store patient healthcare data. But this data is rarely used in
the decision-making process, and sometimes decisions based on a doctor's experience
alone may not be guaranteed [6, 7]. So, there is a need for an automated disease
prediction system to save patients' lives. Over the past few years, artificial intelligence
and machine learning (ML) techniques have been widely used in the medical field
to build intelligent healthcare systems [8–12]. Researchers are continuously finding
new ways to fight these cancers.
2 Literature Survey
Irregular development of tissue, influenced by the production of estrogen, is a major
reason for the development of cancer. A tumor may be dangerous (malignant) or
non-dangerous (benign) [13]. Malignant tumors
expand to other adjacent organs of the body, whereas benign tumors cannot expand
to other organs; the cells in malignant tumors also divide more briskly and expand
faster than the cells in benign tumors [14]. Early detection of the disease may be
difficult in the initial stage due to the absence of symptoms, but after some clinical
tests it is possible to differentiate exactly between malignant and benign tumors.
Different preprocessing methods (data cleaning, the Synthetic Minority Oversampling
Technique (SMOTE), correlation coefficients, and tenfold cross-validation) have been
applied to breast cancer data to obtain accuracy [8]. Sireesha Moturi et al. [9] used
information gain as a feature selection technique in order to reduce the search space,
and the proposed method was evaluated on NRI Hospital medical data. Amrane et al.
[14] used K-nearest neighbor and Naive Bayes algorithms to construct models and
obtained accuracies of 97.51% and 96.19%, respectively.
Shahidi et al. [15] observed that accuracy scores differ across machine learning
models, indicating that factors such as filling missing values, outlier removal, and
feature selection methods can also influence the ability of models to reach the highest
accuracy [15]. Researchers are continuously finding new ways to identify the best
models.
3 Experimental Setup
To conduct the experiment, the following hardware and software were used: an
Intel(R) Core(TM) i3-4005U CPU @ 1.70 GHz processor with 1 MB cache memory,
a 64-bit Windows 10 operating system, and Jupyter Notebook 6.1.4 for Python 3.
Data collection: The Wisconsin Diagnostic Breast Cancer (WDBC) dataset [16] is
used for breast cancer prediction. It is obtained from the UCI machine learning
repository [16]. The dataset contains 32 attributes and 569 records, and the class
distribution is 357 benign and 212 malignant.
The proposed model for the WDBC dataset is depicted in Fig. 2. After preprocessing,
the dataset is divided into two parts: a train set and a test set. Seventy-five percent of
the records form the train set, and the remaining twenty-five percent form the test set.
The train data is used to construct the model using ML algorithms, and the model is
evaluated using the test data.
Machine learning algorithms are applied to the dataset to create models and to draw
vital conclusions from the data. Some popular data mining algorithms are: 1) decision
tree (DT), 2) logistic regression (LR), 3) random forest classifier (RFC), 4) light
gradient boosting machine classifier (LGBM), 5) support vector machine classifier
(SVC), and 6) K-nearest neighbors (KNN). These algorithms are applied to the WDBC
dataset to evaluate the best method for the identification of breast cancer in the
following scenarios (a model-comparison sketch follows this list):
• After removing missing values
• After removing outliers
• After applying correlation
• After applying SMOTE
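A compact sketch of this evaluation loop under the 75:25 split, using scikit-learn and LightGBM; the CSV file name and column names are assumptions following the usual WDBC layout.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, precision_score, recall_score
from sklearn.tree import DecisionTreeClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.neighbors import KNeighborsClassifier
from lightgbm import LGBMClassifier

df = pd.read_csv("wdbc.csv")              # hypothetical local copy of WDBC
X = df.drop(columns=["id", "diagnosis"])
y = (df["diagnosis"] == "M").astype(int)  # 1 = malignant, 0 = benign
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)

models = {"DT": DecisionTreeClassifier(), "LR": LogisticRegression(max_iter=5000),
          "RF": RandomForestClassifier(), "LGBM": LGBMClassifier(),
          "SVC": SVC(), "KNN": KNeighborsClassifier()}
for name, clf in models.items():
    pred = clf.fit(X_tr, y_tr).predict(X_te)
    print(name, accuracy_score(y_te, pred),
          precision_score(y_te, pred), recall_score(y_te, pred))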
Missing values: Data cleaning should be done on missing or erroneous data. It can
be done by filling missing values manually, or with the attribute mean, the median,
or the most probable value.
A heat map was constructed to find missing values, and no white stripes were
observed in it. So, the dataset does not contain any missing values. All the
above-mentioned algorithms were applied to this data and their accuracy was
evaluated.
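A minimal sketch of such a missing-value check with a heat map, assuming a local copy of the WDBC data named wdbc.csv:

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

df = pd.read_csv("wdbc.csv")          # hypothetical local copy of WDBC
sns.heatmap(df.isnull(), cbar=False)  # white stripes would mark missing cells
plt.title("Missing values in the WDBC dataset")
plt.show()
print(df.isnull().sum().sum())        # 0 -> no missing values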
It can be observed from Tables 1, 2, and 3 that the LGBM classifier achieves the best
accuracy and precision, and the decision tree the best recall, compared with the other
models.
Table 2 Comparing precision of various models of WDBC
Model  Precision
LGBM   1.000000
SVC    0.978261
RF     0.945455
KNN    0.907407
LR     0.896552
DT     0.776119
Box plots were constructed for attributes 1–11, 12–22, and 23–32, and it was observed
that outliers are present in the data. The outliers were removed from the dataset, and
the models were constructed and evaluated again. Figure 7 depicts the box plot for
attributes 1–11. Initially the dataset contained 569 records; after removing outliers it
contains 544 records, meaning 25 records were removed.
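One common box-plot-based criterion for such removal is the 1.5 × IQR rule sketched below; the exact rule and columns the authors used are not stated, so this is only an assumed illustration.

import pandas as pd

def remove_iqr_outliers(df, cols):
    # Drop rows where any listed column falls outside the 1.5*IQR
    # whiskers of its box plot.
    q1 = df[cols].quantile(0.25)
    q3 = df[cols].quantile(0.75)
    iqr = q3 - q1
    outside = (df[cols] < q1 - 1.5 * iqr) | (df[cols] > q3 + 1.5 * iqr)
    return df[~outside.any(axis=1)]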
It can be observed from Tables 4, 5, and 6 that the accuracy and precision of the
LGBM classifier and the recall of the decision tree are the best compared with the other models
Table 4 Comparing accuracy of various models after removing outliers
Model  Accuracy
LGBM   0.975155
RF     0.962733
DT     0.950311
KNN    0.937888
SVC    0.931677
LR     0.919255
Table 5 Comparing precision of various models after removing outliers
Model  Precision
LGBM   0.977778
RF     0.936170
KNN    0.930233
SVC    0.928571
DT     0.867925
LR     0.854167
Table 6 Comparing recall of various models after removing outliers
Model  Recall
DT     0.978723
RF     0.936170
LGBM   0.936170
LR     0.872340
KNN    0.851064
SVC    0.829787
after removing outliers. It can also be observed that the accuracy and recall of the
decision tree classifier increased after removing outliers. Even though the precision
of LGBM decreased compared with the missing-values stage, it is still the best
algorithm with respect to precision.
Correlation is one of the preprocessing methods which can be used to determine the
relationship between features and to identify the relevant attributes. This method is
applicable to continuous data. The original dataset contains 32 attributes, and after
applying correlation, 21 attributes were selected. The above-mentioned ML algorithms
were applied to the WDBC data after correlation. The performance results of the various
Table 7 Comparing accuracy of various models after correlation
Model  Accuracy
LGBM   0.972028
LR     0.972028
RF     0.972028
KNN    0.951049
DT     0.944056
SVC    0.930070
Table 8 Comparing precision of various models after correlation
Model  Precision
LGBM   0.962264
SVC    0.957447
LR     0.945455
RF     0.945455
KNN    0.925926
DT     0.894737
Table 9 Comparing recall of various models after correlation
Model  Recall
LR     0.981132
RF     0.981132
DT     0.962264
LGBM   0.962264
KNN    0.943396
SVC    0.849057
algorithms are mentioned in Tables 7, 8, and 9. It is observed that LGBM gives better
accuracy and precision, while LR gives better recall.
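One common way to realize such correlation-based selection is to drop one feature from every highly correlated pair, as sketched below; the 0.9 threshold is an assumption, since the exact cutoff the authors used to go from 32 to 21 attributes is not stated.

import numpy as np
import pandas as pd

def drop_correlated(features, threshold=0.9):
    # Drop one feature from every pair whose absolute Pearson
    # correlation exceeds the threshold.
    corr = features.corr().abs()
    upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
    to_drop = [c for c in upper.columns if (upper[c] > threshold).any()]
    return features.drop(columns=to_drop)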
Most machine learning techniques ignore class imbalance, which in turn gives poor
performance on the minority class. Simply duplicating minority-class records balances
the dataset, but such examples do not add any new information to the model; SMOTE
instead synthesizes new minority-class examples. The WDBC dataset is balanced
using the SMOTE technique. Before applying SMOTE, class ‘0’ contains 357 records
and class ‘1’ contains 212 records. Figure 8 shows the imbalanced dataset, and Fig. 9
represents the balanced classes after applying SMOTE.
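A minimal sketch of this balancing step with imbalanced-learn, reusing the hypothetical X, y from the earlier sketch:

from collections import Counter
from imblearn.over_sampling import SMOTE

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print(Counter(y), "->", Counter(y_res))  # 357/212 -> 357/357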
Table 10 Accuracy of various models after applying SMOTE
Model  Accuracy
LGBM   1
RF     1
DT     1
LR     0.938547
KNN    0.932961
SVC    0.893855
Table 11 Precision of various models after applying SMOTE
Model  Precision
LGBM   1
RF     1
DT     1
LR     0.916667
SVC    0.907895
KNN    0.905882
Table 12 Recall of various models after applying SMOTE
Model  Recall
LGBM   1
RF     1
DT     1
LR     0.950617
KNN    0.950617
SVC    0.851852
Fig. 10 Accuracy of the models on the WDBC dataset, after correlation, after removing outliers, and after applying SMOTE
By observing Figs. 10, 11, and 12, it can be concluded that the LGBM classifier gives
better accuracy and precision on the WDBC dataset, after removing outliers, after
correlation, and after SMOTE, while DT gives a better recall percentage.
Fig. 11 Precision of the models on the WDBC dataset, after correlation, after removing outliers, and after applying SMOTE
Fig. 12 Recall of the models on the WDBC dataset, after correlation, after removing outliers, and after applying SMOTE
5 Conclusion
This paper presents a disease prediction model which combines filling missing values,
outlier removal, SMOTE, feature selection, and classification algorithms
like the decision tree classifier, random forest classifier, LGBM classifier, logistic
regression, KNN, and SVC. The accuracies vary from one algorithm to another. From
the analysis, it is noticed that the LGBM classifier gives better accuracy and precision
when compared with the other classifiers, because of its faster training speed, lower
memory usage, higher efficiency, and compatibility with large datasets. This model
will be helpful for physicians for quick and better decision making in the process of
disease diagnosis, to enhance patient safety.
References
4. J. Alwidian, B.H. Hammo, N. Obeid, WCBA: Weighted classification based on association rules
algorithm for breast cancer disease. Appl. Soft Comput. 62, 536–549 (2018). ISSN 1568–4946
5. R. Jafari-Marandi, S. Davarzani, M.S. Gharibdousti, B.K. Smith, An optimum ANN-based
breast cancer diagnosis: Bridging gaps between ANN learning and decision-making goals.
Appl. Soft Comput. 72, 108–120 (2018). ISSN 1568-4946
6. M. Sireesha, S. Vemuru, S.N.T. Rao, Frequent item set mining algorithm: a survey. J. Theor.
Appl. Inf. Technol. 96(3), 744–755. ISSN-1992-8645
7. M. Sireesha, S. Vemuru, S.N.T. Rao, Coalesce based binary table: An enhanced algorithm for
mining frequent patterns. ijet 7(1.5), 51–55 (2018)
8. H. Dhahri, E. Al Maghayreh, A. Mahmood, W. Elkilani, M.F. Nagi, Automated breast cancer
diagnosis based on machine learning algorithms. J. Healthc. Eng. 2019, 11, Article ID 4253641.
https://doi.org/10.1155/2019/4253641
9. S.A. Mohammed, S. Darrab, S.A. Noaman, G. Saake, Analysis of breast cancer detection
using different machine learning techniques, in Data Mining and Big Data. DMBD 2020.
Communications in Computer and Information Science eds. by Y. Tan, Y. Shi, M. Tuba, vol
1234 (Springer, Singapore, 2020). https://doi.org/10.1007/978-981-15-7205-0_10
10. M. Sireesha, S. Vemuru, S.N.T. Rao, Optimized feature extraction and hybrid classification
model for heart disease and breast cancer prediction. Int. J. Recent Technol. Eng. 7(6), 1754–
1772. ISSN-2277–3878
11. S.H. Nallamala, P. Mishra, S.V. Koneru, Breast cancer detection using machine learning way.
Int. J. Recent Technol. Eng. 8, 1402–1405 (2019)
12. M. Nilashi, O. bin Ibrahim, H. Ahmadi, L. Shahmoradi, An analytical method for diseases
prediction using machine learning techniques. Comput. Chem. Eng. 106, 212–223. https://doi.org/10.1016/j.compchemeng.2017.06.011
13. S. Moturi, S.N.T. Rao, S. Vemuru, Grey wolf assisted dragonfly-based weighted rule generation
for predicting heart disease and breast cancer. Comput. Med. Imaging Graph. 91, 101936
(2021). ISSN 0895-6111
14. M. Amrane, S. Oukid, I. Gagaoua, T. Ensari, Breast cancer classification using machine
learning, in 2018 Electric Electronics, Computer Science, Biomedical Engineerings' Meeting
(EBBT) (2018), pp. 1–4. https://doi.org/10.1109/EBBT.2018.8391453
15. F. Shahidi, S.M. Daud, H. Abas, N.A. Ahmad, N. Maarop, Breast cancer classification using
deep learning approaches and histopathology image: a comparison study. IEEE Access 8,
187531–187552 (2020). https://doi.org/10.1109/ACCESS.2020.3029881
16. UCI Machine Learning Repository: Wisconsin Diagnostic Breast Cancer Dataset (WDBC
Dataset). https://archive.ics.uci.edu/ml/datasets/Breast+Cancer+Wisconsin+(Diagnostic). Last accessed 14 Aug 2021
Topology Dependent Ant Colony-Based
Routing Scheme for Software-Defined
Networking in Cloud
Abstract The exponential growth of cloud computing has resulted in the rapid
creation of data center networks (DCNs). As data center networks have grown in
popularity, efficient routing has become a critical problem for maximizing network
performance, scalability, and reliability. Traditional link-state algorithms are widely
used in data center networks, but they have long convergence times. Topology-aware
routing algorithms have recently been found to be efficient in DCNs. This paper
proposes an ant colony-based shortest path routing algorithm (AC*) for typical
topologies. To guarantee unified control of the whole network, this approach decouples
the control plane from the data forwarding plane, and topology description language
(TPDL) files are utilized as prior knowledge to establish the initial topology in the
software-defined network (SDN) controllers. Unlike other topology-aware routing
algorithms, the ant colony-based shortest path routing algorithm (AC*) is designed to
work on a variety of standard topologies. Experimental results show that the AC*
algorithm outperforms traditional link-state routing methods.
1 Introduction
Data centers (DCs) act as a central core in modern ICT ecosystems. The vast network
infrastructure of physical machines in data centers is called as data center network
[1] which enables online information services to run continuously from all over
the world. DC systems are rapidly expanding and redesigning themselves for high
reliability and availability to avoid catastrophic failures and system outages [2, 3].
The reliability and availability of a server system in a data center are usually believed
to depend on the reliability and availability of the server systems involved in the
system architecture, as well as on the availability of the physical machines themselves.
Data center networking (DCN) [4, 5] is a new networking model that has the
potential to overcome the shortcomings of today's network infrastructures. First, the
network's control plane, i.e., the control logic, is separated from the routers and
switches that transmit traffic in a data center network (the data plane), breaking their
traditional vertical integration. Second, by separating the data and control planes,
network switches can be reduced to basic forwarding units, and the control logic can
be implemented in a logically centralized controller (or network operating system),
which makes implementation, network evolution, and (re)configuration easier [6].
It is interesting to see how manipulating a system with similar components yields
different outcomes, since each computable node in a DCN interacts with other nodes
through the topology of the network. Even if the total number of components remains
unchanged, proper allocation and networking would greatly increase the system's
efficiency and availability. The impact of subsystem allocation and interconnection
on overall device performance and availability in DCNs has received little attention.
An efficient framework for communication among the physical machines in a data
center network is essential for data center agility and reconfigurability. While
maintaining high reliability/availability, capacity, and throughput, DCNs must be able
to respond to a wide range of application service demands. In data centers, top-of-rack
(ToR) switches are linked to end-of-rack (EoR) switches, which are then linked to
core switches.
While the other aspects and characteristics of a data center network must still be
balanced, using many small, similar commodity switches can dramatically diminish
the building cost of a new data center [6]. Furthermore, if the DC's size needs to be
scaled out, pods can be deployed gradually in a fat-tree topology with no downtime
or rewiring. Moreover, considering good results, the greatest benefit of the fat-tree
topology [7] is that network software does not need to be written to be network-aware.
Cabling difficulty, on the other hand, is its most serious deployment drawback.
In several ways, fat-tree outperforms other DCN topologies; it outperforms BCube
and DCell in performance metrics such as latency and throughput. Fat-tree DCNs,
unlike three-tier topologies, do not need high-end switches or high-speed links,
substantially lowering overall deployment costs [6]. Scalability, route diversity,
latency, throughput, power usage, and cost are the most common metrics used to
evaluate a DCN in practice [8]. The ability of DCNs to withstand multiple failures
(of links, switches, and compute nodes) has recently become an important feature for
supporting long-running online services [8]. To increase the reliability and availability
of DCNs, stochastic models must be used to model and evaluate fault-tolerance
characteristics.
In a data center network, network topology and the routing protocol that goes
with it are important elements of application success. In the following areas, new
topologies and routing protocols have recently been researched to enhance network
performance.
(1) High bandwidth: Many existing data center network applications, for example
MapReduce, Hadoop, and Dryad, are data-intensive and necessitate extensive
intra-network communication. In densely connected data center networks,
common features include high bisection bandwidth and multiple parallel
paths between any two servers. It is important to provide routing protocols that
can take advantage of the network's bandwidth and variety of routes.
(2) Flexibility: The configuration of a data center network will change after it is set
up. According to a new survey, 93% of data center operators in the United States
and 88% of data center operators in Europe plan to expand their facilities. As
a result, a data center network should be able to accommodate incremental
network expansion, such as adding servers and network bandwidth, without
removing or replacing existing switches.
(3) Scalability: In a data center network, routing and forwarding should be based
on small forwarding states in the switches and should be expandable to huge
networks. Since forwarding tables use costly and increasingly fast line-speed
memory, forwarding-table scalability is highly desired in huge enterprise and
data center networks. If the forwarding state is small and does not expand with
the size of the network, we can build large data centers with relatively
inexpensive switches and eliminate the need for switch memory upgrades as
the network expands.
2 Literature Study
Numerous network topologies and routing methods have been proposed in recent
studies. Each has its own network design, routing algorithms, and failure prevention
and recovery mechanisms.
According to the DCN architecture classification in [9], DCNs are divided into three
groups: three-tier [5], fat-tree [6], PortLand [7], and F2Tree [8] are switch-centric
architectures; DCell [9], FiConn [10], and MCube [11] are server-centric architectures;
and Helios [12] is a hybrid/enhanced architecture. Server networks in data centers
are usually built using two switch-centric topologies (three-tier, fat-tree) and two
server-centric topologies (BCube, DCell). Fat-tree (and its variants) may be a good fit
for DCN topologies in mass-produced DCs from companies like Google and Facebook.
BCube is a high-performance and reliable MDC network architecture. Demands
from data-intensive computing, recent technological developments, and special MDC
requirements all influenced the design and implementation of BCube. Rather than
using a switch-oriented approach, the BCube network architecture uses a server-centric
approach: it works with commodity switches and places intelligence on the MDC
servers. Between any two servers, BCube provides several parallel short paths. BSR
is the source routing protocol used by BCube [10].
DCell can use a robust and inexpensive single-path unicast routing algorithm due to
its recursive design. DCell routing was designed with a divide-and-conquer strategy
in mind. This method computes the intermediate link (n1, n2) that connects the two
DCell_{k-1}s before calculating the path from src to dest in a DCell_k. The next step
is to compute the two sub-paths, from src to n1 and from n2 to dest. The final route in
DCell routing combines the two sub-paths and (n1, n2) [11]. There has been a lot of
work on the modernization of routing protocols. Pre-computed backup paths, such as
MPLS fast-reroute (see RFC 4090) and IP restoration [12], are not a good idea for
huge, scaled data center networks. FCP [13] is a modern routing model that suggests
finding a working route without knowing the entire topology. Some studies examined
how to improve efficiency by changing timing parameters; this, however, is not a
significant change that applies to SRP. No sufficient work has been done on routing
protocols to replace OSPF in data center networks. Hedera's control loop consists of
three simple steps. It starts by looking for massive flows at the edge switches. It then
calculates good paths for large flows by estimating their natural demand and using
placement algorithms. Finally, the switches are wired with these paths [14].
3 System Architecture
To achieve better efficiency with dynamic routing, the routing strategy is continuously
updated based on the existing state of the network. However, this usually requires
many operations and is thus more costly. As shown in Fig. 1, data center networks
typically have regular coordinates, addressing, and connections between nodes that
can be expressed recursively. Designers must have a well-defined description method
to fully utilize them.
This section introduces a method for formally describing standard network topologies.
As shown in Fig. 2, the regularities of a topology are illustrated more explicitly in a
formalized way with this approach. A domain-specific language is also important: a
topology description language, which acts as a link between the formalized formulas
and routing programmers, is used to obtain more intuitive and parseable forms of
topology description.
For data center networks, as shown in Fig. 3, the AC* algorithm is a TPDL-based
SDN routing scheme. It focuses on lowering topology discovery overheads and
replacing the conventional shortest path algorithm with a more robust route selection
scheme. The use of TPDL as prior knowledge is one of the core concepts in AC*.
By leveraging the knowledge in TPDL, the controller creates a simple environment
for ensuring that the entire network works. With additional components, AC* can
also accommodate topology changes and failures.
Data center networks are, by default, more stable and less changeable. Because of
these characteristics, only a small amount of variable information, such as link
failures, is obtained during system startup, while the vast majority of static and less
changeable topology information is provided before system startup. When compared
to a typical LLDP topology detection mechanism, total device overheads in SDN are
significantly lower.
The TPDL parser is in charge of processing TPDL input files and sending the results
to the topology manager. The topology manager creates a topology from the TPDL
data and updates it while the controller is running, keeping it consistent with the
current condition of the network. As its name implies, the routing calculator selects
routes using the AC* algorithm; it finds the best path for a flow when the OpenFlow
module sends a packet-in request. Any topology changes are immediately
communicated to the fault processor by the topology manager.
The AC* switch architecture is given in Fig. 4. A fault detector (FD) module is added
to the OpenFlow switches, which uses a hello message processor (HMP) to detect
faults between switches and their neighbors. The HMP sends hello messages carrying
the switch's identifier to all connected ports on a regular basis.
Furthermore, all received hello packets are forwarded to the HMP so that the details
of the neighbors can be saved. The FD watches for hello messages from the neighbors
to see if there is a problem between them. If a hello message is not received within a
predetermined time, a fault report is sent to the controller through the OpenFlow
module.
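A minimal sketch of such timeout-based neighbor monitoring; the timeout value is an assumption, and the report step is only a stub for the OpenFlow fault message.

import time

FAULT_TIMEOUT = 3.0  # seconds without a hello before a fault is reported (assumed)

class FaultDetector:
    # Record the last hello seen per port and report ports that fall silent,
    # mirroring the FD/HMP interplay described above.
    def __init__(self):
        self.last_hello = {}  # port -> (neighbor id, timestamp of last hello)

    def on_hello(self, port, neighbor_id):
        self.last_hello[port] = (neighbor_id, time.time())

    def check(self):
        now = time.time()
        for port, (neighbor, seen) in self.last_hello.items():
            if now - seen > FAULT_TIMEOUT:
                self.report_fault(port, neighbor)

    def report_fault(self, port, neighbor):
        # Stub: in AC*, this would be a fault report sent to the
        # controller through the OpenFlow module.
        print("fault: neighbor %s on port %d is silent" % (neighbor, port))

fd = FaultDetector()
fd.on_hello(1, "switch-2")
fd.check()  # silent ports (if any) are reported to the controller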
In the AC* implementation, normal Open vSwitch switches [15] are transformed into
AC* switches using the proposed architecture. AC* controllers and switches are
compatible with traditional SDN implementations, allowing AC* to use all of SDN's
existing algorithms and infrastructure. The traffic engineering methods described in
[16] and the QoS algorithms described in [17], for example, can still be used on AC*
with little modification, and they can benefit from AC* topology information.
Step 1: Assume P1's path is (l1, m1, n1) and P2's path is (l2, m2, n2). With m1
and m2, a crossover operation is performed; the new route obtained is
P3: (l1, m2, m1, n1).
Step 2: The new route P3 is calculated by deleting the duplicated switch.
Step 3: Another new path, P4, is obtained in the same way.
Step 4: When the fitness function is applied to P1, P2, P3, and P4, the best
direction is found.
Step 5: Mutation—The mutation operation is dependent on the mutation proba-
bility that has been predetermined. For the mutation process, two points from the
off-spring’s paths are chosen at random.
For instance, in the case of g1,

g1 = (2, 4 | 7, 6, 5 | 8, 9, 3)    (1)
void crossover() {
    Random rnd = new Random();
    // Choose a random crossover point.
    int crossoverPoint = rnd.nextInt(population.individuals[0].geneLength);
    // Swap gene values between the two fittest parents up to the crossover point.
    for (int k = 0; k < crossoverPoint; k++) {
        int tmp = fittest.genes[k];
        fittest.genes[k] = secondFittest.genes[k];
        secondFittest.genes[k] = tmp;
    }
}
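The crossover() above operates on fixed-length gene arrays. The path-level crossover of Steps 1 to 3 (splice in the crossing switch, then delete duplicated switches) can be sketched separately as follows; the integer switch IDs and list representation are hypothetical stand-ins for the paper's path structures.

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Sketch of Steps 1-3: splice P1 = (l1, m1, n1) and P2 = (l2, m2, n2) at the
// crossing switch, then remove duplicated switches (hypothetical representation).
class PathCrossover {
    // Child: head of p1 up to cut, p2's switch at cut, then the remainder of p1.
    static List<Integer> crossover(List<Integer> p1, List<Integer> p2, int cut) {
        List<Integer> child = new ArrayList<>(p1.subList(0, cut));
        child.add(p2.get(cut));                   // insert the crossing switch from p2
        child.addAll(p1.subList(cut, p1.size())); // keep the remainder of p1
        // Step 2: delete any duplicated switch while preserving order.
        return new ArrayList<>(new LinkedHashSet<>(child));
    }

    public static void main(String[] args) {
        List<Integer> p1 = List.of(1, 10, 3);   // stand-in for (l1, m1, n1)
        List<Integer> p2 = List.of(2, 20, 4);   // stand-in for (l2, m2, n2)
        System.out.println(crossover(p1, p2, 1)); // [1, 20, 10, 3], i.e., P3
        System.out.println(crossover(p2, p1, 1)); // Step 3: the other child, P4
    }
}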
1. The optimal path has m switches, and the mutation frequency is set in the
simulation part.
2. Two natural numbers, n1 and n2, are generated at random (n2 < n1 < m).
3. A new path, Pn, is obtained by swapping the switches at locations n1 and n2 of
the optimal path, P0.
4. Calculate the fitness of P0 and Pn and choose the path with the lowest value as
the best. After the AC* operations, the best route is determined, and packets are
sent along it (a sketch of this path-swap mutation is given after the code below).
void mutation1() {
    Random rnd = new Random();
    // Choose a random mutation point and flip the fittest parent's gene there.
    int mutationPoint = rnd.nextInt(population.individuals[0].geneLength);
    if (fittest.genes[mutationPoint] == 0) {
        fittest.genes[mutationPoint] = 1;
    } else {
        fittest.genes[mutationPoint] = 0;
    }
    // Do the same, at an independently chosen point, for the second-fittest parent.
    mutationPoint = rnd.nextInt(population.individuals[0].geneLength);
    if (secondFittest.genes[mutationPoint] == 0) {
        secondFittest.genes[mutationPoint] = 1;
    } else {
        secondFittest.genes[mutationPoint] = 0;
    }
}
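The mutation1() above flips individual genes. The path-swap mutation described in Steps 1 to 4 can be sketched as follows, where the integer switch array and the placeholder fitness() are hypothetical stand-ins for the paper's structures.

import java.util.Arrays;
import java.util.Random;

// Minimal sketch of the swap-based mutation from Steps 1-4 (hypothetical types).
class PathMutation {
    static final Random RND = new Random();

    // Swap the switches at two random positions n1 > n2 of the optimal path P0,
    // then keep whichever of P0 and the mutant Pn has the lower (better) fitness.
    static int[] mutate(int[] p0, double mutationProbability) {
        if (RND.nextDouble() >= mutationProbability) return p0; // no mutation this round
        int m = p0.length;                 // the optimal path has m switches
        int n1 = 1 + RND.nextInt(m - 1);   // two natural numbers with n2 < n1 < m
        int n2 = RND.nextInt(n1);
        int[] pn = Arrays.copyOf(p0, m);
        int tmp = pn[n1]; pn[n1] = pn[n2]; pn[n2] = tmp; // swap switches at n1 and n2
        return fitness(pn) < fitness(p0) ? pn : p0;      // lowest fitness value wins
    }

    // Placeholder fitness: e.g., total path cost (an assumption, not the paper's function).
    static double fitness(int[] path) {
        double cost = 0;
        for (int s : path) cost += s;  // stand-in for link-cost accumulation
        return cost;
    }
}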
4 Experiments
To explain how AC* works, we compare it to various routing schemes. First, OSPF
was chosen because it is one of the most widely used link-state routing algorithms in data
center networks. Since AC* is based on SDN, and SDN is increasingly being extended
to data center networks, Floodlight is used as a representative of conventional
SDN. Additionally, since DCell, like fat-tree and BCube, is a standard topology-aware
architecture, we use its topology and routing algorithm [18, 19]. The controller is used
in our tests to ensure that AC* is feasible and performs well. The Quagga [17] routing
suite was used to create the OSPF networks, and the standard SDN networks are based
on Open vSwitch and Floodlight.
A comparison of the path computation effectiveness of the AC* algorithm with the
traditional OSPF algorithm is shown in Fig. 5. Fat-tree network topologies
with scales of k = 4, k = 8, k = 12, and k = 16 are used for path computation
efficiency testing.
During the experiment, AC* completes the first packet transmission in less
than one second. Both OSPF and Floodlight, on the other hand, take tens
of seconds to complete network synchronization and detection. DCell takes about
the same time as AC* to converge in a small network but, like Floodlight, takes
much longer in a larger network.
First packet delay is used as the estimation criterion in the end-to-end delay evaluation
of AC*, Floodlight, and DCell. The experiment uses three traffic models (one-to-all,
one-to-one, and all-to-all), which serve as representative data center traffic
scenarios. AC* and Floodlight use the fat-tree topology with k = 4, which
means the number of flows in the various traffic models is 1, 15, and 240, respectively.
DCell uses a 20-server DCell network with flow numbers of 1, 19, and 380, as
shown in Fig. 7.
After convergence, the network's nodes send out the first packet of each flow at
the same time. The end-to-end delay is obtained by halving the returned round-trip
time (RTT).
5 Conclusion
This paper proposes AC*, a topology-aware routing algorithm for standard network
topologies based on software-defined networks. The algorithm generates and
implements a routing scheme using SDN and TPDL technologies, with highly
efficient routing calculations and fault-handling mechanisms. The AC* algorithm
incorporates discovery, crossover, and mutation operations to increase path search
speed and capability. In contrast to OSPF, Floodlight, and DCell, the AC* algorithm's
routing computation is quicker, and the network convergence period is shorter. The
AC* algorithm is thus expected to better leverage the capabilities of data center
networks (DCNs) and increase network performance.
References
1. R. Sahba, A brief study of software-defined networking for cloud computing, in 2018 World
Automation Congress (WAC) (IEEE, 2018), pp. 1–5
2. A.A. Bahashwan, M. Anbar, N. Abdullah, New architecture design of cloud computing using
software-defined networking and network function virtualization technology, in International
Conference of Reliable Information and Communication Technology (Springer, Cham, 2019),
pp. 705–713
3. S. Jain, A. Kumar, S. Mandal, J. Ong, L. Poutievski, A. Singh, S. Venkata, J. Wanderer, J.
Zhou, M. Zhu, J. Zolla, U. Hölzle, S. Stuart, A. Vahdat, B4: experience with a globally-
deployed software defined WAN, in Proceedings of the ACM SIGCOMM 2013 Conference on
SIGCOMM (SIGCOMM '13) (ACM, New York, 2013), pp. 3–14
Abstract Recent practical experiments have demonstrated high performance rates due
not only to recent hardware technological advancements but also to newly proposed
preprocessing pipelines inspired by domain-specific techniques, for high-speed image
recognition systems, image analysis applications, higher-quality tracking systems,
and decision-based applications approaching human performance. The performance
has been obtained mainly through learning from previous experiments, integrating
multi-step approaches, parallel processing, and transfer learning. In this paper, we
describe some computational complexity results based on practical projects in facial
image analysis and recognition.
1 Introduction
Recently, high performance rates in image analysis have been reported. Not only
have hardware technological advancements made a big contribution, but so have
newly proposed preprocessing pipelines inspired by domain-specific techniques. In
this paper, we describe some computational complexity results both in theory and in
applications. First, the framework of transfer learning is discussed, and computa-
tional complexity models in machine learning (ML) are examined. The most popular
convolutional neural networks (CNNs) are discussed from the point of view of static
complexity, and the usage of transfer learning is illustrated in the context of solving
A.-S. Moloiu
University 'Politehnica' of Bucharest, Bucharest, Romania
G. Albeanu
'Spiru Haret' University, Bucharest, Romania
e-mail: g.albeanu.mi@spiruharet.ro
F. Popentiu-Vlădicescu (B)
University 'Politehnica' of Bucharest and Academy of Romanian Scientists, Bucharest, Romania
e-mail: popentiu@imm.dtu.dk
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
J. Nayak et al. (eds.), Computational Intelligence in Data Mining, Smart Innovation,
Systems and Technologies 281, https://doi.org/10.1007/978-981-16-9447-9_27
facial analysis tasks. Some concepts and notations are well known in the literature [1–4],
so only short remarks will be given.
3 Complexity Models
The learning objective in ML is to find the best hypothesis if possible, and
otherwise a probably approximately correct hypothesis [4]. In order to
measure the learning capability, the empirical risk minimization (ERM) principle is
used to estimate a hypothesis risk function. Depending on the nature of the labeling
set (discrete or continuous), the risk measure is computed by a summation or an
integration procedure, respectively. The loss function (or cost function) used
during risk minimization can be:
• Mean squared error (MSE: take the differences between the obtained predictions
and the known values, square them, and average over the whole dataset);
• The likelihood function (the sum of each input sample multiplied by its predicted
probability);
• The cross-entropy loss (a straightforward modification of the likelihood function
using the logarithm of the predicted probability).
Other loss functions can be used, such as: Huber loss, mean absolute error loss,
Kullback–Leibler divergence loss [18], MSE with added penalty terms [1], and ArcFace
[19].
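As a concrete illustration of the MSE and cross-entropy items above, a minimal sketch over a small batch of hypothetical predictions follows.

// Minimal sketch: MSE and binary cross-entropy over a batch (hypothetical data).
class LossDemo {
    // MSE: differences between predictions and known values, squared, averaged.
    static double mse(double[] yTrue, double[] yPred) {
        double sum = 0;
        for (int i = 0; i < yTrue.length; i++) {
            double d = yTrue[i] - yPred[i];
            sum += d * d;
        }
        return sum / yTrue.length;
    }

    // Cross-entropy: negative log of the probability predicted for the true class.
    static double crossEntropy(int[] yTrue, double[] pPred) {
        double sum = 0;
        for (int i = 0; i < yTrue.length; i++) {
            double p = yTrue[i] == 1 ? pPred[i] : 1 - pPred[i];
            sum -= Math.log(Math.max(p, 1e-12)); // clip to avoid log(0)
        }
        return sum / yTrue.length;
    }

    public static void main(String[] args) {
        System.out.println(mse(new double[]{1, 0}, new double[]{0.9, 0.2}));       // 0.025
        System.out.println(crossEntropy(new int[]{1, 0}, new double[]{0.9, 0.2})); // ~0.164
    }
}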
Using the likelihood function to estimate the model parameters (weights) of neural
networks implies that the model improves with increasing size of the training dataset.
Current practice recommends MSE for regression problems and cross-entropy for
both binary and multi-class classification problems. A natural choice of loss function
(when false positives and false negatives have similar costs) is the 0–1 loss function,
which is 0 if the predicted classification equals the true class and 1 otherwise.
The computational complexity of learning (CCL) deals with the evaluation of learning
algorithms that estimate a hypothesis (a model and a set of computed parameters)
under a computational paradigm: a learning algorithm receives a training set and
outputs a hypothesis, which is a program/model. According to Mitchell [1], CCL may
answer questions about the sample complexity (the minimum size of the training pool
for achieving convergence with high probability), the computational complexity (the
training time or the effort required to find the hypothesis), the expressiveness (what
kinds of models are best identified), and the mistake bound (the number of misclassified
patterns before achieving convergence). Depending on the family of hypotheses, such
answers can be obtained quickly, while for learning algorithms in facial analysis it is
difficult to provide such indicators.
In the statistical learning view, the basic assumption is that independent and
identically distributed pairs are used both for training and for testing. The generalization
error, which depends on the sample size and the model complexity, is measured by the
gap between training error and test error. If a bound on the gap can be given for the
source domain, the principal problem is to control the target error. Some basic
estimators are outlined below after describing some concepts and notation. The probably
approximately correct (PAC) framework of algorithm analysis is used.
Consider a learning problem (Z, H, c), with input space Z, hypothesis set H, and cost
function c, and let f : (0, 1) × (0, 1) → N. A learning algorithm A has computational
complexity O(f) if, for all ε and δ in (0, 1), the complexity of A is below Kf(ε, δ) for
some positive constant K, and the output hA of A is approximately correct, that is,
hA is the best hypothesis with probability 1 − δ: Prob(Error(hA) ≤ ε) ≥ 1 − δ [3, 4, 20].
Bounds on the generalization error, the sample complexity, and the computational
complexity have been identified for various learning problems, taking into account the
size of Z, the size of H, ε, δ, and the Vapnik–Chervonenkis (VC) dimension [1,
2, 21]. Taking into account the size of the hypothesis space, the number of training
samples necessary to achieve the best (ε, δ) hypothesis can be overestimated by
(log(|H|) + log(1/δ))/ε.
In the figures that follow, we show the evolution of the number of training samples,
as a function of the size of the hypothesis space, for confidence levels of 95% (Fig. 1),
97% (Fig. 2), and 99% (Fig. 3), when the approximation error is 0.01.
These figures explain the necessity of using large training samples, according to the
hypothesis space, for any modeling technique. Therefore, the modeling approach
can affect the performance of the analysis framework.
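As a small illustration of this overestimate, the sketch below evaluates (log(|H|) + log(1/δ))/ε at the three confidence levels used in Figs. 1, 2, and 3; the hypothesis-space size is hypothetical, and the natural logarithm is assumed as the paper's convention.

// Sketch: sample-size overestimate (log|H| + log(1/delta)) / epsilon.
class SampleBound {
    static double bound(double sizeH, double epsilon, double delta) {
        return (Math.log(sizeH) + Math.log(1.0 / delta)) / epsilon;
    }

    public static void main(String[] args) {
        double eps = 0.01;                        // approximation error from the text
        for (double delta : new double[]{0.05, 0.03, 0.01}) { // 95%, 97%, 99% confidence
            System.out.printf("|H|=1e6, delta=%.2f -> %.0f samples%n",
                    delta, bound(1e6, eps, delta));
        }
    }
}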
The VC dimension of a learning model is the size of the largest subset of points
from the input space Z that can be shattered by the model. In many cases, the VC
dimension is related to the number of parameters of the learning model. The case
of piecewise linear neural networks is described in [22]. Consider a multilayer perceptron
(MLP) with n inputs and two layers, a hidden layer with m processing units and
an output layer with one unit (m + 1 neurons in total). The number of parameters
(weights) is (n + 1)m + m + 1, which equals the VC dimension. If the training error
is ε, a bound on the sample complexity is μ ln(μ), where μ = 32((n + 1)m + m − 1)/ε.
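Under the relations just stated, the parameter count/VC dimension and the μ ln(μ) sample bound can be computed directly; the sketch below uses hypothetical values for n, m, and ε.

// Sketch: VC dimension (n+1)m + m + 1 for a two-layer MLP and the
// sample-complexity bound mu*ln(mu), with mu = 32((n+1)m + m - 1)/epsilon.
class MlpVcBound {
    public static void main(String[] args) {
        int n = 100, m = 50;          // hypothetical inputs and hidden units
        double eps = 0.01;            // training error from the text
        int vc = (n + 1) * m + m + 1; // number of weights = VC dimension
        double mu = 32.0 * ((n + 1) * m + m - 1) / eps;
        System.out.println("VC dimension: " + vc);                // 5101
        System.out.printf("Sample bound: %.3e%n", mu * Math.log(mu));
    }
}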
If the hypothesis space H has VC dimension d, the sample size is n, and
δ is given, then an upper bound on the generalization error is sqrt((d log(n) +
log(2/δ))/n); that means n should be increased enough to decrease the generalization
error, which behaves like log(n)/n. Also, according to [1], a good bound on the number
of samples is (4 log(2/δ) + 8d log(13/ε))/ε, while below (d − 1)/(32ε) samples there is
no chance of a small error. From the above relations, it is clear that a large amount
of input data is necessary during training, because both ε and δ are small positive
numbers near zero. In the figures that follow, we show the evolution of the number
of training samples used in neural network training for confidence levels of 95%
(Fig. 4), 97% (Fig. 5), and 99% (Fig. 6), when the approximation error is 0.01.
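A small numeric sketch of this bound, sqrt((d log(n) + log(2/δ))/n), follows, showing the log(n)/n decrease; the VC dimension and δ used here are hypothetical.

// Sketch: generalization-error upper bound sqrt((d*log(n) + log(2/delta))/n).
class GenBound {
    public static void main(String[] args) {
        int d = 5101;                 // VC dimension from the MLP example above
        double delta = 0.05;          // 95% confidence
        for (int n : new int[]{10_000, 100_000, 1_000_000}) {
            double bound = Math.sqrt((d * Math.log(n) + Math.log(2 / delta)) / n);
            System.out.printf("n=%d -> bound=%.4f%n", n, bound); // shrinks like log(n)/n
        }
    }
}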
These figures show the evolution of the number of training samples required
to reach a performance level, as a function of n and m. This analysis is valuable for
researchers estimating, through practical strategies, the number of inputs needed to
obtain a high performance level. Figures 4, 5, and 6 describe a logarithmic evolution
depending on the number of inputs and the number of internal neurons. This analysis
is valid only for multilayer neural networks.
Other measures not considered here, such as Kolmogorov complexity [23] and
Rademacher complexity [24], can be used when neural networks are used as learning
models.
In the following, the main interest is in neural network (NN) learning models,
taking other aspects into account. There are many types of NNs, and some tricks
to accelerate NN training [25] use fast-to-compute activation functions (e.g., ReLU),
over-specification, and regularization. NN learnability depends on the class of
hypotheses, the number of layers, the activation function, and the number of neurons.
In this respect, positive and negative results are given in [25]. According to Hu
et al. [26], the model complexity of neural networks can be measured through their
activation functions and architecture. For AlexNet, the number of weights is 61 M,
while the number of MACs is 724 M. Some techniques were introduced by AlexNet:
ReLU nonlinearity, local response normalization, and splitting weights into groups.
An adapted AlexNet was used by Cîrlescu et al. [30] for emotion recognition
(eight output classes).
The third important CNN is VGG-16 (there is also VGG-19). VGG-16 has 13
ConV layers and three FC layers, requiring 138 M weights to be learnt and 15.5 G
MACs.
Another CNN worth mentioning is ResNet-50, consisting of 53 ConV layers and one
FC layer, with 25.5 M weights and 3.9 G total MACs. Let us also consider MobileNet,
a class of small and less compute-intensive architectures for vision applications
based on depthwise separable convolutions [31]. MobileNetV1 lightweight
(486,784 parameters) and MobileNetV1 deep (1,088,576 parameters), architectures
constructed from MobileNetV1 through an ablation process, were used by
Issa et al. [32] for facial recognition (identification) and verification. The architectures
were tested using the MSCeleb1M database in TFRecords format.
Considering the computational effort and top-5 error indicators for ResNet-50,
AlexNet, and VGG-16, a correlation of −0.99455 was found. This is consistent
with practical experiments, in which high processing effort is required to obtain
reduced error.
Facial analysis with ML techniques requires solving tasks such as face detection,
facial feature detection, and facial recognition. After detection, face alignment is
an important preprocessing task consisting of geometric transformations such as
translation, scaling, and rotation. Facial feature detection requires localization tasks
for the eyes, mouth, and nose. Not only face recognition (for verification tasks
with high accuracy), but also gender recognition and the affective state of the person
(anger, anticipation, disgust, fear, joy, sadness, surprise, and trust) can be established
with some accuracy [30].
In order to describe the performance of a face verification system, genuine and
impostor pairs are considered and used for testing [32]. A genuine pair consists of
two samples that pertain to the same user, while an impostor pair is constructed from
samples acquired from different users. The following four direct indicators belong
to the confusion matrix: (1) true positives/true accepts (TP), the number of authorized
individuals who claim access to the system and are classified correctly; (2) true
negatives/true rejects (TN), unauthorized persons trying to impersonate a user and
classified correctly; (3) false positives/false accepts (FP), unauthorized individuals who
claim access to the system and are classified incorrectly; and (4) false negatives/false
rejects (FN), persons who have the right of access but are rejected by the system. They
are used to compute the false acceptance rate (FAR) and the false rejection rate (FRR).
FAR is the probability that a biometric system authorizes an unauthorized person.
FRR is the probability that a biometric system denies access to an authorized person.
The work of Issa et al. [32] demonstrated performance in user verification
by discriminating between genuine and impostor pairs.
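From the four confusion-matrix counts above, FAR and FRR follow directly as FP/(FP + TN) and FN/(FN + TP); a minimal sketch with hypothetical counts is given below.

// Sketch: FAR = FP / (FP + TN), FRR = FN / (FN + TP) from confusion-matrix counts.
class VerificationRates {
    public static void main(String[] args) {
        int tp = 950, tn = 930, fp = 70, fn = 50;  // hypothetical counts
        double far = (double) fp / (fp + tn);      // impostor pairs wrongly accepted
        double frr = (double) fn / (fn + tp);      // genuine pairs wrongly rejected
        System.out.printf("FAR=%.3f FRR=%.3f%n", far, frr); // FAR=0.070 FRR=0.050
    }
}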
To speed up the process, both pre-trained networks and some transfer learning
procedures can be used:
1. Freezing any pre-trained network for image understanding and adding two FC
layers, the last one being responsible for binary classification (gender recog-
nition) or multi-class discrimination (eight classes for emotion recognition,
or n + 1 classes when n different image prototypes are available);
2. Adapting the loss function, e.g., to ArcFace;
3. Preparing auxiliary data and augmenting the dataset (adding a mustache, adding
glasses, small zooming, small shifting, small rotations);
4. Oversampling small classes and under-sampling large classes.
Experiments were carried out using in-house applications written in Python, using the
TensorFlow library and GPU hardware. The following qualitative aspects should be
mentioned:
– The use of augmented data added performance to the recognition systems,
which were able to generalize much better;
– The modified architectures are faster, consume less memory, and show compa-
rable performance to state-of-the-art models.
5 Concluding Remarks
The transfer learning approaches used in machine learning projects proved efficient
for the short-time implementation of new systems. In order to guarantee a performance
rate, it is necessary to estimate the size of the training data both in the initial phase and
in the transfer-based phase. In this paper, an analysis inspired by the PAC paradigm was
conducted for image analysis and recognition tasks. The use of augmented
data added performance to the recognition systems, which were able to generalize better.
Moreover, much better results were obtained through modified architectures inspired
by transfer learning approaches.
References
1. T.M. Mitchell, Machine Learning (McGraw-Hill, Inc., New York, 1997), pp. 870–877
2. S. Shalev-Shwartz, S. Ben-David, Understanding Machine learning: From Theory to Algo-
rithms (Cambridge University Press, Cambridge, 2014). https://doi.org/10.1017/CBO978110
7298019
3. A. Engel, Complexity of learning in artificial neural networks. Theoret. Comput. Sci. 265,
285–306 (2001)
4. R. Gupta, T. Roughgarden, A PAC approach to application-specific algorithm selection. SIAM
J. Comput. 46(3), 992–1017 (2017)
5. K. Weiss, T.M. Khoshgoftaar, D. Wang, A survey of transfer learning. J. Big Data 3, 9 (2016).
https://doi.org/10.1186/s40537-016-0043-6
6. O. Day, T.M. Khoshgoftaar, A survey on heterogeneous transfer learning. J Big Data 4, 29
(2017). https://doi.org/10.1186/s40537-017-0089-0
7. Y. Jang, H. Lee, S.J. Hwang, J. Shin, Learning what and where to transfer, In Proceedings of
the 36th International Conference on Machine Learning. (PMLR 97, 2019), pp. 3030–3039
8. H.-W. Ng, V.D. Nguyen, V. Vonikakis, S. Winkler, Deep learning for emotion recognition
on small datasets using transfer learning, In Proceedings of the International Conference on
Multimodal Interaction. (ICMI, ACM, 2015), pp. 443–449. https://doi.org/10.1145/2818346.
2830593
9. C. Florea, L. Florea, C. Vertan, M. Badea, A. Racoviteanu, Annealed label transfer for face
expression recognition. British Mach. Vis. Conf. (BMVC), Art.Id 321, 1–12 (2019). https://
bmvc2019.org/wp-content/uploads/papers/0321-paper.pdf
10. K. Feng, T. Chaspari, A review of generalizable transfer learning in automatic emotion
recognition. Front. Comput. Sci. (2020). https://doi.org/10.3389/fcomp.2020.00009
11. J. Luttrell, Z. Zhou, Y. Zhang, C. Zhang, P. Gong, B. Yang, R. Li, A deep transfer learning
approach to fine-tuning facial recognition models, in Proceedings of the 13th IEEE Conference
on Industrial Electronics and Applications (2018), pp. 2671–2676. https://doi.org/10.1109/
ICIEA.2018.8398162
12. M. Badea, C. Florea, L. Florea, C. Vertan, Can we teach computers to understand art? Domain
adaptation for enhancing deep networks capacity to de-abstract art. Image Vis. Comput. 77,
21–32 (2018). https://doi.org/10.1016/j.imavis.2018.06.009
13. B. Maschler, S. Kamm, M. Weyrich, Deep industrial transfer learning at runtime for image
recognition. Automatisierungstechnik 69(3), 211–220 (2021). https://doi.org/10.1515/auto-
2020-0119
14. W.M. Kouw, L.J.P. van der Maaten, J.H. Krijthe, M. Loog, Feature-level domain adaptation.
JMLR 17(171), 1–32 (2016)
15. J. Yosinski, J. Clune, Y. Bengio, H. Lipson, How transferable are features in deep neural
networks? NIPS’14, in Proceedings of the 27th International Conference on Neural Information
Processing Systems, vol. 2 (2014), pp. 3320–3328
16. H.-W. Lee, N. Kim, J.H. Lee, Deep neural network self-training based on unsupervised learning
and dropout. Int. J. Fuzzy Logic Intell. Syst. 17(1), 1–9 (2017). https://doi.org/10.5391/IJFIS.
2017.17.1.1
17. X. Li, Q. Sun, Y. Liu, S. Zheng, Q. Zhou, T.-S. Chua, B. Schiele, Learning to self-train for semi-
supervised Few-Shot classification, in Advances in Neural Information Processing Systems 32:
Annual Conference on Neural Information Processing Systems. (NeurIPS, 2019), pp. 10276–
10286
18. A. Achille, G. Paolini, G. Mbeng, S. Soatto, The information complexity of learning tasks,
their structure and their distance. Inf. Infer. J. IMA 10(1), 51–72 (2021). https://doi.org/10.
1093/imaiai/iaaa033
19. J. Deng, J. Guo, N. Xue, S. Zafeiriou, ArcFace: Additive angular margin loss for deep face
recognition, in IEEE/CVF Conference on Computer Vision and Pattern Recognition (2019),
pp 4685–4694. https://doi.org/10.1109/CVPR.2019.00482
20. J. Pan, Review of metric learning with transfer learning. AIP Conf. Proc. 1864, 1–9 (2017),
Art.Id 020040. https://doi.org/10.1063/1.4992857
21. Y. Zhang, J. Lee, M. Wainwright, M.I. Jordan, On the learnability of fully-connected neural
networks, in Proceedings of the 20th International Conference on Artificial Intelligence and
Statistics, (PMLR 54, 2017), pp. 83–91. http://proceedings.mlr.press/v54/zhang17a/zhang17a.
pdf
22. P.L. Bartlett, N. Harvey, C. Liaw, A. Mehrabian, Nearly-tight VC-dimension and pseudo dimen-
sion bounds for piecewise linear neural networks. JMLR 20(63), 1–17 (2019). https://jmlr.org/
papers/v20/17-612.html
23. J. Schmidhuber, Discovering neural nets with low Kolmogorov complexity and high general-
ization capability. Neural Netw. 10(5), 857–873 (1997)
24. W. Gao, Z.-H. Zhou, Dropout Rademacher complexity of deep neural networks. Sci. China
Inf. Sci. 59(7), 1–12 (2016), Art.Id 072104. https://doi.org/10.1007/s11432-015-5470-z
25. R. Livni, S. Shalev-Shwartz, O. Shamir, On the computational efficiency of training neural
networks, in Proceedings of the 27th International Conference on Neural Information
Processing Systems, vol. 1 (NIPS, 2014), pp. 855–863
26. X. Hu, W. Liu, J. Bian, J. Pei, Measuring model complexity of neural networks with curve
activation functions, in Proceedings of the 26th ACM SIGKDD Conference on Knowledge
Discovery and Data Mining KDD (2020). https://dl.acm.org/doi/10.1145/3394486.3403203
27. V. Sze, Y.-H. Chen, T.-J. Yang, J.S. Emer, Efficient processing of deep neural networks: A
tutorial and survey. Proc. IEEE 105(12), 2295–2329 (2017). https://doi.org/10.1109/JPROC.
2017.2761740
28. Y. LeCun, Y. Bengio, G. Hinton, Deep learning. Nature 521, 436–444 (2015). https://doi.org/
10.1038/nature14539
29. A.-S. Moloiu, Automatic Character Recognition. Licence Thesis in Informatics (under the
supervision of Dana-Mihaela Vilcu), “Spiru Haret” University, Bucharest, 2014
30. M.G. Cîrlescu, A.S. Moloiu, G. Albeanu, Improving Facial Analysis using Deep Learning.
Technical Report, “Spiru Haret” University, Bucharest, 2019
31. A.G. Howard, M. Zhu, B. Chen, D. Kalenichenko, W. Wang, T. Weyand, M. Andreetto, H.
Adam, MobileNets: Efficient convolutional neural networks for mobile vision applications.
https://arxiv.org/abs/1704.04861 (2017)
32. B. Issa, A.-S. Moloiu, G. Albeanu, Developing a robust facial verification system using deep
neural networks. Technical Report, “Spiru Haret” University, Bucharest, 2019
33. LFW, http://vis-www.cs.umass.edu/lfw/ (last accessed 30 July 2021)
Adaptive Classifier Using Extreme
Learning Machine for Classifying
Twitter Data Streams
Abstract As online business advances exponentially over time, analyzing data
on the fly is the need of the hour and is receiving great attention from
researchers. The proliferation of varied data generated from different kinds of
devices calls for a practically efficient algorithm for data stream analysis. However,
most existing data mining algorithms that are widely applied to data streams were
designed for static data mining and are not as efficient for data streams. Thus, to
overcome this limitation, this work proposes an adaptive classifier (algorithm) for
analyzing data streams instantly. The proposed adaptive classifier uses an extreme
learning machine along with a rule database based on the product's attributes to
identify the popularity of the product.
1 Introduction
M. A. M. Raja (B)
Department of Computer Science and Engineering, RMK College of Engineering and
Technology, Chennai 601206, India
e-mail: arunmcse@rmkcet.ac.in
S. Swamynathan
Department of Information Science and Technology, College of Engineering Guindy, Anna
University, Chennai 600025, India
e-mail: swamyns@annauniv.edu
T. Sumitha
Department of Computer Science and Engineering, RMK Engineering College, Chennai 601206,
India
e-mail: sat.cse@rmkec.ac.in
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
J. Nayak et al. (eds.), Computational Intelligence in Data Mining, Smart Innovation,
Systems and Technologies 281, https://doi.org/10.1007/978-981-16-9447-9_28
People increasingly rely on opinions about the products or services shared by other
users on social media. Many social media applications provide different services for
knowing the trend about events or
media applications provide different services for knowing the trend about events or
popular personalities or products or services. It has become routine that people look
for trending topics or products. Twitter is one of the familiar and powerful social
media applications for sharing messages instantly with other Twitter users. Twitter
allows users to share information on many topics and also provides trending topics
to its users.
The main contribution of this paper is an adaptive classifier using an extreme
learning machine (ELM). ELM is a neural network-based classifier that uses a number
of hidden nodes to learn from data in a short time. In this work, an adaptive classifier
using ELM provides a faster training phase for data streams. The performance of the
ELM is further improved using a decision rule database.
In Sect. 2, related works on Twitter trend analysis with different learning models
and classification algorithms are discussed. In Sect. 3, the system architecture with
its various functional components is described. In Sect. 4, the experiments and results
of the proposed adaptive classifier are explained. Section 5 concludes with a comparative
analysis of the proposed work against other existing works.
2 Related Works
Many research works have been carried out on analyzing social media data to
predict customer interest. Twitter data analysis is performed by many organizations
to evaluate the usage of their products or services among users. Trend analysis
includes the identification of popular events that are being discussed on Twitter.
Qian et al. [1] used a multi-modal social event tracking framework to find the topics
of social events by modeling social media documents. It makes use of an incremental
learning strategy to obtain updated event topics from social media. Shi et al. [2]
applied a trend identification mechanism for gauging public interest.
the public interest. The authors have used complex event processing methods for
performing public mood tracking. These methods used microblogs which are trans-
formed into microblog events using sentiment analysis. After finding the event, the
summarization technique is used to summarize the public mood at different periods.
Khater et al. [3] implemented a recommendation system in which the trend is
identified based on the interest of the social media users. The trending user community
is identified using the location of the social media users. The recommendation is
provided to the users who prefer the trending topic in their interested domain. The
suggestion about the trending topic is provided using the information gathered from
the social media users along with the content recommended by the social media
application. Lai et al. [4] combined the time and space attributes as a spatio-temporal
model since the time series play a vital role in microblog trend detection. It helps
to find the relation among the topic discussed by different user communities. The
abnormalities in the topic diversion are also found out by correlating the time and
space of the user content.
He and Yan [5] made a model to mine blogs. This model is used to understand the
use of social media in customer co-creation. The relevant posts and blogs are identi-
fied for finding the customer co-creation. This co-creation information is helpful for
organizations to promote their products or services easily. Masud et al. [6] designed
a class detection model which is used to identify new classes in the generated data.
This class detection model helps to identify the immediate change in the generation
of data. The new class topic detection is achieved by the ensemble classification
framework. It is effective in handling the drift among the data streams, but it requires
much time to give the newly evolved topic.
Kasiviswanathan et al. [7] used distributed dictionary learning method for
handling the evolution of new topics in the data streams. When compared to the
ensemble methods, this dictionary learning focuses on the specific domain with the
dictionary of data. This reduces the learning time required for identifying the new
topic in the data streams. Aiello et al. [8] designed topic modeling for sensing trending
topics on Twitter, which requires focusing on the trend of a specific topic in a particular
domain. Co-occurrence of related events and topic ranking are important methods for
detecting trends on Twitter. Wang et al. [9] used enhanced topic modeling
methods to focus on the volume as well as the time-based content generation in
social media. Xie et al. [10] implemented a bursty topic detection model to identify
bursty topics from Twitter with the time series of user posts. The trending topics are
identified with the scaling of data generation.
Fatma et al. [11] performed cloud-based data stream optimization to process data
streams immediately in the real-time environment. This requires optimization of the
methods to process the streams instantaneously. The continuous query optimization
is used with the multiple plans for the data streams in the cloud environment for the
cluster of data. Abdullatif et al. [12] used fuzzy methods to perform the analysis on
non-stationary data streams. This addresses the way to find out the evolving nature
in the data streams and outlier identification.
Many works have performed trend analysis of the Twitter social networking
application. However, they fall short of identifying the top tweet content about a
product or service without spending much time, since most classification-based
product reputation finders require much time for training the classifier. It is
necessary to identify top tweets while reducing the training time using an adaptive
classifier. In this work, an adaptive classifier is proposed to handle Twitter data
streams for identifying the popularity of products.
3 System Design
The system design of the adaptive classifier is shown in Fig. 1. The components of
the system are the tweet extractor, sliding window, tweet repository, and reservoir
sampler, among others.
Twitter is among the most popular social media applications, used by many Internet
users. It allows users to share their views or opinions instantly with others through
140-character microblogs, called tweets. Recently, the tweet size has been increased
to 280 characters by Twitter [13]. The tweet extractor is designed to collect tweets
from Twitter. Input keywords for a specific domain are given to the extractor, and the
tweets related to these keywords are collected using the Twitter streaming API. Each
tweet consists of many parameters [14] such as user name, tweet content, user id,
date and time, geolocation, retweet, and favorite.
A total of 1907 positive words and 4750 negative words are used for finding
the polarity of a tweet. The classifiers are first trained on the collected tweets using
these keywords and then tested on new tweets [15].
4 Experiments and Results
In this section, the experiment is explained along with the result analysis, and the
results are compared with other data stream classifiers to show the advantage of the
proposed adaptive classifier. The dataset used for this work is collected from Twitter.
Twitter provides APIs for collecting tweets; in this work, the tweets are collected
using the Twitter streaming API and received in either JSON or XML format.
The proposed adaptive classifier makes use of the extreme learning machine (ELM)
algorithm [16, 17]. It is a single-layer feed-forward network [18] that trains on data
in a short time. This classifier requires a single pass rather than multiple iterations.
Although there may in principle be a varying number of hidden layers, in practice one
hidden layer is enough [19] for estimating a nonlinear function. Even with only one
hidden layer, the number of nodes may be made as large as required to estimate the
nonlinear function of the input values. In ELM, the training data are converted into
vectors of feature (attribute) values. In this work, a tweet's positive and negative word
polarities are considered as two features.
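The one-shot training step of ELM can be sketched in plain Java: the hidden layer is random and fixed, and the output weights are obtained by solving a single regularized least-squares system. A minimal sketch follows; the class name, Gaussian initialization, sigmoid activation, and ridge term lambda are illustrative assumptions, not the paper's implementation.

import java.util.Random;

// Minimal ELM sketch: random hidden layer, output weights solved in one pass via
// regularized least squares (illustrative only; not the paper's exact setup).
class ElmSketch {
    final int L;                 // number of hidden nodes
    final double[][] w;          // input-to-hidden weights (random, never trained)
    final double[] b;            // hidden biases (random)
    double[] beta;               // hidden-to-output weights (solved analytically)

    ElmSketch(int inputs, int hidden, long seed) {
        L = hidden;
        Random r = new Random(seed);
        w = new double[inputs][L];
        b = new double[L];
        for (int i = 0; i < inputs; i++)
            for (int j = 0; j < L; j++) w[i][j] = r.nextGaussian();
        for (int j = 0; j < L; j++) b[j] = r.nextGaussian();
    }

    double[] hidden(double[] x) {                // one row of H: sigmoid(x.W + b)
        double[] h = new double[L];
        for (int j = 0; j < L; j++) {
            double s = b[j];
            for (int i = 0; i < x.length; i++) s += x[i] * w[i][j];
            h[j] = 1.0 / (1.0 + Math.exp(-s));
        }
        return h;
    }

    // Single "feed": solve (H^T H + lambda*I) beta = H^T t by Gaussian elimination.
    void train(double[][] xs, double[] ts, double lambda) {
        int n = xs.length;
        double[][] hRows = new double[n][];
        for (int k = 0; k < n; k++) hRows[k] = hidden(xs[k]);
        double[][] a = new double[L][L + 1];     // augmented system [A | rhs]
        for (int i = 0; i < L; i++) {
            for (int j = 0; j < L; j++) {
                double s = (i == j) ? lambda : 0;
                for (int k = 0; k < n; k++) s += hRows[k][i] * hRows[k][j];
                a[i][j] = s;
            }
            double rhs = 0;
            for (int k = 0; k < n; k++) rhs += hRows[k][i] * ts[k];
            a[i][L] = rhs;
        }
        for (int p = 0; p < L; p++) {            // elimination with partial pivoting
            int best = p;
            for (int i = p + 1; i < L; i++)
                if (Math.abs(a[i][p]) > Math.abs(a[best][p])) best = i;
            double[] tmp = a[p]; a[p] = a[best]; a[best] = tmp;
            for (int i = p + 1; i < L; i++) {
                double f = a[i][p] / a[p][p];
                for (int j = p; j <= L; j++) a[i][j] -= f * a[p][j];
            }
        }
        beta = new double[L];
        for (int i = L - 1; i >= 0; i--) {       // back-substitution
            double s = a[i][L];
            for (int j = i + 1; j < L; j++) s -= a[i][j] * beta[j];
            beta[i] = s / a[i][i];
        }
    }

    double predict(double[] x) {                 // e.g., > 0.5 taken as positive class
        double[] h = hidden(x);
        double y = 0;
        for (int j = 0; j < L; j++) y += h[j] * beta[j];
        return y;
    }
}

With the two polarity counts of a tweet as the feature vector and a 0/1 label as the target, train() would be called once per reservoir sample; the absence of iterative weight updates is what keeps the training time low.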
The ELM algorithm is shown in Fig. 2. The distinct input samples are given to
the ELM classifier as inputs. For experimental purposes, 100 hidden nodes are used
in the ELM. 10,000 tweet samples are populated using the reservoir sampler, and 10
rounds are conducted to evaluate the learning capabilities of the ELM. The learning
capability is measured using both training time and training accuracy.
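The reservoir sampler can be realized with the classic Algorithm R, which keeps a uniform random sample of fixed size from an unbounded stream; a minimal generic sketch follows (the class name and stream hookup are assumptions, since the paper does not give its implementation).

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

// Sketch of Algorithm R: keep a uniform sample of k items from an unbounded stream.
class ReservoirSampler<T> {
    private final List<T> reservoir;
    private final int k;
    private long seen = 0;
    private final Random rnd = new Random();

    ReservoirSampler(int k) {
        this.k = k;
        this.reservoir = new ArrayList<>(k);
    }

    void offer(T item) {
        seen++;
        if (reservoir.size() < k) {
            reservoir.add(item);                       // fill phase
        } else {
            long j = (long) (rnd.nextDouble() * seen); // uniform index in [0, seen)
            if (j < k) reservoir.set((int) j, item);   // replace with probability k/seen
        }
    }

    List<T> sample() { return reservoir; }
}

Here offer() would be called for each tweet arriving from the sliding window, and sample() would supply the 10,000-tweet training set to the ELM.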
The training of the classifiers is performed on the collected tweets. It can be inferred
from Table 1 that the training time required by the adaptive classifier using ELM is less
than the time required by the decision tree-based and ensemble-based stream classifiers.
The error rate is also low, since the classifier training makes use of domain-specific
keywords as well as the polarity word corpus. In addition, the error rate is reduced by
increasing the number of hidden nodes in the ELM.
Table 1 shows that the adaptive classifier using ELM performs better than the
other stream classifiers.
Table 1 Performance comparison of adaptive ELM and other stream classifiers on training data

Stream classifier    Training data   Precision            Recall               F-measure
                                     Positive  Negative   Positive  Negative   Positive  Negative
Adaptive ELM         Sample 1        0.998     0.995      0.970     0.998      0.996     0.994
stream classifier    Sample 2        0.997     0.994      0.972     0.996      0.995     0.993
                     Sample 3        0.998     0.995      0.971     0.998      0.996     0.994
                     Sample 4        0.996     0.996      0.970     0.997      0.994     0.995
                     Average         0.997     0.995      0.970     0.997      0.995     0.994
Ensemble             Sample 1        0.986     0.984      0.992     0.976      0.986     0.984
stream classifier    Sample 2        0.985     0.983      0.991     0.975      0.985     0.983
                     Sample 3        0.983     0.985      0.992     0.976      0.986     0.982
                     Sample 4        0.985     0.984      0.993     0.974      0.983     0.984
                     Average         0.985     0.984      0.992     0.975      0.985     0.983
Decision tree-based  Sample 1        0.946     0.930      0.956     0.958      0.946     0.953
stream classifier    Sample 2        0.936     0.925      0.952     0.946      0.942     0.952
                     Sample 3        0.932     0.918      0.947     0.942      0.945     0.942
                     Sample 4        0.928     0.914      0.942     0.938      0.938     0.944
                     Average         0.936     0.922      0.949     0.946      0.943     0.948
The comparison of the classifiers on the training data is shown in Fig. 3. The adaptive
classifier on the training data streams achieves better precision, recall, and f-measure
when compared to the ensemble and decision tree-based classifiers.
Classifiers are generally used to classify data based on the available features.
However, hierarchical classifiers like ELM are used in many scenarios wherein the
dataset has many features and a large volume of data.

Table 2 Types of smartphone features and their class labels

Feature             Class label
Battery             C1
Screen size         C2
Price               C3
Memory size         C4
Camera pixel        C5
Processor speed     C6
Touch sensitivity   C7
Durability          C8
Compactness         C9

In these cases, the rule-based classifiers
help to increase the accuracy by classifying the missed instances in some other
classifiers [20]. In this work, an ELM-based adaptive classifier is used, with which
the tweets have been classified with polarity as features. There is a need to improve
the classification process for better product recommendations, for two reasons. First,
the tweets need to be considered with the features available in the Twitter data for
smartphone brands. Second, a rule-based classifier is needed to achieve the highest
classification accuracy with the minimum number of hidden nodes in the ELM.
An increase in the number of hidden nodes may lead to an increase in training time
for the data streams, so there is a necessity to keep the number of hidden nodes
to a minimum. To overcome these difficulties, the rule database is required so that
the data stream will be correctly classified according to the rules represented in the
rule database. The features of smartphones and their corresponding class labels used
in the rule database are shown in Table 2.
The rules are created based on the attributes of smartphones. Five smartphone
attributes are taken into consideration, with 13 attribute values in total. Combining
the attribute values five at a time, 13C5 (nCr) = 1287 possible rule combinations are
identified for these attributes. The tweets are processed using the rule database, and
Table 3 shows how the tweets are processed and how the rules are evaluated.
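The figure of 1287 is simply the binomial coefficient 13C5; a one-method sketch verifying this count follows.

// Sketch: nCr via the multiplicative formula; C(13, 5) = 1287 rule combinations.
class RuleCount {
    static long nCr(int n, int r) {
        long result = 1;
        for (int i = 1; i <= r; i++) result = result * (n - r + i) / i; // exact at each step
        return result;
    }

    public static void main(String[] args) {
        System.out.println(nCr(13, 5)); // 1287
    }
}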
It is necessary to consider the attribute values present in the tweets for identifying
whether a particular smartphone model should be recommended to the user. The
sentiment is identified in association with the attributes mentioned in the tweet. The
tweet is analyzed with the polarity present in the content, and the smartphone model
is also identified from the tweet content. After every tweet is matched against the rule
database, it is concluded whether the user recommends the product, and in addition,
which attribute or feature of a particular smartphone model the user is interested in
is identified. The comparison of the classifiers on testing data streams is shown in
Fig. 4. The classifier performance varies mainly on the unclassified data streams. The
number of unclassified data streams for the ELM classifier is smaller when compared
to the other two data stream classifiers, ensemble and decision tree.
The product reputation is identified using the adaptive classifier. The classification
results are shown in Table 4. Four features of the smartphone are considered for
finding out which features are commented on with more positive or negative tweets.
Two different sets of streams are used for checking the product's reputation.
In this work, an adaptive classifier algorithm has been proposed using an extreme
learning machine along with a rule database based on the product's attributes. This work
proposes a method for adapting the classifier to the changing nature of data. The
rule-based ELM classifier adapts to continuously generated Twitter streams. The
performance of the classifier improved, and the classifier model adapted to new
data streams and produced classification accuracy of up to 98.27%. The proposed
adaptive classifier algorithm is suitable for handling data streams with less training
time. This experimental study provides insights for developing classifier models for
dynamic data streams.
References
1. S. Qian, T. Zhang, C. Xu, J. Shao, Multi-modal event topic model for social event analysis.
IEEE Trans. Multimedia 18(2), 233–246 (2016)
2. S. Shi, D. Jin, G. Tiong-Thye, Real-time public mood tracking of chinese microblog streams
with complex event processing. IEEE Access 5, 421–431 (2017)
3. S. Khater, D. Gračanin, H.G. Elmongui, Personalized recommendation for online social
networks information: Personal preferences and location-based community trends. IEEE Trans.
Computat. Soc. Syst. 4(3), 104–120 (2017)
4. E.L. Lai, D. Moyer, B. Yuan, Topic time series analysis of microblogs. IMA J. Appl. Math.
81(3), 409–431 (2016)
5. W. He, G. Yan, Mining blogs and forums to understand the use of social media in customer
co-creation. Comput. J. 58(9), 1909–1920 (2015)
6. M.M. Masud, Q. Chen, L. Khan, Classification and adaptive novel class detection of feature-
evolving data streams. IEEE Trans. Knowl. Data Eng. 25(7), 1484–1497 (2013)
7. S.P. Kasiviswanathan, G. Cong, P. Melville, R.D. Lawrence, Novel document detection for
massive data streams using distributed dictionary learning. IBM J. Res. Dev. 57(3), 1–15
(2013)
8. L.M. Aiello, G. Petkos, C. Martin, Sensing trending topics in Twitter. IEEE Trans.
Multimedia 15(6), 1268–1282 (2013)
9. Z. Wang, L. Shou, K. Chen, G. Chen, S. Mehrotra, On summarization and timeline generation
for evolutionary tweet streams. IEEE Trans. Knowl. Data Eng. 27(5), 1301–1315 (2015)
10. W. Xie, F. Zhu, J. Jiang, E.P. Lim, K. Wang, TopicSketch: Real-time bursty topic detection
from Twitter. IEEE Trans. Knowl. Data Eng. 28(8), 2216–2229 (2016)
11. M.N. Fatma, R.M. Ismail, N.L. Badr, M.F. Tolba, Cloud-based data streams optimization.
WIREs Data Min. Knowl. Discov. 8(8), e1247 (2018)
Keywords COVID-19 · Decision making · Soft set · Fuzzy soft set · Lockdown
1 Introduction
COVID-19 is a contagious disease caused by one of the strains from the coronavirus
family, named 2019-nCoV. The disease started in Wuhan, China, at the end of
2019 and gradually spread to the whole world, leading to the ongoing coronavirus
pandemic. Viruses mutate easily to adapt to the surrounding environment, which
creates new variants of the same virus with more resistance to the available medicines.
Due to the different mutated variants of the virus, containment becomes
difficult, and the difficulties grow as the disease spreads further. The
symptoms of the coronavirus appear within 2–14 days of infection. Due to its high
transmission rate, the virus is creating havoc for everyone, even with a recovery rate
R. K. Mohanty (B)
SCOPE, VIT, Vellore, Tamil Nadu 632014, India
B. K. Tripathy
SITE, VIT, Vellore, Tamil Nadu 632014, India
e-mail: tripathybk@vit.ac.in
S. Ch. Parida
KBV Mahavidyalaya, Kabisurya Nagar, Odisha, India
© The Author(s), under exclusive license to Springer Nature Singapore Pte Ltd. 2022
J. Nayak et al. (eds.), Computational Intelligence in Data Mining, Smart Innovation,
Systems and Technologies 281, https://doi.org/10.1007/978-981-16-9447-9_29
of greater than 98% and a fatality rate of less than 2%. Another dangerous feature of
the disease is that it shows different symptoms in different persons, and the presence
of many variants of the virus extends the list of symptoms even further.
To design a model for handling any uncertain situation like this pandemic, many
uncertainty-based models are available in the literature, such as the fuzzy set (FS) [18],
the rough set, and the intuitionistic fuzzy set [1]. Molodtsov [11] proposed the soft set
(SS) model to handle uncertainty-based problems with multiple parameters; in an SS,
every parameter is associated with a subset of the universe. In [15], characteristic
functions were introduced to redefine the definitions and notions of the SS. The SS
was applied to decision making in [9]. Mohanty et al. [10], Sooraj et al. [14], and
Tripathy et al. [16, 17] redefined the FSS through the use of characteristic functions
and made the definitions more useful for decision-making applications. Later, several
other hybrid models of the soft set were redefined using the characteristic-function
approach, and several decision-making algorithms were proposed using these models.
There are several articles about uncertainty-based decision-making applications for
handling different situations during the Covid-19 pandemic [2–5, 12, 13]. This paper
proposes an algorithm to strategize the unlock process by assessing the seriousness
of the coronavirus pandemic.
A soft set over a universe W with parameter set C is a pair (F, C), where

F : C → P(W)    (1)

and P(W) denotes the power set of W. Its characteristic function is

χt(F,C)(s) = 1, if s ∈ F(t); 0, otherwise    (2)

For a fuzzy soft set (FSS), each parameter is instead mapped to a fuzzy subset of W:

F : C → F(W)    (3)

where F(W) denotes the set of all fuzzy subsets of W.
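As a small illustration of Eqs. (1) and (2), a soft set can be stored as a map from parameters to subsets of W, with the characteristic function reading membership off that map; the sketch below uses hypothetical parameter and element names.

import java.util.Map;
import java.util.Set;

// Sketch: a soft set as a map F from parameters to subsets of W (Eq. 1), with the
// characteristic function of Eq. 2.
class SoftSetDemo {
    static int chi(Map<String, Set<String>> f, String t, String s) {
        return f.getOrDefault(t, Set.of()).contains(s) ? 1 : 0;
    }

    public static void main(String[] args) {
        Map<String, Set<String>> f = Map.of(
            "highRecovery", Set.of("stateA", "stateC"),
            "lowFatality",  Set.of("stateB"));
        System.out.println(chi(f, "highRecovery", "stateA")); // 1
        System.out.println(chi(f, "lowFatality",  "stateA")); // 0
    }
}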
The coronavirus pandemic has been creating havoc all over the world since the
beginning of 2020. To save people from the coronavirus and contain its spread,
governments imposed forceful lockdowns in their respective areas. This application
concerns the assessment of the seriousness of the pandemic in an area in order to
exit from lockdown.
The rate of spread of a viral pandemic like Covid-19 depends on many parameters,
such as test positivity rate, fatality rate, healthcare index, active cases, recovery rate,
average new cases per day, vaccinated population percentage, active ratio, doubling
rate, homeless population, population density, air quality, traffic mobility, vulnerable
age group population, transmission rate, and asymptomatic positivity rate.
To decide on more relaxations from lockdown, administrators need to consider as
many of these safety parameters as possible within their constraints. The decision will
be more accurate if it is taken by assessing all these parameters over areas as small
as possible, because the parameters will not be uniform across all places of a larger
area.
7-day moving average: Some parameter values go up and down a lot. For
example, the number of new COVID-19 cases per day is difficult to assess, as it
depends on many other parameters, such as the reporting time, testing time, exposure
to infection, and more. Because of this, it is very difficult to apply those values in
decision-making applications directly. The "7-day moving average" approach handles
these types of parameters nicely. An n-day moving average takes the mean value of
the past n days, and the result is saved as the value for that day. For example, the value
for May 7 is the mean of the values from May 1 to May 7; for May 8, it is the mean
of the values from May 2 to May 8, and so on. This approach adjusts for delays in
confirmation. In most cases, coronavirus disease shows its symptoms in 3–10 days,
so the 7-day moving average is an appropriate approach in this case. Parameters like
test positivity rate, fatality rate, recovery rate, average new cases per day, transmission
rate, and asymptomatic positivity rate can be handled using the "7-day moving
average" approach.
Fig. 1 Daily covid-19 case increase with the 7-day moving average approach
Figure 1 shows the real COVID-19 case growth values and the daily growth values
plotted as per the 7-day moving average approach.
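A minimal sketch of the n-day moving average described above follows; the daily case counts in the example are hypothetical.

// Sketch: n-day moving average; the value for day i is the mean of days i-n+1 .. i.
class MovingAverage {
    static double[] movingAverage(double[] daily, int n) {
        double[] out = new double[daily.length];
        double window = 0;
        for (int i = 0; i < daily.length; i++) {
            window += daily[i];
            if (i >= n) window -= daily[i - n];      // drop the day leaving the window
            out[i] = window / Math.min(i + 1, n);    // shorter window at the start
        }
        return out;
    }

    public static void main(String[] args) {
        double[] cases = {100, 120, 90, 200, 150, 170, 160, 300}; // hypothetical counts
        double[] avg7 = movingAverage(cases, 7);
        System.out.println(avg7[6]);  // mean of the first seven days: ~141.43
    }
}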
In this section, the definitions of some important parameters used to assess the
coronavirus situation are given. Due to space constraints, not all of the discussed
parameters are used in the application provided in the next section.
i. Test positivity rate = (total number of positive cases found) / (total number of
tests done)
ii. Transmission rate (R0): the average number of persons infected by a single
person.
iii. Doubling rate: the number of days in which the number of infected people
doubles.
iv. Asymptomatic positivity rate: the number of asymptomatic positive cases
identified, divided by the total number of confirmed covid-positive cases.
v. Fatality: the percentage of deaths among the total covid-positive cases
identified.
vi. Recovery ratio: the percentage of people who recovered among the total
covid-positive cases identified.
vii. Active ratio: the percentage of people still suffering from covid among the
total covid-positive cases identified.
Fuzzification of the collected data can be done as follows.
Infected: the number of persons infected divided by the population of the area
considered. Values for the parameters active, recovered, and vaccinated can be
computed similarly, that is, by dividing the value by the total population of the area
considered. Values expressed as percentages can be fuzzified by dividing by 100.
Doubling rate (Rd): in this application, the doubling rate value is fuzzified using
the formula in Eq. 5.

Fuzzified Rd = min(1 − 1/Rd, 1)    (5)

Normalized value ai = ai / Σj=1..n aj,  i = 1, 2, ..., n    (6)
where P(ti) denotes the priority of the parameter ti, and Cti denotes the type of the
parameter: Cti = −1 if the parameter ti is negative, otherwise Cti = 1.
To compare the scores of multiple competitors, the following formula is used.

Cscore(xi) = Score(xi) × n / Σj=1..n Score(xj)    (8)
where Cscore(xi ) denotes the comparison score of the competitor xi and n denotes
the number of competitors.
Step 1: Get the Covid-19 data table and fuzzify the data as mentioned in the previous
section (Table 1).
Step 2: Normalize all values as mentioned in Eq. 6 to get the required data in FSS
format (Table 2).
Step 3: Compute the priority of all parameters as mentioned in Eq. 7.
Step 4: Compute the priority data table by multiplying the respective parameter
priorities with the FSS values.
Step 5: Compute the comparison table using the formula given in Eq. 8.
Step 6: Construct the decision table by computing the total score of every
competitor in the comparison table.
Step 7: Rank the scores obtained in the comparison table. A place with a higher
score gets a better rank; better-ranked places are better suited for more
relaxations in lockdown.
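A compact sketch of Steps 2 to 6 follows: the fuzzified table is column-normalized as in Eq. 6, each column is weighted by its signed priority, and row totals are turned into comparison scores as in Eq. 8. The table values and priority vector are hypothetical, and Eq. 7's exact priority formula is not reproduced here.

// Sketch of Steps 2-6: normalize, apply signed parameter priorities, score rows.
class LockdownScore {
    public static void main(String[] args) {
        // Fuzzified values: rows = places, columns = parameters (hypothetical data).
        double[][] v = {
            {0.12, 0.40, 0.80},
            {0.30, 0.10, 0.60},
            {0.08, 0.25, 0.90},
        };
        // Signed priorities: negative for "bad" parameters such as test positivity
        // (hypothetical stand-ins for Eq. 7 outputs).
        double[] priority = {-0.2, -0.1, 0.7};
        int n = v.length, m = v[0].length;

        // Step 2: normalize each column (Eq. 6).
        for (int j = 0; j < m; j++) {
            double colSum = 0;
            for (int i = 0; i < n; i++) colSum += v[i][j];
            for (int i = 0; i < n; i++) v[i][j] /= colSum;
        }
        // Steps 4-6: weighted row scores, then comparison scores (Eq. 8).
        double[] score = new double[n];
        double total = 0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < m; j++) score[i] += priority[j] * v[i][j];
            total += score[i];
        }
        for (int i = 0; i < n; i++)   // higher comparison score = more relaxation
            System.out.printf("place %d: cscore=%.3f%n", i, score[i] * n / total);
    }
}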
Table 6 is constructed by computing the sum of the scores of each parameter for all
states. A state with a higher score can be given more relaxations in lockdown. The
seriousness of the pandemic situation in the states can be assessed from the scores
in Table 6; a higher score is better.
4 Conclusions
This paper proposes an algorithm, based on fuzzy soft sets, to help make strategies
for the lockdown exit process in the ongoing pandemic. It also provides an application
of the proposed algorithm to make the process more understandable. This paper used
state-level data in the provided application, but the approach can also be applied to
smaller areas to make the decision even more effective. The approach could be improved
by adopting other uncertainty-based models, such as rough sets and interval-valued
mathematics.
References
1. K. Atanassov, Intuitionistic fuzzy sets. Fuzzy Set Syst. 20, 87–96 (1986)
2. A. Ghosh, S. Roy, H. Mondal, S. Biswas, R. Bose, Mathematical modelling for decision making
of lockdown during COVID-19. Appl. Intell. 1–17 (2021)
3. A. Gulia, N. Salins, Ethics-based decision-making in a COVID-19 pandemic crisis. Indian J.
Med. Sci. 72(2), 39 (2020)
4. M. Gupta, S.S. Mohanta, A. Rao, G.G. Parameswaran, M. Agarwal, M. Arora, S. Bhatnagar,
Transmission dynamics of the COVID-19 epidemic in India and modeling optimal lockdown
exit strategies. Int. J. Infect. Dis. 103, 579–589 (2021)
5. M. Herle, A.D. Smith, F. Bu, A. Steptoe, D. Fancourt, Trajectories of eating behavior during
COVID-19 lockdown: longitudinal analyses of 22,374 adults. Clin. Nutrit. ESPEN 42, 158–165
(2021)
6. https://www.who.int/publications/m/item/weekly-epidemiological-update-on-covid-19---27-
april-2021. Last accessed 25 May 2021
7. https://www.mohfw.gov.in/. Last accessed 25 May 2021
8. https://www.covid19india.org/. Last accessed 25 May 2021
9. P.K. Maji, R. Biswas, A.R. Roy, An application of soft sets in a decision making problem.
Comput. Math. Appl. 44, 1007–1083 (2002)
10. R.K. Mohanty, T.R. Sooraj, B.K. Tripathy, IVIFS and decision-making. Adv. Intell. Syst.
Comput. 468, 319–330 (2017)
11. D. Molodtsov, Soft set theory—first results. Comput. Math. Appl. 37, 19–31 (1999)
12. A. Smirnova, L. DeCamp, G. Chowell, Mathematical and statistical analysis of doubling
times to investigate the early spread of epidemics: application to the COVID-19 pandemic.
Mathematics 9(6), 625 (2021)
13. S. Snuggs, S. McGregor, Food & meal decision making in lockdown: how and who has Covid-19
affected? Food Qual. Prefer. 89(104145), 1–6 (2021)
14. T.R. Sooraj, R.K. Mohanty, B.K. Tripathy, Improved decision making through IFSS. Smart
Innov. Syst. Technol. 77, 213–219 (2018)
15. B.K. Tripathy, K.R. Arun, A new approach to soft sets, soft multisets and their properties. Int.
J. Reason. Based Intell. Syst. 7(3/4), 244–253 (2015)
16. B.K. Tripathy, R.K. Mohanty, T.R. Sooraj, A. Tripathy, A modified representation of IFSS and
its usage in GDM. Smart Innov. Syst. Technol. 50, 365–375 (2016)
17. B.K. Tripathy, T.R. Sooraj, R.K. Mohanty, A new approach to fuzzy soft set theory and its
application in decision making. Adv. Intell. Syst. Comput. 411, 305–313 (2016)
18. L.A. Zadeh, Fuzzy sets. Inf. Control 8, 338–353 (1965)
Deep Learning on Landslides:
An Examination of the Potential
Commitment an Expectation of Danger
Evaluation in Sloping Situations
Abstract Landslides are recurring geological hazards in the rainy season, bringing
fatal injuries, damage to property, and economic losses. Landslides are responsible
for at least 17% of all fatalities from natural hazards worldwide and for almost 25%
of the annual losses caused by natural hazards. Because of global climate change,
the frequency of landslide occurrence has increased, and consequently the losses
and damages related to landslides have also increased. Therefore, accurate forecasting
of landslide occurrence, along with monitoring and advance warning for ground
movements, is a significant task for reducing the damages and losses caused by
landslides. Landslides are becoming a problem throughout the world: the frequency
and magnitude of landslides threatening large populations and the environment are
increasing across the world. Remote sensing plays a vital role in landslide prediction.
In particular, satellite remote sensing is effective in covering a large territory for
capturing images, which in turn are used as input for training a system to predict
landslides about two weeks in advance using neural networks.
1 Introduction
Natural hazards such as landslides are among the most dangerous geological disasters in many regions throughout the world. Landslides can cause severe property damage and human losses in mountainous areas. A recent world disaster report shows that landslides and avalanches account for 42% of the overall occurrence of catastrophic events, with typical annual economic losses due to landslides amounting to billions of US dollars. According to data from the Centre for Research on the Epidemiology of Disasters (CRED) in Brussels, Belgium, landslides are responsible for roughly 17% of all deaths caused by natural disasters worldwide [1].
Reliable early warning systems are a practical approach for landslide risk reduction. Such systems can be implemented effectively if landslide movement can be forecast. For example, during the 1985 Xintan landslide, which occurred 26 km upstream of the Three Gorges Dam (TGD), accurate forecasting of the landslide displacement reduced the economic losses and casualties significantly [2].
Landslide susceptibility assessment techniques can be grouped into qualitative (knowledge-driven) and quantitative (data-driven and physically based) approaches, depending on how they treat landslide triggering factors and models (Fig. 1). In general, knowledge-driven methods depend entirely on the expert judgment of those performing the susceptibility assessment [3]. Accordingly, such methods are of only limited use for susceptibility analysis over very large regions, since they do not incorporate a sound physical concept of slope failure [4]. Quantitative assessments may be divided into two kinds: data-driven methods and physically based methods. In data-driven methods, the statistical relationships between the locations of past landslides and landslide-inducing factors are assessed, and then quantitative predictions are made for landslide-free zones with comparable conditions. These methods use a data-driven strategy that considers information from previous landslides to determine the relative significance of each factor. This framework assumes that conditions that have previously triggered landslides will do so again [5]. The main procedures used to predict landslides based on past patterns are bivariate statistical techniques, multivariate approaches, and artificial neural network analysis. In bivariate analyses, each factor class, for instance slope or geology, is considered individually. In multivariate statistical methods, the joint relationship between landslides and geo-environmental variables such as land development is considered. An artificial neural network offers a computational component that can acquire, represent, and compute a mapping from one multivariate space of data to another, given a set of data representing the associations [6]. An artificial neural network is trained using a set of related input and output values. Since data-driven procedures are used by most researchers, the physically based approaches are not set forth in this review. The data-driven techniques are considered suitable for practically all of the works; the most appealing application is the analysis of landslide distribution over large regions and their impact factors [7]. Additionally, these landslide susceptibility assessment procedures consider only the associations between landslides and related factors, not the failure mechanism [8]. In addition, such models generally leave out the temporal aspects of landslides and cannot predict changes over time in the factors bringing a slope toward a limit criterion (e.g., variation in the water table and changes in land use) [9].
The severe disturbance, disaster, loss of property, wealth, and suffering caused by landslides have always been a matter of serious concern and discussion. Attempts at predicting landslides date back as early as the 1930s. To advance this discussion, Schuster and Costa published a book volume on landslides; this work is particularly notable for its treatment of stream blockages. Figure 2 shows landslide dams in New Zealand.
Despite the extensive study done thus far, the multimodal geomorphic characteristics seen in large landslide dams continue to pose enormous challenges for formulating an assessment criterion. It is unclear whether, for example, the landslide volume or the volume of the impounded reservoir would serve as a proxy measure to assess the geomorphic criticality or impact.
There are also various factors that jointly influence landslide occurrence, for instance: slope variation within the slope unit, average elevation of the slope unit, lithological variation, average distance to the drainage network, and average distance to structural features.
A monitoring station ZG118 on the Baishuihe landslide and a station ZG111 on the Bazimen landslide were chosen to train and test the prediction model, as follows:
• For station ZG118, the training dataset was selected from August 2003 to December 2012, and the remaining data from January 2013 to December 2013 were used to test the model.
• For station ZG111, the training dataset was selected from August 2003 to December 2011, and the data from January 2011 to December 2012 were used to test the model.
Considering the accumulated displacement curve, the displacement during the flood season displays consistent step-wise growth. [15] demonstrated that the long-term trend component can be isolated by removing the influence of the periodic growth in accumulated displacement using a moving-average approach. The TGR's water level fluctuates between 145 and 175 m at various times. One year was chosen as the moving-average period, in line with the TGR's scheduling scheme [17].
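As an assumed illustration of this step (not the authors' code), the following sketch splits a cumulative displacement series into trend and periodic terms with a 12-month centered moving average, matching the one-year period chosen above:

```python
import numpy as np

def decompose(displacement, window=12):
    """Split a cumulative displacement series into a trend term and a
    periodic term using a centered moving average; window=12 monthly
    readings corresponds to the one-year moving-average period."""
    kernel = np.ones(window) / window
    trend = np.convolve(displacement, kernel, mode="same")
    periodic = displacement - trend
    return trend, periodic

# Toy monthly cumulative displacement series (synthetic numbers).
series = np.cumsum(np.random.rand(120))
trend, periodic = decompose(series)
```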
Both models, for stations ZG118 and ZG111, had three layers: the first two were LSTM layers and the third was a dense layer. The length of the input data sequence was significant, as was the fraction of historical data points fed in as input [18]. The search process did not restrict the optimal length of the data sequence.
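A minimal sketch of this two-LSTM-plus-dense architecture in Keras (the layer sizes, window length, and data are our assumptions, not values from the paper):

```python
import numpy as np
import tensorflow as tf

WINDOW = 12  # assumed number of monthly time steps per input sample

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(32, return_sequences=True, input_shape=(WINDOW, 1)),
    tf.keras.layers.LSTM(32),   # second LSTM layer
    tf.keras.layers.Dense(1),   # dense layer: predicted next displacement
])
model.compile(optimizer="adam", loss="mse")

# Toy displacement windows and targets, only to show the tensor shapes.
x = np.random.rand(64, WINDOW, 1)
y = np.random.rand(64, 1)
model.fit(x, y, epochs=1, verbose=0)
```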
The displacement time series of stations XD1, XD2, XD3, and XD4 began later or were shorter than those of stations ZG118 and ZG93, even though stations ZG118 and ZG93 had a comparable deformation pattern. Station ZG118 was chosen as the example for assessing displacement prediction with the new model because it has a longer displacement time series and a lower elevation.
Figure 6 presents different statistics of landslides. The blue bars in Fig. 6 represent data collected from Google Earth, while the orange bars represent data collected from field mapping. The chart depicts the number of landslides at different elevations. The data come from the landslide inventory map for the Chittagong Hilly Areas of Bangladesh, based on Google Earth and field mapping.
Figure 7 presents the match rate between the hazard map produced by the ITC group and the prediction maps for the various landslide types. Map 1 in Fig. 7 is the geomorphological hazard map produced by the ITC group based on the interpretation of aerial photographs and field assessment; it was prepared by experts, with hazard classes low, moderate, and high. Map 2 is the prediction map produced by the quantitative prediction model for the various landslide types; it was generated by the prediction model.
A data-driven support vector machine (SVM) was used to select the optimal landslide conditioning factors. Factors that cause landslides, such as soil and precipitation, are increasingly essential for predicting landslides earlier and with more precision; accordingly, we collected the data through Google Earth and produced a chart that represents the weight of the factors causing landslides. Figure 8 presents the landslide conditioning factors.
6 Conclusion
References
1. M. Baldonado, C.-C.K. Chang, L. Gravano, A. Paepcke, The stanford digital library metadata
architecture. Int. J. Digit. Libr. 1, 108–121 (1997)
2. K.B. Bruce, L. Cardelli, B.C. Pierce, Comparing object encodings, in Theoretical Aspects of
Computer Software. Lecture Notes in Computer Science, vol. 1281, ed. by M. Abadi, T. Ito
(Springer, Berlin, 1997), pp. 415–438
3. A.C. Roy, M.M. Islam, Predicting the probability of landslide using artificial neural network,
in 2019 5th International Conference on Advances in Electrical Engineering (ICAEE) (2019),
pp. 874–879. https://doi.org/10.1109/ICAEE48663.2019.8975696
4. J.A.V. Ortiz, A.M. Martínez-Graña, A neural network model applied to landslide susceptibility
analysis (Capitanejo, Colombia). Geomat. Nat. Hazards Risk 9(1), 1106–1128 (2018). https://
doi.org/10.1080/19475705.2018.1513083
5. Y. Bengio, P. Simard, P. Frasconi, Learning long-term dependencies with gradient descent is
difficult. Neural Netw. IEEE Trans. 5(2), 157–166 (1994)
6. Y. Cao, K. Yin, D.E. Alexander, C. Zhou, Using an extreme learning machine to predict the
displacement of step-like landslides in relation to controlling factors. Landslides 13(4), 725–736
(2016)
7. S.-Y. Chen, W.-Y. Chou, Short-term traffic flow prediction using EMD-based recurrent
Hermite neural network approach, in 2012 15th International IEEE Conference on Intelligent
Transportation Systems (IEEE, 2012). https://doi.org/10.1109/ITSC.2012.6338665
8. J. Corominas, et al., Prediction of ground displacements and velocities from groundwater level
changes at the Vallcebre landslide (Eastern Pyrenees, Spain). Landslides 2(2), 83–96 (2005)
9. J. Du, K. Yin, S. Lacasse, Displacement prediction in colluvial landslides, three Gorges
reservoir, China. Landslides 10(2), 203–218 (2013)
10. R. Eberhart, J. Kennedy, A new optimizer using particle swarm theory, in MHS’95. Proceedings
of the Sixth International Symposium on Micro Machine and Human Science (IEEE, 1995)
11. Y. Fan, et al., TTS synthesis with bidirectional LSTM based recurrent neural networks, in
Fifteenth Annual Conference of the International Speech Communication Association (2014)
12. X. Fan, et al., Failure mechanism and kinematics of the deadly June 24th 2017 Xinmo landslide,
Maoxian, Sichuan, China. Landslides 14(6), 2129–2146 (2017)
13. X. Fan, X. Qiang, G. Scaringi, Brief communication: post-seismic landslides, the tough lesson
of a catastrophe. Nat. Hazard. 18(1), 397–403 (2018)
14. Z. Ma, G. Mei, F. Piccialli, Machine learning for landslides prevention: a survey. Neural Comput. Appl. 33, 10881–10907 (2021). https://doi.org/10.1007/s00521-020-05529-8
15. C. Lissak, A. Bartsch, M. De Michele, et al., Remote sensing for assessing landslides and
associated hazards. Surv. Geophys. 41, 1391–1435 (2020). https://doi.org/10.1007/s10712-
020-09609-1
16. F.A. Gers, J. Schmidhuber, Recurrent nets that time and count, in Proceedings of the
IEEE-INNS-ENNS International Joint Conference on Neural Networks. IJCNN 2000. Neural
Computing: New Challenges and Perspectives for the New Millennium, vol. 3 (IEEE, 2000)
17. F. Paul, Remote sensing-based assessment of hazards from glacier lake outbursts: A case study
in the Swiss Alps. Can. Geotech. J. 39, 316–330 (2002)
18. J. Innes, Debris flows. Prog. Phys. Geogr. 7(4), 469–501 (1983); International Federation of Red Cross and Red Crescent Societies, World Disasters Report 2001 (2001)
The Good, The Bad, and The Missing:
A Comprehensive Study on the Rise
of Machine Learning for Binary Code
Analysis
Abstract Binary code analysis is an enabling technique for a wide range of appli-
cations such as digital forensics, software reengineering, malware detection, and
hardening software code against known vulnerabilities. Despite recent advancements in compilers and run-time libraries, the lack of high-level semantics remains a critical limitation of binary analysis. This challenge strongly impacts the performance of existing binary disassembly tools, which may produce inaccurate mappings between source code and binary code. To address this problem, machine learning techniques have attracted significant attention in recent times due to their effectiveness and automation in analyzing code at the binary level. Hence, this article discusses the
challenges in existing disassembly tools and attempts to show the significance of
machine learning approaches in binary code analysis.
1 Introduction
Binary Code Analysis (BCA) is the process of deriving properties about the behavior
of computer programs which can be done by either dynamic program analysis or static
program analysis [1]. It is primarily focused on two major areas: program optimiza-
tion and program correctness. The input to program analysis can be source code,
binary file, byte-code, memory dump, abstract model, etc. Two factors emphasize
the importance of the analysis of binary program files: (1) most of the time, high-level
source code is not available, and (2) there is mistrust in the compilation chain.
BCA is currently evolving in the defense, medical, and other Internet of Things (IoT) [2–4] environments, where the source code is not always accessible. However, analyzing binaries is not an easy task, as stripped binaries lack precise information, such as function boundaries, compared to source code. Function identification is an important step in binary analysis, as most binary analysis approaches depend on function boundary information to identify functions and other security vulnerabilities effectively. On the other hand, compiler optimization makes the function identification process complex in binary code [5]. Due to the varying behavior of compiler optimizations, function identification faces the following challenges [6]:
1. Not all bytes are functions: every byte in a binary sequence is visible when reverse engineered, but a byte may encode data, such as a variable, that is independent of any function in the program.
2. Functions are non-contiguous: during BCA, functions need not be laid out contiguously in memory; a function can be interleaved with, or share code with, other functions.
3. Function reachability: some functions in a binary program cannot always be reached, e.g., code that only reacts to system status such as high memory/CPU usage; conversely, some functions can be called from other programs and are not reachable along any internal path.
4. Compilations are not the same: the performance of BCA is influenced by compiler optimization, which changes from compiler to compiler. Compiler version and optimization level are the major factors shaping the emitted functions. On disassembly, code compiled with static linking differs from code compiled with dynamic linking.
5. Multiple entries in a function: with compiler optimization, functions may have multiple entry points, i.e., a single function can be entered at multiple locations.
To solve the aforementioned problems in function identification, several disassembly tools and machine learning approaches have been developed. This article presents a detailed study of BCA. Further, it puts forth the challenges and research gaps in the existing disassembly tools and machine learning approaches for function identification in binary program analysis. We also provide our insights on future directions for machine learning approaches to BCA.
This article presents the background study in Sect. 2. Section 3 discusses the challenges in existing disassembly tools. Section 4 provides insights on machine learning in binary program analysis. Section 5 provides scope and assumptions, followed by the conclusions in Sect. 6.
2 Background
Most analysis techniques have been designed and developed for source code or byte-code analysis. The crude way of statically analyzing binaries is to disassemble them and then analyze the assembly code for the behavior of interest. This process is time-consuming and completely dependent on the analyst's skill in decoding the bits and pieces of the target logic that appear in the assembly code. With the exponential increase in the rate of production of binary files, it is necessary to automate or semi-automate the process of decoding assembly code [7].
Binary analysis can evaluate object and library files for software quality and find bugs or vulnerabilities, which further helps extend data flow analysis to binary code. However, a few problems are specific to binary analysis: (i) type information of data/addresses/code is missing, (ii) high-level structure is lacking, (iii) the compiler can introduce dynamic jumps on its own, (iv) code and data are shared in the same memory, and (v) all problems of source code analysis still exist at the binary level.
Binary analysis includes the following stages: (i) disassembly of the binary, (ii) recovering semantics, (iii) translation to an IR, and (iv) compiling back to binary [8]. Among the four stages, the loss of semantic information is the biggest hurdle in enabling the analysis of binaries, as compilers strip semantic information during compilation from source to binary. Thus, the critical step in enabling the analysis of binaries is recovering/constructing the semantic information from the binary code, i.e., recovering semantics.
At the global level, the Defense Advanced Research Projects Agency (DARPA) has launched the Cyber Grand Challenge, a competition to create automatic defensive systems capable of reasoning about flaws, formulating patches, and deploying them on a network in real time [9]. Under this program, academia and industry are designing frameworks and technologies that enable binary analysis and help identify vulnerabilities in binaries. A few frameworks and tools developed as part of this initiative are MAYHEM, ANGR (FIRMALICE), and MC-SEMA. These tools are available for x86-based architectures. Similar translation tools are not currently available for other architectures, such as MIPS and ARM, that are commonly used in many popular devices/systems.
When a program is developed from code to artifact, it is easy to understand the logic of a particular command that makes the program act on a specified operation, since the design and code are already known. In the reverse engineering process, where only the program is available and the code and design logic are unknown, disassembling the binary poses a great hurdle for reverse engineers. The situation is further complicated by the plethora of reverse engineering tools available: the capacity of a reverse engineer is decided by the ability to use the available tools to their maximum potential [10]. Moreover, the existing disassembly tools may not provide a particular set of advantages over one another despite being in the same category, as listed in Table 1.
IDA Pro [11] allows disassembly and step-by-step instruction jumping to understand the flow of a program; it also provides a Control Flow Graph (CFG), which improves visualization and understanding of the control flow inside the program. Though Ollydbg [12] provides both a debugger and a disassembler, it is not as versatile as IDA Pro. Immunity Debugger [13] provides the same functionality as Ollydbg, and it restricts the disassembling of files that are not meant for static disassembly. The disassembly of Ollydbg and Immunity Debugger is far less functional and understandable than that of IDA Pro. Binary Ninja is entirely different from both IDA Pro and Radare2: though it is a popular reverse engineering platform, it has fewer functionalities than IDA Pro [14].
Tools like Radare2 and the GNU Debugger allow complex maneuvering inside the disassembled code. However, as both are command-line-based tools, they demand deep knowledge from the reverse engineer. Cutter [15] was built to address this disadvantage of Radare2: it is a GUI-based tool for easier use and also offers an option to use Ghidra functionalities [16, 17]. This tool has a promising future, with configurable resource utilization. However, it is not as robust as IDA Pro in functionality and versatility, and it is not considered a viable option due to stability issues.
Without a commercial license, IDA Pro allows Python scripting but does not support multiple architectures [18]. For a novice, scripting the process and purchasing the commercial license would be a daunting task, as the tool is not cheap either. Beyond these points, the tools are difficult to learn and struggle to produce improved results as the number of binaries increases [19].
Over the past few decades, research on function identification has grown significantly and has produced remarkable disassembly tools. Each tool adopts different systematic strategies, with its own strengths and weaknesses. However, the performance of most existing tools relies on algorithms and heuristics to prove their correctness. An extensive study of the disassembly tools emphasizes the need to address the following research gaps in function identification:
(i) Accuracy and efficiency
(ii) Varying results across different types of tools
(iii) Factors that affect the results of existing disassembly tools
The above limitations alert researchers that research gaps still exist in improving the efficiency and effectiveness of function identification. Existing disassembly tools employ function signatures to identify functions; however, these signatures can be generated automatically by ML approaches. Hence, recent research focuses on designing ML approaches for function identification that can learn the key features of a binary code automatically [20].
Table 1 Summary of disassembly tools
Features | IDA Pro | Binary Ninja | Ollydbg
License | Commercial | Commercial | Single developer
Static or dynamic | Both | Static | Both (more dynamic)
Speed | Longest load time | Long load time | Faster in dynamic debugger
Multiple instances | Requires heavy resources | |
Memory utilization | High | Medium | Medium
Automation | Requires a commercial license | |
Architectures | x86 32-bit, x86 64-bit, ARMv7, ARMv8, Thumb2, PowerPC, MIPS, 6502, 8085, 8086, 8051, AMD K6-2 3D | x86 32-bit, x86 64-bit, ARMv7, ARMv8, Thumb2, PowerPC, MIPS, 6502 (paid) | x86 32-bit, x86 64-bit, ARMv7, ARMv8, Thumb2, PowerPC, MIPS, 6502 (paid)
In general, machine-learning-based BCA comprises two phases: (i) a data pre-processing phase and (ii) a training and testing phase (Fig. 1). In the data pre-processing phase, data is collected and pre-processed with feature selection and feature extraction techniques. In feature extraction, binary sequences of patterns are learned from feature vectors to identify the distinct patterns in the binaries. Feature extraction alone may not reduce the size of the original data, which opens room for feature selection; this phase helps identify informative features from the binary code and achieves dimensionality reduction [21]. The reduced data obtained from these techniques act as input for training the machine learning model. Based on the training data, the ML-based classifier identifies vulnerability paths, recognizes functions, and so on.
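As a concrete illustration of this two-phase pipeline (our sketch, not taken from any surveyed tool), the following code extracts byte-bigram features from raw binaries, selects the most informative ones, and trains a classifier on synthetic data:

```python
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import SelectKBest, chi2
from sklearn.ensemble import RandomForestClassifier

# Toy corpus: each "binary" rendered as space-separated hex bytes.
rng = np.random.default_rng(0)
binaries = [" ".join(f"{b:02x}" for b in rng.integers(0, 256, 200))
            for _ in range(40)]
labels = rng.integers(0, 2, 40)  # e.g., 1 = contains a construct of interest

# Feature extraction: byte-bigram counts.
vec = CountVectorizer(analyzer="word", ngram_range=(2, 2))
X = vec.fit_transform(binaries)

# Feature selection: keep the most informative n-grams
# (dimensionality reduction before training).
X_sel = SelectKBest(chi2, k=100).fit_transform(X, labels)

clf = RandomForestClassifier(n_estimators=50).fit(X_sel, labels)
```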
The major challenge in binary analysis is the lack of high-level semantic structures, since compilers discard them during source code compilation; this becomes an advantage for intruders, who can modify binary sequences with a huge impact on the output. In general, functions are the building blocks of high-level programs, but the binaries derived from them are mostly undifferentiated sequences. It is hard to extract useful information and functional relation details from the binary sequences. Therefore, existing binary analysis techniques rely on function-information repositories, which initially attempt to recover the functions from the binary sequences.
In recent times, several ML approaches have been designed for function identification to automate the signature generation process, as is evident from recent tools such as ByteWeight and the CMU Binary Analysis Platform (CMU-BAP) [22]. However, these tools rely on predefined function signatures for the identification of functions/constructs in the given binary sequences. They also require learning for every compiler version and fail to generalize to unknown compilers. Rosenblum et al. [23] first paved the way for an ML approach to function identification, designing a supervised learning approach that considers graphlet features to describe the structure of the program. Bao et al. [22] employ weighted prefix trees that learn CFG-based features to improve the efficiency of function identification. Shin et al. [24] applied a Recurrent Neural Network (RNN), which solves the function boundary identification problem efficiently by learning over byte sequences; it reduced the computation time drastically while achieving better accuracy on a prior test suite. FID proposes an ensemble learning approach using LinearSVC, AdaBoost, and Gradient Boosting to recognize functions; this method works well across different compilers and optimizations [25]. The complex, undifferentiated, and ill-conditioned nature of binary instructions makes it difficult for the above-mentioned tools to achieve high accuracy (Table 2).
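For illustration, a minimal sketch in the spirit of the RNN approach of Shin et al. [24] (layer sizes, window length, and data are our assumptions): each byte of a window is tagged with the probability that it starts a function.

```python
import numpy as np
import tensorflow as tf

SEQ_LEN = 1000  # assumed window length over the raw byte stream

# Per-byte binary tagging: 1 if the byte starts a function, else 0.
model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=256, output_dim=16),  # one embedding per byte value
    tf.keras.layers.Bidirectional(tf.keras.layers.GRU(64, return_sequences=True)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy")

# Synthetic bytes and labels, only to show the tensor shapes.
x = np.random.randint(0, 256, size=(32, SEQ_LEN))
y = np.random.randint(0, 2, size=(32, SEQ_LEN, 1)).astype("float32")
model.fit(x, y, epochs=1, verbose=0)
```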
Based on the literature study, the following points summarize the need for machine learning in function identification:
1. Analyzing multiple binaries manually is tedious and time-consuming.
2. Pre-processing without information loss results in a massive volume of samples.
3. Feature engineering identifies informative samples, thereby achieving dimensionality reduction.
4. Training the learning model with more informative and unique samples yields a robust and reliable function identification tool.
From our study, we observed that most machine-learning-based function identification models construct a baseline profile of functions and conditional statements and their corresponding binary sequences. The performance of ML approaches is also limited by syntactic and semantic feature extraction. Hence, we suggest that the function identification model can be envisaged as a multi-task knowledge discovery framework that integrates phases such as the extraction and selection of informative feature vectors from the binary sequences (Fig. 2). Thereby, given a binary sequence as input, the ML model will provide the respective function as output. In this context, we highlight the Recurrent Neural Network (RNN), a deep-learning-based function identification model, since it can incorporate context and efficiently scale computation and memory with the binary sequence length.
However, the performance of the RNN relies on various intrinsic parameters such as the learning rate, weight decay, number of epochs, batch size, momentum, etc. Identifying optimal values of these parameters enhances the performance of the RNN in terms of maximum classification accuracy. The hyperparameters of an RNN can be optimized using simple metaheuristic approaches such as swarm optimization, genetic algorithms, etc., which have predominantly been used for parameter tuning in deep learning approaches.
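For illustration, a minimal random-search sketch over a few such hyperparameters (a simple stand-in for the metaheuristic tuners mentioned above; the search space, trial budget, and evaluation stub are assumed):

```python
import random

search_space = {
    "learning_rate": [1e-4, 3e-4, 1e-3, 3e-3],
    "batch_size": [16, 32, 64],
    "epochs": [5, 10, 20],
}

def evaluate(params):
    # Placeholder: train the RNN with `params` and return validation accuracy.
    return random.random()

best, best_acc = None, -1.0
for _ in range(20):  # 20 random trials
    params = {k: random.choice(v) for k, v in search_space.items()}
    acc = evaluate(params)
    if acc > best_acc:
        best, best_acc = params, acc
print(best, best_acc)
```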
As a takeaway of this study, the following key aspects should be considered when evaluating the performance of an ML-based function identification model:
(i) Accuracy of the ML approach on the complete dataset
(ii) Run-time performance
(iii) Stability of the results over different compiler architectures/optimizations
7 Conclusions
References
1 Introduction
An urban area can be upgraded to become a smart city by enabling connections among intelligent vehicles that work together to accomplish complex jobs [1]. Besides this advantage, it also raises cybersecurity challenges for the network of connected vehicles. Nowadays, as cyberattacks increase, cybercriminals try to attack the network to disrupt all types of communication on it [2]. These communications are made through the Controller Area Network (CAN), which provides the protocol for the communication [3]. The main motivation behind this network is to improve traffic safety and driving efficiency.
In all industries, the impact of technology has been increasing, and many intelligent devices are being introduced across all sectors. Therefore, the use of IoT has been increasing. Among its many applications, one of the most important is the Internet of Vehicles (IoV) [4]. IoV is an intelligent transportation system. It consists of hardware and numerous networks that permit cars to share information with several components in real time [5]. The IoV supports five kinds of communication through the network, as follows:
1. Vehicle to Sensor (V2S) [6] structures allow vehicles to send information to the sensors, which then forward that information to the microcontroller.
2. Vehicle to Vehicle (V2V) [7] structures help exchange data about the position and speed of a vehicle with neighboring vehicles wirelessly.
3. Vehicle to Infrastructure (V2I) [8] structures allow vehicles to share information with the supporting Roadside Units (RSUs) wirelessly.
4. Vehicle to Cloud (V2C) [9] structures give vehicles access to additional data from the Internet via Application Program Interfaces (APIs).
5. Vehicle to Personal devices (V2P) [10] structures permit vehicles to share information with any electronic device.
The Controller Area Network (CAN) bus provides the protocol to establish these types of communication between vehicles and the microcontroller component. Figure 1 illustrates vehicle communication through the Internet.
A Controller Area Network (CAN) bus specifies the standard for communication between Electronic Control Units (ECUs) [12]. A message is broadcast from the transmitter to the destination; this message contains several fields. The standard CAN bus data frame format [13] contains the Start of Frame (SOF), Message Identifier (MID), Remote Transmission Request (RTR), Reserved field (R), Data Length Code (DLC), Data field, Cyclic Redundancy Check (CRC) sequence, CRC Delimiter Bits (CRC DB), Acknowledgment (ACK) field, ACK Delimiter Bits (ACK DB), and End Of Frame (EOF) field [13], as shown in Table 1.
The SOF and EOF denote the start and end of the remote and data frames. The arbitration ID contains the message ID and the Remote Transmission Request (RTR) bit, which differentiates remote and data frames; it is used to recognize the Electronic Control Unit (ECU) and also carries the priority of the packet. The Reserved field and DLC indicate the length of the message ID and the size of the data. The Data field carries the payload of the data frame. The CRC field is used during packet transmission to detect errors in the data packet. The Acknowledgment field confirms the arrival of an authentic CAN packet [14].
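For concreteness, a minimal sketch of how one such frame record might be represented in code (field names follow the description above; the values are invented, and this is not tied to any specific CAN library):

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass
class CanFrame:
    """One CAN bus data frame record (fields as described in Table 1)."""
    message_id: int        # 11-bit arbitration/message identifier (also priority)
    rtr: bool              # Remote Transmission Request bit
    dlc: int               # Data Length Code: number of data bytes (0-8)
    data: Tuple[int, ...]  # up to 8 data bytes (the payload)
    crc: int               # CRC sequence for error detection

frame = CanFrame(message_id=0x316, rtr=False, dlc=8,
                 data=(0x05, 0x21, 0x68, 0x09, 0x21, 0x21, 0x00, 0x6F),
                 crc=0x5A3)
```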
The data packet is broadcast over the bus to all the ECUs, but it does not carry any address information. As the packets contain numerous arbitration IDs, devices can transmit data packets to different devices. The CAN bus network allows new nodes to be added, and every node receives every packet, since messages carry no sender address. Hence, harmful packets can be sent to the devices easily [15].
In the CAN bus, any device can send data to or read data from any other device, as devices do not verify packets. Since there is no strict security protection, the system can be attacked easily. The attacks that generally occur are as follows:
• Denial of Service (DoS) attack: in this attack, authorized users cannot access network facilities due to jams caused by unauthorized access to the network. The attacker injects a high-priority instruction (ID 0000) every 0.3 milliseconds; valid instructions do not get timely access to the network, as the priority of '0000' is highest [16, 17].
• Fuzzy attack: in this attack, the attacker gathers information about IDs and then randomly sends packets with random IDs into the network. Therefore, the vehicles start behaving abnormally [16, 17].
• Spoofing attack: in this attack, the attacker steals information about particular types of IDs and then injects messages of certain IDs, such as Revolutions Per Minute (RPM) IDs or gear IDs, every 1 millisecond. After that, the vehicles start behaving unexpectedly [16, 17].
Therefore, attack detection is important for in-vehicle communication: it provides security in this network so that vehicles can communicate with each other without any malicious attack and perform well.
Attack detection on the CAN bus has become an important topic in academia as well as in industry, and many researchers have proposed different approaches to this problem. Seo et al. [16] proposed a GAN-based machine learning technique to detect intrusions on the system; they extracted useful information from CAN IDs and converted those IDs into images via one-hot encoding. In their proposed GAN algorithm, they combined discriminators to distinguish attack information from normal messages. Lee et al. [18] proposed a method that sends a remote frame to a receiver with a particular identifier to detect any kind of attack on the CAN bus network. This strategy is based on the CAN's offset ratio and the time delay between request and reply messages: each node has a constant offset ratio of reply and time delay in a non-attack state, while these parameters fluctuate in an attack state. Xiao et al. [3] proposed the SIMATT-SECCU framework to detect intrusions on the CAN bus, building a security control unit (SECCU) that combines the benefits of LSTM units and GRUs with a simplified attention model (SIMATT) to reduce computational overhead. Song et al. [17] proposed an attack detection method using a deep convolutional neural network; their method adapts the Inception–ResNet model structure by reducing the number and size of layers in the architecture. For a CAN bus IDS, Tian et al. [19] employed a machine learning approach called Gradient Boosting Decision Tree (GBDT), proposing a new entropy-based feature for the GBDT algorithm's feature creation. Hu et al. [20] proposed robust anomaly detection using support vector machines, applying their model to noisy data as well. Hossain et al. [21] suggested an LSTM-based intrusion detection system for in-vehicle CAN bus communications; they created a new dataset by first extracting non-attack data from their experimental vehicle and then injecting attacks. Barletta et al. [22] proposed an unsupervised Kohonen Self-Organizing Map (SOM) network approach to detect intrusions in in-vehicle communication networks; it is an Artificial Neural Network (ANN) that allows high-dimensional data to be visualized on a two-dimensional map. Han et al. [23] presented a method based on the survival analysis model to detect intrusions in vehicular networks; survival analysis is a statistical tool for determining which factors influence an object's survival rate and duration, and the suggested approach considers the survival rate of a single CAN ID within a chunk unit. Tariq et al. [24] proposed an ensemble-based approach that uses heuristics and Recurrent Neural Networks (RNNs) to predict attacks. We observe that these related works do not achieve good accuracy in detecting all types of attacks, and the above-mentioned models are quite complex. In our work, we aim for better accuracy for all types of attacks on Internet of Vehicles communication using a less complex model.
1.5 Contribution
The movement of autonomous vehicles has increased with the advancement of technology. Injecting wrong information into the CAN during vehicle communication via the Internet and cloud can disrupt the movement of vehicles; therefore, intrusion detection in CAN is an important task for the cybersecurity system. Thus, the main objective of this paper is to propose an effective CAN bus attack detection algorithm for maintaining the security of the vehicle network. Initially, the dataset is processed; then the djb2 [25] hashing operation is performed over the data, transforming it into a new form by computing hash values for the whole data. The class-imbalanced nature of the dataset is then checked, and an undersampling operation is applied to obtain a uniform distribution of non-attack and attack data in the dataset. Here, we consider the DoS attack, Fuzzy attack, Gear attack, and
RPM attack. The datasets for these attacks are class-imbalanced. Next, we apply our proposed method on the class-balanced dataset to separate normal data from attack data. The proposed method is an ensemble machine learning algorithm that detects various attacks with higher accuracy: it combines a KNN classifier with an XGBoost classifier to predict attacks more accurately than the individual classifiers. We performed ten-fold cross-validation over the dataset to evaluate the performance of the proposed model. The model performs better than other related works in detecting all types of attacks on in-vehicle communication, and its architecture is also less complex than theirs.
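The djb2 hash used above is a standard string hash; the following is a minimal sketch of applying it to one CAN record (the record string is invented):

```python
def djb2(s: str) -> int:
    """Classic djb2 string hash: h = h * 33 + ord(c), starting from 5381."""
    h = 5381
    for c in s:
        h = ((h * 33) + ord(c)) & 0xFFFFFFFF  # keep the value to 32 bits
    return h

# e.g., hash the concatenated hex fields of one CAN message into a single value
record = "0316 05 21 68 09 21 21 00 6f"
print(djb2(record))
```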
The rest of the paper is organized as follows: Sect. 2 presents the preprocessing of the dataset and the framework of our proposed model. The experimental results for evaluating the model are given in Sect. 3, and finally, the conclusion and future scope are discussed in Sect. 4.
2 Proposed Methodology
In the proposed method, an ensemble machine learning algorithm is used. The main purpose of using machine learning is to build a generalized model from a finite training dataset that can detect attacks on any new data. In the proposed ensemble technique, we stack KNN with the XGBoost algorithm. XGBoost is chosen because its regularization support prevents the model from overfitting, and it performs gradient boosting by considering gradients in the loss function [26]; the algorithm also supports parallel processing. XGBoost is then stacked with KNN for better performance of the proposed model and a more accurate result. The motivation for using KNN is that it requires no training time and learns from the training data only at prediction time [27].
The regularized objective that XGBoost minimizes can be written as

L(\theta) = \sum_{j=1}^{m} r(x_j, \hat{x}_j) + \sum_{n} \Omega(g_n), \qquad (1)

where r is the loss between the target x_j and the prediction \hat{x}_j of sample j, g_n denotes the n-th tree, and \Omega is a regularization term. At boosting step k, a new tree g_k is added to the prediction \hat{x}_j^{(k-1)} of the previous step:

L^{(k)} = \sum_{j=1}^{m} r\left(x_j, \hat{x}_j^{(k-1)} + g_k(p_j)\right) + \Omega(g_k), \qquad (2)

where p_j is the feature vector of sample j. A second-order Taylor expansion of the loss gives

L^{(k)} \approx \sum_{j=1}^{m} \left[ r\left(x_j, \hat{x}_j^{(k-1)}\right) + d_j\, g_k(p_j) + \tfrac{1}{2} f_j\, g_k^2(p_j) \right] + \Omega(g_k), \qquad (3)

where d_j = \partial_{\hat{x}^{(k-1)}} r(x_j, \hat{x}^{(k-1)}) and f_j = \partial^2_{\hat{x}^{(k-1)}} r(x_j, \hat{x}^{(k-1)}) are the first- and second-order gradient statistics of the loss function [28]. In the next step, the constant terms are removed from Eq. (3) to obtain the reduced objective at step k:

\tilde{L}^{(k)} = \sum_{j=1}^{m} \left[ d_j\, g_k(p_j) + \tfrac{1}{2} f_j\, g_k^2(p_j) \right] + \Omega(g_k). \qquad (4)

Expanding the regularization term as \Omega(g_k) = \alpha S + \tfrac{1}{2} \gamma \sum_{i=1}^{S} v_i^2, where S is the number of leaves of g_k and v_i is the weight of leaf i, Eq. (4) becomes

\tilde{L}^{(k)} = \sum_{j=1}^{m} \left[ d_j\, g_k(p_j) + \tfrac{1}{2} f_j\, g_k^2(p_j) \right] + \alpha S + \tfrac{1}{2} \gamma \sum_{i=1}^{S} v_i^2. \qquad (5)

Now, J_i = \{\, j \mid p(q_j) = i \,\} represents the sample set of leaf i. The optimal weight v_i^* of leaf i for a fixed tree structure p(q) is computed by the following formula [28]:

v_i^* = - \frac{\sum_{j \in J_i} d_j}{\sum_{j \in J_i} f_j + \gamma}. \qquad (6)

Substituting Eq. (6) into Eq. (5) gives the optimal objective value

\tilde{L}^{(k)}(q) = - \frac{1}{2} \sum_{i=1}^{S} \frac{\left(\sum_{j \in J_i} d_j\right)^2}{\sum_{j \in J_i} f_j + \gamma} + \alpha S. \qquad (7)

Equation (7) is used as a score that measures the structural quality of a tree [28]. Now, let J_L and J_R be the sample sets of the left and right nodes after a split, with J = J_L \cup J_R; substituting into the gain [28], we get

L_{split} = \frac{1}{2} \left[ \frac{\left(\sum_{j \in J_L} d_j\right)^2}{\sum_{j \in J_L} f_j + \gamma} + \frac{\left(\sum_{j \in J_R} d_j\right)^2}{\sum_{j \in J_R} f_j + \gamma} - \frac{\left(\sum_{j \in J} d_j\right)^2}{\sum_{j \in J} f_j + \gamma} \right] - \alpha. \qquad (8)
Now, considering our dataset, we construct decision trees using the XGBoost algorithm.
(Diagram: two candidate decision-tree splits on the DLC field, whose values are 8, 5, and 2.)
The left and right trees are drawn based on the DLC field of our dataset, as it contains different values such as 8, 5, and 2; based on these values, the possible trees are drawn. Then the similarity weights are calculated for every node of the left and right trees. After calculating the score by Eq. (7), the overall gain is calculated for both trees using the formula of Eq. (8). The split whose tree yields the higher gain is then chosen over the other candidate splits. In this way, the splitting operation is performed over the different attributes of the dataset.
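For illustration, a small numeric sketch of Eqs. (6) and (8) with invented gradient sums (d and f denote the summed first- and second-order statistics; γ is the L2 coefficient and α the per-leaf penalty):

```python
def leaf_weight(d_sum, f_sum, gamma=1.0):
    """Optimal leaf weight v_i = -sum(d) / (sum(f) + gamma), as in Eq. (6)."""
    return -d_sum / (f_sum + gamma)

def split_gain(dl, fl, dr, fr, gamma=1.0, alpha=0.0):
    """Split gain of Eq. (8) for left/right gradient sums (dl, fl), (dr, fr)."""
    def score(d, f):
        return d * d / (f + gamma)
    return 0.5 * (score(dl, fl) + score(dr, fr) - score(dl + dr, fl + fr)) - alpha

# e.g., one candidate split on the DLC attribute, with invented statistics:
print(leaf_weight(-3.2, 2.5))            # weight of a single leaf
print(split_gain(-3.2, 2.5, 1.8, 1.5))   # gain of the candidate split
```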
information of that particular message ID that contains any missing value. As the dataset is class-imbalanced, an undersampling operation is applied over the dataset to obtain a uniform distribution of data across the different classes.
Feature selection: in our dataset, the features are timestamps, arbitration ID, data length code (DLC), and the 8 data bytes. The dataset contains labeled data. We selected all these features to train our model.
Building model: in our proposed work, we use KNN stacked with the XGBoost model to predict attacks during in-vehicle communication. Here, KNN and XGBoost are the base (level-0) classifiers; in KNN, we consider seven neighboring training examples to predict the class. The combination of the level-0 predictions is then used at level 1 for the final prediction, where XGBoost is chosen as the meta-classifier. Figure 2 presents the architecture of the proposed model.
We performed ten-fold cross-validation on the dataset to obtain the accuracy score of our proposed model. The data are first split into ten folds; each unique fold in turn serves as the test data, while the remaining k−1 folds are used for training.
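A minimal sketch of this stacked design with scikit-learn and xgboost (our reconstruction from the description above; only the seven KNN neighbors and the ten folds come from the text, everything else is assumed):

```python
import numpy as np
from sklearn.ensemble import StackingClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from xgboost import XGBClassifier

# Level 0: KNN (seven neighbors, as in the text) and XGBoost;
# Level 1 (meta-classifier): XGBoost.
model = StackingClassifier(
    estimators=[
        ("knn", KNeighborsClassifier(n_neighbors=7)),
        ("xgb", XGBClassifier(n_estimators=100)),
    ],
    final_estimator=XGBClassifier(n_estimators=100),
)

# Toy stand-in for the hashed CAN features (e.g., timestamp, ID, DLC,
# hashed payload) and attack/non-attack labels.
X = np.random.rand(200, 4)
y = np.random.randint(0, 2, 200)

scores = cross_val_score(model, X, y, cv=10)  # ten-fold cross-validation
print(scores.mean())
```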
3 Performance Evaluation
In this section, we give details of the performance of our proposed method. We applied the method to a publicly available dataset1 and compared the performance of our approach with other related work. The actual dataset was transformed into a new form by applying the djb2 hashing operation over the whole data: hexadecimal data of different lengths were converted into numeric values that give a single value for the whole data of a particular message.
Table 2 Comparison of accuracy between our proposed method and other related work
Method | Accuracy in DoS attack | Accuracy in Fuzzy attack | Accuracy in Gear attack | Accuracy in RPM attack
DNN + Triplet | 84 | 84 | 83 | 85
DNN + SVM | 78 | 75 | 78 | 78
DNN + Softmax | 61 | 60 | 63 | 63
Reduced Inception–ResNet | 86 | 85.6 | 86 | 86
XGBoost with KNN (proposed) | 89 | 87 | 88 | 88
Table 3 Ten-fold cross-validation scores of our proposed method for different types of attack on in-vehicle communication
Fold | DoS | Fuzzy | Gear | RPM
1 | 89.2 | 87.3 | 88 | 87.9
2 | 89 | 87 | 87.8 | 88.2
3 | 88.8 | 86.5 | 88.3 | 88.1
4 | 89.3 | 87 | 88 | 87.8
5 | 89 | 86.9 | 87.9 | 87.6
6 | 88.6 | 87.2 | 88.2 | 87.9
7 | 89.1 | 87.1 | 88.1 | 88.2
8 | 89 | 87.1 | 87.8 | 88.3
9 | 89 | 86.9 | 88 | 88.3
10 | 89.1 | 87.2 | 88.1 | 88
Mean | 89.01 | 87.02 | 88.02 | 88.03
The accuracy of the different methods is reported in Table 2.
In Table 2, it can be seen that the models built using a Deep Neural Network (DNN) with triplet, SVM, and softmax classifiers have not performed as well as the reduced Inception–ResNet model and our proposed model for the DoS, Gear, and RPM attacks. The accuracy of the reduced Inception–ResNet model is also lower than that of our proposed model. Our proposed method improves performance, achieving better accuracy for all types of attack in vehicular communication: 88% for the Gear and RPM attacks, 87% for the Fuzzy attack, and 89% for the DoS attack. Hence, our proposed model performs better than the mentioned methods for the different types of attack that occur during message transmission in a vehicular network.
Table 3 gives the accuracy scores obtained by applying ten-fold cross-validation on the datasets of the DoS, Fuzzy, Gear, and RPM attacks, along with the mean scores. We obtained a mean score of 89.01% for the DoS attack, 87.02% for the Fuzzy attack, and about 88% for the RPM and Gear attacks.
The drawback of our proposed method is that it takes a long time on large datasets, although it performs well for all types of attacks during in-vehicle communication. In future work, we can apply this method to other types of attacks, such as impersonation attacks on the CAN bus network, and to other kinds of intrusion detection in the Internet of Things (IoT), while removing this drawback of our model.
4 Conclusion
In this paper, we have highlighted the vulnerability of the autonomous vehicle network system and proposed an ensemble machine learning method to detect cyberattacks on the system. In our proposed method, we use KNN and XGBoost as base classifiers, and XGBoost as the meta-classifier. As the datasets for the different attacks are class-imbalanced, an undersampling operation is performed over the datasets in our method to create balanced datasets. We compared our model's performance with other researchers' work, and our model provides better accuracy in detecting the attacks: the accuracy for the DoS, Gear, RPM, and Fuzzy attacks reached 89%, 88%, 88%, and 87%, respectively. In future work, the method can be applied to detect other types of intrusion in the cybersecurity domain, and the computational time can be reduced to detect attacks in real time.
References
1. L.M. Ang, K.P. Seng, G.K. Ijemaru, A.M. Zungeru, Deployment of IOV for smart cities:
applications, architecture, and challenges. IEEE Access 7, 6473–6492 (2018)
2. J. Jang-Jaccard, S. Nepal, A survey of emerging threats in cybersecurity. J. Comput. Syst. Sci.
80(5), 973–993 (2014)
3. J. Xiao, H. Wu, X. Li, Internet of things meets vehicles: sheltering in-vehicle network through
lightweight machine learning. Symmetry 11(11), 1388 (2019)
4. W. Wu, Z. Yang, K. Li, Internet of vehicles and applications, in Internet of Things (Elsevier,
2016), pp. 299–317
5. L. Tuyisenge, M. Ayaida, S. Tohme, L.E. Afilal, Network architectures in internet of vehi-
cles (IOV): Review, protocols analysis, challenges and issues, in International Conference on
Internet of Vehicles (Springer, 2018), pp. 3–13
6. B. Ji, X. Zhang, S. Mumtaz, C. Han, C. Li, H. Wen, D. Wang, Survey on the internet of vehicles:
network architectures and applications. IEEE Commun. Stand. Mag. 4(1), 34–41 (2020)
7. A. Demba, D.P. Möller, Vehicle-to-vehicle communication technology, in 2018 IEEE Interna-
tional Conference on Electro/Information Technology (EIT) (IEEE, 2018), pp. 0459–0464
8. C. Wietfeld, C. Ide, Vehicle-to-infrastructure communications, in Vehicular Communications
and Networks (Elsevier, 2015), pp. 3–28
Predictive Analytics of Engineering and Technology Admissions
1 Introduction
2 Related Work
The researchers studied various related national and international research papers and theses to understand the objectives, types of algorithms used, datasets, data pre-processing methods, feature selection methods, etc. Kalathiya et al. [3] used different machine learning algorithms to analyze students' admission preferences; they found the random forest classifier to be a good choice, as its accuracy is very high. Khandale and Bhoite [4] used different machine learning models to analyze students' placement at an early stage of their academics; they found an AdaBoost classifier with bagging and a decision tree as the base classifier performed best, with very high accuracy. Nie et al. [2] used logistic regression (LR), SVM, random forest (RF), and decision tree (DT) to propose a system for advanced forecasting of career choices for college students based on campus big data; random forest performs comparatively better than the other methods. Roy et al. [5] used machine learning algorithms such as SVM, random forest, decision tree, one-hot encoding, and XGBoost to predict students' careers; of these, SVM gave higher accuracy than XGBoost. Waghmode and Jamsandekar [6] designed a framework for an expert system for career selection; the machine learning algorithms ID3, PRISM, and PART give 100%
accuracy in classification along with rules. Gorad et al. [7] developed a Web application based on a student's personality traits, interests, and capacity to take up a course, to help students with their careers; the prediction is done using a decision tree algorithm, the C5.0 package. Borgavakar and Shrivastava [8] used a k-means clustering algorithm to classify students' grades, on the basis of class tests, mid-tests, and final tests, into three categories: 'High', 'Medium', and 'Low'. Saikuswanth et al. [9] developed a system based on student marks in mathematics and physics and some questions; an expert system and an artificial neural network are applied for student career assessment, to analyze whether their capabilities are suitable for the job. Padmapriya [10] found the decision tree induction data mining algorithm to be the best compared to a naïve Bayesian classifier in terms of classification accuracy, misclassification rate, speed, and size, using students' personal data, pre-college data, and undergraduate data to predict higher-education admissibility. Bibodi et al. [11] worked on predicting the university when students apply to specific universities; random forest provided better accuracy than the other algorithms, i.e., 90% accuracy. Sonawane and Dondio [12] explored a system to help students find the best foreign universities/colleges based on their performance, using three algorithms: KNN (76% accuracy), decision tree (80% accuracy), and logistic regression (68% accuracy). Aljasmi et al. [13] experimented with multiple linear regression, K-nearest neighbor, random forest, and multilayer perceptron models to predict a student's chance of being admitted to a master's program; experiments showed that the multilayer perceptron model surpasses the other models. Apoorva et al. [14] helped students by providing an open-source machine learning model to estimate their chance of admission to a particular university in the USA with high accuracy. Naveen et al. [15] used different regression models, choosing the best among them, to advise students in planning and choosing a career and joining the university of their choice; the random forest gave good results among all the models.
3 Research Methodology
The proposed work was carried out by performing exploratory analysis and experiments with various machine learning algorithms.
When we start developing an ML model, it is essential to analyze the data first; here this was done using statistical and visualization techniques. This brings our focus to the important aspects of the data for further analysis. The process helps in the following ways:
• Get an understanding of the statistical properties of the data and the schema of the data.
• Get an understanding of the missing values and inconsistent data types.
• Get an understanding of the predictive power of the data, such as the correlation of features with the target variable.
In the process of EDA, we have drawn and analyzed the following graphs [16].
6 Feature Engineering
In general, every machine learning algorithm takes some input data to generate desired outputs. These input data are called features, and they are usually presented as structured columns. Depending on the goal or objectives, algorithms require input features with specific characteristics to produce the desired output; hence the need for feature engineering. Feature engineering efforts mainly have two goals:
• Generating a proper input dataset, as required by the machine learning algorithm.
• Improving the performance of machine learning models.
Data preparation is very important in a machine learning project. The following steps were carried out to prepare the dataset (a minimal sketch follows this list):
• Missing values
• Handling categorical data (LabelEncoder)
• Changes in data type
• Dropping columns
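The sketch below illustrates these preparation steps with pandas and scikit-learn; the file name and several column names are hypothetical, and only 'Candidate Type', 'Category', 'HSC Eligibility', and 'BRANCH' come from the paper's feature lists.

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder

df = pd.read_csv("admissions.csv")  # hypothetical input file name

# Missing values: fill numeric gaps, drop rows missing the target.
df["Merit Marks"] = df["Merit Marks"].fillna(df["Merit Marks"].median())
df = df.dropna(subset=["College"])  # hypothetical target column

# Handling categorical data with LabelEncoder.
for col in ["Candidate Type", "Category", "HSC Eligibility", "BRANCH"]:
    df[col] = LabelEncoder().fit_transform(df[col].astype(str))

# Change in data type, then drop columns not used for modeling.
df["HSC Marks"] = df["HSC Marks"].astype(float)            # hypothetical column
df = df.drop(columns=["Candidate Name"], errors="ignore")  # hypothetical column
```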
7 Feature Selection
Domain experts for the problem may not always be available to decide which independent features predict the category of the target feature. Hence, before fitting the model, we must make sure that all the selected features contribute properly to the model and that the weights assigned to them are good enough for the model to give satisfactory accuracy. For this, we used three feature selection techniques: univariate selection, recursive feature elimination, and feature importance [18], implemented using the Python scikit-learn library.
The univariate selection method shows the highest scores for the following features. Using the recursive feature elimination method, the following features are selected and the remaining are rejected.
Selected Features: [‘Candidate Type’, ‘Category’, ‘PH Type’, ‘Defense Type’,
‘HSC Eligibility’, ‘BRANCH’].
Inbuilt feature importance comes with tree-based classifiers; we used an extra trees classifier from the Python scikit-learn library to extract the top 7 features of the dataset (Fig. 1).
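A minimal sketch of the three feature selection techniques with scikit-learn, assuming a label-encoded DataFrame df with target column 'College' (variable names are assumptions, not the authors' code):

```python
# Feature selection sketch: univariate scores, recursive elimination,
# and tree-based importances, per the techniques named in the text.
import pandas as pd
from sklearn.feature_selection import SelectKBest, chi2, RFE
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import ExtraTreesClassifier

X = df.drop(columns=['College'])   # label-encoded, non-negative features
y = df['College']

# 1. Univariate selection: score each feature independently.
scores = SelectKBest(score_func=chi2, k=8).fit(X, y)
print(pd.Series(scores.scores_, index=X.columns).nlargest(8))

# 2. Recursive feature elimination: drop the weakest feature each round.
rfe = RFE(LogisticRegression(max_iter=1000), n_features_to_select=6).fit(X, y)
print(X.columns[rfe.support_])

# 3. Tree-based feature importance with an extra trees classifier.
tree = ExtraTreesClassifier(n_estimators=100, random_state=0).fit(X, y)
print(pd.Series(tree.feature_importances_, index=X.columns).nlargest(7))
```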
8 Experimentation
Table 4 (continued)

S. No. | Name of algorithm | Data splitting method used | Folds/ratios tested | Parameters tuned | Number of parameters tested
3 | Decision tree | K-FCV | 3, 5, 10 | max_depth, min_impurity_decrease, max_leaf_nodes, min_leaf_nodes, max_features | 7 to 15
3 | Decision tree | T-TS | 70:30, 80:20, 90:10 | max_depth, min_impurity_decrease, max_leaf_nodes, min_leaf_nodes, max_features | 7 to 15
4 | Random forest | K-FCV | 3, 5, 10 | max_depth, min_impurity_decrease, max_leaf_nodes, min_leaf_nodes, max_features | 7 to 15
4 | Random forest | T-TS | 70:30, 80:20, 90:10 | max_depth, min_impurity_decrease, max_leaf_nodes, min_leaf_nodes, max_features | 7 to 15
5 | Gaussian NB | K-FCV | 3, 5, 10 | – | 7 to 15
5 | Gaussian NB | T-TS | 70:30, 80:20, 90:10 | – | 7 to 15
6 | K neighbors classifier | K-FCV | 3, 5, 10 | leaf_size, n_neighbors | 7 to 15
6 | K neighbors classifier | T-TS | 70:30, 80:20, 90:10 | leaf_size, n_neighbors | 7 to 15
After cleaning all the data, removing the noise, selecting relevant features, and encoding them into machine-readable form, the next step is to build a predictive model by applying various ML techniques to find the best model, i.e., the one that gives the highest accuracy on both the training and test sets; the tuning regime of Table 4 is sketched below.
After implementing all the methods mentioned in Tables 6 and 7, we found the decision tree to be the best classifier to predict the probability list of top colleges for any individual.
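A minimal sketch of the Table 4 tuning regime for the decision tree under K-fold cross-validation, using scikit-learn's GridSearchCV; X_train and y_train and the exact parameter grid are assumptions based on the table, not the authors' code:

```python
# Hyperparameter search sketch: decision tree, parameter values 7-15,
# evaluated with 3-, 5-, and 10-fold cross-validation per Table 4.
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

param_grid = {'max_depth': range(7, 16),        # 7 to 15 inclusive
              'max_leaf_nodes': range(7, 16)}

for folds in (3, 5, 10):
    search = GridSearchCV(DecisionTreeClassifier(random_state=0),
                          param_grid, cv=folds, scoring='accuracy')
    search.fit(X_train, y_train)
    print(folds, search.best_params_, round(search.best_score_, 3))
```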
As we have seen, 27 college records are present in our dataset, and with college as the target variable, there are 27 different classes. It is also observed that although each class has a different number of records in the dataset, the values of the selected features are quite similar across classes; hence, the dataset was treated as balanced. This understanding is very important while selecting a model on the basis of performance accuracy.
Using K-fold cross-validation, we found 0.98 accuracy for the decision tree classifier on both training and testing. Under EL, out of 3 classifiers, we found the highest accuracy for the AdaBoost classifier (DT), which is 0.64 for both train and test, which is comparatively very low. The gradient boosting classifier model was found to be overfitted, with 1.0 training accuracy and 0.99 testing accuracy. Some datasets make it easy to obtain high accuracy because, although the difference between classes is very high, the similarity among samples within a class is also high. Hence, we have chosen the decision tree classifier to implement the model.
10 Implementation
For selecting a college for engineering admission, we have proposed the following FGEAAPS Web module. The aspirant (parent or student) has to submit some basic information like HSC marks, merit marks, home university, branch, etc. They then receive a probability-ordered list of 3 colleges, from higher to lower probability of getting admission.
11 Conclusion
In this research paper, to predict the college for an engineering admission, EDA, feature selection, label encoding, feature scaling, normalization, and standardization are rigorously applied to the dataset using various Python libraries to make it ready for ML algorithms. For this study, 8 input features are selected out of 20, namely 'Merit Marks', 'Candidate Type', 'Category', 'Home University', 'PH Type', 'Defense Type', 'HSC Eligibility' and 'BRANCH'. These features are very important according to the univariate selection, recursive feature importance, and Lasso feature selection methods. Extensive EDA is used, checking and plotting correlations between each input feature and the target feature. We have built the ML models for predicting the college by testing a suite of ML classification algorithms and EL methods. The suite contains Logistic Regression, K-Nearest Neighbors, Decision Tree Classifier, Random Forest Classifier, Naive Bayes, and Support Vector Machine classifiers. Under EL, we have tested Adaptive Boosting, Gradient Boosting, and GridSearchCV methods. Although EL methods are popular and often give the best performance on a predictive modeling project, we received the opposite result: after comparison, the decision tree gave higher accuracy for the college prediction project than the other approaches. Finally,
References
16. M.L. Waskom, Seaborn: statistical data visualization. J. Open Source Softw. 6(60), 3021 (2021).
https://doi.org/10.21105/joss.03021
17. S. Bhoite, (2021). https://www.kaggle.com/drsachinbhoite/engineering-technology-students-admission-data
18. P. Moulos, I. Kanaris, G. Bontempi, Stability of feature selection algorithms for classification in high-throughput genomics datasets, in 2013 IEEE 13th International Conference on Bioinformatics and Bioengineering (BIBE), pp. 1–4 (2013). https://doi.org/10.1109/BIBE.2013.6701677
19. T.O. Ayodele, Types of machine learning algorithms, in New advances in machine learning,
vol. 3, pp. 19–48 (2010)
20. X. Wang, G. Gong, N. Li, Automated recognition of epileptic EEG states using a combination
of symlet wavelet processing, gradient boosting machine, and grid search optimizer. Sensors
19(2), 219 (2019)
21. S. Ahmed, A. Zade, S. Gore, P. Gaikwad, M. Kolhal, Smart system for placement prediction using data mining. Int. J. Res. Appl. Sci. Eng. Technol. (IJRASET) 5 (2017). ISSN: 2321-9653. www.ijraset.com
22. S. Bhoite, (2021). https://sbresearchproject.herokuapp.com/
Investigating the Impact of COVID-19
on Important Economic Indicators
Abstract COVID-19 has impacted the world unlike any other event in recent memory. Entire humanity has been afflicted by this pandemic. As a consequence, governments around the world decided to impose lockdowns, restricting economic interactions and relationships on a scale and in a form never before witnessed in modern times. The general assumption here is that growing COVID-19 patient and mortality counts give rise to a greater sense of uncertainty, and this greatly impacts prices. It is thus imperative for the researcher community to observe and investigate the influence of COVID-19 patient and mortality counts on geopolitical and economic index indicators, as well as the influence of these COVID-19 indicators upon important economic indicators such as the gold price and stock market prices. For this specific purpose, this work investigates the influence of monthly COVID-19 patient and death counts on the economic indicators of gold and stock market prices.
1 Introduction
In the contemporary world, the COVID-19 pandemic has spread all over the globe and created a major health crisis, causing deaths and dislocations to millions of people. This has led to severe lockdowns all over the world, bringing economic activity to a near-complete halt. The current
D. Banerjee (B)
Sarva Siksha Mission, Kolkata, India
e-mail: debanjanbanerjee2009@gmail.com
A. Ghosal
St. Thomas’ College of Engineering and Technology, Kolkata, India
I. Mukherjee
Indian Institute of Information Technology, Kalyani, India
e-mail: imon@iiitkalyani.ac.in
work looks at investigating the impact of these severe economic disruptions on the Indian economy. The current work makes the assumption that rising monthly COVID-19 positive and fatality counts influence geopolitical and economic uncertainty, and that these very COVID-19-related indicators also influence important macroeconomic indicators such as the gold price, inflation, and stock market prices. The current work obtains monthly COVID-19 patient growth and monthly COVID-19 fatality counts from the Worldometer source, alongside geopolitical and economic uncertainty indicators as well as gold, inflation, and stock market prices from various Internet sources. The current work performs linear regression between these continuous variables to understand the influence of COVID-19-related indicators on uncertainty and economic indicators.
2 Related Work
The researcher community has been investigating whether and how much the political events of the day impact the macroeconomics of a country. Abdel-Latif et al. [1] investigated the aspect of financial liquidity in the primarily oil-producing West Asian economies. Ahir et al. [2] came up with their own world uncertainty index, based upon the word uncertainty with respect to geopolitical and economic considerations as described in the annually published Economist Intelligence Unit country reports. Alqahtani et al. [3] worked upon the importance of geopolitics with respect to the GCC countries. Antonakakis et al. [4] analyzed the relation between geopolitics and its influence upon oil prices and stock prices.
Apergis et al. [5] discussed how geopolitical uncertainty impacts stock market investment returns. Aysan et al. [6] discussed the impact of geopolitical uncertainty on Bitcoin and other cryptocurrencies. Baker et al. [7] investigated the possible impact of events such as the US presidential elections or American foreign interventions on US economic growth. Balcilar et al. [8] investigated the relation between uncertainty and stock market risks among the BRICS countries. Banerjee et al. [9] applied geopolitical risk to predicting the London Gold Price fix prices.
Barro and Ursua [10], similarly Barro [11], as well as Bloom [13], have observed the severe impact on stock markets of unpredicted macroeconomic events. Baur and Smales [12] analyzed the hedging relationship between geopolitical risk uncertainty and precious metals. Caldara et al. [14] first came out with the geopolitical risk factor, using the total number of references in news media.
Das et al. [15] utilized panel data techniques to understand how FDI impacts labor productivity in Indian IT firms. Jiaying and Stanislav [16] performed panel data regression on multivariate data for econometric considerations. Saiz and Simonsohn [17] observed that specific newspaper items like terrorism and war significantly impact Western economies, with the example of the USA.
3 Proposed Method
The present work investigates the influence of the COVID-19-related indicators, i.e., monthly change in total COVID-19 positive counts and monthly change in total COVID-19 death counts, on geopolitical and economic uncertainty as well as the gold price and stock market prices.
• The work assumes that growth in monthly COVID-19 positive and fatality counts influences geopolitical and economic uncertainty as well as economic indicators such as the gold price and stock market price.
• The percentage change in the total monthly COVID-19 positive counts and COVID-19 fatality counts is computed. These are continuous-type variables; thus, it is possible to perform linear regression with these cross-sectional data. The data are collected from the Worldometer online source.
• The geopolitical and economic uncertainty indicators are also obtained from online resources like the Google search trends Web site as well as policyuncertainty.com.
• The work also obtains important economic indicators such as the gold price and stock price.
• The work applies linear regression, first with the uncertainty indicators as the Y variable and the COVID-19 indicators as the X variables, and then with the economic indicators as the Y variable and the COVID-19 indicators as the X variables.
• The work operates on the null hypothesis that the COVID-19 indicators do not have any influence over either the uncertainty or the economic indicators.
• Based upon the regression statistics, i.e., the p-values and F-statistic values, the null hypothesis is either rejected or retained. Only if the p-value and the F-statistic value are both less than 0.01 is the null hypothesis rejected; otherwise, it is retained.
• The current work assumes that most uncertainty in the popular mind is created by rapid growth in COVID-19 patient and fatality counts.
• The COVID-19 monthly statistics have been derived from Worldometer online resources.
• The work obtains these data on a monthly basis from March 2020 to August 2021.
• This work utilizes the percentage change in the month-wise total COVID-19 patient and fatality counts.
• These computed variables are continuous-type data and are treated as normally distributed.
• The continuous nature of these data allows linear regression.
• The current work computes the COVID-19 patient change percentage and the COVID-19 fatality change percentage using the general formulae below, where s represents the total COVID-19 patient count in month (i+1), t the total COVID-19 patient count in month i, p the total COVID-19 fatality count in month (i+1), and r the total COVID-19 fatality count in month i:

mcpcp = (s × 100)/t (1)

In Eq. (1), mcpcp represents the monthly COVID-19 positive change percentage. The monthly COVID-19 death growth percentage can be expressed similarly:

monthly COVID-19 death growth percentage = (p × 100)/r (2)
The present work utilizes monthly geopolitical uncertainty and economic uncertainty indicators in the regression technique in order to determine causation between geopolitical and economic uncertainty factors and important economic indicators such as the gold price, stock market price, and cryptocurrency prices.
• Geopolitical risk index: This work obtains the count of total utterances of the phrase "geopolitical uncertainty" from January 2020 till May 2021 using the Google Trends online web resource. This count has been considered by the current work as the geopolitical risk index. We calculate the monthly growth percentage of the geopolitical risk index variable.
• Economic uncertainty index: This work utilizes the economic uncertainty index by Baker et al. [7]. We calculate the monthly growth percentage of the economic uncertainty index; since this is a continuous-type variable treated as normally distributed, it is suitable for the causation analysis with respect to the economic indicators of gold price, stock market, and cryptocurrency prices.
In this subsection, the formulae through which the geopolitical and economic risk indicator growth percentages have been derived are discussed.
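A minimal sketch of this month-over-month growth-percentage computation with pandas; the file and column names below are hypothetical:

```python
# Growth-percentage sketch: value in month (i+1) * 100 / value in month i,
# per the formulae above, applied to each monthly indicator series.
import pandas as pd

df = pd.read_csv('monthly_indicators.csv')  # assumed file and columns
for col in ['covid_positive_total', 'covid_death_total',
            'geopolitical_risk', 'economic_uncertainty']:
    df[col + '_growth_pct'] = df[col] * 100 / df[col].shift(1)

df = df.dropna()  # the first month has no predecessor
```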
This work performs the regressions with the help of the R function lm. lm is used since all four main independent and influencing variables are of continuous type, and therefore the linear regression technique is appropriate in this case. The target variables used in these regressions are the BSE gold price, the BSE Sensex, and the BSE cryptocurrency price. Table 1 depicts all the results obtained by experimenting with the above-mentioned techniques.
The work performs the regression based upon the following criteria.
• The work begins the regression procedure with a null hypothesis. The same null hypothesis is applied to every feature: the feature under consideration does not influence the target variables, which in this case are the BSE gold price, BSE Sensex, and BSE cryptocurrency price.
• The null hypothesis can be rejected under two conditions, i.e., the probability value and the F-statistic probability value both being less than 0.01, in which case the alternate hypothesis is accepted.
• The target variables in these regression procedures are the BSE gold price, BSE Sensex, and BSE cryptocurrency price, respectively.
• The work first considers the Y variable to be the Bombay Stock Exchange gold price (monthly average value), and the X variables to be the COVID-19 positive case monthly growth percentage, COVID-19 death count monthly growth percentage, monthly geopolitical risk indicator value, and monthly economic policy indicator value, respectively.
• The model formula in this case is (Y ~ X1 + error), where X1 is the COVID-19 positive case monthly growth percentage; a sketch follows this list.
• The model is fitted with lm, a function from the R language.
• Once the above regression statistics are gathered, particularly the probability p-value and the F-statistic value, the work applies the same formulae with the other target variables, namely the Bombay Stock Exchange (BSE) Sensex price and the Bombay Stock Exchange cryptocurrency price.
• The model formulae in the remaining cases are (Y ~ X2 + error), (Y ~ X3 + error), and (Y ~ X4 + error), where Y represents the Bombay Stock Exchange gold price and X2, X3, and X4 are the COVID-19 death count monthly growth percentage, monthly geopolitical risk indicator value, and monthly economic policy indicator value, respectively. Similarly, for these variables, we also obtain the probability p-value and the F-statistic values.
• Once the regression statistics are obtained in the above-mentioned manner, the work introduces the BSE Sensex and the BSE cryptocurrency price as Y variables, respectively: the Y variable is first the BSE Sensex and then the BSE cryptocurrency price.
• The p-values and F-statistic values are calculated in every case.
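The paper fits these models with R's lm; the following is an equivalent minimal sketch in Python with statsmodels, using hypothetical column names, not the authors' code:

```python
# Simple linear regression sketch: one X variable at a time against a
# target price series, reading off the p-value and F-statistic probability.
import statsmodels.formula.api as smf

# Y: BSE gold price; X1: monthly COVID-19 positive case growth percentage.
model = smf.ols('bse_gold_price ~ covid_positive_growth_pct', data=df).fit()
print(model.pvalues)   # p-values for the intercept and the X coefficient
print(model.f_pvalue)  # F-statistic probability
# Repeat with the other X variables (death growth, geopolitical risk,
# economic policy index) and the other Y variables (BSE Sensex,
# BSE cryptocurrency price).
```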
• All the above-explained data points were observed during the experiments with the aforementioned variables.
• The software tool R has been utilized for this purpose.
• All data utilized during this regression process are open-source data.
• Given the nature of the regression, this is clearly a case of cross-sectional data being used for regression purposes.
• Cross-sectional regression has been performed since all the variables involved, i.e., the dependent as well as independent variables, are continuous, and moreover, no time-related variables are involved in these experiments.
• All the geopolitical risk indicators have been computed based upon the complete monthly search counts of the terms "geopolitical" and "uncertainty" as shown in Google search trends for India during the months from January 2020 till June 2021.
• All the cross-sectional regressions have been rechecked by performing the same equations using Excel's regression formulae. Only after similar results were found between the Excel formulae and the R language tools did we come to a conclusion and publish it.
• It can be derived from Table 1 that the COVID-19-related indicators are weakly correlated with the geopolitical and economic uncertainty indexes, as in both cases the correlation value is less than 0.3.
• The conclusion that can be made from Table 1 is that the COVID-19-related indicators are also weakly correlated with gold prices, as in both cases the correlation value is less than 0.3.
6 Discussion
The current work concludes from the observations depicted in Table 2 that the following features do not influence the target variables, since:
• COVID-19 patient growth percentage has probability 1.0125 and F-statistic probability 3.2071 when the target variable is the BSE gold price. Both of these values are more than 0.01.
• COVID-19 death count growth percentage has probability 1.1192 and F-statistic probability 7.1296 when the target variable is the BSE gold price. Both of these values are more than 0.01.
• COVID-19 patient growth percentage has probability 7.0125 and F-statistic probability 2.1271 when the target variable is the BSE Sensex price. Both of these values are more than 0.01.
• COVID-19 death count growth percentage has probability 9.1742 and F-statistic probability 6.0106 when the target variable is the BSE Sensex price. Both of these values are more than 0.01.
• COVID-19 patient growth percentage has probability 8.0692 and F-statistic probability 7.4071 when the target variable is the BSE cryptocurrency price. Both of these values are more than 0.01.
• COVID-19 death count growth percentage has probability 4.1802 and F-statistic probability 7.6496 when the target variable is the BSE cryptocurrency price. Both of these values are more than 0.01.
• COVID-19 patient growth percentage has probability 6.0178 and F-statistic probability 3.2671 when the target variable is inflation in India. Both of these values are more than 0.01.
• COVID-19 death count growth percentage has probability 5.9198 and F-statistic probability 9.1071 when the target variable is inflation in India. Both of these values are more than 0.01.
The current work also concludes from the observations depicted in Table 3 that the following features do not influence the target variables, since:
• COVID-19 patient growth percentage has probability 5.1984 and F-statistic probability 6.7812 when the target variable is the geopolitical uncertainty index. Both of these values are more than 0.01.
• COVID-19 death count growth percentage has probability 8.1192 and F-statistic probability 7.6957 when the target variable is the geopolitical uncertainty index. Both of these values are more than 0.01.
• COVID-19 patient growth percentage has probability 2.0125 and F-statistic probability 2.6273 when the target variable is the economic uncertainty index. Both of these values are more than 0.01.
• COVID-19 death count growth percentage has probability 4.1179 and F-statistic probability 4.0109 when the target variable is the economic uncertainty index. Both of these values are more than 0.01.
Table 3 Statistical relationship between COVID-19 factors and economic geopolitical uncertainty
7 Conclusion
A very nuanced and careful observation of the impact of COVID-19 features such as the COVID-19 total positive count and the COVID-19 total death count indicates to us that they are not especially influential on important target variables such as geopolitical and economic uncertainty and important economic indicators such as the BSE Sensex price, gold and cryptocurrency prices, and inflation.
It can be deduced that initially people were apprehensive and anxious about the uncertain nature of the pandemic; however, people gradually adapted to this anxiety as a new normal. Therefore, although initially the COVID-19 indicators did influence the economic and uncertainty indicators, in the sense that as the COVID-19 indicators increased the other indicators also increased, as people later adapted to the uncertainty of the situation and governments came up with economic incentives, the COVID-19 indicators began to lose influence over the uncertainty and economic indicators.
However, since COVID-19 is still producing many variants according to the World Health Organization, it is important that further information is collected from around the world to ensure that a timely and accurate picture can be inferred from the rapidly expanding base of COVID-19-related information, especially the total patient counts and total death counts and their influence over important economic indicators such as the BSE Sensex price, gold and cryptocurrency prices, and inflation.
8 Future Work
Future work would include investigating more COVID-19 indicators, such as monthly R-factor growth and hospitalization rates, beyond the ones discussed in the present work, for their influence on the geopolitical and economic uncertainty indicators of this country. Considering the huge population of India, region-wise data collection and regression processes also need to be considered. More diverse sources of data would further strengthen the investigation.
References
1. H. Abdel-Latif, M. El-Gamal, Financial liquidity, geopolitics, and oil prices. Energy Econ. 87,
104482 (2020)
2. H. Ahir, N. Bloom, D. Furceri, The world uncertainty index, pp. 1–37 (2018)
3. A. Alqahtani, E. Bouri, X.V. Vo, Predictability of GCC stock returns: the role of geopolitical risk and crude oil returns. Econ. Anal. Policy 171–181 (2020)
4. N. Antonakakis, R. Gupta, C. Kollias, S. Papadamou, Geopolitical risks and the oil-stock nexus over 1899–2016. Fin. Res. Lett. 23, 165–173 (2017)
5. N. Apergis, M. Bonato, R. Gupta, C. Kyei, Does geopolitical risks predict stock returns and
volatility of leading defense companies? Evidence from a nonparametric approach. Defence
Peace Econ. 29(6), 684–696 (2018)
6. A.F. Aysan, E. Demir, G. Gozgor, C.K.M. Lau, Effects of the geopolitical risks on bitcoin
returns and volatility. Res. Int. Bus. Finance 47, 511–518 (2019)
7. S.R. Baker, N. Bloom, S.J. Davis, Measuring economic policy uncertainty. Q. J. Econ. 131(4), 1593 (2016)
8. M. Balcilar, M. Bonato, R. Demirer, R. Gupta, Geopolitical risks and stock market dynamics of the BRICS. Econ. Syst. 42(2), 295–306 (2018)
9. D. Banerjee, A. Ghosal, I. Mukherjee, Prediction of gold price movement using geopolitical
risk as a factor, in Emerging Technologies in Data Mining and Information Security (Springer,
Singapore, 2019), pp. 879–886
10. R.J. Barro, J.F. Ursua, Rare macroeconomic disasters. Ann. Rev. Econ. 4(1), 83–109 (2012)
11. R.J. Barro, Rare disasters and asset markets in the twentieth century. Q. J. Econ. 121(3), 823–
866 (2006)
12. D.G. Baur, L.A. Smales, Hedging geopolitical risk with precious metals. J. Bank. Fin. 117,
105823 (2020)
13. N. Bloom, The impact of uncertainty shocks. Econometrica 77(3), 623–685 (2009)
14. D. Caldara, M. Iacoviello, Measuring geopolitical risk. Board of Governors of the Federal Reserve System, Working Paper (2016)
15. G. Das, B. Ray Chaudhuri, Impact of FDI on labour productivity of Indian IT firms: horizontal spillover effects. J. Soc. Manag. Sci. XLVII(2) (2018)
16. G. Jiaying, V. Stanislav, Panel data quantile regression with grouped fixed effects. J. Econ.
213(1), 68–91 (2019)
17. A. Saiz, U. Simonsohn, Proxying for unobservable variables with internet document frequency. J. Eur. Econ. Assoc. 11(1), 137–165 (2013)
Classification of Tumorous
and Non-tumorous Brain MRI Images
Based on a Deep-Convolution Neural
Network Model
Abstract A brain tumor is one of the most critical diseases, arising from unwanted cell division. The survival rate for brain tumor patients is very high if the tumor is identified at the grade I stage and early treatment is started. There are different well-appreciated techniques, introduced by different researchers, available to classify tumorous and non-tumorous MRI images. Still, these techniques are not sufficient to meet the current requirement. To satisfy the present requirement, we propose a novel reinforcement learning-based deep-convolution neural network (CNN) model to identify tumors in brain MRI images. In this technique, the region of interest is extracted, and labeling of each image is performed. This step is followed by a preprocessing (noise removal) technique. Then, we consider a standard dataset, and it is segregated into three classes: 70% for training, 15% for validation, and 15% for testing. Based on the layered architecture of the deep-convolution neural network model, the hyper-parameters of our proposed method are calculated on the training dataset. This step is followed by the validation and testing steps. The combination of reinforcement learning and deep learning while designing the CNN's layered architecture makes our model unique. After considering all the cases, we obtained commendable results compared to other methods. The novelty and high-performance capability of this method can be appreciated. This method may be implemented as a mobile or Web application.
1 Introduction
The normal brain contains multiple brain tissues and segments [20] (shown in Fig. 1a). Abnormal growth of brain tissue is known as a brain tumor [3]. Brain tumors (shown in Fig. 1b) are characterized by their type (primary, secondary, etc.) and grade (in terms of aggressiveness), which together indicate their severity. Based on an understanding of the tumor's location, the generation point of the tumor cell, and its type, proper treatment of a brain tumor can be started. Damage to different areas of brain tissue is an adverse effect of brain tumors. Non-identification at the initial stage and carelessness in treatment may increase the mortality rate. Patients often experience various symptoms such as headaches, loss of consciousness, vomiting, etc. [3]. 4% of the world's cancer population suffers from brain and central nervous system (CNS) cancer. As per World Cancer Research Journal statistics, 12.7 males and 10.7 females per 100,000 people in the world [17] are affected by this type of cancer. It is a predominant issue for society, and an automated computer-based model is required that will increase the precision of classification and detection of brain tumors for doctors. As a result, it will help doctors start early and accurate treatment so that the survival rate of the patient can be increased. We found that many of the previous methods are incomplete: some operate only up to the brain segmentation level, some do feature selection only, and some perform only classification. Hence, to provide a completely automated method for doctors, we use the concepts of reinforcement learning [1] and deep learning-based CNN to build a model for image classification [2]. Regarding the organization of the paper, it is segregated into several sections. Section 2 describes the main methodology, merits, and demerits of some of the previous research papers; Sect. 3 gives a detailed explanation of our proposed model to overcome the various difficulties mentioned in Sect. 2, whereas the experimental results and conclusion are presented in Sects. 4 and 5, respectively.
Fig. 1 a Brain structure, b tumorous brain MRI, c non-tumorous brain MRI
2 Literature Survey
In this section, we present the different existing methods and their advantages and disadvantages with the help of a chronological table (Table 1) displaying different types of brain tumor classification and detection strategies, so that we can compare them, identify the drawbacks, and propose our method in such a way that we overcome most of them.
In older methodologies, the tumor region was detected using manually engineered features. Detection of glioma or glioblastoma tumors is rather difficult due to their rare discerning features, and such generic features rarely work well in practice. To tackle this problem, we propose an end-to-end reinforcement learning-based deep learning pipeline that learns the features over iterations for better performance on the given task.
3 Proposed Methodology
For brain MRI classification and detection, we propose a newly developed reinforcement learning-based deep-CNN strategy. Our proposed technique is divided into multiple steps as follows.
To extract the images from the selected dataset, we perform the following steps: (a) set the augmented path, (b) load the data. We set the augmented path to the directory from which we receive the augmented data. The "yes" folder in the directory contains the tumorous brain MRI images, and the "no" folder contains the non-tumorous brain MRI images (shown in Figs. 1c and 5); both doctor-identified and non-identified samples are included.
To load the data, we use the loading_data_up algorithm based on Eqs. (1) and (2). This algorithm takes two arguments, the first being a list of directory paths for the folders "yes" and "no" that contain the image data, and the second being the image size; for every image in both directories, it does the following:
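A minimal sketch consistent with this description; the exact implementation is an assumption, with folder names and the rescaling chosen to match the shapes reported in Sect. 4:

```python
# loading_data_up sketch: read every image under the given directories,
# resize to the target size, scale to [0, 1], and attach a binary label.
import os
import cv2
import numpy as np

def loading_data_up(dir_list, image_size):
    X, y = [], []
    width, height = image_size
    for directory in dir_list:
        label = 1 if directory.rstrip('/').endswith('yes') else 0  # tumorous
        for filename in os.listdir(directory):
            image = cv2.imread(os.path.join(directory, filename))
            if image is None:          # skip non-image files
                continue
            image = cv2.resize(image, (width, height)) / 255.0
            X.append(image)
            y.append(label)
    return np.array(X), np.array(y).reshape(-1, 1)

# Example: X, y = loading_data_up(['data/yes', 'data/no'], (240, 240))
```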
In this section, we segregate the tumorous and non-tumorous data. We place a cropped MRI image in the "yes" set if it has a tumor; otherwise, we keep it in the "no" set. Then, we split the X and y objects into training, validation (development), and testing sets. In the current method, we split the data in the following way: 70% of the data for training, 15% for validation, and 15% for testing, as sketched below.
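A minimal sketch of this 70/15/15 split with scikit-learn, assuming the X and y arrays produced by the loading step:

```python
# Split sketch: 70% train, then split the remaining 30% in half to get
# 15% validation and 15% test, matching the proportions stated above.
from sklearn.model_selection import train_test_split

X_train, X_rest, y_train, y_rest = train_test_split(
    X, y, test_size=0.30, random_state=0)
X_val, X_test, y_val, y_test = train_test_split(
    X_rest, y_rest, test_size=0.50, random_state=0)
```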
(g) One output unit in a fully connected layer: it holds one neuron with a sigmoid activation function for binary classification.
To perform the network training, we can use the train_data_model algorithm based on Eq. (7), which works as follows:
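A minimal sketch of such a training step with Keras; the optimizer, loss, batch size, and epoch count are assumptions (the paper reports its best model near the 23rd iteration), and model, X_train, y_train, X_val, y_val come from the steps above:

```python
# Training sketch: compile the CNN for binary classification and keep the
# weights that achieve the best validation accuracy.
from tensorflow.keras.callbacks import ModelCheckpoint

model.compile(optimizer='adam', loss='binary_crossentropy',
              metrics=['accuracy'])
checkpoint = ModelCheckpoint('best_model.h5', monitor='val_accuracy',
                             save_best_only=True)
history = model.fit(X_train, y_train, batch_size=32, epochs=24,
                    validation_data=(X_val, y_val),
                    callbacks=[checkpoint])
```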
The performance of the algorithm is evaluated with the accuracy score and F1 score metrics on the validation and test datasets. Our proposed method can also be represented using a block diagram (shown in Fig. 3) to understand the algorithms at a glance.
4 Experimental Results
To evaluate the performance of the proposed method, we consider the Kaggle brain tumor dataset [16], which consists of 1085 tumorous and 980 non-tumorous images. The size of the brain tumor images in the dataset may vary, and they can be of different types (color, gray, etc.), formats (.jpg, .jpeg, etc.), and resolutions. The proposed method is implemented in the Python environment, version 3.7, with a hardware configuration of an Intel Core i5 processor, 8 GB of RAM, and a 4 GB graphics card. We have used Anaconda as the Python distribution; through the Anaconda Navigator, we run the base environment, which provides access to the Jupyter Notebook open-source Web application used to run our code. After F1 and accuracy score calculation, the method produces good results relative to earlier methods.
During execution of the augmented_path algorithm (Sect. 3.1), we obtain 2065 samples, and X has the shape (2065, 240, 240, 3), where 2065 is the number of samples, 240 is the image width, 240 is the image height, and 3 denotes the number of channels. After running the loading_data_up (Sect. 3.1) and cropping_brain_contour_generation algorithms (Sect. 3.2), we obtain the cropping results of the MRI images (shown in Fig. 4).
After cropping, each image is segregated into two categories, "yes" (tumorous) and "no" (non-tumorous), and we obtain the following results (Fig. 5 shows a portion of the non-tumorous images, Fig. 6 a portion of the tumorous images):
We split the X and Y objects into training, validation (development), and testing sets as follows: training → 70%, validation → 15%, and testing → 15%. After execution, we obtain no_of_training_samples = 1445, no_of_development_samples = 310, and no_of_test_samples = 310; the training shape values are (1445, 240, 240, 3) and (1445, 1), respectively; the development shape values are (310, 240, 240, 3) and (310, 1); the test shape values are (310, 240, 240, 3) and (310, 1). After developing the proposed reinforcement-based deep-CNN multi-layered model, it is trained through the train_data_model algorithm (discussed in Sect. 3.4); it trains on n_no_of_samples and validates on m_no_of_samples, which may be observed from Table 2. The number of iterations/epochs is varied, and the total time consumed by the algorithm is measured each time. The layered architecture and hyperparameters of this model can be observed in Fig. 7.
The following two graphs are generated from the calculation of loss (shown in Fig. 8) and accuracy (shown in Fig. 9) for training and validation:
We experiment with our model against the best models so that the best validation accuracy can be achieved. After the 23rd iteration, we achieve 91% accuracy. On the testing data, the best model gives test loss = 0.33 and test accuracy = 0.89, with an F1 score of 0.88; on the validation data, the model gives a 0.91 F1 score. The percentages of positive examples (tumorous samples, indicated by the pos variable or positive in Table 3) and negative examples (non-tumorous samples, indicated by the neg variable or negative in Table 3) are given below:
In the end, we find that our model can be used for brain tumor detection and classification from MRI images with a test_set_accuracy of 88.7% and a test_set_F1_score of 0.88. As the data are balanced, we consider the results satisfactory. If we compare our algorithm with the rest of the classification-based algorithms (given in Table 4), we find a considerably good result.
5 Conclusion
This paper illustrates the classification of tumorous and non-tumorous brain MRI images based on a reinforcement learning-based deep-CNN model. The combination of reinforcement learning and deep learning while designing the CNN's layered architecture makes our model unique. The proposed method is tested on the Kaggle brain MRI dataset [16]. The experimental results show the effectiveness of the proposed method in terms of accuracy and F1 score. Some concluding observations regarding this paper: the proposed algorithm gives a better result in terms of accuracy and F1 score than other existing techniques. An accuracy score of 91% on the validation data and 89% on the test data, and an F1 score of 0.91 on the validation data and 0.88 on the test data, is obtained for our proposed method, as shown in Table 4. It applies to large datasets, and we find that it is very simple and efficient. In the future, the accuracy and F1 score may be improved, and several other metrics can be included in our proposed technique to measure the performance, such as precision score, recall score, and ROC. In the future, this model can also be utilized to categorize other kinds of cancer or diseases.
References
1. R.S. Sutton, A.G. Barto, Reinforcement Learning: An Introduction (MIT press, 2018), pp. 56–
58
2. M. Jain, P.S. Tomar, Review of image classification methods and techniques. Int. J. Eng. Res.
Technol. (IJERT) 2(8) (2013). ISSN: 2278-0181
3. D.G. Macenka, L. Hays, A. Varner, E. Weiss, P.Y. Wen, Frankly speaking about cancer: Brain tumors. Cancer Support Community and the National Brain Tumor Society web resource. http://blog.braintumor.org/files/public-docs/frankly-speaking-about-cancer-brain-tumors.pdf. Accessed 30 January 2021
4. D.M. Joshi, N.K. Rana, V.M. Misra, Classification of Brain cancer using artificial neural
network, in 2nd International Conference on Electronic Computer Technology (IEEE, 2010),
pp. 112–116. https://doi.org/10.1109/ICECTECH.2010.5479975
5. M.U. Akram, A. Usman, Computer-aided system for brain tumor detection and segmentation,
in International Conference on Computer Networks and Information Technology (IEEE, 2011),
pp. 299–302. https://doi.org/10.1109/ICCNIT.2011.6020885
6. I. Maiti, M. Chakraborty, A new method for brain tumor segmentation based on watershed and edge detection algorithms in HSV color model, in 2012 National Conference on Computing and Communication Systems (IEEE, 2012), pp. 1–5. https://doi.org/10.1109/NCCCS.2012.6413020
7. D. Sridhar, I.M. Krishna, Brain tumor classification using discrete cosine transform and
probabilistic neural network, in 2013 International Conference on Signal Processing, Image
Processing & Pattern Recognition (IEEE, 2013), pp. 92–96. https://doi.org/10.1109/ICSIPR.
2013.6497966
8. H.S.M. Abdulbaqi, Z. Mat, A.F. Omar, I.S.B. Mustafa, L.K. Abood, Detecting brain tumor in
magnetic resonance images using hidden Markov random fields and threshold techniques, in
2014 IEEE Student Conference on Research and Development (IEEE, 2014), pp. 1–5. https://
doi.org/10.1109/SCORED.2014.7072963
9. Parveen, A. Singh, Detection of a brain tumor in MRI images, using a combination of fuzzy
c-means and SVM, in 2015 2nd International Conference on Signal Processing and Integrated
Networks (SPIN) (IEEE, 2015), pp. 98–102. https://doi.org/10.1109/SPIN.2015.7095308
10. R. Ahmmed, M.F. Hossain, Tumor detection in brain MRI image using template-based K-
means and fuzzy C-means clustering algorithm, in 2016 International Conference on Computer
Communication and Informatics (ICCCI) (IEEE, 2016), pp. 1–6. https://doi.org/10.1109/
ICCCI.2016.7479972
11. S. Banerjee, S. Mitra, B.U. Shankar, Synergetic neuro-fuzzy feature selection and classification
of brain tumors, in 2017 IEEE International Conference on Fuzzy Systems (FUZZ-IEEE) (IEEE,
2017), pp. 1–6. https://doi.org/10.1109/FUZZ-IEEE.2017.8015514
12. F.P. Polly, S.K. Shil, M.A. Hossain, A. Ayman, Y.M. Jang, Detection and classification of
HGG and LGG brain tumor using machine learning, in 2018 International Conference on
Information Networking (ICOIN) (IEEE, 2018), pp. 813–817. https://doi.org/10.1109/ICOIN.
2018.8343231
13. M. Gurbină, M. Lascu, D. Lascu, Tumor detection and classification of MRI brain image
using different wavelet transforms and support vector machines, in 2019 42nd International
Conference on Telecommunications and Signal Processing (TSP) (IEEE, 2019), pp. 505–508.
https://doi.org/10.1109/TSP.2019.8769040
14. D. Divyamary, S. Gopika, S. Pradeeba, M. Bhuvaneswari, Brain tumor detection from MRI
images using Naive classifier, in 2020 6th International Conference on Advanced Computing
and Communication Systems (ICACCS) (IEEE, 2020), pp. 620–622. https://doi.org/10.1109/
ICACCS48705.2020.9074213
15. A. Biswas, M.S. Islam, Brain tumor types classification using K-means clustering and ANN
approach, in 2021 2nd International Conference on Robotics, Electrical and Signal Processing
Techniques (ICREST) (IEEE, 2021), pp. 654–658. https://doi.org/10.1109/ICREST51555.
2021.9331115
16. N. Chakrabarty, Brain MRI images for brain tumor detection. Kaggle web resource. https://
www.kaggle.com/navoneel/brain-mri-images-for-brain-tumor-detection. Accessed 1 April
2021
17. K.H. Kalan Farmanfarma, M. Mohammadian, Z. Shahabinia, S. Hassanipour, H. Salehiniya,
Brain cancer in the world: an epidemiological review. World Cancer Res. J. (WCRJ) 7(1356)
(2019). https://doi.org/10.32113/wcrj_20197_1356
18. Aiwale, Ansari, Brain tumor detection using KNN. https://doi.org/10.13140/RG.2.2.35232.12800
19. G. Rajesh Chandra, K.R.H. Rao, Tumor detection in brain using genetic algorithm. Procedia Comput. Sci. 79, 449–457 (2016). ISSN 1877-0509. https://doi.org/10.1016/j.procs.2016.03.058
20. The Human Brain Atlas, Protein Atlas web resource, https://www.proteinatlas.org/humanproteome/brain. Accessed 5 Aug 2021
Social Distance Monitoring and Face
Mask Detection Using Deep Learning
1 Introduction
Monitoring social distancing regulations and manually checking people's masks are likely to strain resources and allow errors to creep in due to human intervention. There is an urgent need to curb virus transmission by enforcing ideal social distancing rules. This system includes the detection of people violating social distance regulations and the classification of masks. Citizens' safety is ensured with strict rules by checking whether sufficient distance and mask usage are observed. The availability of several cameras in public places has improved
the applicability of the system enormously, and the processing techniques involved can be used to keep an eye on many utilities. This piece of work discusses a lightweight system that is likely to help in the prevention of COVID-19: it uses video surveillance to identify and confirm that social distancing is maintained by everyone, so that the spread can be reduced, which further helps slow down the virus transmission.
2 Literature Survey
A virtual model of social distancing was evaluated to remind people in public places, using a distance ruler to measure the gap. The method involves two-dimensional person recognition, person recognition from different angles, social distance monitoring, and mask recognition [1]. Another proposed method balances ResNet-50 against other standard machine learning classifiers (support vector machines (SVM), decision tree, and ensemble) to enhance the performance of the model [2], achieving good accuracy (68%) for the classification of fake masks. As suggested in [3], the integration of inverse perspective mapping (IPM) technology with a proposed deep neural network (DNN) is more effective. The SORT tracking algorithm provides accurate and general human tracking by detection, which can be used in other applications.
Context-attention RCNN is a structure capable of detecting whether face covers are being worn; it reduces intra-class differences and extracts discriminating features for this purpose [4]. Another method is a model called SocialdistancingNet-19, which can detect a person's frame and display a label; if the distance is below a certain value, it can help determine whether the situation is safe or dangerous, and it uses the center of gravity to calculate the distance between people [5].
One of the accurate face mask detectors is RetinaFace Mask [6]. It performs detection in a single stage, and it uses a feature pyramid network to fuse multiple feature maps with high-level semantic information; a context-attention module is used for detecting face masks [7]. Other research proposed an intelligent system based on thermal imaging to classify people's social distance and implemented an algorithm to measure and classify the distance among people and automatically check and monitor whether social distance rules are followed or violated [8, 9].
A human identification system was developed to monitor social distancing and security breaches during the pandemic [10]. A pre-trained recurrent CNN model is used to identify different models [11–13]. The human recognition process is done by segmenting the spots, and these locations are tracked. Using the single-shot object detection (SSD) deep learning method, MobileNetV2, and OpenCV, social distance and masks are determined [14, 15]. The YOLOv3 model, which identifies and detects objects, and the OpenCV image processing library are used to start the project [16]. The project will play a major role in areas where many people are expected, such as shopping malls, movie theaters, or airports. Through this project, we can ensure that people follow social distancing.
A facial recognition machine using principal component analysis (PCA) and convolutional neural networks, the faster-RCNN, is used for human image detection [17]. From a higher perspective, the forms of people are very different. Using transfer learning, a new training level is integrated with a previously trained architecture [18], achieving 96% accuracy and a 0.6% false alarm rate on the overall images. Taking into consideration several factors, such as quick data collection, evaluation of policy accuracy, and adjustment of policies that respond well, which indicate an understanding of the economic consequences and public health, a cost–benefit analysis was performed in [19].
3 Method
The methods followed in this paper use deep learning. We use OpenCV, Keras, and TensorFlow to detect masks, with MobileNetV2 as the base of the classifier. The CNN architecture, with weights trained beforehand, is defined through the YOLO object detector files; OpenCV's DNN module is compatible with this YOLO model. Object recognition identifies every person in the stream, and the Euclidean distance between all recognized people is calculated. These distances (Fig. 1) are used to check whether the distance between two people is less than N pixels, and safe or unsafe is displayed accordingly (see the sketch at the end of this section). If a person wears a mask and follows the social distance protocol, they are in a safe zone; if a person does not wear a mask, they are shown in a red rectangular box along with an alert message.
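A minimal sketch of the distance check, assuming a list of detected-person box centers named centroids and an assumed pixel threshold N (the text leaves N unspecified):

```python
# Pairwise Euclidean distance sketch: flag every pair of people whose
# box centers are closer than N pixels, per the check described above.
from scipy.spatial import distance

N = 50  # assumed pixel threshold for illustration
# centroids: list of (x, y) box centers returned by the person detector
dist_matrix = distance.cdist(centroids, centroids, metric='euclidean')
violations = set()
for i in range(len(centroids)):
    for j in range(i + 1, len(centroids)):
        if dist_matrix[i][j] < N:
            violations.update([i, j])  # both people are marked unsafe
```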
3.1 Dataset
We train our dataset for the face mask detection model in this module. The dataset contains two sub-folders, with_mask (WM) and without_mask (WOM), each containing 2k image samples. The face images, with or without masks, have an average width of 278.77 and an average height of 283.68 pixels.
Fig. 1 Workflow of the social distance between the people in the frame
3.2 Training
The model is trained using TensorFlow/Keras, and the face mask detector (MD) is serialized to disk. Once the MD is trained on the images from the dataset, we load the MD, start detecting faces, and classify each face as WM (if a mask is detected) or WOM (if no facial mask is detected). We will cover each stage and its sub-divisions in detail in the rest of this guide, but for now, let us look at the dataset we used to train the COVID-19 facial mask detector. This process comprises three steps, as detailed below (a sketch follows the list).
(1) Pre-trained ImageNet weights are loaded into MobileNet, leaving the head of the network off.
(2) A new FC head is constructed and placed on top of the base in place of the old head.
(3) The base layers of the network are frozen; Table 1 shows the parameters of this model.
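A minimal sketch of steps (1)–(3) with Keras; the head dimensions and input size are assumptions, since the text confirms only the MobileNetV2 base, ImageNet weights, a new FC head, and frozen base layers:

```python
# Transfer-learning sketch: MobileNetV2 base without its top, a new FC
# head for the WM/WOM classes, and frozen base layers per step (3).
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras.layers import (AveragePooling2D, Flatten, Dense,
                                     Dropout, Input)
from tensorflow.keras.models import Model

base = MobileNetV2(weights='imagenet', include_top=False,
                   input_tensor=Input(shape=(224, 224, 3)))

head = AveragePooling2D(pool_size=(7, 7))(base.output)
head = Flatten()(head)
head = Dense(128, activation='relu')(head)
head = Dropout(0.5)(head)
head = Dense(2, activation='softmax')(head)  # WM vs. WOM

model = Model(inputs=base.input, outputs=head)
for layer in base.layers:
    layer.trainable = False  # step (3): base weights stay fixed
```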
To train a customized mask detection mechanism, we divided our design into two independent stages, each further divided into sub-stages (Fig. 2).
The base layer weights are not updated during the back-propagation of errors; instead, only the new head layers are adjusted. Finally, the model is compiled with the Adam optimizer, binary cross-entropy loss, and a learning rate reducer. If more than two classes are involved, categorical cross-entropy must be used.
After completing the training, the resulting model is evaluated on the test set. The next step is to make predictions on the test set to get the most likely category label index. Then, the classification report is printed and checked on the terminal, and the mask classifier is serialized. After completion, the last step is to draw the accuracy and loss curves of this model. Once the plot is ready, it is automatically saved at the plot file path on disk.
Fig. 3 Plot of the accuracy and loss curves of the face detection model
4 Procedure
Keeping in mind the property of YOLO that it returns the center coordinates first and then the height and width, the bounding box coordinates are scaled back; from the center of the bounding box, its left and top corners can be generated, as sketched below.
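A minimal sketch of this rescaling for one YOLO detection row; the normalized center-format layout is the standard OpenCV DNN output, and the variable names are hypothetical:

```python
# Convert YOLO's normalized (center_x, center_y, width, height) output
# into pixel-space left/top corner coordinates for drawing boxes.
import numpy as np

def to_corner(detection, frame_w, frame_h):
    cx, cy, w, h = detection[0:4] * np.array([frame_w, frame_h,
                                              frame_w, frame_h])
    left = int(cx - w / 2)  # left edge from the box center
    top = int(cy - h / 2)   # top edge from the box center
    return left, top, int(w), int(h)
```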
To calculate the Euclidean distance, the TensorFlow, Keras, NumPy, and SciPy libraries are imported; SciPy is used to compute the Euclidean distance. There is a file named coco.names (pre-trained model) (Fig. 4), which contains a list of the 80 object classes that the model can recognize; the model is trained only for these 80 object classes.
The network object is ready for execution, throws an exception in failure cases, and filters detections against the threshold value 0.4. The webcam sets the
5 Experiment
Now, the model is tested with input images of people in groups (Fig. 6). Figure 8 represents people wearing masks, while Fig. 10 represents people not wearing masks. In Figs. 7 and 9, many violations are identified, and an alert message is displayed in red indicating that people are not following the measures, whereas in Fig. 11, an alert message is displayed asking people to wear a mask.
6 Results
When people wear masks, they are in a safe area; this is displayed as a green rectangle with a safety message. If a person is not wearing a mask (Fig. 12), they are displayed as a red rectangle with a warning. Similarly, if they maintain social distance, they are shown in a green rectangular box with a safe alert message (Fig. 13). However, if they do not follow social distancing and do not wear a mask (Fig. 14), the system shows an associated alert message with a red rectangular box, as shown in Fig. 15.
Here, we considered different cases:
7 Conclusion
We obtained high accuracy using reasonable ML tools and basic techniques. Several different applications can use this method. A red frame and a red line are used to indicate the distance between pedestrians walking on the street; video footage confirms this approach. The visual results show that the proposed method can identify social distancing violations, and it can be extended to other environments, such as offices and restaurants. Public health systems can be greatly influenced by deploying this model.
Fig. 14 Evaluating social distance and face mask, case 3: with more people in the frame
Fig. 15 Displaying alerts considering both social distance and face mask, case 4: with more people
in the frame
8 Future Scope
References
1. M. Sohan, So you need datasets for your COVID-19 detection research using machine learning.
arXiv:2008.05906
2. M. Loey, G. Manogaran, M.H.N. Taha, N.E.M. Khalifad, A hybrid deep transfer learning model with machine learning methods for face mask detection in the era of the COVID-19 pandemic (2021). PMID: 32834324, PMCID: PMC7386450. https://doi.org/10.1016/j.measurement.2020.108288
3. M. Rezaei, M. Azarmi, DeepSOCIAL: social distancing monitoring and infection risk assessment in COVID-19 pandemic. MDPI (2020). https://doi.org/10.1101/2020.08.27.20183277
4. J. Zhang, F. Han, Y. Chun, W. Chen, A novel detection framework about conditions of wearing
face mask for helping control the spread of COVID-19. IEEE (2021). https://doi.org/10.1109/
ACCESS.2021.3066538
5. U. Singhania, B.K. Tripathy, Text-based image retrieval using deep learning, in Encyclopedia
of Information Science and Technology, 5th edn. (IGI Global, USA, 2020), pp. 87–97
6. V. Prakash, B.K. Tripathy, Recent advancements in automatic sign language recognition (SLR),
in Computational Intelligence for Human Action Recognition (CRC Press, 2020), pp. 1–24
7. M. Jiang, X. Fan, H. Yan, RetinaMask: a face mask detector (2020) (this version, v2). arXiv:2005.03950
8. S. Saponara, A. Elhanashi, A. Gagliardi, Implementing a real-time, AI-based, people detection
and social distancing measuring system for Covid-19. J. Real-Time Image Proc. (2021). https://
doi.org/10.1007/s11554-021-01070-6
9. Vinitha, Velantina, Social distancing detection system with artificial intelligence using
computer vision and deep learning. Int. Res. J. Eng. Technol. (IRJET) (2020). e-ISSN:
2395-0056
10. R. Debgupta, B.B. Chaudhuri, B.K. Tripathy, A wide ResNet-based approach for age and
gender estimation in face images, in Proceedings of International Conference on Innovative
Computing and Communications (Springer, Singapore, 2020), pp. 517–530
11. M. Cristani, A.D. Bue, V. Murino, F. Setti, A. Vinciarelli, The visual social distancing problem.
IEEE Access 8, 126876–126886 (2020). https://doi.org/10.1109/ACCESS.2020.3008370
12. A. Adate, B.K. Tripathy, Understanding single image super-resolution techniques with genera-
tive adversarial networks, in Advances in Intelligent Systems and Computing, vol. 816 (Springer,
Singapore, 2019), pp. 833–840
13. A. Adate, B.K. Tripathy, Deep learning techniques for image processing, in Machine Learning
for Big Data Analysis. (Boston, De Gruyter, Berlin, 2018), pp. 69–90
14. P. Nagrath, R. Jain, A. Madana, R. Arora, P. Kataria, J. Hemanth, SSDMNV2: a real-time DNN-
based face mask detection system using single shot multibox detector and MobileNetV2. 66,
102692 (2021)
15. S. Yadav, Deep learning-based safe social distancing and face mask detection in public areas for
COVID19 safety guidelines adherence. IJRASET 8(VII), 4 (2020). https://doi.org/10.22214/
ijraset.2020.30560
16. S. Srivastava1, I. Gupta, G. Upadhyay, U. Goradiya, Social distance detector using YOLO v3.
Int. Res. J. Eng. Technol. (IRJET) (2021). e-ISSN: 2395-0056
17. A.H. Ahamad, N. Zaini, M.F.A. Latip, Person detection for social distancing and safety violation
alert based on segmented ROI, in 10th IEEE International Conference on Control Page |
29System, Computing and Engineering (ICCSCE) (Penang, Malaysia, 2020), pp. 113–118.
https://doi.org/10.1109/ICCSCE50387.2020.9204934
18. I. Ahmed, M. Ahmad, J.J.P.C. Rodrigues, G. Jeon, S. Din, A deep learning-based social distance
monitoring framework for COVID-19 (2020). https://doi.org/10.1016/j.scs.2020.102571
19. L. Thunström, S.C. Newbold, D. Finnoff, M. Ashworth, J.F. Shogren, The benefits and costs of
using social distancing to flatten the curve for COVID-19, Cambridge University Press (2020)
An Effective VM Consolidation
Mechanism by Using the Hybridization
of PSO and Cuckoo Search Algorithms
1 Introduction
2 Related Works
In this section, we have systematically formulated the problem. Assume that we have k tasks {ta_1, ta_2, ta_3, …, ta_k}, n VMs represented as {Vm_1, Vm_2, Vm_3, …, Vm_n}, i physical machines indicated as {Ph_1, Ph_2, Ph_3, …, Ph_i}, and j datacenters represented as {d_1, d_2, d_3, …, d_j}. These n virtual machines are to be intelligently mapped onto the i physical machines based on the status given by the resource manager in the cloud paradigm, thereby addressing the energy efficiency parameter of the cloud.
For effective allocation of VMs onto physical hosts, we need to identify the status of the virtual machines whenever tasks arrive on them; we therefore keep a threshold value, the VM status index, which is based on CPU utilization.
We assume that the VM status index takes three values, corresponding to over-utilized, underutilized, and normally utilized VMs.
The rules for the VM status index are:
• If CPU utilization is less than 25%, the VM is classified as underutilized and the VM status index is set to 1.
• If CPU utilization is between 25 and 59%, the VM is classified as in normal condition and the VM status index is set to 2.
• If CPU utilization is more than 60%, the VM is classified as over-utilized and the VM status index is set to 3.
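A minimal sketch of these rules, assuming CPU utilization is reported as a percentage; the function name and the treatment of the 59–60% boundary are our own illustrative choices:

```python
def vm_status_index(cpu_utilization: float) -> int:
    """Map CPU utilization (%) to the VM status index described above:
    1 = underutilized, 2 = normal condition, 3 = over-utilized.
    Utilization in [25, 60) is treated as the normal band."""
    if cpu_utilization < 25:
        return 1  # underutilized
    elif cpu_utilization < 60:
        return 2  # normal condition
    else:
        return 3  # over-utilized
```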
In the proposed system architecture, users initially submit tasks through the cloud console; these tasks are passed to the task manager and, based on the size and processing-capacity requirement of each task, given as input to the scheduler, which maps tasks to appropriate VMs. Every new VM is then allocated onto a physical machine residing in a datacenter based on the VM status index given by the resource manager.
In the cloud computing paradigm, a load balancer is connected to the resource manager and verifies whether the VMs allocated onto the physical machines are appropriate and whether any physical machine is overloaded, underloaded, or balanced. In this paper, we use the VM status index as the parameter to determine the status of VMs; based on that index, the load balancer balances and allocates the corresponding VM onto a physical host, thereby addressing the parameters of energy consumption and makespan in cloud computing.
The proposed architecture is shown in Fig. 1.
The following are the metrics we need to address in this algorithm.
3.1 Makespan
where avail indicates the availability of a VM and exectn indicates execution time of
a task on a corresponding virtual machine.
$ecs(vm_n) = \int_{0}^{mt} \big( ecs_{\mathrm{comp}}(vm_n, t) + ecs_{\mathrm{idle}}(vm_n, t) \big)\, dt$  (2)
From the above Eqs. 2 and 3, we calculate the total energy consumption in cloud computing.
A hybrid CSPSO was proposed in [13], in which CS is used as the local search process and PSO as the global search process, because the search space in cloud computing is dynamic and a single algorithm, either PSO or CS, cannot optimize the solution on its own. Each algorithm has its limitations: PSO can be trapped in local minima, and in the classical cuckoo search the step size is limited to 1. In this hybrid search process, CS therefore performs the local search and PSO the global search, using the following updates:
$V_{ij}(k+1) = V_{ij}(k) + C_1 R_1 \big( \mathrm{localbest}_{ij}(k) - x_{ij}(k) \big) + C_2 R_2 \big( \mathrm{globalbest}_{ij}(k) - x_{ij}(k) \big)$  (7)

$x_{ij}(k+1) = x_{ij}(k) + V_{ij}(k+1)$  (8)

$\overrightarrow{l_{best}}(k+1) = \begin{cases} \overrightarrow{x_i}(k+1), & \text{if } f(\overrightarrow{x_i}(k+1)) < f\big(\overrightarrow{l_{best}}(k)\big) \\ \overrightarrow{l_{best}}(k), & \text{otherwise} \end{cases}$  (9)

$\overrightarrow{g_{best}}(k+1) = \arg\min f\big(\overrightarrow{l_{best}}(k+1)\big)$  (10)
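A minimal sketch of one iteration of the PSO-side updates in Eqs. (7)–(10) is given below; the encoding of VM-to-host mappings as real vectors and the CS Lévy-flight local refinement of [13] are omitted, and all names and defaults are illustrative:

```python
import numpy as np

def pso_step(x, v, local_best, global_best, f, c1=2.0, c2=2.0, rng=None):
    """One velocity/position update following Eqs. (7)-(10).
    x, v: (particles, dims) position and velocity arrays; f: objective
    to minimize, applied to one position vector at a time."""
    rng = rng if rng is not None else np.random.default_rng()
    r1, r2 = rng.random(x.shape), rng.random(x.shape)
    v = v + c1 * r1 * (local_best - x) + c2 * r2 * (global_best - x)  # Eq. (7)
    x = x + v                                                         # Eq. (8)
    f_x = np.apply_along_axis(f, 1, x)
    f_lb = np.apply_along_axis(f, 1, local_best)
    local_best = np.where((f_x < f_lb)[:, None], x, local_best)       # Eq. (9)
    f_lb = np.apply_along_axis(f, 1, local_best)
    global_best = local_best[np.argmin(f_lb)]                         # Eq. (10)
    return x, v, local_best, global_best
```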
We have simulated our approach in CloudSim [3]. For this simulation, we considered 10 datacenters, 500 physical hosts, and 1000 VMs; the capacity of each VM is 2048 MB, and the hypervisor used is Xen. We used 1000 cloudlets, generated randomly in CloudSim, and evaluated the approach against the existing PSO and CS algorithms.
For 100 tasks, the PSO, CS, and CSPSO algorithms have makespans of 1358.7, 1311.5, and 1285.46, respectively. For 500 tasks, they have makespans of 1845.8, 1756.8, and 1723.78, respectively. For 1000 tasks, they have makespans of 2520.5, 2146.9, and 2056.78, respectively. From Table 1 and Fig. 2, we can see that our proposed approach outperforms the existing CS and PSO algorithms in terms of makespan.
Table 1 Evaluation of makespan

Tasks   PSO      CS       Proposed CSPSO
100     1358.7   1311.5   1285.46
500     1845.8   1756.8   1723.78
1000    2520.5   2146.9   2056.78
Fig. 2 Evaluation of makespan
Table 2 Evaluation of energy consumption

Tasks   PSO     CS       Proposed CSPSO
100     1145    947.8    925
500     2435    2097.8   1989.9
1000    3256    2476     2378
For 100 tasks, the PSO, CS, and CSPSO algorithms consumed 1145, 947.8, and 925 W of energy, respectively. For 500 tasks, they consumed 2435, 2097.8, and 1989.9 W, respectively. For 1000 tasks, they consumed 3256, 2476, and 2378 W, respectively. From Table 2 and Fig. 3, we can see that our proposed algorithm outperforms the existing PSO and CS algorithms by minimizing energy consumption.
In the future, we want to evaluate the algorithm against a grey wolf optimizer to further assess its efficiency.
References
1. F. Liu, J. Tong, J. Mao, R. Bohn, J. Messina, L. Badger, D. Leaf, NIST cloud computing
reference architecture. NIST Spec. Publ. 500, 1–28 (2011)
2. M.S. Sudheer, M. Vamsi Krishna, Dynamic PSO for task scheduling optimization in cloud
computing. Int. J. Recent Technol. Eng. 8(2), 332–338 (2019)
3. R.N. Calheiros et al., CloudSim: a toolkit for modeling and simulation of cloud computing
environments and evaluation of resource provisioning algorithms. Softw. Pract. Exp. 41(1),
23–50 (2011)
4. S.K. Mishra et al., Energy-efficient VM-placement in cloud data center. Sustain. Comput.
Inform. Syst. 20, 48–55 (2018)
5. M. Abdel-Basset, L. Abdle-Fatah, A.K. Sangaiah, An improved Lévy based whale optimization
algorithm for bandwidth-efficient virtual machine placement in cloud computing environment.
Clust. Comput. 22(4), 8319–8334 (2019)
6. E. Barlaskar, N. Ajith Singh, Y. Jayanta, Energy optimization methods for virtual machine
placement in cloud data center. ADBU J. Eng. Technol. 1 (2014)
7. A. Tripathi, I. Pathak, D.P. Vidyarthi, Modified dragonfly algorithm for optimal virtual machine
placement in cloud computing. J. Netw. Syst. Manage. 28, 1316–1342 (2020)
8. A. Tripathi, I. Pathak, D.P. Vidyarthi, Energy efficient VM placement for effective resource
utilization using modified binary PSO. Comput. J. 61(6), 832–846 (2018)
9. S. Gharehpasha, M. Masdari, A. Jafarian, Virtual machine placement in cloud data centers
using a hybrid multi-verse optimization algorithm. Artif. Intell. Rev. 1–37 (2020)
10. S. Gharehpasha, M. Masdari, A discrete chaotic multi-objective SCA-ALO optimization algo-
rithm for an optimal virtual machine placement in cloud data center. J. Ambient Intell. Humaniz.
Comput. 1–17 (2020)
11. E. Barlaskar, Y.J. Singh, B. Issac, Enhanced cuckoo search algorithm for virtual machine
placement in cloud data centres. Int. J. Grid Util. Comput. 9(1), 1–17 (2018)
12. S. Mangalampalli, V.K. Mangalampalli, S.K. Swain, Energy aware task scheduling algorithm
in cloud computing using PSO and cuckoo search hybridization. Solid State Technol. 63(6),
13995–14010 (2020)
13. R. Chi et al., A hybridization of cuckoo search and particle swarm optimization for solving
optimization problems. Neural Comput. Appl. 31(1), 653–670 (2019)
Customer Segmentation via Data Mining
Techniques: State-of-the-Art Review
Abstract Customers are becoming more vigilant, intelligent, and dynamic in society. They change their preferences and habits according to their needs. Knowing the needs of customers is an important part of marketing, where a company must discover its loyal customers within this heterogeneity. The concept of dividing heterogeneity into homogeneous forms is termed customer segmentation. Customer segmentation is an integral part of marketing through which companies can develop relationships with customers, in an organized manner, from huge sets of customer data. Understanding the customer's hidden knowledge is a resourceful idea of computational analysis, where accurate information about the taste and preference of the customer can be extracted. This type of computational analysis is termed data mining. This paper presents a systematic review of customer segmentation via data mining techniques, covering the supervised, unsupervised, and other data mining techniques used in segmentation.
1 Introduction
S. Das
Department of MBA, Aditya Institute of Technology and Management (AITAM), Tekkali 532201,
India
J. Nayak (B)
Department of Computer Science, Maharaja Sriram Chandra BhanjaDeo (MSCB) University,
Baripada, Odisha 757003, India
e-mail: jnayak@ieee.org
reaction from consumers [1]. Over the years, consumer behaviour has changed continuously; consumers are now more volatile than before and often change their habits and preferences. It is therefore very difficult for a seller or manufacturer to identify consumers' needs and wants in mass markets. The idea of dividing the market into various groups or sub-groups is known as segmentation. The concept of segmentation has been justified and explained by different experts as a way to identify the needs and wants of customers rationally. This strategic application of market targeting helps anticipate consumer reactions, because consumers may have varied preferences for goods or services according to their profile [2]. Nevertheless, the selection of segmentation techniques consistently depends on the input variables, such as the geographic, demographic, behavioural, or psychological profile of consumers, forecasted with statistical or non-statistical approaches.
According to Smith [3], segmentation is a distinctive marketing strategy closely associated with product differentiation and homogeneity. The customer may obtain a variety of alternatives from manufacturers, and in this diversified market structure, manufacturers may struggle to select or retain customers. To attract and retain customers, marketers often adopt selective techniques such as advertising or sales promotion rather than trying to understand the customer's motives. In a generalized mass market, it is difficult to identify the needs and wants of the customer through promotional techniques alone. Therefore, customer segmentation can be a way for the marketer to provide preferred goods or services to the customer. The basic idea of customer segmentation is to cluster/group customers in order to identify, understand, and target their needs. This concept was initially introduced by Smith in 1956 as an unconventional technique for product differentiation strategy. A segment or group of customers can be described as a set of customers who have similar demographic, psychological, and behavioural profiles [4]. The selection of segmentation techniques is now a sophisticated area of research in this information and communication age, particularly in the areas of data mining (DM) and database management systems (DBMS). With today's huge data sets, traditional market forecasting techniques are becoming inadequate. Several statistical techniques, such as multivariate analysis and time series, are also failing to perform satisfactory clustering or segmentation. In this connection, newer knowledge management technologies built on soft and hard computing, such as data mining, machine learning, and artificial intelligence, can solve market-related problems [5].
In this competitive world, most sellers today want to know the needs and preferences of the customer, and they now carefully maintain good relationships with customers at every stage of business operations. The concept of maintaining a good relationship with the customer is known as customer relationship management (CRM), and it is becoming an integral part of marketing strategy. With the proliferation of the Internet, the idea of relationship management has become popular due to several computational approaches. The company and customers can easily interact and understand each other by learning the hidden knowledge in the enormous quantity of data. The process of understanding and analysing this hidden knowledge of the customer is data mining. Data mining is a computational analysis process that discovers the consumer's taste and preferences through customer segmentation, dividing huge sets of data [6]. The data mining approach is also useful for manufacturers of decayable products, where the recency, frequency, and monetary (RFM) form of segmentation fails to quantify exact preferences, unlike methods such as the fuzzy analytic network process (FANP) [7]. Data mining techniques are also useful for profiling the customer base, targeting, aligning the right channels, cross-selling products, enhancing customer relationships, and providing value to the customer [8]. Prioritizing customers within the existing customer base is likewise an important technique in data mining. To improve the service quality and effectiveness of the product, importance-performance analysis (IPA) is also a part of data mining [9]. Customer segments are highly volatile; they may change according to the preferences of the customer, which creates the need for re-computation of data. These uncertainties require streaming data in a proper form, where data mining helps to cluster the data so that customer segmentation can be performed continuously [10]. Data mining techniques can also predict future probabilities and behaviours, allowing businesses to be more practical and knowledge-driven [11], and they provide the advantage of customer segmentation functions [12]. Data mining has also been applied, with supervised and unsupervised learning models, to classify blogs and extract knowledge about Voice over Internet Protocol customers [13].
After a meticulous review of 550 pieces of academic literature, 57 research articles and 17 conference papers were considered in this review process. This paper discusses customer segmentation via data mining techniques from a review perspective; it is a systematic investigation into supervised, unsupervised, and other data mining techniques. Supervised approaches, such as neural networks, naive Bayes, linear regression, logistic regression, support vector machines (SVM), K-nearest neighbours, boosting, decision trees (DT), hidden Markov models (HMM), and random forests, have made an enormous contribution to object detection and classification. In unsupervised approaches, complex classification of data and the identification and processing of variables receive more emphasis through K-means clustering, K-nearest neighbours (KNN), hierarchical clustering, anomaly detection, neural networks, principal component analysis, independent component analysis, the apriori algorithm, etc. Some research articles on other data mining techniques, such as the chi-square automatic interaction detector (CHAID), RFM, genetic algorithms (GA), and logistic regression, address classification and relationship management. The paper is organised into five sections. Section 2 presents various issues involved with customer segmentation. Section 3 explains various segmentation techniques. Section 4 provides a critical investigation. Section 5 presents the discussion and conclusion.
systematic way of defining the tools that help the business to grow and develop. Therefore, selection of the right tools involves cross-functional considerations to address the business goal. Customer segmentation has many pros and cons when classifying customers into different profiles. It helps procure, retain, and attract customers, and it clusters customers according to market demand. However, it can only succeed when accurate data interpretation, knowledge discovery, and information dissemination are properly done; with inexact information it is often ineffective. The manual process of segmentation is time-consuming, unscalable, and not agile, and therefore does not support one-to-one marketing. With the help of the latest technologies like data mining, artificial intelligence, and machine learning, accurate segmentation is possible and makes customers profitable.
3 Segmentation Techniques
Segmentation techniques can be broadly grouped into supervised, unsupervised, and other methods of the data mining approach (Fig. 1). Most of the techniques related
to artificial neural networks (ANNs), fuzzy logic (FL), machine learning (ML), RST, and evolutionary methods (EM) such as GA are the main data mining tools used to analyse data properly. These technologies have been widely used in data preparation and data analysis. It is a challenging task for modern marketing professionals to choose the right technique or algorithm, as most of these algorithms have both significant advantages and disadvantages. To navigate this, researchers should consider either a supervised or an unsupervised approach. The supervised approach is a classification method in which inputs and outputs are mapped properly; common algorithms, i.e., support vector machines, logistic regression, artificial neural networks, naive Bayes, and random forests, work well here. These approaches follow a hierarchical process to maintain a good relationship between input and output datasets. Unsupervised approaches inherently cluster the data; familiar algorithms include k-means clustering, principal component analysis, and autoencoders. Since no labels are provided, there is no specific way to compare model performance in most unsupervised approaches. In this connection, DM techniques using neural networks, decision trees, genetic algorithms, fuzzy logic, and K-nearest neighbour can predict, comprehend, and cluster customers properly [20]. Besides the non-traditional methods, some traditional techniques like self-organizing maps (SOM) can also be used for segmentation. In this approach, a set of initial cluster prototypes is created before applying K-means to obtain the final clusters of the data sets through a clear visualization. Some researchers report that the U-matrix is also one of the best options for clustering the data and analysing the results through hit counts.
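As a minimal illustration of the unsupervised route just described, the sketch below standardizes a few hypothetical customer records and clusters them with k-means; the features and data are invented for illustration, not taken from any study reviewed here:

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical features: [annual spend, visits per year, average basket size]
customers = np.array([
    [5200, 48, 110], [300, 4, 75], [8700, 60, 145],
    [450, 6, 70], [6100, 52, 120], [250, 3, 85],
])

X = StandardScaler().fit_transform(customers)  # put features on one scale
segments = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
print(segments)  # e.g., [0 1 0 1 0 1]: frequent high-spenders vs. occasional buyers
```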
Data preparation is a systematic way of transforming raw data into a basic form suitable for predictive analysis and removing errors or mistakes. It is a challenging prerequisite for proper predictive analysis, and automatic searches such as grid and random search can be used to find suitable data preparation configurations. Often it is difficult to gather a variety of data; for example, data for classification and regression might be stored in a CSV file consisting of rows, columns, and values for any data preparation method. Most authors note that data preparation draws on both statistical and non-statistical techniques: statistical techniques like exploratory factor analysis and correspondence analysis, and computational techniques such as soft computing tools (e.g., RST or GA), are typically used in data preparation [5]. Exploratory factor analysis (EFA) is a common statistical method in multivariate statistics for uncovering structure in a relatively large set of data. Researchers frequently use this technique for scaling data sets gathered through questionnaires. EFA is accurate because each factor is represented by multiple measured variables; it is based on common factors, unique factors, and errors of measurement. With the EFA model, we can easily identify the common factors and the related manifest variables. Correspondence analysis (CA) is an extension of principal component analysis appropriate for discovering relationships amongst qualitative variables (categorical data). Like principal component analysis, it offers a solution for summarizing and visualizing data in two-dimensional plots. Correspondence analysis is a geometric approach for appropriately visualizing the rows and columns of a two-way contingency table; the main aim of this tabular form is to provide a global view of the data for easy interpretation. However, these statistical techniques are being replaced by soft computing for segmenting or classifying data and providing accurate results. In particular, soft computing (SC) is an improvement over conventional systems that complements hard computing, with many intelligent and user-friendly features. Soft computing consists of FL, ANNs, RST, and EM. Its principal aim is to eliminate the uncertainty and vagueness of data: fuzzy tools and EM handle optimization and search, while ANNs and RST solve classification and rule generation problems. Recently, soft computing technologies have been used for resolving data mining problems and are widely used for the analysis and interpretation of data. RST is a mathematical, granular approximation technique that discovers hidden patterns in uncertain environments and is widely used in soft computing. Therefore, soft computing is a computational method that is useful for data preparation.
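The data preparation ideas above (CSV-style rows and columns, repair of missing values, latent-factor extraction, and automatic grid search over preparation choices) can be combined in one pipeline. The following sketch uses scikit-learn with invented data and is only one plausible configuration:

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV

# Hypothetical rows/columns/values, as if read from a CSV file
X = np.array([[25, 50e3], [40, np.nan], [35, 72e3],
              [52, 61e3], [29, np.nan], [47, 90e3]])
y = np.array([0, 1, 1, 0, 0, 1])  # e.g., responded to a campaign or not

prep = Pipeline([
    ("impute", SimpleImputer(strategy="mean")),  # repair missing values
    ("scale", StandardScaler()),                 # remove unit effects
    ("reduce", PCA()),                           # uncover latent factors
    ("model", LogisticRegression()),
])
# Automatic (grid) search over a preparation choice, as mentioned above
search = GridSearchCV(prep, {"reduce__n_components": [1, 2]}, cv=2).fit(X, y)
print(search.best_params_)
```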
for image segmentation with the use of a proper algorithm [24]. Sentiment analysis (SA) is a newly emerged research topic that unlocks a new future for businessmen, writers, and bloggers. It is an emerging form of computational algorithm for understanding the proportion of product acceptance and rejection, from which business leaders build strategy to improve product performance. In this regard, opinion mining can find the exact intention of the customer through supervised machine learning models [25]. The supervised approach has also been used to detect musical boundaries between verse and chorus segments, capturing perceptual aspects such as timbre, harmony, melody, and rhythm through boosting [26]. Graph-based spectral algorithms are a recent research topic; they detect image objects through a clustering algorithm in a meaningful, enlarged structure [27]. The fault diagnosis system (FDS) is an improved application of supervised learning using a support vector machine for appropriate decision-making [28]. The disposal of nuclear waste objects through robotics is a matter of concern; here, RGBD-based detection and categorization are performed by a deep convolutional neural network (DCNN) trained from unlabelled RGBD videos, which helps build an object detection benchmark to recognize waste objects reliably [29]. In this connection, supervised learning is a leading family of algorithms for identifying, clustering, and recognizing data so as to perceive individual customer expectations. This type of segmentation can help researchers and business leaders develop product quality and meet the needs of customers.
evaluated through the fuzzy joint points (FJP) method, where the data set is classified in hierarchical order [30].
DNA array analysis is a functional algorithm to measure the expression of multiple genes in an unsupervised approach. Just like supervised learning, a two-way clustering framework can identify gene patterns and perform cluster discovery on samples where connectivity among groups of genes is possible [31]. Speech recognition and grouping of voices through co-channel (two-talker) speech separation is also part of the unsupervised learning approach; for voice segregation and segmentation of speech, an algorithm like tandem can separate unvoiced speech [32]. The unsupervised approach is also applied to the summarization of opinions, where state-of-the-art algorithms produce summaries that are informative and readable [33]. It can likewise detect human activity from raw wearable-sensor data to identify expectations [34]; segmentation of such data classification can be achieved through multidimensional time series using the hidden Markov model, which predicts human activity accurately. Automatic summarization of documents is a recent development in which algorithms classify the data into words, sentences, and phrases and finally process the document, observing relevancy, redundancy, and length while summarizing [35]. Most researchers have used the unsupervised learning approach from different perspectives, such as facial landmark detection, protocol features of word extraction, product attribute extraction, clustering of pixel images, and so on.
In recent years, customer segmentation in direct marketing has become more effective with the development of database marketing techniques. These data mining approaches enable direct marketers to segment customers better and act with differentiated marketing strategies. Data mining approaches such as CHAID, RFM, GA, and logistic regression have been used as analytical tools for direct marketing segmentation with two types of data sets. It was found that, amongst all the approaches, RFM performed best, although CHAID is also an optimal solution for segmenting the data into sequences; so an empirically based RFM approach could replace both CHAID and logistic regression in database marketing systems [36]. It can therefore be observed from several studies that RFM technology has been used widely to segment customers and access information. The marketing representatives of commercial banks can segment through k-means clustering to obtain potential customers. To obtain useful information from the customer, four data mining methods, namely neural networks, C5.0, classification and regression trees, and the chi-squared automatic interaction detector, can help detect the background information of credit card holders [37, 38]. Market segmentation has a key role in continuing the relationship with loyal customers; in this regard, there must be a correlation between the retailer and the customer. By using the divisive cluster analysis technique of data mining, the retailer can find all kinds of information in the customer database [39].
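Since RFM recurs throughout these studies, a minimal sketch of RFM scoring may help; the transaction log and the three-level quantile scoring below are invented for illustration:

```python
import pandas as pd

# Hypothetical transaction log: one row per purchase
tx = pd.DataFrame({
    "customer": ["a", "a", "b", "c", "c", "c"],
    "days_ago": [5, 40, 200, 3, 10, 30],
    "amount": [50, 20, 15, 120, 80, 60],
})

rfm = tx.groupby("customer").agg(
    recency=("days_ago", "min"),   # days since most recent purchase
    frequency=("amount", "size"),  # number of purchases
    monetary=("amount", "sum"),    # total spend
)
# Score each dimension 1-3 by quantile; a recent purchase earns a high R score
rfm["R"] = pd.qcut(rfm["recency"], 3, labels=[3, 2, 1]).astype(int)
rfm["F"] = pd.qcut(rfm["frequency"], 3, labels=[1, 2, 3]).astype(int)
rfm["M"] = pd.qcut(rfm["monetary"], 3, labels=[1, 2, 3]).astype(int)
rfm["RFM"] = rfm["R"].astype(str) + rfm["F"].astype(str) + rfm["M"].astype(str)
print(rfm)  # customers can now be segmented by their RFM codes
```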
The advent of technology for data optimization and screening is important for data mining, which mines vast data sets and classifies the market accordingly. In particular, ANN and particle swarm optimization (PSO) methods are recent developments for market decision strategy; by integrating statistical analysis and particle swarm optimization, we can reduce redundant data and segment the market properly [40]. Data mining techniques have become an indispensable method in market segmentation. The classification of larger data sets from databases is a recent form of market research in which intelligent solutions, such as neural networks, evolutionary algorithms (EA), fuzzy theory, RFM, hierarchical clustering, K-means, bagged clustering, kernel methods, the Taguchi method, multidimensional scaling, model-based clustering, rough sets, and others, are effective and timely [41]. Clustering the data is thus an important feature of data mining techniques, where latent class analysis (LCA), prior clustering, and similarity or distance measures are used for segmenting large groups of customers according to individual expectations [42]. From the various research articles reviewed, we can confirm that data mining is widely used for the exploration and prediction of expected outcomes in the heterogeneous market. Data mining is used for classification, clustering, association, and sequential analysis; in this regard, certain statistical applications such as regression, time series, association, and sequential analysis are beneficial for mining large data sets [43].
4 Critical Investigation
Despite the importance of segmentation analysis on different data sets, only minor attention has been paid to checking reliability and validity, because some variables, like demographics (age, gender, income, religion, etc.), are more reliable than behavioural or psychological characteristics. In particular, in the case of an attitude survey, proper care should be taken to test the reliability of the data. To check reliability, statistical measures like factor analysis, conjoint analysis, correlation, component matrices, etc. are beneficial for data analysis. However, these traditional methods may not provide accuracy due to several exceptions regarding the number of fuzzy clusters in the datasets; some of the indexes also compare the clusters. Hence, the data sets should inherently be checked and rechecked through proper methods to test their reliability.
(Figure: shares of the reviewed studies by segmentation variable type, including product-oriented observed and product-oriented unobserved (36%) variables.)
Data mining is the significant procedure of analysing large volumes of data to extract business insight, which helps companies resolve problems, mitigate risks, and grasp new opportunities. This division of data science derives its name from the similarity between searching for valuable information in a large database and mining a mountain for ore: both processes require sifting through tremendous amounts of material to find hidden value. Data mining can answer many kinds of business questions that traditionally took much time to resolve manually. Using a wide range of statistical techniques to analyse data from different perspectives, users can identify patterns, trends, and relationships. Customer segmentation is a central concern for market analysis, where proper data classification is important. Though several statistical techniques can be applied to a customer database, data mining techniques help predict, analyse, and profile the customer in a significant way. Much academic literature has emphasized the importance of various data mining techniques, supervised, unsupervised, and others, but it can be difficult to identify the exact techniques suited to a given study; researchers should therefore have domain knowledge of the business, the techniques, and a fitness model. Here, we proposed a data mining model (Fig. 3) based on the suitability of customer needs and expectations.
Customer segmentation using data mining is a recent area of study in which most of the academic literature suggests classification of data, and some studies emphasize different clustering methods. However, the selection of segmentation techniques is a challenging task for a business. In selecting a segmentation technique, we must consider two important aspects: the objective of management and the recent trends in the market. Classical methods like factor analysis, regression, conjoint analysis, or coefficients of determination may not provide accurate predictions. Therefore, in this review, we observed that computational algorithms can better serve businesses for analysis and prediction. As we know, most businesses are expanding their products or services into different markets and also searching for a better customer portfolio through which they can target customers and position their brands. In this connection, we highlighted four types of segmentation variables: general observable variables, general unobservable variables, product-specific observable variables, and product-specific unobservable variables. In the first case, the variables
References
2. P.Q. Brito et al., Customer segmentation in a large database of an online customized fashion
business. Robot. Comput.-Integr. Manuf. 36, 93–100 (2015). https://doi.org/10.1016/j.rcim.
2014.12.014
3. W.R. Smith, Product differentiation and market segmentation as alternative marketing
strategies. J. Mark. 21(1), 3–8 (1956). https://doi.org/10.1177/002224295602100102
4. A. Nairn, P. Berthon, Creating the customer: the influence of advertising on consumer market
segments—evidence and ethics. J. Bus. Ethics 42(1), 83–100 (2003). https://doi.org/10.1023/
A:1021620825950
5. A. Hiziroglu, Soft computing applications in customer segmentation: state-of-art review and
critique. Expert Syst. Appl. 40(16), 6491–6507 (2013). https://doi.org/10.1016/j.eswa.2013.
05.052
6. A. Hajiha, R. Radfar, S.S. Malayeri, Data mining application for customer segmentation based
on loyalty: an Iranian food industry case study, in 2011 IEEE International Conference on
Industrial Engineering and Engineering Management (IEEE, 2011). https://doi.org/10.1109/
IEEM.2011.6117968
7. V. Golmah, G. Mirhashemi, Implementing a data mining solution to customer segmentation
for decayable products—a case study for a textile firm. Int. J. Database Theory Appl. 5(3),
73–90 (2012)
8. M.M.T.M. Hassan, M. Tabasum, Customer profiling and segmentation in retail banks using
data mining techniques. Int. J. Adv. Res. Comput. Sci. 9(4), 24–29 (2018)
9. S.Y. Hosseini, A.Z. Bideh, A data mining approach for segmentation-based importance-
performance analysis (SOM–BPNN–IPA): a new framework for developing customer retention
strategies. Serv. Bus. 8(2), 295–312 (2014). https://doi.org/10.1007/s11628-013-0197-7
10. M. Carnein, H. Trautmann, Customer segmentation based on transactional data using stream
clustering, in Pacific-Asia Conference on Knowledge Discovery and Data Mining (Springer,
Cham, 2019). https://doi.org/10.1007/978-3-030-16148-4_22
11. W. Wang, S. Fan, Application of data mining technique in customer segmentation of shipping
enterprises, in 2010 2nd International Workshop on Database Technology and Applications
(IEEE, 2010). https://doi.org/10.1109/DBTA.2010.5659081
12. J. Ranjan, R. Agarwal, Application of segmentation in customer relationship management: a
data mining perspective. Int. J. Electron. Custom. Relat. Manag. 3(4), 402–414 (2009). https://
doi.org/10.1504/IJECRM.2009.029298
13. L.-S. Chen, C.-C. Hsu, M.-C. Chen, Customer segmentation and classification from blogs by
using data mining: an example of VOIP phone. Cybern. Syst. Int. J. 40(7), 608–632 (2009).
https://doi.org/10.1080/01969720903152593
14. Z. Yihua, Vip customer segmentation based on data mining in mobile-communications industry,
in 2010 5th International Conference on Computer Science & Education (IEEE, 2010). https://
doi.org/10.1109/ICCSE.2010.5593669
15. C. Qiuru et al., Telecom customer segmentation based on cluster analysis, in 2012 International
Conference on Computer Science and Information Processing (CSIP) (IEEE, 2012). https://
doi.org/10.1109/CSIP.2012.6309069
16. H. Gong, Q. Xia, Study on application of customer segmentation based on data mining tech-
nology, in 2009 ETP International Conference on Future Computer and Communication (IEEE,
2009). https://doi.org/10.1109/FCC.2009.66
17. X. Lai, Segmentation study on enterprise customers based on data mining technology, in 2009
First International Workshop on Database Technology and Applications (IEEE, 2009). https://
doi.org/10.1109/DBTA.2009.96
18. H. Hwang, T. Jung, E. Suh, An LTV model and customer segmentation based on customer
value: a case study on the wireless telecommunication industry. Expert Syst. Appl. 26(2),
181–188 (2004). https://doi.org/10.1016/S0957-4174(03)00133-7
19. C.-H. Cheng, Y.-S. Chen, Classifying the segmentation of customer value via RFM model and
RS theory. Expert Syst. Appl. 36(3), 4176–4184 (2009). https://doi.org/10.1016/j.eswa.2008.
04.003
20. S. Kelly, Mining data to discover customer segments. Interact. Mark. 4(3), 235–242 (2003).
https://doi.org/10.1057/palgrave.im.4340185
21. R.J. Calantone, J.S. Johar, Seasonal segmentation of the tourism market using a benefit segmen-
tation framework. J. Travel Res. 23(2), 14–24 (1984). https://doi.org/10.1177/004728758402
300203
22. W. Wang et al., A weakly supervised approach for object detection based on soft-label boosting,
in 2013 IEEE Workshop on Applications of Computer Vision (WACV) (IEEE, 2013). https://
doi.org/10.1109/WACV.2013.6475037
23. N. Malandrakis et al., A supervised approach to movie emotion tracking, in 2011 IEEE Interna-
tional Conference on Acoustics, Speech and Signal Processing (ICASSP) (IEEE, 2011). https://
doi.org/10.1109/ICASSP.2011.5946961
24. L. Yang et al., A supervised approach to the evaluation of image segmentation methods, in
International Conference on Computer Analysis of Images and Patterns (Springer, Berlin,
Heidelberg, 1995). https://doi.org/10.1007/3-540-60268-2_377
25. Md.S. Islam et al., Supervised approach of sentimentality extraction from Bengali Face-
book status, in 2016 19th International Conference on Computer and Information Technology
(ICCIT) (IEEE, 2016). https://doi.org/10.1109/ICCITECHN.2016.7860228
26. D. Turnbull et al., A supervised approach for detecting boundaries in music using difference
features and boosting, in ISMIR (2007)
27. L. Yang et al., A supervised approach to the evaluation of image segmentation methods, in
International Conference on Computer Analysis of Images and Patterns (Springer, Berlin,
Heidelberg, 1995). https://doi.org/10.1016/j.neucom.2011.09.002
28. I. Monroy et al., A semi-supervised approach to fault diagnosis for chemical processes. Comput.
Chem. Eng. 34(5), 631–642 (2010). https://doi.org/10.1016/j.compchemeng.2009.12.008
29. L. Sun et al., A novel weakly-supervised approach for RGB-D-based nuclear waste object
detection. IEEE Sens. J. 19(9), 3487–3500 (2018). https://doi.org/10.1109/JSEN.2018.288
8815
30. A.J. Ferreira, M.A.T. Figueiredo, An unsupervised approach to feature discretization and
selection. Pattern Recogn. 45(9), 3048–3060 (2012). https://doi.org/10.1016/j.patcog.2011.
12.008
31. E.N. Nasibov, G. Ulutagay, A new unsupervised approach for fuzzy clustering. Fuzzy Sets
Syst. 158(19), 2118–2133 (2007). https://doi.org/10.1016/j.fss.2007.02.019
32. Ke. Hu, D.L. Wang, An unsupervised approach to cochannel speech separation. IEEE Trans.
Audio Speech Lang. Process. 21(1), 122–131 (2012). https://doi.org/10.1109/TASL.2012.221
5591
33. K. Ganesan, C.X. Zhai, E. Viegas, Micropinion generation: an unsupervised approach to gener-
ating ultra-concise summaries of opinions, in Proceedings of the 21st International Conference
on World Wide Web (2012)
34. D. Trabelsi et al., An unsupervised approach for automatic activity recognition based on hidden
Markov model regression. IEEE Trans. Autom. Sci. Eng. 10(3), 829–835 (2013). https://doi.
org/10.1109/TASE.2013.2256349
35. R.M. Alguliyev, R.M. Aliguliyev, N.R. Isazade, An unsupervised approach to generating
generic summaries of documents. Appl. Soft Comput. 34, 236–250 (2015). https://doi.org/
10.1016/j.asoc.2015.04.050
36. J.A. McCarty, M. Hastak, Segmentation approaches in data-mining: a comparison of RFM,
CHAID, and logistic regression. J. Bus. Res. 60(6), 656–662 (2007). https://doi.org/10.1016/
j.jbusres.2006.06.015
37. W. Li et al., Credit card customer segmentation and target marketing based on data mining,
in 2010 International Conference on Computational Intelligence and Security (IEEE, 2010).
https://doi.org/10.1109/CIS.2010.23
38. Z. Lu et al., Customer segmentation algorithm based on data mining for electric vehicles, in 2019
IEEE 4th International Conference on Cloud Computing and Big Data Analysis (ICCCBDA)
(IEEE, 2019). https://doi.org/10.1109/ICCCBDA.2019.8725737
39. V.L. Miguéis, A.S. Camanho, J. Falcão e Cunha, Customer data mining for lifestyle segmen-
tation. Expert Syst. Appl. 39(10), 9359–9366 (2012). https://doi.org/10.1016/j.eswa.2012.
02.133
40. C.-Y. Chiu et al., An intelligent market segmentation system using k-means and particle
swarm optimization. Expert Syst. Appl. 36(3), 4558–4565 (2009). https://doi.org/10.1016/j.
eswa.2008.05.029
41. S. Dutta, S. Bhattacharya, K.K. Guin, Data mining in market segmentation: a literature review
and suggestions, in Proceedings of Fourth International Conference on Soft Computing for
Problem Solving (Springer, New Delhi, 2015). https://doi.org/10.1007/978-81-322-2217-0_8
42. E.R. Swenson, N.D. Bastian, H.B. Nembhard, Healthcare market segmentation and data
mining: a systematic review. Health Mark. Q. 35(3), 186–208 (2018). https://doi.org/10.1080/
07359683.2018.1514734
43. S. Mckechnie, Integrating intelligent systems into marketing to support market segmentation
decisions. Intell. Syst. Account. Finance Manag. Int. J. 14(3), 117–127 (2006). https://doi.org/
10.1002/isaf.280
44. P. Kotler, K.L. Keller, Marketing Management, ed. by W. Lassar, international 11th edn.
(Prentice Hall, New Jersey, 2003)
45. M. Wedel, W.A. Kamakura, Market Segmentation: Conceptual and Methodological Founda-
tions, vol. 8 (Springer Science & Business Media, 2012)
46. Y. Wind, Issues and advances in segmentation research. J. Mark. Res. 15(3), 317–337 (1978).
https://doi.org/10.1177/002224377801500302
47. L. Alfansi, A. Sargeant, Market segmentation in the Indonesian banking sector: the relationship
between demographics and desired customer benefits. Int. J. Bank Mark. (2000). https://doi.
org/10.1108/02652320010322976
48. D.G. Tonks, Validity and the design of market segments. J. Mark. Manag. 25(3–4), 341–356
(2009). https://doi.org/10.1362/026725709X429782
49. M. Taks, J. Scheerder, Youth sports participation styles and market segmentation profiles:
evidence and applications. Eur. Sport Manag. Q. 6(2), 85–121 (2006). https://doi.org/10.1080/
16184740600954080
50. J. Bruwer, E. Li, Wine-related lifestyle (WRL) market segmentation: demographic and
behavioural factors. J. Wine Res. 18(1), 19–34 (2007). https://doi.org/10.1080/095712607015
26865
51. P. Vyncke, Lifestyle segmentation: from attitudes, interests and opinions, to values, aesthetic
styles, life visions and media preferences. Eur. J. Commun. 17(4), 445–463 (2002). https://doi.
org/10.1177/02673231020170040301
52. A. Vellido, P.J.G. Lisboa, K. Meehan, Segmentation of the on-line shopping market using
neural networks. Expert Syst. Appl. 17(4), 303–314 (1999). https://doi.org/10.1016/S0957-
4174(99)00042-1
53. J. Swait, A structural equation model of latent segmentation and product choice for cross-
sectional revealed preference choice data. J. Retail. Consum. Serv. 1(2), 77–89 (1994). https://
doi.org/10.1016/0969-6989(94)90002-7
54. T. Teichert, E. Shehu, I. von Wartburg, Customer segmentation revisited: the case of the airline
industry. Transp. Res. Part A Policy Pract. 42(1), 227–242 (2008). https://doi.org/10.1016/j.
tra.2007.08.003
55. A. Lindridge, S. Dibb, Is ‘culture’ a justifiable variable for market segmentation? A cross-
cultural example. J. Consum. Behav. Int. Res. Rev. 2(3), 269–286 (2003). https://doi.org/10.
1002/cb.106
56. F. Casarin, A. Moretti, An international review of cultural consumption research. SSRN
Electron. J. Department of Management, Università Ca’ Foscari Venezia working paper 12
(2011)
57. A.M. Gonzalez, L. Bello, The construct “lifestyle” in market segmentation: the behaviour of
tourist consumers. Eur. J. Mark. (2002). https://doi.org/10.1108/03090560210412700
58. D.B. Valentine, T.L. Powers, Generation Y values and lifestyle segments. J. Consum. Mark.
(2013). https://doi.org/10.1108/JCM-07-2013-0650
59. U.R. Orth et al., Promoting brand benefits: the role of consumer psychographics and lifestyle.
J. Consum. Mark. (2004). https://doi.org/10.1108/07363760410525669
60. C.-S. Yu, Construction and validation of an e-lifestyle instrument. Internet Res. (2011). https://
doi.org/10.1108/10662241111139282
61. A.M. Thompson, P.F. Kaminski, Psychographic and lifestyle antecedents of service quality
expectations: a segmentation approach. J. Serv. Mark. (1993). https://doi.org/10.1108/088760
49310047742
62. J.L.M. Tam, S.H.C. Tai, Research note: the psychographic segmentation of the female market
in Greater China. Int. Mark. Rev. (1998). https://doi.org/10.1108/02651339810205258
63. T.F. Srihadi, D. Sukandar, A.W. Soehadi, Segmentation of the tourism market for Jakarta:
classification of foreign visitors’ lifestyle typologies. Tour. Manag. Perspect. 19, 32–39 (2016).
https://doi.org/10.1016/j.tmp.2016.03.005
64. B. Oates, L. Shufeldt, B. Vaught, A psychographic study of the elderly and retail store attributes.
J. Consum. Mark. (1996). https://doi.org/10.1108/07363769610152572
65. T.M.M. Verhallen, R.T. Frambach, J. Prabhu, Strategy-based segmentation of industrial
markets. Ind. Mark. Manag. 27(4), 305–313 (1998). https://doi.org/10.1016/S0019-850
1(97)00064-3
66. E.J. Cheron, R. McTavish, J. Perrien, Segmentation of bank commercial markets. Int. J. Bank
Mark. (1989). https://doi.org/10.1108/EUM0000000001458
67. S.W. Clopton, J.E. Stoddard, D. Dave, Event preferences among arts patrons: implications for
market segmentation and arts management. Int. J. Arts Manag. 48–59 (2006)
68. A. Buratto, L. Grosset, B. Viscolani, Advertising a new product in a segmented market. Eur. J.
Oper. Res. 175(2), 1262–1267 (2006)
69. R. Sánchez-Fernández, M. Ángeles Iniesta-Bonillo, A. Cervera-Taulet, Exploring the concept
of perceived sustainability at tourist destinations: a market segmentation approach. J. Travel
Tour. Mark. 36(2), 176–190 (2019)
70. K. Bijak, L.C. Thomas, Does segmentation always improve model performance in credit
scoring? Expert Syst. Appl. 39(3), 2433–2442 (2012). https://doi.org/10.1016/j.eswa.2011.
08.093
71. A. Sell, P. Walden, Segmentation bases in the mobile services market: attitudes in, demographics
out, in 2012 45th Hawaii International Conference on System Sciences (IEEE, 2012)
72. A. Sell, J. Mezei, P. Walden, An attitude-based latent class segmentation analysis of mobile
phone users. Telemat. Inform. 31(2), 209–219 (2014)
73. D.J. Ketchen, C.L. Shook, The application of cluster analysis in strategic management research:
an analysis and critique. Strateg. Manag. J. 17(6), 441–458 (1996)
74. G. Punj, D.W. Stewart, Cluster analysis in marketing research: review and suggestions for
application. J. Mark. Res. 20(2), 134–148 (1983)
Solar Radiation Prediction Using
Artificial Neural Network:
A Comprehensive Review
Abstract Solar energy, along with wind energy, has great potential to fulfill the energy demands of any nation, and in the last few decades considerable research interest has been shown in harnessing solar energy. This paper considers the role of artificial neural networks (ANN) in predicting the solar radiation value of any location. The prime objective is to review the most recent studies on ANN-based techniques to find the best available methods in the literature for the prediction of solar radiation in comparison with conventional methods. A comprehensive discussion is presented to identify some of the research gaps in this domain.
1 Introduction
The integration of renewable energy sources, mainly unpredictable ones such as wind and solar, into existing or future energy supply networks is going to be one of the most challenging tasks for the global energy market. The high demand needs to be met by new power plants, and because most present-day power plants run on fossil fuels, any new addition will increase greenhouse gas emissions, which leads to global warming [1]. It is also expected that fossil fuel reserves will be exhausted in the coming decades; hence, they can serve as a temporary alternative only. Permanent sources have to be found, and research must be done to make those sources more efficient and reliable. Considering India's geographical conditions, solar and wind energy are two major alternative sources
B. Paul (B)
Department of Mechanical Engineering, Motilal Nehru National Institute of Technology
Allahabad, Prayagraj, UP, India
e-mail: bipaul@mnnit.ac.in
H. Paul
Department of Computer Science and Engineering, LDC Institute of Technical Studies, Soraon,
Prayagraj, UP, India
Fig. 1 Increasing trend of research in the field of solar radiation prediction using ANN. Source:
https://www.webofscience.com/wos/woscc/basic-search
Generally, solar radiation is measured with the help of solar radiation measurement devices. Nevertheless, due to high installation and maintenance costs, and the requirement of periodic calibration, such devices are sparse at most research institutes. It is therefore very important to be able to forecast the solar radiation of a particular location from easily measurable climatic variables such as temperature, relative humidity, and wind speed. Keeping this in mind, several models have been proposed in the literature to predict solar radiation data. In the following section, some of the important papers are discussed.
Bilgili and Ozgoren [5] predicted daily total global solar radiation in Adana city, Turkey, using three different techniques: multi-nonlinear regression (MNLR), multi-linear regression (MLR), and feed-forward ANN methods. They reported that the ANN method performed better than the other two in predicting daily global solar radiation, with average mean absolute percentage errors (MAPE) of testing for the MNLR, MLR, and ANN models of 18.41%, 32.1%, and 12.89%, respectively.
Al-Shamisi et al. [6] studied two ANN techniques, multilayer perceptron (MLP) and radial basis function (RBF), to predict the monthly average global solar radiation of Al Ain city, UAE. They trained the neural network models using weather data from 1995 to 2004, while the data from 2005 to 2007 were used for testing and model validation. Eleven different models were tested with MLP and RBF techniques using different input combinations. They reported that the RBF technique performs better than the MLP technique in most cases; the statistical parameters MAPE, mean bias error (MBE), root mean square error (RMSE), and correlation coefficient (R2) for the RBF model were reported as 35%, 0.307%, 3.88%, and 92%, respectively.
Here, n refers to the number of data points, and $o_i$ and $t_i$ are the ith predicted and measured values, respectively.
Xue [7] predicted daily diffuse solar radiation using the ANN technique, applying genetic algorithm (GA) and particle swarm optimization (PSO) techniques to improve the accuracy of a back-propagation neural network (BPNN) model. A total of seven input parameters were used: the month of the year, mean temperature, relative humidity, wind speed, sunshine duration, rainfall, and daily global solar radiation. The author observed that the BPNN optimized by PSO outperformed the BPNN optimized by GA.
Alsina et al. [8] worked on the prediction of monthly average daily global solar radiation over Italy using ANN models, with data from 45 locations for the training and testing of a multi-location ANN. At least 13 input parameters were considered for each location, including the geographical coordinates and the monthly values of climatological parameters. They found that, using all available inputs, the best ANN achieved a normalized root mean square error (NRMSE) of 1.65%, a MAPE of about 2.66%, and a mean percentage bias error (MPBE) of −0.20%.
Antonopoulos et al. [9] investigated the Hargreaves method, multi-linear regression (MLR) methods, and ANN technology to estimate solar radiation, using daily meteorological measurements of radiation, air temperature, relative humidity, and wind velocity to develop the models. They suggested the use of Hargreaves and multi-linear regression over ANN, finding that ANN models could not be recommended because the greater difficulty involved is not in proportion to the gain in accuracy.
Ağbulut et al. [10] studied four machine learning algorithms, support vector machine (SVM), k-nearest neighbour (k-NN), deep learning (DL), and ANN, to predict daily global solar radiation data for four locations in Turkey. The algorithms were trained using daily maximum and minimum ambient temperature, day length, cloud cover, extraterrestrial solar radiation, and solar radiation for these locations. They reported RMSE, MABE, and R2 values for all models ranging from 2.273 to 2.820 MJ/m2, from 1.870 to 2.328 MJ/m2, and from 0.855 to 0.936, respectively. They concluded that the ANN model is the best among all models, while also noting that all the machine learning models can predict solar radiation with high accuracy.
A similar kind of study has also been reported by many researchers. It can be
observed that in addition to the different pragmatic models available in the literature,
different AI-based techniques such as kernel nearest neighbor (k-NN), SVM, deep
learning (DL), genetic algorithms (GA), and ANN have recently gained a lot of atten-
tion from the scientific community in the prediction of solar radiation data [11–25].
Solar Radiation Prediction Using Artificial Neural … 513
Figure 2 shows a representative ANN model for solar radiation forecasting. In almost all the studies, it is reported that the ANN methods have offered more accurate results in comparison with the conventional methods available for the prediction of solar radiation. The different statistical prediction accuracy indices reported in the literature are presented in Table 1.
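To make the idea of such an ANN model concrete, the following is a minimal sketch of a feed-forward ANN (multilayer perceptron) regressor for global solar radiation; it assumes scikit-learn and NumPy, and the three input features and the synthetic data are illustrative stand-ins for the meteorological datasets used in the cited studies, not the authors' exact setup.

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPRegressor
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 500
X = np.column_stack([rng.uniform(0, 14, n),    # sunshine duration (h)
                     rng.uniform(5, 40, n),    # air temperature (deg C)
                     rng.uniform(10, 95, n)])  # relative humidity (%)
# toy target: radiation rises with sunshine/temperature, falls with humidity
y = 0.4 * X[:, 0] + 0.1 * X[:, 1] - 0.05 * X[:, 2] + rng.normal(0, 0.5, n)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
scaler = StandardScaler().fit(X_tr)            # scale inputs before the MLP
ann = MLPRegressor(hidden_layer_sizes=(10,), max_iter=2000,
                   random_state=0).fit(scaler.transform(X_tr), y_tr)
print("test R^2:", ann.score(scaler.transform(X_te), y_te))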
In this section, a comparative study of the research work done in the field of solar radiation prediction has been presented. From the literature review, it can be observed that different conventional, linear, nonlinear, fuzzy logic, and neural network-based models have been used to predict solar radiation. Diez et al. [26] have compared the performance parameters of the ANN model with classic models (CENSOLAR typical year, weighted moving mean with two days delay, linear regression, and Fourier and Markov analysis). They have found that the ANN model is better and easier to implement, as it requires fewer inputs in comparison with the classic models.
Citakoglu [27] has compared the performance of ANN, adaptive network fuzzy inference system (ANFIS), multiple linear regression (MLR) models, and four empirical equations (Angstrom, Abdalla, Bahel, and Hargreaves–Samani) used for the estimation of solar radiation. They have reported that the ANN model has better performance accuracy among all the techniques compared.
Table 1 Statistical prediction accuracy indices

$R^2$ (coefficient of determination): $R^2 = 1 - \frac{\sum_{i=1}^{n}(t_i - o_i)^2}{\sum_{i=1}^{n} o_i^2}$

RMSE (root mean square error): $\mathrm{RMSE} = \left[ \frac{1}{n}\sum_{i=1}^{n}(o_i - t_i)^2 \right]^{1/2}$

MAPE (mean absolute percentage error): $\mathrm{MAPE} = \frac{1}{n}\sum_{i=1}^{n} \left| \frac{o_i - t_i}{t_i} \right| \times 100$

MAE (mean absolute error): $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} |o_i - t_i|$

MBE (mean bias error): $\mathrm{MBE} = \frac{1}{n}\sum_{i=1}^{n} (o_i - t_i)$

RMBE (relative mean bias error): $\mathrm{RMBE} = \frac{\frac{1}{n}\sum_{i=1}^{n}(o_i - t_i)}{\frac{1}{n}\sum_{i=1}^{n} o_i} \times 100$

MRV (mean relative variance): $\mathrm{MRV} = \frac{\sum_{i=1}^{n}(t_i - o_i)^2}{\sum_{i=1}^{n}\left(t_i - \frac{1}{n}\sum_{i=1}^{n} t_i\right)^2}$
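The indices in Table 1 are straightforward to compute; the following is a small NumPy sketch using the notation above (o for predicted, t for measured values) on toy data. Note that the R2 line follows the table's definition rather than the usual coefficient-of-determination form.

import numpy as np

o = np.array([5.1, 6.0, 4.8, 7.2])   # predicted values
t = np.array([5.0, 6.3, 4.5, 7.0])   # measured values

rmse = np.sqrt(np.mean((o - t) ** 2))            # root mean square error
mape = np.mean(np.abs((o - t) / t)) * 100        # mean absolute percentage error
mae = np.mean(np.abs(o - t))                     # mean absolute error
mbe = np.mean(o - t)                             # mean bias error
r2 = 1 - np.sum((t - o) ** 2) / np.sum(o ** 2)   # R2 as defined in Table 1
print(rmse, mape, mae, mbe, r2)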
Table 2 summarizes
a comparative representation of the research on solar radiation prediction reported
in the literature in the last decade.
From the existing literature, it can be observed that the ANN-based techniques are
very popular in predicting solar radiation. However, some of the key observations
and research gaps experienced during the study have been summarized below.
• It has been observed that, to test, train, and validate solar radiation prediction models, long-term climatic parameter datasets are required; the higher the volume of data, the higher is the accuracy. However, such a huge volume of data is not easily available for most locations due to the expensiveness of the measuring instruments. In addition, the difficulty of accessing the measuring locations also puts severe constraints on building accurate and correct models.
• From the published literature, it can also be observed that the precise choice of
meteorological and geographical input parameters plays a key role in predicting
solar radiation with reliability and better accuracy.
• Sunshine hour and air temperature have been observed to be the most effective input parameters to predict solar radiation. However, a generalized study can be done to observe the influence of each parameter on the overall performance of ANN models.
Table 2 Comparative representation of the research on solar radiation prediction

Component | Authors | Techniques used | Input parameters | Performance indicators | Remarks
Global | Bilgili and Ozgoren [5] | MLR, MNLR, and ANN | Sunshine duration, air temperature, wind speed, solar radiation | Testing MAPE (%): MLR 16.55–92.41; MNLR 14.58–28.23; ANN 9.23–18.50 | ANN model is the most useful among all the models due to its low MAPE value
Global | Al-Shamisi et al. [6] | ANN (radial basis function, multilayer perceptron) | Maximum temperature, mean wind speed, sunshine, and mean relative humidity | MAPE = 35%; MBE = 0.307%; RMSE = 3.88%; R2 = 92% | In most cases, the RBF technique performed better than the MLP technique
Global/daily | Xue [7] | BPNN, genetic algorithm (GA), and particle swarm optimization (PSO) | Month of the year, sunshine duration, mean temperature, rainfall, wind speed, relative humidity, and daily global solar radiation | R = 0.934–0.953; RMSE = 0.78–0.932; MAE = 0.685–0.836 | BPNN optimized by PSO is better than BPNN and BPNN with GA
Global | Antonopoulos et al. [9] | Hargreaves method, ANN, multi-linear regression methods | Air temperature, radiation, humidity, and wind velocity | Correlation coefficient (r) = 0.891–0.94 | Hargreaves and the multi-linear regression model outperformed the ANN model
Global | Pang et al. [13] | ANN model, recurrent neural network (RNN) model, and deep learning algorithm | Solar radiation, dry bulb temperature, dew point temperature, wind speed, gust speed, and wind direction | ANN R2 = 0.933–0.974; RNN R2 = 0.97–0.983 | RNN model has the highest prediction accuracy in comparison with ANN
Global | Ozgoren et al. [14] | ANN, multi-nonlinear regression model | Latitude, longitude, altitude, month, monthly maximum atmospheric temperature, minimum atmospheric temperature, mean atmospheric temperature, soil temperature, wind speed, relative humidity, atmospheric pressure, rainfall, vapor pressure, cloudiness, and sunshine duration | MAPE = 2.14–8.07%; Correlation coefficient (r) = 0.9854–0.9990 | For the ANN model, the error values were within the acceptable limits
Surface | Li et al. [15] | Principal component analysis, wavelet transform analysis, and ANN | Daily surface solar radiation maps, clear sky index | RMSE = 30.78–30.98 W/m2 | A hybrid model has been proposed for future mapping prediction
• It has been experienced that different ANN models need to be developed using different geographical input parameters (mainly latitude, longitude, altitude, and extraterrestrial radiation) and checked for accuracy. This kind of study may be useful to predict the solar radiation of any location without dependency on solar radiation measurement instruments.
• It has also been observed that the performance indicators of various ANN models get altered by the geographical and meteorological variables, the training algorithm, and the ANN architecture configuration. Hence, a suitable selection of input parameters is highly significant for predicting solar radiation with better accuracy.
• Some hybrid ANN models can be studied to check for the improvement in
accuracy.
This paper offered a comprehensive review of the most recent works to predict
solar radiation using ANN. Sustainable utilization of renewable solar energy involves
a precise understanding of the available solar radiation and its variation with climatic
parameters. In this direction, the ANN models are found to be the right choice to accu-
rately predict solar radiation. The ANN models are preferred due to their high poten-
tial to simulate the nonlinear and time-variant input–output systems in comparison
with other classical and empirical models available. This study provides an updated
state-of-the-art review to support further research in this direction. Moreover, apart from a few studies on this important research topic, there is no existing methodological analysis to guide the selection of the most pertinent input variables for ANN models. Studies can be carried out in this direction to choose the right input parameter combinations for better accuracy of ANN models. A real-time solar radiation prediction system using ANN requires a higher computation cost, and there may be challenges with real-time training in case of sudden changes in meteorological data. Finally, it has also been observed that there is a paucity of research work in the direction of predicting diffuse and beam solar radiation using ANN.
References
1. S. Mekhilef, R. Saidur, A. Safari, A review on solar energy use in industries. Renew. Sustain.
Energy Rev. 15, 1777–1790 (2011)
2. J.C.R. Kumar, M.A. Majid, Renewable energy for sustainable development in India: current
status, future prospects, challenges, employment, and investment opportunities. Energy Sustain.
Soc. 10, 2 (2020)
3. https://mnre.gov.in/solar/current-status/. Accessed 15 July 2021
4. W. Yaïci, E. Entchev, Performance prediction of a solar thermal energy system using artificial
neural networks. Appl. Therm. Eng. 73(1), 1348–1359 (2014)
5. M. Bilgili, M. Ozgoren, Daily total global solar radiation modeling from several meteorological
data. Meteorol. Atmos. Phys. 112, 125–138 (2011)
6. M.H. Al-Shamisi, A.H. Assi, H.A.N. Hejase, Artificial neural networks for predicting global
solar radiation in Al Ain City—UAE. Int. J. Green Energy 10(5), 443–456 (2013)
7. X. Xue, Prediction of daily diffuse solar radiation using artificial neural networks. Int. J.
Hydrogen Energy 42(47), 28214–28221 (2017)
8. E. Federico Alsina, M. Bortolini, M. Gamberi, A. Regattieri, Artificial neural network opti-
misation for monthly average daily global solar radiation prediction. Energy Convers. Manag.
120, 320–329 (2016)
9. V.Z. Antonopoulos, D.M. Papamichail, V.G. Aschonitis, A.V. Antonopoulos, Solar radiation
estimation methods using ANN and empirical models. Comput. Electron. Agric. 160, 160–167
(2019)
10. Ü. Ağbulut, A.E. Gürel, Y. Biçen, Prediction of daily global solar radiation using different
machine learning algorithms: evaluation and comparison. Renew. Sustain. Energy Rev. 135,
110114 (2021)
11. J.M. Álvarez-Alvarado, J.G. Ríos-Moreno, S.A. Obregón-Biosca, G. Ronquillo-Lomelí, E.
Ventura-Ramos Jr., M. Trejo-Perea, Hybrid techniques to predict solar radiation using support
vector machine and search optimization algorithms: a review. Appl. Sci. 11, 1044 (2021)
12. M. Taki, A. Rohani, H. Yildizhan, Application of machine learning for solar radiation modeling.
Theor. Appl. Climatol. 143, 1599–1613 (2021)
13. Z. Pang, F. Niu, Z. O’Neill, Solar radiation prediction using recurrent neural network and
artificial neural network: a case study with comparisons. Renew. Energy 156, 279–289 (2020)
14. M. Ozgoren, M. Bilgili, B. Sahin, Estimation of global solar radiation using ANN over Turkey.
Expert Syst. Appl. 39(5), 5043–5051 (2012)
15. P. Li, M. Bessafi, B. Morel, J. Chabriat, M. Delsaut, Q. Li, Daily surface solar radiation
prediction mapping using artificial neural network: the case study of Reunion Island. ASME.
J. Sol. Energy Eng. 142(2), 021009 (2020)
16. S.-Y. Wang, J. Qiu, F.-F. Li, Hybrid decomposition-reconfiguration models for long-term solar
radiation prediction only using historical radiation records. Energies 11, 1376 (2018)
17. R. Meenal, A. Immanuel Selvakumar, Assessment of SVM, empirical and ANN based solar
radiation prediction models with most influencing input parameters. Renew. Energy 121, 324–
343 (2018)
18. Y. Feng, N. Cui, Q. Zhang, L. Zhao, D. Gong, Comparison of artificial intelligence and empirical
models for estimation of daily diffuse solar radiation in North China Plain. Int. J. Hydrogen
Energy 42(21), 14418–14428 (2017)
19. M. Laidi, S. Hanini, A. Rezrazi, M.R. Yaiche, A.A. El Hadj, F. Chellai, Supervised artificial
neural network-based method for conversion of solar radiation data (case study: Algeria). Theor.
Appl. Climatol. 128, 439–451 (2017)
20. M. Vakilia, S.-R. Sabbagh-Yazdi, K. Kalhorb, S. Khosrojerdi, Using artificial neural networks
for prediction of global solar radiation in Tehran considering particulate matter air pollution.
Energy Procedia 74, 1205–1212 (2015)
21. B. Ihya, A. Mechaqrane, R. Tadili, M.N. Bargach, Prediction of hourly and daily diffuse solar
fraction in the city of Fez (Morocco). Theor. Appl. Climatol. 120(3), 737–749 (2014)
22. A.K. Yadav, H. Malik, S.S. Chandel, Selection of most relevant input parameters using WEKA
for artificial neural network based solar radiation prediction models. Renew. Sustain. Energy
Rev. 31, 509–519 (2014)
23. Y.W. Kean, V. Karri, Comparative study in predicting the global solar radiation for Darwin,
Australia. ASME. J. Sol. Energy Eng. 134(3), 034501 (2012)
24. A. Mellit, A.M. Pavan, A 24-h forecast of solar irradiance using artificial neural network:
application for performance prediction of a grid-connected PV plant at Trieste, Italy. Sol.
Energy 84(5), 807–821 (2010)
25. M.A. Behrang, E. Assareh, A. Ghanbarzadeh, A.R. Noghrehabadi, The potential of different
artificial neural network (ANN) techniques in daily global solar radiation modeling based on
meteorological data. Sol. Energy 84, 1468–1480 (2010)
26. F.J. Diez, L.M. Navas-Gracia, L. Chico-Santamarta, A. Correa-Guimaraes, A. Martínez-
Rodríguez, Prediction of horizontal daily global solar irradiation using artificial neural networks
(ANNs) in the Castile and León region, Spain. Agronomy 10(96), 2–20 (2020). https://doi.org/
10.3390/agronomy10010096
27. H. Citakoglu, Comparison of artificial intelligence techniques via empirical equations for
prediction of solar radiation. Comput. Electron. Agric. 118, 28–37 (2015)
A Concise Review on Automatic Text
Summarization
Dishank Jani, Nehal Patel, Hemant Yadav, Sanket Suthar, and Sandip Patel
Abstract Today, data is one of the most important assets humanity has, but understanding the linguistics of such large volumes of data is not practically possible; text summarization has therefore been introduced as a problem in natural language processing (NLP). Text summarization is the technique of condensing a long text corpus such that the semantics of the text do not change. This paper provides a study of different text summarization methods up to Q3 2020. Text summarization methods are broadly classified as abstractive and extractive. In this paper, more focus is given to abstractive summarization; a review of most of the methods of text summarization to date is written concisely, along with the evaluations, advantages, and disadvantages of each method. At the end of the paper, the challenges faced by researchers in this task are mentioned, and the improvements that can be made to each summarization method are also presented in a structured way.
1 Introduction
NLP is among the most important research areas in machine learning. Text summa-
rization is an application of NLP among others. There are two types of text summa-
rization, abstractive and extractive summarization. In extractive summarization, the
sentences of the corpora are given a rank based on some techniques and the sentences
with the highest rank are selected. In abstractive summarization, the model has to
develop semantics of corpora and use its own ability to create its sentences with
help of various algorithms and techniques. The input structure used can be multi-
document or single document. In a single document, the length of the text is small, while in the case of multi-document input the length is unpredictable. Automatic text summarization was first introduced in 1958 by Luhn [24], who used a term-frequency approach to generate the model; it is the simplest model and consists of five steps. The classification of summarization methods is illustrated in Fig. 1. The first full-scale summarizer was built in 2003 by Alonso i Alemany and Fuentes Fort, based on the lexical chain concept. In 2007, Dipanjan et al. focused on three main aspects of a fine text summarization model.
Fig. 1 Classification of summarization methods: extractive summarization splits into supervised and unsupervised methods (including graph-based, latent semantic analysis (LSA), fuzzy logic, rank-based, and frequency-based approaches), while structured-based abstractive methods include ontology-based (with metrics such as EvaLaxon, OntoClean, and OntoMetric), rule-based, tree-based, multimodal, and semantic approaches
In today’s world, abstractive text summarization has earned lots of popularity due to
its ability to generate sentences. Abstractive text summarization methods are gener-
ally hard to understand and generate [1]. Most abstractive approaches are supervised.
There are mainly three approaches for abstractive summarization: structured-based, semantic-based (using NLP), and hybrid-based methods [2]. Figure 2 shows the
common flow of text summarization.
There are mainly two NLP architectures for abstractive text summarization, which hold about 90% of the research area, along with their combination:
• Pointer generator architecture
• Seq2Seq architecture [3]
• Combination of encoder–decoder with pointer generator network.
The use of long short-term memory (LSTM) or gated recurrent unit (GRU) networks in the model gives faster and more efficient results compared to a basic RNN. LSTM is used when accuracy is more important, while GRU is used when speed is more important. Today, abstractive text summarization using NLP approaches is considered one of the most effective methods and is widely adopted by researchers.
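As a minimal illustration of the LSTM-versus-GRU choice mentioned above, the following sketch (assuming PyTorch; the dimensions and the toy batch are illustrative) shows that an LSTM maintains a separate cell state while a GRU keeps only a hidden state:

import torch
import torch.nn as nn

batch, seq_len, emb_dim, hidden = 2, 10, 64, 128
x = torch.randn(batch, seq_len, emb_dim)            # toy embedded sentences

lstm = nn.LSTM(emb_dim, hidden, batch_first=True)   # favoured when accuracy matters
gru = nn.GRU(emb_dim, hidden, batch_first=True)     # fewer gates, faster to train

lstm_out, (h_n, c_n) = lstm(x)   # LSTM keeps a separate cell state c_n
gru_out, h_g = gru(x)            # GRU keeps only a hidden state
print(lstm_out.shape, gru_out.shape)  # both: (2, 10, 128)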
Fig. 2 Common flow of text summarization: load dataset → data cleaning (lemmatization and related methods) → create vocabulary → word embedding → text summarization model → training the model and evaluation
This method can be viewed in two categories, the generation-based approach [7] and the revision-based approach. The rule-based approach consists of six steps according to Pimpalshende and Mahajan [8]:
• Text preprocessing to classify text into different subfields based on its domain.
• Removing unnecessary and null values (decomposition).
• Sentence tokenization and feature vector generation by analysis of the generated subfields.
• Eliminating stop words.
• Generating a similarity matrix and phrases on the basis of the feature vectors.
• Selection of the most likely sentences based on the probabilities of context with respect to the target (attention mechanism).
Advantages:
• Powerful at linguistics.
• The most likely sentences appear at the start of the summary.
Limitation:
• Not efficient for multi-document text corpora.
The graph-based method can be applied to both single-document (Kumaresh and Ramakrishnan [9]) and multi-document [10] input corpora. The selection of proper sentences in a structured manner for extraction plays a very crucial role. Selection is done on the basis of ranking algorithms. Two types of ranking algorithms are available [11]: TextRank and the shortest path algorithm. The TextRank algorithm gives better accuracy compared to the shortest path algorithm because TextRank does not depend only on the vertices of a graph; it depends highly on the context, according to which it makes its predictions. In Yasunaga et al. [10], a GRU for the graph-based model is used, which is less accurate than a GRU with a similarity graph, while a GRU with a personalized discourse graph performs with the highest accuracy. Graph-based text summarization for both single- as well as multi-document input formats is proposed by Erkan and Radev [12]. The LexRank algorithm [12] for graph-based summarization is implemented on the DUC 2003 and 2004 datasets with an average ROUGE-1 (Recall-Oriented Understudy for Gisting Evaluation) score of 0.3963 (DUC 2003) and 0.3966 (DUC 2004) for continuous LexRank. Mihalcea and Tarau [13] have proposed a language-independent algorithm for graph-based text summarization, but this method is of extractive output type and can be applied only to single text document inputs.
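A minimal TextRank-style extractive summarizer can be sketched as below (assuming scikit-learn and networkx are available; the sentence splitting, the TF-IDF cosine similarity, and the toy document are simplifications, not the exact algorithms of [11–13]):

import networkx as nx
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

def textrank_summary(sentences, top_k=2):
    tfidf = TfidfVectorizer().fit_transform(sentences)   # sentence vectors
    sim = cosine_similarity(tfidf)                       # sentence similarity graph
    graph = nx.from_numpy_array(sim)
    scores = nx.pagerank(graph)                          # rank sentences by centrality
    ranked = sorted(scores, key=scores.get, reverse=True)[:top_k]
    return [sentences[i] for i in sorted(ranked)]        # keep original order

doc = ["Text summarization shortens a document.",
       "Extractive methods select existing sentences.",
       "Graph-based ranking scores each sentence.",
       "The highest ranked sentences form the summary."]
print(textrank_summary(doc))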
Data processing is done by Ajiambo et al. [14]: the whole corpus is divided into different phrases, and a decision tree is developed based on common words. Selection of the common phrases is done with searching algorithms such as beam search or, sometimes, a theme algorithm. In Gupta and Lehal [15], a nested-tree implementation of tree-based summarization is conveyed. Advantage: the summary is smaller with a higher ROUGE score compared to the traditional method. Limitation: the nested tree-based method [16] is limited to single-document input only; it can be extended to multi-document input by the addition of an RST parser.
Sahoo et al. [19] mainly focus on hybrid summarization of single-document input, using the concept of sentence compression. According to them, hybrid summarization consists of five steps: text preprocessing, ranking sentences based on features, applying a rule-based approach, compressing the generated sentences, and finally selecting sentences using inference. ROUGE evaluation on the DUC 2002 dataset shows a score of 0.5239 (average recall for R-1) and 0.4691 (average precision for R-1). Advantage: 71% accurate. Disadvantages: the quality of the summary depends on the quality of the compression method, and this approach is not as powerful as NLP approaches such as encoder–decoder and pointer generator architectures.
According to Aksoy et al. [20], there are four stages in the semantic-based approach. Pronominal resolution understands the semantics of a word and categorizes it based on its features; thus, by pronominal resolution, we can handle null and unknown words. With the help of part-of-speech (POS) tagging, identification of a word as a noun, pronoun, or adjective is carried out. After POS tagging, the text preprocessing is done. Figure 3 shows the semantic-based text summarization flow.
Semantic role labeling (SRL) [21] is used to obtain the relationship between each word (target) when the context is known. WordNet is used to select the most likely sentence from the vocabulary. The semantic-based method is further categorized into three approaches. The multimodal document approach is made solely to handle multi-document input corpora precisely [22]; it produces a symmetrical abstract summary in salient as well as graphical form. The limitation of this model is that it is evaluated manually. Another semantic-based method is the information-based approach [23], in which extraction of a graph-based vocabulary is performed. The summary produced is 78% smaller than the input corpus, with only a 3% loss. The disadvantage of this method is that it faces issues in understanding the semantics of sentences; thus, the quality of linguistics is poor. The semantic graph method is an extension of the information-based approach but uses the context of the corpus to predict the output. The graph produced by this approach is called a rich semantic graph (RSG). The summary predicted is small and productive. The only limitation of the semantic graph-based approach is that it works only on single-document input corpora. In all these techniques, the semantic representation of the corpus is pushed into the model, which then performs the above-mentioned four stages in a structured and effective manner.
Extractive text summarization involves the selection of important texts from the input corpus and concatenating them for the output summary. Selection is done based on rank, vital keywords, and phrases in the sentences [24]. For the ranking of sentences, various parameters are considered, such as title words, content word features, cue phrase features, biased word features, and upper-case words [25]. Extractive text summarization methods are of two types, unsupervised and supervised methods. Unsupervised learning methods include the graph-based method, latent semantic analysis (LSA), the fuzzy logic approach, and the concept-based approach. Supervised learning methods include summarization using Bayes' rule, the neural network approach (using NLP), and conditional random fields (CRF). In this paper, we mainly focus on supervised abstractive text summarization approaches and unsupervised extractive text summarization approaches. Extractive text summarization includes four steps [26]:
• Text preprocessing
• Removal of stop words
• Stemming
• POS tagging.
The advantage of extractive text summarization is that it is easy to generate and understand. There are rarely any grammatical errors in extractive text summarization compared to abstractive methods, as it consists of the same phrases and keywords as the original input corpus. Major limitations of these methods include:
• Lack of understanding of the semantics of sentences.
• Inefficiency in the analysis of the input corpus.
• Important keywords and phrases can sometimes be missed.
• Sentences at times become meaningless.
• Selected sentences are generally longer than other sentences; thus, unnecessary phrases may sometimes be included.
In this method, training on data is not necessary. This machine learning technique is not as accurate as supervised learning techniques. Its main goal is to select words from the corpus, assemble them in a structured way, and place them in the summary with the help of clustering. According to Aggarwal and Gupta [21], clustering analysis is the process of assembling the text in a structured way to predict the correct summary. There are four basic clustering algorithms for organizing the text: K-means, fuzzy C-means, hierarchical clustering, and a mixture of Gaussians.
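As an illustration of clustering-based extractive selection, the following sketch (assuming scikit-learn; the sentences and the cluster count are illustrative choices) picks one representative sentence per K-means cluster:

import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

sentences = ["Clustering groups similar sentences.",
             "K-means is a basic clustering algorithm.",
             "A summary keeps one sentence per cluster.",
             "Fuzzy C-means allows soft membership."]
X = TfidfVectorizer().fit_transform(sentences)
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)

summary = []
for c in range(km.n_clusters):
    idx = np.where(km.labels_ == c)[0]
    # pick the sentence closest to the cluster centroid as its representative
    centre = km.cluster_centers_[c]
    best = idx[np.argmin(np.linalg.norm(X[idx].toarray() - centre, axis=1))]
    summary.append(sentences[best])
print(summary)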
6 Evaluation Methods
Evaluation methods are of two types [28], human evaluation and automatic evaluation. Automatic evaluation is performed based on algorithms such as the Bleu score [29] or the ROUGE score. The Bleu score is generally recommended for machine translation models, while for text summarization the ROUGE score is preferable. The ROUGE (Recall-Oriented Understudy for Gisting Evaluation) score method is generally preferred for the evaluation of text summarization; it was proposed by Lin [31]. According to Lin, the ROUGE score is calculated on the basis of the relationship between the predicted summary and the original summary: the greater the difference, the lower the score. Different variations of the ROUGE score are available for evaluation [22], for instance, N-gram Co-Occurrence Statistics (ROUGE-N), Longest Common Subsequence (ROUGE-L), Weighted Longest Common Subsequence (ROUGE-W), and Skip-Bigram Co-Occurrence Statistics (ROUGE-S). The comparison between different evaluation methods is shown in Fig. 4 [31].
ROUGE-N (N-gram Co-Occurrence Statistics): This variation is similar to the Bleu score in machine translation [32]. An N greater than 1 indicates that the summary is coherent; we generally consider values of N between 1 and 9. ROUGE-1 and ROUGE-2 are the most commonly used variations.
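For illustration, ROUGE-1 recall, precision, and F1 can be computed from clipped unigram overlaps as in the following sketch (a simplification; practical evaluations use dedicated packages with stemming and stop-word handling):

from collections import Counter

def rouge_1(candidate, reference):
    cand = Counter(candidate.lower().split())
    ref = Counter(reference.lower().split())
    overlap = sum((cand & ref).values())          # clipped unigram matches
    recall = overlap / max(sum(ref.values()), 1)
    precision = overlap / max(sum(cand.values()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return recall, precision, f1

print(rouge_1("the cat sat on the mat", "the cat lay on the mat"))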
ROUGE-L (Longest Common Subsequence): The LCS is the common subsequence of two sentences with maximum length. The score is predicted by calculating the LCS of the original and predicted summaries and deriving a similarity index between them, with the intuition that the longer the common subsequence between the two sequences of text, the better the similarity. The subsequence level can be either sentence level or summary level. This is the most effective ROUGE evaluation technique if the predicted LCS is efficient.
ROUGE-W (Weighted Longest Common Subsequence): The normal LCS does not work efficiently, as it is unable to distinguish spatial relations. To overcome this, a weighting component is appended to the traditional LCS, referred to as WLCS (weighted LCS). The extra weight 'k' is in the form of a polynomial function. ROUGE-W outperforms ROUGE-L for such spatial relations.
ROUGE-S (Skip-Bigram Co-Occurrence Statistics): A skip-bigram is a pair of words allowing gaps in between. ROUGE-S compares the matching skip-bigrams and evaluates them on the basis of the difference. ROUGE-S performs better than ROUGE-L. In ROUGE-S, the importance of the candidate score is not considered; to overcome this, ROUGE-SU is introduced. We can obtain the ROUGE-SU score by adding the <SOS> tag at the beginning of a sentence.
Although human evaluation methods perform similarly to the Bleu score (for machine translation), their correlation is lower compared to ROUGE evaluation; a 90% correlation can be achieved with the assistance of the ROUGE evaluation approach. On the evaluation of the DUC 2001 dataset, it was found that ROUGE-W performs significantly better than ROUGE-L and ROUGE-SU. Table 1 presents a summarized view of all the methods along with their summary type, reference papers, dataset, and efficiency using the ROUGE score.
Table 1 Comparison of various summarization techniques

Approach | Single document | Multi-document | Output summary type | Available languages | Dataset used for evaluation | ROUGE evaluation efficiency
Ontology-based [17] | Yes (domain specific) | No | Extractive | English | DUC 2002 | 0.5058 (R-1 F2)
Ontology-based [18] | Yes | No | Abstractive | English | DUC 2002 | 0.4636 (R-1)
Rule-based [8] | Yes | No | Abstractive | English, Hindi | DUC 2006 | 0.44151 (R-1)
Graph-based [11] | No | Yes | Abstractive and extractive | English | DUC 2004 | 0.393 (R-1)
Semantic-based [20] | Yes | No | Abstractive and extractive | English, Hindi | DUC 2006 | 0.39017
SumBasic [33] | No | Yes | Extractive | English | DUC 2005 | 0.26054
LSA [34] | No | Yes | Extractive | English, Hindi | DUC 2004 | 0.289 (ROUGE-L)
Fuzzy logic implementation [35] | Yes | Yes | Extractive | English, Hindi | DUC 2003 | 0.6257
Tree-based [36] | Yes | Yes | Abstractive | English | DUC 2004 | 0.354 (R-1)
Term-frequency [37] | Yes | No | Extractive | English | DUC 2005 | 0.672
LexRank [12] | Yes | Yes | Extractive | English | DUC 2003 and DUC 2004 | 0.3963 (DUC 2003) and 0.3966 (DUC 2004) for R-1
Lead and body [38] | Yes | No | Extractive | English | DUC 2005 | 0.22176 (ROUGE-2) for FreqDist-Term_Dice
Hybrid [39] | Yes | No | Abstractive | English | DUC 2002 | 0.464 (precision), 0.508 (recall), 0.485 (F-score) w.r.t. OPIOSIS reference 10%
Feature-based summarization [40] | Yes | No | Extractive | English | DUC 2004 | 0.6629 (R-1)
7 Conclusion
In this paper, we have studied and analyzed various research papers on different extractive and abstractive text summarization methods. In abstractive text summarization, we studied three main methods: structured, semantic, and hybrid text summarization. In structured abstractive summarization, there are four variations available: ontology-based, graph-based, rule-based, and tree-based. Apart from that, we observed different evaluation methods such as human evaluation, the Bleu score, and ROUGE score evaluation. The different types of datasets available for English and Hindi text, along with their classification, are also mentioned above in a concise manner. For all the above text summarization methods, some problems are not solved yet. Rule-based text summarization performs better than graph-based, but it performs poorly on multi-document input corpora, while graph-based efficiently summarizes multi-document input but the generated summary can go out of order. Similarly, LSA performs better, but it is unable to handle word embeddings in a precise way. Overall, summaries generated using NLP ideas result in better handling of words even with the abstractive approach. The Seq2Seq approach using encoder–decoder architecture produces a less concise summary than the pointer generator architecture, while the summary produced by the pointer generator architecture is more extractive oriented. Encoder–decoder architecture with a pointer generator switch is a better alternative.
References
11. K.S. Thakkar, R.V. Dharaskar, M.B. Chandak, Graph-based algorithms for text summarization,
in 2010 3rd International Conference on Emerging Trends in Engineering and Technology
(IEEE, 2010), pp. 516–519
12. G. Erkan, D.R. Radev, LexRank: graph-based lexical centrality as salience in text summariza-
tion. J. Artif. Intell. Res. 22, 457–479 (2004)
13. R. Mihalcea, P. Tarau, A language independent algorithm for single and multiple docu-
ment summarization, in Companion Volume to the Proceedings of Conference Including
Posters/Demos and Tutorial Abstracts (2005)
14. F. Ajiambo, C. Nzila, S. Namango, B. Deshmukh Ashvini, P. Shelke Pooja, A. Kokare Sayali,
S. Taware Saksha et al., Int. Res. J. Eng. Technol. (IRJET) 4(03) (2017)
15. V. Gupta, G.S. Lehal, A survey of text summarization extractive techniques. J. Emerg. Technol.
Web Intell. 2(3), 258–268 (2010)
16. M.S. Binwahlan, N. Salim, L. Suanmali, Swarm diversity based text summarization, in Inter-
national Conference on Neural Information Processing (Springer, Berlin, Heidelberg, 2009),
pp. 216–225
17. L. Hennig, W. Umbrath, R. Wetzker, An ontology-based approach to text summarization, in
2008 IEEE/WIC/ACM International Conference on Web Intelligence and Intelligent Agent
Technology, vol. 3 (IEEE, 2008), pp. 291–294
18. M.J. Mohan, C. Sunitha, A. Ganesh, A. Jaya, A study on ontology based abstractive
summarization. Procedia Comput. Sci. 87, 32–37 (2016)
19. D. Sahoo, A. Bhoi, R.C. Balabantaray, Hybrid approach to abstractive summarization. Procedia
Comput. Sci. 132, 1228–1237 (2018)
20. C. Aksoy, A. Bugdayci, T. Gur, I. Uysal, F. Can, Semantic argument frequency-based multi-
document summarization, in 2009 24th International Symposium on Computer and Information
Sciences (IEEE, 2009), pp. 460–464
21. R. Aggarwal, L. Gupta, Automatic text summarization. Int. J. Comput. Sci. Mob. Comput.
6(6), 158–167 (2017)
22. C. Greenbacker, Towards a framework for abstractive summarization of multimodal documents,
in Proceedings of the ACL 2011 Student Session (2011), pp. 75–80
23. D. Mallett, J. Elding, M.A. Nascimento, Information-content based sentence extraction for
text summarization, in International Conference on Information Technology: Coding and
Computing, 2004. Proceedings. ITCC 2004, vol. 2 (IEEE, 2004), pp. 214–218
24. H.P. Luhn, The automatic creation of literature abstracts. IBM J. Res. Dev. 2(2), 159–165
(1958)
25. N. Moratanch, S. Chitrakala, A survey on extractive text summarization, in 2017 International
Conference on Computer, Communication and Signal Processing (ICCCSP) (IEEE, 2017),
pp. 1–6
26. A. El-Refaey, A.R. Abas, I. Elhenawy, Review of recent techniques for extractive text
summarization. J. Theor. Appl. Inf. Technol. 96(23), 7739–775 (2018)
27. A.P. Widyassari, S. Rustad, G.F. Shidik, E. Noersasongko, A. Syukur, A. Affandy, Review of
automatic text summarization techniques & methods. Journal of King Saud Univ. Comput. Inf.
Sci. (2020). https://doi.org/10.1016/j.jksuci.2020.05.006
28. M. Allahyari, S. Pouriyeh, M. Assefi, S. Safaei, E.D. Trippe, J.B. Gutierrez, K. Kochut, Text
summarization techniques: a brief survey. arXiv preprint arXiv:1707.02268 (2017)
29. K. Papineni, S. Roukos, T. Ward, W.-J. Zhu, BLEU: a method for automatic evaluation
of machine translation, in Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (2002), pp. 311–318
30. P. Johri, A. Kumar, Review paper on text and audio steganography using GA, in International
Conference on Computing, Communication & Automation (IEEE, 2015), pp. 190–192
31. C.-Y. Lin, Rouge: a package for automatic evaluation of summaries, in Text Summarization
Branches Out (2004), pp. 74–81
32. K.V. Kumar, D. Yadav, An improvised extractive approach to Hindi text summarization, in
Information Systems Design and Intelligent Applications (Springer, New Delhi, 2015), pp. 291–
300
Abstract Heart disease, often known as cardiovascular disease, is one of the most lethal yet silent killers of humans, resulting in a rise in the mortality rate of sufferers per year. Every year, it kills nearly 17 million people worldwide through myocardial infarctions and cardiac attacks. Heart failure (HF) occurs when the heart cannot pump enough blood to satisfy the body's needs. On the other hand, current risk prediction techniques are only moderately effective, because statistical analytic approaches fail to capture prognostic information in big data sets with multi-dimensional interactions. The research investigates the proposed AdaBoost ensemble technique with the Synthetic Minority Oversampling Technique (SMOTE) on the medical reports of 299 heart failure patients obtained during their follow-up period at the Faisalabad Institute of Cardiology (Punjab) and Allied Hospital Faisalabad (Pakistan) during April–December 2015. The proposed approach builds on ensemble learning techniques such as adaptive boosting. It provides a decision support mechanism for medical practitioners to identify and forecast heart diseases in humans based on risk factors for heart disease. The efficacy of the proposed method is validated by comparing various machine learning algorithms, and it is evident that the proposed method performs better, with an accuracy of 96.34%.
1 Introduction
Heart issues are currently a major source of worry in the medical community. The
heart plays a vital role among the human body's organs: a lack of blood circulation in the body leads to heartbeat disability and can cause death within minutes. Major risk
factors of heart disease are unhealthy blood cholesterols, usage of tobacco, alcohol,
diabetes mellitus, obesity, eating high saturated fats, age, family history, lack of
physical exercise, and poor diet [1]. As per the World Health Organization (WHO)
reports, due to coronary artery disease (CAD), nearly, 17.5 million people are losing
their life [2]. Various types of heart diseases are arrhythmia (occurs due to heartbeat
abnormality), atherosclerosis (this condition leads to limits of oxygen flow to the
organs, which causes heart stroke), hypertensive heart disease (leads to a thickness
of heart muscles and heart failure), coronary artery disease (also called as ischemic
heart disease), congenital heart defects, pulmonary valve stenosis (this condition
arises before birth), heart infections (bacteria or viruses cause this condition). Some
of the symptoms of heart disease problems are chest pain, breathlessness, fatigue,
stomach pain, sweating, irregular heartbeat, arm or leg pain, depression, and swollen
ankles [3–5].
In recent times, providing the best quality services and effective diagnosis is chal-
lenging in the medical field. Severe heart disease is a leading cause of sudden death. Heart disease, however, can be efficiently detected in its
early stages and treated, controlled, and managed. An electrocardiogram (ECG) is a
tool that examines the heartbeat rate and shows the possible functioning and irreg-
ularities of the heartbeat. Several clinicians are still unable to address the precise
needs of heart disease patients. However, it is essential to define an accurate diag-
nosis system to avoid problems of heart disease. Therefore, it is necessary to develop
a diagnostic plan based on ECG data and machine learning methods to classify heart
diseases and detect the problems in the cardiovascular system. Machine learning
is mainly used in the medical field to effectively diagnose, detect, and predict
various diseases. Machine learning (ML) is progressively used to predict various
heart diseases, and it is a subset of artificial intelligence (AI). Nowadays, it is crit-
ical to be able to sense the input data and decide the given task in the absence
of human intervention. Machine learning works based on the models by receiving
input data and applying mathematical or statistical models to predict the outputs.
Various ML algorithms are utilized for daily activities in different domains, especially in the healthcare domain, where more research is being conducted to forecast the severity level of diseases [6–9]. An ensemble learning model is formed by combining individual machine learning algorithms; it provides higher accuracy and addresses the issues that machine learning algorithms face, such as time-consuming data collection, error-prone methods, and selecting the correct algorithm.
The major contributions in the article include:
• The ensemble learning method adaptive boosting has been proposed for the identification of heart disease in its early stages.
• The class imbalance issue in the data has been addressed by the oversampling
technique SMOTE.
• The proposed method performance is compared with the different ensemble and
ML models such as bagging, stacking, K-nearest neighbor (KNN), multi-layer
perceptron (MLP), linear discriminant analysis (LDA), quadratic discriminant
analysis (QDA), decision tree (DT), logistic regression (LR), and Gaussian Naive
Bayes (GNB).
The remainder of the paper is divided into five sections. Section 2 discusses the literature on the prediction of heart disease; Sect. 3 describes the proposed approach. Section 4 depicts the experimental setup for the proposed technique, including the empirical data, data preprocessing, simulation environment, and parameter settings, as well as the compared methodologies and the performance measures taken into consideration to validate the suggested method. Section 5 analyzes the results, and finally Sect. 6 concludes the paper.
2 Literature Study
Table 1 (continued)

S. No. | Data set (source) | Method | Performance | Evaluation factor | References
7 | Enterprise data warehouse, research patient data repository | Logistic regression, gradient boosting, maxout networks, deep unified networks, cost-saving evaluation—connected cardiac care program (CCCP) | Deep unified networks—76.4% (accuracy) | Accuracy | [17]
8 | 1106 heart failure patient records (MADIT-CRT) | Multiple kernel learning, K-means clustering | Multiple kernel learning—95%, K-means clustering—95% (CI) | CI, hazard ratio (HR) | [18]
9 | Allied Hospital Faisalabad-Pakistan and Institute of Cardiology, Apr–Dec 2015 data | Cox regression, Kaplan–Meier plot, Martingale residuals, bootstrapping | Cox regression—81% (discrimination ability) | Calibration slope, ROC curve, discrimination ability | [19]
10 | Random clinical data | Multistep modeling strategy, EMR-wide predictive model | EMR-wide predictive model—0.78 (AUC), 83.19% (accuracy) | AUC, accuracy | [20]
3 Proposed Method
Freund et al. [21] presented an ensemble learning technique called adaptive boosting (AdaBoost). Base learner classifiers are built based on a distribution of weights over the dataset, where the predictions of previous base learners determine the weights of the instances in the dataset. If a prediction on an instance causes misclassification, the instance weight is increased for the next model; otherwise, the weight remains unchanged. The final decision is made by a weighted vote of the base learners, with weights based on the models' misclassification rates. Usually, decision trees are used as base learners in AdaBoost, where a model with a high prediction accuracy receives a high weight, and a model with a low prediction accuracy receives a low weight.
AdaBoost Algorithm
1: Initialize the weights $w_p = \frac{1}{P}$ for each of the $P$ instances in the data
2: While $q < Q$ do (where $Q$ is the number of models to be grown):
2.1 A model is constructed for all the data points with hypothesis $H_q(x_p)$, where $x_p$ corresponds to the dataset and $y_p$ to the corresponding labels
2.2 Compute the error $e_q$ for the training set $S$, summing over all data points $x_p$, using Eq. 1:
$$e_q = \frac{\sum_{p=1}^{S} w_p^{(q)} \, G\big(y_p \neq H_q(x_p)\big)}{\sum_{p=1}^{S} w_p^{(q)}} \quad (1)$$
where $G(\cdot)$ equals 1 when its argument is true and 0 otherwise
2.3 Compute the model weight $m$ as shown in Eq. 2:
$$m = \log\left(\frac{1 - e_q}{e_q}\right) \quad (2)$$
2.4 Update the weights of the training set $S$ for the following $(q+1)$th model as shown in Eq. 3:
$$w_p^{(q+1)} = w_p^{(q)} \exp\big(m \, G\big(y_p \neq H_q(x_p)\big)\big) \quad (3)$$
3: Continue for $Q$ iterations and compute the final output using Eq. 4:
$$f(x) = \mathrm{sign}\left(\sum_{q=1}^{Q} m_q \, H_q(x_p)\right) \quad (4)$$
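A compact sketch of the boosting procedure above using scikit-learn's AdaBoostClassifier is given below (not the authors' exact pipeline; the synthetic data merely stands in for the clinical records described in Sect. 4, and by default the base learner is a depth-1 decision tree, i.e., a decision stump):

from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split

# 299 synthetic samples mimic the size of the clinical dataset (12 predictors)
X, y = make_classification(n_samples=299, n_features=12, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

# each boosting round reweights misclassified instances, as in Eqs. 1-3
ada = AdaBoostClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
print("test accuracy:", ada.score(X_te, y_te))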
4 Experimental Setup
This section addresses the dataset, the data processing followed in the experiment, the simulation environment, the parameter settings of the proposed method and the various classifiers, and the performance measures used to verify the performance of the proposed method against the comparable techniques.
The clinical heart dataset considered for experimentation comprises the clinical medical history of 299 heart patients collected from Allied Hospital in Faisalabad (Punjab, Pakistan) and the Faisalabad Institute of Cardiology during April–December 2015 [11]. Of the 299 patients, 105 are women and 194 are men, ranging from 40 to 95 years of age. The dataset has 13 attributes covering essential features, clinical features, body features, and lifestyle features for each patient; the detailed features, types, and descriptions of each feature are presented in Table 2. The dataset contains Boolean features such as high blood pressure, anemia, diabetes, smoking, and sex. 'Creatinine phosphokinase' (CPK) reflects the level of the CPK enzyme in the blood: when muscle tissue is damaged, CPK is released into the bloodstream, and CPK levels that are too high in a patient's blood can lead to heart failure. The 'ejection fraction' indicates how much blood the left ventricle pumps out as a proportion of each contraction. 'Platelets' is the count of platelets in the blood. 'Serum creatinine' is a waste product produced by creatine when a muscle breaks down; doctors use serum creatinine in the blood to monitor the functioning of the kidneys. Sodium is a mineral that helps nerves and muscles function correctly, and the 'serum sodium' test is a common blood test that determines whether a patient's blood sodium level is normal. The target attribute in the proposed work is 'death event', which indicates whether the patient died or survived before the conclusion of the follow-up period, which is 130 days on average.
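A brief sketch of loading and inspecting such a dataset with pandas follows; the file name and the column name 'DEATH_EVENT' are assumptions based on the public release of this dataset, not details taken from the paper:

import pandas as pd

df = pd.read_csv("heart_failure_clinical_records_dataset.csv")  # assumed filename
print(df.shape)                          # expected: (299, 13)
print(df.isnull().sum().sum())           # the dataset has no missing values
print(df["DEATH_EVENT"].value_counts())  # class imbalance: 203 vs 96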
Data preprocessing is a critical activity that increases the quality of the raw experimental data. It is a preliminary stage that takes all the data, sorts it, organizes it, and merges it. Data preprocessing can also significantly impact the generalization performance of a supervised machine learning algorithm. Null values in the dataset were verified, and there are no missing or null values. The dependent variable 'death_event' is highly imbalanced, with 203 instances of '0' and 96 of '1', as presented in Fig. 1. The synthetic minority oversampling technique (SMOTE) is used in the experimentation to resolve the class imbalance. Class imbalance can be addressed by over-sampling the minority class or under-sampling the majority class; in this article, over-sampling of the minority class is used. Before feeding the data to the classification model, SMOTE was applied to obtain better accuracy: it generates synthetic minority-class instances from the existing minority examples until the classes are balanced. After using SMOTE, the 'death_event' class consists of 203 instances of '0' and 203 of '1', as presented in Fig. 2.
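A minimal sketch of SMOTE oversampling is shown below (assuming the imbalanced-learn package; the synthetic data imitates the roughly 203:96 imbalance rather than using the actual clinical records):

from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

X, y = make_classification(n_samples=299, weights=[0.68, 0.32], random_state=0)
print("before:", Counter(y))              # imbalanced classes, as in Fig. 1
X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)
print("after:", Counter(y_res))           # balanced classes, as in Fig. 2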
The proposed adaptive boosting classifier predicts the survival of heart disease patients from the clinical records of several patients. For validating the performance, the proposed method is compared with various machine learning and ensemble learning algorithms on different performance metrics such as the confusion matrix, true-positive rate, false-positive rate, precision, F1-score, accuracy, and ROC-AUC (area under the receiver operating characteristic curve) [22].
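These metrics can be obtained with scikit-learn as in the following sketch; the label and score vectors are illustrative placeholders for actual model outputs:

from sklearn.metrics import (accuracy_score, confusion_matrix, f1_score,
                             precision_score, recall_score, roc_auc_score)

y_true = [0, 0, 1, 1, 1, 0, 1, 0]                   # ground-truth labels
y_pred = [0, 0, 1, 1, 0, 0, 1, 0]                   # hard predictions
y_score = [0.1, 0.2, 0.9, 0.8, 0.4, 0.3, 0.7, 0.2]  # predicted probabilities

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
print("TP, FP, TN, FN:", tp, fp, tn, fn)
print("precision:", precision_score(y_true, y_pred))
print("recall (TPR):", recall_score(y_true, y_pred))
print("F1:", f1_score(y_true, y_pred))
print("accuracy:", accuracy_score(y_true, y_pred))
print("ROC-AUC:", roc_auc_score(y_true, y_score))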
5 Result Analysis
This section describes the prediction of heart failure survival using the AdaBoost ensemble technique and validates the proposed method against various ensemble learning and machine learning techniques. The experimental results for the measures mentioned above, such as true positive (TP), true negative (TN), false positive (FP), false negative (FN), precision, true-positive rate (TPR), F1, ROC-AUC, and false-positive rate (FPR), are presented in Table 4. Compared to other techniques
such as stacking, bagging, logistic regression, decision tree, linear discriminant anal-
ysis, quadratic discriminant analysis, multi-layer perceptron, K-nearest neighbor, and
Gaussian Naive Bayes, the results obtained by the proposed AdaBoost technique are
clearly better. Among all, k-nearest neighbor performs the worst in relative efficiency, while AdaBoost performs the best.
From the assessment of the findings, ensemble learning methods obtained an accu-
racy between 96 and 90%, and machine learning algorithms achieved between 89 and
63%. In detail, the AdaBoost classifier achieved better accuracy of 96.34, followed
by stacking and bagging, with 93.9% and 90.24%, respectively. Decision tree with
89.02%, logistic regression, and linear discriminant analysis obtained the same accu-
racy 86.59%; quadratic discriminant analysis, Gaussian Naïve Bayes, multi-layer
perceptron produce 82.93%, 81.71%, 70.73% accuracies, respectively, and finally,
k-nearest neighbor obtained 63.4% accuracy. True positives and true negatives signify correctly classified instances, while false positives and false negatives represent incorrectly classified instances. For the proposed AdaBoost classifier, TP and TN are 36 and 43, indicating that 36 patients are healthy and predicted as healthy, and 43 patients are unwell and predicted as sick. FP and FN are 0 and 3, signifying that all patients identified as sick are indeed unwell, and 3 patients are ill but predicted as healthy. In the case of recall, stacking and decision tree produced the highest value
with 0.95, followed by AdaBoost and multi-layer perceptron obtained 0.92; bagging
achieved 0.87; LDA and LR got 0.85, Gaussian Naïve Bayes and QDA with 0.79,
and k-nearest neighbor attained a value of 0.67. In the case of FPR value, AdaBoost
obtained 0, followed by bagging, stacking, LDA, LR, QDA, DT, GNB, KNN, and
MLP have 0.07, 0.07, 0.12, 0.12, 0.14, 0.16, 0.16, 0.40, 0.49, respectively. For preci-
sion, AdaBoost delivers a value of 1.00, followed by bagging, stacking, LDA, LR,
DT, QDA, GNB, MLP, KNN produce 0.93, 0.92, 0.87, 0.87, 0.84, 0.84, 0.82, 0.63,
0.60 values, respectively. AdaBoost obtained the highest ROC-AUC value 0.96, then
bagging, stacking, decision tree produced 0.94, 0.90, 0.89, LR, LDA, QDA, GNB,
MLP, KNN obtained the values 0.86, 0.86, 0.83, 0.82, 0.72, and 0.64, respectively.
The proposed method outperformed compared to various ensemble learning and
machine learning models by considering all the results.
Figure 3a–j presents the precision-recall curves for the machine learning techniques, the proposed AdaBoost classifier, and the other ensemble learning techniques. AdaBoost and stacking obtained the highest average precision value of 0.99, followed by bagging with 0.98; LDA, LR, GNB, MLP, DT, and KNN obtained average precisions of 0.93, 0.92, 0.92, 0.90, 0.82, and 0.72, respectively. The results show that AdaBoost performs very well in terms of average precision.
Table 4 Performance evaluation of proposed and comparative methods

Intelligent technique | Accuracy | TP | FP | TN | FN | TPR (recall) | FPR | Precision | TNR | F1 | ROC-AUC
AdaBoost | 96.34 | 36 | 0 | 43 | 3 | 0.92 | 0.00 | 1.00 | 1.00 | 0.96 | 0.96
Stacking | 93.90 | 37 | 3 | 40 | 2 | 0.95 | 0.07 | 0.93 | 0.93 | 0.94 | 0.94
Bagging | 90.24 | 34 | 3 | 40 | 5 | 0.87 | 0.07 | 0.92 | 0.93 | 0.89 | 0.90
Decision tree | 89.02 | 37 | 7 | 36 | 2 | 0.95 | 0.16 | 0.84 | 0.84 | 0.89 | 0.89
Logistic regression | 86.59 | 33 | 5 | 38 | 6 | 0.85 | 0.12 | 0.87 | 0.88 | 0.86 | 0.86
Linear discriminant analysis | 86.59 | 33 | 5 | 38 | 6 | 0.85 | 0.12 | 0.87 | 0.88 | 0.86 | 0.86
Quadratic discriminant analysis | 82.93 | 31 | 6 | 37 | 8 | 0.79 | 0.14 | 0.84 | 0.86 | 0.82 | 0.83
Gaussian Naïve Bayes | 81.71 | 31 | 7 | 36 | 8 | 0.79 | 0.16 | 0.82 | 0.84 | 0.81 | 0.82
Multi-layer perceptron | 70.73 | 36 | 21 | 22 | 3 | 0.92 | 0.49 | 0.63 | 0.51 | 0.75 | 0.72
K-nearest neighbor | 63.41 | 26 | 17 | 26 | 13 | 0.67 | 0.40 | 0.60 | 0.60 | 0.63 | 0.64
Fig. 3 Precision-recall curves of a AdaBoost, b stacking, c bagging, d DT, e LR, f LDA, g QDA,
h GNB, i MLP, j KNN
The prediction error is defined as the variation between predicted and actual values. For the proposed method, 3 instances from class '0' were predicted as class '1'; these are the incorrectly classified instances in the test data. The class prediction error for the proposed technique, the other machine learning techniques, and the ensemble learning techniques is shown in Fig. 4.
The accuracy of the proposed technique and various comparative models is
presented in Fig. 5. The proposed adaptive boosting performed better than all
methods.
Fig. 4 Class prediction error for the various comparative methods and the proposed technique

Fig. 5 Accuracy of the various comparative methods and the proposed method

6 Conclusion
In the future, the proposed approach can be validated on a larger number of patient records, and the work can be extended using sophisticated deep learning techniques on image data.
References
1. Y. Xing, J. Wang, Z. Zhao, Y. Gao, Combination data mining methods with new medical data
to predicting outcome of coronary heart disease (2008), pp. 868–872. https://doi.org/10.1109/
iccit.2007.204
2. J. Mackay, G. Mensah, The Atlas of Heart Disease and Stroke (World Health Organization, Geneva, 2004)
3. A.A. Ariyo et al., Depressive symptoms and risks of coronary heart disease and mortality in
elderly Americans. Circulation 102(15), 1773–1779 (2000). https://doi.org/10.1161/01.CIR.
102.15.1773
4. M.A. Whooley et al., Depressive symptoms, health behaviors, and risk of cardiovascular events
in patients with coronary heart disease. JAMA J. Am. Med. Assoc. 300(20), 2379–2388 (2008).
https://doi.org/10.1001/jama.2008.711
5. L.R. Wulsin, J.C. Evans, R.S. Vasan, J.M. Murabito, M. Kelly-Hayes, E.J. Benjamin, Depres-
sive symptoms, coronary heart disease, and overall mortality in the Framingham Heart
Study. Psychosom. Med. 67(5), 697–702 (2005). https://doi.org/10.1097/01.psy.0000181274.
56785.28
6. A. Singh, R. Kumar, Heart disease prediction using machine learning algorithms, in 2020
International Conference on Electrical and Electronics Engineering (ICE3), Feb 2020, pp. 452–
457. https://doi.org/10.1109/ICE348803.2020.9122958
7. D. Shah, S. Patel, S.K. Bharti, Heart disease prediction using machine learning techniques. SN
Comput. Sci. 1(6), 345 (2020). https://doi.org/10.1007/s42979-020-00365-y
8. R. Katarya, P. Srinivas, Predicting heart disease at early stages using machine learning: a survey,
in Proceedings of the International Conference on Electronics and Sustainable Communication
Systems, ICESC 2020 (2020), pp. 302–305. https://doi.org/10.1109/ICESC48915.2020.915
5586
9. D.P.G. Apurb Rajdhan, M. Sai, A. Agarwal, D. Ravi, Heart disease prediction using machine
learning. Lect. Notes Electr. Eng. 9(04) (2020)
10. Z. Masetic, A. Subasi, Congestive heart failure detection using random forest classifier. Comput.
Methods Programs Biomed. 130, 54–64 (2016). https://doi.org/10.1016/j.cmpb.2016.03.020
11. D. Chicco, G. Jurman, Machine learning can predict survival of patients with heart failure
from serum creatinine and ejection fraction alone. BMC Med. Inform. Decis. Mak. 20(1),
1–16 (2020). https://doi.org/10.1186/s12911-020-1023-5
12. C.R. Olsen, R.J. Mentz, K.J. Anstrom, D. Page, P.A. Patel, Clinical applications of machine
learning in the diagnosis, classification, and prediction of heart failure: machine learning in
heart failure. Am. Heart J. 229, 1–17 (2020). https://doi.org/10.1016/j.ahj.2020.07.009
13. S. Rahayu, J. Jaya Purnama, A. Baroqah Pohan, F. Septia Nugraha, S. Nurdiani, S. Hadianti,
Prediction of survival of heart failure patients using random forest. J. Pilar Nusa Mandiri 16(2),
255–260 (2020). [Online]. Available: www.ubsi.ac.id
14. M. Peirlinck et al., Using machine learning to characterize heart failure across the scales.
Biomech. Model. Mechanobiol. 18(6), 1987–2001 (2019). https://doi.org/10.1007/s10237-
019-01190-w
15. F.S. Alotaibi, Implementation of machine learning model to predict heart failure disease. Int.
J. Adv. Comput. Sci. Appl. 10(6), 261–268 (2019). https://doi.org/10.14569/ijacsa.2019.010
0637
16. E.D. Adler et al., Improving risk prediction in heart failure using machine learning. Eur. J.
Heart Fail. 22(1), 139–147 (2020). https://doi.org/10.1002/ejhf.1628
17. S.B. Golas et al., A machine learning model to predict the risk of 30-day readmissions in
patients with heart failure: a retrospective analysis of electronic medical records data. BMC
Med. Inform. Decis. Mak. 18(1), 1–17 (2018). https://doi.org/10.1186/s12911-018-0620-z
552 B. Kameswara Rao et al.
18. M. Cikes et al., Machine learning-based phenogrouping in heart failure to identify responders
to cardiac resynchronization therapy. Eur. J. Heart Fail. 21(1), 74–85 (2019). https://doi.org/
10.1002/ejhf.1333
19. T. Ahmad, A. Munir, S.H. Bhatti, M. Aftab, M.A. Raza, Survival analysis of heart failure
patients: a case study. PLoS ONE 12(7), 1–8 (2017). https://doi.org/10.1371/journal.pone.018
1001
20. K. Shameer et al., Predictive modeling of hospital readmission rates using electronic medical
record-wide machine learning: a case-study using Mount Sinai heart failure cohort, in Pacific
Symposium on Biocomputing 2017 (2017), pp. 276–287
21. Y. Freund, R.E. Schapire, M. Hill, Experiments with a new boosting algorithm rooms f 2B-428,
2A-424 g (1996)
22. B.K. Rao, P.S. Kumar, D.K.K. Reddy, J. Nayak, B. Naik, QCM Sensor-Based Alcohol Classi-
fication by Advance Machine Learning Approach (Springer, Singapore, 2021), pp. 305–320
A Comparative Study of Different Forecasting Models for Energy Demand Forecasting
1 Introduction
reason, proper energy forecasting and planning are essential for continued prosperity
and economic growth.
Energy consumption forecasting and planning is traditionally done through econometric modeling in Australia and many other countries. The Australian Energy Market Operator (AEMO) lays down the methodology followed for various energy consumption estimates (details can be found in [1]). The basic tool they use is linear regression, which is the most common tool for econometric modeling. Linear regression models are usually good enough where forecast values are used in long-term planning and the fluctuation of the data is not significant. As this type of model is generated using a least squares method, the estimated values depend on the average trend of the data used. However, data collected from practical systems rarely show a strictly linear trend. Even if there is a clear trend, either upward or downward, we cannot expect a perfect fit to a linear model. Practical data usually fluctuate, with or without cyclic or seasonal patterns. For example, the energy consumption pattern in summer differs from that in winter, and it varies with population growth and technology changes. According to the literature, the use of linear regression for such situations, specifically for short-term forecasting, is not a preferred option. However, it is possible to apply nonlinear regression using the linear regression methodologies, where the form of the nonlinear function must be provided by the user. The main drawback of this approach is that it is extremely hard to find the right form of nonlinear function for a given set of data.
To fit the data into an appropriate nonlinear function, machine learning approaches
such as neural networks and support vector machines usually perform better than
standard linear regression time series models [2]. These approaches are very popular
for both short- and long-term predictions. In recent times, there has been a great
expansion in the usage of machine learning models for various applications related
to prediction problems.
To find an appropriate prediction approach for Australian energy consumption, in this paper we conduct a comparative study of four existing methodologies, namely linear regression, feed-forward neural networks, support vector machines, and the extreme gradient boosting method. These methods are implemented using a wide range of data from the Australian energy sector, and their performances are analyzed and compared. We also consider appropriate input selection, model selection, and the robustness of forecast accuracy. An insight into a problem in regression model implementation is uncovered and reported. The results support the fact that machine learning techniques provide more accurate forecasting.
The paper is organized as follows. After the introduction, Sect. 2 presents a brief
literature review on different popular forecasting techniques. Section 3 describes the
details of the experimental study. The results are presented and analyzed in Sect. 4.
Finally, conclusions are drawn in Sect. 5.
2 Literature Review
There is a wide range of techniques or models that have been used for energy
forecasting. These models can be broadly classified into the following categories:
traditional statistical techniques, artificial neural network techniques, evolutionary
computational techniques, and other techniques. A brief introduction and review of
these techniques are presented below.
First, we discuss the traditional statistical techniques, such as linear regression and a specific moving average technique, that are widely used in forecasting. Linear regression is a method to predict a target variable by fitting the best linear relationship between a dependent variable and the independent variables. The best fit (also known as the goodness of fit) is achieved by making sure that the sum of all the distances between the fitted line and the actual observations at each point is as small as possible [3]. A straight-line equation is derived with constant values that give the least amount of error between forecast values and actual values. A simple linear regression equation is expressed as below:

Y = a + b × X    (1)
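As a small numeric illustration, the constants a and b of Eq. (1) can be fitted by least squares; the data values below are illustrative only:

import numpy as np

# Illustrative observations
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 4.2, 5.9, 8.1, 9.8])

# polyfit(deg=1) returns the slope b and intercept a that minimize
# the sum of squared errors between forecast and actual values
b, a = np.polyfit(X, Y, deg=1)
print(f"Y = {a:.2f} + {b:.2f} * X")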
connected nodes. The input layer takes in the input signal. Hidden layers act as
distillation layers, and they distill out important patterns from the input and pass
them onto the next layers. Hidden layers also have activation functions that help in
capturing the nonlinear relationships and convert their input to output. Layers and
nodes are connected through weights which change when the ANN is trained to
map an input to output. Later on, the ANN is tested with test data to predict output
from input test data [7]. The multilayer perceptron model (MLP), which is also called a feed-forward neural network (FFNN), is a simple ANN and can be used for both classification and numerical prediction problems [8].
Machine learning is a way of using computer programs to automate tasks. However, machine learning systems are not explicitly programmed; rather, they are provided with many examples of data and solutions, which is called training the system. After that, machine learning systems can automate the tasks themselves
because they learn the rules [5]. ANNs are often at the heart of machine learning
systems. Deep learning is learning through ANNs. Here, the “Deep” refers to many
layers of successive representations of information [5]. This is essentially many
layers of neural networks structured to learn input–output rules for a task. Some of
the commonly used ANN techniques are discussed below.
Recurrent neural network (RNN) goes through an internal loop for processing
information. It reuses quantities calculated in the previous iteration of the loop. It is
a preferred model to use for sequential data [9]. In convolutional neural networks
(CNNs), there are three sections: convolutional layers, pooling layers, and fully
connected MLP layers [8].
Support vector machine (SVM) sets up a hyperplane and boundary lines in such
a way that the maximum number of data points are captured within the boundaries.
Support vector regressor (SVR) works on the same principles of SVM with minor
differences. It creates a correlation matrix based on training data. Relevant parameters
from the correlation matrix are subsequently extracted and used in estimator functions
that estimate outputs for test data [10].
Evolutionary computation (EC) techniques are widely recognized as stochastic
global search methods for solving complex optimization problems. Among the EC
techniques, a genetic algorithm (GA) is a popular approach. EC techniques are based
on the concept of natural selection, adaptation, and survival of the fittest. The process
uses a population of individuals (/solution points) that are evolved through variation
operators such as crossover and mutation in several generations (/iterations) until
a stopping criterion is met. As the forecasting methods usually apply optimization
techniques in minimizing the prediction errors, EC techniques can serve the purpose
of optimization. Hyperparameter optimization of machine learning models can be
done through EC techniques.
As indicated earlier, we will conduct an experimental study of four forecasting
techniques that include linear regression, feed-forward neural network, support
vector machines, and the extreme gradient boosting method. These methods are
briefly reviewed below.
AEMO uses linear regression for various sectors of the economy [1]. The equation below describes their consumption forecast for small-medium enterprises (SME) based on Gross State Product (GSP), electricity price, efficiency, climate change, and a shock factor.
algorithm to minimize the loss when adding new models" [24]. Wang et al. [25] used an XGBoost-based hybrid model to successfully forecast building energy consumption, with data provided by a US national energy laboratory. XGBoost-based hybrid models were used by Fan et al. [26] to accurately predict the short-term load of a distribution network for an electricity company in China, and by Wang et al. [27] to predict load demand with greater accuracy than status quo models and techniques.
Although many approaches have been published in the area of energy forecasting, most had the following limitations: they used a single set of variables/features for judging model performance, and none of the studies showed how the models performed on multi-period forecasting.
3 Experimental Study
As previously discussed, various models have been proposed for energy demand
forecasting. Most of these methods can be categorized into two major categories:
(a) causal models and (b) historical data-based methods [28]. In the causal methods, energy consumption is taken as the output variable, and economic, social, and climatic variables are taken as the input variables. The focus of this research is on causal models with the following objectives:
• ascertain the set of input variables,
• ascertain the set of models that give reasonable accuracy, and
• assess the robustness of the outputs.
In this study, we considered the following four models: linear regression, ANN
(feed-forward model), SVR, and XGBoost (Extreme Gradient Boost). The features
of these models are presented below.
(a) The LinearRegression function was used from the sklearn.linear_model Python library with all its default values.
(b) ANN (MLP/feed-forward model)—the structure consisted of 1 input, 1 output, and 3 hidden layers. The hidden layers had 200, 112, and 50 neurons, respectively. The optimizer type was "adam".
(c) The SVR function was used from the sklearn.svm module in Python. The kernel type used was "rbf". C had the default value of 1, and epsilon was 0.1. C is a penalty factor for misclassified data points; epsilon represents a margin of tolerance within which no penalty is given to errors.
(d) The XGBRegressor function was used from the xgboost Python module. All the default parameters were used; default values can be found in [29]. A sketch of these configurations is given below.
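A minimal sketch of the four configurations above follows; the paper builds its ANN with Tensorflow, so sklearn's MLPRegressor is substituted here purely for brevity, keeping the stated layer sizes and optimizer:

from sklearn.linear_model import LinearRegression
from sklearn.neural_network import MLPRegressor
from sklearn.svm import SVR
from xgboost import XGBRegressor

models = {
    # (a) linear regression with all default values
    "LR": LinearRegression(),
    # (b) feed-forward ANN: hidden layers of 200, 112, and 50 neurons, "adam" optimizer
    "ANN": MLPRegressor(hidden_layer_sizes=(200, 112, 50), solver="adam"),
    # (c) SVR with "rbf" kernel, penalty factor C = 1, tolerance margin epsilon = 0.1
    "SVR": SVR(kernel="rbf", C=1.0, epsilon=0.1),
    # (d) XGBoost regressor with all default parameters [29]
    "XGBoost": XGBRegressor(),
}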
In this study, we consider two sets of input variables for judging the performance of the four models under consideration, as shown in Table 1. These two sets of features were chosen based on their importance in affecting the output variable. The output variable in both cases was the energy consumption (in tons of oil). Features contained in data sets Alpha and Beta were selected based on SelectKBest results, which showed the most important features; correlations among the features themselves were subsequently considered in creating the two sets. Each feature was scaled from 0 to 100 with min–max feature scaling; a sketch of this step follows.
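A sketch of the feature selection and scaling just described, assuming a feature matrix X (a pandas DataFrame) and target y; the scoring function f_regression and k = 5 are illustrative assumptions:

from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.preprocessing import MinMaxScaler

selector = SelectKBest(score_func=f_regression, k=5)   # k is illustrative
selector.fit(X, y)
selected = X.columns[selector.get_support()]           # most important features

scaler = MinMaxScaler(feature_range=(0, 100))          # scale each feature to 0-100
X_scaled = scaler.fit_transform(X[selected])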
Each set of input variables (Alpha and Beta) was run through each of the four
models (LR, ANN, SVR, and XGBoost) described earlier. The accuracy of prediction
was noted for each case.
Data was primarily sourced from [30, 31]. A snapshot of the input data file follows. As part of data preprocessing, annual data was converted to monthly data and backcast where data was missing. Backcasting was done mostly using functions available within MS Excel that provide the best fit for a given set of data. Sample input data are shown in Fig. 1.
Year,Month,Population,GDP,CPI,Electricity_Market_Spot_Prices,Household_Electricity_price_index,Coal_Prices,Gas_Prices,Calculated_Energy_Use
1960,1,0,0,0,0,0,0,0,0
1960,2,0,0,0,0,0,0,0,0
...
1960,12,0,0,0,0,0,0,0,0
1961,1,0.114067099,0.033043091,0.013569047,0.102523687,0.141242938,0.084813454,0.141003855,0.083142697
1961,2,0.228134198,0.066086183,0.027138094,0.205047374,0.282485876,0.169626909,0.282007709,0.166285395
...
A sliding window is used over various periods of data. The window consisted of five different periods; the training windows are 1, 2, 3, 6, and 10 years. The source data consist of monthly observations, and the task of the research method is to make predictions for multiple months (month-wise for 12 months) in each instance. The experiments are run for five different training spans, namely 12, 24, 36, 48, and 120 months, and the testing period is one year for all runs. The sliding window was slid over 30 years, from 2019 backward to 1990, as sketched below.
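A sketch of this sliding-window evaluation, assuming a DataFrame monthly of chronologically ordered monthly rows; the names are illustrative:

spans = [12, 24, 36, 48, 120]                  # training spans in months
for span in spans:
    # slide the window one year at a time, from the latest data backward
    for test_start in range(len(monthly) - 12, span - 1, -12):
        train = monthly.iloc[test_start - span:test_start]
        test = monthly.iloc[test_start:test_start + 12]   # 12-month test period
        # fit each of the four models on train, predict test, and record the MSE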
Python 3.5 was used for building the models, with relevant functions imported from the Sklearn, Tensorflow, and xgboost modules. A laptop with a 4-core AMD CPU and 8 GB RAM was used to run the experiments.
The experiments were run and the output errors as MSEs were collected for each
of five different training spans, over four different models, with two different data
sets in both training and testing phases. These errors were then summarized, and the
summary of average errors is presented in Table 2. From this table, it can be seen that
the three machine learning models (MLP, SVR, and XGBoost) perform better than
the LR model across various training periods and both data sets Alpha and Beta. The LR model did well in certain instances but performed poorly across various training lengths and data sets. Especially since the scenarios dealt with forecasting for 12 monthly periods, the errors added up quickly to produce very large numbers. In the table, 'M' is shown where the error is very large. It can be seen that XGBoost is the most
accurate model among the four models. It performed well across different training
periods and data sets (Alpha and Beta). Currently, AEMO uses linear regression
models for the SME sector’s long-term forecasting model. The results reveal that
the XGBoost model is outstanding for this kind of task among the compared models
across various data types and also for multi-period forecasting.
As shown in Table 2, for the testing phase with the Alpha data set, the average
MSEs for SVR and MLP are at least two and six times higher than the same from
XGBoost, respectively. With the Beta data set, they are at least 1.7 times higher.
However, the MSEs vary a lot with the length of training data sets for the three
better-performing methods (MLP, SVR, and XGBoost). To visualize their relative
Table 2 Summary results from the models with five different training periods

Model     Test mean MSE range (Alpha)   Test mean MSE range (Beta)   Train mean MSE range (Alpha)   Train mean MSE range (Beta)
MLP       18.6–38.56                    5.10–26.26                   1.35–7.57                      0.55–4.98
SVR       6.46–21.80                    5.10–13.84                   0.37–1.81                      0.0–0.74
XGBoost   3.03–3.22                     3.04–3.05                    0.0–0.0                        0.0–0.0
LR        13.9–8.34E+14                 6.64–168,885                 0.0–0.0                        0.0–0.64
Fig. 2 Test MSE for various training lengths with data set Alpha
Fig. 3 Test MSE for various training lengths with data set Beta
performances, the training length-wise MSEs are plotted in Fig. 2 for the Alpha data set and in Fig. 3 for the Beta data set. From these two figures, it is clear that MLP is worse than SVR for the lower and higher training lengths, and they have similar performances for the two training lengths in the middle. However, XGBoost is consistently better than both MLP and SVR for all training lengths. It is therefore reasonable to conclude that XGBoost is the best of the four models investigated in this study for the data sets used.
5 Conclusions
This research aimed to find suitable models for making longer-term and multi-period
forecasting for energy requirements in Australia based on macroeconomic input
variables. This paper presented the considered models along with the results obtained, together with the results of a traditional econometric model, for two different feature (input variable) sets and 5 different training window scenarios. In each case, forecasting was made for 12 monthly periods.
Out of the 4 models (MLP, SVR, XGBoost, and LR) considered in this study,
the 3 AI-based models (MLP, SVR, and XGBoost) produced reasonably accurate
results across all scenarios. LR-based model sometimes did well, but other times
performed poorly when data variability was high. XGBoost model proved to be the
most accurate and robust among these models as it performed best across different
training lengths and for both source data sets (Alpha and Beta). In the testing phase,
the mean MSE ranges obtained from XGBoost for the Alpha and Beta data sets are
3.03–3.22 and 3.04–3.05, respectively. These ranges are a few times higher for MLP and SVR. In the training phase, XGBoost provided zero mean MSE values for both data sets, which is notable. Note that although LR shows good mean MSE values in its training phase, they are unusually high (e.g., 13.9 to 8.34E+14 for Alpha) in the testing phase, due to data fluctuations.
The task of the models was to produce 12 months of forecast (test) based on
training periods of various lengths. It appeared that shorter training periods improved
performance for the 3 AI-based models. This can be explained by the fact that the
most recent data might be the best predictor for future performance.
Further work can be carried out to investigate the suitability of these or additional AI-based models for longer-term forecasting with a forecast period of 1–5 years. Whether hybridization of models can lead to better long-term forecasting can also be investigated.
Abstract Social media has become an inevitable part of humans' daily life, enabling people to express their opinions, sentiments, and ideologies. During the COVID-19 pandemic, when the whole world went into lockdown, Twitter served as an outlet for people to express their emotions. This work proposes streaming real-time Twitter data on COVID-19 using the Twitter API and handling the streaming big data using the Apache Spark framework. Here, the detection of non-legit accounts present in the streamed data was accomplished by the proposed feature-based algorithm, which attains an overall accuracy of 98.74%. This fake account detection model filters the genuine accounts out of the API-streamed Twitter data. Sentiment analysis on these genuine Twitter accounts is performed by modifying the state-of-the-art Natural Language Processing (NLP) algorithm called Bidirectional Encoder Representations from Transformers (BERT). The proposed method achieved a classification accuracy of 88.30% by concatenating the pooled NN layer with the influential feature.
1 Introduction
The outbreak of Coronavirus disease 2019 (COVID-19), caused by the virus named SARS-CoV-2, has taken the form of a pandemic causing humongous loss of life and economic damage all around the world. To effectively control the spread of the virus, the governments of every country have enforced preventive measures such as lockdowns, social distancing, etc. But this confinement period has resulted in severe psychological issues due to boredom and loneliness [1]. To fill this void, people all
around the world took to social media networks such as Twitter, Facebook, and Instagram as platforms to express their opinions and sentiments. Therefore, public sentiment can be captured by monitoring the text content generated by the users of these social media platforms. But before analyzing these tweets for sentiment, the fake or non-legit accounts have to be removed from the Twitter-API-streamed data. Fake accounts are created by malicious social bots, which expand their social communication by creating multiple social accounts. They have to be differentiated from legit accounts, both because they spread misinformation among users and for reliable sentiment analysis of public opinion. Then the semantics of the tweets have to be derived to arrive at the opinions of the Twitter account users. Sentiment analysis uses NLP extensively to categorize human emotions. Real-time sentiment analysis is a sector of NLP that has not been addressed enough [2]. It requires a powerful big-data tool to analyze the incoming streaming data. The main setback in sentiment analysis of streaming data is that the genuineness of the streamed data is low [3]. The fake data streamed reduces the efficiency of the model constructed to analyze sentiments [4].
2 Related Work
Fake accounts are created by malicious social bots and they expand their social
communication by creating multiple social accounts or by disguising themselves as
a follower of the account [5, 6]. To identify such malicious accounts from legitimate
account URL features such as URL redirection, frequency of shared URLs, and
spam content in the URL of the tweets can be used [7]. Twitter streams consist of
both high-quality URLs and low-quality URLs as the Twitter user’s tweet about the
COVID-19. Lisa Singh et al. in their experiment found that misinformation from low-
quality URLs spread by the bot accounts is shared more than the high-quality health.
Rout R. et al. [8] proposed a learning automata-based malicious social bot detection
(LA-MSBD) algorithm to identify the legitimate user. Here the direct trust is derived
from Bayes’ theorem, and the indirect trust is derived from the Dempster–Shafer
theory (DST) to determine the trustworthiness of each Twitter user accurately.
Bot accounts are dangerous because they try to manipulate content and spread misinformation, which can greatly affect public opinion and misguide people [9]. Sharma et al. [10] identified unreliable and misleading content based on fact-checking sources and examined the narratives promoted in misinformation tweets, along with the distribution of engagements with these tweets spread by bots [11]. But these bots are not easy to detect, as they actively try to avoid detection. Phillip Efthimiou et al. [12] proposed a novel approach that ensembles features such as the followers-to-friends ratio, message variability, length of user names, reposting rate, temporal patterns, and sentiment expression for bot detection. Kılınç [13] proposed a method that checked the confidence of the data generated from Twitter and then analyzed the sentiment of the confident data; here, both fake account detection and sentiment analysis are done using the Naive Bayes algorithm.
Walt et al. [14] researched fake accounts created by humans. Their work considered the friend-to-follower ratio engineered from the friend and follower counts, and concluded that features engineered to detect fake bot accounts cannot be used as efficiently on fake human accounts. Xiao et al. [15] proposed a methodology to detect clusters of malicious bot accounts using a supervised machine learning pipeline. This method suggests using main features such as the content generated by the users.
Mukherjee et al. [16] proposed fine-grained sentiment classification to classify multiple human emotions toward pandemics using the RoBERTa classifier. The proposed methodology was trained and tested on two benchmark datasets, AIT (non-covid) and SenWave (covid) [17]. The RoBERTa method resulted in good Jaccard index, F1 micro, and F1 macro scores. Chriqui and Yahav [18] suggested a transformer-based model named HeBERT. Their tool HebEMO, which uses HeBERT to extract emotions from Hebrew UGC, gives very efficient scores even for emotion detection in the English language.
3 System Architecture
The proposed methodology involves streaming real-time data from the Twitter platform through the Twitter API. A Twitter developer account is created, from which the consumer and access tokens are obtained. These tokens are used to stream the live Twitter data. This live-streamed data is then analyzed for sentiment using the feature-combined BERT algorithm. But the major concern is to preserve the reliability of the streamed Twitter data. Thus, the genuineness of the Twitter accounts is checked using the feature-based algorithm; only the legit accounts are retained, and the fake accounts are dropped. The feature-based algorithm is a rule set constructed from conditions based on ensembled engineered features derived from the metadata of the real-time streaming data. The features are engineered by analyzing profile-based metadata such as name, description, screen name, and status, and behavior-based features such as friends count, following count, statuses count, and listed count. Then, sentiment analysis is done for the legit accounts using the fine-tuned BERT model. In this proposed model, the influential feature is concatenated as a layer along with the pooled neural network of the BERT model. The analyzed sentiments are then visually represented to get an overall idea of people's sentiments. The system architecture is shown in Fig. 1.
This work proposes the concatenation of influential features as a layer along with the BERT model. Combining the pooled NN layer with the most highly correlated influential feature as an added layer gives better classification accuracy compared to the state-of-the-art BERT algorithm used for sentiment analysis. The modified BERT architecture is given in Fig. 2.
4 Methodology
Features derived from the metadata of user accounts can be used to detect fake accounts run by bots on the Twitter platform. This work proposes constructing a feature-based algorithm that ensembles engineered features. A bag-of-words is constructed containing the words most commonly used by bots. The constructed bag-of-words is checked against features such as the name, screen_name, description, and statuses of the streamed Twitter accounts. The accounts are checked for whether they are verified by Twitter. A feature is engineered by analyzing the friends count, following count, statuses count, and listed count to compute the friends-to-following ratio, because bots follow the maximum number of people while their friend count is much lower compared to legitimate accounts. The data is analyzed for features such as followers_retweet and status frequency, and a threshold is set for determining bots, which ensures that an account is legit.
Algorithm
Procedure: Rule set construction—feature engineering
Input: metadata features of the Twitter account
Output: classified legit and non-legit accounts
Start
Read Source ← training_data_bot.csv
Construct a bag of words; check whether the name, screen_name,
description, and status of the account contain spam words:
name_spam ← contains(name, bag_of_words)
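A hypothetical Python sketch of this rule set follows; the bag-of-words, field names, and threshold value are illustrative placeholders, not the authors' exact values:

import pandas as pd

BAG_OF_WORDS = {"free", "win", "offer", "click"}        # assumed spam words

def is_legit(account: pd.Series, ratio_threshold: float = 0.05) -> bool:
    # bag-of-words check on the profile text fields
    text = " ".join(str(account.get(f, "")) for f in
                    ("name", "screen_name", "description", "status")).lower()
    if any(word in text for word in BAG_OF_WORDS):
        return False
    if account.get("verified", False):                  # verified accounts are legit
        return True
    # bots follow many accounts while their friend count stays comparatively low
    ratio = account["friends_count"] / max(account["following_count"], 1)
    return ratio >= ratio_threshold

# legit_accounts = streamed_df[streamed_df.apply(is_legit, axis=1)]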
BERT can be used for a large variety of natural language processing tasks by fine-tuning the model. Sentiment analysis is a classification task, and thus it can be done similarly to next-sentence classification: a classification layer is added on top of the transformer output for the [CLS] token. Along with this fine-tuning, this work proposes concatenating influential features as a layer along with the BERT model. The input tweets are first embedded into vectors and processed by the neural network. In the output sequence, every vector is of the same size H and corresponds to an input token. BERT has two main functionalities: masked language modeling (MLM) and next sentence prediction (NSP). Here, a classification layer is added on top of the output of the encoder. The output is multiplied by an embedding matrix to transform each word into vocabulary dimensions, and Softmax is used to calculate the probability of each vocabulary word for the mask. The context of the tweet is perceived by the segment embedding, where the sentence number is encoded into the vectors. At last, a position embedding is added to each token, denoting the position of the word within the sentence. The segment embedding and position embedding make up the NSP. Combining the pooled NN layer with the most influential feature, i.e., friend_count, gives better classification accuracy compared to state-of-the-art algorithms.
Algorithm
Procedure: Fine-tuning the BERT model
Input: three input vectors - token embedding, segment embedding, and position embedding
Output: the classified sentiment of the tweets
Start
Read Source ← covid19_tweets.csv (Kaggle COVID-19 tweets dataset)
corr_feature = df.corr(method='spearman')
Assign sampleDf ← read source dataset
Split the dataset into test and train data along with the most correlated feature (usr_frnd).
encoder ← LabelEncoder()
encoder.fit(target_values)
Save the encoding map named twitter_classes.npy for later use.
tokenizer ← tokenization.FullTokenizer(vocab_file, do_lower_case)
Convert the tokens to token ids using the function encode_names(n, tokenizer).
Using one-hot encoding, encode the influential feature:
featureEncoder = LabelEncoder()
featureEncoder.fit(usr_frnd)
Save the encoding of the influential feature twitter_wkd.npy for future use.
Pre-process the input tweets; build a function bert_encoder().
Build a fine-tuned BERT model using the inputs defined in the pre-processing and save the model 'twitter_BERT_usr_frnd'.
Train and test the model using the fine-tuned BERT model.
Using the saved encoder and model, classify the Twitter live-streamed data.
End
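A hypothetical Keras sketch of the feature-concatenated classification head described above; bert_encoder, the sequence length, and the feature width are assumptions, not the authors' exact implementation:

import tensorflow as tf

def build_feature_bert(bert_encoder, seq_len=128, feat_dim=10, num_classes=3):
    ids = tf.keras.Input(shape=(seq_len,), dtype=tf.int32, name="input_word_ids")
    mask = tf.keras.Input(shape=(seq_len,), dtype=tf.int32, name="input_mask")
    segs = tf.keras.Input(shape=(seq_len,), dtype=tf.int32, name="segment_ids")
    usr_frnd = tf.keras.Input(shape=(feat_dim,), dtype=tf.float32,
                              name="usr_frnd")          # one-hot influential feature

    # bert_encoder is assumed to return (pooled_output, sequence_output)
    pooled, _ = bert_encoder([ids, mask, segs])

    # concatenate the pooled [CLS] representation with the feature layer
    fused = tf.keras.layers.Concatenate()([pooled, usr_frnd])
    out = tf.keras.layers.Dense(num_classes, activation="softmax")(fused)

    model = tf.keras.Model([ids, mask, segs, usr_frnd], out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model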
5 Experimental Setup
The environments in which the implementation is done, and the labeling of the dataset based on polarity, are explained in this section.
The social honeypot dataset was collected on the Twitter platform from 30th December 2009 to 2nd August 2010 [19]. The covid19_tweets.csv dataset was collected on Kaggle using a Python script and the Twitter API. The labeling of these tweets is done using a lexicon-based approach with the Valence Aware Dictionary and Sentiment Reasoner (VADER) [20]. Polarity scores above 0.5 are labeled as positive, scores below −0.5 are labeled as negative, and scores between −0.5 and 0.5 are labeled as neutral. These datasets are used to train the constructed model, and real-time test data streamed through the Twitter developer API using the keyword #COVID is used as validation data.
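A sketch of this VADER labeling step, using the vaderSentiment package and the thresholds stated above:

from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

def label_tweet(text: str) -> str:
    compound = analyzer.polarity_scores(text)["compound"]
    if compound > 0.5:
        return "positive"
    if compound < -0.5:
        return "negative"
    return "neutral"

print(label_tweet("Stay safe everyone, we will get through this together!"))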
6 Results
Table 1 Comparison of methodologies in fake account detection

S. No   Methodology                  Overall accuracy (%)
1       Decision tree algorithm      86.25
2       Binomial Naive Bayes         68.75
3       Random Forest algorithm      85
4       Single layer perceptron      70
5       Multilayer perceptron        76
6       Feature-based algorithm      98.74
Fig. 3 ROC of the feature-based algorithm
rate is shown in Fig. 3. The figure illustrates that the true positive rate equals 0.9874 on a scale of 0–1.
The overall accuracy of the model built is evaluated using this metric: accuracy is the fraction of correct predictions made by the constructed model. The constructed feature-combined BERT algorithm achieves 88.30% validation accuracy. The loss used, also known as Softmax loss, is mostly used in multi-class classification; this kind of loss trains a deep NN to output a probability over the multiple classes. Graphs plotting the overall accuracy and loss are given in Figs. 4 and 5, respectively.
7 Conclusion
The main contribution of this paper is the sentiment analysis of public emotion on Twitter during the COVID-19 pandemic using a fine-tuned BERT algorithm. For reliable results, a model to filter the genuine accounts based on the analyzed features is constructed. The feature-based algorithm achieved a higher overall accuracy of 98.74% compared to the Decision tree algorithm with 86.25%, Naive Bayes with 68.75%, Random Forest with 85%, Single layer perceptron with 70%, and Multilayer perceptron with 76%. The tweets of the legit accounts are then analyzed for public emotion using the fine-tuned BERT algorithm. The fine-tuned, feature-combined BERT algorithm achieves overall training and validation accuracies of about 90% and 88.30%, respectively. Future work may include automating the fake account detection system so that it upgrades its rule set to be compatible with any dataset. The sentiment of people is complicated, and thus it cannot be merely classified into positive, negative, and neutral emotions; this work can therefore be extended to classify fine-grained multi-class emotions rather than the three sentiments expressed.
References
1. A. Badawy, E. Ferrara, K. Lerman, Analyzing the digital traces of political manipulation: the 2016 Russian interference Twitter campaign. arXiv:1802.04291 (2018)
2. R. Liu, Y. Shi, C. Ji, M. Jia, A survey of sentiment analysis based on transfer learning. IEEE Access 7, 85401–85412 (2019)
3. W. Medhat, A. Hassan, H. Korashy, Sentiment analysis algorithms and applications: a survey. Ain Shams Eng. J. 5, 1093–1113 (2019)
4. M. Cinelli, W. Quattrociocchi, A. Galeazzi, C.M. Valensise, E. Brugnoli, A. Lucia Schmidt, P. Zola, F. Zollo, A. Scala, The COVID-19 social media infodemic. arXiv:2003.05004 (2020)
1 Introduction
Neural networks can draw useful conclusions from large and imprecise datasets, and many modern computing algorithms lag behind their computing abilities. Neural networks, when implemented in hardware, have challenging requirements because of their computationally intensive nature. Whether it is pattern recognition onboard a satellite, disease prediction using image datasets, or computer vision in a robot, all require neural network implementation on dedicated hardware such as Field Programmable Gate Arrays (FPGAs) or Application Specific Integrated Circuits (ASICs). The hardware requirements and large energy consumption of neural networks can be curtailed by using an emerging
large energy consumption of neural networks can be curtailed by using an emerging
paradigm called approximate computing. Approximate computing is a technique for
designing compact and low-energy digital logic circuits by introducing some inac-
curacy so as to minimize the circuit complexity [1]. Adders, multipliers and counters
are the most frequently used arithmetic circuits on a processor, and their efficient
design is crucial for hardware efficiency of an application-based circuit. Multipliers
are undoubtedly the most important primitives for image processing and artificial
neural network (ANN) applications, predominating the area, delay and overall per-
formance of their hardware implementations. The partial products of the multipliers
can be generated in two ways, one using AND gates and the other using the Booth
encoding. Booth-encoded partial products are generally preferred as the number of
rows is reduced to half. For example, if the size of a multiplier is n bits, the number of rows of partial products is also n for a non-Booth multiplier but n/2 + 1 for a Booth multiplier. Booth multiplier circuits [2, 3] have proven to be better compared
to other multipliers in speed and energy/power efficiency. Compressor circuits play
a vital role in the partial product accumulation stage of the multiplier circuit using
tree-based schemes such as Wallace and Dadda [4]. In case of addition of two or three
bits of partial products, half and full adders are used, but when the number of partial
product bits is more than three, a compressor is used to add them. An n-m compressor has n inputs and m outputs. Thus, a full adder is also a 3-2 compressor which adds three
input bits and provides two output bits. The most efficient state-of-the-art multipliers have used 5-3 compressors [5, 6]. A simple 5-3 compressor consists of two full adders with three outputs, i.e., Sum, Carry1 and Carry2. The weight of Carry1 and Carry2 is double that of Sum. If the input carry and Carry2 pins are not used, a 4-2 compressor can be designed, which allows a significant saving in area and power consumption. But this requires the third carry bit Carry2 to be modified from 1 to 0 for the input combination 1111. This modification induces error in the output
of the compressor. Since some outputs of a 4-2 compressor are wrong, its design is
approximate and not accurate. A number of previously proposed designs used dif-
ferent efficient approximate 4-2 compressors [7–11]. Dual-mode 4-2 compressors
are presented in [12] which can change their mode of operation between exact and
approximate. In [13], power-efficient 4-2 compressors consisting of just one NAND
gate and three NOR gates are presented. In [14], a wide variety of approximate 4-2
compressors are extensively reviewed with their hardware and error characteristics.
Neural networks are inherently error tolerant; therefore, implementing neural applications using an approximate multiplier incurs very little loss in accuracy, while large gains in energy and timing performance can be achieved [15]. In this paper,
we have designed an FPGA-based neural network using approximate multiplication
instead of exact multiplication.
[Figure: a feed-forward neural network with input neurons I0, I1, I2, …, IN connected through weights to hidden neurons and output neurons]
i_{x,y} = 1 / (1 + e^{−s_{x,y}})    (1)

s_{x,y} = Σ_{j=1}^{N} i_{j,y−1} × w_{jx,y−1}    (2)
N is the number of neurons in the preceding layer (y − 1), and w_{jx,y−1} is the weight connecting neuron x with neuron j of layer y − 1. The multiplication operation shown in
Equation (2) requires parallel multipliers on the circuit. Thus, instead of using exact
multipliers, approximate multipliers can be used to obtain area, power and energy
savings on the circuit that performs the neural application.
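A minimal numeric sketch of Eqs. (1) and (2), with illustrative values only:

import numpy as np

def neuron_output(prev_outputs, weights_to_x):
    # Eq. (2): weighted sum over the N neurons of the previous layer
    s_xy = float(np.dot(prev_outputs, weights_to_x))
    # Eq. (1): sigmoid activation of the weighted sum
    return 1.0 / (1.0 + np.exp(-s_xy))

print(neuron_output(np.array([0.2, 0.7, 0.5]), np.array([0.4, -0.1, 0.3])))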
A Booth multiplier operates in a series of three stages, i.e., partial product generation, accumulation, and the final two-row addition, as shown in Fig. 3. Let A be the multiplicand string and B be the multiplier string.
[Fig. 3: Booth multiplier flow — the multiplier and multiplicand feed the Booth encoder, followed by the partial product generator, partial product accumulation using adders and compressors, final addition, and the product]
Table 1 Modified Booth encoding (b represents LSBs of the multiplier and a represents LSBs of the multiplicand)

b2i+1 b2i b2i−1   Partial product   Pi,j for a2i a2i−1 =
                                    00   01   10   11
0 0 0             0                 0    0    0    0
0 0 1             A                 0    0    1    1
0 1 0             A                 0    0    1    1
0 1 1             2A                0    1    0    1
1 0 0             −2A               1    0    1    0
1 0 1             −A                1    1    0    0
1 1 0             −A                1    1    0    0
1 1 1             0                 0    0    0    0
The Booth encoder encodes the partial products by utilizing three consecutive bits of the multiplier, b2i+1, b2i and b2i−1. These three bits of the multiplier determine whether the partial product value equals 0, A, 2A, −2A or −A, as shown in Table 1. The last three bits of B and the last two bits of A are used to encode the partial products; then B and A are shifted right by 2 bits and 1 bit, respectively, and again the last three bits of B and the last two bits of A are used. Let the sixteen-bit multiplier string be 0000011101011101. A 0 is first appended in the least significant bit (LSB) position; the three LSBs are then 010. Shifting right by two bits at a time, the remaining LSB groups are 110, 011, 010, 110, 011, 000, 000. There are eight such groups, which provide the eight (n/2) rows of the partial product matrix
and one row for the neg bit [2]. The partial product matrix shown in Fig. 4 is for an 8-bit multiplier; therefore, only four groups of three bits are formed, and hence the matrix has 4 + 1 rows. The partial products are added columnwise, and finally, all the columns of the partial product matrix are reduced to two rows, which can be added using a fast adder. A sketch of the encoding step is given below.
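A short sketch of the radix-4 Booth grouping described above; it reproduces the worked sixteen-bit example from the text:

BOOTH_DIGIT = {  # (b2i+1, b2i, b2i-1) -> partial product in units of A (Table 1)
    (0, 0, 0): 0, (0, 0, 1): 1, (0, 1, 0): 1, (0, 1, 1): 2,
    (1, 0, 0): -2, (1, 0, 1): -1, (1, 1, 0): -1, (1, 1, 1): 0,
}

def booth_digits(multiplier: int, bits: int = 16):
    """Return the n/2 Booth digits of a 'bits'-wide multiplier, LSB group first."""
    b = [0] + [(multiplier >> i) & 1 for i in range(bits)]  # append a 0 at the LSB
    return [BOOTH_DIGIT[(b[i + 2], b[i + 1], b[i])]         # groups of three bits,
            for i in range(0, bits, 2)]                     # sliding right by two

# groups 010, 110, 011, 010, 110, 011, 000, 000 -> A, -A, 2A, A, -A, 2A, 0, 0
print(booth_digits(0b0000011101011101))   # [1, -1, 2, 1, -1, 2, 0, 0]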
[Fig. 9: Area (μm²) versus MRED (×10⁻³) for the proposed multiplier and the existing designs Mom [7], Venk [8], Ha [11], Akb [12], and Ahm [13]]
voltages. For the sake of fair comparison, in all the existing multipliers and also in the proposed multiplier, we have used approximate compressors in the 16 least significant columns of the partial product matrix. The PDP values of the proposed design are the minimum, which signifies that this design possesses improved energy consumption. The proposed approximate multiplier is an energy-efficient design which reduces the overall power consumption of the processor. Table 4 shows improvements of 12% in area, 17.3% in delay, and 13.7% in power by the proposed Booth multiplier over the exact signed Booth multiplier. Figures 8 and 9 show the PDA (power × delay × area) analysis and the area versus MRED of various approximate multipliers, respectively.
the trained weights on the chip memory. Two hidden layers are used to construct the neural network structure. Each layer is composed of 120 neurons with the rectified linear unit as the activation function. Offline training on 60,000 images is carried out. The weights and inputs are converted to 16 bits, since we are using 16-bit approximate multipliers. For training, floating-point notation is used. The accuracy is calculated using 10,000 test images. The number of epochs used is 400, and the batch size is 80. First, the neural network implementation is performed using exact multipliers for the calculations that take place inside the neurons, and the classification accuracy is noted, as shown in Table 5. Then, we used the proposed 16-bit signed approximate multipliers in place of the exact multipliers. The implementation of the exact and approximate multipliers is performed using look-up tables (LUTs) on a XEM7350 Kintex-7 FPGA (XC7K160T-1FFG676C) board. Table 5 demonstrates that the FPGA implementations of the neural network and the proposed approximate neural network achieve low energy consumption and high classification accuracy compared to existing designs. It is observed that 8-bit or 16-bit multipliers provide sufficient accuracy, and therefore most neural network hardware implementations use low bit-width multipliers to increase energy efficiency. The implementation of the approximate neural network is also performed at the simulation level using various state-of-the-art multipliers instead of the proposed multiplier. The variation in mean square error with the exact multiplier and the proposed multiplier is shown in Fig. 10, which shows that the error due to the proposed design is not very large compared to that of the exact multiplier. Table 6 shows the accuracy of the proposed and existing multipliers compared to the exact multipliers.
6 Conclusion
Since neural networks are computationally intensive, running a neural network application on hardware necessitates low-energy and high-speed circuits. Multipliers are essential circuits in digital design and are widely used in image processing and neural network applications. The hardware resource utilization and processing time required by multipliers are greater than those of addition and subtraction. In this paper, a novel compressor-based approximate multiplier is presented. The proposed compressor is used in place of the exact compressor in a 16-bit signed Booth multiplier. The multiplier is then used to implement a neural network on an FPGA.
Fig. 10 Error vs epoch to compare error in each epoch due to exact (red) and approximate (blue)
multipliers
Due to the approximate multiplier, the neural network also becomes approximate. The disadvantage of designing approximate multipliers using approximate compressors is that it provides limited hardware reduction; it can be combined with truncation of partial products or with coding of the input operands to achieve further energy savings. An important requirement for an approximate multiplier used in neural network applications is scalability. The advantage of the scalable design is that the proposed compressor circuit can be used in a multiplier of any bit-width: 4-bit, 8-bit, 32-bit, and 64-bit multipliers can also use the proposed compressor. This allows the implementation of neural network hardware on FPGAs or ASICs with variable precisions.
In the future, the proposed multiplier can be used in biomedical circuits that require battery-operated wearable devices. These devices use machine learning-based classifiers. Due to the low PDA value of the proposed multipliers, high-performance, compact neural network-based wearable devices can be designed.
Prakriti Dwivedi, Akbar Ali Khan, Sareeta Mudge, and Garima Sharma
Abstract Water is the most fundamental need of mankind, and its demand has been ever increasing, concomitant with the growth of the world's population. Regrettably, planet earth is witnessing a steep decrease in water quality, leading to various diseases and deficiencies in the human body. The immense pressure to meet the demand has led not only to the reduction of important minerals in water but also to an imbalance in their proportions. This neglect of the fundamental need for good-quality water has reached a point where global attention is needed; consequently, steps are being taken to enhance awareness, and research is being conducted to bring healthy potable water within the reach of everyone. This paper attempts to align with the philosophy of 'AI for Social Good' to address this problem. Experimental results of this paper include an Accuracy of 97.58%, AUC of 0.9939, Recall of 0.8521, Precision of 0.9163, F1 score of 0.8831, Kappa of 0.8696, and MCC of 0.8703, with a run time of 150 s.
1 Introduction
pressure on potable water production for meeting the ever-growing demand. For this purpose, sea water is converted into potable water by a desalination process that generally involves removing mineral components from saline water to make it drinking-ready. This makes the most fundamental human need, water, just good enough to drink, but lacking the requisite mineral content, which may lead not only to diseases such as cholera, diarrhoea, etc., but also to multi-mineral deficiency in the human body, since water is the best and most balanced supply source of various minerals. The threshold levels of some of the minerals in water are as follows: perchlorate—56 OD, ammonia—32.5 OD, nitrates—10 OD, radium—5 OD, etc. Here, OD stands for Optical Density, the unit of measurement of minerals in water per litre. Inappropriately mineralized water intake has become so alarming that it is now a universal problem requiring immediate attention and action. The UN General Assembly's Sustainable Development Goal (SDG) aims to provide universal access to safe and affordable drinking water for all by 2030. This has led to an increased effort at the global level through a variety of public and private organizations, NGOs, governments, etc. Moumen et al. [1] dwell on the relevance of the SDG goals besides focussing on Morocco's water management plan, and highlight the fact that despite the building of many dams, ostensibly for economic growth, the country still lacks safe drinking water and sanitation. Various technology infusions have been made to achieve this objective, and the introduction of AI has been a significant facilitator, showing enormous potential to leverage AI for social good [2] and making the goal attainable for Society 5.0. Gunning and Aha [3] and Adadi and Berrada [4] conducted a variety of studies in this domain applying machine learning techniques with an Explainable AI (XAI) approach. But the use of such an approach in the water sector, to benefit from robust prescriptive analytics with the aid of the Explainable AI concept, is still rare. Therefore, this paper makes an attempt to use an AutoML approach to provide prescriptive analytics in this domain, which serves as a catalyst for SDG goal 6 of the UNGA and also gives a clear awareness of the importance of the mineral content of water to the largest stakeholder, which is every human being.
2 Literature Review
Numerous studies have been conducted in this field with the purpose of better predicting water quality using various AI algorithms or models. Ahmed et al. [5] implemented three different models using Wavelet De-noising Techniques (WDT) to predict the water quality of the Johor River; this WDT-ANFIS (Adaptive Neuro-Fuzzy Inference System) model improved the predictive precision for water quality parameters. Elkiran et al. [6] applied a series of AI models for water quality parameter modelling; the results showed that the neural-based ensemble model improves predictability by up to 14%. Ahmed et al. [7] estimated the WQI most adequately through supervised ML algorithms, including gradient boosting and MLP (multi-layered perceptron), with an accuracy of 85.07%. Lu et al. [8] studied the same using two hybrid decision tree-based models and an advanced denoising method. Chen et al. [9] identified that big data would help in improving the quality of water. Yilma et al. [10], Abbasi et al. [11], and Ehteshami et al. [12] validated their models using ANN and WQI approaches for modelling, respectively. Barzegar and Moghaddam [13] discussed salinity in groundwater as one of the important issues. Considering the Tabriz Plain confined aquifer, the authors compared the results after investigating the accuracy of three different neural networks, viz. MLP (multi-layer perceptron neural network), RBFNN (radial basis function neural network), and a generalized regression neural network (GRNN). According to them, the Committee Neural Network, which combines GRNN, RBFNN, and MLP, performs better than any other artificial neural network. Hussain et al. [14] calculated the health risk factors associated with the intake of impure water; their study was confined to the region of Pakistan and nearby provinces. Several elements of water are absorbed by the body and can lead to chronic diseases. Gao et al. [15] traced the concentrations of metals in water against drinking standards through their study; it was found that arsenic was a dominating pollutant with carcinogenic effects that cannot be neglected. Chatalova et al. [16], on the other hand, highlighted the challenges faced by water-related sectors, viz. agriculture, etc. Havelaar et al. [17] made a comparative analysis by conducting a case study on a hypothetical potable water supply from surface water. Their region of study was confined to the Netherlands, and they used disability-adjusted life-years (DALYs) in order to calculate the positive and negative effects of consuming disinfected water. As a result, the risk of infection with Cryptosporidium parvum was found to be lowered by ozonation of the water. Gutiérrez et al. [18] reported the complexities faced and the immediate need for the incorporation of drinking water legislation. Acharya et al. [19] pressed upon how the quality of rivers and lakes is declining in spite of so much advancement in technology, all because of human activities and sudden climatic changes; their study specifically focussed on rivers and lakes in Nepal. As per Munos et al. [20], it is clear that the availability of clean drinking water is a critical issue worldwide, being one of the major reasons for 1.7 million deaths annually from water-borne diseases like diarrhoea. Various researches have been done using different approaches in order to predict the quality of water before consumption. This paper is a step-forward attempt to explain the amount of elements required for the water to be safe for drinking, using an XAI approach through AutoML, with a higher accuracy of 97.58%. He et al. [21] have also stated that AutoML provides promising results without involving human intervention, and the outcomes are reliable enough to build a deep learning system.
3 Research Design
The research design schema proposed for this paper, as shown in Fig. 1, can be divided into six steps:
The use case for this research [22] is gathered from Kaggle through secondary research, and it is tailored for the purpose of education, research practice, and acquisition of adequate knowledge of the basic mineral composition of water, to conclude whether the water is safe to drink or not. The dataset consists of 8000 data points and 20 features excluding the target variable—safeness of water—which has been categorized into safe and not safe. The independent variables in this dataset are the minerals and their Optical Density (OD) levels, some of which are aluminium, perchlorate, ammonia, nitrate, etc.
Here, a sound and clear understanding of the dataset is required, as it forms the prerequisite for the next stage, pre-processing of the data. A clear understanding of the target variable, which is of binary classification type in this case, is attained in this step. It is a vital stage because it decides the algorithm to be deployed later in the machine learning pipeline.
This step involves the initial cleaning of the dataset, including checking for unusual data patterns, outliers, and missing values, one-hot encoding, and balancing of the target variable using the Synthetic Minority Oversampling Technique (SMOTE), as sketched below.
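A sketch of the SMOTE balancing step, assuming a feature matrix X and binary target y extracted from the cleaned dataset:

from collections import Counter
from imblearn.over_sampling import SMOTE

X_balanced, y_balanced = SMOTE(random_state=42).fit_resample(X, y)
print(Counter(y_balanced))   # both classes are now equally represented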
This stage can be further divided into two steps: Model Building and Model Evaluation.
Before starting this step, it is important to split the dataset into train and test data,
where 80% of the data is allocated for training and the remaining 20% for testing. After
this, the data is fed into the model through an AutoML approach using the PyCaret
machine learning library, which automates the tasks of model selection and hyper-parameter
tuning, imparting the advantage of evaluating and returning a list of models ranked
by accuracy and efficiency over time, across various metrics. The model at the
top of the list for the dataset in this paper is the Light Gradient Boosting Machine
(LGBM), showing an accuracy of 96.58%, which was further improved to 97.58%
after optimization. In other words, out of every 100 predictions made, about 97
correctly indicate whether the minerals in the water are present in optimum quantity
or not. LGBM is a tree-based ML algorithm which makes use of Gradient-based
One-Side Sampling (GOSS) to filter the data. The other ranked models in the list are the
Gradient Boosting Method (GBM), Random Forest Classifier, Decision Tree and AdaBoost
Classifier, respectively, which together form the top 5 suggested models here; a PyCaret
sketch of this step follows.
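A minimal PyCaret sketch of this step might look as follows, assuming the balanced data is in a DataFrame df with target column is_safe (both assumptions carried over from the sketch above):

```python
# AutoML with PyCaret's classification module.
from pycaret.classification import setup, compare_models, tune_model

# setup() configures the pipeline and performs the 80/20 train-test split.
exp = setup(data=df, target="is_safe", train_size=0.8, session_id=42)

# compare_models() trains the candidate classifiers and ranks them by
# accuracy alongside AUC, recall, precision, F1, Kappa and MCC.
top5 = compare_models(n_select=5)

# tune_model() runs a hyper-parameter search on the leader (LGBM here).
best = tune_model(top5[0])
```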
The working efficiency of the models is decided based on various evaluation metrics.
This being a binary classification problem, the evaluation metrics used in this paper are:
a. Confusion Matrix—It is an n × n square matrix, where n represents the count of
classes present in the target variable; the rows of the matrix represent the actual
class and the columns represent the predicted class. The four entries of the matrix
are True Positives (TP), False Negatives (FN), False Positives (FP) and True
Negatives (TN). Parameters that can be derived from the matrix are:
a.1. True Positive Rate and True Negative Rate, which are the ratios of the
correctly predicted positives and negatives, respectively, over the total actual
positives in the case of the true positive rate and the total actual negatives in
the case of the true negative rate. The same can be represented as in
Eqs. (1) and (2):

$$\mathrm{TPR} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \quad (1) \qquad \mathrm{TNR} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} \quad (2)$$
a.2. False Positive Rate and False Negative Rate, which are the ratios of actual
negatives predicted as positives and actual positives predicted as negatives,
respectively, over the total actual negatives in the case of the false positive
rate and the total actual positives in the case of the false negative rate, as
shown in Eqs. (3) and (4), respectively:

$$\mathrm{FPR} = \frac{\mathrm{FP}}{\mathrm{FP} + \mathrm{TN}} \quad (3) \qquad \mathrm{FNR} = \frac{\mathrm{FN}}{\mathrm{FN} + \mathrm{TP}} \quad (4)$$
b. Accuracy Rate—As per Eq. (5), it is the ratio of the sum of correctly predicted
values over the sum of all predicted values, and it forms one of the most important
evaluation metrics for any classification model:

$$\mathrm{Accuracy} = \frac{\mathrm{TP} + \mathrm{TN}}{\mathrm{TP} + \mathrm{TN} + \mathrm{FP} + \mathrm{FN}} \quad (5)$$
c. Precision (P)—As per Eq. (6), it is the ratio of correctly predicted positives over
the sum of all predicted positives. It is used when the model cannot afford many
false positives; hence, avoiding a false positive is more vital than encountering a
false negative:

$$P = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} \quad (6)$$
d. Recall (R)—As per Eq. (7), it is the ratio of correctly predicted positives over
the sum of all actual positives. It is used when the model cannot afford many
false negatives; hence, avoiding a false negative becomes more crucial than
encountering a false positive:

$$R = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FN}} \quad (7)$$
e. F1 Score—As per Eq. (8), it is the harmonic mean of precision and recall, which
indicates the accuracy of the classifier in terms of the number of instances for
which it classifies the classes of the target variable correctly. It is mostly used in
combination with other evaluation metrics:

$$F_1 = \frac{2 \cdot P \cdot R}{P + R} \quad (8)$$
f. AUC—The Area Under the Curve measures the whole area falling under the ROC
curve. It ranges between 0 and 1; the higher the AUC of the model, the better the
model is at distinguishing between the positive and negative classes.
g. Cohen's Kappa Coefficient—As per Eq. (9), it is a measure of the agreement
between a pair of variables (here, the predicted and actual labels), representing the
degree to which the data correctly represent the measured value. It ranges between
0 and 1:

$$\kappa = \frac{t_o - t_e}{1 - t_e} \quad (9)$$

Here, t_o is the observed agreement among the variables and t_e is the hypothetical
probability of chance agreement.
h. Mathew’s Correlation Co-efficient (MCC)—As per Eq. (10), It is the measure
to gauge the effectiveness of the entire confusion matrix with a single value.
Higher the value of MCC, better the model at making the correct prediction.
√
MCC = (T.P ∗ T.N)−(F.P ∗ F.N)/ (T.P + F.P)(T.P + F.N)(T.N + F.P)(T.N + F.N)
(10)
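For reference, the sketch below computes these metrics with scikit-learn on an illustrative hold-out split; y_test and y_pred are assumed names, and AUC is omitted since it requires predicted scores rather than labels.

```python
# Computing the listed metrics from hold-out labels and predictions.
from sklearn.metrics import (accuracy_score, precision_score, recall_score,
                             f1_score, cohen_kappa_score, matthews_corrcoef,
                             confusion_matrix)

tn, fp, fn, tp = confusion_matrix(y_test, y_pred).ravel()
print("TPR:", tp / (tp + fn), "TNR:", tn / (tn + fp))   # Eqs. (1)-(2)
print("Accuracy:", accuracy_score(y_test, y_pred))      # Eq. (5)
print("Precision:", precision_score(y_test, y_pred))    # Eq. (6)
print("Recall:", recall_score(y_test, y_pred))          # Eq. (7)
print("F1:", f1_score(y_test, y_pred))                  # Eq. (8)
print("Kappa:", cohen_kappa_score(y_test, y_pred))      # Eq. (9)
print("MCC:", matthews_corrcoef(y_test, y_pred))        # Eq. (10)
```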
i. Model Deployment—It is the final stage of the machine learning pipeline, where
the trained ML model is taken and made available to users and other systems. It is
beyond the scope of this study.
The process of attaining results begins with Exploratory Data Analysis of the
dataset, followed by various other analyses.
Here, the first step is to analyze all the independent variables with reference to the
target variable, which is 'is_safe' in this case. The corresponding visuals in Fig. 2
comprise violin and Kernel Density Estimate (KDE) plots, which give a bird's-eye view
of the variables and their distribution patterns; this in turn helps to check the skewness
of the data and to detect and remove outliers, if any.
The heat map in Fig. 3 shows the degree to which two variables are associated
with each other. A negative correlation value indicates an inverse relation between
the two variables, while a positive correlation value means a direct relation. A very
high correlation value between any two independent variables can lead to
multi-collinearity, which needs to be removed. In this dataset, no such case of
multi-collinearity was detected.
The most vital feature of PyCaret—that of giving a list of models ranked on the basis
of their accuracy along with other evaluation metrics—was utilized to shortlist the
top 5 models for further optimization.
Table 1 shows the list of the top 5 default and optimized models ranked on the basis
of their accuracies. Other evaluation metrics like AUC, recall, precision, F1 score,
Kappa and MCC are also given, which helps in selecting the top model for the further
steps of obtaining its feature importance and other related results. From Table 1, it
can be concluded that the top model suitable for
this dataset is the Light Gradient Boosting Machine, which has an accuracy of 96.58%
for the default model and 97.58% after optimization. Another evaluation metric of LGBM,
the F1 score after model optimization, is observed to be 0.8831, which proves to be a
good score to move ahead with; it also supports the correctness of the recall and
precision values, as the F1 score is the harmonic mean of both.
Figure 6 shows the discrimination threshold plot for the model, i.e., the certainty or
score at which the positive class is chosen over the negative one, whose ideal value
is 0.5. The threshold value for the above model is 0.47, indicating a near-perfect
balance between the cases.
Figure 7 shows the ROC curve along with its AUC score for the top model. The ROC
curve plots the True Positive Rate on the y-axis against the False Positive Rate on the
x-axis, and the AUC score is a measure of separability indicating how well the model
is able to differentiate between classes. In Fig. 7, the AUC for the ROC of both classes,
0 and 1, is 0.99, which shows that the model is able to predict almost perfectly whether
the water is safe for drinking or not.
Figure 8 reflects the Precision-Recall curve for LGBM, where the average precision
comes out to be 0.96, which shows that the model is good at detecting the positive
values correctly.
Figure 9 shows the learning curve for the classifier, better known as the validation
curve of the model. It is a tool which makes it easy to determine whether bias error
or variance error affects the estimator more.
Explainable AI, also termed interpretable AI, is an approach in which the outcome of
the classifier can easily be interpreted by a human while having a complete understanding
of the entire machine learning path, thus demystifying the black-box machine learning
approach. It is termed a white-box approach as it gives deeper insight into the model,
which can help the various stakeholders make better business decisions and also forms
the base for prescriptive analytics. The model is comprehended via numerous plots like
the feature importance plot, SHAP plots, etc.
The feature importance plot is a feature-based visual in which every feature is
individually assigned a score, called the feature importance score, obtained using
permutation techniques. The level of importance of a feature in the use case is judged
by the rate of change in the model's score when that feature is perturbed or modified.
A large change in the feature importance score indicates not only the higher importance
of that variable for the dataset but also its high degree of association with the target
variable. Beyond this, it can additionally help in dimensionality reduction, thus
shortening the model run time; a permutation-importance sketch is given below.
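As a rough sketch of such a permutation technique (using scikit-learn's permutation_importance rather than PyCaret's internal plot; best and the hold-out split X_test, y_test are the assumed objects from the earlier sketches):

```python
# Permutation importance: shuffle one column at a time and measure the
# drop in hold-out accuracy caused by breaking that feature's signal.
from sklearn.inspection import permutation_importance

result = permutation_importance(best, X_test, y_test, scoring="accuracy",
                                n_repeats=10, random_state=42)

# Report the five features whose shuffling hurts accuracy the most.
ranked = sorted(zip(X_test.columns, result.importances_mean),
                key=lambda pair: -pair[1])
for name, drop in ranked[:5]:
    print(f"{name}: {drop:.4f}")
```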
In Fig. 10, the top five features are perchlorate, ammonia, nitrates, radium and
chloramine, which in this case means that the presence of the above-mentioned minerals
is of utmost importance in declaring water mineral rich and fit for drinking.
Shapley values, a widely used approach from game theory, also provide an edge to
Explainable AI: for each prediction, the input variables' SHAP values sum to the
difference between the current and the expected model output. The SHAP bar plot in
Fig. 11 indicates the sum of the mean SHAP values for both classes of the target
variable, that is, whether the water is safe for drinking or not. For this dataset, the
sum of mean SHAP values is highest for aluminium, coming out to be around 2.7,
followed by cadmium at 2.4. The SHAP scatter plot in Fig. 9 shows the correlation of
the top 2 features obtained from the SHAP bar plot. This indicates that the higher the
average predicted value, the higher the feature's importance and effect on the target
variable; a SHAP sketch follows.
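A minimal SHAP sketch for the tuned model might look as follows, assuming best is the fitted LGBM estimator itself and X_test the hold-out features from the earlier sketches (plot details vary by shap version):

```python
# Global SHAP summary for the tree-based LGBM model.
import shap

# TreeExplainer computes exact SHAP values for tree ensembles.
explainer = shap.TreeExplainer(best)
shap_values = explainer.shap_values(X_test)

# Bar plot of mean |SHAP| per feature, mirroring the paper's bar plot;
# for binary classifiers shap may return one array per class.
shap.summary_plot(shap_values, X_test, plot_type="bar")
```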
In the individual force plot (Fig. 12), red depicts the features which drive the
prediction above the base value, whereas those driving the prediction lower than the
base value are shown in blue. Here, the base value means the average of the estimator's
outputs distributed over the whole input space. The model-predicted output of the LGBM
model, i.e., f(x), comes out to be −4.54, which is much higher than the average
predicted value, i.e., −5.9. This indicates the high contribution of particular features
to a high output value. As per Fig. 12, the contribution of the mineral aluminium as an
independent variable towards the potability of water is 0.07 higher than the average
model-predicted output.
5 Conclusion
This research paper has proposed the Light Gradient Boosting Machine algorithm, as it
was at the top of the accuracy-based leader-board of models suggested by PyCaret, to
conclude whether water is safe for drinking or not. The proposed model, with an
optimized accuracy of 97.58% along with various other evaluation metrics like AUC,
Kappa coefficient, precision, recall and MCC, shows very promising and satisfactory
results, good enough to contribute to this domain and help its various stakeholders.
The top three features obtained from the feature importance plot are perchlorate,
aluminium and ammonia, which shows that these are the most important minerals to be
present in water to make it mineral rich and best fit for drinking. The unique approach
of Explainable AI in this research provides an extra edge by indicating the features
(minerals) which should necessarily be present in water. It does so with the help of
the Shapley bar and correlation plots, which show that aluminium, cadmium and silver
are the minerals whose proper and adequate combination can make water mineral rich
and good enough for drinking. The conclusions made by various organizations and domain
experts working in this area regarding water quality thresholds and the important
minerals match the results shown in this paper. Hence, it can be concluded that machine
learning techniques can aptly be applied to resolve such issues. Therefore, water
utilities and municipal corporations should pay special attention to the presence of
these minerals, not only to make water potable but also to make this most fundamental
human need easily accessible to all in a healthier manner.
References
8. H. Lu, X. Ma, Hybrid decision tree-based machine learning models for short-term water quality
prediction. Chemosphere 249, 126169 (2020)
9. K. Chen, H. Chen, C. Zhou, Y. Huang, X. Qi, R. Shen, F. Liu, M. Zuo, X. Zou, J. Wang, Y.
Zhang, D. Chen, X. Chen, Y. Deng, H. Ren, Comparative analysis of surface water quality
prediction performance and identification of key water parameters using different machine
learning models based on big data. Water Res. 171, 115454 (2020)
10. M. Yilma, Z. Kiflie, A. Windsperger, N. Gessese, Application of artificial neural network in
water quality index prediction: a case study in Little Akaki River, Addis Ababa, Ethiopia.
Model. Earth Syst. Environ. 4, 175–187 (2018)
11. T. Abbasi, S. Abbasi, Water Quality Indices (Elsevier, Oxford, 2012)
12. M. Ehteshami, N. Farahani, S. Tavassoli, Simulation of nitrate contamination in groundwater
using artificial neural networks. Model. Earth Syst. Environ. 2–28 (2016)
13. R. Barzegar, A. Asghari Moghaddam, Combining the advantages of neural networks using
the concept of committee machine in the groundwater salinity prediction. Model. Earth Syst.
Environ. 2–26 (2016)
14. S. Hussain, M. Habib-Ur-Rehman, T. Khanam, A. Sheer, Z. Kebin, Y. Jianjun, Health risk
assessment of different heavy metals dissolved in drinking water. Int. J. Environ. Res. Publ.
Health. 16, 1737 (2019)
15. B. Gao, L. Gao, J. Gao, D. Xu, Q. Wang, K. Sun, Simultaneous evaluations of occurrence and
probabilistic human health risk associated with trace elements in typical drinking water sources
from major river basins in China. Sci. Total Environ. 666, 139–146 (2019)
16. L. Chatalova, N. Djanibekov, T. Gagalyuk, V. Valentinov, The paradox of water management
projects in central Asia: an institutionalist perspective. Water 2017, 300 (2017)
17. A. Havelaar, A. De Hollander, P. Teunis, E. Evers, H. Van Kranen, J. Versteegh, J. Van Koten,
W. Slob, Balancing the risks and benefits of drinking water disinfection: disability adjusted
life-years on the scale. Environ. Health Perspect. 108(4), 315–321 (2000)
18. A. Gómez-Gutiérrez, M. Miralles, I. Corbella, S. Garcia, S. Navarro, X. Lleberia, Drinking
water quality and safety. 63–68 (2016)
19. T. Acharya, A. Subedi, D. Lee, Evaluation of machine learning algorithms for surface water
extraction in a Landsat 8 scene of Nepal. Sensors 19, 2769 (2019)
20. M. Munos, C. Walker, R. Black, The effect of oral rehydration solution and recommended
home fluids on diarrhoea mortality. Int. J. Epidemiol. 39, i75–i87 (2010)
21. X. He, K. Zhao, X. Chu, AutoML: a survey of the state-of-the-art. Knowl.-Based Syst. 212,
106622 (2021)
22. Water quality, https://www.kaggle.com/mssmartypants/water-quality. Accessed 15 July 2021
Prediction of Dynamic Virtual Machine
(VM) Provisioning in Cloud Computing
Using Deep Learning
Abstract The increasing usage of remote services and high marketplace competition
require cloud service providers to plan and provision computing resources
efficiently, while providing affordable services and managing their data center
expenditures. Generally, IaaS cloud resources are managed by predicting either the
long-term workload or the long-term resource utilization pattern. But this does not
give any genuine information about the memory/CPU necessary before a VM is exposed
to the physical machine. So, the prediction of dynamic virtual machine (VM)
provisioning is a challenging problem in cloud computing. In this paper, we explore
the CPU usage details of VMs in the Azure cloud dataset to predict utilization
patterns. The dataset is used to train several deep learning models. Training with
CPU utilization as the target class, we predict the minimum, maximum, and average
CPU utilization values. The results are then analyzed using multiple evaluation
metrics. After evaluating the different models, we conclude that the GRU performs
best in predicting CPU utilization.
B. Padhi
National Institute of Science and Technology, Berhampur, Odisha 761008, India
M. Reza (B)
Department of Mathematics, GITAM University Hyderabad, Hyderabad 502329, India
e-mail: mreza@ieee.org
I. Gupta
Bennett University, Greater Noida, Uttar Pradesh 201310, India
P. S. Nagendra
RVR and JC College of Engineering, Guntur, Andhra Pradesh 522019, India
S. S. Kumar
Saintgits College of Engineering, Kottayam, Kerala 686532, India
1 Introduction
The world has witnessed that cloud computing has emerged as a well-adopted
computing paradigm that offers computing resources such as CPU, memory, server,
network, and platform beyond the geographic boundaries. Cloud services are acces-
sible over the Internet, in a seamless manner from individual to enterprise-level on
pay-per-use basis. However, the pandemic COVID-19 crisis has been like an exam-
ination that no one was prepared for. All the offices, educational institutes, business,
scientific research, entertainment, and even personal work have shifted from personal
interaction to the virtual world. Such a massive shift to the digital domain would have
been an impossible thing a decade ago. It would have been even difficult to imagine
such a thing. It has tested the digital efficiency and ability of online applications to
a more reliable, secure, and dramatic scale tremendous achievement, which is only
possible due to cloud computing. Cloud service providers (CSP) manage their data
centers and provide all services access to their end-users based on the service level
agreement (SLA).
Large companies already have their own cloud servers to handle their day-to-day work.
But smaller companies and academic institutions are now adapting to this new normal
and shifting their work to online platforms for the safety and security of their
stakeholders. These institutions depend on commercially available cloud servers to
fulfill their requirements. For these reasons, the number of online classes and labs,
conference calls, Zoom meetings, etc., has skyrocketed, and millions of new users are
now actively seeking a good cloud provider. Due to this high marketplace competition,
providers face many difficulties in offering attractive features and services while
managing data center expenditures. Cloud service providers now need to handle both
over- and under-utilization of cloud resources to manage the quality and cost of the
service. This over/under-utilization of resources is also referred to as the load and
demand issue. A load on a cloud server can comprise CPU utilization, memory capacity,
network traffic, etc., and load balancing is the process of distributing the workload
among various servers to optimize resource utilization and avoid overloading. To
achieve this load balancing, cloud servers implement various load balancing algorithms
(LBA). Based on the system state, LBAs can be static (e.g., round robin, min-min, and
max-min algorithms), where load balancing is decided at compile time based on
processing power, memory, etc., or dynamic (e.g., ant colony optimization and biased
random sampling algorithms), where load balancing is done at run time based on
different policies and the state of the nodes. In [1], the authors have explained the
working, pros, and cons of these algorithms in detail. Although these algorithms do a
decent job of real-time load balancing in servers, they lack the ability to track
trends and patterns in the workload, which can be useful for further optimization of
resource utilization. Therefore, forecasting virtual machine (VM) provisioning in
cloud servers is now a learning problem.
Cloud resource prediction and VM provisioning have drawn the attention of the
cloud computing research community. Machine
learning (ML) and artificial intelligence (AI) have been applied to predicting cloud
resources, but no existing research work claims definitive cloud resource prediction.
The models developed for this problem can be broadly divided into three groups [2]:
analytical, computational intelligence, and simulation models. The analytical model
follows a greedy approach to reduce search time. Models such as fuzzy logic, neural
networks, genetic algorithms, and multi-agent systems are used in computational
intelligence models. Overdriver, memory-buddies, and VMCtune are used in
simulation-model-based prediction.
Many models have been proposed to deal with the problem of cloud resource prediction
across the data center while fulfilling the SLA parameters. In [2], the authors
proposed a Bayesian model evaluated against the workload patterns of Amazon EC2. The
model can distinguish slow and fast VM resources but is only applicable to compute-
and memory-optimized resources; transaction throughput and latency on underlying
resources, e.g., vCPU cores, have not been considered. In [3, 4], the authors took a
data-centric approach to generating a prediction model for cloud resources, using
both regression models and RNNs. For the regression models, the authors chose to
predict the 95th-percentile CPU usage separately for delay-insensitive and interactive
VMs, and concluded that delay-insensitive VMs are more stable than interactive VMs.
In the RNN model, the authors used LSTM to predict the minimum (min), maximum (max),
and average (avg) CPU usage for a specific VM in the overall data. Much research has
investigated and modeled resource allocation provisioning in different cloud-based
models [4–7]. In [8], the authors reported a cost-saving benefit of dynamic scaling
of cloud resources that does not require any setup time. But the approach was designed
for workload prediction based on past traces and is generally applicable to
auto-scaling only; a limitation of the work is the unorganized historical data for a
specific application domain. Chen et al. [9] proposed an EEMD-ARIMA method for cloud
resource prediction, a short-term method based on ensemble empirical mode
decomposition. They verified the effectiveness of this method and compared the
experimental results of the EEMD-ARIMA method with those of the ARIMA model in terms
of RMSE, MAE, and MAPE.
Cloud resource waste is a main concern of most IT companies around the world that use
public cloud services. Public cloud customers will spend an astonishing amount of more
than 50 billion dollars on IaaS from providers such as AWS, Azure, Google, and VMware
[10]. This boom has arrived due to the comparatively broader adoption of public cloud
services and, within existing accounts, the expansion of the infrastructure. Often,
the growth in spending surpasses the growth in business, primarily because a
significant chunk of what companies spend on cloud resources gets wasted. Cloud
resource waste holds significance not only in terms of resources that are not used
but also in the spending on resources that go largely ignored and unchecked.
Therefore, keeping a check on usage and adequately monitoring it is the need of the
hour, for both small-scale enterprises and large organizations, so that daily tasks
are carried out in a cost-efficient manner. The leading causes of this issue are as
follows. In cloud resource prediction, there are two components, i.e., the actual
computing load
and the maximum computing load. Generally, companies opt for the maximum computing
load to make sure that everything runs smoothly when the need arises to utilize
resources at full capacity. However, such situations do not come up often; on regular
days the consumption requirement is much lower. Idle resources are mainly seen in
development centers, where stages like testing, staging, and various other courses of
action take place. Around 80% of organizations and data centers occupy more server
capacity than they require. They are not only increasing their own budgets but also
proving to be a problem for the service providers. People are not utilizing
pay-per-use cloud services to improve efficiency and utilize resources as per demand.
Although cloud resources are not like fossil fuels, they have their own kinds of
limitations too. Therefore, dynamic cloud resource prediction is a recognized hot
topic of research.
Although there are several techniques available to characterize and predict the
workload, IaaS cloud resources are mostly managed and used efficiently by forecasting
the long-term workload or the long-term resource utilization pattern. Cloud service
providers can now collect metrics from their own infrastructures, analyze them with
proper ML techniques, and effectively enhance performance. The type and nature of the
workload at a public cloud are never fixed; therefore, more capable techniques are
required to forecast the workload on cloud servers. In this paper, we tackle the
above issue. The Microsoft Azure dataset [11, 12] is taken into consideration, various
deep learning techniques are applied to it to predict the future workload, and these
models are then compared to find the best one. After analyzing the whole Microsoft
Azure cloud dataset, we looked deeper into the CPU usage details of specific VMs,
grouped on the basis of their timestamps, to predict future utilization patterns. To
select the best model, several deep learning models like GRU, LSTM, and IndRNN have
been tested, as these models perform well on time series data. The normalized dataset
is used to train the prediction models, with CPU utilization as the target class. We
used Google Colab notebooks to run the entire analysis. The "min CPU," "max CPU," and
"avg CPU" utilization values are predicted using these techniques, and the most
suitable model is selected. The results are evaluated using the mean absolute
percentage error (MAPE), mean absolute error (MAE), and root mean square error (RMSE).
2 Dataset
The Azure dataset represents part of the actual first-party virtual machine (VM)
workload of Microsoft Azure in one region. The first-party workload comprises internal
VMs and first-party VM services. This dataset, released in 2017, consists of 2,013,767
VMs [11] acquired over 5958 Azure subscriptions. The time series data were acquired
over a period of 30 days and contain VM CPU utilization readings for every 5 min along
with a VM information table. In total, 1,246,539,221 VM CPU utilization readings are
available in the dataset. The whole dataset is divided into over 128 files: 1
subscription table, 1 deployment table, 1 VM table, and 125 vm_cpu_readings files.
Characteristics of the dataset include (1) encrypted identification numbers of the
VMs and of the deployment and subscription to which they belong, (2) the VM category,
(3) the VM size in terms of max cores, memory, and disk allocation, and (4) the
minimum, average, and maximum VM resource utilization during each 5-min interval.
3 Methodology
This section describes the data preprocessing, data formatting, data analysis, and
description of the models used in the proposed methodology.
Table 3 Vm_cpu_readings after formatting

Timestamp            Min CPU utilization   Max CPU utilization   Avg CPU utilization
2017-01-01 0:00:00   715,146.5368          2,223,302.433         1,229,569.371
2017-01-01 0:05:00   700,473.8403          2,212,393.246         1,211,321.709
2017-01-01 0:10:00   705,953.5659          2,213,056.745         1,206,634.914
2017-01-01 0:15:00   688,383.0732          2,187,572.239         1,190,368.507
Here, Table 3 shows a few examples of the records after the data are transformed
and grouped, and Fig. 2 shows the min, max, and avg CPU utilization of all the
entries; a grouping sketch is given below.
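A rough pandas sketch of this transformation step, assuming the raw readings follow the public Azure trace schema (the file and column names below are illustrative assumptions):

```python
# Aggregate per-VM readings into one row per 5-min timestamp, as in Table 3.
import pandas as pd

cols = ["timestamp", "vm_id", "min_cpu", "max_cpu", "avg_cpu"]
readings = pd.read_csv("vm_cpu_readings-file-1-of-125.csv", names=cols)

grouped = (readings.groupby("timestamp")[["min_cpu", "max_cpu", "avg_cpu"]]
                   .sum()           # totals across all VMs per timestamp
                   .sort_index())
print(grouped.head())
```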
Now, the vm_cpu_readings data have been properly formatted. This is time
series/sequential data, and recurrent neural network (RNN) models are best known for
analyzing sequential data. So, we use three RNN variants for the prediction model:
• Long short-term memory (LSTM)
• Gated recurrent unit (GRU)
• Independently recurrent neural network (IndRNN).
Our proposed technique for the analysis is shown in Fig. 3.
Initially, the dataset is divided into a training set and a test set (generally in an
80-20 ratio). Then, the look back value is set; the inputs and outputs to the model
are determined by this value. For example, if look back = i, then entries 0 to i−1 of
the file form the first input and the ith entry is the first output; similarly,
entries i to 2i−1 form the second input and the 2ith entry is the second output (a
windowing sketch is given below). After dividing the training and test sets into
inputs and outputs, a model is chosen and its parameters are initialized. The
optimizer and the number of epochs are defined, and the model is trained and validated
on the training set. Then, the trained model is tested on the test set and evaluated
using various evaluation metrics. This process is repeated for all the models.
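A minimal sketch of this look-back windowing on one utilization column (the array name and look-back value are illustrative):

```python
# Build (look_back-long input, next-value output) pairs from a 1-D series.
import numpy as np

def make_windows(series, look_back):
    X, y = [], []
    for start in range(0, len(series) - look_back, look_back):
        X.append(series[start:start + look_back])   # entries start..start+i-1
        y.append(series[start + look_back])         # the following entry
    return np.array(X), np.array(y)

series = grouped["avg_cpu"].to_numpy()   # from the grouping sketch above
X, y = make_windows(series, look_back=12)
print(X.shape, y.shape)                  # (n_windows, 12), (n_windows,)
```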
The deep learning models used in our proposed methodology, i.e., LSTM, GRU, and
IndRNN, are imported from the Keras library in Python. The optimizer is the Adam
optimizer, the loss function is the mean squared error for all the models, and each
model is trained for 20 epochs. Table 4 provides the summaries of the models; a GRU
sketch is given below.
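As an illustration, a minimal Keras version of the GRU variant might look as follows (the layer width is an assumption; IndRNN is not shipped with Keras and would need a third-party implementation):

```python
# A GRU regressor over the windows built above (look_back = 12, 1 feature).
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import GRU, Dense

model = Sequential([
    GRU(64, input_shape=(12, 1)),   # (timesteps, features)
    Dense(1),                       # next CPU-utilization value
])
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(X.reshape(-1, 12, 1), y, epochs=20, validation_split=0.1)
```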
4 Result
We have used three different evaluation metrics for the evaluation of the models,
which are as follows.
RMSE is the most commonly used metric for regression models, and it is defined as the
square root of the mean squared difference between the target value and the predicted
value, as shown in Eq. (1):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{j=1}^{n}\left(y_j - \hat{y}_j\right)^2} \quad (1)$$
The MAE is more robust to outliers and does not penalize errors as heavily as the
mean squared error. The mathematical formula for MAE is shown in Eq. (2):

$$\mathrm{MAE} = \frac{1}{n}\sum_{j=1}^{n}\left|y_j - \hat{y}_j\right| \quad (2)$$
Fig. 4 Prediction of min, max, and avg. CPU workload using GRU
5 Conclusion
After testing all the models, the GRU displayed the best result; it is also faster
than LSTM and IndRNN. So, this model can be used by CSPs to optimize their resources
to the benefit of both cloud users and cloud providers.
A limitation of the model is that it is sensitive to small datasets. But service
providers constantly track their resource usage logs, which can be used to generate
large datasets on workload and CPU utilization. So, when bulk data on cloud server
utilization becomes available, this model can be modified and trained accordingly to
provide an optimum result.
References
1. T. Deepa, D. Cheelu, A comparative study of static and dynamic load balancing algorithms in
cloud computing, in 2017 International Conference on Energy, Communication, Data Analytics
and Soft Computing (ICECDS) (2017), pp. 3375–3378
2. G.K. Shyam, S.S. Manvi, Virtual resource prediction in cloud environment: a Bayesian
approach. J. Netw. Comput. Appl. 65, 144–154 (2016). ISSN 1084-8045. https://doi.org/10.1016/j.jnca.2016.03.002
3. M. Hariharasubramanian, Improving application infrastructure provisioning using resource
usage predictions from cloud metric data analysis. https://doi.org/10.7282/t3-y8e4-5v69
4. R. Moreno-Vozmediano, R.S. Montero, E. Huedo et al., Efficient resource provisioning for
elastic cloud services based on machine learning techniques. J. Cloud. Comput. 8, 5 (2019).
https://doi.org/10.1186/s13677-019-0128-9
5. B. Sotomayor, R.S. Montero, I.M. Llorente, I. Foster, Resource leasing and the art of
suspending virtual machines, in 2009 11th IEEE International Conference on High Performance
Computing and Communications (2009), pp. 59–68. https://doi.org/10.1109/HPCC.2009.17
6. B. Sotomayor, K. Keahey, I. Foster, Combining batch execution and leasing using virtual
machines, in Proceedings of the 17th International Symposium on High Performance
Distributed Computing. ACM: USA, 2008, pp. 87–96
7. C. Li, L.Y. Li, Optimal resource provisioning for cloud computing. J. Supercomput. 62(2),
989–1022 (2012)
8. E. Caron, F. Desprez, A. Muresan, Pattern matching based forecast of non-periodic repetitive
behavior for cloud clients. J. Grid Comput. 9(1), 49–64 (2011)
9. J. Chen, Y. Wang, A resource demand prediction method based on EEMD in cloud computing.
Procedia Comput. Sci. 131, 116–123 (2018)
10. https://www.gartner.com/en/documents/3982411. Accessed on 24 April 2021
11. Azure Public Dataset: https://github.com/Azure/AzurePublicDataset. Accessed on 25 April
2021
12. E. Cortez, A. Bonde, A. Muzio, M. Russinovich, M. Fontoura, R. Bianchini, Resource central:
understanding and predicting workloads for improved resource management in large cloud
platforms, in Proceedings of the 26th Symposium on Operating Systems Principles (SOSP '17)
(Association for Computing Machinery, New York, NY, USA, 2017), pp. 153–167.
https://doi.org/10.1145/3132747.3132772
Explainability of Deep Learning-Based
System in Health Care
Abstract Ocular disease is an eye disease that reduces the eye's ability to work
normally. Early ocular disease detection is important to avoid blindness caused by
diseases like cataracts, glaucoma, diabetes, age-related macular degeneration (AMD),
etc. Artificial intelligence (AI) techniques have been used to build systems for the
speedy diagnosis of such diseases. In recent years, the deep neural network (DNN) has
shown remarkable success in this area. But the black-box nature of such systems has
raised questions about the use of DNNs in high-risk systems like health care.
Explainable AI (XAI) is a suite of methods and techniques that provides explanations
of predictions made by AI systems; this helps achieve accountability, transparency,
and debugging of the model in the healthcare domain. In this paper, we propose an
ocular disease classification model and an XAI method that can be used to explain the
classification of eye diseases from eye fundus images.
1 Introduction
Ocular disease is one of the eye diseases that affect the normal functioning of the
eye and reduce visibility. Ocular disease detected early is curable, and for the
prevention of vision loss or blindness, regular eye checkups are mandatory. According
to the World Health Organization (WHO) [1], currently 2.2 billion people in the world
are facing vision impairment, and of these, 1 billion people face near or distance
vision impairment that could have been prevented. Eyesight reduction can have lifelong
effects on work, daily activities, and health status. Automatic detection of disease
is critical
and can help prevent vision loss. Ocular surface disease is damage to the surface
layers of the eye, the cornea and the conjunctiva. Ocular disorders can happen at any
age. There are dozens of ocular disorders; some may just be an infection, while others
may be more serious and lead to vision loss or even blindness. Some of the common
ocular disorders are refractive errors, cataracts, diabetic retinopathy, glaucoma, and
myopia or nearsightedness. Early and accurate detection of disease prevents blindness.
Hence, several researchers have developed an interest in the field of automatic
detection of ocular pathologies. With the recent advancement in deep learning
technologies, AI-based methods have provided high performance in disease detection
based on image classification. Machine learning-based models can hence prove very
effective in preventing blindness caused by cataracts, age-related macular
degeneration, diabetes, glaucoma, etc.
Machine learning (ML) has shown exponential growth in recent years. ML plays a key
role in applications like social media (sentiment analysis, spam filtering), transport
(autonomous vehicles, air traffic control safety monitoring), financial services
(fraud detection, algorithmic trading, portfolio management), health care (disease
diagnosis, drug discovery, robotic surgery), eCommerce (product recommendation,
customer support, advertising), virtual assistance (intelligent agents, natural
language processing), and many more. Machine learning's sub-field, deep learning, is
growing and replacing human beings in many places, and in some places it has
outperformed human beings as well. Some of the applications where deep learning has
excelled in vision, text, and speech include light-duty autonomous vehicles getting
approval to ply in California, robots taking over warehouses at Amazon, Google AI
beating human players at the strategy game StarCraft, AI helping doctors identify
cancer cells, GANs generating realistic human faces, etc.
The inherent structure of deep learning algorithms involves nested layers of
nonlinear neurons that make highly accurate and successful models in their
predictions. However, these models are black box in nature, as they lack transparency
in providing the exact cause of their predictions. Due to these limitations, despite
high accuracy, the effectiveness and adoption of these systems are very low; the lack
of transparency poses a major hurdle for adoption in critical domains like autonomous
driving, medical applications, military, and legal, to name a few. Explainable machine
learning (XAI) [2] or interpretable machine learning (IML) [3] programs comprise a
suite of machine learning methodologies that help develop more explainable models
without impacting properties like the high accuracy of the models. Moreover, they help
human users understand the predictions, trust the results, and effectively deploy
these systems for wider adoption.
An explanation is a mechanism that helps verify decisions made by an AI model. For
example, an explainer for a cancer detection AI model using microscopic images
generates an explanation that maps pixels from the input image to the output
prediction. Similarly, an explainer for a speech recognition AI model identifies the
specific time slice of the power spectrum from the input audio that contributed most
toward the output.
2 Related Work
In this section, we cover some basic terminology used in XAI, some properties of
XAI algorithms, and the business benefits of XAI.
Below are some of the most important reasons why a business should adopt XAI for
its AI-based solutions.
• Model performance: Model or dataset bias is easier to understand when we know
how the model works and arrives at a particular decision.
• Decision making: Stakeholders can make better sales strategies if the reasoning
behind the predictions made by the model is known.
• Control: XAI can help to ensure that the data without permission cannot be used
for analysis.
• Trust: XAI can help to build trust by providing interpretable models.
• Ethics: XAI helps identify biases in the model and hence makes it ethical for
adoption in production.
• Accountability: Clarifies who is accountable for an AI system's decisions.
• Regulations: Ensures governance, accuracy, transparency, and explainability are
high for the AI model.
The author in [8] has used a CNN model to classify hard exudates based on the central
pixel. In another work [9], researchers used an ImageNet pre-trained DCNN model to
detect AMD; during preprocessing, image cropping and resizing are done, and the images
are then used for classification into early/intermediate or intermediate/advanced
stages. In the paper [10], the author has used a 24-layer CNN with batch normalization
and max-pooling to classify the disease. In [11], an 18-layer CNN model is proposed
for glaucoma detection. Tan et al. [12] have proposed the diagnosis of AMD
at an early stage using a fourteen-layer deep CNN model. Diabetic retinopathy lesions
(exudates, hemorrhages, microaneurysms) can be segmented through a ten-layer CNN, as
shared in the work by Tan et al. [13]. Kwasigroch et al. [14] proposed a method that
can classify five classes of diabetic retinopathy (DR) using a DL network. Chai et
al. [15] detect glaucoma using several CNN models: one directly ingesting fundus
images, a second obtaining the optic disk region using Faster-RCNN, and a third
segmenting the disk area, cup area, and parapapillary atrophy (PPA) area. Dai et al.
[16] identified microaneurysm candidates through automatic image-to-text mapping.
Grassmann et al. [17] proposed a method for nine classes of AMD disease in which a
random forest algorithm is trained on the results provided by CNNs. Khojasteh et al.
[18] detect exudates using pre-trained residual networks (ResNet-50). Jain et al. [19]
use a DL method to classify healthy and diseased retinal fundus images, tested on two
datasets, one from Friedrich-Alexander University and the other from a local hospital
in Bangalore, with accuracies of 96.5–99.7%. In most of the research work, DL is used
to classify ocular disease without providing any explanation of which part or feature
of the eye image contributed to a specific classification. We propose a model for
ocular disease classification and explain its results by applying an XAI method.
In this section, we describe the details of the CNN architecture used for ocular
disease classification. We have used eye images from the ocular disease recognition
dataset [20] to train the model. We also share the complete procedure for building
the solution, which is divided into three major steps discussed in subsequent
sections: dataset preparation, training and evaluating the model, and analysis of the
results. Figure 1 shows the overall system, where the labeled ocular disease
recognition dataset is first processed for resizing and then the images are augmented.
The pre-trained model is then trained using this dataset, and finally the results are
analyzed using the explainable method. The explanations provided by the XAI method
are used for validation with a subject matter expert. If the explanations indicate
erroneous model predictions, then appropriate steps are taken to improve the model.
For our experiments, we use the ocular disease intelligent recognition (ODIR) dataset
[20], which contains color fundus pictures of the left and right eyes of approx. 6000
patients. The data also contain metadata about the age of the patient
and diagnostic keywords from doctors. It is one of the biggest datasets available in
this area, collected by Shanggong Medical Technology Co. Ltd. from various
hospitals/medical centers in China, and contains real patient information. Since the
data were collected by different medical institutions using cameras from different
vendors, the image size/resolution varies a lot. Under the supervision of trained
human readers, the images are annotated into eight classes, viz. normal (N), diabetes
(D), glaucoma (G), cataract (C), AMD (A), hypertension (H), myopia (M), and other
diseases/abnormalities (O). Figure 2 shows the distribution of data across the
various classes.
After exploring the dataset, some of the observed challenges in the ODIR dataset were:
• The data is highly unbalanced. A large proportion of the dataset consists of normal
(N) and diabetes (D) images, while occurrences of the other disease images are very few.
• Multi-label disease images are also part of this dataset; the other
diseases/abnormalities (O) class contains images of several different eye diseases.
• The dataset contains high-resolution images of around 2976 × 2976 or 2592 ×
1728 pixels, which take a longer time for the model's training.
The data preprocessing steps required to create a valid dataset are (see the sketch
after this list):
• Random zoom, random rotation, flip left-right, flip top-bottom, etc. are applied to
the original images, as shown in Fig. 3, to reduce the data imbalance for certain classes.
• Labeling is done by renaming image files based on the diagnostic keywords.
• To reduce the training time, images are resized to 250 × 250 pixels.
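A minimal Keras sketch of these preprocessing steps (the directory layout and parameter values are illustrative assumptions, not the paper's exact pipeline):

```python
# On-the-fly augmentation and resizing for the ODIR images.
from tensorflow.keras.preprocessing.image import ImageDataGenerator

augmenter = ImageDataGenerator(
    rotation_range=20,        # random rotation
    zoom_range=0.15,          # random zoom
    horizontal_flip=True,     # flip left-right
    vertical_flip=True,       # flip top-bottom
    rescale=1.0 / 255,
)

# Labels come from sub-directory names produced by the renaming step;
# every image is resized to 250 x 250 as it is read.
train_gen = augmenter.flow_from_directory(
    "odir_train/", target_size=(250, 250),
    batch_size=32, class_mode="categorical",
)
```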
Deep learning-based approaches like the convolutional neural network (CNN) have shown
the highest accuracy. A CNN model consists of many layers that help train on a large
dataset and learn its features correctly. In recent years, many researchers have used
CNNs in all areas of classification, including health care. The network learns to
extract features by itself during the training phase, without hand-engineered feature
extraction. In this work, we have used a CNN architecture (Fig. 4) with an input layer
that takes 250 × 250 RGB images. The first two 2D convolutional layers extract
features from the input images, followed by the ReLU activation function. To reduce
the spatial size of the input representation, max-pooling layers are added, and to
avoid overfitting, two dropout layers are added. Finally, a dense layer of size 8 is
added to map each class of the data; a sketch of such an architecture is given below.
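A minimal Keras sketch of an architecture of this shape (filter counts and dropout rates are illustrative assumptions; the learning rate matches the value reported in the training description below):

```python
# A small CNN for 250 x 250 RGB fundus images with 8 output classes.
from tensorflow.keras import layers, models, optimizers

model = models.Sequential([
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(250, 250, 3)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),             # shrink spatial size
    layers.Dropout(0.25),                    # first dropout against overfitting
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.5),                     # second dropout
    layers.Dense(8, activation="softmax"),   # one unit per disease class
])
model.compile(optimizer=optimizers.Adam(learning_rate=1e-5),
              loss="categorical_crossentropy", metrics=["accuracy"])
model.fit(train_gen, epochs=20)              # generator from the sketch above
```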
The model is trained for 20 epochs with a learning rate of 0.00001 and a batch size
of 32. The CNN model is trained and tested in the Python programming language with
the Keras deep learning framework. All images are resized to a specific size using
the OpenCV tool before training. Evaluation metrics like accuracy, precision, recall,
sensitivity, and specificity are used to check the performance of this model, as
shown in Table 2. Figure 5 shows the training and validation loss and accuracy values
with respect to the number of epochs. We validate the predictions made by this model
using XAI methods.
Fig. 5 Training and validation loss and accuracy vs number of the epoch
Some of the most commonly used model-agnostic post-hoc methods to explain local and
global predictions of any DNN model are Local Interpretable Model-Agnostic
Explanations (LIME) [21], Anchors: High-Precision Model-Agnostic Explanations [22],
and Shapley Additive Explanations (SHAP) [23]. These are perturbation-based XAI
techniques, while Saliency Maps [24], Gradient class activation mapping (Grad-CAM)
[25], Grad-CAM++ [26], and Layer-wise Relevance Propagation (LRP) [27] are
gradient-based explainability methods.
Figure 6 shows comparative results from some of the (gradient-based and
perturbation-based) methods when applied to the ImageNet dataset [28].
We have used LIME for explaining the classifications of the model. LIME highlights
the areas which contribute toward the classification. The model's predictions and the
image data are given as input to the XAI method to generate visual explanations.
These explanations can then be analyzed with the help of an eye specialist, and this
analysis helps validate the predictions.
LIME is a model-agnostic post-hoc method that works on text, image, and tabular data.
It perturbs the test observation to create a local linear model. The output of LIME
is feature importance, i.e., a list of explanations reflecting the contribution of
each feature to the prediction of a data sample. The overall decision boundary is
complex, but in the neighborhood of a single decision, the boundary is simple. A
single decision can therefore be explained by auditing the black box around the given
instance and learning a local decision.
Snippet 1 shows the high-level LIME algorithm, and Fig. 7 depicts the application of
LIME to the image classification model. The algorithm creates random segments of the
input image using a segmentation algorithm (e.g., quickshift). The image is then
perturbed randomly by masking different segments. Each perturbed image is fed to the
original model to calculate the class prediction, along with the cosine similarity
between the original and the perturbed image. Using these neighboring samples and
their outcomes, a linear regression model is generated, which is then used to predict
the local outcome and mask the original image with the corresponding segments to
generate the required explanation; a code sketch follows.
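A minimal sketch with the lime package (model is the trained CNN above and image a single 250 × 250 × 3 array scaled to [0, 1]; both names are illustrative):

```python
# Explain one fundus-image prediction with LIME.
import numpy as np
from lime import lime_image
from skimage.segmentation import mark_boundaries

explainer = lime_image.LimeImageExplainer()
explanation = explainer.explain_instance(
    image.astype(np.double),
    model.predict,            # classifier_fn returning class probabilities
    top_labels=1, num_samples=1000,
)

# Keep only the segments that push the top class up (the green areas
# in Fig. 8) and draw their boundaries on the image.
temp, mask = explanation.get_image_and_mask(
    explanation.top_labels[0], positive_only=True, num_features=5)
overlay = mark_boundaries(temp, mask)
```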
Fig. 8 Different class predictions (rows 1 and 3) by the model and their corresponding
LIME explanations (rows 2 and 4). (Green portions indicate areas contributing toward
the prediction.)
The stricter regulations from the European General Data Protection Regulation (GDPR)
mandate that results produced by AI systems be transparent in terms of providing
explanations behind their predictions. In the healthcare domain, where the risk of
erroneous prediction is high, it is extremely important to deploy XAI methods to make
AI systems transparent and traceable. XAI methods need continuous development to
enable better improvements in AI-based systems.
Through this work, we have introduced XAI and the basic terminologies used in XAI.
Prior work in ocular disease detection has been limited to the development of DL
models and their evaluation based on prediction accuracy. In our work, we have applied
an XAI method to the ocular disease identification AI model to explain the
reasoning behind predictions made by the system and analyzed the results obtained
from the XAI method.
Future work involves applying more explainers to our model and comparing their
explanations. Another area to explore is using different CNN model architectures to
compare the prediction results on ocular data and comparing their results using one
of the explainers.
References
15. Y. Chai, H. Liu, J. Xu, Glaucoma diagnosis based on both hidden features and domain
knowledge through deep learning models. Knowl.-Based Syst. 161, 147–156 (2018).
https://doi.org/10.1016/j.knosys.2018.07.043
16. L. Dai, R. Fang, H. Li, X. Hou, B. Sheng, Q. Wu, W. Jia, Clinical report guided retinal
microaneurysm detection with multi-sieving deep learning. IEEE Trans. Med. Imaging 37(5),
1149–1161 (2018). https://doi.org/10.1109/tmi.2018.2794988
17. F. Grassmann, J. Mengelkamp, C. Brandl, S. Harsch, M.E. Zimmermann, B. Linkohr,
A. Peters, I.M. Heid, C. Palm, B.H. Weber, A deep learning algorithm for prediction of
age-related eye disease study severity scale for age-related macular degeneration from
color fundus photography (2018). https://doi.org/10.1016/j.ophtha.2018.02.037
18. P. Khojasteh, L.A.P. Júnior, T. Carvalho, E. Rezende, B. Aliahmad, J.P. Papa, D.K. Kumar,
Exudate detection in fundus images using deeply-learnable features. Comput. Biol. Med. 104,
62–69 (2019). https://doi.org/10.1016/j.compbiomed.2018.10.031
19. L. Jain, H.V.S. Murthy, C. Patel, D. Bansal, Retinal eye disease detection using deep
learning, in 2018 Fourteenth International Conference on Information Processing (ICINPRO)
(2018), pp. 1–6. https://doi.org/10.1109/ICINPRO43533.2018.9096838
20. Larxel, Ocular disease recognition (2020). https://www.kaggle.com/andrewmvd/ocular-disease-recognition-odir5k. Accessed 20 April 2021
21. M.T. Ribeiro, S. Singh, C. Guestrin, "Why should I trust you?": explaining the predictions
of any classifier, in Proceedings of the 22nd ACM SIGKDD International Conference on
Knowledge Discovery and Data Mining (KDD '16) (ACM Press, 2016), pp. 1135–1144
22. M.T. Ribeiro, S. Singh, C. Guestrin, Anchors: high-precision model-agnostic explanations,
in Proceedings of the AAAI Conference on Artificial Intelligence 32(1) (2018)
23. S.M. Lundberg, S.I. Lee, A unified approach to interpreting model predictions, in Advances
in Neural Information Processing Systems (2017), pp. 4765–4774
24. K. Simonyan, A. Vedaldi, A. Zisserman, Deep inside convolutional networks: visualising
image classification models and saliency maps. CoRR abs/1312.6034 (2013)
25. R.R. Selvaraju, M. Cogswell, A. Das, R. Vedantam, D. Parikh, D. Batra, Grad-CAM: visual
explanations from deep networks via gradient-based localization, in Proceedings of the IEEE
International Conference on Computer Vision (ICCV) (2017), pp. 618–626.
https://doi.org/10.1109/ICCV.2017.74
26. A. Chattopadhay, A. Sarkar, P. Howlader, V.N. Balasubramanian, Grad-CAM++: generalized
gradient-based visual explanations for deep convolutional networks, in Proceedings of the
2018 IEEE Winter Conference on Applications of Computer Vision (WACV) (2018)
27. S. Bach, A. Binder, G. Montavon, F. Klauschen, K.R. Müller, W. Samek, On pixel-wise
explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS
ONE 10(7), e0130140 (2015). https://doi.org/10.1371/journal.pone.0130140
28. A. Das, P. Rad, Opportunities and challenges in explainable artificial intelligence (XAI):
a survey. ArXiv abs/2006.11371 (2020)
A Hybrid MSVM COVID-19 Image
Classification Enhanced with Swarm
Feature Optimization
Abstract COVID-19 (novel coronavirus disease) is a serious illness that has killed
millions of people and affected millions more around the world. Therefore, technologies
that enable both the rapid and accurate detection of COVID-19 illness will provide
much assistance to healthcare practitioners. A machine learning-based approach is
used here for the identification of COVID-19. In general, artificial intelligence
(AI) approaches have yielded positive outcomes in healthcare visual processing and
analysis. Chest X-ray (CXR) imaging is the digital image processing modality that
plays a significant role in the analysis of corona disease. In this research article,
at the initial phase of the process, a median filter is used for noise reduction in
the image. Edge detection is an essential step in the process of COVID-19 detection,
and the Canny edge detector is implemented for the detection of edges in the CXR
images. The principal component analysis (PCA) method is implemented for the feature
extraction phase, through which multiple features are extracted. The essential
features are then optimized using the particle swarm optimization (PSO) technique.
For the recognition of COVID-19 through CXR images, a hybrid multi-class support
vector machine (MSVM) technique is implemented. The proposed system has achieved an
accuracy of 97.51%, a specificity (SP) of 97.49%, and a sensitivity (SN) of 98.0%.
1 Introduction
Due to the outbreak of an incurable illness throughout China in late 2019, numerous
persons linked to a regional supermarket fell ill. The illness was initially
unidentified; however, experts identified its indications as being comparable with
coronavirus illness and influenza. The precise cause of the COVID-19 infectious
disease was undetermined at first; however, after a scientific inspection and
evaluation of significant samples using a legitimate PCR test, the latent infection
was identified and termed "COVID-19" on the World Health Organization (WHO)'s
suggestion [1]. The COVID-19 outbreak spread quickly across international borders,
wreaking havoc on patients' quality of life, the economy, and well-being worldwide
[2]. According to Worldometers data recorded in mid-July 2021, upwards of 86 million
people across the world have been infected with COVID and more than 1,870,000
individuals have died as a result of the illness. CXR images are utilized to provide
internal information about body parts. X-ray images are produced by electromagnetic
radiation and show the internal parts of the body [3]. COVID-19 can also be detected
through CXR images. Chest X-ray images provide an internal view of the chest, from
which the virus's effects can be detected easily. An example of a chest X-ray image
is shown in Fig. 1.
The WHO declared coronavirus infection 2019 a pandemic on March 11th, 2020, owing to
the rapid and widespread transmission of the virus [4]. The first case was reported
in Wuhan City, China, on 31st December 2019. The COVID-19 epidemiologic agent was
identified and isolated as a new coronavirus, which was given the name 2019-nCoV at
first [5]. The viral genome was ultimately sequenced, and the International Committee
on Taxonomy of Viruses dubbed it severe acute respiratory syndrome coronavirus 2
(SARS-CoV-2), since it was morphologically close to the coronavirus that caused the
SARS spread in 2003.
In this section, we have mentioned three different types of COVID-19 tests such
as polymerase chain reaction (PCR) test, COVID-19 antigen tests, and COVID-19
antibody test.
PCR testing is used to examine for the presence of viral RNA in the
organism, which can be detected before antibodies are formed or signs of the disease
appear. This indicates that the testing can detect whether or not someone is infected
with the virus at an early stage of sickness [7].
Any foreign material in the body that causes an antibody
response is referred to as an antigen; COVID-19 antigen tests look for such viral antigens [8]. This examination aids in the identification of antigens associated with the COVID-19 disease. Antigen testing, often
referred to as antigen detection testing, is a type of diagnostic test that provides results
more quickly than molecular assays. However, there is a disadvantage: antigen
testing is more likely to miss an active infection.
Common symptoms include shortness of breath, lung infection, cough, and cold. There are various techniques available for the detection of COVID-19, but some methods still have problems such as (i) imbalanced data and classification issues, (ii) multi-class or multi-label and hierarchical classification issues, and (iii) issues in feature flattening and a high false negative rate. For example, a deep learning model based on a convolutional neural network (CNN), known as decompose, transfer, and compose (DeTraC), has been used to categorize COVID-19 CXR images; this model is reviewed in detail in Sect. 2.
This paper is organized into various sections: Sect. 2 includes a detailed survey
of existing methods for COVID-19 disease detection. The research methodology is
explained in Sect. 3. Section 4 discusses the dataset description, simulation result
analysis, and result discussion. The conclusion and further work are given in Sect. 5.
2 Literature Review
Abbas et al. [9] designed a deep learning-based model built on a convolutional
neural network (CNN), known as DeTraC, that was utilized to categorize
COVID-19 CXR images. Class decomposition was used to discover class boundaries
and to handle non-identical datasets. DeTraC was established
by utilizing several pre-trained CNN models, among which the highest accuracy rate was
achieved with VGG19. The outcomes proved the ability of DeTraC to recognize
COVID-19 test cases from a descriptive image dataset gathered from various
health centers globally. DeTraC attained a maximum accuracy of
93% and 100% sensitivity in distinguishing COVID-19 chest X-ray (CXR) images
from normal cases and respiratory lung problem cases. Raajan et al. [10] suggested
a strategy that used the ResNet architecture and convolution neural network for
training the pictures provided by the CT scan to efficiently diagnose coronavirus-
affected individuals. The infected person was correctly determined by comparing the
training and testing files. On the public dataset based on computed tomography, the
accuracy and specificity were 95.09% and 81.89%, respectively. The results were
taken alone, without incorporating other statistics such as specific regions or
population density. Based on the findings, it was clear that the proposed approach
would accurately classify corona-positive patients. To classify COVID-19
chest X-ray frames, Zebin et al. [11] used a matrix factorization approach. The
researchers employed two openly accessible chest X-ray datasets. The classification
method distinguishes between infections in the lungs caused by corona and those caused by other viral pneumonia, as well as normal lungs.
3 Proposed Methodology
Fig. 2 Flowchart of the proposed methodology: upload the (chest X-ray) images → resize the input image → convert the 3D image to 2D → add artificial noise (salt & pepper) → smooth the image (median filter) → calculate regions/edges (Canny) → feature extraction using PCA → optimized feature classification using PSO + MSVM
The proposed work targets the detection/classification of COVID-19 CXR images. For this, a new hybrid
approach is used to enhance the accuracy rate, time consumption, and other parameters. The approach uses machine learning concepts, which follow
two phases: a training and a testing process. Figure 2 is the flowchart of the proposed
methodology, and the detailed steps are described below:
This proposed work obtains its dataset from an online source: the Kaggle COVID-19 Radiography Database [14]. The dataset is a collection of chest X-ray (CXR) images of different types, namely viral pneumonia, COVID, and normal X-ray images, with 133 images per class. Initially, the CXR medical images are uploaded and resized, with intensities set to the 0–255 range. The 3D image is then converted to a 2D image, which reduces the dimensional size of the input image; any unwanted noise remains in the converted image. In pre-processing, the median filter method is implemented to remove the unwanted noise from the input image. After filtration, the ROI component is detected using the Canny edge detection method. A minimal sketch of this pre-processing stage is given below.
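As an illustration of these pre-processing steps, the following Python/OpenCV sketch applies resizing, grayscale conversion, median filtering, and Canny edge detection. The synthetic input, image size, filter kernel, and Canny thresholds are illustrative assumptions, not values reported by the authors (whose implementation is in MATLAB).

```python
# Sketch of the pre-processing stage: resize, 3D-to-2D (grayscale)
# conversion, median filtering, and Canny edge detection.
import cv2
import numpy as np

# Synthetic stand-in for a CXR image (replace with cv2.imread on real data)
img = (np.random.rand(512, 512, 3) * 255).astype(np.uint8)

# Resize and convert the 3-channel (3D) image to a 2D grayscale image
img = cv2.resize(img, (256, 256))
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

# Median filter removes salt-and-pepper noise while preserving edges
smooth = cv2.medianBlur(gray, ksize=3)

# Canny edge detector marks the boundaries of the regions of interest (ROI)
edges = cv2.Canny(smooth, threshold1=50, threshold2=150)

cv2.imwrite("cxr_edges.png", edges)
```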
This phase implements the principal component analysis (PCA) method for feature
extraction. PCA is a dimensionality reduction method: it searches for the eigenvectors V of a covariance matrix with the maximum eigenvalues E and then utilizes
those to project the CXR image data into a novel sub-space of equal or lower
dimension. The PCA steps are: (i) convert the input image into a matrix (rows,
columns), (ii) calculate the mean value, (iii) calculate the covariance matrix (Co), (iv)
calculate the eigenvalues E and eigenvectors V, and (v) sort E in decreasing
order to rank the corresponding V. These steps are sketched in code below.
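A minimal NumPy sketch of steps (i)–(v), assuming a 2D grayscale CXR image as input; the component count and the dummy image are illustrative placeholders.

```python
# Minimal NumPy sketch of the PCA feature-extraction steps (i)-(v) above.
import numpy as np

def pca_features(image, n_components=10):
    X = image.astype(np.float64)            # (i) image as a matrix (rows, columns)
    X_centered = X - X.mean(axis=0)         # (ii) subtract the column-wise mean
    Co = np.cov(X_centered, rowvar=False)   # (iii) covariance matrix
    E, V = np.linalg.eigh(Co)               # (iv) eigenvalues E and eigenvectors V
    order = np.argsort(E)[::-1]             # (v) sort E in decreasing order
    V_top = V[:, order[:n_components]]      #     keep the top-ranked eigenvectors
    return X_centered @ V_top               # project into the lower-dim sub-space

features = pca_features(np.random.rand(256, 256))  # dummy image for illustration
print(features.shape)                              # (256, n_components)
```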
Data collection is the most critical aspect of the project. We use CXR
images, which are divided into three sections. The chest X-ray image
dataset is freely available on the Kaggle Website. There is a total of 477 images
in the dataset, drawn from three different types of images. The
first type is COVID-19 positive case images, with 133 images. The second type
is normal CXR medical images, also 133 in number. The third type
is viral pneumonia images, which again holds 133 images. The dataset is thus organized into
COVID, normal, and viral pneumonia classes [14].
(i) MSE: MSE is defined as the estimator that measures the mean of the squared
differences between expected and actual values [15]. The mathematical
equation of MSE is shown in Eq. 1.
MSE = (1/n) Σ_{i=1}^{n} (x_i − x̂_i)²   (1)

Accuracy = (TP + TN) / (TP + FP + FN + TN)   (2)

Specificity = TN / (TN + FP)   (3)

Sensitivity = TP / (TP + FN)   (4)
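As a quick illustration of Eqs. (1)–(4), the following Python sketch computes the metrics from dummy values; the counts are placeholders, not the paper's results.

```python
# Hedged sketch computing Eqs. (1)-(4); inputs are illustrative placeholders.
import numpy as np

def mse(x, x_hat):
    x, x_hat = np.asarray(x, float), np.asarray(x_hat, float)
    return np.mean((x - x_hat) ** 2)                    # Eq. (1)

def confusion_metrics(tp, tn, fp, fn):
    accuracy = (tp + tn) / (tp + fp + fn + tn)          # Eq. (2)
    specificity = tn / (tn + fp)                        # Eq. (3)
    sensitivity = tp / (tp + fn)                        # Eq. (4)
    return accuracy, specificity, sensitivity

print(mse([1, 0, 1], [0.9, 0.1, 0.8]))
print(confusion_metrics(tp=98, tn=97, fp=2, fn=3))      # illustrative counts
```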
The proposed work is implemented in MATrix Laboratory (MATLAB) as a simulation tool with a GUI-based desktop application. A total of 133 × 3 images
have been used in these experimental results, covering COVID-19, viral pneumonia, and normal CXR (chest X-ray) images. Almost all of these 133 × 3
CXR images have been used for training (the knowledge domain). The testing
module has used 97 images for the detection system. The testing domain applies
the proposed hybrid (PSO + MSVM) classification approach; CXR images of COVID-19 disease have been selected for the testing
domain.
Figure 3 shows a CXR image of the COVID-19 disease category being uploaded and then resized, with the test image intensities set to the 0–255 range. The 3D
CXR image is converted into a 2D CXR image and its grayscale version is calculated, which reduces
the dimensionality of the uploaded CXR image.
Figure 4 shows the identified noise in the uploaded test CXR image.
Firstly, the median filtration method is applied to obtain a smooth, noise-free
image. Secondly, the Canny edge detector is applied to calculate the edges
of the filtered image; it works smoothly without affecting the edge features.
Figure 5 shows the feature extraction line graph produced by the PCA algorithm. PCA
is utilized to minimize the size of the chest X-ray images as well as to remove
the unwanted feature vectors. The images of the dataset are of m × n pixel size. Matching
features of the chest X-ray images are identified with the help of
eigenvectors and eigenvalues: the input image is compared with the other images of the
dataset, their features are compared, and the similar features are then
extracted.
Figure 6 shows the feature selection procedure using the PSO method. This method
chooses the reliable or valuable feature sets and arranges them in matrix format.
The approach relies on modifying the proposed performance metrics, varying the search
strategy, and adapting the solution space so that the search is generated simply using the various
global and local best features. After that, the MSVM algorithm is implemented on the selected
features to classify the disease, and a message box reports the diagnosed disease
category for the various CXR images. The diagnosis detection procedure is thus done by
PSO + MSVM (hybridization); a simplified code sketch of this PSO + MSVM step is given below. Figure 7 shows the accuracy rate of the proposed
model. Figure 8 shows the confusion matrix of the proposed model, which has an improved accuracy rate compared with the existing models. The confusion matrix is represented by
TN, TP, FN, and FP, where TP = true positive, TN = true negative, FN = false negative, and FP = false positive; its entries are 206 and 133 in the first row and 97 and 477 in the second row, over true labels (0, 1) and predicted labels (0, 1). Figures 9, 10, and 11
show the comparison analysis of the proposed and existing methods, namely GoogleNet,
SqueezeNet, and DeTraC. The proposed method has improved the accuracy
rate, specificity (SP), and sensitivity (SN).
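To make the hybrid step concrete, the following simplified Python sketch runs a binary PSO over feature subsets scored by a multi-class SVM. The data, particle counts, and PSO coefficients are illustrative assumptions; the authors' actual MATLAB implementation may differ.

```python
# Simplified binary-PSO feature selection scored by a multi-class SVM,
# approximating the PSO + MSVM hybridization (illustrative sketch only).
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((120, 30))            # dummy PCA feature matrix
y = rng.integers(0, 3, 120)          # dummy labels: COVID / normal / pneumonia

def fitness(mask):
    """Cross-validated accuracy of a multi-class SVM on the selected features."""
    if mask.sum() == 0:
        return 0.0
    clf = SVC(kernel="rbf", decision_function_shape="ovo")
    return cross_val_score(clf, X[:, mask], y, cv=3).mean()

n_particles, n_feats, n_iters = 10, X.shape[1], 20
pos = rng.random((n_particles, n_feats))           # continuous particle positions
vel = np.zeros_like(pos)
pbest = pos.copy()
pbest_fit = np.array([fitness(p > 0.5) for p in pos])
gbest = pbest[pbest_fit.argmax()].copy()

for _ in range(n_iters):
    r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
    # inertia + cognitive + social velocity update
    vel = 0.7 * vel + 1.5 * r1 * (pbest - pos) + 1.5 * r2 * (gbest - pos)
    pos = np.clip(pos + vel, 0, 1)
    for i in range(n_particles):
        f = fitness(pos[i] > 0.5)                  # threshold position -> subset
        if f > pbest_fit[i]:
            pbest_fit[i], pbest[i] = f, pos[i].copy()
    gbest = pbest[pbest_fit.argmax()].copy()

selected = gbest > 0.5
print("selected features:", selected.sum(), "best CV accuracy:", pbest_fit.max())
```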
Table 4 shows the comparison between the proposed hybridization and the existing methods, i.e., the PSO + MSVM, DeTraC, GoogleNet, and SqueezeNet classifiers. The proposed hybrid PSO + MSVM system achieves an accuracy of 97.51%, whereas the DeTraC accuracy value is 97.30%, the GoogleNet value is 89.60%, and the SqueezeNet value is 82.70%. The existing DeTraC approach has an SP value of 98.20%, GoogleNet 90.30%, and SqueezeNet 83.80%. The existing DeTraC approach has an SN value of 96.30%, GoogleNet 88.80%, and SqueezeNet 81.40%. The proposed work achieves an Acc. value of 97.51%, an SP value of 97.49%, an SN value of 98.0%, and an MSE value of 0.0030.
It is concluded that COVID-19 (the novel coronavirus illness) is a serious illness that has
killed millions of civilians and infected people all over the world. There are
ample approaches for the detection of COVID-19, but still, there are some challenges
that are not resolved by existing methods. The most challenging parameter is the
accuracy of the methods. The hybrid methodology has been used for the detection
of COVID-19. Digital image processing is very helpful in the medical field. There
are several advantages of digital image processing that are discussed in this work.
Machine learning (ML) and optimization techniques are used for efficient results.
The MSVM is used for the classification of images. The Kaggle COVID-19 CXR
image dataset is used for training and testing the network. There are
around 133 × 3 images in the dataset, divided into three sub-parts: COVID-19 positive cases, normal cases, and viral pneumonia cases.
The proposed system is compared with existing systems and achieved better perfor-
mance, and the compared systems are DeTrac, GoogleNet, and SqueezeNet. The
performance of the proposed methodology is examined with four different parame-
ters which are accuracy of 97.51%, specificity of 97.49%, and 98.0% of sensitivity.
Moreover, there are certain flaws in this research work. A more extensive analysis, in
particular, necessitates a higher number of healthcare data, particularly COVID-19
statistics.
For more efficient results, a pre-trained feature extractor will be implemented in the future. More CXR images will be used for validation of the model. An effective method will be introduced to improve the precision rate and mitigate time
consumption.
References
1. M. Ghaderzadeh, F. Asadi, Deep learning in the detection and diagnosis of COVID-19 using
radiology modalities: a systematic review. J. Healthcare Eng. 2021 (2021). https://doi.org/10.
1155/2021/6677314
2. N. Chen et al., Epidemiological and clinical characteristics of 99 cases of 2019 novel
coronavirus pneumonia in Wuhan, China: a descriptive study. Lancet 395(10223), 507–513
(2020)
3. S. Minaee et al., Deep-COVID: predicting covid-19 from chest X-ray images using deep transfer
learning. Med. Image Anal. 65, 101794 (2020)
4. N. Zhu et al., A novel coronavirus from patients with pneumonia in China, 2019. New Engl. J.
Medicine (2020). https://doi.org/10.1056/NEJMoa2001017
5. W.G. Dos Santos, Natural history of COVID-19 and current knowledge on treatment therapeutic
options. Biomed. Pharmacother. 110493 (2020)
6. T. Singhal, A review of coronavirus disease-2019 (COVID-19). Indian J. Pediatr. 87(4), 281–
286 (2020)
7. Q. Cai, S.-Y. Du, S. Gao, G.L. Huang, Z. Zhang, S. Li, X. Wang, P.-L. Li, P. Lv, G. Hou, L.-N.
Zhang, A model based on CT radiomic features for predicting RT-PCR becoming negative in
coronavirus disease 2019 (COVID-19) patients. BMC Med. Imag. 20(1), 1–10 (2020)
8. A. Mohanty, A. Kabi, S. Kumar, V. Hada, Role of rapid antigen test in the diagnosis of COVID-
19 in India. J. Adv. Med. Med. Res 77–80 (2020)
9. A. Abbas, M.M. Abdelsamea, M.M. Gaber, Classification of COVID-19 in chest X-ray images
using DeTraC deep convolutional neural network. Appl. Intell. 51(2), 854–864 (2021)
10. N.R. Raajan, V.S. Lakshmi, N. Prabaharan, Non-invasive technique-based novel corona
(COVID-19) virus detection using CNN. Nat. Acad. Sci. Lett. 44(4), 347–350 (2021)
11. T. Zebin, S. Rezvy, COVID-19 detection and disease progression visualization: deep learning
on chest X-rays for classification and coarse localization. Appl. Intell. 51(2), 1010–1021 (2021)
12. R.M. Pereira et al., COVID-19 identification in chest X-ray images on flat and hierarchical
classification scenarios. Comput. Methods Programs Biomed. 194, 105532 (2020)
13. N.S. Punn, S.K. Sonbhadra, S. Agarwal, COVID-19 epidemic analysis using machine learning
and deep learning algorithms. MedRxiv (2020)
14. COVID-19 Radiography Database (2021). Retrieved 1 April 2021, from https://www.kaggle.
com/tawsifurrahman/covid19-radiography-database
15. Z. Wang, A.C. Bovik, Mean squared error: love it or leave it? A new look at signal fidelity
measures. IEEE Signal Process. Mag. 26(1), 98–117 (2009)
16. F. Khatami, M. Saatchi, S.S.T. Zadeh, Z.S. Aghamir, A.N. Shabestari, L.O. Reis, S.M.K.
Aghamir, A meta-analysis of accuracy and sensitivity of chest CT and RT-PCR in COVID-19
diagnosis. Sci. Rep. 10(1), 1–12 (2020)
QCM Sensor-Based Alcohol
Classification Using Ensembled Stacking
Model
Abstract Alcohol consumption is a global burden of injury and disease, as early studies attest. The excessive intake of alcohol is coupled with unconstructive
consequences and jeopardizes future prospects. This paper presents an ensemble
model applied to an array of quartz crystal microbalance (QCM) sensors exposed to
five chemical compounds, to find the corresponding compositions of a gas mixture. This study
makes use of QCM sensor responses to determine the gas compositions. These physical
sensors sense changes in resonance frequency, from which the chemical
compounds are classified and their harmful effects recognized. The
main focus of the study is to determine the reaction of QCM sensors to five different
alcohols, such as 1-octanol, 1-propanol, 2-butanol, 2-propanol, and 1-isobutanol, and
to determine the effective sensor type in the classification of these compounds. The
experiment is conducted to classify and identify the constituent component amounts
through an ensemble classifier to improve the efficiency of the QCM sensors. The
results of 125 different scenarios illustrated that various alcohols could be classified
effectively using a stacking classifier from the QCM sensor data.
P. Suresh Kumar
Department of Computer Science and Engineering, Aditya Institute of Technology and
Management (AITAM), Tekkali, India
H. S. Behera
Department of Information Technology, Veer Surendra Sai University of Technology, Burla
768018, India
R. Pedada · G. M. Sai Pratyusha · V. Velugula
Department of Computer Science and Engineering, Dr. Lankapalli Bullayya College of
Engineering, Visakhapatnam 530013, India
J. Nayak (B)
Department of Computer Science, Maharaja Sriram Chandra BhanjaDeo (MSCB) University,
Baripada, Odisha 757003, India
1 Introduction
Nowadays, the detection of chemical compounds and their effects plays an important
role. Alcohols have adverse effects when used in high quantities. Generally,
alcohols like ethanol, propanol, and methanol are used in many skincare products,
medicines, drugs, and cleansers. Some food items also contain alcohol to extend their
durability, which is then unintentionally consumed by humans. In order to reduce
the effects caused by alcohol, intelligent computing techniques that detect alcohol are needed. Now, pattern classification with gas sensors is best suited for
recognition, detection, and classification. For the classification of alcohol, a highly
selective sensor is required. Hence, QCM is a promising technology in detecting
alcohols. QCM is an acoustic sensor having an array of gas sensors. The fact behind its
sensor ability is that it detects the mass deposited on its crystal surface by estimating
the change in its resonance frequency (f). QCM is an e-nose sensor that resembles the
human nose to detect alcohols through its odor since different alcohols have different
aromas.
Many industries strive for technologies that inexpensively detect chemical
compounds. QCM is a sensor with high sensitivity, stability, low cost, low power
requirements, small size, and low weight. Through its thin sensing layer, it can be used in gas
absorption studies [1]. The alcohols are placed on the thin layer of the QCM and are
identified by the change in its fundamental oscillation frequency, as shown in Eq. 1.
Δf = (C_f · f₀² · Δm) / A   (1)

where A is the area of the sensitive layer, C_f is the mass sensitivity constant of the quartz crystal, f₀ is the fundamental resonance frequency of the quartz crystal, and Δm is the mass change.
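As a worked example of Eq. (1), the following sketch computes the frequency shift for an assumed mass load. The numeric values, including the commonly quoted Sauerbrey constant for AT-cut quartz used as C_f here, are illustrative assumptions and may differ from the authors' sensors.

```python
# Worked example of Eq. (1): frequency shift of a QCM sensor for a given
# mass change (all numeric values below are illustrative assumptions).
def delta_f(c_f, f0_hz, delta_m_g, area_cm2):
    """Sauerbrey-type relation: delta_f = C_f * f0^2 * delta_m / A."""
    return c_f * f0_hz**2 * delta_m_g / area_cm2

# e.g., a 10 MHz crystal with a 1 ng mass load on a 0.2 cm^2 sensitive layer
print(delta_f(c_f=2.26e-6, f0_hz=10e6, delta_m_g=1e-9, area_cm2=0.2))  # ~1.13 Hz
```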
The array of gas sensors is used as a detecting system to measure the alcohol
mixtures. The sensors would produce a signal when the mass is placed on its thin
layer. The signals from the sensors are preprocessed and taken as a dataset by using
preprocessing algorithms. Machine learning algorithms are then trained on the dataset to find the
corresponding composition of alcohols [2].
Machine learning makes information more accessible, as it can process, analyze,
and generate data from existing data, and this process can be automated. Machine
learning supports continuous upgrades: new information can be added without
changing the historical data, so new features can be added to the model and
the algorithm improved for more accurate results. Sometimes machine learning
models face time constraints when data are scarce, and more training of the system
is required for classification so that predictions and decisions can be made easily. The
chosen algorithms must therefore be a promising technique for verifying the data [3].
Artificial neural network (ANN) is one of the most used machine learning tech-
niques for alcohol classification [4]. Classification results are used for different
approaches like studying the effects of alcohol in cosmetics and hygiene products, predicting
flavors in Chinese liquors [5], estimating anesthesia dose [6], and many more. Palaniappan
et al. [7] addressed the problem of alcohol classification and proposed an array of
operational amplifiers. Because of their predictive and decision-making capabilities, many
machine learning classification models are used to find the type of alcohol placed on the QCM.
Connor et al. [8] combined clinical experiences with machine learning algorithms
such as decision trees, which promise a better understanding of complex relation-
ships. Ordukaya and Karlik [9] analyzed raw data collected with an e-nose from fruit
juice and alcohol mixtures for halal authentication of fruit juice, using KNN; support
vector machines are used in most such studies. Also, in
hybrid models, Kanna et al. [10] utilized a multilayer perceptron with the
backpropagation algorithm to classify alcohol abusers based on features extracted
from the gamma band spectral power of the multichannel visual evoked potential signal
in the time domain. The ensemble learning technique is a machine learning technique
that produces the best predictive output from the base models. Because it combines
the strengths of individual machine learning methods while compensating for their
weaknesses, we can choose the best technique to apply to the data and predict the best outcome.
The significant contribution of this research includes:
(a) A stacking classifier is proposed to classify the different types of alcohol
(b) Classification of alcohol experimentation is done by using the dataset from the
UCI repository proposed by Fatih et al. [11]
(c) Compare the performance of the proposed stacking classifier with several
machine learning methods such as stochastic gradient descent (SGD), Gaussian
naive Bayes (GNB), quadratic discriminant analysis (QDA), multilayer perceptron
(MLP), linear discriminant analysis (LDA), logistic regression (LR), decision
trees (DT), K-nearest neighbor (KNN), gradient boosting (GB), and AdaBoost.
From the results, it is evident that the proposed method outperformed the compared methods.
The remainder of the paper is organized into five further sections: Sect. 2 presents the literature
study on alcohol classification using QCM sensors. Section 3 describes the methodology
used, including the proposed method, performance measures, and experimental
setup. Section 4 explains the environment setup; the result analysis of various machine
learning and ensemble learning algorithms on alcohol classification using QCM
sensors is discussed in Sect. 5. Section 6 concludes the paper.
2 Literature Study
Adak et al. [12] developed a model to classify the alcohols sensed by QCM sensors
with different characteristics using an artificial bee colony (ABC)-based neural network. Optimization
is achieved by the artificial bee colony algorithm, which is based on nectar-searching behavior.
The performance of the proposed model is evaluated using mean absolute percentage
error and mean square error. Based on the evaluation results of 300 scenarios, the
proposed method successfully classified the alcohols.
Katardjiev et al. [13] applied support vector machine, random forest, decision
tree, and K-nearest neighbor algorithms to clinical trial data of alcohol-addicted
patients from an Uppsala-based company for alcohol relapse forecasting. A K-nearest
neighbor predictor fitted with a radial basis function (RBF) kernel was used to model the data,
and it produced the best results in terms of explained variance and root mean square error. Hence,
it is evident that ML-based models help predict relapses in addicted patients.
Li et al. [5] proposed a random forest technique optimized by tuning the
number of decision trees used to predict the flavors of Chinese liquors. The proposed
random forest classifier was compared with various machine learning classifiers such as
linear discriminant analysis, backpropagation artificial neural network (BP-ANN),
and support vector machine. The modified random forest outperformed the other models in terms of
accuracy.
Some more literature on alcohol classification in QCM sensors using different
machine learning algorithms has been presented in Table 1.
3 Proposed Methodology
Consider features X = {x_i ∈ R^m}, a set of class labels Y = {y_i ∈ N}, and training data DS = {(x_i, y_i)}_{i=1}^{m}, where the learning model M is trained on the data DS. In the first level, learning is performed on the original training dataset with distributed weights, and the learning parameters are tuned on the base classifiers. New datasets are then created from the predictions: the labels output by the first-level classifiers are treated as new features. In place of the predicted labels, the probability estimates of these first-level classifiers can be used.
The proposed stacking approach utilizes three base classifiers: K-nearest neighbor,
random forest, and Gaussian naive Bayes to predict alcohol type, and logistic regres-
sion is used as a meta-classifier in the proposed method. Integration of indepen-
dent methods of the proposed model and overall systematic process structure is
represented in Fig. 1.
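A minimal sketch of this stacking setup is given below, assuming the mlxtend library (whose StackingClassifier accepts the classifiers, meta_classifier, use_probas, and use_clones options reported in Table 2); the dataset loading and split are placeholders.

```python
# Sketch of the proposed stacking ensemble (base: KNN, RF, GNB;
# meta: logistic regression), following the Table 2 settings.
from mlxtend.classifier import StackingClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=5, weights="uniform", algorithm="auto")
rf = RandomForestClassifier()
gnb = GaussianNB(var_smoothing=1e-09)

stack = StackingClassifier(
    classifiers=[knn, rf, gnb],          # first-level base classifiers
    meta_classifier=LogisticRegression(random_state=1, solver="newton-cg"),
    use_probas=True,                     # pass class probabilities, not labels
    use_clones=False,
)

# X, y would hold the preprocessed QCM sensor vectors and alcohol labels,
# split 80:20 for training and testing as described in Sect. 4, e.g.:
# X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2)
# stack.fit(X_tr, y_tr); print(stack.score(X_te, y_te))
```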
4 Environment Setup
In this section, we have discussed the dataset we have used for experimentation,
performance measures considered to evaluate the models, and finally, explained the
simulation environment and parameter setting of various models. Data preprocessing
is applied to the dataset through data cleaning methods and then transforms into struc-
tured vectors. These vectors are divided into an 80:20 ratio for training and testing.
We used an Intel i5 processor with 6 GB RAM on the Windows 10 operating system. The
proposed method and the various machine learning algorithms are implemented using
scikit-learn, an open-source machine learning library for Python.
Alcohol QCM sensor data is considered in this experiment and is available openly
at the UCI machine learning repository [11]. Five different gas sensors are used
for classification with various gas measurements such as 1-octanol, 1-propanol, 2-
butanol, 2-propanol, and 1-isobutanol [12]. The feature distribution helps determine
the dataset's characteristics: from it, we can identify the data's possible range and the
recurrence of values. Normally distributed features are generally more useful for
obtaining good accuracy than partially or fully skewed features. The feature distribution
of the alcohol dataset is shown in Fig. 2.
The dataset is analyzed, and the performance has been evaluated by various evalua-
tion metrics like True Positive (TP), False Positive (FP), True Negative (TN), False
Negative (FN), True Negative Rate (TNR), False Negative Rate (FNR), and accuracy,
recall, precision [21].
The experimentation was carried out on the alcohol dataset obtained by the QCM sensor
available in the UCI machine learning repository. Other competitive ML-based algo-
rithms are simulated on the QCM sensor alcohol dataset along with the stacking
classifier to obtain effective performance. The proposed method is compared with
various machine learning techniques such as K-nearest neighbor, decision trees,
stochastic gradient descent, Gaussian naive Bayes, logistic regression, linear discrim-
inant analysis, multilayer perceptron, quadratic discriminant analysis, and some more
ensemble methods such as AdaBoost, and gradient boosting. The parameter setting
for the proposed method and other compared methods is shown in Table 2.
5 Result Analysis
This study presents the performance of the ensemble stacking classifier against the various
machine learning models, as shown in Table 3. The SGD and GNB classifiers
deliver an enormous misclassification rate across all the classes; the
accuracies of SGD and GNB are 0.37 and 0.44, respectively.
The classifiers QDA and MLP produced precisely the same results.
Class 1-octanol is categorized accurately. The 1-propanol class classifies 12 instances
correctly, and the remaining 8 instances are misclassified as 2-butanol, 2-propanol,
and 1-isobutanol. The 2-butanol class classifies 9 instances correctly, and the
remaining 11 instances are misclassified as 1-propanol, 2-propanol, and 1-isobutanol.
The 2-propanol class shows that 17 of them are classified correctly, and the remaining
6 of 1-propanol and 2-butanol classes are misclassified as 2-propanol. The 1-
isobutanol class classifies overall 20 instances correctly, but 3 instances of 1-propanol
and 2-butanol are misclassified as 1-isobutanol. The overall accuracy of these models is 0.77,
whereas an individual accuracy of 100% is achieved for the 1-octanol class. The
false positives, false negatives, and FPR are 0, and the recall, F1-score, and precision are 1
for the 1-octanol class, in contrast to the other classes.
The LDA classification classifies the classes 1-octanol and 1-isobutanol precisely,
i.e., the individual accuracy of these classes is 100%. The 1-propanol class shows
that 15 of them are classified correctly, and the remaining 5 are misclassified as
Table 2 Parameter setting of various models with 'Alcohol using QCM sensor'
K-nearest neighbor: n_neighbors = 5, weights = 'uniform', algorithm = 'auto'
Decision tree: criterion = 'gini', splitter = 'best', max_depth = None
Stochastic gradient descent: loss = 'hinge', max_iter = 1000
Gaussian naive Bayes: var_smoothing = 1e-09
Logistic regression: random_state = 1, solver = 'newton-cg'
Linear discriminant analysis: solver = 'svd'
Multilayer perceptron: activation = 'logistic', batch_size = 10, random_state = 2, solver = 'adam'
Quadratic discriminant analysis: reg_param = 0.0, store_covariance = False, tol = 0.0001
AdaBoost: base_estimator = DecisionTreeClassifier(max_depth = 5), n_estimators = 50, learning_rate = 1.0
Bagging: base_estimator = DecisionTreeClassifier(), n_estimators = 10, max_samples = 1.0, max_features = 1.0
Gradient boosting: n_estimators = 40, random_state = 1
Stacking: classifiers = [KNN, RF, GNB], meta_classifier = LogisticRegression(), use_probas = True, use_clones = False
2-butanol and 2-propanol. The 2-butanol class shows that 14 of them are classified
correctly, and the remaining 6 are misclassified as 1-propanol and 2-propanol. The
2-propanol class shows that 14 of them are classified correctly, and the remaining 7
are misclassified as 1-propanol and 2-butanol. The individual accuracy of 1-propanol,
2-butanol, and 2-propanol classes is 0.92, 0.87, and 0.87. The LDA gives an overall
accuracy of 0.83.
The result analysis of the LR classifier shows that the 1-octanol and 1-isobutanol
classes are classified precisely; therefore, the TPR, F1-score, precision, and accuracy
of these classes are 100%. The 1-propanol class shows that 16 of them are classified
correctly, and the remaining 4 are misclassified as 2-butanol and 2-propanol. The
2-butanol class shows that 16 of them are classified correctly, and the remaining 4
Table 3 (continued)
Class label Model TP TN FP FN TPR FPR F1-score Precision Accuracy Overall accuracy
QDA 17 73 6 3 0.85 0.08 0.79 0.74 0.90
MLP 17 73 6 3 0.85 0.08 0.79 0.74 0.90
LDA 14 72 7 6 0.70 0.09 0.68 0.67 0.87
LR 18 77 2 2 0.90 0.03 0.90 0.90 0.96
DT 19 78 1 1 0.95 0.01 0.95 0.95 0.97
GB 18 78 1 2 0.90 0.01 0.92 0.95 0.97
KNN 20 79 0 0 1.00 0.00 1.00 1.00 1.00
AdaBoost 20 79 0 0 1.00 0.00 1.00 1.00 1.00
Stacking 20 79 0 0 1.00 0.00 1.00 1.00 1.00
1-isobutanol SGD 10 62 17 10 0.50 0.22 0.43 0.37 0.72
GNB 1 78 1 19 0.05 0.01 0.09 0.50 0.79
QDA 20 76 3 0 1.00 0.04 0.93 0.87 0.96
MLP 20 76 3 0 1.00 0.04 0.93 0.87 0.96
LDA 20 79 0 0 1.00 0.00 1.00 1.00 1.00
LR 20 79 0 0 1.00 0.00 1.00 1.00 1.00
DT 18 78 1 2 0.90 0.01 0.92 0.94 0.96
GB 19 79 0 1 0.95 0.00 0.97 1.00 0.99
KNN 20 77 2 0 1.00 0.03 0.95 0.91 0.97
AdaBoost 19 79 0 1 0.95 0.00 0.97 1.00 0.98
Stacking 19 79 0 1 0.95 0.00 0.97 1.00 0.98
are misclassified as 1-propanol and 2-propanol. The 2-propanol class shows that 18
of them are classified correctly, and the remaining 2 are misclassified as 1-propanol
and 2-butanol each. The LR classifier achieves an overall accuracy of 0.90.
The DT classifier classifies the 1-octanol class precisely; therefore, its
TPR, F1-score, precision, and accuracy are 1. The 1-propanol class shows
that 18 instances are classified correctly, and 2 instances are misclassified as 2-butanol
and 2-propanol. The 2-butanol class shows that 20 of them are classified correctly,
and 3 instances of 1-propanol and 1-isobutanol are misclassified into 2-butanol. The
2-propanol class shows that 19 instances are classified correctly, and 1 instance is
wrongly predicted as 1-isobutanol. The 1-isobutanol class shows that 18 instances are
classified correctly, and 2 instances are misclassified as 2-butanol. The DT classifier
classifies each class reasonably well and achieves an overall accuracy of 0.94. The
class 1-octanol achieves an individual accuracy of 1.00, while 1-propanol, with 2 false
negatives, and 2-butanol, with 3 false positives, give individual accuracies
of 0.97 and 0.96. The classes 2-propanol and 1-isobutanol attain individual
accuracies of 0.97 and 0.96.
The GB classifier predicts the 1-octanol class precisely; therefore, the TPR, F1-score, precision, and accuracy of this class are 1. The 1-isobutanol class shows that 19
instances are classified correctly, and the single remaining instance is misclassified
as 1-propanol. The 1-propanol class shows that 18 instances are classified correctly,
and the remaining 2 instances are misclassified as 2-butanol and 2-propanol. The
2-butanol class shows that 20 of them are classified correctly, and one instance of
1-propanol is wrongly predicted as 2-butanol. The 2-propanol class shows that 18
are classified correctly, and only 2 are wrongly predicted as 1-propanol. The class
1-octanol achieves an individual accuracy of 1.00, as 2-butanol and 1-isobutanol
achieve an individual accuracy of 0.99, 2-propanol, and 1-propanol with an individual
accuracy 0.97 and 0.95. The FPR for 2-butanol and 2-propanol gives a value of 0.01;
i.e., one instance is misclassified for each and achieved an overall accuracy of 0.94.
The KNN classifier classifies the classes 2-butanol and 2-propanol precisely.
The 1-octanol class shows that 16 instances are classified correctly, with two instances
misclassified as 1-isobutanol. The 1-propanol class shows that 20 of them are
classified correctly, while one instance of 1-octanol is misclassified as 1-propanol.
The 1-isobutanol class shows that 20 instances are classified correctly, where two
instances of 1-octanol are misclassified as 1-isobutanol. The FPR for 1-propanol and
1-isobutanol is 0.01 and 0.03 and for other classes is 0. The precision for 1-octanol,
2-butanol, and 2-propanol is 1.00, 1-propanol is 0.95, and 1-isobutanol is 0.91. The
KNN Classifier achieves an overall accuracy of 0.97.
The AdaBoost classifier classifies the classes’ 1-octanol and 2-propanol precisely.
The class 1-propanol shows that 19 of them are classified correctly, and one instance
is wrongly classified as 2-butanol. The class 2-butanol shows that 20 of them are clas-
sified correctly, and one instance of 1-propanol is improperly classified as 2-butanol.
The table shows that only two instances are wrongly classified, i.e., 1-propanol is
misclassified as 2-butanol, and 1-isobutanol is misclassified as 1-propanol, with an
overall accuracy of 0.98.
The stacking classifier precisely classifies all the classes except 1-propanol and
1-isobutanol. The 1-propanol class shows that 20 of them are classified correctly,
and one instance of 1-isobutanol is wrongly classified as 1-propanol. The false
positives of all classes are 0, except 1-propanol, which has 1. The F1-score and precision
for 1-octanol, 2-butanol, and 2-propanol are 1. The proposed method gives an overall
accuracy of 0.99.
Figure 3 shows the ROC curve of all the proposed models, and finally, Fig. 4
represents the various graph plots for TPR, FPR, F1-score, precision, accuracy, and
overall accuracies of the models for the dataset classes.
Figure 4 illustrates the TPR and F1-score of all the models for the target feature,
where GNB gives the lowest value for 1-isobutanol and SGD gives the lowest values for
1-propanol and 1-octanol. Figure 4 also shows that GNB and SGD produce the highest
FPR for the classes 2-propanol and 2-butanol compared to the other classes across the various
models. The F1-score reaches 1 for the 1-isobutanol class with the LDA and
LR models. The QDA, MLP, LR, LDA, DT, and GBC models show an F1-score
of 1 for the 1-octanol class. Similarly, STK, ADB, and KNN attain an F1-score of
1 for the 2-propanol class. All the models achieve a precision
of 1 for the class 1-octanol except SGD. Similarly, LDA, LR,
DT, GBC, ADB, and STK achieve a precision of 1 for the 1-isobutanol class.
Fig. 3 ROC curves of the models (panels A–F; panel E: ROC curve of LDA, panel F: ROC curve of LR)
Fig. 4 a TPR versus models, b FPR versus models, c F1-score versus models, d precision versus
models, e accuracy versus models, f accuracies of models
Figure 4 also presents the individual class accuracies of all the models. Compared
to the other classes, the 1-octanol and 1-isobutanol classes give good accuracy for all the
models, except for KNN and ADB. STK has classified all the classes accurately (100%)
except for 1-propanol and 1-isobutanol, at 98%. Figure 4 clearly shows that the
proposed model produces better accuracy than the other models in identifying the target
classes.
6 Conclusion
This study proposed a classifier based on an ensemble model for determining the
reaction of QCM sensors to five different alcohols such as 1-octanol, 1-propanol, 2-
butanol, 2-propanol, and 1-isobutanol, and classifying the sensor responses. Also, various ML
algorithms are considered for an effective comparison of classification accuracy.
The classifiers QDA and MLP produced the same results, with an
accuracy of 0.77, and performed well in categorizing the 1-octanol class. The LDA
and LR performed well in classifying the classes’ 1-octanol and 1-isobutanol, where
LDA gives an accuracy of 0.83 and LR with an accuracy of 0.90. The DT and GB
classifiers perform well in predominantly categorizing the 1-octanol class and other
classes. The KNN performs well in classifying the classes 2-butanol and 2-propanol,
but the 1-octanol class is misclassified to 1-propanol and 1-isobutanol. The AdaBoost
performs well in categorizing the classes 1-octanol and 2-propanol. However, the
proposed model shows that the classes 1-octanol, 2-butanol, and 2-propanol are
categorized correctly. The class 1-isobutanol is misclassified as 1-propanol and gives
an accuracy of 0.99. From this study, it is clear that all the models except KNN can classify the
1-octanol class precisely; in particular, QDA, DT, MLP, and GB classify the
1-octanol class precisely. The LDA and LR classifiers are also able to classify the 1-octanol
and 1-isobutanol classes, and AdaBoost the classes 1-octanol
and 2-propanol. The overall results show that the 1-octanol, 2-butanol, 2-propanol, and
1-isobutanol classes are categorized well, except the class 1-propanol. Among all these
models, the proposed method signifies its performance to a larger extent by correctly
classifying the alcohol classes. In the future, a deep study may be conducted on
other properties of alcohol by using deep learning methods for various practical
applications.
References
6. H.M. Saraoğlu, B. Edin, E-nose system for anesthetic dose level detection using artificial neural
network. J. Med. Syst. 31(6), 475–482 (2007). https://doi.org/10.1007/s10916-007-9087-7
7. S. Palaniappan, N.A. Hameed, A. Mustapha, N.A. Samsudin, Classification of alcohol
consumption among secondary school students. JOIV Int. J. Inf. Vis. 1(4–2), 224 (2017).
https://doi.org/10.30630/joiv.1.4-2.64
8. J.P. Connor, M. Symons, G.F.X. Feeney, R.M. Young, J. Wiles, The application of machine
learning techniques as an adjunct to clinical decision making in alcohol dependence treatment.
Subst. Use Misuse 42(14), 2193–2206 (2007). https://doi.org/10.1080/10826080701658125
9. E. Ordukaya, B. Karlik, Fruit juice–alcohol mixture analysis using machine learning and elec-
tronic nose. IEEJ Trans. Electr. Electron. Eng. 11, S171–S176 (2016). https://doi.org/10.1002/
tee.22250
10. P.S. Kanna, R. Palaniappan, K.V.R. Ravi, Classification of alcohol abusers: an intelligent
approach, in Third International Conference on Information Technology and Applications
(ICITA’05), vol. 1, pp. 470–474 (2005). https://doi.org/10.1109/ICITA.2005.95
11. A. Fatih, P. Lieberzeit, P. Jarujamrus, N. Yumusak, UCI machine learning repository: alcohol
QCM sensor dataset data set (n.d.). Retrieved April 17, 2020, from http://archive.ics.uci.edu/
ml/datasets/Alcohol+QCM+Sensor+Dataset
12. M.F. Adak, P. Lieberzeit, P. Jarujamrus, N. Yumusak, Classification of alcohols obtained by
QCM sensors with different characteristics using ABC based neural network. Eng. Sci. Technol.
Int. J. 23(3) (2019). https://doi.org/10.1016/j.jestch.2019.06.011
13. N. Katardjiev, S. Mckeever, A. Hamfelt, A machine learning-based approach to forecasting
alcoholic relapses (2019)
14. Triyana et al., Chitosan-based quartz crystal microbalance for alcohol sensing. Electronics
7(9), 1–11 (2018). https://doi.org/10.3390/electronics7090181
15. A. Pisutaporn, B. Chonvirachkul, D. Sutivong, Relevant factors and classification of student
alcohol consumption, in 2018 IEEE International Conference on Innovative Research and
Development (ICIRD), pp. 1–6 (2018). https://doi.org/10.1109/ICIRD.2018.8376297
16. X. Zhu, X. Du, M. Kerich, F.W. Lohoff, R. Momenan, Random forest based classification of
alcohol dependence patients and healthy controls using resting state MRI. Neurosci. Lett. 676,
27–33 (2018). https://doi.org/10.1016/j.neulet.2018.04.007
17. L. Breiman, Stacked regressions. Mach. Learn. 24(1), 49–64 (1996). https://doi.org/10.1007/
BF00117832
18. M.J. van der Laan, E.C. Polley, A.E. Hubbard, Super learner. Stat. Appl. Genet. Mol. Biol. 6(1)
(2007). https://doi.org/10.2202/1544-6115.1309
19. D.H. Wolpert, Stacked generalization. Neural Netw. 5(2), 241–259 (1992). https://doi.org/10.
1016/S0893-6080(05)80023-1
20. G. Wang, J. Hao, J. Ma, H. Jiang, A comparative assessment of ensemble learning for credit
scoring. Expert Syst. Appl. 38(1), 223–230 (2011). https://doi.org/10.1016/j.eswa.2010.06.048
21. P. Suresh Kumar, H.S. Behera, J. Nayak, B. Naik, Bootstrap aggregation ensemble learning-
based reliable approach for software defect prediction by using characterized code feature.
Innov. Syst. Softw. Eng. 1–22 (2021). https://doi.org/10.1007/s11334-021-00399-2
A Novel Image Falsification Detection
Using Vision Transformer (Vi-T) Neural
Network
1 Introduction
With the invention of image editing software, any image can be modified to appear
legitimate when compared to the original. The usage of image editing these days is
rather popular, and in fact, this is rather relevant in the digital era. Facebook, What-
sApp, Twitter, and Web-based pages all contain plenty of modified photographs,
which is made easier utilizing image manipulation programs like GIMP, Adobe
Photoshop, Coral Draw, and Paint 3D. Copy-move image forgery, spliced image
forgery, and resampling are the primary picture manipulation methods. These approaches
fall under pixel-based image forgery detection [1, 2] techniques, which are a fundamental
and versatile family of methods for spotting counterfeits among
images. Techniques including zooming, resampling, rotation, the addition of noise,
and stitching are commonly employed in picture editing to achieve visuals that are
more closely matched to the user's choice. Splicing involves the blending of
two distinct photographs: one shot is taken of each prospect, and the photographs are
then blended into a composite shot that is presented as an original. In copy-move image
forgery, one copies and pastes a targeted area within the
same image. Resampling changes the picture resolution (and hence the resolution
of objects inside the picture) with respect to the sampling frequency and compression
standard.
As a result, we require a new method for image forgery detection. The area of
picture forensics is concerned with establishing the authenticity of photographs and
safeguarding the user’s rights. Figure 1 illustrates the methods for detecting picture
forgeries. There are primarily two methods for detecting forgeries.
Fig. 1 Classification of image forgery detection methods
The first is an active technique, which entails introducing a digital signature at the transmission end and retrieving it at the receiver's end via specified hardware. Digital watermarking
schemes are employed in the active approach. This is a rather time-consuming method
that requires specialized training for application across sequential processes. The
second strategy, commonly referred to as the passive technique, establishes the
presence of image forgery from the image content itself. This approach covers three
forgery techniques, and forgery findings can be obtained through software programming
on a collection of photographs that includes both real and altered images. Additionally, this approach
is easily configurable, owing to well-designed algorithms and well-implemented
evaluation datasets. In contrast to the active approach, a passive approach does not require a watermark or
signature to establish authenticity.
Figure 2 [3], Fig. 3 [3], Fig. 4 [4] illustrate copy-move forgery and spliced forging
approaches obtained following post-processing attacks on the benchmark datasets
[3, 4]. When compared to actual photographs from the benchmark datasets, these
images appear to be genuine. Figure 4 shows spliced image forgery obtained by
joining two different images to form a composite image.
Fig. 2 Original picture (left) and forged image (right) obtained after manipulation (image courtesy of the Frith dataset [3])
Fig. 4 Forged image manipulated with post-processing techniques (left) and original picture (right), indicating splicing image forgery (courtesy of the CASIA image manipulation dataset [4])
Convolutional neural networks (CNN) have now been preferred over the classic detection approaches for a few years. Tasks of computer vision, such as classification and localization, are
well attained through deep learning, including neural networks such as ResNets,
Fast-RCNN, generative adversarial networks (GAN), sparse encoders, YOLOv4
networks. The drawback with CNNs is that they take considerable computation resources and
time during the convolution process on images. Also, they have many deep
layers that must be trained for image classification tasks. Methods like transfer
learning are well suited in achieving a required classification task. But again, they
require extensive training on both training and testing datasets.
To overcome the problem of extensive training on many layers with convolution
properties and feature maps, a new method with Vision Transformer-based neural
network is proposed in this paper for image forgery detection and localization of
forged areas. The primary advantage of Vision Transformer is that it does not involve
the use of convolutional filters, which are a vital process in many deep learning algo-
rithms for image classification and image recognition. This paper primarily focuses
on image tamper classification and localization of forged areas using Vision Trans-
former and attention networks. The proposed method using Vision Transformer does
not employ convolutional filters during the process of tamper image classification,
and thus, the approach finds a new objective for image falsification detection and
classification between real and fake images.
2 Related Work
There are two approaches for image forgery detection and classification [5]
using algorithms, and the following Sects. 2.1 and 2.2 describe them toward the
implementation.
In the traditional approach, the picture was divided region-wise or block-wise, and features extracted from the sample image were compared with the original
image features, using a keypoint technique or a block-based strategy to obtain keypoint
pairs. For similarity verification, the distance between key pairs was calculated and
employed to label a picture as tampered or authentic. Finally, a classifier
is utilized to provide a confusion-matrix categorization for the assessment. This technique
is an old way of finding a correlation between the real and morphed images; later
on, hybrid methods extracting characteristics like local binary patterns (LBP),
Fourier descriptors, and HSV color moments were used. These hybrid characteristics
are learned by an image classification learning system, and outcomes are evaluated in
terms of accuracy and F1-score. To determine the performance of any learnt model,
precision against recall is also compared, along with the true positive versus false positive rate.
To verify the validity of any image, both block-based and keypoint techniques use
features such as SURF, PCA, DWT, and DCT. The traditional approach looks for hidden
patterns and features in images to train a classifier accordingly and is confined to
performing well on small image datasets. Also, it takes more time for computing and
testing the forgery images and will not work on new data unless better trained on
large datasets. To overcome this problem, deep learning algorithms are proposed and
compared to traditional approaches for implementation, localization, and testing
on images.
In this approach, CNNs are extensively used, with the given dataset being supple-
mented using data augmentation techniques, then applied to layers of neural nets for
extraction of low-level features and ultimately to the softmax function for turning
into a probability value to discover the class of attributes. This method is effective for
classifying [6] actual and altered images with improved accuracy. CNNs necessitate
a lot of computing power and a lot of data. Deep learning approaches such as RNN,
RESNET, SPARSE-NETS, and CNN-LSTMS are good at picture classification, but
they take a lot of training on a given dataset.
When compared to single-task fully convolutional network (SFCN) approaches,
the multi-task fully convolutional network (MFCN) presented by Salloum et al. [7]
performs well in locating tamper regions. Amerini et al. [8] used a multi-domain
CNN strategy to localize double-JPEG compression. They examined both the spatial
and frequency domains, as well as fully connected layers to obtain higher accuracy
on the UCID dataset. In the frequency domain, the CNN takes the DCT coefficients
of each patch as input. The frequency domain-based CNN is made up of two convolutional
layers, two pooling layers, and three fully connected layers. The multi-domain
CNN concatenates the outputs coming from the fully connected layers of these two networks,
and it classifies the patch into one of three classes: uncompressed, single, or double
compressed [6].
3 Proposed Method
In this research work, we propose a novel method for image forgery detection using
Vision Transformers for forgery detection in images where it is robust in the detection
of tamper areas and classification of images concerning real and morphed images.
As shown in the methodology in Fig. 5, a few basic steps are required
for image forgery classification and localization. The first is preprocessing, in which
a preprocessed image dataset is fed into the Vision Transformer. The image is then
divided into smaller patches, with correlation among the patches being found with
attention-based networks. Next, evaluation metrics are generated and plotted against
the number of epochs trained for the dataset. The penultimate phase is locating the
image tamper region and predicting the dataset’s test picture.
Figure 6 presents an overview of the model. A 1D sequence of token embeddings is
sent to the standard transformer. To process 2D images, we reshape an image of shape (H, W,
C) into a sequence of N = HW/P² flattened 2D patches. The transformer employs
a constant latent vector size D through all of its layers; thus, we flatten the patches and
map them to D dimensions via a trainable linear projection. We call the output of this
projection the patch embedding.
The core components of the transformer are multi-head self-attention (MSA), the multi-layer perceptron (MLP), and layer normalization (LN). The multi-head self-attention (MSA) module takes an input
X ∈ R^{n×d} and linearly transforms it into three parts, i.e., queries Q ∈ R^{n×d_k}, keys K ∈
R^{n×d_k}, and values V ∈ R^{n×d_v}, where n is the sequence length and d, d_k, d_v are the
dimensions of the inputs, queries (keys), and values, respectively. The scaled
dot-product attention is then applied on Q, K, V:

Attention(Q, K, V) = Softmax(QKᵀ / √d_k) V   (1)
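The following minimal TensorFlow sketch implements Eq. (1); the batch size, sequence length (196 patch tokens), and d_k = 64 are illustrative values, not settings confirmed by the authors.

```python
# Minimal TensorFlow sketch of the scaled dot-product attention in Eq. (1).
import tensorflow as tf

def scaled_dot_product_attention(Q, K, V):
    d_k = tf.cast(tf.shape(K)[-1], tf.float32)
    scores = tf.matmul(Q, K, transpose_b=True) / tf.sqrt(d_k)  # QK^T / sqrt(d_k)
    weights = tf.nn.softmax(scores, axis=-1)                   # attention weights
    return tf.matmul(weights, V)                               # weighted values

Q = tf.random.normal((1, 196, 64))   # queries for 196 patch tokens
K = tf.random.normal((1, 196, 64))   # keys
V = tf.random.normal((1, 196, 64))   # values
print(scaled_dot_product_attention(Q, K, V).shape)  # (1, 196, 64)
```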
The proposed algorithm is expressed as shown above. We utilized the ViT base
model with the same model parameter settings in all tests. The model has
12 encoder layers with 12 attention heads in each. It features a 768-dimensional
embedding and a 3072-dimensional feed-forward sub-network. For fine-tuning,
we used a pre-trained model that was trained on ImageNet-21k and then fine-tuned
on ImageNet-1k. For the last 30 iterations, we ran the training
with a batch size of 2048, all on image forgery data. We used the Adam technique for
optimization and set the learning rate to 0.001. The sequence length was 196 tokens,
with a fixed image size of 224 × 224 pixels and a fixed patch size
of 16 × 16 pixels. For accuracy comparison, we assessed the standard overall
accuracy (OA): the number of correctly identified photographs divided by the total number of images.
The overall architecture in Fig. 6 is termed the Vision Transformer (ViT), and the
process is explored in the following steps [9, 10]; steps 1–3 are sketched in code after the list.
1. Split an image (real or morphed image) taken from the dataset into patches and
flatten the patches.
2. Convert the flattened patches into lower-dimensional linear embeddings.
3. Add positional embeddings.
4. Feed the sequence as an input to a conventional transformer encoder.
5. Pre-train the model with image labels (fully supervised on a big dataset).
6. Fine-tune on the downstream dataset for picture classification.
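The following TensorFlow sketch illustrates steps 1–3 under the settings quoted in the text (224 × 224 images, 16 × 16 patches, embedding size 768); it is a simplified illustration, not the authors' implementation.

```python
# Sketch of steps 1-3: split a 224x224 image into 16x16 patches, linearly
# embed them, and add positional embeddings (illustrative ViT-Base sizes).
import tensorflow as tf

image = tf.random.normal((1, 224, 224, 3))       # dummy input image
P, D = 16, 768                                    # patch size, embedding dim

# Step 1: extract and flatten non-overlapping patches -> (1, 196, 768)
patches = tf.image.extract_patches(
    images=image, sizes=[1, P, P, 1], strides=[1, P, P, 1],
    rates=[1, 1, 1, 1], padding="VALID")
patches = tf.reshape(patches, (1, -1, P * P * 3))

# Step 2: trainable linear projection to D-dimensional patch embeddings
proj = tf.keras.layers.Dense(D)
tokens = proj(patches)

# Step 3: add a learnable positional embedding to each token
pos_emb = tf.Variable(tf.random.normal((1, tokens.shape[1], D)))
tokens = tokens + pos_emb                         # encoder input sequence
print(tokens.shape)                               # (1, 196, 768)
```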
The transformer [9] building blocks are scaled dot-product attention units.
The transformer encoder is shown in Fig. 7. When an image is routed through a transformer
model, attention weights are calculated simultaneously between every pair of tokens.
The attention unit generates a contextual embedding for each token that contains information
on the token itself as well as a weighted combination of the other relevant tokens,
each weighted according to its attention weight.
Image patches are treated as a sequence of tokens, similar to the
sequence of tokens used in natural language processing to represent sentences. With
the help of the transformer layers, encoding and training of the model take
place to obtain the position embedding values; the sequence then passes through multi-head
attention followed by the MLP, among other components. The ViT model [10] is trained
on both original and fake images, and labeling is performed once the encoding
process is completed with the help of the transformer block. Finally,
fine-tuning is carried out. The model contains 12 layers, a hidden size D of 768, an
MLP size of 3072, 12 heads, and a total of 86 M parameters.
All experiments on the two datasets were run on a Dell desktop with an Intel Core
i7-940 (2.93 GHz) quad-core processor, 64 GB RAM, and an NVIDIA GeForce RTX 3080 Ti
GPU. The code was implemented in TensorFlow, a free and open-source deep learning
library written in the Python programming language. The Seaborn library was used
for visualization, and attention maps were used to find the locations of the
tampered areas based on the trained model. TensorBoard was used to plot and
visualize all graphs showing training,
testing, and loss curves of the model. Table 1 indicates forgery benchmark datasets
used for classification and image forgery localization using Vision Transformer [11].
The ReLU function is used in the multi-layer perceptron for updating the parameters
during backpropagation; it does not suffer from the vanishing gradient problem,
which is an advantage in neural networks. ReLU is an activation function that
passes positive inputs through unchanged and outputs zero otherwise.
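For concreteness, ReLU can be written in one line; this is a generic definition, not code from the paper.

```python
import numpy as np

def relu(x):
    # Passes positive inputs through unchanged; outputs zero otherwise.
    # Its gradient is 1 for x > 0 and 0 for x < 0, so it does not
    # vanish for active units during backpropagation.
    return np.maximum(0.0, x)
```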
5 Results
The model was trained for 200 steps on the MICC-MF2000 dataset and achieved good
training and validation accuracy. The final loss also proves satisfactory for
image-classification-based forgery detection. Figure 8 shows the precision and
recall matrices obtained on the original and predicted classes, and Fig. 9 shows
the training and validation accuracy plotted against epoch loss.
Fig. 8 Precision matrix and recall matrix obtained on MICC-MF2000 dataset
MICC-MF2000 is a 2000-image dataset that is regarded as a benchmark for detecting
picture forgeries. The Vision Transformer performed well on this dataset and was
trained well enough to generalize to a fresh dataset during the forgery detection
testing phase. The suggested technique also performed well during the training
phase on the well-known CASIA 2 dataset, but achieved lower validation accuracy
there than previous deep neural networks. Localization, too, is best accomplished
through the use of an attention mechanism.
The model was trained for 100 steps on the CASIA 2 dataset and achieved good
training and average validation accuracy. The final loss also proves satisfactory
for image-classification-based forgery detection. Figure 10 shows the precision and
recall matrices obtained on the original and predicted classes, and Fig. 11 shows
the training and validation loss plotted against total epochs. Table 2 lists the
trainable and non-trainable parameters of the two Vision Transformer models
obtained with ViT-b16 weights on the two image forgery datasets used for
evaluation, along with the hyper-parameters used for training and the summary
obtained after model compilation on MICC-MF2000 and CASIA 2. Epoch loss and
accuracies are shown in Figs. 9 and 11.
Fig. 11 Training loss and validation loss versus total epochs on MICC-MF2000 dataset
Previously, the initial layers of a CNN model were used for visualization, but in
recent times a well-trained model yields smooth, interpretable filters for
visualization. A well-trained Vision Transformer can localize the tampered region
[7] based on its attention maps, as shown in Fig. 12.
Fig. 12 Forged image (left) and localization using attention networks (right)
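One common way to turn a trained ViT's attention weights into a coarse localization map is attention rollout; the sketch below is a generic illustration of that idea under our own assumptions, not the paper's exact procedure.

```python
import numpy as np

def attention_map(attn_layers):
    """Collapse per-layer attention into a coarse tamper-localization map.

    attn_layers: list of (heads, tokens, tokens) attention weights from
    the 12 encoder layers (tokens = 1 CLS token + 196 patch tokens).
    """
    rollout = np.eye(attn_layers[0].shape[-1])
    for attn in attn_layers:
        a = attn.mean(axis=0)                  # average over the 12 heads
        a = a + np.eye(a.shape[0])             # account for residual connections
        a /= a.sum(axis=-1, keepdims=True)
        rollout = a @ rollout                  # propagate attention through layers
    cls_to_patches = rollout[0, 1:]            # CLS attention to each patch
    return cls_to_patches.reshape(14, 14)      # 14 x 14 patch grid heatmap
```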
The results and the plots of performance metrics across multiple benchmark
datasets, covering tamper types such as copy-move forgery and spliced picture
forgery, provide a complete evaluation of the Vision Transformer (ViT) approach.
The MICC-F2000 dataset achieved 97% validation accuracy, whereas the CASIA 2
dataset achieved 64% validation accuracy on test images. Localization of the forged
areas is further achieved with attention networks on the trained model. In the
future, the work will be expanded to identify tampered photographs with additional
deep learning methods while employing cloud computing GPU resources.
Compliance with Ethical Standards Conflict of Interest: The authors declare that they have no
conflict of interest.
References
1. A.H. Saber, M.A. Khan, B.G. Mejbel, A survey on image forgery detection using different
forensic approaches. Adv. Sci. Technol. Eng. Syst. J. 5(3), 361–370 (2020)
2. M.A. Qureshi, M. Deriche, A bibliography of pixel-based blind image forgery detection
techniques. Signal Process.: Image Commun. 39, 46–74 (2015)
3. K. Asghar, X. Sun, P.L. Rosin, M. Saddique, M. Hussain, Z. Habib, Edge-texture feature based
image forgery detection with cross-dataset evaluation. Mach. Vis. Appl. 30(7–8), 1243–1262
(2019). https://doi.org/10.1007/s00138-019-01048-2
4. J. Dong, W. Wang, T. Tan, CASIA image tampering detection evaluation database, in 2013
IEEE China Summit and International Conference on Signal and Information Processing,
pp. 422–426 (2013). https://doi.org/10.1109/ChinaSIP.2013.6625374
5. A. Jegorowa et al., Deep learning methods for drill wear classification based on images of holes
drilled in melamine faced chipboard. Wood Sci. Technol. 55(1), 271–293 (2021)
6. Z.J. Barad, M.M. Goswami, Image forgery detection using deep learning: a survey, in 2020
6th International Conference on Advanced Computing and Communication Systems (ICACCS)
(2020). https://doi.org/10.1109/ICACCS48705.2020.9074408
7. R. Salloum, Y. Ren, C.-C. Jay Kuo, Image splicing localization using a multi-task fully
convolutional network (MFCN). J. Vis. Commun. Image Representation 51, 201–209 (2018)
8. I. Amerini, T. Uricchio, L. Ballan, R. Caldelli, Localization of JPEG double compression
through multi-domain convolutional neural networks (2017). https://doi.org/10.1109/CVPRW.
2017.233
9. A. Dosovitskiy et al., An image is worth 16 x 16 words: transformers for image recognition at
scale. arXiv preprint arXiv:2010.11929 (2020)
10. S. Paul, P.-Y. Chen, Vision transformers are robust learners. arXiv preprint arXiv:2105.07581
(2021)
11. A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A.N. Gomez, Ł Kaiser, I. Polosukhin,
Attention is all you need. Adv. Neural Inf. Process. Syst. 30, 5998–6008 (2017)
12. E.W. Teh, M. Rochan, Y. Wang, Attention networks for weakly supervised object localization.
BMVC (2016)
Design of Intelligent Framework
for Intrusion Detection Platform
for Internet of Vehicles
Ch. Ravi Kishore, D. Chandrasekhar Rao, Janmenjoy Nayak,
and H. S. Behera
1 Introduction
Vehicles today can have up to 70 electronic control units (ECUs), all of which are
interconnected and communicate through specialist automotive bus systems such as
the Controller Area Network (CAN). The Internet of Vehicles (IoV) connects the
entities involved in road transportation, such as humans, vehicles, roads, parking
lots, and city infrastructure, and allows for real-time communication between them
[5]. IoV system security breaches might have major consequences for people,
automobiles, roads, and other infrastructure. Faulty or deceptive information puts
automobiles, as well as human lives, at risk. Hackers or intruders can take over
the car's systems and cause chaos and accidents [6].
AVs are susceptible to both internal and external communication attacks. On the
CAN, all ECUs in a vehicle communicate with one another over the bus. Because of
its effective error detection mechanism for steady transmission [7], CAN minimizes
wiring cost, weight, and complexity. If the CAN bus is compromised, however, all
ECUs are vulnerable to a variety of attacks, because all ECUs communicate with one
another via this network. If malicious messages are injected into the network
traffic or hostile attacks are conducted, the nodes will execute them without
confirming their origin. Message injection attacks on the CAN bus can accordingly
be divided into three categories. For example, a denial of service (DoS) attack or
spoofing can be used to take over resources or send destructive information, such
as gear and RPM data. Unwanted states or malfunctions are caused by fuzzing
attacks, which inject arbitrary messages into the CAN bus [8].
The CAN bus's distinctive characteristics are vulnerable because the protocol was
designed with no consideration of security. Many injection attacks, such as
flooding, fuzzing, spoofing, replay, and bus-off attacks, exploit CAN bus
vulnerabilities [9]. The CAN bus has no authentication of source and destination
addresses, so injected data can be processed by normal ECUs without verification.
The flooding attack delays the CAN messages of normal ECUs by sending a burst of
messages with the highest priority (CAN ID 0x000). The fuzzing attack injects
random CAN IDs and data. The spoofing attack controls certain vehicle functions by
setting a specific CAN ID and data field. The replay attack injects normal CAN bus
traffic that was collected during driving.
We have set our research focus on two issues. The first is data imbalance, which is
prevalent in network data. The second is that the IDS should be constructed with
the help of ML algorithms. In this paper, we examine the security mechanism for
detecting malicious attacks on the CAN bus, namely the intrusion detection system;
IDS can be viewed as a classification problem. To create an intelligent IDS for the
CAN bus, we evaluated tree-based machine learning algorithms together with
traditional machine learning models (Decision Tree (DT), Random Forest (RF),
AdaBoost, Extra Tree, HGBoost, XGBoost, CatBoost, Gradient Boost, Gaussian Naive
Bayes (NB), and K-Nearest Neighbor (KNN)). An intelligent intrusion detection
system should be both highly accurate and inexpensive to run. Our approach, which
uses a multiclass adaptive boosting ensemble learning model, improved detection
accuracy.
The rest of this paper is organized as follows: Sect. 2 describes related works,
Sect. 3 explains the proposed intrusion detection system, Sect. 4 presents the
performance evaluation, and Sect. 5 concludes the research work.
2 Related Works
Building an IDS system begins with gathering enough information on network traffic,
both regular and atypical, as a result of a variety of attacks. The most prevalent
CAN bus threats are message injection attacks, hence collecting data from CAN
messages/frames is the first step in building an IDS for CAN bus intrusion. When it
comes to assaults, the most important aspects are the CAN IDs and the data fields.
It is critical to detect intrusions on the CAN bus in a timely manner. A vehicle's
ECUs generate a large amount of data, making intrusion detection more difficult; as
a result, the data must be compressed. Network data is often class-imbalanced, and
attack-labeled examples are often insufficient because real networks are usually in
a normal state, leaving the minority classes with too little data. This concept is
shown in Fig. 2. We deal with this issue using a combination of two approaches:
oversampling and SMOTE [15]. Because the features of a replay attack are similar to
those of normal traffic, replay is frequently misclassified as normal. To address
this problem, the proposed method ignores the messages' timestamps, since repeated
message data stays the same even when the timestamps change. This reduces the
dataset, which in turn reduces the computational time of the system (a major
concern in a CAN IDS) and also balances the normal and replay classes.
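The timestamp-dropping step can be expressed in a few lines of pandas; the file name and column names here are illustrative, not the competition dataset's actual schema.

```python
import pandas as pd

df = pd.read_csv("can_traffic.csv")       # illustrative file name

# Drop the timestamp so repeated (replayed) frames become exact
# duplicates, then keep one copy of each distinct frame.
frames = df.drop(columns=["timestamp"])
reduced = frames.drop_duplicates()
# In the paper this shrinks the normal class from 7,808,258 records
# to 1,152,376, cutting computation for the CAN IDS.
print(len(df), "->", len(reduced))
```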
We implemented the proposed method (adaptive boosting) on the preprocessed dataset
obtained after applying SMOTE. A decision tree is used as the base classifier,
since we have a multiclass problem, and the proposed method boosts the performance
of this base classifier. The method predicts the intrusion types from the N base
classifiers constructed on the training data; base classifiers are added
sequentially, each trained on the weighted samples, and the error is observed. This
process continues until the finest model, giving the best performance, is obtained.
The final intrusion detection categories are determined using the weighted average
of the base classifier predictions. The algorithm for the proposed adaptive
boosting-based model for intrusion detection on the CAN bus dataset is as follows.
(1) Initialize the weights [Eq. (1)] of each sample S_i ∈ S, where S is the set of
all N training samples:

w_i = 1/N, i = 1, 2, ..., N    (1)

(2) For m = 1 to M:
(a) Fit a classifier Z_m(x) to the training data using the weights w_i.
(b) Compute the prediction error [Eq. (2)]:

Err_m = Σ_{i=1}^{N} w_i · I(y_i ≠ Z_m(x_i)) / Σ_{i=1}^{N} w_i    (2)

(c) Estimate the weighting factor on the basis of the predicted error [Eq. (3)]
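A compact sketch of the pipeline described above, using scikit-learn and imbalanced-learn; the split ratio and hyper-parameter values are illustrative, and X, y stand in for the CAN-frame features and attack labels.

```python
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import AdaBoostClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# X, y: CAN-frame features and attack labels (assumed already loaded).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# Balance the minority classes (e.g., replay) before training.
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X_train, y_train)

# Adaptive boosting with a decision tree base classifier, as in the
# algorithm above ("estimator" is "base_estimator" in older scikit-learn).
model = AdaBoostClassifier(
    estimator=DecisionTreeClassifier(max_depth=8),
    n_estimators=100, random_state=42)
model.fit(X_bal, y_bal)

y_pred = model.predict(X_test)
print("macro F1:", f1_score(y_test, y_pred, average="macro"))
```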
Furthermore, the proposed system has been evaluated and analyzed using state-
of-the-art methods, and it has demonstrated a significant improvement over other
simulation models used in this paper.
3.1 Dataset
The intrusion detection CAN bus dataset was prepared by Kang et al. [16]. The
dataset was generated for the car security competition 'Car Hacking: Attack &
Defence Challenge 2020'. The competition problem was to develop attacks and
detection algorithms for CAN, a widely used standard for in-vehicle communication.
The data was collected from a Hyundai Avante CN7. The dataset contains 8,681,500
samples of normal traffic and of various attacks such as Replay, Flooding, Fuzzing,
and Spoofing, as shown in Table 1. However, a huge imbalance between the normal and
replay attack samples needs to be addressed.
4 Result Analysis
In this section, we first obtain the confusion matrix of the proposed IDS model on
the CAN dataset. Initially, the model misclassifies replay attacks as normal
traffic and shows a low recognition rate. Figure 3a depicts the decision tree's
performance on the CAN dataset before pre-processing.
The following is the procedure for balancing the normal and replay classes:
1. Table 1 shows that the normal class has 7,808,258 records, whereas the replay
attack has only 110,474 records.
2. By ignoring the time stamp of normal data and identifying distinct data points
in normal, the number of records is reduced from 7,808,258 to 1,152,376.
3. Even after reducing the normal data set, normal data is still 90% more than
replay data, indicating a clear problem of class imbalance.
4. To overcome the class imbalance, SMOTE was applied to the minority class, which
resulted in balanced classes (as shown in Table 1) and an improvement on replay
attacks (as shown in Fig. 3b).
The performance of the proposed methodology has achieved a significant
improvement over the state-of-the-art methods. The results of various decision tree
algorithms on the intrusion dataset have been shown in Table 2.
4.1 Evaluation
We worked on many machine learning models on the CAN dataset in this paper, and
all model parameters were set by picking appropriate values as shown in Table 3.
After putting the test dataset through the classifier, we evaluate its performance
using typical evaluation measures. To assess the effectiveness of the suggested
strategy, performance measures such as Accuracy, Precision, Recall, F1 Score, and
ROC-AUC were computed and compared; these are defined in terms of true positives,
true negatives, false positives, and false negatives. Table 4 shows the results of
analyzing the performance of the various strategies using these measures.
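These measures map directly onto standard scikit-learn calls; `model`, `X_test`, and `y_test` are assumed to come from the fitted classifier in the earlier sketch.

```python
from sklearn.metrics import (accuracy_score, f1_score, precision_score,
                             recall_score, roc_auc_score)

y_pred = model.predict(X_test)
acc  = accuracy_score(y_test, y_pred)
prec = precision_score(y_test, y_pred, average="macro")  # macro over classes
rec  = recall_score(y_test, y_pred, average="macro")
f1   = f1_score(y_test, y_pred, average="macro")
# Multiclass ROC-AUC requires per-class probability estimates.
auc  = roc_auc_score(y_test, model.predict_proba(X_test),
                     multi_class="ovr", average="macro")
```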
We used a variety of machine learning classification models in this research,
including Gaussian NB, Gradient Boost, KNN, CatBoost, Hist Gradient Boosting, Extra
Tree, RF, DT, and Adaptive Boosting. All models were first evaluated on the
original car hacking data samples and then again with SMOTE applied to the same
samples. The overall correct classification rate and F1 score of all SMOTE-enabled
models are higher than those of the non-SMOTE models. Among all models, the
Adaptive Boosting ensemble classifier with SMOTE has the highest accuracy and F1
score with minimal execution time, as shown in Table 4.
The proposed system outperforms the methods of Kang et al. [16] in terms of
accuracy and F1-score. The proposed AdaBoost model shows the best accuracy, while
NB and DT show the lowest. A graphical representation of the confusion matrix for
AdaBoost is given in Fig. 4.
5 Conclusion
All the results and analyses show that the proposed model reaches considerably
improved accuracy as well as F1 score when compared to other machine learning
models. Further research will build a highly generalized model that works on large
datasets with a greater number of attacks.
References
1. N. Khatri, R. Shrestha, S.Y. Nam, Security issues with in-vehicle networks, and enhanced
countermeasures based on blockchain. Electronics 10(8), 893 (2021)
2. M.-J. Kang, J.-W. Kang, Intrusion detection system using deep neural network for in-vehicle
network security. PLoS ONE 11(6), e0155781 (2016)
3. S. Alam, M. Shuaib, A. Samad, A collaborative study of intrusion detection and prevention
techniques in cloud computing, in Proceedings of ICICC 2018, vol. 1 (2019) https://doi.org/
10.1007/978-981-13-2324-9_23
4. W. Wu, Z. Yang, K. Li., Internet of vehicles and applications, in Internet of Things (Morgan
Kaufmann, 2016), pp. 299–317
5. Y. Sun et al., Attacks and countermeasures in the internet of vehicles. Ann. Telecommun.
72(5–6) 283–295 (2017)
6. E. Seo, H.M. Song, H.K. Kim, Gids: Gan based intrusion detection system for in-vehicle
network, in 2018 16th Annual Conference on Privacy, Security and Trust (PST) (IEEE, 2018)
7. H. Lee, S.H. Jeong, H.K. Kim, OTIDS: A novel intrusion detection system for in-vehicle
network by using remote frame, in 2017 15th Annual Conference on Privacy, Security and
Trust (PST) (IEEE, 2017), pp. 57–5709
8. M.L. Han, B.I. Kwak, H.K. Kim, Anomaly intrusion detection method for vehicular networks
based on survival analysis. Veh. Commun. 14, 52–63 (2018)
9. N.V. Chawla et al., SMOTE: synthetic minority over-sampling technique. J. Artif. Intell. Res.
16, 321–357 (2002)
10. T. Alladi, V. Kohli, V. Chamola, F.R. Yu, Securing the internet of vehicles: a deep learning-based
classification framework. IEEE Networking Lett. 3(2), 94–97 (2021)
Design of Intelligent Framework for Intrusion Detection … 693
11. A. Zhou, Z. Li, Y. Shen, Anomaly detection of CAN bus messages using a deep neural network
for autonomous vehicles. Appl. Sci. 9(15), 3174 (2019)
12. M. Müter, N. Asaj, Entropy-based anomaly detection for in-vehicle networks, in 2011 IEEE
Intelligent Vehicles Symposium (IV), Baden-Baden, Germany, pp. 1110–1115 (2011). https://doi.org/10.
1109/ivs.2011.5940552
13. M.J. Kang, J.W. Kang, A novel intrusion detection method using deep neural network for in-
vehicle network security, in 2016 IEEE 83rd Vehicular Technology Conference (VTC Spring)
(IEEE, 2016), pp. 1–5
14. H.M. Song, J. Woo, H.K. Kim, In-vehicle network intrusion detection using deep convolutional
neural network. Veh. Commun. 21, 100198 (2020)
15. D. Elreedy, A.F. Atiya, A comprehensive analysis of synthetic minority oversampling technique
(SMOTE) for handling class imbalance. Inf. Sci. 505, 32–64 (2019)
16. H. Kang et al., Car hacking and defense competition on in-vehicle network, in Workshop on
Automotive and Autonomous Vehicle Security (AutoSec) (2021)
17. L. Breiman, Random forests. Mach. Learn. 45(1), 5–32 (2001)
18. R.E. Schapire, Explaining adaboost, in Empirical Inference (Springer, Berlin, 2013), pp. 37–52
19. J.H. Friedman, Stochastic gradient boosting. Comput. Stat. Data Anal. 38(4), 367–378 (2002)
20. P. Geurts, D. Ernst, L. Wehenkel, Extremely randomized trees. Mach. Learn. 63(1), 3–42
(2006)
21. A. Guryanov, Histogram-based algorithm for building gradient boosting ensembles of piece-
wise linear decision trees, in Analysis of Images, Social Networks and Texts. AIST 2019. Lecture
Notes in Computer Science, vol. 11832, eds. by W. van der Aalst et al. (Springer, Cham, 2019)
22. T. Chen, C. Guestrin, XGBoost: A scalable tree boosting system, in Proceedings of the 22nd
ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 785–794 (2016)
23. A.V. Dorogush, V. Ershov, A. Gulin, CatBoost: gradient boosting with categorical features
support. arXiv preprint arXiv:1810.11363 (2018)
Autonomous Vehicles: A Survey
on Sensor Fusion, Lane Detection
and Drivable Area Segmentation
Abstract Most road traffic accidents are caused by human errors and lead to severe
injuries, hospitalization, and even death. Autonomous vehicles are a promising
option for minimizing the possibility of such human errors. But autonomous systems
today are not capable of handling the extreme uncertainties that an everyday driver
faces on the road. Researchers are studying new ways of making autonomous vehicles
better and safer. In this paper, we discuss some of the most important parts of a
modular autonomous vehicle: sensor fusion, lane detection, and drivable area
segmentation. We also present a detailed survey of existing and state-of-the-art
approaches for these modules. Understanding these techniques and how they work can
lay a proper foundation for the planning and acting phase of autonomous vehicle
systems.
1 Introduction
Human error is the leading factor among the causes of road accidents. Autonomous
machines can be broken down into three important modules: sensing, perceiving, and
planning and acting.
In this paper, we have conducted a thorough survey on the sensing and perceiving
blocks of autonomous vehicle systems, as shown in Fig. 1. Sensor fusion is the most
vital part of converting raw sensor data into meaningful information for the
perceiving blocks. We discuss sensor fusion techniques that address specific
problems like sensor configurations, calibration errors, and the required data
fusion. We also examine state-of-the-art techniques for lane detection and drivable
area segmentation, and provide a comparative analysis for understanding and
choosing the correct techniques based on required parameters such as available
computing power, response time, and input video stream quality. Our contribution
through this paper is to provide a deep study of these modules and strengthen the
base for further research in the acting phase of autonomous vehicle systems.
The remainder of this paper is organized as follows: The literature survey is
discussed module-wise in Sect. 2, where Sect. 2.1 covers sensor fusion, Sect. 2.2
covers lane detection, and Sect. 2.3 covers drivable area segmentation. In Sect. 3,
we discuss the outcomes of the available literature on all these modules for an
autonomous vehicle. Section 4 concludes the study.
Fig. 1 Block diagram of an autonomous vehicle with 3 main modules: Sense, perceive, and plan
and act
2 Literature Survey
This section is divided into literature survey of the sensor fusion, lane detection, and
drivable area segmentation modules, respectively.
2.1 Sensor Fusion
To perceive the real world accurately, autonomous systems need to be able to
interpret sensor data. Sensor fusion is needed here to combine data from the
multiple sensors on the vehicle and provide self-awareness and situational-awareness
parameters for the vehicle, such as its localization, positioning, object
detection, and tracking. There are various issues when combining multi-sensor data,
such as calibration,
data parallax, and synchronization issues. Some of the focus areas of studies in this
domain are about which sensor configurations should be used [2, 3], how to achieve
proper data synchronization by automation [4, 5], and ways of achieving better
sensor fusion using techniques like Kalman filter [6, 7], Fuzzy logic [8, 9], and
Deep Learning-based models [10, 11]. We will provide a thorough survey and recent
advancements in this topic for autonomous vehicles.
Various types of sensors are available to provide critical information about the
surroundings of autonomous vehicles, such as RADAR (Radio Detection and Ranging),
LIDAR (Light Detection and Ranging), ultrasound, camera, thermal camera, and a few
others. Every sensor has its own pros and cons, as mentioned in Fig. 2, and a
detailed comparison can be seen in Table 1. The data these sensors provide is not,
in itself, enough for an autonomous system to make inferences from, so we need to
use configurations of these sensors to obtain useful information.
Fig. 2 Comparison of strengths and weaknesses of different sensors shown on a radar chart. Data
taken from [12]
Table 1 Comparison of sensors with respect to their properties

Sensor     | Range | Reliability in all conditions | Feasibility
LIDAR      | Good  | High                          | Low
Ultrasound | Bad   | High                          | High
RADAR      | Good  | High                          | High
Camera     | Best  | Low                           | High

Sensors can be categorized into various categories based on their functioning, way
of working, and various other parameters [10]. Understanding these sensor
technologies and choosing the correct configuration is vital.
The sensor configuration must be such that the vehicle needs no heavy alterations,
the sensing area of the vehicle is increased, and the use of the sensors already on
the vehicle is maximized. Keeping this in mind, Cho et al. [13] have designed a
system with 6 RADARs, 6 LIDARs, and 3 cameras that can detect any object within
200 m of the vehicle; at least 2 different sensors can detect an object if it is
within 60 m of the vehicle. This system was used
for detecting moving objects and tracking them in an urban driving environment.
Sensors selected for the system should not only be as accurate as they can be but
also feasible and efficient. A sensor fusion framework was proposed where tasks
like road segmentation, obstacle detection, and object tracking are performed with
low-cost, high efficiency, and robust methods using an encoder-decoder-based fully
convolutional neural network and an Extended Kalman Filter [2]. They were able to
obtain a detailed map of the environment and successfully implemented the sensor
fusion of LIDAR, RADAR, and camera data.
Being able to implement an efficient and useful configuration of sensors on an
autonomous vehicle is an important step and so is the location and orientation of the
sensors on it. VESPA (Vehicle Sensor Placement and orientation for Autonomy) [3],
is a system proposed to optimize the placement and orientation of the sensors on a
vehicle given its field of view zones, set of sensors, and required ADAS (Advanced
Driver Assistance System) features on the vehicle. They have tested the system for
optimizing the perception performance of 2 vehicles, namely, the 2016 Chevrolet
Camaro and the 2019 Chevrolet Blazer. Studies have also been conducted to maxi-
mize the results by altering the sensors and the ways they are used like in the case
of Radar [14, 15] and LIDAR [16, 17].
When using multiple sensors, the data streams from different sensors can be out of
sync due to the absence of a shared clock, and this can create problems during
sensor fusion. Various techniques for the synchronization and calibration of sensor
data are being studied by researchers. Car vibrations and steering events can be
used to synchronize the driving data from various inputs in some cases [4]. The
authors synchronized the data with an average synchronization error of 13 ms; this
method requires no manual annotation before, during, or after the data collection
phase and no artificially created synchronization events, but it has not been
evaluated on large multi-vehicle on-road datasets.
An automatic method of calibration is proposed in a paper [5] where the authors
show how multiple 2D and 3D sensors like LIDARs and cameras can be calibrated
automatically using a rolling target in front of the vehicle such as a ball. They use
various algorithms to detect and track the center of the rolling ball through all the
sensors to synchronize their data streams. 3D maps are usually generated by sensor
fusion of LIDAR and camera data to help the decision-making module of autonomous
vehicles. Data calibration of LIDAR and camera [18, 19] is very important to get
a precise 3D map output. An ROS framework was also proposed by Oliveira et al.
[20] which was able to calibrate the multiple sensors in a multi-modal manner with
similar accuracy as compared to the standard pairwise calibrations of sensors. High
definition 3D map generation using multiple sensors like GNSS (Global Navigation
Satellite System), IMU (Inertial Measurement Unit), and LIDAR using point clouds
[21] can be used for navigation systems in vehicles where accurate sensor calibration
is required.
The techniques used for sensor fusion vary across sensors and use-cases. The most
widely used technique is the Extended Kalman Filter (EKF), which fuses different
sensor data streams and minimizes noise. An EKF that reflects the distance
characteristics of LIDAR and RADAR sensors [6] can be used to accurately estimate
distances from the vehicle and hence improve the accuracy of position estimation.
For better navigation and localization on maps, an Extended Kalman Filter can fuse
data from a 3D LIDAR, GNSS, and inertial sensors [7]; this makes it possible to
accurately localize the car on a map even in urban areas. Also, an Adaptive Kalman
Filter (AKF) with an attenuation factor can help decrease noise and assist
INS/GPS-based navigation; the accuracy reported by Liu et al. [22] was 20% higher
than that of a traditional AKF. A fuzzy-logic-enhanced Kalman filter fusing
information from machine vision, laser radar, IMU, and speed sensors on the vehicle
can also be used as an efficient sensor fusion technique [8]; it reduces noise from
the sensors and translates their data into meaningful information for the
autonomous system.
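As a toy illustration of Kalman-filter-based fusion, the sketch below fuses noisy 1-D position measurements from two sensors (say, LIDAR and RADAR ranges) under a constant-velocity model; the matrices and noise values are invented for illustration and are not from any of the surveyed systems.

```python
import numpy as np

dt = 0.1
F = np.array([[1, dt], [0, 1]])            # constant-velocity state transition
H = np.array([[1.0, 0.0]])                 # we only measure position
Q = np.eye(2) * 1e-3                       # process noise covariance
x, P = np.zeros(2), np.eye(2)              # state [pos, vel] and covariance

def kf_update(z, R):
    """One predict/update cycle for a single range measurement z."""
    global x, P
    x = F @ x                              # predict state
    P = F @ P @ F.T + Q                    # predict covariance
    S = H @ P @ H.T + R                    # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)         # Kalman gain
    x = x + (K @ (z - H @ x)).ravel()      # correct with measurement
    P = (np.eye(2) - K @ H) @ P

# Fuse alternating LIDAR (low noise) and RADAR (higher noise) readings.
kf_update(np.array([10.02]), R=np.array([[0.01]]))   # LIDAR range
kf_update(np.array([10.30]), R=np.array([[0.25]]))   # RADAR range
print(x)   # fused position/velocity estimate
```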
Deep learning is also used for sensor fusion in autonomous vehicles [10] for
localization and perception of the environment. While using sensors like cameras,
RADARs, LIDARs, and thermal cameras, deep learning techniques provide a way to
combine the data streams into useful representations. Another deep learning method
performs feature-level fusion for thermal cameras, visual cameras, and radars using
RVNets and TVNets [11]. These networks work together, along with skip connections,
to extract features into the output branch; the feature fusion and object detection
are done after the transfer.
Tasks like object detection require data fusion to work better, and a deep
learning-based model, the Camera Radar Fusion Net (CRF-Net) [23], is specially
designed to identify the correct data fusion behavior for camera and radar sensors
in order to provide better detection results. Human activity recognition is also a
key capability on roads, and a study [24] shows how Long Short-Term Memory (LSTM)
networks were used to train deep learning models that detect human activity,
applied at the sensor fusion level of the system's classification model. This aids
the performance of the model.
2.2 Lane Detection
Wang et al. [25] proposed LaneNet, a lane detection system designed to achieve a
diverse, computationally light, real-time solution. The proposed architecture
divides the task into two stages, a lane edge proposal network and a lane
localization network: a binary classification over the image pixels generates lane
edges, which are then fed into the localization network in the next stage. This can
be seen in the block diagram in Fig. 3.
Table 2 Accuracy and F1 score metrics comparison of different approaches for lane detection

Approach                                                              | Accuracy | F1 score
Towards End-to-End lane detection: an instance segmentation approach | 0.964    | –
LaneATT                                                               | 0.9563   | 0.9677
RONELD^a                                                              | 0.89     | –
LaneNet                                                               | 0.942    | 0.739

^a RONELD is not a standalone technique; it is used to improve other techniques used in lane detection
These stages are optimized for both precision and computational cost at running
speed. A light-weight encoder-decoder architecture composed of stacked depthwise
convolution layers and 1 × 1 convolution filters is used for fast feature encoding.
Its output is converted into lane edge coordinates for the next level, where the
high-speed localization network, consisting of a point feature encoder and LSTM
decoders, provides robust detection in various scenarios.
This two-stage technique creates space for additional optimization: the lane edge
map produced by the first network serves as an interpretable intermediate feature,
which mitigates the black-box nature of neural network-based methods and makes
detection failures more traceable. It also allows the parameters of the network to
be refined in a weakly supervised manner, alleviating the need for fully annotated
training data. Last but not least, the proposal network can be merged into a
semantic segmentation network, further lowering the total computing expense of
driving assistance systems.
Neven et al. [26] trained a neural network end-to-end for lane detection, taking
into consideration the abrupt lane-switching issue as well as a variable number of
lanes. These features can be accomplished by treating the problem as
instance segmentation. The LaneNet architecture uses binary segmentation with a
clustering loss function to optimize single-shot segmentation, and every lane pixel
is assigned the identifier of its corresponding lane. As the network generates a
series of pixels per lane, a curve must also be fitted across these pixels to
obtain the lane parameters. Traditionally, the lane pixels are first projected onto
a "bird's-eye view" representation using a fixed transformation matrix. This
creates an issue with the generalization of the transformation parameters on
non-flat ground such as slopes and hills, which can be addressed with a set of
learnable parameters in a neural network called H-Net. Instead of a handcrafted
transform, this learning-based method fits the projection with a polynomial curve
rather than a linear transform.
This approach outperforms fixed-transformation techniques when H-Net is used for
lane detection. It also gives a better mean-squared-error score and can match
points even on undulating slopes. The paper stood 4th in the TuSimple challenge
with just a 0.5% differential from the first entry, and these results were achieved
with a model trained only on the TuSimple dataset, showing good generalization of
parameters.
Meyer et al. [27] proposed a novel approach for detecting road markings, lane
boundaries, and center lines by reformulating polyline detection as a bottom-up
composition of small line segments, capable of detecting bounded, dotted, and
continuous polylines with a single head. This approach has significant benefits
over previous approaches: running at 187 FPS, it is well suited for real-time
applications, with almost no restrictions on the forms of the observed polylines.
Previously, sequences of Recurrent Neural Networks (RNNs) were proposed for highly
effective automatic instance segmentation. These each anticipate a bounding box or
crop around an entity and then use gated graph neural networks to predict the
polyline vertices node by node, with optional refinement. Not only are such RNNs
typically slow and difficult to train, but they often require extra care to predict
the initial vertex, which relies on a general initialization that works well for
instance segmentation but not for these applications. Though quicker than recurrent
methods, 30 ms is still a long time in comparison to single-shot detectors.
Furthermore, the same head can distinguish both dotted and continuous lines. This
enables robotic applications such as road marking identification and lane
centerline determination, but it is also feasible for a wide range of other
applications, such as blood vessel detection. Although the authors demonstrated the
general concept with YOLO9000, using a modern backbone such as EfficientNet and
adding ideas from YOLOv4 is expected to increase efficiency even more.
Lucas et al. [28] proposed LaneATT, an anchor-based deep lane detection model that,
like other generic deep object detectors, uses anchors for feature pooling. The
hypothesis is that since lanes follow a regular pattern and are highly correlated,
global knowledge could be critical in some cases to infer their locations. As a
result, the paper proposes a novel anchor-based attention mechanism for aggregating
global information. The model was thoroughly tested on three of the most commonly
used datasets in the literature.
LaneATT is a single-stage anchor-based paradigm for lane detection (similar to
YOLOv3 or SSD). By integrating local and global features, the model can more
effectively use details from other lanes, which may be needed in situations with
occlusion or invisible lane markings. Finally, the merged features are sent to
fully connected layers, which predict the final output lanes.
The system achieves the second-highest registered F1 on TuSimple while being much
quicker than the top-F1 method (171 vs. 30 FPS). The method set a new state of the
art for real-time methods on CULane in terms of both speed and precision (+4.38%
F1 compared to the state-of-the-art method with a comparable speed of about
170 FPS). Furthermore, on all three backbones, the system scored a high F1 (above
93%) on the LLAMAS benchmark.
The research work by Chng et al. [29] differs from the others in that, where they
are techniques for detecting lanes on the road, the method proposed in this paper
works as an enhancement for already existing lane detection methods. RONELD is
paired with deep learning models that perform poorly, to make their lane detection
effective. The approach is based on the observation that forecasting lane markings
from the network's probability map boosts accuracy.
Accuracy can thus be improved severalfold: RONELD is a prime approach that uses
network probability maps to improve a system's output, and its low computational
time makes it eligible for real-time systems. The proposed method was tested with
Spatial CNN and EfficientNet, resulting in higher accuracy at low processing
overhead for RONELD. This shows that combining this method with others improves
their overall effectiveness: the precision of traditional techniques improves by
69.4% at IoU thresholds of 0.3 to 0.4, and by up to 2-fold on the more constrained
0.5 threshold, against both previous methods.
2.3 Drivable Area Segmentation
Oršić et al. [30] have shown the success and robustness of semantic segmentation
methods on real road-driving datasets, even in challenging visibility conditions.
However, real-time inference remains a challenge due to the tremendous
computational power requirements. With this paper, the authors have proposed a
light-weight and faster approach for achieving semantic segmentation in real time.
Currently, deep fully-convolutional models provide the best results for semantic
segmentation but require extraordinary computational resources. Most approaches
that try to deal with this issue make use of custom light-weight architectures.
These approaches are not ideal for visual perception at a large scale: the authors
state that they do not exploit the large regularization benefit offered by transfer
learning from a larger, more diverse dataset, and are thus prone to overfitting.
Oršić et al. [30] proposed a segmentation model built upon a pre-trained ImageNet
encoder, which benefits from the regularization induced by knowledge transfer, and
a decoder that restores the resolution of the encoded features. On the Cityscapes
dataset, this approach achieves 75.5% mIoU while processing 1024 × 2048 images at
39.9 Hz on a GTX 1080Ti, and thus the authors argue that it provides an acceptable
speed-accuracy trade-off.
Yu et al. [32] have tried to address the dilemma of spatial resolution and real-time
performance trade-off with a new approach, using a Bilateral Segmentation Network
(BiSeNet). The proposed architecture, as shown in Fig. 4, achieves an optimal balance
between segmentation performance and speed on datasets like Cityscapes, CamVid,
and COCO-Stuff.
To minimize the loss of spatial detail, most current approaches use a U-shaped
architecture. But this has two major drawbacks: (1) the U-shaped structure can slow
the model down due to the computational power required for high-resolution maps,
and (2) most of the spatial information lost in the cropping processes cannot be
recovered. To combat this, the authors propose a new Bilateral Segmentation Network
(BiSeNet) with two sections, a Context Path (CP) and a Spatial Path (SP), designed
to resolve the loss of spatial information and the shrinkage of the receptive
field.
In testing, it was found that this method obtains a large receptive field very
rapidly. With rich spatial detail and a large receptive field, the architecture
achieves 68.4% mIoU on the Cityscapes dataset at 105 FPS.
Zhao et al. [33] proposed an image cascade network (ICNet). This method utilizes
multi-resolution branches under proper label guidance to resolve the challenge of
reducing the amount of computational power required for pixel-wise inference. They
have provided a detailed analysis of their framework and introduced the cascade
feature fusion unit to achieve real-time segmentation with high accuracy.
After an in-depth analysis of the time budget, the authors developed the Image
Cascade Network (ICNet), a high-efficiency segmentation method. It exploits the
processing efficiency of low-resolution images and the high inference quality of
high-resolution images. In this approach, the low-resolution images are passed
through a full semantic perception network to generate a coarse prediction map. In
the second step, cascade feature fusion units and cascade label guidance strategies
combine the medium- and high-resolution features, which help to refine the coarse
prediction map.
This architecture achieves a 5× improvement in inference time and a 5× reduction in
memory consumption. It can run at 30 FPS on 1024 × 2048 input from various datasets
such as Cityscapes, CamVid, and COCO-Stuff.
3 Discussion
Sensor fusion can be used to improve the quality of the available data, decrease
noise, increase reliability, estimate unmeasured states, and increase the coverage
of the sensors. There are various issues when combining multi-sensor data, such as
calibration, data parallax, and synchronization issues. We saw various techniques
for how sensor fusion can be achieved and the problems each of them addresses.
4 Conclusion
Sensor fusion is the most important bridge between raw real-world data and the
decision-making unit of any autonomous vehicle. Improving sensor fusion techniques
is one of the promising ways to improve the performance and safety of autonomous
vehicles. Lane detection poses various problems such as weather conditions, road
conditions, and the computation needed; it is necessary to choose a lane detection
technique with good performance for the vehicle system to understand the road well.
The segmentation of drivable areas is likewise a critical capability for achieving
autonomous navigation. One of the main issues faced here is the computational cost,
and the trade-off between quality and real-time response needs to be handled
carefully according to the use cases of the system.
The research works studied in this paper were identified from recent papers with
state-of-the-art techniques, and a thorough investigation was carried out to select
the important ones that will help in further studies and discussions. A proper
in-depth analysis was also done regarding performance, response time, and
computational cost to highlight the pros and cons of the mentioned methods.
All these modules of sensor fusion, lane detection, and drivable area segmentation
were discussed in detail and can serve as a strong base for building a proper, efficient,
and safe autonomous vehicle system.
References
18. Y. Zhu, C. Li, Y. Zhang, Online camera-LiDAR calibration with sensor semantic information.
IEEE Int. Conf. Robot. Autom. (ICRA) 2020, 4970–4976 (2020)
19. E.-S. Kim, S.-Y. Park, Extrinsic calibration between camera and LiDAR sensors by matching
multiple 3D planes. Sensors 20(1), 52 (2020)
20. M. Oliveira, A. Castro, T. Madeira, E. Pedrosa, P. Dias, V. Santos, A ROS framework for the
extrinsic calibration of intelligent vehicles: A multi-sensor, multi-modal approach. Rob. Auton.
Syst. 131 (2020)
21. V. Ilci, C. Toth, High definition 3D map creation using GNSS/IMU/LiDAR sensor integration
to support autonomous vehicle navigation. Sensors 20(3), 899 (2020)
22. Y. Liu, X. Fan, C. Lv, J. Wu, L. Li, D. Ding, An innovative information fusion method with
adaptive Kalman filter for integrated INS/GPS navigation of autonomous vehicles. Mech. Syst.
Signal Process. 100, 605–616 (2018)
23. F. Nobis, M. Geisslinger, M. Weber, J. Betz, M. Lienkamp, A deep learning-based radar and
camera sensor fusion architecture for object detection. Sens. Data Fusion Trends Solutions
Appl. (SDF) 1–7 (2019). https://doi.org/10.1109/SDF.2019.8916629
24. S. Chung, J. Lim, K.J. Noh, G. Kim, H. Jeong, Sensor data acquisition and multimodal sensor
fusion for human activity recognition using deep learning. Sensors 19(7), 1716 (2019). https://
doi.org/10.3390/s19071716
25. Z. Wang, W. Ren, Q. Qiu, LaneNet: Real-time lane detection networks for autonomous driving.
arXiv:1807.01726 [cs.CV] (2018)
26. D. Neven, B. De Brabandere, S. Georgoulis, M. Proesmans, L. Van Gool, Towards End-to-End
lane detection: An instance segmentation approach. arXiv:1802.05591 [cs.CV] (2018)
27. A. Meyer, P. Skudlik, J.-H. Pauls, C. Stiller, YOLinO: Generic single shot polyline detection
in real time arXiv:2103.14420 [cs.CV] (2021)
28. L. Tabelini, R. Berriel, T.M. Paixão, C. Badue, A.F. De Souza, T. Oliveira-Santos, Keep your
eyes on the lane: Real-time attention-guided lane detection. arXiv:2010.12035 [cs.CV] (2020)
29. Z.M. Chng, J.M.H. Lew, J.A. Lee, RONELD: Robust neural network output enhancement for
active lane detection. arXiv:2010.09548 [cs.CV] (2020)
30. M. Oršić, I. Krešo, P. Bevandić, S. Šegvić, In defense of Pre-trained ImageNet architectures for
real-time semantic segmentation of road-driving images. arXiv:1903.08469 [cs.CV] (2019)
31. V. Nekrasov, C. Shen, I. Reid, Light-weight RefineNet for real-time semantic segmentation.
arXiv:1810.03272 [cs.CV] (2018)
32. C. Yu, J. Wang, C. Peng, C. Gao, G. Yu, N. Sang, BiSeNet: Bilateral segmentation network for
real-time semantic segmentation. arXiv:1808.00897 [cs.CV] (2018)
33. H. Zhao, X. Qi, X. Shen, J. Shi, J. Jia, ICNet for real-time semantic segmentation on high-
resolution images. arXiv:1704.08545 [cs.CV] (2018)
Identification of Malicious Access in IoT
Network from Connection Traces
by Using Light Gradient Boosting
Machine
Abstract In this paper, a light gradient boosting machine (LightGBM) ensemble
learning-based model has been used for the identification of malicious access in an
IoT network from connection traces. The proposed approach starts with an exclusive
feature bundling (EFB) process to combine features that are mutually exclusive.
Then, Gradient-based One-Side Sampling (GOSS) generates a re-sampled dataset in
which data selection is based on the loss of the initial model and the absolute
values of the gradients. The final model is designed using the re-sampled data, in
which efficient splitting of the data is achieved through information gain. The
performance of the proposed model is evaluated against state-of-the-art ensemble
learning and machine learning based models in order to assess its overall
generalized performance and efficiency.
E. Oram · M. R. Senapati
Department of Information Technology, Veer Surendra Sai University of Technology, Burla,
Sambalpur, Odisha 768018, India
B. Naik (B)
Department of Computer Application, Veer Surendra Sai University of Technology, Burla,
Sambalpur, Odisha 768018, India
e-mail: bnaik_mca@vssut.ac.in
1 Introduction
The Internet of Things has created a system in which people are immensely connected
through networks and IoT devices. It has transformed their lifestyle and has a
significant impact on the way they perform every computational activity, ultimately
carrying them from traditional to modern times. The Internet of Things (IoT) is an
exciting and fantastic way to connect thousands or billions of devices for data
collection, analysis, sensing, and controlling other devices. It has opened up new
opportunities and continuously provides solutions to many current difficult
challenges. Due to the exponential growth of IoT applications within a short period
of time, it is not far off that more than a trillion IoT gadgets may connect to
the internet by the end of 2022. But at the same time, as demand for IoT devices
grows and more applications emerge, the design architecture of working models
becomes increasingly complicated. The expansion of IoT environments introduces many
redundant risk factors into the systems, knowingly or unknowingly, provoking an
alarming situation in the near future. As IoT operates on data-based applications
and data-driven models for people, it becomes more complicated than manual ways of
working. As a result, the need for a suitable solution for processing massive data
in any complex scenario is self-evident. The main concerns of any IoT-connected
device and its application areas are security and privacy, and the steps taken to
address security concerns. Distributed attacks and cross-platform scripting are two
other major IoT security concerns. As the majority of IoT devices are very
adaptable for common household applications, the framework must ensure that they
are used safely and securely. However, dealing with potential attacks like denial
of service, man-in-the-middle attacks, congestion, data interruption, malware
threats, hacking, and tampering is a difficult issue for IoT infrastructure. IoT
security issues fall into two categories: technological obstacles and security
management concerns [1]. The first is associated with the technical hurdles of the
electronic devices and their way of working in a virtual environment, while the
second entails the technicalities of the software framework's failure. Technical
issues can be resolved through adaptive physical intervention by professionals, but
the second category demands authorization and systems enabled with trust-based
authorization and point-to-point connectivity. Though innovations and upgrades of
all linked devices are taking place, IoT-based security confirms command execution
and develops the framework. There are several approaches to dealing with IoT device
security issues, including having all connected IoT devices confirm each other's
operations. Most importantly, each IoT device is required to first make clear its
own functionality before connecting with other devices for communication. Within
distributed and high-performance computing networks with limited computation
energy, the IoT system needs to retain the confidentiality, accessibility,
availability, and consistency of information.
In this study, a LightGBM (light gradient boosting machine) ensemble learning
approach [2] is used for the identification of intrusive behaviors in an IoT
framework. The ensemble method helps enhance the performance of a single method by
adding a variety of independent models. The work comprises two parts: the first
part focuses on feature construction by considering the predictions of base
classifiers to detect the anomalies, and the second part focuses on the
construction of meta-learners. The proposed technique gives better output in
comparison with the other ML approaches, with a minor difference in performance
parameters. The following are the major contributions of this study:
i. An optimized advanced ensemble learning model, LightGBM, has been used for
identification of malicious IoT activities in the IoT network.
ii. The performance of the proposed model is evaluated against state-of-the-art
ensemble learning and machine learning based models in order to assess its overall
generalized performance and efficiency.
The rest of the article is organized as follows: Sect. 2 formulates the problem and
presents the proposed model; experimental results and analysis are discussed in
Sect. 3; and Sect. 4 concludes and outlines the future scope of this research.
2 Proposed Work
This proposed study makes use of LightGBM for prediction of the anomaly type. Let $I = \{I_1, I_2, \ldots, I_n\}$ be the past IoT network activity traces with anomaly instances collected over time, and let $I_i = \{I_{i,1}, I_{i,2}, \ldots, I_{i,m}, a_i\}$ be the $i$th instance of past activity traces with anomaly instance $a_i$. The IoT device's connection traces and communication profile are considered as a vector of $m$ features, and the anomaly type $a_i$ can be any of the anomaly types $a_i \in a$. The attack type $a$ can be visualized as $a = \{a_1, a_2, a_3, a_4, a_5, a_6, a_7, a_8\}$, where $a_1$ = dP (dataProbing), $a_2$ = DoS (DoS attack), $a_3$ = mC (maliciousControl), $a_4$ = mO (maliciousOperation), $a_5$ = spying, $a_6$ = scan, $a_7$ = wSU (wrongSetUp), and $a_8$ = Normal. We have considered an IoT dataset from Kaggle [3] for the experimentation. This is synthetic data collected by simulation through a virtual setup called DS2OS (Distributed Smart Space Orchestration System). The dataset is a collection of the communication traces between a number of IoT nodes connected in the network. It has 357,952 samples with 13 features and eight classes. It includes a variety of anomalies such as dP, DoS, mC, mO, scan, spying, and wSU, with distribution percentages of 3.4%, 57.7%, 8.8%, 8%, 15.4%, 5.3%, and 1.2%, respectively. This is imbalanced data, and hence random naive oversampling [4, 5] is used in this work prior to training the model in order to get rid of the class imbalance problem [6]. This makes the sample distribution in the dataset uniform, roughly achieving a distribution of 14.28% for each class.
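Since the balancing step relies on random naive oversampling from the imbalanced-learn toolbox [6], the following minimal sketch illustrates it. The CSV file name and the label column name ("normality") are illustrative assumptions, not necessarily the authors' exact setup.

import pandas as pd
from imblearn.over_sampling import RandomOverSampler

df = pd.read_csv("ds2os_traffic_traces.csv")   # hypothetical file name
X = df.drop(columns=["normality"])             # 13 connection-trace features
y = df["normality"]                            # 8 classes: dP, DoS, mC, mO, ...

# Random naive oversampling duplicates minority-class samples until every
# class is as frequent as the majority class (about 14.28% each of 8 classes).
ros = RandomOverSampler(random_state=42)
X_res, y_res = ros.fit_resample(X, y)
print(pd.Series(y_res).value_counts(normalize=True))   # roughly uniform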
$$\hat{a}_i = \text{lightGBM}(I_i) \quad (1)$$

In Eq. (1), $\hat{a}_i$ represents the predicted anomaly type, $I_i$ is the $i$th instance of a past network trace without any label (i.e., activity type information), and $\text{lightGBM}(I_i)$ is the prediction on $I_i$. Min–max scaling and label encoding have been used for data pre-processing. In this experiment, poor prediction performance was observed for machine learning models such as DT (Decision Tree), LDA (Linear Discriminant Analysis), MLP (Multi-Layer Perceptron), NB (Naïve Bayes), and LR (Logistic Regression). So, an efficient boosting ensemble learning technique, LightGBM, has been used, which is lightweight in terms of computation and still efficient compared to other boosting approaches. The major steps of the proposed approach are: (i) using exclusive feature bundling (EFB) to combine features that are mutually exclusive; (ii) designing an initial model with minimum loss $\theta_0$; (iii) computing the absolute values of the gradients; (iv) using Gradient-based One-Side Sampling (GOSS) to generate re-sampled datasets $S_1$ and $S_2$, and merging them to create a dataset $D^* = S_1 + S_2$; (v) finding the optimal split by calculating the information gain; (vi) updating the model $\theta_m = \theta_{m-1} + \theta_m^*$; and (vii) obtaining the final model $\theta_M$ (Fig. 1).
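The following runnable sketch shows how steps (i)–(vii) map onto the LightGBM library, continuing from the oversampled X_res and y_res of the previous snippet. The hyperparameter values are illustrative assumptions, not the authors' settings; note that recent LightGBM releases expose GOSS through the data_sample_strategy parameter rather than boosting_type.

from sklearn.preprocessing import MinMaxScaler, OrdinalEncoder
from sklearn.model_selection import train_test_split
import lightgbm as lgb

# label-encode the categorical trace features, then min-max scale them,
# mirroring the pre-processing described in the text
X_enc = OrdinalEncoder().fit_transform(X_res)
X_scaled = MinMaxScaler().fit_transform(X_enc)
X_tr, X_te, y_tr, y_te = train_test_split(
    X_scaled, y_res, test_size=0.2, stratify=y_res, random_state=42)

clf = lgb.LGBMClassifier(
    objective="multiclass",
    boosting_type="goss",    # Gradient-based One-Side Sampling, step (iv)
    n_estimators=200,        # number of boosting rounds M
    learning_rate=0.1,
)
# Exclusive feature bundling, step (i), is applied internally by LightGBM;
# string class labels are label-encoded internally by the sklearn wrapper.
clf.fit(X_tr, y_tr)
print("test accuracy:", clf.score(X_te, y_te))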
3 Simulation Results
Table 1 presents the performance of DT, LDA, MLP, NB, LR, RF (Random Forest) [7], Bagging [8], AdaBoost [9], GBT [10], XGBoost [11], and the proposed LightGBM model.
4 Conclusion and Future Scope
The communication between IoT devices in the network should be secure to maintain the necessary protection. In line with this, a system that monitors and identifies suspicious activities in the network is desirable due to the increasing use of IoT devices. In this work, we have used LightGBM for identifying anomalies in the IoT environment. The proposed model makes use of past IoT devices' connection traces with various anomaly types to design a model which can identify anomalies in future connection traces. Due to the increasing use of IoT devices in many applications, security, privacy, and reliability have become challenging goals to achieve. Although no system can provide a complete security solution, this model may serve as an add-on to existing security solutions.
Fig. 2 Model’s prediction of anomalies a DT, b LDA, c MLP, d NB, e LR, f RF, g Bagging, h
AdaBoost, i Gboost, j XGBoost, k Proposed LightGBM
References
1. P.N. Mahalle, B. Anggorojati, N.R. Prasad, R. Prasad, Identity authentication and capability based access control (IACAC) for the Internet of Things. J. Cyber Secur. Mobil. 1, 309–348 (2013)
2. G. Ke, Q. Meng, T. Finley, T. Wang, W. Chen, W. Ma, Q. Ye, T.Y. Liu, LightGBM: a highly efficient gradient boosting decision tree. Adv. Neural Inf. Process. Syst. 30, 3146–3154 (2017)
3. M.-O. Pahl, F.-X. Aubet, DS2OS traffic traces (2018), https://www.kaggle.com/francoisxa/ds2ostraffictraces. Accessed 29 Dec 2018
4. N.V. Chawla, K.W. Bowyer, L.O. Hall, W.P. Kegelmeyer, SMOTE: synthetic minority over-
sampling technique. J. Artif. Intell. Res. 16, 321–357 (2002)
5. A. Liu, J. Ghosh, C.E. Martin. Generative oversampling for mining imbalanced datasets, in
DMIN (2007), pp. 66–72
6. G. Lemaître, F. Nogueira, C.K. Aridas, Imbalanced-learn: A python toolbox to tackle the curse
of imbalanced datasets in machine learning. J. Mach. Learn. Res. 18(1), 559–563 (2017)
7. S. Messaoud, A. Bradai, S.H.R. Bukhari, P.T.A. Qung, O.B. Ahmed, M. Atri, A survey on
machine learning in internet of things: Algorithms, strategies, and applications. Internet of
Things, 100314 (2020)
8. L. Breiman, Bagging predictors. Mach. Learn. 24(2), 123–140 (1996)
9. T. Hastie, S. Rosset, J. Zhu, H. Zou, Multi-class adaboost. Stat. Interface 2(3), 349–360 (2009)
10. J.H. Friedman, Greedy function approximation: A gradient boosting machine. Ann. Stat. 1189–
1232 (2001)
11. T. Chen, T. He, M. Benesty, V. Khotilovich, Y. Tang, H. Cho, Xgboost: extreme gradient
boosting. R Package Version 0.4–2 1(4) (2015)
Big Data in Education: Present
and Future
Abstract Every technological mechanism aims to make human existence simple and comfortable. Big data is used to extract important information from huge quantities of structured and unstructured data. Over the last few years, the usage of learning management systems has been rising rapidly. Students have begun using cell phones, mainly smartphones, as part of their daily routine to access online content. The large amount of idle information produced by students' online activities cannot be processed further by conventional means. This scenario has resulted in the diffusion of big data techniques into the education field. Big data has changed the learning style by making it simple, unproblematic, and exciting, which in turn drives the enhancement of big data techniques to process large quantities of data. This study looks into the challenges faced by many students and the latest applications of big data technologies in educational sectors.
J. Nayak (B)
Department of Computer Science, Maharaja Sriram Chandra BhanjaDeo (MSCB) University,
Baripada, Odisha 757003, India
H. Swapnarekha
Department of Information Technology, Aditya Institute of Technology and Management
(AITAM), Tekkali, K Kotturu, AP 532201, India
A. R. Routray
Department of ICT, Fakir Mohan University, Balasore, Odisha 756019, India
S. R. Nayak
Amity School of Engineering and Technology, Amity University Uttar Pradesh, Noida, India
H. S. Behera
Department of Information Technology, Veer Surendra Sai University of Technology, Burla,
Sambalpur, Odisha 768018, India
1 Introduction
Nowadays, the Internet is playing a vital role, and the number of the online commu-
nity is growing vastly. Billions of Internets users are producing a huge amount of data
and transforming that information to remaining users, respectively [1]. Here, data
will store the characters, symbols or quantities that are performed by a computer. The
term ‘big data’ refers to any kind of data [2] that is very huge, and traditional applica-
tions are not sufficient to process them. So, no other conventional data management
tools are capable to restore it efficiently. Examples of big data contain the quantity of
data collected on the Internet every day, twitter feeds, mobile phone location infor-
mation, stock exchanges, social media sites, jet engines and YouTube videos viewed,
etc. Big data is of three types: structured, unstructured and semi-structured. Struc-
tured data possesses predefined format and is easy to access. Unstructured data is of
natural type and is not formatted till their usage. The combination of both structured
and unstructured data is of the type semi-structured. Volume, variety, velocity and
variability are the main characteristics of big data [3]. Volume is related to the size
which plays a vital role in verifying the value out of data. Variety is associated with
various resources and behavior of data. Velocity is the speed of data, and variability is
the inconsistency that hampers the procedure of being capable to hold and control the
data efficiently. Better decision-making, enhanced customer service and improved
operational effectiveness are few advantages of big data.
We are in the era of data, where we can acquire large quantities of data [4]. Generally, data is produced from all sectors, such as aviation, social media and sports, as well as education. The major focus of this study is to investigate the role of big data in the educational field and to examine various domains of education for better solutions. The role of big data in this domain offers various advantages to students as well as educational institutions. The education sector has been important for both individuals and society. On the one hand, a wealthy economy requires experienced workers with the ability to initiate and run businesses [2]; on the other hand, people with career objectives will always be looking to stay at the cutting edge of education and skill acquisition. In current learning environments, users learn through various online communities such as online chats, discussion forums and instant messaging. To learn the required content, students have started browsing the Internet to access their courses. Due to these student activities, a huge amount of data has been produced by learning management systems. In parallel, many educational organizations have also created large data stores through applications that deal with classes, students and courses. As a result, the amount of available data is massive, and conventional data processing methods cannot handle it. Due to this, educational institutions have started exploring big data technologies to process educational data.
The quality of education can be enhanced by applying big data within various settings of the education system such as administration, student learning and the teaching delivery process. The application of big data in education allows educational institutions to understand the challenges encountered by students and to find strategies that address these issues. Figure 1 depicts some of the applications of big data in education.

Fig. 1 Applications of big data in education: prediction of student performance, prediction of student dropout, construction of curriculum, course recommender system, and analyzing learners' behaviour
2 Applications of Big Data in Education

Providing quality education and enhancing the behavior of students are considered fundamental objectives of the educational system. Therefore, predicting students' performance assists the teacher in identifying and supporting weak students, which in turn enhances the overall quality of education. Various prediction models have been proposed by several researchers for predicting student performance. Punlumjeak et al. [5] have proposed a feature selection and machine learning model for predicting student performance in a cloud environment. The proposed model is used for finding the problem areas. In addition, it
is also used to understand the factors that impact the performance of the students. Further, the proposed model has been evaluated on Rajamangala University of Technology students' data, and the empirical results indicate that feature selection with a neural network classifier obtained a better accuracy of 90.60% when compared with feature selection with a decision tree. To enhance the quality of educational institutions, a K-nearest neighbor (KNN) classification approach has been suggested by Nagesh et al. [6]. The suggested approach predicts the performance of students in the end-semester examination using a Hadoop MapReduce environment. Table 1 illustrates other research work carried out on the prediction of students' performance.
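As a concrete illustration of the KNN approach described above, the sketch below uses scikit-learn on a single machine; the Hadoop MapReduce distribution used by Nagesh et al. [6] is out of scope here, and the file and feature names are hypothetical.

import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

df = pd.read_csv("student_records.csv")            # hypothetical file
X = df[["attendance", "internal_marks", "assignment_score"]]
y = df["end_semester_result"]                      # e.g. pass/fail or grade band

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=0)
knn = KNeighborsClassifier(n_neighbors=5)          # k = 5 is a tunable assumption
knn.fit(X_tr, y_tr)
print("accuracy:", knn.score(X_te, y_te))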
Generally, students are provided with a wide range of courses they can select for formal or informal classroom learning at the secondary and higher education levels. As a beginner, it is always difficult for a student to choose the correct course. Normally, students make a selection based on the recommendations of seniors, the teacher's expertise in a subject, or even the difficulty or attractiveness of the course. Certainly, such a selection process cannot provide an overall evaluation of the candidate's suitability for the course. Therefore, an intelligent course recommender system is needed to assign relevant courses to students. A recommender system based on collaborative filtering has been proposed by Dwivedi and Roshni [20]. Based on the grade points attained by the candidate in other subjects, the proposed system suggests elective courses for the candidate. In addition, the item-based recommendation of the Mahout machine learning library has been utilized by the authors on top of the Hadoop framework to produce the set of recommendations. The patterns between grades and subjects have been identified using log-likelihood similarity, and the performance of the recommendations has been evaluated using the root mean square error between actual and recommended grades. Table 3 presents the analysis of other works carried out on course recommender systems.
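The following toy sketch conveys the item-based, grade-driven collaborative filtering idea. Dwivedi and Roshni used Mahout's log-likelihood similarity on Hadoop; plain cosine similarity on a small dense matrix is substituted here purely for illustration.

import numpy as np

# rows = students, columns = elective courses; 0 = course not taken
grades = np.array([[9, 8, 0, 7],
                   [8, 0, 6, 7],
                   [0, 9, 8, 0]], dtype=float)

def cosine_sim(a, b):
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return a @ b / denom if denom else 0.0

n_courses = grades.shape[1]
sim = np.array([[cosine_sim(grades[:, i], grades[:, j])
                 for j in range(n_courses)] for i in range(n_courses)])

def predict(student, course):
    """Predict a grade as a similarity-weighted mean of the student's grades."""
    taken = grades[student] > 0
    w = sim[course, taken]
    return (w @ grades[student, taken]) / w.sum() if w.sum() else 0.0

print(predict(0, 2))   # predicted grade of student 0 in elective course 2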
Due to the advancement of online learning, a large volume of student activity data is available. To identify students at risk, it is necessary to analyze large volumes of student activity data to identify the patterns of students' learning behavior. In addition, analyzing student behavior is essential for creating a student-centered learning system. Understanding student behavior further assists both the teacher and the student in attaining educational goals. A systematic review of the prediction of students at risk was performed by Na and Tasir [26]. The authors analyzed students' learning behavior in online learning in conjunction with the analytical methods and types of data used. From the findings, it was observed that almost all the analytical methods and various types of data successfully predicted the students at risk. Other works on the analysis of learners' behavior are shown in Table 4.
The main purpose of integrating big data in education is to assist policymakers and academicians in automatically designing courses and learning content. Further, it also encourages the exchange of current learning resources among distinct systems. Table 5 presents the works carried out on the construction of curriculum using big data technology.
3 Critical Analysis
In this article, a systematic analysis of the applications of big data in education has been carried out. From the analysis, it is noticed that various techniques such as clustering, classification and regression have been successfully utilized for the interpretation of big data in the educational system. This section presents a precise analysis of the applications of big data in the educational system, along with the percentage of articles published on distinct applications of big data in the education domain using various open-source tools and techniques.
Due to the latest advancements in the education system, a large amount of educational data is available in the current learning system, and various big data techniques have been applied to explore it.
Table 5 Other works performed on the construction of curriculum using big data technology

Author and year | Framework/Technology | Application | Result | References
Hu et al. (2020) | Virtual reality technology | Construction of curriculum for science, technology, engineering, art and mathematics (STEAM) in primary and middle schools | Provides learners more diversified and pragmatic sensuous learning materials | [32]
Li et al. (2020) | MOOCs + SPOC | Construction of innovative curriculum for computer fundamentals | Enhances teaching effectiveness and develops the habit of solving real-world problems by students | [33]
Liu (2020) | – | Design and evaluation of humanistic curriculum system for nursing students | Allows health professionals to implement humanistic values in their services | [34]
Zhang et al. (2018) | Hadoop | Construction of mathematics education curriculum | Enhances the teaching reforms of mathematics and the efficacy of higher education | [35]
Jensen (2017) | CoNVO framework | Incorporating big data tools into undergraduate MIS curriculum | Undergraduates can be provided with a single course on essential prospects of data science | [36]
Fig. 3 Percentage of articles published on distinct applications of big data in education: course recommender system 34.03%, prediction of student performance 25.48%, prediction of student dropout 2.19%, and the remaining two applications (analyzing learners' behaviour and construction of curriculum) 19.47% and 18.82%
Figure 3 shows the distribution of published articles across the distinct applications of big data in the education system, such as prediction of student performance and recommender systems for course selection. It is noticed from the figure that 34.03% of the articles have been published on intelligent course recommender systems at different levels of education. Next, a majority of articles have been published on the prediction of student performance to improve the quality of education (25.48%). Among the distinct applications, the least work has been published on the prediction of student dropout using big data, i.e., 2.19%.
To provide more customized services and enhanced efficiency, the analysis of big data needs to be performed using various open-source tools. Figure 4 depicts some of the open-source tools used in mining educational data.
The percentage of articles published on mining educational data using various open-source tools is shown in Fig. 5. From the figure, it is identified that 58.49% of the work has been published using the Orange tool, which is a Python-based tool for processing big data. Next, 18.47% of the work has been carried out using the Weka tool, which is a Java-based tool for processing big data. After Weka, 9.7% of the work has been carried out using the Hadoop framework, which allows distributed processing of datasets over clusters of computer networks. Next, 8% of the work has been done using MapReduce, which allows parallel processing over clusters of computer networks. It is also noticed from the figure that the least work (i.e., 5.25%) has been carried out using MongoDB, which uses JSON-like documents rather than a table-based architecture.
Fig. 4 Open-source tools used in mining educational data: Orange, Weka, Hadoop, MapReduce, and MongoDB
Fig. 5 Distribution of articles in educational data mining using various open-source tools: Orange 58.50%, Weka 18.47%, Hadoop 9.7%, MapReduce 8%, MongoDB 5.25%
Fig. 6 Techniques for processing big data in education: clustering groups records of similar type by finding the distance between them in an n-dimensional space; nearest neighbor predicts values using the predicted values of the records nearest to the given record

Fig. 7 Percentage of articles published on big data in education using various techniques
From the figure, it is observed that the most widely used technique accounts for 58.79% of the published articles. Next, the regression technique has been used in addressing the processing challenges of big data in education, and the least work has been published using the nearest neighbor technique.
4 Advantages of Big Data in Education

Big data in education offers extraordinary prospects for instructors to reach and train students in innovative ways. The various advantages of big data in the educational field include: (i) improvement of student results, (ii) customized programs, (iii) reduced dropouts, (iv) targeted international recruitment, (v) analytics for educators, (vi) new learning plans and enhanced learning experiences, (vii) career prediction and (viii) credible grading, as shown in Fig. 8. Such data will permit organizations to adjust their recruitment policies and assign funds accordingly. Big data paves the way for a progressive structure where students will be trained with engaging techniques. The impacts of these applications of big data in education are explained briefly in the following sections.
With the application of big data in the education field, the whole educational system, along with students and parents, reaps the benefits of this technology. A student's educational performance can be determined from their acquired results. Every individual student creates a unique data track, and examining this data track in real time helps produce the best learning environment and understand individual performance.
Customized programs for individual students can be generated by using big data. Blended learning, an integration of both online and offline learning, provides the possibility of creating customized programs. These can be generated for every student, giving them the opportunity to pursue the classes they are interested in and to work at their own pace.
Taking advantage of big data, a survey [37] has been conducted in which a huge online database was utilized to forecast the success and failure of students. The study discovered that measures of a pupil's performance that declined over time were major predictors of the probability of dropping out. Female students with earlier transfer credits or college education, as well as older students, have a greater risk of dropping out. Nowadays, due to the improvement of student results through big data, the rate of dropouts at institutes is decreasing day by day.
Here, educational sectors can more precisely forecast applicants and examine the feasible factors that influence the application process. Such information will let institutions alter their recruitment policies and assign funds accordingly. This influx of information helps students learn about other schools around the world and also speeds up the search process for international pupils.
Through the processing of data-driven systems, educators gather the utmost benefits of big data analytics. Several institutions generate various learning experiences in accordance with students' preferences as well as their learning abilities. Various programs have been promoted to support individuals in selecting courses according to their wishes.
The educator can manage every individual student gradually and commence a significantly more interesting and deep discussion on the subject of preference. This gives each student the possibility to pick up a better comprehension of the subjects. It can also improve the course plans and digital reading material used by the students. This progressive incorporation not only develops educational principles but also builds a space for a better society.
The student’s performance feedback will help to know and understand their improve-
ments, strengths and weaknesses. These feedback reports suggest the area in which
students are paying attention and also will aid to know in which area that particular
student is searching for a profession. If any student is paying more attention to a
particular subject, then the decision of that student must be valued and the student
has to be supported to pursue what they want to follow.
Big data offers educators a quicker and more suitable way to score and rate student tests, essays and quizzes. With big data, all educational sectors have a consistent and credible system for marking large numbers of papers as well as releasing the results sooner, making the procedure simple for everyone. Big data has brought major changes to several aspects of education.
5 Challenges
Certainly, big data and associated technologies like deep learning and cloud computing are key inputs to a successful education system. Regardless of the promise of learning analytics, there has been doubt and uncertainty because of challenges that ought to be addressed for attaining the preferred learning outcomes [38]. Among all the opportunities and advantages, big data also brings a group of challenges in the educational sectors, even though it has resolved several problems that educators struggled with. Low Internet-use competencies, faulty systems, etc., are among the negative impacts of big data on education. The major challenges include a limited talent pool, scalability and storage issues, data errors and data safety concerns.
The demand for trained data specialists is high because more and more areas are warming up to big data. Due to the lack of data science courses in most colleges, only a few people have the required skills to ensure a flawless implementation of big data in the educational field. As a result, major educational institutes have been unable to utilize the technology and proper resources.
In a few cases, the speed at which information is gathered and scrutinized exceeds the processing abilities of available big data machines. Slowdowns and crashes are events that degrade the quality of the analysis and its outcomes. Due to this, developers must come forward with extra storage systems as well as scalable processing that sustains both current and upcoming needs.
As big data deals with huge amounts of data, there is a chance of losing some data while placing the various datasets of the total student population into numerous categories. This problem is quite common in cloud storage systems, which are expensive to adopt yet still require entirely new data.
Data safety is one of the biggest concerns regarding big data in the educational sector. It is very costly to efficiently secure active or constantly updating data. Proper guidelines are required to guarantee data rights, and privacy issues should be taken care of so that the information can be protected from misuse.
6 Future Goals
Compared to the early data warehousing technique, big data has emerged as a reliable and challenging model [6]. In particular, it has been an efficient tool for the e-learning industry. However, in spite of these ample provisions, researchers, educators and learners are still deriving little visibility from it. Current educational analytics is believed to serve as a sense-making factor in piloting uncertain change by contributing understandable assessment data and investigation, exhibited through user-controlled visualizations. In many educational applications, big data will be quite helpful for the modern design and planning of educational institutions as well as a boost for computer-assisted learning systems. Big data can be a standalone system, serving as a replacement for statistical models and as a safeguard against overfitting data. There is a huge opportunity for research on the modern teaching–learning process, and big data is one of the important models for that.
7 Conclusion
As the data involved in education becomes larger, the role of big data techniques becomes more essential in learning environments. So, attention has been paid to big data in educational sectors. Big data can decrease dropouts, can provide personalized learning environments to users and can extend long-term study plans. All of this is feasible through the successful utilization and improvement of big data analytics in the educational sectors. Despite a few drawbacks, this technique is suitable and effective for application in many areas. As a result, the power of big data and its usage in education remain hot topics of research. Though it is in its infancy, it still has the potential to be a game-changer in the near future. With further development, big data can be efficiently put to use and yield even more benefits for both students and educators.
References
1. K. Sin, L. Muthu, Application of big data in education data mining and learning analytics—A
literature review. ICTACT J. Soft Comput. 5(4) (2015)
2. Wikipedia, “Big data—Wikipedia, The Free Encyclopedia”, https://en.wikipedia.org/w/index.
php?title=Big_data&oldid=669888993. Accessed (2015)
3. https://www.colocationamerica.com/blog/big-data-and-education. Accessed on 15 Apr (2021)
4. S. Ray, Big data in education. Gravity, Great Lakes Mag. 8–10 (2013)
5. W. Punlumjeak, N. Rachburee, J. Arunrerk, Big data analytics: Student performance prediction
using feature selection and machine learning on microsoft azure platform. J. Telecommun.
Electron. Comput. Eng. (JTEC) 9(1–4), 113–117 (2017)
6. A.S. Nagesh, C.V.S. Satyamurty, K. Akhila, Predicting student performance using KNN
classification in bigdata environment. CVR J. Sci. Technol. 13, 83–87 (2017)
7. A. Almasri, R.S. Alkhawaldeh, E. Çelebi, Clustering-based EMT model for predicting student
performance. Arab. J. Sci. Eng. 45, 10067–10078 (2020)
8. N. Varela et al., Student performance assessment using clustering techniques, in International
Conference on Data Mining and Big Data (Springer, Singapore, 2019). https://doi.org/10.1007/
978-981-32-9563-6_19
9. A. Hamoud, A.S. Hashim, W.A. Awadh, Predicting student performance in higher education
institutions using decision tree analysis. Int. J. Interact. Multimedia Artif. Intell. 5, 26–31 (2018)
10. R. Hasan et al., Student academic performance prediction by using decision tree algorithm, in
2018 4th International Conference on Computer and Information Sciences (ICCOINS). (IEEE,
2018). https://doi.org/10.1109/ICCOINS.2018.8510600
11. I. Singh, A.S. Sabitha, A. Bansal., Student performance analysis using clustering algorithm,
in 2016 6th International Conference-Cloud System and Big Data Engineering (Confluence)
(IEEE, 2016). https://doi.org/10.1109/CONFLUENCE.2016.7508131
12. https://nces.ed.gov/fastfacts/display.asp?id=16. Accessed on 15 July 2021
13. J.S. Catterall, On the social costs of dropping out of school. High Sch. J. 71(1), 19–30 (1987)
14. S. Lee, J.Y. Chung, The machine learning-based dropout early warning system for improving
the performance of dropout prediction. Appl. Sci. 9(15), 3093 (2019)
15. W. Tenpipat, K. Akkarajitsakul, Student dropout prediction: A KMUTT case study, in 2020 1st
International Conference on Big Data Analytics and Practices (IBDAP) (IEEE, 2020). https://
doi.org/10.1109/IBDAP50342.2020.9245457
16. N. Wu et al., CLMS-Net: dropout prediction in MOOCs with deep learning, in Proceedings of
the ACM Turing Celebration Conference-China (2019). https://doi.org/10.1145/3321408.332
2848
17. V. Hegde, P.P. Prageeth, Higher education student dropout prediction and analysis through
educational data mining, in 2018 2nd International Conference on Inventive Systems and
Control (ICISC) (IEEE, 2018). https://doi.org/10.1109/ICISC.2018.8398887
18. L. Haiyang et al., A time series classification method for behaviour-based dropout prediction,
in 2018 IEEE 18th International Conference on Advanced Learning Technologies (ICALT)
(IEEE, 2018). https://doi.org/10.1109/ICALT.2018.00052
19. C. Márquez-Vera et al., Early dropout prediction using data mining: a case study with high
school students. Expert Syst. 33(1), 107–124 (2016)
20. S. Dwivedi, V.S.K. Roshni, Recommender system for big data in education, in 2017 5th National
Conference on E-Learning and E-Learning Technologies (ELELTECH) (IEEE, 2017). https://
doi.org/10.1109/ELELTECH.2017.8074993
21. B. Mondal et al., A course recommendation system based on grades, in 2020 International
Conference on Computer Science, Engineering and Applications (ICCSEA) (IEEE, 2020).
https://doi.org/10.1109/ICCSEA49143.2020.9132845
22. B. Ma, Y. Taniguchi, S. Konomi, Design a course recommendation system based on association
rule for hybrid learning environments. Inf. Process. Soc. Jpn. 7 (2019)
23. S. Alghamdi, N. Alzhrani, H. Algethami, Fuzzy-based recommendation system for university
major selection, in IJCCI (2019). https://doi.org/10.5220/0008071803170324
Big Data in Education: Present and Future 739
24. H. Zhang et al., MCRS: A course recommendation system for MOOCs. Multimedia Tools
Appl. 77(6), 7051–7069 (2018)
25. J. Xiao et al., A personalized recommendation system with combinational algorithm for online
learning. J. Ambient Intell. Humanized Comput. 9(3), 667–677 (2018)
26. K.S. Na, Z. Tasir, Identifying at-risk students in online learning by analysing learning behaviour:
A systematic review, in 2017 IEEE Conference on Big Data and Analytics (ICBDA) (IEEE,
2017). https://doi.org/10.1109/ICBDAA.2017.8284117
27. T. Purwoningsih, H.B. Santoso, Z.A. Hasibuan, Online Learners’ behaviors detection using
exploratory data analysis and machine learning approach, in 2019 Fourth International Confer-
ence on Informatics and Computing (ICIC) (IEEE, 2019). https://doi.org/10.1109/ICIC47613.
2019.8985918
28. M. Al Fanah, M.A. Ansari, Understanding E-learners’ behaviour using data mining techniques,
in Proceedings of the 2019 International Conference on Big Data and Education (2019). https://
doi.org/10.1145/3322134.3322145
29. M. Pérez-Sanagustín et al., Analyzing learners’ behavior beyond the MOOC: An exploratory
study, in European Conference on Technology Enhanced Learning (Springer, Cham, 2019).
https://doi.org/10.1007/978-3-030-29736-7_4
30. N. Yan, O.T.-S. Au, Online learning behavior analysis based on machine learning. Asian Assoc.
Open Univ. J. (2019). https://doi.org/10.1108/AAOUJ-08-2019-0029
31. K.A. Douglas et al., Big data characterization of learner behaviour in a highly technical MOOC
engineering course. J. Learn. Anal. 3(3), 170–192 (2016)
32. X. Hu et al., Construction and application of VR/AR-based STEAM curriculum system in
primary and middle schools under big data background. J. Phys. Conf. Ser. 1624(3) (2020)
IOP Publishing, 2020
33. M. Li et al., The innovative curriculum construction of computer fundamentals course based
on SPOC+ MOOC in higher education, in 2020 15th International Conference on Computer
Science and Education (ICCSE) (IEEE, 2020). https://doi.org/10.1109/ICCSE49874.2020.920
1872
34. X. Liu, Research on the construction of humanities curriculum and evaluation system for
nursing students according to job needs based on big data technology analysis. J. Phys. Conf.
Ser. 1648(3) (2020) IOP Publishing
35. L. Zhang, X. Yang, Y. Zhang, The research on cloud platform construction of mathematics
education curriculum under big data background [C] (2018). https://doi.org/10.25236/iwass.
2018.056
36. S. Jensen, Integrating big data services into an undergraduate mis curriculum. Int. J. Syst. Serv.
Oriented Eng. (IJSSOE) 7(2), 58–73 (2017)
37. D. Niemi, E. Gitin, Using big data to predict student dropouts: Technology affordances for
research, in Proceedings from the International Association for Development of the Information
Society (IADIS) International Conference on Cognition and Exploratory Learning in Digital
Age (2012)
38. S.O. Fadiya, S. Saydam, E.J. Chukwuemeka, Big data in education; Future technology
integration. Int. J. Sci. Technol. 2(8), 65 (2014)
Breast Cancer Mammography
Identification with Deep Convolutional
Neural Network
Abstract Breast cancer is considered one of the most common invasive diseases in women and a major cause of cancer death. One of the most effective diagnostic methods used for the diagnosis of breast cancer is mammography. Over the last decade, breast cancer researchers have suggested the use of intelligence-based techniques by medical experts and radiologists. In particular, the application of deep learning, such as the convolutional neural network (CNN), is providing effective performance in classifying mammograms accurately, which can assist imaging specialists. To obtain an accurate classification of mammograms, the CNN model should be trained with a large number of labeled mammograms, but such labeled mammograms are not always available. The main purpose of this experiment is to perform a highly accurate classification of mammograms through a CNN with dense layers. In our study, we developed two classification models: a CNN with a single dense layer and a CNN with two dense layers. Here, the dense layers act as a backbone for the CNN model for the accurate classification of mammograms. This work is intended to improve the performance of the CNN with more dense layers, using preprocessed mammograms having multiple views. The final experiments with our two proposed models show that the first CNN model, with a single dense layer, obtains 100% accuracy for breast cancer detection with an execution time of 38.64 s. The second CNN model, with two dense layers, achieves the same result with an execution time of 42.64 s.
1 Introduction
The last two decades have witnessed drastic growth in the number of breast cancer cases throughout the world. Most studies suggest that early detection and efficient diagnosis can reduce the number of cases. In modern times, treatment has become advanced due to the latest procedures such as advanced screening and electronic imaging systems. Abnormal growth of tumor cells in the breast, which may be malignant or benign, leads to breast cancer. For better treatment of breast cancer, early detection is of great importance. Mammography is the most common and cost-effective method for the early detection of breast cancer. Due to the poor clarity and two-dimensional views of mammograms, analyzing them manually is a complicated task. To obtain a better analysis and make mammograms more significant and useful, many researchers are working on different computational intelligence algorithms. Good-quality, unambiguous mammographic images help medical experts easily identify whether a tumor is malignant or benign. When the tumor is benign, the expert recommends further investigations, and histopathological analysis can reveal the type of the tumor. Moreover, conventional mammography fails to confirm about 30% of breast cancer cases [1]. Because younger women tend to have dense breasts, there is a chance of a drop in the detection rate of breast cancer. For such complex imaging analysis, machine learning and deep learning have recently recorded good milestones, achieving effective results and performance in the healthcare domain [2–7] over traditional techniques.
Computer-aided diagnosis (CAD) can be a great support to doctors and researchers for the early detection of breast cancer. The success of CAD depends entirely on how well the mammogram images are enhanced and how many relevant features can be extracted from them [8]. Akselrod-Ballin et al. [9] used a combined approach of machine learning and deep learning for early breast cancer detection applied to digital mammographic images. They considered an XGBoost classifier for the selection of the top breast cancer symptomatic features and a deep neural network as the computational model. They obtained an area under the receiver operating characteristic curve (ROC-AUC) of 0.91 with a specificity of 77.3% and a sensitivity of 87%. Shen et al. [8] considered four convolutional network models (ResNet-ResNet, ResNet-VGG, VGG-VGG, and VGG-ResNet) for the detection of breast cancer on screening mammograms. They ensembled these four models to obtain a high classification accuracy on heterogeneous mammographic platforms. They obtained an AUC of 0.95 for the individual models; however, four-model averaging improved the AUC to 0.98 (sensitivity: 86.7%, specificity: 96.1%).
Among all the deep learning models, the convolutional neural network is the most effective for image analysis. For a medical expert, it is an essential task to observe the whole mammogram for tumor-like lesion screening and density classification. Charan et al. [10] employed a CNN model to discriminate between normal and abnormal mammograms. The model achieved 65% accuracy; segmenting the breast area using morphological processes greatly improved its classification performance. The classification of benign and malignant mammograms was undertaken by Samala et al. [11] using two CNN models, namely AlexNet-C1 and GoogLeNet-C12. They found during the experiment that fixing the parameters of some CNN layers helps to increase the classification performance and makes the model robust. They obtained an AUC of 1 at over 1000 epochs.
Arora et al. [12] considered five CNN architectures (AlexNet, VGG16, ResNet, GoogLeNet, and InceptionResNet) and combined them into an ensemble model for mammographic classification, achieving an accuracy of 0.88 with an AUC of 0.88. Chougrad et al. [13] developed a deep convolutional neural network with a fine-tuning strategy for the analysis of mammograms. They took three different models (VGG16, ResNet50, Inception v3) for the experiment, and the proposed model obtained the best result of 98.23% accuracy and 0.99 AUC. By using image patches, the efficiency of feature extraction can be improved to a certain extent, but this does not always help radiologists find the abnormalities in a large volume of mammograms.
In our proposed study, we analyze a breast cancer detection technique based on deep learning. This paper investigates the importance of the number of dense layers in a CNN instead of choosing the number arbitrarily. Here, we explore the impact of fine-tuned dense layers and experimentally evaluate the adopted CNN architecture along with the execution time. The paper is organized as follows: Sect. 2 outlines the suggested CNN classification model together with the full workflow and design structure of the CNN; Sect. 3 briefly describes the data preparation and experimental setup; Sect. 4 details the experimental results and analysis; and Sect. 5 concludes.
2 Methodology
In the trending world of artificial intelligence, the CNN is the most powerful neural network model for image recognition, image processing, and image classification. Among the trending areas where CNNs are widely used are object detection and face recognition. A CNN is a modern artificial neural network specifically made to process pixel data. In image classification, the CNN model takes images as input, processes them, and classifies them into the appropriate classes. This type of model treats the input image as an array of pixels whose size depends on the image resolution.
Most CNN models are built with a series of convolution layers consisting of filters (kernels), pooling, fully connected (FC) layers, and activation functions according to the requirement. The CNN architecture is shown in Fig. 1.

The preprocessed images are used for training the CNN model after resizing and normalization. A common input image size is 224 × 224 × 3. The second layer of the proposed CNN model is the convolution layer, which is responsible for selecting potential features from the images; it is the most powerful layer of the CNN. The convolutional layer contains a set of filters of size 3 × 3. In the convolution process, each filter moves over the image and dot products are computed on the pixel values, linking neighboring pixels so that the layer learns which features belong in each region of the input. The convolved features obtained in this way are the output of the convolution process. On the obtained convolved features, a pooling operation is performed for dimensionality reduction. Although many pooling techniques are used by researchers, such as max-pooling, average pooling, and global pooling, we have used max-pooling [Eqs. (1) and (2)] in this proposed work.
$$f_{\max} = \max_{1 \le t \le N} x_t \quad (1)$$

$$f_{\text{avg}} = \frac{1}{N} \sum_{t=1}^{N} x_t \quad (2)$$
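A tiny numerical illustration of Eqs. (1) and (2) on a single pooling window, with NumPy standing in for the framework's pooling layers:

import numpy as np

window = np.array([1.0, 3.0, 2.0, 0.5])   # x_t for t = 1..N with N = 4
f_max = window.max()                      # Eq. (1): max-pooling -> 3.0
f_avg = window.mean()                     # Eq. (2): average pooling -> 1.625
print(f_max, f_avg)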
The ReLU function [Eq. (3)], $\text{ReLU}(x) = \max(0, x)$, is used for both the convolutional layer and the pooling layer. Finally, the features extracted by the pooling process are used to perform the classification task. Here, a fully connected layer is used, in which the output layer nodes are directly connected to the previous layer's nodes. We have used the softmax activation function [Eq. (4)], $\text{softmax}(z_i) = e^{z_i} / \sum_j e^{z_j}$, in the output layer to compute the class label. The selection of appropriate activation functions is one of the crucial parts of the model, as it controls the learning process of the model's network. Many alternative activation functions, such as sigmoid, parametric ReLU, and tangent, are used by researchers as per their requirements. In this paper, two architectures of the CNN model with the Adam optimization algorithm have been used for the detection of the type of breast cancer on the breast cancer dataset. In our first architecture, we have taken a CNN with a one-hidden-layer fully connected network, and the second architecture is a CNN with two hidden layers. These two architectures are shown in Figs. 2 and 3, respectively. The complete workflow of this proposed work is shown in Fig. 4.
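A minimal Keras sketch of the two proposed architectures is given below. The filter counts and dense-layer widths are assumptions, since the chapter does not list them here; only the overall structure (3 × 3 convolutions, max-pooling, one vs. two dense layers, softmax output, Adam optimizer) follows the text.

from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(num_dense_layers=1, num_classes=10):
    model = keras.Sequential([
        layers.Input(shape=(224, 224, 3)),            # resized mammograms
        layers.Conv2D(32, (3, 3), activation="relu"), # 3x3 filters, ReLU (Eq. 3)
        layers.MaxPooling2D((2, 2)),                  # max-pooling (Eq. 1)
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
    ])
    for _ in range(num_dense_layers):                 # one vs. two dense layers
        model.add(layers.Dense(128, activation="relu"))
    # ten benign tumor subclasses, softmax output (Eq. 4)
    model.add(layers.Dense(num_classes, activation="softmax"))
    model.compile(optimizer="adam",                   # Adam, as in the paper
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

cnn_one_dense = build_cnn(num_dense_layers=1)   # first proposed architecture
cnn_two_dense = build_cnn(num_dense_layers=2)   # second proposed architecture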
3 Data Preparation and Experimental Setup

The experiments have been carried out on a Dell laptop system with Windows 8.1 Pro
64-bit OS, Processor Intel Core (TM) i5-6700 CPU @3.40 GHz (8 CPUs) ~3.4 GHz,
and 4 GB RAM. The proposed models have been developed and tested in a Python
TensorFlow environment with Spyder IDE. The programming environment includes
OpenCV for image processing which converts the raw images to machine-readable
format in the form of a Numpy array. Other important packages such as Tensor-
Flow, Keras, Numpy, Matplotlib, OS, Time, Random, and Pillow are used in this
experiment.
For our experiment, we have considered the benign breast tumor dataset collected from the IEEE DataPort [14], which is a collection of information on 83 patients from India. This dataset includes information such as the patient's clinical history, histopathological features, and mammograms. Here, the task is to classify the patients into ten subclasses of benign tumors from their clinical history, histopathological features, and mammograms.
4 Experimental Results and Analysis

The proposed CNN model has been tested on the breast cancer dataset, and the results and observations are presented in this section. The performance of the proposed models has been improved by increasing the number of dense layers in the fully connected part of the CNN. We have considered two proposed architectures for the comparative study; in particular, the considered models help analyze the difference in execution time with various dense layers.
In this experiment, several performance measures are considered, such as precision (Eq. 5), recall (Eq. 6), F1-score (Eq. 7), and ROC-AUC, to estimate the efficiency of our proposed models. Equation (5) defines precision as the ratio of accurately detected positive cases to the total number of predicted positive cases. As illustrated in Eq. (6), recall is the ratio between the total number of true positives and the total number of true positives plus false negatives. Precision and recall are combined to define the F1-score in Eq. (7).
$$\text{Precision} = \frac{TP}{TP + FP} \quad (5)$$

$$\text{Recall} = \frac{TP}{TP + FN} \quad (6)$$

$$\text{F1-score} = \frac{2 \times TP}{2 \times TP + FP + FN} \quad (7)$$
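Equations (5)–(7) can be computed directly with scikit-learn, as in the sketch below; the label vectors are placeholders, and macro-averaging across the ten tumor subclasses is an assumption, since the chapter does not state its averaging scheme.

from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 1, 2, 2, 1, 0]   # placeholder ground-truth class labels
y_pred = [0, 1, 2, 1, 1, 0]   # placeholder model predictions

# macro-averaging treats all classes equally (an assumption here)
print("precision:", precision_score(y_true, y_pred, average="macro"))
print("recall   :", recall_score(y_true, y_pred, average="macro"))
print("F1-score :", f1_score(y_true, y_pred, average="macro"))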
Fig. 5 Accuracy vs. number of epochs for the CNN with a single dense layer

Fig. 6 Accuracy vs. number of epochs for the CNN with two dense layers

Figures 5 and 6 show the accuracy curves of the two proposed models. The training accuracy is low in epoch 1 and gradually increases toward epoch 30. Finally, it reaches around
95% accuracy for both of the proposed models. In the case of testing, the accuracy curve is quite different. During the testing phase, the test accuracy reaches 100% from epoch 20 onwards without fluctuation, with an execution time of 38.64 s, for the proposed CNN with a single dense layer. For the proposed CNN with two dense layers, 100% accuracy is reached from epoch 7 onwards without fluctuation, with an execution time of 42.64 s.
Figures 7 and 8 show the loss curves for both proposed CNN models. It has been observed that the training loss gradually decreases to 0.25 after 17 epochs for the CNN with a single dense layer. In the case of the CNN with two dense layers, the training loss drops to 0.19 after 10 epochs. The test loss reaches 0.05 at epoch 10 and 0.07 at epoch 18 for the CNN with two dense layers and the CNN with a single dense layer, respectively.
Moreover, we have conducted ROC analysis on the classification models for both the CNN with a single dense layer and the CNN with two dense layers, presented in Figs. 9 and 10, respectively. The CNN with two dense layers is more significant and robust in classifying the breast cancer types.
5 Conclusion
Deep learning-based methods are considered the most significant approaches for solving medical domain problems, outperforming traditional machine learning algorithms. Such models have enormous potential to improve the detection rate of breast cancer in screening mammography. Our approach may help the future development of a superior system such as CAD, which will help radiologists identify the most suspicious cases with high priority. In this chapter, using CNNs, we have developed two models for breast cancer mammography classification. The developed methods achieve an accuracy of 100% in classifying the mammogram images with an execution time of 38.64 s using the CNN with a single dense layer, and 100% accuracy with an execution time of 42.64 s for the CNN with two dense layers. This clearly shows that the dense layer plays a vital role in the CNN architecture for better classification. In the future, we plan to develop a more efficient intelligence-based model to work toward identifying wrong indications in mammography images, early symptoms of dense-breast-related issues, and dangerous invasive lobular carcinoma cases.
References
1. A. Bhale, M. Joshi, Automatic sub classification of benign breast tumor, in Smart Trends in
Systems, Security and Sustainability (Springer, Singapore, 2018), pp. 221–232. https://doi.org/
10.1007/978-981-10-6916-1_20
2. V. Gulshan, L. Peng, M. Coram et al., Development and validation of a deep learning algorithm
for detection of diabetic retinopathy in retinal fundus photographs. JAMA 316(22), 2402–2410
(2016). https://doi.org/10.1001/jama.2016.17216
3. A. Esteva, B. Kuprel, R.A. Novoa et al., Dermatologist-level classification of skin cancer with
deep neural networks. Nature 542(7639), 115–118 (2017) [Published correction appears in
Nature; 546(7660): 686 (2017)]. https://doi.org/10.1038/nature21056
4. B.E. Bejnordi, M. Veta, J.P. van Diest et al. Diagnostic assessment of deep learning algorithms
for detection of lymph node metastases in women with breast cancer. JAMA 318(22), 2199–
2210 (2017). https://doi.org/10.1001/jama.2017.14585
5. P. Rajpurkar, J. Irvin, K. Zhu et al., CheXNet: Radiologist-level pneumonia detection on
chest X-rays with deep learning. arXiv:1711.05225 [cs, stat]. http://arxiv.org/abs/1711.05225.
Published November 2017. Accessed 10 Sept 2018. https://doi.org/10.1371/journal.pmed.100
2686
6. G. Litjens, T. Kooi, B.E. Bejnordi et al., A survey on deep learning in medical image analysis.
Med. Image Anal. 42, 60–88 (2017). https://doi.org/10.1016/j.media.2017.07.005
7. C.D. Lehman, A. Yala, T. Schuster et al., Mammographic breast density assessment using deep
learning: Clinical implementation. Radiology 290(1), 52–58 (2019). https://doi.org/10.1148/
radiol.2018180694
8. L. Shen et al., Deep learning to imp.breast cancer detection on screening mammography. Sci.
Rep. 9(1), 1–12 (2019). https://doi.org/10.1038/s41598-019-48995-4
9. A. Akselrod-Ballin et al., Predicting breast cancer by applying deep learning to linked health
records and mammograms. Radiology 292(2), 331–342 (2019). https://doi.org/10.1148/radiol.
2019182622
10. S. Charan, M.J. Khan, K. Khurshid, Breast cancer detection in mammograms using convo-
lutional neural network, in Processing of the 2018 International Conference on Computing,
Mathematics and Engineering Technologies (iCoMET) (Sukkur, Pakistan, 2018), 3–4 Mar
2018. https://doi.org/10.1109/ICOMET.2018.8346384
11. R.K. Samala, H. Chan, L.M. Hadjiiski, M.A. Helvie, C.D. Richter, Generalization error analysis
for deep convolutional neural network with transfer learning in breast cancer diagnosis. Phys.
Med. Biol. 65(10), 1–13 (2020). https://doi.org/10.1088/1361-6560/ab82e8 (PMID: 32208369)
12. R. Arora, P.K. Rai, B. Raman, Deep feature-based automatic classification of mammo-
grams. Med. Biol. Eng. Comput. (2020). https://doi.org/10.1007/s11517-020-02150-8
(PMID:32200453)
13. H. Chougrad, H. Zouaki, O. Alheyane, Deep convolutional neural networks for breast cancer
screening. Comput. Methods Programs Biomed. 157, 19–30 2018. https://doi.org/10.1016/j.
cmpb.2018.01.011 PMID: 29477427
14. https://ieee-dataport.org/open-access/benign-breast-tumor-dataset. Accessed on 24 Apr 2020
CatBoosting Approach for Anomaly
Detection in IoT-Based Smart Home
Environment
Abstract With the advances in technology, IoT devices have become an integral part of our daily lives due to their rapid expansion and deployment. As IoT devices communicate continuously, concerns arise about privacy and security due to vulnerabilities which attackers can exploit. The raw observations of sensor nodes influence decision-making in IoT networks, so an established method is required to monitor and analyze the data from the sensor nodes. Therefore, data-collecting nodes in IoT systems must resist attacks through anomaly detection implemented on those sensor nodes. This paper proposes a CatBoosting approach to perform intelligent and adaptive anomaly detection for smart home environment devices. The proposed approach assists in supervising the data with normal and abnormal activities, with enhanced resource management. The DS2OS dataset, containing several attacks on an IoT environment, is considered to evaluate the effectiveness of the proposed anomaly-based detection system. Moreover, various researchers' existing approaches on the DS2OS dataset are studied and described briefly alongside the proposed method. Finally, the performance of the simulation model is evaluated through different metrics.
1 Introduction
The IoT integrates computing devices and machines into a network of heterogeneous and resource-constrained nodes connected wirelessly to communicate over the Internet in real time. The IoT is a smart network architecture used to exchange information via agreed protocols without human intervention. IoT uses unique addressing protocols to interact and cooperate with things (objects) to create new application services. The stay-connected characteristic of IoT allows users to connect anytime and anywhere through coordinated, integrated, monitored, and controlled computing and communication systems [1]. The number of IoT devices is growing swiftly and will reach about twenty-one billion connected devices. Due to the large scale of the objects and their heterogeneous character, IoT devices exchange massive amounts of information, which increases the probability of being affected by malicious attackers [2]. As IoT is allied with a wide range of protocols, applications, and platforms, such as health monitoring, smart house, smart city, smart grid, smart environment, and so on, it offers a comfortable place for spiteful users to launch attacks without any hindrance. The perception, transportation, and application layers are the three levels that make up an IoT security framework [3]. Perception nodes (e.g., sensors) are used to collect data in the perception layer; the critical security problems in this layer are secure communication between nodes and lightweight authentication. The transportation layer has three sub-layers: (i) access network, (ii) core network, and (iii) local area network. It provides ubiquitous access to the perception layer by using wireless networks, ad hoc networks, and other methods. As a result, various attacks, such as information leaks, network disablement, DoS attacks, and so on, become common security challenges in this layer. To address these issues, an attack detection and prevention method could be implemented before the entire layer suffers a significant loss. Moreover, IoT devices do not have malware or virus protection software due to their low-memory and low-power nature [4, 5]. Malicious activity makes IoT devices profoundly susceptible to becoming bots and carrying attacks out to other network devices. The root causes of current IoT threats and significant breaches in IoT security and privacy are their ubiquitous, mobile, unattended, constrained, interdependent, myriad, diverse, and intimate character [6, 7].
A rise in potential security threats could result from the increased distribution of
such smart devices. Consequently, a dependable, smart, and secure system that detects
cyber-attacks and recovers automatically is required. One of the main goals of IoT
technologies is to monitor the environment and detect anomalies, changes, or drifts
in it [8]. Our goal is to identify new events that previous models do not describe,
so as to maximize IoT adoption. Anomaly detection is used in smart home
applications to spot abnormal activity in high-dimensional data. This research offers
a boosting-based learning method to perform anomaly-based detection for smart
home IoT devices in an aberrant state. The following are the key contributions
of this article:
(1) We demonstrate and implement a CatBoosting ensemble learning-based
anomaly detection technique on categorical IoT traffic traces datasets using
multiclass rather than binary classification.
(2) We show that ensemble learning models outperform traditional machine
learning models in attack and anomaly detection systems in IoT contexts.
(3) We compare the performance of our method to that of other methods on the
DS2OS dataset.
(4) On the categorical dataset, the proposed method achieves optimal anomaly
detection performance with low false positives.
2 Related Works
Anomaly detection in IoT environments is one of the most critical issues that require
immediate attention in the IoT domain. Several researchers have been striving to
secure the IoT environment using various attack detection algorithms. This section
reviews the performance of various researchers' approaches on the DS2OS dataset,
summarizes their studies, and gives some brief information on attack detection in
IoT networks.
Hasan et al. [9] performed anomaly detection for IoT infrastructure by comparing
the performance of several ML models: an artificial neural network (ANN), support
vector machine (SVM), logistic regression (LR), decision tree (DT), and random
forest (RF). Though these ML strategies are almost equally accurate, other measurements
show that RF performs best. Vangipuram et al. [10] proposed a novel imputation
approach for missing values in the data. The performance of traditional ML
classifiers is investigated on imputed datasets generated by the proposed imputation,
F-K-means, and K-means methods. The ML classifiers performed well in categorizing
the attacks through the proposed imputation process: the accuracy for the malicious
control attack class is 99% when using the suggested imputation and classification
technique, and for the attack classes wrong setup, data type probing, scan, DoS,
spying, and malicious operation the accuracy reaches 100%. Dash et al. [11] proposed
an IoT-based security framework using an adaptive boosting algorithm with the
synthetic minority oversampling technique (SMOTE). The proposed work first handles
the imbalanced nature of the data through SMOTE, and then adaptive boosting is
used for multiclass classification in anomaly detection. The proposed framework
derives 100% accuracy in identifying normal and abnormal activity. Cheng et al. [12]
propose a hierarchical stacking temporal convolutional network with semisupervised
learning. The semisupervised technique is based on both unsupervised and supervised
learning, where the unlabeled data is trained using a small portion of labeled data.
The experimental results reveal that the proposed method improves anomaly detection
performance, with an accuracy of 98.22% and improved efficiency. Latif et al. [13]
proposed a lightweight random neural network-based (RaNN) prediction model to
forecast the DS2OS attacks. According to the evaluation results, the RaNN model,
an advanced scheme of ANN, obtains an accuracy of 99.20% and compares favorably
with traditional ML classifiers (ANN, SVM, and DT). Reddy et al. [14] examined
deep-learning (DL) neural networks for anomaly detection systems that classify the
system's behavior into truthful and untruthful actions. The DL models show strong
performance relative to machine learning algorithms. The studied DS2OS dataset is
pre-processed by removing the NaN instances, and the remaining noisy data is
encoded into numerical form. The proposed approach derives an accuracy of 98.28%
in categorizing anomalies. Sahu and Mukherjee [15] carried out a comparative study
of the LR and ANN classification algorithms. The classification algorithms are
trained on the complete dataset and tested after removing feature values. The
proposed approach derives an accuracy of 99.99%
in classifying the normal and abnormal behavior for anomaly detection. Islam et al.
[16] compared shallow and deep models for different IoT threats, where the shallow
models comprise DT, RF, and SVM, and the DL models comprise deep belief
networks, deep neural networks (DNN), long short-term memory (LSTM), stacked
LSTM, and bidirectional LSTM. Among the shallow models, SVM produces an
accuracy of 99.44%, and among the DL models, bidirectional LSTM produces an
accuracy of 99.39%. Kumar et al. [17] proposed a novel intrusion detection system
with a distributed ensemble design using fog computing. The framework combines
two levels: the first level consists of individual learners (k-NN, Gaussian naive
Bayes, and XGBoost), and RF uses the first-level prediction results for the final
classification. For most of the attacks, the proposed work shows a detection rate of
99.99%. Singh and Singh [18] proposed a hyperparameter-tuned gradient boosting
(GB) algorithm. Initially, a feature selection procedure is used to reduce the dataset's
dimensions, which improves the attack and anomaly detection environment; the GB
algorithm is then applied with hyperparameter tuning to achieve the best results.
The model outperforms the competition in identifying attacks on IoT sensor
environments, with an accuracy of 99.40%. Bokka and Sadasivam [19] proposed a
DL-based DNN to detect attacks in the IoT-based smart home environment; the
proposed model derives an accuracy of 99.42%. For classifying and addressing
various attacks and abnormal operations on the network, Reddy et al. [20] proposed
a supervised meta-algorithm-based approach known as bagging. To improve efficiency
on the DS2OS dataset, the proposed work applies bagging to several ML models:
k-NN, DT, RF, and the extra trees classifier. The proposed work reports an overall
accuracy of 99.9% for the simulation model.
3 Proposed Methodology
This section describes the working of the Boosting and CatBoosting approaches.
3.1 Boosting
In boosting, weak learners are trained sequentially. At each iteration, the sample
weights of observations that the previous learners classified poorly are increased,
so that new learners focus on the hard cases. A minimal sketch of this reweighting
idea follows.
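The Python fragment below is an illustrative sketch only: it uses the binary
AdaBoost-style weight update to show the reweighting idea, whereas CatBoost itself
performs gradient boosting on decision trees with ordered boosting rather than this
exact rule.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def boost(X, y, n_rounds=50):
    """Sequentially train weak learners, reweighting hard observations."""
    y = np.asarray(y)                        # ensure elementwise comparison below
    n = len(y)
    w = np.full(n, 1.0 / n)                  # start from uniform sample weights
    learners, alphas = [], []
    for _ in range(n_rounds):
        stump = DecisionTreeClassifier(max_depth=1)
        stump.fit(X, y, sample_weight=w)
        miss = stump.predict(X) != y          # observations classified poorly
        err = np.clip(w[miss].sum(), 1e-10, 1 - 1e-10)
        alpha = 0.5 * np.log((1 - err) / err) # vote weight of this learner
        w = w * np.exp(alpha * miss)          # up-weight the misclassified points
        w = w / w.sum()                       # renormalize for the next round
        learners.append(stump)
        alphas.append(alpha)
    return learners, alphas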
4 Dataset Description
The IoT DS2OS dataset is publicly available from the Kaggle website [22]. The
dataset differs from conventional network datasets because the traces are captured
at the application layer of the DS2OS IoT environment. The environment is
architected from different simulated IoT sites with lightning control sensors,
movement sensors, temperature control sensors, battery sensors, door lock sensors,
heating control sensors, washing machine sensors, and questing service sensors
spread across different locations. The dataset consists of 13 features and 357,952
instances of categorical data (numerical and nominal), plus a timestamp feature.
The timestamp is discrete, so this feature is excluded. The paper [14] provides an
in-depth description of the dataset, its features, and their datatypes.
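For reference, a minimal loading and pre-processing sketch is given below. The CSV
filename and the "normality" label column follow the public Kaggle release of
DS2OS; the authors' exact cleaning steps may differ.

import pandas as pd

# Load the DS2OS application-layer traces (filename as in the Kaggle release).
df = pd.read_csv("mainSimulationAccessTraces.csv")

# Drop the discrete timestamp feature, as described above.
df = df.drop(columns=["timestamp"])

# The remaining features are nominal; fill missing entries with a sentinel
# string so CatBoost can treat them as a category of their own.
X = df.drop(columns=["normality"]).fillna("missing").astype(str)
y = df["normality"]          # the normal class plus seven attack classes
print(X.shape)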
Table 1 Percentage distribution of anomalies

Attacks               Total instances   Aggregated data (%)   Anomalous data (%)
Wrong setup           122               0.03                  1.21
Data probing          342               0.09                  3.41
Spying                532               0.14                  5.31
Malicious operation   805               0.22                  8.03
Malicious control     889               0.24                  8.87
Scan                  1547              0.43                  15.44
DoS                   5780              1.61                  57.70
Normal                347,935           97.24                 –
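The two percentage columns of Table 1 can be reproduced directly from the label
counts: "Aggregated data" divides by all 357,952 instances, while "Anomalous data"
divides by the 10,017 anomalous instances only. The sketch below assumes the Kaggle
filename used earlier and that the normal class is labeled "normal"; the exact label
spellings in the release may differ.

import pandas as pd

df = pd.read_csv("mainSimulationAccessTraces.csv")
counts = df["normality"].value_counts()        # instances per class
anom = counts[counts.index != "normal"]        # label spelling is an assumption

print((100 * counts / counts.sum()).round(2))  # 'Aggregated data (%)' column
print((100 * anom / anom.sum()).round(2))      # 'Anomalous data (%)' column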
5 Result Analysis
It is critical to test the procedure to ensure that the visualization and analysis of
the dataset are adequate. The result analysis of the simulation work on the DS2OS
dataset reveals the previously undetected anomaly cases. This section presents the
result analysis of the proposed CatBoosting algorithm. The dataset poses a
multi-classification problem with 357,952 instances: 80% of the data (i.e., 286,361
instances) is used for training, and 20% (i.e., 71,591 instances) is used for testing.
The evaluation measures taken into consideration are the confusion matrix, accuracy,
and FPR, among other metrics. A sketch of this evaluation protocol is given below.
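The following Python fragment sketches this protocol; it is not the authors' exact
configuration. The split seed, stratification, and CatBoost hyperparameters
(iterations, depth) are illustrative assumptions, and X, y are taken from the
pre-processing sketch in Sect. 4. The per-class FPR is derived from the multiclass
confusion matrix.

import numpy as np
from catboost import CatBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score, confusion_matrix

# 80/20 split of the 357,952 instances.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.20, stratify=y, random_state=42)

# Every remaining feature is categorical, so all columns are cat_features.
model = CatBoostClassifier(iterations=200, depth=6, verbose=False)
model.fit(X_train, y_train, cat_features=list(X_train.columns))

y_pred = model.predict(X_test).ravel()
cm = confusion_matrix(y_test, y_pred)
print("accuracy:", accuracy_score(y_test, y_pred))

# Per-class FPR from the confusion matrix: FP_i is the column sum minus the
# diagonal entry; TN_i is everything outside row i and column i.
tp = np.diag(cm)
fp = cm.sum(axis=0) - tp
fn = cm.sum(axis=1) - tp
tn = cm.sum() - (tp + fp + fn)
print("per-class FPR:", fp / (fp + tn))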
6 Conclusion
Attacks and malicious activity pose a more severe threat to privacy and security than
ever before due to the steep increase in IoT devices and applications. The promising
capabilities of IoT make people's lives more convenient, but privacy and security
issues remain a key concern. Several existing anomaly detection methods and models
on DS2OS are considered to analyze and improve the detection performance.
References
1. D.K.K. Reddy, H.S. Behera, J. Nayak, B. Naik, U. Ghosh, P.K. Sharma, Exact greedy algorithm
based split finding approach for intrusion detection in fog-enabled IoT environment. J. Inf.
Secur. Appl. 60, 102866 (2021)
2. Z.-K. Zhang, M.C.Y. Cho, C.-W. Wang, C.-W. Hsu, C.-K. Chen, S. Shieh, IoT security: Ongoing
challenges and research opportunities, in 2014 IEEE 7th International Conference on Service-
Oriented Computing and Applications (2014), pp. 230–234
3. Q. Jing, A.V. Vasilakos, J. Wan, J. Lu, D. Qiu, Security of the Internet of Things: perspectives
and challenges. Wirel. Netw. 20(8), 2481–2501 (2014)
4. J. Nayak, P.S. Kumar, D.K.K. Reddy, B. Naik, D. Pelusi, Machine learning and big data in
cyber-physical system: Methods, applications and challenges, in Cognitive Engineering for
Next Generation Computing (Wiley, 2021), pp. 49–91
5. A.P. Johnson, H. Al-Aqrabi, R. Hill, Bio-inspired approaches to safety and security in IoT-
enabled cyber-physical systems. Sensors 20(3), 844 (2020). [Online]. Available: https://www.mdpi.
com/1424-8220/20/3/844
6. K.B. Prakash, J. Nayak, B.T.P. Madhav, S. Padmanaban, V.E. Balas, Big data analytics and
intelligent techniques for smart cities (CRC Press, Boca Raton, 2021)
7. W. Zhou, Y. Jia, A. Peng, Y. Zhang, P. Liu, The effect of IoT new features on security and
privacy: New threats, existing solutions, and challenges yet to be solved. IEEE Internet Things
J. 6(2), 1606–1616 (2019)
8. U. Ghosh, M. Alazab, A.K. Bashir, A.-S.K. Pathan, Deep Learning for Internet of Things
Infrastructure (CRC Press, Boca Raton, 2021)
9. M. Hasan, M.M. Islam, M.I.I. Zarif, M.M.A. Hashem, Attack and anomaly detection in IoT
sensors in IoT sites using machine learning approaches. Internet of Things 7, 100059 (2019)
10. R. Vangipuram, R.K. Gunupudi, V.K. Puligadda, J. Vinjamuri, A machine learning approach
for imputation and anomaly detection in IoT environment. Expert Syst. 37(5),
647–661 (2020)
11. P.B. Dash, J. Nayak, B. Naik, E. Oram, S.H. Islam, Model based IoT security framework using
multiclass adaptive boosting with SMOTE. Secur. Priv. 3(5), 1–15 (2020)
12. Y. Cheng, Y. Xu, H. Zhong, Y. Liu, Leveraging semisupervised hierarchical stacking temporal
convolutional network for anomaly detection in IoT communication. IEEE Internet Things J.
8(1), 144–155 (2021)
13. S. Latif, Z. Zou, Z. Idrees, J. Ahmad, A novel attack detection scheme for the industrial Internet
of Things using a lightweight random neural network. IEEE Access 8, 89337–89350 (2020)
14. D.K. Reddy, H.S. Behera, J. Nayak, P. Vijayakumar, B. Naik, P.K. Singh, Deep neural network
based anomaly detection in Internet of Things network traffic tracking for the applications of
future smart cities. Trans. Emerg. Telecommun. Technol. 32(7), 1–26 (2021)
15. N.K. Sahu, I. Mukherjee, Machine learning based anomaly detection for IoT network
(anomaly detection in IoT network), in 2020 4th International Conference on Trends in
Electronics and Informatics (ICOEI) (2020), pp. 787–794
16. N. Islam et al., Towards machine learning based intrusion detection in IoT networks. Comput.
Mater. Contin. 69(2), 1801–1821 (2021)
17. P. Kumar, G.P. Gupta, R. Tripathi, A distributed ensemble design based intrusion detection
system using fog computing to protect the Internet of Things networks. J. Ambient Intell.
Humaniz. Comput. (2020)
18. K. Singh, N. Singh, An ensemble hyper-tuned model for IoT sensors attacks and anomaly
detection. J. Inf. Optim. Sci. 41(7), 1715–1739 (2020)
19. R. Bokka, T. Sadasivam, Deep learning model for detection of attacks in the Internet of Things
based smart home environment. Expert Syst. 37(5), 725–735 (2021)
20. D.K.K. Reddy, H.S. Behera, G.M.S. Pratyusha, R. Karri, Ensemble bagging approach for IoT
sensor based anomaly detection, in Information, vol. 11(5) (Springer, Singapore, 2021), pp.
647–665
21. L. Prokhorenkova, G. Gusev, A. Vorobev, A.V. Dorogush, A. Gulin, CatBoost: unbiased
boosting with categorical features. Adv. Neural Inf. Process. Syst. 31,
6638–6648 (2018)
22. R. Pinto, M2M using OPC UA (2020). [Online]. Available: https://ieee-dataport.org/open-
access/m2m-using-opc-ua. [Accessed: 18 Sep 2020]