Outlier Detection in Sensor Data Using Ensemble Learning

Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 176 (2020) 1160–1169
www.elsevier.com/locate/procedia
Abstract

Analyzing sensor data from a production environment is quite challenging because of the high-dimensional nature of the data. In addition, the generated data is in the form of time-series, where the sequence of registrations may be of utmost significance. One of the main goals of the paper is to determine whether a given time-series of feature combinations is normal or rare. This goal can be achieved by combining multiple machine learning models. In this paper, a sliding window based ensemble method is proposed to detect outliers in a streaming fashion. The proposed method uses a combination of clustering algorithms to construct subgroups (clusters) representing different data structures. These structures are later used in a one-class classification algorithm to identify the outliers. Thus, if a pattern does not belong to any of the common structures or clusters, it is an outlier. Further, based on the rare pattern classification, machine failures can be predicted in advance.
© 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of KES International.
Keywords: Outlier detection; ensemble learning; clustering; classification; sensor data; Industry 4.0; machine learning
1. Introduction
One of the most common challenges faced in a production environment is how to deal with patterns failing to conform to common structures. These extraordinary and rarely occurring patterns are called outliers. Hence, it is of great importance to detect outliers and handle them one way or another to achieve optimal production performance. Thus, to predict failures and avoid unplanned machine stoppages, it is important to accurately predict outliers by applying appropriate detection model(s). Outlier detection techniques normally belong to one of the following learning approaches: supervised, unsupervised, semi-supervised and ensemble. Supervised learning requires labeled training data, while unsupervised learning does not need training data. Semi-supervised learning combines a small set of labeled data with a large set of unlabeled training data [1]. In comparison, ensemble learning uses a combination of multiple learning algorithms to obtain better prediction accuracy; for instance, clustering and classification as used in this paper. In fact, ensemble learning has not been applied to outlier detection to a large extent [2]; for this reason it is considered in this paper.
To summarize, the main contributions of this paper are as follows:
• Presenting an outlier detection method based on sliding window and ensemble learning for time-series
• Considering several learning models and choosing the best combination
• Testing the accuracy and the validity of the proposed detection method
The paper is structured as follows. Section 2 explains the motivation of this paper. Section 3 describes the proposed
ensemble method. Section 4 evaluates the performance of the method. Section 5 presents the related work. Section 6
concludes the paper and points out the future research directions.
2. Motivation
This study is performed in a real-world setting at a medium-sized manufacturing company, Dolle [3], in Northern
Denmark. One of the main goals of this case study “is to find patterns in sensor data that can help to predict and
ultimately prevent machine’s failures”. Fig. 1 provides a snapshot of a subset of sensor readings (time-series) from a single machine in the production facility. The snapshot contains readings of five sensors, numbered 1 to 5. The
timestamp granularity is one second. The sensor readings are binary encoded into 1 or 0 for event and non-event,
respectively.
In addition, the sensor readings in Fig. 1 are read as follows: 1 (entry of raw material): the value 1 = entry, 0 = no
entry; 2 (exit of final product): the value 1 = exit, 0 = no exit; 3 (raw material quality sensing): the value 1 = not good,
0 = good; 4 (machine error): the value 1 = error, 0 = no error; and 5 (machine alarm) : the value 1 = alarm, 0 = no
alarm. Dolle’s case study clearly illustrates the challenges of analyzing data from industrial sensors.
3. The Proposed Ensemble Method

3.1. Overview
This section presents the proposed ensemble method, based on unsupervised machine learning algorithms, for predicting outliers in a time-series. The ensemble method detects the outliers by applying clustering prior to classification. It should also be noted that sensor data from a production environment presents itself as a steady stream, thus the performed analysis must be dynamic or real-time. In brief, the proposed ensemble method comprises the following two steps, which are presented in Fig. 2 and further discussed below.
Fig. (2) The proposed ensemble method: Step 1, creating structures; Step 2, applying the model.
3.2. Structures
Since a single sensor reading does not reveal any categorical information, it does not make much sense to construct clusters of single sensor readings. Thus, in order to predict future incidents, a sequence of sensor readings should be grouped, where one structure or pattern is produced for each group. This grouping does not aggregate the readings into a single output reading; instead, the readings retain their separate identities. In fact, in the case of time-series it is not a good idea to just look at arrays succeeding one another: if the starting point were different, the structures would probably end up in totally different clusters, or individual readings might fall into separate windows. In order to obtain all possible combinations of arrays, the concept of a sliding window is used in this paper. Each sliding window (see Fig. 3) has a size of n observations. Each window can be considered a snapshot in time, representing a combination of sensor readings over a given period of time. Hence, the purpose of this process is to group/cluster a specific window together with other similar snapshots in time.
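As a concrete illustration, the sketch below constructs overlapping windows from a matrix of binary sensor readings. It is a minimal sketch, not the paper's implementation: the array contents, the window size n = 5 and the function name are illustrative assumptions.

import numpy as np

def sliding_windows(readings, n):
    """Yield every window of n consecutive sensor readings.

    readings: 2-D array of shape (timestamps, sensors), one row per second.
    The window slides by one second, so all possible combinations of n
    consecutive rows are produced and the readings keep their identities.
    """
    for start in range(len(readings) - n + 1):
        yield readings[start:start + n]

# Illustrative binary readings for 5 sensors over 8 seconds (values invented).
readings = np.random.randint(0, 2, size=(8, 5))
windows = list(sliding_windows(readings, n=5))
print(len(windows), windows[0].shape)  # 4 windows, each of shape (5, 5)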
3.3. Clustering
The clustering technique introduced in this paper is inspired by classical clustering algorithms such as K-Means, Mean-Shift, K-Modes and DBSCAN [4], combined with biclustering algorithms such as Coclust Mod/SpecMod, Spectral Biclustering/Co-clustering and Delta-biclustering [5]. Classical clustering identifies global patterns by clustering either the rows or the columns of a data matrix, whereas biclustering or co-clustering identifies local patterns by simultaneously clustering the objects (rows) and attributes (columns) of a data matrix. Biclustering assists in detecting abnormal patterns that are in the earliest stages of a machine fault. Biclustering also facilitates reducing the dimensionality of high-dimensional data by picking up only small sets of objects and attributes based on the local patterns. Further, the objects and attributes may participate in multiple clusters or in no cluster at all. Overlapping clusters are not considered in this case study, for the reason that overlapping may cause uncertainty and may well require fuzzy logic to solve the problem.
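To make the global/local distinction concrete, the following sketch clusters the rows of a data matrix with K-Means (global patterns) and simultaneously clusters rows and columns with scikit-learn's SpectralBiclustering (local patterns). The matrix is randomly generated and the parameter values are illustrative; the import path assumes a recent scikit-learn release.

import numpy as np
from sklearn.cluster import KMeans, SpectralBiclustering

# Illustrative data matrix: rows are windowed observations, columns are sensors.
X = np.random.rand(100, 5)

# Classical clustering assigns each row to one global cluster.
row_labels = KMeans(n_clusters=2, random_state=0).fit_predict(X)

# Biclustering clusters rows and columns together, exposing local patterns
# that are confined to a subset of sensors (columns).
bic = SpectralBiclustering(n_clusters=2, random_state=0).fit(X)
print(row_labels[:10], bic.row_labels_[:10], bic.column_labels_)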
In the first place, classical clustering and biclustering algorithms are applied in parallel to the sets of preceding sensor readings or structures in order to identify clusters containing global and local patterns, respectively (see Fig. 4). Next, a new cluster label, either Cluster1 (group of common structures) or Cluster0 (group of rare structures), is assigned to each cluster by using the cluster size (number of structures in each cluster) [6] and a voting technique. Cluster sizes are used as the key factor for identifying groups of observations that are distinct from the majority of the data. Indeed, outliers are rare events by definition [7]; as a result, larger clusters are merged into Cluster1 and smaller clusters into Cluster0. As each structure belongs to multiple clusters, due to the use of different clustering algorithms, the final cluster label assigned to each structure (either Cluster1 or Cluster0) can be determined by iterative [8] or majority [9] voting techniques. Moreover, Algorithm 1 specifies the sequence of computational steps.
Algorithm 1
1. Slide across the sensor readings with a sliding interval of one second and construct data structures using a time-
based window of size n.
(a) Assign each structure to the corresponding clusters using classical and biclustering algorithms (each algo-
rithm works individually and in parallel on the whole set of structures).
(b) Merge clusters of similar structures using cluster analysis and/or reallocate clusters using a voting technique (a sketch of majority voting is given below).
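A minimal sketch of the majority voting in step 1(b), assuming that each clustering algorithm has already mapped every structure to a label of 1 (Cluster1, common) or 0 (Cluster0, rare); the function name and the tie-breaking rule (ties resolved toward rare, to stay cautious on imbalanced data) are assumptions rather than the paper's exact procedure.

import numpy as np

def majority_vote(label_matrix):
    """Combine per-algorithm cluster labels by majority voting.

    label_matrix: shape (n_structures, n_algorithms), holding 1 for
    Cluster1 (common) or 0 for Cluster0 (rare). A structure keeps the
    common label only when a strict majority of algorithms agrees.
    """
    votes = label_matrix.sum(axis=1)
    return (votes > label_matrix.shape[1] / 2).astype(int)

# Three algorithms voting on four structures (labels invented).
labels = np.array([[1, 1, 0],
                   [0, 0, 1],
                   [1, 1, 1],
                   [1, 0, 0]])
print(majority_vote(labels))  # [1 0 1 0]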
Generally, one-class classification involves training a model on preceding normal/common structures. Different models, such as One-Class Support Vector Machines (OC-SVM), Isolation Forest (IForest), Local Outlier Factor (LOF) and a few more [10], could be used as a one-class classifier. One-class algorithms can be effective for imbalanced datasets [11] (as in this case study), where most of the data belongs to a single class. Later on, the classifier predicts the outliers when new sensor events are recorded. Since the one-class classifier is fitted on the cluster(s) with common/normal structures, any novel rare pattern that does not belong to one of the common structures is labeled as an outlier (see Fig. 5). In addition, Algorithm 2 specifies the sequence of computational steps.
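As an illustration of this step, the sketch below fits scikit-learn's OneClassSVM on flattened windows drawn from the common cluster in batch mode and then labels a newly arriving window; the window shape, the flattening and the hyperparameter values are illustrative assumptions, not the tuned configuration used in the evaluation.

import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative training set: windows from Cluster1, flattened to vectors
# (each 5x5 binary window becomes a 25-dimensional sample).
normal_windows = np.random.randint(0, 2, size=(200, 5, 5))
X_train = normal_windows.reshape(len(normal_windows), -1)

clf = OneClassSVM(kernel="rbf", nu=0.05).fit(X_train)

# Streaming phase: each incoming window is scored in near real-time;
# predict() returns +1 for inliers (normal) and -1 for outliers (rare).
new_window = np.random.randint(0, 2, size=(5, 5)).reshape(1, -1)
print("outlier" if clf.predict(new_window)[0] == -1 else "normal")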
Fig. (5) One-class classifier
Algorithm 2
The following example will help in understanding the proposed ensemble method. Two clustering algorithms are used for this example: a classical clustering algorithm, K-Modes [12], and a Delta-biclustering algorithm [13]. K-Modes is an extension of K-Means and is commonly used for clustering categorical variables (as in this case study). Further, K-Modes clustering uses Hamming distance, or dissimilarity, rather than Euclidean distance, and modes instead of means. Conversely, Delta-biclustering uses a greedy iterative approach that involves deleting rows and columns from the main data matrix. Specifically, it searches for sub-matrices with suitable row and column correlation until a Mean Squared Residue (MSR) score lower than a specific threshold, or delta value, is achieved. To start with, a data matrix/structure (DM1) with 5×5 dimensions is considered. The data matrix contains readings of five sensors, numbered 1 to 5, with a granularity of one second, representing a 5 second time period.

Subsequently, the matrix is compared with one of the centroids using the K-Modes algorithm, giving a total dissimilarity score of “6” calculated based on the global pattern. The total dissimilarity score is not significant, which is why the data matrix is assigned to cluster C1. Similarly, Delta-biclustering is applied to the main data matrix (without overlapping and with a fixed number of sub-clusters). The following sub-matrix, B1, could be materialized, where the sub-matrix reveals a local pattern. Hence, two clusters (C1 and B1) are identified for DM1.
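For the running example, the K-Modes dissimilarity is simply the Hamming distance: the number of cells where the window and the cluster centroid disagree. The matrices below are invented so that the score works out to the “6” quoted above; they are not the paper's actual DM1 or centroid.

import numpy as np

def hamming_dissimilarity(a, b):
    """Count the mismatching cells between two equally sized binary matrices."""
    return int(np.sum(a != b))

# Illustrative 5x5 data matrix DM1 and K-Modes centroid (values invented).
dm1 = np.array([[1, 0, 0, 0, 0],
                [0, 0, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 0, 0, 1, 1]])
centroid = np.array([[1, 0, 0, 0, 0],
                     [0, 1, 0, 0, 0],
                     [1, 0, 0, 0, 0],
                     [0, 1, 0, 0, 0],
                     [0, 0, 0, 0, 0]])
# A low score means the window matches the centroid's global pattern.
print(hamming_dissimilarity(dm1, centroid))  # 6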
Further, cluster analysis is performed once both clustering algorithms finish their execution for a specific time interval and clusters have been identified for all the structures in the training dataset. Historical data and domain knowledge indicate that outliers are rare events manifesting as low frequency or density, which infers that small clusters can be an indication of outliers [14]. Hence, based on the cluster analysis, the most similar clusters are merged into one cluster, either Cluster1 or Cluster0, where Cluster1 indicates that normal structures belong to that cluster and Cluster0 indicates that rare structures belong to that cluster. In order to determine the final clustering result for DM1, the following formula is used: cluster.valuecount() > (dataset size * (outlier percentage / 100)). Based on the proposed formula, C1 has a higher value count and B1 has a lower value count than the threshold value (dataset size * (outlier percentage / 100)), respectively. In addition, a truth table with a logic AND gate assigns Cluster0 to DM1. The reason to use a truth table rather than a majority/iterative voting technique is that in this running example there are no overlapping clusters and the dataset is imbalanced; hence, more caution is required to identify abnormal or rare behaviours. DM1 is allocated to the Cluster0 group for the reason that biclustering uncovers a fault in the machine.
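The decision rule can be sketched as follows, under the running example's assumptions: a cluster counts as common when its member count exceeds dataset size * (outlier percentage / 100), and the final label for DM1 is the logical AND of the per-algorithm decisions. The dataset size and outlier percentage below are invented for illustration.

def is_common(cluster_size, dataset_size, outlier_percentage):
    """True when the cluster is larger than the expected share of outliers."""
    return cluster_size > dataset_size * (outlier_percentage / 100)

# Invented figures: 10,000 structures with an assumed 5% outlier share.
dataset_size, outlier_pct = 10_000, 5
c1 = is_common(8_200, dataset_size, outlier_pct)  # K-Modes cluster C1
b1 = is_common(300, dataset_size, outlier_pct)    # Delta-bicluster B1

# AND truth table: DM1 is Cluster1 only if *both* algorithms call it common,
# so a single rare verdict is enough to flag a potential machine fault.
print("Cluster1" if (c1 and b1) else "Cluster0")  # Cluster0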
Lastly, the one-class classifier is trained on the normal cluster (Cluster1), and the patterns in the streaming sensor data are matched in near real-time against known structures in order to determine whether the sensor data stream is normal or rare.
4. Evaluation
In this section, the proposed ensemble method is evaluated in terms of validation of the clustering results and pre-
diction accuracy of one-class classification. In general, clustering validation falls into two categories:
external validation and internal validation. The external validation requires comparing the results of a cluster analysis
to externally available results, for example availability of class labels, whereas, internal validation uses the internal in-
formation of the clustering process to evaluate the goodness of the clustering structure without reference to external in-
formation. The real-world dataset used in this paper is not labeled. Hence, internal evaluation measures are used to validate the quality of the diverse clustering algorithms. In order to measure the goodness of the clustering structures there are well-known measures available, like the Silhouette coefficient (sklearn.metrics.silhouette_score), the Calinski-Harabasz index (sklearn.metrics.calinski_harabasz_score), the Davies-Bouldin index (sklearn.metrics.davies_bouldin_score) and so
on [15]. Similarly, to evaluate how good the one-class classifier is, test accuracy is commonly used as a measure [16].
Test accuracy involves predicting completely unseen patterns.
4.1. Setup
In order to store structures, the high-level Python array library Dask (version 2.16.0) [17] is used. Dask is an alternative to Spark. It can scale to a cluster of hundreds of machines, and it can be used when arrays are really heavy (i.e., they do not fit into main memory). Dask divides arrays into smaller chunks and operates on them in parallel. In order to choose a
suitable classical clustering and a biclustering algorithm for the proposed ensemble learning method, several classi-
cal and biclustering algorithms are experimented with in this paper. With respect to classical algorithms: K-Means,
Mean-Shift, K-Modes, DBSCAN and Birch are examined, whereas, in connection with biclustering algorithms: Co-
clustMod, CoclustSpecMod, CoclustInfo, SpectralCoclustering and SpectralBiclustering are assessed. On the whole
Python (version 3.6.7), scikit-learn (version 0.21) [18] and coclust (version 0.2.1) [19] are used to test all the cluster-
ing algorithms. Likewise, several one-class classification algorithms: One-class Support Vector Machine (OC-SVM),
Isolation Forest (IForest) and Local Outlier Factor (LOF) are also tested to detect the outliers. The clustering and clas-
sification algorithms were run on a single-node hardware platform with an 8th Generation Intel Core i7-8565U 1.8 GHz
processor (Turbo 4.60 GHz, 4 Cores 8 Threads, 8MB Cache), 32GB DDR4 RAM and 1TB M.2 PCIe NVMe SSD.
The reported results were obtained by running each algorithm 20 times with averaging over the best 5 executions. For
each algorithm, the default parameter values were used, except for the number of clusters which was set differently in
order to achieve optimal clustering performance.
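As an illustration of the Dask storage described above, the snippet below chunks a large matrix of flattened windows so that computations run in parallel on pieces that fit in memory; the shape and chunk size are illustrative assumptions.

import dask.array as da

# Illustrative store of flattened windows: 2 million rows of 25 features,
# split into 100,000-row chunks that Dask processes in parallel.
windows = da.random.randint(0, 2, size=(2_000_000, 25),
                            chunks=(100_000, 25))

# Computation is lazy: nothing executes until .compute() is called.
column_means = windows.mean(axis=0)
print(column_means.compute())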
In order to evaluate the performance of the proposed ensemble method, firstly, Table 1 and Table 2 present the clustering results obtained by applying five classical clustering and five biclustering/co-clustering algorithms. The algorithms are evaluated based on internal clustering validation by using the following three measures: the Silhouette coefficient, the Calinski-Harabasz index and the Davies-Bouldin index. These three measures are commonly used when the labels are not known and the evaluation must be performed using the model itself.
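All three measures are available in scikit-learn and take only the data and the predicted labels; a minimal sketch (with invented data, and K-Means standing in for any of the evaluated algorithms):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

# Invented data; in the paper, labels come from each algorithm under test.
X = np.random.rand(500, 25)
labels = KMeans(n_clusters=2, random_state=0).fit_predict(X)

print("Silhouette:       ", silhouette_score(X, labels))
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))
print("Davies-Bouldin:   ", davies_bouldin_score(X, labels))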
The score for the Silhouette coefficient is bounded between -1 for incorrect clustering and +1 for highly dense clustering. Similarly, a higher Calinski-Harabasz score relates to a model with better defined clusters, and a lower Davies-Bouldin index relates to a model with better separation between the clusters. It can be observed from Table 1 that the K-Modes and Birch methods have performed quite well, and from Table 2 it is obvious that SpectralCoclustering and CoclustSpecMod have performed well. Since multiple clustering solutions could provide more insights than only one solution, the combination of K-Modes and SpectralCoclustering in the proposed ensemble method could improve cluster accuracy and validity as compared to an individual clustering solution. Further, Fig. 6a and Fig. 6b present the test accuracy of the one-class classification models based on the clustering results. Among the one-class classifiers, OC-SVM is an unsupervised algorithm that learns a decision function for novelty detection by classifying novel data as similar or different to the training set. In order to perform outlier detection on moderate to high dimensional datasets, LOF and IForest could be used. Furthermore, Fig. 6a and Fig. 6b show that the classifiers are tested with novel patterns that are normal and rare, respectively. The results show that the average test accuracy of the one-class classifiers (based on the combined K-Modes and SpectralBiclustering) for novel patterns that are normal is approximately 85% and for rare patterns it is approximately 80%, which is quite good as compared to the individual clustering algorithms.
Fig. (6) Test accuracy (%) of the one-class classifiers using K-Modes, SpectralBiclustering and their combination: (a) novel patterns that are normal; (b) novel patterns that are rare.
5. Related Work
This section mainly concentrates on the previous work done in relation to outlier or anomaly detection. A com-
prehensive survey of outlier detection methodologies by [20] emphasized that outliers should be identified as early
as possible in order to prevent catastrophic consequences. Further, a novel method for outlier detection in high-
dimensional data based on principal component analysis and local kernel density estimation is proposed by [21].
An ensemble method for outlier identification based on supervised learning is presented by [22]. A study to identify
anomalous flights using a combination of hierarchical and DBSCAN clustering algorithms is conducted by [23]. Fur-
ther, [24] proposed a lambda architecture for real-time anomaly detection on large data sets where the performance of
the system is critical. A real-time anomaly detection method based on univariate auto-regressive data-driven models
for data streams that uses a moving window to identify data that differs from historical patterns is put forth by [25].
Also, a methodology to predict the probability of future occurrence of industrial asset failures using logistic regres-
sion is offered by [26]. In addition, [27] suggested combining or ensembling different machine learning techniques in order to predict machine related faults more accurately. Likewise, an approach to accurately detect defective combustion engines based on high-dimensional vibration data, using unsupervised anomaly detection techniques such as LOF, OC-SVM and IForest, is presented by [28]. Further, the work by [29] proposed an unsupervised real-time anomaly detection algorithm that utilizes an LSTM-based Auto-Encoder to detect anomalies at an early stage of the production line. Also, [30] proposed the fundamental blocks for successfully building ensembles for unsupervised outlier detec-
tion. The blocks are: learning accurately, using diverse models and combining these models. In addition, an IForest
based anomaly detection framework using a sliding window is proposed by [31]. Moreover, a clustering and classification based technique for fault diagnosis is proposed by [32]. First, the technique determines the optimal number
of clusters. Second, it performs clustering by selecting the best algorithm (DBSCAN) from multiple clustering algo-
rithms. Third, it labels the dataset. Last, it uses a classification tool (TPOT) [33] to find the best classification method
(sklearn.ensemble.ExtraTreesClassifier) that results in high accuracy. Similarly, [34] suggested combining cluster-
ing and classification ensembles to identify breast cancer profiles. The ensemble of two clustering (K-Medoids and
K-Means) and two classification (Artificial Neural Network and K-Nearest Neighbor) algorithms is used to refine
clustering results. In contrast to these works, the work presented in this paper employs a time-based window on an unlabelled real-world binary dataset with more than 2 million data points from a manufacturing company. The work emphasizes making use of global and local clustering algorithms along with a one-class classifier to detect outliers in real-time with high accuracy. The other vital feature of the proposed work is the processing of a large volume of data all at once: the clustering and the training of the classifier are done in batch mode. This classifier is then used in near real-time to predict the outliers.
On the whole, the focus of these previous works is on various aspects and recent advancements of outlier detection. The work presented in this paper considers a number of the recommendations presented in those previous works; however, most of them focus on theoretical rather than practical issues in relation to outlier detection, while the focus of this paper is to provide a real-world application of the detection techniques. Further, it can also be seen from the previous works that outlier detection in the manufacturing industry remains only briefly addressed; for that reason, this paper is among the very few using ensemble methods for this purpose. The proposed ensemble method combines clustering and classification algorithms for outlier detection in order to enhance operational efficiency for small and medium-sized enterprises (SMEs).
6. Conclusion

Accurately detecting outliers in sensor data from a production environment is an important issue. Outlier detection
intends to find patterns in data that do not belong to one of the common structures. This paper introduced a new
ensemble learning model that is inspired by classical clustering (global patterns) combined with the principles of
biclustering (local patterns) to construct clusters representing different types of structures. These structures are later
used in a one-class classifier to detect the outliers. The accuracy of the ensemble method is tested on a real-world
dataset from the production industry. The results have verified the accuracy and the validity of the proposed method.
For future work, a near real-time alerting service will be implemented that triggers an alert with related information for a correct and timely response, by classifying the suspicious events as faulty input, no output, internal machine error and so on. In addition, gaps in the sensor data streams will be filled at the data preprocessing
stage for the reason that these gaps could lead to inaccurate results. Also, the execution time of the proposed method
will be determined in order to measure the performance.
Acknowledgements
This research is supported by University College of Northern Denmark - Research and Development funding and
Dolle A/S.
References
[1] Chandola, Varun, Arindam Banerjee, and Vipin Kumar. (2009) “Anomaly detection: A survey.” ACM Computing Surveys 41 (3): 1–58.
[2] Rayana, Shebuti, Wen Zhong, and Leman Akoglu. (2016) “Sequential ensemble learning for outlier detection: A bias-variance perspective.”
Proceedings of the 16th IEEE International Conference on Data Mining: 1167–1172.
[3] Dolle A/S https://www.dolle.eu.
[4] Weng, Ziqiao. (2019) “From conventional machine learning to AutoML.” Journal of Physics: Conference Series, 1207 (1): 012015.
[5] Kaiser, Sebastian, Rodrigo Santamaria, Tatsiana Khamiakova, Martin Sill, Roberto Theron, Luis Quintales, Friedrich Leisch, Ewoud De Troyer,
and Maintainer Sebastian Kaiser. (2020) Package ‘biclust’ http://cran.fhcrc.org/web/packages/biclust/biclust.pdf.
[6] Loureiro, Antonio, Luis Torgo, and Carlos Soares. (2004) “Outlier detection using clustering methods: a data cleaning application.” Proceed-
ings of KDNet Symposium on Knowledge-based systems for the Public Sector.
[7] Reunanen, Niko, Tomi Räty, Juho J. Jokinen, Tyler Hoyt, and David Culler. (2019) “Unsupervised online detection and prediction of outliers
in streams of sensor data.” International Journal of Data Science and Analytics: 1–30.
[8] Khedairia, Soufiane, and Mohamed Tarek Khadir. (2019) “A multiple clustering combination approach based on iterative voting process.”
Journal of King Saud University-Computer and Information Sciences.
[9] Soufiane, Khedairia, Houari Imene, Ababsia Manel, and Khadir Mohamed Tarek. (2019) “Clustering ensemble approach based on Incremental
Learning.” Proceedings of the 9th International Conference on Information Systems and Technologies: 1–7.
[10] One-Class Classification Algorithms https://machinelearningmastery.com/one-class-classification-algorithms.
[11] Bellinger, Colin, Shiven Sharma, and Nathalie Japkowicz. (2012) “One-class versus binary classification: Which and when?.” Proceedings of
the 11th IEEE International Conference on Machine Learning and Applications: 102–106.
[12] Huang, Zhexue. (1998) “Extensions to the k-means algorithm for clustering large data sets with categorical values.” Data Mining and Knowledge Discovery 2 (3): 283–304.
[13] Cheng, Yizong, and George M. Church. (2000) “Biclustering of expression data.” Proceedings of the International Conference on Intelligent
Systems for Molecular Biology: 93–103.
[14] Suri, NNR Ranga, and G. Athithan. (2019) “Outlier detection: techniques and applications.” Springer Nature.
[15] Clustering performance evaluation https://scikit-learn.org/stable/modules/clustering.html.
[16] Outlier detection https://scikit-learn.org/stable/modules/outlier_detection.html.
[17] Dask https://pypi.org/project/dask/.
[18] Scikit-learn https://scikit-learn.org/stable.
[19] Coclust https://pypi.org/project/coclust.
[20] Hodge, Victoria, and Jim Austin. (2004) “A survey of outlier detection methodologies.” Artificial Intelligence Review 22 (2): 85–126.
[21] Kamalov, Firuz, and Ho Hon Leung. (2020) “Outlier detection in high dimensional data.” Journal of Information & Knowledge Management
19 (1): 2040013.
[22] Alexandropoulos, Stamatios-Aggelos N., Sotiris B. Kotsiantis, Violetta E. Piperigou, and Michael N. Vrahatis. (2020) “A new ensemble
method for outlier identification.” Proceedings of the 10th IEEE International Conference on Cloud Computing, Data Science & Engineering:
769–774.
[23] Sheridan, Kevin, Tejas G. Puranik, Eugene Mangortey, Olivia J. Pinon-Fischer, Michelle Kirby, and Dimitri N. Mavris. (2020) “An application
of DBSCAN clustering for flight anomaly detection during the approach phase.” Technical report - AIAA Scitech Forum.
[24] Liu, Xiufeng, Nadeem Iftikhar, Per Sieverts Nielsen, and Alfred Heller. (2016) “Online anomaly energy consumption detection using lambda
architecture.” Proceedings of the International Conference on Big Data Analytics and Knowledge Discovery: 193–209.
[25] Hill, David J., and Barbara S. Minsker. (2010) “Anomaly detection in streaming environmental sensor data: A data-driven modeling approach.”
Environmental Modelling & Software 25 (9): 1014–1022.
[26] Langone, Rocco, Alfredo Cuzzocrea, and Nikolaos Skantzos. (2020) “A flexible and interpretable framework for predicting anomalous behavior
in Industry 4.0 environments.” Proceedings of the International Conference on Advanced Information Networking and Applications: 693–702.
[27] Angelopoulos, Angelos, Emmanouel T. Michailidis, Nikolaos Nomikos, Panagiotis Trakadas, Antonis Hatziefremidis, Stamatis Voliotis, and
Theodore Zahariadis. (2020) “Tackling faults in the Industry 4.0 era — A survey of machine-learning solutions and key aspects.” Sensors 20
(1): 109.
[28] Muhr, David, Shailesh Tripathi, and Herbert Jodlbauer. (2020) “Divide and conquer anomaly detection: A case study predicting defective
engines.” Procedia Manufacturing 42: 57–61.
[29] Hsieh, Ruei-Jie, Jerry Chou, and Chih-Hsiang Ho. (2019) “Unsupervised online anomaly detection on multivariate sensing time series data for
smart manufacturing.” Proceedings of the 12th IEEE Conference on Service-Oriented Computing and Applications: 90–97.
[30] Zimek, Arthur, Ricardo J. G. B. Campello, and Jörg Sander. (2014) “Ensembles for unsupervised outlier detection: challenges and research questions a position paper.” ACM SIGKDD Explorations Newsletter 15 (1): 11–22.
[31] Ding, Zhiguo, and Minrui Fei. (2013) “An anomaly detection approach based on isolation forest algorithm for streaming data using sliding
window.” IFAC Proceedings Volumes 46 (20): 12–17.
[32] Dahiya, Sonika, Harshit Nanda, Jatin Artwani, and Jatin Varshney. (2020) “Using clustering techniques and classification mechanisms for fault
diagnosis.” International Journal of Advanced Trends in Computer Science and Engineering 9 (2): 2138–2146.
[33] TPOTClassifier https://epistasislab.github.io/tpot/api/.
[34] Agrawal, Utkarsh, Daniele Soria, Christian Wagner, Jonathan Garibaldi, Ian O. Ellis, John MS Bartlett, David Cameron, Emad A. Rakha, and
Andrew R. Green. (2019) “Combining clustering and classification ensembles: A novel pipeline to identify breast cancer profiles.” Artificial
intelligence in medicine 97: 27–37.