Outlier Detection in Sensor Data Using Ensemble Learning

Available online at www.sciencedirect.com
ScienceDirect
Procedia Computer Science 176 (2020) 1160–1169
www.elsevier.com/locate/procedia
Abstract

Analyzing sensor data from a production environment is quite challenging because of the high-dimensional nature of the data. In addition, the generated data is in the form of time-series, where the sequence of registrations may be of utmost significance. One of the main goals of the paper is to determine whether a given time-series of feature combinations is normal or rare. This goal can be achieved by combining multiple machine learning models. In this paper, a sliding window based ensemble method is proposed to detect outliers in a streaming fashion. The proposed method uses a combination of clustering algorithms to construct subgroups (clusters) representing different data structures. These structures are later used in a one-class classification algorithm to identify the outliers. Thus, if a pattern does not belong to any of the common structures or clusters, it is an outlier. Further, based on the rare pattern classification, machine failures can be predicted in advance.
© 2020 The Authors. Published by Elsevier B.V.
This is an open access article under the CC BY-NC-ND license (https://creativecommons.org/licenses/by-nc-nd/4.0/)
Peer-review under responsibility of the scientific committee of KES International.
Keywords: Outlier detection; ensemble learning; clustering; classification; sensor data; Industry 4.0; machine learning
1. Introduction
One of the most common challenges faced in a production environment is how to deal with patterns failing to conform to common structures. These extraordinary and rarely occurring patterns are called outliers. Hence, it is of great importance to detect outliers and handle them one way or another to achieve optimal production performance. Thus, to predict failures and avoid unplanned machine stoppages, it is important to accurately predict outliers by applying appropriate detection model(s). Outlier detection techniques normally belong to one of the following learning approaches: supervised, unsupervised, semi-supervised and ensemble. Supervised learning requires labeled training data, while unsupervised learning does not need training data. Semi-supervised learning combines a small set of labeled data with a large set of unlabeled training data [1]. In comparison, ensemble learning uses a combination of multiple learning algorithms to obtain better prediction accuracy; for instance, clustering and classification as used in this paper. In fact, ensemble learning has not been applied to outlier detection to a large extent [2]; for this reason it is considered in this paper.
To summarize, the main contributions of this paper are as follows:
• Presenting an outlier detection method based on sliding window and ensemble learning for time-series
• Considering several learning models and choosing the best combination
• Testing the accuracy and the validity of the proposed detection method
The paper is structured as follows. Section 2 explains the motivation of this paper. Section 3 describes the proposed
ensemble method. Section 4 evaluates the performance of the method. Section 5 presents the related work. Section 6
concludes the paper and points out the future research directions.
2. Motivation
This study is performed in a real-world setting at a medium-sized manufacturing company, Dolle [3], in Northern
Denmark. One of the main goals of this case study “is to find patterns in sensor data that can help to predict and
ultimately prevent machine’s failures”. Fig. 1 provides a snapshot of a subset of sensor readings (time-series) from a single machine in the production facility. The snapshot contains readings of five sensors, numbered 1 to 5. The
timestamp granularity is one second. The sensor readings are binary encoded into 1 or 0 for event and non-event,
respectively.
In addition, the sensor readings in Fig. 1 are read as follows: 1 (entry of raw material): the value 1 = entry, 0 = no
entry; 2 (exit of final product): the value 1 = exit, 0 = no exit; 3 (raw material quality sensing): the value 1 = not good,
0 = good; 4 (machine error): the value 1 = error, 0 = no error; and 5 (machine alarm) : the value 1 = alarm, 0 = no
alarm. Dolle’s case study clearly illustrates the challenges of analyzing data from industrial sensors.
3. The Proposed Ensemble Method

3.1. Overview
This section presents the proposed ensemble method, based on unsupervised machine learning algorithms, for predicting outliers in a time-series. The ensemble method detects the outliers by applying clustering prior to classification. It should also be noted that sensor data from a production environment presents itself as a steady stream, thus the performed analysis must be dynamic or real-time. In brief, the proposed ensemble method comprises the following two steps, which are presented in Fig. 2 and further discussed below.
Fig. (2) The proposed ensemble method: Step 1, creating structures; Step 2, applying the model.
3.2. Structures
Since a single sensor reading does not reveal any categorical information, it does not make much sense to construct clusters of single sensor readings. Thus, in order to predict future incidents, a sequence of sensor readings should be grouped, where one structure or pattern is produced for each group. This grouping does not aggregate the readings into a single output reading; instead, the readings retain their separate identities. In fact, in the case of time-series it is not a good idea to just look at arrays succeeding one another: if the starting point were different, the structures would probably end up in totally different clusters, or individual readings might fall into separate windows. In order to obtain all possible combinations of arrays, the concept of a sliding window is used in this paper. Each sliding window (see Fig. 3) has a size of n observations. Each window can be considered a snapshot in time, representing a combination of sensor readings over a given period of time. Hence, the purpose of this process is to group/cluster a specific window together with other similar snapshots in time.
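As a concrete illustration, the sketch below constructs overlapping windows from a matrix of binary sensor readings. It is a minimal sketch, not the paper's implementation: the array contents, the window size n = 5 and the function name are illustrative assumptions.

import numpy as np

def sliding_windows(readings, n):
    """Yield every window of n consecutive sensor readings.

    readings: 2-D array of shape (timestamps, sensors), one row per second.
    The window slides by one second, so all possible combinations of n
    consecutive rows are produced and the readings keep their identities.
    """
    for start in range(len(readings) - n + 1):
        yield readings[start:start + n]

# Illustrative binary readings for 5 sensors over 8 seconds (values invented).
readings = np.random.randint(0, 2, size=(8, 5))
windows = list(sliding_windows(readings, n=5))
print(len(windows), windows[0].shape)  # 4 windows, each of shape (5, 5)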
3.3. Clustering
The clustering technique introduced in this paper is inspired by classical clustering algorithms such as K-Means, Mean-Shift, K-Modes and DBSCAN [4], combined with biclustering algorithms such as Coclust Mod/SpecMod, Spectral Biclustering/Co-clustering and Delta-biclustering [5]. Classical clustering identifies global patterns by clustering either the rows or the columns of a data matrix, whereas biclustering or co-clustering identifies local patterns by simultaneously clustering the objects (rows) and attributes (columns) of a data matrix. Biclustering assists in detecting abnormal patterns that are in the earliest stages of a machine fault. Biclustering also facilitates reducing the dimensionality of high-dimensional data by picking up only small sets of objects and attributes based on the local patterns. Further, the objects and attributes may participate in multiple clusters or in no cluster at all. Overlapping clusters are not considered in this case study, for the reason that overlapping may cause uncertainty and may well require fuzzy logic to solve the problem.
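To make the global/local distinction concrete, the following sketch clusters the rows of a data matrix with K-Means (global patterns) and simultaneously clusters rows and columns with scikit-learn's SpectralBiclustering (local patterns). The matrix is randomly generated and the parameter values are illustrative; the import path assumes a recent scikit-learn release.

import numpy as np
from sklearn.cluster import KMeans, SpectralBiclustering

# Illustrative data matrix: rows are windowed observations, columns are sensors.
X = np.random.rand(100, 5)

# Classical clustering assigns each row to one global cluster.
row_labels = KMeans(n_clusters=2, random_state=0).fit_predict(X)

# Biclustering clusters rows and columns together, exposing local patterns
# that are confined to a subset of sensors (columns).
bic = SpectralBiclustering(n_clusters=2, random_state=0).fit(X)
print(row_labels[:10], bic.row_labels_[:10], bic.column_labels_)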
In the first place, classical clustering and biclustering algorithms are applied in parallel to the sets of preceding sensor readings or structures in order to identify clusters containing global and local patterns, respectively (see Fig. 4). Next, a new cluster label, either Cluster1 (group of common structures) or Cluster0 (group of rare structures), is assigned to each cluster by using the cluster size (number of structures in each cluster) [6] and a voting technique. Cluster sizes are used as the key factor for identifying groups of observations that are distinct from the majority of the data. Indeed, outliers are rare events by definition [7]; as a result, larger clusters are merged into Cluster1 and smaller clusters into Cluster0. As each structure belongs to multiple clusters, due to the use of different clustering algorithms, the final cluster label assigned to each structure (either Cluster1 or Cluster0) can be determined by iterative [8] or majority [9] voting techniques. Moreover, Algorithm 1 specifies the sequence of computational steps.
Algorithm 1
1. Slide across the sensor readings with a sliding interval of one second and construct data structures using a time-
based window of size n.
(a) Assign each structure to the corresponding clusters using classical and biclustering algorithms (each algo-
rithm works individually and in parallel on the whole set of structures).
(b) Merge clusters of similar structures using cluster analysis and/or reallocate clusters using a voting technique (a sketch of majority voting is given below).
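A minimal sketch of the majority voting in step 1(b), assuming that each clustering algorithm has already mapped every structure to a label of 1 (Cluster1, common) or 0 (Cluster0, rare); the function name and the tie-breaking rule (ties resolved toward rare, to stay cautious on imbalanced data) are assumptions rather than the paper's exact procedure.

import numpy as np

def majority_vote(label_matrix):
    """Combine per-algorithm cluster labels by majority voting.

    label_matrix: shape (n_structures, n_algorithms), holding 1 for
    Cluster1 (common) or 0 for Cluster0 (rare). A structure keeps the
    common label only when a strict majority of algorithms agrees.
    """
    votes = label_matrix.sum(axis=1)
    return (votes > label_matrix.shape[1] / 2).astype(int)

# Three algorithms voting on four structures (labels invented).
labels = np.array([[1, 1, 0],
                   [0, 0, 1],
                   [1, 1, 1],
                   [1, 0, 0]])
print(majority_vote(labels))  # [1 0 1 0]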
Generally, one-class classification involves training a model on preceding normal/common structures. Different models, such as One-Class Support Vector Machines (OC-SVM), Isolation Forest (IForest), Local Outlier Factor (LOF) and a few more [10], could be used as a one-class classifier. One-class algorithms can be effective for imbalanced datasets [11] (as in this case study), where most of the data belongs to a single class. Later on, the classifier predicts the outliers when new sensor events are recorded. Since the one-class classifier is fitted on the cluster(s) with common/normal structures, any novel rare pattern that does not belong to one of the common structures is labeled as an outlier (see Fig. 5). In addition, Algorithm 2 specifies the sequence of computational steps.
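As an illustration of this step, the sketch below fits scikit-learn's OneClassSVM on flattened windows drawn from the common cluster in batch mode and then labels a newly arriving window; the window shape, the flattening and the hyperparameter values are illustrative assumptions, not the tuned configuration used in the evaluation.

import numpy as np
from sklearn.svm import OneClassSVM

# Illustrative training set: windows from Cluster1, flattened to vectors
# (each 5x5 binary window becomes a 25-dimensional sample).
normal_windows = np.random.randint(0, 2, size=(200, 5, 5))
X_train = normal_windows.reshape(len(normal_windows), -1)

clf = OneClassSVM(kernel="rbf", nu=0.05).fit(X_train)

# Streaming phase: each incoming window is scored in near real-time;
# predict() returns +1 for inliers (normal) and -1 for outliers (rare).
new_window = np.random.randint(0, 2, size=(5, 5)).reshape(1, -1)
print("outlier" if clf.predict(new_window)[0] == -1 else "normal")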
Fig. (5) One-class classifier
Algorithm 2
The following example will help in understanding the proposed ensemble method. Two clustering algorithms are used for this example: a classical clustering algorithm, K-Modes [12], and a Delta-biclustering algorithm [13]. K-Modes is an extension of K-Means and is commonly used for clustering categorical variables (as in this case study). Further, K-Modes clustering uses Hamming distance, or dissimilarity, rather than Euclidean distance, and modes instead of means. Conversely, Delta-biclustering uses a greedy iterative approach that involves deleting rows and columns from the main data matrix. Specifically, it searches for sub-matrices with suitable row and column correlation until a Mean Squared Residue (MSR) score lower than a specific threshold, or delta value, is achieved. To start with, a data matrix/structure (DM1) with 5×5 dimensions is considered. The data matrix contains readings of five sensors, numbered 1 to 5, with a granularity of one second, representing a 5 second time period.

Subsequently, the matrix is compared with one of the centroids using the K-Modes algorithm, giving a total dissimilarity score of “6” calculated based on the global pattern. The total dissimilarity score is not significant, which is why the data matrix is assigned to cluster C1. Similarly, Delta-biclustering is applied to the main data matrix (without overlapping and with a fixed number of sub-clusters). The following sub-matrix, B1, could be materialized, where the sub-matrix reveals a local pattern. Hence, two clusters (C1 and B1) are identified for DM1.
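For the running example, the K-Modes dissimilarity is simply the Hamming distance: the number of cells where the window and the cluster centroid disagree. The matrices below are invented so that the score works out to the “6” quoted above; they are not the paper's actual DM1 or centroid.

import numpy as np

def hamming_dissimilarity(a, b):
    """Count the mismatching cells between two equally sized binary matrices."""
    return int(np.sum(a != b))

# Illustrative 5x5 data matrix DM1 and K-Modes centroid (values invented).
dm1 = np.array([[1, 0, 0, 0, 0],
                [0, 0, 1, 0, 0],
                [1, 0, 1, 0, 0],
                [0, 1, 0, 1, 0],
                [0, 0, 0, 1, 1]])
centroid = np.array([[1, 0, 0, 0, 0],
                     [0, 1, 0, 0, 0],
                     [1, 0, 0, 0, 0],
                     [0, 1, 0, 0, 0],
                     [0, 0, 0, 0, 0]])
# A low score means the window matches the centroid's global pattern.
print(hamming_dissimilarity(dm1, centroid))  # 6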
Further, cluster analysis is performed once both clustering algorithms finish their execution for a specific time interval and clusters have been identified for all the structures in the training dataset. Historical data and domain knowledge indicate that outliers are rare events manifesting as low frequency or density, which infers that small clusters can be an indication of outliers [14]. Hence, based on the cluster analysis, the most similar clusters are merged into one cluster, either Cluster1 or Cluster0, where Cluster1 indicates that normal structures belong to that cluster and Cluster0 indicates that rare structures belong to that cluster. In order to determine the final clustering result for DM1, the following formula is used: cluster.valuecount() > (dataset size * (outlier percentage / 100)). Based on the proposed formula, C1 has a higher value count and B1 has a lower value count than the threshold value (dataset size * (outlier percentage / 100)), respectively. In addition, a truth table with a logic AND gate assigns Cluster0 to DM1. The reason to use a truth table rather than a majority/iterative voting technique is that in this running example there are no overlapping clusters and the dataset is imbalanced; hence, more caution is required to identify abnormal or rare behaviours. DM1 is allocated to the Cluster0 group for the reason that biclustering uncovers a fault in the machine.
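The decision rule can be sketched as follows, under the running example's assumptions: a cluster counts as common when its member count exceeds dataset size * (outlier percentage / 100), and the final label for DM1 is the logical AND of the per-algorithm decisions. The dataset size and outlier percentage below are invented for illustration.

def is_common(cluster_size, dataset_size, outlier_percentage):
    """True when the cluster is larger than the expected share of outliers."""
    return cluster_size > dataset_size * (outlier_percentage / 100)

# Invented figures: 10,000 structures with an assumed 5% outlier share.
dataset_size, outlier_pct = 10_000, 5
c1 = is_common(8_200, dataset_size, outlier_pct)  # K-Modes cluster C1
b1 = is_common(300, dataset_size, outlier_pct)    # Delta-bicluster B1

# AND truth table: DM1 is Cluster1 only if *both* algorithms call it common,
# so a single rare verdict is enough to flag a potential machine fault.
print("Cluster1" if (c1 and b1) else "Cluster0")  # Cluster0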
Lastly, the one-class classifier is trained on the normal cluster (Cluster1), and the patterns in the streaming sensor data are matched in near real-time against known structures in order to determine whether the sensor data stream is normal or rare.
4. Evaluation
In this section, the proposed ensemble method is evaluated in terms of validation of the clustering results and pre-
diction accuracy of one-class classification. In general, clustering validation falls into two categories:
external validation and internal validation. The external validation requires comparing the results of a cluster analysis
to externally available results, for example availability of class labels, whereas, internal validation uses the internal in-
formation of the clustering process to evaluate the goodness of the clustering structure without reference to external in-
formation. The real-world dataset used in this paper is not labeled. Hence, internal evaluation measures are used to validate the quality of the diverse clustering algorithms. In order to measure the goodness of the clustering structures there are well-known measures available, like the Silhouette coefficient (sklearn.metrics.silhouette_score), the Calinski-Harabasz index (sklearn.metrics.calinski_harabasz_score), the Davies-Bouldin index (sklearn.metrics.davies_bouldin_score) and so
on [15]. Similarly, to evaluate how good the one-class classifier is, test accuracy is commonly used as a measure [16].
Test accuracy involves predicting completely unseen patterns.
4.1. Setup
In order to store structures, the high-level Python array library Dask (version 2.16.0) [17] is used. Dask is an alternative to Spark. It can scale to a cluster of hundreds of machines, and it can be used when arrays are really heavy (i.e., they do not fit into main memory). Dask divides arrays into smaller chunks and operates on them in parallel. In order to choose a
suitable classical clustering and a biclustering algorithm for the proposed ensemble learning method, several classi-
cal and biclustering algorithms are experimented with in this paper. With respect to classical algorithms: K-Means,
Mean-Shift, K-Modes, DBSCAN and Birch are examined, whereas, in connection with biclustering algorithms: Co-
clustMod, CoclustSpecMod, CoclustInfo, SpectralCoclustering and SpectralBiclustering are assessed. On the whole
Python (version 3.6.7), scikit-learn (version 0.21) [18] and coclust (version 0.2.1) [19] are used to test all the cluster-
ing algorithms. Likewise, several one-class classification algorithms: One-class Support Vector Machine (OC-SVM),
Isolation Forest (IForest) and Local Outlier Factor (LOF) are also tested to detect the outliers. The clustering and clas-
sification algorithms were run on a single-node hardware platform with an 8th Generation Intel Core i7-8565U 1.8 GHz
processor (Turbo 4.60 GHz, 4 Cores 8 Threads, 8MB Cache), 32GB DDR4 RAM and 1TB M.2 PCIe NVMe SSD.
The reported results were obtained by running each algorithm 20 times with averaging over the best 5 executions. For
each algorithm, the default parameter values were used, except for the number of clusters which was set differently in
order to achieve optimal clustering performance.
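As an illustration of the Dask storage described above, the snippet below chunks a large matrix of flattened windows so that computations run in parallel on pieces that fit in memory; the shape and chunk size are illustrative assumptions.

import dask.array as da

# Illustrative store of flattened windows: 2 million rows of 25 features,
# split into 100,000-row chunks that Dask processes in parallel.
windows = da.random.randint(0, 2, size=(2_000_000, 25),
                            chunks=(100_000, 25))

# Computation is lazy: nothing executes until .compute() is called.
column_means = windows.mean(axis=0)
print(column_means.compute())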
In order to evaluate the performance of the proposed ensemble method, firstly, Table 1 and Table 2 present the clustering results obtained by applying five classical clustering and five biclustering/co-clustering algorithms. The algorithms are evaluated based on internal clustering validation by using the following three measures: the Silhouette coefficient, the Calinski-Harabasz index and the Davies-Bouldin index. These three measures are commonly used when the labels are not known and the evaluation must be performed using the model itself.
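All three measures are available in scikit-learn and take only the data and the predicted labels; a minimal sketch (with invented data, and K-Means standing in for any of the evaluated algorithms):

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,
                             davies_bouldin_score)

# Invented data; in the paper, labels come from each algorithm under test.
X = np.random.rand(500, 25)
labels = KMeans(n_clusters=2, random_state=0).fit_predict(X)

print("Silhouette:       ", silhouette_score(X, labels))
print("Calinski-Harabasz:", calinski_harabasz_score(X, labels))
print("Davies-Bouldin:   ", davies_bouldin_score(X, labels))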
The score for the Silhouette coefficient is bounded between -1 for incorrect clustering and +1 for highly dense clustering. Similarly, a higher Calinski-Harabasz score relates to a model with better defined clusters, and a lower Davies-Bouldin index relates to a model with better separation between the clusters. It can be observed from Table 1 that the K-Modes and Birch methods have performed quite well, and from Table 2 it is obvious that SpectralCoclustering and CoclustSpecMod have performed well. Since multiple clustering solutions could provide more insights than only one solution, the combination of K-Modes and SpectralCoclustering in the proposed ensemble method could improve cluster accuracy and validity as compared to an individual clustering solution. Further, Fig. 6a and Fig. 6b present the test accuracy of the one-class classification models based on the clustering results. Among the one-class classifiers, OC-SVM is an unsupervised algorithm that learns a decision function for novelty detection by classifying novel data as similar or different to the training set. In order to perform outlier detection on moderate to high dimensional datasets, LOF and IForest could be used. Furthermore, Fig. 6a and Fig. 6b show that the classifiers are tested with novel patterns that are normal and rare, respectively. The results show that the average test accuracy of the one-class classifiers (based on the combined K-Modes and SpectralBiclustering) for novel patterns that are normal is approximately 85% and for rare patterns it is approximately 80%, which is quite good as compared to the individual clustering algorithms.
Fig. (6) Test accuracy (%) of the one-class classifiers using K-Modes, SpectralBiclustering and their combination: (a) novel patterns that are normal; (b) novel patterns that are rare.
5. Related Work
This section mainly concentrates on the previous work done in relation to outlier or anomaly detection. A com-
prehensive survey of outlier detection methodologies by [20] emphasized that outliers should be identified as early
as possible in order to prevent catastrophic consequences. Further, a novel method for outlier detection in high-
dimensional data based on principal component analysis and local kernel density estimation is proposed by [21].
An ensemble method for outlier identification based on supervised learning is presented by [22]. A study to identify
anomalous flights using a combination of hierarchical and DBSCAN clustering algorithms is conducted by [23]. Fur-
ther, [24] proposed a lambda architecture for real-time anomaly detection on large data sets where the performance of
the system is critical. A real-time anomaly detection method based on univariate auto-regressive data-driven models
for data streams that uses a moving window to identify data that differs from historical patterns is put forth by [25].
Also, a methodology to predict the probability of future occurrence of industrial asset failures using logistic regres-
sion is offered by [26]. In addition, [27] suggested combining or ensembling different machine learning techniques in order to predict machine related faults more accurately. Likewise, an approach to accurately detect defective combustion engines based on high-dimensional vibration data, using unsupervised anomaly detection techniques such as LOF, OC-SVM and IForest, is presented by [28]. Further, the work by [29] proposed an unsupervised real-time anomaly detection algorithm that utilizes an LSTM-based Auto-Encoder to detect anomalies at an early stage of the production line. Also, [30] proposed the fundamental blocks for successfully building ensembles for unsupervised outlier detec-
tion. The blocks are: learning accurately, using diverse models and combining these models. In addition, an IForest
based anomaly detection framework using a sliding window is proposed by [31]. Moreover, a clustering and classification based technique for fault diagnosis is proposed by [32]. First, the technique determines the optimal number
of clusters. Second, it performs clustering by selecting the best algorithm (DBSCAN) from multiple clustering algo-
rithms. Third, it labels the dataset. Last, it uses a classification tool (TPOT) [33] to find the best classification method
(sklearn.ensemble.ExtraTreesClassifier) that results in high accuracy. Similarly, [34] suggested combining cluster-
ing and classification ensembles to identify breast cancer profiles. The ensemble of two clustering (K-Medoids and
K-Means) and two classification (Artificial Neural Network and K-Nearest Neighbor) algorithms is used to refine
clustering results. In contrast to these works, the work presented in this paper employs a time-based window on an unlabelled real-world binary dataset with more than 2 million data points from a manufacturing company. The work emphasizes making use of global and local clustering algorithms along with a one-class classifier to detect outliers in real-time with high accuracy. The other vital feature of the proposed work is the processing of a large volume of data all at once: the clustering and the training of the classifier are done in batch mode. This classifier is then used in near real-time to predict the outliers.
On the whole, the focus of these previous works is on various aspects and recent advancements of outlier detection. The work presented in this paper considers a number of the recommendations presented in those previous works; however, most of them focus on theoretical rather than practical issues in relation to outlier detection, while the focus of this paper is to provide a real-world application of the detection techniques. Further, it can also be seen from the previous works that outlier detection in the manufacturing industry remains only briefly addressed; for that reason, this paper is among the very few using ensemble methods for this purpose. The proposed ensemble method combines clustering and classification algorithms for outlier detection in order to enhance operational efficiency for small and medium-sized enterprises (SMEs).
6. Conclusion

Accurately detecting outliers in sensor data from a production environment is an important issue. Outlier detection
intends to find patterns in data that do not belong to one of the common structures. This paper introduced a new
ensemble learning model that is inspired by classical clustering (global patterns) combined with the principles of
biclustering (local patterns) to construct clusters representing different types of structures. These structures are later
used in a one-class classifier to detect the outliers. The accuracy of the ensemble method is tested on a real-world
dataset from the production industry. The results have verified the accuracy and the validity of the proposed method.
For future work, a near real-time alerting service will be implemented that triggers an alert with related information for a correct and timely response, by classifying the suspicious events as faulty input, no output, internal machine error and so on. In addition, gaps in the sensor data streams will be filled at the data preprocessing
stage for the reason that these gaps could lead to inaccurate results. Also, the execution time of the proposed method
will be determined in order to measure the performance.
Acknowledgements
This research is supported by University College of Northern Denmark - Research and Development funding and
Dolle A/S.
References
[1] Chandola, Varun, Arindam Banerjee, and Vipin Kumar. (2009) “Anomaly detection: A survey.” ACM Computing Surveys 41 (3): 1–58.
[2] Rayana, Shebuti, Wen Zhong, and Leman Akoglu. (2016) “Sequential ensemble learning for outlier detection: A bias-variance perspective.”
Proceedings of the 16th IEEE International Conference on Data Mining: 1167–1172.
[3] Dolle A/S https://www.dolle.eu.
[4] Weng, Ziqiao. (2019) “From conventional machine learning to AutoML.” Journal of Physics: Conference Series, 1207 (1): 012015.
[5] Kaiser, Sebastian, Rodrigo Santamaria, Tatsiana Khamiakova, Martin Sill, Roberto Theron, Luis Quintales, Friedrich Leisch, Ewoud De Troyer,
and Maintainer Sebastian Kaiser. (2020) Package ‘biclust’ http://cran.fhcrc.org/web/packages/biclust/biclust.pdf.
[6] Loureiro, Antonio, Luis Torgo, and Carlos Soares. (2004) “Outlier detection using clustering methods: a data cleaning application.” Proceed-
ings of KDNet Symposium on Knowledge-based systems for the Public Sector.
[7] Reunanen, Niko, Tomi Räty, Juho J. Jokinen, Tyler Hoyt, and David Culler. (2019) “Unsupervised online detection and prediction of outliers
in streams of sensor data.” International Journal of Data Science and Analytics: 1–30.
[8] Khedairia, Soufiane, and Mohamed Tarek Khadir. (2019) “A multiple clustering combination approach based on iterative voting process.”
Journal of King Saud University-Computer and Information Sciences.
[9] Soufiane, Khedairia, Houari Imene, Ababsia Manel, and Khadir Mohamed Tarek. (2019) “Clustering ensemble approach based on Incremental
Learning.” Proceedings of the 9th International Conference on Information Systems and Technologies: 1–7.
[10] One-Class Classification Algorithms https://machinelearningmastery.com/one-class-classification-algorithms.
[11] Bellinger, Colin, Shiven Sharma, and Nathalie Japkowicz. (2012) “One-class versus binary classification: Which and when?.” Proceedings of
the 11th IEEE International Conference on Machine Learning and Applications: 102–106.
[12] Huang, Zhexue. (1998) “Extensions to the k-means algorithm for clustering large data sets with categorical values.” Data Mining and Knowledge Discovery 2 (3): 283–304.
[13] Cheng, Yizong, and George M. Church. (2000) “Biclustering of expression data.” Proceedings of the International Conference on Intelligent
Systems for Molecular Biology: 93–103.
[14] Suri, NNR Ranga, and G. Athithan. (2019) “Outlier detection: techniques and applications.” Springer Nature.
[15] Clustering performance evaluation https://scikit-learn.org/stable/modules/clustering.html.
[16] Outlier detection https://scikit-learn.org/stable/modules/outlier_detection.html.
[17] Dask https://pypi.org/project/dask/.
[18] Scikit-learn https://scikit-learn.org/stable.
[19] Coclust https://pypi.org/project/coclust.
[20] Hodge, Victoria, and Jim Austin. (2004) “A survey of outlier detection methodologies.” Artificial Intelligence Review 22 (2): 85–126.
[21] Kamalov, Firuz, and Ho Hon Leung. (2020) “Outlier detection in high dimensional data.” Journal of Information & Knowledge Management
19 (1): 2040013.
[22] Alexandropoulos, Stamatios-Aggelos N., Sotiris B. Kotsiantis, Violetta E. Piperigou, and Michael N. Vrahatis. (2020) “A new ensemble
method for outlier identification.” Proceedings of the 10th IEEE International Conference on Cloud Computing, Data Science & Engineering:
769–774.
[23] Sheridan, Kevin, Tejas G. Puranik, Eugene Mangortey, Olivia J. Pinon-Fischer, Michelle Kirby, and Dimitri N. Mavris. (2020) “An application
of DBSCAN clustering for flight anomaly detection during the approach phase.” Technical report - AIAA Scitech Forum.
[24] Liu, Xiufeng, Nadeem Iftikhar, Per Sieverts Nielsen, and Alfred Heller. (2016) “Online anomaly energy consumption detection using lambda
architecture.” Proceedings of the International Conference on Big Data Analytics and Knowledge Discovery: 193–209.
[25] Hill, David J., and Barbara S. Minsker. (2010) “Anomaly detection in streaming environmental sensor data: A data-driven modeling approach.”
Environmental Modelling & Software 25 (9): 1014–1022.
[26] Langone, Rocco, Alfredo Cuzzocrea, and Nikolaos Skantzos. (2020) “A flexible and interpretable framework for predicting anomalous behavior
in Industry 4.0 environments.” Proceedings of the International Conference on Advanced Information Networking and Applications: 693–702.
[27] Angelopoulos, Angelos, Emmanouel T. Michailidis, Nikolaos Nomikos, Panagiotis Trakadas, Antonis Hatziefremidis, Stamatis Voliotis, and
Theodore Zahariadis. (2020) “Tackling faults in the Industry 4.0 era — A survey of machine-learning solutions and key aspects.” Sensors 20
(1): 109.
[28] Muhr, David, Shailesh Tripathi, and Herbert Jodlbauer. (2020) “Divide and conquer anomaly detection: A case study predicting defective
engines.” Procedia Manufacturing 42: 57–61.
[29] Hsieh, Ruei-Jie, Jerry Chou, and Chih-Hsiang Ho. (2019) “Unsupervised online anomaly detection on multivariate sensing time series data for
smart manufacturing.” Proceedings of the 12th IEEE Conference on Service-Oriented Computing and Applications: 90–97.
[30] Zimek, Arthur, Ricardo J. G. B. Campello, and Jörg Sander. (2014) “Ensembles for unsupervised outlier detection: challenges and research questions a position paper.” ACM SIGKDD Explorations Newsletter 15 (1): 11–22.
[31] Ding, Zhiguo, and Minrui Fei. (2013) “An anomaly detection approach based on isolation forest algorithm for streaming data using sliding
window.” IFAC Proceedings Volumes 46 (20): 12–17.
[32] Dahiya, Sonika, Harshit Nanda, Jatin Artwani, and Jatin Varshney. (2020) “Using clustering techniques and classification mechanisms for fault
diagnosis.” International Journal of Advanced Trends in Computer Science and Engineering 9 (2): 2138–2146.
[33] TPOTClassifier https://epistasislab.github.io/tpot/api/.
[34] Agrawal, Utkarsh, Daniele Soria, Christian Wagner, Jonathan Garibaldi, Ian O. Ellis, John MS Bartlett, David Cameron, Emad A. Rakha, and
Andrew R. Green. (2019) “Combining clustering and classification ensembles: A novel pipeline to identify breast cancer profiles.” Artificial
intelligence in medicine 97: 27–37.