Article
Condition Monitoring and Predictive Maintenance of Assets
in Manufacturing Using LSTM-Autoencoders and
Transformer Encoders
Xanthi Bampoula, Nikolaos Nikolakis and Kosmas Alexopoulos *
Laboratory for Manufacturing Systems and Automation, Department of Mechanical Engineering and
Aeronautics, University of Patras, 26504 Patras, Greece; baboula@lms.mech.upatras.gr (X.B.);
nikolakis@lms.mech.upatras.gr (N.N.)
* Correspondence: alexokos@lms.mech.upatras.gr; Tel.: +30-2610-910160
Abstract: The production of multivariate time-series data facilitates the continuous monitoring of production assets. Modelling multivariate time series can reveal how parameters evolve as well as how they influence one another. These data can be used in tandem with artificial intelligence methods to provide insight into the condition of production equipment, hence potentially increasing the sustainability of existing manufacturing and production systems by optimizing resource utilization and reducing waste and production downtime. In this context, a predictive maintenance method is proposed based on the combination of LSTM-Autoencoders and a Transformer encoder in order to enable the forecasting of asset failures through spatial and temporal time series. These neural networks are implemented in a software prototype. The dataset used for training and testing the models is derived from a metal processing industry case study. Ultimately, the goal is to train a remaining useful life (RUL) estimation model.
potential malfunctions in production equipment and estimating its remaining useful life
(RUL)—is beneficial and important as maintenance activities can be scheduled, preventing
equipment failures, minimizing downtime, and optimizing maintenance activities, lead-
ing to increased production and improved overall process performance [7–11]. However,
given the wide spectrum of available artificial intelligence methods and tools, it is imperative to select an appropriate model that is capable of processing large and complex data while providing accurate predictions quickly. This gap motivates the present work, which aims to deliver a methodology that takes advantage of data analytics algorithms for processing data captured in production lines, so as to provide guidelines and detect features that can be used in PdM. As such, the combination of LSTM-Autoencoders, as a preliminary preprocessing step, and a Transformer encoder is a promising solution for addressing the above-mentioned challenges.
Additionally, the aim of this work is to propose a novel approach for fault detection
and RUL prediction. Autoencoders with Long Short-Term Memory (LSTM) networks and a
Transformer encoder are used to assess the operational condition of production equipment
and detect anomalies that are then mapped to different RUL values. A combination of
two LSTM-Autoencoder networks is proposed for classifying the current machine’s health
condition based on different corresponding labels and then one Transformer encoder is
used for RUL estimation. The main novelty of this approach is that a separate neural
network is trained for each label, leading to better results for each case. Consequently, this
method can be adjusted to several types of machines and labels. The proposed approach
has been evaluated in a steel industry case based on historical maintenance record datasets.
Finally, the development of the proposed method and its implementation in a software prototype have shown that it can provide information regarding the machine's health without requiring any specialization or additional skills from the industry operators.
This work is structured in six sections. After the Introduction, which presents the scope, challenges, and background of the present work, the Literature Review section follows, including key points from the literature that evaluate the performance of different data analytics algorithms and present how maintenance in manufacturing processes is tackled. After the Literature Review, this work continues with the Methods, Implementation and Case study sections, where the methodology, the actions, and the means needed to perform predictive maintenance in the actual industrial case are described. Having created the models and extracted the features, the Case study section includes a Discussion subsection which discusses the models' outputs and their interpretations as well as the competitive advantages. Finally, in the Conclusions section the outputs of the involved developments are summarized.
2. Literature Review
The condition monitoring of equipment, ensuring good functionality over the years, has become a necessity for industries [6]. Some of the key reasons are repair downtime and the increasing cost of equipment failures, due to the advanced technology embedded in each machine and robot, as well as machine idling during repair operations, leading to lower productivity, delayed deliveries, and, consequently, dissatisfied customers [12,13]. Condition monitoring also assists the transition from the traditional, reactive, and preventive type of maintenance to the modern PdM [14–16]. PdM relies on AI technologies to analyze significant amounts of data as close to real time as possible, detecting potential equipment failures [17–19]. Data-driven approaches are effective for PdM as ML (machine learning) models can be trained on labelled data from process failures without requiring an in-depth understanding of the underlying process [20,21]. This allows industries and machine manufacturers to leverage the vast amounts of data generated by industrial equipment, IoT devices, and edge devices to predict upcoming failures and schedule maintenance activities before they occur, extending the lifetime of the component [22–24]. Moreover, this kind of data-
put and output, becoming more and more ubiquitous in deep learning [56,57]. The Transformer architecture (Figure 1) was introduced in the 2017 paper “Attention is All You Need” [58] and has since been used in many state-of-the-art models for NLP (natural language processing) tasks such as language translation, sentiment analysis, and text classification. The main idea behind transformers is the use of self-attention mechanisms, which allow the model to focus on different parts of the input sequence and learn the relationships between them, making them well-suited for processing sequential data. Transformers eliminate the need to train neural networks with large, labelled datasets that are costly and time-consuming to produce by finding patterns between elements mathematically [59–62].
Figure 1. The Transformer model architecture.
In contrast to previous approaches, the use of the attention mechanism provided by these architectures allows us to take into consideration a plethora of characteristics involved in different forms of data [63,64]. Transformers have also been used for time-series data analysis and forecasting as they are capable of capturing long-term dependencies in the time-series data [65]. The use of Transformers for that kind of data analysis has shown promising results and is an area of active research and development.
Consequently, this paper proposes and examines a supervised deep learning method, combining a set of Autoencoders with Long Short-Term Memory (LSTM) networks and a Transformer encoder, for fault detection, health condition estimation, and RUL prediction of a machine. First, the set of LSTM-Autoencoder networks classify the general current health of the machine into distinct labels, and then, only if the LSTM-Autoencoders indicate that the machine’s health is bad, one Transformer encoder is used to classify the machine’s status into specific classes corresponding to different RUL values.
3. Method
Currently, AI provides a plethora of tools, methods, and models for the prediction of possible equipment malfunctions. Therefore, engineers have to face the challenge of carefully selecting the most appropriate ML model. In the presented case study, alternative ML models could be implemented, e.g., GRU, which requires fewer parameters and, by extension, fewer computational resources, at the cost of losing long-term dependencies built up in the dataframes. The two LSTM-Autoencoders have been used as a preliminary preprocessing step in the approach in order to filter out any irrelevant information and decide whether the data require further analysis from the Transformer encoder. Then, the Transformer encoder further processes and analyzes the data, mapping them into different RUL classes.
3.1. LSTM-Autoencoders
In order to train any set of LSTM-Autoencoders, sensor data are required, derived from a production machine. After the training, the set of separate LSTM-Autoencoders can classify new sensor data, never seen before, into different operational machine statuses. In particular, a variety of different sensors placed on the machine measure multiple features of the equipment and its environment. Preprocessing of the data is mandatory, as data coming from industry can be inconsistent, noisy, or even incomplete, leading to poor model performance. Apart from that, identifying the appropriate set of features associated with potential failures is a challenging task. So, in order to model the degradation process of any machine and determine the critical values, plotting the dataframe values is proposed. After the visualization of the data, and in combination with the knowledge and maintenance records of the factory specialists, related studies, and scientific dissertations about the machine, the key features can be selected.
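As an illustration of this plotting step, the following minimal sketch, assuming a hypothetical dataframe with one column per sensor feature and a timestamp index, visualizes each candidate feature over time so that degradation trends can be inspected alongside the maintenance records; file and column names are illustrative.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Hypothetical dataframe: one column per sensor feature, indexed by timestamp.
df = pd.read_csv("sensor_data.csv", parse_dates=["timestamp"], index_col="timestamp")

# Plot every candidate feature in its own subplot to inspect degradation trends.
axes = df.plot(subplots=True, figsize=(10, 2 * len(df.columns)), legend=True)
for ax in axes:
    ax.set_xlabel("time")
plt.tight_layout()
plt.show()
```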
LSTM-Autoencoders are used for the classification of the health condition of a machine into one or more categories, as explained hereafter. The architecture of each LSTM-
Autoencoder depends on the problem and the categories to be identified. The proposed
approach requires, at a minimum, two categories to determine the health condition of the
equipment: one category to represent the equipment’s good health condition, typically
after maintenance or part replacement, and the other category to represent bad health con-
ditions, such as due to degradation or failure that requires maintenance from an operational
perspective. Additional categories, beyond the two mentioned, could be included based
on specific needs and requirements. However, this specific study uses the minimum of
two categories, namely “good health” and “bad health”, to classify the health status of the
equipment. In order to classify these categories, an LSTM-Autoencoder is trained for each
label, with different datasets, so the number of LSTM-Autoencoders equals the number
of labels.
In order to define these different datasets and train the individual LSTM-Autoencoders,
historical maintenance records are used in order to label the data based on their timestamp
and the number and type of the different statuses selected. Finally, a data split is performed to define the train and test data for each LSTM-Autoencoder; 80% of the initial dataset is used for the neural network training and validation, and the remaining 20% for testing the neural network [66].
Figure 2 illustrates a high-level LSTM-Autoencoder architecture. As presented in the
following Equation (1), the input of each LSTM-Autoencoder is a time-series sequence,
Ai , containing the values αij of each sensor, denoting one of the variables measured at a
specific time, with n being the number of features.
$A_i = \left[ \alpha_{i1}, \alpha_{i2}, \alpha_{i3}, \ldots, \alpha_{ij} \right]$, where $\alpha_{ij} \in \mathbb{R}$, with $i, j \in \mathbb{Z}$ and $i \leq n$ (1)
Consequently, this time-series sequence is the input of each LSTM cell of the encoder,
along with the hidden output from the previous LSTM cell. Finally, the output of the
encoder is a compressed representation of the input sequence, the learned representation
vector, which includes all the hidden states from all the previous encoder LSTM cells. This
output is then fed into the decoder to reconstruct the original input sequence, processing
these encoded features through a series of LSTM decoder cells. As presented in Equation (2),
the output of the decoder layer is a reconstruction of the initial input time-series sequence
A′ i , containing the reconstructed values α′ ij of each sensor.
$A'_i = \left[ \alpha'_{i1}, \alpha'_{i2}, \alpha'_{i3}, \ldots, \alpha'_{ij} \right]$, where $\alpha'_{ij} \in \mathbb{R}$, with $i, j \in \mathbb{Z}$ and $i \leq n$ (2)
After the LSTM-Autoencoder training, the model is evaluated by feeding the test data, defined earlier, as input to the model, and then, the reconstructed values are compared with the input values. The metric used to evaluate the model is the Mean Squared Error (MSE) as presented in Equation (3).

$MSE = \frac{1}{n} \sum_{i=1}^{n} \left( A'_i - A_i \right)^2$ (3)
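For illustration, a minimal sketch of this evaluation step, assuming a trained Keras autoencoder and an array of test windows; the per-window reconstruction error follows Equation (3).

```python
import numpy as np

def reconstruction_mse(model, x):
    """Mean squared reconstruction error per input window (Equation (3))."""
    x_hat = model.predict(x)                            # reconstructed sequences A'
    return np.mean(np.square(x_hat - x), axis=(1, 2))   # average over time steps and features

# Example (hypothetical names): errors = reconstruction_mse(lstm_autoencoder, X_test)
```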
Following the training phase, new data, that the LSTM-Autoencoders have never seen before, are provided as input to the networks, and each of them produces different reconstructed values for the same input, as depicted in Figure 3.

Figure 2. High-level LSTM-Autoencoder architecture.

Figure 3. LSTM-Autoencoder architecture set.

The integration of outputs from the two separate LSTM-Autoencoders is achieved through a decision rule, based on their reconstruction losses, compared to the input. The LSTM-Autoencoder with the lower reconstruction loss indicates better recognition of the input dataset, and consequently, the input sequence is classified into the same category state as the one used to train this specific LSTM-Autoencoder.
In this approach, LSTM-Autoencoders serve as a preprocessing step. If the LSTM-Autoencoders classify the health status of the equipment as a “good state”, further analysis from the Transformer encoder is unnecessary. Otherwise, in case that the LSTM-Autoencoders classify the health status of the equipment as a “bad state”, the same input data are used as input to a Transformer encoder in order to identify its remaining useful life (Figure 4).

Figure 4. LSTM-Autoencoders and Transformer encoder integration.
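A minimal sketch of how this integration could be wired together, assuming two trained Keras LSTM-Autoencoders (good_ae, bad_ae, one per label) and a trained Transformer classifier (transformer_clf); all names are illustrative and do not correspond to the authors' actual implementation.

```python
import numpy as np

def classify_health(window, good_ae, bad_ae):
    """Assign the window to the autoencoder with the lower reconstruction loss."""
    loss_good = np.mean(np.square(good_ae.predict(window) - window))
    loss_bad = np.mean(np.square(bad_ae.predict(window) - window))
    return "good state" if loss_good < loss_bad else "bad state"

def estimate_rul(window, good_ae, bad_ae, transformer_clf,
                 rul_per_class=("3-4 days", "2-3 days", "1 day")):
    """Only windows flagged as 'bad state' are forwarded to the Transformer encoder."""
    if classify_health(window, good_ae, bad_ae) == "good state":
        return "good state", None
    probs = transformer_clf.predict(window)              # SoftMax probabilities over RUL classes
    return "bad state", rul_per_class[int(np.argmax(probs))]
```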
3.2. Transformer Encoder
The Transformer encoder is used for the identification of the current machine’s health condition and mapping it to remaining useful life (RUL) by processing and extracting meaningful information from the input data and making predictions.
In the proposed approach, three (3) classes are used for the classification, representing different health states of the machine. The data that belong to Class 0 represent the health state of machines with an RUL of 3–4 days. The data that belong to Class 1 represent the health state of machines with an RUL of 2–3 days. Finally, the data that belong to Class 2 represent the health state of machines with an RUL of 1 day.
In order to label the data into the three (3) different classes, historical maintenance records are taken into consideration based on their timestamp. Finally, a data split is performed to define the train and test data; 80% of the initial dataset is used for the neural network training and validation, and the remaining 20% for the neural network testing.
Figure 5 illustrates the Transformer encoder’s Multi-Head Attention architecture. The
input of the Transformer encoder is a window from time-series data that are processed
independently and contain the values of each sensor. After the Q, K, and V matrixes are
generated for each head independently, the next step is the matrix multiplications between
the Queries matrix and the transposed Keys matrix, determining the relationships or the
similarity of the Query and the Key values (the scores). These scores are then scaled down
by being divided by the square root of the Query and Key dimension in order to avoid any
exploding effect. SoftMax is then applied to the scaled score matrixes in order to obtain
the attention weights. Finally, the attention weights of the multiple heads are multiplied
with the value matrixes in order to produce one matrix for each head that contains the
information of a value corresponding to the whole input. So, as the Transformer model
has multiple heads (# of heads = h), the output is h matrixes. Finally, all separate h outputs
from each Attention Head are concatenated and then multiplied with the Wo matrix in
order to output a matrix with the same shape as the input. The output of the Multi-Head
Attention is then added to the original input (Figure 6) and passes through a normalization layer, making the model more robust and stable during training.
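The following NumPy sketch illustrates the computation described above for a single input window; the projection matrices are random placeholders rather than trained weights, and the head and dimension sizes are assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multi_head_attention(x, num_heads, d_k, rng=np.random.default_rng(0)):
    """Scaled dot-product attention over a window x of shape (seq_len, d_model)."""
    seq_len, d_model = x.shape
    heads = []
    for _ in range(num_heads):
        Wq, Wk, Wv = (rng.normal(size=(d_model, d_k)) for _ in range(3))
        Q, K, V = x @ Wq, x @ Wk, x @ Wv
        scores = Q @ K.T / np.sqrt(d_k)          # similarity of Queries and Keys, scaled
        weights = softmax(scores, axis=-1)       # attention weights
        heads.append(weights @ V)                # one output matrix per head
    Wo = rng.normal(size=(num_heads * d_k, d_model))
    out = np.concatenate(heads, axis=-1) @ Wo    # back to the input shape
    return x + out                               # residual connection before normalization

# Example: attended = multi_head_attention(np.random.rand(30, 4), num_heads=4, d_k=8)
```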
Figure 5. Transformer encoder Multi-Head Attention.
Figure 8. LSTM-Autoencoder and Transformer model implementation.
At first, the sensor data were imported to the implemented system as JSON files, processed to remove missing values, and finally converted to a dataframe format using the Pandas library. In the final dataframe, each column represented the values of a single sensor, a feature, sorted in chronological order based on their timestamp. The selection of features, used to determine the level of degradation of the machine, was based mainly on human knowledge of the equipment and process and our bibliographic research. Finally, in order to increase the model performance, at a second level, two labels were used for the LSTM-Autoencoder network, identifying the good and bad operating condition of the monitored equipment, and then three labels were used for the Transformer network, identifying the RUL of the monitored equipment through classification.
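A minimal sketch of this preprocessing step, under the assumption that each JSON file holds a list of records with a timestamp and one value per sensor; the file path and column names are illustrative.

```python
import glob
import pandas as pd

# Load all JSON files exported by the data-collection system (illustrative path).
frames = [pd.read_json(path) for path in sorted(glob.glob("sensor_data/*.json"))]
df = pd.concat(frames, ignore_index=True)

# Drop records with missing values and sort chronologically, one column per sensor feature.
df = df.dropna()
df["timestamp"] = pd.to_datetime(df["timestamp"])
df = df.sort_values("timestamp").set_index("timestamp")
```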
In order to implement the LSTM-Autoencoders, the Keras library was used. Keras is a popular Python library that is widely used for developing and evaluating deep learning models, as an open-source software library that provides a user-friendly interface for designing and training neural networks. In the aforementioned proposed approach, the training dataset was segmented based on historical maintenance records and then two separate LSTM-Autoencoders were trained using data corresponding to each of the two equipment states, namely good and bad. After training the two separate LSTM-Autoencoders, newly arrived data were fed into each of the two separate LSTM-Autoencoders, which are connected in parallel, in order to classify them into one of the two supported labels, “bad state” or “good state”.
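A minimal Keras sketch of one such LSTM-Autoencoder (one instance is trained per label); the window length, number of features, and layer sizes are assumptions, not the exact architecture reported in Table 2.

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_lstm_autoencoder(timesteps, n_features, latent_dim=64):
    """Encoder compresses the window; decoder reconstructs the original sequence."""
    inputs = keras.Input(shape=(timesteps, n_features))
    encoded = layers.LSTM(latent_dim)(inputs)                       # learned representation vector
    repeated = layers.RepeatVector(timesteps)(encoded)              # feed it to every decoder step
    decoded = layers.LSTM(latent_dim, return_sequences=True)(repeated)
    outputs = layers.TimeDistributed(layers.Dense(n_features))(decoded)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")                     # MSE reconstruction loss
    return model

# One autoencoder per label, e.g. (illustrative):
# good_ae = build_lstm_autoencoder(timesteps=30, n_features=4)
# bad_ae  = build_lstm_autoencoder(timesteps=30, n_features=4)
```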
Then, in order to implement the Transformer model, the Keras library was also used. In case the LSTM-Autoencoder result is that the machine is in a bad state, the Transformer model will take the same input in order to further process the data and make a classification of the RUL of the machine.
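A minimal Keras sketch of a Transformer encoder block along the lines described in Section 3.2; the number of heads, key dimension, and feed-forward width are illustrative assumptions rather than the values used in the actual prototype.

```python
from tensorflow import keras
from tensorflow.keras import layers

def transformer_encoder_block(inputs, num_heads=4, key_dim=16, ff_dim=64, dropout=0.1):
    """Multi-Head Attention with residual connection and normalization, then a feed-forward part."""
    attn = layers.MultiHeadAttention(num_heads=num_heads, key_dim=key_dim)(inputs, inputs)
    x = layers.LayerNormalization()(inputs + attn)        # add & normalize
    ff = layers.Dense(ff_dim, activation="relu")(x)
    ff = layers.Dropout(dropout)(ff)
    ff = layers.Dense(inputs.shape[-1])(ff)
    return layers.LayerNormalization()(x + ff)

# Example: x = transformer_encoder_block(keras.Input(shape=(30, 4)))
```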
Finally, during the experimentation stage, the accuracy of the system’s results was
cross-validated using the actual maintenance records provided by the use-case owner, as
described in the following section.
5. Case Study
5.1. Hot Rolling Mill
The aforementioned approach was implemented into a software prototype that was
trained and tested in a real-world steel production industry case. The data used in this
study were derived from a hot rolling mill machine that is used for producing metal
bars. Figure 9 illustrates a high-level diagram of the rolling mill machine components and
their connectivity. Sensor values were initially stored in a local database on the motion controller and then transferred to a Programmable Logic Controller (PLC) database, and finally, in a historical database. Real-time data were transmitted from the PLC database to the PC for RUL prediction via communication channels. Additionally, as the developed framework was implemented on an industrial intranet, and there was no external communications/exchange of data outside the factory, no mechanisms for data privacy and security were incorporated.

Figure 9. Hot rolling mill machine diagram.

The rolling cylinders of the hot rolling mill have different geometrically coated segments attached to them, which are used to form the metal bars by applying force. The rolling mill consists of three top and three bottom segments, each with a wear-resistant coating. Regarding the preventive maintenance activities that take place for this machine, the coated segments are scheduled to be replaced approximately every sixteen (16) days or sooner in case of any unexpected damage, and the replacement of the coated segments by the maintenance personnel typically lasts about two hours. The goal and objective of this study is to enable the turn from preventive maintenance into predictive maintenance by anticipating the behaviour of the segments through RUL prediction with the use of neural networks.

Table 2 presents the architecture of each LSTM-Autoencoder, which includes the layers of the network created, the number of parameters (weights and biases) of each layer, and the total parameters of the model, as also described previously. In machine learning and neural networks, the number of parameters in a neural network can have an impact on the processing complexity of the model [70]. In this approach, the number of trainable parameters in each network was 249,860, which resulted in the good performance of the model.

Table 3. Historical maintenance records.

#    Mounted    Unmounted    RUL        Remark
1    day 1      day 12       12 days    Large piece broken out of surface
2    day 1      day 15       15 days    Large piece broken out of surface
3    day 1      day 16       16 days    Preventive maintenance
4    day 1      day 15       15 days    Large piece broken out of surface

As mentioned before, the coated segments are scheduled to be replaced approximately every sixteen (16) days or sooner in case of any unexpected damage and failure. So, as illustrated in Figure 12, we can assume that in the first two days that the coating was mounted, the sensor data corresponded to a machine’s good state, and vice versa: the last two days before the coating was unmounted, the sensor data corresponded to a machine’s bad state (Table 4).

Figure 12. Data selection for training LSTM-Autoencoders.
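A minimal sketch of how the labelling rule illustrated in Figure 12 could be applied, assuming a dataframe indexed by timestamp and mounting/unmounting dates taken from the maintenance records; the function name, dates, and the two-day margin follow the description above, but the code itself is illustrative.

```python
import pandas as pd

def label_good_bad(df, mounted, unmounted, margin_days=2):
    """Label samples near mounting as 'good' and samples just before unmounting as 'bad'."""
    good = df.loc[mounted : mounted + pd.Timedelta(days=margin_days)]
    bad = df.loc[unmounted - pd.Timedelta(days=margin_days) : unmounted]
    return good, bad

# Example (illustrative dates taken from a maintenance record):
# good_df, bad_df = label_good_bad(df, pd.Timestamp("2019-03-01"), pd.Timestamp("2019-03-16"))
```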
Each dataset consisted of approximately 200,000 values. The datasets were then split
into training and test data, with 80% of the first part of the dataset used for training and
the remaining 20% used for testing. Both the training and test data were normalized to a
range from 0 to 1 to facilitate faster and better training of the neural networks.
Table 5 presents the training loss results after performing multiple experiments in
order to identify the ideal number of epochs, the window size, and the batch size in this
use case. Epoch refers to the number of times the entire training dataset is passed through
the neural network during the training process. In each epoch, the neural network goes
through all the training examples in the dataset. The batch size refers to the number of
samples that are processed at each training iteration, and the weights of the neural network
are updated after processing each batch.
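A minimal sketch of this training setup, reusing the build_lstm_autoencoder sketch shown earlier; the synthetic arrays, the [0, 1] scaling, and the epoch/batch-size values are placeholders for the real windowed data and the values selected in Table 5.

```python
import numpy as np

# Synthetic stand-ins for the real windowed sensor data (shape: samples, timesteps, features).
X_train = np.random.rand(1000, 30, 4)
X_test = np.random.rand(250, 30, 4)

# Scale both sets to the range [0, 1] using the training minima/maxima only.
x_min, x_max = X_train.min(axis=(0, 1)), X_train.max(axis=(0, 1))
X_train_s = (X_train - x_min) / (x_max - x_min)
X_test_s = (X_test - x_min) / (x_max - x_min)

# An autoencoder learns to reproduce its input, so input and target are the same array.
model = build_lstm_autoencoder(timesteps=30, n_features=4)   # builder sketched earlier
history = model.fit(
    X_train_s, X_train_s,
    epochs=50,        # passes over the whole training set (placeholder value)
    batch_size=64,    # samples per weight update (placeholder value)
    validation_split=0.2,
)
test_loss = model.evaluate(X_test_s, X_test_s)
```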
After the training of the LSTM-Autoencoders, new datasets that the two separate LSTM-Autoencoders had never seen before were then input. Each dataset was the input for both LSTM-Autoencoders and each of them produced different reconstructed values for the same input. The LSTM-Autoencoder whose reconstructed values present the smaller reconstruction error with respect to the input has most probably recognized the input better. As a result, the input dataset belongs to the same category state as the dataset that the LSTM-Autoencoder was trained with. In Table 6, the first column refers to the actual states of the monitored
equipment on specific days according to the historical maintenance records of the hot
rolling mill, while the last two columns present the loss generated by each one of the two
LSTM-Autoencoders for the corresponding days.
the output of the Dense layer passes through a Dropout layer. Finally, the output of the
Dropout layer is passed through a Dense layer with units = # of classes applying linear
transformation followed by the SoftMax activation function. This function outputs the
probabilities of the # of classes.
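A minimal Keras sketch of the classification head described above, attached to the encoder block sketched in the Implementation section; the pooling step, layer sizes, dropout rate, and window shape are assumptions.

```python
from tensorflow import keras
from tensorflow.keras import layers

n_classes = 3                                   # Class 0, Class 1, Class 2 (RUL classes)
inputs = keras.Input(shape=(30, 4))             # window of 30 time steps, 4 features (assumed)
x = transformer_encoder_block(inputs)           # encoder block from the earlier sketch
x = layers.GlobalAveragePooling1D()(x)
x = layers.Dense(64, activation="relu")(x)
x = layers.Dropout(0.2)(x)
outputs = layers.Dense(n_classes, activation="softmax")(x)   # class probabilities
transformer_clf = keras.Model(inputs, outputs)
transformer_clf.compile(optimizer="adam",
                        loss="sparse_categorical_crossentropy",
                        metrics=["accuracy"])
```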
Figure 15. Data selection for training the Transformer encoder.
The input dataset consisted of approximately 300,000 values. The datasets were then split into training and test data, with 80% of the first part of the dataset used for training and the remaining 20% used for testing. Both the training and test data were normalized to a range from 0 to 1 to facilitate faster and better training of the neural networks.
Table 7 presents the best accuracy rate after performing multiple experiments in order to identify the ideal window size and batch size in this use case.
to identify the ideal window size and batch size in this use case.
Following the completion of the model training phase, a series of digital experiments
were conducted. For these experiments, new datasets were used, derived from the splitting
of the initial dataframe. These experiments share the same methodology, yet with different
datasets as input to the Transformer model. The output of each experiment is a set of
classification metric values and confusion matrices over the different classes. Finally, the
results from the experiments were cross-validated using the actual maintenance records
provided by the use-case owner for the evaluation of the system’s performance. Each class
corresponds to a different health state of the machine (Table 8).
Classes RUL
Class 0 3–4 days
Class 1 2–3 days
Class 2 1 day
Tables 9–11 present the classification metric values in order to evaluate the performance
of the Transformer model. The metrics used for the evaluation are Precision, Recall, F1
Score and Accuracy and are calculated for each class in each input dataset. The input
datasets used for the experiments were labelled as Class 0, Class 1, and Class 2 based
on the segment’s exchange records. Confusion matrixes are used in order to provide a
representation of the Transformer model’s actual class labels and the predictions for each
class (Figures 16–18). Each row of the confusion matrix represents the number of data
values that belong in the real class, and each column represents the number of data values
in the predicted class.
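A minimal sketch of how these per-class metrics and confusion matrices could be produced with scikit-learn, assuming arrays of true and predicted class indices from one experiment; the arrays shown here are placeholders, not the experimental data.

```python
import numpy as np
from sklearn.metrics import classification_report, confusion_matrix

# Placeholder labels; in the experiments y_true comes from the segment exchange records
# and y_pred from the argmax of the Transformer model's SoftMax output.
y_true = np.array([0, 0, 1, 1, 2, 2, 2])
y_pred = np.array([0, 1, 1, 1, 2, 2, 0])

print(classification_report(y_true, y_pred, digits=2))  # precision, recall, F1 per class
print(confusion_matrix(y_true, y_pred))                 # rows: real class, columns: predicted class
```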
Table 9. Transformer results: Experiment 1—maintenance because of break down.

Class      Precision (%)   Recall (%)   F1 Score (%)   Confidence (%)   Support
Class 0    94%             70%          80%            70%              3600
Class 1    98%             97%          98%            97%              3600
Class 2    74%             94%          83%            94%              3580
Accuracy (%)                                                            87%

Table 11. Transformer results: Experiment 3—maintenance because of break down.

Class      Precision (%)   Recall (%)   F1 Score (%)   Confidence (%)   Support
Class 0    60%             16%          25%            16%              3600
Class 1    56%             65%          60%            64%              3600
Class 2    56%             88%          68%            88%              3580
Accuracy (%)                                                            56%
Table 10. Transformer results: Experiment 2—maintenance because of break down.
Figure 18. Confusion matrix: Experiment 3—maintenance because of break down.

The input datasets used for the following three experiments were labelled as Class 2 despite the fact that these data were taken the day before the preventive maintenance activities based on the segment’s exchange records. As the segment exchange took place preventively and not because of a segment break down, it indicates that the machine may have had a few more days of expected life. Consequently, it is interesting to observe the Transformer model’s predictions for these cases (Tables 12–14).
Table 12. Transformer results: Experiment 4—preventive maintenance.

Class      Precision (%)   Recall (%)   F1 Score (%)   Confidence (%)   Support
Class 0    0               0            0              0                0
Class 1    0               0            0              0                0
Class 2    100%            19%          32%            19%              3580
Accuracy (%)                                                            19%

Confusion matrixes show that despite the fact that these data were taken the day before the preventive maintenance activities and should belong in Class 2, the machine may have had a few more days of expected life. According to Figures 19 and 20, the Transformer model predicted that these data belong to Class 0 and have about 3–4 more days of life, while, according to Figure 21, the Transformer model predicted that these data belong to Class 1 and have about 2–3 more days of life.

Figure 19. Confusion matrix: Experiment 4—preventive maintenance.
Figure 21. Confusion matrix: Experiment 6—preventive maintenance.
5.8. Discussion
In order to evaluate the performance of the proposed approach, four months of machine operation data were used, and the datasets for training and testing were created based on the historical maintenance records from the hot rolling mill machine.
For the LSTM-Autoencoder (Table 6), the difference between the losses of the two LSTM-Autoencoders was enough in order to categorize and label the input data and identify the health status of the hot rolling mill machine.
The results from the experiments were cross-validated using the actual maintenance records provided by the use-case owner for the evaluation of the system’s performance. According to the data presented in Tables 9–11, the Transformer model can predict the equipment’s health state, predict the remaining useful life, and prevent any failure or break down with high confidence. Additionally, the network results in Tables 12–14 show that the equipment was still in a healthy state at the time of preventive maintenance activities. Consequently, in a period of one (1) year, as preventive maintenance activities take place every sixteen (16) days, the equipment could gain (on average) approximately fifty-seven (57) more days of life and a 17.39% reduction in preventive stoppages.
As indicated in the LSTM-Autoencoder Training and Testing paragraph, the developed framework can predict the equipment’s health status and the corresponding RUL values with a high confidence rate. However, the fact that the confidence level remains less than 100% indicates that the developed framework is a complementary tool and provides good estimates for the technician/engineer, and that human intervention is still required in order to ensure seamless operation of the production line. Concretely, the developed framework can be used as a smart suggestion system which monitors the status of the equipment and
6. Conclusions
In conclusion, this study proposes a new approach for fault detection by evaluating
the condition of production assets and predicting their remaining useful life (RUL). In
order to integrate this solution, Autoencoders with Long Short-Term Memory (LSTM)
networks were combined with a Transformer encoder to evaluate the functional status of a
hot rolling mill machine in manufacturing, identify any anomalies, and map them to RUL
values. Initially, a combination of two LSTM-Autoencoder networks was trained for the
classification of the current machine’s health condition to the two different corresponding
labels of the machine, good state and bad state. Then, a Transformer encoder was trained
in order to estimate and predict the remaining useful life of this machine. The proposed
method was evaluated on a hot rolling mill machine.
The novelty of the proposed approach is that in the first phase, a separate LSTM-
Autoencoder is trained for each label, leading to better results, and making it easily ad-
justable to many labels following the exact same logic and procedure. The two LSTM-
Autoencoders were used as a preliminary preprocessing step in the approach in order to
filter out any irrelevant information and decide if the data required further analysis from
the Transformer encoder. Then, the Transformer encoder further processes and analyzes
the data, mapping them into different RUL classes. So, using LSTM-Autoencoders as a
preliminary preprocessing step allows a balance between computational efficiency and
model performance. Furthermore, considering the architectural characteristics of the Trans-
formers, key elements such as non-sequential processing and self-attention mechanisms
enable such models to process large datasets in real time and provide faster responses in
comparison to other similar models.
Real-world data from a hot rolling mill machine were used both for training and
testing of the neural networks, and the obtained results were satisfactory as presented
in this study. However, during the development of the presented method, several chal-
lenges emerged. One of the key limitations was the extensive data preprocessing required.
Concretely, a manual labelling process was mandatory, which was carried out by combin-
ing the dataframe with labels derived from historical maintenance records. Another key
limitation was the increased complexity of the data, which was addressed by iteratively
fine-tuning the hyperparameters of the model. By extension, additional experiments need to be conducted using a more extensive dataset of higher data quality over a longer time period.
The results from all the different experiments show that the proposed approach
is promising and can help to improve maintenance planning, reducing redundant and
preventive stoppages in the production line, preventing any serious failure of the machine
before it happens, and leading to a decrease in the cost of maintenance operations. Finally,
the proposed method can provide information regarding the machine’s health without
requiring any specialization and additional skills from the industry operators.
However, one limitation of the proposed approach arises when dealing with data of
higher resolution with multiple labels, requiring multiple neural networks to identify the
machine’s status. Such cases can be computationally complex, and neural networks may
not be able to accurately recognize neighbouring states. Also, another limitation of this
approach is the requirement for maintenance records used to label the datasets, such as
component break downs and failures. These kinds of data are limited in the industry as
preventive maintenance activities are planned in order to avoid this kind of critical failure
of the equipment.
A next step for this approach is performance optimization by choosing different sets
of hyperparameters for each network, conducting experiments, and comparing the results.
Also, the robustness of the model to anomalies and noise data will be evaluated. The same
approach could also be tested with more than four features and high-dimensional data, or a completely different set of features for training. This expansion will allow the model to
find and uncover more hidden patterns, relationships, correlations, and other insights that
may remain undiscovered within the constraints of the current implementation.
Future work will also focus on evaluating the proposed concept against other ma-
chine learning methods combining different neural networks for each step, using different
datasets from different real-world scenarios. In terms of implementation, and in order to
minimize the framework’s response time (i.e., real-time), a better network infrastructure
needs to be implemented in order to reduce network latency and system response. Further-
more, regarding the neural network operation, the utilization of high-power GPUs could
further reduce prediction time. Finally, in an attempt to improve the impact of the proposed
method, future work will involve the comparison of the developed model versus other
statistical models, e.g., the exponential degradation model. In addition, different architectures
for varying conditions will also be investigated and compared against the current approach.
Author Contributions: Conceptualization, K.A., N.N. and X.B.; methodology, K.A. and X.B.; soft-
ware: X.B.; validation, X.B.; formal analysis, X.B.; investigation, X.B.; resources, K.A.; data curation,
X.B.; writing—original draft preparation, X.B.; writing—review and editing, X.B., N.N. and K.A.;
visualization, X.B.; supervision, K.A. and N.N.; project administration, N.N.; funding acquisition,
K.A. All authors have read and agreed to the published version of the manuscript.
Funding: This research has been partially funded by the European project “SERENA—VerSatilE
plug-and-play platform enabling REmote predictive mainteNAnce” (Grant Agreement: 767561).
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data presented in this study are available on request from the
corresponding author due to privacy restrictions.
Conflicts of Interest: The authors declare no conflicts of interest.
References
1. Chryssolouris, G.; Alexopoulos, K.; Arkouli, Z. A Perspective on Artificial Intelligence in Manufacturing; Springer Nature:
Berlin/Heidelberg, Germany, 2023; Volume 436.
2. Rahman, M.S.; Ghosh, T.; Aurna, N.F.; Kaiser, M.S.; Anannya, M.; Hosen, A.S. Machine Learning and internet of things in industry
4.0: A review. Meas. Sens. 2023, 28, 100822. [CrossRef]
3. Vaidya, S.; Ambad, P.; Bhosle, S. Industry 4.0—A glimpse. Procedia Manuf. 2018, 20, 233–238. [CrossRef]
4. Grabowska, S. Smart factories in the age of Industry 4.0. Manag. Syst. Prod. Eng. 2020, 28, 90–96. [CrossRef]
5. Sestino, A.; Prete, M.I.; Piper, L.; Guido, G. Internet of Things and Big Data as enablers for business digitalization strategies.
Technovation 2020, 98, 102173. [CrossRef]
6. Liu, Z.; Mei, W.; Zeng, X.; Yang, C.; Zhou, X. Remaining useful life estimation of insulated gate bipolar transistors (IGBTs) based
on a novel volterra K-nearest neighbor optimally pruned extreme learning machine (VKOPP) model using degradation data.
Sensors 2017, 17, 2524. [CrossRef]
7. Le Xuan, Q.; Adhisantoso, Y.G.; Munderloh, M.; Ostermann, J. Uncertainty-aware remaining useful life prediction for predictive
maintenance using deep learning. Procedia CIRP 2023, 118, 116–121. [CrossRef]
8. Lee, J.; Mitici, M. Deep reinforcement learning for predictive aircraft maintenance using probabilistic remaining-useful-life
prognostics. Reliab. Eng. Syst. Saf. 2023, 230, 108908. [CrossRef]
9. de Pater, I.; Mitici, M. Predictive maintenance for multi-component systems of repairables with Remaining-Useful-Life prognostics
and a limited stock of spare components. Reliab. Eng. Syst. Saf. 2021, 214, 107761. [CrossRef]
10. Guo, L.; Li, N.; Jia, F.; Lei, Y.; Lin, J. A recurrent neural network based health indicator for remaining useful life prediction of
bearings. Neurocomputing 2017, 240, 98–109. [CrossRef]
11. Chen, C.; Shi, J.; Lu, N.; Zhu, Z.H.; Jiang, B. Data-driven predictive maintenance strategy considering the uncertainty in remaining
useful life prediction. Neurocomputing 2022, 494, 79–88. [CrossRef]
12. Stavropoulos, P.; Papacharalampopoulos, A.; Vasiliadis, E.; Chryssolouris, G. Tool wear predictability estimation in milling based
on multi-sensorial data. Int. J. Adv. Manuf. Technol. 2016, 82, 509–521. [CrossRef]
13. Zhang, C.; Yao, X.; Zhang, J.; Jin, H. Tool condition monitoring and remaining useful life prognostic based on a wireless sensor in
dry milling operations. Sensors 2016, 16, 795. [CrossRef] [PubMed]
Sensors 2024, 24, 3215 23 of 25
14. Aivaliotis, P.; Georgoulias, K.; Chryssolouris, G. The use of Digital Twin for predictive maintenance in manufacturing. Int. J.
Comput. Integr. Manuf. 2019, 32, 1067–1080. [CrossRef]
15. Dhiman, H.S.; Deb, D.; Muyeen, S.M.; Kamwa, I. Wind turbine gearbox anomaly detection based on adaptive threshold and twin
support vector machines. IEEE Trans. Energy Convers. 2021, 36, 3462–3469. [CrossRef]
16. Dhiman, H.S.; Bhanushali, D.; Su, C.-L.; Berghout, T.; Amirat, Y.; Benbouzid, M. Enhancing Wind Turbine Reliability through
Proactive High Speed Bearing Prognosis Based on Adaptive Threshold and Gated Recurrent Unit Networks. In Proceedings
of the IECON 2023-49th Annual Conference of the IEEE Industrial Electronics Society, Singapore, 16–19 October 2023; IEEE:
New York, NY, USA, 2023; pp. 1–6.
17. Gao, R.; Wang, L.; Teti, R.; Dornfeld, D.; Kumara, S.; Mori, M.; Helu, M. Cloud-enabled prognosis for manufacturing. CIRP Ann.
2015, 64, 749–772. [CrossRef]
18. Oo, M.C.M.; Thein, T. An efficient predictive analytics system for high dimensional big data. J. King Saud Univ.-Comput. Inf. Sci.
2022, 34, 1521–1532. [CrossRef]
19. Suh, J.H.; Kumara, S.R.; Mysore, S.P. Machinery fault diagnosis and prognosis: Application of advanced signal processing
techniques. CIRP Ann. 1999, 48, 317–320. [CrossRef]
20. Cerquitelli, T.; Nikolakis, N.; O’Mahony, N.; Macii, E.; Ippolito, M.; Makris, S. Predictive Maintenance in Smart Factories; Springer:
Singapore, 2021.
21. Huang, C.G.; Huang, H.Z.; Li, Y.F. A bidirectional LSTM prognostics method under multiple operational conditions. IEEE Trans.
Ind. Electron. 2019, 66, 8792–8802. [CrossRef]
22. Liu, C.; Yao, R.; Zhang, L.; Liao, Y. Attention based Echo state Network: A novel approach for fault prognosis. In Proceedings of
the 2019 11th International Conference on Machine Learning and Computing, Zhuhai, China, 22–24 February 2019; pp. 489–493.
23. Jaenal, A.; Ruiz-Sarmiento, J.-R.; Gonzalez-Jimenez, J. MachNet, a general Deep Learning architecture for Predictive Maintenance
within the industry 4.0 paradigm. Eng. Appl. Artif. Intell. 2024, 127, 107365. [CrossRef]
24. Alabadi, M.; Habbal, A.; Guizani, M. An Innovative Decentralized and Distributed Deep Learning Framework for Predictive
Maintenance in the Industrial Internet of Things. IEEE Internet Things J. 2024. [CrossRef]
25. Farahani, S.; Khade, V.; Basu, S.; Pilla, S. A data-driven predictive maintenance framework for injection molding process. J. Manuf.
Process. 2022, 80, 887–897. [CrossRef]
26. Yousuf, M.; Alsuwian, T.; Amin, A.A.; Fareed, S.; Hamza, M. IoT-based health monitoring and fault detection of industrial AC
induction motor for efficient predictive maintenance. Meas. Control 2024. [CrossRef]
27. D’Urso, D.; Chiacchio, F.; Cavalieri, S.; Gambadoro, S.; Khodayee, S.M. Predictive maintenance of standalone steel industrial
components powered by a dynamic reliability digital twin model with artificial intelligence. Reliab. Eng. Syst. Saf. 2024, 243,
109859. [CrossRef]
28. Sawant, V.; Deshmukh, R.; Awati, C. Machine learning techniques for prediction of capacitance and remaining useful life of
supercapacitors: A comprehensive review. J. Energy Chem. 2022, 77, 438–451. [CrossRef]
29. Zhang, H.; Luo, Y.; Zhang, L.; Wu, Y.; Wang, M.; Shen, Z. Considering three elements of aesthetics: Multi-task self-supervised
feature learning for image style classification. Neurocomputing 2023, 520, 262–273. [CrossRef]
30. Kwak, D.; Choi, S.; Chang, W. Self-attention based deep direct recurrent reinforcement learning with hybrid loss for trading
signal generation. Inf. Sci. 2023, 623, 592–606. [CrossRef]
31. de Carvalho Bertoli, G.; Junior, L.A.P.; Saotome, O.; dos Santos, A.L. Generalizing intrusion detection for heterogeneous networks:
A stacked-unsupervised federated learning approach. Comput. Secur. 2023, 127, 103106. [CrossRef]
32. Mohammed, A.; Kora, R. A comprehensive review on ensemble deep learning: Opportunities and challenges. J. King Saud
Univ.-Comput. Inf. Sci. 2023, 35, 757–774. [CrossRef]
33. Pang, Y.; Zhou, X.; Zhang, J.; Sun, Q.; Zheng, J. Hierarchical electricity time series prediction with cluster analysis and sparse
penalty. Pattern Recognit. 2022, 126, 108555. [CrossRef]
34. Zonta, T.; Da Costa, C.A.; da Rosa Righi, R.; de Lima, M.J.; da Trindade, E.S.; Li, G.P. Predictive maintenance in the Industry 4.0:
A systematic literature review. Comput. Ind. Eng. 2020, 150, 106889. [CrossRef]
35. Huang, S.-Y.; An, W.-J.; Zhang, D.-S.; Zhou, N.-R. Image classification and adversarial robustness analysis based on hybrid
quantum–classical convolutional neural network. Opt. Commun. 2023, 533, 129287. [CrossRef]
36. Li, Y.; Hao, Z.; Lei, H. Survey of convolutional neural network. J. Comput. Appl. 2016, 36, 2508.
37. Li, Z.; Liu, F.; Yang, W.; Peng, S.; Zhou, J. A survey of convolutional neural networks: Analysis, applications, and prospects. IEEE
Trans. Neural Netw. Learn. Syst. 2021, 33, 6999–7019. [CrossRef] [PubMed]
38. Bueno-Barrachina, J.-M.; Ye-Lin, Y.; Nieto-Del-Amor, F.; Fuster-Roig, V. Inception 1D-convolutional neural network for accurate
prediction of electrical insulator leakage current from environmental data during its normal operation using long-term recording.
Eng. Appl. Artif. Intell. 2023, 119, 105799. [CrossRef]
39. Guo, Y.; Zhou, Y.; Zhang, Z. Fault diagnosis of multi-channel data by the CNN with the multilinear principal component analysis.
Measurement 2021, 171, 108513. [CrossRef]
40. Fernandes, M.; Corchado, J.M.; Marreiros, G. Machine learning techniques applied to mechanical fault diagnosis and fault
prognosis in the context of real industrial manufacturing use-cases: A systematic literature review. Appl. Intell. 2022, 52,
14246–14280. [CrossRef] [PubMed]
41. Rout, A.K.; Dash, P.; Dash, R.; Bisoi, R. Forecasting financial time series using a low complexity recurrent neural network and
evolutionary learning approach. J. King Saud Univ.-Comput. Inf. Sci. 2017, 29, 536–552. [CrossRef]
42. Zhang, J.; Wang, P.; Yan, R.; Gao, R.X. Deep learning for improved system remaining life prediction. Procedia CIRP 2018, 72,
1033–1038. [CrossRef]
43. Malhi, A.; Yan, R.; Gao, R.X. Prognosis of defect propagation based on recurrent neural networks. IEEE Trans. Instrum. Meas.
2011, 60, 703–711. [CrossRef]
44. Wang, Y.; Zhao, Y.; Addepalli, S. Remaining useful life prediction using deep learning approaches: A review. Procedia Manuf.
2020, 49, 81–88. [CrossRef]
45. Gao, S.; Huang, Y.; Zhang, S.; Han, J.; Wang, G.; Zhang, M.; Lin, Q. Short-term runoff prediction with GRU and LSTM networks
without requiring time step optimization during sample generation. J. Hydrol. 2020, 589, 125188. [CrossRef]
46. Yan, H.; Qin, Y.; Xiang, S.; Wang, Y.; Chen, H. Long-term gear life prediction based on ordered neurons LSTM neural networks.
Measurement 2020, 165, 108205. [CrossRef]
47. Gers, F.A.; Schmidhuber, J.; Cummins, F. Learning to forget: Continual prediction with LSTM. Neural Comput. 2000, 12, 2451–2471.
[CrossRef] [PubMed]
48. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [CrossRef] [PubMed]
49. Abhaya, A.; Patra, B.K. An efficient method for autoencoder based outlier detection. Expert Syst. Appl. 2023, 213, 118904.
[CrossRef]
50. Zhou, C.; Paffenroth, R.C. Anomaly detection with robust deep autoencoders. In Proceedings of the 23rd ACM
SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017;
pp. 665–674.
51. Liao, W.; Guo, Y.; Chen, X.; Li, P. A unified unsupervised gaussian mixture variational autoencoder for high dimensional outlier
detection. In Proceedings of the 2018 IEEE International Conference on Big Data (Big Data), Seattle, WA, USA, 10–13 December
2018; IEEE: New York, NY, USA, 2018; pp. 1208–1217.
52. Jeon, S.; Kang, J.; Kim, J.; Cha, H. Detecting structural anomalies of quadcopter UAVs based on LSTM autoencoder. Pervasive Mob.
Comput. 2022, 88, 101736. [CrossRef]
53. Dou, T.; Clasie, B.; Depauw, N.; Shen, T.; Brett, R.; Lu, H.-M.; Flanz, J.B.; Jee, K.-W. A deep LSTM autoencoder-based framework
for predictive maintenance of a proton radiotherapy delivery system. Artif. Intell. Med. 2022, 132, 102387. [CrossRef] [PubMed]
54. Bampoula, X.; Siaterlis, G.; Nikolakis, N.; Alexopoulos, K. A deep learning model for predictive maintenance in cyber-physical
production systems using lstm autoencoders. Sensors 2021, 21, 972. [CrossRef] [PubMed]
55. Sagheer, A.; Kotb, M. Unsupervised pre-training of a deep LSTM-based stacked autoencoder for multivariate time series
forecasting problems. Sci. Rep. 2019, 9, 19038. [CrossRef]
56. Mo, Y.; Wu, Q.; Li, X.; Huang, B. Remaining useful life estimation via transformer encoder enhanced by a gated convolutional
unit. J. Intell. Manuf. 2021, 32, 1997–2006. [CrossRef]
57. Hao, J.; Wang, X.; Yang, B.; Wang, L.; Zhang, J.; Tu, Z. Modeling recurrence for transformer. arXiv 2019, arXiv:1904.03092.
58. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need.
arXiv 2017, arXiv:1706.03762.
59. Ntakouris, T. Timeseries Classification with a Transformer Model. Keras, 2021. Available online: https://keras.io/examples/
timeseries/timeseries_classification_transformer/ (accessed on 10 January 2024).
60. Bergen, L.; O’Donnell, T.; Bahdanau, D. Systematic generalization with edge transformers. Adv. Neural Inf. Process. Syst. 2021, 34,
1390–1402.
61. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding.
arXiv 2018, arXiv:1810.04805.
62. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer
learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 5485–5551.
63. Chen, D.; Hong, W.; Zhou, X. Transformer network for remaining useful life prediction of lithium-ion batteries. IEEE Access 2022,
10, 19621–19628. [CrossRef]
64. Huertas-García, Á.; Martín, A.; Huertas-Tato, J.; Camacho, D. Exploring Dimensionality Reduction Techniques in Multilingual
Transformers. Cogn. Comput. 2023, 15, 590–612. [CrossRef] [PubMed]
65. Hu, W.; Zhao, S. Remaining useful life prediction of lithium-ion batteries based on wavelet denoising and transformer neural
network. Front. Energy Res. 2022, 10, 1134. [CrossRef]
66. Joseph, V.R. Optimal ratio for data splitting. Stat. Anal. Data Min. ASA Data Sci. J. 2022, 15, 531–538. [CrossRef]
67. Python Language Reference, Version 3.7. Available online: https://docs.python.org/3.7/reference/ (accessed on 29 January 2021).
68. Al-Taie, M.Z.; Kadry, S.; Lucas, J.P. Online data preprocessing: A case study approach. Int. J. Electr. Comput. Eng. 2019, 9, 2620.
[CrossRef]
69. Spuzic, S.; Strafford, K.N.; Subramanian, C.; Savage, G. Wear of hot rolling mill rolls: An overview. Wear 1994, 176, 261–271.
[CrossRef]
70. Rajapaksha, N.; Rajatheva, N.; Latva-aho, M. Low complexity autoencoder based end-to-end learning of coded communications
systems. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium,
25–28 May 2020; IEEE: New York, NY, USA, 2020; pp. 1–7.
71. Simoulin, A.; Crabbé, B. How many layers and why? An analysis of the model depth in transformers. In Proceedings of the
59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural
Language Processing: Student Research Workshop, Bangkok, Thailand, 1–6 August 2021; pp. 221–228.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.