Article 4
Measurement
journal homepage: www.elsevier.com/locate/measurement
ARTICLE INFO

Keywords:
Machine prognostics
Predictive maintenance
Condition-based maintenance
Machine learning
Prognostics and health management
Remaining useful life

ABSTRACT

In the Engineering discipline, prognostics play an essential role in improving system safety, reliability and enabling predictive maintenance decision-making. Due to the adoption of emerging sensing techniques and big data analytics tools, data-driven prognostic approaches are gaining popularity. This paper aims to deliver an extensive review of recent advances and trends of data-driven machine prognostics, with a focus on their applications in practice. The primary purpose of this review is to categorize existing literature and report the latest research progress and directions to support researchers and practitioners in acquiring a clear comprehension of the subject area. This paper first summarizes fundamental methodologies on data-driven approaches for predictive maintenance. Then, the article further conducts a comprehensive investigation on the different fields of applications of machine prognostics. Finally, a discussion on the challenges, opportunities, and future trends of predictive maintenance is presented to conclude this paper.
* Corresponding author.
E-mail address: btseng@utep.edu (T.-L.B. Tseng).
https://doi.org/10.1016/j.measurement.2021.110276
Received 11 July 2021; Received in revised form 28 September 2021; Accepted 2 October 2021
Available online 8 October 2021
0263-2241/Published by Elsevier Ltd.
Y. Wen et al. Measurement 187 (2022) 110276
critical aspects of predictive maintenance. Diagnostics refers to identifying the presence of operational faults and determining their root cause and effect on the functional equipment. In contrast, prognostics deals with predicting the future state or remaining useful life (RUL) based on the current and historical conditions. Accurately predicting the “life-span” is the key to the success of predictive maintenance. It involves analytical computations on historical or real-time data streamed from applications, sensors, devices, etc. In general, prognostics measures the extent of deviation and degradation of any machine or system from the normal operating behavior to predict the RUL and future performance. However, the task of prognostics is not trivial, as predicting future performance depends on the analysis of failure modes, early signals of wear and aging, and the nature of faults. It also requires sound knowledge of the failure mechanisms, which have a certain amount of physical randomness. Moreover, prognostics identifies the potential system parameters that are likely to cause the degradations, leading to eventual failures, which involves considerable uncertainty and complicates the prediction. Therefore, prognostics is much more challenging than diagnostics and requires effective and efficient predictive models to monitor the machine health conditions.

In general, prognostic models can be classified into two groups, physical-based and data-driven models [9]. The physical-based models capture the failure mechanisms or physical phenomena to build a mathematical representation of the degradation process. This always requires a thorough understanding of the sophisticated degradation mechanisms, making such models infeasible or ineffective in practical applications due to the system complexity or unclear degrading mechanism [10,11]. On the other hand, data-driven prognostics usually deploy data mining techniques to identify the patterns and anomalies within the raw signals/data to detect any changes in system states. Due to the promising applications and data availability, data-driven models have become attractive in recent years. Data-driven models can be further classified into three subcategories: statistical-based models, conventional machine learning based models, and deep learning based models. In the first subcategory, the general path and stochastic process models are usually designed to track the trajectory of the degradation in a probabilistic manner. Conventional machine learning approaches, including random forest (RF), artificial neural network (ANN), support vector machine (SVM), etc., are commonly designed to extract features for machine RUL prediction. As data increases in dimensionality and volume, deep learning with automatic feature learning demonstrates outstanding performance in reliability estimation using degradation data.

Several efforts have been made to review the topic of degradation modeling and machine prognostics in the recent decade. Si et al. provided a systematic review on data-driven models for RUL prediction. They classified the data-driven models into two broad types according to whether or not the models rely on directly observed state information [12]. Ye and Xie [13] classified existing degradation models into general path models, stochastic process models, and others, with a focus on stochastic models. Zhang et al. [14] provided a review on degradation model-based RUL estimation approaches with an emphasis on the heterogeneity in the systems. All of the above works only focused on statistical-based models. A proliferation of data-driven algorithms to help with prognostics has been observed in the last few years. For example, Wang et al. [15] provided an in-depth review on health indicator construction for vibration-based bearing and gear. Khan and Yairi [16] presented a systematic review of artificial intelligence based system health management and recent trends of deep learning in the reliability field. Recently, [17,18] surveyed contemporary advancements of deep learning and its applications to machine health monitoring. All of them only survey AI techniques for system health management. In another recent work, Kordestani et al. [19] and Guo et al. [20] reviewed and summarized the emerging prognostic modeling methods, which can be classified into data-driven, physics-based/model-based, and hybrid approaches. Baur et al. [21] presented a review on diagnostics and prognostics approaches from knowledge-based, model-based, statistical-based, and data-driven concepts. This work focuses only on applications to machine tools, considering the feed axis, spindle speed, and hydraulic system. Table 1 highlights the contribution of some review works in the last decade.

Table 1
Recent contribution of the review works in the field of PHM.

Reference | Year | Description
[12] | 2013 | The review only focused on statistical data-driven approaches
[13] | 2014 | Reviewed based on three categories: stochastic process models (SPMs), general path models (GPMs), and other models beyond SPMs and GPMs
[22] | 2014 | Summarized the PHM systems for rotating machinery and provides a systematic design methodology
[14] | 2015 | Focused on the degradation modeling and RUL estimation for heterogeneity in the systems
[15] | 2017 | Reviewed the health indicators construction for vibration-based bearing and gears using mechanical signals
[16] | 2018 | Provided an overview of architectures and theories of artificial intelligence-based prognostics approaches with plausible advantages and limitations
[17,18] | 2019 | Surveyed the deep learning based prognostics approaches and their applications
[19,20] | 2019 | Reviewed the recent advancement in the field of prognostics and summarized them into three categories, i.e., the data-driven, physics-based, and hybrid prognostics
[21] | 2020 | Provides a review on diagnostics and prognostics approaches focusing on the application of machine tools considering the feed axis, spindle speed, and hydraulic system

Unlike the above-mentioned review works, this paper aims to provide an extensive and broad overview of the most recent advances and trends of data-driven machine prognostics for predictive maintenance, focusing on their applications in different industrial fields. This article provides a detailed discussion of recent advances in each of the categories of data-driven approaches over the last five years. The primary motivation of this survey is to categorize the existing literature and summarize the latest research progress and directions to assist researchers and practitioners in acquiring a clear comprehension of the subject area. The paper also discusses applied research issues when applying current technology and suggests some potentially promising directions for predictive maintenance.

The remainder of this paper is organized as follows. Section 2 provides a discussion on typical data-driven predictive models and analysis methods. The applications for each category of data-driven approaches are summarized in Section 3. Section 4 presents the current challenges and opportunities of machine prognostics for predictive maintenance. Section 5 concludes this work.

2. Data-driven prognostic algorithms

Prognostics algorithms focus on predicting when a system or a component stops performing its intended functions. In other words, the prognostic algorithms predict the future performance or the RUL of a system or component by analyzing the extent of deviation and degradation from its expected normal operating conditions. In general, the health state of an item degrades linearly with its usage or operating cycle. However, the task of prognostics is not trivial due to the variation of operation conditions, environment, and complex nature of different parameters. Prognosis requires intensive degradation data of an item, such as lifetime data or run-to-failure histories. To make accurate prognostics, choosing a proper modeling technique is essential. There are mainly three modeling strategies for predictive maintenance based on degradation data: (1) regression models, (2) classification models, (3) survival models. A regression model seeks to model the trajectory of a degradation path and then predict when the system will fail. A classification model tries to predict whether the failure occurs within a given time window. The basic idea of survival models is to answer how the risk of failure changes over time. To implement these strategies, data-
driven models can be classified into three categories: statistical-based models, conventional machine learning based models, and deep learning based models. In this section, we report a systematic overview of these three categories. The structure of this section is summarized in Fig. 1.

2.1. Statistical based models

Typically, a statistical based model for the RUL estimation is constructed via fitting a probabilistic model to data without relying on any physics or engineering principle. Two broad categories of statistical based models are general path models (GPMs) and stochastic process models (SPMs). In the following subsections, a brief review of each type of these models is provided.

2.1.1. General path model

The basic idea of a GPM, which was first introduced by Lu and Meeker in 1993 [23], is to find an appropriate parametric regression model to capture the degradation trend over time. The general path model allows the direct use of degradation data and captures the unit-wise fluctuation in degradation data. For any given time t, the degradation path of unit i is defined as

y_i(t) = \eta(t, \varphi, \theta_i) + \varepsilon_i    (1)

where φ is a vector of fixed (population) effects for all units, θ_i is a vector of random (individual) effects for the ith unit, and ε_i ~ N(0, σ²) is the normally distributed measurement error. This model relies on three basic assumptions. Firstly, the degradation data should be captured using any appropriate failure models and mapped to a function of η(·). Secondly, the historical data should be collected under similar situations considering a reasonable variation of each individual component. Finally, there exists some defined critical level of degradation, termed a soft failure, which indicates component failure. Due to its simplicity and ease of implementation, GPM has been well-studied, and various extensions have been developed based on the basic form in Equation (1) [12,24,25]. First, the functional form for η(·) could be linear, quadratic, exponential, etc. [26]. If the degradation signals are complex and show nonlinear shapes or two or multiple phases, more general nonparametric regression forms can be assumed [27,28]. Second, a variety of distributions for the parameters in η(·) can be considered, such as Weibull, normal, lognormal, etc. Third, error items that capture the produce and environment noises can be assumed independently and identically distributed or correlated among different time points. To predict the RUL for working units at an individual level, the parameters in the model need to be estimated at the offline stage and updated at the online stage when new observations are available. For the offline parameter estimation, the empirical two-stage method [29], maximum likelihood estimation (MLE) and the expectation–maximization (EM) algorithm provide reliable estimates. For the online parameter updating, the Bayesian framework is the most natural way, where posterior distributions of the model parameters are generated based on newly collected data.

2.1.2. Stochastic process model

There are four SPMs in the literature which are commonly used for RUL prediction, namely, Wiener process, Gamma process, Gaussian process and inverse Gaussian process models. A brief description of these four models is given in the following paragraphs.

2.1.2.1. Wiener process. In the stochastic process based model, for any time t and Δt > 0, the increments ΔX(t) = X(t + Δt) − X(t) of the degradation signal X(t) in disjoint time intervals are independent. For a Wiener process, ΔX(t) is normally distributed. If we use a Wiener process to describe a degradation trend, the basic form can be written as

X(t) = \lambda t + \sigma B(t)    (2)

where λ is a drift parameter reflecting the degradation rate, σ is a diffusion coefficient, and B(t) represents a standard Brownian motion. Based on the property of the Wiener process, the increment is then normally distributed, ΔX(t) ~ N(λ[Λ(t + Δt) − Λ(t)], σ²[Λ(t + Δt) − Λ(t)]), where Λ(t) is a non-decreasing time-scale function; for the basic form in Equation (2), Λ(t) = t and the increment reduces to ΔX(t) ~ N(λΔt, σ²Δt). As a degradation model, the Wiener process has some unique advantages. A dominant advantage is that the distribution of the failure time can be formulated analytically by the first passage time (FPT), whose probability density function (PDF) follows an inverse Gaussian distribution, namely

f_{IG}(y; a, b) = \left( \frac{b}{2\pi y^{3}} \right)^{1/2} \exp\left[ -\frac{b (y - a)^{2}}{2 a^{2} y} \right], \quad y > 0    (3)

where a is the mean and b is the shape parameter. Due to its mathematical properties and physical interpretations, the Wiener process can be easily extended to satisfy different demands. One alternative is to add an error term into the basic process to capture measurement errors in degradation signals [30]; the second way is to incorporate a random-effects model to deal with unobserved heterogeneities, specifically, to assume that λ or σ or both follow certain parametric distributions, see examples [31–33] among others. The third approach is to incorporate nonlinear structure into this model to make the model more general. In particular, the more generalized model is defined as

X(t) = \lambda \Lambda(t; \theta) + \sigma B(\tau(t; \gamma))    (4)

where Λ(t; θ) and τ(t; γ) are non-decreasing functions with parameter vectors θ and γ [34]. Wiener processes have attracted significant attention in modeling several degradation trends encountered in real systems, such as bridge beams [31], fatigue crack dynamics [35], light-emitting diodes [36], thrust ball bearings [37], and micro electro mechanical systems (MEMS) [38]. Zhang [39] provided a comprehensive review of Wiener process methods with application to RUL prediction.

2.1.2.2. Gamma process. One of the distinct features of the Wiener process is that it is a non-monotone stochastic process. However, it might not be suitable in many degradation applications that show apparent monotone trends. The Gamma process is an alternative in this regard. If the increment ΔX(t) follows a Gamma distribution, the process X(t) is called the Gamma process. The Gamma process has proved to be an efficient tool in the stochastic modeling of monotonic and gradual degradation in a sequence of small increments, such as fatigue, wear, consumption, creep, corrosion, erosion, swell, crack growth, and so forth [40]. However, Gamma process models have the following shortcomings. First, Gamma process models are constrained by the assumption of the Markov property. Second, Gamma process models are only effective in describing monotonic degradation processes [8]. A survey of the application of Gamma processes in degradation modeling can be found in [41].

2.1.2.3. Gaussian process. The Gaussian process is another emerging approach in the field of prognostics. A Gaussian process is defined mathematically as

f(x) \sim \mathcal{GP}(m(x), k(x, x'))    (5)

where m(x) and k(x, x') are the mean and covariance functions respectively, denoted by

m(x) = E[f(x)]
k(x, x') = E[(f(x) - m(x)) (f(x') - m(x'))^{T}]    (6)

Gaussian process regression is a way to undertake non-parametric regression with Gaussian processes. The idea is that Gaussian process regression uses conditioning on Gaussian vectors to find a model that actually passes through the data points. Unlike classical regression models, Gaussian process regression does not force an analytical formula for the predictor, but a covariance structure for the outcomes. To accurately reflect the correlations present in the data, the covariance functions need to be specified, and the hyperparameter values of the covariance function need to be optimized. Due to the probabilistic nature of the Gaussian process models, the classic model optimization approach, where model parameters are optimized through the minimization of a cost function such as mean square error, is not readily applicable. A probabilistic approach to the optimization of the model, such as the maximum likelihood method, is more appropriate. Some examples of Gaussian process regression applied to RUL prognostics can be found in [42–45].

2.1.2.4. Inverse Gaussian process. The inverse Gaussian (IG) process is another natural choice for degradation data, which provides a monotone degradation path. If the increment ΔX(t) follows an IG distribution, the process X(t) is called an inverse Gaussian process. The pdf of an IG distribution is defined as

f(x; a, b) = \sqrt{\frac{b}{2\pi}} \, x^{-3/2} \exp\left( -\frac{b (x - a)^{2}}{2 a^{2} x} \right), \quad x > 0    (7)

Let Y(t) denote the IG degradation process and T = inf{t : Y(t) ≥ D} the failure time, i.e., the first time the degradation reaches the threshold D. The failure time distribution is obtained by

P(T < t) = P(Y(t) > D) = \Phi\left[ \sqrt{\frac{\eta}{D}} \left( \Lambda(t) - D \right) \right] - e^{2 \eta \Lambda(t)} \Phi\left[ -\sqrt{\frac{\eta}{D}} \left( \Lambda(t) + D \right) \right]    (8)

where Φ is the standard normal cumulative distribution function (CDF), and Λ(t) and ηΛ(t)² are the mean and shape parameters of Y(t) in the parameterization of Equation (7). Ye et al. [46] first justified its physical meaning by exploring the inherent relations between the IG process and the compound Poisson process.

To summarize, SPMs are more favorable than GPMs to account for the randomness in degradation processes caused by both inherent and environmental factors when a significant fluctuation exists in the data. However, compared to GPMs, SPMs are often complex and require more in-depth statistical and computational ability for the model parameter estimation.

2.1.3. Markovian-based model

Given a set of states S = {s_1, s_2, ⋯, s_r}, the Markov process is a process that starts in one of these states and moves successively from one state to another. Although the Markov process still belongs to stochastic processes, this model is distinguished from the above stochastic models. The Markov process assumes a finite number of degradation states, and the task is to find the transition probabilities among those states. The main property of the Markov process is being memoryless, which states that the future degradation state only relies on the current degradation state. The RUL can be estimated with Markovian-based models by computing the amount of time that the process will take to transit from the current state to the absorbing state for the first time [12]. In real-world applications, however, the transition probabilities may also be related to other variables, e.g., the level of degradation, the time when the product reached the current state, etc. Semi-Markovian models extend the application of Markovian-based models by incorporating the effects of these factors. In practice, the actual degradation level is not accessible due to the complexity of the degradation process or the random nature of the equipment. Hidden Markov Models (HMM) and Hidden Semi-Markov models (HSMM) [47] can be used to solve this issue. In HMM, the state of the hidden process can be inferred from the observation sequences, each state is described by a probability density distribution, and each observation vector is generated by the state of the corresponding probability density distribution. HSMM, as a generalization of HMM, can reflect gradual changes because of the semi-Markovian assumption. Markovian-based models, which are known for exact and approximate learning and inference, have a strong statistical foundation and have been well studied. Due to their Markovian nature, they do not take into account the sequence of states leading into any given state.

2.1.4. Filtering-based model

The Kalman filtering model and particle filters (PFs) are the most popular filtering-based degradation models. In the Kalman filtering model, the unobserved degradation x_t and the observed degradation signal y_t have the relationship x_t = αx_{t−1} + ϵ_t and y_t = βx_t + η_t, where ϵ_t and η_t are Gaussian noises, and α and β are the parameters of the state-space model. Unlike Markovian-based models that only depend on the last degradation signal, the Kalman filtering model takes advantage of all historical data. However, the Kalman filtering model is constrained by the linear assumption and the Gaussian noise assumption. To overcome this drawback, PFs are particularly useful for linear/nonlinear Gaussian/non-Gaussian state-space models. The process of a PF can be expressed
by the state transition function f and the measurement function h:

x_k = f(x_{k-1}, \theta_k, \nu_k)
z_k = h(x_k, \omega_k)    (9)

where k is the time step, x_k is the unobserved degradation, θ_k is a vector of model parameters, z_k is the observed degradation signal, and ν_k and ω_k are process and measurement noise, respectively. The posterior p(x_k | z_{1:k}) can be updated recursively using Bayesian inference based on up-to-date observations. Once the degradation model is updated, the future degradation magnitudes and RUL can be predicted based on the updated model. Compared with the Kalman filter, the PF is more flexible as it does not assume linearity or Gaussian noise in the data. Both filters start with a state-space representation of the stochastic processes of interest. They are robust and scale well in many applications but at the price of high computational cost.

2.1.5. Covariate based model

The risk factors that cause the degradation process are called covariates. One of the most popular covariate-based models is the Cox Proportional Hazard (PH) model, which was proposed by Cox in 1972 [48]. The Cox PH model allows the survival time/RUL to be described as a function of multiple prognostic factors. The basic format of the Cox PH model is defined as

h(t; z) = h_0(t) \exp(\beta z)    (10)

where h_0(t) is the baseline hazard rate function, which can be either nonparametric or parametric, so the model is often called a semi-parametric approach. z is a vector of the corresponding covariates/prognostic factors associated with the system. β is the unknown parameter of the model, called the regression coefficient, which defines the effects of the covariates. With the hazard function in Equation (10), the pdf of the failure time can be defined as

f(t) = h(t)^{\delta} S(t) = h(t)^{\delta} \exp\left( -\int_0^t h(u) \, du \right)    (11)

where S(t) = \exp\left( -\int_0^t h(u) \, du \right) is the survival function. δ denotes an indicator function taking value 1 if the system has failed at t, or value 0 if it is censored. Note that censored means that the equipment does not have a failure event. The Cox PH model is frequently used in medical statistics and has been extended to the manufacturing field in recent years. From Equation (10) we can see that if z is extended to z(t), the degradation signals can be easily incorporated into the equation by treating degradation data as a time-varying covariate. This is beneficial in reliability analysis for hard failure systems, where each piece of equipment runs to failure, so that different units have different degradation levels. The functional form for z(t) can be a general path model [49], a Wiener process [50,51], or a multivariate Gaussian convolution process [52]. The Cox PH model, a semi-parametric approach, is more robust than other parametric approaches as it is not vulnerable to misspecification of the baseline hazard. But the proportional hazard assumption may limit its application to accounting for complex relationships among covariates. Recently, deep learning based Cox models have been implemented to relax the proportional hazard assumption [53,54].

2.2. Conventional machine learning based models

Though machine learning has been around for several decades, it has seen a revival in recent years due to the dominance of data stemming from the information explosion. In the following subsections, the various machine learning algorithms are reviewed and discussed from the predictive maintenance perspective.

2.2.1. Support vector machine

Support Vector Machine (SVM) was initially established as a methodology for the binary classification problem and was then applied to solve the regression problem. When it is applied to a regression problem it is termed Support Vector Regression (SVR). The key purpose of SVR is to get a functional relationship between input and output under the hypothesis that the joint distribution of the input and output is unknown. SVR uses an ε-insensitive penalty function: no penalty is incurred when the predicted value lies within a given distance of the real value. This penalty-free region is called the insensitive tube [55]. The support vectors are then fitted to regression models and applied to predict the degradation level and calculate the corresponding RUL values [56]. Benkedjouh et al. [57] used SVR and the isometric feature mapping reduction technique to predict the RUL for rotating machines. Hu et al. [58] built an RUL prediction method based on fuzzy C-mean clustering and wavelet SVM. Shen et al. [59] designed an SVR based on a generic multi-class solver to recognize the different fault patterns of rotating machinery. Liu et al. [60] proposed an improved probabilistic SVM regression technique to predict the condition of Nuclear Power Plant elements. A comprehensive review on SVM-based estimation of RUL can be found in reference [61]. SVM is very effective in high dimensional spaces and works well in cases where the number of dimensions is greater than the number of samples. More importantly, SVM is relatively memory efficient, which is a great advantage for online modeling. However, when the noise level is high, the performance may decrease significantly.

2.2.2. Decision tree

The Decision Tree (DT) is a non-parametric supervised technique based on a tree-like model for regression and classification. The key purpose of DT is to predict the value of an objective variable by establishing a hierarchical structure composed of nodes extracted from a training dataset. A DT generally consists of one root, several branches, and many internal nodes. Every path runs from the root node to a leaf node through the internal nodes. This path denotes a classification with the different conditions of the components or systems. Every leaf node represents a response for regression or a class label for classification. To extend the power of DT, some variants have been developed, such as gradient boosting decision tree (GBDT) and random forest (RF). The RF is an ensemble approach based on DT, which consists of numerous trees. Unlike classical methods that build a single tree on a whole dataset, RF randomly chooses the features and instances to build multiple trees. Each DT then votes for a particular target class, and the class having the majority of votes is the model's prediction with a certain probability. In contrast to a traditional DT, RF demonstrates good predictive performance with considerable noise made by random selection of instances and features. Furthermore, it can deal with large datasets having numerous features with diverse data types, e.g., continuous or categorical values. Kundu et al. [62] presented an RF regression methodology for RUL prediction for spur gears depending on pitting failure mode. GBDT is an iteratively accumulative decision tree method. The algorithm accumulates the results of multiple decision trees as the final prediction output by creating a group of weak learners. Wang et al. [63] developed a GBDT model to estimate the RUL by choosing fault features and measuring fault severity subjected to relative entropy distance in fault prediction of electronic circuits. DT requires less effort for data preparation during pre-processing, and it is very intuitive and easy to explain. But we need to be careful that a slight change in the data can cause a significant change in the structure of the decision tree, which makes the model unstable. The calculation of the tree can also become far more complex compared to other algorithms.

2.2.3. Back propagation neural network

Back Propagation Neural Network (BPNN) is a supervised-learning method implemented by iterative optimization to solve the classification or regression problem. Usually, it takes a vector as the input, and its output is a label representing the information of corresponding classes or a function value. It first calculates the model results through the
forward propagation step and then tunes the network’s weights through the back propagation step. The two steps can be executed iteratively until the error between the model results and the labels falls below a desired threshold. The ultimate target of the BPNN is to obtain the network parameters that represent the relation between the input and output by minimizing a corresponding loss function. BPNN usually uses the squared error sum (SES) of the network as the objective function and applies the gradient descent technique to find the objective function’s minimum value. The BPNN, just like other NNs, is flexible and powerful in finding the nonlinear mapping between inputs and outputs, and it does not require prior knowledge about the network. However, back propagation is notorious for easily getting stuck in “local minima”.

2.3. Deep learning based model

In recent years, deep learning approaches have shown excellent performance in various applications, including feature extraction, defect detection, segmentation, medical imaging, additive manufacturing, and many more [64–70]. Recognizing this promising ability, researchers have experimented with various deep architectures to develop solution approaches for remaining useful life prediction. In the following subsection, we discuss the architecture of deep Convolutional Neural Networks (CNNs) and their variants from the predictive maintenance point of view.

2.3.1. Convolutional neural network
Due to the ability to generalize the local and global features, CNNs have become the most popular deep learning methods. CNNs are exceptionally successful in extracting features from input data and using them to make a trustworthy prediction. A basic CNN structure mainly has an input layer, convolution layer, pooling layer, and fully connected layer as shown in Fig. 2.
The input data can be either two-dimensional, such as a time–frequency spectrum, or one-dimensional, such as time series data. The convolution layer uses a set of weights and convolves them with the input at each layer to form the layer-wise features, which are called a feature map. The output of the convolutional layer is calculated as $Y_n = f(X * W_n + b_n)$, where $*$ represents the convolution operator, $f$ is the activation function, $n$ indexes the convolution filters, $W_n$ is the weight matrix of the $n$-th filter, and $b_n$ is the bias of that filter kernel. Following the convolution, the model parameters are reduced by subsampling, which is called the pooling process. After the pooling layer, multiple fully connected layers are used to convert the matrix to a row or a column. Finally, a classification or regression layer is added to produce the predictions or results. To predict the RUL, a CNN can be used to extract useful and robust features from data. The number of processing units, or the CNN structure, greatly depends on the nature of the problem and the datasets. To expand the power of CNN, several variants of CNN-based models have been introduced in the literature, as reported in Table 2. The key benefit of using CNNs is to extract complex, non-linear, non-

Table 2
Variants of CNNs and their use for RUL prediction.

Variant: Deep CNN
Distinctions:
• It consists of different processing units at multiple layers (usually 5 to 10 layers, or even more)
• Effective in capturing the salient patterns in the signals
References:
• Babu et al. [72] combined a regressor with a deep CNN architecture to estimate the RUL from multivariate time series data.
• Ren et al. [73] fused a smoothing method with a CNN for predicting the bearing RUL.

Variant: Deep Multi CNN
Distinctions:
• Multiple CNN architectures are stacked together
• The output of the previous CNN becomes the input of the other CNNs
• Effective in dealing with raw signals, instead of depending on a feature extractor
References:
• Yang et al. [74] proposed a double-CNN model architecture to predict RUL using original vibration signals without resorting to any feature extractor.

Variant: Deep Multi-scale CNN (MSCNN)
Distinctions:
• The MSCNN framework has three sequential stages: transformation, local convolution, and full convolution.
• The transformation stage applies transformations on the input time series.
• The local convolution stage extracts the features for each branch.
• The full convolution stage concatenates all extracted features and applies several more convolutional layers to generate the final output.
• Effective in keeping multiple levels of abstraction for the prediction
References:
• Kiranyaz et al. [75] utilized the MSCNN for fault detection and identification for a circuit monitoring system.
• Zhu et al. [76] used the wavelet transform to propose a Time Frequency Representation (TFR), then applied this TFR to MSCNN to perform RUL estimation.
• Li et al. [77] used MSCNN for RUL prediction; the model has three multi-scale blocks, where three different sizes of convolution operations are applied in parallel on each block.

Variant: Hybrid CNN (HCNN)
Distinctions:
• This CNN architecture is mainly the combination of the above-mentioned CNNs along with additional supporting layers.
• Incorporates the advantages of different methodologies by their integration to improve the prediction performance.
References:
• Wen et al. [85] proposed a new residual CNN (ResCNN) by adding a skip connection between convolution blocks.
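As a minimal sketch of the convolution and pooling steps described in Section 2.3.1, the following computes the feature map $Y_n = f(X * W_n + b_n)$ for a single filter on a one-dimensional input and then applies max pooling. The choice of ReLU as the activation $f$, the "valid" sliding window without kernel flipping (cross-correlation, the usual deep learning convention), the function names, and the toy signal are all assumptions made for illustration; they are not taken from the reviewed papers.

```python
from typing import Callable, List


def relu(value: float) -> float:
    """Rectified linear unit, used here as the activation f (an assumption)."""
    return max(0.0, value)


def conv1d(x: List[float], w: List[float], b: float,
           f: Callable[[float], float] = relu) -> List[float]:
    """Feature map Y_n = f(X * W_n + b_n) for one filter W_n with bias b_n.

    Uses a 'valid' sliding window without flipping the kernel
    (cross-correlation), and no padding.
    """
    k = len(w)
    return [
        f(sum(x[i + j] * w[j] for j in range(k)) + b)
        for i in range(len(x) - k + 1)
    ]


def max_pool1d(y: List[float], size: int) -> List[float]:
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(y[i:i + size]) for i in range(0, len(y) - size + 1, size)]


signal = [1.0, 2.0, 3.0, 4.0, 5.0]            # hypothetical sensor readings
feature_map = conv1d(signal, w=[1.0, 1.0], b=0.0)
pooled = max_pool1d(feature_map, size=2)
```

In a full CNN, several such filters run in parallel, and the pooled feature maps are flattened and passed to the fully connected and regression layers.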
handcrafted features because of the superior feature extraction and object recognition performance of CNNs. Based on our observations, only a few research works focus on pure CNNs for RUL prediction, as listed in Table 2. The reason is that CNNs may not sufficiently model the temporal characteristics of time series data. Moreover, CNN training is significantly slower because of the convolution operations and requires a lot of data to be effective. Also, tuning the learning rate of CNN methods for real-world applications is difficult [71].

2.3.2. Recurrent neural network
The underlying motivation behind the Recurrent Neural Network (RNN) is to mine the sequential information in a given dataset. It creates memory cells that capture the past and predict the future sequence based on the previous computation. A typical RNN is shown in Fig. 3. As shown in Fig. 3, the structure of the RNN constitutes a deep network with one layer per time step that shares the parameters across the layers. Parameter sharing is a useful way to capture the relationship between one input item and its neighboring context. This makes RNNs more successful than traditional NNs and CNNs on sequential data. The network can be trained by backpropagation across the time steps. However, the training process is especially challenging due to the problem of vanishing or exploding gradients. To overcome this issue, the long short-term memory network (LSTM) was constructed [78]. LSTMs are a special kind of RNN for remembering information over long periods (long-term dependencies) and are explicitly designed to avoid this problem of the standard RNN. Similar to standard RNNs, LSTMs also possess chain-like structures, but they differ in the structure of the memory cell. Instead of having a single neural network layer, LSTMs have four interacting networks, connected so that information can be removed from or added to the cell state under the regulation of different gates. A gate is a special structure that controls the information passing to the cell state and the output at each repeating module. The Gated Recurrent Unit (GRU) is a modified LSTM cell, which was introduced by Cho et al. [79]. Recently, this architecture has shown promising applications in the field of RUL prediction [80–82]. The GRU combines the input gate and the forget gate into a single update gate. It also merges the cell state and the hidden state. A comparison of the memory cell in the standard RNN, LSTM and GRU is shown in Fig. 4.

Fig. 3. Structure of a typical RNN.

The horizontal line on the top of the repeating module indicates the cell state, which passes through the entire network chain with only some minor linear interactions. The gates are composed of a sigmoid neural net layer and pointwise multiplication and addition operations. The sigmoid layer outputs numbers between zero and one, where zero means no information will pass through and one lets all information through. Recently, many researchers have introduced several variations based on the original RNN and LSTM. Among them, some major variants are the Gated Recurrent Unit (GRU) RNN, Bi-Directional LSTM (BiLSTM), and Bi-Directional Handshaking LSTM (BHLSTM). Table 3 summarizes the major distinctions of these variants and includes some representative publications for each variant as references. Several studies have shown that RNNs and LSTMs perform better than many conventional machine learning approaches, and even better than CNNs, for RUL prediction tasks [92]. However, some researchers have pointed out that the LSTM network may not be robust when processing raw time series data directly, since the sensor data usually contain noise [93].

2.3.3. Autoencoder
An autoencoder is an unsupervised neural network. The main idea of the autoencoder is to train the model to reconstruct the original input at the output layer [94]. The autoencoder network consists of three layers: an input layer, a hidden layer for encoding, and an output layer for decoding, as shown in Fig. 5. The number of hidden neurons is usually smaller than the input size. In this way, the network is forced to learn a compressed representation of the input. Define the input as $x$, an encoder function $g(\cdot)$ parameterized by $\phi$, a decoder function $f(\cdot)$ parameterized by $\theta$, and the output as $x'$; then the reconstructed input is $x' = f_\theta(g_\phi(x))$. The parameters $(\theta, \phi)$ are trained to minimize the reconstruction error so that the output is similar to the original input, i.e., $x \approx x'$. Several variants have been developed on top of this basic model to solve different issues. If the number of network parameters is larger than the number of inputs, the basic autoencoder will face overfitting issues. To avoid overfitting and improve robustness, the denoising autoencoder randomly sets some of the input values to zero, and the loss function compares the output values with the original input rather than with the corrupted input. Another tactic, which limits the number of active hidden nodes, is the sparse autoencoder, where a sparsity constraint is added to limit the activation of its nodes. With the sparsity constraint, some nodes in the hidden layer are active and the other nodes are inactive. This constraint is achieved by adding a penalty term to the loss function. The third variant is the variational autoencoder [88]. The basic idea is that, instead of mapping an input to a fixed vector, the input is mapped to a distribution. When decoding from the hidden layer, samples from each distribution are randomly selected to generate a vector. The variational autoencoder thus provides a probabilistic way of describing the input. In the predictive maintenance field, autoencoders are currently used mainly to reduce the dimension and eliminate the redundancy of the data. Autoencoders are commonly employed as a feature extractor or a tool for health index construction. Some examples can be found in [95–99]. To learn sequential information from input signals, the LSTM autoencoder is a preferable choice. Because an autoencoder learns to capture as much information as possible rather than as much relevant information as possible, it may misrepresent important variables of the input data.

2.3.4. Bayesian deep learning
Bayesian deep learning is an extension of deep learning in a probabilistic manner. The fundamental idea is to adopt Bayesian inference as the learning tool for quantifying the uncertainty of the model by treating the deep learning architectures as probabilistic models. Typically, a Bayesian deep learning model places prior distributions over the network’s weights and then learns the corresponding posterior distributions over the weights. Each forward pass then uses different weights and therefore provides potentially different outputs. As exact Bayesian inference is computationally intractable for most NN structures, sampling strategies are used to learn the parameters of Bayesian deep learning models, which is often computationally expensive. From the literature review, we observed that some popular approximations have been proven effective in alleviating the computational burden, such as variational inference, expectation propagation, Laplace approximation, Hamiltonian methods, bootstrapping, Monte Carlo dropout, etc. Among these, Monte Carlo dropout, which combines approximate Bayesian inference with dropout, has drawn considerable attention in many research fields due to its simplicity, scalability, and computational efficiency. It is well known that data-driven models inevitably face two
Fig. 4. Memory cell structure, (a) standard RNN; (b) LSTM; (c) GRU LSTM.
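To make the GRU memory cell of Fig. 4(c) concrete, the following is a minimal sketch of a single GRU time step, assuming the commonly used gate equations: an update gate z, a reset gate r, and a candidate state blended into the hidden state. The parameter names (`Wz`, `Uz`, `bz`, and so on), the helper `zero_params`, and the toy inputs are hypothetical and chosen only for illustration. Note that references differ on whether z or 1 - z weights the previous state; this sketch uses h = (1 - z) * h_prev + z * candidate.

```python
import math
from typing import Dict, List

Vector = List[float]
Matrix = List[Vector]


def sigmoid(v: float) -> float:
    """Squash a number into (0, 1); 0 blocks information, 1 passes it all."""
    return 1.0 / (1.0 + math.exp(-v))


def affine(W: Matrix, x: Vector, U: Matrix, h: Vector, b: Vector) -> Vector:
    """Compute W x + U h + b, one output row at a time."""
    return [
        sum(wi * xi for wi, xi in zip(w_row, x))
        + sum(ui * hi for ui, hi in zip(u_row, h))
        + bi
        for w_row, u_row, bi in zip(W, U, b)
    ]


def gru_step(x: Vector, h_prev: Vector, p: Dict[str, list]) -> Vector:
    """One GRU time step; returns the new hidden state."""
    # Update gate z and reset gate r, both with sigmoid outputs in (0, 1).
    z = [sigmoid(v) for v in affine(p["Wz"], x, p["Uz"], h_prev, p["bz"])]
    r = [sigmoid(v) for v in affine(p["Wr"], x, p["Ur"], h_prev, p["br"])]
    # The candidate state sees the previous state only through the reset gate.
    gated_h = [ri * hi for ri, hi in zip(r, h_prev)]
    cand = [math.tanh(v)
            for v in affine(p["Wh"], x, p["Uh"], gated_h, p["bh"])]
    # Blend old state and candidate; there is no separate cell state.
    return [(1.0 - zi) * hi + zi * ci for zi, hi, ci in zip(z, h_prev, cand)]


def zero_params(n_in: int, n_hidden: int) -> Dict[str, list]:
    """Hypothetical all-zero parameters, used only for the demonstration."""
    p: Dict[str, list] = {}
    for gate in ("z", "r", "h"):
        p["W" + gate] = [[0.0] * n_in for _ in range(n_hidden)]
        p["U" + gate] = [[0.0] * n_hidden for _ in range(n_hidden)]
        p["b" + gate] = [0.0] * n_hidden
    return p


# With all-zero parameters, z = 0.5 and the candidate is tanh(0) = 0,
# so the new hidden state is half of the previous one.
h_next = gru_step([1.0, -2.0], [0.4, -0.8], zero_params(2, 2))
```

The single update gate plays the role of both the input and forget gates of the LSTM, which is why the GRU has fewer parameters per cell.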
types of uncertainties: aleatoric uncertainty, reflecting the noise pollution in data collection and transmission, and epistemic uncertainty, reflecting the ignorance of model properties. To address these issues, a variety of researchers have been exploring Bayesian deep learning to account for the uncertainties and improve model accuracy. References [100,101] use Bayesian deep learning based methods to quantify the uncertainty of point predictions for bearings and gas turbine engines. Li et al. [102] proposed a Bayesian deep learning based method in which a sequential Bayesian boosting algorithm was executed to improve the prediction accuracy. A Bayesian deep learning model can be treated as an ensemble of multiple models, which may naturally reduce the risk of over-fitting. Another benefit of Bayesian deep learning models is that they allow the uncertainty to be quantified, which is very important for RUL prediction considering the limited data availability and the stochastic nature of degradation processes in the manufacturing field. However, the computational cost of online inference is higher than for other deep learning models, as the model needs to run multiple times to obtain the distribution of outputs.

2.3.5. Transfer learning
Deep learning models excel at learning from a large number of labeled examples and can learn a very accurate mapping from inputs to outputs. However, they lack the capability to generalize to different application scenarios. The reason is that many machine learning algorithms assume that the training and test data are in the same feature space and have the same distribution. This assumption may not hold in many real-world applications. Transfer learning was developed to tackle this issue by storing knowledge learned from one domain (called the source domain) and transferring it to a different but related problem (called the target domain). Mathematically speaking, a domain can be defined as $D = \{\chi, P(X)\}$, where $\chi$ represents a feature space and $P(X)$ is a marginal distribution with $X = \{x_1, x_2, \ldots, x_n\} \in \chi$. A task can be defined as $T = \{y, f(\cdot)\}$, where $y$ is a label space and $f(\cdot)$ is a predictive function. Given a source domain $D_s$ and learning task $T_s$, and a target domain $D_t$ and learning task $T_t$, where $D_s \neq D_t$ or $T_s \neq T_t$, transfer learning aims to help improve the learning of the target predictive function based on $D_s$ and $T_s$. Based on what to transfer, transfer learning can be conducted at several levels: instance-transfer, feature-representation-transfer, parameter-transfer, and relational-knowledge-transfer [103]. Instance-transfer reweights some of the samples from the source domain in an attempt to correct for the distribution difference, then applies them in the target domain for training. Feature-representation-transfer seeks good feature representations that can reduce the difference between the source domain and the target domain. Parameter-transfer discovers shared parameters or prior knowledge between the source domain and the target domain. Parameter-transfer models assume that a well-trained model on the source domain has learned a well-defined structure and that, if two tasks are related, this structure can be transferred to the target model. Relational-knowledge-transfer works by mapping some similar patterns from the inputs to the outputs between both domains. Examples of RUL prediction based on transfer learning can be found in [89,98,104–106]. Transfer learning is especially useful when the source data and target data are in different feature spaces or have different distributions, and when training data for the target problem are limited but data for a related problem are abundant. Transfer learning provides an effective way to perform RUL prediction with limited historical failure data. However, transfer learning only performs better when the source and target problems are similar enough. Otherwise, it will end up with a negative transfer. Currently, it is still challenging to find solutions to negative transfer.

3. Applications in predictive maintenance

In this section, we first provide an overview of the workflow of predictive maintenance and how the prognostic approaches are applied for predictive maintenance. Then we share some statistics and give a comprehensive review of applying RUL predictions to various fields.
Fig. 6 summarizes the workflow of predictive maintenance. First, data are collected intermittently or continuously from an interactive physical system of interest. Various sensors are installed to collect the degradation signals in semi-observable or fully online systems. Some commonly used sensors are pressure sensors, force sensors, speed sensors, temperature sensors, torque sensors, proximity probes, accelerometers, etc. Apart from the sensor data, quantitative data are sometimes collected based on the purpose and application domain. For example, the RUL of many power storage systems depends on the charging and discharging cycles. In such cases, the number of cycles is recorded as the data. Following the data collection and processing, feature extraction plays a critical role in model development and RUL prediction. The straightforward use of raw data is inconvenient due to the high complexity and nonlinearity of sensor signals. Hence, the underlying motivation of feature extraction is to utilize the patterns and trends in the sensor signals to predict RUL. The literature on RUL prediction has focused on many time-domain and frequency-domain features such as root mean square, kurtosis, short-time Fourier transform [107], wavelet transform [108], and empirical mode decomposition [109]. Recently, a number of machine learning based approaches have been utilized to learn the features and map the raw signals to the associated RUL. Several such machine learning algorithms are described in Section 2.2. The advantage of these machine learning algorithms is that they are capable of extracting the useful features and information within the data with very limited human intervention. Section 2.3 presents several deep learning architectures, which are emerging and highly effective techniques for pattern and trend recognition. Their deep networks are capable of obtaining high-level abstractions of data to improve the performance of intelligent prognostics. The best part of these deep learning architectures is that they are capable of feature extraction without human intervention. The extracted features are then used as inputs to the developed RUL prediction model. Once the predicted RUL is obtained, the last step is to decide the optimal time to send out an
alert of failure to help maintenance decision making. If the predicted time is shorter than the actual time, maintenance will be implemented earlier than necessary, and the benefits of more extended usage are lost. If the prediction is too late, the equipment may fail and result in a more significant loss. To determine the optimal maintenance policy, the inventory management associated with the spare parts cost also needs to be considered in practice. Timely acquisition of the inventory number and status of spare parts is challenging. Some researchers have investigated this issue through joint optimization of maintenance and inventory management [110–112]. Another pressing issue in the decision making process is that spare parts inevitably deteriorate over time due to their inner mechanisms and imperfect storage conditions, which shortens the storage lifetime and eventually affects inventory management. How to estimate the storage lifetime for an arbitrary number of spare parts based on the operating and storage degradation processes is attracting researchers’ attention, and interested readers are referred to the recent articles for more details [113,114]. In this paper, we mainly focus on the RUL prediction in predictive maintenance.

Table 3
Variants of RNNs and their use for RUL prediction.

Variant: Recurrent Neural Network (RNN)
Distinctions:
• Can utilize the sequential information
• Able to retain the short-term information
• Able to capture the temporal correlations in sequence data
References:
• Heimes [83] utilized the RNN with Kalman filtering training and an evolutionary algorithm for the prognostics problem.
• Liu et al. [84] proposed an adaptive RNN for dynamic state forecasting to leverage the RUL prediction for Lithium-ion batteries.
• Liang et al. [85] proposed an RNN based health indicator for RUL prediction of bearings.

Variant: Long Short-Term Memory (LSTM)
Distinctions:
• Solves the vanishing or exploding gradient problem
• Able to retain both long and short term information
• Easily detects and captures important features over a long distance
References:
• Zhang et al. [86] employed an LSTM RNN to learn the long-term dependencies among the degraded capacities and construct an RUL predictor.
• Zhao et al. [87] developed an RUL predictor using the LSTM, which can evaluate the trend features.
• Xiang et al. [88] introduced a new type of LSTM with weight amplification for accurate prediction of gear remaining life.

Variant: Gated Recurrent Unit (GRU) RNN/LSTM
Distinctions:
• Can deal with long term relationships in RNN
• Able to capture dependencies at different time scales
• Able to capture the inherent relation for long-term prediction
References:
• Song et al. [80] proposed a battery RUL prediction approach based on the RNN with gated recurrent unit.
• Chen et al. [81] incorporated kernel principal component analysis and GRU RNN to predict RUL.
• Wang et al. [82] presented a hybrid RUL prediction model by adapting a deep heterogeneous GRU model.

Variant: Bi-directional Long Short-Term Memory (BLSTM)
Distinctions:
• Utilizes the information in both forward and backward directions
• Suitable for intermediate prediction
References:
• Zhang et al. [89] proposed a transfer learning based BLSTM network for turbofan engine RUL prediction.
• Wang et al. [90] proposed a data-driven approach with a BLSTM network for RUL prediction, which can make full use of the sensor data sequence in both directions.

Variant: Bi-directional Handshaking Long Short-Term Memory (BHLSTM)
Distinctions:
• Can utilize two sets of LSTM cells in reverse order
• Allows forward and backward unit collaboration in the learning process
References:
• Elsheikh et al. [91] proposed the BHLSTM to predict the RUL of a system, which is capable of processing maximum information for any given subset of a sequence.

Over the last decade, researchers have shown intensive interest in RUL prediction for predictive maintenance and published excellent research papers in this area. However, we only focus on the recent advancements and applications within the time frame of 2015–2020. We reviewed 253 published papers under a broad view of data-driven approaches, including statistical approaches, deep learning, conventional artificial intelligence and hybrid approaches (combinations of statistical and AI-based methods). The papers were collected from Google Scholar using a thorough keyword search of “remaining useful life (RUL)”, “predictive maintenance (PM)”, “prognostic and health management (PHM)”, “health index (HI)” and “machine health”. It is observed that the concepts of predictive maintenance and RUL prediction are widely applied to aircraft engines, bearings, gears, motors, machine tools, wind turbines, batteries, computer hard disks, and many more. Based on our observations, we categorized all of these applications into four broad categories: “aircrafts”, “rotating machinery”, “power systems” and “electronic systems”. We also break down the proportion of published papers in these four categories as shown in Fig. 7. Surprisingly, most of the researchers showed their interest in rotating machinery (48%) and aircraft system (23%) applications. One potential reason for this is the availability of standard datasets in these two fields. We found that the NASA C-MAPSS, PRONOSTIA, and Center of Intelligence at the University of Cincinnati datasets are widely used in this area. However, some researchers are also trying to generate datasets from their own labs and are focusing on power and electronic system components and products.
We also capture the trend (number of published papers) of different data-driven methods for the individual applications as shown in Fig. 8. We can see that the number of publications based on statistical approaches is notably large except for hard drives. Due to their persuasive mathematical properties and physical interpretations, and their capability to capture the uncertainty of parameters, statistical approaches have attracted widespread attention. However, some statistical models are complex and can be computationally expensive. Because deep learning provides good functional mappings between inputs and outputs, which are powerful for capturing dynamic information, the number of deep learning prognostic models is also escalating. The main disadvantages of deep learning are that the operating and training processes are “black boxes” and that a majority of the proposed approaches focus only on prediction at a population level, so it is hard to capture the uncertainty and individual heterogeneity. It would be interesting to combine the advantages of each approach in the future. In the following subsections, we provide a detailed review of research on RUL prediction for those applications (i.e., aircraft, rotating machinery, electronic systems, power systems) of predictive maintenance. For each field, we first describe the popular and open access datasets, then summarize the contemporary research work of the recent five years in the corresponding field. It is worth noting that the number of papers on data-driven prognostics is enormous. Consequently, some papers are inevitably omitted.

3.1. Application in rotating machinery

Bearing, gears, motor, and shaft are the most common rotating parts
which are widely used in machinery and industry. Rotating machinery is critical for machine health and safety, as flaws in any rotating part can lead to significant consequences and failure. The identified significant causes of bearing failures generally relate to excessive imbalance, improper installation, poor lubrication practices, alignment tolerances, and poor storage and handling techniques. RUL prediction of bearings and gears is derived from the vibration or mechanical signals obtained from the bearings. In the literature, we found two experimental setups for the run-to-failure and historical data collection of bearings and gears: the Center of Intelligence at the University of Cincinnati [134] and PRONOSTIA [115]. Most of the research in the field of bearings and gears used the datasets from these two experimental setups. Lubricant is another important substance for rotating machinery. It helps to operate machines in safe and healthy conditions. The deterioration trends can be described by observing the characteristics of lubricants and the operating conditions of the machine. Over time and usage, lubricants degrade and produce acidic substances, moisture, and insoluble deposits, such as carbon deposits and sludges [116], which lead to
Fig. 8. Recent research trends (2015–2020) on different application fields & methods.
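Time-domain features such as root mean square (RMS), kurtosis and crest factor recur throughout the rotating machinery literature reviewed in this section. As a minimal sketch, assuming one window of a vibration signal is available as a list of samples, they can be computed as follows. The function name, the toy window, and the use of the non-excess kurtosis definition (fourth central moment divided by the squared variance) are assumptions for illustration; individual papers may use other conventions.

```python
import math
from typing import Dict, List


def time_domain_features(signal: List[float]) -> Dict[str, float]:
    """RMS, kurtosis and crest factor of one window of a vibration signal.

    Assumes a non-empty, non-constant window (otherwise a division by
    zero occurs in the kurtosis or crest factor).
    """
    n = len(signal)
    mean = sum(signal) / n
    rms = math.sqrt(sum(v * v for v in signal) / n)
    # Central moments; kurtosis is the non-excess form m4 / m2**2,
    # which equals 3 for a Gaussian signal.
    m2 = sum((v - mean) ** 2 for v in signal) / n
    m4 = sum((v - mean) ** 4 for v in signal) / n
    kurtosis = m4 / (m2 * m2)
    # Crest factor: peak absolute amplitude relative to the RMS level.
    crest_factor = max(abs(v) for v in signal) / rms
    return {"rms": rms, "kurtosis": kurtosis, "crest_factor": crest_factor}


window = [1.0, -1.0, 1.0, -1.0]        # hypothetical, not from any dataset
features = time_domain_features(window)
```

Computed over successive windows of a run-to-failure record, such features form the trajectories that are later fused or selected into a health indicator.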
machine failure. Hence, it is necessary to understand the deterioration trend in advance to reduce wear and friction of the moving components and avoid machine failure. Recently, condition monitoring of lubricating oil has attracted considerable attention in research. The lubricant data can be obtained from either a physical experiment or a simulation procedure that mimics the real environment under certain assumptions. One of the procedures for collecting lubricant data is the four-ball test found in [117]. To obtain real-time data of lubricating oil, a wear debris sensor and an oil property sensor were employed in the oil cycling line. Besides, a temperature control system was set to provide a fixed temperature for the oil property sensor. By doing this, the wear conditions and the dynamic viscosity and permittivity of the lubricating oil can be monitored simultaneously. To accelerate oil degradation, a test composed of various loads and speeds was carried out. The test was stopped and restarted when working conditions were changed. Another lubricant dataset is reported in [118], where the lubricating oil used for large machines was collected from the engine of a loader. In the following paragraphs, we describe different research methodologies and approaches in this application field along with the commonly used datasets in this area.
Lei et al. proposed a two-stage method based on a particle filtering algorithm to predict the bearing RUL [119]. They fused multiple features to construct new health indicators and then used maximum-likelihood estimation to initialize the model parameters. Gaussian Process Regression (GPR) is another effective method, which was used in [120] with the integration of composite kernels. RMS, kurtosis and crest factor are used for feature fusion by a self-organizing map. It is experimentally demonstrated that the integration of composite kernels improves the prediction accuracy over the particle filter method. Liu et al. [121] divided the entire bearing life into multiple health states and built an individual local regression model for each state to leverage the RUL prediction. This constituted a semi-supervised approach that can be utilized without any prior knowledge. Wang et al. [122] proposed a Wiener process model with stage correlation and a Bayesian approach to incorporate the prior distribution information into the model parameters. Ahmad et al. [123] proposed a dynamic regression model to capture the trend of the bearing health indicator, which is later used to project the future health indicator value and estimate the bearing useful life. The adaptive regression model can determine an appropriate time to start prediction, which yields excellent prognostics performance. Kundu et al. [124] used a clustering and change point detection algorithm to identify the failure behaviors and predict the RUL. Sometimes a bearing is operated under multiple conditions. Integrating these multiple conditions would provide a better prognosis result.
The literature also reports several artificial intelligence based approaches to construct the health indicators for bearing and gear remaining life prediction. Zhao et al. [125] employed principal component analysis (PCA) and linear discriminant analysis for dimensionality reduction. Then they used a multiple linear regression model to estimate the RUL. In some applications, it is hard to obtain the failure or suspension histories, which makes the RUL prediction task more challenging. To address this issue, Xiao et al. [126] developed an inference method using recorded condition monitoring data. An adaptive time window was employed to divide the extracted features and to train an ANN for intelligent prognosis. Bastami et al. [127] used the wavelet packet transform to extract signal features and later trained an ANN to estimate the RUL of rolling bearings. The nonlinear nonparametric approach is also considered a very appealing technique for predicting the RUL of bearings. The ensemble technique is another effective machine learning paradigm that can be used in RUL prediction. One such ensemble approach, the decision tree-based random forest, was proposed in [62] by Kundu et al. for monitoring and detecting the pitting progression in spur gears to predict the RUL of the gear.
In recent years, deep learning has become very effective in machine health monitoring and prognostics due to its capability of learning representations from raw data. With the development of deep learning methods, Guo et al. [85] proposed a deep neural network structure named recurrent neural network based health indicator (RNN-HI), where several classical time–frequency features are combined with the original feature set to obtain the most sensitive features as the input of the RNN-HI model to leverage the RUL prediction of bearings. Deep learning methods can effectively extract the discriminative features for monitoring bearing faults. However, temporal information also plays a critical role in the fault degradation process, which was not considered in many cases. Mao et al. [128] first considered this temporal information in bearing RUL prediction and proposed an LSTM-based approach. In another study, Tang et al. [129] proposed an LSTM approach combining bottleneck features to develop a novel prediction method for bearing performance degradation. In LSTM approaches, the feature parameters are first extracted from different domains such as the time domain, frequency domain, and time–frequency domain. Then the important features that better represent the degradation process of bearings are selected from the original feature set. Finally, the selected features are used to train the LSTM network to predict the bearing RUL. Traditionally the
feature extraction is derived from prior knowledge and is separated from the RUL models. Ren et al. [130] proposed a Multi-scale Dense Gated Recurrent Unit Network (MDGRU) to combine the feature extraction into the RUL model by means of a pre-trained Restricted Boltzmann Machine network, multi-scale layers, skip gate recurrent unit layers, and dense layers. In [131], Li et al. used a CNN to explore the time–frequency domain information and to extract multi-scale features. Deep learning approaches showed limited prediction stability when relying on single-sensor information. To address this issue, Wu and Zhang [132] proposed a new cascade fusion convolutional long-short term memory to fuse the information streams in the form of an ensemble model. Lo et al. [133] proposed a one-dimensional CNN for the prognosis of bearings and gears. The network was trained in a hybrid fashion where both the classification loss and clustering loss were combined to estimate the status of prognosis. Xiang et al. [88] proposed an attention-based LSTM named LSTM-A. This special type of network utilizes an attention mechanism to amplify the input and hidden layer weights at different degrees to accurately predict the gear remaining life.

The RUL prediction of lubricant is accomplished based on the parameters obtained from oil and degradation trends. Tanwar et al. [134] proposed a degradation model based on a continuous-time stochastic process, i.e., the Wiener process, for lubricating oil degradation tracking and RUL prediction under regular oil top-up effects. In this research, the Oil Replenished Effect was neglected in the prediction of lubricant remaining life. Tanwar and Raghavan addressed this issue in [135] and proposed the use of the GPR model as a non-parametric Bayesian method. Recently, machine learning approaches have also been incorporated for condition monitoring from lubricant oil. In [136], the researchers used a machine learning approach to classify the engine lubricant into three conditions: normal, degraded, and unsuitable. They used a cohort of military land vehicles to collect the data from laboratory test results of lubricants and the vehicle health monitoring system. The proposed machine learning procedure used feature selection methods to identify the best feature set for representing the lubricant oil condition.

3.2. Application in aircraft

An aircraft engine is the power component of an aircraft propulsion system. Most aircraft engines are either piston engines or gas turbines. An aircraft engine produces thrust to propel an aircraft. Aircraft engine failures may result in significant economic losses and even accidents in extreme cases. Besides the aircraft engine, two other important systems in the aircraft are the auxiliary power unit (APU) and actuators [114]. The APU is a small turbine engine installed under the tail of an aircraft. Instead of providing propulsion, its main function is to supply power at a certain flight altitude and provide bleed air for the cabin air conditioning system on the ground. For some aircraft, the APU can also provide compressed air and backup electric power to compensate for the effect of dead engines. Thus, monitoring of the health state to ensure safety and operation efficiency is essential. Actuators (e.g., Electro-Mechanical Actuators (EMA) and Electro-Hydraulic Actuators (EHA)) play an active role in aircraft control systems. They are used to convert electrical signals to mechanical movement or other physical variables, such as pressure or temperature. In this research field, a majority of publications use the Commercial Modular Aero-Propulsion System Simulation (C-MAPSS) dataset, which is provided by the National Aeronautics and Space Administration (NASA) [137]. The C-MAPSS dataset includes four sub-datasets. All engines work in normal condition first and then degrade continuously until a failure criterion is reached. Every record of the engine state is generated by a set of 24 variables, three of which are operational settings and the other 21 are engine performance measurements. Currently, there is no public dataset available. Most researchers investigate the performance based on data collected from commercial aircraft fleets. The research in this field is summarized in the following paragraphs.

Chehade et al. [138] predicted RUL through individual failure threshold distribution estimation. They developed a convex quadratic formulation that combines the historical population information and the condition monitoring data of an operating unit to estimate its failure threshold online. Some efforts have focused on developing data fusion methodologies for prognostics [139–141]. The main idea is to construct a health index (HI) by selecting and fusing multiple degradation signals to track the trajectories of the degradation process. After that, the constructed health index is treated as another sensor signal and then used for degradation modeling and prognostics. Song and Liu [140] solved the HI construction by the quantile regression technique. Kim et al. [139] proposed a latent linear model for HI construction and a systematic sensor selection procedure for RUL prediction. Chehade et al. [141] extended the data-level fusion techniques to multiple failure mode scenarios. Using a constructed HI, Li et al. [142] developed an age- and state-dependent Wiener-process model for RUL prediction with the consideration of the unit-to-unit variability. Son et al. [143] proposed a non-homogeneous gamma process based RUL prediction method. The model considered noisy degradation data, and the hidden degradation states were approximated by using the Gibbs sampling technique. All of the above-mentioned methods were developed based on statistical approaches. Researchers also investigated machine learning approaches to develop more sophisticated methods. For example, Ordóñez Celestino et al. [144] proposed a hybrid ARIMA (auto-regressive integrated moving average)-SVM model for estimating the RUL. The ARIMA model is utilized to estimate the values of the predictor variables in advance. Then, the result of ARIMA is applied as the input of a support vector regression model. Al-Dulaimi et al. [145] proposed a hybrid LSTM and CNN framework. In their model, LSTM and CNN are constructed in parallel, followed by a fully connected multilayer fusion neural network. Zheng et al. [146] proposed an LSTM approach for RUL estimation, which fully utilizes the sensor sequence information and uncovers hidden patterns. Babu et al. [72] developed a deep CNN-based regression method to predict the RUL. The convolution and pooling filters were used along the temporal dimension over the multi-channel sensor data to integrate automated feature learning from raw sensor signals. Wen et al. [147] proposed a residual convolutional neural network (ResCNN), which can help overcome the vanishing or exploding gradient problem of deep learning algorithms. Song et al. [96] developed an autoencoder-BLSTM hybrid model to improve the accuracy of RUL prediction. The autoencoder was used as a feature extractor to reduce the dimension of data. The BLSTM was designed to capture the bidirectional long-range dependencies. It was shown that the hybrid model had better prediction performance compared with most existing methods, including CNN and LSTM.

To predict the RUL of APU, Chen et al. [148] developed a Gaussian process regression model combined with ensemble empirical mode decomposition. Liu et al. [149] utilized an extreme learning machine (ELM) to predict the degradation of an APU. They employed a restricted Boltzmann machine (RBM) to optimize the ELM. Wang et al. [150] derived a health index to characterize the APU degradation and then used a Bayesian framework for the RUL prediction. Zhang et al. [151] utilized a Weibull-based generalized renewal process to implement failure rate prediction of APU. Researchers have also put effort into actuator prognostics. Zhang et al. [152] proposed a weighted bagging GPR algorithm. With the idea of ensemble learning, the weighted bagging GPR algorithm uses a series of subsets to train the GPR model. They found that the proposed method can take the randomness of data into consideration. Then Zhang et al. [153] proposed a feature-aided Kalman Filter method for motor voltage estimation, which is an essential parameter for performance degradation assessment of EMA. Their dataset was collected through the Flyable Electromechanical Actuator, which was made by NASA Ames Research Center. Guo et al. [154] presented an optimized incremental learning and on-line training algorithm based on the relevance vector machine for EHA RUL prediction. In their research, sample entropy was introduced as an effective
signature of the EHA’s health.

3.3. Application in power systems

Wind energy generated by wind turbines is a growing and reliable renewable energy source in the world. However, the wind energy industry experiences increasing operation & maintenance costs because of main component failures. The temperature stress caused by the temperature difference along the machine, e.g., shafts’ and gears’ temperature, together with lubrication problems, accelerates wind turbine faults. To monitor the health of wind turbines, various monitoring techniques have been used, such as acoustic measurement, electrical effects monitoring, equipment vibration monitoring, power temperature monitoring, oil debris monitoring, etc. SCADA (Supervisory Control and Data Acquisition) is a commonly used tool for data collection, which is a system built into turbines to control electricity generation. This system uses sensors to collect various functional parameters and data, such as temperature, bearing vibration, wind speed, and phase currents of wind turbines [155]. Carroll et al. [156] ensembled ANN, SVM, and logistic regression to predict wind turbine gearbox failure using SCADA data. This methodology appears to be effective in predicting the failure up to a month before it occurs. Inclusion of high frequency vibration data could extend that prediction capability to 5–6 months before failure occurs with reasonable accuracy. Song et al. [157] introduced a Bayesian framework with three different methods, namely, the bin method, the multivariate normal distribution based method, and the copula method, to identify wind turbine health states based on their SCADA data. The results showed that the copula method has the best prediction performance. Chen et al. [158] proposed an enhanced particle filtering algorithm for RUL prediction of wind turbine drivetrain gearboxes using vibration data. In their method, an adaptive neuro-fuzzy inference system was used to learn the health state transition. Hu et al. [159] explored the Wiener process for the prediction of wind turbine health status using the temperature characteristics of operational SCADA data. Nielsen and Sørensen [160] proposed a Markov deterioration model to predict the deterioration and RUL of wind turbine blades. In their model, a dynamic Bayesian network was used to obtain probabilities of inspection outcomes and the maximum likelihood method was applied to estimate the transition probabilities for a hidden Markov model. Saidi et al. [161] proposed a vibration-based prognostic and health monitoring methodology for wind turbine high-speed shaft bearings using spectral kurtosis and SVR. Reviews about wind turbine condition monitoring can be found in [155,162,163].

Electric valves, power transformers, reactor coolant pumps, etc. are also widely used components in many power systems. Though the RUL prediction of these individual components is not trivial, in recent years researchers have shown interest in the prognosis of these components. For example, Wang et al. [164] applied a convolution kernel combined with LSTM for feature extraction. Then, LSTM is utilized for predicting the RUL of electric valves. Later, Wang et al. [165] improved the RUL prediction method by combining LSTM and a convolutional auto-encoder (CAE). They combined deeper features extracted by the CAE and the original features to enrich the dimension of features, and the case study showed an improved predictive capability. Aizpurua et al. [166] focused on lifetime predictions of power transformers in NPPs. They proposed a Bayesian Particle Filtering framework by integrating model-based experimental models, forecasting models and uncertainty modeling concepts together for condition assessment of transformers. Nguyen et al. [167] combined ensemble empirical mode decomposition and LSTM for the prognostics of reactor coolant pumps of NPPs. They observed that multi-step-ahead predictions obtained by an ensemble of separate prediction models are more accurate and less noisy than the predictions obtained by a single model.

3.4. Application in electrical and electronic components

Many electronic systems, including consumer electronics, electric vehicles, airplanes, and renewable energy devices, use lithium-ion batteries as the main sources of energy storage. The performance of lithium-ion batteries deteriorates over the service time in terms of capacity loss and resistance increase. To ensure the safety and reliability of lithium-ion batteries, accurate estimation of the health state and RUL prediction are essential to track the actual performance of batteries. The RUL of a battery can be determined as the number of charge–discharge cycles needed for its capacity to fall from the known current value to the threshold value. The papers [1,168] provide comprehensive reviews on data-driven methods for battery health diagnostics and prognostics estimation. We found two public datasets available: NASA Ames Prognostics Center of Excellence [169] and Center for Advanced Life Cycle Engineering (CALCE) of University of Maryland [170]. The NASA Ames Prognostics Center of Excellence dataset contains the aging processes of 4 batteries which were tested under certain conditions. The batteries were run through different charge, discharge, and impedance operational profiles at room temperature. The CALCE provides datasets for multiple batteries. For the RUL prediction, the battery capacity was tested as an indicator of battery status. To measure the capacity, all batteries were fully charged under the constant-current/constant-voltage mode. In the discharge period, a specific load was applied to the cells to maintain a constant current until the voltage was reduced to 2.7 V. Then the discharge capacity was recorded after each full charge–discharge process. The details of the experiments to generate data can be found in [171].

A number of studies have been reported for RUL prediction based on these two public datasets. Zhai and Ye [172] studied a Wiener process model with an adaptive drift for RUL prediction of batteries. They concluded that the proposed model fixed the deficiency of conventional Wiener process models that ignore the variability of drifts. Shen et al. [173] proposed a Wiener-based model with measurement errors, which were assumed to follow a logistic distribution with zero mean. They adopted the Monte Carlo expectation–maximization method together with Gibbs sampling for parameter estimation. Wang et al. [174] proposed a mixed-effects model based on the Wiener process to capture the two-phase degradation pattern. This model accommodated two significant aspects: phase correlation and unit heterogeneity. Zhang et al. [175] presented a stochastic modeling method and took the recovery phenomenon into consideration, which is a common phenomenon for batteries in which the system performance degrades with usage and recovers in storage. Si [176] proposed a generic nonlinear stochastic modeling framework, which utilizes a time-dependent drift coefficient to characterize the nonlinearity and dynamics of the degradation signals. Chen et al. [177] employed a hybrid method based on SVR and error compensation methods for RUL prediction. They used a genetic algorithm to optimize the hyper-parameters of SVR to achieve better accuracy. Xue et al. [178] proposed an integrated algorithm that combines an unscented Kalman filter and SVR. Similar to Chen et al. [177], they used a genetic algorithm to optimize the parameters of SVR. Khumprom and Yodo [179] proposed a Deep Neural Network (DNN) and compared it with other machine learning algorithms, including SVM, k-NN, ANN, and Linear Regression. The results showed that the DNN algorithm could be comparable to or outperform conventional machine learning algorithms. Ren et al. [180] integrated an autoencoder with a DNN, in which the autoencoder was used for multi-dimensional feature extraction and the DNN was trained for RUL prediction. Jiao et al. [181] combined a statistical method with a deep learning model: they proposed a Particle Filtering framework based on a conditional variational autoencoder and a reweighting strategy to predict the RUL. Since battery-powered electric vehicles are starting to play a significant role in today’s automotive industry, the reliability and safety of batteries are critical. Robust RUL prediction methods for batteries are desirable and the number of publications is expected to increase rapidly.
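The cycle-count definition of battery RUL given in this subsection can be illustrated with a short sketch. It assumes a capacity value recorded once per charge–discharge cycle, fits a straight line by least squares as a simple stand-in for the degradation models cited here, and extrapolates to the failure threshold. The linear trend, the function name `estimate_rul_cycles`, the sample capacities, and the 1.40 Ah threshold are illustrative assumptions and are not taken from any of the cited works.

```python
def estimate_rul_cycles(capacities, threshold):
    """Extrapolate a least-squares linear capacity trend to the threshold.

    capacities: discharge capacity per cycle (hypothetical data, in Ah).
    threshold:  capacity at which the battery is considered failed (Ah).
    Returns the estimated number of remaining cycles after the last sample.
    """
    n = len(capacities)
    if n < 2:
        raise ValueError("need at least two capacity measurements")

    # Ordinary least squares for capacity = intercept + slope * cycle.
    mean_x = (n - 1) / 2.0
    mean_y = sum(capacities) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(capacities))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # No downward trend means the threshold is never reached by this model.
    if slope >= 0:
        return float("inf")

    cycle_at_threshold = (threshold - intercept) / slope
    return max(0.0, cycle_at_threshold - (n - 1))


# Synthetic example: capacity fades by 0.02 Ah per cycle from 2.00 Ah.
capacity_ah = [2.00, 1.98, 1.96, 1.94, 1.92]
rul = estimate_rul_cycles(capacity_ah, threshold=1.40)
print(f"Estimated RUL: {rul:.1f} cycles")  # about 26 cycles for this series
```

A linear fit gives only a point estimate. The Wiener-process and particle-filter methods discussed above additionally model drift variability and measurement noise, which is what yields a distribution over the RUL rather than a single number.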
Through the literature search, it is also observed that prognosis methodologies are widely applied to another electronic component, namely the hard disk drive (HDD). The HDD is a complex system integrating mechanical, electrical, and magnetic elements, and is the most important and robust data storage device for major data storage services. Self-monitoring, analysis and reporting technology (SMART) is a commercial health monitoring system, which can detect and report various indicators of HDD reliability to facilitate the HDD prognosis. However, the conventional SMART can only provide a basic evaluation, and the failure detection rate (FDR) is 3–10% [182]. To improve the accuracy of proactive failure prediction, in recent years, statistical and machine learning methods have been adopted to build prediction models based on the SMART attributes. There are two public datasets used in the literature: Baidu Inc. [183] and Backblaze Inc (available at: https://www.backblaze.com/b2/hard-drive-test-data.html). The dataset of Baidu Inc. was collected from a total of 23,395 drives, which had the same initial mode. The attributes of those drives were sampled every hour using SMART and labeled as good or failed, with only 433 drives in the failed class and the remaining 22,962 drives in the good class. Backblaze also collects its dataset in a similar fashion, gathering the SMART attributes on a daily basis. In 2013, the company made the dataset available to the research community, and it provides quarterly updates [184]. Using the Baidu dataset, Xu et al. [185] introduced an RNN-based approach to assessing the health status of hard drives based on the gradually changing sequential SMART attributes. Li et al. [186] proposed two hard drive failure prediction models based on Decision Trees (DTs) and Gradient Boosted Regression Trees. Both prediction models showed steady prediction performance, with high failure detection rates (80–96%) and low false alarm rates (0.006–0.31%). Using the Backblaze dataset, Lima et al. [187] evaluated the performance of both LSTM and CNN architectures to predict hard drive failure. The results of this study showed that deep learning models could be an effective alternative for failure prediction.

3.5. Performance analysis

This section first describes some performance evaluation metrics for RUL prediction. Prediction of RUL is a vast research field where researchers develop and propose a variety of methodologies and algorithms. Hence, different researchers have used different performance evaluation metrics. In our survey, we found the four most used evaluation metrics, i.e., Mean Absolute Error (MAE), Root Mean Squared Error (RMSE), Mean Absolute Percentage Error (MAPE) and Scoring Function (SF). These metrics are defined as,

$$\mathrm{MAE} = \frac{1}{N}\sum_{i=1}^{N}\left|\hat{y}_i - y_i\right| \tag{12}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(\hat{y}_i - y_i\right)^2} \tag{13}$$

$$\mathrm{MAPE} = \frac{1}{N}\sum_{i=1}^{N}\left|\frac{\hat{y}_i - y_i}{y_i}\right| \tag{14}$$

$$\mathrm{SF} = \begin{cases}\displaystyle\sum_{i=1}^{N}\left(e^{-\frac{\hat{y}_i - y_i}{13}} - 1\right), & \text{if } \hat{y}_i - y_i < 0\\[2ex] \displaystyle\sum_{i=1}^{N}\left(e^{\frac{\hat{y}_i - y_i}{10}} - 1\right), & \text{if } \hat{y}_i - y_i \ge 0\end{cases} \tag{15}$$

In the above equations, $y_i$ and $\hat{y}_i$ represent the true RUL and the predicted RUL, respectively. $N$ is the total number of units or systems. All of these metrics are negatively oriented scores, which means a lower value indicates a better model. Both the MAE and RMSE report the average RUL prediction error of the model. The value of these two metrics could range between 0 and ∞. MAPE is a variant of MAE in which the absolute error is normalized over the data. This metric is useful when the errors need to be compared across data with different scales. The SF, as shown in Equation (15), is asymmetric around the true time of failure. It is defined in a way that late predictions are more heavily penalized than early predictions [137]. We only list a subset of metrics in evaluating the performance of methodologies for different applications or datasets. For example, for the C-MAPSS dataset, researchers who developed deep learning approaches commonly used the scoring and RMSE metrics to quantitatively evaluate the proposed methodologies. This provides us an opportunity to compare different deep learning approaches directly. It is worth mentioning that the literature reports a wide variety of performance criteria to evaluate different methods. This is usual and expected, as the datasets used in this research field are characterized by different parameters, scales, variations and operating conditions. Sometimes, a dataset has multiple sub-datasets and different experimental setups. Considering this limitation, in the comparison table, we only report the common evaluation metrics for any particular dataset having the same parameters, scales, and operating conditions. The IEEE PHM2012 challenge dataset, provided by the FEMTO-ST institution in France, is a popular dataset used for experimenting and predicting the bearing remaining useful life. For this dataset, MAE and RMSE are often used to evaluate the proposed methodologies. Comparison tables for the C-MAPSS and IEEE PHM 2012 challenge datasets are reported in Table 4 and Table 5, respectively. From Table 4, we can observe that deep learning approaches were first applied to the C-MAPSS dataset only in 2016. Since then, the number of publications has grown exponentially, as demonstrated in Fig. 8. The best three performances in each column of Table 4 are labeled in bold. Due to the calculation biases from different authors, the RMSE and Scoring are not consistent. Generally, we can conclude that capsule NN [188] and MSCNN [77] show superior performance and a good generalization capability for all sub-datasets. Bayesian Deep learning [101] also demonstrates satisfying results. It is worth mentioning that, unlike other deep learning approaches that focus only on point estimation of RUL, Bayesian Deep learning enhances the model interpretability by providing not only point estimation but also uncertainty quantification, which is highly desirable in practice. For the IEEE PHM2012 challenge dataset, since the performance metrics, dataset and data prediction points vary, we only provide a rough comparison based on the work from [224]. As we can see, DNN performs the best compared with other traditional machine learning approaches.

4. Challenges and future trends

4.1. Challenges and future trends for the predictive maintenance

With advancements in the industrial Internet of things and artificial intelligence, predictive maintenance has become more and more efficient. The key to applying a predictive maintenance strategy successfully is to model and predict failure patterns accurately. Based on our studies, it can be seen that this area has been well studied using many methodologies ranging from statistical approaches to machine learning based approaches. Recently, many researchers have focused on machine learning approaches to explore the field of RUL prediction. Deep neural networks such as CNNs, auto-encoders, RNNs, and LSTMs are gaining popularity for learning features from raw data. Some researchers integrate the statistical and traditional machine learning approaches to explore this field [196,197]. However, the task of RUL prediction is still challenging due to the complex, uncertain, nonlinear features and operational conditions. The main challenges can be summarized in several aspects: (1) Data insufficiency and imbalanced classes: most data-driven models, especially machine learning approaches, predict the RUL based on the extracted
Table 4
Performance comparison for C-MAPSS dataset (Score and RMSE for each sub-dataset).

Method                                  Year   FD001            FD002            FD003            FD004
                                               Score   RMSE     Score   RMSE     Score   RMSE     Score   RMSE
CNN [72]                                2016   1287    18.45    13,570  30.29    1596    19.82    7886    29.16
Multi-objective Deep belief NN [189]    2017   640     17.96    10,851  28.06    683     19.41    7210    29.45
Deep LSTM [129]                         2017   338     16.14    4450    24.49    852     16.18    5550    27.17
BiLSTM [22]                             2018   295     13.65    4130    23.18    317     13.74    5430    24.86
LSTM-FNN [190]                          2018   481     14.89    7982    26.86    493     15.11    5200    27.11
Deep CNN [191]                          2018   274     12.61    10,400  22.36    284     12.64    12,500  23.31
hybrid LSTM [87]                        2019   262     14.72    6953    29.00    452     17.72    15,069  33.43
Ensemble ResCNN [147]                   2019   212     12.16    2087    20.85    180     12.01    3400    24.97
capsule NN [188]                        2020   276     12.58    1229    16.30    283     11.71    2625    18.96
MSCNN [77]                              2020   196     11.44    3747    19.35    241     11.67    4844    22.22
CNN + LSTM [192]                        2020   231     12.56    3366    22.73    251     12.10    2840    22.66
Generative Adversarial Networks [193]   2020   174     10.71    2982    19.49    273     11.48    3874    19.71
Bayesian Deep learning [194]            2020   267     12.19    2007    18.49    409     12.07    2415    19.41
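
The score and RMSE values in Table 4 follow the metric definitions in Equations (12)–(15). As a minimal sketch of how these four metrics might be computed, the following uses only the Python standard library. The function names and the sample RUL values are illustrative assumptions, and the scoring function is evaluated per unit and then summed, which is one reading of Equation (15).

```python
import math


def mae(y_true, y_pred):
    """Mean Absolute Error, Equation (12)."""
    return sum(abs(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Squared Error, Equation (13)."""
    return math.sqrt(
        sum((p - t) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
    )


def mape(y_true, y_pred):
    """Mean Absolute Percentage Error, Equation (14).

    Undefined when a true RUL is zero, so such units must be excluded upstream.
    """
    return sum(abs((p - t) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


def scoring_function(y_true, y_pred):
    """Asymmetric scoring function, Equation (15), summed over units.

    Early predictions (error < 0) use the divisor 13; late predictions
    (error >= 0) use the divisor 10 and are therefore penalized more.
    """
    total = 0.0
    for t, p in zip(y_true, y_pred):
        error = p - t
        if error < 0:
            total += math.exp(-error / 13.0) - 1.0
        else:
            total += math.exp(error / 10.0) - 1.0
    return total


# Hypothetical true RULs and two sets of predictions, each off by 10 cycles.
y_true = [100.0, 50.0, 20.0]
y_early = [90.0, 40.0, 10.0]   # predicts failure too soon
y_late = [110.0, 60.0, 30.0]   # predicts failure too late

print(mae(y_true, y_early), rmse(y_true, y_late), mape(y_true, y_early))
print(scoring_function(y_true, y_early), scoring_function(y_true, y_late))
```

In this example both prediction sets have the same MAE and RMSE, but the late predictions receive the larger score, which illustrates the asymmetry described for the SF.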
scenario. If the data size is small and degradation data show similar degradation forms, GPMs are perhaps the most suitable and simplest models to use. However, their inability to capture the temporal variability and the uncertainty inherent in the progression of deterioration over time, which is common in practice, limits their engineering applications. In other words, GPMs are applicable only when the unexplained randomness is sufficiently small. To deal with the randomness caused by inherent variability and environmental factors, SPMs are a natural choice. If degradation processes are monotonic and evolve only in one direction, the Gamma process and the inverse Gaussian process are appropriate to model this type of degradation data. The Wiener process is suitable for modeling deterioration which is not monotone. To relax the assumption of parametric forms, Gaussian process models, which are non-parametric, can be well adapted to model complex data. Both GPMs and SPMs have well-established statistical properties, where a closed form of the PDF of RUL is usually available. When it is not, filter-based methods or sampling methods have to be used to find an approximate RUL. One limitation of both GPMs and SPMs is that they require a pre-defined failure threshold. However, this may not be available or accurate since it requires knowledge from domain experts. Moreover, a fixed failure threshold may not be sufficient to characterize the health status of all products due to their heterogeneous features. In this case, covariate-based models are efficient without needing a failure threshold assumption. The rapid development of sensing and computing technologies has enriched degradation data significantly. This data-rich environment for degradation modeling and prognostics could potentially lead to an accurate inference about the RUL of products. However, RUL prediction with multi-sensor signals is a more challenging issue than the case of a single degradation signal. One way to deal with this issue is to combine multi-sensor signals into a composite health index or to map the correlation between signals and RUL values; then the widely used GPMs and stochastic process models are still applicable. Another option is to use machine learning, which has gradually become mainstream for RUL prediction. The goal of conventional machine learning and deep learning for RUL prediction is to learn the non-linear mapping between the sensor data and RUL using different network architectures. Among those machine learning models, LSTMs have attracted great attention and presented an outstanding ability in the application of RUL prediction, as they have the capability to learn dependencies of sequential data. While machine learning models can provide better performance for RUL prediction, they do not have a probabilistic orientation, namely, uncertainty quantification, and therefore no PDF of the RUL is available. Bayesian neural networks have been used to address this shortcoming. If there is a need for cross-domain prognosis, transfer learning is a preferable way to provide better performance. The attention mechanism can also assist the learning model in yielding potential improvements in the learning tasks.

5. Conclusion

In the context of Industry 4.0, predictive maintenance is transforming the way of thinking about maintenance: from cost to business opportunity in the industry. Based on this rationale, predictive maintenance is attracting considerable investment from industries and increasing attention from research societies. Many predictive maintenance techniques have been developed up to now to respond to the demand for high reliability of facilities and equipment, but more studies are still required to improve their predictive accuracies and efficiencies. This review provides a comprehensive overview of the most recent data-driven prognostic techniques, which are indispensable for predictive maintenance. Specifically, this paper reviews the methodologies, best practices, current challenges, and future trends of machine prognostics. To make accurate prognostics, choosing a proper modeling

good reference for selecting an appropriate model for a specific application scenario. Moreover, we investigate and pinpoint some challenges and promising directions and opportunities of prognostics for future studies. Lastly, we provide some constructive resolutions to mitigate the predictive maintenance challenges (e.g., data insufficiency and imbalanced classes; poor generalization ability of developed models; late prediction caused by poor predictive capability; noise associated with real-time/online prognostics for in-situ components; manual assignment of hyper-parameters estimation and tuning; and discrepancy of cross-domain prognosis) and guidance for the user to choose appropriate models to support predictive maintenance implementation. This research effort would lead to the development of machine learning based predictive maintenance systems that sustain effective and accurate preventive maintenance. In summary, this review provides an indication of how to study predictive maintenance problems from a data-driven machine prognostics perspective and paves a path for effective further investigation. It can be foreseen that more and more advanced predictive models will be developed in the near future, which will boost predictive maintenance, improve reliability, enhance productivity and achieve intelligent decision-making in the industry.

CRediT authorship contribution statement

Yuxin Wen: Data curation, Writing – original draft, Writing – review & editing. Md. Fashiar Rahman: Visualization, Investigation, Validation. Honglun Xu: Conceptualization, Methodology. Tzu-Liang Bill Tseng: Supervision, Writing – review & editing.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgments

This work was partially supported by the National Science Foundation (ECR-PEER-1935454), (ERC-ASPIRE-1941524) and Department of Education (Award # P120A180101). The authors wish to express sincere gratitude for their financial support.

References

[1] H. Meng, Y.-F. Li, A review on prognostics and health management (PHM) methods of lithium-ion batteries, Renew. Sustain. Energy Rev. 116 (2019), 109405.
[2] P.G. Ramesh, S.J. Dutta, S.S. Neog, P. Baishya, I. Bezbaruah, Implementation of Predictive Maintenance Systems in Remotely Located Process Plants under Industry 4.0 Scenario, Advances in RAMS Engineering, Springer, 2020, pp. 293–326.
[3] N. Sakib, T. Wuest, Challenges and opportunities of condition-based predictive maintenance: a review, Procedia CIRP 78 (2018) 267–272.
[4] R. Ahmad, S. Kamaruddin, An overview of time-based and condition-based maintenance in industrial application, Comput. Ind. Eng. 63 (2012) 135–149.
[5] A. Jezzini, M. Ayache, L. Elkhansa, B. Makki, M. Zein, Effects of predictive maintenance (PdM), Proactive maintenance (PoM) & Preventive maintenance (PM) on minimizing the faults in medical instruments, in: 2013 2nd International Conference on Advances in Biomedical Engineering, 2013, pp. 53–56.
[6] A.K.S. Jardine, D. Lin, D. Banjevic, A review on machinery diagnostics and prognostics implementing condition-based maintenance, Mech. Syst. Sig. Process. 20 (2006) 1483–1510.
[7] K.L. Tsui, N. Chen, Q. Zhou, Y. Hai, W. Wang, Prognostics and health management: a review on data driven approaches, Math. Probl. Eng. 2015 (2015).
[8] Y. Lei, N. Li, L. Guo, N. Li, T. Yan, J. Lin, Machinery health prognostics: a systematic review from data acquisition to RUL prediction, Mech. Syst. Sig. Process. 104 (2018) 799–834.
[9] M.S. Kan, A.C.C. Tan, J. Mathew, A review on prognostic techniques for non-
technique is essential. We provide a detailed summary of statistical stationary and non-linear rotating systems, Mech. Syst. Sig. Process. 62 (2015)
1–20.
based models and machine learning based models. Then, their applica [10] M.G. Pecht, A prognostics and health management roadmap for information and
tions based on these models are demonstrated in detail, which provide a electronics-rich systems, IEICE ESS Fundam. Rev. 3 (2010), 4_25-24_32.
16
Y. Wen et al. Measurement 187 (2022) 110276
[11] Y. Wen, J. Wu, Q. Zhou, T.-L. Tseng, Multiple-change-point modeling and exact [45] D. Yang, X. Zhang, R. Pan, Y. Wang, Z. Chen, A novel Gaussian process regression
bayesian inference of degradation signal for prognostic improvement, IEEE Trans. model for state-of-health estimation of lithium-ion battery using charging curve,
Autom. Sci. Eng. (2018) 1–16. J. Power Sources 384 (2018) 387–395.
[12] X.-S. Si, W. Wang, C.-H. Hu, D.-H. Zhou, Remaining useful life estimation – A [46] Z.-S. Ye, N. Chen, The inverse Gaussian process as a degradation model,
review on the statistical data driven approaches, Eur. J. Oper. Res. 213 (2011) Technometrics 56 (2014) 302–311.
1–14. [47] F. Cartella, J. Lemeire, L. Dimiccoli, H. Sahli, Hidden semi-Markov models for
[13] Z.S. Ye, M. Xie, Stochastic modelling and analysis of degradation for highly predictive maintenance, Math. Probl. Eng. 2015 (2015).
reliable products, Appl. Stochastic Models Bus. Ind. 31 (2015) 16–32. [48] C.R. David, Regression models and life tables (with discussion), J. Roy. Stat. Soc.
[14] Z. Zhang, X. Si, C. Hu, X. Kong, Degradation modeling–based remaining useful life 34 (1972) 187–220.
estimation: a review on approaches for systems with heterogeneity, Proc. Inst. [49] Q. Zhou, J. Son, S. Zhou, X. Mao, M. Salman, Remaining useful life prediction of
Mech. Eng. Part O: J. Risk Reliab. 229 (2015) 343–355. individual units subject to hard failure, IIE Trans. 46 (2014) 1017–1030.
[15] D. Wang, K.-L. Tsui, Q. Miao, Prognostics and health management: a review of [50] J. Man, Q. Zhou, Prediction of hard failures with stochastic degradation signals
vibration based bearing and gear health indicators, IEEE Access 6 (2017) using Wiener process and proportional hazards model, Comput. Ind. Eng. 125
665–676. (2018) 480–489.
[16] S. Khan, T. Yairi, A review on the application of deep learning in system health [51] J. Hu, Q. Sun, Z. Ye, Q. Zhou, Joint modeling of degradation and lifetime data for
management, Mech. Syst. Sig. Process. 107 (2018) 241–265. RUL prediction of deteriorating products, IEEE Trans. Ind. Inf. (2020) 1.
[17] L. Zhang, J. Lin, B. Liu, Z. Zhang, X. Yan, M. Wei, A review on deep learning [52] X. Yue, R.A. Kontar, Joint models for event prediction from time series and
applications in prognostics and health management, IEEE Access 7 (2019) survival data, Technometrics (2020) 1–26.
162415–162438. [53] J.L. Katzman, U. Shaham, A. Cloninger, J. Bates, T. Jiang, Y. Kluger, DeepSurv:
[18] R. Zhao, R. Yan, Z. Chen, K. Mao, P. Wang, R.X. Gao, Deep learning and its personalized treatment recommender system using a Cox proportional hazards
applications to machine health monitoring, Mech. Syst. Sig. Process. 115 (2019) deep neural network, BMC Med. Res. Method. 18 (2018) 24.
213–237. [54] H. Kvamme, Ø. Borgan, I. Scheel, Time-to-event prediction with neural networks
[19] M. Kordestani, M. Saif, M.E. Orchard, R. Razavi-Far, K. Khorasani, Failure and Cox regression, arXiv preprint arXiv:1907.00825, 2019.
prognosis and applications—a survey of recent literature, IEEE Trans. Reliab. [55] P.J.G. Nieto, E. García-Gonzalo, F.S. Lasheras, F.J. de Cos Juez, Hybrid PSO–SVM-
(2019). based method for forecasting of the remaining useful life for aircraft engines and
[20] J. Guo, Z. Li, M. Li, A review on prognostics methods for engineering systems, evaluation of its reliability, Reliab. Eng. Syst. Saf. 138 (2015) 219–231.
IEEE Trans. Reliab. (2019) 1–20. [56] T. Qin, S. Zeng, J. Guo, Robust prognostics for state of health estimation of
[21] M. Baur, P. Albertelli, M. Monno, A review of prognostics and health management lithium-ion batteries based on an improved PSO–SVR model, Microelectron.
of machine tools, Int. J. Adv. Manuf. Technol. 107 (2020) 2843–2863. Reliab. 55 (2015) 1280–1284.
[22] J. Lee, F. Wu, W. Zhao, M. Ghaffari, L. Liao, D. Siegel, Prognostics and health [57] T. Benkedjouh, K. Medjaher, N. Zerhouni, S. Rechak, Remaining useful life
management design for rotary machinery systems—Reviews, methodology and estimation based on nonlinear feature reduction and support vector regression,
applications, Mech. Syst. Sig. Process. 42 (2014) 314–334. Eng. Appl. Artif. Intell. 26 (2013) 1751–1760.
[23] C.J. Lu, W.O. Meeker, Using degradation measures to estimate a time-to-failure [58] Y. Hu, C. Hu, X. Kong, Z. Zhou, Real-time lifetime prediction method based on
distribution, Technometrics 35 (1993) 161–174. wavelet support vector regression and fuzzy c-means clustering, Acta Autom. Sin.
[24] N. Gebraeel, A. Elwany, J. Pan, Residual life predictions in the absence of prior 38 (2012) 331–340.
degradation knowledge, IEEE Trans. Reliab. 58 (2009) 106–117. [59] C. Shen, D. Wang, F. Kong, W.T. Peter, Fault diagnosis of rotating machinery
[25] H. Kim, J.T. Kim, G. Heo, Prognostics for integrity of steam generator tubes using based on the statistical parameters of wavelet packet paving and a generic
the general path model, Nucl. Eng. Technol. 50 (2018) 88–96. support vector regressive classifier, Measurement 46 (2013) 1551–1564.
[26] N. Gebraeel, Sensory-updated residual life distributions for components with [60] J. Liu, R. Seraoui, V. Vitelli, E. Zio, Nuclear power plant components condition
exponential degradation patterns, IEEE Trans. Autom. Sci. Eng. 3 (2006) monitoring by probabilistic support vector machine, Ann. Nucl. Energy 56 (2013)
382–393. 23–33.
[27] Y. Wen, J. Wu, Y. Yuan, Multiple-phase modeling of degradation signal for [61] H.-Z. Huang, H.-K. Wang, Y.-F. Li, L. Zhang, Z. Liu, Support vector machine based
condition monitoring and remaining useful life prediction, IEEE Trans. Reliab. 66 estimation of remaining useful life: current research status and future trends,
(2017) 924–938. J. Mech. Sci. Technol. 29 (2015) 151–163.
[28] R. Zhou, N. Serban, N. Gebraeel, Degradation-based residual life prediction under [62] P. Kundu, A.K. Darpe, M.S. Kulkarni, An ensemble decision tree methodology for
different environments, Ann. Appl. Stat. (2014) 1671–1689. remaining useful life prediction of spur gears under natural pitting progression,
[29] N. Chen, K.L. Tsui, Condition monitoring and remaining useful life prediction Struct. Health Monit. 19 (2020) 854–872.
using degradation signals: revisited, IIE Trans. 45 (2013) 939–952. [63] L. Wang, D. Zhou, H. Zhang, W. Zhang, J. Chen, Application of relative entropy
[30] G.A. Whitmore, Estimating degradation by a Wiener diffusion process subject to and gradient boosting decision tree to fault prognosis in electronic circuits,
measurement error, Lifetime Data Anal. 1 (1995) 307–319. Symmetry 10 (2018) 495.
[31] X. Wang, Wiener processes with random effects for degradation data, [64] M. Ferguson, R. Ak, Y.-T.T. Lee, K.H. Law, Automatic localization of casting
J. Multivariate Anal. 101 (2010) 340–351. defects with convolutional neural networks, in: 2017 IEEE international
[32] Y. Wen, J. Wu, D. Das, T.-L.-B. Tseng, Degradation modeling and RUL prediction conference on big data (big data), IEEE, 2017, pp. 1726–1735.
using Wiener process subject to multiple change points and unit heterogeneity, [65] M.K. Ferguson, A. Ronay, Y.-T.T. Lee, K.H. Law, Detection and segmentation of
Reliab. Eng. Syst. Saf. 176 (2018) 113–124. manufacturing defects with convolutional neural networks and transfer learning,
[33] X.-S. Si, W. Wang, C.-H. Hu, D.-H. Zhou, M.G. Pecht, Remaining useful life Smart Sustain. Manuf. Syst. 2 (2018).
estimation based on a nonlinear diffusion degradation process, IEEE Trans. [66] M.F. Rahman, J. Wu, T.L.B. Tseng, Automatic morphological extraction of fibers
Reliab. 61 (2012) 50–67. from SEM images for quality control of short fiber-reinforced composites
[34] Z.-S. Ye, Y. Wang, K.-L. Tsui, M. Pecht, Degradation data analysis using Wiener manufacturing, CIRP J. Manuf. Sci. Technol. 33 (2021) 176–187.
processes with measurement errors, IEEE Trans. Reliab. 62 (2013) 772–780. [67] W. Hou, Y. Wei, J. Guo, Y. Jin, Automatic detection of welding defects using deep
[35] X.-S. Si, W. Wang, C.-H. Hu, M.-Y. Chen, D.-H. Zhou, A Wiener-process-based neural network, J. Phys.: Conf. Ser., IOP Publishing (2017), 012006.
degradation model with a recursive filter algorithm for remaining useful life [68] Z. Huang, Z. Pan, B. Lei, Transfer learning with deep convolutional neural
estimation, Mech. Syst. Sig. Process. 35 (2013) 219–237. network for SAR target classification with limited labeled data, Remote Sensing 9
[36] C.-Y. Peng, S.-T. Tseng, Mis-specification analysis of linear degradation models, (2017) 907.
IEEE Trans. Reliab. 58 (2009) 444–455. [69] M.F. Rahman, T.-L.B. Tseng, M. Pokojovy, W. Qian, B. Totada, H. Xu, An
[37] H. Wang, X. Ma, Y. Zhao, An improved Wiener process model with adaptive drift automatic approach to lung region segmentation in chest x-ray images using
and diffusion for online remaining useful life prediction, Mech. Syst. Sig. Process. adapted U-Net architecture, Medical Imaging 2021: Physics of Medical Imaging,
127 (2019) 370–387. International Society for Optics and Photonics, 2021, pp. 115953I.
[38] Z.-S. Ye, Y. Shen, M. Xie, Degradation-based burn-in with preventive [70] M.F. Rahman, Y. Wen, H. Xu, T.-L.B. Tseng, S. Akundi, Data mining in
maintenance, Eur. J. Oper. Res. 221 (2012) 360–367. telemedicine, Adv. Telemed. Health Monit. (2020) 103.
[39] Z. Zhang, X. Si, C. Hu, Y. Lei, Degradation data analysis and remaining useful life [71] L. Wen, L. Gao, X. Li, B. Zeng, Convolutional neural network with automatic
estimation: a review on Wiener-process-based methods, Eur. J. Oper. Res. 271 learning rate scheduler for fault classification, IEEE Trans. Instrum. Meas. 70
(2018) 775–796. (2021) 1–12.
[40] Q. Dong, L. Cui, A study on stochastic degradation process models under different [72] G.S. Babu, P. Zhao, X.-L. Li, Deep convolutional neural network based regression
types of failure thresholds, Reliab. Eng. Syst. Saf. 181 (2019) 202–212. approach for estimation of remaining useful life, International conference on
[41] J.M. van Noortwijk, A survey of the application of gamma processes in database systems for advanced applications, Springer, 2016, pp. 214–228.
maintenance, Reliab. Eng. Syst. Saf. 94 (2009) 2–21. [73] L. Ren, Y. Sun, H. Wang, L. Zhang, Prediction of bearing remaining useful life
[42] R.R. Richardson, M.A. Osborne, D.A. Howey, Gaussian process regression for with deep convolution neural network, IEEE Access 6 (2018) 13041–13049.
forecasting battery state of health, J. Power Sources 357 (2017) 209–219. [74] B. Yang, R. Liu, E. Zio, Remaining useful life prediction based on a double-
[43] P. Boškoski, M. Gašperin, D. Petelin, Đ. Juričić, Bearing fault prognostics using convolutional neural network architecture, IEEE Trans. Ind. Electron. 66 (2019)
Rényi entropy based features and Gaussian process models, Mech. Syst. Sig. 9521–9530.
Process. 52 (2015) 327–337. [75] S. Kiranyaz, A. Gastli, L. Ben-Brahim, N. Al-Emadi, M. Gabbouj, Real-time fault
[44] S.A. Aye, P.S. Heyns, An integrated Gaussian process regression for prediction of detection and identification for MMC using 1-D convolutional neural networks,
remaining useful life of slow speed bearings based on acoustic emission, Mech. IEEE Trans. Ind. Electron. 66 (2018) 8760–8771.
Syst. Sig. Process. 84 (2017) 485–498.
17
Y. Wen et al. Measurement 187 (2022) 110276
[76] J. Zhu, N. Chen, W. Peng, Estimation of bearing remaining useful life based on [106] H. Zhang, Q. Zhang, S. Shao, T. Niu, X. Yang, H. Ding, Sequential network with
multiscale convolutional neural network, IEEE Trans. Ind. Electron. 66 (2018) residual neural network for rotatory machine remaining useful life prediction
3208–3216. using deep transfer learning, Shock Vib. 2020 (2020).
[77] H. Li, W. Zhao, Y. Zhang, E. Zio, Remaining useful life prediction using multi- [107] Z. Zhang, Y. Wang, K. Wang, Fault diagnosis and prognosis using wavelet packet
scale deep convolutional neural network, Appl. Soft Comput. 89 (2020), 106113. decomposition, Fourier transform and artificial neural network, J. Intell. Manuf.
[78] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural Comput. 9 24 (2013) 1213–1227.
(1997) 1735–1780. [108] Y. Wang, G. Xu, L. Liang, K. Jiang, Detection of weak transient signals based on
[79] K. Cho, B. Van Merriënboer, C. Gulcehre, D. Bahdanau, F. Bougares, H. Schwenk, wavelet packet transform and manifold learning for rolling element bearing fault
Y. Bengio, Learning phrase representations using RNN encoder-decoder for diagnosis, Mech. Syst. Sig. Process. 54 (2015) 259–276.
statistical machine translation, arXiv preprint arXiv:1406.1078, 2014. [109] G. Bin, J. Gao, X. Li, B. Dhillon, Early fault diagnosis of rotating machinery based
[80] Y. Song, L. Li, Y. Peng, D. Liu, Lithium-Ion Battery Remaining Useful Life on wavelet packets—Empirical mode decomposition feature extraction and
Prediction Based on GRU-RNN, in: 2018 12th International Conference on neural network, Mech. Syst. Sig. Process. 27 (2012) 696–711.
Reliability, Maintainability, and Safety (ICRMS), IEEE, 2018, pp. 317–322. [110] J.-X. Zhang, D.-B. Du, X.-S. Si, C.-H. Hu, H.-W. Zhang, Joint optimization of
[81] J. Chen, H. Jing, Y. Chang, Q. Liu, Gated recurrent unit based recurrent neural preventive maintenance and inventory management for standby systems with
network for remaining useful life prediction of nonlinear deterioration process, hybrid-deteriorating spare parts, Reliab. Eng. Syst. Saf. 214 (2021), 107686.
Reliab. Eng. Syst. Saf. 185 (2019) 372–382. [111] J. Cai, Y. Yin, L. Zhang, X. Chen, Joint optimization of preventive maintenance
[82] J. Wang, J. Yan, C. Li, R.X. Gao, R. Zhao, Deep heterogeneous GRU model for and spare parts inventory with appointment policy, Math. Probl. Eng. 2017
predictive analytics in smart manufacturing: application to tool wear prediction, (2017).
Comput. Ind. 111 (2019) 1–14. [112] Y. Jiang, M. Chen, D. Zhou, Joint optimization of preventive maintenance and
[83] F.O. Heimes, Recurrent neural networks for remaining useful life estimation, inventory policies for multi-unit systems subject to deteriorating spare part
IEEE, pp. 1–6. inventory, J. Manuf. Syst. 35 (2015) 191–205.
[84] J. Liu, A. Saxena, K. Goebel, B. Saha, W. Wang, An adaptive recurrent neural [113] J.-X. Zhang, X.-S. Si, D.-B. Du, C.-H. Hu, C. Hu, A novel iterative approach of
network for remaining useful life prediction of lithium-ion batteries, National lifetime estimation for standby systems with deteriorating spare parts, Reliab.
Aeronautics And Space Administration Moffett Field CA Ames Research … Eng. Syst. Saf. 201 (2020), 106960.
(2010). [114] H. Jia, Y. Ding, R. Peng, Y. Song, Reliability evaluation for demand-based warm
[85] L. Guo, N. Li, F. Jia, Y. Lei, J. Lin, A recurrent neural network based health standby systems considering degradation process, IEEE Trans. Reliab. 66 (2017)
indicator for remaining useful life prediction of bearings, Neurocomputing 240 795–805.
(2017) 98–109. [115] P. Nectoux, R. Gouriveau, K. Medjaher, E. Ramasso, B. Chebel-Morello, N.
[86] Y. Zhang, R. Xiong, H. He, M.G. Pecht, Long short-term memory recurrent neural Zerhouni, C. Varnier, PRONOSTIA: An experimental platform for bearings
network for remaining useful life prediction of lithium-ion batteries, IEEE Trans. accelerated degradation tests.
Veh. Technol. 67 (2018) 5695–5705. [116] R.M. Mortier, S.T. Orszulik, M.F. Fox, Chemistry and Technology of Lubricants,
[87] S. Zhao, Y. Zhang, S. Wang, B. Zhou, C. Cheng, A recurrent neural network Springer, 2010.
approach for remaining useful life prediction utilizing a novel trend features [117] Y. Du, T. Wu, J. Cheng, R. Gong, Lubricating oil deterioration on a four-ball test
construction method, Measurement 146 (2019) 279–288. rig via on-line monitoring, in: Proceedings of Malaysian international tribology
[88] S. Xiang, Y. Qin, C. Zhu, Y. Wang, H. Chen, Long short-term memory neural conference, 2015, pp. 185–186.
network with weight amplification and its application into gear remaining useful [118] Y. Du, T. Wu, S. Zhou, V. Makis, Remaining useful life prediction of lubricating oil
life prediction, Eng. Appl. Artif. Intell. 91 (2020), 103587. with dynamic principal component analysis and proportional hazards model,
[89] A. Zhang, H. Wang, S. Li, Y. Cui, Z. Liu, G. Yang, J. Hu, Transfer learning with Proc. Inst. Mech. Eng., Part J: J. Eng. Tribol. 234 (2020) 964–971.
deep recurrent neural networks for remaining useful life estimation, Appl. Sci. 8 [119] Y. Lei, N. Li, S. Gontarz, J. Lin, S. Radkowski, J. Dybala, A model-based method
(2018) 2416. for remaining useful life prediction of machinery, IEEE Trans. Reliab. 65 (2016)
[90] J. Wang, G. Wen, S. Yang, Y. Liu, Remaining useful life estimation in prognostics 1314–1326.
using deep bidirectional lstm neural network, in: 2018 Prognostics and System [120] S. Hong, Z. Zhou, C. Lu, B. Wang, T. Zhao, Bearing remaining life prediction using
Health Management Conference (PHM-Chongqing), IEEE, 2018, pp. 1037–1042. Gaussian process regression with composite kernel functions, J. Vibroeng. 17
[91] A. Elsheikh, S. Yacout, M.-S. Ouali, Bidirectional handshaking LSTM for (2015) 695–704.
remaining useful life prediction, Neurocomputing 323 (2019) 148–156. [121] Z. Liu, M.J. Zuo, Y. Qin, Remaining useful life prediction of rolling element
[92] S. Xiang, Y. Qin, J. Luo, H. Pu, B. Tang, Multicellular LSTM-based deep learning bearings based on health state assessment, Proc. Inst. Mech. Eng., Part C: J. Mech.
model for aero-engine remaining useful life prediction, Reliab. Eng. Syst. Saf. 216 Eng. Sci. 230 (2016) 314–330.
(2021), 107927. [122] H. Wang, Y. Zhao, X. Ma, Remaining useful life prediction using a novel two-stage
[93] Q. An, Z. Tao, X. Xu, M. El Mansori, M. Chen, A data-driven model for milling tool wiener process with stage correlation, IEEE Access 6 (2018) 65227–65238.
remaining useful life prediction with convolutional and stacked LSTM network, [123] W. Ahmad, S.A. Khan, M.M. Islam, J.-M. Kim, A reliable technique for remaining
Measurement 154 (2020), 107461. useful life estimation of rolling element bearings using dynamic regression
[94] Y. Wang, H. Yao, S. Zhao, Auto-encoder based dimensionality reduction, models, Reliab. Eng. Syst. Saf. 184 (2019) 67–76.
Neurocomputing 184 (2016) 232–242. [124] P. Kundu, S. Chopra, B.K. Lad, Multiple failure behaviors identification and
[95] J. Ma, H. Su, W.-L. Zhao, B. Liu, Predicting the remaining useful life of an aircraft remaining useful life prediction of ball bearings, J. Intell. Manuf. 30 (2019)
engine using a stacked sparse autoencoder with multilayer self-learning, 1795–1807.
Complexity 2018 (2018). [125] M. Zhao, B. Tang, Q. Tan, Bearing remaining useful life estimation based on
[96] Y. Song, G. Shi, L. Chen, X. Huang, T. Xia, Remaining useful life prediction of time–frequency representation and supervised dimensionality reduction,
turbofan engine using hybrid model based on autoencoder and bidirectional long Measurement 86 (2016) 41–55.
short-term memory, J. Shanghai Jiaotong Univ. (Science) 23 (2018) 85–94. [126] L. Xiao, X. Chen, X. Zhang, M. Liu, A novel approach for bearing remaining useful
[97] C. Su, L. Li, Z. Wen, Remaining useful life prediction via a variational autoencoder life estimation under neither failure nor suspension histories condition, J. Intell.
and a time-window-based sequence neural network, Qual. Reliab. Eng. Int. Manuf. 28 (2017) 1893–1914.
(2020). [127] A.R. Bastami, A. Aasi, H.A. Arghand, Estimation of remaining useful life of rolling
[98] W. Mao, J. He, M.J. Zuo, Predicting remaining useful life of rolling bearings based element bearings using wavelet packet decomposition and artificial neural
on deep feature representation and transfer learning, IEEE Trans. Instrum. Meas. network, Iranian J. Sci. Technol., Trans. Electr. Eng. 43 (2019) 233–245.
69 (2020) 1594–1608. [128] W. Mao, J. He, J. Tang, Y. Li, Predicting remaining useful life of rolling bearings
[99] M. Xia, T. Li, T. Shu, J. Wan, C.W. De Silva, Z. Wang, A two-stage approach for the based on deep feature representation and long short-term memory neural
remaining useful life prediction of bearings using deep neural networks, IEEE network, Adv. Mech. Eng. 10 (2018), 1687814018817184.
Trans. Ind. Inf. 15 (2018) 3703–3711. [129] G. Tang, Y. Zhou, H. Wang, G. Li, Prediction of bearing performance degradation
[100] W. Peng, Z.-S. Ye, N. Chen, Bayesian deep-learning-based health prognostics with bottleneck feature based on LSTM network, in: 2018 IEEE International
toward prognostics uncertainty, IEEE Trans. Ind. Electron. 67 (2019) 2283–2293. Instrumentation and Measurement Technology Conference (I2MTC), IEEE, 2018,
[101] M. Kim, K. Liu, A Bayesian deep learning framework for interval estimation of pp. 1–6.
remaining useful life in complex systems by incorporating general degradation [130] L. Ren, X. Cheng, X. Wang, J. Cui, L. Zhang, Multi-scale dense gate recurrent unit
characteristics, IISE Trans. (2020) 1–23. networks for bearing remaining useful life prediction, Future Gen. Comput. Syst.
[102] G. Li, L. Yang, C.-G. Lee, X. Wang, M. Rong, A Bayesian deep learning RUL 94 (2019) 601–609.
framework integrating epistemic and aleatoric uncertainties, IEEE Trans. Ind. [131] X. Li, W. Zhang, Q. Ding, Deep learning-based remaining useful life estimation of
Electron. (2020). bearings using multi-scale feature extraction, Reliab. Eng. Syst. Saf. 182 (2019)
[103] S.J. Pan, Q. Yang, A survey on transfer learning, IEEE Trans. Knowl. Data Eng. 22 208–218.
(2009) 1345–1359. [132] Q. Wu, C. Zhang, Cascade fusion convolutional long-short time memory network
[104] C. Sun, M. Ma, Z. Zhao, S. Tian, R. Yan, X. Chen, Deep transfer learning based on for remaining useful life prediction of rolling bearing, IEEE Access 8 (2020)
sparse autoencoder for remaining useful life prediction of tool in manufacturing, 32957–32965.
IEEE Trans. Ind. Inf. 15 (2018) 2416–2425. [133] C.-C. Lo, C.-H. Lee, W.-C. Huang, Prognosis of bearing and gear wears using
[105] Y. Fan, S. Nowaczyk, T. Rögnvaldsson, Transfer learning for remaining useful life convolutional neural network with hybrid loss function, Sensors 20 (2020) 3539.
prediction based on consensus self-organizing models, Reliab. Eng. Syst. Saf. 203 [134] M. Tanwar, N. Raghavan, Lubricating oil degradation modeling and prognostics
(2020), 107098. using the Wiener process, in: 2019 International Conference on Sensing,
Diagnostics, Prognostics, and Control (SDPC), IEEE, 2019, pp. 601–605.
18
Y. Wen et al. Measurement 187 (2022) 110276
[135] M. Tanwar, N. Raghavan, Lubricating oil remaining useful life prediction using [166] J.I. Aizpurua, S.D.J. McArthur, B.G. Stewart, B. Lambert, J.G. Cross, V.
multi-output gaussian process regression, IEEE Access 8 (2020) 128897–128907. M. Catterson, Adaptive power transformer lifetime predictions through machine
[136] V.T. Le, C.P. Lim, S. Mohamed, S. Nahavandi, L. Yen, G.E. Gallasch, S. Baker, D. learning and uncertainty modeling in nuclear power plants, IEEE Trans. Ind.
Ludovici, N. Draper, V. Wickramanayake, Condition monitoring of engine Electron. 66 (2018) 4726–4737.
lubrication oil of military vehicles: a machine learning approach, in: 17th [167] H.-P. Nguyen, P. Baraldi, E. Zio, Ensemble empirical mode decomposition and
Australian International Aerospace Congress: AIAC 2017, Engineers Australia, long short-term memory neural network for multi-step predictions of time series
Royal Aeronautical Society, 2017, pp. 718. signals in nuclear power plants, Appl. Energy 116346 (2020).
[137] A. Saxena, K. Goebel, D. Simon, N. Eklund, Damage propagation modeling for [168] Y. Li, K. Liu, A.M. Foley, A. Zülke, M. Berecibar, E. Nanini-Maury, J. Van Mierlo,
aircraft engine run-to-failure simulation, in: 2008 international conference on H.E. Hoster, Data-driven health estimation and lifetime prediction of lithium-ion
prognostics and health management, IEEE, 2008, pp. 1–9. batteries: a review, Renew. Sustain. Energy Rev. 113 (2019), 109254.
[138] A. Chehade, S. Bonk, K. Liu, Sensory-based failure threshold estimation for [169] B. Saha, K. Goebel, Battery data set, NASA AMES prognostics data repository,
remaining useful life prediction, IEEE Trans. Reliab. 66 (2017) 939–949. 2007.
[139] M. Kim, C. Song, K. Liu, A generic health index approach for multisensor [170] P. Michael, Battery Data Set, CALCE Battery Research Group, Maryland, MD,
degradation modeling and sensor selection, IEEE Trans. Autom. Sci. Eng. 16 2017, 2017, pp. https://web.calce.umd.edu/batteries/index.html.
(2019) 1426–1437. [171] W. He, N. Williard, M. Osterman, M. Pecht, Prognostics of lithium-ion batteries
[140] C. Song, K. Liu, Statistical degradation modeling and prognostics of multiple based on Dempster-Shafer theory and the Bayesian Monte Carlo method, J. Power
sensor signals via data fusion: a composite health index approach, IISE Trans. 50 Sources 196 (2011) 10314–10321.
(2018) 853–867. [172] Q. Zhai, Z.-S. Ye, RUL prediction of deteriorating products using an adaptive
[141] A. Chehade, C. Song, K. Liu, A. Saxena, X. Zhang, A data-level fusion approach for Wiener process model, IEEE Trans. Ind. Inf. 13 (2017) 2911–2921.
degradation modeling and prognostic analysis under multiple failure modes, [173] Y. Shen, L. Shen, W. Xu, A Wiener-based degradation model with logistic
J. Qual. Technol. 50 (2018) 150–165. distributed measurement errors and remaining useful life estimation, Qual.
[142] N. Li, Y. Lei, T. Yan, N. Li, T. Han, A Wiener-process-model-based method for Reliab. Eng. Int. 34 (2018) 1289–1303.
remaining useful life prediction considering unit-to-unit variability, IEEE Trans. [174] H. Wang, X. Ma, Y. Zhao, A mixed-effects model of two-phase degradation process
Ind. Electron. 66 (2018) 2092–2101. for reliability assessment and RUL prediction, Microelectron. Reliab. 107 (2020),
[143] K. Le Son, M. Fouladirad, A. Barros, Remaining useful lifetime estimation and 113622.
noisy gamma deterioration process, Reliab. Eng. Syst. Saf. 149 (2016) 76–87. [175] Z.-X. Zhang, X.-S. Si, C.-H. Hu, M.G. Pecht, A prognostic model for stochastic
[144] C. Ordóñez, F.S. Lasheras, J. Roca-Pardiñas, F.J. de Cos Juez, A hybrid degrading systems with state recovery: Application to Li-ion batteries, IEEE Trans.
ARIMA–SVM model for the study of the remaining useful life of aircraft engines, Reliab. 66 (2017) 1293–1308.
J. Comput. Appl. Math. 346 (2019) 184–191. [176] X.-S. Si, An adaptive prognostic approach via nonlinear degradation modeling:
[145] A. Al-Dulaimi, S. Zabihi, A. Asif, A. Mohammadi, A multimodal and hybrid deep application to battery data, IEEE Trans. Ind. Electron. 62 (2015) 5082–5096.
neural network model for remaining useful life estimation, Comput. Ind. 108 [177] L. Chen, Y. Zhang, Y. Zheng, X. Li, X. Zheng, Remaining useful life prediction of
(2019) 186–196. lithium-ion battery with optimal input sequence selection and error
[146] S. Zheng, K. Ristovski, A. Farahat, C. Gupta, Long short-term memory network for compensation, Neurocomputing 414 (2020) 245–254.
remaining useful life estimation, in: 2017 IEEE international conference on [178] Z. Xue, Y. Zhang, C. Cheng, G. Ma, Remaining useful life prediction of lithium-ion
prognostics and health management (ICPHM), IEEE, 2017, pp. 88–95. batteries with adaptive unscented kalman filter and optimized support vector
[147] L. Wen, Y. Dong, L. Gao, A new ensemble residual convolutional neural network regression, Neurocomputing 376 (2020) 95–102.
for remaining useful life estimation, Math. Biosci. Eng 16 (2019) 862–880. [179] P. Khumprom, N. Yodo, A data-driven predictive prognostic model for lithium-ion
[148] X. Chen, H. Wang, J. Huang, H. Ren, APU degradation prediction based on EEMD batteries based on a deep learning algorithm, Energies 12 (2019) 660.
and Gaussian process regression, IEEE, pp. 98–104. [180] L. Ren, L. Zhao, S. Hong, S. Zhao, H. Wang, L. Zhang, Remaining useful life
[149] X. Liu, L. Liu, L. Wang, Q. Guo, X. Peng, Performance sensing data prediction for prediction for lithium-ion battery: a deep learning approach, IEEE Access 6
an aircraft auxiliary power unit using the optimized extreme learning machine, (2018) 50587–50598.
Sensors 19 (2019) 3935. [181] R. Jiao, K. Peng, J. Dong, Remaining useful life prediction of lithium-ion batteries
[150] F. Wang, J. Sun, X. Liu, C. Liu, Aircraft auxiliary power unit performance based on conditional variational autoencoders-particle filter, IEEE Trans. Instrum.
assessment and remaining useful life evaluation for predictive maintenance, Proc. Meas. (2020).
Inst. Mech. Eng., Part A: J. Power Energy 234 (2020) 804–816. [182] J.F. Murray, G.F. Hughes, K. Kreutz-Delgado, Machine learning methods for
[151] Y. Zhang, Y. Peng, P. Wang, L. Wang, S. Wang, H. Liao, Aircraft APU failure rate predicting failures in hard drives: a multiple-instance application, J. Mach. Learn.
prediction based on improved Weibull-based GRP, IEEE, pp. 1–6. Res. 6 (2005) 783–816.
[152] Y. Zhang, D. Liu, J. Yu, Y. Peng, X. Peng, EMA remaining useful life prediction [183] B. Zhu, G. Wang, X. Liu, D. Hu, S. Lin, J. Ma, Proactive drive failure prediction for
with weighted bagging GPR algorithm, Microelectron. Reliab. 75 (2017) large scale storage systems, IEEE, pp. 1–5.
253–263. [184] N. Aussel, S. Jaulin, G. Gandon, Y. Petetin, E. Fazli, S. Chabridon, Predictive
[153] Y. Zhang, L. Liu, Y. Peng, D. Liu, An Electro-Mechanical Actuator motor voltage models of hard drive failures based on operational data, IEEE, pp. 619–625.
estimation method with a feature-aided Kalman Filter, Sensors 18 (2018) 4190. [185] C. Xu, G. Wang, X. Liu, D. Guo, T. Liu, Health status assessment and failure
[154] R. Guo, Z. Liu, J. Wang, Remaining useful life prediction for the electro-hydraulic prediction for hard drives with recurrent neural networks, IEEE Trans. Comput.
actuator based on improved relevance vector machine, Proc. Inst. Mech. Eng. Part 65 (2016) 3502–3508.
I: J. Syst. Control Eng. 234 (2020) 501–511. [186] J. Li, R.J. Stones, G. Wang, X. Liu, Z. Li, M. Xu, Hard drive failure prediction using
[155] A. Stetco, F. Dinmohammadi, X. Zhao, V. Robu, D. Flynn, M. Barnes, J. Keane, decision trees, Reliab. Eng. Syst. Saf. 164 (2017) 55–65.
G. Nenadic, Machine learning methods for wind turbine condition monitoring: a [187] F.D.S. Lima, F.L.F. Pereira, L.G.M. Leite, J.P.P. Gomes, J.C. Machado, Remaining
review, Renewable Energy 133 (2019) 620–635. useful life estimation of hard disk drives based on deep neural networks, IEEE, pp.
[156] J. Carroll, S. Koukoura, A. McDonald, A. Charalambous, S. Weiss, S. McArthur, 1–7.
Wind turbine gearbox failure and remaining useful life prediction using machine [188] A. Ruiz-Tagle Palazuelos, E.L. Droguett, R. Pascual, A novel deep capsule neural
learning techniques, Wind Energy 22 (2019) 360–375. network for remaining useful life estimation, Proc. Inst. Mech. Eng. Part O: J. Risk
[157] Z. Song, Z. Zhang, Y. Jiang, J. Zhu, Wind turbine health state monitoring based on Reliab. 234 (2020) 151–167.
a Bayesian data-driven approach, Renewable Energy 125 (2018) 172–181. [189] C. Zhang, P. Lim, A.K. Qin, K.C. Tan, Multiobjective deep belief networks
[158] F. Cheng, L. Qu, W. Qiao, L. Hao, Enhanced particle filtering for bearing ensemble for remaining useful life estimation in prognostics, IEEE Trans. Neural
remaining useful life prediction of wind turbine drivetrain gearboxes, IEEE Trans. Networks Learn. Syst. 28 (2017) 2306–2318.
Ind. Electron. 66 (2019) 4738–4748. [190] Y. Liao, L. Zhang, C. Liu, Uncertainty prediction of remaining useful life using
[159] Y. Hu, H. Li, P. Shi, Z. Chai, K. Wang, X. Xie, Z. Chen, A prediction method for the long short-term memory network based on bootstrap method, in: 2018 IEEE
real-time remaining useful life of wind turbine bearings based on the Wiener International Conference on Prognostics and Health Management (ICPHM), IEEE,
process, Renewable Energy 127 (2018) 452–460. 2018, pp. 1–8.
[160] J.S. Nielsen, J.D. Sørensen, Bayesian estimation of remaining useful life for wind [191] X. Li, Q. Ding, J.-Q. Sun, Remaining useful life estimation in prognostics using
turbine blades, Energies 10 (2017) 664. deep convolution neural networks, Reliab. Eng. Syst. Saf. 172 (2018) 1–11.
[161] L. Saidi, J.B. Ali, E. Bechhoefer, M. Benbouzid, Wind turbine high-speed shaft [192] A.L. Ellefsen, E. Bjørlykhaug, V. Æsøy, S. Ushakov, H. Zhang, Remaining useful
bearings health prognosis through a spectral Kurtosis-derived indices and SVR, life predictions for turbofan engine degradation using semi-supervised deep
Appl. Acoust. 120 (2017) 1–8. architecture, Reliab. Eng. Syst. Saf. 183 (2019) 240–251.
[162] H.D.M. de Azevedo, A.M. Araújo, N. Bouchonneau, A review of wind turbine [193] G. Hou, S. Xu, N. Zhou, L. Yang, Q. Fu, Remaining useful life estimation using
bearing condition monitoring: state of the art and challenges, Renew. Sustain. deep convolutional generative adversarial networks based on an autoencoder
Energy Rev. 56 (2016) 368–379. scheme, Comput. Intell. Neurosci. 2020 (2020).
[163] J.P. Salameh, S. Cauet, E. Etien, A. Sakout, L. Rambault, Gearbox condition [194] M. Kim, K. Liu, A Bayesian deep learning framework for interval estimation of
monitoring in wind turbines: a review, Mech. Syst. Sig. Process. 111 (2018) remaining useful life in complex systems by incorporating general degradation
251–264. characteristics, IISE Trans. 53 (2020) 326–340.
[164] H. Wang, M.-J. Peng, Y.-K. Liu, S.-W. Liu, R.-Y. Xu, H. Saeed, Remaining useful
life prediction techniques of electric valves for nuclear power plants with
convolution kernel and LSTM, Sci. Technol. Nucl. Install. 2020 (2020).
[165] H. Wang, M.-J. Peng, Z. Miao, Y.-K. Liu, A. Ayodeji, C. Hao, Remaining useful life
prediction techniques for electric valves based on convolution auto encoder and
long short term memory, ISA Trans. (2020).
[195] L. Ren, J. Cui, Y. Sun, X. Cheng, Multi-bearing remaining useful life collaborative prediction: a deep learning approach, J. Manuf. Syst. 43 (2017) 248–256.
[196] L. Liao, F. Köttig, A hybrid framework combining data-driven and model-based methods for system remaining useful life prediction, Appl. Soft Comput. 44 (2016) 191–199.
[197] M. Djeziri, S. Benmoussa, R. Sanchez, Hybrid method for remaining useful life prediction in wind turbine systems, Renewable Energy 116 (2018) 173–187.