(Article) Sim Et Al - A Python Based Tutorial On Prognostics

Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.

1007/s12206-022-0728-z

Journal of Mechanical Science and Technology 36 (8) 2022


Original Article
DOI 10.1007/s12206-022-0728-z
A python based tutorial on prognostics
and health management using vibration
Keywords:
· Prognostics and health management
(PHM)
signal: signal processing, feature
· Python
· MATLAB
extraction and feature selection
· Tutorial
· Bearing Jinwoo Sim1, Jinhong Min1, Doyeon Kim1, Seong Hee Cho1, Seokgoo Kim2 and Joo-Ho Choi3
· Gear
1
· Signal processing Department of Aerospace and Mechanical Engineering, Korea Aerospace University, Goyang-si,
· Feature engineering 2
Gyeonggi-do, Korea, Department of Mechanical and Aerospace Engineering, University of Florida,
3
Gainesville, Florida, United States, School of Aerospace and Mechanical Engineering, Korea Aerospace
University, Goyang-si, Gyeonggi-do, Korea
Correspondence to:
Joo-Ho Choi
jhchoi@kau.ac.kr
Abstract MATLAB is a convenient and well-established engineering tool used by many
researchers and engineers in implementing the prognostics and health management (PHM).
Citation:
Sim, J., Min, J., Kim, D., Cho, S. H., Kim,
Recently however, Python has emerged as a new language platform for the same purpose due
S., Choi, J.-H. (2022). A python based to its advantages of free access, high extensibility and plenty libraries. This paper provides a
tutorial on prognostics and health man- Python tutorial to aid the beginners in the PHM to implement the signal processing and feature
agement using vibration signal: signal
processing, feature extraction and feature
engineering using the open access data of gears and bearings. The Python codes are provided
selection. Journal of Mechanical Science at the web page www.kau-sdol.com so that they produce the same results as the MATLAB
and Technology 36 (8) (2022) 4083~4097. codes. As such, they are reliable as well as of practical value to those who want to learn how to
http://doi.org/10.1007/s12206-022-0728-z
implement the PHM by Python or to migrate from the MATLAB to Python.

Received December 26th, 2021


Revised March 2nd, 2022
Accepted April 14th, 2022 1. Introduction
Prognostics and health management (PHM) is an engineering discipline that evaluates the
† Recommended by Editor
Hyun Kyu Kim health of a system using the sensors data and predicts the remaining useful life (RUL) to en-
able the failure prevention while maximizing its service life. Due to the intense attention to this
subject, great number of literature have been published over the last decades with diverse as-
pects and applications. In general, the PHM technique consists of the four steps: (1) signal
processing that removes the unnecessary signals from the raw sensor data, (2) feature engi-
neering to extract or select a finite number of valuable information, (3) diagnosis to identify the
fault mode and its severity, and (4) prognosis to predict the RUL until failure [1].
Though the highlights of the PHM are the diagnosis and prognosis, their performances are
greatly influenced by the first two steps: the signal processing and the feature engineering.
Great number of research as well as review papers have been published toward this topic.
However, less attention was given to the tutorial aspects, which are valuable for the beginners
to implement the process by themselves. Motivated by this, the authors have recently pub-
lished a series of papers that presents the tutorial for the signal processing, feature engineering
and diagnosis for the gears and bearings that are implemented in MATLAB environment [1-3].
While the MATLAB is used extensively for many engineering applications including the PHM
due to the excellent computing environment and outstanding toolboxes, the drawback is that it
is commercial software which limits its use in the industries and academia. Recently, Python
has emerged as another language platform that can serve the similar purpose which is free to
use and featured by high extensibility and libraries. In the field of PHM, the use of Python is
also observed with increase in volume as shown in Fig. 1, which is the survey result of Google
scholar search. While the main reason for the increase may be due to the emerging popularity
© The Korean Society of Mechanical
Engineers and Springer-Verlag GmbH
of deep learning by Python, it may indicate that the demand for Python may be also strong for
Germany, part of Springer Nature 2022 the signal processing and features engineering.

4083
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

Table 1. Python libraries and version. Table 2. Dataset description.

Name Version Site Sampling rate


Name of dataset Data type Number of data
Python 3.9.6 https://www.python.org/ [Hz]

MatPlotlib 3.4.3 https://matplotlib.org/ Normal Fault


HS gear Diagnosis 97656
Nitime 0.9 http://nipy.org/nitime/ 6 11

Numpy 1.20.0 https://numpy.org/ Normal Outer Inner


CWRU bearing Diagnosis 48000
Pandas 1.3.4 https://pandas.pydata.org/ 10 10 10

Scipy 1.7.2 https://www.scipy.org/ IMS bearing Prognosis 20480 1 run to fail data

PyEMD 1.0.0 https://pyemd.readthedocs.io

2. Dataset explanation
In the study of the signal processing and feature engineering,
the vibration data of the gears and bearings are usually used
since they are one of the critical components responsible for
the failure. There are a number of public datasets for this pur-
pose including those by the authors, and some of them were
used in the previous tutorial papers [1, 3] to practice the PHM
conducted by the MATLAB in that time. The same datasets are
considered as shown in Table 2, which are the vibrations ac-
quired by the accelerometer. Among these, the diagnosis data
consist of the normal and fault conditions, whereas the progno-
sis data are obtained regularly from the beginning to the end
Fig. 1. Number of research for PHM by Python and MATLAB in Google during the run-to-failure tests.
scholar.
High speed (HS) gear datasets are provided by Dr.
Bechhoefer [4] for the pinion gear in the wind turbine, meas-
Despite these findings, no tutorials have been reported to ured for 6 s with a sampling frequency of 97656 Hz. The gear
date that addresses how to implement the PHM in Python. This has 32 teeth and rotates with 30 Hz, which yields the gear
paper is dedicated to this aspect: a tutorial is given for the sig- mesh frequency (GMF) of 960 Hz. There are 6 and 11 datasets
nal processing and feature engineering by Python. Either for for the normal and fault conditions, in which the natural faults
those who want to learn how to implement the PHM by Python are given in the pinion gear. Case Western Reserve University
or to migrate from the MATLAB to the Python, this paper may (CWRU) bearing dataset [5] contains many cases with various
be of practical value since the paper provides the MATLAB and sizes and locations. Among them, “normal baseline data” and
Python codes side by side. Note that where possible, the Py- “48 k drive end bearing fault data” for the fault of 0.021 inch at
thon codes are written so that the results closely agree with the inner and outer race, are taken, which are collected under
those by the MATLAB. 3 hp motor load and 1730 rpm with measured time about 10 s
Although it is a powerful language platform, the Python by it- and sampling frequency 48 kHz. Since only a single dataset is
self is limited in capability and additional libraries such as given for each case, it is divided into 10 segments to create
numpy should be installed to implement the PHM functions as 10 sets of data for the diagnostic study. The intelligent mainte-
shown in this paper. To avoid compatibility issues, Table 1 nance system (IMS) bearing data are made by the University of
shows the name, version and site of the library used in this Cincinnati, which are obtained every 10 minutes during the run-
paper. Unless these are accommodated, the Python code may to-failure experiment under the load of 6000 lbs and speed of
not run properly. In Appendix, how to apply this as a prelimi- 2000 rpm, with sampling frequency 20480 Hz [6]. Among them,
nary setting is addressed along with the file “requirements.txt.” dataset 2 is used, which is the first bearing among the four that
Additional information about the libraries can also be found in has ended up with outer race fault.
their own sites or in Python’s official package index, “PyPI.” In the signal processing, two datasets HS gear and CWRU
Since the main purpose is to provide a Python tutorial for how bearing are used for the gear and bearing respectively. In the
to implement the PHM, the detail theories or algorithms are not HS gear, time synchronous averaging (TSA) algorithm is im-
presented here, but the citations are given to the relevant litera- plemented. Then the features extraction is followed. In the
ture where necessary. The resulting values and figures are CWRU bearing, it goes through the three steps: autoregressive
given by the Python for each example in the following sections, filter, spectral kurtosis, and envelope analysis, which are typcial
of which the full codes are available in the website of authors’ in the signal processing of many bearings. Features extraction
lab (https://www.kau-sdol.com/matlab-code) to aid in practicing is then conducted for the resulting signal. Note that in the sig-
by themselves. nal processing, another type of algorithm: empirical mode de-

4084
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

composition is also implemented for the CWRU bearing. short time Fourier transformation (STFT) are often found in this
category. Among these, the EMD is presented here, which has
proven the most outstanding, and is widely used in various
3. Signal processing applications.
The basic purpose of the signal processing is to find out the
information related to the fault by eliminating unwanted noise in 3.1 Time synchronous averaging
the signal. In general, it consists of two steps: discrete signal
separation to remove the unrelated signal from the fault, and When the signal is periodic such as those from the rotating
the signal enhancement to amplify the strength of weak fault shaft, the random noise occurring during the rotation can be
signal [7]. Among the numerous algorithms in each step as removed by averaging out the signal over each revolution,
listed in Fig. 2, (for more detail, see, e.g., Refs. [8, 9]), time which are stated as follows:
synchronous averaging (TSA) and auto regressive (AR) filter
1 N −1
are chosen for the signal separation, and spectral kurtosis (SK) y (t ) = ∑x ( t + nT )
N n=0
(1)
and envelope analysis are chosen for the signal enhancement,
respectively, as the tutorial applications. Note that the SK and
envelope analysis are particularly useful for the signal process- where N is the number of rotations, x and y are the raw
ing of the bearing vibration signal. and averaged signal, respectively. As shown in Fig. 3, vibration
While the above techniques are mostly suitable for the sta- signal is segmented for each revolution by using tachometer
tionary signal with constant speed, they are less accurate or and then TSA signal is obtained by averaging out them. In the
inadequate for the non-stationary signals that are encountered MATLAB, this can be performed by the built-in function:
in many applications as well. Signal decomposition is useful in ta = tsa ( x, fs, tp ) for the input signal x with the sampling rate
this case, which is to decompose the original signal into a fs and the pulse times (tacho signal) tp .
number of sub-signals that allows more chance to find fault However, there is no such function in the Python environ-
information. Time-frequency domain analysis such as empirical ment, which requires coding effort based on the understanding
mode decomposition (EMD), wavelet transform (WT), and of the TSA principle: The data are resampled for averaging

Fig. 2. Algorithms in the signal processing of raw vibration signal.

Fig. 3. Overview of TSA process.

4085
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

since the time interval and the number of data points in one given in the Authors’ website. The result of TSA for the 180
revolution is different. Linear interpolation is applied for resam- rotations of normal HS gear data is shown in Fig. 4(a), in which
pling with the number of points fixed at the minimum number the raw and TSA data are those of first rotation and average of
within the N rotations [10]. Averaging is then performed for 180 rotations, respectively, where Acc. represents acceleration
the N rotations. The Python code that implements this is in terms of g.
The fast Fourier transform (FFT) indicates that the gear
mesh frequency (GMF) is clearly identified at 975 Hz in the
TSA signal by averaging out the random noise in terms of the
amplitudes in the FFT.

3.2 Autoregressive filter


The overview of AR filter process is shown in Fig. 5. When
the signal contains deterministic signal originated by other
sources, it can be identified by the autoregressive (AR) model,
which estimates the current value of time series via weighted
sum of the past values, and is stated as follows:

p
x p ( n ) = −∑a ( k ) x ( n − k ) (2)
(a) k =1

where x and x p are the raw and predicted signal, a ( k )


and p are the parameters and order of the model, respec-
tively. The a ( k ) can be obtained by the Yule-Walker equation.
Once the x p is obtained, it is subtracted from the raw signal
to get the residual signal only, using which the fault information
can be extracted more easily:

e ( n) = x ( n) − xp ( n) . (3)

The order p is usually determined to maximize the degree


of fault signal, which is evaluated by the kurtosis. While it indi-
(b)
cates the peakedness or impulsiveness as a measure of fault
Fig. 4. Raw and TSA signal of HS gear: (a) time domain; (b) frequency occurrence, it is not easily identified by the raw signal but
domain. greatly enhanced by the residual signal [11]. In this case, both

Fig. 5. Overview of AR filter process.

4086
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

Table 3. MATLAB and Python codes for AR filter. Table 4. MATLAB & Python command for fast kurtogram.

MATLAB command MATLAB command


a1= aryule ( x, p ) ; kurtogram ( x, fs )

( )
xp = filter ⎡⎣0 − a1( 2 : end ) ⎦⎤ , 1, x ; Python command
xn = x − xp; fast _ kurtogram ( x, fs, nlevel = 8 )
Python command
a1= AR _ est _ YW ( x, p, rxx = None ) xp

(
= lfilter np.append ( 0, a1[0]) , [1] , x, axis = 0 )
xn = x − xp

Fig. 7. STFT and spectral kurtosis.

Fig. 6. Raw and AR filtered signal of CWRU bearing. odically at the bearing fault frequencies, which will be men-
tioned later in Sec. 4.3. But this is modulated by the band of
the MATLAB and Python have the built-in functions for the AR much higher resonant frequencies of the bearings.
filter as shown in Table 3. In the MATLAB, the AR parameters The SK enables to find this band since it becomes higher
and predicted signal are obtained by aryule and filter, while around the region of resonant frequency. As shown in Fig. 7, a
they are AR_est_YW and lfilter in the Python. Note that the specific frequency band with higher magnitude is identified in
sign of parameters a ( k ) in Python is opposite to that of the the STFT, at which the SK also shows higher value. By band-
MATLAB. pass filtering to this band, while removing the signals else-
The AR filter is applied to the CWRU bearing data which in- where, the fault signal can be enhanced, i.e., the kurtosis is
cludes the inner race fault. The result is shown in Fig. 6, in increased.
which the kurtosis increases greatly from 3.7391 to 23.2105 In the SK evaluation, the STFT is required, which is a local
after the filtering. Note that the kurtosis is usually used as a FFT for a finite window at a time t . Therefore, the SK de-
measure of impulsiveness indicator of fault signal. The reason pends on the size of time window. In order to find out the best
for the increase is because the other components irrelevant to band among the various window sizes, the kurtogram is con-
the fault are removed and only the residual signal is used. structed, which visualizes the SK as a function of two variables:
frequency f and STFT window size w [12]. MATLAB &
Python functions to construct and visualize the kurtogram are
3.3 Spectral kurtosis
in Table 4, where fs is the sampling frequency and nlevel
Spectral kurtosis (SK) is an extension of kurtosis in the fre- controls the level of window size in the kurtogram. In Python,
quency domain, indicating how the impulsiveness of a signal nlevel = 8 is added to make the result same as the MATLAB.
varies with frequency. Note that as stated before, the SK is The results of kurtogram for the AR filtered signal of CWRU
useful for the signal processing of the bearing vibration data. bearing is given in Fig. 8(a) where the optimum frequency band
The definition is as follows [1]: that yields the maximum SK is given with the center at
21015.6 Hz and width being 2000 Hz. Figs. 8(b) and (c) repre-
4
S (t, f ) sent the residual and its band pass filtered signal in the time
K( f )= 2
−2 (4) domain and frequency domain respectively. As shown in the
S (t, f )
2
middle of Fig. 8(b), the kurtosis is enhanced greatly from
23.2105 to 193.752 after filtering by the band identified in the
where f is the frequency, ⋅ is the time-averaging operator, kurtogram. The FFT is given on the right in Fig. 8(c) which
and S ( t , f ) is the short time Fourier transform (STFT) of the indicates the signals outside the band are removed. For com-
signal. The reason to introduce SK is as follows: when the fault parison, the result by the randomly chosen band is also given
occurs in the bearing, the impulse signals are produced peri- at the bottom of Fig. 8(b), which shows little change in the kur-

4087
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

Table 5. MATLAB & Python codes for envelope analysis.

MATLAB command
xn = abs ( hilbert ( x ) )
Python command
from scipy.signal import hilbert
xn = abs ( hilbert ( x, axis = 0 ) )

(a)

(a)

(b)

(b)

(c)

Fig. 8. Use of spectral kurtosis for CWRU bearing signal: (a) fast kurto-
gram; (b) time domain signal; (c) frequency domain signal.

tosis. Hence it confirms the importance of bandpass filtering by


SK and kurtogram.
(c)

Fig. 9. Envelope analysis of CWRU bearing signal: (a) time domain signal;
3.4 Envelope analysis (b) frequency domain signal; (c) scale up view in (0, 200) Hz of (b).
Envelope analysis is also useful for the bearing vibration
data as stated earlier, which is usually applied to the bandpass a ( t ) = x ( t ) + jxˆ ( t )
filtered signal via SK and kurtogram. Note however that the (5)
= x ( t ) + xˆ 2 ( t )e = A(t ) e
2 jφ ( t ) jφ ( t )
filtering is not always necessary since the good results are
often obtained only by the envelope analysis without filtering.
The envelope analysis is to demodulate the signal by separat- where x̂ ( t ) is the phase shifted signal by Hilbert transform,
ing out the fault signal (envelope) from the resonance (carrier). and A ( t ) is the instantaneous amplitude, which is the enve-
This is accomplished by taking Hilbert transform, which shifts lope signal that we want. The MATLAB and Python functions
the phase of the input signal by -90 degrees, followed by calcu- are given in Table 5.
lating the analytic signal: Fig. 9(a) shows the bandpass filtered signal obtained previ-

4088
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

ously and its envelope signal in a single figure. Note that the from high to low. The MATLAB and Python codes are in Table
signal is given with narrower scale than Fig. 9(b), by which one 6, where imf represents the multiple IMFs and res is the
can identify the envelope surrounding the raw signal more residue. In order to get the same result as the MATLAB, argu-
easily. Comparing the FFT of bandpass filtered signal with that ment “max_imfs = 10” is added in the Python command be-
of envelope signal in Fig. 9(b), the high frequency spectrum in cause the default maximum number of IMFs is set at 10 in
the resonance range is removed after going through the enve- MATLAB.
lope analysis, leaving only those in the low frequency that can The EMD is applied to the raw CWRU bearing signal. 10
better represent the fault signal of the bearing. The close-up IMFs are obtained, among which the first, second, last IMF and
views in (0, 200) Hz of the two FFTs are given in Fig. 9(c), the residue are shown in Fig. 10. As shown in the figure, IMF1
where the fault frequency of the bearing is clearly identified includes the high frequency component whereas the IMF10
after the envelope analysis. More detail of fault frequency will represents the lower one. By examining each of the IMFs, fault
be addressed in Sec. 4.3. features can be extracted [15, 16], or fault frequency of bearing
can be identified [17].
3.5 Empirical mode decomposition
EMD was proposed by Huang et al. [13], which is an algo- 4. Features extraction
rithm to decompose the signal into a set of complete and al- Once the signal processing is completed, finite number of
most orthogonal components called intrinsic mode functions features are extracted from the signal, which represent the fault
(IMF) and one residue representing the mean trend of the sig- or degraded health by a quantitative value. In general, the fea-
nal. IMF is the mode function that has the number of extrema tures can be categorized into those in time domain, frequency
and the number of zero-crossings equal or differ at most by domain, and time-frequency domain [18]. Among these, time-
one, with the average of upper and lower envelopes being zero.
A good review of EMD for the fault diagnosis in rotating ma-
chinery is presented by Lei et al. [14]. In the EMD, the original Table 6. MATLAB & Python codes for EMD.
signal is decomposed as:
MATLAB command
K
x ( t ) = ∑ci ( t ) + r ( t ) (6) [imf , res ] = emd ( x )
i =1
Python command
emd = Pyemd .EMD ( max_ imfs = 10 )
where ci ( t ) are the IMFs and r ( t ) is the residue. In fact,
each IMF signal represents different frequency bands ranging imfs = emd .emd ( x )

(a) (b)

Fig. 10. Result of EMD for CWRU bearing signal: (a) time domain; (b) frequency domain.

4089
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

domain features are the simplest and more commonly used in identification, the most common ones are presented here in
various fields. On the other hand, there are some features Table 7. The features are calculated by Python for the TSA
specific to the gears and bearings that are attributed to their signal of 6 normal and 11 faulty HS gear data. Then they are
own fault mode, most of which are expressed in the frequency normalized by their mean and standard deviation for each fea-
or time-frequency domain. The functions of MATLAB and Py- ture. The results for the normal and fault are given in Fig. 11,
thon for extracting these features are addressed in this section. which shows that most of the features classify the fault (x) from
the normal (o) well, but some (SK and SF) do not. This means
that further step is necessary to find out the more useful fea-
4.1 Time domain features
tures as will be addressed in Sec. 5.
Among the many time domain features used for the fault

Table 7. Time domain features.

Feature name Brief explanation Formula

Mean The average value of the signal ∑X


N

∑( X − X )
2

STD The standard deviation of the signal


N −1

RMS
The root mean square of the signal, which generally tends to get bigger ∑X 2

as the degree of fault in the bearing increase [19] N

1
∑( X − X )
3

SK The skewness of the signal, which measures asymmetry of the distribution of the signal N
STD 3
1
∑( X − X )
4
The kurtosis of the signal, which represents the sharpness of the distribution of the signal,
KUR N
and if this value is close to 3, it is closer to normal distribution
STD 4
RMS
The shape factor of the signal, which is affected by an object’s shape
SF 1
but is independent of its dimensions [20]
N
∑X
The crest factor of the signal, which represents how much impact occur during the rolling max X
CF
element and raceway contact and appropriate for “spiky signal” [20] RMS
max X
The impulse factor of the signal, which is used to measure how much impact is generated
IF 1
from the bearing fault as CF [18]
N
∑X
max X
The margin factor of the signal, which measures the level of impact between 2
MF ⎛1 ⎞
rolling element and raceway [18] ⎜
⎝N
∑ X ⎟

Peak The maximum value of absolute value of the signal max X

The peak to peak value of the signal, which means the difference between
P2P max ( X ) − min ( X )
the maximum and minimum values

Fig. 11. Time domain features of HS gear.

4090
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

4.2 Features for gear random noise occurred during the rotation. Residual signal is
obtained from the denoised signal by removing the compo-
Over many years, specific features have been developed for
nents of GMF and its harmonics since they are not related with
gear fault detection, which are listed in Table 8 with brief ex-
the fault. Difference signal is made by further removing the
planation (see for the full detail [21, 22]). The calculation of the
components of sideband frequencies around the GMF with the
features requires the evaluation in the time as well as in the
distance being the shaft rotational frequency.
frequency domain. For example, the FM0 consists of the P2P
For the HS gear data, the features are extracted based on
which is the time feature defined in Table 7 and the summation
the formula in Table 8, and the results are given in Fig. 12 after
of amplitudes of GMF and its harmonics in the frequency do-
normalization. As same as Fig. 11, some are good, while the
main. The signals in Table 8 consist of three types, in which the
others such as SER and NB4 are bad for classification.
denoised signal is made by performing the TSA to remove the

Table 8. Specific features for gear.

Signal type Feature name Brief explanation Formula


A robust indicator of major faults in a gear mesh [21] P2P
FM0

N har
N har is the number of GMF harmonic and P is an amplitude of GMF. i =1
Pi

Denoised signal Sideband energy ratio, which measures the energy ratio of sideband
∑ (S + Si− )
N sb
that generally caused by modulation of GMF [23] +

SER i =1 i

N sb is the number of sidebands, S +


and S −
are the amplitude P1
of sidebands.
General fault indicator which reacts not only to the one set of damage
N ∑ i =1 ( riM − rM )
N 4

as FM4 does, but also to the continuing growth of the fault [21]
Residual signal NA4 ⎧1 2 ⎫
2

∑ ⎡⎢⎣∑ ( r − rj ⎤ ⎬ )
M N
r is the residual signal, and M is the number of current time ⎨ ij
⎥⎦ ⎭
⎩M j =1 i =1
signal and is set to 1 in this paper.
The kurtosis of difference signal, which detects the pattern changes N ∑ i =1 ( d i − d )
N 4

FM4 resulting from the damage on a few gear teeth [22] 2


⎡ N ( d − d )2 ⎤
d is the difference signal. ⎣⎢ ∑ i =1 i ⎦⎥

N 2 ∑ i =1 ( di − d )
N 6

M6A 3
⎡ N ( d − d )2 ⎤
th th
6 and 8 statistical moment of difference signal, which developed to ⎢⎣ ∑ i =1 i ⎥⎦
Difference signal
detect surface damage on machinery components [24]
N 3 ∑ i =1 ( d i − d )
N 8

M8A 4
⎡ N (d − d ) ⎤
⎣⎢ ∑ i i
2

⎦⎥
RMSdiff
ER Energy ratio that can detect heavy uniform wear [25] ⎡ Pi + ∑ Nsb ( Sij+ + Sij− ) ⎤

N har
i =1 ⎣ j =1 ⎦
Indicator of localized gear tooth damage [26] N ∑ i =1 ( siM − sM )
N 4

BPF signal NB4 ⎧1 2 ⎫


2

∑ ⎡⎣⎢∑ ( s − sj ⎤⎬ )
M N
s is the envelope signal of band pass filtered signal. ⎨
⎩M j =1 i =1 ij
⎦⎥ ⎭

Fig. 12. Specific features of HS gear.

4091
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

Table 9. Formula for bearing fault frequency.

Ball pass frequency outer


nf r ⎛ d ⎞
f BPFO = ⎜ 1 − cosθ ⎟
2 ⎝ D ⎠
Ball pass frequency inner
nf r ⎛ d ⎞
f BPFI = ⎜ 1 + cosθ ⎟
2 ⎝ D ⎠
Fundamental train frequency
fr ⎛ d ⎞
f FTF = ⎜ 1 − cosθ ⎟ (a)
2⎝ D ⎠
Ball spin frequency
D ⎡ ⎛d ⎞ ⎤
2

f BSF = f r ⎢1 − ⎜ cosθ ⎟ ⎥
2d ⎣⎢ ⎝ D ⎠ ⎦⎥

Table 10. Values of bearing fault frequency.

Data set BPFO BPFI FTF BSF


CWRU 3.5848 5.4152 0.3983 4.7135
IMS 7.0921 8.9079 0.4433 4.1975

(b)

Fig. 13. Geometry of bearing.


(c)

Fig. 14. Envelope spectrum of CWRU bearing: (a) normal; (b) outer race
4.3 Features for bearing fault; (c) inner race fault.
When the fault is created in the rolling element bearing, there
exist certain fault frequencies produced by the periodic impulse,
which are different depending on the fault location: outer race,
inner race, cage, and ball. They are determined by the formula
in Table 9, in which the f r denotes the shaft rotating speed,
d and D represent the diameter of ball and bearing respec-
tively, θ is the contact angle of bearing and n is the number
of rolling elements, as illustrated in Fig. 13. In the bearing, it is
common to use the amplitudes at these frequencies after per-
forming the FFT as the features indicating the fault existence.
The frequencies of two bearings in Table 2 are given in Table
10. Fig. 15. Time domain and specific features of CWRU bearing.
The amplitudes at the bearing fault frequencies are calcu-
lated for the normal, inner, and outer race faults of the CWRU
bearing, respectively. These are obtained by extracting the whereas they are not in the normal bearing, which validates the
envelope signal after going through the AR filtering and band- existence of respective fault. As a result, the amplitudes at the
pass filtering. The FFT in Fig. 14 shows that the peaks at the four fault frequencies are extracted as the features and plotted
BPFI and BPFO in the faulty bearing are clearly identified along with the other 11 time domain features in Fig. 15.

4092
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

5. Feature selection From this, the mean is recommended as the appropriate fea-
As was mentioned previously, for more efficient performance, ture for the fault diagnosis of HS gear.
it is better to select only the outstanding ones instead of the
whole features. In the diagnosis, Fisher’s discriminant ratio 5.2 Scatter matrices
(FDR) and scatter matrix are used for this purpose. In the The FDR introduced above is applied to a single feature se-
prognosis, three prognostic measures: monotonicity, trendabil- lection for the two classes only, which is of limited value. For
ity and robustness are used to select the good features. the multi-class and multi-dimensional features, the concept of
FDR can be extended by using the scatter matrices: within-
5.1 Fisher’s discriminant ratio class matrix, between-class matrix, and mixture matrix [27]. Let
the number of classes be M , and the prior probability of class
FDR is the metric that measures how the two Gaussian dis- be Pi . Then the within-class scatter matrix, which represents
tributions are far apart from each other, which is defined as the degree of dispersion of each class, is defined as
follows [27]:
M
S w = ∑Pi Si , where Si = E ⎡( x − μi ) ( x − μi )⎤⎦
T
(8)
( μ − μ2 )
2

FDR = 1 (7) i =1

σ 12 + σ 22
where Si is the covariance of the feature vector x with the
where the two distributions are denoted by N1 ( μ1 ,σ 12 ) and mean μi at the i th class. The between-class scatter matrix,
N 2 ( μ 2 ,σ 22 ) . The numerator ( μ1 − μ 2 ) indicates how far the
2
which represents the distance of the mean of each individual
distance is between the centers of two classes, and the de- class from the global mean μ0 , is defined as:
nominator σ 12 + σ 22 indicates how large the dispersion of the
M M
two classes is. Table 11 is the calculated FDR of each feature
Sb = ∑Pi ( μi − μ0 ) ( μi − μ0 ) , where μ0 = ∑Pi μi .
T
(9)
obtained in Sec. 4 for the HS gear. Fig. 16 shows the PDFs of i =1 i =1

the normal and fault of the features with highest FDR on the left
and lowest on the right. Obviously, the feature on the left dis- The mixture scatter matrix is then defined as:
tinguishes the normal and fault clearly, while the right is not.
S m = E ⎡( x − μ0 ) ( x − μ0 )⎤⎦ = Sw + Sb .
T


(10)
Table 11. Best and worst three FDRs of HS gear.
With these matrices, a new criterion for feature evaluation is
Best 3 Feature FDR value
defined as follows:
1 Mean 57.02852
2 FM4 12.23760
J 3 = trace { S w−1 S m } . (11)
3 CF 12.13059
Worst 3 Feature FDR value
Note here that the feature vector x consists of N × l ma-
1 SK 0.000046
trix where N and l are the number of total samples includ-
2 NB4 0.000558 ing the normal and faults and the number of features to select,
3 SF 0.227416 respectively.

(a) (b)

Fig. 16. Distribution of features for HS gear: (a) highest FDR; (b) lowest FDR.

4093
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

As multiple features can be selected by this approach, one prediction. Second metric is the trendability which is also called
can consider any combinations of the elements in the feature Spearman correlation. Trendability assesses how well the rela-
vector to evaluate J 3 , and select the one with the highest tionship between two variables can be described using a
value. To illustrate this, consider the features obtained from monotonic function and is defined as follows:
CWRU bearing data, which consist of 15 features. When two
features ( l = 2) are considered, we have 105 combinations for cov ( rgT , rg X )
Tre (T , X ) = (13)
J 3 evaluation. Among them, the best and worst three combi- σ rg σ rg
T T

nations are listed in Table 12. The individual features data are
also plotted in Fig. 17 for the best and worst features. As ex- where T is the time, cov ( ⋅) is covariance, σ is the stan-
pected, the features with highest J 3 are outstanding in distinc- dard deviation, and rg represents the rank of the variable.
tion, while those with lowest are not. The MATLAB & Python functions to calculate trendability is in
Table 13.
5.3 Metrics for prognosis The robustness measures how the signal is robust to ran-
dom fluctuation by measurement noise. Selecting features
In the prognosis, it is important to figure out which features
robust to noise is important because large noise can cause
represent the health degradation tendency over time better.
poor performance in the prognosis. The robustness is defined
There are three metrics to measure the performance in this
as [28]:
context, which are the monotonicity, trendability and robust-
ness. The monotonicity measures the monotonic increasing or
1 ⎛ X ( k ) − X ( k ) ⎞
decreasing trend. The equation is as follows [28]: Rob ( X ) = ∑
k k
exp ⎜ −
⎜ X (k )


(14)
⎝ ⎠
1
Mon ( X ) = No. of d / dx > 0 − No. of d / dx < 0 (12)
K −1 where X is the smoothed value of X in terms of time.
Smoothing is generally conducted by the moving average,
where X is the feature, K is the number of sequences, and which is to average out the current value by the finite number
d / dx is the difference of the feature sequence. The value of recent data.
varies between 0 and 1, and the closer the absolute value is to Since the three metrics are obtained for the prognosis, their
1, the more completely monotonic it is, and provides better average is usually used to select the features that shows good
prognostic performance. IMS bearing dataset are used to illus-
trate this. The 15 features of IMS bearing data are extracted in
Table 12. Best and worst 3 J 3 values of CWRU bearing. time sequence, which consist of 11 time domain and 4 bearing
fault frequency features. Note that the time domain features
Best 3 Feature combination J 3 value are extracted from raw data and bearing features are extracted
1 SK & SF 751.4
2 SK & BPFO 732.63
Table 13. MATLAB & Python codes for trendability.
3 STD & SF 489.56
Worst 3 Feature combination J 3 value MATLAB command
1 FTF & BSF 6.6366 tre = corr (T, X ‘Spearman’)
2 SK & FTF 10.207 Python command
3 SK & BSF 11.091 tre = spearmanr (T, X )

(a) (b)

Fig. 17. Features data of CWRU bearing: (a) hgh J 3 ; (b) low J 3 .

4094
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

(a)

(b) (c)

Fig. 18. Prognostic features of IMS bearing: (a) average of three metrics; (b) feature with highest metric over cycles; (c) feature with lowest metric over cycles.

from envelope signal after AR and bandpass filtering. As (MSIT) (No. 2020R1A4A4079904).
shown in Fig. 18(a), the STD and IF are those with the best
and worst prognostic performance in terms of the average
metrics. In Figs. 18(b) and (c), the trends of STD and IF are References
plotted in terms of cycles, which indicates the former one [1] S. Kim, D. An and J. H. Choi, Diagnostics 101: A tutorial for
shows increase beyond the 600 cycles whereas the IF does fault diagnostics of rolling element bearing using envelope
not show any degradation trend over cycles. analysis in MATLAB, Appl. Sci., 10 (20) (2020) 1-23, doi:
10.3390/app10207302.
[2] S. Kim, C. Lim, S.-J. Ham, H. Park and J.-H. Choi, Tutorial for
6. Conclusion prognostics and health management of gears and bearings:
This paper has introduced some fundamental PHM steps as advanced signal processing technique, Trans. KOREAN Soc.
a tutorial that demonstrates from the signal processing to the Mech. Eng. A, 42 (12) (2018) 1119-1131.
features selection using the open-access datasets and corre- [3] J. Sim, S. Kim, H. J. Park and J.-H. Choi, A tutorial for feature
sponding MATLAB & Python codes. The purpose is to help the engineering in the prognostics and health management of
PHM beginners to study the Python and implement it by them- gears and bearings, Appl. Sci., 10 (16) (2020) 5639.
selves. The full MATLAB & Python codes to execute the overall [4] E. Bechhoefer, High speed gear dataset, Acoustics and
process in this paper can be accessed at the authors’ home- Vibration Database (2014).
page https://www.kau-sdol.com/matlab-code. [5] Case Western Reserve University Bearing Data Center, Bear-
ings Vibration Data Sets, Available at: https://engineering.case.
edu/bearingdatacenter/welcome (2020).
Acknowledgments [6] J. Lee, H. Qiu, G. Yu, J. Lin and Rexnord Technical Services,
This work was supported by the National Research Founda- IMS Bearing Run-to-Failure Dataset (2007) 2-3.
tion of Korea (NRF) grant funded by the Korea government [7] R. B. Randall and J. Antoni, Rolling element bearing diagnostics-

4095
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

A tutorial, Mechanical Systems and Signal Processing, 25 (2) Assessment of sideband energy ratio technique in detection of
(2011) 485-520, doi: 10.1016/j.ymssp.2010. 07.017. wind turbine gear defects, Case Stud. Mech. Syst. Signal
[8] R. B. Randall, Vibration-Based Condition Monitoring, John Process., 2 (2015) 1-11.
Wely & Sons. Ltd (2011). [24] H. R. Martin, Statistical moment analysis as a means of
[9] N. Sawalhi, Vibrations of Spalled Rolling Element Bearings, surface damage detection, Proceedings of the 7th International
Diagnostics, Prognostics and Dynamic Simulations, VDM Verlag Modal Analysis Conference, 1 (1989) 1016-1021.
(2009). [25] H. J. Decker and D. G. Lewicki, Spiral Bevel Pinion Crack
[10] E. Bechhoefer and M. Kingsley, A review of time synchronous Detection in a Helicopter Gearbox, NASA TM-2003-212327
average algorithms, Annual Conference of the PHM Society, 1 ARL-TR-2958, June (2003) 16.
(1) (2009). [26] Y. Lei and M. J. Zuo, Gear crack level identification based on
[11] N. Sawalhi, R. B. Randall and H. Endo, The enhancement of weighted K nearest neighbor classification algorithm, Mech.
fault detection and diagnosis in rolling element bearings using Syst. Signal Process., 23 (5) (2009) 1535-1547.
minimum entropy deconvolution combined with spectral [27] S. Theodoridis and K. Koutroumbas, Pattern Recognition,
kurtosis, Mech. Syst. Signal Process. (2007) doi: 10.1016/ Second Edition, Academic Press New York, NY, USA (2003).
j.ymssp.2006.12.002. [28] Y. Lei, N. Li, L. Guo, N. Li, T. Yan and J. Lin, Machinery health
[12] J. Antoni, Fast computation of the kurtogram for the detection prognostics: a systematic review from data acquisition to RUL
of transient faults, Mech. Syst. Signal Process. (2007) doi: prediction, Mech. Syst. Signal Process., 104 (2018) 799-834.
10.1016/j.ymssp.2005.12.002.
[13] N. E. Huang et al., The empirical mode decomposition and
the Hubert spectrum for nonlinear and non-stationary time Appendix
series analysis, Proc. R. Soc. A Math. Phys. Eng. Sci. (1998) This section describes how to execute the codes provided by
doi: 10.1098/rspa.1998.0193. the authors. They are available by the zip folder at the home-
[14] Y. Lei, J. Lin, Z. He and M. J. Zuo, A review on empirical page, which consists of 8 codes and two folders as shown in
mode decomposition in fault diagnosis of rotating machinery, Fig. A.1.
Mech. Syst. Signal Process., 35 (1-2) (2013) 108-126. Both the MATLAB and Python codes are given for each
[15] Y. Yu and C. Junsheng, A roller bearing fault diagnosis problem, which are tuned to yield the same result. In the case
method based on EMD energy entropy and ANN, J. Sound of MATLAB, it is not difficult to execute each code using the
Vib., 294 (1-2) (2006) 269-277. related toolbox if necessary, by simply installing it by following
[16] C. Junsheng, Y. Dejie and Y. Yu, A fault diagnosis approach the MATLAB instruction. However, in the case of Python, it
for roller bearings based on EMD method and AR model, may not be the case due to the version issues of libraries,
Mech. Syst. Signal Process., 20 (2) (2006) 350-362. which are similar to the toolbox in MATLAB. To run the Python
[17] D. Yu, J. Cheng and Y. Yang, Application of EMD method and codes given by the authors without error, follow the instruc-
Hilbert spectrum to the fault diagnosis of roller bearings, Mech. tions: first, download and install the Python version 3.9.6 (The
Syst. Signal Process., 19 (2) (2005) 259-270. reason is because some libraries are not installed in higher
[18] W. Caesarendra and T. Tjahjowidodo, A review of feature versions such as Python 3.10). Note that the location of Python
extraction methods in vibration-based condition monitoring and folder should be added to the PATH in the environment vari-
its application for degradation trend estimation of low-speed able. Next, download the “requirements.txt” from the aforemen-
slew bearing, Machines, 5 (4) (2017) 21. tioned website and place the file in the default directory of ter-
[19] Z. Shen, Z. He, X. Chen, C. Sun and Z. Liu, A monotonic minal. The file contains the libraries listed in Table 1 and their
degradation assessment index of rolling bearings using fuzzy prerequisites. Next, type “pip install -r requirements.txt” in the
support vector data description and running time, Sensors, 12 terminal to install these libraries. After the installation is com-
(8) (2012) 10109-10135. plete, users can execute the Python codes safely, and get the
[20] C. T. Yiakopoulos, K. C. Gryllias and I. A. Antoniadis, Rolling same results given in this paper.
element bearing fault detection in industrial environments
based on a K-means clustering approach, Expert Syst. Appl.,
38 (3) (2011) 2888-2911.
[21] P. D. Samuel and D. J. Pines, A review of vibration-based
techniques for helicopter transmission diagnostics, J. Sound
Vib., 282 (1-2) (2005) 475-508.
[22] M. Lebold, K. McClintic, R. Campbell, C. Byington and K.
Maynard, Review of vibration analysis methods for gearbox
diagnostics and prognostics, Proceedings of the 54th Meeting
of the Society for Machinery Failure Prevention Technology,
634 (2000) 16.
[23] T. R. Pattabiraman, K. Srinivasan and K. Malarmohan, Fig. A.1. The contents of uploaded file.

4096
Journal of Mechanical Science and Technology 36 (8) 2022 DOI 10.1007/s12206-022-0728-z

Jinwoo Sim received the B.S. degree in Seokgoo Kim received the bachelor’s
mechanical engineering in 2020 from and master’s degrees in mechanical
Korea Aerospace University, Goyang, engineering from Korea Aerospace Uni-
South Korea, where he is currently work- versity, Goyang, South Korea, in 2016
ing toward the M.S. degree in mechani- and 2018, respectively. He is currently
cal engineering with Department of pursuing the dual Ph.D. degree with
Aerospace and Mechanical Engineering. Korea Aerospace University and the
His research interest focuses on prog- University of Florida. His research inter-
nostics and health management (PHM) for the mechanical ests include prognostics and health management for complex
system and health index construction of systems operating engineering systems, data analytics, machine learning, and
under variable conditions. uncertainty management.
Mr. Sim is a recipient of KSME Best Paper Award in 2021. He has received the KSME Student Best Paper Award, in 2019,
the ICMR Best Paper Award, in 2019, and the PHM Asia Pa-
Jinhong Min is currently working toward cific Best Student Award (Bronze), in 2021.
the B.S. degree in mechanical engineer-
ing with Department of Aerospace and Joo-Ho Choi received the B.S. degree
Mechanical Engineering in Korea Aero- from Hanyang University, Seoul, South
space University, Goyang, South Korea. Korea, and the M.S. and Ph.D. degrees
His research interest focuses on prog- from the Korea Advanced Institute of
nostics and health management (PHM) Science and Technology, Seoul, South
for the mechanical system and machine Korea, all in mechanical engineering.
learning. He was an Engineer with Samsung
Corning, Suwon, South Korea, where he
Doyeon Kim is currently working toward led the research team developing equipment to improve quality
the B.S. degree in mechanical engineer- and productivity in TV glass production. He is currently a Pro-
ing with Department of Aerospace and fessor of Aerospace and Mechanical Engineering with Korea
Mechanical Engineering in Korea Aero- Aerospace University, Goyang, South Korea. Over the years,
space University, Goyang, South Korea. he has made some key publications, including the reviews,
Her research interest focuses on prog- tutorials, and a book to help the engineers research and prac-
nostics and health management (PHM). tice in the field of prognostics and health management (PHM).
Prof. Choi served as the Editor for the Journal of Mechanical
Science and Technology for six years. He has founded the
Seong Hee Cho received the B.S. and Korean Society for PHM in Korea. In 2018, he became a Fel-
the M.S. degrees in mechanical engi- low for the PHM society of USA for his leadership as a Chair in
neering from Korea Aerospace Univer- the First Asia Pacific Conference of the PHM Society 2017.
sity, Goyang, South Korea in 2019 and
2021, respectively. Her research interest
focuses on prognostics and health man-
agement (PHM) for engine systems and
machine learning.

4097

You might also like