A large amount of data is generated during the operation of a wind turbine (WT), which easily leads to the curse of dimensionality, and if more than one WT fault occurs, multiple sensors raise alarms. To address the problems of large data volume and inaccurate, untimely fault diagnosis, a hybrid fault diagnosis method based on ReliefF, principal component analysis (PCA) and a deep neural network (DNN) is developed in this paper. Firstly, the ReliefF method is used to select the fault features and reduce the data dimensions. Secondly, the PCA algorithm is used to further reduce the data dimensions, mainly to reduce the redundancy among the data and improve the accuracy of fault diagnosis. Finally, the ReliefF-PCA-DNN model is constructed, optimized and applied to fault cases from a wind farm in Jilin Province. The experimental results show that, for single faults, the accuracies of the proposed hybrid models are all above 98.5%, and for multiple faults, the accuracy of the proposed model is above 96%, both much higher than those of the comparison methods. The method can therefore diagnose WT faults well.

Keywords: Wind turbine; Fault diagnosis; Principal component analysis; Deep neural network
1. Introduction

As a renewable clean energy, wind energy has become an indispensable key force in solving the problem of environmental pollution in recent years (Lonf et al, 2017). With the increase of installed capacity and service life of WTs, the operation and maintenance cost, along with the potential safety hazard, is increasing, especially in mountainous wind farms, where serious wind turbulence further increases the WT fatigue load and can easily lead to serious accidents.

Artificial neural network (ANN) technology is an effective method that is widely used in WT fault diagnosis (Zeng et al., 2018). Liu and Laouti used support vector machines for WT fault diagnosis and WT fault detection, respectively (Liu et al., 2013; Laouti and Othman, 2011). Zhou et al. proposed learning-based data-driven methods for abnormal event detection based on kernel principal component analysis (kPCA) and a novel discriminative method that only required partial expert knowledge for training (Zhou et al., 2016). Leahy et al. applied classification techniques to recognize the fault and fault-free operation of a wind turbine in the South-East of Ireland based on SCADA data (Leahy et al., 2016). Poon et al. presented a fault detection and identification (FDI) method for switching power converters using a model-based state estimator approach (Poon et al., 2016). Yu et al. used a deep belief network to diagnose WT faults (Yu et al., 2018). Li et al. used a BP neural network to diagnose the phase-to-phase short circuit fault of a doubly-fed wind-driven generator (Li et al., 2019). Li et al. investigated fault diagnosis of wind turbines by using Gaussian process classifiers (GPC) based on the operational data collected from the SCADA system (Li et al., 2019). Cho et al. proposed a fault detection and diagnosis method to automatically identify different fault conditions of a hydraulic blade pitch system in a spar-type floating wind turbine based on a Kalman filter and an artificial neural network (Cho et al., 2021). Because hundreds of sensors are installed on WTs and their data are stored in the SCADA system, data analysis and mining technology has emerged accordingly. However, once the WT generator breaks down suddenly during operation, the large amount of data and the many parameters unrelated to the fault make it impossible to locate the fault quickly and accurately. A single fault may cause many sensors to alarm, and several faults may occur at the same time. How to find the fault-sensitive features among so many parameters and improve the accuracy of fault diagnosis urgently needs more in-depth study.

Feature dimensionality reduction is an important premise of WT fault diagnosis; it mainly aims to describe the data set in a more accurate way. Compared with the original feature set, the reduced set has fewer features. To achieve this, it removes unnecessary, unwanted, and irrelevant features from the dataset. During real-time WT operation, many parameters unrelated to the faults are stored in the SCADA database. When a fault occurs, feature dimensionality reduction could eliminate
[...] necessary to mine the fault data and extract the fault-sensitive features. However, some conventional dimensionality reduction algorithms, such as the Pearson correlation coefficient, could not effectively extract [...]

$$
W(F) = W(F) - \sum_{i=1}^{k} \frac{\mathrm{diff}(F, R, H_i)}{mk} + \sum_{C \neq \mathrm{class}(R)} \left[ \frac{p(C)}{1 - p(\mathrm{class}(R))} \sum_{i=1}^{k} \mathrm{diff}(F, R, M_i(C)) \right] \Big/ (mk) \tag{1}
$$

where p(C) is the distribution probability of class C, class(R) is the category that R belongs to, H_i is the ith nearest neighbor sample in class(R), M_i(C) denotes the ith nearest neighbor sample in class C, and diff(F, R1, R2) represents the difference between R1 and R2 on the Fth feature.

For discrete features:

$$
\mathrm{diff}(F, R_1, R_2) =
\begin{cases}
0, & R_1[F] = R_2[F] \\
1, & R_1[F] \neq R_2[F]
\end{cases} \tag{2}
$$

For continuous features:

$$
\mathrm{diff}(F, R_1, R_2) = \frac{\left| R_1[F] - R_2[F] \right|}{\max(F) - \min(F)} \tag{3}
$$

C. For F = 1 to n do
If W(F) > δ, the Fth feature is added to Q.

Computing the ReliefF weights requires Euclidean distances, so the data need to be normalized:

$$
Y = \frac{Q - Q_{\min}}{Q_{\max} - Q_{\min}} \tag{4}
$$

where Q is the feature subset obtained by the ReliefF algorithm, Q_min is the minimum value of each feature parameter, and Q_max is the maximum value of each feature parameter. Thus, a new characteristic matrix Y is obtained. The ReliefF algorithm flow in two-dimensional space is shown in Fig. 2. As can be seen from the figure, the ReliefF algorithm could leave high redundancy in the feature subset, so it is necessary to remove the redundancy (Jain & Singh, 2018).

2.1.2. Reduce the feature dimensions based on PCA algorithm

Principal component analysis (PCA) is a dimensionality reduction method (Zhang et al., 2019, 2020; Li et al., 2018) which is used to reduce the dimension of data sets. It maps n-dimensional features to k dimensions (k < n) and makes the data separable. In the new spatial coordinate system, the variance of the transformed data points is maximized along the new coordinate axis, and the data points are no longer linearly correlated. This set of transformed data is called the principal component. The direction with the largest data variance is selected as the first principal component, and the second principal component is selected as the direction with the second largest variance that is orthogonal to the first principal component. This process is repeated until n principal components are found. The process of dimension reduction is as follows:

A) Centralize the matrix x (de-average each dimension)

$$
C = V^{T} x \tag{5}
$$

B) Calculate the covariance matrix of the samples

$$
\psi = x_{ij} - \frac{1}{S} \sum_{i=1}^{S} x_{i} \tag{6}
$$

$$
B = \frac{1}{S} \sum_{i=1}^{S} \psi \psi^{T} \tag{7}
$$

C) Eigenvalue decomposition of the covariance matrix;
D) The eigenvectors corresponding to the first n largest eigenvalues are selected to form the eigenvector matrix W.

where x is the eigenvector, C is the reduced dimension matrix, V is the de-averaging vector, S is the number of samples, ψ is the new matrix obtained after removing the average value of each column of elements in the feature matrix, and B is the covariance matrix of ψ. PCA could keep the most important content of the data while reducing the dimension. At the same time, due to the transformation of the coordinate system, the redundancy of the data is greatly reduced (Uzer et al., [...]

2.2. Structure of the deep neural network (DNN)

In 2006, Hinton et al. first proposed the concept of deep learning (Hinton and Salakhutdinov, 2006). Deep learning is a deep data feature extraction algorithm, which could overcome the shortcomings of support vector machine (SVM), artificial neural network (ANN) and other shallow machine learning methods, such as imprecise feature extraction results and a tendency to fall into local minima. DNNs are neural
networks with many hidden layers, which are also known as deep feedforward networks (DFN).

As shown in Fig. 3, a DNN could be divided into input layers, hidden layers and output layers. All layers are fully connected, that is to say, any neuron in layer i is connected with all neurons in layer i + 1. When the DNN propagates forward, the calculation formulas are as follows:

$$
u_k = \sum_{i=1} W_{ki} X_i \tag{8}
$$

$$
y_k = f(u_k - b_k) \tag{9}
$$

where X_i is the ith input variable, W_ki represents the weight connected to the ith input variable, u_k is the weighted sum of all input variables, b_k is the threshold, f(⋅) is the activation function, and y_k is the output of the neural network. The common activation functions are sigmoid, tanh and ReLU (Apicella et al., 2019; Wang et al., 2020). In this paper, ReLU (Montanelli and Yang, 2020; Eckle & Hieber, 2019; Chen & Ho, 2019) is selected as the activation function. The expression is as follows:

$$
f_{\mathrm{ReLU}} = \max(0, z) \tag{10}
$$

[...]

C) Calculate the gradient g_t of the time step t:

$$
g_t = \nabla_{\theta} J(\theta_{t-1}) \tag{12}
$$

D) Calculate the exponential moving average of the gradient:

$$
m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t \tag{13}
$$

E) Calculate the exponential moving average v_t of the square of the gradient:

$$
v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^{2} \tag{14}
$$

F) Correct the bias of m_t:

$$
\hat{m}_t = m_t / \left( 1 - \beta_1^{t} \right) \tag{15}
$$

G) Correct the bias of v_t:

$$
\hat{v}_t = v_t / \left( 1 - \beta_2^{t} \right) \tag{16}
$$
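The moment updates in Eqs. (12) to (16) can be illustrated with a short NumPy sketch. It is not the authors' implementation: the function names, the learning rate `lr`, the constant `eps`, the default values of β1 and β2, and the toy quadratic objective are all assumptions made for illustration. The final parameter update line is the standard Adam rule, which does not appear in the text reproduced here and is likewise assumed.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update. theta, grad, m, v are arrays of equal shape; t starts at 1."""
    m = beta1 * m + (1 - beta1) * grad        # Eq. (13): moving average of the gradient
    v = beta2 * v + (1 - beta2) * grad ** 2   # Eq. (14): moving average of the squared gradient
    m_hat = m / (1 - beta1 ** t)              # Eq. (15): bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)              # Eq. (16): bias-corrected second moment
    # Assumed: standard Adam parameter update (not shown in the text above).
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

def minimize_quadratic(steps=2000, lr=0.05):
    """Toy objective J(theta) = ||theta - target||^2, used only to exercise adam_step."""
    target = np.array([3.0, -2.0])
    theta = np.zeros(2)
    m = np.zeros(2)
    v = np.zeros(2)
    for t in range(1, steps + 1):
        grad = 2.0 * (theta - target)         # Eq. (12): gradient at the previous parameters
        theta, m, v = adam_step(theta, grad, m, v, t, lr=lr)
    return theta

theta_final = minimize_quadratic()
```

Because of the bias correction, the very first step has magnitude close to `lr` in every coordinate regardless of the gradient scale, which is one reason Adam is often considered less sensitive to feature scaling than plain gradient descent.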
where y_i is the real value of the output layer and a_i is the value calculated by SoftMax. At the same time, the fault data in different periods are used to verify the model, and the accuracy is taken as a part of the evaluation [...]
Table 1
Parts of the experimental data.
No. of samples   P1 (r/s)   P2 (V)   P3 (f)   P4 (m/s)   P5 (kW)   P6 (kW/h)   P7 (Hz)   P8 (kW)   P9 (r)   P10 (A)
Table 2
Statistics of the WTs’ faults.
Serial number of the WTs    Failure frequency    State
2                           98                   Change to 1
3                           207                  Change to 1
14                          296                  Change to 1
23                          517                  Change to 1
39                          172                  Change to 1
Table 3
Number of selected features corresponding to different thresholds.
Fault type Number of all A1 A2 A3
[...] models based on the four dimension reduction algorithms. It can be seen that the diagnosis accuracy of the proposed model is significantly higher than that of the other models. When diagnosing a single WT fault, the proposed model has fewer dimensions and the highest fault diagnosis accuracy. When diagnosing various WT faults, the proposed model also has the fewest dimensions and the highest fault diagnosis accuracy. Compared with the other models, the proposed model generalizes better. To sum up, the proposed model has a lower dimension, the highest fault diagnosis accuracy and good generalization.

Besides, Table 7 shows the correlation between the first principal component and the original data after PCA dimension reduction. The score coefficient of the first principal component relative to the original [...]

Table 7
Correlation between first principal component and original data.
Fault types                                        Related parameters
Overtemperature of gearbox oil                     Temperature at DE end of generator bearing, temperature at DE end of gearbox bearing, temperature at NDE end of gearbox bearing
Overtemperature of bearing at NDE end of gearbox   Gear oil temperature, temperature at DE end of gearbox bearing, temperature at DE end of generator bearing
Abnormal engine room temperature                   Temperature of rotor bearing A, temperature at NDE end of gearbox bearing
Main shaft brake failure                           Angle of cabin to North, engine room temperature, temperature of rotor bearing A
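A relationship of the kind summarized in Table 7 can be explored with a small NumPy sketch that centers the data, forms the covariance matrix with the 1/S normalization of Eq. (7), takes its eigendecomposition, and then correlates each original feature with the first principal component. This is an illustrative sketch only: the function names `pca_reduce` and `pc1_correlations` and the synthetic three-column data set are assumptions, not the authors' code or the SCADA data.

```python
import numpy as np

def pca_reduce(X, k):
    """PCA on X of shape (S samples, n features); returns scores, components, eigenvalues."""
    psi = X - X.mean(axis=0)                   # center each feature (de-averaging)
    B = psi.T @ psi / X.shape[0]               # covariance matrix with 1/S normalization
    eigvals, eigvecs = np.linalg.eigh(B)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]      # indices of the k largest eigenvalues
    W = eigvecs[:, order]                      # eigenvector matrix (n, k)
    scores = psi @ W                           # data projected onto the principal axes
    return scores, W, eigvals[order]

def pc1_correlations(X, scores):
    """Pearson correlation between each original feature and the first principal component."""
    pc1 = scores[:, 0]
    return np.array([np.corrcoef(X[:, j], pc1)[0, 1] for j in range(X.shape[1])])

# Synthetic stand-in data (assumption): two features driven by a shared signal, one independent.
rng = np.random.default_rng(0)
signal = rng.normal(size=300)
X = np.column_stack([
    signal + 0.1 * rng.normal(size=300),
    2.0 * signal + 0.1 * rng.normal(size=300),
    rng.normal(size=300),
])

scores, W, eigvals = pca_reduce(X, k=2)
corr = pc1_correlations(X, scores)
```

In this toy setting the two features that share the signal are the ones most strongly correlated with the first principal component (up to a sign, since eigenvectors are only defined up to sign), which mirrors how related temperature channels can group together in a table such as Table 7.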
Fig. 6. Comparing results of the models based on various dimension reduction algorithms (x-axis: fault type, R1 to R5; y-axis: accuracy results, %; legend entries include ReliefF-PCA-DNN and Pearson-DNN).
Fig. 11. Results of different hidden layers (x-axis: number of hidden layers, 2 to 7; y-axis: accuracy, %).

Table 9
Numbers of neurons in grid search method.
Hidden layers    Number of neurons
2                13, 14
3                13, 15, 14
4                14, 14, 15, 15
5                13, 14, 14, 14, 14
6                13, 14, 14, 14, 16, 16
7                13, 14, 14, 16, 14, 14, 16

[...] proposed model.

As shown in Fig. 12, for the No. 39 WT, the single-fault diagnosis accuracies of the three models are all very high, and the accuracy for each fault is more than 98.5%. This shows that the three models are extremely reliable in single-fault diagnosis after dimension reduction. Single-fault diagnosis is more reliable than multiple-fault diagnosis because fewer fault categories are involved and the model complexity is low, but each classifier could only diagnose specific faults.

Multi-fault classification classifies a series of sampled data by category, including the normal category and several fault categories. These faults come from the 4 WTs. Different faults of one WT and the same fault of different WTs are defined as different faults. As shown in Table 10, one model is used for fault diagnosis of different WTs. As is shown in Fig. 13, compared with the single-fault diagnosis models, the accuracies of the three multi-fault prediction models decrease. The proposed ReliefF-PCA-DNN model still shows excellent diagnostic performance for multiple faults, and the accuracies are all above 96%. When there are 2 and 4 fault types, the accuracies are 99.2% and 97.82% respectively. When the number of fault types increases, the accuracy of the proposed model is relatively stable without obvious decline. When there are 8 types of faults, the accuracy is 96.72%.

For the ReliefF-PCA-SVM model, when there are few fault types, it could diagnose the faults well, but when the number of fault types increases markedly, its accuracy decreases. When there are 8 types of faults, the accuracy is only 88.89%.

For the ReliefF-PCA-BPNN model, when there are few fault types, its accuracy is stable. When there are 2 and 4 fault types, the accuracies are 98.2% and 93.96% respectively. When the number of faults increases, its accuracy is not as good as those of the two models above. The accuracy decreases sharply and is only 24.12%. It is unable to distinguish faults effectively because the ReliefF-PCA-BPNN model has a shallow network structure and could not learn all features effectively. The results above show that the proposed model has obvious advantages over the ReliefF-PCA-BPNN and ReliefF-PCA-SVM models in WT fault diagnosis.

Table 10
Fault types of different WTs.
Label    Fault type                                                          Note
F0       Normal state of No. 2 WT                                            Normal
F1       Oil overtemperature fault of the No. 14 WT gearbox                  Temperature > 80 ℃
F2       Brake holding failure of the No. 14 WT main shaft                   Brake holding
F3       Oil overtemperature fault of the No. 23 WT gearbox                  Temperature > 80 ℃
F4       Bearing overtemperature fault at NDE end of the No. 23 WT gearbox   Temperature > 80 ℃
F5       Oil overtemperature fault of the No. 39 WT gearbox                  Temperature > 80 ℃
F6       Bearing overtemperature fault at NDE end of the No. 39 WT gearbox   Temperature > 80 ℃
F7       Brake holding failure of the No. 39 WT main shaft                   Brake holding
F8       Temperature fault of the No. 3 WT cabin                             Temperature < 5 ℃

Next, the proposed model is verified by using the fault data of other periods. Table 11 shows the confusion matrix of the diagnosis results. The values on the diagonal in Table 11 are the numbers of correct diagnoses, and the remaining values in the same column indicate the numbers of misdiagnoses. It can be seen that the misdiagnosed faults mainly focus on
F3 to F6. The main reason is that a gearbox temperature fault easily causes the temperature of surrounding parts to rise, such as the bearing temperature. Although the proposed model has good fault diagnosis performance, it is also prone to misdiagnosis when the gearbox temperature exceeds the normal level. Different fault types would cause the same data to change the internal characteristics. That is the main reason why the accuracy of each algorithm decreases under multiple-fault diagnosis. Besides, the ROC curves and AUC values of the test sets are shown in Fig. 14. It can be seen from the figure that the test sets achieve a good classification effect on the trained model, with a precision of 0.9937 and a recall of 0.9891. Macroscopically, the average AUC value of 9 times is 0.9991, which excludes chance and shows that the proposed model is relatively stable. Microscopically, the AUC values of F0 to F8 are all greater than 0.98, and the AUC values of F5 and F6 are lower than those of other faults, which is consistent with the conclusion above.

Fig. 13. Accuracy of different models under multiple faults (x-axis: fault states, with 2, 4 and 8 types of failures; legend entries include ReliefF-PCA-DNN and ReliefF-PCA-SVM).

Table 11
Diagnostic results of confusion matrix for the ReliefF-PCA-DNN model.
Label    Prediction classification results for the ReliefF-PCA-DNN model
         F0     F1     F2     F3     F4     F5     F6     F7     F8
F0       100    0      0      0      0      0      0      0      0
F1       0      100    0      0      0      0      0      0      0
F2       0      0      100    0      0      0      0      0      0
F3       0      0      0      98     1      1      0      0      0
F4       0      0      0      1      99     0      0      0      0
F5       0      0      0      1      0      96     3      0      0
F6       0      0      0      0      0      2      98     0      0
F7       0      0      0      0      0      0      0      100    0
F8       0      0      0      0      0      0      0      0      100

Fig. 14. ROC curves and AUC values for the test sets.

[...] reduction, the data dimensions are effectively reduced. Compared with other dimensionality reduction algorithms, the ReliefF-PCA algorithm has good generalization, which not only ensures the accuracy of the proposed model but also speeds up its operation.

Secondly, the network functions and parameters of the proposed model are discussed and optimized. From the loss value in the training process, we can see that when the ReLU activation function is used as the hidden layer activation function and the Adam optimization algorithm is used as the parameter updating algorithm, the proposed model trains well. The best number of hidden layers is 5, with the shortest running time.

Thirdly, the model proposed here is compared with the common fault diagnosis models. For single-fault diagnosis, the three models all perform well. In multi-fault diagnosis, the accuracy of the proposed model is the highest, and the ReliefF-PCA-BPNN model is the worst when there are many fault types. According to the verification results of multi-fault diagnosis, the misdiagnosed faults mainly focus on F3 to F6. This indicates that there may be a certain direct connection between the two types of faults, which needs further study.

Declaration of Competing Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Acknowledgements

The authors are thankful for the support of the Jilin city outstanding young talents training program (20190104156) and the science and technology projects of the Jilin province department of education (JJKH20190709KJ).
