Full Thesis PD
JIN JUN
( B. ENG )
A THESIS SUBMITTED
FOR THE DEGREE OF DOCTOR OF PHILOSOPHY
DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING
NATIONAL UNIVERSITY OF SINGAPORE
2005
ACKNOWLEDGEMENT
Acknowledgement is also due to Toshiba Corporation, Japan, for its support of this
project.
I would like to thank my wife and my parents for their love, patience, and continuous
support along the way.
Thanks are also given to the Power System Laboratory Technician Mr. H. S. Seow, for
his help and cooperation throughout this research project.
Last but not least, I would like to thank my friends and all those who have helped me
in one way or another.
7. C.S. Chang, R.C. Zhou, J. Jin, Identification of Partial Discharge Sources in Gas-Insulated Substations, Proc. of Australasian Universities Power Engineering
Conference 2004, paper number 50, Australia.
TABLE OF CONTENTS
ACKNOWLEDGEMENT
PAPERS WRITTEN ARISING FROM WORK IN THIS THESIS
TABLE OF CONTENTS
SUMMARY
LIST OF FIGURES
LIST OF TABLES
CHAPTER 1: INTRODUCTION
1.1
1.1.1
1.1.2
1.1.3 PD in SF6
1.1.4
1.1.5
1.1.6
1.1.7
1.2
1.2.1
1.2.2
1.3
1.4
1.4.1
1.4.2
1.5
INTRODUCTION
2.2
2.2.1
2.2.2
2.2.3
2.3
2.3.1 Introduction
2.3.2
2.3.3
2.4
2.4.1
2.4.2
2.4.3
2.4.4
2.5
INTRODUCTION
3.2
3.3
3.4
3.4.1
3.4.2 GA Optimization
3.4.3
3.5 PERFORMANCE TESTING
3.6
3.7
INTRODUCTION
4.2 PRE-SELECTION
4.3
4.3.1
4.3.2
4.4
4.4.1
4.4.2
4.4.3
4.5
4.5.1
4.5.2
4.6
INTRODUCTION
5.2
5.2.1
5.2.2
5.2.3 Feature Selection
5.3
5.3.1 Level of Decomposition
5.3.2
5.4
5.4.1
5.4.2
5.4.3
5.4.4
5.5
6.1.1
6.1.2
6.1.3
6.2
6.2.1
6.2.2
6.2.3
6.2.4
6.3
INTRODUCTION
7.2
7.2.1
7.2.2 Re-selection of WPT_feature
7.3
7.3.1
7.3.2
7.4
CONCLUSION
8.1.1
8.1.2
8.2
REFERENCES
APPENDICES
A.
A.2
A.3
B.
C. Genetic Algorithm
D.
E.
F.
SUMMARY
In the first part of this thesis, a wavelet-packet-based denoizing method is developed
to effectively suppress white noise. A novel variance-based criterion is employed
to select the most significant frequency bands for noise reduction. Parameters
associated with the denoizing scheme are optimally selected using a genetic algorithm.
Using the proposed method, successful and robust denoizing is achieved for PD
signals having various noise levels. Successful restoration of the original waveforms
enables the extraction of reliable features for PD identification.
of speed and accuracy. Therefore, new methods are developed in the second part of
this thesis to solve the problems with phase-resolved methods.
Features extracted using wavelet packet transform (WPT_Feature) form the second
category of PD features. A statistical criterion, known as the J criterion, is employed to
ensure that the features with the most discriminative power are selected. Taking
advantage of the additional frequency information provided by the wavelet packet
transform, WPT_Feature exhibits a large margin between feature clusters of different
classes, which indicates good classification performance.
Owing to the compactness and high quality of the extracted features, successful and
robust PD identification is achieved using a very simple MLP network. In particular,
the MLP with WPT-based pre-processing achieves 100% correct classification on the test
data and on data obtained at different PD-to-sensor distances. This verifies the robustness of
the WPT-based feature extraction. Moreover, both the WPT-based and ICA-based PD
diagnostic methods are potentially suitable for online applications.
LIST OF FIGURES
Fig. 1.1
Fig. 1.2
Fig. 1.3
Fig. 1.4
Fig. 1.5
Fig. 1.6
Fig. 1.7
Fig. 1.8
Fig. 1.9
Fig. 1.10
Fig. 1.11
Fig. 1.12
Fig. 1.13
Fig. 1.14
Fig. 1.15
Fig. 1.16
Fig. 2.1
Fig. 2.2 The decomposition tree structure of (a) DWT and (b) WPT
Fig. 2.3
Fig. 2.4
Fig. 2.5
Fig. 2.6
Fig. 2.7
Fig. 2.8
Fig. 2.9
Fig. 2.10
Fig. 2.11
Fig. 2.12
Fig. 2.13
Fig. 2.14
Fig. 2.15 One-step decomposition
Fig. 2.16 One-step reconstruction
Fig. 2.17 Original PD signal
Fig. 2.18
Fig. 2.19
Fig. 2.20
Fig. 2.21
Fig. 2.22
Fig. 2.23
Fig. 2.24
Fig. 3.1
Fig. 3.2
Fig. 3.3 GA flowchart
Fig. 3.4
Fig. 3.5
Fig. 3.6
Fig. 3.7
Fig. 3.8
Fig. 4.1
Fig. 4.2
Fig. 4.3
Fig. 4.4
Fig. 4.5
Fig. 4.6
Fig. 4.7
Fig. 4.8
Fig. 4.9
Fig. 4.10
Fig. 4.11
Fig. 4.12
Fig. 4.13
Fig. 4.14
Fig. 4.15 Feature clusters formed by (a) ICA features (b) PCA features
Fig. 4.16
Fig. 5.1
Fig. 5.2 WPD tree of level 5 (Copy of Fig. 3.8 for reference)
Fig. 5.3
Fig. 5.4
Fig. 5.5
Fig. 5.6
Fig. 5.7
Fig. 5.8
Fig. 5.9
Fig. 5.10
Fig. 5.11
Fig. 5.12 Feature spaces formed by the best features obtained from (a) sym6 wavelet; (b) db9 wavelet
Fig. 5.13 Impact of noise levels on the features selected in Section 5.4.1
Fig. 5.14 Feature spaces obtained from signals of different SNR levels
Fig. 5.15
Fig. 5.16
Fig. 6.1
Fig. 6.2
Fig. 6.3
Fig. 6.4
Fig. 6.5
Fig. 6.6
Fig. 6.7
Fig. 6.8 Mean squared error during training when using ICA_feature as input
Fig. 6.9
Fig. 6.10 Mean squared error during training when using WPT_feature as input
Fig. 7.1
Fig. 7.2
Fig. 7.3
Fig. 7.4
Fig. 7.5
Fig. 7.6
Fig. A.1
Fig. A.2 The layout of the test setup with a section of an 800 kV GIS
Fig. A.3
Fig. A.4
Fig. B.1
Fig. B.2
LIST OF TABLES
Table 2.1
Table 2.2
Table 2.3
Table 3.1
Table 3.2
Table 3.3 GA intermediate parameters
Table 3.4
Table 4.1
Table 4.2
Table 4.3
Table 4.4
Table 5.1
Table 5.2
Table 5.3
Table 5.4
Table 5.5
Table 6.1
Table 6.2
Table 6.3
Table 6.4
Table 6.5
Table 6.6
Table 6.7
Table 6.8
Table 6.9
Table 6.10
Table 6.11
Table 6.12
Table 7.1
Table 7.2
Table 7.3
Table 7.4
Table 7.5
Table 7.6
Table 7.7
Table 7.8
Table 7.9
Table 7.10
Table A.1
Table A.2
Table A.3
CHAPTER 1 INTRODUCTION
CHAPTER 1
INTRODUCTION
The background of this research is introduced first. The importance of partial discharge
(PD) detection, PD measurement systems in gas-insulated substations (GIS), various
noise reduction methods for PD signals and the methods for PD source recognition are
reviewed. The objectives, scope and contributions to knowledge of the research are then
described. Finally, an outline of the thesis is given.
1.1
A significant trend in the development of electrical power equipment over the years
has been the increase of equipment operating voltage. This has given rise to the need
for more reliable insulation systems and subsequently the need to detect the
degradation of such systems through diagnostic measurements. In the past couple of
years, increasing attention has been paid to the development of such tools. Among the
various diagnostic techniques, partial discharge (PD) measurement is generally
considered crucial for condition-based maintenance, as it is nondestructive, nonintrusive and can reflect the overall integrity of the insulation system. Thus, a good
understanding of the PD phenomenon is the basis of this diagnostic system.
GIS is a very complicated system that consists of busbars, arresters, circuit breakers,
current and potential transformers, and other auxiliary components, as illustrated in Fig. 1.2.
These components are enclosed in a grounded metal enclosure, which is filled with
sulfur hexafluoride (SF6). Epoxy resin spacers are used to hold the conductor in place
within the enclosure as shown in Fig. 1.3.
Fig. 1.3 GIS test chamber (labelled parts: grounded enclosure, SF6 gas, resin spacer)
In recent years, there has been a great deal of new development in GIS monitoring
techniques, among which partial discharge detection [3-7] is found to be the most
important method as PD is an indicator of all dielectric failures in the initial stages.
This thesis focuses on the detection and identification of PD activities in GIS.
1.1.3 PD in SF6
Sulfur hexafluoride (SF6) gas is a popular insulation material because its
dielectric strength is twice that of air and it also offers excellent thermal and arc
interruption characteristics [28]. However, conducting particles may cause PD in SF6
and lower the breakdown voltage of a GIS considerably. The likely causes of such
contamination are debris left from the manufacturing and assembly process,
mechanical abrasion, movement of the central conductor under load cycling and
vibration during shipment. Even with a very high level of quality control, it appears
that a certain level of particulate contamination is unavoidable. Therefore,
investigation of PD activities in SF6 is imperative for the condition monitoring of GIS.
The common defects in GIS include free conducting particles, surface contamination
on insulating spacers and protrusions on conductor [7-10] as illustrated in Fig. 1.4.
These defects enhance the local electric field, leading to partial discharge and
ultimately a complete breakdown. Corona, which is regarded as an important source of
noise, is also reviewed in this section.
Fig. 1.4 Common defects in GIS. (1) protrusion on conductor, (2) free conducting
particle, (3) particle on spacer surface.
When a free conducting particle, such as a piece of swarf, is exposed to the electric
field in a GIS, it becomes charged and experiences an electrostatic force. The
electrostatic force may be sufficient to overcome the particle's weight, so that the
particle moves under the combined influence of the electric field and gravity. The
particle may return to the enclosure at any point on the power frequency wave and a
dancing motion is observed. When the particle moves, it periodically makes contact
with the grounded enclosure, and a discharge occurs with every touch. The breakdown
occurs when the particle approaches, but is not in contact with the busbar. There is a
critical particle-to-busbar spacing where the system breakdown voltage is a minimum.
Apart from the movement of the particle, there are a number of factors that affect the
degree of harmfulness of a free particle, such as the shape and size of the particle,
applied voltage level, etc. Long, thin and wire-like particles are more likely to trigger
breakdown than spherical particles of the same material [8].
As breakdown will only occur when a particle is lifted and approaches the busbar,
various techniques have been developed for permanently deactivating or removing
particles from the active region during high voltage testing [85, 86]. For instance, an
adhesive can be employed at the low field enclosure in conjunction with a low field
trap. Other techniques for preventing particle movement include applying insulating
coatings on the enclosure, using magnetic fields and coating the particles with a
dielectric layer [86]. Although the above-mentioned measures reduce the probability of
breakdown by decreasing the number of free particles in the chamber,
particle-initiated breakdown is still unavoidable in GIS due to the particles generated
during operation.
the voltage rating of insulating supports rather than the dielectric strength of the SF6
gas. This voltage rating is highly dependent on surface conditions and the presence of
any contamination which may initiate partial discharge. Sources of contamination
include fixed metallic particles, grease and trapped charge [10].
A particle on the spacer is in contact with a surface that will store charge near the
particle ends. The accumulated charges can then lead to high field concentration on the
surface of spacer. Therefore, particles on the spacer can reduce the flashover voltage
significantly.
Protrusion on Conductor
A sharp metallic protrusion on a busbar enhances the local electric field. If the local
electric field exceeds some critical value, there is a localized breakdown of the SF6 gas
which causes discharges that could lead to complete breakdown. This type of defect is
usually considered to be the most critical one that defines the critical PD level [29].
For a protrusion on the busbar, three distinct phases of discharge activity can be
identified, namely diffuse glow, streamer and leader discharge. However, the glow
discharge is not detectable using UHF measurement, as the PD current magnitude is
small and the frequency components are too low for UHF excitation. On the other hand,
leader discharge is only observed at high voltages prior to breakdown. Hence, PD data
are measured in the streamer phase in this work.
Air Corona
Corona is a discharge phenomenon that is characterized by the complex ionization
which occurs in the air surrounding high voltage transmission line conductors outside
the GIS at sufficiently high levels of conductor surface electric field. It is usually
accompanied by a number of observable effects, such as visible light, audible noise,
electric current, energy loss, radio interference, mechanical vibrations, and chemical
reactions. Corona signals propagate through the busbar and are detected by the sensors.
chemical changes in SF6, but this technique appears to be too insensitive for PD
detection in GIS [3].
For many years, the conventional electrical method, IEC 270, has been well developed
and widely used in detecting PD activities in cables, transformers, generators, and
other equipment. The typical frequency range of this type of measurement is 40 kHz to
1 MHz. Fig. 1.5 shows the typical measurement circuit of the IEC 270 method. A
coupling capacitor is placed in parallel with the test object and the discharge signals
are measured across the external impedance.
Fig. 1.5 PD measurement circuit of IEC 270 method
(a) Coupling device in series with the coupling capacitor; (b) Coupling device in
series with the test object
One of the main advantages of this method is that extensive experience has
been gained through years of practical application. In addition, the measurement can
be calibrated to ensure that the same result is obtained from two different systems that
are used to measure the same sample. However, there are three major drawbacks
associated with this method which make it unsuitable for GIS [3-6].
Firstly, the IEC 270 method needs an external coupling capacitor, which is not
normally provided in GIS. Hence, the method cannot be employed on a GIS in
service. Secondly, the sensitivity of the method depends on the ratio of the coupling
capacitance to the capacitance of the test object. The total capacitance of a GIS is large.
Therefore, the method has insufficient sensitivity for a complete GIS. Thirdly, such a
low-frequency method is not suitable for field application on GIS because of
excessive interference, as shown in Fig. 1.6.
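The second drawback can be made concrete with a small calculation. The sketch below assumes the simple capacitive-divider model, in which the fraction of the apparent charge seen by the measuring impedance is Ck / (Ca + Ck), with Ck the coupling capacitance and Ca the test object capacitance. This model and the capacitance values are illustrative assumptions and are not taken from this thesis.

```python
def measurable_charge_fraction(c_coupling_pf, c_test_object_pf):
    """Fraction of the apparent charge seen by the measuring impedance.

    Assumes the simple capacitive-divider model q_m / q = Ck / (Ca + Ck).
    """
    return c_coupling_pf / (c_coupling_pf + c_test_object_pf)


# Hypothetical capacitances (in pF), chosen only for illustration.
small_object = measurable_charge_fraction(1000.0, 100.0)
large_gis = measurable_charge_fraction(1000.0, 20000.0)

print(f"small test object: {small_object:.3f}")
print(f"large GIS capacitance: {large_gis:.3f}")
```

Under this assumed model, the measurable fraction falls as the test object capacitance grows relative to the coupling capacitance, which is consistent with the loss of sensitivity described for a complete GIS.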
Fig. 1.6 Various noises travel through the GIS conductor via bushing
frequency up to the VHF range (30 MHz to 300 MHz). However, they do not produce
electromagnetic waves within the UHF range (300 MHz to 1.5 GHz). Thus, sinusoidal
continuous noises cannot be detected by the UHF sensor and are not considered in this
study. However, the other two types of noise contain both low-frequency and
high-frequency components. Thus, advanced noise reduction techniques have to be
developed for suppressing the residual noises in UHF signals.
1. UHF Measurement.
Data acquisition is usually performed through internal or external UHF sensors.
The recorded data are then transferred and stored on a PC hard drive for further
analysis.
2. Noise reduction.
It is well-known that environmental noises present on the GIS site would cause
distortion in the measured signals. Therefore, sufficient noise suppression is a
pre-requisite for any on-site PD evaluation and analysis.
3. Partial discharge fingerprints construction.
In many commercial PD monitoring systems for GIS, some of the components, such as
PD location, are not included. This may be due to the lack of practical methods and the
complicated structures of GIS. In such commercial systems, the UHF signals created
by partial discharge are detected by couplers positioned throughout the substation. The
signals are then passed via coaxial cables to a local processing unit, where they are
amplified, filtered and digitized. Subsequently, the processed data are transferred to and
saved in a central PC, where PD diagnostic software is usually installed. By running
the software, various PD patterns are built from the data obtained from each sensor and used
by an experienced engineer or artificial intelligence software to assess the risk of
defects in GIS.
White noise is widespread in the high voltage laboratory and on site. Its amplitude is
Gaussian distributed in the time domain and its power is uniformly distributed in the
frequency domain. Therefore, it is impossible to effectively eliminate white noise using
purely time-domain or purely frequency-domain methods. Fig. 1.8 shows a measured UHF PD
signal buried in excessive white noise. It can be seen that the PD signal has been
distorted and it is impossible to gauge the condition of the insulation based on such a signal.
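The flat spectrum of white noise can be checked numerically. The following is a minimal sketch that generates synthetic Gaussian noise (not measured data) and compares the average power in the lower and upper halves of its spectrum using a direct DFT. The record length, seed and band split are arbitrary choices made for illustration.

```python
import cmath
import random


def dft_power(x):
    """Power |X[k]|^2 for k = 1 .. N/2 - 1, using a direct (slow) DFT."""
    n = len(x)
    power = []
    for k in range(1, n // 2):
        acc = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        power.append(abs(acc) ** 2)
    return power


random.seed(0)
noise = [random.gauss(0.0, 1.0) for _ in range(512)]  # synthetic white noise

power = dft_power(noise)
half = len(power) // 2
low_band = sum(power[:half]) / half                    # mean power, lower band
high_band = sum(power[half:]) / (len(power) - half)    # mean power, upper band
band_ratio = low_band / high_band

print(f"low/high band power ratio: {band_ratio:.2f}")
```

A ratio close to one indicates that no frequency band is free of noise, so a band-pass or band-stop filter alone cannot separate the noise from a wideband PD pulse.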
Air corona occurs in the form of stochastic pulse-shaped noise at the bushing of the
GIS. It is therefore not particularly harmful to GIS insulation. However, the signal is usually so
intense that enough UHF components are fed into the busbar to give an unacceptably
high noise level. It is difficult to distinguish this kind of interference due to the
similarities between SF6 PD and air corona. The amplitudes of corona signals are often
comparable to or even larger than those of PD, as illustrated in Fig. 1.9. Therefore,
discrimination of air corona is crucial for PD detection and source recognition.
Fig. 1.9 Comparison of SF6 PD and air corona. (a) SF6 PD; (b) air corona.
As distinct from partial discharge occurring in solid or liquid dielectrics for generators
and transformers, PD in SF6 exhibits unique breakdown characteristics as illustrated in
Fig. 1.10. It can be seen that both PD inception and breakdown voltage increase with
the gas pressure in region I. In region II, breakdown voltage decreases with increasing
pressure, while inception voltage keeps going up. Above a critical pressure Pc,
breakdown voltage is seen to coincide with inception voltage, meaning that PD in SF6
leads to breakdown very quickly. This suggests that the PD diagnostic system must be able
to detect and identify the PD source in time so that breakdown can be prevented.
However, the widely adopted PD diagnosis method, namely phase-resolved PD (PRPD)
pattern analysis, requires a long time for signal measurement and formation of PRPD
patterns. Thus, it may not meet the requirement for GIS application. In addition, this
approach cannot be applied to DC power transmission systems, where a phase reference
is not available. With the increasing application of DC transmission, PD identification
in such systems is becoming more and more important. There is therefore an urgent need
to develop a new method for fast and reliable classification of SF6 PD. A detailed review
of PRPD pattern analysis and its application is given in Section 1.3.
1.2
The various techniques for white noise reduction include filtering, spectral analysis
and Wavelet Transform (WT) [13], among which filtering and spectral analysis are
based on the Fast Fourier Transform (FFT). The Fast Fourier Transform and its inverse give a
one-to-one relationship between the time domain and the frequency domain [14].
Although the spectral content of the signal is easily obtained using the FFT,
the time information is lost. Fig. 1.11 shows the FFT of a measured PD signal.
As illustrated in Fig. 1.11 (b), the FFT only gives the frequency components of the PD
signal. Since white noise is uniformly distributed in the frequency domain, it is impossible
to remove white noise using the FFT without significant distortion of the original PD
signal. Therefore, additional time information is crucial for PD signal denoizing and
detection, because the PD waveform is a non-periodic, fast transient in the time domain.
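The loss of time information can be illustrated with a short sketch. It builds two synthetic records that contain the same damped pulse at different positions and compares their Fourier magnitude spectra. The pulse shape and record length are hypothetical stand-ins for a PD transient, not measured data.

```python
import cmath
import math


def dft_magnitude(x):
    """Magnitude of the direct DFT of x."""
    n = len(x)
    return [abs(sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n)))
            for k in range(n)]


def damped_pulse(n, start):
    """A short damped oscillation starting at sample `start` (stand-in for a PD pulse)."""
    x = [0.0] * n
    for i in range(16):
        x[(start + i) % n] = math.exp(-i / 4.0) * math.sin(2 * math.pi * i / 4.0)
    return x


early = damped_pulse(64, 5)    # pulse near the start of the record
late = damped_pulse(64, 40)    # same pulse, much later in the record

max_diff = max(abs(a - b) for a, b in zip(dft_magnitude(early), dft_magnitude(late)))
print(f"largest difference between magnitude spectra: {max_diff:.2e}")
```

The two records differ in the time domain, yet their magnitude spectra agree to within rounding error. The arrival time is encoded only in the phase, which is why a magnitude spectrum alone cannot localize a transient.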
Fig. 1.11 Fast Fourier Transform of UHF PD signal (a) PD signal; (b) FFT of (a).
Transform [13], [15-17] for PD signal denoizing. Wavelets are functions that satisfy
certain mathematical requirements and are used in representing data or other functions.
Using their practical implementation known as wavelet filter banks, discrete wavelet
transform (DWT) maps the data into different frequency components, and then studies
each component with a resolution matched to its decomposition level. As illustrated in
Fig. 1.12, DWT processes PD signal at different time-frequency resolutions so that
both frequency and time characteristics can be studied simultaneously. In addition, the
energy of PD signal is concentrated in a few large decomposition coefficients, while
the energy of white noise is spread among all coefficients in wavelet domain, resulting
in small coefficients [83, 84]. Therefore, it is feasible to remove white noises in
wavelet domain with little distortion by employing a thresholding method. DWT thus
suppresses white noise within the PD signals more effectively than Fourier based
methods.
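The energy-concentration argument above can be sketched numerically. The example below applies a plain Haar DWT (chosen only because it is short to write; it is not the wavelet selected in this thesis) to a synthetic damped pulse and to synthetic white noise, and reports the share of energy held by the largest 10% of coefficients. All signals and parameters are illustrative assumptions.

```python
import math
import random


def haar_dwt(signal, levels):
    """Multi-level orthonormal Haar DWT; returns all coefficients as one flat list."""
    approx = list(signal)
    details = []
    s = math.sqrt(2.0)
    for _ in range(levels):
        pairs = [(approx[i], approx[i + 1]) for i in range(0, len(approx), 2)]
        details = [(a - b) / s for a, b in pairs] + details
        approx = [(a + b) / s for a, b in pairs]
    return approx + details


def top_energy_fraction(coeffs, keep):
    """Share of total energy carried by the `keep` largest coefficients."""
    energies = sorted((c * c for c in coeffs), reverse=True)
    return sum(energies[:keep]) / sum(energies)


n = 256
# Synthetic damped oscillation standing in for a PD pulse.
pulse = [math.exp(-(i - 100) / 6.0) * math.sin(2 * math.pi * (i - 100) / 8.0)
         if i >= 100 else 0.0 for i in range(n)]
random.seed(1)
noise = [random.gauss(0.0, 1.0) for _ in range(n)]

pulse_fraction = top_energy_fraction(haar_dwt(pulse, 4), keep=26)
noise_fraction = top_energy_fraction(haar_dwt(noise, 4), keep=26)

print(f"pulse energy in largest 10% of coefficients: {pulse_fraction:.2f}")
print(f"noise energy in largest 10% of coefficients: {noise_fraction:.2f}")
```

The pulse energy sits in a few large coefficients while the noise energy is spread over all of them, which is the property that makes thresholding in the wavelet domain effective.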
Various denoizing methods are discussed in [13] with a special focus upon the
wavelet-based method. The method first decomposes the PD signal into several detail
components, each containing a set of decomposition coefficients. Subsequently,
components that are dominated by noises are discarded. Thresholding is then
performed on the decomposition coefficients of retained components, followed by the
reconstruction of the denoized signal. Although the feasibility of applying wavelet
transform to PD signal denoizing is studied, the denoizing performance in terms of
signal-to-noise ratio and distortion is not fully investigated, as only graphical
results are presented without any numerical calculation. Furthermore, the selection of
detail components for reconstruction is based on observation, which is not robust for
all applications. Therefore, an automated method should be developed.
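The decompose, discard, threshold and reconstruct sequence described above can be sketched as follows. This is a minimal illustration and not the method of [13]: it uses the Haar wavelet, assumes the finest detail level is noise dominated, and applies soft thresholding with the universal threshold sigma * sqrt(2 ln N), all of which are assumptions made here. A block-shaped test pulse is used because it suits the Haar wavelet; an oscillatory PD pulse would call for a better matched wavelet.

```python
import math
import random

S = math.sqrt(2.0)


def haar_step(x):
    """One analysis step: returns (approximation, detail), each half the length of x."""
    a = [(x[i] + x[i + 1]) / S for i in range(0, len(x), 2)]
    d = [(x[i] - x[i + 1]) / S for i in range(0, len(x), 2)]
    return a, d


def haar_inverse_step(a, d):
    """One synthesis step: exact inverse of haar_step."""
    x = []
    for ai, di in zip(a, d):
        x.extend([(ai + di) / S, (ai - di) / S])
    return x


def soft(c, t):
    """Soft thresholding: shrink the coefficient towards zero by t."""
    return math.copysign(max(abs(c) - t, 0.0), c)


def denoise(x, levels, threshold):
    approx, details = list(x), []
    for _ in range(levels):
        approx, d = haar_step(approx)
        details.append(d)                      # details[0] is the finest level
    # Assumption for this sketch: the finest level is noise dominated and discarded.
    details[0] = [0.0] * len(details[0])
    # Retained detail levels are soft thresholded.
    details[1:] = [[soft(c, threshold) for c in d] for d in details[1:]]
    for d in reversed(details):                # reconstruct from coarse to fine
        approx = haar_inverse_step(approx, d)
    return approx


def mse(a, b):
    return sum((p - q) ** 2 for p, q in zip(a, b)) / len(a)


n, sigma = 256, 0.3
clean = [2.0 if 96 <= i < 160 else 0.0 for i in range(n)]   # synthetic block pulse
random.seed(2)
noisy = [c + random.gauss(0.0, sigma) for c in clean]
restored = denoise(noisy, levels=3, threshold=sigma * math.sqrt(2.0 * math.log(n)))

mse_noisy, mse_restored = mse(noisy, clean), mse(restored, clean)
print(f"MSE before: {mse_noisy:.4f}, after: {mse_restored:.4f}")
```

Here the level to discard is fixed by hand, which mirrors the observation-based selection criticized above. An automated criterion would replace that hard-coded choice.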
In [16-17], the issues associated with the wavelet-based PD denoizing methods, such
as wavelet selection and threshold estimation are investigated. However, one threshold
is applied to all detail coefficients at the first decomposition level that corresponds to
high-frequency bands. Noise levels in the different high-frequency bands could
differ. Thus, further investigation of time-frequency features in the high-frequency
bands is required for PD signal denoizing.
that the method works well on the data obtained from the low-frequency measurement.
However, the frequency contents of PD and corona signals obtained from UHF
measurement overlap. This means that it is difficult to determine whether a
component is dominated by PD or corona. Therefore, the method may not work on
UHF resonance signals. Moreover, the method cannot be applied online, as the
discrimination process is not automatic.
Methods based on neural networks are proposed in [21-23] to classify PD and corona.
Using the measured signals or phase-resolved PD patterns as input, various neural
network structures are constructed and trained for discrimination of corona. These
methods however do not provide a detailed discussion on feature extraction, which is
crucial for neural network design and its classification performance. Moreover, the
neural networks employed in [21-23] have very complicated structures, which prevent
them from online application due to the slow response. Hence, a new scheme is
needed for the discrimination of corona and PD.
1.3 REVIEW OF PARTIAL DISCHARGE SOURCE RECOGNITION
Traditionally, the approach using phase-resolved PD (PRPD) patterns has been widely
employed to monitor partial discharge activities [23-25]. Here the total charge
transferred during a discharge and the time or ac phase at which the discharge occurs
are measured. In addition, the total number of PD events occurring within a time
interval is counted. Based on these parameters, PRPD pattern analysis investigates the
PD magnitude and/or PD repetition rate in relation to the ac voltage cycle, which is
divided equally into a certain number of phase windows. Typical PRPD patterns,
accumulated over a number of cycles, are shown in Figs. 1.13 and 1.14.
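The construction of two-dimensional PRPD patterns can be sketched as a simple binning of discharge events by phase. The function name, the choice of 36 windows and the event values below are hypothetical and serve only to illustrate the idea.

```python
def prpd_patterns(events, n_windows=36):
    """Bin PD events by phase window.

    events: iterable of (phase_deg, amplitude) pairs.
    Returns (counts, peaks): number of events and largest |amplitude| per window.
    """
    counts = [0] * n_windows
    peaks = [0.0] * n_windows
    width = 360.0 / n_windows
    for phase_deg, amplitude in events:
        w = int((phase_deg % 360.0) // width)   # index of the phase window
        counts[w] += 1
        peaks[w] = max(peaks[w], abs(amplitude))
    return counts, peaks


# Hypothetical events accumulated over several ac cycles.
events = [(12.0, 0.8), (15.5, 1.1), (95.0, 0.4), (200.0, -0.9), (359.9, 0.2)]
counts, peaks = prpd_patterns(events)

print("events in window 10-20 deg:", counts[1])
print("peak amplitude in window 10-20 deg:", peaks[1])
```

Because each window needs enough events to be statistically meaningful, such patterns have to be accumulated over many ac cycles, which is the source of the long acquisition time discussed in this section.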
Fig. 1.13 Two-dimensional PRPD patterns (a) PD repetition rate against phase; (b) PD
amplitude against phase
In [3, 23-27], features are extracted from the PRPD or POW patterns using envelope
extraction, statistical methods, orthogonal transforms, unsupervised neural networks or
fractal methods. Subsequently, various classification schemes are developed to identify
defects based on the extracted features. However, results of these methods show large
classification errors due to the variety of the patterns produced by defects of the same
type, as shown in [26]. Another major drawback of these approaches is that they
require signals measured over a few seconds or even longer to form the PRPD or
POW patterns before feature extraction and classification. On the other hand, PD can
progress very quickly from initiation to breakdown in GIS, particularly in high-pressure
SF6 for working voltages of 300 kV and above. In addition, more than one
type of PD can take place in the GIS chamber during the formation of PRPD or POW
patterns [3]. This results in inaccurate PRPD or POW patterns and leads to further
misclassification. There is therefore an urgent need to develop a fast and reliable
diagnosis method for source recognition of PD.
1.4
Through the background review, the traditional denoizing and source recognition
methods are considered to be insufficient to provide fast and reliable diagnosis of
insulation system in GIS. Thus, in contrast to the PRPD- or POW-based methods, a
novel scheme based on UHF signals with duration of several hundred nanoseconds is
developed in this thesis as shown in Fig. 1.15. As data are collected in much shorter
windows, the possibility of encountering more than one type of discharge signals
during measurement and subsequent classification is very small. In addition, the short
data acquisition time enables the development of fast PD diagnosis system which can
be potentially applied online. Therefore, the problems with PRPD- and POW-based
methods are largely avoided by using the UHF signal directly.
(1)
(2)
(3)
To select features with the largest discriminating power to form compact and
high-quality PD fingerprints, so that the speed and classification performance
are improved significantly.
(4)
(1)
(2)
(3)
(4)
1.5
The overall structure of this thesis is illustrated in Fig. 1.16. Content of each chapter is
briefly described as follows:
Chapter 2 studies the denoizing of UHF PD signals using wavelet packet transform. A
novel variance-based criterion is developed to select the best tree from wavelet packet
decomposition tree for improving the denoizing. Selection of other denoizing
parameters is also studied based on overall performance. Results from different
denoizing methods are presented and compared.
Chapter 4 and Chapter 5 develop novel methods for PD feature extraction based on
UHF signals with short duration. In Chapter 4, a time-domain technique known as
Independent Component Analysis (ICA) is employed to perform the feature extraction.
ICA is first introduced through a comparison with the well-known Principal
Component Analysis. Subsequently, the ICA-based feature extraction method is described,
followed by experimental results.
CHAPTER 2
DENOIZING OF PD SIGNALS IN WAVELET PACKET
DOMAIN
In Chapter 1, the background information about PD and its measurement has been
introduced. Previous research on noise reduction and PD source recognition has been
reviewed and a novel PD diagnosis scheme has been proposed. In this chapter,
denoizing of UHF PD signals using wavelet packet transform is studied. First, wavelet
packet transform and the general wavelet-packet-based denoizing scheme are briefly
reviewed. Secondly, the proposed denoizing scheme is described with special
emphasis on a novel approach for best tree selection. Lastly, numerical results are
presented and discussed.
2.1 INTRODUCTION
Based on WPT, a general method was proposed in [31] and implemented in a software
package [42] for signal denoizing. However, the method is found in this work to be
unsuitable for PD signals in terms of noise level reduction and restoration of the
original waveform, as it was only developed and tested on standard waveforms, such
as sine waves. The major drawback of the method is that the criterion employed for
selecting PD-dominated decomposition components may cause loss of critical PD
information, leading to poor denoizing performance. An outline of the general method
and its shortcomings is given in Sections 2.2.2 and 2.2.3 respectively.
To address the above-mentioned issue with the general denoizing method, a novel
variance-based criterion is proposed in Section 2.3.2 for selecting the most effective
components from the wavelet-packet-decomposition tree.
Moreover, a scheme is
proposed in the flowchart of Fig. 2.1 for determination of the best choice of
denoizing parameters, such as wavelet filters, decomposition level and thresholding
parameters, in terms of noise reduction and original signal restoration. A
comprehensive database containing 256 data records was built for developing and
verifying the new denoizing method as well as the new PD source identification
methods, which are discussed in Chapters 4 to 7. Data were collected by TMT&D
from a test section of an 800 kV GIS [89], where PD of various types and locations
were initiated by applied voltages of various values. Details of the equipment
specifications and experimental set-up are given in Appendix A. Numerical results are
shown in Section 2.4 to compare the performance of various denoizing parameters and
methods, where the signal-to-noise ratio (SNR) and correlation coefficient (CC) are
employed to evaluate noise reduction and signal restoration respectively.
In Fig. 2.1, a mechanism is also proposed for verifying the performance of the determined
denoizing parameters on new data by dividing the measured signals into a training set
and a test set. The same two sets are used in Chapter 3, where a genetic-algorithm-based
method is developed to optimize the entire set of denoizing parameters.
2.2
$$ d_{j+1,2n}(k) = \sum_{m} h(m)\, d_{j,n}(2^{j} m - k) \tag{2.1} $$
$$ d_{j+1,2n+1}(k) = \sum_{m} g(m)\, d_{j,n}(2^{j} m - k) \tag{2.2} $$
The complete binary tree resulting from WPT contains many nodes. It follows that the
terminal nodes (leaves) of every connected binary subtree of the complete tree form an
orthogonal basis of the signal space. Therefore, to achieve the best denoizing
performance, the best subset of nodes (best tree) must be chosen for
representing a signal in the wavelet packet domain. A review of the DWT and the
generalized WPT is given in Appendix B.
Fig. 2.2 The decomposition tree structure of (a) DWT and (b) WPT
Typical applications of WPT include biomedical engineering [32-33], signal [34] and
image [35] processing. Recently, WPT has been successfully applied to various areas
of power systems, such as power system disturbances [36-38], energy measurement [39]
and fault identification [40]. However, only a limited number of publications on the
application of WPT to PD analysis have been reported. In [41], WPT was employed to
compress PD data.
The standard method starts by creating a "father" node from a given PD signal.
Then the best tree decomposition (splitting process) is carried out as follows:
(1)
(2) Split the "father" node into two "child" nodes by one-step DWT using a
predetermined wavelet.
(3)
(4)
(5) Choose the next node at the current decomposition level as the "father" node
and go to step (2). If all the nodes at the current level have been split, go to
the next level and select the leftmost node as the "father" node. Then go to step
(2). If the last node of level J-1 has been examined, where J is the specified
decomposition level, the process stops.
Many entropy functions can be used in the above process, such as Shannon entropy,
logarithm of the "energy" entropy, threshold entropy, and so on [42]. The Shannon
entropy is used in the present experiment due to its proven suitability for wavelet
packet analysis [43].
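As a rough illustration of the entropy criterion, the sketch below computes the non-normalized Shannon entropy of a coefficient vector, E = -sum(c^2 log c^2), and applies the usual best-basis rule of keeping a split only when the children have a lower total entropy than the father. The function names and the exact splitting rule are assumptions made for illustration; they are not taken from the software package [42].

```python
import math


def shannon_entropy(coeffs):
    """Non-normalized Shannon entropy: -sum(c^2 * log(c^2)), with 0*log(0) taken as 0."""
    return -sum(c * c * math.log(c * c) for c in coeffs if c != 0.0)


def keep_split(father, child_a, child_b):
    """Assumed rule: keep the split only if it lowers the total entropy."""
    return shannon_entropy(child_a) + shannon_entropy(child_b) < shannon_entropy(father)
```

A father node whose energy is spread evenly over four coefficients has a higher entropy than children that concentrate the same energy in a single coefficient, so the split would be kept in that case.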
This would cause loss of information representing the features of the PD. In addition, the
best tree structure resulting from the splitting has to be constructed every time a
new PD signal is presented. This is inefficient, as the tree structure can be determined
from a set of typical PD signals and kept unchanged for all the signals that are
subsequently processed. Thus, a more efficient PD denoizing strategy is required to
address these issues.
2.3
NEW WAVELET-PACKET-BASED DENOIZING
The denoizing scheme proposed in the flowchart of Fig. 2.1 is further described as
follows. Measured PD and corona signals are first divided into two sets, namely the
training and test sets, for selecting and verifying the denoizing parameters
respectively. The training set is used to determine the optimal parameters required
for the remaining denoizing process. The optimal wavelet for the wavelet packet
decomposition is selected first, followed by the decomposition level. The best
decomposition tree is then selected, and the parameters related to thresholding are
set. The test set is entered at a much later part of the proposed scheme of Fig. 2.1.
Wavelet packet decomposition and coefficient thresholding are applied to both the
training and test sets. Finally, the denoized signal is reconstructed and the
denoizing performance is evaluated by the signal-to-noise ratio (SNR) and the
correlation coefficient (CC). Another round of training is carried out should the
post-denoizing performance fall below a predetermined level.
The first task to be accomplished with the training set is to identify the optimal wavelet
(Fig. 2.4), which best describes a set of PD signals. In this thesis, a method based on
minimum-prominent-decomposition coefficients [44] is extended to choose the optimal
wavelet from a set of candidate wavelets, such as Daubechies, Symlets, Coiflets and
Biorthogonal wavelets. The flowchart of the method is shown in Fig. 2.5.
For each candidate wavelet, the method first decomposes the jth PD signal of the
training set into the wavelet packet domain down to a predetermined level of 5, as shown in
Fig. 2.6. Secondly, the mean of the absolute values of the detail coefficients is
calculated for each decomposition level and then summed across all five
decomposition levels to form $\lambda_j$. The value $\lambda_j$ is computed for all the other
signals in the training set and summed to give $\lambda$. The value of $\lambda$ indicates how
closely the candidate wavelet describes the PD signals. A small $\lambda$ indicates good
performance of the candidate wavelet. The procedure is then applied to all the other
wavelets. The wavelet giving the lowest $\lambda$ is chosen as the best wavelet. As a result,
the 'sym8' wavelet is obtained from the training set. The effectiveness of the above
procedure is illustrated in Fig. 2.7. As observed, the shape of the selected wavelet,
which results in the smallest $\lambda$, best represents the PD signal produced by a free
particle. Similar results are obtained for the other types of PD and corona signals.
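The selection rule described above can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: it uses only two candidate wavelets (Haar and db2, with their standard low-pass filters), a plain DWT cascade with periodic extension instead of a full wavelet packet decomposition, and three levels instead of five. All function names are hypothetical.

```python
import math

SQ2, SQ3 = math.sqrt(2.0), math.sqrt(3.0)

# Low-pass decomposition filters of two candidate wavelets (illustrative subset).
LOWPASS = {
    "haar": [1 / SQ2, 1 / SQ2],
    "db2": [(1 + SQ3) / (4 * SQ2), (3 + SQ3) / (4 * SQ2),
            (3 - SQ3) / (4 * SQ2), (1 - SQ3) / (4 * SQ2)],
}


def highpass(h):
    """Quadrature mirror high-pass filter derived from the low-pass filter."""
    size = len(h)
    return [(-1) ** k * h[size - 1 - k] for k in range(size)]


def dwt_step(x, h):
    """One-step DWT with periodic extension; returns (approximation, detail)."""
    g, n = highpass(h), len(x)
    approx = [sum(h[m] * x[(2 * k + m) % n] for m in range(len(h))) for k in range(n // 2)]
    detail = [sum(g[m] * x[(2 * k + m) % n] for m in range(len(g))) for k in range(n // 2)]
    return approx, detail


def wavelet_score(signals, h, levels):
    """Sum over signals and levels of the mean absolute detail coefficient."""
    total = 0.0
    for x in signals:
        approx = list(x)
        for _ in range(levels):
            approx, detail = dwt_step(approx, h)
            total += sum(abs(c) for c in detail) / len(detail)
    return total


def select_wavelet(signals, levels):
    """Return the candidate with the lowest score, together with all scores."""
    scores = {name: wavelet_score(signals, h, levels) for name, h in LOWPASS.items()}
    return min(scores, key=scores.get), scores
```

For a smooth test signal such as a slow sinusoid, db2 gives a lower score than Haar, because its two vanishing moments leave less energy in the detail coefficients.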
[Fig. 2.6: wavelet packet decomposition tree of f(t) down to level 5, with nodes (1,0)-(1,1), (2,0)-(2,3), (3,0)-(3,7), (4,0)-(4,15) and (5,0)-(5,31)]
Fig. 2.7 Comparison of wavelets (a) db2; (b) bior3.3; (c) sym8; (d) PD signal.
effective nodes to best characterize the PD signals in the training set and to remove
the non-effective nodes that are highly corrupted by white noise. The tree structure
after pruning will be used for denoizing signals of both the training and test sets.
To evaluate the effectiveness of the nodes, a union tree is first constructed as in Fig.
2.8. Each node of the union tree is the union of the corresponding nodes in the WPD
trees of all the signals in the extended training set, which consists of 24 PD signals and
24 white-noise signals. For convenience, nodes of the union tree are numbered as in
Fig. 2.9.
A performance index is then required to measure the level of white noise at each node
during the best tree selection. Figs. 2.10 (a) and (b) show the wavelet-packet-decomposition
coefficients of a measured PD signal and a white noise signal
respectively. Each grid cell in the figure represents a node of the original WPD tree. It can be
seen that the decomposition coefficients of white noise have small and similar
magnitudes in all the nodes, while decomposition of the PD signal results in large
coefficients in the PD-dominated nodes. Therefore, if a node of the original WPD tree
is PD-dominated for all the PD signals in the extended training set, then the coefficients in
the corresponding node of the union tree have the largest standard deviation, as shown
in Fig. 2.11(a). Fig. 2.11(b) shows the case where the node is partially dominated by
PD and Fig. 2.11(c) illustrates a noise-dominated node. It is seen that the standard
deviation of the coefficients of a node in the union tree, which is defined as the global
standard deviation, reflects the degree of PD domination of the node. It is thus computed for
each node of the union tree to evaluate its effectiveness.
Fig. 2.11 Nodes of the union tree (a) node 50 dominated by PD; (b) node 53
partially dominated by PD; (c) node 34 dominated by noise.
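The global standard deviation $\sigma_n$ for the nth node of the union tree is given as:
$$ \sigma_n = \sqrt{\frac{1}{M_n}\sum_{k=1}^{M_n}\left(c_{nk}-\bar{c}_n\right)^2} \tag{2.3} $$
where
$c_{nk}$ = the kth coefficient in the nth node,
$\bar{c}_n$ = the mean of the coefficients in the nth node,
$M_n$ = the number of coefficients in the nth node.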
Fig. 2.12 shows the calculated global standard deviations for the nodes of the union tree.
Nodes with small global standard deviations, marked with (*) in Fig. 2.12, are
considered to be corrupted by white noise and are removed from the original WPD tree.
Only nodes with large global standard deviations, marked with (o), are retained
in the best tree structure due to strong PD domination.
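A minimal sketch of the global standard deviation follows. It pools the coefficients that one node receives from every signal of the extended training set and takes their standard deviation. The 1/M normalization and the function name are assumptions made for this illustration.

```python
import math


def global_std(node_coeffs_per_signal):
    """Standard deviation of the coefficients pooled from one node across all signals."""
    pooled = [c for coeffs in node_coeffs_per_signal for c in coeffs]
    mean = sum(pooled) / len(pooled)
    return math.sqrt(sum((c - mean) ** 2 for c in pooled) / len(pooled))


# Made-up coefficients: one list per signal, first a PD signal, then a noise signal.
pd_dominated_node = [[4.0, -4.0], [0.02, -0.02]]
noise_dominated_node = [[0.01, -0.01], [0.02, -0.02]]
```

With these made-up values the PD-dominated node has a global standard deviation of about 2.8, against about 0.016 for the noise-dominated node, so the index separates the two cases clearly.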
Fig. 2.12 Global standard deviations on each node of the union tree
Aside from having large global standard deviations, nodes retained from the above
procedure must meet the orthogonality condition [45]. The method of bi-directional
priority registration (BPR) is proposed here to meet the condition, using which a
complete pruning of the original WPD tree is performed to obtain the best tree as
follows:
(1) Calculate the global standard deviation of each node in the union tree as in Fig.
2.12. Rank the nodes in descending order of their global standard deviations.
(2) Remove from the ranking in (1) those nodes whose global standard deviations
are below a predetermined value (set to 0.001 in this work, based on extensive
study).
(3) Start from i = 1, the node with the highest global standard deviation.
(4) Trace back the family tree of node i, and remove all father node(s) from the
current ranking.
(5) Remove all the child nodes of node i from the ranking.
(6) Descend to the next node in the current ranking, i = i+1. Go to step (7) if the
end of the ranking has been passed. Otherwise go to step (4).
(7) The resulting ranking provides the best tree structure.
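The seven steps above can be sketched as follows. Nodes are written as (level, index) pairs, so that the father of (j, n) is (j-1, n // 2). This indexing, the function names and the example values are assumptions made for illustration only.

```python
def is_ancestor(a, b):
    """True if node a = (ja, na) lies on the path from the root to node b = (jb, nb)."""
    (ja, na), (jb, nb) = a, b
    return ja < jb and (nb >> (jb - ja)) == na


def bpr_best_tree(sigma, threshold=0.001):
    """Bi-directional priority registration on a dict {node: global standard deviation}."""
    # Steps (1)-(2): rank in descending order and drop nodes below the threshold.
    ranking = sorted((node for node, s in sigma.items() if s >= threshold),
                     key=lambda node: sigma[node], reverse=True)
    i = 0  # step (3): start from the top-ranked node
    while i < len(ranking):
        node = ranking[i]
        # Steps (4)-(5): remove every father and every child of the current node.
        ranking = [other for other in ranking
                   if other == node
                   or not (is_ancestor(other, node) or is_ancestor(node, other))]
        i = ranking.index(node) + 1  # step (6): move to the next surviving node
    return ranking  # step (7)
```

Because a node is only ever compared with lower-ranked relatives when it is visited, the surviving nodes never overlap in the tree, which is what the orthogonality condition requires.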
Fig. 2.13 shows the best tree obtained, which is used for denoizing the PD signals.
Comparative studies of the overall denoizing performance with other proposed
methods are presented in Section 2.4.
Hard thresholding sets to zero all decomposition coefficients whose magnitudes are
below a certain threshold value. Soft thresholding does the same and, in addition,
shrinks all remaining coefficients according to a linear law.
Fig. 2.14 shows results from soft and hard thresholding the decomposition coefficients
of node (4,7) of the best decomposition tree. Fig. 2.14(a) shows coefficients before
thresholding. The large coefficients in Fig. 2.14(a) represent PD components whereas
the remaining coefficients represent the white noise. Figs. 2.14(b) & (c) show the
processing results of soft and hard thresholding respectively.
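The two rules can be written compactly as below. The soft rule shown is the standard one, in which the surviving coefficients are shrunk towards zero by the threshold value; it is assumed here to be the "linear law" referred to above. The function names are hypothetical.

```python
import math


def hard_threshold(coeffs, t):
    """Zero every coefficient whose magnitude does not exceed t; keep the rest unchanged."""
    return [c if abs(c) > t else 0.0 for c in coeffs]


def soft_threshold(coeffs, t):
    """Zero small coefficients and shrink the remaining ones towards zero by t."""
    return [math.copysign(abs(c) - t, c) if abs(c) > t else 0.0 for c in coeffs]
```

For the coefficients [3.0, -0.5, -1.5] and a threshold of 1.0, hard thresholding gives [3.0, 0.0, -1.5] and soft thresholding gives [2.0, 0.0, -0.5].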
$$ d_{j,n}(k) = \sum_{m} H(m-2k)\, d_{j+1,2n}(m) + \sum_{m} G(m-2k)\, d_{j+1,2n+1}(m) \tag{2.4} $$
where H and G are the reconstruction filters and $d_{j,n}(k)$ is the kth coefficient at node (j,n). The
denoized signal is the sum of all the components reconstructed from the terminal nodes
in the best tree.
C. Performance testing
After the denoized signal is reconstructed, the denoizing performance is assessed. If the
performance on the training set is satisfactory and the performance on the test set is better than
or close to the average performance on the training set, the parameters determined in
Section 2.3.2 are accepted. Poor performance is probably due to:
(1) The signals in the training set do not cover the variety of the PD waveforms.
Therefore, more PD signals have to be measured under the same condition as
2.4
Results obtained from various choices of denoizing parameters are presented and
discussed in this section. The signal-to-noise-ratio (SNR) and correlation coefficient
(CC) as in equations (2.5) & (2.6) are employed to evaluate the denoizing performance.
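$$ \mathrm{SNR} = 10\log_{10}\left(\frac{\mathrm{Energy}(R)}{\mathrm{Energy}(R-Y)}\right) \tag{2.5} $$
$$ \mathrm{CC} = \frac{\sum_{i=0}^{N-1}\left(Y(i)-\bar{Y}\right)\left(R(i)-\bar{R}\right)}{\sqrt{\sum_{i=0}^{N-1}\left(Y(i)-\bar{Y}\right)^2\,\sum_{i=0}^{N-1}\left(R(i)-\bar{R}\right)^2}} \tag{2.6} $$
where Y and R denote the denoized and original PD signals respectively, and $\bar{Y}$ and $\bar{R}$
denote their mean values.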
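The two performance measures can be computed as in the sketch below, which takes Energy(x) to be the sum of squared samples. That definition of energy and the function names are assumptions made for this illustration.

```python
import math


def energy(x):
    """Signal energy, taken here as the sum of squared samples."""
    return sum(v * v for v in x)


def snr_db(original, denoized):
    """SNR of equation (2.5): 10*log10(Energy(R) / Energy(R - Y))."""
    residual = [r - y for r, y in zip(original, denoized)]
    return 10.0 * math.log10(energy(original) / energy(residual))


def correlation_coefficient(original, denoized):
    """CC of equation (2.6) between the original signal R and the denoized signal Y."""
    mean_r = sum(original) / len(original)
    mean_y = sum(denoized) / len(denoized)
    num = sum((y - mean_y) * (r - mean_r) for y, r in zip(denoized, original))
    den = math.sqrt(sum((y - mean_y) ** 2 for y in denoized)
                    * sum((r - mean_r) ** 2 for r in original))
    return num / den
```

A denoized signal that is simply the original scaled by 0.9 has a CC of 1 but an SNR of only 20 dB, which shows why the two measures are reported together.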
Due to space limitations, only the denoizing results for PD signals produced by a free
particle are shown in this section. Similar results are obtained for other types of PD.
Fig. 2.17 shows a typical noise-free PD signal (free particle) obtained with noise
control in a shielded laboratory. To verify the effectiveness of the proposed method,
signals of various SNR are generated by superimposing artificial white noise of
different levels on the noise-free signal. As the noise-free signal and noise content are
known in advance, SNR and CC can be calculated accurately. Apart from the
generated signals, results obtained from measurement without noise control are also
presented in Section 2.4.4.
Wavelet   SNR (dB)   Correlation Coefficient
db2       15.2       0.86
db4       15.7       0.86
db6       16.8       0.90
db8       17.5       0.92
db10      16.9       0.90
sym2      15.6       0.88
sym4      16.0       0.90
sym6      17.6       0.92
sym8      18.3       0.96
sym10     17.9       0.94
coif2     16.2       0.89
coif3     16.4       0.90
coif4     15.8       0.87
coif5     16.0       0.88
Figs. 2.18 & 2.19 show the impact of decomposition level on the denoizing
performance. Both SNR and CC after denoizing hardly increase when the
decomposition level gets beyond 5. Similar results are obtained for PD signals having
different SNRs.
The Var-WPT method is seen to increase the SNR values of all PD signals to a very
narrow range after denoizing. Similar observation is made on the CC values. These
results suggest that the performance of Var-WPT method is robust for PD signals of
different noise levels.
Fig. 2.20 A comparison of the denoizing performance for PD signal with SNR=10 dB.
(a) Noisy signal; (b) result of DWT-based method; (c) result of Ent-WPT method;
(d) result of Var-WPT method
Fig. 2.21 A comparison of the denoizing performance for PD signal with SNR=0 dB
(a) Noisy signal; (b) result of DWT-based method; (c) result of Ent-WPT method; (d)
result of Var-WPT method
Fig. 2.22 A comparison of the denoizing performance for PD signal with SNR= -10 dB
(a) Noisy signal; (b) result of DWT-based method; (c) result of Ent-WPT method; (d)
result of Var-WPT method
Table 2.2 Comparison of SNR and CC values of different methods
SNR of Noisy    Denoizing    SNR of Denoized      Correlation
PD Signals      Approach     PD Signals (dB)      Coefficient
SNR = 10 dB     DWT          15.2                 0.86
                Ent-WPT      15.6                 0.88
                Var-WPT      19.0                 0.98
SNR = 0 dB      DWT          9.8                  0.82
                Ent-WPT      12.5                 0.87
                Var-WPT      18.3                 0.96
SNR = -10 dB    DWT          2.0                  0.69
                Ent-WPT      8.8                  0.84
                Var-WPT      17.5                 0.93
Table 2.3 Impact of threshold calculation rule on SNR and Correlation Coefficient
Algorithm    SNR of noisy        SNR after          Correlation
             PD signal (dB)      denoizing (dB)     Coefficient
             -5                  8.5                0.84
                                 11.4               0.86
                                 15.0               0.90
             -5                  13.9               0.89
                                 13.3               0.89
                                 13.8               0.88
             -5                  4.2                0.78
                                 12.1               0.86
                                 19.7               0.97
             -5                  17.6               0.94
                                 18.3               0.96
                                 20.2               0.98
The performances of soft and hard thresholding are compared in Fig. 2.23. Fig. 2.23(a)
shows a noisy PD signal. Figs. 2.23(b) and (c) show the denoizing results obtained by applying
soft and hard thresholding respectively. The correlation coefficients resulting from soft
and hard thresholding are 0.86 and 0.93 respectively, which indicates that the latter
method is more effective than the former. The better performance of hard
thresholding is also confirmed by Figs. 2.23(b) and (c), where hard thresholding is
seen to cause less distortion than soft thresholding. Hence, hard thresholding is used
in all studies.
2.5
CONCLUDING REMARKS
Besides the best tree, the selection of other parameters associated with the denoizing
scheme is also studied and discussed. However, the parameters are considered
separately, which may result in poor overall performance. Thus, optimal selection of a
complete set of parameters is further investigated in Chapter 3.
CHAPTER 3
OPTIMAL SELECTION OF PARAMETERS FOR
WAVELET-PACKET-BASED DENOIZING
In this chapter, a method based on genetic algorithm (GA) is developed to address the
issue of optimal denoizing parameters selection. It begins with a summary of the
parameters to be optimized, followed by the construction of fitness function.
Subsequently, the GA optimization method is described with detailed discussion on its
control parameters. Lastly, numerical results are presented and compared with those
obtained in Chapter 2.
3.1
INTRODUCTION
3.2
Table 3.1 shows the parameters to be optimized. Four wavelet families, namely
Daubechies wavelets, Symmlet wavelets, Coiflet wavelets, and Biorthogonal wavelets,
are short-listed for selection due to their proven applicability [42, 45]. The total number of
candidate wavelets is thus sixty-four. The decomposition level is selected from 1
to 8.
Parameter                    Range of Parameter
Wavelet                      Subtotal: 64
Decomposition Level          1-8
Soft or Hard Thresholding    Soft thresholding, hard thresholding
Threshold Estimation Rule
Threshold Processing Rule    No processing, global processing, node dependent processing
3.3
To denoise PD signals effectively, the performance of the set of parameters used must
be evaluated by some common criteria. The objectives of denoizing are to effectively
suppress the noise and restore the original PD signal with little distortion. The
signal-to-noise ratio (SNR) and correlation coefficient (CC) as in equations (2.5) & (2.6) are
thus employed to evaluate the performance.
As illustrated in Fig. 3.1, SNR and CC are sometimes conflicting. Their combination is
therefore used in the GA fitness function for consistent evaluation of the overall
denoizing performance.
The original definition of SNR in equation (2.5) can take negative values because of
the logarithm, which makes it unsuitable for use in the GA fitness
function. Therefore, a modified version of SNR (m_SNR) is defined as
$$ m\_SNR = \frac{\mathrm{Energy}(R)}{\mathrm{Energy}(R-Y)} \tag{3.1} $$
where Y and R denote the denoized and original PD signals respectively. Obviously,
the value of m_SNR is always positive. Subsequently, the GA fitness function
corresponding to each signal in the training set is defined as a combination of
m_SNR and the original CC, which may take various forms such as:
$$ g = m\_SNR \times CC \tag{3.2} $$
or
$$ g = m\_SNR + CC \tag{3.3} $$
However, GA is not able to converge when the fitness function in equation (3.2) is used.
Therefore, only equation (3.3) is considered as the fitness function. Since m_SNR
usually takes a much larger value (about twenty times) than CC, the fitness values
calculated by the above formulas are governed by m_SNR. Therefore, only a high
signal-to-noise ratio is guaranteed by optimizing the fitness function in equation (3.2)
or (3.3), while the correlation coefficient is effectively neglected during GA optimization. As
a result, the obtained parameters may suppress the noise effectively, but large
distortion could be observed. To tackle this problem, the fitness function of equation
(3.3) is modified as:
$$ g = 0.05 \times m\_SNR + CC \tag{3.4} $$
where the coefficient of 0.05 is used to bring the two components of g into the same range.
Considering all N signals in the training set, the GA fitness function is finally:
$$ fitness = \frac{1}{N}\sum_{i=1}^{N} g(i) \tag{3.5} $$
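The fitness evaluation of equations (3.1), (3.4) and (3.5) can be sketched as follows, assuming that Energy(x) is the sum of squared samples and that each training signal is available as a pair (original, denoized). The function names and the pair layout are hypothetical.

```python
import math


def m_snr(original, denoized):
    """Modified SNR of equation (3.1): Energy(R) / Energy(R - Y), without the logarithm."""
    num = sum(r * r for r in original)
    den = sum((r - y) ** 2 for r, y in zip(original, denoized))
    return num / den


def cc(original, denoized):
    """Correlation coefficient of equation (2.6)."""
    mean_r = sum(original) / len(original)
    mean_y = sum(denoized) / len(denoized)
    num = sum((y - mean_y) * (r - mean_r) for y, r in zip(denoized, original))
    den = math.sqrt(sum((y - mean_y) ** 2 for y in denoized)
                    * sum((r - mean_r) ** 2 for r in original))
    return num / den


def fitness(pairs, weight=0.05):
    """Equations (3.4)-(3.5): mean over the training set of weight*m_SNR + CC."""
    return sum(weight * m_snr(r, y) + cc(r, y) for r, y in pairs) / len(pairs)
```

For a single pair in which the denoized signal equals the original scaled by 0.9, m_SNR is 100 and CC is 1, so the fitness is 0.05 x 100 + 1 = 6.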
3.4
PARAMETER OPTIMIZATION BY GA
3.4.2 GA Optimization
For GA optimization, the denoizing parameters shown in Table 3.1 must be
represented in binary form. Therefore, they are coded in a string of 14 binary bits as in
Fig. 3.2.
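One possible layout of the 14-bit string is sketched below. The field order and widths (6 bits for the 64 wavelets, 3 bits for levels 1 to 8, 1 bit for soft or hard thresholding, 2 bits each for the threshold estimation and processing rules) are assumptions chosen to add up to 14 bits; the actual coding is the one shown in Fig. 3.2.

```python
# Assumed field layout (name, number of bits); the widths sum to 14.
FIELDS = [("wavelet", 6), ("level", 3), ("soft_or_hard", 1),
          ("estimation_rule", 2), ("processing_rule", 2)]


def decode(chromosome):
    """Split a 14-character bit string into integer-coded denoizing parameters."""
    assert len(chromosome) == sum(width for _, width in FIELDS)
    values, pos = {}, 0
    for name, width in FIELDS:
        values[name] = int(chromosome[pos:pos + width], 2)
        pos += width
    values["level"] += 1  # three bits give 0..7, shifted to levels 1..8
    return values
```

With this assumed layout, the string "00000110010110" decodes to wavelet index 1, decomposition level 5, thresholding code 1, estimation rule 1 and processing rule 2.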
For the implementation of GA, the roulette wheel approach is adopted here in
reproduction. The single-point crossover is applied to randomly paired sub-strings with
a probability Pc. To ensure diversity during evolution, mutation is performed for each
bit in the population with a probability Pm.
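The three operators described above can be sketched as below for individuals stored as bit strings. This is a generic illustration, not the thesis implementation: the function names are hypothetical, and the roulette wheel assumes that all fitness values are positive.

```python
import random


def roulette_select(population, fitnesses, rng):
    """Pick one individual with probability proportional to its fitness."""
    pick = rng.uniform(0.0, sum(fitnesses))
    running = 0.0
    for individual, fit in zip(population, fitnesses):
        running += fit
        if running >= pick:
            return individual
    return population[-1]


def single_point_crossover(a, b, pc, rng):
    """With probability pc, swap the tails of a and b after a random cut point."""
    if rng.random() < pc:
        point = rng.randint(1, len(a) - 1)
        return a[:point] + b[point:], b[:point] + a[point:]
    return a, b


def mutate(bits, pm, rng):
    """Flip each bit independently with probability pm."""
    return "".join(("1" if bit == "0" else "0") if rng.random() < pm else bit
                   for bit in bits)


def next_generation(population, fitnesses, pc, pm, rng):
    """Reproduction, crossover and mutation producing a population of the same size."""
    offspring = []
    while len(offspring) < len(population):
        p1 = roulette_select(population, fitnesses, rng)
        p2 = roulette_select(population, fitnesses, rng)
        c1, c2 = single_point_crossover(p1, p2, pc, rng)
        offspring += [mutate(c1, pm, rng), mutate(c2, pm, rng)]
    return offspring[:len(population)]
```

A seeded generator such as random.Random(0) can be passed as rng to make a run repeatable.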
The GA flowchart for denoizing parameters optimization is shown in Fig. 3.3 and a
description of the major steps is as follows:
(1) Prepare the training set that is the same as that used in Chapter 2.
[Fig. 3.3: flowchart of the GA optimization of denoizing parameters. Recoverable blocks: Start; Input training set; Generate initial population; Select individual in order, set i=1; Reconstruction; Fitness of individual, fitness=Mean(g); i=i+1; Reproduction; New population; Output optimal solution; End]
A. Population size Np
The population size of GA defines the number of candidate solutions in each
generation. Choosing a suitable population size is a fundamental consideration for GA
application. If the size of population is too small, GA may converge prematurely due
to the insufficient information given on the searching space. On the other hand, a large
population requires more evaluations per generation, which may result in an
unacceptably slow rate of convergence. In this study, a relatively small population size
(Np=8) is employed first. Then, the population size is increased until a consistent
solution is found.
Fig. 3.4 shows the performance of GA using population size of 8, 16 and 40. It can be
seen that GA converges to a sub-optimal solution when a small population size (Np=8)
is employed. In the cases of Np=16 and Np=40, similar performance is achieved,
which is better than the case of Np=8.
Table 3.2 shows the computation time of GA with various Np. As observed, the
computation time increases with Np. Although more iterations are required for the
case of Np=16 than for that of Np=40, GA converges faster in the former case, as fewer
evaluations are performed at each iteration. In short, the population size of 16 gives
a good tradeoff between performance and computation time, and is thus chosen for
the optimization task in this study.
Np    Iterations    Computation time (s)
8     32            102
16    48            306
40    35            675
the individuals with good performance may be discarded and the improvement of
performance may not be achieved. On the contrary, if the crossover probability is too
low, the search may stagnate prematurely due to the low exploration rate. Thus, a
proper crossover probability must be selected experimentally.
Fig. 3.5 illustrates the effect of using different crossover probability in the GA
optimization. It can be seen that GA with Pc of 0.75 gives the best performance. In the
other two cases, where Pc takes 0.95 and 0.55 respectively, GA converges to much
lower fitness values. Thus, Pc is set to 0.75 for all the subsequent experiments.
Fig. 3.6 illustrates the performance of GA with various Pm. It is seen that a mutation
probability of 0.15 leads to the best performance. Neither a higher Pm (=0.3) nor a
lower Pm (=0.01) gives a satisfactory result. Therefore, Pm=0.15 is chosen for the
optimization.
Another issue related to GA optimization is the choice of criteria for stopping the GA program.
In this study, two criteria are adopted as follows:
(1) When the maximum number of generations (Ns) is reached, the GA program
stops. Ns is set to 1000 in this study.
(2) GA stops when the best fitness saturates over a number of generations.
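The two stopping criteria can be combined in a single loop, as sketched below. The callable `step`, which is assumed to advance GA by one generation and return the best fitness found so far, as well as the `patience` and `tol` values used to decide that the fitness has saturated, are hypothetical; only the limit of 1000 generations comes from the text.

```python
def run_ga(step, max_generations=1000, patience=20, tol=1e-6):
    """Run until max_generations is reached or the best fitness stops improving.

    Returns (generation at which GA stopped, best fitness at that generation).
    """
    best_history = []
    for generation in range(1, max_generations + 1):
        best_history.append(step(generation))
        # Criterion (2): no improvement larger than tol over the last `patience` generations.
        if len(best_history) > patience:
            if best_history[-1] - best_history[-1 - patience] <= tol:
                return generation, best_history[-1]
    # Criterion (1): maximum number of generations reached.
    return max_generations, best_history[-1]
```

For a best fitness that rises to 10 by generation 10 and then stays flat, this loop stops at generation 30 with the default patience of 20.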
3.5
PERFORMANCE TESTING
After parameter optimization using the training set, the performance of the parameters
is assessed on the test set. If the performance is better than or close to the average
performance on the training set, the obtained parameters are accepted. Otherwise,
possible reasons for the poor performance are as follows:
(1) The signals in the training set do not cover the variety of the PD waveforms.
Therefore, more PD signals that belong to the same class as the underperforming
signals have to be measured and used to extend the training set.
(2) GA could have converged sub-optimally due to badly chosen GA parameters.
Therefore, the GA parameters have to be adjusted.
After proper measures are taken, GA is executed with the updated parameters and
training set (Fig. 3.3).
3.6
In this section, results from GA are presented and compared with those obtained from
the method presented in Chapter 2. The same training and test sets as in Chapter 2 are
used here.
Fig. 3.7 shows the convergence of GA and the denoizing performance using
intermediate parameters obtained during convergence. GA takes 48 iterations and
about five minutes on a Pentium IV computer to converge. The denoizing performance
improves steadily during convergence.
Table 3.3 shows the parameters obtained at intermediate stages of convergence. Stage
(a) corresponds to the highest fitness value (convergence), whose parameters are
optimal for the given set of training data. Parameters obtained from Chapter 2 with the
same training set are shown in Table 3.4. It can be seen that the decomposition level
and thresholding method obtained by stage (a) and the method in Chapter 2 are the
same while other parameters are different. Stage (a) and the method in Chapter 2 both
recommend the same wavelet family (Symmlet), but different members of the family.
This indicates that the minimum-prominent-decomposition coefficients method as
adopted in Chapter 2 is effective although not optimal. In all study cases, the Symmlet
family fits the PD signals better than other wavelet families.
Table 3.3
Stage  Fitness  Wavelet  Thresholding  Threshold estimation rule        Threshold processing rule
(a)    3.8      sym6     hard          fixed form threshold             node dependent processing
(b)    2.7      coif2    soft          mixed estimation rule            node dependent processing
(c)    1.2      db10     hard          Stein's unbiased risk estimate   global processing

Table 3.4
Thresholding  Threshold estimation rule  Threshold processing rule
hard          mixed estimation rule      global processing
The GA-based method and the method in Chapter 2 are further compared in Fig. 3.8,
with Fig. 3.8 (a) showing the noisy PD signal. Figs. 3.8 (b) & (c) show the denoized
signals using parameters obtained by the method in Chapter 2 and by GA respectively.
As observed, the parameters obtained by GA suppress the noise and restore the original
PD signal more effectively. The SNR values corresponding to Figs. 3.8 (b) & (c) are
16.7 and 19.1, and the CC values are 0.93 and 0.97 respectively. These results confirm the
better performance of the parameters obtained by GA. Similar results are obtained
from other signals taken from the test and training sets.
3.7
CONCLUDING REMARKS
The performance of the denoizing scheme is largely dependent on how the scheme
parameters are determined. In this chapter, a GA-based method is developed to
optimize the parameters associated with the wavelet-packet-based denoizing scheme.
Numerical results indicate that the GA-based method ensures optimal denoizing in
terms of successful restoration of the original PD signal with significant reduction in
the noise level. The method enables automatic and fast determination of parameters.
Denoized signals can then be used to develop a reliable diagnosis system for
recognizing corona and SF6 PD resulting from various defects.
CHAPTER 4
PD FEATURE EXTRACTION BY INDEPENDENT
COMPONENT ANALYSIS
4.1
INTRODUCTION
For condition monitoring of GIS, it is crucial to recognize the source of the harmful
PD activities in SF6 and the harmless air corona in a fast and reliable manner. The
key task of such a PD diagnosis system is to extract the most effective and
reliable PD features from the measured raw data, so that satisfactory performance can
be achieved in the subsequent classification task. Fig. 4.1 illustrates various methods
for extracting PD features. As reviewed in Chapter 1, the traditional PRPD and POW
approaches have noticeable limitations in terms of speed and classification
performance. Therefore, methods using UHF signals measured within hundreds of
nanoseconds are developed for PD identification in this study. In this chapter,
time-domain techniques, namely independent component analysis (ICA) and principal
component analysis (PCA), are employed to perform the feature extraction. In Chapter
5, a wavelet-packet-based method is proposed for extracting the most discriminating
features from the time-frequency domain. Using the features extracted by the ICA- or
wavelet-packet-based method, a neural network is trained and tested in Chapter 6 for
classifying a new set of measured data. Data measured one metre away from the PD
source, as in Table A.1, are employed in Chapters 4, 5 and 6 for developing the PD
identification system. The robustness of the extracted PD features on data measured at
other PD-to-sensor distances is investigated in Chapter 7, where a re-selection and
retraining scheme is proposed.
The ICA-based PD feature extraction is illustrated in Fig. 4.2. In the current study, the
original waveforms of UHF signals are crucial for source recognition, as the feature
extraction and classification are based on the time-domain signals only. However, due
to excessive white noise, the original waveforms are often distorted or even buried
under the noise. In Chapters 2 and 3, the problem of white noise has been successfully
tackled by applying wavelet packet denoizing to each measured waveform, as
shown in Fig. 4.2, which makes the subsequent recognition of the PD source an easier
task.
Air corona is often regarded as another form of noise in PD monitoring systems of GIS.
Since a corona signal is very similar to an SF6 PD signal, it often leads to misclassification,
which may result in a wrong decision. Therefore, it is of great importance to correctly
classify PD and corona. To reduce the response time of the PD diagnosis system,
source recognition of SF6 PD and the discrimination of corona and SF6 PD are
considered together in this study, so that no second judgment is needed. In the
following text, PD identification refers to classification of all types of SF6 PD as
well as air corona, unless otherwise specified.
levels during measurement or setting of the oscilloscope. Since the statistical measures
used in this study, such as negentropy, kurtosis and skewness, are sensitive to time
translation, the values of these measures are different for the signals in Figs. 4.3 (a) and (b).
Such a difference may cause difficulties in extracting PD features and in the subsequent
classification task. Hence, a process known as pre-selection (Fig. 4.2) is employed to
cancel the time shift effect by capturing a segment with a predetermined length
starting from the initial surge of the signal. The process thus ensures that a given
signal pattern yields the same set of features regardless of its time shift. Details
of the pre-selection process are given in Section 4.2.
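The idea of pre-selection can be sketched as below. Detecting the initial surge as the first sample whose magnitude exceeds a fixed threshold is an assumption made for this illustration, as are the function name and the zero padding of short segments; the actual detection procedure is the one described in Section 4.2.

```python
def pre_select(signal, length, threshold):
    """Return `length` samples starting at the first sample whose magnitude exceeds threshold.

    Returns None if no sample exceeds the threshold; pads with zeros if the signal ends early.
    """
    for start, value in enumerate(signal):
        if abs(value) > threshold:
            segment = list(signal[start:start + length])
            return segment + [0.0] * (length - len(segment))
    return None


pulse = [1.0, -0.8, 0.5]
early = [0.0] * 3 + pulse + [0.0] * 10   # same pulse, small time shift
late = [0.0] * 7 + pulse + [0.0] * 6     # same pulse, larger time shift
```

Both `early` and `late` give the same pre-selected segment, so any statistic computed afterwards no longer depends on the time shift.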
After denoizing and pre-selection, the PD identification task is performed in two steps,
namely feature extraction and classification. Each set of pre-selected signals has
4.2
PRE-SELECTION
Fig. 4.4 Detecting the starting point of PD event (a) measured signal; (b) denoized
signal
Fig. 4.5 Pre-selection of UHF signal (a) before pre-selection; (b) after pre-selection
4.3
ICA can be considered as a generalization of PCA. Both ICA and PCA linearly
transform the measured signals into independent or principal components, which are
ranked in descending order according to the variance of their corresponding
projections. The key difference between ICA and PCA, however, lies in the nature of
the components obtained. The goal of PCA is to obtain principal components that are
uncorrelated, whereas the components obtained from ICA are statistically independent,
which is a stronger condition than uncorrelatedness. Separability of features in the
measured data is affected by factors such
as the frequency response of the sensor, the PD source and the path of propagation, which are
statistically independent. A comparison of the numerical results from ICA and PCA
is given in Section 4.5, which clearly favors the former.
Y = WX
(4.1)
In (4.1), both the independent components Y and the matrix W are unknown. Therefore,
the independent components must be found iteratively by maximizing the
independence with respect to W. In this study, an algorithm known as FastICA is
adopted for implementing the ICA [67]. According to the Central Limit Theorem, the
independence of components can be measured from the statistical property known as
nongaussianity. In FastICA, a criterion known as negentropy is employed as a
quantitative measure of nongaussianity. Maximizing the negentropy with respect to W
results in the independent components.
demonstrates the effectiveness of FastICA and the negentropy criterion. Fig. 4.7 shows
the two basic signals that are generated independently. The basic signals are then
linearly combined to simulate the measured signals (X) as illustrated in Fig. 4.8. Using
X as the input of FastICA, the independent components are estimated one by one. As
shown in Figs. 4.9-4.10, the independent components are found in four and three
iterations respectively by maximizing the negentropy (J). As observed, the estimated
components are almost the same as the original ones. Thus, the effectiveness of
FastICA for finding independent components is verified. Key features of ICA and its
implementation - FastICA are reviewed in Appendix D.
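The experiment described above can be reproduced in outline with the following minimal sketch. It is not the implementation used in this thesis: it assumes whitened inputs, a log cosh contrast function, one-by-one (deflation) estimation, and a stop test on the change of the weight vector. The function names `whiten` and `fastica_deflation` are hypothetical.

```python
import numpy as np

def whiten(X):
    """Center the rows of X (signals x samples) and decorrelate them to unit variance."""
    Xc = X - X.mean(axis=1, keepdims=True)
    cov = Xc @ Xc.T / Xc.shape[1]
    d, E = np.linalg.eigh(cov)
    return (E @ np.diag(d ** -0.5) @ E.T) @ Xc

def fastica_deflation(X, n_components, max_iter=1000, tol=1e-8, seed=0):
    """Estimate independent components one by one with the fixed-point FastICA update."""
    rng = np.random.default_rng(seed)
    Z = whiten(X)
    W = np.zeros((n_components, Z.shape[0]))
    for p in range(n_components):
        w = rng.standard_normal(Z.shape[0])
        w /= np.linalg.norm(w)
        for _ in range(max_iter):
            u = w @ Z
            g = np.tanh(u)                  # derivative of G(u) = log cosh(u)
            g_prime = 1.0 - g ** 2
            w_new = (Z * g).mean(axis=1) - g_prime.mean() * w
            # Deflation: remove the directions of the components already found.
            w_new -= W[:p].T @ (W[:p] @ w_new)
            w_new /= np.linalg.norm(w_new)
            converged = abs(abs(w_new @ w) - 1.0) < tol
            w = w_new
            if converged:
                break
        W[p] = w
    return W @ Z
```

Mixing two independently generated signals with any invertible 2 x 2 matrix and passing the result to `fastica_deflation(X, 2)` should return the original signals up to order, sign and scale, which is the ambiguity inherent to ICA.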
Fig. 4.9 Process of finding the first independent component (a) 1st iteration (J=
4.2797); (b) 2nd iteration (J= 5.7788); (c) 3rd iteration (J= 8.0597); (d) 4th iteration
(J= 11.1297).
Fig. 4.10 Process of finding the second independent component (a) 1st iteration (J=
4.6197); (b) 2nd iteration (J= 7.4563); (c) 3rd iteration (J= 10.9805).
4.4
The process is carried out with the aim of reducing the length of the working data for
subsequent PD identification to be automated by a neural network (Chapter 6).
Fig. 4.11 Chosen signal sets for calculating independent components (1)-(2) corona;
(3)-(4) particle on the surface of spacer; (5)-(6) particle on conductor; (7)-(8) free
particle on enclosure.
Each chosen set of signals $x_i$, i = 1, 2, ..., 8, is thus a linear combination of the
independent components:
$$ x_i = \sum_{j=1}^{8} a_{i,j}\, ICAPD_j, \quad i = 1, 2, \ldots, 8 \tag{4.2} $$
where
$ICAPD_j$ = the jth independent component obtained by FastICA, which has a size of
1*1000. j runs from 1 to 8.
$a_{i,j}$ = the projection of the ith signal set ($x_i$) on the direction of the jth component.
Thus
ai , j
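Subsequently, the variance of the projections onto the pth independent component is
defined as
$$ Var_p = \frac{1}{7} \sum_{i=1}^{8} \left( a_{i,p} - \mu_p \right)^2 \tag{4.3} $$
where
$\mu_p$ = the mean of the projections $a_{i,p}$, i = 1, 2, ..., 8, onto the pth component.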
In Fig. 4.12, all ICAPDj are ranked in descending order according to the variance of
their corresponding projections as shown in Table 4.1.
Independent component    Variance of the projections
ICAPD1                   0.2028
ICAPD2                   0.1885
ICAPD3                   0.0329
ICAPD4                   0.0228
ICAPD5                   0.0215
ICAPD6                   0.0188
ICAPD7                   0.0164
ICAPD8                   0.0067
Following the same idea used in the PCA-based method, any ICAPD whose projections
have a small variance (<0.05 in this thesis) is discarded because it carries
negligible discriminating information. As a result, only the first two independent
components in Fig. 4.12 are retained to represent the set of 8 chosen signals.
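The ranking and discarding step can be sketched as follows. This is an illustrative sketch only: `rank_components` is a hypothetical helper, and the projection matrix is assumed to hold one row per chosen signal set and one column per independent component.

```python
import numpy as np

def rank_components(projections, threshold=0.05):
    """Rank components by the variance of their projections and keep the dominant ones.

    projections[i, j] is the projection of signal set i onto component j.
    """
    # ddof=1 gives the 1/(n-1) normalisation, i.e. 1/7 for the 8 chosen signal sets.
    variances = projections.var(axis=0, ddof=1)
    order = np.argsort(variances)[::-1]          # descending variance
    kept = [int(j) for j in order if variances[j] >= threshold]
    return variances, kept
```

With the variances listed in Table 4.1 and the 0.05 threshold, such a rule retains only the first two components.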
(4.4)
where
ICAPD
The size of the extracted feature set ICA_Feature is thus 80*2, which is much smaller
than the 80*1000 size of the pre-selected signal sets.
If the number of inputs is too small, there will not be enough information about the PD
signals for FastICA to compute the independent components correctly. On the other hand,
if there are too many inputs, the algorithm will take longer to converge. In
addition, since only the most dominating components are useful for the subsequent
feature construction task, it is not necessary to compute too many independent
components, as most of them result in projections with small variances.
Since there are four classes of signals under investigation, the number of inputs should
be at least four to cover the varieties of the measured signals. Based on the waveforms
of the typical signals, the number of inputs is set to eight (two from each class) to make
a good tradeoff between the accuracy of the resulting components and the convergence speed.
B. Approximation of Negentropy
As introduced in Section 4.3.2, negentropy is employed in FastICA as a measure of
nongaussianity to maximize the independence between components. However, it is
computationally very difficult to calculate negentropy directly, as an estimate of the
probability density function is required [59]. Therefore, it is highly desirable to use
simpler approximations of negentropy.
(4.5)
where
E = expectation operator.
$$ G_1(u) = \exp(-u^2/2), \quad G_2(u) = \log(\cosh(u)), \quad G_3(u) = \frac{1}{4}u^4, \quad G_4(u) = \frac{1}{3}u^3 \tag{4.6} $$
where u is the component vector under investigation. These functions are conceptually
simple, robust and fast to compute. Thus, their performances on PD signals are studied
and compared in this thesis.
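The four candidate functions can be compared numerically with a short sketch such as the one below. It assumes the commonly used approximation form J(u) ≈ (E{G(u)} - E{G(v)})^2, where v is a standard Gaussian variable estimated here by sampling, and it assumes a negative exponent in G1. The names `CONTRASTS` and `approx_negentropy` are hypothetical.

```python
import numpy as np

# Candidate non-quadratic functions G1 to G4 (G1 assumed to be exp(-u^2 / 2)).
CONTRASTS = {
    "G1": lambda u: np.exp(-u ** 2 / 2),
    "G2": lambda u: np.log(np.cosh(u)),
    "G3": lambda u: u ** 4 / 4,
    "G4": lambda u: u ** 3 / 3,
}

def approx_negentropy(u, G, n_ref=200_000, seed=0):
    """Approximate negentropy of u as (E[G(u)] - E[G(v)])^2 with v ~ N(0, 1)."""
    u = np.asarray(u, dtype=float)
    u = (u - u.mean()) / u.std()                 # zero mean, unit variance
    v = np.random.default_rng(seed).standard_normal(n_ref)
    return float((G(u).mean() - G(v).mean()) ** 2)
```

A Gaussian input gives a value close to zero for every G, while a strongly non-Gaussian input gives a clearly larger value. Note that G4 responds to asymmetry only, so it stays near zero for symmetric distributions.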
as the evaluation criterion. The larger the value, the better the performance of the
corresponding approximated negentropy in terms of discriminative power. The following
procedure is then used to compare the approximated negentropies obtained with the
different functions G.
(1) Use the chosen set of signals as input of FastICA as in Section 4.4.1. Set i=1.
(2) Set $G_i$ as the function used to calculate the approximated negentropy in the
FastICA algorithm.
(3) Run FastICA to find all the independent components.
(4) Compute the variances ($Var_1^i$, $Var_2^i$) of the projections onto the first two
independent components using equation 4.3.
(5) Compute the sum $Var_1^i + Var_2^i$.
(6) If i < 4, set i = i + 1 and go to (2).
(7) Find the best G, namely the one that results in the largest sum:
$G_{opt} = \arg\max_{G_i} \left( Var_1^i + Var_2^i \right)$.
Function    Var1      Var2      Var1 + Var2
G1          0.2082    0.1885    0.3913
G2          0.2092    0.1387    0.3479
G3          0.2016    0.0914    0.293
G4          0.2022    0.1142    0.3164
C. Stop Criteria
Since FastICA is an iterative algorithm, some criteria must be applied to stop the
program. In this thesis, two criteria are adopted as follows:
(1) The algorithm stops when the maximum number of iterations is reached. It is
set to 1000 in this study.
(2) FastICA stops when the change of components saturates over a number of
iterations.
The FastICA program stops when either of the above criteria is met.
4.5
Fig. 4.13 ICA features corresponding to (a) ICAPD1 and (b) ICAPD6
(1) Use PCA to find the most dominating principal components, which result in the
largest variances in the corresponding projections.
(2) Project 80 pre-selected signals onto the two most dominating principal
components, which is similar to the process described in Section 4.4.2.
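The two PCA steps above can be sketched as follows. This is a minimal illustration rather than the procedure actually coded for this thesis: it assumes that the principal components are computed from a small set of chosen signals (one per row) by singular value decomposition, and `pca_components` and `project` are hypothetical names.

```python
import numpy as np

def pca_components(signals, n_components=2):
    """Return the principal directions (rows) with the largest projection variances."""
    centered = signals - signals.mean(axis=0)
    # The rows of vt are orthonormal directions sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return vt[:n_components]

def project(signals, components):
    """Project each signal (row) onto the given components."""
    return signals @ components.T
```

Projecting 80 pre-selected signals of length 1000 onto two components in this way gives an 80*2 feature set, the same size as ICA_Feature.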
Figs. 4.14 (a) and (b) show the two most dominating independent components, while
the most dominating principal components are illustrated in Figs. 4.14 (c) and (d). It is
seen that the components obtained by ICA and PCA are quite different. This indicates
that although there are some seeming similarities between PCA and ICA, they are
essentially different statistical methods.
Fig. 4.14 Most dominating (a)-(b) independent components and (c)-(d) principal
components
The performances of PCA- and ICA-based methods are first compared in Table 4.3.
As observed, both of the variances obtained from independent components take much
larger values than those obtained from principal components. This suggests that the
features extracted by ICA-based method should lead to better classification due to
more discriminative power introduced by independency of the features.
Table 4.3 Variances of projections onto the most dominating independent and principal
components
                          Var1      Var2
Independent components    0.2082    0.1885
Principal components      0.1698    0.0893
Fig. 4.15 further compares the performance of ICA- and PCA-based feature extraction.
Features obtained from ICA are seen to cluster distinctly according to the four sources,
although the clusters corresponding to spacer and enclosure are close to each other
due to the similarity of the two types of PD as shown in Figs. A.3 (b) and (d). Features
of spacer, conductor and enclosure resulting from PCA are seen to overlap with
each other. This indicates that the ICA-based feature extraction outperforms the
PCA-based method due to the superior statistical properties of the independent components.
Fig. 4.15 Feature clusters formed by (a) ICA features (b) PCA features
Table 4.4 shows the average convergence time of FastICA when signals of different
SNR levels are used as its input. It can be seen that the convergence time gets longer as
the noise level gets higher, because more computation is required in the process of
maximizing negentropy. In the worst case, where the SNR of the input signals is -5, the
algorithm is not able to converge within the pre-determined maximum number of
iterations.
SNR of input signals        SNR=17 (after denoizing)    SNR=0    SNR= -5
Average convergence time    1.911                       3.207    183 *
*: In this case, FastICA is not able to converge in 1000 iterations (Section 4.4.3 C).
Convergence is observed at 9800 iterations.
Fig. 4.16 illustrates the feature clusters obtained from ICA-based method with input
signals of different noise levels. As shown in Fig. 4.16 (a) where the SNR of input
signals is 0, features of spacer and enclosure are seen to overlap with each other,
although features of corona and conductor are still well separated. The worst case
(SNR= -5) is shown in Fig. 4.16 (b), where the features are all mixed up. It is
impossible to discriminate the PD sources correctly using these features. Thus, it is
imperative to remove the white noise before the features are extracted.
Fig. 4.16 Feature clusters formed by ICA-based method. Noise level of input signals is
(a) SNR=0; (b) SNR= -5.
4.6 CONCLUDING REMARKS
CHAPTER 5
PD FEATURE EXTRACTION BY WAVELET PACKET
TRANSFORM
5.1 INTRODUCTION
5.2
80 WPD trees are formed by performing the decomposition. Based on the effectiveness of
the obtained features, the wavelet packet decomposition is carried out to a
decomposition level of 5 (Fig. 5.2) using db9 wavelet packets. The selection of the
decomposition level and wavelet filters is discussed in Section 5.3.
[WPD tree diagram: root node f(t); nodes (j,n) for levels j = 1 to 5, with n = 0 to 2^j - 1 at each level]
Fig. 5.2 WPD tree of level 5 (Copy of Fig. 3.8 for reference)
Each node in the WPD tree represents a set of decomposition coefficients that
corresponds to a certain frequency band, as shown in Fig. 5.3. The topmost node
contains the pre-selected signal, which has a sampling frequency of 4 GHz. According
to the Nyquist theorem, the highest frequency contained in the nodes is 2 GHz, namely
half of the sampling frequency $f_0$. Therefore, one level of decomposition
results in two nodes that have spectra of 0-1 GHz (0 to $f_0/4$) and 1-2 GHz
($f_0/4$ to $f_0/2$) respectively. As illustrated in Fig. 5.3, the frequency span of
each parent node is the union of those of its child nodes.
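The band covered by a node follows directly from this halving rule. The sketch below assumes that the nodes of each level are indexed in order of increasing frequency, which is consistent with the frequency ranges quoted for the selected nodes later in this chapter; `node_band` and `count_nodes` are hypothetical helper names.

```python
def node_band(level, index, fs=4e9):
    """Return (f_low, f_high) in Hz for node (level, index) of the WPD tree.

    Assumes frequency-ordered nodes and a sampling frequency fs.
    """
    width = fs / 2 ** (level + 1)      # each level halves the band width
    return index * width, (index + 1) * width

def count_nodes(depth):
    """Number of nodes below the root in a full WPD tree of the given depth."""
    return sum(2 ** j for j in range(1, depth + 1))
```

For example, `node_band(1, 0)` and `node_band(1, 1)` give the 0-1 GHz and 1-2 GHz bands of the first decomposition level.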
A. Node kurtosis
Kurtosis is a statistical parameter describing the shape of a data distribution. It is a
measure indicating whether a data distribution is more or less peaky than the normal
distribution. As shown in Fig. 5.4, data with high kurtosis tend to have a distinct peak
near the mean, decline rather rapidly, and have heavy tails. Data with low kurtosis tend
to have a flat top near the mean rather than a sharp peak.
Node kurtosis is defined as the kurtosis of the decomposition coefficients of each node
(j,n) in the WPD tree as in equation 5.1.
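$$ K_{j,n} = \frac{\sum_{k=1}^{N_{j,n}} \left( x_{j,k,n} - \mu_{j,n} \right)^4}{(N_{j,n} - 1)\, \sigma_{j,n}^4} - 3 \tag{5.1} $$
where
$K_{j,n}$ = node kurtosis of node (j,n).
$x_{j,k,n}$ = the kth decomposition coefficient of node (j,n).
$N_{j,n}$ = number of decomposition coefficients of node (j,n).
$\mu_{j,n}$ = mean of the decomposition coefficients of node (j,n).
$\sigma_{j,n}$ = standard deviation of the decomposition coefficients of node (j,n).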
Since normal distribution has a kurtosis value of three, the minus three in the above
equation means normalization according to normal distribution.
B. Node skewness
$$ S_{j,n} = \frac{\sum_{k=1}^{N_{j,n}} \left( x_{j,k,n} - \mu_{j,n} \right)^3}{(N_{j,n} - 1)\, \sigma_{j,n}^3} \tag{5.2} $$
where
$S_{j,n}$ = node skewness of node (j,n).
The other variables in the above equation have the same meaning as in equation 5.1.
Comparing equation 5.1 with equation 5.2, it is seen that the two formulas have a
similar mathematical structure. The only difference is the order of the formula:
kurtosis is of order 4 and skewness is of order 3. However, they describe completely
different statistical properties.
Taking advantage of the time information provided by wavelet packet transform, node
kurtosis and node skewness describe the distribution shape of the decomposition
coefficients locally in a specified frequency band at each node. They enable detailed
time-frequency analysis of the UHF signals. Thus, they are considered as important
local features for PD identification.
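Both shape features can be computed from the coefficient vector of a node in a few lines. The sketch below assumes the (N - 1) normalisation together with the sample standard deviation; `node_kurtosis` and `node_skewness` are hypothetical function names.

```python
import numpy as np

def node_kurtosis(coeffs):
    """Excess kurtosis of the decomposition coefficients of one node."""
    x = np.asarray(coeffs, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    return float(np.sum((x - mu) ** 4) / ((x.size - 1) * sigma ** 4) - 3.0)

def node_skewness(coeffs):
    """Skewness of the decomposition coefficients of one node."""
    x = np.asarray(coeffs, dtype=float)
    mu, sigma = x.mean(), x.std(ddof=1)
    return float(np.sum((x - mu) ** 3) / ((x.size - 1) * sigma ** 3))
```

For normally distributed coefficients both values are close to zero, whereas a sharply peaked, heavy-tailed distribution gives a clearly positive kurtosis.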
C. Node energy
The wavelet packet power spectrum provides us with information about the local
spectral content of the signal. The local wavelet packet power spectrum corresponding
to each node (j,n) is defined as
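$$ P_{j,n} = \frac{1}{N} \left\| \mathbf{x}_{j,n} \right\|^2 \tag{5.3} $$
where
$\mathbf{x}_{j,n}$ = the vector of decomposition coefficients of node (j,n), of length N.
To reduce the computational complexity, the normalization factor 1/N in (5.3) is omitted
in our analysis. The modified wavelet spectrum is named node energy [68], and
is denoted as
$$ E_{j,n} = \left\| \mathbf{x}_{j,n} \right\|^2 \tag{5.4} $$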
the value halfway between the two middle data points. Mean is computed by adding all
the numbers in the set and dividing the sum by the number of elements added. For a
given set of data, these measures may be very close or may be quite different,
depending on how the data are distributed.
Node median and node mean are defined in the same way as the previous node features.
They are computed by taking the median and mean of the decomposition coefficients
of each node as in equations 5.5 and 5.6 respectively.
$$ Med_{j,n} = \begin{cases} y_{(N+1)/2} & \text{if } N \text{ is odd} \\ \frac{1}{2}\left( y_{N/2} + y_{N/2+1} \right) & \text{if } N \text{ is even} \end{cases} \tag{5.5} $$
where
y = sorted coefficients vector of node (j,n).
N = length of the coefficients vector of node (j,n).
$$ M_{j,n} = \frac{1}{N} \sum_{k=1}^{N} x_{j,k,n} \tag{5.6} $$
where
$x_{j,k,n}$ = the kth decomposition coefficient of node (j,n).
Node kurtosis, node skewness, node energy, node median and node mean are
computed for each node in a WPD tree. As illustrated in Fig. 5.6, these calculated
features form five feature trees, namely the kurtosis tree, skewness tree, energy tree,
median tree and mean tree, in association with each WPD tree. For example, each node
of the energy tree contains the energy value of the coefficients in the corresponding
node of WPD tree. Since each feature tree contains 62 nodes, the total number of node
features for a PD signal is 310 (=62*5), which is much smaller than the number of
WPD coefficients (=5000).
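The remaining three node features and the feature count quoted above can be checked with the following sketch. The helper names are hypothetical, and the coefficient vector of each node is assumed to be available as a one-dimensional array.

```python
import numpy as np

def node_energy(coeffs):
    """Sum of squared coefficients (power spectrum without the 1/N factor)."""
    return float(np.sum(np.square(coeffs)))

def node_median(coeffs):
    """Median of the coefficients (average of the two middle values if N is even)."""
    return float(np.median(coeffs))

def node_mean(coeffs):
    """Arithmetic mean of the coefficients."""
    return float(np.mean(coeffs))

def feature_count(depth=5, n_types=5):
    """Number of nodes per feature tree and total number of node features."""
    n_nodes = sum(2 ** j for j in range(1, depth + 1))
    return n_nodes, n_nodes * n_types
```

Calling `feature_count()` with the default depth of 5 and five feature types returns 62 nodes and 310 node features.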
The within-class scatter value (Sw) measures the scatter of feature vectors of different
classes around their respective mean values. The between-class scatter value (Sb) is
defined as the scatter of the conditional mean values around the overall mean value. In
this thesis, the Sw and Sb of a node feature of type t for an L-class problem are defined
as follows:
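$$ S_w(j,n)_t = \sum_{c=1}^{L} \frac{N_c}{N}\, \sigma_c^2(j,n)_t \tag{5.7} $$
$$ S_b(j,n)_t = \sum_{c=1}^{L} \frac{N_c}{N} \left( \mu_c(j,n)_t - \mu(j,n)_t \right)^2 \tag{5.8} $$
where
t = the type of feature such as energy, kurtosis, and so on.
$\sigma_c^2(j,n)_t$ = the variance of features of type t at node (j,n) across the signals
belonging to class c.
$\mu_c(j,n)_t$ = the mean of features of type t at node (j,n) across the signals
belonging to class c.
$\mu(j,n)_t$ = the mean of features of type t at node (j,n) across all the signals.
$N_c$ = the number of signals belonging to class c.
N = the total number of signals.
The J criterion is then defined as
$$ J(j,n)_t = \frac{S_b(j,n)_t}{S_w(j,n)_t} \tag{5.9} $$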
The between-class scatter value indicates how far the features of different classes are
separated. On the other hand, the within-class scatter value shows the compactness of
the feature cluster corresponding to each class. In order to have a good separability for
classification, large between-class scatter and small within-class scatter are desired.
Therefore, a large J ( j, n)t value indicates that features of type t at node (j,n) form a
good feature set.
To illustrate and verify the effectiveness of the J criterion, equations 5.7 and 5.8 are
simplified by considering the 2-class case as follows:
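$$ S_w = C_1 \sigma_1^2 + C_2 \sigma_2^2, \quad \text{where } C_1 = \frac{N_1}{N} \text{ and } C_2 = \frac{N_2}{N} \tag{5.10} $$
$$ S_b = C_3 \left( \mu_1 - \mu_2 \right)^2, \quad \text{where } C_3 = \frac{N_1 N_2}{N^2} \tag{5.11} $$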
It is seen from equations (5.10) and (5.11) that Sw and Sb are proportional to the sum
of the variances and to the squared distance between the means respectively. Therefore,
the smaller the variances and the larger the distance between the means, the better the
class separability of the features.
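For completeness, the two-class form of the between-class scatter can be obtained from the general definition in equation 5.8 in a few steps. The derivation below is a sketch that uses only $N = N_1 + N_2$ and the fact that the overall mean is the weighted average of the two class means; the shorthand $C_1 = N_1/N$ and $C_2 = N_2/N$ is the same as in equation 5.10.

```latex
\begin{aligned}
\mu &= C_1 \mu_1 + C_2 \mu_2, \qquad C_1 + C_2 = 1 \\
\mu_1 - \mu &= C_2 (\mu_1 - \mu_2), \qquad \mu_2 - \mu = -C_1 (\mu_1 - \mu_2) \\
S_b &= C_1 (\mu_1 - \mu)^2 + C_2 (\mu_2 - \mu)^2 \\
    &= \left( C_1 C_2^2 + C_2 C_1^2 \right) (\mu_1 - \mu_2)^2 \\
    &= C_1 C_2 (\mu_1 - \mu_2)^2 = \frac{N_1 N_2}{N^2} (\mu_1 - \mu_2)^2
\end{aligned}
```

The within-class scatter needs no derivation, since equation 5.7 with L = 2 is already the weighted sum of the two class variances.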
The effectiveness of the J criterion is illustrated in Fig. 5.7. Fig. 5.7 (a) shows the case
where the feature clusters have means that are far from each other, but are still not
well separated due to their large variances. On the other hand, the means of the feature
clusters in Fig. 5.7 (b) are too close to give good separability, although the clusters
are compact. Fig. 5.7 (c) is the worst case, where the mean values are close and the
variances are large. As observed, the feature clusters almost completely overlap. An
example of good separability is shown in Fig. 5.7 (d), where compact feature clusters
are far apart. Therefore, it can be concluded that a
small Sw and a large Sb lead to good features for classification. Thus the use of the J
criterion is justified.
To select the best features, J values of all the 310 (62*5) nodes in the feature trees are
calculated using the J criterion. Features with the largest J values are selected to be the
input of the neural network (Chapter 6).
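The selection step can be sketched as follows. This is an illustrative sketch under stated assumptions: the node features are arranged as a matrix with one row per signal and one column per node feature, the class variances use the 1/N_c normalisation, and `j_criterion` and `select_best` are hypothetical names.

```python
import numpy as np

def j_criterion(features, labels):
    """Return J = Sb / Sw for every feature column."""
    features = np.asarray(features, dtype=float)
    labels = np.asarray(labels)
    n = features.shape[0]
    overall_mean = features.mean(axis=0)
    s_w = np.zeros(features.shape[1])
    s_b = np.zeros(features.shape[1])
    for c in np.unique(labels):
        group = features[labels == c]
        weight = group.shape[0] / n                 # N_c / N
        s_w += weight * group.var(axis=0)           # within-class scatter
        s_b += weight * (group.mean(axis=0) - overall_mean) ** 2
    return s_b / s_w

def select_best(features, labels, k):
    """Indices of the k features with the largest J values, and all J values."""
    j = j_criterion(features, labels)
    return np.argsort(j)[::-1][:k], j
```

Applied to an 80*310 matrix of node features with four class labels, `select_best` would return the column indices of the most separable node features.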
5.3
Associated with the wavelet packet decomposition, there are two parameters to be
determined, namely decomposition level and wavelet filters. These parameters have
significant impact on the feature calculation and selection. Thus, the selection of these
parameters is investigated in this section.
On the other hand, as the decomposition level gets higher, the algorithm slows down
dramatically. Therefore, it is crucial to select a decomposition level that makes
a good tradeoff between the number of candidate features and the speed. Table 5.1 shows
the effect of choosing different decomposition levels. It can be seen that a
decomposition level of 5 achieves a sufficient number of features as well as acceptable
speed. Therefore, a decomposition level of 5 is used for feature extraction.
Decomposition level    Number of features    Time (min)
1                      10                    0.5
2                      30                    2.6
3                      70                    7.0
4                      150                   10.5
5                      310                   25.5
6                      630                   123.6
7                      1270                  425.0
8                      2550                  935.5
For classification, the wavelet which leads to maximal separation of classes in the
feature space is the best choice. Therefore, the J criterion defined in Section 5.2.3 is
used to select the best wavelet. The procedure leading to the determination of best
wavelet is as follows:
(1) Select a wavelet that has not yet been examined from the set of candidate wavelets.
Set the decomposition level to 5.
(2) Perform wavelet packet decomposition on all 80 signals as in Section 5.2.1.
(3) Construct the feature trees according to Section 5.2.2.
(4) Calculate the J values for all the nodes in the five types of feature trees according
to Section 5.2.3.
(5) Sum the five largest J values and denote the result as Jsum.
(6) If all the candidate wavelets have been examined, go to (7). Otherwise, go to
(1).
(7) Compare Jsum and the largest J values corresponding to the different wavelets and
choose the wavelet with the largest Jsum value.
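The loop formed by steps (1) to (7) can be sketched as follows. The sketch is illustrative only: `select_wavelet` is a hypothetical name, and `j_values_for` stands for whatever routine performs steps (2) to (4) for one wavelet and returns the J values of all node features.

```python
def select_wavelet(candidates, j_values_for, top_k=5):
    """Return the candidate wavelet with the largest Jsum, and that Jsum.

    j_values_for(name) is assumed to return the J values of all node
    features obtained with the wavelet called name.
    """
    best_name, best_jsum = None, float("-inf")
    for name in candidates:
        # Jsum is the sum of the top_k largest J values for this wavelet.
        j_sum = sum(sorted(j_values_for(name), reverse=True)[:top_k])
        if j_sum > best_jsum:
            best_name, best_jsum = name, j_sum
    return best_name, best_jsum
```

Keeping the decomposition and feature calculation behind a single callable makes it straightforward to repeat the comparison with a different candidate set or a different number of summed J values.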
Using the above procedure, the largest J values and Jsum corresponding to the candidate
wavelets are computed and shown in Table 5.2. It can be seen that the use of wavelet db9
results in the largest Jsum, which in turn leads to the most discriminating features. The
best wavelet for denoizing, namely the sym6 wavelet, is seen to have an inferior
performance in terms of discrimination ability. Thus, db9 is employed in the feature
extraction process.
Wavelet    Largest J    2nd largest    3rd largest    4th largest    5th largest    Jsum
db1        11.1472      8.9926         8.9214         5.1801         5.0376         39.2790
db2        8.7717       6.4077         5.1951         4.6426         4.5084         29.5255
db3        8.6536       8.5300         7.4543         7.2913         6.8793         38.8084
db4        11.6558      10.6842        9.0882         8.7225         7.7441         47.8948
db5        10.0822      8.8847         8.3091         7.1419         5.1006         39.5185
db6        8.9007       7.6649         6.6189         5.6495         5.5355         34.3695
db7        9.2279       9.0119         7.5341         6.6330         6.3431         38.7501
db8        9.0650       8.4025         7.7490         7.2971         7.2311         39.7446
db9        12.1435      11.8492        8.6009         8.5927         8.0492         49.2355
db10       11.0247      8.2442         7.4261         6.5047         6.0645         39.2642
sym4       9.1620       8.9113         8.7878         7.9445         6.5349         41.3406
sym5       9.3107       8.4964         8.0997         7.4842         7.3544         40.7455
sym6       9.0327       8.9521         8.1455         7.2787         6.3020         39.7110
sym7       8.8172       8.4305         6.9262         5.8794         5.5216         35.5750
sym8       9.0529       8.3910         8.3408         8.2732         8.2410         42.2990
sym9       9.4052       8.1913         6.2384         5.5362         5.1012         34.4723
sym10      9.1960       7.1077         6.1544         6.0680         5.6584         34.1846
coif1      11.3313      9.2107         8.0681         7.6744         7.2843         43.5689
coif2      11.0300      10.1664        9.5728         8.7226         7.9131         47.4049
coif3      9.0115       8.8234         7.6481         7.1127         7.0737         39.6694
coif4      8.8934       8.4061         8.0776         7.2761         6.6866         39.3399
coif5      8.9401       6.8749         6.5338         6.1461         5.2337         33.7286
5.4
The frequency ranges of the selected features show that both high-frequency and
low-frequency decomposition coefficients contain discriminating information. In particular,
the selection of features defined on nodes at the right-hand side of the WPD tree, such as
(5,21), (5,19) and (5,20), suggests that the wavelet packet transform is more suitable than
the discrete wavelet transform for this study, as these nodes do not exist in the tree
structure formed by the discrete wavelet transform.
As shown in Table 5.3, the feature with the largest J value is the node kurtosis of node
(5,21), which corresponds to the frequency range of 1.3125 to 1.375 GHz. This means that
the sharpness of the distribution of the decomposition coefficients in this frequency
range exhibits the largest difference between signals of SF6 PD and air corona.
Serial no.    Feature            J value    Frequency range
1             (5,21) kurtosis    12.1435    1.3125 - 1.375 GHz
2             (1,0) skewness     11.8492    0 - 1 GHz
3             (5,1) energy       8.6009     62.5 - 125 MHz
4             (5,19) skewness    8.5927     1.1875 - 1.25 GHz
5             (5,0) kurtosis     8.0492     0 - 62.5 MHz
6             (3,0) kurtosis     7.7291     0 - 250 MHz
7             (5,20) median      7.5266     1.25 - 1.3125 GHz
8             (5,11) skewness    6.6111     687.5 - 750 MHz
9             (4,0) skewness     6.5075     0 - 125 MHz
10            (4,2) energy       6.1892     250 - 375 MHz
The effectiveness of the extracted features is shown in Figs. 5.8-5.10. Fig. 5.8 shows
the number of wavelet-packet-decomposition coefficients whose values fall into evenly
partitioned ranges. Taking Fig. 5.8(a) as an example, the first range is [-0.02, -0.018],
the second range is [-0.018,-0.016], the third range is [-0.016,-0.014], and so on. There
is one decomposition coefficient falling into [-0.02,-0.018] (first range) as shown in
Fig. 5.8(a). Fig. 5.8 illustrates the distributions of air corona and SF6 PD at node (5,21),
which is selected by the maximal class separability criterion. These distributions exhibit
different shapes, so distribution-related features associated with the decomposition
coefficients at node (5,21) should be well separated.
Figs. 5.9 (a) and (b) show the kurtosis values of wavelet-packet-decomposition
coefficients of SF6 PD and air corona at node (5,21) and (4,15) respectively, while
J(5,21)kurtosis is much larger than J(4,15)kurtosis.
Figs. 5.10 and 5.11 demonstrate the feature clusters formed by the first and last two
pairs of extracted features in two-dimensional spaces respectively. As observed,
features in Fig. 5.10 are better separated than in Fig. 5.11 due to the greater J values of
the first four features. In Figs. 5.11 (a) and (b), overlapping of feature clusters is
observed, which indicates inferior classification performance. Thus, the use of J
criterion value as the indicator of separability is verified.
Moreover, it is seen that the margin between feature clusters in Fig. 5.10 (a) is much
larger than that of ICA-formed feature space as in Fig. 4.15 (a). This suggests that
WPT-based method outperforms ICA-based method due to the additional frequency
information. The effectiveness of selected features will be further studied in Chapters 6
and 7.
Fig. 5.10 Feature spaces formed by wavelet-packet-based method. (a) 1st and 2nd
selected features; (b) 3rd and 4th selected features
Fig. 5.11 Feature spaces formed by wavelet-packet-based method (continued). (a) 7th
and 8th selected features; (b) 9th and 10th selected features
Table 5.4 shows the best features obtained from the sym6 and db9 wavelets. It can be
seen that the two wavelets result in the selection of completely different node features.
Figs. 5.12 (a) and (b) further illustrate the feature spaces resulting from the sym6 and
db9 wavelets respectively. It can be seen that the features extracted by sym6 are not as
well separated as those extracted by db9. This indicates that although sym6 is the
best wavelet for denoizing, it is not suitable for feature extraction. Thus, the use of
the J criterion is further verified, as sym6 gives a smaller Jsum value than db9 in
Table 5.2.
Wavelet    Best features      J value    Frequency range
sym6       (4,10) kurtosis    9.0327     1.25 - 1.375 GHz
           (4,2) skewness     8.9521     250 - 375 MHz
db9        (5,21) kurtosis    12.1435    1.3125 - 1.375 GHz
           (1,0) skewness     11.8492    0 - 1 GHz
Fig. 5.12 Feature spaces formed by the best features obtained from (a) sym6 wavelet;
(b) db9 wavelet
Figs. 5.13 (a) and (b) illustrate the impact due to medium background-noise insertion
(SNR=0) and high background-noise insertion (SNR=-5) on separability of the features,
which have been extracted using the db9 wavelet with denoized data (SNR=17). As
shown in the feature clusters of [(5,21)kurtosis, (1,0)skewness], the features of different
classes are seen to become more and more overlapped, as the noise level gets higher
and higher.
To investigate the impact of noise levels on the feature extraction process, signals of
different SNRs are employed for calculating node features and forming the feature
spaces. As illustrated in Table 5.5, fewer features defined on high frequency bands are
selected when signals corrupted by high-level noise are employed in the
wavelet-packet-based feature extraction. This indicates that the node features computed
from decomposition coefficients of high frequencies are more affected by noise.
Furthermore, it is seen that the J values of the obtained features are smaller than those
in Table 5.3, where denoized signals are used. This suggests that denoizing improves
the discriminative ability of the extracted features.
Fig. 5.13 Impact of noise levels on the features selected in Section 5.4.1. (a) SNR=0;
(b) SNR=-5.
              SNR = 0                       SNR = -5
Serial no.    Feature            J value    Feature           J value
1             (1,0) skewness     9.5232     (4,0) skewness    5.1781
2             (5,0) kurtosis     8.5458     (3,0) kurtosis    4.1951
3             (5,1) energy       7.9701     (2,0) kurtosis    4.1788
4             (3,0) skewness     7.5755     (5,1) energy      4.1379
5             (4,0) skewness     6.6046     (5,0) kurtosis    3.8543
6             (3,0) kurtosis     4.965      (2,0) skewness    3.479
7             (5,4) kurtosis     4.8702     (3,0) skewness    3.3888
8             (5,21) kurtosis    4.8486     (5,4) kurtosis    3.1177
9             (4,2) energy       4.4572     (5,0) energy      3.1093
10            (2,0) kurtosis     4.1856     (4,2) energy      3.0972
The feature spaces are then constructed using features with highest J values as
highlighted in Table 5.5. Figs. 5.14 (a) and (b) show the best feature spaces obtained
from signals with SNR levels of 0 and -5 respectively. It is seen that the features
extracted from such signals are not well separated in both feature spaces. Furthermore,
as the noise level gets higher, the quality of obtained feature clusters gets worse.
Therefore, it is crucial to suppress the white noise present in the measured signals
before feature extraction and classification.
Fig. 5.14 Feature spaces obtained from signals of different SNR levels. (a) SNR=0; (b)
SNR=-5.
To investigate the relationship between node energy and energy in the Fourier domain, the
power spectrum of a PD signal of type spacer is first built using the Fast Fourier
Transform (FFT) as shown in Fig. 5.15. Subsequently, energy values in the Fourier
domain are calculated for the 62 frequency bands corresponding to the nodes of the WPT
tree. They are computed from the power spectrum by summing up the squares of the FFT
coefficients of each frequency band, forming FFT_energy (1*62). FFT_energy is then
compared with the node energy that is computed from the wavelet-packet-decomposition
coefficients (Section 5.2.2 C). As illustrated in Fig. 5.16, node energy is almost the
same as FFT_energy. Therefore, it can be concluded that the Fourier domain energy
analysis is equivalent to node energy analysis, which is seen to be insufficient for PD
identification as shown in Fig. 5.10 (b). The time-frequency information provided by the
wavelet packet transform is thus crucial for the current study.
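The band energies in the Fourier domain can be computed along the following lines. The sketch assumes a one-sided FFT scaled according to Parseval's relation, so that the band energies add up to the energy of the time-domain signal; the scaling actually used for FFT_energy is not stated here, and `fft_band_energy` is a hypothetical name.

```python
import numpy as np

def fft_band_energy(signal, fs, f_lo, f_hi):
    """Energy of `signal` in the band [f_lo, f_hi), computed from its FFT."""
    x = np.asarray(signal, dtype=float)
    n = x.size
    power = np.abs(np.fft.rfft(x)) ** 2 / n
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    # One-sided spectrum: every bin except DC (and Nyquist for even n)
    # also stands for its negative-frequency twin.
    weights = np.full(freqs.shape, 2.0)
    weights[0] = 1.0
    if n % 2 == 0:
        weights[-1] = 1.0
    mask = (freqs >= f_lo) & (freqs < f_hi)
    return float(np.sum(weights[mask] * power[mask]))
```

Evaluating this function on the 62 node bands of a signal sampled at 4 GHz gives a 1*62 vector that can be set against the node energies of the same signal.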
5.5 CONCLUDING REMARKS
Comparative studies on features extracted from data with different noise levels show
that a high level of white noise degrades the performance of the features. Among the
features derived from decomposition coefficients, distribution-shape-related node
features are seen to be more effective than the other node features, such as node energy.
Further investigation of the relationship between node energy and the power spectrum
reveals that Fourier domain energy analysis is equivalent to node energy analysis. Thus,
it can be concluded that the wavelet-packet-based method outperforms methods working
solely in the time or frequency domain due to its time-frequency characteristics.
CHAPTER 6
PARTIAL DISCHARGE IDENTIFICATION USING
NEURAL NETWORKS
In previous chapters, high quality partial discharge features, namely ICA_feature and
WPT_feature, have been established from UHF signals through denoizing and feature
extraction. Based on the feature clusters as illustrated in Fig. 5.10 (a), PD identification
can be performed by experienced engineers. However, it is difficult to evaluate the
measured data by humans when the database gets larger and larger. On the other hand,
it has been found that the artificial neural networks perform more effective and reliable
classification than engineers, especially when multilayer perceptron (MLP) neural
network is employed [23, 26, 72].
6.1
In the past decades, several network architectures such as multilayer perceptron [26],
self-organizing map [70] and modular neural network [71] have been adopted to
classify PD sources of different types. In [72], three different types of neural networks,
namely multilayer perceptron, self-organizing map and learning vector quantization
network are studied and compared. In this study, the multilayer perceptron (MLP) is
chosen due to its proven effectiveness for PD classification [72].
A brief introduction to MLP networks is first given in this section. Subsequently, the
construction and training of MLP are discussed. Lastly, the generalization issue of
MLP networks is studied.
1. There is a nonlinear activation function associated with each neuron and the
function must be smooth. The presence of nonlinearities is important because
otherwise the input-output relation of the network could be reduced to that of a
single-layer perceptron.
2. The network contains one or more layers of hidden neurons, which enable the
network to learn complex tasks by extracting progressively more meaningful
features from the input vectors.
3. The neurons are fully interconnected so that any element of a given layer feeds
all the elements of the next layer.
It is through the combination of these characteristics together with the ability to learn
from experience through training that the MLP derives its computing power. A review
of MLP is given in [66].
The number of neurons in the input layer equals the number of features used as the
input of the MLP. Therefore, it is determined in Section 6.3 by comparative studies on
the performance obtained with different numbers of extracted features.
As the number of neurons in hidden layer is closely related to the generalization issue
of MLP, it will be discussed in the next section.
Corona
Spacer
Conductor
Enclosure
C. Type of Neuron
The type of a neuron is characterized by the type of activation function used in the
neuron. There are three functions commonly employed in MLPs, namely the log-sigmoid,
tan-sigmoid and linear functions, as shown in Fig. 6.1. For this study, the log-sigmoid
function is preferred, as the relationship between the input and output of the MLP is
nonlinear and an output of 0 or 1 is expected on the neurons in the output layer. Thus,
log-sigmoid type neurons are employed in all of the layers.
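The three activation functions and the forward pass of a fully connected MLP can be sketched as follows. The sketch is illustrative: the Python function names simply mirror the usual names of these activations, and the layer sizes used in any call are assumptions rather than the configuration adopted in this study.

```python
import numpy as np

def logsig(x):
    """Log-sigmoid: output in (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def tansig(x):
    """Tan-sigmoid: output in (-1, 1)."""
    return np.tanh(x)

def purelin(x):
    """Linear activation: output equals input."""
    return x

def mlp_forward(x, layers, activation=logsig):
    """Propagate the input vector x through fully connected layers.

    layers is a list of (weights, bias) pairs, one pair per layer.
    """
    out = np.asarray(x, dtype=float)
    for weights, bias in layers:
        out = activation(weights @ out + bias)
    return out
```

With log-sigmoid neurons in every layer, each output neuron produces a value between 0 and 1, which suits target outputs of 0 or 1.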
Fig. 6.1 Activation functions. (a) log-sigmoid; (b) tan-sigmoid; (c) linear.
D. Training Algorithms
There are quite a few back-propagation algorithms available to be used to train the
MLP. Table 6.2 shows the algorithms compared in this study. A comprehensive review
of these algorithms is given in [73].
Description                                   Function
Basic gradient descent                        traingd
Gradient descent with momentum                traingdm
Adaptive learning rate                        traingda
Adaptive learning rate with momentum          traingdx
Resilient back-propagation                    trainrp
Conjugate gradient                            trainscg
Quasi-Newton                                  trainbfg
Levenberg-Marquardt                           trainlm
Fig. 6.2 compares the convergence performance of the training algorithms. The MLP
fails to converge within 1000 epochs when trained with traingd, traingdm or traingda.
The resilient back-propagation (trainrp) algorithm, on the other hand, achieves the
best convergence and is thus adopted in this study. Details of the resilient
back-propagation algorithm are given in Appendix E.
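The idea behind resilient back-propagation can be sketched in a few lines. The code below is a simplified variant without weight backtracking, written under stated assumptions: the increase and decrease factors (1.2 and 0.5), the step limits and the toy error surface are commonly quoted defaults or made-up values, not the settings used in this study, and the function names are hypothetical. Only the sign of each partial derivative is used; the size of each weight change is adapted separately.

```python
def sign(v):
    """Return -1, 0 or +1 according to the sign of v."""
    return (v > 0) - (v < 0)

def rprop_step(weights, grads, prev_grads, steps,
               eta_plus=1.2, eta_minus=0.5, step_min=1e-6, step_max=50.0):
    """Update weights in place using only the sign of each partial derivative."""
    for i, g in enumerate(grads):
        agreement = g * prev_grads[i]
        if agreement > 0:      # same sign as in the last epoch: accelerate
            steps[i] = min(steps[i] * eta_plus, step_max)
        elif agreement < 0:    # sign flipped: a minimum was overshot, slow down
            steps[i] = max(steps[i] * eta_minus, step_min)
        weights[i] -= sign(g) * steps[i]
        prev_grads[i] = g

def gradient(w):
    """Gradient of the toy error surface E(w) = (w0 - 3)^2 + 10 (w1 + 1)^2."""
    return [2.0 * (w[0] - 3.0), 20.0 * (w[1] + 1.0)]

w = [0.0, 0.0]
prev = [0.0, 0.0]
steps = [0.1, 0.1]
for _ in range(200):
    rprop_step(w, gradient(w), prev, steps)
print(w)
```

Because the gradient magnitude is ignored, the two weights converge at a similar pace even though the error surface is ten times steeper along the second one, which is one reason such sign-based schemes tend to converge quickly with sigmoid neurons.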
(1) When the maximum number of iterations is reached. It is set to 1000 in this
study.
(2) When the mean squared error (MSE) between the network outputs and the
target outputs drops below the goal, which is set to 0.01 in this study.
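The two stopping conditions can be expressed as a simple training loop. In the sketch below, run_epoch is a hypothetical callback that performs one training epoch and returns the current MSE; the limits of 1000 epochs and 0.01 are the values quoted above, while the decaying error curve is made up for the demonstration.

```python
def train_until_stop(run_epoch, max_epochs=1000, goal=0.01):
    """Run epochs until the MSE goal is met or max_epochs is reached.

    Returns (epochs_used, last_mse, reason).
    """
    mse = float("inf")
    for epoch in range(1, max_epochs + 1):
        mse = run_epoch(epoch)
        if mse < goal:
            return epoch, mse, "goal"
    return max_epochs, mse, "max_epochs"

# Made-up error curves standing in for real MLP training.
converging = train_until_stop(lambda epoch: 0.5 * 0.9 ** epoch)
stalled = train_until_stop(lambda epoch: 1.0)
print(converging, stalled)
```

The second call illustrates the non-converging case: the loop ends at the epoch limit with the reason "max_epochs", which is the situation reported for the basic gradient-descent algorithms.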
After extensive studies, the configuration of the MLP network is set as in Table 6.3. It
can be seen that a very simple MLP is able to perform PD identification successfully
due to the high quality of the extracted features.
Setting
Type of neuron
Log-sigmoid
2
2 (when ICA_feature is used)
Clearly, the third factor is application-oriented. As far as the first factor is concerned,
an effective feature extraction, such as the ICA-based or WPT-based schemes, will
ensure good generalization by reducing the length of each training vector in the
training set.
The extracted feature set (ICA_Feature or WPT_feature) is usually divided into two
sets for determining the weights during the MLP training and estimation of
generalization error during testing. One way of forming the training and test sets is to
randomly divide the ensemble into two sets. A better method for estimating the
generalization error, known as leave-one-out, is chosen to avoid the possible bias
introduced by relying on any particular test or training set after division. The method
is chosen because it maximizes the size of the training set by employing all the 80*N
(N denotes the length of each feature vector) data for training the MLP weights.
As illustrated in Fig. 6.4, the method first splits the feature set (size of 80*N) into a
training set (size of 79*N) and a test set (size of 1*N). The MLP is then trained using
the 79*N training set and tested with the 1*N test set. The mean squared error on the
test set is calculated and denoted as e1. The same process is then applied to all the
other combinations of training and test sets. As a result, 80 values of mean squared
error (e1, e2, ..., e80) on the test sets are obtained. Subsequently, the generalization
error Etest is calculated by averaging these values (Fig. 6.4). Once the generalization
error is computed, training is re-applied on the full 80*N data set to determine the
MLP weights.
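The leave-one-out procedure can be sketched as follows. This is an illustration under stated assumptions: the classifier is a nearest-centroid stand-in rather than an MLP, the seven scalar samples are made up, and all function names are hypothetical. Only the splitting and averaging logic corresponds to the procedure described above.

```python
def leave_one_out_error(features, targets, train, test_error):
    """Return (E_test, [e_1, ..., e_n]) over n leave-one-out rounds."""
    errors = []
    for i in range(len(features)):
        # Hold out sample i and train on the remaining n - 1 samples.
        train_x = features[:i] + features[i + 1:]
        train_t = targets[:i] + targets[i + 1:]
        model = train(train_x, train_t)
        errors.append(test_error(model, features[i], targets[i]))
    return sum(errors) / len(errors), errors

# Stand-in classifier (NOT an MLP): nearest class centroid on a scalar feature.
def train_centroids(xs, ts):
    classes = sorted(set(ts))
    return {c: sum(x for x, t in zip(xs, ts) if t == c) / ts.count(c)
            for c in classes}

def squared_error(model, x, t):
    predicted = min(model, key=lambda c: abs(x - model[c]))
    return float((predicted - t) ** 2)

# Made-up data: class 0 contains one outlier (0.95) close to class 1.
features = [0.1, 0.2, 0.3, 0.95, 0.9, 1.0, 1.1]
targets = [0, 0, 0, 0, 1, 1, 1]
e_test, errors = leave_one_out_error(features, targets,
                                     train_centroids, squared_error)
print(e_test, errors)
```

Every sample serves as the test set exactly once, so the estimate does not depend on one particular division of the data, and each round still trains on all but one sample.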
Generalization of the MLP also depends on the number of neurons in the hidden layer.
If there are not enough neurons in the hidden layer, the MLP network may not have
sufficient discriminative power to classify the signals correctly. On the other hand,
if too many neurons are used in the hidden layer, the MLP may overfit the training
data, leading to a large error on new data. Therefore, experiments are also carried
out with different numbers of hidden neurons. The number that gives the smallest
generalization error is chosen for classification (Section 6.3).
6.2
Experimental results using various features as input of MLP are presented and
compared. Determination of the best MLP network structure is investigated by
comparative studies.
The best number of hidden neurons is chosen according to the minimum generalization
error calculated by the leave-one-out method described in Section 6.2.3. Table 6.4
summarizes the results obtained with different numbers of hidden neurons, and the
corresponding generalization error is shown in Fig. 6.5. The MLP with 14 hidden
neurons offers the best generalization performance with respect to both the mean
squared error and the number of misclassified patterns. Even in this best case,
however, seventeen patterns out of eighty are not classified correctly during testing.
After the structure of the MLP is determined, it is trained using all the 80*1000
data. As illustrated in Fig. 6.6, the training converges in 70 epochs, taking 58.6
seconds on a Pentium-IV.
Number of neurons   Averaged      Generalization   Number of
in hidden layer     convergence   mean squared     misclassified
                    epochs        error            patterns on test
2                   1281          0.0669           23/80
4                   161           0.0467           20/80
6                   85            0.0434           19/80
8                   85            0.0396           19/80
10                  82            0.0387           18/80
12                  79            0.0369           17/80
14                  73            0.0320           17/80
16                  63            0.0375           17/80
18                  59            0.0382           17/80
20                  51            0.0396           18/80
22                  48            0.0386           18/80
24                  47            0.0401           18/80
26                  45            0.0421           19/80
28                  49            0.0392           18/80
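The selection rule applied to these results is simply the minimum of the generalization error over the candidate network sizes. A minimal sketch using the generalization MSE values of Table 6.4 is given below; the variable and function names are hypothetical.

```python
# Generalization MSE per number of hidden neurons, copied from Table 6.4.
generalization_mse = {
    2: 0.0669, 4: 0.0467, 6: 0.0434, 8: 0.0396, 10: 0.0387, 12: 0.0369,
    14: 0.0320, 16: 0.0375, 18: 0.0382, 20: 0.0396, 22: 0.0386, 24: 0.0401,
    26: 0.0421, 28: 0.0392,
}

def best_hidden_size(mse_by_size):
    """Return the hidden-layer size with the smallest generalization MSE."""
    return min(mse_by_size, key=mse_by_size.get)

best = best_hidden_size(generalization_mse)
print(best, generalization_mse[best])
```

The error falls steeply up to about 12 to 14 neurons and then fluctuates, which matches the under-fitting and over-fitting behaviour discussed in Section 6.2.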
Fig. 6.6 Mean squared error during training when using pre-selected signals as input
Number of neurons   Averaged      Generalization   Number of
in hidden layer     convergence   mean squared     misclassified
                    epochs        error            patterns on test
                    408           0.0522           11/80
                    125           0.0284           5/80
                    101           0.0223           3/80
                    85            0.0121           2/80
                    80            0.0156           2/80
                    78            0.0145           2/80
                    75            0.0230           3/80
                    72            0.0175           2/80
10                  72            0.0234           3/80
11                  70            0.0219           3/80
12                  73            0.0258           4/80
13                  70            0.0218           3/80
14                  69            0.0245           3/80
15                  71            0.0229           3/80
Using the 80*2 feature set, training of the MLP converges in 82 epochs as shown in
Fig. 6.8, which takes one second on a Pentium-IV.
The performance obtained with additional independent components (>2) is also studied
and the results are summarized in Table 6.6. Using additional independent components
does not improve the performance of the MLP in terms of speed or classification
accuracy, owing to the dominance of the two most dominant independent components.
Fig. 6.8 Mean squared error during training when using ICA_feature as input
Best number of   Training      Generalization   Number of
neurons in       convergence   MSE              misclassified
hidden layer     time (s)                       patterns on test
                 1.26          0.0146           2/80
                 1.51          0.0139           2/80
                 1.69          0.0136           2/80
                 1.37          0.0181           3/80
                 1.35          0.0130           2/80
                 1.83          0.0203           3/80
Table 6.7 Generalization performance of MLP using the first four WPT_feature
Number of neurons   Averaged      Generalization   Number of
in hidden layer     convergence   mean squared     misclassified
                    epochs        error            patterns on test
                    408           0.0236           3/80
                    152           0.0221           3/80
                    70            0.0118           1/80
                    64            0.0114           0/80
                    45            0.0115           0/80
                    41            0.0098           0/80
                    39            0.0102           0/80
                    37            0.0116           1/80
10                  34            0.0110           0/80
11                  31            0.0114           0/80
12                  30            0.0112           0/80
13                  29            0.0115           0/80
14                  28            0.0112           0/80
15                  28            0.0106           0/80
Using the 80*4 feature set, training of the MLP converges in 40 epochs as shown in
Fig. 6.10. It takes 1.02 seconds on a Pentium-IV.
The performance obtained with different numbers of WPT features as input is also
studied. The MLP is unable to converge during training when only one feature is used
as its input. Thus, at least two features are required to classify PD. Table 6.8 shows
the classification performance obtained with pairs of features chosen from Table 5.2
as the input of the MLP. Features with higher J values result in better
classification, which supports the use of the J criterion for selecting the most
effective features.
Fig. 6.10 Mean-squared error during training when using WPT_feature as input
Input of MLP         Training      Generalization   Number of
                     convergence   MSE              misclassified
                     time (s)                       patterns on test
1st & 2nd feature    0.95          0.0111           0/80
3rd & 4th feature    1.14          0.0115           0/80
5th & 6th feature    5.344         0.0118           1/80
7th & 8th feature    18.872        0.0230           3/80
9th & 10th feature   22.094        0.0280           4/80
Additional     J value of the   Generalization   Improvement of   Number of
input of MLP   additional       MSE              generalization   misclassified
               feature                           MSE              patterns on test
3rd feature    8.6909           0.0098           0.0013           0/80
4th feature    8.5927           0.0102           0.0009           0/80
5th feature    8.0492           0.0113           -0.0002          0/80
6th feature    7.7291           0.0114           -0.0003          0/80
7th feature    7.5266           0.0114           -0.0003          0/80
8th feature    6.6111           0.0115           -0.0004          0/80
9th feature    6.5075           0.0115           -0.0004          0/80
10th feature   6.1892           0.0117           -0.0006          0/80
Number of
WPT
features
Number of
neurons in
input layer
Best
number of
neurons in
hidden
layer
0.95
0.0111
0/80
1.04
0.0098
0/80
1.02
0.0096
0/80
1.005
0.0112
0/80
1.036
0.0113
0/80
1.005
0.0112
0/80
10
1.12
0.0113
0/80
11
1.088
0.0114
0/80
10
10
1.026
0.0113
0/80
Number of
Training
Generalization misclassified
convergence
patterns on
MSE
time (s)
test
As illustrated in Table 6.11, MLPs using WPT_feature and ICA_feature take only
0.186 s and 0.164 s respectively to identify a new set of data. The methods are
therefore potentially suitable for online applications.
Input type             Generalization   Training      Identification
                       MSE              convergence   time* (sec)
                                        time (sec)
Pre-selected signals   0.0320           58.6          2.541
ICA_feature            0.0121                         0.164
WPT_feature            0.0098           1.02          0.186
*: Including all the processes, namely denoizing, feature extraction and MLP
classification
Table 6.12 compares the performance of the method developed in this research with
methods proposed in other published works. In [3, 23], phase-resolved (PRPD)
patterns are used as the PD features, so at least a few seconds are required to form
the patterns. In addition, the computing time of the denoizing and classification
algorithms has to be added to the total identification time in [3, 23]. While the
PRPD patterns are being formed, more than one type of PD can take place in the GIS
chamber, which may lead to further misclassification, as indicated by < in Table 6.12.
Table 6.12 Comparison of performance of different identification methods
Method              Correct classification rate   Speed (sec)
In this thesis      100%                          0.186
In reference [3]    < 95%                         > 1
In reference [23]   < 85%                         > 1
6.3 CONCLUDING REMARKS
CHAPTER 7
PERFORMANCE ASSURANCE FOR PD IDENTIFICATION
7.1 INTRODUCTION
7.2 PROCEDURE FOR ENSURING ROBUSTNESS OF CLASSIFICATION
Fig. 7.2 Chosen signal sets for calculating independent components from extended
database (1)-corona; (2)- particle on the surface of spacer; (3),(5),(7),(9),(11)- particle
on conductor; (4),(6),(8),(10),(12)- free particle on enclosure.
PD-to-sensor distance: (1)-(4) one metre ; (5)-(6) 2.5 m; (7)-(8) 4.6 m; (9)-(10) 6 m;
(11)-(12) 7.8 m.
Fig. 7.3 Independent components obtained from FastICA for extended database
ICAPD1     0.2465
ICAPD2     0.1763
ICAPD3     0.0489
ICAPD4     0.0224
ICAPD5     0.0208
ICAPD6     0.0200
ICAPD7     0.0158
ICAPD8     0.0089
ICAPD9     0.0075
ICAPD10    0.0068
ICAPD11    0.0064
ICAPD12    0.0047
Wavelet   largest J   2nd largest   3rd largest   4th largest   5th largest   Jsum
db1       10.0856     9.1245        8.3664        6.0312        5.6563        39.264
db2       7.0786      6.9758        5.0234        5.012         4.2765        28.3663
db3       8.6245      8.4653        7.3424        7.2756        6.6654        38.3732
db4       11.1456     10.0475       9.4579        8.9878        8.1575        47.7963
db5       10.1421     8.8542        8.4636        7.4263        4.7445        39.6307
db6       9.5325      8.0945        6.7543        5.2351        4.5754        34.1918
db7       9.3223      8.8945        7.5641        6.8753        6.0985        38.7547
db8       9.0435      8.2873        7.8633        7.3546        7.1468        39.6955
db9       12.2098     11.2892       8.941         8.6021        7.955         48.9971
db10      11.0021     8.2235        7.3621        6.5431        5.9878        39.1186
sym4      9.1456      8.9673        8.8043        7.9253        6.5454        41.3879
sym5      9.2978      8.5003        8.1023        7.4675        7.2564        40.6243
sym6      9.0298      8.9465        8.1423        7.3034        6.2344        39.6564
sym7      8.8168      8.4289        6.9312        5.8256        5.4896        35.4921
sym8      8.9234      8.3765        8.3234        8.2745        8.2405        42.1383
sym9      9.3869      8.2023        6.2406        5.4852        5.0984        34.4134
sym10     9.1923      7.2342        6.0967        6.0574        5.6456        34.2262
coif1     10.9939     9.3241        8.0456        7.6574        7.1121        43.0431
coif2     10.8934     10.0252       9.4344        8.7687        7.6733        46.795
coif3     8.992       8.7686        7.6546        7.0675        6.7832        39.2659
coif4     8.8914      8.2463        8.0422        7.1389        6.3574        38.6762
coif5     8.9131      6.7842        6.5368        6.0797        5.4356        33.7494
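Jsum in the table above is the sum of the five largest J values obtained with each wavelet, and the wavelet with the largest Jsum is the preferred one for feature extraction. The sketch below reproduces this for four of the listed wavelets; the J values are copied from the table, while the variable names are hypothetical.

```python
# Five largest J values for a subset of the wavelets listed in the table above.
top_j = {
    "db4":   [11.1456, 10.0475, 9.4579, 8.9878, 8.1575],
    "db9":   [12.2098, 11.2892, 8.941, 8.6021, 7.955],
    "sym8":  [8.9234, 8.3765, 8.3234, 8.2745, 8.2405],
    "coif2": [10.8934, 10.0252, 9.4344, 8.7687, 7.6733],
}

# Jsum is the sum of the five largest J values of each wavelet.
j_sum = {name: sum(values) for name, values in top_j.items()}
best_wavelet = max(j_sum, key=j_sum.get)

for name, total in j_sum.items():
    print(name, round(total, 4))
print("best:", best_wavelet)
```

Summing several of the largest J values, rather than comparing only the single largest, rewards wavelets that yield a group of discriminative features instead of one isolated feature.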
Subsequently, the node features defined in Section 5.2.2, namely node kurtosis, node
skewness, node energy, node median and node mean, are calculated for all nodes in the
WPD trees, forming feature trees as illustrated in Fig. 5.6. The classification
capability of the node features is then evaluated using the J criterion defined in
equation 5.9. Table 7.3 shows the node features with the highest J values. Compared
with Table 5.2, the extracted features are identical and only their order in the
tables differs slightly. This suggests that WPT features are robust for data having
different PD-to-sensor distances. In addition, the first four features in Table 7.3
have J values larger than the critical J value (Jcr) defined in Section 6.3.3, which
indicates good classification capability. The classification performance of the
features in Table 7.3 is further assessed in Section 7.3.2 (B).
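The five node features can be computed from the coefficients of a single wavelet packet node as sketched below. The exact definitions of Section 5.2.2 are not repeated in this chapter, so the sketch assumes the standard ones: population central moments for skewness and kurtosis, and the sum of squared coefficients for energy. The coefficient values and the function name are made up for illustration.

```python
import statistics

def node_features(coeffs):
    """Return mean, median, energy, skewness and kurtosis of one node.

    Assumed definitions: skewness = m3 / m2^1.5 and kurtosis = m4 / m2^2,
    where mk is the k-th population central moment; energy = sum of squares.
    """
    n = len(coeffs)
    mean = sum(coeffs) / n
    m2 = sum((c - mean) ** 2 for c in coeffs) / n
    m3 = sum((c - mean) ** 3 for c in coeffs) / n
    m4 = sum((c - mean) ** 4 for c in coeffs) / n
    return {
        "mean": mean,
        "median": statistics.median(coeffs),
        "energy": sum(c * c for c in coeffs),
        "skewness": m3 / m2 ** 1.5,
        "kurtosis": m4 / m2 ** 2,
    }

# Made-up coefficients of one node (not measured data).
features = node_features([1.0, 2.0, 3.0, 4.0, 10.0])
print(features)
```

Skewness and kurtosis describe the shape of the coefficient distribution and are unaffected by a uniform scaling of the coefficients, whereas energy changes with the square of the scale.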
No.   feature          J value   Frequency range (Hz)
1     (5,21)kurtosis   12.2098   1.3125 G - 1.375 G
2     (1,0)skewness    11.2892   0 - 1 G
3     (5,1)energy      8.941     62.5 M - 125 M
4     (5,19)skewness   8.6021    1.1875 G - 1.25 G
5     (5,0)kurtosis    7.955     0 - 62.5 M
6     (5,11)skewness   7.3879    687.5 M - 750 M
7     (5,20)median     7.1258    1.25 G - 1.3125 G
8     (3,0)kurtosis    6.9012    0 - 250 M
9     (4,0)skewness    6.326     0 - 125 M
10    (4,2)energy      6.2094    250 M - 375 M
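The frequency range attached to each node follows from its level and index. The sketch below assumes a sampling rate of 4 GS/s (the single-channel rate of the oscilloscope listed in Appendix A) and frequency-ordered node indices; both assumptions are consistent with the ranges listed above but are not stated explicitly in this chapter. The function name is hypothetical.

```python
def node_band(level, index, fs=4.0e9):
    """Return (f_low, f_high) in Hz for wavelet packet node (level, index).

    Assumes frequency-ordered node indices and sampling rate fs, so that the
    band 0 .. fs/2 is divided into 2**level equal sub-bands at each level.
    """
    width = (fs / 2.0) / (2 ** level)
    return index * width, (index + 1) * width

for node in [(5, 21), (1, 0), (5, 1), (4, 2)]:
    low, high = node_band(*node)
    print(node, low / 1e6, "MHz -", high / 1e6, "MHz")
```

For example, node (5,21) maps to 1312.5 MHz to 1375 MHz, which agrees with the first row of the table.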
7.3
Fig. 7.4 Impact of distance between PD source and sensor on original ICA_feature
(a) 2.5 m; (b) 4.3 m; (c) 6 m; (d) 7.8 m.
The performance of MLP trained with the original ICA features as in Chapter 6 is
investigated with data obtained from the four PD-to-sensor distances. As illustrated in
Table 7.4, an overall classification performance of 93.75 % is achieved. In the worst
case, where the PD-to-sensor distance is 7.8 m, six out of fifty patterns are
misclassified. In all the misclassified cases, patterns of enclosure are classified as
spacer. This may be due to the small margin between ICA feature clusters of
enclosure and spacer as shown in Fig. 4.15.
Table 7.4 Performance of original MLP with ICA_feature on data having different
PD-to-sensor distances
Distance (m)   Number of misclassified patterns   Correct classification rate
2.5            1/50                               98 %
4.3            2/38                               94.7 %
6              2/38                               94.7 %
7.8            6/50                               88 %
Subtotal       11/176                             93.75 %
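The rates in Table 7.4 follow directly from the misclassification counts, and the subtotal is obtained by pooling the counts rather than averaging the four rates. The sketch below reproduces the arithmetic; the 6 m label of the third row is an assumption taken from the distances listed in Fig. 7.4, and the names are hypothetical.

```python
# (misclassified, tested) per PD-to-sensor distance in metres, from Table 7.4.
# The 6.0 key is an assumption: that distance is taken from Fig. 7.4.
results = {2.5: (1, 50), 4.3: (2, 38), 6.0: (2, 38), 7.8: (6, 50)}

def correct_rate(missed, tested):
    """Correct classification rate in percent."""
    return 100.0 * (tested - missed) / tested

per_distance = {d: correct_rate(m, n) for d, (m, n) in results.items()}
total_missed = sum(m for m, _ in results.values())
total_tested = sum(n for _, n in results.values())
overall = correct_rate(total_missed, total_tested)

print(per_distance)
print(total_missed, total_tested, overall)
```

Pooling gives 165 correct out of 176, that is 93.75 %, and the worst case of 88 % occurs at 7.8 m.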
Table 7.5 shows the MLP performance on data with different PD-to-sensor distances
using more independent components. It can be seen that using additional independent
components does not improve the performance of the MLP in terms of overall and
worst case correct classification rate.
Table 7.5 Performance on data with different PD-to-sensor distances using more
independent components
Number of used   Overall correct       Correct classification
independent      classification rate   rate in the worst case
components
                 93.75 %               88 %
                 93.75 %               88 %
                 93.75 %               88 %
                 93.18 %               86 %
                 93.18 %               86 %
                 93.18 %               86 %
Using the re-selected features as input, the MLP is re-trained and re-tested on the
extended database. During re-training, the convergence speed and network structure
remain the same as in Chapter 6. The performance of the updated MLP on testing,
however, is improved, as shown in Table 7.6. The most obvious improvement is obtained
for the PD-to-sensor distance of 7.8 m. In addition, the overall performance is
improved by 3.4%. Table 7.7 shows that using additional independent components does
not improve the performance of the re-trained MLP.
Fig. 7.5 Feature clusters formed by re-selected ICA_feature for extended database
Distance (m)   Number of misclassified patterns   Correct classification rate
1              2/80                               97.5 %
2.5            1/50                               98 %
4.3            1/38                               97.4 %
6              1/38                               97.4 %
7.8            2/50                               96 %
Subtotal       7/256                              97.3 %
Number of used   Overall correct       Correct classification
independent      classification rate   rate in the worst case
components                             (distance = 7.8 m)
                 97.3 %                96 %
                 97.3 %                96 %
                 97.3 %                96 %
                 97.3 %                96 %
                 97.3 %                96 %
                 97.3 %                96 %
                 96.9 %                94 %
10               96.9 %                94 %
11               96.9 %                94 %
12               96.9 %                94 %
Fig. 7.6 Impact of distance between PD source and sensor on original WPT_feature. (a)
2.5 m; (b) 4.3 m; (c) 6 m; (d) 7.8 m.
Table 7.8 shows the corresponding J values for these four distances, which are higher
than or close to Jcr (Chapter 6), indicating good classification performance.
Distance   1st feature   2nd feature   3rd feature   4th feature
2.5 (m)    12.1328       11.8245       8.6005        8.5927
4.3 (m)    12.1134       11.8109       8.6003        8.5925
6 (m)      11.9907       11.7854       8.5896        8.5925
7.8 (m)    11.9124       11.7565       8.5899        8.5922
The performance of MLP trained with the data measured one metre away from source
is tested with data obtained from the four PD-to-sensor distances. As shown in Table
7.9, an overall performance of 98.3% has been achieved, which is better than that
obtained from original ICA-based MLP. In addition, only two patterns are
misclassified in the worst case.
Table 7.9 Generalization performance of the original MLP on data with different
PD-to-sensor distances
Distance (m)   Number of misclassified patterns   Correct classification rate
2.5            0/50                               100 %
4.3            0/38                               100 %
6              1/38                               97.4 %
7.8            2/50                               96 %
Subtotal       3/176                              98.3 %
Distance (m)   Number of misclassified patterns   Correct classification rate
1              0/80                               100 %
2.5            0/50                               100 %
4.3            0/38                               100 %
6              0/38                               100 %
7.8            0/50                               100 %
Subtotal       0/256                              100 %
7.4 CONCLUDING REMARKS
CHAPTER 8
CONCLUSIONS AND FUTURE WORK
This chapter concludes the study on PD denoizing and identification in GIS presented
in the preceding chapters. Based on the results of this research, the conclusions are
summarized, followed by recommendations for future work.
8.1 CONCLUSION
GIS has been used worldwide for many years; its low maintenance and compact size make
it an attractive option in many applications. On the downside, however, GIS suffers
from sharp deterioration of the dielectric strength of its insulation gas (SF6) due to
PD. PD is caused by the extreme field intensity built up around the sharp edges of
small particles, which may attach to the bus conductor, the enclosure or the
insulation spacer. In industrial applications, these defects can be attributed to
mechanical faults during manufacture, protrusions on the enclosure or the HV
conductor, and free-moving particles. The extreme field intensity caused by such
defects may produce PD inside the GIS, which may lead to failure of the system.
Preventing the failure of a GIS requires a reliable and efficient PD measuring and
diagnostic technique, which is able to detect and identify signals from harmful defects.
Thus, a prompt warning message can be given before the breakdown occurs. However,
the two major issues associated with such diagnostic systems, namely influence of
noise and the extraction of effective features from measured data, must be addressed to
achieve a successful diagnosis of PD activities in GIS. In this thesis, a novel PD
diagnostic system is developed based on UHF signals with special emphasis on
denoizing and feature extraction from the PD signal.
It has been shown that the proposed method offers better denoizing compared to DWT
and WPT with the standard entropy-based criterion. Using the proposed method,
successful and robust denoizing is achieved for PD signals having various SNR levels.
Successful restoration of the original waveform facilitates the subsequent pre-selection
process and enables extraction of reliable features for PD identification.
In this research, external corona discharge is considered as a typical pulse-shaped
noise and is addressed accordingly. In practical GIS, if other pulse-shaped noises,
such as switching over-voltages, are present and produce significant signals within
the UHF range, the MLP neural network will label them as unknown signals. In such
cases, further investigation of the noises may be required. However, drastic changes
to the proposed method should not be required.
Various PD features are derived from UHF signals and form a solid basis for current
and future work on PD identification. The first category of PD features, namely
ICA_Feature is extracted in the time domain using Independent Component Analysis.
Using ICA_Feature, successful identification of PD is achieved, although the
between-class margins are small owing to the time-domain nature of ICA. White noise
present in the measured signals is seen to reduce the discriminating capability of the
extracted features, which shows the importance of denoizing. When the distance
between PD source and UHF sensor varies, re-selection of the ICA_feature and
re-training of the MLP improve the correct classification rate to 97.3%, which
ensures the robustness of the proposed method.
Features extracted in the time-frequency domain using the wavelet packet transform
(WPT_Feature) form the second category of PD features. Taking advantage of the
additional frequency information provided by the wavelet packet transform,
WPT_Feature exhibits a large margin between feature clusters of different classes,
which indicates good classification performance. Among the subcategories of
WPT_Feature, distribution-shape-based node features are more effective than other
node features such as node energy. It can therefore be concluded that the
wavelet-packet-based method outperforms methods which operate solely in the time or
frequency domain (FFT) owing to its time-frequency characteristic. The best wavelet
for feature extraction is db9, which differs from the one used for denoizing, namely
sym8. This indicates that the selection of wavelet is application-dependent.
Investigation of the impact of noise levels on the effectiveness of features confirms
that denoizing is crucial for reliable feature extraction and classification. For various
PD-to-sensor distances, the same set of features is selected by WPT-based method.
However, re-training of the MLP improves the classification performance, which
verifies the re-selection and re-training scheme for quality assurance.
Owing to the compactness and high quality of the extracted features, successful and
robust PD identification is achieved using a very simple MLP network. In particular,
the MLP with WPT-based preprocessing achieves 100% correct classification of all PD
activities at all locations within the given GIS configuration after re-training.
This verifies the robustness of the WPT-based feature extraction. The methods
developed in this project can be used either as a stand-alone system or as a
supplement to the existing PRPD system to improve its performance. Moreover, both the
WPT- and ICA-based PD diagnostic methods are potentially suitable for online
applications.
8.2
(1)
Major PD-causing defects [7, 10] in SF6 have been considered in this research.
However, PD may also be caused by cavity or metallic intrusion within an epoxy resin
support barrier. Although the possibility of encountering these defects is very low in
practice [90], further investigation of these defects may be required to develop a
comprehensive PD diagnostic system.
The amplitude and rise time of PD current pulses produced by defects in solid differ
from those produced in SF6 due to the different nature of the insulation material [7, 90,
91]. On the other hand, the shape of PD current pulses determines the waveform of
corresponding UHF signals [92]. Thus, UHF signals excited by PD in solid and PD in
SF6 should have very different waveforms and time-frequency characteristics. This
indicates that good classification may be achieved without drastic changes on the
methods developed in this thesis. Re-selection of features and re-training of MLP may
be required to achieve satisfactory identification.
Apart from PD-to-sensor distance, the dimension and shape of the particle may affect
the measured UHF signals. In this research, however, a typical particle, which can
cause PD of critical amplitude without leading to immediate breakdown, is employed to
simulate the defects. Although the dimension and shape of the defect may change the
shape of the PD pulse, this has no significant influence on the basic principle of the
proposed techniques. Re-selection of features and re-training of MLP may be required
to achieve satisfactory identification.
(2) Speed Improvement
In this research project, the entire PD denoizing and identification scheme is
developed in Matlab on a PC platform. Since Matlab is an interpreted language rather
than a compiled language (such as C), its speed generally lags behind that of a
custom program written in a language like C. Converting the Matlab programs into C or
C++ is therefore expected to shorten the response time of the diagnosis system.
Further improvement of the speed may be achieved by implementing the scheme on a
Digital Signal Processor (DSP).
(3)
The new PD denoizing and identification methods are developed and tested for a
simple GIS configuration, which consists of a straight-through busbar, enclosure and
two spacers. However, there are more complicated GIS configurations such as T
junction, gas circuit breaker and disconnector in practical GIS systems. Therefore, the
performance of the methods developed in this project should be verified for these
configurations. Further development of the proposed methods may be required on new
measured data to ensure satisfactory performance for the practical GIS system.
(4) Study on PD Location
REFERENCES
[1]
[2]
Judd M.D., Farish O., Hampton B.F., The excitation of UHF signals by partial
discharges in GIS, IEEE Trans. On Dielectrics and Electrical Insulation, vol. 3,
no. 2, pp. 213-227, Apr 1996.
[3]
Pearson J.S., Farish O., Hampton B.F., Judd M.D., Templeton D., Pryor B.M.,
Welch I.M., Partial discharge diagnostics for gas insulated substations, IEEE
Trans. On Dielectrics and Electrical Insulation, vol. 2, no. 5, pp. 893-905, Oct
1995.
[4]
[5]
[6]
Nicholas de Kock, Branko Coric and Ralf Pietsch, UHF PD detection in gas-insulated
switchgear - suitability and sensitivity of the UHF method in comparison with the
IEC 270 method, IEEE Electrical Insulation Magazine, vol. 12, no. 6, pp. 20-26,
Nov/Dec 1996.
[7]
Baumgartner R., Fruth B., Lanz W., Pettersson K., Partial discharge - Part X: PD
in gas-insulated substations - measurement and practical considerations, IEEE
Electrical Insulation Magazine, vol. 8, no. 1, pp. 16-27, Jan/Feb 1992.
[8]
Sellars A.G., Farish O. and Hampton B.F., Assessing the risk of failure due to
particle contamination of GIS using the UHF technique, IEEE Trans. On
Dielectrics and Electrical Insulation, vol. 1, no. 2, pp. 323-331, April 1994.
[9]
Sellars A.G., Farish O. and Peterson M.M., UHF detection of leader discharges
in SF6, IEEE Trans. On Dielectrics and Electrical Insulation, vol. 2, no. 1, pp.
143-154, Feb. 1995.
[10]
[11]
[12]
[13]
[14]
[15]
[16]
[17]
[18]
[19]
[20]
[21]
Borsi H., Gockenbach E., Wenzel D., Separation of partial discharges from
pulse-shaped noise signals with the help of neural networks, IEE Proc. Science,
Measurement and Technology, vol. 142, no. 1, pp. 69-74, Jan 1995.
[22]
[23]
[24]
[25]
[26]
D. J. Hamilton, J. S. Pearson, Classification of partial discharge sources in
gas-insulated substations using novel preprocessing strategies, IEE Proc. Science,
Measurement and Technology, vol. 144, no. 1, pp. 17-24, Jan 1997.
[27]
[28]
[29]
Sudarshan T.S. and Dougal R.A., Mechanisms of surface flashover along solid
dielectrics in compressed gases: a review, IEEE Trans. On Electrical Insulation,
vol. 21, no. 5, pp. 727-746, 1986.
[30]
Chakrabarti A.K., Van Heeswijk R.G., Srivastava K.D., Free particle initiated 60
Hz breakdown at a spacer surface in a gas insulated bus, IEEE Trans. On
Electrical Insulation, vol. 24, no. 4, pp. 549-560, 1989.
[31]
[32]
[33]
Shen M., Sun L., Chan F.H.Y., Method for extracting time-varying rhythms of
electroencephalography via wavelet packet analysis, IEE Proc. Science,
Measurement and Technology, vol. 148, no. 1, pp. 23 -27, Jan 2001.
[34]
Carnero B., Drygajlo A., Perceptual speech coding and enhancement using
frame-synchronized fast wavelet packet transform algorithms, IEEE Trans.
Signal processing, vol. 47, no. 6, pp. 1622 -1635, Jun 1999.
[35]
Zixiang Xiong, Ramchandran K., Orchard M.T., Wavelet packet image coding
using space-frequency quantization, IEEE Trans. Image Processing, vol. 7, no. 6,
pp. 892 -898, Jun 1998.
[36]
[37]
Jaehak Chung, Powers E.J., Grady W.M., Bhatt S.C., Power disturbance
classifier using a rule-based method and wavelet packet-based hidden Markov
model, IEEE Trans. Power Delivery, vol. 17, no. 1, pp. 233-241, Jan. 2002.
[38]
Littler T.B., Morrow D.J., Wavelets for the analysis and compression of power
system disturbances, IEEE Trans. Power Delivery, vol. 14, no. 2, pp. 358 -364,
Apr 1999.
[39]
Hamid E.Y., Mardiana R., Kawasaki Z.-I., Method for RMS and power
measurements based on the wavelet packet transform, IEE Proc. Science,
Measurement and Technology, vol. 149, no. 2, pp. 60-66, Mar. 2002.
[40]
Xianguing Liu, Pei Liu, Shijie Cheng, A wavelet transform based scheme for
power transformer inrush identification, Power Engineering Society Winter
Meeting, 2000. vol. 3, pp. 1862 -1867, Jan 2000.
[41]
X. Ma, C. Zhou, and I. J. Kemp, Wavelets for the analysis and compression of
partial discharge data, Electrical Insulation and Dielectric Phenomena Conf.,
Annual Report, pp. 329-334, 2001.
[42]
M. Misiti, Y. Misiti, G. Oppenheim, and J. Poggi, Wavelet Toolbox For Use with
MATLAB, The MathWorks, Inc., 1996.
[43]
[44]
B. Castro, D. Kogan, A.B. Geva, ECG feature extraction using optimal mother
wavelet, The 21st IEEE Convention of the Electrical and Electronic Engineers in
Israel, 2000, pp. 346-350.
[45]
[46]
[47]
[48]
[49]
[50]
[51]
Gerbex S., Cherkaoui R., Germond A.J., Optimal location of multi-type FACTS
devices in a power system by means of genetic algorithms, IEEE Trans. Power
Systems, vol. 16, no. 3, pp. 537-544, Aug. 2001.
[52]
Nicolaisen J., Petrov V., Tesfatsion L., Market power and efficiency in a
computational electricity market with discriminatory double-auction pricing,
IEEE Trans. Evolutionary Computation, vol. 5, no. 5, pp. 504-523, Oct. 2001.
[53]
[54]
[55]
Asghar Akbari, Peter Werle, Hossein Borsi, and Ernst Gockenbach, Transfer
Function-Based Partial Discharge Localization in Power Transformers: A
Feasibility Study, IEEE Electrical Insulation Magazine, vol.18, no.5, pp. 22 -32,
Sep/Oct 2002.
[56]
[57]
[58]
[59]
[60]
[61]
[62]
[63]
[64]
Nordberg J., Nordholm S., Grbic N., Mohammed A., Claesson I., Performance
improvements for sector antennas using feature extraction and spatial interference
cancellation, IEEE Trans. Vehicular Technology, vol. 51, no. 6, pp. 1685 -1698,
Nov 2002.
[65]
[66]
[67]
[68]
Yen G.G., Lin K.-C., Wavelet packet feature extraction for vibration
monitoring, IEEE Trans. Industrial Electronics, vol. 47, no. 3, pp. 650-667, June
2000.
[69]
[70]
Yu Han and Y.H. Song, Using Improved Self-Organizing Map for Partial
Discharge Diagnosis of Large Turbogenerators, IEEE Trans. Energy Conversion,
vol. 18, no. 3, pp. 392-399, Sep. 2003.
[71]
Tao Hong and M.T.C. Fang, Detection and Classification of Partial Discharge
Using a Feature Decomposition-Based Modular Neural Network, IEEE Trans.
Instrumentation and Measurement, vol. 50, no. 5, pp. 1349-1354, Oct. 2001.
[72]
[73]
Howard Demuth, Mark Beale, Neural Network Toolbox for Use with MATLAB,
The MathWorks, Inc., 2001.
[74]
[75]
[76]
[77]
[78]
[79]
[80]
[81]
[82]
[83]
[84]
[85]
[86]
Morcos M.M., Ward S.A., Anis H., On the detection and control of metallic
particle contamination in compressed GIS equipment, Electrical Insulation and
Dielectric Phenomena Conference, vol. 2, pp. 476-480, 1998.
[87]
[88]
[89]
[90]
Sellars A.G., Farish O., Hampton B.F. and Pritchard L. S., Using the UHF
technique to investigate PD produced by defects in solid insulation, IEEE Trans.
On Dielectrics and Electrical Insulation, vol. 2, no. 3, pp. 448-459, June 1995.
[91]
Yonghong Cheng, Chengyan Ren and Xiaolin Chen, Study on the partial
discharge characteristics in different solid and gaseous dielectric by simulation,
[92]
Sellars A.G., MacGregor S.J. and Farish O., Calibrating the UHF technique of
partial discharge detection using a PD simulator, IEEE Trans. On Dielectrics
and Electrical Insulation, vol. 2, no. 1, pp. 46-53, Feb. 1995.
APPENDICES
APPENDIX A
APPENDIX B
APPENDIX C
Genetic Algorithm
APPENDIX D
APPENDIX E
APPENDIX F
APPENDIX A
UHF Measurement of Partial Discharge in GIS
Fig. A.1 Typical UHF signal corresponding to a single PD current pulse. (a) PD current
pulse; (b) UHF signal resulting from the PD current pulse shown in (a).
A.1
Equipment Specifications
Equipment             Parameter           Description
Test Chamber          Inner diameter      180 mm
                      Outer diameter      880 mm
                      Length              10.3 m
                      SF6 pressure        0.2 MPa
                      Spacer              cone type (x 2)
Sensor                Type
                      Frequency
                      Sensitivity         0.5 pC
                      Inner diameter      43.4 mm
                      Outer diameter      100 mm
                      Operating
                      Relative humidity   95% RH
Digital oscilloscope  Model               Tektronix TDS784D
                      Bandwidth           1 GHz
                      No. of channels
                      Sampling rate       1 channel: 4 GS/s
                                          2 channels: 2 GS/s
                                          3 or 4 channels: 1 GS/s
                      Maximum record      8M
Notebook PC           Model               Toshiba Tecra A2
                      CPU
                      Memory              256 MB
                      Hard disk           40 GB
                      Display
A.2
Based on the configuration of Fig. A.2, a conical UHF coupler is employed to detect
PD signals. The disk size of the conical coupler must be chosen according to the
frequency range of interest, since it determines the frequency characteristics of the
coupler. The modes of pulse propagation along a coaxial system are a combination of
the transverse electromagnetic (TEM) mode, the transverse electric (TE) modes and
the transverse magnetic (TM) modes. For the configuration of the GIS section under
test, PD pulses propagating in the TEM mode may peak at around 100 MHz or upwards,
while pulses in the TE or TM modes may peak in the range of 700-1100 MHz [88].
However, the mode of the propagating pulse depends on whether the PD source is
located on the bus conductor. Therefore, in order to cover the full frequency range
of the propagating modes, the coupler with a disk diameter of 43 mm is selected for
the measurement.
Fig. A.2 The layout of the test setup with a section of an 800 kV GIS
A.3
Experimental Set-up
UHF resonance signals used for the present study are measured from an 800 kV GIS
chamber that has a total length of 20 m [89]. The test chamber is formed by isolating
a 10.3 m section of the GIS using gas-tight conical epoxy barriers. It is filled with SF6
gas at 0.2 MPa for the entire test. Power frequency is 50 Hz.
To detect the UHF signals caused by PD, an internal coupler electrode type sensor is
incorporated into a hatch cover plate on the side of the test chamber. In addition to the
sensor, the measuring system consists of a 3-meter long coaxial cable and a high-speed
digital oscilloscope (TDS784D) enabling the system to acquire the high frequency
components of the UHF signal as shown in Fig. A.2. The characteristic impedance of
the sensor is 50 Ω, which is the same as the characteristic impedance of the cable and
the oscilloscope. The triggering voltage of the digital oscilloscope is set to a level well
above the background noise, enabling the capture of large UHF signals. The sampling
rate of the oscilloscope is fixed at 4 giga-samples per second when measuring and
recording the UHF signals.
To generate PD in SF6, artificial defects are made using an aluminium needle with a
length of 10 mm and a cross-sectional diameter of 0.2 mm. As illustrated in Fig. A.2,
the needle is placed on but not fixed to the enclosure to simulate the free particle. For
the other two defects, it is either attached to the busbar or spacer surface using the
minimum amount of cyanoacrylate adhesive, ensuring that the ends of the needle are
clean and in contact with the surfaces. The distance between the needle and the sensor
varies from 1 to 7.8 m to study the impact of signal attenuation.
The system is energized using a 2300 kV, 10 MVA single phase metal-clad
transformer. Test voltage varies in the range from 40 to 160 kV rms. The PD inception
voltages for the defects of free particle, particle on conductor and particle on the
surface of spacer are 73, 110 and 158 kV rms respectively.
As illustrated in Fig. A.1, UHF signals excited by a single PD current pulse are
measured for this study. The UHF signals usually last for several hundred nanoseconds.
Typical waveforms of measured signals (including corona) and their frequency content
obtained from Fast Fourier Transform (FFT) are shown in Figs. A.3 and A.4
respectively. In this study, data measured one meter away from the PD source, as
shown in Table A.2, are used for developing the denoising and source recognition
method. In addition, the robustness of the developed method is verified using data
measured at other PD-to-sensor distances, as shown in Table A.3.
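As a rough illustration of how the frequency content of a sampled UHF burst can be obtained, the sketch below evaluates the discrete Fourier transform of a synthetic damped sinusoid sampled at 4 GS/s. The waveform, its 750 MHz resonance, the 20 ns decay constant and the 256-sample record length are all hypothetical values chosen for the example, not measured data. Plain Python is used so that the sketch runs without third-party packages; an FFT routine such as `numpy.fft.rfft` would normally be preferred for long records.

```python
import cmath
import math


def magnitude_spectrum(samples, fs):
    """Return (frequencies in Hz, magnitudes) for the non-negative half of the DFT."""
    n = len(samples)
    freqs, mags = [], []
    for k in range(n // 2 + 1):
        # Direct DFT sum for bin k (O(n^2) overall, adequate for short records).
        acc = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        freqs.append(k * fs / n)
        mags.append(abs(acc))
    return freqs, mags


FS = 4e9         # sampling rate of the oscilloscope, 4 GS/s
N = 256          # hypothetical record length (64 ns)
F_RES = 750e6    # hypothetical resonance frequency of the synthetic burst
TAU = 20e-9      # hypothetical decay time constant

# Synthetic stand-in for a UHF burst: an exponentially damped sinusoid.
signal = [math.exp(-(i / FS) / TAU) * math.sin(2 * math.pi * F_RES * i / FS)
          for i in range(N)]

freqs, mags = magnitude_spectrum(signal, FS)
peak = freqs[max(range(len(mags)), key=mags.__getitem__)]
print(f"Spectral peak at {peak / 1e6:.1f} MHz, Nyquist limit {freqs[-1] / 1e9:.1f} GHz")
```

With a 4 GS/s sampling rate the spectrum extends to 2 GHz, which covers the 700-1100 MHz range quoted for the TE and TM modes.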
Fig. A.3 Typical waveform of measured signal (a) corona; (b) particle on the surface
of spacer; (c) particle on conductor; (d) free particle on enclosure.
Fig. A.4 Frequency content of measured signal (a) corona; (b) particle on the surface
of spacer; (c) particle on conductor; (d) free particle on enclosure.
Number of signals
Corona
14
30
Particle on conductor
20
16
Number of signals
2.5
30
4.3
30
30
7.8
30
2.5
20
4.3
7.8
20
Particle on conductor
Free particle on
enclosure
APPENDIX B
Discrete Wavelet Transform (DWT) and Wavelet Packet
Transform (WPT)
$$W_{j,k} = \sum_{x=1}^{N} f(x)\,\frac{1}{\sqrt{2^{j}}}\,\psi\!\left(\frac{x - k2^{j}}{2^{j}}\right) \qquad \text{(B.1)}$$
where $N$ is the length of the discrete signal $f(x)$. $j$ and $k$ represent the scaling
(decomposition level) and shifting (translation) constants respectively. $j$ runs from 1 to
$j_{\max}$, which is determined by the signal length $N$. The function
$\psi\!\left(\frac{x - k2^{j}}{2^{j}}\right)$ is the scaled and shifted version
(baby wavelet) of the original mother wavelet $\psi(x)$. The resultant wavelet coefficients
thus reflect the resemblance between the signal and the baby wavelet.
Fourier Transform. There are two characteristics required for any function to be
considered as a mother wavelet:
1. The function must have zero average;
2. The function must decay quickly at both ends.
There are actually a large number of functions with such features available. However,
the Mallat algorithm of DWT, which has been applied in this research, demands
additional requirements as discussed below.
In 1988, a new DWT algorithm, which provides fast wavelet decomposition and
reconstruction, was developed by Mallat [45]. Fig. B.1 illustrates this wavelet
decomposition algorithm. It is actually a classical scheme in the signal processing
community, known as a two-channel sub-band coder using conjugate quadrature
filters or quadrature mirror filters (QMF) [45]. It decomposes the original signal $f(x)$
into coefficients of low-frequency (approximation coefficients, cAi) and
high-frequency (detail coefficients, cDi) components.
According to the algorithm, there are two properties that allow the mother wavelet
$\psi(x)$ in equation B.1 to have this fast algorithm:
The scaling function $\phi(x)$ is used to generate a pair of high-pass and low-pass filters,
namely the g and h in Fig. B.1. Using these filters, DWT generates the cAi and cDi at
To reconstruct the original signal, the inverse discrete wavelet transform (IDWT) is
carried out in two steps that mirror the decomposition, namely upsampling and
filtering of the wavelet coefficients. Upsampling lengthens a signal component by
inserting zeros between samples. The upsampled coefficients are then passed through
the reconstruction filters to generate the reconstructed signal.
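The decomposition and reconstruction steps can be sketched for a single level as follows. This is a minimal illustration that assumes the Haar wavelet, whose two-tap filters keep the arithmetic easy to follow; the wavelet actually used in this research may differ, and the function names and test signal are hypothetical.

```python
import math

S = 1 / math.sqrt(2)
DEC_LO, DEC_HI = [S, S], [-S, S]   # Haar decomposition filters (low-pass h, high-pass g)
REC_LO, REC_HI = [S, S], [S, -S]   # matching Haar reconstruction filters


def convolve(x, f):
    """Full linear convolution of sequence x with filter f."""
    y = [0.0] * (len(x) + len(f) - 1)
    for n, xn in enumerate(x):
        for k, fk in enumerate(f):
            y[n + k] += xn * fk
    return y


def dwt_level(x):
    """One decomposition level: filter, then keep every second sample."""
    cA = convolve(x, DEC_LO)[1::2]   # approximation coefficients
    cD = convolve(x, DEC_HI)[1::2]   # detail coefficients
    return cA, cD


def upsample(c):
    """Lengthen a coefficient sequence by inserting a zero after each sample."""
    out = []
    for v in c:
        out.extend([v, 0.0])
    return out


def idwt_level(cA, cD):
    """One reconstruction level: upsample, filter, and add the two branches."""
    lo = convolve(upsample(cA), REC_LO)
    hi = convolve(upsample(cD), REC_HI)
    return [lo[i] + hi[i] for i in range(2 * len(cA))]


x = [4.0, 6.0, 10.0, 12.0, 8.0, 6.0, 5.0, 5.0]   # even-length test signal
cA, cD = dwt_level(x)
x_rec = idwt_level(cA, cD)
print(cA, cD, x_rec, sep="\n")
```

Each branch holds half as many samples as its input, and adding the two filtered, upsampled branches recovers the input signal to within rounding error.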
The wavelet coefficient cAi contains the lower half of the frequency content of the
decomposition filter input, and the corresponding cDi contains the upper half. In
addition, these coefficients are well localized in the time domain, so that both time
and frequency information of the original signal are kept. Furthermore, the
coefficients have greater resolution in time for high-frequency components and
greater resolution in frequency for low-frequency components of a signal. The highest
frequency content contained in the wavelet coefficients is up to $f_0/2$, where $f_0$ is the
sampling frequency of the original signal. This limitation is attributed to the Nyquist
sampling criterion. Fig. B.2 shows the coverage of the time-frequency plane for the
DWT coefficients.
Fig. B.2 The coverage of the time-frequency plane for DWT coefficients
DWT coefficients of a four-level decomposition are illustrated in Fig. B.2. As observed,
cD1 contains the frequency content from $f_0/2$ to $f_0/4$, while cD2 contains that from
$f_0/4$ to $f_0/8$ with a lower resolution in time (half that of cD1). In brief, as the
decomposition level increases, the time resolution decreases, while the frequency
resolution increases.
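The halving of the frequency band at each level can be tabulated with a few lines of code. The sketch below assumes ideal half-band filters (practical wavelet filters overlap somewhat at the band edges) and takes $f_0$ = 4 GHz from the 4 GS/s sampling rate quoted in Appendix A; the function name is hypothetical.

```python
def dwt_bands(f0, levels):
    """Nominal frequency band (low, high) in Hz of each DWT coefficient set."""
    bands = []
    for j in range(1, levels + 1):
        # Detail coefficients at level j cover f0 / 2^(j+1) up to f0 / 2^j.
        bands.append((f"cD{j}", f0 / 2 ** (j + 1), f0 / 2 ** j))
    # The final approximation covers everything below the last detail band.
    bands.append((f"cA{levels}", 0.0, f0 / 2 ** (levels + 1)))
    return bands


for name, low, high in dwt_bands(4e9, 4):
    print(f"{name}: {low / 1e6:.0f} to {high / 1e6:.0f} MHz")
```

For these values cD1 spans 1 to 2 GHz and cD2 spans 0.5 to 1 GHz, matching the $f_0/4$ to $f_0/2$ and $f_0/8$ to $f_0/4$ ranges described above.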
APPENDIX C
Genetic Algorithm
Genetic algorithms (GAs) were formally introduced in the United States in the 1970s
by John Holland at University of Michigan. They are search algorithms based on the
mechanics of natural selection and natural genetics. The fundamental principle is that
the fittest member of a population has the highest probability for survival. Generally,
GAs have the following components [49]:
In each candidate solution, the decision variables of the problem can be binary-coded
and concatenated as a string (chromosome). Strings are grouped into sets known as
populations. Successive populations are called generations. A GA first forms an initial
population randomly. Each string is then evaluated to find its fitness by substituting
it into the fitness function. Based on the merits of the different strings, a new set of
strings (population) is created using the GA operators, namely reproduction, crossover
and mutation. This process is iterated until a pre-specified stop criterion, such as the
maximum number of generations, has been reached. Details of the GA operators are
discussed in the following sections.
C.1
Reproduction
The idea behind the roulette wheel selection technique is that each individual is given a
chance to become a parent in proportion to its fitness. It is called roulette wheel
selection as the chances of selecting a parent can be seen as spinning a roulette wheel
with the size of the slot for each parent being proportional to its fitness. Obviously
those with the largest fitness (slot sizes) have more chance of being chosen. Thus, it is
possible for one member to dominate all the others and get selected a high proportion
of the time. Roulette wheel selection can be implemented as follows:
1. Sum the fitness of all the population members. Call this TF (total fitness).
2. Generate a random number n, between 0 and TF.
3. Return the first population member whose fitness added to the preceding
population members is greater than or equal to n.
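The three steps above can be sketched as follows. The function name, the example population and its fitness values are hypothetical, and the fitness values are assumed to be non-negative.

```python
import random


def roulette_wheel_select(population, fitness, rng=random):
    """Pick one member with probability proportional to its fitness."""
    total_fitness = sum(fitness)                 # step 1: TF
    n = rng.uniform(0, total_fitness)            # step 2: random number in [0, TF]
    running = 0.0
    for member, f in zip(population, fitness):   # step 3: first cumulative sum >= n
        running += f
        if running >= n:
            return member
    return population[-1]                        # guard against floating-point round-off


rng = random.Random(0)                           # fixed seed for a repeatable demonstration
population = ["a", "b", "c"]
fitness = [1.0, 3.0, 6.0]
counts = {member: 0 for member in population}
for _ in range(10000):
    counts[roulette_wheel_select(population, fitness, rng)] += 1
print(counts)
```

Over many draws the selection frequencies approach the fitness shares (here about 10%, 30% and 60%), which also shows how a single very fit member can dominate the parent pool.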
C.2
Crossover
Crossover is a process that randomly takes two reproduced strings (parents) and
exchanges portions of the strings to generate two new strings (offspring) with a
C.3
Mutation
Selection and crossover alone can generate a large number of different
strings. However, depending on the initial population chosen, there may not be enough
variety of strings to ensure that the GA sees the entire problem space. Alternatively,
the GA may converge on strings that are not close to the optimum it seeks because of
a poor initial population. These issues are addressed by introducing a mutation
operator into the GA. Mutation randomly alters each bit with a small probability,
typically less than 1%. This operator introduces innovation into the population and
helps prevent premature convergence on a local maximum.
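A bit-flip mutation operator of this kind can be sketched in a few lines. The chromosome is assumed to be a string of '0' and '1' characters, and the default rate of 0.5% is a hypothetical choice within the "less than 1%" guideline above.

```python
import random


def mutate(chromosome, rate=0.005, rng=random):
    """Flip each bit of a binary string independently with probability `rate`."""
    return "".join(
        ("1" if bit == "0" else "0") if rng.random() < rate else bit
        for bit in chromosome
    )


rng = random.Random(1)            # fixed seed for a repeatable demonstration
parent = "0" * 40
child = mutate(parent, rate=0.05, rng=rng)   # exaggerated rate so that flips are visible
print(parent)
print(child)
```

Because every bit is tested independently, most offspring pass through unchanged at realistic rates, and only an occasional bit is altered.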
APPENDIX D
Independent Component Analysis and FastICA Algorithm
In this research, it is reasonable to make such assumptions, as the factors that affect the
measured signals such as sensor response, propagation path and defects are
independent and usually nongaussian distributed.
In practice, there are several approaches to find the unknown independent components,
which use certain statistical properties of the components, such as nongaussianity,
temporal structure, cross-cumulants and nonstationarity [76]. In this research, the
(D.1)
where v is a Gaussian variable of zero mean and unit variance and G is any nonquadratic function.
scheme. Denote by g the derivative of the function G used in (D.1). Then the FastICA
algorithm is given as follows:
$$w_t \leftarrow w_t / \lVert w_t \rVert$$
$$w_t \leftarrow w_t - \sum_{j=1}^{t-1} \left(w_t^{T} w_j\right) w_j$$
In practice, the expectations in FastICA are replaced by their estimates, namely the
sample means.
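The normalization and deflation operations that appear in the algorithm can be sketched as below. This covers only those two vector operations, not the full FastICA iteration; the helper names and the example vectors are hypothetical, and applying deflation followed by renormalization is one common ordering that is assumed here.

```python
import math


def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def normalize(w):
    """Scale w to unit Euclidean norm."""
    norm = math.sqrt(dot(w, w))
    return [x / norm for x in w]


def deflate(w, previous):
    """Subtract from w its projections onto the previously estimated unit vectors."""
    out = list(w)
    for wj in previous:
        projection = dot(w, wj)
        out = [o - projection * x for o, x in zip(out, wj)]
    return out


previous = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]   # already estimated, orthonormal
w_new = normalize(deflate([0.5, 0.5, 2.0], previous))
print(w_new)
```

The deflation step keeps each new vector orthogonal to those already found, so that successive runs do not converge to the same independent component.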
APPENDIX E
General Introduction to Neural Networks
A neural network is an information processing paradigm that was inspired by the way
biological nervous systems, such as the brain, process information. The field goes by
many names, such as connectionism, parallel distributed processing, neuro-computing,
natural intelligent systems, machine learning algorithms, and artificial neural networks.
It is an attempt to simulate the multiple layers of simple processing elements called
neurons within specialized hardware or sophisticated software. Each neuron is linked
to its neighbors with varying coefficients of connectivity that represent the strengths of
these connections. Learning is accomplished by adjusting these strengths to cause the
overall network to output appropriate results.
The function of a neural network is largely dependent on the network structure, which
is determined by the way the neurons are connected. There are basically four types of
connections, as follows:
1. Feedforward connections:
In this network structure, data from neurons of a lower layer are propagated
forward to neurons of an upper layer via feedforward connections. Multilayer
perceptron is a typical feedforward neural network.
2. Feedback Connections:
Feedback networks bring data from neurons of an upper layer back to neurons
of a lower layer. This type of connection is usually employed in
neural-network-based controllers.
3. Lateral Connections:
Neurons of the same layer are interconnected. One typical example of a lateral
network is the self-organizing map.
4. Time-delayed Connections:
Delay elements may be incorporated into the connections to yield models with
temporal dynamics. They are more suitable for temporal pattern recognition.
One of the most interesting properties of a neural network is the ability to learn from
its environment in order to improve its performance over time. Generally, the learning
methods of neural networks can be classified into two categories:
1. Supervised learning:
In supervised learning, the desired output pattern corresponding to an input is
presented to the network during training in order to guide learning. The
network learns in the training phase by having its weights adjusted such that the
actual network output becomes more similar to the desired network output.
Thus, the desired output acts as an external teacher in this type of learning.
2. Unsupervised learning:
This type of learning uses no external teacher and is based upon only local
information. It is also referred to as self-organization, in the sense that it
self-organizes data presented to the network and discovers their emergent collective
properties.
APPENDIX F
Resilient Back-propagation Algorithm
The choice of the learning rate $\epsilon$ for the standard back-propagation algorithm in
equation E.1, which scales the derivative of the error function, has an important effect
on the time needed until convergence is reached.
$$\Delta w_{ij}(t) = -\epsilon \frac{\partial E}{\partial w_{ij}}(t) \qquad \text{(E.1)}$$
If $\epsilon$ is set too small, too many steps are needed to reach an acceptable solution. On the
contrary, a large learning rate may lead to oscillation, preventing the error from
falling below a certain value.
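The effect of the learning rate can be seen on a one-dimensional toy problem. The sketch below applies the fixed-step rule of equation E.1 to the hypothetical error function E(w) = w^2; the three learning-rate values and the name `epsilon` are chosen purely for illustration.

```python
def gradient_descent(epsilon, w0=1.0, steps=20):
    """Minimise E(w) = w**2 with a fixed learning rate; return the weight history."""
    w = w0
    history = [w]
    for _ in range(steps):
        grad = 2.0 * w             # dE/dw for E(w) = w**2
        w = w - epsilon * grad     # weight change = -epsilon * dE/dw
        history.append(w)
    return history


slow = gradient_descent(0.01)         # too small: still far from the minimum after 20 steps
good = gradient_descent(0.4)          # moderate: converges quickly
oscillating = gradient_descent(1.0)   # too large: jumps between +1 and -1 indefinitely

print(f"epsilon=0.01 -> w = {slow[-1]:.4f}")
print(f"epsilon=0.4  -> w = {good[-1]:.2e}")
print(f"epsilon=1.0  -> last two weights = {oscillating[-2]}, {oscillating[-1]}")
```

With the smallest rate the weight shrinks by only 2% per step, while with the largest rate each step overshoots the minimum by exactly the distance it started from, so the error never decreases.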
On the other hand, MLP networks typically use sigmoid transfer functions in the
hidden layers. These functions are characterized by the fact that their slope must
approach zero as the input gets large. This causes a problem when using steepest
descent to train an MLP network with sigmoid functions, since the gradient can have a
very small magnitude, leading to a small effective step size and therefore only small
changes in the weights and biases, even though the weights and biases are far from
their optimal values.
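a consequence, only the sign of the derivative is considered to indicate the direction of
the weight update. The size of the weight change is exclusively determined by an
update-value $\Delta_{ij}^{(t)}$:
$$\Delta w_{ij}^{(t)} = \begin{cases} -\Delta_{ij}^{(t)}, & \text{if } \dfrac{\partial E^{(t)}}{\partial w_{ij}} > 0 \\ +\Delta_{ij}^{(t)}, & \text{if } \dfrac{\partial E^{(t)}}{\partial w_{ij}} < 0 \\ 0, & \text{else} \end{cases} \qquad \text{(E.2)}$$
where $\dfrac{\partial E^{(t)}}{\partial w_{ij}}$ is the summed gradient information over all patterns of the pattern set.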
Each update-value evolves during the learning process according to its local sight of
the error function E. This is based on a sign-dependent adaptation process:
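$$\Delta_{ij}^{(t)} = \begin{cases} \eta^{+}\,\Delta_{ij}^{(t-1)}, & \text{if } \dfrac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \dfrac{\partial E^{(t)}}{\partial w_{ij}} > 0 \\ \eta^{-}\,\Delta_{ij}^{(t-1)}, & \text{if } \dfrac{\partial E^{(t-1)}}{\partial w_{ij}} \cdot \dfrac{\partial E^{(t)}}{\partial w_{ij}} < 0 \\ \Delta_{ij}^{(t-1)}, & \text{else} \end{cases} \qquad \text{(E.3)}$$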
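The two rules (E.2) and (E.3) can be combined into a short training loop. The sketch below is a simplified illustration rather than the implementation used in this research: it minimises a hypothetical two-variable quadratic error instead of a network error, and the factors 1.2 and 0.5, the initial update-value 0.1 and the bounds on the update-value are assumed values.

```python
def sign(x):
    """Return +1, -1 or 0 according to the sign of x."""
    return (x > 0) - (x < 0)


def rprop_minimise(grad_fn, w, steps=200, eta_plus=1.2, eta_minus=0.5,
                   delta0=0.1, delta_min=1e-6, delta_max=50.0):
    """Sign-based weight updates following (E.2) and (E.3)."""
    w = list(w)
    delta = [delta0] * len(w)       # one update-value per weight
    prev_grad = [0.0] * len(w)
    for _ in range(steps):
        grad = grad_fn(w)
        for i, g in enumerate(grad):
            product = prev_grad[i] * g
            if product > 0:         # same sign: enlarge the update-value (E.3)
                delta[i] = min(delta[i] * eta_plus, delta_max)
            elif product < 0:       # sign change: shrink the update-value (E.3)
                delta[i] = max(delta[i] * eta_minus, delta_min)
            w[i] -= sign(g) * delta[i]   # step against the gradient sign (E.2)
            prev_grad[i] = g
    return w


target = [3.0, -1.0]   # hypothetical minimum of the toy error function


def quadratic_grad(w):
    """Gradient of E(w) = sum((w_i - target_i)**2)."""
    return [2.0 * (wi - ti) for wi, ti in zip(w, target)]


w_opt = rprop_minimise(quadratic_grad, [0.0, 0.0])
print(w_opt)
```

Only the sign of each derivative enters the weight change, so the step size is unaffected by how small the gradient magnitude becomes.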
Note that the update-value is not influenced by the magnitude of the derivatives, but
only by the behaviour of the sign of two succeeding derivatives. Every time the partial
derivative of the corresponding weight changes its sign, which indicates that the last
update is too big and the algorithm has jumped over a local minimum, the update-value
$\Delta_{ij}^{(t)}$ is decreased by the factor $\eta^{-}$. If the derivative retains its sign, the update-