Thesis Title
Bachelor Thesis
(i) the thesis comprises only my original work toward the Bachelor Degree
(ii) due acknowledgement has been made in the text to all other material used
Student Name
30 July, 2020
Acknowledgments
Abstract
Contents
Acknowledgments
1 Introduction
1.1 Motivation
1.2 Aim
2 Background
2.1 Fields Related
2.2 Related Work
3 Methodology
3.1 Dataset
3.2 Autoencoders
3.3 Libraries used
3.4 CNN Architecture
3.5 Training Parameters
3.6 Loss Function
3.7 NAdam Optimizer
3.8 K-means clustering for anomaly detection
3.9 Metrics
3.10 Data Augmentation
Appendix
A Pre/Post Test
Level One
B Lists
List of Tables
List of Figures
References
Chapter 1
Introduction
1.1 Motivation
Many industries, including manufacturing, oil and gas, power generation, and water treatment, depend heavily on valve machines. This equipment frequently experiences wear, corrosion, and mechanical issues, which can lead to malfunctions, reduced productivity, and safety risks. To avoid failures, reduce downtime, and guarantee optimum operating efficiency, it is essential to identify irregularities in valve machines as early as possible.

Sound provides valuable insight into the health of valve machines. Unusual noises, such as squeaks, hisses, or vibrations, may be a sign of underlying problems or impending failures. Anomalous sound detection can therefore be a useful method for tracking the health and performance of valve machines by combining sound data, advanced signal processing, and machine learning approaches.
For valve machines, abnormal sound detection enables preventative maintenance techniques
like condition-based maintenance or predictive maintenance. Potential defects can be detected
early on by continuously monitoring the sound characteristics of valve machines, enabling
prompt maintenance actions. With a proactive approach, unplanned downtime is minimised,
maintenance plans are optimised, and total maintenance costs are decreased.
Valve machines are frequently used in safety-critical procedures such as fluid flow control, pressure management, and system shutdowns. Anomalies in valve machines can jeopardise operational safety, posing a risk to the environment, damaging equipment, or even endangering human lives. Potential safety issues can be reduced by creating precise and dependable anomalous sound detection techniques, assuring the secure and efficient operation of valve machines.
Compared to other forms of machinery, abnormal sound detection for valve machines
presents particular difficulties. Complex auditory behaviours are displayed by valve systems
due to a variety of valve actions, fluid dynamics, and mechanical interactions. These unique
issues can be addressed in a thesis on valve machine sound detection, for example, by creating
specialised feature extraction methods or domain-specific anomaly classification systems.
The use of anomalous sound detection techniques in valve machines meets the needs and
demands of the sector. Industries that rely on valve machines are continually looking for novel
ways to enhance equipment efficiency overall, reduce operational costs, and improve mainte-
nance procedures. A thesis in this field has the potential to immediately advance commercial
applications and spark interest among key industry players.
There are prospects for cooperation with industry partners, subject-matter specialists, and
research organisations in the area of anomalous sound detection for valve machines. Collabo-
ration on projects can lead to access to real-world datasets on valve machines, useful insights,
and operational knowledge. This partnership can enhance the research process, guarantee the
thesis work’s applicability, and promote knowledge transfer between academia and industry.
Significant academic and professional development opportunities are provided by pursuing
a thesis in anomalous sound detection for valve machines. It enables in-depth investigation
of the dynamics of valve machines, signal processing methods, machine learning algorithms,
and domain-specific information. Additionally, the thesis work may result in publications in
respected journals or conference proceedings, enhancing the researcher’s academic record and
prestige in the field.
1.2 Aim
The objective of this thesis is to create a system capable of identifying sound irregularities
in valve machines. The thesis intends to contribute to the field of industrial maintenance and
improve the overall operational performance of valve machines by utilising cutting-edge signal
processing and machine learning approaches.
The first objective is to learn more about the sound profiles produced by different types of valve machines, taking into account a variety of operating situations, valve movements, and mechanical interactions. This investigation relies on data gathered from actual valve machines, together with a thorough analysis of their acoustic characteristics.

The second objective is to investigate and compare machine learning techniques that are effective for finding anomalies in valve machine sound data. To identify the algorithm best suited to recognising sound anomalies precisely, this investigation considers supervised learning, unsupervised learning, and deep learning approaches.

The third objective is to create a comprehensive anomaly detection system that incorporates the chosen machine learning algorithms and optimised feature extraction techniques. The system should be able to monitor valve machine noise in real time and send prompt alerts or notifications when anomalies are found, allowing for preventative maintenance interventions.

The fourth objective is to conduct in-depth analyses and comparative trials to evaluate the functionality and efficiency of the created anomaly detection system, assessing its precision, robustness, and sensitivity to various abnormalities and operational circumstances, and comparing its performance to that of other methods on benchmark datasets.

A further objective is to work together with industry partners or gain access to valve equipment in actual industrial settings, and to validate the produced system on data from real valve machines in order to confirm its applicability, dependability, and efficacy in spotting sound anomalies in real-world circumstances.

Finally, the detected abnormalities and the sound patterns that accompany them are analysed to gain insight into their fundamental causes. The results of this analysis may be used to inform maintenance decision-making procedures, including root cause analysis, fault diagnosis, and maintenance scheduling, resulting in valve machine maintenance plans that are both effective and efficient.
The overall goal of this thesis is to advance the field of sound anomaly detection in valve
machines through the creation of a dependable and useful system that improves valve machine
maintenance procedures and operational performance in a variety of industrial applications.
Chapter 2
Background
These fields are related to one another and serve as a basis for the development and application of anomalous sound detection techniques for valve machines. By drawing on knowledge and methods from these connected fields, researchers and practitioners can advance the field of anomalous sound detection, improve maintenance procedures, and enhance the performance and dependability of valve machines across a variety of industrial sectors.
Figure 2.1: Spectrogram of a part of the original sound. The X-axis shows the time in seconds.
Based on deep learning and the Neyman-Pearson lemma, Y. Koizumi (2019)[3] devised an
unsupervised method for identifying abnormal noises in audio signals. Rare or odd sounds that
deviate from the typical sounds in the audio stream are referred to as abnormal sounds. The
suggested approach starts by teaching a deep autoencoder neural network a low-dimensional
representation of the audio signals using a set of typical audio signals. The key characteristics
of typical audio signals are captured by the learned representation, while irrelevant or strange
characteristics are ignored. The scientists computed a threshold for detecting anomalous sounds
in the low-dimensional representation of the audio signals after training the deep autoencoder
using the Neyman-Pearson lemma. By computing the audio signals’ reconstruction errors and
thresholding them above the calculated threshold, abnormal sounds were found. On a dataset
of urban sounds, the authors tested the suggested strategy, and they were able to achieve a high
detection rate with a low false alarm rate. The outcomes showed how successful the suggested method was at identifying aberrant sounds in audio streams without supervision.
The IDNN technique proposed by K. Suefusa (2020) [6] consists of two stages: a pre-training step and a fine-tuning stage. A deep autoencoder is trained on a sizable
dataset of sound signals during the pre-training phase in order to discover a low-dimensional
representation of the data. The IDNN, which consists of numerous shallow neural networks
connected by a linear interpolation function, is then trained using this representation. The
IDNN can distinguish between representations of normal and abnormal sounds in feature space
since it is built to interpolate between them. This study introduces a potential method for
detecting anomalous sounds based on an IDNN, which combines the strength of deep learning
with the capacity to interpolate across various data representations. The suggested method
has significant applicability in numerous industries where it is essential to detect unusual or
unexpected sounds in order to maintain safety and security.
Y. Kawaguchi (2021) [9] describes the DCASE 2021 Challenge Task 2, which was concerned with domain-shifted unsupervised anomaly detection for machine condition monitoring. The goal of the challenge was to assess how well state-of-the-art approaches perform when training and testing data come from different distributions, as can occur when trying to detect aberrant sounds in industrial machinery. The challenge received 20 submissions from research teams throughout the world, and the findings demonstrated that the proposed methods performed well in identifying aberrant noises in the machine recordings even when the domain of the recordings was shifted. Approaches based on deep neural networks, such as convolutional neural networks (CNNs) and autoencoders, exhibited the best performance and outperformed more conventional signal processing techniques.
N. Harada (2021) [10] introduces a brand-new dataset, ToyADMOS2, for the detection of
anomalous sound in domain-shifting circumstances. The dataset consists of normal and ab-
normal noises produced by small machines that are similar to the sounds made by industrial
machinery. By producing various flaws and abnormalities in the machines, such as bearing
flaws, misalignments, and impeller damage, the aberrant sounds in the dataset were produced.
Different operational settings and background noises, such as varying speeds, loads, and am-
bient noise levels, were used to record the normal sounds. The report also offers a baseline
performance assessment of a number of cutting-edge techniques on the ToyADMOS2 dataset.
The findings demonstrate that the most effective algorithms produce excellent detection perfor-
mance under domain shift conditions and are based on deep neural networks, such as autoen-
coders and convolutional neural networks.
In the context of domain generalisation challenges, K. Dohi (2022) [11] introduces a brand-
new sound dataset called MIMII DG that is intended for examining and analysing broken in-
dustrial machinery. The dataset includes noises captured from a variety of industrial equipment,
including pumps, fans, compressors, and air conditioners, and it includes both typical and un-
usual machine operation circumstances. By generating numerous flaws and anomalies in the
machines, such as bearing flaws, misalignments, and motor failures, the aberrant noises in the
dataset were produced. Different operational settings and background noises, such as varying
speeds, loads, and ambient noise levels, were used to record the normal sounds. The report
also offers a baseline performance assessment of a number of cutting-edge techniques on the
MIMII DG dataset. The findings demonstrate that the most effective algorithms produce ex-
cellent detection performance under domain generalisation conditions and are based on deep
neural networks, such as autoencoders and convolutional neural networks.
The effectiveness of the suggested strategy is further examined by the authors in relation to several variables, including the quantity of training samples and the type of classifier.
M. Sandler (2018) [15] presents MobileNetV2, a mobile-friendly neural network architecture made for efficient and precise image classification on portable devices. The authors introduce the “inverted residual” and “linear bottleneck” building blocks, which are meant to increase the network's efficiency while preserving accuracy. The linear bottleneck block is used to lower the computational cost of the network, and the inverted residual block is used to increase the nonlinearity of the network while minimising the number of parameters. The study demonstrates that MobileNetV2 beats MobileNetV1 on a number of benchmarks while retaining a similar model size and computational expense. Since then, many computer vision tasks, including object identification, segmentation, and video classification on mobile devices, have made extensive use of the proposed architecture.
Figure 2.6: Comparison of convolutional blocks for different architectures. ShuffleNet uses Group Convolutions and shuffling; it also uses the conventional residual approach, where inner blocks are narrower than the output.
even under hypothetical test settings, is the initial stage. A domain specialisation model, used
in the second stage, can categorise strange noises produced by a particular machine. The sys-
tem extracts pertinent data from the audio signals using a variety of feature extraction tech-
niques, such as spectrograms, sub-band spectral features, and mel-frequency cepstral coef-
ficients (MFCCs). The outcomes demonstrate that the suggested system outperformed vari-
ous baseline models in the DCASE2022 Challenge in terms of size and computing cost while
achieving excellent accuracy in both stages.
F. Xiao (2022) [18] present their system developed for Task 2 of the DCASE2022 Challenge, which focuses on anomalous sound detection. The proposed system incorporates self-supervised attribute classification and Gaussian Mixture Model (GMM)-based clustering to improve the performance of sound detection. The authors leverage self-supervised learning techniques to learn meaningful representations of sound attributes without relying on labeled data. This allows the system to effectively capture the characteristics of normal and anomalous sounds, even when limited labeled data is available. The self-supervised attribute classification helps in identifying relevant sound attributes that contribute to the detection of anomalies. Additionally, the system groups together comparable sound examples using GMM-based clustering. By identifying the underlying patterns and structures in the data, this clustering method aids in the differentiation between normal and abnormal sounds. The system can accurately recognise and categorise aberrant noises by associating each sound instance with a particular cluster. The DCASE2022 Challenge results for the suggested system show that it performs well in terms of identifying unusual sounds. The performance of the sound detection problem is improved by combining self-supervised attribute classification and GMM-based clustering. The results of this study offer important new perspectives on the creation of sophisticated techniques for unsupervised anomalous sound detection.
Y. Deng (2022) [19] present the AITHU system that was created for the DCASE2022 Challenge. Their technique focuses on sound-based unsupervised detection of abnormal machine operational status. Without labelled training data, the AITHU system seeks to solve the problem of identifying anomalies in machine working state. It makes use of acoustic signal analysis to spot irregularities in machine behaviour: the system can distinguish between typical and abnormal behaviour by examining the acoustic patterns and traits of the machine sounds. The authors suggest a new methodology that blends cutting-edge machine learning methods with the evaluation of reliable data. The AITHU system makes use of several feature extraction and signal processing techniques to extract the pertinent information from the sound sources. Then, it uses unsupervised learning techniques to find odd patterns and categorise them appropriately. In the DCASE2022 Challenge, the AITHU system proved to be useful by achieving noteworthy results in unsupervised anomaly detection of machine working state. The system's capacity to detect anomalies without the requirement for labelled data highlights its potential for real-world applications in industrial settings, where monitoring and identifying machine failures are vital.
S. Venkatesh (2022) [20] presented a method called Disentangled Surrogate Task Learning (DSTL) to improve the domain generalisation abilities of unsupervised anomalous sound detection systems. The DSTL approach makes use of the idea of surrogate tasks, which are side tasks that are connected to the primary detection job but simpler to learn. The authors aim to increase the system's ability to generalise to unknown domains by training the model to perform well on these surrogate tasks. The DSTL technique explicitly distinguishes between domain-specific and domain-invariant information in the sound data to disentangle the representation learning process. This separation enables the model to concentrate on discovering the common underlying structure across several domains, improving generalisation. To assess the effectiveness of their DSTL technique, the authors ran tests using the DCASE2022 Challenge dataset. The outcomes show how well their approach works in terms of enhancing domain generalisation for unsupervised anomalous sound detection. The approach's considerable advancements in the detection of abnormalities from unknown domains demonstrate its potential for real-world applications where labelled data from all domains is scarce.
Figure 2.7: Block diagram of disentangled anomaly detector. In the figure, NN stands for
Nearest Neighbor. In the training phase, exclusive latent spaces were assigned to sections and
attributes.
Y. Wei (2022) [21] presented their approach for the DCASE2022 Challenge Task 2 on anomalous sound detection. The task focuses on developing systems that can effectively detect anomalous sounds for machine condition monitoring. Self-challenge and metric evaluation are two essential parts of the authors' approach to anomalous sound identification. The model is trained in a self-supervised way as part of the self-challenge component; through the use of a binary classification problem, the model learns to distinguish between typical and abnormal noises. This self-supervised training lets the model utilise a significant quantity of unlabeled data, which increases its adaptability to various machine circumstances and sound environments. The authors also present a metric evaluation process that aims to improve system performance. In order to acquire a thorough assessment of the system's detection capabilities, they suggest a metric fusion technique that incorporates various metrics, such as frame-level F1-score and segment-level F1-score. The system can better handle varied lengths and durations of anomalous sound events by taking into account both frame-level and segment-level performance. The DCASE2022 Challenge dataset is used to test the proposed system, and the results show how well it can identify unusual sounds for machine condition monitoring. The system performs admirably in terms of F1-score and AUC metrics, demonstrating its potential for use in practical situations. By offering a self-challenge approach and a thorough metric evaluation system, this work contributes to the field of anomalous sound identification. Without relying on labelled data, self-supervised learning enables the system to adapt to various machine circumstances, while the metric fusion technique offers a more thorough and precise assessment of the system's performance.
K. Morita (2022) [22] investigate how well various spectrogram representations can identify unusual sounds. Spectrograms are frequently employed in audio signal processing to view the frequency content of sound waves over time, and they offer useful information for different audio analysis tasks, such as sound classification and anomaly detection. The Mel-spectrogram, Gammatonegram, and CQT-spectrogram are the three distinct spectrogram forms that the authors look into. Each representation has unique characteristics in capturing various components of the sound signals, such as frequency resolution and time-frequency localisation. The authors hope to find the best spectrogram representation for anomalous sound detection by evaluating the performance of these representations. They use the DCASE2022 Challenge dataset and a machine learning-based method for anomalous sound identification to assess performance. Each spectrogram representation is used to train and test their models, and the detection performance is evaluated using metrics such as Area Under Curve (AUC) and Partial AUC (pAUC). The experimental findings shed light on the efficacy of various spectrogram representations for the detection of aberrant sounds. Based on their research, the authors examine the benefits and drawbacks of each representation and offer recommendations. The study advances knowledge of how spectrogram representations affect the effectiveness of systems for detecting abnormal sounds.
J. Bai (2022) [23] propose a method for anomalous sound detection that combines a batch mixing technique with an anomaly detector. Using the batch mixing technique, synthetic training samples are created by randomly fusing several regular sound clips to represent anomalous sound events. The goal of this strategy is to improve the model's capacity for generalisation and precise anomalous sound detection. The authors use an anomaly detector, which is trained using a combination of normal and anomalous sound recordings, to carry out the abnormal sound detection. The detector gains the ability to discern between regular and abnormal sound patterns, and it scores test samples for anomaly based on how far they depart from the norm. The Jless method's performance is evaluated using the DCASE2022 Challenge dataset and conventional assessment criteria like Area Under Curve (AUC) and partial AUC (pAUC). The evaluation's findings show that the suggested approach is successful in identifying unusual sounds and achieves competitive performance in the challenge. The authors offer insights into the effectiveness of the Jless technique and discuss the advantages and disadvantages of their strategy. They emphasise the advantages of the batch mixing technique and the significance of the anomaly detector in achieving precise and reliable anomalous sound detection.
S. Verbitskiy (2022) [24] take on the problem of detecting anomalous noises, where the objective is to identify unknown aberrant sounds based only on regular sound data. To enhance detection performance, their method makes use of several time-frequency representations as input features. The authors train their anomalous sound detection (ASD) systems for each machine type using normal sound recordings and their section indices. To capture a variety of properties of the sound data, they use ensembles of 2D CNN-based systems, each employing a different time-frequency representation. The authors extract embedding vectors from their CNNs and score anomalies using cosine similarity and the k-nearest neighbours (k-NN) technique: anomaly scores are established by contrasting the embedding vectors of test clips with those of typical sound clips. The performance of the suggested strategy is examined using evaluation measures like Area Under Curve (AUC) on the DCASE2022 Challenge dataset. The results show that their method outperforms benchmark systems and achieves competitive, high detection performance in the challenge. The authors highlight the success of using several time-frequency representations for unsupervised anomalous sound identification as they examine the advantages and disadvantages of their approach, and they offer analysis of the effectiveness of their strategy and recommendations for further enhancements.
K. Wilkinghoff (2022) [25] provides a domain-generalization-focused anomalous sound detection system for machine condition monitoring. The system's objective is to identify unusual sounds in a wide range of contexts, including those unrelated to the training set. To improve the system's capacity to identify anomalies across a variety of domains, the author's approach integrates outlier exposure techniques. The system gets more reliable and environment-adaptive by subjecting the model to a variety of difficult samples during training. The author combines deep learning models, feature engineering, and outlier exposure tactics to create the anomalous sound detection system. The models use different audio representations to capture pertinent information and are trained on common sound recordings. The system then employs strategies for outlier exposure to boost generalisation and the detection of anomalies. The DCASE2022 Challenge dataset is used to gauge the system's performance, and the outcomes are examined using accepted assessment measures. The report highlights the benefits of the outlier-exposed technique for domain generalisation while discussing the performance attained and comparing it to baseline systems. The author also examines potential areas for future enhancements and offers insights into the system's advantages and disadvantages. By addressing the issue of domain generalisation and demonstrating the potency of outlier exposure strategies, the work makes a contribution to the field of unsupervised anomalous sound detection.
H. Zhang (2018) [26] authored the paper titled “mixup: Beyond Empirical Risk Minimization,” presented at the International Conference on Learning Representations in 2018. The paper introduces a data augmentation technique called “mixup” that goes beyond conventional empirical risk minimization approaches. The mixup technique generates augmented training samples by linearly interpolating pairs of input examples and their accompanying labels, with the goal of enhancing the generalisation performance of deep learning models. Mixup encourages the model to learn from a more varied and even distribution of training samples by combining the inputs and labels, effectively regularising the model and minimising overfitting. The theoretical study of mixup presented in the paper shows that it encourages the model to behave linearly between training instances and their labels. This linearity property allows mixup to successfully combine the information from several samples, which enhances robustness and generalisation. To assess the performance of mixup in comparison to other data augmentation techniques, the authors also conduct comprehensive trials on a variety of image and text classification tasks. The findings show that mixup consistently improves generalisation performance, lowering the likelihood of overfitting and enhancing the model's capacity to handle out-of-distribution data.
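To make the interpolation concrete, the following minimal NumPy sketch (not code from the cited paper) draws a mixing coefficient from a Beta distribution and blends one pair of examples and their one-hot labels; the alpha value and the array shapes are illustrative assumptions.

import numpy as np

def mixup_pair(x1, y1, x2, y2, alpha=0.2):
    # Draw the mixing coefficient lambda from Beta(alpha, alpha);
    # alpha = 0.2 is an illustrative choice, not a value from the paper.
    lam = np.random.beta(alpha, alpha)
    x_mix = lam * x1 + (1.0 - lam) * x2
    y_mix = lam * y1 + (1.0 - lam) * y2
    return x_mix, y_mix

# Example: mix two dummy feature vectors with one-hot labels.
x_a, x_b = np.random.rand(128), np.random.rand(128)
y_a, y_b = np.array([1.0, 0.0]), np.array([0.0, 1.0])
x_new, y_new = mixup_pair(x_a, y_a, x_b, y_b)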
I. Nejjar (2022) [27] suggest an innovative method for learning meaningful representations from unlabeled audio data that makes use of self-supervised learning. By pre-training a deep neural network on a pretext task using a large amount of unlabeled audio, the network can learn to capture high-level acoustic properties that can be used for subsequent unsupervised anomalous sound detection. The publication describes the architecture and training methodology used in their approach. A smaller labelled dataset of typical and unusual sound samples is used to fine-tune the pre-trained model. The authors test their methodology on the DCASE2022 Challenge dataset, and the results are presented in terms of performance metrics including accuracy, precision, recall, and F1 score. The application of self-supervised learning approaches for unsupervised anomalous sound detection is the paper's main contribution. The suggested method can successfully identify anomalous sounds without the use of manual annotation or explicit labelling by utilising unlabeled data and learning meaningful representations. The outcomes indicate how the self-supervised learning strategy works and highlight its potential for unsupervised anomaly detection. The authors also discuss the method's shortcomings and potential directions for future work.
Chapter 3
Methodology
3.1 Dataset
The dataset was created using ToyADMOS2 [10] and MIMII DG [11]. It includes operational sounds from seven toy and real machine types: ToyCar, ToyTrain, fan, gearbox, bearing, slide rail, and valve.

Each recording is a 10-second, single-channel audio file sampled at 16 kHz. To construct the training/test data, machine sounds collected in laboratories were combined with ambient noise recorded in real factories. More details about the recording process can be found in [10] and [11]. I would like to make clear that all audio samples in the dataset used for training and evaluation are sampled at a rate of 16 kHz. The sampling rate, which is the number of audio samples taken each second, determines the frequency range that can be accurately captured within the audio signal. By standardising the dataset to a 16 kHz sampling rate, I ensured that all of the audio samples were consistent, which made it possible to apply a uniform set of signal processing and analysis procedures throughout the investigation. Additionally, a sampling rate of 16 kHz is sufficient for recording the important details in valve machine sounds and is frequently employed in a variety of audio-related applications. This rate is particularly appropriate for valve machine monitoring because it strikes a balance between capturing high-frequency components that may indicate anomalies in the machine's operation and representing the audio data without requiring large files or heavy computational demands.
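For illustration, loading one such recording could look like the sketch below; it assumes the librosa library and a hypothetical file name valve_example.wav, neither of which is prescribed by the dataset description, and simply enforces the 16 kHz, single-channel format described above.

import librosa

# Load a clip and resample/downmix to the 16 kHz, mono format used in
# ToyADMOS2 and MIMII DG; the file name is a placeholder.
waveform, sample_rate = librosa.load("valve_example.wav", sr=16000, mono=True)

print(waveform.shape)   # about 160000 samples for a 10-second clip
print(sample_rate)      # 16000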
3.2 Autoencoders
Autoencoders are a powerful class of artificial neural networks that have demonstrated considerable promise in a number of fields, including valve machine monitoring. Because they can efficiently capture the underlying patterns and irregularities present in the sensor data produced by valve machines, autoencoders are particularly helpful in this setting. By learning a compressed representation of the input data, autoencoders enable effective feature extraction and anomaly detection and thereby improve the overall monitoring and maintenance of valve machines.
The capacity of autoencoders to recognise intricate and nonlinear correlations in sensor data
is one of their primary advantages in valve machine monitoring. Numerous sensors, including
pressure sensors, temperature sensors, and vibration sensors, are used in valve machines to
produce enormous volumes of data. Even though this data offers useful information about the
machine’s operational state, manually identifying significant features can be difficult. With-
out the requirement for explicit feature engineering, autoencoders may automatically learn a
condensed representation of the sensor data, capturing the most important patterns and corre-
lations.
Autoencoders are often trained on a large dataset of typical operating situations in valve
machine monitoring. The autoencoder learns to accurately recreate the input data during the
training phase while attempting to reduce the reconstruction error. The autoencoder can record
the typical patterns and fluctuations in the sensor data thanks to this technique. The autoencoder
can be used, once trained, to reconstruct fresh data samples and compare the reconstruction error to a predetermined threshold. Unusual or abnormal patterns that depart considerably from the learned normal patterns will produce higher reconstruction errors, suggesting potential flaws or anomalies in the valve machine.
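A minimal sketch of this reconstruction-error idea is given below using TensorFlow Keras. It uses small fully connected layers on a generic feature vector rather than the CNN-based architecture of this thesis, and the input dimension, layer sizes, amount of dummy data, and threshold rule are all illustrative assumptions.

import numpy as np
import tensorflow as tf

input_dim = 640  # illustrative length of one flattened feature vector

# Encoder-decoder with a narrow bottleneck; layer sizes are placeholders.
autoencoder = tf.keras.Sequential([
    tf.keras.Input(shape=(input_dim,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(8, activation="relu"),       # bottleneck
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dense(input_dim),
])
autoencoder.compile(optimizer="adam", loss="mse")

# x_normal stands in for feature vectors extracted from normal recordings only.
x_normal = np.random.rand(1000, input_dim).astype("float32")
autoencoder.fit(x_normal, x_normal, epochs=10, batch_size=64, verbose=0)

# Anomaly score = mean squared reconstruction error per sample; samples whose
# error exceeds a threshold derived from normal data are flagged as anomalous.
x_test = np.random.rand(10, input_dim).astype("float32")
errors = np.mean((autoencoder.predict(x_test) - x_test) ** 2, axis=1)
threshold = np.percentile(errors, 90)   # illustrative threshold choice
is_anomalous = errors > threshold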
Another benefit of using autoencoders for valve machine monitoring is their unsupervised
nature. Autoencoders may learn directly from unlabeled sensor data, in contrast to supervised
learning techniques that demand labelled data for training. This is especially useful in situations
where labelled anomalous data is hard to come by or rare. Autoencoders can find and record
tiny irregularities in the sensor data that may be difficult to find using conventional approaches
by utilising unsupervised learning.
Autoencoders are also capable of real-time learning and adaptation, which qualifies them
for online valve machine monitoring. The autoencoder may update its learnt representation
to take into account changes in the machine’s behaviour and detect anomalies in real-time as
sensor data streams continually. This functionality enables proactive maintenance and reduces
downtime by enabling early defect detection or abnormal operating circumstances.
TensorFlow is an appropriate candidate for processing the sensor data produced by valve ma-
chines because of its performance, which is vital when working with huge datasets and intricate
autoencoder structures.
Building neural network models with TensorFlow, including autoencoders, is simple and flexible. It offers TensorFlow Keras, a high-level API that makes it easier to define and
train deep learning models. With TensorFlow, you can quickly build numerous autoencoder
types, experiment with different architectures, and modify the model to meet the unique needs
of valve machine monitoring.
As a strong and widely used deep learning framework, TensorFlow offers effective comput-
ing, flexibility in model creation, GPU acceleration, a rich ecosystem, deployment flexibility,
and substantial community support. TensorFlow is a great option for using autoencoders in
valve machine monitoring because of these features, which allow for effective processing of
sensor data, model customisation, and smooth integration into monitoring systems. You can read more about TensorFlow in [28].
I decided to use TensorFlow as the main framework for building the autoencoder-based
anomaly detection system for my thesis on valve machine monitoring. TensorFlow provides a
variety of strong arguments that support the needs and goals of my research.
TensorFlow is the best option for handling the massive datasets produced by valve machines
because of its fast calculation capabilities. I can effectively handle and analyse the sensor data
using TensorFlow’s optimised computational libraries, which enables the creation of precise
and reliable autoencoder models.
Another key benefit of TensorFlow is its model construction flexibility. I can quickly design
and train several kinds of autoencoders, experiment with various architectures, and fine-tune
the models in accordance with the particular requirements of valve machine monitoring thanks
to TensorFlow’s high-level API, TensorFlow Keras. This versatility enables me to alter the
autoencoder architecture and enhance its functionality for applications involving anomaly de-
tection.
The training and inference phases of the autoencoder models are significantly accelerated
by TensorFlow’s easy connection with GPUs. TensorFlow maximises computational efficiency
and cuts down on model training time by taking advantage of GPU capability. When working
with large-scale valve machine datasets, this functionality is crucial because it speeds up testing
and iteration.
The robust ecosystem of TensorFlow is essential to the success of my thesis work. I have
access to a multitude of resources inside the TensorFlow ecosystem, including pre-trained mod-
els, libraries, and tools, which I can use to improve the efficiency and performance of my
autoencoder-based anomaly detection system. TensorFlow may also be seamlessly integrated
with a wide range of data processing, visualisation, and evaluation tools, enabling a thorough
analysis of all the machine data. Additionally, TensorFlow is compatible with other well-known Python libraries, which are described below.
I used the Audiomentations library to improve the performance of my anomaly detection system for valve machine monitoring. The library provides an extensive collection of audio data augmentation transforms.

The batch size of 64 was chosen to respect the system's memory constraints and to avoid extremely high batch sizes that could cause memory overflow or slower convergence.
Training for 100 epochs gave the models enough opportunities to learn from the dataset. The model can access the full dataset during each epoch, allowing it to gradually improve its performance. The choice of 100 epochs reflects a balance between training time and obtaining acceptable convergence and generalisation performance. It is important to keep in mind that the ideal number of epochs may change based on the problem's complexity, the size of the dataset, and the model's design.

By employing a batch size of 64 and training the models for 100 epochs, I aimed to ensure successful model training and convergence while managing computational resources effectively. These decisions were made after careful analysis of the dataset characteristics, model complexity, and available computational infrastructure, with the end goal of obtaining accurate and dependable findings in the context of valve machine monitoring.
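Continuing the Keras sketch from Section 3.2 (reusing its autoencoder model and x_normal array), the snippet below shows how the two stated settings enter the training call; the validation split is an illustrative addition for monitoring convergence, not a value taken from this thesis.

# Train with batches of 64 samples and 100 passes over the normal-only data.
history = autoencoder.fit(
    x_normal, x_normal,
    batch_size=64,
    epochs=100,
    validation_split=0.1,   # illustrative hold-out for monitoring convergence
    shuffle=True,
)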
The Nesterov accelerated gradient (NAG) method is incorporated into the NAdam opti-
mizer, which is an extension of the Adam optimizer. In order to achieve faster and more stable
convergence, it blends adaptive learning rate adjustment with the advantages of NAG.
It has been demonstrated that NAdam enhances generalisation performance, improving
model performance on untested data. By adding NAG, it successfully lowers parameter update
overshooting, enhancing the model’s ability to generalise to new cases and handle anomalies.
Increased resistance to noise and outliers in the training data is shown by the NAdam op-
timizer. This can be especially useful for monitoring valve machines because the recorded
sound data may contain abnormalities or unexpected fluctuations. By reducing the effect of
noisy samples during training, the NAdam optimizer aids in the model’s learning of more reli-
able representations.
Like Adam, NAdam dynamically modifies each parameter’s learning rate based on its prior
gradients. Due to this adaptability, training becomes more consistent and effective by ensuring
that the learning rate is optimal for various factors.
The NAdam optimizer deals with some of the optimisation problems that Adam has, namely
the sensitivity to learning rate selection. By minimising the oscillations and instabilities that
can happen during training, it demonstrates enhanced optimisation dynamics. In the context
of valve machine monitoring, where precise and trustworthy anomaly detection is required for
ensuring the machines’ normal operation, this stability is crucial.
In some cases, including when working with sparse gradients, the NAdam optimizer per-
forms better than Adam. Deep learning models frequently use sparse gradients, particularly
when working with high-dimensional data like audio signals. The model can learn more quickly
and accurately capture the underlying patterns and anomalies contained in the valve machine
sound data thanks to the NAdam optimizer’s adept handling of sparse gradients.
The NAdam optimizer consistently outperforms competing algorithms on a variety of tasks
and datasets. In the context of monitoring valve machines, where the anomaly detection sys-
tem must be strong and dependable across various operating circumstances, environments, and
anomaly kinds, this consistency is beneficial. The system’s overall performance and stability
are influenced by the NAdam optimizer’s ability to consistently optimise the model parameters.
The NAdam optimizer is becoming increasingly popular and is widely used in a variety of fields of study. It has demonstrated encouraging results in enhancing the performance of deep learning models in anomaly detection applications. Its widespread adoption and study have resulted in a wealth of information, resources, and community support, making it a good choice for applications involving valve machine monitoring.

In choosing the NAdam optimizer for my thesis, I hoped to take advantage of its benefits and tackle the problems related to anomaly identification in valve machines. The goals of precisely and effectively identifying anomalies in valve machine sound data are well aligned with the NAdam optimizer's improved optimisation dynamics, handling of sparse gradients, consistency in performance, and wide adoption.
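Selecting this optimizer in TensorFlow Keras is sketched below; the hyperparameters shown are the Keras defaults and are included only for illustration, not as the exact values used in this thesis.

import tensorflow as tf

# Nadam combines Adam's adaptive learning rates with Nesterov momentum.
nadam = tf.keras.optimizers.Nadam(
    learning_rate=0.001,   # Keras default values, shown explicitly
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
)

# Compile a reconstruction model with it (e.g. the autoencoder from Section 3.2).
autoencoder.compile(optimizer=nadam, loss="mse")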
3.9 Metrics
Area Under the Curve (AUC) and Partial AUC (pAUC) metrics were used to assess the ef-
fectiveness of the proposed valve machine monitoring system. The capacity of the model to
distinguish between typical and anomalous sound samples is measured in detail by the AUC
metric, which is frequently used for binary classification tasks. It is appropriate for assessing
the efficiency of the anomaly detection system since it takes into account the overall perfor-
mance across various threshold values.
The ability of the model to correctly categorise sound samples as normal or anomalous
may be evaluated quantitatively using AUC as a statistic. AUC values closer to 1 imply greater
performance, with higher scores indicating a larger capacity for discrimination.
In this thesis, Partial AUC (pAUC) was also used as a supplemental statistic. pAUC concen-
trates on a particular range of false positive rates that are pertinent to the particular application,
as opposed to AUC, which takes into account the entire range of false positive rates. pAUC
can offer a more focused assessment of the system’s performance under particular operating
conditions or false positive rates of interest in the context of valve machine monitoring.
This thesis seeks to offer a thorough evaluation of the suggested valve machine monitoring
system’s performance using AUC and pAUC as evaluation measures. These measures allow
for a quantitative evaluation of current techniques and make it easier to pinpoint the best valve
machine anomaly detection strategies.
Overall, the use of AUC and pAUC as evaluation measures provides a thorough and unbi-
ased assessment of the effectiveness of the suggested approach in spotting anomalies in valve
machine sound data. The use of these measures enables a thorough evaluation of the system’s
performance and makes it easier to pinpoint areas where valve machine monitoring methods
need to be improved.
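As an illustration of how these two metrics can be computed from per-clip anomaly scores, the sketch below uses scikit-learn's roc_auc_score; restricting the false positive rate through its max_fpr argument yields a standardised partial AUC. The max_fpr value of 0.1 and the toy labels and scores are illustrative assumptions, not results from this thesis.

import numpy as np
from sklearn.metrics import roc_auc_score

# y_true: 1 = anomalous clip, 0 = normal clip; higher score = more anomalous.
y_true = np.array([0, 0, 0, 0, 1, 1, 1])
scores = np.array([0.10, 0.20, 0.15, 0.30, 0.80, 0.40, 0.90])

auc = roc_auc_score(y_true, scores)
# Partial AUC restricted to the low-false-positive-rate region.
pauc = roc_auc_score(y_true, scores, max_fpr=0.1)

print(f"AUC = {auc:.3f}, pAUC (max_fpr = 0.1) = {pauc:.3f}")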
This thesis intends to improve the resilience and generalisation ability of the valve machine monitoring system by utilising four data augmentation techniques: shift, pitch shift, time stretch, and Gaussian noise. Combining these augmentation techniques enables the model
to extract more representative and discriminative characteristics from the enhanced data, im-
proving the accuracy and reliability of its anomaly detection.
In conclusion, data augmentation, which uses methods like Gaussian noise, timestretch,
pitchshift, and shift, is crucial for enhancing the effectiveness of the valve machine monitoring
system. The model’s capacity to record variations in sound patterns is improved by these
augmentation techniques, which ultimately aid in the more accurate detection of anomalies in
valve machine sound data.
Figure 3.1: Above is shown a snippet of the code showing the parameters given to each aug-
menting function.
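A pipeline covering the same four transforms could be assembled with the Audiomentations library roughly as sketched below; every parameter range and probability here is an illustrative placeholder and does not reproduce the values shown in Figure 3.1.

import numpy as np
from audiomentations import AddGaussianNoise, Compose, PitchShift, Shift, TimeStretch

# Placeholder 10-second, 16 kHz waveform; in practice this is a loaded recording.
waveform = np.random.uniform(-0.5, 0.5, 16000 * 10).astype(np.float32)

# The four transforms discussed above; all ranges and probabilities are placeholders.
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.9, max_rate=1.1, p=0.5),
    PitchShift(min_semitones=-2, max_semitones=2, p=0.5),
    Shift(p=0.5),
])

augmented = augment(samples=waveform, sample_rate=16000)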
Chapter 4
          Source              Target
          AUC      pAUC       AUC      pAUC
Valve00   99.76    98.73      70.36    52.0
Valve01   92.88    83.78      68.72    54.1
Valve02   100.0    100.0      92.72    81.0
Chapter 5
Conclusion and Future Work
5.1 Conclusion
In conclusion, the NAdam optimizer and data augmentation techniques have proven useful for monitoring valve machines. In comparison to conventional optimizers like Adam, the NAdam optimizer has better convergence and generalisation capabilities thanks to its adaptive learning rate and Nesterov momentum. This enhances the model's performance by enabling it to successfully navigate challenging optimisation landscapes.
The model's capacity to generalise to new data and identify anomalies in valve machine sounds has also been successfully improved by the use of data augmentation techniques such as Gaussian noise, time stretching, pitch shift, and shift. By introducing controlled variations in the training data, data augmentation exposes the model to a larger range of realistic scenarios, reducing overfitting and increasing its robustness.
A more trustworthy and precise valve machine monitoring system has been created using
the NAdam optimizer and data augmentation approaches. The model shows improved detection
abilities, gaining greater accuracy and better performance in spotting anomalies and odd sound
patterns.
Overall, the use of the NAdam optimizer and data augmentation approaches provides a strong foundation for efficient and trustworthy valve machine monitoring. These methods aid
in the creation of reliable and precise anomaly detection systems, opening the way for increased
operational and maintenance effectiveness in industrial settings.
It would be beneficial to have real-time monitoring capabilities for quick detection and
reaction to anomalies. Real-time monitoring in valve machine systems would involve the ap-
plication of effective algorithms and optimisation of the computational needs.
Valve machines can function in a variety of environmental settings, which causes alterations
in sound patterns. The model’s ability to generalise and adapt to various operating situations
would be improved by increasing its robustness to these fluctuations.
A worthwhile direction for valve machine monitoring would be to investigate unsupervised
anomaly detection techniques. Unsupervised approaches are more versatile and responsive to
changing machine settings because they can identify anomalies without relying on labelled
data.
An important future path would be to expand the valve machine monitoring system to ac-
commodate large-scale industrial situations. For wider implementation, it would be essential to
provide scalability and effective processing of enormous volumes of data from various devices.
The augmented samples may contain some degree of information loss or distortion as a re-
sult of the data augmentation procedures. The model’s capacity to faithfully represent particular
features or patterns in the data may be impacted by this loss.
Overfitting can still occur in situations when augmented samples are too similar to the
training data, despite the fact that data augmentation works to reduce this risk. To avoid this
problem, the augmentation settings must be carefully chosen and tuned.
Different augmentation methods may differ in their efficacy and applicability for particular
kinds of sound data. To ensure effective augmentation, it is crucial to assess and choose the
best procedures depending on the peculiarities of valve machine sounds.
The variety of augmented samples might be constrained by the augmentation techniques
that are accessible. The model’s capacity to generalise to other kinds of anomalies could be
further improved by including more sophisticated and diverse augmentation techniques.
Further advancements might result from researching and creating advanced augmentation
methods designed specifically for valve machine sounds. This includes methods that enhance
the variety and realism of the augmented samples by taking into account the particular traits
and patterns found in valve machine sounds.
The augmentation process might be improved by looking at techniques for automated se-
lection and adaptation of augmentation procedures based on the unique properties of the valve
machine data. This could entail using machine learning algorithms to determine the best aug-
mentation techniques for certain machine circumstances.
The relevance and efficiency of the generated samples can be improved by taking domain-
specific knowledge into account and incorporating it into the data augmentation process. The
precise auditory properties and abnormalities relevant to valve machines could be captured
using domain-specific augmentation techniques.
It would be beneficial to do thorough analyses to determine the effects of various augmen-
tation approaches on the performance, generalisation, and resilience of models. The choice and
fine-tuning of augmentation strategies for better valve machine monitoring can be guided by
this analysis.
Overall, even though data augmentation is an effective method for enhancing model perfor-
mance, its advantages for valve machine monitoring must be carefully considered, along with
potential future developments.
Appendix
Appendix A
Pre/Post Test
Level One
Appendix B
Lists
List of Tables
List of Figures
2.1  Spectrogram of a part of the original sound. The X-axis shows the time in seconds.
2.2  Procedure of anomalous sound simulation using autoencoder.
2.3  Evaluation Results.
2.4  Examples of log-Mel spectrograms of the original sound.
2.5  Examples of spectrograms for each machine type.
2.6  Comparison of convolutional blocks for different architectures. ShuffleNet uses Group Convolutions and shuffling; it also uses the conventional residual approach, where inner blocks are narrower than the output.
2.7  Block diagram of disentangled anomaly detector. In the figure, NN stands for Nearest Neighbor. In the training phase, exclusive latent spaces were assigned to sections and attributes.
2.8  MobileFaceNet Architecture
2.9  The procedure of batch mixing strategy.
2.10 Structure of the proposed anomalous sound detection system.
3.1  Above is shown a snippet of the code showing the parameters given to each augmenting function.
Bibliography
[1] Y. Koizumi, S. Saito, H. Uematsu, and N. Harada, “Optimizing acoustic feature extractor for anomalous sound detection based on Neyman-Pearson lemma,” in Proc. 25th European Signal Processing Conference (EUSIPCO), 2017.

[2] Y. Kawaguchi and T. Endo, “How can we detect anomalies from subsampled audio signals?” in Proc. 27th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.

[4] Y. Kawaguchi, R. Tanabe, T. Endo, K. Ichige, and K. Hamada, “Anomaly detection based on an ensemble of dereverberation and anomalous sound extraction,” in Proc. 44th IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.

[15] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted residuals and linear bottlenecks,” in Proc. 31st IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2018.

[16] Y. Zeng, H. Liu, L. Xu, Y. Zhou, and L. Gan, “Robust anomaly sound detection framework for machine condition monitoring,” DCASE2022 Challenge, Tech. Rep., 2022.

[17] I. Kuroyanagi, T. Hayashi, K. Takeda, and T. Toda, “Two-stage anomalous sound detection systems using domain generalization and specialization techniques,” DCASE2022 Challenge, Tech. Rep., 2022.

[18] F. Xiao, Y. Liu, Y. Wei, J. Guan, Q. Zhu, T. Zheng, and J. Han, “The DCASE2022 Challenge Task 2 system: Anomalous sound detection with self-supervised attribute classification and GMM-based clustering,” DCASE2022 Challenge, Tech. Rep., 2022.

[19] Y. Deng, J. Liu, and W.-Q. Zhang, “AITHU system for unsupervised anomalous detection of machine working status via sounding,” DCASE2022 Challenge, Tech. Rep., 2022.

[21] Y. Wei, J. Guan, H. Lan, and W. Wang, “Anomalous sound detection system with self-challenge and metric evaluation for DCASE2022 Challenge Task 2,” DCASE2022 Challenge, Tech. Rep., 2022.

[23] J. Bai, Y. Jia, and S. Huang, “Jless submission to DCASE2022 Task 2: Batch mixing strategy based method with anomaly detector for anomalous sound detection,” DCASE2022 Challenge, Tech. Rep., 2022.

[25] K. Wilkinghoff, “An outlier exposed anomalous sound detection system for domain generalization in machine condition monitoring,” DCASE2022 Challenge, Tech. Rep., 2022.

[26] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk minimization,” in International Conference on Learning Representations, 2018.

[27] I. Nejjar, J. P. J. Meunier-Pion, G. M. Frusque, and O. Fink, “DCASE Challenge 2022: Self-supervised learning pre-training, training for unsupervised anomalous sound detection,” DCASE2022 Challenge, Tech. Rep., 2022.