
Media Engineering and Technology Faculty

German University in Cairo

Thesis Title

Bachelor Thesis

Author: Student Name


Supervisors: Supervisor Name

Submission Date: 30 July, 2020




This is to certify that:

(i) the thesis comprises only my original work toward the Bachelor Degree

(ii) due acknowledgement has been made in the text to all other material used

Student Name
30 July, 2020
Acknowledgments

Abstract

Contents

Acknowledgments

1 Introduction
1.1 Motivation
1.2 Aim

2 Background
2.1 Related Fields
2.2 Related Work

3 Methodology
3.1 Dataset
3.2 Autoencoders
3.3 Libraries used
3.4 CNN Architecture
3.5 Training Parameters
3.6 Loss Function
3.7 NAdam Optimizer
3.8 K-means clustering for anomaly detection
3.9 Metrics
3.10 Data Augmentation

4 Testing and Results

5 Conclusion and Future Work
5.1 Conclusion
5.2 Future Work and Limitations

Appendix

A Pre/Post Test
Level One

B Lists
List of Tables
List of Figures

References

Chapter 1

Introduction

1.1 Motivation
Many industries, including manufacturing, oil and gas, power generation, and water treatment,
depend heavily on valve machines. These machines frequently experience wear, corrosion,
and mechanical faults, which can lead to malfunctions, reduced productivity, and safety
risks. To avoid failures, reduce downtime, and guarantee optimal operating efficiency, it is
essential to identify irregularities in valve machines as early as possible.
Sound offers valuable insight into the health of valve machines. Unusual noises, such
as squeaks, hisses, or vibrations, may be a sign of underlying problems or impending failures.
Anomalous sound detection, which combines sound data with advanced signal processing and
machine learning, can therefore be a useful method for tracking the health and performance
of valve machines.
For valve machines, abnormal sound detection enables preventative maintenance techniques
like condition-based maintenance or predictive maintenance. Potential defects can be detected
early on by continuously monitoring the sound characteristics of valve machines, enabling
prompt maintenance actions. With a proactive approach, unplanned downtime is minimised,
maintenance plans are optimised, and total maintenance costs are decreased.
Valve machines are frequently used in safety-critical procedures such as fluid flow control,
pressure management, and system shutdowns. Anomalies in valve machines can jeopardise
operational safety, posing a risk to the environment, damaging equipment, or even endangering
human lives. Accurate and dependable anomalous sound detection techniques can reduce these
safety risks and help ensure the secure and efficient operation of valve machines.
Compared to other forms of machinery, abnormal sound detection for valve machines
presents particular difficulties. Valve systems display complex acoustic behaviour due to a
variety of valve actions, fluid dynamics, and mechanical interactions. A thesis on valve machine
sound detection can address these unique issues, for example, by creating specialised feature
extraction methods or domain-specific anomaly classification systems.

The use of anomalous sound detection techniques in valve machines meets the needs and
demands of the sector. Industries that rely on valve machines are continually looking for novel
ways to enhance equipment efficiency overall, reduce operational costs, and improve mainte-
nance procedures. A thesis in this field has the potential to immediately advance commercial
applications and spark interest among key industry players.
There are prospects for cooperation with industry partners, subject-matter specialists, and
research organisations in the area of anomalous sound detection for valve machines. Collabo-
ration on projects can lead to access to real-world datasets on valve machines, useful insights,
and operational knowledge. This partnership can enhance the research process, guarantee the
thesis work’s applicability, and promote knowledge transfer between academia and industry.
Significant academic and professional development opportunities are provided by pursuing
a thesis in anomalous sound detection for valve machines. It enables in-depth investigation
of the dynamics of valve machines, signal processing methods, machine learning algorithms,
and domain-specific information. Additionally, the thesis work may result in publications in
respected journals or conference proceedings, enhancing the researcher’s academic record and
prestige in the field.

1.2 Aim
The objective of this thesis is to create a system capable of identifying sound irregularities
in valve machines. The thesis intends to contribute to the field of industrial maintenance and
improve the overall operational performance of valve machines by utilising cutting-edge signal
processing and machine learning approaches.
The first objective is to study the sound profiles produced by different types of valve machines,
taking into account a variety of operating conditions, valve movements, and mechanical
interactions. This investigation will draw on data gathered from actual valve machines,
together with a thorough analysis of their acoustic characteristics.
The second objective is to investigate and compare machine learning techniques suitable
for finding anomalies in valve machine sound data. To identify the algorithm best suited to
recognising sound anomalies accurately, this investigation will cover supervised learning,
unsupervised learning, and deep learning approaches.
The third objective is to build a complete anomaly detection system that combines the chosen
machine learning algorithms with optimised feature extraction techniques. The system should
be able to monitor valve machine noise in real time and issue prompt alerts or notifications
when anomalies are found, allowing for preventive maintenance interventions.
The fourth objective is to conduct in-depth analyses and comparative experiments to evaluate
the functionality and efficiency of the developed anomaly detection system, assessing its
precision, robustness, and sensitivity to different abnormalities and operating conditions, and
comparing its performance against other methods and benchmark datasets.

A further objective is to work with industry partners or gain access to valve equipment in real
industrial settings, and to validate the developed system on data from actual valve machines
in order to confirm its applicability, dependability, and efficacy in spotting sound anomalies
in real-world conditions.
Finally, the detected anomalies and their associated sound patterns will be analysed to gain
insight into their underlying causes. The results of this analysis may inform maintenance
decision-making procedures, including root cause analysis, fault diagnosis, and maintenance
scheduling, resulting in valve machine maintenance plans that are both effective and efficient.
The overall goal of this thesis is to advance the field of sound anomaly detection in valve
machines through the creation of a dependable and useful system that improves valve machine
maintenance procedures and operational performance in a variety of industrial applications.
Chapter 2

Background

2.1 Related Fields


Industrial maintenance is directly related to anomaly sound detection for valve machines. This
industry focuses on making sure that machinery and equipment operate efficiently and perform
at their best in industrial environments. By enabling early detection of valve machine problems
and malfunctions, saving downtime, and enhancing overall operational efficiency, anomaly
sound detection techniques can support predictive maintenance plans.
Analysing and interpreting the sound data produced by valve machines requires the use of
acoustic signal processing. In this area, sound signals are subjected to methods for signal fil-
tering, feature extraction, spectrum analysis, time-frequency analysis, and pattern recognition.
The accuracy and dependability of anomaly detection algorithms for the sounds produced by
valve machines can be improved by the efficient use of acoustic signal processing techniques.
In order to create anomaly detection systems for valve machines, machine learning and
pattern recognition are crucial domains. Valve machine sound data can be used to identify
trends and find abnormalities using methods including statistical modelling, deep learning,
supervised learning, and unsupervised learning. These disciplines offer the approaches and
tools needed to create reliable and precise anomaly detection models.
For valve machines, anomaly sound detection and root cause investigation go hand in hand.
Further investigation is necessary to identify the underlying cause of an anomaly in valve ma-
chine noises. This entails locating the precise flaw or failure that resulted in the strange sound
patterns. Understanding the nature of the abnormalities and directing subsequent maintenance
operations are made possible by fault diagnosis and root cause analysis approaches.
Systems for industrial automation and control include valve machines as essential compo-
nents. To improve the monitoring and diagnostic capabilities of these systems, anomaly sound
detection techniques can be incorporated. Real-time warnings and automatic actions can be
performed based on the anomalies discovered by integrating anomaly detection with automa-
tion and control systems, enabling proactive maintenance and reducing downtime.

These fields are related to one another and serve as a basis for the creation and use of
anomalous sound detection techniques for valve machines. Researchers and practitioners can
develop the field of anomalous sound detection, better maintenance procedures, and improve
the performance and dependability of valve machines across a variety of industrial sectors by
utilising information and methods from these connected fields.

2.2 Related Work


Y. Koizumi et al. (2017) [1] proposed a technique, based on the Neyman-Pearson lemma, for
optimising an acoustic feature extractor to detect anomalous sounds. Anomalous sounds are
those that are uncommon or rare, such as gunshots or explosions. The proposed technique
starts by choosing a group of acoustic properties that are relevant to the task of anomalous
sound detection. The weights of the features were then optimised using the Neyman-Pearson
lemma in order to obtain a detection score for each time frame of the audio stream. After
thresholding the detection score, each time frame was classified as either anomalous or normal.
The authors tested the suggested strategy on a dataset of gunshot noises and achieved a high
detection rate with a low false alarm rate. The outcomes demonstrate the effectiveness of the
proposed technique for optimising the acoustic feature extractor for anomalous sound detection.
Y. Kawaguchi (2017) [2] developed a technique for finding anomalies in subsampled audio
sources. The term "anomaly" describes events or patterns in an audio stream that are not
consistent with expectations, such as environmental sounds that are not normally present or
audio glitches. In the proposed method, the audio signal is first subsampled to reduce the
dimensionality of the data. The authors then used a sparse coding technique to learn a
dictionary of audio features that can describe the audio stream at the subsampled rate. The
dictionary was learned from a collection of training audio signals that included both abnormal
and typical audio events. The authors tested the proposed method on a collection of
environmental sounds and showed that it was successful at identifying abnormal occurrences
in subsampled audio signals. The outcomes showed that, on the same dataset, the proposed
method performed better than previous anomaly detection approaches.

Figure 2.1: Spectrogram of a part of the original sound. The X-axis shows the time in seconds.

Based on deep learning and the Neyman-Pearson lemma, Y. Koizumi (2019)[3] devised an
unsupervised method for identifying abnormal noises in audio signals. Rare or odd sounds that
deviate from the typical sounds in the audio stream are referred to as abnormal sounds. The
suggested approach starts by teaching a deep autoencoder neural network a low-dimensional
representation of the audio signals using a set of typical audio signals. The key characteristics
of typical audio signals are captured by the learned representation, while irrelevant or strange
characteristics are ignored. The scientists computed a threshold for detecting anomalous sounds
in the low-dimensional representation of the audio signals after training the deep autoencoder
using the Neyman-Pearson lemma. By computing the audio signals’ reconstruction errors and
thresholding them above the calculated threshold, abnormal sounds were found. On a dataset
of urban sounds, the authors tested the suggested strategy, and they were able to achieve a high
detection rate with a low false alarm rate. The outcomes showed how successful the suggested
method was at identifying aberrant sounds in audio streams without supervision.

Figure 2.2: Procedure of anomalous sound simulation using autoencoder.

Y. Kawaguchi (2019) [4] presents a methodology that combines dereverberation and anomalous
sound extraction to find anomalous noises in audio streams. The method starts by utilis-
ing a dereverberation method to remove reverberation from the audio signal before extracting
anomalous sounds with a model that has been trained on regular sound data. The suggested
approach has great accuracy in detecting abnormalities when tested on various types of au-
dio sources. The outcomes show the method’s potential for use in practical applications for
identifying unusual sounds in challenging acoustic settings.
Y. Koizumi (2019) [5] proposed batch uniformization to address uneven data distribution in the
training dataset, which can result in biased anomaly scores. The idea is to alter the training
batches so that each batch contains an equal number of normal and anomalous samples. This
is accomplished by undersampling the typical samples in each batch while oversampling the
anomalous samples. The authors demonstrate that, compared to other approaches such as data
augmentation or resampling, this strategy yields more reliable anomaly detection results. The
results demonstrate that batch uniformization can effectively reduce the maximum anomaly
score of the DNN-based anomaly detection system, improving performance in detecting
uncommon and unexpected sounds. The suggested method is tested on a dataset of ambient
noises. The authors also offer some insight into how certain settings affect how well the
strategy works.

Figure 2.3: Evaluation Results.

The IDNN technique proposed by K. Suefusa (2020) [6] consists of two stages: a pre-training
stage and a fine-tuning stage. A deep autoencoder is trained on a sizable
dataset of sound signals during the pre-training phase in order to discover a low-dimensional
representation of the data. The IDNN, which consists of numerous shallow neural networks
connected by a linear interpolation function, is then trained using this representation. The
IDNN can distinguish between representations of normal and abnormal sounds in feature space
since it is built to interpolate between them. This study introduces a potential method for
detecting anomalous sounds based on an IDNN, which combines the strength of deep learning
with the capacity to interpolate across various data representations. The suggested method
has significant applicability in numerous industries where it is essential to detect unusual or
unexpected sounds in order to maintain safety and security.

Figure 2.4: Examples of log-Mel spectrograms of the original sound.

H. Purohit (2020) [7] presents the use of deep autoencoders in conjunction with Gaussian
mixture models (GMMs) to create a novel unsupervised anomaly detection technique for audio
signals. The difficult task of unsupervised anomaly detection involves identifying anomalies
in data without the aid of labelled samples of anomalous signals. This task is crucial in the
context of acoustic signals since it has applications in surveillance, environmental monitoring,
and industrial maintenance. The number of components in the GMM and the threshold for
identifying signals as anomalous are two other hyperparameters that the authors suggest can
be optimised. A grid search is used to optimise the hyperparameters on a validation set. Two
acoustic signal datasets, the DCASE 2018 Task 2 dataset and the MIMII dataset, are used to
assess the proposed technique. On both datasets, the results demonstrate that the suggested
strategy performs better than other cutting-edge unsupervised anomaly detection techniques.
The authors also offer some information about how the method’s performance is impacted by
the hyperparameters.
Y. Koizumi(2020)[8] discusses task 2 of the DCASE2020 challenge, which was centred
on the unsupervised identification of aberrant sounds for machine condition monitoring. The
goal of the challenge was to assess the effectiveness of cutting-edge techniques for spotting un-
usual noises in industrial machinery, which is crucial for maintaining safety, cutting down on
downtime, and increasing maintenance effectiveness. Pumps, fans, valves, and slide rails were
among the four machine types that made up the challenge dataset. The recordings, which were
made during routine operations, featured a range of ambient sounds and operational settings.
By intentionally creating various flaws and anomalies in the machines, such as bearing flaws,
unbalance, and misalignment, the aberrant sounds were produced. Twenty proposals from re-
search teams throughout the world were received for the challenge, and the outcomes demon-
strated that the suggested methods performed well in identifying unusual sounds in the machine
recordings. Convolutional neural networks (CNNs) and autoencoders, which are based on deep
neural networks and exhibit the best performance, outperformed more conventional signal pro-
cessing techniques.
Y. Kawaguchi (2021) [9] describes the DCASE 2021 Challenge Task 2, which was concerned
with domain-shifted unsupervised anomaly detection for machine condition monitoring.
The goal of the challenge was to assess how well state-of-the-art ap-
proaches performed when training and testing data came from different distributions, as might
occur when trying to detect aberrant sounds in industrial machinery. The challenge received
20 applications from research teams throughout the world, and the findings demonstrated that
the suggested methods performed well in identifying aberrant noises in the machine recordings
even when the domain of the recordings was modified. Convolutional neural networks (CNNs)
and autoencoders, which are based on deep neural networks and exhibit the best performance,
outperformed more conventional signal processing techniques.
N. Harada(2021)[10] introduces a brand-new dataset, ToyADMOS2, for the detection of
anomalous sound in domain-shifting circumstances. The dataset consists of normal and ab-
normal noises produced by small machines that are similar to the sounds made by industrial
machinery. By producing various flaws and abnormalities in the machines, such as bearing
flaws, misalignments, and impeller damage, the aberrant sounds in the dataset were produced.
Different operational settings and background noises, such as varying speeds, loads, and am-
bient noise levels, were used to record the normal sounds. The report also offers a baseline
performance assessment of a number of cutting-edge techniques on the ToyADMOS2 dataset.
The findings demonstrate that the most effective algorithms produce excellent detection perfor-
mance under domain shift conditions and are based on deep neural networks, such as autoen-
coders and convolutional neural networks.
In the context of domain generalisation challenges, K. Dohi(2022)[11] introduces a brand-
new sound dataset called MIMII DG that is intended for examining and analysing broken in-
dustrial machinery. The dataset includes noises captured from a variety of industrial equipment,
including pumps, fans, compressors, and air conditioners, and it includes both typical and un-
usual machine operation circumstances. By generating numerous flaws and anomalies in the
machines, such as bearing flaws, misalignments, and motor failures, the aberrant noises in the
dataset were produced. Different operational settings and background noises, such as varying
speeds, loads, and ambient noise levels, were used to record the normal sounds. The report
also offers a baseline performance assessment of a number of cutting-edge techniques on the
MIMII DG dataset. The findings demonstrate that the most effective algorithms produce ex-
cellent detection performance under domain generalisation conditions and are based on deep
neural networks, such as autoencoders and convolutional neural networks.

Figure 2.5: Examples of spectrograms for each machine type.

R. Giri (2020) [12] offers a technique for autonomously learning to identify unusual noises.
The suggested approach is based on contrastive learning, in which a deep neural network is
trained to increase the similarity between positive sound examples and decrease the similarity
with negative examples. The authors use a vast collection of environmental noises and
introduce synthetic anomalous sounds by randomly changing the pitch, volume, or speed of
the original sound. The outcomes demonstrate that the suggested strategy performs better than
previous unsupervised methods and is competitive with supervised methods.
P. Primus(2020)[13] provides a straightforward method for binary classification-based anoma-
lous sound detection. Instead of constructing sophisticated models, the authors contend that it
is possible to get good performance by carefully choosing proxy outlier cases and training
a straightforward binary classifier. By choosing sound instances that are different from the
typical sound examples in the training set, the proxy outliers are obtained. Additionally, the au-
thors suggest a metric for comparing the similarity of sound examples that combines time- and
frequency-domain properties. The DCASE 2020 challenge dataset is used to test the proposed
method, which compares favourably against previous unsupervised approaches. The proposed
method’s computational effectiveness and simplicity of integration into actual systems are both
shown by the authors.
The approach for detecting anomalous sounds in machine condition monitoring using the
confidence scores of a sound classification model is suggested in the publication ”Detection
of anomalous sounds for machine condition monitoring using classification confidence” by T.
Inoue (2020)[14]. The technique employs a threshold-based approach to categorise sounds as
normal or anomalous based on the confidence ratings using a pre-trained classification model
to extract characteristics from audio signals. When compared to existing state-of-the-art algo-
rithms, the suggested method performs competitively on the DCASE 2020 Task 2 dataset. The
effectiveness of the suggested strategy is further examined by the authors in relation to several
variables, including the quantity of training samples and the type of classifier.
M. Sandler (2018)[15] demonstrates MobileNetV2, a mobile-friendly neural network ar-
chitecture made for effective and precise image classification on portable devices. The ”in-
verted residuals” and ”linear bottlenecks” building components, which the authors introduce,
are meant to increase the network’s efficiency while preserving accuracy. The linear bottleneck
block is used to lower the computational cost of the network, and the inverted residual block is
used to increase the nonlinearity of the network while minimising the number of parameters.
The study demonstrates that MobileNetV2 beats MobileNetV1 on a number of benchmarks
while retaining a similar model size and computational expense. Since then, many computer
vision tasks, including object identification, segmentation, and video classification on mobile
devices, have made extensive use of the proposed architecture.

Figure 2.6: Comparison of convolutional blocks for different architectures. ShuffleNet uses
group convolutions and shuffling; it also uses a conventional residual approach where inner
blocks are narrower than the output.

Y. Zeng (2022) [16] presents a two-stage model that combines a feature extraction stage
based on a trained convolutional neural network (CNN) and a classification stage based on
a multi-layer perceptron (MLP) model as a method for anomalous sound detection. To en-
hance the system’s performance, the strategy additionally uses data augmentation and en-
sembling techniques. The report shows the efficacy of the suggested method for detecting
anomalous sounds in the presence of domain shifts by presenting experimental findings on the
DCASE2022 Challenge dataset.
I. Kuroyanagi(2022)[17] explains a two-stage anomalous sound detection system for moni-
toring machine health. A domain generalisation model, which can categorise anomalous noises
even under hypothetical test settings, is the initial stage. A domain specialisation model, used
in the second stage, can categorise strange noises produced by a particular machine. The sys-
tem extracts pertinent data from the audio signals using a variety of feature extraction tech-
niques, such as spectrograms, sub-band spectral features, and mel-frequency cepstral coef-
ficients (MFCCs). The outcomes demonstrate that the suggested system outperformed vari-
ous baseline models in the DCASE2022 Challenge in terms of size and computing cost while
achieving excellent accuracy in both stages.
F. Xiao(2022)[18] present their system developed for Task 2 of the DCASE2022 Chal-
lenge, which focuses on anomalous sound detection. The proposed system incorporates self-
supervised attribute classification and Gaussian Mixture Model (GMM)-based clustering to
improve the performance of sound detection. The authors leverage self-supervised learning
techniques to learn meaningful representations of sound attributes without relying on labeled
data. This allows the system to effectively capture the characteristics of normal and anomalous
sounds, even when limited labeled data is available. The self-supervised attribute classifica-
tion helps in identifying relevant sound attributes that contribute to the detection of anoma-
lies. Additionally, the system groups together comparable sound examples using GMM-based
clustering. By identifying the underlying patterns and structures in the data, this clustering
method aids in the differentiation between normal and abnormal sounds. The system can ac-
curately recognise and categorise aberrant noises by associating each sound instance with a
particular cluster. The DCASE2022 Challenge results for the suggested system show that it per-
forms well in terms of identifying unusual sounds. The performance of the sound detection
problem is improved by combining self-supervised attribute classification and GMM-based
clustering. The results of this study offer important new perspectives on the creation of sophis-
ticated techniques for unsupervised anomalous sound detection.
Y. Deng (2022) [19] present the AITHU system that was created for the DCASE2022 Challenge.
Their technique focuses on sound-based unsupervised anomaly detection of machine
operational status. Without labelled training data, the AITHU system seeks to solve
the problem of identifying anomalies in machine working state. It makes use of acoustic sig-
nal analysis to spot irregularities in machine behaviour. The system can distinguish between
typical and abnormal behaviour by examining the acoustic patterns and traits of the machine
sounds. The authors suggest a brand-new methodology that blends cutting-edge machine learn-
ing methods with the evaluation of reliable data. The AITHU system makes use of several
feature extraction and signal processing techniques to extract the pertinent data from the sound
sources. Then, it uses unsupervised learning techniques to find odd patterns and categorise
them appropriately. In the DCASE2022 Challenge, the AITHU system proved to be useful by
achieving noteworthy results in unsupervised anomaly detection of machine working state. The
system’s potential for real-world applications in industrial settings, where monitoring and iden-
tifying machine failures are vital, is highlighted by its capacity to detect anomalies without the
requirement for labelled data.
S. Venkatesh(2022)[20] presented a unique method called Disentangled Surrogate Task
Learning (DSTL) to improve the domain generalisation abilities of unsupervised anomalous
sound detection systems. The DSTL approach makes use of the idea of surrogate tasks, which
are side tasks that are connected to the primary detection job but simpler to learn. The sci-
entists want to increase the system’s ability to generalise to unknown domains by training the
model to perform well on these surrogate tasks. The DSTL technique explicitly distinguishes
between domain-specific and domain-invariant information in the sound data to deconstruct the
representation learning process. This detachment enables the model to concentrate on discov-
ering the common underlying structure across several domains, improving generalisation. To
assess the effectiveness of their DSTL technique, the authors ran tests using the DCASE2022
Challenge dataset. The outcomes show how well their approach works in terms of enhancing
domain generalisation for unsupervised anomalous sound detection. The potential of the DSTL
approach for real-world applications where there is a dearth of labelled data from all domains
is demonstrated by the approach’s considerable advancements in the detection of abnormalities
from unknown domains.

Figure 2.7: Block diagram of disentangled anomaly detector. In the figure, NN stands for
Nearest Neighbor. In the training phase, exclusive latent spaces were assigned to sections and
attributes.

Y. Wei(2022)[21] presented their approach for the DCASE2022 Challenge Task 2 on anoma-
lous sound detection. The task focuses on developing systems that can effectively detect
anomalous sounds for machine condition monitoring. Self-challenge and metric evaluation are
two essential parts of the authors' unique approach to anomalous sound identification. The model
is trained in a self-supervised way as part of the self-challenge component, and through the
use of a binary classification problem, the model learns to distinguish between typical and
abnormal noises. The model can utilise a significant quantity of unlabeled data thanks to its
self-supervised training, which increases its adaptability to various machine circumstances and
sound environments. The authors also present a metric evaluation process that tries to improve
system performance. In order to acquire a thorough assessment of the system’s detection ca-
pabilities, they suggest a metric fusion technique that incorporates various metrics, such as
frame-level F1-score and segment-level F1-score. The system can better handle varied lengths
and durations of anomalous sound events by taking into account both frame-level and segment-
level performance. The DCASE2022 Challenge dataset is used to test the proposed system, and
the results show how well it can identify unusual sounds for machine condition monitoring. The
system performs admirably in terms of F1-score and AUC metrics, demonstrating its potential
for use in practical situations. By offering a self-challenge approach and a thorough metric eval-
uation system, the study described in this paper makes a contribution to the field of anomalous
sound identification. Without relying on labelled data, self-supervised learning enables the sys-
tem to adapt to various machine circumstances, while the metric fusion technique offers a more
thorough and precise assessment of the system’s performance.
K. Morita (2022) [22] investigates how well various spectro-
gram representations can identify unusual sounds. To view the frequency content of sound
waves over time, spectrograms are frequently employed in audio signal processing. For differ-
ent audio analysis tasks, such as sound classification and anomaly detection, they offer useful
information. The Mel-spectrogram, Gammatonegram, and CQT-spectrogram are three distinct
spectrogram forms that the authors look into. The features of each representation in capturing
various components of the sound signals, such as frequency resolution and time-frequency
localisation, are unique. The authors hope to find the best spectrogram representation for
anomalous sound detection by evaluating the performance of several representations. The au-
thors use the DCASE2022 Challenge dataset to assess performance and a machine learning-
based method for anomalous sound identification. They use each spectrogram representation
to train and test their models, and they evaluate the detection performance using metrics such
as Area Under Curve (AUC) and Partial AUC (pAUC). The experimental findings shed light on
the efficacy of various spectrogram representations for the detection of aberrant sounds. Based
on their research, the writers examine the benefits and drawbacks of each representation and
offer suggestions. The study advances knowledge of how spectrogram representations affect
the effectiveness of systems for detecting abnormal sounds.

Figure 2.8: MobileFaceNet Architecture

J. Bai (2022) [23] proposed a method for anomalous sound detection that combines a batch
mixing technique with an anomaly detector. Using the batch mixing technique, synthetic
training samples are created by randomly fusing several regular sound clips to represent anoma-
lous sound events. The goal of this strategy is to improve the model’s capacity for generali-
sation and precise anomaly sound detection. The authors use an anomaly detector, which is
trained using a combination of normal and anomalous sound recordings, to carry out the abnor-
mal sound detection. The detector gains the ability to discern between regular and abnormal
sound patterns, and it rates test samples for anomaly based on how far they depart from the
norm. The Jless method's performance is evaluated using the DCASE2022 Challenge dataset
and conventional assessment criteria like Area Under Curve (AUC) and partial AUC (pAUC).
The evaluation’s findings show how successful the suggested approach is in identifying unusual
sounds and achieving competitive performance in the challenge. The authors offer insights into
the effectiveness of the Jless technique and talk about the advantages and disadvantages of their
strategy. In order to achieve precise and reliable anomalous sound detection, they emphasise
the advantages of the batch mixing technique and the significance of the anomaly detector.

Figure 2.9: The procedure of batch mixing strategy.

S. Verbitskiy(2022)[24] take on the problem of detecting anomalous noises, where the ob-
jective is to identify unidentified aberrant sounds based only on regular sound data. To enhance
the performance of detection, their method makes use of several time-frequency representa-
tions as input features. The authors train their anomalous sound detection (ASD) systems for
each machine type using normal sound recordings and their section indices. To capture a va-
riety of properties of the sound data, they use ensembles of 2D CNN-based systems, each
employing a different time-frequency representation. The authors use cosine similarity and the
k-nearest neighbours technique (k-NN) to score anomalies by extracting embedding vectors
from their CNNs. They establish anomaly scores to determine instances that are unusual by
contrasting the embedding vectors of test clips with those of typical sound clips. The perfor-
mance of the suggested strategy is examined using evaluation measures like Area Under Curve
(AUC) on the DCASE2022 Challenge dataset. The outcomes show that their method outper-
forms benchmark systems and produces competitive outcomes in the challenge, resulting in
a high detection performance. The success of using several time-frequency representations for
unsupervised anomalous sound identification is highlighted by the authors as they examine the
advantages and disadvantages of their approach. They offer analysis of the effectiveness of
their strategy and recommendations for more enhancements.
K. Wilkinghoff(2022)[25] provides a domain-generalization-focused anomalous sound de-
tection system for machine condition monitoring. The system’s objective is to identify unusual
sounds in a wide range of contexts, including those unrelated to the training set. To improve the
system’s capacity to identify anomalies across a variety of domains, the author’s approach in-
tegrates outlier exposure techniques. The system gets more reliable and environment-adaptive
by subjecting the model to a variety of difficult samples during training. The author combines
deep learning models, feature engineering, and outlier exposure tactics to create the anomalous
sound detection system. The models use different audio representations to capture pertinent
information and are trained on common sound recordings. The system then employs strategies
for outlier exposure to boost generalisation and the detection of anomalies. The DCASE2022
Challenge dataset is used to gauge the system’s performance, and the outcomes are examined
using accepted assessment measures. The report highlights the benefits of the outlier-exposed
technique for domain generalisation while discussing the performance attained and compar-
ing it to baseline systems. The author also examines potential areas for future enhancements
and offers insights into the system’s advantages and disadvantages. By addressing the issue of
domain generalisation and demonstrating the potency of outlier exposure strategies, the work
makes a contribution to the field of unsupervised anomalous sound detection.

Figure 2.10: Structure of the proposed anomalous sound detection system.

H. Zhang(2018)[26] authored the paper titled ”mixup: Beyond Empirical Risk Minimiza-
tion,” presented at the International Conference on Learning Representations in 2018. The
paper introduces a data augmentation technique called ”mixup” that goes beyond conventional
empirical risk minimization approaches. The mixup technique generates augmented training
samples by linearly interpolating pairs of input examples and their accompanying labels, with
the goal of enhancing the generalisation performance of deep learning models. Mixup en-
courages the model to learn from a more varied and even distribution of training samples by
combining the inputs and labels, effectively regularising the model and minimising overfit-
ting. The theoretical study of mixup presented in this paper shows that it encourages the model
to act linearly between training instances and their labels. Mixup can successfully combine
the data from several samples thanks to its linearity property, which enhances robustness and
generalisation. To assess the performance of mixup in comparison to other data augmentation
techniques, the authors also conduct comprehensive trials on a variety of image and text clas-
sification tasks. The findings show that mixup regularly improves generalisation performance,
lowering the likelihood of overfitting and enhancing the model's capacity to handle
out-of-distribution data.
I. Nejjar(2022)[27] suggests an innovative method for learning meaningful representations
from unlabeled audio data that makes use of self-supervised learning. A deep neural network
can learn to capture high-level acoustic properties that can be used for future unsupervised
anomalous sound detection by pre-training it on a pretext task using a lot of unlabeled audio. The
architecture and training methodology used in their approach are described in the publication.
A smaller labelled dataset of typical and unusual sound samples is used to fine-tune the pre-
trained model. On the DCASE2022 Challenge dataset, the authors test their methodology, and
the results are presented in terms of performance metrics including accuracy, precision, recall,
and F1 score. The application of self-supervised learning approaches for unsupervised anoma-
lous sound detection is the paper’s main contribution. The suggested method can successfully
identify anomalous sounds without the use of manual annotation or explicit labelling by util-
ising unlabeled data and learning meaningful representations. The outcomes indicate how the
self-supervised learning strategy works and highlight its potential for unsupervised anomaly
detection in reliable data. The writers also talk about the method’s shortcomings and potential
prospects for the future.
Chapter 3

Methodology

3.1 Dataset
The dataset was created using ToyADMOS2 [10] and MIMII DG [11]. The dataset includes
operational noises from seven different toy and real machine types, including the ToyCar, Toy-
Train, fan, gearbox, bearing, slide rail, and valve.
Each recording is a 10-second, single-channel audio file sampled at 16 kHz. To construct the
training/test data, machine sounds collected in laboratories were combined with ambient noise
recorded in actual factories. More details about the recording process can be found in [10]
and [11]. All of the audio samples used for training and evaluation in this thesis are sampled
at 16 kHz. The sampling rate, which is the number of audio samples taken per second,
determines the frequency range that can be accurately captured within the audio signal.
Standardising the dataset to a 16 kHz sampling rate kept all of the audio samples consistent
and made it possible to use a
consistent set of signal processing and analysis procedures throughout the investigation. Addi-
tionally, a sampling rate of 16 kHz is enough for recording important details in valve machine
sounds and is frequently employed in a variety of audio-related applications. The selection of
a sampling rate of 16 kHz is particularly appropriate for valve machine monitoring because it
strikes a balance between capturing high-frequency components that may indicate anomalies
in the machine’s operation and effectively representing the audio data without requiring large
files or computational demands.
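To make this preprocessing step concrete, the short sketch below loads a single 10-second, 16 kHz clip and converts it to a log-mel spectrogram. It assumes the librosa library (not named in this chapter) and illustrative parameter values (64 mel bands, 1024-sample frames, 512-sample hop); these are plausible defaults rather than the exact settings used in the thesis.

import librosa
import numpy as np

SAMPLE_RATE = 16_000  # every clip in the dataset is provided at 16 kHz

def load_log_mel(path, n_mels=64, n_fft=1024, hop_length=512):
    """Load a single-channel clip and return a log-mel spectrogram (n_mels x frames)."""
    # sr=SAMPLE_RATE resamples if necessary, so every clip ends up at 16 kHz
    audio, _ = librosa.load(path, sr=SAMPLE_RATE, mono=True)
    mel = librosa.feature.melspectrogram(
        y=audio, sr=SAMPLE_RATE, n_fft=n_fft, hop_length=hop_length, n_mels=n_mels
    )
    # convert power to decibels so quieter components are not swamped by loud ones
    return librosa.power_to_db(mel, ref=np.max)

# Example usage with a hypothetical file path:
# spec = load_log_mel("valve/train/normal_0001.wav")
# print(spec.shape)  # roughly (64, 313) for a 10-second clip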

3.2 Autoencoders
A potent type of artificial neural networks known as autoencoders has demonstrated consid-
erable promise in a number of fields, including valve machine monitoring. Because they can
efficiently capture the underlying patterns and irregularities present in the sensor data produced
by valve machines, autoencoders are particularly helpful in this situation. Autoencoders im-
prove the overall monitoring and maintenance of valve machines by learning a compressed
representation of the input data that enables effective feature extraction and anomaly detection.
The capacity of autoencoders to recognise intricate and nonlinear correlations in sensor data
is one of their primary advantages in valve machine monitoring. Numerous sensors, including
pressure sensors, temperature sensors, and vibration sensors, are used in valve machines to
produce enormous volumes of data. Even though this data offers useful information about the
machine’s operational state, manually identifying significant features can be difficult. With-
out the requirement for explicit feature engineering, autoencoders may automatically learn a
condensed representation of the sensor data, capturing the most important patterns and corre-
lations.
Autoencoders are often trained on a large dataset of typical operating situations in valve
machine monitoring. The autoencoder learns to accurately recreate the input data during the
training phase while attempting to reduce the reconstruction error. The autoencoder can record
the typical patterns and fluctuations in the sensor data thanks to this technique. The autoencoder
can be used to reconstruct fresh data samples after being trained and compare the reconstruc-
tion error to a predetermined threshold. Higher reconstruction errors will occur from unusual
or abnormal patterns that considerably depart from the ingrained normal patterns, suggesting
potential flaws or anomalies in the valve machine.
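A minimal sketch of this train-on-normal, threshold-on-reconstruction-error scheme is given below, written with the Keras API discussed later in this chapter. The dense layer sizes, the flattened feature input, and the percentile-based threshold are illustrative assumptions, not the exact architecture used in this work.

import numpy as np
from tensorflow import keras

def build_autoencoder(input_dim, latent_dim=8):
    """Dense autoencoder: compress the input to latent_dim units, then reconstruct it."""
    inputs = keras.Input(shape=(input_dim,))
    x = keras.layers.Dense(128, activation="relu")(inputs)
    x = keras.layers.Dense(latent_dim, activation="relu")(x)   # bottleneck
    x = keras.layers.Dense(128, activation="relu")(x)
    outputs = keras.layers.Dense(input_dim, activation="linear")(x)
    model = keras.Model(inputs, outputs)
    model.compile(optimizer="adam", loss="mse")
    return model

def anomaly_scores(model, x):
    """Per-sample mean squared reconstruction error; higher means more anomalous."""
    recon = model.predict(x, verbose=0)
    return np.mean(np.square(x - recon), axis=1)

# x_normal: feature vectors of normal clips only, shape (n_samples, input_dim)
# ae = build_autoencoder(input_dim=x_normal.shape[1])
# ae.fit(x_normal, x_normal, epochs=100, batch_size=64, validation_split=0.1)
# threshold = np.percentile(anomaly_scores(ae, x_normal_val), 99)
# is_anomalous = anomaly_scores(ae, x_test) > threshold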
Another benefit of using autoencoders for valve machine monitoring is their unsupervised
nature. Autoencoders may learn directly from unlabeled sensor data, in contrast to supervised
learning techniques that demand labelled data for training. This is especially useful in situations
where labelled anomalous data is hard to come by or rare. Autoencoders can find and record
tiny irregularities in the sensor data that may be difficult to find using conventional approaches
by utilising unsupervised learning.
Autoencoders are also capable of real-time learning and adaptation, which qualifies them
for online valve machine monitoring. The autoencoder may update its learnt representation
to take into account changes in the machine’s behaviour and detect anomalies in real-time as
sensor data streams continually. This functionality enables proactive maintenance and reduces
downtime by enabling early defect detection or abnormal operating circumstances.

3.3 Libraries used


A well-liked open-source deep learning framework called TensorFlow offers a variety of tools
and functions, including autoencoders, for creating and training machine learning models. Ten-
sorFlow is frequently used in conjunction with autoencoders for valve machine monitoring for
a number of reasons:
Deep learning models call for the efficient execution of complicated mathematical opera-
tions, and TensorFlow makes use of highly optimised computational libraries to do just that.
TensorFlow is an appropriate candidate for processing the sensor data produced by valve ma-
chines because of its performance, which is vital when working with huge datasets and intricate
autoencoder structures.
Building neural network models, including autoencoders, with TensorFlow is simple and
flexible. It offers TensorFlow Keras, a high-level API that makes it easier to define and
train deep learning models. With TensorFlow, you can quickly build numerous autoencoder
types, experiment with different architectures, and modify the model to meet the unique needs
of valve machine monitoring.
As a strong and widely used deep learning framework, TensorFlow offers effective comput-
ing, flexibility in model creation, GPU acceleration, a rich ecosystem, deployment flexibility,
and substantial community support. TensorFlow is a great option for using autoencoders in
valve machine monitoring because of these features, which allow for effective processing of
sensor data, model customisation, and smooth integration into monitoring systems. You can
read more about TensorFlow at [28].
I decided to use TensorFlow as the main framework for building the autoencoder-based
anomaly detection system for my thesis on valve machine monitoring. TensorFlow provides a
variety of strong arguments that support the needs and goals of my research.
TensorFlow is the best option for handling the massive datasets produced by valve machines
because of its fast calculation capabilities. I can effectively handle and analyse the sensor data
using TensorFlow’s optimised computational libraries, which enables the creation of precise
and reliable autoencoder models.
Another key benefit of TensorFlow is its model construction flexibility. I can quickly design
and train several kinds of autoencoders, experiment with various architectures, and fine-tune
the models in accordance with the particular requirements of valve machine monitoring thanks
to TensorFlow’s high-level API, TensorFlow Keras. This versatility enables me to alter the
autoencoder architecture and enhance its functionality for applications involving anomaly de-
tection.
The training and inference phases of the autoencoder models are significantly accelerated
by TensorFlow’s easy connection with GPUs. TensorFlow maximises computational efficiency
and cuts down on model training time by taking advantage of GPU capability. When working
with large-scale valve machine datasets, this functionality is crucial because it speeds up testing
and iteration.
The robust ecosystem of TensorFlow is essential to the success of my thesis work. I have
access to a multitude of resources inside the TensorFlow ecosystem, including pre-trained mod-
els, libraries, and tools, which I can use to improve the efficiency and performance of my
autoencoder-based anomaly detection system. TensorFlow may also be seamlessly integrated
with a wide range of data processing, visualisation, and evaluation tools, enabling a thorough
study of the machine data. Additionally, TensorFlow is compatible with other well-known
Python libraries that I describe below.
I used the Audiomentations library to improve the performance of my anomaly detection
system in my thesis on valve machine monitoring. The extensive collection of audio data
augmentation methods offered by Audiomentations was helpful in enhancing the robustness
and generalizability of my model.
I was able to apply several augmentation tactics to the valve machine sound data by includ-
ing Audiomentations in my thesis study. As a result, I was able to create augmented samples
and efficiently increase the training dataset, overcoming issues like data scarcity and class im-
balance. I was able to recreate real-world situations and add variations to the audio signals
thanks to the library's wide range of augmentation techniques, such as time stretching, pitch
shifting, background noise injection, and more.
The smooth integration of Audiomentations with well-known deep learning frameworks
like TensorFlow and PyTorch was one of the main benefits of utilising it. The library’s user-
friendly interface made it simple to incorporate audio data augmentation into my workflow for
training models. This removed the need for intricate custom code implementations, saving time
and guaranteeing correct and consistent dataset supplementation.
Additionally, Audiomentations provided an effective implementation that was speed- and
memory-optimized, which was essential for managing big audio files. Because of its adaptabil-
ity to different audio file formats, I was able to deal with the 16kHz sound samples in my thesis
without any hassle.
I was able to run thorough experiments and assess the effects of various augmentation
tactics on the effectiveness of my anomaly detection system by utilising Audiomentations in
my thesis. The usage of Audiomentations improves the model’s resistance to fluctuations and
noise present in real-world circumstances while also improving the model’s capacity to detect
anomalies in valve machine sound.
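As a rough illustration of such a pipeline, the sketch below composes a few Audiomentations transforms and applies them to one clip. The particular transforms, value ranges, and probabilities are illustrative choices, not the exact configuration used in this thesis.

from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift

# Each transform fires with probability p, so every pass over the training data
# yields a slightly different augmented version of the same clip.
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),  # background noise injection
    TimeStretch(min_rate=0.9, max_rate=1.1, p=0.5),                     # time stretching
    PitchShift(min_semitones=-2, max_semitones=2, p=0.5),               # pitch shifting
])

# audio: 1-D float32 NumPy array at the dataset's 16 kHz sampling rate
# augmented = augment(samples=audio, sample_rate=16000)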
I used the Keras deep learning framework as a key tool for creating and refining my anomaly
detection model for my thesis on valve machine monitoring. The construction of intricate neu-
ral network topologies was facilitated by Keras’ high-level and user-friendly interface, which
also sped up the entire model creation procedure.
The simplicity and usability of Keras were major factors in my decision to utilise it for my
thesis. It is the perfect tool for quick iteration and exploration of diverse strategies because
of its straightforward syntax and modular design, which allowed me to quickly prototype and
experiment with various network designs. Additionally, Keras provided a large selection of
pre-built layers, activation functions, and optimisation methods that made it much easier to
create sophisticated neural network models for my goal of monitoring valve machines.
The smooth integration of Keras with TensorFlow, a potent open-source machine learning
framework, was another important benefit. Through this integration, I was able to take advan-
tage of TensorFlow’s broad capability while still utilising Keras’ ease of use and flexibility. I
had complete control over the model’s behaviour and performance because I could use low-
level TensorFlow operations as needed.
Additionally, Keras offered comprehensive GPU acceleration support, allowing me to take
advantage of the GPUs’ computing capacity and quicken the training process. This greatly
decreased the training time and made experimentation and model improvement more efficient,
which was especially helpful when working with large-scale datasets and intricate network
topologies. You can read more about Keras at [30].

3.4 CNN Architecture


I chose to use Convolutional Neural Network (CNN) architectures for my thesis on valve status
monitoring since they are ideally suited for this particular application. CNNs have become
widely used in the field of computer vision and have successfully completed tasks requiring the
analysis of images. But they are highly suited for analysing sensor data and finding anomalies
in valve machines because of their inherent qualities.
Local patterns and features in the input data are particularly well-captured by CNNs. These
patterns can be an indication of typical or aberrant machine behaviour in the context of valve
condition monitoring. The CNN can automatically learn and extract pertinent information from
the sensor data by using convolutional layers, enabling the detection of tiny irregularities that
might not be visible to human observers.
CNN architectures are distinguished by their multi-layered, hierarchical structure, which
gradually learns increasingly abstract representations of the input data. This hierarchical feature
extraction is advantageous for valve status monitoring because it enables the CNN to collect
both high-level and low-level characteristics, such as complicated temporal correlations and
frequency patterns, in the sensor data. The CNN can distinguish between regular and abnormal
machine states by learning these hierarchical representations.
Noise, varying operating conditions, and other sources of variability are frequently present
in valve machine data. Given their demonstrated resilience to such influences, CNN architec-
tures are excellent for addressing the inherent noise and variability of real-world sensor data.
The dependability and generalisation abilities of the valve condition monitoring system are
improved by using regularisation and dropout techniques to successfully manage noisy and
defective data.
In conclusion, the ability to extract localised features, translation invariance, hierarchical
representation learning, robustness to noise and variability, and the availability of transfer learn-
ing and pre-trained models were the factors that led me to choose CNN architectures for my
thesis on valve condition monitoring. These characteristics make CNNs well suited for sensor
data analysis and anomaly detection in valve machines, allowing me to create a strong and
efficient monitoring system to guarantee the best operation and maintenance of valve systems.
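To make this architectural discussion concrete, the listing below sketches a small convolutional autoencoder in Keras. The assumed input shape (a 128 x 128 single-channel log-Mel spectrogram patch) and the layer widths are illustrative assumptions and do not reproduce the exact architecture used in this thesis.

import tensorflow as tf
from tensorflow.keras import layers, models

def build_conv_autoencoder(input_shape=(128, 128, 1)):
    # Assumed input: a single-channel log-Mel spectrogram patch.
    inputs = tf.keras.Input(shape=input_shape)

    # Encoder: strided convolutions progressively extract local spectro-temporal features.
    x = layers.Conv2D(32, 3, strides=2, padding="same", activation="relu")(inputs)
    x = layers.Conv2D(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2D(128, 3, strides=2, padding="same", activation="relu")(x)

    # Decoder: transposed convolutions reconstruct the input spectrogram.
    x = layers.Conv2DTranspose(128, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
    x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
    outputs = layers.Conv2D(1, 3, padding="same", activation="linear")(x)

    return models.Model(inputs, outputs, name="conv_autoencoder")

model = build_conv_autoencoder()
model.summary()

This reconstruction-based setup mirrors the autoencoder approach introduced in Section 3.2.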

3.5 Training Parameters


In my thesis on monitoring valve machines, I used a batch size of 64 and trained the models
over a period of 100 epochs. The number of epochs denotes the total number of times the
complete dataset is passed through the model during training, whereas the batch size refers to
the number of samples processed in each iteration during the training phase.
I tried to balance computational effectiveness and model performance by using a batch size
of 64. Greater batch sizes enable faster training by utilising the parallel processing capability
of contemporary hardware to process more samples. However, it’s crucial to take into account
the system’s memory constraints and avoid using extremely high batch sizes that could cause
memory overflow or slower convergence.
Training for 100 epochs gave the models sufficient opportunity to learn from the dataset. The
model sees the full dataset once per epoch, allowing it to gradually improve its performance. The
choice of 100 epochs balanced training time against acceptable convergence and generalisation
performance. It is worth noting that the ideal number of epochs may change depending on the
complexity of the problem, the size of the dataset, and the design of the model.
By employing a batch size of 64 and training the models for 100 epochs, I aimed to ensure
successful training and convergence while managing computational resources effectively. These
decisions were made after careful analysis of the dataset characteristics, the model complexity, and
the available computational infrastructure, with the end goal of obtaining accurate and dependable
results in the context of valve machine monitoring.
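For reference, the listing below sketches the corresponding training call in Keras. Only the batch size of 64 and the 100 epochs are taken from the text above; the placeholder data, the validation split, and the optimiser/loss choice (discussed further in Sections 3.6 and 3.7) are illustrative assumptions.

import numpy as np

# Placeholder stand-in for the preprocessed normal-sound features; the real
# dataset loading and feature extraction are not shown here.
x_train = np.random.rand(1024, 128, 128, 1).astype("float32")

# 'model' is the autoencoder sketched in Section 3.4.
model.compile(optimizer="nadam", loss="mse")  # optimiser detailed in Section 3.7

history = model.fit(
    x_train, x_train,       # autoencoder: the input is also the reconstruction target
    batch_size=64,          # batch size stated in this section
    epochs=100,             # number of epochs stated in this section
    validation_split=0.1,   # assumed hold-out fraction for monitoring convergence
    shuffle=True,
)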

3.6 Loss Function


The Mixup loss function [26] was incorporated into my training strategy. Mixup promotes
smoothness and regularisation in the learned representations by combining pairs of input samples
and their corresponding labels during training.
The Mixup loss function is implemented through convex combinations of pairs of training
samples and labels. By blending the features and labels of two randomly chosen samples from the
dataset with a random mixing weight, it effectively creates virtual training examples. This blending
procedure helps build a model that is more resistant to noise and to small fluctuations in the input data.
I wanted to strengthen the model’s generalizability and capacity to handle unusual sound
patterns in valve machines by applying the Mixup loss function. By combining data from sev-
eral samples, this loss function allows the model to develop more meaningful and generalised
representations. By encouraging smoother decision boundaries in the learnt feature space, it
can successfully prevent overfitting and enhance the model’s capacity to detect abnormalities.
The efficiency of the Mixup loss function in fostering regularisation and enhancing the
model’s performance in various machine learning tasks was the primary factor in the decision
to deploy it. I sought to improve the model’s resilience and anomaly detection capabilities in
the specific setting of valve machine monitoring by including this loss function into my training
procedure.
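The listing below gives a minimal sketch of the mixing step, following the standard mixup formulation [26]: a mixing coefficient is drawn from a Beta distribution and convex combinations of randomly paired samples and labels are formed. The alpha value and the placeholder data are illustrative assumptions.

import numpy as np

def mixup_batch(x, y, alpha=0.2):
    # Draw a mixing coefficient from a Beta(alpha, alpha) distribution.
    lam = np.random.beta(alpha, alpha)
    # Randomly pair each sample with another sample from the same batch.
    idx = np.random.permutation(len(x))
    # Convex combinations of inputs and labels produce virtual training examples.
    x_mixed = lam * x + (1.0 - lam) * x[idx]
    y_mixed = lam * y + (1.0 - lam) * y[idx]
    return x_mixed, y_mixed

# Example usage with placeholder data (batch of 64 spectrogram patches, 3 classes).
x = np.random.rand(64, 128, 128, 1).astype("float32")
y = np.eye(3)[np.random.randint(0, 3, size=64)]
x_mix, y_mix = mixup_batch(x, y)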

3.7 NAdam Optimizer


My deep learning models were trained using the NAdam optimizer as the optimisation strategy.
The model’s overall performance and the pace at which it converges are both impacted by the
optimizer chosen.

The Nesterov accelerated gradient (NAG) method is incorporated into the NAdam opti-
mizer, which is an extension of the Adam optimizer. In order to achieve faster and more stable
convergence, it blends adaptive learning rate adjustment with the advantages of NAG.
NAdam has been shown to enhance generalisation, improving model performance on unseen data.
By adding NAG, it effectively reduces overshooting in parameter updates, enhancing the model's
ability to generalise to new cases and handle anomalies.
Increased resistance to noise and outliers in the training data is shown by the NAdam op-
timizer. This can be especially useful for monitoring valve machines because the recorded
sound data may contain abnormalities or unexpected fluctuations. By reducing the effect of
noisy samples during training, the NAdam optimizer aids in the model’s learning of more reli-
able representations.
Like Adam, NAdam dynamically adapts the learning rate of each parameter based on its past
gradients. This adaptability keeps the effective learning rate appropriate for each parameter, which
makes training more consistent and effective.
The NAdam optimizer deals with some of the optimisation problems that Adam has, namely
the sensitivity to learning rate selection. By minimising the oscillations and instabilities that
can happen during training, it demonstrates enhanced optimisation dynamics. In the context
of valve machine monitoring, where precise and trustworthy anomaly detection is required for
ensuring the machines’ normal operation, this stability is crucial.
In some cases, including when working with sparse gradients, the NAdam optimizer per-
forms better than Adam. Deep learning models frequently use sparse gradients, particularly
when working with high-dimensional data like audio signals. The model can learn more quickly
and accurately capture the underlying patterns and anomalies contained in the valve machine
sound data thanks to the NAdam optimizer’s adept handling of sparse gradients.
The NAdam optimizer consistently outperforms competing algorithms on a variety of tasks
and datasets. In the context of monitoring valve machines, where the anomaly detection sys-
tem must be strong and dependable across various operating circumstances, environments, and
anomaly kinds, this consistency is beneficial. The system’s overall performance and stability
are influenced by the NAdam optimizer’s ability to consistently optimise the model parameters.
The NAdam optimizer has become increasingly popular and is widely used across a variety of
fields of study. It has demonstrated encouraging results in improving the performance of deep
learning models in anomaly detection applications. Its widespread adoption and study have resulted
in a wealth of information, resources, and community support, making it a sound choice for
applications involving valve machine monitoring.
In choosing the NAdam optimizer for my thesis, I hoped to take advantage of these benefits and
to address the problems related to anomaly detection in valve machines. Its improved optimisation
dynamics, handling of sparse gradients, consistency in performance, and wide adoption align well
with the goal of identifying anomalies in valve machine sound data precisely and efficiently.
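For completeness, the listing below shows how the optimizer can be configured explicitly in Keras; the hyperparameter values shown are the Keras defaults and are assumptions rather than the exact settings used in this thesis.

import tensorflow as tf

# Explicit NAdam configuration (values shown are the Keras defaults).
optimizer = tf.keras.optimizers.Nadam(
    learning_rate=0.001,
    beta_1=0.9,
    beta_2=0.999,
    epsilon=1e-7,
)

# 'model' refers to the autoencoder sketched in Section 3.4.
model.compile(optimizer=optimizer, loss="mse")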

3.8 K-means clustering for anomaly detection


K-means clustering was used in this thesis as a primary method for examining and spotting
anomalies in the monitoring of valve machines. An established unsupervised learning ap-
proach called K-means clustering divides data points into K clusters based on how similar
they are. K-means clustering can successfully separate anomalous sound patterns during the
testing phase by establishing clusters that represent typical valve machine behaviour during the
training phase.
The use of K-means clustering in valve machine monitoring enables the recognition of
distinctive sound patterns and their correlation with particular operational circumstances. The
technique enables the detection of departures from the predicted behaviour, potentially suggest-
ing abnormalities or problems with the valve machines. It does this by allocating each sound
sample to the closest cluster centroid.
K-means clustering has a number of benefits when used for anomaly detection in valve
machines. First, it is an unsupervised method, so it can adapt to different operating circumstances
without requiring explicit anomaly labels. Additionally, the interpretability of the clustering results
allows a deeper understanding of the underlying sound patterns and their connection to machine
conditions.
This thesis attempts to increase the precision and effectiveness of anomaly detection in
valve machines by using K-means clustering. Operators and maintenance staff may quickly
address possible problems, avoid expensive downtime, and guarantee the efficient running of
valve machines by spotting abnormal sound patterns.
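To illustrate the scoring step, the sketch below fits K-means on feature vectors extracted from normal training sounds and uses the distance to the nearest centroid as an anomaly score. The number of clusters, the feature dimensionality, and the percentile-based threshold are illustrative assumptions, not the settings used in this thesis.

import numpy as np
from sklearn.cluster import KMeans

# Placeholder feature vectors (e.g. learned representations of sound clips).
train_features = np.random.rand(1000, 64)   # normal sounds only (training phase)
test_features = np.random.rand(200, 64)     # normal and anomalous sounds (testing phase)

# Fit K-means on normal data so the centroids describe typical machine behaviour.
kmeans = KMeans(n_clusters=8, n_init=10, random_state=0)
kmeans.fit(train_features)

# Anomaly score: distance to the closest centroid (larger means more anomalous).
anomaly_scores = kmeans.transform(test_features).min(axis=1)

# One possible decision rule: flag scores above a high percentile of training scores.
train_scores = kmeans.transform(train_features).min(axis=1)
threshold = np.percentile(train_scores, 99)
is_anomalous = anomaly_scores > threshold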

3.9 Metrics
Area Under the Curve (AUC) and Partial AUC (pAUC) metrics were used to assess the ef-
fectiveness of the proposed valve machine monitoring system. The capacity of the model to
distinguish between typical and anomalous sound samples is measured in detail by the AUC
metric, which is frequently used for binary classification tasks. It is appropriate for assessing
the efficiency of the anomaly detection system since it takes into account the overall perfor-
mance across various threshold values.
AUC provides a quantitative measure of how well the model separates normal and anomalous
sound samples; values closer to 1 indicate a greater capacity for discrimination and therefore better
performance.
In this thesis, Partial AUC (pAUC) was also used as a supplemental statistic. pAUC concen-
trates on a particular range of false positive rates that are pertinent to the particular application,
as opposed to AUC, which takes into account the entire range of false positive rates. pAUC
can offer a more focused assessment of the system’s performance under particular operating
conditions or false positive rates of interest in the context of valve machine monitoring.

By using AUC and pAUC as evaluation measures, this thesis seeks to offer a thorough and
unbiased assessment of the proposed valve machine monitoring system's ability to spot anomalies
in valve machine sound data. These measures allow a quantitative comparison with existing
techniques and make it easier to pinpoint both the most effective anomaly detection strategies and
the areas where valve machine monitoring methods still need to be improved.
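Both metrics can be computed with scikit-learn as sketched below; roc_auc_score with the max_fpr argument returns a standardized partial AUC, and the maximum false-positive rate of 0.1 used here is an assumption chosen for illustration.

import numpy as np
from sklearn.metrics import roc_auc_score

# y_true: 1 for anomalous samples, 0 for normal ones (placeholder values).
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
# y_score: anomaly scores produced by the detector (placeholder values).
y_score = np.array([0.10, 0.20, 0.35, 0.40, 0.30, 0.60, 0.80, 0.90])

auc = roc_auc_score(y_true, y_score)
# Partial AUC restricted to low false-positive rates (here FPR <= 0.1);
# scikit-learn reports the standardized (McClish-corrected) value.
pauc = roc_auc_score(y_true, y_score, max_fpr=0.1)

print(f"AUC:  {auc:.4f}")
print(f"pAUC: {pauc:.4f}")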

3.10 Data Augmentation


Data augmentation is an essential machine learning technique for artificially increasing the size
and variety of the training dataset. It entails applying a variety of transformations or perturbations
to the original data to produce augmented samples that are similar to, but slightly different from,
the original samples. Including data augmentation in the training process enhances the robustness,
generalisability, and overall performance of the model.
Data augmentation is crucial in the context of valve machine monitoring because it improves
the system's capacity to spot anomalies in sound data. To this end, four data augmentation
techniques were used in this thesis: Gaussian noise, timestretch, pitchshift, and shift.
Gaussian noise augmentation adds random noise to the audio signals. By adding noise, the model
becomes more resistant to changes in the acoustic environment and better able to distinguish between
typical and abnormal sound patterns. This technique helps the model generalise to unseen data and
improves its ability to detect abnormalities under real-world conditions.
By extending or shortening the length of the audio signals, timestretch augmentation modi-
fies their temporal properties. The model can better represent the various time-scale fluctuations
inherent in the sound data thanks to this transformation. It improves the system’s ability to spot
anomalies that could happen quickly or slowly.
Pitchshift augmentation alters the audio signals’ pitch or frequency content while keeping
their general structure intact. This transformation mimics the pitch changes that might happen
in valve machine sounds as a result of various operating circumstances or problems. Pitchshift
augmentation makes the model more resistant to changes in pitch and improves its ability to
accurately capture aberrant sound patterns over a range of frequency ranges.
Shift augmentation moves the audio waveform slightly forwards or backwards in time. This
technique helps the model learn representations that are invariant to temporal position, enabling it
to recognise abnormalities that may occur at different points within the sound data. Shift
augmentation therefore makes the model more flexible in identifying abnormalities across various
temporal contexts.

This thesis intends to improve the resilience and generalisation ability of the valve ma-
chine monitoring system by utilising these four data augmentation techniques: shift, pitchshift,
timestretch, and Gaussian noise. Combining these augmentation techniques enables the model
to extract more representative and discriminative characteristics from the enhanced data, im-
proving the accuracy and reliability of its anomaly detection.
In conclusion, data augmentation, which uses methods like Gaussian noise, timestretch,
pitchshift, and shift, is crucial for enhancing the effectiveness of the valve machine monitoring
system. The model’s capacity to record variations in sound patterns is improved by these
augmentation techniques, which ultimately aid in the more accurate detection of anomalies in
valve machine sound data.

Figure 3.1: Snippet of the code showing the parameters given to each augmenting function.
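The listing below is a minimal sketch of how such an augmentation pipeline can be assembled with Audiomentations; the parameter ranges are illustrative assumptions and are not necessarily the values shown in Figure 3.1.

import numpy as np
from audiomentations import Compose, AddGaussianNoise, TimeStretch, PitchShift, Shift

# Pipeline combining the four augmentation techniques described in Section 3.10.
# Each transform is applied with probability p; the ranges below are assumptions.
augment = Compose([
    AddGaussianNoise(min_amplitude=0.001, max_amplitude=0.015, p=0.5),
    TimeStretch(min_rate=0.8, max_rate=1.25, p=0.5),
    PitchShift(min_semitones=-4, max_semitones=4, p=0.5),
    Shift(p=0.5),
])

# Apply to a placeholder 10-second waveform sampled at 16 kHz.
samples = np.random.uniform(low=-1.0, high=1.0, size=16000 * 10).astype(np.float32)
augmented = augment(samples=samples, sample_rate=16000)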
Chapter 4

Testing and Results

           Source            Target
           AUC      pAUC     AUC      pAUC
Valve00    99.76    98.73    70.36    52.0
Valve01    92.88    83.78    68.72    54.1
Valve02    100.0    100.0    92.72    81.0

Table 4.1: Results Before Modification.

           Source            Target
           AUC      pAUC     AUC      pAUC
Valve00    99.76    98.73    70.36    52.0
Valve01    92.88    83.78    68.72    54.1
Valve02    100.0    100.0    92.72    81.0

Table 4.2: Results After Modification.

Chapter 5

Conclusion and Future Work

5.1 Conclusion
In conclusion, the NAdam optimizer and data augmentation techniques have proven useful for
monitoring valve machines. Compared with conventional optimizers such as Adam, the NAdam
optimizer offers better convergence and generalisation thanks to its adaptive learning rate and
momentum, which enables the model to navigate challenging optimisation landscapes and improves
its performance.
The use of data augmentation techniques such as Gaussian noise, temporal stretching, pitch shift,
and shift has also successfully improved the model's capacity to generalise to new data and to
identify anomalies in valve machine sounds. By introducing controlled variations in the training
data, data augmentation exposes the model to a larger range of realistic scenarios, reducing
overfitting and increasing its robustness.
Using the NAdam optimizer together with data augmentation, a more trustworthy and precise
valve machine monitoring system has been created. The model shows improved detection ability,
with greater accuracy and better performance in spotting anomalies and unusual sound patterns.
Overall, the combination of the NAdam optimizer and data augmentation provides a strong
foundation for efficient and trustworthy valve machine monitoring. These methods aid in the
creation of reliable and precise anomaly detection systems, opening the way for improved
operational and maintenance effectiveness in industrial settings.

5.2 Future Work and Limitations


It might be advantageous in the future to combine sound data with information from other
sensors, such as vibration sensors or temperature sensors. This integration would give a more
complete picture of the machine’s health and make anomaly identification more precise.


Real-time monitoring capabilities would be beneficial for quick detection of and reaction to
anomalies. Real-time monitoring of valve machine systems would require efficient algorithms and
careful optimisation of the computational requirements.
Valve machines can function in a variety of environmental settings, which causes alterations
in sound patterns. The model’s ability to generalise and adapt to various operating situations
would be improved by increasing its robustness to these fluctuations.
A worthwhile direction for valve machine monitoring would be to investigate unsupervised
anomaly detection techniques. Unsupervised approaches are more versatile and responsive to
changing machine settings because they can identify anomalies without relying on labelled
data.
An important future path would be to expand the valve machine monitoring system to ac-
commodate large-scale industrial situations. For wider implementation, it would be essential to
provide scalability and effective processing of enormous volumes of data from various devices.
The augmented samples may contain some degree of information loss or distortion as a re-
sult of the data augmentation procedures. The model’s capacity to faithfully represent particular
features or patterns in the data may be impacted by this loss.
Overfitting can still occur in situations when augmented samples are too similar to the
training data, despite the fact that data augmentation works to reduce this risk. To avoid this
problem, the augmentation settings must be carefully chosen and tuned.
Different augmentation methods may differ in their efficacy and applicability for particular
kinds of sound data. To ensure effective augmentation, it is crucial to assess and choose the
best procedures depending on the peculiarities of valve machine sounds.
The variety of augmented samples might be constrained by the augmentation techniques
that are accessible. The model’s capacity to generalise to other kinds of anomalies could be
further improved by including more sophisticated and diverse augmentation techniques.
Further advancements might result from researching and creating advanced augmentation
methods designed specifically for valve machine sounds. This includes methods that enhance
the variety and realism of the augmented samples by taking into account the particular traits
and patterns found in valve machine sounds.
The augmentation process might be improved by looking at techniques for automated se-
lection and adaptation of augmentation procedures based on the unique properties of the valve
machine data. This could entail using machine learning algorithms to determine the best aug-
mentation techniques for certain machine circumstances.
The relevance and efficiency of the generated samples can be improved by taking domain-
specific knowledge into account and incorporating it into the data augmentation process. The
precise auditory properties and abnormalities relevant to valve machines could be captured
using domain-specific augmentation techniques.
It would be beneficial to do thorough analyses to determine the effects of various augmen-
tation approaches on the performance, generalisation, and resilience of models. The choice and
fine-tuning of augmentation strategies for better valve machine monitoring can be guided by
this analysis.
Overall, even though data augmentation is an effective method for enhancing model perfor-
mance, its advantages for valve machine monitoring must be carefully considered, along with
potential future developments.
Appendix

Appendix A

Pre/Post Test

Level One

Appendix B

Lists

List of Tables

4.1 Results Before Modification. . . . . . . . . . . . . . . . . . . . . . . . . . . . 30


4.2 Results After Modification. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30

List of Figures

2.1 Spectrogram of a part of the original sound. The X-axis shows the time in
seconds. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 6
2.2 Procedure of anomalous sound simulation using autoencoder. . . . . . . . . . . 7
2.3 Evaluation Results. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
2.4 Examples of log-Mel spectrograms of the original sound. . . . . . . . . . . . . 9
2.5 Examples of spectrograms for each machine type. . . . . . . . . . . . . . . . . 11
2.6 Comparison of convolutional blocks for different architectures. ShuffleNet uses
Group Convolutions and shuffling, it also uses conventional residual approach
where inner blocks are narrower than output. . . . . . . . . . . . . . . . . . . 12
2.7 Block diagram of disentangled anomaly detector. In the figure, NN stands for
Nearest Neighbor. In the training phase, exclusive latent spaces were assigned
to sections and attributes. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.8 MobileFaceNet Architecture . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.9 The procedure of batch mixing strategy. . . . . . . . . . . . . . . . . . . . . . 17
2.10 Structure of the proposed anomalous sound detection system. . . . . . . . . . . 18

3.1 Snippet of the code showing the parameters given to each augmenting
function . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29

Bibliography

[1] Y. Koizumi, S. Saito, H. Uematsu, and N. Harada, “Optimizing acoustic feature extractor for
anomalous sound detection based on Neyman-Pearson lemma,” in Proc. 25th European Signal
Processing Conference (EUSIPCO), 2017.

[2] Y. Kawaguchi and T. Endo, “How can we detect anomalies from subsampled audio signals?” in
Proc. 27th IEEE International Workshop on Machine Learning for Signal Processing (MLSP), 2017.

[3] Y. Koizumi, S. Saito, H. Uematsu, Y. Kawachi, and N. Harada, “Unsupervised detection of
anomalous sound based on deep learning and the Neyman-Pearson lemma,” IEEE/ACM Transactions
on Audio, Speech, and Language Processing, vol. 27, no. 1, pp. 212–224, Jan. 2019.

[4] Y. Kawaguchi, R. Tanabe, T. Endo, K. Ichige, and K. Hamada, “Anomaly detection based on an
ensemble of dereverberation and anomalous sound extraction,” in Proc. 44th IEEE International
Conference on Acoustics, Speech and Signal Processing (ICASSP), 2019.

[5] Y. Koizumi, S. Saito, M. Yamaguchi, S. Murata, and N. Harada, “Batch uniformization for
minimizing maximum anomaly score of DNN-based anomaly detection in sounds,” in Proc. IEEE
Workshop on Applications of Signal Processing to Audio and Acoustics (WASPAA), 2019.

[6] K. Suefusa, T. Nishida, H. Purohit, R. Tanabe, T. Endo, and Y. Kawaguchi, “Anomalous sound
detection based on interpolation deep neural network,” in Proc. 45th IEEE International Conference
on Acoustics, Speech and Signal Processing (ICASSP), 2020.

[7] H. Purohit, R. Tanabe, K. Suefusa, T. Endo, Y. Nikaido, and Y. Kawaguchi, “Deep autoencoding
GMM-based unsupervised anomaly detection in acoustic signals and its hyper-parameter
optimization,” in Proc. 5th Workshop on Detection and Classification of Acoustic Scenes and Events
(DCASE), 2020.

[8] Y. Koizumi, Y. Kawaguchi, K. Imoto, T. Nakamura, Y. Nikaido, R. Tanabe, H. Purohit,
K. Suefusa, T. Endo, M. Yasuda, and N. Harada, “Description and discussion on DCASE2020
challenge task2: Unsupervised anomalous sound detection for machine condition monitoring,” in
Proc. 5th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020.

[9] Y. Kawaguchi, K. Imoto, Y. Koizumi, N. Harada, D. Niizumi, K. Dohi, R. Tanabe, H. Purohit,
and T. Endo, “Description and discussion on DCASE 2021 challenge task 2: Unsupervised
anomalous detection for machine condition monitoring under domain shifted conditions,” in Proc.
6th Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2021.

[10] N. Harada, D. Niizumi, D. Takeuchi, Y. Ohishi, M. Yasuda, and S. Saito, “ToyADMOS2:
Another dataset of miniature-machine operating sounds for anomalous sound detection under
domain shift conditions,” in Proc. 6th Workshop on Detection and Classification of Acoustic Scenes
and Events (DCASE), 2021.

[11] K. Dohi, T. Nishida, H. Purohit, R. Tanabe, T. Endo, M. Yamamoto, Y. Nikaido, and
Y. Kawaguchi, “MIMII DG: Sound dataset for malfunctioning industrial machine investigation and
inspection for domain generalization task,” 2022.

[12] R. Giri, S. V. Tenneti, F. Cheng, K. Helwani, U. Isik, and A. Krishnaswamy, “Self-supervised
classification for detecting anomalous sounds,” in Proc. 5th Workshop on Detection and
Classification of Acoustic Scenes and Events (DCASE), 2020.

[13] P. Primus, V. Haunschmid, P. Praher, and G. Widmer, “Anomalous sound detection as a simple
binary classification problem with careful selection of proxy outlier examples,” in Proc. 5th
Workshop on Detection and Classification of Acoustic Scenes and Events (DCASE), 2020.

[14] T. Inoue, P. Vinayavekhin, S. Morikuni, S. Wang, T. H. Trong, D. Wood, M. Tatsubori, and
R. Tachibana, “Detection of anomalous sounds for machine condition monitoring using classification
confidence,” in Proc. 5th Workshop on Detection and Classification of Acoustic Scenes and Events
(DCASE), 2020.

[15] M. Sandler, A. Howard, M. Zhu, A. Zhmoginov, and L.-C. Chen, “MobileNetV2: Inverted
residuals and linear bottlenecks,” in Proc. 31st IEEE/CVF Conference on Computer Vision and
Pattern Recognition (CVPR), 2018.

[16] Y. Zeng, H. Liu, L. Xu, Y. Zhou, and L. Gan, “Robust anomaly sound detection framework for
machine condition monitoring,” DCASE2022 Challenge, Tech. Rep., 2022.

[17] I. Kuroyanagi, T. Hayashi, K. Takeda, and T. Toda, “Two-stage anomalous sound detection
systems using domain generalization and specialization techniques,” DCASE2022 Challenge, Tech.
Rep., 2022.

[18] F. Xiao, Y. Liu, Y. Wei, J. Guan, Q. Zhu, T. Zheng, and J. Han, “The DCASE2022 challenge
task 2 system: Anomalous sound detection with self-supervised attribute classification and
GMM-based clustering,” DCASE2022 Challenge, Tech. Rep., 2022.

[19] Y. Deng, J. Liu, and W.-Q. Zhang, “Aithu system for unsupervised anomalous detection of
machine working status via sounding,” DCASE2022 Challenge, Tech. Rep., 2022.

[20] S. Venkatesh, G. Wichern, A. Subramanian, and J. Le Roux, “Disentangled surrogate task
learning for improved domain generalization in unsupervised anomalous sound detection,”
DCASE2022 Challenge, Tech. Rep., 2022.

[21] Y. Wei, J. Guan, H. Lan, and W. Wang, “Anomalous sound detection system with self-challenge
and metric evaluation for DCASE2022 challenge task 2,” DCASE2022 Challenge, Tech. Rep., 2022.

[22] K. Morita, T. Yano, and K. Tran, “Comparative experiments on spectrogram representation for
anomalous sound detection,” DCASE2022 Challenge, Tech. Rep., 2022.

[23] J. Bai, Y. Jia, and S. Huang, “Jless submission to DCASE2022 task2: Batch mixing strategy
based method with anomaly detector for anomalous sound detection,” DCASE2022 Challenge,
Tech. Rep., 2022.

[24] S. Verbitskiy, M. Shkhanukova, and V. Vyshegorodtsev, “Unsupervised anomalous sound
detection using multiple time-frequency representations,” DCASE2022 Challenge, Tech. Rep., 2022.

[25] K. Wilkinghoff, “An outlier exposed anomalous sound detection system for domain
generalization in machine condition monitoring,” DCASE2022 Challenge, Tech. Rep., 2022.

[26] H. Zhang, M. Cisse, Y. N. Dauphin, and D. Lopez-Paz, “mixup: Beyond empirical risk
minimization,” in International Conference on Learning Representations, 2018.

[27] I. Nejjar, J. P. J. Meunier-Pion, G. M. Frusque, and O. Fink, “DCASE challenge 2022:
Self-supervised learning pre-training, training for unsupervised anomalous sound detection,”
DCASE2022 Challenge, Tech. Rep., 2022.

[28] TensorFlow. https://www.tensorflow.org/api_docs.

[29] Blockly. https://developers.google.com/blockly.

[30] Keras. https://keras.io/api/.
