
DeepFilter: An Instrumental Baseline for Accurate and Efficient Process Monitoring

Hao Wang, Zhichao Chen†, Licheng Pan, Xiaoyu Jiang, Yichen Song, Qunshan He, Xinggao Liu†, Member, IEEE

Abstract—Effective process monitoring is increasingly vital in industrial automation for ensuring operational safety, necessitating both high accuracy and efficiency. Although Transformers have demonstrated success in various fields, their canonical form based on the self-attention mechanism is inadequate for process monitoring due to two primary limitations: (1) the step-wise correlations captured by the self-attention mechanism struggle to represent discriminative patterns in monitoring logs because individual steps lack semantics, thus compromising accuracy; (2) the quadratic computational complexity of self-attention hampers efficiency. To address these issues, we propose DeepFilter, a Transformer-style framework for process monitoring. The core innovation is an efficient filtering layer that excels at capturing long-term and periodic patterns with reduced complexity. Equipped with the global filtering layer, DeepFilter enhances both accuracy and efficiency, meeting the stringent demands of process monitoring. Experimental results on real-world process monitoring datasets validate DeepFilter's superiority in terms of accuracy and efficiency compared to existing state-of-the-art models.¹

Index Terms—Energy security, Industrial time series analytics, Process monitoring.

This work is supported by the National Natural Science Foundation of China (11975207, 12075212, 12105246, 62073288), the National Key Research and Development Program of China (Grant No. 2021YFC2101100), and the Zhejiang University NGICS Platform.

Hao Wang, Zhichao Chen, Licheng Pan, Xiaoyu Jiang, Yichen Song, Qunshan He, and Xinggao Liu are with the State Key Laboratory of Industrial Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou 310027, China (e-mail: haohaow@zju.edu.cn; 12032042@zju.edu.cn; 22132045@zju.edu.cn; jiangxiaoyu@zju.edu.cn; syc_1203@zju.edu.cn; heqs@zju.edu.cn; lxg@zju.edu.cn).

¹ Code is available in an anonymous repository: https://anonymous.4open.science/r/DeepFilter-BC3B. The dose rate data is updated daily on our project website, and we commit to releasing it after peer review completes.

I. INTRODUCTION

MONITORING quality variables through advanced time-series analysis is paramount for ensuring operational safety in industrial automation across a broad range of applications [1–3]. These techniques are widely employed in process engineering [4], manufacturing [5, 6], and energy conversion [7], where the growing demands for higher efficiency and cost-effectiveness often drive equipment to operate under extreme conditions, thereby increasing the likelihood of catastrophic failures [7, 8]. For example, in chemical engineering, the Haber-Bosch process for ammonia synthesis requires temperatures above 400°C and pressures exceeding 200 bar, escalating the risks of equipment failure and hazardous leaks [9, 10]. Similarly, nuclear power plants operate reactors at high pressures and temperatures to maximize output, heightening the threat of catastrophic radiation leakage [11, 12]. These examples underscore the pressing need for advanced process monitoring systems to mitigate risks, reduce costs, and ensure safety, reliability, and efficiency in industrial automation [3, 13, 14].

Process monitoring aims to estimate next-step values of quality variables from historical logs, identifying anomalies when estimated values deviate significantly from observations. Unlike other industrial data analytics tasks, such as fault diagnosis [15, 16] or soft sensing [17–19], process monitoring requires not only high accuracy but also operational efficiency for real-time decision-making. In large-scale chemical plants [9, 10], for instance, precise estimates help detect subtle shifts in temperature or pressure, enabling operators to intervene before potential reactor instabilities escalate. Equally important, real-time efficiency ensures these corrective measures are executed promptly, minimizing the risk of safety incidents or costly downtime. Thus, an instrumental process monitoring system should predict the next-step values of quality variables both accurately and efficiently, fulfilling its pivotal role in safeguarding industrial equipment.

To achieve accurate and efficient process monitoring, a variety of data-driven algorithms have been developed, evolving from early identification methods to advanced deep learning models. Traditional methods such as ARIMA [20] offer computational simplicity but are limited in handling the nonlinear complexity of industrial data. Statistical methods, including decision trees [21], XGBoost [22], and regression-based approaches [23], provide improved accuracy through richer feature extraction but rely heavily on manual engineering and struggle to scale with large datasets. In contrast, deep learning architectures—spanning convolutional [24], recurrent [25, 26], and graph neural networks [27]—enable automated feature extraction and GPU acceleration, driving both performance and speed. Notably, Transformers [28] have emerged as an exemplar solution, featuring a self-attention layer for capturing temporal patterns. This layer dynamically computes weighted dependencies across all time steps in the monitoring logs, capturing step-wise relationships useful for prediction [29]. Additionally, the architecture is optimized for GPU acceleration, making it well suited to processing large-scale monitoring data [28]. These advancements have made Transformers a preferred choice in industrial time-series analytics [29–31].
Despite the promise of Transformers, we contend that canonical self-attention layers present two critical shortcomings that hinder their suitability for the stringent demands of process monitoring. First, step-wise correlations in self-attention fail to adequately represent the discriminative, long-term patterns—such as trends and periodicities—crucial for detecting anomalies in process data. Individual observations in industrial logs often lack semantic richness, hindering the self-attention layer from capturing the discriminative patterns required for accurate monitoring. Second, the quadratic computational complexity of self-attention poses a serious challenge, especially in real-time monitoring applications where low latency is paramount. Consequently, canonical Transformers fall short of delivering the required accuracy and efficiency in process monitoring.

To overcome these limitations, we propose DeepFilter, a refined Transformer architecture that replaces the self-attention layer with a novel global filtering block specifically tailored for process monitoring. This block performs adaptive filtering across the entire temporal sequence, effectively modeling the long-term discriminative patterns. Theoretical analysis further confirms its capacity to enhance representations of long-term discriminative patterns. Moreover, by discarding the quadratic complexity inherent in self-attention, the global filtering block significantly reduces computational overhead. Extensive evaluations on real-world datasets demonstrate that DeepFilter consistently delivers superior accuracy and efficiency relative to state-of-the-art models, highlighting its role as an instrumental baseline for Transformer-based process monitoring.

Organization. Section II provides a detailed description of the DeepFilter architecture. Section III presents a case study on process monitoring for real-world nuclear power plants, demonstrating the improvements in accuracy and efficiency achieved by DeepFilter. Section IV reviews related works in process monitoring and highlights the contribution of this work in the context of existing studies. Finally, we summarize our conclusions and limitations and outline directions for future research.

Fig. 1. Overview of the core components in DeepFilter.

II. METHODOLOGY

In response to the inherent limitations of the Transformer in accurate and efficient process monitoring, this section introduces the DeepFilter approach. DeepFilter replaces the self-attention layer in the Transformer with an efficient filtering layer for fusing information across different steps, effectively enhancing accuracy and operational efficiency for process monitoring.

A. Problem Definition

A monitoring log consists of a chronological sequence of observations [L(1), L(2), . . . , L(P)], where each L(t) ∈ R^{1×D_in} represents the observation at the t-th step with D_in covariates. We define X ∈ R^{T×D_in} as the historical sequence and y^H ∈ R as the quality variable, where T is the length of the historical window and H is the monitoring horizon. At an arbitrary time step t, the historical sequence is represented as X = [L(t − T + 1), . . . , L(t)], and the corresponding quality variable is specified as the final feature in L(t + H).

The objective of process monitoring is to develop a predictive model g : R^{T×D_in} → R that generates the quality variable prediction g(X) = ŷ^H ≈ y^H. In practical process monitoring applications, the training dataset predominantly comprises normal operational logs. Consequently, anomalies are detected as significant deviations between the actual and predicted quality variable values.
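To make the windowing concrete, the following sketch assembles (X, y^H) training pairs from a monitoring log array. The assumption that the quality variable occupies the last column, as well as the function name, are illustrative choices rather than the paper's exact data layout.

```python
import numpy as np

def make_samples(log: np.ndarray, T: int, H: int):
    """Slice a monitoring log of shape (P, D_in) into (X, y) pairs.

    X[i] stacks T consecutive observations L(t-T+1), ..., L(t); y[i] is the
    quality variable at step t+H, here assumed to be the last column of the
    log (an illustrative convention, not the paper's exact data layout).
    """
    X, y = [], []
    for t in range(T - 1, log.shape[0] - H):
        X.append(log[t - T + 1 : t + 1])   # historical window, shape (T, D_in)
        y.append(log[t + H, -1])           # quality variable H steps ahead
    return np.stack(X), np.asarray(y)

# Toy example: 100 steps, 5 covariates, window T=16, horizon H=1.
rng = np.random.default_rng(0)
X, y = make_samples(rng.normal(size=(100, 5)), T=16, H=1)
print(X.shape, y.shape)  # (84, 16, 5) (84,)
```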
B. Global Filtering Block

The fundamental component of DeepFilter is the Global Filtering (GF) block, as illustrated in Fig. 1, which integrates an efficient filtering layer for mixing information across different time steps and a feed-forward network (FFN) layer for mixing information across different channels.

Let Z ∈ R^{T×D} denote the input sequence to the k-th GF block, where T is the window length of historical monitoring logs and D is the hidden dimension. The global filtering process begins by transforming Z from the time domain to the frequency domain using the Fast Fourier Transform (FFT):

Z^{(F)} = F(Z),   (1)

where F denotes the FFT operation. In the frequency domain, noisy and discriminative patterns are often easily isolated. Typically, noisy patterns often reside in high-frequency components [32–34], while discriminative patterns often emerge in low-frequency components [35, 36]. To extract the discriminative patterns and suppress the noisy ones, we perform a filtering operation using the Hadamard product:

Z̄^{(F)} = Z^{(F)} ⊙ W̄,   (2)

where W̄ ∈ C^{T×D} contains learnable parameters that are optimized to discern discriminative patterns during model training. The filtered sequence is then transformed back to the time domain via the inverse FFT:

Z̄ = F^{-1}(Z̄^{(F)}),   (3)
which is immediately followed by a residual connection and layer normalization to stabilize the training process and mitigate gradient degradation:

R = LayerNorm(Z̄ + Z),   (4)

which is the output of the efficient filtering layer. To demonstrate the efficacy of the operations in this layer, we restate the convolution theorem below.

Theorem II.1. Suppose W = F^{-1}(W̄) and let "∗" denote the circular convolution operator; then the filtered sequence in (3) can be acquired by performing the circular convolution

Z̄ = W ∗ Z.

Proof. It is equivalent to prove F(W ∗ Z) = Z^{(F)} ⊙ W̄. To this end, the n-th element of the circular convolution above can be expressed as follows:

Z̄_n = Σ_{m=0}^{T−1} W_m Z_{(n−m) mod T}.

On this basis, the FFT of Z̄ is denoted as Z̄^{(F)}, where the ω-th element is given by:

Z̄^{(F)}_ω = Σ_{n=0}^{T−1} Σ_{m=0}^{T−1} W_m Z_{(n−m) mod T} e^{−(2πi/T) nω}
         = Σ_{m=0}^{T−1} W_m e^{−(2πi/T) mω} Σ_{n=0}^{T−1} Z_{(n−m) mod T} e^{−(2πi/T) (n−m)ω}
         = Σ_{m=0}^{T−1} W_m e^{−(2πi/T) mω} ( Σ_{n=m}^{T−1} Z_{n−m} e^{−(2πi/T) (n−m)ω} + Σ_{n=0}^{m−1} Z_{n−m+T} e^{−(2πi/T) (n−m)ω} )
         = Σ_{m=0}^{T−1} W_m e^{−(2πi/T) mω} ( Σ_{n=0}^{T−m−1} Z_n e^{−(2πi/T) nω} + Σ_{n=T−m}^{T−1} Z_n e^{−(2πi/T) nω} )
         = W̄_ω · Z^{(F)}_ω.

Thus, the equation F(W ∗ Z) = Z^{(F)} ⊙ W̄ holds, and the proof is thereby completed.
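The identity in Theorem II.1 can also be verified numerically. The snippet below compares a direct circular convolution against the frequency-domain product of Eqs. (2)-(3) for a random sequence and a random filter; it is a sanity check of the theorem, not part of the model.

```python
import torch

torch.manual_seed(0)
T = 16
z = torch.randn(T)                             # one channel of the input sequence
w_bar = torch.randn(T, dtype=torch.complex64)  # random frequency-domain filter W̄
w = torch.fft.ifft(w_bar)                      # time-domain kernel W = F^{-1}(W̄)

# Left-hand side: circular convolution (W * Z)_n = sum_m W_m Z_{(n-m) mod T}.
idx = torch.arange(T)
conv = torch.stack([(w * z[(n - idx) % T]).sum() for n in range(T)])

# Right-hand side: element-wise filtering in the frequency domain, Eqs. (2)-(3).
filt = torch.fft.ifft(torch.fft.fft(z.to(torch.complex64)) * w_bar)

print(torch.allclose(conv, filt, atol=1e-4))   # True
```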
Theoretical implications. Theorem II.1 implies that the efficient filtering layer is adept at both accuracy and efficiency, meeting the dual demands of process monitoring.
• The efficient filtering layer excels at capturing discriminative temporal patterns in the historical sequence, thereby improving accuracy. According to Theorem II.1, the layer is equivalent to a circular convolution between the historical sequence and a large convolution kernel, where the kernel size equals the historical window length T. Circular convolution facilitates the capturing of periodic patterns, while the large kernel size facilitates the modeling of long-term dependencies. Both periodic and long-term patterns are typically discriminative for process monitoring, in contrast to the step-wise correlations captured by standard Transformers.
• The efficient filtering layer reduces the computational complexity, thereby improving efficiency. The overall complexity of the efficient filtering layer is O(T log T), significantly lower than that of self-attention layers and convolution layers with a full receptive field (O(T²)).

While the filtering layer captures dominant temporal patterns in each channel, it does not incorporate channel-wise interactions. To fill this gap, we introduce an FFN as follows:

FFN(R) = ReLU(RW^{(1)} + b^{(1)})W^{(2)} + b^{(2)},   (5)
R̄ = LayerNorm(FFN(R) + R),   (6)

where W^{(1)}, b^{(1)}, W^{(2)}, and b^{(2)} are learnable parameters. To stabilize the training process, a residual connection and layer normalization are subsequently applied.

In a nutshell, the GF block captures temporal and channel-wise patterns via the efficient filtering layer and the FFN layer, respectively, contributing to a representation R̄ ∈ R^{T×D} that comprehensively summarizes the process monitoring logs.
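To make the block concrete, here is a compact PyTorch sketch consistent with Eqs. (1)-(6). The module name GlobalFilterBlock, the use of the real FFT (rfft/irfft) instead of the full complex FFT, the filter initialization, and the FFN width are our implementation choices, not details prescribed by the paper.

```python
import torch
import torch.nn as nn

class GlobalFilterBlock(nn.Module):
    """One GF block: efficient filtering layer (Eqs. 1-4) followed by an FFN (Eqs. 5-6)."""

    def __init__(self, seq_len: int, dim: int, hidden: int = 128):
        super().__init__()
        # Learnable frequency-domain filter W̄; rfft keeps seq_len // 2 + 1 bins for real inputs.
        self.filter_weight = nn.Parameter(
            0.02 * torch.randn(seq_len // 2 + 1, dim, dtype=torch.cfloat)
        )
        self.norm1 = nn.LayerNorm(dim)
        self.norm2 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))

    def forward(self, z: torch.Tensor) -> torch.Tensor:       # z: (batch, T, D)
        z_freq = torch.fft.rfft(z, dim=1)                      # Eq. (1): FFT over time
        z_filt = torch.fft.irfft(z_freq * self.filter_weight,  # Eqs. (2)-(3): filter, inverse FFT
                                 n=z.size(1), dim=1)
        r = self.norm1(z_filt + z)                             # Eq. (4): residual + layer norm
        return self.norm2(self.ffn(r) + r)                     # Eqs. (5)-(6): FFN + residual + layer norm

block = GlobalFilterBlock(seq_len=16, dim=32)
print(block(torch.randn(8, 16, 32)).shape)                     # torch.Size([8, 16, 32])
```

Using the real FFT simply exploits the conjugate symmetry of the spectrum of real-valued inputs; a full complex filter of shape T×D, as written in Eq. (2), would behave equivalently.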
C. DeepFilter Architecture and Learning Objective

The GF block efficiently processes historical monitoring logs and excels at capturing discriminative temporal patterns. However, this block focuses on encapsulating these patterns into a compact representation R̄, without generating the predicted value of the quality variable for process monitoring. To bridge this gap, we introduce DeepFilter, which integrates cascaded GF blocks to achieve process monitoring.

The architecture of DeepFilter is illustrated in Fig. 1. It begins by transforming the historical monitoring log X ∈ R^{T×D_in} into a latent representation through an affine layer:

Z^0 = XW^{(0)} + b^{(0)},   (7)

where Z^0 ∈ R^{T×D} represents the initial embeddings with hidden dimension D, and W^{(0)}, b^{(0)} are learnable parameters. These embeddings are then sequentially processed through K GF blocks. Let Z^k denote the input to the k-th GF block; the output of this block is given by:

R̄^k := GF_k(Z^k),   (8)

where GF_k(·) performs the transformations from Eq. (2) to Eq. (6) sequentially. The output of each GF block serves as the input for the subsequent block, i.e., Z^{k+1} := R̄^k. The output of the last GF block, denoted as R̄^K ∈ R^{T×D}, encapsulates the historical monitoring logs comprehensively. This representation is passed to a Gated Recurrent Unit (GRU) decoder, and the final-step output of the GRU decoder serves as the prediction of the quality variable:

ŷ^H := GRU(R̄^K),   (9)

where ŷ^H is the estimated value of the target quality variable.

There are several learnable parameters in DeepFilter, such as the weights and biases in the FFN, the affine layer, and the GRU decoder, and the filter tensor W̄.
These parameters are optimized by minimizing the mean squared error (MSE) between the predicted and actual quality variable values, defined as:

L := (y^H − ŷ^H)².   (10)
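Putting Eqs. (7)-(10) together, a minimal end-to-end sketch is shown below. It reuses the GlobalFilterBlock class from the sketch in Section II-B; the scalar read-out head and the default sizes are our assumptions, since Eq. (9) leaves the projection from the GRU state to ŷ^H implicit.

```python
import torch
import torch.nn as nn

class DeepFilter(nn.Module):
    """Affine embedding (Eq. 7) -> K GF blocks (Eq. 8) -> GRU decoder (Eq. 9) -> scalar prediction."""

    def __init__(self, seq_len: int, in_dim: int, dim: int = 32, num_blocks: int = 2):
        super().__init__()
        self.embed = nn.Linear(in_dim, dim)                               # affine layer, Eq. (7)
        self.blocks = nn.ModuleList(
            [GlobalFilterBlock(seq_len, dim) for _ in range(num_blocks)]  # cascaded GF blocks
        )
        self.decoder = nn.GRU(dim, dim, batch_first=True)                 # GRU decoder
        self.head = nn.Linear(dim, 1)                                     # scalar read-out (our assumption)

    def forward(self, x: torch.Tensor) -> torch.Tensor:                   # x: (batch, T, D_in)
        z = self.embed(x)
        for block in self.blocks:
            z = block(z)
        out, _ = self.decoder(z)
        return self.head(out[:, -1]).squeeze(-1)                          # final-step output -> ŷ^H

model = DeepFilter(seq_len=16, in_dim=1034)
y_hat = model(torch.randn(4, 16, 1034))
loss = torch.mean((y_hat - torch.randn(4)) ** 2)                          # MSE objective, Eq. (10)
```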
III. EXPERIMENTS

This section aims to empirically validate the effectiveness of DeepFilter in the context of process monitoring. To this end, three aspects need to be investigated.
1) Accuracy: Does DeepFilter work effectively? Section III-C compares the accuracy of DeepFilter against baselines on two large-scale real-world process monitoring datasets.
2) Efficiency: Does DeepFilter work efficiently? Section III-D evaluates the actual running time of DeepFilter and Transformers under varying configurations.
3) Sensitivity: Is DeepFilter sensitive to hyperparameter variation? Section III-E evaluates and analyzes the performance of DeepFilter under varying hyperparameter values.

A. Background and Data Collection

Nuclear power plants (NPPs) play a crucial role in industrial automation, providing a stable and efficient energy source. In the United States, NPPs produce nearly 800 billion kilowatt-hours of electricity annually, accounting for over 60% of the nation's emission-free electricity [37]. This reduces approximately 500 million metric tons of carbon emissions, demonstrating their environmental and industrial significance. However, NPPs also pose security risks, as operational anomalies can lead to radionuclide leaks, resulting in severe environmental pollution and casualties [11]. To mitigate these risks, automated monitoring networks have been deployed worldwide, such as the RadNet in the United States [38], the Fixed Point Surveillance Network in Canada [39], and the Atmospheric Nuclear Radiation Monitoring Network (ANRMN) in China, as shown in Fig. 2. These systems enable continuous, reliable monitoring of radionuclide concentrations, reflecting whether NPPs are operating normally.

Fig. 2. Overview of the nuclear monitoring stations in our project. (a) National station distribution. (b) Composition of a single station: A: Radiation monitoring station; B: Weather sensors; C: Spectrometer; D: Ionization chamber; E: Aerosol sampler; F: Rain gauge; G: Settlement sampler.

Fig. 3. Scene photo of monitoring stations and key sensors.

The key quality variable monitored by these systems is the atmospheric γ-ray dose rate, measured using ionization chambers (Fig. 3) [40]. Fig. 4(a) illustrates the non-stationary dynamics of this variable, likely influenced by external factors such as weather conditions. To account for these influences, the monitoring systems include additional process variables, such as spectrometer measurements (spanning 1024 channels), meteorological conditions (e.g., precipitation), and spectrometer operational parameters (e.g., battery voltage). These process variables, shown in Fig. 4(b-d), display diverse temporal patterns. Frequency domain analysis reveals concentrated energy in low-frequency bands, which diminishes at higher frequencies, highlighting the potential of frequency-based models to extract semantic-rich representations.

Fig. 4. Dynamics of the quality variable (a) and some important process variables (b-d) in the collected dataset from Jinan station, together with the amplitude characteristics of their FFTs in (b-d). (a) The dynamics of the quality variable: dose rate. (b) The 64-th channel of the spectrometer measurements and its FFT. (c) The temperature measurements and its FFT. (d) The humidity measurements and its FFT.

The monitoring logs integrate 1,024-channel spectrometer data, seven meteorological covariates, and three operational parameters, as summarized in Table I. By consolidating diverse data sources, these logs provide a comprehensive view of NPP operational status, facilitating the construction of process monitoring systems to safeguard their operations.

TABLE I
VARIABLES IN THE MONITORING LOG

Field                                       Unit of measure
Atmospheric radiation
  Dose rate                                 nGy/h
  Spectrometer measures (1024 channels)     ANSI/IEEE N42.42
Meteorological conditions
  Temperature                               °C
  Humidity                                  %
  Atmosphere pressure                       hPa
  Wind direction                            Clockwise angle
  Wind speed                                m/s
  Precipitation indicator                   Boolean value
  Amount of precipitation                   mm
Spectrometer operating conditions
  Battery voltage                           V
  Spectrometer voltage                      V
  Spectrometer temperature                  °C

B. Experimental Setup

1) Datasets: We employ two industrial datasets sourced from monitoring logs collected from the ANRMN project. These datasets encompass 1,034 input variables, as detailed in Table I. The statistics of the preprocessed data are presented in Table II.

TABLE II
DESCRIPTION OF SAMPLING STRATEGY

Station           Location   #Sample   #Variable   Interval
Xingan Station    Hegang     38,686    1,034       5 min
Huayuan Station   Jinan      38,687    1,034       5 min

Each dataset is sequentially divided into training, validation, and testing subsets with allocation ratios of 70%, 15%, and 15%, respectively. To ensure effective model training and evaluation, the datasets are normalized using a min-max scaling technique.
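A sketch of this preprocessing is given below. Fitting the min-max statistics on the training split only is our assumption (the paper does not state where the statistics are computed), and the function name is ours.

```python
import numpy as np

def split_and_scale(data: np.ndarray):
    """Sequential 70/15/15 train/val/test split followed by min-max scaling.

    Scaling statistics are computed on the training split only and then
    applied to all splits, so that no future information leaks backwards.
    """
    n = data.shape[0]
    i_train, i_val = int(0.7 * n), int(0.85 * n)
    train, val, test = data[:i_train], data[i_train:i_val], data[i_val:]

    lo, hi = train.min(axis=0), train.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)        # guard against constant columns
    norm = lambda x: (x - lo) / scale
    return norm(train), norm(val), norm(test)

train, val, test = split_and_scale(np.random.rand(1000, 8))
print(train.shape, val.shape, test.shape)          # (700, 8) (150, 8) (150, 8)
```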
2) Baselines: We compare DeepFilter with three categories of baselines as follows:
• Identification methods: AR, MA, and ARIMA [20];
• Statistical methods: Lasso Regression (LASSO) [23], Support Vector Regression (SVR) [41], Random Forest (RF) [21], and eXtreme Gradient Boosting (XGB) [22];
• Deep methods: Long Short-Term Memory (LSTM) [26], Gated Recurrent Unit (GRU) [25], Transformer [28], Informer [42], AttentionMixer [7], and iTransformer [43]. Aligning with the prevailing work [7], we employ a GRU decoder for Transformer-based baselines to produce the quality variable prediction.

3) Training Strategy: All experimental procedures are executed using the PyTorch framework, utilizing the Adam optimizer [44] for its adaptive learning rate capabilities and efficient convergence properties. The experiments are conducted on a hardware platform comprising two Intel(R) Xeon(R) Platinum 8383C CPUs operating at 2.70 GHz and eight NVIDIA GeForce 1080Ti GPUs. Hyperparameter optimization is systematically performed following the standard protocol [7] to enhance model performance. The learning rate is tuned within {0.001, 0.005, 0.01}; the batch size is tuned within {32, 64}; the number of GF blocks is set to 2; the historical window length is set to 16. The model is trained for a maximum of 200 epochs. An early stopping strategy with a patience of 15 epochs is employed, stopping training if no improvement is observed on the validation set. Finally, the performance metrics on the test set are calculated and reported.
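The training procedure described above can be summarized by the sketch below (Adam, MSE loss, at most 200 epochs, early stopping with patience 15). The data loaders, the model, and the helper name are assumptions for illustration and are not taken from the released code.

```python
import copy
import torch

def train(model, train_loader, val_loader, lr=1e-3, max_epochs=200, patience=15):
    """Adam + MSE training with early stopping on the validation loss."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    best_val, best_state, wait = float("inf"), None, 0

    for epoch in range(max_epochs):
        model.train()
        for x, y in train_loader:
            opt.zero_grad()
            loss = torch.mean((model(x) - y) ** 2)   # MSE objective, Eq. (10)
            loss.backward()
            opt.step()

        model.eval()
        with torch.no_grad():
            val = sum(torch.mean((model(x) - y) ** 2).item() for x, y in val_loader)

        if val < best_val:                           # keep the best checkpoint so far
            best_val, best_state, wait = val, copy.deepcopy(model.state_dict()), 0
        else:
            wait += 1
            if wait >= patience:                     # early stopping
                break

    model.load_state_dict(best_state)
    return model
```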
4) Evaluation Strategy: The coefficient of determination (R²) is selected for evaluating monitoring accuracy:

R² = 1 − [ Σ_{t=1}^{N} (y_t^H − ŷ_t^H)² ] / [ Σ_{t=1}^{N} (y_t^H − ȳ^H)² ],   (11)

where y_t^H represents the actual dose rate, ŷ_t^H denotes the dose rate estimated by the model, and ȳ^H is the mean value of the actual dose rates over the test set of size N. This metric effectively captures the proportion of variance in y^H that is predictable from ŷ^H. We also incorporate the root mean squared error (RMSE) and the mean absolute error (MAE) as supplementary metrics, quantifying the average magnitude of the prediction errors:

RMSE = sqrt( (1/N) Σ_{t=1}^{N} (y_t^H − ŷ_t^H)² ),
MAE = (1/N) Σ_{t=1}^{N} | y_t^H − ŷ_t^H |.   (12)
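For completeness, direct NumPy implementations of the metrics in Eqs. (11)-(12) are shown below; the function names are ours.

```python
import numpy as np

def r2(y, y_hat):
    """Coefficient of determination, Eq. (11)."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def rmse(y, y_hat):
    """Root mean squared error, Eq. (12)."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def mae(y, y_hat):
    """Mean absolute error, Eq. (12)."""
    return np.mean(np.abs(y - y_hat))

y_true, y_pred = np.array([0.2, 0.4, 0.6]), np.array([0.25, 0.38, 0.61])
print(r2(y_true, y_pred), rmse(y_true, y_pred), mae(y_true, y_pred))
```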
C. Overall Performance

In this section, we compare the monitoring accuracy of DeepFilter and the baselines across four distinct forecast horizons (H = 1, 3, 5, 7). Results are shown in Table III, with key observations below:

TABLE III
COMPARATIVE STUDY ON THE HEGANG AND JINAN DATASETS OVER FOUR FORECAST HORIZONS.

                H=1                 H=3                 H=5                 H=7
Methods         MAE        R2       MAE        R2       MAE        R2       MAE        R2
Dataset: Hegang Station
AR 0.135±0.000 0.064±0.000 0.137±0.000 -0.038±0.000 0.140±0.000 -0.111±0.000 0.141±0.000 -0.163±0.000
MA 0.133±0.000 0.110±0.000 0.136±0.000 -0.042±0.000 0.138±0.000 -0.096±0.000 0.140±0.000 -0.158±0.000
ARIMA 0.135±0.000 0.063±0.000 0.148±0.000 -0.246±0.000 0.147±0.000 -0.238±0.000 0.143±0.000 -0.256±0.000
LASSO 0.045±0.000 0.282±0.000 0.046±0.000 0.243±0.000 0.047±0.000 0.183±0.000 0.049±0.000 0.109±0.000
SVR 0.023±0.000 0.890±0.000 0.034±0.000 0.779±0.000 0.051±0.000 0.540±0.000 0.058±0.000 0.391±0.000
XGBoost 0.016±0.000 0.943±0.000 0.018±0.000 0.902±0.000 0.022±0.000 0.838±0.000 0.023±0.000 0.778±0.000
LSTM 0.021±0.001 0.846±0.011 0.023±0.001 0.790±0.013 0.026±0.001 0.718±0.015 0.027±0.002 0.651±0.024
GRU 0.023±0.001 0.832±0.008 0.025±0.001 0.773±0.011 0.028±0.002 0.693±0.024 0.030±0.002 0.631±0.019
Transformer 0.015±0.002 0.927±0.022 0.020±0.001 0.845±0.009 0.027±0.001 0.675±0.004 0.031±0.002 0.557±0.015
Informer 0.042±0.017 0.562±0.269 0.035±0.004 0.575±0.114 0.037±0.008 0.498±0.102 0.036±0.002 0.483±0.014
iTransformer 0.014±0.001 0.919±0.026 0.016±0.003 0.911±0.025 0.018±0.002 0.853±0.019 0.032±0.012 0.660±0.134
AttentionMixer 0.011±0.003 0.962±0.012 0.017±0.005 0.901±0.041 0.018±0.003 0.832±0.043 0.023±0.003 0.766±0.040
DeepFilter 0.012±0.001 0.963±0.006 0.015±0.001 0.927±0.004 0.018±0.001 0.860±0.018 0.022±0.002 0.763±0.053
Dataset: Jinan Station
AR 0.106±0.000 0.744±0.000 0.118±0.000 0.668±0.000 0.123±0.000 0.620±0.000 0.130±0.000 0.571±0.000
MA 0.112±0.000 0.710±0.000 0.119±0.000 0.666±0.000 0.123±0.000 0.623±0.000 0.129±0.000 0.571±0.000
ARIMA 0.101±0.000 0.759±0.000 0.111±0.000 0.710±0.000 0.118±0.000 0.646±0.000 0.127±0.000 0.607±0.000
LASSO 0.029±0.000 0.742±0.000 0.030±0.000 0.717±0.000 0.031±0.000 0.680±0.000 0.033±0.000 0.636±0.000
SVR 0.061±0.000 0.640±0.000 0.065±0.000 0.599±0.000 0.056±0.000 0.679±0.000 0.050±0.000 0.706±0.000
XGBoost 0.031±0.000 0.903±0.000 0.033±0.000 0.883±0.000 0.036±0.000 0.853±0.000 0.038±0.000 0.816±0.000
LSTM 0.015±0.000 0.948±0.001 0.017±0.000 0.921±0.005 0.019±0.000 0.888±0.004 0.021±0.000 0.853±0.004
GRU 0.016±0.001 0.939±0.005 0.018±0.000 0.914±0.006 0.020±0.000 0.882±0.005 0.022±0.001 0.842±0.012
Transformer 0.016±0.004 0.955±0.023 0.017±0.002 0.929±0.014 0.024±0.005 0.837±0.057 0.025±0.002 0.813±0.013
Informer 0.018±0.002 0.917±0.013 0.025±0.008 0.851±0.041 0.022±0.000 0.813±0.003 0.026±0.002 0.761±0.021
iTransformer 0.012±0.000 0.971±0.005 0.014±0.002 0.948±0.028 0.018±0.002 0.906±0.023 0.020±0.004 0.881±0.033
AttentionMixer 0.011±0.004 0.981±0.012 0.012±0.002 0.967±0.007 0.016±0.002 0.912±0.035 0.027±0.010 0.840±0.039
DeepFilter 0.010±0.002 0.986±0.003 0.012±0.000 0.970±0.004 0.015±0.001 0.940±0.007 0.019±0.003 0.890±0.023
Note: The results are reported in mean±std . The best and second best metrics are bolded and underlined, respectively.

• Identification models demonstrate limited accuracy in process monitoring. Primarily designed to capture linear autocorrelations, identification models struggle to model the non-linear patterns that are prevalent in many datasets. This limitation becomes more pronounced in long-term forecasting scenarios, where ARIMA models, for instance, record relatively high MAE of 0.158 and 0.127 on the Hegang and Jinan datasets, respectively, for H = 7.
• Statistical models integrate external factors, such as meteorological conditions, and demonstrate competitive performance in short-term monitoring. Non-linear estimators, particularly XGBoost, outperform linear models due to their higher modeling capacities. Notably, XGBoost exhibits robust performance across both datasets, achieving results comparable to traditional deep learning models like LSTM and GRU across most evaluation metrics.
• Deep models achieve the best performance among baselines. Specifically, Transformer-based methods display varying accuracy contingent upon their temporal fusing mechanisms. Among them, the standard Transformer model, utilizing self-attention mechanisms, shows suboptimal monitoring accuracy, highlighting the limitations of self-attention in capturing discriminative patterns within this context. Modifications to the token mixer, such as iTransformer and AttentionMixer, result in significant performance enhancements. This suggests that refinement of the token mixer is critical for accommodating Transformers to process monitoring.
• DeepFilter demonstrates the best overall performance across different metrics and datasets. Its superior accuracy demonstrates that the efficient filtering layer excels at capturing long-term discriminative patterns, as discussed in Theorem II.1, which facilitates understanding the monitoring logs.

In-depth analysis. We conduct a detailed comparison of the monitoring performance of DeepFilter and the Transformer model with self-attention for temporal fusion, focusing on three key aspects in Fig. 5:
• Predicted series. The left panels illustrate the predicted time-series values against the ground truth. DeepFilter consistently tracks the ground truth more accurately, especially in regions with sharp fluctuations and peaks. For instance, in both datasets, the Transformer fails to capture sudden spikes around timestamps 1700 and 1800. DeepFilter, in contrast, remains reliable in these cases, effectively following both gradual trends and abrupt changes.
• Error distribution. The middle panels display the distribution of MAE. Overall, the Transformer's error distribution is more dispersed, with a heavier tail extending to higher error values, indicating occasional significant deviations. In contrast, DeepFilter exhibits a more concentrated error distribution with consistently lower prediction errors, making it more reliable for critical monitoring tasks.
• Fitting goodness. The right panels display scatter plots of predicted values against the ground truth. DeepFilter's predictions form a tighter cluster along the diagonal, indicating more accurate predictions overall. In contrast, the Transformer shows greater dispersion, particularly in higher value ranges, with more extreme outliers. Therefore, DeepFilter is advantageous in providing predictions that consistently match the ground truth and in reducing large errors.

Fig. 5. In-depth comparison of the monitoring performance of DeepFilter and the Transformer. (a) Performance comparison on the Hegang dataset. (b) Performance comparison on the Jinan dataset.

D. Complexity Analysis

In this section, we analyze and evaluate the computational cost of DeepFilter and baseline models [28]. Models are employed to transform a sequence (x_1, ..., x_T) into another sequence (z_1, ..., z_T) of equal length T and feature number D. Three key metrics are considered: complexity, sequential operations, and path length, respectively quantifying the amount of floating-point operations, the number of non-parallelizable operations, and the minimum number of layers required to model relationships between any two time steps.

TABLE IV
COMPLEXITY ANALYSIS

Blocks         Complexity           Sequential Ops   Path Length
RNN            O(T · D²)            O(T)             O(T)
Transformer    O(T² · D)            O(1)             O(1)
iTransformer   O(T · D²)            O(1)             O(1)
DeepFilter     O(T · log T · D)     O(1)             O(1)
• The theoretical results are presented in Table IV. Specifically, RNNs exhibit inefficiency due to heavy sequential operations and lengthy path lengths. In contrast, Transformer and iTransformer reduce sequential operations and path lengths, enabling them to process entire sequences within a single computational block and leverage GPU acceleration effectively. However, they incur quadratic complexity relative to T and D, respectively, producing inefficiency when dealing with long sequences or numerous covariates.
• DeepFilter addresses these limitations by incorporating a filtering layer that reduces the computational complexity to O(T · log T · D), which is notably lower than that of the Transformer and iTransformer models, while maintaining the minimum sequential operations and path length.
• The empirical results are presented in Fig. 6, comparing the actual inference times of iTransformer, Transformer, and DeepFilter under various conditions. These results corroborate the theoretical complexity in Table IV, with DeepFilter consistently outperforming both iTransformer and Transformer in terms of inference speed.

Fig. 6. Inference time given varying settings, with solid lines for mean values of 10 trials and shaded areas for 90% confidence intervals. The default values of L, D, and batch size are 16, 32, and 64, respectively. (a) Inference time on the Intel(R) Xeon(R) Gold 6140 CPU. (b) Inference time on the Nvidia RTX Titan GPU.

The results above underscore DeepFilter's superior efficiency and scalability, making it highly suitable for practical process monitoring applications where the operational efficiency of the monitoring system is a demanding factor.
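A minimal timing harness in the spirit of Fig. 6 is sketched below; it measures average forward-pass latency over a range of historical lengths. The warm-up step, repeat count, and reuse of the DeepFilter sketch from Section II-C are our choices, and the hardware differs from that used in the paper.

```python
import time
import torch

@torch.no_grad()
def inference_time(model_fn, lengths, in_dim=1034, repeats=10, device="cpu"):
    """Average forward-pass time (seconds) for each historical length."""
    results = {}
    for T in lengths:
        model = model_fn(T).to(device).eval()
        x = torch.randn(1, T, in_dim, device=device)
        model(x)                                   # warm-up pass
        if device == "cuda":
            torch.cuda.synchronize()
        start = time.perf_counter()
        for _ in range(repeats):
            model(x)
        if device == "cuda":
            torch.cuda.synchronize()
        results[T] = (time.perf_counter() - start) / repeats
    return results

# Example with the DeepFilter sketch from Section II-C.
print(inference_time(lambda T: DeepFilter(seq_len=T, in_dim=1034), [32, 64, 128]))
```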
E. Parameter Sensitivity Study

In this section, we investigate the impact of key hyperparameters on DeepFilter's performance, including the number of global filtering blocks (K), the window length (L), the number of hidden dimensions (D), and the batch size. The results are summarized in Fig. 7 with key observations as follows:
• DeepFilter's performance is not significantly dependent on a deep stack of blocks. As illustrated in Fig. 7(a), utilizing 1-2 blocks already yields promising results. While adding more blocks can incrementally improve performance, there is a risk of performance degradation, potentially due to overfitting and optimization challenges.
• The model's effectiveness improves with the inclusion of adequate historical monitoring data. In Fig. 7(b), we observe that performance is suboptimal at L=4 but improves significantly at L=8. Extending the window length further can lead to improved monitoring accuracy, indicating the value of incorporating sufficient historical context in the model.
• The relationship between the number of hidden dimensions and performance does not follow a clear pattern, as per Fig. 7(c). A smaller dimension appears sufficient to effectively model the monitoring log at both stations, suggesting that the logs contain a high degree of redundancy. Finally, enlarging batch sizes generally improves performance in Fig. 7(d), which suggests the potential benefit of increasing the batch size to further enhance model performance.
Fig. 7. Parameter sensitivity study results on the Hegang and Jinan datasets, where the dark areas indicate the 90% confidence intervals. (a) Performance with varying numbers of blocks (K). (b) Performance with varying settings of window length (L). (c) Performance with varying numbers of hidden dimensions (D). (d) Performance with varying settings of batch size.

IV. RELATED WORKS

Data-driven process modeling has become a cornerstone of industrial automation with the rise of Industry 4.0 and digital factory concepts [45–47]. These methods leverage large volumes of monitoring data to enhance operational safety across diverse industrial applications [4, 5, 48]. Current approaches can be categorized into three groups: identification methods, statistical methods, and deep learning approaches, each offering distinct advantages and limitations.

Early identification methods, such as Auto-Regressive (AR), Moving Average (MA), and Auto-Regressive Integrated Moving Average (ARIMA) [20], provide computational simplicity, but their inability to capture nonlinear temporal dependencies limits their effectiveness. Statistical methods, including decision trees [21], XGBoost [22], and generalized linear models [23], were subsequently introduced to address these limitations. These methods improve accuracy by capturing nonlinear patterns, but they rely heavily on manual feature engineering and face scalability issues in large-scale industrial settings.

The advent of deep learning has revolutionized data-driven process monitoring, enabling automatic feature extraction and enhanced parallel computing capabilities. Various architectures, such as Convolutional Neural Networks (CNNs) [24], Recurrent Neural Networks (RNNs) [25, 26], and Graph Neural Networks (GNNs) [27], have been developed to extract discriminative representations from monitoring logs for next-value prediction. For example, a spatiotemporal attention-based RNN model [45] is proposed to capture the nonlinearity among process variables and their temporal dynamics; a multi-scale attention-enhanced CNN [31] is proposed to identify long- and short-term patterns; a multi-scale residual CNN [49] is proposed to extract high-dimensional nonlinear features at multiple scales. These examples highlight the versatility of deep learning in the context of process monitoring.

Building on the success of the deep learning methods above, Transformers [28] have gained prominence in process monitoring due to their scalability and parallel computing capabilities [29–31]. Early applications utilized self-attention to model step-wise relationships within monitoring logs [31, 50, 51]. Subsequent works mainly enhanced the attention mechanisms for time series, such as FedFormer [52] for noise filtering, Informer [42] for reduced redundancy, Pyraformer [53] for multi-scale dependencies, and LogTrans [54] for locality-enhanced representation. However, the step-wise correlation captured by self-attention struggles to capture discriminative patterns in industrial logs due to the lack of semantic richness in individual observations. Recognizing this issue, another line of works [7, 43] advocated applying self-attention to model variate-wise correlations, which are more semantically meaningful than step-wise correlations in process monitoring. While advanced models offer better predictive accuracy, their increased size and complexity hinder practical deployment in monitoring scenarios with strict latency and computational constraints. To address this issue, research has explored distributed modeling [55, 56] and sparsification techniques [57–59], which are effective in enhancing efficiency but entail additional hardware resources or sacrifice some accuracy. Therefore, there exists a critical need for novel solutions that co-optimize accuracy and efficiency in process monitoring. Developing a modern Transformer-like architecture that satisfies the accuracy and efficiency demands of real-time process monitoring remains an open question.

V. CONCLUSION

In this paper, we introduced DeepFilter, an adaptation of the Transformer architecture specifically optimized for process monitoring. By replacing the canonical self-attention layer in the Transformer with an efficient global filtering layer, DeepFilter excels at capturing the long-term and periodic patterns in monitoring logs while significantly reducing computational complexity.
Our experimental results on real-world process monitoring datasets demonstrate that DeepFilter outperforms existing state-of-the-art models in both accuracy and efficiency, effectively meeting the stringent demands of modern process monitoring.

Limitations and Future Work. The current implementation of DeepFilter utilizes the FFT for domain transformation, which is well-suited for typical stationary monitoring data but may not effectively accommodate monitoring logs with swiftly varying patterns. Future research could investigate alternative transformation techniques, such as wavelet transforms or adaptive filtering methods, which are better equipped to handle non-stationary monitoring logs. Another future work is deploying the proposed model on our online monitoring platform to support nationwide nuclear power plants.

REFERENCES

[1] H. Wang, X. Liu, Z. Liu, H. Li, Y. Liao, Y. Huang, and Z. Chen, “Lspt-d: Local similarity preserved transport for direct industrial data imputation,” IEEE Trans. Autom. Sci. Eng., 2024.
[2] J. Yu, L. Ye, L. Zhou, Z. Yang, F. Shen, and Z. Song, “Dynamic process monitoring based on variational bayesian canonical variate analysis,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 52, no. 4, pp. 2412–2422, 2021.
[3] Z. Yang and Z. Ge, “On paradigm of industrial big data analytics: From evolution to revolution,” IEEE Trans. Ind. Informat., vol. 18, no. 12, pp. 8373–8388, 2022.
[4] Z. Chen, H. Wang, G. Chen, Y. Ma, L. Yao, Z. Ge, and Z. Song, “Analyzing and improving supervised nonlinear dynamical probabilistic latent variable model for inferential sensors,” IEEE Trans. Ind. Informat., pp. 1–12, 2024.
[5] H. Wang, Z. Chen, Z. Liu, L. Pan, H. Xu, Y. Liao, H. Li, and X. Liu, “Spot-i: Similarity preserved optimal transport for industrial iot data imputation,” IEEE Trans. Ind. Informat., pp. 1–9, 2024.
[6] Z. Chen, H. Wang, Z. Song, and Z. Ge, “Improving data-driven inferential sensor modeling by industrial knowledge: A bayesian perspective,” IEEE Trans. Syst., Man, Cybern., Syst., vol. 20, no. 11, pp. 13296–13307, 2024.
[7] H. Wang, Z. Wang, Y. Niu, Z. Liu, H. Li, Y. Liao, Y. Huang, and X. Liu, “An accurate and interpretable framework for trustworthy process monitoring,” IEEE Trans. Artif. Intell., vol. 5, no. 5, pp. 2241–2252, 2023.
[8] H. Xu, Z. Liu, H. Wang, C. Li, Y. Niu, W. Wang, and X. Liu, “Denoising diffusion straightforward models for energy conversion monitoring data imputation,” IEEE Trans. Ind. Informat., 2024.
[9] Z. Yang, T. Hu, L. Yao, L. Ye, Y. Qiu, and S. Du, “Stacked dual-guided autoencoder: A scalable deep latent variable model for semi-supervised industrial soft sensing,” IEEE Trans. Instrum. Meas., vol. 73, pp. 1–14, 2024.
[10] Z. Chen, Z. Song, and Z. Ge, “Variational inference over graph: Knowledge representation for deep process data analytics,” IEEE Trans. Knowl. Data Eng., vol. 36, no. 6, pp. 2730–2744, 2024.
[11] F.-C. Chen and M. R. Jahanshahi, “Nb-cnn: Deep learning-based crack detection using convolutional neural network and naïve bayes data fusion,” IEEE Trans. Ind. Electron., vol. 65, no. 5, pp. 4392–4400, 2018.
[12] M. Embrechts and S. Benedek, “Hybrid identification of nuclear power plant transients with artificial neural networks,” IEEE Trans. Ind. Electron., vol. 51, no. 3, pp. 686–693, 2004.
[13] L. Yao, W. Shao, and Z. Ge, “Hierarchical quality monitoring for large-scale industrial plants with big process data,” IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 8, pp. 3330–3341, 2019.
[14] L. Yao and Z. Ge, “Industrial big data modeling and monitoring framework for plant-wide processes,” IEEE Trans. Ind. Informat., vol. 17, no. 9, pp. 6399–6408, 2020.
[15] X. Jiang and Z. Ge, “Data augmentation classifier for imbalanced fault classification,” IEEE Trans. Autom. Sci. Eng., vol. 18, no. 3, pp. 1206–1217, 2020.
[16] S. Liao, X. Jiang, and Z. Ge, “Weakly supervised multilayer perceptron for industrial fault classification with inaccurate and incomplete labels,” IEEE Trans. Autom. Sci. Eng., vol. 19, no. 2, pp. 1192–1201, 2020.
[17] W. Shao, Y. Li, W. Han, and D. Zhao, “Block-wise parallel semisupervised linear dynamical system for massive and inconsecutive time-series data with application to soft sensing,” IEEE Trans. Instrum. Meas., vol. 71, pp. 1–14, 2022.
[18] W. Shao, W. Han, C. Xiao, L. Chen, M.-Q. Yu, and J. Chen, “Semi-supervised robust hidden markov regression for large-scale time-series industrial data analytics and its applications to soft sensing,” IEEE Trans. Autom. Sci. Eng., 2024.
[19] Z. Yang, T. Hu, L. Yao, L. Ye, Y. Qiu, and S. Du, “Stacked dual-guided autoencoder: A scalable deep latent variable model for semi-supervised industrial soft sensing,” IEEE Trans. Instrum. Meas., 2024.
[20] A. Chandrakar, D. Datta, A. K. Nayak, and G. Vinod, “Statistical analysis of a time series relevant to passive systems of nuclear power plants,” Int. J. Syst. Assur. Eng. Manag., vol. 8, no. 1, pp. 89–108, 2017.
[21] S. Grape, E. Branger, Z. Elter, and L. P. Balkeståhl, “Determination of spent nuclear fuel parameters using modelled signatures from non-destructive assay and random forest regression,” Nucl. Instrum. Methods Phys. Res., Sect. A, vol. 969, p. 163979, 2020.
[22] J. I. Aizpurua, S. D. J. McArthur, B. G. Stewart, B. Lambert, J. G. Cross, and V. M. Catterson, “Adaptive power transformer lifetime predictions through machine learning and uncertainty modeling in nuclear power plants,” IEEE Trans. Ind. Electron., vol. 66, no. 6, pp. 4726–4737, 2019.
[23] Y. K. Chan and Y. C. Tsai, “Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant,” Kerntechnik, vol. 82, no. 1, pp. 24–30, 2017.
[24] H. Wang, M. Peng, R. Xu, A. Ayodeji, and H. Xia, “Remaining useful life prediction based on improved temporal convolutional network for nuclear power plant valves,” Front. Energy Res., vol. 8, p. 296, 2020.
[25] J. Zhang, Z. Pan, W. Bai, and X. Zhou, “Pressurizer water level reconstruction for nuclear power plant based on gru,” in IMCCC, pp. 1675–1679, 2018.
[26] J. Choi and S. J. Lee, “Consistency index-based sensor fault detection system for nuclear power plant emergency situations using an LSTM network,” Sensors, vol. 20, no. 6, p. 1651, 2020.
[27] Z. Chen and Z. Ge, “Knowledge automation through graph mining, convolution, and explanation framework: A soft sensor practice,” IEEE Trans. Ind. Informat., vol. 18, no. 9, pp. 6068–6078, 2022.
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, “Attention is all you need,” Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017.
[29] X. Yuan, N. Xu, L. Ye, K. Wang, F. Shen, Y. Wang, C. Yang, and W. Gui, “Attention-based interval aided networks for data modeling of heterogeneous sampling sequences with missing values in process industry,” IEEE Trans. Ind. Informat., 2023.
[30] X. Yuan, Z. Jia, Z. Xu, N. Xu, L. Ye, K. Wang, Y. Wang, C. Yang, W. Gui, and F. Shen, “Hierarchical self-attention network for industrial data series modeling with different sampling rates between the input and output sequences,” IEEE Trans. Neural Netw. Learn. Syst., 2024.
[31] X. Yuan, L. Huang, L. Ye, Y. Wang, K. Wang, C. Yang, W. Gui, and F. Shen, “Quality prediction modeling for industrial processes using multiscale attention-based convolutional neural network,” IEEE Trans. Cybern., 2024.
[32] X. Zhang, S. Zhao, Z. Song, H. Guo, J. Zhang, C. Zheng, and W. Qiang, “Not all frequencies are created equal: Towards a dynamic fusion of frequencies in time-series forecasting,” in Proc. ACM Int. Conf. Multimedia, pp. 4729–4737, 2024.
[33] J. Liang, S. Liang, A. Liu, K. Ma, J. Li, and X. Cao, “Exploring inconsistent knowledge distillation for object detection with data augmentation,” in Proc. ACM Int. Conf. Multimedia, pp. 768–778, 2023.
[34] W. Qin, B. Zou, X. Li, W. Wang, and H. Ma, “Micro-expression spotting with face alignment and optical flow,” in Proc. ACM Int. Conf. Multimedia, pp. 9501–9505, 2023.
[35] T. Ma, G. Guo, Z. Li, and Z. Yang, “Infrared small target detection method based on high-low frequency semantic reconstruction,” IEEE Geosci. Remote. Sens. Lett., 2024.
[36] W. Zou, H. Gao, W. Yang, and T. Liu, “Wave-mamba: Wavelet state space model for ultra-high-definition low-light image enhancement,” in Proc. ACM Int. Conf. Multimedia, pp. 1534–1543, 2024.
[37] M. B. Roth and P. Jaramillo, “Going nuclear for climate mitigation: An analysis of the cost effectiveness of preserving existing us nuclear power plants as a carbon avoidance strategy,” Energy, vol. 131, pp. 67–77, 2017.
[38] L. Li, A. J. Blomberg, J. D. Spengler, B. A. Coull, J. D. Schwartz, and P. Koutrakis, “Unconventional oil and gas development and ambient particle radioactivity,” Nat. Commun., vol. 11, no. 1, pp. 1–8, 2020.
[39] C. Liu, W. Zhang, K. Ungar, E. Korpach, B. White, M. Benotto, and E. Pellerin, “Development of a national cosmic-ray dose monitoring system with health canada’s fixed point surveillance network,” J. Environ. Radioact., vol. 190, pp. 31–38, 2018.
[40] J. Hirouchi, S. Hirao, J. Moriizumi, H. Yamazawa, and A. Suzuki, “Estimation of infiltration and surface run-off characteristics of radionuclides from gamma dose rate change after rain,” J. Nucl. Sci. Technol., vol. 51, no. 1, pp. 48–55, 2014.
[41] L. Tang, L. Yu, S. Wang, J. Li, and S. Wang, “A novel hybrid ensemble learning paradigm for nuclear energy consumption forecasting,” Appl. Energy, vol. 93, pp. 432–443, 2012.
[42] H. Zhou, S. Zhang, J. Peng, S. Zhang, J. Li, H. Xiong, and W. Zhang, “Informer: Beyond efficient transformer for long sequence time-series forecasting,” in Proc. AAAI Conf. Artif. Intell., vol. 35, pp. 11106–11115, 2021.
[43] Y. Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “itransformer: Inverted transformers are effective for time series forecasting,” in Proc. Int. Conf. Learn. Represent., 2024.
[44] D. P. Kingma and J. A. Ba, “Adam: A method for stochastic optimization,” in Proc. Int. Conf. Learn. Represent., vol. 434, pp. 1–15, 2015.
[45] X. Yuan, L. Li, Y. A. Shardt, Y. Wang, and C. Yang, “Deep learning with spatiotemporal attention-based lstm for industrial soft sensor model development,” IEEE Trans. Ind. Electron., vol. 68, no. 5, pp. 4404–4414, 2020.
[46] Z. Yang, L. Yao, B. Shen, and P. Wang, “Probabilistic fusion model for industrial soft sensing based on quality-relevant feature clustering,” IEEE Trans. Ind. Informat., vol. 19, no. 8, pp. 9037–9047, 2022.
[47] B. Shen, L. Yao, Z. Yang, and Z. Ge, “Mode information separated β-vae regression for multimode industrial process soft sensing,” IEEE Sensors Journal, vol. 23, no. 9, pp. 10231–10240, 2023.
[48] H. Wang, Z. Chen, J. Fan, H. Li, T. Liu, W. Liu, Q. Dai, Y. Wang, Z. Dong, and R. Tang, “Optimal transport for treatment effect estimation,” in Proc. Adv. Neural Inf. Process. Syst., pp. 1–9, 2023.
[49] K. Liu, N. Lu, F. Wu, R. Zhang, and F. Gao, “Model fusion and multiscale feature learning for fault diagnosis of industrial processes,” IEEE Trans. Cybern., vol. 53, no. 10, pp. 6465–6478, 2022.
[50] T. Zhang, X. Gong, and C. P. Chen, “Bmt-net: Broad multitask transformer network for sentiment analysis,” IEEE Trans. Cybern., vol. 52, no. 7, pp. 6232–6243, 2021.
[51] B. Pu, J. Liu, Y. Kang, J. Chen, and S. Y. Philip, “Mvstt: A multiview spatial-temporal transformer network for traffic-flow forecasting,” IEEE Trans. Cybern., vol. 54, no. 3, pp. 1582–1595, 2022.
[52] T. Zhou, Z. Ma, Q. Wen, X. Wang, L. Sun, and R. Jin, “Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting,” in Proc. Int. Conf. Mach. Learn., pp. 27268–27286, PMLR, 2022.
[53] S. Liu, H. Yu, C. Liao, J. Li, W. Lin, A. X. Liu, and S. Dustdar, “Pyraformer: Low-complexity pyramidal attention for long-range time series modeling and forecasting,” in Proc. Int. Conf. Learn. Represent., 2021.
[54] S. Li, X. Jin, Y. Xuan, X. Zhou, W. Chen, Y.-X. Wang, and X. Yan, “Enhancing the locality and breaking the memory bottleneck of transformer on time series forecasting,” in Proc. Adv. Neural Inf. Process. Syst., vol. 32, 2019.
[55] Q. Jiang, S. Chen, X. Yan, M. Kano, and B. Huang, “Data-driven communication efficient distributed monitoring for multiunit industrial plant-wide processes,” IEEE Trans. Autom. Sci. Eng., vol. 19, no. 3, pp. 1913–1923, 2021.
[56] H. Ma, Y. Wang, H. Chen, J. Yuan, and Z. Ji, “Quality-oriented efficient distributed kernel-based monitoring strategy for nonlinear plant-wide industrial processes,” IEEE Trans. Autom. Sci. Eng., 2023.
[57] J. Zhang, M. Chen, and X. Hong, “Monitoring multimode nonlinear dynamic processes: an efficient sparse dynamic approach with continual learning ability,” IEEE Trans. Ind. Informat., vol. 19, no. 7, pp. 8029–8038, 2022.
[58] K. Wang and Z. Song, “High-dimensional cross-plant process monitoring with data privacy: A federated hierarchical sparse pca approach,” IEEE Trans. Ind. Informat., 2023.
[59] S. Wu, X. Xiao, Q. Ding, P. Zhao, Y. Wei, and J. Huang, “Adversarial sparse transformer for time series forecasting,” in Proc. Adv. Neural Inf. Process. Syst., vol. 33, pp. 17105–17115, 2020.
