Abstract—Effective process monitoring is increasingly vital in industrial automation for ensuring operational safety, necessitating both high accuracy and efficiency. Although Transformers have demonstrated success in various fields, their canonical form ... To overcome these limitations, we propose DeepFilter, a refined Transformer architecture that replaces the self-attention layer with a novel global filtering block specifically tailored for process monitoring. This block performs adaptive filtering across the entire temporal sequence, effectively modeling the long-term discriminative patterns. Theoretical analysis further confirms its capacity to enhance representations of long-term discriminative patterns. Moreover, by discarding the quadratic complexity inherent in self-attention, the global filtering block significantly reduces computational overhead. Extensive evaluations on real-world datasets demonstrate that DeepFilter consistently delivers superior accuracy and efficiency relative to state-of-the-art models, highlighting its role as an instrumental baseline for Transformer-based process monitoring.

... These examples underscore the pressing need for advanced process monitoring systems to mitigate risks, reduce costs, and ensure safety, reliability, and efficiency in industrial automation [3, 13, 14].

Fig. 1. Overview of the core components in DeepFilter: an affine layer and K stacked GF blocks, each comprising an FFT, an efficient filtering layer with filter W̄, and layer normalization, operating on X ∈ R^{T×D_in}, Z ∈ R^{T×D}, Z^(F) ∈ C^{T×D}, and R ∈ R^{T×D}.
Organization. Section II provides a detailed description of the DeepFilter architecture. Section III presents a case study on process monitoring for real-world nuclear power plants, demonstrating the improvements in accuracy and efficiency achieved by DeepFilter. Section IV offers a review of related works in process monitoring and highlights the contribution of this work in the context of existing studies. Finally, we summarize our conclusions and limitations and outline directions for future research.

II. METHODOLOGY

In response to the inherent limitations of the Transformer in accurate and efficient process monitoring, this section introduces the DeepFilter approach. DeepFilter replaces the self-attention layer in the Transformer with an efficient filtering layer that fuses information across different time steps, effectively enhancing accuracy and operational efficiency for process monitoring.

A. Problem Definition

A monitoring log consists of a chronological sequence of observations [L(1), L(2), ..., L(P)], where each L(t) ∈ R^{1×D_in} represents the observation at the t-th step with D_in covariates. We define X ∈ R^{T×D_in} as the historical sequence and y^H ∈ R as the quality variable, where T is the length of the historical window and H is the monitoring horizon. At an arbitrary time step t, the historical sequence is represented as X = [L(t − T + 1), ..., L(t)], and the corresponding quality variable is specified as the final feature in L(t + H).

The objective of process monitoring is to develop a predictive model g : R^{T×D_in} → R that generates the quality variable prediction g(X) = ŷ^H ≈ y^H. In practical process monitoring applications, the training dataset predominantly comprises normal operational logs. Consequently, anomalies are detected as significant deviations between the actual and predicted quality variable values.
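To make the problem definition concrete, the sketch below slices a raw log into (X, y) training pairs under the notation above. It is a minimal illustration, not the paper's code; the function and array names are ours.

```python
# Sliding-window construction of (X, y^H) pairs from a monitoring log,
# following the problem definition above. Names are illustrative.
import numpy as np

def build_samples(log: np.ndarray, T: int, H: int):
    """Slice a log of shape (P, D_in) into (X, y) pairs, where
    X[i] = [L(t-T+1), ..., L(t)] and y[i] is the final feature of L(t+H)."""
    P, _ = log.shape
    X, y = [], []
    for t in range(T - 1, P - H):          # t indexes the last step of each window
        X.append(log[t - T + 1 : t + 1])   # historical window, shape (T, D_in)
        y.append(log[t + H, -1])           # quality variable at horizon H
    return np.stack(X), np.array(y)

# Example: 1000 steps, 1034 covariates, window T=16, horizon H=3.
log = np.random.rand(1000, 1034).astype(np.float32)
X, y = build_samples(log, T=16, H=3)
print(X.shape, y.shape)  # (982, 16, 1034) (982,)
```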
B. Global Filtering Block

The fundamental component of DeepFilter is the Global Filtering (GF) block, as illustrated in Fig. 1, which integrates an efficient filtering layer for mixing information across different time steps and a feed-forward network (FFN) layer for mixing information across different channels.

Let Z ∈ R^{T×D} denote the input sequence to the k-th GF block, where T is the window length of historical monitoring logs and D is the hidden dimension. The global filtering process begins by transforming Z from the time domain to the frequency domain using the Fast Fourier Transform (FFT):

    Z^(F) = F(Z),    (1)

where F denotes the FFT operation. In the frequency domain, noisy and discriminative patterns are often easily isolated: noisy patterns typically reside in high-frequency components [32–34], while discriminative patterns often emerge in low-frequency components [35, 36]. To extract the discriminative patterns and suppress the noisy ones, we perform a filtering operation using the Hadamard product:

    Z̄^(F) = Z^(F) ⊙ W̄,    (2)

where W̄ ∈ C^{T×D} contains learnable parameters that are optimized to discern discriminative patterns during model training. The filtered sequence is then transformed back to the time domain via the inverse FFT:

    Z̄ = F^{-1}(Z̄^(F)),    (3)
which is immediately followed by a residual connection and layer normalization to stabilize the training process and mitigate gradient degradation:

    R = LayerNorm(Z̄ + Z),    (4)

which is the output of the efficient filtering layer. To demonstrate the efficacy of the operations in this layer, we restate the convolution theorem below.

Theorem II.1. Suppose W = F^{-1}(W^(F)) and "∗" is the circular convolution operator; then the filtered sequence in (3) can be acquired by performing the circular convolution Z̄ = W ∗ Z.

Proof. It is equivalent to prove F(W ∗ Z) = Z^(F) ⊙ W̄. To this end, the n-th element of the circular convolution above can be expressed as follows:

    Z̄_n = Σ_{m=0}^{T-1} W_m Z_{(n-m)%T}.

On this basis, the FFT of Z̄ is denoted as Z̄^(F), where the ω-th element is given by:

    Z̄^(F)_ω = Σ_{n=0}^{T-1} Σ_{m=0}^{T-1} W_m Z_{(n-m)%T} e^{-(2πi/T)nω} ...
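The identity underlying Theorem II.1 is easy to verify numerically. The following sketch is ours, for illustration in the 1-D case: it checks that the FFT of a circular convolution matches the element-wise product of the FFTs.

```python
# Numerical check of the convolution theorem: F(w * z) = F(w) ⊙ F(z).
import numpy as np

T = 8
rng = np.random.default_rng(0)
z = rng.standard_normal(T)
w = rng.standard_normal(T)

# Circular convolution: (w * z)_n = sum_m w_m z_{(n-m) % T}.
conv = np.array([sum(w[m] * z[(n - m) % T] for m in range(T)) for n in range(T)])

lhs = np.fft.fft(conv)            # FFT of the circular convolution
rhs = np.fft.fft(w) * np.fft.fft(z)  # element-wise product of the FFTs
assert np.allclose(lhs, rhs)      # holds up to floating-point error
```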
... facilitates the modeling of long-term dependencies. Both periodic and long-term patterns are typically discriminative for process monitoring, in contrast to the step-wise correlations captured by standard Transformers.
• The efficient filtering layer reduces the computational complexity, thereby improving efficiency. The overall complexity of the efficient filtering layer is O(T log T), significantly lower than that of self-attention layers and convolution layers with a full receptive field (O(T²)).

While the filtering layer captures dominant temporal patterns in each channel, it does not incorporate channel-wise interactions. To fill this gap, we introduce an FFN as follows:

    FFN(R) = ReLU(RW^(1) + b^(1))W^(2) + b^(2),    (5)
    R̄ = LayerNorm(FFN(R) + R),    (6)

where W^(1), b^(1), W^(2), and b^(2) are learnable parameters. To stabilize the training process, residual connection and layer normalization are subsequently applied in (6).

In a nutshell, the GF block captures temporal and channel-wise patterns via the efficient filtering layer and the FFN layer, respectively, contributing to a representation R̄ ∈ R^{T×D} that comprehensively understands the process monitoring logs.
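For concreteness, the PyTorch sketch below assembles one GF block according to our reading of Eqs. (1)-(6). It is illustrative rather than the authors' released implementation: class and parameter names are ours, we use the real-valued FFT (rfft), so the learnable filter holds T//2 + 1 frequency bins per channel rather than T, and the FFN width is an arbitrary choice.

```python
# A sketch of one GF block: FFT -> complex filtering -> inverse FFT ->
# residual + LayerNorm -> FFN -> residual + LayerNorm.
import torch
import torch.nn as nn

class GFBlock(nn.Module):
    def __init__(self, T: int, D: int, hidden: int = 128):
        super().__init__()
        # Real and imaginary parts of the filter W̄ in Eq. (2);
        # rfft keeps T // 2 + 1 frequency bins per channel.
        self.w_re = nn.Parameter(torch.randn(T // 2 + 1, D) * 0.02)
        self.w_im = nn.Parameter(torch.randn(T // 2 + 1, D) * 0.02)
        self.norm1 = nn.LayerNorm(D)
        self.norm2 = nn.LayerNorm(D)
        self.ffn = nn.Sequential(nn.Linear(D, hidden), nn.ReLU(), nn.Linear(hidden, D))

    def forward(self, z: torch.Tensor) -> torch.Tensor:   # z: (B, T, D)
        zf = torch.fft.rfft(z, dim=1)                      # Eq. (1): time -> frequency
        zf = zf * torch.complex(self.w_re, self.w_im)      # Eq. (2): Hadamard filtering
        z_bar = torch.fft.irfft(zf, n=z.size(1), dim=1)    # Eq. (3): frequency -> time
        r = self.norm1(z_bar + z)                          # Eq. (4): residual + LayerNorm
        return self.norm2(self.ffn(r) + r)                 # Eqs. (5)-(6): channel mixing

x = torch.randn(4, 16, 32)           # batch of 4 windows, T = 16, D = 32
print(GFBlock(T=16, D=32)(x).shape)  # torch.Size([4, 16, 32])
```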
C. DeepFilter Architecture and Learning Objective
... GRU decoder, and the filter tensor W^(F). These parameters are optimized by minimizing the mean squared error (MSE) between the predicted and actual quality variable values, defined as:

    L := (y^H - ŷ^H)².    (10)
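A hedged sketch of one optimization step under Eq. (10) is given below, using Adam as in the experimental setup of Section III; the stand-in model and tensor shapes are illustrative only, not the paper's architecture.

```python
# One gradient step minimizing the MSE of Eq. (10) over a batch.
import torch

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(16 * 1034, 1))
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

X = torch.randn(32, 16, 1034)   # a batch of historical windows
y = torch.randn(32)             # the corresponding quality variables

y_hat = model(X).squeeze(-1)        # ŷ^H
loss = torch.mean((y - y_hat) ** 2) # Eq. (10), averaged over the batch
optimizer.zero_grad()
loss.backward()
optimizer.step()
```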
III. EXPERIMENTS

This section aims to empirically validate the effectiveness of DeepFilter in the context of process monitoring. To this end, three aspects need to be investigated:
1) Accuracy: Does DeepFilter work effectively? Section III-C compares the accuracy of DeepFilter against baselines on two large-scale real-world process monitoring datasets.
2) Efficiency: Does DeepFilter work efficiently? Section III-D evaluates the actual running time of DeepFilter and Transformers under varying configurations.
3) Sensitivity: Is DeepFilter sensitive to hyperparameter variation? Section III-E evaluates and analyzes the performance of DeepFilter under varying hyperparameter values.

A. Background and Data Collection

Nuclear power plants (NPPs) play a crucial role in industrial automation, providing a stable and efficient energy source. In the United States, NPPs produce nearly 800 billion kilowatt-hours of electricity annually, accounting for over 60% of the nation's emission-free electricity [37]. This avoids approximately 500 million metric tons of carbon emissions, demonstrating their environmental and industrial significance. However, NPPs also pose security risks, as operational anomalies can lead to radionuclide leaks, resulting in severe environmental pollution and casualties [11]. To mitigate these risks, automated monitoring networks have been deployed worldwide, such as the RadNet in the United States [38], the Fixed Point Surveillance Network in Canada [39], and the Atmospheric Nuclear Radiation Monitoring Network (ANRMN) in China, as shown in Fig. 2. These systems enable continuous, reliable monitoring of radionuclide concentrations, reflecting whether NPPs are operating normally.

Fig. 2. Overview of the nuclear monitoring stations in our project. (a) National station distribution. (b) Composition of a single station.

The key quality variable monitored by these systems is the atmospheric γ-ray dose rate, measured using ionization chambers (Fig. 3) [40]. Fig. 4(a) illustrates the non-stationary dynamics of this variable, likely influenced by external factors such as weather conditions. To account for these influences, the monitoring systems include additional process variables, such as spectrometer measurements (spanning 1024 channels), meteorological conditions (e.g., precipitation), and spectrometer operational parameters (e.g., battery voltage). These process variables, shown in Fig. 4(b-d), display diverse temporal patterns. Frequency-domain analysis reveals concentrated energy in low-frequency bands, which diminishes at higher frequencies, highlighting the potential of frequency-based models to extract semantic-rich representations.

Fig. 4. Dynamics of the quality variable (a) and some important process variables (b-d) in the collected dataset from Jinan station, with the amplitude characteristics of their FFT in (b-d): (a) the dynamics of the quality variable (dose rate); (b) the 64-th channel of the spectrometer measurements and its FFT; (c) the temperature measurements and its FFT; (d) the humidity measurements and its FFT.

The monitoring logs integrate 1,024-channel spectrometer data, seven meteorological covariates, and three operational parameters, as summarized in Table I. By consolidating diverse data sources, these logs provide a comprehensive view of NPP operational status, facilitating the construction of process monitoring systems to safeguard their operations.

TABLE I
Meteorological conditions: temperature (°C); humidity (%); atmosphere pressure (hPa); wind direction (clockwise angle); wind speed (m/s); precipitation indicator (Boolean value); amount of precipitation (mm).
Spectrometer operating conditions: battery voltage (V); spectrometer voltage (V); spectrometer temperature (°C).

B. Experimental Setup

1) Datasets: We employ two industrial datasets sourced from monitoring logs collected from the ANRMN project. These datasets encompass 1034 input variables, as detailed in Table I. The statistics of the preprocessed data are presented in Table II. Each dataset is sequentially divided into training, validation, and testing subsets with allocation ratios of 70%, 15%, and 15%, respectively. To ensure effective model training and evaluation, the datasets are normalized using a min-max scaling technique.

TABLE II
Description of sampling strategy.
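The preprocessing just described can be sketched as follows. Fitting the min-max statistics on the training portion only (to avoid leakage) is our assumption, as the text does not spell this out, and all names are illustrative.

```python
# Sequential 70/15/15 split followed by min-max scaling.
import numpy as np

def split_and_scale(data: np.ndarray):
    n = len(data)
    i, j = int(0.70 * n), int(0.85 * n)
    train, val, test = data[:i], data[i:j], data[j:]
    # Assumption: scaler statistics come from the training portion only.
    lo, hi = train.min(axis=0), train.max(axis=0)
    scale = np.where(hi > lo, hi - lo, 1.0)   # guard constant columns
    norm = lambda a: (a - lo) / scale         # min-max scaling
    return norm(train), norm(val), norm(test)

train, val, test = split_and_scale(np.random.rand(10000, 1034))
```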
2) Baselines: We compare DeepFilter with three categories of baselines as follows:
• Identification methods: AR, MA, and ARIMA [20];
• Statistical methods: Lasso Regression (LASSO) [23], Support Vector Regression (SVR) [41], Random Forest (RF) [21], and eXtreme Gradient Boosting (XGB) [22];
• Deep methods: Long Short-Term Memory (LSTM) [26], Gated Recurrent Unit (GRU) [25], Transformer [28], Informer [42], AttentionMixer [7], and iTransformer [43]. Aligning with the prevailing work [7], we employ a GRU decoder for Transformer-based baselines to produce the quality variable prediction.

3) Training Strategy: All experimental procedures are executed using the PyTorch framework, utilizing the Adam optimizer [44] for its adaptive learning rate capabilities and efficient convergence properties. The experiments are conducted on a hardware platform comprising two Intel(R) Xeon(R) Platinum 8383C CPUs operating at 2.70 GHz and eight NVIDIA GeForce 1080Ti GPUs. Hyperparameter optimization is systematically performed following the standard protocol [7] to enhance model performance. The learning rate is tuned within {0.001, 0.005, 0.01}; the batch size is tuned within {32, 64}; the number of GF blocks is set to 2; the historical window length is set to 16. The model is trained for a maximum of 200 epochs. An early stopping strategy with a patience of 15 epochs is employed, stopping training if no improvement is observed on the validation set. Finally, the performance metrics on the test set are calculated and reported.
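The early-stopping protocol above corresponds to the following minimal loop; the two helper functions are hypothetical stubs standing in for the actual training and validation routines.

```python
# Early stopping with a patience of 15 epochs, at most 200 epochs.
import random

def train_one_epoch() -> None:   # stub: real code updates the model
    pass

def validate() -> float:         # stub: real code returns validation loss
    return random.random()

best_val, patience, bad_epochs = float("inf"), 15, 0
for epoch in range(200):
    train_one_epoch()
    val_loss = validate()
    if val_loss < best_val:
        best_val, bad_epochs = val_loss, 0   # improvement: reset the counter
    else:
        bad_epochs += 1
        if bad_epochs >= patience:           # no improvement for 15 epochs
            break
```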
4) Evaluation Strategy: The coefficient of determination (R²) is selected for evaluating monitoring accuracy:

    R² = 1 - [ Σ_{t=1}^{N} (y_t^H - ŷ_t^H)² ] / [ Σ_{t=1}^{N} (y_t^H - ȳ^H)² ],    (11)

where y_t^H represents the actual dose rate, ŷ_t^H denotes the estimated dose rate by the model, and ȳ^H is the mean value of the actual dose rates over the test set of size N. This metric effectively captures the proportion of variance in y^H that is predictable from ŷ^H. We also incorporate the root mean squared error (RMSE) and the mean absolute error (MAE) as supplementary metrics, quantifying the average magnitude of the prediction errors:

    RMSE = [ (1/N) Σ_{t=1}^{N} (y_t^H - ŷ_t^H)² ]^{1/2},
    MAE = (1/N) Σ_{t=1}^{N} |y_t^H - ŷ_t^H|.    (12)
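The three metrics in Eqs. (11) and (12) transcribe directly into NumPy; the sketch below is our own illustration, not the paper's evaluation code.

```python
# R², RMSE, and MAE as defined in Eqs. (11)-(12).
import numpy as np

def r2(y: np.ndarray, y_hat: np.ndarray) -> float:
    ss_res = np.sum((y - y_hat) ** 2)        # residual sum of squares
    ss_tot = np.sum((y - y.mean()) ** 2)     # total sum of squares
    return 1.0 - ss_res / ss_tot             # Eq. (11)

def rmse(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))   # Eq. (12), RMSE

def mae(y: np.ndarray, y_hat: np.ndarray) -> float:
    return float(np.mean(np.abs(y - y_hat)))           # Eq. (12), MAE

y = np.array([0.30, 0.32, 0.29, 0.35])
y_hat = np.array([0.31, 0.30, 0.28, 0.36])
print(r2(y, y_hat), rmse(y, y_hat), mae(y, y_hat))
```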
C. Overall Performance

In this section, we compare the monitoring accuracy of DeepFilter and baselines across four distinct forecast horizons (H = 1, 3, 5, 7). Results are shown in Table III with key observations below:
• Identification models demonstrate limited accuracy in process monitoring. Primarily designed to capture linear autocorrelations, identification models struggle to model the non-linear patterns that are prevalent in many datasets. This limitation becomes more pronounced in long-term forecasting scenarios, where ARIMA models, for instance, record relatively high MAEs of 0.158 and 0.127 on the Hegang and Jinan datasets, respectively, for H = 7.
• Statistical models integrate external factors, such as meteorological conditions, and demonstrate competitive performance in short-term monitoring. Non-linear estimators, particularly XGBoost, outperform linear models due to their higher modeling capacities. Notably, XGBoost exhibits robust performance across both datasets, achieving results comparable to traditional deep learning models like LSTM and GRU across most evaluation metrics.
• Deep models achieve the best performance among baselines. Specifically, Transformer-based methods display varying accuracy contingent upon their temporal fusion mechanisms. Among them, the standard Transformer model, utilizing self-attention mechanisms, shows suboptimal monitoring accuracy, highlighting the limitations of self-attention in ...
TABLE III
Comparative study on the Hegang and Jinan datasets over four forecast horizons.
[Figure: ground-truth quality variable versus Transformer and DeepFilter predictions over time on the two datasets, with MAE histograms and prediction-versus-ground-truth scatter plots.]
...
• The theoretical results are presented in Table IV. Specifically, RNNs exhibit inefficiency due to heavy sequential operations and lengthy path lengths. In contrast, Transformer and iTransformer reduce sequential operations and path lengths, enabling them to process entire sequences within a single computational block and leverage GPU acceleration effectively. However, they incur quadratic complexity relative to T and D, respectively, producing inefficiency when dealing with long sequences or numerous covariates.
• DeepFilter addresses these limitations by incorporating a filtering layer that reduces the computational complexity to O(T log T · D), which is notably lower than that of Transformer and iTransformer models, while maintaining ...
• ... the actual inference times of iTransformer, Transformer, and DeepFilter under various conditions. These results corroborate the theoretical complexity in Table IV, with DeepFilter consistently outperforming both iTransformer and Transformer in terms of inference speed.

Fig. 6. Inference time given varying settings, with solid lines for mean values of 10 trials and shaded areas for 90% confidence intervals. The default values of L, D, and batch size are 16, 32, and 64, respectively: (a) inference time on the Intel(R) Xeon(R) Gold 6140 CPU; (b) inference time on the Nvidia RTX Titan GPU.

The results above underscore DeepFilter's superior efficiency and scalability, making it highly suitable for practical process monitoring applications where the operational efficiency of the monitoring system is a demanding factor.

E. Parameter Sensitivity Study

In this section, we investigate the impact of key hyperparameters on DeepFilter's performance, including the number of global filtering blocks (K), the window length (L), the number of hidden dimensions (D), and the batch size. The results are summarized in Fig. 7 with key observations as follows:
• DeepFilter's performance is not significantly dependent on a deep stack of blocks. As illustrated in Fig. 7(a), utilizing 1-2 blocks already yields promising results. While adding more blocks can incrementally improve performance, there is a risk of performance degradation, potentially due to overfitting and optimization challenges.
• The model's effectiveness improves with the inclusion of adequate historical monitoring data. In Fig. 7(b), we observe that performance is suboptimal at L = 4 but significantly improves at L = 8. Extending the window length further can lead to improved monitoring accuracy, indicating the value of incorporating sufficient historical context in the model.
• The relationship between the number of hidden dimensions and performance does not follow a clear pattern, as per Fig. 7(c). A smaller dimension appears sufficient to effectively model the monitoring logs at both stations, suggesting that the logs contain a high degree of redundancy. Finally, ...
[Fig. 7: R² and RMSE on the Hegang and Jinan datasets under varying hyperparameters; (a) performance with varying numbers of blocks (K).]

... XGBoost [22], and generalized linear models [23], were subsequently introduced to address these limitations. These methods improve accuracy by capturing nonlinear patterns, but they rely heavily on manual feature engineering and face scalability issues in large-scale industrial settings.

The advent of deep learning has revolutionized data-driven process monitoring, enabling automatic feature extraction and enhanced parallel computing capabilities. Various architectures, such as Convolutional Neural Networks (CNNs) [24], Recurrent Neural Networks (RNNs) [25, 26], and Graph ... multi-scale dependencies, and LogTrans [54] for locality ...
... in monitoring logs while significantly reducing computational complexity. Our experimental results on real-world process monitoring datasets demonstrate that DeepFilter outperforms existing state-of-the-art models in both accuracy and efficiency, effectively meeting the stringent demands of modern process monitoring.

Limitations and Future Work. The current implementation of DeepFilter utilizes the FFT for domain transformation, which is well-suited for typical stationary monitoring data but may not effectively accommodate monitoring logs with swiftly varying patterns. Future research could investigate alternative transformation techniques, such as wavelet transforms or adaptive filtering methods, which are better equipped to handle non-stationary monitoring logs. Another direction for future work is deploying the proposed model on our online monitoring platform to support nation-wide nuclear power plants.

REFERENCES

[1] H. Wang, X. Liu, Z. Liu, H. Li, Y. Liao, Y. Huang, and Z. Chen, "LSPT-D: Local similarity preserved transport for direct industrial data imputation," IEEE Trans. Autom. Sci. Eng., 2024.
[2] J. Yu, L. Ye, L. Zhou, Z. Yang, F. Shen, and Z. Song, "Dynamic process monitoring based on variational Bayesian canonical variate analysis," IEEE Trans. Syst., Man, Cybern., Syst., vol. 52, no. 4, pp. 2412–2422, 2021.
[3] Z. Yang and Z. Ge, "On paradigm of industrial big data analytics: From evolution to revolution," IEEE Trans. Ind. Informat., vol. 18, no. 12, pp. 8373–8388, 2022.
[4] Z. Chen, H. Wang, G. Chen, Y. Ma, L. Yao, Z. Ge, and Z. Song, "Analyzing and improving supervised nonlinear dynamical probabilistic latent variable model for inferential sensors," IEEE Trans. Ind. Informat., pp. 1–12, 2024.
[5] H. Wang, Z. Chen, Z. Liu, L. Pan, H. Xu, Y. Liao, H. Li, and X. Liu, "SPOT-I: Similarity preserved optimal transport for industrial IoT data imputation," IEEE Trans. Ind. Informat., pp. 1–9, 2024.
[6] Z. Chen, H. Wang, Z. Song, and Z. Ge, "Improving data-driven inferential sensor modeling by industrial knowledge: A Bayesian perspective," IEEE Trans. Syst., Man, Cybern., Syst., vol. 20, no. 11, pp. 13296–13307, 2024.
[7] H. Wang, Z. Wang, Y. Niu, Z. Liu, H. Li, Y. Liao, Y. Huang, and X. Liu, "An accurate and interpretable framework for trustworthy process monitoring," IEEE Trans. Artif. Intell., vol. 5, no. 5, pp. 2241–2252, 2023.
[8] H. Xu, Z. Liu, H. Wang, C. Li, Y. Niu, W. Wang, and X. Liu, "Denoising diffusion straightforward models for energy conversion monitoring data imputation," IEEE Trans. Ind. Informat., 2024.
[9] Z. Yang, T. Hu, L. Yao, L. Ye, Y. Qiu, and S. Du, "Stacked dual-guided autoencoder: A scalable deep latent variable model for semi-supervised industrial soft sensing," IEEE Trans. Instrum. Meas., vol. 73, pp. 1–14, 2024.
[10] Z. Chen, Z. Song, and Z. Ge, "Variational inference over graph: Knowledge representation for deep process data analytics," IEEE Trans. Knowl. Data Eng., vol. 36, no. 6, pp. 2730–2744, 2024.
[11] F.-C. Chen and M. R. Jahanshahi, "NB-CNN: Deep learning-based crack detection using convolutional neural network and naïve Bayes data fusion," IEEE Trans. Ind. Electron., vol. 65, no. 5, pp. 4392–4400, 2018.
[12] M. Embrechts and S. Benedek, "Hybrid identification of nuclear power plant transients with artificial neural networks," IEEE Trans. Ind. Electron., vol. 51, no. 3, pp. 686–693, 2004.
[13] L. Yao, W. Shao, and Z. Ge, "Hierarchical quality monitoring for large-scale industrial plants with big process data," IEEE Trans. Neural Netw. Learn. Syst., vol. 32, no. 8, pp. 3330–3341, 2019.
[14] L. Yao and Z. Ge, "Industrial big data modeling and monitoring framework for plant-wide processes," IEEE Trans. Ind. Informat., vol. 17, no. 9, pp. 6399–6408, 2020.
[15] X. Jiang and Z. Ge, "Data augmentation classifier for imbalanced fault classification," IEEE Trans. Autom. Sci. Eng., vol. 18, no. 3, pp. 1206–1217, 2020.
[16] S. Liao, X. Jiang, and Z. Ge, "Weakly supervised multilayer perceptron for industrial fault classification with inaccurate and incomplete labels," IEEE Trans. Autom. Sci. Eng., vol. 19, no. 2, pp. 1192–1201, 2020.
[17] W. Shao, Y. Li, W. Han, and D. Zhao, "Block-wise parallel semisupervised linear dynamical system for massive and inconsecutive time-series data with application to soft sensing," IEEE Trans. Instrum. Meas., vol. 71, pp. 1–14, 2022.
[18] W. Shao, W. Han, C. Xiao, L. Chen, M.-Q. Yu, and J. Chen, "Semi-supervised robust hidden Markov regression for large-scale time-series industrial data analytics and its applications to soft sensing," IEEE Trans. Autom. Sci. Eng., 2024.
[19] Z. Yang, T. Hu, L. Yao, L. Ye, Y. Qiu, and S. Du, "Stacked dual-guided autoencoder: A scalable deep latent variable model for semi-supervised industrial soft sensing," IEEE Trans. Instrum. Meas., 2024.
[20] A. Chandrakar, D. Datta, A. K. Nayak, and G. Vinod, "Statistical analysis of a time series relevant to passive systems of nuclear power plants," Int. J. Syst. Assur. Eng. Manag., vol. 8, no. 1, pp. 89–108, 2017.
[21] S. Grape, E. Branger, Z. Elter, and L. P. Balkeståhl, "Determination of spent nuclear fuel parameters using modelled signatures from non-destructive assay and random forest regression," Nucl. Instrum. Methods Phys. Res., Sect. A, vol. 969, p. 163979, 2020.
[22] J. I. Aizpurua, S. D. J. McArthur, B. G. Stewart, B. Lambert, J. G. Cross, and V. M. Catterson, "Adaptive power transformer lifetime predictions through machine learning and uncertainty modeling in nuclear power plants," IEEE Trans. Ind. Electron., vol. 66, no. 6, pp. 4726–4737, 2019.
[23] Y. K. Chan and Y. C. Tsai, "Multiple regression approach to predict turbine-generator output for Chinshan nuclear power plant," Kerntechnik, vol. 82, no. 1, pp. 24–30, 2017.
[24] H. Wang, M. Peng, R. Xu, A. Ayodeji, and H. Xia, "Remaining useful life prediction based on improved temporal convolutional network for nuclear power plant valves," Front. Energy Res., vol. 8, p. 296, 2020.
[25] J. Zhang, Z. Pan, W. Bai, and X. Zhou, "Pressurizer water level reconstruction for nuclear power plant based on GRU," in IMCCC, pp. 1675–1679, 2018.
[26] J. Choi and S. J. Lee, "Consistency index-based sensor fault detection system for nuclear power plant emergency situations using an LSTM network," Sensors, vol. 20, no. 6, p. 1651, 2020.
[27] Z. Chen and Z. Ge, "Knowledge automation through graph mining, convolution, and explanation framework: A soft sensor practice," IEEE Trans. Ind. Informat., vol. 18, no. 9, pp. 6068–6078, 2022.
[28] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, and I. Polosukhin, "Attention is all you need," Proc. Adv. Neural Inf. Process. Syst., vol. 30, 2017.
[29] X. Yuan, N. Xu, L. Ye, K. Wang, F. Shen, Y. Wang, C. Yang, and W. Gui, "Attention-based interval aided networks for data modeling of heterogeneous sampling sequences with missing values in process industry," IEEE Trans. Ind. Informat., 2023.
[30] X. Yuan, Z. Jia, Z. Xu, N. Xu, L. Ye, K. Wang, Y. Wang, C. Yang, W. Gui, and F. Shen, "Hierarchical self-attention network for industrial data series modeling with different sampling rates between the input and output sequences," IEEE Trans. Neural Netw. Learn. Syst., 2024.
[31] X. Yuan, L. Huang, L. Ye, Y. Wang, K. Wang, C. Yang, W. Gui, and F. Shen, "Quality prediction modeling for industrial processes using multiscale attention-based convolutional neural network," IEEE Trans. Cybern., 2024.
[32] X. Zhang, S. Zhao, Z. Song, H. Guo, J. Zhang, C. Zheng, and W. Qiang, "Not all frequencies are created equal: Towards a dynamic fusion of frequencies in time-series forecasting," in Proc. ACM Int. Conf. Multimedia, pp. 4729–4737, 2024.
[33] J. Liang, S. Liang, A. Liu, K. Ma, J. Li, and X. Cao, "Exploring inconsistent knowledge distillation for object detection with data augmentation," in Proc. ACM Int. Conf. Multimedia, pp. 768–778, 2023.
[34] W. Qin, B. Zou, X. Li, W. Wang, and H. Ma, "Micro-expression spotting with face alignment and optical flow," in Proc. ACM Int. Conf. Multimedia, pp. 9501–9505, 2023.
[35] T. Ma, G. Guo, Z. Li, and Z. Yang, "Infrared small target detection method based on high-low frequency semantic reconstruction," IEEE Geosci. Remote Sens. Lett., 2024.
[36] W. Zou, H. Gao, W. Yang, and T. Liu, "Wave-Mamba: Wavelet state space model for ultra-high-definition low-light image enhancement," in Proc. ACM Int. Conf. Multimedia, pp. 1534–1543, 2024.
[37] M. B. Roth and P. Jaramillo, "Going nuclear for climate mitigation: An analysis of the cost effectiveness of preserving existing US nuclear power plants as a carbon avoidance strategy," Energy, vol. 131, pp. 67–77, 2017.
[38] L. Li, A. J. Blomberg, J. D. Spengler, B. A. Coull, J. D. Schwartz, and P. Koutrakis, "Unconventional oil and gas development and ambient particle radioactivity," Nat. Commun., vol. 11, no. 1, pp. 1–8, 2020.
[39] C. Liu, W. Zhang, K. Ungar, E. Korpach, B. White, M. Benotto, and E. Pellerin, "Development of a national cosmic-ray dose monitoring system ...