Abstract
Short-term electricity load forecasting is critical yet challenging for scheduling operations and production planning in modern power management systems due to the stochastic characteristics of electricity load data. Current forecasting models mainly focus on adapting to various load data to improve forecasting accuracy. However, these models ignore the noise and nonstationarity of the load data, resulting in forecasting uncertainty. To address this issue, a short-term load forecasting system is proposed by combining a modified information processing technique, an advanced meta-heuristic algorithm and deep neural networks. The information processing technique utilizes a sliding fuzzy granulation method to remove noise and obtain uncertainty information from load data. Deep neural networks can capture the nonlinear characteristics of load data and, owing to their powerful mapping capability, yield forecasting performance gains. A novel meta-heuristic algorithm is used to optimize the weighting coefficients to reduce contingency and improve the stability of the forecasting. Both point forecasting and interval forecasting are utilized for a comprehensive evaluation of future electricity load. Several experiments demonstrate the superiority, effectiveness and stability of the proposed system by comprehensively considering multiple evaluation metrics.
1 Introduction
With the world economy recovering from the COVID-19 pandemic, the global energy crisis is intensifying. The causes of the energy crisis are multifaceted, involving both high energy prices and inadequate energy supply, and the crisis has had a marked impact on citizens' electricity bills in major energy-consuming countries in the Americas, Europe, Asia, and Australia. An optimal balance between generation and load demand must be maintained to avoid fatal disruptions to the grid due to overloads [1]. Against this background, strengthening electricity load forecasting is crucial for developing the power market and ensuring stable operation of the power system [2,3,4]. Electricity loads are influenced by multiple factors such as energy policies, industrial manufacturing activities, customer population, atmospheric factors, and holidays. Therefore, electricity load series exhibit nonlinear, non-stationary, seasonal, and even chaotic characteristics [5,6,7]. Short-term load forecasting (STLF) is based on the information contained in historical data sources and is crucial for scheduling operations and production planning in modern power management systems. Establishing an ideal model and dealing with stochastic factors remain the main challenges of load forecasting.
In recent years, scholars have proposed various load forecasting models. In general, these models are categorized into the following: physical models [8], statistical models [9], artificial intelligence models [10, 11] and combined or hybrid models [7, 12, 13].
Physical models utilize historical data and physical properties to establish an electricity load forecasting model. It is usually assumed that analyzing the relationship between the raw series and physical information enables future load forecasting; such methods include the load density, unit consumption and elastic coefficient methods. Zhao et al. [8] utilize the multi-verse optimizer to determine the parameters of a discrete grey model (DGM), and then use the optimized DGM to forecast annual peak load. Although the parameters of the DGM are optimized and short-term forecasting is possible, the DGM is suited to exponentially growing series and may not yield satisfactory predictions for load data. Xiao et al. [14] utilize electricity consumption per capita, load intensity, and a load type estimation method to effectively forecast long-term power consumption. However, this method is not suitable for short-term forecasting, as it has high requirements for observation data and achieves high precision only for long-term load forecasting. In conclusion, although physical models achieve high accuracy in some scenarios, they impose high requirements on observation data and computational resources; moreover, existing physical models ignore the uncertainty of load forecasting and cannot satisfy the high requirements of modern grid management.
Statistical models are relatively stable for linear series data; examples include the autoregressive integrated moving average (ARIMA) [15], vector auto-regression (VAR) [16], Kalman filter (KF) [17], and generalized autoregressive conditional heteroskedasticity (GARCH) [18] models. Wang et al. [19] propose residual modification models to improve the precision of seasonal ARIMA (SARIMA) for demand forecasting. Although the prediction precision of SARIMA is improved, SARIMA cannot capture the nonlinear characteristics of the load data, and the prediction accuracy and stability are limited. Rendon-Sanchez et al. [20] propose structural combinations of forecasts based on seasonal exponential smoothing models for load forecasting. Despite results showing that this method can outperform competitive benchmarks, it is only suitable for seasonal series data and also fails to capture nonlinear features. These classical statistical models are easy to implement and require less computational cost than physical models, but they cannot effectively capture the nonlinear characteristics of load data; since load data are noisy and volatile, they cannot satisfy the prediction accuracy and stability required in current grid management.
In recent decades, artificial intelligence models have achieved impressive performance in various fields and have a strong ability to capture nonlinear features. Wang et al. [21] propose a STLF method based on an attention mechanism (AM), rolling update and a bi-directional long short-term memory (Bi-LSTM) neural network. This model acquires effective input characteristics using the AM and gains accuracy from the powerful mapping capability of the Bi-LSTM network. Although Bi-LSTM networks can capture nonlinear characteristics well, they are prone to overfitting and falling into local optima, and thus become less capable of predicting future noisy and unstable load data. Wang et al. [22] propose a STLF model based on the Temporal Convolutional Network (TCN) and Light Gradient Boosting Machine (LightGBM). The TCN is used to extract input features from electrical, meteorological, and date features to obtain long-term temporal dependencies, and load forecasting is then performed with high accuracy using LightGBM. However, LightGBM may grow deep decision trees, which are prone to overfitting, and since it is a bias-based algorithm, it is more sensitive to noise. Therefore, artificial intelligence models can significantly enhance load forecasting performance, but these models tend to overfit, and most existing models ignore the noise and uncertainty of load data, leading to poorer forecasting accuracy on new data and insufficient robustness and generalization.
For electricity load forecasting, it is necessary and useful to integrate multiple models or algorithms that analyze the load data in detail to improve forecasting precision. Various hybrid models have been proposed to predict electricity load [23, 24]. Based on the support vector machine (SVM) [25] and grey wolf optimizer (GWO) [26], Barman et al. [27] propose a new STLF model. This model integrates social considerations such as rituals and consumer behaviors, and the results validate the superiority of the method in load forecasting. However, the GWO algorithm has poor population diversity and may fall into local optima. In addition, this model requires social considerations, which may not be available in many other scenarios, limiting its applicability. Li et al. [28] develop a combination model based on the culture particle swarm optimization (CPSO) algorithm to improve the precision of STLF. This model uses the CPSO algorithm to optimize the weights of the combination model to improve accuracy, but ignores the forecasting uncertainty caused by the randomness and nonstationarity of the load data. These combined or hybrid models can enhance the accuracy and stability of load forecasting, but lack an exploration of uncertainty information for load forecasting.
In addition, data preprocessing techniques have been adopted by many scholars and have yielded good forecasting accuracy. He et al. [29] develop a hybrid STLF method based on variational mode decomposition (VMD) and long short-term memory (LSTM) with a Bayesian optimization algorithm (BOA). VMD uses a variational model to determine the relevant frequency bands and extract the corresponding modal components; it has strong anti-noise capability and a solid theoretical basis, giving the hybrid STLF method high forecasting accuracy. However, both the modal number and the penalty parameter of the VMD decomposition must be selected manually, and the optimal combination of parameters may not be obtained. Wang et al. [30] develop an electricity load forecasting system by combining the local mean decomposition (LMD) technique, hybrid optimization algorithms, and several individual forecasting models. The LMD data preprocessing method used in this system is a new adaptive time-frequency analysis method with instantaneous frequency and clear physical significance. Zhang et al. [31] propose a decomposition-ensemble model for STLF by integrating singular spectrum analysis (SSA), the SVM, the autoregressive integrated moving average (ARIMA) model, and the cuckoo search algorithm (CSA). The SSA method can identify and extract trends or noise to improve forecasting performance. However, these methods also ignore the uncertainty of the forecast. Table 1 summarizes the categories of forecasting studies.
In summary, most studies have mainly focused on point forecasting of electricity load and have made important contributions. Current forecasting models mainly focus on adapting to various load data to improve forecasting accuracy. However, these models ignore the noise and nonstationarity of the load data, resulting in forecasting uncertainty. Uncertainty analysis is crucial for electricity load data with noise and nonstationarity. To this end, a short-term load forecasting system is proposed by combining a modified information processing strategy, a novel meta-heuristic optimization algorithm and deep learning frameworks. The information processing strategy utilizes a sliding fuzzy granulation method to extract the upper bound, lower bound and median series of the raw data. Using the obtained upper and lower bound information for interval forecasting can provide more uncertainty information for grid management. Deep neural networks can capture the nonlinear characteristics of load data and, owing to their powerful mapping capability, yield forecasting performance gains. However, no single forecasting technique is superior in all cases; therefore, we use deep neural networks to establish a combined model for STLF, whose individual models consist of the temporal convolutional network (TCN) [32], gated recurrent unit (GRU) [33], convolutional neural network (CNN) [34], N-BEATS [35], and multi-layer perceptron (MLP). A novel meta-heuristic algorithm, called the equilibrium optimizer (EO), is used to optimize the weighting coefficients to reduce contingency and improve the stability of the forecasting. The objective function of the EO algorithm is designed to combine several forecasting performance metrics, and the stability and generalization ability of the forecasting system are enhanced by the combined forecasting strategy; therefore, the superiority of the proposed system is maintained in multi-step forecasting.
In this study, both point forecasting and interval forecasting are utilized for the comprehensive evaluation of future electricity load. This paper makes the following contributions:
I. A modified information processing strategy extracts the upper bound, lower bound and median series of the raw load data in preparation for the uncertainty analysis of electricity load.
II. Effective point forecasting and interval forecasting methods are proposed to improve load forecasting accuracy and to analyze the uncertainty of electricity load data, respectively. The median series extracts the key information of the raw data via sliding fuzzy granulation and is used as the feature for point forecasting, achieving forecasting accuracy significantly better than that obtained from the raw data. The upper and lower bound series extracted by sliding fuzzy granulation accurately measure the uncertainty of the raw series, and the interval forecasting results are significantly superior to those of the compared methods.
III. The proposed combined strategy enables the forecasting system to deal with the non-stationary and noisy characteristics of the raw series, overcoming the limitations and drawbacks of individual models and greatly improving the forecasting accuracy; in particular, multi-step interval forecasting retains high performance.
IV. A novel meta-heuristic optimization algorithm determines the optimal weight coefficients for each individual model of the proposed system. This remedies the unstable forecasting results of most approaches, which optimize only an individual model.
V. Several experiments are designed to validate the performance of the proposed system. A comprehensive evaluation metric system is adopted to fairly evaluate the experimental results for point forecasting and interval forecasting. The superiority of the proposed system is fully demonstrated by comparison with several benchmark methods.
2 Methodology
In this section, a data preprocessing module is designed by a modified fuzzy information granulation method, namely the sliding fuzzy granulation (SFG), and a short-term forecasting system is proposed by combining a deep neural network module and EO algorithm to enhance the performance of STLF. First, the general framework flowchart and steps of the proposed system are described, and then the functions and principles of each module are detailed.
2.1 Short-term load forecasting system
The proposed short-term load forecasting system is established by combining a deep neural network module with the SFG method and the EO algorithm. The flowchart is illustrated in Fig. 1.
The specific stages are summarized as follows.
Stage 1: Data preprocessing
The SFG method is an efficient and effective data preprocessing strategy for the noisy and nonstationary load series data, which provides the upper bound, lower bound and median values of the raw data by using triangular membership function. Furthermore, the preprocessed load series data will be used for the forecasting tasks in the later stages.
Stage 2: Forecasting of individual model
The individual models of the deep neural network module perform point forecasting and interval forecasting, respectively.
A. Point forecasting: The median series obtained from data preprocessing extracts the key features of the raw series and is utilized as the input data, while the raw series is used as the output data, to perform the point forecasting task via the rolling forecasting mechanism shown in Fig. 2. The deep neural networks are utilized for electricity load forecasting on four datasets.
B. Interval forecasting: The upper and lower bound series extracted by the SFG method are divided into input and output data using the rolling forecasting mechanism in Fig. 2 and then fed to each deep neural network for training, yielding the upper and lower bound estimates that form the forecast interval. As a result, the interval forecasting results of the individual models are obtained and used in the next stage.
Stage 3: Construction of the combined forecasting system
A. Point forecasting: In point forecasting, the forecasting results of the individual models are combined with the optimal weight coefficients obtained by the EO algorithm. In general, the individual models in a traditional combination model use fixed positive coefficients as weights. However, in special cases where the individual models have significantly different prediction results, the weight of the best individual model is set to 1 and the weight of the worst model is set to 0. To improve the forecasting accuracy of the proposed system, the search space of the weight for each individual model is [-2, 2] [2, 40, 41]. The mean absolute percentage error is used as the objective function \(\overline{\overline{OF}}_{PF}\) of the EO algorithm, as shown in (1). Then, based on the obtained weights, the forecasting results of the individual models are combined into the final electricity load point forecasting result.
$$\begin{aligned} \overline{\overline{OF}}_{PF} = \frac{1}{N} \sum _{i=1}^{N}\left|\frac{S^{act}_{i}-S^{for}_{i}}{S^{act}_{i}}\right|\times 100 \% \end{aligned}$$ (1)

where \(S^{act}_{i}\) and \(S^{for}_{i}\) denote the actual and forecast values, respectively.
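As an illustration, the weighted combination of individual models and the MAPE objective of Eq. (1) can be sketched in Python; the model outputs, weight vector and function names below are hypothetical stand-ins, not the paper's implementation:

```python
import numpy as np

def mape(actual, forecast):
    """Mean absolute percentage error, the point-forecasting objective of Eq. (1)."""
    actual, forecast = np.asarray(actual, float), np.asarray(forecast, float)
    return np.mean(np.abs((actual - forecast) / actual)) * 100.0

def combined_forecast(weights, individual_forecasts):
    """Weighted sum of the individual models' point forecasts.

    individual_forecasts: (n_models, n_samples) array; weights: length n_models,
    each searched in [-2, 2] by the optimizer (the EO algorithm in the paper).
    """
    return np.asarray(weights) @ np.asarray(individual_forecasts)

# Toy example: three individual models forecasting four load values.
actual = np.array([100.0, 110.0, 105.0, 95.0])
preds = np.array([
    [101.0, 108.0, 104.0, 96.0],
    [ 99.0, 112.0, 107.0, 94.0],
    [102.0, 109.0, 103.0, 97.0],
])
w = np.array([0.4, 0.3, 0.3])          # one candidate weight vector
objective = mape(actual, combined_forecast(w, preds))
```

The optimizer would repeatedly evaluate `objective` for candidate weight vectors and keep the one that minimizes it.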
B. Interval forecasting: Interval forecasting can describe the uncertainty of the operating state of the electricity system more precisely, and the combined approach can effectively enhance the generalization ability and robustness of the load forecasting system compared to individual models. Therefore, a combined interval forecasting model based on the EO algorithm is proposed. The upper and lower bounds forecast by the individual models are combined with the optimal weights obtained by the EO algorithm and, as in point forecasting, the search space of the weight for each individual model is [-2, 2] [2, 40, 41]. The objective function \(\overline{\overline{OF}}_{IF}\) of the EO algorithm for interval forecasting is constructed as shown in (2).
$$\begin{aligned} \overline{\overline{OF}}_{IF} = \lambda _1|\textbf{WS}|+ \lambda _2 \textbf{FINAW}-\lambda _3 \textbf{FICP}, \end{aligned}$$ (2)

$$\begin{aligned} {\textbf{WS}=\frac{1}{N} \sum _{i=1}^{N} S^{\alpha }_{i}}, \ \ S^{\alpha }_{i}=\left\{ \begin{aligned}-2 \alpha \vartheta _{i}-4\left[ L_{i}-S^{act}_{i}\right] ,&\text{ if } S^{act}_{i} < L_{i} \\ -2 \alpha \vartheta _{i},&\text{ if } S^{act}_{i} \in \left[ L_{i}, U_{i}\right] \\ -2 \alpha \vartheta _{i}-4\left[ S^{act}_{i}-U_{i}\right] ,&\text{ if } S^{act}_{i} > U_{i} \end{aligned}\right. , \end{aligned}$$ (3)

$$\begin{aligned} {\textbf{FINAW}}= \frac{1}{NR}\sum _{i=1}^{N}\left( U_{i}-L_{i}\right) \times 100 \% , \end{aligned}$$ (4)

$$\begin{aligned} {\textbf{FICP}}=\frac{1}{N} \sum _{i=1}^{N} C_{i} \times 100 \% ,\ \ C_{i}= {\left\{ \begin{array}{ll}1, &{} \text{ if } S^{act}_{i} \in \left[ L_{i}, U_{i}\right] \\ 0, &{} \text{ if } S^{act}_{i} \notin \left[ L_{i}, U_{i}\right] \end{array}\right. }. \end{aligned}$$ (5)

where \(\vartheta _{i}=U_{i}-L_{i}\), \(\alpha \) is the nominal confidence level, \(L_{i}\) and \(U_{i}\) are the lower and upper bounds of the i-th forecast interval, and R is the range of the forecast values. The coefficients \(\lambda _i > 0\ (i=1,2,3)\) weight the sub-objective functions and are obtained by an empirical trial-and-error method in the experiments. The objective function is designed to cover several interval forecasting performance metrics, \(\textbf{WS}\), \({\textbf{FINAW}}\) and \({\textbf{FICP}}\), and the stability and generalization ability of the forecasting system are enhanced by the combined forecasting strategy; therefore, the superiority of the proposed system is maintained in multi-step forecasting.
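For concreteness, the interval metrics of Eqs. (3)-(5) and the composite objective of Eq. (2) can be sketched as follows; the toy interval data and the use of the actual series' range for R are illustrative assumptions:

```python
import numpy as np

def interval_metrics(actual, lower, upper, alpha=0.05):
    """Winkler score (WS), normalized average width (FINAW) and coverage
    probability (FICP) of a forecast interval, per Eqs. (3)-(5)."""
    actual = np.asarray(actual, float)
    lower = np.asarray(lower, float)
    upper = np.asarray(upper, float)
    width = upper - lower                       # vartheta_i in Eq. (3)
    score = -2.0 * alpha * width                # base term of S_i^alpha
    below = actual < lower
    above = actual > upper
    score = score - 4.0 * np.where(below, lower - actual, 0.0)
    score = score - 4.0 * np.where(above, actual - upper, 0.0)
    ws = score.mean()
    R = actual.max() - actual.min()             # normalization range (an assumption)
    finaw = width.mean() / R * 100.0
    ficp = np.mean(~below & ~above) * 100.0
    return ws, finaw, ficp

def of_if(actual, lower, upper, lam=(1.0, 1.0, 1.0), alpha=0.05):
    """Composite objective OF_IF of Eq. (2) with trial-and-error weights lam."""
    ws, finaw, ficp = interval_metrics(actual, lower, upper, alpha)
    return lam[0] * abs(ws) + lam[1] * finaw - lam[2] * ficp

# Toy example: the third interval misses the actual value 30.
actual = np.array([10.0, 20.0, 30.0, 40.0])
lower = np.array([8.0, 18.0, 32.0, 38.0])
upper = np.array([12.0, 22.0, 36.0, 42.0])
ws, finaw, ficp = interval_metrics(actual, lower, upper)
```

Minimizing `of_if` favors narrow intervals (small FINAW, small |WS| penalty) with high coverage (large FICP), matching the signs in Eq. (2).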
Stage 4: Short-term electricity load forecasting
After the above stages, the combined forecasting system is developed to obtain the optimal STLF results. To evaluate the efficiency of the proposed system, multi-step forecasting is performed, and several metrics are adopted to analyze the significance and effectiveness of the load forecasting results. The construction of the specific data features, the optimization strategy of the proposed system, and the rolling forecasting mechanism are visualized in Fig. 2. For multi-step forecasting, the forecast origin and forecast horizon are defined as the time index t and a positive integer p, respectively. Given the data series \(S_t\) at time index t, the target of the forecast is \(S_{t+p}\), where \(p \ge 1\). Let \(\hat{S}_{t+p}\) be the forecast estimate of \(S_{t+p}\); then \(\hat{S}_{t+p}\) is referred to as the p-step-ahead forecast of the series at forecast origin t [42, 43]. If p=1, \(\hat{S}_{t+1}\) is the one-step-ahead forecast at forecast origin t. In this study, the observed data are half-hourly load data; therefore, one-step, two-step, and three-step-ahead forecasting correspond to half-hour, one-hour, and one-and-a-half-hour-ahead forecasting, respectively.
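The rolling forecasting mechanism can be sketched as the construction of (input window, p-step-ahead target) pairs; the lookback length below is a hypothetical choice, since the exact feature construction is defined in Fig. 2:

```python
import numpy as np

def rolling_windows(series, lookback, horizon):
    """Build (input, target) pairs for p-step-ahead rolling forecasting:
    the window (S_{t-lookback+1}, ..., S_t) predicts S_{t+horizon}."""
    series = np.asarray(series, float)
    X, y = [], []
    for t in range(lookback - 1, len(series) - horizon):
        X.append(series[t - lookback + 1 : t + 1])
        y.append(series[t + horizon])
    return np.array(X), np.array(y)

# With half-hourly load data, horizon=1 is a half-hour-ahead (one-step)
# forecast and horizon=3 is a 1.5-hour-ahead (three-step) forecast.
series = np.arange(10.0)
X1, y1 = rolling_windows(series, lookback=4, horizon=1)
X3, y3 = rolling_windows(series, lookback=4, horizon=3)
```

Each row of `X1` is one rolling window; as the origin t advances by one, the window slides by one sample.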
2.2 Data preprocessing module
The raw electricity load series has strongly stochastic and noisy characteristics. Therefore, it is crucial to perform uncertainty analysis on the raw load series. Information granulation aggregates data with similar content, similar relations, or indistinguishable values. The concept of fuzzy information granulation [44] provides an effective data preprocessing strategy for uncertainty analysis by extracting the maximum and minimum correlation of the raw load series. The general description of fuzzy information granulation is given in (6).
where \(\overline{\varvec{\alpha }}\) is an element in \(\overline{\overline{\varvec{\mathcal {X}}}}\), \(\overline{\overline{\varvec{\mathcal {G}}}}\) denotes fuzzy information set, and \(\overline{\varvec{\textbf{P}}}\) is the probability that \(\overline{\varvec{\alpha }}\) belongs to \(\overline{\overline{\varvec{\mathcal {X}}}}\). In this paper, a sliding fuzzy granulation strategy is utilized to preprocess the electricity load raw series. The SFG mainly consists of two steps.
Step 1: A subsequence \(\overline{\overline{\textbf{S}}}_{sub}\) is obtained by sliding partition starting from the first element \(\overline{\varvec{s}}_1\) of the raw series \(\overline{\overline{\textbf{S}}}=(\overline{\varvec{s}}_0, \overline{\varvec{s}}_1, \cdots ,\overline{\varvec{s}}_t, \cdots , \overline{\varvec{s}}_T)\) with a certain size of granulation window. The sliding step is one, and each subsequence is considered as a granulation window.
Step 2: For each subsequence, the essential granulation information is extracted using a triangular membership function [45] to build fuzzy information particles, as shown in (7).
where \(\overline{\overline{\textbf{S}}}_{sub}^{med}\) denotes the median of each subsequence, \(\overline{\overline{\textbf{S}}}_{sub}^{low}\) is the lower bound and \(\overline{\overline{\textbf{S}}}_{sub}^{up}\) is the upper bound for the data change in each sliding granulation window. Finally, the upper bound, lower bound and median series of the raw load data are obtained by the SFG processing to prepare for STLF.
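A minimal sketch of the SFG procedure follows. For simplicity it takes the lower and upper bounds of each granule as the window minimum and maximum; the paper's triangular-membership construction of Eq. (7) may compute these bounds differently:

```python
import numpy as np

def sliding_fuzzy_granulation(series, window):
    """Minimal SFG sketch: slide a window of the given size with step 1 and,
    for each subsequence, emit a fuzzy granule (low, med, up).
    Here low/up are the window min/max, a simplification of the
    triangular membership function used in the paper."""
    series = np.asarray(series, float)
    low, med, up = [], [], []
    for t in range(len(series) - window + 1):
        sub = series[t : t + window]
        low.append(sub.min())
        med.append(np.median(sub))
        up.append(sub.max())
    return np.array(low), np.array(med), np.array(up)

# Toy noisy load series granulated with a window of 3.
noisy = np.array([5.0, 7.0, 6.0, 9.0, 8.0, 10.0])
low, med, up = sliding_fuzzy_granulation(noisy, window=3)
```

The `med` series then serves as the smoothed feature for point forecasting, while `low` and `up` feed the interval forecasting stage.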
2.3 Deep neural network module
Temporal convolutional network (TCN) [32] is a neural network model that excels at capturing temporal dependencies and local information, and can flexibly adjust the size of its receptive field. The visualization of the TCN is shown in Fig. 3. The specific structure is as follows.
A. Causal convolution: The causal convolution is the key network structure in the TCN. For a one-dimensional time series input \(\overline{\overline{\textbf{S}}}=(\overline{\varvec{s}}_0, \overline{\varvec{s}}_1, \cdots ,\overline{\varvec{s}}_t, \cdots , \overline{\varvec{s}}_T)\), the output \(\overline{\overline{\textbf{O}}}_t\) at time t depends only on the input at the current time \(\overline{\varvec{s}}_t\) and part of the past, such as \((\overline{\varvec{s}}_{t-1},\overline{\varvec{s}}_{t-2},\overline{\varvec{s}}_{t-3})\), without depending on any future inputs such as \((\overline{\varvec{s}}_{t+1},\overline{\varvec{s}}_{t+2},\overline{\varvec{s}}_{t+3},\) \(\cdots ,\overline{\varvec{s}}_T)\).
B. Dilated convolution: For the input \(\overline{\overline{\textbf{S}}}\) and a filter \(\overline{\varvec{\xi }}: \{0, 1, 2,\ldots , n-1\}\), the dilated convolution operation \(\overline{\overline{{\varvec{\textbf{D}}}}}(\cdot )\) on the series element \(\overline{\varvec{s}}\) is defined as follows:
$$\begin{aligned} \overline{\overline{{\varvec{\textbf{D}}}}}(\overline{\varvec{s}})=\left( \overline{\overline{\textbf{S}}} *_{d} \overline{\varvec{\xi }}\right) (\overline{\varvec{s}})=\sum _{i=0}^{n-1} \overline{\varvec{\xi }}(i) \cdot \overline{\overline{\textbf{S}}}_{(\overline{\varvec{s}}-d \cdot i)} \end{aligned}$$ (8)

where n denotes the filter size, d denotes the dilation factor and (\(\overline{\varvec{s}}-d \cdot i\)) accounts for the direction of the past. In Fig. 3, the filter size is \(n=2\) and the dilation factors are \(d=[1,2,4]\). The bottom layer \(d=1\) indicates that every point of the input is sampled, and the middle layer \(d=2\) indicates that every second point is sampled as input.
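Eq. (8) can be sketched directly in Python; the averaging kernel and zero-padding of pre-series positions below are illustrative choices:

```python
import numpy as np

def dilated_causal_conv(x, kernel, d):
    """Dilated causal convolution of Eq. (8): the output at position s sums
    kernel[i] * x[s - d*i], i.e., it looks only into the past (causal)."""
    x = np.asarray(x, float)
    out = np.zeros_like(x)
    for s in range(len(x)):
        for i, k in enumerate(kernel):
            j = s - d * i
            if j >= 0:          # positions before the series start are zero-padded
                out[s] += k * x[j]
    return out

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y1 = dilated_causal_conv(x, kernel=[1.0, 1.0], d=1)   # sums adjacent points
y2 = dilated_causal_conv(x, kernel=[1.0, 1.0], d=2)   # samples every second point
```

Stacking layers with dilation factors 1, 2, 4, as in Fig. 3, grows the receptive field exponentially while keeping each layer's kernel small.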
C. Residual blocks: A transformation operation \(\overline{\overline{\varvec{\Delta }}}(\cdot )\) is performed on the input \(\overline{\overline{\textbf{S}}}_{(h-1)}\) via a branch of the residual block. The output \(\overline{\overline{\textbf{S}}}_{(h)}\) of residual block h is given by:
$$\begin{aligned} \overline{\overline{\textbf{S}}}_{(h)}=\overline{\varvec{\delta }}\left( \overline{\overline{\varvec{\Delta }}}\left( \overline{\overline{\textbf{S}}}_{(h-1)}\right) +\overline{\overline{\textbf{S}}}_{(h-1)}\right) \end{aligned}$$ (9)

where \(\overline{\varvec{\delta }}(\cdot )\) is the activation function and \(\overline{\overline{\varvec{\Delta }}}(\cdot )\) denotes a sequence of transformation operations including the dilated causal convolution layer, WeightNorm, the activation layer (i.e., ReLU) and dropout.
Gated recurrent unit (GRU) [33] aims to solve the vanishing gradient problem of the standard recurrent neural network (RNN). The structure of a single GRU is visualized on the left in Fig. 4. Let \(\overline{\overline{\textbf{S}}}_t\) be the input vector, \(\overline{\overline{\textbf{H}}}_t\) the hidden state at timestep t, and \(\overline{\overline{\textbf{O}}}_t\) the output at timestep t. The update gate \(\overline{\overline{\textbf{U}}}_t\) controls how the state information \(\overline{\overline{\textbf{H}}}_{t-1}\) of the previous timestep \({t-1}\) and the new candidate state are balanced in the current state \(\overline{\overline{\textbf{H}}}_{t}\), and is computed by \( \overline{\overline{{\textbf{U}}}}_t=\overline{\varvec{\sigma }}\bigl (\overline{\overline{\textbf{W}}}_{\overline{\overline{{\textbf{U}}}}} \cdot \bigl [\overline{\overline{\textbf{H}}}_{t-1}, \overline{\overline{\textbf{S}}}_t\bigl ]\bigl ) \). The reset gate \(\overline{\overline{\varvec{\Gamma }}}_t\) controls how much of the state information \(\overline{\overline{\textbf{H}}}_{t-1}\) of the previous timestep \({t-1}\) is ignored, and is computed by \(\overline{\overline{\varvec{\Gamma }}}_t=\overline{\varvec{\sigma }}\bigl (\overline{\overline{\textbf{W}}}_{\overline{\overline{\varvec{\Gamma }}}} \cdot \bigl [\overline{\overline{\textbf{H}}}_{t-1}, \overline{\overline{\textbf{S}}}_t\bigl ]\bigl )\). A smaller value of \(\overline{\overline{\varvec{\Gamma }}}_t\) indicates that more information from \(\overline{\overline{\textbf{H}}}_{t-1}\) is ignored.
The current hidden state \(\overline{\overline{\textbf{H}}}_t\) is computed by \(\overline{\overline{\textbf{H}}}_t=\bigl (1-\overline{\overline{\textbf{U}}}_{t}\bigl ) \odot \overline{\overline{\textbf{H}}}_{t-1}+\overline{\overline{\textbf{U}}}_{t} \odot \widetilde{\overline{\textbf{H}}}_{t}\), where the candidate state is \(\widetilde{\overline{\textbf{H}}}_{t}= {\overline{\varvec{tanh}}} \bigl (\overline{\overline{\textbf{W}}}_{\widetilde{\overline{\textbf{H}}}} \cdot \bigl [\overline{\overline{\varvec{\Gamma }}}_{t} \odot \overline{\overline{\textbf{H}}}_{t-1}, \overline{\overline{\textbf{S}}}_t\bigl ]\bigl )\).
The output is \(\overline{\overline{\textbf{O}}}_t=\overline{\varvec{\sigma }}\bigl (\overline{\overline{\textbf{W}}}_{\overline{\overline{\textbf{O}}}} \cdot \overline{\overline{\textbf{H}}}_{t}\bigl )\), where \(\overline{\overline{\textbf{W}}}_{\overline{\overline{\varvec{\Gamma }}}}\), \(\overline{\overline{\textbf{W}}}_{\overline{\overline{\textbf{U}}}}\), \( \overline{\overline{\textbf{W}}}_{\widetilde{\overline{\textbf{H}}}}\), and \(\overline{\overline{\textbf{W}}}_{\overline{\overline{\textbf{O}}}}\) are learned weight matrices, \(\overline{\varvec{\sigma }}\) is the sigmoid function, \(\overline{\varvec{tanh}}\) is the tanh function, \([\cdot ]\) denotes the concatenation of two vectors, and \(\odot \) is the Hadamard product, i.e., element-wise matrix multiplication.
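The gate equations above can be sketched as a single GRU step; the random weights, zero initial state and omission of bias terms are simplifying assumptions for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(s_t, h_prev, W_u, W_r, W_h):
    """One GRU step following the equations above: update gate U_t, reset
    gate Gamma_t, candidate state, then the convex combination
    H_t = (1 - U_t) * H_{t-1} + U_t * H~_t. Bias terms are omitted."""
    x = np.concatenate([h_prev, s_t])           # [H_{t-1}, S_t]
    u = sigmoid(W_u @ x)                        # update gate U_t
    r = sigmoid(W_r @ x)                        # reset gate Gamma_t
    cand = np.tanh(W_h @ np.concatenate([r * h_prev, s_t]))
    return (1.0 - u) * h_prev + u * cand

rng = np.random.default_rng(0)
hidden, inp = 3, 2
W_u = rng.standard_normal((hidden, hidden + inp))
W_r = rng.standard_normal((hidden, hidden + inp))
W_h = rng.standard_normal((hidden, hidden + inp))
h = gru_cell(rng.standard_normal(inp), np.zeros(hidden), W_u, W_r, W_h)
```

Because the candidate state is bounded by tanh and the gates lie in (0, 1), the hidden state stays bounded, which is part of what mitigates the exploding/vanishing gradient behavior of a plain RNN.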
N-BEATS [35] is an interpretable, widely applicable, and fast-trainable neural network architecture for univariate point forecasting problems. The architecture consists of a deep neural structure based on backward and forward residual links and a very deep stack of fully connected layers, and is also depicted in Fig. 4.
A. Basic block: Block \(\ell \) receives an input vector \(\overline{\overline{\textbf{S}}}_{\ell }\) and outputs two vectors \(\hat{\overline{\textbf{S}}}_{\ell }\) and \(\hat{\overline{\textbf{O}}}_{\ell }\). The output \(\hat{\overline{\textbf{O}}}_{\ell }\) is the block's forward forecast, while \(\hat{\overline{\textbf{S}}}_{\ell }\) is the block's best estimate of the input \(\overline{\overline{\textbf{S}}}_{\ell }\), also named the "backcast". The interior of the basic building block includes a fully connected network and two basis layers. The fully connected network yields two sets of expansion coefficients, the backward \(\overline{\overline{\varvec{{\beta }}}}^{back}_{\ell }\) and the forward \(\overline{\overline{\varvec{{\beta }}}}^{for}_{\ell }\), described by the following equations.
$$\begin{aligned} \begin{aligned} \overline{\overline{\varvec{{\beta }}}}^{back}_{\ell }=\overline{\overline{{\textbf{LP}}}}_{\ell }^{back}\left( \overline{\overline{\textbf{H}}}_{\ell , 4}\right) , \overline{\overline{\varvec{{\beta }}}}^{for}_{\ell }=\overline{\overline{{\textbf{LP}}}}_{\ell }^{for}\left( \overline{\overline{\textbf{H}}}_{\ell , 4}\right) . \end{aligned} \end{aligned}$$ (10)

where the \(\overline{\overline{{\textbf{LP}}}}\) layer is a linear projection layer, i.e., \(\overline{\overline{\varvec{{\beta }}}}^{for}_{\ell }=\overline{\overline{\textbf{W}}}_{\ell }^{for} \overline{\overline{\textbf{H}}}_{\ell , 4}\). The operation of the \(\ell \)-th block is \(\overline{\overline{\textbf{H}}}_{\ell , 1}=\overline{\overline{{\textbf{FC}}}}_{\ell , 1}\bigl (\overline{\overline{\textbf{S}}}_{\ell }\bigl )\), \(\overline{\overline{\textbf{H}}}_{\ell , 2}=\overline{\overline{{\textbf{FC}}}}_{\ell , 2}\bigl (\overline{\overline{\textbf{H}}}_{\ell , 1}\bigl )\), \(\overline{\overline{\textbf{H}}}_{\ell , 3}=\overline{\overline{{\textbf{FC}}}}_{\ell , 3}\bigl (\overline{\overline{\textbf{H}}}_{\ell , 2}\bigl )\), and \(\overline{\overline{\textbf{H}}}_{\ell , 4}=\overline{\overline{{\textbf{FC}}}}_{\ell , 4}\bigl (\overline{\overline{\textbf{H}}}_{\ell , 3}\bigl )\). The \(\overline{\overline{{\textbf{FC}}}}\) layer is a standard fully connected layer.
The basis layers include the backward \(\overline{\overline{\varvec{{\varphi }}}}^{back}_{\ell }\) layer and the forward \(\overline{\overline{\varvec{{\varphi }}}}^{for}_{\ell }\) layer, which receive the backward \(\overline{\overline{\varvec{{\beta }}}}^{back}_{\ell }\) and forward \(\overline{\overline{\varvec{{\beta }}}}^{for}_{\ell }\) expansion coefficients, respectively, and output the backcast \(\hat{\overline{\textbf{S}}}_{\ell }=\overline{\overline{\varvec{{\varphi }}}}^{back}_{\ell }\bigl (\overline{\overline{\varvec{{\beta }}}}^{back}_{\ell }\bigl )\) and the forecast \(\hat{\overline{\textbf{O}}}_{\ell }=\overline{\overline{\varvec{{\varphi }}}}^{for}_{\ell }\bigl (\overline{\overline{\varvec{{\beta }}}}^{for}_{\ell }\bigl )\). The operation is given by the following equations:
$$\begin{aligned} \hat{\overline{\textbf{S}}}_{\ell }=\sum _{i=1}^{{\text {dim}}\bigl (\overline{\overline{\varvec{{\beta }}}}^{back}_{\ell }\bigl )} \overline{\overline{\varvec{{\beta }}}}^{back}_{\ell , i} \overline{{\varvec{{\upsilon }}}}_{i}^{back}, \quad \hat{\overline{\textbf{O}}}_{\ell }=\sum _{i=1}^{{\text {dim}}\bigl (\overline{\overline{\varvec{{\beta }}}}^{for}_{\ell }\bigl )} \overline{\overline{\varvec{{\beta }}}}^{for}_{\ell , i} \overline{{\varvec{{\upsilon }}}}_{i}^{for}, \end{aligned}$$(11)where \(\overline{{\varvec{{\upsilon }}}}_{i}^{back}\) and \(\overline{{\varvec{{\upsilon }}}}_{i}^{for}\) denote the backcast and forecast basis vectors, and \(\overline{\overline{\varvec{{\beta }}}}^{back}_{\ell , i}\) and \(\overline{\overline{\varvec{{\beta }}}}^{for}_{\ell , i}\) are the i-th elements of \(\overline{\overline{\varvec{{\beta }}}}^{back}_{\ell }\) and \(\overline{\overline{\varvec{{\beta }}}}^{for}_{\ell }\), respectively.
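As a concrete illustration, the basic block above can be sketched in NumPy. The layer widths, random weights, and the ReLU activation below are illustrative assumptions, not the paper's configuration:

```python
import numpy as np

rng = np.random.default_rng(0)

def basic_block(s, fc_params, w_back, w_for, v_back, v_for):
    """One basic block: a stack of fully connected layers produces the
    expansion coefficients beta_back / beta_for, which the basis layers
    expand into the backcast and the forecast."""
    h = s
    for w, b in fc_params:              # H_1..H_4 = FC_1..FC_4(...)
        h = np.maximum(w @ h + b, 0.0)  # FC layer with ReLU
    beta_back = w_back @ h              # linear projection LP^back
    beta_for = w_for @ h                # linear projection LP^for
    backcast = v_back.T @ beta_back     # sum_i beta_back_i * v_i^back
    forecast = v_for.T @ beta_for       # sum_i beta_for_i * v_i^for
    return backcast, forecast

# Toy dimensions: input length 8, hidden width 16, 4 expansion
# coefficients, forecast horizon 3.
fc_params = [(rng.standard_normal((16, 16 if i else 8)) * 0.1, np.zeros(16))
             for i in range(4)]
w_back, w_for = rng.standard_normal((4, 16)), rng.standard_normal((4, 16))
v_back = rng.standard_normal((4, 8))   # backcast basis vectors
v_for = rng.standard_normal((4, 3))    # forecast basis vectors

back, fore = basic_block(rng.standard_normal(8), fc_params, w_back, w_for,
                         v_back, v_for)
```

The backcast has the same length as the input window, while the forecast has the length of the forecast horizon.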
Doubly Residual Stacking The architecture consists of two residual branches, one running over the backcast predictions and the other over the forecast predictions of each layer, as shown in the following equations:
$$\begin{aligned} \overline{\overline{\textbf{S}}}_{\ell }=\overline{\overline{\textbf{S}}}_{\ell -1}-\hat{\overline{\textbf{S}}}_{\ell -1}, \quad \hat{\overline{\textbf{O}}}=\sum _{\ell } \hat{\overline{\textbf{O}}}_{\ell }. \end{aligned}$$(12)
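A minimal sketch of this double residual recursion, using trivial stand-in blocks (a real model would use the basic blocks described above):

```python
import numpy as np

def doubly_residual_forecast(blocks, s0):
    """Run blocks sequentially: each block receives the residual of the
    previous backcast (first part of Eq. 12), and the model forecast is
    the sum of all block forecasts (second part of Eq. 12)."""
    s, total = s0, 0.0
    for block in blocks:
        backcast, forecast = block(s)
        s = s - backcast          # S_l = S_{l-1} - backcast_{l-1}
        total = total + forecast  # O_hat = sum_l O_hat_l
    return total

# Stand-in block: backcasts half of its input, forecasts the input mean.
toy_block = lambda s: (0.5 * s, np.array([s.mean()]))
out = doubly_residual_forecast([toy_block, toy_block], np.array([2.0, 4.0]))
# First block sees [2, 4] (mean 3), second sees the residual [1, 2]
# (mean 1.5), so the stacked forecast is [4.5].
```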
Convolutional neural networks (CNNs) are widely used in various fields [34, 46]. A simple CNN mainly consists of input layers, convolutional layers, pooling layers and fully connected layers. According to the application scenario, CNNs are categorized into one-dimensional (Conv1D), two-dimensional (Conv2D) and three-dimensional (Conv3D) convolution models. In this paper, we utilize one-dimensional convolution. For Conv1D, the input \(\overline{\overline{\textbf{S}}}=(\overline{\varvec{s}}_0, \overline{\varvec{s}}_1, \cdots ,\overline{\varvec{s}}_t, \cdots , \overline{\varvec{s}}_T)\), output \(\overline{\overline{\textbf{O}}}\) and convolution kernel \(\overline{\overline{\textbf{K}}}\) are usually vectors, as expressed in the following (13).
$$\begin{aligned} \overline{\overline{\textbf{O}}}=\overline{\overline{\textbf{S}}} * \overline{\overline{\textbf{K}}}, \end{aligned}$$(13)
where \(*\) denotes the convolution operation. The left of Fig. 5 illustrates the process of one-dimensional convolution. At each position, the kernel weights are multiplied with the corresponding entries of the input vector and summed, yielding the convolution result for the input data aligned with the center of the kernel.
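The sliding weighted sum can be written directly. This sketch uses the cross-correlation form common in deep learning libraries, with "valid" padding:

```python
import numpy as np

def conv1d(s, k):
    """'Valid' 1-D convolution: at each position, the kernel is
    multiplied elementwise with an input window and summed."""
    m = len(k)
    return np.array([np.dot(s[i:i + m], k) for i in range(len(s) - m + 1)])

out = conv1d(np.array([1.0, 2.0, 3.0, 4.0]), np.array([1.0, 0.0, -1.0]))
# Windows [1,2,3] and [2,3,4] give 1-3 = -2 and 2-4 = -2.
```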
Multi-Layer Perceptron (MLP) is a feed-forward neural network (FNN) with one hidden layer and is shown on the right in Fig. 5. There are only one-way, one-directional connections between the neurons of an FNN, and the neurons are arranged in different layers. Generally, the first layer is the input layer, the last layer is the output layer, and the layers between them are hidden layers. For the one-hidden-layer MLP with h hidden units, the output of the hidden layer is denoted by \(\overline{\overline{\textbf{H}}}\). Formally, the output \(\overline{\overline{\textbf{O}}}\) is calculated according to the following equation,
$$\begin{aligned} \overline{\overline{\textbf{H}}}=\overline{\varvec{\sigma }}\bigl (\overline{\overline{\textbf{W}}}_{\ell _1} \overline{\overline{\textbf{S}}}+\overline{\overline{\textbf{b}}}_{\ell _1}\bigl ), \quad \overline{\overline{\textbf{O}}}=\overline{\overline{\textbf{W}}}_{\ell _2} \overline{\overline{\textbf{H}}}+\overline{\overline{\textbf{b}}}_{\ell _2}, \end{aligned}$$(14)
where \(\overline{\overline{\textbf{W}}}_{\ell _1}, \overline{\overline{\textbf{b}}}_{\ell _1}\) denote the hidden-layer weights and biases, respectively. Let \(\overline{\overline{\textbf{W}}}_{\ell _2}, \overline{\overline{\textbf{b}}}_{\ell _2}\) be the output-layer weights and biases, respectively, and \(\overline{\varvec{\sigma }}\) denotes the activation function.
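A minimal NumPy sketch of this one-hidden-layer forward pass (the ReLU choice of \(\overline{\varvec{\sigma }}\) is an illustrative assumption):

```python
import numpy as np

def mlp_forward(s, w1, b1, w2, b2):
    """H = sigma(W1 S + b1); O = W2 H + b2."""
    h = np.maximum(w1 @ s + b1, 0.0)  # hidden-layer output H (ReLU sigma)
    return w2 @ h + b2                # output O

w1, b1 = np.array([[1.0, 0.0], [0.0, 1.0]]), np.zeros(2)
w2, b2 = np.array([[1.0, 1.0]]), np.zeros(1)
out = mlp_forward(np.array([2.0, -3.0]), w1, b1, w2, b2)
# Hidden layer: ReLU([2, -3]) = [2, 0]; output: [2.0].
```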
2.4 Equilibrium optimizer
Equilibrium optimizer (EO) [47] is a meta-heuristic algorithm inspired by the mass balance equation in physics, which simulates the phenomenon of dynamic mass equilibrium within a controlled volume.
EO propagates individuals (particles) in the search space of the optimization problem and initiates the optimization process by the following (15).
$$\begin{aligned} \overline{\overline{\varvec{h}}}^{initial}_i=\overline{\overline{\varvec{h}}}_{min }+\overline{\overline{\varvec{\textbf{R}}}}_i \bigl (\overline{\overline{\varvec{h}}}_{max }-\overline{\overline{\varvec{h}}}_{min }\bigl ), \quad i=1, 2, \cdots , N, \end{aligned}$$(15)where \(\overline{\overline{\varvec{h}}}^{initial}_i\) denotes the initial position vector of the i-th individual, and \(\overline{\overline{\varvec{h}}}_{min }\) and \(\overline{\overline{\varvec{h}}}_{max }\) denote the lower and upper bound vectors, respectively. Let N be the population size, and \(\overline{\overline{\varvec{\textbf{R}}}}_i\) denotes a random vector uniformly generated in [0, 1].
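This initialization is straightforward to sketch (the bounds and population size below are arbitrary examples):

```python
import numpy as np

rng = np.random.default_rng(42)

def initialize_population(n, h_min, h_max):
    """h_i = h_min + R_i * (h_max - h_min), with one uniform random
    vector R_i in [0, 1) drawn independently per particle."""
    r = rng.random((n, len(h_min)))
    return h_min + r * (h_max - h_min)

pop = initialize_population(5, np.array([-1.0, 0.0]), np.array([1.0, 10.0]))
```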
The first four particles facilitate EO to enhance exploration abilities, and the average particle enables EO to improve exploitation abilities. EO stores the five equilibrium candidates in a vector, the equilibrium pool, represented as follows:
$$\begin{aligned} \overline{\overline{\varvec{\textbf{H}}}}^{equ}_{\textrm{pool}}=\bigl \{\overline{\overline{\varvec{h}}}^{equ}_1, \overline{\overline{\varvec{h}}}^{equ}_2, \overline{\overline{\varvec{h}}}^{equ}_3, \overline{\overline{\varvec{h}}}^{equ}_4, \overline{\overline{\varvec{h}}}^{equ}_{\textrm{avg}}\bigl \}, \end{aligned}$$(16)where \(\overline{\overline{\varvec{h}}}^{equ}_i \ (i=1, 2, 3, 4)\) are the four best solutions found so far at the current iteration. Let \(\overline{\overline{\varvec{h}}}^{equ}_{\textrm{avg}}\) be the average of the above four solutions. The exponential term \(\overline{\overline{{\textbf{E}}}}\) provides a good trade-off between global exploration and local exploitation capabilities, and is computed by the formula (17):
$$\begin{aligned} \overline{\overline{{\textbf{E}}}}=\overline{\overline{\varvec{\alpha }}}_{1} {\mathbf {{sign}}}\bigl ({\overline{\overline{{\textbf{R}}}}_i^{exp}}-0.5\bigl ) \bigl [\textbf{exp}^{-\overline{\overline{{\textbf{R}}}}_i^{exp} \overline{\overline{{\textbf{T}}}}}-1\bigl ], \quad \overline{\overline{{\textbf{T}}}}=\bigl (1-\nicefrac {\varvec{Iter}}{\varvec{Iter}_{\textrm{max}}}\bigl )^{\bigl (\overline{\overline{\varvec{\alpha }}}_{2} \nicefrac {\varvec{Iter}}{\varvec{Iter}_{\textrm{max}}}\bigl )}, \end{aligned}$$(17)
where \(\overline{\overline{{\textbf{R}}}}_i^{exp}\) is a randomly generated vector between 0 and 1. Let \(\varvec{Iter}_{\textrm{max}}\) denote the maximum number of iterations and \(\varvec{Iter}\) the current iteration. \(\overline{\overline{{\textbf{T}}}}\) decreases as the number of iterations \(\varvec{Iter}\) increases, and \(\overline{\overline{{\textbf{T}}}}_0=\nicefrac {1}{\overline{\overline{{\textbf{R}}}}_i^{exp}} \ln \bigl (-\overline{\overline{\varvec{\alpha }}}_{1} {\mathbf {{sign}}}\bigl ({\overline{\overline{{\textbf{R}}}}_i^{exp}}-0.5\bigl )\bigl [1-\textbf{exp}^{-\overline{\overline{{\textbf{R}}}}_i^{exp} \overline{\overline{{\textbf{T}}}}}\bigl ]\bigl )+\overline{\overline{{\textbf{T}}}} \). Here, \(\overline{\overline{\varvec{\alpha }}}_1\) and \(\overline{\overline{\varvec{\alpha }}}_2\) are the weights for global exploration and local exploitation, respectively; the larger the value, the stronger the corresponding capability. Generally, \(\overline{\overline{\varvec{\alpha }}}_1=2\) and \(\overline{\overline{\varvec{\alpha }}}_2=1\). The term \({\mathbf {{sign}}}\bigl ({\overline{\overline{{\textbf{R}}}}_i^{exp}}-0.5\bigl )\) controls the direction of exploration and exploitation.
The generation rate \(\overline{\overline{\textbf{G}}}\) is an important term in the mass balance equation that further improves the local exploitation capacity, and is computed by the following (18).
$$\begin{aligned} \overline{\overline{\textbf{G}}}=\overline{\overline{\textbf{G}}}_{0} \overline{\overline{{\textbf{E}}}}, \quad \overline{\overline{\textbf{G}}}_{0}=\overline{\overline{\textbf{C}}}^{gen} \bigl (\overline{\overline{\varvec{h}}}^{equ}_{\textrm{ran}}-\overline{\overline{{\textbf{R}}}}_i^{exp} \overline{\overline{\varvec{h}}}_i\bigl ), \end{aligned}$$(18)where \(\overline{\overline{\varvec{h}}}^{equ}_{\textrm{ran}}\) is a vector randomly selected from the equilibrium pool \(\overline{\overline{\varvec{\textbf{H}}}}^{equ}_{\textrm{pool}}\), and \(\overline{\overline{\textbf{C}}}^{gen}\) is the generation rate control parameter, defined as follows:
$$\begin{aligned} \overline{\overline{\textbf{C}}}^{gen}=\left\{ \begin{array}{ll} 0.5 \overline{\overline{{\textbf{R}}}}_{1}^{num}, & \overline{\overline{{\textbf{R}}}}_{2}^{num} \ge \overline{\overline{\varvec{\Gamma }}}^{gen}_{\textrm{pro}}, \\ 0, & \overline{\overline{{\textbf{R}}}}_{2}^{num} < \overline{\overline{\varvec{\Gamma }}}^{gen}_{\textrm{pro}}, \end{array}\right. \end{aligned}$$(19)
and \(\overline{\overline{{\textbf{R}}}}_{1}^{num}, \overline{\overline{{\textbf{R}}}}_{2}^{num}\) are random numbers between 0 and 1, and \(\overline{\overline{\varvec{\Gamma }}}^{gen}_{\textrm{pro}}\) is the generation probability. When \(\overline{\overline{\varvec{\Gamma }}}^{gen}_{\textrm{pro}}=0.5\), the exploration and exploitation capabilities are balanced. After the abstraction and improvement of the original physical theory, the particle positions in EO are updated according to the following (20), where \(\overline{\overline{\varvec{\textbf{U}}}}=1\) is treated as a unit vector.
$$\begin{aligned} \overline{\overline{\varvec{h}}}_i=\overline{\overline{\varvec{h}}}^{equ}_{\textrm{ran}}+\bigl (\overline{\overline{\varvec{h}}}_i-\overline{\overline{\varvec{h}}}^{equ}_{\textrm{ran}}\bigl ) \overline{\overline{{\textbf{E}}}}+\frac{\overline{\overline{\textbf{G}}}}{\overline{\overline{{\textbf{R}}}}_i^{exp} \overline{\overline{\varvec{\textbf{U}}}}} \bigl (1-\overline{\overline{{\textbf{E}}}}\bigl ). \end{aligned}$$(20)
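Putting the pieces together, a simplified EO loop might look as follows. This is a compact sketch (box constraints handled by clipping, global best memorized explicitly, a small epsilon guarding one division), not the exact implementation used in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

def eo_minimize(f, h_min, h_max, n=30, iters=200, a1=2.0, a2=1.0, gp=0.5):
    """Simplified Equilibrium Optimizer: particles move toward members
    of the equilibrium pool (the four best particles plus their average),
    with the exponential term E and generation rate G shaping the step."""
    dim = len(h_min)
    pop = h_min + rng.random((n, dim)) * (h_max - h_min)
    best_h, best_f = None, np.inf
    for it in range(iters):
        fit = np.array([f(p) for p in pop])
        if fit.min() < best_f:                          # memorize best
            best_f, best_h = fit.min(), pop[fit.argmin()].copy()
        best4 = pop[np.argsort(fit)[:4]]
        pool = np.vstack([best4, best4.mean(axis=0)])   # equilibrium pool
        t = (1.0 - it / iters) ** (a2 * it / iters)     # time parameter T
        for i in range(n):
            h_eq = pool[rng.integers(len(pool))]        # random pool member
            lam, r = rng.random(dim), rng.random(dim)
            e = a1 * np.sign(r - 0.5) * (np.exp(-lam * t) - 1.0)  # term E
            gcp = 0.5 * rng.random() if rng.random() >= gp else 0.0
            g = gcp * (h_eq - lam * pop[i]) * e         # generation rate G
            pop[i] = h_eq + (pop[i] - h_eq) * e + g / (lam + 1e-12) * (1.0 - e)
            pop[i] = np.clip(pop[i], h_min, h_max)      # box constraints
    return best_h, best_f
```

On a simple sphere function this sketch converges toward the origin; the paper instead minimizes a weighted forecasting objective over the combination weights.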
The procedure for optimizing the combination weights using EO is shown in Algorithm 1, and the flowchart of EO is shown in Fig. 6.
3 Experiments and analysis
In this section, several experiments are designed to demonstrate the effectiveness of the proposed technique, including two aspects of point forecasting and interval forecasting. The specific parameter settings for the experiments are listed in Table 2.
3.1 Data description
This study collects electricity load data from Queensland, Australia. The data are available from the Australian Energy Market Operator (AEMO).Footnote 1 The load data cover the complete four seasons of spring, summer, autumn and winter, from September 1, 2020 to August 31, 2021, and the interval between observations is half an hour. Therefore, we naturally divide the data into four datasets by season: in chronological order, every three months forms one dataset, and the statistical indicators for the four datasets are displayed in Table 3. A brief map of the four datasets is illustrated in Fig. 2. In this study, the first 60%, the middle 20% and the last 20% of each dataset are used as the training, validation and testing sets, respectively. In performing the forecasting task, the same rolling forecasting mechanism is utilized on the training, validation and testing sets.
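The chronological 60/20/20 split can be sketched as follows (the function name is ours):

```python
def chronological_split(series, train_frac=0.6, val_frac=0.2):
    """Split a series chronologically into training, validation and
    testing sets, preserving temporal order."""
    n = len(series)
    i = int(n * train_frac)
    j = i + int(n * val_frac)
    return series[:i], series[i:j], series[j:]

# One season at half-hourly resolution has roughly 90 * 48 observations.
train, val, test = chronological_split(list(range(90 * 48)))
```

A chronological split (rather than a random one) avoids leaking future load values into the training set.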
3.2 Evaluation criteria
In this paper, a comprehensive evaluation system consisting of nine metrics is established to evaluate the forecasting ability of the model in terms of both point forecasting and interval forecasting performance. The evaluation metrics are shown in Tables 4 and 5.

For point forecasting, the mean absolute percentage error (MAPE) is widely used to reflect the effectiveness and reliability of the model. The mean absolute error (MAE) and root mean squared error (RMSE) are generally used to evaluate the average magnitude of the error between the forecast and actual values. The smaller the values of MAPE, RMSE and MAE, the better the performance of the model. The coefficient of determination \({\mathbf {R^2}}\) denotes the level of fit of the model to the observed values, and takes a value between 0 and 1; the closer \({\mathbf {R^2}}\) is to 1, the better the model fits. The forecasting interval coverage probability (FICP) indicates the ability of the forecasting interval to capture the actual target value. The forecasting interval normalized average width (FINAW) evaluates the quality of the width of the forecasting interval. FINAW and FICP are conflicting in nature: pursuing a high FICP alone widens the interval and yields less useful information, while compressing the width of the forecasting interval may reduce the probability of the target point being covered by the interval. The coverage width-based criterion (CWC) and Winkler score (WS) are two metrics for the comprehensive evaluation of interval forecasting performance. A smaller CWC or a smaller absolute value of WS indicates a better quality of the forecasting interval.
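The point metrics and the two basic interval metrics can be computed directly. A sketch, in which normalizing FINAW by the target range is our assumption:

```python
import numpy as np

def point_metrics(y, yhat):
    """MAPE (%), RMSE, MAE and R^2 for point forecasts."""
    err = y - yhat
    mape = np.mean(np.abs(err / y)) * 100.0
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    r2 = 1.0 - np.sum(err ** 2) / np.sum((y - y.mean()) ** 2)
    return mape, rmse, mae, r2

def interval_metrics(y, lower, upper):
    """FICP (% of targets inside the interval) and FINAW (average
    interval width normalized by the target range)."""
    ficp = np.mean((y >= lower) & (y <= upper)) * 100.0
    finaw = np.mean(upper - lower) / (y.max() - y.min())
    return ficp, finaw
```

The FICP/FINAW tension is visible here: widening `upper - lower` raises coverage but inflates the normalized width.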
3.3 Analysis of point forecasting
In this section, we design two experiments to demonstrate the effectiveness of the proposed system for point forecasting. The median series of the raw data extracted by SFG is used as the features for point forecasting, thereby improving the forecasting performance of the proposed combined system.
3.3.1 Experiment I: Comparison with individual models based on raw data
This experiment compares the forecasting performance of individual models based on raw data with that of the proposed system. In the experiments, the sliding granulation window size (SGWS) is set to 2. The detailed experimental forecasting results are displayed in Table 6. For one-step forecasting, the proposed system exhibits the best performance on the spring dataset. The detailed metrics are \({\textbf{MAPE}^\mathbf {1-step-spring}_\textbf{lowest}}=0.7973\%\), \({\textbf{RMSE}^\mathbf {1-step-spring}_\textbf{lowest}}=69.6636\), \({\textbf{MAE}^\mathbf {1-step-spring}_\textbf{lowest}}=51.7418\) and \({\mathbf {R^2}^\mathbf {1-step-spring}_\textbf{highest}}=0.9942\). All four evaluation metrics are optimal on the spring dataset. Taking MAPE as an example, \({\textbf{MAPE}^\mathbf {1-step-spring}_\textbf{proposed}}=0.7973\%\) is significantly lower than the \({\textbf{MAPE}^\mathbf {1-step-spring}_\mathbf {TCN,GRU,N-BEATS,CNN,MLP,LSTM}}=[1.1419\%, 1.7057\%, 1.2302\%, 0.9840\%, 0.9102\%, 1.0017\%]\) of the six individual models. The situation for the other season datasets is similar to that of spring, and the performance of the proposed system is optimal among all models. With regard to two-step forecasting, all the evaluation metrics of the proposed system are optimal compared to the other models on the summer, autumn and winter datasets. For three-step forecasting, the metrics are significantly poorer than those of one-step and two-step forecasting. Nonetheless, the metrics of the proposed system still outperform those of the compared individual models, except for spring.
Taking autumn as an example, \({\textbf{MAPE}^\mathbf {3-step-autumn}_\textbf{proposed}}=3.4439\%\) is significantly lower than the \({\textbf{MAPE}^\mathbf {3-step-autumn}_\mathbf {TCN,GRU,N-BEATS,CNN,MLP,LSTM}}=[4.0549\%, 3.8979\%, 3.9822\%, 4.2987\%, 4.6658\%, 4.0521\%]\) of the six individual models.
3.3.2 Experiment II: Comparison with individual models processed by sliding fuzzy granulation
The purpose of this experiment is to verify the effectiveness of the proposed system compared to individual models processed by SFG. In addition, the effectiveness and rationality of pre-processing the data with the SFG method is demonstrated by comparing the performance of individual models processed by SFG with that of individual models based on raw data. The detailed results of the experiments are presented in Table 6. The visualization of three-step point forecasting for autumn is illustrated in Fig. 7.
The parameters are set as in Experiment I. For one-step forecasting, the performance of the proposed system is optimal in all four evaluation metrics on all four datasets. Taking spring as an example, the performance metrics of the proposed system are \({\textbf{MAPE}^\mathbf {1-step-spring}_\textbf{proposed}}= 0.7973\%\), \({\textbf{RMSE}^\mathbf {1-step-spring}_\textbf{proposed}}=69.6636\), \({\textbf{MAE}^\mathbf {1-step-spring}_\textbf{proposed}}= 51.7418\) and \({\mathbf {R^2}^\mathbf {1-step-spring}_\textbf{proposed}}=0.9942\), respectively. The proposed system outperforms SFG-MLP, the best-performing individual model, in all four metrics; the metrics of SFG-MLP are \({\textbf{MAPE}^\mathbf {1-step-spring}_\mathbf {SFG-MLP}}= 0.8675\%\), \({\textbf{RMSE}^\mathbf {1-step-spring}_\mathbf {SFG-MLP}}=73.2838\), \({\textbf{MAE}^\mathbf {1-step-spring}_\mathbf {SFG-MLP}} =55.8124\) and \({\mathbf {R^2}^\mathbf {1-step-spring}_\mathbf {SFG-MLP}}=0.9935\). For each individual model, whether based on SFG or raw data, no individual model performs best on all four datasets. The proposed system likewise outperforms the individual SFG-based models in two- and three-step forecasting, except for a slightly poorer performance on the spring dataset. For spring, the MAPE values of the individual models are \({\textbf{MAPE}^\mathbf {1-step-spring}_\mathbf {TCN, SFG-TCN}}= [1.1419\%, 0.8954\%]\), \({\textbf{MAPE}^\mathbf {1-step-spring}_\mathbf {GRU, SFG-GRU}} = [1.7057\%, 1.1120\%]\), \({\textbf{MAPE}^\mathbf {1-step-spring}_\mathbf {N-BEATS, SFG-N-BEATS}} = [1.2302\%, 1.1593\%]\), \({\textbf{MAPE}^\mathbf {1-step-spring}_\mathbf {CNN, SFG-CNN}} = [0.9840\%, 0.9152\%]\), \({\textbf{MAPE}^\mathbf {1-step-spring}_\mathbf {MLP, SFG-MLP}} = [0.9102\%, 0.8675\%]\) and \({\textbf{MAPE}^\mathbf {1-step-spring}_\mathbf {LSTM, SFG-LSTM}}= [1.2946\%, 1.0017\%]\).
The forecasting MAPE values of individual models based on SFG processing show different degrees of improvement over those based on raw data, thus validating the feasibility and effectiveness of point forecasting utilizing the median series extracted by the SFG method.
3.4 Analysis of interval forecasting
Interval forecasting is capable of providing more information to help managers correctly assess the efficiency and safety of electricity system operations and thus better control risks and costs. This experiment verifies the effectiveness of the interval forecasting method using SFG by comparing the performance of the individual models and the proposed system. In this experiment, the forecasting interval nominal confidence level (FINC) is set to 95%, and the SGWS is set to 3. The weight coefficients of the individual objectives in the objective function are set to \(\lambda _1=1, \lambda _2=150, \lambda _3=300\), respectively, by a trial-and-error method. Similar to point forecasting, we conduct multi-step forecasting experiments to verify the effectiveness of the proposed system. Five metrics are adopted to comprehensively evaluate the interval forecasting results. Table 7 exhibits the interval forecasting performance. The visualization of interval forecasting for summer is illustrated in Fig. 8.
- (I). For one-step forecasting, the proposed system has the highest FICP and ACE compared to the reference individual models on all datasets. For the comprehensive evaluation, the proposed system has the lowest CWC and absolute value of WS on all datasets, although there is a slight sacrifice of FINAW relative to the individual models. For instance, the performance evaluation values are \({\textbf{FICP}^\mathbf {1-step-spring}_\textbf{highest}}= 94.3613\%\), \({\textbf{FINAW}^\mathbf {1-step-spring}_\textbf{proposed}}= 0.1174\), \({\textbf{ACE}^\mathbf {1-step-spring}_\textbf{highest}}= -0.0064\), \({\textbf{CWC}^\mathbf {1-step-spring}_\textbf{lowest}}= 0.2699\) and \({\mathbf {|WS|}^\mathbf {1-step-spring}_\textbf{lowest}}= 56.8280\). Specifically, the proposed system's \({\textbf{CWC}^\mathbf {1-step-spring}_\textbf{proposed}}= 0.2699\) outperforms that of the best performing individual model, SFG-MLP, \({\textbf{CWC}^\mathbf {1-step-spring}_\mathbf {SFG-MLP}}= 48.4355\), by a factor of nearly 180. Further, for the comprehensive metric WS, taking spring as an example, \({\mathbf {|WS|}^\mathbf {1-step-spring}_\mathbf {proposed, SFG-individual}}= [\varvec{56.8280}, 82.4501, 157.1554, 120.0073, 71.4267, 72.7410, 74.3326]\); the absolute value of WS obtained by the proposed system is significantly lower than that of the other individual models.
- (II). Regarding two-step forecasting, the comprehensive performance of the proposed system still outperforms the individual models. For the spring, summer, autumn and winter datasets, the FICP and ACE values of the proposed system are always the highest, and the CWC and absolute values of WS are always the lowest. Further, we take autumn as an example for analysis. The FICP of the proposed system is \({\textbf{FICP}^\mathbf {2-step-autumn}_\textbf{highest}}= 92.2463\%\), while the FICP values of the individual models are less than 80%, \({\textbf{FICP}^\mathbf {2-step-autumn}_\mathbf { SFG-individual}}= [73.2041\%, 72.7480\%, 71.9498\%, 70.9236\%, 71.4937\%, 75.3986\%]\). Similarly, the WS of the proposed system significantly outperforms that of the individual models, specifically \({\mathbf {|WS|}^\mathbf {2-step-autumn}_\mathbf {proposed, SFG-individual}} = [\varvec{125.5619}, 207.7471, 227.3684, 210.5835, 215.6201, 217.8378, 168.1920]\). For the comprehensive metric CWC, the performance obtained by the individual models is quite poor, several orders of magnitude worse than the CWC value of the proposed system.
- (III). For three-step forecasting, although the interval forecasting performance is significantly poorer compared to that of one-step forecasting, the proposed system can still achieve good performance. For instance, the FICP value of the proposed system is greater than 90% on the autumn dataset, \({\textbf{FICP}^\mathbf {3-step-autumn}_\textbf{highest}}= 92.0091\%\), while for the individual models the FICP values are less than 70%, \({\textbf{FICP}^\mathbf {3-step-autumn}_\mathbf {SFG-individual}}= [62.5571\%, 55.4795\%, 61.5297\%, 48.6301\%, 54.9087\%, 69.2132\%]\). From the perspective of comprehensive metrics, the CWC of the proposed system is better than that of the individual models by several orders of magnitude: the CWC of the proposed system is less than 1.5, while the CWC of the individual models is greater than 8 \(\times \) \(10^{5}\). The absolute value of WS is also lower than that of the individual models, e.g. \({\mathbf {|WS|}^\mathbf {3-step-autumn}_\mathbf {proposed, SFG-individual}} = [\varvec{162.1054}, 455.9354, 356.8596, 450.9305, 392.9018, 279.2480]\). According to the detailed comparison of the experiments, the proposed combined strategy addresses the limitations and deficiencies of individual-model forecasting, with optimal interval estimates from one-step to three-step forecasting.
4 Discussion
In this section, the improvement ratio (IR), sensitivity analysis and Kolmogorov-Smirnov Predictive Accuracy (KSPA) test of the proposed system are discussed. Furthermore, we also discuss the effect of the sliding granulation window size on the forecasting performance.
4.1 Improvement ratio from the proposed system
4.1.1 Improvement ratio for point forecasting
In this section, the metric \(\varvec{IR_\textrm{MAPE}}\) measures the improvement ratio of the point forecasting MAPE of the proposed system. We define \(\varvec{IR_\textrm{MAPE}}\) as follows:
$$\begin{aligned} \varvec{IR_\textrm{MAPE}}=\frac{\varvec{\textrm{MAPE}_{com}}-\varvec{\textrm{MAPE}_{pro}}}{\varvec{\textrm{MAPE}_{com}}} \times 100\%, \end{aligned}$$where \(\varvec{\textrm{MAPE}_{com}}\) and \(\varvec{\textrm{MAPE}_{pro}}\) denote the MAPE values of the comparison models and the proposed system, respectively. The specific improvement percentages are displayed in Table 8. According to the calculation results, the following conclusions are drawn.
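This improvement ratio is a one-liner (the function name is ours):

```python
def improvement_ratio(metric_comparison, metric_proposed):
    """Percentage reduction of an error metric achieved by the
    proposed system relative to a comparison model."""
    return (metric_comparison - metric_proposed) / metric_comparison * 100.0

# A comparison model with MAPE 4.0% improved to 3.0% gives IR = 25.0%.
```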
- (I). Compared with the individual models based on raw data, the forecasting performance of the proposed system is significantly improved. For example, for spring the improvement ratios in one-step forecasting are \(\varvec{IR_\mathrm {MAPE-TCN}^\mathrm {1-step-spring}}=30.1001\%\) compared to TCN and \(\varvec{IR_\mathrm {MAPE-GRU}^\mathrm {1-step-spring}}=53.2587\%\) compared to GRU.
- (II). The proposed system also achieves improvements of varying magnitude compared to the individual models based on the SFG method. Taking autumn as an example, the improvement values are \(\varvec{IR_\mathrm {MAPE-SFG-CNN}^\mathrm {1-2-3-step-autumn}}=[17.0853\%, 10.7270\%, 14.5989\%]\) compared to the SFG-CNN model for one-, two- and three-step forecasting.
- (III). Averaging \(\varvec{IR}\) for each step, the proposed system has a greater improvement ratio over the individual models based on raw data. For instance, \(\varvec{IR_\mathrm {MAPE-MLP}^\mathrm {1-2-3-step-average}}=[17.8571\%, 17.6540\%, 25.4398\%]\) compared to the MLP model, and \(\varvec{IR_\mathrm {MAPE-SFG-MLP}^\mathrm {1-2-3-step-average}}=[10.3666\%, 10.4106\%, 21.9309\%]\) compared to the SFG-MLP model. This experimental result verifies that using the median series extracted by the SFG method as features for point forecasting can improve performance.
4.1.2 Improvement ratio for interval forecasting
For interval forecasting, the metric \(\varvec{IR}_\textbf{WS}\) is utilized to measure the improvement ratio of the interval forecasting WS of the proposed system. The \(\varvec{IR}_\textbf{WS}\) is defined as follows:
$$\begin{aligned} \varvec{IR}_\textbf{WS}=\frac{|{\textbf{WS}_{\varvec{com}}}|-|{\textbf{WS}_{\varvec{pro}}}|}{|{\textbf{WS}_{\varvec{com}}}|} \times 100\%, \end{aligned}$$where \({\textbf{WS}_{\varvec{com}}}\) and \({\textbf{WS}_{\varvec{pro}}}\) denote the Winkler scores of the comparison models and the proposed system, respectively. The detailed improvement percentages are displayed in Table 9. According to the calculation results, the following conclusions are drawn:
- (I). Compared with the individual models based on SFG for interval forecasting, the proposed combined system substantially improves interval forecasting performance. The average improvement ratios over the four datasets are \(\varvec{IR_\mathrm {WS-SFG-GRU}^\mathrm {1-2-3-step-average}}=[30.2635\%, 50.2429\%, 64.1143\%]\) compared to the SFG-GRU model, and \(\varvec{IR}_\mathbf {WS-SFG-N-BEATS}^\mathbf {1-2-3-step-average}=[20.1902\%, 53.0378\%, 61.6278\%]\) compared to the SFG-N-BEATS model.
- (II). The interval forecasting performance of the individual models based on SFG decreases as the number of steps increases; however, the improvement ratio of the proposed combined forecasting system increases with the number of steps. For instance, \(\varvec{IR}_{\mathbf {WS-SFG-GRU}}^{\mathbf {1-2-3-step-summer}}=[16.0447\%, 39.6658\%, 55.5046\%]\), \(\varvec{IR}_\mathbf {WS-SFG-TCN}^\mathbf {1-2-3-step-autumn}=[19.6580\%, 39.5602\%, 58.4455\%]\) and \(\varvec{IR}_\mathbf {WS-SFG-CNN}^\mathbf {1-2-3-step-winter}=[10.5997\%, 48.0815\%, 58.2988\%]\). These experimental results demonstrate the strong generalization capability of the proposed system.
4.2 Sensitivity analysis
The sensitivity analysis mechanism varies the value of only one key parameter of the EO algorithm at a time, while keeping the other parameters fixed, and then the volatility of the forecasting results is analyzed. The sensitivity is analyzed by calculating the standard deviation of the evaluation metrics. The specific formula is defined as \(\overline{\textbf{SD}}(M) = \sqrt{{\sum ^t_{i=1}(M_i-\overline{M})^2}\big / t}\), where t denotes the number of testing times, \(M_i\) is the value of the metric at the i-th test, and \(\overline{M}\) is the average value over all testing times. The smaller the \(\overline{\textbf{SD}}(M)\), the lower the sensitivity and the more stable the proposed system. In this paper, sensitivity analysis is conducted on the proposed combined optimization model, and the key parameters are Particle Number and Iteration Number.
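The sensitivity measure is simply the spread of a metric over repeated tests; a sketch, computed as a population standard deviation:

```python
import numpy as np

def sensitivity_sd(metric_values):
    """Root mean squared deviation of a metric from its average over
    t repeated tests; smaller values indicate a more stable system."""
    m = np.asarray(metric_values, dtype=float)
    return np.sqrt(np.mean((m - m.mean()) ** 2))

# Two tests with MAPE values 1.0 and 3.0 deviate by 1.0 from their mean.
```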
Definition 1
Given two vectors \(\overrightarrow{\varvec{P}} = (\varvec{p}_1, \varvec{p}_2, \cdots , \varvec{p}^*, \cdots , \varvec{p}_i)\) and \(\overrightarrow{\varvec{Q}} = (\varvec{q}_1, \varvec{q}_2, \cdots , \varvec{q}^*, \cdots , \varvec{q}_i)\), fix \(\overline{\overline{\varvec{Q}}}=\varvec{q}^*\) and let the elements of \(\overrightarrow{\varvec{P}}\) vary in turn from \(\varvec{p}_1\) to \(\varvec{p}_i\); this type of variation pattern is defined as \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}} \).
The vector of Particle Number is set as \(\overrightarrow{\varvec{P}} =[\varvec{20, 40, 60, 80^*, 100}]\) and Iteration Number is set as \(\overrightarrow{\varvec{Q}}=[\varvec{200, 400, 600, 800, 1000^*}]\). The \(*\) denotes the optimal parameter value for the EO algorithm. The specific results of the sensitivity analysis are presented in Tables 10 and 11 for point and interval forecasting, respectively.
4.2.1 Sensitivity analysis for point forecasting
The proposed system exhibits strong robustness when faced with the two pattern changes. The visualization of the two patterns for winter is depicted in Fig. 9.
- (I). For one-step point forecasting, \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\) is the more sensitive pattern. Taking summer as an example, Particle Number is the more sensitive of the two parameters, and the sensitivity measures of each metric are \(\overline{\textbf{SD}}^\mathrm {{\mathbf {1-step}}}_{{\textbf{MAPE}}}=0.0485\), \(\overline{\textbf{SD}}^{{\mathbf {1-step}}}_{{\textbf{RMSE}}}=5.1296\), \(\overline{\textbf{SD}}^{{\mathbf {1-step}}}_{{\textbf{MAE}}}=3.3847\) and \(\overline{\textbf{SD}}^{{\mathbf {1-step}}}_{{\mathbf {R^2}}}=0.0010\), respectively. Of the two patterns, \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) obtains lower sensitivity measures, which means that the proposed system is more stable with respect to Iteration Number within the given range of iterations, except for spring. For spring, the \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) pattern yields slightly higher sensitivity measures than \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\). However, the sensitivity measures of the proposed system under changes of the two parameters are relatively low, demonstrating that the proposed system is comparatively stable for one-step point forecasting.
- (II). Regarding two-step point forecasting, for spring the \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\) pattern is more sensitive than the \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) pattern. For the other datasets, the \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) pattern has higher sensitivity measures than the \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\) pattern. However, the sensitivity values of each metric are relatively low, indicating that the proposed system is stable under variation of Particle Number and Iteration Number.
- (III). For three-step point forecasting, the two patterns exhibit different sensitivities on different datasets. The pattern \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\) is more sensitive than the pattern \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) in spring, but the pattern \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) is more sensitive than the pattern \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\) in summer. Specific measures of the pattern \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\) for spring are \(\overline{\textbf{SD}}^{{\mathbf {3-step-spring}}}_{{\textbf{MAPE}}}=0.1430\), \(\overline{\textbf{SD}}^{{\mathbf {3-step-spring}}}_{{\textbf{RMSE}}}=14.2449\), \(\overline{\textbf{SD}}^{{\mathbf {3-step-spring}}}_{{\textbf{MAE}}}=10.4573\) and \(\overline{\textbf{SD}}^{{\mathbf {3-step-spring}}}_{{\mathbf {R^2}}}=0.0098\). In summer, the measures of the pattern \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) are \(\overline{\textbf{SD}}^{{\mathbf {3-step-summer}}}_{{\textbf{MAPE}}}=0.0509\), \(\overline{\textbf{SD}}^{{\mathbf {3-step-summer}}}_{{\textbf{RMSE}}}=4.0209\), \(\overline{\textbf{SD}}^{{\mathbf {3-step-summer}}}_{{\textbf{MAE}}}=3.3236\) and \(\overline{\textbf{SD}}^{{\mathbf {3-step-summer}}}_{{\mathbf {R^2}}}=0.0022\).
Integrating all the sensitivity measures, the influence of changes for the two key parameters of EO algorithm on the point forecasting results is at a relatively low level.
4.2.2 Sensitivity analysis for interval forecasting
The interval forecasting performance of the proposed system exhibits good robustness under variation of the key parameters of the EO algorithm. The two patterns for summer are visualized in Fig. 10.
- (I). For one-step interval forecasting, taking spring as an example, the \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) pattern is more sensitive than the \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\) pattern for the proposed system, and the sensitivity measures of each metric are \(\overline{\textbf{SD}}^{{\mathbf {1-step-spring}}}_{\textbf{FICP}}=0.0062\), \(\overline{\textbf{SD}}^{{\mathbf {1-step-spring}}}_{{\textbf{FINAW}}}=0.0086\), \(\overline{\textbf{SD}}^{{\mathbf {1-step-spring}}}_{{\textbf{CWC}}}=0.0888\) and \(\overline{\textbf{SD}}^{{\mathbf {1-step-spring}}}_{\textbf{WS}}=3.8141\), respectively. For winter, \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) obtains lower sensitivity measures, which means that the proposed system is more stable with respect to Iteration Number within the given range of iterations; the sensitivity measures of each metric are \(\overline{\textbf{SD}}^{{\mathbf {1-step-winter}}}_{\textbf{FICP}}=0.0047\), \(\overline{\textbf{SD}}^{{\mathbf {1-step-winter}}}_{{\textbf{FINAW}}}=0.0032\), \(\overline{\textbf{SD}}^{{\mathbf {1-step-winter}}}_{{\textbf{CWC}}}=0.0032\) and \(\overline{\textbf{SD}}^{{\mathbf {1-step-winter}}}_{\textbf{WS}}=1.4235\), respectively. Consequently, the sensitivity measures of the proposed system under changes of the two parameters are relatively low, demonstrating that the proposed system is comparatively stable for one-step interval forecasting.
(II).
Regarding two-step interval forecasting for spring, summer and autumn, the \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\) pattern is more sensitive than the \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) pattern. But for winter, the \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) pattern has higher sensitivity measures than the \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\) pattern. Although the two patterns exhibit different sensitivities on each dataset, the sensitivity values of each metric are relatively low, demonstrating that the proposed system is stable under variation of the two key parameters.
(III).
For three-step interval forecasting, the measures of the pattern \( \overrightarrow{\varvec{P}} \& \overline{\overline{\varvec{Q}}}\) for spring are \(\overline{\textbf{SD}}^{{\mathbf {3-step-spring}}}_{\textbf{PICP}}=0.0418\), \(\overline{\textbf{SD}}^{{\mathbf {3-step-spring}}}_{\textbf{PINAW}}=0.0227\), \(\overline{\textbf{SD}}^{{\mathbf {3-step-spring}}}_{{\textbf{CWC}}}=15.7249\) and \(\overline{\textbf{SD}}^{{\mathbf {3-step-spring}}}_{\textbf{WS}}=31.1143\). In summer, the measures of the pattern \( \overrightarrow{\varvec{Q}} \& \overline{\overline{\varvec{P}}}\) are \(\overline{\textbf{SD}}^{{\mathbf {3-step-summer}}}_{\textbf{PICP}}=0.0044\), \(\overline{\textbf{SD}}^{{\mathbf {3-step-summer}}}_{\textbf{PINAW}}=0.0052\), \(\overline{\textbf{SD}}^{{\mathbf {3-step-summer}}}_{{\textbf{CWC}}}=0.0878\) and \(\overline{\textbf{SD}}^{{\mathbf {3-step-summer}}}_{\textbf{WS}}=2.3355\). As with one-step and two-step interval forecasting, the sensitivity is relatively low across datasets, demonstrating that the proposed system is relatively stable.
4.3 Effect of sliding granulation window size
In this section, the effect of SGWS on point and interval forecasting is analyzed. For simplicity, we set SGWS to 2, 3 and 4 and analyze the performance of the proposed system through the experimental results.
4.3.1 Effect of sliding granulation window size for point forecasting
Table 12 shows the performance metrics of the proposed system for different SGWS. As SGWS increases, all four metrics deteriorate; when SGWS is 2, the performance of the proposed system is optimal. For spring one-step forecasting, the metrics of the proposed system are \(\varvec{\textrm{MAPE}^\mathrm {1-step-spring}_\mathrm {SGWS=[2,3,4]}}= [{\textbf {0.7973\%}}, 0.9875\%, 1.8126\%]\), \(\varvec{\textrm{RMSE}^\mathrm {1-step-spring}_\mathrm {SGWS=[2,3,4]}}= [{\textbf {69.6636}}, 88.3603, 153.8969]\), \(\varvec{\textrm{MAE}^\mathrm {1-step-spring}_\mathrm {SGWS=[2,3,4]}}= [{\textbf {51.7418}}, 64.0632, 118.2034]\) and \(\varvec{\mathrm {R^2}^\mathrm {1-step-spring}_\mathrm {SGWS=[2,3,4]}}= [{\textbf {0.9942}}, 0.9906, 0.9715]\). Based on the sliding fuzzy granulation mechanism, when SGWS is 2, each median information particle extracted by SFG represents the information of two data points.
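The four point metrics reported here can be computed with the standard definitions, as in this NumPy sketch (the load values and forecasts below are hypothetical, not the paper's data):

```python
import numpy as np

def point_metrics(y, yhat):
    """MAPE (%), RMSE, MAE and R^2 for point forecasts."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    e = y - yhat
    mape = float(np.mean(np.abs(e / y)) * 100)           # mean absolute percentage error
    rmse = float(np.sqrt(np.mean(e ** 2)))               # penalizes large errors
    mae = float(np.mean(np.abs(e)))                      # mean absolute error
    r2 = float(1 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2))  # coefficient of determination
    return mape, rmse, mae, r2

y = [100.0, 110.0, 120.0, 130.0]     # hypothetical loads
yhat = [98.0, 112.0, 119.0, 131.0]   # hypothetical forecasts
mape, rmse, mae, r2 = point_metrics(y, yhat)
```

Lower MAPE/RMSE/MAE and \(R^2\) closer to 1 indicate better point forecasts, which is why smaller SGWS (less information loss) yields the best row in Table 12.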
As SGWS grows, each median information particle represents more data points, so for point forecasting the loss of deterministic information increases. Therefore, when SGWS is 2, the SFG method can remove the noise from the raw series while losing the least deterministic information.
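A simplified sketch of the granulation step is shown below; the window minimum, median and maximum stand in for the fitted lower-bound, median and upper-bound fuzzy-particle parameters, and non-overlapping windows are assumed (the paper's SFG fitting may differ; the load values are hypothetical):

```python
import numpy as np

def sliding_granulation(series, window=2):
    """Simplified sliding fuzzy granulation: for each window of size
    `window`, emit lower-bound, median and upper-bound particles."""
    s = np.asarray(series, dtype=float)
    low, med, up = [], [], []
    for i in range(0, len(s) - window + 1, window):  # non-overlapping slide
        w = s[i:i + window]
        low.append(w.min())       # proxy for the lower-bound particle
        med.append(np.median(w))  # proxy for the median particle
        up.append(w.max())        # proxy for the upper-bound particle
    return np.array(low), np.array(med), np.array(up)

load = [5120, 5180, 5010, 4990, 5300, 5260]  # hypothetical half-hourly loads
low, med, up = sliding_granulation(load, window=2)
```

The median series feeds point forecasting, while the lower and upper series bound the interval forecasts; a larger `window` compresses more raw points into each particle, which is exactly the information loss discussed above.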
4.3.2 Effect of sliding granulation window size for interval forecasting
Interval forecasting is designed to describe the uncertainty of data series. Table 13 shows the performance metrics of the proposed system for different SGWS. Interval forecasting does not exhibit the same regularity as point forecasting; however, for the comprehensive metric WS, the interval forecasting performance is optimal when SGWS is 3, relative to the other sliding granulation window sizes. For spring one-step forecasting, the metrics of the proposed system are \(\varvec{\textrm{PICP}^\mathrm {1-step-spring}_\mathrm {SGWS=[2,3,4]}}= [94.5349\%, 95.2326\%, {\textbf {95.4598\%}}]\), \(\varvec{\textrm{ACE}^\mathrm {1-step-spring}_\mathrm {SGWS=[2,3,4]}}= [-0.0047, 0.0023, {\textbf {0.0046}}]\), \(\varvec{\textrm{PINAW}^\mathrm {1-step-spring}_\mathrm {SGWS=[2,3,4]}}= [0.1050, {\textbf {0.0998}}, 0.1471]\), \(\varvec{\textrm{CWC}^\mathrm {1-step-spring}_\mathrm {SGWS=[2,3,4]}}= [0.2302, {\textbf {0.0998}}, 0.1471]\) and \(\varvec{\textrm{WS}^\mathrm {1-step-spring}_\mathrm {SGWS=[2,3,4]}}= [-49.6120, {\textbf {-44.4442}}, -66.4431]\). When SGWS is equal to 4, PINAW is the largest in all situations. According to the sliding fuzzy granulation mechanism, it is intuitive that the larger the SGWS, the wider the interval extracted by the SFG method.
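The coverage and width metrics can be sketched as follows; note that the CWC penalty form and its constants `mu` and `eta` are one common choice from the interval-forecasting literature and may differ from the paper's exact definition (all numbers below are hypothetical):

```python
import numpy as np

def picp_pinaw(y, lower, upper):
    """PICP: fraction of observations falling inside the interval.
    PINAW: mean interval width normalized by the target range."""
    y, lo, up = map(np.asarray, (y, lower, upper))
    picp = float(np.mean((y >= lo) & (y <= up)))
    pinaw = float(np.mean(up - lo) / (y.max() - y.min()))
    return picp, pinaw

def cwc(picp, pinaw, mu=0.95, eta=50.0):
    """One common CWC form: PINAW penalized exponentially when
    coverage falls below the nominal level mu."""
    penalty = np.exp(-eta * (picp - mu)) if picp < mu else 0.0
    return float(pinaw * (1.0 + penalty))

y = np.array([100.0, 102.0, 98.0, 101.0])     # hypothetical observations
lo = np.array([97.0, 99.0, 95.0, 99.5])       # lower interval bounds
up = np.array([103.0, 105.0, 101.0, 102.5])   # upper interval bounds
picp, pinaw = picp_pinaw(y, lo, up)
```

Narrow intervals (low PINAW) with coverage at or above the nominal level give the best CWC, which is the trade-off the SGWS choice controls.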
4.4 Kolmogorov-Smirnov Predictive Accuracy test
The Kolmogorov-Smirnov Predictive Accuracy (KSPA) test [48, 49] is a non-parametric statistical test based on the Kolmogorov-Smirnov (KS) principle, used as a complementary means of distinguishing the predictive accuracy of two sets of forecasts.
The two-sample two-sided KSPA test aims to determine whether the distributions of the forecasting errors of the two models are statistically significantly different, based on the maximum distance between the empirical error distributions:
\( D_{KSPA}=\sup _{x}\left| F_{Error1}^{m_1}(x)-F_{Error2}^{m_2}(x)\right| \)
where \(F_{Error1}^{m_1}(x)\) and \(F_{Error2}^{m_2}(x)\) denote the empirical cumulative distribution functions (c.d.f.) of the forecasting errors from the two models \(m_1\) and \(m_2\). The forecasting errors Error1 and Error2 are the absolute or squared forecast errors of models \(m_1\) and \(m_2\). The null hypothesis is that there is no significant difference between the two sets of forecasts; when the two-sided KSPA test produces a p-value below the significance level \(\alpha \) (usually 1%, 5%, or 10%), the null hypothesis is rejected in favor of the alternative that the forecasting errors do not share the same distribution. In that case, one can conclude at the \(1-\alpha \) confidence level that there is a statistically significant difference between the forecasting error distributions of the two models.
The goal of the two-sample one-sided KSPA test is to determine whether the model that reports the smallest error under some loss function also reports a stochastically smaller error than the alternative model.
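A minimal illustration of the underlying two-sample KS statistic applied to two models' absolute errors (the error values are hypothetical; in practice `scipy.stats.ks_2samp` also supplies the p-value and one-sided alternatives):

```python
import numpy as np

def ks_statistic(err1, err2):
    """Two-sample KS statistic: maximum gap between the empirical
    c.d.f.s of the two models' absolute forecast errors."""
    e1, e2 = np.sort(np.abs(err1)), np.sort(np.abs(err2))
    grid = np.concatenate([e1, e2])
    # empirical c.d.f. of each sample evaluated at every observed error
    cdf1 = np.searchsorted(e1, grid, side='right') / len(e1)
    cdf2 = np.searchsorted(e2, grid, side='right') / len(e2)
    return float(np.max(np.abs(cdf1 - cdf2)))

# hypothetical absolute errors from two forecasting models
err_a = [0.2, 0.5, 0.1, 0.4, 0.3]
err_b = [1.0, 1.2, 0.9, 1.1, 1.3]
d = ks_statistic(err_a, err_b)  # disjoint error distributions -> D = 1.0
```

A large statistic (small p-value) rejects the hypothesis that the two error distributions coincide, which is how Table 14 separates the proposed system from each comparison model.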
To save space, the autumn dataset is used to perform the KSPA test. Table 14 shows the results of the KSPA test comparing the point forecasting errors of the proposed system with those of the individual models for autumn. Figures 11 and 12 illustrate the one-step point forecasts for autumn and the empirical cumulative distribution functions (c.d.f.) of the errors for the proposed system and the individual models based on the raw data and the SFG method, respectively. At the significance level \(\alpha = 0.05\), a two-sided p-value below 0.05 means that the prediction errors of the proposed system and the comparison model are statistically significantly different; a one-sided p-value below 0.05 further indicates that the proposed system has a stochastically smaller error than the comparison model. Table 14 shows that only a very few comparisons fail to pass the KSPA test. Therefore, the proposed system achieves the best forecasting performance.
5 Conclusion
In view of the non-stationary and noisy characteristics of electricity load data, we utilize the SFG method to obtain the upper-bound, lower-bound and median series of the load data, build a combined system of deep neural networks optimized by the EO algorithm, and then perform point forecasting and interval forecasting. Several experiments are designed to demonstrate the superiority, effectiveness and stability of the proposed system by comprehensively considering multiple evaluation metrics.
Comparison of individual models based on the raw series and on SFG-preprocessed data demonstrates that the SFG method can effectively reduce the adverse effects of noise and significantly improve the performance of point forecasting. The discussion of the effect of SGWS shows that the point forecasting performance of the proposed system gradually decreases as SGWS increases. For interval forecasting, compared to the individual models, the proposed system is not only optimal in one-step forecasting but also significantly improves the performance of multi-step forecasting.
In summary, the data preprocessing strategy based on SFG provides support for the uncertainty analysis of electricity load data, and EO algorithm determines the optimal weight coefficient for each individual model of the combined model, addressing the limitations of the individual models. As a result, the proposed system can guarantee better performance in both point forecasting and interval forecasting.
Data Availability
The data that support the findings of this study are available from https://aemo.com.au/en.
References
Esteves GRT, Bastos BQ, Cyrino FL, Calili RF, Souza RC (2015) Long term electricity forecast: a systematic review. Procedia Computer Science 55:549–558. https://doi.org/10.1016/j.procs.2015.07.041
Wang J, Zhang L, Li Z (2022) Interval forecasting system for electricity load based on data pre-processing strategy and multi-objective optimization algorithm. Appl Energy 305:117911. https://doi.org/10.1016/j.apenergy.2021.117911
Yang W, Wang J, Niu T, Du P (2020) A novel system for multi-step electricity price forecasting for electricity market management. Appl Soft Comput 88:106029. https://doi.org/10.1016/j.asoc.2019.106029
Wang S, Wang J, Lu H, Zhao W (2021) A novel combined model for wind speed prediction-combination of linear model, shallow neural networks, and deep learning approaches. Energy 234:121275. https://doi.org/10.1016/j.energy.2021.121275
Zhang Z, Hong W-C (2021) Application of variational mode decomposition and chaotic grey wolf optimizer with support vector regression for forecasting electric loads. Knowl-Based Syst 228:107297. https://doi.org/10.1016/j.knosys.2021.107297
Wang J, Du P, Lu H, Yang W, Niu T (2018) An improved grey model optimized by multi-objective ant lion optimization algorithm for annual electricity consumption forecasting. Appl Soft Comput 72:321–337. https://doi.org/10.1016/j.asoc.2018.07.022
Wang K, Wang J, Zeng B, Lu H (2022) An integrated power load point-interval forecasting system based on information entropy and multi-objective optimization. Appl Energy 314:118938
Zhao H, Han X, Guo S (2018) DGM (1, 1) model optimized by MVO (Multi-Verse Optimizer) for annual peak load forecasting. Neural Comput Appl 30(6):1811–1825. https://doi.org/10.1007/s00521-016-2799-1
Pappas SS, Ekonomou L, Karampelas P, Karamousantas DC, Katsikas SK, Chatzarakis GE, Skafidas PD (2010) Electricity demand load forecasting of the hellenic power system using an ARMA model. Electric Power Systems Research 80(3):256–264. https://doi.org/10.1016/j.epsr.2009.09.006
Wang B, Zhang L, Ma H, Wang H, Wan S (2019) Parallel LSTM-based regional integrated energy system multienergy source-load information interactive energy prediction. Complexity. https://doi.org/10.1155/2019/7414318
Wang K, Huang SB, Ding YL (2014) Application of GRNN neural network in short term load forecasting. Advanced Materials Research 971:2242–2247. Trans Tech Publ. https://doi.org/10.4028/www.scientific.net/AMR.971-973.2242
Zhou Y, Wang J, Lu H, Zhao W (2022) Short-term wind power prediction optimized by multi-objective dragonfly algorithm based on variational mode decomposition. Chaos, Solitons & Fractals 157:111982. https://doi.org/10.1016/j.chaos.2022.111982
Yang W, Wang J, Niu T, Du P (2019) A hybrid forecasting system based on a dual decomposition strategy and multi-objective optimization for electricity price forecasting. Appl Energy 235:1205–1225. https://doi.org/10.1016/j.apenergy.2018.11.034
Xiao X, Xie W, Zhou Y, Zhao W, Liu X, Zhang C (2019) Prediction and analysis of energy demand of high energy density AC/DC park based on spatial static load forecasting method. The Journal of Engineering 2019(16):3388–3391. https://doi.org/10.1049/joe.2018.8389
Ramos P, Santos N, Rebelo R (2015) Performance of state space and ARIMA models for consumer retail sales forecasting. Robotics and Computer-integrated Manufacturing 34:151–163. https://doi.org/10.1016/j.rcim.2014.12.015
Nyberg H, Saikkonen P (2014) Forecasting with a noncausal var model. Computational Statistics & Data Analysis 76:536–555. https://doi.org/10.1016/j.csda.2013.10.014
Takeda H, Tamura Y, Sato S (2016) Using the ensemble Kalman filter for electricity load forecasting and analysis. Energy 104:184–198. https://doi.org/10.1016/j.energy.2016.03.070
Garcia RC, Contreras J, Van Akkeren M, Garcia JBC (2005) A Garch forecasting model to predict day-ahead electricity prices. IEEE Trans Power Syst 20(2):867–874. https://doi.org/10.1109/TPWRS.2005.846044
Wang Y, Wang J, Zhao G, Dong Y (2012) Application of residual modification approach in seasonal ARIMA for electricity demand forecasting: a case study of China. Energy Policy 48:284–294. https://doi.org/10.1016/j.enpol.2012.05.026
Rendon-Sanchez JF, de Menezes LM (2019) Structural combination of seasonal exponential smoothing forecasts applied to load forecasting. Eur J Oper Res 275(3):916–924. https://doi.org/10.1016/j.ejor.2018.12.013
Wang S, Wang X, Wang S, Wang D (2019) Bi-directional long short-term memory method based on attention mechanism and rolling update for short-term load forecasting. International Journal of Electrical Power & Energy Systems 109:470–479. https://doi.org/10.1016/j.ijepes.2019.02.022
Wang Y, Chen J, Chen X, Zeng X, Kong Y, Sun S, Guo Y, Liu Y (2020) Short-term load forecasting for industrial customers based on TCN-LightGBM. IEEE Trans Power Syst 36(3):1984–1997. https://doi.org/10.1109/TPWRS.2020.3028133
Zhang X, Wang J, Zhang K (2017) Short-term electric load forecasting based on singular spectrum analysis and support vector machine optimized by cuckoo search algorithm. Electric Power Systems Research 146:270–285
Wang J, Wang S, Zeng B, Lu H (2022) A novel ensemble probabilistic forecasting system for uncertainty in wind speed. Appl Energy 313:118796
Hearst MA, Dumais ST, Osuna E, Platt J, Scholkopf B (1998) Support vector machines. IEEE Intelligent Systems and Their Applications 13(4):18–28. https://doi.org/10.1109/5254.708428
Mirjalili S, Mirjalili SM, Lewis A (2014) Grey wolf optimizer. Adv Eng Softw 69:46–61. https://doi.org/10.1016/j.advengsoft.2013.12.007
Barman M, Choudhury NBD (2020) A similarity based hybrid GWO-SVM method of power system load forecasting for regional special event days in anomalous load situations in Assam. India. Sustainable Cities and Society 61:102–311. https://doi.org/10.1016/j.scs.2020.102311
Li W-Q, Chang L (2018) A combination model with variable weight optimization for short-term electrical load forecasting. Energy 164:575–593. https://doi.org/10.1016/j.energy.2018.09.027
He F, Zhou J, Feng Z, Liu G, Yang Y (2019) A hybrid short-term load forecasting model based on variational mode decomposition and long short-term memory networks considering relevant factors with Bayesian optimization algorithm. Appl Energy 237:103–116. https://doi.org/10.1016/j.apenergy.2019.01.055
Wang R, Wang J, Xu Y (2019) A novel combined model based on hybrid optimization algorithm for electrical load forecasting. Appl Soft Comput 82:105548. https://doi.org/10.1016/j.asoc.2019.105548
Zhang X, Wang J (2018) A novel decomposition-ensemble model for forecasting short-term load-time series with multiple seasonal patterns. Appl Soft Comput 65:478–494. https://doi.org/10.1016/j.asoc.2018.01.017
Bai S, Kolter JZ, Koltun V (2018) An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint arXiv:1803.01271. https://arxiv.org/abs/1803.01271
Chung J, Gulcehre C, Cho K, Bengio Y (2014) Empirical evaluation of gated recurrent neural networks on sequence modeling. arXiv preprint arXiv:1412.3555. https://arxiv.org/abs/1412.3555
Yu Y, Han X, Yang M, Yang J (2020) Probabilistic prediction of regional wind power based on spatiotemporal quantile regression. IEEE Trans Ind Appl 56(6):6117–6127. https://doi.org/10.1109/TIA.2020.2992945
Oreshkin BN, Carpov D, Chapados N, Bengio Y (2020) N-BEATS: neural basis expansion analysis for interpretable time series forecasting. In: International Conference on Learning Representations. https://openreview.net/forum?id=r1ecqn4YwB
Zheng Z, Chen H, Luo X (2019) A Kalman filter-based bottom-up approach for household short-term load forecast. Appl Energy 250:882–894. https://doi.org/10.1016/j.apenergy.2019.05.102
Dai LX, Hu FF (2012) Application optimization of grey model in power load forecasting. Advanced Materials Research 347:301–305. Trans Tech Publ. https://doi.org/10.4028/www.scientific.net/AMR.347-353.301
Hou B, Zu YX, Zhang C (2014) A forecasting method of short-term electric power load based on BP neural network. Applied Mechanics and Materials 538:247–250. Trans Tech Publ. https://doi.org/10.4028/www.scientific.net/AMM.538.247
Du P, Wang J, Yang W, Niu T (2019) A novel hybrid model for short-term wind power forecasting. Appl Soft Comput 80:93–106. https://doi.org/10.1016/j.asoc.2019.03.035
Song J, Wang J, Lu H (2018) A novel combined model based on advanced optimization algorithm for short-term wind speed forecasting. Appl Energy 215:643–658. https://doi.org/10.1016/j.apenergy.2018.02.070
Zhang L, Wang J, Niu X (2021) Wind speed prediction system based on data pre-processing strategy and multi-objective dragonfly optimization algorithm. Sustainable Energy Technol Assess 47:101346. https://doi.org/10.1016/j.seta.2021.101346
Wang J, Hu J (2015) A robust combination approach for short-term wind speed forecasting and analysis-combination of the ARIMA (Autoregressive Integrated Moving Average), ELM (Extreme Learning Machine), SVM (Support Vector Machine) and LSSVM (Least Square SVM) forecasts using a GPR (Gaussian Process Regression) model. Energy 93:41–56. https://doi.org/10.1016/j.energy.2015.08.045
Li J, Wang J, Zhang H, Li Z (2022) An innovative combined model based on multi-objective optimization approach for forecasting short-term wind speed: a case study in China. Renewable Energy 201:766–779. https://doi.org/10.1016/j.renene.2022.10.123
Zadeh LA (1979) Fuzzy sets and information granularity. In: Advances in Fuzzy Set Theory and Applications, vol 11, pp 3–18
Duan L, Yu F, Pedrycz W, Wang X, Yang X (2018) Time-series clustering based on linear fuzzy information granules. Appl Soft Comput 73:1053–1067. https://doi.org/10.1016/j.asoc.2018.09.032
Zhu A, Li X, Mo Z, Wu R (2017) Wind power prediction based on a convolutional neural network. In: 2017 International Conference on Circuits, Devices and Systems (ICCDS), pp 131–135. IEEE. https://doi.org/10.1109/ICCDS.2017.8120465
Faramarzi A, Heidarinejad M, Stephens B, Mirjalili S (2020) Equilibrium optimizer: a novel optimization algorithm. Knowl-Based Syst 191:105190. https://doi.org/10.1016/j.knosys.2019.105190
Hassani H, Silva ES (2015) A Kolmogorov-Smirnov based test for comparing the predictive accuracy of two sets of forecasts. Econometrics 3(3):590–609. https://doi.org/10.3390/econometrics3030590
Fan G-F, Zhang L-Z, Yu M, Hong W-C, Dong S-Q (2022) Applications of random forest in multivariable response surface for short-term load forecasting. International Journal of Electrical Power & Energy Systems 139:108073. https://doi.org/10.1016/j.ijepes.2022.108073
Acknowledgements
This work was supported by the National Natural Science Foundation of China (Grant No. 71671029), the Scientific Research Project of the Shaanxi Education Department (Grant No. 21JK0550) and the Major Key Project of PCL (Grant No. PCL2021A12).
Author information
Authors and Affiliations
Contributions
Conceptualization: Shoujiang Li, Jianzhou Wang; Methodology: Shoujiang Li, Hui Zhang; Formal analysis and investigation: Shoujiang Li, Hui Zhang, Jianzhou Wang; Writing - original draft preparation: Shoujiang Li, Hui Zhang; Writing - review and editing: Jianzhou Wang, Yong Liang; Funding acquisition: Hui Zhang, Jianzhou Wang, Yong Liang; Resources: Jianzhou Wang; Supervision: Jianzhou Wang, Yong Liang.
Corresponding author
Ethics declarations
Competing interests
The authors have no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Li, S., Wang, J., Zhang, H. et al. Short-term load forecasting system based on sliding fuzzy granulation and equilibrium optimizer. Appl Intell 53, 21606–21640 (2023). https://doi.org/10.1007/s10489-023-04599-0