Energies 15 02263 v3
Article
A Hybrid Channel-Communication-Enabled CNN-LSTM
Model for Electricity Load Forecasting
Faisal Saeed 1 , Anand Paul 1 and Hyuncheol Seo 2, *
Abstract: Smart grids provide a unique platform to the participants of energy markets to tweak their
offerings based on demand-side management. Responding quickly to the needs of the market can
help to improve the reliability of the system, as well as the cost of capital investments. Electric load
forecasting is important because it is used to make and run decisions about the power grid. However,
people use electricity in nonlinear ways, which makes the electric load profile a complicated signal.
Even though there has been a lot of research done in this field, an accurate forecasting model is still
needed. In this regard, this article proposed a hybrid cross-channel-communication (C3)-enabled
CNN-LSTM model for accurate load forecasting which helps decision making in smart grids. The
proposed model is the combination of three different models, i.e., a C3 block to enable channel
communication of a CNN (convolutional neural networks) model, two convolutional layers to extract
the features and an LSTM (long short-term memory network) model for forecasting. In the proposed
hybrid model, Leaky ReLu (rectified linear unit) was used as activation function instead of sigmoid.
The channel communication in CNN model makes the proposed model very light and efficient.
Extensive experimentation was done on electricity load data. The results show the model’s high
efficiency. The proposed model achieves 98.3% accuracy and a MAPE of 0.4560%.
Keywords: cross-channel communication; Convolutional Neural Networks; LSTM; electricity; load;
forecasting
Citation: Saeed, F.; Paul, A.; Seo, H. A Hybrid Channel-Communication-Enabled CNN-LSTM Model for
Electricity Load Forecasting. Energies 2022, 15, 2263. https://doi.org/10.3390/en15062263
2. Related Work
The authors of [11] present an efficient method for rapid and precise load forecasting
in the day-ahead energy market, which is critical for the proper functioning of SGs with
significant demand-side flexibility. They proposed an SPLNF model that can retain linearity
while also learning-from-data in LMs. They improved the overall effectiveness for faster
model training by lowering the input vector dimensionality. In [12], the authors present an
IoT-based deep learning system that automatically extracts characteristics from acquired
data and, as a result, provides an accurate prediction of future load value. Their model is
an individually constructed two-step forecasting technique, which enhances forecasting
precision greatly. Additionally, the proposed model can statistically investigate the impacts
of several main attributes, which is very effective in choosing attribute patterns and deploy-
ing onboard sensors for smart grids with large territories, varying climates, and societal
customs.
Ayub et al. [13] propose an SVM classifier to tackle the problem of load forecasting
accuracy. The forecasting model is divided into two stages: feature engineering and SVM
classification. For feature selection, a mixture of two approaches (XGBoost and DTC) is
used to choose the best features from the dataset. The SVM classifier is fine-tuned using
three hyperparameters until the desired accuracy is obtained. The SVM classifier achieved
a 98% accuracy rate.
Another research study [14] offered a novel method for smart meter client load predic-
tion by converting nonlinear smart meter data into linear system profiles. The approach’s
resilience was demonstrated using extremely fluctuating smart meter customer demand
data. The study demonstrated the advantages of employing the suggested technique over
neural networks, particularly when dealing with extremely fluctuating smart meter con-
sumer needs. The combination of the cluster forecast provided a more precise prediction
while keeping the information’s variability. Usman et al. [15] proposed a modified RNN
for short-term pricing and predictive modelling to forecast electricity load and price using
data analytics. Data preprocessing techniques such as RFE and DTC are used to eliminate
extraneous features and decrease redundancy. LSTM is used to train and test the
suggested model. The experimental findings demonstrate the efficacy of the suggested
strategy. The analytical findings reveal that their suggested system has a lower MAPE than
FFNN and RNN. The study [16] compares three different machine learning techniques on a
real-world example based on the daily data from an Aarhus-based DHN (Denmark). In
the analysis, support vector regression depending on the climatic parameters and calen-
dar events outperforms other models in the 15–38 h prediction ranges. Wang et al. [17]
increased the accuracy of load forecasting by presenting a novel load-forecasting system
called VMD–CISSA–LSSVM. The system includes the data preparation approach variational
mode decomposition (VMD), the sparrow search algorithm (SSA), and the least
squares support vector machine (LSSVM). To solve the drawbacks of the SSA method,
which is susceptible to local optima and sluggish convergence, they also developed a
multistrategy improved chaotic sparrow search algorithm (CISSA).
The authors of [18] proposed a fuzzy logic-based controller, which is extremely appro-
priate for reducing disruptions caused by variations in STLF. The challenge is designed to
optimize RER utilization in order to improve the dependability of the power network. To
identify any unpredictability in the power system caused by overloading and faults, an ef-
fective fuzzy control strategy is used. Their results showed that the network becomes stable
in a shorter amount of time than the other methods due to the controller’s quick response
time to unplanned disruptions. In the suggested method in [19], researchers estimated load
using accessible big data, using Apache Spark and Apache Hadoop as big data platforms
for distributed computing. This study assessed the development of ML techniques utilizing
Apache Spark’s MLlib package. According to the findings, distributed computing of load
prediction delivered good precision and calculation times. Yang et al. [20] proposed a
deep scalable and adaptable ensemble learning system for individualized probabilistic
load forecasting. To increase uncertainty measurement efficiency, customer categorization
and multitask pattern recognition were applied. The ensemble projections were refined
using the LASSO-based quantile combination strategy. They also performed case studies
on residential and SME clients with two forecasting horizons, showing their superiority
and efficacy when compared to state-of-the-art benchmarking approaches.
3.1.1. Convolutional Layer
Convolutional layers conduct a complex process on the input image, and the output
is passed to its following layer. At each position in the convolutional layer, there
is a responsive region with a set of units from the previous levels. The neurons may acquire
elementary visual properties in the immediate receptive field such as corners, endpoints,
and orientated edges. This convolutional layer has numerous feature maps from which
different properties can be extracted. Every unit has the same weightage and bias in every
individual feature map. As a result of this, the identified properties are same for all possible
input locations. This mathematical formulation is commonly used to indicate the equation
of a convolutional layer:
$$X_j^{l} = f\Big(\sum_{i \in M_j} X_i^{l-1} * k_{ij}^{l} + b_j^{l}\Big) \tag{1}$$
where $X_j^{l}$ denotes the output feature map, $M_j$ symbolizes the set of input channels,
$k_{ij}^{l}$ represents the kernels and $b_j^{l}$ is the bias term.
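As a minimal sketch of Equation (1), the following pure-Python function computes each output feature map as an activated sum of per-channel convolutions plus a bias. It assumes 1-D inputs (such as windows of hourly load), stride 1, no padding, cross-correlation in place of a flipped-kernel convolution (the usual CNN convention), and Leaky ReLU as the activation f. The names `conv1d_layer`, `kernels`, and `biases` are illustrative and are not taken from the paper.

```python
from typing import List


def leaky_relu(value: float, slope: float = 0.01) -> float:
    """Leaky ReLU used here as the activation f (the slope is an assumed default)."""
    return value if value > 0 else slope * value


def conv1d_layer(
    inputs: List[List[float]],
    kernels: List[List[List[float]]],
    biases: List[float],
) -> List[List[float]]:
    """Compute X_j = f(sum_i X_i * k_ij + b_j) for 1-D channels.

    inputs[i]     -- input channel i (all channels share one length)
    kernels[j][i] -- kernel linking input channel i to output map j
    biases[j]     -- bias of output map j
    """
    output_maps = []
    for j, bias in enumerate(biases):
        kernel_len = len(kernels[j][0])
        out_len = len(inputs[0]) - kernel_len + 1  # "valid" output length
        feature_map = []
        for t in range(out_len):
            total = bias
            # Sum the responses of every input channel at position t.
            for i, channel in enumerate(inputs):
                kernel = kernels[j][i]
                total += sum(channel[t + m] * kernel[m] for m in range(kernel_len))
            feature_map.append(leaky_relu(total))
        output_maps.append(feature_map)
    return output_maps
```

Because the same kernel and bias are reused at every position `t`, a feature is detected identically wherever it occurs in the input, which is the weight-sharing property described above.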
The down in the above equation represents the subsampling function. In practice,
this function performs a sum over each individual block of input picture to reduce the
dimensions.
where $f_i^l$ is a function that collects the feature responses of all channels and,
simultaneously, updates the encoded features of each channel. This cross-channel
communication enables communication between all sides of the network.
Feature encoding, message passing, and feature decoding are the three main parts
of the cross-channel communication network.
Feature Encoder
This module is responsible for extracting global information from each channel response
map. In particular, the response map $x_i^l$, as discussed earlier, is flattened into a
one-dimensional vector and then passed through two FC layers, i.e.,
$$y_i^l = f_{in}^{enc}\left(x_i^l\right), \qquad z_i^l = f_{out}^{enc}\left(\sigma\left(y_i^l\right)\right) \tag{4}$$
where $f_{in}^{enc}$ and $f_{out}^{enc}$ are the linear functions of the two fully connected layers
and $\sigma$ is a ReLu activation function.
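The encoder in Equation (4) can be sketched in a few lines of pure Python. This is an illustration under assumptions: the helper names (`linear`, `encode_channel`) and the weight layout (one row per output unit) are hypothetical, and the weights would normally be learned rather than supplied by hand.

```python
from typing import List

Vector = List[float]
Matrix = List[List[float]]


def relu(vec: Vector) -> Vector:
    """Element-wise ReLU, the activation sigma in Equation (4)."""
    return [max(0.0, v) for v in vec]


def linear(weights: Matrix, bias: Vector, vec: Vector) -> Vector:
    """Fully connected layer: one weight row and one bias per output unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def encode_channel(
    response_map: Matrix, w_in: Matrix, b_in: Vector, w_out: Matrix, b_out: Vector
) -> Vector:
    """Flatten one channel response map and pass it through two FC layers."""
    x = [value for row in response_map for value in row]  # flatten to 1-D
    y = linear(w_in, b_in, x)            # y = f_in(x)
    return linear(w_out, b_out, relu(y))  # z = f_out(ReLU(y))
```

For example, a 2×2 response map is flattened to four values, so `w_in` needs four columns; the length of the returned vector is set by the number of rows in `w_out`.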
Message Passing
The message passing module is used to make sure that all channels communicate
with each other so that the different feature responses can be represented in different ways
by updating the final feature responses. Graph convolutional network (GCN) [22] is a
good way to learn the channel interaction. Specifically, we proposed a graph attention
network [23] for enabling channel interaction between load data, which has a built-in soft
attention mechanism the same as GCN. Our model has the same cross-channel interaction
ability as the block intension module. In our model, we construct an undirected graph
where $Z = \{z_i^l\}$ are the nodes and $s_{ij} = f_{att}\left(z_i^l, z_j^l\right)$ is the edge strength between two nodes.
There are a number of methods available to learn $f_{att}$ [23–25], but we used the following
method:
$$\bar{z}_i^l = \frac{\sum_{k=1}^{h_l w_l} z_i^l[k]}{h_l w_l}, \qquad s_{ij} = -\left(\bar{z}_i^l - \bar{z}_j^l\right)^2 \tag{5}$$
where $h_l$ and $w_l$ represent the height and width of layer $l$, and $z_i^l[k]$ represents the $k$-th element
of the 1-D vector $z_i^l$. To allow more communication between similar channels, we computed the
negative squared distance. In this way, groups of similar channels are formed, which become
more harmonized and distinct. A softmax layer then normalizes the attention scores.
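A rough sketch of the message passing step in Equation (5) follows. The score computation (channel mean, negative squared distance, softmax) follows the text; the final aggregation, in which each channel is replaced by the attention-weighted sum of all encoded channels, is an assumption about how the normalized scores are used, and the function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def attention_scores(encoded: List[Vector]) -> List[Vector]:
    """s_ij = -(mean(z_i) - mean(z_j))^2 for every pair of channels."""
    means = [sum(z) / len(z) for z in encoded]
    return [[-((mi - mj) ** 2) for mj in means] for mi in means]


def softmax(row: Vector) -> Vector:
    """Numerically stable softmax over one row of scores."""
    peak = max(row)
    exps = [math.exp(s - peak) for s in row]
    total = sum(exps)
    return [e / total for e in exps]


def pass_messages(encoded: List[Vector]) -> List[Vector]:
    """Update each channel with the attention-weighted sum of all channels.

    The weighted-sum update is an assumed aggregation rule for illustration.
    """
    attention = [softmax(row) for row in attention_scores(encoded)]
    dim = len(encoded[0])
    return [
        [sum(weights[j] * encoded[j][k] for j in range(len(encoded))) for k in range(dim)]
        for weights in attention
    ]
```

Channels with similar mean responses receive scores near zero and therefore large softmax weights, so they exchange most of their information with each other, while dissimilar channels contribute very little.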
Feature Decoder
This module is responsible for obtaining the information from all updated channels and
reshaping it to the original input’s dimensions. The feature decoder employs a standard
convolution to pass the data to the subsequent layers. After acquiring the
updated channel-wise output $z_i^l$, the decoder module reshapes it to the original dimensions
by applying a simple convolution. Together, the three modules enable balanced communication
across all the neurons at the same level.
2. Gates: This property of the LSTM network helps to manage the distribution of
information. The mechanism comprises three gates: the input gate $i_t$, the forget
gate $f_t$, and the output gate $O_t$;
3. The gates in the LSTM restrict the quantity of information that flows. The gate
value lies between 0 and 1, where 0 means that no transmission
of information is authorized, and 1 means that all information is passed through.
Figure 3. Pictorial description of proposed LSTM cell.
In the proposed LSTM model, unlike the conventional LSTM network, we used ReLu
and Leaky ReLu [26] activation functions instead of the traditional sigmoid and hyperbolic tangent (tanh)
functions, as shown in Figure 3. As described earlier, learning the nonlinear behavior of
load data is challenging for activation functions such as sigmoid and tanh because
of their limited output range. To overcome this challenge, we implemented an LSTM model
that uses ReLu as its activation function. The mathematical form of
all the cells of the LSTM is shown in the following equations.
$$f_t = \mathrm{ReLu}\left(w_f \cdot [h_{t-1}, x_t] + b_f\right) \tag{6}$$
$$i_t = \mathrm{ReLu}\left(w_i \cdot [h_{t-1}, x_t] + b_i\right) \tag{7}$$
$$\tilde{c}_t = \mathrm{LeakyReLu}\left(w_c \cdot [h_{t-1}, x_t] + b_c\right) \tag{8}$$
$$c_t = f_t \cdot c_{t-1} + i_t \cdot \tilde{c}_t \tag{9}$$
$$O_t = \mathrm{ReLu}\left(w_o \cdot [h_{t-1}, x_t] + b_o\right) \tag{10}$$
$$h_t = \mathrm{LeakyReLu}\left(c_t\right) \tag{11}$$
where $f_t$, $i_t$, $c_t$, $O_t$, and $h_t$ represent the forget gate, input gate, cell state, output
gate, and hidden state, respectively. The proposed approach is divided into four primary sections:
preparing the data, training the LSTM system, verifying the system, forecasting the load,
and calculating the value or cost based on the testing data. The processes for cost prediction
are detailed in the following phases. For the first phase, the historical price and load vectors
are normalized using the following computation.
$$p_n = \frac{p - \mu(p)}{\sigma(p)} \tag{12}$$
where $\mu$ and $\sigma$ denote the mean and standard deviation of the load data, respectively.
Subtracting the mean and dividing by the standard deviation yields the standardized load $p_n$;
this process is known as Z-score normalization. The data is separated on an
hourly basis for convenience. Algorithm 1 is used to divide the data into training,
validation, and testing parts. The system is then trained using the training dataset and
validated using the validation dataset in the subsequent phase. The trained neural network
is tested on a dataset of day-ahead load data. The root mean
square error (RMSE) is used to assess the model’s efficiency.
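The LSTM cell defined by Equations (6)–(11) can be sketched as a single time step in pure Python. This is an illustration rather than the authors' implementation: the parameter dictionary keys, the weight layout (one row per hidden unit over the concatenation of the previous hidden state and the input), and the Leaky ReLU slope of 0.01 are assumptions. The sketch follows the equations as printed, so the output gate is computed and returned but does not scale the hidden state; a conventional LSTM would multiply the activated cell state by the output gate.

```python
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def relu(vec: Vector) -> Vector:
    return [max(0.0, v) for v in vec]


def leaky_relu(vec: Vector, slope: float = 0.01) -> Vector:
    return [v if v > 0 else slope * v for v in vec]


def affine(weights: Matrix, bias: Vector, vec: Vector) -> Vector:
    """w . vec + b with one weight row and one bias per hidden unit."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weights, bias)]


def lstm_relu_step(
    x_t: Vector, h_prev: Vector, c_prev: Vector, params: Dict[str, list]
) -> Tuple[Vector, Vector, Vector]:
    """One step of the ReLu/Leaky ReLu LSTM cell; returns (h_t, c_t, o_t)."""
    hx = h_prev + x_t  # concatenation [h_{t-1}, x_t]
    f_t = relu(affine(params["w_f"], params["b_f"], hx))            # Eq. (6)
    i_t = relu(affine(params["w_i"], params["b_i"], hx))            # Eq. (7)
    c_tilde = leaky_relu(affine(params["w_c"], params["b_c"], hx))  # Eq. (8)
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]  # Eq. (9)
    o_t = relu(affine(params["w_o"], params["b_o"], hx))            # Eq. (10)
    h_t = leaky_relu(c_t)                                           # Eq. (11)
    return h_t, c_t, o_t
```

Running the step over a sequence means feeding the returned `h_t` and `c_t` back in as `h_prev` and `c_prev` for the next input.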
Algorithm 1: Separates the electricity load dataset into training, validation and
testing sets
Input: Electricity load dataset (time-series)
Output: 65% Training, 15% Validation and 10% Testing sets
1. N ← length(time-series)
2. Train_End ← N × 0.65
3. Train-Data ← time-series[0 . . . Train_End]
4. Val_End ← Train_End + N × 0.15
5. Validation-Data ← time-series[Train_End . . . Val_End]
6. Test_End ← Val_End + N × 0.10
7. Testing-Data ← time-series[Val_End . . . Test_End]
8. Return Train-Data, Validation-Data, Testing-Data
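The data preparation described above, Z-score normalization (Equation (12)) followed by the chronological split of Algorithm 1, might look as follows in Python. The function names are hypothetical, the use of the population standard deviation is an assumption, and the fractions default to the 65%/15%/10% stated in Algorithm 1; the text does not say how the remaining 10% of the series is used, so the sketch leaves it unassigned.

```python
import statistics
from typing import List, Sequence, Tuple


def zscore(values: Sequence[float]) -> List[float]:
    """Standardize values: p_n = (p - mean(p)) / std(p)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population std (an assumption)
    if std == 0:
        raise ValueError("cannot standardize a constant series")
    return [(v - mean) / std for v in values]


def chronological_split(
    series: Sequence,
    train_frac: float = 0.65,
    val_frac: float = 0.15,
    test_frac: float = 0.10,
) -> Tuple[Sequence, Sequence, Sequence]:
    """Split a time series into consecutive train/validation/test blocks."""
    n = len(series)
    train_end = round(n * train_frac)
    val_end = train_end + round(n * val_frac)
    test_end = val_end + round(n * test_frac)
    # Order is preserved: no shuffling across the split boundaries.
    return series[:train_end], series[train_end:val_end], series[val_end:test_end]
```

Keeping the blocks in time order prevents later observations from leaking into training. In practice the mean and standard deviation are often computed on the training block only and then reused for the validation and test blocks.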
4. Implementation Detail
The electricity load datasets were taken from two data sources: Independent System
Operator New England (ISO NE) [27] and New York Independent System Operator (NYISO) [28].
ISO NE operates the generation and distribution system for New England. ISO NE generates
and distributes nearly 30,000 MW of electrical energy every day. About USD 10 million of
business is transacted per annum by a total of 400 electricity consumers in the ISO NE market.
The data consist of the hourly system load limits of the ISO NE zones and the adjusting
capacity clearance value of 21 states in the USA for 8 years, from January 2011 to March 2018.
The dataset contains about 63,224 records. New York Independent System Operator is a
nonprofit organization that operates New York's electricity grid and is in charge
of the state's wholesale energy markets. The data collected from New York
Independent System Operator comprise the hourly consumption and price. The dataset
contains 13 years’ worth of data, from January 2006 to October 2018, and has a total
of 112,300 records.
To train the model, we used the minibatch method. The minibatch approach divides
the data into many batches and updates the parameters for each batch individually. It
avoids the large amount of computation incurred by the traditional training strategy of
traversing the whole dataset for every update. In batch gradient descent, each gradient
step must be computed over the entire training set as a single batch. In contrast, minibatch
gradient descent splits the dataset into many small subsets called minibatches. The training
inputs X and targets Y are shuffled synchronously, which ensures that samples are assigned
to minibatches at random. The shuffled data is then divided into several smaller batches.
The size of each minibatch is usually a power of two (64, 128, 256, 512, 1024, etc.). Because
the weights are updated on each minibatch gradient, the minibatch approach converges
relatively quickly but introduces noise into each gradient update. The Adam optimizer is
used to mitigate this disadvantage. The Adam algorithm differs from the classic gradient
descent algorithm, which maintains a single learning rate for updating all weights and keeps
it constant throughout training. Adam computes first-moment and second raw-moment estimates
of the gradient and uses them to build an independent adaptive learning rate for each
parameter, which may change throughout the training process.
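The two ingredients described above, synchronized shuffling into minibatches and the Adam update, can be sketched as follows. The function names are hypothetical, the update is shown for a single scalar parameter for readability, and the default hyperparameters (learning rate 0.001, decay rates 0.9 and 0.999, epsilon 1e-8) are commonly used Adam defaults rather than values reported in the paper.

```python
import math
import random
from typing import Iterator, List, Sequence, Tuple


def shuffled_minibatches(
    xs: Sequence, ys: Sequence, batch_size: int, rng: random.Random
) -> Iterator[Tuple[List, List]]:
    """Shuffle X and Y with one shared permutation, then cut into minibatches."""
    order = list(range(len(xs)))
    rng.shuffle(order)  # the same order is applied to both xs and ys
    for start in range(0, len(order), batch_size):
        idx = order[start:start + batch_size]
        yield [xs[i] for i in idx], [ys[i] for i in idx]


def adam_step(
    param: float,
    grad: float,
    m: float,
    v: float,
    t: int,
    lr: float = 0.001,
    beta1: float = 0.9,
    beta2: float = 0.999,
    eps: float = 1e-8,
) -> Tuple[float, float, float]:
    """One Adam update for a scalar parameter; t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment estimate
    v = beta2 * v + (1 - beta2) * grad * grad   # second raw-moment estimate
    m_hat = m / (1 - beta1 ** t)                # bias correction
    v_hat = v / (1 - beta2 ** t)
    return param - lr * m_hat / (math.sqrt(v_hat) + eps), m, v
```

Each parameter keeps its own `m` and `v`, so its effective step size adapts to the history of its own gradients; this is the per-parameter adaptive learning rate mentioned above.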
Figure 4. Actual data values of four different cells.
Then, the four-block LSTM model was trained on the extracted features. To check the
model's efficiency, we visualized the learning curves for four different runs. A learning
curve shows the model's performance on training and testing data over several epochs, and
indicates whether the model is picking up new information from the data or simply
memorizing it. When the training and testing errors are both high and convergence is fast,
because of a high learning rate and high bias, the learning curve is skewed and the model
does not learn from its errors. Similarly, a large gap between the training and testing
errors indicates high variance. In both cases, the model generalizes poorly. When the test
error increases while the training error decreases, the model is overfitting: it is
memorizing, but not learning, and it is impossible to generalize from it. Applying dropout
and early stopping can avoid overfitting. For the proposed model, however, the
testing/validation error gradually diminishes alongside the training error for the electricity
grids, as shown in Figure 5. Our model handled overfitting issues quite well.
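The early termination mentioned above is commonly implemented as a patience rule on the validation loss. The sketch below is one such rule, not the authors' procedure; the function name and the `patience` and `min_delta` parameters are hypothetical, since the paper does not report its stopping criterion.

```python
from typing import Sequence, Tuple


def early_stopping_epoch(
    val_losses: Sequence[float], patience: int = 3, min_delta: float = 0.0
) -> Tuple[int, int]:
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs (epochs are 0-based).
    """
    best_loss = float("inf")
    best_epoch = -1
    waited = 0
    for epoch, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            best_loss, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch
```

A validation loss that keeps falling, as reported for the proposed model in Figure 5, never triggers the rule, whereas a loss that turns upward while the training loss keeps falling stops training and points back to the best epoch.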
Figure 5. C3-enabled CNN LSTM model loss.
Table 2 presents the first three epochs of each four runs: their time of execution, loss,
and accuracy. Meanwhile, while testing the C3-enabled CNN-LSTM model, the model
achieved 98.3% accuracy, as shown in Figure 6, while Figure 7 shows the ROC curve of
the model. We depicted the predicted loads of four grids’ data in Figure 8a–d for the four
grids, respectively, which shows the complete load forecasting of our model. From the
figure, it can be seen that the model is performing better and efficiently. In Figure 8, blue
lines represent the actual value of the load, the yellow line shows the prediction on training
data, and green lines show load forecasting. It is noticeable from the presented graphs that
the proposed C3-enabled CNN-LSTM model can capture nonlinear behavior from the past
data and, on this learned behavior, it can forecast the load very efficiently.
Table 3 provides the comparison of the proposed model with existing models in terms
of MAPE. We show this table for one grid station. The table lists the numerical findings
of benchmark models such as LSTM [29], CNN-LSTM [30], Bi-LSTM [31], and our proposed
model. It also shows the day-ahead forecasted load based on the proposed model. Our
model has a MAPE of 0.4560%, while the Bi-LSTM model has a MAPE of
2.5397%, the CNN-LSTM model has a MAPE of 2.3123%, and the LSTM model has a
MAPE of 4.3664%. Compared to these state-of-the-art models, our proposed model
shows a lower MAPE, which means its results are more accurate. In terms
of accuracy, CNN-LSTM-projected load forecasting outperforms LSTM, while Bi-LSTM
outperforms CNN-LSTM. The CNN-LSTM model employed RMSprop for optimization,
but the Bi-LSTM model utilized DEA, which improves prediction accuracy by decreasing
error. This higher precision is achieved at the expense of greater execution time. Due
to the inclusion of the C3-based CNN model for feature selection and an RMSprop-based
optimization module in the LSTM framework, the proposed C3-enabled CNN-LSTM model
outperforms the Bi-LSTM, CNN-LSTM, and LSTM models. Table 3 shows the statistical results
of our model and state-of-the-art models for a single power-grid station in terms of MAPE.
Based on the findings and discussion, we conclude that the proposed C3-enabled CNN-LSTM
outperforms the benchmark models. In terms of MAPE, the average result
for a power grid is 0.4560%, which is lower than that of the benchmark models.
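For reference, the two error measures used in this evaluation, MAPE and RMSE, can be computed as in the sketch below. The function names are illustrative, and the zero check reflects the fact that MAPE is undefined when an actual value is zero.

```python
import math
from typing import Sequence


def mape(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Mean absolute percentage error, expressed in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    errors = [abs((a - f) / a) for a, f in zip(actual, forecast)]
    return 100.0 * sum(errors) / len(errors)


def rmse(actual: Sequence[float], forecast: Sequence[float]) -> float:
    """Root mean square error, in the units of the load itself."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))
```

MAPE is scale-free, which makes values such as 0.4560% and 4.3664% directly comparable across grids with different load levels, whereas RMSE stays in the units of the load and penalizes large individual errors more heavily.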
Figure 6. C3-enabled CNN LSTM model accuracy.
Figure 7. ROC curve of proposed model.
Figure 8. (a–d) Load forecasting results of proposed C3-CNN-LSTM model on four different grids.
Table 2. Training time, loss, and accuracy of first three epochs of each run.
Epochs    Training Time    Loss      Accuracy
First Run
1/50      2 s              0.3183    0.6817
2/50      0.03 s           0.2153    1.7847
3/50      0.05 s           0.1411    0.8511
Second Run
The comparison of the proposed model with respect to time execution with benchmark
models is depicted in Figure 9. Sometimes the accuracy of the Bi-LSTM model increased
because of the DEA optimization algorithm, but it came at the cost of a higher execution
time due to its greedy nature. From Figure 8, it is obvious our model becomes more accurate
with lower execution times. The reason behind the low execution time is the C3 block,
which reduces the convolutional layers to only three. There are two main reasons for the low
execution time of the C3-enabled CNN-LSTM model, i.e., the cross-channel-communication
block, which enables channel communication within a layer, and the use of the ReLu
activation function instead of sigmoid in the LSTM model.
Table 3. Comparison of proposed model with other state of the art algorithms using mean absolute
percentage error.
We present a scalability analysis in Figure 10. This analysis allows us to assess
whether the proposed C3-enabled CNN-LSTM model is scalable to huge datasets
or to the other scenarios described here. We changed the input sample and the bias of the
model, changed some weights, tried some different features, and then analysed the model
performance. In the scenario where we changed the weights of the model but the input remained
constant, the proposed model was not affected. Figure 9 shows the impact of these factors
on the execution time of the models. We compared our model's execution time with that of other
benchmark models in this scenario. This analysis shows that, even in this scenario, our
model outperforms the benchmarks and shows a lower execution time because of the inclusion
of the C3 block. Figure 11 shows the comparison of the proposed model with other hybrid models in
terms of MAPE. The MAPE of the proposed
model is lower than that of WTNNEA [32], WGMIPSO [33] and another hybrid model [34]. This
result demonstrates that the proposed model outperforms these hybrid models.
Figure 9. Comparative analysis of proposed model with respect to time execution. (a–d) shows the
execution time of four different test results respectively for four grids.
Figure 10. Scalability analysis of our model with other state-of-the-art models.
Figure 11. Comparison of proposed model with other hybrid models in terms of MAPE: WTNNEA [32],
WGMIPSO [33] and another hybrid model [34].
6. Conclusions
Accurate electric-load forecasting is critical for decision making and system function-
ing in electricity power grids. With efficient forecasting of load demand, operators may
create an ideal market strategy to maximize the economic benefits of energy management.
In this manuscript, a hybrid C3-enabled CNN-LSTM model for load forecasting is pro-
posed. The proposed model contains three parts, i.e., convolutional layers, a C3 block
and LSTM layers. The convolutional layers and C3 block worked to extract the important
features from the load data and LSTM layers were used to predict the load. Two different
datasets of electricity load were used, named as NYISO and ISO NE. In the model, ReLu
functions were used as activation functions. The presented experiments show that the
proposed model gained 98.3% accuracy in prediction. The proposed model is compared
with other state-of-the-art methods, i.e., LSTM, CNN-LSTM, and Bi-LSTM based on MAPE
and execution time. The proposed model showed a 0.4560% error rate while LSTM showed
4.3664%, CNN-LSTM showed 2.3123%, and Bi-LSTM showed 2.5397%. As the proposed
model used a C3 block inside the CNN network, making the model shallow, the execution
time of the proposed model is comparatively less than other benchmark models.
Author Contributions: F.S. performed the conceptualization, prepared the methodology, and performed the
experiments and validation of the model, while A.P. prepared the first draft, completed the writing
process and carried out the formal analysis. H.S. supervised the work and provided all the resources.
All authors have read and agreed to the published version of the manuscript.
Funding: This article received no external funding other than from the NRF.
Acknowledgments: This research is supported by the National Research Foundation of Korea, through a grant
funded by the Korean Government (MSIP, South Korea) (Number: 2020R1C1C1007127).
Conflicts of Interest: The authors declare no conflict of interest.
Nomenclature
BP Back propagation
DR Demand response
LSTM Long short-term memory
MAPE Mean absolute percentage error
RMSE Root mean square error
SVM Support vector machine
ARIMA Auto-regressive integrated moving average
BPNN BP neural network
ELM Extreme learning machine
HEMS Home energy management system
RNN Recurrent neural network
WNN Wavelet neural network
ReLu Rectified linear unit
SG Smart grid
References
1. Colak, I.; Fulli, G.; Sagiroglu, S.; Yesilbudak, M.; Covrig, C.F. Smart grid projects in Europe: Current status, maturity and future
scenarios. Appl. Energy 2015, 152, 58–70. [CrossRef]
2. Jian, L.; Zheng, Y.; Xiao, X.; Chan, C.C. Optimal scheduling for vehicle-to-grid operation with stochastic connection of plug-in
electric vehicles to smart grid. Appl. Energy 2015, 146, 150–161. [CrossRef]
3. Yu, M.; Hong, S.H. Supply–demand balancing for power management in smart grid: A Stackelberg game approach. Appl. Energy
2016, 164, 702–710. [CrossRef]
4. Siano, P. Demand response and smart grids—A survey. Renew. Sustain. Energy Rev. 2014, 30, 461–478. [CrossRef]
5. IEEE Xplore. Energy Efficient Integration of Renewable Energy Sources in the Smart Grid for Demand Side Management. IEEE J.
Mag. Available online: https://ieeexplore.ieee.org/abstract/document/8443332 (accessed on 15 February 2022).
6. Xiao, L.; Shao, W.; Wang, C.; Zhang, K.; Lu, H. Research and application of a hybrid model based on multi-objective optimization
for electrical load forecasting. Appl. Energy 2016, 180, 213–233. [CrossRef]
7. Alahakoon, D.; Yu, X. Smart Electricity Meter Data Intelligence for Future Energy Systems: A Survey. IEEE Trans. Ind. Inform.
2016, 12, 425–436. [CrossRef]
8. Hernandez, L.; Baladron, C.; Aguiar, J.M.; Carro, B.; Sanchez-Esguevillas, A.J.; Lloret, J.; Massana, J. A survey on electric power
demand forecasting: Future trends in smart grids, microgrids and smart buildings. IEEE Commun. Surv. Tutor. 2014, 16, 1460–1495.
[CrossRef]
9. Rahman, A.; Srikumar, V.; Smith, A.D. Predicting electricity consumption for commercial and residential buildings using deep
recurrent neural networks. Appl. Energy 2018, 212, 372–385. [CrossRef]
10. Saeed, F.; Paul, A.; Ahmed, M.J.; Gul, M.J.J.; Hong, W.H.; Seo, H. Intelligent implementation of residential demand response
using multiagent system and deep neural networks. Concurr. Comput. Pract. Exp. 2021, 33, e6168. [CrossRef]
11. Tavassoli-Hojati, Z.; Ghaderi, S.F.; Iranmanesh, H.; Hilber, P.; Shayesteh, E. A self-partitioning local neuro fuzzy model for
short-term load forecasting in smart grids. Energy 2020, 199, 117514. [CrossRef]
12. Li, L.; Ota, K.; Dong, M. When weather matters: IoT-based electrical load forecasting for smart grid. IEEE Commun. Mag. 2017, 55,
46–51. [CrossRef]
13. Ayub, N.; Javaid, N.; Mujeeb, S.; Zahid, M.; Khan, W.Z.; Khattak, M.U. Electricity Load Forecasting in Smart Grids Using Support
Vector Machine. Adv. Intell. Syst. Comput. 2019, 926, 1–13.
14. Khan, Z.A.; Jayaweera, D. Approach for forecasting smart customer demand with significant energy demand variability. In
Proceedings of the IEEE International Conference on Power, Energy & Smart Grid (ICPESG), Mirpur Azad Kashmir, Pakistan,
9–10 April 2018; pp. 1–5. [CrossRef]
15. Usman, M.; Ali Khan, Z.; Khan, I.U.; Javaid, S.; Javaid, N. Data Analytics for Short Term Price and Load Forecasting in Smart
Grids using Enhanced Recurrent Neural Network. In Proceedings of the Emerging Technologies Blockchain and IoT: ITT
2019—Information Technology Trends, Ras Al Khaimah, United Arab Emirates, 20–21 November 2019; pp. 84–88. [CrossRef]
16. Dahl, M.; Brun, A.; Kirsebom, O.S.; Andresen, G.B. Improving Short-Term Heat Load Forecasts with Calendar and Holiday Data.
Energies 2018, 11, 1678. [CrossRef]
17. Wang, G.; Wang, X.; Wang, Z.; Ma, C.; Song, Z. A VMD–CISSA–LSSVM Based Electricity Load Forecasting Model. Mathematics
2021, 10, 28. [CrossRef]
18. Ali, M.; Adnan, M.; Tariq, M. Optimum control strategies for short term load forecasting in smart grids. Int. J. Electr. Power Energy
Syst. 2019, 113, 792–806. [CrossRef]
19. Syed, D.; Refaat, S.S.; Abu-Rub, H. Performance evaluation of distributed machine learning for load forecasting in smart grids.
In Proceedings of the 2020 Cybernetics & Informatics (K&I), Velke Karlovice, Czech Republic, 29 January–1 February 2020.
[CrossRef]
20. Yang, Y.; Hong, W.; Li, S. Deep ensemble learning based probabilistic load forecasting in smart grids. Energy 2019, 189, 116324.
[CrossRef]
21. Yang, J.; Ren, Z.; Gan, C.; Zhu, H.; Parikh, D. Cross-channel Communication Networks. Adv. Neural Inf. Process. Syst. 2019, 32.
22. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the 5th International
Conference on Learning Representations, ICLR 2017, Toulon, France, 24–26 April 2017. Available online:
https://arxiv.org/abs/1609.02907v4 (accessed on 15 February 2022).
23. Veličković, P.; Casanova, A.; Liò, P.; Cucurull, G.; Romero, A.; Bengio, Y. Graph Attention Networks. In Proceedings of the 6th
International Conference on Learning Representations, ICLR 2018, Vancouver, BC, Canada, 30 April–3 May 2018. Available
online: https://arxiv.org/abs/1710.10903v3 (accessed on 15 February 2022).
24. Yang, J.; Lu, J.; Lee, S.; Batra, D.; Parikh, D. Graph R-CNN for Scene Graph Generation. ECCV 2018, 670–685. Available
online: https://openaccess.thecvf.com/content_ECCV_2018/papers/Jianwei_Yang_Graph_R-CNN_for_ECCV_2018_paper.pdf
(accessed on 15 February 2022).
25. Wang, X.; Girshick, R.; Gupta, A.; He, K. Non-Local Neural Networks. CVPR 2018, 7794–7803. Available online:
https://openaccess.thecvf.com/content_cvpr_2018/papers/Wang_Non-Local_Neural_Networks_CVPR_2018_paper.pdf
(accessed on 15 February 2022).
26. Xu, B.; Wang, N.; Kong, H.; Chen, T.; Li, M. Empirical Evaluation of Rectified Activations in Convolutional Network. Available
online: https://arxiv.org/abs/1505.00853v2 (accessed on 15 February 2022).
27. ISO_NE_Network Electricity Market Data. Available online: https://www.iso-ne.com/isoexpress/web/reports/pricing (accessed
on 23 January 2022).
28. NYISO. NYISO Market Operation Data. 2019. Available online:
http://www.nyiso.com/public/markets_operations/market_data/custom_report (accessed on 23 January 2022).
29. Kong, W.; Dong, Z.Y.; Jia, Y.; Hill, D.J.; Xu, Y.; Zhang, Y. Short-Term Residential Load Forecasting Based on LSTM Recurrent
Neural Network. IEEE Trans. Smart Grid 2019, 10, 841–851. [CrossRef]
30. Alhussein, M.; Aurangzeb, K.; Haider, S.I. Hybrid CNN-LSTM Model for Short-Term Individual Household Load Forecasting.
IEEE Access 2020, 8, 180544–180557. [CrossRef]
31. Gul, M.J.; Urfa, G.M.; Paul, A.; Moon, J.; Rho, S.; Hwang, E. Mid-term electricity load prediction using CNN and Bi-LSTM.
J. Supercomput. 2021, 77, 10942–10958. [CrossRef]
32. Amjady, N.; Keynia, F. Short-term load forecasting of power systems by combination of wavelet transform and neuro-evolutionary
algorithm. Energy 2009, 34, 46–57. [CrossRef]
33. Li, S.; Goel, L.; Wang, P. An ensemble approach for short-term load forecasting by extreme learning machine. Appl. Energy 2016,
170, 22–29. [CrossRef]
34. Zhang, J.; Wei, Y.M.; Li, D.; Tan, Z.; Zhou, J. Short term electricity load forecasting using a hybrid model. Energy 2018, 158,
774–781. [CrossRef]