Digital Object Identifier 10.1109/ACCESS.2020.3028281

Hybrid CNN-LSTM Model for Short-Term Individual Household Load Forecasting
Musaed Alhussein1, Khursheed Aurangzeb1, and Syed Irtaza Haider1
1 Computer Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh 11543, Saudi Arabia

Corresponding author: Khursheed Aurangzeb (e-mail: kaurangzeb@ksu.edu.sa).


The authors extend their appreciation to the Deputyship for Research & Innovation, "Ministry of Education" in Saudi Arabia for funding this research work through the project number IFKSURG-1438-034.

ABSTRACT Power grids are transforming into flexible, smart, and cooperative systems with greater
dissemination of distributed energy resources, advanced metering infrastructure, and advanced
communication technologies. Short-term electric load forecasting for individual residential customers plays
a progressively crucial role in the operation and planning of future grids. Compared to the aggregated
electrical load at the community level, the prediction of individual household electric loads is legitimately
challenging because of the high uncertainty and volatility involved. Results from previous studies show that
prediction using machine learning and deep learning models is far from accurate, and there is still room for
improvement. We herein propose a deep learning framework based on a combination of a convolutional
neural network (CNN) and long short-term memory (LSTM). The proposed hybrid CNN-LSTM model uses
CNN layers for feature extraction from the input data with LSTM layers for sequence learning. The
performance of our developed framework is comprehensively compared to state-of-the-art systems currently
in use for short-term individual household electric load forecasting. The proposed model achieved
significantly better results compared to other competing techniques. We evaluated our proposed model against a recently explored LSTM-based deep learning model on publicly available electrical load data of
individual household customers from the Smart Grid Smart City (SGSC) project. We obtained an average
mean absolute percentage error (MAPE) of 40.38% for individual household electric load forecasts in
comparison with the LSTM-based model that obtained an average MAPE of 44.06%. Furthermore, we
evaluated the effectiveness of the proposed model on different time horizons (up to 3 h ahead). Compared to
the recently developed LSTM-based model tested on the same dataset, we obtained 4.01%, 4.76%, and 5.98%
improvement for one, two, and six look-forward time steps, respectively (with 2 lookback time steps).
Additionally, we performed a clustering analysis based on the power consumption behavior of the energy
users, which indicates that the prediction accuracy could be improved by grouping similar users and training
a representative model on a larger amount of data. The results indicated that the proposed model outperforms
the LSTM-based model for both 1 h ahead and 3 h ahead forecasting of individual household electric loads.

INDEX TERMS CNN; deep learning framework; energy consumption; energy consumption forecasting;
Individual household; LSTM.

I. INTRODUCTION
Short-term electric load forecasting is a vital part of the energy sector since it concerns the forecasting of power consumption in the subsequent few hours. The accurate prediction of the load can significantly support the operations, maintenance, and management of the power system. Energy cannot be stored in considerable quantities, which means that there must be a fair balance between generation and demand [1]. For the accurate and effective scheduling of power operations, a precise prediction of the power generated and the power load is critical. Consequently, it is crucial to design and develop energy scheduling plans for efficient usage of the available power. Moreover, forecasting errors have a considerable influence on the safety check, the dynamic state estimation, and the power load dispatching of the power grid [2], [3]. To support both system operability and planning, power distribution companies rely on accurate forecasts of generation and consumption over different time horizons.

The integration of information and communication technology (ICT) and advanced metering infrastructure (AMI) into the traditional power grid (TPG) results in its transformation into a smart grid (SG) that enables bi-directional communication between consumers and the utility. By integrating ICT in power grids, it is possible to monitor and optimize power generation, power distribution, and power consumption. Owing to intelligent techniques and ICT, the SG empowers its consumers with reliable, economical, sustainable, secure, and efficient energy. Demand-side management (DSM) technology applied in SGs enables efficient load utilization by shifting end-customer load from peak hours to off-peak hours, helping both in cost reduction and in energy management of the power grids. The opportunity of a two-way communication flow between the utility and consumers enables the optimization of energy consumption, which helps in refining the management and operation of the power system [4]–[8].
The forecast of energy consumption over time allows individual customers to assess their consumption habits and, whenever possible, to shift their energy use to off-peak periods. Accurate prediction of energy consumption provides energy users with the opportunity of relating their current usage pattern to the future expense of their energy. Consequently, these users might take advantage of the forecasting algorithms through awareness of their energy consumption and future projections, and they might be able to manage the expenses of their energy usage more efficiently. Energy customers play an important role in smart grid demand response and can be divided into three categories: the residential, business, and industrial sectors. The residential sector consumes a significant quantity of the total generated energy. The AMI installed in the residential sector is highly helpful in forecasting the short-term power load of the end energy customers [9].
In the past, statistical and machine learning models have been developed for predicting energy generation through renewable resources as well as for aggregated load forecasting. These learning models are established on time-series analyses and can be termed data-driven models [10], [11].
Several approaches have been reported in the literature to address short-term electric load forecasting. Very few of them, however, addressed individual households. Recently, a deep learning model based on long short-term memory (LSTM) was developed for short-term individual electric load forecasting [12]. Their proposed model outperforms some well-known machine-learning methods.
LSTM networks and CNNs are probably the most widely used deep learning techniques. The main idea of utilizing such models on time-series data is that LSTM networks are able to capture the sequence pattern information, while CNN models are useful in extracting valuable features and may filter out the noise of the input data. However, although LSTM networks are designed to work with temporal correlations, they utilize only the attributes provided in the training set; in contrast, although CNNs can extract local trend patterns as well as the same pattern appearing in different regions of time-series data, they are not usually adapted to long temporal dependencies. Therefore, a hybrid model that exploits the benefits of both deep learning techniques could improve the forecasting accuracy.
In this paper, we propose a CNN-LSTM model that utilizes the ability of convolution layers to learn the internal representation of time-series data and obtain the important attributes, as well as the usefulness of LSTM layers to identify short-term and long-term dependencies. The proposed model was developed and evaluated on real-world load consumption data of various individual households from the Smart Grid Smart City (SGSC) project funded by the Australian Government [13]. We compared the performance of the proposed model with existing state-of-the-art methods in individual household load forecasting. The effectiveness of the proposed model was further evaluated for various time horizons. The results show that the exploitation of convolutional layers along with LSTM layers could provide a significant improvement in the accuracy of individual household load forecasting.
The major contributions of this paper are: (1) developing a hybrid CNN-LSTM model, which can exploit the benefits of convolutional layers and LSTM layers; (2) illustrating the effectiveness of the proposed model in individual household load forecasting in comparison with existing state-of-the-art methods; (3) validating the efficacy of the proposed model for various time horizons; and (4) investigating the clustering behavior by grouping customers with structural similarity in their load profiles.
The remainder of the paper is organized in the following way. Section II provides a literature review of short-term load forecasting. Data analysis and problem formulation are provided in Section III. The proposed CNN-LSTM model is presented in Section IV. Finally, the results and discussions are elaborated in Section V.

II. RELATED WORK
A lot of research work has been conducted in the field of short-term power load forecasting. Previously, conventional statistical analysis techniques were used for such time-series analyses. Recently, with the enormous progress in the fields of artificial intelligence and machine/deep learning, researchers have developed various deep learning models for load forecasting problems.
Artificial neural networks (ANNs) have been effectively used for short-term load forecasting at an industrial scale due to their nonlinear mapping attributes [14]. The main issue with ANN-based forecasting models is that these models can easily get stuck in local minima, which causes poor generalization. Moreover, the forecasting frameworks based on ANN models can be over-fitted, and their convergence rate is slow [15].

The forecasting problems can also be solved by applying other machine-learning models such as generalized regression neural networks (GRNNs) [16], support vector machines (SVMs) [17], extreme learning machine neural networks (ELMNNs) [18], and kernel-based support vector quantile regression [19]. The prediction accuracy of the ELMNN is intensely reliant on the applied activation function; an unsystematically chosen activation function will cause poor generalization [20]. Moreover, it is not suitable for prediction problems that require deep extraction of features, as it cannot encode a sequence of layers (it can encode one layer only). The GRNN model is computationally much more complex, which makes it inappropriate for such forecasting issues [21]. The attributes of all of these models, including large memory space requirements, high computational complexity, and the need for an optimal choice of kernel in SVM-based models, make them inappropriate for such forecasting problems [22].
Most of the developed models for short-term load forecasting focus on aggregated load forecasting [23]-[30]. For supporting future smart grid applications, short-term power load forecasting of individual energy customers is gaining increasing interest, a subject that has been targeted by very few researchers in the recent past.
The authors in [31] considered the functional time-series approach to examine individual household load forecasting. Their evaluation is based on the root mean square error (RMSE). In [32], the Kalman filter is used to estimate the load of the individual household for various time horizons and sampling periods. They argued that the chosen sampling rate provides a compromise between accuracy and computational complexity.
The authors in [33] applied SVM and ANN methods on high-resolution data collected over thirty days from three houses. They obtained considerable improvements in the mean absolute error (4%–33%). In [34], the authors proposed an approach based on the activity sequence and support vector regression. They concluded that the activity sequence is an impelling factor that could enhance the accuracy of individual household load forecasting for a time horizon of fifteen minutes ahead.
The authors in [35] explored several forecasting models, such as neural networks, ARIMA, and exponential smoothing, for horizons ranging from 15 min to 24 h. They evaluated the developed models using two data sets. One dataset was from six households in the United States, while the other was from a single household in Germany. They obtained an average mean absolute percentage error (MAPE) of 85% and 30% for the data sets from the United States and Germany, respectively.
In [36], the authors applied several different models, including SVM, classification and regression trees, and multilayer perceptron neural networks. They concluded that a combination of household behavioral data and historical electricity usage data from individual households could significantly improve the forecasting accuracy. They achieved a MAPE of 51% and 48% for the neural networks and SVM, respectively. In our previous works [37]-[40], we have developed several efficient models for power load forecasting.
In [12], the authors proposed a deep learning model based on LSTM for short-term residential load forecasting. They compared their model with state-of-the-art machine learning models as well as empirical models. Their proposed model outperformed all the rival techniques and achieved an average MAPE of 44.06% for short-term residential load forecasting of 69 customers.
Based on these recent explorations, there is a vibrant and increasingly clear research tendency that looks at challenges related to behavioral and other factors that have an impact on the energy consumption of the individual household. The motivation is to observe and obtain feedback on the power usage patterns of each particular household energy user and to determine important fundamental relations between contextual factors such as time of use, day of the week (weekday or weekend), season, etc. It is anticipated that the insights from such explorations may enhance the understanding and awareness of household energy consumption, which will lead to better usage of electricity.
Our proposed method not only outperforms [12] for next-time-step forecasting but also achieves much better results for load forecasting of up to 3 h ahead. In [12], the authors forecasted the power load value for the next 30 minutes (one step ahead). In our analysis, we included simulations for up to the next three hours. Additionally, we performed a clustering analysis based on the power consumption behavior of the energy users, in order to analyze whether the prediction accuracy could be improved by grouping users of similar energy profiles.

III. DATA ANALYSIS AND PROBLEM FORMULATION
Individual household load forecasting is quite challenging because the hourly consumption of electricity depends on several variable factors, such as the number of persons living in each household, the number of major appliances running at a particular time, weather conditions, economics, lifestyle, and daily routines. An individual household load can lack a stable pattern and fluctuate even in consecutive hours. On the contrary, forecasting the aggregated power load at the community or utility level is comparatively easy [40]. The diversity in the aggregated power demand smooths the daily power load shapes, which makes the relative forecasting errors quite low in terms of MAPE.
In this work, we have used the data gathered during the SGSC project initiated by the Australian Government [13]. The SGSC collected the power consumption data of about 10,000 customers in Australia. Short-term individual household electric load forecasting is one of the research areas that can utilize the data gathered during this project.
Our aim in this work is to forecast the power demand of a group of general individual customers. Therefore, it was unrealistic to consider all the customers available in the SGSC database.

For the demonstration of the proposed method and to make a one-to-one comparison with [12], we selected a subset of the SGSC dataset: the customers who owned a hot water system. Based on this selection criterion, we separated a reasonably sized subset of data, which corresponded to 69 customers.
Below, we present a brief analysis of the power load profiles of some of the individual households. For evaluating the regularity in daily power consumption profiles, we applied the well-known density-based clustering technique called 'density-based spatial clustering of applications with noise' (DBSCAN) [41]. The advantage of using DBSCAN for regularity analysis of a power consumption profile is that it does not need the number of clusters to be specified for the dataset. Additionally, it identifies outliers in the dataset. Generally, the power consumption behavior of the customers is repeated during weekdays, which makes DBSCAN an ideal clustering technique for identifying outliers in a dataset of daily power consumption profiles. If the outcome of DBSCAN shows a low number of outliers, it means that the regularity in the power consumption behavior is high.
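As an illustration, such a regularity analysis can be carried out with scikit-learn's DBSCAN on the 92 daily half-hourly profiles of one customer. The sketch below is only indicative: the eps and min_samples values and the synthetic data are assumptions for the example, not the settings used in the paper.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.preprocessing import StandardScaler

def profile_regularity(daily_profiles, eps=2.5, min_samples=4):
    """Cluster daily load curves and count outlier days.

    daily_profiles: array of shape (n_days, 48) with half-hourly
    consumption in kWh (92 days per customer in this study).
    Returns the number of clusters and the number of outlier days.
    """
    X = StandardScaler().fit_transform(daily_profiles)
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
    n_outliers = int(np.sum(labels == -1))
    return n_clusters, n_outliers

# Example with synthetic data: 92 days x 48 half-hour slots.
rng = np.random.default_rng(0)
profiles = rng.gamma(shape=2.0, scale=0.3, size=(92, 48))
print(profile_regularity(profiles))
```

A low outlier count from this procedure corresponds to a household with a regular daily consumption pattern, which is the interpretation used for Figures 1 and 2.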
Figure 1 shows the half-hourly power consumption profile of a randomly selected customer (customer ID 11462018) for the considered period of 92 days. It can be inferred from the figure that the behavior of this customer varies over the span of three months and forms one major cluster, one minor cluster, and some outliers.

FIGURE 1. The 92 daily curves of customer 11462018 grouped into one major cluster, one minor cluster, and five outliers.

Figure 2 shows the number of major and minor clusters, along with the number of outliers, in the daily power consumption profiles of a few randomly selected customers. The independent axis shows the customer identification (ID), while the dependent axis indicates the number of major/minor clusters and outliers. As seen in the figure, the number of major/minor clusters and outliers varies from customer to customer.
For instance, customer ID 8459427 has only one major cluster with no outliers, which signifies that the household has a regular power consumption pattern and could be easily predictable. On the contrary, customer ID 8282282 has all outliers with no clusters. This customer exhibits highly volatile power consumption behavior and is hence very difficult to predict accurately.

FIGURE 2. Number of major/minor clusters and outliers for a few randomly selected customers.

For some customers (e.g., customer ID 10509861), there exists more than one prominent pattern in the daily profiles. For such customers, it is difficult to apply the commonly used forecasting schemes that are based on features such as the time of the day and the day of the week.
The load profile for a randomly selected day, 5 June 2013, for an individual customer (customer ID 8198267), as well as the aggregated load of the 69 customers for the same day, are shown in Figure 3. It is evident from this figure that, compared to the power profile of an individual customer, the aggregated power profile of the 69 customers has smooth variations. The individual customer profile shows a peak in the evening between 07:00 p.m. and 07:30 p.m. and a second, smaller peak between 01:30 p.m. and 02:00 p.m. On the contrary, the aggregated power profile shows a distinct peak in the morning and a second distinct peak in the evening.

FIGURE 3. Electricity load profile across a 24 h period on 5 June 2013 based on data from [13]: (A) for an individual customer (customer ID 8198267), (B) aggregated for the 69 customers.

The above analysis signifies that load forecasting for an individual customer is challenging due to the abrupt variations in the daily profiles. For every individual customer, a deep learning model should be trained and evaluated on that customer's own set of data. This means that for 69 different customers, 69 different models should be trained and tested on their respective datasets.
The CNN and the LSTM are the most commonly used machine learning models. Our main purpose in designing the hybrid model of CNN and LSTM layers is to exploit their characteristics for developing an efficient model for load forecasting of the individual household. The individual household load is time-series data, for which we chose the LSTM layers because of their capability to extract the sequence pattern information as well as short-term and long-term dependencies. On the other hand, the CNN layers are employed due to their capability of extracting the valuable features embedded in the time-series data. Additionally, the CNN layers are helpful in filtering out the noise of the input data. Consequently, a hybrid model that exploits the benefits of both CNN and LSTM is expected to enhance the load forecasting accuracy for the individual household.

IV. THE PROPOSED HYBRID DEEP LEARNING FRAMEWORK
In this section, we discuss the attributes considered for the developed model. Next, we describe the architecture of the proposed CNN-LSTM model. The convolutional layers are used to extract the valuable features from the input data, while the LSTM layers are used to exploit short-term and long-term dependencies.
A. Feature Preparation
In this work, we study the energy data from the SGSC project. The raw dataset acquired using smart meters contains the half-hourly load consumption of residential customers measured in kilowatt-hours (kWh). We used the most commonly exploited features in the load forecasting literature to obtain the attributes from the data. The electricity load of residential customers can vary considerably across the different hours of a day as well as the various days of the week; therefore, features such as the hour of the day, the day of the week, and a holiday indicator are considered. The seasonal impact is excluded from the analysis as the data only corresponded to the winter season in Australia.
The input feature vector shown in Figure 4 is composed of the following attributes (a preparation sketch is given after the list):
1. The energy consumption sequence Ei for the past K time steps. The energy consumption data was first scaled using the Min-Max normalization technique.
2. The one-hot encoded hour indicator Ti, which indicates the time of the day for the past K time steps (ranges from 1 to 48).
3. The one-hot encoded day-of-week indicator Di for the past K time steps (range of values 0 to 6).
4. The one-hot encoded holiday indicator Hi for the past K time steps (which can be 0 or 1).

FIGURE 4. Data preparation steps.
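A minimal sketch of this feature preparation is shown below. It assumes a pandas DataFrame with a half-hourly 'kwh' column and a datetime index; the column name, the holiday set, and the helper name are illustrative assumptions rather than details from the paper.

```python
import numpy as np
import pandas as pd

def prepare_features(df, holidays, k=2):
    """Build the input vectors described in Section IV-A.

    df: DataFrame indexed by half-hourly timestamps with a 'kwh' column.
    holidays: set of datetime.date objects treated as holidays.
    k: number of lookback time steps.
    """
    # Min-Max normalization of the consumption sequence.
    e = df["kwh"].to_numpy(dtype=float)
    e_norm = (e - e.min()) / (e.max() - e.min() + 1e-12)

    # One-hot indicators: 48 half-hour slots, 7 weekdays, 2 holiday states.
    slot = df.index.hour * 2 + df.index.minute // 30          # 0..47
    hour_1h = np.eye(48)[slot]
    day_1h = np.eye(7)[df.index.dayofweek]
    is_holiday = np.array([d.date() in holidays for d in df.index], dtype=int)
    holiday_1h = np.eye(2)[is_holiday]

    # Per-time-step attribute vector: 1 + 48 + 7 + 2 = 58 values (see Table III).
    step_features = np.hstack([e_norm[:, None], hour_1h, day_1h, holiday_1h])

    # Stack the past k steps into one sample; the target is the next value.
    X = np.stack([step_features[i:i + k] for i in range(len(df) - k)])
    y = e_norm[k:]
    return X, y
```

The 58 attributes per time step correspond to the column-wise organization summarized later in Table III.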

B. Proposed Model Architecture
The architecture of the proposed CNN-LSTM based deep learning framework is shown in Figure 5. The CNN feature extraction block consists of three 1D convolutional layers. We incorporated a MaxPooling layer and a rectified linear unit (ReLU) layer between every two consecutive convolution layers. The convolutional operation is highly effective, and stacking several convolutional layers in a deep learning framework enables the initial layers to learn low-level features in the applied input. The feature map, which is the output of the convolutional layers, has the limitation that it keeps track of the precise location of the features in the input; small movements in the location of a feature in the input will lead to a different feature map. A pooling layer is usually added after the convolutional layer to mitigate this limitation by making the produced feature map more invariant to small shifts, whereas the activation function is applied to enhance the capability of the model to learn complex structures. In our developed model, we added a MaxPooling layer, which is a down-sampling scheme that reduces the spatial dimension of the feature maps by a factor of 2 and hence reduces the overall computational load. The ReLU activation function is resilient against the vanishing gradient problem and has been widely implemented by various researchers to make the network more trainable.
In the development of any deep learning model, a dropout layer offers an effective way to relieve the overfitting issue. This layer randomly selects neurons and deactivates some of them during the training process. In this work, we incorporated a dropout layer between the CNN feature extraction block and the LSTM sequence learning block to prevent overfitting. The output of the sequence learning block is connected to a dropout layer, followed by a fully connected layer to produce the final output.

FIGURE 5. Proposed deep CNN-LSTM framework.

It is a common practice to adopt a coarse-to-fine approach when developing a CNN model. This structure introduces higher computational complexity as it involves a large number of trainable parameters. We chose a pyramid architecture, as discussed in [43], when developing our CNN feature extraction block, where the number of kernels is large in the lower-level layers and is gradually decreased by a constant as we move to the higher-level layers. The configuration of the various layers of the proposed model is provided in Table I. We select 48 kernels for the first convolution layer, reduced to 32 and 16 for the second and third convolution layers. This type of structure avoids overfitting and reduces the number of trainable parameters.
In the sequence learning block, we used three LSTM layers with 20 neurons each. The return sequence is set to true for the first two LSTM layers so that the network outputs the full sequence of hidden states, whereas in the final LSTM layer the return sequence is set to false so that the network outputs the hidden state at the final time step. We used a dropout layer before the fully connected layer to avoid over-fitting. The fully connected layer has 20 neurons. The number of neurons in the output layer is varied from one to six for evaluating the different look-ahead horizons (up to 3 h ahead load forecasting).

TABLE I. CONFIGURATION OF THE VARIOUS LAYERS OF THE PROPOSED MODEL.
Convolution1: Kernels = 48, Size of Receptive Field = 3
MaxPooling + ReLU
Convolution2: Kernels = 32, Size of Receptive Field = 3
MaxPooling + ReLU
Convolution3: Kernels = 16, Size of Receptive Field = 3
MaxPooling + ReLU
Dropout1: rate = 0.25
LSTM1: Hidden Nodes = 20, Return Sequence = True
LSTM2: Hidden Nodes = 20, Return Sequence = True
LSTM3: Hidden Nodes = 20, Return Sequence = False
Dropout2: rate = 0.25
Fully Connected: Hidden Nodes = 20
Output: Hidden Nodes = 1/2/6
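A sketch of this architecture in Keras is given below. It follows Table I, but the padding mode, the dense-layer activation, the 12-step lookback, and the 58-attribute input width are assumptions made to keep the example self-contained and runnable; they are not stated in the paper.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(lookback=12, n_features=58, look_forward=1):
    """CNN-LSTM model following Table I (pyramid CNN + three LSTM layers)."""
    model = models.Sequential([
        # CNN feature extraction block (48 -> 32 -> 16 kernels, receptive field 3).
        layers.Conv1D(48, 3, padding="same",
                      input_shape=(lookback, n_features)),
        layers.MaxPooling1D(2),
        layers.ReLU(),
        layers.Conv1D(32, 3, padding="same"),
        layers.MaxPooling1D(2),
        layers.ReLU(),
        layers.Conv1D(16, 3, padding="same"),
        layers.MaxPooling1D(2),
        layers.ReLU(),
        layers.Dropout(0.25),
        # Sequence learning block.
        layers.LSTM(20, return_sequences=True),
        layers.LSTM(20, return_sequences=True),
        layers.LSTM(20, return_sequences=False),
        layers.Dropout(0.25),
        layers.Dense(20, activation="relu"),  # activation assumed
        # 1, 2, or 6 output neurons for 0.5 h, 1 h, or 3 h ahead forecasts.
        layers.Dense(look_forward),
    ])
    return model

model = build_cnn_lstm()
model.summary()
```

With a shorter lookback (e.g., two time steps), the three pooling stages would have to be adjusted or dropped; the paper does not specify how this is handled, so the sketch uses the 12-step configuration.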
The parameter settings of the developed deep learning framework are presented in Table II. In this work, we used the well-known 'Adam' optimizer and the mean absolute error as the loss function.

TABLE II. PARAMETER SETTINGS OF THE DEVELOPED MODEL.
Optimizer: Adam
Loss Function: Mean Absolute Error (MAE)
Learning Rate: 0.001
Learning Rate Adjustment: monitor = validation loss, patience = 10 epochs, adjustment factor = 0.8, minimum learning rate = 1e-5
Batch Size: 128
Epochs: 150

Figure 6 shows the training flow of the proposed deep learning model. The input data is split into 70% training, 20% validation, and 10% test data. We used the mean absolute error (MAE) as the loss function and monitored the validation loss. Initially, the training data and the validation data are loaded, and the training process is initialized. After completing each epoch, the validation loss is determined and checked to see whether it is decreasing. If the validation loss is decreasing, then the model is saved with the updated weights, and the epoch counter is incremented. However, if the validation loss has not decreased for ten consecutive epochs, then the learning rate is decreased, and the epoch counter is incremented. The training stops when the epoch count reaches 150. We load the last saved best model for prediction and evaluation on the test data to avoid overfitting.

FIGURE 6. Training flow of the proposed model.
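Under the settings of Table II, this training flow maps naturally onto standard Keras callbacks. The sketch below is an assumed implementation (the checkpoint file name and split helper are illustrative), not the authors' code.

```python
import tensorflow as tf

def train_model(model, X, y, batch_size=128, epochs=150):
    """Train with the Table II settings: Adam, MAE loss, learning-rate
    reduction on plateau, and checkpointing of the best model."""
    n = len(X)
    n_train, n_val = int(0.7 * n), int(0.2 * n)   # 70/20/10 split
    X_tr, y_tr = X[:n_train], y[:n_train]
    X_va, y_va = X[n_train:n_train + n_val], y[n_train:n_train + n_val]
    X_te, y_te = X[n_train + n_val:], y[n_train + n_val:]

    model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=0.001),
                  loss="mae")
    callbacks = [
        # Reduce the LR by factor 0.8 if the validation loss stalls for 10 epochs.
        tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.8,
                                             patience=10, min_lr=1e-5),
        # Keep the weights of the best epoch (lowest validation loss).
        tf.keras.callbacks.ModelCheckpoint("best_model.keras",
                                           monitor="val_loss",
                                           save_best_only=True),
    ]
    model.fit(X_tr, y_tr, validation_data=(X_va, y_va),
              batch_size=batch_size, epochs=epochs, callbacks=callbacks)

    # Evaluate the last saved best model on the held-out test data.
    best = tf.keras.models.load_model("best_model.keras")
    return best, best.evaluate(X_te, y_te, verbose=0)
```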

V. RESULTS AND DISCUSSIONS
In this study, we chose a pool of 69 customers out of thousands of customers' data. The data were retrieved from the SGSC project initiated by the Australian Government [13]. The 69 customers were chosen based on the criterion of energy users having a hot water system. Some customers had more than one year of data available, while some had only a few months of data available. We selected data starting from June (01 June 2013) until the end of August (31 August 2013), when data were available for all of the customers. We excluded the seasonal impact from our analysis as the data only corresponded to the winter season in Australia. For each customer, the data spanned 92 days. We partitioned the data into training, validation, and test data as 70%, 20%, and 10%, respectively.
The developed model was experimented with different configurations of lookback time steps, namely two, six, and twelve. In addition, the model was evaluated for different look-ahead/forward time steps, namely half an hour, one hour, and 3 h, corresponding to one, two, and six look-forward time steps (a sketch of how such samples and the MAPE metric can be computed is given below).
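As an illustration of these lookback/look-forward configurations and of the MAPE metric used throughout the results, a small helper is sketched below; it assumes an already normalized half-hourly series and is not taken from the paper.

```python
import numpy as np

def make_windows(series, lookback, look_forward):
    """Slice a 1-D load series into (lookback -> look_forward) samples.

    lookback in {2, 6, 12} and look_forward in {1, 2, 6} reproduce the
    configurations evaluated in Section V (0.5 h, 1 h, and 3 h ahead).
    """
    X, Y = [], []
    for i in range(len(series) - lookback - look_forward + 1):
        X.append(series[i:i + lookback])
        Y.append(series[i + lookback:i + lookback + look_forward])
    return np.array(X), np.array(Y)

def mape(y_true, y_pred, eps=1e-6):
    """Mean absolute percentage error in percent."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return 100.0 * np.mean(np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps))

# Example: half-hourly loads, 2-step lookback, 1-step look-forward.
load = np.array([0.4, 0.5, 0.3, 0.6, 0.7, 0.5, 0.4])
X, Y = make_windows(load, lookback=2, look_forward=1)
print(X.shape, Y.shape)            # (5, 2) (5, 1)
print(round(mape(Y, Y * 0.9), 2))  # 10.0
```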
The various attributes of the data, including the energy consumption value, the time-of-the-day indicator, the day-of-the-week indicator, and the holiday indicator, are organized column-wise. The numbering of the various attributes of the column-wise data is shown in Table III.

TABLE III. ATTRIBUTES OF THE DATA SET USED FOR TRAINING AND TESTING OF THE DEVELOPED MODEL.
Attribute No. | Description | Formula
1 | Energy consumption sequence | E
2 – 49 | Time of the day indicator (dummy variable) | Ti, i = 1, 2, ..., 48
50 – 56 | Day of the week indicator (dummy variable) | Di, i = 1, 2, ..., 7
57 – 58 | Holiday indicator (dummy variable) | Hi, i = 0 / 1

A. Single-Step Forecast
In this section, we investigate the comparison of various machine learning and deep learning models with the proposed model for a single-step forecast, i.e., predicting the value at the next time step.
In [12], the authors evaluated various machine learning models, including the ELM [27], KNN [28], [29], the backpropagation neural network (BPNN), and input selection combined with hybrid forecasting (IS-HF) [30], on the same dataset. The comparison of the MAPE results for the various machine learning models, including the deep learning model based on LSTM [12], with the proposed hybrid model for different lookback time steps is shown in Table IV.

TABLE IV. COMPARISON OF THE PROPOSED DEEP LEARNING FRAMEWORK WITH OTHER MODELS.
Method | Scenario (Lookback) | Average MAPE of Individual Forecasts (%)
CNN-LSTM | 2 time steps | 40.38
CNN-LSTM | 6 time steps | 41.07
CNN-LSTM | 12 time steps | 42.85
LSTM | 2 time steps | 44.39
LSTM | 6 time steps | 44.31
LSTM | 12 time steps | 44.06
BPNN | 2 time steps | 49.62
BPNN | 6 time steps | 49.04
BPNN | 12 time steps | 49.49
KNN | 2 time steps | 74.83
KNN | 6 time steps | 71.19
KNN | 12 time steps | 81.13
ELM | 2 time steps | 122.90
ELM | 6 time steps | 136.49
ELM | 12 time steps | 123.45
MAPE Minimization | – | 46.00
IS-HF | – | 96.76
Empirical Mean | – | 136.46

We did not re-evaluate all of the machine learning models and empirical methods mentioned in Table IV solely for comparison; instead, we took the MAPE values of these methods from Table II of [12]. The LSTM-based model achieved an average MAPE of 44.06% for a lookback of 12 time steps. As concluded in [12], the average MAPE of the deep learning model based on the LSTM architecture is better than those of the other machine learning models as well as the empirical methods for various lookback time steps. Based on this analysis, we selected the deep learning model based on the LSTM architecture for evaluating the performance of the proposed hybrid CNN-LSTM model.

It can be inferred from Table IV that the proposed hybrid CNN-LSTM model achieved an average MAPE of 40.38% with a lookback of two time steps. On the contrary, the model based on LSTM achieved its best result with a lookback of twelve time steps. The proposed method achieved better forecasting performance compared to many state-of-the-art methods, including the LSTM-based approach.
The comparison of the proposed hybrid CNN-LSTM model with the LSTM-based deep learning model [12] is shown in Figure 7. The independent axis in this figure shows the varying number of outliers, while the dependent axis shows the MAPE. It can be observed in this figure that the MAPE of the proposed hybrid model is lower than that of the LSTM model. For a lookback of 2 time steps, the hybrid CNN-LSTM model performs best (lower MAPE value) for 57 out of 69 households, whereas the LSTM is the best predictor for 12 households. It is pertinent to note that as the number of outliers increases, forecasting energy consumption becomes more challenging for both the LSTM-based model and the proposed hybrid approach. However, the proposed model improves the overall average MAPE of individual household energy consumption forecasting. This improvement is more noticeable when the number of outliers is relatively large.

FIGURE 7. MAPE vs. no. of outliers for the proposed hybrid model and the LSTM model.

B. Multi-Step Forecast
In this section, we evaluated the effectiveness of the proposed model for different time horizons, i.e., multi-step forecasting. For each instance of test data, the proposed model was assessed such that if the time horizon is set to 3 h ahead, then the model will forecast the next six values. We compared the proposed hybrid CNN-LSTM model with the LSTM-based model for multi-step forecasting. Table V shows the average MAPE of the individual household forecasts for different lookback and look-forward time steps.

TABLE V. COMPARISON OF THE PROPOSED MODEL WITH THE LSTM MODEL FOR VARIOUS LOOKBACK AND LOOK-FORWARD CONFIGURATIONS.
Lookback (time steps) | Look forward (time steps) | Average MAPE, Proposed (%) | Average MAPE, LSTM (%) | Percentage Improvement (%)
2 | 1 | 40.38 | 44.39 | 4.01
2 | 2 | 46.98 | 51.74 | 4.76
2 | 6 | 59.26 | 65.24 | 5.98
6 | 1 | 41.07 | 44.31 | 3.24
6 | 2 | 46.73 | 53.66 | 6.93
6 | 6 | 59.86 | 67.24 | 7.38
12 | 1 | 42.85 | 44.06 | 1.21
12 | 2 | 48.05 | 56.08 | 8.03
12 | 6 | 61.91 | 69.55 | 7.64

It is evident from this table that, for all the lookback and look-forward time steps, the average MAPE of the proposed model is lower than that of the LSTM. It is observed that the percentage improvement in terms of MAPE decreases slightly as the lookback is varied from 2 to 12 time steps. This trend is only observed for a look-forward of the next time step, which may be due to the LSTM layers learning more from the lag features. Apart from that, the gap between the MAPE values of the proposed hybrid model and the LSTM model gets wider for increasing look-forward time steps. This signifies that the proposed hybrid model not only outperforms the LSTM-based model for the single-step forecast but for the multi-step forecast as well. Compared to the LSTM-based deep learning model [12] for two lookback time steps, with our developed model we obtained 4.01%, 4.76%, and 5.98% improvement for one, two, and six look-forward time steps, respectively.
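For illustration, the per-horizon errors and improvements reported in a table like Table V can be computed from multi-step predictions as sketched below; the function and variable names are hypothetical and the dummy forecasts are only there to make the example runnable.

```python
import numpy as np

def horizon_mape(y_true, y_pred, eps=1e-6):
    """Average MAPE (%) over samples for each look-forward step.

    y_true, y_pred: arrays of shape (n_samples, look_forward), e.g.,
    look_forward = 6 for the 3 h ahead configuration.
    """
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ape = np.abs(y_true - y_pred) / np.maximum(np.abs(y_true), eps)
    return 100.0 * ape.mean(axis=0)

def improvement(mape_proposed, mape_baseline):
    """Absolute MAPE improvement in percentage points, as reported in Table V."""
    return mape_baseline - mape_proposed

# Example with dummy forecasts for a 6-step (3 h) horizon.
rng = np.random.default_rng(1)
actual = rng.uniform(0.2, 1.0, size=(100, 6))
forecast = actual * rng.normal(1.0, 0.1, size=actual.shape)
per_step = horizon_mape(actual, forecast)
print(np.round(per_step, 2), round(per_step.mean(), 2))
```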
The proposed model and the deep learning model based on LSTM were applied to the test data, and the results are presented in Figure 8 for three different customers. The independent axis shows the time instances of the different days of the test data. By carefully observing different parts of this figure, we can see that the deep learning approach based on LSTM only has unreasonably high peaks at various instances of time. On the contrary, our proposed model, which is based on a combination of CNN and LSTM, quite closely follows the actual load at almost all the time instances. Additionally, the LSTM-based model also shows some peaks at time instances '168' and '216' in Figure 8 (b) where there are no peaks in the actual load. In part (c) of Figure 8, there are some peaks at time instances ahead of the real peaks. Based on all these observations, we can conclude that the proposed CNN-LSTM based deep learning framework is an efficient approach for forecasting individual household power consumption.

FIGURE 8. Forecast results of three customers for the proposed and the LSTM models.

C. Clustering Analysis
The research on household clustering has shifted its focus from attribute-oriented factors to consumption-pattern-oriented factors. This shift is mainly due to the use of smart meters that provide high-resolution power consumption time-series data.
In this section, we have used the k-means clustering technique to group similar households based on their consumption pattern, i.e., load profiling. For the clustering analysis, we considered the training data of the same pool of 69 customers that were used in the single-step and multi-step forecasts in Section V. For our analysis, we divided a day into four periods, described below:
1) Breakfast period: 6.30 AM – 9.00 AM
2) Daytime period: 9.00 AM – 3.30 PM
3) Evening period: 3.30 PM – 10.30 PM
4) Overnight period: 10.30 PM – 6.30 AM
The different attributes of the considered data are explained below:
Attributes 1 – 4: For each time period, the relative mean power $P_{t_i} = \sum P / P_d$, where the numerator is the sum of the power over the time interval $t_i$ and $P_d$ is the daily mean power, computed over the entire training data.
Attribute 5: The mean relative standard deviation $\sigma = \frac{1}{4}\sum_{i=1}^{4} \sigma_i / P_i$ over the entire training data, where $\sigma_i$ and $P_i$ correspond to the standard deviation and the average power in each time period, respectively.
Attributes 6 – 9: For each time interval, the weekend-versus-weekday difference score $W_{t_i} = |P_i^{Weekday} - P_i^{Weekend}|$, where $P_i^{Weekday}$ and $P_i^{Weekend}$ are the average power during the weekdays and the weekend, respectively, over the entire training data.
For each household, the above-mentioned nine attributes summarize its consumption behavior over the training data. In order to find the optimal number of clusters, different alternatives have been explored by researchers, and silhouette score analysis is one of the appropriate techniques. The silhouette index ranges from -1 to +1 and is an indicator of how closely similar an object is to the rest of the members of its cluster. A higher value of the silhouette index (close to +1) shows that an object is highly similar to the other members of its own cluster and highly dissimilar to the objects of other clusters.
The k-means clustering is iteratively performed on the above-mentioned nine attributes of the 69 households for 2 to 50 clusters, and the result is shown in Figure 9. As shown in the figure, the silhouette score is maximum at k equal to 15 and decreases afterwards. The high-resolution time-series load profiles of households often carry comprehensive attributes, and it is common to find a larger number of clusters, i.e., more than 10 clusters (e.g., [45], [46], [47]).

FIGURE 9. Average silhouette score based on nine attributes of the 69 customers.
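The sketch below illustrates this attribute construction and the k-means/silhouette search with scikit-learn. The period boundaries follow the list above, while the DataFrame layout and helper names are assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

PERIODS = {"breakfast": (6.5, 9.0), "daytime": (9.0, 15.5),
           "evening": (15.5, 22.5), "overnight": (22.5, 6.5)}

def household_attributes(df):
    """Nine behavioral attributes of one household (training data only).
    df: DataFrame with a half-hourly 'kwh' column and a datetime index."""
    hour = df.index.hour + df.index.minute / 60.0
    weekend = df.index.dayofweek >= 5
    p_daily = df["kwh"].mean()
    feats, means, stds, wdiffs = [], [], [], []
    for lo, hi in PERIODS.values():
        mask = (hour >= lo) & (hour < hi) if lo < hi else (hour >= lo) | (hour < hi)
        p = df.loc[mask, "kwh"]
        means.append(p.mean())
        stds.append(p.std())
        feats.append(p.sum() / (p_daily + 1e-12))            # attributes 1-4
        wdiffs.append(abs(df.loc[mask & ~weekend, "kwh"].mean()
                          - df.loc[mask & weekend, "kwh"].mean()))
    feats.append(np.mean([s / (m + 1e-12) for s, m in zip(stds, means)]))  # attribute 5
    feats.extend(wdiffs)                                      # attributes 6-9
    return np.array(feats)

def best_k(features, k_min=2, k_max=50, seed=0):
    """Pick the cluster count with the highest average silhouette score."""
    scores = {}
    for k in range(k_min, min(k_max, len(features) - 1) + 1):
        labels = KMeans(n_clusters=k, n_init=10, random_state=seed).fit_predict(features)
        scores[k] = silhouette_score(features, labels)
    return max(scores, key=scores.get), scores
```

Stacking the nine-attribute vectors of the 69 households into one matrix and passing it to best_k reproduces the kind of silhouette curve summarized in Figure 9.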
The average power consumption patterns of the fifteen clusters over the entire day are shown in Figure 10. For each cluster, only the training and validation data (83 days out of 92 days) of its representative households are considered. The average power consumption is normalized by the maximum value in each cluster. The resulting clusters show some differences from each other. For example, Cluster_1, Cluster_8, Cluster_12, and Cluster_13 have three distinct peaks, and the main difference between them is the timing. Cluster_1 has a smaller overnight peak, whereas Cluster_8 does not have an overnight peak. Cluster_12 and Cluster_13 do not have a distinct peak in the daytime. Cluster_3, Cluster_6, Cluster_7, and Cluster_11 have only one distinct peak, in the evening.

However, the difference between them is the average power consumption during the overnight, breakfast, and daytime periods, as well as the width of the peak span. Cluster_2 exhibits a decreasing trend during the daytime, whereas its average power consumption increases drastically in the late evening. Cluster_4, Cluster_5, Cluster_9, and Cluster_14 have two distinct peaks, in the breakfast and evening periods. However, Cluster_4 has a wider peak span compared to Cluster_5. Moreover, Cluster_9 has a low average power consumption during the night time, with a sharp rise in the early breakfast period.

FIGURE 10. The average power consumption patterns of the fifteen clusters.

In the previous section, we compared the individual household forecasting performance of the proposed model with the LSTM model, where the average MAPE of the LSTM model was obtained from [12], as mentioned in Table IV. In order to make a one-to-one MAPE comparison between the proposed model and the LSTM model for all the households, we implemented the LSTM model presented in [12]. The obtained average MAPE of 44.68% for the LSTM model is very close to the average MAPE of 44.39% presented in [12]. We present the comparison of the individual forecasts using the LSTM (third column) and the individual forecasts using the proposed method (fourth column) in terms of MAPE for each of the 69 households in Table VI.
In this section, we perform the clustering analysis using the proposed method to examine whether grouping similar households will further improve the forecasting accuracy. We identify patterns in the energy usage profiles over four periods, i.e., overnight, breakfast, daytime, and evening, and group the households with similar profiles. In this experiment, the training, validation, and test data for each cluster are prepared by concatenating the attributes of its representative households. For example, to train a single model for Cluster_0, we vertically concatenate the training data attributes mentioned in Table III of each representative household. The result of the clustering analysis is presented in column five of Table VI. Green and red colors represent the best and the worst model, respectively, for each household in Table VI. We observed that some clusters contain very few households, since the number of households in the analysis is not large (only 69 customers).
Column two of Table VI shows the customer ID along with the number of outliers in the daily profiles of the training and validation data. For each household, the total number of days in the training (67 days) and validation (16 days) data is 83. The households in each cluster are arranged in increasing order of the number of outliers in their training and validation data. For instance, for Cluster_0, customer ID 8804804 has only one outlier, whereas customer ID 8282282 has all outliers with no distinct daily profile in the training and validation data.
It is evident from Table VI that households with mostly outliers in the training and validation data result in a larger MAPE.

For example, customer IDs 8282282, 9012348, 8291712, 8540084, and 9393680 in Cluster_0, Cluster_2, Cluster_7, Cluster_9, and Cluster_10, respectively, show this behavior. As shown in the table, as we move down a cluster, the MAPE generally increases. Such a large error arises because there are no distinct daily profiles for such customers, i.e., most of their data are outliers. For such households, the proposed method without clustering achieved better results compared to the individual LSTM-based approach as well as the clustering-based approach. One of the reasons why the clustering-based approach may not perform generally well compared to the proposed method without clustering could be the large number of outliers in various households in the different clusters. This results in more variation in the training data in terms of load characteristics, which degrades the optimal learning of the model.
Another observation that can be made from the table is that for some customers the average MAPE is relatively large despite very few or no outliers in the training data. For instance, customer ID 8680284 in Cluster_14 has a large MAPE value even though it has only two outliers in the training data.
TABLE VI. COMPARED MAPEs FOR SIXTY-NINE HOUSEHOLDS IN FIFTEEN CLUSTERS, WITH AND WITHOUT CLUSTERING.
Columns: Customer ID / No. of outliers in training & validation data | LSTM without clustering, MAPE (%) | Proposed method without clustering, MAPE (%) | Proposed method with clustering, MAPE (%)

Cluster_0:
8804804 / 01 | 36.15 | 36.50 | 41.41
8350006 / 05 | 21.58 | 22.09 | 21.88
8496980 / 06 | 29.94 | 29.85 | 30.61
8618165 / 07 | 33.44 | 31.04 | 30.04
8328122 / 08 | 32.25 | 30.99 | 28.73
8308588 / 21 | 26.09 | 24.33 | 24.77
8176593 / 30 | 22.52 | 22.36 | 23.69
8566459 / 35 | 42.87 | 43.35 | 42.69
8282282 / 83 | 81.15 | 76.39 | 99.15
Cluster_1:
8342852 / 01 | 39.90 | 33.76 | 36.28
8482121 / 02 | 31.36 | 28.02 | 39.60
8196671 / 06 | 25.83 | 24.27 | 25.52
8487461 / 34 | 50.44 | 55.97 | 61.65
Cluster_2:
11462018 / 05 | 35.72 | 26.26 | 34.00
8196669 / 11 | 24.16 | 22.43 | 22.86
10692972 / 11 | 47.06 | 37.23 | 37.41
9012348 / 75 | 135.07 | 99.25 | 129.30
Cluster_3:
8685932 / 02 | 16.34 | 15.65 | 14.75
8257054 / 20 | 25.62 | 20.13 | 21.64
Cluster_4:
8557605 / 07 | 37.46 | 35.74 | 34.41
8661542 / 15 | 28.48 | 28.38 | 30.04
8196659 / 39 | 25.21 | 27.98 | 38.56
8156517 / 46 | 96.90 | 94.51 | 144.65
8326944 / 54 | 93.94 | 70.81 | 74.63
Cluster_5:
10509861 / 25 | 27.99 | 23.38 | 23.54
8273592 / 30 | 31.51 | 28.97 | 28.24
8147703 / 47 | 41.12 | 42.08 | 48.68
Cluster_6:
8149711 / 31 | 56.51 | 55.09 | 52.78
8568209 / 37 | 57.23 | 29.61 | 68.91
8145135 / 65 | 49.28 | 44.57 | 48.10
Cluster_7:
8264534 / 49 | 32.16 | 32.98 | 39.10
8198345 / 54 | 49.99 | 53.18 | 55.37
8291712 / 68 | 134.73 | 124.47 | 131.34
Cluster_8:
8519102 / 44 | 55.82 | 62.21 | 53.61
10595596 / 62 | 70.17 | 49.55 | 42.66
Cluster_9:
8451629 / 14 | 28.76 | 26.79 | 31.37
8181075 / 16 | 27.50 | 18.15 | 24.35
8673172 / 16 | 38.34 | 33.96 | 35.55
8617151 / 17 | 35.09 | 32.41 | 32.52
8679346 / 26 | 66.19 | 64.71 | 62.37
8184653 / 28 | 32.02 | 28.30 | 25.65
8211599 / 31 | 45.86 | 35.83 | 46.42
10598990 / 47 | 52.74 | 47.94 | 41.90
11081920 / 52 | 34.43 | 29.28 | 30.49
8376656 / 56 | 39.87 | 33.10 | 43.76
8540084 / 63 | 72.62 | 70.30 | 84.39
Cluster_10:
9393680 / 65 | 88.97 | 77.29 | 77.29
Cluster_11:
8459427 / 00 | 20.49 | 18.55 | 21.11
8466525 / 03 | 25.72 | 25.37 | 27.66
8478501 / 08 | 50.94 | 42.71 | 37.84
8334780 / 16 | 26.04 | 23.41 | 23.66
8196621 / 16 | 35.32 | 34.66 | 34.87
8419708 / 19 | 28.84 | 28.08 | 24.55
8198267 / 20 | 34.52 | 32.76 | 34.91
8733828 / 23 | 29.85 | 29.65 | 31.72
8347238 / 23 | 35.98 | 33.72 | 36.00
8687500 / 30 | 35.56 | 33.32 | 40.58
Cluster_12:
8523058 / 00 | 56.13 | 56.39 | 43.42
8655993 / 00 | 27.43 | 32.33 | 27.67
8198319 / 01 | 30.44 | 28.74 | 36.80
8487297 / 07 | 107.12 | 97.88 | 109.71
Cluster_13:
8273230 / 19 | 25.96 | 26.16 | 48.82
8351602 / 22 | 22.84 | 22.05 | 25.17
8504552 / 28 | 23.34 | 24.69 | 27.70
8487285 / 34 | 25.25 | 22.16 | 27.49
10702066 / 37 | 36.10 | 35.91 | 47.67
Cluster_14:
8680284 / 02 | 57.96 | 50.70 | 51.39
8432046 / 63 | 58.42 | 53.40 | 64.44
8257034 / 66 | 80.61 | 51.91 | 48.18
Average | 44.68 | 40.38 | 44.76

We observed from the hourly load profile of customer ID 8680284 that the average hourly power consumption in the test data was lower than that of the training data, as shown in Figure 11. It is pertinent to note that for the training and validation data, the average power consumption rises gradually from around 06:00 p.m., and the peak average consumption occurs in the late evening. However, for the test data, there is a sharp decrease in the average consumption after 06:00 p.m. The model may predict a peak in the late evening, whereas the actual power consumption is quite low, which results in a larger MAPE. The proposed method without clustering achieved a MAPE of 50.70, whereas the clustering-based approach achieved the second-best MAPE of 51.39 for customer ID 8680284.

FIGURE 11. The average hourly power consumption of customer ID 8680284.

Figure 12 shows how the proposed method with the clustering approach forecasts the energy consumption of each household in Cluster_6. As shown in the figure, the prediction curve follows the fluctuation of the original consumption on an hourly basis for all three customers, except for a few peak usage hours for customer IDs 8145135 and 8149711. This occurs due to the fact that there are random peaks for an individual household.

FIGURE 12. Cluster_6 forecasting using the proposed method with clustering.

The hourly profile of customer ID 8568209 shows that the peak average consumption occurs in the evening for the training and validation data, whereas there is no peak in the test data, as shown in Figure 13. The prediction curve follows the original consumption for customer ID 8568209, as shown in Figure 12, but still results in a larger MAPE. This occurs due to the fact that the actual power consumption for the entire test data is close to zero, which inflates the MAPE.

FIGURE 13. The average hourly power consumption of customer ID 8568209.

The results presented in Table VI indicate that the clustering-based approach performs the best (lower MAPE value) for 17 out of 69 households, whereas the LSTM without clustering is the best predictor for only 10 households. The proposed method with the clustering approach obtained an average MAPE of 44.76%, which is very close to the average MAPE of 44.68% obtained using the LSTM model. The proposed method without clustering outperforms the LSTM model as well as the clustering-based approach for most of the customers.
The cluster analysis revealed information on the electricity use patterns of various households in the SGSC database. The customers are classified by their energy profiles, depending on their structural similarity. The obtained findings are constrained by the small sample size, i.e., only 69 households. Future studies may use a greater sample size and more diverse samples.

VI. CONCLUSIONS [4] S. Aslam, Z. Iqbal, N. Javaid, Z. A. Khan, K. Aurangzeb, and S. I.


This paper seeks to explore the short-term energy Haider, “Towards Efficient Energy Management of Smart Buildings
Exploiting Heuristic Optimization with Real Time and Critical Peak
consumption prediction problem of the individual household Pricing Schemes,” Energies, vol. 10, no. 12, p. 2065, Dec. 2017,
customers in the residential sector. Load forecasting at an Accessed: Aug. 19, 2020. [Online].
individual household level is quite challenging because it lacks [5] A. Mohsenian-Rad, V. W. S. Wong, J. Jatskevich, R. Schober, and
A. Leon-Garcia, “Autonomous Demand-Side Management Based on
a stable pattern and fluctuates even in consecutive hours. Game-Theoretic Energy Consumption Scheduling for the Future
First, a clustering technique is applied to identify the number Smart Grid,” IEEE Trans. Smart Grid, vol. 1, no. 3, pp. 320–331,
of outliers and to discover the regularity in daily power Dec. 2010.
[6] A. Khalid, S. Aslam, K. Aurangzeb, S. I. Haider, M. Ashraf, and N.
consumption profiles of individual household data. Next, a Javaid, “An Efficient Energy Management Approach Using Fog-as-
hybrid model is proposed, which is based on a combination of a-Service for Sharing Economy in a Smart Grid,” Energies, vol. 11,
CNN and LSTM. no. 12, p. 3500, Dec. 2018, Accessed: Aug. 19, 2020. [Online].
The developed framework is tested on publicly available residential smart meter data from the SGSC project. The performance of the developed framework is comprehensively compared with other state-of-the-art systems for short-term electric load forecasting. The results indicate that the proposed hybrid CNN-LSTM deep learning framework outperforms the rival techniques in forecasting the energy consumption of individual households with both regular and irregular usage behavior.
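For reference, a minimal Keras sketch of a hybrid CNN-LSTM forecaster of this kind is given below. The window length, filter counts, layer widths, and training settings are illustrative assumptions rather than the configuration evaluated in this paper:

```python
import numpy as np
from tensorflow.keras import layers, models

# Conv1D layers extract local features from the input load window and
# LSTM layers model the temporal sequence; a Dense head emits the forecast.
lookback, n_features, horizon = 48, 1, 1   # e.g. one day of half-hourly readings

model = models.Sequential([
    layers.Input(shape=(lookback, n_features)),
    layers.Conv1D(32, kernel_size=3, padding="same", activation="relu"),  # feature extraction
    layers.Conv1D(16, kernel_size=3, padding="same", activation="relu"),
    layers.LSTM(64, return_sequences=True),                               # sequence learning
    layers.LSTM(32),
    layers.Dense(horizon),                                                # forecast value(s)
])
model.compile(optimizer="adam", loss="mae")

# Toy fit on random data standing in for the preprocessed smart-meter windows.
X = np.random.rand(256, lookback, n_features).astype("float32")
y = np.random.rand(256, horizon).astype("float32")
model.fit(X, y, epochs=2, batch_size=32, verbose=0)
```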
The prediction problem becomes more challenging for both the LSTM-based model and the proposed hybrid approach as the number of outliers increases. However, the proposed model improves the overall average MAPE of individual household energy consumption prediction for both single-step and multi-step forecasting. This improvement is more noticeable when the outliers are relatively large.
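The single-step and multi-step settings differ only in how the supervised windows are framed. The sketch below illustrates this framing on toy data; the lookback of 2 steps and the 1- and 6-step horizons are chosen here purely as examples:

```python
import numpy as np

def make_windows(series, lookback=2, horizon=1):
    """Frame a univariate load series into model inputs of shape
    (samples, lookback, 1) and targets of shape (samples, horizon)."""
    X, y = [], []
    for t in range(lookback, len(series) - horizon + 1):
        X.append(series[t - lookback:t])
        y.append(series[t:t + horizon])
    return np.asarray(X, dtype="float32")[..., None], np.asarray(y, dtype="float32")

load = np.random.rand(48)                 # toy stand-in for one day of half-hourly readings
X1, y1 = make_windows(load, horizon=1)    # single-step forecasting
X6, y6 = make_windows(load, horizon=6)    # multi-step forecasting, 6 steps ahead
print(X1.shape, y1.shape)                 # (46, 2, 1) (46, 1)
print(X6.shape, y6.shape)                 # (41, 2, 1) (41, 6)
```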
The load forecasting model would yield better prediction results if parameters such as appliance ownership, sociodemographic data, and household occupancy could be detected and added as features. Future research should focus on further exploring the behavioral characteristics of customers and incorporating those data into the load forecasting model.
CONFLICTS OF INTEREST
The authors declare no conflicts of interest.
AUTHOR CONTRIBUTIONS
All three authors equally contributed to the manuscript.
ACKNOWLEDGMENT
The authors extend their appreciation to the Deputyship for Research & Innovation, "Ministry of Education" in Saudi Arabia for funding this research work through the project number IFKSURG-1438-034.