Machine Learning
Abstract
Solar photovoltaic (PV) power forecasting is a crucial aspect of efficient energy manage-
ment in the renewable energy sector. This thesis explores the application of various types
of artificial neural networks (ANNs) for predicting PV power output by considering vari-
ous variables that affect the power output. The proposed ANNs are used to forecast the
output power for different PV technologies while considering different prediction horizons.
Additionally, the impact of panel ageing is investigated using different machine learning
models. To evaluate the performance of the proposed ANNs, real-world PV power data
was collected and preprocessed. The preprocessed data was then used to train and test dif-
ferent ANNs, including recurrent neural networks, autoencoders and convolutional neural
networks. The experimental results show that the proposed ANNs can accurately pre-
dict PV power output, with LSTM demonstrating the best performance for short-term
forecasting. Furthermore, the impact of panel ageing on PV power was analyzed using
different machine learning models, including linear regression and predictive analysis. The
results show that the machine learning models can effectively predict the degradation of
PV panel performance over time.
To improve the accuracy of predictions, the effect of splitting the dataset into two distinct
datasets, sunny and cloudy, is investigated. Furthermore, a separate prediction
model is utilized for each of these datasets. The results indicate that clustering the
dataset leads to improved prediction accuracy. Overall, this thesis provides a compre-
hensive analysis of the application of different ANNs for solar PV power forecasting and
the impact of panel ageing on PV power output. The results demonstrate the potential
of using machine learning techniques for accurate and reliable solar PV power forecasting.
Contents
Abstract i
Contents v
1 Introduction 1
1.1 Motivation 1
1.2 Thesis Objectives 5
1.3 Thesis Structure 5
2 Literature Review 7
2.1 Solar Power Forecasting Methods 7
2.1.1 Statistical Methods 7
2.1.1.1 Time Series based Forecasting Techniques 7
2.1.1.2 Machine Learning based Forecasting - Artificial Neural Networks 10
2.1.1.3 Deep Learning: State-of-the-art in Machine Learning 13
2.1.2 Sky Imagers 19
2.1.3 Satellite Imaging 20
2.1.4 Numerical Weather Prediction (NWP) 20
2.1.4.1 Persistence Forecast 20
2.1.4.2 Physical Model 21
2.2 Prediction Method Selection 21
2.2.1 Prediction Horizon Selection 22
2.2.2 Prediction Mode Selection 23
6 Appendix 87
List of Figures 97
Acknowledgements 103
1| Introduction
1.1. Motivation
Due to mounting concerns regarding the emissions of greenhouse gases and environmental
pollution stemming from the excessive utilization of fossil fuel-based energy sources, the
significance of renewable energy sources in the field of power generation has escalated [28].
Consequently, it has become imperative to explore alternative energy sources to supplant
fossil fuels. By the end of 2019, the cost-effectiveness of generating energy from wind and
photovoltaic (PV) installations had surpassed that of conventional fossil fuel-powered
plants in some regions. Moreover, in specific regions, the installation of new wind and
solar PV facilities has been found to be more cost-effective than the ongoing operation of
existing fossil fuel-based power plants [25]. Considering the characteristics of solar energy,
such as its clean nature, abundant and free availability, and renewable attributes, there
has been a significant increase in the deployment of PV panels for harnessing solar power
in recent years [41]. Due to nations’ interest in investing in renewable energy sources, it
is probable that the installation of PV panels will continue to rise. Figure 1.1 depicts the
worldwide capacity of solar photovoltaics from 2011 to 2021.
solar panels. As a result, the power output from solar panels can fluctuate due to these
variables [9].
Figure 1.4 demonstrates the variability in power output within a PV array under diverse
conditions, thereby posing a potential threat to the reliability and stability of the power
system.
Given the escalating importance of accurate solar prediction, the field of solar forecasting
has garnered considerable attention as a promising area of research. Consequently, nu-
merous studies and research endeavors have been conducted in recent years to enhance
the precision of solar predictions in order to meet the required standards. Accurate pre-
diction of photovoltaic (PV) power output is important for optimizing the performance
of solar power systems [37]. Here are some tools that can be used to achieve this:
1. Solar irradiance models: These models use meteorological data to predict the amount
of solar irradiance that will be received at a particular location and time. Some com-
monly used models include the Clear Sky Model and the Linke Turbidity Model.
3. Physical models: Physical models can be used to simulate the behavior of a solar
panel and predict its power output. These models take into account factors such as
temperature, shading, and panel orientation to make predictions.
4. Hybrid models: Hybrid models combine the strengths of different types of models to
achieve more accurate predictions. For example, a hybrid model may use a machine
learning algorithm to make predictions based on historical data, and then use a
physical model to adjust these predictions based on real-time conditions.
In addition to these tools, it is also important to consider other factors that can affect PV
power output, such as the age of the panel, its efficiency, and the level of maintenance it
has received.
Chapter 3 presents a detailed description of the data and tools utilized in this study.
Chapter 4 is dedicated to data exploration and preprocessing, which are crucial steps
in the research. The chapter also introduces the performance metrics employed in this
study to evaluate the prediction models. Hyperparameter tuning is conducted to identify
the optimal set of hyperparameters for the models. The obtained results from the different
models are then presented and compared.
Moreover, the chapter investigates the potential of splitting the dataset into sunny and
overcast days and explores the use of multiple prediction models to enhance the accuracy of
solar predictions. All the obtained results, along with their interpretations, are presented
comprehensively in this chapter.
Finally, in Chapter 5, the thesis study concludes by summarizing the key findings and
outcomes of the study. Additionally, potential areas for improvement and future research
directions are proposed, offering avenues for further advancement in this field.
2| Literature Review
This chapter is dedicated to conducting a comprehensive literature review and studying
existing solar forecasting methods. It also entails a meticulous examination of the distinct
characteristics associated with each technique under consideration.
or even finer timeframes, contingent upon the temporal dynamics of the specific variable
under consideration [1]. Several time series-based forecasting techniques include:
• Exponential Smoothing: the forecast is an exponentially weighted average of past observations:

$$\hat{Y}_{t+1} = \hat{Y}_t + \alpha\,(Y_t - \hat{Y}_t) \tag{2.1}$$

In Equation (2.1), $Y_t$ represents the current observation, $\hat{Y}_t$ is the predicted value,
and $\alpha$ is the smoothing constant, which ranges between 0 and 1. The forecasting
equation calculates the predicted value at time t + 1 as the sum of the last predicted
value $\hat{Y}_t$ and the forecast adjustment factor $\alpha(Y_t - \hat{Y}_t)$.
$$X(t) = \sum_{i=1}^{p} \alpha_i\, X(t-i) + \sum_{j=1}^{q} \beta_j\, e(t-j) \tag{2.2}$$
(i) Autoregressive (AR) term: This term signifies that the future values of a time
series depend on its past values. In other words, the current value of a time series
can be predicted based on its previous values.
(ii) Moving Average (MA) term: This term indicates that the future values of a
time series are influenced by the errors or residuals of past observations. In other
words, the current value of a time series can be predicted based on the errors made
in predicting its previous values.
(iii) Integrated (I) term: This term refers to the requirement of differencing to
transform a time series into a stationary one, where the mean and variance remain
constant over time.
$$y(t) = c + \phi(1)\,y(t-1) + \cdots + \phi(p)\,y(t-p) + \theta(1)\,e(t-1) + \cdots + \theta(q)\,e(t-q) + e(t) \tag{2.3}$$

where y(t) is the value of the time series at time t; c is a constant term or intercept;
ϕ(1), . . . , ϕ(p) are the coefficients of the autoregressive terms of the model,
which represent the effect of the past values of the time series on its current value;
θ(1), . . . , θ(q) are the coefficients of the moving average terms of the model, which
represent the effect of the past errors or residuals on the current value of the time
series; e(t) is the error or residual term at time t, which represents the difference
between the predicted value and the actual value of the time series at that time; and d
is the order of differencing required to make the time series stationary.
Neurons are organized into layers in an ANN. The most common arrangement is a layered
architecture consisting of an input layer, one or more hidden layers, and an output layer.
The input layer receives the initial input data, while the output layer produces the final
output of the network. The hidden layers are intermediary layers between the input and
output layers, responsible for extracting and transforming features from the input data.
Some of the commonly used network topologies in ANNs for forecasting include:
The number of nodes in the hidden layer plays a crucial role in the MLP’s capacity
to capture complex nonlinear functions. Research conducted by Hegazy et al. in
[20] has demonstrated that a single hidden layer with a sufficient number of hidden
nodes is capable of effectively representing complex nonlinear functions. However,
as the number of nodes in a Multi-Layer Perceptron Neural Network (MLPNN)
increases, challenges such as overfitting and training issues can arise [8]. Figure 2.3
illustrates a simplified architecture of an MLPNN.
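As an illustration of this topology, a single-hidden-layer MLP for one-step power forecasting could be sketched in Keras as follows; the 36-sample input window, the 64 hidden nodes, and the ReLU activation are illustrative assumptions, not the configuration used in this thesis.

```python
import tensorflow as tf

# Minimal MLP sketch: one hidden layer mapping a window of past samples to
# the next power value. Sizes are illustrative assumptions.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(36,)),            # window of past observations
    tf.keras.layers.Dense(64, activation="relu"),  # single hidden layer
    tf.keras.layers.Dense(1),                      # predicted power output
])
model.compile(optimizer="adam", loss="mse")
```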
just by the input at that time step, but also by the output of the preceding time step.
This means that RNNs can represent the context of the input sequence, which is very
useful for natural language processing, speech recognition, and music production.
An RNN’s design generally comprises a hidden state that is updated at each time
step based on the input and the preceding hidden state. The output at that time
step is then generated using this hidden state. The hidden state is a learnt
representation of prior inputs that is carried over to the next time step.
Several RNN variants, such as Long Short-Term Memory (LSTM) and Gated Re-
current Units (GRU), have been created to overcome the problem of vanishing gra-
dients, which can occur during regular RNN training. Additional gates govern the
flow of information in the network, allowing for higher memory retention and more
steady training. Figure 2.4 shows the basic architecture of a Recurrent Neural Net-
work.
For time series forecasting with ELM, the input variables are the past observations
of the time series, and the output variable is the forecasted value of the time series
at the next time step. The ELM algorithm can learn the underlying patterns and
dependencies in the time series and use them to predict the future values. One of the
advantages of ELM for time series forecasting is its fast training speed, which is due
to the use of random weights and the pseudo-inverse method. This makes ELM suit-
able for handling large datasets and high-dimensional input spaces. However, ELM
may not be as accurate as traditional neural networks in some cases, and careful
selection of hyperparameters may be necessary to achieve optimal performance.
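A minimal NumPy sketch of this training scheme is given below: the input weights are drawn at random and only the output weights are solved for with the pseudo-inverse. The hidden-layer size and the tanh activation are illustrative assumptions.

```python
import numpy as np

def elm_fit(X, y, n_hidden=100, seed=0):
    """Fit an ELM: X holds windows of past observations, y the next values."""
    rng = np.random.default_rng(seed)
    W = rng.normal(size=(X.shape[1], n_hidden))  # random input weights, never trained
    b = rng.normal(size=n_hidden)                # random biases
    H = np.tanh(X @ W + b)                       # hidden-layer activations
    beta = np.linalg.pinv(H) @ y                 # output weights via pseudo-inverse
    return W, b, beta

def elm_predict(X, W, b, beta):
    return np.tanh(X @ W + b) @ beta
```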
Deep learning neural networks require a vast amount of data to be trained accurately,
and the training process involves revising the neurons’ weights and biases depending on
the difference between the expected and real output [15]. This is usually accomplished
through the use of a technique known as backpropagation, which computes the gradient
of the error with respect to each weight and bias in the network. There are many different
types of deep learning neural networks, but some of the most commonly used architectures
include:
Input Layer: The input layer receives the raw image data, which is typically
represented as a matrix of pixel values.
Convolutional Layer: The convolutional layer applies a set of filters to the input
image to extract features. Each filter is a small matrix of weights, which is applied
to a small portion of the input image at a time (a sliding window). The output
of the convolutional layer is a set of feature maps, where each map represents the
response of a specific filter. The output feature map is calculated as follows:
$$F_{i,j} = \sum_{m}\sum_{n} I_{i+m,\,j+n} \times K_{m,n} \tag{2.4}$$
where $F_{i,j}$ is the output value at position (i, j) in the feature map, I is the input
image, K is the filter matrix, and m and n are the indices of the filter matrix.
ReLU Layer: The ReLU (rectified linear unit) layer applies the element-wise ac-
tivation function f (x) = max(0, x) to the output of the convolutional layer. This
introduces non-linearity into the model and helps to improve its performance.
Pooling Layer: The pooling layer reduces the dimensionality of the feature maps
by down-sampling them. The most common pooling operation is max-pooling,
which takes the maximum value within a sliding window. This operation is applied
separately to each feature map.
Fully Connected Layer: The fully connected layer takes the output of the pre-
vious layer and applies a linear transformation to produce the final output. This
layer is similar to the one in a standard neural network, except that it receives a
set of feature maps instead of a vector. The output of the fully connected layer is
calculated as follows: y = W x + b, where x is the input vector, W is the weight matrix,
b is the bias vector, and y is the output vector.
Softmax Layer: The softmax layer applies the softmax function to the output of
the fully connected layer to produce a probability distribution over the classes. This
function normalizes the output vector so that the values sum up to 1.
$$\sigma(z)_j = \frac{e^{z_j}}{\sum_{k} e^{z_k}} \tag{2.5}$$
where σ(z)j is the output probability for class j, z is the input vector, and the sum
is taken over all classes.
Output Layer: The output layer produces the final prediction by selecting the
class with the highest probability from the softmax distribution. This is a basic
overview of the CNN architecture and its mathematical equations. In practice,
there are many variations and extensions of this model, including different types of
layers, regularization techniques, and optimization methods.
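A minimal Keras sketch of the layer sequence described above (input, convolution with ReLU, max-pooling, fully connected layer, softmax) is shown below; the 28×28 input size, the filter count, and the 10 output classes are illustrative assumptions.

```python
import tensorflow as tf

# Minimal CNN mirroring the described stack; sizes are illustrative.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(28, 28, 1)),                      # raw image
    tf.keras.layers.Conv2D(16, kernel_size=3, activation="relu"),  # conv + ReLU
    tf.keras.layers.MaxPooling2D(pool_size=2),                     # down-sampling
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),               # softmax over classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
```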
The reset gate determines how much of the previous hidden state should be for-
gotten, while the update gate determines how much of the new input should be
incorporated into the current hidden state [30]. These gates are controlled by sig-
moid functions that output values between 0 and 1.
$$z_t = \sigma(W_z x_t + U_z h_{t-1} + b_z) \tag{2.6}$$

where $z_t$ is the update gate at time t, $\sigma$ is the sigmoid function, $W_z$, $U_z$, and $b_z$ are
the weight matrices and bias vector associated with the update gate, $x_t$ is the input
at time t, and $h_{t-1}$ is the hidden state from the previous time step. The reset gate
is defined similarly:
$$r_t = \sigma(W_r x_t + U_r h_{t-1} + b_r) \tag{2.7}$$
where $r_t$ is the reset gate at time t, and $W_r$, $U_r$, and $b_r$ are the weight matrices and bias
vector associated with the reset gate. Using the reset gate, we can determine how
much of the previous hidden state should be forgotten:

$$h'_t = \tanh(W_h x_t + U_h (r_t * h_{t-1}) + b_h) \tag{2.8}$$

where $h'_t$ is the new candidate hidden state at time t, tanh is the hyperbolic tangent
activation function, and $W_h$, $U_h$, and $b_h$ are the weight matrices and bias vector associ-
ated with the candidate hidden state. Finally, we use the update gate to determine
how much of the new candidate hidden state should be incorporated into the current
hidden state:
$$h_t = (1 - z_t) * h_{t-1} + z_t * h'_t \tag{2.9}$$
where ht is the new hidden state at time t, and ht−1 is the previous hidden state.
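Equations (2.6)-(2.9) translate directly into a single GRU time step; the NumPy sketch below assumes suitably shaped weight matrices and is meant only to make the gate computations concrete.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, Wz, Uz, bz, Wr, Ur, br, Wh, Uh, bh):
    """One GRU step implementing Equations (2.6)-(2.9)."""
    z_t = sigmoid(Wz @ x_t + Uz @ h_prev + bz)             # update gate (2.6)
    r_t = sigmoid(Wr @ x_t + Ur @ h_prev + br)             # reset gate (2.7)
    h_cand = np.tanh(Wh @ x_t + Uh @ (r_t * h_prev) + bh)  # candidate state (2.8)
    return (1.0 - z_t) * h_prev + z_t * h_cand             # new hidden state (2.9)
```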
In an LSTM network, there are three main components: the input gate, the forget
gate, and the output gate [16]. Each of these gates has its own set of weights and
biases, which are learned during the training process. The input and output gates
control the flow of information into and out of the LSTM cell, while the forget gate
determines which information to discard from the cell state [13]. Figure 2.7 shows
basic architecture of an LSTM unit.
The LSTM cell updates its internal state based on the input, the previous cell state,
and the output from the previous time step. The internal state of the cell is also
known as the hidden state. The gate activations and cell state are computed as follows
(the gate equations (2.10)-(2.13) are reconstructed here from the variable definitions below):

$$f_t = \sigma(W_f\,[h_{t-1}, x_t] + b_f) \tag{2.10}$$
$$i_t = \sigma(W_i\,[h_{t-1}, x_t] + b_i) \tag{2.11}$$
$$o_t = \sigma(W_o\,[h_{t-1}, x_t] + b_o) \tag{2.12}$$
$$g_t = \tanh(W_g\,[h_{t-1}, x_t] + b_g) \tag{2.13}$$
$$c_t = f_t * c_{t-1} + i_t * g_t \tag{2.14}$$
$$h_t = o_t * \tanh(c_t) \tag{2.15}$$
where $x_t$ is the input at time step t; $h_t$ is the hidden state at time step t; $c_t$ is
the cell state at time step t; $f_t$, $i_t$, and $o_t$ are the forget, input, and output gates
at time step t, respectively; $g_t$ is the candidate cell state at time step t; $\sigma$ is the
sigmoid activation function; $*$ represents element-wise multiplication; tanh is
the hyperbolic tangent activation function; $W_f$, $W_i$, $W_o$, $W_g$ are weight matrices for
the forget, input, output, and candidate gate, respectively; and $b_f$, $b_i$, $b_o$, $b_g$ are bias terms
for the forget, input, output, and candidate gate, respectively.
The output of the LSTM cell at time step t is the hidden state, ht . This output can
be fed into another layer for further processing or used directly as the forecasted
value. LSTM is a powerful tool for time series forecasting tasks, especially when
dealing with long-term dependencies. Its internal architecture allows it to selectively
retain or discard information from the input, ensuring that relevant information is
propagated through the network while irrelevant information is filtered out [38].
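In practice these cell equations do not have to be coded by hand; a minimal Keras LSTM forecaster along these lines could look as follows (the 36-step window, single feature, and 32 units are illustrative assumptions).

```python
import tensorflow as tf

# Minimal LSTM forecaster: a window of past power values -> next value.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(36, 1)),  # (window length, features)
    tf.keras.layers.LSTM(32),              # returns the final hidden state h_t
    tf.keras.layers.Dense(1),              # forecasted value
])
model.compile(optimizer="adam", loss="mse")
```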
• Autoencoders are a type of neural network commonly used for unsupervised learn-
ing tasks, such as dimensionality reduction and feature extraction. In the context
of time series forecasting, autoencoders can be used to learn a compressed represen-
tation of the time series data, which can then be used to make predictions.
The basics of autoencoders for time series forecasting, along with the corresponding
equations, are as follows:
Encoding: The first step in an autoencoder is to encode the input time series data
into a lower-dimensional representation. This is done using an encoder function,
which maps the input time series x(t) to a lower-dimensional representation z(t)
using a set of learned parameters. The encoder function can be written as:

$$z(t) = f(x(t);\, \theta) \tag{2.16}$$

where f is the encoder function, θ are the learned parameters, and z(t) is the encoded
representation of x(t).
Decoding: The next step is to decode the encoded representation z(t) back into a
reconstruction of the original time series data. This is done using a decoder function,
which maps the encoded representation z(t) back to the original time series x(t).
The decoder function can be written as:

$$\hat{x}(t) = g(z(t);\, \phi) \tag{2.17}$$

where g is the decoder function, ϕ is the learned parameter, and x̂(t) is the recon-
struction of x(t) from z(t).
Loss function: To train the autoencoder, we need a way to measure how well the
reconstructed time series matches the original time series. This is done using a loss
function, which measures the difference between the reconstructed time series x̂(t)
and the original time series x(t); typically the squared reconstruction error is used:

$$L(x, \hat{x}) = \sum_{t} \left(x(t) - \hat{x}(t)\right)^2 \tag{2.18}$$
Forecasting: Once the autoencoder is trained, it can be used for time series fore-
casting by feeding in a sequence of input time steps and predicting the next time
step using the encoded representation. This is done by encoding the input time steps
using the encoder function, and then using the encoded representation to make a
prediction using a linear or nonlinear regression model.
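A hedged Keras sketch of this encode-decode-forecast scheme is given below; the window size, the code dimension, and the linear regression head on the code are illustrative assumptions.

```python
import tensorflow as tf

window, code_dim = 36, 8
inputs = tf.keras.Input(shape=(window,))
z = tf.keras.layers.Dense(code_dim, activation="relu")(inputs)  # encoder f (2.16)
x_hat = tf.keras.layers.Dense(window)(z)                        # decoder g (2.17)

autoencoder = tf.keras.Model(inputs, x_hat)
autoencoder.compile(optimizer="adam", loss="mse")               # reconstruction loss (2.18)

# After training the autoencoder, a regression head on the code z
# forecasts the next time step.
forecast = tf.keras.layers.Dense(1)(z)
forecaster = tf.keras.Model(inputs, forecast)
forecaster.compile(optimizer="adam", loss="mse")
```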
Sky imagers can detect the presence of clouds, estimate their height above the ground, and
calculate their motion velocity. Moreover, the detection of cloud shadows allows sky
imagers to identify sudden changes in solar irradiance [42].
However, it is important to note that sky imagers have a relatively limited prediction
horizon, typically up to 30 minutes. This means their forecasting capability is more
suitable for short-term predictions [41].
The fundamental principle of the persistence model is based on the assumption that the
power output remains relatively constant if weather conditions and external factors remain
unchanged. The forecasting equation for this model is represented as follows:
$$P(t + k \mid t) = \frac{1}{T} \sum_{i=0}^{n-1} P(t - i\,\Delta t) \tag{2.19}$$
In Equation (2.19), the variable k represents the forecast time period, and P (t + k|t)
denotes the predicted power at time t + k based on the available information at time t. T
denotes the duration of the forecast interval, while n represents the number of historical
measurements considered. The term P (t−i∆t) corresponds to the actual power measured
at time t and at time steps i within the forecast interval T. Additionally, ∆t signifies the
time difference between consecutive measurements in the time series.
It is important to note that the persistence model is often used as a benchmark for
evaluating the performance of other forecasting methods. However, as the time horizon
increases, the accuracy of the persistence model decreases significantly, particularly when
climate conditions change over time [1].
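A direct NumPy reading of Equation (2.19), taking T equal to the number of averaged measurements n, is sketched below.

```python
import numpy as np

def persistence_forecast(p_hist, n):
    """Persistence forecast per Equation (2.19): the prediction for any
    horizon is the mean of the last n measurements (most recent last)."""
    return np.mean(p_hist[-n:])
```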
In contrast to persistence forecasting that relies solely on historical data, physical models
take into account the fundamental physics and engineering principles that govern the
performance of solar PV systems. These models consider the dynamic properties of the
atmosphere and can provide forecasts for a time horizon of more than 15 days. They
utilize a set of numerical equations to mathematically describe and model the physical
conditions and their interactions.
$$\frac{\Delta A}{\Delta t} = F(A) \tag{2.20}$$
In Equation (2.20), ∆A denotes the change in the value of the forecasted response at a
specific spatial location, ∆t represents the change in time or temporal horizon, and F (A)
signifies the mathematical function that incorporates the variables influencing the value
of A. The physical model aims to capture the relationships and interactions among these
variables in order to predict the changes in the forecasted response over time.
The selection of a prediction method depends on several factors, including the nature of the problem, the type of data, the desired
prediction accuracy, prediction horizon and computational resources available.
Based on the forecast horizon, solar power generation can be categorized into four groups,
as described in previous studies [42],[11].
Short-term Prediction: For short-term prediction, the forecast horizon typically spans
up to 72 hours into the future. This type of forecasting is useful for tasks like unit
commitment, scheduling, electrical power dispatching, and other operational purposes.
Figure 2.9 provides an overview of the applications of different prediction horizons, show-
casing the suitability of statistical methods, especially Artificial Neural Networks (ANNs),
for solar forecasting tasks.
In their publication [14], Gigoni, Betti, Crisostomi, and their collaborators investigated the
reasons behind the superior accuracy of direct methods over indirect methods in PV solar
forecasting. One of the key factors they identified is that indirect prediction models used in
the estimation of PV output are mere approximations of real-world PV systems, lacking
the ability to capture every possible physical phenomenon that may occur. Moreover,
the physical characteristics of PV systems gradually deteriorate over time, leading to
diminished reliability in the forecasts generated by indirect methods. Consequently, the
research project under consideration focuses on exploring and evaluating direct techniques
for PV solar forecasting.
3| Data and Tools
3.1. Data
The data are sourced from the National Renewable Energy Laboratory Photovoltaic Data
Acquisition (NREL PVDAQ), which is a large-scale time-series database containing sys-
tem metadata and performance data from a variety of experimental PV sites, such as
San Francisco, USA (shown in Figure 3.1), and commercial public PV sites [26].
Photovoltaic field array data are made up of time-series, raw performance data collected
by a number of sensors linked to a PV system. Two datasets are utilized: one ranging
from 2013 to 2018 with a one-minute sampling interval, and the other from 2011 to 2019
with a 15-minute sampling interval. Figure 3.2 displays a selection of rows from the power
dataset, offering contextual information.
Some meteorological data, such as ambient temperature and solar irradiance, was also available.
For some systems, this data was available in different datasets [26]. To provide context,
Figure 3.3 showcases a selection of rows from the weather dataset, where the sampling
rate is also set at one minute.
Furthermore, the data lake offered data for 156 PV systems deployed in the previously
indicated locations. Because there were data discrepancies involving some of the systems,
the System 4 dataset was used for this study, since it had comprehensive data from 2013
to 2018. Moreover, because these units were connected to the grid, a massive amount
of data pertaining to them was also available. The dataset had inconsistencies such as
negative values and extremely high values. These issues will be addressed in Chapter 4.
Jupyter Notebooks are a web-based interactive computing platform that allows users
to create and share documents that contain live code, equations, visualizations, and nar-
rative text. The Jupyter notebook interface consists of cells. Each cell can contain code,
Markdown text, or raw text. Individual cells can be run and their output viewed immediately,
which makes it easy to experiment and iterate.
Matplotlib is a plotting library for the Python programming language. Matplotlib pro-
vides a large library of customizable plot types, including line plots, scatter plots, bar
plots, error bars, histograms, bar charts, pie charts, box plots, violin plots, density plots,
and more.
NumPy is a Python library used for scientific computing and data analysis. It provides a
high-performance multidimensional array object and tools for working with these arrays.
Pandas is a fast, powerful, flexible and easy to use open-source data analysis and data
manipulation library built on top of the Python programming language. It provides data
structures for efficiently storing large datasets and tools for working with them.
TensorFlow is an open-source software library for machine learning and deep learning.
It provides a flexible and powerful platform for building and deploying machine learning
models, allowing users to create and train neural networks for a wide range of tasks such
as image classification, natural language processing, and time-series forecasting.
4| Implementation and Results
To transform the dataset into a usable format, each year’s data is considered. The Date-
Time column is set as the index of the data frame, enabling the processing of datasets
based on the index variable. Each day of the year corresponds to its own dataset, which
may exhibit inconsistencies such as varying numbers of columns. To address these discrep-
ancies, the datasets for each year are combined, and unnecessary columns are removed.
Efficient pre-processing steps were implemented to filter out any solar irradiance values
with missing associated PV power values, as well as PV power records with missing
irradiance data. The detailed steps are as follows:
• During the early and late hours of the day, negative solar radiation values and miss-
ing associated power values are often observed. These occurrences can be attributed
to sensor offsets and inverter failures, respectively. To address this, it is advisable
to set the radiation and PV output values to zero (0) in such instances.
• The absence of solar radiation and output power data during mid-day periods may
be ascribed to malfunctions in solar irradiation sensors, inverters, or network dis-
ruptions. In order to ensure accurate analysis, it is recommended to exclude these
data points from further processing.
An example of how missing values in the data frame are set to zero is depicted in Figure 4.2.
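A hedged pandas sketch of these two rules is given below; the column names 'irradiance' and 'power' and the 06:00-18:00 daylight window are illustrative assumptions (the frame is indexed by the DateTime column, as described above).

```python
import pandas as pd

def clean_pv(df: pd.DataFrame) -> pd.DataFrame:
    """Apply the two cleaning rules; df must have a DatetimeIndex."""
    daylight = (df.index.hour >= 6) & (df.index.hour < 18)

    # Early/late hours: clip negative irradiance and fill missing values with 0.
    night = ~daylight
    df.loc[night, ["irradiance", "power"]] = (
        df.loc[night, ["irradiance", "power"]].clip(lower=0).fillna(0.0)
    )

    # Mid-day gaps: drop records where either signal is missing.
    midday_bad = daylight & (df["irradiance"].isna() | df["power"].isna())
    return df[~midday_bad]
```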
Count: The count refers to the number of records or observations for each attribute.
Mean: The mean represents the average value of each attribute, obtained by summing
all the values and dividing by the count.
Std: The standard deviation (std) indicates the amount of dispersion or variability of
data around the mean. It provides information about the spread of values within the
dataset.
Min: The minimum value is the smallest observed value among the data points for a
given attribute.
25%: The 25th percentile, also known as the lower quartile, divides the data into four
equal parts, with 25% of the data points falling below it.
50%: The 50th percentile represents the median, which is the middle value in a sorted
dataset. It indicates the value that divides the data into two equal halves, revealing
information about the distribution’s skewness.
75%: The 75th percentile, or the upper quartile, indicates the threshold below which 75% of the data points fall.
Max: The maximum value is the largest observed value among the data points for a
given attribute.
Table 4.1 displays the statistical analysis for the dataset corresponding to 2018, and Table 4.2
displays the statistical analysis for the dataset corresponding to 2014-2016, which has been
used for training, testing, and validating the model.
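These per-attribute statistics are exactly what pandas' describe() produces; the snippet below illustrates this on stand-in random data, since the real frame is not reproduced here.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "power": np.random.rand(1000),              # stand-in for the real columns
    "irradiance": np.random.rand(1000) * 1200,
})
print(df.describe())  # count, mean, std, min, 25%, 50%, 75%, max
```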
In terms of days, the PV power output is affected by several factors that cause variations
in the amount of solar radiation reaching the panels. These factors include the position of
the sun in the sky, the presence of clouds, and atmospheric conditions such as humidity
and air pollution. As a result, the PV power output typically follows a daily pattern,
with a peak around midday when the sun is highest in the sky and the least amount of
shading occurs. In this case, strong periodic components with periods of 24 hours and
12 hours have been found.
In terms of years, the PV power output is affected by the seasonal changes in solar
radiation. This is because the sun’s angle in the sky changes throughout the year, which
affects the amount of solar radiation reaching the panels. In general, PV power output is
highest in the summer months when the sun is highest in the sky and days are longest,
and lowest in the winter months when the sun is at its lowest angle and days are shortest.
Figure 4.3 depicts the spectrum of the power signal.
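The 24-hour and 12-hour periodicities can be located as peaks in the magnitude spectrum; a NumPy sketch is given below, assuming an hourly-resampled power series.

```python
import numpy as np

def dominant_periods_hours(power_hourly, top=2):
    """Return the periods (in hours) of the strongest spectral peaks."""
    x = power_hourly - np.mean(power_hourly)           # remove the DC component
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0)             # cycles per hour
    peaks = np.argsort(spectrum[1:])[::-1][:top] + 1   # skip the zero frequency
    return 1.0 / freqs[peaks]                          # e.g. ~[24.0, 12.0]
```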
The correspondence between PV output power and solar irradiance can be clearly ob-
served. Therefore, as solar irradiance levels vary throughout the day depending on the
weather conditions, the output power of a PV system will also fluctuate. Understanding
the correspondence between PV output power and solar irradiance is important for
designing and optimizing PV systems for maximum efficiency and output.
(a) Power and Irradiance (June 2014) (b) Power and Irradiance (January 2015)
Better handling of scale: Normalizing the data helps handle the differences in scale be-
tween different features or variables. This can help prevent some features from dominating
others, which can skew the results and make the model less interpretable.
Improved interpretability: Normalizing the data can make it easier to compare and
interpret the results, especially when comparing variables or features with different scales.
Easier feature engineering: Normalizing the data can make it easier to engineer new
features or to combine existing features, since the data is on the same scale.
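A common choice consistent with these points is min-max scaling to [0, 1]; the sketch below is illustrative, as the exact scaler used is not restated here.

```python
import numpy as np

def minmax_scale(x):
    """Min-max normalization of a 1-D series to the range [0, 1]."""
    x = np.asarray(x, dtype=float)
    return (x - x.min()) / (x.max() - x.min())
```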
Suppose the power time series is $Y_i = Y_1, Y_2, ..., Y_n$, $\hat{Y}_i$ is the predicted time series, and
$i$ indicates time.
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (Y_i - \hat{Y}_i)^2 \tag{4.1}$$
where n is the number of instances in the dataset and the symbol $\sum$ represents the sum.
The MSE measures the average magnitude of the errors in the predictions, and penalizes
large differences more than smaller ones. By squaring the differences, the loss function
ensures that the error is positive and assigns a higher loss value to larger errors. The MSE
is commonly used because it’s easy to compute and differentiable, making it suitable for
optimization with gradient-based methods.
$$MAE = \frac{\sum_{i=1}^{n} |y_i - x_i|}{n} \tag{4.2}$$
where yi is the prediction and xi is the true value. The advantage of using MAE as the loss
function is that it is easy to interpret and understand, as it gives the average magnitude
of the error in the units of the target variable.
$$RMSE = \sqrt{\sum_{i=1}^{n} \frac{(\hat{y}_i - y_i)^2}{n}} \tag{4.3}$$
where yi = y1 , y2 , ..., yn are observed values, ŷi is a predicted time series and i indicates
time. The RMSE penalizes larger differences more heavily than smaller differences, which
makes it suitable for cases where the range of the target variable is large. Moreover, the
RMSE is in the same units as the target variable, which makes it easy to interpret and
compare the results with the actual values.
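Equations (4.1)-(4.3) translate directly into NumPy:

```python
import numpy as np

def mse(y_true, y_pred):                   # Equation (4.1)
    return np.mean((y_true - y_pred) ** 2)

def mae(y_true, y_pred):                   # Equation (4.2)
    return np.mean(np.abs(y_true - y_pred))

def rmse(y_true, y_pred):                  # Equation (4.3)
    return np.sqrt(mse(y_true, y_pred))
```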
In machine learning, the learning rate is a hyperparameter that determines the step size
at each iteration during the optimization process of a neural network. It is a scalar value
that determines the size of the update made to the model’s weights and biases during
training. If the learning rate is too small, the model will take a long time to converge to
the minimum of the loss function, while if it is too large, the model may overshoot the
minimum and fail to converge.
The number of epochs is a hyperparameter that can significantly affect the performance
of a neural network. Too few epochs may result in underfitting, where the model is not
able to learn the underlying patterns in the data, while too many epochs can result in
overfitting, where the model learns to fit the training data too well and performs poorly
on new data.
The learning rate and the number of epochs are varied in this study. Also, in all cases, the power
output 1 hour into the future has been predicted using the past 3 hours of data.
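This framing can be sketched as a sliding-window transformation; with the 15-minute dataset (an assumption of this sketch), 3 hours of past data is 12 input steps and a 1-hour horizon is 4 steps.

```python
import numpy as np

def make_windows(series, window=12, horizon=4):
    """Build (past window -> value `horizon` steps ahead) training pairs."""
    X, y = [], []
    for i in range(len(series) - window - horizon + 1):
        X.append(series[i : i + window])
        y.append(series[i + window + horizon - 1])
    return np.array(X), np.array(y)
```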
Figures 4.6 - 4.10 show the outcomes of hyperparameter adjustment for the LSTM archi-
tecture.
Figure 4.6: Training and Forecasting results, Learning rate = 0.0001, Epochs = 15
Figure 4.7: Training and Forecasting results, Learning rate = 0.001, Epochs = 18
Figure 4.8: Training and Forecasting results, Learning rate = 0.01, Epochs = 10
Figure 4.9: Training and Forecasting results, Learning rate = 0.001 for epoch ≤ 6; Learn-
ing rate = 0.0001 for epoch > 6, Total Epochs = 10
Figure 4.10: Training and Forecasting results, Exponentially Decaying Learning rate;
Initial value = 0.01; Decay steps = 1000 and Decay rate = 0.9, Epochs = 10
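The two non-constant schedules in Figures 4.9 and 4.10 can be expressed in Keras as follows; whether Adam was the underlying optimizer is an assumption of this sketch.

```python
import tensorflow as tf

# Step schedule of Figure 4.9: 0.001 up to epoch 6, then 0.0001.
step_cb = tf.keras.callbacks.LearningRateScheduler(
    lambda epoch, lr: 0.001 if epoch <= 6 else 0.0001
)

# Exponential decay of Figure 4.10: initial 0.01, decay steps 1000, rate 0.9.
decayed = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01, decay_steps=1000, decay_rate=0.9
)
optimizer = tf.keras.optimizers.Adam(learning_rate=decayed)
```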
Comparison
Based on the results in this table, we can see that different combinations of learning rate
and epoch number can significantly affect the performance of a machine learning model:
• Lower learning rates (0.0001 and 0.001) tend to result in better performance on the
test dataset, as measured by lower MAE and RMSE values. However, these lower
learning rates may require more epochs to achieve good performance, as seen in the
higher epoch numbers for the 0.0001 and 0.001 learning rates.
• Exponentially decaying learning rates can also be effective, as seen in the last row
of the table, which achieved favorable test MAE and RMSE.
• Learning rate of 0.001 for Epoch Number ≤ 6 and 0.0001 for Epoch Number > 6
can also be considered as a good combination based on the results in the table. This
setting achieved a test MAE of 0.0500 and a test RMSE of 0.0995, which are among
the lowest values in the table. Additionally, it achieved good performance with only
8 epochs of training, which is fewer than some of the other settings in the table.
It’s worth noting that the best epoch numbers for each learning rate can vary, suggesting
that finding the best hyperparameters requires experimentation and tuning.
One of the key benefits of a shorter PV power output prediction horizon is that it can
lead to greater accuracy in predicting solar power generation. This is due to the fact that
weather conditions and other factors affecting solar power output may change rapidly,
and a shorter prediction horizon allows for more up-to-date information to be integrated
into the forecast. With a shorter horizon, more current meteorological data can be incor-
porated, resulting in more accurate predictions. Another benefit of a narrower prediction
horizon is that it allows for better integration of solar power into the power network. Grid
operators can better manage electricity supply and demand with more precise estimates
of solar power generation over shorter periods.
It is feasible to improve the charging and discharging of energy storage systems by prop-
erly estimating the amount of solar power that will be generated over a shorter duration.
This can help to cut expenses and increase the energy storage system’s efficiency. Lastly,
shorter forecast horizons can aid renewable energy producers in asset management. Reli-
able forecasts of solar power output over shorter durations can assist producers in properly
scheduling maintenance, planning for equipment improvements, and making other critical
operational decisions. To demonstrate the difference in the accuracy achieved for shorter
prediction horizons, a prediction horizon of 15 minutes using data from the past 3 hours
has been taken as an example in this case.
Window Length: Window length refers to the number of time steps that the network
considers when processing a sequence of input data. In practice, the window length can
be chosen based on the length of the input sequences and the desired level of temporal
dependency that the network should capture. A longer window length can capture longer-
term dependencies in the input sequences, but may also require more computational
resources to process. A shorter window length may be more computationally efficient but
may not capture long-term dependencies as effectively.
Prediction Horizon: Prediction horizon refers to the number of time steps into the
future that the network is trained to predict. The prediction horizon is determined by
the output layer of the network, which is typically designed to produce a sequence of
output vectors that correspond to the predicted values for each time step in the future.
In practice, the prediction horizon can be chosen based on the task at hand and the
desired level of accuracy in the predictions.
Figures 4.12 - 4.29 show the outcomes of hyperparameter adjustment for the LSTM ar-
chitecture for short-time prediction.
Figure 4.12: Training and Forecasting results, Learning rate = 0.001, Window Length = 6hr, Prediction Horizon = 1hr
Figure 4.13: Training and Forecasting results, Learning rate = 0.01, Window Length = 6hr, Prediction Horizon = 1hr
Figure 4.14: Training and Forecasting results, Learning rate = 0.001, Window Length = 9hr, Prediction Horizon = 1hr
Figure 4.15: Training and Forecasting results, Learning rate = 0.01, Window Length = 9hr, Prediction Horizon = 1hr
Figure 4.16: Training and Forecasting results, Learning rate = 0.001, Window Length = 12hr, Prediction Horizon = 1hr
Figure 4.17: Training and Forecasting results, Learning rate = 0.01, Window Length = 12hr, Prediction Horizon = 1hr
Figure 4.18: Training and Forecasting results, Learning rate = 0.001, Window Length = 6hr, Prediction Horizon = 2hr
Figure 4.19: Training and Forecasting results, Learning rate = 0.01, Window Length = 6hr, Prediction Horizon = 2hr
Figure 4.20: Training and Forecasting results, Learning rate = 0.001, Window Length = 9hr, Prediction Horizon = 2hr
Figure 4.21: Training and Forecasting results, Learning rate = 0.01, Window Length = 9hr, Prediction Horizon = 2hr
Figure 4.22: Training and Forecasting results, Learning rate = 0.001, Window Length = 12hr, Prediction Horizon = 2hr
Figure 4.23: Training and Forecasting results, Learning rate = 0.01, Window Length = 12hr, Prediction Horizon = 2hr
Figure 4.24: Training and Forecasting results, Learning rate = 0.001, Window Length = 6hr, Prediction Horizon = 3hr
Figure 4.25: Training and Forecasting results, Learning rate = 0.01, Window Length = 6hr, Prediction Horizon = 3hr
Figure 4.26: Training and Forecasting results, Learning rate = 0.001, Window Length = 9hr, Prediction Horizon = 3hr
Figure 4.27: Training and Forecasting results, Learning rate = 0.01, Window Length = 9hr, Prediction Horizon = 3hr
Figure 4.28: Training and Forecasting results, Learning rate = 0.001, Window Length = 12hr, Prediction Horizon = 3hr
Figure 4.29: Training and Forecasting results, Learning rate = 0.01, Window Length = 12hr, Prediction Horizon = 3hr
Comparison
Learning Rate   Window Length (hr)   Prediction Horizon (hr)   Test MAE   Test MSE   Test RMSE
0.001           6                    1                         0.0527     0.0102     0.1011
0.01            6                    1                         0.0596     0.0109     0.1042
0.001           9                    1                         0.0587     0.0109     0.1043
0.01            9                    1                         0.0588     0.0108     0.1040
0.001           12                   1                         0.0624     0.0114     0.1066
0.01            12                   1                         0.0578     0.0108     0.1037
0.001           6                    2                         0.0687     0.0143     0.1194
0.01            6                    2                         0.0734     0.0148     0.1217
0.001           9                    2                         0.0735     0.0149     0.1221
0.01            9                    2                         0.0700     0.0147     0.1213
0.001           12                   2                         0.0732     0.0151     0.1229
0.01            12                   2                         0.0684     0.0146     0.1211
0.001           6                    3                         0.0793     0.0175     0.1323
0.01            6                    3                         0.0801     0.0175     0.1323
0.001           9                    3                         0.0779     0.0174     0.1319
0.01            9                    3                         0.0796     0.0174     0.1318
0.001           12                   3                         0.0848     0.0180     0.1342
0.01            12                   3                         0.0776     0.0172     0.1316
Based on the results in this table, we can see that different combinations of learning
rate, window length and prediction horizon can significantly affect the performance of a
machine learning model:
• The learning rate of 0.01 generally led to better results than the learning rate of
0.001 across most hyperparameter combinations.
• The model’s performance is better for shorter prediction horizons, with the best
results achieved when the horizon is 1 hour. Increasing the prediction horizon
increases the errors as well, which is not surprising, since forecasting farther into
the future is generally more challenging than predicting over a shorter duration.
All these results can be analyzed graphically as shown below in Figure 4.30.
This table provides valuable information for selecting the most effective hyperparameter
configuration when using this particular model for a time-series forecasting task, and it
offers insight into which hyperparameters are most influential for achieving the best
performance. The results suggest that the model may perform better with larger
window lengths and smaller prediction horizons, i.e., that it may be better at making
short-term predictions than longer-term forecasts.
Simpler neural networks are more interpretable, making it easier to understand how the network
makes its predictions. They are also less prone to overfitting, which occurs when the
network learns to memorize the training data instead of generalizing to new data. Simpler
neural networks require less data to train effectively and are easier to deploy in production
due to their low computational requirements.
However, it’s important to note that simpler neural networks may not always perform
as well as LSTM networks, especially in tasks that require processing sequential data
or long-term dependencies. In such cases, LSTM networks may be the better choice.
Ultimately, the choice of neural network architecture depends on the specific task and
available resources.
Figure 4.31: Convolutional Neural Network (CNN), Window Length = 12hr, Prediction
Horizon = 1hr
Figure 4.33: Gated Recurrent Unit (GRU), Window Length = 12hr, Prediction Horizon
= 1hr
Comparison
Table 4.5 summarizes results for short-time prediction to compare simpler networks with
LSTM.
Network                              Window Length (hr)   Prediction Horizon (hr)   Test MAE   Test MSE   Test RMSE
Convolutional Neural Network (CNN)   12                   1                         0.0649     0.0134     0.1174
Autoencoders                         12                   1                         0.0534     0.0109     0.1162
Gated Recurrent Unit (GRU)           12                   1                         0.0498     0.0104     0.1159
Long Short-Term Memory (LSTM)        12                   1                         0.0463     0.0101     0.1025
Four different neural network models were trained and tested on the time series dataset.
The results show that the LSTM model performed the best among the four models, as it
had the lowest MAE, MSE, and RMSE, with values
the best among the four models, as it had the lowest MAE, MSE, and RMSE, with values
of 0.0463, 0.0101, and 0.1025, respectively. The next best-performing model was the
GRU, and the CNN model had the highest MAE, MSE, and RMSE values among the
four models. The results are graphically represented in Figure 4.34.
Overall, the results suggest that the LSTM and GRU models are the most effective for
predicting future values in this time series dataset, while the Autoencoder model also
performs well, but the CNN model is less effective for this specific problem.
4.5. Ageing
Solar panels are known to degrade over time due to exposure to the environment, temper-
ature variations, and other factors. This degradation is commonly referred to as "ageing"
and can have a significant impact on the power output of the solar panels. As solar panels
age, their efficiency in converting sunlight into electricity decreases gradually. This is be-
cause the materials used in solar panels, such as silicon, can deteriorate over time and lose
their ability to absorb light and generate electrical current [33]. The rate of degradation
depends on various factors such as the quality of the materials used, the design of the
panel, and the operating conditions.
Exposure to the elements: Solar panels are typically installed outdoors and are ex-
posed to various weather conditions such as rain, snow, wind, and hail. These environ-
mental factors can cause wear and tear on the solar panels, leading to cracks, corrosion,
and other forms of damage.
UV radiation: Solar panels are exposed to sunlight, which contains ultraviolet (UV)
radiation that can cause the panel’s materials to degrade over time. The UV radiation
can break down the encapsulant (like Ethylene-vinyl acetate, EVA) that holds the solar
cells in place and cause yellowing of the panel surface, which can reduce the panel’s
efficiency.
Humidity: High humidity levels can cause moisture to seep into the solar panel, which
can lead to corrosion of the metal parts and damage to the electrical connections.
Manufacturing defects: Sometimes, solar panels may have manufacturing defects that
can cause premature aging. For instance, if the panel’s cells are not properly soldered, it
can cause hotspots that can reduce the panel’s efficiency.
Dust and debris: Solar panels can collect dust and debris over time, which can reduce
the amount of sunlight that reaches the cells. This can cause the panel’s efficiency to
decrease over time.
The power output of solar panels typically decreases by around 0.5% to 1% per year.
This means that a solar panel that originally produced 200 watts of power may produce
only 180 watts after ten years of use. The degradation rate can vary depending on the
specific panel and the environmental conditions it is exposed to. Regular maintenance
and cleaning can help extend the lifespan of solar panels and minimize the impact of
ageing on their performance.
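A quick check of the figures above, modelling degradation as a constant yearly rate:

```python
# 200 W panel degrading at ~1% per year for ten years.
p0, rate, years = 200.0, 0.01, 10
p_aged = p0 * (1 - rate) ** years
print(round(p_aged, 1))  # ~180.9 W, consistent with the ~180 W cited above
```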
Figure 4.36 shows the learning curve, and Figures 4.37 - 4.41 show simulation results for
different days in the future.
Figure 4.37: (a) Simulation Results, July 2016; (b) Simulation Results, October 2016
Figure 4.38: (a) Simulation Results, July 2018; (b) Simulation Results, August 2018
Figure 4.39: (a) Simulation Results, March 2019; (b) Simulation Results, April 2019
Figure 4.40: (a) Simulation Results, April 2020; (b) Simulation Results, June 2020
Figure 4.41: (a) Simulation Results, September 2021; (b) Simulation Results, November 2021
Observations
Table 4.6 summarizes the changing difference between actual and predicted values over
time due to ageing.
Prediction Month, Year   View span (days)   Window Length (hr)   Prediction Horizon (hr)   Avg. difference (actual vs. predicted)
July, 2016               2                  6                    0.5                       0.0264
October, 2016            2                  6                    0.5                       0.0265
July, 2018               2                  6                    0.5                       0.0464
August, 2018             2                  6                    0.5                       0.0477
March, 2019              2                  6                    0.5                       0.0489
April, 2019              2                  6                    0.5                       0.0491
April, 2020              2                  6                    0.5                       0.0273
June, 2020               2                  6                    0.5                       0.0534
September, 2021          2                  6                    0.5                       0.0557
November, 2021           2                  6                    0.5                       0.0562

Table 4.6: Changing difference between actual and predicted values due to ageing
The table summarizes the effect of panel ageing on the accuracy of predicted values. It
shows that as the time passes, the difference between actual and predicted values increases,
indicating that the accuracy of the prediction model decreases due to panel ageing. The
table can be useful in understanding the limitations of the prediction model and making
decisions based on the predicted values.
Also, there is an inconsistency in the results for April 2020 compared to the surrounding
time points. A possible reason for this inconsistency is that there was a change in the
conditions of the panels during that time period, which could have affected the accuracy
of the predictions. For example, there could have been a temporary improvement in the
environmental conditions, or some maintenance work might have been done on the panels
during that time period.
It is clear that DC power and solar irradiance follow an almost fixed pattern over seasons
and years.
Figure 4.43 shows the relationship between Power output and solar irradiance, and the
data removed during preprocessing.
(a) July 2012 - July 2014 (b) January 2016 - January 2017
Figure 4.43: Relationship between Power and Solar Irradiance, and the data removed
during pre-processing
Prediction Month, Year   View span (days)   Window Length (hr)   Prediction Horizon (hr)   Avg. difference (actual vs. predicted)
January, 2015            10                 90                   0.5                       0.1150
February, 2015           10                 90                   0.5                       0.1283
March, 2015              10                 90                   0.5                       0.1455
April, 2015              10                 90                   0.5                       0.1711
May, 2015                10                 90                   0.5                       0.1609
June, 2015               10                 90                   0.5                       0.1530
July, 2015               10                 90                   0.5                       0.1473
August, 2015             10                 90                   0.5                       0.1411
September, 2015          10                 90                   0.5                       0.1503
October, 2015            10                 90                   0.5                       0.1353
November, 2015           10                 90                   0.5                       0.1342

Table 4.7: Difference between actual and predicted values over time
The corresponding simulation results are shown in Figure 4.44 - 4.48. The results are also
graphically represented in Figure 4.49.
The table spans from January 2015 to December 2019, and for each month within that
time frame, the table provides the prediction view time span (number of days), window
length and prediction horizon (in hours), and the average difference between the actual
and predicted values over the time span. Based on the table, we can infer that the accuracy of
the predictions varies over time. The average difference between the actual and predicted
values is quite small for some months (e.g., January 2015, January 2016), while for other
months, the difference is relatively large (e.g., April 2015, April 2018).
Figure 4.49: Graphical representation of difference between actual and predicted values
over months
There are many factors that can contribute to the difference between actual and pre-
dicted values in different months. Some of the factors that can influence the accuracy of
predictions include changes in external factors, such as weather or maintenance patterns.
Improving prediction accuracy by clustering the dataset for different weather conditions
is discussed in detail in section 4.6.2.
To analyze ageing, a scatter plot corresponding to each year along with the interpolating
line is shown in Figure 4.51.
Table 4.8: Slope, intercept and R-value corresponding to the plots in Figure 4.51
Figure 4.51: Scatter plot of irradiance vs power from 2011-2019 with the interpolating
line
The graph and table represent the results of a linear regression analysis performed on a
time series dataset consisting of 9 years of observations of solar irradiance and power. The
slope, intercept, and R-value are calculated for each year to describe the linear relationship
between the two variables. The slope represents the rate of change in power for a unit
change in irradiance, while the intercept represents the estimated value of power when
irradiance is zero. The R-value, also known as the correlation coefficient, is a measure
of the strength and direction of the linear relationship between the two variables. An
R-value of 1 indicates a perfect positive correlation, while an R-value of -1 indicates a
perfect negative correlation. An R-value of 0 indicates no correlation between the two
variables.
The R-values, ranging from 0.9806 to 0.9863, indicate a strong positive linear correlation
between the two variables: as irradiance increases, power output also increases, pointing
to a direct relationship between the amount of solar radiation received and the power output.
The slope values for each year, ranging from 0.9419 to 0.9661, indicate that the rate of
change in power output per unit change in irradiance is relatively consistent over time.
This suggests that the relationship between irradiance and power output is stable and
reliable. The intercept and slope values for each year are not identical, indicating that
there are yearly differences in the relationship between irradiance and power output.
This could be due to various factors such as changes in weather patterns, maintenance of
equipment, or modifications to the system.
If panel aging occurred, it could potentially affect the slope values over time, as the
performance of solar panels can decline with age due to factors such as weathering and
material degradation.
Based on the data in the table, there is clear evidence of a consistent decrease in slope
values over the 9-year period covered by the data. While there is some variability in slope
values from year to year, the overall trend indicates a decrease in slope values over time.
over time. Factors such as maintenance practices, changes in environmental conditions,
and changes in technology could also potentially affect the performance of solar panels
over time.
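The per-year fits behind Table 4.8 can be reproduced with SciPy's linregress; the snippet below uses synthetic data in place of one year's measurements, so the coefficients are illustrative only.

```python
import numpy as np
from scipy import stats

irradiance = np.linspace(0, 1200, 500)                    # W/m^2
power = 0.95 * irradiance + np.random.normal(0, 20, 500)  # stand-in data
fit = stats.linregress(irradiance, power)
print(fit.slope, fit.intercept, fit.rvalue)               # slope, intercept, R-value
```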
To analyze ageing and the relationship between irradiance and power, a scatter plot corre-
sponding to each year, along with the quadratic interpolation, is shown in Figure 4.53.
The quadratic term (ax2 ) represents the curvature of the relationship between irradiance
and power output. In the case of these equations, a is negative, which means that the
relationship between irradiance and power is "inverted U-shaped" or "downward-sloping".
This implies that the power output initially increases as the irradiance level increases,
but eventually reaches a maximum and then decreases as the irradiance level continues
to increase.
The magnitude of the coefficient of x² is also important. The values provided range from -0.00014 to -0.00018, which means that the curvature of the relationship is relatively small.
In other words, the maximum power output is not very far from the point where the rate
of increase in power output begins to slow down. The linear term (bx) represents the rate
of change in power output with respect to irradiance. In the case of these equations, b is
generally positive but relatively small in magnitude (between 1.08 and 1.13). This means
that the power output increases as the irradiance level increases, but not very rapidly.
The constant term (c) represents the power output when the irradiance level is zero.
In other words, it represents the "baseline" power output of the PV panel. The values
provided range from -1.91 to -0.41, which means that the baseline power output is negative
but relatively small in magnitude.
The quadratic equations provided suggest that the power output of PV panels increases
as the irradiance level increases, but eventually reaches a maximum and then decreases.
The rate of increase in power output is not very rapid, and the baseline power output
is relatively small. These equations can be used to model the performance of PV panels
over time and optimize their operation.
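As a sketch of how such fits can be obtained, numpy's polyfit can estimate the per-year coefficients and evaluate the fitted curve at a fixed irradiance; the DataFrame df and the column names are assumptions:
import numpy as np

G_FIXED = 1200  # W/m², the fixed irradiance used in Figure 4.54

# Minimal sketch: per-year quadratic fit P = a*G² + b*G + c
for year, group in df.groupby(df.index.year):
    a, b, c = np.polyfit(group['poa_irradiance__2679'],
                         group['dc_power_hw__2694'], deg=2)
    p_fixed = np.polyval([a, b, c], G_FIXED)
    print(f"{year}: a={a:.2e}, b={b:.3f}, c={c:.2f}, P({G_FIXED})={p_fixed:.1f}")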
Moreover, to examine the ageing, the yearly PV power output obtained from the quadratic fitting is plotted in Figure 4.54 for a fixed irradiance value (1200 W/m²).
Figure 4.54: Power output each year with quadratic interpolation, with irradiance = 1200 W/m²
The output power gradually decreases each year for the same value of irradiance, as was also observed with the linear interpolation. This indicates that a quadratic fit may provide a more suitable description of the correlation between irradiance and power, and can also aid in assessing the effects of ageing.
The sigma value in a Gaussian filter determines the width of the Gaussian distribution
used to smooth the data. Specifically, it determines the standard deviation of the Gaussian
distribution. A larger sigma value will result in a wider distribution and a smoother out-
put. When the sigma value is small, the Gaussian filter only smooths out high-frequency noise in the data, leaving the low-frequency information intact. However, as the sigma value increases, the filter begins to smooth out lower-frequency information as well, resulting in a loss of detail. A sketch of how such a filter can be applied is given at the end of this discussion.
Therefore, the choice of sigma value depends on the nature of the data and the degree of
smoothing required. If the data contains a lot of high-frequency noise, a smaller sigma
value may be appropriate. On the other hand, if the data is relatively smooth and we
want to remove outliers or reduce noise without losing important details, a larger sigma
value may be more appropriate.
It is also important to note that excessively large sigma values can result in oversmoothing and the loss of important features in the data. Therefore, after experimenting with different sigma values to find the optimal value for this specific dataset, sigma = 1.5 was used.
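A minimal sketch of applying such a filter with scipy is given below; the DataFrame df and the column names are assumptions:
from scipy.ndimage import gaussian_filter1d

SIGMA = 1.5  # the value selected above

# Minimal sketch: Gaussian smoothing of the power and irradiance series.
df['dc_power_filtered'] = gaussian_filter1d(
    df['dc_power_hw__2694'].values, sigma=SIGMA)
df['irradiance_filtered'] = gaussian_filter1d(
    df['poa_irradiance__2679'].values, sigma=SIGMA)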
The scatter plot before and after applying this filter over the whole dataset is shown in
Figure 4.56.
The application of a Gaussian filter has effectively reduced outliers and noise within the
dataset, as shown in the figures.
To analyze ageing and relationship between irradiance and power after applying the filter
and removing the noise significantly, a scatter plot corresponding to each year along with
the quadratic interpolation is shown in Figure 4.57.
Figure 4.57: Scatter plot of Irradiance vs Power with Gaussian filter and quadratic interpolation
The quadratic equation for each year after applying the filter is given in Table 4.10.
Moreover, to understand the effect of applying the filter on the ageing analysis, the yearly PV power output obtained from the quadratic fitting is plotted in Figure 4.58 for a fixed irradiance value (1200 W/m²).
Figure 4.58: Power output each year with Gaussian filter and quadratic interpolation, with irradiance = 1200 W/m²
After applying the filter, the quadratic interpolation of the output power at constant irradiance still reveals a gradual annual decrease. Although the filter has successfully removed noise from the dataset, the persistence of this downward trend suggests that the observed ageing effect is genuine rather than an artifact of noise.
In the LSTNet architecture, a CNN encoder is used to extract local patterns from the input time-series data, the RNN decoder is used to model the temporal dependencies, and the skip-connection network is used to help preserve the information from the input directly to the output. The basic architecture of LSTNet is shown in Figure 4.59.
CNN Encoder: The input time-series data is first passed through a CNN encoder, which
extracts local patterns from the input time-series data. The encoder consists of a series
of convolutional layers, which are followed by a max-pooling layer. The output of the
encoder is a sequence of feature maps, which represent the extracted local patterns.
RNN Decoder: The sequence of feature maps from the CNN encoder is then fed into an
RNN decoder, which models the temporal dependencies between the extracted features.
The RNN decoder is typically a type of gated RNN, such as an LSTM or GRU, which
is capable of modeling long-term dependencies. The output of the RNN decoder at each
time step is given by:

h_t = f(h_{t-1}, y_{t-1})

where h_t is the hidden state at time t, h_{t-1} is the hidden state at the previous time step, and y_{t-1} is the output at the previous time step. f is the recurrent function, which is defined as:

f(h_{t-1}, y_{t-1}) = LSTM(h_{t-1}, y_{t-1})    (4.5)

The final prediction is obtained by combining the decoder state with the skip-connection output through

ŷ_t = g(h_t, s_t)

where ŷ_t is the predicted value at time t, h_t is the hidden state of the RNN decoder at time t, and s_t is the skip-connection output at time t. g is a fully connected layer, which is used to combine the two inputs.
Overall, LSTNet is a powerful and effective model for time-series forecasting, capable of capturing both long-term and short-term dependencies in the data.
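As an illustration of this architecture, the following is a highly simplified Keras sketch; the layer sizes, window length, and number of features are assumptions rather than the settings used in this work, and the skip path is reduced to a simple linear layer on the flattened input, standing in for LSTNet's autoregressive component:
import tensorflow as tf

window_len, num_features = 60, 5  # assumed input shape

inputs = tf.keras.Input(shape=(window_len, num_features))

# CNN encoder: extract local patterns, then downsample with max pooling
x = tf.keras.layers.Conv1D(32, kernel_size=6, activation='relu')(inputs)
x = tf.keras.layers.MaxPooling1D(pool_size=2)(x)

# RNN decoder: model temporal dependencies between the extracted features
h = tf.keras.layers.LSTM(64)(x)

# Skip connection: a linear path from the raw input to the output
s = tf.keras.layers.Flatten()(inputs)
s = tf.keras.layers.Dense(16, activation='linear')(s)

# Fully connected layer g combines the decoder state and the skip output
merged = tf.keras.layers.Concatenate()([h, s])
outputs = tf.keras.layers.Dense(1, activation='linear')(merged)

model = tf.keras.Model(inputs, outputs)
model.compile(loss='mse', optimizer='adam')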
The comparison between simulation results of LSTNet and LSTM as a function of test
set loss are shown in Table 4.11.
LSTNet generally exhibits slightly better performance compared to LSTM. Across most
years, LSTNet consistently achieves lower test loss values, indicating its superior predictive
capability. However, there are instances where LSTM outperforms LSTNet: in March 2015 and March 2018, LSTM achieved marginally lower test loss values than LSTNet. This suggests that LSTM captured the underlying patterns and dependencies in the data better during those particular periods. While
the differences in test loss values between the models are relatively small, these instances
indicate that LSTM can occasionally exhibit competitive performance with LSTNet. The choice between the two models may depend on specific requirements, such as the dataset characteristics or the desired balance between accuracy and computational efficiency. Further analysis and experimentation are necessary to gain a comprehensive understanding of the models’ relative strengths and weaknesses.
Preprocess the data: Before clustering the dataset, it is important to preprocess the
data by normalizing it and removing any outliers or missing values.
Select a clustering algorithm: There are many different clustering methods to choose
from. The choice of the method will depend on the size of the dataset, the desired number
of clusters, and the computational resources available.
Cluster the data: Once the algorithm has been defined, the dataset can be clustered
using the selected algorithm. The output of this step will be a set of clusters, each
containing days with similar trends in PV power generation.
Train a separate model for each cluster: Finally, a separate model can be trained
for each cluster using the days within that cluster. This can help to improve the accuracy
of the models, since they will be trained on data with lower variability.
To address the challenge of uncertainty in power values on a day-to-day basis, the dataset
is divided into two categories: sunny days and overcast days. Each category is then
trained separately using dedicated models. This approach involves the use of two distinct
models, one specifically trained for sunny days and another for overcast days. The model
trained specifically on sunny days is employed to forecast power values under sunny con-
ditions, whereas the model trained on overcast days is utilized for power prediction during
overcast conditions. By grouping days with similar power production characteristics into
the same category, the variability within each set is reduced. This clustering approach
allows for a more targeted and accurate prediction by tailoring the models to specific
weather conditions.
Daily mean irradiance is used to split the dataset into cloudy and sunny days. The daily mean irradiance for each day is calculated by averaging the irradiance values over that day. Then, a threshold value is determined that serves as the cutoff between cloudy and sunny days; this threshold depends on the specific dataset.
In order to determine the most suitable threshold value, various thresholds were evaluated.
The daily mean irradiance value for each day was compared to the threshold value. If
the irradiance value exceeded the threshold, the day was classified as sunny. Conversely,
if the irradiance value fell below the threshold, the day was classified as cloudy. This
classification process resulted in the creation of a new dataset where days were grouped
into either sunny or cloudy categories based on the threshold comparison.
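A minimal sketch of this split, assuming a pandas DataFrame df with a datetime index and the column names used in the appendix, is shown below; the threshold of 250 W/m² is the value used in the simulations that follow:
import pandas as pd

THRESHOLD = 250  # W/m², cutoff between cloudy and sunny days

# Minimal sketch: classify each day by its mean irradiance.
daily_mean = df['poa_irradiance__2679'].resample('D').mean()
sunny_days = daily_mean.index[daily_mean >= THRESHOLD]
cloudy_days = daily_mean.index[daily_mean < THRESHOLD]

sunny_df = df[df.index.normalize().isin(sunny_days)]
cloudy_df = df[df.index.normalize().isin(cloudy_days)]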
Simulation Results
The simulation results for cloudy and sunny days, with a threshold value of mean daily irradiance equal to 250 W/m², are shown in Figures 4.62 and 4.63.
Figure 4.62: (a) June 2016, MAE = 0.0324; (b) August 2016, MAE = 0.0371
Figure 4.63: (a) November 2017, MAE = 0.0621; (b) April 2018, MAE = 0.0779
The clustering of the dataset has been shown to effectively reduce prediction errors. It is worth noting that the predictive accuracy is significantly lower for overcast days than for sunny days. One possible explanation for this disparity is the substantial variability in power values within the overcast dataset, whereas the sunny dataset tends to exhibit more consistent patterns. Consequently, training models on the sunny dataset leads to more consistent and reliable forecasts.
As a result, during cloudy weather conditions, model performance tends to suffer due to the inherent variability in both power and irradiance values within the dataset, making the prediction task more challenging.
The performance of utilizing a model trained on one weather condition (sunny or overcast)
for predicting power values on the opposite condition (overcast or sunny, respectively) can
also be assessed.
Figures 4.64 - 4.65 show the simulation results of employing a model trained on one weather condition for predicting power values on the opposite condition.
Figure 4.64: (a) January 2016, MAE = 0.0293; (b) March 2016, MAE = 0.0378
Figure 4.65: (a) January 2017, MAE = 0.0736; (b) February 2017, MAE = 0.0854
The results demonstrate that using a model trained on the sunny dataset to predict power values on overcast days led to a higher MAE than its performance on sunny days. Similarly, using a model trained on the overcast dataset for power prediction on sunny days resulted in a higher MAE than its performance on overcast days.
These findings indicate that there are notable differences in power production charac-
teristics between sunny and overcast conditions. The models trained on their respective
datasets have learned specific patterns and relationships relevant to the corresponding
weather conditions. When applied to opposite weather conditions, the models struggled
to capture the nuanced dynamics, leading to decreased performance.
In conclusion, utilizing a model trained on one weather condition to predict power values
on the opposite condition resulted in decreased performance compared to using the models
on their original datasets. These findings highlight the importance of tailoring models to
specific weather conditions for accurate power predictions. Therefore, it is recommended
to employ separate models trained specifically for sunny and overcast conditions to address
the challenge of uncertainty in power values on a day-to-day basis effectively.
5| Conclusions and Future Developments
Simulations were conducted to evaluate the performance of the proposed models and determine the most accurate prediction. The hyperparameters of the models, such as the learning rate, number of epochs, and window length, were fine-tuned to identify the set of parameters yielding the best performance, while different prediction horizons were examined using two datasets with sampling intervals of 1 minute and 15 minutes. The results of the simulations demonstrated that the LSTM model outperformed all other networks in terms of prediction accuracy. The findings also revealed a notable decrease in prediction accuracy as the prediction horizon increased, reflecting the increased challenge of forecasting over longer timeframes.
The study also involved a meticulous investigation into the aging of PV systems. Two
methods were employed to analyze aging effects. Firstly, data was collected from PV
panels that were installed some time ago (referred to as "old data" relative to the present
day), and the model’s performance was compared with more recent data. It was expected
that predictions following the data used for model creation would be more accurate, while
predictions further into the future would overestimate actual data. Consequently, the dif-
ference between predicted and actual values over multiple years was examined to observe
the impact of aging. The results demonstrated an increasing disparity between predicted
and actual values over time, indicating a decrease in prediction model accuracy due to
panel aging. The second method employed in the aging analysis involved calculating the
slope of the scatter plot between irradiance and power over different years. The results
indicated a gradual decrease in the output power for each year, despite the irradiance
remaining constant. This observation further supports the notion of aging affecting the
performance of PV systems.
Furthermore, a clustering approach was employed to classify the dataset into sunny and
cloudy days. Individual prediction models were then developed for each category, leading
to an improvement in prediction accuracy. Through training and testing the prediction
model on separate datasets that represent distinct climate conditions, it was observed
that locations with more consistent and stable sunny weather conditions, with fewer
occurrences of cloudy days throughout the year, achieved higher prediction accuracy. This
finding can be attributed to the increased variability in power and solar irradiance within
datasets characterized by unstable weather conditions, which poses a greater challenge
for accurate predictions.
Therefore, the results suggest that the LSTM architecture is the most effective model for accurately predicting power generation from a solar PV system. The research
findings indicate that increasing the prediction horizon poses a challenge to forecasting
accuracy, highlighting the importance of considering shorter prediction intervals for more
precise predictions. Furthermore, the study reveals that longer window lengths contribute
to improved prediction accuracy, emphasizing the significance of selecting appropriate
window lengths during model training. Additionally, the analysis of PV system aging
provides valuable insights into the deterioration of prediction accuracy over time, as the
difference between actual and predicted values increases with panel aging. This suggests
the need for periodic recalibration or retraining of the prediction model to account for the
changing characteristics of aging PV panels.
Future implications of this research are significant for the optimization and performance
enhancement of solar power systems. The identified superior performance of LSTM ar-
chitecture in power output prediction can serve as a benchmark for future studies and
industry applications. These findings can guide the development of more accurate and
reliable prediction models, leading to improved operational efficiency and better resource
planning for solar energy installations. The understanding that prediction accuracy de-
creases with panel aging highlights the importance of monitoring and maintenance prac-
tices in ensuring the long-term viability and productivity of PV systems. Moreover, the
analysis of climate conditions and its impact on prediction accuracy provides insights
into the challenges associated with unstable weather patterns and emphasizes the need
for tailored prediction models that account for regional variations. Overall, the research
outcomes contribute to the advancement of renewable energy technologies and pave the
way for future developments in solar power forecasting and system optimization.
Bibliography
[1] R. Ahmed, V. Sreeram, Y. Mishra, and M. Arif. A review and evaluation of the state-of-the-art in PV solar power forecasting: Techniques and optimization. Renewable and Sustainable Energy Reviews, 124, 2020. doi: 10.1016/j.rser.2020.109792.
[2] A. Aussem and F. Murtagh. Dynamical recurrent neural networks — towards envi-
ronmental time series prediction. International Journal of Neural Systems, 6:145–170,
1995. doi: 10.1142/s0129065795000123.
[4] Y. Bengio, P. Simard, and P. Frasconi. Learning long-term dependencies with gra-
dient descent is difficult. IEEE transactions on neural networks, 5(2):157–166, 1994.
doi: 10.1109/72.279181.
[5] S. Bhatti and H. Manzoor. Machine learning for accelerating the discovery of high
performance low-cost solar cells: a systematic review. arXiv preprint, 2022. doi:
10.48550/arXiv.2212.13893.
[6] D. T. Bui. Hybrid intelligent model based on least squares support vector regression
and artificial bee colony optimization for time-series modeling and forecasting hori-
zontal displacement of hydropower dam. Handbook of neural computation, Academic
Press, pages 279–293, 2017.
[8] T. Bui, V. Nhu, and N. Hoang. Prediction of soil compression coefficient for urban
housing project using novel integration machine learning approach of swarm intelli-
gence and multi-layer perceptron neural network. Adv Eng Inf, 38:144–152, 2018.
[9] C. Chen, S. Duan, T. Cai, and B. Liu. Online 24-h solar power forecasting based
on weather type classification using artificial neural network. Solar energy, 85(11):
2856–2870, 2011. doi: 10.1016/j.solener.2011.08.027.
[10] J. Chung, C. Gulcehre, K. Cho, and Y. Bengio. Empirical evaluation of gated recur-
rent neural networks on sequence modeling. arXiv preprint, 2014.
[11] U. K. Das and K. S. Tey. Forecasting of photovoltaic power generation and model
optimization: A review. Renewable and Sustainable Energy Reviews, 81:912–928,
2018. doi: 10.1016/j.rser.2017.08.017.
[15] I. Goodfellow, Y. Bengio, and A. Courville. Deep Learning. MIT press, 2016.
[16] A. Graves. Long short-term memory. Supervised sequence labeling with recurrent
neural networks, pages 37–45, 2012. doi: 10.1007/978-3-642-24797-2_2.
[17] H. Guang and Z. Qin. Extreme learning machine: A new learning scheme of feed-
forward neural networks. IEEE International Conference on Neural Networks, 2:
985–990, 2004. doi: 10.1109/IJCNN.2004.1380068.
[18] M. A. Hall and L. A. Smith. Feature selection for machine learning: comparing a
correlation-based filter approach to the wrapper. FLAIRS conference, 1999:235–239,
1999.
[19] K. He, X. Zhang, S. Ren, and J. Sun. Deep residual learning for image recognition.
Proceedings of the IEEE conference on computer vision and pattern recognition, pages
770–778, 2016.
[20] T. Hegazy, P. Fazio, and O. Moselhi. Developing practical neural network applications
using back-propagation. Sol Energy, 9:145–159, 1994.
[23] G. Lai, W. Chang, Y. Yang, and H. Liu. Modeling long- and short-term temporal patterns with deep neural networks. arXiv preprint, 2017. doi: 10.1145/3209978.3210006.
[25] H. Murdock and D. Gibb. Renewables 2022 global status report. Technical report,
United Nations Environment Programme, 2022.
[26] NREL. Solar power data for studies, 2022. URL https://data.openei.org/
submissions/4568.
[28] N. Panwar, S. Kaushik, and S. Kothari. Role of renewable energy sources in en-
vironmental protection: A review. Renewable and Sustainable Energy Reviews, 15:
1513–1524, 2011. doi: 10.1016/j.rser.2010.11.037.
[29] B. Pham, T. Bui, I. Prakash, and M. Dholakia. Hybrid integration of multilayer per-
ceptron neural networks and machine learning ensembles for landslide susceptibility
assessment at himalayan area (india) using gis. Catena, 149:52–63, 2017.
[33] S. Santos, N. Torres, and R. Lameirinhas. The impact of aging of solar cells on the
performance of photovoltaic panels. Energy Conversion and Management, 10:82–100,
2021. doi: 10.1016/j.ecmx.2021.100082.
[34] G. Sbrana and A. Silvestrini. Random switching exponential smoothing and inventory
forecasting. Int J Prod Econ, 156:283–294, 2014.
[35] M. Sipper. High per parameter: A large-scale study of hyperparameter tuning for
machine learning algorithms. Algorithms, 315, 2022. doi: 10.3390/a15090315.
[36] M. Steurer, R. Hill, and N. Pfeifer. Metrics for evaluating the performance of machine
learning based automated valuation models. Journal of Property Research, 38:99–129,
2021. doi: 10.1080/09599916.2020.1858937.
[38] M. Sundermeyer, R. Schluter, and H. Ney. LSTM neural networks for language modeling. Thirteenth annual conference of the International Speech Communication Association, 2012.
[39] T. Talakoobi. Solar power forecast using artificial neural network techniques. Mas-
ter’s thesis, Politecnico di Torino, 10 2020. Department of Control and Computer
Engineering.
[40] L. F. Tratar and E. Strmcnik. The comparison of holt–winters method and multiple
regression method: a case study. Energies, 109:266–276, 2016.
[41] A. Tuohy, J. Zack, S. E. Haupt, and J. Sharp. Solar forecasting: methods, challenges,
and performance. IEEE Power and Energy Magazine, 13(6):50–59, 2015.
[42] C. Wan, J. Zhao, Y. Song, Z. Xu, J. Lin, and Z. Hu. Photovoltaic and solar power
forecasting for smart grid energy management. CSEE Journal of Power and Energy
Systems, 1(4):38–46, 2015. doi: 10.17775/CSEEJPES.2015.00046.
6| Appendix
Data Analysis and LSTM Algorithm
import os
import datetime
import IPython
import IPython.display
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.dates as mdates
import numpy as np
import pandas as pd
import seaborn as sns
import tensorflow as tf
!pip install -q tensorflow-model-optimization
import tensorflow_model_optimization as tfmot
mpl.rcParams['figure.figsize'] = (8, 6)
mpl.rcParams['axes.grid'] = False
df = pd.concat(map(pd.read_csv, ["contentSystem34Combined.csv"]))
print(df)
This block of code imports various libraries that are commonly used for data analysis,
visualization, and machine learning. Some of the libraries imported include os (operating
system interactions), datetime (date and time operations), matplotlib and seaborn (plot-
ting libraries), numpy (numerical computing library), pandas (data manipulation library),
tensorflow (machine learning framework), and tensorflow-model-optimization (library for
model optimization). The next line reads the CSV file "contentSystem34Combined.csv" using pd.read_csv inside pd.concat. The resulting DataFrame is then assigned to the variable df.
df['measured_on'] = pd.to_datetime(df['measured_on'],
infer_datetime_format=True)
df.set_index('measured_on', inplace=True)
df.describe()
First, the code preprocesses the data by converting the 'measured_on' column to datetime format and setting it as the index of the DataFrame. This helps in working with time-series data efficiently. Next, the code generates descriptive statistics for the DataFrame using the describe method. This provides an overview of the numerical columns in the DataFrame, including count, mean, standard deviation, minimum, maximum, and quartiles. These statistics help in understanding the distribution and characteristics of the data. Finally, the code defines a function called plot_range which takes two parameters: start and end. This function allows a specific range of data from the DataFrame to be plotted. Inside the plot_range function, a subset of the DataFrame is created using the specified columns ('dc_power_hw__2694' and 'poa_irradiance__2679') and the specified date range (start to end). The plot method is then used to generate a plot of the subset of data, with subplots=True indicating that each column should be plotted on a separate subplot. The last line of code calls the plot_range function with specific start and end dates, allowing a specific range of data from the DataFrame to be visualized.
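Since the listing for plot_range is not shown in this appendix, the following is an assumed reconstruction based on the description above; the example dates are illustrative:
# Assumed reconstruction of the plot_range helper described above.
def plot_range(start, end):
    subset = df[['dc_power_hw__2694', 'poa_irradiance__2679']][start:end]
    subset.plot(subplots=True)
    plt.show()

plot_range('2016-01-16', '2016-01-25')  # illustrative date range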
# FFT of the power signal to inspect its dominant frequency components
fft = tf.signal.rfft(df['dc_power_hw__2694'])
f_per_dataset = np.arange(0, len(fft))
n_samples = len(df['dc_power_hw__2694'])
years_per_dataset = n_samples/(24*365.25*4)  # 15-minute data: 4 samples per hour
f_per_year = f_per_dataset/years_per_dataset
plt.step(f_per_year, np.abs(fft))
plt.xscale('log')
plt.ylim(0, 30000000)
plt.xlim([0.1, max(plt.xlim())])
plt.xticks([1, 365.25], labels=['1/Year', '1/day'])
_ = plt.xlabel('Frequency (log scale)')
The code begins by creating two new variables: minute and day. The minute variable
represents the minutes of the day (0 to 1439) calculated from the DataFrame’s index, while
the day variable represents the day of the year (0 to 364) calculated from the DataFrame’s
index.
Next, the code defines two constants: day_minutes, representing the total number of minutes in a day, and year_days, representing the average number of days in a year.
These constants are used in the subsequent calculations.
The code then adds four new columns to the DataFrame df: ’Day sin’, ’Day cos’, ’Year
sin’, and ’Year cos’. These columns contain sinusoidal transformations of the minute
and day variables, which convert the minutes and days into periodic signals that capture
their cyclical nature. The sinusoidal transformation allows the model to capture the
time-related patterns in the data. After adding the new features, the code computes the
Fast Fourier Transform (FFT) of the 'dc_power_hw__2694' column using tf.signal.rfft from
TensorFlow. The FFT is used to analyze the frequency components of the signal and
identify any periodic patterns.
The code then calculates the frequency range for the FFT using np.arange and defines
years_per_dataset as the number of years represented in the dataset. This information is
used to convert the frequency values into meaningful units. Finally, the code generates
a plot using plt.step to visualize the FFT results. The x-axis is scaled logarithmically
using plt.xscale(’log’), and the y-axis limits are set with plt.ylim(0, 30000000). The plot
is further customized with appropriate x-axis labels (’1/Year’, ’1/day’) and an x-axis title
(’Frequency (log scale)’).
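The listing that creates these cyclical features is not shown in this appendix; the following is an assumed reconstruction based on the description above:
# Assumed reconstruction: sinusoidal time features described above.
minute = df.index.hour * 60 + df.index.minute  # minute of the day (0-1439)
day = df.index.dayofyear - 1                   # day of the year (0-364)

day_minutes = 24 * 60   # total minutes in a day
year_days = 365.2425    # average days in a year

df['Day sin'] = np.sin(minute * (2 * np.pi / day_minutes))
df['Day cos'] = np.cos(minute * (2 * np.pi / day_minutes))
df['Year sin'] = np.sin(day * (2 * np.pi / year_days))
df['Year cos'] = np.cos(day * (2 * np.pi / year_days))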
def normalize_df(df):
return (df-df.min())/(df.max()-df.min())
column_indices = {name: i for i, name in enumerate(df.columns)}
n = len(df)
groups = [group[1] for group in df.groupby(df.index.year)]
train_df = pd.concat(groups[:4])
val_df = groups[4]
test_df = pd.concat(groups[5:])
num_features = df.shape[1]
train_df = normalize_df(train_df)
val_df = normalize_df(val_df)
test_df = normalize_df(test_df)
The normalize_df function takes a DataFrame as input and applies normalization to each column. The normalization formula (df - df.min()) / (df.max() - df.min()) subtracts the minimum value of each column from its values and divides by the range (maximum - minimum) of that column. This normalization ensures that all values in each column are scaled between 0 and 1.
Next, the code defines a dictionary column_indices to map column names to their respective indices in the DataFrame. This mapping will be used later during the modeling process. The variable n is assigned the length of the DataFrame df, representing the total number of rows. The code then groups the data in the DataFrame by the year component of the index using df.groupby(df.index.year). This results in a list of groups, where each group contains data from a specific year.
The training data is created by concatenating the data from the first four groups using pd.concat(groups[:4]). This combines the data from the first four years into the train_df DataFrame. The validation data is assigned as the fifth group, groups[4], which contains the data from the fifth year.
The test data is created by concatenating the data from all the remaining groups, starting from the sixth group, using pd.concat(groups[5:]). This combines the data from the sixth year onwards into the test_df DataFrame. The variable num_features is assigned the number of columns in the DataFrame, representing the total number of features in the dataset. Finally, the normalize_df function is applied to the training, validation, and test DataFrames to normalize their respective data.
# The opening lines of generate_dataset are missing from this appendix; the
# signature and the conversion of pasth/futureh into sample counts below are
# assumed reconstructions based on the surrounding description.
def generate_dataset(df, target, pasth=60, futureh=0, sampling_period=15,
                     batch_size=32, sampling_rate=1, sequence_stride=1,
                     exclude_input=None):
    samples_per_hour = 60 // sampling_period  # assumption: period in minutes
    past = pasth * samples_per_hour           # input window, in samples
    future = futureh * samples_per_hour       # prediction offset, in samples
    if not exclude_input:
        x = df[:-(past + future)].values
    else:
        x = df[:-(past + future)][[c for c in df.columns
                                   if c not in exclude_input]].values
    y = df.iloc[past + future:][target]
    return tf.keras.preprocessing.timeseries_dataset_from_array(
        x,
        y,
        sequence_length=past,
        batch_size=batch_size,
        sampling_rate=sampling_rate,
        sequence_stride=sequence_stride,
        shuffle=False
    )
The code provided defines a function generate_dataset that takes a DataFrame df and generates a dataset for time-series forecasting. Inside the function, the variables past and future are calculated based on the provided pasth and futureh values.
The code then checks whether to exclude any input features based on the exclude_input parameter. If exclude_input is None, all columns are used as input features (x). Otherwise, the columns specified in exclude_input are excluded from the input features. The target variable(s) (y) are extracted from the DataFrame, starting from the index corresponding to the past + future offset. Finally, the function uses tf.keras.preprocessing.timeseries_dataset_from_array to create a time-series dataset from the input (x) and target (y) arrays. The sequence length is set to past, representing the length of the input sequence. Other parameters such as batch size, sampling rate, sequence stride, and shuffle are also provided to configure the dataset generation. The function returns the generated time-series dataset.
In this code snippet, the generate_dataset function is used to generate time-series datasets for training, validation, and testing from the respective DataFrames train_df, val_df, and test_df.
The generate_dataset function is called three times with different DataFrame inputs and the same values for the pasth (60) and sampling_period (15) parameters. This ensures consistency in the length of the input sequence (past) and the sampling interval. After generating the datasets, a loop is used to iterate over the first batch of the dataset_train dataset using dataset_train.take(1). Each iteration of the loop provides a batch of data, which is unpacked into inputs and targets variables.
The next lines of code print the shapes of the inputs and targets arrays using the .shape attribute. The inputs array represents the input sequences of the time-series data, and the targets array represents the corresponding target values to be predicted. The output of the code snippet provides the shape of the inputs array as (batch_size, sequence_length, num_features) and the shape of the targets array as (batch_size, num_targets). This gives information about the dimensions of the input and target arrays, which is useful for understanding the structure of the generated datasets.
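The inspection loop itself is not shown; a minimal sketch consistent with this description would be:
# Minimal sketch of the batch-inspection loop described above.
for inputs, targets in dataset_train.take(1):
    print(inputs.shape)   # (batch_size, sequence_length, num_features)
    print(targets.shape)  # (batch_size, num_targets)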
from collections import defaultdict

MAX_EPOCHS = 20

# Assumed reconstruction of the compile_and_fit signature and EarlyStopping
# call, which are missing from this appendix.
def compile_and_fit(model, train, val, epochs=MAX_EPOCHS, patience=2, verbose=1):
    early_stopping = tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=patience, mode='min')
    model.compile(loss=tf.losses.MeanSquaredError(),
                  optimizer=tf.optimizers.Adam(),
                  metrics=[tf.metrics.MeanAbsoluteError(),
                           tf.metrics.RootMeanSquaredError()])
    history = model.fit(train, epochs=epochs,
                        verbose=verbose,
                        validation_data=val,
                        callbacks=[early_stopping])
    print(model.summary())
    return history
def plot_history(history):
plt.plot(history.history['loss'])
plt.plot(history.history['val_loss'])
plt.title('model loss')
plt.ylabel('loss')
plt.xlabel('epoch')
plt.legend(['train', 'val'], loc='upper left')
plt.show()
This code snippet imports the defaultdict class from the collections module and sets a constant MAX_EPOCHS to 20. The compile_and_fit function is defined, which takes a model object, training data train, validation data val, and several optional parameters. Inside the function, an instance of tf.keras.callbacks.EarlyStopping is created with the specified monitor, patience, and mode parameters. This callback monitors the validation loss during training and stops the training process if the validation loss does not improve for a specified number of epochs. The model is compiled using a mean squared error loss function (tf.losses.MeanSquaredError()) and the Adam optimizer (tf.optimizers.Adam()). Additionally, two metrics are specified: mean absolute error (tf.metrics.MeanAbsoluteError()) and root mean squared error (tf.metrics.RootMeanSquaredError()).
The model.fit method is called to train the model. The training data train is used, and the number of epochs is specified. The verbose parameter determines the level of verbosity
during training. The validation data val is provided for evaluation during training. The early_stopping callback is included to monitor the validation loss and stop training if needed. After training, the model summary is printed using model.summary(). The compile_and_fit function returns the training history, which contains information about the loss and metric values during training. The plot_history function is defined to plot the training and validation loss over epochs. It takes the training history as input and uses plt.plot to create a line plot of the loss values. The function also adds labels to the plot and displays it using plt.show(). The performance variable is defined as a nested dictionary structure using defaultdict; it is intended to store performance metrics for the validation and test datasets.
# `performance` is described above as a nested dictionary built with
# defaultdict; its definition is not shown, so this line is an assumed
# reconstruction.
performance = defaultdict(dict)

model_inputs = tf.keras.Input(inputs.numpy().shape[1:])
lstm_out = tf.keras.layers.LSTM(64)(model_inputs)
outputs = tf.keras.layers.Dense(units=1, activation='linear')(lstm_out)
lstm = tf.keras.Model(inputs=model_inputs, outputs=outputs)
history = compile_and_fit(lstm, dataset_train, dataset_val, epochs=3)
plot_history(history)
performance['val']['lstm'] = lstm.evaluate(dataset_val, verbose=0)
performance['test']['lstm'] = lstm.evaluate(dataset_test, verbose=0)
In this code snippet, a model architecture is defined, trained, and evaluated using the LSTM (Long Short-Term Memory) layer.
First, an input layer model_inputs is created with the shape of the inputs array obtained from the previous code snippet using tf.keras.Input(inputs.numpy().shape[1:]). This defines the shape of the input data for the model. Next, an LSTM layer with 64 units is added to the model using tf.keras.layers.LSTM(64)(model_inputs). This layer processes the input sequences and extracts relevant features. The output of the LSTM layer is passed to a dense layer with a single unit and linear activation function using tf.keras.layers.Dense(units=1, activation='linear')(lstm_out).
The model is created using tf.keras.Model by specifying the input and output layers: lstm = tf.keras.Model(inputs=model_inputs, outputs=outputs). The compile_and_fit function is called to compile and train the model, with the LSTM model, training dataset (dataset_train), and validation dataset (dataset_val) provided as arguments. The model's performance is evaluated on the validation and test datasets using lstm.evaluate(dataset_val, verbose=0) and lstm.evaluate(dataset_test, verbose=0), respectively.
The code snippet provides utility functions for generating and viewing predictions using trained models. The yield_windows function is a generator that yields overlapping windows of a specified width from an input array (arr). It iterates n times with a step size of step, and for each iteration it extracts a window of width width starting from index i * step. The window is reshaped to have a shape of (1, window_width, num_features) and yielded using yield x. This function is useful for dividing the input array into windows for making predictions.
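The listing itself is not shown in this appendix; a minimal sketch consistent with the description would be:
# Assumed reconstruction of the yield_windows generator described above.
def yield_windows(arr, n, width, step):
    num_features = arr.shape[-1]
    for i in range(n):
        # window of `width` rows starting at i * step, shaped for model.predict
        yield arr[i * step : i * step + width].reshape(1, width, num_features)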
The view_predictions function takes a dictionary of models, a start and end datetime range, and an optional step size, window width, shift, and dataframe (df). It prepares the input array by extracting a slice from the dataframe df based on the specified datetime range and converting it to a numpy array. It also prepares the output dataframe (out) by selecting the target column and renaming it to 'real'.
The calculate_predictions function takes a model and a list of windows and returns a list of predictions. It uses the model.predict method to make predictions for each window and flattens the output to obtain a scalar prediction. The predictions are stored in a list and returned.
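Again as an assumed reconstruction based on this description:
# Assumed reconstruction of the calculate_predictions helper described above.
def calculate_predictions(model, windows):
    predictions = []
    for window in windows:
        # flatten each model output to a scalar prediction
        predictions.append(model.predict(window, verbose=0).flatten()[0])
    return predictions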
This code calls the view_predictions function to generate and visualize predictions using the LSTM model. Two sets of predictions are generated and displayed:
The first call to view_predictions generates predictions using the LSTM model for the datetime range from January 16, 2015, to January 25, 2015, using the validation dataframe (val_df). The second call generates predictions for the datetime range from January 16, 2016, to January 25, 2016, using the test dataframe (test_df).
By calling view_predictions, the code generates predictions for the specified datetime ranges using the LSTM model and displays them alongside the real values.
List of Figures
1.1 Global capacity of Solar PV and annual additions . . . . . . . . . . . . . . 1
1.2 Solar PV capacity of different countries . . . . . . . . . . . . . . . . . . . . 2
1.3 A microgrid system example . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.4 Output Power variation in a PV array in different conditions . . . . . . . . 3
List of Tables
4.1 Statistical analysis (2018) . . . . . . . . . . . . . . . . . . . . . . . . . . . 31
4.2 Statistical analysis (2014-2016) . . . . . . . . . . . . . . . . . . . . . . . . 31
4.3 Hyperparameters tuning results . . . . . . . . . . . . . . . . . . . . . . . . 39
4.4 Hyperparameters tuning results for short-time prediction . . . . . . . . . . 48
4.5 Results for short-time prediction using different networks . . . . . . . . . . 51
4.6 Changing difference between actual and predicted values due to ageing . . 56
4.7 Difference between actual and predicted values over time . . . . . . . . . . 60
4.8 Slope, intercept and R-value corresponding to the plots in Figure 4.51 . . . 65
4.9 Quadratic equation for each year . . . . . . . . . . . . . . . . . . . . . . . 68
4.10 Quadratic equation for each year after filter . . . . . . . . . . . . . . . . . 71
4.11 Comparison between LSTNet and LSTM results . . . . . . . . . . . . . . . 74
Acknowledgements
I stand at the culmination of an incredible journey, and as I turn the pages of this thesis,
I am overwhelmed with gratitude for the numerous individuals who have accompanied
me along this path. It is with heartfelt appreciation that I express my deepest gratitude
to those who have contributed to the completion of my Master’s thesis.
I am also indebted to the faculty members of Politecnico di Milano, whose wisdom, exper-
tise, and passion for their respective fields have enriched my academic experience. Their
commitment to fostering an environment of learning and intellectual curiosity has been a
constant source of inspiration throughout my journey.
My heartfelt appreciation also goes out to my family and friends who have supported me
unconditionally throughout this journey. I reserve a special place of appreciation for my
dear friend and colleague, Prateek Pati. Throughout this transformative journey, Pra-
teek’s support, camaraderie, and intellectual companionship have been truly remarkable.
Lastly, I would like to acknowledge the countless authors, researchers, and scholars whose
works have formed the foundation of my research. Their pioneering efforts and ground-
breaking discoveries have paved the way for my own investigations, and I am grateful for
the wealth of knowledge they have contributed to the academic community.
I am humbled by the opportunity to have been part of this remarkable academic journey,
and I am profoundly grateful to everyone who has played a role, big or small, in its
realization. Their collective contributions have undoubtedly shaped the person I have
become today, both academically and personally.
Saloni Dhingra