Article
Machine Learning Dynamic Ensemble Methods for Solar
Irradiance and Wind Speed Predictions
Francisco Diego Vidal Bezerra 1 , Felipe Pinto Marinho 2 , Paulo Alexandre Costa Rocha 1,3, * ,
Victor Oliveira Santos 3 , Jesse Van Griensven Thé 3,4 and Bahram Gharabaghi 3
Abstract: This paper proposes to analyze the performance increase in the forecasting of solar irra-
diance and wind speed by implementing a dynamic ensemble architecture for intra-hour horizons
ranging from 10 to 60 min, with a 10 min time step. Global horizontal irradiance (GHI) and wind
speed were computed using four standalone forecasting models (random forest, k-nearest neighbors,
support vector regression, and elastic net) to compare their performance against two dynamic en-
semble methods, windowing and arbitrating. The standalone models and the dynamic ensemble
methods were evaluated using the error metrics RMSE, MAE, R2 , and MAPE. This work’s findings
showcased that the windowing dynamic ensemble method was the best-performing architecture
when compared to the other evaluated models. For both cases of wind speed and solar irradiance
forecasting, the ensemble windowing model reached the best error values in terms of RMSE for all the
assessed forecasting horizons. Using this approach, the wind speed forecasting gain was 0.56% when
compared with the second-best forecasting model, whereas the gain for GHI prediction was 1.96%,
considering the RMSE metric. The development of an ensemble model able to provide accurate and
precise estimations can be implemented in real-time forecasting applications, helping the evaluation
of wind and solar farm operation.

Keywords: wind energy; solar energy; renewable energy; machine learning; forecasting ensembles

Citation: Vidal Bezerra, F.D.; Pinto Marinho, F.; Costa Rocha, P.A.; Oliveira Santos, V.; Van Griensven Thé, J.; Gharabaghi, B. Machine Learning Dynamic Ensemble Methods for Solar Irradiance and Wind Speed Predictions. Atmosphere 2023, 14, 1635. https://doi.org/10.3390/atmos14111635
The influence of atmospheric factors on the generation of electrical energy from solar
and wind sources is usually the main challenge in the operation of smart grids, where
large-scale generation plants need to be integrated into the electrical grid, directly
affecting planning, investment, and decision-making processes. Forecast models based on
machine learning can minimize this problem [6].
Optimizing the forecasts of generation from wind and solar sources also brings economic
benefits, as it gives greater security to the electricity sector by improving renewable
energy purchase contracts [7].
A 14-year-long data set was explored in [8], containing daily values of meteorological
variables. This dataset was used to train three deep neural network (DNN) architectures
over several time horizons to predict global solar radiation for Fortaleza, in the northeastern
region of Brazil. The accuracy of the predictions was considered excellent according to their
normalized root mean squared error (nRMSE) values and good according to mean absolute
percentage error (MAPE) values.
Each mathematical prediction model has its own inherent strengths and limitations. In this
scenario, dynamic ensemble models emerge, presenting potentially better performance than the
individual models, since they seek to exploit the best of each individual model. This approach
is currently used very successfully in both research and industry. Several dynamic ensemble
methods have been developed for forecasting energy generation from renewable sources, building
on well-known forecast models such as random forest regression (RF), support vector regression
(SVR), and k-nearest neighbors (kNN), whose outputs are combined and optimized within the
dynamic ensemble framework [9].
The random forest (RF) forecasting model is based on the creation of an ensemble of random
decision trees. In this method, each decision tree encodes specific rules and conditions that
guide an input through the tree until a prediction is reached.
Support vector regression (SVR) is a regression algorithm that maps individual observations
into a coordinate space and uses hyperplanes to separate the data. The underlying support
vector machine is widely used for categorizing clusters and for classification; it was first
developed for classification purposes and has been extensively tested [10,11]. Recent
approaches include [12], which develops a novel method for the maximum power point tracking
of a photovoltaic panel, and [13], where solar radiation estimation via five different machine
learning approaches is discussed.
The kNN method is a supervised learning algorithm widely used as a classifier: based on the
proximity of the nearest neighboring data points, it performs categorization via similarity
and predicts a new sample using the K closest samples. Recently, this approach has been used
in [14], where virtual meteorological masts use calibrated numerical data to provide precise
wind estimates during all phases of a wind energy project, reproducing optimal site-specific
environmental conditions.
Most studies have focused on accurate wind power forecasting, where the random
fluctuations and uncertainties involved are considered. The study in [15] proposes a novel
method of ultra-short-term probabilistic wind power forecasting using an error correction
modeling with the random forest approach.
The elastic net method is a regularized regression method that linearly combines the
penalties of the LASSO and Ridge methods. In [16], the study uses forecast combinations
that are obtained by applying regional data from Germany for both solar photovoltaic and
wind via the elastic net model, with cross-validation and rolling window estimation, in the
context of renewable energy forecasts.
The current state of the art is to use dynamic ensemble methods within a meta-learning
approach, such as arbitrating, which combines the base models' outputs according to
predictions of the loss each one is expected to incur, and windowing, which has a parameter
controlling the amount of recent data considered when ranking the models [17].
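To make the windowing idea concrete, the sketch below selects, at each forecast origin, the base model with the lowest RMSE over the most recent λ observations; it is a minimal illustration in Python, not the authors' implementation, and the model names, the toy data, and the simple winner-takes-all rule are our assumptions.

import numpy as np

def windowing_forecast(preds, y_true, lam):
    """Dynamic ensemble by windowing (illustrative sketch).

    preds  : dict mapping model name -> array of that model's one-step predictions
    y_true : array of observed values aligned with the predictions
    lam    : window length, i.e., number of most recent points used to rank models
    """
    names = list(preds.keys())
    n = len(y_true)
    combined = np.full(n, np.nan)
    for t in range(lam, n):
        window = slice(t - lam, t)
        # RMSE of each base model over the last `lam` observations
        rmse = {m: np.sqrt(np.mean((preds[m][window] - y_true[window]) ** 2))
                for m in names}
        best = min(rmse, key=rmse.get)      # model with the lowest recent RMSE
        combined[t] = preds[best][t]        # it issues the forecast for time t
    return combined

# toy usage with two hypothetical base models
rng = np.random.default_rng(0)
y = np.sin(np.linspace(0, 20, 300)) + 0.1 * rng.standard_normal(300)
preds = {"svr": y + 0.2 * rng.standard_normal(300),
         "rf": y + 0.3 * rng.standard_normal(300)}
ensemble = windowing_forecast(preds, y, lam=19)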
In [18], a global climate model (GCM) is studied to improve a near-surface wind speed
(WS) simulation via 28 coupled model intercomparisons using dynamical components.
In [19], a hybrid transfer learning model based on a convolutional neural network and
a gated recurrent neural network is proposed to predict short-term canyon wind speed
with fewer observation data. The method uses a time sliding window to extract time series
from historical wind speed data and temperature data of adjacent cities as the input of the
neural network.
In [20], authors studied the multi-GRU-RCN method, an ensemble model, to obtain
significant information regarding factors such as precipitation and solar irradiation via
short-time cloud motion predictions from a cloud image. The ensemble modeling used
in [21] integrates wind and solar forecasting methodologies applied to two locations at
different latitudes and with different climatic profiles. The obtained results reduce the forecast
errors and can be useful in optimizing the planning of intermittent solar and wind resources in
electrical matrices.
A proposed new ensemble model in [22] was based on graph attention networks (GAT)
and GraphSAGE to predict wind speed in a bi-dimensional approach using a Dutch dataset
including several time horizons, time lags, and weather influences. The results showed
that the ensemble model proposed was equivalent to or outperformed all benchmarking
models and had smaller error values than those found in reference literature.
In [23], time horizons ranging from 5 min to 30 min were studied in 5-min time
steps, evaluating short-term forecasts of global horizontal irradiance
(GHI) and direct normal irradiance (DNI) using deep neural networks with 1-dimensional
convolutional neural networks (CNN-1Ds), long short-term memory (LSTM), and CNN–
LSTM. The metrics used were the mean absolute error (MAE), mean bias error (MBE), root
mean squared error (RMSE), relative root mean squared error (rRMSE), and coefficient of
determination (R2 ). The best accuracy was obtained for a horizon of 10 min, improving
11.15% on this error metric compared to the persistence model.
There are studies employing different DNN architectures, such as GNN, CNN, and
LSTM, achieving satisfactory outcomes in different fields of science [24–27]. However,
the present work focuses on classical ML, since the main objective is to identify the best
supporting ensemble approach for the ML procedures, analyzing the influence of the dynamic
ensemble arbitrating and windowing methods on traditional machine learning algorithms applied
to the prediction of electrical power generation. We also present their greater efficiency
using data of interest for energy production, with wind speed and solar irradiance as input
variables. We have followed this approach because of its advantage in exploring dynamic
ensemble methods, since these combine the best of the pre-existing standalone models to
generate a single, more effective predictive model.
Figure 1. Map of the northeast of Brazil. The Petrolina measurement site is highlighted [29].

Table 1. Geographic coordinates, altitude in relation to the sea level, measurement intervals, and measurement periods of the data collected from the Petrolina station. MI and MP stand for, respectively, "measurement interval" and "measurement period".

Type           Lat. (°)        Long. (°)       Alt. (m)   MI (min)   MP
Anemometric    09°04′08″ S     40°19′11″ O     387        10         1 January 2007 to 12 December 2010
Solarimetric   09°04′08″ S     40°19′11″ O     387        10         1 January 2010 to 12 December 2010
The Petrolina region is classified as a BSh Köppen climate zone [30]. There are considerable
differences in the annual cycle between solar radiation and wind. The average wind speed and
solar irradiance in Petrolina experience significant seasonal variations throughout their
annual cycle. The windiest interval of the year occurs from May to November, with average wind
speeds above 5.4 m/s. The month with the strongest winds is August, with an average hourly wind
speed of 6.7 m/s. The period with the lowest wind volume of the year is from November to May.
The month with the calmest winds is March, with an average hourly wind speed of 4.1 m/s.

The period of greatest solar radiance in the year is from September to November, with a daily
average above 7.2 kWh/m2, with October being the peak with an average of 7.5 kWh/m2. The period
with the lowest solar radiance in the year is from May to July, with a daily average of
6.1 kWh/m2, with June being the month with the lowest solar radiance, with an average of
5.7 kWh/m2.

2.1. Wind Speed Data

The wind speed data were obtained in m/s from a meteorological station, which has anemometric
sensors at altitudes of 25 m and 50 m from the ground. The highest altitude was chosen for this
study, both to reduce the effects of the terrain and to be closer to the altitudes currently in
practice for wind turbines [31].
k_t = \frac{I}{I_{cs}} \quad (1)
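As a minimal sketch of Equation (1), the clear-sky index can be computed elementwise from the measured GHI and a clear-sky estimate; the array values below are invented, and the clear-sky series is assumed to come from a model such as the Ineichen–Perez formulation [33].

import numpy as np

# I: measured GHI (W/m2); I_cs: clear-sky GHI estimate (W/m2) for the same timestamps
I = np.array([120.0, 450.0, 830.0])
I_cs = np.array([180.0, 520.0, 900.0])

kt = I / I_cs   # Equation (1): clear-sky index, dimensionless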
3. Methodology
Initially, wind speed and irradiance data were acquired and the intervals for the test
and training sets were determined. For wind speed data, in a measurement period from
2007 to 2010, the first three years were used as the training data set and the last year as
the test set. In order to allow the evaluation of the performance of the tested forecasting
models and also of dynamic ensemble methods, this study developed a computational
code in Python to evaluate the output values obtained by the well-known machine learning
forecasting methods: random forest, k-nearest neighbors (kNN), support vector regression
(SVR), and elastic net. For each of the methods, the best performance parameters (lower
root mean squared error (RMSE)) were evaluated. Right after the stage of acquisition
and determination of the optimal parameters for each of the models, the methods of
dynamic ensemble windowing and arbitrating were executed, from which performance
metrics values were also obtained: coefficient of determination (R2 ), root mean squared
error (RMSE), mean absolute error (MAE) and mean absolute percentage error (MAPE).
These values were compared to evaluate the efficiency of the dynamic ensemble methods
compared to other stand-alone models. The variation of the λ parameter for windowing,
which is the length used for the extension of the values considered in the data forecast, was
also evaluated. The methodology used can be seen in Figure 2.
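For contrast with windowing, a minimal sketch of the arbitrating idea follows: one meta-regressor per base model is trained to predict that model's absolute error from the input features, and at forecast time the base predictions are weighted by a softmax of the negative predicted errors. The meta-learner choice and the weighting rule are our assumptions, not the exact formulation of [17].

import numpy as np
from sklearn.ensemble import RandomForestRegressor

def fit_arbiters(X_train, y_train, base_models):
    """Train one meta-model per (already fitted) base learner to predict its absolute error."""
    arbiters = {}
    for name, model in base_models.items():
        abs_err = np.abs(model.predict(X_train) - y_train)
        arbiters[name] = RandomForestRegressor(n_estimators=50).fit(X_train, abs_err)
    return arbiters

def arbitrating_predict(X, base_models, arbiters):
    """Combine base predictions, weighting by a softmax of the negative predicted errors."""
    names = list(base_models.keys())
    preds = np.column_stack([base_models[m].predict(X) for m in names])
    pred_err = np.column_stack([arbiters[m].predict(X) for m in names])
    weights = np.exp(-pred_err)
    weights /= weights.sum(axis=1, keepdims=True)
    return (weights * preds).sum(axis=1)

# usage sketch (base models assumed already fitted on the training split):
# arbiters = fit_arbiters(X_train, y_train, base_models)
# y_hat = arbitrating_predict(X_test, base_models, arbiters)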
In the data pre-processing, a recursive approach of Lagged Average values for kt and ν
time series was applied: this feature is given by the vector L(t) with components calculated
using Equation (2).
L_i(t) = \frac{1}{N} \sum_{t' \in [\,t - i\delta - T,\ t - (i-1)\delta - T\,]} x(t') \quad (2)
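A minimal sketch of the lagged-average feature of Equation (2) is given below, assuming a regularly sampled series (10 min step); the helper name and the convention of expressing δ and T in numbers of time steps are ours.

import numpy as np

def lagged_average_features(x, n_lags, delta, T):
    """Build the lagged-average predictors L_1(t)..L_n(t) of Equation (2).

    x      : 1-D array with a regularly sampled series (clear-sky index kt or wind speed)
    n_lags : number of components of the feature vector L(t)
    delta  : lag spacing, in time steps
    T      : forecasting horizon offset, in time steps
    """
    n = len(x)
    L = np.full((n, n_lags), np.nan)
    for t in range(n):
        for i in range(1, n_lags + 1):
            lo, hi = t - i * delta - T, t - (i - 1) * delta - T
            if lo >= 0:
                # mean of x over the interval [t - i*delta - T, t - (i-1)*delta - T]
                L[t, i - 1] = x[lo:hi + 1].mean()
    return L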
3.3. Machine Learning Prediction Models and Dynamic Ensemble Method Parameters
In the data training stage, GridSearch was used with 5-fold cross-validation. The
search parameters are shown in Table 2.
Table 2. Search parameters and grid values applied to the tested methods.
GridSearchCV is a tool from the Python scikit-learn library that systematically combines the
candidate parameter values of each method under evaluation and returns the results in a single
output object for analysis. This makes it a very useful tool when comparing performance between
methods, which is the object of this study.
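A minimal sketch of this tuning step, using scikit-learn's GridSearchCV with 5-fold cross-validation and negative RMSE as the selection score, is shown below; the candidate grids are placeholders and do not reproduce the exact values of Table 2.

from sklearn.model_selection import GridSearchCV
from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.linear_model import ElasticNet

# candidate grids (placeholders; see Table 2 for the grids actually used)
search_spaces = {
    "random_forest": (RandomForestRegressor(),
                      {"max_depth": [3, 5, 7], "n_estimators": [10, 20, 50]}),
    "knn": (KNeighborsRegressor(), {"n_neighbors": [5, 25, 49]}),
    "svr": (SVR(), {"C": [0.1, 1, 10], "epsilon": [0.1, 1]}),
    "elastic_net": (ElasticNet(), {"l1_ratio": [0.1, 0.5, 1.0]}),
}

def tune_all(X_train, y_train):
    """Run 5-fold GridSearchCV for every base model and keep the best estimator of each."""
    best = {}
    for name, (estimator, grid) in search_spaces.items():
        gs = GridSearchCV(estimator, grid, cv=5,
                          scoring="neg_root_mean_squared_error")
        gs.fit(X_train, y_train)
        best[name] = gs.best_estimator_
    return best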
• Coefficient of determination (R2)

R^2 = 1 - \frac{\sum_{i=1}^{N} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{N} (y_i - \bar{y})^2} \quad (3)

• Root mean squared error (RMSE)

\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2} \quad (4)

• Mean absolute error (MAE)

\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| \hat{y}_i - y_i \right| \quad (5)

• Mean absolute percentage error (MAPE)

\mathrm{MAPE} = \frac{1}{N} \sum_{i=1}^{N} \left| \frac{y_i - \hat{y}_i}{y_i} \right| \quad (6)

where y_i is the observed value, \hat{y}_i the predicted value, \bar{y} the mean of the observations, and N the number of samples.
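The four metrics of Equations (3)–(6) can be computed directly with NumPy, as in the short helper below (ours; MAPE is returned as a fraction rather than a percentage).

import numpy as np

def evaluate(y_true, y_pred):
    """Compute R2, RMSE, MAE and MAPE as defined in Equations (3)-(6)."""
    y_true, y_pred = np.asarray(y_true, dtype=float), np.asarray(y_pred, dtype=float)
    resid = y_true - y_pred
    r2 = 1 - np.sum(resid ** 2) / np.sum((y_true - y_true.mean()) ** 2)   # Eq. (3)
    rmse = np.sqrt(np.mean(resid ** 2))                                   # Eq. (4)
    mae = np.mean(np.abs(resid))                                          # Eq. (5)
    mape = np.mean(np.abs(resid / y_true))                                # Eq. (6)
    return {"R2": r2, "RMSE": rmse, "MAE": mae, "MAPE": mape}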
Table 3. Optimal hyperparameters found via grid search for each method and forecasting horizon.

Method          Parameter            t + 10   t + 20   t + 30   t + 60
Random forest   best_max_depth       7        7        7        7
Random forest   best_n_estimators    20       20       20       20
KNN             best_n_neighbors     49       49       49       49
SVR             best_C               1        1        1        1
SVR             best_epsilon         0.1      0.1      1        0.1
Elastic net     best_l1_ratio        1        1        1        1
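For reference, the best parameters of Table 3 for the t + 10 horizon map onto the scikit-learn estimators as in the sketch below; parameters not listed in Table 3 are left at the library defaults, which is an assumption on our part.

from sklearn.ensemble import RandomForestRegressor
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import SVR
from sklearn.linear_model import ElasticNet

# best hyperparameters from Table 3 (t + 10 column)
models_t10 = {
    "random_forest": RandomForestRegressor(max_depth=7, n_estimators=20),
    "knn": KNeighborsRegressor(n_neighbors=49),
    "svr": SVR(C=1, epsilon=0.1),
    "elastic_net": ElasticNet(l1_ratio=1.0),
}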
Efficiency evaluations for each of the forecasting methods were based on perfor-
mance metrics evaluations for each time horizon under study (t + 10, t + 20, t + 30 and
t + 60). Initially, for all time horizons, windowing proved to be the most efficient method.
Then, a fine-tuning evaluation was performed based on the variation of the windowing
parameter to assess its influence on performance. The predominance of better performance
for windowing in all time horizons and its comparisons can be seen in Table 4 and Figure 3.
Table 4. Comparison of RMSE (m/s) values, using different methods for different time horizons and
windowing λ parameter variation. The best results for each time horizon are in bold.
Elastic net is a penalized linear regression model that combines the Ridge and LASSO penalties
in a single algorithm; during training, the l1_ratio hyperparameter controls the mix between
them, with 0 corresponding to pure Ridge and 1 to pure LASSO regression. From Table 3, this
parameter obtained the value of 1 for all horizons, which means that LASSO regression was used
in its entirety.
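For clarity, in the scikit-learn parameterization the elastic net minimizes the objective below, where the mixing coefficient ρ corresponds to l1_ratio, so ρ = 1 reduces the model to LASSO and ρ = 0 to Ridge; this is the standard library form, quoted here as context rather than taken from the paper.

\min_{w} \; \frac{1}{2N} \lVert y - Xw \rVert_2^2 \; + \; \alpha \rho \lVert w \rVert_1 \; + \; \frac{\alpha (1 - \rho)}{2} \lVert w \rVert_2^2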
Time horizon   λ     RF        KNN       SVR       Elastic net   Windowing   Arbitrating
t + 60 min     6                                                 1.16170
t + 60 min     12    1.18092   1.19527   1.17764   1.18281       1.16685     1.18156
t + 60 min     25                                                1.16987
t + 60 min     50                                                1.17254
t + 60 min     100                                               1.17455
Figure 3. Windowing λ parameter variation influence on RMSE for all the studied time horizons in wind speed data analysis.

As with the evaluation employing RMSE, values from R2, MAE, and MAPE were also assessed. Once
the best performance was found for the windowing ensemble method, an in-depth analysis was
performed based on the variation of its parameter λ to assess the influence on its internal
performance. Since the time horizon that presented the best performance was t + 10, this was
the focus of the analysis, as shown in Figures 4–7. The detailed data for all the horizons is
shown in Tables 5–7.
Figure 4. Windowing λ parameter influence on RMSE value for the time horizon t + 10.
Table 5. Comparison of MAE (m/s) values, using different methods in different time horizons and windowing λ parameter variation. The best results for each time horizon are in bold.
Figure 5. Windowing λ parameter influence in MAE value for the time horizon t + 10.
Table 6. Comparison of R2 values, using different methods in different time horizons and windowing λ parameter variation. The best results for each time horizon are in bold.
Table 7. Comparison of MAPE (%) values, using different methods in different time horizons and windowing λ parameter variation. The best results for each time horizon are in bold.
Table 9. Comparison of RMSE (W/m2 ) values, using different methods in different time horizons
and windowing λ parameter variation. The best results for each time horizon are in bold.
Just like the evaluation employing RMSE, values of R2 , MAE, and MAPE were also
analyzed. After the best performance was found for the windowing method, an in-depth
analysis was performed based on the variation of its parameter λ to assess the influence on
its internal performance. Since the time horizon that presented the best performance was
t + 10, this was the focus of the analysis, as shown in Figures 10–13. The detailed data for
all tested time horizons is shown in Tables 10–12.
Time horizon   λ     RF           KNN          SVR          Elastic net   Windowing   Arbitrating
t + 60 min     6                                                          107.76000
t + 60 min     12    112.05000    112.13000    112.76000    118.08000     108.89000   111.13000
t + 60 min     25                                                         109.32000
t + 60 min     50                                                         110.12000
t + 60 min     100                                                        110.30000
Figure 9. Windowing λ parameter variation influence on RMSE for all the studied time horizons in solar irradiation data analysis.
Figure 10. Windowing λ parameter influence in RMSE value in time horizon t + 10.
Figure 11. Windowing λ parameter influence in R2 value in time horizon t + 10.
Table 11. Comparison of MAE (W/m2) values, using different methods in different time horizons and windowing λ parameter variation. The best results for each time horizon are in bold.
Figure 13. Windowing λ parameter influence in MAPE value in time horizon t + 10.
Table 10. Comparison of R2 values, using different methods in different time horizons and windowing λ parameter variation. The best results for each time horizon are in bold.
Time horizon   λ     RF        KNN       SVR       Elastic net   Windowing   Arbitrating
t + 10 min     1                                                 0.92184
t + 10 min     3                                                 0.92141
t + 10 min     6                                                 0.92062
t + 10 min     12    0.92000   0.92000   0.92000   0.92000       0.92080     0.92000
t + 10 min     25                                                0.92073
t + 10 min     50                                                0.92022
t + 10 min     100                                               0.91976
t + 20 min     1                                                 0.91000
t + 20 min     3                                                 0.91000
t + 20 min     6                                                 0.90000
t + 20 min     12    0.88000   0.90000   0.90000   0.90000       0.90000     0.90000
t + 20 min     25                                                0.90000
t + 20 min     50                                                0.90000
t + 20 min     100                                               0.90000
t + 30 min     1                                                 0.89000
t + 30 min     3                                                 0.89000
t + 30 min     6                                                 0.89000
t + 30 min     12    0.88000   0.88000   0.88000   0.87000       0.89000     0.88000
t + 30 min     25                                                0.89000
t + 30 min     50                                                0.88000
t + 30 min     100                                               0.89000
Table 12. Comparison of MAPE (%) values, using different methods in different time horizons and windowing λ parameter variation. The best results for each time horizon are in bold.
Some authors applied elastic net in time-varying forecast combinations [16], using RMSE as the
performance metric. They found that, for PV forecasts, it produced 13.4% more precise forecasts
than the simple average, and for wind forecasts the gain was 6.1%.
In [21], an ensemble method that used MAPE as the comparative efficiency metric was studied,
proving to be the most efficient approach with a MAPE of 9.345% for wind speed data and 7.186%
for solar data.
In this study, the performance improvement of the most efficient method (windowing) over the
second most efficient was 0.56% for wind speed and 1.86% for solar irradiation.
objectives, hyperparameters, and input data [22]. To facilitate the comparison against the
results found in the literature, Table 13 compiles the results previously presented for the
proposed windowing model. The results found in literature for wind speed forecasting are
compiled and presented in Table 14, where RMSE and MAE are in m/s.
Table 13. Compilation of the windowing’s results for different time horizons.
Analyzing the results for reference [22], in which wind speed was forecasted in the
Netherlands using an ensemble approach merging graph theory and attention-based
deep learning, we can observe that the proposed windowing ensemble model is not able
to surpass the results for either RMSE or MAE for the t + 60 forecasting horizon. The
accentuated difference between these two models can be explained because the GNN
SAGE GAT model, being developed to handle graph-like data structure, excels in retrieving
complex spatiotemporal relationships underlying the dataset, drastically improving its
forecasting capacity when compared with other ML and DL models alike.
In reference [37], the authors proposed wind speed forecasting for a location in Sweden, with a
model based on a bi-directional recurrent neural network, a hierarchical decomposition
technique, and an optimization algorithm. When compared with their results, the windowing model
proposed in this paper improves on the reference results by 1% for the t + 10 forecasting
horizon and by 20% for t + 60. When MAE and MAPE are analyzed, the windowing model also improves
on these metrics for t + 10 and t + 60, with the MAE value improving by 28% for t + 10 and by 9%
for t + 60. Regarding MAPE, the improvement is 64% for t + 10 and 95% for t + 60.
In the work of Liu et al. [39], another deep learning-based predictive model was
proposed. It used a hybrid approach composed of data area division to extract historical
wind speed information and an LSTM layer optimized via a genetic algorithm to process the
temporal aspect of the dataset to forecast wind speed in Japan. Compared to this reference,
the windowing model showed no improvement for wind speed forecasting. However,
the windowing approach offers competitive forecasting for the assessed time windows,
being in the same order of magnitude as the ones in the reference. In work [40], the authors
proposed the employment of another hybrid forecasting architecture composed of CNN
and LSTM deep learning models for wind speed estimation in the USA. Their results,
when compared against the windowing methodology, are very similar for all forecasting
horizons, showing that both windowing and CNN–LSTM offer good results for wind speed
estimation for these time intervals.
In Dowell et al. [38], a statistical model for estimation of future wind speed values
in the Netherlands was proposed. For the available t + 60 time horizon, we observe that,
again, the forecasted wind speeds for the reference and proposed windowing models are
very similar, suggesting both models as valuable tools for wind speed forecasting.
For GHI forecasting, the results found in the literature are presented in Table 15.
In work [23], a deep learning standalone model of CNN was applied to estimate future
GHI values in the USA. Comparing the GHI forecasting results achieved via windowing
with this reference, we observe that the proposed model was not able to provide superior
forecasting performance. However, the windowing results are still competitive since both
approaches were able to reach elevated coefficient of determination values for all the
assessed forecasting horizons, with a slight advantage for the deep learning model.
In reference [41], the authors combined principal component analysis (PCA) with
multivariate empirical mode decomposition (MEMD) and gated recurrent unit (GRU) to
predict GHI in India. In their methodology, the PCA extracted the most relevant features
from the dataset after it was filtered via the MEMD algorithm. Lastly, the future irradiance
was estimated via the deep learning model of GRU. Compared to their approach, the
windowing model could not improve the GHI forecasting within a t + 60 time window. Also,
the reference model MEMD-PCA-GRU provided an elevated R2 value of 99%, showing
clearly superior performance over the proposed ensemble model.
When our model is compared with the physical-based forecasting models proposed
in [42,43], we can conclude that windowing can achieve similar results for time horizons
of t + 30 and t + 60. In [42], authors used the FY-4A-Heliosat method for satellite imagery
to estimate GHI in China. Although the windowing model could not improve on GHI
forecasting for t + 30 and t + 60 time windows, the proposed model was able to return
relevant results for irradiance estimation in both cases. The second physical-based model
proposed in [43] was applied to estimate GHI in Finland. In their methodology, the Heliosat
method is again employed, together with geostationary weather data from satellite images.
Compared to their proposed approach, the windowing model can improve GHI forecasting
for t + 60 by 8%, providing a significant advance in the irradiance estimation.
In work [44], the authors used the state-of-the-art transformer deep learning architec-
ture together with sky images [45] for GHI estimation in the USA. Analyzing their results
and the ones provided by the windowing method, we observe that the transformer-based
model reaches the best GHI forecasting values for RMSE in all the assessed time windows.
After comparing the ensemble windowing approach with reference models found in the literature,
we see that its wind speed forecasts are often competitive with, and frequently improve on, the
reference results for the assessed forecasting horizons. The results for
wind speed prediction using the ensemble model corroborate the results found in the
literature, where the ensemble approach often reaches state-of-the-art forecasting in time-
series prediction applications [21,46–48]. Their improved performance comes from the
combination of weaker predictive models to improve their overall forecasting capacity, also
reducing the ensembled model’s variance [49,50].
However, the proposed dynamic ensemble approach faced increased difficulty when determining
future GHI values. This may indicate that irradiance forecasting is a more complex, non-linear
natural phenomenon, requiring improved extraction of spatiotemporal information from the
dataset. Since the proposed ensemble model does not include a deep learning model in its
architecture, it cannot properly identify and extract the spatiotemporal information underlying
the dataset, and thus fails to provide better irradiance estimation. Deep learning models often
excel in this type of task, as shown by the results in Table 15. Extensive literature can be
found regarding improvements in time-series
forecasting problems when complex and deep approaches are employed [22,23,51,52].
5. Conclusions
This work proposed to evaluate the performance of two machine learning (ML) dy-
namic ensemble methods, using wind speed and solar irradiance data separately as in-
puts. Initially, wind speed and solar irradiance data from the same meteorological station
were collected, the time horizons to be studied were determined (t + 10 min, t + 20 min,
t + 30 min and t + 60 min), and then a recursive approach of lagged average values was
applied to evaluate the models’ predictors.
ML methods well known in other energy forecasting research works regarding wind
and irradiance data were selected to compare their efficiency with two other methods that
use a dynamic ensemble approach (windowing and arbitrating). The programming code
in Python was developed to catalog the optimal efficiency parameters of each previously
known model, based on error metrics and coefficient of determination. The dynamic
ensemble methods (windowing and arbitrating), based on the optimal parameters of
each previously calibrated model (random forest, k-nearest neighbors, support vector
regression, and elastic net), generated a single model with greater efficiency for both wind
and solar irradiance data.
For forecasting wind speed data, the most efficient method was found to be windowing
for all time horizons, when evaluated by the criterion of the lowest RMSE value, and
specifically for the time horizon t + 10, as evidenced in Figure 3. The greatest efficiency was
found in an interval of 1 to 74 for the λ parameter, reaching maximum performance for
the value λ = 19, as seen in Figure 8, which suggests that the windowing parameterization
directly influences the method’s performance.
Structurally, solar radiation data are different from wind data: they are distinct physical
phenomena with different natural cycles, presenting different correlations with their
historical values, which leads to different trends for the λ parameter in each of the
variables.
For solar irradiation forecasting, the most efficient method was also windowing and
the t + 10 min time horizon reached the lowest RMSE value. Unlike what was found for
wind speed data, a greater linearity in the trend was perceived from the λ windowing
parameter variation plot when analyzing its RMSE values. Looking at the λ interval under
study, the best performance value (using RMSE criteria) of λ = 1 was found, as can be seen
in Figure 10. Unlike all other plots, in Figure 12, there is a sudden jump between λ from 1
to 3. Although the reference metric is RMSE, for some other metrics the use of λ = 1 may
mean insufficient information for the model, since it will have as input variable just one
previous time step (window size).
Using wind speed data, the efficiency gain of the most efficient model (windowing
for the time horizon t + 10 min and λ = 19, see Table 4), when compared to the second
highest efficiency (SVR), was 0.56% when using the lowest value RMSE metric. A similar
trend could be observed for the model using solar irradiance data. The efficiency increase,
comparing the most efficient model (windowing for the time horizon t + 10 min and
λ = 1, see Table 9) to the second highest efficiency (arbitrating), was about 1.72%, and when
compared to the third most efficient method (SVR), it was about 1.96%.
Also, extensive comparisons with spatiotemporal models found in the literature show
that the dynamic ensemble model for wind speed often provides superior forecasting
performance for the assessed time horizons, deeming the proposed approach as a valuable
tool for wind speed estimation. Regarding irradiance forecasting, the dynamic ensemble
architecture proposed in this study could not surpass the deep learning-based models,
which showed superior spatiotemporal identification, and consequently better estimated
GHI values. However, the proposed windowing approach can provide competitive results
and superior GHI forecasting when compared to physics-based predictive models.
For future works, the dynamic ensemble architecture can be improved with the addition of more
complex machine learning models, such as the deep learning and graph-based approaches used in
works [22,51,52]. This may boost the windowing forecasting capacity for GHI and wind speed
estimation, since it would then be able to benefit from the spatiotemporal information
underlying the dataset. The models were developed to treat the database
in a generalized way. Specific studies with delimitation of seasons and/or times of day can
be carried out as future studies. The development of an ensemble model able to provide
accurate and precise estimations can then be employed in the development of real-time
forecasting applications, helping the evaluation of wind and solar farm operation.
Author Contributions: Conceptualization, F.D.V.B., F.P.M. and P.A.C.R.; data curation, F.D.V.B. and
F.P.M.; formal analysis, P.A.C.R.; methodology, F.D.V.B., F.P.M. and P.A.C.R.; software, F.D.V.B.
and F.P.M.; supervision, P.A.C.R.; validation, P.A.C.R., J.V.G.T. and B.G.; visualization, P.A.C.R.,
J.V.G.T. and B.G.; writing—original draft, F.D.V.B., F.P.M. and V.O.S.; writing—review and editing,
F.D.V.B., F.P.M., P.A.C.R., V.O.S., J.V.G.T. and B.G.; project administration, P.A.C.R.; funding acqui-
sition, P.A.C.R., B.G. and J.V.G.T. All authors have read and agreed to the published version of the
manuscript.
Funding: This research was funded by the Natural Sciences and Engineering Research Council of
Canada (NSERC) Alliance, grant No. 401643, in association with Lakes Environmental Software
Inc., by the Coordenação de Aperfeiçoamento de Pessoal de Nível Superior—Brasil (CAPES)—
Finance Code (Grant No. 001), and by the Conselho Nacional de Desenvolvimento Científico e
Tecnológico—Brasil (CNPq), grant no. 303585/2022-6.
Institutional Review Board Statement: Not applicable.
Informed Consent Statement: Not applicable.
Data Availability Statement: The data of wind speed and irradiation from Petrolina—PE—Brazil
are downloaded from the SONDA (National Organization of Environmental Data System) portal
(http://sonda.ccst.inpe.br/, accessed on 12 July 2023).
Conflicts of Interest: The authors declare no conflict of interest.
References
1. Osman, A.I.; Chen, L.; Yang, M.; Msigwa, G.; Farghali, M.; Fawzy, S.; Rooney, D.W.; Yap, P.S. Cost, environmental impact, and
resilience of renewable energy under a changing climate: A review. Environ. Chem. Lett. 2022, 21, 741–764. [CrossRef]
2. Calif, R.; Schmitt, F.G.; Duran Medina, O. −5/3 Kolmogorov turbulent behavior and Intermittent Sustainable Ener-
gies. In Sustainable Energy-Technological Issues, Applications and Case Studies; Zobaa, A., Abdel Aleem, S., Affi, S.N., Eds.;
Intech: London, UK, 2016. [CrossRef]
3. Carneiro, T.C.; de Carvalho, P.C.M.; dos Santos, H.A.; Lima, M.A.F.B.; de Souza Braga, A.P. Review on Photovoltaic Power and
Solar Resource Forecasting: Current Status and Trends. J. Sol. Energy Eng. Trans. ASME 2022, 144, 010801. [CrossRef]
4. Shikhovtsev, A.Y.; Kovadlo, P.G.; Kiselev, A.V.; Eselevich, M.V.; Lukin, V.P. Application of Neural Networks to Estimation and
Prediction of Seeing at the Large Solar Telescope Site. Publ. Astron. Soc. Pac. 2023, 135, 014503. [CrossRef]
5. Yuval, J.; O’Gorman, P.A. Neural-Network Parameterization of Subgrid Momentum Transport in the Atmosphere. J. Adv. Model.
Earth Syst. 2023, 15, e2023MS003606. [CrossRef]
6. Meenal, R.; Binu, D.; Ramya, K.C.; Michael, P.A.; Vinoth Kumar, K.; Rajasekaran, E.; Sangeetha, B. Weather Forecasting for
Renewable Energy System: A Review. Arch. Comput. Methods Eng. 2022, 29, 2875–2891. [CrossRef]
7. Mesa-Jiménez, J.J.; Tzianoumis, A.L.; Stokes, L.; Yang, Q.; Livina, V.N. Long-term wind and solar energy generation forecasts, and
optimisation of Power Purchase Agreements. Energy Rep. 2023, 9, 292–302. [CrossRef]
8. Rocha, P.A.C.; Fernandes, J.L.; Modolo, A.B.; Lima, R.J.P.; da Silva, M.E.V.; Bezerra, C.A.D. Estimation of daily, weekly and
monthly global solar radiation using ANNs and a long data set: A case study of Fortaleza, in Brazilian Northeast region. Int. J.
Energy Environ. Eng. 2019, 10, 319–334. [CrossRef]
9. Du, L.; Gao, R.; Suganthan, P.N.; Wang, D.Z.W. Bayesian optimization based dynamic ensemble for time series forecasting. Inf.
Sci. 2022, 591, 155–175. [CrossRef]
10. Vapnik, V.N. Adaptive and Learning Systems for Signal Processing, Communications and Control. In The Nature of Statistical
Learning Theory; Springer: Berlin/Heidelberg, Germany, 1995.
11. Smola, A. Regression Estimation with Support Vector Learning Machines. Master's Thesis, Technische Universität München,
Munich, Germany, 1996.
12. Mahesh, P.V.; Meyyappan, S.; Alia, R.K.R. Support Vector Regression Machine Learning based Maximum Power Point Tracking
for Solar Photovoltaic systems. Int. J. Electr. Comput. Eng. Syst. 2023, 14, 100–108. [CrossRef]
13. Demir, V.; Citakoglu, H. Forecasting of solar radiation using different machine learning approaches. Neural Comput. Appl. 2023,
35, 887–906. [CrossRef]
14. Schwegmann, S.; Faulhaber, J.; Pfaffel, S.; Yu, Z.; Dörenkämper, M.; Kersting, K.; Gottschall, J. Enabling Virtual Met Masts for
wind energy applications through machine learning-methods. Energy AI 2023, 11, 100209. [CrossRef]
15. Che, J.; Yuan, F.; Deng, D.; Jiang, Z. Ultra-short-term probabilistic wind power forecasting with spatial-temporal multi-scale
features and K-FSDW based weight. Appl. Energy 2023, 331, 120479. [CrossRef]
16. Nikodinoska, D.; Käso, M.; Müsgens, F. Solar and wind power generation forecasts using elastic net in time-varying forecast
combinations. Appl. Energy 2022, 306, 117983. [CrossRef]
17. Cerqueira, V.; Torgo, L.; Pinto, F.; Soares, C. Arbitrage of forecasting experts. Mach. Learn. 2019, 108, 913–944. [CrossRef]
18. Lakku, N.K.G.; Behera, M.R. Skill and Intercomparison of Global Climate Models in Simulating Wind Speed, and Future Changes
in Wind Speed over South Asian Domain. Atmosphere 2022, 13, 864. [CrossRef]
19. Ji, L.; Fu, C.; Ju, Z.; Shi, Y.; Wu, S.; Tao, L. Short-Term Canyon Wind Speed Prediction Based on CNN—GRU Transfer Learning.
Atmosphere 2022, 13, 813. [CrossRef]
20. Su, X.; Li, T.; An, C.; Wang, G. Prediction of short-time cloud motion using a deep-learning model. Atmosphere 2020, 11, 1151.
[CrossRef]
21. Carneiro, T.C.; Rocha, P.A.C.; Carvalho, P.C.M.; Fernández-Ramírez, L.M. Ridge regression ensemble of machine learning models
applied to solar and wind forecasting in Brazil and Spain. Appl. Energy 2022, 314, 118936. [CrossRef]
22. Santos, V.O.; Rocha, P.A.C.; Scott, J.; Thé, J.V.G.; Gharabaghi, B. Spatiotemporal analysis of bidimensional wind speed forecasting:
Development and thorough assessment of LSTM and ensemble graph neural networks on the Dutch database. Energy 2023,
278, 127852. [CrossRef]
23. Marinho, F.P.; Rocha, P.A.C.; Neto, A.R.; Bezerra, F.D.V. Short-Term Solar Irradiance Forecasting Using CNN-1D, LSTM, and
CNN-LSTM Deep Neural Networks: A Case Study with the Folsom (USA) Dataset. J. Sol. Energy Eng. Trans. ASME 2023,
145, 041002. [CrossRef]
24. Wu, Q.; Zheng, H.; Guo, X.; Liu, G. Promoting wind energy for sustainable development by precise wind speed prediction based
on graph neural networks. Renew. Energy 2022, 199, 977–992. [CrossRef]
25. Oliveira Santos, V.; Costa Rocha, P.A.; Thé, J.V.G.; Gharabaghi, B. Graph-Based Deep Learning Model for Forecasting Chloride
Concentration in Urban Streams to Protect Salt-Vulnerable Areas. Environments 2023, 10, 157. [CrossRef]
26. Tabrizi, S.E.; Xiao, K.; van Griensven Thé, J.; Saad, M.; Farghaly, H.; Yang, S.X.; Gharabaghi, B. Hourly road pavement surface
temperature forecasting using deep learning models. J. Hydrol. 2021, 603, 126877. [CrossRef]
27. Zhang, Y.; Gu, Z.; Thé, J.V.G.; Yang, S.X.; Gharabaghi, B. The Discharge Forecasting of Multiple Monitoring Station for Humber
River by Hybrid LSTM Models. Water 2022, 14, 1794. [CrossRef]
28. INPE. SONDA—Sistema de Organização Nacional de Dados Ambientais. 2012. Available online: http://sonda.ccst.inpe.br/
(accessed on 26 September 2023).
29. GOOGLE. Google Earth Website. Available online: http://earth.google.com/ (accessed on 12 July 2023).
30. Peel, M.C.; Finlayson, B.L.; McMahon, T.A. Updated world map of the Köppen-Geiger climate classification. Hydrol. Earth Syst.
Sci. 2007, 11, 1633–1644. [CrossRef]
31. Landberg, L.; Myllerup, L.; Rathmann, O.; Petersen, E.L.; Jørgensen, B.H.; Badger, J.; Mortensen, N.G. Wind resource estimation—
An overview. Wind. Energy 2003, 6, 261–271. [CrossRef]
32. Kasten, F.; Czeplak, G. Solar and terrestrial radiation dependent on the amount and type of cloud. Sol. Energy 1980, 24, 177–189.
[CrossRef]
33. Ineichen, P.; Perez, R. A new airmass independent formulation for the linke turbidity coefficient. Sol. Energy 2002, 73, 151–157.
[CrossRef]
34. Marquez, R.; Coimbra, C.F.M. Proposed metric for evaluation of solar forecasting models. J. Sol. Energy Eng. Trans. ASME 2013,
135, 011016. [CrossRef]
35. Rocha, P.A.C.; Santos, V.O. Global horizontal and direct normal solar irradiance modeling by the machine learning methods
XGBoost and deep neural networks with CNN-LSTM layers: A case study using the GOES-16 satellite imagery. Int. J. Energy
Environ. Eng. 2022, 13, 1271–1286. [CrossRef]
36. Cerqueira, V.; Torgo, L.; Soares, C. Arbitrated ensemble for solar radiation forecasting. In Advances in Computational Intelli-
gence; Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in
Bioinformatics); Springer: Berlin/Heidelberg, Germany, 2017. [CrossRef]
37. Neshat, M.; Nezhad, M.M.; Abbasnejad, E.; Mirjalili, S.; Tjernberg, L.B.; Astiaso Garcia, D.; Alexander, B.; Wagner, M. A deep
learning-based evolutionary model for short-term wind speed forecasting: A case study of the Lillgrund offshore wind farm.
Energy Convers. Manag. 2021, 236, 114002. [CrossRef]
38. Dowell, J.; Weiss, S.; Infield, D. Spatio-temporal prediction of wind speed and direction by continuous directional regime.
In Proceedings of the 2014 International Conference on Probabilistic Methods Applied to Power Systems, PMAPS 2014, Durham,
UK, 7–10 July 2014. [CrossRef]
39. Liu, Z.; Hara, R.; Kita, H. Hybrid forecasting system based on data area division and deep learning neural network for short-term
wind speed forecasting. Energy Convers. Manag. 2021, 238, 114136. [CrossRef]
40. Zhu, Q.; Chen, J.; Shi, D.; Zhu, L.; Bai, X.; Duan, X.; Liu, Y. Learning Temporal and Spatial Correlations Jointly: A Unified
Framework for Wind Speed Prediction. IEEE Trans. Sustain. Energy 2020, 11, 509–523. [CrossRef]
41. Gupta, P.; Singh, R. Combining a deep learning model with multivariate empirical mode decomposition for hourly global
horizontal irradiance forecasting. Renew. Energy 2023, 206, 908–927. [CrossRef]
42. Yang, L.; Gao, X.; Hua, J.; Wang, L. Intra-day global horizontal irradiance forecast using FY-4A clear sky index. Sustain. Energy
Technol. Assess. 2022, 50, 101816. [CrossRef]
43. Kallio-Myers, V.; Riihelä, A.; Lahtinen, P.; Lindfors, A. Global horizontal irradiance forecast for Finland based on geostationary
weather satellite data. Sol. Energy 2020, 198, 68–80. [CrossRef]
44. Liu, J.; Zang, H.; Cheng, L.; Ding, T.; Wei, Z.; Sun, G. A Transformer-based multimodal-learning framework using sky images for
ultra-short-term solar irradiance forecasting. Appl. Energy 2023, 342, 121160. [CrossRef]
45. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you
need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017;
pp. 5998–6008.
46. Peng, Z.; Peng, S.; Fu, L.; Lu, B.; Tang, J.; Wang, K.; Li, W. A novel deep learning ensemble model with data denoising for
short-term wind speed forecasting. Energy Convers. Manag. 2020, 207, 112524. [CrossRef]
47. Abdellatif, A.; Mubarak, H.; Ahmad, S.; Ahmed, T.; Shafiullah, G.M.; Hammoudeh, A.; Abdellatef, H.; Rahman, M.M.; Gheni,
H.M. Forecasting Photovoltaic Power Generation with a Stacking Ensemble Model. Sustainability 2022, 14, 11083. [CrossRef]
48. Wu, H.; Levinson, D. The ensemble approach to forecasting: A review and synthesis. Transp. Res. Part C Emerg. Technol. 2021,
132, 103357. [CrossRef]
49. Ghojogh, B.; Crowley, M. The Theory behind Overfitting, cross Validation, Regularization, Bagging and Boosting: Tutorial. arXiv
2023, arXiv:1905.12787.
50. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the ACM SIGKDD International Conference
on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 14–18 August 2016. [CrossRef]
51. Oliveira Santos, V.; Costa Rocha, P.A.; Scott, J.; Van Griensven Thé, J.; Gharabaghi, B. Spatiotemporal Air Pollution Forecasting in
Houston-TX: A Case Study for Ozone Using Deep Graph Neural Networks. Atmosphere 2023, 14, 308. [CrossRef]
52. Oliveira Santos, V.; Costa Rocha, P.A.; Scott, J.; Thé, J.V.G.; Gharabaghi, B. A New Graph-Based Deep Learning Model to Predict
Flooding with Validation on a Case Study on the Humber River. Water 2023, 15, 1827. [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual
author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to
people or property resulting from any ideas, methods, instructions or products referred to in the content.