Aghighi 2018
Aghighi 2018
Aghighi 2018
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING 1
1939-1404 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
2 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
AGHIGHI et al.: ML REGRESSION TECHNIQUES FOR THE SILAGE MAIZE YIELD PREDICTION USING TIME-SERIES IMAGES 3
C. Remotely Sensed Dataset generated for each year, where each record represented a field
and columns showed the time series of NDVI values. To sum-
The remotely sensed Landsat 8 OLI images are essential for
this paper due to their capabilities to study agriculture. Landsat marize, the data arranged in each 2-D matrix are indicated by
X, where each record xi ∈ RB , xbi = [x1i , x2i , . . . , xB i ] is a field
8 OLI’s design is an advancement on Landsat sensor technol-
data composed of B features. In this paper, i = (1, 2, . . . , n)
ogy, which extends the remarkable 50 years of Landsat records
and allows collection of a significantly greater number of im- where n represents the total number of silage maize fields in
each year, b = (1, 2, . . . , B) where B is the number of fea-
ages per day with improvements in signal-to-noise ratio, as well
as in spectral and radiometric resolutions [48]. Moreover, the tures or images in the time series of each year and each value
Landsat archive and also the collected data by Landsat 8 OLI are of b represents a feature number or a time step in a given
time series. The target variable y represents the ACY value
available to download at no cost via the Internet within 24 h of
acquisition from the United States Geological Survey (USGS) (kg·ha−1 ) for each silage maize field of a given year, where
data center (http://glovis.usgs.gov/) [48]. Hence, this long his- y ∈ R1 , yi = [y1 ; y2 ; . . . ; yn ], i = (1, 2, . . . , n) where n repre-
sents the total number of silage maize fields in each year, and
torical dataset provides a unique opportunity for all countries
to develop yield prediction/estimation models at the regional each yi is attached to xbi . In this study, the values of n are 48,
scales and also individual fields. 68, and 49 for years 2013, 2014, and 2015, respectively.
Since the silage maize in Moghan plain is planted between
May 1 and end of July and harvested between August 1 and end III. METHODS AND EXPERIMENTS
of October, a time series of all the Landsat surface reflectance
The basic assumption regarding data mining and ML ap-
(SR) and climate data record products for the period of between
proaches is the ability of identification, detection, and quan-
May 1 and the end of October was obtained for 2013–2015
tification of patterns in datasets using specific algorithms [55],
from the USGS website. These images were atmospherically
[56]. This enables automatic learning which, in turn, allows the
and geometrically corrected [49], [50]. Although the metadata
extraction of important information from raw data even if the
file provided with each Landsat image indicated that the cloud
data model is unknown [55], [57]. However, the performance of
cover of the images was less than 70%, we employed a newly
ML techniques can be affected by the changes of the variable-to-
developed Fmask (Function of mask) method proposed by [51]
observation ratio [58], where a large number of input variables
to mask the cloud/shadow of each scene. According to [51],
in comparison with the number of observations (i.e., dimension-
the only difference between the results of this Fmask algo-
ality) leads to the curse of dimensionality [59] and, therefore,
rithm and that of Landsat 4–7 [52] is the place where the cirrus
overfitting [60]. For instance, the number of variables B was 12,
band is applied. Zhu et al. [51] employed the top of atmosphere
13, and 11 for years 2013, 2014, and 2015, respectively. Given
(TOA) reflectance of the cirrus band to compute the cirrus cloud
the total number of fields (TNF) (n) (see Section II-D) and B
probability (CCP) (2) and added this probability to that of the
for each year, there are about 4, 5.2, and 4.45 observations per
original Fmask algorithm. According to [51], pixels with TOA
feature for years 2013, 2014, and 2015, respectively. Moreover,
reflectance of the cirrus band larger than 0.01 should be labeled
division of observations into training/validation sets decreases
as potential cloud pixels. Hence, the new Fmask algorithm pro-
the observation-to-feature ratio value, which, in turn, makes ML
vides better cloud probabilities
methods prone to overfitting. It should also be considered that
TOA Reflectance of Cirrus Band when using ML techniques, these approaches require tuning of
CCP = . (2)
0.04 different parameters based on training data.
The cirrus band for Landsat 8 OLI images is band 10. As mentioned above, this paper aims to retrieve silage maize
The employed Fmask software is freely available at https:// yields using different ML techniques including GPR [61], SVR
github.com/prs021/fmask. [62], [63], BRT [64], and RFR [65]. Therefore, the training data
Then, NDVI derived from Landsat 8 OLI images was calcu- selection strategy, the four utilized ML techniques, implemen-
lated as tation details, and the hyperparameter tuning procedures are
ρNIR − ρRed presented in the following sections.
NDVI = (3)
ρNIR + ρRed
where ρNIR and ρRed are the SRs of the NIR and red bands of A. Subset Selection From NDVI Time-Series Data
Landsat 8 OLI, respectively [53]. In order to improve prediction performance of ML techniques
through defying the curse of dimensionality [66], we employed
D. Data Preparation a robust feature selection method, which modifies the neigh-
In order to compute the NDVI values for each silage maize borhood component analysis for regression problems [67]. In
field [see Fig. 1(b)–(d)], the NDVI values of all pixels within the this method, -insensitive loss function is utilized to make the
polygon of each silage maize field were averaged for each im- method more robust to outliers [68]. The -insensitive loss func-
age. This procedure results in a single vector for each time step tion is written as
that represents the average of NDVI values for that given time
[54]. Through extraction of NDVI values and integrating them 0 for |f (x) − y| <
L (y) = (4)
for the time-series dataset, a two-dimensional (2-D) matrix was |f (x) − y| − , other wise
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
4 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
Fig. 2. Feature selection results in 2013. (a) Computed loss value for the lambda values (a vector from zero to four with 200 points). (b) Selected features based
on their weight. (c) Number of fields with maximum NDVI in each feature. (d) NDVI time series of four different fields. (e) Shifted NDVI time series of four
different fields. Lambda is a regularization parameter that minimizes the generalization error of the model.
AGHIGHI et al.: ML REGRESSION TECHNIQUES FOR THE SILAGE MAIZE YIELD PREDICTION USING TIME-SERIES IMAGES 5
results are directly related to the selection of training set compo- GPR first utilizes the training dataset to obtain a prior GPR to
nents with varying features and targets, the size of training set, constrain the possible forms of the unknown function and then
and the training set selection strategy [70]. Although several updates the prior to generate a posterior GPR [47]
training set selection techniques such as holdout method also
N
known as test sample estimation, CV, leave one out, and stratifi-
cation approaches have been proposed, most of them suffer from ŷ = f (x) = αi K(xi , xt ) + ω0 (5)
i=1
some limitations because of their theoretical assumptions. For
instance, the holdout methods cannot efficiently use the data, where ω0 is the bias term, αi ∈ R is the weight assigned to each
because the dataset size is finite and this method simply makes (b)
{xi }N i=1 , and K is a function that evaluates similarity between
two partitions of the original dataset. Therefore if a given class (b)
is overrepresented in one subset, it might be underrepresented in the test set xt and the training set xi , i = 1, 2, . . . , N [46].
the other subset [70]. The leave-one-out approach, which splits GPR frequently requires to fully tune model hyperparameters,
observations into two parts including an individual validation regularization terms, and optimization parameters related to the
observation and the training set of size n − 1, is identified as an covariance or kernel functions [46].
unbiased strategy; however, this method is expensive and also In this paper, we utilized a Gaussian-likelihood function,
leads to unreliable estimates due to its high variance [70], [71]. which is often used for regression [61], [77]. The employed
In the stratified CV method, the proportions of classes in each GPR software is freely available at http://www.gaussianprocess.
fold are approximately equal to the proportions of labels in the org/gpml/ [61] .
original dataset [72]. A basic form of CV is k-fold CV, which
first partitions data into k equally or nearly equally sized folds, D. Support Vector Regression
then a fold is left out and used as the test set in one iteration and Support vector machine is a supervised nonparametric sta-
other folds that contain totally n( k −1
k ) samples build the train- tistical learning technique [78], which has been widely applied
ing set [73]. With regards to the advantages and drawbacks of in the RS field due to its ability to successfully handle small
these techniques, k-fold CV strategy is employed in this paper. training sets [79]. SVM can be categorized into support vector
In this paper, all the n samples were randomly shuffled first to classification and SVR, where SVR is the most common form
remove any inherent pattern embedded in the dataset, and then of SVMs [80]. SVMs have been employed in numerous studies
they were split into five equally sized (or almost equally sized) such as classification tasks [81]–[83] as well as to solve regres-
without replacement groups [73]. Each of the above-mentioned sion problems [47], [84]. Many efforts have been made toward
ML techniques was then trained using four of the folds (n( k −1 k ) the application of these algorithms in the RS community such as
samples) and the fifth fold was reserved for validation and es- for estimating and monitoring biochemical and the biophysical
timation of prediction error. The estimated yields for each test parameters of plants [79], [84], [85], which was well reviewed
fold were considered as the final results of that fold; then, the by [79]; hence, we do not detail them here.
estimated yields from k iterations were integrated to produce the The most widely used versions of SVR are ε-SVR and ν-SVR,
total results. The k errors estimated from the validation steps which were categorized regarding their regularization parame-
were averaged to produce a single estimation error. ters, i.e., ε and ν [86]. The ν-SVR and also its comparison
In this study, a repeated fivefold CV strategy was utilized; for with ε-SVR were discussed in detailed by [63]. According to
this reason, the data were reshuffled 100 times and the learning [47] and [86], ν-SVR has a more meaningful interpretation than
and validation procedures were repeated after each round. The ε-SVR; moreover, the work in [87] stated that regardless of
100 results were again averaged to produce a single crop yield the limitation imposed by the number of training samples, ν-
estimation. SVR has good generalization properties; hence, this study only
utilized ν-SVR using a radial basis function (RBF) kernel to
C. Gaussian Process Regression approximate the nonlinear regression function, as in [47], [87],
and [88]. RBF was chosen here due to its simplicity, and fewer
GPR as a famous kernel-based ML approach implements
implementation difficulties, faster training rate, and ability to
Gaussian processes (GPs) for nonlinear regression purposes. A
deliver an acceptable accuracy [47], [87]–[90].
GP is a collection of random variables with the property that any (b) (b)
finite number of their subsets have a joint Gaussian distribution Given a set of {(x1 , yi ), . . . , (xn , yn )} where each xi ∈
B 1
[61]. GPR was formulated and interpreted in a Bayesian con- R is an input and yi ∈ R is a target output, the problem of
text (5); however, it is categorized as a nonparametric Bayesian ν-SVR is minimization of the following equation [86]:
approach [74] or as a less parametric tool [75]. GPR is not de- N
signed to fit the required parameters of a selected model but 1 T 1 ∗
ww + C ν + (ξi + ξi )
instead aims to capture many different relations that show how 2 N i=1
all the training inputs and target variables are correlated [76]. As T
stated in Section III-B, in order to learn and generate the final w Φ (xi ) + b − yi ≤ +ξi
function model also known as posterior GPR [47], this paper
yi − wT Φ (xi ) + b ≤ +ξi∗
utilized a fivefold CV technique to generate a training dataset
(b)
{xi }N N ξi , ξi∗ ≥ 0, i = 1, . . . , N, ≥ 0 (6)
i=1 and also output dataset {yi }i=1 . In each CV iteration,
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
6 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
where 0 ≤ ν ≤ 1, C is the regularization parameter, xi denotes learners; hence, random forest sensibly averages the total trees
the training vectors, Φ is a function that mapped xt into a higher grown to make a prediction model (fˆrQf r )
dimensional space, and ξi and ξi∗ are slack variables representing
Q
upper and lower constraint on the outputs of the system. This 1
fˆrQf r = T (y, Θq ) (9)
paper employed LibSVM software, which is freely available at Q q =1
https://www.csie.ntu.edu.tw/˜cjlin/libsvm/ [63]. As mentioned
in Section III-B, fivefold CV was employed to estimate C, γ, where Q represents the total number of trees, T (y, Θq ) indicates
and ν using training dataset and the ε was set to 0.1 [86]. the qth tree, y is the target point that requires prediction, and
Θq is a random vector generated for the qth tree independent of
E. Boosted Regression Trees previous vectors.
In this paper, the generalization errors of the model were es-
The term boosted in BRT technique refers to a general and
timated using out-of-bag estimates [96] as an efficient method
provably effective method in ML, which can combine a large
[97]. Moreover, the Bayesian optimization algorithm [98] was
number of relatively simple tree models adaptively to produce an
employed for identifying some optimal predictive model param-
accurate and optimized prediction rule [64], [91]. This method
eters including minimum leaf size and the number of predictors
was developed based on the idea that finding and linearly com-
to use at each node.
bining many rough rules is much easier than finding a single and
highly accurate prediction model [91]. Although BRT model dif-
G. Evaluation
fers from traditional regression methods, it is interpreted as an
advanced form of regression in the statistical community [92]. In this study, the root-mean-square error (RMSE), which pro-
Hence, we employed BRT as an advanced ML technique in this vides a general purpose error metric of numerical prediction, the
paper, where residuals were considered as the loss function [93]. mean absolute error (MAE), which describes average model-
As mentioned in Section III-B, fivefold CV was employed performance error, and correlation coefficient (R) were calcu-
to iteratively fit the model to the training data. For this reason, lated to compare the performances of GPR, SVR, BRT, and RFR
first the initial values were assigned to the constant model that methodologies in maize yield prediction. The RMSE, MAE, and
also included an oversized tree; and then at every step, a boost- R are calculated by using (10)–(12), respectively [99], [100]
ing technique adds a new tree under a certain growth property
that steps down the gradient of the loss function, which focused 1 M
RMSE = (yi − ŷi )2 (10)
on the residuals. In the training phase, three hyperparameters M i=1
including interaction depth, learning rate, and number of trees
were determined using Bayesian optimization method [98]. In where M is the number of test samples, yi indicates the actual
the prediction procedure, each successively selected tree gives maize yield (AMY) data, and ŷi is the predicted maize yield
extra weights to incorrectly predicted points by earlier predic- (PMY)
tors. A BRT model is defined as follows:
1 M
MAE =
K
|yi − ŷi | (11)
fˆBRT
A
= T (y, Θk ) (7) M i=1
k =1
M
where T (y, Θk ) is kth tree, and at each step of the procedure, (yi − y i ) ŷi − ŷ i
R =
i=1 (12)
the following cost function should be solved: N 2
N 2
i=1 (yi − y i ) i=1 ŷi − ŷ i
N
Θ̂m = argmin L(yi , fk −1 (yi ) + T (x, Θk )) (8) where y i and ŷ i are the average of observed and predicted data of
Θk i=1 the model, respectively. The square of R is called the coefficient
of determination that can be computed using (13)
where fk −1 (y) indicates the current model, Θk = {Rj k , γj k }j1k
2
as a region set of Rj k and constants γj k in the kth tree, j M
(yi − y i ) ŷi − ŷ i
i=1
represents a metaparameter, and Rj k refers to the jth disjoint R 2 =
N 2 .
2
N
(13)
nonoverlapping region in the kth tree, yi is the response associ- i=1 (yi − y i ) i=1 ŷi − ŷ i
ated with xi , and L is the loss function.
IV. RESULTS AND DISCUSSION
F. Random Forest Regression
As mentioned in Section III-B, the fivefold CV was employed
Random forests [94] as an ensemble learning method is a to train/validate the GPR, SVR, BRT, and RFR methodologies.
particular implementation of bootstrap aggregation (bagging) For each year, the R, RMSE and MAE of each method for 100
that attempts to mitigate the problems of high variance and runs are compared in Table II, whereas the distribution of those
high bias of the final prediction model through reduction of metrics for each individual experiment is illustrated as box plots
the correlation between estimators (trees) [95]. To this end, in Fig. 3, in order to summarize and illustrate the distribution
each learner randomly samples a subset of n input features to of data based on their minimum, first quartile, median, third
construct splits for each tree. This results in different models for quartile, and maximum.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
AGHIGHI et al.: ML REGRESSION TECHNIQUES FOR THE SILAGE MAIZE YIELD PREDICTION USING TIME-SERIES IMAGES 7
TABLE II
AVERAGE (μ) AND STANDARD DEVIATION (σ) OF R, RMSE AND MAE VALUES OF 100 EXPERIMENTS
Fig. 3. Box plots of MAE, RMSE, and R of the estimated maize yield using different ML techniques for years 2013–2015.
The overall patterns of the R metric were numerically ana- results, but their reported statistics should be interpreted with
lyzed and reported in Table III. Moreover, the PMY of all the the caution as they can be misleading.
experiments were averaged for each method to produce a single As Fig. 3 shows, 25% of the experiment points (e.g., R values)
estimation, which is illustrated as Fig. 4. are distributed under the central blue rectangle of each box; these
Fig. 3 is a combination of box plots and beeswarm plots that points are categorized as the first quartile and can be defined
illustrate the probability distribution of the computed R, RMSE, as the middle points between the smallest metrics values of
and MAE values for all experiments by plotting their quartiles each method and the first quartiles values (see Table III). By
against each other, as well as the individual position of each comparing the first quartile values and the minimum values in
computed metric. The full range of the computed R, RMSE and Table III, it can generally be concluded that the first quartile
MAE values of each box plot is represented by the horizontal of RFR and SVR techniques is more compact than those from
black lines at the end of the dashed whiskers (see Fig. 3), and other techniques and the R values of GPR technique have a
the outliers are illustrated by red crosses; however, we are aware wider spread. However, the average of minimum value and the
that the outliers are not necessarily bad data points. Hence, first quartile value of the BRT technique are higher than those
they should not be automatically removed from the experiments from other methods. Therefore, it can be concluded that in the
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
8 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
TABLE III
THE MINIMUM, 1ST QUARTILE, MEDIAN, 3RD QUARTILE AND THE MAXIMUM VALUES OF R VALUES OF 100 EXPERIMENTS USING DIFFERENT ML TECHNIQUES
AND THE SUMMATION OF ALL VALUES IN EACH COLUMN
Fig. 4. Scatter plots of all the AMY versus PMY using different ML techniques for years 2013–2015.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
AGHIGHI et al.: ML REGRESSION TECHNIQUES FOR THE SILAGE MAIZE YIELD PREDICTION USING TIME-SERIES IMAGES 9
TABLE IV
SOT VALUES OF FIRST, SECOND, AND THIRD QUARTILES OF R 2 VALUES OF 100 EXPERIMENTS
TABLE V
DIFFERENCE BETWEEN THE UPPER AND LOWER QUARTILES VALUES (INTERQUARTILE RANGE), THE DIFFERENCE BETWEEN THE THIRD AND THE FIRST
QUARTILES AND THE DIFFERENCE BETWEEN THE MAXIMUM AND THE THIRD QUARTILES OF 100 EXPERIMENTS
TABLE VI
DISTANCE BETWEEN MEDIAN AND MINIMUM OF SECOND QUARTILES (L_NOTCH) AND THE MAXIMUM VALUE OF THIRD QUARTILES
(U_NOTCH) OF R VALUES OF 100 EXPERIMENTS
worst cases, the performances of BRT technique were better the AMY values in 50% of the experiments and also the smaller
than the others. values can be interpreted as demonstrating the higher stability
In regards to Table IV, at least in 25% of the experiments, of the regression model [see Fig. 3(a)–(c)].
the BRT technique can predict maize yield better than the other Regarding to Table III, the interquartile ranges of RFR and
techniques on its worst performance, because the summation of SVR are more compressed than the others, emerging as the most
tolerance (SOT) of R2 values (14) for all years (SOT = 1.01) stable methods. Moreover, the summation of the first and third
is lower than for the others. Moreover, based on the computed quartile values across the three years for each technique (see
SOT values of all the techniques (see Table IV), the PMY using Table III) also shows that SVR outperforms other techniques in
SVR (SOT = 0.18) was closer together than others in 50% of terms of the interquartile range (see the last row of Table III).
experiments The estimated confidence intervals for each median are rep-
resented by a notch close to the median of each box plot [103].
P
As Fig. 3 shows, the notches are not similar in any of the box
SOT = 1 − R2 (14) plots as well as between the second and third quartiles of some
i=1
box plots. In these cases, the distance between the median and
where P is the number of years for which their datasets were the first quartile is not same as the distance between the median
employed (P = 3). Although 1 − R2 was called the tolerance and the third quartile (see Table VI). Regarding this table, it
by [101] and it was defined as “the proportion of variation that can be concluded that the distribution of the R values in the
is not explained by the regression model” [102], we introduced second and third quartiles is skewed for all of the experiments.
(14) to compare the performance of the regression models in the In Table VI, the smaller U_Notch value than the L_Notch value
time-series problem. of each technique shows that the bottom whisker is much longer
The central blue rectangle of each box plot in Fig. 3 spans and the median is closer to larger R values. As can be seen, the
the first to the third quartiles. These rectangles demonstrate that R values of all techniques except BRT in 2015 are skewed to
50% of metrics values are between the values of first and third the left side; hence, the mean of almost all of the techniques is
quartiles (see Table III). The difference between the third and usually not in the middle but is located somewhere closer to the
first quartiles is referred to as interquartile range. The horizontal third quartiles.
bar in the middle of each box plot is called the median, which The distance between the minimum and maximum of R val-
shows that the accuracies of half of the experiments results ues (full range of variation) of all the experiments (see Fig. 3
are higher than the median and half are worse. The median and Table III) is presented in Table V. Based on this table, the
of R values for each technique and their SOT are represented BRT boxplots are comparatively shorter in range, which indi-
in Tables III and IV, respectively. The interquartile ranges of cate that the overall performance of this method over different
R2 values for each technique were computed and presented in runs has higher level of agreements with each other. Moreover,
Table V. These values indicate that the PMY values are close to the boxplots of GPR are comparatively taller than the others,
where particularly its R values across all years and those of
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
10 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
TABLE VII
t-TEST FOR THE COEFFICIENT OF DETERMINATION (R) MEASURES USING THE UPPER AND LOWER QUARTILES OF R VALUES OF 100 EXPERIMENTS (H = 1
REJECTS THE NULL HYPOTHESIS IN 95% CONDENCE LEVEL)
AGHIGHI et al.: ML REGRESSION TECHNIQUES FOR THE SILAGE MAIZE YIELD PREDICTION USING TIME-SERIES IMAGES 11
Fig. 5. Scatter plots of all the AMY versus PMY using the maximum and summation of the silage maizes NDVI profile during the growing season and the best
individual NDVI dataset (see Table VIII) for years 2013–2015.
the null hypothesis for best Julian day results with the maximum TABLE IX
SUMMARY OF STATISTICS OF 100 EXPERIMENTS ON THE PERFORMANCES OF
and summation of NDVIs for all years. ML TECHNIQUES TO PREDICT THE MAIZE YIELD IN 2015, WHILE THEY
The above results confirmed that the ML techniques can be TRAINED USING THE DATASETS OF 2013–2014
employed to estimate the maize yield of a large studying area
in a given year. In the last part of this study, we focused on
predicting the maize yield in a given growing season, while the
ML techniques being trained using the dataset of other seasons.
In this situation, the performance of the ML techniques in yield
prediction can be evaluated in new seasons and under new con-
ditions. Therefore, we utilized fivefold CV (see Section III-B) to
train all the ML techniques using the integrated dataset of 2013–
2014; then predict the maize yield in 2015. These experiments MAE, a combination of both metrics should be used to assess
were repeated 100 times [100]. model performance. Therefore, for each ML method, the min-
Based on correlation coefficient definition presented in [105], imum, average (μ), maximum, and standard deviation (σ) of
the R value represents the strength of correlation between the RMSE and MAE for 100 runs are reported in Table IX.
dependent and independent variables, and R2 value indicates As Table IX shows, although the average of RMSE and MAE
how much of the variance between those two variables can be of GPR model is less than for other techniques, their standard
described by the linear fit. With regard to [99], although most deviations are more than for the others. In contrast, the lowest
of the works in the field of geoscience recognized RMSE as a standard deviation values were obtained by RFR method while
standard metric to model errors and a few of them employed their average of RMSE and MAE is more than for GPR. Due
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
12 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
to these results, it can be concluded that the results of 100 runs ability to deal with high-dimensional data of complex distribu-
of RFR tend to be close to their mean values and GPR results tions as well as inconsistency of NDVI time series. Specifically,
are spread out over a wider range of values in comparison to the BRT technique, where its average R value was higher than
other techniques. Moreover, RFR can be considered as a stable 0.87 for all the years, performed the best for PMY. BRT was
method in 100 runs and GPR was more accurate at predicting followed by RFR with almost the same performances as GPR
maize yield. technique and better than SVR method. Moreover, in the cases
A comparison between Tables IX and II shows that employing that ML trained using a dataset of two years (2013–2014) to
the trained ML techniques using 2013–2014 dataset to predict predict the maize yield of another year (2015), the average of
the silage maize yield in 2015 results in increasing the RMSE RMSE and MAE of GPR method in 100 runs was lower than for
and MAE. One of the limitations with these results is that the other methods and the RFR technique was the most stable, hav-
number of images (B) in the time series of 2015 (B = 11) was ing the smallest standard deviations. However, in future studies,
less than 2013 and 2014 with 13 and 12 images, respectively, these methods should also be tested with regard to different crop
where one of them in 2013 and two images in 2015 were re- types in different geographical regions as well as on different
moved due to the presence of cloud covers over the studying dataset sizes.
area. One of the removed images in 2015 was acquired at about
the peak of the growing season of silage maize. As it was men- ACKNOWLEDGMENT
tioned, the feature within the interval [t − 3, t + 2] results in
better maize yield estimation (see Fig. 2 and Table I); therefore, The authors would like to thank the editor and two anony-
losing data in that interval might decrease the performance of mous reviewers for their careful reading of this manuscript as
the ML techniques. This can be also seen in the case of using the well as their valuable and constructive comments and sugges-
trained ML method in a given year to predict the silage maize tions that have helped to improve this manuscript. The authors
yield in that year (see Table II), where the amount of B value gratefully acknowledge USGS, EROS Science Processing Ar-
may have contributed to higher R and lower RMSE values of chitecture, for providing the Landsat 8 OLI images. The authors
2014 than 2013 and 2015. also acknowledge Moghan Agro-Industrial and Animal Hus-
With regards to the results of this paper, NDVI values and their bandry Company for providing silage maize yield dataset.
time series can be considered as an appropriate input for yield
prediction of crops with different cultivated dates. However, REFERENCES
due to the influence of cloud cover and cloud shadow on NDVI
[1] T. Horie, M. Yajima, and H. Nakagawa, “Yield forecasting,” Agricultural
and other VIs derived from the optical remotely sensed data, Syst., vol. 40, no. 1–3, pp. 211–236, 1992.
the multisource cloud free optical satellite imagery (MSCFOSI) [2] V. Barnett and J. Riley, “Statistics for environmental change,” Exp. Agri-
can be used to gather more information than a single satellite. culture, vol. 31, no. 2, pp. 117–130, 1995.
[3] M. K. van Ittersum, K. G. Cassman, P. Grassini, J. Wolf, P. Tittonell,
Therefore, VIs derived from MSCFOSI imagery could result in and Z. Hochman, “Yield gap analysis with local to global relevance—A
more accurate yield estimation of the desired crop. review,” Field Crops Res., vol. 143, pp. 4–17, 2013.
LAI of maize increased rapidly from 40 to 60 days after sow- [4] C. Reynolds, M. Yitayew, D. Slack, C. Hutchinson, A. Huete, and M.
Petersen, “Estimating crop yields and production by integrating the
ing, where higher LAI levels could result in saturation of NDVI; FAO crop specific water balance model with real-time satellite data
hence, other indices such as SAVI and SAVI2 could be investi- and ground-based ancillary data,” Int. J. Remote Sens., vol. 21, no. 18,
gated in future works [106]. These indices might show stronger pp. 3487–3508, 2000.
[5] D. M. Johnson, “An assessment of pre-and within-season remotely
correlations with biomass and LAI than NDVI, thereby leading sensed variables for forecasting corn and soybean yields in the United
to more stable models. In addition, it should be noticed that States,” Remote Sens. Environ., vol. 141, pp. 116–128, 2014.
these methods required the crop maps as a prerequisite; hence, [6] D. B. Lobell, D. Thau, C. Seifert, E. Engle, and B. Little, “A scal-
able satellite-based crop yield mapper,” Remote Sens. Environ., vol. 164,
future research could investigate new approaches that can mask pp. 324–333, 2015.
the maize fields in advance and then predict the crop yields. [7] L. Leroux, C. Baron, B. Zoungrana, S. B. Traoré, D. L. Seen, and A.
Moreover, this paper did not investigated the effect of other Bégué, “Crop monitoring using vegetation and thermal indices for yield
estimates: Case study of a rainfed cereal in semi-arid West Africa,” IEEE
environmental parameters (e.g., soil texture, soil moisture, cli- J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 1, pp. 347–
matic parameters, and LAI) on crop yield predictions; hence, the 362, Jan. 2016.
future works can also incorporate the environmental parameters [8] O. Vergara-Dı́az et al., “A novel remote sensing approach for predic-
tion of maize yield under different conditions of nitrogen fertilization,”
in an assimilation-crop modeling framework. Frontiers Plant Sci., vol. 7, p. 666, 2016.
[9] O. D. Sirotenko, “Crop modeling,” Agronomy J., vol. 93, no. 3, pp. 650–
653, 2001.
V. CONCLUSION [10] P. J. Pinter, Jr., et al., “Remote sensing for crop management,” Pho-
togramm. Eng. Remote Sens., vol. 69, no. 6, pp. 647–664, 2003.
This study successfully incorporates BRT, RFR, SVR, and [11] C. Ferencz et al., “Crop yield estimation by satellite remote sensing,”
GPR ML algorithms in silage maize yield prediction process Int. J. Remote Sens., vol. 25, no. 20, pp. 4113–4149, 2004.
[12] G. Asrar, M. Fuchs, E. Kanemasu, and J. Hatfield, “Estimating absorbed
based on NDVI time-series dataset of different fields in a part of photosynthetic radiation and leaf area index from spectral reflectance in
Moghan fertilized plain, Ardabil, Iran. The utilized time series of wheat,” Agronomy J., vol. 76, no. 2, pp. 300–306, 1984.
NDVI values was derived from Landsat 8 OLI images. Regard- [13] Z.-Q. Jin, D.-K. Ge, and C.-L. Shi, “Several strategies of food crop
production in the Northeast China plain for adaptation to global climate
ing the presented results, the ML techniques performances were change—A modeling study,” Acta Agron. Sin., vol. 28, no. 1, pp. 24–31,
higher than for the conventional regression methods due to their 2002.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
AGHIGHI et al.: ML REGRESSION TECHNIQUES FOR THE SILAGE MAIZE YIELD PREDICTION USING TIME-SERIES IMAGES 13
[14] R. Hadria et al., “Potentiality of optical and radar satellite data at high [37] C. Zhang and J. M. Kovacs, “The application of small unmanned aerial
spatio-temporal resolutions for the monitoring of irrigated wheat crops systems for precision agriculture: A review,” Precis. Agriculture, vol. 13,
in Morocco,” Int. J. Appl. Earth Observ. Geoinf., vol. 12, pp. S32–S37, no. 6, pp. 693–712, 2012.
2010. [38] D. Jiang, X. Yang, N. Clinton, and N. Wang, “An artificial neural network
[15] D. Raes, P. Steduto, T. C. Hsiao, and E. Fereres, “Aquacrop—The FAO model for estimating crop yields using remotely sensed information,” Int.
crop model to simulate yield response to water: II. Main algorithms J. Remote Sens., vol. 25, no. 9, pp. 1723–1732, 2004.
and software description,” Agronomy J., vol. 101, no. 3, pp. 438–447, [39] S. S. Panda, D. P. Ames, and S. Panigrahi, “Application of vegetation
2009. indices for agricultural crop yield prediction using neural network tech-
[16] P. Steduto, T. C. Hsiao, D. Raes, and E. Fereres, “Aquacrop—The FAO niques,” Remote Sens., vol. 2, no. 3, pp. 673–696, 2010.
crop model to simulate yield response to water: I. Concepts and under- [40] M. Rahman and B. Bala, “Modelling of jute production using artificial
lying principles,” Agronomy J., vol. 101, no. 3, pp. 426–437, 2009. neural networks,” Biosyst. Eng., vol. 105, no. 3, pp. 350–356, 2010.
[17] X. E. Pantazi, D. Moshou, T. Alexandridis, R. Whetton, and A. M. [41] P. Bose, N. K. Kasabov, L. Bruzzone, and R. N. Hartono, “Spiking neural
Mouazen, “Wheat yield prediction using machine learning and advanced networks for crop yield estimation based on spatiotemporal analysis of
sensing techniques,” Comput. Electron. Agriculture, vol. 121, pp. 57–65, image time series,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 11,
2016. pp. 6563–6573, Nov. 2016.
[18] G. Petropoulos and C. Kalaitzidis, “Multispectral vegetation indices in [42] D. Jones and E. Barnes, “Fuzzy composite programming to combine
remote sensing: An overview e book chapter,” Ecol. Model., vol. 2, remote sensing and crop models for decision support in precision crop
pp. 15–39, 2012. management,” Agricultural Syst., vol. 65, no. 3, pp. 137–158, 2000.
[19] R. B. Myneni, F. G. Hall, P. J. Sellers, and A. L. Marshak, “The inter- [43] M. S. Moran, Y. Inoue, and E. Barnes, “Opportunities and limitations
pretation of spectral vegetation indexes,” IEEE Trans. Geosci. Remote for image-based remote sensing in precision crop management,” Remote
Sens., vol. 33, no. 2, pp. 481–486, Mar. 1995. Sens. Environ., vol. 61, no. 3, pp. 319–346, 1997.
[20] S. Goswami, J. Gamon, S. Vargas, and C. Tweedie, “Relationships of [44] H. Fang, S. Liang, and G. Hoogenboom, “Integration of MODIS LAI and
NDVI, biomass, and leaf area index (LAI) for six key plant species in vegetation index products with the CSM–CERES–Maize model for corn
Barrow, Alaska,” PeerJ PrePrints, vol. 3, 2015, Art. no. e913v1. yield estimation,” Int. J. Remote Sens., vol. 32, no. 4, pp. 1039–1065,
[21] A. Kross, H. McNairn, D. Lapen, M. Sunohara, and C. Champagne, 2011.
“Assessment of rapideye vegetation indices for estimation of leaf area [45] J. Verrelst et al., “Machine learning regression algorithms for biophysical
index and biomass in corn and soybean crops,” Int. J. Appl. Earth Observ. parameter retrieval: Opportunities for Sentinel-2 and-3,” Remote Sens.
Geoinf., vol. 34, pp. 235–248, 2015. Environ., vol. 118, pp. 127–139, 2012.
[22] T. Sakamoto, A. A. Gitelson, and T. J. Arkebauer, “Near real-time pre- [46] J. Verrelst, L. Alonso, J. P. R. Caicedo, J. Moreno, and G. Camps-
diction of U.S. corn yields based on time-series MODIS data,” Remote Valls, “Gaussian process retrieval of chlorophyll content from imaging
Sens. Environ., vol. 147, pp. 219–231, 2014. spectroscopy data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote
[23] C. Wang, S. Nie, X. Xi, S. Luo, and X. Sun, “Estimating the biomass of Sens., vol. 6, no. 2, pp. 867–874, Apr. 2013.
maize with hyperspectral and LiDAR data,” Remote Sens., vol. 9, no. 1, [47] D. Ashourloo, H. Aghighi, A. A. Matkan, M. R. Mobasheri, and A. M.
2017, Art. no. 11. Rad, “An investigation into machine learning regression techniques for
[24] E. Raymond Hunt, Jr., “Relationship between woody biomass and par the leaf rust disease detection using hyperspectral measurement,” IEEE
conversion efficiency for estimating net primary production from NDVI,” J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 9, pp. 4344–
Int. J. Remote Sens., vol. 15, no. 8, pp. 1725–1729, 1994. 4351, Sep. 2016.
[25] J. Lewis, J. Rowland, and A. Nadeau, “Estimating maize production [48] D. P. Roy et al., “Landsat-8: Science and product vision for terrestrial
in Kenya using NDVI: Some statistical considerations,” Int. J. Remote global change research,” Remote Sens. Environ., vol. 145, pp. 154–172,
Sens., vol. 19, no. 13, pp. 2609–2617, 1998. 2014.
[26] J. Mika, J. Kerényi, A. Rimóczi-Paál, Á. Merza, C. Szinell, and I. Csiszár, [49] S. Lhermitte, J. Verbesselt, W. W. Verstraeten, and P. Coppin, “A compar-
“On correlation of maize and wheat yield with NDVI: Example of Hun- ison of time series similarity measures for classification and change de-
gary (1985–1998),” Adv. Space Res., vol. 30, no. 11, pp. 2399–2404, tection of ecosystem dynamics,” Remote Sens. Environ., vol. 115, no. 12,
2002. pp. 3129–3152, 2011.
[27] M. S. Mkhabela, M. S. Mkhabela, and N. N. Mashinini, “Early maize [50] E. Hamunyela, J. Verbesselt, and M. Herold, “Using spatial context
yield forecasting in the four agro-ecological regions of Swaziland using to improve early detection of deforestation from Landsat time series,”
NDVI data derived from NOAA’s-AVHRR,” Agricultural Forest Meteo- Remote Sens. Environ., vol. 172, pp. 126–138, 2016.
rol., vol. 129, no. 1, pp. 1–9, 2005. [51] Z. Zhu, S. Wang, and C. E. Woodcock, “Improvement and expansion
[28] D. K. Bolton and M. A. Friedl, “Forecasting crop yield using remotely of the Fmask algorithm: Cloud, cloud shadow, and snow detection for
sensed vegetation indices and crop phenology metrics,” Agricultural Landsats 4–7, 8, and Sentinel 2 images,” Remote Sens. Environ., vol. 159,
Forest Meteorol., vol. 173, pp. 74–84, 2013. pp. 269–277, 2015.
[29] M. Battude et al., “Estimating maize biomass and yield over large areas [52] Z. Zhu and C. E. Woodcock, “Object-based cloud and cloud shadow
using high spatial and temporal resolution Sentinel-2 like remote sensing detection in Landsat imagery,” Remote Sens. Environ., vol. 118,
data,” Remote Sens. Environ., vol. 184, pp. 668–681, 2016. pp. 83–94, 2012.
[30] R. Shrestha, L. Di, G. Y. Eugene, L. Kang, Y.-z. Shao, and Y.-q. Bai, [53] Y. Ke, J. Im, J. Lee, H. Gong, and Y. Ryu, “Characteristics of Landsat
“Regression model to estimate flood impact on corn yield using MODIS 8 OLI-derived {NDVI} by comparison with multiple satellite sensors
NDVI and USDA cropland data layer,” J. Integrative Agriculture, vol. 16, and in-situ observations,” Remote Sens. Environ., vol. 164, pp. 298–313,
no. 2, pp. 398–407, 2017. 2015.
[31] Field Crops: Usual Planting and Harvesting Dates, vol. 628, United [54] D. G. Neary, B. Orr, W. Leeuwen, S. B. Aguilar, L. Wittenberg, and Y.
States Department of Agriculture National Agricultural Statistics Ser- Carmel, “A multi-country assessment of vegetation dynamics, soil ero-
vice, Washington, DC, USA, 2010. sion, and watershed degradation after wildfires,” in Hydrology and Water
[32] T. Sakamoto, A. A. Gitelson, and T. J. Arkebauer, “MODIS-based corn Resources in Arizona and the Southwest. Mesa, AZ, USA: Arizona-
grain yield estimation model incorporating crop phenology information,” Nevada Acad. Sci., 2005.
Remote Sens. Environ., vol. 131, pp. 215–231, 2013. [55] J. Behmann, A.-K. Mahlein, T. Rumpf, C. Römer, and L. Plümer, “A
[33] C. Funk and M. E. Budde, “Phenologically-tuned MODIS NDVI-based review of advanced machine learning methods for the detection of biotic
production anomaly estimates for Zimbabwe,” Remote Sens. Environ., stress in precision crop protection,” Precis. Agriculture, vol. 16, no. 3,
vol. 113, no. 1, pp. 115–125, 2009. pp. 239–260, 2015.
[34] B. Khakural, P. Robert, and D. Huggins, “Variability of corn/soybean [56] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical
yield and soil/landscape properties across a southwestern Minnesota Machine Learning Tools and Techniques. San Mateo, CA, USA: Morgan
landscape,” Precis. Agriculture, 1999, pp. 573–579. Kaufmann, 2016.
[35] J. McKinion and H. Lemmon, “Expert systems for agriculture,” Comput. [57] H. Buxton, “Learning and understanding dynamic scene activity: A re-
Electron. Agriculture, vol. 1, no. 1, pp. 31–40, 1985. view,” Image Vis. Comput., vol. 21, no. 1, pp. 125–136, 2003.
[36] R. Doluschitz and W. Schmisseur, “Expert systems: Applications to agri- [58] C. M. Van der Walt and E. Barnard, “Data characteristics that determine
culture and farm management,” Comput. Electron. Agriculture, vol. 2, classifier performance,” Human Lang. Technol. Res. Group, Meraka
no. 3, pp. 173–182, 1988. Inst., Pretoria, South Africa, 2006.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
14 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
[59] R. E. Bellman, Adaptive Control Processes: A Guided Tour. Princeton, [86] C.-C. Chang and C.-J. Lin, “Training Nu-support vector regression: The-
NJ, USA: Princeton Univ. Press, 2015. ory and algorithms,” Neural Comput., vol. 14, no. 8, pp. 1959–1978,
[60] G. C. Cawley and N. L. Talbot, “On over-fitting in model selection and 2002.
subsequent selection bias in performance evaluation,” J. Mach. Learn. [87] J. Stamenkovic, P. Ferrazzoli, L. Guerriero, D. Tuia, J.-P. Thiran, and
Res., vol. 11, no. Jul., pp. 2079–2107, 2010. M. Borgeaud, “Crop backscatter modeling and soil moisture estimation
[61] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine with support vector regression,” in Proc. IEEE Int. Geosci. Remote Sens.
Learning, vol. 1. Cambridge, MA, USA: MIT Press, 2006. Symp., 2014, pp. 3228–3231.
[62] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” [88] V. Anandhi and R. M. Chezian, “Support vector regression to forecast
Statist. Comput., vol. 14, no. 3, pp. 199–222, 2004. the demand and supply of pulpwood,” Int. J. Future Comput. Commun.,
[63] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector vol. 2, no. 3, pp. 266–269, 2013.
machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, p. 27, 2011. [89] S. S. Keerthi and C.-J. Lin, “Asymptotic behaviors of support vector ma-
[64] J. Elith, J. R. Leathwick, and T. Hastie, “A working guide to boosted chines with Gaussian kernel,” Neural Comput., vol. 15, no. 7, pp. 1667–
regression trees,” J. Animal Ecol., vol. 77, no. 4, pp. 802–813, 2008. 1689, 2003.
[65] A. Liaw et al., “Classification and regression by random forest,” R News, [90] D. Bhatt, P. Aggarwal, P. Bhattacharya, and V. Devabhaktuni, “An en-
vol. 2, no. 3, pp. 18–22, 2002. hanced MEMS error modeling approach based on Nu-support vector
[66] I. Guyon and A. Elisseeff, “An introduction to variable and feature se- regression,” Sensors, vol. 12, no. 7, pp. 9448–9466, 2012.
lection,” J. Mach. Learn. Res., vol. 3, no. Mar., pp. 1157–1182, 2003. [91] R. E. Schapire, “The boosting approach to machine learning: An
[67] W. Yang, K. Wang, and W. Zuo, “Neighborhood component fea- overview,” in Nonlinear Estimation and Classification. New York, NY,
ture selection for high-dimensional data,” J. Comput., vol. 7, no. 1, USA: Springer, 2003, pp. 149–171.
pp. 161–168, 2012. [92] J. Friedman et al., “Additive logistic regression: A statistical view of
[68] P. Bermolen and D. Rossi, “Support vector regression for link load pre- boosting (with discussion and a rejoinder by the authors),” Ann. Statist.,
diction,” Comput. Netw., vol. 53, no. 2, pp. 191–201, 2009. vol. 28, no. 2, pp. 337–407, 2000.
[69] V. Vapnik, The Nature of Statistical Learning Theory. New York, NY, [93] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical
USA: Springer, 2013. Learning (Springer Series in Statistics), vol. 1. New York, NY, USA:
[70] R. Kohavi et al., “A study of cross-validation and bootstrap for accuracy Springer, 2001.
estimation and model selection,” in Proc. Int. Joint Conf. Artif. Intell., [94] T. K. Ho, “Random decision forests,” in Proc. 3rd Int. Conf. Document
Stanford, CA, USA, 1995, vol. 14, pp. 1137–1145. Anal. Recognit., 1995, vol. 1, pp. 278–282.
[71] B. Efron, “Estimating the error rate of a prediction rule: Improvement on [95] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical
cross-validation,” J. Amer. Statist. Assoc., vol. 78, no. 382, pp. 316–331, Learning: Data Mining, Inference, and Prediction. New York, NY, USA:
1983. Springer, 2002.
[72] U. M. Braga-Neto and E. R. Dougherty, “Is cross-validation valid for [96] R. Tibshirani, Bias, Variance and Prediction Error for Classification
small-sample microarray classification?” Bioinformatics, vol. 20, no. 3, Rules. Toronto, ON, Canada: Dept. Statist., Univ. Toronto, 1996.
pp. 374–380, 2004. [97] L. Breiman, “Out-of-bag estimation,” Citeseer, 1996.
[73] P. Refaeilzadeh, L. Tang, and H. Liu, “Cross validation,” in Encyclopedia [98] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian optimiza-
of Database Systems (EDBS). New York, NY, USA: Springer, 2009, p. 6. tion of machine learning algorithms,” in Proc. Adv. Neural Inf. Process.
[74] S. J. Gershman and D. M. Blei, “A tutorial on Bayesian nonparametric Syst., 2012, pp. 2951–2959.
models,” J. Math. Psychol., vol. 56, no. 1, pp. 1–12, 2012. [99] C. J. Willmott and K. Matsuura, “Advantages of the mean absolute error
[75] M. Ebden et al., “Gaussian processes for regression: A quick introduc- (MAE) over the root mean square error (RMSE) in assessing average
tion,” The Website of Robotics Research Group, Department of Engi- model performance,” Clim. Res., vol. 30, no. 1, pp. 79–82, 2005.
neering Science, University of Oxford, Oxford, U.K., 2008. [100] T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean
[76] C. K. Williams, “Prediction with Gaussian processes: From linear re- absolute error (MAE)?” Geosci. Model Develop. Discussions, vol. 7,
gression to linear prediction and beyond,” in Learning in Graphical pp. 1525–1534, 2014.
Models (NATO ASI Series D: Behavioural and Social Sciences), vol. 89. [101] J. Miles, “Tolerance and variance inflation factor,” in, Encyclopedia of
Dordrecht: The Netherlands: Springer, 1998, pp. 599–621. Statistics in Behavioral Science. Hoboken, NJ, USA: Wiley, 2005.
[77] F. Pérez-Cruz, S. Van Vaerenbergh, J. J. Murillo-Fuentes, M. Lázaro- [102] T. P. Ryan, Modern Regression Methods, vol. 655. Hoboken, NJ, USA:
Gredilla, and I. Santamaria, “Gaussian processes for nonlinear signal Wiley, 2008.
processing: An overview of recent advances,” IEEE Signal Process. [103] R. Koenker and K. Hallock, “Quantile regression: An introduction,” J.
Mag., vol. 30, no. 4, pp. 40–50, Jul. 2013. Econ. Perspectives, vol. 15, no. 4, pp. 43–56, 2001.
[78] H. Drucker, C. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Sup- [104] M. Labus, G. Nielsen, R. Lawrence, R. Engel, and D. Long, “Wheat yield
port vector regression machines,” in Advances in Neural Information estimates using multi-temporal NDVI satellite imagery,” Int. J. Remote
Processing Systems, vol. 9. Cambridge, MA, USA: MIT Press, 1997, Sens., vol. 23, no. 20, pp. 4169–4180, 2002.
pp. 155–161. [105] R. Taylor, “Interpretation of the correlation coefficient: A basic review,”
[79] G. Mountrakis, J. Im, and C. Ogole, “Support vector machines in remote J. Diagnostic Med. Sonography, vol. 6, no. 1, pp. 35–39, 1990.
sensing: A review,” ISPRS J. Photogramm. Remote Sens., vol. 66, no. 3, [106] A. Huete, “Soil-dependent spectral response in a developing plant
pp. 247–259, 2011. canopy,” Agronomy J., vol. 79, no. 1, pp. 61–68, 1987.
[80] D. Basak, S. Pal, and D. C. Patranabis, “Support vector regression,”
Neural Inf. Process.—Lett. Rev., vol. 11, no. 10, pp. 203–224, 2007.
[81] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote
sensing images with support vector machines,” IEEE Trans. Geosci.
Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[82] M. Pal and P. Mather, “Support vector machines for classification in Hossein Aghighi received the B.Sc. degree in nat-
remote sensing,” Int. J. Remote Sens., vol. 26, no. 5, pp. 1007–1011, ural resource engineering from Gorgan University
2005. of Agricultural Sciences and Natural Resources, the
[83] H. Aghighi, J. Trinder, Y. Tarabalka, and S. Lim, “Dynamic block-based M.Sc. degree in remote sensing and GIS from Tarbiat
parameter estimation for MRF classification of high-resolution images,” Modares University, Tehran, Iran, and the Ph.D. de-
IEEE Geosci. Remote Sens. Lett., vol. 11, no. 10, pp. 1687–1691, Oct. gree in remote sensing from The University of New
2014. South Wales, Sydney, NSW, Australia, in 2000, 2012,
[84] G. Camps-Valls, L. Bruzzone, J. L. Rojo-Alvarez, and F. Melgani, “Ro- 2015, respectively.
bust support vector regression for biophysical variable estimation from He is currently an Assistant Professor with the
remotely sensed images,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 3, Remote Sensing and GIS Research Center, Shahid
pp. 339–343, Jul. 2006. Beheshti University, Tehran, Iran. His research in-
[85] D. Tuia, J. Verrelst, L. Alonso, F. Pérez-Cruz, and G. Camps-Valls, terests include remote sensing (RS) data processing and applications (optical,
“Multioutput support vector regression for remote sensing biophysical Lidar, and radar), retrieval of vegetation biophysical and biochemical param-
parameter estimation,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 4, eters from RS data, Internet of things (IOT), and convergence and synergism
pp. 804–808, Jul. 2011. between RS, GIS, and IOT for precision farming.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
AGHIGHI et al.: ML REGRESSION TECHNIQUES FOR THE SILAGE MAIZE YIELD PREDICTION USING TIME-SERIES IMAGES 15
Mohsen Azadbakht received the B.S. and M.S. Hamid Salehi Shahrabi received the B.A. degree in
degrees in pure mathematics and remote sensing, physical geography and M.S. degree in remote sens-
and the Ph.D. degree in geomatic engineering from ing from Shahid Beheshti University, Tehran, Iran, in
Lorestan University, Shahid Beheshti University 2010 and 2013, respectively.
and the University of Melbourne, Melbourne, VIC, He is currently a Researcher with the Applied Re-
Australia, in 2004, 2009 and 2016, respectively. mote Sensing Group, Iranian Space Research Center,
He is currently an Assistant Professor with the Tehran, Iran. His research interests include crop-type
Remote Sensing and GIS Research Center, Shahid mapping, crop biophysical and biochemical parame-
Beheshti University, Tehran, Iran. His research in- ters estimation, and water productivity estimation of
terests include remote sensing data processing and agricultural fields.
applications.
Davoud Ashourloo received the B.Sc. degree in nat- Soheil Radiom received the Ph.D. degree in telecom-
ural resource engineering from Gorgan University munication & microelectronics Eng. from MICAS-
of Agricultural Sciences and Natural Resources, the ESAT, Katholieke Universiteit Leuven, Leuven,
M.Sc. degree in remote sensing and GIS from Shahid Belgium, in 2011.
Beheshti University, Tehran, Iran, and the Ph.D. de- He is currently an Assistant Professor with the
gree in remote sensing from K. N. Toosi University of Iranian Space Research Center, Tehran, Iran, where
Technology, Tehran, Iran, in 2001, 2003, and 2014, he is also the Director of the space research insti-
respectively. tute. His research interests include different scopes
His research interests include remote sensing, in- in wireless sensor networks, Internet of things, RFID
cluding crop mapping, wheat leaf rust detection, and systems, remote sensing applications development,
retrieval of vegetation biophysical and biochemical ADC, DACs, etc.
parameters.