Aghighi 2018

This article has been accepted for inclusion in a future issue of this journal.
Content is final as presented, with the exception of pagination.
IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING 1
Machine Learning Regression Techniques for the

Silage Maize Yield Prediction Using Time-Series
Images of Landsat 8 OLI
Hossein Aghighi , Mohsen Azadbakht, Davoud Ashourloo , Hamid Salehi Shahrabi, and Soheil Radiom
Abstract—Machine learning (ML) techniques have been utilized I. INTRODUCTION

for the crop monitoring and yield estimation/prediction using re-
T REGIONAL and national levels, reliable crop yield es-
motely sensed data. However, these methods have been investigated
less for yield prediction of some crops, such as silage maize, which
can be cultivated at various times in different fields of an area. In-
A timation is crucial especially for governments, planners,
and decision makers to predict the total volume of crop prod-
consistency between fields for satellite-derived normalized differ- ucts that could be imported/exported in case of shortfall or
ence vegetation index (NDVI) temporal profiles can lead to some
difficulties in yield prediction methods using time series of remotely
surplus, respectively [1]–[3]. Traditionally, regional crop pro-
sensed data. Therefore, this research has investigated silage maize duction statistics of many countries have been computed using
yield prediction based on time series of NDVI dataset derived from ground-based observations and the production reports; however,
Landsat 8 OLI. This paper employed advanced ML techniques data gathering based on ground-based field visits is costly, time-
including boosted regression tree (BRT), random forest regression consuming and mostly inefficient while raising concerns about
(RFR), support vector regression, and Gaussian process regression
(GPR) approaches and compared their performance with some
the reliability of the process that could lead to poor crop yield
proposed conventional regression methods. For this purpose, the assessments [4]. Moreover, these datasets are mostly gathered
NDVI values of all silage maize fields were averaged and integrated too late for the above-mentioned decision-making processes [5].
to produce a two-dimensional dataset for each year. The ML tech- Hence, continuous monitoring of crop status and early yield es-
niques were employed 100 times and their evaluation metrics were timation and prediction have been hot topics in recent decades.
used to evaluate their performances and also analyze their stabil-
ity. Finally, all the results of each ML technique were averaged
Remote sensing (RS) has been extensively adopted to pre-
to produce silage maize yields. The comparisons between the re- dict/estimate crop yield at regional scales as well as individual
sults of these methods indicate that the BRT technique, with the fields [6], because it can provide synoptic, repetitive, time,
average R value higher than 0.87, outperforms other ones for all and cost-effective information on the earth’s surface [7], [8].
years. It was followed by RFR with almost same performance as Two strategies have been used to obtain a quantitative relation
GPR technique. This research demonstrated that some advanced
ML approaches can predict the silage maize yield and they are
between remotely sensed data and crop yields [9]–[11]. The
less sensitive to inconsistency of NDVI time series. The results also first group of approaches incorporates RS data into agromete-
showed that RFR was the most stable method to predict the maize orological or plant-physical models [12], such as SAFY [13],
yield in 2015, while it was trained using 2013–2014 dataset. [14] and AquaCrop [15], [16]. Although these methods aim
Index Terms—Boosted regression tree (BRT), crop yield pre- to accurately model the crop developments and predict the
diction, Gaussian process regression (GPR), Landsat 8 OLI, ma- crop yield, a precise estimation of crop yield requires a large
chine learning (ML), maize, normalized difference vegetation in- number of field inputs such as water balance models, fertilizers,
dex (NDVI) time series, random forest regression (RFR), ν-support etc., which are costly and complicated to obtain from remotely
vector regression (SVR). sensed data [11], [17]. The second group of classical crop yield
estimation approaches often rely on empirical relationships,
such as simple regression and multiple regression, between
Manuscript received October 5, 2017; revised February 7, 2018 and March multi/hyperspectral calibrated data and in situ crop physical
19, 2018; accepted March 23, 2018. This work was supported by “An Automatic
System to Retrieve the Biophysical and Biochemical Parameters of Agricultural parameter observations such as above-ground biomass, water
Crops (i.e., LAI, chlorophyll, biomass, yield) Using Remote Sensing Data” content, and crop yield [18].
Project, The Applied Remote Sensing Group, Iranian Space Research Center, Spectral vegetation indices (SVIs) derived from multi/
Tehran, Iran. (Corresponding author: Hossein Aghighi.)
H. Aghighi, M. Azadbakht, and D. Ashourloo are with the Remote Sensing hyperspectral calibrated data are important remotely sensed
and GIS Research Center, Faculty of Earth Sciences, Shahid Beheshti Univer- metrics for agricultural monitoring and crop estimation [19].
sity, Tehran 1983969411, Iran (e-mail:, h_aghighi@sbu.ac.ir; m_azadbakht@ Among the SVIs, the normalized difference vegetation index
sbu.ac.ir; d_ashourloo@sbu.ac.ir).
H. Salehi Shahrabi and S. Radiom are with the Iranian Space Research Cen- (NDVI), which is a normalization of reflectance values from
ter, Tehran 1459777511, Iran (e-mail:, hamidsalehi2007@gmail.com; soheil. near infrared (NIR) and red bands, has been successfully ap-
radiom@gmail.com). plied for crop growth monitoring due to its close relationships
Color versions of one or more of the figures in this paper are available online
at http://ieeexplore.ieee.org. to some vegetation parameters (VPs) such as leaf area index
Digital Object Identifier 10.1109/JSTARS.2018.2823361 (LAI) [20], [21], green biomass [22], [23], photosynthetically
1939-1404 © 2018 IEEE. Personal use is permitted, but republication/redistribution requires IEEE permission.
See http://www.ieee.org/publications standards/publications/rights/index.html for more information.
This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination.
2 IEEE JOURNAL OF SELECTED TOPICS IN APPLIED EARTH OBSERVATIONS AND REMOTE SENSING
active radiation [24], the fraction of absorbed photosynthet-

ically active radiation, and crop yields [7]. Furthermore, the
literature review has shown that individual NDVI values or its
time-series dataset can be employed to predict the crop yields
[25]–[30]. In these works, the investigated crop of each study
area was mostly planted at the time when crops were usually
planted [31].
In the case of availability of irrigation water as well as warm
and sunny weather, few crops such as silage maize can be culti-
vated based on a nonconforming cultivation calendar in differ-
ent fields of a study area. Therefore, due to different planting
times by farmers, the maturing and harvesting dates in dif-
ferent fields are not the same. These crops are characterized
by different NDVI temporal profiles; and there is no match-
ing sequence between the NDVI time-series data of the fields
[28], [32], [33]. Therefore, the time when each silage maize
reaches the maximum yield and the peak of NDVI is not similar
for all fields in a region. These conditions present the second
category of classical crop yield prediction methods with some
difficulties, because these models are usually described by sim-
ple formulas and assume the relation is relatively constant [11];
hence, the predicted results can vary spatially and temporally
[17], [34].
In order to overcome the above limitations of the second
group of models, which establish a simple regression model,
several nonlinear algorithms such as computational intelligence Fig. 1. (a) Location of the silage maize fields in the study area, (b) in 2015,
(CI) and expert systems have been employed in agriculture to (c) in 2014, and (d) in 2013. The yellow polygons show the borders of maize
aid decision support [10], [17], [35]. The suitability, feasibil- fields in each year. In the background images, the RGB composites are red,
NIR, green.
ity, and application of expert systems for agriculture and farm
management have been discussed in many studies [35]–[37]. In
addition, the agricultural RS literature abounds with examples
of using different components of CI algorithms, e.g., artificial II. MATERIALS AND DATA PREPARATION
neural networks [38]–[41], fuzzy-set theory [42], and genetic
algorithms [43], [44] to estimate yields in various crops. There A. Studying Area
are also some relatively new machine learning (ML) techniques, The studying area is a part of Moghan fertilized plain, which is
namely Gaussian process regression (GPR), support vector re- an area of more than 350 thousand hectares located in the North
gression (SVR), boosted regression trees (BRT), and random West of Iran [see Fig. 1(a)]. Since 1975, Moghan Agro-Industrial
forest regression (RFR), which have been successfully utilized and Animal Husbandry Company (MAIAHC) has been estab-
to estimate VPs [45]–[47]. The performances of these methods lished in this area in which about 28 thousand hectares have
can also be evaluated for other problems, such as yield predic- been used for agricultural, gardening, and animal husbandry
tion of crops with inter-/intraseasonal variability in the NDVI activities. This area is located between lat. 39.465◦ N and lat.
time-series trend. Hence, this study, conducted several experi- 39.615◦ N and long. 47.548◦ E and long. 48.009◦ E. About 90%
ments to compare the performance of these ML techniques with of the company’s lands and all of the maize fields are irrigated
the classical crop yield estimation approaches. by water from the channel through Aras River along the border
This study aims to investigate the potential of these of Iran and Azerbaijan.
methods for silage maize yield prediction and compare their
performances with the second classical crop yield estimation
approaches. Furthermore, this paper proposed a method to B. Crop Yield Dataset
employ the trained ML models for times as well as places with MAIAHC collects the annual crop yields for all crops across
no matching sequences of NDVI time series. its properties. Hence, the crop yield dataset (kg) per field (CYF)
This manuscript is organized as follows. In Section II, the of more than 40 silage maize fields per year was obtained for
study area and the employed datasets including crop yield, the period of 2013–2015 from the company [see Fig. 1(b)–(d)].
RS images, and SVIs are introduced. In Section III, the data Then, the average value of the annual crop yields ACY (kg·ha−1 )
preparation methodology, experiments, and the implemented was computed for each field from that period as follows:
ML techniques are presented. In Section IV, the results are re-
ported and discussed in detail. Finally, the conclusion is given in CYF
ACY = . (1)
Section V. FieldArea
AGHIGHI et al.: ML REGRESSION TECHNIQUES FOR THE SILAGE MAIZE YIELD PREDICTION USING TIME-SERIES IMAGES 3
C. Remotely Sensed Dataset generated for each year, where each record represented a field
and columns showed the time series of NDVI values. To sum-
The remotely sensed Landsat 8 OLI images are essential for
this paper due to their capabilities to study agriculture. Landsat marize, the data arranged in each 2-D matrix are indicated by
X, where each record xi ∈ RB , xbi = [x1i , x2i , . . . , xB i ] is a field
8 OLI’s design is an advancement on Landsat sensor technol-
data composed of B features. In this paper, i = (1, 2, . . . , n)
ogy, which extends the remarkable 50 years of Landsat records
and allows collection of a significantly greater number of im- where n represents the total number of silage maize fields in
each year, b = (1, 2, . . . , B) where B is the number of fea-
ages per day with improvements in signal-to-noise ratio, as well
as in spectral and radiometric resolutions [48]. Moreover, the tures or images in the time series of each year and each value
Landsat archive and also the collected data by Landsat 8 OLI are of b represents a feature number or a time step in a given
time series. The target variable y represents the ACY value
available to download at no cost via the Internet within 24 h of
acquisition from the United States Geological Survey (USGS) (kg·ha−1 ) for each silage maize field of a given year, where
data center (http://glovis.usgs.gov/) [48]. Hence, this long his- y ∈ R1 , yi = [y1 ; y2 ; . . . ; yn ], i = (1, 2, . . . , n) where n repre-
sents the total number of silage maize fields in each year, and
torical dataset provides a unique opportunity for all countries
to develop yield prediction/estimation models at the regional each yi is attached to xbi . In this study, the values of n are 48,
scales and also individual fields. 68, and 49 for years 2013, 2014, and 2015, respectively.
Since the silage maize in Moghan plain is planted between
May 1 and end of July and harvested between August 1 and end III. METHODS AND EXPERIMENTS
of October, a time series of all the Landsat surface reflectance
The basic assumption regarding data mining and ML ap-
(SR) and climate data record products for the period of between
proaches is the ability of identification, detection, and quan-
May 1 and the end of October was obtained for 2013–2015
tification of patterns in datasets using specific algorithms [55],
from the USGS website. These images were atmospherically
[56]. This enables automatic learning which, in turn, allows the
and geometrically corrected [49], [50]. Although the metadata
extraction of important information from raw data even if the
file provided with each Landsat image indicated that the cloud
data model is unknown [55], [57]. However, the performance of
cover of the images was less than 70%, we employed a newly
ML techniques can be affected by the changes of the variable-to-
developed Fmask (Function of mask) method proposed by [51]
observation ratio [58], where a large number of input variables
to mask the cloud/shadow of each scene. According to [51],
in comparison with the number of observations (i.e., dimension-
the only difference between the results of this Fmask algo-
ality) leads to the curse of dimensionality [59] and, therefore,
rithm and that of Landsat 4–7 [52] is the place where the cirrus
overfitting [60]. For instance, the number of variables B was 12,
band is applied. Zhu et al. [51] employed the top of atmosphere
13, and 11 for years 2013, 2014, and 2015, respectively. Given
(TOA) reflectance of the cirrus band to compute the cirrus cloud
the total number of fields (TNF) (n) (see Section II-D) and B
probability (CCP) (2) and added this probability to that of the
for each year, there are about 4, 5.2, and 4.45 observations per
original Fmask algorithm. According to [51], pixels with TOA
feature for years 2013, 2014, and 2015, respectively. Moreover,
reflectance of the cirrus band larger than 0.01 should be labeled
division of observations into training/validation sets decreases
as potential cloud pixels. Hence, the new Fmask algorithm pro-
the observation-to-feature ratio value, which, in turn, makes ML
vides better cloud probabilities
methods prone to overfitting. It should also be considered that
TOA Reflectance of Cirrus Band when using ML techniques, these approaches require tuning of
CCP = . (2)
0.04 different parameters based on training data.
The cirrus band for Landsat 8 OLI images is band 10. As mentioned above, this paper aims to retrieve silage maize
The employed Fmask software is freely available at https:// yields using different ML techniques including GPR [61], SVR
github.com/prs021/fmask. [62], [63], BRT [64], and RFR [65]. Therefore, the training data
Then, NDVI derived from Landsat 8 OLI images was calcu- selection strategy, the four utilized ML techniques, implemen-
lated as tation details, and the hyperparameter tuning procedures are
ρNIR − ρRed presented in the following sections.
NDVI = (3)
ρNIR + ρRed
where ρNIR and ρRed are the SRs of the NIR and red bands of A. Subset Selection From NDVI Time-Series Data
Landsat 8 OLI, respectively [53]. In order to improve prediction performance of ML techniques
through defying the curse of dimensionality [66], we employed
D. Data Preparation a robust feature selection method, which modifies the neigh-
In order to compute the NDVI values for each silage maize borhood component analysis for regression problems [67]. In
field [see Fig. 1(b)–(d)], the NDVI values of all pixels within the this method, -insensitive loss function is utilized to make the
polygon of each silage maize field were averaged for each im- method more robust to outliers [68]. The -insensitive loss func-
age. This procedure results in a single vector for each time step tion is written as
that represents the average of NDVI values for that given time
[54]. Through extraction of NDVI values and integrating them 0 for |f (x) − y| <
L (y) = (4)
for the time-series dataset, a two-dimensional (2-D) matrix was |f (x) − y| − , other wise
Fig. 2. Feature selection results in 2013. (a) Computed loss value for the lambda values (a vector from zero to four with 200 points). (b) Selected features based
on their weight. (c) Number of fields with maximum NDVI in each feature. (d) NDVI time series of four different fields. (e) Shifted NDVI time series of four
different fields. Lambda is a regularization parameter that minimizes the generalization error of the model.
TABLE I Although the preliminary feature selection results seem

TOTAL NUMBERS OF FIELDS WHICH REACHED TO THE PEAK NDVI
VALUE AT EACH TSI
promising, the key problem with this approach is that it is not
possible to apply a trained ML technique using the selected
features of an individual year to another dataset from a differ-
ent year. Therefore, it can be concluded that the model is not
globally applicable because of dissimilar number of images in
different time series across distinct areas [see Fig. 2(c)] and also
the mismatching of the NDVI time-series data of different fields
Here, y represents the observation vector for matrix x and f (x) [see Fig. 2(d)]. To overcome this limitation, we select a feature
is the predicted response vector. According to this equation, the set comprised of a TSI with the maximum NDVI value as the
loss (L (y)) is zero if |f (x) − y| is less than ; otherwise, L (y) central TSI (t) and the NDVI values of its neighboring TSIs as
is computed by |f (x) − y| − . The value of depends on the appropriate features for the silage maize yield prediction [see
data and it defines a tube of radius around y [69]. The opti- Fig. 2(e)]. Our experiments showed that such a feature set that
mum regularization parameter of this method was tuned using spans within the interval [t − 3, t + 2] results in better yield
fivefold cross validation (CV) [see Fig. 2(a)].The feature weight- estimation. Therefore, according to Section III-A, the value of
ing algorithm uses gradient ascent technique and maximizes the B for this study is 6, and the b value for this TSI being 4 for all
leave-one-out accuracy regarding the regularization term. the observations employed in this study [see Fig. 2(e)]. Given
Fig. 2(b) shows the results of the performed feature selection B = 6, the number of observations per feature increased from
method. As this figure shows, the selected features can be di- 4, 5.2, and 4.45 (see Section III) to about 8, 11.3, and 8.17 for
vided into two major groups: 1) features with higher weights of years 2013, 2014, and 2015, respectively.
time-series index (TSI) between 4 and 7, and 2) features with
lower weights of TSI values of 2, 3, and 8. TSI is a set of time-
B. Training Data Selection Strategy
based values that indicate when each image was collected over
the studying area. There is a notable similarity between the TSIs Theoretically, the input to the automatic learning algorithm
of the first group in Fig. 2(b) and those of the peak values of is a set of features xbi and their corresponding target responses
each graph in Fig. 2(c). Each value on the graphs in Fig. 2(c) (yi ). Here, b represents the feature number with values from
represents the TNF that reached to the peak of NDVI values at 1 to 6, because of B = 6 (see Sections III-A and II-D). Each
that given TSI for each year. The summation of TNF across all dataset was partitioned into training (N samples) and testing
(b)
years is presented in Table I. A comparison between Fig. 2(b), sets (M = n − N samples) where the training set {xi }N i=1
(c), and Table I indicates the linkage between the weight pat- was used to train the learners, whereas the test set was used to
terns and their corresponding TNF, at each TSI between 4 and estimate the error rate of the predictive model together with the
7. For instance, the maximum TNF in Table I belongs to the TSI provision of prediction for the test observations. The stability
number 5, which also has the highest weight in Fig. 2(b). of the model and the accuracy and reliability of the estimated
results are directly related to the selection of training set compo- GPR first utilizes the training dataset to obtain a prior GPR to
nents with varying features and targets, the size of training set, constrain the possible forms of the unknown function and then
and the training set selection strategy [70]. Although several updates the prior to generate a posterior GPR [47]
training set selection techniques such as holdout method also
N

known as test sample estimation, CV, leave one out, and stratifi-
cation approaches have been proposed, most of them suffer from ŷ = f (x) = αi K(xi , xt ) + ω0 (5)
i=1
some limitations because of their theoretical assumptions. For
instance, the holdout methods cannot efficiently use the data, where ω0 is the bias term, αi ∈ R is the weight assigned to each
because the dataset size is finite and this method simply makes (b)
{xi }N i=1 , and K is a function that evaluates similarity between
two partitions of the original dataset. Therefore if a given class (b)
is overrepresented in one subset, it might be underrepresented in the test set xt and the training set xi , i = 1, 2, . . . , N [46].
the other subset [70]. The leave-one-out approach, which splits GPR frequently requires to fully tune model hyperparameters,
observations into two parts including an individual validation regularization terms, and optimization parameters related to the
observation and the training set of size n − 1, is identified as an covariance or kernel functions [46].
unbiased strategy; however, this method is expensive and also In this paper, we utilized a Gaussian-likelihood function,
leads to unreliable estimates due to its high variance [70], [71]. which is often used for regression [61], [77]. The employed
In the stratified CV method, the proportions of classes in each GPR software is freely available at http://www.gaussianprocess.
fold are approximately equal to the proportions of labels in the org/gpml/ [61] .
original dataset [72]. A basic form of CV is k-fold CV, which
first partitions data into k equally or nearly equally sized folds, D. Support Vector Regression
then a fold is left out and used as the test set in one iteration and Support vector machine is a supervised nonparametric sta-
other folds that contain totally n( k −1
k ) samples build the train- tistical learning technique [78], which has been widely applied
ing set [73]. With regards to the advantages and drawbacks of in the RS field due to its ability to successfully handle small
these techniques, k-fold CV strategy is employed in this paper. training sets [79]. SVM can be categorized into support vector
In this paper, all the n samples were randomly shuffled first to classification and SVR, where SVR is the most common form
remove any inherent pattern embedded in the dataset, and then of SVMs [80]. SVMs have been employed in numerous studies
they were split into five equally sized (or almost equally sized) such as classification tasks [81]–[83] as well as to solve regres-
without replacement groups [73]. Each of the above-mentioned sion problems [47], [84]. Many efforts have been made toward
ML techniques was then trained using four of the folds (n( k −1 k ) the application of these algorithms in the RS community such as
samples) and the fifth fold was reserved for validation and es- for estimating and monitoring biochemical and the biophysical
timation of prediction error. The estimated yields for each test parameters of plants [79], [84], [85], which was well reviewed
fold were considered as the final results of that fold; then, the by [79]; hence, we do not detail them here.
estimated yields from k iterations were integrated to produce the The most widely used versions of SVR are ε-SVR and ν-SVR,
total results. The k errors estimated from the validation steps which were categorized regarding their regularization parame-
were averaged to produce a single estimation error. ters, i.e., ε and ν [86]. The ν-SVR and also its comparison
In this study, a repeated fivefold CV strategy was utilized; for with ε-SVR were discussed in detailed by [63]. According to
this reason, the data were reshuffled 100 times and the learning [47] and [86], ν-SVR has a more meaningful interpretation than
and validation procedures were repeated after each round. The ε-SVR; moreover, the work in [87] stated that regardless of
100 results were again averaged to produce a single crop yield the limitation imposed by the number of training samples, ν-
estimation. SVR has good generalization properties; hence, this study only
utilized ν-SVR using a radial basis function (RBF) kernel to
C. Gaussian Process Regression approximate the nonlinear regression function, as in [47], [87],
and [88]. RBF was chosen here due to its simplicity, and fewer
GPR as a famous kernel-based ML approach implements
implementation difficulties, faster training rate, and ability to
Gaussian processes (GPs) for nonlinear regression purposes. A
deliver an acceptable accuracy [47], [87]–[90].
GP is a collection of random variables with the property that any (b) (b)
finite number of their subsets have a joint Gaussian distribution Given a set of {(x1 , yi ), . . . , (xn , yn )} where each xi ∈
B 1
[61]. GPR was formulated and interpreted in a Bayesian con- R is an input and yi ∈ R is a target output, the problem of
text (5); however, it is categorized as a nonparametric Bayesian ν-SVR is minimization of the following equation [86]:
approach [74] or as a less parametric tool [75]. GPR is not de- N

signed to fit the required parameters of a selected model but 1 T 1 ∗
ww + C ν + (ξi + ξi )
instead aims to capture many different relations that show how 2 N i=1
all the training inputs and target variables are correlated [76]. As T
stated in Section III-B, in order to learn and generate the final w Φ (xi ) + b − yi ≤ +ξi
function model also known as posterior GPR [47], this paper
yi − wT Φ (xi ) + b ≤ +ξi∗
utilized a fivefold CV technique to generate a training dataset
(b)
{xi }N N ξi , ξi∗ ≥ 0, i = 1, . . . , N, ≥ 0 (6)
i=1 and also output dataset {yi }i=1 . In each CV iteration,
where 0 ≤ ν ≤ 1, C is the regularization parameter, xi denotes learners; hence, random forest sensibly averages the total trees
the training vectors, Φ is a function that mapped xt into a higher grown to make a prediction model (fˆrQf r )
dimensional space, and ξi and ξi∗ are slack variables representing
Q
upper and lower constraint on the outputs of the system. This 1
fˆrQf r = T (y, Θq ) (9)
paper employed LibSVM software, which is freely available at Q q =1
https://www.csie.ntu.edu.tw/˜cjlin/libsvm/ [63]. As mentioned
in Section III-B, fivefold CV was employed to estimate C, γ, where Q represents the total number of trees, T (y, Θq ) indicates
and ν using training dataset and the ε was set to 0.1 [86]. the qth tree, y is the target point that requires prediction, and
Θq is a random vector generated for the qth tree independent of
E. Boosted Regression Trees previous vectors.
In this paper, the generalization errors of the model were es-
The term boosted in BRT technique refers to a general and
timated using out-of-bag estimates [96] as an efficient method
provably effective method in ML, which can combine a large
[97]. Moreover, the Bayesian optimization algorithm [98] was
number of relatively simple tree models adaptively to produce an
employed for identifying some optimal predictive model param-
accurate and optimized prediction rule [64], [91]. This method
eters including minimum leaf size and the number of predictors
was developed based on the idea that finding and linearly com-
to use at each node.
bining many rough rules is much easier than finding a single and
highly accurate prediction model [91]. Although BRT model dif-
G. Evaluation
fers from traditional regression methods, it is interpreted as an
advanced form of regression in the statistical community [92]. In this study, the root-mean-square error (RMSE), which pro-
Hence, we employed BRT as an advanced ML technique in this vides a general purpose error metric of numerical prediction, the
paper, where residuals were considered as the loss function [93]. mean absolute error (MAE), which describes average model-
As mentioned in Section III-B, fivefold CV was employed performance error, and correlation coefficient (R) were calcu-
to iteratively fit the model to the training data. For this reason, lated to compare the performances of GPR, SVR, BRT, and RFR
first the initial values were assigned to the constant model that methodologies in maize yield prediction. The RMSE, MAE, and
also included an oversized tree; and then at every step, a boost- R are calculated by using (10)–(12), respectively [99], [100]
ing technique adds a new tree under a certain growth property

that steps down the gradient of the loss function, which focused 1 M
RMSE = (yi − ŷi )2 (10)
on the residuals. In the training phase, three hyperparameters M i=1
including interaction depth, learning rate, and number of trees
were determined using Bayesian optimization method [98]. In where M is the number of test samples, yi indicates the actual
the prediction procedure, each successively selected tree gives maize yield (AMY) data, and ŷi is the predicted maize yield
extra weights to incorrectly predicted points by earlier predic- (PMY)
tors. A BRT model is defined as follows:

1 M
MAE =
K
|yi − ŷi | (11)
fˆBRT
A
= T (y, Θk ) (7) M i=1
k =1

M
where T (y, Θk ) is kth tree, and at each step of the procedure, (yi − y i ) ŷi − ŷ i
R =
i=1 (12)
the following cost function should be solved: N 2
N 2
i=1 (yi − y i ) i=1 ŷi − ŷ i
N

Θ̂m = argmin L(yi , fk −1 (yi ) + T (x, Θk )) (8) where y i and ŷ i are the average of observed and predicted data of
Θk i=1 the model, respectively. The square of R is called the coefficient
of determination that can be computed using (13)
where fk −1 (y) indicates the current model, Θk = {Rj k , γj k }j1k

2
as a region set of Rj k and constants γj k in the kth tree, j M
(yi − y i ) ŷi − ŷ i
i=1
represents a metaparameter, and Rj k refers to the jth disjoint R 2 =
N 2 .
2
N
(13)
nonoverlapping region in the kth tree, yi is the response associ- i=1 (yi − y i ) i=1 ŷi − ŷ i
ated with xi , and L is the loss function.
IV. RESULTS AND DISCUSSION
F. Random Forest Regression
As mentioned in Section III-B, the fivefold CV was employed
Random forests [94] as an ensemble learning method is a to train/validate the GPR, SVR, BRT, and RFR methodologies.
particular implementation of bootstrap aggregation (bagging) For each year, the R, RMSE and MAE of each method for 100
that attempts to mitigate the problems of high variance and runs are compared in Table II, whereas the distribution of those
high bias of the final prediction model through reduction of metrics for each individual experiment is illustrated as box plots
the correlation between estimators (trees) [95]. To this end, in Fig. 3, in order to summarize and illustrate the distribution
each learner randomly samples a subset of n input features to of data based on their minimum, first quartile, median, third
construct splits for each tree. This results in different models for quartile, and maximum.
TABLE II
AVERAGE (μ) AND STANDARD DEVIATION (σ) OF R, RMSE AND MAE VALUES OF 100 EXPERIMENTS
Fig. 3. Box plots of MAE, RMSE, and R of the estimated maize yield using different ML techniques for years 2013–2015.
The overall patterns of the R metric were numerically ana- results, but their reported statistics should be interpreted with
lyzed and reported in Table III. Moreover, the PMY of all the the caution as they can be misleading.
experiments were averaged for each method to produce a single As Fig. 3 shows, 25% of the experiment points (e.g., R values)
estimation, which is illustrated as Fig. 4. are distributed under the central blue rectangle of each box; these
Fig. 3 is a combination of box plots and beeswarm plots that points are categorized as the first quartile and can be defined
illustrate the probability distribution of the computed R, RMSE, as the middle points between the smallest metrics values of
and MAE values for all experiments by plotting their quartiles each method and the first quartiles values (see Table III). By
against each other, as well as the individual position of each comparing the first quartile values and the minimum values in
computed metric. The full range of the computed R, RMSE and Table III, it can generally be concluded that the first quartile
MAE values of each box plot is represented by the horizontal of RFR and SVR techniques is more compact than those from
black lines at the end of the dashed whiskers (see Fig. 3), and other techniques and the R values of GPR technique have a
the outliers are illustrated by red crosses; however, we are aware wider spread. However, the average of minimum value and the
that the outliers are not necessarily bad data points. Hence, first quartile value of the BRT technique are higher than those
they should not be automatically removed from the experiments from other methods. Therefore, it can be concluded that in the
TABLE III
THE MINIMUM, 1ST QUARTILE, MEDIAN, 3RD QUARTILE AND THE MAXIMUM VALUES OF R VALUES OF 100 EXPERIMENTS USING DIFFERENT ML TECHNIQUES
AND THE SUMMATION OF ALL VALUES IN EACH COLUMN
Fig. 4. Scatter plots of all the AMY versus PMY using different ML techniques for years 2013–2015.
TABLE IV
SOT VALUES OF FIRST, SECOND, AND THIRD QUARTILES OF R 2 VALUES OF 100 EXPERIMENTS
TABLE V
DIFFERENCE BETWEEN THE UPPER AND LOWER QUARTILES VALUES (INTERQUARTILE RANGE), THE DIFFERENCE BETWEEN THE THIRD AND THE FIRST
QUARTILES AND THE DIFFERENCE BETWEEN THE MAXIMUM AND THE THIRD QUARTILES OF 100 EXPERIMENTS
TABLE VI
DISTANCE BETWEEN MEDIAN AND MINIMUM OF SECOND QUARTILES (L_NOTCH) AND THE MAXIMUM VALUE OF THIRD QUARTILES
(U_NOTCH) OF R VALUES OF 100 EXPERIMENTS
worst cases, the performances of BRT technique were better the AMY values in 50% of the experiments and also the smaller
than the others. values can be interpreted as demonstrating the higher stability
In regards to Table IV, at least in 25% of the experiments, of the regression model [see Fig. 3(a)–(c)].
the BRT technique can predict maize yield better than the other Regarding to Table III, the interquartile ranges of RFR and
techniques on its worst performance, because the summation of SVR are more compressed than the others, emerging as the most
tolerance (SOT) of R2 values (14) for all years (SOT = 1.01) stable methods. Moreover, the summation of the first and third
is lower than for the others. Moreover, based on the computed quartile values across the three years for each technique (see
SOT values of all the techniques (see Table IV), the PMY using Table III) also shows that SVR outperforms other techniques in
SVR (SOT = 0.18) was closer together than others in 50% of terms of the interquartile range (see the last row of Table III).
experiments The estimated confidence intervals for each median are rep-
resented by a notch close to the median of each box plot [103].
P
As Fig. 3 shows, the notches are not similar in any of the box
SOT = 1 − R2 (14) plots as well as between the second and third quartiles of some
i=1
box plots. In these cases, the distance between the median and
where P is the number of years for which their datasets were the first quartile is not same as the distance between the median
employed (P = 3). Although 1 − R2 was called the tolerance and the third quartile (see Table VI). Regarding this table, it
by [101] and it was defined as “the proportion of variation that can be concluded that the distribution of the R values in the
is not explained by the regression model” [102], we introduced second and third quartiles is skewed for all of the experiments.
(14) to compare the performance of the regression models in the In Table VI, the smaller U_Notch value than the L_Notch value
time-series problem. of each technique shows that the bottom whisker is much longer
The central blue rectangle of each box plot in Fig. 3 spans and the median is closer to larger R values. As can be seen, the
the first to the third quartiles. These rectangles demonstrate that R values of all techniques except BRT in 2015 are skewed to
50% of metrics values are between the values of first and third the left side; hence, the mean of almost all of the techniques is
quartiles (see Table III). The difference between the third and usually not in the middle but is located somewhere closer to the
first quartiles is referred to as interquartile range. The horizontal third quartiles.
bar in the middle of each box plot is called the median, which The distance between the minimum and maximum of R val-
shows that the accuracies of half of the experiments results ues (full range of variation) of all the experiments (see Fig. 3
are higher than the median and half are worse. The median and Table III) is presented in Table V. Based on this table, the
of R values for each technique and their SOT are represented BRT boxplots are comparatively shorter in range, which indi-
in Tables III and IV, respectively. The interquartile ranges of cate that the overall performance of this method over different
R2 values for each technique were computed and presented in runs has higher level of agreements with each other. Moreover,
Table V. These values indicate that the PMY values are close to the boxplots of GPR are comparatively taller than the others,
where particularly its R values across all years and those of
TABLE VII
t-TEST FOR THE COEFFICIENT OF DETERMINATION (R) MEASURES USING THE UPPER AND LOWER QUARTILES OF R VALUES OF 100 EXPERIMENTS (H = 1
REJECTS THE NULL HYPOTHESIS IN 95% CONDENCE LEVEL)
Here, NaN stands for not a number.
TABLE VIII Regarding the random sampling methods, although we inves-

R AND RMSE OF THE PREDICTED SILAGE MAIZE YIELD USING THE
MAXIMUM AND SUMMATION OF THE SILAGE MAIZE NDVI PROFILES DURING
tigated to avoid overfitting, there is no guarantee for avoidance
THE GROWING SEASON AND THE BEST INDIVIDUAL NDVI of sampling bias. These samples should be used to train the
DATASET FOR YEARS 2013–2015 ML techniques, which can lead to over- or underestimation of
the parameters and consequently overfitting or underestimating
results. Therefore, as previously stated, each method was itera-
tively trained, applied, and evaluated 100 times [100]. In order
to reduce the sampling bias impacts of different runs, we aver-
aged the PMY values of all iterations and plotted them against
the AMY values in Fig. 4, where X-axis corresponds to the
AMY values and Y-axis represents the PMY values. Moreover,
BRT and RFR in 2015 and of SVR in 2013 show heavier tailed a regression fit line was added to each scatter plot to model the
populations. The variance of R values in the heavier tailed pop- relationship between the AMY and PMY values. The regression
ulation cases is larger; therefore, their sample average might equation of each model and its R value were also added to their
overestimate the observation average. corresponding plots.
The pattern of the whole distribution of R values in each As Fig. 4 shows, there is a linear pattern in all the plots,
boxplot should be considered; therefore, the difference between indicating the two sets of data are correlated. The R value
the maximum and third quartile values of all experiments is of each plot shows how appropriately the model can predict
presented in Table V. The summation of each column shows that the yield, whereas a higher R2 value means that the strength of
generally the GPR results have a higher level of agreement with the relationship between AMY and PMY values is compelling.
each other; and the BRT technique results are more compact. The R values of the BRT model are higher than 0.74 for all
However, regarding this table and also the maximum R values years, with the average of R = 0.87 being more than those from
and third quartile values (see Table III), the BRT technique the other techniques. The minimum R values, however, belong
performances in the last quartile are far better than those of the to SVR. After the BRT technique, the average of R values of
other techniques. RFR is almost the same as that of GPR and is more than that of
From the overall patterns of RMSE [see Fig. 3(g)–(i)] and SVR, where its R values change between 0.64 and 0.8. There-
MAE [see Fig. 3(d)–(f)] metrics and the minimum, maximum, fore, all the above discussions confirmed that the BRT method
mean, and standard deviation values (see Tables II and III), the had relatively better performance than the other techniques for
average error (plus or minus) of PMY values generated by SVR silage maize prediction.
is larger than those obtained from the other ML techniques, As mentioned in Section I, there have been some works that
revealing its inferior performance compared to the others for employed the second classical crop yield estimation approaches
fitting to the observed data. [27], [33], [104]. According to them, all the NDVI dataset of the
Regarding the previous discussions and also the above- last two months of growing season as well as the maximum and
mentioned metrics, which are reported in Table III, the PMY summation of NDVI values time series of the growing season
values of BRT technique are considerably closer to the AMY can be successfully adopted to estimate the crop yields. There-
values than other methods [see Fig. 3(a)–(c)]. However, the fore, this paper utilized these parameters to predict the silage
t-Test approach was employed to statistically test the differ- maize yield (see Table VIII). According to the R and RMSE
ence between the obtained R and RMSE values of all the values of this table, the maximum NDVI values of each field
experiments (see Table VII). The reported H = 1 rejects the during the growing season show a stronger relationship with the
null hypothesis with 5% confidence level. The null hypoth- silage maize yield than the others. The scatter plots are illus-
esis states that the both data come from a distribution with trated in Fig. 5. The last column of this table illustrates the Julian
equal means and unknown variance. The alternative hypoth- days of an individual NDVI dataset, which results in the highest
esis states that the population distribution of both data does correlation. Although Fig. 5 shows a linear pattern in most of
not have equal mean. As can be seen in Table VII, we failed the plots, which indicate the two sets of data are correlated, their
to reject the null hypothesis for BRT and RFR techniques in correlation coefficients are mostly lower than those of the ML
2013. techniques, except for SVR. Moreover, the t-Test failed to reject
Fig. 5. Scatter plots of all the AMY versus PMY using the maximum and summation of the silage maizes NDVI profile during the growing season and the best
individual NDVI dataset (see Table VIII) for years 2013–2015.
the null hypothesis for best Julian day results with the maximum TABLE IX
SUMMARY OF STATISTICS OF 100 EXPERIMENTS ON THE PERFORMANCES OF
and summation of NDVIs for all years. ML TECHNIQUES TO PREDICT THE MAIZE YIELD IN 2015, WHILE THEY
The above results confirmed that the ML techniques can be TRAINED USING THE DATASETS OF 2013–2014
employed to estimate the maize yield of a large studying area
in a given year. In the last part of this study, we focused on
predicting the maize yield in a given growing season, while the
ML techniques being trained using the dataset of other seasons.
In this situation, the performance of the ML techniques in yield
prediction can be evaluated in new seasons and under new con-
ditions. Therefore, we utilized fivefold CV (see Section III-B) to
train all the ML techniques using the integrated dataset of 2013–
2014; then predict the maize yield in 2015. These experiments MAE, a combination of both metrics should be used to assess
were repeated 100 times [100]. model performance. Therefore, for each ML method, the min-
Based on correlation coefficient definition presented in [105], imum, average (μ), maximum, and standard deviation (σ) of
the R value represents the strength of correlation between the RMSE and MAE for 100 runs are reported in Table IX.
dependent and independent variables, and R2 value indicates As Table IX shows, although the average of RMSE and MAE
how much of the variance between those two variables can be of GPR model is less than for other techniques, their standard
described by the linear fit. With regard to [99], although most deviations are more than for the others. In contrast, the lowest
of the works in the field of geoscience recognized RMSE as a standard deviation values were obtained by RFR method while
standard metric to model errors and a few of them employed their average of RMSE and MAE is more than for GPR. Due
to these results, it can be concluded that the results of 100 runs ability to deal with high-dimensional data of complex distribu-
of RFR tend to be close to their mean values and GPR results tions as well as inconsistency of NDVI time series. Specifically,
are spread out over a wider range of values in comparison to the BRT technique, where its average R value was higher than
other techniques. Moreover, RFR can be considered as a stable 0.87 for all the years, performed the best for PMY. BRT was
method in 100 runs and GPR was more accurate at predicting followed by RFR with almost the same performances as GPR
maize yield. technique and better than SVR method. Moreover, in the cases
A comparison between Tables IX and II shows that employing that ML trained using a dataset of two years (2013–2014) to
the trained ML techniques using 2013–2014 dataset to predict predict the maize yield of another year (2015), the average of
the silage maize yield in 2015 results in increasing the RMSE RMSE and MAE of GPR method in 100 runs was lower than for
and MAE. One of the limitations with these results is that the other methods and the RFR technique was the most stable, hav-
number of images (B) in the time series of 2015 (B = 11) was ing the smallest standard deviations. However, in future studies,
less than 2013 and 2014 with 13 and 12 images, respectively, these methods should also be tested with regard to different crop
where one of them in 2013 and two images in 2015 were re- types in different geographical regions as well as on different
moved due to the presence of cloud covers over the studying dataset sizes.
area. One of the removed images in 2015 was acquired at about
the peak of the growing season of silage maize. As it was men- ACKNOWLEDGMENT
tioned, the feature within the interval [t − 3, t + 2] results in
better maize yield estimation (see Fig. 2 and Table I); therefore, The authors would like to thank the editor and two anony-
losing data in that interval might decrease the performance of mous reviewers for their careful reading of this manuscript as
the ML techniques. This can be also seen in the case of using the well as their valuable and constructive comments and sugges-
trained ML method in a given year to predict the silage maize tions that have helped to improve this manuscript. The authors
yield in that year (see Table II), where the amount of B value gratefully acknowledge USGS, EROS Science Processing Ar-
may have contributed to higher R and lower RMSE values of chitecture, for providing the Landsat 8 OLI images. The authors
2014 than 2013 and 2015. also acknowledge Moghan Agro-Industrial and Animal Hus-
With regards to the results of this paper, NDVI values and their bandry Company for providing silage maize yield dataset.
time series can be considered as an appropriate input for yield
prediction of crops with different cultivated dates. However, REFERENCES
due to the influence of cloud cover and cloud shadow on NDVI
[1] T. Horie, M. Yajima, and H. Nakagawa, “Yield forecasting,” Agricultural
and other VIs derived from the optical remotely sensed data, Syst., vol. 40, no. 1–3, pp. 211–236, 1992.
the multisource cloud free optical satellite imagery (MSCFOSI) [2] V. Barnett and J. Riley, “Statistics for environmental change,” Exp. Agri-
can be used to gather more information than a single satellite. culture, vol. 31, no. 2, pp. 117–130, 1995.
[3] M. K. van Ittersum, K. G. Cassman, P. Grassini, J. Wolf, P. Tittonell,
Therefore, VIs derived from MSCFOSI imagery could result in and Z. Hochman, “Yield gap analysis with local to global relevance—A
more accurate yield estimation of the desired crop. review,” Field Crops Res., vol. 143, pp. 4–17, 2013.
LAI of maize increased rapidly from 40 to 60 days after sow- [4] C. Reynolds, M. Yitayew, D. Slack, C. Hutchinson, A. Huete, and M.
Petersen, “Estimating crop yields and production by integrating the
ing, where higher LAI levels could result in saturation of NDVI; FAO crop specific water balance model with real-time satellite data
hence, other indices such as SAVI and SAVI2 could be investi- and ground-based ancillary data,” Int. J. Remote Sens., vol. 21, no. 18,
gated in future works [106]. These indices might show stronger pp. 3487–3508, 2000.
[5] D. M. Johnson, “An assessment of pre-and within-season remotely
correlations with biomass and LAI than NDVI, thereby leading sensed variables for forecasting corn and soybean yields in the United
to more stable models. In addition, it should be noticed that States,” Remote Sens. Environ., vol. 141, pp. 116–128, 2014.
these methods required the crop maps as a prerequisite; hence, [6] D. B. Lobell, D. Thau, C. Seifert, E. Engle, and B. Little, “A scal-
able satellite-based crop yield mapper,” Remote Sens. Environ., vol. 164,
future research could investigate new approaches that can mask pp. 324–333, 2015.
the maize fields in advance and then predict the crop yields. [7] L. Leroux, C. Baron, B. Zoungrana, S. B. Traoré, D. L. Seen, and A.
Moreover, this paper did not investigated the effect of other Bégué, “Crop monitoring using vegetation and thermal indices for yield
estimates: Case study of a rainfed cereal in semi-arid West Africa,” IEEE
environmental parameters (e.g., soil texture, soil moisture, cli- J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 1, pp. 347–
matic parameters, and LAI) on crop yield predictions; hence, the 362, Jan. 2016.
future works can also incorporate the environmental parameters [8] O. Vergara-Dı́az et al., “A novel remote sensing approach for predic-
tion of maize yield under different conditions of nitrogen fertilization,”
in an assimilation-crop modeling framework. Frontiers Plant Sci., vol. 7, p. 666, 2016.
[9] O. D. Sirotenko, “Crop modeling,” Agronomy J., vol. 93, no. 3, pp. 650–
653, 2001.
V. CONCLUSION [10] P. J. Pinter, Jr., et al., “Remote sensing for crop management,” Pho-
togramm. Eng. Remote Sens., vol. 69, no. 6, pp. 647–664, 2003.
This study successfully incorporates BRT, RFR, SVR, and [11] C. Ferencz et al., “Crop yield estimation by satellite remote sensing,”
GPR ML algorithms in silage maize yield prediction process Int. J. Remote Sens., vol. 25, no. 20, pp. 4113–4149, 2004.
[12] G. Asrar, M. Fuchs, E. Kanemasu, and J. Hatfield, “Estimating absorbed
based on NDVI time-series dataset of different fields in a part of photosynthetic radiation and leaf area index from spectral reflectance in
Moghan fertilized plain, Ardabil, Iran. The utilized time series of wheat,” Agronomy J., vol. 76, no. 2, pp. 300–306, 1984.
NDVI values was derived from Landsat 8 OLI images. Regard- [13] Z.-Q. Jin, D.-K. Ge, and C.-L. Shi, “Several strategies of food crop
production in the Northeast China plain for adaptation to global climate
ing the presented results, the ML techniques performances were change—A modeling study,” Acta Agron. Sin., vol. 28, no. 1, pp. 24–31,
higher than for the conventional regression methods due to their 2002.
[14] R. Hadria et al., “Potentiality of optical and radar satellite data at high [37] C. Zhang and J. M. Kovacs, “The application of small unmanned aerial
spatio-temporal resolutions for the monitoring of irrigated wheat crops systems for precision agriculture: A review,” Precis. Agriculture, vol. 13,
in Morocco,” Int. J. Appl. Earth Observ. Geoinf., vol. 12, pp. S32–S37, no. 6, pp. 693–712, 2012.
2010. [38] D. Jiang, X. Yang, N. Clinton, and N. Wang, “An artificial neural network
[15] D. Raes, P. Steduto, T. C. Hsiao, and E. Fereres, “Aquacrop—The FAO model for estimating crop yields using remotely sensed information,” Int.
crop model to simulate yield response to water: II. Main algorithms J. Remote Sens., vol. 25, no. 9, pp. 1723–1732, 2004.
and software description,” Agronomy J., vol. 101, no. 3, pp. 438–447, [39] S. S. Panda, D. P. Ames, and S. Panigrahi, “Application of vegetation
2009. indices for agricultural crop yield prediction using neural network tech-
[16] P. Steduto, T. C. Hsiao, D. Raes, and E. Fereres, “Aquacrop—The FAO niques,” Remote Sens., vol. 2, no. 3, pp. 673–696, 2010.
crop model to simulate yield response to water: I. Concepts and under- [40] M. Rahman and B. Bala, “Modelling of jute production using artificial
lying principles,” Agronomy J., vol. 101, no. 3, pp. 426–437, 2009. neural networks,” Biosyst. Eng., vol. 105, no. 3, pp. 350–356, 2010.
[17] X. E. Pantazi, D. Moshou, T. Alexandridis, R. Whetton, and A. M. [41] P. Bose, N. K. Kasabov, L. Bruzzone, and R. N. Hartono, “Spiking neural
Mouazen, “Wheat yield prediction using machine learning and advanced networks for crop yield estimation based on spatiotemporal analysis of
sensing techniques,” Comput. Electron. Agriculture, vol. 121, pp. 57–65, image time series,” IEEE Trans. Geosci. Remote Sens., vol. 54, no. 11,
2016. pp. 6563–6573, Nov. 2016.
[18] G. Petropoulos and C. Kalaitzidis, “Multispectral vegetation indices in [42] D. Jones and E. Barnes, “Fuzzy composite programming to combine
remote sensing: An overview e book chapter,” Ecol. Model., vol. 2, remote sensing and crop models for decision support in precision crop
pp. 15–39, 2012. management,” Agricultural Syst., vol. 65, no. 3, pp. 137–158, 2000.
[19] R. B. Myneni, F. G. Hall, P. J. Sellers, and A. L. Marshak, “The inter- [43] M. S. Moran, Y. Inoue, and E. Barnes, “Opportunities and limitations
pretation of spectral vegetation indexes,” IEEE Trans. Geosci. Remote for image-based remote sensing in precision crop management,” Remote
Sens., vol. 33, no. 2, pp. 481–486, Mar. 1995. Sens. Environ., vol. 61, no. 3, pp. 319–346, 1997.
[20] S. Goswami, J. Gamon, S. Vargas, and C. Tweedie, “Relationships of [44] H. Fang, S. Liang, and G. Hoogenboom, “Integration of MODIS LAI and
NDVI, biomass, and leaf area index (LAI) for six key plant species in vegetation index products with the CSM–CERES–Maize model for corn
Barrow, Alaska,” PeerJ PrePrints, vol. 3, 2015, Art. no. e913v1. yield estimation,” Int. J. Remote Sens., vol. 32, no. 4, pp. 1039–1065,
[21] A. Kross, H. McNairn, D. Lapen, M. Sunohara, and C. Champagne, 2011.
“Assessment of rapideye vegetation indices for estimation of leaf area [45] J. Verrelst et al., “Machine learning regression algorithms for biophysical
index and biomass in corn and soybean crops,” Int. J. Appl. Earth Observ. parameter retrieval: Opportunities for Sentinel-2 and-3,” Remote Sens.
Geoinf., vol. 34, pp. 235–248, 2015. Environ., vol. 118, pp. 127–139, 2012.
[22] T. Sakamoto, A. A. Gitelson, and T. J. Arkebauer, “Near real-time pre- [46] J. Verrelst, L. Alonso, J. P. R. Caicedo, J. Moreno, and G. Camps-
diction of U.S. corn yields based on time-series MODIS data,” Remote Valls, “Gaussian process retrieval of chlorophyll content from imaging
Sens. Environ., vol. 147, pp. 219–231, 2014. spectroscopy data,” IEEE J. Sel. Topics Appl. Earth Observ. Remote
[23] C. Wang, S. Nie, X. Xi, S. Luo, and X. Sun, “Estimating the biomass of Sens., vol. 6, no. 2, pp. 867–874, Apr. 2013.
maize with hyperspectral and LiDAR data,” Remote Sens., vol. 9, no. 1, [47] D. Ashourloo, H. Aghighi, A. A. Matkan, M. R. Mobasheri, and A. M.
2017, Art. no. 11. Rad, “An investigation into machine learning regression techniques for
[24] E. Raymond Hunt, Jr., “Relationship between woody biomass and par the leaf rust disease detection using hyperspectral measurement,” IEEE
conversion efficiency for estimating net primary production from NDVI,” J. Sel. Topics Appl. Earth Observ. Remote Sens., vol. 9, no. 9, pp. 4344–
Int. J. Remote Sens., vol. 15, no. 8, pp. 1725–1729, 1994. 4351, Sep. 2016.
[25] J. Lewis, J. Rowland, and A. Nadeau, “Estimating maize production [48] D. P. Roy et al., “Landsat-8: Science and product vision for terrestrial
in Kenya using NDVI: Some statistical considerations,” Int. J. Remote global change research,” Remote Sens. Environ., vol. 145, pp. 154–172,
Sens., vol. 19, no. 13, pp. 2609–2617, 1998. 2014.
[26] J. Mika, J. Kerényi, A. Rimóczi-Paál, Á. Merza, C. Szinell, and I. Csiszár, [49] S. Lhermitte, J. Verbesselt, W. W. Verstraeten, and P. Coppin, “A compar-
“On correlation of maize and wheat yield with NDVI: Example of Hun- ison of time series similarity measures for classification and change de-
gary (1985–1998),” Adv. Space Res., vol. 30, no. 11, pp. 2399–2404, tection of ecosystem dynamics,” Remote Sens. Environ., vol. 115, no. 12,
2002. pp. 3129–3152, 2011.
[27] M. S. Mkhabela, M. S. Mkhabela, and N. N. Mashinini, “Early maize [50] E. Hamunyela, J. Verbesselt, and M. Herold, “Using spatial context
yield forecasting in the four agro-ecological regions of Swaziland using to improve early detection of deforestation from Landsat time series,”
NDVI data derived from NOAA’s-AVHRR,” Agricultural Forest Meteo- Remote Sens. Environ., vol. 172, pp. 126–138, 2016.
rol., vol. 129, no. 1, pp. 1–9, 2005. [51] Z. Zhu, S. Wang, and C. E. Woodcock, “Improvement and expansion
[28] D. K. Bolton and M. A. Friedl, “Forecasting crop yield using remotely of the Fmask algorithm: Cloud, cloud shadow, and snow detection for
sensed vegetation indices and crop phenology metrics,” Agricultural Landsats 4–7, 8, and Sentinel 2 images,” Remote Sens. Environ., vol. 159,
Forest Meteorol., vol. 173, pp. 74–84, 2013. pp. 269–277, 2015.
[29] M. Battude et al., “Estimating maize biomass and yield over large areas [52] Z. Zhu and C. E. Woodcock, “Object-based cloud and cloud shadow
using high spatial and temporal resolution Sentinel-2 like remote sensing detection in Landsat imagery,” Remote Sens. Environ., vol. 118,
data,” Remote Sens. Environ., vol. 184, pp. 668–681, 2016. pp. 83–94, 2012.
[30] R. Shrestha, L. Di, G. Y. Eugene, L. Kang, Y.-z. Shao, and Y.-q. Bai, [53] Y. Ke, J. Im, J. Lee, H. Gong, and Y. Ryu, “Characteristics of Landsat
“Regression model to estimate flood impact on corn yield using MODIS 8 OLI-derived {NDVI} by comparison with multiple satellite sensors
NDVI and USDA cropland data layer,” J. Integrative Agriculture, vol. 16, and in-situ observations,” Remote Sens. Environ., vol. 164, pp. 298–313,
no. 2, pp. 398–407, 2017. 2015.
[31] Field Crops: Usual Planting and Harvesting Dates, vol. 628, United [54] D. G. Neary, B. Orr, W. Leeuwen, S. B. Aguilar, L. Wittenberg, and Y.
States Department of Agriculture National Agricultural Statistics Ser- Carmel, “A multi-country assessment of vegetation dynamics, soil ero-
vice, Washington, DC, USA, 2010. sion, and watershed degradation after wildfires,” in Hydrology and Water
[32] T. Sakamoto, A. A. Gitelson, and T. J. Arkebauer, “MODIS-based corn Resources in Arizona and the Southwest. Mesa, AZ, USA: Arizona-
grain yield estimation model incorporating crop phenology information,” Nevada Acad. Sci., 2005.
Remote Sens. Environ., vol. 131, pp. 215–231, 2013. [55] J. Behmann, A.-K. Mahlein, T. Rumpf, C. Römer, and L. Plümer, “A
[33] C. Funk and M. E. Budde, “Phenologically-tuned MODIS NDVI-based review of advanced machine learning methods for the detection of biotic
production anomaly estimates for Zimbabwe,” Remote Sens. Environ., stress in precision crop protection,” Precis. Agriculture, vol. 16, no. 3,
vol. 113, no. 1, pp. 115–125, 2009. pp. 239–260, 2015.
[34] B. Khakural, P. Robert, and D. Huggins, “Variability of corn/soybean [56] I. H. Witten, E. Frank, M. A. Hall, and C. J. Pal, Data Mining: Practical
yield and soil/landscape properties across a southwestern Minnesota Machine Learning Tools and Techniques. San Mateo, CA, USA: Morgan
landscape,” Precis. Agriculture, 1999, pp. 573–579. Kaufmann, 2016.
[35] J. McKinion and H. Lemmon, “Expert systems for agriculture,” Comput. [57] H. Buxton, “Learning and understanding dynamic scene activity: A re-
Electron. Agriculture, vol. 1, no. 1, pp. 31–40, 1985. view,” Image Vis. Comput., vol. 21, no. 1, pp. 125–136, 2003.
[36] R. Doluschitz and W. Schmisseur, “Expert systems: Applications to agri- [58] C. M. Van der Walt and E. Barnard, “Data characteristics that determine
culture and farm management,” Comput. Electron. Agriculture, vol. 2, classifier performance,” Human Lang. Technol. Res. Group, Meraka
no. 3, pp. 173–182, 1988. Inst., Pretoria, South Africa, 2006.
[59] R. E. Bellman, Adaptive Control Processes: A Guided Tour. Princeton, [86] C.-C. Chang and C.-J. Lin, “Training Nu-support vector regression: The-
NJ, USA: Princeton Univ. Press, 2015. ory and algorithms,” Neural Comput., vol. 14, no. 8, pp. 1959–1978,
[60] G. C. Cawley and N. L. Talbot, “On over-fitting in model selection and 2002.
subsequent selection bias in performance evaluation,” J. Mach. Learn. [87] J. Stamenkovic, P. Ferrazzoli, L. Guerriero, D. Tuia, J.-P. Thiran, and
Res., vol. 11, no. Jul., pp. 2079–2107, 2010. M. Borgeaud, “Crop backscatter modeling and soil moisture estimation
[61] C. E. Rasmussen and C. K. Williams, Gaussian Processes for Machine with support vector regression,” in Proc. IEEE Int. Geosci. Remote Sens.
Learning, vol. 1. Cambridge, MA, USA: MIT Press, 2006. Symp., 2014, pp. 3228–3231.
[62] A. J. Smola and B. Schölkopf, “A tutorial on support vector regression,” [88] V. Anandhi and R. M. Chezian, “Support vector regression to forecast
Statist. Comput., vol. 14, no. 3, pp. 199–222, 2004. the demand and supply of pulpwood,” Int. J. Future Comput. Commun.,
[63] C.-C. Chang and C.-J. Lin, “LIBSVM: A library for support vector vol. 2, no. 3, pp. 266–269, 2013.
machines,” ACM Trans. Intell. Syst. Technol., vol. 2, no. 3, p. 27, 2011. [89] S. S. Keerthi and C.-J. Lin, “Asymptotic behaviors of support vector ma-
[64] J. Elith, J. R. Leathwick, and T. Hastie, “A working guide to boosted chines with Gaussian kernel,” Neural Comput., vol. 15, no. 7, pp. 1667–
regression trees,” J. Animal Ecol., vol. 77, no. 4, pp. 802–813, 2008. 1689, 2003.
[65] A. Liaw et al., “Classification and regression by random forest,” R News, [90] D. Bhatt, P. Aggarwal, P. Bhattacharya, and V. Devabhaktuni, “An en-
vol. 2, no. 3, pp. 18–22, 2002. hanced MEMS error modeling approach based on Nu-support vector
[66] I. Guyon and A. Elisseeff, “An introduction to variable and feature se- regression,” Sensors, vol. 12, no. 7, pp. 9448–9466, 2012.
lection,” J. Mach. Learn. Res., vol. 3, no. Mar., pp. 1157–1182, 2003. [91] R. E. Schapire, “The boosting approach to machine learning: An
[67] W. Yang, K. Wang, and W. Zuo, “Neighborhood component fea- overview,” in Nonlinear Estimation and Classification. New York, NY,
ture selection for high-dimensional data,” J. Comput., vol. 7, no. 1, USA: Springer, 2003, pp. 149–171.
pp. 161–168, 2012. [92] J. Friedman et al., “Additive logistic regression: A statistical view of
[68] P. Bermolen and D. Rossi, “Support vector regression for link load pre- boosting (with discussion and a rejoinder by the authors),” Ann. Statist.,
diction,” Comput. Netw., vol. 53, no. 2, pp. 191–201, 2009. vol. 28, no. 2, pp. 337–407, 2000.
[69] V. Vapnik, The Nature of Statistical Learning Theory. New York, NY, [93] J. Friedman, T. Hastie, and R. Tibshirani, The Elements of Statistical
USA: Springer, 2013. Learning (Springer Series in Statistics), vol. 1. New York, NY, USA:
[70] R. Kohavi et al., “A study of cross-validation and bootstrap for accuracy Springer, 2001.
estimation and model selection,” in Proc. Int. Joint Conf. Artif. Intell., [94] T. K. Ho, “Random decision forests,” in Proc. 3rd Int. Conf. Document
Stanford, CA, USA, 1995, vol. 14, pp. 1137–1145. Anal. Recognit., 1995, vol. 1, pp. 278–282.
[71] B. Efron, “Estimating the error rate of a prediction rule: Improvement on [95] T. Hastie, R. Tibshirani, and J. Friedman, The Elements of Statistical
cross-validation,” J. Amer. Statist. Assoc., vol. 78, no. 382, pp. 316–331, Learning: Data Mining, Inference, and Prediction. New York, NY, USA:
1983. Springer, 2002.
[72] U. M. Braga-Neto and E. R. Dougherty, “Is cross-validation valid for [96] R. Tibshirani, Bias, Variance and Prediction Error for Classification
small-sample microarray classification?” Bioinformatics, vol. 20, no. 3, Rules. Toronto, ON, Canada: Dept. Statist., Univ. Toronto, 1996.
pp. 374–380, 2004. [97] L. Breiman, “Out-of-bag estimation,” Citeseer, 1996.
[73] P. Refaeilzadeh, L. Tang, and H. Liu, “Cross validation,” in Encyclopedia [98] J. Snoek, H. Larochelle, and R. P. Adams, “Practical Bayesian optimiza-
of Database Systems (EDBS). New York, NY, USA: Springer, 2009, p. 6. tion of machine learning algorithms,” in Proc. Adv. Neural Inf. Process.
[74] S. J. Gershman and D. M. Blei, “A tutorial on Bayesian nonparametric Syst., 2012, pp. 2951–2959.
models,” J. Math. Psychol., vol. 56, no. 1, pp. 1–12, 2012. [99] C. J. Willmott and K. Matsuura, “Advantages of the mean absolute error
[75] M. Ebden et al., “Gaussian processes for regression: A quick introduc- (MAE) over the root mean square error (RMSE) in assessing average
tion,” The Website of Robotics Research Group, Department of Engi- model performance,” Clim. Res., vol. 30, no. 1, pp. 79–82, 2005.
neering Science, University of Oxford, Oxford, U.K., 2008. [100] T. Chai and R. R. Draxler, “Root mean square error (RMSE) or mean
[76] C. K. Williams, “Prediction with Gaussian processes: From linear re- absolute error (MAE)?” Geosci. Model Develop. Discussions, vol. 7,
gression to linear prediction and beyond,” in Learning in Graphical pp. 1525–1534, 2014.
Models (NATO ASI Series D: Behavioural and Social Sciences), vol. 89. [101] J. Miles, “Tolerance and variance inflation factor,” in, Encyclopedia of
Dordrecht: The Netherlands: Springer, 1998, pp. 599–621. Statistics in Behavioral Science. Hoboken, NJ, USA: Wiley, 2005.
[77] F. Pérez-Cruz, S. Van Vaerenbergh, J. J. Murillo-Fuentes, M. Lázaro- [102] T. P. Ryan, Modern Regression Methods, vol. 655. Hoboken, NJ, USA:
Gredilla, and I. Santamaria, “Gaussian processes for nonlinear signal Wiley, 2008.
processing: An overview of recent advances,” IEEE Signal Process. [103] R. Koenker and K. Hallock, “Quantile regression: An introduction,” J.
Mag., vol. 30, no. 4, pp. 40–50, Jul. 2013. Econ. Perspectives, vol. 15, no. 4, pp. 43–56, 2001.
[78] H. Drucker, C. Burges, L. Kaufman, A. Smola, and V. Vapnik, “Sup- [104] M. Labus, G. Nielsen, R. Lawrence, R. Engel, and D. Long, “Wheat yield
port vector regression machines,” in Advances in Neural Information estimates using multi-temporal NDVI satellite imagery,” Int. J. Remote
Processing Systems, vol. 9. Cambridge, MA, USA: MIT Press, 1997, Sens., vol. 23, no. 20, pp. 4169–4180, 2002.
pp. 155–161. [105] R. Taylor, “Interpretation of the correlation coefficient: A basic review,”
[79] G. Mountrakis, J. Im, and C. Ogole, “Support vector machines in remote J. Diagnostic Med. Sonography, vol. 6, no. 1, pp. 35–39, 1990.
sensing: A review,” ISPRS J. Photogramm. Remote Sens., vol. 66, no. 3, [106] A. Huete, “Soil-dependent spectral response in a developing plant
pp. 247–259, 2011. canopy,” Agronomy J., vol. 79, no. 1, pp. 61–68, 1987.
[80] D. Basak, S. Pal, and D. C. Patranabis, “Support vector regression,”
Neural Inf. Process.—Lett. Rev., vol. 11, no. 10, pp. 203–224, 2007.
[81] F. Melgani and L. Bruzzone, “Classification of hyperspectral remote
sensing images with support vector machines,” IEEE Trans. Geosci.
Remote Sens., vol. 42, no. 8, pp. 1778–1790, Aug. 2004.
[82] M. Pal and P. Mather, “Support vector machines for classification in Hossein Aghighi received the B.Sc. degree in nat-
remote sensing,” Int. J. Remote Sens., vol. 26, no. 5, pp. 1007–1011, ural resource engineering from Gorgan University
2005. of Agricultural Sciences and Natural Resources, the
[83] H. Aghighi, J. Trinder, Y. Tarabalka, and S. Lim, “Dynamic block-based M.Sc. degree in remote sensing and GIS from Tarbiat
parameter estimation for MRF classification of high-resolution images,” Modares University, Tehran, Iran, and the Ph.D. de-
IEEE Geosci. Remote Sens. Lett., vol. 11, no. 10, pp. 1687–1691, Oct. gree in remote sensing from The University of New
2014. South Wales, Sydney, NSW, Australia, in 2000, 2012,
[84] G. Camps-Valls, L. Bruzzone, J. L. Rojo-Alvarez, and F. Melgani, “Ro- 2015, respectively.
bust support vector regression for biophysical variable estimation from He is currently an Assistant Professor with the
remotely sensed images,” IEEE Geosci. Remote Sens. Lett., vol. 3, no. 3, Remote Sensing and GIS Research Center, Shahid
pp. 339–343, Jul. 2006. Beheshti University, Tehran, Iran. His research in-
[85] D. Tuia, J. Verrelst, L. Alonso, F. Pérez-Cruz, and G. Camps-Valls, terests include remote sensing (RS) data processing and applications (optical,
“Multioutput support vector regression for remote sensing biophysical Lidar, and radar), retrieval of vegetation biophysical and biochemical param-
parameter estimation,” IEEE Geosci. Remote Sens. Lett., vol. 8, no. 4, eters from RS data, Internet of things (IOT), and convergence and synergism
pp. 804–808, Jul. 2011. between RS, GIS, and IOT for precision farming.
Mohsen Azadbakht received the B.S. and M.S. Hamid Salehi Shahrabi received the B.A. degree in
degrees in pure mathematics and remote sensing, physical geography and M.S. degree in remote sens-
and the Ph.D. degree in geomatic engineering from ing from Shahid Beheshti University, Tehran, Iran, in
Lorestan University, Shahid Beheshti University 2010 and 2013, respectively.
and the University of Melbourne, Melbourne, VIC, He is currently a Researcher with the Applied Re-
Australia, in 2004, 2009 and 2016, respectively. mote Sensing Group, Iranian Space Research Center,
He is currently an Assistant Professor with the Tehran, Iran. His research interests include crop-type
Remote Sensing and GIS Research Center, Shahid mapping, crop biophysical and biochemical parame-
Beheshti University, Tehran, Iran. His research in- ters estimation, and water productivity estimation of
terests include remote sensing data processing and agricultural fields.
applications.
Davoud Ashourloo received the B.Sc. degree in nat- Soheil Radiom received the Ph.D. degree in telecom-
ural resource engineering from Gorgan University munication & microelectronics Eng. from MICAS-
of Agricultural Sciences and Natural Resources, the ESAT, Katholieke Universiteit Leuven, Leuven,
M.Sc. degree in remote sensing and GIS from Shahid Belgium, in 2011.
Beheshti University, Tehran, Iran, and the Ph.D. de- He is currently an Assistant Professor with the
gree in remote sensing from K. N. Toosi University of Iranian Space Research Center, Tehran, Iran, where
Technology, Tehran, Iran, in 2001, 2003, and 2014, he is also the Director of the space research insti-
respectively. tute. His research interests include different scopes
His research interests include remote sensing, in- in wireless sensor networks, Internet of things, RFID
cluding crop mapping, wheat leaf rust detection, and systems, remote sensing applications development,
retrieval of vegetation biophysical and biochemical ADC, DACs, etc.
parameters.

Aghighi 2018

Uploaded by

Copyright:

Available Formats

Aghighi 2018

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

Aghighi 2018

Uploaded by

Copyright:

Available Formats

This article has been accepted for inclusion in a future issue of this journal.

Content is final as presented, with the exception of pagination.

Machine Learning Regression Techniques for the

Abstract—Machine learning (ML) techniques have been utilized I. INTRODUCTION

active radiation [24], the fraction of absorbed photosynthet-

TABLE I Although the preliminary feature selection results seem

Here, NaN stands for not a number.

TABLE VIII Regarding the random sampling methods, although we inves-

You might also like