AgroAdvisor_Crop_Yield_Prediction_Crop_and_Fertili
Kardan University
Article
Keywords: Crop Yield Prediction, Crop Recommendation System, Fertilizer Recommendation System,
Integrated System, Deep Learning Algorithms, Machine Learning Algorithms, DeepFM, Random Forest
DOI: https://doi.org/10.21203/rs.3.rs-4099720/v1
License: This work is licensed under a Creative Commons Attribution 4.0 International License.
Ashima Kukkar1, Rajni Mohana2,*, Aman Sharma2, Saurav Mallik3,*, Mohd Asif Shah4,*
1 Chitkara University Institute of Engineering and Technology, Chitkara University, Punjab, India; ashima@chitkara.edu.in
2 CS/IT Department, Jaypee University of Information Technology, Solan, India; rajni.mohana@juitsolan.in, amans.3008@juitsolan.in
3 Department of Environmental Health, Harvard T H Chan School of Public Health, Boston, MA 02115, USA; sauravmtech2@gmail.com
4 Department of Economics, Kardan University, Parwane Du, 1001, Kabul, Afghanistan; m.asif@kardan.edu.af
*Corresponding authors: Mohd Asif Shah (m.asif@kardan.edu.af), Saurav Mallik (sauravmtech2@gmail.com),
Rajni Mohana (rajni.mohana@juitsolan.in)
Abstract
Crop yield prediction plays a very important role in productivity growth. Predicting the crop yield in a particular
area helps the farmer choose the right crop to grow on the land. Together with crop yield prediction, crop
recommendation boosts crop productivity. Recommending the correct type of crop for a particular piece of land
based on factors such as soil pH, rainfall, temperature and humidity helps the farmer choose the most suitable crop.
Alongside crop recommendation and yield prediction, fertilizer recommendation is also necessary for higher
productivity and yield, since suitable fertilizers must be applied at the optimal time for crop growth. Therefore,
in this paper, we address these issues by proposing three model systems that efficiently manage crop production.
We designed an integrated system named AgroAdvisor using a proposed hybrid technique that combines Random
Forest with Extreme Gradient Boosting (RFXGB) and Deep Factorization Machine (DeepFM). RFXGB is applied to
process the features, which improves DeepFM's ability to handle dense numerical features and increases prediction
performance. The results of RFXGB-DeepFM are compared with classical machine learning and deep learning
techniques using recall, F-value, precision and accuracy. The results show that the proposed RFXGB-DeepFM
technique gives better accuracy than the classical techniques. The impact of RFXGB on existing techniques is also
analyzed using Friedman and post hoc statistical tests, and the results show that in most cases RFXGB enhanced
the performance.
1. Introduction
Agriculture is very important to India's economy and employment, and it plays an important role in improving the
economy. In recent years, globalization has changed agriculture dramatically. Many farmers depend on agriculture
for their livelihood, and over time production needs have increased exponentially. The most common problem faced
by Indian farmers is that they do not choose crops suitable for the soil. Even when producing in large quantities,
farmers often misuse technology [1]. Unlike naturally produced crops, hybrid varieties do not provide essential
ingredients, and such techniques spoil the soil. Poor crop selection affects productivity and has led to declining
crop yields across the country, food shortages and an increase in farmer suicides. The main objective of farming
plans is to achieve maximum yields. When growing crops on a large scale, it is very important to take care of the
crops at the initial stages, because growing crops on the same land for a long period depletes the soil. Using an
excessive amount of fertilizer can also degrade the soil. According to one study, if fertilizers are applied in the
amounts recommended for the land, farmers can harvest approximately 10%-20% more yield than with their usual
practice [2].
Agriculture depends on predicting the best crop to grow, the fertilizers suitable for crop growth and the crop yield
under the environmental factors of a particular area. In recent years, machine learning and deep learning
algorithms have become quite important in this process. In this age of data science and technology, the agriculture
industry has a lot to gain from correctly applied methods. Two essential approaches are feature selection and
classification [3-6]. The goal of feature selection is to extract the most crucial dataset properties. It entails selecting
a subset of suitable characteristics from a larger set of original attributes based on an established criterion,
such as classification performance, which is crucial in machine learning and deep learning applications.
The main goal of the proposed work is to develop a recommendation system for accurate crop selection that improves
crop productivity and reduces wrong crop selection. The farmer's problem is addressed by recommending the crop
before sowing. This work presents a hybrid machine learning and deep learning based crop recommendation system
for farmers, named AgroAdvisor. Multiple researchers have used Random Forest (RF) for crop yield prediction and
for crop and fertilizer recommendation, and compared its applicability with other models [7-10]. To improve crop
yield models, researchers have compared classical machine learning models with deep learning models. No study in
the available literature has examined the effect of RF with Extreme Gradient Boosting (RFXGB) on prediction
accuracy when applied to the DeepFM input layer weights and bias, and the intrinsic relationship between RFXGB and
DeepFM has not been analysed. In this study, we developed a hybrid RFXGB-DeepFM model that demonstrates how
applying RFXGB to DeepFM can increase the accuracy of crop yield prediction and of crop and fertilizer
recommendation.
1.1. Contribution:
An integrated system named AgroAdvisor is implemented using the proposed RFXGB-DeepFM technique. It covers three
major areas of crop management, and the contributions are as follows:
• To design a Crop Recommendation System that recommends the best-suited crop according to the
environmental conditions so that crop productivity increases.
• To design a Fertilizer Recommendation System that suggests the suitable fertilizer and the optimal timing
for its application.
• To design a Crop Yield Prediction System that predicts the total yield of a particular crop according to
the environmental factors of a particular area.
• To compare the results of the proposed AgroAdvisor with those of existing models using accuracy and error rate.
This research paper is organized as follows. The related work is illustrated in Section 2. The proposed system is
described in Section 3. Experimental results and discussions are represented in Section 4. The conclusion and future
work are discussed in Section 5.
2. Related Work
Various researchers around the world have proposed several methods. Measures to improve farming primarily
concern technologies and devices that make the agricultural sector more profitable for farmers by using
machine learning and deep learning techniques to predict increases in crop production. For this work, we
reviewed several research papers that demonstrated good techniques for crop prediction, crop recommendation
and fertilizer use.
Y. Jeevan et al. [3] designed a system to predict crop yield. The authors used historical data to predict crop
patterns, considering environmental factors such as soil pH, temperature and humidity. The Decision Tree (DT) and
Random Forest (RF) methods were used to evaluate the performance of the proposed prediction system. The results
showed that RF gives better accuracy than DT and reduces overfitting. The authors also compared other methods but
still found RF to be the best. Reddy et al. [4] created a mobile application that helps farmers predict crop yield.
For this purpose, the authors compared the accuracy and performance of three machine learning algorithms: Logistic
Regression (LR), RF and Naïve Bayes (NB). LR, NB and RF achieved accuracy rates of 87.8%, 91.50% and 92.81%,
respectively, so the authors built their model using RF. The system was developed on an API platform, where the
API takes temperature, rainfall, humidity, etc. as input, analyses the pattern learned from the training data and
predicts the crop yield. Nigam et al. [5] applied various machine learning techniques to predict crop yield on
several crop datasets. The factors considered were temperature, rainfall, production and a combined dataset. Each
machine learning algorithm was applied to these datasets and its performance was calculated. The authors found
that LSTM performed best for predicting temperature, but RF outperformed all the other machine learning techniques,
such as XGBoost, K-Nearest Neighbour (KNN) and LR, when all the factors (temperature, rainfall and production) were
combined. The accuracy of RF, 67.80%, was higher than that of all other techniques. Nischitha et al. [6] built a
model that predicts rainfall using the Support Vector Machine (SVM) algorithm and predicts the crop using the DT
algorithm. Rainfall is predicted from other factors such as soil pH, humidity and temperature. The output of the
rainfall prediction model is fed into the crop prediction model to predict the crop that is suitable for these
conditions and so improve the success rate of cultivation.
Suruliandi et al. [7] used various algorithms to select features from the dataset and various machine learning
algorithms to classify the data into their specific labels. The feature selection algorithms were Boruta,
Sequential Forward Feature Selection (SFFS) and Recursive Feature Elimination (RFE). The algorithms used for crop
classification were KNN, DT, RF, SVM, Bagging and NB. The performance analysis showed that Boruta, SFFS and RFE
work better with the Bagging classifier. Pudumalar et al. [8] used the majority voting technique, in which any
number of base classifiers can be used. The learners selected for this technique should be highly competent as
well as complementary to each other; the higher the competency, the higher the accuracy. The authors used four
algorithms in the majority voting technique: RF, CHAID, NB and KNN. Each learner predicts a label for every data
point, the votes for each label are counted and the label with the majority of votes is chosen. Each learner has
an operator that performs the classification correspondingly. The accuracy of their model was 88%. Doshi et al. [9]
built two models. The first recommends the crop that is most suitable for the given inputs, and the second
predicts rainfall according to the recommended crop type. The authors used different algorithms for the two models.
For the crop recommendation system, the accuracy of DT, KNN, RF and Neural Network (NN) was compared, and the
NN-based model achieved the best accuracy. For the rainfall prediction model, LR provided an accuracy of 71%, the
best of all the algorithms. Pande et al. [10] proposed a system intended to lower farmer suicide rates. The authors
built a user-friendly mobile app that provides two services. The first service recommends the most suitable crop
and the crop that would provide the highest yield. It has two options: if the user already knows which crop to
grow, the yield of that crop can be checked directly; if not, the user enters the soil and weather conditions and
the system predicts the best crop of all the crops, along with the yield of the recommended crop. The second
service is a fertilizer system that tells users the right time to apply fertilizer for maximum productivity. The
SVM, Artificial Neural Network (ANN), RF, Multivariate Linear Regression (MLR) and KNN methods were used, and RF
showed the best results with 95% accuracy. Kumar et al. [11] created two systems: a crop prediction system and a
fertilizer recommendation system. The inputs are soil and weather factors, namely the N, P and K values of the soil
and temperature, humidity and rainfall. On the basis of these factors, the crop recommendation system recommends
crops and suggests a suitable fertilizer for each crop to increase productivity. The authors compared three
algorithms (RF, SVM and KNN) and used SVM to develop the model. Archana et al. [12] implemented two models, one for
crop recommendation and one for fertilizer recommendation, using the KNN algorithm. The overall accuracy of the
model was 80%. Verma et al. [13] proposed a crop prediction model using the machine learning techniques KNN, SVM,
RF, DT and Gradient Boosting. The proposed system achieved an accuracy of 99.4% in determining which crop will
yield the most under the given weather and soil conditions.
Molares et al. [14] designed a crop prediction system using RF, ANN and regularized linear models. The Root Mean
Square Error (RMSE) of RF was between 35% and 38%, lower than that of the other models. In the Maharashtra
region, Iniyan et al. [15] collected data on precipitation, humidity, temperature, area, soil type, crop type,
season and yield over the previous 18 years. After training eight machine learning models, including linear
regression, DT, KNN, SVM and gradient boosting, the Long Short-Term Memory (LSTM) network had the best prediction
performance with 86.3% accuracy. Panigrahi et al. [16] effectively predicted the yield of three crops (maize,
peanuts and Bengal beans) using linear regression and DT based on Telangana's monthly low and high temperatures
and yearly rainfall. By combining a Convolutional Neural Network (CNN) with geographical factors,
Tiwari et al. [17] established a model for crop yield prediction. The current model encountered an issue during a
continuous breakdown in agricultural drifts for crop cultivation that are inappropriate for the soil, weather, and
temperature conditions. A Back Propagation Neural Network (BPNN) was used to train the CNN model, which used
spatial information as input for error prediction. The proposed model has the benefit of being applicable to a
real-time dataset derived from reliable geographical sources. However, the model lowered the effectiveness of the
crop production forecast while reducing the relative error. Gopal et al. [18] devised a novel strategy for
efficient crop yield prediction (CYP), in which ANN, statistical and Multi Linear Regression (MLR) methods were
used to forecast agricultural yield. The MLR-ANN model for crop yield prediction examined the intrinsic behavior
and calculated accuracy coefficients based on the weights and bias of the MLR and ANN input layers. A feed-forward
ANN with back propagation was used to forecast agricultural yield. Similarly, Khaki et al. [19] investigated the
Deep Neural Network (DNN) for crop yield prediction to create a precise prediction model.
In order to set up the relationship between the yield and the interacting elements with respect to the robust and all-
encompassing algorithm, the model accomplished basic understanding. The findings demonstrated and implied that,
when compared to current supervised models, regression trees outperformed them. The key constraint, though, was
to hunt for more sophisticated models that weren't producing correct findings. Mariammal et al. [20] developed
modified recursive feature elimination (MRFE), a novel feature selection (FS) method, and used it with six
classifiers (KNN, NB, DT, RF, SVM and Bagging) to choose the most pertinent features from a dataset for crop
prediction. The experimental findings demonstrate that the bagging strategy aids accurate crop prediction, while
the MRFE method chooses the most relevant features. The performance study shows that the MRFE methodology
outperforms other FS techniques with 95% accuracy.
After reviewing the aforementioned publications, we conclude that most researchers use the same features to compute
their results. Few researchers have used deep learning methods such as ANN, LSTM, CNN and DNN for recommendations,
as shown in Figure 1; most relied exclusively on machine learning techniques. Some studies employed feature
selection methods to reduce the feature set. RF is the most widely used algorithm for feature selection and
recommendation because of its ability to handle numerical features. In this work, an improved version of RF, the
Random Forest Extreme Gradient Boosting (RFXGB) based feature selection model, is used to handle dense numerical
features and to improve the recommendation performance of the system by selecting the relevant features [21-24].
Additionally, deep learning-based recommendation in agriculture is a recent research field that makes it possible
to quickly manage and assess data gathered from various sources. DL-based systems typically exhibit non-linear
behavior and high levels of variability, which affect the final outcomes. As a result, a DeepFM-based deep learning
model is used to improve the performance of the recommendation system. In this study, we explore and implement DL
with ML for crop yield prediction and for fertilizer and crop recommendation, which is an important subdomain of
the agricultural domain. The DeepFM-based deep learning model is therefore integrated with the Random Forest
Boosting based machine learning model to enhance the performance of the recommendation system on the benchmark
datasets [25-27].
3.2. Dataset
This section gives a detailed description of the datasets used in the integrated framework of the proposed work.
All three datasets are taken from Kaggle [25-27] and are combined to achieve the desired results. The datasets
provide various crop labels, such as rice, cotton and sugarcane. Samples of the crop yield, crop recommendation
and fertilizer recommendation data are illustrated in Figure 4. The crop dataset has 2200 instances with 22 labels,
the fertilizer dataset has 99 instances with 7 labels and the crop yield dataset has 246092 instances with 8
labels.
3.3. Pre-processing
Data pre-processing turns the raw data into a clean data collection. The data are acquired from many sources,
but because they are collected in raw form, they cannot be analysed directly. The data are converted into a
comprehensible format using various strategies, such as substituting missing and null values. Various Python
libraries and tools are used for data processing [28]. The last stage of data preparation is the division into
training and testing data. To train each module of AgroAdvisor, we used three different datasets. For each of the
crop recommendation, fertilizer recommendation and crop yield prediction systems, the dataset is split into a
training set containing 80% of the data and a test set containing the remaining 20%.
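The two pre-processing steps described above, substituting missing values and an 80/20 split, can be sketched as follows. This is a minimal illustration that uses only the Python standard library. The paper does not name the imputation rule or the libraries used, so mean imputation, the fixed random seed, the helper names and the toy field name `ph` are all assumptions made for the example.

```python
import random


def fill_missing_with_mean(rows, column):
    """Return a copy of rows where None in `column` is replaced by the column mean.

    Mean imputation is an assumption; the paper only says missing values are substituted.
    """
    present = [row[column] for row in rows if row[column] is not None]
    mean = sum(present) / len(present)
    return [
        {**row, column: mean if row[column] is None else row[column]}
        for row in rows
    ]


def split_80_20(rows, seed=0):
    """Shuffle a copy of rows and return (train, test) with a 20% test share."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = round(0.2 * len(shuffled))
    return shuffled[n_test:], shuffled[:n_test]


# Hypothetical toy records; the field name is illustrative only.
toy = [{"ph": 6.0}, {"ph": None}, {"ph": 7.0}]
cleaned = fill_missing_with_mean(toy, "ph")

# Instance counts reported for the three datasets in Section 3.2.
sizes = {"crop": 2200, "fertilizer": 99, "crop_yield": 246092}
splits = {name: split_80_20(range(n)) for name, n in sizes.items()}
for name, (train, test) in splits.items():
    print(name, len(train), len(test))
```

With the reported instance counts, this split yields 1760/440 rows for the crop dataset and 79/20 rows for the fertilizer dataset, which shows how small the fertilizer test set is.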
The selected relevant features are fed into the DeepFM model. DeepFM combines Factorization Machines (FM) and Deep
Neural Networks (DNN). It captures both the high-order interactions learned by the DNN and the low-order
interactions learned by the FM [30,31]. The conventional FM model usually considers only combinations of two
features when modelling feature interactions. The output of the FM component and the output of the DNN component
are combined with a weighted total to produce the DeepFM model's final prediction, as represented in equation 3:
where $FO \in (0, 1)$ is the output of the whole model, $FO_{FM}$ is the output of the FM component, and
$FO_{DNN}$ is the output of the DNN component. Both the FM component and the deep neural network use the same
input. Most categorical and ID-type features are encoded using simple one-hot encoding. The encoded result is a
very sparse vector that cannot be fed directly into a DNN, since it would reduce the network's performance. The
one-hot encoded input, which is usually very sparse, must therefore first pass through the embedding layer, which
converts the high-dimensional sparse vectors into low-dimensional dense vectors. These dense vectors serve as the
common input for both the FM and deep neural network components. FM is a two-way (second-order) factorization
machine. Its calculation formula is as follows:
\[ FO_{FM} = \langle p, k \rangle + \sum_{h_1=1}^{g} \sum_{h_2=h_1+1}^{g} \langle v_{h_1}, v_{h_2} \rangle \, k_{h_1} k_{h_2} \tag{4} \]
where 〈𝑝, 𝑘〉 and the "cross terms" represent the first- and second-order features, respectively, in weighted
representations.
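Equation 4 can be evaluated directly with two nested loops. The sketch below is illustrative only. It reads p as the first-order weight vector, k as the input feature vector, v as the list of latent vectors (one per feature) and g as the number of features. The source does not define these symbols explicitly, so this reading is an assumption, and the numeric values are made up.

```python
def fm_output(p, k, v):
    """First-order term <p, k> plus pairwise terms <v[h1], v[h2]> * k[h1] * k[h2]."""
    g = len(k)
    first_order = sum(p[h] * k[h] for h in range(g))
    second_order = 0.0
    for h1 in range(g):
        for h2 in range(h1 + 1, g):  # each unordered pair is counted once
            dot = sum(a * b for a, b in zip(v[h1], v[h2]))
            second_order += dot * k[h1] * k[h2]
    return first_order + second_order


# Hypothetical values: three features with two-dimensional latent vectors.
p = [0.1, 0.2, 0.3]
k = [1.0, 0.0, 2.0]
v = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
print(fm_output(p, k, v))
```

Here the first-order term is 0.7 and only the pair of the first and third features contributes a cross term (1.0 × 1.0 × 2.0 = 2.0), because the second feature is zero.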
The DNN component employs a feedforward neural network. Since the data processed by this model typically contain
a sizable quantity of discrete data, which would weaken learning, the raw input cannot be fed directly into the
neural network. As a result, this component also receives its input from the low-dimensional dense vectors
produced earlier by the embedding layer.
Let $J_i$ denote the output of the embedding layer for the i-th feature domain, giving $J_1, J_2, \ldots, J_m$.
These are used as the DNN's input $J^{(0)}$ in order to extract features, as demonstrated below:
where $J^{(l)}$, $W^{(l)}$ and $z^{(l)}$ are the output, weight and bias parameters of the l-th hidden layer,
respectively, and $\sigma$ denotes the activation function of the neural network's hidden layers. The output of
the last hidden layer is then used as the output of the DNN part:
$|E|$ represents the number of hidden layers. Sharing the input between the FM and DNN components has the
following two benefits: 1) It is feasible to simultaneously learn high-order and low-order features from the
original features; 2) Feature engineering techniques such as RFXGB feature selection have to be applied to the
original inputs to learn the numerical features.
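The layer-wise computation of the DNN component and its combination with the FM output can be sketched as below. The corresponding equations are not reproduced in this text, so the sketch assumes the standard DeepFM form: each hidden layer computes σ(W·J + z), a final linear unit reduces the last hidden output to a scalar, and the prediction is sigmoid(FO_FM + FO_DNN). The ReLU activation, the unweighted sum and all function names are assumptions, not the authors' stated configuration.

```python
import math


def relu(values):
    """Hidden-layer activation (assumed; the text only calls it sigma)."""
    return [max(0.0, x) for x in values]


def dense(W, J, z):
    """Affine map W @ J + z for a weight matrix given as a list of rows."""
    return [sum(w * j for w, j in zip(row, J)) + b for row, b in zip(W, z)]


def dnn_output(J0, hidden_layers, w_out, z_out):
    """Propagate J0 through each (W, z) hidden layer, then reduce to a scalar."""
    J = J0
    for W, z in hidden_layers:
        J = relu(dense(W, J, z))
    return sum(w * j for w, j in zip(w_out, J)) + z_out


def deepfm_output(fo_fm, fo_dnn):
    """Squash the summed component outputs into (0, 1) with a sigmoid."""
    return 1.0 / (1.0 + math.exp(-(fo_fm + fo_dnn)))


# Hypothetical dense input and a single hidden layer with two units.
J0 = [1.0, 2.0]
layers = [([[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0])]
fo_dnn = dnn_output(J0, layers, w_out=[1.0, 2.0], z_out=0.5)
print(fo_dnn, deepfm_output(0.0, fo_dnn))
```

Because both components read the same dense vectors, the FM and DNN outputs can be computed in one pass over the embedding output and summed before the sigmoid.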
• Performance Measurement.
• Result of Proposed AgroAdvisor System.
• Effect of proposed Feature Selection Method on DeepFM.
• Effect of proposed Feature Selection Method on Existing Classifiers.
• Comparison with Existing Crop Yield Prediction, Crop Recommendation and Fertilizer Recommendation
Systems.
The performance of proposed AgroAdvisor is evaluated using various performance parameters. These parameters
are F-value, Precision, Accuracy, and Recall [32].
• Accuracy of the proposed system is defined as the ratio of correctly predicted samples to the total number
of samples. It is calculated by the following equation:
\[ \mathrm{Accuracy} = \frac{\sum TP + \sum TN}{\sum TP + \sum TN + \sum FP + \sum FN} \tag{10} \]
where TP, TN, FP and FN denote True Positive, True Negative, False Positive and False Negative samples,
respectively.
• Precision is the ratio of correctly predicted positive samples to all samples predicted as positive, that is,
true positives together with false positives. It is computed by the following equation:
\[ \mathrm{Precision} = \frac{\sum TP}{\sum TP + \sum FP} \tag{11} \]
• Recall is the ratio of correctly predicted positive samples to the total number of actual positive samples, as
shown in the following equation:
\[ \mathrm{Recall} = \frac{\sum TP}{\sum TP + \sum FN} \tag{12} \]
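Equations 10 to 12 translate into a few lines of arithmetic on the confusion-matrix counts. The sketch below is illustrative and uses made-up counts. The F-value formula does not appear in this text, so the usual harmonic mean of precision and recall (F1) is assumed here.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision and recall from equations 10-12, plus an assumed F1 score."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    # Assumption: F-value is taken as the harmonic mean of precision and recall.
    f_value = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f_value": f_value,
    }


# Hypothetical counts for one class of a 200-sample test set.
metrics = classification_metrics(tp=90, tn=85, fp=10, fn=15)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

For a multi-class task such as crop recommendation, the counts would be summed or averaged over classes before applying the same formulas, which is what the summation signs in the equations indicate.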
This section presents the experimental findings for the proposed AgroAdvisor system. The performance of the system
is reported for three modules: crop yield prediction, crop recommendation and fertilizer recommendation.
The proposed RFXGB-DeepFM technique is compared with traditional techniques, including SVM, NB, XGB, RF, KNN, ANN,
CNN, LSTM and DT [33-34]. Table 1 compares the performance of the proposed and the aforementioned techniques for
crop yield prediction, crop recommendation and fertilizer recommendation. As can be observed, the proposed
AgroAdvisor system performs better than the conventional techniques, and the proposed RFXGB-DeepFM achieves higher
precision, recall, accuracy and F-value. It is also noted that SVM performs worst of all techniques for crop yield
prediction and crop recommendation, whereas NB performs worst for fertilizer recommendation. The average accuracy
of the proposed AgroAdvisor system with RFXGB-DeepFM is 98.34% for crop yield prediction, 97.34% for crop
recommendation and 99.78% for fertilizer recommendation. By achieving higher values, the proposed system minimizes
the error rate. For crop yield prediction, the average accuracy of SVM, XGB, RF, DT, KNN, NB, ANN, CNN and LSTM
is 71.36%, 90.56%, 91.42%, 84.13%, 82.95%, 76.34%, 86.36%, 80.89% and 94.57%, respectively. For crop
recommendation, the average accuracy of SVM, XGB, RF, DT, KNN, NB, ANN, CNN and LSTM is 72.38%, 89.24%,
90.13%, 85.98%, 76.58%, 74.30%, 91.23%, 80.16% and 94.67%, respectively, in agreement with Table 3. For fertilizer
recommendation, the average accuracy of SVM, XGB, RF, DT, KNN, NB, ANN, CNN and LSTM is 79.88%, 88.09%, 92.28%,
84.97%, 81.88%, 75.08%, 90.33%, 82.88% and 93.48%, respectively. The proposed technique reduces prediction
variance, which in turn improves prediction performance. The RFXGB-DeepFM technique achieves higher accuracy than
SVM, XGB,
RF, DT, KNN, NB, ANN, CNN and LSTM. The effectiveness of RFXGB feature selection relies on the quantity
of decision trees produced. It provides a more comprehensive view of feature importance, potentially leading to
better feature selection decisions. The approach combines the strengths of both algorithms to produce more robust
and balanced rankings of numerical feature importance, which can lead to better model performance and more
reliable insights into the data. DeepFM
is a machine learning model that combines the strengths of Factorization Machines (FMs) and deep neural networks.
It is primarily designed for recommendation tasks, particularly in scenarios involving sparse and high-dimensional
data. Although DeepFM is often used with sparse categorical data, integrating RFXGB with DeepFM makes it
advantageous for numerical data as well. The ability of RFXGB-DeepFM to capture complex interactions between
features is valuable for understanding the relationships between the various factors affecting crop growth and
yield. It can uncover intricate patterns and dependencies between different attributes, helping to reveal hidden
insights. Agricultural datasets often include a mix of categorical information (e.g., crop types, soil types,
weather conditions) and numerical data (e.g., temperature, humidity, rainfall), and the RFXGB-DeepFM architecture
is well suited to handling both types of data within a unified framework. It can handle sparse and numerical data
effectively by leveraging embeddings and learning meaningful representations even from limited instances. It can
further enhance the model's performance on numerical data by preventing overfitting and promoting better
convergence during training. It has been observed that NB performs somewhat differently overall, particularly for
the F-value and accuracy metrics. This is because both classifiers utilize distinct objective functions to
predict. Further, XGB increases time complexity and computation but is able to handle complex relationships,
feature interactions and noisy data; therefore, it performs better than SVM, DT, KNN, CNN and NB. The performance
of the ANN technique is observed to be superior to that of the NB, SVM, XGB, KNN, CNN and DT classifiers because
ANN can model the complicated relationship between input and output. RF can be parallelized, handles unbalanced
data, performs well in high dimensions, trains and predicts quickly, is robust to non-linear data, and has
moderate variance and low bias; therefore, it performed better than the NB, XGB, SVM, KNN and DT classifiers. The
DT classifier performs poorly because the data are not linearly separable and it ignores some important variables
in the training data. The performance of KNN is also weak owing to its computational complexity for large
datasets, its sensitivity to the choice of distance metric and the need for careful hyperparameter tuning. The
LSTM technique also works well on the crop dataset because of its ability to handle sequences and capture temporal
dependencies; crop data often involve time series or sequential information, making LSTM a suitable choice for
analysis. CNN performs comparatively poorly because it is designed to capture spatial patterns and hierarchies of
features, which tabular crop data of this kind largely lack. As a result, it is safe to conclude that the level of
performance derived from the aforementioned parameters justifies the use of RFXGB-DeepFM in the proposed system
for crop yield prediction, crop recommendation and fertilizer recommendation.
Table 1: Result Comparison of the Proposed and Existing Techniques For Crop Yield Prediction, Crop And
Fertilizer Recommendation
4.3. Effect of Proposed Feature Selection Method on DeepFM and Existing Techniques.
The effect of the proposed feature selection method on the proposed and existing classifiers is investigated in
this section. RFXGB is used as a feature selection technique to select relevant features from the input data
before training a DeepFM model. Recall, precision, accuracy, and F-value are used to evaluate performance.
From Table 2 and Figure 6, it can be observed that combining DeepFM with the proposed feature selection
method increases its precision, recall, F-value and accuracy by approx. 5.02%, 5.12%, 5.07% and 6.43%
respectively for fertilizer recommendation, 5.96%, 8.31%, 7.16% and 6% for crop yield prediction, and 7.22%,
8.44%, 7.6% and 6.2% for crop recommendation. Further, RFXGB is also combined with existing techniques
such as SVM, XGB, RF, DT, KNN, NB, ANN, CNN and LSTM; the results are presented in Tables 2, 3 and 4.
Accuracy is used to compare these techniques with and without feature selection. From Figure 7, it can be
observed that for fertilizer recommendation, the prediction accuracy of KNN, LSTM and SVM decreases by
approx. 3.43%, 0.37% and 3% when combined with the RFXGB feature selection method, whereas the accuracy
of RF, XGB, DT, ANN, CNN and NB increases by 0.39%, 1.14%, 3.26%, 3.79%, 4.25% and 3.37% respectively.
For crop recommendation, the accuracy of the RF, XGB, DT, ANN, CNN and LSTM classifiers improves by
approx. 0.75%, 0.66%, 1.11%, 1.88%, 2.18% and 0.45%, whereas the prediction accuracy of KNN, NB and SVM
decreases by approx. 0.59%, 0.28% and 2.16%. For crop yield prediction, the accuracy of RF, XGB, DT, NB,
ANN, CNN and LSTM improves by 1.36%, 0.43%, 1.1%, 1.2%, 2.76%, 4.43% and 0.09% respectively, whereas
the accuracy of SVM and KNN decreases by 3.13% and 2.5% respectively. It is concluded that in some cases
the proposed feature selection method extracts the relevant features from the crop dataset, and the techniques
use these relevant features for yield prediction and recommendation. In a few cases, however, RFXGB reduces
the performance of techniques such as SVM, KNN, NB and LSTM; this is attributed to the removal of important
features, which causes the technique to lose critical information.
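The four evaluation measures named above can be sketched for a single class as follows. This is a minimal
illustration that assumes the standard confusion-count definitions, with the F-value taken as the harmonic mean
of precision and recall. The function name classification_metrics and the example counts are hypothetical and
are not taken from the paper, which does not state how multi-class results are averaged.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return precision, recall, F-value and accuracy from confusion counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives for one class. Empty denominators yield 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    # F-value assumed to be the harmonic mean of precision and recall
    f_value = 2 * precision * recall / denom if denom else 0.0
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    return {
        "precision": precision,
        "recall": recall,
        "f_value": f_value,
        "accuracy": accuracy,
    }


# Hypothetical counts, for illustration only
example = classification_metrics(tp=8, fp=2, fn=2, tn=8)
print(example)
```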
Table 2: Performance Comparison of Existing Classifiers with or without Feature Selection Method for Crop
Yield Prediction
[Figure 6: plots of accuracy, F-value, recall and precision (%) for the proposed RFXGB-DeepFM versus DeepFM,
in panels (a), (b) and (c); panel (c) is crop recommendation.]
Figure 6 (a,b,c): Graphical Representation of Performance Comparison of DeepFM with or without Feature
Selection Method.
Table 3: Performance Comparison of Existing Classifiers with or without Feature Selection Method for Crop
Recommendation.
Model                 Classifier with feature selection   Accuracy (%)   Classifier without feature selection   Accuracy (%)
Crop Recommendation   RFXGB-SVM                           70.22          SVM                                    72.38
                      RFXGB-XGB                           89.9           XGB                                    89.24
                      RFXGB-RF                            90.88          RF                                     90.13
                      RFXGB-DT                            87.09          DT                                     85.98
                      RFXGB-KNN                           75.99          KNN                                    76.58
                      RFXGB-NB                            74.02          NB                                     74.3
                      RFXGB-ANN                           93.11          ANN                                    91.23
                      RFXGB-CNN                           82.34          CNN                                    80.16
                      RFXGB-LSTM                          95.12          LSTM                                   94.67
                      Proposed RFXGB-DeepFM               97.34          DeepFM                                 91.14
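The accuracy changes quoted in the text for crop recommendation can be recomputed directly from Table 3. The
short sketch below transcribes the table values and takes the difference between the accuracy with and without
RFXGB feature selection; the variable names are illustrative, and the differences are in percentage points.

```python
# Accuracy (%) transcribed from Table 3: (with RFXGB feature selection, without)
table3 = {
    "SVM": (70.22, 72.38),
    "XGB": (89.9, 89.24),
    "RF": (90.88, 90.13),
    "DT": (87.09, 85.98),
    "KNN": (75.99, 76.58),
    "NB": (74.02, 74.3),
    "ANN": (93.11, 91.23),
    "CNN": (82.34, 80.16),
    "LSTM": (95.12, 94.67),
    "DeepFM": (97.34, 91.14),
}

# Change in accuracy (percentage points) attributable to feature selection
deltas = {
    name: round(with_fs - without_fs, 2)
    for name, (with_fs, without_fs) in table3.items()
}

# List techniques from largest gain to largest loss
for name, delta in sorted(deltas.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:7s} {delta:+.2f}")
```

The signs separate the techniques that benefit from feature selection (positive) from SVM, KNN and NB, whose
accuracy drops.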
Table 4: Performance Comparison of Existing Classifiers with or without Feature Selection Method for Fertilizer
Recommendation.
Figure 7(a,b,c): Graphical Representation of Performance Comparison of Existing Classifiers with or without
Feature Selection Method.
4.4. Comparison of Proposed AgroAdvisor system with Existing Crop Yield Prediction, Crop
Recommendation and Fertilizer Recommendation Systems
Benchmarking is a primary step in the majority of agriculture data studies; it determines the precision, efficacy
and reliability of the enhanced system in comparison to existing ones. An important component of benchmarking
is the use of a standard dataset or a technique that is related to the problem at hand. Thus, for this study,
documented crop yield prediction, crop recommendation and fertilizer recommendation techniques are used for
benchmarking. This study includes a fair comparison with current state-of-the-art techniques in terms of
prediction accuracy. Table 5 displays the comparative results of the proposed and existing crop yield prediction,
crop and fertilizer recommendation techniques. Furthermore, so that the comparison is based on the same crop
dataset, all existing systems have been trained and tested on it. The crop yield prediction module's performance
is compared to three existing systems proposed by Venugopal et al. [35], Kumar et al. [36] and Agarwal et al. [34].
They used prediction models based on RF, DT, LSTM, RNN and SVM in addition to other techniques. During
the experiment, Venugopal et al., Kumar et al. and Agarwal et al. attained accuracy rates of 92.81%, 81% and 97%
respectively. The performance of the crop recommendation module is compared with five existing systems proposed
by Doshi et al. [9], Priyadharshini et al. [37], Durai et al. [38], Archana et al. [12] and Govindwar et al. [39]. They
implemented their recommendation systems using NN, RSCU-RF, Ensemble, SVM and RF techniques. The accuracy
achieved by Doshi et al., Priyadharshini et al., Durai et al., Archana et al. and Govindwar et al. is 91%, 89.88%,
95.45%, 92% and 90% respectively. Further, the fertilizer recommendation module is compared with four
state-of-the-art systems developed by Senapati et al. [40], Palaniraj et al. [41], Srinivasa et al. [42] and
Govindwar et al. [39], which attained 98.40%, 95%, 95.60% and 90% respectively. On the benchmark tasks, the
proposed system achieved better results than references [30,31,32] in all modules. In contrast to the baseline
research, the proposed AgroAdvisor system not only decreases complexity but also improves the accuracy rate
throughout the experiments using lightweight iteration. Additionally, the comparison with other works shows
that the proposed RFXGB-DeepFM technique benefits greatly in terms of accuracy from the feature vector derived
via the proposed feature selection method. The DeepFM predictor also gains significantly in most performance
measurements. Finally, by using the proposed work, farmers can employ agriculture resources more precisely and
effectively.
In this study, the crop yield prediction, crop recommendation and fertilizer recommendation datasets are each
separated into a training set and a test set according to different ratios in order to validate the performance
of each technique at various scales. Each technique is evaluated on accuracy, with the training set comprising,
successively, 80%, 70%, 60%, 50%, and 40% of the data. Table 6 displays the test results of the proposed and
existing techniques. The table demonstrates that every technique's performance degrades as the size of the
training set shrinks in every dataset. SVM is affected the most, because it is sensitive to the quality and
quantity of data. SVMs may require more data to generalize effectively, and a small training set can lead to
poorer results. Crop yield prediction and recommendation tasks can be complex, and the relationships between
input features and crop outcomes are not linear or separable. SVMs primarily work well when there is a clear
linear separation between classes. Because the relationships in the data are nonlinear, other techniques such as
XGB, RF, ANN, CNN, LSTM and RFXGB-DeepFM capture these complexities better. NB also performs worse than the
other techniques on all datasets because it relies on the strong assumption that features are conditionally
independent given the class label. In reality, features in agricultural data, such as soil characteristics,
weather variables, and crop history, are often interrelated, and violating the independence assumption can lead
to suboptimal performance. NB is a simple algorithm and does not capture complex relationships in the data,
especially when there are nonlinear dependencies between features and the target variable. The datasets also
contain continuous variables, whereas NB typically assumes categorical or discrete features; other techniques
handle continuous data more effectively. CNN also performs worse than ANN, LSTM, RFXGB-DeepFM, XGB and RF,
because CNNs typically require a large amount of data to learn complex features effectively. If the dataset is
relatively small, CNNs may struggle to generalize from limited examples, leading to overfitting on the training
data and worse performance on the test data. Further, the crop yield prediction and recommendation data do not
have the spatial structure that CNNs are designed to exploit. KNN makes predictions based on local patterns in
the training data, whereas the datasets used contain global and complex relationships. Therefore, the results of
KNN are not as good as those of other techniques. KNN is also sensitive to the curse of dimensionality.
RFXGB-DeepFM performs better than the other techniques in all training and testing batches because it captures
both linear and nonlinear patterns in the data, providing more accurate predictions. It handles feature selection
and feature importance effectively, ensuring that the most relevant features are considered, which reduces
overfitting and improves generalization of the model. Crop yield prediction and recommendation tasks often
involve diverse data types, including numerical, categorical, and possibly sequential data. Hybrid models like
RFXGB-DeepFM can handle these different data types effectively, leveraging the strengths of each component
model. However, hybrid models such as RFXGB-DeepFM are often more complex and computationally intensive than
individual models, and they require more data and computational resources for training and tuning. Therefore,
when the training share falls below 80%, the accuracy of the model decreases.
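The evaluation protocol described above, in which each technique is retrained at training shares of 80%, 70%,
60%, 50% and 40%, can be sketched as below. This is only an illustration under stated assumptions: the paper
does not say whether the splits are random or stratified, so a seeded random shuffle is assumed; the helper
names split_by_share and accuracy_at_shares, the toy data, and the majority-class stand-in model are all
hypothetical and are not the paper's techniques.

```python
import random


def split_by_share(rows, train_share, seed=0):
    """Shuffle a copy of rows and split it into (train, test) by train_share."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)  # seeded random split (assumption)
    cut = int(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


def accuracy_at_shares(rows, fit, predict, shares=(0.8, 0.7, 0.6, 0.5, 0.4)):
    """Train and score one technique at each training share.

    fit(train) returns a model; predict(model, x) returns a label.
    Returns a dict mapping training share to test accuracy.
    """
    results = {}
    for share in shares:
        train, test = split_by_share(rows, share)
        model = fit(train)
        correct = sum(predict(model, x) == y for x, y in test)
        results[share] = correct / len(test)
    return results


# Toy stand-in technique: always predicts the majority training label.
# It is NOT one of the paper's models; it only makes the sketch runnable.
def fit_majority(train):
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)


def predict_majority(model, x):
    return model


# Hypothetical toy rows of the form (features, label)
toy_rows = [((i,), "rice" if i % 3 else "maize") for i in range(100)]
results = accuracy_at_shares(toy_rows, fit_majority, predict_majority)
for share, acc in results.items():
    print(f"train share {share:.0%}: accuracy {acc:.2f}")
```

Any of the compared techniques could be plugged in through the fit and predict arguments, so that every
technique is scored on identical splits.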
Table 6: Comparison of Proposed AgroAdvisor system with Existing techniques using Different Batches of
Training and Testing
The statistical outcomes of the Friedman test are presented in this subsection. Statistical testing is used to
verify and confirm the newly proposed approach. These tests are designed to find any appreciable differences
between the proposed technique's performance and that of the other compared techniques. A significant difference
indicates how the new algorithm differs from the current algorithms and provides statistical evidence, alongside
the experimental evidence, supporting the proposed technique's efficacy. Therefore, the RFXGB-DeepFM approach in
this study is validated with the Friedman test. Two hypotheses, H0 and H1, are formulated. The null hypothesis
(H0) states that there is no significant difference between the RFXGB-DeepFM approach and the other techniques
that were examined. The alternative hypothesis (H1) states that RFXGB-DeepFM and the other techniques differ
significantly. The average ranking of each approach using the accuracy metric is displayed in Table 7. The
RFXGB-DeepFM method has the best average rank among all approaches, whereas SVM has the lowest. Table 8 presents
the Friedman test statistics. The degrees of freedom are 9, and the statistical findings are assessed at the 0.05
significance level. The critical value at (0.05; 9) is 16.92, and the computed Friedman statistic is 26.3455.
Since the statistic exceeds the critical value, H0 is rejected, and it is concluded that the results of
RFXGB-DeepFM and the other compared approaches differ significantly.
Table 7: Average ranking of each technique using the accuracy metric
SVM    XGB    RF     DT    KNN   NB     ANN   CNN    LSTM   Proposed RFXGB-DeepFM
1.33   6.33   7.67   5     3.3   1.67   7     3.67   9      10
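The Friedman statistic can be recomputed from the average ranks in Table 7 with the usual formula
chi2_F = 12N / (k(k+1)) * (sum of R_j^2 - k(k+1)^2 / 4), where k is the number of techniques, N the number of
datasets and R_j the average rank of technique j. The sketch below is a worked check under two assumptions that
are not stated explicitly in the text: N = 3 (one ranking per dataset), and the rounded table entries such as
1.33 and 3.3 are read as exact thirds. The function name friedman_chi_square is illustrative.

```python
from fractions import Fraction

# Average ranks from Table 7; rounded entries are read as exact thirds (assumption)
avg_ranks = {
    "SVM": Fraction(4, 3),
    "XGB": Fraction(19, 3),
    "RF": Fraction(23, 3),
    "DT": Fraction(5),
    "KNN": Fraction(10, 3),
    "NB": Fraction(5, 3),
    "ANN": Fraction(7),
    "CNN": Fraction(11, 3),
    "LSTM": Fraction(9),
    "RFXGB-DeepFM": Fraction(10),
}


def friedman_chi_square(ranks, n_datasets):
    """Friedman chi-square statistic from average ranks over n_datasets."""
    k = len(ranks)
    sum_sq = sum(r * r for r in ranks)
    scale = Fraction(12 * n_datasets, k * (k + 1))
    return scale * (sum_sq - Fraction(k * (k + 1) ** 2, 4))


N_DATASETS = 3          # assumption: one ranking per dataset
CRITICAL_VALUE = 16.92  # critical value at (0.05; 9) as reported in the text

statistic = float(friedman_chi_square(list(avg_ranks.values()), N_DATASETS))
reject_h0 = statistic > CRITICAL_VALUE
print(round(statistic, 4), reject_h0)  # prints: 26.3455 True
```

Under these assumptions the result matches the reported value of 26.3455, which exceeds the critical value and
is consistent with rejecting H0.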
A post hoc test is also run to confirm the effectiveness of the RFXGB-DeepFM technique. Table 9 presents the
post hoc test findings and highlights the significant differences between the outcomes of the various
techniques. Compared with SVM, XGB, RF, DT, KNN, NB, ANN, CNN and LSTM, RFXGB-DeepFM is shown to be
significantly different. Thus, it can be said that the post hoc test verifies the effectiveness of the
RFXGB-DeepFM technique. The post hoc test divides all approaches into seven groups: Group 1 consists of SVM and
NB, Group 2 of CNN and KNN, Group 3 of DT, Group 4 of XGB, Group 5 of ANN and RF, Group 6 of LSTM, and Group 7
of RFXGB-DeepFM. Techniques in different groups differ significantly from one another. SVM and NB, CNN and KNN,
and ANN and RF, on the other hand, show no significant differences within each pair. Although the techniques
within a group perform similarly in statistical terms, they produce distinct outcomes in the experiments.
Table 9: Post hoc test results
Techniques SVM XGB RF DT KNN NB ANN CNN LSTM Proposed RFXGB-DeepFM
SVM S S S S S S S S
XGB S S S S S S S S S
RF S S S S S S S S S
DT S S S S S S S S S
KNN S S S S S S S S S
NB S S S S S S S S
ANN S S S S S S S S
CNN S S S S S S S S
LSTM S S S S S S S S S
Proposed RFXGB-DeepFM S S S S S S S S S
4.7. Discussion
It is concluded that using a combination of RFXGB and DeepFM in a stacked model offers several advantages, as it
leverages the strengths of each technique. RF is known for handling non-linear relationships and noisy data, XGB
excels in gradient boosting and often provides strong predictive power, and DeepFM is effective at capturing
complex feature interactions in large-scale datasets. Combining them can improve overall predictive performance
and also enhances the robustness of the proposed model. If one of the individual models is overfitting or
underperforming on a specific subset of the data, the ensemble can help mitigate these issues and provide more
stable predictions. Each model may extract different feature representations or insights from the data: Random
Forest focuses on decision tree splits, XGB on gradient boosting, and DeepFM on deep learning-based embeddings.
RFXGB and DeepFM can combine these diverse representations, potentially capturing a broader range of patterns in
the data. Combining models with different regularization techniques and characteristics can help reduce
overfitting. If one model overfits to the training data, the ensemble's aggregation of predictions can smooth out
these tendencies and lead to better generalization. RF and XGB are known for their interpretability, as they
provide numerical feature importance scores and insights into feature contributions. Using these models alongside
DeepFM, which is more complex and less interpretable, maintains some level of model interpretability while
benefiting from DeepFM's feature interaction capabilities. Ensembles provide flexibility in model selection and
combination: one can experiment with different weights or strategies for combining predictions from Random
Forest, XGB, and DeepFM to find the best-performing ensemble for a specific problem. Ensembles are also more
robust when faced with changes in the data distribution or when new data becomes available, as they can adapt to
shifts in the data better than a single model. However, it is important to note that combining these models also
comes with challenges, such as increased complexity in model training, tuning, and deployment. Careful
experimentation and fine-tuning of the approach are necessary to harness the advantages effectively.
Additionally, the choice of ensemble technique (e.g., bagging) and the way the models are combined (e.g.,
weighted averaging) can significantly impact the final performance and should be chosen based on empirical
results and domain knowledge.
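As an illustration of the weighted-averaging option mentioned above, the following sketch combines per-model
class probabilities for one sample. It is not the pipeline of the proposed system, which is described elsewhere
in the paper as feeding RFXGB-selected features into DeepFM; the function name weighted_average_ensemble, the
class names, the probabilities and the weights are all hypothetical.

```python
def weighted_average_ensemble(prob_lists, weights):
    """Combine per-model class-probability vectors by weighted averaging.

    prob_lists holds one probability vector per model (same class order);
    weights holds one non-negative weight per model.
    """
    if len(prob_lists) != len(weights):
        raise ValueError("one weight per model is required")
    total = sum(weights)
    n_classes = len(prob_lists[0])
    return [
        sum(w * probs[c] for probs, w in zip(prob_lists, weights)) / total
        for c in range(n_classes)
    ]


# Hypothetical class probabilities for one sample over three crops
classes = ["rice", "maize", "cotton"]
rf_probs = [0.6, 0.3, 0.1]
xgb_probs = [0.5, 0.4, 0.1]
deepfm_probs = [0.2, 0.7, 0.1]

# Hypothetical weights: DeepFM counts twice as much as each tree model
combined = weighted_average_ensemble(
    [rf_probs, xgb_probs, deepfm_probs], weights=[1, 1, 2]
)
predicted = classes[combined.index(max(combined))]
print(combined, predicted)
```

In practice the weights would be chosen on a validation set, in line with the recommendation to base the
combination strategy on empirical results.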
Analyzing the validity of the AgroAdvisor approach described in this paper, which uses the hybrid technique of
Random Forest with Extreme Gradient Boosting (RFXGB) and Deep Factorization Machine (DeepFM), involves
considering internal validity, external validity, and other potential threats to validity:
Internal validity is threatened by selection bias in the agriculture dataset if the dataset used for training and
evaluation is not representative of the actual agricultural context. However, the data source and collection
process are unbiased in this experiment. Proper techniques such as cross-validation, regularization, and careful
hyperparameter tuning are applied to mitigate overfitting. Further, feature engineering, data cleaning, and
encoding are performed consistently and correctly.
External validity concerns the ability of the model to generalize beyond the agriculture dataset used in the study.
This agriculture dataset is divided into three datasets: crop yield prediction, crop recommendation and fertilizer
recommendation. To enhance external validity, this paper discusses the potential applicability of the AgroAdvisor
system to other agricultural contexts and datasets. The dataset used for evaluation is representative of the
broader agricultural domain, but it has both strengths and weaknesses with respect to its diversity and
characteristics. The dataset does not cover a diverse range of geographic regions and climates. Agriculture
practices, crop varieties, and soil types can vary significantly by location, and including data from multiple
regions can help ensure that the models and recommendations are applicable to a broader range of environments.
The dataset does not encompass the various crop types commonly grown in different regions. Different crops have
distinct growth patterns, nutrient requirements, and susceptibility to diseases and pests, so including a variety
of crops can make the predictions and recommendations more generalizable. Agricultural practices can change over
time due to factors like technological advancements, climate change, and evolving farming techniques.
Longitudinal data, collected over multiple years or growing seasons, can help capture temporal variations and
improve the external validity of predictions and recommendations. Soil composition and quality play a crucial
role in crop performance, and external validity is enhanced when the dataset includes information about different
soil types and their characteristics. It is important to consider factors such as irrigation, fertilization
schedules, pest control methods, and crop rotation in the dataset to reflect the diversity of practices used by
farmers. Climate data, including temperature, precipitation, and weather patterns, should be incorporated into
the dataset. Farms come in various sizes, from small family farms to large commercial operations. The dataset
does not represent different farm sizes, which would be needed to ensure that recommendations are relevant to a
wide range of farming contexts.
The paper compares the proposed RFXGB-DeepFM technique with "classical machine learning and deep learning
techniques" such as SVM, XGB, RF, DT, KNN, NB, ANN, CNN and LSTM. The level of performance measured by F-value,
accuracy, recall and precision justifies the use of RFXGB-DeepFM in the proposed system for crop yield
prediction, crop recommendation and fertilizer recommendation. Data pre-processing steps are applied to the data
to reduce the potential threats raised by errors, biases, or missing values. Hyperparameter tuning is carried out
to ensure that RFXGB and DeepFM perform optimally and to reduce overfitting or underperformance.
1. Data Quality Control: The data are rigorously checked and cleaned to ensure their quality and accuracy. Data
pre-processing steps are also applied to the input data.
2. External Validation: The AgroAdvisor system is tested on three datasets (crop yield prediction, crop
recommendation and fertilizer recommendation), and on different datasets if available, to assess its external
validity and generalizability.
3. Sensitivity Analysis: To assess the robustness of the results, sensitivity analyses are carried out by varying
the training and testing batch sizes. Further, the Friedman test and a post hoc test are performed on the
accuracy parameter to find any appreciable differences between the proposed technique's performance and that of
the other compared techniques.
4. Comparative Analysis: The proposed system is also compared with baseline models in Section 4.4.
4.8.5. Limitations
Some limitations of the proposed system are described below:
• The system is limited to a few crops only.
• The performance of AgroAdvisor varies with the size of the dataset.
• AgroAdvisor is computationally expensive.
5. Conclusions
An integrated framework named AgroAdvisor is devised and put into practice in this study. It is simple to use
for farmers across India. With the use of this technology, farmers would be better able to choose the right crop
to plant based on a range of geographical and environmental parameters. The framework has three modules: crop
recommendation, crop yield prediction and fertilizer recommendation. A new RFXGB-DeepFM technique is proposed to
predict the yield and to recommend the crop and fertilizer to the user. The proposed technique employs RFXGB for
feature engineering, which can gather feature combination information more efficiently and enhance the model's
interpretability. These selected features are then fed into DeepFM, which has an FM layer and uses a fully
connected feedforward neural network in the hidden layers. The proposed system is compared with existing systems
using accuracy, precision, recall and F-value. The proposed crop yield prediction, crop recommendation and
fertilizer recommendation modules achieved better accuracy rates on the benchmark datasets, i.e. 98.34%, 97.34%
and 99.78% respectively. The proposed technique also underwent Friedman and post hoc statistical testing.
Further, pre-processing methods are applied to the datasets, converting some columns to numerical categories.
Our model will help farmers to make the right choice of crop, thus increasing crop productivity and decreasing
suicide rates. Future work includes building a GUI for the project and hosting it on the cloud. Further modules
can also be added, such as a plant disease prediction system that suggests cures for a particular disease and
the precautions to be taken to prevent the disease in future without harming the productivity of the crops.
Further, if a suitable dataset is available, the entire model can run in a hierarchical manner. The model can
also be integrated with a weather API to predict the weather of the coming weeks and to recommend the best time
to sow seeds and apply fertilizers.
Acknowledgements
We thank all lab members for this work.
Competing interests
The Authors declare no competing interests.
Ethical approval
Each author has reviewed and approved the publication of this work into scientific reports.
Consent to participate
Not applicable.
Consent to publish
Not applicable.
Funding
The authors received no funding from their institutes.
Authors’ Contribution
A.K. was responsible for the study's idea and design; R.M. collected the data; A.S. analysed and interpreted the
findings; S.M. and M.A.S. edited and reviewed the manuscript. The final copy of the paper was approved by all
authors after they had evaluated the findings.
Data Availability
The datasets generated and/or analysed during the current study are available in the Kaggle repository:
https://www.kaggle.com/datasets/atharvaingle/crop-recommendation-dataset,
https://www.kaggle.com/datasets/gdabhishek/fertilizer-prediction,
https://www.kaggle.com/datasets/patelris/crop-yield-prediction-dataset .
Experiment Guideline
All experiments involving plants were conducted under the guidance and supervision to ensure compliance with
ethical and safety standards.
References
1. Jayashree, D., Pandithurai, O., Paul Jasmin Rani, L., Menon, P. K., Beria, M. V., & Nithyalakshmi, S.
(2022). Fertilizer recommendation system using machine learning. In Disruptive Technologies for Big Data
and Cloud Applications: Proceedings of ICBDCC 2021 (pp. 709-716). Singapore: Springer Nature
Singapore.
2. “Agriculture in India: Industry Overview, Market Size, Role in Development...|IBEF”, available at
https://www.ibef.org/industry/agriculture-india.aspx, visited in February 2018
3. Kumar, Y.J.N., Spandana, V., Vaishnavi, V.S., Neha, K. and Devi, V.G.R.R., 2020, June. Supervised
machine learning approach for crop yield prediction in agriculture sector. In 2020 5th International
Conference on Communication and Electronics Systems (ICCES) (pp. 736-741). IEEE.
4. Reddy, D.J. and Kumar, M.R., 2021, May. Crop yield prediction using machine learning algorithm. In 2021
5th International Conference on Intelligent Computing and Control Systems (ICICCS) (pp. 1466-1470).
IEEE.
5. Nigam, A., Garg, S., Agrawal, A. and Agrawal, P., 2019, November. Crop yield prediction using machine
learning algorithms. In 2019 Fifth International Conference on Image Information Processing (ICIIP) (pp.
125-130). IEEE.
6. Nischitha, K., Vishwakarma, D., Ashwini, M.N. and Manjuraju, M.R., 2020. Crop prediction using machine
learning approaches. International Journal of Engineering Research & Technology (IJERT), 9(08), pp.23-
26.
7. Suruliandi, A., Mariammal, G. and Raja, S.P., 2021. Crop prediction based on soil and environmental
characteristics using feature selection techniques. Mathematical and Computer Modelling of Dynamical
Systems, 27(1), pp.117-140.
8. Pudumalar, S., Ramanujam, E., Rajashree, R.H., Kavya, C., Kiruthika, T. and Nisha, J., 2017, January. Crop
recommendation system for precision agriculture. In 2016 Eighth International Conference on Advanced
Computing (ICoAC) (pp. 32-36). IEEE.
9. Doshi, Z., Nadkarni, S., Agrawal, R. and Shah, N., 2018, August. AgroConsultant: intelligent crop
recommendation system using machine learning algorithms. In 2018 Fourth International Conference on
Computing Communication Control and Automation (ICCUBEA) (pp. 1-6). IEEE.
10. Pande, S.M., Ramesh, P.K., Anmol, A., Aishwarya, B.R., Rohilla, K. and Shaurya, K., 2021,
April. Crop recommender system using machine learning approach. In 2021 5th international conference
on computing methodologies and communication (ICCMC) (pp. 1066-1071). IEEE.
11. Manoj Kumar, D.P., Malyadri, N. and Srikanth, M.S., 2021. A Machine Learning model for Crop and
Fertilizer recommendation. NVEO-NATURAL VOLATILES & ESSENTIAL OILS Journal| NVEO,
pp.10531-10539.
12. Archana, K. and Saranya, K.G., 2020. Crop yield prediction, forecasting and fertilizer recommendation
using voting based ensemble classifier. SSRG Int. J. Comput. Sci. Eng, 7, pp.1-4.
13. Verma, R., 2022, November. Crop Analysis and Prediction. In 2022 5th International Conference on
Multimedia, Signal Processing and Communication Technologies (IMPACT) (pp. 1-5). IEEE.
14. Morales, A. and Villalobos, F.J., 2023. Using machine learning for crop yield prediction in the past or the
future. Frontiers in Plant Science, 14, p.1128388.
15. Iniyan, S., Akhil Varma, V., and Teja Naidu, C. (2023). Crop yield prediction using machine learning
techniques. Adv. Eng. Softw. 175, 103326. doi:10.1016/j.advengsoft.2022.103326
16. Panigrahi, B., Kathala, K. C. R., and Sujatha, M. (2023). A machine learning-based comparative approach
to predict the crop yield using supervised learning with regression models. Procedia Comput. Sci. 218,
2684–2693. doi:10.1016/j.procs.2023.01.241
17. P. Tiwari, and P. Shukla, “Crop yield prediction by modified convolutional neural network and geographical
indexes,” International Journal of Computer Sciences and Engineering, vol. 6, no. 8, pp. 503-513, 2018.
18. P. M. Gopal, and R. Bhargavi, “A novel approach for efficient crop yield prediction,” Computers and
Electronics in Agriculture, vol. 165, pp. 104968, 2019.
19. S. Khaki, and L. Wang, “Crop yield prediction using deep neural networks.” Frontiers in plant science, vol.
10, pp. 621, 2019.
20. Mariammal, G., Suruliandi, A., Raja, S.P. and Poongothai, E., 2021. Prediction of land suitability for crop
cultivation based on soil and environmental characteristics using modified recursive feature elimination
technique with various classifiers. IEEE Transactions on Computational Social Systems, 8(5), pp.1132-
1142.
21. Sachdeva, Ravi Kumar, Priyanka Bathla, Pooja Rani, Vinay Kukreja, and Rakesh Ahuja. "A systematic
method for breast cancer classification using RFE feature selection." In 2022 2nd International Conference
on Advance Computing and Innovative Technologies in Engineering (ICACITE), pp. 1673-1676. IEEE,
2022.
22. Mittal, Ruchi, Varun Malik, Vikram Singh, Jaiteg Singh, and Amandeep Kaur. "Integrating genetic
algorithm with random forest for improving the classification performance of web log data." In 2020 Sixth
International Conference on Parallel, Distributed and Grid Computing (PDGC), pp. 177-181. IEEE, 2020.
23. Datta, Parul, Prasenjit Das, and Abhishek Kumar. "Hyper parameter tuning based gradient boosting
algorithm for detection of diabetic retinopathy: an analytical review." Bulletin of Electrical Engineering
and Informatics 11, no. 2 (2022): 814-824.
24. Kumar, Deepak, Yash Kumar, Akhilesh Gulati, and Vinay Kukreja. "Wheat Crop Yield Prediction Using
Machine Learning." In 2022 International Conference on Data Analytics for Business and Industry
(ICDABI), pp. 433-437. IEEE, 2022.
25. https://www.kaggle.com/datasets/atharvaingle/crop-recommendation-dataset [access date: 12 October,
2023].
26. https://www.kaggle.com/datasets/gdabhishek/fertilizer-prediction [access date: 12 October, 2023].
27. https://www.kaggle.com/datasets/patelris/crop-yield-prediction-dataset [access date: 12 October, 2023].
28. Faouzi, J. and Janati, H., 2020. pyts: A python package for time series classification. The Journal of
Machine Learning Research, 21(1), pp.1720-1725.
29. Chavent, M., Genuer, R. and Saracco, J., 2021. Combining clustering of variables and feature selection
using random forests. Communications in Statistics-Simulation and Computation, 50(2), pp.426-445.
30. Guo, H., Tang, R., Ye, Y., Li, Z., He, X. and Dong, Z., 2018. Deepfm: An end-to-end wide & deep learning
framework for CTR prediction. arXiv preprint arXiv:1804.04950.
31. Kukkar, A., Kumar, Y., Sharma, A. and Sandhu, J.K., 2023. Bug Severity Classification in Software Using
Ant Colony Optimization Based Feature Weighting Technique. Expert Systems with Applications,
p.120573.
32. Kukkar, A., Lilhore, U.K., Frnda, J., Sandhu, J.K., Das, R.P., Goyal, N., Kumar, A., Muduli, K. and Rezac,
F., 2023. ProRE: An ACO-based programmer recommendation model to precisely manage software
bugs. Journal of King Saud University-Computer and Information Sciences, 35(1), pp.483-498.
33. Bondre, Devdatta A., and Santosh Mahagaonkar. "Prediction of crop yield and fertilizer recommendation
using machine learning algorithms." International Journal of Engineering Applied Sciences and
Technology 4, no. 5 (2019): 371-376.
34. Agarwal, S. and Tarar, S., 2021. A hybrid approach for crop yield prediction using machine learning and
deep learning algorithms. In Journal of Physics: Conference Series (Vol. 1714, No. 1, p. 012012). IOP
Publishing.
35. Venugopal, A., Aparna, S., Mani, J., Mathew, R. and Williams, V., 2021. Crop yield prediction using
machine learning algorithms. International journal of engineering research & technology (IJERT)
NCREIS, 9(13).
36. Kumar, Y.J.N., Spandana, V., Vaishnavi, V.S., Neha, K. and Devi, V.G.R.R., 2020, June. Supervised
machine learning approach for crop yield prediction in agriculture sector. In 2020 5th International
Conference on Communication and Electronics Systems (ICCES) (pp. 736-741). IEEE.
37. Priyadharshini, A., Chakraborty, S., Kumar, A., & Pooniwala, O. R. (2021, April). Intelligent crop
recommendation system using machine learning. In 2021 5th international conference on computing
methodologies and communication (ICCMC) (pp. 843-848). IEEE.
38. Durai, S.K.S. and Shamili, M.D., 2022. Smart farming using machine learning and deep learning
techniques. Decision Analytics Journal, 3, p.100041.
39. Govindwar, R., Jawale, S., Kalpande, T., Zade, S., Futane, P. and Williams, I., 2023. Crop and Fertilizer
Recommendation System Using Machine Learning. In AI, IoT, Big Data and Cloud Computing for Industry
4.0 (pp. 139-149). Cham: Springer International Publishing.
40. Senapati, B.R., Sanskar, Trishna, A. and Swain, R.R., 2022. Recommendations of crop yield and fertilizers
using machine learning algorithm. Journal of Information and Optimization Sciences, 43(5), pp.1029-1037.
41. Palaniraj, A., Balamurugan, A.S., Prasad, R.D. and Pradeep, P., 2021. Crop and fertilizer recommendation
system using machine learning. IRJET, 8.
42. Srinivasa, K., Prasad, N., Gopal, S.V. and Prasad, K.V., 2022. Suitable Fertilizer Recommendation System
Using Linear Forest Classifier. International Journal of YMER, 21(11), pp.2790-2805.