Crop Prediction Using Deep Learning

Ignatious K Pious
dept. of Computer Science
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Chennai, India

Supra Jyotsna. Neravati
dept. of Computer Science
Vel Tech Rangarajan Dr. Sagunthala R&D Institute of Science and Technology
Chennai, India

ABSTRACT:

The aim of this paper is to predict within-field variation of yield, based on on-line multi-layer soil data. Yield prediction in precision farming is considered of high importance for the improvement of crop management. The knowledge-based systems approach to agricultural management is gaining wide acceptance because of the expanding knowledge of processes involved in plant growth. The classical approach uses dynamic mechanistic simulation models for crop growth. Current knowledge of plant growth and development is integrated in a quantitative and process-oriented manner. After proper calibration and validation, models could predict crop response to various environments and simulate alternative crop management options. A variety of approaches, models and algorithms have been presented and used to enable yield prediction in agriculture. Simple linear correlations of yield with soil properties have been proposed based on a limited number of soil samples. The factors affecting plant growth are assumed to be integrated in leaf area index (LAI). These factors are, for example, initial soil conditions, management factors, genetic factors and temporal variability of climatic parameters. Soil variability within a farm exists in most soils and regions. This variability interacts with weather, inputs (which practically cannot be applied homogeneously) and the variability of genetic material to produce crop and yield (quantitative and qualitative) variability. These different spatial or temporal variabilities have to be properly managed by the farmers to achieve the best profit with the lowest inputs, thus reducing the adverse environmental effects.
Keywords: Convolutional Neural Network (CNN), Random Forest, Datasets, CHAID, K-Nearest Neighbour, Naive Bayes.
INTRODUCTION:
This paper proposes a crop recommendation system that uses a Convolutional Neural Network (CNN) and a Random Forest Model to predict the optimal crop to be grown by analyzing various parameters including the region, soil type, yield, selling price, etc. The CNN architecture gave an accuracy of 95.21%, and the Random Forest Algorithm had an accuracy of 75% [9]. Soil variability within a farm exists in most soils and regions. This variability interacts with weather, inputs (which practically cannot be applied homogeneously) and the variability of genetic material to produce crop and yield
(quantitative and qualitative) variability [7]. The knowledge-based systems approach to agricultural management is gaining wide acceptance because of the expanding knowledge of processes involved in plant growth. The classical approach uses dynamic mechanistic simulation models for crop growth. Current knowledge of plant growth and development is integrated in a quantitative and process-oriented manner [4]. Crop prediction depends on the soil, geographic and climatic attributes. Selecting appropriate attributes for the right crop/s is an intrinsic part of the prediction undertaken by feature selection techniques. In this work, a comparative study of various wrapper feature selection methods is carried out for crop prediction using classification techniques that suggest the suitable crop/s for land [17]. Deep Learning is productive when massive data is available for training, and these models have solved many complicated, dynamic real-time problems with higher accuracy with time. A Convolutional Neural Network (CNN) is a unique class of neural network that processes data that has a known grid topology. CNNs have many applications, and they operate on a mathematical operator called convolution [12]. This work is to create a suitable model for classifying various kinds of soil series data along with suitable crop suggestions. Soil series are recognized by machine learning methods using various chemical features, and possible crops for that soil series are suggested using geographical attributes [2].
Machine learning, which is a branch of Artificial Intelligence (AI) focusing on learning, is a practical approach that can provide better yield prediction based on several features. Machine learning (ML) can determine patterns and correlations and discover knowledge from datasets. The models need to be trained using datasets, where the outcomes are represented based on past experience [19]. Crop yield can be predicted from environmental data and management practices using the proposed CNN model along with other popular methods such as random forest (RF) [7]. Soil is an important parameter affecting crop yield prediction. Analysis of soil nutrients can help farmers and soil analysts get a higher crop yield by making prior arrangements [11]. The main problem behind the lack of high crop yield lies in the improper identification and selection of micronutrients and macronutrients of the soil. These macro and micro nutrients play a major role in the growth of crops and influence their productivity to a considerable extent. Absence of any one of the micronutrients in the soil can limit and disrupt the growth of plants even when all other nutrients are present in adequate quantities. The proposed study aims at predicting the actual availability of the micro
and macro nutrients with the help of traditional values from the Agricultural department [18]. Farmers lack basic knowledge of the nutrient content of soil and of the selection of the crop best suited for the soil, and they also lack an efficient method for predicting the crop well in advance, so that appropriate methods can be used to improve crop productivity and to make arrangements for storage and marketing well before harvest. This work presents an approach which uses different Machine Learning techniques in order to predict the category of the yield based on the macro-nutrient and micro-nutrient status in the dataset [16]. It helps farmers analyze the fertility of their yard and plant the better crop to increase their productivity and profit [20]. Changes in atmospheric conditions cause changes in soil condition and in temperature. Both air temperature and soil temperature have distinct roles in crops [6].
RELATED WORK:
[9] Automating agricultural processes, with or without human intervention, has become crucial due to limited domestic land space. In Sri Lanka, despite having manual knowledge and techniques in agriculture, there is a lack of systems that detect environmental factors and provide crop recommendations. This paper presents a theoretical and conceptual framework for a Recommendation System utilizing integrated models. The system incorporates Arduino microcontrollers for collecting environmental factors, machine learning techniques such as Naive Bayes and Support Vector Machine, unsupervised machine learning with K-Means Clustering, and Natural Language Processing (Sentiment Analysis) within Artificial Intelligence. The goal is to recommend suitable crops for a selected land based on site-specific parameters with high accuracy and efficiency. The challenge lies in identifying the optimal crops for available land, considering fluctuating environmental factors like temperature, water levels, and soil conditions. This crop recommendation system addresses these challenges by predicting the most suitable crop for a selected area through the analysis of environmental factors using trained sub-models.
[1] India, renowned for its agricultural prominence, stands among the top global producers for various crops. Despite the centrality of the Indian farmer to the agricultural sector, many of them occupy a lower socio-economic status. Additionally, determining the most suitable and profitable crop for a particular soil remains a challenge, given the diverse soil types across geographical regions. This study introduces a crop recommendation system that utilizes a Convolutional Neural Network (CNN) and a Random Forest Model to predict the ideal crop based on factors such as region, soil type, yield, selling price, and more. The CNN achieved an accuracy of 95.21%, and the Random Forest Algorithm demonstrated an accuracy of 75%.
[13] Originally conceived to tackle soil and crop parameter variations on a large scale in developed nations, Precision Agriculture (PA) has the potential for adaptation in farm-based agriculture, especially for small and marginal farmers in developing countries. This adaptation involves utilizing a database
encompassing farmer-soil-crop information gathered from the field. Agricultural experts contribute crop calendars, and real-time parameters like temperature and rainfall are obtained through sensors. An analytical model, incorporating static, semi-static, and dynamic inputs, simulates the crop calendar. The outcome is the delivery of farmer- and crop-specific support advisories through devices like mobile phones and tablets.
[14] Data mining involves the examination and extraction of meaningful information from data, and it finds applications across diverse fields such as finance, retail, medicine, and agriculture. In the agricultural sector, data mining is utilized to analyze both biotic and abiotic factors. Given the pivotal role of agriculture in the Indian economy and employment landscape, a prevalent issue faced by Indian farmers is the challenge of selecting the most suitable crop based on their soil requirements, leading to productivity setbacks. Precision agriculture addresses this challenge by leveraging research data on soil characteristics, types, and crop yield to recommend the optimal crop for specific site parameters, reducing the risk of incorrect crop choices and enhancing productivity. This paper proposes a solution to this problem by introducing a recommendation system employing an ensemble model with a majority voting technique. The ensemble model incorporates Random Tree, CHAID, K-Nearest Neighbor, and Naive Bayes as learners to provide accurate and efficient crop recommendations based on site-specific parameters.
[8] Precision agriculture involves integrating the latest technology into farming practices. The agricultural sector generates vast amounts of data, and various data mining techniques are applied to optimize its utilization. This paper explores different classification algorithms within the field of data mining, specifically focusing on their application to a dataset accumulated over several years for predicting soybean crop yield. Subsequently, a comparative analysis is conducted to determine the most effective classification algorithm for accurate yield prediction.
METHODS AND MATERIALS:
Data set used: This data set contains the values of nitrogen, phosphorous, potassium, temperature, humidity, pH, Rainfall, label. It contains 220001 entities and 22 crops.
Proposed System: The system incorporates a comprehensive ensemble model that utilizes machine learning algorithms, including Random Tree, CHAID, K-Nearest Neighbour, and Naive Bayes. By employing a majority voting technique, the system ensures robust and reliable crop recommendations, taking into account a wide range of factors such as soil characteristics, soil types, and historical crop yield data.
3.1. Materials:
The data set contains the values of nitrogen, phosphorous, potassium, temperature, humidity, pH, Rainfall, label. It contains 220001 entities and 22 crops.
Nitrogen, Phosphorus, Potassium: These are likely nutrient levels in the soil, crucial for
plant growth. Different crops have varying nutrient requirements, making these features essential for optimizing cultivation practices.
Temperature, Humidity: Environmental factors influencing plant growth and development. Monitoring temperature and humidity helps understand the climatic conditions conducive to specific crops.
pH: The pH level of the soil, which is vital as it influences nutrient availability. Different crops thrive in soils with specific pH ranges.
Environmental Factor:
Rainfall: The amount of precipitation, a critical factor for crop growth. Rainfall data helps assess water availability and plan irrigation strategies.
Label:
12 Different Crops: This indicates the classification or category of each entry in the dataset, representing the type of crop being cultivated. Each crop likely belongs to a distinct class, allowing for supervised machine learning or statistical analysis based on the specific crop.
Dataset Size:
220,001 Entities: This denotes the number of observations or entries in the dataset. Each entry likely represents a set of values for the mentioned features corresponding to a specific instance of crop cultivation.
In summary, this dataset seems to encompass a comprehensive set of features related to soil nutrients, environmental conditions, and pH, along with information on rainfall, for the cultivation of 12 different crops. The large dataset size of 220,001 entities allows for robust analysis and modeling to understand the relationships between these factors and the successful growth of various crops. The dataset could be valuable for tasks such as crop yield prediction, optimization of cultivation practices, or the development of decision support systems for farmers.
3.2. METHOD:
Convolutional Neural Network (CNN):
CNN is a deep learning algorithm primarily used for image classification but adapted in this context for its ability to extract hierarchical features from input data. In the system, CNN is employed to analyze and learn patterns from the environmental parameters provided as input. It consists of layers such as convolutional layers, pooling layers, and dense layers. Convolutional layers extract features, pooling layers reduce dimensionality, and dense layers make predictions.
Random Forest: Random Forest is an ensemble learning method that operates by constructing multiple decision trees during training and outputs the mode of the classes as the prediction. In the crop recommendation system, Random Forest is utilized to analyze the input parameters and make predictions based on the collective decision making of multiple trees. It is known for its versatility, efficiency, and resistance to overfitting.
Ensemble Model with Majority Voting: The system implements an ensemble model that combines the predictions from multiple learners, including Random Forest, CHAID, K-Nearest Neighbour (KNN), and Naive Bayes. Majority voting is employed to determine the final recommendation. Each learner contributes its prediction, and the crop
with the most votes is selected as the recommended crop. This approach leverages the strengths of different algorithms, potentially improving accuracy.
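The majority-voting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the learner names, the crop labels, and the tie-breaking rule (the earliest vote wins) are assumptions, since the text does not say how ties are resolved.

```python
from collections import Counter


def majority_vote(predictions):
    """Return the label predicted by the most learners.

    Ties go to the label that appears first in `predictions`
    (an assumed rule; the paper does not define tie-breaking).
    """
    if not predictions:
        raise ValueError("at least one prediction is required")
    counts = Counter(predictions)
    best = max(counts.values())
    for label in predictions:
        if counts[label] == best:
            return label


# Hypothetical per-learner outputs for a single soil sample.
votes = {
    "random_forest": "rice",
    "chaid": "maize",
    "knn": "rice",
    "naive_bayes": "rice",
}
recommended = majority_vote(list(votes.values()))
print(recommended)  # rice
```

With an even number of learners, as here, ties are possible, so whichever tie-breaking rule is chosen should be stated explicitly.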
Data Preprocessing: Before inputting data into the algorithms, a data preprocessing step is implemented. This involves handling missing values, removing duplicates, scaling features using StandardScaler, and encoding categorical labels using LabelEncoder. The preprocessing ensures that the input data is in a suitable format for the machine learning algorithms.
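The preprocessing steps named above can be illustrated with a dependency-free sketch. The text names scikit-learn's StandardScaler and LabelEncoder; the helper functions below are hypothetical stand-ins that mirror what those classes compute (zero mean and unit population variance per feature, and integer codes assigned to the sorted class names). Dropping rows with missing values is an assumption, since the text does not say how missing values are handled, and the sample rows are invented for illustration.

```python
import math


def drop_missing_and_duplicates(rows):
    """Skip rows with a missing (None) value and exact repeats of earlier rows."""
    seen, cleaned = set(), []
    for row in rows:
        if None in row or row in seen:
            continue
        seen.add(row)
        cleaned.append(row)
    return cleaned


def standard_scale(features):
    """Scale each column to zero mean and unit (population) variance."""
    n, n_cols = len(features), len(features[0])
    means = [sum(row[j] for row in features) / n for j in range(n_cols)]
    stds = []
    for j in range(n_cols):
        variance = sum((row[j] - means[j]) ** 2 for row in features) / n
        stds.append(math.sqrt(variance) or 1.0)  # constant column: divide by 1
    return [[(row[j] - means[j]) / stds[j] for j in range(n_cols)]
            for row in features]


def encode_labels(labels):
    """Map each class name to its index in the sorted list of classes."""
    classes = sorted(set(labels))
    index = {name: i for i, name in enumerate(classes)}
    return [index[name] for name in labels], classes


# Invented rows: N, P, K, temperature, humidity, pH, rainfall, label.
rows = [
    (90, 42, 43, 20.9, 82.0, 6.5, 202.9, "rice"),
    (90, 42, 43, 20.9, 82.0, 6.5, 202.9, "rice"),      # exact duplicate
    (71, 54, 16, 22.6, 63.7, 5.7, 87.8, "maize"),
    (40, None, 20, 25.0, 80.0, 7.0, 100.0, "maize"),   # missing phosphorus
    (36, 67, 79, 18.9, 16.9, 7.3, 80.1, "chickpea"),
]
cleaned = drop_missing_and_duplicates(rows)
X_scaled = standard_scale([list(row[:-1]) for row in cleaned])
y_encoded, classes = encode_labels([row[-1] for row in cleaned])
```

In a real pipeline the scaling statistics would be computed on the training split only and then reused for the test split, so that no information leaks from the test data.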
Training and Testing: The dataset is split into training and testing sets using the train_test_split function. The training set is used to train the machine learning models, and the testing set is used to evaluate their performance. The models are compiled with appropriate loss functions and optimizers, and training adjusts the model parameters to minimize the loss.
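The split-then-evaluate workflow can be sketched as follows. This is an illustrative stand-in, not the paper's code: `split_train_test` is a hypothetical helper that imitates a seeded random split, a 1-nearest-neighbour classifier is swapped in for the CNN and Random Forest models so the example stays self-contained, and the two-feature toy data is invented. Loss functions and optimizers are not covered here.

```python
import random


def split_train_test(X, y, test_size=0.2, seed=42):
    """Shuffle row indices with a fixed seed and hold out a fraction for testing."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_size))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    X_train = [X[i] for i in train_idx]
    X_test = [X[i] for i in test_idx]
    y_train = [y[i] for i in train_idx]
    y_test = [y[i] for i in test_idx]
    return X_train, X_test, y_train, y_test


def predict_1nn(X_train, y_train, x):
    """Return the label of the training row closest to x (squared Euclidean)."""
    distances = [sum((a - b) ** 2 for a, b in zip(row, x)) for row in X_train]
    return y_train[distances.index(min(distances))]


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Invented, well-separated toy data with two features per sample.
X = [[0.0, 0.1], [0.1, 0.0], [0.2, 0.1], [0.1, 0.2], [0.0, 0.0],
     [5.0, 5.1], [5.1, 5.0], [5.2, 5.1], [5.1, 5.2], [5.0, 5.0]]
y = ["rice"] * 5 + ["maize"] * 5

X_train, X_test, y_train, y_test = split_train_test(X, y, test_size=0.2)
y_pred = [predict_1nn(X_train, y_train, x) for x in X_test]
test_accuracy = accuracy(y_test, y_pred)
print(f"held-out accuracy: {test_accuracy:.2f}")
```

Fixing the seed makes the split reproducible, which matters when accuracy figures from different models are compared on the same held-out rows.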


The general architecture for crop prediction is a combination of data acquisition, pre-processing, feature extraction, model training, and prediction.

FIG: Heat map for inputs


Efficiency of the Proposed System:
The proposed system is based on the Random Forest algorithm, which creates many decision trees. Using Random Forest, the proposed system reaches an accuracy of approximately 76 to 78 percent. Because it combines many decision trees, Random Forest gives more accurate output than a single decision tree. The Random Forest algorithm is used in two phases. Firstly, the RF algorithm extracts subsamples from the original samples by using the bootstrap resampling method and creates the decision
trees for each subsample. Second, the algorithm classifies with every decision tree and holds a vote, taking the class with the most votes as the final result of the classification. The Random Forest algorithm includes the following steps:
Selecting the training dataset: Using the bootstrap random sampling method, we derive K training sets from the original dataset, each the same size as the original training dataset.
Building the random forest algorithm: A classification and regression tree is created for each of the bootstrap training sets, generating K decision trees that form the random forest model; the trees are not pruned. As each tree grows, this approach does not choose the best feature among all features for each internal node; rather, the branching process picks the best feature from a random selection of the features.
Comparison of Existing and Proposed System:
Existing system (Decision tree): In the existing system, we implemented a decision tree algorithm that predicts whether to grant the loan or not. With a decision tree model, accuracy on the training dataset keeps improving with more splits. It is easy to overfit the dataset without knowing when the line has been crossed, unless cross-validation is used. The advantage of the decision tree is that the model is very easy to interpret: we can see which variables, and which values of those variables, are used to split the data. But the decision tree in the existing system gives less accurate output than the proposed system.
Proposed system (Random forest algorithm): The Random Forest algorithm generates more trees than the decision tree and other algorithms. We can specify the number of trees we want in the forest, and we can also specify the maximum number of features to be used in each tree. However, we cannot control the randomness of the forest, which is an inherent part of the algorithm. Accuracy keeps increasing as we increase the number of trees, but it levels off at a certain point. Unlike the decision tree, it does not add bias, and it decreases variance. The proposed system is implemented using the Random Forest algorithm so that its accuracy is higher than that of the existing system.
CONCLUSION AND FUTURE WORK:
In conclusion, this work aims to address the crucial challenges faced by farmers in crop selection by leveraging advanced technologies, specifically machine learning and precision agriculture. By incorporating a Convolutional Neural Network (CNN) and other machine learning techniques, the proposed system endeavors to provide accurate and efficient crop recommendations based on diverse environmental factors. The project’s foundation lies in the recognition of the significance of agriculture in the economy and the need to empower farmers with intelligent decision-making tools. Through the integration of diverse technologies, including data mining, data preprocessing, and machine learning algorithms, the system seeks to enhance crop selection processes, ultimately leading to improved agricultural productivity and
economic outcomes. The utilization of a Flask-based web application further extends the project’s impact by providing an accessible and user-friendly interface for farmers to input environmental parameters and receive real-time crop recommendations. The web application leverages a trained model, incorporating a CNN, to predict the most suitable crops for a given set of conditions. The system’s feasibility is underscored by its ability to adapt to diverse geographical and climatic conditions, providing tailored recommendations for specific regions. The inclusion of a detailed methodology, algorithmic approach, and systematic testing procedures contributes to the robustness and reliability of the proposed crop recommendation system. In essence, this project represents a significant step towards modernizing agriculture, empowering farmers with data-driven insights, and contributing to the sustainable growth of the agricultural sector. The integration of cutting-edge technologies not only addresses existing challenges in crop selection but also sets the stage for future advancements in precision agriculture.
Future Enhancements:
Integration of Remote Sensing Data: Incorporating remote sensing data can enhance
Dynamic Learning Models: Implementing dynamic learning models that continuously update and adapt based on new data and environmental changes can improve the system’s accuracy over time. This involves incorporating techniques like online learning to accommodate evolving agricultural conditions.
User Feedback Mechanism: Introducing a user feedback mechanism in the web application can allow farmers to provide insights into the actual outcomes of their crop selections. This feedback loop can be used to refine and improve the recommendation system, making it more responsive to local variations.
Crop Disease Prediction: Extending the system to predict potential crop diseases based on environmental conditions can further assist farmers in preventive measures. Machine learning models can be trained on historical data to identify patterns associated with specific diseases.
Multi-Crop Recommendation: Enhancing the system to recommend multiple crops in a rotational pattern can contribute to sustainable farming practices. Consideration of crop rotation can help maintain soil health and reduce the risk of pests and diseases.
