Plant Disease Detection Using Machine Learning
Abstract— Crop diseases are a significant threat to food security, but their rapid identification remains difficult in many parts of the world because the necessary infrastructure is absent. Accurate techniques that have emerged in the field of leaf-based image classification have shown impressive results. This paper uses Random Forest to distinguish between healthy and diseased leaves from the data sets created. The proposed work includes several phases of implementation, namely dataset creation, feature extraction, training the classifier, and classification. The created datasets of diseased and healthy leaves are collectively trained under Random Forest to classify the diseased and healthy images. For extracting features of an image we use the Histogram of Oriented Gradients (HOG). Overall, using machine learning to train on the large, publicly available data sets gives us a clear way to detect the disease present in plants on a large scale.

Keywords— Diseased and Healthy leaf, Random forest, Feature extraction, Training, Classification.

I. INTRODUCTION
Farmers in rural regions may find it hard to identify the disease that may be present in their crops. It is not affordable for them to go to an agriculture office and find out what the disease may be. Our main objective is to identify the disease present in a plant by observing its morphology through image processing and machine learning.
Pests and diseases destroy crops or parts of the plant, which decreases food production and leads to food insecurity. Also, knowledge about pest management or control and about diseases is limited in various less developed countries. Toxic pathogens, poor disease control, and drastic climate changes are among the key factors that reduce food production.
Various modern technologies have emerged to minimize postharvest processing, to fortify agricultural sustainability, and to maximize productivity. Various laboratory-based approaches such as polymerase chain reaction, gas chromatography, mass spectrometry, thermography, and hyperspectral techniques have been employed for disease identification. However, these techniques are not cost effective and are highly time consuming.
In recent times, server-based and mobile-based approaches have been employed for disease identification. Several features of these technologies, such as high-resolution cameras, high-performance processing, and extensive built-in accessories, are added advantages that result in automatic disease recognition.
Modern approaches such as machine learning and deep learning algorithms have been employed to increase the recognition rate and the accuracy of the results. Various studies have taken place in the field of machine learning for plant disease detection and diagnosis, using traditional machine learning approaches such as random forest, artificial neural network, support vector machine (SVM), fuzzy logic, and the K-means method, as well as convolutional neural networks.
Random forests are an ensemble learning method for classification, regression, and other tasks that operates by constructing a forest of decision trees at training time. Unlike single decision trees, random forests overcome the disadvantage of overfitting the training data set, and they handle both numeric and categorical data.
The histogram of oriented gradients (HOG) is a feature descriptor used in computer vision and image processing for object detection. Here we make use of three feature descriptors:
1. Hu moments
2. Haralick texture
3. Color Histogram
Hu moments are used to extract the shape of the leaves, Haralick texture is used to capture the texture of the leaves, and the color histogram is used to represent the distribution of colors in an image.
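As an illustration of how a Hu moment describes shape, the sketch below computes only the first of the seven Hu invariants (the sum of the two normalised second-order central moments) for a tiny gray scale image stored as a nested list. The function name `first_hu_moment` and the two test images are hypothetical and are not taken from this paper; a real pipeline would more likely call a library routine such as OpenCV's `cv2.HuMoments`.

```python
def first_hu_moment(image):
    """Return the first Hu invariant (eta20 + eta02) of a gray scale image.

    `image` is a list of rows of non-negative pixel intensities.
    """
    # Raw moments m00, m10, m01 give the total mass and the centroid.
    m00 = sum(sum(row) for row in image)
    if m00 == 0:
        raise ValueError("image has no non-zero pixels")
    m10 = sum(x * v for row in image for x, v in enumerate(row))
    m01 = sum(y * v for y, row in enumerate(image) for v in row)
    cx, cy = m10 / m00, m01 / m00

    # Central moments are measured relative to the centroid,
    # which makes them independent of where the shape sits.
    mu20 = sum((x - cx) ** 2 * v for row in image for x, v in enumerate(row))
    mu02 = sum((y - cy) ** 2 * v for y, row in enumerate(image) for v in row)

    # Normalising by m00^2 (order p+q=2 -> exponent 1 + 2/2 = 2) adds scale invariance.
    return (mu20 + mu02) / m00 ** 2


# Hypothetical 4x4 test images: the same 2x2 square at two positions.
square = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
shifted = [[0, 0, 0, 0], [0, 0, 1, 1], [0, 0, 1, 1], [0, 0, 0, 0]]
print(first_hu_moment(square), first_hu_moment(shifted))  # 0.125 0.125
```

Both images give the same value because the descriptor depends on the shape and not on its position, which is the property that makes Hu moments useful for describing a leaf outline.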
II. LITERATURE REVIEW
[1] S. S. Sannakki and V. S. Rajpurohit proposed "Classification of Pomegranate Diseases Based on Back Propagation Neural Network", which segments the defected area and uses color and texture as the features. A neural network classifier is used for the classification. The main advantage is that it converts the image to L*a*b to extract its chromaticity layers, and the categorisation is found to be 97.30% accurate. The main disadvantage is that it is used only for a limited set of crops.
[2] P. R. Rothe and R. V. Kshirsagar introduced "Cotton Leaf Disease Identification using Pattern Recognition Techniques", which uses snake segmentation, with Hu's moments as the distinctive attribute. An active contour model is used to limit the energy inside the infection spot, and a BPNN classifier handles the multi-class problem. The average classification accuracy is found to be 85.52%.
[3] Aakanksha Rastogi, Ritika Arora and Shanu Sharma, "Leaf Disease Detection and Grading using Computer Vision Technology & Fuzzy Logic". K-means clustering is used to segment the defected area, GLCM is used for the extraction of texture features, and fuzzy logic is used for disease grading. They used an artificial neural network (ANN) as a classifier, which mainly helps to check the severity of the diseased leaf.
[4] Godliver Owomugisha, John A. Quinn, Ernest Mwebaze and James Lwasa proposed "Automated Vision-Based Diagnosis of Banana Bacterial Wilt Disease and Black Sigatoka Disease". Color histograms are extracted and transformed from RGB to HSV and from RGB to L*a*b. Peak components are used to create a max tree, five shape attributes are used, and area-under-the-curve analysis is used for classification. They used nearest neighbors, decision tree, random forest, extremely randomized trees, Naïve Bayes and a support vector (SV) classifier. Among the seven classifiers, extremely randomized trees yield a very high score, provide real-time information, and add flexibility to the application.
[5] Yuan Tian, Chunjiang Zhao, Shenglian Lu and Xinyu Guo, "SVM-based Multiple Classifier System for Recognition of Wheat Leaf Diseases". Color features are converted from RGB to HSI, GLCM is used, and seven invariant moments are taken as shape parameters. They used an SVM classifier within a multiple classifier system (MCS) for detecting disease in wheat plants offline.

III. PROPOSED METHODOLOGY
To find out whether a leaf is diseased or healthy, certain steps must be followed: preprocessing, feature extraction, training of the classifier, and classification. Preprocessing brings all the images to a reduced, uniform size. Features are then extracted from the preprocessed image with the help of HOG. HoG [6] is a feature descriptor used for object detection. In this feature descriptor, the appearance of the object and the outline of the image are described by its intensity gradients. One advantage of HoG feature extraction is that it operates on the cells created, so transformations do not affect it.
Here we made use of three feature descriptors.

Hu moments: Image moments, which capture important characteristics of the image pixels, help in describing objects. Here Hu moments help in describing the outline of a particular leaf. Hu moments are calculated over a single channel only. The first step converts RGB to gray scale, and then the Hu moments are calculated. This step gives an array of shape descriptors.

Haralick Texture: Healthy and diseased leaves usually have different textures. Here we use the Haralick texture feature to distinguish between the textures of healthy and diseased leaves. It is based on the adjacency matrix, whose entry (I,J) stores how often a pixel of value I occupies the position next to a pixel of value J; the texture [7] is calculated from these frequencies. Calculating the Haralick texture requires the image to be converted to gray scale.

Fig.1. RGB to Gray scale conversion of a leaf.

Color Histogram: The color histogram gives a representation of the colors in the image. RGB is first converted to the HSV color space, and the histogram is calculated on the result. The RGB image is converted to HSV because the HSV model aligns closely with how the human eye discerns the colors in an image. The histogram plot [8] describes the number of pixels available in the given color ranges.

Fig.2. RGB to HSV conversion of leaf
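The adjacency-matrix idea behind the Haralick texture feature can be sketched in a few lines. The example below is a simplified illustration, not the paper's implementation: it counts only right-hand neighbours in one direction, whereas the full Haralick features combine several directions and many statistics (library routines such as `mahotas.features.haralick` cover these). The function names `cooccurrence_matrix` and `contrast` and the two test images are hypothetical.

```python
def cooccurrence_matrix(image, levels):
    """Count how often gray level i has gray level j as its right-hand neighbour."""
    glcm = [[0] * levels for _ in range(levels)]
    for row in image:
        for left, right in zip(row, row[1:]):
            glcm[left][right] += 1
    return glcm


def contrast(glcm):
    """Contrast statistic: sum of p(i, j) * (i - j)^2 over the normalised matrix."""
    total = sum(sum(row) for row in glcm)
    return sum(
        count / total * (i - j) ** 2
        for i, row in enumerate(glcm)
        for j, count in enumerate(row)
    )


# Hypothetical two-level images: one with large uniform patches, one alternating.
smooth = [[0, 0, 1, 1], [0, 0, 1, 1]]
rough = [[0, 1, 0, 1], [1, 0, 1, 0]]

print(cooccurrence_matrix(smooth, 2))  # [[2, 2], [0, 2]]
print(contrast(cooccurrence_matrix(smooth, 2)) < contrast(cooccurrence_matrix(rough, 2)))  # True
```

The alternating image has a higher contrast value than the patchy one, which shows how a statistic of the adjacency matrix can separate two textures.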
Fig.6. Flow chart for classification
First, the RGB image must be converted into a gray scale image, because the Hu moments shape descriptor and the Haralick features can be calculated over a single channel only. This conversion is depicted in Fig. 4.
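The gray scale conversion step can be sketched as a weighted sum of the three color channels. The paper does not state which weights were used; the sketch below assumes the common ITU-R BT.601 luma weights (0.299, 0.587, 0.114). The function name `rgb_to_gray` and the sample pixels are hypothetical.

```python
def rgb_to_gray(image):
    """Convert an RGB image (rows of (r, g, b) tuples, 0-255) to gray scale."""
    # Assumed weights: ITU-R BT.601 luma coefficients.
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for r, g, b in row]
        for row in image
    ]


# Hypothetical 2x2 image: white, black, pure green, pure red.
pixels = [[(255, 255, 255), (0, 0, 0)], [(0, 255, 0), (255, 0, 0)]]
print(rgb_to_gray(pixels))  # [[255, 0], [150, 76]]
```

The result has one intensity value per pixel, which is the single-channel form that the Hu moments and Haralick features require.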
To calculate the histogram, the image must first be converted to HSV (hue, saturation and value), so we convert the RGB image to an HSV image, as shown in Fig. 5.
Fig.8. Comparison between different machine learning models.
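The RGB-to-HSV conversion followed by a histogram can be sketched with Python's standard `colorsys` module. This is a simplified illustration under stated assumptions: it bins only the hue channel, while the paper does not say which channels or how many bins were used. The function name `hue_histogram`, the default of 8 bins, and the sample pixels are hypothetical.

```python
import colorsys


def hue_histogram(image, bins=8):
    """Histogram of hue values for an RGB image given as rows of (r, g, b) tuples."""
    counts = [0] * bins
    for row in image:
        for r, g, b in row:
            # colorsys expects and returns floats in the range [0, 1].
            h, s, v = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
            # Clamp so that a hue of exactly 1.0 cannot index past the last bin.
            counts[min(int(h * bins), bins - 1)] += 1
    return counts


# Hypothetical 2x2 image: two red pixels, one green, one blue.
leaf = [[(255, 0, 0), (0, 255, 0)], [(0, 0, 255), (255, 0, 0)]]
print(hue_histogram(leaf))  # [2, 0, 1, 0, 0, 1, 0, 0]
```

Each bin counts the pixels whose hue falls in that range, which matches the description of a histogram as the number of pixels available in given color ranges.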
Finally, the main aim of our project is to detect whether a leaf is diseased or healthy with the help of a Random Forest classifier, as depicted in Fig. 7.
TABLE I.
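The idea of classifying a leaf by a vote among many trees can be illustrated with a toy forest of depth-1 trees (stumps). This is only a sketch of the technique under simplifying assumptions, not the classifier used in this paper: real random forests grow deeper trees and re-sample candidate features at every split, and a practical implementation would typically rely on a library such as scikit-learn's `RandomForestClassifier`. The function names, the two-value feature vectors, and the labels below are all hypothetical.

```python
import random
from collections import Counter


def train_stump(rows, labels, feature):
    """Fit a depth-1 tree: one threshold on one feature, majority label per side."""
    majority = Counter(labels).most_common(1)[0][0]
    best = (len(rows) + 1, 0.0, majority, majority)  # (errors, threshold, left, right)
    values = sorted({row[feature] for row in rows})
    for low, high in zip(values, values[1:]):
        threshold = (low + high) / 2
        left = [y for row, y in zip(rows, labels) if row[feature] <= threshold]
        right = [y for row, y in zip(rows, labels) if row[feature] > threshold]
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0]
        errors = sum(y != left_label for y in left) + sum(y != right_label for y in right)
        if errors < best[0]:
            best = (errors, threshold, left_label, right_label)
    return feature, best[1], best[2], best[3]


def train_forest(rows, labels, n_trees=25, seed=0):
    """Train stumps on bootstrap samples, each using one randomly chosen feature."""
    rng = random.Random(seed)
    n_features = len(rows[0])
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw len(rows) samples with replacement.
        picks = [rng.randrange(len(rows)) for _ in rows]
        sample_rows = [rows[i] for i in picks]
        sample_labels = [labels[i] for i in picks]
        # Random feature selection decorrelates the trees.
        feature = rng.randrange(n_features)
        forest.append(train_stump(sample_rows, sample_labels, feature))
    return forest


def predict(forest, row):
    """Majority vote over all trees in the forest."""
    votes = Counter(
        left if row[feature] <= threshold else right
        for feature, threshold, left, right in forest
    )
    return votes.most_common(1)[0][0]


# Hypothetical, well-separated feature vectors (not data from this paper).
features = [
    [0.10, 0.20], [0.15, 0.10], [0.20, 0.15], [0.10, 0.10], [0.05, 0.20],
    [0.80, 0.90], [0.90, 0.85], [0.85, 0.80], [0.95, 0.90], [0.80, 0.95],
]
targets = ["healthy"] * 5 + ["diseased"] * 5

forest = train_forest(features, targets)
print(predict(forest, [0.12, 0.18]), predict(forest, [0.88, 0.90]))
```

In the proposed pipeline, each feature vector would instead hold the Hu moments, Haralick texture values, and color histogram of one leaf image.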
