
INTELLIGENT PLANT DISEASE DIAGNOSIS WITH EXPLAINABLE AI
METHODS AND LIGHTWEIGHT MODEL
A PROJECT REPORT
Submitted by
ISHAN JOSHI (RA2111003010387)
NAMAN MARDIA (RA2111003010407)
Under the Guidance of
Dr.R. VIDHYA
(Assistant Professor, Department of Computing Technologies)

in partial fulfillment of the requirements for the degree of


BACHELOR OF TECHNOLOGY
in
COMPUTER SCIENCE ENGINEERING

DEPARTMENT OF COMPUTING TECHNOLOGIES


COLLEGE OF ENGINEERING AND TECHNOLOGY
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR- 603 203

NOVEMBER 2024
Department of Computational Intelligence
SRM Institute of Science & Technology
Own Work* Declaration Form

This sheet must be filled in (each box ticked to show that the condition has been met). It must be
signed and dated along with your student registration number and included with all assignments
you submit – work will not be marked unless this is done.
To be completed by the student for all assessments

Degree/ Course : B. Tech /Computer Science Engineering

Student Name : Ishan Joshi, Naman Mardia

Registration Number : RA2111003010387, RA2111003010407


Title of Work : INTELLIGENT PLANT DISEASE DIAGNOSIS WITH EXPLAINABLE
AI METHODS AND LIGHTWEIGHT MODEL

I / We hereby certify that this assessment complies with the University’s Rules and Regulations
relating to Academic misconduct and plagiarism**, as listed in the University Website,
Regulations, and the Education Committee guidelines.

I / We confirm that all the work contained in this assessment is my / our own except where
indicated, and that I / We have met the following conditions:

● Clearly referenced / listed all sources as appropriate


● Referenced and put in inverted commas all quoted text (from books, web, etc)
● Given the sources of all pictures, data etc. that are not my own
● Not made any use of the report(s) or essay(s) of any other student(s) either past or present
● Acknowledged in appropriate places any help that I have received from others (e.g.
fellow students, technicians, statisticians, external sources)
● Complied with any other plagiarism criteria specified in the Course handbook /
University website

I understand that any false claim for this work will be penalized in accordance with the
University policies and regulations.

DECLARATION:
I am aware of and understand the University’s policy on Academic misconduct and plagiarism and I certify
that this assessment is my / our own work, except where indicated by referring, and that I have followed
the good academic practices noted above.

If you are working in a group, please write your registration numbers and sign with the date for
every student in your group.
SRM INSTITUTE OF SCIENCE AND TECHNOLOGY
KATTANKULATHUR – 603 203
BONAFIDE CERTIFICATE

Certified that 18CSP107L - Minor Project [18CSP108L]- report titled


“INTELLIGENT PLANT DISEASE DIAGNOSIS WITH EXPLAINABLE AI
METHODS AND LIGHTWEIGHT MODEL” is the bonafide work of “ISHAN
JOSHI[RA2111003010387], NAMAN MARDIA [RA2111003010407]” who
carried out the project work under my supervision. Certified further, that to the
best of my knowledge the work reported herein does not form any other project
report or dissertation on the basis of which a degree or award was conferred on
an earlier occasion on this or any other candidate.

SIGNATURE SIGNATURE

Dr. R. VIDHYA Dr. G. Niranjana

SUPERVISOR PROFESSOR & HEAD


Assistant Professor Department of
Department of Computing Technologies
Computing Technologies
ACKNOWLEDGEMENTS
We express our humble gratitude to Dr. C. Muthamizhchelvan, Vice-Chancellor, SRM Institute of Science
and Technology, for the facilities extended for the project work and his continued support.

We extend our sincere thanks to Dr. T. V. Gopal, Dean-CET, SRM Institute of Science and Technology,
for his invaluable support.

We wish to thank Dr. Revathi Venkataraman, Professor and Chairperson, School of Computing, SRM
Institute of Science and Technology, for her support throughout the project work.

We extend our sincere thanks to Dr. M. Pushpalatha, Professor and Associate Chairperson, School of
Computing and Dr. C. Lakshmi, Professor and Associate Chairperson, School of Computing, SRM Institute of
Science and Technology, for their invaluable support.

We are incredibly grateful to our Head of the Department, Dr. G. Niranjana, Professor, Department of
Computing Technologies, SRM Institute of Science and Technology, for her suggestions and encouragement at
all the stages of the project work.

We want to convey our thanks to our Project Coordinators, Dr.R.Vidhya, Dr.M.ArulPrakash and Dr.M.Revathi
and Panel Head, Dr.R.Vidhya and Panel Members, Dr. Vinod D and Dr.Balamurugan G, Department of
Computing Technologies, SRM Institute of Science and Technology, for their inputs during the project reviews
and support.

We register our immeasurable thanks to our Faculty Advisor, Dr.R. VIDHYA, Department of Computing
Technologies, SRM Institute of Science and Technology, for leading and helping us to complete our course.

Our inexpressible respect and thanks to our guide, Dr.R. VIDHYA, Department of Computing Technologies,
S.R.M Institute of Science and Technology, for providing us with an opportunity to pursue our project under her
mentorship. She provided us with the freedom and support to explore the research topics of our interest. Her
passion for solving problems and making a difference in the world has always been inspiring.

We sincerely thank all the staff and students of Computing Technologies, School of Computing, S.R.M
Institute of Science and Technology, for their help during our project. Finally, we would like to thank our parents,
family members, and friends for their unconditional love, constant support and encouragement.

Ishan Joshi [RA2111003010387]

Naman Mardia [RA2111003010407]


TABLE OF CONTENTS

Content

Abstract
List of Figures
List of Tables
1 Introduction
1.1 Diseases related to Plants
1.1.1 Yellow leaf curl virus (YLCV)
1.1.2 Huanglongbing (HLB)
1.1.3 Early Blight
1.1.4 Powdery Mildew
1.1.5 Downy Mildew
1.1.6 Rust
1.2 Impact of Plant Disease
2 Literature Review
3 Problem Statement and Proposed Solution
4 Design and Implementation
4.1 Architecture Diagram
4.2 Methodology and Algorithm
5 Result and Discussion
6 Conclusion and Future Enhancements
6.1 Conclusion
6.2 Future Scope
Appendix -1
References
Plagiarism Report
Paper Publication Proof
LIST OF FIGURES

Figure No. Name

1.1 Healthy Plant
1.2 Diseased Plant
1.3 Yellow Leaf Curl Virus
1.4 Huanglongbing
1.5 Early Blight
1.6 Powdery Mildew
1.7 Downy Mildew
1.8 Rust
4.1 Block Diagram
5.1 Plant Village Dataset
7.1 Python Programming Language
7.2 Anaconda IDE
7.3 Anaconda Navigator
7.4 Jupyter Notebook/Labs
7.5 Jupyter Notebook
7.6 Machine Learning Chart
7.7 Supervised Learning
7.8 Supervised Learning Graphical Representation
7.9 Unsupervised Learning
7.10 Unsupervised Learning Graphical Representation
7.11 Semi-Supervised Learning
7.12 Reinforcement Learning
8.1 Result with RF Classifier
8.2 Comparison of accuracy of machine learning models
8.3 Performance Evaluation
8.4 Result with CNN model (1)
8.5 Result with CNN model (2)
8.6 Output with actual label, predicted label, and confidence
LIST OF TABLES
Table No. Name
6.1 Algorithm
CHAPTER 1

INTRODUCTION

Plant diseases are caused by a wide range of biotic and abiotic agents that impair plant
growth, development and productivity. Biotic stressors are pathogens and pests (including
bacteria, fungi, viruses, nematodes and insects), while abiotic stressors are
environmental factors (drought, extreme temperatures, soil nutrient deficiencies).
Plant diseases can be extremely costly to agriculture, food security and livelihoods.
According to the UN's Food and Agriculture Organisation, pests and diseases together
account for losses of as much as 40 per cent of the world's crop production, with
economic damage running to billions of dollars annually. Plant diseases can also impair
the quality and safety of food products, posing a risk to human and animal health.
Plant diseases can take many forms, with visible symptoms such as wilting, yellowing,
lesions and tissue death. The intensity and extent of plant diseases can depend on numerous
factors, including the nature of the pathogen, the host plant, the environment, and
management practices.
Plant pathogens, the biotic factors causing plant diseases, spread through soil, water, air, and
contact between plants. Some pathogens are carried by insects and other vectors,
which can spread infection quickly and widely. Abiotic stressors, like
drought and heat, can also make plants more vulnerable to infection by biotic
pathogens.
Managing plant diseases requires a combination of prevention, timely detection, and
correct treatment. Prevention involves implementing measures to
minimize the risk of disease introduction and spread, such as using disease-resistant crop
varieties, practicing crop rotation, and maintaining proper sanitation.
Plant disease can be managed effectively only if detected early enough to intervene in a
timely manner. Conventional approaches to plant disease diagnosis, such as visual
assessment by experts, are often slow and subjective. Automated methods of plant
disease detection, such as remote sensing and imaging, offer faster and more objective
means of detecting diseases. Machine learning algorithms have been trained to scan
massive plant image databases and spot patterns that are suggestive of disease
symptoms, with good success.
Plant disease detection is one of the most promising application areas for machine
learning. Algorithms trained on large collections of plant images learn to recognise
patterns that suggest disease symptoms. Using machine learning, researchers can build
systems that identify and diagnose certain diseases more quickly and efficiently than
traditional testing allows.

Fig 1.1: Healthy Plant

Fig 1.2: Diseased Plant

Among the most significant benefits of machine learning for plant disease detection is
its capacity to process huge numbers of images of plants in a fast and efficient manner.
Machine learning models can detect subtle changes in plant appearance and identify
patterns that are indicative of disease symptoms, even when those symptoms are difficult
for humans to detect.
Machine learning also contributes to more reliable diagnosis of disease. Traditional methods of plant
disease diagnosis, such as visual inspection by experts, can be subjective and prone to
error. In contrast, machine learning models apply the same criteria to every image, and
well-trained models can analyze images with a high degree of accuracy.
Another advantage of machine learning is its scalability. The proliferation of high-
resolution digital cameras and the advent of remote-sensing technology mean it's now
simple to acquire a huge amount of plant image data in a short amount of time. Machine
learning algorithms can read these datasets and identify diseases over vast tracts of land,
offering clues to how plant diseases spread.

1.1 DISEASES RELATED TO PLANTS

Plant diseases can manifest in many different ways, depending on the disease and the
plant involved, and may also cause wilting, stunting and deformities. Some common
symptoms include:

Leaf spots: Tiny, round or irregularly shaped patches on leaves, which can be coloured
or have a necrotic (dead) core.

Yellowing: Leaves might yellow, signalling nutrient deficiencies or viral infections.


Necrosis: Patches of plant tissue may become brown or black as cells die as a result of disease
or environmental stress.

Chlorosis: Leaves turn yellow, often while the veins stay green, which usually indicates
a deficiency of iron, magnesium, or some other nutrient.

Necrotic streaks: Black, sunken lines on leaves or stems, possibly caused by viral or
bacterial disease.

Powdery or fuzzy growth: Fungal diseases can cause a powdery or fuzzy coating to
appear on leaves, stems or flowers.

There are many types of pathogens that can cause plant diseases, including fungi,
bacteria, viruses, nematodes and phytoplasmas. Some diseases are also caused by abiotic
factors, such as nutrient deficiencies, temperature extremes, and drought.
1.1.1 YELLOW LEAF CURL VIRUS (YLCV)

Yellow leaf curl virus is a single-stranded DNA virus belonging to the family
Geminiviridae. The virus takes over plant cells, and symptoms of the disease can differ
widely depending on the species of plant, the viral strain and the environmental
conditions. This virus is seen in tomato plants and other crops in the family Solanaceae.
It is spread by the whitefly Bemisia tabaci and is one of the most destructive viral
diseases of tomato crops globally.

Fig 1.3: Yellow leaf curl virus

Symptoms of YLCV infection include leaf curling, chlorosis, stunting, and poor fruit
quality and yield. The virus can infect plants at any point in their growth cycle, but it is
most destructive if it strikes early in the growing season.
YLCV has no known cure, and once a plant is infected, there is little that can be done to
prevent its death. However, there are some preventive measures that can be taken to
reduce the risk of infection. Among these are planting resistant tomato cultivars,
managing whitefly populations, and maintaining good sanitation in the greenhouse or in
the field.
1.1.2 HUANGLONGBING (HLB)

Citrus greening disease (known officially as Huanglongbing or HLB) is a devastating
bacterial disease that has infected citrus trees across the globe. It is caused by
bacteria of the genus Candidatus Liberibacter and spread by the Asian citrus psyllid, a tiny
insect that sucks sap from the leaves and stems of citrus plants. HLB is thought to have
originated in Asia but has since become established in citrus-growing areas around the
globe, including the US, Brazil, China and India. The disease has already cost citrus
growers billions of dollars, and it could destroy the citrus industry in many areas. The
bacterium that causes HLB infects the phloem tissue of citrus trees, which is responsible
for transporting nutrients and sugars throughout the tree. This interferes with the flow of
nutrients through the tree and causes the disease's characteristic symptoms.

Fig 1.4: Huanglongbing

HLB can take several years to manifest symptoms, and the disease can be hard to spot
and manage. The symptoms of HLB can vary depending on the stage of the disease, but
they typically include yellowing and blotching of leaves, stunted growth, and bitter and
misshapen fruit. The disease can eventually kill the tree, and it poses a serious danger to
the world's citrus industry. HLB has no cure, and infected trees need to be chopped down
to stop the spread of the disease.
1.1.3 EARLY BLIGHT

Early blight is a common fungal disease that affects tomato and potato plants, as well as
other plants in the Solanaceae family. The pathogen is the fungus Alternaria solani,
which can live in the soil for years.
Early blight can affect plants at any stage of growth, but it is most common in the early
stages of plant development. Early blight is favored by warm, humid weather and can
spread rapidly under these conditions. The fungus can overwinter in infected plant residue and soil,
so crop rotation is a key management tactic for preventing the disease.

Fig 1.5: Early Blight

Early blight symptoms usually show up on the lower part of the plant and take the form
of small, round lesions that are dark brown or black in colour. The lesions can grow
larger and eventually merge together, leading to the death of the affected leaves.
In extreme cases, the disease can lead to leaf drop and crop loss.

1.1.4 POWDERY MILDEW

Powdery mildew is a fungal disease that affects a wide range of plants, including
vegetables, fruits, ornamental plants, and trees. The disease is caused by several species
of fungi in the Erysiphaceae family. The fungus that causes powdery mildew is able to
produce asexual spores called conidia, which can be easily spread by wind and water.
The spores can also be spread by insects or other animals that come into contact with the
infected plant material.
Fig 1.6: Powdery Mildew

Powdery mildew causes a powdery white or gray coating on the leaves, stems, and
flowers of plants. Affected regions can become misshapen or stunted, and the leaves turn
yellow and drop early. The disease can weaken the plant and leave it unable to bear
fruit or flower.
Powdery mildew favours warm, humid weather and can spread quickly under these
conditions. It can be particularly problematic in greenhouse and indoor growing environments,
where the humidity levels can be high and air circulation may be limited. The fungus
can survive the winter on infected plant residue and living plant tissue, so good sanitation
is a key to preventing the disease.

1.1.5 DOWNY MILDEW

Downy mildew is a fungus-like disease that affects many types of plants, including
vegetables, fruits, flowers, and ornamental plants. Although it is often grouped with
fungal diseases, it is caused by several species of oomycetes (water moulds) in the
Peronosporaceae family.
Symptoms of downy mildew include yellow or pale green areas on the top surface of
leaves, along with white or grayish-purple fuzzy growth on the bottom surface of the
leaves. The fuzzy growth is made up of spore-bearing structures whose spores are easily
spread by wind or water. As severity increases, leaves may brown and die prematurely, and
the plant may produce fewer flowers and fruit.
Fig 1.7: Downy Mildew

Downy mildew thrives in cool, wet weather and can spread rapidly under these
conditions. The pathogen can overwinter on infected plant debris and can be carried on
seed, making crop rotation and good sanitation practices important for preventing the
disease.

1.1.6 RUST

Rust is a fungal disease that affects a wide range of plants, including both woody and
herbaceous plants. The disease is caused by several different species of fungi, all
belonging to the order Pucciniales.

Fig 1.8: Rust


The symptoms of rust are tiny, round pustules or blisters on the leaves,
stems or flowers of the plant. The pustules are typically reddish-brown but can be yellow,
orange or even black, depending on the species of fungus and the plant it infects. As the disease progresses,
the leaves may turn yellow and drop prematurely, and the plant may become weakened
and stunted.
Living plant tissue is essential for the survival and reproduction of rust fungi, which
spread easily in warm, humid conditions. The fungus's spores are readily dispersed by
wind or water, and can also travel on clothing, tools and equipment.

1.2 IMPACT OF PLANT DISEASE

Plant diseases can be devastating to agriculture and ecosystems. Their potential
impacts include:
Reduced crop yield: Plant diseases can sharply reduce crop production,
costing farmers income and reducing the food supply.
Increased use of pesticides: Farmers may apply pesticides to prevent or control plant diseases,
which costs money and can damage the environment and human health.
Environmental impact: Plant diseases can also contribute to soil erosion, biodiversity loss
and other environmental harm.
Food safety: Some plant pathogens produce toxins that render food unfit for consumption.
Economic impact: Plant diseases are also a major economic concern, not just in terms
of lost farmer income, but also because of the instability they introduce to agricultural
communities.
In general, plant diseases can have many kinds of adverse effects on
agriculture, the environment, and human health. Early detection and accurate diagnosis,
as part of effective disease management programmes, can help to reduce these effects and
support sustainable agriculture.
CHAPTER 2
LITERATURE REVIEW

[1] Murat Koklu, Ilker Ali Ozkan, 2020 designed a
computer vision system to sort seven registered varieties of dry beans through
image analysis, with the aim of uniform seed classification. The system employed a
high-resolution camera and a MATLAB GUI to extract 16 features from 13,611 bean grains, and four
classification models were developed, with SVM having the best accuracy. The study
found that shape and size features alone were not enough for successful classification,
and that incorporating texture and statistical features may improve results. The resulting
MATLAB program might be adapted into a mobile application to build a shared image
database of dry beans.

[2] Ferhat Kurtulmuş, 2020 applied deep learning and computer vision techniques
to identify sunflower seed varieties. In this study, a computer vision system was
proposed, trained, and tested to identify four varieties of sunflower seeds using deep
learning. The best classification accuracy (95%) was obtained with the GoogleNet algorithm. This
study was a pioneering effort to discriminate sunflower seed varieties via deep learning
and detailed the specifics of the imaging setup, the structure of the datasets, and the
performance of the top-performing DCNN models.

[3] H. Ünal, 2014 studied rapeseed, which is widely cultivated for animal
feed, vegetable oil, and biodiesel. An affordable expert system
based on computer vision and machine learning was proposed to classify seven rapeseed
varieties using mass surface color images. The system achieved accuracies of more than 90 per
cent (99.24 per cent for the best-performing model), and 29 features were sufficient to
identify the seven varieties from colour images. A more complete expert system, covering
more rapeseed varieties, still needs to be developed for the benefit of the rapeseed
industry and research community.

[4] Di Wu, Zhen Cai, Jiwan Han, Huawei Qin, 2020 proposed a low-cost
and highly efficient automated recognition and counting system for maize kernels from
digital images of maize ears. The method involves five steps: image
compression, separating the maize ear from the background, enhancing kernel edges,
segmenting kernel zones, and recognizing local grayscale peaks to determine the kernel
number. The algorithm proved robust and performed well at counting maize
ear kernels in a variety of lighting conditions. The technique has several advantages
over manual counting in terms of accuracy, speed and stability, and it can be
applied to several types of maize kernels. However, accuracy can suffer when
kernels on both sides of the ear are occluded and geometric distortion in the image is high. The
approach also serves as a model for counting kernels of other crops, with potential
applications in breeding programmes.

[5] Gang Kou, Pei Yang, Yi Peng, Feng Xiao, Yang Chen, Fawaz E. Alsaadi, 2019
argued that evaluating feature selection methods for text classification on small
sample datasets is a multiple criteria decision-making (MCDM) problem that requires a
better evaluation method. The paper presents an MCDM-based performance evaluation
framework for the feature selection methods used in text classification. The study
compares 10 feature selection
techniques using nine performance metrics for binary classification, seven performance
metrics for multi-class classification, and three classifiers on 10 small datasets. Five
MCDM methods are employed to rank the feature selection methods according to their
performance in terms of classification accuracy, stability and efficiency. No
feature selection method dominated on all criteria,
although DF was the superior method in the overall ranking. More work is needed to explore
other MCDM methods and try out more datasets.

[6] Jasmeet Kaur, Er. Ramanpreet Kaur, 2016 observed that plants are
important for mitigating global warming but are prone to diseases, which can be
detected through pattern recognition. Plant pathogens are commonly surveyed using
remote sensing and digital image processing techniques. Backpropagation (BP) networks and
principal component analysis (PCA) are used to recognise patterns for
disease detection, but accuracy is a problem. To improve accuracy, a new method
combining BP, PCA, and SVD is proposed. The study detected grape downy mildew,
grape powdery mildew, wheat stripe rust, and wheat leaf rust using BP networks and
three-color features. PCA was used to reduce feature data dimensions, but the accuracy
was lower compared to other methods. Further research is needed to improve the
recognition results.

[7] Ko Ko Zaw, Dr. Zin Ma Ma Myo, Daw Thae Hsu Thoung, 2018 proposed
a support vector machine (SVM) algorithm for the classification
of leaf diseases in Myanmar using image processing techniques. The RGB color space is
transformed into HSI color space and then segmented using k-means clustering.
After median filtering for noise suppression, features are extracted using the gray-level
co-occurrence matrix (GLCM). The SVM classifier reached a highest accuracy of 83 per cent, and the
total program execution time was 0.24045 minutes. Such a system can be extended to
other leaf diseases for better performance when program execution time is limited.

[8] Godliver Owomugisha, Friedrich Melchert, Ernest Mwebaze, John A Quinn and
Michael Biehl (2018) studied how to use spectral data from the leaves of a plant to
automatically diagnose crop diseases, a vital task in areas with a shortage of experts. The
authors also contrast the performance of prototype-based classification algorithms
and traditional classification models in a three-class classification problem setup. The
results demonstrate a dramatic performance increase with the use of spectral data, which
enables disease detection before crops show visible symptoms. The research was carried
out in partnership with the Uganda National Crop Resources Research Institute, with
technical support from the University of Groningen's Centre for Information
Technology, and funding from the Bill and Melinda Gates Foundation.
CHAPTER 3

PROBLEM STATEMENT AND PROPOSED SOLUTION

3.1 PROBLEM STATEMENT


Plant diseases threaten world food security because they lead to huge losses in crop
production. Culture-based methods of plant disease detection and diagnosis take time
and expertise, and it is difficult for small-scale farmers to take timely and effective action
to avoid crop loss. There is a need to develop a fast, accurate, and accessible solution for
identifying plant diseases to minimize crop losses and improve food security.

3.2 PROPOSED SOLUTION


The proposed approach is to use machine learning algorithms, specifically
convolutional neural networks (CNNs), to recognise plant diseases by analysing images
of plants. The project aims to assemble an image database of healthy plants and plants
affected by different diseases, and to train a CNN model on this database so that it can
classify images as healthy or diseased. This approach could substantially lower crop losses
from plant diseases and improve the food security of small-scale farmers.

CHAPTER 4

DESIGN AND IMPLEMENTATION

4.1 ARCHITECTURE DIAGRAM

Fig 4.1: Architecture Diagram


• Plant leaf images - For our project we used the PlantVillage
dataset, a collection of images of various plant leaves labelled with their corresponding diseases.
• Training set - The set of data used to train the model. The model looks at the data and
learns from it. We assign 70 per cent of the dataset to the training set.
• Testing - At the conclusion of the project, we run evaluation metrics on the test set
to get an idea of how the model will perform in practice. 20 per cent of the data forms the
test set.
• Validation - Another part of the dataset, referred to as the validation set, is used
during training to assess how well the model generalises to images not used in training.
We hold out 10 per cent of the dataset as the validation set.
• Developing machine learning models - Several machine learning models were
developed and their accuracies compared. Of these, the
random forest classifier achieved the best accuracy.
• Classification - Classification is done using a CNN, a type of artificial neural network
widely used for image/object recognition and classification.
• Output - The resulting output shows the
actual disease, the disease predicted by our model, and the model's confidence.
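
The report does not state how the confidence value is computed. A common choice, assumed here purely for illustration, is to apply a softmax to the network's final-layer scores and report the largest probability. The class names and scores below are made up and are not outputs of the project's model.

```python
import math

def softmax(scores):
    """Convert raw class scores into probabilities that sum to 1."""
    shift = max(scores)  # subtracting the max keeps exp() numerically stable
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def predict_with_confidence(scores, class_names):
    """Return the most probable class and its probability."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=probs.__getitem__)
    return class_names[best], probs[best]

# Hypothetical class names and raw scores for a single leaf image.
class_names = ["healthy", "early_blight", "yellow_leaf_curl_virus"]
scores = [0.3, 2.9, 0.8]
actual = "early_blight"

predicted, confidence = predict_with_confidence(scores, class_names)
print(f"Actual: {actual} | Predicted: {predicted} | Confidence: {confidence:.1%}")
```

Reporting the actual label next to the predicted label and confidence makes it easy to spot cases where the model is confidently wrong.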
4.2 METHODOLOGY AND ALGORITHM

Step 1: Loading the dataset of 54,303 healthy and diseased leaf images split into 38
species and disease categories.

Link for dataset: https://github.com/spMohanty/PlantVillage-Dataset/tree/master/raw/color

Fig 5.1: Plant Village Dataset

Step 2: Dividing the imported dataset into three parts:


• Training
• Testing
• Validation.
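
A minimal sketch of this step, assuming the 70/20/10 proportions stated in the architecture description. The function name `split_dataset`, the random seed and the file names are hypothetical; the project's actual loading code is not shown in the report.

```python
import random

def split_dataset(items, train_frac=0.7, test_frac=0.2, seed=42):
    """Shuffle items and split them into train, test and validation lists.

    The validation set receives whatever remains after the train and test
    fractions are taken (10% with the default fractions).
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    val = shuffled[n_train + n_test:]
    return train, test, val

# Hypothetical file names standing in for the leaf images.
image_paths = [f"leaf_{i:05d}.jpg" for i in range(1000)]
train, test, val = split_dataset(image_paths)
print(len(train), len(test), len(val))
```

Shuffling before splitting avoids putting whole disease categories into a single subset when the files are stored class by class.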

Step 3: Applying machine learning models:


• Logistic Regression
• Linear Discriminant Analysis
• K Nearest Neighbors
• Decision Tree Classifier
• Random Forest Classifier
• Gaussian Naive Bayes
• Support Vector Machine

• Logistic Regression: It is a supervised learning algorithm used for classification
in machine learning. It is a statistical model that predicts the probability of
a binary outcome (0 or 1) from a set of independent variables.
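
To make the idea concrete, the sketch below fits a logistic regression by plain gradient descent on a toy one-feature dataset. The feature (a made-up "fraction of discoloured pixels"), the learning rate and the number of epochs are assumptions for illustration and do not come from the project.

```python
import math

def sigmoid(z):
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def predict_proba(features, weights, bias):
    """Probability that the outcome is 1 for one data point."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)

def train_logistic_regression(X, y, lr=0.5, epochs=5000):
    """Fit weights and bias by gradient descent on the mean log-loss."""
    weights = [0.0] * len(X[0])
    bias = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(weights)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            error = predict_proba(xi, weights, bias) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += error * xij
            grad_b += error
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

# Toy data: 0 = healthy, 1 = diseased (illustrative values only).
X = [[0.05], [0.10], [0.15], [0.60], [0.70], [0.85]]
y = [0, 0, 0, 1, 1, 1]
weights, bias = train_logistic_regression(X, y)

p_low = predict_proba([0.08], weights, bias)   # little discolouration
p_high = predict_proba([0.80], weights, bias)  # heavy discolouration
print(p_low, p_high)
```

A data point is assigned to class 1 when its predicted probability exceeds 0.5.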

• Linear Discriminant Analysis (LDA): It is a supervised classification technique in
machine learning that identifies a linear combination of features that best
discriminates between classes. LDA is also commonly employed as a dimensionality-reduction
step in data preprocessing to map a high-dimensional dataset into a lower-dimensional
space.

LDA first calculates the mean and covariance of each class in the data. It
then seeks a linear discriminant function that maximises the ratio of the variance between
the classes to the variance within them. In other words, the goal is to project the data onto a
lower-dimensional space in which the classes are as far apart as possible and each
class is as compact as possible.
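
For two classes this criterion has a closed-form solution: the projection direction is the inverse of the within-class scatter matrix multiplied by the difference of the class means. The sketch below works this out for two features in plain Python. The data points and the midpoint threshold rule are illustrative assumptions, not the project's code.

```python
def mean_vector(rows):
    """Per-feature mean of a list of 2-feature rows."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]

def scatter_matrix(rows, mean):
    """2x2 scatter matrix: sum of outer products of deviations from the mean."""
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - mean[0], r[1] - mean[1]]
        for i in range(2):
            for j in range(2):
                s[i][j] += d[i] * d[j]
    return s

def fisher_direction(class0, class1):
    """Return w = Sw^-1 (m1 - m0) together with the two class means."""
    m0, m1 = mean_vector(class0), mean_vector(class1)
    s0, s1 = scatter_matrix(class0, m0), scatter_matrix(class1, m1)
    sw = [[s0[i][j] + s1[i][j] for j in range(2)] for i in range(2)]
    det = sw[0][0] * sw[1][1] - sw[0][1] * sw[1][0]
    inv = [[sw[1][1] / det, -sw[0][1] / det],
           [-sw[1][0] / det, sw[0][0] / det]]
    diff = [m1[0] - m0[0], m1[1] - m0[1]]
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    return w, m0, m1

def project(point, w):
    """Project a 2-feature point onto the discriminant direction."""
    return point[0] * w[0] + point[1] * w[1]

# Made-up two-feature measurements for two classes.
healthy = [[1.0, 2.0], [1.5, 1.8], [2.0, 2.2], [1.2, 2.5]]
diseased = [[4.0, 4.5], [4.5, 5.0], [5.0, 4.2], [4.2, 5.5]]

w, m0, m1 = fisher_direction(healthy, diseased)
threshold = (project(m0, w) + project(m1, w)) / 2  # midpoint of projected means

def classify(point):
    """Return 1 (diseased) if the projection lies above the threshold, else 0."""
    return 1 if project(point, w) > threshold else 0

print([classify(p) for p in healthy + diseased])
```

After projection each sample is described by a single number, which is the dimensionality reduction mentioned above.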

• K-nearest neighbors (KNN): It is a machine learning algorithm used for both
regression and classification problems. It is a non-parametric algorithm, which means
that it does not make any assumptions about the underlying data distribution.

Given an input data point, the algorithm finds the k closest data points (i.e.,
neighbors) in the training set and predicts the class or value of the input from the
classes or values of those neighbors, typically by majority vote for classification or by
averaging for regression.

The value of k is a hyperparameter that has to be chosen in advance rather than learned
from the data.
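
A minimal sketch of KNN classification, assuming Euclidean distance and a simple majority vote. The points and labels are made up for illustration.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Predict the label of `query` by majority vote among its k nearest neighbours."""
    distances = sorted(
        (math.dist(point, query), label)
        for point, label in zip(train_points, train_labels)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return Counter(nearest_labels).most_common(1)[0][0]

# Made-up two-feature training points.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["healthy", "healthy", "healthy", "diseased", "diseased", "diseased"]

print(knn_predict(points, labels, (1.1, 1.0), k=3))
print(knn_predict(points, labels, (4.0, 4.0), k=3))
```

Choosing an odd k for a two-class problem avoids tied votes.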

• Decision tree classifier: It is a supervised learning algorithm used for classification.
The algorithm aims to construct a model that predicts the class of an input data point
by traversing a series of decisions, each of which is based on the values of the
features of the data point.
The algorithm operates, at a high level, by recursively dividing the data into smaller
and smaller subsets until the subsets are pure (i.e., all data points in a subset belong
to the same class) or some stopping criterion is reached. The resulting tree can be
thought of as a set of rules that can be applied to classify new pieces of data.
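
One step of this recursive division can be sketched as follows. The sketch assumes Gini impurity as the purity measure and a single numeric feature; both are illustrative choices, and the feature values are made up.

```python
from collections import Counter

def gini(labels):
    """Gini impurity: 0.0 for a pure subset, larger for mixed subsets."""
    total = len(labels)
    if total == 0:
        return 0.0
    return 1.0 - sum((count / total) ** 2 for count in Counter(labels).values())

def best_threshold(values, labels):
    """Return (threshold, weighted_impurity) of the best split `value <= threshold`."""
    best = (None, float("inf"))
    # Skip the largest value: splitting there would leave the right side empty.
    for threshold in sorted(set(values))[:-1]:
        left = [l for v, l in zip(values, labels) if v <= threshold]
        right = [l for v, l in zip(values, labels) if v > threshold]
        impurity = (len(left) * gini(left) + len(right) * gini(right)) / len(labels)
        if impurity < best[1]:
            best = (threshold, impurity)
    return best

# Illustrative feature (assumption): fraction of brown pixels per leaf image.
values = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
labels = ["healthy"] * 3 + ["diseased"] * 3
split = best_threshold(values, labels)  # both sides are pure at threshold 0.3
```

A full tree applies the same search to every feature and then recurses into the left and right subsets.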
• Random Forest Classifier: It is an ensemble machine learning method for classification.
It is a collection of decision trees that work collectively as a forest to make a
prediction. Every decision tree in the forest is constructed from a random sample of the
training data and a random sample of the features.
The algorithm starts by growing several decision trees on different subsets of the
training data and different subsets of the features. Each tree is constructed by recursive
partitioning: at every node, the data is divided using the split that maximizes the
information gain. Splitting continues until the tree satisfies a stopping criterion, such
as a maximum tree depth or a minimum number of samples required to split a node. The
forest then classifies a new sample by combining the predictions of its trees, typically
by majority vote.
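
The two sources of randomness and the final vote can be sketched as below. This is a simplified illustration rather than the project's code: the "trees" are hypothetical stand-in functions, not fitted decision trees.

```python
import random
from collections import Counter

def bootstrap_sample(rows, rng):
    """Draw len(rows) rows with replacement (one tree's training sample)."""
    return [rng.choice(rows) for _ in rows]

def random_feature_subset(n_features, n_selected, rng):
    """Pick the feature indices that one tree is allowed to use."""
    return sorted(rng.sample(range(n_features), n_selected))

def forest_predict(trees, sample):
    """Majority vote over the predictions of the individual trees."""
    votes = Counter(tree(sample) for tree in trees)
    return votes.most_common(1)[0][0]

# Hypothetical stand-ins for three trained trees (assumption).
trees = [
    lambda s: "diseased" if s[0] > 0.5 else "healthy",
    lambda s: "diseased" if s[1] > 0.5 else "healthy",
    lambda s: "healthy",
]
rng = random.Random(0)
sample_rows = bootstrap_sample(list(range(10)), rng)
feature_ids = random_feature_subset(n_features=8, n_selected=3, rng=rng)
vote = forest_predict(trees, (0.9, 0.8))  # two of three trees say "diseased"
```

Because each tree sees different rows and features, their errors are less correlated, which is what makes the vote more reliable than any single tree.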

• Gaussian Naive Bayes: It is a probabilistic classifier in machine learning. It is derived
from Bayes' theorem and makes the "naive" assumption that all features are independent
of one another given the class, which makes it computationally inexpensive.

In Gaussian Naive Bayes, each feature is assumed to follow a Gaussian (normal)
distribution within each class. The algorithm computes the posterior probability for each
class, given the input features, and returns the class with the highest posterior
probability as the output.
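
A compact sketch of this procedure is shown below, assuming numeric features and using log-probabilities for numerical stability. The sample values are invented for illustration, and the small variance constant is an assumption added to avoid division by zero.

```python
import math
from collections import defaultdict

def fit_gaussian_nb(samples, labels):
    """Estimate the prior, feature means, and feature variances of each class."""
    grouped = defaultdict(list)
    for sample, label in zip(samples, labels):
        grouped[label].append(sample)
    model = {}
    for label, rows in grouped.items():
        columns = list(zip(*rows))
        means = [sum(col) / len(col) for col in columns]
        # A small constant keeps every variance strictly positive.
        variances = [
            sum((x - m) ** 2 for x in col) / len(col) + 1e-9
            for col, m in zip(columns, means)
        ]
        model[label] = (len(rows) / len(samples), means, variances)
    return model

def predict_gaussian_nb(model, sample):
    """Return the class with the highest log-posterior (up to a shared constant)."""
    def log_posterior(prior, means, variances):
        log_likelihood = sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(sample, means, variances)
        )
        return math.log(prior) + log_likelihood
    return max(model, key=lambda label: log_posterior(*model[label]))

# Illustrative two-feature data (assumption).
samples = [(1.0, 2.0), (1.2, 1.8), (0.8, 2.2), (4.0, 5.0), (4.2, 4.8), (3.8, 5.2)]
labels = ["healthy"] * 3 + ["diseased"] * 3
nb_model = fit_gaussian_nb(samples, labels)
nb_prediction = predict_gaussian_nb(nb_model, (1.1, 2.0))
```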

• Support vector machines (SVM): It is a supervised learning algorithm for classification
and regression. For classification, SVM looks for the hyperplane, possibly in a very
high-dimensional feature space, that separates the classes with the largest margin.
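
For linearly separable data with labels y_i in {-1, +1}, the search for the separating hyperplane is usually stated as the hard-margin problem below. The symbols w (normal vector), b (offset), and n (number of training samples) are notation introduced here for illustration.

```latex
\min_{\mathbf{w},\, b} \; \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^{2}
\quad \text{subject to} \quad
y_i \,(\mathbf{w}^{\top}\mathbf{x}_i + b) \ge 1, \qquad i = 1, \dots, n
```

The margin between the two classes is 2 / ||w||, so minimizing ||w|| maximizes the separation. Soft-margin and kernel variants relax the separability assumption.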

Step 4: Calculating performance measures:

• Accuracy
• Precision
• Recall
• F1-score
• Accuracy:
Accuracy is one of the standard performance measures used in machine learning to assess
the quality of a classifier. It is the proportion of correct predictions out of the total
number of predictions made by the model.

For instance, if a model predicts 80 out of 100 samples correctly, its accuracy is
80 percent. It is calculated as follows:

accuracy = (number of correct predictions) / (total number of predictions)

• Precision:
Precision (in machine learning) is a metric that quantifies the proportion of correctly
classified positive examples out of all the examples that are classified as positive. It is
also known as positive predictive value (PPV). Precision is a useful metric when the
cost of false positives is high, because every false positive lowers it.

The precision is calculated using the following formula:

precision = TP / (TP + FP) – (1)

where TP is the number of true positives and FP is the number of false positives. True
positives are the instances that are actually positive and are correctly identified as
positive, while false positives are the instances that are actually negative but are
incorrectly identified as positive. A high precision score implies that few of the model's
positive predictions are wrong, in other words that a positive prediction can be trusted.
• Recall:
In machine learning, recall (also known as sensitivity or true positive rate, TPR) is a
measure of the model's ability to return all the relevant instances of a given class.
Put another way, recall is the proportion of all the positive examples in the data that
the model correctly identifies. A high recall value means that the model finds most of the
positives, and a low recall value means that the model misses many of them.
The formula for recall is:
recall = TP / (TP + FN) – (2)

where TP is the number of true positives (correctly classified positive instances), and FN is
the number of false negatives (positive instances that were falsely classified as negative).
• F1-Score:
The F1-score is a measure that is very often used to assess the performance of a binary
classifier. It is the harmonic mean of precision and recall, a single number that
weighs both.

The formula for F1-score is:

F1-score = 2 * (precision * recall) / (precision + recall) – (3)

where precision is the number of true positives divided by the number of true positives
plus false positives, and recall is the number of true positives divided by the number of
true positives plus false negatives.
The F1-score is a value between 0 and 1, with 1 indicating perfect precision and recall and
0 indicating very poor performance. It is a useful metric when the data is imbalanced
(i.e. one class is far more common than the other), since it takes into account both false
positives and false negatives.
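
The four measures above can be computed directly from the counts of true and false positives and negatives. The sketch below follows formulas (1) to (3); the label vectors are invented for illustration, and returning 0.0 when a denominator is zero is an assumption of this sketch.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, FP, FN, TN with `positive` treated as the positive class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == positive and a == positive:
            tp += 1
        elif p == positive:
            fp += 1
        elif a == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn

def classification_metrics(actual, predicted, positive=1):
    """Accuracy, precision, recall, and F1-score for a binary problem."""
    tp, fp, fn, tn = confusion_counts(actual, predicted, positive)
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative labels (assumption): 1 = diseased, 0 = healthy.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
metrics = classification_metrics(actual, predicted)
# TP = 3, FP = 1, FN = 1, TN = 5: accuracy is 0.8; precision, recall, and F1 are 0.75.
```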
Step 5: Configure the algorithm for the dataset.

Convolutional Neural Network (CNN)

Step 6: Testing & Validation

The type of validation used in this project is "split validation".
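
Split validation can be sketched as a single shuffled partition of the dataset. The 70/20/10 proportions follow the algorithm table in this report; the function name, the fixed seed, and the integer stand-ins for images are assumptions made for illustration.

```python
import random

def split_dataset(items, train=0.7, test=0.2, seed=42):
    """Shuffle and split into train, test, and validation parts.

    The validation part receives whatever remains after train and test.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(len(shuffled) * train)
    n_test = round(len(shuffled) * test)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_test],
        shuffled[n_train + n_test:],
    )

# 100 integers stand in for 100 labelled leaf images.
train_set, test_set, validation_set = split_dataset(range(100))
```

Fixing the seed keeps the partition reproducible, so that different models are compared on exactly the same held-out samples.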

Step 7: Output (actual label and predicted label).


ALGORITHM

Step  Process

1     Imported libraries
2     Preprocessing of images: conversion of images to HSV format
3     Image segmentation: extraction of green and brown segments
4     Feature scaling: used MinMaxScaler so that all feature values lie in the same
      range
5     Feature extraction: extracted features from leaves such as texture, color, and
      outline
6     Training and testing: divided the dataset into 70% for training, 20% for
      testing, and 10% for validation
7     Models:
      • Logistic Regression: models the probability of a discrete outcome given the
        input variables.
      • Linear Discriminant Analysis (LDA): a supervised learning technique applied
        to machine learning classification problems.
      • K Nearest Neighbors (KNN or k-NN): a non-parametric supervised learning
        classifier that uses proximity to classify or predict individual data points.
      • Decision Tree Classifier: builds a tree of decisions on feature values whose
        leaves assign the class label.
      • Random Forest Classifier: a meta estimator that fits several decision tree
        classifiers on different subsamples of the dataset and averages them to
        increase prediction accuracy and control overfitting.
      • Gaussian Naive Bayes: a probabilistic classifier based on Bayes' theorem that
        assumes each feature follows a Gaussian distribution.
      • Support Vector Machine (SVM): a machine learning algorithm used for linear or
        nonlinear classification, regression, and outlier detection.

8     k-fold cross-validation: evaluates the performance of multiple machine learning
      models on the given dataset
9     Performance evaluation: evaluates each model on accuracy, F1-score, precision,
      and recall
10    Prediction using the best-performing classifier, which is the Random Forest
      Classifier
11    Classification using CNN, which outputs the actual disease, the predicted
      disease, and the confidence
Table 6.1: Algorithm
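
Steps 2 to 4 of the table can be illustrated on individual pixels and feature values. This is a hedged sketch using only the Python standard library: the hue, saturation, and value thresholds for "green" are hypothetical, and the report itself uses MinMaxScaler for scaling, which the last function merely imitates for one feature.

```python
import colorsys

def rgb_to_hsv_pixel(r, g, b):
    """Convert 8-bit RGB to HSV with hue in degrees and S, V in [0, 1]."""
    h, s, v = colorsys.rgb_to_hsv(r / 255.0, g / 255.0, b / 255.0)
    return h * 360.0, s, v

def is_green(h, s, v, hue_range=(60.0, 180.0), min_saturation=0.25, min_value=0.2):
    """Hypothetical threshold test for green (healthy) leaf tissue."""
    return hue_range[0] <= h <= hue_range[1] and s >= min_saturation and v >= min_value

def min_max_scale(values):
    """Rescale one feature linearly to [0, 1]; a constant feature maps to 0.0."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]
    return [(v - low) / (high - low) for v in values]

green_pixel = rgb_to_hsv_pixel(0, 255, 0)    # hue of about 120 degrees
brown_pixel = rgb_to_hsv_pixel(139, 69, 19)  # hue of about 25 degrees
scaled = min_max_scale([2.0, 4.0, 6.0])
```

Applying `is_green` to every pixel yields a mask of the green segment; an analogous test with a lower hue range would yield the brown segment.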
CHAPTER 5 RESULTS AND DISCUSSION

Fig 8.1: Result with RF Classifier

Fig 8.2: Comparison of accuracy of machine learning models


Fig 8.3: Performance Evaluation

Fig 8.4: Result with CNN model (1)
Fig 8.5: Result with CNN model (2)
Fig 8.6: Output with actual label, predicted label, and confidence.
CHAPTER 6 CONCLUSION AND FUTURE SCOPE

In conclusion, our study aimed to develop a solution for detecting plant diseases using
machine learning algorithms. We created a database of images of healthy crops and
crops with several different diseases, and trained several models, including a Random
Forest (RF) classifier and a Convolutional Neural Network (CNN), to classify the
images as healthy or diseased. Our results showed that the RF classifier had the highest
accuracy in identifying healthy and diseased plants.
To further refine the disease identification, we took the images identified as diseased
by the RF classifier and ran the CNN on them to obtain the name of the specific disease.
This hybrid approach, which combines the strengths of RF and CNN, gives farmers a quick
and precise tool to diagnose plant diseases, leaving enough time to take appropriate
preventive measures and avoid crop losses.
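
The two-stage flow can be sketched as a small function that screens an image first and names the disease only when the screen flags it. The function name, the dictionary layout, and the two stand-in models are hypothetical; in the project, the first stage is the trained RF classifier and the second is the CNN.

```python
def diagnose(image, is_diseased, name_disease):
    """Two-stage diagnosis: screen first, then name the disease if needed."""
    if not is_diseased(image):
        return {"status": "healthy", "disease": None, "confidence": None}
    disease, confidence = name_disease(image)
    return {"status": "diseased", "disease": disease, "confidence": confidence}

# Hypothetical stand-ins for the trained models (assumption).
def fake_rf_screen(image):
    return image["brown_fraction"] > 0.2

def fake_cnn(image):
    return "disease_A", 0.93  # placeholder label and confidence

flagged = diagnose({"brown_fraction": 0.4}, fake_rf_screen, fake_cnn)
cleared = diagnose({"brown_fraction": 0.05}, fake_rf_screen, fake_cnn)
```

Running the CNN only on flagged images keeps the more expensive model off the healthy majority of inputs.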
This solution has the potential to significantly reduce crop losses caused by plant
diseases, improve food security for farmers, and promote sustainable agriculture
practices. For future studies, one could enlarge the dataset by adding more species of
plants and diseases, and also train the CNN model more effectively to recognise
particular diseases. In sum, our work shows how machine learning algorithms hold
promise to change plant disease detection and agricultural practices.
Because machine learning requires knowledge of computer programming, statistics, and data
evaluation, the future scope of this work can also extend to leadership roles in
automation or analytics environments that use data science, big data analysis, and
AI integration.
The primary goal of creating this machine learning model is to help protect crops from
plant diseases.
With minor upgrades, this model can be applied to several other agricultural
experiments, and can also be deployed as field equipment that enables farmers
to rescue their crops from diseases at a very early stage.
Integration with precision agriculture: Machine learning models can be integrated with
precision agriculture tools, such as drones and sensors, to detect plant diseases on
a large scale. This will enable farmers to monitor their crops continuously and take
appropriate measures to stop the spread of disease.
Transfer learning: Transfer learning is a machine learning technique that involves
training a model on one task and transferring its knowledge to another related task.
A model pretrained on a large, general-purpose image dataset can be fine-tuned on a
comparatively small plant disease dataset, which reduces the amount of labelled data
needed.
Use of advanced techniques: As machine learning progresses, new approaches like
deep learning, reinforcement learning, and generative adversarial networks (GANs) can
be applied to detect plant disease. These sophisticated methods can help to detect disease
more accurately, and with fewer false positives.
Multi-spectral imaging: Multi-spectral imaging is a method that employs sensors to
take pictures of plants at several different wavelengths of light. This technique can be
used to capture more detailed images of plants, which can be used to detect plant diseases
more accurately.
Use of IoT: The Internet of Things (IoT) can be applied to develop smart farms that can
detect and manage plant diseases via machine learning. Sensors that measure
environmental factors such as humidity, temperature, and soil moisture can be used
to forecast the onset of diseases.
Integration with crop management systems: Machine learning systems can be
embedded in crop management platforms to give farmers on-the-ground
recommendations. For instance, the model can suggest the best moment to spray
pesticides or fungicides, depending on the likelihood of plant diseases.
Overall, the future of plant disease detection using machine learning is promising, and
there is still much room for improvement in this field.
REFERENCES

[1] J. Shirahatti, R. Patil, and P. Akulwar, “A survey paper on plant disease identification using machine
learning approach,” in 2018 3rd International Conference on Communication and Electronics Systems
(ICCES). IEEE, 2018, pp. 1171–1174.
[2] CS Arvind et al. “Deep Learning Based Plant Disease Classification With Explainable AI and Mitigation
Recommendation”. In: 2021 IEEE Symposium Series on Computational Intelligence (SSCI). Dec. 2021,
pp. 01–08.
[3] Quan Huu Cap et al. “LeafGAN: An Effective Data Augmentation Method for Practical Plant Disease
Diagnosis”. In: IEEE Transactions on Automation Science and Engineering 19.2 (Apr. 2022),
pp. 1258–1267. ISSN: 1558-3783.
[4] Uday Pratap Singh et al. “Multilayer Convolution Neural Network for the Classification of Mango Leaves
Infected by Anthracnose Disease”. In: IEEE Access 7 (2019), pp. 43721–43729.
[5] Amer Tabbakh and Soubhagya Sankar Barpanda. “A Deep Features Extraction Model Based on the
Transfer Learning Model and Vision Transformer “TLMViT” for Plant Disease Classification”. In: IEEE
Access 11 (2023), pp. 45377–45392. ISSN: 2169-3536.
[6] K. M. Hasib, F. Rahman, R. Hasnat, and M. G. R. Alam, “A machine learning and explainable ai approach
for predicting secondary school student performance,” in 2022 IEEE 12th Annual Computing and
Communication Workshop and Conference (CCWC), 2022, pp. 0399– 0405.
[7] Ko Ko Zaw, Dr. Zin Ma Ma Myo, Daw Thae Hsu Thoung, “Support Vector Machine Based Classification
of Leaf Diseases”, International Journal of Science and Engineering Applications, 2018.
[8] Godliver Owomugisha, Friedrich Melchert, Ernest Mwebaze, John A Quinn and Michael Biehl, “Machine
Learning for diagnosis of disease in plants using spectral data”, Int'l Conf. Artificial Intelligence (2018).
[9] Daglarli, Evren, “Explainable Artificial Intelligence (xAI) Approaches and Deep Meta-Learning Models
for Cyber-Physical Systems”, Artificial Intelligence Paradigms for Smart Cyber-Physical Systems, edited
by Ashish Kumar Luhach and Atilla Elçi, IGI Global, 2021, pp. 42-67.
[10] Adi Dwifana Saputra , Djarot Hindarto, Handri Santoso, “Disease Classification on Rice Leaves using
DenseNet121, DenseNet169, DenseNet201”, Sinkron : Jurnal dan Penelitian Teknik Informatika Volume
8, Issue 1, January 2023, DOI : https://doi.org/10.33395/sinkron.v8i1.11906
CONFERENCE PRESENTATION

Our paper titled "INTELLIGENT PLANT DISEASE DIAGNOSIS WITH
EXPLAINABLE AI METHODS AND LIGHTWEIGHT MODEL" was
presented at the ICOFE-2024 conference held at SRM. More than 200 shortlisted teams
presented papers in various fields at the conference. Our paper was accepted
with a plagiarism score of 7%.

Figure A.1: ICOFE-2024 Acceptance

On presenting the paper at this international conference held at the SRM KTR campus, we
received positive remarks and suggestions from the judging panel.
Intelligent Plant Disease Diagnosis with
Explainable AI Methods and Lightweight Model
ISHAN JOSHI
CTech
SRM Institute of Science and Technology
Chennai, India
id8400@srmist.edu.in

NAMAN MARDIA
CTech
SRM Institute of Science and Technology
Chennai, India
na4197@srmist.edu.in

Dr. R. VIDHYA
Associate Professor
CTech
SRM Institute of Science and Technology
Chennai, India
vidhyar@srmist.edu.in

Abstract: The agricultural sector is a key driver of a nation's economic growth,
especially in India, where it serves as a primary source of livelihood for millions in
rural areas. One of the major challenges facing agriculture is plant diseases, which can
be triggered by a variety of factors such as synthetic fertilizers, outdated farming
practices, and environmental conditions. These diseases can severely impact crop yield,
ultimately affecting the economy. To tackle this issue, researchers have increasingly
turned to AI and Machine Learning techniques for plant disease detection. This research
survey provides an in-depth review of common plant leaf diseases, evaluates both
traditional and deep learning approaches for disease identification, and highlights
available datasets. Additionally, it investigates the role of Explainable AI (XAI) in
improving the transparency of deep learning models, making their decisions more
interpretable for end-users. By synthesizing this knowledge, the survey offers valuable
insights for researchers, practitioners, and stakeholders, driving the development of
effective and transparent solutions for managing plant diseases and promoting sustainable
agriculture.

I. INTRODUCTION
Despite advances in agricultural technology, plant disease remains a significant
challenge, leading to substantial crop losses globally and threatening food security.
Traditional methods of plant disease detection are highly dependent on human expertise,
making them prone to errors, subjective bias, and inefficiencies, particularly in
large-scale farming operations. The lack of timely and accurate diagnosis often results
in delayed treatment, exacerbating the spread of diseases and further reducing crop
yields.

The advent of machine learning and computer vision offers a promising solution by
enabling automated disease detection systems. However, current models often function as
"black boxes," providing little to no explanation of their decision-making processes.
This lack of transparency can hinder trust and adoption among farmers and agronomists
who need to understand and validate the system’s recommendations. Furthermore, these
models may struggle to generalize across different environmental conditions.

To address these challenges, there is a critical need for an intelligent plant disease
diagnosis system that not only provides high accuracy but also incorporates explainable
AI methods. This system should be capable of offering clear, interpretable explanations
for its predictions, thereby empowering users to make informed decisions about disease
management. Additionally, the system must be robust, scalable, and adaptable to various
agricultural settings to ensure broad applicability and effectiveness.

In this survey paper, we plan to summarize the various commonly occurring leaf diseases
that infect plants and the available datasets and state-of-the-art techniques for
detecting infected leaf diseases. Furthermore, we intend to introduce Explainable AI
(XAI) in plant leaf-based disease detection and classification. The goal is to enhance
the transparency and interpretability of deep learning models by generating XAI based
solutions tailored explicitly for CNN and Transformer models. The study also underscores
the motivation for using XAI in plant leaf disease detection and highlights possible
future research directions.

II. OVERVIEW ON LEAF BASED PLANT DISEASE DETECTION
All living organisms, including plants, animals, and humans, are vulnerable to diseases.
Researchers and professionals in agricultural science and management are actively
searching for advanced solutions to mitigate plant disease outbreaks, which can cause
significant damage to agricultural productivity. To address this, various scientific
disciplines collaborate to control the spread of plant leaf diseases and ensure a stable
food supply for the world’s growing population.

Plant diseases can manifest through various symptoms that compromise a plant’s
structural components, such as leaves, stems, and roots, ultimately affecting its ability
to grow, reproduce, or yield effectively. The occurrence of these diseases varies
seasonally, influenced by changes in weather conditions and the presence of specific
pathogens.

This section is organized into three parts, discussing common leaf diseases, available
datasets, and highlighting
C. PLAGIARISM REPORT
