
Crop Disease Diagnosis and Remediation

A Minor Project Report Submitted To

Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal


Towards Partial Fulfilment for the Award Of

Bachelor of Technology

In

COMPUTER SCIENCE AND ENGINEERING

Submitted By

Kumari Lalini (0863CS211086)

Khushboo Rathore (0863CS211075)

Pranjal Panchal (0863CS211113)

Priyanka Bijore (0863CS211117)

Under the Supervision of

Dr. Amita Jain


Session: 2024 (Jan – June)

Department of Computer Science and Engineering,

Prestige Institute of Engineering, Management and Research, Indore (M.P.)


[An Institution Approved By AICTE, New Delhi & Affiliated To RGPV, Bhopal]
PRESTIGE INSTITUTE OF ENGINEERING MANAGEMENT
AND RESEARCH INDORE (M.P.)

DECLARATION

We, Khushboo Rathore, Kumari Lalini, Pranjal Panchal and Priyanka Bijore, hereby declare that
the project entitled "Early Detection of Diabetic Retinopathy Using Deep Learning", which is
submitted by us in partial fulfilment of the requirement for the award of Bachelor of
Technology in Computer Science & Engineering to the Prestige Institute of Engineering,
Management and Research, Indore (M.P.), affiliated to Rajiv Gandhi Proudyogiki Vishwavidyalaya,
Bhopal, comprises our own work, and due acknowledgement has been made in the text to all other
material used.

Signature of Students:

Date: May 2024

Place: Indore

PRESTIGE INSTITUTE OF ENGINEERING MANAGEMENT


AND RESEARCH

INDORE (M.P.)
DISSERTATION APPROVAL SHEET

This is to certify that the dissertation entitled "Early Detection of Diabetic Retinopathy using

deep learning" submitted by Kumari Lalini (0863CS211086), Khushboo Rathore

(0863CS211075), Pranjal Panchal (0863CS211113) and Priyanka Bijore (0863CS211117) to the

Prestige Institute of Engineering, Management and Research, Indore (M.P.) is approved in partial

fulfilment for the award of the degree of "Bachelor of Technology in Computer Science &

Engineering" by Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal (M.P.).

Internal Examiner External Examiner

Date: Date:

HOD, CSE

Dr Piyush Choudhary

PIEMR, INDORE

PRESTIGE INSTITUTE OF ENGINEERING MANAGEMENT AND RESEARCH

INDORE (M.P.)

CERTIFICATE

This is to certify that the project entitled "Early Detection of Diabetic Retinopathy using
deep learning" submitted by Kumari Lalini, Khushboo Rathore, Pranjal Panchal and Priyanka
Bijore is a satisfactory account of the bona fide work done under our supervision and is
recommended towards partial fulfilment for the award of the degree of Bachelor of Technology
in Computer Science & Engineering of Rajiv Gandhi Proudyogiki Vishwavidyalaya, Bhopal
(M.P.).
Date:

Enclosed by:

Dr. Amita Jain              Prof. Ajay Jaiswal              Dr. Piyush Choudhary
Project Guide               Project Coordinator             Professor & Head, CSE

Dr. Manojkumar Deshpande

Director

PIEMR, Indore

PRESTIGE INSTITUTE OF ENGINEERING MANAGEMENT AND RESEARCH

INDORE (M.P.)

ACKNOWLEDGEMENT

After the completion of the Minor Project work, words are not enough to express our feelings
about all those who helped us reach our goal; above all is our indebtedness to the
Almighty for providing us with this moment in life.

First and foremost, we take this opportunity to express our deep regards and heartfelt
gratitude to our project guide Dr. Amita Jain and Project Coordinator Prof. Ajay Jaiswal,
Department of Computer Science and Engineering, PIEMR, Indore, for their inspiring
guidance and timely suggestions in carrying out our project successfully. They have also been a
constant source of inspiration for us. Working under their guidance has allowed us to learn
more and more.

We are extremely thankful to Dr. Piyush Choudhary (HOD, CSE) for his cooperation and
motivation during the project. We extend our deepest gratitude to Dr. Manojkumar
Deshpande, Director, PIEMR, Indore, for providing all the necessary facilities and a truly
encouraging environment to bring out the best of our endeavours.

We would like to thank all the teachers in our department for providing invaluable support
and motivation. We remain indebted to all the non-teaching staff of our Institute, who have
helped us immensely throughout the project.

We are also grateful to our friends and colleagues for their help and cooperation throughout
this work. Last but not least, we thank our families for their support, patience, blessings and
understanding while we completed our project.

Name of Students:

Kumari Lalini (0863CS211086)

Khushboo Rathore (0863CS211075)

Pranjal Panchal (0863CS211113)

Priyanka Bijore (0863CS211117)

INDEX

Declaration I

Dissertation Approval Sheet II

Certificate III

Acknowledgement IV

Table of Contents V

List of Figures VII

Abbreviations VIII
TABLE OF CONTENTS

CHAPTER 1 INTRODUCTION

1.1 Introduction ……………………………………………………………………. 12

1.2 Motivation ………………………………………………………................... 13

1.3 Objective ………………………………………………………………………… 13

1.4 Analysis …………………………………………………………………………… 14

1.4.1 Functional Requirements ……………………………………………….. 14

1.4.2 Non-functional Requirements …………………………………………. 15

1.4.3 Use Case Diagram …………………………………………………………. 16

CHAPTER 2 BACKGROUND AND RELATED WORK

2.1 Problem Statement ……………………………………………………………….. 18

2.2 Background and Related Work …………………………………………………. 18

2.2.1 Background Work …………………………………………………….….. 18

2.2.2 Literature Survey ……………………………………………………..….. 19

2.3 Solution Approach …………………………………….…………………………..… 20

CHAPTER 3
3 SYSTEM DESIGN & METHODOLOGY 6

3.1 EXISTING SYSTEM 6

3.2 PROPOSED SYSTEM 6


3.3 REQUIREMENT SPECIFICATION 6

3.4 TECHNOLOGIES USED 7

3.4.1 INTRODUCTION TO PYTHON 7


3.4.2 INTRODUCTION TO RANDOM FOREST 10
CHAPTER 4 SOFTWARE DESIGN AND IMPLEMENTATION

4.1 DESIGN AND IMPLEMENTATION CONSTRAINTS 24

4.2 OTHER NONFUNCTIONAL REQUIREMENTS 24

4.3 ARCHITECTURE DIAGRAM 25


4.4 SEQUENCE DIAGRAM 26

4.5 USE CASE DIAGRAM 26


4.6 ACTIVITY DIAGRAM 27

4.7 COLLABORATION DIAGRAM 28

4.8 MODULES 29
4.9 CODING AND TESTING 31

4.10 TEST DATA AND OUTPUT 32

4.11 TESTING TECHNIQUES 3

CHAPTER 5 RESULT AND DISCUSSION

5.1 Result

CHAPTER 6: CONCLUSION AND FUTURE WORK

6.1 CONCLUSION
6.2 FUTURE WORK
REFERENCES
APPENDIX
A. PAPER WORK
B. REPORT
C. SOURCE CODE
List Of Figures

FIGURE No.   FIGURE NAME   PAGE No.

3.1 RANDOM FOREST 10

3.2 SUPPORT VECTOR MACHINE 14

3.3 CLASSIFICATION OF ANALYSIS 15

3.4 SVM FOR CASE IN HAND 16

3.5 DISTRIBUTION 17

3.6 TRANSFORMATION 17

3.7 SCENARIO-1 18

3.8 SCENARIO-2 19

3.9 MARGIN 19

3.10 SCENARIO-3 20

3.11 SCENARIO-4 20

3.12 CLASSIFY TWO CLASSES 21

3.13 SCENARIO-5 21

3.14 SEGREGATE TWO CLASSES 22

3.15 HYPER-PLANE 23

4.1 ARCHITECTURE DIAGRAM 25

4.2 SEQUENCE DIAGRAM 26

4.3 USE CASE DIAGRAM 27

4.4 ACTIVITY DIAGRAM 28


4.5 COLLABORATION DIAGRAM 29

5.1 WELCOME PAGE 38

5.2 REGISTRATION PAGE 39

5.3 LOG-IN PAGE 39

5.4 HOME PAGE 40

5.5 FILE UPLOAD PAGE 41

5.6 RESULT PAGE 41

5.7 STAGES OF DR 42

5.8 COMPARISON OF DR RESULTS 43


CHAPTER 1
INTRODUCTION
1.1 Introduction

Diabetic Retinopathy (DR) is a common complication of diabetes mellitus which causes
lesions on the retina that affect vision. If it is not detected early, it can lead to blindness.
Unfortunately, DR is not a reversible process, and treatment only sustains vision. Early
detection and treatment of DR can significantly reduce the risk of vision loss. The manual diagnosis
of DR from retinal fundus images by ophthalmologists is time-, effort- and cost-consuming
and, unlike computer-aided diagnosis systems, prone to misdiagnosis.

Transfer learning has become one of the most common techniques for achieving better
performance in many areas, especially in medical image analysis and classification. We used
transfer learning architectures such as Inception V3, ResNet50 and Xception, which are widely
used for transfer learning in medical image analysis and are highly effective. Diabetic
retinopathy (DR) is a significant cause of blindness among diabetic individuals aged 25–65.
It occurs due to lesions on the retina caused by weakened blood vessels, leading to visual
impairment and even total blindness. Current manual grading methods for detecting DR are
both time-consuming and prone to errors.

Diabetes is a metabolic disorder that results in a retinal complication called diabetic
retinopathy (DR), which is one of the four main reasons for sightlessness all over the globe.
DR usually has no clear symptoms before onset, which makes disease identification a
challenging task. The healthcare industry may face unfavourable consequences if the gap in
identifying DR is not filled with effective automation. Thus, our objective is to develop an
automatic and cost-effective method for classifying DR samples. In this work, we present a
custom Faster R-CNN technique for the recognition and classification of DR lesions from retinal
images. After pre-processing, we generate the annotations of the dataset that are required
for model training. Then we introduce DenseNet-65 at the feature-extraction level of Faster
R-CNN to compute a representative set of key points. Finally, the Faster R-CNN localizes and
classifies the input sample into five classes. Rigorous experiments performed on a Kaggle
dataset comprising 88,704 images show that the introduced methodology outperforms,
with an accuracy of 97.2%. We have compared our technique with state-of-the-art
approaches to show its robustness in terms of DR localization and classification. Additionally,
we performed cross-dataset validation on the Kaggle and APTOS datasets and achieved
remarkable results in both the training and testing phases.
In recent years, convolutional neural networks (CNNs) have shown great promise in
automating the identification and categorization of diabetic retinopathy. Companies like
Google AI, IDx-Diabetic Retinopathy, Eyenuk, and VoxelCloud utilize CNNs to detect DR with
high accuracy and speed, saving time and costs compared to manual diagnosis.

Our study aims to develop a CNN-based model using retinal fundus images for early detection
of diabetic retinopathy. We will train the model on a substantial dataset of retinal images
using powerful computing architecture provided by graphics processing units (GPUs). The
model's performance will be evaluated using metrics such as accuracy, sensitivity, and
specificity, enhancing its classification capabilities.

By leveraging deep learning techniques, our proposed solution has the potential to improve
DR detection, enabling faster and more reliable interpretation of retinal images.
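As an illustration of the transfer-learning approach described above, the following is a minimal Keras sketch that fine-tunes a pre-trained ResNet50 on fundus images. The directory paths, image size, epoch count and five-grade labelling are assumptions for illustration and not the project's exact code.

# Minimal transfer-learning sketch (illustrative, not the project's exact code).
import tensorflow as tf

IMG_SIZE = (224, 224)
NUM_CLASSES = 5  # no DR, mild, moderate, severe, proliferative (assumed labelling)

train_ds = tf.keras.utils.image_dataset_from_directory(
    "data/fundus/train", image_size=IMG_SIZE, batch_size=32)   # assumed folder layout
val_ds = tf.keras.utils.image_dataset_from_directory(
    "data/fundus/val", image_size=IMG_SIZE, batch_size=32)

base = tf.keras.applications.ResNet50(
    include_top=False, weights="imagenet", input_shape=IMG_SIZE + (3,))
base.trainable = False  # reuse ImageNet features; the top layers are trained from scratch

inputs = tf.keras.Input(shape=IMG_SIZE + (3,))
x = tf.keras.applications.resnet50.preprocess_input(inputs)
x = base(x, training=False)
x = tf.keras.layers.GlobalAveragePooling2D()(x)
x = tf.keras.layers.Dropout(0.3)(x)
outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(train_ds, validation_data=val_ds, epochs=10)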

1.2 Motivation

Diabetic Retinopathy (DR) is a serious complication of diabetes that affects the eyes. It is the
leading cause of blindness in the working-age population worldwide, estimated to impact
over 347 million people [1]. When blood sugar levels remain uncontrolled for an extended
period, DR can develop. However, early detection and intervention can prevent vision loss.

Here's why this project is crucial:

Global Health Impact: DR affects millions of people, and its prevalence continues to rise. By
using artificial intelligence (AI) and deep learning, we can identify signs of DR from retinal
fundus images, allowing for timely treatment and preventing blindness [1].

Efficiency and Accessibility: Traditional diagnostic methods require skilled doctors and
significant time investment. Automating DR detection using deep learning models reduces the
burden on healthcare professionals and ensures faster, more accessible screening [2].

Precision and Early Intervention: Deep neural networks, such as Convolutional Neural
Networks (CNNs) and Residual Networks (ResNets), excel at analyzing medical images. By
training these models on large datasets, we can achieve high accuracy in identifying DR at an
early stage, enabling timely intervention [3].

Scalability: Once trained, the model can process a large number of images efficiently. This
scalability is essential for population-wide screening and monitoring [4].

1.3 Objectives

The principal aim of this project is to formulate a web-based plant disease detection system
that integrates advanced deep learning techniques, specifically leveraging Convolutional
Neural Networks (CNNs), with the user-friendly interface inherent in a Django web
application. The fundamental objective is to furnish farmers and agricultural stakeholders with
a robust and easily navigable tool for the expeditious and precise identification of plant
diseases from images of their leaves. Through the application of comprehensive data pre-
processing, the project endeavours to optimize model accuracy, thereby ensuring dependable
diagnoses. Ultimately, the system aspires to facilitate real-time, data-driven decision-making
in the realm of agriculture, enabling early disease detection and the timely application of
remedies. In doing so, it endeavours to contribute substantively to the enhancement of crop
yields and the promotion of sustainable farming practices.

1.4 Analysis

Diabetic Retinopathy (DR) is a progressive eye disease caused by diabetes, affecting the blood
vessels in the retina. Left untreated, it can lead to vision impairment and blindness. The
primary objective of this project is to develop an automated system that detects DR at an
early stage using deep learning techniques.

Functional Requirements:

Image Classification Model: Develop a deep learning model capable of classifying retinal
fundus images into two categories: normal (no signs of DR) and abnormal (indicating DR).

Achieve high accuracy and robustness to variations in image quality, lighting, and patient
demographics.

Implement transfer learning using pre-trained CNN architectures (e.g., ResNet, VGG) to
leverage learned features.

Severity Level Prediction:

Extend the model to predict severity levels of DR (mild, moderate, severe, or proliferative).

Fine-tune the model to handle multi-class classification.

Provide interpretable results, highlighting regions in the image that contribute to the severity
prediction.

Scalability and Efficiency: Ensure the model can process a large number of images efficiently.

Optimize inference time for real-time clinical use.

Explore model compression techniques to reduce memory footprint.

Non-Functional Requirements:

Accuracy and Sensitivity: Achieve high sensitivity (recall) to minimize false negatives (missed
cases of DR).

Balance specificity to avoid unnecessary referrals for healthy patients.

Robustness and Generalization: Validate the model on diverse datasets from different clinics
and populations.

Address domain shift by augmenting data and fine-tuning on relevant distributions.

Ethical Considerations: Ensure patient privacy and informed consent during data collection.

Mitigate bias by analyzing model performance across demographic groups.

Transparently communicate the limitations of AI-based diagnosis to healthcare providers.

Clinical Integration: Collaborate with ophthalmologists to validate model predictions against
ground truth.

Develop an intuitive user interface for clinicians to upload images and receive predictions.

Integrate the system into existing clinical workflows.

Scalability and Deployment:

Design the solution to scale across multiple clinics and regions.

Deploy the model on cloud servers or edge devices for widespread adoption.

Interpretability:

Use techniques like Grad-CAM to visualize which regions in the image contribute to the
model's decision.
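The following is a minimal Grad-CAM sketch of the interpretability idea mentioned above. It assumes a Keras model whose ResNet50-style convolutional layers are reachable by name from the model object; the layer name and the helper itself are illustrative assumptions, not the project's code.

# Grad-CAM sketch (illustrative): highlight the retinal regions driving a prediction.
import numpy as np
import tensorflow as tf

def grad_cam(model, image, last_conv_layer="conv5_block3_out"):
    """Return a heatmap (H x W, values in [0, 1]) for the predicted class.
    "conv5_block3_out" assumes a ResNet50 backbone; adjust for other models."""
    grad_model = tf.keras.Model(
        model.inputs,
        [model.get_layer(last_conv_layer).output, model.output])
    with tf.GradientTape() as tape:
        conv_out, preds = grad_model(image[np.newaxis, ...])
        idx = int(tf.argmax(preds[0]))          # predicted class index
        class_channel = preds[:, idx]           # score of that class
    grads = tape.gradient(class_channel, conv_out)   # d(score)/d(feature maps)
    weights = tf.reduce_mean(grads, axis=(0, 1, 2))  # average gradients per channel
    cam = tf.reduce_sum(conv_out[0] * weights, axis=-1)   # weighted sum of feature maps
    cam = tf.nn.relu(cam) / (tf.reduce_max(cam) + 1e-8)   # keep positive evidence, normalise
    return cam.numpy()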

Fig 1.1 Use Case Diagram
CHAPTER 2
BACKGROUND AND RELATED WORK
2.1 Problem Statement
Diabetic retinopathy (DR) is a common complication of diabetes that affects the retina and
can lead to vision loss. Early detection of DR is crucial for timely intervention and preventing
irreversible damage. The goal of this project is to develop an accurate and efficient deep
learning model that can analyze fundus images and classify them based on the severity of DR.
The model should be able to predict whether a patient has no DR, mild DR, moderate DR,
severe DR, or proliferative DR. The project aims to achieve the following objectives:

Data Collection and Preprocessing:

Gather a diverse dataset of fundus images from diabetic patients.

Preprocess the images to enhance features and remove noise.

Model Architecture:

Design a deep neural network architecture suitable for image classification.

Explore various architectures (e.g., CNNs, ResNets) and choose the most effective one.

Training and Validation:

Train the model using labeled data, ensuring proper validation and hyperparameter tuning.

Evaluate the model's performance using metrics such as accuracy, precision, recall, and
F1-score (a short evaluation sketch follows after this list).

Severity Grading:

Implement a grading system to categorize the severity of DR (e.g., a 0–4 scale).

Classify fundus images into appropriate severity levels.

Deployment and Clinical Integration:

Deploy the trained model in a clinical setting (e.g., hospitals, clinics).

Integrate the model into existing healthcare systems for real-time diagnosis.

Ethical Considerations:

Address privacy concerns related to patient data.

Ensure transparency and interpretability of the model's decisions.
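As a small illustration of the evaluation step listed above, the sketch below computes accuracy and per-grade precision, recall and F1 with scikit-learn; the y_true and y_pred arrays are placeholder values, not project results.

# Illustrative evaluation sketch for the five DR grades (0-4).
from sklearn.metrics import accuracy_score, classification_report

y_true = [0, 2, 4, 1, 0, 3, 2]   # placeholder ground-truth grades
y_pred = [0, 2, 3, 1, 0, 3, 2]   # placeholder model predictions

print("Accuracy:", accuracy_score(y_true, y_pred))
# Per-class precision, recall and F1 for grades 0 (no DR) to 4 (proliferative DR)
print(classification_report(y_true, y_pred, digits=3))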

2.2 Background and Related Work

2.2.1 Background Work

Background: Diabetic retinopathy is a significant cause of blindness among individuals aged
25–65 who have diabetes. It occurs due to lesions on the retina caused by weakened blood
vessels, leading to visual impairment and potential total blindness. Early detection is crucial
for timely intervention and preventing irreversible damage [1].

Related Work: Researchers have explored various deep learning techniques to automate the
identification and categorization of diabetic retinopathy from fundus images. Notable
approaches include:

Convolutional Neural Networks (CNNs):

CNNs have shown promise in automating DR diagnosis. They analyze retinal images and
classify them based on severity.

Google AI, IDx-Diabetic Retinopathy, Eyenuk, and VoxelCloud are among the current
approaches using CNNs with high accuracy and speed, saving time and costs compared to
manual diagnosis [1].

Transfer Learning:

Researchers have experimented with pre-trained CNN models to predict diabetic retinopathy.
Transfer learning allows leveraging knowledge from existing models for better performance [2].

Addressing Imbalance:

Imbalanced datasets (e.g., fewer severe cases) pose challenges. Methods like the synthetic
minority oversampling technique (SMOTE) and oversampling with early stopping help
mitigate overfitting and improve model robustness [3] (a minimal sketch follows below).
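A minimal sketch of the SMOTE idea, assuming the imbalanced-learn package and a tabular feature matrix (here generated synthetically); it is illustrative only, not the project's pipeline.

# Rebalance a skewed class distribution with SMOTE (illustrative).
from collections import Counter
from imblearn.over_sampling import SMOTE
from sklearn.datasets import make_classification

# Synthetic imbalanced data standing in for extracted image features.
X, y = make_classification(n_samples=1000, weights=[0.9, 0.1], random_state=0)
print("Before:", Counter(y))

X_res, y_res = SMOTE(random_state=0).fit_resample(X, y)  # synthesise minority samples
print("After:", Counter(y_res))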

Conclusion: A general deep learning model for detecting DR has been developed, applicable
across various DR databases.

2.2.2 Literature Survey

REVIEW OF LITERATURE SURVEY:

1. Vashist P, Senjam SS, Gupta V, Manna S, Gupta N, Shamanna BR, Bhardwaj A, Kumar A,
Gupta P (2021) Prevalence of diabetic retinopathy in India: results from the National
Survey 2015–19. Ind J Ophthalmol 69(11):3087

2. Tan KW, Dickens BSL, Cook AR (2020) Projected burden of type 2 diabetes mellitus-
related complications in Singapore until 2050: a Bayesian evidence synthesis. BMJ Open
Diab Res Care 8:000928

3. Diabetes: an overview. https://my.clevelandclinic.org/health/diseases/7104-diabetes-
mellitus-an-overview. Accessed 7 March 2022

4. Priya R, Aruna P (2013) Diagnosis of diabetic retinopathy using machine learning
techniques. ICTACT J Soft Comput 3(04):563

5. Chaki J, Ganesh ST, Cidham SK, Theertan SA (2022) Machine learning and artificial
intelligence based Diabetes Mellitus detection and self-management: a systematic
review. J King Saud Univ 34:3204

6. Nemade V, Pathak S, Dubey AK (2022) A systematic literature review of breast cancer
diagnosis using machine intelligence techniques. Arch Computat Methods Eng.
https://doi.org/10.1007/s11831-022-09738-3

7. Gupta J, Pathak S, Kumar G (2022) A hybrid optimization-tuned deep convolutional
neural network for bare skinned image classification in websites. Multimed Tools Appl
81:26283–26305. https://doi.org/10.1007/s11042-022-12891-3

CHAPTER 3
SYSTEM DESIGN & METHODOLOGY

3.1 EXISTING SYSTEM

The obtained results are compared with the results of existing models within the same
domain and found to be improved. The data of diabetic patients collected from the
UCI laboratory is used to discover patterns with K Nearest Neighbours (KNN), Naive Bayes
(NB), Support Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR) and
Random Forest (RF). The results are compared for performance and accuracy with
these algorithms. The proposed hybrid method returns results of 78.5%, competing
with the other existing methods.

3.2 PROPOSED SYSTEM

In this paper, six machine learning algorithms are used to predict diabetes disease.
These six algorithms are K Nearest Neighbours (KNN), Naive Bayes (NB), Support
Vector Machine (SVM), Decision Tree (DT), Logistic Regression (LR) and Random Forest
(RF). Comparison of the different machine learning techniques used in this study
reveals which algorithm is best suited for prediction of diabetes.

3.3 REQUIREMENT SPECIFICATION

This proposed software runs effectively on a computing system that meets the minimum
requirements.

The requirements are split into two categories, namely:

Software Requirements

The basic software requirements to run the program are:

✔ Operating System : Windows 7, 8, 10 (64 bit)

✔ Software : Python

✔ Tools : Anaconda (Jupyter Notebook IDE)

Hardware Requirements

The basic hardware required to run the program is:

● Hard Disk : 500 GB and above

● RAM : 4 GB and above

● Processor : i3 and above

3.4 TECHNOLOGIES USED

Python

Random Forest and Support Vector Machine (SVM)
3.4.1 Introduction to Python

Python is a widely used general-purpose, high-level programming language. It was
initially designed by Guido van Rossum in 1991 and is developed by the Python Software
Foundation. It was mainly developed with an emphasis on code readability, and its syntax
allows programmers to express concepts in fewer lines of code. Python is a programming
language that lets you work quickly and integrate systems more efficiently.

It is used for:

● web development (server-side),

● software development,

● mathematics,

● system scripting.

What can Python do?

● Python can be used on a server to create web applications.

● Python can be used alongside software to create workflows.

● Python can connect to database systems. It can also read and modify files.

● Python can be used to handle big data and perform complex mathematics.

● Python can be used for rapid prototyping, or for production-ready software development.

Why Python?

● Python works on different platforms (Windows, Mac, Linux, Raspberry Pi, etc.).

● Python has a simple syntax similar to the English language.

● Python has syntax that allows developers to write programs with fewer lines than some
other programming languages.

● Python runs on an interpreter system, meaning that code can be executed as soon as it is
written. This means that prototyping can be very quick.

● Python can be treated in a procedural way, an object-oriented way or a functional way.
Good to know

● The most recent major version of Python is Python 3, which we shall be using in this
tutorial. However, Python 2, although not being updated with anything other than security
updates, is still quite popular.

● Python 2.0 was released in 2000, and the 2.x versions were the prevalent releases until
December 2008. At that time, the development team made the decision to release version
3.0, which contained a few relatively small but significant changes that were not backward
compatible with the 2.x versions. Python 2 and 3 are very similar, and some features of
Python 3 have been backported to Python 2. But in general, they remain not quite
compatible.

● Both Python 2 and 3 have continued to be maintained and developed, with periodic
release updates for both. As of this writing, the most recent versions available are 2.7.15
and 3.6.5. However, an official End Of Life date of January 1, 2020 has been established for
Python 2, after which time it will no longer be maintained.

● Python is still maintained by a core development team at the Institute, and Guido is still
in charge, having been given the title of BDFL (Benevolent Dictator For Life) by the Python
community. The name Python, by the way, derives not from the snake, but from the British
comedy troupe Monty Python's Flying Circus, of which Guido was, and presumably still is, a
fan. It is common to find references to Monty Python sketches and movies scattered
throughout the Python documentation.

● It is possible to write Python in an Integrated Development Environment, such as Thonny,
PyCharm, NetBeans or Eclipse, which are particularly useful when managing larger
collections of Python files.

Python Syntax compared to other programming languages

● Python was designed for readability, and has some similarities to the English language
with influence from mathematics.

● Python uses new lines to complete a command, as opposed to other programming
languages which often use semicolons or parentheses.

● Python relies on indentation, using whitespace, to define scope, such as the scope of
loops, functions and classes. Other programming languages often use curly brackets for
this purpose.
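A tiny illustration of the last point: the block under a colon is defined purely by indentation, with no braces. The function and values here are invented for illustration.

# Indentation alone defines the block structure; no braces are needed.
def classify(height_cm, hair_cm):
    if height_cm > 170 and hair_cm < 10:
        label = "male"
    else:
        label = "female"
    return label

print(classify(180, 4))   # -> male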

Python is Interpreted

● Many languages are compiled, meaning the source code you create needs to be
translated into machine code, the language of your computer's processor, before it can be
run. Programs written in an interpreted language are passed straight to an interpreter that
runs them directly.

● This makes for a quicker development cycle because you just type in your code and run
it, without the intermediate compilation step.

● One potential downside to interpreted languages is execution speed. Programs that are
compiled into the native language of the computer processor tend to run more quickly
than interpreted programs. For some applications that are particularly computationally
intensive, like graphics processing or intense number crunching, this can be limiting.

● In practice, however, for most programs, the difference in execution speed is measured
in milliseconds, or seconds at most, and not appreciably noticeable to a human user. The
expediency of coding in an interpreted language is typically worth it for most applications.

● For all its syntactical simplicity, Python supports most constructs that would be expected
in a very high-level language, including complex dynamic data types, structured and
functional programming, and object-oriented programming.

● Additionally, a very extensive library of classes and functions is available that provides
capability well beyond what is built into the language, such as database manipulation or
GUI programming.

● Python accomplishes what many programming languages don't: the language itself is
simply designed, but it is very versatile in terms of what you can accomplish with it.

3.4.2 Introduction to Random Forest

● With the increase in computational power, we can now choose algorithms which perform
very intensive calculations. One such algorithm is "Random Forest", which we will discuss
in this article. While the algorithm is very popular in various competitions (e.g. the ones
running on Kaggle), the end output of the model is like a black box and hence should be
used judiciously.

● Before going any further, here is an example of the importance of choosing the best
algorithm.

Random Forest Case Study

● Following is a distribution of annual income Gini coefficients across different countries:

● Mexico has the second highest Gini coefficient and hence has a very high segregation in
annual income of rich and poor. Our task is to come up with an accurate predictive
algorithm to estimate the annual income bracket of each individual in Mexico. The
brackets of income are as follows:

● 1. Below $40,000
● 2. $40,000 – 150,000
● 3. More than $150,000

● Following is the information available for each individual:
● 1. Age, 2. Gender, 3. Highest educational qualification, 4. Working industry, 5. Residence
in metro/non-metro

● We need to come up with an algorithm to give an accurate prediction for an individual
who has the following traits:

● 1. Age: 35 years, 2. Gender: Male, 3. Highest Educational Qualification: Diploma holder,
4. Industry: Manufacturing, 5. Residence: Metro

● We will only talk about random forest to make this prediction in this article.

● The algorithm of Random Forest

● Random forest is like a bootstrapping algorithm with the Decision Tree (CART) model.
Say we have 1,000 observations in the complete population with 10 variables. Random
forest tries to build multiple CART models with different samples and different initial
variables. For instance, it will take a random sample of 100 observations and 5 randomly
chosen initial variables to build a CART model. It will repeat the process (say) 10 times and
then make a final prediction on each observation. The final prediction is a function of each
individual prediction; it can simply be the mean of the individual predictions. (A short
scikit-learn sketch follows at the end of this section.)

● Back to the Case Study

● Disclaimer: The numbers in this article are illustrative.

● Mexico has a population of 118 MM. Say the Random Forest algorithm picks up 10k
observations with only one variable (for simplicity) to build each CART model. In total, we
are looking at 5 CART models being built with different variables. In a real-life problem,
you will have a larger population sample and different combinations of input variables.

● Salary bands:

● Band 1: Below $40,000

● Band 2: $40,000 – 150,000

● Band 3: More than $150,000

● Following are the outputs of the 5 different CART models:

● CART 1: Variable Age

● CART 2: Variable Gender

● CART 3: Variable Education

● CART 4: Variable Residence

● CART 5: Variable Industry

● Using these 5 CART models, we need to come up with a single set of probabilities of
belonging to each of the salary classes. For simplicity, we will just take a mean of the
probabilities in this case study. Other than the simple mean, we could also consider the
vote method to come up with the final prediction. To come up with the final prediction,
let's locate the following profile in each CART model:

● 1. Age: 35 years, 2. Gender: Male, 3. Highest Educational Qualification: Diploma holder,
4. Industry: Manufacturing, 5. Residence: Metro

● For each of these CART models, following is the distribution across salary bands:

● The final probability is simply the average of the probability in the same salary band
across the different CART models. As you can see from this analysis, there is a 70% chance
of this individual falling in class 1 (less than $40,000) and around a 24% chance of the
individual falling in class 2.

● Conclusion

● Random forest gives much more accurate predictions when compared to simple
CART/CHAID or regression models in many scenarios. These cases generally have a high
number of predictive variables and a huge sample size. This is because it captures the
variance of several input variables at the same time and enables a high number of
observations to participate in the prediction.
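The random-forest procedure described in this section can be reproduced in a few lines with scikit-learn. The sketch below is illustrative: the diabetes.csv file and its Outcome column are assumptions matching the dataset described later in this report, and the forest averages per-tree class probabilities exactly as discussed above.

# Random forest sketch with scikit-learn (illustrative, not the project's exact code).
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

df = pd.read_csv("diabetes.csv")                      # assumed Pima-style diabetes dataset
X, y = df.drop(columns=["Outcome"]), df["Outcome"]
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=42)

# Each of the 100 trees is grown on a bootstrap sample with a random subset of features;
# the forest's final prediction averages the per-tree class probabilities.
rf = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=42)
rf.fit(X_tr, y_tr)
print("Test accuracy:", rf.score(X_te, y_te))
print("Averaged class probabilities for one sample:", rf.predict_proba(X_te[:1]))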

Support Vector Machine (SVM)

Introduction:

Mastering machine learning algorithms isn't a myth at all. Most beginners start by learning
regression. It is simple to learn and use, but does that solve our purpose? Of course not!
Because you can do so much more than just regression!

Think of machine learning algorithms as an armoury packed with axes, swords, blades, bows,
daggers, etc. You have various tools, but you ought to learn to use them at the right time. As
an analogy, think of 'Regression' as a sword capable of slicing and dicing data efficiently, but
incapable of dealing with highly complex data. On the contrary, 'Support Vector Machine' is
like a sharp knife: it works on smaller datasets, but on them, it can be much stronger and more
powerful in building models.

"Support Vector Machine" (SVM) is a supervised machine learning algorithm which can be
used for either classification or regression challenges. However, it is mostly used in
classification problems. In this algorithm, we plot each data item as a point in n-dimensional
space (where n is the number of features you have), with the value of each feature being the
value of a particular coordinate. Then, we perform classification by finding the hyper-plane
that differentiates the two classes very well (look at the snapshot below).

FIG 3.2: SUPPORT VECTOR MACHINE

Support Vectors are simply the coordinates of individual observations. The Support Vector
Machine is a frontier which best segregates the two classes (hyper-plane/line).

You can look at the definition of support vectors and a few examples of its working here.

Let's talk about how a support vector machine works. This is suitable for readers who do not
know much about this algorithm and have a curiosity to learn a new technique. In the
following, we explore the technique in detail and analyze cases where such techniques are
stronger than other techniques.


Classification analysis

Let's consider an example to understand these concepts. We have a population composed of
50%-50% males and females. Using a sample of this population, you want to create some set
of rules which will guide us to the gender class for the rest of the population. Using this
algorithm, we intend to build a robot which can identify whether a person is a male or a
female. This is a sample problem of classification analysis. Using some set of rules, we will try
to classify the population into two possible segments. For simplicity, let's assume that the
two differentiating factors identified are: height of the individual and hair length.

Following is a scatter plot of the sample.

FIG 3.3: CLASSIFICATION ANALYSIS

The blue circles in the plot represent females and the green squares represent males. A few
expected insights from the graph are:

1. Males in our population have a higher average height.

2. Females in our population have longer scalp hair.

If we were to see an individual with a height of 180 cm and hair length of 4 cm, our best guess
would be to classify this individual as a male. This is how we do a classification analysis.

A Support Vector and SVM

Support Vectors are simply the coordinates of individual observations. For instance, (45,150)
is a support vector which corresponds to a female. The Support Vector Machine is a frontier
which best segregates the males from the females. In this case, the two classes are well
separated from each other; hence it is easier to find an SVM.

How to find the Support Vector Machine for the case in hand?

There are many possible frontiers which can classify the problem in hand. Following are the
three possible frontiers.

FIG 3.4: SVM FOR CASE IN HAND

How do we decide which is the best frontier for this particular problem statement?
The easiest way to interpret the objective function in an SVM is to find the minimum distance
of the frontier from the closest support vector (this can belong to any class). For instance, the
orange frontier is closest to the blue circles, and the closest blue circle is 2 units away from
the frontier. Once we have these distances for all the frontiers, we simply choose the frontier
with the maximum distance (from the closest support vector). Out of the three frontiers
shown, we see the black frontier is farthest from the nearest support vector (i.e. 15 units).

What if we do not find a clean frontier which segregates the classes?

Our job was relatively easy finding the SVM in this business case. What if the distribution
looked something like the following?

FIG 3.5: DISTRIBUTION

In such cases, we do not see a straight-line frontier directly in the current plane which can
serve as the SVM. In such cases, we need to map these vectors to a higher-dimensional plane
so that they get segregated from each other. Such cases will be covered once we start with
the formulation of the SVM. For now, you can visualize that such a transformation will result
in the following type of SVM.

FIG 3.6: TRANSFORMATION

Each of the green squares in the original distribution is mapped on a transformed scale. And
the transformed scale has clearly segregated classes. Many algorithms have been proposed
to make these transformations.


How does it work?

Above, we got accustomed to the process of segregating the two classes with a hyper-plane.
Now the burning question is, "How can we identify the right hyper-plane?" Don't worry; it's
not as hard as you think!

Let's understand:

● Identify the right hyper-plane (Scenario-1): Here, we have three hyper-planes (A, B and C).
Now, identify the right hyper-plane to classify star and circle.

FIG 3.7: SCENARIO-1

You need to remember a thumb rule to identify the right hyper-plane: "Select the
hyper-plane which segregates the two classes better". In this scenario, hyper-plane "B"
has excellently performed this job.

● Identify the right hyper-plane (Scenario-2): Here, we have three hyper-planes (A, B and C)
and all are segregating the classes well. Now, how can we identify the right hyper-plane?

FIG 3.8: SCENARIO-2

Here, maximizing the distance between the nearest data points (of either class) and the
hyper-plane will help us decide the right hyper-plane. This distance is called the margin.
Let's look at the snapshot below:

FIG 3.9: MARGIN

Above, you can see that the margin for hyper-plane C is high compared to both A and B.
Hence, we name the right hyper-plane C. Another lightning reason for selecting the
hyper-plane with the higher margin is robustness. If we select a hyper-plane having a low
margin then there is a high chance of misclassification.


● Identify the right hyper-plane (Scenario-3): Hint: Use the rules discussed in the previous
section to identify the right hyper-plane.

FIG 3.10: SCENARIO-3

Some of you may have selected hyper-plane B as it has a higher margin compared to A.
But here is the catch; SVM selects the hyper-plane which classifies the classes accurately
prior to maximizing the margin. Here, hyper-plane B has a classification error and A has
classified all correctly. Therefore, the right hyper-plane is A.

● Can we classify two classes (Scenario-4)? Below, I am unable to segregate the two classes
using a straight line, as one of the stars lies in the territory of the other (circle) class as an
outlier.

FIG 3.11: SCENARIO-4

As I have already mentioned, one star at the other end is like an outlier for the star class.
SVM has a feature to ignore outliers and find the hyper-plane that has the maximum
margin. Hence, we can say that SVM is robust to outliers.

FIG 3.12: CLASSIFY TWO CLASSES

● Find the hyper-plane to segregate two classes (Scenario-5): In the scenario below, we can't
have a linear hyper-plane between the two classes, so how does SVM classify these two
classes? Till now, we have only looked at the linear hyper-plane.

FIG 3.13: SCENARIO-5

SVM can solve this problem. Easily! It solves this problem by introducing an additional
feature. Here, we will add a new feature z = x^2 + y^2. Now, let's plot the data points on
the x and z axes:

FIG 3.14: SEGREGATE TWO CLASSES

In the above plot, the points to consider are:

o All values for z will always be positive because z is the squared sum of both x and y.

o In the original plot, red circles appear close to the origin of the x and y axes, leading to a
lower value of z, while the stars, relatively far from the origin, result in a higher value of z.

In the SVM, it is easy to have a linear hyper-plane between these two classes. But another
burning question which arises is: do we need to add this feature manually to have a
hyper-plane? No, SVM has a technique called the kernel trick. These are functions which take
a low-dimensional input space and transform it to a higher-dimensional space, i.e. they
convert a non-separable problem into a separable problem; these functions are called
kernels. It is mostly useful in non-linear separation problems. Simply put, it does some
extremely complex data transformations, then finds out the process to separate the data
based on the labels or outputs you have defined. (A small scikit-learn sketch appears at the
end of this section.)

When we look at the hyper-plane in the original input space it looks like a circle:

FIG 3.15 HYPER-PLANE

Pros and Cons associated with SVM

● Pros:

o It works really well with a clear margin of separation.

o It is effective in high-dimensional spaces.

o It is effective in cases where the number of dimensions is greater than the number of samples.

o It uses a subset of training points in the decision function (called support vectors), so it is
also memory efficient.

● Cons:

o It doesn't perform well when we have a large data set, because the required training time
is higher.

o It also doesn't perform very well when the data set has more noise, i.e. the target classes
are overlapping.

o SVM doesn't directly provide probability estimates; these are calculated using an expensive
five-fold cross-validation (related to the SVC method of the Python scikit-learn library).
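To make the kernel trick concrete, the sketch below fits an RBF-kernel SVC on two classes that are not linearly separable in the original plane (one class sits near the origin, the other far from it), mirroring the z = x^2 + y^2 idea above. The data are synthetic and the snippet is illustrative only.

# RBF-kernel SVC sketch (illustrative).
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
inner = rng.normal(0, 0.5, size=(100, 2))                 # "circles" near the origin
outer = rng.normal(0, 0.5, size=(100, 2)) + \
        rng.choice([-2.5, 2.5], size=(100, 2))            # "stars" far from the origin
X = np.vstack([inner, outer])
y = np.array([0] * 100 + [1] * 100)

# probability=True triggers the internal cross-validation mentioned in the cons above.
clf = SVC(kernel="rbf", gamma="scale", C=1.0, probability=True)
clf.fit(X, y)
print("Training accuracy:", clf.score(X, y))
print("P(class) for a point near the origin:", clf.predict_proba([[0.1, -0.2]]))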
CHAPTER 4
SOFTWARE DESIGN AND IMPLEMENTATION

4.1 DESIGN AND IMPLEMENTATION CONSTRAINTS

Constraints in Analysis

♦ Constraints as Informal Text

♦ Constraints as Operational Restrictions

♦ Constraints Integrated in Existing Model Concepts

♦ Constraints as a Separate Concept

♦ Constraints Implied by the Model Structure

Constraints in Design

♦ Determination of the Involved Classes

♦ Determination of the Involved Objects

♦ Determination of the Involved Actions

♦ Determination of the Require Clauses

♦ Global Actions and Constraint Realization


Constraints in Implementation

A hierarchical structuring of relations may result in more classes and a more complicated
structure to implement. Therefore it is advisable to transform the hierarchical relation
structure into a simpler structure such as a classical flat one. It is rather straightforward to
transform the developed hierarchical model into a bipartite, flat model, consisting of classes
on the one hand and flat relations on the other. Flat relations are preferred at the design level
for reasons of simplicity and implementation ease. There is no identity or functionality
associated with a flat relation. A flat relation corresponds with the relation concept of
entity-relationship modeling and of many object-oriented methods.
4.2 Other Nonfunctional Requirements

Performance Requirements

The application at the client side controls and communicates with the following main general
components:

an embedded browser in charge of navigation and accessing the web service;

Server Tier: the server side contains the main parts of the functionality of the proposed
architecture. The components at this tier are the following: Web Server, Security Module,
Server-Side Capturing Engine, Preprocessing Engine, Database System, Verification Engine,
Output Module.

Safety Requirements

1. The software may be safety-critical. If so, there are issues associated with its integrity level.

2. The software may not be safety-critical although it forms part of a safety-critical system.
For example, software may simply log transactions.

3. If a system must be of a high integrity level and if the software is shown to be of that
integrity level, then the hardware must be at least of the same integrity level.

4. There is little point in producing 'perfect' code in some language if hardware and system
software (in the widest sense) are not reliable.

5. If a computer system is to run software of a high integrity level then that system should
not at the same time accommodate software of a lower integrity level.

6. Systems with different requirements for safety levels must be separated.

7. Otherwise, the highest level of integrity required must be applied to all systems in the
same environment.

4.3 Architecture Diagram:

Fig 4.1: ARCHITECTURE DIAGRAM

4.4 Sequence Diagram:

A sequence diagram is a kind of interaction diagram that shows how processes operate with
one another and in what order. It is a construct of a Message Sequence Chart. Sequence
diagrams are sometimes called event diagrams, event scenarios or timing diagrams.

FIG 4.2: SEQUENCE DIAGRAM


4.5 USE CASE DIAGRAM:

Unified Modeling Language (UML) is a standardized general-purpose modeling language in
the field of software engineering. The standard is managed, and was created, by the Object
Management Group. UML includes a set of graphic notation techniques to create visual
models of software-intensive systems. This language is used to specify, visualize, modify,
construct and document the artifacts of an object-oriented software-intensive system under
development.

A use case diagram is used to present a graphical overview of the functionality provided by a
system in terms of actors, their goals and any dependencies between those use cases. A use
case diagram consists of two parts:

Use case: A use case describes a sequence of actions that provides something of measurable
value to an actor and is drawn as a horizontal ellipse.

Actor: An actor is a person, organization or external system that plays a role in one or more
interactions with the system.

FIG 4.3: USE CASE DIAGRAM

4.6 ACTIVITY DIAGRAM:

An activity diagram is a graphical representation of workflows of stepwise activities and
actions with support for choice, iteration and concurrency. An activity diagram shows the
overall flow of control.

The most important shape types:

● Rounded rectangles represent activities.

● Diamonds represent decisions.

● Bars represent the start or end of concurrent activities.

● A black circle represents the start of the workflow.

● An encircled circle represents the end of the workflow.

FIG 4.4: ACTIVITY DIAGRAM

4.7 COLLABORATION DIAGRAM

UML collaboration diagrams illustrate the relationships and interactions between software
objects. They require use cases, system operation contracts and a domain model to already
exist. The collaboration diagram illustrates messages being sent between classes and objects.

FIG 4.5: COLLABORATION DIAGRAM

4.8 MODULES

Data Pre-Processing
Feature Selection
Classification Modeling
Performance Measures

Module Description

Dataset Description:

The objective of the dataset is to predict whether or not a patient has diabetes, based on
certain diagnostic measurements included in the dataset. The dataset consists of several
medical predictor variables and one target variable, Outcome. Predictor variables include the
number of pregnancies the patient has had, their BMI, insulin level, age, and so on.

Pregnancies: Number of times pregnant

Glucose: Plasma glucose concentration at 2 hours in an oral glucose tolerance test

Blood Pressure: Diastolic blood pressure (mm Hg)

Skin Thickness: Triceps skin fold thickness (mm)

Insulin: 2-hour serum insulin (mu U/ml)

BMI: Body mass index (weight in kg/(height in m)^2)

Diabetes Pedigree Function: Diabetes pedigree function

Age: Age (years)

Outcome: Class variable (0 or 1); 268 of the 768 records are 1, the others are 0

Data Pre-Processing

Diabetes disease data is pre-processed after the collection of various records. The dataset
contains a total of 769 patient records, of which 6 records have some missing values. Those 6
records have been removed from the dataset and the remaining 763 patient records are used
in pre-processing.

Feature Selection

From among the 8 attributes of the data set, one attribute, pertaining to age, is used to
identify the personal information of the patient. The remaining 7 attributes are considered
important as they contain vital clinical records. Clinical records are vital to diagnosing and
learning the severity of diabetes disease.

Classification Modeling

The clustering of datasets is done on the basis of the variables and criteria of Decision Tree
(DT) features. Then the classifiers are applied to each clustered dataset in order to estimate
its performance. The best-performing models are identified from the above results based on
their low rate of error.

● Decision Tree Classifier

● Support Vector Classifier

● Random Forest Classifier

● Logistic Regression

● K Nearest Neighbours

● Naive Bayes

Performance Measures:

Several standard performance metrics such as accuracy, precision and error in classification
have been considered for the computation of the performance efficacy of this model. The
accuracies obtained were (a short comparison sketch follows below):

Logistic Regression: 71.43%

K Nearest Neighbours: 78.57%

Support Vector Classifier: 73.38%

Naive Bayes: 71.43%

Decision Tree: 68.18%

Random Forest: 75.97%

4.9 CODING AND TESTING

CODING

Once the design aspect of the system is finalized, the system enters the coding and testing
phase. The coding phase brings the actual system into action by converting the design of the
system into code in a given programming language. Therefore, a good coding style has to be
adopted so that, whenever changes are required, they can easily be worked into the system.

CODING STANDARDS

Coding standards are guidelines for programming that focus on the physical structure and
appearance of the program. They make the code easier to read, understand and maintain.
This phase of the system actually implements the blueprint developed during the design
phase. The coding specification should be such that any programmer must be able to
understand the code and can bring about changes whenever felt necessary. Some of the
standards needed to achieve the above-mentioned objectives are as follows:


Program should be simple, clear and easy to understand.
Naming conventions
Value conventions
Script and comment procedure
Message box format
Exception and error handling

NAMING CONVENTIONS

Naming conventions of classes, data members, member functions, procedures etc. should be
self-descriptive. One should even get the meaning and scope of a variable from its name. The
conventions are adopted for easy understanding of the intended message by the user, so it is
customary to follow the conventions. These conventions are as follows:

Class names

Class names are problem-domain equivalents; they begin with a capital letter and have mixed
case.

Member Function and Data Member names

Member function and data member names begin with a lowercase letter, with the first letter
of each subsequent word in uppercase and the rest of the letters in lowercase.

VALUE CONVENTIONS

Value conventions ensure values for variables at any point of time. This involves the following:

Proper default values for the variables.

Proper validation of values in the field.

Proper documentation of flag values.

SCRIPT WRITING AND COMMENTING STANDARD

Script writing is an art in which indentation is of utmost importance. Conditional and looping
statements are to be properly aligned to facilitate easy understanding. Comments are
included to minimize the number of surprises that could occur when going through the code.

MESSAGE BOX FORMAT

When something has to be prompted to the user, he must be able to understand it properly.
To achieve this, a specific format has been adopted for displaying messages to the user. They
are as follows:

X – User has performed an illegal operation.

! – Information to the user.

TEST PROCEDURE

SYSTEM TESTING

Testing is performed to identify errors. It is used for quality assurance. Testing is an integral
part of the entire development and maintenance process. The goal of testing during this
phase is to verify that the specification has been accurately and completely incorporated into
the design, as well as to ensure the correctness of the design itself. For example, the design
must not have any logic faults; if a fault in the design is detected before coding commences,
the cost of fixing it is considerably lower than if it is found later. Detection of design faults can
be achieved by means of inspection as well as walkthrough. Testing is one of the important
steps in the software development phase. Testing checks for errors; as a whole, project
testing involves the following test cases:

Static analysis is used to investigate the structural properties of the source code.

Dynamic testing is used to investigate the behavior of the source code by executing the
program on the test data.

4.10 TEST DATA AND OUTPUT

UNIT TESTING

Unit testing is conducted to verify the functional performance of each modular component
of the software. Unit testing focuses on the smallest unit of the software design, i.e. the
module. The white-box testing techniques were heavily employed for unit testing.
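As a small illustration of unit testing, the sketch below checks a hypothetical pre-processing helper with pytest-style assertions; both the helper and the test data are invented for illustration, not taken from the project's source code.

# Unit-test sketch (illustrative): verify a hypothetical pre-processing helper.
import pandas as pd

def drop_incomplete(records: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: remove patient records with missing values."""
    return records.dropna()

def test_drop_incomplete_removes_rows_with_missing_values():
    df = pd.DataFrame({"Glucose": [148, None, 85], "Outcome": [1, 0, 0]})
    cleaned = drop_incomplete(df)
    assert len(cleaned) == 2                 # the incomplete record is gone
    assert cleaned.isna().sum().sum() == 0   # no missing values remain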

FUNCTIONAL TESTS

Functional test cases involve exercising the code with nominal input values for which the
expected results are known, as well as boundary values and special values, such as logically
related inputs, files of identical elements, and empty files.

Three types of tests in functional testing:

Performance Test
Stress Test
Structure Test
PERFORMANCE TEST

It determines the amount of execution time spent in various parts of the unit, program
throughput, response time and device utilization by the program unit.

STRESS TEST

Stress tests are those tests designed to intentionally break the unit. A great deal can be
learned about the strengths and limitations of a program by examining the manner in which
a program unit breaks.

STRUCTURED TEST

Structure tests are concerned with exercising the internal logic of a program and traversing
particular execution paths. The White-Box test strategy was employed to ensure that the test
cases could guarantee that all independent paths within a module have been exercised at
least once:

Exercise all logical decisions on their true or false sides.
Execute all loops at their boundaries and within their operational bounds.
Exercise internal data structures to assure their validity.
Check attributes for their correctness.
Handle end-of-file conditions, I/O errors, buffer problems and textual errors in output
information.

INTEGRATION TESTING

Integration testing is a systematic technique for constructing the program structure while at
the same time conducting tests to uncover errors associated with interfacing; i.e., integration
testing is the complete testing of the set of modules which make up the product. The
objective is to take untested modules and build a program structure. The tester should
identify critical modules, and critical modules should be tested as early as possible. One
approach is to wait until all the units have passed testing, and then combine and test them
together; this approach evolved from unstructured testing of small programs. Another
strategy is to construct the product in increments of tested units: a small set of modules is
integrated together and tested, to which another module is added and tested in combination,
and so on. The advantage of this approach is that interfacing errors can be easily found and
corrected.

The major error that was faced during the project was a linking error. When all the modules
were combined, the links were not set properly with all the support files. We then checked
the interconnections and the links. Errors are localized to the new module and its
intercommunications. The product development can be staged, and modules integrated in as
they complete unit testing. Testing is completed when the last module is integrated and
tested.

4.11 TESTING TECHNIQUES / TESTING STRATEGIES

TESTING

Testing is a process of executing a program with the intent of finding an error. A good test
case is one that has a high probability of finding an as-yet-undiscovered error. A successful
test is one that uncovers an as-yet-undiscovered error. System testing is the stage of
implementation which is aimed at ensuring that the system works accurately and efficiently
as expected before live operation commences. It verifies that the whole set of programs
hangs together. System testing requires a test plan that consists of several key activities and
steps for running program, string and system tests, and it is important in adopting a successful
new system. This is the last chance to detect and correct errors before the system is installed
for user acceptance testing.

The software testing process commences once the program is created and the
documentation and related data structures are designed. Software testing is essential for
correcting errors; otherwise the program or the project is not said to be complete. Software
testing is the critical element of software quality assurance and represents the ultimate
review of specification, design and coding. Testing is the process of executing the program
with the intent of finding an error. A good test case design is one that has a probability of
finding a yet-undiscovered error. A successful test is one that uncovers a yet-undiscovered
error. Any engineering product can be tested in one of two ways:

WHITE BOX TESTING

This testing is also called glass box testing. In this testing, by knowing the specific functions
that a product has been designed to perform, tests can be conducted that demonstrate each
function is fully operational, at the same time searching for errors in each function. It is a test
case design method that uses the control structure of the procedural design to derive test
cases. Basis path testing is a white box testing technique.

Basis path testing:
Flow graph notation
Cyclomatic complexity
Deriving test cases
Graph matrices
Control structure testing
BLACK BOX TESTING

Black box testing fundamentally focuses on the functional requirements of the software. Knowing the specified functions that a product has been designed to perform, tests can be conducted that demonstrate each function is fully operational while at the same time searching for errors in each function, without regard to the product's internal structure.

The steps involved in black box test case design are (a brief sketch follows this list):

Graph-based testing methods
Equivalence partitioning
Boundary value analysis
Comparison testing
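The sketch below illustrates equivalence partitioning and boundary value analysis for an assumed upload validator (the file-type and size limits are illustrative, not taken from the project's code): the input domain is divided into valid and invalid classes, and test values are drawn from each class and from the edges of the valid range:

# Black box sketch: equivalence partitioning and boundary value analysis for an
# assumed validator that accepts JPEG/PNG fundus images up to 5 MB.
MAX_BYTES = 5 * 1024 * 1024
ALLOWED = {".jpg", ".jpeg", ".png"}

def is_acceptable_upload(filename: str, size_bytes: int) -> bool:
    ext = filename[filename.rfind("."):].lower() if "." in filename else ""
    return ext in ALLOWED and 0 < size_bytes <= MAX_BYTES

# Equivalence classes: valid vs. invalid extension; valid size vs. empty vs. oversized.
# Boundary values: 1 byte, exactly MAX_BYTES, MAX_BYTES + 1.
CASES = [
    ("eye.jpg",  1,             True),    # lower boundary of valid size
    ("eye.png",  MAX_BYTES,     True),    # upper boundary of valid size
    ("eye.png",  MAX_BYTES + 1, False),   # just above the boundary
    ("eye.jpg",  0,             False),   # empty-file class
    ("eye.tiff", 1024,          False),   # invalid-extension class
]

def test_black_box_cases():
    for filename, size, expected in CASES:
        assert is_acceptable_upload(filename, size) == expected, (filename, size)

if __name__ == "__main__":
    test_black_box_cases()
    print("all equivalence/boundary cases passed")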
SOFTWARE TESTING STRATEGIES:
A software testing strategy provides a road map for the software developer. Testing is a set of activities that can be planned in advance and conducted systematically. For this reason, a template for software testing, a set of steps into which specific test case design methods can be placed, should be defined, and the strategy should have the following characteristics:

Testing begins at the module level and works "outward" toward the integration of the entire computer-based system.
Different testing techniques are appropriate at different points in time.
Testing is conducted by the developer of the software and, for larger projects, by an independent test group.
Testing and debugging are different activities, but debugging must be accommodated in any testing strategy.


INTEGRATION TESTING:
Integration testing is a systematic technique for constructing the program structure while at the same time conducting tests to uncover errors associated with interfacing. Individual modules, which are highly prone to interface errors, should not be assumed to work correctly the moment we put them together. The problem, of course, is "putting them together", that is, interfacing: data may be lost across an interface; one module's sub-functions, when combined, may not produce the desired major function; individually acceptable imprecision may be magnified to unacceptable levels; and global data structures can present problems.


PROGRAM TESTING:
Program testing points out logical and syntax errors. A syntax error is an error in a program statement that violates one or more rules of the language in which it is written; an improperly defined field dimension or an omitted keyword are common syntax errors, and they are reported through error messages generated by the computer. A logic error, on the other hand, deals with incorrect data fields, out-of-range items and invalid combinations; since the compiler will not detect logical errors, the programmer must examine the output. Condition testing exercises the logical conditions contained in a module. The possible types of elements in a condition include a Boolean operator, a Boolean variable, a pair of Boolean parentheses, a relational operator or an arithmetic expression. The condition testing method focuses on testing each condition in the program; its purpose is to detect not only errors in the conditions of the program but also other errors in the program.
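A small sketch of condition testing is given below; the decision rule is an assumption used only to illustrate the technique, and each simple condition inside the compound decision is driven to both of its outcomes:

# Condition testing sketch: every simple condition inside the compound decision
# (dr_pct > ndr_pct) and (dr_pct >= threshold) is forced to be both true and
# false at least once. The rule itself is an illustrative assumption.
def flag_as_dr(dr_pct: float, ndr_pct: float, threshold: float = 50.0) -> bool:
    return dr_pct > ndr_pct and dr_pct >= threshold

def test_each_condition_both_ways():
    assert flag_as_dr(80.0, 20.0) is True       # both conditions true
    assert flag_as_dr(40.0, 60.0) is False      # first condition false
    assert flag_as_dr(45.0, 40.0) is False      # second condition false (45 < 50)
    assert flag_as_dr(50.0, 50.0) is False      # relational boundary: > is strict

if __name__ == "__main__":
    test_each_condition_both_ways()
    print("condition outcomes covered")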


SECURITY TESTING:
Security testing attempts to verify that the protection mechanisms built into a system do, in fact, protect it from improper penetration. The system's security must be tested for invulnerability to frontal attack, and it must also be tested for invulnerability to flank and rear attack. During security testing, the tester plays the role of an individual who desires to penetrate the system.
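The sketch below shows one such penetration-style check under simplified assumptions (the schema, route and parameterised query are illustrative, in the spirit of the project's login code rather than its exact implementation): wrong credentials and a SQL-injection-style username are both submitted to the login route and must be rejected:

# Security testing sketch: the tester plays the role of an intruder and tries to
# get past the login route with wrong credentials and with a SQL-injection-style
# username. The schema and route below are simplified assumptions.
import sqlite3
from flask import Flask, request

app = Flask(__name__)
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE persons (name TEXT, username TEXT, password TEXT)")
db.execute("INSERT INTO persons VALUES ('Demo', 'demo', 'demo123')")
db.commit()

@app.route("/login", methods=["POST"])
def login():
    # Parameterised query: user input is never spliced into the SQL text.
    row = db.execute(
        "SELECT 1 FROM persons WHERE username = ? AND password = ?",
        (request.form.get("username", ""), request.form.get("password", "")),
    ).fetchone()
    return ("ok", 200) if row else ("invalid credentials", 401)

def test_frontal_attacks_are_rejected():
    client = app.test_client()
    assert client.post("/login", data={"username": "demo",
                                       "password": "wrong"}).status_code == 401
    assert client.post("/login", data={"username": "' OR '1'='1",
                                       "password": "' OR '1'='1"}).status_code == 401
    assert client.post("/login", data={"username": "demo",
                                       "password": "demo123"}).status_code == 200

if __name__ == "__main__":
    test_frontal_attacks_are_rejected()
    print("security checks passed")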

VALIDATION TESTING
At the culmination of integration testing, the software is completely assembled as a package, interfacing errors have been uncovered and corrected, and a final series of software tests, validation testing, begins. Validation testing can be defined in many ways, but a simple definition is that validation succeeds when the software functions in a manner that is reasonably expected by the customer. Software validation is achieved through a series of black box tests that demonstrate conformity with requirements. After a validation test has been conducted, one of two conditions exists:

* The function or performance characteristics conform to specifications and are accepted.

* A deviation from specification is uncovered and a deficiency list is created.

Deviations or errors discovered at this step were corrected prior to completion of the project, with the help of the user, by negotiating a method for resolving the deficiencies. Thus the proposed system under consideration has been tested using validation testing and found to be working satisfactorily. Though there were deficiencies in the system, they were not catastrophic.

USER ACCEPTANCE TESTING

User acceptance of the system is a key factor for the success of any system. The system under consideration was tested for user acceptance by constantly keeping in touch with prospective users during development and making changes whenever required. This was done with regard to the following points:

● Input screen design.

● Output screen design.

CHAPTER 5 RESULTS AND DISCUSSION
5.1 RESULT
In this chapter we discuss the results of our system.

Fig 5.1: WELCOME PAGE

Fig 5.1 is the welcome page of the project. This page appears once the user or the admin opens the portal. It is the front end of the project and can also be termed the user interface. Here the user gets multiple options to log in or register as needed.


Fig 5.2: REGISTRATION PAGE

Fig 5.2 shows that a user has to register first if he or she is new to the website. After successful registration, users receive the log-in credentials provided by the admin and use them to log in to the website.

Fig 5.3: LOG-IN PAGE

Fig 5.3 is the log-in page, which is the entry and authentication point of every secured portal. The portal can be logged into by both the admin and the user: the admin logs in to check the working of the portal and to maintain the database, while the user logs in to access the portal, and the admin provides the required security so that the data does not get deleted or leaked. This section checks the authentication; if the log-in credentials are correct, the user can log in. Here the user gets multiple options to execute or access their tasks as needed.

Fig 5.4: HOME PAGE

Fig 5.4 is the home page, displayed after a successful log-in. Here the user gets the options to choose a file and upload it. If the user does not choose a file, the page shows that no file has been chosen.


Fig 5.5: FILE UPLOAD PAGE

Fig 5.5 is the file upload page. Here the user chooses the file to upload; the user can access the image files from the dataset, view the images and upload them. The admin can log in to check the working of the portal and to maintain the database.


Fig 5.6: RESULT PAGE

Fig 5.6 is the result page of our project, displayed after the image file is uploaded successfully. This page shows the percentage of diabetic retinopathy and the percentage of non-diabetic retinopathy. If the percentage of diabetic retinopathy is higher than the percentage of non-diabetic retinopathy, the system concludes that the person has diabetic retinopathy.
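A short sketch of this comparison is given below; the function and the example percentages are assumptions for illustration, while in the running system the values come from the model's softmax output shown in the appendix code:

# Sketch of the decision shown on the result page: the class with the higher
# predicted percentage determines the reported outcome. Values are illustrative.
def summarize(dr_pct: float, non_dr_pct: float) -> str:
    if dr_pct > non_dr_pct:
        return f"Diabetic retinopathy detected ({dr_pct:.1f}% vs {non_dr_pct:.1f}%)"
    return f"No diabetic retinopathy detected ({non_dr_pct:.1f}% vs {dr_pct:.1f}%)"

print(summarize(82.4, 17.6))
print(summarize(12.0, 88.0))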

Fig 5.7: STAGES OF DIABETIC RETINOPATHY

Fig 5.7 shows the stages of diabetic retinopathy with increasing severity.

Fig 5.8: COMPARISON OF DR RESULTS

The comparison of the obtained results with state-of-the-art models on the Messidor dataset in Fig 5.8 shows that our AUC value is the highest among the recent works compared.
CHAPTER 6 CONCLUSION AND FUTURE WORK

6.1 CONCLUSION

This system makes the process of diabetic retinopathy detection more efficient and flexible for the users. Users need not worry about the security and confidentiality of the submitted images, nor about the information to be stored, because it is all maintained in a database and can be edited or retrieved whenever needed. The portal is secured, monitored and maintained by the admin from time to time. Our model accepts two fundus images, corresponding to the left eye and the right eye, as inputs and passes them through Siamese-like blocks. The information from the two eyes is gathered in the fully connected layer, and finally the model outputs the diagnosis result for each eye respectively. The system consists of four functional modules: image pre-processing, Inception, the convolutional neural network algorithm, and matching score.
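A minimal Keras sketch of such a Siamese-like arrangement is given below; the backbone layers, input resolution and head sizes are illustrative assumptions rather than the trained model, but the structure mirrors the description above: both fundus images pass through one weight-sharing convolutional branch, the two feature vectors are concatenated into the fully connected layer, and the network emits one output per eye:

# Minimal sketch of a Siamese-like binocular model: both fundus images go through
# the same (weight-sharing) convolutional branch, the two feature vectors are
# concatenated, and the dense head produces one DR probability per eye.
# Layer sizes and input resolution are illustrative assumptions.
from tensorflow.keras import layers, Model

def conv_branch() -> Model:
    inp = layers.Input(shape=(224, 224, 3))
    x = layers.Conv2D(32, 3, activation="relu")(inp)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    return Model(inp, x, name="shared_branch")

branch = conv_branch()                       # one instance => shared weights
left_eye = layers.Input(shape=(224, 224, 3), name="left_eye")
right_eye = layers.Input(shape=(224, 224, 3), name="right_eye")

merged = layers.Concatenate()([branch(left_eye), branch(right_eye)])
fc = layers.Dense(128, activation="relu")(merged)
left_out = layers.Dense(1, activation="sigmoid", name="left_dr")(fc)
right_out = layers.Dense(1, activation="sigmoid", name="right_dr")(fc)

model = Model([left_eye, right_eye], [left_out, right_out])
model.compile(optimizer="adam", loss="binary_crossentropy")
model.summary()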

6.2 FUTURE WORK

This project summarizes the state of the art in diabetic retinopathy research and provides a perspective on opportunities for future investigation. New insights into the pathophysiology of diabetes and diabetic retinopathy will improve metabolic control. Structure-function analyses are revealing new details of diabetic retinopathy, and intraocular drug therapy provides improved visual outcomes. Together, these steps will yield better means to detect and quantify vision loss and to develop patient-specific treatments that preserve vision for people with diabetes.


REFERENCES

[1] X. Zeng, H. Chen, Y. Luo, and W. Ye, "Automated Diabetic Retinopathy Detection Based on Binocular Siamese-Like Convolutional Neural Network," IEEE Access, vol. 7, 2019.

[2] S. R. Nirmala, M. K. Nath, and S. Dandapat, "Retinal Image Analysis: A Review," International Journal of Computer & Communication Technology (IJCCT), vol. 2, pp. 11-15, 2011.

[3] National Eye Institute, National Institutes of Health, "Diabetic Retinopathy: What You Should Know," Booklet, NIH Publication no. 06-2171, 2003.

[4] A. D. Fleming, K. A. Goatman, and J. A. Olson, "The role of haemorrhage and exudate detection in automated grading of diabetic retinopathy," British Journal of Ophthalmology, vol. 94, no. 6, pp. 706-711, 2010.

[5] S. Lu, et al., "Automatic Fundus Image Classification for Computer-Aided Diagnosis," 31st Annual International Conference of the IEEE EMBS, Minneapolis, Minnesota, USA, September 2-6, 2009, pp. 1453-1456.

[6] G. Luo, O. Chutatape, H. Lei, and S. M. Krishnan, "Abnormality Detection in Automated Mass Screening System of Diabetic Retinopathy," available online at http://luoinnovation.tech.officelive.com/luoresearch/cbms2001.pdf (accessed on March 3, 2013).

[7] K. Ram, G. D. Joshi, and J. Sivaswamy, "A Successive Clutter-Rejection-Based Approach for Early Detection of Diabetic Retinopathy," IEEE Transactions on Biomedical Engineering, vol. 58, no. 3, pp. 664-673, March 2011.

[8] A. Fadzil, et al., "Analysis of Foveal Avascular Zone in Colour Fundus Images for Grading of Diabetic Retinopathy Severity," 32nd Annual International Conference of the IEEE EMBS, Buenos Aires, Argentina, August 31-September 4, 2010, pp. 5632-5635.

[9] R. Sivakumar, G. Ravindran, M. Muthayya, S. Lakshminarayanan, and C. U. Velmurughendran, "Diabetic Retinopathy Analysis," Journal of Biomedicine and Biotechnology, pp. 20-27, 2005.

[10] Z. Omar, M. Hanafi, S. Mashohor, N. Mahfudz, and M. Muna'im, "Automatic diabetic retinopathy detection and classification system," 2017 7th IEEE International Conference on System Engineering and Technology (ICSET), pp. 162-166, IEEE, 2017.

[11] Y. Haniza, A. Hamzah, and M. Norrima, "Edge sharpening for diabetic retinopathy detection," IEEE Conference Publications, 2010.
APPENDIX

A. Paper work

DIABETIC RETINOPATHY DETECTION USING DEEP LEARNING TECHNIQUE
B. Report

ORIGINALITY REPORT

Similarity Index: 22% (Internet Sources: 6%, Publications: 18%, Student Papers: 7%)

Primary sources flagged by the originality check (individual contributions range from 2% down to below 1%):

Submitted to Indiana University (student paper)
A. Momeni Pour, H. Seyedarabi, S. H. Abbasi Jahromi, A. Javadzadeh, "Automatic Detection and Monitoring of Diabetic Retinopathy Using Efficient Convolutional Neural Networks and Contrast Limited Adaptive Histogram Equalization," IEEE Access, 2020 (publication)
S. Suriyal, C. Druzgalski, K. Gautam, "Mobile assisted diabetic retinopathy detection using deep neural network," 2018 Global Medical Engineering Physics Exchanges/Pan American Health Care Exchanges (GMEPE/PAHCE), 2018 (publication)
Submitted to Chino Valley Unified School District (student paper)
X. Zeng, H. Chen, Y. Luo, W. Ye, "Automated Diabetic Retinopathy Detection Based on Binocular Siamese-Like Convolutional Neural Network," IEEE Access, 2019 (publication)
"Neural Network Technique for Diabetic Retinopathy Detection," International Journal of Engineering and Advanced Technology, 2019 (publication)
G. Kalyani, B. Janakiramaiah, A. Karuna, L. V. Narasimha Prasad, "Diabetic retinopathy detection and classification using capsule networks," Complex & Intelligent Systems, 2021 (publication)
Submitted to Amity University (student paper)
H. Q. Chen, X. L. Zeng, Y. Luo, W. B. Ye, "Detection of Diabetic Retinopathy using Deep Neural Network," 2018 IEEE 23rd International Conference on Digital Signal Processing (DSP), 2018 (publication)
X. L. Zeng, H. Q. Chen, Y. Luo, W. B. Ye, "Automated Detection of Diabetic Retinopathy using a Binocular Siamese-Like Convolutional Network," 2019 IEEE International Symposium on Circuits and Systems (ISCAS), 2019 (publication)
www.ijritcc.org (internet source)
www.nice.org.uk (internet source)
www.isaacpub.org (internet source)
www.warse.org (internet source)
M. Manjramkar, "Survey of Diabetic Retinopathy Screening Methods," 2018 2nd International Conference on Trends in Electronics and Informatics (ICOEI), 2018 (publication)
S. Saranya Rubini, A. Kunthavai, "Diabetic Retinopathy Detection Based on Eigenvalues of the Hessian Matrix," Procedia Computer Science, 2015 (publication)
technicaljournalsonline.com (internet source)
M. Gandhi and R. Dhanasekaran, "Diagnosis of diabetic retinopathy using morphological process and SVM classifier," 2013 International Conference on Communication and Signal Processing, 2013 (publication)
J. Lachure, A. V. Deorankar, S. Lachure, S. Gupta, R. Jadhav, "Diabetic Retinopathy using morphological operations and machine learning," 2015 IEEE International Advance Computing Conference (IACC), 2015 (publication)
C. Source Code

from flask import Flask, render_template, request, session
import tensorflow.compat.v1 as tf
import sqlite3 as sql
import os

tf.disable_v2_behavior()

app = Flask(__name__)
app.secret_key = os.urandom(24)   # required for session support

def validate(username, password):
    # Check the submitted credentials against the persons table.
    completion = False
    con = sql.connect('static/chat.db')
    with con:
        cur = con.cursor()
        cur.execute('SELECT * FROM persons')
        rows = cur.fetchall()
        for row in rows:
            dbuser = row[1]
            dbpass = row[2]
            if dbuser == username:
                completion = (dbpass == password)
    return completion

@app.route('/', methods=['GET', 'POST'])
def home():
    return render_template('index.html')

@app.route('/login', methods=['GET', 'POST'])
def login():
    error = None
    if request.method == 'POST':
        username = request.form['username']
        password = request.form['password']
        completion = validate(username, password)
        if completion == False:
            error = 'Invalid credentials. Please try again.'
        else:
            session['username'] = request.form['username']
            return render_template('file_upload_form.html')
    return render_template('file_upload_form.html', error=error)

@app.route('/register', methods=['GET', 'POST'])
def register():
    if request.method == 'POST':
        con = sql.connect("static/chat.db")
        try:
            name = request.form['name']
            username = request.form['username']
            password = request.form['password']
            cur = con.cursor()
            cur.execute("INSERT INTO persons(name, username, password) VALUES (?,?,?)",
                        (name, username, password))
            con.commit()
            msg = "Record successfully added"
        except Exception:
            con.rollback()
            msg = "Error in insert operation"
        finally:
            con.close()
        return render_template("index.html", msg=msg)
    return render_template('register.html')

@app.route('/list')
def list():
    con = sql.connect("static/chat.db")
    con.row_factory = sql.Row
    cur = con.cursor()
    cur.execute("SELECT * FROM persons")
    rows = cur.fetchall()
    return render_template("list.html", rows=rows)

def upload():
    # Helper to re-render the upload form.
    return render_template("file_upload_form.html")

@app.route('/success', methods=['POST'])
def success():
    if request.method == 'POST':
        f = request.files['file']
        # change this as you see fit
        # image_path = sys.argv[1]
        image_path = f.filename

        # Read in the image data for the matching file from the local dataset directory.
        image_data = tf.gfile.FastGFile(
            "D:/Project/Automatic Diabetic_Retinopathy_webapp/dataset/" + str(image_path),
            'rb').read()

        # Load the label file and strip off carriage returns.
        label_lines = [line.rstrip()
                       for line in tf.gfile.GFile("models/retrained_labels.txt")]

        # Unpersist the retrained graph from file.
        with tf.gfile.FastGFile("models/retrained_graph.pb", 'rb') as graph_file:
            graph_def = tf.GraphDef()
            graph_def.ParseFromString(graph_file.read())
            _ = tf.import_graph_def(graph_def, name='')

        with tf.Session() as sess:
            # Feed the image data as input to the graph and get the prediction.
            softmax_tensor = sess.graph.get_tensor_by_name('final_result:0')
            predictions = sess.run(softmax_tensor,
                                   {'DecodeJpeg/contents:0': image_data})

            # Sort to show labels of the prediction in order of confidence.
            top_k = predictions[0].argsort()[-len(predictions[0]):][::-1]
            print(top_k)
            m = []
            for node_id in top_k:
                human_string = label_lines[node_id]
                score = predictions[0][node_id]
                for j in (human_string, score):
                    print(j)
                    m.append(j)
                # print('%s (score = %.5f)' % (human_string, score))

        return render_template("success.html", list2=m)

if __name__ == '__main__':
    app.run(debug=True)
