UKASHA ABDUL-KADIR

CHAPTER ONE
INTRODUCTION
1.1 Background to the study
Diabetes is a metabolic disease characterized by high levels of blood sugar (blood glucose). In
diabetes, the human body cannot respond properly to insulin, the hormone responsible for
keeping blood sugar levels in equilibrium. Hyperglycemia, or elevated blood glucose, is a sign
that insulin is not functioning correctly; a related condition is termed impaired glucose
tolerance (IGT). Over time, diabetes carries serious risks such as kidney disease (CKD), heart
disease (CVD), damaged blood vessels, blindness (diabetic retinopathy), and skin problems, so
it is counted among the most dangerous diseases in the world. According to the WHO, about 422
million people worldwide have diabetes, the majority of them in low- and middle-income
countries. It is predicted that there will be 642 million cases of diabetes worldwide in 2040.
Diabetes comes in two varieties: type 1 and type 2. Type 1 occurs because the pancreas produces
little insulin, and type 2 occurs because the body resists insulin, so the insulin available is
not enough for the body's needs. One cause of diabetes is poor eating habits. For example,
Indonesian people tend to consume high-calorie food without balancing protein, vitamins, and
fat. In addition, psychological factors such as mental health conditions, cognitive
dysfunction, personality traits, and quality of life can also contribute to it. A prolonged
period of high blood glucose concentration is the primary sign of diabetes; other signs include
frequent urination, blurred vision, abnormal weight loss, and feeling hungry too quickly.
Diabetes can be detected through urine checks, blood pressure tests, kidney tests and biopsy.
However, studies show that 40% of diabetic subjects are not identified at initial screening.
Diabetes is also called a silent killer because its symptoms can go undetected. One concern is
that it can cause several health issues, or possibly early mortality, if it is not identified
and treated promptly. Therefore, a good system for detecting diabetes is necessary as a
preventive measure. In this research, we take advantage of machine learning classifiers to
predict diabetes early. In the last ten years, machine learning has
been used extensively in the healthcare industry. Machine learning has a great deal of potential to
raise human standards and move toward a peaceful existence due to its dependability and
efficiency. There is a lot of data stored in different formats because of the current digitalization
era. Machine learning can use this data to examine patterns or hidden knowledge within a set of
data. Since machine learning is a data-driven approach, learning requires data. Several machine
learning algorithms have been used in previous research to predict diabetes. For example, the
support vector machine (SVM) has been reported to reach accuracies of about 84.10%, 100%, and
82%. SVM handles both linearly and nonlinearly separable data by constructing a hyperplane
that divides the data classes. Random forest (RF) is an algorithm that builds multiple decision
trees and takes a majority vote over their decisions; RF has reached accuracies of about 84%,
79%, and 98%. The logistic model tree (LMT) is a tree model based on the logistic function and
can reach an accuracy of 79.31%. A fuzzy approach, one of the soft computing methods, reaches
an accuracy of about 79.8%, and the adaptive neuro-fuzzy inference system (ANFIS) reaches
84.48%. The decision tree (DT) is a tree-based prediction model that reaches an accuracy of
about 91.2%. These results show that machine learning can be used to predict diabetes.
Multivariate analysis is a set of statistical techniques for analyzing two or more variables
observed on one or more sample objects. It consists of methods that are suitable when several
measurements are taken on each unit in one or more samples. The quality and validity of the
information obtained about the study population increase with the number of variables measured
on each sampled unit, provided they are adequately modeled to explain variation in the response
variable. The discriminant function procedure introduced by Fisher [9] is one of the linear
models used for the analysis of multivariate data; it gives the linear function of the
explanatory variables that best discriminates between groups. Discriminant analysis focuses on
the association between multiple independent variables and a categorical dependent variable in
order to create rules that separate distinct groups as much as possible and that assign an
observation of unknown origin to one of two or more distinct preexisting groups. Fisher's
linear discriminant function (FLDF) is frequently used for discrimination, classification and
prediction under the usual statistical assumptions: multivariate normality of the independent
variables, equality of the variance-covariance matrices, and roughly equal group sample sizes.
Linear discriminant analysis (LDA) is considered suitable for supervised classification when
the number of observations is larger than the number of variables. However, with the
development of new technologies, there has been an increase in complex high-dimensional
problems, in which the number of variables (the dimension of the data vectors) is much larger
than the number of observations (sample size), in many disciplines such as medicine,
epidemiology, genetics, biology and metrology.
1.2 Statement of the problem

Over the last decade, there has been an alarming increase in the number of recorded cases of
diabetes in various hospitals in Sokoto State. This has raised very serious questions in the
minds of citizens in particular and of medical professionals in general. Diabetes is generally
associated with many symptoms, such as fever, headache and loss of weight, which make it a
complicated disease with serious economic and social effects on its victims. This study seeks
to determine the prevalence of diabetes and to suggest possible ways of preventing it. Diabetes
is a chronic health issue with devastating yet preventable consequences. It is characterized by
high blood glucose levels resulting from defects in insulin production, insulin action or both.
Type 2 diabetes affects men and women proportionately, and its rate is expected to increase
greatly over the next half century.
1.3 Aim and objectives
The aim of this study is to investigate the systolic and diastolic blood pressure of diabetic
patients, as well as the level or presence of sugar in their blood and urine, through the
following objectives:
i. To obtain the correlation coefficient between the domain of pulsation of the heart and the
presence of sugar in both urine and blood.
ii. To conduct a test of significance between the domains.
iii. To investigate the extent of the relationship between the blood pressure and sugar level
of the patients.
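Objectives i and ii can be illustrated with a minimal sketch that computes a Pearson correlation coefficient and the usual t statistic for testing whether the population correlation is zero. The readings below are hypothetical placeholders, not study data, and the function names are chosen only for illustration.

```python
import math


def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two sequences of equal length, n >= 3")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom (requires |r| < 1)."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))


# Hypothetical readings for eight patients (assumed values, not study data).
systolic = [120, 135, 150, 142, 128, 160, 155, 138]      # mmHg
blood_sugar = [5.6, 7.1, 9.4, 8.0, 6.2, 10.5, 9.9, 7.4]  # mmol/L

r = pearson_r(systolic, blood_sugar)
t = t_statistic(r, len(systolic))
print(f"r = {r:.3f}, t = {t:.2f} on {len(systolic) - 2} degrees of freedom")
```

The computed t value would then be compared with the critical value of the t distribution with n - 2 degrees of freedom at the chosen significance level.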
1.4 Significance of the study

This project is significant in the sense that the health problems imposed by diabetes in this
part of the world make it an issue that requires adequate attention and care. This research
work will be of great significance in the following ways:
i. This study will help the hospital management, the state government and the general
public to know the particular relationship that exists between low and high blood
pressure in diabetic patients, in order to carry out enlightenment campaigns towards
reducing the incidence of the disease.
ii. It will help to carry out a discriminant analysis of the reported cases of the disease in
Sokoto State.
1.5 Scope and limitation

This study is limited to data obtained from the medical records department of Sokoto State
Specialist Hospital on blood pressure and the level of sugar in urine and blood; other issues
not mentioned are excluded.

CHAPTER TWO
LITERATURE REVIEW
2.1 Review of Previous Literature

Rahayu et al. (2019), in “Classification of diabetes events using discriminant analysis”, aimed
to classify diabetes events accurately, because such classification can be used for early
prevention before complications occur. Based on linear discriminant analysis, they found that a
person with higher weight, lower age, and a higher cholesterol level is more likely to be
classified as a diabetes patient. Based on the apparent error rate (APER), the percentage of
misclassification was 14%. Therefore, discriminant analysis can be used for the classification
of diabetics, because its accuracy is reasonably high. Classification with discriminant
analysis is expected to be applicable to datasets for other diseases.
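The apparent error rate mentioned above is the share of training observations that the fitted rule misclassifies. The sketch below shows the calculation on a two-group confusion matrix; the counts are hypothetical and were not taken from the cited paper, and the function name is illustrative.

```python
def apparent_error_rate(confusion):
    """APER = misclassified training observations / total observations.

    confusion[i][j] counts members of actual group i classified into group j.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("confusion matrix is empty")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical counts (rows: actual diabetic, actual non-diabetic); not from the cited study.
confusion = [[40, 10],
             [4, 46]]
aper = apparent_error_rate(confusion)
print(f"APER = {aper:.2%}, accuracy = {1 - aper:.2%}")
```

Because the APER is computed on the same data used to build the rule, it tends to be optimistic, which is why cross-validated rates are often reported alongside it.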
Galán et al. (2023), in “Discriminant Model for Insulin Resistance in Type 2 Diabetic Patients”,
note that patients with type 2 diabetes mellitus tend to have insulin resistance, a condition
that is evaluated using expensive methods that are not easily accessible in routine clinical
practice. Their objective was to determine the anthropometric, clinical, and metabolic
parameters that allow discrimination of type 2 diabetic patients who have insulin resistance
from those who do not. A cross-sectional analytical observational study was carried out on 92
type 2 diabetic patients, and a discriminant analysis was applied using the SPSS statistical
package to establish the characteristics that differentiate type 2 diabetic patients with
insulin resistance from those without it. Most of the variables analyzed had a statistically
significant association with the HOMA-IR. However, only HDL-c, LDL-c, glycemia, BMI, and
tobacco exposure time allowed discrimination between the two groups, considering the
interaction between them. According to the absolute value of the structure matrix, the variable
that contributed most to the discriminant model was HDL-c (−0.69). The authors concluded that
the association between HDL-c, LDL-c, glycemia, BMI, and tobacco exposure time allows
discrimination of type 2 diabetic patients who have insulin resistance from those who do not,
and that this constitutes a simple model that can be used in routine clinical practice.
Dibal et al. (2020), in “On the Application of Linear Discriminant Function to Evaluate Data on
Diabetic Patients at the University of Port Harcourt Teaching Hospital, Rivers, Nigeria”,
observe that many real-life events involve several interacting variables, hence multivariate
statistical tools are necessary for appropriate analysis and interpretation. Discriminant
analysis (DA) is one of the commonly used multivariate methods in various fields of study,
including education, finance, environment and medicine, where complex data analysis and
interpretation are required. The paper demonstrates how discriminant analysis can be carried
out on 335 patients (40 diabetic and 295 non-diabetic) and how the output can be interpreted
using Fisher’s linear discriminant function (FLDF). The performance of the FLDF was judged by
the percentage of correct reclassification of the original observations based on the
discriminant scores from the function. Up to 65.4% of the original cases were correctly
classified, and 62.7% of the cross-validated grouped cases were correctly classified as either
diabetic or non-diabetic. Patients’ age and gender were found to be the two most important
variables in classifying a patient into one of the two groups.
Alayande et al. (2015), in “Application of Discriminant Analysis in Data Analysis”, show that
discriminant analysis, as a general research technique, can be very useful in investigating
various aspects of a multivariate research problem. It is sometimes preferable to logistic
regression, especially when the sample size is very small and the assumptions are met. Its
application to the failed industry in Nigeria shows that the derived model appeared to
outperform previously built models, since it can exhibit true ex ante predictive ability for a
period of about 3 years.
Humera et al. (2015), in “Application of Discriminant Analysis to Predict the Institute’s
Annual Performance in Sargodha Board”, studied the annual performance of various institutes
affiliated with the Sargodha board, using the 2nd year (12 years of education) results for
2014. Discriminant analysis was used to achieve sharper discrimination between two categories
(group 1: institutes with annual results below the board average; group 2: institutes with
annual results above the board average) based on the disciplines of the institutes, gender,
status of the institutes and the geographical area of the four districts in Sargodha division.
Successful discrimination was made between institutes with results below and above the board
average, and the clustering of annual results into the two categories was statistically
significant on the basis of discriminant analysis. The data were obtained from the managing
body of the Sargodha board. Analysis of a set of seven variables showed that five of them
significantly help in discriminating between institutes with results below or above the board
average. It was therefore suggested to reclassify the institutes that were misclassified under
results below the board average.
Discriminant analysis focuses on the association between multiple independent variables and a
categorical dependent variable in order to create rules that separate distinct groups as much
as possible and that assign an observation of unknown origin to one of the distinct preexisting
groups. Fisher's linear discriminant function (FLDF) is frequently used for discrimination,
classification and prediction under the usual statistical assumptions: multivariate normality
of the independent variables, equality of the variance-covariance matrices, and roughly equal
group sample sizes. Linear discriminant analysis (LDA) is considered suitable for supervised
classification when the number of observations is larger than the number of variables.
However, with the development of new technologies, there has been an increase in complex
high-dimensional problems, in which the number of variables (the dimension of the data vectors)
is much larger than the number of observations (sample size), in many disciplines such as
medicine, epidemiology, genetics, biology and metrology. Many methods have been proposed for
the analysis of high-dimensional data, such as the K-D tree and the R tree; however, their
performance and efficiency decrease as the dimensionality increases, because they are designed
to operate with small dimensionality. When discriminating between two groups, the analysis
assumes that the two samples or populations have the same covariance matrix Σ but distinct
mean vectors, and the discriminant function that maximizes the separation of the groups is a
linear combination of the variables.
Discriminant analysis relates multiple independent variables to a categorical dependent
variable by forming a composite of the independent variables. Fisher's approach transforms the
multivariate observations into univariate scores such that the scores obtained from the two
populations are separated as much as possible. The objective is to select the linear
combination of the variables that maximizes the ratio of the squared distance between the
sample means of the scores to their variance. This study uses the method of discriminant
analysis on health-related data collected from the University of Port Harcourt Teaching
Hospital, Port Harcourt, Rivers State, Nigeria.
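Fisher's objective can be written compactly as follows. The notation is an assumption introduced for illustration: a is a coefficient vector, y = a'x is the univariate score, x̄1 and x̄2 are the sample mean vectors of the two groups, and Sp is the pooled sample covariance matrix.

```latex
\[
\max_{a}\ \frac{(\bar{y}_1-\bar{y}_2)^2}{s_y^2}
  = \max_{a}\ \frac{\left[a'(\bar{x}_1-\bar{x}_2)\right]^2}{a' S_p\, a},
\qquad y = a'x,
\]
\[
\hat{a} = S_p^{-1}(\bar{x}_1-\bar{x}_2),
\qquad
\text{maximum ratio} = (\bar{x}_1-\bar{x}_2)' S_p^{-1} (\bar{x}_1-\bar{x}_2) = D^2 .
\]
```

The maximizing vector is determined only up to a nonzero scale factor, and D² is the squared sample Mahalanobis distance between the two group means.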

CHAPTER THREE
METHODOLOGY
3.0 Introduction
Data collection is the cornerstone of any project work, because data are the necessary facts
and information that must be obtained before the analysis is carried out.
3.1 Sources of Data Collection

The main source of data in this research work is secondary data, obtained from documentary
sources at Maryam Abacha hospital.
3.2 Method of Data Collection

The method of data collection employed in this research work is transcription from the records
of diabetic patients in the medical records department.
3.3 Population Size
The population of this research covers only the blood pressure and the sugar level in both the
urine and blood of the patients.
3.4 Analysis technique

Suppose we have two multivariate normal populations with equal variance-covariance matrices,
N(μ1, Σ) and N(μ2, Σ), where μi = (μi1, μi2, ..., μip)' (i = 1, 2) is the mean vector of the
ith population and Σ is the common variance-covariance matrix of the two populations. The pdf
of the ith population is given below:
$$ f_i(X) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}} \exp\left[-\frac{1}{2}(X-\mu_i)'\,\Sigma^{-1}(X-\mu_i)\right] \tag{1} $$
For the ratio of the densities of the two multivariate normal populations, denoted by f1(X)
and f2(X), we have:
$$ \frac{f_1(X)}{f_2(X)} = \frac{\exp\left[-\frac{1}{2}(X-\mu_1)'\,\Sigma^{-1}(X-\mu_1)\right]}{\exp\left[-\frac{1}{2}(X-\mu_2)'\,\Sigma^{-1}(X-\mu_2)\right]} \ge k $$
$$ \therefore\ \exp\left\{-\frac{1}{2}\left[(X-\mu_1)'\,\Sigma^{-1}(X-\mu_1) - (X-\mu_2)'\,\Sigma^{-1}(X-\mu_2)\right]\right\} \ge k \tag{2} $$
Taking the natural logarithm of both sides of (2), which is a monotone increasing
transformation, we have:
$$ -\frac{1}{2}\left[(X-\mu_1)'\,\Sigma^{-1}(X-\mu_1) - (X-\mu_2)'\,\Sigma^{-1}(X-\mu_2)\right] \ge \log k \tag{3} $$
Each quadratic form in (3) is the squared Mahalanobis distance of X from the corresponding
population mean. For a suitably chosen k (which can be one, in which case log k is zero), the
LHS of (3) can be expanded and rearranged to obtain the following:
$$ X'\,\Sigma^{-1}(\mu_1-\mu_2) - \frac{1}{2}(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \ge \log k \tag{4} $$
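The expansion that leads from (3) to (4) can be sketched as follows. The only property used is that Σ⁻¹ is symmetric, so that X'Σ⁻¹μ = μ'Σ⁻¹X and the cross terms μ1'Σ⁻¹μ2 and μ2'Σ⁻¹μ1 cancel in the last line.

```latex
\begin{align*}
-\tfrac12\left[(X-\mu_1)'\Sigma^{-1}(X-\mu_1) - (X-\mu_2)'\Sigma^{-1}(X-\mu_2)\right]
&= -\tfrac12\left[-2X'\Sigma^{-1}\mu_1 + \mu_1'\Sigma^{-1}\mu_1
                  + 2X'\Sigma^{-1}\mu_2 - \mu_2'\Sigma^{-1}\mu_2\right] \\
&= X'\Sigma^{-1}(\mu_1-\mu_2)
   - \tfrac12\left(\mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2\right) \\
&= X'\Sigma^{-1}(\mu_1-\mu_2)
   - \tfrac12(\mu_1+\mu_2)'\Sigma^{-1}(\mu_1-\mu_2).
\end{align*}
```

The quadratic term X'Σ⁻¹X appears in both distances with opposite signs and drops out in the first line, which is why the resulting rule is linear in X.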
The first term of (4) is the well-known Fisher's linear discriminant function, which is linear
in the components of the observation vector.
The best regions of classification into π1 and π2 are given by:
Classify as π1 if:
$$ X'\,\Sigma^{-1}(\mu_1-\mu_2) - \frac{1}{2}(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \ge \log k \tag{5} $$
Classify as π2 if:
$$ X'\,\Sigma^{-1}(\mu_1-\mu_2) - \frac{1}{2}(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) < \log k \tag{6} $$
If the prior probabilities q1 and q2 are known, then k is given by:
$$ k = \frac{q_2\,C(1/2)}{q_1\,C(2/1)} \tag{7} $$
where C(1/2) is the cost of misclassifying an observation into π1 instead of π2, C(2/1) is the
cost of misclassifying an observation into π2 instead of π1, and q1 and q2 are the prior
probabilities of π1 and π2 respectively. If the two populations are equally likely and the
costs of misclassification are equal, then k = 1 and log k = 0. Hence, the regions of
classification into π1 and π2 can be further simplified as follows:
Classify as π1 if:
$$ X'\,\Sigma^{-1}(\mu_1-\mu_2) \ge \frac{1}{2}(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \tag{8} $$
Classify as π2 if:
$$ X'\,\Sigma^{-1}(\mu_1-\mu_2) < \frac{1}{2}(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \tag{9} $$
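The role of the threshold k in rules (5) to (9) can be illustrated with a short sketch. The function names, priors and costs below are hypothetical and chosen only to show how unequal priors or costs shift the cut-off away from zero.

```python
import math


def log_threshold(q1, q2, cost_1_given_2, cost_2_given_1):
    """log k from equation (7): k = q2 * C(1/2) / (q1 * C(2/1))."""
    if min(q1, q2, cost_1_given_2, cost_2_given_1) <= 0:
        raise ValueError("priors and costs must be positive")
    return math.log((q2 * cost_1_given_2) / (q1 * cost_2_given_1))


def classify(score, log_k):
    """Rules (5)-(6): `score` is the left-hand side of (4) evaluated at X."""
    return 1 if score >= log_k else 2


# Equal priors and equal costs give k = 1, so the cut-off is log k = 0.
equal_case = log_threshold(0.5, 0.5, 1.0, 1.0)

# Hypothetical unequal case: population 2 is rarer (q2 = 0.2), but wrongly
# assigning one of its members to population 1 is assumed five times as costly.
unequal_case = log_threshold(0.8, 0.2, 5.0, 1.0)

score = 0.1  # assumed discriminant score for one observation
print(equal_case, classify(score, equal_case))
print(round(unequal_case, 4), classify(score, unequal_case))
```

With these assumed values the same observation is assigned to π1 under equal priors and costs but to π2 once the higher cost of misclassification into π1 is taken into account.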
where μ1 can be estimated by the sample mean vector X̄1, μ2 can be estimated by the sample
mean vector X̄2, and Σ can be estimated by the pooled variance-covariance matrix Sp. The
sample discriminant function Y and the cut-off value M are then:
$$ Y = X'\,S_p^{-1}\left(\bar{X}_1-\bar{X}_2\right) \tag{10} $$
$$ M = \frac{1}{2}\left(\bar{X}_1+\bar{X}_2\right)' S_p^{-1}\left(\bar{X}_1-\bar{X}_2\right) \tag{11} $$
where X' = (X1 X2) and
$$ S_p = \frac{(n_1-1)\,S_1+(n_2-1)\,S_2}{n_1+n_2-2} \tag{12} $$
If n1 = n2, then the estimate of the pooled variance Sp above becomes
$$ S_p = \frac{S_1+S_2}{2} \tag{13} $$
S1 and S2 are the respective sample variance covariance matrices of the two populations.
For two variables, that is p = 2, the sample mean vectors and sample variance-covariance
matrices can be illustrated as follows:
$$ \bar{X}_1 = \begin{pmatrix} \bar{X}_{11} \\ \bar{X}_{21} \end{pmatrix} \tag{14} $$
is the sample mean vector for population I.
$$ S_1 = \begin{pmatrix} S_{111} & S_{121} \\ S_{211} & S_{221} \end{pmatrix} \tag{15} $$
is the sample covariance matrix for population I.
$$ \bar{X}_2 = \begin{pmatrix} \bar{X}_{12} \\ \bar{X}_{22} \end{pmatrix} \tag{16} $$
is the sample mean vector for population II.
$$ S_2 = \begin{pmatrix} S_{112} & S_{212}' \\ S_{212} & S_{222} \end{pmatrix}, \qquad S_{212}' \equiv S_{122} \tag{17} $$
is the sample covariance matrix for population II.
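For the two-variable case, equations (10) to (13) can be put together in a short sketch. It is written in plain Python for p = 2 only; the function names are illustrative and the patient readings are hypothetical, not data from the hospital records.

```python
def mean_vector(rows):
    """Sample mean vector of a list of (x1, x2) observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def covariance_matrix(rows):
    """2x2 sample variance-covariance matrix (divisor n - 1)."""
    n = len(rows)
    m = mean_vector(rows)
    return [[sum((r[i] - m[i]) * (r[j] - m[j]) for r in rows) / (n - 1)
             for j in range(2)] for i in range(2)]


def pooled_covariance(s1, n1, s2, n2):
    """Pooled covariance matrix, equation (12)."""
    return [[((n1 - 1) * s1[i][j] + (n2 - 1) * s2[i][j]) / (n1 + n2 - 2)
             for j in range(2)] for i in range(2)]


def inverse_2x2(s):
    """Inverse of a 2x2 matrix."""
    det = s[0][0] * s[1][1] - s[0][1] * s[1][0]
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    return [[s[1][1] / det, -s[0][1] / det],
            [-s[1][0] / det, s[0][0] / det]]


def fit_fisher_rule(group1, group2):
    """Return coefficients a = Sp^-1 (xbar1 - xbar2) and cut-off M, equations (10)-(11)."""
    xbar1, xbar2 = mean_vector(group1), mean_vector(group2)
    sp = pooled_covariance(covariance_matrix(group1), len(group1),
                           covariance_matrix(group2), len(group2))
    sp_inv = inverse_2x2(sp)
    diff = [xbar1[j] - xbar2[j] for j in range(2)]
    a = [sum(sp_inv[i][j] * diff[j] for j in range(2)) for i in range(2)]
    m = 0.5 * sum((xbar1[i] + xbar2[i]) * a[i] for i in range(2))
    return a, m


def classify(x, a, m):
    """Assign x to population 1 if Y = a'x >= M, otherwise to population 2."""
    y = sum(a[i] * x[i] for i in range(2))
    return 1 if y >= m else 2


# Hypothetical (systolic mmHg, blood sugar mmol/L) readings, not study data.
diabetic = [(150, 9.5), (160, 10.2), (145, 8.8), (155, 11.0), (165, 9.9)]
non_diabetic = [(120, 5.4), (125, 5.9), (118, 5.1), (130, 6.3), (122, 5.6)]

a, m = fit_fisher_rule(diabetic, non_diabetic)
print("coefficients:", a, "cut-off M:", m)
print(classify((158, 10.0), a, m), classify((121, 5.5), a, m))
```

The sketch assumes equal priors and equal misclassification costs, so the cut-off is simply M; with unequal priors or costs, log k from equation (7) would be added to M.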

REFERENCES
Alayande S., Ayinla Bashiru and Kehinde Adekunle (2015) “An Overview and Application of
Discriminant Analysis in Data Analysis” IOSR Journal of Mathematics (IOSR-JM) e-ISSN:
2278-5728, p-ISSN: 2319-765X. Volume 11, Issue 1 Ver. V (Jan - Feb. 2015), pp. 12-15.
DOI: 10.9790/5728-11151215 www.iosrjournals.org
Erislandis López-Galán, Rafael Barrio-Deler, Manuel Alejandro Fernández-Fernández, Yaquelin
Del Toro-Delgado, Isaac Enrique Peñuela-Puente, Miguel Enrique Sánchez-Hechavarría, and
Gustavo Alejandro Muñoz-Bustos (2023) “Discriminant Model for Insulin Resistance in Type 2
Diabetic Patients” Medicina 2023, 59, 839. https://doi.org/10.3390/medicina59050839
https://www.mdpi.com/journal/medicina
Humera Razzak, Mehboob Ali and Maqsood Ali (2015) “Application of Discriminant Analysis to
Predict The Institute’s Annual Performance in Sargodha Board” World Applied Sciences
Journal 33 (2): 213-219, 2015. ISSN 1818-4952. IDOSI Publications, 2015. DOI:
10.5829/idosi.wasj.2015.33.02.1
Nicholas Pindar Dibal, Christopher Akas Abraham (2020) “On the Application of Linear
Discriminant Function to Evaluate Data on Diabetic Patients at the University of Port
Harcourt Teaching Hospital, Rivers, Nigeria”. American Journal of Theoretical and
Applied Statistics. Vol. 9, No. 3, 2020, pp. 53-56. doi: 10.11648/j.ajtas.20200903.14
Usman, A. (2012). Statistics Method for Biometric and Medical Research. Kaduna, Nigeria:
Millennium Printing and Publishing Company Limited, pp. 1-10 and 53-67.
Usman, A. (2015). Statistics Method for Biometric and Medical Research. Kaduna, Nigeria:
Millennium Printing and Publishing Company Limited, pp. 1-10 and 53-67.
W. Rahayu, V. M. Santi and B. S. Putri (2019) “Classification of diabetes events using
discriminant analysis” 4th Annual Applied Science and Engineering Conference.
doi:10.1088/1742-6596/1402/7/077102.
