UKASHA ABDUL-KADIR
INTRODUCTION
1.1 Background to the study
Diabetes is a metabolic disease characterized by high levels of blood sugar (blood glucose). In
diabetes, the human body cannot respond properly to insulin, the hormone responsible for keeping
blood sugar levels in equilibrium. Hyperglycemia, or elevated blood glucose, is a sign that
insulin is not functioning correctly; a related condition is termed impaired glucose tolerance
(IGT). Over time, diabetes carries serious risks such as chronic kidney disease (CKD),
cardiovascular disease (CVD), damaged blood vessels, blindness (diabetic retinopathy), and skin
problems, so it is counted among the most dangerous diseases in the world. According to the WHO,
about 422 million people have diabetes, the majority of them in low- and middle-income
countries. It is predicted that there will be 642 million cases of diabetes worldwide in 2040.
Diabetes comes in two varieties: type 1 and type 2. Type 1 occurs because the pancreas produces
little or no insulin, while in type 2 the body resists the action of insulin, so that the
insulin available is not sufficient. One cause of diabetes is poor eating habits. For example,
Indonesians tend to consume high-calorie food without balancing protein, vitamins, and fat. In
addition, psychological factors such as mental health conditions, cognitive dysfunction,
personality traits, and quality of life can also contribute to it. The primary sign of diabetes
is a prolonged period of high blood glucose concentration; other signs include frequent
urination, blurred vision, abnormal weight loss, and feeling hungry too quickly. Diabetes can be
detected through urine tests, blood pressure tests, kidney tests, and biopsy. However, studies
show that 40% of diabetic subjects show no signs at initial screening. Diabetes is also called a
silent killer because its symptoms can go undetected. One concern is that diabetes can cause
several health problems, or possibly early mortality, if it is not identified and treated
promptly. Therefore, it is necessary to have a good system to detect
been used extensively in the healthcare industry. Machine learning has great potential to
improve human living standards because of its dependability and efficiency. In the current era
of digitalization, large amounts of data are stored in different formats. Machine learning can
use these data to uncover patterns or hidden knowledge within a data set; because it is a
data-driven approach, learning requires data. Several machine learning algorithms have been used
in previous research to predict diabetes. The support vector machine (SVM) reached accuracies of
about 84.10%, 100%, and 82%. SVM handles both linearly and nonlinearly separable data by
constructing a hyperplane that divides the data classes. Random forest (RF) builds multiple
decision trees and combines their decisions by majority voting; it reached accuracies of about
84%, 79%, and 98%. The logistic model tree (LMT), a tree model based on the logistic function,
reached an accuracy of 79.31%. Fuzzy logic, a soft computing method, reached an accuracy of
about 79.8%, and the adaptive neuro-fuzzy inference system (ANFIS) reached 84.48%. The decision
tree (DT), a tree-based prediction model, reached an accuracy of about 91.2%. These results
indicate that machine learning can be applied to the prediction of diabetes.
Multivariate analysis is a statistical technique for the analysis of two or more variables
observed on one or more samples of objects. It consists of methods that are suitable when
several measurements are taken on each unit in one or more samples. The quality and validity of
the information obtained about the study population increase with the number of variables
measured on each sampled unit, provided these are adequately modeled to explain variation in the
response variable. The discriminant function procedure introduced by Fisher [9] is one of the
linear models used for the analysis of multivariate data; it gives the best linear function of
the explanatory variables for discriminating between groups. Discriminant analysis focuses on
the association between multiple independent variables and a categorical dependent variable in
order to create rules for separating distinct groups as much as possible and for assigning an
observation of unknown origin to one of two or more distinct preexisting groups. Fisher's linear
discriminant function (FLDF) is frequently used for discrimination, classification, and
prediction under the usual basic statistical assumptions: multivariate normality of the
independent variables, equality of the variance-covariance matrices, and relative equality of
the group sample sizes. Linear discriminant analysis (LDA) is suitable for supervised
classification when the number of observations, n, is larger than the number of variables, p.
However, with the development of new technologies there has been an increase in complex
high-dimensional problems, in which the number of variables (the dimension of the data vectors)
is much larger than the number of observations (the sample size), that is, n < p, in many
disciplines such as medicine and
our various hospitals in Sokoto state. This has raised serious questions in the minds of
citizens in particular and of medical professionals in general. Diabetes is generally associated
with many symptoms, such as fever, headache, and loss of weight. These make it a complicated
disease with serious economic and social effects on its victims. This study seeks to find the
prevalence rate of diabetes and to suggest possible ways of preventing it. Diabetes is a chronic
health issue with devastating yet preventable consequences. It is characterized by high blood
glucose levels resulting from defects in insulin production, insulin action, or both. Type 2
diabetes affects men and women proportionately, and its rate is expected to increase greatly
over the next half century.
1.3 Aim and objectives
The purpose of this study is to investigate the systolic and diastolic blood pressure of
diabetic patients, as well as the level or presence of sugar in their blood and urine, through
the following objectives:
i. To obtain the correlation coefficient between the domain of pulsation of the heart and
iii. To investigate the extent of the relationship between the blood pressure and the sugar
level of the patients.
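The correlation coefficient named in the objectives can be sketched as follows. This is a minimal illustration, assuming that Pearson's product-moment coefficient is the intended measure and that the two variables are systolic blood pressure and blood sugar level; the function name `pearson_r` and the five readings are hypothetical and are not study data.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sums of cross-products and squares of deviations from the means
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / math.sqrt(sxx * syy)


# Hypothetical readings for five patients (illustrative values only).
systolic = [120, 135, 150, 142, 160]        # mmHg
blood_sugar = [5.6, 7.1, 9.4, 8.0, 11.2]    # mmol/L

r = pearson_r(systolic, blood_sugar)
print(f"r = {r:.3f}")
```

A value of r close to +1 or -1 would indicate a strong linear relationship between the two measurements, while a value near 0 would indicate little linear association.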
world make it a health issue that requires adequate attention and care. This research work will be
i. This study will help the hospital management, the state government, and the general
public to know the particular relationship that exists between low and high blood
ii. It will help to determine the discriminant analysis of the reported cases of the disease in
of medical records based on blood pressure and the level of sugar in urine and blood
aims to classify diabetes events accurately, because this can be used for early prevention
before complications occur. The linear discriminant analysis indicates that a person with higher
weight, lower age, and a higher cholesterol level is more likely to be classified as a diabetes
patient. The apparent error rate (APER) test gives a misclassification percentage of 14%.
Therefore, discriminant analysis can be used for the classification of diabetics, because its
accuracy is reasonably high. Classification with discriminant
Galán et al. (2023), “Discriminant Model for Insulin Resistance in Type 2 Diabetic Patients”.
Patients with type 2 diabetes mellitus tend to have insulin resistance, a condition that is evaluated
using expensive methods that are not easily accessible in routine clinical practice. Objective: To
determine the anthropometric, clinical, and metabolic parameters that allow for the
discrimination of type 2 diabetic patients who have insulin resistance from those who do not. A
cross-sectional analytical observational study was carried out in 92 type 2 diabetic patients. A
discriminant analysis was applied using the SPSS statistical package to establish the
characteristics that differentiate type 2 diabetic patients with insulin resistance from those
without it. Results: Most of the variables analyzed in this study have a statistically significant
association with the HOMA-IR. However, only HDL-c, LDL-c, glycemia, BMI, and tobacco
exposure time allow for the discrimination of type 2 diabetic patients who have insulin resistance
from those who do not, considering the interaction between them. According to the absolute
value of the structure matrix, the variable that contributes most to the discriminant model is
HDL-c (0.69). Conclusions: The association between HDL-c, LDL-c, glycemia, BMI, and
tobacco exposure time allows for the discrimination of type 2 diabetic patients who have insulin
resistance from those who do not. This constitutes a simple model that can be used in routine
clinical practice.
Dibal and Abraham (2020), “On the Application of Linear Discriminant Function to Evaluate Data
on Diabetic Patients at the University of Port Harcourt Teaching Hospital, Rivers, Nigeria”.
Many real-life events involve several interacting variables; hence multivariate statistical
tools are necessary for appropriate analysis and interpretation. Discriminant analysis (DA) is
one of the commonly used multivariate methods in various fields of study, including education,
finance, environment, and medicine, where complex data analysis and interpretation are required.
The paper demonstrates how discriminant analysis can be carried out on 335 patients (40 diabetic
and 295 non-diabetic) and how the output can be interpreted using Fisher's linear discriminant
function (FLDF). The performance of the FLDF was judged by the percentage of correct
reclassification of the original observations from their discriminant scores. Up to 65.4%
correct classifications were achieved, and 62.7% of the cross-validated grouped cases were
correctly classified as either diabetic or non-diabetic. The patient's age and gender were found
to be the two most important variables in classifying a patient into one of the two groups.
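The percentage of correct reclassification used to judge a discriminant function is the complement of the apparent error rate (APER). The sketch below shows how both can be computed from a two-group classification table. The function name `apparent_error_rate` and the counts are hypothetical, chosen only for illustration; they are not taken from any of the studies reviewed here.

```python
def apparent_error_rate(confusion):
    """APER = misclassified / total, from a square classification table.

    Rows are the actual groups and columns are the predicted groups,
    so the diagonal holds the correctly classified counts.
    """
    total = sum(sum(row) for row in confusion)
    if total == 0:
        raise ValueError("classification table is empty")
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


# Hypothetical counts: rows = actual (diabetic, non-diabetic),
# columns = predicted (diabetic, non-diabetic).
confusion = [
    [30, 10],
    [20, 140],
]

aper = apparent_error_rate(confusion)
hit_rate = 1 - aper
print(f"APER = {aper:.1%}, correctly classified = {hit_rate:.1%}")
```

Because the same observations are used to build and to evaluate the rule, the APER tends to be optimistic, which is why cross-validated percentages are usually reported alongside it.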
Alayande et al. (2015), “Application of Discriminant Analysis in Data Analysis”. The paper shows
that discriminant analysis, as a general research technique, can sometimes be preferable to
logistic regression, especially when the sample size is very small and the assumptions are met.
Its application to the failed industry in Nigeria shows that the derived model appears to
outperform previously built models, since the model can exhibit true ex ante
Humera et al. (2015), “Application of Discriminant Analysis to Predict the Institute’s Annual
Performance in Sargodha Board”. This study was carried out on the 2014 annual 2nd Year (12th
year of education) results of various institutes affiliated with the Sargodha board.
Discriminant analysis was used to achieve sharper discrimination between two categories (group
1: institutes with annual results below the board average; group 2: institutes with annual
results above the board average) based on the disciplines of the institutes, gender, status of
the institutes, and the geographical area of the four districts in Sargodha division. Successful
discrimination was made between institutes with results below and above the board average, and
the clustering of the institutes' annual results into the two categories is statistically
significant on the basis of the discriminant analysis. The data were obtained from the managing
body of the Sargodha board. Analysis of a set of seven variables shows that five of them
significantly help in discriminating between institutes with results below or above the board
average. It is therefore suggested that the institutes that were misclassified under results
below the board average be reclassified.
Discriminant analysis focuses on the association between multiple independent variables and a
categorical dependent variable in order to create rules for separating distinct groups as much
as possible and for assigning an observation of unknown origin to one of the distinct
preexisting groups. Fisher's linear discriminant function (FLDF) is frequently used for
discrimination, classification, and prediction under the usual basic statistical assumptions:
multivariate normality of the independent variables, equality of the variance-covariance
matrices, and relative equality of the group sample sizes. Linear discriminant analysis (LDA) is
suitable for supervised classification when the number of observations, n, is larger than the
number of variables, p. However, with the development of new technologies there has been an
increase in complex high-dimensional problems, in which the number of variables (the dimension
of the data vectors) is much larger than the number of observations (the sample size), that is,
n < p, in many disciplines such as medicine and epidemiology, genetics, biology, and metrology.
Many methods have been proposed for the analysis of high-dimensional data, such as the K-D tree
and the R-tree; however, their performance and efficiency decrease as the dimensionality
increases, because they are designed to operate with small dimensionality. When discriminating
between two groups, the analysis assumes that the two samples or populations have the same
covariance matrix Σ but distinct mean vectors μ1 and μ2 on p variables; the discriminant
function that maximizes the separation of the groups is then a linear combination of the
variables.
Fisher's approach transforms the multivariate observations X to univariate observations Y such
that the Y values obtained from the two populations are adequately separated. The objective is
to select the linear combination of X that maximizes the ratio of the squared distance between
the sample means of Y to its variance. This study uses the method of discriminant analysis on
health-related data collected from the University of Port
3.0 Introduction
Data collection is the cornerstone of any project work, because the data are the necessary facts
and information that must be obtained before the analysis is carried out.
N(μ1, Σ) and N(μ2, Σ), where μi = (μi1, μi2, ..., μip)' (i = 1, 2) represents the vector of
means of the ith population and Σ is the common variance-covariance matrix of the two
populations. The pdf of the ith population is:
\[
f(X_i) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}
\exp\left[-\frac{1}{2}\,(X-\mu_i)'\,\Sigma^{-1}(X-\mu_i)\right] \tag{1}
\]
For the ratio of the densities of the two multivariate normal populations, denoted by f(X1) and
f(X2), we have:
\[
\frac{f(X_1)}{f(X_2)} =
\frac{\exp\left[-\frac{1}{2}\,(X-\mu_1)'\,\Sigma^{-1}(X-\mu_1)\right]}
     {\exp\left[-\frac{1}{2}\,(X-\mu_2)'\,\Sigma^{-1}(X-\mu_2)\right]} \ge k
\]
\[
\therefore\ \exp\left\{-\frac{1}{2}\left[(X-\mu_1)'\,\Sigma^{-1}(X-\mu_1)
- (X-\mu_2)'\,\Sigma^{-1}(X-\mu_2)\right]\right\} \ge k \tag{2}
\]
Taking the natural logarithm of both sides of (2), which is a monotone increasing
transformation, we have:
\[
-\frac{1}{2}\left[(X-\mu_1)'\,\Sigma^{-1}(X-\mu_1)
- (X-\mu_2)'\,\Sigma^{-1}(X-\mu_2)\right] \ge \log k \tag{3}
\]
Each quadratic form in (3) is the squared Mahalanobis distance between X and the corresponding
population mean. For a suitably chosen k (which can of course be one, in which case log k is
zero), the LHS of (3) can be expanded to give:
\[
X'\,\Sigma^{-1}(\mu_1-\mu_2) - \frac{1}{2}\,(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \ge \log k \tag{4}
\]
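The step from the left-hand side of (3) to that of (4) can be checked term by term. The following is a sketch of the algebra, assuming only that Σ is symmetric, so that X'Σ⁻¹μ = μ'Σ⁻¹X and the X'Σ⁻¹X terms cancel:

```latex
\begin{align*}
-\tfrac{1}{2}\left[(X-\mu_1)'\Sigma^{-1}(X-\mu_1) - (X-\mu_2)'\Sigma^{-1}(X-\mu_2)\right]
&= -\tfrac{1}{2}\left[-2X'\Sigma^{-1}\mu_1 + \mu_1'\Sigma^{-1}\mu_1
   + 2X'\Sigma^{-1}\mu_2 - \mu_2'\Sigma^{-1}\mu_2\right] \\
&= X'\Sigma^{-1}(\mu_1-\mu_2)
   - \tfrac{1}{2}\left(\mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2\right) \\
&= X'\Sigma^{-1}(\mu_1-\mu_2)
   - \tfrac{1}{2}(\mu_1+\mu_2)'\Sigma^{-1}(\mu_1-\mu_2).
\end{align*}
```

The last line uses the identity (μ1 + μ2)'Σ⁻¹(μ1 − μ2) = μ1'Σ⁻¹μ1 − μ2'Σ⁻¹μ2, which holds because the two cross terms μ1'Σ⁻¹μ2 and μ2'Σ⁻¹μ1 are equal and cancel.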
The first term of (4) is the well-known Fisher's linear discriminant function, which is linear
in X. The classification rule is therefore:
Classify as π1 if:
\[
X'\,\Sigma^{-1}(\mu_1-\mu_2) - \frac{1}{2}\,(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \ge \log k \tag{5}
\]
Classify as π2 if:
\[
X'\,\Sigma^{-1}(\mu_1-\mu_2) - \frac{1}{2}\,(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) < \log k \tag{6}
\]
where
\[
k = \frac{q_2\,C(1/2)}{q_1\,C(2/1)} \tag{7}
\]
Here C(1/2) is the cost of misclassifying an observation into π1 instead of π2, and C(2/1) is
the cost of misclassifying an observation into π2 instead of π1. Also, q1 and q2 are the prior
probabilities of π1 and π2 respectively. If the two populations are equally likely and the costs
of misclassification are equal, then k = 1 and log k = 0. Hence, the rule for classification
into π1 and π2 becomes:
Classify as π1 if:
\[
X'\,\Sigma^{-1}(\mu_1-\mu_2) \ge \frac{1}{2}\,(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \tag{8}
\]
Classify as π2 if:
\[
X'\,\Sigma^{-1}(\mu_1-\mu_2) < \frac{1}{2}\,(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \tag{9}
\]
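The classification rule with a general threshold k can be sketched in code. This is a minimal illustration for p = 2 with known population parameters; the function names, the parameter values, and the test points are hypothetical assumptions made for the example, and equal priors and costs are used as defaults so that k = 1.

```python
import math


def inverse_2x2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def discriminant_score(x, mu1, mu2, sigma):
    """x' S^-1 (mu1 - mu2) - 0.5 (mu1 + mu2)' S^-1 (mu1 - mu2), with S = sigma."""
    inv = inverse_2x2(sigma)
    diff = [mu1[0] - mu2[0], mu1[1] - mu2[1]]
    # w = sigma^-1 (mu1 - mu2): the discriminant coefficients
    w = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
         inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    midpoint = [(mu1[0] + mu2[0]) / 2, (mu1[1] + mu2[1]) / 2]
    # (x - midpoint)' w equals x'w - 0.5 (mu1 + mu2)' w
    return (x[0] - midpoint[0]) * w[0] + (x[1] - midpoint[1]) * w[1]


def classify(x, mu1, mu2, sigma, q1=0.5, q2=0.5, c12=1.0, c21=1.0):
    """Return 1 or 2, using the threshold k = (q2 * C(1/2)) / (q1 * C(2/1))."""
    k = (q2 * c12) / (q1 * c21)
    score = discriminant_score(x, mu1, mu2, sigma)
    return 1 if score >= math.log(k) else 2


# Hypothetical population parameters (illustrative only).
mu1 = [2.0, 2.0]
mu2 = [0.0, 0.0]
sigma = [[1.0, 0.0], [0.0, 1.0]]

print(classify([1.5, 1.5], mu1, mu2, sigma))                  # 1
print(classify([0.2, 0.1], mu1, mu2, sigma))                  # 2
print(classify([1.2, 1.0], mu1, mu2, sigma))                  # 1
print(classify([1.2, 1.0], mu1, mu2, sigma, q1=0.1, q2=0.9))  # 2
```

The last two calls show the effect of k: a point just on the π1 side of the midpoint is assigned to π1 under equal priors, but to π2 once π2 is assumed to be nine times more likely a priori.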
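where μ1 can be estimated by X̄1 and μ2 can be estimated by X̄2. In terms of these sample
estimates, an observation is classified as π1 if Y ≥ M and as π2 otherwise, where
\[
Y = X'\,S_P^{-1}\,(\bar{X}_1 - \bar{X}_2) \tag{10}
\]
\[
M = \frac{1}{2}\,(\bar{X}_1 + \bar{X}_2)'\,S_P^{-1}\,(\bar{X}_1 - \bar{X}_2) \tag{11}
\]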
with X' = (X1 X2) and the pooled sample variance-covariance matrix
\[
S_P = \frac{(n_1-1)\,S_1 + (n_2-1)\,S_2}{n_1+n_2-2} \tag{12}
\]
which, when n1 = n2, reduces to
\[
S_P = \frac{S_1 + S_2}{2} \tag{13}
\]
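S1 and S2 are the respective sample variance-covariance matrices of the two populations, and n1
and n2 are the corresponding sample sizes. For two variables, that is p = 2, the sample mean
vectors and sample variance-covariance matrices are:
\[
\bar{X}_1 = \begin{pmatrix} \bar{X}_{11} \\ \bar{X}_{21} \end{pmatrix} \tag{14}
\]
\[
S_1 = \begin{pmatrix} S_{111} & S_{121} \\ S_{211} & S_{221} \end{pmatrix} \tag{15}
\]
\[
\bar{X}_2 = \begin{pmatrix} \bar{X}_{12} \\ \bar{X}_{22} \end{pmatrix} \tag{16}
\]
\[
S_2 = \begin{pmatrix} S_{112} & S_{122} \\ S_{212} & S_{222} \end{pmatrix} \tag{17}
\]
In each case the last subscript indexes the population.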
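The sample-based procedure for p = 2 can be sketched end to end: compute the group mean vectors and variance-covariance matrices, pool them as in (12), and then form the discriminant coefficients and the cut-off M of (11). This is a minimal illustration, not the analysis carried out in this study; the function names and the (systolic blood pressure, blood sugar) readings are hypothetical.

```python
def mean_vector(rows):
    """Sample mean vector for two variables."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]


def covariance_matrix(rows):
    """Sample variance-covariance matrix (divisor n - 1) for two variables."""
    n = len(rows)
    m = mean_vector(rows)
    s = [[0.0, 0.0], [0.0, 0.0]]
    for r in rows:
        d = [r[0] - m[0], r[1] - m[1]]
        for j in range(2):
            for k in range(2):
                s[j][k] += d[j] * d[k] / (n - 1)
    return s


def pooled_covariance(s1, n1, s2, n2):
    """Pooled matrix S_P as in equation (12)."""
    return [[((n1 - 1) * s1[j][k] + (n2 - 1) * s2[j][k]) / (n1 + n2 - 2)
             for k in range(2)] for j in range(2)]


def fit_ldf(group1, group2):
    """Return (coef, m): coef = S_P^-1 (xbar1 - xbar2), m = cut-off M of (11)."""
    xbar1, xbar2 = mean_vector(group1), mean_vector(group2)
    sp = pooled_covariance(covariance_matrix(group1), len(group1),
                           covariance_matrix(group2), len(group2))
    (a, b), (c, d) = sp
    det = a * d - b * c
    if det == 0:
        raise ValueError("pooled covariance matrix is singular")
    inv = [[d / det, -b / det], [-c / det, a / det]]
    diff = [xbar1[0] - xbar2[0], xbar1[1] - xbar2[1]]
    coef = [inv[0][0] * diff[0] + inv[0][1] * diff[1],
            inv[1][0] * diff[0] + inv[1][1] * diff[1]]
    m = 0.5 * ((xbar1[0] + xbar2[0]) * coef[0] + (xbar1[1] + xbar2[1]) * coef[1])
    return coef, m


def classify(x, coef, m):
    """Y = x' S_P^-1 (xbar1 - xbar2) as in (10); group 1 if Y >= M, else group 2."""
    y = x[0] * coef[0] + x[1] * coef[1]
    return 1 if y >= m else 2


# Hypothetical (systolic BP in mmHg, blood sugar in mmol/L) readings.
diabetic = [(150, 9.0), (160, 10.5), (145, 8.2), (155, 11.0)]
non_diabetic = [(120, 5.0), (125, 5.6), (118, 4.8), (130, 6.1)]

coef, m = fit_ldf(diabetic, non_diabetic)
print("coefficients:", coef, "cut-off M:", m)
print("new patient (148, 8.9) -> group", classify((148, 8.9), coef, m))
```

With equal group sizes, as in this example, the pooled matrix coincides with the simple average in (13). The sketch uses equal priors and costs, so the cut-off is M alone; a different k would add log k to the threshold.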
REFERENCES
Dibal, N. P. and Abraham, C. A. (2020). On the Application of Linear Discriminant Function to
Evaluate Data on Diabetic Patients at the University of Port Harcourt Teaching Hospital,
Rivers, Nigeria. American Journal of Theoretical and Applied Statistics, 9(3), 53-56.
doi: 10.11648/j.ajtas.20200903.14
Usman, A. (2012). Statistics Method for Biometric and Medical Research. Kaduna, Nigeria:
Millennium Printing and Publishing Company Limited, pp. 1-10 and 53-67.
Usman, A. (2015). Statistics Method for Biometric and Medical Research. Kaduna, Nigeria:
Millennium Printing and Publishing Company Limited, pp. 1-10 and 53-67.