UKASHA ABDUL-KADIR


CHAPTER ONE

INTRODUCTION
1.1 Background to the study
Diabetes is a metabolic disease characterized by high levels of blood sugar (blood glucose). In diabetes, the human body cannot respond properly to insulin, the hormone responsible for keeping blood sugar levels in equilibrium. Hyperglycemia, or elevated blood glucose, is a sign that insulin is not functioning correctly; a related term is impaired glucose tolerance (IGT). Over time, diabetes carries serious risks such as chronic kidney disease (CKD), cardiovascular disease (CVD), damaged blood vessels, blindness (diabetic retinopathy) and skin problems, so it is counted among the most dangerous diseases in the world. According to the WHO, about 422 million people worldwide have diabetes, the majority of them living in low- and middle-income countries. It is predicted that there will be 642 million cases of diabetes worldwide in 2040.

Diabetes comes in two main varieties: type 1 and type 2. Type 1 occurs because the pancreas produces little or no insulin; in type 2 the body resists the action of insulin, so the insulin available is not enough to control blood glucose. One cause of diabetes is poor eating habits. For example, many people in Indonesia consume high-calorie food without a balance of protein, vitamins and fat. In addition, psychological factors such as mental health conditions, cognitive dysfunction, personality traits and quality of life can also contribute to it. The primary sign of diabetes is a prolonged period of high blood glucose concentration; other signs are frequent urination, blurred vision, abnormal weight loss and feeling hungry too quickly. Diabetes can be detected through urine checks, blood pressure tests, kidney tests and biopsy. However, studies show that about 40% of diabetic subjects are not identified at initial screening. Diabetes is also called a silent killer because its symptoms often go undetected. One concern is that it can cause several health problems, or possibly early death, if it is not identified and treated promptly. Therefore, a good system for detecting diabetes is needed as a preventive measure.

In this research, we take advantage of machine learning classifiers to predict diabetes early. In the last ten years, machine learning has been used extensively in the healthcare industry. Because of its dependability and efficiency, it has great potential to raise human living standards. The current era of digitalization means that a lot of data is stored in different formats, and machine learning can use this data to uncover patterns or hidden knowledge within a data set. Since machine learning is a data-driven approach, learning requires data. Several machine learning algorithms have been used in previous research to predict diabetes. For example, the support vector machine (SVM) reached accuracies of about 84.10%, 100% and 82% in different studies.

SVM reliably handles data that are linearly or nonlinearly separable by constructing a hyperplane that divides the data classes. Random forest (RF) builds multiple decision trees and combines their decisions by majority voting; it reached accuracies of about 84%, 79% and 98%. The logistic model tree (LMT), a tree model based on the logistic function, reached an accuracy of 79.31%. Fuzzy logic, a soft computing method, reached an accuracy of about 79.8%, and the adaptive neuro-fuzzy inference system (ANFIS) reached 84.48%. The decision tree (DT), a tree-based prediction model, reached an accuracy of about 91.2%. These results show that machine learning can be used to predict diabetes.

Multivariate analysis is a set of statistical techniques for analysing two or more variables observed on one or more samples of objects. Its methods are suitable when several measurements are taken on each unit in one or more samples. The quality and validity of the information obtained about the study population increase with the number of variables measured on each sampled unit, provided the variables are adequately modelled to explain variation in the response variable. The discriminant function procedure introduced by Fisher [9] is one of the linear models used for the analysis of multivariate data; it gives the best linear function of the explanatory variables for discriminating between groups. Discriminant analysis focuses on the association between multiple independent variables and a categorical dependent variable, in order to create rules that separate distinct groups as much as possible and assign an observation of unknown origin to one of two or more distinct preexisting groups. Fisher's linear discriminant function (FLDF) is frequently used for discrimination, classification and prediction under the usual statistical assumptions: multivariate normality of the independent variables, equality of the variance-covariance matrices, and roughly equal group sample sizes. Linear discriminant analysis (LDA) is suitable for supervised classification when the number of observations, n, is larger than the number of variables, p. However, with the development of new technologies there has been an increase in complex high-dimensional problems, in which the number of variables (the dimension of the data vectors) is much larger than the number of observations (the sample size), that is, n < p, in many disciplines such as medicine, epidemiology, genetics, biology and metrology.

1.2 Statement of the problem


Over the last decade there has been an alarming increase in the number of recorded cases of diabetes in hospitals in Sokoto State. This has raised serious questions in the minds of citizens in particular and of medical professionals in general. Diabetes is generally associated with many symptoms, such as fever, headache and loss of weight, which make it a complicated disease with serious economic and social effects on its victims. This study seeks to find the prevalence rate of diabetes and to suggest possible ways of preventing it. Diabetes is a chronic health problem with devastating yet preventable consequences. It is characterized by high blood glucose levels resulting from defects in insulin production, insulin action or both. Type 2 diabetes affects men and women proportionately, and its rate is expected to increase greatly over the next half century.
1.3 Aim and objectives
The purpose of this study is to investigate the systolic and diastolic blood pressure of diabetic patients, as well as the level or presence of sugar in their blood and urine, through the following objectives:

i. To obtain the correlation coefficient between the domain of pulsation of the heart (blood pressure) and the presence of sugar in both urine and blood.

ii. To conduct a significance test between the domains.

iii. To investigate the extent of the relationship between the blood pressure and the sugar level of the patients.
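Objectives (i) and (ii) can be illustrated with a minimal Python sketch. It assumes that the Pearson correlation coefficient and the usual t test of zero correlation are the intended tools, which the text does not state explicitly; the function names and the six readings are hypothetical and are not study data.

```python
import math

def pearson_r(x, y):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two lists of equal length with at least 3 values")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """t statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * math.sqrt((n - 2) / (1 - r * r))

# Hypothetical readings for six patients (illustration only, not study data).
systolic = [120, 135, 150, 142, 160, 128]          # mmHg
blood_sugar = [5.6, 7.1, 9.4, 8.0, 10.2, 6.3]      # mmol/L

r = pearson_r(systolic, blood_sugar)
t = t_statistic(r, len(systolic))
print(f"r = {r:.3f}, t = {t:.2f} on {len(systolic) - 2} degrees of freedom")
```

The computed t would be compared with the critical value of the t distribution on n - 2 degrees of freedom; with 4 degrees of freedom the two-sided 5% critical value is about 2.776.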

1.4 Significance of the study


This project is significant because the health problems caused by diabetes in this part of the world make it an issue that requires adequate attention and care. This research work will be of great significance in the following ways:

i. It will help the hospital management, the state government and the general public to know the particular relationship that exists between low and high blood pressure in diabetic patients, in order to carry out enlightenment campaigns towards reducing the incidence of the disease.

ii. It will help to carry out a discriminant analysis of the reported cases of the disease in Sokoto State.

1.5 Scope and limitation


This study is limited to the data obtained from the medical records department of the Sokoto State Specialist Hospital on blood pressure and the level of sugar in urine and blood; other issues are excluded.


CHAPTER TWO
LITERATURE REVIEW

2.1 Review of Previous Literature


Rahayu et al. (2019), in “Classification of diabetes events using discriminant analysis”, aimed to classify diabetes events accurately, because such a classification can serve as early prevention before complications occur. Based on linear discriminant analysis, they found that a person with higher weight, lower age and a higher cholesterol level is more likely to be classified as a diabetes patient. Based on the apparent error rate (APER), the percentage of misclassification was 14%. Discriminant analysis can therefore be used for the classification of diabetics, because its accuracy is reasonably high, and it is expected to be applicable to data sets on other diseases.
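The apparent error rate mentioned above is simply the share of cases that the fitted rule assigns to the wrong group. A minimal sketch follows; the confusion-matrix counts are hypothetical, chosen only so that the rate comes out at the 14% quoted, and are not taken from the paper.

```python
def aper(confusion):
    """Apparent error rate from a square confusion matrix.

    confusion[i][j] is the number of cases from actual group i
    that the classification rule assigned to group j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total

# Hypothetical counts for illustration only.
confusion = [
    [40, 10],   # actual diabetic: 40 classified correctly, 10 missed
    [4, 46],    # actual non-diabetic: 4 misclassified, 46 correct
]
print(f"APER = {aper(confusion):.2%}")
```

Because the APER is computed on the same data used to build the rule, it tends to be optimistic, which is why cross-validated rates are often reported alongside it.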

Galán et al. (2023), in “Discriminant Model for Insulin Resistance in Type 2 Diabetic Patients”, note that patients with type 2 diabetes mellitus tend to have insulin resistance, a condition that is evaluated using expensive methods that are not easily accessible in routine clinical practice. Their objective was to determine the anthropometric, clinical and metabolic parameters that allow type 2 diabetic patients who have insulin resistance to be discriminated from those who do not. A cross-sectional analytical observational study was carried out on 92 type 2 diabetic patients, and a discriminant analysis was applied using the SPSS statistical package to establish the characteristics that differentiate the two groups. Most of the variables analysed had a statistically significant association with the HOMA-IR. However, only HDL-c, LDL-c, glycemia, BMI and tobacco exposure time allowed the two groups to be discriminated, considering the interaction between them. According to the absolute value of the structure matrix, the variable that contributes most to the discriminant model is HDL-c (−0.69). The authors conclude that the association between HDL-c, LDL-c, glycemia, BMI and tobacco exposure time allows type 2 diabetic patients who have insulin resistance to be discriminated from those who do not, and that this constitutes a simple model that can be used in routine clinical practice.

Dibal et al. (2020), in “On the Application of Linear Discriminant Function to Evaluate Data on Diabetic Patients at the University of Port Harcourt Teaching Hospital, Rivers, Nigeria”, observe that many real-life events involve several interacting variables, so multivariate statistical tools are necessary for appropriate analysis and interpretation. Discriminant analysis (DA) is one of the commonly used multivariate methods in fields such as education, finance, environment and medicine, where complex data analysis and interpretation are required. The paper demonstrates how discriminant analysis can be carried out on 335 patients (40 diabetic and 295 non-diabetic) and how the output can be interpreted using Fisher's linear discriminant function (FLDF). The performance of the FLDF was judged by the percentage of original observations correctly reclassified using the discriminant scores from the function. Up to 65.4% of the original cases were correctly classified, and 62.7% of the cross-validated grouped cases were correctly classified as either diabetic or non-diabetic. The patient's age and gender were found to be the two most important variables in classifying a patient into one of the two groups.

Alayande et al. (2015), in “An Overview and Application of Discriminant Analysis in Data Analysis”, show that discriminant analysis, as a general research technique, can be very useful in investigating various aspects of a multivariate research problem. It is sometimes preferable to logistic regression, especially when the sample size is very small and the assumptions are met. Its application to failed industries in Nigeria shows that the derived model appears to outperform previously built models, since it exhibits true ex ante predictive ability for a period of about three years.

Humera et al. (2015), in “Application of Discriminant Analysis to Predict the Institute's Annual Performance in Sargodha Board”, studied the annual 2nd Year (12th year of education) results for 2014 of various institutes affiliated with the Sargodha board. Discriminant analysis was used to achieve sharper discrimination between two categories (group 1: institutes with annual results below the board average; group 2: institutes with annual results above the board average) based on the disciplines of the institutes, gender, status of the institutes and the geographical area of the four districts in Sargodha division. The data were obtained from the managing body of the Sargodha board. Successful discrimination was made between institutes with results below and above the board average, and the grouping of annual results into the two categories was statistically significant on the basis of discriminant analysis. Of the seven variables analysed, five helped significantly in discriminating between the two groups. The authors therefore suggest reclassifying the institutes that were misclassified as having results below the board average.

Discriminant analysis focuses on the association between multiple independent variables and a categorical dependent variable, in order to create rules that separate distinct groups as much as possible and assign an observation of unknown origin to one of the distinct preexisting groups. Fisher's linear discriminant function (FLDF) is frequently used for discrimination, classification and prediction under the usual statistical assumptions: multivariate normality of the independent variables, equality of the variance-covariance matrices, and roughly equal group sample sizes. Linear discriminant analysis (LDA) is suitable for supervised classification when the number of observations, n, is larger than the number of variables, p. However, with the development of new technologies there has been an increase in complex high-dimensional problems, in which the number of variables (the dimension of the data vectors) is much larger than the number of observations (the sample size), that is, n < p, in many disciplines such as medicine, epidemiology, genetics, biology and metrology.

Many methods have been proposed for the analysis of high-dimensional data, such as the K-D tree and the R tree; however, their performance and efficiency decrease as the dimensionality increases, because they are designed to operate with small dimensionality. When discriminating between two groups, the analysis assumes that the two samples or populations have the same covariance matrix Σ but distinct mean vectors μ1 and μ2 on p variables; the discriminant function that maximizes the separation of the groups is then a linear combination of the variables.

Discriminant analysis focuses on the association between multiple independent variables and a categorical dependent variable by forming a composite of the independent variables. Fisher's approach transforms the multivariate observations X into univariate observations Y such that the Y values obtained from the two populations are separated as much as possible. The objective is to select the linear combination of X that maximizes the ratio of the squared distance between the sample means of Y to its variance. This study uses the method of discriminant analysis on health-related data collected from the University of Port Harcourt Teaching Hospital, Port Harcourt, Rivers State, Nigeria.
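Fisher's criterion described in this section can be written out explicitly. In the sketch below, a is the vector of coefficients of the linear combination Y = a'X, the barred X's are the two sample mean vectors and S_p is the pooled sample covariance matrix; a is a symbol introduced here for illustration, and the other symbols follow the standard notation.

```latex
\[
\max_{a \neq 0}\;
\frac{(\bar{y}_1 - \bar{y}_2)^2}{s_y^2}
  \;=\;
\max_{a \neq 0}\;
\frac{\left[a'(\bar{X}_1 - \bar{X}_2)\right]^2}{a' S_p\, a},
\qquad
\hat{a} \;\propto\; S_p^{-1}(\bar{X}_1 - \bar{X}_2),
\]
\[
\text{with maximum value}\quad
D^2 = (\bar{X}_1 - \bar{X}_2)'\, S_p^{-1}\, (\bar{X}_1 - \bar{X}_2).
\]
```

The maximizing vector is unique only up to a scale factor, and the maximum D^2 is the squared sample Mahalanobis distance between the two group means.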


CHAPTER THREE
METHODOLOGY
3.0 Introduction
Data collection is the cornerstone of any project work, because the data are the necessary facts and information that must be obtained before any analysis is carried out.

3.1 Sources of Data Collection


The main source of data in this research work is secondary data, drawn from documentary sources at Maryam Abacha hospital.

3.2 Method of Data Collection


The method of data collection employed in this research work is transcription from the records of diabetic patients in the medical records department.
3.3 Population Size
The population of this research covers only the blood pressure and the sugar level in both the urine and blood of the patients.

3.4 Analysis technique


Suppose we have two multivariate normal populations with equal variance-covariance matrices, N(μ1, Σ) and N(μ2, Σ), where μi = (μi1, μi2, …, μip)', i = 1, 2, is the mean vector of the ith population and Σ is the common variance-covariance matrix of the two populations. The pdf of the ith population is given below:

\[
f(X_i) = \frac{1}{(2\pi)^{p/2}\,|\Sigma|^{1/2}}
\exp\left[-\frac{1}{2}\,(X-\mu_i)'\,\Sigma^{-1}(X-\mu_i)\right] \tag{1}
\]

For the ratio of the densities of the two multivariate normal populations, denoted by f(X1) and f(X2), we have:

\[
\frac{f(X_1)}{f(X_2)} =
\frac{\exp\left[-\frac{1}{2}\,(X-\mu_1)'\,\Sigma^{-1}(X-\mu_1)\right]}
     {\exp\left[-\frac{1}{2}\,(X-\mu_2)'\,\Sigma^{-1}(X-\mu_2)\right]} \geq k
\]

\[
\therefore\;
\exp\left\{-\frac{1}{2}\left[(X-\mu_1)'\,\Sigma^{-1}(X-\mu_1) - (X-\mu_2)'\,\Sigma^{-1}(X-\mu_2)\right]\right\} \geq k \tag{2}
\]

Taking the natural logarithm of both sides of equation (2), which is a monotone increasing transformation, we have:

\[
-\frac{1}{2}\left[(X-\mu_1)'\,\Sigma^{-1}(X-\mu_1) - (X-\mu_2)'\,\Sigma^{-1}(X-\mu_2)\right] \geq \log k \tag{3}
\]

Each quadratic form in (3) is the squared Mahalanobis distance between X and the mean of the corresponding population, N(μ1, Σ) or N(μ2, Σ).

For a suitably chosen k (which can of course be one, in which case log k is zero), the left-hand side of (3) can be expanded and rearranged to obtain the following:

\[
X'\,\Sigma^{-1}(\mu_1-\mu_2) - \frac{1}{2}\,(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \geq \log k \tag{4}
\]

The first term of (4) is the well-known Fisher's linear discriminant function, which is linear in the components of the observation vector.
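The step from (3) to (4) is stated without its intermediate algebra; one way to fill it in, using only the symmetry of Σ⁻¹, is sketched below.

```latex
\begin{align*}
(X-\mu_i)'\,\Sigma^{-1}(X-\mu_i)
  &= X'\Sigma^{-1}X - 2\,X'\Sigma^{-1}\mu_i + \mu_i'\Sigma^{-1}\mu_i,
  \qquad i = 1, 2, \\[4pt]
-\tfrac{1}{2}\bigl[(X-\mu_1)'\Sigma^{-1}(X-\mu_1) - (X-\mu_2)'\Sigma^{-1}(X-\mu_2)\bigr]
  &= X'\Sigma^{-1}(\mu_1-\mu_2)
     - \tfrac{1}{2}\bigl(\mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2\bigr), \\[4pt]
\mu_1'\Sigma^{-1}\mu_1 - \mu_2'\Sigma^{-1}\mu_2
  &= (\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2).
\end{align*}
```

The quadratic term X'Σ⁻¹X cancels because both populations share the same Σ, which is why the resulting rule is linear in X. The last identity holds because the cross terms μ1'Σ⁻¹μ2 and μ2'Σ⁻¹μ1 are equal and cancel.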

The best regions of classification into π1 and π2 are given by:

Classify as π1 if:

\[
X'\,\Sigma^{-1}(\mu_1-\mu_2) - \frac{1}{2}\,(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \geq \log k \tag{5}
\]

Classify as π2 if:

\[
X'\,\Sigma^{-1}(\mu_1-\mu_2) - \frac{1}{2}\,(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) < \log k \tag{6}
\]

If the prior probabilities q1 and q2 are known, then k is given by:

\[
k = \frac{q_2\,C(1|2)}{q_1\,C(2|1)} \tag{7}
\]

where C(1|2) is the cost of misclassifying an observation into π1 when it comes from π2, C(2|1) is the cost of misclassifying an observation into π2 when it comes from π1, and q1 and q2 are the prior probabilities of π1 and π2 respectively. If the two populations are equally likely and the costs of misclassification are equal, then k = 1 and log k = 0. Hence, the regions of classification into π1 and π2 can be further simplified as follows:

Classify as π1 if:

\[
X'\,\Sigma^{-1}(\mu_1-\mu_2) \geq \frac{1}{2}\,(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \tag{8}
\]

Classify as π2 if:

\[
X'\,\Sigma^{-1}(\mu_1-\mu_2) < \frac{1}{2}\,(\mu_1+\mu_2)'\,\Sigma^{-1}(\mu_1-\mu_2) \tag{9}
\]

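In practice, \(\mu_1\) can be estimated by the sample mean vector \(\bar{X}_1\), \(\mu_2\) by \(\bar{X}_2\), and Σ by the pooled variance-covariance matrix \(S_p\). Substituting these estimates gives the sample discriminant function Y and the quantity M, so that, with k = 1, an observation is classified into π1 when Y ≥ M and into π2 otherwise:

\[
Y = X'\,S_p^{-1}(\bar{X}_1 - \bar{X}_2) \tag{10}
\]

\[
M = \frac{1}{2}\,(\bar{X}_1 + \bar{X}_2)'\,S_p^{-1}(\bar{X}_1 - \bar{X}_2) \tag{11}
\]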

Here X' = (X1, X2) is the observation vector, and the pooled variance is

\[
S_p = \frac{(n_1-1)\,S_1 + (n_2-1)\,S_2}{n_1+n_2-2} \tag{12}
\]

If n1 = n2, the estimate of the pooled variance Sp above becomes

\[
S_p = \frac{S_1+S_2}{2} \tag{13}
\]

S1 and S2 are the respective sample variance-covariance matrices of the two populations.
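Equations (12) and (13) can be checked numerically with a short sketch. The function name and the two 2×2 matrices below are hypothetical and serve only to show that the pooled estimate reduces to the plain average when the sample sizes are equal.

```python
def pooled_covariance(s1, s2, n1, n2):
    """Pooled covariance S_p = ((n1-1) S1 + (n2-1) S2) / (n1 + n2 - 2)."""
    dof = n1 + n2 - 2
    return [
        [((n1 - 1) * a + (n2 - 1) * b) / dof for a, b in zip(row1, row2)]
        for row1, row2 in zip(s1, s2)
    ]

# Hypothetical sample covariance matrices (illustration only).
s1 = [[4.0, 1.0], [1.0, 2.0]]
s2 = [[6.0, 3.0], [3.0, 4.0]]

unequal = pooled_covariance(s1, s2, n1=10, n2=20)   # weighted toward s2
equal = pooled_covariance(s1, s2, n1=15, n2=15)     # plain average of s1 and s2
print(unequal)
print(equal)
```

With unequal sample sizes the larger group carries more weight, which is the reason for weighting by n1 - 1 and n2 - 1 rather than averaging directly.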

For two variables (p = 2), for instance, the sample mean vectors and sample variance-covariance matrices can be illustrated as follows:

\[
\bar{X}_1 = \begin{pmatrix} \bar{X}_{11} \\ \bar{X}_{21} \end{pmatrix} \tag{14}
\]

is the sample mean vector for population I.

\[
S_1 = \begin{pmatrix} S_{111} & S_{121} \\ S_{211} & S_{221} \end{pmatrix} \tag{15}
\]

is the sample covariance matrix for population I.

\[
\bar{X}_2 = \begin{pmatrix} \bar{X}_{12} \\ \bar{X}_{22} \end{pmatrix} \tag{16}
\]

is the sample mean vector for population II.

\[
S_2 = \begin{pmatrix} S_{112} & S_{122} \\ S_{212} & S_{222} \end{pmatrix} \tag{17}
\]

is the sample covariance matrix for population II.
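The whole procedure in equations (5) to (17) can be put together for the two-variable case in a short Python sketch. It is an illustration under stated assumptions, not the study's analysis: the function names are hypothetical, the variables are assumed to be systolic pressure and blood sugar, and the eight readings are invented rather than hospital data.

```python
def mean_vector(rows):
    """Sample mean vector of a list of [x1, x2] observations."""
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(2)]

def covariance_matrix(rows):
    """2x2 sample variance-covariance matrix (divisor n - 1)."""
    n = len(rows)
    m = mean_vector(rows)
    return [[sum((r[j] - m[j]) * (r[k] - m[k]) for r in rows) / (n - 1)
             for k in range(2)] for j in range(2)]

def inverse_2x2(a):
    """Inverse of a 2x2 matrix; raises if the matrix is singular."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    if det == 0:
        raise ValueError("matrix is singular")
    return [[a[1][1] / det, -a[0][1] / det],
            [-a[1][0] / det, a[0][0] / det]]

def fit_fisher(group1, group2):
    """Return the coefficients S_p^{-1}(xbar1 - xbar2) and the quantity M of (11)."""
    n1, n2 = len(group1), len(group2)
    m1, m2 = mean_vector(group1), mean_vector(group2)
    s1, s2 = covariance_matrix(group1), covariance_matrix(group2)
    # Pooled covariance matrix, equation (12).
    sp = [[((n1 - 1) * s1[j][k] + (n2 - 1) * s2[j][k]) / (n1 + n2 - 2)
           for k in range(2)] for j in range(2)]
    sp_inv = inverse_2x2(sp)
    diff = [m1[0] - m2[0], m1[1] - m2[1]]
    coef = [sp_inv[j][0] * diff[0] + sp_inv[j][1] * diff[1] for j in range(2)]
    cutoff = 0.5 * sum(c * (a + b) for c, a, b in zip(coef, m1, m2))
    return coef, cutoff

def classify(x, coef, cutoff, log_k=0.0):
    """Return 1 if Y - M >= log k (rule (5) with estimates), otherwise 2."""
    y = coef[0] * x[0] + coef[1] * x[1]   # discriminant score, equation (10)
    return 1 if y - cutoff >= log_k else 2

# Hypothetical [systolic mmHg, blood sugar mmol/L] readings, not study data.
diabetic = [[150, 9.5], [160, 10.1], [145, 8.7], [155, 9.9]]
non_diabetic = [[120, 5.2], [125, 5.8], [118, 5.0], [130, 6.1]]

coef, cutoff = fit_fisher(diabetic, non_diabetic)
print("coefficients:", coef, "M:", cutoff)
print("new patient assigned to group", classify([148.0, 9.0], coef, cutoff))
```

The `log_k` argument defaults to zero, which corresponds to equal priors and equal misclassification costs as in (8) and (9); a non-zero value from equation (7) shifts the boundary. Because the invented variables are highly correlated, the individual coefficients should not be read as clinical effects.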


REFERENCES
Alayande S., Ayinla Bashiru and Kehinde Adekunle (2015) “An Overview and Application of Discriminant Analysis in Data Analysis” IOSR Journal of Mathematics (IOSR-JM), e-ISSN: 2278-5728, p-ISSN: 2319-765X, Volume 11, Issue 1 Ver. V (Jan - Feb. 2015), pp. 12-15. www.iosrjournals.org DOI: 10.9790/5728-11151215

Erislandis López-Galán, Rafael Barrio-Deler, Manuel Alejandro Fernández-Fernández, Yaquelin Del Toro-Delgado, Isaac Enrique Peñuela-Puente, Miguel Enrique Sánchez-Hechavarría and Gustavo Alejandro Muñoz-Bustos (2023) “Discriminant Model for Insulin Resistance in Type 2 Diabetic Patients” Medicina 2023, 59, 839. https://doi.org/10.3390/medicina59050839 https://www.mdpi.com/journal/medicina

Humera Razzak, Mehboob Ali and Maqsood Ali (2015) “Application of Discriminant Analysis to Predict The Institute's Annual Performance in Sargodha Board” World Applied Sciences Journal 33 (2): 213-219, 2015. ISSN 1818-4952, IDOSI Publications, 2015. DOI: 10.5829/idosi.wasj.2015.33.02.1

Nicholas Pindar Dibal and Christopher Akas Abraham (2020) “On the Application of Linear Discriminant Function to Evaluate Data on Diabetic Patients at the University of Port Harcourt Teaching Hospital, Rivers, Nigeria” American Journal of Theoretical and Applied Statistics, Vol. 9, No. 3, 2020, pp. 53-56. doi: 10.11648/j.ajtas.20200903.14. Received: April 16, 2020; Accepted: May 3, 2020; Published: May 18, 2020

Usman, A. (2012). Statistics Method for Biometric and Medical Research. Kaduna, Nigeria: Millennium Printing and Publishing Company Limited, pp. 1-10 and 53-67.

Usman, A. (2015). Statistics Method for Biometric and Medical Research. Kaduna, Nigeria: Millennium Printing and Publishing Company Limited, pp. 1-10 and 53-67.

W. Rahayu, V. M. Santi and B. S. Putri (2019) “Classification of diabetes events using discriminant analysis” 4th Annual Applied Science and Engineering Conference. doi:10.1088/1742-6596/1402/7/077102.
