Day93 94 Diabetes Prediction Model
Load Datasets
[96]: df = pd.read_csv("/content/diabetes.csv")
[97]: df.head()
[98]: df.describe()
25% 1.000000 99.000000 62.000000 0.000000 0.000000
50% 3.000000 117.000000 72.000000 23.000000 30.500000
75% 6.000000 140.250000 80.000000 32.000000 127.250000
max 17.000000 199.000000 122.000000 99.000000 846.000000
[99]: df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 768 entries, 0 to 767
Data columns (total 9 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Pregnancies 768 non-null int64
1 Glucose 768 non-null int64
2 BloodPressure 768 non-null int64
3 SkinThickness 768 non-null int64
4 Insulin 768 non-null int64
5 BMI 768 non-null float64
6 DiabetesPedigreeFunction 768 non-null float64
7 Age 768 non-null int64
8 Outcome 768 non-null int64
dtypes: float64(2), int64(7)
memory usage: 54.1 KB
[100]: df.shape
[100]: (768, 9)
[101]: df.value_counts()
65 0 1
104 74 0 0 28.8 0.153
48 0 1
105 72 29 325 36.9 0.159
28 0 1
..
2 84 50 23 76 30.4 0.968
21 0 1
85 65 0 0 39.6 0.930
27 0 1
87 0 23 0 28.9 0.773
25 0 1
58 16 52 32.7 0.166
25 0 1
17 163 72 41 114 40.9 0.817
47 1 1
Length: 768, dtype: int64
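Calling `value_counts()` on the whole frame counts distinct rows, which is why the output above has length 768. For class balance, counting a single column is usually more informative. A minimal sketch on a small made-up frame (the rows below are invented for illustration and are not taken from diabetes.csv):

```python
import pandas as pd

# Toy stand-in for the real data; these rows are invented for illustration.
toy = pd.DataFrame({
    "Glucose": [148, 85, 183, 89, 137, 116],
    "Outcome": [1, 0, 1, 0, 1, 0],
})

# Count rows per class instead of per unique row.
class_counts = toy["Outcome"].value_counts()

# normalize=True returns proportions rather than raw counts.
class_share = toy["Outcome"].value_counts(normalize=True)

print(class_counts)
print(class_share)
```

On the real frame the equivalent call would be `df['Outcome'].value_counts()`.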
[102]: df.columns
[103]: Pregnancies 0
Glucose 0
BloodPressure 0
SkinThickness 0
Insulin 0
BMI 0
DiabetesPedigreeFunction 0
Age 0
Outcome 0
dtype: int64
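The null counts are all zero, yet the `describe()` output earlier shows a 25th percentile of 0 in the fourth and fifth columns (SkinThickness and Insulin, going by the column order in `df.info()`). A zero in such columns may be a placeholder for a missing measurement; that interpretation is an assumption here, not something the notebook states. A small sketch, on invented rows, of counting exact zeros per column:

```python
import pandas as pd

# Invented rows for illustration only.
toy = pd.DataFrame({
    "Glucose": [148, 0, 183, 89],
    "SkinThickness": [35, 0, 0, 23],
    "Insulin": [0, 0, 0, 94],
    "BMI": [33.6, 26.6, 0.0, 28.1],
})

# Columns where a zero is assumed to be physiologically implausible.
suspect_cols = ["Glucose", "SkinThickness", "Insulin", "BMI"]

# isnull() would report nothing here, so count exact zeros instead.
zero_counts = (toy[suspect_cols] == 0).sum()
print(zero_counts)
```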
Insulin -0.073535 0.331357 0.088933 0.436783
BMI 0.017683 0.221071 0.281805 0.392573
DiabetesPedigreeFunction -0.033523 0.137337 0.041265 0.183928
Age 0.544341 0.263514 0.239528 -0.113970
Outcome 0.221898 0.466581 0.065068 0.074752
Age Outcome
Pregnancies 0.544341 0.221898
Glucose 0.263514 0.466581
BloodPressure 0.239528 0.065068
SkinThickness -0.113970 0.074752
Insulin -0.042163 0.130548
BMI 0.036242 0.292695
DiabetesPedigreeFunction 0.033561 0.173844
Age 1.000000 0.238356
Outcome 0.238356 1.000000
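The Outcome column of the matrix is easier to read once sorted. The sketch below copies the printed coefficients into a plain dictionary (the dictionary and variable names are illustrative) and ranks the features by absolute correlation with Outcome:

```python
# Correlation of each feature with Outcome, copied from the matrix above.
corr_with_outcome = {
    "Pregnancies": 0.221898,
    "Glucose": 0.466581,
    "BloodPressure": 0.065068,
    "SkinThickness": 0.074752,
    "Insulin": 0.130548,
    "BMI": 0.292695,
    "DiabetesPedigreeFunction": 0.173844,
    "Age": 0.238356,
}

# Rank by strength of the linear relationship, ignoring sign.
ranked = sorted(corr_with_outcome.items(), key=lambda kv: abs(kv[1]), reverse=True)

for feature, r in ranked:
    print(f"{feature:<26} {r:+.3f}")
```

On the real frame, `df.corr()['Outcome'].drop('Outcome').sort_values(ascending=False)` should give the same ordering, since all of these coefficients are positive.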
[106]: df.hist(figsize=(18,12))
plt.show()
[107]: features = ['Glucose', 'BloodPressure', 'Insulin', 'BMI', 'Age',
'SkinThickness']
plt.figure(figsize=(14, 10))
plt.tight_layout()
plt.show()
[108]: mean_col = ['Glucose','BloodPressure','Insulin','Age','Outcome','BMI']
sns.pairplot(df[mean_col],palette='dark')
/usr/local/lib/python3.10/dist-packages/seaborn/axisgrid.py:1513: UserWarning:
Ignoring `palette` because no `hue` variable has been assigned.
func(x=vector, **plot_kwargs)
/usr/local/lib/python3.10/dist-packages/seaborn/axisgrid.py:1615: UserWarning:
Ignoring `palette` because no `hue` variable has been assigned.
func(x=x, y=y, **kwargs)
[109]: sns.boxplot(x='Outcome',y='Insulin',data=df)
[110]: sns.regplot(x='BMI', y= 'Glucose', data=df)
[111]: sns.relplot(x='BMI', y= 'Glucose', data=df)
[112]: sns.scatterplot(x='Glucose', y= 'Insulin', data=df)
[113]: sns.jointplot(x='SkinThickness', y= 'Insulin', data=df)
[114]: sns.pairplot(df,hue='Outcome')
[115]: sns.lineplot(x='Glucose', y= 'Insulin', data=df)
[116]: sns.swarmplot(x='Glucose', y= 'Insulin', data=df)
/usr/local/lib/python3.10/dist-packages/seaborn/categorical.py:3398:
UserWarning: 60.0% of the points cannot be placed; you may want to decrease the
size of the markers or use stripplot.
warnings.warn(msg, UserWarning)
[118]: plt.figure(figsize=(5,5))
sns.barplot(x="Glucose", y="Insulin", data=df[120:130])
plt.title("Glucose vs Insulin",fontsize=15)
plt.xlabel("Glucose")
plt.ylabel("Insulin")
plt.show()
Training and Testing Data
[119]: x = df.drop(columns = 'Outcome')
y = df['Outcome']
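The model cells below use `X_train`, `X_test`, `y_train` and `y_test`, but the cell that creates them does not appear in this export. Each confusion matrix below sums to 154 rows, about 20% of 768, so the following sketch assumes `test_size=0.2`. The synthetic frame, `random_state=42`, and the capitalised `X` are assumptions for illustration, not values recovered from the notebook:

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Synthetic frame with the same number of rows as diabetes.csv (768).
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "Glucose": rng.integers(50, 200, size=768),
    "BMI": rng.uniform(18, 50, size=768),
    "Outcome": rng.integers(0, 2, size=768),
})

X = demo.drop(columns="Outcome")
y = demo["Outcome"]

# test_size=0.2 is inferred from the 154-row test set; random_state is assumed.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

print(X_train.shape, X_test.shape)
```

With 768 rows this yields 614 training rows and 154 test rows, which matches the support implied by the confusion matrices.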
MODELS
1. Logistic Regression
[120]: from sklearn.linear_model import LogisticRegression
model = LogisticRegression()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
[[98 9]
[18 29]]
Logistic Regression accuracy is: 82.47%
/usr/local/lib/python3.10/dist-packages/sklearn/linear_model/_logistic.py:458:
ConvergenceWarning: lbfgs failed to converge (status=1):
STOP: TOTAL NO. of ITERATIONS REACHED LIMIT.
Increase the number of iterations (max_iter) or scale the data as shown in:
https://scikit-learn.org/stable/modules/preprocessing.html
Please also refer to the documentation for alternative solver options:
https://scikit-learn.org/stable/modules/linear_model.html#logistic-
regression
n_iter_i = _check_optimize_result(
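The warning above offers two remedies: raise `max_iter` or scale the data. A minimal sketch that does both on synthetic data, using a `StandardScaler` followed by `LogisticRegression` in one pipeline. The data, `max_iter=1000`, and the variable names are illustrative assumptions, not the notebook's settings:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Two synthetic features on very different scales.
rng = np.random.default_rng(0)
X_demo = np.column_stack([
    rng.uniform(0, 850, size=200),
    rng.uniform(0.0, 2.5, size=200),
])
# Label depends on both features once each is rescaled to [0, 1].
y_demo = (X_demo[:, 0] / 850 + X_demo[:, 1] / 2.5 > 1.0).astype(int)

# Standardize first, then fit; max_iter=1000 is an assumed, generous limit.
clf = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
clf.fit(X_demo, y_demo)

print("training accuracy:", clf.score(X_demo, y_demo))
```

Putting the scaler inside the pipeline means it is fitted on the training data only when `clf.fit` is called, which avoids leaking test-set statistics into the model.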
2. KNeighborsClassifier
[121]: from sklearn.neighbors import KNeighborsClassifier
model = KNeighborsClassifier(n_neighbors=7)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
1 0.61 0.57 0.59 47
[[90 17]
[20 27]]
KNeighborsClassifier accuracy is: 75.97%
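The class-1 row of the report (0.61, 0.57, 0.59, support 47) and the accuracy line can be reproduced by hand from the confusion matrix. A short sketch, assuming scikit-learn's layout of true labels in rows and predicted labels in columns:

```python
# Confusion matrix printed for KNeighborsClassifier:
# rows are true labels, columns are predicted labels.
cm = [[90, 17],
      [20, 27]]

tn, fp = cm[0]
fn, tp = cm[1]

precision = tp / (tp + fp)   # of predicted positives, how many are correct
recall = tp / (tp + fn)      # of actual positives, how many are found
f1 = 2 * precision * recall / (precision + recall)
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
print(f"accuracy={accuracy * 100:.2f}%")
```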
3. SVC
[122]: from sklearn.svm import SVC
model = SVC()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
[[98 9]
[23 24]]
SVC accuracy is: 79.22%
4. RandomForestClassifier
[123]: from sklearn.ensemble import RandomForestClassifier
model = RandomForestClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
from sklearn.metrics import accuracy_score
# accuracy_score expects (y_true, y_pred); the value is the same either way.
RFAcc = accuracy_score(y_test, y_pred)
print('RFC accuracy is: {:.2f}%'.format(RFAcc*100))
[[94 13]
[16 31]]
RFC accuracy is: 81.17%
5. Gradient Boosting Classifier
[124]: from sklearn.ensemble import GradientBoostingClassifier
model = GradientBoostingClassifier()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
[[93 14]
[14 33]]
GBC accuracy is: 81.82%
6. Naive Bayes
[125]: from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(X_train, y_train)
y_pred = model.predict(X_test)
print(classification_report(y_test, y_pred))
print(confusion_matrix(y_test, y_pred))
[[93 14]
[18 29]]
GNB accuracy is: 79.22%
Compare Models
[126]: compare = pd.DataFrame({'Model': ['Logistic Regression', 'K Neighbors', 'SVM',
'Random Forest', 'GradientBoostingClassifier', 'GaussianNB'],
compare.sort_values(by='Accuracy', ascending=False)
From the comparison, among the 6 ML models, Logistic Regression achieved the
highest accuracy, 82.47%.
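The ranking can be checked directly from the confusion matrices printed in each model section. The sketch below uses plain Python rather than the notebook's `compare` DataFrame, and the dictionary and function names are illustrative:

```python
# Confusion matrices copied from the outputs above ([[TN, FP], [FN, TP]]).
matrices = {
    "Logistic Regression": [[98, 9], [18, 29]],
    "K Neighbors": [[90, 17], [20, 27]],
    "SVM": [[98, 9], [23, 24]],
    "Random Forest": [[94, 13], [16, 31]],
    "GradientBoostingClassifier": [[93, 14], [14, 33]],
    "GaussianNB": [[93, 14], [18, 29]],
}

def accuracy_pct(cm):
    """Percentage of correct predictions: diagonal over total."""
    correct = cm[0][0] + cm[1][1]
    total = sum(sum(row) for row in cm)
    return 100 * correct / total

# Sort models from most to least accurate.
ranking = sorted(
    ((name, accuracy_pct(cm)) for name, cm in matrices.items()),
    key=lambda item: item[1],
    reverse=True,
)

for name, acc in ranking:
    print(f"{name:<28} {acc:.2f}%")
```

The top two models differ by a single test prediction (127 versus 126 correct out of 154), so the ordering may change with a different train/test split.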