ML Assignment 3

The document contains assignments submitted by a student for their Machine Learning course. It includes: 1) A regression model to predict Boston housing prices using scikit-learn with linear regression, achieving an RMSE of $X and R2 score of $Y on test data. 2) A K-means clustering model to classify diabetes patients using patient data on glucose, insulin, and BMI, achieving $Z accuracy on test data, with clusters visualized on a 2D plot.

Uploaded by Kishan

Department of Electronics and Telecommunication Engineering

Ramaiah Institute of Technology


M.S.R. Nagar, Bangalore-54

MACHINE LEARNING (ETE631)

ASSIGNMENT 3

Name: R Jeevan Kumar


USN: 1MS19ET042

Submitted to
Dr. Shobha K R
Associate Professor, Dept. of ETE

2021-2022
All code uploaded to https://github.com/jeevankumar99/ML-Assignment-3

1. Develop a regression model that predicts housing prices in Boston using Python and scikit-learn

Python code:

# Predict housing prices using Linear Regression

from sklearn.metrics import mean_squared_error, r2_score

from sklearn.model_selection import train_test_split


# Note: load_boston was removed in scikit-learn 1.2, so this script needs an older release
from sklearn.datasets import load_boston
from sklearn.linear_model import LinearRegression
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

print("\n----------- HOUSING PRICE PREDICTOR------------\n")


data = load_boston()

array = data.feature_names
print(array)
array = np.append(array,['medv'])

data, target = data.data, data.target

Xtrain, Xtest, Ytrain, Ytest = train_test_split(data,target,test_size=0.3)

print(Xtrain.shape,Ytrain.shape)
print(Xtest.shape,Ytest.shape)

lin_model = LinearRegression()
lin_model.fit(Xtrain,Ytrain)
Ytrain_predict = lin_model.predict(Xtrain)

rmse = np.sqrt(mean_squared_error(Ytrain, Ytrain_predict))
r2 = r2_score(Ytrain, Ytrain_predict)

print("Model performance for training set is :\n ")
print("Root Mean Square Error: ",rmse,"\n")
print("R2 score is: ",r2,"\n")

Ytest_predict = lin_model.predict(Xtest)

rmse = np.sqrt(mean_squared_error(Ytest, Ytest_predict))
r2 = r2_score(Ytest, Ytest_predict)

print("Model performance for testing set is :\n ")
print("Root Mean Square Error: ",rmse,"\n")
print("R2 score is: ",r2,"\n")

plt.scatter(Ytest,Ytest_predict,c = 'green')
plt.xlabel("Price in $1000's")
plt.ylabel("Predicted value")
plt.title("True value vs predicted value: Linear Regression")
plt.show()

Output:
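
The script above reports RMSE and R2 for the training and testing sets. As a reading aid, the sketch below computes the same two metrics by hand in plain Python, assuming the usual definitions: RMSE is the square root of the mean squared residual, and R2 is one minus the ratio of the residual sum of squares to the total sum of squares. The function names rmse and r2 and the sample numbers are illustrative only and do not come from the assignment.

```python
import math


def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Made-up prices (in $1000's), not results from the Boston data
y_true = [24.0, 21.6, 34.7, 33.4]
y_pred = [25.0, 20.6, 33.7, 34.4]

print("Root Mean Square Error:", rmse(y_true, y_pred))
print("R2 score:", r2(y_true, y_pred))
```

A large gap between the training and testing values of these metrics would suggest overfitting. Because the split in the script is made without a fixed random_state, the reported numbers will vary from run to run.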
2. Implement data classification on the diabetes data set using k-means clustering

Python code:

# classify diabetes using K means clustering

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import sklearn
from sklearn.cluster import KMeans
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

if __name__ == "__main__":

    print("\n---------- K MEANS CLUSTERING ON DIABETES DATA ----------\n")

    # Importing the data file using pandas
    data = pd.read_csv("./data.csv")
    dataset_new = data

    # Treat zeros in these columns as missing values
    cols = ["Glucose", "BloodPressure", "SkinThickness", "Insulin", "BMI"]
    dataset_new[cols] = dataset_new[cols].replace(0, np.nan)

    # Replacing NaN with mean values
    for col in cols:
        dataset_new[col] = dataset_new[col].fillna(dataset_new[col].mean())

    # Feature scaling using MinMaxScaler
    sc = MinMaxScaler(feature_range=(0, 1))
    dataset_scaled = sc.fit_transform(dataset_new)

    data1 = pd.DataFrame(dataset_scaled)

    # Selecting features - [Glucose, Insulin, BMI]
    X = data1.iloc[:, [1, 4, 5]].values
    Y = data1.iloc[:, 8].values

    X_train, X_test, Y_train, Y_test = train_test_split(
        X, Y, test_size=0.20, random_state=42, stratify=dataset_new["Outcome"]
    )

    # Checking dimensions
    print("X_train shape:", X_train.shape)
    print("X_test shape:", X_test.shape)
    print("Y_train shape:", Y_train.shape)
    print("Y_test shape:", Y_test.shape)

    KMeans_Clustering = KMeans(n_clusters=2, random_state=0)
    KMeans_Clustering.fit(X_train)

    print(KMeans_Clustering.cluster_centers_)

    # Prediction using k-means and accuracy
    kpred = KMeans_Clustering.predict(X_test)
    print("Classification report:\n\n",
          sklearn.metrics.classification_report(Y_test, kpred))

    outcome_labels = sorted(data.Outcome.unique())
    sns.heatmap(
        confusion_matrix(Y_test, kpred),
        annot=True,
        xticklabels=outcome_labels,
        yticklabels=outcome_labels,
    )

    # Fit again on all samples and plot Glucose against BMI
    KMeans_Clustering = KMeans(n_clusters=2, random_state=0)
    KMeans_Clustering.fit(X)

    # New figure so the scatter plot is not drawn on top of the heatmap
    plt.figure()
    plt.scatter(data1.iloc[:, [1]].values, data1.iloc[:, [5]].values,
                c=KMeans_Clustering.labels_, cmap="rainbow")
    plt.show()

Output:
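
One caveat when reading the classification report produced by the script: k-means numbers its clusters arbitrarily, so cluster 0 need not correspond to Outcome 0, and comparing kpred with Y_test directly can under-report agreement when the IDs happen to be swapped. A minimal sketch of one common workaround is given below: each cluster is mapped to the majority true label among its members before scoring. The helper names majority_label_map and accuracy and the toy labels are hypothetical, and this step is not part of the submitted code.

```python
from collections import Counter, defaultdict


def majority_label_map(y_true, cluster_ids):
    """Map each cluster ID to the most common true label among its members."""
    members = defaultdict(list)
    for label, cluster in zip(y_true, cluster_ids):
        members[cluster].append(label)
    return {c: Counter(labels).most_common(1)[0][0]
            for c, labels in members.items()}


def accuracy(y_true, y_pred):
    """Fraction of positions where the two label sequences agree."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Toy example: the clusters separate the classes well, but the IDs are swapped
y_true = [0, 0, 0, 1, 1, 1]
cluster_ids = [1, 1, 1, 0, 0, 1]

mapping = majority_label_map(y_true, cluster_ids)
aligned = [mapping[c] for c in cluster_ids]

print("raw accuracy:", accuracy(y_true, cluster_ids))
print("aligned accuracy:", accuracy(y_true, aligned))
```

In a script like the one above, such a mapping would presumably be built from Y_train and the training cluster labels and then applied to kpred, so that the test labels are not used to choose the mapping. Note that a majority vote can assign the same label to both clusters when one class dominates.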
