ML Assignment 3
Submitted to
Dr. Shobha K R
Associate Professor, Dept. of ETE
2021-2022
All code uploaded to https://github.com/jeevankumar99/ML-Assignment-3
1. Develop a regression model that predicts housing prices in Boston using Python and scikit-learn
Python code:
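import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# data, Xtrain, Xtest, Ytrain and Ytest come from the loading and
# splitting steps, which are not shown in this listing.

# Column names: the feature names plus the target column 'medv'
array = data.feature_names
print(array)
array = np.append(array, ['medv'])

# Check the dimensions of the training and testing sets
print(Xtrain.shape, Ytrain.shape)
print(Xtest.shape, Ytest.shape)

# Fit a linear regression model on the training set
lin_model = LinearRegression()
lin_model.fit(Xtrain, Ytrain)

# Evaluate on the training set
Ytrain_predict = lin_model.predict(Xtrain)
rmse = np.sqrt(mean_squared_error(Ytrain, Ytrain_predict))
r2 = r2_score(Ytrain, Ytrain_predict)
print("Model performance for training set is:\n")
print("Root Mean Square Error: ", rmse, "\n")
print("R2 score is: ", r2, "\n")

# Evaluate on the testing set
Ytest_predict = lin_model.predict(Xtest)
rmse = np.sqrt(mean_squared_error(Ytest, Ytest_predict))
r2 = r2_score(Ytest, Ytest_predict)
print("Model performance for testing set is:\n")
print("Root Mean Square Error: ", rmse, "\n")
print("R2 score is: ", r2, "\n")

# Plot true prices against predicted prices for the testing set
plt.scatter(Ytest, Ytest_predict, c='green')
plt.xlabel("Price in $1000's")
plt.ylabel("Predicted value")
plt.title("True value vs predicted value: Linear Regression")
plt.show()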
Output:
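The two metrics printed by the code above can also be checked by hand. The sketch below is a minimal pure-Python illustration of the standard definitions of RMSE and R2. The function names rmse and r2 and the sample prices are hypothetical, chosen only for illustration, and are not taken from the Boston data.

```python
import math

def rmse(y_true, y_pred):
    """Root mean square error: square root of the mean squared residual."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot

# Hypothetical prices in $1000's (illustrative values only)
y_true = [20.0, 30.0, 40.0, 50.0]
y_pred = [22.0, 28.0, 42.0, 48.0]

print("RMSE:", rmse(y_true, y_pred))
print("R2:", r2(y_true, y_pred))
```

RMSE is in the same units as the target (here $1000's), while R2 is unitless and equals 1 only when every prediction matches the true value.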
2. Implement data classification on the diabetes data set using k-means clustering
Python code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn import datasets
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
from sklearn.metrics import classification_report, confusion_matrix

# data (the raw data set) and dataset_scaled (its scaled version) come from
# the loading and scaling steps, which are not shown in this listing.
data1 = pd.DataFrame(dataset_scaled)

# Selecting features - [Glucose, Insulin, BMI]
X = data1.iloc[:, [1, 4, 5]].values
Y = data1.iloc[:, 8].values

# The train/test split of X and Y is not shown in this listing.
# Checking dimensions
print("X_train shape:", X_train.shape)
print("X_test shape:", X_test.shape)
print("Y_train shape:", Y_train.shape)
print("Y_test shape:", Y_test.shape)

# KMeans_Clustering is the fitted KMeans model (fitting step not shown).
print(KMeans_Clustering.cluster_centers_)

# Prediction using k-means and accuracy
kpred = KMeans_Clustering.predict(X_test)
print('Classification report:\n\n', classification_report(Y_test, kpred))

# Confusion matrix heatmap, labelled with the outcome classes
outcome_labels = sorted(data.Outcome.unique())
sns.heatmap(
    confusion_matrix(Y_test, kpred),
    annot=True,
    xticklabels=outcome_labels,
    yticklabels=outcome_labels
)
plt.show()
Output:
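One caveat when scoring k-means predictions against Y_test as in the code above: cluster indices are arbitrary, so cluster 0 need not correspond to outcome 0. A common workaround, sketched below, is to relabel each cluster with the majority true label of its members before building the report. The helper name map_clusters_to_labels and the toy arrays are hypothetical and are not part of the submitted code.

```python
from collections import Counter

def map_clusters_to_labels(cluster_ids, true_labels):
    """Return {cluster id: most common true label among its members}."""
    members = {}
    for cid, label in zip(cluster_ids, true_labels):
        members.setdefault(cid, []).append(label)
    return {cid: Counter(labels).most_common(1)[0][0]
            for cid, labels in members.items()}

# Toy example (illustrative values): the grouping is good but the ids are flipped
true_labels = [0, 0, 0, 1, 1, 1]
cluster_ids = [1, 1, 1, 0, 0, 1]

mapping = map_clusters_to_labels(cluster_ids, true_labels)
aligned = [mapping[c] for c in cluster_ids]
accuracy = sum(a == t for a, t in zip(aligned, true_labels)) / len(true_labels)

print("Mapping:", mapping)
print("Aligned predictions:", aligned)
print("Accuracy after alignment:", accuracy)
```

In a setting like this one, the mapping would presumably be learned from the cluster assignments of the training set and Y_train, then applied to kpred before computing the classification report and confusion matrix.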