Jntuk R20 ML

The program demonstrates the ID3 decision tree algorithm. It loads training data from a CSV file, builds a decision tree to classify the samples, and then loads test data to classify using the decision tree. It prints the decision tree and the predicted classification for each test sample.

Uploaded by

Sush

SYLLABUS

Experiment-1:
Implement and demonstrate the FIND-S algorithm for finding the
most specific hypothesis based on a given set of training data
samples. Read the training data from a .CSV file.

Experiment-2:
For a given set of training data examples stored in a .CSV file,
implement and demonstrate the Candidate-Elimination algorithm to
output a description of the set of all hypotheses consistent with
the training examples.
Experiment-3:
Write a program to demonstrate the working of the decision tree
based ID3 algorithm. Use an appropriate data set for building the
decision tree and apply this knowledge to classify a new sample.
Experiment-4:
Exercises to solve the real-world problems using the following
machine learning methods: a) Linear Regression
b) Logistic Regression
c) Binary Classifier
Experiment-5:
Develop a program for Bias, Variance, Remove Duplicates, Cross
Validation
Experiment-6:
Write a program to implement Categorical Encoding, One-hot
Encoding
Experiment-7:
Build an Artificial Neural Network by implementing the
Backpropagation algorithm and test the same using appropriate data
sets.
Experiment-8:
Write a program to implement k-Nearest Neighbor algorithm to
classify the iris data set. Print both correct and wrong
predictions.
Experiment-9:
Implement the non-parametric Locally Weighted Regression
algorithm in order to fit data points. Select appropriate data set
for your experiment and draw graphs
Experiment-10:
Assuming a set of documents that need to be classified, use the
naïve Bayesian Classifier model to perform this task. Built-in
Java classes/API can be used to write the program. Calculate the
accuracy, precision, and recall for your data set.
Experiment-11:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the
same data set for clustering using k-Means algorithm. Compare the
results of these two algorithms and comment on the quality of
clustering. You can add Java/Python ML library classes/API in the
program.
Experiment-12:
Exploratory Data Analysis for Classification using Pandas or
Matplotlib.
Experiment-13:
Write a Python program to construct a Bayesian network considering
medical data. Use this model to demonstrate the diagnosis of heart
patients using the standard Heart Disease Data Set.
Experiment-14:
Write a program to implement Support Vector Machines and Principal
Component Analysis
Experiment-15:
Write a program to implement Principal Component Analysis

EXPERIMENT-1
Implement and demonstrate the FIND-S algorithm for finding the
most specific hypothesis based on a given set of training data
samples. Read the training data from a .CSV file.
DATASET:-

Dataset file: ‘enjoysports.csv’ (the program below opens it as ‘enjoysport.csv’)


PROGRAM:-
import csv

num_attributes = 6  # attribute columns; the last column holds the label
a = []
print("\n The Given Training Data Set \n")
with open('enjoysport.csv', 'r') as csvfile:
    reader = csv.reader(csvfile)
    for row in reader:
        a.append(row)
        print(row)
print("\n The initial value of hypothesis: ")
hypothesis = ['0'] * num_attributes
print(hypothesis)

# Start from the first training example (assumed to be a positive one)
for j in range(0, num_attributes):
    hypothesis[j] = a[0][j]

print("\n Find S: Finding a Maximally Specific Hypothesis\n")

for i in range(0, len(a)):
    # FIND-S only looks at positive examples
    if a[i][num_attributes] == 'yes':
        for j in range(0, num_attributes):
            # Generalize every attribute that disagrees with the example
            if a[i][j] != hypothesis[j]:
                hypothesis[j] = '?'
            else:
                hypothesis[j] = a[i][j]
print(" For Training instance No:{0} the hypothesis is".format(i), hypothesis)
print("\n The Maximally Specific Hypothesis for a given TrainingExamples :\n")
print(hypothesis)
OUTPUT:-
The Given Training Data Set

['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']

The initial value of hypothesis:
['0', '0', '0', '0', '0', '0']

Find S: Finding a Maximally Specific Hypothesis

For Training instance No:3 the hypothesis is ['sunny', 'warm', '?', 'strong', '?', '?']

The Maximally Specific Hypothesis for a given TrainingExamples :

['sunny', 'warm', '?', 'strong', '?', '?']
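
The same FIND-S update can be written as a small function that does not depend on a CSV file. This is only a sketch: the function name `find_s` and the inline list `training` are introduced here for illustration, and the rows are copied from the training data printed above.

```python
def find_s(examples):
    """Return the most specific hypothesis that covers all positive examples."""
    hypothesis = None
    for *attributes, label in examples:
        if label != 'yes':
            continue  # FIND-S ignores negative examples
        if hypothesis is None:
            # The first positive example becomes the initial hypothesis
            hypothesis = list(attributes)
        else:
            # Keep matching attribute values, generalize the rest to '?'
            hypothesis = [h if h == a else '?'
                          for h, a in zip(hypothesis, attributes)]
    return hypothesis


training = [
    ['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes'],
    ['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes'],
    ['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no'],
    ['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes'],
]

result = find_s(training)
print(result)
```

Unlike the program above, this version starts from the first positive example rather than from the first row, so it still behaves sensibly when the file begins with a negative example.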


EXPERIMENT-2
For a given set of training data examples stored in a .CSV
file, implement and demonstrate the Candidate-Elimination
algorithm to output a description of the set of all hypotheses
consistent with the training examples.

DATASET:-

Dataset file: ‘enjoysports.csv’ (the program below opens it as ‘enjoysport.csv’)


PROGRAM:-
# pip install numpy pandas
import numpy as np
import pandas as pd

# Note: read_csv treats the first row of the file as a header by default;
# pass header=None if enjoysport.csv has no header row.
data = pd.read_csv('enjoysport.csv')
concepts = np.array(data.iloc[:, 0:-1])
print(concepts)
target = np.array(data.iloc[:, -1])
print(target)

def learn(concepts, target):
    # Start with the first example as the most specific hypothesis
    specific_h = concepts[0].copy()
    print("initialization of specific_h and general_h")
    print(specific_h)
    general_h = [["?" for i in range(len(specific_h))] for i in range(len(specific_h))]
    print(general_h)
    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # Positive example: generalize specific_h where it disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            # Negative example: specialize general_h on the differing attributes
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'
        print(" steps of Candidate Elimination Algorithm", i + 1)
        print(specific_h)
        print(general_h)
    # Drop the rows of general_h that are still fully general
    fully_general = ['?'] * len(specific_h)
    general_h = [g for g in general_h if g != fully_general]
    return specific_h, general_h

s_final, g_final = learn(concepts, target)
print("Final Specific_h:", s_final, sep="\n")
print("Final General_h:", g_final, sep="\n")
OUTPUT:-
Final Specific_h:
['sunny' 'warm' 'high' 'strong' '?' '?']
Final General_h:
[['sunny', '?', '?', '?', '?', '?'], ['?', 'warm', '?', '?', '?', '?']]
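
A quick way to sanity-check boundaries like these is to test whether each hypothesis covers every positive example and no negative one. The sketch below is illustrative only: the helpers `covers` and `consistent` are names introduced here, and `training` holds three of the rows printed in Experiment 1, chosen on the assumption that these are the rows the program processed.

```python
def covers(hypothesis, example):
    """True if every attribute of the hypothesis is '?' or equals the example's value."""
    return all(h == '?' or h == a for h, a in zip(hypothesis, example))


def consistent(hypothesis, data):
    """A hypothesis is consistent if it covers exactly the positive examples."""
    return all(covers(hypothesis, x) == (label == 'yes') for x, label in data)


# Assumed training rows (attributes, label)
training = [
    (['sunny', 'warm', 'high', 'strong', 'warm', 'same'], 'yes'),
    (['rainy', 'cold', 'high', 'strong', 'warm', 'change'], 'no'),
    (['sunny', 'warm', 'high', 'strong', 'cool', 'change'], 'yes'),
]

specific_h = ['sunny', 'warm', 'high', 'strong', '?', '?']
general_h = [['sunny', '?', '?', '?', '?', '?'],
             ['?', 'warm', '?', '?', '?', '?']]

print("specific_h consistent:", consistent(specific_h, training))
for g in general_h:
    print(g, "consistent:", consistent(g, training))
```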
EXPERIMENT-3
Write a program to demonstrate the working of the decision tree
based ID3 algorithm. Use an appropriate data set for building the
decision tree and apply this knowledge to classify a new sample.
DATASET:-

Dataset files: ‘data3.csv’ (training data) and ‘data3_test.csv’ (test data)


PROGRAM:-
# THIS ALGORITHM REQUIRES TWO FILES ("data3.csv", "data3_test.csv")
import math
import csv

def load_csv(filename):
    # Return the data rows and the header row of a CSV file
    with open(filename, "r") as f:
        dataset = list(csv.reader(f))
    headers = dataset.pop(0)
    return dataset, headers

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def subtables(data, col, delete):
    # Split the rows of data by the value found in column col
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))
    counts = [0] * len(attr)
    r = len(data)
    c = len(data[0])
    for x in range(len(attr)):
        for y in range(r):
            if data[y][col] == attr[x]:
                counts[x] += 1
    for x in range(len(attr)):
        dic[attr[x]] = [[0 for i in range(c)] for j in range(counts[x])]
        pos = 0
        for y in range(r):
            if data[y][col] == attr[x]:
                if delete:
                    del data[y][col]
                dic[attr[x]][pos] = data[y]
                pos += 1
    return attr, dic

def entropy(S):
    # Shannon entropy (in bits) of a list of class labels
    values = set(S)
    if len(values) == 1:
        return 0
    sums = 0
    for v in values:
        p = S.count(v) / len(S)
        sums -= p * math.log(p, 2)
    return sums

def compute_gain(data, col):
    # Information gain obtained by splitting data on column col
    attr, dic = subtables(data, col, delete=False)
    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)
    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy

def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    # All rows share one label: create a leaf
    if len(set(lastcol)) == 1:
        node = Node("")
        node.answer = lastcol[0]
        return node
    n = len(data[0]) - 1
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    # Split on the attribute with the highest information gain
    split = gains.index(max(gains))
    node = Node(features[split])
    fea = features[:split] + features[split + 1:]
    attr, dic = subtables(data, split, delete=True)
    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node

def print_tree(node, level):
    if node.answer != "":
        print(" " * level, node.answer)
        return
    print(" " * level, node.attribute)
    for value, n in node.children:
        print(" " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    # Walk down the tree following the attribute values of x_test
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

'''Main program'''
# This is the main program that calls the previously defined functions
dataset, features = load_csv("data3.csv")
node1 = build_tree(dataset, features)
print("The decision tree for the dataset using ID3 algorithm is")
print_tree(node1, 0)
# Load the second dataset to test the model
testdata, features = load_csv("data3_test.csv")
for xtest in testdata:
    print("\n The test instance:", xtest)
    print("The label for test instance:", end="")
    classify(node1, xtest, features)
OUTPUT:-
The decision tree for the dataset using ID3 algorithm is
 Outlook
  Rain
   Wind
    Strong
     No
    Weak
     Yes
  Overcast
   Yes
  Sunny
   Humidity
    Normal
     Yes
    High
     No

The test instance: ['Rain', 'Cool', 'Normal', 'Strong']
The label for test instance:No

The test instance: ['Sunny', 'Mild', 'Normal', 'Strong']
The label for test instance:Yes
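
To see why Outlook ends up at the root, it helps to work through one information-gain calculation by hand. The counts below are an assumption: they are the class counts of the classic 14-row PlayTennis table (9 Yes, 5 No), which the printed tree suggests but the source does not show. The function names are introduced here for illustration.

```python
import math


def entropy(pos, neg):
    """Entropy in bits of a set with pos positive and neg negative examples."""
    total = pos + neg
    result = 0.0
    for count in (pos, neg):
        if count:  # a zero count contributes nothing
            p = count / total
            result -= p * math.log2(p)
    return result


def information_gain(parent, partitions):
    """Entropy of the parent minus the size-weighted entropy of its partitions."""
    total = sum(parent)
    remainder = sum((p + n) / total * entropy(p, n) for p, n in partitions)
    return entropy(*parent) - remainder


# Assumed counts (Yes, No): whole set, then Sunny / Overcast / Rain
parent = (9, 5)
outlook = [(2, 3), (4, 0), (3, 2)]

parent_entropy = entropy(*parent)
gain_outlook = information_gain(parent, outlook)
print(f"Entropy(S) = {parent_entropy:.3f}")
print(f"Gain(S, Outlook) = {gain_outlook:.3f}")
```

Under these assumed counts the entropy of the full set is about 0.940 bits and the gain for Outlook is about 0.247 bits.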
EXPERIMENT-4
Exercises to solve the real-world problems using the following
machine learning methods:
a) Linear Regression
b) Logistic Regression
c) Binary Classifier
A. Linear Regression
DATASET:-

Dataset file: ‘Salary.csv’ (the program below opens it as ‘salary.csv’)

PROGRAM:-
# EXP-4(a)
# pip install scikit-learn matplotlib numpy pandas
import numpy as np
import pandas as pd
import matplotlib.pyplot as mtp
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

data_set = pd.read_csv('salary.csv')
# print(data_set)
x = data_set.iloc[:, :-1].values  # features: every column except the last
y = data_set.iloc[:, -1].values   # target: the last column (salary)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=1/3, random_state=0)

regressor = LinearRegression()
regressor.fit(x_train, y_train)
y_pred = regressor.predict(x_test)
x_pred = regressor.predict(x_train)

# Training data with the fitted line
mtp.scatter(x_train, y_train, color="green")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Training Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary (in rupees)")
mtp.show()

# Test data against the same fitted line
mtp.scatter(x_test, y_test, color="blue")
mtp.plot(x_train, x_pred, color="red")
mtp.title("Salary vs Experience (Test Dataset)")
mtp.xlabel("Years of Experience")
mtp.ylabel("Salary (in rupees)")
mtp.show()
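
For a single feature, the line that LinearRegression fits can also be computed directly from the least-squares formulas: slope = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2) and intercept = mean_y - slope * mean_x. The sketch below uses made-up numbers (the arrays `years` and `salary` are hypothetical and not taken from Salary.csv).

```python
import numpy as np

# Hypothetical, exactly linear data: salary = 25 + 5 * years
years = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
salary = np.array([30.0, 35.0, 40.0, 45.0, 50.0])

x_mean, y_mean = years.mean(), salary.mean()

# Closed-form least-squares estimates for one feature
slope = np.sum((years - x_mean) * (salary - y_mean)) / np.sum((years - x_mean) ** 2)
intercept = y_mean - slope * x_mean

print("slope:", slope)
print("intercept:", intercept)
print("prediction for 6 years:", intercept + slope * 6.0)
```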
B. Logistic Regression
DATASET:-

Dataset file: ‘user_data.csv’

PROGRAM:-
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
from matplotlib.colors import ListedColormap
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix

# importing datasets
data_set = pd.read_csv('user_data.csv')
# print(data_set)
x = data_set.iloc[:, [3, 4]].values  # the two features plotted below as Age and Estimated Salary
y = data_set.iloc[:, 5].values       # class label
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.25, random_state=0)

# Feature scaling: fit on the training set only, then apply to the test set
st_x = StandardScaler()
x_train = st_x.fit_transform(x_train)
x_test = st_x.transform(x_test)

classifier = LogisticRegression(random_state=0)
classifier.fit(x_train, y_train)
y_pred = classifier.predict(x_test)
cm = confusion_matrix(y_test, y_pred)
print(cm)

def plot_decision_regions(x_set, y_set, title):
    # Colour the plane by predicted class, then overlay the data points
    x1, x2 = nm.meshgrid(
        nm.arange(start=x_set[:, 0].min() - 1, stop=x_set[:, 0].max() + 1, step=0.01),
        nm.arange(start=x_set[:, 1].min() - 1, stop=x_set[:, 1].max() + 1, step=0.01))
    mtp.contourf(x1, x2,
                 classifier.predict(nm.array([x1.ravel(), x2.ravel()]).T).reshape(x1.shape),
                 alpha=0.75, cmap=ListedColormap(('purple', 'green')))
    mtp.xlim(x1.min(), x1.max())
    mtp.ylim(x2.min(), x2.max())
    for i, j in enumerate(nm.unique(y_set)):
        mtp.scatter(x_set[y_set == j, 0], x_set[y_set == j, 1],
                    color=ListedColormap(('purple', 'green'))(i), label=j)
    mtp.title(title)
    mtp.xlabel('Age')
    mtp.ylabel('Estimated Salary')
    mtp.legend()
    mtp.show()

plot_decision_regions(x_train, y_train, 'Logistic Regression (Training set)')
plot_decision_regions(x_test, y_test, 'Logistic Regression (Test set)')
C. Binary Classifier
PROGRAM:-
# EXP 4(C)
# NO DATASET FILE FOR THIS EXPERIMENT (uses scikit-learn's built-in breast cancer data)
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score, recall_score

dataset = load_breast_cancer(as_frame=True)
print(dataset['data'].head())
print(dataset['target'].head())
print(dataset['target'].value_counts())
X = dataset['data']
y = dataset['target']

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=0)

# Fit the scaler on the training data only and reuse it for the test data
ss_train = StandardScaler()
X_train = ss_train.fit_transform(X_train)
X_test = ss_train.transform(X_test)

model = LogisticRegression()
model.fit(X_train, y_train)
predictions = model.predict(X_test)
# print(predictions)

cm = confusion_matrix(y_test, predictions)
TN, FP, FN, TP = cm.ravel()

print('True Positive(TP) = ', TP)
print('False Positive(FP) = ', FP)
print('True Negative(TN) = ', TN)
print('False Negative(FN) = ', FN)
accuracy = (TP + TN) / (TP + FP + TN + FN)
print('Accuracy of the binary classifier = {:0.3f}'.format(accuracy))

# Compare several classifiers on the same split
models = {}
models['Logistic Regression'] = LogisticRegression()
models['Support Vector Machines'] = LinearSVC()
models['Decision Trees'] = DecisionTreeClassifier()
models['Random Forest'] = RandomForestClassifier()
models['Naive Bayes'] = GaussianNB()
models['K-Nearest Neighbor'] = KNeighborsClassifier()

accuracy, precision, recall = {}, {}, {}

for key in models.keys():
    # Fit the classifier
    models[key].fit(X_train, y_train)
    # Make predictions
    predictions = models[key].predict(X_test)
    # Calculate metrics (true labels first, predictions second)
    accuracy[key] = accuracy_score(y_test, predictions)
    precision[key] = precision_score(y_test, predictions)
    recall[key] = recall_score(y_test, predictions)

df_model = pd.DataFrame(index=list(models.keys()), columns=['Accuracy', 'Precision', 'Recall'])
df_model['Accuracy'] = list(accuracy.values())
df_model['Precision'] = list(precision.values())
df_model['Recall'] = list(recall.values())
print(df_model)

ax = df_model.plot.barh()
ax.legend(
    ncol=len(models.keys()),
    bbox_to_anchor=(0, 1),
    loc='lower left',
    prop={'size': 14}
)
plt.tight_layout()
plt.show()

OUTPUT:-
True Positive(TP) = 86
False Positive(FP) = 2
True Negative(TN) = 51
False Negative(FN) = 4
Accuracy of the binary classifier = 0.958
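
The four counts printed above are enough to derive the other common metrics by hand. The short sketch below plugs in the values from this output; the F1 score is an addition that the program itself does not report.

```python
# Counts taken from the output above
TP, FP, TN, FN = 86, 2, 51, 4

accuracy = (TP + TN) / (TP + FP + TN + FN)   # share of all predictions that are correct
precision = TP / (TP + FP)                   # share of predicted positives that are correct
recall = TP / (TP + FN)                      # share of actual positives that are found
f1 = 2 * precision * recall / (precision + recall)

print(f"Accuracy:  {accuracy:.3f}")
print(f"Precision: {precision:.3f}")
print(f"Recall:    {recall:.3f}")
print(f"F1 score:  {f1:.3f}")
```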
Experiment-5
Develop a program for Bias, Variance, Remove Duplicates,
Cross Validation
PROGRAM:-
# NO DATASET
import numpy as np
from sklearn.model_selection import train_test_split, KFold
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# generate some sample data
X = np.random.rand(100, 10)
y = np.random.rand(100)
# split the data into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
# train a linear regression model on the training data
model = LinearRegression()
model.fit(X_train, y_train)
# calculate the mean squared error on the test data
y_pred = model.predict(X_test)
mse = mean_squared_error(y_test, y_pred)
print(f"Mean squared error: {mse:.3f}")
# Rough proxies only: the training-set MSE is reported as "bias" and the
# test-set MSE as "variance"; this is not a true bias-variance decomposition
y_pred_train = model.predict(X_train)
bias = np.mean((y_pred_train - y_train) ** 2)
variance = np.mean((y_pred - y_test) ** 2)
print(f"Bias: {bias:.3f}")
print(f"Variance: {variance:.3f}")
# remove duplicate rows from the data
X_no_duplicates, indices = np.unique(X, axis=0, return_index=True)
y_no_duplicates = y[indices]
print(f"Number of duplicates removed: {X.shape[0] - X_no_duplicates.shape[0]}")
# perform k-fold cross-validation
kf = KFold(n_splits=5)
mse_scores = []
for train_index, test_index in kf.split(X):
    X_train, X_test = X[train_index], X[test_index]
    y_train, y_test = y[train_index], y[test_index]
    model = LinearRegression()
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)
    mse = mean_squared_error(y_test, y_pred)
    mse_scores.append(mse)
print(f"Cross-validation mean squared error: {np.mean(mse_scores):.3f}")

OUTPUT:-
Mean squared error: 0.073
Bias: 0.075
Variance: 0.073
Number of duplicates removed: 0
Cross-validation mean squared error: 0.095
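
The program above reports the training error and the test error under the labels Bias and Variance. One common way to estimate the two quantities themselves is to retrain the model on many independently drawn training sets and look at how the predictions behave at fixed test points. The sketch below does this on synthetic data; the target function `true_f`, the noise level, and the sample sizes are all assumptions chosen for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)


def true_f(x):
    # Assumed ground-truth function for the synthetic data
    return np.sin(x)


x_test = np.linspace(0, 3, 50).reshape(-1, 1)
f_test = true_f(x_test).ravel()

# Train the same model on 200 independently drawn noisy training sets
preds = []
for _ in range(200):
    x_train = rng.uniform(0, 3, size=(30, 1))
    y_train = true_f(x_train).ravel() + rng.normal(0, 0.2, size=30)
    model = LinearRegression().fit(x_train, y_train)
    preds.append(model.predict(x_test))
preds = np.array(preds)  # shape (200, 50): one row per training set

mean_pred = preds.mean(axis=0)
bias_sq = np.mean((mean_pred - f_test) ** 2)   # squared bias, averaged over test points
variance = np.mean(preds.var(axis=0))          # prediction variance, averaged over test points
expected_error = np.mean((preds - f_test) ** 2)

print(f"Bias^2:   {bias_sq:.4f}")
print(f"Variance: {variance:.4f}")
print(f"Expected squared error vs true function: {expected_error:.4f}")
```

The last number equals the sum of the first two, which is the decomposition of the error against the noise-free target.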
Experiment-6
Write a program to implement Categorical Encoding, One-hot
Encoding
PROGRAM:
#NO DATASET
import pandas as pd
# create a sample dataframe with categorical variables
data = {'gender': ['male', 'female', 'male', 'male', 'female']}
df = pd.DataFrame(data)
# perform categorical encoding using pandas' 'astype' method
df['gender_encoded'] = df['gender'].astype('category').cat.codes
print(df)
# perform one-hot encoding using pandas' 'get_dummies' function
df_onehot = pd.get_dummies(df, columns=['gender'])
print(df_onehot)
OUTPUT:
   gender  gender_encoded
0    male               1
1  female               0
2    male               1
3    male               1
4  female               0
   gender_encoded  gender_female  gender_male
0               1              0            1
1               0              1            0
2               1              0            1
3               1              0            1
4               0              1            0
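
The same two encodings are also available as scikit-learn transformers, which is convenient when the encoding learned on training data has to be reapplied to new data. This is an alternative to the pandas approach above, not what the program itself uses; the variable names are introduced here for illustration.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Same sample values as above, as a single-column 2D array
genders = np.array([['male'], ['female'], ['male'], ['male'], ['female']])

# Integer (ordinal) codes: categories are sorted, so female -> 0, male -> 1
ordinal = OrdinalEncoder().fit_transform(genders)

# One-hot encoding: one column per category (the default result is sparse)
encoder = OneHotEncoder()
onehot = encoder.fit_transform(genders).toarray()

print(encoder.categories_)
print(ordinal.ravel())
print(onehot)

# The fitted encoder can be reused on unseen rows
print(encoder.transform(np.array([['female']])).toarray())
```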
Experiment-7:
Build an Artificial Neural Network by implementing the
Backpropagation algorithm and test the same using appropriate data
sets.
PROGRAM:-
# NO DATASET
import numpy as np

X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # scale each column of X by its maximum
y = y / 100                 # scale the targets into [0, 1] to match the sigmoid output

# Sigmoid Function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of the sigmoid, written in terms of the sigmoid's output x
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 7000             # number of training iterations
lr = 0.1                 # learning rate
inputlayer_neurons = 2   # number of features in the data set
hiddenlayer_neurons = 3  # number of neurons in the hidden layer
output_neurons = 1       # number of neurons in the output layer

# Weight and bias initialization with uniform random values
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))
bh = np.random.uniform(size=(1, hiddenlayer_neurons))
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))
bout = np.random.uniform(size=(1, output_neurons))

for i in range(epoch):
    # Forward Propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)
    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)
    # how much the hidden layer weights contributed to the error
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad
    # weight updates: dot product of the next layer's error and the current layer's output
    wout += hlayer_act.T.dot(d_output) * lr
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
OUTPUT:-
Input:
[[0.66666667 1. ]
[0.33333333 0.55555556]
[1. 0.66666667]]
Actual Output:
[[92.]
[86.]
[89.]]
Predicted Output:
[[0.8855704 ]
[0.87086645]
[0.88298625]]
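
One detail of the program that is easy to miss: derivatives_sigmoid expects the sigmoid's output, not its input, which is why it is called on output and hlayer_act. A finite-difference check makes this concrete. The sketch below redefines the two functions so it runs on its own; the test points in `z` and the step `eps` are arbitrary choices.

```python
import numpy as np


def sigmoid(x):
    return 1 / (1 + np.exp(-x))


def derivatives_sigmoid(a):
    # a is the sigmoid's OUTPUT: d/dz sigmoid(z) = a * (1 - a) with a = sigmoid(z)
    return a * (1 - a)


z = np.array([-2.0, 0.0, 1.5])  # arbitrary pre-activation values
eps = 1e-6

# Central finite difference of sigmoid at z
numeric = (sigmoid(z + eps) - sigmoid(z - eps)) / (2 * eps)
# Analytic derivative, evaluated on the activation sigmoid(z)
analytic = derivatives_sigmoid(sigmoid(z))

print("numeric: ", numeric)
print("analytic:", analytic)
```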
Experiment-8:
Write a program to implement k-Nearest Neighbor algorithm to
classify the iris data set. Print both correct and wrong
predictions
PROGRAM:-
# NO DATASET
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load Iris dataset and split into train/test sets
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=42)
# Initialize k-NN classifier
k = 3
knn = KNeighborsClassifier(n_neighbors=k)
# Fit the classifier to the training data
knn.fit(X_train, y_train)
# Predict labels for test set
y_pred = knn.predict(X_test)
# Print correct and wrong predictions
correct = 0
wrong = 0
for i in range(len(y_test)):
    if y_pred[i] == y_test[i]:
        correct += 1
        print(f"Correct prediction: Actual class {y_test[i]}, Predicted class {y_pred[i]}")
    else:
        wrong += 1
        print(f"Wrong prediction: Actual class {y_test[i]}, Predicted class {y_pred[i]}")
print(f"\nNumber of correct predictions: {correct}")
print(f"Number of wrong predictions: {wrong}")
OUTPUT:-
Correct prediction: Actual class 1, Predicted class 1
Correct prediction: Actual class 0, Predicted class 0
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 1, Predicted class 1
Correct prediction: Actual class 1, Predicted class 1
Correct prediction: Actual class 0, Predicted class 0
Correct prediction: Actual class 1, Predicted class 1
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 1, Predicted class 1
Correct prediction: Actual class 1, Predicted class 1
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 0, Predicted class 0
Correct prediction: Actual class 0, Predicted class 0
Correct prediction: Actual class 0, Predicted class 0
Correct prediction: Actual class 0, Predicted class 0
Correct prediction: Actual class 1, Predicted class 1
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 1, Predicted class 1
Correct prediction: Actual class 1, Predicted class 1
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 0, Predicted class 0
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 0, Predicted class 0
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 2, Predicted class 2
Correct prediction: Actual class 0, Predicted class 0
Correct prediction: Actual class 0, Predicted class 0

Number of correct predictions: 30
Number of wrong predictions: 0
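
What KNeighborsClassifier does for each test point can be sketched in a few lines of NumPy: measure the distance to every training point, take the k closest, and let them vote. The function `knn_predict` and the tiny two-cluster data set below are made up for illustration and are not part of the Iris program.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x, k=3):
    """Predict the label of x by majority vote among its k nearest training points."""
    distances = np.linalg.norm(X_train - x, axis=1)  # Euclidean distance to each point
    nearest = np.argsort(distances)[:k]              # indices of the k closest points
    votes = Counter(y_train[nearest].tolist())
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated clusters
X_demo = np.array([[1.0, 1.0], [1.2, 0.8], [0.9, 1.1],
                   [5.0, 5.0], [5.2, 4.8], [4.9, 5.1]])
y_demo = np.array([0, 0, 0, 1, 1, 1])

print(knn_predict(X_demo, y_demo, np.array([1.1, 1.0])))
print(knn_predict(X_demo, y_demo, np.array([5.1, 5.0])))
```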
Experiment-9:
Implement the non-parametric Locally Weighted Regression
algorithm in order to fit data points. Select appropriate data
set for your experiment and draw graphs
PROGRAM:-
# NO DATASET
import numpy as np
import matplotlib.pyplot as plt

# Generate example data
x = np.linspace(0, 10, num=100)
y = np.sin(x)
# Add noise to data
np.random.seed(42)
noise = np.random.normal(loc=0, scale=0.1, size=len(x))
y_noisy = y + noise

# Locally weighted fit: each prediction is a Gaussian-weighted average of y
# around x[i] (a locally constant fit); tau controls the width of the kernel
def lowess(x, y, tau=0.5):
    y_pred = np.zeros_like(y)
    for i in range(len(x)):
        weights = np.exp(-(x - x[i])**2 / (2 * tau**2))
        weights /= np.sum(weights)
        y_pred[i] = np.dot(weights, y)
    return y_pred

# Fit data using Locally Weighted Regression
y_pred = lowess(x, y_noisy)
# Plot data and predictions
plt.scatter(x, y_noisy, alpha=0.5, label='Data')
plt.plot(x, y_pred, color='red', label='Locally Weighted Regression')
plt.legend()
plt.show()
OUTPUT:-
(Scatter plot of the noisy data with the fitted curve in red; figure not reproduced.)
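
The program above predicts with a weighted average of the neighbouring y values. The textbook form of Locally Weighted Regression instead fits a weighted straight line around each query point by solving the weighted normal equations (X^T W X) theta = X^T W y. The sketch below shows that variant; the function name `locally_weighted_linear` and the demo data are assumptions made for illustration.

```python
import numpy as np


def locally_weighted_linear(x, y, x_query, tau=0.5):
    """Fit a weighted straight line around each query point and evaluate it there."""
    X = np.column_stack([np.ones_like(x), x])  # design matrix with an intercept column
    preds = np.empty(len(x_query))
    for i, x0 in enumerate(x_query):
        # Gaussian kernel weights centred on the query point
        w = np.exp(-(x - x0) ** 2 / (2 * tau ** 2))
        XtW = X.T * w                           # X^T W without building the diagonal matrix
        theta = np.linalg.solve(XtW @ X, XtW @ y)
        preds[i] = theta[0] + theta[1] * x0
    return preds


# Demo on exactly linear data, where every local line is the global line
x_line = np.linspace(0, 10, 50)
y_line = 2 * x_line + 1
fitted = locally_weighted_linear(x_line, y_line, np.array([2.0, 5.0, 8.0]))
print(fitted)
```

On curved data such as the noisy sine above, the local line follows the slope of the data near each query point, which tends to reduce the flattening that a plain weighted average shows at the ends of the range.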
Experiment-10:
Assuming a set of documents that need to be classified, use the
naïve Bayesian Classifier model to perform this task. Built-in
Java classes/API can be used to write the program. Calculate the
accuracy, precision, and recall for your data set.
DATASET:-

Dataset file: ‘dataset.csv’


PROGRAM:-
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score, confusion_matrix, precision_score, recall_score

msg = pd.read_csv('dataset.csv', names=['message', 'label'])
print("Total Instances of Dataset: ", msg.shape[0])
# Map the text labels to numbers
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum

Xtrain, Xtest, ytrain, ytest = train_test_split(X, y)
# Convert the documents to word-count vectors
count_v = CountVectorizer()
Xtrain_dm = count_v.fit_transform(Xtrain)
Xtest_dm = count_v.transform(Xtest)
df = pd.DataFrame(Xtrain_dm.toarray(), columns=count_v.get_feature_names_out())
print(df[0:5])

clf = MultinomialNB()
clf.fit(Xtrain_dm, ytrain)
pred = clf.predict(Xtest_dm)
# Show each test document next to its predicted label
for doc, p in zip(Xtest, pred):
    p = 'pos' if p == 1 else 'neg'
    print("%s -> %s" % (doc, p))

print('Accuracy Metrics: \n')
print('Accuracy: ', accuracy_score(ytest, pred))
print('Recall: ', recall_score(ytest, pred))
print('Precision: ', precision_score(ytest, pred))
print('Confusion Matrix: \n', confusion_matrix(ytest, pred))
OUTPUT:-
Total Instances of Dataset: 18
   about  an  awesome  bad  beers  best  boss  can  dance  deal  ...  these  \
0      0   0        0    0      0     0     0    0      0     0  ...      0
1      0   0        0    0      0     1     0    0      0     0  ...      0
2      0   0        0    1      0     0     0    0      0     0  ...      0
3      0   0        0    0      0     0     0    0      1     0  ...      0
4      0   0        0    0      0     0     0    0      0     0  ...      0

   this  to  today  very  view  went  what  with  work
0     0   1      1     0     0     1     0     0     0
1     1   0      0     0     0     0     0     0     1
2     0   1      0     0     0     0     0     0     0
3     0   1      0     0     0     0     0     0     0
4     1   0      0     0     0     0     0     0     0

[5 rows x 43 columns]
I went to my enemy's house today -> pos
This is my best work -> pos
That is a bad locality to stay -> neg
I love to dance -> pos
I do not like this restaurant -> pos
Accuracy Metrics:

Accuracy: 0.8
Recall: 1.0
Precision: 0.75
Confusion Matrix:
[[1 1]
[0 3]]
Experiment-11:
Apply EM algorithm to cluster a Heart Disease Data Set. Use the
same data set for clustering using k-Means algorithm. Compare the
results of these two algorithms and comment on the quality of
clustering. You can add Java/Python ML library classes/API in the
program.
PROGRAM:-
import pandas as pd
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import silhouette_score
from sklearn.preprocessing import LabelEncoder
from sklearn.preprocessing import StandardScaler

# A small inline sample in the format of the Heart Disease Data Set
data = pd.DataFrame({
    "Age": [40, 49, 37, 48],
    "Sex": ["M", "F", "M", "F"],
    "ChestPainType": ["ATA", "NAP", "ATA", "ASY"],
    "RestingBP": [140, 160, 130, 138],
    "Cholesterol": [289, 180, 283, 214],
    "FastingBS": [0, 0, 0, 0],
    "RestingECG": ["Normal", "Normal", "ST", "Normal"],
    "MaxHR": [172, 156, 98, 108],
    "ExerciseAngina": ["N", "N", "N", "Y"],
    "Oldpeak": [0, 1, 0, 1.5],
    "ST_Slope": ["Up", "Flat", "Up", "Flat"],
    "HeartDisease": [0, 1, 0, 1]
})
# Preprocess the data
# Handle missing values
data = data.dropna()
# Encode categorical features as integers
le = LabelEncoder()
for column in ["Sex", "ChestPainType", "RestingECG", "ExerciseAngina", "ST_Slope"]:
    data[column] = le.fit_transform(data[column])
# Scale the data
scaler = StandardScaler()
data = scaler.fit_transform(data)
# Apply the EM algorithm (Gaussian mixture model)
gmm = GaussianMixture(n_components=2)
gmm.fit(data)
em_labels = gmm.predict(data)
# Apply the k-Means algorithm
kmeans = KMeans(n_clusters=2, random_state=0)
kmeans.fit(data)
kmeans_labels = kmeans.predict(data)
# Evaluate the quality of the clustering results
print("Silhouette score for EM algorithm:", silhouette_score(data, em_labels))
print("Silhouette score for k-Means algorithm:", silhouette_score(data, kmeans_labels))

OUTPUT:-
Silhouette score for EM algorithm: 0.32408473865415144
Silhouette score for k-Means algorithm: 0.32408473865415144
Experiment-12:
Exploratory Data Analysis for Classification using Pandas or
Matplotlib.
DATASET:-

Dataset file: ‘dataset.csv’

PROGRAM:-

import pandas as pd
import matplotlib.pyplot as plt
# Load the data into a Pandas dataframe
data = pd.read_csv('dataset.csv')
# Get a summary of the data
print(data.describe())
# Plot histograms of the numerical features
data.hist(bins=10, figsize=(20,15))
plt.show()
# Plot a scatter matrix of the numerical features
from pandas.plotting import scatter_matrix
scatter_matrix(data, figsize=(20,15))
plt.show()
# Plot a bar chart of the loan purposes
data['loan_purpose'].value_counts().plot(kind='bar')
plt.show()
# Plot a pie chart of the labels
# (the label column name in this file ends with two tab characters)
data['label\t\t'].value_counts().plot(kind='pie', autopct='%1.1f%%')
plt.show()
OUTPUT:-
       is_first_loan  total_credit_card_limit  \
count      29.000000                29.000000
mean        0.517241              4658.620690
std         0.508548              1864.234282
min         0.000000              2500.000000
25%         0.000000              3000.000000
50%         1.000000              4100.000000
75%         1.000000              5900.000000
max         1.000000              7900.000000

       avg_percentage_credit_card_limit_used_last_year  saving_amount  \
count                                         29.000000      29.000000
mean                                           0.665862    1551.172414
std                                            0.213366     865.010201
min                                            0.220000      88.000000
25%                                            0.520000    1058.000000
50%                                            0.690000    1310.000000
75%                                            0.860000    1958.000000
max                                            0.950000    3866.000000

       checking_amount  is_employed  yearly_salary        age  \
count        29.000000    29.000000      29.000000  29.000000
mean       3444.103448     0.931034   30055.172414  46.034483
std        2222.545956     0.257881   18362.745403  11.773101
min         661.000000     0.000000       0.000000  21.000000
25%        1846.000000     1.000000   18300.000000  40.000000
50%        2929.000000     1.000000   26100.000000  47.000000
75%        4139.000000     1.000000   39500.000000  52.000000
max        8868.000000     1.000000   75500.000000  69.000000

       dependent_number  label\t\t
count         29.000000  29.000000
mean           3.758621   0.344828
std            2.898955   0.483725
min            0.000000   0.000000
25%            1.000000   0.000000
50%            3.000000   0.000000
75%            6.000000   1.000000
max            8.000000   1.000000
Experiment-13:
Write a Python program to construct a Bayesian network considering
medical data. Use this model to demonstrate the diagnosis of heart
patients using the standard Heart Disease Data Set.
DATASET:-

Dataset file: ‘heart.csv’


PROGRAM:-
import numpy as np
import pandas as pd
# BayesianModel is the class name in older pgmpy releases; newer releases
# rename it (for example to BayesianNetwork), so adjust this import to the
# installed version
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Read Cleveland Heart Disease data
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)
# Display the data
print('Few examples from the dataset are given below')
print(heartDisease.head())
# Model Bayesian Network: each pair is a directed edge (parent, child)
Model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'Heartdisease'), ('fbs', 'Heartdisease'),
                       ('Heartdisease', 'restecg'), ('Heartdisease', 'thalach'),
                       ('Heartdisease', 'chol')])
# Learning CPDs using Maximum Likelihood Estimators
print('\n Learning CPD using Maximum likelihood estimators')
Model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)
# Inferencing with Bayesian Network
print('\n Inferencing with Bayesian Network:')
HeartDisease_infer = VariableElimination(Model)
# Computing the Probability of HeartDisease given Age
print('\n 1. Probability of HeartDisease given Age=30')
q = HeartDisease_infer.query(variables=['Heartdisease'], evidence={'age': 30})
print(q.values.tolist())
# print(q['Heartdisease'])
# Computing the Probability of HeartDisease given cholesterol
print('\n 2. Probability of HeartDisease given cholesterol=100')
q = HeartDisease_infer.query(variables=['Heartdisease'], evidence={'chol': 100})
print(q.values.tolist())
OUTPUT:-
Few examples from the dataset are given below
   age  sex  cp  trestbps  chol  fbs  restecg  thalach  exang  oldpeak  slope  \
0   63    1   1       145   233    1        2      100      0      2.3      3
1   67    1   4       160   286    0        2      108      1      1.5      2
2   67    1   4       120   229    0        2      129      1      2.6      2
3   41    0   2       130   100    0        2      172      0      1.4      1
4   62    0   4       140   268    0        2      160      0      3.6      3

   ca  thal  Heartdisease
0   0     6             0
1   3     3             2
2   2     7             1
3   0     3             0
4   2     3             3

Learning CPD using Maximum likelihood estimators

Inferencing with Bayesian Network:

1. Probability of HeartDisease given Age=30
[0.13333333333333333, 0.13333333333333333, 0.6, 0.13333333333333333]

2. Probability of HeartDisease given cholesterol=100
[1.0, 0.0, 0.0, 0.0]
Experiment-14:
Write a program to implement Support Vector Machines and Principal
Component Analysis
PROGRAM:-
import numpy as np
import pandas as pd
from sklearn import svm
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
iris = load_iris()
X = iris.data
y = iris.target
X_train, X_test, y_train, y_test = train_test_split(X, y,
test_size=0.3, random_state=42)
pca = PCA(n_components=2)
print(pca)
X_train_pca = pca.fit_transform(X_train)
X_test_pca = pca.transform(X_test)
svm_clf = svm.SVC(kernel='linear')
svm_clf.fit(X_train_pca, y_train)
y_pred = svm_clf.predict(X_test_pca)
accuracy = accuracy_score(y_test, y_pred)
print('Accuracy: {:.2f}'.format(accuracy))

OUTPUT:-
PCA(n_components=2)
Accuracy: 0.98
Experiment-15:
Write a program to implement Principal Component Analysis
PROGRAM:-
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris

# Load the Iris dataset as an example
iris = load_iris()
X = iris.data
y = iris.target
# Instantiate the PCA object with the number of components
pca = PCA(n_components=2)
# Fit and transform the data using PCA
X_pca = pca.fit_transform(X)
# Create a new dataframe with the PCA results and the target variable
df = pd.DataFrame(data=X_pca, columns=['PC1', 'PC2'])
df['target'] = y
# Plot the PCA results, one colour per class
plt.figure(figsize=(8, 6))
targets = [0, 1, 2]
colors = ['r', 'g', 'b']
for target, color in zip(targets, colors):
    indicesToKeep = df['target'] == target
    plt.scatter(df.loc[indicesToKeep, 'PC1'],
                df.loc[indicesToKeep, 'PC2'],
                c=color,
                s=50)
plt.xlabel('PC1')
plt.ylabel('PC2')
plt.legend(targets)
plt.show()
OUTPUT:-
(Scatter plot of the Iris samples on PC1 and PC2, coloured by class; figure not reproduced.)
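
The projection that PCA produces can be reproduced with a plain singular value decomposition of the centred data, which shows what the library call is doing. This is a sketch for comparison, not the library's internal code path; the variable names are introduced here, and the sign of each component may differ between the two results, so the comparison uses absolute values.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X = load_iris().data

# Centre the data, then decompose: rows of Vt are the principal directions
X_centered = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

# Project onto the first two principal directions
X_svd = X_centered @ Vt[:2].T

# Share of the total variance captured by each component
explained = S ** 2 / np.sum(S ** 2)

# Reference result from scikit-learn
pca = PCA(n_components=2)
X_sklearn = pca.fit_transform(X)

print("Explained variance ratio (SVD):    ", explained[:2])
print("Explained variance ratio (sklearn):", pca.explained_variance_ratio_)
print("Projections match up to sign:",
      np.allclose(np.abs(X_svd), np.abs(X_sklearn), atol=1e-6))
```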
