DEPARTMENT OF COMPUTER SCIENCE AND ENGINEERING
LAB MANUAL
B.Tech. VI Semester
TABLE OF CONTENTS

Lab Instructions
BTU Syllabus
Lab Introduction

1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples.
2. For a given set of training data examples, implement and demonstrate the Candidate-Elimination algorithm to output the set of all hypotheses consistent with the training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
5. Write a program to implement the naive Bayesian classifier for a sample training data set. Compute the accuracy of the classifier, considering a few test data sets.
6. Assuming a set of documents that need to be classified, use the naive Bayesian Classifier model to perform this task. Calculate the accuracy, precision, and recall for your data set.
7. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
8. Apply the EM algorithm to cluster a data set. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms.
9. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
LAB INSTRUCTIONS
1. Always follow the instructions given by the concerned faculty to perform the assigned experiment.
2. Do not switch off the power supply of the PCs directly; first shut down the PCs, then switch off the power supply.
3. Every student is responsible for any damage to the PC or its accessories assigned for lab work.
4. Always bring your lab file, and complete the task assigned to you.
5. Get each experiment you perform checked in the next turn; after that, the faculty may not check your work.
6. Mention your roll number, name, node number and signature in the lab register.
BTU SYLLABUS
1. Implement and demonstrate the FIND-S algorithm for finding the most specific hypothesis based on a given set of training data samples.
2. For a given set of training data examples, implement and demonstrate the Candidate-Elimination algorithm to output a description of the set of all hypotheses consistent with the training examples.
3. Write a program to demonstrate the working of the decision tree based ID3 algorithm. Use an appropriate data set for building the decision tree and apply this knowledge to classify a new sample.
4. Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
5. Write a program to implement the naive Bayesian classifier for a sample training data set. Compute the accuracy of the classifier, considering a few test data sets.
6. Assuming a set of documents that need to be classified, use the naive Bayesian Classifier model to perform this task. Built-in Java classes/API can be used to write the program. Calculate the accuracy, precision, and recall for your data set.
7. Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set. You can use Java/Python ML library classes/API.
8. Apply the EM algorithm to cluster a set of data. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering. You can add Java/Python ML library classes/API in the program.
9. Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions. Java/Python ML library classes can be used for this problem.
10. Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set and draw graphs.
LAB INTRODUCTION
This lab is intended for third-year students of engineering branches in the subject of Machine Learning. This manual contains practical/lab sessions related to ML, covering various aspects of the subject to enhance understanding.
The programs are implemented in the Python programming language and use packages such as numpy, pandas, matplotlib and scikit-learn.
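As a quick sanity check before starting the programs (a convenience sketch, not part of any graded experiment), the required packages and their installed versions can be verified as follows:

import numpy, pandas, matplotlib, sklearn

# Print each package name with its installed version
for pkg in (numpy, pandas, matplotlib, sklearn):
    print(pkg.__name__, pkg.__version__)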
Program No. 1
Objective: Implement and demonstrate the FIND-S algorithm for finding the most specific
hypothesis based on a given set of training data samples. Read the training data from a
.CSV file.
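The training instances printed in the output below correspond to a plain .CSV file with one row per example and no header row, saved for instance as enjoysport.csv (the file name is an assumption):

sunny,warm,normal,strong,warm,same,yes
sunny,warm,high,strong,warm,same,yes
rainy,cold,high,strong,warm,change,no
sunny,warm,high,strong,cool,change,yes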
import csv

a = []
with open('enjoysport.csv') as csvfile:   # file name is an assumption (see above)
    for row in csv.reader(csvfile):
        a.append(row)

print("\n The Given Training Data Set \n")
for row in a:
    print(row)

num_attribute = len(a[0]) - 1
print("\n The total number of training instances are : ", len(a))

# FIND-S starts from the most specific hypothesis and generalizes it
hypothesis = ['0'] * num_attribute
print("\n The initial hypothesis is : ")
print(hypothesis)

for i in range(len(a)):
    if a[i][num_attribute] == 'yes':          # only positive examples generalize h
        for j in range(num_attribute):
            if hypothesis[j] == '0' or hypothesis[j] == a[i][j]:
                hypothesis[j] = a[i][j]
            else:
                hypothesis[j] = '?'
    print("\n The hypothesis for the training instance {} is : \n".format(i + 1), hypothesis)

print("\n The Maximally specific hypothesis for the training instance is ")
print(hypothesis)
Listing 1: Find-S program
Output
The Given Training Data Set
['sunny', 'warm', 'normal', 'strong', 'warm', 'same', 'yes']
['sunny', 'warm', 'high', 'strong', 'warm', 'same', 'yes']
['rainy', 'cold', 'high', 'strong', 'warm', 'change', 'no']
['sunny', 'warm', 'high', 'strong', 'cool', 'change', 'yes']
The total number of training instances are : 4
The initial hypothesis is :
['0', '0', '0', '0', '0', '0']

The Maximally specific hypothesis for the training instance is
['sunny', 'warm', '?', 'strong', '?', '?']
Viva Voce
1. What do you mean by concept learning?
2. What do you mean by inductive logic?
Reference
1. Raschka, Sebastian, and Vahid Mirjalili. Python machine learning: Machine learning and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing Ltd, 2019.
Program No. 2
Objective: For a given set of training data examples stored in a .CSV file, implement
and demonstrate the Candidate-Elimination algorithm to output a description of the set of
all hypotheses consistent with the training examples.
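Each hypothesis handled by the algorithm is a tuple of attribute constraints in which '?' matches any value. A minimal illustration of what "consistent" means (this helper is illustrative only, not part of the graded listing):

# A hypothesis matches an instance if every constraint is '?' or equal
h = ['sunny', 'warm', '?', 'strong', '?', '?']

def matches(h, x):
    return all(c == '?' or c == v for c, v in zip(h, x))

print(matches(h, ['sunny', 'warm', 'normal', 'strong', 'warm', 'same']))  # True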
import numpy as np
import pandas as pd

# Load the training data (file name is an assumption; the EnjoySport data of Program 1)
data = pd.read_csv('enjoysport.csv', header=None)
concepts = np.array(data.iloc[:, 0:-1])
target = np.array(data.iloc[:, -1])

def learn(concepts, target):
    specific_h = concepts[0].copy()
    # G starts as the most general boundary: one row of '?' per attribute
    general_h = [['?' for _ in range(len(specific_h))] for _ in range(len(specific_h))]

    for i, h in enumerate(concepts):
        if target[i] == "yes":
            # Positive instance: generalize specific_h where it disagrees
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    specific_h[x] = '?'
                    general_h[x][x] = '?'
        if target[i] == "no":
            # Negative instance: specialize general_h against specific_h
            for x in range(len(specific_h)):
                if h[x] != specific_h[x]:
                    general_h[x][x] = specific_h[x]
                else:
                    general_h[x][x] = '?'

    # Drop the rows of general_h that remained fully general
    general_h = [g for g in general_h if g != ['?'] * len(specific_h)]
    return specific_h, general_h

specific_h, general_h = learn(concepts, target)
print("Final Specific_h:", specific_h, sep="\n")
print("Final General_h:", general_h, sep="\n")
Listing 2: Candidate-Elimination program
Output
Final Specific_h:
['sunny' 'warm' '?' 'strong' '?' '?']
Final General_h:
[['sunny', '?', '?', '?', '?', '?'],
 ['?', 'warm', '?', '?', '?', '?']]
Viva Voce
1. What do you mean by a consistent hypothesis?
Reference
1. Raschka, Sebastian, and Vahid Mirjalili. Python machine learning: Machine learning
and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing
Ltd, 2019.
Program No. 3
Objective: Write a program to demonstrate the working of the decision tree based ID3
algorithm. Use an appropriate data set for building the decision tree and apply this knowledge
to classify a new sample.
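For reference, the two quantities computed by the entropy and compute_gain functions in the listing below are the standard ID3 definitions:

\mathrm{Entropy}(S) = -\sum_i p_i \log_2 p_i, \qquad \mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)

where p_i is the proportion of class i in S and S_v is the subset of S whose attribute A takes value v. ID3 splits on the attribute with the highest gain.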
import csv
import math

class Node:
    def __init__(self, attribute):
        self.attribute = attribute
        self.children = []
        self.answer = ""

def load_csv(filename):
    # First row of the CSV holds the attribute names (file names below are assumptions)
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    headers = dataset.pop(0)
    return dataset, headers

def subtables(data, col, delete):
    dic = {}
    coldata = [row[col] for row in data]
    attr = list(set(coldata))

    counts = [0] * len(attr)
    r = len(data)
    c = len(data[0])
    for x in range(len(attr)):
        for y in range(r):
            if data[y][col] == attr[x]:
                counts[x] += 1

    for x in range(len(attr)):
        dic[attr[x]] = [[0 for i in range(c)] for j in range(counts[x])]
        pos = 0
        for y in range(r):
            if data[y][col] == attr[x]:
                if delete:
                    del data[y][col]
                dic[attr[x]][pos] = data[y]
                pos += 1
    return attr, dic

def entropy(S):
    # Assumes a binary class label
    attr = list(set(S))
    if len(attr) == 1:
        return 0

    counts = [0, 0]
    for i in range(2):
        counts[i] = sum([1 for x in S if attr[i] == x]) / (len(S) * 1.0)

    sums = 0
    for cnt in counts:
        sums += -1 * cnt * math.log(cnt, 2)
    return sums

def compute_gain(data, col):
    attr, dic = subtables(data, col, delete=False)

    total_size = len(data)
    entropies = [0] * len(attr)
    ratio = [0] * len(attr)

    total_entropy = entropy([row[-1] for row in data])
    for x in range(len(attr)):
        ratio[x] = len(dic[attr[x]]) / (total_size * 1.0)
        entropies[x] = entropy([row[-1] for row in dic[attr[x]]])
        total_entropy -= ratio[x] * entropies[x]
    return total_entropy

def build_tree(data, features):
    lastcol = [row[-1] for row in data]
    if (len(set(lastcol))) == 1:
        node = Node("")
        node.answer = lastcol[0]
        return node

    n = len(data[0]) - 1
    gains = [0] * n
    for col in range(n):
        gains[col] = compute_gain(data, col)
    split = gains.index(max(gains))
    node = Node(features[split])
    fea = features[:split] + features[split + 1:]

    attr, dic = subtables(data, split, delete=True)

    for x in range(len(attr)):
        child = build_tree(dic[attr[x]], fea)
        node.children.append((attr[x], child))
    return node

def print_tree(node, level):
    if node.answer != "":
        print(" " * level, node.answer)
        return

    print(" " * level, node.attribute)
    for value, n in node.children:
        print(" " * (level + 1), value)
        print_tree(n, level + 2)

def classify(node, x_test, features):
    # Walk the tree following the branch that matches each test attribute
    if node.answer != "":
        print(node.answer)
        return
    pos = features.index(node.attribute)
    for value, n in node.children:
        if x_test[pos] == value:
            classify(n, x_test, features)

# Main program
dataset, features = load_csv("id3.csv")
node = build_tree(dataset, features)

print("The decision tree for the dataset using ID3 algorithm is")
print_tree(node, 0)

testdata, features = load_csv("id3_test.csv")
for xtest in testdata:
    print("The test instance:", xtest)
    print("The label for test instance:", end=" ")
    classify(node, xtest, features)
Listing 3: ID3 decision tree program
Output
The decision tree for the dataset using ID3 algorithm is
 Outlook
  rain
   Wind
    strong
     no
    weak
     yes
  overcast
   yes
  sunny
   Humidity
    normal
     yes
    high
     no
The test instance: ['rain', 'cool', 'normal', 'strong']
The label for test instance: no
Viva Voce
1. How do you decide the root node of a decision tree?
Reference
1. Raschka, Sebastian, and Vahid Mirjalili. Python machine learning: Machine learning
and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing
Ltd, 2019.
Program No. 4
Objective: Build an Artificial Neural Network by implementing the Backpropagation algorithm and test the same using appropriate data sets.
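The network below uses the logistic sigmoid throughout:

\sigma(z) = \frac{1}{1 + e^{-z}}, \qquad \sigma'(z) = \sigma(z)\,(1 - \sigma(z))

A point worth noting before reading the code: because the code stores the activations rather than the pre-activations, derivatives_sigmoid(x) = x(1 - x) is applied directly to the layer outputs.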
import numpy as np

# Training data: hours slept / hours studied vs. test score
# (this small example data set is an assumption; the listing's first lines
# were lost in extraction)
X = np.array(([2, 9], [1, 5], [3, 6]), dtype=float)
y = np.array(([92], [86], [89]), dtype=float)
X = X / np.amax(X, axis=0)  # normalize input features column-wise
y = y / 100                 # normalize output to [0, 1]

# Sigmoid function
def sigmoid(x):
    return 1 / (1 + np.exp(-x))

# Derivative of sigmoid; expects x to already be a sigmoid activation
def derivatives_sigmoid(x):
    return x * (1 - x)

# Variable initialization
epoch = 5000                # setting training iterations
lr = 0.1                    # setting learning rate
inputlayer_neurons = 2      # number of features in the data set
hiddenlayer_neurons = 3     # number of neurons in the hidden layer
output_neurons = 1          # number of neurons at the output layer

# Weight and bias initialization: uniform random values of dim x*y
wh = np.random.uniform(size=(inputlayer_neurons, hiddenlayer_neurons))  # input-to-hidden weights
bh = np.random.uniform(size=(1, hiddenlayer_neurons))                   # hidden layer bias
wout = np.random.uniform(size=(hiddenlayer_neurons, output_neurons))    # hidden-to-output weights
bout = np.random.uniform(size=(1, output_neurons))                      # output layer bias

for i in range(epoch):

    # Forward propagation
    hinp1 = np.dot(X, wh)
    hinp = hinp1 + bh
    hlayer_act = sigmoid(hinp)
    outinp1 = np.dot(hlayer_act, wout)
    outinp = outinp1 + bout
    output = sigmoid(outinp)

    # Backpropagation
    EO = y - output
    outgrad = derivatives_sigmoid(output)
    d_output = EO * outgrad
    EH = d_output.dot(wout.T)

    # how much hidden layer weights contributed to the error
    hiddengrad = derivatives_sigmoid(hlayer_act)
    d_hiddenlayer = EH * hiddengrad

    # weight update: dot product of next-layer error and current-layer output
    wout += hlayer_act.T.dot(d_output) * lr
    wh += X.T.dot(d_hiddenlayer) * lr
    # bias update (added here so the biases are also trained)
    bout += np.sum(d_output, axis=0, keepdims=True) * lr
    bh += np.sum(d_hiddenlayer, axis=0, keepdims=True) * lr

print("Input: \n" + str(X))
print("Actual Output: \n" + str(y))
print("Predicted Output: \n", output)
Listing 4: ANN backpropagation algorithm implementation
Output
Viva Voce
1. What is a perceptron?
2. How is a multi-layer ANN better than a single-layer ANN?
Reference
1. Raschka, Sebastian, and Vahid Mirjalili. Python machine learning: Machine learning
and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing
Ltd, 2019.
Program No. 5
Objective: Write a program to implement the naive Bayesian classifier for a sample
training data set stored as a .CSV file. Compute the accuracy of the classifier, considering a few test data sets.
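The classifier below is a Gaussian naive Bayes built from scratch: each class is summarized by the per-attribute mean and standard deviation, and prediction applies Bayes' rule under a conditional-independence assumption:

P(c \mid x) \propto P(c) \prod_i N(x_i;\, \mu_{c,i}, \sigma_{c,i}), \qquad N(x;\, \mu, \sigma) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}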
import csv
import math
import random

def loadcsv(filename):
    # Read every row of the CSV as a list of floats
    lines = csv.reader(open(filename, "r"))
    dataset = list(lines)
    for i in range(len(dataset)):
        dataset[i] = [float(x) for x in dataset[i]]
    return dataset

def splitdataset(dataset, splitratio):
    trainsize = int(len(dataset) * splitratio)
    trainset = []
    copy = list(dataset)
    while len(trainset) < trainsize:
        # generate indices for the dataset list randomly to pick elements for training data
        index = random.randrange(len(copy))
        trainset.append(copy.pop(index))
    return [trainset, copy]

def separatebyclass(dataset):
    separated = {}  # dictionary of classes 1 and 0
    # creates a dictionary of classes 1 and 0 where the values are
    # the instances belonging to each class
    for i in range(len(dataset)):
        vector = dataset[i]
        if vector[-1] not in separated:
            separated[vector[-1]] = []
        separated[vector[-1]].append(vector)
    return separated

def mean(numbers):
    return sum(numbers) / float(len(numbers))

def stdev(numbers):
    avg = mean(numbers)
    variance = sum([pow(x - avg, 2) for x in numbers]) / float(len(numbers) - 1)
    return math.sqrt(variance)

# The functions between stdev and getaccuracy were lost in extraction;
# the standard versions are restored below.

def summarize(dataset):
    # (mean, stdev) pair per attribute, dropping the class column
    summaries = [(mean(attribute), stdev(attribute)) for attribute in zip(*dataset)]
    del summaries[-1]
    return summaries

def summarizebyclass(dataset):
    separated = separatebyclass(dataset)
    summaries = {}
    for classvalue, instances in separated.items():
        summaries[classvalue] = summarize(instances)
    return summaries

def calculateprobability(x, mean_, stdev_):
    # Gaussian probability density for attribute value x
    exponent = math.exp(-(pow(x - mean_, 2) / (2 * pow(stdev_, 2))))
    return (1 / (math.sqrt(2 * math.pi) * stdev_)) * exponent

def calculateclassprobabilities(summaries, inputvector):
    probabilities = {}
    for classvalue, classsummaries in summaries.items():
        probabilities[classvalue] = 1
        for i in range(len(classsummaries)):
            mean_, stdev_ = classsummaries[i]
            probabilities[classvalue] *= calculateprobability(inputvector[i], mean_, stdev_)
    return probabilities

def predict(summaries, inputvector):
    probabilities = calculateclassprobabilities(summaries, inputvector)
    bestlabel, bestprob = None, -1
    for classvalue, probability in probabilities.items():
        if bestlabel is None or probability > bestprob:
            bestprob = probability
            bestlabel = classvalue
    return bestlabel

def getpredictions(summaries, testset):
    # find the predictions of the test data with the trained model
    return [predict(summaries, testset[i]) for i in range(len(testset))]

def getaccuracy(testset, predictions):
    correct = 0
    for i in range(len(testset)):
        if testset[i][-1] == predictions[i]:
            correct += 1
    return (correct / float(len(testset))) * 100.0

def main():
    filename = 'naivedata.csv'
    splitratio = 0.67
    dataset = loadcsv(filename)

    trainingset, testset = splitdataset(dataset, splitratio)
    print('Split {0} rows into train={1} and test={2} rows'.format(len(dataset), len(trainingset), len(testset)))
    # prepare model
    summaries = summarizebyclass(trainingset)
    # test model
    predictions = getpredictions(summaries, testset)
    accuracy = getaccuracy(testset, predictions)
    print('Accuracy of the classifier is : {0}%'.format(accuracy))

main()
Listing 5: Naive Bayesian classifier
Output
Split 768 rows into train=514 and test=254 rows
Accuracy of the classifier is : 71.65354330708661%
Viva Voce
1. What do you understand by a classification algorithm in machine learning?
Reference
1. Raschka, Sebastian, and Vahid Mirjalili. Python machine learning: Machine learning
and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing
Ltd, 2019.
Program No. 6
Objective: Assuming a set of documents that need to be classified, use the naive Bayesian
Classifier model to perform this task. Calculate the accuracy, precision, and recall for your
data set.
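For a binary confusion matrix with true positives TP, true negatives TN, false positives FP and false negatives FN, the three metrics the program reports are:

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad \mathrm{Precision} = \frac{TP}{TP + FP}, \qquad \mathrm{Recall} = \frac{TP}{TP + FN}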
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn import metrics

# Load the labelled documents (the file name and column names are assumptions;
# the listing's first lines were lost in extraction)
msg = pd.read_csv('naivetext.csv', names=['message', 'label'])

print('The dimensions of the dataset', msg.shape)
msg['labelnum'] = msg.label.map({'pos': 1, 'neg': 0})
X = msg.message
y = msg.labelnum

# splitting the dataset into train and test data
xtrain, xtest, ytrain, ytest = train_test_split(X, y)
print('\n the total number of Training Data :', ytrain.shape)
print('\n the total number of Test Data :', ytest.shape)

# output the words or tokens in the text documents
cv = CountVectorizer()
xtrain_dtm = cv.fit_transform(xtrain)
xtest_dtm = cv.transform(xtest)
print('\n The words or Tokens in the text documents \n')
print(cv.get_feature_names())  # use get_feature_names_out() on scikit-learn >= 1.0
df = pd.DataFrame(xtrain_dtm.toarray(), columns=cv.get_feature_names())

# Training Naive Bayes (NB) classifier on training data
clf = MultinomialNB().fit(xtrain_dtm, ytrain)
predicted = clf.predict(xtest_dtm)

# The metric printing below reconstructs the lost tail of the listing
print('\n Accuracy of the classifier is', metrics.accuracy_score(ytest, predicted))
print('\n Confusion matrix')
print(metrics.confusion_matrix(ytest, predicted))
print('\n The value of Precision', metrics.precision_score(ytest, predicted))
print('\n The value of Recall', metrics.recall_score(ytest, predicted))
Listing 6: Document classification with naive Bayes
Output
The dimensions of the dataset (18, 2)
0    1
1    1
2    1
3    1
4    1
5    0
6    0
7    0
8    0
9    0
10   1
11   0
12   1
13   0
14   1
15   0
16   1
17   0
Confusion matrix
[[2 1]
 [0 2]]
Reference
1. Raschka, Sebastian, and Vahid Mirjalili. Python machine learning: Machine learning
and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing
Ltd, 2019.
Program No. 7
Objective: Write a program to construct a Bayesian network considering medical data. Use this model to demonstrate the diagnosis of heart patients using the standard Heart Disease Data Set.
The Bayesian network is constructed using the Python pgmpy package.
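A Bayesian network over variables X_1, ..., X_n encodes the joint distribution as a product of local conditional distributions, one per node given its parents; these conditional probability tables are what pgmpy learns and queries below:

P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \mathrm{Parents}(X_i))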
import numpy as np
import csv
import pandas as pd
from pgmpy.models import BayesianModel
from pgmpy.estimators import MaximumLikelihoodEstimator
from pgmpy.inference import VariableElimination

# Load the heart disease data and mark missing values
heartDisease = pd.read_csv('heart.csv')
heartDisease = heartDisease.replace('?', np.nan)

# Network structure: the source's edge list was lost in extraction, so the
# edges below are an assumption; adjust them to your dataset's columns.
model = BayesianModel([('age', 'trestbps'), ('age', 'fbs'),
                       ('sex', 'trestbps'), ('exang', 'trestbps'),
                       ('trestbps', 'heartdisease'), ('fbs', 'heartdisease'),
                       ('heartdisease', 'restecg'), ('heartdisease', 'thalach'),
                       ('heartdisease', 'chol')])

# Learn the conditional probability tables from the data
model.fit(heartDisease, estimator=MaximumLikelihoodEstimator)

# Query the network: probability of heart disease given evidence
HeartDisease_infer = VariableElimination(model)
q = HeartDisease_infer.query(variables=['heartdisease'], evidence={'age': 28})
print(q)
Listing 7: Bayesian network for heart disease diagnosis
Output
Viva Voce
Reference
1. Raschka, Sebastian, and Vahid Mirjalili. Python machine learning: Machine learning
and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing
Ltd, 2019.
Program No. 8
Objective: Apply the EM algorithm to cluster a set of data stored in a .CSV file. Use the same data set for clustering using the k-Means algorithm. Compare the results of these two algorithms and comment on the quality of clustering.
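Both methods fit three clusters to the iris measurements but optimize different criteria. k-Means minimizes the within-cluster sum of squared distances, while EM for a Gaussian mixture alternates between computing soft responsibilities (E-step) and re-estimating the mixture parameters (M-step):

J = \sum_{k=1}^{K} \sum_{x \in C_k} \lVert x - \mu_k \rVert^2, \qquad \gamma_{ik} = \frac{\pi_k\, N(x_i;\, \mu_k, \Sigma_k)}{\sum_j \pi_j\, N(x_i;\, \mu_j, \Sigma_j)}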
import matplotlib.pyplot as plt
import pandas as pd
import numpy as np
from sklearn import datasets, preprocessing
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
import sklearn.metrics as sm

iris = datasets.load_iris()
X = pd.DataFrame(iris.data)
X.columns = ['Sepal_Length', 'Sepal_Width', 'Petal_Length', 'Petal_Width']

y = pd.DataFrame(iris.target)
y.columns = ['Targets']

colormap = np.array(['red', 'lime', 'black'])  # one colour per cluster

# The K-Means fitting and the first two subplots were lost in extraction;
# `model` below is the fitted KMeans object the source refers to.
model = KMeans(n_clusters=3)
model.fit(X)

print('The accuracy score of K-Mean: ', sm.accuracy_score(y, model.labels_))

# Standardize the features before fitting the Gaussian mixture (EM)
scaler = preprocessing.StandardScaler()
xs = pd.DataFrame(scaler.fit_transform(X), columns=X.columns)

gmm = GaussianMixture(n_components=3)
gmm.fit(xs)

y_cluster_gmm = gmm.predict(xs)

plt.subplot(2, 2, 3)
plt.scatter(X.Petal_Length, X.Petal_Width, c=colormap[y_cluster_gmm], s=40)
plt.title('GMM Classification')

print('The accuracy score of EM: ', sm.accuracy_score(y, y_cluster_gmm))
print('The Confusion matrix of EM: ', sm.confusion_matrix(y, y_cluster_gmm))
Listing 8: Clustering
Viva Voce
1. Is clustering unsupervised or supervised learning?
Reference
1. Raschka, Sebastian, and Vahid Mirjalili. Python machine learning: Machine learning
and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing
Ltd, 2019.
Program No. 9
Objective: Write a program to implement the k-Nearest Neighbour algorithm to classify the iris data set. Print both correct and wrong predictions.
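Since only the tail of the graded listing survived extraction, here is a minimal self-contained sketch of the same experiment (the test split size and the choice of k = 5 are assumptions, not part of the graded listing):

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report, confusion_matrix

iris = load_iris()
x_train, x_test, y_train, y_test = train_test_split(iris.data, iris.target, test_size=0.3)

classifier = KNeighborsClassifier(n_neighbors=5).fit(x_train, y_train)
y_pred = classifier.predict(x_test)

# Print both correct and wrong predictions, as the objective asks
for xi, yi, pi in zip(x_test, y_test, y_pred):
    print('Correct' if yi == pi else 'Wrong', xi, 'predicted:', iris.target_names[pi])

print(confusion_matrix(y_test, y_pred))
print(classification_report(y_test, y_pred))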
# (Loading the iris data, the train/test split, and fitting the
# KNeighborsClassifier were lost in extraction; see the sketch above.)
from sklearn.metrics import classification_report, confusion_matrix

y_pred = classifier.predict(x_test)

print('Confusion Matrix')
print(confusion_matrix(y_test, y_pred))
print('Accuracy Metrics')
print(classification_report(y_test, y_pred))
Listing 9: KNN implementation
Output
sepal-length sepal-width petal-length petal-width
[[5.1 3.5 1.4 0.2]
[4.9 3. 1.4 0.2]
[4.7 3.2 1.3 0.2]
[4.6 3.1 1.5 0.2]
[5. 3.6 1.4 0.2]
. . . . .
. . . . .
[6.2 3.4 5.4 2.3]
Confusion Matrix
[[20  0  0]
 [ 0 10  0]
 [ 0  1 14]]
Accuracy Metrics
              precision    recall  f1-score   support
           0       1.00      1.00      1.00        20
           1       0.91      1.00      0.95        10
           2       1.00      0.93      0.97        15
 avg / total       0.98      0.98      0.98        45
Viva Voce
1. What are the features of the iris data set?
Reference
1. Raschka, Sebastian, and Vahid Mirjalili. Python machine learning: Machine learning and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing Ltd, 2019.
Program No. 10
Objective: Implement the non-parametric Locally Weighted Regression algorithm in order to fit data points. Select an appropriate data set for your experiment and draw graphs.
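Locally Weighted Regression fits a separate weighted least-squares model around each query point x, with weights that decay with distance under a bandwidth tau:

w^{(i)} = \exp\!\left(-\frac{(x^{(i)} - x)^2}{2\tau^2}\right), \qquad \hat{\theta} = (X^{\top} W X)^{-1} X^{\top} W y

where W = diag(w^{(1)}, ..., w^{(n)}); the four plots produced below correspond to tau = 10, 1, 0.1 and 0.01, so the fit ranges from nearly global to very local.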
import numpy as np
from bokeh.plotting import show
from bokeh.layouts import gridplot

n = 1000  # number of samples (an assumption; the original value was lost in extraction)

# generate dataset
X = np.linspace(-3, 3, num=n)
print("The Data Set (10 Samples) X :\n", X[:10])
Y = np.log(np.abs(X ** 2 - 1) + .5)
print("The Fitting Curve Data Set (10 Samples) Y :\n", Y[:10])
# jitter X
X += np.random.normal(scale=.1, size=n)
print("Jittered (10 Samples) X :\n", X[:10])

# (The locally weighted regression itself and the plot_lwr helper were lost
# in extraction; plot_lwr(tau) fits and plots one curve per bandwidth.)

show(gridplot([
    [plot_lwr(10.), plot_lwr(1.)],
    [plot_lwr(0.1), plot_lwr(0.01)]]))
Listing 10: Non-parametric regression
Output
(Four bokeh plots of the locally weighted fits for tau = 10, 1, 0.1 and 0.01.)
Viva Voce
1. What is the difference between regression and classification?
2. Is regression supervised learning?
3. Name a few important regression algorithms.
Reference
1. Raschka, Sebastian, and Vahid Mirjalili. Python machine learning: Machine learning
and deep learning with Python, scikit-learn, and TensorFlow 2. Packt Publishing
Ltd, 2019.
2. Liu, Yuxi Hayden. Python Machine Learning By Example: Implement machine
learning algorithms and techniques to build intelligent systems. Packt Publishing
Ltd, 2019.