Department of Computer Engineering Academic Term: June-Nov 2021

Department of Computer Engineering
Academic Term: June-Nov 2021
Practical No: 4
Title: Implementation of Bayesian algorithm
Date of Performance: 30/08/2021
Date of Submission: 06/09/2021
Name of the Student: Candida Ruth Noronha
Class: TE COMPS B
Roll No: 8960
Evaluation
Indicator | BS - Below Standard | MS - Meet Standard | ES - Exceeds Standard | Marks awarded
On time Completion & Submission (02) | 02 (On Time) | 00 (Not on Time) | 00 (Not on Time) |
Logic/Algorithm Complexity analysis (04) | 04 (Correct) | 02 (Partial) | 01 (Tried) |
Coding Standards (04): Comments/indentation/Naming conventions | 04 (All used) | 02 (Partial) | 01 (rarely followed) |
Output/Test Cases | | | |
Total marks out of 10
Signature of the Teacher

Naive Bayes is a classification algorithm for binary (two-class) and multiclass classification
problems. It is called naive because the probability calculation for each class is simplified
by treating the features as conditionally independent given the class, which keeps the
calculation tractable. In simple terms, a Naive Bayes classifier assumes that the presence of a
particular feature in a class is unrelated to the presence of any other feature.
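The independence assumption can be illustrated with a small sketch: the score of a class is its prior multiplied by the likelihood of each feature, taken one at a time. The numbers and the names priors, likelihoods and naive_score below are made up for illustration and are not taken from the Iris data.

```python
# Hypothetical priors P(class) and per-feature likelihoods P(feature_i | class).
# These values are illustrative only; they do not come from the Iris dataset.
priors = {"A": 0.5, "B": 0.5}
likelihoods = {"A": [0.8, 0.1], "B": [0.3, 0.6]}


def naive_score(prior, feature_likelihoods):
    """Return prior * product of the per-feature likelihoods."""
    score = prior
    for p in feature_likelihoods:
        score *= p  # features are treated as independent given the class
    return score


scores = {label: naive_score(priors[label], likelihoods[label]) for label in priors}
best = max(scores, key=scores.get)  # class with the largest score
print(scores, best)
```

Class B wins here because the product of its likelihoods is larger, even though class A has the single largest likelihood.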
Dataset: https://raw.githubusercontent.com/jbrownlee/Datasets/master/iris.csv
Algorithm:
1) Import the libraries
2) Import the dataset
3) Separate the training data by class
4) Summarize the dataset with two statistics, i.e. mean and standard deviation
5) Summarize the columns in the dataset organized by class values
6) Calculate the Gaussian probability density of observing a given real value x
7) Calculate the probability that the data belongs to the first class, the second class, and
each of the other classes
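Steps 3 to 5 can be sketched on a toy dataset as follows. This is only a minimal illustration and not the code of this practical: the rows are made-up values, the names rows, by_class and summaries are assumptions, and the standard library's statistics.mean and statistics.stdev (sample standard deviation) stand in for hand-written helpers.

```python
from collections import defaultdict
from statistics import mean, stdev

# Toy rows: two numeric features followed by an integer class label (made-up values).
rows = [
    [1.0, 2.0, 0],
    [2.0, 4.0, 0],
    [3.0, 6.0, 0],
    [10.0, 1.0, 1],
    [12.0, 3.0, 1],
]

# Step 3: separate the rows by class, keeping only the feature values.
by_class = defaultdict(list)
for row in rows:
    by_class[row[-1]].append(row[:-1])

# Steps 4 and 5: (mean, standard deviation, count) for each column, per class.
summaries = {}
for label, class_rows in by_class.items():
    summaries[label] = [
        (mean(column), stdev(column), len(column)) for column in zip(*class_rows)
    ]

for label, stats in summaries.items():
    print(label, stats)
```

Each class ends up with one (mean, stdev, count) tuple per feature column, which is all the model needs to store.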
Functions:
1. Load CSV file: The Iris dataset, stored as a CSV file, is used for the experiment. We
open the file, read it row by row, skip empty rows, and append each row to a list.
2. Convert String Columns to Float: For each row in the dataset, we convert the string
value in the given column to a float.
3. Convert String Columns to Integer: We collect the values of the class column and use
the set function to find the unique values. We then build a dictionary that maps each
unique value to an integer and replace the class value in every row with its integer.
4. Split the dataset by class values, and return a dictionary: First we create an empty
dictionary named "separated" to store the rows according to their class values. Using a
for loop, we traverse the dataset and append each row to the list for its class.
5. Calculate the mean of a list of numbers: Return the mean (average) by dividing the
sum of the numbers by the count of numbers.
6. Calculate the standard deviation of a list of numbers: The standard deviation is the
square root of the variance. First, we calculate the mean of the given data. We then find
the sample variance by dividing the sum of squared differences from the mean by n - 1.
7. Calculate the mean, stdev and count for each column in a dataset: We calculate the
mean, standard deviation and count for each column in the given dataset. The summary of
the last column (the class label) is discarded.
8. Split dataset by class then calculate statistics for each column: We already have a
function that separates the dataset into rows by class and a function that calculates
summary statistics for each column. Putting these together summarizes the columns in
the dataset organized by class values.
9. Calculate the Gaussian probability distribution function for x: We first compute the
exponential term, then combine it with the mean and standard deviation to obtain the
Gaussian probability density at x.
In probability theory, a normal (or Gaussian) distribution is a type of continuous
probability distribution for a real-valued random variable. The general form of its
probability density function is given by:
$$f(x) = \frac{1}{\sigma \sqrt{2\pi}} \, e^{-\frac{(x - \mu)^2}{2\sigma^2}}$$
where e and π are mathematical constants. The parameter μ is the mean or
expectation of the distribution, while the parameter σ is its standard deviation.
10. Calculate the probabilities of predicting each class for a given row: Probabilities are
calculated separately for each class. We start from the fraction of training rows that
belong to the class and multiply it by the Gaussian probability of each feature value.
This is done for the first class, then the second class, and so on for all the classes.
11. Predict the class for a given row: We compare the probabilities returned by the
previous function and return the class with the highest probability for the given row.
12. Make a prediction with Naive Bayes on Iris Dataset: Load the dataset, which is saved
as the file iris.csv, fit the model on it, and use the model to predict the class for a given row.
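As a quick check of the Gaussian density described in item 9, the sketch below evaluates it at a few points and compares one result with statistics.NormalDist from the Python standard library (Python 3.8 or later). The function name gaussian_pdf and the test values are assumptions chosen for illustration.

```python
from math import exp, pi, sqrt
from statistics import NormalDist


def gaussian_pdf(x, mu, sigma):
    """Gaussian probability density at x for mean mu and standard deviation sigma."""
    exponent = exp(-((x - mu) ** 2) / (2 * sigma ** 2))
    return exponent / (sqrt(2 * pi) * sigma)


at_mean = gaussian_pdf(1.0, 1.0, 1.0)    # density at the mean, about 0.3989
one_sigma = gaussian_pdf(2.0, 1.0, 1.0)  # one standard deviation away, about 0.2420
reference = NormalDist(mu=1.0, sigma=1.0).pdf(2.0)  # standard-library cross-check

print(at_mean, one_sigma, reference)
```

The density is highest at the mean and falls off symmetrically, so a feature value far from a class mean pulls the score of that class down.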
Code:
# Make Predictions with Naive Bayes on the Iris Dataset
from csv import reader
from math import sqrt
from math import exp
from math import pi

# Load CSV file
def load_csv(filename):
    dataset = list()
    with open(filename, 'r') as file:
        csv_reader = reader(file)
        for row in csv_reader:
            # Skip empty rows
            if not row:
                continue
            dataset.append(row)
    return dataset

# Convert String Columns to Float
def str_column_to_float(dataset, column):
    for row in dataset:
        row[column] = float(row[column].strip())

# Convert String Columns to Integer
def str_column_to_int(dataset, column):
    class_values = [row[column] for row in dataset]
    # Sorted so that the label-to-integer mapping is the same on every run
    unique = sorted(set(class_values))
    lookup = dict()
    for i, value in enumerate(unique):
        lookup[value] = i
        print(value + ' => ' + str(i))
    for row in dataset:
        row[column] = lookup[row[column]]
    return lookup

# Split the dataset by class values, returns a dictionary
def separate_by_class(dataset):
    separated = dict()
    for i in range(len(dataset)):
        vector = dataset[i]
        class_value = vector[-1]
        if (class_value not in separated):
            separated[class_value] = list()
        separated[class_value].append(vector)
    return separated

# Calculate the mean of a list of numbers
def mean(numbers):
    return sum(numbers)/float(len(numbers))

# Calculate the standard deviation of a list of numbers (sample formula, n - 1)
def stdev(numbers):
    avg = mean(numbers)
    variance = sum([(x-avg)**2 for x in numbers]) / float(len(numbers)-1)
    return sqrt(variance)

# Calculate the mean, stdev and count for each column in a dataset
def summarize_dataset(dataset):
    summaries = [(mean(column), stdev(column), len(column)) for column in zip(*dataset)]
    # Drop the summary of the class column
    del(summaries[-1])
    return summaries

# Split dataset by class then calculate statistics for each column
def summarize_by_class(dataset):
    separated = separate_by_class(dataset)
    summaries = dict()
    for class_value, rows in separated.items():
        summaries[class_value] = summarize_dataset(rows)
    return summaries

# Calculate the Gaussian probability distribution function for x
def calculate_probability(x, mean, stdev):
    exponent = exp(-((x-mean)**2 / (2 * stdev**2)))
    return (1 / (sqrt(2 * pi) * stdev)) * exponent

# Calculate the probabilities of predicting each class for a given row
def calculate_class_probabilities(summaries, row):
    total_rows = sum([summaries[label][0][2] for label in summaries])
    probabilities = dict()
    for class_value, class_summaries in summaries.items():
        # Prior: fraction of training rows that belong to this class
        probabilities[class_value] = summaries[class_value][0][2]/float(total_rows)
        for i in range(len(class_summaries)):
            mean, stdev, _ = class_summaries[i]
            probabilities[class_value] *= calculate_probability(row[i], mean, stdev)
    return probabilities

# Predict the class for a given row
def predict(summaries, row):
    probabilities = calculate_class_probabilities(summaries, row)
    best_label, best_prob = None, -1
    for class_value, probability in probabilities.items():
        if best_label is None or probability > best_prob:
            best_prob = probability
            best_label = class_value
    return best_label

# Make a prediction with Naive Bayes on Iris Dataset
filename = './iris.csv'
dataset = load_csv(filename)
for i in range(len(dataset[0])-1):
    str_column_to_float(dataset, i)
# Convert class column to integers
str_column_to_int(dataset, len(dataset[0])-1)
# Fit model
model = summarize_by_class(dataset)
# Define a new record
row = [5.9, 2.7, 4.2, 1.4]
# Predict the label
label = predict(model, row)
print('Data=' + str(row) + ' Predicted=' + str(label))
Output:
Conclusion:
Running the program first prints the mapping of class labels to integers and then fits the
model on the entire dataset. There are three class labels: 0, 1 and 2. When a new
observation is defined, the model predicts its class label. Here, our observation is predicted
as belonging to class 1, which is ‘Iris-versicolor’.
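The per-class values compared during prediction are products of a prior and Gaussian densities, so they do not sum to 1. If normalized probabilities are wanted, each value can be divided by the total. The sketch below assumes a dictionary of such scores; the function name normalize and the example scores are hypothetical and are not output from this practical.

```python
def normalize(scores):
    """Scale class scores so that they sum to 1."""
    total = sum(scores.values())
    if total == 0:
        raise ValueError("all class scores are zero; cannot normalize")
    return {label: score / total for label, score in scores.items()}


# Hypothetical unnormalized scores for class labels 0, 1 and 2.
scores = {0: 0.0, 1: 0.03, 2: 0.01}
posteriors = normalize(scores)
print(posteriors)
```

Normalizing does not change which class has the highest value, so the predicted label stays the same.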
