ML Lab Manual AIML Final


MACHINE LEARNING LAB MANUAL

SREYAS

SREYAS INSTITUTE OF ENGINEERING & TECHNOLOGY


(Approved by AICTE, Affiliated to JNTUH)
2-50/5, SyNo.107, Tattiannaram(V), G.S.I Bandlaguda, Nagole, Hyderabad – 500 068

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING(AIML)

III-B Tech – II Semester [Branch: CSE(AIML)]


MACHINE LEARNING LAB MANUAL

SREYAS INSTITUTE OF ENGINEERING & TECHNOLOGY


Besides Indu Aranya, Nagole, Hyderabad.

Department of Computer Science and Engineering (AIML), SREYAS Page 1



SREYAS INSTITUTE OF ENGINEERING AND TECHNOLOGY


(Approved by AICTE, Affiliated to JNTUH)
G.S.I. Bandlaguda, Nagole, Hyderabad - 500068

CERTIFICATE

LAB NAME : MACHINE LEARNING LAB MANUAL

BRANCH : CSE (AIML)

YEAR & SEM : III – I

REGULATION : R18


VISION & MISSION OF INSTITUTION

VISION
To be a centre of excellence in technical education to empower the
young talent through quality education and innovative engineering for well
being of the society
MISSION
1. Provide quality education with innovative methodology and intellectual
human capital.
2. Provide conducive environment for research and developmental
activities.
3. Inculcate holistic approach towards nature, society and human ethics with
lifelong learning attitude.


VISION & MISSION OF DEPARTMENT

Vision

To excel in computer science engineering education with best learning practices, research and
professional ethics.

Mission

1. To offer technical education with innovative teaching, good infrastructure and qualified
human resources.
2. Accomplish a process to advance knowledge in the subject and promote
academic and research environment.
3. To impart moral and ethical values and interpersonal skills to the students.

Program Educational Objectives

Computer Science & Engineering (CSE) is one of the most prominent technical
fields in Engineering. The curriculum offers courses with various areas of emphasis
on theory, design and experimental work. Subject matter ranges from basics of
Computers & Programming Languages to Compiler Design and Cloud Computing.
It maintains strong tie-ups with industry and is dedicated to preparing students for a
career in Web Technologies, Object Oriented Analysis and Design, Networking &
Security, Databases, Data Mining & Data Warehousing and Software Testing.


PROGRAM OUTCOMES (POs)


Engineering Graduates will be able to:
1. Engineering Knowledge: Apply the knowledge of mathematics, science, engineering
fundamentals, and an engineering specialization to the solution of complex engineering
problems.
2. Problem analysis: Identify, formulate, review research literature, and analyze complex
engineering problems reaching substantiated conclusions using first principles of mathematics,
natural sciences, and engineering sciences.
3. Design/development of solutions: Design solutions for complex engineering problems and
design system components or processes that meet the specified needs with appropriate
consideration for the public health and safety, and the cultural, societal, and environmental
considerations.
4. Conduct investigations of complex problems: Use research-based knowledge and research
methods including design of experiments, analysis and interpretation of data, and synthesis of
the information to provide valid conclusions.
5. Modern tool usage: Create, select, and apply appropriate techniques, resources, and modern
engineering and IT tools including prediction and modeling to complex engineering activities
with an understanding of the limitations.
6. The engineer and society: Apply reasoning informed by the contextual knowledge to assess
societal, health, safety, legal and cultural issues and the consequent responsibilities relevant to
the professional engineering practice.
7. Environment and sustainability: Understand the impact of the professional engineering
solutions in societal and environmental contexts, and demonstrate the knowledge of, and need
for sustainable development.
8. Ethics: Apply ethical principles and commit to professional ethics and responsibilities and
norms of the engineering practice.
9. Individual and team work: Function effectively as an individual, and as a member or leader
in diverse teams, and in multidisciplinary settings.
10. Communication: Communicate effectively on complex engineering activities with the
engineering community and with society at large, such as, being able to comprehend and write
effective reports and design documentation, make effective presentations, and give and receive
clear instructions.

11. Project management and finance: Demonstrate knowledge and understanding of the
engineering and management principles and apply these to one’s own work, as a member and
leader in a team, to manage projects and in multidisciplinary environments.
12. Life-long learning: Recognize the need for, and have the preparation and ability to engage in
independent and life-long learning in the broadest context of technological change.

PROGRAM SPECIFIC OUTCOMES (PSOs)

13. Proficiency on the contemporary skills towards development of innovative apps and firmware
products.
14. Capabilities to participate in the construction of software systems of varying complexity.

DEPARTMENT OF COMPUTER SCIENCE & ENGINEERING


III Year B.Tech CSE- II Sem


MACHINE LEARNING LAB INSTRUCTIONS TO THE STUDENTS

Things to Do:

1) Students should come in formal dress.
2) Students must wear their ID cards.
3) Students must be in the lab 10 minutes before the session starts.
4) Students should bring the observation book and the record.
5) Observations should be corrected by the concerned faculty.
6) The programs corrected by the faculty have to be copied into the record.
7) Students should maintain silence in the lab.

Things not to do:

1) Students should not bring any electronic gadgets into the lab.
2) Students should not come late.
3) Students should not create any disturbance to others.

HOD Lab Incharge


JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY HYDERABAD

R18 B.Tech. CSE (AIML) III Year JNTU Hyderabad


MACHINE LEARNING LAB


B.Tech. III Year II Sem.                L T P C
                                        0 0 3 1.5
Course Objective: The objective of this lab is to get an overview of the various machine learning
techniques and to be able to demonstrate them using Python.

Course Outcomes: After the completion of the course, the student will be able to:
understand the complexity of Machine Learning algorithms and their limitations;
understand modern notions in data analysis-oriented computing;
confidently apply common Machine Learning algorithms in practice and implement their own;
perform experiments in Machine Learning using real-world data.

List of Experiments
1. The probability that it is Friday and that a student is absent is 3%. Since there are 5 school
days in a week, the probability that it is Friday is 20%. What is the probability that a student is
absent given that today is Friday? Apply Bayes' rule in Python to get the result. (Ans: 15%)

2. Extract the data from database using python

3. Implement k-nearest neighbours classification using python

4. Given the following data, which specify classifications for nine combinations of VAR1 and VAR2,
predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result of k-means
clustering with 3 means (i.e., 3 centroids).
VAR1 VAR2 CLASS
1.713 1.586 0
0.180 1.786 1
0.353 1.240 1
0.940 1.566 0
1.486 0.759 1
1.266 1.106 0
1.540 0.419 1
0.459 1.799 1
0.773 0.186 1
5. The following training examples map descriptions of individuals onto high, medium and low
credit-worthiness.
medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk
Input attributes are (from left to right) income, recreation, job, status, age-group, home-owner. Find the
unconditional probability of 'golf' and the conditional probability of 'single' given 'medRisk' in the
dataset.

6. Implement linear regression using python.


7. Implement Naïve Bayes theorem to classify the English text

8. Implement an algorithm to demonstrate the significance of genetic algorithm

9. Implement the finite words classification system using Back-propagation algorithm
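
The two probabilities asked for in Experiment 5 can be estimated by simple counting. The sketch below is one possible way to do it, assuming the nine training examples are typed in exactly as listed above; the tuple layout and the index names RECREATION, STATUS and RISK are choices made here for illustration only.

```python
from fractions import Fraction

# Training examples copied from Experiment 5:
# (income, recreation, job, status, age_group, home_owner, risk)
rows = [
    ("medium", "skiing", "design", "single", "twenties", "no", "highRisk"),
    ("high", "golf", "trading", "married", "forties", "yes", "lowRisk"),
    ("low", "speedway", "transport", "married", "thirties", "yes", "medRisk"),
    ("medium", "football", "banking", "single", "thirties", "yes", "lowRisk"),
    ("high", "flying", "media", "married", "fifties", "yes", "highRisk"),
    ("low", "football", "security", "single", "twenties", "no", "medRisk"),
    ("medium", "golf", "media", "single", "thirties", "yes", "medRisk"),
    ("high", "skiing", "banking", "single", "thirties", "yes", "highRisk"),
    ("low", "golf", "unemployed", "married", "forties", "yes", "highRisk"),
]

# Column positions inside each tuple (illustrative names)
RECREATION, STATUS, RISK = 1, 3, 6

# Unconditional probability: share of all rows whose recreation is golf
p_golf = Fraction(sum(r[RECREATION] == "golf" for r in rows), len(rows))

# Conditional probability: restrict to medRisk rows, then count the single ones
med_risk_rows = [r for r in rows if r[RISK] == "medRisk"]
p_single_given_med = Fraction(
    sum(r[STATUS] == "single" for r in med_risk_rows), len(med_risk_rows)
)

print("P(golf) =", p_golf)
print("P(single | medRisk) =", p_single_given_med)
```

Fractions are used instead of floats so that the counts stay visible in the result.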

MACHINE LEARNING:
Introduction:


Machine learning is a subfield of artificial intelligence that enables systems to learn
and improve from experience without being explicitly programmed. Machine learning
algorithms detect patterns in data and learn from them in order to make their own predictions. In
traditional programming, a computer engineer writes a series of instructions that tell a computer
how to transform input data into a desired output. Machine learning, on the other hand, is an
automated process that enables machines to solve problems with little or no human input and to take
actions based on past observations.
Machine learning can be put to work on massive amounts of data and, on well-defined tasks with
enough representative data, can be faster and more consistent than manual analysis. It helps to
save time and money on tasks and analyses such as resolving customer pain points to improve
customer satisfaction, automating support tickets, and mining data from internal sources and
from the internet.
The four most common types of machine learning are:
I. Supervised Learning:

Supervised learning algorithms make predictions based on labeled training data. Each
training sample includes an input and a desired output. A supervised learning algorithm analyzes
this sample data and makes an inference.

The data is labeled to tell the machine which patterns (similar words and images, data categories, etc.)
it should look for and learn to associate with each output.

Fig: Working of Supervised Learning with Example

Here we have a dataset of different types of shapes, which includes squares, rectangles, triangles, and
polygons such as hexagons. The first step is to train the model on labeled examples of each shape.

o If the given shape has four sides, and all the sides are equal, then it is labeled as
a square.
o If the given shape has three sides, then it is labeled as a triangle.
o If the given shape has six equal sides, then it is labeled as a hexagon.

After training, we test the model using the test set, and the task of the model is to identify
the shape.

The machine has already been trained on all types of shapes, so when it is given a new shape, it
classifies the shape on the basis of its number of sides and predicts the output.
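
The shape example above can be sketched in a few lines of plain Python. This is only an illustration: the feature encoding (number of sides, whether all sides are equal), the tiny training set and the 1-nearest-neighbour rule are assumptions made here, not part of the lab syllabus.

```python
# Labeled training data: (number_of_sides, all_sides_equal) -> shape label
training = [
    ((3, True), "triangle"),
    ((3, False), "triangle"),
    ((4, True), "square"),
    ((4, False), "rectangle"),
    ((6, True), "hexagon"),
]

def distance(a, b):
    """Squared Euclidean distance; True/False are treated as 1/0."""
    return sum((float(p) - float(q)) ** 2 for p, q in zip(a, b))

def predict(features):
    """1-nearest-neighbour rule: return the label of the closest training example."""
    nearest = min(training, key=lambda pair: distance(pair[0], features))
    return nearest[1]

print(predict((4, True)))    # a four-sided shape with equal sides
print(predict((3, False)))   # a three-sided shape
```

The labels in the training list are what make this supervised: the model never has to discover the categories on its own.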


Classification in supervised machine learning


Supervised machine learning algorithms can be broadly classified into regression and
classification algorithms.

1. Regression

Regression algorithms are used when the output variable is a continuous value that depends on the
input variables. They are used for the prediction of continuous variables, as in weather forecasting
and market trend analysis. Below are some popular regression algorithms which come under
supervised learning:

o Linear Regression
o Regression Trees
o Non-Linear Regression
o Bayesian Linear Regression
o Polynomial Regression

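2. Classification

Classification algorithms are used when the output variable is categorical, which means that it takes
one of a fixed set of classes, such as Yes-No, Male-Female, or True-False. A typical application is
spam filtering. Below are some popular classification algorithms which come under supervised learning:

o Random Forest
o Decision Trees
o Logistic Regression
o Support Vector Machines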

II. Unsupervised Learning:


Unsupervised learning is a type of machine learning in which models are trained on an unlabeled
dataset and are allowed to act on that data without any supervision.
Unsupervised learning cannot be directly applied to a regression or classification problem because,
unlike supervised learning, we have the input data but no corresponding output data. The goal of
unsupervised learning is to find the underlying structure of the dataset, group the data according
to similarities, and represent the dataset in a compressed format.


Fig: Working of Unsupervised Learning with Example

Here, unlabeled input data is used, which means that it is not categorized and the
corresponding outputs are not given. This unlabeled input data is fed to the machine
learning model in order to train it. First, the model interprets the raw data to find hidden patterns
in it, and then a suitable algorithm, such as k-means clustering or hierarchical clustering, is
applied.

Once the suitable algorithm is applied, it divides the data objects into groups according
to the similarities and differences between the objects.
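
The grouping step described above can be illustrated with a bare-bones k-means loop. The sketch below is not the lab's reference solution: the sample points are invented, and seeding the centroids with the first k points is a simplification chosen here so that the run is repeatable.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two 2-D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

def kmeans(points, k, iterations=10):
    """Plain k-means on 2-D points; the first k points seed the centroids."""
    centroids = [points[i] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach every point to its nearest centroid
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda c: sq_dist(p, centroids[c]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its members
        for c, members in enumerate(clusters):
            if members:
                centroids[c] = (
                    sum(m[0] for m in members) / len(members),
                    sum(m[1] for m in members) / len(members),
                )
    return centroids, clusters

# Invented, unlabeled points forming two visible groups
points = [(1.0, 1.0), (5.0, 5.0), (1.2, 0.8), (0.8, 1.2), (5.2, 4.8), (4.8, 5.2)]
centroids, clusters = kmeans(points, k=2)
print(centroids)
print(clusters)
```

No labels are supplied anywhere; the two groups emerge only from the distances between the points.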

Classification in Unsupervised machine learning:


Unsupervised machine learning algorithms can be broadly classified into clustering and
association algorithms.

o Clustering: Clustering is a method of grouping objects into clusters such that objects
with the most similarities remain in one group and have few or no similarities with the objects
of another group. Cluster analysis finds the commonalities between the data objects and
categorizes them according to the presence and absence of those commonalities.
o Association: Association rule learning is an unsupervised learning method used for
finding relationships between variables in a large database. It determines the sets of
items that occur together in the dataset. Association rules make marketing strategy more
effective; for example, people who buy item X (say, bread) also tend to purchase item Y
(butter or jam). A typical example of association rule learning is Market Basket Analysis.
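
The bread and butter rule mentioned above is usually quantified with two measures, support and confidence. The following is a minimal sketch on hypothetical shopping baskets invented for this illustration; the function names support and confidence are also choices made here.

```python
# Hypothetical market baskets, invented for illustration
baskets = [
    {"bread", "butter"},
    {"bread", "jam"},
    {"bread", "butter", "milk"},
    {"milk"},
    {"bread", "butter", "jam"},
]

def support(itemset):
    """Fraction of baskets that contain every item in itemset."""
    return sum(itemset <= basket for basket in baskets) / len(baskets)

def confidence(antecedent, consequent):
    """Estimate of P(consequent in basket | antecedent in basket)."""
    return support(antecedent | consequent) / support(antecedent)

# Bread and butter appear together in 3 of the 5 baskets
print("support(bread, butter) =", support({"bread", "butter"}))
# Of the 4 baskets with bread, 3 also contain butter
print("confidence(bread -> butter) =", confidence({"bread"}, {"butter"}))
```

Algorithms such as Apriori search for all rules whose support and confidence exceed chosen thresholds; this sketch only evaluates one rule that is given in advance.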

Below is the list of some popular unsupervised learning algorithms:

o K-means clustering
o KNN (k-nearest neighbors) when used for unsupervised nearest-neighbor search; as a classifier,
KNN is a supervised method
o Hierarchical clustering
o Anomaly detection
o Neural Networks
o Principal Component Analysis
o Independent Component Analysis
o Apriori algorithm
o Singular value decomposition

III. Semi-Supervised Learning:

Semi-supervised learning is a type of machine learning that falls in between supervised and
unsupervised learning. It is a method that uses a small amount of labeled data and a large
amount of unlabeled data to train a model. The goal of semi-supervised learning is to learn
a function that can accurately predict the output variable based on the input variables, similar
to supervised learning. However, unlike supervised learning, the algorithm is trained on a
dataset that contains both labeled and unlabeled data.

Semi-supervised learning is particularly useful when there is a large amount of unlabeled data
available, but it is too expensive or difficult to label all of it. Some examples of semi-supervised
learning applications include:

o Text classification: The goal is to classify a given text into one or more predefined
categories. Semi-supervised learning can be used to train a text classification model
using a small amount of labeled data and a large amount of unlabeled text data.
o Image classification: The goal is to classify a given image into one or more predefined
categories. Semi-supervised learning can be used to train an image classification model
using a small amount of labeled data and a large amount of unlabeled image data.
o Anomaly detection: The goal is to detect patterns or observations that are unusual or
different from the norm.
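
One common way to use unlabeled data alongside a few labels is self-training (pseudo-labeling), named here as an illustrative technique rather than something the syllabus prescribes. The sketch assumes one numeric feature, at least two classes, a nearest-centroid classifier and a hand-picked confidence margin; all data values are invented.

```python
def class_centroids(labeled):
    """Mean feature value per class."""
    sums = {}
    for x, label in labeled:
        total, count = sums.get(label, (0.0, 0))
        sums[label] = (total + x, count + 1)
    return {label: total / count for label, (total, count) in sums.items()}

def self_train(labeled, unlabeled, margin=1.0):
    """Repeatedly pseudo-label the unlabeled points the model is confident about."""
    labeled, unlabeled = list(labeled), list(unlabeled)
    while unlabeled:
        cents = class_centroids(labeled)
        confident = []
        for x in unlabeled:
            ranked = sorted(cents, key=lambda label: abs(x - cents[label]))
            best, runner_up = ranked[0], ranked[1]
            # Confidence: how much closer x is to the best centroid than to the next one
            if abs(x - cents[runner_up]) - abs(x - cents[best]) >= margin:
                confident.append((x, best))
        if not confident:
            break  # nothing left that the model is sure about
        labeled.extend(confident)
        accepted = {x for x, _ in confident}
        unlabeled = [x for x in unlabeled if x not in accepted]
    return labeled, unlabeled

# Two labeled points and five unlabeled ones (invented values)
labeled, leftover = self_train([(1.0, "low"), (9.0, "high")], [2.0, 3.0, 8.0, 7.0, 5.0])
print(labeled)
print(leftover)
```

Points that sit exactly between the classes stay unlabeled, which is the intended behaviour: self-training should only add labels it is reasonably sure of.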
IV. Reinforcement Learning:

Reinforcement Learning (RL) is the science of decision making. It is about learning the optimal
behavior in an environment to obtain maximum reward. In RL, the training data is not supplied
up front as it is in supervised or unsupervised learning; instead, it is accumulated as the
system interacts with its environment by trial and error.
Reinforcement learning uses algorithms that learn from outcomes and decide which action to
take next. After each action, the algorithm receives feedback that helps it determine whether the
choice it made was correct, neutral or incorrect. It is a good technique to use for automated
systems that have to make a lot of small decisions without human guidance.
Reinforcement learning is an autonomous, self-teaching system that essentially learns by trial
and error. It performs actions with the aim of maximizing rewards; in other words, it
learns by doing in order to achieve the best outcomes.
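
The trial-and-error loop described above can be sketched with a multi-armed bandit and an epsilon-greedy policy. This is a toy illustration under stated assumptions: the three reward probabilities, the exploration rate of 0.1 and the fixed random seed are all invented here.

```python
import random

def run_bandit(true_rewards, steps=2000, epsilon=0.1, seed=0):
    """Epsilon-greedy agent on a bandit whose arms pay 1 with the given probabilities."""
    rng = random.Random(seed)
    estimates = [0.0] * len(true_rewards)   # running average reward per action
    counts = [0] * len(true_rewards)        # how often each action was tried
    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random action
            action = rng.randrange(len(true_rewards))
        else:
            # Exploit: take the action that currently looks best
            action = max(range(len(true_rewards)), key=lambda a: estimates[a])
        # Feedback from the environment: reward 1 or 0
        reward = 1.0 if rng.random() < true_rewards[action] else 0.0
        counts[action] += 1
        # Incremental update of the average reward for this action
        estimates[action] += (reward - estimates[action]) / counts[action]
    return estimates, counts

estimates, counts = run_bandit([0.2, 0.5, 0.8])
print("estimated rewards:", [round(e, 2) for e in estimates])
print("times each action was chosen:", counts)
```

The agent is never told which arm is best. It only sees rewards after acting, and the counts show how its choices concentrate on the arm that pays most often.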


PROGRAM 1

1. a. The probability that it is Friday and that a student is absent is 3%. The probability
that it is Friday is 20%. What is the probability that a student is absent given that today is
Friday? Apply Bayes' rule in Python to get the result.

AIM: To find the probability that a student is absent given that today is Friday

THEORY:
Bayes' theorem gives the formula for determining conditional probability. Conditional
probability is the likelihood of an outcome occurring, given that another outcome has already
occurred. Bayes' theorem provides a way to revise existing predictions or theories, i.e., to
update probabilities given new or additional evidence.
Bayes' theorem relies on incorporating prior probability distributions in order to generate
posterior probabilities. Prior probability, in Bayesian statistical inference, is the probability of an
event occurring before new data is collected. Posterior probability is the revised probability of an
event occurring after taking into consideration the new information.

$$P(A \mid B) = \frac{P(A \cap B)}{P(B)} = \frac{P(B \mid A)\,P(A)}{P(B)}$$

where
P(A) = the probability of A occurring
P(B) = the probability of B occurring
P(A|B) = the probability of A given B
P(B|A) = the probability of B given A
P(A∩B) = the probability of both A and B occurring
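
To illustrate how a prior is revised into a posterior, the sketch below applies Bayes' theorem to a made-up screening test. The numbers (1% prior, 90% true-positive rate, 5% false-positive rate) and the function name posterior are assumptions for this example only; P(B) is expanded with the law of total probability.

```python
def posterior(prior, likelihood, likelihood_if_not):
    """Return P(A|B) given P(A), P(B|A) and P(B|not A)."""
    # P(B) = P(B|A)P(A) + P(B|not A)P(not A)
    evidence = likelihood * prior + likelihood_if_not * (1 - prior)
    return likelihood * prior / evidence

# Hypothetical values: A = "has the condition", B = "test is positive"
p = posterior(prior=0.01, likelihood=0.90, likelihood_if_not=0.05)
print("P(A) before the evidence:", 0.01)
print("P(A|B) after the evidence:", round(p, 4))
```

Even with a fairly accurate test, the posterior stays well below 50% here because the prior is so small; this is the kind of update the theorem formalizes.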
FLOW CHART:


SOURCE CODE:
# pf is P(F), the probability that it is Friday
pf = 0.20
# panf is P(A∩F), the probability that it is Friday and the student is absent
panf = 0.03
# paf is P(A|F) = P(A∩F) / P(F)
paf = panf / pf
# convert to percentage
r = paf * 100
# print the result; round() guards against floating-point error that int() would truncate
print("The probability that a student is absent given that the day is Friday =", round(r), "%")

Output: The probability that a student is absent given that the day is Friday = 15 %

Result: Thus the program to find the probability that a student is absent given that the day is
Friday is executed and the output is verified.

1.b 10% of the patients entering a clinic have liver disease. 5% of the patients are
alcoholic. The probability that a patient is alcoholic given that they have liver disease is 7%.
Find the probability that a patient has liver disease given that they are alcoholic.

AIM: To write a program to find the probability that a patient has liver disease given that the
patient is alcoholic.

SOURCE CODE:

# pl is P(L), the probability of liver disease
pl = 0.1
# pa is P(A), the probability of being alcoholic
pa = 0.05
# pal is P(A|L)
pal = 0.07
# pla is P(L|A) = P(A|L) * P(L) / P(A)
pla = pal * pl / pa
# convert to percentage
r = pla * 100
# print the result; round() guards against floating-point error that int() would truncate
print("The probability of liver disease given that the patient is alcoholic =", round(r), "%")

Output: The probability of liver disease given that the patient is alcoholic = 14 %

Result: Thus the program to find the probability of liver disease given that the patient is
alcoholic is executed and the output is verified.


PROGRAM 2

Extract the data from database using python

2.a. AIM: To write a python program to fetch and display records from the product table.

Required Installations: pip install mysql-connector-python

FLOWCHART:

SOURCE CODE:
BACKEND:
$ sudo mysql
mysql> create user 'USERNAME'@'localhost' identified by 'PASSWORD';
mysql> grant all on *.* to 'USERNAME'@'localhost';
mysql> flush privileges;
mysql> exit
$ mysql -u USERNAME -p
Enter password: PASSWORD
mysql> create database DATABASENAME;
mysql> use DATABASENAME;
mysql> create table product(pcode varchar(20), pname varchar(30));
mysql> insert into product values("p101","A");
mysql> insert into product values("p102","B");
mysql> insert into product values("p103","C");
mysql> exit

FRONT END:

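import mysql.connector

# Connect with the account and database created in the backend step
d = mysql.connector.connect(host="localhost", user="USERNAME", password="PASSWORD",
                            database="DATABASENAME")
print(d)
k = d.cursor()
k.execute("select * from product")
# fetchall() returns every row of the result as a tuple
r = k.fetchall()
print("The records from product table are")
for i in r:
    print(i)
d.close()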

Output:
The records from product table are
('p101', 'A')
('p102', 'B')
('p103', 'C')

Result: Thus the program to fetch and display records from the product table is executed and verified.

2.b. AIM: To write a Python program to fetch and display records from the customer table in
descending order of name.

SOURCE CODE:

BackEnd:

$ mysql -u USERNAME -p
Enter password: PASSWORD
mysql> use DATABASENAME;
mysql> -- the mobile number is stored as text so that a leading zero is kept
mysql> create table customer(cname char(30), caddress char(100), cmobile_no varchar(15));
mysql> insert into customer values("H1","Hyderabad","7123456789");
mysql> insert into customer values("A1","Delhi","0876543210");
mysql> insert into customer values("Z1","Mumbai","965432198");
mysql> exit

FrontEnd:

import mysql.connector

d = mysql.connector.connect(host="localhost", user="USERNAME", password="PASSWORD",
                            database="DATABASENAME")
print(d)
k = d.cursor()
# the table created above is named customer; DESC sorts the names from Z to A
k.execute("select * from customer order by cname desc")
r = k.fetchall()
print("The records from customer table are")
for i in r:
    print(i)
d.close()

Output:
The records from customer table are
('Z1', 'Mumbai', '965432198')
('H1', 'Hyderabad', '7123456789')
('A1', 'Delhi', '0876543210')

Result: Thus the program to fetch and display records from the customer table is executed and
verified.
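
The same connect, cursor, execute, fetchall pattern can be tried without a MySQL server by using Python's built-in sqlite3 module. This is a stand-in for practice, not the MySQL setup of this experiment: the in-memory database, the TEXT column types and the ? placeholders are specific to SQLite, and storing the mobile number as text is an assumption made here so that the leading zero survives.

```python
import sqlite3

# In-memory database: nothing is written to disk and no server is needed
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE customer (cname TEXT, caddress TEXT, cmobile_no TEXT)")

# sqlite3 uses ? placeholders (mysql-connector uses %s)
cur.executemany(
    "INSERT INTO customer VALUES (?, ?, ?)",
    [
        ("H1", "Hyderabad", "7123456789"),
        ("A1", "Delhi", "0876543210"),
        ("Z1", "Mumbai", "965432198"),
    ],
)
conn.commit()

# Fetch the rows sorted by name in descending order
cur.execute("SELECT * FROM customer ORDER BY cname DESC")
rows = cur.fetchall()
print("The records from customer table are")
for row in rows:
    print(row)
conn.close()
```

Because both drivers follow the Python DB-API, only the connect call and the placeholder style differ from the MySQL version.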

2.c. AIM: To write a program to design a GUI using tkinter that reads data and stores it in the database.

Required Installations: sudo apt-get install python3-tk

SOURCE CODE:

BackEnd:

$ mysql -u USERNAME -p
ENTER PASSWORD: PASSWORD
mysql>use DATABASENAME;
mysql>create table cplayer( cname char(100), cruns int);
mysql>exit

FrontEnd:

import tkinter as t
import mysql.connector

def f1():
    # Read the values typed into the two entry boxes
    x = v1.get()
    y = v2.get()
    d = mysql.connector.connect(host = 'localhost', user = 'USERNAME',
                                password = 'PASSWORD', database = 'DATABASENAME')
    c = d.cursor()
    # %s placeholders let the connector escape the values safely
    s = "insert into cplayer(cname,cruns) values(%s,%s)"
    a = (x, int(y))
    c.execute(s, a)
    d.commit()
    d.close()

w = t.Tk()
l1 = t.Label(text = "player name")
l1.place(x = 100, y = 50)
v1 = t.StringVar()
t1 = t.Entry(textvariable = v1)
t1.place(x = 200, y = 50)
l2 = t.Label(text = "player runs")
l2.place(x = 100, y = 100)
v2 = t.StringVar()
t2 = t.Entry(textvariable = v2)
t2.place(x = 200, y = 100)
# f1 runs once each time the submit button is pressed
b = t.Button(text = "submit", command = f1)
b.place(x = 300, y = 200)
w.mainloop()

OUTPUT:

$ mysql -u USERNAME -p
Enter password: PASSWORD
mysql> use DATABASENAME;
mysql> select * from cplayer;

RESULT: Thus the program to design and read values from GUI is executed and the output is
verified.


2.d. AIM: To write a program to design a student registration GUI using tkinter that reads data and
stores it in the database.

SOURCE CODE:

BackEnd:
$ mysql -u USERNAME -p
Enter password: PASSWORD
mysql> use DATABASENAME;
mysql> create table student(sname char(100), rollno varchar(20), gender char(20), year int, branch char(20));
mysql> exit

FrontEnd:

import tkinter as t
import mysql.connector

def f1():
    # Collect the form values and insert them as one row
    x = v1.get()
    y = v2.get()
    p = v3.get()
    q = v4.get()
    r = v5.get()
    d = mysql.connector.connect(host = "localhost", user = "USERNAME",
                                password = "PASSWORD", database = "DATABASENAME")
    c = d.cursor()
    z = "insert into student(sname,rollno,gender,year,branch) values(%s,%s,%s,%s,%s)"
    a = (x, y, p, int(q), r)
    c.execute(z, a)
    d.commit()
    d.close()

w = t.Tk()
w.title("Student Registration Form")
l1 = t.Label(text = "Name")
l1.place(x = 100, y = 50)
v1 = t.StringVar()
t1 = t.Entry(textvariable = v1)
t1.place(x = 200, y = 50)
l2 = t.Label(text = "Roll No.")
l2.place(x = 100, y = 100)
v2 = t.StringVar()
t2 = t.Entry(textvariable = v2)
t2.place(x = 200, y = 100)
v3 = t.StringVar()
l3 = t.Label(text = "Gender")
l3.place(x = 100, y = 150)
# The radio buttons only set v3; the row is inserted when Submit is pressed
r1 = t.Radiobutton(w, text = "male", variable = v3, value = "male")
r1.place(x = 200, y = 150)
r2 = t.Radiobutton(w, text = "female", variable = v3, value = "female")
r2.place(x = 300, y = 150)
l4 = t.Label(text = "year")
l4.place(x = 100, y = 200)
# StringVar, because the placeholder text is not an integer
v4 = t.StringVar()
v4.set("Select your year")
drop = t.OptionMenu(w, v4, "1", "2", "3", "4")
drop.place(x = 200, y = 200)
l5 = t.Label(w, text = "Branch")
l5.place(x = 100, y = 300)
v5 = t.StringVar()
v5.set("Select your Branch")
drop = t.OptionMenu(w, v5, "CSE", "AIML", "DS", "ECE")
drop.place(x = 200, y = 300)
b = t.Button(text = "Submit", command = f1)
b.place(x = 300, y = 400)
w.mainloop()

OUTPUT:


mysql> use DATABASENAME;
mysql> select * from student;

RESULT: Thus the program to design and read values from GUI is executed and output is
verified.


PROGRAM 3

3. Implement linear regression using python

Aim: To write a program to predict the price of a house using simple linear regression algorithm.

Theory:

Regression:
A regression is a statistical technique that relates a dependent variable to one or more
independent (explanatory) variables. A regression model is able to show whether changes observed
in the dependent variable are associated with changes in one or more of the explanatory variables.
Linear Regression:
Linear regression is probably one of the most important and widely used regression
techniques. It's among the simplest regression methods. One of its main advantages is the ease of
interpreting results. When implementing linear regression of some dependent variable $y$ on the set
of independent variables $x = (x_1, \ldots, x_r)$, where $r$ is the number of predictors, you assume a
linear relationship between $y$ and $x$: $y = \beta_0 + \beta_1 x_1 + \cdots + \beta_r x_r + \varepsilon$.
This equation is the regression equation. $\beta_0, \beta_1, \ldots, \beta_r$ are the regression
coefficients, and $\varepsilon$ is the random error.
Linear regression calculates the estimators of the regression coefficients, or simply the
predicted weights, denoted $b_0, b_1, \ldots, b_r$. They define the estimated regression function
$f(x) = b_0 + b_1 x_1 + \cdots + b_r x_r$. This function should capture the dependencies between the
inputs and output sufficiently well.
Simple Linear Regression
The following figure illustrates simple linear regression:

When implementing simple linear regression, you typically start with a given set of input-output
($x$-$y$) pairs (green circles). These pairs are your observations. For example, the leftmost
observation (green circle) has the input $x = 5$ and the actual output (response) $y = 5$. The next
one has $x = 15$ and $y = 20$, and so on.
The estimated regression function (black line) has the equation $f(x) = b_0 + b_1 x$. Your goal is to
calculate the optimal values of the predicted weights $b_0$ and $b_1$ that minimize the sum of squared
residuals (SSR) and determine the estimated regression function. The value of $b_0$, also called the
intercept, shows the point where the estimated regression line crosses the $y$ axis. It is the value
of the estimated response $f(x)$ for $x = 0$. The value of $b_1$ determines the slope of the estimated
regression line.
The predicted responses (red squares) are the points on the regression line that correspond to
the input values. For example, for the input $x = 5$, the predicted response is $f(5) = 8.33$
(represented with the leftmost red square).
The residuals (vertical dashed gray lines) can be calculated as
$y_i - f(x_i) = y_i - b_0 - b_1 x_i$ for $i = 1, \ldots, n$, where $n$ is the number of observations.
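
The weights that minimize SSR have a closed form for simple linear regression: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below applies it to six observations. Only the first two pairs, (5, 5) and (15, 20), come from the text; the remaining four are assumed values chosen for illustration, and with them the fit gives a predicted response of about 8.33 at x = 5, in line with the figure description.

```python
xs = [5, 15, 25, 35, 45, 55]
ys = [5, 20, 14, 32, 22, 38]   # first two pairs from the text, the rest assumed

n = len(xs)
x_mean = sum(xs) / n
y_mean = sum(ys) / n

# Closed-form least-squares estimates for f(x) = b0 + b1 * x
b1 = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
      / sum((x - x_mean) ** 2 for x in xs))
b0 = y_mean - b1 * x_mean

# Residuals y_i - f(x_i) and their sum of squares (SSR)
residuals = [y - (b0 + b1 * x) for x, y in zip(xs, ys)]
ssr = sum(r ** 2 for r in residuals)

print("intercept b0 =", round(b0, 2))
print("slope b1 =", round(b1, 2))
print("f(5) =", round(b0 + b1 * 5, 2))
print("SSR =", round(ssr, 2))
```

scikit-learn's LinearRegression solves the same least-squares problem; writing it out by hand only makes the role of b0, b1 and the residuals explicit.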
Multiple Linear Regression:
Multiple linear regression is used to estimate the relationship between two or more independent
variables and one dependent variable.
The formula for a multiple linear regression is:

$$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_r x_r + \varepsilon$$

o $y$ = the predicted value of the dependent variable
o $\beta_0$ = the y-intercept (value of $y$ when all other parameters are set to 0)
o $\beta_1 x_1$ = the regression coefficient ($\beta_1$) of the first independent variable ($x_1$)
(a.k.a. the effect that increasing the value of the independent variable has on the predicted y
value)
o $\cdots$ = do the same for however many independent variables you are testing
o $\beta_r x_r$ = the regression coefficient of the last independent variable
o $\varepsilon$ = model error (a.k.a. how much variation there is in our estimate of $y$)

To find the best-fit line for each independent variable, multiple linear regression calculates three
things:

o The regression coefficients that lead to the smallest overall model error.
o The t statistic of the overall model.
o The associated p value (how likely it is that the t statistic would have occurred by chance
if the null hypothesis of no relationship between the independent and dependent variables
was true).
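
The first of these three quantities, the coefficients with the smallest squared error, can be obtained by solving the normal equations. The sketch below does this in plain Python for two predictors; it does not compute the t statistic or the p value. The data are hypothetical, generated from y = 1 + 2*x1 + 3*x2 without noise, so the recovered coefficients should be close to 1, 2 and 3. The helper names solve and fit are introduced here for illustration.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]   # augmented matrix
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot row
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit(rows, ys):
    """Least-squares coefficients [b0, b1, ..., br] from the normal equations."""
    design = [[1.0] + list(r) for r in rows]          # leading 1 for the intercept
    k = len(design[0])
    xtx = [[sum(d[i] * d[j] for d in design) for j in range(k)] for i in range(k)]
    xty = [sum(d[i] * y for d, y in zip(design, ys)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data generated from y = 1 + 2*x1 + 3*x2 with no noise
rows = [(1, 1), (2, 1), (3, 2), (4, 3), (5, 5), (6, 4)]
ys = [1 + 2 * x1 + 3 * x2 for x1, x2 in rows]
b0, b1, b2 = fit(rows, ys)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

For real data, a statistics package such as statsmodels would normally be used, since it also reports the test statistics and p values mentioned above.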

Python Packages for Linear Regression


The package NumPy is a fundamental Python scientific package that allows many high-performance
operations on single and multi-dimensional arrays. It also offers many mathematical routines.
It's open source.
The package scikit-learn is a widely used Python library for machine learning, built on top of
NumPy and some other packages. It provides the means for preprocessing data, reducing
dimensionality, implementing regression, classification, clustering, and more. Like NumPy,
scikit-learn is also open source.
To implement linear regression with functionality beyond the scope of scikit-learn,
statsmodels is used. It's a powerful Python package for the estimation of statistical models,
performing tests, and more. It's open source as well.

Simple Linear Regression With scikit-learn


There are five basic steps involved in the implementation of linear regression:
1. Import the packages and classes needed.
2. Provide data to work with and eventually do appropriate transformations.
3. Create a regression model and fit it with existing data.
4. Check the results of model fitting to know whether the model is satisfactory.
5. Apply the model for predictions.
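
Step 4 is usually carried out with error metrics computed on held-out data. As a rough sketch of what such metrics measure, the function below computes the mean absolute error, mean squared error, root mean squared error and the coefficient of determination (R²) by hand; the price values are hypothetical and the function name regression_metrics is chosen here for illustration.

```python
import math

def regression_metrics(actual, predicted):
    """Return (MAE, MSE, RMSE, R^2) for two equally long lists of numbers."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(e) for e in errors) / n          # average size of the error
    mse = sum(e ** 2 for e in errors) / n          # average squared error
    rmse = math.sqrt(mse)                          # same unit as the target
    mean_actual = sum(actual) / n
    ss_res = sum(e ** 2 for e in errors)                     # unexplained variation
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)     # total variation
    r2 = 1 - ss_res / ss_tot
    return mae, mse, rmse, r2

# Hypothetical actual and predicted prices
actual = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]
mae, mse, rmse, r2 = regression_metrics(actual, predicted)
print("MAE =", mae, "MSE =", mse, "RMSE =", rmse, "R^2 =", round(r2, 3))
```

An R² close to 1 means the model explains most of the variation in the target; it is a goodness-of-fit measure rather than a classification accuracy.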

FLOWCHART:

$ gedit house10.csv

Source Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as pt
from sklearn import linear_model
from sklearn import model_selection
from sklearn import metrics
df=pd.read_csv("house10.csv")
print(df)
x=df['area'].values.reshape(-1,1)
y=df['price'].values.reshape(-1,1)
xtrain,xtest,ytrain,ytest=model_selection.train_test_split(x,y,test_size=0.3)
train=pd.DataFrame({"area":xtrain.squeeze(),"price":ytrain.squeeze()})

print("the records selected by method from data set are")


print(train)
test=pd.DataFrame({"area":xtest.squeeze(),"price":ytest.squeeze()})
print("The records selected by method from data set for testing are")
print(test)
slr=linear_model.LinearRegression()
slr.fit(xtrain,ytrain)
ypredict=slr.predict(xtest)
df1=pd.DataFrame({"actual price from the dataset":ytest.squeeze(),"predicted price by algorithm":ypredict.squeeze()})
print(df1)
print("enter the area of the house")
x=input()
a=int(x)
pp=slr.predict([[a]])
print("price of the house= ",pp)
y=slr.score(xtest,ytest)  # score() returns the coefficient of determination (R^2), not a classification accuracy
acc=int(round(y,2)*100)
print("accuracy= ",acc,"%")
m=slr.coef_
c=slr.intercept_
print("house price= ", m, " *area+", c)
pt.xlabel("area")
pt.ylabel("price")
pt.plot(df.area,df.price)
pt.show()
mae=metrics.mean_absolute_error(ytest,ypredict)
mse=metrics.mean_squared_error(ytest,ypredict)
rmse=np.sqrt(mse)
print("Mean Absolute Error= ",mae)
print("Mean Squared Error= ",mse)
print("Root Mean Squared Error= ",rmse)

OUTPUT:

area price
0 500 1000000
1 525 1200000
2 540 1400000
3 600 1700000
4 635 1870000
5 670 1900000
6 800 2900000
7 900 3200000
8 1000 3600000
9 1050 3800000
10 1100 4100000
11 1200 4500000
12 1250 4600000
13 1300 5000000

14 1325 5100000
15 1350 5200000
16 1400 5300000
17 1500 5450000
18 1550 5600000
19 1600 5700000

The records selected by method from data set are


area price
0 1400 5300000
1 900 3200000
2 600 1700000
3 1050 3800000
4 670 1900000
5 525 1200000
6 1325 5100000
7 1600 5700000
8 1550 5600000
9 1350 5200000
10 1300 5000000
11 1500 5450000
12 1100 4100000
13 635 1870000

The records selected by method from data set for testing are
area price
0 800 2900000
1 1250 4600000
2 1000 3600000
3 540 1400000
4 500 1000000
5 1200 4500000
Actual price from the dataset Predicted price by algorithm
0 2900000 2.608761e+06
1 4600000 4.552735e+06
2 3600000 3.472749e+06
3 1400000 1.485575e+06
4 1000000 1.312778e+06
5 4500000 4.336738e+06
Enter the area of the house
1575
Price of the house = [[5956715.9877182]]
Accuracy = 98 %
House price= [[4359.96269557]]*area+ [-847193.4896212]
Mean Absolute Error: 213056.0686386718
Mean Squared Error: 56296467155.93354

RESULT: Thus the program to predict the price of a house using simple linear regression
algorithm is executed and output is verified.

3.b. Aim: To write a program to predict the sales using multiple regression algorithm.

Source Code:

Data Set:
$ gedit advertising.csv

Python Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as pt
from sklearn import linear_model
from sklearn import model_selection
from sklearn import metrics
import math
df=pd.read_csv("advertising.csv")
print(df)
pt.xlabel("TV RADIO NEWSPAPER")
pt.ylabel("SALES")
pt.plot(df.tv, df.sales, df.radio, df.sales, df.newspaper, df.sales)
pt.show()
x1=df["tv"].values.reshape(-1,1)
x2=df["radio"].values.reshape(-1,1)
x3=df["newspaper"].values.reshape(-1,1)
y=df["sales"].values.reshape(-1,1)

x1train,x1test,x2train,x2test,x3train,x3test,ytrain,ytest=model_selection.train_test_split(x1,x2,x3
,y,test_size=0.3)
train=pd.DataFrame({"tv":x1train.squeeze(),"radio":x2train.squeeze(), "newspaper":
x3train.squeeze(), "sales" : ytrain.squeeze()})
print("The records selected for training from data set by the method are")
print(train)
test=pd.DataFrame({"tv":x1test.squeeze(),"radio":x2test.squeeze(), "newspaper":
x3test.squeeze(), "sales" : ytest.squeeze() })
print("the records selected by method from data set for testing are")
print(test)
xtrain = train[["tv", "radio", "newspaper"]]
ytrain = train['sales']
mlr=linear_model.LinearRegression()
mlr.fit(xtrain,ytrain)
xtest = test[["tv", "radio", "newspaper"]]
ypredict=mlr.predict(xtest)
dfd=pd.DataFrame({"actual sales from the dataset":ytest.squeeze(),"predicted sales by algorithm":ypredict.squeeze()})
print(dfd)
print("enter tv, radio, newspaper values to predict the sales")
x=input()
y = input()
z = input()
t = float(x)
r = float(y)
n = float(z)  # renamed from np so the NumPy alias is not shadowed
data = { 'tv' : [t], 'radio' : [r], 'newspaper' : [n]}
k = pd.DataFrame(data)
ps = mlr.predict(k[["tv", "radio", "newspaper"]])
print("Predicted Sales = ", ps)
m = mlr.coef_
c = mlr.intercept_
print("Sales = ", m[0], "*tv+", m[1], "*radio+", m[2], "*newspaper+", c)
acc = mlr.score(xtest, ytest)  # score() returns R^2 for a regressor, not a classification accuracy
acc = int(round(acc,2)*100)
print("accuracy", acc, "%")
mae=metrics.mean_absolute_error(ytest,ypredict)
mse=metrics.mean_squared_error(ytest,ypredict)
rmse=math.pow(mse,0.5)
print("mean absolute error= ",mae)
print("mean squared error= ",mse)
print("root mean squared error= ",rmse)

OUTPUT:

tv radio newspaper sales


0 230.1 37.8 69.2 22.1
1 44.5 39.3 45.1 10.4
2 17.2 45.9 69.3 9.3
3 151.5 41.3 58.5 18.5
4 180.8 10.8 58.4 12.9

5 8.7 48.9 75.0 7.2


6 57.5 32.8 23.5 11.8
7 120.2 19.6 11.6 13.2
8 8.6 2.1 1.0 4.8
9 199.8 2.6 29.2 10.6

The records selected for training from data set by the method are
tv radio newspaper sales
0 120.2 19.6 11.6 13.2
1 180.8 10.8 58.4 12.9
2 199.8 2.6 29.2 10.6
3 8.6 2.1 1.0 4.8
4 151.5 41.3 58.5 18.5
5 8.7 48.9 75.0 7.2
6 57.5 32.8 23.5 11.8

The records selected by method from data set for testing are
tv radio newspaper sales
0 44.5 39.3 45.1 10.4
1 230.1 37.8 69.2 22.1
2 17.2 45.9 69.3 9.3

Actual sales from the dataset Predicted sales by algorithm


0 10.4 10.956942
1 22.1 19.801604
2 9.3 8.949839

Enter tv, radio, newspaper values to predict the sales


8.6
2.1
1.0
Predicted Sales = [3.65574863]
Sales = 0.059761154874772834 *tv+ 0.23200795422356396 *radio+ -0.07879654355949373
*newspaper+ 2.733382533044141
accuracy 94 %
mean absolute error= 1.0684997530941345
mean squared error= 1.9051403848423896
root mean squared error= 1.3802682293099373

RESULT: Thus the program to predict the sales using multiple linear regression algorithm is
executed and output is verified.

PROGRAM 4

4.a. The following training examples map descriptions of individuals onto high, medium
and low credit-worthiness. Input attributes are (from left to right) income, recreation,
job, status, age-group, home-owner. Find the unconditional probability of 'golf' and the
conditional probability of 'single' given 'medRisk' in the dataset.

Income recreation job status age-group home-owner.


medium skiing design single twenties no -> highRisk
high golf trading married forties yes -> lowRisk
low speedway transport married thirties yes -> medRisk
medium football banking single thirties yes -> lowRisk
high flying media married fifties yes -> highRisk
low football security single twenties no -> medRisk
medium golf media single thirties yes -> medRisk
medium golf transport married forties yes -> lowRisk
high skiing banking single thirties yes -> highRisk
low golf unemployed married forties yes -> highRisk

Aim: To write a program to find the unconditional probability of 'golf' and the conditional
probability of 'single' given 'medRisk' in the dataset.

Theory:
Conditional probability is defined as the likelihood of an event or outcome occurring,
given that another event or outcome has already occurred. It is calculated by dividing the
probability that both events occur by the probability of the conditioning event.

Unconditional probability refers to the likelihood that an event will take place irrespective
of whether any other events have taken place or any other conditions are present.

P(B|A) = P(A∩B) / P(A)


where P = probability, A = event A, B = event B
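
As a worked illustration of the formula, the sketch below counts both probabilities directly from the ten training examples listed in the problem statement. Only the recreation, status and risk attributes are kept; the tuple layout and the variable names are illustrative assumptions.

```python
# (recreation, status, risk) for the ten training examples in the problem statement
records = [
    ("skiing", "single", "highRisk"),
    ("golf", "married", "lowRisk"),
    ("speedway", "married", "medRisk"),
    ("football", "single", "lowRisk"),
    ("flying", "married", "highRisk"),
    ("football", "single", "medRisk"),
    ("golf", "single", "medRisk"),
    ("golf", "married", "lowRisk"),
    ("skiing", "single", "highRisk"),
    ("golf", "married", "highRisk"),
]

n = len(records)

# Unconditional probability: P(golf) = count(golf) / total
p_golf = sum(1 for recreation, _, _ in records if recreation == "golf") / n

# Conditional probability: P(single | medRisk) = count(single and medRisk) / count(medRisk)
med_risk = [r for r in records if r[2] == "medRisk"]
p_single_given_med = sum(1 for r in med_risk if r[1] == "single") / len(med_risk)

print("P(golf) =", p_golf)                          # 4 of 10 records
print("P(single | medRisk) =", p_single_given_med)  # 2 of 3 medRisk records
```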

Flow Chart:

Dataset:
$ gedit info.csv

income,recreation,job,status,age-group,homeowner,risk
medium,skiing,design,single,twenties,no,highrisk
high,golf,trading,married,forties,yes,lowrisk
low,speedway,transport,married,thirties,yes,medrisk
medium,football,banking,single,thirties,yes,lowrisk
high,flying,media,married,fifties,yes,highrisk
low,football,security,single,twenties,no,medrisk
medium,golf,media,single,thirties,yes,medrisk
medium,golf,transport,married,forties,yes,lowrisk
high,skiing,banking,single,thirties,yes,highrisk
low,golf,unemployed,married,forties,yes,highrisk

Source Code:

import pandas as pd
df = pd.read_csv("info.csv")
print(df)
print("Enter attribute value to match for recreation")
a = input()
gc = df.recreation.value_counts()[a]
n = len(df.index)

upg = gc/n
print("Unconditional probability of", a, "= ", upg)
print("Enter attribute value to match for risk")
b = input()
mrc = df.risk.value_counts()[b]
print("Enter attribute value to match for status")
c = input()
# exact match on both columns (str.contains would also match substrings)
q = (df['status'] == c) & (df['risk'] == b)
smrc = df[q].shape[0]
psmr = smrc/mrc
print("Conditional Probability of ",c , "given", b, "=", psmr)

Output:

Result: Thus the program to find conditional and unconditional probability of attributes using
pandas has been executed and output is verified.

4.b.
Aim: Write a program to find conditional and unconditional probability of given attributes.

Program:
total_records = 10
NumGolfRecords =4
UnconditionalProbGolf = NumGolfRecords/total_records
print("Unconditional probability of golf: = ", UnconditionalProbGolf)
#Conditional probability of 'single' given 'medrisk'
NumMedRiskSingle = 2
NumMedRisk = 3
ProbMedRiskSingle = NumMedRiskSingle/total_records
ProbMedRisk = NumMedRisk/total_records
ConditionalProb = ProbMedRiskSingle/ProbMedRisk
print("Conditional probability of single given medrisk = " , ConditionalProb)

Output:
Unconditional probability of golf: = 0.4
Conditional probability of single given medrisk = 0.66666666666667
=0.67
RESULT: Thus the program to find conditional and unconditional probability of attributes has
been executed and output is verified.

PROGRAM 5

5.a. KNN (K Nearest Neighbours Algorithm) as Classifier:

AIM: To write a program to classify a person as overweight, underweight, normal weight using
python

THEORY:
o K-NN algorithm assumes similarity between the new case/data and the available cases and
puts the new case into the category that is most similar to it.
o K-NN algorithm can be used for Regression as well as for Classification, but it is mostly
used for Classification problems.

The working of K-NN can be explained as follows:

1. Select the number K of the neighbors.
2. Calculate the Euclidean distance from the new data point to each available data point.
3. Take the K nearest neighbors as per the calculated Euclidean distance.
4. Among these K neighbors, count the number of the data points in each category.
5. Assign the new data point to the category for which the number of neighbors is maximum.
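
The five steps above can be sketched in plain Python as follows. This is a minimal illustration, not the scikit-learn implementation used in the program; the function name knn_predict is an assumption, and the six sample points are the first rows of weight.csv with their 0/1/2 target codes.

```python
import math
from collections import Counter

def knn_predict(train_points, train_labels, query, k=3):
    """Classify query by majority vote among its k nearest training points."""
    # Step 2: Euclidean distance from the query to every training point
    distances = [(math.dist(p, query), label)
                 for p, label in zip(train_points, train_labels)]
    # Step 3: keep the k nearest neighbours
    nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Steps 4 and 5: count the labels and return the most common one
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# (height, weight) rows and targets copied from the start of weight.csv
points = [(137, 35), (137, 48), (137, 80), (140, 20), (140, 50), (140, 100)]
labels = [0, 1, 2, 0, 1, 2]

# Step 1: choose K, then classify a new (height, weight) pair
print(knn_predict(points, labels, (139, 90), k=3))  # prints 2
```

Because the vote is decided by raw distances, a feature with a much larger numeric range dominates the result; scaling the features first is a common design choice, although the lab program does not do so.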

FLOWCHART:

SOURCE CODE:

Dataset:
$ gedit weight.csv
height,weight,target
137,35,0
137,48,1
137,80,2
140,20,0
140,50,1
140,100,2
141,15,0
141,52,1
141,200,2
143,56,1
143,30,0
143,99,2
145,59,1
145,110,2
145,35,0
146,62,1
146,47,0
146,47,0
146,88,2
148,65,1
148,100,2
148,25,0
150,68,1
150,46,0
150,86,2
155,72,1
155,56,0
155,88,2
161,78,1
161,565,2
161,100,2
170,85,1
170,50,0
170,100,2
174,88,1
174,66,0
174,120,2
182,93,1
182,60,0
182,110,2
187,96,1
187,70,0
187,125,2
193,98,1
193,65,0
193,100,2

Python Code:

import pandas as pd
import matplotlib.pyplot as pt
import seaborn as sb
from sklearn import model_selection
from sklearn.metrics import confusion_matrix
from sklearn.neighbors import KNeighborsClassifier
#Data Visualization
df = pd.read_csv("weight.csv")
print(df)
pt.xlabel("height")
pt.ylabel("weight")
df1 = df[df.target == 0]
df2 = df[df.target == 1]
df3 = df[df.target == 2]
#Scatter Diagram
pt.scatter(df1["height"], df1["weight"], color = "red", marker = "+")
pt.scatter(df2["height"], df2["weight"], color = "green", marker = "*")
pt.scatter(df3["height"], df3["weight"], color = "black", marker = ".")
pt.show()
#Experience
x = df.drop(["target"], axis = "columns")
y = df["target"]
xtrain,xtest,ytrain,ytest = model_selection.train_test_split(x,y,test_size = 0.2, random_state = 1)
print(xtrain)
print(ytrain)
knn =KNeighborsClassifier(n_neighbors = 5)
knn.fit(xtrain,ytrain)
#Task
ypredict = knn.predict(xtest)
cm = confusion_matrix(ytest,ypredict)
print("confusion matrix = ",cm)
pt.figure(figsize = (10,5))
sb.heatmap(cm, annot = True)
pt.xlabel("Predicted Value")
pt.ylabel("Actual value from Dataset")
pt.show()
#End User Input
print("Enter Height and Weight")
h = int(input())
w = int(input())
data = {"height" : [h], "weight" : [w]}
k = pd.DataFrame(data)
pred = knn.predict(k[["height","weight"]])  # renamed from pt so the pyplot alias is not shadowed
print("predicted target = ", pred)
#Performance
acc = knn.score(xtest,ytest)

acc = int(round(acc,2)*100)
print("accuracy = ", acc, "%")

Output:

weight.csv
43 records
height weight
33 174 66
43 193 100
32 174 88
23 155 72
17 146 88
31 170 100
29 170 85
36 182 60
40 187 125
4 140 50
24 155 56
14 145 35
10 143 30
39 187 70
26 161 78
27 161 565
38 187 96
20 148 25
18 148 65
25 155 88
6 141 15
28 161 100
13 145 110
7 141 52
42 193 65
1 137 48
16 146 47
0 140 35
15 143 62
5 140 100
11 143 99
9 143 56
8 141 200
12 145 59
37 182 110
33 0
43 2
32 1
23 1
17 2
31 2
29 1

36 0
40 2
4 1
24 0
14 0
10 0
39 0
26 1
27 2
38 1
20 0
18 1
25 2
6 0
28 2
13 2
7 1
42 0
1 1
16 0
0 0
15 1
5 2
11 2
9 1
8 2
12 1
37 2

Name: target, dtype: int64


Confusion matrix = [[3 0 0]
[0 2 1]
[0 0 3]]
Enter height weight
155
90
Predicted target = [2]
Accuracy = 89%

Result: Thus the program to classify a person as overweight, underweight, normal weight has
been executed and output is verified.

5.b) KNN as Regressor:


Aim: To write a program to find the price for a given area of a house using KNN as regressor.

Source Code:

$ gedit house.csv
area,price
500,1000000
525,1210000
540,1400000
600,1700000
635,1870000
670,1900000
800,2900000
900,3200000
1000,3600000

1050,3800000
1100,nan
1200,4500000
1250,4600000
1300,5000000
1325,5100000
1350,5200000
1400,5300000
1500,5450000
1550,5600000
1600,5700000
1700,6000000
1725,nan
1800,6500000
1825,6700000
1877,7200000
1900,7000000
1960,7700000
2000,8500000
2100,8600000
2200,8500000
2500,9000000
2600,8700000
2700,8300000

Python Code:

# import Modules
import pandas as pd
import matplotlib.pyplot as pt
from sklearn import model_selection
from sklearn.neighbors import KNeighborsRegressor
#Data visualization
df = pd.read_csv("house.csv")
print(df)
pt.xlabel("area")
pt.ylabel("price")
pt.plot(df.area,df.price)
pt.show()
#Data preprocessing
print("Missing value information")
print(df.isna().sum())
# fill the first missing price with the column mean, then forward-fill the remaining one
df["price"] = df["price"].fillna(df["price"].mean(), limit=1)
df["price"] = df["price"].ffill()
print(df)
#Data visualization after preprocessing
pt.xlabel("area")
pt.ylabel ("price")
pt.plot(df.area,df.price)
pt.show()

#E Experience
x=df.drop(["price"], axis="columns")
y = df.drop(["area"], axis= "columns")
xtrain,xtest,ytrain,ytest= model_selection.train_test_split(x,y,test_size=0.25,random_state=1)
print("The training data")
train = pd.DataFrame({"area":xtrain.squeeze(),"price": ytrain.squeeze()})
print(train)
knn = KNeighborsRegressor(n_neighbors = 7)
knn.fit(xtrain,ytrain)
# T Task
print("The test data is")
test=pd.DataFrame({"area": xtest.squeeze(), "price": ytest.squeeze()})
print(test)
ypredict = knn.predict(xtest)
print("comparison")
yp = pd.DataFrame ({ "y predict": ypredict.squeeze(), "ytest":ytest.squeeze()})
print(yp)
print("enter area of the house")
a = int(input ())
data = {"area": [a]}
k= pd.DataFrame(data)
pp = knn.predict(k[["area"]])
print("predicted price ", pp)
#P performance
acc = knn.score(xtest, ytest)  # score() returns R^2 for a regressor, not a classification accuracy
acc=int(round (acc, 2) * 100)
print ("accuracy = ", acc, "%")

Output:

area price
0 500 1000000.0
1 525 1210000.0
2 540 1400000.0
3 600 1700000.0
4 635 1870000.0
5 670 1900000.0
6 800 2900000.0
7 900 3200000.0
8 1000 3600000.0
9 1050 3800000.0
10 1100 NaN
11 1200 4500000.0
12 1250 4600000.0
13 1300 5000000.0
14 1325 5100000.0
15 1350 5200000.0
16 1400 5300000.0
17 1500 5450000.0
18 1550 5600000.0

19 1600 5700000.0
20 1700 6000000.0
21 1725 NaN
22 1800 6500000.0
23 1825 6700000.0
24 1877 7200000.0
25 1900 7000000.0
26 1960 7700000.0
27 2000 8500000.0
28 2100 8600000.0
29 2200 8500000.0
30 2500 9000000.0
31 2600 8700000.0
32 2700 8300000.0

Missing value information


area 0
price 2
dtype: int64
area price
0 500 1.000000e+06
1 525 1.210000e+06
2 540 1.400000e+06
3 600 1.700000e+06
4 635 1.870000e+06
5 670 1.900000e+06
6 800 2.900000e+06
7 900 3.200000e+06
8 1000 3.600000e+06
9 1050 3.800000e+06
10 1100 5.217097e+06
11 1200 4.500000e+06
12 1250 4.600000e+06
13 1300 5.000000e+06
14 1325 5.100000e+06
15 1350 5.200000e+06
16 1400 5.300000e+06
17 1500 5.450000e+06
18 1550 5.600000e+06
19 1600 5.700000e+06
20 1700 6.000000e+06
21 1725 6.000000e+06
22 1800 6.500000e+06
23 1825 6.700000e+06
24 1877 7.200000e+06
25 1900 7.000000e+06
26 1960 7.700000e+06
27 2000 8.500000e+06
28 2100 8.600000e+06
29 2200 8.500000e+06

30 2500 9.000000e+06
31 2600 8.700000e+06
32 2700 8.300000e+06

The training data


area price
30 2500 9.000000e+06
17 1500 5.450000e+06
22 1800 6.500000e+06
4 635 1.870000e+06
2 540 1.400000e+06
21 1725 6.000000e+06
23 1825 6.700000e+06
10 1100 5.217097e+06
29 2200 8.500000e+06
28 2100 8.600000e+06
18 1550 5.600000e+06
6 800 2.900000e+06
13 1300 5.000000e+06
7 900 3.200000e+06
32 2700 8.300000e+06
1 525 1.210000e+06
16 1400 5.300000e+06
0 500 1.000000e+06
15 1350 5.200000e+06
5 670 1.900000e+06
11 1200 4.500000e+06
9 1050 3.800000e+06
8 1000 3.600000e+06
12 1250 4.600000e+06

The test data is


area price
14 1325 5100000.0
19 1600 5700000.0
3 600 1700000.0
27 2000 8500000.0
31 2600 8700000.0
26 1960 7700000.0
20 1700 6000000.0
25 1900 7000000.0
24 1877 7200000.0

comparison
y predict ytest
14 5.092857e+06 5100000.0
19 5.821429e+06 5700000.0
3 1.925714e+06 1700000.0
27 6.764286e+06 8500000.0
31 7.657143e+06 8700000.0

26 6.764286e+06 7700000.0
20 5.821429e+06 6000000.0
25 6.764286e+06 7000000.0
24 6.764286e+06 7200000.0

enter area of the house


2345
predicted price [[7657142.85714286]]
accuracy = 86 %

Result: Thus the program to find the price for a given area of a house using KNN as regressor
has been executed and output is verified.

PROGRAM 6
6. Given the following data, which specify classifications for nine combinations of VAR1 and
VAR2, predict a classification for a case where VAR1=0.906 and VAR2=0.606, using the result
of k-means clustering with 3 means (i.e., 3 centroids).

Aim: To implement KMeans Clustering algorithm using python

Dataset:
$ gedit var12.csv
VAR1,VAR2
1.713,1.586
0.18,1.786
0.353,1.24
0.94,1.566
1.486,0.759
1.266,1.106
1.54,0.419
0.459,1.799
0.773,0.186

SOURCE CODE:
import matplotlib.pyplot as pt
import numpy as np
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

#data preprocessing
df = pd.read_csv("var12.csv")
print(df)
s = StandardScaler()
t = df.iloc[:, [0, 1]]
x = s.fit_transform(t)
df["VAR1"] = x[:,0]
df["VAR2"] = x[:,1]
print("dataset after scaling")
print(df)
#data visualization
inc = df.iloc[:,0]
ss = df.iloc[:,1]
pt.title("VAR1 vs VAR2 scatter diagram")
pt.xlabel("VAR1")
pt.ylabel("VAR2")
pt.scatter(inc, ss)
pt.show()
# find the number of clusters using the elbow method
wcss = []
k = []
y = df.iloc[:, [0, 1]]
for i in range(2,10):
    km = KMeans(n_clusters=i)
    km.fit(y)
    wcss.append(km.inertia_)
    k.append(i)
pt.title("number of clusters vs wcss line graph")
pt.xlabel("k")
pt.ylabel("wcss")
pt.plot(k, wcss)
pt.show()
#fit the data using kmeans algorithm
km = KMeans(n_clusters=3)
km.fit(y)
# To predict the clusters
dc = km.predict(y)
print(dc)
df["cluster"] = dc
print("the dataset after the cluster assignment for each example")
print(df)
print("the centroids of the clusters")
cen = km.cluster_centers_
print(cen)
c1 = df[df.cluster == 0]
c2 = df[df.cluster == 1]
c3 = df[df.cluster == 2]
pt.xlabel("VAR1")
pt.ylabel("VAR2")
pt.scatter(c1.VAR1, c1.VAR2, color="red", label="cluster 1")
pt.scatter(c2.VAR1, c2.VAR2, color="blue", label="cluster 2")
pt.scatter(c3.VAR1, c3.VAR2, color="green", label="cluster 3")
pt.scatter(cen[:,0], cen[:,1], color="black", label="centroid")
pt.legend()
pt.show()
print("enter the var1 and var2")
# note: km was fitted on standardized values, so the values entered here are used on that scale as-is
inc = float(input())
ss = float(input())
data = {"VAR1": [inc], "VAR2": [ss]}
k = pd.DataFrame(data)
pc = km.predict(k[["VAR1", "VAR2"]])
print("The cluster for given input = ", pc)

the dataset after the cluster assignment for each example


VAR1 VAR2 cluster
0 1.403778 0.760407 0
1 -1.483941 1.118058 2
2 -1.158060 0.141670 2
3 -0.052325 0.724642 0
4 0.976178 -0.718482 1
5 0.561763 -0.097957 0
6 1.077898 -1.326490 1
7 -0.958387 1.141306 2
8 -0.366904 -1.743154 1

the centroids of the clusters


[[ 0.63773861 0.46236393]
[ 0.56239043 -1.26270855]
[-1.20012903 0.80034462]]

enter the var1 and var2


0.606
0.543
The cluster for given input = [0]

Result: Thus the program to implement KMeans Clustering algorithm using python has been
executed and the output is verified.

PROGRAM 7
TEXT CLASSIFICATION:

Aim: To write a program to classify text using naïve bayes classifier.

Theory:
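The Naïve Bayes classifier is a probabilistic classifier: it predicts the class that has the
highest posterior probability given the features of an object, assuming the features are
independent of each other given the class.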

Working of Naïve Bayes' Classifier is as follows:


1. Convert the given dataset into frequency tables.
2. Generate Likelihood table by finding the probabilities of given features.
3. Now, use Bayes theorem to calculate the posterior probability.
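
The three steps above can be sketched in plain Python for word counts. This is a minimal illustration under stated assumptions: the function names train_nb and predict_nb are hypothetical, the four sentences are made-up toy data rather than rows of sample1.csv, and add-one (Laplace) smoothing is assumed so that an unseen word does not zero out a class.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels):
    """Step 1: build the frequency tables (documents per class, words per class)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        words = text.lower().split()
        word_counts[label].update(words)
        vocab.update(words)
    return class_counts, word_counts, vocab

def predict_nb(model, text):
    """Steps 2 and 3: likelihoods from the tables, then the largest (log) posterior."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n_docs in class_counts.items():
        score = math.log(n_docs / total_docs)            # log prior P(class)
        total_words = sum(word_counts[label].values())
        for w in text.lower().split():
            if w not in vocab:
                continue                                 # ignore words never seen in training
            # likelihood P(word | class) with add-one smoothing
            score += math.log((word_counts[label][w] + 1) / (total_words + len(vocab)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical toy sentences (not taken from sample1.csv)
docs = ["machine learning learns from data", "cricket is a bat and ball game",
        "data drives machine learning", "cricket teams play on a field"]
labels = ["machine learning", "cricket", "machine learning", "cricket"]
model = train_nb(docs, labels)
print(predict_nb(model, "learning from data"))
```

Working with sums of logarithms instead of products of probabilities is a design choice that avoids numerical underflow on long texts; the class ranking is unchanged.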

Flow Chart:

SOURCE CODE:
import pandas as pd
import numpy as np
import seaborn as sb
import matplotlib.pyplot as pt
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.metrics import accuracy_score
from sklearn.metrics import confusion_matrix
df = pd.read_csv("sample1.csv")
print("the given data set is")
print(df)
cv = CountVectorizer()
cm = cv.fit_transform(df["text"])
ca = cm.toarray()
df1 = pd.DataFrame(data = ca, columns=cv.get_feature_names_out())
print("first five count vector of text attribute from dataset")
print(df1.head())
x = ca
y=df['classification']
xtrain,xtest,ytrain,ytest=train_test_split(x,y, test_size=0.3)
nbc=MultinomialNB()
nbc.fit(xtrain, ytrain)
ypredict =nbc.predict(xtest)
print("ytest and ypredict comparison")
dap=pd.DataFrame({"actual classification from dataset": ytest.squeeze(), "predicted classification by algorithm": ypredict.squeeze()})
print(dap)
#Confusion matrix
cm = confusion_matrix (ytest, ypredict)
print(cm)
pt.figure(figsize = (10,5))
sb.heatmap(cm, annot=True)
pt.xlabel("predicted classification")
pt.ylabel("actual classification")
pt.show()
# accuracy using score method
acc=nbc.score(xtest,ytest)
acc=int(round (acc, 2) * 100)
print ("acc using score method = ", acc,"%")
acc = accuracy_score(ytest, ypredict)
acc = int(round (acc, 2) * 100)
print("acc using accuracy_score method = ", acc, "%")
#Classification Report.
from sklearn.metrics import classification_report
print (classification_report (ytest, ypredict))

Dataset:
$ gedit sample1.csv

text,classification
Machine learning is the field of study that gives computers the capability to learn without being explicitly programmed,machine learning
Ml results will be hundred percentage from aiml a third year,aimla
Today machine learning to used in a wide range of applications perhaps one of the most well known examples is the recommendation engine,machine learning
Machine learning projects are typically driven by data scientists who command high salaries,machine learning
Machine learning enables a machine to automatically learn from data to improve performance from experiences & predicts things without being explicitly programmed,machine learning
Cricket is a bat and ball game played between two teams of 11 players on a field at the centre of which is a 22-yard pitch with a wicket at each end each comprising two balls balanced on three stumps,cricket
Third year Aiml a is excellent class,aimla
Women cricket which is organized& played separately has also achieved international standard,cricket

Output:

the given data set is


text classification
0 Machine learning is the field of study that gi... machine learning
1 Ml results will be hundred percentage from aim... aimla
2 Today machine learning to used in a wide range... machine learning
3 Machine learning projects are typically driven... machine learning
4 Machine learning enables a machine to automati... machine learning
5 Cricket is a bat and ball game played between ... cricket
6 Third year Aiml a is excellent class aimla
7 Women cricket which is organized& played separ... cricket
first five count vector of text attribute from dataset
11 22 achieved aiml also and applications are at ... who wicket wide will with without women yard year
0 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 0 0
1 0 0 0 1 0 0 0 0 0 ... 0 0 0 1 0 0 0 0 1
2 0 0 0 0 0 0 1 0 0 ... 0 0 1 0 0 0 0 0 0
3 0 0 0 0 0 0 0 1 0 ... 1 0 0 0 0 0 0 0 0
4 0 0 0 0 0 0 0 0 0 ... 0 0 0 0 0 1 0 0 0

[5 rows x 96 columns]
ytest and ypredict comparison
actual classification from dataset predicted classification by algorithm
5 cricket cricket
6 aimla aimla
2 machine learning machine learning
[[1 0 0]
[0 1 0]
[0 0 1]]
acc using score method = 100 %
acc using accuracy_score method = 100 %
precision recall f1-score support

aimla 1.00 1.00 1.00 1

cricket 1.00 1.00 1.00 1


machine learning 1.00 1.00 1.00 1

accuracy 1.00 3
macro avg 1.00 1.00 1.00 3
weighted avg 1.00 1.00 1.00 3

Result: Thus a program to classify text using Naïve Bayes classifier has been executed and the
output is verified.

PROGRAM 8

AIM: To Implement an algorithm to demonstrate the significance of genetic algorithm

Theory:
A genetic algorithm is an adaptive heuristic search algorithm inspired by Darwin's theory
of evolution in nature. It involves five phases to solve complex optimization problems,
which are given below:

o Initialization
o Fitness Assignment
o Selection
o Reproduction
o Termination
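
The five phases can be seen together in a compact sketch. The example below is an illustration under stated assumptions, not the lab program: it uses a toy objective (maximise the number of 1-bits in a bit string, often called OneMax), keeps the fitter half of the population unchanged, and the names fitness and evolve and all parameter values are hypothetical.

```python
import random

def fitness(chromosome):
    """Toy objective: number of 1-bits in the chromosome."""
    return sum(chromosome)

def evolve(generations=20, pop_size=8, length=10, seed=0):
    rng = random.Random(seed)
    # Initialization: random bit strings
    population = [[rng.randint(0, 1) for _ in range(length)] for _ in range(pop_size)]
    best_per_generation = []
    for _ in range(generations):
        # Fitness assignment and selection: keep the fitter half as parents
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]
        best_per_generation.append(fitness(parents[0]))
        # Reproduction: one-point crossover, then one bit flipped in each new child
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            site = rng.randint(1, length - 1)
            child = a[:site] + b[site:]          # new list, parents stay untouched
            flip = rng.randrange(length)
            child[flip] = 1 - child[flip]
            children.append(child)
        population = parents + children
        # Termination: stop early once the best possible fitness is reached
        if best_per_generation[-1] == length:
            break
    return best_per_generation

history = evolve()
print(history)   # best fitness recorded in each generation
```

Because the parents are carried over unchanged, the best fitness recorded per generation can never decrease in this sketch.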

Flow Chart:

SOURCE CODE:
import random
random.seed(100)
def selection(population):
    fs = []
    j = 0
    sol = []
    # fitness is 100 when c0 + 2*c1 - 3*c2 + c3 + 4*c4 + c5 equals 30
    for c in population:
        indf = 100-abs(30-(c[0]+2*c[1]-3*c[2]+c[3]+4*c[4]+c[5]))
        fs.append(indf)
    sc = list(zip(fs,population))
    sc.sort(reverse = True)
    for i in sc:
        if i[0] == 100:
            j = j+1
            if i[1] not in sol:
                sol.append(i[1])
    if(j == 0):
        print("no solution found in this generation")
    else:
        print("The solution for the given equation in this Generation is")
        print(sol)
    # keep the four fittest chromosomes
    sc = sc[:4]
    score,population = zip(*sc)
    return list(population)
def crossover(population):
    random.shuffle(population)
    fatherchromosome = population[:2]
    motherchromosome = population[2:]
    children = []
    for i in range(len(fatherchromosome)):
        crossoversite = random.randint(1,5)
        fatherfragments = [fatherchromosome[i][:crossoversite],fatherchromosome[i][crossoversite:]]
        motherfragments = [motherchromosome[i][:crossoversite],motherchromosome[i][crossoversite:]]
        firstchild = fatherfragments[0] + motherfragments[1]
        children.append(firstchild)
        secondchild = motherfragments[0] + fatherfragments[1]
        children.append(secondchild)
    return children
def mutation(population):
    mutatedchromosomes = []
    for chromosome in population:
        mutation_site = random.randint(0,5)
        chromosome[mutation_site] = random.randint(1,9)
        mutatedchromosomes.append(chromosome)
    return mutatedchromosomes
def get_fit_chromosomes(generation):
    population = [[random.randint(1,9) for i in range(6)] for j in range(6)]
    for generation in range(generation):
        generation += 1
        print("Generation : ",generation)
        population = selection(population)
        crossover_children = crossover(population)
        population = population + crossover_children
        mutated_population = mutation(population)
        population = population + mutated_population
#main program
print("Enter no of generations")
n = int(input())
get_fit_chromosomes(n)

Output:

Enter no of generations
4
Generation : 1
no solution found in this generation
Generation : 2
The solution for the given equation in this Generation is
[[5, 7, 3, 7, 3, 1], [4, 8, 8, 3, 7, 6], [5, 4, 8, 3, 7, 5], [4, 9, 2, 9, 2, 2], [5, 7, 2, 9, 2, 6], [3, 8, 3, 3, 4,
9], [5, 9, 3, 3, 4, 6], [7, 9, 3, 2, 6, 1]]
Generation : 3
no solution found in this generation
Generation : 4
no solution found in this generation

Result: Thus a program to implement genetic algorithm has been executed and output is
verified.

PROGRAM 9

AIM: To implement the finite words classification system using Back-propagation Algorithm.

Back Propagation Working:

BPN learns iteratively. On every iteration, it compares the network's output for each training
example with the actual target label. The target label can be a class label or a continuous value.
The back propagation algorithm works in the following steps:
• Initialize Network: BPN randomly initializes the weights.
• Forward propagate: After initialization, the inputs are propagated in the forward direction. In
this phase, we compute the output and calculate the error from the target output.
• Back Propagation Error: For each observation, the weights are modified to reduce the error,
using a technique called the delta rule or gradient descent. The weights are modified in the
"backward" direction, from the output layer back through all the hidden layers.
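
The weight update described above can be sketched for the simplest possible case, a single
sigmoid neuron trained on one example. This is a minimal illustration rather than the program
for this experiment: the function name delta_rule_step, the starting weights, the input values,
and the learning rate of 0.5 are all assumptions made for the example.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def delta_rule_step(weights, inputs, target, learning_rate=0.5):
    """Apply one gradient-descent update to a single sigmoid neuron."""
    z = sum(w * x for w, x in zip(weights, inputs))
    output = sigmoid(z)
    error = output - target
    # delta = dLoss/dz for Loss = 0.5 * error**2, using sigmoid'(z) = output * (1 - output)
    delta = error * output * (1.0 - output)
    new_weights = [w - learning_rate * delta * x for w, x in zip(weights, inputs)]
    return new_weights, 0.5 * error ** 2


weights = [0.1, -0.2, 0.05]   # hypothetical starting weights
inputs = [1.0, 0.5, -1.5]     # hypothetical training example
target = 1.0

losses = []
for _ in range(50):
    weights, loss = delta_rule_step(weights, inputs, target)
    losses.append(loss)

print("first loss:", round(losses[0], 4))
print("last loss:", round(losses[-1], 4))
```

The loss shrinks on every step because each update moves the weights a small distance against
the gradient of the error. In a multi-layer network the same delta is computed at the output
layer and then passed backward through the weights to obtain the deltas of the hidden layers.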

Flow Chart:


Dataset:

Iris dataset characteristics:


Number of Instances: 150 (50 in each of three classes)
Number of Attributes: 4 numeric predictive attributes and the class

Attribute Information:

Sepal length in cm
Sepal width in cm
Petal length in cm
Petal width in cm

Class:

Iris-Setosa
Iris-Versicolour
Iris-Virginica
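
Because the class is one of three categories, the source code in this program one-hot encodes
it (with pd.get_dummies) so that each class corresponds to one output unit of the network. The
plain-Python sketch below shows what that encoding looks like; the helper name one_hot and the
order of the class names are assumptions made for illustration.

```python
CLASS_NAMES = ["Iris-Setosa", "Iris-Versicolour", "Iris-Virginica"]


def one_hot(class_index, num_classes=len(CLASS_NAMES)):
    """Return a list with 1 at class_index and 0 everywhere else."""
    return [1 if i == class_index else 0 for i in range(num_classes)]


for index, name in enumerate(CLASS_NAMES):
    print(name, one_hot(index))
```

With this encoding, the predicted class is the position of the largest output value, which is
why the network in the source code has three output units.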

Source code:

import numpy as np
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Load the data and one-hot encode the three class labels
ds = load_iris()
x = ds.data
y = ds.target
# Cast to float because newer pandas versions return boolean dummy columns
y = pd.get_dummies(y).values.astype(float)
xtrain, xtest, ytrain, ytest = train_test_split(x, y, test_size=0.20)

# Hyperparameters
learning_rate = 0.1
iterations = 5000
N = ytrain.size  # number of target entries (samples x classes), matching the MSE denominator
input_size = 4
hidden_size = 2
output_size = 3
results = pd.DataFrame(columns=["mse", "accuracy"])

# Initialize the weights randomly (this network uses no bias terms)
np.random.seed(10)
W1 = np.random.normal(scale=0.5, size=(input_size, hidden_size))
W2 = np.random.normal(scale=0.5, size=(hidden_size, output_size))

def sigmoid(x):
    return 1 / (1 + np.exp(-x))

def mean_squared_error(ypred, ytrue):
    return ((ypred - ytrue)**2).sum() / (2*ypred.size)

def accuracy(ypred, ytrue):
    # A prediction is correct when the largest output matches the one-hot target
    acc = ypred.argmax(axis=1) == ytrue.argmax(axis=1)
    return acc.mean()


for itr in range(iterations):
    # feedforward propagation
    # on hidden layer
    Z1 = np.dot(xtrain, W1)
    A1 = sigmoid(Z1)
    # on output layer
    Z2 = np.dot(A1, W2)
    A2 = sigmoid(Z2)
    # Calculating error
    mse = mean_squared_error(A2, ytrain)
    acc = accuracy(A2, ytrain)
    temp = {"mse": [mse], "accuracy": [acc]}
    tt = pd.DataFrame(temp)
    # pd.concat returns a new DataFrame, so the result must be assigned back
    results = pd.concat((results, tt), ignore_index=True)
    # backpropagation
    E1 = A2 - ytrain
    dW1 = E1 * A2 * (1 - A2)   # error term (delta) at the output layer
    E2 = np.dot(dW1, W2.T)
    dW2 = E2 * A1 * (1 - A1)   # error term (delta) at the hidden layer
    # weight updates
    W2_update = np.dot(A1.T, dW1) / N
    W1_update = np.dot(xtrain.T, dW2) / N
    W2 = W2 - learning_rate * W2_update
    W1 = W1 - learning_rate * W1_update

# feedforward on the test set with the trained weights
Z1 = np.dot(xtest, W1)
A1 = sigmoid(Z1)
Z2 = np.dot(A1, W2)
A2 = sigmoid(Z2)
acc = accuracy(A2, ytest)
# Scale to a percentage before rounding to avoid floating-point truncation
acc = int(round(acc * 100))
print("Accuracy=", acc, "%")

Output:

Accuracy= 53 %

Accuracy= 63 %

Accuracy= 67 %

Accuracy= 77 %

Result: Thus the program to classify this data set using the back propagation algorithm has been
executed and the output is verified.
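
The accuracy figure printed by the program is the fraction of test samples whose largest output
value falls on the same position as the 1 in the one-hot target. A small plain-Python sketch of
that calculation is given below; the prediction and target values are made up for illustration,
and the helper argmax is an assumption standing in for NumPy's argmax.

```python
def argmax(values):
    """Index of the largest value in a list."""
    return max(range(len(values)), key=lambda i: values[i])


def accuracy(predictions, targets):
    """Fraction of rows where the predicted and true class indices agree."""
    correct = sum(argmax(p) == argmax(t) for p, t in zip(predictions, targets))
    return correct / len(targets)


# Hypothetical network outputs and one-hot targets for four samples
predictions = [[0.8, 0.1, 0.1], [0.2, 0.3, 0.5], [0.4, 0.5, 0.1], [0.1, 0.2, 0.7]]
targets = [[1, 0, 0], [0, 1, 0], [0, 1, 0], [0, 0, 1]]

acc = accuracy(predictions, targets)
print("Accuracy=", int(round(acc * 100)), "%")  # Accuracy= 75 %
```

Three of the four made-up samples are classified correctly, so the reported accuracy is 75 %.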
