CO-367 Machine Learning Lab File: Submitted To: Submitted by

CO-367 Machine Learning
Lab File
Submitted to: Submitted by:

Sanjay Pathidar Shubham Anand
(Associate Professor) 2k17/CO/336
INDEX
S.No Experiment Date Sign
EXPERIMENT 1
 
AIM: To study the basic Python libraries used in data science.
THEORY:
A. NumPy
NumPy is the fundamental package for scientific computing with Python.
It contains, among other things:
• a powerful N-dimensional array object
• sophisticated (broadcasting) functions
• tools for integrating C/C++ and Fortran code
• useful linear algebra, Fourier transform, and random number capabilities
FUNCTIONS OF NUMPY LIBRARY

1. MATHEMATICAL FUNCTIONS
NUMPY.ARCSIN(), NUMPY.ARCCOS() and NUMPY.ARCTAN() return the
trigonometric inverse of sin, cos, and tan of the given value,
element-wise. The result is an angle in radians.
NUMPY.AROUND() is a function that returns the value rounded to
the desired number of decimals.
NUMPY.FLOOR() is a function that returns the largest integer not
greater than the input parameter, element-wise.
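The functions above can be tried on a few values. This is a minimal sketch; the input numbers and variable names are illustrative assumptions, not data from the lab.

```python
import numpy as np

# Inverse trigonometric functions take a ratio and return an angle in radians
angle = np.arcsin(1.0)            # arcsin(1) is pi/2 radians
angle_deg = np.degrees(angle)     # the same angle converted to degrees

# Illustrative sample values (assumed for this sketch)
values = np.array([1.234, -2.567, 3.5])

# Round each element to one decimal place
rounded = np.around(values, decimals=1)

# Largest integer not greater than each element (note -2.567 goes to -3)
floored = np.floor(values)

print(angle_deg, rounded, floored)
```

Note that NUMPY.FLOOR() rounds toward negative infinity, so negative inputs move away from zero.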
2. STRING FUNCTIONS
These functions are found in the numpy.char module.
NUMPY.CHAR.ADD() is a function that returns the element-wise string
concatenation of two arrays of str or Unicode.
NUMPY.CHAR.MULTIPLY() is a function that returns each string repeated
(concatenated with itself) a given number of times, element-wise.
NUMPY.CHAR.CENTER() is a function that returns a copy of the given
string with its elements centered in a string of the specified length.
NUMPY.CHAR.SPLIT() is a function that returns a list of the words in
each string, using the given separator as the delimiter.
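A short sketch of the string functions, which live in the numpy.char module. The sample strings and variable names are assumptions chosen for illustration.

```python
import numpy as np

# Illustrative string arrays (assumed for this sketch)
first = np.array(["data", "machine"])
second = np.array([" science", " learning"])

# Element-wise concatenation of the two arrays
joined = np.char.add(first, second)

# Each string repeated twice
repeated = np.char.multiply(first, 2)

# Each string centered in a field of width 11, padded with '*'
centered = np.char.center(first, 11, fillchar="*")

# Each string split into a list of words (whitespace is the default separator)
words = np.char.split(joined)

print(joined, repeated, centered, words)
```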
3. SORTING FUNCTIONS
NUMPY.SORT() function returns a sorted copy of the input array.
NUMPY.ARGSORT() function performs an indirect sort on the input
array along the given axis, using a specified kind of sort, and
returns the array of indices that would sort the data.
NUMPY.LEXSORT() function performs an indirect sort using a
sequence of keys. The keys can be seen as columns in a
spreadsheet; the last key is the primary sort key.
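The difference between a direct and an indirect sort is easiest to see on a small array. The following sketch uses made-up marks and names purely as an illustration.

```python
import numpy as np

# Illustrative data (assumed for this sketch)
marks = np.array([72, 55, 90, 55])

# Direct sort: a sorted copy of the values
sorted_marks = np.sort(marks)

# Indirect sort: the indices that would sort the array
# (kind="stable" keeps equal elements in their original order)
order = np.argsort(marks, kind="stable")

# lexsort sorts by several keys; the LAST key passed is the primary key
surnames = np.array(["Rao", "Das", "Rao", "Bose"])
first_names = np.array(["Vikram", "Meera", "Amit", "Kiran"])
by_name = np.lexsort((first_names, surnames))

print(sorted_marks, order, by_name)
```

Indexing with the result, for example marks[order], gives the same values as NUMPY.SORT().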
4. STATISTICAL FUNCTIONS
NUMPY.AMIN() and NUMPY.AMAX() functions return the minimum
and the maximum of the elements in the given array along the
specified axis.
NUMPY.PTP() function returns the range (maximum - minimum) of
values along an axis.
NUMPY.MEDIAN() returns the median, the value separating the higher
half of a data sample from the lower half.
NUMPY.PERCENTILE() returns a percentile (or centile), a measure used
in statistics indicating the value below which a given percentage of
observations in a group of observations fall.
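As a small illustration of the statistical functions, the sketch below applies them to an assumed 2x3 array of scores; the numbers are not taken from the lab.

```python
import numpy as np

# Illustrative 2x3 array (assumed for this sketch)
scores = np.array([[3, 7, 5],
                   [8, 4, 6]])

# Minimum of each column (axis=0) and maximum of each row (axis=1)
col_min = np.amin(scores, axis=0)
row_max = np.amax(scores, axis=1)

# Range of the whole array: maximum minus minimum
value_range = np.ptp(scores)

# Median of all six values, and the 50th percentile (the same quantity)
middle = np.median(scores)
p50 = np.percentile(scores, 50)

print(col_min, row_max, value_range, middle, p50)
```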
B. Matplotlib 
Matplotlib is a Python 2D plotting library which produces publication
quality figures in a variety of hardcopy formats and interactive
environments across platforms. Matplotlib can be used in Python
scripts, the Python and IPython shells, the Jupyter notebook, web
application servers, and four graphical user interface toolkits. 
Matplotlib tries to make easy things easy and hard things possible. You
can generate plots, histograms, power spectra, bar charts, errorcharts,
scatterplots, etc., with just a few lines of code.
FUNCTIONS OF MATPLOTLIB LIBRARY
Matplotlib comes with a wide variety of plots. Plots help us understand
trends and patterns and find correlations. They are typically instruments
for reasoning about quantitative information. Some of the sample plots are
covered here.
1. LINE PLOT
# Importing the pyplot module from matplotlib
from matplotlib import pyplot as plt
# Sample data (illustrative values)
x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]
# Function to plot
plt.plot(x, y)
# Function to show the plot
plt.show()
2. BAR PLOT
# Function to plot the bar
plt.bar(x, y)
# Function to show the plot
plt.show()
3. HISTOGRAM
# Function to plot histogram
plt.hist(y)
# Function to show the plot
plt.show()
4. SCATTER PLOT
# Function to plot scatter
plt.scatter(x, y)
# Function to show the plot
plt.show()
C. Pandas
Python has long been great for data munging and preparation, but
less so for data analysis and modeling. pandas helps fill this gap,
enabling you to carry out your entire data analysis workflow in
Python without having to switch to a more domain specific language
like R.
pandas does not implement significant modeling functionality
outside of linear and panel regression; for this, look to statsmodels
and scikit-learn. More work is still needed to make Python a
first-class statistical modeling environment, but we are well on our way
toward that goal.
FUNCTIONS OF PANDAS LIBRARY

1. INDEX
# Imports needed by the snippets below
import numpy as np
import pandas as pd
dataflair_index = pd.date_range('1/1/2000', periods=8)
2. SERIES
dataflair_s1 = pd.Series(np.random.randn(5),
                         index=['a', 'b', 'c', 'd', 'e'])
3. DATAFRAME
dataflair_df1 = pd.DataFrame(np.random.randn(8, 3),
                             index=dataflair_index, columns=['A', 'B', 'C'])
4. PANEL
# Note: pd.Panel was removed in pandas 0.25, so this runs only on older versions
dataflair_wp1 = pd.Panel(np.random.randn(2, 5, 4), items=['Item1', 'Item2'],
                         major_axis=pd.date_range('1/1/2000', periods=5),
                         minor_axis=['A', 'B', 'C', 'D'])
CONCLUSION:
We have learnt the basics of the most commonly used data science libraries in
Python.
EXPERIMENT 2
 
AIM: To learn how to read from a csv file using pandas.
 
THEORY:
Tabular data is often stored as CSV ("comma-separated values"), a text
format intended for the presentation of tabular data. Each line of the
file is one row of the table. The values of individual columns are
separated by a separator symbol: a comma (,), a semicolon (;) or another
symbol. CSV can be easily read and processed by Python.
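When the separator is not a comma, read_csv can be told which symbol to use through its sep parameter. The sketch below reads a small semicolon-separated table from an in-memory string; the column names and values are made up for illustration.

```python
import io

import pandas as pd

# Hypothetical CSV text that uses ';' instead of ',' as the separator
csv_text = "name;score\nasha;81\nravi;74\n"

# sep tells read_csv which symbol separates the columns;
# io.StringIO lets us read from a string instead of a file on disk
frame = pd.read_csv(io.StringIO(csv_text), sep=";")

print(frame)
```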
CODE:
# Load the Pandas libraries with alias 'pd'
import pandas as pd
# Read data from file 'filename.csv'
# (in the same directory that your python process is based)
# Control delimiters, rows, column names with read_csv
data = pd.read_csv("filename.csv")
# Preview the first 5 lines of the loaded data
data.head()
FUNCTION    DESCRIPTION
read_csv    Read a comma-separated values (csv) file into a DataFrame.
            Also supports optionally iterating over or breaking the
            file into chunks.
head        Preview the first 5 lines of the loaded data.
CONCLUSION:
We successfully read a CSV file and displayed the first five lines of our dataset.
EXPERIMENT 3
AIM: To implement linear regression.
THEORY: 
Linear Regression is a Machine Learning algorithm based on supervised
learning. It performs a regression task. Regression models a target prediction
value based on independent variables. It is mostly used for finding out the
relationship between variables and for forecasting. Regression models differ
in the kind of relationship they assume between the dependent and independent
variables and in the number of independent variables used.
Linear Regression predicts a dependent variable value (y) based on a given
independent variable (x). So, this regression technique finds a linear
relationship between x (input) and y (output). In a simple regression problem
(a single x and a single y), the form of the model would be:
y = B0 + B1*x
where B0 is the intercept and B1 is the slope (the coefficient of x).
In higher dimensions, when we have more than one input (x), the line becomes
a plane or a hyper-plane. The representation therefore is the form of the
equation and the specific values used for the coefficients.
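To make the coefficients concrete, here is a minimal sketch of how B0 and B1 can be estimated, assuming ordinary least squares (the method behind scikit-learn's LinearRegression). The function name fit_simple_linear_regression and the sample data are hypothetical, introduced only for this illustration.

```python
def fit_simple_linear_regression(xs, ys):
    """Estimate B0 and B1 of y = B0 + B1*x by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    # Intercept: the fitted line passes through the point of means
    b0 = mean_y - b1 * mean_x
    return b0, b1


# Illustrative data generated from y = 1 + 2x (assumed for this sketch)
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

b0, b1 = fit_simple_linear_regression(xs, ys)
prediction = b0 + b1 * 5.0  # predicted y for x = 5

print(b0, b1, prediction)
```

Because the sample points lie exactly on a line, the estimates recover the intercept and slope used to generate them.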
CODE:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 1].values
# Splitting the dataset into the Training set and Test set
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 1/3, random_state = 0)
# Feature Scaling (disabled; least-squares predictions do not depend on it)
# from sklearn.preprocessing import StandardScaler
# sc_X = StandardScaler()
# X_train = sc_X.fit_transform(X_train)
# X_test = sc_X.transform(X_test)
# sc_y = StandardScaler()
# y_train = sc_y.fit_transform(y_train.reshape(-1, 1))
# Fitting Simple Linear Regression to the Training set
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predicting the Test set results
y_pred = regressor.predict(X_test)
# Visualising the Training set results
plt.scatter(X_train, y_train, color = 'red')
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Training Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
# Visualising the Test set results
plt.scatter(X_test, y_test, color = 'red')
# The regression line is the same one that was fitted on the training set
plt.plot(X_train, regressor.predict(X_train), color = 'blue')
plt.title('Salary vs Experience (Test Set)')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
CONCLUSION:
In this experiment we learned about linear regression and the graph
obtained by importing the dataset and fitting the regression model to the
dataset.
EXPERIMENT 4
AIM: To implement DT CART (classification and regression trees) algorithm.
THEORY:
A decision tree is a widely used, effective non-parametric machine learning
modelling technique for regression and classification problems. To find
solutions, a decision tree makes sequential, hierarchical decisions about the
outcome variable based on the predictor data. A decision tree builds
regression or classification models in the form of a tree structure. It breaks
down a dataset into smaller and smaller subsets while at the same time an
associated decision tree is incrementally developed. The final result is a tree
with decision nodes and leaf nodes.
Classification and Regression Tree (CART) is one of the commonly used Decision
Tree algorithms. In this experiment, we apply CART to an example dataset.
A Decision Tree is a recursive partitioning approach, and CART splits each
input node into two child nodes, so a CART decision tree is a binary decision
tree. At each level of the decision tree, the algorithm identifies a
condition, that is, which variable and which level (threshold) to use for
splitting the input node (data sample) into two child nodes.
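The search for a splitting condition can be sketched for a single numeric variable. This is a simplified illustration, not the full CART algorithm: it assumes the split criterion is the sum of squared errors (the default criterion of scikit-learn's DecisionTreeRegressor is squared error), and the helper names sse and best_split as well as the data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean (0.0 for an empty list)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(xs, ys):
    """Return (threshold, total_sse) of the best binary split on one variable."""
    pairs = sorted(zip(xs, ys))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # equal x values cannot be separated by a threshold
        # Candidate threshold: midpoint between two neighbouring x values
        threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [y for _, y in pairs[:i]]
        right = [y for _, y in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Illustrative data with an obvious jump between x = 3 and x = 4
xs = [1, 2, 3, 4, 5, 6]
ys = [10.0, 11.0, 12.0, 50.0, 52.0, 54.0]

threshold, total = best_split(xs, ys)
print(threshold, total)
```

A full tree would apply the same search recursively to each of the two child nodes until a stopping rule is met.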
CODE:
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# Importing the dataset
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values
y = dataset.iloc[:, 2].values
# Splitting the dataset into the Training set and Test set
# (disabled; the model below is fitted on the full dataset)
# from sklearn.model_selection import train_test_split
# X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2, random_state = 0)
# Feature Scaling (disabled)
# from sklearn.preprocessing import StandardScaler
# sc_X = StandardScaler()
# X_train = sc_X.fit_transform(X_train)
# X_test = sc_X.transform(X_test)
# sc_y = StandardScaler()
# y_train = sc_y.fit_transform(y_train.reshape(-1, 1))
# Fitting Decision Tree Regression to the dataset
from sklearn.tree import DecisionTreeRegressor
regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X, y)
# Predicting a new result (predict expects a 2D array of samples)
y_pred = regressor.predict([[6.5]])
# Visualising the Decision Tree Regression results (higher resolution)
X_grid = np.arange(X.min(), X.max(), 0.01)
X_grid = X_grid.reshape((len(X_grid), 1))
plt.scatter(X, y, color = 'red')
plt.plot(X_grid, regressor.predict(X_grid), color = 'blue')
plt.title('Truth or Bluff (Decision Tree Regression)')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()
CONCLUSION:
In this experiment we learned about Regression Tree (Classification and
Regression tree) and the graph obtained by importing the dataset and fitting
the regression tree model to the dataset.