CO-367 Machine Learning Lab File: Submitted To: Submitted by
EXPERIMENT 1
AIM: To study the basic Python libraries used in data science.
THEORY:
A. Numpy
NumPy is the fundamental package for scientific computing with Python.
It contains, among other things:
• a powerful N-dimensional array object
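As a quick illustration of that array object, the sketch below builds a small 2-D array and inspects it with the standard `ndim` and `shape` attributes. The values are arbitrary example data.

```python
import numpy as np

# A 2-D array (matrix) with 2 rows and 3 columns of integers
arr = np.array([[1, 2, 3],
                [4, 5, 6]])

print(arr.ndim)   # number of dimensions (axes): 2
print(arr.shape)  # size along each axis: (2, 3)
print(arr * 10)   # arithmetic is applied element-wise, with no Python loop
```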
2. STRING FUNCTIONS
numpy.char.add() returns element-wise string concatenation for two
arrays of str or Unicode.
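A minimal example of `numpy.char.add()` with two made-up arrays of equal length; each output element is the concatenation of the two input elements at the same position.

```python
import numpy as np

first = np.array(["data", "machine"])
second = np.array([" science", " learning"])

# Concatenate the strings position by position
combined = np.char.add(first, second)
print(combined)  # ['data science' 'machine learning']
```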
3. SORTING FUNCTIONS
numpy.sort() returns a sorted copy of the input array.
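The following sketch (arbitrary example values) shows that `numpy.sort()` sorts along the last axis by default, can sort along another axis through its `axis` parameter, and leaves the input unchanged because it returns a copy.

```python
import numpy as np

a = np.array([[9, 1, 8],
              [3, 7, 2]])

by_row = np.sort(a)             # default: sort each row (the last axis)
by_column = np.sort(a, axis=0)  # sort each column instead

print(by_row)
print(by_column)
print(a)  # the original array is untouched, since np.sort returns a copy
```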
4. STATISTICAL FUNCTIONS
numpy.amin() and numpy.amax() return the minimum and the maximum of
the elements in the given array along the specified axis.
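A short sketch of the `axis` parameter, using an arbitrary 2 x 3 array: with no axis the whole array is reduced, `axis=0` reduces down each column, and `axis=1` reduces across each row.

```python
import numpy as np

a = np.array([[3, 7, 5],
              [8, 4, 2]])

print(np.amin(a))          # minimum of the whole array: 2
print(np.amin(a, axis=0))  # minimum of each column: [3 4 2]
print(np.amax(a, axis=1))  # maximum of each row: [7 8]
```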
B. Matplotlib
Matplotlib is a Python 2D plotting library which produces publication-quality
figures in a variety of hardcopy formats and interactive environments across
platforms. Matplotlib can be used in Python scripts, the Python and IPython
shells, the Jupyter notebook, web application servers, and four graphical
user interface toolkits. Matplotlib tries to make easy things easy and hard
things possible. You can generate plots, histograms, power spectra, bar
charts, error charts, scatter plots, etc., with just a few lines of code.
FUNCTIONS OF MATPLOTLIB LIBRARY
Matplotlib comes with a wide variety of plots. Plots help us understand
trends and patterns and spot correlations. They are typically instruments
for reasoning about quantitative information. Some sample plots are
covered here.
1. LINE PLOT
# importing the pyplot module from matplotlib
from matplotlib import pyplot as plt
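The plotting code for this example did not survive; the sketch below is an assumed minimal line plot that uses the pyplot import above with made-up data points. It is not the original example.

```python
from matplotlib import pyplot as plt

x = [1, 2, 3, 4, 5]
y = [2, 4, 1, 5, 3]  # arbitrary sample values

# Connect consecutive (x, y) points with straight line segments
line, = plt.plot(x, y, marker="o")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Line plot")
plt.show()
```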
2. BAR PLOT
# importing the pyplot module from matplotlib
from matplotlib import pyplot as plt
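This example's plotting code is also missing; as an assumed stand-in, the sketch below draws one bar per category using made-up counts.

```python
from matplotlib import pyplot as plt

languages = ["Python", "R", "Julia"]
users = [50, 30, 10]  # made-up counts for illustration

# One bar per category; the bar height encodes the value
bars = plt.bar(languages, users, color="steelblue")
plt.xlabel("Language")
plt.ylabel("Users")
plt.title("Bar plot")
plt.show()
```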
3. HISTOGRAM
# importing the pyplot module from matplotlib
from matplotlib import pyplot as plt
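A histogram groups values into bins and shows how many fall into each. In this assumed sketch, 1,000 samples are drawn from a standard normal distribution with a fixed seed (so the run is repeatable) and counted in 20 equal-width bins.

```python
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(0)
data = rng.normal(loc=0.0, scale=1.0, size=1000)  # 1,000 samples from N(0, 1)

# Split the value range into 20 equal-width bins and count samples per bin
counts, bin_edges, patches = plt.hist(data, bins=20, edgecolor="black")
plt.xlabel("Value")
plt.ylabel("Frequency")
plt.title("Histogram")
plt.show()
```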
4. SCATTER PLOT
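No code survives for the scatter plot either. The sketch below is an assumed example: it plots 50 synthetic points with a roughly linear relationship plus random noise, one marker per (x, y) pair.

```python
import numpy as np
from matplotlib import pyplot as plt

rng = np.random.default_rng(1)
x = rng.uniform(0, 10, size=50)
y = 2 * x + rng.normal(0, 2, size=50)  # roughly linear relationship plus noise

# One marker per (x, y) pair; no lines are drawn between points
points = plt.scatter(x, y, color="red")
plt.xlabel("x")
plt.ylabel("y")
plt.title("Scatter plot")
plt.show()
```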
C. Pandas
Python has long been great for data munging and preparation, but
less so for data analysis and modeling. pandas helps fill this gap,
enabling you to carry out your entire data analysis workflow in
Python without having to switch to a more domain-specific language
like R.
More work is still needed to make Python a first-class statistical
modeling environment, but we are well on our way toward that goal.
2. SERIES
import numpy as np
import pandas as pd
# Five random values labelled 'a' to 'e'
dataflair_s1 = pd.Series(np.random.randn(5), index=['a', 'b', 'c', 'd', 'e'])
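Because a Series of random numbers changes on every run, the sketch below uses fixed, made-up values to show label-based access, position-based access with `iloc`, and boolean filtering.

```python
import pandas as pd

# A Series is a 1-D labelled array: each value has an index label
s = pd.Series([10, 20, 30], index=["a", "b", "c"])

print(s["b"])     # access by label: 20
print(s.iloc[0])  # access by position: 10
print(s[s > 15])  # boolean filtering keeps the labels of matching entries
```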
3. DATAFRAME
dataflair_index = pd.date_range('1/1/2000', periods=8)  # assumed: original definition missing; any 8 labels work
dataflair_df1 = pd.DataFrame(np.random.randn(8, 3),
                             index=dataflair_index, columns=['A', 'B', 'C'])
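Since `np.random.randn` produces different values on each run, this sketch builds a small DataFrame from fixed, made-up values to show column selection, row selection by label with `loc`, and column-wise statistics.

```python
import pandas as pd

# A DataFrame is a 2-D table: all columns share one row index
df = pd.DataFrame({"A": [1, 2, 3], "B": [4.0, 5.0, 6.0]},
                  index=["x", "y", "z"])

print(df["A"])      # one column, returned as a Series
print(df.loc["y"])  # one row, selected by its index label
print(df.shape)     # (rows, columns): (3, 2)
print(df.mean())    # mean of each column
```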
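4. PANEL
Panel, a 3-D data container, was deprecated in pandas 0.20 and removed in
pandas 0.25; a DataFrame with a MultiIndex is the recommended replacement.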
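Since `pd.Panel` is not available in current pandas, the sketch below shows one common way to hold 3-D data instead: a DataFrame whose rows carry a two-level MultiIndex (item, date). The items, dates, and values are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical 3-D data: 2 items x 3 dates x 2 measurement columns
index = pd.MultiIndex.from_product(
    [["item1", "item2"], pd.date_range("2000-01-01", periods=3)],
    names=["item", "date"],
)
df = pd.DataFrame(np.arange(12).reshape(6, 2), index=index, columns=["A", "B"])

# All dates for one item (what a Panel "item" used to hold)
print(df.loc["item2"])
# One date across all items (a cross-section on the "date" level)
print(df.xs(pd.Timestamp("2000-01-02"), level="date"))
```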
CONCLUSION:
We learned the basics of the most commonly used data science libraries in
Python.
EXPERIMENT 2
AIM: To learn how to read a CSV file using pandas.
THEORY:
CSV (comma-separated values) is a plain-text format for presenting tabular
data. Each line of the file is one row of the table, and the values of
individual columns are separated by a delimiter symbol: a comma (,), a
semicolon (;), or another symbol. CSV files can be easily read and
processed in Python.
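To illustrate the point about separators, the sketch below parses a small semicolon-separated table. It reads from an in-memory `io.StringIO` buffer instead of a file so that it runs without any CSV on disk; the column names and values are made up.

```python
import io

import pandas as pd

# A small semicolon-separated table, held in memory instead of on disk
csv_text = "name;age;city\nAsha;23;Delhi\nRavi;31;Pune\n"

# sep tells read_csv which symbol separates the columns (the default is ",")
df = pd.read_csv(io.StringIO(csv_text), sep=";")
print(df)
```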
CODE:
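# Load the pandas library with alias 'pd'
import pandas as pd
# Read the CSV file into a DataFrame ("filename.csv" is a placeholder path)
data = pd.read_csv("filename.csv")
# Display the first five rows
data.head()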
FUNCTION    DESCRIPTION
read_csv    Reads a comma-separated values (CSV) file into a DataFrame.
            Also supports optionally iterating over the file or breaking
            it into chunks.
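The chunking option described for `read_csv` is controlled by its `chunksize` parameter. In this sketch (in-memory, hypothetical data), `read_csv` yields DataFrames of at most four rows, which are processed one at a time; this pattern helps when a file is too large to load at once. Using the reader as a context manager assumes pandas 1.2 or later.

```python
import io

import pandas as pd

# Hypothetical in-memory data: a header line plus the values 0 to 9
csv_text = "value\n" + "\n".join(str(i) for i in range(10)) + "\n"

total = 0
chunk_sizes = []
# With chunksize, read_csv returns an iterator of DataFrames instead of one DataFrame
with pd.read_csv(io.StringIO(csv_text), chunksize=4) as reader:
    for chunk in reader:
        chunk_sizes.append(len(chunk))      # rows in this chunk
        total += int(chunk["value"].sum())  # process one chunk at a time

print(chunk_sizes, total)  # [4, 4, 2] 45
```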
CONCLUSION:
We successfully read a CSV file and displayed the first five rows of our dataset.
EXPERIMENT 3
AIM: To implement linear regression.
THEORY:
Linear regression is a supervised machine learning algorithm that performs a
regression task: it models a target prediction value based on independent
variables. In a simple regression problem (a single input x and a single
output y), the model has the form
y = B0 + B1*x
where B0 is the intercept and B1 is the slope coefficient. With more than one
input (x), the line becomes a plane or a hyperplane. The representation is
therefore the form of the equation together with the specific values of its
coefficients.
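For the single-input case, the coefficients B0 and B1 that minimise the sum of squared errors have a closed form. The sketch below computes them directly with NumPy on made-up points lying exactly on y = 2 + 3x, so the fit should recover B0 = 2 and B1 = 3; the function name is hypothetical. scikit-learn's `LinearRegression` fits the same ordinary least squares model.

```python
import numpy as np


def fit_simple_linear_regression(x, y):
    """Return (B0, B1) minimising the squared error of y = B0 + B1*x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = x.mean(), y.mean()
    # Slope: covariance of x and y divided by the variance of x
    b1 = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
    # Intercept: the fitted line passes through the point (x_mean, y_mean)
    b0 = y_mean - b1 * x_mean
    return b0, b1


# Points lying exactly on y = 2 + 3x
b0, b1 = fit_simple_linear_regression([0, 1, 2, 3], [2, 5, 8, 11])
print(b0, b1)
```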
CODE:
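import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
dataset = pd.read_csv('Salary_Data.csv')
X = dataset.iloc[:, :-1].values  # years of experience (2-D feature matrix)
y = dataset.iloc[:, 1].values    # salary (1-D target vector)
# Splitting the dataset into the Training set and Test set
# (test_size and random_state are assumed values; the original ones are missing)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=1/3, random_state=0)
# Feature Scaling is not needed for simple linear regression, so it stays disabled:
# sc_X = StandardScaler()
# X_train = sc_X.fit_transform(X_train)
# X_test = sc_X.transform(X_test)
# sc_y = StandardScaler()
# y_train = sc_y.fit_transform(y_train)
# Fitting simple linear regression to the training set
regressor = LinearRegression()
regressor.fit(X_train, y_train)
# Predicting the test set results
y_pred = regressor.predict(X_test)
# Visualising the training set results with the fitted line
plt.scatter(X_train, y_train, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()
# Visualising the test set results against the same fitted line
plt.scatter(X_test, y_test, color='red')
plt.plot(X_train, regressor.predict(X_train), color='blue')
plt.xlabel('Years of Experience')
plt.ylabel('Salary')
plt.show()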
CONCLUSION:
In this experiment, we learned about linear regression and plotted the graph
obtained by fitting the regression model to the imported dataset.
EXPERIMENT 4
AIM: To implement the CART (Classification and Regression Trees) decision tree algorithm.
THEORY:
A decision tree is a widely used, non-parametric machine learning modelling
technique for regression and classification problems. To find solutions, a
decision tree makes sequential, hierarchical decisions about the outcome
variable based on the predictor data. A decision tree builds regression or
classification models in the form of a tree structure. It breaks down a
dataset into smaller and smaller subsets while an associated decision tree
is incrementally developed. The final result is a tree with decision nodes
and leaf nodes. CART (Classification and Regression Trees) is one of the
decision tree algorithms. Decision tree learning is a recursive partitioning
approach, and CART splits each input node into exactly two child nodes, so a
CART decision tree is a binary decision tree. At each level of the tree, the
algorithm identifies a condition (which variable and which level to use) for
splitting the input node (data sample) into two child nodes.
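To make the split step concrete, here is a minimal pure-Python sketch of how a CART-style regression tree might choose one binary split on a single numeric feature. It assumes the criterion is the total sum of squared errors of the two child nodes around their means; scikit-learn's `DecisionTreeRegressor` uses a squared-error criterion by default, which ranks splits the same way. The function name `best_split` is hypothetical, and a full tree would apply it recursively to each child node.

```python
def best_split(xs, ys):
    """Return (threshold, sse) of the best binary split of one feature.

    Candidate thresholds are midpoints between consecutive distinct sorted
    x values; the score is the total squared error of the two child nodes
    around their own means (lower is better).
    """
    def sse(values):
        if not values:
            return 0.0
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values)

    pairs = sorted(zip(xs, ys))
    best = (None, float("inf"))
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal x values
        threshold = (lo + hi) / 2
        left = [y for x, y in pairs if x <= threshold]
        right = [y for x, y in pairs if x > threshold]
        score = sse(left) + sse(right)
        if score < best[1]:
            best = (threshold, score)
    return best


# Two clear groups of targets: the best split should fall between x = 3 and x = 4
print(best_split([1, 2, 3, 4, 5, 6], [1, 1, 1, 10, 10, 10]))  # (3.5, 0.0)
```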
CODE:
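import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.tree import DecisionTreeRegressor
dataset = pd.read_csv('Position_Salaries.csv')
X = dataset.iloc[:, 1:2].values  # position level (1:2 keeps X 2-D, as scikit-learn expects)
y = dataset.iloc[:, 2].values    # salary
# Splitting the dataset into the Training set and Test set is skipped:
# the tree is fitted on the whole dataset.
# Feature Scaling is also skipped, because tree splits do not depend on the
# scale of a feature, so it stays disabled:
# sc_X = StandardScaler()
# X_train = sc_X.fit_transform(X_train)
# X_test = sc_X.transform(X_test)
# sc_y = StandardScaler()
# y_train = sc_y.fit_transform(y_train)
# Fitting the decision tree regressor to the dataset
regressor = DecisionTreeRegressor(random_state=0)
regressor.fit(X, y)
# Predicting the salary for position level 6.5 (predict expects a 2-D input)
y_pred = regressor.predict([[6.5]])
# Visualising the results on a fine grid, which shows the step-shaped
# (piecewise constant) predictions of the tree
X_grid = np.arange(X.min(), X.max(), 0.01).reshape(-1, 1)
plt.scatter(X, y, color='red')
plt.plot(X_grid, regressor.predict(X_grid), color='blue')
plt.xlabel('Position level')
plt.ylabel('Salary')
plt.show()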
CONCLUSION:
In this experiment, we learned about regression trees (CART: Classification and
Regression Trees) and plotted the graph obtained by fitting a regression tree
model to the imported dataset.