

Data Preprocessing Tools

Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1]
Y = dataset.iloc[:, -1]

print(X)

   Country   Age   Salary
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0

print(Y)

0     No
1    Yes
2     No
3     No
4    Yes
5    Yes
6     No
7    Yes
8     No
9    Yes
Name: Purchased, dtype: object
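Note that .values is not used when slicing the dataset, so X stays a pandas DataFrame and Y a pandas Series; this is what lets the imputation step below select columns with .iloc. A quick check, not part of the original notebook:

print(type(X), type(Y))  # DataFrame and Series respectively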

Taking care of missing data


from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X.iloc[:, 1:3])
X.iloc[:, 1:3] = imputer.transform(X.iloc[:, 1:3])
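As a quick sanity check (not part of the original notebook), you can confirm that the two numeric columns no longer contain missing values; strategy='median' is a common alternative to the mean when a column contains outliers:

print(X.iloc[:, 1:3].isna().sum())  # Age and Salary should both report 0 missing values
# alternative, not used here: SimpleImputer(missing_values=np.nan, strategy='median')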

Encoding categorical data

Encoding the Independent Variable

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

print(X)

[[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.40000000e+01 7.20000000e+04]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00 2.70000000e+01 4.80000000e+04]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 3.00000000e+01 5.40000000e+04]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00 3.80000000e+01 6.10000000e+04]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 4.00000000e+01 6.37777778e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 3.50000000e+01 5.80000000e+04]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00 3.87777778e+01 5.20000000e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 4.80000000e+01 7.90000000e+04]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 5.00000000e+01 8.30000000e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 3.70000000e+01 6.70000000e+04]]
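The first three columns of the encoded array are the one-hot dummy variables for Country, in the encoder's sorted category order (France, Germany, Spain), followed by the untouched Age and Salary columns. A minimal sketch using the fitted ct from above (not part of the original notebook) makes that ordering explicit:

print(ct.named_transformers_['encoder'].categories_)  # [array(['France', 'Germany', 'Spain'], dtype=object)]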

Encoding the Dependent Variable

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
Y = le.fit_transform(Y)

print(Y)

[0 1 0 0 1 1 0 1 0 1]
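LabelEncoder assigns integer codes in sorted label order, so 'No' maps to 0 and 'Yes' maps to 1. A quick check with the fitted le from above (not part of the original notebook):

print(le.classes_)  # ['No' 'Yes'] -- the index in this array is the encoded value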

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

print(X_train)

[[0.00000000e+00 0.00000000e+00 1.00000000e+00 3.87777778e+01 5.20000000e+04]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 4.00000000e+01 6.37777778e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 4.40000000e+01 7.20000000e+04]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00 3.80000000e+01 6.10000000e+04]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00 2.70000000e+01 4.80000000e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 4.80000000e+01 7.90000000e+04]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 5.00000000e+01 8.30000000e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 3.50000000e+01 5.80000000e+04]]

print(Y_train)

[0 1 0 0 1 1 0 1]

print(X_test)

[[0.0e+00 1.0e+00 0.0e+00 3.0e+01 5.4e+04]
 [1.0e+00 0.0e+00 0.0e+00 3.7e+01 6.7e+04]]

print(Y_test)

[0 1]
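The split keeps 80% of the rows for training and 20% for testing, and random_state=1 makes it reproducible. If the target classes were imbalanced, a stratified split would preserve the Yes/No ratio in both sets; a minimal sketch, not used in the original notebook (the variable names here are illustrative):

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=1, stratify=Y)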

Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train[:, 3:] = sc.fit_transform(X_train[:, 3:])
X_test[:, 3:] = sc.transform(X_test[:, 3:])

print(X_train)

[[ 0.  0.  1. -0.19159184 -1.07812594]
 [ 0.  1.  0. -0.01411729 -0.07013168]
 [ 1.  0.  0.  0.56670851  0.63356243]
 [ 0.  0.  1. -0.30453019 -0.30786617]
 [ 0.  0.  1. -1.90180114 -1.42046362]
 [ 1.  0.  0.  1.14753431  1.23265336]
 [ 0.  1.  0.  1.43794721  1.57499104]
 [ 1.  0.  0. -0.74014954 -0.56461943]]

print(X_test)

[[ 0.  1.  0. -1.46618179 -0.9069571 ]
 [ 1.  0.  0. -0.44973664  0.20564034]]
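Only columns 3 onwards (Age and Salary) are scaled, so the one-hot dummy variables keep their 0/1 values, and the scaler is fitted on the training set only so that no test-set statistics leak into training. As a quick sanity check (not part of the original notebook), the fitted sc can undo the scaling:

print(sc.inverse_transform(X_test[:, 3:]))  # should recover the original Age and Salary values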
