

Data Preprocessing Tools

Importing the libraries

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd

Importing the dataset

dataset = pd.read_csv('Data.csv')
X = dataset.iloc[:, :-1]
Y = dataset.iloc[:, -1]

print(X)

   Country   Age   Salary
0   France  44.0  72000.0
1    Spain  27.0  48000.0
2  Germany  30.0  54000.0
3    Spain  38.0  61000.0
4  Germany  40.0      NaN
5   France  35.0  58000.0
6    Spain   NaN  52000.0
7   France  48.0  79000.0
8  Germany  50.0  83000.0
9   France  37.0  67000.0

print(Y)

0     No
1    Yes
2     No
3     No
4    Yes
5    Yes
6     No
7    Yes
8     No
9    Yes
Name: Purchased, dtype: object
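Note that .values is not used when slicing the dataset, so X stays a pandas DataFrame and Y a pandas Series; this is what lets the imputation step below select columns with .iloc. A quick check, not part of the original notebook:

print(type(X), type(Y))  # DataFrame and Series respectively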

Taking care of missing data


from sklearn.impute import SimpleImputer
imputer = SimpleImputer(missing_values=np.nan, strategy='mean')
imputer.fit(X.iloc[:, 1:3])
X.iloc[:, 1:3] = imputer.transform(X.iloc[:, 1:3])
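As a quick sanity check (not part of the original notebook), you can confirm that the two numeric columns no longer contain missing values; strategy='median' is a common alternative to the mean when a column contains outliers:

print(X.iloc[:, 1:3].isna().sum())  # Age and Salary should both report 0 missing values
# alternative, not used here: SimpleImputer(missing_values=np.nan, strategy='median')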

Encoding categorical data

Encoding the Independent Variable

from sklearn.compose import ColumnTransformer
from sklearn.preprocessing import OneHotEncoder
ct = ColumnTransformer(transformers=[('encoder', OneHotEncoder(), [0])], remainder='passthrough')
X = np.array(ct.fit_transform(X))

print(X)

[[1.00000000e+00 0.00000000e+00 0.00000000e+00 4.40000000e+01 7.20000000e+04]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00 2.70000000e+01 4.80000000e+04]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 3.00000000e+01 5.40000000e+04]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00 3.80000000e+01 6.10000000e+04]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 4.00000000e+01 6.37777778e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 3.50000000e+01 5.80000000e+04]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00 3.87777778e+01 5.20000000e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 4.80000000e+01 7.90000000e+04]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 5.00000000e+01 8.30000000e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 3.70000000e+01 6.70000000e+04]]
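The first three columns of the encoded array are the one-hot dummy variables for Country, in the encoder's sorted category order (France, Germany, Spain), followed by the untouched Age and Salary columns. A minimal sketch using the fitted ct from above (not part of the original notebook) makes that ordering explicit:

print(ct.named_transformers_['encoder'].categories_)  # [array(['France', 'Germany', 'Spain'], dtype=object)]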

Encoding the Dependent Variable

from sklearn.preprocessing import LabelEncoder
le = LabelEncoder()
Y = le.fit_transform(Y)

print(Y)

[0 1 0 0 1 1 0 1 0 1]
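LabelEncoder assigns integer codes in sorted label order, so 'No' maps to 0 and 'Yes' maps to 1. A quick check with the fitted le from above (not part of the original notebook):

print(le.classes_)  # ['No' 'Yes'] -- the index in this array is the encoded value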

Splitting the dataset into the Training set and Test set

from sklearn.model_selection import train_test_split
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.2, random_state=1)

print(X_train)

[[0.00000000e+00 0.00000000e+00 1.00000000e+00 3.87777778e+01 5.20000000e+04]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 4.00000000e+01 6.37777778e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 4.40000000e+01 7.20000000e+04]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00 3.80000000e+01 6.10000000e+04]
 [0.00000000e+00 0.00000000e+00 1.00000000e+00 2.70000000e+01 4.80000000e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 4.80000000e+01 7.90000000e+04]
 [0.00000000e+00 1.00000000e+00 0.00000000e+00 5.00000000e+01 8.30000000e+04]
 [1.00000000e+00 0.00000000e+00 0.00000000e+00 3.50000000e+01 5.80000000e+04]]

print(Y_train)

[0 1 0 0 1 1 0 1]

print(X_test)

[[0.0e+00 1.0e+00 0.0e+00 3.0e+01 5.4e+04]
 [1.0e+00 0.0e+00 0.0e+00 3.7e+01 6.7e+04]]

print(Y_test)

[0 1]
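The split keeps 80% of the rows for training and 20% for testing, and random_state=1 makes it reproducible. If the target classes were imbalanced, a stratified split would preserve the Yes/No ratio in both sets; a minimal sketch, not used in the original notebook (the variable names here are illustrative):

X_tr, X_te, Y_tr, Y_te = train_test_split(X, Y, test_size=0.2, random_state=1, stratify=Y)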

Feature Scaling

from sklearn.preprocessing import StandardScaler
sc = StandardScaler()
X_train[:, 3:] = sc.fit_transform(X_train[:, 3:])
X_test[:, 3:] = sc.transform(X_test[:, 3:])

print(X_train)

[[ 0.  0.  1. -0.19159184 -1.07812594]
 [ 0.  1.  0. -0.01411729 -0.07013168]
 [ 1.  0.  0.  0.56670851  0.63356243]
 [ 0.  0.  1. -0.30453019 -0.30786617]
 [ 0.  0.  1. -1.90180114 -1.42046362]
 [ 1.  0.  0.  1.14753431  1.23265336]
 [ 0.  1.  0.  1.43794721  1.57499104]
 [ 1.  0.  0. -0.74014954 -0.56461943]]

print(X_test)

[[ 0.  1.  0. -1.46618179 -0.9069571 ]
 [ 1.  0.  0. -0.44973664  0.20564034]]
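Only columns 3 onwards (Age and Salary) are scaled, so the one-hot dummy variables keep their 0/1 values, and the scaler is fitted on the training set only so that no test-set statistics leak into training. As a quick sanity check (not part of the original notebook), the fitted sc can undo the scaling:

print(sc.inverse_transform(X_test[:, 3:]))  # should recover the original Age and Salary values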
