Data Preprocessing Exercise

1. The document contains details about a student named Opie Sopyan studying Master of Computer Science at Nusa Putra University. 2. It discusses data preprocessing tasks like data cleaning, integration, reduction and transformation. Examples are given for handling missing/dirty data and combining data sources. 3. Correlation analysis is explained as a way to measure the relationship between two variables using Pearson, Spearman or contingency coefficients depending on the variable type. An example analyzes the relationship between patient weight and blood pressure using hospital data and Pearson correlation.

Uploaded by

Opie Sopyan


Name : Opie Sopyan

Student ID : 20200130005
Class : Master of Computer Science, Nusa Putra University
Course : Advanced Database
Lecturer : Dr. DINI OKTARINA DWI HANDAYANI, S.T, M.Sc.

1. Describe your understanding of preprocessing.

Data preprocessing is the work done on raw data before the data itself is used. It is used
to manage raw data (trimming, ignoring, or deleting parts of it) so that the data is in good
condition for the next stage of data processing (e.g., data mining or data analysis).

2. Explain the major tasks in data preprocessing. Describe how to handle dirty data and give
an example for each task.
a. Data Cleaning
This is a basic and important task for controlling the quality of the data. It fills in missing
values, smooths noisy data, identifies or removes outliers, and resolves inconsistencies.

Example :
I want to remove every record with an incomplete status, and fill in the remaining missing
values using a method such as the attribute mean.
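The cleaning example above (drop records with an incomplete status, then fill the remaining gaps with the mean) can be sketched in plain Python. This is only an illustration under assumptions: the record layout, the field names `status` and `rainfall`, and the helper `clean` are hypothetical, and `None` stands for a missing value.

```python
from statistics import mean

# Hypothetical records; None marks a missing value.
records = [
    {"id": 1, "status": "complete", "rainfall": 120.0},
    {"id": 2, "status": None, "rainfall": 80.0},        # incomplete status: removed
    {"id": 3, "status": "complete", "rainfall": None},  # missing value: filled with the mean
    {"id": 4, "status": "complete", "rainfall": 100.0},
]


def clean(rows, required, numeric):
    """Drop rows missing `required`, then mean-impute missing values in the `numeric` columns."""
    # Copy each kept row so the original records are left untouched.
    kept = [dict(r) for r in rows if r.get(required) is not None]
    for col in numeric:
        # The mean is taken over the rows that survived the first step.
        observed = [r[col] for r in kept if r[col] is not None]
        fill = mean(observed)
        for r in kept:
            if r[col] is None:
                r[col] = fill
    return kept


cleaned = clean(records, required="status", numeric=["rainfall"])
print(cleaned)
```

Record 2 is dropped, and record 3 receives 110.0, the mean of the values still observed after the drop. Other fill strategies (median, a per-class mean, a predicted value) would slot into the same place.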

b. Data Integration
Data integration combines multiple datasets. The data does not have to come from a single
source; it can come from several sources, data cubes, and files.

Example :
I want to build a decision tree for the rainy season in West Java, and I have a data source for
each region: sukabumi.csv, bandung.csv and garut.csv.
Each file contains several attributes, some missing data, etc. With data integration I can
combine the attributes from every data source into one dataset, which makes my data more
complete and accurate.
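A minimal sketch of that integration step, using only Python's `csv` module. The file contents, the column names (`month`, `rainfall_mm`, `humidity`) and the added `region` column are hypothetical. The strings stand in for the three files so the example runs on its own. The sketch stacks the rows over the union of all attributes and marks attributes a source lacks as `None`.

```python
import csv
import io

# Hypothetical contents of the three regional files; each has a different set of attributes.
sources = {
    "sukabumi.csv": "month,rainfall_mm\nJan,320\nFeb,290\n",
    "bandung.csv": "month,rainfall_mm,humidity\nJan,250,85\n",
    "garut.csv": "month,humidity\nJan,80\n",
}


def integrate(files):
    """Stack rows from several CSV sources into one table over the union of their attributes."""
    columns, rows = [], []
    for name, text in files.items():
        reader = csv.DictReader(io.StringIO(text))
        for col in reader.fieldnames:
            if col not in columns:
                columns.append(col)
        for row in reader:
            row["region"] = name.rsplit(".", 1)[0]  # record where each row came from
            rows.append(row)
    columns.append("region")
    # An attribute that a source does not provide becomes an explicit missing value (None).
    return columns, [{c: row.get(c) for c in columns} for row in rows]


columns, table = integrate(sources)
for row in table:
    print(row)
```

With real files, each `io.StringIO(text)` would be replaced by `open(path, newline="")`. The `None` cells produced here are exactly what the cleaning step then has to handle.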

c. Data Reduction
This task reduces the volume of the data through dimensionality reduction, numerosity
reduction, and data compression.

Example :
After we combine multiple datasets, we have many attributes and records to process, but not
every attribute is needed and some of the data is irrelevant.
With this task we eliminate the data we don't need.
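The elimination step can be sketched as attribute subset selection, one form of dimensionality reduction. Numerosity reduction and compression are not shown. In this sketch, an attribute is dropped if the analyst lists it as unneeded or if it has the same value in every record. The rows and the helper `reduce_attributes` are hypothetical.

```python
# Hypothetical integrated rows: "note" is not needed for mining and
# "province" has the same value everywhere, so neither carries useful information.
rows = [
    {"month": "Jan", "rainfall_mm": 320, "province": "West Java", "note": "ok"},
    {"month": "Feb", "rainfall_mm": 290, "province": "West Java", "note": "checked"},
    {"month": "Mar", "rainfall_mm": 310, "province": "West Java", "note": "ok"},
]


def reduce_attributes(rows, unneeded):
    """Attribute subset selection: drop listed attributes and any attribute that never varies."""
    columns = [c for c in rows[0] if c not in unneeded]
    informative = [c for c in columns if len({r[c] for r in rows}) > 1]
    return [{c: r[c] for c in informative} for r in rows]


reduced = reduce_attributes(rows, unneeded={"note"})
print(reduced)
```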

d. Data Transformation and Data Discretization

In this task the data is transformed into a form that machines (mining algorithms) can work with
more easily, for example through data normalization and concept hierarchy generation.

Example :
Regenerating the data structure after the unneeded data has been eliminated in the previous task.
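As a concrete illustration, here are two common transformations: min-max normalization to [0, 1] and equal-width discretization into labelled intervals. They are my choice of examples, not necessarily the ones the course uses, and concept hierarchy generation is not shown. The rainfall values and the helper names are hypothetical.

```python
def min_max(values, new_min=0.0, new_max=1.0):
    """Min-max normalization: map values linearly onto [new_min, new_max]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [new_min for _ in values]  # constant attribute: nothing to scale
    return [(v - lo) / (hi - lo) * (new_max - new_min) + new_min for v in values]


def equal_width_bins(values, labels):
    """Discretization: split [min, max] into len(labels) equal-width intervals."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / len(labels)
    if width == 0:
        return [labels[0] for _ in values]
    # min(...) keeps the maximum value in the last interval instead of one past it.
    return [labels[min(int((v - lo) / width), len(labels) - 1)] for v in values]


rainfall = [320, 290, 250, 100, 180]  # hypothetical monthly rainfall (mm)
normalized = min_max(rainfall)
levels = equal_width_bins(rainfall, ["low", "medium", "high"])
print(normalized)
print(levels)
```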

3. What is correlation analysis? Describe your understanding and give a real-world example.
Correlation analysis is a way to measure whether or not a relationship exists between two
variables. Statistics distinguishes four measurement scales: interval, ratio, ordinal, and
nominal.
Three correlation measures cover these scales:
- Pearson correlation is used for interval and ratio data
- Spearman rank correlation / Kendall's tau is used for ordinal data
- The contingency coefficient is used for nominal data
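Pearson's r is used in the hospital example that follows. For the other two scales, here is a minimal sketch under stated assumptions.

- Spearman's rho uses the textbook formula 1 − 6Σd²/(n(n² − 1)). That formula is exact only when there are no ties, so the function rejects tied data.
- The contingency coefficient C = √(χ²/(χ² + n)) is computed from a table of observed counts.

Kendall's tau is not shown. The data and function names are hypothetical.

```python
from math import sqrt


def spearman_no_ties(x, y):
    """Spearman's rho = 1 - 6*sum(d^2) / (n(n^2 - 1)); this formula is exact only without ties."""
    if len(set(x)) < len(x) or len(set(y)) < len(y):
        raise ValueError("tied values: use average ranks and Pearson's r on the ranks")
    n = len(x)
    rank_x = {v: i + 1 for i, v in enumerate(sorted(x))}
    rank_y = {v: i + 1 for i, v in enumerate(sorted(y))}
    d2 = sum((rank_x[a] - rank_y[b]) ** 2 for a, b in zip(x, y))
    return 1 - 6 * d2 / (n * (n * n - 1))


def contingency_coefficient(table):
    """C = sqrt(chi2 / (chi2 + n)) for a table of observed counts (rows x columns)."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (chi2 + n))


# Hypothetical ordinal data: class rank of five students in two exams (1 = best).
rho = spearman_no_ties([1, 2, 3, 4, 5], [2, 1, 3, 5, 4])
# Hypothetical nominal data: smoker (yes/no) versus hypertension (yes/no) counts.
c = contingency_coefficient([[20, 10], [10, 20]])
print(f"rho = {rho:.2f}, C = {c:.3f}")
```

Note that C never reaches 1 for a finite table. That makes it harder to compare across table sizes than Pearson's r or Spearman's rho, which both range from -1 to 1.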
Example :

At Sukabumi Hospital we have weight and blood pressure data for 10 patients, and we will
analyze whether or not there is a relationship between the two. Here is what the data looks like:

Name Weight Blood Pressure

Imas 45 130
Duloh 44 110
Amir 42 100
Umtiti 57 155
Guardiola 55 130
Acuy 50 130
Alex 45 120
Siska 58 172
Nemandja 61 180
Ratu 60 160

Because both variables are measured on an interval/ratio scale, we use the Pearson correlation.

After processing the data we obtain the correlation coefficient:
$r_{xy} \approx 0.92$, which indicates a strong positive relationship between weight and blood pressure.

Now we test the hypothesis. First, we formulate the hypotheses:

- H0: there is no relationship between weight and blood pressure
- H1: there is a relationship between weight and blood pressure

Next, we determine the level of significance and the degrees of freedom:

$$\alpha = 5\%, \qquad \frac{\alpha}{2} = 0.025, \qquad df = n - 2 = 10 - 2 = 8$$

Next, we compute the test statistic $t_0$:

$$t_0 = \frac{r_{xy}\sqrt{n-2}}{\sqrt{1 - r_{xy}^2}} \approx 6.82$$

Finally, we apply the testing criterion by comparing $t_0$ with the t-table value:

$$t_0 = 6.82 > t_{0.025;\,8} = 2.306 \quad \Rightarrow \quad \text{reject } H_0$$

In the end, we conclude that weight and blood pressure have a significant positive relationship.
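The calculation can be reproduced directly from the table with a short script. It computes Pearson's r and $t_0 = r\sqrt{n-2}/\sqrt{1-r^2}$. The critical value 2.306 is hard-coded from a t-table, because Python's standard library has no t-distribution quantile function. The names `pearson`, `t_statistic` and `T_CRITICAL` are illustrative.

```python
from math import sqrt

# Data from the Sukabumi Hospital table above (10 patients).
weight = [45, 44, 42, 57, 55, 50, 45, 58, 61, 60]
blood_pressure = [130, 110, 100, 155, 130, 130, 120, 172, 180, 160]


def pearson(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_statistic(r, n):
    """Test statistic for H0: rho = 0, with n - 2 degrees of freedom."""
    return r * sqrt(n - 2) / sqrt(1 - r * r)


r = pearson(weight, blood_pressure)
t0 = t_statistic(r, len(weight))
T_CRITICAL = 2.306  # t_{0.025; 8} from a t-table (two-tailed, alpha = 5%, df = 8)
reject_h0 = abs(t0) > T_CRITICAL
print(f"r = {r:.3f}, t0 = {t0:.2f}, reject H0: {reject_h0}")
```

For the ten patients in the table it prints `r = 0.924, t0 = 6.82, reject H0: True`.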

4. Suppose two stocks A and B have the following values in one week: (7, 5), (3, 8), (5, 10),
(14, 11), (6, 14). If the stocks are affected by the same industry trends, will their prices rise
or fall together?

From the data above we get the following table:

Day Stock A Stock B

1 7 5
2 3 8
3 5 10
4 14 11
5 6 14

Because the data is on a numeric (ratio) scale, we use the Pearson correlation to check whether
there is a relationship between stock A and stock B.

The result is $r_{xy} \approx 0.20$, which indicates at most a very weak relationship between
stock A and stock B.

Next we look at the coefficient (r), N, the t statistic, the degrees of freedom (DF), and the
p-value. Using Excel we obtain these results:

Coefficient (r)   0.195557
N                 5
T statistic       0.345384
DF                3
P value           0.752605

With α = 5% (two-tailed), p = 0.7526 > 0.05, so we cannot reject the hypothesis of no
relationship: there is no significant relationship between stock A and stock B.

This means that this data gives no evidence that the two prices rise or fall together when the
stocks are affected by the same industry trends.
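The Excel figures can be cross-checked without Excel. This sketch computes r, t, DF and the two-tailed p-value for the five days of data. The standard library has no Student t CDF, so the p-value integrates the t density numerically with Simpson's rule. That method is swapped in for illustration and is not what Excel does internally. The helper names `pearson`, `t_pdf` and `two_tailed_p` are illustrative.

```python
from math import gamma, pi, sqrt

stock_a = [7, 3, 5, 14, 6]
stock_b = [5, 8, 10, 11, 14]


def pearson(x, y):
    """Pearson's r from deviations about the means."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = gamma((df + 1) / 2) / (sqrt(df * pi) * gamma(df / 2))
    return c * (1 + t * t / df) ** (-(df + 1) / 2)


def two_tailed_p(t, df, steps=10_000):
    """P(|T| >= |t|) = 1 - 2 * integral of the density from 0 to |t| (Simpson's rule, even steps)."""
    b = abs(t)
    h = b / steps
    s = t_pdf(0.0, df) + t_pdf(b, df)
    for i in range(1, steps):
        s += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 1 - 2 * s * h / 3


n = len(stock_a)
r = pearson(stock_a, stock_b)
df = n - 2
t = r * sqrt(df) / sqrt(1 - r * r)
p = two_tailed_p(t, df)
print(f"r = {r:.6f}, N = {n}, t = {t:.6f}, DF = {df}, p = {p:.6f}")
```

It reproduces the Excel values in the table above (r ≈ 0.1956, t ≈ 0.3454, DF = 3, p ≈ 0.7526). With only five observations, even a moderate correlation would struggle to reach significance.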
