Predict Your Dream HDB Resale Flat in 1 Day: © Parallebs, 2018
About Us
What we do and who we are
Course Overview
1 Python basics
Setting expectations
• This course is most suitable for beginners to data analytics and
Python
• This course serves as an entry point for you to explore more
about data and programming
Discussion: Why learn programming and data?
• Digital and data skills will be a core component of any job in the
future
What is Python used for?
Applications
Hands-on coding part 1 (Basics)
Activity Time!
Recap & Application of Python basics
Hands-on coding part 2
Activity Time!
Yelp API
Concept Recap
Yelp API
Basic
Stand to Win $5,000 Cash!
Yelp Data Set Challenge!
Basics of Data Analytics
Concepts
Data Analytics Basics
Framing the problem statement for HDB houses
Ultimate goal
Predict a resale price that is as close as possible to the actual resale price, given all the data
you have about a particular unit
3 Training – “creating the model”
4 Evaluating/Testing – evaluate the model
Summary of data analytics steps
Dataset
Broadly understand the data you have now
• Statistical analysis
• Visualisations
Cleaned dataset
Training – “creating the model”
• Linear Regression
Results
1 Hands-on coding part 3 (Visualisations)
2 Preprocessing – getting the data ready
Categorical
• Nominal – No scale
  • Location
  • Brand
• Ordinal
  • Bad, Normal, Best
  • 1-5 stars
Continuous
• Square Feet
• Height
• Prices
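A minimal sketch of how the three feature types could be turned into numbers before training. The rows, the column names (`town`, `condition`) and the ordinal scale are made-up assumptions for illustration, not values from the course dataset. Nominal values get one 0/1 column per category, ordinal values get a rank, and continuous values are kept as they are.

```python
# Hypothetical toy rows: names and values are invented for illustration only.
flats = [
    {"town": "Bedok", "condition": "Normal", "floor_area_sqm": 67.0},
    {"town": "Tampines", "condition": "Best", "floor_area_sqm": 92.0},
    {"town": "Bedok", "condition": "Bad", "floor_area_sqm": 45.0},
]

# Ordinal: the categories have an order, so map them to increasing ranks.
ORDINAL_SCALE = {"Bad": 0, "Normal": 1, "Best": 2}


def one_hot(value, categories):
    """Nominal: no order, so use one 0/1 column per category."""
    return [1 if value == category else 0 for category in categories]


def encode(rows):
    """Return the list of towns and one numeric feature list per row."""
    towns = sorted({row["town"] for row in rows})
    encoded = []
    for row in rows:
        features = one_hot(row["town"], towns)             # nominal
        features.append(ORDINAL_SCALE[row["condition"]])   # ordinal
        features.append(row["floor_area_sqm"])             # continuous
        encoded.append(features)
    return towns, encoded


towns, X = encode(flats)
for features in X:
    print(features)
```

Each output row has one column per town, followed by the condition rank and the floor area.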
Data Analytics Basics
Success stories of implementing data analytics
3 Training – creating an equation
• Imagine you have a dataset of 2 columns, floor area and resale price
• You would like to predict the price of another flat when it comes in (e.g. one with 80 sqm)
• You can predict the price of the 80 sqm flat by creating an “equation” with your current data
• How do you determine which line is the best “equation” to describe this data?
[Figure: scatter plot of Resale price (250000 to 750000) against Floor_area_sqm (30 to 80)]
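As a rough illustration of creating an “equation” from two columns and using it on an 80 sqm flat, here is a small sketch. The four data points are made up for the example and are not taken from the HDB dataset, and `fit_line` is a hypothetical helper name. It uses the standard least-squares formulas for a straight line.

```python
def fit_line(xs, ys):
    """Fit y = m*x + c by ordinary least squares and return (m, c)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: how x and y vary together, divided by how much x varies.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: the fitted line passes through the point of means.
    c = mean_y - m * mean_x
    return m, c


# Made-up toy data (assumption): floor area in sqm and resale price.
floor_area_sqm = [40, 50, 60, 70]
resale_price = [300000, 350000, 420000, 450000]

m, c = fit_line(floor_area_sqm, resale_price)
predicted_80 = m * 80 + c
print(f"price = {m:.0f} * floor_area_sqm + {c:.0f}")
print(f"predicted price for 80 sqm: {predicted_80:.0f}")
```

With real data the points will not sit neatly on a line, which is why a rule for choosing the “best” line is needed.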
3 Training – creating an equation
High-level understanding of the mathematical derivation of the “line of best fit” – least squares
1 Assume the equation of the ideal line is $y_{pred} = m x_{actual} + c$
[Figure: scatter plot of Resale price against Floor_area_sqm]
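For readers who want the next steps of the derivation, a brief sketch follows. It assumes the usual least-squares setup: $n$ data points $(x_i, y_i)$, with $\bar{x}$ and $\bar{y}$ denoting the means of the floor areas and prices. These symbols are introduced here for illustration and do not appear on the slide. The idea is to pick $m$ and $c$ so that the sum of squared gaps between actual and predicted prices is as small as possible.

```latex
\begin{align}
S(m, c) &= \sum_{i=1}^{n} \left( y_i - (m x_i + c) \right)^2 \\
\frac{\partial S}{\partial c} = 0 \;&\Rightarrow\; c = \bar{y} - m \bar{x} \\
\frac{\partial S}{\partial m} = 0 \;&\Rightarrow\; m = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{\sum_{i=1}^{n} (x_i - \bar{x})^2}
\end{align}
```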
3 Training – creating an equation
[Figure: Resale price against Floor_area_sqm (30 to 55), with three candidate lines: y = 0x, y = 10000x and y = 20000x]

Predicted resale price under each candidate line:

          y = 0x                y = 10000x    y = 20000x
x = 30    0                     300k          600k
x = 40    0                     400k          800k
x = 50    0                     500k          1m
MSE       1.666666667 × 10^11   0             1.666666667 × 10^11
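The comparison of the three candidate lines can be reproduced with a short sketch. The actual prices used here (300k, 400k and 500k) are an assumption inferred from the slide, where the middle line has an MSE of 0. The `mse` helper is a hypothetical name written for this example.

```python
def mse(actual, predicted):
    """Mean squared error: the average of the squared prediction gaps."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


floor_area_sqm = [30, 40, 50]
# Assumption: actual prices lie exactly on y = 10000x, as the slide's MSE of 0 implies.
actual_price = [300_000, 400_000, 500_000]

results = {}
for slope in (0, 10_000, 20_000):
    predicted = [slope * x for x in floor_area_sqm]
    results[slope] = mse(actual_price, predicted)
    print(f"y = {slope}x -> MSE = {results[slope]:.4g}")

# The "best" line is the candidate with the smallest MSE.
best_slope = min(results, key=results.get)
print(f"best candidate: y = {best_slope}x")
```

The lines y = 0x and y = 20000x miss each point by the same distance in opposite directions, so they end up with the same MSE.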
3 Training – creating an equation
Multiple Linear Regression
$y = b_0 x_0 + b_1 x_1 + b_2 x_2 + \dots + c$
E.g.
Predicted Price = 1270(floor_area_sqm) + 5839(remaining_lease) + 738.4
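To show how such an equation is used once it has been trained, here is a small sketch that plugs one flat into the example coefficients from the slide. The example flat (80 sqm, 70 for remaining lease) is a hypothetical input, and treating `remaining_lease` as a number of years is an assumption.

```python
# Coefficients and intercept copied from the slide's example equation.
COEFFICIENTS = {"floor_area_sqm": 1270, "remaining_lease": 5839}
INTERCEPT = 738.4


def predict_price(features):
    """Multiply each feature by its coefficient, sum up, and add the intercept."""
    weighted = sum(COEFFICIENTS[name] * features[name] for name in COEFFICIENTS)
    return weighted + INTERCEPT


# Hypothetical flat; remaining_lease is assumed to be in years.
example_flat = {"floor_area_sqm": 80, "remaining_lease": 70}
price = predict_price(example_flat)
print(f"predicted price: {price:.1f}")
```

Each coefficient can be read as the change in predicted price when that feature increases by one unit and the others stay the same.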
4 Evaluate the equation
Data Preparation
• Predictive analytics aims to predict the future when future data comes in
• To simulate future data, we split today’s data and assume a part of the data is tomorrow’s data
• Using the test data to evaluate the equation allows the evaluation to be unbiased
Training Data | Validation & Test Data
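The splitting idea can be sketched in a few lines. This is a simplified illustration that only makes a training part and a test part. The 80/20 ratio, the fixed seed and the function name are assumptions, not settings from the course.

```python
import random


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then hold out a fraction as 'tomorrow's data'."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed keeps the split repeatable
    n_test = int(round(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


# Stand-in for ten rows of a dataset (assumption: any list of records works).
rows = list(range(10))
train_rows, test_rows = train_test_split(rows)
print("train:", train_rows)
print("test: ", test_rows)
```

The equation is created using only `train_rows`. The rows in `test_rows` are touched only when measuring how well it predicts.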
Data Analytics Basics
Getting the intuition of data analytics with real-life examples
Ultimate goal
• Playback experience (might be subjective)
Data points
• Devices, buffering time, network throughput
Expanding your learning
Dataset
Broadly understand the data you have now
• Statistical analysis
• Visualisations
Training – “creating the model”
• Linear Regression
• .. Other models (Ridge, Boosted trees, Neural nets)
Model/Equation
Evaluating/Testing – evaluate the model
• Mean absolute error
• .. Other metrics (MSE, RMSE ..)
Results
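The evaluation metrics named above can be written out by hand to see what each one measures. The actual and predicted prices below are made-up numbers for illustration, and the function names are hypothetical.

```python
import math


def mean_absolute_error(actual, predicted):
    """MAE: the average size of the prediction gap, in the same unit as the price."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mean_squared_error(actual, predicted):
    """MSE: the average squared gap, which penalises large misses more heavily."""
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """RMSE: the square root of MSE, back in the same unit as the price."""
    return math.sqrt(mean_squared_error(actual, predicted))


# Made-up test-set prices and model predictions (assumption).
actual = [300_000, 400_000, 500_000]
predicted = [310_000, 390_000, 520_000]

mae = mean_absolute_error(actual, predicted)
mse = mean_squared_error(actual, predicted)
rmse = root_mean_squared_error(actual, predicted)
print(f"MAE:  {mae:.2f}")
print(f"MSE:  {mse:.2f}")
print(f"RMSE: {rmse:.2f}")
```

MAE and RMSE are both in dollars, so they are easier to explain than MSE. RMSE is at least as large as MAE because it weights the biggest misses more.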