
Predict Your Dream HDB Resale Flat In 1 Day

© Parallebs, 2018
About Us
What we do and who we are

Brandon
• Implementing RPA and AI-driven processes for SMEs
• Served as Chief Technology Officer of The Pawmeal Group
• Was a tech consultant to startups
• Ecommerce, web scraping and automated logistics models
• Undergraduate study of Business Analytics

Xue Xun
• Undergraduate specialising in Business Analytics
• Completed data analytics internships at Meltwater and Shopee
• Spearheaded several 4-month-long projects, including developing key databases and simulation models
• Coached undergraduates in basic Excel functions
Course Overview

1 Python basics
2 Introduction to popular open source libraries for data analytics
3 Introduction to Data Analytics
4 Create your own predictive model

Setting expectations
• This course is most suitable for beginners to data analytics and Python
• This course serves as an entry point for you to explore more about data and programming
Discussion: Why learn programming and data?

• Digital skills and data will be a core component of any job in the future
What is Python used for?
Applications

Python is a multi-purpose programming language

• Web Development (e.g. Django, which Instagram uses)
• Numeric/Scientific – Data Analytics and Machine Learning
• Web Scraping
• Computer Vision
• Writing scripts for automation
Hands-on coding part 1 (Basics)

Activity Time!
Recap & Application of Python basics

1. Declare 2 variables, each with a random string as its value
2. Put these two strings together
   • Try to put a space between the two strings
3. Declare a variable holding a random list of 5 values
4. Query the 4th value from this list
5. Replace the 3rd value with the string ‘Hello’
6. Replace the 2nd value with a dict (dictionary)
   • Try to query a value from that dictionary
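
One possible way through the six steps is sketched below. The variable names and values are made up for illustration; any strings and list values would do.

```python
# Steps 1-2: two string variables, joined with a space between them
first = "resale"
second = "flat"
joined = first + " " + second

# Step 3: a list holding 5 values
values = [10, 20, 30, 40, 50]

# Step 4: Python counts from 0, so the 4th value sits at index 3
fourth = values[3]

# Step 5: replace the 3rd value (index 2) with the string 'Hello'
values[2] = "Hello"

# Step 6: replace the 2nd value (index 1) with a dict, then query it by key
values[1] = {"town": "Bedok"}
town = values[1]["town"]

print(joined)
print(fourth)
print(values)
print(town)
```

The point to remember is that list positions start at 0, so the "nth" value is always at index n - 1.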

Hands-on coding part 2

Activity Time!
Yelp API

Concept Recap
Yelp API

Basic

1. Get a list of the names of all restaurants
2. Get a list of the names of all restaurants with a rating above 3.5
3. Get a list of the names of all restaurants with a distance below 1000
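
The three tasks can be sketched with list comprehensions. The snippet below does not call the Yelp API; it uses a small hand-made dictionary, and the field names ("businesses", "name", "rating", "distance") as well as the restaurants themselves are assumptions made for illustration.

```python
# Hypothetical sample shaped like a business-search response.
# The keys and the three restaurants are assumptions, not real API output.
response = {
    "businesses": [
        {"name": "Laksa House", "rating": 4.5, "distance": 350.0},
        {"name": "Kopi Corner", "rating": 3.0, "distance": 820.5},
        {"name": "Satay Street", "rating": 4.0, "distance": 1500.2},
    ]
}
restaurants = response["businesses"]

# Task 1: names of all restaurants
all_names = [r["name"] for r in restaurants]

# Task 2: names of restaurants rated above 3.5
high_rated = [r["name"] for r in restaurants if r["rating"] > 3.5]

# Task 3: names of restaurants with a distance below 1000
nearby = [r["name"] for r in restaurants if r["distance"] < 1000]

print(all_names)
print(high_rated)
print(nearby)
```

Each task is the same loop with a different condition, which is why the `if` clause is the only part that changes.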

Stand to Win $5,000 Cash!
Yelp Data Set Challenge!

• Semantic Topic Modelling to personalise ratings
• E.g. Two individuals might give a rating based on either a service or the food → Model analyses how the textual reviews classify the ratings based
Basics of Data Analytics
Concepts

Descriptive Analytics
• What has happened?
• Most common type of analytics
• While it might seem simple, the difficulty for practitioners is combining data from different sources

Predictive Analytics
• What could happen?
• Move beyond historical data
• Use models to predict events
Basics of Data Analytics
Concepts

Supervised Learning
• The model has an explicit goal
• Predicting if a customer will buy a product or not
• Predicting property prices

Unsupervised Learning
• The model has no goals, and usually finds relationships between inputs
• Clustering customer segments
Data Analytics Basics
Framing the problem statement for HDB houses

Ultimate goal
Predict a house resale price that is as close as possible to the actual resale price, given all the data you have about a particular unit

1 Broadly understand the data you have now
2 Preprocessing – getting the data ready
3 Training – “creating the model”
4 Evaluating/Testing – evaluate the model
5 Pick the best equation with the lowest error
Summary of data analytics steps

Dataset
  → Broadly understand the data you have now
      • Statistical analysis
      • Visualisations
  → Preprocessing – getting the data ready
      • Categorising independent variables
Cleaned dataset
  → split into Training dataset and Test dataset
  → Training – “creating the model”
      • Linear Regression
Model/Equation
  → Evaluating/Testing – evaluate the model
      • Mean absolute error
Results
1 Hands-on coding part 3 (Visualisations)
2 Preprocessing – getting the data ready

Cleaning – By data conversions

Categorical
• Nominal – No scale
  • Location
  • Brand
• Ordinal
  • Bad, Normal, Best
  • 1-5 stars

Continuous
• Square Feet
• Height
• Prices
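
A minimal sketch of what such conversions can look like in plain Python. The rows, the column names and the choice of techniques (an integer rank for the ordinal column, one 0/1 column per category for the nominal column) are assumptions for illustration, not necessarily the conversions used in the hands-on session.

```python
# Hypothetical rows: one nominal, one ordinal and one continuous column
flats = [
    {"town": "Bedok", "condition": "Normal", "floor_area_sqm": 67.0},
    {"town": "Yishun", "condition": "Best", "floor_area_sqm": 92.0},
    {"town": "Bedok", "condition": "Bad", "floor_area_sqm": 45.0},
]

# Ordinal: the categories have an order, so map them to increasing integers
condition_rank = {"Bad": 0, "Normal": 1, "Best": 2}

# Nominal: no order, so each category gets its own 0/1 column
towns = sorted({flat["town"] for flat in flats})


def encode(flat):
    """Turn one flat into a purely numeric row."""
    row = [flat["floor_area_sqm"], condition_rank[flat["condition"]]]
    row += [1 if flat["town"] == town else 0 for town in towns]
    return row


encoded = [encode(flat) for flat in flats]
for row in encoded:
    print(row)
```

Continuous values pass through unchanged; only the categorical columns need converting before a regression model can use them.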

Data Analytics Basics
Success Stories Of Implementation of Data Analytics

3 Training – creating an equation

Intuition behind predictive analytics

• Imagine you have a dataset of 2 columns, floor area and resale price
• You would like to predict the price of another flat when it comes in (e.g. one with 80 sqm)
• You can predict the price of the 80 sqm flat by creating an “equation” with your current data
• How do you determine which line makes the best “equation” to describe this data?

[Figure: scatter plot of Resale price against Floor_area_sqm]
3 Training – creating an equation

Intuition behind predictive analytics – Least squares

High-level understanding of the mathematical derivation of the “line of best fit” – least squares

1 Assume the equation of the ideal line is $y_{pred} = m x_{actual} + c$
2 For every $y_{pred}$, there is a $y_{actual}$
3 $Error = y_{actual} - y_{pred}$
  $Error = y_{actual} - m x_{actual} - c$
  Sum of squared errors: $SSE = (Err_1)^2 + (Err_2)^2 + \dots + (Err_n)^2$
4 Find $m$ and $c$ by solving for the minimum SSE

[Figure: scatter plot of Resale price against Floor_area_sqm with a fitted line]
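
The four steps above can be sketched in code. For a single input variable, minimising the SSE has a well-known closed-form solution, which is what `fit_line` uses; the five data points are invented for illustration and are not taken from the HDB dataset.

```python
def fit_line(xs, ys):
    """Return (m, c) that minimise the sum of squared errors for y = m*x + c."""
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    covariance = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    variance = sum((x - x_mean) ** 2 for x in xs)
    m = covariance / variance
    c = y_mean - m * x_mean
    return m, c


def sse(xs, ys, m, c):
    """Sum of squared errors between the actual and the predicted y values."""
    return sum((y - (m * x + c)) ** 2 for x, y in zip(xs, ys))


# Hypothetical training data: floor area (sqm) and resale price
floor_area_sqm = [30, 40, 50, 60, 70]
resale_price = [300_000, 340_000, 410_000, 450_000, 500_000]

m, c = fit_line(floor_area_sqm, resale_price)
predicted_80 = m * 80 + c

print(m, c)
print(predicted_80)
print(sse(floor_area_sqm, resale_price, m, c))
```

Nudging `m` or `c` away from the fitted values and calling `sse` again shows the error going up, which is what "solving for the minimum SSE" means in practice.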
3 Training – creating an equation

Intuition behind predictive analytics

3 Training – creating an equation

Intuition behind predictive analytics – Cost Function

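[Figure: candidate lines y = 0x, y = 10000x and y = 20000x plotted as Resale price against Floor_area_sqm]

Predicted price under each candidate line, and the mean squared error (MSE) of each line:

          y = 0x                 y = 10000x    y = 20000x
x = 30    0                      300k          600k
x = 40    0                      400k          800k
x = 50    0                      500k          1m
MSE       1.666666667 × 10^11    0             1.666666667 × 10^11

The MSE of 0 for y = 10000x means the actual prices lie exactly on that line, so it is the best of the three candidates.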
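
The cost comparison can be reproduced in a few lines. The sketch assumes the actual prices are the three points on y = 10000x (300k, 400k, 500k), which is what an MSE of 0 for that line implies; the function and variable names are chosen for illustration.

```python
floor_areas = [30, 40, 50]
# Assumed actual prices: the points that lie exactly on y = 10000x
actual_prices = [300_000, 400_000, 500_000]


def mse(slope):
    """Mean squared error of the line y = slope * x against the actual prices."""
    errors = [y - slope * x for x, y in zip(floor_areas, actual_prices)]
    return sum(e ** 2 for e in errors) / len(errors)


# Cost of each candidate line
costs = {slope: mse(slope) for slope in (0, 10_000, 20_000)}
best_slope = min(costs, key=costs.get)

for slope, cost in costs.items():
    print(slope, cost)
print("best slope:", best_slope)
```

The lines y = 0x and y = 20000x miss every point by the same distance in opposite directions, so squaring the errors gives them the same cost.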

3 Training – creating an equation

Multiple Linear Regression

For the training set

$y = b_0 x_0 + b_1 x_1 + b_2 x_2 + \dots + c$

E.g.
Predicted Price = 1270(floor_area_sqm) + 5839(remaining_lease) + 738.4

• Where $b_n$ is the parameter (coefficient) of the independent variable $x_n$
• $y$ is the target variable
• Each $b_n$ is statistically determined from the training data
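
To see how such an equation is used once the coefficients are known, here is a minimal sketch that plugs one flat into the example equation above. The coefficients are the ones on the slide; the flat itself (80 sqm, 70 years of lease remaining) is a made-up input.

```python
# Coefficients (b values) and intercept (c) from the example equation
COEFFICIENTS = {"floor_area_sqm": 1270, "remaining_lease": 5839}
INTERCEPT = 738.4


def predict_price(flat):
    """Apply y = b0*x0 + b1*x1 + c to one flat given as a dict of features."""
    return INTERCEPT + sum(b * flat[name] for name, b in COEFFICIENTS.items())


# Hypothetical flat, used only to show the arithmetic
flat = {"floor_area_sqm": 80, "remaining_lease": 70}
price = predict_price(flat)

print(round(price, 1))
```

Adding another independent variable only means adding another entry to `COEFFICIENTS`; the prediction code stays the same.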

4 Evaluate the equation

Data Preparation

• Predictive analytics aims to predict the future when future data comes in
• To simulate future data, we split today’s data and assume a part of the data is tomorrow’s data
• Using the test data to evaluate the equation allows the evaluation to be unbiased

Training Data
• Used to create the “equation” – otherwise known as training
• We will keep it at 80% of the data

Validation & Test Data
• Test the model to see if the equation from training works on data it has not seen
• 20% of the data
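
A minimal sketch of the 80/20 split using only the standard library. The helper name `split_rows`, the fixed seed and the ten placeholder rows are assumptions for illustration; data analytics libraries ship their own ready-made split helpers.

```python
import random


def split_rows(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows, then return (training rows, test rows)."""
    shuffled = rows[:]
    # A fixed seed makes the shuffle repeatable between runs
    random.Random(seed).shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


# Ten hypothetical flats standing in for "today's data"
rows = [{"flat_id": i, "floor_area_sqm": 30 + 5 * i} for i in range(10)]
train, test = split_rows(rows)

print(len(train), len(test))
```

Shuffling before splitting matters: if the rows are sorted (say, by date or by town), taking the last 20% without shuffling would give a test set that does not resemble the training data.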

4 Evaluate the equation

For the test set

“Equation” created from training data:


Predicted HDB price = 1300 x (remaining_lease) + 3300 x (floor_area_sqm) + 10000

Evaluate equation with test set so that it’s an unbiased evaluation
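
As a sketch of that evaluation, the snippet below applies the equation above to a small test set and reports the mean absolute error, the metric named in the summary of steps. The three test rows and their resale prices are invented for illustration.

```python
def predict(flat):
    """The "equation" created from the training data."""
    return 1300 * flat["remaining_lease"] + 3300 * flat["floor_area_sqm"] + 10000


# Hypothetical test set: rows the equation never saw during training
test_set = [
    {"remaining_lease": 60, "floor_area_sqm": 70, "resale_price": 330_000},
    {"remaining_lease": 80, "floor_area_sqm": 90, "resale_price": 400_000},
    {"remaining_lease": 50, "floor_area_sqm": 45, "resale_price": 230_000},
]

# Absolute error per flat: how far the prediction is from the actual price
absolute_errors = [abs(flat["resale_price"] - predict(flat)) for flat in test_set]
mae = sum(absolute_errors) / len(absolute_errors)

print(absolute_errors)
print(mae)
```

The result reads directly in dollars: on these rows the equation is off by that amount on average, whether it guessed too high or too low.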

Data Analytics Basics
Getting the intuition of data analytics with real-life examples

Uses machine learning for streaming quality

Ultimate goal
Playback experience (might be subjective)

Data points
Devices, buffering time, network throughput

What metric will lead to this goal
• User’s input on quality
• Bounce rates
• A quality score based on subjective metrics

Hypothesis
• Adjusting resolution
• Predictive caching
Expanding your learning

Dataset
  → Broadly understand the data you have now
      • Statistical analysis
      • Visualisations
  → Preprocessing – getting the data ready
      • Categorising independent variables
      • Normalisation
      • Data cleaning
      • Feature engineering
Cleaned dataset
  → split into Training dataset and Test dataset
  → Training – “creating the model”
      • Linear Regression
      • .. Other models (Ridge, Boosted trees, Neural nets)
Model/Equation
  → Evaluating/Testing – evaluate the model
      • Mean absolute error
      • .. Other metrics (MSE, RMSE ..)
Results
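
As a starting point for the normalisation step listed above, here is a minimal sketch of min-max scaling, one common form of normalisation. The choice of min-max scaling, the helper name and the sample floor areas are assumptions for illustration; other schemes exist.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest 1."""
    lowest, highest = min(values), max(values)
    if highest == lowest:
        # All values identical: nothing to spread out, so return zeros
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


# Hypothetical floor areas in sqm
floor_areas = [45, 67, 92, 110]
scaled = min_max_scale(floor_areas)

print([round(s, 3) for s in scaled])
```

Scaling puts columns with very different ranges, such as floor area and remaining lease, on a comparable footing, which some of the other models listed above are sensitive to.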
