Lab 1: Data Analysis with
Python
Objectives
Create a Jupyter notebook
Apply data analysis and modeling techniques to housing
price data
Answer 11 questions by implementing the solutions in Python code
Project Scenario
In this assignment, you are a Data Analyst working at a Real
Estate Investment Trust. The Trust would like to start
investing in residential real estate. You are tasked with
determining the market price of a house given a set of
features. You will analyze and predict housing prices using
attributes or features such as square footage, number of
bedrooms, number of floors, and so on. A template notebook
is provided in the lab; your job is to complete the eleven
questions. Some hints for the questions are given in the
template notebook.
Data sets
The dataset contains house sale prices for King County,
which includes Seattle.
It includes homes sold between May 2014 and May 2015.
Question
Question 1
Display the data type of each column using the DataFrame attribute dtypes.
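A minimal sketch of Question 1. The small DataFrame below is a hypothetical stand-in for the housing data (the lab's real file is not reproduced here); only the column names come from the lab.

```python
import pandas as pd

# Hypothetical stand-in for the housing data (values are made up).
df = pd.DataFrame({
    "id": [1, 2, 3],
    "price": [221900.0, 538000.0, 180000.0],
    "bedrooms": [3, 3, 2],
    "floors": [1.0, 2.0, 1.0],
})

# dtypes is an attribute, so there are no parentheses.
# It returns a Series mapping each column name to its data type.
column_types = df.dtypes
print(column_types)
```

Because dtypes is an attribute rather than a method, writing df.dtypes() would raise an error.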
Question 2
Drop the columns "id" and "Unnamed: 0" along axis 1 using the
method drop(), then use the method describe() to obtain a
statistical summary of the data.
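One possible way to approach Question 2, again on a hypothetical toy DataFrame. The two dropped column names come from the question; everything else is illustrative.

```python
import pandas as pd

# Hypothetical toy data that includes the two columns to be dropped.
df = pd.DataFrame({
    "Unnamed: 0": [0, 1, 2],
    "id": [101, 102, 103],
    "price": [200000.0, 300000.0, 400000.0],
    "bedrooms": [2, 3, 4],
})

# axis=1 tells drop() that the labels refer to columns, not rows.
df = df.drop(["id", "Unnamed: 0"], axis=1)

# describe() reports count, mean, std, min, quartiles and max per numeric column.
summary = df.describe()
print(summary)
```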
Question 3
Use the method value_counts() to count the number of houses with
each unique floor value, then use the method .to_frame() to convert
the result to a DataFrame.
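A short sketch of Question 3, assuming the column is named "floors" as in the feature list used later in the lab. The values are hypothetical.

```python
import pandas as pd

# Hypothetical floor values for six houses.
df = pd.DataFrame({"floors": [1.0, 2.0, 1.0, 1.5, 2.0, 1.0]})

# value_counts() returns a Series indexed by unique floor value,
# sorted from most to least frequent; to_frame() wraps it in a DataFrame.
floor_counts = df["floors"].value_counts().to_frame()
print(floor_counts)
```

The name of the resulting count column depends on the installed pandas version, so it is safer to access it by position than by name.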
Question 4
Use the function boxplot in the seaborn library to determine
whether houses with a waterfront view or without a waterfront view
have more price outliers.
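A sketch of Question 4 under the assumption that the waterfront column is coded 0 (no waterfront view) and 1 (waterfront view). The prices are made up, and the Agg backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical data: waterfront is assumed to be 0 or 1.
df = pd.DataFrame({
    "waterfront": [0, 0, 0, 0, 0, 0, 1, 1, 1, 1],
    "price": [300000, 320000, 350000, 370000, 400000, 1500000,
              900000, 1000000, 1100000, 1200000],
})

# One box per waterfront category; points drawn beyond the whiskers are outliers.
ax = sns.boxplot(x="waterfront", y="price", data=df)
ax.set_title("Price by waterfront view")
```

Comparing the number of points plotted beyond the whiskers in each box answers the question visually.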
Question 5
Use the function regplot in the seaborn library to determine if the
feature sqft_above is negatively or positively correlated with price.
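A sketch of Question 5 on hypothetical values. The call to corr() is an addition that is not required by the question; it is included only to confirm numerically the direction that the plot shows.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; not needed inside a notebook
import pandas as pd
import seaborn as sns

# Hypothetical data in which price rises with sqft_above.
df = pd.DataFrame({
    "sqft_above": [1000, 1500, 2000, 2500, 3000],
    "price": [200000, 310000, 390000, 520000, 600000],
})

# regplot draws a scatter plot with a fitted regression line;
# an upward slope indicates a positive correlation.
ax = sns.regplot(x="sqft_above", y="price", data=df)

# Optional numeric check: Pearson correlation between the two columns.
corr = df["sqft_above"].corr(df["price"])
print(corr)
```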
Question 6
Fit a linear regression model to predict the 'price' using the
feature 'sqft_living' then calculate the R^2.
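A minimal sketch of Question 6 using scikit-learn's LinearRegression. The data values are hypothetical; the names X, y, lm and r2 are illustrative choices, not names taken from the template notebook.

```python
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical data: price grows roughly linearly with sqft_living.
df = pd.DataFrame({
    "sqft_living": [1000, 1500, 2000, 2500, 3000],
    "price": [210000, 290000, 405000, 495000, 610000],
})

X = df[["sqft_living"]]  # double brackets keep X two-dimensional
y = df["price"]

lm = LinearRegression()
lm.fit(X, y)

# For a regressor, score() returns the R^2 on the data passed to it.
r2 = lm.score(X, y)
print(r2)
```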
Question 7
Fit a linear regression model to predict the 'price' using the list of
features:
features = ["floors", "waterfront", "lat", "bedrooms",
"sqft_basement", "view", "bathrooms", "sqft_living15",
"sqft_above", "grade", "sqft_living"]
Then calculate the R^2.
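A sketch of Question 7. The feature names come from the question, but the data is synthetic: random values with a made-up linear relationship to price, used only so the example runs on its own.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data (assumption): uniform random features and a
# price that depends on two of them plus noise.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.uniform(0, 1, size=(n, len(features))), columns=features)
df["price"] = (100000 + 300000 * df["sqft_living"] + 50000 * df["grade"]
               + rng.normal(0, 10000, n))

X = df[features]
y = df["price"]

lm = LinearRegression()
lm.fit(X, y)

# R^2 of the multi-feature model on the data it was fitted to.
r2 = lm.score(X, y)
print(r2)
```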
Question 8
Create a list of tuples. The first element of each tuple contains the
name of the estimator:
'scale'
'polynomial'
'model'
The second element of each tuple contains the model constructor:
StandardScaler()
PolynomialFeatures(include_bias=False)
LinearRegression()
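The list described above could be written as follows. The variable name estimators is an illustrative choice; the step names and constructors are those given in the question.

```python
from sklearn.linear_model import LinearRegression
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

# Each tuple pairs a step name with an estimator instance, in execution order.
estimators = [
    ("scale", StandardScaler()),
    ("polynomial", PolynomialFeatures(include_bias=False)),
    ("model", LinearRegression()),
]
print([name for name, _ in estimators])
```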
Question 9
Use the list to create a pipeline object to predict the 'price', fit the
object using the features in the list features, and calculate the R^2.
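A sketch of Question 9 that passes the list of tuples to scikit-learn's Pipeline. As before, the data is synthetic and the relationship between features and price is made up for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data (assumption), not the lab's data.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame(rng.uniform(0, 1, size=(n, len(features))), columns=features)
df["price"] = (100000 + 300000 * df["sqft_living"] + 50000 * df["grade"]
               + rng.normal(0, 10000, n))

estimators = [
    ("scale", StandardScaler()),
    ("polynomial", PolynomialFeatures(include_bias=False)),
    ("model", LinearRegression()),
]

# The pipeline applies scaling, then the polynomial expansion, then the regression.
pipe = Pipeline(estimators)
pipe.fit(df[features], df["price"])

# score() runs the transforms and returns the R^2 of the final model.
r2 = pipe.score(df[features], df["price"])
print(r2)
```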
Question 10
Create and fit a Ridge regression object using the training data, set
the regularization parameter to 0.1, and calculate the R^2 using
the test data.
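A sketch of Question 10. The question refers to training and test data without showing how they are produced, so the split below (test_size=0.15, random_state=1) is an assumption, as is the synthetic data. In scikit-learn the regularization parameter of Ridge is called alpha.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data (assumption), not the lab's data.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame(rng.uniform(0, 1, size=(n, len(features))), columns=features)
df["price"] = (100000 + 300000 * df["sqft_living"] + 50000 * df["grade"]
               + rng.normal(0, 10000, n))

# Assumed split parameters; use the split from the template notebook if it differs.
x_train, x_test, y_train, y_test = train_test_split(
    df[features], df["price"], test_size=0.15, random_state=1)

# alpha is the regularization parameter named in the question.
ridge = Ridge(alpha=0.1)
ridge.fit(x_train, y_train)

# R^2 evaluated on the held-out test data.
r2_test = ridge.score(x_test, y_test)
print(r2_test)
```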
Question 11
Perform a second-order polynomial transform on both the training
data and the test data. Create and fit a Ridge regression object
using the training data, set the regularization parameter to 0.1, and
calculate the R^2 using the test data.
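A sketch of Question 11. The split parameters and the data are assumptions, as in the other examples. The polynomial transform is fitted on the training data only and then applied unchanged to the test data.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import Ridge
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures

features = ["floors", "waterfront", "lat", "bedrooms", "sqft_basement", "view",
            "bathrooms", "sqft_living15", "sqft_above", "grade", "sqft_living"]

# Synthetic stand-in data (assumption), not the lab's data.
rng = np.random.default_rng(0)
n = 300
df = pd.DataFrame(rng.uniform(0, 1, size=(n, len(features))), columns=features)
df["price"] = (100000 + 300000 * df["sqft_living"] + 50000 * df["grade"]
               + rng.normal(0, 10000, n))

# Assumed split parameters; use the split from the template notebook if it differs.
x_train, x_test, y_train, y_test = train_test_split(
    df[features], df["price"], test_size=0.15, random_state=1)

# Second-order transform: fit on the training data, reuse it on the test data.
pr = PolynomialFeatures(degree=2)
x_train_pr = pr.fit_transform(x_train)
x_test_pr = pr.transform(x_test)

ridge = Ridge(alpha=0.1)
ridge.fit(x_train_pr, y_train)

# R^2 on the transformed test data.
r2_test = ridge.score(x_test_pr, y_test)
print(r2_test)
```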
Summary
Create a Jupyter notebook
Apply data analysis and modeling techniques to housing
price data
Q&A