Exercise4 Solution
Essential Libraries
Let us begin by importing the essential Python Libraries.
# Basic Libraries
import numpy as np
import pandas as pd
import seaborn as sb
import matplotlib.pyplot as plt # we only need pyplot
sb.set() # set the default Seaborn style for graphics
The dataset is train.csv; hence we use the read_csv function from Pandas.
Immediately after importing, take a quick look at the data using the head function.
houseData = pd.read_csv('train.csv')
houseData.head()
[5 rows x 81 columns]
houseGrLivArea = pd.DataFrame(houseData['GrLivArea'])
houseSalePrice = pd.DataFrame(houseData['SalePrice'])
[Figure: joint plot of SalePrice against GrLivArea]
Import the LinearRegression model from sklearn.linear_model.
linreg.fit(houseGrLivArea_train, houseSalePrice_train)
LinearRegression()
Intercept : b = [16608.46906887]
Coefficients : a = [[108.93750382]]
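The intercept and coefficient printed above are attributes of the fitted model. The following is a minimal sketch of the fit-and-print step, run on a tiny synthetic dataset rather than train.csv. The toy values and the names X_toy and y_toy are assumptions made for illustration only.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical toy data that follows SalePrice = 100 * GrLivArea + 20000 exactly
X_toy = pd.DataFrame({'GrLivArea': [800.0, 1200.0, 1600.0, 2000.0]})
y_toy = pd.DataFrame({'SalePrice': 100.0 * X_toy['GrLivArea'] + 20000.0})

linreg = LinearRegression()   # create the model object
linreg.fit(X_toy, y_toy)      # estimate intercept and slope by least squares

# With a one-column DataFrame as the response, intercept_ is a 1-element array
# and coef_ is a 1x1 array, which is why the printed values appear in brackets
print('Intercept \t: b = ', linreg.intercept_)
print('Coefficients \t: a = ', linreg.coef_)
```

Read as a line, the fitted model is SalePrice ≈ b + a × GrLivArea, so the coefficient is the estimated change in SalePrice per unit increase in GrLivArea.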
• Perform a random train-test split on the dataset before you start with the regression.
Note: Check the preparation notebook M3 LinearRegression.ipynb for the code.
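One possible way to do the random split is scikit-learn's train_test_split. This is a sketch, not necessarily the code in the preparation notebook. The synthetic houseData below stands in for train.csv, and the 25% test share and the random_state value are assumptions.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for houseData: 200 synthetic rows instead of train.csv
rng = np.random.default_rng(0)
area = rng.uniform(500, 3000, size=200)
houseData = pd.DataFrame({
    'GrLivArea': area,
    'SalePrice': 100.0 * area + rng.normal(0, 5000, size=200),
})

houseGrLivArea = pd.DataFrame(houseData['GrLivArea'])
houseSalePrice = pd.DataFrame(houseData['SalePrice'])

# Randomly hold out 25% of the rows for testing (the 25% share is an assumption)
(houseGrLivArea_train, houseGrLivArea_test,
 houseSalePrice_train, houseSalePrice_test) = train_test_split(
    houseGrLivArea, houseSalePrice, test_size=0.25, random_state=42)

print('Train set :', houseGrLivArea_train.shape, houseSalePrice_train.shape)
print('Test set  :', houseGrLivArea_test.shape, houseSalePrice_test.shape)
```

Passing the predictor and the response to the same call keeps their rows aligned, so each training predictor row still matches its own SalePrice.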
Problem 2 : Predicting SalePrice using LotArea
Extract the required variables from the dataset, as mentioned in the problem.
housePredictor = pd.DataFrame(houseData['LotArea'])
houseSalePrice = pd.DataFrame(houseData['SalePrice'])
[Figure: joint plot of SalePrice against LotArea]
Linear Regression on SalePrice vs Predictor
# Import LinearRegression model from Scikit-Learn
from sklearn.linear_model import LinearRegression
LinearRegression()
Intercept : b = [162100.36069809]
Coefficients : a = [[1.83081318]]
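To make these numbers concrete, the fitted line can be evaluated by hand as SalePrice ≈ b + a × LotArea. The sketch below copies the intercept and coefficient from the output above. The LotArea value of 10000 and the helper name predict_sale_price are hypothetical, chosen only for illustration.

```python
# Intercept and slope copied from the printed output of the LotArea model
b = 162100.36069809
a = 1.83081318

def predict_sale_price(lot_area):
    """Evaluate the fitted line b + a * LotArea for one lot size."""
    return b + a * lot_area

lot_area = 10000   # hypothetical lot size, not a row from the dataset
print(f'Predicted SalePrice : {predict_sale_price(lot_area):.2f}')
```

Each additional unit of LotArea adds only about 1.83 to the predicted SalePrice, against about 108.94 per unit of GrLivArea in the earlier model.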
Prediction of Response based on the Predictor
# Predict SalePrice values corresponding to LotArea
houseSalePrice_test_pred = linreg.predict(housePredictor_test)
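Once the test-set predictions are available, one common check is to compare them with the actual test responses. The sketch below does this on synthetic data in place of the real train and test splits. The data-generating numbers are assumptions, and the choice of MSE and R^2 as metrics is an illustration rather than something confirmed by this solution.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical synthetic data standing in for the train and test splits
rng = np.random.default_rng(1)
x_train = rng.uniform(1000, 20000, size=150)
x_test = rng.uniform(1000, 20000, size=50)
housePredictor_train = pd.DataFrame({'LotArea': x_train})
housePredictor_test = pd.DataFrame({'LotArea': x_test})
houseSalePrice_train = pd.DataFrame(
    {'SalePrice': 2.0 * x_train + 150000 + rng.normal(0, 1000, size=150)})
houseSalePrice_test = pd.DataFrame(
    {'SalePrice': 2.0 * x_test + 150000 + rng.normal(0, 1000, size=50)})

linreg = LinearRegression()
linreg.fit(housePredictor_train, houseSalePrice_train)

# predict returns one row per test sample, in the same order as the input
houseSalePrice_test_pred = linreg.predict(housePredictor_test)

# Mean squared error and R^2 on the held-out data
mse = np.mean((houseSalePrice_test.to_numpy() - houseSalePrice_test_pred) ** 2)
r2 = linreg.score(housePredictor_test, houseSalePrice_test)
print('Mean Squared Error (MSE) \t:', mse)
print('Explained Variance (R^2) \t:', r2)
```

Computing the same two numbers on the training split gives a baseline: a test error far above the training error suggests the model does not generalize well.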
housePredictor = pd.DataFrame(houseData['TotalBsmtSF'])
houseSalePrice = pd.DataFrame(houseData['SalePrice'])
trainDF = pd.concat([housePredictor, houseSalePrice], axis=1).reindex(housePredictor.index)
sb.jointplot(data=trainDF, x='TotalBsmtSF', y='SalePrice', height=12)
[Figure: joint plot of SalePrice against TotalBsmtSF]
LinearRegression()
Intercept : b = [66517.88779439]
Coefficients : a = [[107.22623803]]
Prediction of Response based on the Predictor
# Predict SalePrice values corresponding to TotalBsmtSF
houseSalePrice_test_pred = linreg.predict(housePredictor_test)
housePredictor = pd.DataFrame(houseData['GarageArea'])
houseSalePrice = pd.DataFrame(houseData['SalePrice'])
[Figure: joint plot of SalePrice against GarageArea]
Linear Regression on SalePrice vs Predictor
# Import LinearRegression model from Scikit-Learn
from sklearn.linear_model import LinearRegression
LinearRegression()
Intercept : b = [73619.02664658]
Coefficients : a = [[227.27199019]]
Prediction of Response based on the Predictor
# Predict SalePrice values corresponding to GarageArea
houseSalePrice_test_pred = linreg.predict(housePredictor_test)
housePredictor = pd.DataFrame(houseData[['GrLivArea', 'LotArea', 'TotalBsmtSF', 'GarageArea']])
houseSalePrice = pd.DataFrame(houseData['SalePrice'])
LinearRegression()
Intercept : b = [-18973.0369382]
Coefficients : a = [[70.07255406 0.19864354 43.29137855 96.31552172]]
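With four predictors, the fitted model is the intercept plus one coefficient per column, in the order in which the columns were passed: GrLivArea, LotArea, TotalBsmtSF, GarageArea. The sketch below evaluates that sum by hand with the values printed above. The house dictionary is a hypothetical input for illustration, not a row from the dataset.

```python
# Intercept and coefficients copied from the printed output of the four-variable model
b = -18973.0369382
a = {
    'GrLivArea': 70.07255406,
    'LotArea': 0.19864354,
    'TotalBsmtSF': 43.29137855,
    'GarageArea': 96.31552172,
}

# Hypothetical house, chosen for illustration only
house = {'GrLivArea': 1500, 'LotArea': 10000, 'TotalBsmtSF': 1000, 'GarageArea': 500}

# Prediction = intercept + sum of (coefficient * feature value)
predicted = b + sum(a[name] * house[name] for name in a)
print(f'Predicted SalePrice : {predicted:.2f}')
```

The negative intercept has no direct interpretation as a price. It is simply the value of the fitted plane when all four predictors are zero, a point outside the range of realistic houses.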