
Regression Analysis Hands-On 1

This document introduces a hands-on exercise for linear regression using the Boston housing dataset. The dataset is imported and the first five rows are displayed. The relationship between housing price (the target variable) and average number of rooms per dwelling (the feature 'RM') is analyzed using simple linear regression. The fitted model is evaluated and an R-squared value of 0.48 is reported, indicating the model explains 48% of the variation in housing prices.

Uploaded by prathyusha tammu
Copyright © All Rights Reserved

Welcome to the first hands-on exercise on linear regression.

In this exercise, you will try out simple linear regression using statsmodels, which
you have learnt in the course. We have created this Python notebook with everything
needed to complete the exercise.

To run the code in a cell, click on the cell and press Shift + Enter.

Run the cell below to import the data and view the first five rows of the dataset.

In this hands-on exercise we use the Boston housing price dataset.

The data import has been done for you.
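# Note: load_boston was deprecated in scikit-learn 1.0 and removed in 1.2,
# so this cell needs scikit-learn < 1.2.
from sklearn.datasets import load_boston
import pandas as pd

boston = load_boston()
# One column per attribute (CRIM, ZN, ..., LSTAT)
dataset = pd.DataFrame(data=boston.data, columns=boston.feature_names)
# Append the housing price as the 'target' column
dataset['target'] = boston.target
print(dataset.head())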
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0

   PTRATIO       B  LSTAT  target
0     15.3  396.90   4.98    24.0
1     17.8  396.90   9.14    21.6
2     17.8  392.83   4.03    34.7
3     18.7  394.63   2.94    33.4
4     18.7  396.90   5.33    36.2
Follow the steps in sequence to extract features and target

From the above output you can see the various attributes of the dataset.
The 'target' column holds the dependent variable (the housing price, i.e. the median
value of owner-occupied homes in $1000s), and the remaining columns are independent
variables that may help explain it.
Let's find the relationship between 'housing price' and 'average number of rooms per
dwelling' using statsmodels.
Assign the values of the column 'RM' (average number of rooms per dwelling) to the
variable X.
Similarly, assign the values of the 'target' (housing price) column to the variable Y.
Sample code: values = data_frame['attribute_name']
###Start code here
X = dataset['RM']
Y = dataset['target']
###End code(approx 2 lines)
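The cell above uses single brackets, which return a one-dimensional pandas Series. A small sketch of the difference from double brackets, which return a one-column DataFrame, using only the RM and target values of the five rows printed earlier (the names `sample`, `X_series` and `X_frame` are illustrative, not part of the exercise). Either shape can be passed to `sm.add_constant` and `sm.OLS`.

```python
import pandas as pd

# RM and target values of the five rows shown by dataset.head()
sample = pd.DataFrame({
    "RM": [6.575, 6.421, 7.185, 6.998, 7.147],
    "target": [24.0, 21.6, 34.7, 33.4, 36.2],
})

X_series = sample["RM"]    # single brackets -> 1-D Series
X_frame = sample[["RM"]]   # double brackets -> 2-D DataFrame with one column
Y = sample["target"]

print(type(X_series).__name__, X_series.shape)
print(type(X_frame).__name__, X_frame.shape)
```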
Import the package

import statsmodels.api as sm
###Start code here
import statsmodels.api as sm

###End code(approx 1 line)


Follow the steps in sequence to initialise and fit the model

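Initialise the OLS model by passing the target (Y) and the attribute (X), and assign
the model to the variable 'statsModel'. Note that sm.OLS does not add an intercept on
its own, so first add a constant column to X with sm.add_constant(X); without it the
fitted line is forced through the origin.
Fit the model and assign the result to the variable 'fittedModel'.
Sample code for initialization: sm.OLS(target, attribute)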
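###Start code here
# sm.OLS has no intercept by default, so prepend a 'const' column of ones
X = sm.add_constant(X)

# Ordinary least squares with Y as the dependent variable and X as the regressors
statsModel = sm.OLS(Y, X)

# Estimate the coefficients; the results object provides params, rsquared, summary()
fittedModel = statsModel.fit()
###End code(approx 2 lines)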
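`sm.add_constant` prepends a column of ones named `const`, which is what lets OLS estimate an intercept. As a sketch of what the fit computes (not statsmodels' internal code), the same least-squares problem can be solved with NumPy on the five sample rows, once with the column of ones and once without. Five rows give different coefficients from the full 506-row fit, so only the mechanics carry over.

```python
import numpy as np

# RM and target values of the five rows shown by dataset.head()
rm = np.array([6.575, 6.421, 7.185, 6.998, 7.147])
price = np.array([24.0, 21.6, 34.7, 33.4, 36.2])

# Design matrix with an intercept column of ones, like sm.add_constant(X)
X_const = np.column_stack([np.ones_like(rm), rm])
coef_const, *_ = np.linalg.lstsq(X_const, price, rcond=None)

# Design matrix without the constant: the line is forced through the origin
X_no_const = rm.reshape(-1, 1)
coef_no_const, *_ = np.linalg.lstsq(X_no_const, price, rcond=None)

print("with constant:    intercept=%.3f slope=%.3f" % tuple(coef_const))
print("without constant: slope=%.3f" % coef_no_const[0])
```

Without the constant, the single coefficient is the sum of x·y divided by the sum of x², the slope of a line through the origin, which is why the intercept column matters for this model.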
Print Summary

Print the summary of fittedModel using the summary() function


###Start code here
print(fittedModel.summary())

###End code(approx 1 line)


                            OLS Regression Results
==============================================================================
Dep. Variable:                target   R-squared:                        0.484
Model:                           OLS   Adj. R-squared:                   0.483
Method:                Least Squares   F-statistic:                      471.8
Date:               Mon, 13 Sep 2021   Prob (F-statistic):            2.49e-74
Time:                       15:09:51   Log-Likelihood:                 -1673.1
No. Observations:                506   AIC:                              3350.
Df Residuals:                    504   BIC:                              3359.
Df Model:                          1
Covariance Type:           nonrobust
==============================================================================
                 coef    std err          t      P>|t|     [0.025     0.975]
------------------------------------------------------------------------------
const        -34.6706      2.650    -13.084      0.000    -39.877    -29.465
RM             9.1021      0.419     21.722      0.000      8.279      9.925
==============================================================================
Omnibus:                     102.585   Durbin-Watson:                    0.684
Prob(Omnibus):                 0.000   Jarque-Bera (JB):               612.449
Skew:                          0.726   Prob(JB):                     1.02e-133
Kurtosis:                      8.190   Cond. No.                          58.4
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
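Reading the `coef` column, the fitted line is target ≈ -34.6706 + 9.1021 × RM, with target in the dataset's price units: each additional room per dwelling is associated with a predicted price about 9.10 higher. A sketch that plugs the first row's RM (6.575) into that line; `predict_price` is a hypothetical helper, and in a live session `fittedModel.params` holds the same coefficients.

```python
# Coefficients copied from the OLS summary
INTERCEPT = -34.6706
SLOPE_RM = 9.1021

def predict_price(rm):
    """Predicted target for a given average number of rooms per dwelling."""
    return INTERCEPT + SLOPE_RM * rm

# First row of the dataset: RM = 6.575, target = 24.0
first_row_rm, first_row_target = 6.575, 24.0
pred = predict_price(first_row_rm)
residual = first_row_target - pred
print(f"predicted {pred:.2f}, actual {first_row_target}, residual {residual:.2f}")

# One extra room changes the prediction by exactly the slope
print(f"{predict_price(7.0) - predict_price(6.0):.4f}")
```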
Extract r_squared value

From the summary report, note down the R-squared value and assign it to the variable
'r_squared' in the cell below, rounded to 2 decimal places.
###Start code here
r_squared = 0.48  # summary reports R-squared: 0.484, rounded to 2 decimal places
###End code(approx 1 line)
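R-squared is defined as 1 - SS_res / SS_tot, the fraction of the variation of target around its mean that the fitted line accounts for; 0.484 therefore rounds to 0.48, or about 48%. A NumPy sketch of that definition on the five sample rows (so its value differs from the full-data 0.484); `r_squared_of_fit` is an illustrative name. In the notebook, `fittedModel.rsquared` holds the unrounded value, so `round(fittedModel.rsquared, 2)` avoids copying it by hand.

```python
import numpy as np

# RM and target values of the five rows shown by dataset.head()
rm = np.array([6.575, 6.421, 7.185, 6.998, 7.147])
price = np.array([24.0, 21.6, 34.7, 33.4, 36.2])

def r_squared_of_fit(x, y):
    """Fit y = b0 + b1*x by least squares and return R^2 = 1 - SS_res / SS_tot."""
    design = np.column_stack([np.ones_like(x), x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = np.sum(residuals ** 2)        # variation left unexplained by the line
    ss_tot = np.sum((y - y.mean()) ** 2)   # total variation around the mean
    return 1.0 - ss_res / ss_tot

r2 = r_squared_of_fit(rm, price)
print(round(r2, 2))
```

For simple regression with an intercept, this R-squared also equals the squared Pearson correlation between RM and target.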

Run the cell below without modifying it to save your answers.


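import hashlib
import pickle

def gethex(ovalue):
    # MD5 hex digest of the answer's string form, e.g. str(0.48) -> "0.48"
    hexresult = hashlib.md5(str(ovalue).encode())
    return hexresult.hexdigest()

def pickle_ans1(value):
    # Hash the answer, print the digest and pickle it to ans/output1.pkl
    # (the ans/ directory must already exist)
    hexresult = gethex(value)
    with open('ans/output1.pkl', 'wb') as file:
        print(hexresult)
        pickle.dump(hexresult, file)

pickle_ans1(r_squared)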
a894124cc6d5c5c71afe060d5dde0762
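The save cell stores only the MD5 hex digest of `str(r_squared)`, never the number itself, presumably so that a grader can compare digests. A self-contained sketch of that round trip, assuming a temporary directory in place of `ans/output1.pkl` so it runs anywhere. Because the digest is taken of the string form, 0.48 and 0.480 produce the same digest while 0.48 and 0.90 do not.

```python
import hashlib
import os
import pickle
import tempfile

def gethex(ovalue):
    # Same hashing as the notebook: MD5 of the value's string form
    return hashlib.md5(str(ovalue).encode()).hexdigest()

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output1.pkl")  # stand-in for ans/output1.pkl
    with open(path, "wb") as f:
        pickle.dump(gethex(0.48), f)
    with open(path, "rb") as f:
        saved = pickle.load(f)

print(saved == gethex(0.480))  # str(0.480) == "0.48", so the digests match
print(saved == gethex(0.90))   # a different answer gives a different digest
```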
