Regression Analysis Hands-On 1

This document introduces a hands-on exercise for linear regression using the Boston housing dataset. The dataset is imported and the first five rows are displayed. The relationship between housing price (the target variable) and average number of rooms per dwelling (the feature 'RM') is analyzed using simple linear regression. The fitted model is evaluated and an R-squared value of 0.48 is reported, indicating the model explains 48% of the variation in housing prices.

Uploaded by

prathyusha tammu

Welcome to the first hands-on exercise on linear regression.

In this exercise, you will try out simple linear regression using statsmodels, which
you learnt about in the course. We have created this Python notebook with everything
needed to complete the exercise.

To run the code in a cell, click on the cell and press Shift + Enter.

Run the cell below to import the data and view the first five rows of the dataset.

In this hands-on we are using the Boston housing price dataset.

The data import has been done for you.
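# load_boston was removed in scikit-learn 1.2, so this cell needs an older release
from sklearn.datasets import load_boston
import pandas as pd
boston = load_boston()  # Bunch exposing .data, .target and .feature_names
dataset = pd.DataFrame(data=boston.data, columns=boston.feature_names)
dataset['target'] = boston.target  # housing price, the dependent variable
print(dataset.head())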
      CRIM    ZN  INDUS  CHAS    NOX     RM   AGE     DIS  RAD    TAX  \
0  0.00632  18.0   2.31   0.0  0.538  6.575  65.2  4.0900  1.0  296.0
1  0.02731   0.0   7.07   0.0  0.469  6.421  78.9  4.9671  2.0  242.0
2  0.02729   0.0   7.07   0.0  0.469  7.185  61.1  4.9671  2.0  242.0
3  0.03237   0.0   2.18   0.0  0.458  6.998  45.8  6.0622  3.0  222.0
4  0.06905   0.0   2.18   0.0  0.458  7.147  54.2  6.0622  3.0  222.0

   PTRATIO       B  LSTAT  target
0     15.3  396.90   4.98    24.0
1     17.8  396.90   9.14    21.6
2     17.8  392.83   4.03    34.7
3     18.7  394.63   2.94    33.4
4     18.7  396.90   5.33    36.2

Follow the steps in sequence to extract features and target

From the above output you can see the various attributes of the dataset.
The 'target' column holds the dependent values (housing prices), and the rest of the
columns are the independent values that influence the target.
Let's find the relation between 'housing price' and 'average number of rooms per
dwelling' using statsmodels.
Assign the values of column "RM" (average number of rooms per dwelling) to variable X.
Similarly, assign the values of the 'target' (housing price) column to variable Y.
Sample code: values = data_frame['attribute_name']
###Start code here
X = dataset['RM']
Y = dataset['target']
###End code(approx 2 lines)
Import package

Import statsmodels.api as sm.
###Start code here
import statsmodels.api as sm

###End code(approx 1 line)

Follow the steps in sequence to initialise and fit the model

Initialise the OLS model by passing the target (Y) and the attribute (X). Assign the
model to variable 'statsModel'.
Fit the model and assign the result to variable 'fittedModel'.
Sample code for initialisation: sm.OLS(target, attribute)
###Start code here
X = sm.add_constant(X)  # prepend the intercept column 'const'
statsModel = sm.OLS(Y, X)  # target first, then the features
fittedModel = statsModel.fit()
###End code(approx 2 lines)
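
As a rough illustration of what the fit computes in the single-feature case, the sketch below works out the intercept, slope and R-squared with the closed-form least-squares formulas in plain Python. The function name fit_simple_ols is hypothetical and not part of statsmodels. The five data points are the RM and target values from the rows printed earlier, so the numbers will not match a fit on all 506 observations. The intercept here plays the role of the column of ones that sm.add_constant prepends to X, since sm.OLS does not add one by itself.

```python
from statistics import mean


def fit_simple_ols(x, y):
    """Least-squares fit of y = intercept + slope * x.

    Hypothetical helper for illustration; returns (intercept, slope, r_squared).
    """
    x_bar, y_bar = mean(x), mean(y)
    # Slope is the ratio of the x-y co-variation to the variation in x
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    slope = s_xy / s_xx
    # The fitted line always passes through the point of means
    intercept = y_bar - slope * x_bar
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - y_bar) ** 2 for yi in y)
    r_squared = 1.0 - ss_res / ss_tot
    return intercept, slope, r_squared


# RM and target copied from the five rows of dataset.head() (illustration only)
rm = [6.575, 6.421, 7.185, 6.998, 7.147]
price = [24.0, 21.6, 34.7, 33.4, 36.2]

intercept, slope, r_squared = fit_simple_ols(rm, price)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r_squared:.3f}")
```

With statsmodels the same quantities would be read from fittedModel.params and fittedModel.rsquared after fitting.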
Print Summary

Print the summary of fittedModel using the summary() function

###Start code here
print(fittedModel.summary())

###End code(approx 1 line)

                            OLS Regression Results
==============================================================================
Dep. Variable:                 target   R-squared:                       0.484
Model:                            OLS   Adj. R-squared:                  0.483
Method:                 Least Squares   F-statistic:                     471.8
Date:                Mon, 13 Sep 2021   Prob (F-statistic):           2.49e-74
Time:                        15:09:51   Log-Likelihood:                -1673.1
No. Observations:                 506   AIC:                             3350.
Df Residuals:                     504   BIC:                             3359.
Df Model:                           1
Covariance Type:            nonrobust
==============================================================================
                 coef    std err          t      P>|t|      [0.025      0.975]
------------------------------------------------------------------------------
const        -34.6706      2.650    -13.084      0.000     -39.877     -29.465
RM             9.1021      0.419     21.722      0.000       8.279       9.925
==============================================================================
Omnibus:                      102.585   Durbin-Watson:                   0.684
Prob(Omnibus):                  0.000   Jarque-Bera (JB):              612.449
Skew:                           0.726   Prob(JB):                    1.02e-133
Kurtosis:                       8.190   Cond. No.                         58.4
==============================================================================

Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly
specified.
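
Several numbers in the summary are tied together by simple arithmetic, and the sketch below recomputes them as a sanity check. It uses only values transcribed from the table above. The variable names and the predict_price helper are hypothetical. Small differences from the printed figures are expected because the inputs are already rounded. For a model with one feature, the t statistic is the coefficient divided by its standard error, the F statistic is the square of that t value, and the adjusted R-squared rescales 1 - R-squared by (n - 1) / (n - 2).

```python
# Values transcribed from the OLS summary (already rounded there)
n_obs = 506
r2 = 0.484
const, coef_rm, se_rm = -34.6706, 9.1021, 0.419

# t statistic of the RM coefficient: estimate divided by its standard error
t_rm = coef_rm / se_rm

# With a single feature, the overall F statistic equals t squared
f_stat = t_rm ** 2

# Adjusted R-squared with one feature: denominator is n - 2, the residual df (504)
adj_r2 = 1 - (1 - r2) * (n_obs - 1) / (n_obs - 2)


def predict_price(rooms):
    """Point prediction from the fitted line: const + coef_rm * rooms."""
    return const + coef_rm * rooms


print(round(t_rm, 2), round(f_stat, 1), round(adj_r2, 3))
# First row of the dataset has RM = 6.575 and an observed target of 24.0
print(round(predict_price(6.575), 2))
```

Read this way, the RM coefficient says that one additional room is associated with an increase of about 9.1 in the target, which the original dataset records in thousands of dollars.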
Extract r_squared value

From the summary report, note down the R-squared value and assign it to the variable
'r_squared' in the cell below, after rounding it to 2 decimal places.
###Start code here
r_squared = 0.48  # R-squared: 0.484 in the summary, rounded to 2 decimal places
###End code(approx 1 line)

Run the cell below without modifying it to save your answers

import hashlib
import pickle

def gethex(ovalue):
    # MD5 hex digest of the value's string form
    hexresult = hashlib.md5(str(ovalue).encode())
    return hexresult.hexdigest()

def pickle_ans1(value):
    hexresult = gethex(value)
    # the 'ans' directory must already exist
    with open('ans/output1.pkl', 'wb') as file:
        print(hexresult)
        pickle.dump(hexresult, file)

pickle_ans1(r_squared)
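
The save cell hashes the string form of the answer, not the float itself, so how the value is rounded decides which digest is stored. The sketch below, which uses only the standard library, illustrates this. The names md5_of_answer and save_and_reload are hypothetical, and a temporary directory stands in for the 'ans' folder.

```python
import hashlib
import os
import pickle
import tempfile


def md5_of_answer(value):
    """MD5 hex digest of str(value), mirroring what the save cell stores."""
    return hashlib.md5(str(value).encode()).hexdigest()


def save_and_reload(value, path):
    """Pickle the digest to path, then read it back (round-trip check)."""
    digest = md5_of_answer(value)
    with open(path, "wb") as fh:
        pickle.dump(digest, fh)
    with open(path, "rb") as fh:
        return pickle.load(fh)


# Temporary directory used here instead of the notebook's 'ans' folder
with tempfile.TemporaryDirectory() as tmp:
    stored = save_and_reload(0.48, os.path.join(tmp, "output1.pkl"))

# Trailing zeros vanish in the string form, so 0.480 hashes like 0.48
print(str(0.480), str(0.90))
print(stored == md5_of_answer(0.480))
# An unrounded value has a different string form, hence a different digest
print(stored == md5_of_answer(0.484))
```

Because only the digest is saved, the stored answer can be compared against an expected digest without revealing the expected value itself.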
