0% found this document useful (0 votes)

29 views9 pages

Regression Algorithm

The document outlines the process of creating a linear regression model to predict housing prices using various features such as average area income, house age, number of rooms, and population. It includes data exploration, correlation analysis, model training, and evaluation metrics. The model's performance is assessed through predictions, scatter plots, and residual analysis.

Uploaded by

unknown2006103

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

29 views9 pages

Regression Algorithm

Uploaded by

unknown2006103

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 9

Linear Regression Models

Creating a model to predict the Housing prices

based on existing features
In [1]: import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns

In [2]: %matplotlib inline

In [3]: hs=pd.read_csv('USA_Housing.csv')
hs

Out[3]: Avg. Area

Avg. Area Avg. Area Avg. Area
Area
House Number of Number of Price Address
Income Population
Age Rooms Bedrooms

208 Michael Ferry Apt.

0 79545.458574 5.682861 7.009188 4.09 23086.800503 1.059034e+06 674\nLaurabury, NE
3701...

188 Johnson Views Suite

1 79248.642455 6.002900 6.730821 3.09 40173.072174 1.505891e+06
079\nLake Kathleen, CA...

9127 Elizabeth
2 61287.067179 5.865890 8.512727 5.13 36882.159400 1.058988e+06 Stravenue\nDanieltown,
WI 06482...

USS Barnett\nFPO AP
3 63345.240046 7.188236 5.586729 3.26 34310.242831 1.260617e+06
44820

USNS Raymond\nFPO AE
4 59982.197226 5.040555 7.839388 4.23 26354.109472 6.309435e+05
09386

... ... ... ... ... ... ... ...

USNS Williams\nFPO AP
4995 60567.944140 7.830362 6.137356 3.46 22837.361035 1.060194e+06
30153-7653

PSC 9258, Box

4996 78491.275435 6.999135 6.576763 4.02 25616.115489 1.482618e+06 8489\nAPO AA 42991-
3352

4215 Tracy Garden Suite

4997 63390.686886 7.250591 4.805081 2.13 33266.145490 1.030730e+06 076\nJoshualand, VA
01...

USS Wallace\nFPO AE
4998 68001.331235 5.534388 7.130144 5.44 42625.620156 1.198657e+06
73316

37778 George Ridges

4999 65510.581804 5.992305 6.792336 4.07 46501.283803 1.298950e+06 Apt. 509\nEast Holly, NV
2...

5000 rows × 7 columns

In [4]: hs.info() #gives total number of columns, total number of entries and type of data ty
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Avg. Area Income 5000 non-null float64
1 Avg. Area House Age 5000 non-null float64
2 Avg. Area Number of Rooms 5000 non-null float64
3 Avg. Area Number of Bedrooms 5000 non-null float64
4 Area Population 5000 non-null float64
5 Price 5000 non-null float64
6 Address 5000 non-null object
dtypes: float64(6), object(1)
memory usage: 273.6+ KB

In [5]: hs.describe() #to get the statistical information about the dataframe

Out[5]: Avg. Area Avg. Area House Avg. Area Number Avg. Area Number of Area
Price
Income Age of Rooms Bedrooms Population

count 5000.000000 5000.000000 5000.000000 5000.000000 5000.000000 5.000000e+03

mean 68583.108984 5.977222 6.987792 3.981330 36163.516039 1.232073e+06

std 10657.991214 0.991456 1.005833 1.234137 9925.650114 3.531176e+05

min 17796.631190 2.644304 3.236194 2.000000 172.610686 1.593866e+04

25% 61480.562388 5.322283 6.299250 3.140000 29403.928702 9.975771e+05

50% 68804.286404 5.970429 7.002902 4.050000 36199.406689 1.232669e+06

75% 75783.338666 6.650808 7.665871 4.490000 42861.290769 1.471210e+06

max 107701.748378 9.519088 10.759588 6.500000 69621.713378 2.469066e+06

In [6]: #in above address column is not available as it has string values.

In [7]: hs.columns #gives number of columns in the given dataframe.

Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
Out[7]:
'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'],
dtype='object')

In [8]: sns.pairplot(hs)

<seaborn.axisgrid.PairGrid at 0x21970b9e790>
Out[8]:
In [9]: #in above histogram looks like normally distributed except number of bedrooms which are 4
#around it.

In [10]: #to check distribution of the price

sns.histplot(hs['Price'])

<AxesSubplot:xlabel='Price', ylabel='Count'>
Out[10]:
In [11]: #to check the corelation between two variables use heatmap
sns.heatmap(hs.corr())
#you can see the daignol of corelation and it shows relation is perfectly ploted with so

<AxesSubplot:>
Out[11]:

In [12]: sns.heatmap(hs.corr(),annot=True)

<AxesSubplot:>
Out[12]:
In [13]: #to check the corelation between two variables
hs.corr()

Out[13]: Avg. Area Avg. Area Avg. Area Number Avg. Area Number Area
Price
Income House Age of Rooms of Bedrooms Population

Avg. Area Income 1.000000 -0.002007 -0.011032 0.019788 -0.016234 0.639734

Avg. Area House

-0.002007 1.000000 -0.009428 0.006149 -0.018743 0.452543
Age

Avg. Area Number

-0.011032 -0.009428 1.000000 0.462695 0.002040 0.335664
of Rooms

Avg. Area Number

0.019788 0.006149 0.462695 1.000000 -0.022168 0.171071
of Bedrooms

Area Population -0.016234 -0.018743 0.002040 -0.022168 1.000000 0.408556

Price 0.639734 0.452543 0.335664 0.171071 0.408556 1.000000

In [14]: # for training a linear regression model split the data into X axis that features to tra
# in this case target variable is Price column wr we are predciting the price
#here we wont deal with Address column as it has text information but in NLP we will.
hs.columns # to grab the columns

Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
Out[14]:
'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'],
dtype='object')

In [15]: #take feature variables

X=hs[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
'Avg. Area Number of Bedrooms', 'Area Population']]

In [16]: #take target variable that we trying to predict

y=hs['Price']
Splitting the data into training set and testing set(to
test the model that we have trained)
In [17]: #now do Train Test split in the data
#we need to import scikitlearn for splitting the data
from sklearn.model_selection import train_test_split

In [18]: x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=10

#tuple unpacking to grab training set and testing set
#test size is percentage of test data you want to allocate for testing our model that is
#random state means data is allocated randomly.

In [19]: from sklearn.linear_model import LinearRegression

#to import linear regression functions

In [20]: #we need to create instance for the linear regression model
lm=LinearRegression()

In [21]: lm.fit(x_train,y_train) # at first i will fit my data into model for training the data
#use shift+tab to get the broiler code

LinearRegression()
Out[21]:

evaluate our model while checking coeffcients

In [22]: #grab the intercept while calling lm
print(lm.intercept_)

-2640159.79685267

In [23]: #grab the coeffcient for each features each of this coef relates to columns
lm.coef_

array([2.15282755e+01, 1.64883282e+05, 1.22368678e+05, 2.23380186e+03,

Out[23]:
1.51504200e+01])

In [24]: X.columns

Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
Out[24]:
'Avg. Area Number of Bedrooms', 'Area Population'],
dtype='object')

In [25]: x_train.columns

Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
Out[25]:
'Avg. Area Number of Bedrooms', 'Area Population'],
dtype='object')

In [26]: #create a dataframe for better understanding on the coefficients

cdf=pd.DataFrame(lm.coef_,X.columns,columns=['Coeff'])

In [27]: cdf
Out[27]: Coeff

Avg. Area Income 21.528276

Avg. Area House Age 164883.282027

Avg. Area Number of Rooms 122368.678027

Avg. Area Number of Bedrooms 2233.801864

Area Population 15.150420

In [28]: #coefficient shows that when you hold all other features fixed and if there is one unit
#is assoicated with the increase of 21.528$ in the house price.and rest all the same.

Predicting the test set

In [37]: predictions=lm.predict(x_test)

In [38]: predictions #predicted prices for the house

array([1260960.70567627, 827588.7556033 , 1742421.24254342, ...,

Out[38]:
372191.40626916, 1365217.15140897, 1914519.5417888 ])

In [39]: y_test #actual vlaues of the house

1718 1.251689e+06
Out[39]:
2511 8.730483e+05
345 1.696978e+06
2521 1.063964e+06
54 9.487883e+05
...
1776 1.489520e+06
4269 7.777336e+05
1661 1.515271e+05
2410 1.343824e+06
2302 1.906025e+06
Name: Price, Length: 2000, dtype: float64

In [40]: #to analyse above these values draw a scatter plot using both variables
plt.scatter(y_test,predictions)

<matplotlib.collections.PathCollection at 0x2197496b880>
Out[40]:
In [ ]: #it is pretty good bcs the predicted value and actual value fits into the straight line

In [41]: #create a histogram of distribution for residuals.residuals are the diffrence between ac
#predicted values (predictions).
sns.distplot((y_test-predictions))

C:\Users\Sai Shri\anaconda3\lib\site-packages\seaborn\distributions.py:2619: FutureWarni

ng: `distplot` is a deprecated function and will be removed in a future version. Please
adapt your code to use either `displot` (a figure-level function with similar flexibilit
y) or `histplot` (an axes-level function for histograms).
warnings.warn(msg, FutureWarning)
<AxesSubplot:xlabel='Price', ylabel='Density'>
Out[41]:

In [42]: #the curve shows residuals are normally distributed it means the model selected is a per
#if it is not normaly distributed and showing weired behaviour then look back the data a
#is good choice for the data or not for the dataset

Regression Evaluation Matrics

Mean Absolute Error

Mean sqaured Error

Root Mean sqaured Error

In [43]: from sklearn import metrics

In [44]: metrics.mean_absolute_error(y_test,predictions)

82288.22251914942
Out[44]:

In [46]: metrics.mean_squared_error(y_test,predictions)

10460958907.208948
Out[46]:
In [47]: np.sqrt(metrics.mean_squared_error(y_test,predictions))

102278.82922290883
Out[47]:

Prac - 8 (1) - Jupyter Notebook
No ratings yet
Prac - 8 (1) - Jupyter Notebook
6 pages
Project Linear Regression
No ratings yet
Project Linear Regression
7 pages
DL - LR - 1.ipynb - Colab
No ratings yet
DL - LR - 1.ipynb - Colab
5 pages
ML Merged
No ratings yet
ML Merged
28 pages
01.multiple Linear Regression - Ipynb - Colaboratory
No ratings yet
01.multiple Linear Regression - Ipynb - Colaboratory
10 pages
Linear Regression Using Python
No ratings yet
Linear Regression Using Python
18 pages
Mlext
No ratings yet
Mlext
1 page
Unit 1: Shobana T S Assistant Professor Dept. of ISE, BMSCE
No ratings yet
Unit 1: Shobana T S Assistant Professor Dept. of ISE, BMSCE
127 pages
DL 1
No ratings yet
DL 1
11 pages
Machine Learning Lab
No ratings yet
Machine Learning Lab
20 pages
Linear Regression Analysis - Polynomial Regression
No ratings yet
Linear Regression Analysis - Polynomial Regression
25 pages
Project 4 - House Price Prediction - Ipynb - Colab
No ratings yet
Project 4 - House Price Prediction - Ipynb - Colab
5 pages
A09Ass04 - Jupyter Notebook
No ratings yet
A09Ass04 - Jupyter Notebook
10 pages
Kaggle Machine Learning
No ratings yet
Kaggle Machine Learning
6 pages
T2 Summary VHA
No ratings yet
T2 Summary VHA
14 pages
Document From Jahnavi
No ratings yet
Document From Jahnavi
20 pages
ML Regression
No ratings yet
ML Regression
9 pages
Expt 7
No ratings yet
Expt 7
3 pages
2 Linear Regression Multivariate
No ratings yet
2 Linear Regression Multivariate
2 pages
Report
No ratings yet
Report
40 pages
Introduction To Machine Learning (ML) With Sklearn
No ratings yet
Introduction To Machine Learning (ML) With Sklearn
10 pages
ML File
No ratings yet
ML File
37 pages
Emllab
No ratings yet
Emllab
6 pages
Python File
No ratings yet
Python File
5 pages
Exp 3 ML
No ratings yet
Exp 3 ML
3 pages
Machine Learning Project: TITLE: Predicting The Sale Price of A House Using Linear Regression
No ratings yet
Machine Learning Project: TITLE: Predicting The Sale Price of A House Using Linear Regression
20 pages
DSBDA Prac4 2
No ratings yet
DSBDA Prac4 2
1 page
ML Manual
No ratings yet
ML Manual
9 pages
Chirag HOusing Price Pred
No ratings yet
Chirag HOusing Price Pred
12 pages
Linear Reg
No ratings yet
Linear Reg
25 pages
USA Real Estate Price Prediction Using Decision Tree Regressor, and AdaBoost Regressor
No ratings yet
USA Real Estate Price Prediction Using Decision Tree Regressor, and AdaBoost Regressor
14 pages
ML Manual
No ratings yet
ML Manual
30 pages
ML Record
No ratings yet
ML Record
19 pages
Unit 3 5
No ratings yet
Unit 3 5
4 pages
Machine Learning Laboratory
No ratings yet
Machine Learning Laboratory
23 pages
The Boston Housing Dataset
100% (2)
The Boston Housing Dataset
4 pages
Exercise4 Solution
No ratings yet
Exercise4 Solution
20 pages
IoT Task4 21BEC0384
No ratings yet
IoT Task4 21BEC0384
9 pages
Experiment Number: 3: Aim:-Study of The Linear Regression in The Machine Learning Using The Boston Housing Dataset. 1)
No ratings yet
Experiment Number: 3: Aim:-Study of The Linear Regression in The Machine Learning Using The Boston Housing Dataset. 1)
14 pages
Aayushi ML File
No ratings yet
Aayushi ML File
37 pages
Linear Regression - Jupyter Notebook
No ratings yet
Linear Regression - Jupyter Notebook
2 pages
2 - Linear - Regression - Multivariate - Ipynb - Colaboratory
No ratings yet
2 - Linear - Regression - Multivariate - Ipynb - Colaboratory
4 pages
Exp - 2-EDA - CaliforniaData Set - HeatMap - PairPlot-checkpoint - Jupyter Notebook
No ratings yet
Exp - 2-EDA - CaliforniaData Set - HeatMap - PairPlot-checkpoint - Jupyter Notebook
12 pages
Injecttive Blockchain
No ratings yet
Injecttive Blockchain
14 pages
Housing Prices Linear Regression
No ratings yet
Housing Prices Linear Regression
3 pages
Data Science Record - 05
No ratings yet
Data Science Record - 05
20 pages
Predicting Housin main project ediglobe
No ratings yet
Predicting Housin main project ediglobe
4 pages
Machine Learning
No ratings yet
Machine Learning
10 pages
Data Mining Final Assignment
No ratings yet
Data Mining Final Assignment
4 pages
Price Prediction
No ratings yet
Price Prediction
4 pages
1 - Linear - Regression - Ipynb - Colaboratory
No ratings yet
1 - Linear - Regression - Ipynb - Colaboratory
7 pages
DA Lab2
No ratings yet
DA Lab2
5 pages
1 - Lab Manual (ML)
No ratings yet
1 - Lab Manual (ML)
42 pages
Faseeh Chap 2 Report
No ratings yet
Faseeh Chap 2 Report
30 pages
Setup: Chapter 2 - End-To-End Machine Learning Project
No ratings yet
Setup: Chapter 2 - End-To-End Machine Learning Project
31 pages
Ex No.: Date: Problem Statement
No ratings yet
Ex No.: Date: Problem Statement
3 pages
California Housing Price Prediction .
No ratings yet
California Housing Price Prediction .
1 page
DL Assignment 1ms24rai03
No ratings yet
DL Assignment 1ms24rai03
10 pages
CB Lab 221801017
No ratings yet
CB Lab 221801017
33 pages
Medium Sudoku Puzzle Book (Printable Version)
From Everand
Medium Sudoku Puzzle Book (Printable Version)
Sheba Blake
No ratings yet
Data Analytics v2 Brochure Skillovilla
No ratings yet
Data Analytics v2 Brochure Skillovilla
40 pages
Cap 11 Doane
No ratings yet
Cap 11 Doane
46 pages
A Overview Breif of PRIDIT
No ratings yet
A Overview Breif of PRIDIT
28 pages
DWH Tables
No ratings yet
DWH Tables
8 pages
4th Chap Variability
No ratings yet
4th Chap Variability
24 pages
Stata Treatment-Effects Reference Manual:: Release 17
No ratings yet
Stata Treatment-Effects Reference Manual:: Release 17
394 pages
Veritas College of Irosin: Schedule Lesson No.: 4-6
No ratings yet
Veritas College of Irosin: Schedule Lesson No.: 4-6
17 pages
Safeer Sajid Dissertation
No ratings yet
Safeer Sajid Dissertation
90 pages
Applying Machine Learning To Pairs Trading - Illya Barziy
No ratings yet
Applying Machine Learning To Pairs Trading - Illya Barziy
36 pages
Ekonometri
No ratings yet
Ekonometri
5 pages
HNW Service and Satisfaction Survey in Education Sector: Submitted By-Surabhi Bhat
No ratings yet
HNW Service and Satisfaction Survey in Education Sector: Submitted By-Surabhi Bhat
22 pages
Qlikview Working With Chart and Pivots
No ratings yet
Qlikview Working With Chart and Pivots
24 pages
Hmef5013 Educational Administration
100% (2)
Hmef5013 Educational Administration
2 pages
Chapter 11: Multiple Regression and Correlation: Ey Ey Ey X X Ey X X X For Which by Contrast, When X Ey X
No ratings yet
Chapter 11: Multiple Regression and Correlation: Ey Ey Ey X X Ey X X X For Which by Contrast, When X Ey X
14 pages
Process Mining and Data Stream Mining
No ratings yet
Process Mining and Data Stream Mining
19 pages
Particle SizeAnalyses
No ratings yet
Particle SizeAnalyses
41 pages
CH Multiple Linear Regression Everitt Hothorn
No ratings yet
CH Multiple Linear Regression Everitt Hothorn
12 pages
Difference Between l1 and l2 Regularisation
No ratings yet
Difference Between l1 and l2 Regularisation
4 pages
Process For Spss and Sas
No ratings yet
Process For Spss and Sas
5 pages
Abdulrafiu Aishat Omolara
No ratings yet
Abdulrafiu Aishat Omolara
65 pages
Evolution Analysis
No ratings yet
Evolution Analysis
16 pages
Universities in A Competitive Global Marketplace A Systematic Review of The Literature On Higher Education Marketing
No ratings yet
Universities in A Competitive Global Marketplace A Systematic Review of The Literature On Higher Education Marketing
24 pages
Tabel 2.6 Analisis SPSS Curah Hujan: Descriptive Statistics
No ratings yet
Tabel 2.6 Analisis SPSS Curah Hujan: Descriptive Statistics
2 pages
Reg. No.: 39110009 Colab Notebook Link: Name: Abivirshan Suresh
No ratings yet
Reg. No.: 39110009 Colab Notebook Link: Name: Abivirshan Suresh
27 pages
How Big Data Is Different
No ratings yet
How Big Data Is Different
5 pages
MISQ BI Special Issue Introduction Chen-Chiang-Storey December 2012 PDF
No ratings yet
MISQ BI Special Issue Introduction Chen-Chiang-Storey December 2012 PDF
24 pages
BSL Lessons 4.1: Quick Guide
No ratings yet
BSL Lessons 4.1: Quick Guide
4 pages
Data Analytics: Get Certified! Get Hired!
No ratings yet
Data Analytics: Get Certified! Get Hired!
38 pages
Codsoft Report
No ratings yet
Codsoft Report
26 pages
6 - Event Studies
No ratings yet
6 - Event Studies
31 pages

Regression Algorithm

Uploaded by

Regression Algorithm

Uploaded by

Linear Regression Models

Creating a model to predict the Housing prices

In [2]: %matplotlib inline

Out[3]: Avg. Area

208 Michael Ferry Apt.

188 Johnson Views Suite

... ... ... ... ... ... ... ...

PSC 9258, Box

4215 Tracy Garden Suite

37778 George Ridges

5000 rows × 7 columns

count 5000.000000 5000.000000 5000.000000 5000.000000 5000.000000 5.000000e+03

mean 68583.108984 5.977222 6.987792 3.981330 36163.516039 1.232073e+06

std 10657.991214 0.991456 1.005833 1.234137 9925.650114 3.531176e+05

min 17796.631190 2.644304 3.236194 2.000000 172.610686 1.593866e+04

25% 61480.562388 5.322283 6.299250 3.140000 29403.928702 9.975771e+05

50% 68804.286404 5.970429 7.002902 4.050000 36199.406689 1.232669e+06

75% 75783.338666 6.650808 7.665871 4.490000 42861.290769 1.471210e+06

max 107701.748378 9.519088 10.759588 6.500000 69621.713378 2.469066e+06

In [7]: hs.columns #gives number of columns in the given dataframe.

In [10]: #to check distribution of the price

Avg. Area Income 1.000000 -0.002007 -0.011032 0.019788 -0.016234 0.639734

Avg. Area House

Avg. Area Number

Avg. Area Number

Area Population -0.016234 -0.018743 0.002040 -0.022168 1.000000 0.408556

Price 0.639734 0.452543 0.335664 0.171071 0.408556 1.000000

In [15]: #take feature variables

In [16]: #take target variable that we trying to predict

In [18]: x_train, x_test, y_train, y_test = train_test_split(X, y, test_size=0.4, random_state=10

In [19]: from sklearn.linear_model import LinearRegression

evaluate our model while checking coeffcients

array([2.15282755e+01, 1.64883282e+05, 1.22368678e+05, 2.23380186e+03,

In [26]: #create a dataframe for better understanding on the coefficients

Avg. Area Income 21.528276

Avg. Area House Age 164883.282027

Avg. Area Number of Rooms 122368.678027

Avg. Area Number of Bedrooms 2233.801864

Area Population 15.150420

Predicting the test set

In [38]: predictions #predicted prices for the house

array([1260960.70567627, 827588.7556033 , 1742421.24254342, ...,

In [39]: y_test #actual vlaues of the house

C:\Users\Sai Shri\anaconda3\lib\site-packages\seaborn\distributions.py:2619: FutureWarni

Regression Evaluation Matrics

Mean Absolute Error

Mean sqaured Error

Root Mean sqaured Error

You might also like