Linear Regression Using Python

4/18/24, 6:43 AM Linear Regression With Python - Housing data Analysis

Linear Regression with Python


My neighbor is a real estate agent and wants some help predicting housing prices for
regions in India. It would be great if I could create a model for her that takes a few
features of a house and returns an estimate of what the house would sell for.

She has asked whether I could help her out with my new data science skills. I said yes and
decided that Linear Regression might be a good way to solve this problem!

My neighbor then gave me some information about a bunch of houses in regions of India.
It is all in the data set INDIA_Housing.csv.

The data contains the following columns:

'Avg. Area Income': Avg. income of residents of the city the house is located in
'Avg. Area House Age': Avg. age of houses in the same city
'Avg. Area Number of Rooms': Avg. number of rooms for houses in the same city
'Avg. Area Number of Bedrooms': Avg. number of bedrooms for houses in the same city
'Area Population': Population of the city the house is located in
'Price': Price that the house sold at
'Address': Address of the house

Let's get started!

Check out the data


In [1]: ## Import Libraries

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline

In [2]: INDIAhousing = pd.read_csv("INDIA_Housing.csv")

INDIAhousing.head()

localhost:8888/nbconvert/html/Linear Regression With Python - Housing data Analysis .ipynb?download=false 1/18



Out[2]:
   Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price  Address
0      79545.458574             5.682861                   7.009188                          4.09     23086.800503  1.059034e+06  208 Michael Ferry Apt. 674\nLaurabury, NE 3701..
1      79248.642455             6.002900                   6.730821                          3.09     40173.072174  1.505891e+06  188 Johnson Views Suite 079\nLake Kathleen, CA...
2      61287.067179             5.865890                   8.512727                          5.13     36882.159400  1.058988e+06  9127 Elizabeth Stravenue\nDanieltown, WI 06482..
3      63345.240046             7.188236                   5.586729                          3.26     34310.242831  1.260617e+06  USS Barnett\nFPO AP 44820
4      59982.197226             5.040555                   7.839388                          4.23     26354.109472  6.309435e+05  USNS Raymond\nFPO AE 09386

In [3]: INDIAhousing.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Avg. Area Income 5000 non-null float64
1 Avg. Area House Age 5000 non-null float64
2 Avg. Area Number of Rooms 5000 non-null float64
3 Avg. Area Number of Bedrooms 5000 non-null float64
4 Area Population 5000 non-null float64
5 Price 5000 non-null float64
6 Address 5000 non-null object
dtypes: float64(6), object(1)
memory usage: 273.6+ KB

In [4]: INDIAhousing.describe()

Out[4]:
       Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price
count       5000.000000          5000.000000                5000.000000                   5000.000000      5000.000000  5.000000e+03
mean       68583.108984             5.977222                   6.987792                      3.981330     36163.516039  1.232073e+06
std        10657.991214             0.991456                   1.005833                      1.234137      9925.650114  3.531176e+05
min        17796.631190             2.644304                   3.236194                      2.000000       172.610686  1.593866e+04
25%        61480.562388             5.322283                   6.299250                      3.140000     29403.928702  9.975771e+05
50%        68804.286404             5.970429                   7.002902                      4.050000     36199.406689  1.232669e+06
75%        75783.338666             6.650808                   7.665871                      4.490000     42861.290769  1.471210e+06
max       107701.748378             9.519088                  10.759588                      6.500000     69621.713378  2.469066e+06

In [5]: INDIAhousing.columns

Out[5]: Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
       'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'],
      dtype='object')

Exploratory Data Analysis for House Price Prediction

In [6]: sns.pairplot(INDIAhousing)

Out[6]: <seaborn.axisgrid.PairGrid at 0x21385c9ef50>

In [7]: sns.distplot(INDIAhousing['Price'])

C:\Users\HP\AppData\Local\Temp\ipykernel_4772\867072288.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(INDIAhousing['Price'])

Out[7]: <Axes: xlabel='Price', ylabel='Density'>

In [8]: sns.distplot(INDIAhousing['Area Population'])

C:\Users\HP\AppData\Local\Temp\ipykernel_4772\2139820291.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(INDIAhousing['Area Population'])

Out[8]: <Axes: xlabel='Area Population', ylabel='Density'>

In [9]: sns.distplot(INDIAhousing['Avg. Area Income'])

C:\Users\HP\AppData\Local\Temp\ipykernel_4772\3131757723.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(INDIAhousing['Avg. Area Income'])


Out[9]: <Axes: xlabel='Avg. Area Income', ylabel='Density'>

In [10]: sns.distplot(INDIAhousing['Avg. Area House Age'])

C:\Users\HP\AppData\Local\Temp\ipykernel_4772\1332842614.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(INDIAhousing['Avg. Area House Age'])


Out[10]: <Axes: xlabel='Avg. Area House Age', ylabel='Density'>

In [11]: sns.distplot(INDIAhousing['Avg. Area Number of Rooms'])

C:\Users\HP\AppData\Local\Temp\ipykernel_4772\2831880010.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(INDIAhousing['Avg. Area Number of Rooms'])


Out[11]: <Axes: xlabel='Avg. Area Number of Rooms', ylabel='Density'>

In [12]: sns.distplot(INDIAhousing['Avg. Area Number of Bedrooms'])

C:\Users\HP\AppData\Local\Temp\ipykernel_4772\334197827.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(INDIAhousing['Avg. Area Number of Bedrooms'])


Out[12]: <Axes: xlabel='Avg. Area Number of Bedrooms', ylabel='Density'>

In [13]: sns.distplot(INDIAhousing['Area Population'])

C:\Users\HP\AppData\Local\Temp\ipykernel_4772\2139820291.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot(INDIAhousing['Area Population'])

Out[13]: <Axes: xlabel='Area Population', ylabel='Density'>

In [15]: sns.histplot(INDIAhousing['Price'])

Out[15]: <Axes: xlabel='Price', ylabel='Count'>

In [16]: sns.histplot(INDIAhousing['Avg. Area Income'])

Out[16]: <Axes: xlabel='Avg. Area Income', ylabel='Count'>

In [17]: sns.histplot(INDIAhousing['Avg. Area House Age'])

Out[17]: <Axes: xlabel='Avg. Area House Age', ylabel='Count'>

In [18]: sns.histplot(INDIAhousing['Avg. Area Number of Rooms'])

Out[18]: <Axes: xlabel='Avg. Area Number of Rooms', ylabel='Count'>

In [19]: sns.histplot(INDIAhousing['Avg. Area Number of Bedrooms'])

Out[19]: <Axes: xlabel='Avg. Area Number of Bedrooms', ylabel='Count'>

In [20]: sns.histplot(INDIAhousing['Area Population'])

Out[20]: <Axes: xlabel='Area Population', ylabel='Count'>

In [21]: INDIAhousing_numeric = INDIAhousing.drop(columns=['Address'])

In [22]: sns.heatmap(INDIAhousing_numeric.corr(), annot=True)

Out[22]: <Axes: >
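The heatmap annotates each cell with a Pearson correlation coefficient. As a minimal sketch of what corr() computes for a single pair of columns, the snippet below uses a small made-up sample (the numbers are illustrative and are not rows from INDIA_Housing.csv) and compares a hand computation with NumPy's np.corrcoef. The helper name pearson_r is my own.

```python
import numpy as np

# Illustrative values only; these are NOT rows from INDIA_Housing.csv.
income = np.array([60000.0, 65000.0, 70000.0, 75000.0, 80000.0])
price = np.array([0.9e6, 1.1e6, 1.2e6, 1.5e6, 1.6e6])

def pearson_r(x, y):
    """Pearson correlation: co-variation of x and y scaled by their spreads."""
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

r_manual = pearson_r(income, price)
r_numpy = float(np.corrcoef(income, price)[0, 1])
print(r_manual, r_numpy)
```

Values close to 1 or -1 indicate a strong linear relationship, and values near 0 indicate a weak one, which is how the annotated cells in the heatmap can be read.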

Training a Linear Regression Model


In [23]: X = INDIAhousing[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
                          'Avg. Area Number of Bedrooms', 'Area Population']]

Y = INDIAhousing['Price']

Split Data into Train and Test Sets


In [24]: from sklearn.model_selection import train_test_split

In [25]: X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=0.4, random_sta
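
With test_size=0.4, 40% of the rows are held out for testing and the remaining 60% are used for fitting. The sketch below imitates that idea with plain NumPy on a tiny made-up array. The function name simple_train_test_split and the seed value are my own assumptions, and scikit-learn's actual shuffling will not produce the same row order.

```python
import numpy as np

def simple_train_test_split(X, y, test_size=0.4, seed=0):
    """Shuffle row indices, then hold out a `test_size` fraction for testing."""
    rng = np.random.default_rng(seed)  # seed is a hypothetical choice
    indices = rng.permutation(len(X))
    n_test = int(round(test_size * len(X)))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]

# Tiny made-up data set: 10 rows, 2 features.
X_demo = np.arange(20, dtype=float).reshape(10, 2)
y_demo = np.arange(10, dtype=float)

X_tr, X_te, y_tr, y_te = simple_train_test_split(X_demo, y_demo)
print(X_tr.shape, X_te.shape)
```

Every row ends up in exactly one of the two sets, so the model is scored only on rows it never saw during fitting.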

Creating and Training the Linear Regression Model


In [26]: from sklearn.linear_model import LinearRegression

In [27]: lm = LinearRegression()

In [28]: lm.fit(X_train,Y_train)

Out[28]: LinearRegression()
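LinearRegression fits an ordinary least squares model: it picks the intercept and coefficients that minimise the sum of squared differences between actual and predicted targets. Here is a rough sketch of the same idea using np.linalg.lstsq on synthetic data. The data, the seed and the "true" parameters are all made up for illustration, and this is not a description of scikit-learn's internals.

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed
n = 200

# Synthetic features and a target built from known parameters plus noise.
X_syn = rng.normal(size=(n, 2))
true_coef = np.array([3.0, -2.0])
true_intercept = 5.0
y_syn = true_intercept + X_syn @ true_coef + rng.normal(scale=0.1, size=n)

# Prepend a column of ones so the intercept is estimated with the slopes.
A = np.column_stack([np.ones(n), X_syn])
beta, *_ = np.linalg.lstsq(A, y_syn, rcond=None)

intercept_hat, coef_hat = beta[0], beta[1:]
print(intercept_hat, coef_hat)
```

The recovered values should land close to the parameters used to generate the data, which is the same role lm.intercept_ and lm.coef_ play after lm.fit.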

Linear Regression Model Evaluation


In [29]: print(lm.intercept_)

-2640159.79685267

In [30]: coeff_df = pd.DataFrame(lm.coef_,X.columns,columns=['Coefficient'])


coeff_df

Out[30]: Coefficient

Avg. Area Income 21.528276

Avg. Area House Age 164883.282027

Avg. Area Number of Rooms 122368.678027

Avg. Area Number of Bedrooms 2233.801864

Area Population 15.150420

Interpreting the coefficients:

Holding all other features fixed, a 1 unit increase in Avg. Area Income is associated with an
increase of $21.53 in Price.

Holding all other features fixed, a 1 unit increase in Avg. Area House Age is associated with
an increase of $164883.28 in Price.

Holding all other features fixed, a 1 unit increase in Avg. Area Number of Rooms is
associated with an increase of $122368.68 in Price.

Holding all other features fixed, a 1 unit increase in Avg. Area Number of Bedrooms is
associated with an increase of $2233.80 in Price.

Holding all other features fixed, a 1 unit increase in Area Population is associated with an
increase of $15.15 in Price.
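
To make the "holding all other features fixed" reading concrete, here is a small sketch that rebuilds the prediction by hand: intercept plus the sum of coefficient times feature value. The intercept and coefficients are copied from Out[29] and Out[30] as displayed (so they are rounded), and the feature values are row 0 of Out[2]. The helper predict_price is my own name, not part of the notebook.

```python
# Intercept and coefficients as displayed in Out[29] and Out[30] (rounded).
intercept = -2640159.79685267
coefficients = {
    'Avg. Area Income': 21.528276,
    'Avg. Area House Age': 164883.282027,
    'Avg. Area Number of Rooms': 122368.678027,
    'Avg. Area Number of Bedrooms': 2233.801864,
    'Area Population': 15.150420,
}

# Feature values from row 0 of INDIAhousing.head().
row0 = {
    'Avg. Area Income': 79545.458574,
    'Avg. Area House Age': 5.682861,
    'Avg. Area Number of Rooms': 7.009188,
    'Avg. Area Number of Bedrooms': 4.09,
    'Area Population': 23086.800503,
}

def predict_price(features):
    """Linear model: intercept + sum(coefficient * feature value)."""
    return intercept + sum(coefficients[name] * value
                           for name, value in features.items())

base = predict_price(row0)
# Raise income by one unit and keep every other feature unchanged.
bumped = predict_price({**row0, 'Avg. Area Income': row0['Avg. Area Income'] + 1})

print(base)
print(bumped - base)  # matches the Avg. Area Income coefficient up to float rounding
```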

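Does this make sense? Probably not, because I made up this data. If you want real data to
repeat this sort of analysis, note that the Boston dataset (load_boston) was removed from
scikit-learn in version 1.2. The California housing dataset is available instead:

from sklearn.datasets import fetch_california_housing

housing = fetch_california_housing(as_frame=True)

print(housing.DESCR)

housing_df = housing.frame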

Predictions from our Model


Let's grab predictions off our test set and see how well it did!

In [31]: predictions = lm.predict(X_test)


In [32]: plt.scatter(Y_test,predictions)

Out[32]: <matplotlib.collections.PathCollection at 0x2138b335750>

Residual Histogram
In [33]: sns.distplot((Y_test-predictions),bins=50);

C:\Users\HP\AppData\Local\Temp\ipykernel_4772\1960946261.py:1: UserWarning:

`distplot` is a deprecated function and will be removed in seaborn v0.14.0.

Please adapt your code to use either `displot` (a figure-level function with
similar flexibility) or `histplot` (an axes-level function for histograms).

For a guide to updating your code to use the new functions, please see
https://gist.github.com/mwaskom/de44147ed2974457ad6372750bbe5751

sns.distplot((Y_test-predictions),bins=50);


In [34]: sns.histplot((Y_test-predictions),bins=50);
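
A residual is the actual value minus the predicted value, and the histogram simply counts how many residuals fall into each of the 50 bins. Here is a minimal sketch of that binning with np.histogram, using synthetic "actual" and "predicted" prices (all numbers and the seed are made up, not taken from the model above).

```python
import numpy as np

rng = np.random.default_rng(0)  # hypothetical seed

# Synthetic prices and predictions that miss by normally distributed noise.
actual = rng.normal(loc=1.2e6, scale=3.5e5, size=1000)
predicted = actual - rng.normal(loc=0.0, scale=1.0e5, size=1000)

residuals = actual - predicted
counts, bin_edges = np.histogram(residuals, bins=50)

# Locate the most populated bin; for well-behaved errors it sits near zero.
peak_bin = int(np.argmax(counts))
print(counts.sum(), bin_edges[peak_bin], bin_edges[peak_bin + 1])
```

A roughly symmetric, bell-shaped histogram centred near zero suggests the model's errors are not systematically too high or too low.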

Regression Evaluation Metrics


In [35]: from sklearn import metrics


In [36]: print('MAE:', metrics.mean_absolute_error(Y_test, predictions))
print('MSE:', metrics.mean_squared_error(Y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(Y_test, predictions)))

MAE: 82288.22251914942
MSE: 10460958907.21
RMSE: 102278.82922290884
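
For reference, MAE is the mean of the absolute errors, MSE is the mean of the squared errors, and RMSE is the square root of MSE, which puts it back in the same units as Price. A small sketch on made-up numbers (not the housing data) shows the three computed by hand with NumPy:

```python
import numpy as np

# Made-up targets and predictions, for illustration only.
y_true = np.array([3.0, 5.0, 2.5, 7.0])
y_pred = np.array([2.5, 5.0, 4.0, 8.0])

errors = y_true - y_pred           # 0.5, 0.0, -1.5, -1.0

mae = np.mean(np.abs(errors))      # (0.5 + 0.0 + 1.5 + 1.0) / 4
mse = np.mean(errors ** 2)         # (0.25 + 0.0 + 2.25 + 1.0) / 4
rmse = np.sqrt(mse)                # square root of the MSE

print('MAE:', mae)
print('MSE:', mse)
print('RMSE:', rmse)
```

Because the errors are squared before averaging, MSE and RMSE penalise large misses more heavily than MAE does.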

Thank You