Linear Regression with Python
This is mostly just code for reference.
Your neighbor is a real estate agent and wants some help predicting housing prices for regions in the USA. It would be great if you could somehow create a model for her that allows her to put in a few features of a house and returns an estimate of what the house would sell for.
She has asked you if you could help her out with your new data science skills. You say yes, and decide that Linear Regression might be a good path to solve this problem!
Your neighbor then gives you some information about a bunch of houses in regions of the United States; it is all in the data set: USA_Housing.csv.
The data contains the following columns:
'Avg. Area Income': Avg. Income of residents of the city house is located in.
'Avg. Area House Age': Avg Age of Houses in same city
'Avg. Area Number of Rooms': Avg Number of Rooms for Houses in same city
'Avg. Area Number of Bedrooms': Avg Number of Bedrooms for Houses in same city
'Area Population': Population of city house is located in
'Price': Price that the house sold at
'Address': Address for the house
Let's get started!
Check out the data
We've been able to get some data from your neighbor for housing prices as a CSV file, so let's get our environment ready with the libraries we'll need and then import the data!
Import Libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
from google.colab import files
f = files.upload()  # opens a Colab file picker; choose USA_Housing.csv
Saving USA_Housing.csv to USA_Housing.csv
Check out the Data
USAhousing = pd.read_csv('USA_Housing.csv')
USAhousing.head()
   Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price       Address
0      79545.458574             5.682861                   7.009188                          4.09     23086.800503  1.059034e+06  208 Michael …
1      79248.642455             6.002900                   6.730821                          3.09     40173.072174  1.505891e+06  188 Johns …
2      61287.067179             5.865890                   8.512727                          5.13     36882.159400  1.058988e+06  9127 … Stravenue …
USAhousing.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 7 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Avg. Area Income 5000 non-null float64
1 Avg. Area House Age 5000 non-null float64
2 Avg. Area Number of Rooms 5000 non-null float64
3 Avg. Area Number of Bedrooms 5000 non-null float64
4 Area Population 5000 non-null float64
5 Price 5000 non-null float64
6 Address 5000 non-null object
dtypes: float64(6), object(1)
memory usage: 273.6+ KB
USAhousing.describe()
       Avg. Area Income  Avg. Area House Age  Avg. Area Number of Rooms  Avg. Area Number of Bedrooms  Area Population         Price
count       5000.000000          5000.000000                5000.000000                   5000.000000      5000.000000  5.000000e+03
mean       68583.108984             5.977222                   6.987792                      3.981330     36163.516039  1.232073e+06
std        10657.991214             0.991456                   1.005833                      1.234137      9925.650114  3.531176e+05
min        17796.631190             2.644304                   3.236194                      2.000000       172.610686  1.593866e+04
25%        61480.562388             5.322283                   6.299250                      3.140000     29403.928702  9.975771e+05
50%        68804.286404             5.970429                   7.002902                      4.050000     36199.406689  1.232669e+06
75%        75783.338666             6.650808                   7.665871                      4.490000     42861.290769  1.471210e+06
USAhousing.columns
Index(['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
       'Avg. Area Number of Bedrooms', 'Area Population', 'Price', 'Address'],
      dtype='object')
EDA
Let's create some simple plots to check out the data!
sns.pairplot(USAhousing)
<seaborn.axisgrid.PairGrid at 0x7f3687566978>
sns.distplot(USAhousing['Price'])  # distplot is deprecated in recent seaborn; sns.histplot(USAhousing['Price'], kde=True) is the modern equivalent
/usr/local/lib/python3.6/dist-packages/seaborn/distributions.py:2551: FutureWarning:
warnings.warn(msg, FutureWarning)
<matplotlib.axes._subplots.AxesSubplot at 0x7f0a81e26518>
sns.heatmap(USAhousing.corr(),cmap="coolwarm",annot=True)
<matplotlib.axes._subplots.AxesSubplot at 0x7f0a80487e80>
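One caveat worth noting: on recent pandas releases, DataFrame.corr() no longer silently drops non-numeric columns like Address (pandas 2.0 raises an error). A minimal sketch of the same heatmap that works on those versions:

# On pandas >= 1.5, restrict the correlation to numeric columns explicitly
sns.heatmap(USAhousing.corr(numeric_only=True), cmap="coolwarm", annot=True)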
Training a Linear Regression Model
Let's now begin to train our regression model! We will first need to split up our data into an X array that contains the features to train on, and a y array with the target variable, in this case the Price column. We will toss out the Address column because it only has text info that the linear regression model can't use.
X and y arrays
X = USAhousing[['Avg. Area Income', 'Avg. Area House Age', 'Avg. Area Number of Rooms',
'Avg. Area Number of Bedrooms', 'Area Population']]
y = USAhousing['Price']
Train Test Split
Now let's split the data into a training set and a testing set. We will train our model on the training set and then use the test set to evaluate the model.
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=101)
Creating and Training the Model
from sklearn.linear_model import LinearRegression
lm = LinearRegression()
lm.fit(X_train,y_train)
LinearRegression(copy_X=True, fit_intercept=True, n_jobs=None, normalize=False)
Model Evaluation
https://colab.research.google.com/drive/1Hfw_WgnV6HjrEFPdL7n6OT35Z5NyJ5Zf#printMode=true 5/10
11/30/2020 01.multiple linear regression.ipynb - Colaboratory
Let's evaluate the model by checking out its coefficients and how we can interpret them.
$$y = mx + c$$
$$y = m_1 x_1 + m_2 x_2 + m_3 x_3 + m_4 x_4 + m_5 x_5 + c$$
# print the intercept
print(lm.intercept_)
-2641372.6673013503
coeff_df = pd.DataFrame(lm.coef_,X.columns,columns=['Coefficient'])
coeff_df
Coefficient
Avg. Area Income 21.617635
Avg. Area House Age 165221.119872
Avg. Area Number of Rooms 121405.376596
Avg. Area Number of Bedrooms 1318.718783
Area Population 15.225196
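To connect these numbers back to the equation above: a prediction is just the intercept plus the dot product of the coefficients with a feature row. A quick sanity check, using the lm and X_test objects already defined in this notebook:

# Reconstruct the model's prediction for the first test row by hand:
# y_hat = c + m1*x1 + m2*x2 + m3*x3 + m4*x4 + m5*x5
row = X_test.iloc[0].values
manual = lm.intercept_ + np.dot(lm.coef_, row)
print(manual, lm.predict(X_test.iloc[[0]])[0])  # the two values should match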
Interpreting the coefficients:
Holding all other features fixed, a 1 unit increase in Avg. Area Income is associated with an increase of $21.62.
Holding all other features fixed, a 1 unit increase in Avg. Area House Age is associated with an increase of $165221.12.
Holding all other features fixed, a 1 unit increase in Avg. Area Number of Rooms is associated with an increase of $121405.38.
Holding all other features fixed, a 1 unit increase in Avg. Area Number of Bedrooms is associated with an increase of $1318.72.
Holding all other features fixed, a 1 unit increase in Area Population is associated with an increase of $15.23.
Does this make sense? Probably not, because I made up this data. If you want real data to repeat this sort of analysis, check out the Boston dataset:
from sklearn.datasets import load_boston
boston = load_boston()
print(boston.DESCR)
boston_df = pd.DataFrame(boston.data, columns=boston.feature_names)  # wrap the raw feature array in a DataFrame
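Note that load_boston was deprecated in scikit-learn 1.0 and removed in 1.2. On a modern install, the California housing data makes a good substitute; a minimal sketch:

from sklearn.datasets import fetch_california_housing

# Downloads the data on first call; as_frame=True returns pandas objects
housing = fetch_california_housing(as_frame=True)
print(housing.DESCR)
housing_df = housing.frame  # features plus the MedHouseVal target column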
Predictions from our Model
Let's grab predictions off our test set and see how well it did!
predictions = lm.predict(X_test)
lm.predict([[79545.458574, 5.682861, 7.009188, 4.09, 23086.800503]])  # predict the price of a single house from its five feature values
array([1224988.39965275])
plt.scatter(y_test,predictions)
<matplotlib.collections.PathCollection at 0x7f0a7a760da0>
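A tight diagonal cloud here means the predictions track the true prices well. An optional touch-up (not in the original notebook): overlaying the ideal predictions-equal-truth line makes over- and under-prediction easier to spot:

# Redraw the scatter with a y = x reference line and axis labels
plt.scatter(y_test, predictions)
plt.plot([y_test.min(), y_test.max()], [y_test.min(), y_test.max()], 'r--')
plt.xlabel('True Price')
plt.ylabel('Predicted Price')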
Residual Histogram
sns.distplot((y_test-predictions),bins=50);
/usr/local/lib/python3.6/dist-packages/seaborn/distributions.py:2551: FutureWarning:
warnings.warn(msg, FutureWarning)
Regression Evaluation Metrics
Here are three common evaluation metrics for regression problems:
Mean Absolute Error (MAE) is the mean of the absolute value of the errors:

$$\frac{1}{n}\sum_{i=1}^{n} |y_i - \hat{y}_i|$$

Mean Squared Error (MSE) is the mean of the squared errors:

$$\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$

Root Mean Squared Error (RMSE) is the square root of the mean of the squared errors:

$$\sqrt{\frac{1}{n}\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
Comparing these metrics:
MAE is the easiest to understand, because it's the average error.
MSE is more popular than MAE, because MSE "punishes" larger errors, which tends to be
useful in the real world.
RMSE is even more popular than MSE, because RMSE is interpretable in the "y" units.
All of these are loss functions, because we want to minimize them.
from sklearn import metrics
print('MAE:', metrics.mean_absolute_error(y_test, predictions))
print('MSE:', metrics.mean_squared_error(y_test, predictions))
print('RMSE:', np.sqrt(metrics.mean_squared_error(y_test, predictions)))
MAE: 81257.55795855916
MSE: 10169125565.897552
RMSE: 100842.08231635022
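As a sanity check, the same three numbers can be computed directly from the formulas above with plain NumPy, using the y_test and predictions arrays already defined:

errors = y_test - predictions

# Each line mirrors one of the formulas above
print('MAE: ', np.mean(np.abs(errors)))
print('MSE: ', np.mean(errors**2))
print('RMSE:', np.sqrt(np.mean(errors**2)))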
# Checking the R^2 score on the training and test sets
print('Train Score: ', lm.score(X_train, y_train))
print('Test Score: ', lm.score(X_test, y_test))
Train Score: 0.9181223200568411
Test Score: 0.9176824009649299
Up next is your own Machine Learning Project!
Great Job!
Backward elimination
import statsmodels.api as sm

# Append a column of ones so the OLS model gets an intercept term
x = np.append(arr=np.ones((5000, 1)).astype(int), values=X, axis=1)
x_opt = x[:, [0, 1, 2, 3, 4, 5]]  # all five features plus the constant
regressor_OLS = sm.OLS(endog=y, exog=x_opt).fit()
print(regressor_OLS.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Price R-squared: 0.918
Model: OLS Adj. R-squared: 0.918
Method: Least Squares F-statistic: 1.119e+04
Date: Sat, 28 Nov 2020 Prob (F-statistic): 0.00
Time: 15:48:10 Log-Likelihood: -64714.
No. Observations: 5000 AIC: 1.294e+05
Df Residuals: 4994 BIC: 1.295e+05
Df Model: 5
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -2.637e+06 1.72e+04 -153.708 0.000 -2.67e+06 -2.6e+06
x1 21.5780 0.134 160.656 0.000 21.315 21.841
x2 1.656e+05 1443.413 114.754 0.000 1.63e+05 1.68e+05
x3 1.207e+05 1605.160 75.170 0.000 1.18e+05 1.24e+05
x4 1651.1391 1308.671 1.262 0.207 -914.431 4216.709
x5 15.2007 0.144 105.393 0.000 14.918 15.483
==============================================================================
Omnibus: 5.580 Durbin-Watson: 2.005
Prob(Omnibus): 0.061 Jarque-Bera (JB): 4.959
Skew: 0.011 Prob(JB): 0.0838
Kurtosis: 2.847 Cond. No. 9.40e+05
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly spec
[2] The condition number is large, 9.4e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
x_opt = x[:, [0, 1, 2, 3, 5]]  # drop column 4 (Avg. Area Number of Bedrooms): its p-value of 0.207 exceeds 0.05
regressor_OLS = sm.OLS(endog=y, exog=x_opt).fit()
print(regressor_OLS.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Price R-squared: 0.918
Model: OLS Adj. R-squared: 0.918
Method: Least Squares F-statistic: 1.398e+04
Date: Sat, 28 Nov 2020 Prob (F-statistic): 0.00
Time: 15:51:11 Log-Likelihood: -64714.
No. Observations: 5000 AIC: 1.294e+05
Df Residuals: 4995 BIC: 1.295e+05
Df Model: 4
Covariance Type: nonrobust
==============================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const -2.638e+06 1.72e+04 -153.726 0.000 -2.67e+06 -2.6e+06
x1 21.5827 0.134 160.743 0.000 21.320 21.846
x2 1.657e+05 1443.404 114.769 0.000 1.63e+05 1.68e+05
x3 1.216e+05 1422.608 85.476 0.000 1.19e+05 1.24e+05
x4 15.1961 0.144 105.388 0.000 14.913 15.479
==============================================================================
Omnibus: 5.310 Durbin-Watson: 2.006
Prob(Omnibus): 0.070 Jarque-Bera (JB): 4.742
Skew: 0.011 Prob(JB): 0.0934
Kurtosis: 2.851 Cond. No. 9.40e+05
==============================================================================
Warnings:
[1] Standard Errors assume that the covariance matrix of the errors is correctly spec
[2] The condition number is large, 9.4e+05. This might indicate that there are
strong multicollinearity or other numerical problems.
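The two cells above do one round of backward elimination by hand. The same procedure can be automated: repeatedly fit, find the predictor with the largest p-value, and drop it while that p-value exceeds the significance level. A hedged sketch (this loop is my own illustration, not from the original notebook; it reuses the x, y, and sm names defined above):

def backward_elimination(x, y, sl=0.05):
    """Repeatedly drop the predictor with the highest p-value above sl."""
    cols = list(range(x.shape[1]))  # column indices still in the model
    while True:
        model = sm.OLS(endog=y, exog=x[:, cols]).fit()
        pvalues = np.asarray(model.pvalues)
        worst = int(pvalues.argmax())
        if pvalues[worst] <= sl:
            return cols, model  # every remaining predictor is significant
        del cols[worst]  # drop the least significant column and refit

kept, final_model = backward_elimination(x, y)
print('Columns kept:', kept)  # expect the bedrooms column (index 4) to be dropped
print(final_model.summary())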