Assumption of Linear Regression

The document loads advertising data and fits a linear regression model to predict Sales from TV and Radio advertising spend. It checks that the linear regression assumptions of linearity, no multicollinearity, normality of residuals, homoscedasticity, and no autocorrelation are met by plotting the data and residuals. TV and Radio are found to have a linear relationship with Sales, while Newspaper does not and is dropped. The model achieves a training score of about 0.91.

Uploaded by

Kagade Ajinkya
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
17 views

Assumption of Linear Regression

The document loads advertising data and performs linear regression to predict sales based on TV and radio data. It checks that the linear regression assumptions of linearity, no multicollinearity, normality of residuals, homoscedasticity, and no autocorrelation are met by plotting the data and residuals. TV and radio are found to have a linear relationship with sales while newspaper does not and is dropped. The model achieves a training score of 0.90.

Uploaded by

Kagade Ajinkya
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 6

In [1]: import pandas as pd
        import matplotlib.pyplot as plt
        from sklearn.model_selection import train_test_split
        from sklearn.linear_model import LinearRegression
        import numpy as np

In [2]: # loading dataset
        df = pd.read_csv('advertising.csv')

In [3]: df.head()

Out[3]:       TV  Radio  Newspaper  Sales
        0  230.1   37.8       69.2   22.1
        1   44.5   39.3       45.1   10.4
        2   17.2   45.9       69.3    9.3
        3  151.5   41.3       58.5   18.5
        4  180.8   10.8       58.4   12.9

In [4]: feature = list(df.describe().columns)
        feature.remove('Sales')
        print(feature)

['TV', 'Radio', 'Newspaper']

Assumption 1:
Linearity: Linear Regression assumes a linear relationship between the independent variables and the
dependent variable. It assumes that the relationship can be represented by a straight line, allowing us to
estimate the impact of each independent variable on the outcome.

In [5]: for i in feature:
            fig = plt.figure(figsize=(9, 5))
            ax = fig.gca()
            plt.scatter(df[i], df['Sales'])
            plt.title('Linear relationship between ' + i + ' and Sales')
            plt.xlabel(i)
            plt.ylabel('Sales')
TV and Radio show a linear relationship with Sales; Newspaper does not show any clear relationship with Sales.
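To back this visual check with a number, one can also look at each channel's correlation with Sales. The cell below is a minimal sketch and is not part of the original notebook:

In [ ]: # hypothetical follow-up cell: Pearson correlation of each channel with Sales
        print(df[['TV', 'Radio', 'Newspaper']].corrwith(df['Sales']))

A correlation near zero for Newspaper would support dropping it in the next step.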
In [6]: df = df.drop(columns=['Newspaper'])
        df.head()

Out[6]:       TV  Radio  Sales
        0  230.1   37.8   22.1
        1   44.5   39.3   10.4
        2   17.2   45.9    9.3
        3  151.5   41.3   18.5
        4  180.8   10.8   12.9

Assumption 2:
No Multicollinearity: Linear Regression assumes that there is little or no multicollinearity among the
independent variables. Multicollinearity occurs when the independent variables are highly correlated with
each other, which can lead to unstable coefficient estimates and difficulty in interpreting the model.

In [11]: import seaborn as sns
         sns.heatmap(df.corr(), annot=True)

Out[11]: <AxesSubplot:>

A correlation coefficient between independent variables in the 0.9 to 1.0 range indicates very highly correlated variables. To avoid highly correlated variables in the prediction, we can use feature engineering or drop one of them.
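A heatmap only shows pairwise correlations; the variance inflation factor (VIF) also catches multicollinearity involving several variables at once. The cell below is a sketch that assumes statsmodels is installed; it is not part of the original notebook:

In [ ]: import statsmodels.api as sm
        from statsmodels.stats.outliers_influence import variance_inflation_factor

        # add an intercept column so each VIF is computed against a fitted constant
        X_vif = sm.add_constant(df[['TV', 'Radio']])
        for i, col in enumerate(X_vif.columns):
            print(col, variance_inflation_factor(X_vif.values, i))

A VIF below roughly 5 is usually read as little multicollinearity.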

In [25]: X = df[['TV', 'Radio']]
         y = df['Sales']

In [42]: from sklearn.preprocessing import StandardScaler
         sc = StandardScaler()
         X = sc.fit_transform(X)

In [43]: X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

In [44]: model = LinearRegression()

In [45]: model.fit(X_train,y_train)

Out[45]: LinearRegression()

In [46]: model.score(X_train, y_train)

Out[46]: 0.906590009997456
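The notebook reports only the training R²; a quick sanity check would be to score the held-out test split as well. This extra cell is a sketch, and its value is not shown in the original:

In [ ]: # R^2 on the 20% test split held out earlier
        print(model.score(X_test, y_test))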

In [68]: y_pred = model.predict(X_train)

In [69]: y_pred

Out[69]: array([12.12910171,  9.15580944, 15.03241037, 16.30334926, 17.14716657,
        13.30141363,  3.7442173 , 12.23433166, 15.75030475,  8.72053764,
10.63080662, 19.47315218, 18.40761899, 15.28187734, 9.97767471,
8.18442538, 21.51466181, 14.16377258, 16.31913548, 8.76173868,
15.3232895 , 12.43110582, 13.7323925 , 14.17416248, 18.32683299,
19.18210765, 20.26787047, 17.41350364, 9.2948777 , 11.7162453 ,
19.75212705, 9.88650856, 20.77707152, 23.23212847, 10.12298739,
17.1702549 , 19.57200672, 18.45956026, 16.89979032, 18.48460831,
17.06097604, 8.87711452, 9.92151758, 5.37423437, 3.61268846,
16.62992832, 12.67714289, 18.08966325, 11.70414944, 12.64627113,
13.80162459, 7.02617728, 16.56492853, 9.82454417, 8.10140123,
15.71810356, 24.8236722 , 10.89223692, 21.2456741 , 13.77916502,
10.67543603, 8.42066842, 12.45095892, 20.57350278, 10.46540505,
14.60394292, 16.38952182, 17.142417 , 13.17250923, 17.35076974,
21.17219997, 8.21351412, 16.14984219, 15.14382412, 8.77536534,
13.75492091, 16.41353838, 9.57141305, 14.27633084, 18.08106614,
20.96133734, 9.02853088, 20.25085962, 20.72493711, 13.69127828,
4.48797341, 17.75774028, 11.93958855, 11.03831089, 23.7750009 ,
11.91393641, 18.88783611, 20.84960921, 8.02434976, 5.39836394,
14.35422219, 15.62305239, 4.5174207 , 14.96247916, 17.19408806,
6.93837735, 17.39652874, 16.69270639, 12.76255569, 7.83850076,
12.60407148, 14.47316562, 14.87158322, 21.42869884, 18.14787514,
8.63502004, 11.83397385, 23.20856705, 10.08213515, 19.27559207,
20.0987164 , 9.87376597, 22.32356514, 7.48494988, 19.31724002,
15.56832949, 9.97766649, 11.37395041, 11.08808285, 6.52165542,
19.90457643, 7.57124521, 19.24819132, 17.67664966, 23.34299052,
9.21761664, 17.11020605, 10.26623555, 9.61843934, 13.12688122,
12.50992234, 18.57548627, 10.58632465, 13.87907726, 15.33802624,
14.05423996, 14.42682203, 18.39651198, 13.51559161, 12.73286286,
20.46049761, 22.01778386, 9.53746995, 11.86002719, 17.78800126,
15.80482629, 23.38295048, 14.5151901 , 12.35335522, 14.67363835,
11.98308167, 4.51747309, 6.50865906, 21.73203085, 7.74763102])

Assumption 3:
Normality: Linear Regression assumes that the residuals follow a normal distribution. This assumption
ensures the accuracy of statistical inference and hypothesis testing. Deviations from normality may lead to
biased estimates and incorrect statistical inferences.

In [70]: residual = y_train - y_pred
         mean_residual = np.mean(residual)

In [71]: sns.kdeplot(residual)

Out[71]: <AxesSubplot:xlabel='Sales', ylabel='Density'>
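Beyond eyeballing the density plot, a formal test such as Shapiro-Wilk can back up the normality check. The cell below is a sketch that assumes scipy is available; it is not part of the original notebook:

In [ ]: from scipy import stats

        # Shapiro-Wilk test: a p-value above 0.05 means the residuals
        # are consistent with a normal distribution
        stat, p_value = stats.shapiro(residual)
        print(stat, p_value)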

Assumption 4:
Homoscedasticity: Homoscedasticity assumes that the variance of the error term is constant across all
levels of the independent variables. In simpler terms, it means that the spread of the residuals remains the
same across the predicted values. Departure from this assumption may indicate heteroscedasticity, which
can affect the model's reliability.

In [73]: plt.scatter(y_pred, residual)

Out[73]: <matplotlib.collections.PathCollection at 0x1e20a8a6260>
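The residuals-versus-fitted scatter can be supplemented with a formal heteroscedasticity test such as Breusch-Pagan. The cell below is a sketch assuming statsmodels, which the original notebook does not use:

In [ ]: import statsmodels.api as sm
        from statsmodels.stats.diagnostic import het_breuschpagan

        # Breusch-Pagan regresses squared residuals on the predictors;
        # a p-value above 0.05 is consistent with constant variance
        lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(residual, sm.add_constant(X_train))
        print(lm_pvalue, f_pvalue)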

Assumption 5:
No Autocorrelation of error: The residuals in the linear regression model are assumed to be independently
and identically distributed. This implies that each error term is independent and unrelated to the other error
terms.

In [87]: plt.figure(figsize=(10, 5))
         p = sns.lineplot(x=y_pred, y=residual, marker='o', color='blue')
         plt.xlabel('y_pred/predicted values')
         plt.ylabel('Residuals')
         plt.ylim(-10, 10)
         plt.xlim(0, 26)
         p = plt.title('Residuals vs fitted values plot for autocorrelation check')
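A common numeric companion to this plot is the Durbin-Watson statistic, which ranges from 0 to 4, with values near 2 indicating no first-order autocorrelation. The cell below is a sketch assuming statsmodels is installed:

In [ ]: from statsmodels.stats.stattools import durbin_watson

        # values near 2 indicate little or no first-order autocorrelation
        print(durbin_watson(residual))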

