Assumption of Linear Regression
In [3]: df.head()
Assumption 1
Linearity: Linear Regression assumes a linear relationship between the independent variables and the
dependent variable. It assumes that the relationship can be represented by a straight line, allowing us to
estimate the impact of each independent variable on the outcome.
TV and Radio show a linear relationship with Sales; Newspaper does not show any clear relationship with Sales.
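A linearity check like the one described above can be sketched as follows. The DataFrame here is a synthetic stand-in for the advertising data (the column names match the notebook, but the values and coefficients are made up for illustration):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this runs in a script
import matplotlib.pyplot as plt

# Synthetic stand-in for the advertising data used in this notebook
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
    "Newspaper": rng.uniform(0, 100, n),  # intentionally unrelated to Sales
})
df["Sales"] = 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1, n)

# One scatter plot per feature against the target: a roughly straight
# point cloud supports the linearity assumption for that feature.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, col in zip(axes, ["TV", "Radio", "Newspaper"]):
    ax.scatter(df[col], df["Sales"], s=10)
    ax.set_xlabel(col)
    ax.set_ylabel("Sales")
fig.tight_layout()
```

With this setup, the TV and Radio panels show an upward linear trend while the Newspaper panel is a shapeless cloud, matching the conclusion drawn above.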
In [6]: df = df.drop(columns=['Newspaper'],axis=1)
df.head()
Assumption 2:
No Multicollinearity: Linear Regression assumes that there is little or no multicollinearity among the
independent variables. Multicollinearity occurs when the independent variables are highly correlated with
each other, which can lead to unstable coefficient estimates and difficulty in interpreting the model.
Out[11]: <AxesSubplot:> (correlation plot of the variables)
A correlation between two independent variables in the range 0.9 to 1.0 indicates very highly correlated variables. To avoid highly correlated variables in our prediction we can use feature engineering or drop one of them.
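The correlation check can be made quantitative with the variance inflation factor (VIF). This is a minimal sketch on synthetic stand-in data (the real notebook presumably used the advertising dataset); with only two features, the VIF reduces to a function of the single pairwise correlation:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the advertising features (not the real data)
rng = np.random.default_rng(1)
n = 200
df = pd.DataFrame({
    "TV": rng.uniform(0, 300, n),
    "Radio": rng.uniform(0, 50, n),
})
df["Sales"] = 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1, n)

# Pairwise correlation among the independent variables
corr = df[["TV", "Radio"]].corr()
print(corr)

# VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing feature j on
# the remaining features. With two features, R_j^2 is just the squared
# pairwise correlation between them.
r2 = corr.loc["TV", "Radio"] ** 2
vif = 1.0 / (1.0 - r2)
print(f"VIF for TV/Radio: {vif:.3f}")  # values near 1 mean little collinearity
```

VIF values near 1 indicate little multicollinearity; values above roughly 5 to 10 are the usual warning signs that a feature should be engineered away or dropped.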
In [25]: X = df[['TV',"Radio"]]
y = df['Sales']
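The notebook jumps from defining `X` and `y` to `model.fit(X_train, y_train)`, so the train/test split and model instantiation cells are missing from this export. A sketch of the missing steps, assuming scikit-learn (which the `LinearRegression()` output above suggests) and synthetic stand-in data:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (not the real file)
rng = np.random.default_rng(2)
n = 200
df = pd.DataFrame({"TV": rng.uniform(0, 300, n), "Radio": rng.uniform(0, 50, n)})
df["Sales"] = 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1, n)

X = df[["TV", "Radio"]]
y = df["Sales"]

# Hold out a test set so the assumption checks below use unseen data;
# the 80/20 split and random_state are assumptions, not from the notebook.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = LinearRegression()
model.fit(X_train, y_train)
print(model.score(X_train, y_train))  # R^2 on the training data
```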
In [45]: model.fit(X_train,y_train)
Out[45]: LinearRegression()
In [46]: model.score(X_train,y_train)
Out[46]: 0.906590009997456
In [69]: y_pred
Assumption 3:
Normality: Linear Regression assumes that the residuals follow a normal distribution. This assumption
ensures the accuracy of statistical inference and hypothesis testing. Deviations from normality may lead to
biased estimates and incorrect statistical inferences.
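The `residual` variable plotted below is not defined in any visible cell of this export. A sketch of how it is typically computed, on synthetic stand-in data (the split, seed, and coefficients are assumptions):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (not the real file)
rng = np.random.default_rng(3)
n = 200
df = pd.DataFrame({"TV": rng.uniform(0, 300, n), "Radio": rng.uniform(0, 50, n)})
df["Sales"] = 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(
    df[["TV", "Radio"]], df["Sales"], test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

# Residuals on the held-out data; under the normality assumption these
# should be roughly bell-shaped and centred near zero.
y_pred = model.predict(X_test)
residual = y_test - y_pred
print(f"mean={residual.mean():.3f}  skew={residual.skew():.3f}")
```

A KDE of `residual` (as plotted with `sns.kdeplot` below) should then look symmetric and bell-shaped; a mean near zero and skew near zero are quick numeric proxies for the same check.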
In [71]: sns.kdeplot(residual)
Out[71]: <AxesSubplot:xlabel='Sales', ylabel='Density'>
Assumption 4:
Homoscedasticity: Homoscedasticity assumes that the variance of the error term is constant across all
levels of the independent variables. In simpler terms, it means that the spread of the residuals remains the
same across the predicted values. Departure from this assumption may indicate heteroscedasticity, which
can affect the model's reliability.
In [73]: plt.scatter(y_pred,residual)
Out[73]: <matplotlib.collections.PathCollection at 0x1e20a8a6260>
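Beyond eyeballing the scatter plot, the spread of the residuals can be compared numerically. This is a rough stand-in for a formal test such as Breusch-Pagan, run on synthetic data with the same assumed setup as above:

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (not the real file)
rng = np.random.default_rng(4)
n = 200
df = pd.DataFrame({"TV": rng.uniform(0, 300, n), "Radio": rng.uniform(0, 50, n)})
df["Sales"] = 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(
    df[["TV", "Radio"]], df["Sales"], test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)

y_pred = model.predict(X_test)
res = (y_test - y_pred).to_numpy()

# Rough spread check: compare residual variance in the lower vs. upper
# half of the fitted values. Under homoscedasticity the ratio is near 1;
# ratios far from 1 hint at heteroscedasticity.
order = np.argsort(y_pred)
half = len(order) // 2
ratio = res[order[half:]].var() / res[order[:half]].var()
print(f"variance ratio (upper/lower fitted values): {ratio:.2f}")
```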
Assumption 5:
No Autocorrelation of errors: The residuals in the linear regression model are assumed to be independently and identically distributed. This implies that each error term is independent of, and unrelated to, the other error terms.
In [87]: plt.figure(figsize=(10,5))
p = sns.lineplot(x=y_pred,y=residual,marker='o',color='blue')
plt.xlabel('y_pred/predicted values')
plt.ylabel('Residuals')
plt.ylim(-10,10)
plt.xlim(0,26)
p = plt.title('Residuals vs fitted values plot for autocorrelation check')
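The visual check above can be supplemented with the Durbin-Watson statistic, computed here directly from its definition on the same assumed synthetic setup (statsmodels also provides `durbin_watson`, but the formula is simple enough to write out):

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the advertising data (not the real file)
rng = np.random.default_rng(5)
n = 200
df = pd.DataFrame({"TV": rng.uniform(0, 300, n), "Radio": rng.uniform(0, 50, n)})
df["Sales"] = 0.05 * df["TV"] + 0.2 * df["Radio"] + rng.normal(0, 1, n)

X_train, X_test, y_train, y_test = train_test_split(
    df[["TV", "Radio"]], df["Sales"], test_size=0.2, random_state=42)
model = LinearRegression().fit(X_train, y_train)
res = (y_test - model.predict(X_test)).to_numpy()

# Durbin-Watson statistic: DW = sum((e_t - e_{t-1})^2) / sum(e_t^2).
# Values near 2 suggest no first-order autocorrelation of the errors;
# values toward 0 or 4 suggest positive or negative autocorrelation.
dw = np.sum(np.diff(res) ** 2) / np.sum(res ** 2)
print(f"Durbin-Watson statistic: {dw:.2f}")
```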