
Multiple Linear Regression

Prof. Kailash Singh


Department of Chemical Engineering
MNIT Jaipur
Multiple Linear Regression (MLR)
• A regression model that contains more than
one regressor variable is called a multiple
regression model:
  y = β0 + β1 x1 + β2 x2 + … + βk xk + ε
There are k regressors here: x1, x2, …, xk.
β0 is the intercept of the plane, and β1, β2, …, βk are the
partial regression coefficients.
• Polynomials can also be treated as multiple linear
regression models. For example, the cubic polynomial
  y = β0 + β1 x + β2 x^2 + β3 x^3 + ε
can be written as
  y = β0 + β1 x1 + β2 x2 + β3 x3 + ε
where x1 = x, x2 = x^2, x3 = x^3.
• An interaction between two variables can be
represented by a cross-product term in the model,
such as
  y = β0 + β1 x1 + β2 x2 + β12 x1 x2 + ε
(a short design-matrix sketch follows this list).
• In general, any regression model that is linear in the
parameters (β0, β1, …, βk) is a linear regression model, regardless
of the shape of the surface that it generates.
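
The interaction bullet above can be made concrete with a short sketch. The data here are made up purely for illustration; the design matrix stacks a column of ones, x1, x2 and the cross-product x1*x2, and the coefficients b0, b1, b2, b12 are obtained by least squares (via the normal equations derived on the next slide):

import numpy as np

# Hypothetical data, for illustration only
x1 = np.array([1., 2., 3., 4., 5., 6.])
x2 = np.array([2., 1., 4., 3., 6., 5.])
y  = np.array([5., 6., 12., 11., 21., 19.])

# Design matrix: intercept, x1, x2 and the interaction term x1*x2
X = np.column_stack([np.ones(len(x1)), x1, x2, x1*x2])

# Least squares estimates [b0, b1, b2, b12] from the normal equations
beta = np.linalg.solve(X.T @ X, X.T @ y)
print(beta)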
Matrix Approach to MLR
The model equation for MLR in matrix form is
  y = Xβ + ε
where y is the n×1 vector of responses, X is the n×(k+1) matrix of regressors
(with a leading column of ones for the intercept), β is the (k+1)×1 vector of
coefficients, and ε is the vector of errors.
Minimize the least squares function
  L = εᵀε = (y − Xβ)ᵀ(y − Xβ)
with respect to β. Setting ∂L/∂β = 0, we obtain the least squares normal
equations in matrix form:
  XᵀX β̂ = Xᵀy
Solving for the regression coefficients, we get:
  β̂ = (XᵀX)⁻¹ Xᵀy
Matrix Operations in NumPy
Define matrices:
A=np.array([[1,2],[4,5]]) #2x2 matrix
B=np.array([[2,3],[5,6]]) #2x2 matrix

Element-wise multiplication
C=A*B or C=np.multiply(A,B) (both give the same result)

Matrix Multiplication
C=A@B or C=np.matmul(A,B) or C=np.dot(A,B)

Matrix Inverse
C=np.linalg.inv(A)
Matrix Transpose
C=np.transpose(A) or C=A.T
Element-wise division
C=A/B
Determinant of matrix
C=np.linalg.det(A)
Solution of linear algebraic equations
Let the equations be Ax = b:
A = np.array([[2, 4], [6, 8]])
b = np.array([5, 6])
x = np.linalg.solve(A, b)
Eigenvalues and Eigenvectors
eigval,eigvec=np.linalg.eig(A)

To find only the eigenvalues, we can use:


eigval= np.linalg.eigvals(A)
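
For reference, here is a consolidated, runnable version of the snippets on this slide, using the same matrices A, B and the system Ax = b defined above:

import numpy as np

A = np.array([[1, 2], [4, 5]])   # 2x2 matrix
B = np.array([[2, 3], [5, 6]])   # 2x2 matrix

print(A * B)             # element-wise multiplication
print(A @ B)             # matrix multiplication
print(A / B)             # element-wise division
print(A.T)               # transpose
print(np.linalg.inv(A))  # inverse
print(np.linalg.det(A))  # determinant

# Solution of the linear system Ax = b
A2 = np.array([[2, 4], [6, 8]])
b = np.array([5, 6])
print(np.linalg.solve(A2, b))

# Eigenvalues and eigenvectors
eigval, eigvec = np.linalg.eig(A)
print(eigval)
print(eigvec)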
Multicollinearity
• When the dataset has a large number of
independent variables, it is possible that a few of
these are highly correlated.
• The existence of a high correlation between
independent variables is called multicollinearity
(a quick check is sketched after this list).
• The presence of multicollinearity may destabilize the
MLR model.
• Multicollinearity may inflate the standard errors of
the estimated coefficients.
• Regression coefficients may be wrongly interpreted.
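
As a quick check, one can look directly at the correlation between regressors. The sketch below is not from the original notes; it applies np.corrcoef to the area (X1) and number-of-tubes (X2) columns of the heat-exchanger example that appears later:

import numpy as np

# Area (X1) and number of tubes (X2) from the heat-exchanger example below
x1 = np.array([120, 130, 108, 110, 84, 90, 80, 55, 64, 50])
x2 = np.array([550, 600, 520, 420, 400, 300, 230, 120, 190, 100])

# The off-diagonal entry of the correlation matrix is the correlation between X1 and X2;
# a value close to 1 signals potential multicollinearity
print(np.corrcoef(x1, x2))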
Variance Inflation Factor (VIF)
• VIF is a measure for identifying the existence of
multicollinearity.
• Consider two independent variables X1 and X2, and regress
one on the other (e.g. X1 on X2). Let R² be the R-squared
value of this auxiliary regression. Then the VIF is given by:
  VIF = 1 / (1 − R²)
(a minimal sketch of this computation follows the list).
• A VIF value greater than 4 requires further
investigation to assess the impact of
multicollinearity.
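
A minimal sketch of this computation, staying within the NumPy/sklearn tools used in these notes (the data are again taken from the heat-exchanger example below; the variable names are mine):

import numpy as np
from sklearn.linear_model import LinearRegression

# Area (X1) and number of tubes (X2) from the heat-exchanger example below
x1 = np.array([120, 130, 108, 110, 84, 90, 80, 55, 64, 50])
x2 = np.array([550, 600, 520, 420, 400, 300, 230, 120, 190, 100])

# Auxiliary regression of X1 on X2; its R^2 goes into the VIF formula
aux = LinearRegression().fit(x2.reshape(-1, 1), x1)
R2 = aux.score(x2.reshape(-1, 1), x1)
VIF = 1.0 / (1.0 - R2)
print(VIF)   # VIF > 4 suggests multicollinearity worth investigating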
Example
The labour cost of fabrication of heat exchangers depends on the number of
tubes and the shell surface area. Fit a linear model to the following data:

Area (X1)   No. of tubes (X2)   Labour cost (y)
120         550                 310
130         600                 300
108         520                 275
110         420                 250
 84         400                 220
 90         300                 200
 80         230                 190
 55         120                 150
 64         190                 140
 50         100                 100

Let y = β0 + β1 X1 + β2 X2.
Cont… Python code

import numpy as np
X=np.array([[ 1, 120, 550],
            [ 1, 130, 600],
            [ 1, 108, 520],
            [ 1, 110, 420],
            [ 1,  84, 400],
            [ 1,  90, 300],
            [ 1,  80, 230],
            [ 1,  55, 120],
            [ 1,  64, 190],
            [ 1,  50, 100]])
y=np.array([310,300,275,250,220,200,190,150,140,100])
A=np.linalg.inv(X.T@X)   # @ means matmul; X.T is the transpose of X
beta=A@(X.T@y)           # beta = (X'X)^-1 X'y
print(beta)
Cont… Alternative code
import numpy as np
X=np.array([[ 1, 120, 550],
[ 1, 130, 600],
[ 1, 108, 520],
[ 1, 110, 420],
[ 1, 84, 400],
[ 1, 90, 300],
[ 1, 80, 230],
[ 1, 55, 120],
[ 1, 64, 190],
[ 1, 50, 100]])

y=np.array([310,300,275,250,220,200,190,150,140,100])
beta=np.linalg.solve(X.T@X,X.T@y)   # Solve the normal equations X'X beta = X'y


print(beta)
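
Another way to obtain the same coefficients, not used in these notes but standard in NumPy, is np.linalg.lstsq, which solves the least squares problem directly instead of forming XᵀX; a sketch:

import numpy as np

# Same design matrix and response as in the code above
X = np.array([[1,120,550],[1,130,600],[1,108,520],[1,110,420],[1, 84,400],
              [1, 90,300],[1, 80,230],[1, 55,120],[1, 64,190],[1, 50,100]])
y = np.array([310,300,275,250,220,200,190,150,140,100])

# lstsq returns the least squares solution plus residuals, rank and singular values
beta, residuals, rank, sing_vals = np.linalg.lstsq(X, y, rcond=None)
print(beta)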
Example
An experiment was conducted to find the yield of a
product (y) for various values of reactant mole%
(x). Fit the model
  y = β0 + β1 x + β2 x^2
Solution
Python Code
import numpy as np
x1=np.array([20,20,30,40,40,50,50,50,60,70])
y=np.array([73,78,85,90,91,87,86,91,75,65])
x0=np.ones(len(x1))        # column of ones for the intercept
x2=x1**2                   # quadratic term
X=np.array([x0,x1,x2]).T   # design matrix with columns [1, x, x^2]
beta=np.linalg.solve(X.T@X,X.T@y)   # solve the normal equations
print(beta)
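
Since this is a pure polynomial in a single variable, NumPy's np.polyfit offers a one-line alternative (not part of the original notes); note that it returns the coefficients with the highest power first, i.e. in the reverse order of beta above:

import numpy as np

x1 = np.array([20,20,30,40,40,50,50,50,60,70])
y  = np.array([73,78,85,90,91,87,86,91,75,65])

# Degree-2 polynomial fit; coefficients come back as [b2, b1, b0]
coeffs = np.polyfit(x1, y, 2)
print(coeffs)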
Alternative approach using sklearn
import numpy as np
import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression

model=LinearRegression()
x1=np.array([20,20,30,40,40,50,50,50,60,70])
y=np.array([73,78,85,90,91,87,86,91,75,65])
x2=x1**2
X=np.array([x1,x2]).T      # columns [x, x^2]; sklearn adds the intercept itself
model.fit(X,y)
b0=model.intercept_        # intercept b0
b12=model.coef_            # coefficients [b1, b2]
R2=model.score(X,y)        # coefficient of determination R^2
yp=model.predict(X)        # fitted values
plt.scatter(x1,y)          # data points
plt.plot(x1,yp)            # fitted curve
plt.show()

print(f"b0={b0}, b1,b2={b12}, R^2={R2}")
