MLR - 2023



FE 2106
BSc in Financial Engineering
level II
Lesson plan

• Correlation and scatter plots
• Simple linear regression
• Multiple linear regression
• Time series analysis


Multiple Linear Regression
• Describe multiple simultaneous associations of independent variables
with one continuous outcome.

ØVariable selection
ØAssessing model fit

• Ex: response variable: peak plasma growth hormone level of short

predictors (14): gender, age and various body measurements
Mathematics: SLR in matrices
[Figure: design matrix $\mathbf{X}$ with columns $X_0$ (dummy variable) and $X_1$]
SLR/MLR in matrices
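The model: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$

Normal equations to solve for the parameters:

$$\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}$$

where, for simple linear regression,

$$\mathbf{X}'\mathbf{X} = \begin{bmatrix} n & \sum X_i \\ \sum X_i & \sum X_i^2 \end{bmatrix}, \qquad \mathbf{X}'\mathbf{Y} = \begin{bmatrix} \sum Y_i \\ \sum X_i Y_i \end{bmatrix}$$

Solution:

$$\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$$

$$(\mathbf{X}'\mathbf{X})^{-1} = \frac{1}{n\sum (X_i - \bar{X})^2} \begin{bmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{bmatrix}, \qquad \det(\mathbf{X}'\mathbf{X}) = |\mathbf{X}'\mathbf{X}| = n\sum (X_i - \bar{X})^2$$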
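
As a minimal sketch of the normal equations for simple linear regression, the NumPy snippet below builds $\mathbf{X}'\mathbf{X}$ and $\mathbf{X}'\mathbf{Y}$ and applies the closed-form 2×2 inverse. The data and the variable names (`x`, `y`, `XtX`, `XtY`, `beta_hat`) are hypothetical; the responses are constructed to lie exactly on Y = 1 + 2X so the result can be checked by hand.

```python
import numpy as np

# Hypothetical data lying exactly on Y = 1 + 2X.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = 1.0 + 2.0 * x

n = len(x)
X = np.column_stack([np.ones(n), x])   # columns: X_0 (all ones) and X_1

XtX = X.T @ X                          # [[n, sum x], [sum x, sum x^2]]
XtY = X.T @ y                          # [sum y, sum x*y]

# Closed-form inverse of the 2x2 matrix X'X.
det = n * np.sum((x - x.mean()) ** 2)  # n * sum (x_i - xbar)^2
XtX_inv = np.array([[np.sum(x ** 2), -np.sum(x)],
                    [-np.sum(x), n]]) / det

beta_hat = XtX_inv @ XtY               # [intercept, slope]
print(beta_hat)
```

Because the data are exact, `beta_hat` recovers the intercept 1 and slope 2 up to rounding error.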

$\mathbf{X}'\mathbf{X}$ is singular when its inverse does not exist: $(\mathbf{X}'\mathbf{X})^{-1}$ blows up when the determinant is zero.

In that case the least squares procedure does not give a unique solution; many alternative solutions fit the data equally well.

Reason: data are inadequate for fitting the model/model is too complex for available data
Solution: need more data or a simpler model
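
One way to see singularity in practice is sketched below, assuming a hypothetical data set in which the second predictor is exactly twice the first. The rank of $\mathbf{X}$ is then smaller than its number of columns, so $\mathbf{X}'\mathbf{X}$ has no inverse. `np.linalg.lstsq` still returns one solution (the minimum-norm one), and a second coefficient vector is constructed to show that it gives identical fitted values.

```python
import numpy as np

# Hypothetical data: x2 is an exact multiple of x1 (perfect collinearity).
x1 = np.array([1.0, 2.0, 3.0, 4.0])
x2 = 2.0 * x1
y = np.array([2.0, 4.1, 5.9, 8.0])

X = np.column_stack([np.ones(len(x1)), x1, x2])

# Rank of X is below the number of columns, so X'X is singular.
rank = np.linalg.matrix_rank(X)
print(rank, X.shape[1])

# lstsq returns one of the many least squares solutions.
beta_a, *_ = np.linalg.lstsq(X, y, rcond=None)

# Moving weight between the collinear columns leaves the fit unchanged,
# because 2*x1 - 1*x2 = 0 for every observation.
beta_b = beta_a + np.array([0.0, 2.0, -1.0])
same_fit = np.allclose(X @ beta_a, X @ beta_b)
print(same_fit)
```

Dropping one of the two collinear predictors (a simpler model) restores a unique solution.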
Example: singularity

Obtain the simple linear regression using data analysis tool and matrix
Model with two predictors
• $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i$

Ø Linear in the predictor variables (linear response surface)
Ø Linear in the parameters
Ø $Y_i$ = response in the i-th trial
Ø $X_{i1}, X_{i2}$ = values of the predictors in the i-th trial
Ø $\beta_0, \beta_1, \beta_2$ = parameters of the model
Ø $\epsilon_i$ = error term

• Assume $E(\epsilon_i) = 0$, so that $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
General procedure
• for each predictor, verify through a data plot that a linear relation is likely
to be appropriate;
• estimate the MLR model;
• assess through diagnostics whether the model provides an appropriate fit
to the data;
• if so, use the model to draw inferences about the regression coefficients;
• reduce the model by removing nonsignificant predictors, if appropriate for
the study goals; and
• reassess through diagnostics whether the model provides an appropriate
fit to the data.
Model with more than two predictors
General linear regression model:

$$Y_i = \beta_0 + \beta_1 X_{i,1} + \beta_2 X_{i,2} + \cdots + \beta_{p-1} X_{i,p-1} + \epsilon_i$$

$$\epsilon_i \sim N(0, \sigma^2)$$

$$E(Y) = \beta_0 + \beta_1 X_1 + \cdots + \beta_{p-1} X_{p-1}$$

Ø No interaction effects between predictors

Ø Qualitative predictor variables: the model can encompass variables such as gender or disability status, which take the values 0 and 1 to identify the classes of a qualitative variable.
Qualitative predictor variables
Example: Consider a regression analysis to predict the length of hospital stay (Y) based on the age (X1) and
gender (X2) of the patient
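
A minimal sketch of this example with a 0/1 indicator follows. Everything numeric here is hypothetical: the ages, the coding (1 = male, 0 = female, which the slide does not specify) and the coefficients used to generate the lengths of stay. The point is only to show how the qualitative predictor enters the design matrix as a column of 0s and 1s.

```python
import numpy as np

# Hypothetical patients.
age = np.array([30.0, 45.0, 60.0, 35.0, 50.0, 65.0])    # X1
gender = ["F", "F", "F", "M", "M", "M"]

# Assumed coding: X2 = 1 for male, 0 for female.
x2 = np.array([1.0 if g == "M" else 0.0 for g in gender])

# Hypothetical noise-free responses: stay = 2 + 0.1*age + 1.5*X2.
stay = 2.0 + 0.1 * age + 1.5 * x2

# Design matrix: intercept column, age, gender indicator.
X = np.column_stack([np.ones(len(age)), age, x2])
beta_hat, *_ = np.linalg.lstsq(X, stay, rcond=None)
print(beta_hat)
```

Under this coding the coefficient of X2 is the difference in expected length of stay between the two classes at the same age, so the fitted model is two parallel lines in age.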
General Linear Regression (Matrix Form)
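$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$

$\mathbf{Y}$ is an $n \times 1$ vector of observations
$\mathbf{X}$ is an $n \times p$ matrix of known form
$\boldsymbol{\beta}$ is a $p \times 1$ vector of parameters
$\boldsymbol{\epsilon}$ is an $n \times 1$ vector of errors

$E(\boldsymbol{\epsilon}) = \mathbf{0}$, $V(\boldsymbol{\epsilon}) = \mathbf{I}\sigma^2$

Normal equations: $\mathbf{X}'\mathbf{X}\hat{\boldsymbol{\beta}} = \mathbf{X}'\mathbf{Y}$

Solution: $\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y}$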
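
The matrix form carries over directly to code. The sketch below defines a hypothetical helper `fit_ols` that solves the normal equations with `np.linalg.solve` instead of forming the inverse explicitly, which is an implementation choice and not something the slides prescribe. The data are made up and noise-free so that the recovered coefficients are known in advance.

```python
import numpy as np

def fit_ols(X, y):
    """Solve the normal equations X'X b = X'Y for the coefficient vector b."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    return np.linalg.solve(X.T @ X, X.T @ y)

# Hypothetical data with two predictors: Y = 1 + 2*X1 - 0.5*X2.
x1 = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
x2 = np.array([2.0, 1.0, 4.0, 3.0, 6.0, 5.0])
y = 1.0 + 2.0 * x1 - 0.5 * x2

# n x p design matrix with p = 3 (intercept plus two predictors).
X = np.column_stack([np.ones(len(x1)), x1, x2])
beta_hat = fit_ols(X, y)
print(beta_hat)
```

`np.linalg.solve` raises `LinAlgError` when $\mathbf{X}'\mathbf{X}$ is exactly singular, which matches the case where no unique solution exists.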
General Linear Regression
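• Least squares properties

1. The fitted values are obtained by $\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$.

2. The vector of residuals is obtained by $\hat{\boldsymbol{\epsilon}} = \mathbf{Y} - \hat{\mathbf{Y}}$.

3. $V(\mathbf{b}) = (\mathbf{X}'\mathbf{X})^{-1}\sigma^2$, where $\mathbf{b} = \hat{\boldsymbol{\beta}}$ denotes the least squares estimates, provides the variances (diagonal terms) and covariances (off-diagonal terms) of the estimates.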

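4. Suppose $\mathbf{X}_0'$ is a specified $1 \times p$ vector whose elements are of the same form as a row of $\mathbf{X}$, so that $\hat{Y}_0 = \mathbf{X}_0'\hat{\boldsymbol{\beta}} = \hat{\boldsymbol{\beta}}'\mathbf{X}_0$ is the fitted value at a specified location (the value predicted at $\mathbf{X}_0$ by the regression). The predicted value has variance

$$V(\hat{Y}_0) = \mathbf{X}_0'\,V(\hat{\boldsymbol{\beta}})\,\mathbf{X}_0 = \mathbf{X}_0'(\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}_0\,\sigma^2$$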

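5. Basic ANOVA

Source       Df      SS                                                                              MS
Regression   p-1     $\hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y} - \frac{1}{n}\mathbf{Y}'\mathbf{J}\mathbf{Y}$     MS_regression
Residual     n-p     $\mathbf{Y}'\mathbf{Y} - \hat{\boldsymbol{\beta}}'\mathbf{X}'\mathbf{Y}$                          MS_residual
Total        n-1     $\mathbf{Y}'\mathbf{Y} - \frac{1}{n}\mathbf{Y}'\mathbf{J}\mathbf{Y}$

$\mathbf{J}$ is an $n \times n$ matrix of 1s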
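
The five least squares properties can be sketched in NumPy as follows. The data set is hypothetical (n = 6, p = 3), and the unknown $\sigma^2$ is replaced by MS_residual when computing variances, which is the usual estimate but is an assumption added here. Each mean square is taken as its sum of squares divided by its degrees of freedom.

```python
import numpy as np

# Hypothetical data: n = 6 observations, p = 3 parameters.
X = np.column_stack([
    np.ones(6),
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
])
y = np.array([3.1, 4.9, 5.2, 7.8, 8.1, 10.9])
n, p = X.shape

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y                 # least squares estimates

y_hat = X @ b                         # 1. fitted values
resid = y - y_hat                     # 2. residuals

# 5. Basic ANOVA, using the matrix J of ones for the correction term.
J = np.ones((n, n))
correction = y @ J @ y / n            # (1/n) Y'JY
ss_reg = b @ X.T @ y - correction
ss_res = y @ y - b @ X.T @ y
ss_tot = y @ y - correction
ms_reg = ss_reg / (p - 1)
ms_res = ss_res / (n - p)

# 3. Variance-covariance matrix of b, with sigma^2 estimated by MS_residual.
cov_b = XtX_inv * ms_res

# 4. Prediction at a hypothetical new point X_0 and its estimated variance.
x0 = np.array([1.0, 3.5, 3.5])
y0_hat = x0 @ b
var_y0_hat = x0 @ XtX_inv @ x0 * ms_res

print(ss_reg, ss_res, ss_tot)
print(np.sqrt(np.diag(cov_b)), y0_hat, var_y0_hat)
```

The regression and residual sums of squares add up to the total, and the square roots of the diagonal of `cov_b` are the estimated standard deviations of the coefficients.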
General Linear Regression
• Coefficient of multiple determination ($R^2$)

As in simple regression,

$$R^2 = \frac{SSR}{SST} = 1 - \frac{SSE}{SST}, \qquad 0 \le R^2 \le 1$$

• A large value of $R^2$ does not necessarily imply that the fitted model is useful.
• Adding more X variables to the regression model can only increase $R^2$ and never reduce it, because SSE can never become larger with more X variables and SST is always the same for a given set of responses.

Adjusted coefficient of multiple determination:

$$R_a^2 = 1 - \left(\frac{n-1}{n-p}\right)\frac{SSE}{SST}$$
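
A small worked example may help compare the two measures. The helper names `r_squared` and `adjusted_r_squared` and the input numbers (SSE = 20, SST = 100, n = 25, p = 5) are hypothetical.

```python
def r_squared(sse, sst):
    """Coefficient of multiple determination: 1 - SSE/SST."""
    return 1.0 - sse / sst

def adjusted_r_squared(sse, sst, n, p):
    """Adjusted R^2: 1 - ((n - 1) / (n - p)) * SSE/SST."""
    return 1.0 - (n - 1) / (n - p) * sse / sst

# Hypothetical values: SSE = 20, SST = 100, n = 25 observations, p = 5 parameters.
r2 = r_squared(20.0, 100.0)
r2_adj = adjusted_r_squared(20.0, 100.0, n=25, p=5)
print(r2, r2_adj)
```

Here $R^2 = 1 - 20/100 = 0.8$, while $R_a^2 = 1 - (24/20)(0.2) = 0.76$. The adjustment penalises the number of parameters, so unlike $R^2$ it can fall when a predictor that reduces SSE only slightly is added.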
General Linear Regression
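• Hypothesis testing for $\beta_k$:

$H_0: \beta_k = 0$
$H_a: \beta_k \ne 0$

Test statistic:

$$t^* = \frac{b_k}{\mathrm{s.d.}(b_k)}, \qquad \mathrm{s.d.}(b_k) = \sqrt{V(b_k)}$$

Decision rule:
If $|t^*| \le t_{1-\alpha/2,\,n-p}$, fail to reject $H_0$; otherwise reject $H_0$.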
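
The test can be sketched in NumPy as below. The function name `t_tests` and the data are hypothetical, $\sigma^2$ is estimated by MS_residual (an assumption added here), and the critical value is passed in by the caller instead of being computed; 3.182 is the tabulated $t_{0.975}$ value for $n - p = 3$ degrees of freedom.

```python
import numpy as np

def t_tests(X, y, t_crit):
    """Return (b, t_stats, reject) for H0: beta_k = 0, one entry per coefficient.

    t_crit is the critical value t(1 - alpha/2; n - p), supplied by the caller
    (for example from a t table).
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    resid = y - X @ b
    ms_res = resid @ resid / (n - p)            # estimate of sigma^2
    sd_b = np.sqrt(np.diag(XtX_inv) * ms_res)   # s.d.(b_k)
    t_stats = b / sd_b
    reject = np.abs(t_stats) > t_crit           # two-sided decision rule
    return b, t_stats, reject

# Hypothetical data: n = 6, p = 3, so n - p = 3 degrees of freedom.
X = np.column_stack([
    np.ones(6),
    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
    [2.0, 1.0, 4.0, 3.0, 6.0, 5.0],
])
y = np.array([3.1, 4.9, 5.2, 7.8, 8.1, 10.9])

b, t_stats, reject = t_tests(X, y, t_crit=3.182)   # alpha = 0.05
for k in range(len(b)):
    decision = "reject H0" if reject[k] else "fail to reject H0"
    print(k, round(float(t_stats[k]), 3), decision)
```

Passing the critical value as an argument keeps the sketch free of extra dependencies; a statistics library could supply it for other values of $\alpha$ and $n - p$.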
