MLR - 2023
Econometrics
FE 2106 – BSc in Financial Engineering, Level II

Lesson plan
• Correlation and scatter plots
• Multiple linear regression
• Smoothing
Multiple Linear Regression
• Describe multiple simultaneous associations of independent variables with one continuous outcome.
Ø Estimation
Ø Variable selection
Ø Assessing model fit
Dummy variable
SLR/MLR in matrices
The model: $\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$
Estimates:
$\hat{\boldsymbol{\beta}} = (\mathbf{X}^{\top}\mathbf{X})^{-1}\mathbf{X}^{\top}\mathbf{Y}$
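A minimal sketch of this estimate for simple linear regression, using made-up data and pure Python (no libraries); the coefficients come from the normal equations with the explicit $2\times 2$ inverse of $\mathbf{X}^{\top}\mathbf{X}$:

```python
# Simple linear regression via the normal equations (illustrative data).
# The design matrix X has a column of 1s (intercept) and one predictor column,
# so X'X is 2x2 and can be inverted in closed form.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Elements of X'X and X'Y for the design matrix [1, x_i]
sx = sum(x)
sxx = sum(xi * xi for xi in x)
sy = sum(y)
sxy = sum(xi * yi for xi, yi in zip(x, y))

det = n * sxx - sx * sx           # det(X'X) = n * sum((x_i - xbar)^2)
b0 = (sxx * sy - sx * sxy) / det  # intercept estimate
b1 = (n * sxy - sx * sy) / det    # slope estimate

print(b0, b1)  # approximately 0.05 and 1.99 for this data
```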
Singularity
For simple linear regression,
$(\mathbf{X}^{\top}\mathbf{X})^{-1} = \dfrac{1}{n\sum (X_i - \bar{X})^2}\begin{pmatrix} \sum X_i^2 & -\sum X_i \\ -\sum X_i & n \end{pmatrix}, \qquad \det(\mathbf{X}^{\top}\mathbf{X}) = n\sum (X_i - \bar{X})^2$

$\mathbf{X}^{\top}\mathbf{X}$ is singular when its inverse does not exist; $(\mathbf{X}^{\top}\mathbf{X})^{-1}$ blows up when the determinant is zero.
The least squares procedure will then not give a unique solution, but many alternative solutions.
Reason: the data are inadequate for fitting the model, or the model is too complex for the available data.
Solution: collect more data or use a simpler model.
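A quick numeric illustration of the singularity condition, with made-up data: a constant predictor makes $\sum(X_i-\bar{X})^2 = 0$, so the determinant of $\mathbf{X}^{\top}\mathbf{X}$ is zero and no unique solution exists:

```python
# Singularity sketch: a constant predictor gives sum((x_i - xbar)^2) = 0,
# so det(X'X) = n*sum(x_i^2) - (sum(x_i))^2 = 0 and the inverse does not exist.
x = [3.0, 3.0, 3.0, 3.0]  # illustrative constant predictor
n = len(x)
det = n * sum(xi * xi for xi in x) - sum(x) ** 2
print(det)  # 0.0 -> X'X is singular; least squares has no unique solution
```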
Example: singularity
Obtain the simple linear regression estimates using the data analysis tool and using matrix calculations.
Model with two predictors
• $Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \epsilon_i$
• Assume $E(\epsilon_i) = 0$, so that $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
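A sketch of fitting this two-predictor model by solving the normal equations $(\mathbf{X}^{\top}\mathbf{X})\mathbf{b} = \mathbf{X}^{\top}\mathbf{Y}$. The data are invented so that $Y = 1 + 2X_1 + 3X_2$ exactly, and the small Gaussian-elimination solver is a pure-Python stand-in for a linear algebra library:

```python
# Two-predictor model fit by solving the normal equations (X'X) b = X'Y
# with Gaussian elimination (illustrative data; pure-Python sketch).

def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))  # partial pivot
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

# Design matrix with intercept column; Y generated from Y = 1 + 2*X1 + 3*X2
X = [[1.0, 1.0, 2.0], [1.0, 2.0, 1.0], [1.0, 3.0, 4.0], [1.0, 4.0, 3.0]]
Y = [9.0, 8.0, 19.0, 18.0]

XtX = [[sum(X[i][r] * X[i][c] for i in range(len(X))) for c in range(3)]
       for r in range(3)]
XtY = [sum(X[i][r] * Y[i] for i in range(len(X))) for r in range(3)]
beta = solve(XtX, XtY)
print(beta)  # approximately [1.0, 2.0, 3.0]
```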
General procedure
• for each predictor, verify through a data plot that a linear relation is likely to be appropriate;
• estimate the MLR model;
• assess through diagnostics whether the model provides an appropriate fit to the data;
• if so, use the model to draw inferences about the regression coefficients;
• reduce the model by removing nonsignificant predictors, if appropriate for the study goals; and
• reassess through diagnostics whether the model provides an appropriate fit to the data.
Model with more than two predictors
General linear regression model:
Ø Qualitative predictor variables: these encompass variables like gender or disability status that take values 0, 1 to identify the classes of a qualitative variable.
Qualitative predictor variables
Example: Consider a regression analysis to predict the length of hospital stay (Y) based on the age (X1) and gender (X2) of the patient.
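One common coding for this example builds the dummy column directly into the design matrix (a hypothetical sketch; the patient records are invented). With $X_2 = 1$ for one class and $0$ for the other, the intercept is $\beta_0$ for the 0-class and $\beta_0 + \beta_2$ for the 1-class:

```python
# Hypothetical patients: (age, gender), with gender coded as a 0/1 dummy.
patients = [(25, "female"), (40, "male"), (33, "female")]

# Design matrix rows [1, age, dummy]; dummy = 1 for male, 0 for female
X = [[1, age, 1 if g == "male" else 0] for age, g in patients]
print(X)  # [[1, 25, 0], [1, 40, 1], [1, 33, 0]]
```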
General Linear Regression (Matrix Form)
$\mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\epsilon}$, where $E(\boldsymbol{\epsilon}) = \mathbf{0}$ and $V(\boldsymbol{\epsilon}) = \sigma^2\mathbf{I}$
General Linear Regression
• Least Squares Properties
1. The fitted values are obtained by $\hat{\mathbf{Y}} = \mathbf{X}\hat{\boldsymbol{\beta}}$.
2. The vector of residuals is obtained by $\hat{\boldsymbol{\epsilon}} = \mathbf{Y} - \hat{\mathbf{Y}}$.
3. $V(\mathbf{b}) = (\mathbf{X}^{\top}\mathbf{X})^{-1}\sigma^2$ provides the variances (diagonal terms) and covariances (off-diagonal terms) of the estimates.
4. Suppose $\mathbf{X}_0^{\top}$ is a specified $1 \times p$ vector whose elements are of the same form as a row of $\mathbf{X}$, so that $\hat{Y}_0 = \mathbf{X}_0^{\top}\hat{\boldsymbol{\beta}} = \hat{\boldsymbol{\beta}}^{\top}\mathbf{X}_0$ is the fitted value at a specified location (the predicted value at $\mathbf{X}_0$ by the regression equation).
5. Basic ANOVA

| Source | df | SS | MS |
|---|---|---|---|
| Regression | $p-1$ | $SSR = \hat{\boldsymbol{\beta}}^{\top}\mathbf{X}^{\top}\mathbf{Y} - \frac{1}{n}\mathbf{Y}^{\top}\mathbf{J}\mathbf{Y}$ | $MSR = SSR/(p-1)$ |
| Residual | $n-p$ | $SSE = \mathbf{Y}^{\top}\mathbf{Y} - \hat{\boldsymbol{\beta}}^{\top}\mathbf{X}^{\top}\mathbf{Y}$ | $MSE = SSE/(n-p)$ |
| Total | $n-1$ | $SST = \mathbf{Y}^{\top}\mathbf{Y} - \frac{1}{n}\mathbf{Y}^{\top}\mathbf{J}\mathbf{Y}$ | |

$\mathbf{J}$ is an $n \times n$ matrix of 1s.
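The ANOVA quantities can be sketched numerically for a simple regression on made-up data; SSR is obtained here as $SST - SSE$, which agrees with the identities in the table:

```python
# ANOVA decomposition sketch for a simple regression (illustrative data).
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)

# Least squares fit via closed-form SLR formulas
sx, sy = sum(x), sum(y)
b1 = (n * sum(xi * yi for xi, yi in zip(x, y)) - sx * sy) / \
     (n * sum(xi * xi for xi in x) - sx * sx)
b0 = (sy - b1 * sx) / n
fitted = [b0 + b1 * xi for xi in x]

sst = sum((yi - sy / n) ** 2 for yi in y)               # total sum of squares
sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # residual sum of squares
ssr = sst - sse                                         # regression sum of squares
r2 = ssr / sst                                          # coefficient of determination
print(round(sst, 4), round(sse, 4), round(r2, 4))
```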
General Linear Regression
• Coefficient of multiple determination ($R^2$)
As in simple regression,
$R^2 = \dfrac{SSR}{SST} = 1 - \dfrac{SSE}{SST}, \qquad 0 \le R^2 \le 1$
Note:
• A large value of $R^2$ does not necessarily imply that the fitted model is useful.
• Adding more X variables to the regression model can only increase $R^2$, never reduce it, because SSE can never become larger with more X variables, while SST stays the same for a given set of responses.
$H_0: \beta_k = 0$
$H_a: \beta_k \neq 0$
Test statistic:
$t^* = \dfrac{b_k}{s.d.(b_k)}$, where $s.d.(b_k) = \sqrt{V(b_k)}$
Decision rule:
If $|t^*| \le t_{1-\alpha/2,\, n-p}$, fail to reject $H_0$.
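The decision rule can be sketched for the slope of a simple regression on made-up data; the critical value $t_{0.975,\,3} \approx 3.182$ is taken from a t-table, since the standard library has no t-distribution:

```python
# t-test sketch for the slope in simple regression (illustrative data):
# t* = b1 / s.d.(b1), with s.d.(b1) = sqrt(MSE / sum((x_i - xbar)^2)).
import math

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
n = len(x)
xbar, ybar = sum(x) / n, sum(y) / n

sxx = sum((xi - xbar) ** 2 for xi in x)
b1 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
b0 = ybar - b1 * xbar

sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
mse = sse / (n - 2)           # residual mean square, df = n - p = n - 2
sd_b1 = math.sqrt(mse / sxx)  # standard error of the slope
t_star = b1 / sd_b1

t_crit = 3.182  # t_{0.975, 3} from a t-table (alpha = 0.05, two-sided)
print(t_star, abs(t_star) > t_crit)  # True here -> reject H0: beta1 = 0
```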