Chapter Three

Regression analysis: Further details


3.1 Multiple regression analysis
• We studied the two-variable model extensively in the
previous unit.
• But in economics it is rare to find that a variable is
affected by only one explanatory variable.
• For example, the demand for a commodity depends on the
price of the commodity itself, the prices of competing or
complementary goods, the income of the consumer, the
number of consumers in the market, etc.
• Hence the two-variable model is often inadequate in
practical work.
Cont.…
• Therefore, we need to discuss multiple regression models.
• Multiple linear regression is concerned with the
relationship between a dependent variable (Y) and two or more
explanatory variables (X1, X2, …, Xk).
Why Do We Need Multiple Regression?
1. One motivation for multiple regression is the omitted
variable bias that arises in simple regression analysis.
• This is the primary drawback of simple regression, but
multiple regression allows us to explicitly control for many
other factors that simultaneously affect the dependent
variable.
Cont.…
Example: wages vs. education
• Imagine we want to measure the (causal) effect of an
additional year of education on a person's wage.
• If we use the model wage = β0 + β1educ + u and interpret
β1 as the ceteris paribus effect of educ on wage, we have to
assume that educ and u are uncorrelated.
• Now consider a different model: wage = β0 + β1educ + β2exper
+ u, where exper is a person's work experience (in years).
• Since the equation contains experience explicitly, we will be
able to measure the effect of education on wage, holding
experience fixed. A simulation of this point follows below.
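The omitted variable bias can be illustrated with a small simulation. The sketch below is hypothetical: it assumes a true educ effect of 0.8, an exper effect of 0.5, and educ and exper positively correlated by construction; the short regression then overstates the educ coefficient, while the long regression recovers it.

```python
# Minimal simulation of omitted-variable bias (all numbers are
# illustrative assumptions, not estimates from real wage data).
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# educ and exper are positively correlated by construction
educ = rng.normal(12, 2, n)
exper = 0.5 * educ + rng.normal(10, 3, n)

# assumed "true" model: wage = 1 + 0.8*educ + 0.5*exper + u
wage = 1 + 0.8 * educ + 0.5 * exper + rng.normal(0, 1, n)

# short regression: wage on educ only (exper is absorbed into the error)
X_short = np.column_stack([np.ones(n), educ])
b_short = np.linalg.lstsq(X_short, wage, rcond=None)[0]

# long regression: wage on educ and exper
X_long = np.column_stack([np.ones(n), educ, exper])
b_long = np.linalg.lstsq(X_long, wage, rcond=None)[0]

print("short-model educ coefficient:", b_short[1])  # ~1.05, biased upward
print("long-model  educ coefficient:", b_long[1])   # ~0.80, the assumed effect
```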
Cont..
2. Multiple regression analysis is also useful for generalizing
functional relationships between variables.
Simple Regression vs. Multiple Regression
• Most of the properties of the simple regression model extend
directly to the multiple regression case.
• We derived many of the formulas for the simple regression
model; with multiple variables, however, the formulas become
cumbersome when there are more than two explanatory variables.
• As far as the interpretation of the model is concerned, there is
one important new fact: the coefficient βj captures the effect of
the j-th explanatory variable, holding all the remaining
explanatory variables fixed.
Estimation
• Estimation proceeds by ordinary least squares, as in simple
regression; as before, we can define:
Population regression model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + U_i$$
Sample regression model:
$$Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \cdots + \hat\beta_k X_{ki} + \hat U_i$$
Fitted values of Y:
$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \cdots + \hat\beta_k X_{ki}$$
Residuals:
$$\hat U_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i} - \cdots - \hat\beta_k X_{ki}$$
Estimation con…
When the number of explanatory variables = 2:
Population regression model:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + U_i$$
Sample regression model:
$$Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i} + \hat U_i$$
Fitted values of Y:
$$\hat Y_i = \hat\beta_0 + \hat\beta_1 X_{1i} + \hat\beta_2 X_{2i}$$
Residuals:
$$\hat U_i = Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i}$$
Estimation con…
• Squaring the residuals and summing over all observations gives the
residual sum of squares:
$$\sum \hat U_i^2 = \sum (Y_i - \hat Y_i)^2$$
$$\text{or } \sum \hat U_i^2 = \sum (Y_i - \hat\beta_0 - \hat\beta_1 X_{1i} - \hat\beta_2 X_{2i})^2$$
• Now, using partial derivatives, we minimize this sum by setting each
derivative equal to zero and solving for $\hat\beta_0, \hat\beta_1, \hat\beta_2$:
$$\frac{\partial \sum \hat U_i^2}{\partial \hat\beta_j} = 0, \quad j = 0, 1, 2$$
• Using the normal equations we can derive formulas for
$\hat\beta_0, \hat\beta_1, \hat\beta_2$.
• The normal equations are obtained from the CLRM assumptions.
• This approach is also known as the method of moments.
Estimation con…
Population assumption → Sample counterpart
$E(U_i) = 0$ → $\sum \hat U_i / n = 0$, or $\sum \hat U_i = 0$
$Cov(X_{1i}, U_i) = 0$ → $\sum X_{1i}\hat U_i / n = 0$, or $\sum X_{1i}\hat U_i = 0$
$Cov(X_{2i}, U_i) = 0$ → $\sum X_{2i}\hat U_i / n = 0$, or $\sum X_{2i}\hat U_i = 0$
Estimation con…
• Then we can derive the following normal equations on the basis of the
above assumptions:
$$\sum \hat U_i = 0 \;\Rightarrow\; \sum Y_i = n\hat\beta_0 + \hat\beta_1 \sum X_{1i} + \hat\beta_2 \sum X_{2i} \qquad \text{eq(1)}$$
$$\sum X_{1i}\hat U_i = 0 \;\Rightarrow\; \sum X_{1i}Y_i = \hat\beta_0 \sum X_{1i} + \hat\beta_1 \sum X_{1i}^2 + \hat\beta_2 \sum X_{1i}X_{2i} \qquad \text{eq(2)}$$
$$\sum X_{2i}\hat U_i = 0 \;\Rightarrow\; \sum X_{2i}Y_i = \hat\beta_0 \sum X_{2i} + \hat\beta_1 \sum X_{1i}X_{2i} + \hat\beta_2 \sum X_{2i}^2 \qquad \text{eq(3)}$$
• Dividing equation (1) by n and substituting the resulting $\hat\beta_0$
into eq(2) and eq(3), we get:
$$\hat\beta_0 = \bar Y - \hat\beta_1 \bar X_1 - \hat\beta_2 \bar X_2$$
$$\sum X_{1i}Y_i - n\bar X_1 \bar Y = \hat\beta_1 \Big(\sum X_{1i}^2 - n\bar X_1^2\Big) + \hat\beta_2 \Big(\sum X_{1i}X_{2i} - n\bar X_1 \bar X_2\Big)$$
$$\sum X_{2i}Y_i - n\bar X_2 \bar Y = \hat\beta_1 \Big(\sum X_{1i}X_{2i} - n\bar X_1 \bar X_2\Big) + \hat\beta_2 \Big(\sum X_{2i}^2 - n\bar X_2^2\Big)$$
Estimation con…

• Now convert the level form into deviation form (lowercase letters
denote deviations from sample means, e.g. $x_{1i} = X_{1i} - \bar X_1$):
$$\sum x_{1i}y_i = \hat\beta_1 \sum x_{1i}^2 + \hat\beta_2 \sum x_{1i}x_{2i}$$
$$\sum x_{2i}y_i = \hat\beta_1 \sum x_{1i}x_{2i} + \hat\beta_2 \sum x_{2i}^2$$
• By the matrix method or the substitution method, we get the values of
$\hat\beta_1$ and $\hat\beta_2$. In matrix form:
$$\begin{pmatrix} \sum x_{1i}^2 & \sum x_{1i}x_{2i} \\ \sum x_{1i}x_{2i} & \sum x_{2i}^2 \end{pmatrix} \begin{pmatrix} \hat\beta_1 \\ \hat\beta_2 \end{pmatrix} = \begin{pmatrix} \sum x_{1i}y_i \\ \sum x_{2i}y_i \end{pmatrix}$$
Estimation con…

𝑥1𝑖𝑦𝑖 𝑥2𝑖 2 −(𝑥2𝑖𝑦𝑖)(𝑥1𝑖𝑥2𝑖)


𝛽1 =
𝑥1𝑖 2 𝑥2𝑖 2 −(𝑥2𝑖𝑥1𝑖)2
𝑥2𝑖𝑦𝑖 𝑥1𝑖 2 −(𝑥1𝑖𝑦𝑖)(𝑥2𝑖 𝑥1𝑖)
𝛽2 =
𝑥1𝑖 2 𝑥2𝑖 2 −(𝑥2𝑖𝑥1𝑖)2
Global hypothesis test (F and R²)
• We used the t-test to test single hypotheses, i.e., hypotheses
involving only one coefficient. But what if we want to test
hypotheses about more than one coefficient simultaneously? We do
this using the F-test.
• The F-test is used to test the overall significance of a model:
$$F = \frac{\text{mean square of ESS}}{\text{mean square of RSS}} = \frac{\text{ESS}/(k-1)}{\text{RSS}/(n-k)}$$
Or, using R-squared:
$$F = \frac{R^2/(k-1)}{(1-R^2)/(n-k)}$$
Decision: If $F > F_{\alpha(k-1,\,n-k)}$, reject H0; otherwise do not
reject H0 (i.e., reject when Fcal > Ftab),
where $F_{\alpha(k-1,\,n-k)}$ is the critical F value at the α level of
significance with (k − 1) numerator df and (n − k) denominator df.
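A minimal sketch of the decision rule, using the R-squared form of the statistic; the values of R², n, and k below are hypothetical (k counts all coefficients, including the intercept).

```python
# Overall F test from R-squared (hypothetical R2, n, k).
from scipy.stats import f

R2, n, k = 0.9695, 30, 2
F = (R2 / (k - 1)) / ((1 - R2) / (n - k))
F_crit = f.ppf(0.95, k - 1, n - k)  # critical value at alpha = 0.05

print(f"F = {F:.2f}, critical F = {F_crit:.2f}")
print("reject H0" if F > F_crit else "do not reject H0")
```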
Selection of models

• One of the assumptions of the classical linear regression model
(CLRM) is that the regression model used in the analysis is
"correctly" specified. If the model is not "correctly" specified,
we encounter the problem of model specification error, or
model specification bias.
Basic questions related to model selection
 What are the criteria for choosing a model for empirical
analysis?
 What types of model specification errors is one likely to
encounter in practice?
 What are the consequences of specification errors?
Cont..
 How does one detect specification errors? In other words, what
are some of the diagnostic tools that one can use?
 Having detected specification errors, what remedies can one
adopt?
Model Selection Criteria
Model chosen for empirical analysis should satisfy the following
criteria
• Be data admissible; that is, predictions made from the model
must be logically possible.
• Be consistent with theory; that is, it must make good
economic sense.
Cont…
• Exhibit parameter constancy; that is, the values of the
parameters should be stable. Otherwise, forecasting will
be difficult.
• Exhibit data coherency; that is, the residuals estimated
from the model must be purely random (technically, white
noise).
• Be encompassing; that is, the model should encompass
or include all the rival models in the sense that it is
capable of explaining their results.
• In short, other models cannot be an improvement over the
chosen model.
Types of Specification Errors
• In developing an empirical model, one is likely to commit one
or more of the following specification errors:
i. Omission of a relevant variable(s)
ii. Inclusion of an unnecessary variable(s)
iii. Adopting the wrong functional form
iv. Errors of measurement
Consequences of Model Specification Errors
Omitting a Relevant Variable
• If the left-out, or omitted, variable is correlated with an
included variable (i.e., the correlation coefficient between the
two variables is nonzero), the estimators are biased as well as
inconsistent.
• Even if the two variables are uncorrelated, the intercept
parameter is biased, although the slope parameter is then
unbiased.
• The disturbance variance is incorrectly estimated.
• In consequence, the usual confidence interval and hypothesis-
testing procedures are likely to give misleading conclusions
about the statistical significance of the estimated parameters.
Con…
• There is an asymmetry between the two types of specification biases.
• If we include an irrelevant variable in the model, the model still
gives us unbiased and consistent estimates of the coefficients
in the true model, the error variance is correctly estimated, and
the conventional hypothesis-testing methods remain valid.
• The only penalty we pay for including the superfluous
variable is that the estimated variances of the coefficients are
larger, and as a result our probability inferences about the
parameters are less precise.
Functional Forms of Regression Models
• Commonly used regression models may be nonlinear in
the variables but linear in the parameters, or can be
made so by suitable transformations of the variables:
1. Linear model: Y = β1 + β2X
2. Log-log (double-log) model: ln Y = β1 + β2 ln X
3. Semi-log model (lin-log or log-lin): Y = β1 + β2 ln X or
ln Y = β1 + β2X
4. Reciprocal model: Y = β1 + β2(1/X)
Example
Double-log model
$$\ln EXDUR_t = -7.5417 + 1.6266 \ln PCEX$$
$$se = (0.716)\;(0.080)$$
$$t = (-10.53)\;(20.3) \qquad r^2 = 0.9695$$
• Interpretation: if total personal expenditure goes up by 1 percent,
the expenditure on durable goods goes up, on average, by about 1.63
percent (the slope is an elasticity).
Lin-log model
$$FoodExp_i = -1283.912 + 257.2700 \ln TotalExp_i$$
$$se = (\,??\,)\;(\,??\,)$$
$$t = (-4.3848)\;(5.6625) \qquad r^2 = 0.3769$$
• Interpretation: an increase in total expenditure of 1 percent leads,
on average, to about a 2.57-birr increase in the expenditure on food.
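The 2.57 figure follows from the lin-log slope rule (the slope divided by 100 gives the absolute change in Y per 1 percent change in X):

$$\Delta Y \approx \frac{\hat\beta_2}{100} \times (\%\,\Delta X) = \frac{257.27}{100} \times 1 \approx 2.57 \text{ birr}$$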
Cont..
Log-lin model
$$\ln EXS_t = 8.3226 + 0.00705\,t$$
$$se = (0.0016)\;(0.00018) \qquad r^2 = 0.9919$$
$$t = (5201.625)\;(39.1667)$$
where EXS is expenditure on services and t is time, measured in
quarters.
Interpretation: expenditure on services increased at a
(quarterly) rate of about 0.705 percent (the slope times 100 gives
the growth rate per unit of t).
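The log-lin fit can be sketched numerically; the data below are simulated with an assumed quarterly growth rate of 0.705 percent, which the regression of ln(EXS) on t should recover.

```python
# Fitting a log-lin (semilog growth) model on simulated quarterly data;
# the 0.705% quarterly growth rate is an assumption, not real data.
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(80)  # 80 quarters
EXS = 4000 * np.exp(0.00705 * t) * np.exp(rng.normal(0, 0.01, 80))

# regress ln(EXS) on t: the slope estimates the quarterly growth rate
slope, intercept = np.polyfit(t, np.log(EXS), 1)
print(f"estimated quarterly growth rate: {100 * slope:.3f} percent")  # ~0.705
```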
Relaxing the CLRM basic assumptions
Multicollinearity problem
• One assumption of the classical linear regression model
(CLRM) is that there is no high multicollinearity among
the regressors included in the regression model.
• Multicollinearity means the existence of a perfect (exact) or
inexact linear relationship among some or all of the
explanatory variables of a regression model.
Cont..
Sources of multicollinearity
• The data collection method employed
• Model specification.
• An overdetermined model (more explanatory variables than observations)
Consequences of Multicollinearity
• The OLS estimators are still BLUE.
• But the OLS estimators have large variances and covariances.
• Because of the large variances of the estimators, which
mean large standard errors, the confidence intervals tend
to be much wider, leading to ready acceptance of the "zero null
hypothesis."
Cont..
• The computed t-ratios will be very small, so one or
more of the coefficients tend to be statistically
insignificant when tested individually.
• Yet R-squared, the overall measure of goodness of fit, can be
very high.
Remedial measures of multicollinearity
• Combining cross-sectional and time series data
• Dropping a variable(s), though this may introduce specification bias
• Transformation of variables
Tests to check the existence of multicollinearity
• Variance inflation factor (VIF)
• Correlation matrix
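A sketch of both checks on simulated data: the VIF of regressor j is 1/(1 − R²_j), where R²_j comes from regressing X_j on the other regressors. The collinear pair below is constructed deliberately, so its VIFs should come out high.

```python
# Variance inflation factors and a correlation matrix on simulated data
# (X2 is built to be collinear with X1 for illustration).
import numpy as np

rng = np.random.default_rng(3)
n = 500
X1 = rng.normal(size=n)
X2 = 0.9 * X1 + rng.normal(scale=0.3, size=n)  # deliberately collinear
X3 = rng.normal(size=n)
X = np.column_stack([X1, X2, X3])

def vif(X, j):
    """VIF_j = 1 / (1 - R^2_j) from regressing column j on the others."""
    y = X[:, j]
    Z = np.column_stack([np.ones(len(y)), np.delete(X, j, axis=1)])
    resid = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    r2 = 1 - resid.var() / y.var()
    return 1 / (1 - r2)

for j in range(X.shape[1]):
    print(f"VIF of X{j + 1}: {vif(X, j):.2f}")  # X1, X2 should be large

print(np.corrcoef(X, rowvar=False).round(2))  # correlation matrix
```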
Heteroscedasticity
Sources
• Model specification problem
• Data collection problem
• The presence of outliers
Consequences
• The variance of the error term is under- or overestimated
• The OLS estimators are no longer BLUE (though still unbiased)
• Confidence intervals and t-ratios are also affected
• Hypothesis testing is therefore misleading
Cont..
Tests to check the existence of Heteroscedasticity
• Goldfeld-Quandt Test
• Breusch–Pagan–Godfrey Test
• White’s test
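A minimal sketch of the Breusch–Pagan test using statsmodels, on simulated data where the error standard deviation is made to grow with X, so the test should reject homoscedasticity.

```python
# Breusch-Pagan test on simulated heteroscedastic data.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 10, n)
y = 2 + 0.5 * x + rng.normal(0, 0.3 * x, n)  # error sd grows with x

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(f"LM statistic = {lm_stat:.2f}, p-value = {lm_pvalue:.4f}")
# a small p-value indicates heteroscedasticity
```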
Regression on Dummy Variables
• There are four types of variables that one generally encounters
in empirical analysis: ratio scale, interval scale,
ordinal scale, and nominal scale.
• Regression models may involve not only ratio scale
variables but also nominal scale variables. Such variables are
also known as indicator variables, categorical variables,
qualitative variables, or dummy variables.
The nature of dummy variable
• In regression analysis the dependent variable, is frequently
influenced not only by ratio scale variables (e.g., income,
output, prices, costs, height, temperature) but also by variables
that are essentially qualitative, or nominal scale, in nature,
such as sex, race, color, religion, nationality, geographical
region and political party affiliation.
Cont..
• One way we could “quantify” such attributes is by
constructing artificial variables that take on values of 1 or 0, 1
indicating the presence (or possession) of that attribute and 0
indicating the absence of that attribute.
• Variables that assume such 0 and 1 values are called dummy
variables.
• Dummy variables can be used in regression models just as
easily as quantitative variables.
• A regression model may contain explanatory variables that are
exclusively dummy, or qualitative, in nature.
Cont…
Given : Yi = α + βDi + Ui
where Y= annual salary of a college professor
Di = 1 if male college professor; = 0 otherwise (i.e., female
professor)
• The above regression model may enable us to find out whether
sex makes any difference in a college professor's salary,
assuming, of course, that all other variables such as age,
degree attained, and years of experience are held constant.
• Assuming that the disturbances satisfy the usual assumptions
of the classical linear regression model, we obtain:
• Mean salary of female college professors: E(Y | Di = 0) = α
• Mean salary of male college professors: E(Y | Di = 1) = α + β
Cont…
• The intercept term α gives the mean salary of female college
professors, and the slope coefficient β tells us by how much the
mean salary of a male college professor differs from the mean
salary of his female counterpart.
• How can we test whether there is sex discrimination or not?
Example
$$\hat Y_i = 18{,}000 + 3{,}280\,D_i$$
$$(0.32)\;(0.44)$$
$$t = (57.74)\;(7.439) \qquad R^2 = 0.8737$$
Based on this result, the estimated mean salary of a female college
professor is 18,000 birr and that of a male professor is 21,280 birr.
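A simulation sketch of this model (the salaries are generated, not the textbook's data): OLS of salary on a constant and D recovers α as the female mean and β as the male–female difference.

```python
# Dummy-variable regression: salary = alpha + beta*D + u (simulated).
import numpy as np

rng = np.random.default_rng(5)
n = 100
D = rng.integers(0, 2, n)  # 1 = male, 0 = female
salary = 18000 + 3280 * D + rng.normal(0, 500, n)

X = np.column_stack([np.ones(n), D])
alpha, beta = np.linalg.lstsq(X, salary, rcond=None)[0]

print(f"mean female salary (alpha):      {alpha:.0f}")  # ~18,000
print(f"male-female difference (beta):   {beta:.0f}")   # ~3,280
print(f"mean male salary (alpha + beta): {alpha + beta:.0f}")
```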
Cont.….
Regression on one quantitative variable and one qualitative
variable with two classes
Yi = 𝛼1 + α2 Di + βXi + Ui
where Y= annual salary of a college professor
Di = 1 if male college professor; = 0 otherwise (i.e., female
professor)
Cont…
Mean salary of female college professors:
E(Y | Xi, Di = 0) = α1 + βXi
Mean salary of male college professors:
E(Y | Xi, Di = 1) = (α1 + α2) + βXi
• The level of the male professor's mean salary differs from
that of the female professor's mean salary (by α2), but the rate
of change in the mean annual salary with years of experience is
the same for both sexes.
Cont.…
Graphically, the two salary profiles are parallel lines with intercepts
α1 (female) and α1 + α2 (male). [Figure not reproduced.]
Regression on one quantitative variable and two
qualitative variables
Yi = 𝛼0 + α1 D1i + α2 D2i + βXi + Ui
where Y= annual salary of a college professor
D1i = 1 if male college professor; = 0 otherwise (i.e., female
professor). D2i = 1 if white; = 0 otherwise.
Exercises: From the above expression obtain the following
1. Mean salary for black female professor
2. Mean salary for black male professor
3. Mean salary for white female professor
4. Mean salary for white male professor
Cont.…
 Regression on one quantitative variable and one
qualitative variable with more than two classes.
• Suppose we consider three mutually exclusive levels of
education: less than high school, high school, and college.
• If a qualitative variable has m categories, introduce only
m − 1 dummy variables,
• following the rule that the number of dummies be one less
than the number of categories of the variable.
Yi = α0 + α1D1i + α2D2i + βXi + Ui
where Yi = annual expenditure on health care,
Xi = annual income, D1 = 1 if high school education, D2 = 1
if college education, and = 0 otherwise.
• Compute the mean health care expenditure functions for the
three levels of education (a worked check follows below).
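As a worked check (treating less than high school as the base category), the implied mean expenditure functions are:

$$
\begin{aligned}
E(Y \mid D_1 = 0, D_2 = 0, X) &= \alpha_0 + \beta X && \text{(less than high school)} \\
E(Y \mid D_1 = 1, D_2 = 0, X) &= (\alpha_0 + \alpha_1) + \beta X && \text{(high school)} \\
E(Y \mid D_1 = 0, D_2 = 1, X) &= (\alpha_0 + \alpha_2) + \beta X && \text{(college)}
\end{aligned}
$$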
