1-Chap II Econometrics ABC DR Mitiku
Chapter 2
Denote the dependent variable by y and the independent variable(s) by x1, x2, ... ,
xk where there are k independent variables.
Note that there can be many x variables but we will limit ourselves to the case
where there is only one x variable to start with. In our set-up, there is only one y
variable.
Regression is different from Correlation
We have some intuition that the beta on this fund is positive, and we
therefore want to find whether there appears to be a relationship between x
and y given the data that we have. The first stage would be to form a
scatter plot of the two variables.
Graph (Scatter Diagram)
[Figure: scatter diagram of the excess return on fund XXX (vertical axis) against the excess return on the market portfolio (horizontal axis)]
Finding a Line of Best Fit
We can use the general equation for a straight line,
y=a+bx
to get the line that best “fits” the data.
Ordinary Least Squares
The most common method used to fit a line to the data is known as
OLS (ordinary least squares).
What we actually do is take each distance and square it (i.e. take the
area of each of the squares in the diagram) and minimise the total sum
of the squares (hence least squares).
[Figure: the actual value $y_i$, the fitted value $\hat{y}_i$ and the residual $\hat{u}_i$ at the point $x_i$]
How OLS Works
So minimise $\hat{u}_1^2 + \hat{u}_2^2 + \hat{u}_3^2 + \hat{u}_4^2 + \hat{u}_5^2$, or minimise $\sum_{t=1}^{5} \hat{u}_t^2$. This is known as the
residual sum of squares.
But what was $\hat{u}_t$? It was the difference between the actual point and the line,
$y_t - \hat{y}_t$.
So minimising $\sum_t (y_t - \hat{y}_t)^2$ is equivalent to minimising $\sum_t \hat{u}_t^2$
with respect to $\hat{\alpha}$ and $\hat{\beta}$.
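Differentiating $L = \sum_t (y_t - \hat{\alpha} - \hat{\beta} x_t)^2$ with respect to $\hat{\beta}$ and setting the result to zero gives
$$\frac{\partial L}{\partial \hat{\beta}} = -2 \sum_t x_t (y_t - \hat{\alpha} - \hat{\beta} x_t) = 0$$
But $\sum_t y_t = T\bar{y}$ and $\sum_t x_t = T\bar{x}$.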
Deriving the OLS Estimator (cont’d)
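Substituting $\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$ into the first-order condition for $\hat{\beta}$ gives
$$\sum_t x_t y_t - \bar{y} \sum_t x_t + \hat{\beta}\bar{x} \sum_t x_t - \hat{\beta} \sum_t x_t^2 = 0$$
$$\sum_t x_t y_t - T\bar{x}\bar{y} + \hat{\beta} T \bar{x}^2 - \hat{\beta} \sum_t x_t^2 = 0$$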
Deriving the OLS Estimator (cont’d)
So overall we have
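$$\hat{\beta} = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2} \quad \text{and} \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$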
$$\hat{y}_t = -1.74 + 1.64 x_t$$
Question: If an analyst tells you that she expects the market to yield a return
20% higher than the risk-free rate next year, what would you expect the return
on fund XXX to be?
Solution: We can say that the expected value of y = “-1.74 + 1.64 * value of x”,
so plug x = 20 into the equation to get the expected value for y:
$$\hat{y}_i = -1.74 + 1.64 \times 20 = 31.06$$
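The closed-form estimators and the forecast above can be sketched in Python. This is a minimal illustration using only the standard library. The function names `ols_fit` and `predict` and the five data points are assumptions made up for this sketch; they are not the fund data behind the chapter's estimates.

```python
def ols_fit(x, y):
    """Return (alpha_hat, beta_hat) for the line y = alpha + beta * x.

    Uses the closed-form expressions:
        beta_hat  = (sum(x*y) - T*xbar*ybar) / (sum(x^2) - T*xbar^2)
        alpha_hat = ybar - beta_hat * xbar
    """
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    sum_xy = sum(xi * yi for xi, yi in zip(x, y))
    sum_xx = sum(xi * xi for xi in x)
    beta_hat = (sum_xy - T * x_bar * y_bar) / (sum_xx - T * x_bar ** 2)
    alpha_hat = y_bar - beta_hat * x_bar
    return alpha_hat, beta_hat


def predict(alpha_hat, beta_hat, x_value):
    """Fitted value of y at a given x."""
    return alpha_hat + beta_hat * x_value


# Hypothetical data, invented purely to exercise the formulas.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha_hat, beta_hat = ols_fit(x, y)

# Forecast using the chapter's estimated line and x = 20.
forecast = predict(-1.74, 1.64, 20)
print(alpha_hat, beta_hat, forecast)
```

The last call reproduces the plug-in calculation from the solution, while the first call shows how the intercept and slope would be obtained from raw observations.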
Accuracy of Intercept Estimate
The Population and the Sample
Linear in the parameters means that the parameters are not multiplied
together, divided, squared or cubed etc.
$$y_t = \alpha + \beta x_t + u_t$$
Linear and Non-linear Models
Additional Assumption
5. ut is normally distributed
Properties of the OLS Estimator
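Consistent
An estimator $\hat{\beta}$ is consistent if $\lim_{T \to \infty} \Pr[\,|\hat{\beta} - \beta| > \delta\,] = 0$ for all $\delta > 0$.
Unbiased
The least squares estimates $\hat{\alpha}$ and $\hat{\beta}$ are unbiased. That is, $E(\hat{\alpha}) = \alpha$ and $E(\hat{\beta}) = \beta$.
Thus on average the estimated values will be equal to the true values. Proving this also
requires the assumption that $E(u_t) = 0$. Unbiasedness is a stronger condition than
consistency in the sense that it holds for any sample size, not only as $T \to \infty$.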
Efficiency
An estimator $\hat{\beta}$ of a parameter $\beta$ is said to be efficient if it is unbiased and no other
unbiased estimator has a smaller variance. If the estimator is efficient, we are
minimising the probability that it is a long way off from the true value of $\beta$.
Precision and Standard Errors
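Any set of regression estimates of $\hat{\alpha}$ and $\hat{\beta}$ are specific to the sample used in their
estimation.
Recall that the estimators of $\alpha$ and $\beta$ from the sample ($\hat{\alpha}$ and $\hat{\beta}$) are given
by
$$\hat{\beta} = \frac{\sum x_t y_t - T\bar{x}\bar{y}}{\sum x_t^2 - T\bar{x}^2} \quad \text{and} \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}$$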
What we need is some measure of the reliability or precision of the estimators
($\hat{\alpha}$ and $\hat{\beta}$). The precision of the estimate is given by its standard error. Given
assumptions 1 - 4 above, the standard errors can be shown to be given by
$$SE(\hat{\alpha}) = s \sqrt{\frac{\sum x_t^2}{T \sum (x_t - \bar{x})^2}} = s \sqrt{\frac{\sum x_t^2}{T \left( \sum x_t^2 - T\bar{x}^2 \right)}},$$
$$SE(\hat{\beta}) = s \sqrt{\frac{1}{\sum (x_t - \bar{x})^2}} = s \sqrt{\frac{1}{\sum x_t^2 - T\bar{x}^2}}$$
where s is the estimated standard deviation of the residuals.
Estimating the Variance of the Disturbance Term
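The variance of the random variable $u_t$ is given by
$\mathrm{Var}(u_t) = E[(u_t - E(u_t))^2]$
which, since $E(u_t) = 0$, reduces to
$\mathrm{Var}(u_t) = E(u_t^2)$
An estimator of the standard deviation of the disturbances is
$$s = \sqrt{\frac{\sum \hat{u}_t^2}{T - 2}}$$
where $\sum \hat{u}_t^2$ is the residual sum of squares and T is the sample size.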
2. The sum of the squares of x about their mean appears in both formulae.
The larger the sum of squares, the smaller the coefficient variances.
Some Comments on the Standard Error Estimators
Consider what happens if $\sum (x_t - \bar{x})^2$ is small or large:
[Figure: scatter plots of y against x illustrating a small and a large spread of x about $\bar{x}$]
Some Comments on the Standard Error Estimators
(cont’d)
3. The larger the sample size, T, the smaller will be the coefficient
variances. T appears explicitly in $SE(\hat{\alpha})$ and implicitly in $SE(\hat{\beta})$.
4. The term $\sum x_t^2$ appears in $SE(\hat{\alpha})$.
The reason is that $\sum x_t^2$ measures how far the points are away from the
y-axis.
Example: How to Calculate the Parameters and
Standard Errors
Assume we have the following data calculated from a regression of y on a
single variable x and a constant over 22 observations.
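Data:
$T = 22$, $\bar{x} = 416.5$, $\sum x_t^2 = 3919654$, $RSS = \sum \hat{u}_t^2 = 130.6$
$$SE(\text{regression}),\ s = \sqrt{\frac{\sum \hat{u}_t^2}{T - 2}} = \sqrt{\frac{130.6}{20}} = 2.55$$
$$SE(\hat{\alpha}) = 2.55 \times \sqrt{\frac{3919654}{22 \times (3919654 - 22 \times 416.5^2)}} = 3.35$$
$$SE(\hat{\beta}) = 2.55 \times \sqrt{\frac{1}{3919654 - 22 \times 416.5^2}} = 0.0079$$
We now write the results as
$$\hat{y}_t = -59.12 + 0.35 x_t$$
(3.35) (0.0079)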
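The standard error calculations in this example can be checked with a short Python sketch. The function name `standard_errors` and its argument names are assumptions introduced for illustration; the inputs are the summary figures quoted in the example (RSS of 130.6, T of 22, sum of squared x of 3919654 and a mean of x of 416.5).

```python
import math


def standard_errors(rss, T, sum_x_sq, x_bar):
    """Return (s, SE(alpha_hat), SE(beta_hat)) for a bivariate regression.

    rss       : residual sum of squares
    T         : number of observations
    sum_x_sq  : sum of squared x values
    x_bar     : sample mean of x
    """
    s = math.sqrt(rss / (T - 2))            # standard error of the regression
    sxx = sum_x_sq - T * x_bar ** 2         # sum of squares of x about its mean
    se_alpha = s * math.sqrt(sum_x_sq / (T * sxx))
    se_beta = s * math.sqrt(1.0 / sxx)
    return s, se_alpha, se_beta


s, se_alpha, se_beta = standard_errors(
    rss=130.6, T=22, sum_x_sq=3919654, x_bar=416.5
)
print(round(s, 2), round(se_alpha, 2), round(se_beta, 4))
```

The code carries s at full precision, whereas the worked example rounds s to 2.55 before multiplying, so the last digit of each standard error can differ slightly from the figures in the text.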
An Introduction to Statistical Inference
We want to make inferences about the likely population values from the regression
parameters.
$$\hat{y}_t = 20.3 + 0.5091 x_t$$
(14.38) (0.2561)
$\hat{\beta} = 0.5091$ is a single (point) estimate of the unknown population parameter, $\beta$. How
“reliable” is this estimate?
The reliability of the point estimate is measured by the coefficient’s standard error.
Hypothesis Testing: Some Concepts
We can use the information in the sample to make inferences about the
population.
We will always have two hypotheses that go together, the null hypothesis
(denoted H0) and the alternative hypothesis (denoted H1).
The null hypothesis is the statement or the statistical hypothesis that is actually
being tested. The alternative hypothesis represents the remaining outcomes of
interest.
For example, suppose given the regression results above, we are interested in the
hypothesis that the true value of $\beta$ is in fact 0.5. We would use the notation
H0 : $\beta = 0.5$
H1 : $\beta \neq 0.5$
This would be known as a two-sided test.
One-Sided Hypothesis Tests
Sometimes we may have some prior information that, for example, we would
expect $\beta > 0.5$ rather than $\beta < 0.5$. In this case, we would do a one-sided test:
H0 : $\beta = 0.5$
H1 : $\beta > 0.5$
or we could have had
H0 : $\beta = 0.5$
H1 : $\beta < 0.5$
There are two ways to conduct a hypothesis test: via the test of significance
approach or via the confidence interval approach.
The Probability Distribution of the
Least Squares Estimators
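The least squares estimators are linear combinations of the random
variables,
i.e. $\hat{\beta} = \sum w_t y_t$
The weighted sum of normal random variables is also normally distributed, so
$\hat{\alpha} \sim N(\alpha, \mathrm{Var}(\hat{\alpha}))$
$\hat{\beta} \sim N(\beta, \mathrm{Var}(\hat{\beta}))$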
What if the errors are not normally distributed? Will the parameter estimates
still be normally distributed?
Yes, approximately, if the other assumptions of the CLRM hold, and the sample size is
sufficiently large.
The Probability Distribution of the
Least Squares Estimators (cont’d)
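Standard normal variates can be constructed from $\hat{\alpha}$ and $\hat{\beta}$:
$$\frac{\hat{\alpha} - \alpha}{\sqrt{\mathrm{var}(\hat{\alpha})}} \sim N(0,1) \quad \text{and} \quad \frac{\hat{\beta} - \beta}{\sqrt{\mathrm{var}(\hat{\beta})}} \sim N(0,1)$$
But var($\hat{\alpha}$) and var($\hat{\beta}$) are unknown, so
$$\frac{\hat{\alpha} - \alpha}{SE(\hat{\alpha})} \sim t_{T-2} \quad \text{and} \quad \frac{\hat{\beta} - \beta}{SE(\hat{\beta})} \sim t_{T-2}$$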
Testing Hypotheses:
The Test of Significance Approach
Assume the regression equation is given by
$y_t = \alpha + \beta x_t + u_t$, for t = 1, 2, ..., T
The steps involved in doing a test of significance are:
1. Estimate $\hat{\alpha}$, $\hat{\beta}$ and $SE(\hat{\alpha})$, $SE(\hat{\beta})$ in the usual way
2. Calculate the test statistic. This is given by the formula
$$\text{test statistic} = \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})}$$
where $\beta^*$ is the value of $\beta$ under the null hypothesis.
The Test of Significance Approach (cont’d)
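[Figure: density f(x) showing the 95% non-rejection region and the 5% rejection region]
The Rejection Region for a 1-Sided Test (Lower Tail)
[Figure: density f(x) with the rejection region in the lower tail]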
6. Use the t-tables to obtain a critical value or values with which to compare
the test statistic.
7. Finally perform the test. If the test statistic lies in the rejection region then
reject the null hypothesis (H0), else do not reject H0.
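The test of significance steps can be sketched as a small Python function. This is an illustrative sketch only: the name `t_test`, the `alternative` argument and the numbers in the usage lines are assumptions invented here, and the critical value is passed in by the caller because it is read from t-tables rather than computed.

```python
def t_test(beta_hat, se_beta, beta_null, t_crit, alternative="two-sided"):
    """Test of significance for a single coefficient.

    t_crit is the positive critical value from the t-tables. For a one-sided
    test it must be the one-sided critical value for the chosen size of test.
    Returns (test_statistic, reject_null).
    """
    stat = (beta_hat - beta_null) / se_beta
    if alternative == "two-sided":
        reject = abs(stat) > t_crit      # rejection region in both tails
    elif alternative == "greater":
        reject = stat > t_crit           # rejection region in the upper tail
    elif alternative == "less":
        reject = stat < -t_crit          # rejection region in the lower tail
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return stat, reject


# Hypothetical inputs: estimate 0.8, standard error 0.1, null value 0.5,
# and a two-sided 5% critical value of 2.086 (20 degrees of freedom).
stat, reject = t_test(0.8, 0.1, 0.5, 2.086)
print(stat, reject)
```

Keeping the critical value as an input mirrors the table lookup in step 6 and avoids any dependence on a statistics library.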
A Note on the t and the Normal Distribution
You should all be familiar with the normal distribution and its characteristic
“bell” shape.
We can scale a normal variate to have zero mean and unit variance by
subtracting its mean and dividing by its standard deviation.
There is, however, a specific relationship between the t- and the standard
normal distribution. Both are symmetrical and centred on zero. The t-
distribution has another parameter, its degrees of freedom. We will always
know this (for the time being from the number of observations -2).
What Does the t-Distribution Look Like?
[Figure: the normal distribution and the t-distribution plotted together]
Comparing the t and the Normal Distribution
In the limit, a t-distribution with an infinite number of degrees of freedom is a
standard normal, i.e. $t(\infty) = N(0,1)$
The reason for using the t-distribution rather than the standard normal is that
we had to estimate $\sigma^2$, the variance of the disturbances.
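A quick numerical comparison illustrates the point. The sketch below uses Python's standard-library `statistics.NormalDist` for the standard normal critical values; the t critical values with 20 degrees of freedom (2.086 and 1.725) are the table values used in this chapter's examples and are typed in by hand, since the standard library has no t-distribution.

```python
from statistics import NormalDist

# Two-sided critical values from the standard normal distribution.
z_5pct = NormalDist().inv_cdf(1 - 0.05 / 2)    # 5% size of test
z_10pct = NormalDist().inv_cdf(1 - 0.10 / 2)   # 10% size of test

# Table values for the t-distribution with 20 degrees of freedom.
t20_5pct = 2.086
t20_10pct = 1.725

print(round(z_5pct, 3), t20_5pct)
print(round(z_10pct, 3), t20_10pct)
```

With only 20 degrees of freedom the t critical values are larger in absolute value than the normal ones, so the same test statistic is harder to reject with.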
The Confidence Interval Approach
to Hypothesis Testing
3. Use the t-tables to find the appropriate critical value, which will again have T-2
degrees of freedom.
5. Perform the test: If the hypothesised value of $\beta$ ($\beta^*$) lies outside the confidence
interval, then reject the null hypothesis that $\beta = \beta^*$, otherwise do not reject the null.
Confidence Intervals Versus Tests of Significance
Note that the Test of Significance and Confidence Interval approaches always give the
same answer.
Under the test of significance approach, we would not reject H0 that $\beta = \beta^*$ if the test
statistic lies within the non-rejection region, i.e. if
$$-t_{crit} \le \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} \le +t_{crit}$$
Rearranging, we would not reject if
$$-t_{crit}\, SE(\hat{\beta}) \le \hat{\beta} - \beta^* \le +t_{crit}\, SE(\hat{\beta})$$
$$\hat{\beta} - t_{crit}\, SE(\hat{\beta}) \le \beta^* \le \hat{\beta} + t_{crit}\, SE(\hat{\beta})$$
But this is just the rule under the confidence interval approach.
Constructing Tests of Significance and
Confidence Intervals: An Example
The first step is to obtain the critical value. We want $t_{crit} = t_{20;5\%}$
Determining the Rejection Region
[Figure: density f(x) with rejection regions in both tails, below -2.086 and above +2.086]
Performing the Test
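Test of significance approach:
$$\text{test stat} = \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} = \frac{0.5091 - 1}{0.2561} = -1.917$$
Do not reject H0 since the test stat lies within the non-rejection region.
Confidence interval approach:
$$\hat{\beta} \pm t_{crit}\, SE(\hat{\beta}) = 0.5091 \pm 2.086 \times 0.2561 = (-0.0251, 1.0433)$$
Since 1 lies within the confidence interval, do not reject H0.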
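Both approaches can be reproduced in a few lines of Python using the figures from this example (estimate 0.5091, standard error 0.2561, hypothesised value 1 and critical value 2.086). The helper name `confidence_interval` and the variable names are assumptions chosen for this sketch.

```python
def confidence_interval(beta_hat, se_beta, t_crit):
    """Return (lower, upper) = beta_hat -/+ t_crit * SE(beta_hat)."""
    half_width = t_crit * se_beta
    return beta_hat - half_width, beta_hat + half_width


beta_hat, se_beta = 0.5091, 0.2561
beta_null = 1.0      # value of beta under H0
t_crit = 2.086       # t-table value, 20 degrees of freedom, 5% two-sided

# Test of significance approach.
test_stat = (beta_hat - beta_null) / se_beta
reject_by_test = abs(test_stat) > t_crit

# Confidence interval approach.
lower, upper = confidence_interval(beta_hat, se_beta, t_crit)
reject_by_ci = not (lower <= beta_null <= upper)

print(round(test_stat, 3), (round(lower, 4), round(upper, 4)))
print(reject_by_test, reject_by_ci)
```

The two decision flags agree, which is the equivalence between the test of significance and confidence interval approaches in action.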
Testing other Hypotheses
Note that we can test these with the confidence interval approach.
For interest (!), test
H0 : $\beta = 0$
vs. H1 : $\beta \neq 0$
H0 : $\beta = 2$
vs. H1 : $\beta \neq 2$
Changing the Size of the Test
For example, say we wanted to use a 10% size of test. Using the test of
significance approach,
$$\text{test stat} = \frac{\hat{\beta} - \beta^*}{SE(\hat{\beta})} = \frac{0.5091 - 1}{0.2561} = -1.917$$
as above. The only thing that changes is the critical t-value.
Changing the Size of the Test:
The New Rejection Regions
[Figure: density f(x) with rejection regions in both tails, below -1.725 and above +1.725]
Changing the Size of the Test:
The Conclusion
t20;10% = 1.725. So now, as the test statistic lies in the rejection region,
we would reject H0.
If we reject the null hypothesis at the 5% level, we say that the result of
the test is statistically significant.
The probability of a type I error is just $\alpha$, the significance level or size of test we
chose. To see this, recall what we said significance at the 5% level meant: it is
only 5% likely that a result as extreme as, or more extreme than, this could have occurred
purely by chance.
Note that there is no chance for a free lunch here! What happens if we reduce
the size of the test (e.g. from a 5% test to a 1% test)? We reduce the chances of
making a type I error ... but we also reduce the probability that we will reject the
null hypothesis at all, so we increase the probability of a type II error:
Reduce size of test → more strict criterion for rejection → reject null hypothesis less often →
less likely to falsely reject, but
more likely to incorrectly not reject
So there is always a trade off between type I and type II errors when choosing a
significance level. The only way we can reduce the chances of both is to
increase the sample size.
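The trade-off can be illustrated with a small Monte Carlo experiment. The sketch below is an assumption-laden illustration rather than part of the original material: the data-generating process (x equal to 1, ..., 22, intercept 1, standard normal errors), the false-null slope of 1.07 and the function names are all invented here. The critical values 2.086 (5%) and 2.845 (1%) are standard two-sided t-table values for 20 degrees of freedom.

```python
import math
import random


def slope_and_se(x, y):
    """OLS slope estimate and its standard error for y = alpha + beta*x + u."""
    T = len(x)
    x_bar = sum(x) / T
    y_bar = sum(y) / T
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta_hat = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha_hat = y_bar - beta_hat * x_bar
    rss = sum((yi - alpha_hat - beta_hat * xi) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(rss / (T - 2))
    return beta_hat, s / math.sqrt(sxx)


def rejection_rate(true_beta, null_beta, t_crit, T=22, n_sims=2000, seed=0):
    """Share of simulated samples in which H0: beta = null_beta is rejected."""
    rng = random.Random(seed)
    x = [float(t) for t in range(1, T + 1)]
    rejections = 0
    for _ in range(n_sims):
        y = [1.0 + true_beta * xi + rng.gauss(0.0, 1.0) for xi in x]
        beta_hat, se = slope_and_se(x, y)
        if abs((beta_hat - null_beta) / se) > t_crit:
            rejections += 1
    return rejections / n_sims


# Null is true: the rejection rate estimates the type I error probability.
type_one_5pct = rejection_rate(1.0, 1.0, t_crit=2.086)
type_one_1pct = rejection_rate(1.0, 1.0, t_crit=2.845)

# Null is false (true slope 1.07): one minus the rejection rate estimates
# the type II error probability.
type_two_5pct = 1 - rejection_rate(1.07, 1.0, t_crit=2.086)
type_two_1pct = 1 - rejection_rate(1.07, 1.0, t_crit=2.845)

print(type_one_5pct, type_one_1pct)
print(type_two_5pct, type_two_1pct)
```

Moving from the 5% to the 1% critical value lowers the simulated type I error rate and raises the simulated type II error rate. Re-running with a larger T in `rejection_rate` would need the critical values for the new degrees of freedom.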
A Special Type of Hypothesis Test: The t-ratio
Since $\beta_i^* = 0$,
$$\text{test stat} = \frac{\hat{\beta}_i}{SE(\hat{\beta}_i)}$$
The ratio of the coefficient to its SE is known as the t-ratio or t-statistic.
The t-ratio: An Example
Suppose that we have the following parameter estimates, standard errors and t-
ratios for an intercept and slope respectively.
If we reject H0, we say that the result is significant. If the coefficient is not
“significant” (e.g. the intercept coefficient in the last regression above), then it
means that the variable is not helping to explain variations in y. Variables that
are not significant are usually removed from the regression model.
In practice there are good statistical reasons for always having a constant even
if it is not significant. Look at what happens if no intercept is included:
[Figure: plot of $y_t$ against $x_t$]
An Example of the Use of a Simple t-test to Test a Theory in
Finance
The Data: Annual Returns on the portfolios of 115 mutual funds from
1945-1964.
Motivation
Two studies by DeBondt and Thaler (1985, 1987) showed that stocks which
experience a poor performance over a 3 to 5 year period tend to outperform
stocks which had previously performed relatively well.
Calculate the monthly excess return of the stock over the market over a 12,
24 or 36 month period for each stock i:
Calculate the average monthly return for the stock i over the first 12, 24, or
36 month period:
$$\bar{R}_i = \frac{1}{n} \sum_{t=1}^{n} U_{it}$$
Portfolio Formation
Then rank the stocks from highest average return to lowest and form 5
portfolios:
Use the same sample length n to monitor the performance of each portfolio.
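The ranking and portfolio formation step might be sketched as follows. This is a hedged illustration: the text does not say how the five portfolios are sized, so the sketch assumes equal-sized groups, and the stock labels and excess returns are invented for the example.

```python
def average_excess_return(excess_returns):
    """Average of a stock's monthly excess returns over the formation period."""
    return sum(excess_returns) / len(excess_returns)


def form_portfolios(avg_returns, n_portfolios=5):
    """Rank stocks from highest to lowest average return and split them into
    n_portfolios groups of (as near as possible) equal size.

    ASSUMPTION: equal-sized groups; the grouping rule is not given in the text.
    Returns a list of lists, best performers first.
    """
    ranked = sorted(avg_returns, key=avg_returns.get, reverse=True)
    size, extra = divmod(len(ranked), n_portfolios)
    portfolios, start = [], 0
    for p in range(n_portfolios):
        end = start + size + (1 if p < extra else 0)
        portfolios.append(ranked[start:end])
        start = end
    return portfolios


# Hypothetical monthly excess returns for ten made-up stocks (n = 2 months).
excess = {
    "A": [0.02, 0.04], "B": [-0.01, -0.03], "C": [0.00, 0.02],
    "D": [0.05, 0.07], "E": [-0.04, -0.06], "F": [0.03, 0.05],
    "G": [-0.02, 0.00], "H": [0.01, 0.03], "I": [-0.03, -0.03],
    "J": [0.00, 0.00],
}
avg = {stock: average_excess_return(u) for stock, u in excess.items()}
portfolios = form_portfolios(avg)
print(portfolios)
```

The first list holds the past winners and the last list the past losers, which are the two portfolios whose later performance is compared.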
Portfolio Formation and
Portfolio Tracking Periods
Solution: Allow for risk differences by regressing against the market risk
premium:
where
Rmt is the return on the FTA All-share
Rft is the return on a UK government 3 month t-bill.
Is there an Overreaction Effect in the
UK Stock Market? Results
Is there evidence that losers out-perform winners more at one time of the
year than another?
To test this, calculate the difference between the winner & loser portfolios
as previously, RDt, and regress this on 12 month-of-the-year dummies:
$$R_{Dt} = \sum_{i=1}^{12} \delta_i M_i + u_t$$
Significant out-performance of losers over winners in,
June (for the 24-month horizon), and
January, April and October (for the 36-month horizon)
winners appear to stay significantly as winners in
March (for the 12-month horizon).
Conclusions
Evidence of overreactions in stock returns.
Comments
Small samples
No diagnostic checks of model adequacy
The Exact Significance Level or p-value
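If the test statistic is large in absolute value, the p-value will be small, and
vice versa. The p-value is the marginal significance level at which we would be
indifferent between rejecting and not rejecting the null hypothesis; informally, it
indicates how plausible the null hypothesis is given the data.
e.g. a test statistic is distributed as a $t_{62}$ and takes the value 1.47.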
The p-value = 0.12.
Do we reject at the 5% level?...........................No
Do we reject at the 10% level?.........................No
Do we reject at the 20% level?.........................Yes
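The decision rule behind these three answers is simply to reject whenever the p-value is below the chosen size of test. A minimal Python sketch, with the function name `reject_at` invented for illustration and the p-value taken from the example above:

```python
def reject_at(p_value, size):
    """Reject the null hypothesis if the p-value is below the size of test."""
    return p_value < size


p_value = 0.12
decisions = {size: reject_at(p_value, size) for size in (0.05, 0.10, 0.20)}
print(decisions)
```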