Part 03 - AMEFA - 2024 - Introduction and Repetition

AMEFA

Pär Sjölander
OLS Significance Tests: Y = α + βX + ε, H0: β=0, H1: β≠0
Purpose: Evaluate if relationships in a sample can be generalized to the broader population.
Hypothesis Formulation: H0: The regression coefficients are not significantly different from e.g. zero (H0:β=0),
implying no effect. H1: The regression coefficients are significantly different from zero, suggesting a relationship
(e.g. H1:β≠0).
Regression Analysis: Use a sample to estimate the relationship between variables X and Y, to make inferences
about the population, e.g. obtain the estimated coefficient β̂.
Test Statistic: Compute the t-statistic or F-statistic to test these hypotheses. Note 1: With only one X variable, the
slope t-test and the F-test give the same information, since t = F^0.5 (equivalently, t² = F). Note 2: the p-values of the
t and F tests can also simplify the process; there is then no need to compare the test statistic with a critical value
in a t or F distribution table.
Decision Rule: Compare the test statistic to the critical value from the appropriate distribution (or simply check
if the p-value is below the significance level, e.g. α=5%).
Conclusion:
- Not Reject H0: An insignificant p-value implies that the sample's variability obscures any potential effect, so
the estimated coefficient β̂ is not distinguishably different from the null hypothesis of β=0 (so we cannot reject
H0:β=0). Thus, we cannot certify a significant relationship between X and Y in such cases.
- Reject H0: A statistically significant result, indicated by e.g. a p-value below 5%, suggests a relationship
between X and Y, where changes in X are likely to affect Y, beyond random variation (and we reject H0:β=0 and
support H1:β≠0).
Note: In econometric models, lower-case (y and x) indicates observed data points, while upper-case (Y and X) refers to random variables or data sets, distinguishing between micro-level (individual points)
and macro-level (distributions or populations) analysis. However, since this standard is not always followed in research articles and textbooks, we use them interchangeably in this course too.
OLS Significance Tests – t-tests: y = β0 + β1x1 + β2x2 + β3x3 + β4x4 + … + βkxk
• The t-distribution is used for non-large samples, due to the uncertainty in estimating the population standard
deviation (σ) from the regression (high variability in the population error term = use the t distribution).
• In practice, usually unknown σ: The t-distribution accounts for the extra uncertainty of an unknown σ.
• When the sample size is extra small, the t-distribution takes into account that we have more
uncertainty when estimating the population variance σ² of the regression.
• Convergence to the Normal Distribution (for large samples): As the sample size increases, the t-distribution
approaches the normal distribution (due to the central limit theorem). The t-test remains suitable for hypothesis
testing in large samples, even if σ is unknown, because the sampling distribution becomes approximately normal.
• β represents the hypothesized population parameter value that you test against (e.g. 0 or 1), and β̂i (with
i = 0, 1, 2, 3, …) is the estimated coefficient from your OLS model (e.g. β̂0, β̂1, β̂2, β̂3, …).
For example, given H0: βi = β, (i) if we want to test whether the intercept is significantly different from 0 we
write β0 = 0, and (ii) if we want to test whether a slope is significantly different from 1 we write e.g. β3 = 1.
The general rule:

The t distribution is used when the population variance σ² is unknown. Then we estimate σ̂. If σ is known we use
the normal distribution (but that is rarely the case).

A t-test can test an individual parameter, e.g. whether β2 = 0.

Null Hypothesis     Alternative Hypothesis     Reject Null Hypothesis if:
H0: βi = β          H1: βi ≠ β                 |T| > t(α/2, df)
H0: βi ≤ β          H1: βi > β                 T > t(α, df)
H0: βi ≥ β          H1: βi < β                 T < −t(α, df)

where T = (β̂i − β) / s.e.(β̂i)

σ represents the standard deviation of the error terms: the dispersion or spread of the errors around the true
population regression line. The lower the better.
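The decision rules above can be sketched in Python; this is a minimal illustration (the function name and defaults are our own, scipy supplies the critical values):

```python
from scipy import stats

def t_test(beta_hat, beta_null, se, df, alpha=0.05, tail="two"):
    """Return (T, critical value, reject H0?) for a single OLS coefficient."""
    T = (beta_hat - beta_null) / se
    if tail == "two":        # H0: beta_i = beta   vs  H1: beta_i != beta
        crit = stats.t.ppf(1 - alpha / 2, df)
        reject = abs(T) > crit
    elif tail == "right":    # H0: beta_i <= beta  vs  H1: beta_i > beta
        crit = stats.t.ppf(1 - alpha, df)
        reject = T > crit
    else:                    # H0: beta_i >= beta  vs  H1: beta_i < beta
        crit = -stats.t.ppf(1 - alpha, df)
        reject = T < crit
    return T, crit, reject
```

For instance, t_test(1.0, 0.0, 0.365, df=3) reproduces the two-tailed test of Example 3a (T ≈ 2.739 against the critical value 3.182).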
Example 3a: (Hypothesis Test)
Y = β0 + β1X1 + ε gave the following empirical regression model ŷ = 2 + 1·x1 + e, with s.e.(β̂1) = 0.365 for the slope.

For the previous example (in the previous lecture notes) given 𝛼𝛼 = 0.05 (5% significance
level), with n=5, test the statistical significance of price for estimating the quantity. Conduct
a t-test.
Two-tailed test:

𝐻𝐻0 : 𝛽𝛽1 = 0

𝐻𝐻1 : 𝛽𝛽1 ≠ 0

T = (β̂1 − β1) / s.e.(β̂1) = (1 − 0) / 0.365 = 2.739 ~ t(5−2=3)

5 observations − 2 estimated parameters (intercept and slope) = 3 degrees of freedom. The critical value with 3
degrees of freedom is 3.182. Since 2.739 < 3.182, we DO NOT REJECT the null
hypothesis, meaning that the price estimate based on our data is not statistically significant
for predicting the quantity. (Note: 5 observations and 3 degrees of freedom is in practice far too few.)
Example 3b: (Hypothesis Test)
For the previous example, at level of significance 𝛼𝛼 = 0.05, with n=5, test the
statistical significance of price for estimating the quantity. Conduct a t-test.
Two-tailed test.

𝐻𝐻0 : 𝛽𝛽1 = 0.5



𝐻𝐻1 : 𝛽𝛽1 ≠ 0.5

T = (β̂1 − β1) / s.e.(β̂1) = (1 − 0.5) / 0.365 = 1.37 ~ t(5−2=3)

The critical value with 3 degrees of freedom is 3.182. Since 1.37 < 3.182, we
DO NOT REJECT the null hypothesis, meaning that the price estimate based
on our data is not statistically significant for predicting the quantity.
Example 3c: (Hypothesis Test)
Assume our OLS regression gives us the slope estimate β̂1 = 1.9 with s.e.(β̂1) = 0.16, n=1000.
For the previous example, at level of significance 𝛼𝛼 = 0.05 , test the statistical
significance of price for estimating the quantity. Conduct a t-test. Is our slope
coefficient significantly higher than 1.2? One-tailed test, or a right-tailed test.

H0: β1 ≤ 1.2
H1: β1 > 1.2 (sometimes just the notation = is used in H0 instead of ≤)

T = (β̂1 − β1) / s.e.(β̂1) = (1.9 − 1.2) / 0.16 = 4.375 ~ t(1000−2=998)

The critical value with 998 degrees of freedom is 1.645. Since 4.375 > 1.645, we
REJECT the null hypothesis, meaning that the price estimate based on our data is
statistically significant for predicting the quantity.
Note that a one-tailed test gives more statistical power than a two-tailed test: there is a higher probability of rejecting a false null hypothesis. Sometimes we are only interested in whether the estimated parameter is statistically
higher (and not, e.g., whether it is statistically lower). However, the higher power of a one-tailed test is achieved at the expense of being unable to detect an effect in the opposite direction, i.e. a higher risk of a type 2 error (false negative) in that direction.
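The right-tailed test of Example 3c can be checked numerically; note that with 998 degrees of freedom the exact t critical value is essentially the normal-distribution value 1.645:

```python
from scipy import stats

T = (1.9 - 1.2) / 0.16            # = 4.375
crit = stats.t.ppf(0.95, 998)     # right-tailed test, alpha = 0.05
print(T > crit)                   # True -> reject H0: beta1 <= 1.2
```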
Example 3d: (Hypothesis Test)
Assume our OLS regression gives us the slope estimate β̂1 = −0.92 with s.e.(β̂1) = 0.1, n=1000.
For the previous example, at level of significance α = 0.05, test the statistical
significance of price for estimating the quantity. Conduct a t-test. Is our slope
coefficient significantly lower (more negative) than −0.5? One-tailed test, or a left-
tailed test.
H0: β1 ≥ −0.5
H1: β1 < −0.5 (sometimes just the notation = is used in H0 instead of ≥)

T = (β̂1 − β1) / s.e.(β̂1) = (−0.92 − (−0.5)) / 0.1 = −4.2 ~ t(1000−2=998)

The critical value with 998 degrees of freedom is −1.645. Since −4.2 < −1.645 (or
|−4.2| = 4.2 > 1.645), we REJECT the null hypothesis, meaning that the price
estimate based on our data is statistically significant for predicting the quantity.
OLS Confidence Intervals
•Point estimation: the most likely value of the parameter, but it will not be exactly
accurate in a single case when we use e.g. x̄ as a point estimator of µ.
(Fishing with a hook or spear: a point estimate doesn't provide information about the precision or reliability
of the estimate; it is precise but gives no idea of the variability around that point estimate.)
•Interval estimation: a range of values with the known likelihood of capturing the
parameter, i.e., a confidence interval (CI). The width of the confidence interval gives
an indication of the uncertainty, precision or variability of the estimated parameter.
(Fishing with a net, interval estimation gives a range where the parameter is likely to be found, which
incorporates the concept of uncertainty in the estimation process)
•A confidence interval (CI) is a range of values, derived from
the sample statistics, that is likely to contain the value of an
unknown population parameter. "The confidence level (e.g. 95%),
associated with a confidence interval, indicates the long-run
percentage of such intervals that would include the parameter
if the same process were repeated numerous times."

[Figure: repeated-sampling illustration; 95% of these CIs are expected to cover the true mean µ]

A 95% confidence interval [10; 15] indicates that if we were to repeat the sampling many
times, we expect about 95% of such intervals to contain the true parameter value. It does
not mean there is a 95% chance that the parameter lies within this specific interval [10; 15].
OLS Confidence Intervals
The (1 − α) × 100% (e.g. (1 − 0.05) × 100 = 95%) confidence interval of the regression parameter
estimators is

β̂i ∓ t(α/2) × s.e.(β̂i), which is the same as [β̂i − t(α/2) × s.e.(β̂i), β̂i + t(α/2) × s.e.(β̂i)]

where β̂i is the estimated value of the i-th parameter from the regression, and t(α/2) is the critical value
from the t-distribution that corresponds to the (1 − α/2)-th quantile, where α is the significance level (for
example, if we want a 95% confidence interval, α would be 0.05), with e.g. n − 2 degrees of freedom
because in a simple linear regression two parameters (intercept and slope) are estimated. s.e.(β̂i) is the
standard error of the estimate of βi (e.g. β0, β1, or …), which measures the variability of the estimator.

If we were to repeat this process, then 95% of the time the interval (let's say, 0.65 to 0.73) would
contain the true population proportion.
This means that if you construct 100 such intervals, about 95 of them will contain the true proportion, and about 5 will not.
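A small sketch of this interval in Python (the function name is our own; scipy supplies the t critical value):

```python
from scipy import stats

def ols_ci(beta_hat, se, df, alpha=0.05):
    """(1 - alpha) confidence interval for one regression parameter."""
    t_crit = stats.t.ppf(1 - alpha / 2, df)
    return beta_hat - t_crit * se, beta_hat + t_crit * se

# Reproduces Example 3a's interval: 1 -/+ 3.182 * 0.365
lo, hi = ols_ci(beta_hat=1.0, se=0.365, df=3)
print(round(lo, 3), round(hi, 3))   # approx. -0.162 and 2.162
```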
Example 4: (Confidence Intervals)
Confidence intervals (CIs) offer several advantages over p-values and t-statistics:

1. Effect Size and Direction: CIs indicate the range and direction of an effect, which
p-values do not.
2. Precision of Estimate: CIs reflect the precision of the estimate; wider intervals
mean less precision (where high precision implies high efficiency = low variance of
the estimator, indicated by s.e.(β̂)).
3. A value range is more informative: CIs provide a range of values, offering more
information for decision-making than a point estimate saying that the weather is 0
degrees Celsius, but with a confidence interval of [−40; 40].
4. Visual Interpretation: CIs can be graphed, offering a visual comparison of
estimates.

All these approaches have their pros and cons:

CIs complement p-values by providing a richer context for interpreting statistical
results, but sometimes p-values (or t-statistics) take up less space (which is good)
when presented in research papers.
OLS Confidence Intervals
Confidence intervals (CIs) and the sampling distribution of the sample mean x̄:
• Five Separate CIs: Below the probability curve, there are five horizontal lines, each representing a confidence
interval from a different sample.
• The curve signifies the probability distribution of sample means (x̄) if we were to take multiple samples from a
population (usually assumed to follow a normal distribution for OLS).
• There is sample-to-sample variability; in this example, all except the third CI captured the population mean (μ).
• The confidence level is not the probability that the true value is in a particular CI (that would make it seem as if
the population mean is variable, but it is not: each interval either captured the mean or it didn't. Intervals change
from sample to sample, but the population parameter we're trying to capture does not).
• It is the long-run proportion of CIs that will contain the true value if we were to repeat the sampling many times.
• It is important that our CIs are not too wide, or they can become irrelevant.
• Let's assume that tomorrow's weather is forecast to be 80F (27C), but with a confidence interval of ∓120 degrees,
that is 80 ∓ 120 or [−40; 200] in Fahrenheit ([−40; 93] in Celsius).
• Such a large interval is pretty pointless, even if the forecast is correct and unbiased. (Note that −40C = −40F.)
OLS Confidence Intervals (CI)
You can play around with this online application https://digitalfirst.bfwpub.com/stats_applet/stats_applet_4_ci.html
to see how the confidence intervals are affected by certain factors.
(Here is an alternative with more options, but we stick to the above-mentioned link: https://www.statcrunch.com/applets )

Note: we know that H0: μ=0 is true.
• Confidence Level (C): 95% (theoretical coverage)
• Sample Size (n): 10
• So 10 brown observations are shown for each CI; based on each set of 10 observations a CI is created.
• Then we can compare the theoretical coverage (e.g. 95%) of the Confidence Level with the
empirical coverage (83% in this case so far).
• Note that the second CI does not cover the true population value μ. For that CI we would
reject the true null hypothesis, which implies a false positive decision, i.e. a type-1 error.
• How often are we making a type-1 error (not having empirical coverage of μ)? In the long run, the
empirical coverage should be close to the theoretical coverage, which is 95% in this case. Click: SAMPLE
OLS Confidence Intervals
Demo 1: Confidence Level (C): 95% (theoretical coverage), Sample Size (n): 5.
After 25 CIs (high randomness due to sampling variability) we have 22 correct of 25 = 22/25 = 88% empirical
coverage: 3 type-1 errors, more than we expected, but with only 25 intervals. So what happens if we increase
the number of CIs?
Demo 2: Confidence Level (C): 95% (theoretical coverage), Sample Size (n): 5.
After 100 CIs we have 95 correct of 100 = 95/100 = 95% empirical coverage: 5 type-1 errors (approximately what
we could expect for a 95% confidence interval). In repeated infinite samples, the proportion that captures the
true parameter will more closely align with the theoretical confidence level due to the law of large numbers.

Note: In practice, we only compute one CI per dataset. For a single CI (or just 25 CIs), we cannot be sure, but
with repeated trials, theoretically, the confidence interval should cover the true population mean in 95% of
cases. Repeated sampling with 95% CIs is expected to include the true parameter in 95% of cases, but this is
not assured for any single interval. Thus, there is not a 95% probability of covering the true μ for a given interval.
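The repeated-sampling idea behind the demo can be sketched as a quick simulation (the population, seed, and sample size are arbitrary assumptions; with σ treated as known we use the normal critical value 1.96):

```python
import numpy as np

rng = np.random.default_rng(0)
mu, sigma, n, reps = 0.0, 1.0, 10, 10_000
half = 1.96 * sigma / np.sqrt(n)       # half-width of each 95% CI
covered = 0
for _ in range(reps):
    xbar = rng.normal(mu, sigma, n).mean()
    covered += (xbar - half <= mu <= xbar + half)
coverage = covered / reps
print(coverage)                        # empirical coverage, close to 0.95
```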
OLS Confidence Intervals
Confidence Level (C): 95% (theoretical coverage), Sample Size (n): 5.
After 100 CIs: 96% empirical coverage of the true population parameter μ by the CIs. However, the low
sample size (n=5) decreases estimator precision and increases the standard errors.
Confidence Level (C): 95% (theoretical coverage), Sample Size (n): 250.
After 100 CIs we have 95 correct of 100 = 95/100 = 95% empirical coverage. Thus, the higher the sample
size in each CI (n=250 here), the higher the estimator precision (lower standard errors).
As n (the number of observations) increases → statistical power increases → Type 2 errors go down.
OLS Confidence Intervals
Confidence Level (C): 90% (theoretical coverage), Sample Size (n): 30: narrower confidence intervals.
When decreasing the confidence level, we get more type-1 errors, but with the gain of fewer type-2 errors.
Confidence Level (C): 99% (theoretical coverage), Sample Size (n): 30: wider confidence intervals.
Thus, the higher the confidence level, the wider the interval around the estimates. When increasing the
confidence level, we get fewer type-1 errors, but at the cost of more type-2 errors.
Example 4: (Confidence Intervals)
If we use data from Ex. 3a, find the 95% confidence interval of the regression
coefficients:

95% C.I. of β1: β̂1 ∓ t(α/2) × s.e.(β̂1) = 1 ∓ 3.182 × 0.365 = [−0.162, 2.162]

The 95% confidence interval (around β̂1 = 1) is [1 ∓ 3.182 × 0.365], which simplifies
to [−0.162, 2.162].

This interval means that, with 95% confidence in repeated samples, the true value
of β1 lies approximately somewhere between −0.162 and 2.162. Since the interval
includes 0, we cannot say with 95% confidence that β1 is different from 0; hence, it
is not statistically significant at the 5% level. Thus hypothesis testing is also possible
with CIs.
ANOVA in OLS Regression (Analysis of Variance)
• Recall the total sum of squares partitioning into its components as
TSS = ESS + RSS
• Under the iid error normality assumption, ESS (Explained Sum of Squares)
and RSS (Residual Sum of Squares) are independent of each other,
each with a Chi-Square distribution:

ESS ~ χ²(df=1)

RSS ~ χ²(df=n−2)

F = (ESS/1) / (RSS/(n−2)) ~ F(1, n−2)

• The test statistic F has an F distribution with degrees of freedom
(df1 = 1, df2 = n − 2).
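The F statistic from this decomposition is straightforward to compute; the sums of squares below are made-up illustrative numbers, not from any example in these notes:

```python
# Hypothetical simple-regression decomposition: TSS = ESS + RSS
ESS, RSS, n = 40.0, 60.0, 32

F = (ESS / 1) / (RSS / (n - 2))    # F ~ F(1, n-2) under H0
print(F)                           # 40 / 2 = 20.0
```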
ANOVA Table in OLS Regression (Gujarati's Output)

S.O.V.                  df                          SS                 MS              F
(Source of Variation)   (Degrees of Freedom)        (Sum of Squares)   (Mean Squared)  (F Test Statistic)
Due to Regression       Regression coefficients     ESS                MSR = ESS/df    F = MSR/MSE
(ESS, Explained SS)     (excluding the intercept)
Due to Residual         Sample size minus number    RSS                MSE = RSS/df    -
(RSS, Residual SS)      of regression parameters
Total (TSS)             Sample size minus 1         TSS                -               -

There are different notations in different statistical textbooks; in fact, it is a confusing mess:
ESS = Explained Sum of Squares = SSR = Sum of Squares Regression
RSS = Residual Sum of Squares = SSE = Sum of Squares Error
TSS = Total Sum of Squares = SST = Sum of Squares Total

The standard output in most books uses (SSR, Sum of Squares Regression), (SSE, Sum of Squares Error)
and (SST, Sum of Squares Total). You can choose to use either notation, but you must learn both, because
this second notation (NOT USED IN GUJARATI) is more common than Gujarati's notation. (There are other
notations too, so it is best just to accept this and get used to many different notations, unfortunately.)

Stata's Output ≠ Gujarati's Output. EViews does not automatically print ANOVA tables for OLS regressions.

• The ANOVA (Analysis of Variance) is relevant for understanding how the variation in the dependent
variable can be explained by the independent variables included in the model.
• These all lead to the F test: the F test is a test of the overall effect of the explanatory variables on the
dependent variable, i.e., it is a test of model significance. It compares the fit of the predicted model to a
null model (a model with only an intercept).
F Test in Simple Linear Regression
• ANOVA test in simple linear regression is the same as the t test of slope. The
hypotheses become
𝐻𝐻0 : 𝛽𝛽1 = 0

𝐻𝐻1 : 𝛽𝛽1 ≠ 0

• The null hypothesis states that the slope is zero, which means the predicted
model is not significantly better than the null model with only an intercept.

• In fact, with only one explanatory slope parameter, there is no need for an F test.
The F test statistic is the square of the t test statistic for the slope, due to the fact
that (t_df)² ≡ F(1, df).

• However, with more slopes in the regression model, this equivalence no longer holds.
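The identity t² = F in simple regression can be verified on simulated data; this is a hand-rolled OLS sketch (data and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 30
x = rng.normal(size=n)
y = 2 + 1.5 * x + rng.normal(size=n)

# OLS fit of y = b0 + b1*x via least squares
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, y, rcond=None)[0]
resid = y - X @ beta
RSS = resid @ resid
TSS = ((y - y.mean()) ** 2).sum()
ESS = TSS - RSS

sigma2 = RSS / (n - 2)                                   # error variance estimate
se_b1 = np.sqrt(sigma2 / ((x - x.mean()) ** 2).sum())    # s.e. of the slope
t = beta[1] / se_b1                                      # t-test of H0: beta1 = 0
F = (ESS / 1) / (RSS / (n - 2))                          # ANOVA F test
print(np.isclose(t ** 2, F))                             # True: t^2 = F
```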
Example 5: (F Test & ANOVA Table)

n = 40 = # observations
k = 5 = # slope coefficients estimated
1 (in n−k−1) is the intercept
MSR = Mean Squared Regression, MSE = Mean Squared Error

SS Regression = 56.8 (since SST − SSE = 150.3 − 93.5 = 56.8)

SSR = SS Regression, SSE = SS Error, SST = SS Total


Example 5: (F Test & ANOVA Table)

Mean Square Regression = 11.36
(since SSR/k = 56.8/5 = 11.36, where 5 = # of independent variables)
Example 5: (F Test & ANOVA Table)

n = 40 = #observations
k = slope coefficients estimated
1 (in n-k-1) is the intercept

Mean Square Error = 2.75, since:
Mean Square Error = SS Error/(n−k−1) = SSE/(n−k−1) = 93.5/(40−5−1) = 2.75
Example 5: (F Test & ANOVA Table)

F = 4.13, since
Mean Square Regression / Mean Square Error = MSR / MSE = 11.36 / 2.75 = 4.13
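The arithmetic of this ANOVA table can be reproduced directly:

```python
# Example 5 inputs: SST = 150.3, SSE = 93.5, k = 5 slopes, n = 40 observations
SST, SSE, k, n = 150.3, 93.5, 5, 40

SSR = SST - SSE            # 56.8
MSR = SSR / k              # 56.8 / 5  = 11.36
MSE = SSE / (n - k - 1)    # 93.5 / 34 = 2.75
F = MSR / MSE              # 11.36 / 2.75 ≈ 4.13
print(round(F, 2))         # 4.13
```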
Example 5: (F Test & ANOVA Table)

Regression: the part of the total variation explained by the model. SSR = SS Regression indicates how well
the regression model fits the data (the closeness of the data to the regression line), and MSR is the average
SSR. The degrees of freedom for regression is the number of independent variables in the model; in this case
there are 5, suggesting five predictors are used in the regression model.

Residual: the part of the total variation not explained by the model. SSE = SS Error represents the variability
around the regression line (the distance of the data points from the regression line), and MSE is the average
SSE.

All these measures lead to the F and R² values, both of which should be as high as possible. Differences
between Regression and Residual in the ANOVA are reflected by high values of R² and F.

SS Regression / SS Total = SSR / SST indicates how much of the total variation in the dependent variable the
regression model can explain, which is the same as R².

Example
• The more variables5: (F Test & ANOVA Table)
we include in a
multiple regression
model (even irrelevant
variables) model – the
higher R2 will be.
• Cannot compare
models with different
number of explanatory
variables.
 Remedy Adj(R2)

Adjusted R2
• Useful model selection measure
• Adjusts for the fact that R2 increases
for every added extra explanatory
variable in a regression model.
• Adjusted R2 can be used to compare
the fit for models with different
number of explanatory variables.
• Choose the model with the highest Note: However, Adj(R2) is just a model
Adjusted R2. Then you can use selection tool so it does not have an
standard R2 if you want to interpret interpretation in terms of how much
that selected model. of the total variation is explained.
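A sketch of the adjustment (the function name is our own; k counts the slope coefficients, matching the n − k − 1 degrees of freedom used above):

```python
def adj_r2(r2, n, k):
    """Adjusted R^2 = 1 - (1 - R^2)(n - 1)/(n - k - 1)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# With the Example 5 numbers: R^2 = SSR/SST = 56.8/150.3
r2 = 56.8 / 150.3
print(round(r2, 3), round(adj_r2(r2, n=40, k=5), 3))   # 0.378 and 0.286
```

Note how the adjusted value is always below R², and the gap widens as more slopes are added for a fixed fit.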
Example 5: (F Test & ANOVA Table)

• The square root of SS Error/(n−k−1), i.e. the square root of the MSE, is the standard
error of the estimate.
• This is the magnitude of a typical deviation from the
estimated regression line (an estimate of the true sigma for
the population that is the dispersion of error around the
true regression line).
• This is a measure of the accuracy with which the
regression model predicts the dependent variable. The
lower the better.
Example 5: (F Test & ANOVA Table)

n = 40 = #observations
k = slope coefficients estimated
1 (in n-k-1) is the intercept
F(k, n−k−1) = F(5, 40−5−1) = F(5, 34)
• In the F table, all 5% is in one tail for an F-test: never divide alpha by 2.
• The table in Gujarati does not cover all numbers, but F(5,30) = 2.53 and F(5,40) = 2.45,
so we take the mean: (2.45 + 2.53)/2 ≈ 2.5.
(This interpolation is only needed because of the incomplete set of numbers in the F-table in Gujarati.)
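The interpolation is only needed when working from a printed table; scipy gives the exact critical value directly:

```python
from scipy import stats

crit = stats.f.ppf(0.95, 5, 34)   # all 5% in one tail; never alpha/2 for an F test
print(crit)                       # between F(5,40)=2.45 and F(5,30)=2.53
```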
Example 5: (F Test & ANOVA Table)

• There is a significant relationship between at least one of the independent variables
and the dependent variable.

• This does not show which one, but the t-tests of the regression coefficients may
guide us on that.
Distribution of OLS Estimators
• When the regression error terms are identically and independently normally
distributed as ui ~ N(0, σ²), the OLS estimators β̂0 and β̂1 are also normally
distributed, if σ² is known (which it usually is not).

• Using the estimate σ̂² instead of σ² (usually not known) enables us to use the
t distribution for the OLS estimators β̂0 and β̂1 as shown below:

T = (β̂0 − β0) / s.e.(β̂0) ~ t(n−2),   T = (β̂1 − β1) / s.e.(β̂1) ~ t(n−2)

σ̂² = (1/(n − p)) × Σᵢ₌₁ⁿ (yi − ŷi)²

Sigma (σ) represents the standard deviation of the error terms. It is a measure of the dispersion or spread of the errors around the regression line: the lower the better, and it can be estimated as above (p is the number of estimated parameters, so p = 2 in simple regression).
Error Normality Assumption
• Hypotheses tests and confidence intervals for the OLS regression
parameters are based upon normally distributed error terms.
Therefore, the validity of those inferences relies on error normality,
which should be checked.

• There are many tests of normality, like Jarque-Bera, Shapiro-Wilk
and Anderson-Darling, to mention a few. We will mainly use Jarque-Bera
in the computer labs.
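In Python, the Jarque-Bera test is available in scipy; the simulated residuals and seed below are placeholders standing in for actual regression residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
resid = rng.normal(size=500)        # stand-in for OLS residuals
jb_stat, p_value = stats.jarque_bera(resid)
print(jb_stat, p_value)             # a large p-value -> normality not rejected
```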
Some arbitrary exercises
Hypothesis Testing –The CAPM
• Monthly data in the file capm.wf1/dta
• Note that it is standard to employ 5 years of monthly data for estimating betas but let
us use all of the observations (over 10 years) for now.
• The monthly stock prices of one company (Ford) will appear as objects, along with index
values for the S&P500 ('sandp') and three-month US Treasury bills ('ustb3m').
• To estimate a CAPM equation for the Ford stock, for example, we need to first
transform the price series into returns and then into excess returns over the risk-free
rate. To transform the series in EViews, use:
series RSANDP=100*DLOG(SANDP)
or
series RSANDP=100*LOG(SANDP/SANDP(-1)) 'RSANDP=log-return of S&P500
series RFORD=100*LOG(FORD/FORD(-1)) 'RFORD=log-return of FORD
• Now, transform the returns into excess returns, where excess return = return − risk-free
rate. USTB3M = T-bills (monthly), and RSANDP = log-return of the S&P500:
series ERSANDP=RSANDP-USTB3M
series ERFORD=RFORD-USTB3M
* In Stata - we use dateid01 as the time-series identifier:
tsset dateid01
gen rsandp = 100*log(sandp / sandp[_n-1]) if _n > 1
gen rford = 100*log(ford / ford[_n-1]) if _n > 1
gen ersandp = rsandp-ustb3m
gen erford = rford-ustb3m
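The same transformations can be sketched in Python with numpy; the prices and T-bill rates below are made-up illustrative numbers, not the capm.wf1 data:

```python
import numpy as np

sandp = np.array([100.0, 102.0, 101.0, 105.0])   # hypothetical index levels
ustb3m = np.array([0.2, 0.2, 0.2, 0.2])          # hypothetical monthly T-bill rate, %

rsandp = 100 * np.diff(np.log(sandp))   # log-returns in percent (one fewer obs)
ersandp = rsandp - ustb3m[1:]           # excess return over the risk-free rate
print(np.round(ersandp, 3))
```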
Hypothesis Testing –The CAPM capm.wf1/dta

• Plot the series


We can use the menu system. Object/New Object menu on the menu bar. Select
Graph, provide a name (call the graph Graph1) and then in the new window
provide the names of the series to plot. In this new window, type
ERSANDP ERFORD
* In Stata:
tsline ersandp erford

[Figure: Plot of Excess Returns for the S&P500 and Ford, monthly, Jan 2002 - Jan 2018]
Hypothesis Testing –The CAPM
• Scatterplot ERSANDP and ERFORD. Highlight them (using the Ctrl key). Right-
click. Open/as Group/View/Graph/Scatter. Choose: Regression Line in the drop-
down menu to the right of “Fit lines”.
• There appears to be a weak positive association between ERSANDP and ERFORD.
Close the window of the graph and return to the workfile window.
* In Stata:
twoway (scatter erford ersandp) (lfit erford ersandp)

[Figure: scatterplot of erford against ersandp with fitted regression line]
Hypothesis Testing – The CAPM
• Estimate the CAPM equation. In EViews:
LS ERFORD c ERSANDP
* In Stata:
reg erford ersandp
• The regression equation takes the form
(RFord − rf)t = α + β(RM − rf)t + ut
Thus, if β=0, then (RFord − rf)t = α.
Note: this tests H0: β=0.
• The beta coefficient (the slope coefficient) estimate is 1.889. The p-value of the t-ratio is 0.0000,
signifying that the excess return on the market proxy has highly significant explanatory power for
the variability of the excess returns of Ford stock.
• The intercept estimate is not statistically significant.
• R² is not very high: ERSANDP explains about 34% of the total variation in ERFORD.
Hypothesis Testing –The CAPM
• How could the hypothesis that the value of the population coefficient is equal to
1 be tested? View/Coefficient Diagnostics/Wald Test/Coefficient Restrictions,
and then in the box that appears (we test whether the second estimated coefficient is
equal to 1, not zero):
C(2)=1
Note: this tests H0: β=1.
Hypothesis Testing –The CAPM
• The conclusion here is that the null hypothesis that the CAPM beta of Ford stock
is 1 is convincingly rejected and hence the estimated beta of 1.889 is
significantly different from 1.
• This is hardly surprising given the distance between 1 and 1.889.
• All p-values are <0.05. Therefore, reject H0 : β = 1
* In Stata:
reg erford ersandp
* test if the parameter for ersandp is = 1 (H0: β=1)
test (ersandp = 1)
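The same restriction test can be sketched by hand: under H0: β = 1, t = (β̂ − 1)/s.e.(β̂) is compared with a t critical value. The standard error and degrees of freedom below are illustrative placeholders (the slides only report β̂ = 1.889):

```python
from scipy import stats

beta_hat = 1.889          # slope estimate from the slides
se, df = 0.15, 190        # assumed values, for illustration only

t = (beta_hat - 1) / se                   # test statistic for H0: beta = 1
p = 2 * (1 - stats.t.cdf(abs(t), df))     # two-sided p-value
print(p < 0.05)                           # True -> reject H0: beta = 1
```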
Hypothesis Testing –The CAPM
• If the beta is β̂ = 1.889:
This indicates that for every 1% increase (or decrease) in the market's excess return, the
asset's excess return is expected to increase (or decrease) by approximately 1.889%,
holding all else constant.
• A β=1 suggests that the asset has the same systematic risk as the market; that is, it
moves in the same proportion as the market.
• The beta coefficient (the slope coefficient) in the CAPM equation is estimated to be
1.889. This value is significantly higher than 1, which indicates that the asset has a high
level of systematic risk.
• Specifically, it suggests that the asset is expected to perform 88.9% better than the
market in up markets and 88.9% worse in down markets, on average. This high beta
implies that the asset is quite sensitive to market movements and thus is considered
to be a high-risk investment. Investors would demand a higher expected return for
holding such an asset to compensate for this higher risk.
Hypothesis Testing –The CAPM
• The intercept is −0.955984, but it is not statistically significantly different from 0,
indicating that it could be zero.
• The Alpha (α): In CAPM, alpha represents the stock's expected return that is not
explained by its beta with the market. It's the intercept of the regression equation. A
non-significant alpha suggests that the stock's returns are well explained by its beta
with the market.

Note: the R2 is only 33% - which makes this relationship not very reliable
