
ADVANCED STATISTICS

Module 5: ANOVA

I. INTRODUCTION TO ANALYSIS OF VARIANCE (ANOVA)

A. The Analysis of Variance (ANOVA)

ANOVA is a versatile test widely used to assess the significance of the
differences among two or more independent means. It helps us determine whether
the observed differences are due to chance (sampling error) alone or to the
effect of the independent variable on the dependent variable.

Note that the general rationale for the use of ANOVA is that the total variance of
all the scores or data in an experimental study can be separated and attributed
to two sources. These sources are variance between groups and variance within
groups.

i. Variance within groups reflects the spread of scores or data within each
of the groups. It represents differences among subjects that have nothing
to do with the independent variable in the experiment. It is sometimes
called error variance.

ii. Variance between groups reflects the magnitude of the differences
between the group means. It may be due to the effect of the independent
variable or just a function of chance.

B. Sum of Squares (SS)

The sum of squares is the basic ingredient of the ANOVA procedure. It is the
measure of variability analyzed here: the total of the squared deviations
between a set of individual scores and a mean.

Total Sum of Squares (SSt)

It refers to the sum of squares of the deviations of each of the observations from
the grand mean. The mean of all the scores taken together as a group is called
the grand mean (Xt).

The total sum of squares is given by:

SS_t = \sum_{i=1}^{N} (X_i - \bar{X}_t)^2 = SS_w + SS_b

where
X_i – an individual score
\bar{X}_t – the mean of all the scores (grand mean)
N – total number of observations
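
As an illustration (my addition; the module itself works by hand and in Excel), here is a minimal Python sketch of the total sum of squares, using made-up scores for three hypothetical groups:

```python
# Total sum of squares: squared deviations of every score from the grand mean.
# The three groups below are made-up illustrative data.
groups = [
    [4.0, 5.0, 6.0],   # group 1
    [7.0, 8.0, 9.0],   # group 2
    [5.0, 6.0, 7.0],   # group 3
]
all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)             # grand mean
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)  # SSt
print(grand_mean, ss_total)                                # 6.333..., 20.0
```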

Sum of Squares within Groups (SSw)

It is a component of the total sum of squares and a basic component of the
error term. It is not related to any difference in treatment. It is found by
calculating the deviation of each individual score in each group from the mean
of its own group, then squaring and summing the squared deviations.

It is given by:

SS_w = \sum_{i=1}^{n_1} (X_i - \bar{X}_1)^2 + \sum_{i=1}^{n_2} (X_i - \bar{X}_2)^2 + \cdots + \sum_{i=1}^{n_K} (X_i - \bar{X}_K)^2

where
\bar{X}_k – the mean of the kth group, k = 1 to K
X_i – an individual score in the kth group
K – the number of groups
n_k – the number of observations in the kth group
N – total number of observations; N = n_1 + n_2 + … + n_K
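
A companion sketch for SSw (again my own illustration, reusing the same made-up groups): each score is compared with the mean of its own group.

```python
# Sum of squares within groups: squared deviations of each score from the
# mean of its own group, summed over all groups (made-up data).
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [5.0, 6.0, 7.0],
]
ss_within = 0.0
for g in groups:
    group_mean = sum(g) / len(g)
    ss_within += sum((x - group_mean) ** 2 for x in g)
print(ss_within)   # SSw = 6.0 for these scores
```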

Sum of Squares between Groups (SSb)

This refers to the variability from group to group that is also a component of the
total sum of squares. It is the variation that may be due to the experimental
treatment. It is derived by computing the sum of squares of the deviations of
each separate group mean from the grand mean.

It is given by:

SS_b = \sum_{k=1}^{K} n_k (\bar{X}_k - \bar{X}_t)^2

where
n_k – number of observations in the kth group
\bar{X}_k – mean of the kth group
\bar{X}_t – grand mean
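
And a sketch for SSb, again with the same made-up groups, confirming numerically that SSw + SSb reproduces SSt:

```python
# Sum of squares between groups: each group's size times the squared deviation
# of its mean from the grand mean (made-up data, as in the sketches above).
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [5.0, 6.0, 7.0],
]
all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ss_within = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
ss_total = sum((x - grand_mean) ** 2 for x in all_scores)
print(ss_between, ss_within, ss_total)   # 14.0  6.0  20.0, so SSb + SSw = SSt
```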

C. Mean Squares (MS)

The sums of squares between and within groups are used to obtain estimates of
the population variance in ANOVA, which are then used to test the null
hypothesis. When SSb and SSw are divided by their appropriate degrees of
freedom to obtain the needed variance estimates, the two estimates are called
mean squares.

Mean Square within Groups (MSw)

This is the estimate of the variance derived from the within-groups data.

It is given by:

MS_w = \frac{SS_w}{df_w} = \frac{SS_w}{N - K} = \frac{\sum_{k=1}^{K} (n_k - 1) s_k^2}{N - K}

where
SS_w – sum of squares within groups
s_k^2 – variance of the kth group
n_k – number of scores in the kth group
N – total number of scores
K – the number of groups
df_w – degrees of freedom within groups, equal to N - K
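
The within-groups mean square in Python, computed from the group variances as in the formula above (my addition, with the same illustrative groups and the standard statistics module):

```python
from statistics import variance  # sample variance, with n - 1 in the denominator

# MSw = sum((n_k - 1) * s_k^2) / (N - K), using the same made-up groups.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [5.0, 6.0, 7.0],
]
N = sum(len(g) for g in groups)
K = len(groups)
ss_within = sum((len(g) - 1) * variance(g) for g in groups)
ms_within = ss_within / (N - K)
print(ms_within)   # 6.0 / 6 = 1.0
```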

Mean Square Between Groups (MSb)

This is the estimate of the variance between groups.

It is given by the formula:

MS_b = \frac{SS_b}{df_b} = \frac{SS_b}{K - 1}

where
SS_b – sum of squares between groups
K – the number of groups
df_b – degrees of freedom between groups, equal to K - 1
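
And the between-groups mean square (same illustrative groups, my addition):

```python
# MSb = SSb / (K - 1), using the same made-up groups as above.
groups = [
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
    [5.0, 6.0, 7.0],
]
all_scores = [x for g in groups for x in g]
grand_mean = sum(all_scores) / len(all_scores)
K = len(groups)
ss_between = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
ms_between = ss_between / (K - 1)
print(ms_between)   # 14.0 / 2 = 7.0
```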

D. The F Ratio (test statistic)

F is defined as the ratio:

F = \frac{MS_b}{MS_w}

The numerator of this F-ratio reflects the observed differences between the
groups, while the denominator represents the error term, since it is derived
from the variation within groups.

This F-ratio compares the estimate of the population variance derived from the
sample between-groups data, MSb, with the estimate of the population variance
derived from the sample within-groups data, MSw.

Note that as the difference between the groups increases, the F-ratio
increases. In ANOVA, the null hypothesis being tested is that the sample means
being compared with the F-ratio do not differ from what would be expected of
random samples drawn from the same population. If the null hypothesis is true,
the variance estimate based on the differences between groups and the variance
estimate based on the differences within groups will tend to be about the same,
because both are estimates of the same common population variance. Any
difference between the two mean squares would then be the result of random
variation, so the ratio MSb/MSw is expected to be about 1.

When there is a genuine difference between the groups, MSb (the variance
estimate derived from the variation of the group means around the grand mean)
is markedly greater than MSw (the variance estimate derived from the variation
of scores within each group), and F will be considerably greater than 1.

As the difference between the mean squares increases, the F-ratio increases
and the probability that the null hypothesis is correct decreases. Only values
of F greater than 1 are considered evidence against the null hypothesis.

II. THE F-TEST ON ONE-WAY ANOVA

Null and Alternative Hypothesis

Ho: µ1 = µ2 = … = µk (there is no difference among the group means)

Ha: not all of the µi are equal (at least one group mean differs from the others)

Test Statistic

F = MSb / MSw

If the K populations are normally distributed with a common variance and
Ho: µ1 = ··· = µK is true, then under independent random sampling F follows an
F-distribution with degrees of freedom df1 = K - 1 and df2 = N - K.

1. Calculate the grand mean (X̄t).
2. Calculate the group means (X̄1, X̄2, …, X̄k).
3. Calculate the group variances (s1², s2², …, sk²).
4. Calculate MSb.
5. Calculate MSw.
6. Find the F ratio.
7. Fill in the ANOVA table (a computational sketch follows the table):

Source of Variation    Sum of Squares   df      Mean Square   F
Between Groups         SSb              k - 1   MSb           MSb/MSw
Within Groups          SSw              N - k   MSw
Total                  SSt              N - 1
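
The sketch below (my addition, in Python rather than the Excel used later in the module) follows steps 1–7 and returns the entries of the ANOVA table for any list of groups; the function name one_way_anova_table is simply an illustrative choice:

```python
from statistics import mean, variance

def one_way_anova_table(groups):
    """Fill in the one-way ANOVA table for a list of groups of scores."""
    all_scores = [x for g in groups for x in g]
    N, K = len(all_scores), len(groups)
    grand_mean = mean(all_scores)               # step 1
    group_means = [mean(g) for g in groups]     # step 2
    group_vars = [variance(g) for g in groups]  # step 3
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    ss_within = sum((len(g) - 1) * v for g, v in zip(groups, group_vars))
    ms_between = ss_between / (K - 1)           # step 4
    ms_within = ss_within / (N - K)             # step 5
    f_ratio = ms_between / ms_within            # step 6
    return {                                    # step 7
        "SSb": ss_between, "dfb": K - 1, "MSb": ms_between,
        "SSw": ss_within,  "dfw": N - K, "MSw": ms_within,
        "SSt": ss_between + ss_within, "dft": N - 1, "F": f_ratio,
    }
```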

Rejection Region

Reject Ho if F > Fα, where Fα is the critical value of the F-distribution with
df1 = k - 1 and df2 = N - k at the level of significance α.
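
If SciPy is available, the critical value and p-value can be looked up instead of read from a table. A sketch (my addition), using the degrees of freedom of the example that follows, df1 = 3 and df2 = 40:

```python
from scipy import stats

alpha, df1, df2 = 0.05, 3, 40                # k - 1 and N - k for the example below
f_crit = stats.f.ppf(1 - alpha, df1, df2)    # critical value F_alpha
p_value = stats.f.sf(3.20, df1, df2)         # right-tail probability of an observed F
print(round(f_crit, 2), round(p_value, 3))   # about 2.84 and 0.033
# Reject Ho when the observed F exceeds f_crit, or equivalently when p_value < alpha.
```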

Example
The average grade point average (GPA) of the courses in a specific major is a measure of the
difficulty of the major. An educator wishes to conduct a study to find out whether the
difficulty levels of different majors are the same.

For such a study, a random sample of major grade point averages (GPA) of 11 graduating
seniors at a large university is selected for each of the four majors: Mathematics, English,
Education, and Biology.

Test, at the 5% level of significance, whether the data contain sufficient evidence to conclude
that there are differences among the average major GPAs of these four majors.

           Math (X1)   English (X2)   Educ (X3)   Biology (X4)
 1           2.59          3.64          4.00          2.78
 2           3.13          3.19          3.59          3.51
 3           2.97          3.15          2.80          2.65
 4           2.50          3.78          2.39          3.16
 5           2.53          3.03          3.47          2.94
 6           3.29          2.61          3.59          2.32
 7           2.53          3.20          3.74          2.58
 8           3.17          3.30          3.77          3.21
 9           2.70          3.54          3.13          3.23
10           3.88          3.25          3.00          3.57
11           2.64          4.00          3.47          3.22

total       31.93         36.69         36.95         33.17
mean         2.90          3.34          3.36          3.02
variance     0.19          0.15          0.23          0.16

Excel function to compute the sample variance: VAR.S

Sample variance (Group 1) = [(2.59 - 2.90)^2 + (3.13 - 2.90)^2 + … + (2.64 - 2.90)^2] / (11 - 1) = 0.19

Grand mean X̄t = (31.93 + 36.69 + 36.95 + 33.17) / 44 = 3.15   (N = 44)

MS_b = \sum_{k=1}^{K} n_k (\bar{X}_k - \bar{X}_t)^2 / (K - 1)
     = [11(2.90 - 3.15)^2 + 11(3.34 - 3.15)^2 + 11(3.36 - 3.15)^2 + 11(3.02 - 3.15)^2] / 3
     = 1.7305 / 3   (using the unrounded group means)
     = 0.5768

MS_w = \sum_{k=1}^{K} (n_k - 1) s_k^2 / (N - K)
     = [10(0.19) + 10(0.15) + 10(0.23) + 10(0.16)] / 40
     = 7.2123 / 40   (using the unrounded group variances)
     = 0.1803

F = 0.5768 / 0.1803 = 3.20

Critical value: Fα = F0.05(3, 40) = 2.84

Source of Variation    Sum of Squares   df    Mean Square   F      p
Between Groups         1.7305           3     0.5768        3.20   0.033
Within Groups          7.2123           40    0.1803
Total                  8.9428           43

Excel / Data / Data Analysis / Anova: Single Factor

1. Highlight the data (input range).
2. Check your level of significance (alpha).
3. Click OK.

Anova: Single Factor

SUMMARY
Groups Count Sum Average Variance
Math 11 31.93 2.90272727 0.188061818
English 11 36.69 3.33545455 0.147587273
Educ 11 36.95 3.35909091 0.228909091
Biology 11 33.17 3.01545455 0.156667273

ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 1.7305 3 0.57683333 3.199184553 0.033451395 2.838745398
Within Groups 7.21225455 40 0.18030636

Total 8.94275455 43
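
The same figures can be reproduced in Python (my addition); SciPy's f_oneway takes the four GPA columns from the table above and returns the F statistic and p-value:

```python
from scipy import stats

math    = [2.59, 3.13, 2.97, 2.50, 2.53, 3.29, 2.53, 3.17, 2.70, 3.88, 2.64]
english = [3.64, 3.19, 3.15, 3.78, 3.03, 2.61, 3.20, 3.30, 3.54, 3.25, 4.00]
educ    = [4.00, 3.59, 2.80, 2.39, 3.47, 3.59, 3.74, 3.77, 3.13, 3.00, 3.47]
biology = [2.78, 3.51, 2.65, 3.16, 2.94, 2.32, 2.58, 3.21, 3.23, 3.57, 3.22]

f_stat, p_value = stats.f_oneway(math, english, educ, biology)
print(round(f_stat, 2), round(p_value, 4))   # about 3.20 and 0.0335, as in the Excel output
```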

Decision Point: Reject Ho if F > Fα (F-critical), or equivalently, reject Ho if p-value < α.

Decision: Reject Ho.

Conclusion: The data provide sufficient evidence, at the 5% level of significance, to conclude
that the averages of major GPAs for the four majors considered are not all equal.

Parametric tests (the underlying distribution is assumed known):


No. of Groups   Sample Size    Resulting Distribution   Test                Test Statistic
1               At least 30    Standard normal          One-sample z-test   z
1               Less than 30   Student's t              One-sample t-test   t
2               At least 30    Standard normal          Two-sample z-test   z
2               Less than 30   Student's t              Two-sample t-test   t
3 or more       Any            F-distribution           ANOVA               F
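
For completeness (my addition, not part of the module), the t-test and ANOVA rows of this table have direct SciPy counterparts; the z-test rows do not, and the data below are purely illustrative:

```python
from scipy import stats

# Made-up samples for illustration only.
group_a = [2.9, 3.1, 3.0, 2.8, 3.2]
group_b = [3.4, 3.3, 3.5, 3.2, 3.6]
group_c = [3.0, 2.9, 3.1, 3.2, 2.8]

print(stats.ttest_1samp(group_a, popmean=3.0))      # one-sample t-test
print(stats.ttest_ind(group_a, group_b))            # two-sample t-test
print(stats.f_oneway(group_a, group_b, group_c))    # one-way ANOVA (F-test)
```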
