Module 5: ANOVA
ANOVA is a versatile test that is widely used to assess the significance of
the differences between two or more independent means. It helps us
determine whether the observed differences are due to chance alone (sampling
error) or are the result of the effect of the independent variable on the
dependent variable.
Note that the general rationale for the use of ANOVA is that the total variance of
all the scores or data in an experimental study can be separated and attributed
to two sources. These sources are variance between groups and variance within
groups.
i. Variance within groups reflects the spread of scores or data within each
of the groups. It represents differences among subjects that have
nothing to do with the independent variable in the experiment. It is
sometimes called error variance.
ii. Variance between groups reflects the variability of the group means
around the grand mean. It is the variation that may be due to the effect of
the independent variable (the experimental treatment).
The sum of squares is the basic ingredient of the ANOVA procedure; it is the
measure of variability that is analyzed here. A sum of squares is the total of the
squared deviations between a set of individual scores and a mean.
The total sum of squares refers to the sum of the squared deviations of each of
the observations from the grand mean. The mean of all the scores taken
together as one group is called the grand mean (X̄t).
The total sum of squares is given by:

SSt = Σ (Xi − X̄t)², the sum taken over all N observations, or equivalently SSt = SSw + SSb

where
Xi – an individual score
X̄t – the mean of all the scores (the grand mean)
N – the total number of observations
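To make the computation concrete, here is a minimal Python sketch (assuming NumPy is installed; the three groups of scores are made up purely for illustration):

import numpy as np

# Hypothetical scores for three groups (illustrative values only)
g1 = np.array([4.0, 5.0, 6.0])
g2 = np.array([7.0, 8.0, 9.0])
g3 = np.array([5.0, 6.0, 7.0])

scores = np.concatenate([g1, g2, g3])
grand_mean = scores.mean()                 # X̄t, the grand mean
SSt = np.sum((scores - grand_mean) ** 2)   # total sum of squares
print(grand_mean, SSt)                     # about 6.33 and 20.0 here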
The sum of squares within groups (SSw) is the total of the squared deviations of
each score from its own group mean, summed across all the groups. It is given by:

SSw = Σ (Xi − X̄1)² + Σ (Xi − X̄2)² + … + Σ (Xi − X̄k)²

where each sum is taken over the observations in its own group, and
X̄j – the mean of the jth group, j = 1 to k
Xi – an individual score in the jth group
k – the number of groups
nj – the number of observations in the jth group
N – the total number of observations; N = n1 + n2 + … + nk
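Continuing the same made-up data, a sketch of SSw, in which each score is measured against its own group mean:

import numpy as np

groups = [np.array([4.0, 5.0, 6.0]),   # same hypothetical groups as above
          np.array([7.0, 8.0, 9.0]),
          np.array([5.0, 6.0, 7.0])]

# SSw: squared deviations of each score from its own group mean, summed over groups
SSw = sum(np.sum((g - g.mean()) ** 2) for g in groups)
print(SSw)   # 6.0 here: each group contributes (-1)² + 0² + 1² = 2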
The sum of squares between groups (SSb) refers to the variability from group to
group, which is the other component of the total sum of squares. It is the
variation that may be due to the experimental treatment. It is derived by
computing the sum of the squared deviations of each separate group mean from
the grand mean, weighted by the group sizes.
It is given by:

SSb = Σ nk (X̄k − X̄t)², the sum taken over the k groups

where
nk – the number of observations in the kth group
X̄k – the mean of the kth group
X̄t – the grand mean
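A sketch of SSb on the same made-up data, which also verifies the decomposition SSt = SSb + SSw stated earlier:

import numpy as np

groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([5.0, 6.0, 7.0])]
all_scores = np.concatenate(groups)
grand_mean = all_scores.mean()

# SSb: each group's squared distance from the grand mean, weighted by group size
SSb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
SSw = sum(np.sum((g - g.mean()) ** 2) for g in groups)
SSt = np.sum((all_scores - grand_mean) ** 2)
print(SSb, SSw, SSt)                 # 14.0, 6.0, 20.0
print(np.isclose(SSt, SSb + SSw))    # True: the partition is exact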
The sums of squares between and within groups can be used to form estimates
of the population variance in ANOVA, and these estimates are then used to test
the null hypothesis. When SSb and SSw are divided by their appropriate degrees
of freedom to obtain the needed variance estimates, the two variance estimates
are called mean squares.
The mean square within groups (MSw) is the estimate of the variance derived
from the within-groups data. It is given by:

MSw = SSw / dfw = SSw / (N − K)

where
SSw – the sum of squares within groups
s²k – the variance of the kth group (MSw is equivalent to pooling these group variances)
N – the total number of scores
K – the number of groups
dfw – the degrees of freedom within groups, equal to N − K
The mean square between groups (MSb) is the estimate of the variance derived
from the between-groups data. It is given by:

MSb = SSb / dfb = SSb / (K − 1)

where
SSb – the sum of squares between groups
N – the total number of scores
K – the number of groups
dfb – the degrees of freedom between groups, equal to K − 1
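A sketch of both mean squares on the same made-up data from the earlier sketches:

import numpy as np

groups = [np.array([4.0, 5.0, 6.0]),
          np.array([7.0, 8.0, 9.0]),
          np.array([5.0, 6.0, 7.0])]
grand_mean = np.concatenate(groups).mean()
SSw = sum(np.sum((g - g.mean()) ** 2) for g in groups)
SSb = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)

K = len(groups)                   # number of groups
N = sum(len(g) for g in groups)   # total number of scores

MSw = SSw / (N - K)               # within-groups mean square, dfw = N - K
MSb = SSb / (K - 1)               # between-groups mean square, dfb = K - 1
print(MSw, MSb)                   # 1.0 and 7.0 for these values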
The F-ratio is used to compare the estimate of the population variance derived
from the between-groups data, MSb, with the estimate of the population
variance derived from the within-groups data, MSw:

F = MSb / MSw

Note that as the difference between the group means increases, the F-ratio increases.
In ANOVA, the null hypothesis being tested is that the sample means being
compared with the F-ratio are no more different than would be expected among
random samples drawn from the same population. If the null hypothesis is true,
then the variance estimate based on the differences between groups and the
variance estimate based on the differences within groups will tend to be about
the same, because both are estimates of the same common population variance.
Any difference between the two mean squares would then be the result of
random variation, and you would expect the ratio MSb/MSw to be about 1.
When there is a genuine difference between the groups, MSb (the variance
estimate derived from the variation of the group means around the grand mean)
is markedly greater than MSw (the variance estimate derived from the variation
of scores within each group), and F will be considerably greater than 1.
As the difference between the mean squares increases, the F-ratio increases and
the probability that the null hypothesis is correct decreases. Only values of F
greater than 1 are considered as evidence against the null hypothesis.
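The expectation that MSb/MSw stays near 1 when the null hypothesis is true can be checked by simulation. A minimal sketch (assuming NumPy and SciPy are installed) that repeatedly draws three groups from the same normal population and averages the resulting F-ratios:

import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# All three groups come from the SAME population, so H0 is true by construction
f_values = []
for _ in range(5000):
    g1, g2, g3 = rng.normal(0.0, 1.0, size=(3, 10))
    result = stats.f_oneway(g1, g2, g3)
    f_values.append(result.statistic)

# The average F is close to dfw / (dfw - 2) = 27/25 ≈ 1.08, i.e. about 1
print(np.mean(f_values))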
II. THE F-TEST ON ONE-WAY ANOVA
H0: µ1 = µ2 = … = µk (the group means are all equal)
Ha: not all of µ1, µ2, …, µk are equal (at least one group mean differs from the others)
Test Statistic
F = MSb / MSw
Rejection Region
Reject H0 if F ≥ Fα, the critical value of the F distribution with dfb = K − 1 and
dfw = N − K degrees of freedom at significance level α.
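The cutoff Fα can be obtained from the F distribution's percent point function in SciPy. A minimal sketch using α = 0.05 and the degrees of freedom of the example that follows (dfb = 3, dfw = 40):

from scipy import stats

alpha = 0.05
dfb, dfw = 3, 40                           # K - 1 and N - K for the GPA example
f_crit = stats.f.ppf(1 - alpha, dfb, dfw)  # upper 5% point of F(3, 40)
print(f_crit)                              # about 2.84; reject H0 when F >= f_crit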
Example
The average of the grade point averages (GPAs) earned in the courses of a specific major is a
measure of the difficulty of that major. An educator wishes to conduct a study to find out
whether the difficulty levels of different majors are the same.
For such a study, a random sample of the major GPAs of 11 graduating seniors at a large
university is selected for each of four majors: Mathematics, English, Education, and Biology.
Test, at the 5% level of significance, whether the data contain sufficient evidence to conclude
that there are differences among the average major GPAs of these four majors.
The grand mean of all N = 44 observations is X̄t = 138.74 / 44 ≈ 3.15, and the test statistic is

F = MSb / MSw = 0.5768 / 0.1803 ≈ 3.20

as summarized in the ANOVA table below.
Source of Variation   Sum of Squares   df   Mean Squares   F      p
Between Groups        1.7305            3   0.5768         3.20   < 0.05
Within Groups         7.2123           40   0.1803
Total                 8.9428           43
SUMMARY
Groups Count Sum Average Variance
Math 11 31.93 2.90272727 0.188061818
English 11 36.69 3.33545455 0.147587273
Educ 11 36.95 3.35909091 0.228909091
Biology 11 33.17 3.01545455 0.156667273
ANOVA
Source of Variation SS df MS F P-value F crit
Between Groups 1.7305 3 0.57683333 3.199184553 0.033451395 2.838745398
Within Groups 7.21225455 40 0.18030636
Total 8.94275455 43
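Because SSw = Σ (nk − 1)·s²k and SSb = Σ nk·(X̄k − X̄t)², the ANOVA output above can be reproduced from the summary statistics alone. A sketch (assuming NumPy and SciPy are installed) using the counts, means, and variances from the SUMMARY table:

import numpy as np
from scipy import stats

n    = np.array([11, 11, 11, 11])                                     # group sizes
mean = np.array([2.90272727, 3.33545455, 3.35909091, 3.01545455])     # group means
var  = np.array([0.188061818, 0.147587273, 0.228909091, 0.156667273]) # group variances

N, K = n.sum(), len(n)
grand_mean = np.sum(n * mean) / N            # 138.74 / 44 ≈ 3.15

SSb = np.sum(n * (mean - grand_mean) ** 2)   # about 1.7305
SSw = np.sum((n - 1) * var)                  # about 7.2123

MSb = SSb / (K - 1)
MSw = SSw / (N - K)
F = MSb / MSw
p = stats.f.sf(F, K - 1, N - K)              # upper-tail p-value

print(round(F, 2), round(p, 4))              # 3.20 and 0.0335, matching the output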
Conclusion: The data provide sufficient evidence, at the 5% level of significance, to conclude
that the averages of major GPAs for the four majors considered are not all equal.