UNIT IV New
ANALYSIS OF VARIANCE
t-test for one sample – sampling distribution of t – t-test procedure – t-test for two independent samples – p-value – statistical significance – t-test for two related samples – F-test – ANOVA – Two-factor experiments – three F-tests – two-factor ANOVA – Introduction to chi-square tests
1.3 ESTIMATING THE STANDARD ERROR (s_X̄)
If the population standard deviation is unknown, it must be estimated from the sample. This seemingly minor complication has important implications for hypothesis testing—indeed, it is the reason why the z test must be replaced by the t test. Now s replaces σ in the formula for the standard error of the mean. Instead of σ_X̄ = σ/√n, the estimated standard error is

s_X̄ = s/√n

where s is the sample standard deviation.
Panel III
Finally, dividing the difference between the sample mean, X̄, and the null hypothesized value, μ_hyp, by the estimated standard error, s_X̄, yields the value of the t ratio:

t = (X̄ − μ_hyp) / s_X̄
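As a quick numerical sketch (using hypothetical raw scores, since only the null value of 45 mpg from the gas mileage example is given in this unit), the t ratio can be computed in Python and checked against SciPy:

```python
import numpy as np
from scipy import stats

# Illustrative mileage readings for n = 6 cars; the unit's gas mileage example
# specifies the null hypothesized value of 45 mpg but not the raw scores.
x = np.array([42.8, 43.5, 44.1, 41.9, 43.0, 44.6])
mu_hyp = 45

n = len(x)
s = x.std(ddof=1)              # sample standard deviation (df = n - 1)
se = s / np.sqrt(n)            # estimated standard error, s_xbar = s / sqrt(n)
t = (x.mean() - mu_hyp) / se   # t ratio

# Cross-check with SciPy's one-sample t test (two-sided p-value by default).
t_check, p_two_sided = stats.ttest_1samp(x, popmean=mu_hyp)
print(round(t, 3), round(t_check, 3), round(p_two_sided, 4))
```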
Read the entry from the cell intersected by the row for the correct number of degrees of freedom
and the column for the confidence specifications. In the present case, if a 95 percent confidence
interval is desired, first locate the row corresponding to 5 degrees of freedom (from df = n − 1 =
6 − 1 = 5), and then locate the column for the 95 percent level of confidence, that is, the column
heading identified with a single asterisk. (A double asterisk identifies the column for the 99
percent level of confidence.) The intersected cell specifies that a value of 2.571 should be entered
in Formula 12.4.
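The tabled value can also be checked in Python; for a 95 percent confidence interval the t_conf entry is the .975 quantile of the t distribution with df = 5:

```python
from scipy import stats

# t_conf for a 95% confidence interval with df = 5 is the .975 quantile
# (2.5% in each tail); the 99% value is the .995 quantile.
print(round(stats.t.ppf(0.975, df=5), 3))   # 2.571
print(round(stats.t.ppf(0.995, df=5), 3))   # 4.032
```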
2.1 Sampling Distribution of t
The sampling distribution of t represents the distribution that would be obtained if a value of t were calculated for each sample mean for all possible random samples of a given size from some population. Degrees of freedom (df) refers to the number of values free to vary when, for example, sample variability is used to estimate the unknown population variability.
Each t distribution is associated with a special number referred to as its degrees of freedom. When the n deviations about the sample mean are used to estimate variability in the population, only n − 1 are free to vary because of the restriction that the sum of these deviations must always equal zero. Since one degree of freedom is lost because of the zero-sum restriction, there are only n − 1 degrees of freedom; that is, symbolically,

df = n − 1
2.2 Finding Critical t Values
For example, to find the critical t for the gas mileage investigation, first go to the right-hand panel for a one-tailed test, then locate both the row corresponding to five degrees of freedom and the column for a one-tailed test at the .01 level of significance. The intersected cell specifies 3.365. A negative sign must be placed in front of 3.365, since the hypothesis test requires the lower tail to be critical. Thus, –3.365 is the critical t for the gas mileage investigation. This is shown in Figure 12.2, where the distribution of t is centered about zero (the equivalent value of t for the original null hypothesized value of 45 mpg). If the gas mileage investigation had involved a two-tailed test (still at the .01 level with five degrees of freedom), then the left-hand panel for a two-tailed test would have been appropriate, and the intersected cell would have specified 4.032. Both positive and negative signs would have to be placed in front of 4.032, since both tails are critical. In this case, ±4.032 would have been the pair of critical t values.
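A quick check of these tabled critical values with SciPy's t distribution:

```python
from scipy import stats

df = 5
# One-tailed test at the .01 level with the lower tail critical.
t_lower = stats.t.ppf(0.01, df)            # about -3.365
# Two-tailed test at the .01 level: .005 in each tail.
t_upper = stats.t.ppf(1 - 0.01 / 2, df)    # about 4.032 (the critical pair is +/- this value)
print(round(t_lower, 3), round(t_upper, 3))
```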
3. t-test for two independent samples
Two Independent Samples
Statistical Hypotheses
Sampling Distribution
Standard error of the mean
t ratio
Critical t values
Calculations of the t test (t test procedure)
Estimating Effect Size: Point Estimates And Confidence Intervals
Estimating Effect Size: Cohen’s d
3.2 Statistical Hypotheses:
For the EPO endurance experiment, the directional null hypothesis states that EPO has either no effect or a negative effect on mean endurance, H0: μ1 − μ2 ≤ 0, while the research hypothesis states that EPO increases mean endurance, H1: μ1 − μ2 > 0, which translates into a one-tailed test with the upper tail critical.
3.3 Sampling Distribution of X̄1 − X̄2:
The hypothesis test for the current experiment will be based not on the sampling distribution of X̄1 − X̄2 but on its standardized counterpart, the sampling distribution of t. Although there also is a sampling distribution of z, its use requires that both population standard deviations be known.
3.4 Standard Error of the Sampling Distribution
The standard deviation of the sampling distribution (or standard error) of X̄ equals σ/√n. The standard error of the difference between sample means, s_(X̄1 − X̄2), is estimated by pooling the variability of both samples:

s_(X̄1 − X̄2) = √[ s²_p (1/n1 + 1/n2) ],  where the pooled variance estimate is  s²_p = (SS1 + SS2) / (n1 + n2 − 2).
3.5 t Ratio
The null hypothesis can be tested using a t ratio. Expressed in words, t equals the observed difference between sample means, minus the hypothesized difference of zero, divided by the estimated standard error. In symbols,

t = [(X̄1 − X̄2) − (μ1 − μ2)_hyp] / s_(X̄1 − X̄2)

where X̄1 − X̄2 represents the one observed difference between sample means; (μ1 − μ2)_hyp represents the hypothesized difference of zero between population means; and s_(X̄1 − X̄2) represents the estimated standard error.
3.6 Critical t Values:
To find the critical t for the current experiment, first go to the right-hand panel for a one-tailed test; next, locate the row corresponding to 10 degrees of freedom (from df = n1 + n2 − 2 = 6 + 6 − 2 = 10); and then locate the column for a one-tailed test at the .05 level of significance. The intersected cell specifies 1.812.
3.7 Calculations of t test:
The degrees of freedom for t equal the sum of the degrees of freedom for the two samples minus two, that is, df = n1 + n2 − 2. Two degrees of freedom are lost, one for each sample, because of the zero-sum restriction for the deviations of observations about their respective means.
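A minimal sketch of the whole procedure in Python, using hypothetical endurance scores (the unit reports only the summary values, a mean difference of 5 and an estimated standard error of 2.32), with SciPy's pooled-variance t test as a cross-check:

```python
import numpy as np
from scipy import stats

# Hypothetical endurance scores (minutes) for two independent groups of n = 6;
# illustrative data only.
epo     = np.array([11, 8, 10, 12, 9, 13])
control = np.array([ 6, 4,  7,  5, 8,  3])

n1, n2 = len(epo), len(control)
ss1 = ((epo - epo.mean()) ** 2).sum()          # SS1: sum of squared deviations
ss2 = ((control - control.mean()) ** 2).sum()  # SS2
sp2 = (ss1 + ss2) / (n1 + n2 - 2)              # pooled variance estimate
se  = np.sqrt(sp2 * (1 / n1 + 1 / n2))         # estimated standard error of the mean difference
t   = (epo.mean() - control.mean()) / se       # t ratio (hypothesized difference of zero)
df  = n1 + n2 - 2

# Cross-check with SciPy's pooled-variance (equal_var=True) independent-samples t test.
t_check, p_two_sided = stats.ttest_ind(epo, control, equal_var=True)
print(round(t, 3), round(t_check, 3), df)
```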
3.8 Estimating Effect Size: Point Estimates And Confidence Intervals
It identifies the observed difference between sample means, X̄1 − X̄2, in this case, 5 minutes, as an estimate of the unknown effect, that is, the unknown difference between population means, μ1 − μ2. If you think
about it, this impressive estimate of effect size isn’t surprising. With the very small groups of only
6 patients, we had to create a large, fictitious mean difference of 5 minutes in order to claim a
statistically significant result. If this result had occurred in a real experiment, it would have
signified a powerful effect of EPO on endurance that could be detected even with very small
samples.
Confidence Interval
For example, if a 95 percent confidence interval is desired for the EPO experiment, first
locate the row corresponding to 10 degrees of freedom (from df = n1 + n2 − 2 = 6 + 6 − 2 =
10) and then locate the column for the 95 percent level of confidence, that is, the column heading
identified with a single asterisk. The intersected cell specifies a value of 2.228 to be entered for t_conf in Formula 14.4. Given this value for t_conf, and values of 5 for the difference between sample means, X̄1 − X̄2, and of 2.32 for the estimated standard error, s_(X̄1 − X̄2) (from Table 14.1), Formula 14.4 becomes

(X̄1 − X̄2) ± (t_conf)(s_(X̄1 − X̄2)) = 5 ± (2.228)(2.32) = 5 ± 5.17
Now it can be claimed, with 95 percent confidence, that the interval between –0.17
minutes and 10.17 minutes includes the true effect size, that is, the true difference between
population means for endurance scores.
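The same interval, reproduced in Python from the summary values quoted above:

```python
from scipy import stats

# Values taken from the unit's EPO example: observed mean difference = 5 minutes,
# estimated standard error = 2.32, df = n1 + n2 - 2 = 10.
diff, se, df = 5, 2.32, 10
t_conf = stats.t.ppf(0.975, df)          # approximately 2.228 for 95% confidence
lower, upper = diff - t_conf * se, diff + t_conf * se
print(round(lower, 2), round(upper, 2))  # approximately -0.17 and 10.17
```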
3.9 ESTIMATING EFFECT SIZE: COHEN’S d
Cohen’s d describes effect size by expressing the observed mean difference in standard deviation units. To calculate d, divide the observed mean difference by the standard deviation, that is,

d = (X̄1 − X̄2) / s_p

where, according to current usage, d refers to a standardized estimate of the effect size; X̄1 and X̄2 are the two sample means; and s_p is the sample standard deviation obtained from the square root of the pooled variance estimate, s²_p.
Consequences:
The standard deviation supplies a stable frame of reference not influenced by increases
in sample size. Unlike the standard error, whose value decreases as sample size
increases, the value of the standard deviation remains the same, except for chance, as
sample size increases. Therefore, straightforward comparisons can be made between d
values based on studies with appreciably different sample sizes.
The original units of measurement cancel out because of their appearance in both the numerator and denominator. Consequently, d always emerges as an estimate in standard
deviation units, regardless of whether the original mean difference is based on, for
example, reaction times in milliseconds of pilots to two different cockpit alarms or weight
losses in pounds of overweight subjects to two types of dietary restrictions. Except for
chance, comparisons are straightforward between values of d—with larger values of d
reflecting larger effect sizes—even though the original mean differences are based on
very different units of measurement, such as milliseconds and pounds.
Cohen’s Guidelines for d
Effect size is small if d is less than or in the vicinity of 0.20, that is, one-fifth of a
standard deviation.
Effect size is medium if d is in the vicinity of 0.50, that is, one-half of a standard deviation.
Effect size is large if d is more than or in the vicinity of 0.80, that is, four-fifths of a standard
deviation.
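A short sketch of the Cohen's d calculation (hypothetical two-group data; the pooled standard deviation comes from the square root of the pooled variance estimate):

```python
import numpy as np

# Hypothetical two-group data (illustrative); d divides the mean difference by the
# square root of the pooled variance, so its value is unaffected by sample size.
g1 = np.array([11, 8, 10, 12, 9, 13])
g2 = np.array([ 6, 4,  7,  5, 8,  3])

n1, n2 = len(g1), len(g2)
ss1 = ((g1 - g1.mean()) ** 2).sum()
ss2 = ((g2 - g2.mean()) ** 2).sum()
sp = np.sqrt((ss1 + ss2) / (n1 + n2 - 2))   # pooled standard deviation
d = (g1.mean() - g2.mean()) / sp            # Cohen's d
# Rough benchmarks from the text: ~0.2 small, ~0.5 medium, ~0.8 large.
print(round(d, 2))
```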
Cohen’s guidelines for d are converted into more concrete mean differences involving GPAs,
IQs, and SAT scores. Notice that Cohen’s medium effect, a d value of .50, translates into mean
differences of .25 for GPAs, 7.5 for IQs, and 50 for SAT scores. To qualify as medium effects, the
average GPA would have to increase, for example, from 3.00 to 3.25; the average IQ from 100
to 107.5; and the average SAT score from 500 to 550.
In Figure 14.4, separation between pairs of normal curves is nonexistent (and overlap is
complete) when d = 0. Separation becomes progressively more conspicuous as the values of d,
corresponding to Cohen’s small, medium, and large effects, increase from .20 to .50 and then
to .80. Separation becomes very conspicuous, with relatively little overlap, given a d value of
3.00, equivalent to three standard deviations, for a very large effect.
4. p-value:
Find Approximate p values
Small p-value
Level of Significance or p-Value?
The p-value for a test result represents the degree of rarity of that result, given that the null
hypothesis is true. Smaller p-values tend to discredit the null hypothesis and to support the
research hypothesis.
The p-value indicates the degree of rarity of the observed test result when combined with
all potentially more deviant test results. In other words, the p-value represents the proportion of
area, beyond the observed result, in the tail of the sampling distribution. In the left panel of
Figure 14.3, a relatively deviant (from zero) observed t is associated with a small p-value that
makes the null hypothesis suspect, while in the right panel, a relatively non-deviant observed t is
associated with a large p-value that does not make the null hypothesis suspect.
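A p-value can be read directly from the t distribution rather than bracketed from a table; the observed t and degrees of freedom below are illustrative:

```python
from scipy import stats

# Converting an observed t into a p-value: the proportion of the sampling
# distribution beyond the observed result.
t_obs, df = 2.50, 10
p_one_tailed = stats.t.sf(t_obs, df)            # upper-tail area beyond t_obs
p_two_tailed = 2 * stats.t.sf(abs(t_obs), df)   # both tails for a nondirectional test
print(round(p_one_tailed, 4), round(p_two_tailed, 4))
```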
5. STATISTICALLY SIGNIFICANT:
Beware of Excessively Large Sample Sizes
Avoid an Erroneous Conditional Probability
Tests of hypotheses often are referred to as tests of significance, and test results are
described as being statistically significant (if the null hypothesis has been rejected) or as not
being statistically significant (if the null hypothesis has been retained). Rejecting the null
hypothesis and statistically significant both signify that the test result can’t be attributed to
chance. However, correct usage dictates that rejecting the null hypothesis always refers to the
population, such as rejecting the hypothesized zero difference between two population means,
while statistically significant always refers to the sample, such as assigning statistical
significance to the observed difference between two sample means.
Statistical significance between pairs of sample means implies only that the null
hypothesis is probably false, and not whether it’s false because of a large or small difference
between population means.
5.1 Beware of Excessively Large Sample Sizes
Statistical significance merely indicates that an observed effect, such as an observed
difference between the sample means, is sufficiently large, relative to the standard error, to be
viewed as a rare outcome. (Statistical significance also implies that the observed outcome is
reliable, that is, it would reappear as a similarly rare outcome in a repeat experiment.) It’s very
desirable, therefore, that we go beyond reports of statistical significance by estimating the size of
the effect and, if possible, judging its importance.
5.2 Avoid an Erroneous Conditional Probability
Given that the null hypothesis is true, our test specifies the probability of the observed result; it does not specify the probability that the null hypothesis is true, given the observed result. Our hypothesis testing procedure only supports the first, not the second, conditional probability. Having rejected H0 at the .05 level of significance, we can conclude, without indicating a specific probability, that H0 is probably false, but we can't reverse the original conditional probability and conclude that H0 is true with only probability .05 or less.
6. t-test for two related samples
6.1 Introduction:
The endurance scores of patients reflect not only the effect of EPO, if it exists, but also
the random effects of many uncontrolled factors. One very important type of uncontrolled factor,
referred to as individual differences, reflects the array of characteristics, such as differences in
attitude, physical fitness, personality, etc., that distinguishes one person from another. When
each subject is measured twice, as in the EPO experiment, the t test for repeated measures can be extra sensitive to detecting a treatment effect by eliminating the distorting effect of variability due to individual differences.
Difference (D) Scores
Computations can be simplified by working directly with the difference between pairs of endurance scores, that is, by working directly with

D = X1 − X2
where D is the difference score and X1 and X2 are the paired endurance scores for each
patient measured twice, once under the treatment condition and once under the control
condition, respectively.
Mean Difference Score (D̄)
To obtain the mean for a set of difference scores, add all difference scores and divide by the number of scores, that is,

D̄ = ΣD / n

where D̄ is the mean difference score, ΣD is the sum of all positive difference scores minus the sum of all negative difference scores, and n is the number of difference scores. The sign of D̄ is crucial, since it indicates the direction of the observed mean difference.
6.2 Repeated Measures:
A favorite technique for controlling individual differences is referred to as repeated
measures, because each subject is measured more than once. By focusing on the differences
between pairs of scores for each subject, the investigator effectively eliminates, by the simple act
of subtraction, each individual’s unique impact on both endurance scores.
The resulting difference scores reflect only the treatment effect, if it exists, and random variations of other uncontrolled factors or experimental errors not attributable to individual differences. (Experimental errors refer to random variations in endurance scores due to the combined impact of numerous uncontrolled changes, such as slight changes in temperature, treadmill speed, etc., as well as any changes in a particular subject’s motivation, health, etc., between the two experimental sessions.)
Two related samples occur whenever each observation in one sample is paired, on a one-to-one
basis, with a single observation in the other sample. Repeated measures might not always be
feasible since, as discussed below, several potential complications must be resolved before measuring subjects twice. An investigator still might choose to use two related samples by
matching pairs of different subjects in terms of some uncontrolled variable that appears to have
a considerable impact on the dependent variable
For example, patients might be matched for their body weight because preliminary
studies revealed that, regardless of whether or not they received EPO, lightweight patients have
better endurance scores than heavyweight patients. Before collecting data, patients could be
matched, beginning with the two lightest and ending with the two heaviest (and random
assignment dictating which member of each pair receives EPO). Now, as with repeated
measures, the endurance scores for pairs of matched patients tend to be more similar.
Complications with Repeated Measurements
For instance, since each patient is measured twice, once in the treatment condition and
once in the control condition, sufficient time must elapse between these two conditions to
eliminate any lingering effects due to the treatment. If there is any concern that these effects
cannot be eliminated, use each subject in only one condition.
Counterbalancing
It is customary to randomly assign half of the subjects to experience the two conditions in
a particular order—say, first the treatment and then the control condition—while the other half
of the subjects experience the two conditions in the reverse order. Known as counterbalancing,
this adjustment controls for any sequence effect, that is, any potential bias in favor of one
condition merely because subjects happen to experience it first (or second).
6.3 Statistical Hypothesis
If EPO has either no consistent effect or a negative effect on endurance scores when patients are measured twice, the population mean of all difference scores, μD, should equal zero or less; that is, the null hypothesis reads H0: μD ≤ 0. The alternative hypothesis reads:

H1: μD > 0
This directional alternative hypothesis translates into a one-tailed test with the upper tail
critical.
6.4 Sampling Distribution of D̄
Since the mean of the sampling distribution of X̄ equals the population mean, that is, since μ_X̄ = μ, the mean of the sampling distribution of D̄ equals the corresponding population mean for the difference scores, μD. Its variability is estimated from the sample of difference scores as s_D̄ = s_D / √n, the estimated standard error of D̄, where s_D is the standard deviation of the difference scores.
6.5 Critical t Values:
To find the critical t for the current EPO experiment, go to the right-hand panel for a one-tailed
test in Table B, then locate the row corresponding to 5 degrees of freedom (from df = n − 1 = 6
− 1 = 5), and locate the column for a one-tailed test at the .05 level of significance. The
intersected cell specifies 2.015.
6.6 CALCULATIONS FOR THE t TEST
Panel 1: It involves most of the computational labor, and it generates values for the sample mean difference, D̄, and the sample standard deviation for the difference scores, s_D.
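A minimal repeated-measures sketch in Python, using hypothetical paired scores (the unit reports only D̄ = 5 and an estimated standard error of 0.68), checked against SciPy's related-samples t test:

```python
import numpy as np
from scipy import stats

# Hypothetical paired endurance scores (minutes) for 6 patients measured twice;
# illustrative data only.
treatment = np.array([12, 10, 14, 11, 13, 12])
control   = np.array([ 7,  6,  8,  7,  8,  6])

d = treatment - control                 # difference scores, D = X1 - X2
n = len(d)
d_bar = d.mean()                        # mean difference score
s_d = d.std(ddof=1)                     # standard deviation of the difference scores
se_d = s_d / np.sqrt(n)                 # estimated standard error of D-bar
t = d_bar / se_d                        # t ratio for H0: mu_D <= 0

# Cross-check with SciPy's related-samples t test (two-sided p by default).
t_check, p_two_sided = stats.ttest_rel(treatment, control)
print(round(t, 3), round(t_check, 3), n - 1)
```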
6.7 Confidence Interval for μD
Given that two samples are related, as when patients were measured twice in the EPO experiment, a confidence interval for μD can be constructed from the following expression:

D̄ ± (t_conf)(s_D̄)

Finding t_conf
First locate the row corresponding to 5 degrees of freedom, and then locate the column for the 95 percent level of confidence, that is, the column heading identified with a single asterisk. The intersected cell specifies a value of 2.571 for t_conf. Given a value of 2.571 for t_conf, and (from Table 15.1) values of 5 for D̄, the sample mean of the difference scores, and 0.68 for s_D̄, the estimated standard error, Formula 15.6 becomes

D̄ ± (t_conf)(s_D̄) = 5 ± (2.571)(0.68) = 5 ± 1.75
It can be claimed, with 95 percent confidence, that the interval between 3.25 minutes and 6.75
minutes includes the true mean for the population of difference endurance scores.
6.8 Standardized Effect Size, Cohen’s d:
For two related samples, Cohen’s d expresses the mean difference score in standard deviation units, d = D̄ / s_D, where s_D is the standard deviation of the difference scores.
6.9 t TEST for the Population Correlation Coefficient:
Null Hypothesis
If the pairs of observations (cards sent and cards received) were a random sample from the population of all friends, then it is possible to test the null hypothesis that the population correlation coefficient, symbolized by the Greek letter ρ (rho), equals zero, that is, H0: ρ = 0: there is no correlation between the number of cards sent and the number of cards received.
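A brief sketch with hypothetical card counts; SciPy's pearsonr reports r with a two-sided p-value, and the equivalent t ratio with n − 2 degrees of freedom can be formed directly:

```python
import numpy as np
from scipy import stats

# Hypothetical counts of greeting cards sent and received by 8 friends (illustrative).
sent     = np.array([ 5, 12,  3,  9, 15,  7, 10,  4])
received = np.array([ 6, 10,  2, 11, 14,  5,  9,  3])

r, p_two_sided = stats.pearsonr(sent, received)   # sample r and two-sided p for H0: rho = 0

# The same test expressed as a t ratio with n - 2 degrees of freedom.
n = len(sent)
t = r * np.sqrt((n - 2) / (1 - r ** 2))
print(round(r, 3), round(t, 3), round(p_two_sided, 4))
```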
7. F-test:
Introduction
Variance Estimates
Mean Squares (MS) And The F Ratio
F Test is Non Directional
Estimating Effect Size
Multiple Comparisons
7.1 Introduction: In the two-sample case, t reflects the ratio between the observed difference
between the two sample means in the numerator and the estimated standard error in the
denominator. For three or more samples, the null hypothesis is tested with a new ratio, the F
ratio.
An F test of the null hypothesis is based on the notion that if the null hypothesis is true,
both the numerator and the denominator of the F ratio would tend to be about the same, but if
the null hypothesis is false, the numerator would tend to be larger than the denominator.
When the null hypothesis is false, the presence of a treatment effect tends to cause a chain
reaction: The observed differences between group means tend to be large, as does the variability
between groups.
Test Results for Outcomes A and B
Given the .05 level of significance, the null hypothesis should be retained for Outcome A, since
the observed F of 0.75 is smaller than the critical F of 5.14. However, the null hypothesis should
be rejected for Outcome B, since the observed F of 7.36 exceeds the critical F.
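The critical F of 5.14 quoted above can be reproduced from the F distribution, assuming 2 degrees of freedom between groups and 6 within groups (a three-group design with three subjects per group):

```python
from scipy import stats

# Critical F values at the .05 and .01 levels for dfn = 2 (between) and dfd = 6 (within);
# these degrees of freedom are assumed to match the quoted critical F of 5.14.
f_crit_05 = stats.f.ppf(0.95, dfn=2, dfd=6)   # approximately 5.14
f_crit_01 = stats.f.ppf(0.99, dfn=2, dfd=6)   # approximately 10.92
print(round(f_crit_05, 2), round(f_crit_01, 2))
```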
7.2 Variance Estimates:
The sample variance measures variability among any set of observations by first finding the sum of squares, SS, that is, the sum of the squared deviations about their mean:

SS = Σ(X − X̄)²

Dividing SS by its degrees of freedom, df = n − 1, yields

s² = SS / df

where s² is the sample variance.
Estimates in ANOVA:
1. Sum of squares, SS, in the numerator: The numerator term for s² represents the sum of the squared deviations about the sample mean, X̄. More generally, the numerator term for any variance estimate in ANOVA always is the sum of squares, that is, the sum of squared deviations for some set of scores about their mean.
2. Degrees of freedom, df, in the denominator: The denominator for s² represents the number of degrees of freedom for these deviations.
Mean Square:
A variance estimate in ANOVA, referred to as a mean square, consists of some sum of squares divided by its degrees of freedom:

MS = SS / df

where MS represents the mean square; SS denotes the sum of squared deviations about their mean; and df equals the corresponding number of degrees of freedom.
Sum of Squares (SS): Definitional Formulas: The sum of squared deviations of some set of scores about their mean. For a one-factor ANOVA,

SS_total = Σ(X − X̄_grand)²
SS_between = Σ n_group (X̄_group − X̄_grand)²
SS_within = Σ(X − X̄_group)²
Sum of Squares (SS): Computation Formulas
The computation formulas for the three new SS terms in Table 16.2 can be viewed as variations on the original computation formula for the sum of squares:

SS_total = ΣX² − G²/N
SS_between = Σ(T²/n) − G²/N
SS_within = ΣX² − Σ(T²/n)

Note the following features common to both the original computation formula and the three new computation formulas:
1. Each formula consists of two components separated by a minus sign.
2. Means are replaced by their corresponding totals. The grand mean, X̄_grand, is replaced by the grand total, G, and any group mean, X̄_group, is replaced by its group total, T.
3. Whether a score or a total, each entry in the numerator is squared and, in the case of a total, divided by its sample size, either N for the grand total or n for any group total.
Checking Computational Accuracy:
To minimize computational errors, calculate from scratch each of the three SS terms, even though this entails some duplication of effort, and then verify that SS_between + SS_within equals SS_total.
7.3 Mean Squares (MS) and the F Ratio
Each mean square equals its sum of squares divided by its degrees of freedom, MS_between = SS_between/df_between and MS_within = SS_within/df_within, and the null hypothesis is tested with

F = MS_between / MS_within

where MS_between reflects the variability between means for groups of subjects who are treated differently, and MS_within reflects the variability among scores of subjects who are treated similarly (random error). Relatively large values of MS_between suggest the presence of a treatment effect.
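A compact one-way ANOVA sketch in Python with hypothetical scores for the three sleep-deprivation groups (the raw data are not reproduced in this unit), checked against SciPy's f_oneway:

```python
import numpy as np
from scipy import stats

# Hypothetical aggression scores for three groups (0, 24, 48 hours of sleep deprivation);
# illustrative data only.
groups = [np.array([2, 3, 1]), np.array([4, 6, 5]), np.array([8, 7, 9])]

all_scores = np.concatenate(groups)
N, k = len(all_scores), len(groups)
grand_mean = all_scores.mean()

ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
ss_total = ((all_scores - grand_mean) ** 2).sum()   # check: ss_between + ss_within

ms_between = ss_between / (k - 1)      # MS_between = SS_between / df_between
ms_within = ss_within / (N - k)        # MS_within  = SS_within  / df_within
F = ms_between / ms_within
eta_sq = ss_between / ss_total         # proportion of variance explained (eta squared)

# Cross-check against SciPy's one-way ANOVA.
F_check, p_value = stats.f_oneway(*groups)
print(round(F, 2), round(F_check, 2), round(p_value, 4), round(eta_sq, 2))
```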
7.4 F Test Is Nondirectional
The F test in ANOVA is the equivalent of a nondirectional test. Recall that all variations in
ANOVA are squared. When squared, all values become positive, regardless of whether the
original differences between groups (or group means) are positive or negative.
F and t²
Squaring the t test would produce a similar effect. When squared, all values of t² become positive, regardless of whether the original value for the observed t was positive or negative. Hence, the t² test also qualifies as a nondirectional test, even though the entire rejection region appears only in the upper tail of the t² sampling distribution. In fact, the values of t² and F are identical when both tests can be applied to the same data for two independent groups. When only two groups are involved, the t² test can be viewed as a special case of the more general F test in ANOVA for two or more groups.
7.5 Estimating Effect Size
In ANOVA, effect size can be estimated with the squared curvilinear correlation, η² = SS_between / SS_total, the proportion of total variability attributable to differences between groups.
The value of .71 for η² implies that .71 (or 71 percent) of the variance among scores can be
attributed to variance between treatment groups, as reflected in SSbetween. More specifically,
this large value of .71 suggests that 71 percent of the variance in aggression scores is
attributable to whether subjects are deprived of sleep for 0, 24, or 48 hours, while the remaining
29 percent of variance in aggression scores is not attributable to hours of sleep deprivation.
7.6 Multiple Comparisons:
Rejection of the overall null hypothesis indicates only that all population means are not equal. In
the case of the original sleep deprivation experiment, the rejection of H0 signals the presence of
one or more inequalities between the mean aggression scores for populations of subjects
exposed to 0, 24, or 48 hours of sleep deprivation, that is, between μ0, μ24, and μ48. To pinpoint
the one or more differences between pairs of population means that contribute to the rejection of
the overall H0, we must use a test of multiple comparisons. A test of multiple comparisons is
designed to evaluate not just one but a series of differences between population means, such as
those for each of the three possible differences between pairs of population means for the present
experiment, namely, μ0 − μ24, μ0 − μ48, and μ24 − μ48.
t Test Not Appropriate: The regular t test is designed to evaluate a single comparison for a pair
of observed means, not multiple comparisons for all possible pairs of observed means. Among
other complications, the use of multiple t tests increases the probability of a type I error
(rejecting a true null hypothesis) beyond the value specified by the level of significance.
Tukey’s HSD Test: Tukey’s HSD ("honestly significant difference") test is a multiple comparison test that can be used to evaluate all possible differences between pairs of means, and yet the cumulative probability of a type I error never exceeds the specified level of significance.
Finding the Critical Value:
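A library-based sketch of Tukey's HSD using statsmodels, again with hypothetical scores for the three sleep-deprivation groups:

```python
import numpy as np
from statsmodels.stats.multicomp import pairwise_tukeyhsd

# Hypothetical aggression scores for the 0-, 24-, and 48-hour sleep deprivation groups
# (illustrative only); Tukey's HSD evaluates all pairwise mean differences while keeping
# the cumulative type I error at the chosen level.
scores = np.array([2, 3, 1, 4, 6, 5, 8, 7, 9])
hours = np.array(['0', '0', '0', '24', '24', '24', '48', '48', '48'])

result = pairwise_tukeyhsd(endog=scores, groups=hours, alpha=0.05)
print(result)   # one row per pair: mean difference, adjusted p, reject yes/no
```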
8. ANOVA:
When data are quantitative, an overall test of the null hypothesis for more than two
population means requires a new statistical procedure known as analysis of variance,
which is often abbreviated as ANOVA.
Developed by R. A. Fisher in the early twentieth century.
Variance is defined as the expectation of the squared deviation of a random variable from its mean, denoted σ² (population) or s² (sample).
For comparing more than two populations, or a population having more than two subgroups, we use the ANOVA technique.
To see whether any differences occur between the samples, we can represent them using distribution curves.
The populations from which the samples are drawn have equal variances, i.e., σ1² = σ2² = σ3² = ... = σk² for k groups.
Each sample is drawn randomly and the samples are independent of one another.
Null hypothesis:
H0 : µ1 = µ2 = ... = µk
Alternative hypothesis:
HA : not all population means are equal (at least one µ differs from the others).
An observation in the sample data for an ANOVA is classified by one factor (one-way ANOVA) or by two factors (two-way ANOVA).
The null hypothesis is that there is no difference between the groups, that is, that the group means are equal.
The alternative hypothesis is that at least one group mean differs from the others.
Assumptions and limitations:
1. Sum of squares, SS, in the numerator: The numerator term for s² represents the sum of the squared deviations about the sample mean, X̄. More generally, the numerator term for any variance estimate in ANOVA always is the sum of squares, that is, the sum of squared deviations for some set of scores about their mean.
2. Degrees of freedom, df, in the denominator: The denominator for s² represents the number of
degrees of freedom for these deviations. More generally, the denominator term for any variance
estimate in ANOVA always is the number of degrees of freedom, that is, the number of deviations
in the numerator that are free to vary and, therefore, supply valid information for the purpose of
estimation.
Mean Square:
A variance estimate in ANOVA, referred to as a mean square, consists of some sum of squares divided by its degrees of freedom:

MS = SS / df

where MS represents the mean square; SS denotes the sum of squared deviations about their mean; and df equals the corresponding number of degrees of freedom.
Finding Approximate p-Values
In the F table, the lighter entries give the .05 critical values and the darker entries give the .01 critical values. If the observed F is smaller than the light number, p > .05. If the observed F is between the light and dark numbers, p < .05. If the observed F is larger than the dark number, p < .01.
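The exact p-value for an observed F can also be read from the F distribution; the degrees of freedom below (2 between, 6 within) are assumed for illustration:

```python
from scipy import stats

# Exact p-value for an observed F, rather than the bracketing approach above.
F_obs, dfn, dfd = 7.36, 2, 6
p = stats.f.sf(F_obs, dfn, dfd)   # upper-tail area beyond the observed F
print(round(p, 4))
```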
9. Two-Factor Experiments
Tables for Main Effects and Interaction
Graph for Interaction
9.1 Introduction: A more complex type of analysis that tests whether differences exist among
population means categorized by two factors or independent variables.
Using this two factor ANOVA design, the psychologist can test not just two but three null
hypotheses, namely, the effect on subjects’ reaction times of (1) crowd size, (2) degree of danger
and, as a bonus, (3) the combination or interaction of crowd size and degree of danger. For
computational simplicity, assume that the social psychologist randomly assigns two subjects to
be tested (one at a time) with crowds of either zero, two, or four people and either the non
dangerous or dangerous conditions. The resulting six groups, each consisting of two subjects,
represent all possible combinations of the two factors.
9.2 Tables for Main Effects and Interaction:
Main Effect: The effect of a single factor when any other factor is ignored.
1. The three column means (9, 12, 15) represent the mean reaction times for each crowd
size when degree of danger is ignored. Any differences among these column means not
attributable to chance are referred to as the main effect of crowd size on reaction time. In
ANOVA, main effect always refers to the effect of a single factor, such as crowd size,
when any other factor, such as degree of danger, is ignored.
2. The two row means (8, 16) represent the mean reaction times for degree of danger when
crowd size is ignored. Any difference between these row means not attributable to chance
is referred to as the main effect of degree of danger on reaction time.
3. The mean of the reaction times for each group of two subjects yields the six means (8, 7,
9, 10, 17, 21) for each combination of the two factors. Often referred to as cell means or
treatment-combination means, these means reflect not only the main effects for crowd
size and degree of danger described earlier but, more importantly, any effect due to the
interaction between crowd size and degree of danger, as described below.
4. Finally, the one mean for all three column means—or for both row means— yields the
overall or grand mean (12) for all subjects in the study.
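The marginal means above follow directly from the 2 x 3 table of cell means quoted in the text; a small NumPy check:

```python
import numpy as np

# Cell means from the text's reaction-time example: rows correspond to the two
# degrees of danger and columns to the three crowd sizes (0, 2, 4 people).
cell_means = np.array([[ 8.0,  7.0,  9.0],
                       [10.0, 17.0, 21.0]])

column_means = cell_means.mean(axis=0)   # main effect of crowd size: 9, 12, 15
row_means = cell_means.mean(axis=1)      # main effect of degree of danger: 8, 16
grand_mean = cell_means.mean()           # 12
print(column_means, row_means, grand_mean)
```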
9.3 Graph for Interaction
These preliminary conclusions about main effects must be qualified because of a
complication due to the combined effect or interaction of crowd size and degree of
danger on reaction time. Interaction occurs whenever the effects of one factor on the
dependent variable are not consistent for all values (or levels) of the second factor.
Panel C of Figure 18.1 depicts the interaction between crowd size and degree of
danger. The two nonparallel lines in panel C depict differences between the three cell
means in the first row and the three cell means in the second row—that is, between the
mean reaction times for the dangerous condition for different crowd sizes and the mean
reaction times for the non dangerous condition for different crowd sizes. Although the
line for the dangerous conditions remains fairly level, that for the non dangerous
conditions is slanted, suggesting that the reaction times for the non dangerous conditions,
but not those for the dangerous conditions, are influenced by crowd size. Because the
effect of crowd size is not consistent for the non dangerous and dangerous conditions—
portrayed by the apparent non parallelism between the two lines in panel C of
Figure 18.1—the null hypothesis (that there is no interaction between the two factors)
might be rejected
10. Three F-Tests (Two-Factor ANOVA)
10.1 Introduction
F ratios in both a one- and a two-factor ANOVA always consist of a numerator (shaded) that
measures some aspect of variability between groups or cells and a denominator that measures
variability within groups or cells. In a one-factor ANOVA, a single null hypothesis is tested with
one F ratio. In two-factor ANOVA, three different null hypotheses are tested, one at a time, with
three F ratios: Fcolumn, Frow, and Finteraction.
The numerator of each of these three F ratios reflects a different aspect of variability between
cells: variability between columns (crowd size), variability between rows (degree of danger), and
interaction—any remaining variability between cells not attributable to either variability
between columns (crowd size) or rows (degree of danger ). The shaded numerator terms for the
three F ratios in the bottom panel of Figure 18.2 estimate random error and, if present, a
treatment effect (for subjects treated differently by the investigator). The denominator term
always estimates only random error.
10.2 INTERACTION: Interaction emerges as the most striking feature of a two-factor ANOVA.
As noted previously, two factors interact if the effects of one factor on the dependent variable are
not consistent for all of the levels of a second factor. More generally, when two factors are
combined, something happens that represents more than a mere composite of their separate
effects.
Supplies Valuable Information: the interaction between crowd size and degree of danger might
encourage the exploration, possibly by interviewing participants, about their reactions to
various crowd sizes and degrees of danger. In the process, much might be learned about why
some people in groups assume or fail to assume social responsibility.
Simple Effect: A simple effect represents the effect of one factor on the dependent variable at a
single level of the second factor.
Inconsistent Simple Effects: The two simple effects of crowd size, one for non dangerous
conditions and one for dangerous conditions, clearly are inconsistent; the simple effect of crowd
size for dangerous conditions shows a decrease in mean reaction times with larger crowd sizes,
while the simple effect of crowd size for non dangerous conditions shows just the opposite—an
increase in mean reaction times with larger crowd sizes. Accordingly, the main effect of crowd
size—assuming one exists— cannot be interpreted without referring to its radically different
simple effects.
Fig: Some possible outcomes (two-factor experiment)
Simple Effects and Interaction: In the figure, no interaction is present in panels A and B because their respective simple effects are consistent, as suggested by the parallel lines. Interactions
could be present in panels C and D because their respective simple effects are inconsistent, as
suggested by the diverging or crossed lines. Given the present perspective, interaction can be
viewed as the product of inconsistent simple effects.
10.3 Variance Estimates: Each of the three F ratios in a two-factor ANOVA is based on a ratio involving two variance estimates: a mean square in the numerator that reflects random error plus, if present, any specific treatment effect, and a mean square in the denominator that reflects only random error.
Sums of Squares (Computation Formulas) :
Table 18.2 shows the more efficient computation formulas, where totals replace means. Notice
the highly predictable computational pattern first described in Section 16.4. Each entry is
squared, and each total, whether for a column, a row, a cell, or the grand total, is then divided
by its respective sample size.
Degrees of Freedom (df)
The number of degrees of freedom must be determined for each SS term in a two-factor ANOVA, and for convenience, the various df formulas are listed in Table 18.4: df_column = c − 1, df_row = r − 1, df_interaction = (c − 1)(r − 1), df_within = N − (c)(r), and df_total = N − 1. The (c − 1)(r − 1) degrees of freedom for df_interaction reflect the fact that, from the perspective of degrees of freedom, the original matrix with (c)(r) cells shrinks to (c − 1)(r − 1) cells. The N − (c)(r) degrees of freedom for df_within reflect the fact that the N scores within all cells must sum to the fixed totals in their respective cells, causing one degree of freedom to be lost in each of the (c)(r) cells. For the present study, with c = 3 crowd sizes, r = 2 degrees of danger, and N = 12 subjects, the df values are df_column = 2, df_row = 1, df_interaction = 2, df_within = 6, and df_total = 11.
10.5 ESTIMATING EFFECT SIZE:
10.6 MULTIPLE COMPARISONS
The different test results—one significant, the other non significant—provide statistical
support for an important result of the smoke alarm study, namely, that the reaction times tend to
increase with crowd size for non dangerous but not for dangerous conditions.
Tukey’s HSD Test for Multiple Comparisons
Estimating Effect Size: A significant difference between pairs of means can, in turn, have its effect size estimated with Cohen’s d, that is, by dividing the observed mean difference by the square root of MS_within: d = (X̄1 − X̄2) / √MS_within. Such large values of d, as well as the large values for η²_p in the current example, wouldn’t be obtained with real data. The fictitious data for the smoke alarm experiment were selected to dramatize various effects in two-factor ANOVA, including an interaction with a significant simple effect, using very small sample sizes.
12.1 ONE-VARIABLE χ² TEST
Introduction
Statistical Hypotheses
CALCULATING χ²
χ² TEST
12.1.3 Calculating χ²
Expected Frequency (fe): The hypothesized frequency for each category, given that the null hypothesis is true.
Observed Frequency (fo): The obtained frequency for each category.
For example, when testing the blood bank’s claim with a sample of 100 students, 44 students should have type O (from the product of .44 and 100); 41 should have type A; 10 should have type B; and only 5 should have type AB. In Table 19.1, each of these numbers is referred to as an expected frequency, fe, that is, the hypothesized frequency for each category of the qualitative variable if, in fact, the null hypothesis is true. An expected frequency is compared with its observed frequency, fo, that is, the frequency actually obtained in the sample for each category.
Evaluating Discrepancies: The crucial question is whether the discrepancies between observed
and expected frequencies are small enough to be regarded as a common outcome, given that the
null hypothesis is true. If so, the null hypothesis is retained. Otherwise, if the discrepancies are
large enough to qualify as a rare outcome, the null hypothesis is rejected.
Some Properties of χ²
The χ² statistic combines the discrepancies between observed and expected frequencies across all categories:

χ² = Σ [ (fo − fe)² / fe ]

The larger the discrepancies are between the observed and expected frequencies, fo − fe, the larger the value of χ² and, therefore, as will be seen, the more suspect the null hypothesis will be. Because of the squaring of each discrepancy, negative discrepancies become positive, and the value of χ² never can be negative. Division by fe indicates that discrepancies must be evaluated not in isolation, but relative to the size of expected frequencies. For example, a discrepancy of 5 looms more importantly (and translates into a larger value of χ²) relative to an expected frequency of 10 than relative to an expected frequency of 100.
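A goodness-of-fit sketch using the blood-type proportions from the text; the observed counts are hypothetical since Table 19.1 is not reproduced here:

```python
import numpy as np
from scipy import stats

# Blood bank example from the text: expected proportions .44 (O), .41 (A), .10 (B),
# .05 (AB) for n = 100 students. The observed counts below are illustrative only.
expected = np.array([44, 41, 10, 5], dtype=float)
observed = np.array([38, 38, 16, 8], dtype=float)

chi2 = ((observed - expected) ** 2 / expected).sum()   # chi-square statistic
chi2_check, p_value = stats.chisquare(f_obs=observed, f_exp=expected)
print(round(chi2, 2), round(chi2_check, 2), round(p_value, 4))
```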
Degrees of Freedom:
For the one-variable χ² test, df = c − 1, where c is the number of categories.
12.1.4 χ² Test
12.2 TWO-VARIABLE χ² TEST
Introduction
Statistical Hypotheses
CALCULATING χ²
χ² TEST
12.2.1 Introduction: Evaluates whether observed frequencies reflect the independence of two qualitative variables.
12.2.2 Statistical Hypotheses: The null hypothesis states that the two qualitative variables are independent in the population; the alternative hypothesis states that they are not independent.
12.2.3 Calculating χ²: The same χ² formula is used, with each expected frequency obtained from fe = (row total)(column total) / grand total.
67
68
Degrees of Freedom: df = (r − 1)(c − 1), where r and c are the numbers of rows and columns in the table.
12.2.4 χ² TEST:
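A sketch of the two-variable test on a hypothetical 2 x 2 table of observed frequencies, using SciPy's chi2_contingency:

```python
import numpy as np
from scipy import stats

# Hypothetical 2 x 2 table of observed frequencies (illustrative only); rows and
# columns are two qualitative variables whose independence is being tested.
observed = np.array([[30, 20],
                     [15, 35]])

# correction=False skips the Yates continuity correction so the statistic matches
# the basic chi-square formula.
chi2, p_value, dof, expected = stats.chi2_contingency(observed, correction=False)
print(round(chi2, 2), round(p_value, 4), dof)   # dof = (r - 1)(c - 1) = 1
print(expected)   # fe = (row total)(column total) / grand total for each cell
```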