G*Power Manual
January 31, 2014
This manual is not yet complete. We will be adding help on more tests in the future. If you cannot find help for your test
in this version of the manual, then please check the G*Power website to see if a more up-to-date version of the manual
has been made available.
1 Introduction

G * Power (Fig. 1 shows the main window of the program) covers statistical power analyses for many different statistical tests of the

• F test,
• t test,
• χ² test and
• z test families and some
• exact tests.

G * Power provides effect size calculators and graphics options. G * Power supports both a distribution-based and a design-based input mode. It also contains a calculator that supports many central and noncentral probability distributions.

G * Power is free software and available for Mac OS X and Windows XP/Vista/7/8.

Distribution-based approach to test selection. First select the family of the test statistic (i.e., exact, F, t, χ², or z test) using the Test family menu in the main window. The Statistical test menu adapts accordingly, showing a list of all tests available for the test family.

Example: For the two groups t test, first select the test family based on the t distribution. Then select the Means: Difference between two independent means (two groups) option in the Statistical test menu.
Figure 1: The main window of G * Power
1.2.2 Choose one of the five types of power analysis available

In Step 2, the Type of power analysis menu in the center of the main window is used to choose the appropriate analysis type, and the input and output parameters in the window change accordingly.

Example: If you choose the first item from the Type of power analysis menu, the main window will display input and output parameters appropriate for an a priori power analysis (for t tests for independent groups if you followed the example provided in Step 1).

In an a priori power analysis, sample size N is computed as a function of

• the required power level (1 − β),
• the pre-specified significance level α, and
• the population effect size to be detected with probability (1 − β).

In a criterion power analysis, α (and the associated decision criterion) is computed as a function of

• 1 − β,
• the effect size, and
• a given sample size.

In a compromise power analysis both α and 1 − β are computed as functions of

• the effect size,
• N, and
• an error probability ratio q = β/α.

In a post-hoc power analysis the power (1 − β) is computed as a function of

• α,
• the population effect size parameter, and
• the sample size(s) used in a study.

In a sensitivity power analysis the critical population effect size is computed as a function of

• α,
• 1 − β, and
• N.

Because Cohen's book on power analysis (Cohen, 1988) appears to be well known in the social and behavioral sciences, we made use of his effect size measures whenever possible. In addition, wherever available, G * Power provides his definitions of "small", "medium", and "large" effects as "Tool tips". The tool tips may be obtained by moving the cursor over the "effect size" input parameter field (see below). However, note that these conventions may have different meanings for different tests.

Example: The tooltip showing Cohen's measures for the effect size d used in the two groups t test
The distributions plot may be copied, saved, or printed by clicking the right mouse button inside the plot area. The button X-Y plot for a range of values at the bottom of the main window opens the plot window.
Figure 2: The plot window of G * Power
Figure 3: The table view of the data for the graphs shown in Fig. 2
2 The G * Power calculator

G * Power contains a simple but powerful calculator that can be opened by selecting the menu label "Calculator" in the main window. Figure 4 shows an example session. This small example script calculates the power for the one-tailed t test for matched pairs and demonstrates most of the available features:

• There can be any number of expressions
• The result is set to the value of the last expression in the script
• Expressions can be assigned to variables that can be used in following expressions
• The character # starts a comment. The rest of the line following # is ignored
• Many standard mathematical functions like square root, sin, cos etc. are supported (for a list, see below)
• Many important statistical distributions are supported (see list below)
• The script can be easily saved and loaded. In this way a number of useful helper scripts can be created.

The calculator supports the following arithmetic operations (shown in descending precedence):

• sign(x) - Sign of x: x < 0 → −1, x = 0 → 0, x > 0 → 1.
• lngamma(x) - Natural logarithm of the gamma function ln(Γ(x))
• frac(x) - Fractional part of floating point x: frac(1.56) is 0.56.
• int(x) - Integer part of floating point x: int(1.56) is 1.
• min(x,y) - Minimum of x and y
• max(x,y) - Maximum of x and y

Supported distribution functions (CDF = cumulative distribution function, PDF = probability density function, Quantile = inverse of the CDF). For information about the properties of these distributions check http://mathworld.wolfram.com/.

• zcdf(x) - CDF, zpdf(x) - PDF, zinv(p) - Quantile of the standard normal distribution.
• normcdf(x,m,s) - CDF, normpdf(x,m,s) - PDF, norminv(p,m,s) - Quantile of the normal distribution with mean m and standard deviation s.
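For readers who want to check calculator results outside G * Power, these functions map directly onto Python's standard library and scipy.stats. The correspondence below is a sketch; the Python names on the left are ours, chosen to mirror the calculator's.

```python
# Python equivalents of the calculator functions listed above.
import math
from scipy.stats import norm

zcdf = norm.cdf            # CDF of the standard normal distribution
zpdf = norm.pdf            # PDF
zinv = norm.ppf            # quantile (inverse CDF)

def normcdf(x, m, s): return norm.cdf(x, loc=m, scale=s)
def normpdf(x, m, s): return norm.pdf(x, loc=m, scale=s)
def norminv(p, m, s): return norm.ppf(p, loc=m, scale=s)

def frac(x): return x - math.trunc(x)   # fractional part, as in the calculator
lngamma = math.lgamma                   # ln(Gamma(x))

print(round(zinv(0.975), 4))   # 1.96, the familiar two-sided 5% critical value
print(round(frac(1.56), 2))    # 0.56
```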
Figure 4: The G * Power calculator
3 Exact: Correlation - Difference from constant (one sample case)

The null hypothesis is that in the population the true correlation ρ between two bivariate normally distributed random variables has the fixed value ρ0. The (two-sided) alternative hypothesis is that the correlation coefficient has a different value: ρ ≠ ρ0:

H0: ρ − ρ0 = 0
H1: ρ − ρ0 ≠ 0.

A common special case is ρ0 = 0 (see e.g. Cohen, 1969, Chap. 3). The two-sided test ("two tails") should be used if there is no restriction on the direction of the deviation of the sample r from ρ0. Otherwise use the one-sided test ("one tail").

3.1 Effect size index

To specify the effect size, the conjectured alternative correlation coefficient ρ should be given. ρ must conform to the following restrictions: −1 + ε < ρ < 1 − ε, with ε = 10⁻⁶. The proper effect size is the difference between ρ and ρ0: ρ − ρ0. Zero effect sizes are not allowed in a priori analyses. G * Power therefore imposes the additional restriction that |ρ − ρ0| > ε in this case.

For the special case ρ0 = 0, Cohen (1969, p. 76) defines the following effect size conventions:

• small ρ = 0.1
• medium ρ = 0.3
• large ρ = 0.5

Pressing the Determine button on the left side of the effect size label opens the effect size drawer (see Fig. 5). You can use it to calculate |ρ| from the coefficient of determination ρ².

3.2 Options

1. Use exact distribution if N < x. The computation time of the exact distribution increases with N, whereas that of the approximation does not. Both procedures are asymptotically identical, that is, they produce essentially the same results if N is large. Therefore, a threshold value x for N can be specified that determines the transition between both procedures. The exact procedure is used if N < x, the approximation otherwise.

2. Use large sample approximation (Fisher Z). With this option you select always to use the approximation.

There are two properties of the output that can be used to discern which of the procedures was actually used: the option field of the output in the protocol, and the naming of the critical values in the main window, in the distribution plot, and in the protocol (r is used for the exact distribution and z for the approximation).

3.3 Examples

In the null hypothesis we assume ρ0 = 0.60 to be the correlation coefficient in the population. We further assume that our treatment increases the correlation to ρ = 0.65. If we require α = β = 0.05, how many subjects do we need in a two-sided test?

• Select
  Type of power analysis: A priori
• Options
  Use exact distribution if N <: 10000
• Input
  Tail(s): Two
  Correlation ρ H1: 0.65
  α err prob: 0.05
  Power (1-β err prob): 0.95
  Correlation ρ H0: 0.60
• Output
  Lower critical r: 0.570748
  Upper critical r: 0.627920
  Total sample size: 1928
  Actual power: 0.950028

3.4 Related tests

Similar tests in G * Power 3.0:

• Correlation: Point biserial model
• Correlations: Two independent Pearson r's (two samples)
3.5 Implementation notes

Exact distribution. The H0 distribution is the sample correlation coefficient distribution sr(ρ0, N), the H1 distribution is sr(ρ, N), where N denotes the total sample size, ρ0 denotes the value of the baseline correlation assumed in the null hypothesis, and ρ denotes the 'alternative correlation'. The (implicit) effect size is ρ − ρ0. The algorithm described in Barabesi and Greco (2002) is used to calculate the CDF of the sample coefficient distribution.
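The large-sample option can be sketched in a few lines. Under the Fisher Z approximation, atanh(r) is approximately normal with standard error 1/sqrt(N − 3), so the required sample size solves (z_{1−α/2} + z_{power}) / (atanh ρ − atanh ρ0) = sqrt(N − 3). The helper function below is illustrative, not G * Power's routine; applied to the example of section 3.3 it gives 1929, one more than the exact result of 1928 reported there.

```python
# Fisher Z approximation for the "correlation - difference from constant" test.
import math
from scipy.stats import norm

def fisher_z_sample_size(rho0, rho1, alpha, power, two_sided=True):
    q = math.atanh(rho1) - math.atanh(rho0)        # effect on the z scale
    za = norm.ppf(1 - alpha / 2) if two_sided else norm.ppf(1 - alpha)
    zb = norm.ppf(power)
    # se(z) = 1/sqrt(N - 3), so solve for N and round up.
    return math.ceil(((za + zb) / q) ** 2) + 3

N = fisher_z_sample_size(0.60, 0.65, alpha=0.05, power=0.95)
print(N)   # 1929 -- within one of the exact result (1928)
```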
3.6 Validation

The results in the special case of ρ0 = 0 were compared with the tabulated values published in Cohen (1969). The results in the general case were checked against the values produced by PASS (Hintze, 2006).
4 Exact: Proportion - difference from constant (one sample case)

The problem considered in this case is whether the probability π of an event in a given population has the constant value π0 (null hypothesis). The null and the alternative hypothesis can be stated as:

H0: π − π0 = 0
H1: π − π0 ≠ 0.

A two-tailed binomial test should be performed to test this undirected hypothesis. If it is possible to predict a priori the direction of the deviation of the sample proportion p from π0, e.g. p − π0 < 0, then a one-tailed binomial test should be chosen.

4.1 Effect size index

The effect size g is defined as the deviation from the constant probability π0, that is, g = π − π0.

The definition of g implies the following restriction: ε ≤ (π0 + g) ≤ 1 − ε. In an a priori analysis we need to respect the additional restriction |g| > ε (this is in accordance with the general rule that zero effect hypotheses are undefined in a priori analyses). With respect to these constraints, G * Power sets ε = 10⁻⁶.

Pressing the Determine button on the left side of the effect size label opens the effect size drawer. You can use this dialog to calculate the effect size g from π0 (called P1 in the dialog) and π (called P2 in the dialog) or from several relations between them. If you open the effect size dialog, the value of P1 is set to the value in the constant proportion input field in the main window. There are four different ways to specify P2:

1. Direct input: Specify P2 in the corresponding input field below P1.

2. Difference: Choose difference P2-P1 and insert the difference into the text field on the left side (the difference is identical to g).

3. Ratio: Choose ratio P2/P1 and insert the ratio value into the text field on the left side.

4. Odds ratio: Choose odds ratio and insert the odds ratio (P2/(1 − P2))/(P1/(1 − P1)) between P1 and P2 into the text field on the left side.

The relational value given in the input field on the left side and the two proportions given in the two input fields on the right side are automatically synchronized if you leave one of the input fields. You may also press the Sync values button to synchronize manually.

Press the Calculate button to preview the effect size g resulting from your input values. Press the Transfer to main window button to (1) calculate the effect size g = π − π0 = P2 − P1 and (2) change, in the main window, the Constant proportion field to P1 and the Effect size g field to g as calculated.

4.2 Options

The binomial distribution is discrete. It is thus not normally possible to arrive exactly at the nominal α-level. For two-sided tests this leads to the problem of how to "distribute" α to the two sides. G * Power offers the three options listed here, the first option being selected by default:

1. Assign α/2 to both sides: Both sides are handled independently in exactly the same way as in a one-sided test. The only difference is that α/2 is used instead of α. Of the three options offered by G * Power, this one leads to the greatest deviation from the actual α (in post hoc analyses).

2. Assign to minor tail α/2, then rest to major tail (α2 = α/2, α1 = α − α2): First α/2 is applied to the side of the central distribution that is farther away from the noncentral distribution (minor tail). The criterion used for the other side is then α − α1, where α1 is the actual α found on the minor side. Since α1 ≤ α/2 one can conclude that (in post hoc analyses) the sum of the actual values α1 + α2 is in general closer to the nominal α-level than it would be if α/2 were assigned to both sides (see Option 1).

3. Assign α/2 to both sides, then increase to minimize the difference of α1 + α2 to α: The first step is exactly the same as in Option 1. Then, in the second step, the critical values on both sides of the distribution are increased (using the lower of the two potential incremental α-values) until the sum of both actual α values is as close as possible to the nominal α.

Press the Options button in the main window to select one of these options.

4.3 Examples

We assume a constant proportion π0 = 0.65 in the population and an effect size g = 0.15, i.e. π = 0.65 + 0.15 = 0.8. We want to know the power of a one-sided test given α = .05 and a total sample size of N = 20.

• Select
  Type of power analysis: Post hoc
• Options
  Alpha balancing in two-sided tests: Assign α/2 on both sides
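The post hoc computation for the example above (π0 = 0.65, g = 0.15 so π = 0.8, one-sided, α = .05, N = 20) can be checked with plain binomial arithmetic. The sketch below uses scipy and is not G * Power's internal code: it finds the smallest upper-tail critical count whose level does not exceed α under π0, then evaluates the same tail under π.

```python
# Post hoc power of a one-sided (upper-tailed) exact binomial test.
from scipy.stats import binom

N, p0, p1, alpha = 20, 0.65, 0.80, 0.05

# Smallest critical count c with P(X >= c | p0) <= alpha; sf(c - 1) = P(X >= c).
c = next(k for k in range(N + 1) if binom.sf(k - 1, N, p0) <= alpha)
actual_alpha = binom.sf(c - 1, N, p0)   # achievable level (below nominal alpha)
power = binom.sf(c - 1, N, p1)          # same tail evaluated under H1

print(c, round(actual_alpha, 4), round(power, 4))   # 17 0.0444 0.4114
```

The actual α (0.0444) falls below the nominal .05, illustrating the discreteness issue discussed in the Options section above.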
Figure 6: Distribution plot for the example (see text)
Figure 7: Plot of power vs. sample size in the binomial test (see text)
5 Exact: Proportion - inequality, two dependent groups (McNemar)

This procedure relates to tests of paired binary responses. Such data can be represented in a 2 × 2 table:

                   Standard
  Treatment      Yes       No
  Yes            π11       π12       πt
  No             π21       π22       1 − πt
                 πs        1 − πs    1

where πij denotes the probability of the respective response. The probability πD of discordant pairs, that is, the probability of yes/no-response pairs, is given by πD = π12 + π21. The hypothesis of interest is that πs = πt, which is formally identical to the statement π12 = π21.

Using this fact, the null hypothesis states (in a ratio notation) that π12 is identical to π21, and the alternative hypothesis states that π12 and π21 are different:

H0: π12/π21 = 1
H1: π12/π21 ≠ 1.

5.2.1 Alpha balancing in two-sided tests

The binomial distribution is discrete. It is therefore not normally possible to arrive at the exact nominal α-level. For two-sided tests this leads to the problem of how to "distribute" α to the two sides. G * Power offers the three options listed here, the first option being selected by default:

1. Assign α/2 to both sides: Both sides are handled independently in exactly the same way as in a one-sided test. The only difference is that α/2 is used instead of α. Of the three options offered by G * Power, this one leads to the greatest deviation from the actual α (in post hoc analyses).

2. Assign to minor tail α/2, then rest to major tail (α2 = α/2, α1 = α − α2): First α/2 is applied to the side of the central distribution that is farther away from the noncentral distribution (minor tail). The criterion used for the other side is then α − α1, where α1 is the actual α found on the minor side. Since α1 ≤ α/2 one can conclude that (in post hoc analyses) the sum of the actual values α1 + α2 is in general closer to the nominal α-level than it would be if α/2 were assigned to both sides (see Option 1).

3. Assign α/2 to both sides, then increase to minimize the difference of α1 + α2 to α: The first step is exactly the same as in Option 1. Then, in the second step, the critical values on both sides of the distribution are increased (using the lower of the two potential incremental α-values) until the sum of both actual α values is as close as possible to the nominal α.

5.2.2 Computation

You may choose between an exact procedure and a faster approximation (see implementation notes for details):

In this table the proportion of discordant pairs is πD = .32 + .08 = 0.4 and the Odds Ratio OR = π12/π21 = 0.08/.32 = 0.25. We want to compute the exact power for a one-sided test. The sample size N, that is, the number of pairs, is 50 and α = 0.05.

• Select
  Type of power analysis: Post hoc
• Options
  Computation: Exact
• Input
  Tail(s): One
  Odds ratio: 0.25
  α err prob: 0.05
  Total sample size: 50
  Prop discordant pairs: 0.4
• Output
  Power (1-β err prob): 0.839343
  Actual α: 0.032578
  Proportion π12: 0.08
  Proportion π21: 0.32

The power calculated by G * Power (0.839343) corresponds within the given precision to the result computed by O'Brien (0.839). Now we use the Power Plot window to calculate the power for several other sample sizes and to generate a graph that gives us an overview of a section of the parameter space. The Power Plot window can be opened by pressing the X-Y plot for a range of values button in the lower part of the main window.

In the Power Plot window we choose to plot the power on the Y-Axis (with markers and displaying the values in the plot) as a function of total sample size. The sample sizes shall range from 50 in steps of 25 through to 150. We choose to draw a single plot. We specify α = 0.05 and Odds ratio = 0.25.

The results shown in figure 8 replicate exactly the values in the table in O'Brien (2002, p. 163).

To replicate the values for the two-sided case, we must decide how the α error should be distributed to the two sides. The method chosen by O'Brien corresponds to Option 2 in G * Power ("Assign to minor tail α/2, then rest to major tail", see above). In the main window, we select Tail(s) "Two" and set the other input parameters exactly as shown in the example above. For sample sizes 50, 75, 100, 125, 150 we get power values 0.798241, 0.930639, 0.980441, 0.994839, and 0.998658, respectively, which are again equal to the values given in O'Brien's table.

5.6 Validation

The results of the exact procedure were checked against the values given on pages 161-163 in O'Brien (2002). Complete correspondence was found in the one-tailed case and also in the two-tailed case when the alpha balancing Option 2 ("Assign to minor tail α/2, then rest to major tail", see above) was chosen in G * Power.

We also compared the exact results of G * Power generated for a large range of parameters to the results produced by PASS (Hintze, 2006) for the same scenarios. We found complete correspondence in one-sided tests. In two-sided tests PASS uses an alpha balancing strategy corresponding to Option 1 in G * Power ("Assign α/2 on both sides", see above). With two-sided tests we found small deviations between G * Power and PASS (about ±1 in the third decimal place), especially for small sample sizes. These deviations were always much smaller than those resulting from a change of the balancing strategy. All comparisons with PASS were restricted to N < 2000, since for larger N the exact routine in PASS sometimes produced nonsensical values (this restriction is noted in the PASS manual).
Figure 8: Result of the sample McNemar test (see text for details).
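A common construction for the exact McNemar power (and one plausible reading of the exact procedure above) conditions the binomial sign test on the number of discordant pairs nD and averages over its binomial distribution. The sketch below follows that construction; it is our own reading, not necessarily G * Power's implementation, but with the example inputs (N = 50, πD = 0.4, OR = 0.25, one-sided α = 0.05) it lands in the neighborhood of the 0.839 power reported above.

```python
# One-sided exact McNemar power, conditioning on the number of discordant pairs.
from scipy.stats import binom

def mcnemar_power(n_pairs, pi_d, odds_ratio, alpha=0.05):
    # Under H1 a discordant pair falls in cell (1,2) with probability
    # p1 = OR/(1 + OR); under H0 this probability is 0.5.
    p1 = odds_ratio / (1 + odds_ratio)
    total = 0.0
    for nd in range(n_pairs + 1):
        w = binom.pmf(nd, n_pairs, pi_d)      # P(nd discordant pairs)
        # Largest lower-tail critical value c with P(X <= c | p = 0.5) <= alpha.
        c = -1
        while c + 1 <= nd and binom.cdf(c + 1, nd, 0.5) <= alpha:
            c += 1
        if c >= 0:
            total += w * binom.cdf(c, nd, p1)  # conditional power, weighted
    return total

power = mcnemar_power(50, 0.4, 0.25, alpha=0.05)
print(round(power, 3))
```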
6 Exact: Proportions - inequality of two independent groups (Fisher's exact test)

6.1 Introduction

This procedure calculates power and sample size for tests comparing two independent binomial populations with probabilities π1 and π2, respectively. The results of sampling from these two populations can be given in a 2 × 2 contingency table X:

            Group 1    Group 2    Total
  Success   x1         x2         m
  Failure   n1 − x1    n2 − x2    N − m
  Total     n1         n2         N

Here, n1, n2 are the sample sizes, and x1, x2 the observed number of successes in the two populations. N = n1 + n2 is the total sample size, and m = x1 + x2 the total number of successes.

The null hypothesis states that π1 = π2, whereas the alternative hypothesis assumes different probabilities in both populations:

H0: π1 − π2 = 0
H1: π1 − π2 ≠ 0.

6.2 Effect size index

The effect size is determined by directly specifying the two proportions π1 and π2.

Conditional on the total number of successes m, the rejection probability is

Pr(T ≥ tα | m, H1) = [ Σ_{X ∈ M_tα} B12 ] / P(m)

where

P(m) = Pr(x1 + x2 = m | H1) = Σ_{X ∈ M} B12, and

B12 = C(n1, x1) π1^x1 (1 − π1)^(n1 − x1) · C(n2, x2) π2^x2 (1 − π2)^(n2 − x2).

Here M denotes the set of all tables X with x1 + x2 = m, and M_tα the subset of tables falling in the rejection region.

For two-sided tests G * Power provides three common test statistics that are asymptotically equivalent:

1. Fisher's exact test:

   T = −ln[ C(n1, x1) C(n2, x2) / C(N, m) ]

2. Pearson's exact test:

   T = Σ_{j=1}^{2} [ (xj − m·nj/N)² / (m·nj/N) + ((nj − xj) − (N − m)·nj/N)² / ((N − m)·nj/N) ]

3. Likelihood ratio exact test:

   T = 2 Σ_{j=1}^{2} [ xj ln( xj / (m·nj/N) ) + (nj − xj) ln( (nj − xj) / ((N − m)·nj/N) ) ]

The choice of the test statistic only influences the way in which α is distributed on both sides of the null distribution. For one-sided tests the test statistic is T = x2.
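The conditional machinery behind these statistics can be checked with scipy: given the margin m, the success count in one group follows a hypergeometric distribution under H0, Fisher's T is −ln of that pmf, and the one-sided conditional p-value is a hypergeometric tail, which is exactly what scipy's fisher_exact reports. The cell counts below are hypothetical, chosen only for illustration.

```python
# Identity checks for the conditional (hypergeometric) view of Fisher's test.
import math
from scipy.stats import hypergeom, fisher_exact

n1, n2, x1, x2 = 10, 12, 7, 3          # hypothetical group sizes and successes
N, m = n1 + n2, x1 + x2

# Fisher's statistic T equals -ln of the conditional pmf of x1 given m:
T = -math.log(math.comb(n1, x1) * math.comb(n2, x2) / math.comb(N, m))
pmf = hypergeom.pmf(x1, N, m, n1)      # P(X1 = x1 | margins fixed)
print(round(T, 6) == round(-math.log(pmf), 6))   # True

# The one-sided conditional p-value is a hypergeometric tail:
table = [[x1, x2], [n1 - x1, n2 - x2]]
p_one_sided = fisher_exact(table, alternative='greater')[1]
print(round(p_one_sided, 6) == round(hypergeom.sf(x1 - 1, N, m, n1), 6))   # True
```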
6.6.2 Large sample approximation
7 Exact test: Multiple Regression - random model

In multiple regression analyses, the relation of a dependent variable Y to m independent factors X = (X1, ..., Xm) is studied. The present procedure refers to the so-called unconditional or random factors model of multiple regression (Gatsonis & Sampson, 1989; Sampson, 1974), that is, it is assumed that Y and X1, ..., Xm are random variables, where (Y, X1, ..., Xm) have a joint multivariate normal distribution with a positive definite covariance matrix:

  ( σ²_Y    Σ'_YX )
  ( Σ_YX    Σ_X   )

The hypotheses are

H0: ρ²_YX = ρ²_0
H1: ρ²_YX ≠ ρ²_0.

An important special case is ρ0 = 0 (corresponding to the assumption Σ_YX = 0). A commonly used test statistic for this case is F = [(N − m − 1)/m] R²_YX / (1 − R²_YX), which has a central F distribution with df1 = m and df2 = N − m − 1. This is the same test statistic that is used in the fixed model. The power differs, however, in both cases.

7.1 Effect size index

The effect size is the population squared correlation coefficient H1 ρ² under the alternative hypothesis. To fully specify the effect size, you also need to give the population squared correlation coefficient H0 ρ² under the null hypothesis.

Pressing the button Determine on the left of the effect size label in the main window opens the effect size drawer (see Fig. 9) that may be used to calculate ρ² either from the confidence interval for the population ρ²_YX given an observed squared multiple correlation R²_YX, or from predictor correlations.

Effect size from C.I. Figure 9 shows an example of how the H1 ρ² can be determined from the confidence interval computed for an observed R². You have to input the sample size, the number of predictors, the observed R², and the confidence level of the confidence interval. In the remaining input field a relative position inside the confidence interval can be given that determines the H1 ρ² value. The value can range from 0 to 1, where 0, 0.5 and 1 correspond to the left, central and right position inside the interval, respectively. The output fields C.I. lower ρ² and C.I. upper ρ² contain the left and right border of the two-sided 100(1 − α) percent confidence interval for ρ². The output fields Statistical lower bound and Statistical upper bound show the one-sided (0, R) and (L, 1) intervals, respectively.

Effect size from predictor correlations By choosing the option "From predictor correlation matrix" (see Fig. 9) one may compute ρ² from the matrix of correlations among the predictor variables and the correlations between predictors and the dependent variable Y. Pressing the "Insert/edit matrix" button opens a window in which one can specify (1) the row vector u containing the correlations between each of the m predictors Xi and the dependent variable Y, and (2) the m × m matrix B of correlations among the predictors. The squared multiple correlation coefficient is then given by ρ² = uB⁻¹u'. Each input correlation must lie in the interval [−1, 1], the matrix B must be positive-definite, and the resulting ρ² must lie in the interval [0, 1]. Pressing the button "Calc ρ²" tries to calculate ρ² from the input and checks the positive-definiteness of matrix B.

Relation of ρ² to effect size f² The relation between ρ² and the effect size f² used in the fixed factors model is:

  f² = ρ² / (1 − ρ²)
Figure 10: Input of correlations between predictors and Y (top) and the matrix of correlations among the predictors (bottom).
• Output
  Lower critical R²: 0.115170
  Upper critical R²: 0.115170
  Power (1-β): 0.662627

The output shows that the power of this test is about 0.663, which is slightly lower than the power 0.674 found in the fixed model. This observation holds in general: the power in the random model is never larger than that found for the same scenario in the fixed model.

Example 2 We now replicate the test of the hypotheses H0: ρ² ≤ 0.3 versus H1: ρ² > 0.3 given in Shieh and Kung (2007, p. 733), for N = 100, α = 0.05, and m = 5 predictors. We assume that H1 ρ² = 0.4. The settings and output in this case are:

• Select
  Type of power analysis: Post hoc
• Input
  Tail(s): One
  H1 ρ²: 0.4
  α err prob: 0.05
  Total sample size: 100
  Number of predictors: 5
  H0 ρ²: 0.3
• Output
  Lower critical R²: 0.456625
  Upper critical R²: 0.456625
  Power (1-β): 0.346482

The results show that H0 should be rejected if the observed R² is larger than 0.457. The power of the test is about 0.346. Assume we observed R² = 0.5. To calculate the associated p-value we may use the G * Power calculator. The syntax of the CDF of the squared sample multiple correlation coefficient is mr2cdf(R2,ρ2,m+1,N). Thus for the present case we insert 1-mr2cdf(0.5,0.3,6,100) in the calculator, and pressing Calculate gives 0.01278. These values replicate those given in Shieh and Kung (2007).

Example 3 We now ask for the minimum sample size required for testing the hypothesis H0: ρ² ≥ 0.2 vs. the specific alternative hypothesis H1: ρ² = 0.05 with 5 predictors to achieve power = 0.9 and α = 0.05 (Example 2 in Shieh and Kung (2007)). The inputs and outputs are:

• Select
  Type of power analysis: A priori
• Input
  Tail(s): One
  H1 ρ²: 0.05
  α err prob: 0.05
  Power (1-β): 0.9
  Number of predictors: 5
  H0 ρ²: 0.2
• Output
  Lower critical R²: 0.132309
  Upper critical R²: 0.132309
  Total sample size: 153
  Actual power: 0.901051

The results show that N should not be less than 153. This confirms the results in Shieh and Kung (2007).

7.3.2 Using confidence intervals to determine the effect size

Suppose that in a regression analysis with 5 predictors and N = 50 we observed a squared multiple correlation coefficient R² = 0.3 and we want to use the lower boundary of the 95% confidence interval for ρ² as H1 ρ². Pressing the Determine button next to the effect size field in the main window opens the effect size drawer. After selecting input mode "From confidence interval" we insert the above values (50, 5, 0.3, 0.95) in the corresponding input fields and set Rel C.I. pos to use (0=left, 1=right) to 0 to select the left interval border. Pressing Calculate computes the lower, upper, and 95% two-sided confidence intervals: [0, 0.4245], [0.0589, 1] and [0.0337, 0.4606]. The left boundary of the two-sided interval (0.0337) is transferred to the field H1 ρ².

7.3.3 Using predictor correlations to determine effect size

We may use assumptions about the (m × m) correlation matrix between a set of m predictors, and the m correlations between predictor variables and the dependent variable Y, to determine ρ². Pressing the Determine button next to the effect size field in the main window opens the effect size drawer. After selecting input mode "From predictor correlations" we insert the number of predictors in the corresponding field and press "Insert/edit matrix". This opens an input dialog (see Fig. 10). Suppose that we have 4 predictors and that the 4 correlations between Xi and Y are u = (0.3, 0.1, −0.2, 0.2). We insert these values in the tab "Corr between predictors and outcome". Assume further that the correlations between X1 and X3 and between X2 and X4 are 0.5 and 0.2, respectively, whereas all other predictor pairs are uncorrelated. We insert the correlation matrix

      ( 1     0     0.5   0   )
  B = ( 0     1     0     0.2 )
      ( 0.5   0     1     0   )
      ( 0     0.2   0     1   )

under the "Corr between predictors" tab. Pressing the "Calc ρ²" button computes ρ² = uB⁻¹u' = 0.297083, which also confirms that B is positive-definite and thus a correct correlation matrix.
number of predictors m, and the sample size N. The only
difference between the H0 and H1 distribution is that the
population multiple correlation coefficient is set to "H0 r2 "
in the former and to "H1 r2 " in the latter case.
Several algorithms for the computation of the exact or approximate CDF of the distribution have been proposed (Benton & Krishnamoorthy, 2003; Ding, 1996; Ding & Bargmann, 1991; Lee, 1971, 1972). Benton and Krishnamoorthy (2003) have shown that the implementation proposed by Ding and Bargmann (1991) (which is used in Dunlap, Xin, and Myers (2004)) may produce grossly false results in some cases. The implementation of Ding (1996) has the disadvantage that it overflows for large sample sizes, because factorials occurring in ratios are explicitly evaluated. This can easily be avoided by using the log of the gamma function in the computation instead.

In G * Power we use the procedure of Benton and Krishnamoorthy (2003) to compute the exact CDF and a modified version of the procedure given in Ding (1996) to compute the exact PDF of the distribution. Optionally, one can choose to use the 3-moment noncentral F approximation proposed by Lee (1971) to compute the CDF. The latter procedure has also been used by Steiger and Fouladi (1992) in their R2 program, which provides similar functionality.
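The log-gamma remedy mentioned above fits in one line: evaluate ratios of factorials on the log scale with lgamma instead of forming the factorials themselves. The helper name below is ours.

```python
# Avoiding factorial overflow by working on the log scale.
import math

def log_comb(n, k):
    """log C(n, k) via lgamma -- stable even when n! would overflow a float."""
    return math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)

# A small case agrees with the direct computation ...
print(round(math.exp(log_comb(10, 3))))   # 120
# ... and large cases work where explicitly evaluated factorials would overflow:
print(log_comb(100000, 500) > 0)          # True
```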
7.6 Validation
The power and sample size results were checked against
the values produced by R2 (Steiger & Fouladi, 1992), the
tables in Gatsonis and Sampson (1989), and results reported
in Dunlap et al. (2004) and Shieh and Kung (2007). Slight
deviations from the values computed with R2 were found,
which are due to the approximation used in R2, whereas
complete correspondence was found in all other tests made.
The confidence intervals were checked against values com-
puted in R2, the results reported in Shieh and Kung (2007),
and the tables given in Mendoza and Stafford (2001).
7.7 References
See Chapter 9 in Cohen (1988) for a description of the fixed
model. The random model is described in Gatsonis and
Sampson (1989) and Sampson (1974).
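Returning to the predictor-correlation example of section 7.3.3: the computation is easy to verify with numpy. Note that reproducing ρ² = 0.297083 requires the negative third entry in u, u = (0.3, 0.1, −0.2, 0.2); with the stated B, the quadratic form uB⁻¹u' then matches the reported value. This is a verification sketch, not G * Power code.

```python
# rho^2 = u B^{-1} u' for the 4-predictor example, plus the fixed-model f^2.
import numpy as np

u = np.array([0.3, 0.1, -0.2, 0.2])
B = np.array([[1.0, 0.0, 0.5, 0.0],
              [0.0, 1.0, 0.0, 0.2],
              [0.5, 0.0, 1.0, 0.0],
              [0.0, 0.2, 0.0, 1.0]])

# B must be positive definite to be a valid correlation matrix; Cholesky
# succeeds exactly in that case (it raises LinAlgError otherwise).
np.linalg.cholesky(B)

rho2 = u @ np.linalg.solve(B, u)    # u B^{-1} u'
f2 = rho2 / (1 - rho2)              # relation to the fixed-model effect size
print(round(rho2, 6))               # 0.297083, as reported in the text
```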
8 Exact: Proportion - sign test

The sign test is equivalent to a test that the probability π of an event in the population has the value π0 = 0.5. It is identical to the special case π0 = 0.5 of the test Exact: Proportion - difference from constant (one sample case). For a more thorough description see the comments for that test.
8.2 Options
See comments for Exact: Proportion - difference
from constant (one sample case) in chapter 4 (page 11).
8.3 Examples
8.4 Related tests
Similar tests in G * Power 3.0:
• Exact: Proportion - difference from constant (one sam-
ple case).
8.6 Validation
The results were checked against the tabulated val-
ues in Cohen (1969, chap. 5). For more information
see comments for Exact: Proportion - difference from
constant (one sample case) in chapter 4 (page 11).
9 Exact: Generic binomial test
9.1 Effect size index
9.2 Options
Since the binomial distribution is discrete, it is normally not possible to achieve exactly the nominal α-level. For two-sided tests this leads to the problem of how to "distribute" α on the two sides. G * Power offers three options (case 1 is the default):
9.3 Examples
9.4 Related tests
9.5 Implementation notes
9.6 Validation
The results were checked against the values produced by
GPower 2.0.
9.7 References
Cohen...
10 F test: Fixed effects ANOVA - one
way
The fixed effects one-way ANOVA tests whether there are any differences between the means µi of k ≥ 2 normally distributed random variables with equal variance σ². The random variables represent measurements of a variable X in k fixed populations. The one-way ANOVA can be viewed as an extension of the two-group t test for a difference of means to more than two groups.
The null hypothesis is that all k means are identical: H0: µ1 = µ2 = . . . = µk. The alternative hypothesis states that at least two of the k means differ: H1: µi ≠ µj for at least one pair i, j with 1 ≤ i, j ≤ k.
10.1 Effect size index

The effect size f is defined as f = σm/σ, where σm is the standard deviation of the group means,

σm = √( Σi=1..k wi (µi − µ̄)² ),

µ̄ is the grand mean, σ is the common standard deviation within the groups, and wi = ni/(n1 + n2 + · · · + nk) stands for the relative size of group i.

Figure 11: Effect size dialogs to calculate f

Pressing the Determine button to the left of the effect size label opens the effect size drawer. You can use this drawer to calculate the effect size f from variances, from η², or from the group means and group sizes. The drawer essentially contains two different dialogs; you can use the Select procedure selection field to choose one of them.

10.1.1 Effect size from means

In this dialog (see left side of Fig. 11) you normally start by setting the number of groups. G * Power then provides you with a mean and group size table of appropriate size. Insert the standard deviation σ common to all groups in the SD σ within each group field. Then you need to specify the mean µi and size ni for each group. If all group sizes are equal then you may insert the common group size in the input field to the right of the Equal n button. Clicking on this button fills the size column of the table with the chosen value.
Clicking on the Calculate button provides a preview of the effect size that results from your inputs. If you click on the Calculate and transfer to main window button then G * Power calculates the effect size and transfers the result into the effect size field in the main window. If the number of groups or the total sample size given in the effect size drawer differ from the corresponding values in the main window, you will be asked whether you want to adjust the values in the main window to the ones in the effect size drawer.
10.1.2 Effect size from variance

This dialog offers two ways to specify f. If you choose From Variances then you need to insert the variance of the group means, that is σm², into the Variance explained by special effect field, and the square of the common standard deviation within each group, that is σ², into the Variance within groups field. Alternatively, you may choose the option Direct and then specify the effect size f via η².

10.2 Options

This test has no options.

10.3 Examples

We compare 10 groups, and we have reason to expect a "medium" effect size (f = .25). How many subjects do we need in a test with α = 0.05 to achieve a power of 0.95?

• Select
Type of power analysis: A priori
• Input
Effect size f: 0.25
α err prob: 0.05
Power (1−β err prob): 0.95
Number of groups: 10
• Output
Noncentrality parameter λ: 24.375000
Critical F: 1.904538
Numerator df: 9
Denominator df: 380
Total sample size: 390
Actual power: 0.952363

• Select
Type of power analysis: Compromise
• Input
Effect size f: 0.25
β/α ratio: 1
Total sample size: 200
Number of groups: 10
• Output
Noncentrality parameter λ: 12.500000
Critical F: 1.476210
Numerator df: 9
Denominator df: 190
α err prob: 0.159194
β err prob: 0.159194
Power (1−β err prob): 0.840806

10.4 Related tests

• ANOVA: Fixed effects, special, main effects and interactions
• ANOVA: Repeated measures, between factors

10.5 Implementation notes

The distribution under H0 is the central F(k − 1, N − k) distribution with numerator df1 = k − 1 and denominator df2 = N − k. The distribution under H1 is the noncentral F(k − 1, N − k, λ) distribution with the same dfs and noncentrality parameter λ = f²N. (k is the number of groups, N is the total sample size.)

10.6 Validation

The results were checked against the values produced by GPower 2.0.
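The numerical results of the two examples can be reproduced with any implementation of the noncentral F distribution. The following sketch is not part of G * Power; it assumes Python with scipy and uses the values shown above:

```python
# Check of the a priori example: k = 10 groups, f = 0.25, N = 390.
from scipy.stats import f, ncf
from scipy.optimize import brentq

k, N, eff = 10, 390, 0.25
df1, df2 = k - 1, N - k                  # 9 and 380
lam = eff**2 * N                         # lambda = f^2 * N = 24.375
crit = f.ppf(0.95, df1, df2)             # critical F for alpha = 0.05
power = ncf.sf(crit, df1, df2, lam)      # ~0.9524, as in the example

# Compromise analysis for N = 200: find the critical F at which
# alpha(crit) = beta(crit), i.e. the beta/alpha ratio is 1.
N2 = 200
d2 = N2 - k
lam2 = eff**2 * N2
crit2 = brentq(lambda c: f.sf(c, df1, d2) - ncf.cdf(c, df1, d2, lam2), 1.0, 5.0)
alpha2 = f.sf(crit2, df1, d2)            # ~0.1592, as in the example
```

The compromise analysis is just a one-dimensional root search: α falls and β rises as the critical value increases, so the β/α = 1 criterion has a unique solution.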
11 F test: Fixed effects ANOVA - special, main effects and interactions

This procedure may be used to calculate the power of main effects and interactions in fixed effects ANOVAs with factorial designs. It can also be used to compute power for planned comparisons. We will discuss both applications in turn.

11.0.1 Main effects and interactions

To illustrate the concepts underlying tests of main effects and interactions we will consider the specific example of an A × B × C factorial design, with i = 3 levels of A, j = 3 levels of B, and k = 4 levels of C. This design has a total number of 3 × 3 × 4 = 36 groups. A general assumption is that all groups have the same size and that in each group the dependent variable is normally distributed with identical variance.
In a three-factor design we may test three main effects of the factors A, B, C, three two-factor interactions A × B, A × C, B × C, and one three-factor interaction A × B × C. We write µijk for the mean of group A = i, B = j, C = k. To indicate the mean of means across a dimension we write a star (⋆) in the corresponding index. Thus, in the example µij⋆ is the mean of the groups A = i, B = j, C = 1, 2, 3, 4. To simplify the discussion we assume that the grand mean µ⋆⋆⋆ over all groups is zero. This can always be achieved by subtracting a given non-zero grand mean from each group mean.
In testing the main effects, the null hypothesis is that all means of the corresponding factor are identical. For the main effect of factor A the hypotheses are, for instance:

H0: µ1⋆⋆ = µ2⋆⋆ = µ3⋆⋆
H1: µi⋆⋆ ≠ µj⋆⋆ for at least one index pair i, j.

The assumption that the grand mean is zero implies that Σi µi⋆⋆ = Σj µ⋆j⋆ = Σk µ⋆⋆k = 0. The above hypotheses are therefore equivalent to

H0: µi⋆⋆ = 0 for all i
H1: µi⋆⋆ ≠ 0 for at least one i.

In testing two-factor interactions, the residuals δij⋆, δi⋆k, and δ⋆jk of the group means after subtraction of the main effects are considered. For the A × B interaction of the example, the 3 × 3 = 9 relevant residuals are δij⋆ = µij⋆ − µi⋆⋆ − µ⋆j⋆. The null hypothesis of no interaction effect states that all residuals are identical. The hypotheses for the A × B interaction are, for example:

H0: δij⋆ = δkl⋆ for all index pairs i, j and k, l.
H1: δij⋆ ≠ δkl⋆ for at least one combination of i, j and k, l.

The assumption that the grand mean is zero implies that Σi,j δij⋆ = Σi,k δi⋆k = Σj,k δ⋆jk = 0. The above hypotheses are therefore equivalent to

H0: δij⋆ = 0 for all i, j
H1: δij⋆ ≠ 0 for at least one i, j.

In testing the three-factor interaction, the residuals δijk of the group means after subtraction of all main effects and all two-factor interactions are considered. In a three-factor design there is only one possible three-factor interaction. The 3 × 3 × 4 = 36 residuals in the example are calculated as δijk = µijk − µi⋆⋆ − µ⋆j⋆ − µ⋆⋆k − δij⋆ − δi⋆k − δ⋆jk. The null hypothesis of no interaction states that all residuals are equal. Thus,

H0: δijk = δlmn for all combinations of index triples i, j, k and l, m, n.
H1: δijk ≠ δlmn for at least one combination of index triples i, j, k and l, m, n.

The assumption that the grand mean is zero implies that Σi,j,k δijk = 0. The above hypotheses are therefore equivalent to

H0: δijk = 0 for all i, j, k
H1: δijk ≠ 0 for at least one i, j, k.

It should be obvious how the reasoning outlined above can be generalized to designs with 4 and more factors.

11.0.2 Planned comparisons

Planned comparisons are specific tests between levels of a factor planned before the experiment was conducted.
One application is the comparison between two sets of levels. The general idea is to subtract the means across the two sets of levels that should be compared from each other and to test whether the difference is zero. Formally this is done by calculating the sum of the componentwise product of the mean vector µ and a nonzero contrast vector c (i.e. the scalar product of µ and c): C = Σi=1..k ci µi. The contrast vector c contains negative weights for levels on one side of the comparison, positive weights for the levels on the other side of the comparison, and zero for levels that are not part of the comparison. The sum of the weights is always zero. Assume, for instance, that we have a factor with 4 levels and mean vector µ = (2, 3, 1, 2) and that we want to test whether the means in the first two levels are identical to the means in the last two levels. In this case we define c = (−1/2, −1/2, 1/2, 1/2) and get C = Σi µi ci = −1 − 3/2 + 1/2 + 1 = −1.
A second application is testing polynomial contrasts in a trend analysis. In this case it is normally assumed that the factor represents a quantitative variable and that the levels of the factor that correspond to specific values of this quantitative variable are equally spaced (for more details see, e.g., Hays, 1988, p. 706ff.). In a factor with k levels, k − 1 orthogonal polynomial trends can be tested.
In planned comparisons the null hypothesis is H0: C = 0, and the alternative hypothesis H1: C ≠ 0.

11.1 Effect size index

The effect size f is defined as f = σm/σ. In this equation σm is the standard deviation of the effects that we want to test and σ the common standard deviation within each of the groups in the design. The total variance is then σt² = σm² + σ². A different but equivalent way to specify the effect size
is in terms of η², which is defined as η² = σm²/σt². That is, η² is the ratio between the between-groups variance σm² and the total variance σt² and can be interpreted as "proportion of variance explained by the effect under consideration". The relationship between η² and f is: η² = f²/(1 + f²) or, solved for f: f = √(η²/(1 − η²)).
Cohen (1969, p. 348) defines the following effect size conventions:

• small f = 0.10
• medium f = 0.25
• large f = 0.40

Figure 12: Effect size dialog to calculate f

Clicking on the Determine button to the left of the effect size label opens the effect size drawer (see Fig. 12). You can use this drawer to calculate the effect size f from variances or from η². If you choose From Variances then you need to insert the variance explained by the effect under consideration, that is σm², into the Variance explained by special effect field, and the square of the common standard deviation within each group, that is σ², into the Variance within groups field. Alternatively, you may choose the option Direct and then specify the effect size f via η².
See the examples section below for information on how to calculate the effect size f in tests of main effects and interactions and tests of planned comparisons.

11.2 Options

This test has no options.

11.3 Examples

11.3.1 Effect sizes from means and standard deviations

To illustrate the test of main effects and interactions we assume the specific values for our A × B × C example shown in Table 13. Table 14 shows the results of a SPSS analysis (GLM univariate) done for these data. In the following we will show how to reproduce the values in the Observed Power column of the SPSS output with G * Power.
As a first step we calculate the grand mean of the data. Since all groups have the same size (n = 3), this is just the arithmetic mean of all 36 group means: mg = 3.1382. We then subtract this grand mean from all cells (this step is not essential but makes the calculation and discussion easier). Next, we estimate the common variance σ² within each group by calculating the mean variance of all cells, that is, σ² = (1/36) Σi σi² = 1.71296.

Main effects To calculate the power for the A, B, and C main effects we need to know the effect size f = σm/σ. We already know σ² to be 1.71296 but need to calculate the variance of the means σm² for each factor. The procedure is analogous for all three main effects. We therefore demonstrate only the calculations necessary for the main effect of factor A.
We first calculate the three means for factor A: µi⋆⋆ = {−0.722231, 1.30556, −0.583331}. Due to the fact that we have first subtracted the grand mean from each cell, we have Σi µi⋆⋆ = 0, and we can easily compute the variance of these means as the mean square: σm² = (1/3) Σi µi⋆⋆² = 0.85546. With these values we calculate f = σm/σ = √(0.85546/1.71296) = 0.7066856.
The effect size drawer in G * Power can be used to do the last calculation: We choose From Variances and insert 0.85546 in the Variance explained by special effect field and 1.71296 in the Error Variance field. Pressing the Calculate button gives the above value for f and a partial η² of 0.3330686. Note that the partial η² given by G * Power is calculated from f according to the formula η² = f²/(1 + f²) and is not identical to the SPSS partial η², which is based on sample estimates. The relation between the two is "SPSS η0²" = η²N/(N + k(η² − 1)), where N denotes the total sample size, k the total number of groups in the design, and η² the G * Power value. Thus η0² = 0.33306806 · 108/(108 − 36 + 0.33306806 · 36) = 0.42828, which is the value given in the SPSS output.
We now use G * Power to calculate the power for α = 0.05 and a total sample size 3 × 3 × 4 × 3 = 108. We set

• Select
Type of power analysis: Post hoc
• Input
Effect size f: 0.7066856
α err prob: 0.05
Total sample size: 108
Numerator df: 2 (number of factor levels − 1; A has 3 levels)
Number of groups: 36 (total number of groups in the design)
• Output
Noncentrality parameter λ: 53.935690
Critical F: 3.123907
Denominator df: 72
Power (1−β err prob): 0.99999

The value of the noncentrality parameter and the power computed by G * Power are identical to the values in the SPSS output.

Two-factor interactions To calculate the power for the two-factor interactions A × B, A × C, and B × C we need to calculate the effect size f corresponding to the values given in Table 13. The procedure is analogous for each of the three
Figure 13: Hypothetical means (m) and standard deviations (s) of a 3 × 3 × 4 design.
Figure 14: Results computed with SPSS for the values given in Table 13
two-factor interactions and we thus restrict ourselves to the A × B interaction. The values needed to calculate σm² are the 3 × 3 = 9 residuals δij⋆ […] 0.249997, −0.249997}. The mean of these values is zero (as a consequence of subtracting the grand mean). Thus, the variance σm² is given by (1/36) Σi,j,k δijk² = 0.185189. This results in an effect size f = √(0.185189/1.71296) = 0.3288016 and a partial η² = 0.09756294. Using the formula given in the previous section on main effects it can be checked that this corresponds to a "SPSS η²" of 0.140, which is identical to that given in the SPSS output.
We use G * Power to calculate the power for α = 0.05 and a total sample size 3 × 3 × 4 × 3 = 108. We therefore choose

• Select
Type of power analysis: Post hoc
• Input
Effect size f: 0.3288016
α err prob: 0.05
Total sample size: 108
Numerator df: 12 ((#A−1)(#B−1)(#C−1) = (3−1)(3−1)(4−1))
Number of groups: 36 (total number of groups in the design)
• Output
Noncentrality parameter λ: 11.675933
Critical F: 1.889242
Denominator df: 72
Power (1−β err prob): 0.513442

(The notation #A in the comment above means the number of levels in factor A.) Again a check reveals that the value of the noncentrality parameter and the power computed by G * Power are identical to the values for (A * B * C) in the SPSS output.

11.3.2 Using conventional effect sizes

In the example given in the previous section, we assumed that we know the true values of the means and variances in all groups. We are, however, seldom in that position. Instead, we usually only have rough estimates of the expected effect sizes. In these cases we may resort to the conventional effect sizes proposed by Cohen.
Assume that we want to calculate the total sample size needed to achieve a power of 0.95 in testing the A × C two-factor interaction at a level of 0.05. Assume further that the total design in this scenario is A × B × C with 3 × 2 × 5 factor levels, that is, 30 groups. Theoretical considerations suggest that there should be a small interaction. We thus use the conventional value f = 0.1 defined by Cohen (1969) as a small effect. The inputs into and outputs of G * Power for this scenario are:

• Select
Type of power analysis: A priori
• Input
Effect size f: 0.1
α err prob: 0.05
Power (1−β err prob): 0.95
Numerator df: 8 ((#A−1)(#C−1) = (3−1)(5−1))
Number of groups: 30 (total number of groups in the design)
• Output
Noncentrality parameter λ: 22.830000
Critical F: 1.942507
Denominator df: 2253
Total sample size: 2283
Actual power: 0.950078

G * Power calculates a total sample size of 2283. Please note that this sample size is not a multiple of the group size 30 (2283/30 = 76.1)! If you want to ensure that you have equal group sizes, round this value up to a multiple of 30 by choosing a total sample size of 30 · 77 = 2310. A post hoc analysis with this sample size reveals that this increases the power to 0.952674.

11.3.3 Power for planned comparisons

To calculate the effect size f = σm/σ for a given comparison C = Σi=1..k ci µi we need to know, besides the standard deviation σ within each group, the standard deviation σm of the effect. It is given by:

σm = |C| / √( N Σi=1..k ci²/ni )

where N, ni denote the total sample size and the sample size in group i, respectively.
Given the mean vector µ = (1.5, 2, 3, 4), sample size ni = 5 in each group, and standard deviation σ = 2 within each group, we want to calculate the power for the following contrasts:

contrast      weights c            σm     f      η²
1,2 vs. 3,4   −1/2 −1/2 1/2 1/2    0.875  0.438  0.161
lin. trend    −3 −1 1 3            0.950  0.475  0.184
quad. trend   1 −1 −1 1            0.125  0.063  0.004

Each contrast has a numerator df = 1. The denominator dfs are N − k, where k is the number of levels (4 in the example).
To calculate the power of the linear trend at α = 0.05 we specify:

• Select
Type of power analysis: Post hoc
• Input
Effect size f: 0.475164
α err prob: 0.05
Total sample size: 20
Numerator df: 1
Number of groups: 4
• Output
Noncentrality parameter λ: 4.515617
Critical F: 4.493998
Denominator df: 16
Power (1−β err prob): 0.514736

Inserting the f's for the other two contrasts yields a power of 0.451898 for the comparison of "1,2 vs. 3,4", and a power of 0.057970 for the test of a quadratic trend.
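The planned-comparison computations can be checked numerically. This sketch is not part of G * Power; it assumes Python with numpy and scipy and uses the linear-trend contrast of the example:

```python
# Power of the linear-trend planned comparison (mu, sigma, n_i as in the example).
import numpy as np
from scipy.stats import f, ncf

mu = np.array([1.5, 2.0, 3.0, 4.0])     # group means
n_i = np.array([5, 5, 5, 5])            # group sizes
sigma = 2.0                             # common within-group SD
N = int(n_i.sum())

c = np.array([-3.0, -1.0, 1.0, 3.0])    # linear-trend contrast weights

C = float(c @ mu)                        # contrast value C = sum(c_i * mu_i)
sigma_m = abs(C) / np.sqrt(N * np.sum(c**2 / n_i))
eff = sigma_m / sigma                    # effect size f ~ 0.475164

lam = eff**2 * N                         # noncentrality ~ 4.5156
df1, df2 = 1, N - len(mu)                # 1 and 16
crit = f.ppf(0.95, df1, df2)             # ~4.4940
power = ncf.sf(crit, df1, df2, lam)      # ~0.5147
```

Replacing c with the other two contrast vectors reproduces the remaining rows of the table above.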
11.4 Related tests
• ANOVA: One-way
11.6 Validation
The results were checked against the values produced by
GPower 2.0.
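The effect size arithmetic of section 11.3.1, and the conversion between the G * Power and SPSS partial η² values, can be verified in a few lines. This sketch is not part of G * Power; it assumes Python and uses the factor A values from the example:

```python
# Main effect of factor A: from the (grand-mean-centered) factor means to f,
# partial eta^2, and the sample-based SPSS value.
import math

means_A = [-0.722231, 1.30556, -0.583331]   # factor A means, grand mean removed
var_within = 1.71296                         # mean within-cell variance

sm2 = sum(m * m for m in means_A) / len(means_A)   # sigma_m^2 ~ 0.85546
eff = math.sqrt(sm2 / var_within)                  # f ~ 0.7066856
eta2 = eff**2 / (1 + eff**2)                       # G*Power partial eta^2

N, k = 108, 36                                     # total N, number of cells
eta2_spss = eta2 * N / (N + k * (eta2 - 1))        # ~0.42828, the SPSS value
```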
12 t test: Linear Regression (size of slope, one group)

where σ denotes the standard deviation of the residuals Yi − (aX + b) and ρ the correlation coefficient between X and Y.
The effect size dialog may be used to determine Std dev s_y and/or Slope H1 from other values based on Eqns (1) and (2) given above.
Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 15).
The right panel in Fig. 15 shows the combinations of input and output values in the different input modes. The input variables stand on the left side of the arrow '=>', the output variables on the right side. The input values must conform to the usual restrictions, that is, σ > 0, σx > 0, σy > 0, −1 < ρ < 1. In addition, Eqn. (1) together with the restriction on ρ implies the additional restriction −1 < b · σx/σy < 1.
Clicking on the button Calculate and transfer to main window copies the values given in Slope H1, Std dev s_y and Std dev s_x to the corresponding input fields in the main window.

12.2 Options

This test has no options.

The present procedure is a special case of Multiple Regression, or better: a different interface to the same procedure using more convenient variables. To show this, we demonstrate how the MRC procedure can be used to compute the example above. First, we determine R² = ρ² from the relation b = ρ · σy/σx, which implies ρ² = (b · σx/σy)². Entering (-0.0667*7.5/4)^2 into the G * Power calculator gives ρ² = 0.01564062. We enter this value in the effect size dialog of the MRC procedure and compute an effect size f² = 0.0158891. Selecting a post hoc analysis and setting α err prob to 0.05, Total sample size to 100 and Number of predictors to 1, we get exactly the same power as given above.

12.4 Related tests

Similar tests in G * Power 3.0:

• Multiple Regression: Omnibus (R² deviation from zero).
• Correlation: Point biserial model
12.5 Implementation notes
The H0 distribution is the central t distribution with df = N − 2 degrees of freedom, where N is the sample size. The H1 distribution is the noncentral t distribution with the same degrees of freedom and noncentrality parameter δ = √N · σx(b − b0)/σ.
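The equivalence to the MRC procedure noted above follows from δ² = λ = f²N. The following sketch is not part of G * Power; it assumes Python with scipy, and the values b = −0.0667, σx = 7.5, σy = 4, N = 100 are taken from the MRC cross-check in the examples:

```python
# Power of the two-sided slope test, computed once via the noncentral t and
# once via the equivalent noncentral F (MRC) route.
import math
from scipy.stats import t, nct, f, ncf

b, b0 = -0.0667, 0.0
sd_x, sd_y, N, alpha = 7.5, 4.0, 100, 0.05

rho = b * sd_x / sd_y                     # correlation implied by the slope
sd_res = sd_y * math.sqrt(1 - rho**2)     # SD of the residuals

delta = math.sqrt(N) * sd_x * (b - b0) / sd_res   # noncentrality delta
tcrit = t.ppf(1 - alpha / 2, N - 2)
power_t = nct.cdf(-tcrit, N - 2, delta) + nct.sf(tcrit, N - 2, delta)

lam = delta**2                            # equals f^2 * N of the MRC procedure
fcrit = f.ppf(1 - alpha, 1, N - 2)        # = tcrit^2
power_F = ncf.sf(fcrit, 1, N - 2, lam)    # identical to power_t
```

Both routes give the same power because a squared (noncentral) t variable with df degrees of freedom is a (noncentral) F variable with (1, df) degrees of freedom.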
12.6 Validation
The results were checked against the values produced by PASS (Hintze, 2006) and perfect correspondence was found.
Figure 15: Effect size drawer to calculate σy and/or slope b from various input constellations (see right panel).
13 F test: Multiple Regression - omnibus (deviation of R² from zero), fixed model

In multiple regression analyses the relation of a dependent variable Y to m independent factors X1, ..., Xm is studied. The present procedure refers to the so-called conditional or fixed factors model of multiple regression (Gatsonis & Sampson, 1989; Sampson, 1974), that is, it is assumed that

Y = Xβ + ε
Figure 17: Input of correlations between predictors and Y (top) and the matrix of the correlations among predictors (see text).
0.25/(1 − 0.25) = 0.333. For α = 0.05 and total sample size N = 12 a power of 0.439627 is computed from both procedures.
13.6 Validation
The results were checked against the values produced by
GPower 2.0 and those produced by PASS (Hintze, 2006).
Slight deviations were found from the values tabulated in Cohen (1988). This is due to an approximation used by Cohen
(1988) that underestimates the noncentrality parameter l
and therefore also the power. This issue is discussed more
thoroughly in Erdfelder, Faul, and Buchner (1996).
13.7 References
See Chapter 9 in Cohen (1988).
14 F test: Multiple Regression - special (increase of R²), fixed model

In multiple regression analyses the relation of a dependent variable Y to m independent factors X1, ..., Xm is studied. The present procedure refers to the so-called conditional or fixed factors model of multiple regression (Gatsonis & Sampson, 1989; Sampson, 1974), that is, it is assumed that

Y = Xβ + ε

where X = (1 X1 X2 · · · Xm) is a N × (m + 1) matrix of a constant term and fixed and known predictor variables Xi. The elements of the column vector β of length m + 1 are the regression weights, and the column vector ε of length N contains error terms, with εi ~ N(0, σ).
This procedure allows power analyses for the test whether the proportion of variance of variable Y explained by a set of predictors A is increased if an additional nonempty predictor set B is considered. The variance explained by predictor sets A, B, and A ∪ B is denoted by R²Y·A, R²Y·B, and R²Y·A,B, respectively.
Using this notation, the null and alternate hypotheses are:

H0: R²Y·A,B − R²Y·A = 0
H1: R²Y·A,B − R²Y·A > 0.

The directional form of H1 is due to the fact that R²Y·A,B, that is the proportion of variance explained by sets A and B combined, cannot be lower than the proportion R²Y·A explained by A alone.
As will be shown in the examples section, the MRC procedure is quite flexible and can be used as a substitute for some other tests.

Using this definition, f can alternatively be written in terms of the partial R²:

f² = R²YB·A / (1 − R²YB·A)

2. In a second special case (case 2 in Cohen (1988, p. 407ff.)), the same effect variance VS is considered, but it is assumed that there is a third set of predictors C that also accounts for parts of the variance of Y and thus reduces the error variance: VE = 1 − R²Y·A,B,C. In this case, the effect size is

f² = (R²Y·A,B − R²Y·A) / (1 − R²Y·A,B,C)

We may again define a partial R²x as:

R²x := R²YB·A / (1 − (R²Y·A,B,C − R²YB·A))

and with this quantity we get

f² = R²x / (1 − R²x)

Note: Case 1 is the special case of case 2, where C is the empty set.

Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 18) that may be used to calculate f² from the variances VS and VE, or alternatively from the partial R².
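The two cases reduce to a few lines of arithmetic. This sketch is not part of G * Power; it assumes Python and uses the R² values of the examples in section 14.3:

```python
# Effect size f^2 and partial R^2 for both special cases.
# Case 1: B increases R2_a to R2_ab; error variance is 1 - R2_ab.
R2_a, R2_ab = 0.25, 0.30
VS, VE = R2_ab - R2_a, 1.0 - R2_ab
f2_case1 = VS / VE                       # ~0.0714286
partialR2_case1 = VS / (VS + VE)         # ~0.0667

# Case 2: a further predictor set C reduces the error variance to 1 - R2_abc.
R2_a, R2_ab, R2_abc = 0.10, 0.16, 0.20
VS, VE = R2_ab - R2_a, 1.0 - R2_abc
f2_case2 = VS / VE                       # 0.075
partialR2_case2 = VS / (VS + VE)         # ~0.06976744
```

In both cases f² is the ratio of the effect variance VS to the error variance VE, and the partial R² is VS/(VS + VE).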
14.3 Examples

14.3.1 Basic example for case 1

We make the following assumptions: A dependent variable Y is predicted by two sets of predictors A and B. The 5 predictors in A alone account for 25% of the variation of Y, thus R²Y·A = 0.25. Including the 4 predictors in set B increases the proportion of variance explained to 0.3, thus R²Y·A,B = 0.3. We want to calculate the power of a test for the increase due to the inclusion of B, given α = 0.01 and a total sample size of 90.
First we use the option From variances in the effect size drawer to calculate the effect size. In the input field Variance explained by special effect we insert R²Y·A,B − R²Y·A = 0.3 − 0.25 = 0.05, and as Residual variance we insert 1 − R²Y·A,B = 1 − 0.3 = 0.7. After clicking on Calculate and transfer to main window we see that this corresponds to a partial R² of about 0.0666 and to an effect size f² = 0.07142857. We then set the input field Numerator df in the main window to 4, the number of predictors in set B, and Number of predictors to the total number of predictors in A and B, that is to 4 + 5 = 9.
This leads to the following analysis in G * Power:

• Select
Type of power analysis: Post hoc
• Input
Effect size f²: 0.0714286
α err prob: 0.01
Total sample size: 90
Number of tested predictors: 4
Total number of predictors: 9
• Output
Noncentrality parameter λ: 6.428574
Critical F: 3.563110
Numerator df: 4
Denominator df: 80
Power (1−β): 0.241297

We find that the power of this test is very low, namely about 0.24. This confirms the result estimated by Cohen (1988, p. 434) in his example 9.10, which uses identical values. It should be noted, however, that Cohen (1988) uses an approximation to the correct formula for the noncentrality parameter λ that in general underestimates the true λ and thus also the true power. In this particular case, Cohen estimates λ = 6.1, which is only slightly lower than the correct value 6.429 given in the output above.
By using an a priori analysis, we can compute how large the sample size must be to achieve a power of 0.80. We find that the required sample size is N = 242.

14.3.2 Basic example for case 2

Here we make the following assumptions: A dependent variable Y is predicted by three sets of predictors A, B and C, which stand in the following causal relationship A ⇒ B ⇒ C. The 5 predictors in A alone account for 10% of the variation of Y, thus R²Y·A = 0.10. Including the 3 predictors in set B increases the proportion of variance explained to 0.16, thus R²Y·A,B = 0.16. Considering in addition the 4 predictors in set C increases the explained variance further to 0.2, thus R²Y·A,B,C = 0.2. We want to calculate the power of a test for the increase in variance explained by the inclusion of B in addition to A, given α = 0.01 and a total sample size of 200. This is a case 2 scenario, because the hypothesis only involves sets A and B, whereas set C should be included in the calculation of the residual variance.
We use the option From variances in the effect size drawer to calculate the effect size. In the input field Variance explained by special effect we insert R²Y·A,B − R²Y·A = 0.16 − 0.1 = 0.06, and as Residual variance we insert 1 − R²Y·A,B,C = 1 − 0.2 = 0.8. Clicking on Calculate and transfer to main window shows that this corresponds to a partial R² = 0.06976744 and to an effect size f² = 0.075. We then set the input field Numerator df in the main window to 3, the number of predictors in set B (which are responsible for a potential increase in variance explained), and Number of predictors to the total number of predictors in A, B and C (which all influence the residual variance), that is to 5 + 3 + 4 = 12.
This leads to the following analysis in G * Power:

• Select
Type of power analysis: Post hoc
• Input
Effect size f²: 0.075
α err prob: 0.01
Total sample size: 200
Number of tested predictors: 3
Total number of predictors: 12
• Output
Noncentrality parameter λ: 15.000000
Critical F: 3.888052
Numerator df: 3
Denominator df: 187
Power (1−β): 0.766990

We find that the power of this test is about 0.767. In this case the power is slightly larger than the power value 0.74 estimated by Cohen (1988, p. 439) in his example 9.13, which uses identical values. This is due to the fact that his approximation λ = 14.3 underestimates the true value λ = 15 given in the output above.

14.3.3 Example showing the relation to factorial ANOVA designs

We assume a 2 × 3 × 4 design with three factors U, V, W. We want to test main effects, two-way interactions (U × V, U × W, V × W) and the three-way interaction (U × V × W). We may use the procedure "ANOVA: Fixed effects, special, main effects and interactions" in G * Power to do this analysis (see the corresponding entry in the manual for details). As an example, we consider the test of the V × W interaction. Assuming that Variance explained by the interaction = 0.422 and Error variance = 6.75 leads to an effect size f = 0.25 (a medium effect size according to Cohen (1988)). Numerator df = 6 corresponds to (levels of V − 1)(levels of W − 1), and the Number of Groups = 24 is the total number of cells (2 · 3 · 4 = 24) in the design. With
a = 0.05 and total sample size 120 we compute a power of 14.6 Validation
0.470.
We now demonstrate, how these analyses can be done The results were checked against the values produced by
with the MRC procedure. A main factor with k levels cor- GPower 2.0 and those produced by PASS (Hintze, 2006).
responds to k 1 predictors in the MRC analysis, thus Slight deviations were found to the values tabulated in Co-
the number of predictors is 1, 2, and 3 for the three factors U, V, and W. The number of predictors in interactions is the product of the number of predictors involved. The V × W interaction, for instance, corresponds to a set of (3 − 1)(4 − 1) = (2)(3) = 6 predictors.
To test an effect with MRC we need to isolate the relative contribution of this source to the total variance, that is, we need to determine V_S. We illustrate this for the V × W interaction. In this case we must find R²_{Y·V×W} by excluding from the proportion of variance that is explained by V, W, V × W together, that is R²_{Y·V,W,V×W}, the contribution of the main effects, that is: R²_{Y·V×W} = R²_{Y·V,W,V×W} − R²_{Y·V,W}. The residual variance V_E is the variance of Y from which the variance of all sources in the design has been removed.
This is a case 2 scenario, in which V × W corresponds to set B with 2 · 3 = 6 predictors, V ∪ W corresponds to set A with 2 + 3 = 5 predictors, and all other sources of variance, that is U, U × V, U × W, U × V × W, correspond to set C with (1 + (1 · 2) + (1 · 3) + (1 · 2 · 3)) = 1 + 2 + 3 + 6 = 12 predictors. Thus, the total number of predictors is (6 + 5 + 12) = 23. Note: The total number of predictors is always (number of cells in the design − 1).
We now specify these contributions numerically: R²_{Y·A,B,C} = 0.325, R²_{Y·A,B} − R²_{Y·A} = R²_{Y·V×W} = 0.0422. Inserting these values in the effect size dialog (Variance explained by special effect = 0.0422, Residual variance = 1 − 0.325 = 0.675) yields an effect size f² = 0.06251852 (compare these values with those chosen in the ANOVA analysis and note that f² = 0.25² = 0.0625). To calculate power, we set Number of tested predictors = 6 (= number of predictors in set B) and Total number of predictors = 23. With α = 0.05 and N = 120, we get, as expected, the same power 0.470 as from the equivalent ANOVA analysis.

…hen (1988). This is due to an approximation used by Cohen (1988) that underestimates the noncentrality parameter λ and therefore also the power. This issue is discussed more thoroughly in Erdfelder et al. (1996).

14.7 References

See Chapter 9 in Cohen (1988).
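The effect size and power computation of this example can be sketched numerically. This is a sketch rather than G*Power's own code; it assumes the noncentrality convention λ = f²·N used for this test:

```python
from scipy import stats

def f2_special(variance_effect, r2_total):
    """Cohen's f^2 for a special effect: variance explained by the
    tested effect divided by the residual variance 1 - R2_total."""
    return variance_effect / (1 - r2_total)

def mrc_power(f2, n_tested, n_total_pred, n, alpha=0.05):
    """Power of the F test for a special effect in multiple regression,
    assuming the noncentrality parameter lambda = f^2 * N."""
    df1 = n_tested                      # numerator df = predictors in set B
    df2 = n - n_total_pred - 1          # denominator df
    lam = f2 * n
    fcrit = stats.f.ppf(1 - alpha, df1, df2)
    return 1 - stats.ncf.cdf(fcrit, df1, df2, lam)

f2 = f2_special(0.0422, 0.325)          # ~0.0625 = 0.25**2
print(round(f2, 7), round(mrc_power(f2, 6, 23, 120), 3))
```

With the values of the example (6 tested of 23 predictors, N = 120, α = 0.05), the result should agree with the power 0.470 quoted above.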
15 F test: Inequality of two Variances

This procedure allows power analyses for the test that the population variances σ0² and σ1² of two normally distributed random variables are identical. The null and (two-sided) alternate hypothesis of this test are:

H0: σ1 − σ0 = 0
H1: σ1 − σ0 ≠ 0.

The two-sided test ("two tails") should be used if there is no a priori restriction on the sign of the deviation assumed in the alternate hypothesis. Otherwise use the one-sided test ("one tail").

15.1 Effect size index

The ratio σ1²/σ0² of the two variances is used as effect size measure. This ratio is 1 if H0 is true, that is, if both variances are identical. In an a priori analysis a ratio close or even identical to 1 would imply an exceedingly large sample size. Thus, G*Power prohibits inputs in the range [0.999, 1.001] in this case.
Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 19) that may be used to calculate the ratio from two variances. Insert the variances σ0² and σ1² in the corresponding input fields.

…
Power (1−β): 0.80
Allocation ratio N2/N1: 1

• Output
Lower critical F: 0.752964
Upper critical F: 1.328085
Numerator df: 192
Denominator df: 192
Sample size group 1: 193
Sample size group 2: 193
Actual power: 0.800105

The output shows that we need at least 386 subjects (193 in each group) in order to achieve the desired level of the α and β error. To apply the test, we would estimate both variances s1² and s0² from samples of size N1 and N0, respectively. The two-sided test would be significant at α = 0.05 if the statistic x = s1²/s0² were either smaller than the lower critical value 0.753 or greater than the upper critical value 1.328.
By setting "Allocation ratio N2/N1 = 2", we can easily check that a much larger total sample size, namely N = 443 (148 and 295 in group 1 and 2, respectively), would be required if the sample sizes in both groups are clearly different.

15.4 Related tests

Similar tests in G*Power 3.0:
• Variance: Difference from constant (two sample case).
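The power computation of this example can be reproduced with the central F distribution, since under H1 the statistic s1²/s0² is distributed as ratio · F(n1 − 1, n2 − 1). The variance ratio used in the example is not visible in this excerpt; the value 1.5 below is an assumption that reproduces the printed critical values and power:

```python
from scipy import stats

def power_variance_ratio(ratio, n1, n2, alpha=0.05):
    """Two-sided power of the F test that two population variances are equal."""
    df1, df2 = n1 - 1, n2 - 1
    lo = stats.f.ppf(alpha / 2, df1, df2)       # lower critical F
    hi = stats.f.ppf(1 - alpha / 2, df1, df2)   # upper critical F
    # Under H1, s1^2/s0^2 ~ ratio * F(df1, df2): rescale the critical values.
    return stats.f.cdf(lo / ratio, df1, df2) + (1 - stats.f.cdf(hi / ratio, df1, df2))

print(round(power_variance_ratio(1.5, 193, 193), 4))  # close to the printed 0.800105
```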
16 t test: Correlation - point biserial model

The point biserial correlation is a measure of association between a continuous variable X and a binary variable Y, the latter of which takes on the values 0 and 1. It is assumed that the continuous variable X is normally distributed at Y = 0 and at Y = 1, with means µ0, µ1 and equal variance σ². If π is the proportion of values with Y = 1, then the point biserial correlation coefficient is defined as:

ρ = (µ1 − µ0) · √(π(1 − π)) / σx

where σx = √(σ² + (µ1 − µ0)²/4).
The point biserial correlation is identical to a Pearson correlation between two vectors x, y, where xi contains a value from X at Y = j, and yi = j codes the group from which the X was taken.
The statistical model is the same as that underlying a test for a difference in means µ0 and µ1 in two independent groups. The relation between the effect size d = (µ1 − µ0)/σ used in that test and the point biserial correlation ρ considered here is given by:

ρ = d / √(d² + N²/(n0·n1))

where n0, n1 denote the sizes of the two groups and N = n0 + n1.
The power procedure refers to a t test used to evaluate the null hypothesis that there is no (point-biserial) correlation in the population (ρ = 0). The alternative hypothesis is that the correlation coefficient has a non-zero value r:

H0: ρ = 0
H1: ρ = r.

The two-sided ("two tailed") test should be used if there is no restriction on the sign of ρ under the alternative hypothesis. Otherwise use the one-sided ("one tailed") test.

16.1 Effect size index

The effect size index |ρ| is the absolute value of the correlation coefficient in the population as postulated in the alternative hypothesis. From this definition it follows that 0 ≤ |ρ| < 1.
Cohen (1969, p. 79) defines the following effect size conventions for |ρ|:
• small ρ = 0.1
• medium ρ = 0.3
• large ρ = 0.5
Pressing the Determine button on the left side of the effect size label opens the effect size drawer (see Fig. 20). You can use it to calculate |ρ| from the coefficient of determination ρ².

16.2 Options

This test has no options.

Figure 20: Effect size dialog to compute ρ from the coefficient of determination ρ².

16.3 Examples

We want to know how many subjects it takes to detect r = .25 in the population, given α = β = .05. Thus, H0: ρ = 0, H1: ρ = 0.25.

• Select
Type of power analysis: A priori

• Input
Tail(s): One
Effect size |ρ|: 0.25
α err prob: 0.05
Power (1−β err prob): 0.95

• Output
Noncentrality parameter δ: 3.306559
Critical t: 1.654314
df: 162
Total sample size: 164
Actual power: 0.950308

The results indicate that we need at least N = 164 subjects to ensure a power > 0.95. The actual power achieved with this N (0.950308) is slightly higher than the requested power.
To illustrate the connection to the two groups t test, we calculate the corresponding effect size d for equal sample sizes n0 = n1 = 82:

d = Nρ / √(n0·n1·(1 − ρ²)) = 164 · 0.25 / √(82 · 82 · (1 − 0.25²)) = 0.51639778

Performing a power analysis for the one-tailed two group t test with this d, n0 = n1 = 82, and α = 0.05 leads to exactly the same power 0.950308 as in the example above. If we assume unequal sample sizes in both groups, for example n0 = 64, n1 = 100, then we would compute a different value for d:

d = Nρ / √(n0·n1·(1 − ρ²)) = 164 · 0.25 / √(100 · 64 · (1 − 0.25²)) = 0.52930772

but we would again arrive at the same power. It thus poses no restriction of generality that we only input the total sample size and not the individual group sizes in the t test for correlation procedure.

16.4 Related tests

Similar tests in G*Power 3.0:
• Exact test for the difference of one (Pearson) correlation from a constant
• Test for the difference of two (Pearson) correlations
16.5 Implementation notes

The H0-distribution is the central t-distribution with df = N − 2 degrees of freedom. The H1-distribution is the noncentral t-distribution with df = N − 2 and noncentrality parameter δ, where

δ = √( |ρ|²·N / (1 − |ρ|²) )

16.6 Validation

The results were checked against the values produced by GPower 2.0 (Faul and Erdfelder, 1992).
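The implementation notes translate directly into a short numerical sketch (using scipy's noncentral t distribution; the one-tailed call below reproduces the example in 16.3):

```python
from scipy import stats

def power_point_biserial(r, n, alpha=0.05, tails=1):
    """Power of the t test for H1: rho = r, using df = N - 2 and
    noncentrality delta = sqrt(r^2 * N / (1 - r^2))."""
    df = n - 2
    delta = (r**2 * n / (1 - r**2)) ** 0.5
    if tails == 1:
        tcrit = stats.t.ppf(1 - alpha, df)
        return 1 - stats.nct.cdf(tcrit, df, delta)
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(tcrit, df, delta)) + stats.nct.cdf(-tcrit, df, delta)

print(round(power_point_biserial(0.25, 164), 6))  # ~0.950308
```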
17 t test: Linear Regression (two groups)

A linear regression is used to estimate the parameters a, b of a linear relationship Y = a + bX between the dependent variable Y and the independent variable X. X is assumed to be a set of fixed values, whereas Yi is modeled as a random variable: Yi = a + bXi + εi, where εi denotes normally distributed random errors with mean 0 and standard deviation σi. A common assumption also adopted here is that all σi's are identical, that is σi = σ. The standard deviation of the error is also called the standard deviation of the residuals.
If we have determined the linear relationships between X and Y in two groups: Y1 = a1 + b1X1, Y2 = a2 + b2X2, we may ask whether the slopes b1, b2 are identical.
The null and the two-sided alternative hypotheses are:

H0: b1 − b2 = 0
H1: b1 − b2 ≠ 0.

17.1 Effect size index

The absolute value of the difference between the slopes, |Δslope| = |b1 − b2|, is used as effect size. To fully specify the effect size, the following additional inputs must be given:

• Std dev residual σ
The standard deviation σ > 0 of the residuals in the combined data set (i.e. the square root of the weighted sum of the residual variances in the two data sets): If σ_r1² and σ_r2² denote the variances of the residuals r1 = (a1 + b1X1) − Y1 and r2 = (a2 + b2X2) − Y2 in the two groups, and n1, n2 the respective sample sizes, then

σ = √( (n1·σ_r1² + n2·σ_r2²) / (n1 + n2) )    (3)

• Std dev σ_x1
The standard deviation σ_x1 > 0 of the X values in group 1.

• Std dev σ_x2
The standard deviation σ_x2 > 0 of the X values in group 2.

Important relationships between the standard deviations σ_xi of Xi, σ_yi of Yi, the slopes bi of the regression lines, and the correlation coefficients ρi between Xi and Yi are:

σ_yi = (bi·σ_xi)/ρi    (4)
σ_yi = σ_ri / √(1 − ρi²)    (5)

where σ_ri denotes the standard deviation of the residuals Yi − (bi·X + ai).
The effect size dialog may be used to determine Std dev residual σ and |Δ slope| from other values based on Eqns (3), (4) and (5) given above. Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 21).
The left panel in Fig. 21 shows the combinations of input and output values in different input modes. The input variables stand on the left side of the arrow '=>', the output variables on the right side. The input values must conform to the usual restrictions, that is, σ_xi > 0, σ_yi > 0, −1 < ρi < 1. In addition, Eq. (4) together with the restriction on ρi implies the additional restriction −1 < bi·σ_xi/σ_yi < 1.
Clicking on the button Calculate and transfer to main window copies the values given in Std dev σ_x1, Std dev σ_x2, Std dev residual σ, Allocation ratio N2/N1, and |Δ slope| to the corresponding input fields in the main window.

17.2 Options

This test has no options.

17.3 Examples

We replicate an example given on page 594 in Dupont and Plummer (1998) that refers to an example in Armitage, Berry, and Matthews (2002, p. 325). The data and relevant statistics are shown in Fig. 22. Note: Contrary to Dupont and Plummer (1998), we here consider the data as hypothesized true values and normalize the variance by N, not (N − 1).
The relation of age and vital capacity for two groups of men working in the cadmium industry is investigated. Group 1 includes n1 = 28 workers with less than 10 years of cadmium exposure, and Group 2 n2 = 44 workers never exposed to cadmium. The standard deviations of the ages in the two groups are σ_x1 = 9.029 and σ_x2 = 11.87. Regressing vital capacity on age gives the following slopes of the regression lines: β1 = −0.04653 and β2 = −0.03061. To calculate the pooled standard deviation of the residuals we use the effect size dialog: We use input mode "σ_x, σ_y, slope => residual σ, ρ" and insert the values given above, the standard deviations σ_y1, σ_y2 of y (capacity) as given in Fig. 22, and the allocation ratio n2/n1 = 44/28 = 1.571428. This results in a pooled standard deviation of the residuals of σ = 0.5578413 (compare the right panel in Fig. 21).
We want to recruit enough workers to detect a true difference in slope of |(−0.03061) − (−0.04653)| = 0.01592 with 80% power, α = 0.05, and the same allocation ratio to the two groups as in the sample data.

• Select
Type of power analysis: A priori

• Input
Tail(s): Two
|Δ slope|: 0.01592
α err prob: 0.05
Power (1−β): 0.80
Allocation ratio N2/N1: 1.571428
Std dev residual σ: 0.5578413
Std dev σ_x1: 9.02914
Std dev σ_x2: 11.86779

• Output
Noncentrality parameter δ: 2.811598
Critical t: 1.965697
Df: 415
Sample size group 1: 163
Sample size group 2: 256
Total sample size: 419
Actual power: 0.800980
Figure 21: Effect size drawer to calculate the pooled standard deviation of the residuals σ and the effect size |Δ slope| from various input constellations (see left panel). The right panel shows the inputs for the example discussed below.
Figure 22: Data for the example discussed in Dupont and Plummer (1998).
The output shows that we need 419 workers in total, with 163 in group 1 and 256 in group 2. These values are close to those reported in Dupont and Plummer (1998, p. 596) for this example (166 + 261 = 427). The slight difference is due to the fact that they normalize the variances by N − 1, and use shifted central t-distributions instead of noncentral t-distributions.

17.3.1 Relation to Multiple Regression: Special case

The present procedure is essentially a special case of Multiple regression, but provides a more convenient interface. To show this, we demonstrate how the MRC procedure can be used to compute the example above (see also Dupont and Plummer (1998, p. 597)).
First, the data are combined into a data set of size n1 + n2 = 28 + 44 = 72. With respect to this combined data set, we define the following variables (vectors of length 72):

• y contains the measured vital capacity

…the model reduces to y = β0 + β1x1, and y = (β0 + β2x2) + (β1 + β3)x1, for unexposed and exposed workers, respectively. In this model β3 represents the difference in slope between both groups, which is assumed to be zero under the null hypothesis. Thus, the above model reduces to

y = β0 + β1x1 + β2x2 + εi

if the null hypothesis is true.
Performing a multiple regression analysis with the full model leads to β3 = −0.01592 and R1² = 0.3243. With the reduced model assumed in the null hypothesis one finds R0² = 0.3115. From these values we compute the following effect size:

f² = (R1² − R0²) / (1 − R1²) = (0.3243 − 0.3115) / (1 − 0.3243) = 0.018870

Selecting an a priori analysis and setting α err prob to 0.05, Power (1−β err prob) to 0.80, Numerator df to 1 and Number of predictors to 3, we get N = 418, that is, almost the same result as in the example above.

17.5 Implementation notes

The procedure implements a slight variant of the algorithm proposed in Dupont and Plummer (1998). The only difference is that we replaced the approximation by shifted central t-distributions used in their paper with noncentral t-distributions. In most cases this makes no great difference.
The H0 distribution is the central t distribution with df = n1 + n2 − 4 degrees of freedom, where n1 and n2 denote the sample sizes in the two groups, and the H1 distribution is the noncentral t distribution with the same degrees of freedom and the noncentrality parameter δ = Δ·√n2.

Statistical test. The power is calculated for the t test for equal slopes as described in Armitage et al. (2002), chapter 11. The test statistic is (see Eqns 11.18, 11.19, 11.20):

t = (b̂1 − b̂2) / s_(b̂1 − b̂2)

Power of the test: In the procedure for equal slopes the noncentrality parameter is δ = Δ·√n2, with Δ = |Δslope|/σ_R and

σ_R = √( σ²·( 1/(m·σ_x1²) + 1/σ_x2² ) )    (6)

where m = n1/n2, σ_x1 and σ_x2 are the standard deviations of X in group 1 and 2, respectively, and σ the common standard deviation of the residuals.

17.6 Validation

The results were checked for a range of input scenarios against the values produced by the program PS published by Dupont and Plummer (1998). Only slight deviations were found that are probably due to the use of the noncentral t-distribution in G*Power instead of the shifted central t-distributions that are used in PS.
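Eq. (6) and the noncentrality parameter δ = Δ·√n2 can be checked numerically with the values of the example above (a sketch using scipy's noncentral t distribution):

```python
from scipy import stats

def power_slope_difference(d_slope, s_resid, sx1, sx2, n1, n2, alpha=0.05):
    """Two-sided power for the t test of equal slopes in two groups."""
    m = n1 / n2
    s_R = (s_resid**2 * (1 / (m * sx1**2) + 1 / sx2**2)) ** 0.5   # Eq. (6)
    delta = (d_slope / s_R) * n2 ** 0.5                            # noncentrality
    df = n1 + n2 - 4
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(tcrit, df, delta)) + stats.nct.cdf(-tcrit, df, delta)

print(round(power_slope_difference(0.01592, 0.5578413, 9.02914, 11.86779,
                                   163, 256), 4))  # close to the printed 0.800980
```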
18 t test: Means - difference between two dependent means (matched pairs)

The null hypothesis of this test is that the population means µx, µy of two matched samples x, y are identical. The sampling method leads to N pairs (xi, yi) of matched observations.
The null hypothesis that µx = µy can be reformulated in terms of the difference zi = xi − yi. The null hypothesis is then given by µz = 0. The alternative hypothesis states that µz has a value different from zero:

H0: µz = 0
H1: µz ≠ 0.

18.2 Options

This test has no options.

18.3 Examples

Let us try to replicate the example in Cohen (1969, p. 48). The effect of two teaching methods on algebra achievements is compared between 50 IQ-matched pairs of pupils (i.e. 100 pupils). The effect size that should be detected is d = (m0 − m1)/σ = 0.4. Note that this is the effect size index representing differences between two independent means (two groups). We want to use this effect size as a basis for a matched-pairs study. A sample estimate of the correlation between IQ-matched pairs in the population has been calculated to be r = 0.55. We thus assume ρxy = 0.55. What is the power of a two-sided test at an α level of 0.05?
To compute the effect size dz we open the effect size drawer and choose "from group parameters". We only know the ratio d = (µx − µy)/σ = 0.4. We are thus free to choose any values for the means and (equal) standard deviations that lead to this ratio. We set "Mean group 1 = 0", …

• Output
Noncentrality parameter δ: 2.981424
Critical t: 2.009575
df: 49
Power (1−β): 0.832114

The computed power of 0.832114 is close to the value 0.84 estimated by Cohen using his tables. To estimate the increase in power due to the correlation between pairs (i.e., due to the shift from a two-group design to a matched-pairs design), we enter "Correlation between groups = 0" in the effect size drawer. This leads to dz = 0.2828427. Repeating the above analysis with this effect size leads to a power of only 0.500352.
How many subjects would we need to arrive at a power of about 0.832114 in a two-group design? We click X-Y plot for a range of values to open the Power Plot window.
Let us plot (on the y axis) the power (with markers and displaying the values in the plot) as a function of the total sample size. We want to plot just one graph, with the α err prob set to 0.05 and effect size dz fixed at 0.2828427.
Figure 24: Power vs. sample size plot for the example.
18.6 Validation
The results were checked against the values produced by
GPower 2.0.
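The conversion from the two-group effect size d to dz and the resulting power in the example above can be sketched as follows. It assumes the standard relation dz = d/√(2(1 − ρ)) for equal within-group standard deviations, which reproduces dz = 0.2828427 for ρ = 0:

```python
from scipy import stats

def power_matched_pairs(d, rho, n_pairs, alpha=0.05):
    """Two-sided power of the matched-pairs t test, where d is the
    two-group effect size and rho the correlation between pairs."""
    dz = d / (2 * (1 - rho)) ** 0.5        # effect size of the differences
    delta = dz * n_pairs ** 0.5            # noncentrality parameter
    df = n_pairs - 1
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(tcrit, df, delta)) + stats.nct.cdf(-tcrit, df, delta)

print(round(power_matched_pairs(0.4, 0.55, 50), 6))  # ~0.832114
print(round(power_matched_pairs(0.4, 0.0, 50), 6))   # ~0.500352
```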
19 t test: Means - difference from constant (one sample case)

The one-sample t test is used to determine whether the population mean µ equals some specified value µ0. The data are from a random sample of size N drawn from a normally distributed population. The true standard deviation in the population is unknown and must be estimated from the data. The null and alternate hypothesis of the t test state:

H0: µ − µ0 = 0
H1: µ − µ0 ≠ 0.

The two-sided ("two tailed") test should be used if there is no restriction on the sign of the deviation from µ0 assumed in the alternate hypothesis. Otherwise use the one-sided ("one tailed") test.

19.1 Effect size index

The effect size index d is defined as:

d = (µ − µ0)/σ

where σ denotes the (unknown) standard deviation in the population. Thus, if µ and µ0 deviate by one standard deviation then d = 1.
Cohen (1969, p. 38) defines the following conventional values for d:
• small d = 0.2
• medium d = 0.5
• large d = 0.8
Pressing the Determine button on the left side of the effect size label opens the effect size drawer (see Fig. 25). You can use this dialog to calculate d from µ, µ0 and the standard deviation σ.

Figure 25: Effect size dialog to calculate the effect size d from means and standard deviation.

19.3 Examples

We want to test the null hypothesis that the population mean is µ = µ0 = 10 against the alternative hypothesis that µ = 15. The standard deviation in the population is estimated to be σ = 8. We enter these values in the effect size dialog: Mean H0 = 10, Mean H1 = 15, SD σ = 8, to calculate the effect size d = 0.625.
Next we want to know how many subjects it takes to detect the effect d = 0.625, given α = β = .05. We are only interested in increases in the mean and thus choose a one-tailed test.

• Select
Type of power analysis: A priori

• Input
Tail(s): One
Effect size d: 0.625
α err prob: 0.05
Power (1−β err prob): 0.95

• Output
Noncentrality parameter δ: 3.423266
Critical t: 1.699127
df: 29
Total sample size: 30
Actual power: 0.955144

The results indicate that we need at least N = 30 subjects to ensure a power > 0.95. The actual power achieved with this N (0.955144) is slightly higher than the requested one.
Cohen (1969, p. 59) calculates the sample size needed in a two-tailed test that the departure from the population mean is at least 10% of the standard deviation, that is d = 0.1, given α = 0.01 and β ≤ 0.1. The input and output values for this analysis are:

• Select
Type of power analysis: A priori

• Input
Tail(s): Two
Effect size d: 0.1
α err prob: 0.01
Power (1−β err prob): 0.90

• Output
Noncentrality parameter δ: 3.862642
Critical t: 2.579131
df: 1491
Total sample size: 1492
Actual power: 0.900169

G*Power outputs a sample size of n = 1492, which is slightly higher than the value 1490 estimated by Cohen using his tables.
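The a priori analysis of the first example can be sketched as a simple search over N using the noncentral t distribution (a sketch; G*Power's own search routine is more refined):

```python
from scipy import stats

def power_one_sample_t(d, n, alpha=0.05, tails=1):
    """Power of the one-sample t test for d = (mu - mu0)/sigma."""
    df = n - 1
    delta = d * n ** 0.5                 # noncentrality parameter
    if tails == 1:
        tcrit = stats.t.ppf(1 - alpha, df)
        return 1 - stats.nct.cdf(tcrit, df, delta)
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(tcrit, df, delta)) + stats.nct.cdf(-tcrit, df, delta)

# A priori: smallest N whose power reaches the requested 0.95
n = 2
while power_one_sample_t(0.625, n) < 0.95:
    n += 1
print(n, round(power_one_sample_t(0.625, n), 6))  # 30 and ~0.955144
```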
19.6 Validation
The results were checked against the values produced by
GPower 2.0.
20 t test: Means - difference between two independent means (two groups)

The two-sample t test is used to determine if two population means µ1, µ2 are equal. The data are two samples of size n1 and n2 from two independent and normally distributed populations. The true standard deviations in the two populations are unknown and must be estimated from the data. The null and alternate hypothesis of this t test are:

H0: µ1 − µ2 = 0
H1: µ1 − µ2 ≠ 0.

20.2 Options

This test has no options.

20.3 Examples

20.4 Related tests

20.5 Implementation notes

The H0 distribution is the central Student t distribution t(N − 2, 0); the H1 distribution is the noncentral Student
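The implementation note is truncated at this point; the usual completion is the noncentral Student t distribution t(N − 2, δ) with δ = d·√(n1·n2/(n1 + n2)). A minimal sketch under that assumption:

```python
from scipy import stats

def power_two_sample_t(d, n1, n2, alpha=0.05, tails=2):
    """Power of the two-group t test for d = (mu1 - mu2)/sigma."""
    df = n1 + n2 - 2
    delta = d * (n1 * n2 / (n1 + n2)) ** 0.5   # noncentrality parameter
    if tails == 1:
        tcrit = stats.t.ppf(1 - alpha, df)
        return 1 - stats.nct.cdf(tcrit, df, delta)
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(tcrit, df, delta)) + stats.nct.cdf(-tcrit, df, delta)

# d = 0.4 with 50 subjects per group gives power near 0.5, in line with
# the two-group comparison discussed in the matched-pairs section above.
print(round(power_two_sample_t(0.4, 50, 50), 3))
```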
21 Wilcoxon signed-rank test: Means - difference from constant (one sample case)

The Wilcoxon signed-rank test is a nonparametric alternative to the one sample t test. Its use is mainly motivated by uncertainty concerning the assumption of normality made in the t test.
The Wilcoxon signed-rank test can be used to test whether a given distribution H is symmetric about zero. The power routines implemented in G*Power refer to the important special case of a "shift model", which states that H is obtained by subtracting two symmetric distributions F and G, where G is obtained by shifting F by an amount Δ: G(x) = F(x − Δ) for all x. The relation of this shift model to the one sample t test is obvious if we assume that F is the fixed distribution with mean µ0 stated in the null hypothesis and G the distribution of the test group with mean µ. Under these assumptions H(x) = F(x) − G(x) is symmetric about zero under H0, that is if Δ = µ − µ0 = 0 or, equivalently, F(x) = G(x), and asymmetric under H1, that is if Δ ≠ 0.

The Wilcoxon signed-rank test. The signed-rank test is based on ranks. Assume that a sample of size N is drawn from a distribution H(x). To each sample value xi a rank S between 1 and N is assigned that corresponds to the position of |xi| in an increasingly ordered list of all absolute sample values. The general idea of the test is to calculate the sum of the ranks assigned to positive sample values (x > 0) and the sum of the ranks assigned to negative sample values (x < 0), and to reject the hypothesis that H is symmetric if these two rank sums are clearly different.
The actual procedure is as follows: Since the rank sum of negative values is known if that of positive values is given, it suffices to consider the rank sum Vs of positive values. The positive ranks can be specified by an n-tuple (S1, . . . , Sn), where 0 ≤ n ≤ N. There are (N choose n) possible n-tuples for a given n. Since n can take on the values 0, 1, . . . , N, the total number of possible choices for the S's is ∑_{i=0}^{N} (N choose i) = 2^N. (We here assume a continuous distribution H for which the probability of ties, that is the occurrence of two identical |x|, is zero.) Therefore, if the null hypothesis is true, then the probability to observe a particular n and a certain n-tuple is P(N+ = n; S1 = s1, . . . , Sn = sn) = 1/2^N. To calculate the probability to observe (under H0) a particular positive rank sum Vs = S1 + . . . + Sn, we just need to count the number k of all tuples with rank sum Vs and to add their probabilities, thus P(Vs = v) = k/2^N. Doing this for all possible Vs between the minimal value 0, corresponding to the case n = 0, and the maximal value N(N + 1)/2, corresponding to the n = N tuple (1, 2, . . . , N), gives the discrete probability distribution of Vs under H0. This distribution is symmetric about N(N + 1)/4. Referring to this probability distribution we choose in a one-sided test a critical value c with P(Vs ≥ c) ≤ α and reject the null hypothesis if a rank sum Vs > c is observed. With increasing sample size the exact distribution converges rapidly to the normal distribution with mean E(Vs) = N(N + 1)/4 and variance Var(Vs) = N(N + 1)(2N + 1)/24.

Figure 27: Densities of the Normal, Laplace, and Logistic distribution

Power of the Wilcoxon signed-rank test. The signed-rank test as described above is distribution free in the sense that its validity does not depend on the specific form of the response distribution H. This distribution independence no longer holds, however, if one wants to estimate numerical values for the power of the test. The reason is that the effect of a certain shift Δ on the deviation from symmetry, and therefore on the distribution of Vs, depends on the specific form of F (and G). For power calculations it is therefore necessary to specify the response distribution F. G*Power provides three predefined continuous and symmetric response functions that differ with respect to kurtosis, that is the "peakedness" of the distribution:

• Normal distribution N(µ, σ²):

p(x) = (1/√(2πσ²)) · e^(−(x − µ)²/(2σ²))

• Laplace or Double Exponential distribution:

p(x) = (1/2) · e^(−|x|)

• Logistic distribution:

p(x) = e^(−x) / (1 + e^(−x))²

Scaled and/or shifted versions of the Laplace and Logistic densities that can be calculated by applying the transformation (1/a)·p((x − b)/a), a > 0, are again probability densities and are referred to by the same name.

Approaches to the power analysis. G*Power implements two different methods to estimate the power for the signed-rank Wilcoxon test: A) The asymptotic relative efficiency
(A.R.E.) method that defines power relative to the one sample t test, and B) a normal approximation to the power proposed by Lehmann (1975, pp. 164-166). We describe the general idea of both methods in turn. More specific information can be found in the implementation section below.

• A.R.E. method: The A.R.E. method assumes the shift model described in the introduction. It relates normal approximations to the power of the one-sample t test (Lehmann, 1975, Eq. (4.44), p. 172) and the Wilcoxon test for a specified H (Lehmann, 1975, Eq. (4.15), p. 160). If for a model with fixed H and Δ a sample size N is required to achieve a specified power for the Wilcoxon signed-rank test and a sample size N′ is required in the t test to achieve the same power, then the ratio N′/N is called the efficiency of the Wilcoxon signed-rank test relative to the one-sample t test. The limiting efficiency as sample size N tends to infinity is called the asymptotic relative efficiency (A.R.E. or Pitman efficiency) of the Wilcoxon signed-rank test relative to the t test. It is given by (Hettmansperger, 1984, p. 71):

12·σ_H²·[ ∫_{−∞}^{+∞} H²(x) dx ]² = 12·[ ∫_{−∞}^{+∞} x²·H(x) dx ]·[ ∫_{−∞}^{+∞} H²(x) dx ]²

Note that the A.R.E. of the Wilcoxon signed-rank test to the one-sample t test is identical to the A.R.E. of the Wilcoxon rank-sum test to the two-sample t test (if H = F; for the meaning of F see the documentation of the Wilcoxon rank-sum test).
If H is a normal distribution, then the A.R.E. is 3/π ≈ 0.955. This shows that the efficiency of the Wilcoxon test relative to the t test is rather high even if the assumption of normality made in the t test is true. It can be shown that the minimal A.R.E. (for H with finite variance) is 0.864. For non-normal distributions the Wilcoxon test can be much more efficient than the t test. The A.R.E.s for some specific distributions are given in the implementation notes. To estimate the power of the Wilcoxon test for a given H with the A.R.E. method one basically scales the sample size with the corresponding A.R.E. value and then performs the procedure for the t test for two independent means.

• Lehmann method: The computation of the power requires the distribution of Vs for the non-null case, that is for cases where H is not symmetric about zero. The Lehmann method uses the fact that

(Vs − E(Vs)) / √(Var(Vs))

tends to the standard normal distribution as N tends to infinity for any fixed distribution H for which 0 < P(X < 0) < 1. The problem is then to compute the expectation and variance of Vs. These values depend on three "moments" p1, p2, p3, which are defined as:
– p1 = P(X < 0)
– p2 = P(X + Y > 0)
– p3 = P(X + Y > 0 and X + Z > 0)
where X, Y, Z are independent random variables with distribution H. The expectation and variance are given as:

E(Vs) = N(N − 1)·p2/2 + N·p1
Var(Vs) = N(N − 1)(N − 2)·(p3 − p1²) + N(N − 1)·[2(p1 − p2)² + 3p2(1 − p2)]/2 + N·p1(1 − p1)

The value p1 is easy to interpret: If H is continuous and shifted by an amount Δ > 0 to larger values, then p1 is the probability to observe a negative value. For a null shift (no treatment effect, Δ = 0, i.e. H symmetric about zero) we get p1 = 1/2.
If c denotes the critical value of a level α test and Φ the CDF of the standard normal distribution, then the normal approximation of the power of the (one-sided) test is given by:

P(H) ≈ 1 − Φ( (c − a − E(Vs)) / √(Var(Vs)) )

where a = 0.5 if a continuity correction is applied, and a = 0 otherwise.
The formulas for p1, p2, p3 for the predefined distributions are given in the implementation section.

21.1 Effect size index

The conventional values proposed by Cohen (1969, p. 38) for the t test are applicable. He defines the following conventional values for d:
• small d = 0.2
• medium d = 0.5
• large d = 0.8
Pressing the button Determine on the left side of the effect size label opens the effect size dialog (see Fig. 28). You can use this dialog to calculate d from the means and a common standard deviation in the two populations.
If N1 = N2 but σ1 ≠ σ2, you may use a mean σ′ as common within-population σ (Cohen, 1969, p. 42):

σ′ = √( (σ1² + σ2²)/2 )

If N1 ≠ N2 you should not use this correction, since this may lead to power values that differ greatly from the true values (Cohen, 1969, p. 42).

21.2 Options

This test has no options.
21.5 Implementation notes

The H0 distribution is the central Student t distribution t(N·k − 2, 0); the H1 distribution is the noncentral Student t distribution t(N·k − 2, δ), where the noncentrality parameter δ is given by:

δ = d·√( N1·N2·k / (N1 + N2) )

The parameter k represents the asymptotic relative efficiency vs. the corresponding t tests (Lehmann, 1975, p. 371ff) and depends in the following way on the parent distribution:

Parent      Value of k (A.R.E.)
Uniform     1.0
Normal      3/π
Logistic    π²/9
Laplace     3/2
min A.R.E.  0.864
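The A.R.E. adjustment above can be sketched as follows. This is an illustration, not G*Power's exact routine; it assumes that N in t(N·k − 2, ·) is the total sample size N1 + N2 and that the resulting non-integer degrees of freedom are handled by the continuous t distribution:

```python
import math
from scipy import stats

ARE = {"uniform": 1.0, "normal": 3 / math.pi,
       "logistic": math.pi**2 / 9, "laplace": 3 / 2, "min": 0.864}

def wilcoxon_power_are(d, n1, n2, parent="normal", alpha=0.05, tails=2):
    """Approximate power of the Wilcoxon test via the A.R.E.-scaled t test."""
    k = ARE[parent]
    df = (n1 + n2) * k - 2
    delta = d * math.sqrt(n1 * n2 * k / (n1 + n2))
    if tails == 1:
        tcrit = stats.t.ppf(1 - alpha, df)
        return 1 - stats.nct.cdf(tcrit, df, delta)
    tcrit = stats.t.ppf(1 - alpha / 2, df)
    return (1 - stats.nct.cdf(tcrit, df, delta)) + stats.nct.cdf(-tcrit, df, delta)

# Heavier-tailed parents (k > 1) yield more power than the normal parent.
print(round(wilcoxon_power_are(0.5, 30, 30, "normal"), 3))
print(round(wilcoxon_power_are(0.5, 30, 30, "laplace"), 3))
```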
22 Wilcoxon-Mann-Whitney test of a difference between two independent means

The Wilcoxon-Mann-Whitney (WMW) test (or U test) is a nonparametric alternative to the two-group t test. Its use is mainly motivated by uncertainty concerning the assumption of normality made in the t test.
It refers to a general two sample model, in which F and G characterize the response distributions under two different conditions. The null hypothesis of no effect states F = G, while the alternative is F ≠ G. The power routines implemented in G*Power refer to the important special case of a "shift model", which states that G is obtained by shifting F by an amount Δ: G(x) = F(x − Δ) for all x. The shift model expresses the assumption that the treatment adds a certain amount Δ to the response x (additivity).
(1975, pp. 69-71). We describe the general idea of both methods in turn. More specific information can be found in the implementation section below.

• A.R.E. method: The A.R.E. method assumes the shift model described in the introduction. It relates normal approximations to the power of the t test (Lehmann, 1975, Eq. (2.42), p. 78) and the Wilcoxon test for a specified F (Lehmann, 1975, Eq. (2.29), p. 72). If for a model with fixed F and D the sample sizes m = n are required to achieve a specified power for the Wilcoxon test and sample sizes m' = n' are required in the t test to achieve the same power, then the ratio n'/n is called the efficiency of the Wilcoxon test relative to the t test. The limiting efficiency as sample sizes m and n tend to infinity is called the asymptotic relative efficiency (A.R.E. or Pitman efficiency) of the Wilcoxon test relative to the t test. It is given by (Hettmansperger, 1984, p. 71):

  12σ²_F [∫ F'(x)² dx]² = 12 [∫ x² F'(x) dx] [∫ F'(x)² dx]²

(all integrals from -∞ to +∞). If F is a normal distribution, then the A.R.E. is 3/π ≈ 0.955. This shows that the efficiency of the Wilcoxon test relative to the t test is rather high even if the assumption of normality made in the t test is true. It can be shown that the minimal A.R.E. (for F with finite variance) is 0.864. For non-normal distributions the Wilcoxon test can be much more efficient than the t test. The A.R.E.s for some specific distributions are given in the implementation notes. To estimate the power of the Wilcoxon test for a given F with the A.R.E. method one basically scales the sample size with the corresponding A.R.E. value and then performs the procedure for the t test for two independent means.

• Lehmann method: The computation of the power requires the distribution of W_XY for the non-null case F ≠ G. The Lehmann method uses the fact that

  (W_XY - E(W_XY)) / √Var(W_XY)

tends to the standard normal distribution as m and n tend to infinity for any fixed distributions F and G for X and Y for which 0 < P(X < Y) < 1. The problem is then to compute the expectation and variance of W_XY. These values depend on three "moments" p1, p2, p3, which are defined as:

  - p1 = P(X < Y).
  - p2 = P(X < Y and X < Y').
  - p3 = P(X < Y and X' < Y).

Note that p2 = p3 for symmetric F in the shift model. The expectation and variance are given as:

  E(W_XY) = mnp1
  Var(W_XY) = mnp1(1 - p1) + mn(n - 1)(p2 - p1²) + nm(m - 1)(p3 - p1²)

The value p1 is easy to interpret: If the response distribution G of the treatment group is shifted to larger values, then p1 is the probability to observe a lower value in the control condition than in the test condition. For a null shift (no treatment effect, D = 0) we get p1 = 1/2.

If c denotes the critical value of a level α test and Φ the CDF of the standard normal distribution, then the normal approximation of the power of the (one-sided) test is given by:

  P(F, G) ≈ 1 - Φ[(c - a - mnp1) / √Var(W_XY)]

where a = 0.5 if a continuity correction is applied, and a = 0 otherwise. The formulas for p1, p2, p3 for the predefined distributions are given in the implementation section.

22.1 Effect size index

A.R.E. method: In the A.R.E. method the effect size d is defined as:

  d = (µ1 - µ2)/σ = D/σ

where µ1, µ2 are the means of the response functions F and G and σ the standard deviation of the response distribution F.

In addition to the effect size, you have to specify the A.R.E. for F. If you want to use the predefined distributions, you may choose the Normal, Logistic, or Laplace transformation or the minimal A.R.E. From this selection G*Power determines the A.R.E. automatically. Alternatively, you can choose the option to determine the A.R.E. value by hand. In this case you must first calculate the A.R.E. for your F using the formula given above.

Lehmann method: In the Lehmann-

The conventional values proposed by Cohen (1969, p. 38) for the t test are applicable. He defines the following conventional values for d:

• small d = 0.2
• medium d = 0.5
• large d = 0.8

Pressing the button Determine on the left side of the effect size label opens the effect size dialog (see Fig. 30). You can use this dialog to calculate d from the means and a common standard deviation in the two populations. If N1 = N2 but σ1 ≠ σ2 you may use a mean σ' as common within-population σ (Cohen, 1969, p. 42):

  σ' = √((σ1² + σ2²)/2)

If N1 ≠ N2 you should not use this correction, since this may lead to power values that differ greatly from the true values (Cohen, 1969, p. 42).
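The Lehmann approximation above can be retraced in a few lines. The following sketch is not G*Power's actual code; in particular, obtaining the critical value c from the null normal approximation of W_XY is an assumption made here for illustration:

```python
from math import sqrt
from statistics import NormalDist

def wmw_power_lehmann(m, n, p1, p2, p3, alpha=0.05, continuity=True):
    """Normal approximation to the power of the one-sided WMW test,
    based on the moments p1, p2, p3 defined in the text."""
    nd = NormalDist()
    # Critical value c from the null distribution of W_XY
    # (p1 = 1/2 and p2 = p3 = 1/3 when F = G) -- an assumption here
    e0, var0 = m * n / 2, m * n * (m + n + 1) / 12
    c = e0 + nd.inv_cdf(1 - alpha) * sqrt(var0)
    # Expectation and variance of W_XY under the alternative
    e1 = m * n * p1
    var1 = (m * n * p1 * (1 - p1)
            + m * n * (n - 1) * (p2 - p1 ** 2)
            + n * m * (m - 1) * (p3 - p1 ** 2))
    a = 0.5 if continuity else 0.0
    return 1 - nd.cdf((c - a - e1) / sqrt(var1))
```

As a sanity check, plugging in the null moments (p1 = 1/2, p2 = p3 = 1/3) without continuity correction returns exactly α, as it should.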
Figure 30: Effect size dialog to calculate effect size d from the parameters of two independent random variables with equal variance.

22.2 Options

This test has no options.

22.3 Examples

22.4 Related tests

22.5 Implementation notes

The H0 distribution is the central Student t distribution t(Nk - 2, 0); the H1 distribution is the noncentral Student t distribution t(Nk - 2, δ), where k denotes the A.R.E. and the noncentrality parameter δ is given by:

  δ = d √(N1 N2 k / (N1 + N2))

22.6 Validation

The results were checked against the values produced by PASS (Hintze, 2006) and those produced by unifyPow (O'Brien, 1998). There was complete correspondence with the values given in O'Brien, while there were slight differences to those produced by PASS. The reason for these differences seems to be that PASS truncates the weighted sample sizes to integer values.
23 t test: Generic case
With generic t tests you can perform power analyses for
any test that depends on the t distribution. All parameters
of the noncentral t-distribution can be manipulated inde-
pendently. Note that with Generic t-Tests you cannot do
a priori power analyses, the reason being that there is no
definite association between N and df (the degrees of free-
dom). You need to tell G * Power the values of both N and
df explicitly.
23.2 Options
This test has no options.
23.3 Examples
To illustrate the usage we calculate with the generic t test the power of a one-sample t test. We assume N = 25, µ0 = 0, µ1 = 1, and σ = 4 and get the effect size d = (µ1 - µ0)/σ = (1 - 0)/4 = 0.25, the noncentrality parameter δ = d√N = 0.25·√25 = 1.25, and the degrees of freedom df = N - 1 = 24. We choose a post hoc analysis and a two-sided test. As result we get the power (1 - β) = 0.224525; this is exactly the same value we get from the specialized routine for this case in G*Power.
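The arithmetic of this example can be checked in a few lines. Since the noncentral t distribution is not in the Python standard library, the block below uses a rough normal approximation for the power; the exact value quoted above, 0.224525, comes from the noncentral t distribution itself:

```python
from math import sqrt
from statistics import NormalDist

# One-sample t test parameters from the example above
N, mu0, mu1, sigma = 25, 0.0, 1.0, 4.0
d = (mu1 - mu0) / sigma        # effect size d = 0.25
delta = d * sqrt(N)            # noncentrality parameter delta = 1.25
df = N - 1                     # degrees of freedom = 24

# Rough two-sided power via a normal approximation (slightly above
# the exact noncentral t value of 0.224525)
nd = NormalDist()
z = nd.inv_cdf(1 - 0.05 / 2)
power_approx = nd.cdf(-z - delta) + 1 - nd.cdf(z - delta)
```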
23.6 Validation
The results were checked against the values produced by
GPower 2.0.
24 χ² test: Variance - difference from constant (one sample case)

This procedure allows power analyses for the test that the population variance σ² of a normally distributed random variable has the specific value σ0². The null and (two-sided) alternative hypothesis of this test are:

  H0: σ - σ0 = 0
  H1: σ - σ0 ≠ 0.

The two-sided test ("two tails") should be used if there is no restriction on the sign of the deviation assumed in the alternative hypothesis. Otherwise use the one-sided test ("one tail").

24.1 Effect size index

The ratio σ²/σ0² of the variance assumed in H1 to the baseline variance is used as effect size measure. This ratio is 1 if H0 is true, that is, if both variances are identical. In an a priori analysis a ratio close or even identical to 1 would imply an exceedingly large sample size. Thus, G*Power prohibits inputs in the range [0.999, 1.001] in this case.

Pressing the button Determine on the left side of the effect size label in the main window opens the effect size drawer (see Fig. 31) that may be used to calculate the ratio from the two variances. Insert the baseline variance σ0² in the field variance V0 and the alternate variance in the field variance V1.

Figure 31: Effect size drawer to calculate variance ratios.

24.2 Options

24.3 Examples

We want to test whether the variance in a given population is clearly lower than σ0² = 1.5. In this application we use "σ² is less than 1" as criterion for "clearly lower". Inserting variance V0 = 1.5 and variance V1 = 1 in the effect size drawer we calculate as effect size a Ratio var1/var0 of 0.6666667.

How many subjects are needed to achieve the error levels α = 0.05 and β = 0.2 in this test? This question can be answered by using the following settings in G*Power:

• Select
  Type of power analysis: A priori

• Input
  Tail(s): One
  Ratio var1/var0: 0.6666667
  α err prob: 0.05
  Power (1-β): 0.80

• Output
  Lower critical χ²: 60.391478
  Upper critical χ²: 60.391478
  Df: 80
  Total sample size: 81
  Actual power: 0.803686

The output shows that using a one-sided test we need at least 81 subjects in order to achieve the desired level of the α and β error. To apply the test, we would estimate the variance s² from the sample of size N. The one-sided test would be significant at α = 0.05 if the statistic x = (N - 1)·s²/σ0² were lower than the critical value 60.39.

By setting "Tail(s) = Two", we can easily check that a two-sided test under the same conditions would have required a much larger sample size, namely N = 103, to achieve the error criteria.

24.4 Related tests

Similar tests in G*Power 3.0:

• Variance: Test of equality (two sample case).

24.5 Implementation notes

It is assumed that the population is normally distributed and that the mean is not known in advance but estimated from a sample of size N. Under these assumptions the H0 distribution of s²(N - 1)/σ0² is the central χ² distribution with N - 1 degrees of freedom (χ²_(N-1)), and the H1 distribution is the same central χ² distribution scaled with the variance ratio, that is, (σ²/σ0²)·χ²_(N-1).

24.6 Validation

The correctness of the results was checked against values produced by PASS (Hintze, 2006) and in a Monte Carlo simulation.
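The example can be verified with a short sketch. Because the χ² CDF and quantile function are not in the Python standard library, the Wilson-Hilferty normal approximation is used below; this approximation is our choice for the illustration, not what G*Power uses internally:

```python
from math import sqrt
from statistics import NormalDist

def chi2_cdf(x, k):
    """Wilson-Hilferty normal approximation to the chi-square CDF."""
    nd = NormalDist()
    return nd.cdf(((x / k) ** (1 / 3) - (1 - 2 / (9 * k))) / sqrt(2 / (9 * k)))

def chi2_ppf(p, k):
    """Inverse of the Wilson-Hilferty approximation."""
    nd = NormalDist()
    return k * (1 - 2 / (9 * k) + nd.inv_cdf(p) * sqrt(2 / (9 * k))) ** 3

N, ratio, alpha = 81, 2 / 3, 0.05      # values from the example above
df = N - 1
crit = chi2_ppf(alpha, df)             # lower critical chi^2, ~60.39
# H1 distribution is (var1/var0) * chi^2_df, so
# power = P(ratio * X < crit) = P(X < crit / ratio)
power = chi2_cdf(crit / ratio, df)     # ~0.8037
```

For df = 80 the approximation reproduces the critical value 60.39 and the actual power 0.8037 reported in the output list to about three decimal places.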
25 z test: Correlation - inequality of two independent Pearson r's

This procedure refers to tests of hypotheses concerning differences between two independent population correlation coefficients. The null hypothesis states that both correlation coefficients are identical: ρ1 = ρ2. The (two-sided) alternative hypothesis is that the correlation coefficients are different: ρ1 ≠ ρ2:

  H0: ρ1 - ρ2 = 0
  H1: ρ1 - ρ2 ≠ 0.

If the direction of the deviation ρ1 - ρ2 cannot be predicted a priori, a two-sided ('two-tailed') test should be used. Otherwise use a one-sided test.

25.1 Effect size index

The effect size index q is defined as a difference between two 'Fisher z'-transformed correlation coefficients: q = z1 - z2, with z1 = ln((1 + r1)/(1 - r1))/2, z2 = ln((1 + r2)/(1 - r2))/2. G*Power requires q to lie in the interval [-10, 10]. Cohen (1969, p. 109ff) defines the following effect size conventions for q:

• small q = 0.1
• medium q = 0.3
• large q = 0.5

Pressing the button Determine on the left side of the effect size label opens the effect size dialog:

We have two data sets, one using A with N1 = 51, and a second using B with N2 = 260. What is the power of a two-sided test for a difference between these correlations, if we set α = 0.05? We use the effect size drawer to calculate the effect size q from both correlations. Setting correlation coefficient ρ1 = 0.75 and correlation coefficient ρ2 = 0.88 yields q = -0.4028126. The input and outputs are as follows:

• Select
  Type of power analysis: Post hoc

• Input
  Tail(s): two
  Effect size q: -0.4028126
  α err prob: 0.05
  Sample size: 260
  Sample size: 51

• Output
  Critical z: -1.959964
  Power (1-β): 0.726352

The output shows that the power is about 0.726. This is very close to the result 0.72 given in Example 4.3 in Cohen (1988, p. 131), which uses the same input values. The small deviations are due to rounding errors in Cohen's analysis.

If we instead assume N1 = N2, how many subjects do we then need to achieve the same power? To answer this question we use an a priori analysis with the power calculated above as input and an Allocation ratio N2/N1 = 1 to enforce equal sample sizes. All other parameters are chosen identical to those used in the above case. The result is that we now need 84 cases in each group. Thus choosing equal sample sizes reduces the total sample size considerably, from (260 + 51) = 311 to 168 (84 + 84).
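This example can be reproduced with the classic large-sample z test for two independent correlations, in which the Fisher-z difference is divided by √(1/(N1-3) + 1/(N2-3)). A minimal sketch:

```python
from math import atanh, sqrt
from statistics import NormalDist

r1, r2 = 0.75, 0.88
n1, n2 = 51, 260
alpha = 0.05

q = atanh(r1) - atanh(r2)               # effect size q, ~ -0.4028126
se = sqrt(1 / (n1 - 3) + 1 / (n2 - 3))  # SD of z1 - z2
nd = NormalDist()
zc = nd.inv_cdf(1 - alpha / 2)
# two-sided power of the z test
power = nd.cdf(-zc - q / se) + 1 - nd.cdf(zc - q / se)   # ~0.7264
```

The result agrees with the output list above (power 0.726352) to six decimal places.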
26 z test: Correlation - inequality of two dependent Pearson r's

This procedure provides power analyses for tests of the hypothesis that two dependent Pearson correlation coefficients ρ_a,b and ρ_c,d are identical. The corresponding test statistics Z1* and Z2* were proposed by Dunn and Clark (1969) and are described in Eqns. (11) and (12) in Steiger (1980) (for more details on these tests, see the implementation notes below).

Two correlation coefficients ρ_a,b and ρ_c,d are dependent if at least one of the four possible correlation coefficients ρ_a,c, ρ_a,d, ρ_b,c and ρ_b,d between other pairs of the four data sets a, b, c, d is non-zero. Thus, in the general case where a, b, c, and d are different data sets, we do not only have to consider the two correlations under scrutiny but also four additional correlations.

In the special case where two of the data sets are identical, the two correlations are obviously always dependent, because at least one of the four additional correlations mentioned above is exactly 1. Two others of the four additional correlations are identical to the two correlations under test. Thus, there remains only one additional correlation that can be freely specified. In this special case we denote the H0 correlation ρ_a,b, the H1 correlation ρ_a,c and the additional correlation ρ_b,c.

It is convenient to describe these two cases by two corresponding correlation matrices (that may be sub-matrices of a larger correlation matrix): a 4 × 4 matrix in the general case of four different data sets ('no common index'):

  C1 = ( 1      ρ_a,b  ρ_a,c  ρ_a,d
         ρ_a,b  1      ρ_b,c  ρ_b,d
         ρ_a,c  ρ_b,c  1      ρ_c,d
         ρ_a,d  ρ_b,d  ρ_c,d  1     )

and a 3 × 3 matrix in the special case, where one of the data sets is identical in both correlations ('common index'):

  C2 = ( 1      ρ_a,b  ρ_a,c
         ρ_a,b  1      ρ_b,c
         ρ_a,c  ρ_b,c  1     )

Note: The values for ρ_x,y in matrices C1 and C2 cannot be chosen arbitrarily between -1 and 1. This is easily illustrated by considering the matrix C2: It should be obvious that we cannot, for instance, choose ρ_a,b = -ρ_a,c and ρ_b,c = 1.0, because the latter choice implies that the other two correlations are identical. It is, however, generally not that easy to decide whether a given matrix is a valid correlation matrix. In more complex cases the following formal criterion can be used: A given symmetric matrix is a valid correlation matrix if and only if the matrix is positive semi-definite, that is, if all eigenvalues of the matrix are non-negative.

The null hypothesis in the general case with 'no common index' states that ρ_a,b = ρ_c,d. The (two-sided) alternative hypothesis is that these correlation coefficients are different: ρ_a,b ≠ ρ_c,d:

  H0: ρ_a,b - ρ_c,d = 0
  H1: ρ_a,b - ρ_c,d ≠ 0.

Here, G*Power refers to the test Z2* described in Eq. (12) in Steiger (1980).

The null hypothesis in the special case of a 'common index' states that ρ_a,b = ρ_a,c. The (two-sided) alternative hypothesis is that these correlation coefficients are different: ρ_a,b ≠ ρ_a,c:

  H0: ρ_a,b - ρ_a,c = 0
  H1: ρ_a,b - ρ_a,c ≠ 0.

Here, G*Power refers to the test Z1* described in Eq. (11) in Steiger (1980).

If the direction of the deviation ρ_a,b - ρ_c,d (or ρ_a,b - ρ_a,c in the 'common index' case) cannot be predicted a priori, a two-sided ('two-tailed') test should be used. Otherwise use a one-sided test.

26.1 Effect size index

In this procedure the correlation coefficient assumed under H1 is used as effect size, that is ρ_c,d in the general case of 'no common index' and ρ_a,c in the special case of a 'common index'. To fully specify the effect size, the following additional inputs are required:

• ρ_a,b, the correlation coefficient assumed under H0, and
• all other relevant correlation coefficients that specify the dependency between the correlations assumed in H0 and H1: ρ_b,c in the 'common index' case, and ρ_a,c, ρ_a,d, ρ_b,c, ρ_b,d in the general case of 'no common index'.

G*Power requires the correlations assumed under H0 and H1 to lie in the interval [-1 + ε, 1 - ε], with ε = 10^-6, and the additional correlations to lie in the interval [-1, 1]. In a priori analyses zero effect sizes are not allowed, because this would imply an infinite sample size. In this case the additional restriction |ρ_a,b - ρ_c,d| > 10^-6 (or |ρ_a,b - ρ_a,c| > 10^-6) holds.

Why do we not use q, the effect size proposed by Cohen (1988) for the case of two independent correlations? The effect size q is defined as a difference between two 'Fisher z'-transformed correlation coefficients: q = z1 - z2, with z1 = ln((1 + r1)/(1 - r1))/2, z2 = ln((1 + r2)/(1 - r2))/2. The choice of q as effect size is sensible for tests of independent correlations, because in this case the power of the test does not depend on the absolute value of the correlation coefficient assumed under H0, but only on the difference q between the transformed correlations under H0 and H1. This is no longer true for dependent correlations and we therefore used the effect size described above. (See the implementation section for a more thorough discussion of these issues.)

Although the power is not strictly independent of the value of the correlation coefficient under H0, the deviations are usually relatively small and it may therefore be convenient to use the definition of q to specify the correlation under H1 for a given correlation under H0. In this way, one can relate to the effect size conventions for q defined by Cohen (1969, p. 109ff) for independent correlations:

• small q = 0.1
• medium q = 0.3
• large q = 0.5
The effect size drawer, which can be opened by pressing the button Determine on the left side of the effect size label, can be used to do this calculation:

The dialog performs the following transformation: r2 = (a - 1)/(a + 1), with a = exp[-2q + ln((1 + r1)/(1 - r1))], where q, r1 and r2 denote the effect size q and the correlation coefficients assumed under H0 and H1, respectively.

26.2 Options

This test has no options.

26.3 Examples

We assume the following correlation matrix in the population:

  Cp = ( 1      ρ_1,2  ρ_1,3  ρ_1,4
         ρ_1,2  1      ρ_2,3  ρ_2,4
         ρ_1,3  ρ_2,3  1      ρ_3,4
         ρ_1,4  ρ_2,4  ρ_3,4  1     )

     = ( 1     0.5   0.4   0.1
         0.5   1     0.2  -0.4
         0.4   0.2   1     0.8
         0.1  -0.4   0.8   1   )

26.3.1 General case: No common index

We want to perform an a priori analysis for a one-sided test whether ρ1,4 = ρ2,3 or ρ1,4 < ρ2,3 holds. With respect to the notation used in G*Power we have the following identities: a = 1, b = 4, c = 2, d = 3. Thus we get: H0 correlation ρ_a,b = ρ1,4 = 0.1, H1 correlation ρ_c,d = ρ2,3 = 0.2, and ρ_a,c = ρ1,2 = 0.5, ρ_a,d = ρ1,3 = 0.4, ρ_b,c = ρ4,2 = ρ2,4 = -0.4, ρ_b,d = ρ4,3 = ρ3,4 = 0.8.

We want to know how large our samples need to be in order to achieve the error levels α = 0.05 and β = 0.2. We choose the procedure 'Correlations: Two dependent Pearson r's (no common index)' and set:

• Select
  Type of power analysis: A priori

• Input
  Tail(s): one
  H1 corr r_cd: 0.2
  α err prob: 0.05
  Power (1-β err prob): 0.8
  H0 Corr r_ab: 0.1
  Corr r_ac: 0.5
  Corr r_ad: 0.4
  Corr r_bc: -0.4
  Corr r_bd: 0.8

• Output
  Critical z: 1.644854
  Sample Size: 886
  Actual Power: 0.800093

We find that the sample size in each group needs to be N = 886. How large would N be if we instead assume that r1,4 and r2,3 are independent, that is, that ρ_ac = ρ_ad = ρ_bc = ρ_bd = 0? To calculate this value, we may set the corresponding input fields to 0, or, alternatively, use the procedure for independent correlations with an allocation ratio N2/N1 = 1. In either case, we find a considerably larger sample size of N = 1183 per data set (i.e. we correlate data vectors of length N = 1183). This shows that the power of the test increases considerably if we take dependencies between correlation coefficients into account.

If we try to change the correlation ρ_bd from 0.8 to 0.9, G*Power shows an error message stating: 'The correlation matrix is not valid, that is, not positive semi-definite'. This indicates that the matrix Cp with ρ3,4 changed to 0.9 is not a possible correlation matrix.

26.3.2 Special case: Common index

Assuming again the population correlation matrix Cp shown above, we want to do an a priori analysis for the test whether ρ1,3 = ρ2,3 or ρ1,3 > ρ2,3 holds. With respect to the notation used in G*Power we have the following identities: a = 3 (the common index), b = 1 (the index of the second data set entering in the correlation assumed under H0, here ρ_a,b = ρ3,1 = ρ1,3), and c = 2 (the index of the remaining data set).

Thus, we have: H0 correlation ρ_a,b = ρ3,1 = ρ1,3 = 0.4, H1 correlation ρ_a,c = ρ3,2 = ρ2,3 = 0.2, and ρ_b,c = ρ1,2 = 0.5.

For this effect size we want to calculate how large our sample size needs to be, in order to achieve the error levels α = 0.05 and β = 0.2. We choose the procedure 'Correlations: Two dependent Pearson r's (common index)' and set:

• Select
  Type of power analysis: A priori

• Input
  Tail(s): one
  H1 corr r_ac: 0.2
  α err prob: 0.05
  Power (1-β err prob): 0.8
  H0 Corr r_ab: 0.4
  Corr r_bc: 0.5

• Output
  Critical z: 1.644854
  Sample Size: 144
  Actual Power: 0.801161

The answer is that we need sample sizes of 144 in each group (i.e. the correlations are calculated between data vectors of length N = 144).
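Whether a candidate matrix like Cp is a valid correlation matrix can be checked in a few lines. G*Power presumably tests the eigenvalues as described above; the equivalent pure-Python sketch below instead uses the pivots of a Cholesky factorization (for a symmetric matrix, non-negative pivots imply positive semi-definiteness; exactly-singular edge cases would need more careful pivoting):

```python
def is_valid_correlation_matrix(C, tol=1e-12):
    """Cholesky-based positive semi-definiteness check."""
    n = len(C)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                pivot = C[i][i] - s
                if pivot < -tol:
                    return False          # negative pivot: not PSD
                L[i][i] = max(pivot, 0.0) ** 0.5
            else:
                L[i][j] = (C[i][j] - s) / L[j][j] if L[j][j] > tol else 0.0
    return True

Cp = [[1.0,  0.5,  0.4,  0.1],
      [0.5,  1.0,  0.2, -0.4],
      [0.4,  0.2,  1.0,  0.8],
      [0.1, -0.4,  0.8,  1.0]]
```

With ρ3,4 = 0.8 the check succeeds; raising ρ3,4 to 0.9 (in both symmetric positions) makes it fail, matching the error message described in 26.3.1.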
26.3.3 Sensitivity analyses

We now assume a scenario that is identical to that described above, with the exception that ρ_b,c = -0.6. We want to know what the minimal H1 correlation ρ_a,c is that we can detect with α = 0.05 and β = 0.2 given a sample size N = 144. In this sensitivity analysis, we have in general two possible solutions, namely one for ρ_a,c ≤ ρ_a,b and one for ρ_a,c ≥ ρ_a,b. The relevant settings for the former case are:

• Select
  Type of power analysis: Sensitivity

• Input
  Tail(s): one
  Effect direction: r_ac ≤ r_ab
  α err prob: 0.05
  Power (1-β err prob): 0.8
  Sample Size: 144
  H0 Corr r_ab: 0.4
  Corr r_bc: -0.6

• Output
  Critical z: -1.644854
  H1 corr r_ac: 0.047702

The result is that the error levels are as requested or lower if the H1 correlation ρ_a,c is equal to or lower than 0.047702.

We now try to find the corresponding H1 correlation that is larger than ρ_a,b = 0.4. To this end, we change the effect size direction in the settings shown above, that is, we choose r_ac ≥ r_ab. In this case, however, G*Power shows an error message, indicating that no solution was found. The reason is that there is no H1 correlation ρ_a,c ≥ ρ_a,b that leads to a valid (i.e. positive semi-definite) correlation matrix and simultaneously ensures the requested error levels. To indicate a missing result the output for the H1 correlation is set to the nonsensical value 2.

G*Power checks in both the 'common index' case and the general case with no common index whether the correlation matrix is valid and outputs a H1 correlation of 2 if no solution is found. This is also true in the XY-plot if the H1 correlation is the dependent variable: Combinations of input values for which no solution can be found show up with a nonsensical value of 2.

26.3.4 Using the effect size dialog

26.4 Related tests

• Correlations: Difference from constant (one sample case)
• Correlations: Point biserial model
• Correlations: Two independent Pearson r's (two sample case)

26.5 Implementation notes

26.5.1 Background

Let X1, ..., XK denote multinormally distributed random variables with mean vector µ and covariance matrix C. A sample of size N from this K-dimensional distribution leads to an N × K data matrix, and pair-wise correlation of all columns to a K × K correlation matrix. By drawing M samples of size N one can compute M such correlation matrices, and one can determine the variances σ²_(a,b) of the sample of M correlation coefficients r_a,b, and the covariances σ_(a,b;c,d) between samples of size M of two different correlations r_a,b, r_c,d. For M → ∞, the elements of Ψ, which denotes the asymptotic variance-covariance matrix of the correlations times N, are given by [see Eqns (1) and (2) in Steiger (1980)]:

  Ψ_(a,b;a,b) = Nσ²_(a,b) = (1 - ρ²_a,b)²     (7)

  Ψ_(a,b;c,d) = Nσ_(a,b;c,d)     (8)
    = [(ρ_a,c - ρ_a,b ρ_b,c)(ρ_b,d - ρ_b,c ρ_c,d)
     + (ρ_a,d - ρ_a,c ρ_c,d)(ρ_b,c - ρ_a,b ρ_a,c)
     + (ρ_a,c - ρ_a,d ρ_c,d)(ρ_b,d - ρ_a,b ρ_a,d)
     + (ρ_a,d - ρ_a,b ρ_b,d)(ρ_b,c - ρ_b,d ρ_c,d)]/2

When two correlations have an index in common, the expression given in Eq. (8) simplifies to [see Eq (3) in Steiger (1980)]:

  Ψ_(a,b;a,c) = Nσ_(a,b;a,c)     (9)
    = ρ_b,c(1 - ρ²_a,b - ρ²_a,c) - ρ_a,b ρ_a,c(1 - ρ²_a,c - ρ²_a,b - ρ²_b,c)/2

If the raw sample correlations r_a,b are transformed by the Fisher r-to-z transform:

  z_a,b = ln[(1 + r_a,b)/(1 - r_a,b)]/2

then the elements of the variance-covariance matrix times (N - 3) of the transformed raw correlations are [see Eqs. (9)-(11) in Steiger (1980)]:

  c_(a,b;a,b) = (N - 3)σ_(z_a,b;z_a,b) = 1
  c_(a,b;c,d) = (N - 3)σ_(z_a,b;z_c,d) = Ψ_(a,b;c,d) / [(1 - ρ²_a,b)(1 - ρ²_c,d)]
  c_(a,b;a,c) = (N - 3)σ_(z_a,b;z_a,c) = Ψ_(a,b;a,c) / [(1 - ρ²_a,b)(1 - ρ²_a,c)]

26.5.2 Test statistics

The test statistics proposed by Dunn and Clark (1969) are [see Eqns (12) and (13) in Steiger (1980)]:

  Z1* = (z_a,b - z_a,c) / √[(2 - 2s_(a,b;a,c))/(N - 3)]
  Z2* = (z_a,b - z_c,d) / √[(2 - 2s_(a,b;c,d))/(N - 3)]

where s_(a,b;a,c) and s_(a,b;c,d) denote sample estimates of the covariances c_(a,b;a,c) and c_(a,b;c,d) between the transformed correlations, respectively.

Note: 1. The SD of the difference z_a,b - z_a,c given in the denominator of the formula for Z1* depends on the values of ρ_a,b and ρ_a,c; the same holds analogously for Z2*. 2. The only difference between Z2* and the z-statistic used for independent correlations is that in the latter the covariance s_(a,b;c,d) is assumed to be zero.
26.5.3 Central and noncentral distributions in power calculations

For the special case with a 'common index' the H0 distribution is the standard normal distribution N(0, 1) and the H1 distribution the normal distribution N(m1, s1), with

  s0 = √[(2 - 2c0)/(N - 3)], with c0 = c_(a,b;a,c) for H0, i.e. ρ_a,c = ρ_a,b
  m1 = (z_a,b - z_a,c)/s0
  s1 = √[(2 - 2c1)/(N - 3)] / s0, with c1 = c_(a,b;a,c) for H1
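Putting Eq. (9) and the H0/H1 distributions above together, the power of the one-sided common-index test can be sketched as follows. This is our reading of the formulas, not G*Power's actual source:

```python
from math import atanh, sqrt
from statistics import NormalDist

def cov_z(r_ab, r_ac, r_bc):
    """c_(a,b;a,c): covariance of the Fisher-transformed correlations
    in the common-index case (Steiger, 1980, Eq. (3) scaled)."""
    psi = (r_bc * (1 - r_ab ** 2 - r_ac ** 2)
           - r_ab * r_ac * (1 - r_ac ** 2 - r_ab ** 2 - r_bc ** 2) / 2)
    return psi / ((1 - r_ab ** 2) * (1 - r_ac ** 2))

def power_common_index(r_ab, r_ac, r_bc, N, alpha=0.05):
    """One-sided power of Z1* from the H0/H1 normal distributions."""
    nd = NormalDist()
    c0 = cov_z(r_ab, r_ab, r_bc)          # H0: r_ac = r_ab
    c1 = cov_z(r_ab, r_ac, r_bc)          # H1
    s0 = sqrt((2 - 2 * c0) / (N - 3))
    m1 = (atanh(r_ab) - atanh(r_ac)) / s0
    s1 = sqrt((2 - 2 * c1) / (N - 3)) / s0
    zc = nd.inv_cdf(1 - alpha)
    return 1 - nd.cdf((zc - m1) / s1)
```

For the common-index example of section 26.3.2 (ρ_a,b = 0.4, ρ_a,c = 0.2, ρ_b,c = 0.5, N = 144) this yields a power of about 0.8012, close to G*Power's 0.801161.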
26.6 Validation
The results were checked against Monte-Carlo simulations.
27 Z test: Multiple Logistic Regression

A logistic regression model describes the relationship between a binary response variable Y (with Y = 0 and Y = 1 denoting non-occurrence and occurrence of an event, respectively) and one or more independent variables (covariates or predictors) Xi. The variables Xi are themselves random variables with probability density function f_X(x) (or probability distribution f_X(x) for discrete X).

In a simple logistic regression with one covariate X the assumption is that the probability of an event P = Pr(Y = 1) depends on X in the following way:

  P = e^(β0 + β1 x) / (1 + e^(β0 + β1 x)) = 1 / (1 + e^-(β0 + β1 x))

For β1 ≠ 0 and continuous X this formula describes a smooth S-shaped transition of the probability for Y = 1 from 0 to 1 (β1 > 0) or from 1 to 0 (β1 < 0) with increasing x. This transition gets steeper with increasing β1. Rearranging the formula leads to: log(P/(1 - P)) = β0 + β1 X. This shows that the logarithm of the odds P/(1 - P), also called a logit, on the left side of the equation is linear in X. Here, β1 is the slope of this linear relationship.

The interesting question is whether the covariate Xi is related to Y or not. Thus, in a simple logistic regression model, the null and alternative hypothesis for a two-sided test are:

  H0: β1 = 0
  H1: β1 ≠ 0.

The procedures implemented in G*Power for this case estimate the power of the Wald test. The standard normally distributed test statistic of the Wald test is:

  z = β̂1 / SE(β̂1) = β̂1 / √(var(β̂1)/N)

where β̂1 is the maximum likelihood estimator for parameter β1 and var(β̂1) the variance of this estimate.

In a multiple logistic regression model log(P/(1 - P)) = β0 + β1 x1 + ... + βp xp the effect of a specific covariate in the presence of other covariates is tested. In this case the null hypothesis is H0: [β1, β2, ..., βp] = [0, β2, ..., βp] and the alternative H1: [β1, β2, ..., βp] = [β̂, β2, ..., βp], where β̂ ≠ 0.

27.1 Effect size index

In the simple logistic model the effect of X on Y is given by the size of the parameter β1. Let p1 denote the probability of an event under H0, that is exp(β0) = p1/(1 - p1), and p2 the probability of an event under H1 at X = 1, that is exp(β0 + β1) = p2/(1 - p2). Then exp(β0 + β1)/exp(β0) = exp(β1) = [p2/(1 - p2)]/[p1/(1 - p1)] := odds ratio OR, which implies β1 = log[OR].

Given the probability p1 (input field Pr(Y=1|X=1) H0) the effect size is specified either directly by p2 (input field Pr(Y=1|X=1) H1) or optionally by the odds ratio (OR) (input field Odds ratio). Setting p2 = p1 or equivalently OR = 1 implies β1 = 0 and thus an effect size of zero. An effect size of zero must not be used in a priori analyses.

Besides these values the following additional inputs are needed:

• R² other X.
  In models with more than one covariate, the influence of the other covariates X2, ..., Xp on the power of the test can be taken into account by using a correction factor. This factor depends on the proportion R² = ρ²_(1·23...p) of the variance of X1 explained by the regression relationship with X2, ..., Xp. If N is the sample size considering X1 alone, then the sample size in a setting with additional covariates is: N' = N/(1 - R²). This correction for the influence of other covariates has been proposed by Hsieh, Bloch, and Larsen (1998). R² must lie in the interval [0, 1].

• X distribution:
  Distribution of the Xi. There are 7 options:

  1. Binomial [P(k) = (N choose k) π^k (1 - π)^(N-k), where k is the number of successes (X = 1) in N trials of a Bernoulli process with probability of success π, 0 < π < 1]
  2. Exponential [f(x) = (1/λ)e^(-x/λ), exponential distribution with parameter λ > 0]
  3. Lognormal [f(x) = 1/(xσ√(2π)) exp[-(ln x - µ)²/(2σ²)], lognormal distribution with parameters µ and σ > 0]
  4. Normal [f(x) = 1/(σ√(2π)) exp[-(x - µ)²/(2σ²)], normal distribution with parameters µ and σ > 0]
  5. Poisson [P(X = k) = (λ^k/k!)e^(-λ), Poisson distribution with parameter λ > 0]
  6. Uniform [f(x) = 1/(b - a) for a ≤ x ≤ b, f(x) = 0 otherwise, continuous uniform distribution in the interval [a, b], a < b]
  7. Manual [allows you to manually specify the variance of β̂ under H0 and H1]

  G*Power provides two different types of procedure to calculate power: an enumeration procedure and large sample approximations. The Manual mode is only available in the large sample procedures.

27.2 Options

Input mode: You can choose between two input modes for the effect size: The effect size may be given by either specifying the two probabilities p1, p2 defined above, or instead by specifying p1 and the odds ratio OR.

Procedure: G*Power provides two different types of procedure to estimate power: an "enumeration procedure" proposed by Lyles, Lin, and Williamson (2007) and large sample approximations. The enumeration procedure seems to provide reasonably accurate results over a wide range of situations, but it can be rather slow and may need large amounts of memory. The large sample approximations are much faster. Results of Monte-Carlo simulations indicate that the accuracy of the procedures proposed by Demidenko (2007) and Hsieh et al. (1998) are comparable to that of the enumeration procedure for N > 200. The procedure based on the work of Demidenko (2007) is more general and
slightly more accurate than that proposed by Hsieh et al. (1998). We thus recommend using the procedure proposed by Demidenko (2007) as the standard procedure. The enumeration procedure of Lyles et al. (2007) may be used to validate the results (if the sample size is not too large). It must also be used if one wants to compute the power for likelihood ratio tests.

1. The enumeration procedure provides power analyses for the Wald test and the likelihood ratio test. The general idea is to construct an exemplary data set with weights that represent response probabilities given the assumed values of the parameters of the X-distribution. Then a fit procedure for the generalized linear model is used to estimate the variance of the regression weights (for Wald tests) or the likelihood ratio under H0 and H1 (for likelihood ratio tests). The size of the exemplary data set increases with N, and the enumeration procedure may thus be rather slow (and may need large amounts of computer memory) for large sample sizes. The procedure is especially slow for analysis types other than "post hoc", which internally call the power routine several times. By specifying a threshold sample size N you can restrict the use of the enumeration procedure to sample sizes < N. For sample sizes ≥ N the large sample approximation selected in the option dialog is used. Note: If a computation takes too long you can abort it by pressing the ESC key.

2. G*Power provides two different large sample approximations for a Wald-type test. Both rely on the asymptotic normal distribution of the maximum likelihood estimator for parameter β1 and are related to the method described by Whittemore (1981). The accuracy of these approximations increases with sample size, but the deviation from the true power may be quite noticeable for small and moderate sample sizes. This is especially true for X-distributions that are not symmetric about the mean, i.e. the lognormal, exponential, and Poisson distributions, and the binomial distribution with π ≠ 1/2. The approach of Hsieh et al. (1998) is restricted to binary covariates and covariates with a standard normal distribution. The approach based on Demidenko (2007) is more general and usually more accurate and is recommended as the standard procedure. For this test, a variance correction option can be selected that compensates for variance distortions that may occur in skewed X distributions (see implementation notes). If the Hsieh procedure is selected, the program automatically switches to the procedure of Demidenko if a distribution other than the standard normal or the binomial distribution is selected.

27.3 Possible problems

As illustrated in Fig. 32, the power of the test does not always increase monotonically with effect size, and the maximum attainable power is sometimes less than 1. In particular, this implies that in a sensitivity analysis the requested power cannot always be reached. From version 3.1.8 on, G*Power returns in these cases the effect size which maximizes the power in the selected direction (output field "Actual power"). For an overview of possible problems, we recommend checking the dependence of power on effect size in the plot window.

Covariates with a lognormal distribution are especially problematic, because this distribution may have a very long tail (large positive skew) for larger values of µ and σ and may thus easily lead to numerical problems. In version 3.1.8 the numerical stability of the procedure has been considerably improved. In addition, the power with highly skewed distributions may behave in an unintuitive manner, and you should therefore check such cases carefully.

27.4 Examples

We first consider a model with a single predictor X, which is normally distributed with µ = 0 and σ = 1. We assume that the event rate under H0 is p1 = 0.5 and that the event rate under H1 is p2 = 0.6 for X = 1. The odds ratio is then OR = (0.6/0.4)/(0.5/0.5) = 1.5, and we have β1 = log(OR) ≈ 0.405. We want to estimate the sample size necessary to achieve, in a two-sided test with α = 0.05, a power of at least 0.95. We want to specify the effect size in terms of the odds ratio. When using the procedure of Hsieh et al. (1998), the input and output are as follows:

• Select
  Statistical test: Logistic Regression
  Type of power analysis: A priori

• Options:
  Effect size input mode: Odds ratio
  Procedure: Hsieh et al. (1998)

• Input
  Tail(s): Two
  Odds ratio: 1.5
  Pr(Y=1|X=1) H0: 0.5
  α err prob: 0.05
  Power (1-β err prob): 0.95
  R² other X: 0
  X distribution: Normal
  X parm µ: 0
  X parm σ: 1

• Output
  Critical z: 1.959964
  Total sample size: 317
  Actual power: 0.950486

The results indicate that the necessary sample size is 317. This result replicates the value in Table II of Hsieh et al. (1998) for the same scenario. Using the other large sample approximation proposed by Demidenko (2007), we instead get N = 337 with variance correction and N = 355 without. In the enumeration procedure proposed by Lyles et al. (2007) the χ²-statistic is used, and the output is

• Output
  Noncentrality parameter λ: 13.029675
  Critical χ²: 3.841459
  Df: 1
  Total sample size: 358
  Actual power: 0.950498
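The a priori result above (N = 317) can be checked directly against Eqn (1) of Hsieh et al. (1998), which is reproduced in the implementation notes. The following Python sketch is an illustration, not part of G*Power; the function name is ours, and the z-quantiles come from Python's statistics module rather than from the program:

```python
import math
from statistics import NormalDist

def hsieh_n_continuous(p1, p2, alpha, power):
    """Eqn (1) of Hsieh et al. (1998): a priori sample size for a
    standard-normally distributed covariate in simple logistic regression
    (two-sided test). p1 and p2 are the event rates at the mean of X and
    one SD above the mean, respectively."""
    z = NormalDist().inv_cdf
    # tested effect size: the log odds ratio (its sign cancels when squared)
    beta_hat = math.log((p1 / (1 - p1)) / (p2 / (1 - p2)))
    n = (z(1 - alpha / 2) + z(power)) ** 2 / (p1 * (1 - p1) * beta_hat ** 2)
    return math.ceil(n)

# Example from the text: p1 = 0.5, p2 = 0.6, so OR = 1.5 and beta1 = log(1.5)
odds_ratio = (0.6 / 0.4) / (0.5 / 0.5)
n = hsieh_n_continuous(0.5, 0.6, 0.05, 0.95)  # 317, as in the G*Power output
```

Rounding the raw value (about 316.2) up to the next integer reproduces the total sample size 317 reported above.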
Thus, this routine estimates the minimum sample size in this case to be N = 358.

In a Monte-Carlo simulation of the Wald test in the above scenario with 50000 independent cases, we found a mean power of 0.940, 0.953, 0.962, and 0.963 for sample sizes 317, 337, 355, and 358, respectively. This indicates that in this case the method based on Demidenko (2007) with variance correction yields the best approximation.

We now assume that we have additional covariates and estimate the squared multiple correlation with these other covariates to be R² = 0.1. All other conditions are identical. The only change we need to make is to set the input field R² other X to 0.1. Under this condition the necessary sample size increases from 337 to a value of 395 when using the procedure of Demidenko (2007) with variance correction.

As an example for a model with one binary covariate X, we choose the values of the fourth example in Table I of Hsieh et al. (1998). That is, we assume that the event rate under H0 is p1 = 0.05, and that the event rate under H1 with X = 1 is p2 = 0.1. We further assume a balanced design (π = 0.5) with equal sample frequencies for X = 0 and X = 1. Again we want to estimate the sample size necessary to achieve, in a two-sided test with α = 0.05, a power of at least 0.95. We want to specify the effect size directly in terms of p1 and p2:

...sample size further to 2368 (in both cases the Demidenko procedure with variance correction was used). These examples demonstrate the fact that a balanced design requires a smaller sample size than an unbalanced design, and a low prevalence rate requires a smaller sample size than a high prevalence rate (Hsieh et al., 1998, p. 1625).

27.5 Related tests

• Poisson regression

27.6 Implementation notes

27.6.1 Enumeration procedure

The procedures for the Wald and likelihood ratio tests are implemented exactly as described in Lyles et al. (2007).

27.6.2 Large sample approximations

The large sample procedures for the univariate case are both related to the approach outlined in Whittemore (1981). The correction for additional covariates has been proposed by Hsieh et al. (1998). As large sample approximations, they get more accurate for larger sample sizes.
...given by the inverse of the (m + 1) × (m + 1) Fisher information matrix I. The (i,j)th element of I is given by

  Iij = −E[∂² log L / (∂βi ∂βj)]
      = N · E[Xi Xj · exp(β0 + β1X1 + . . . + βmXm) / (1 + exp(β0 + β1X1 + . . . + βmXm))²]

Thus, in the case of one continuous predictor, I is a 2 × 2 matrix with elements

  I00 = ∫ GX(x) dx
  I10 = I01 = ∫ x · GX(x) dx
  I11 = ∫ x² · GX(x) dx

(all integrals run from −∞ to +∞), with

  GX(x) := fX(x) · exp(β0 + β1x) / (1 + exp(β0 + β1x))²,

where fX(x) is the PDF of the X distribution (for discrete predictors, the integrals must be replaced by corresponding sums). The element M11 of the inverse of this matrix (M = I⁻¹), that is, the variance of β1, is given by M11 = Var(β1) = I00/(I00·I11 − I01²). In G*Power, numerical integration is used to compute these integrals.

To estimate the variance of β̂1 under H1, the parameters β0 and β1 in the equations for Iij are chosen as implied by the input, that is, β0 = log[p1/(1 − p1)], β1 = log[OR]. To estimate the variance under H0, one chooses β1 = 0 and β0 = β0*, where β0* is chosen as defined above.

Hsieh et al. procedure  The procedures proposed in Hsieh et al. (1998) are used. The sample size formula for continuous, normally distributed covariates is [Eqn (1) in Hsieh et al. (1998)]:

  N = (z1−α/2 + z1−β)² / (p1(1 − p1) · β̂²)

where β̂ = log([p1/(1 − p1)]/[p2/(1 − p2)]) is the tested effect size, and p1, p2 are the event rates at the mean of X and one SD above the mean, respectively.

For binary covariates the sample size formula is [Eqn (2) in Hsieh et al. (1998)]:

  N = [z1−α √(p̄q̄/B) + z1−β √(p1q1 + p2q2(1 − B)/B)]² / ((p1 − p2)²(1 − B))

where q1 = 1 − p1, q2 = 1 − p2, p̄ = (1 − B)p1 + Bp2, q̄ = 1 − p̄, and B is the proportion of the sample with X = 1.

27.7 Validation

To check the correct implementation of the procedure proposed by Hsieh et al. (1998), we replicated all examples presented in Tables I and II of Hsieh et al. (1998). The single deviation found for the first example in Table I on p. 1626 (sample size of 1281 instead of 1282) is probably due to rounding errors. Further checks were made against the corresponding routine in PASS (Hintze, 2006), and we usually found complete correspondence. For multiple logistic regression models with R² other X > 0, however, our values deviated slightly from the results of PASS. We believe that our results are correct. There are some indications that the reason for these deviations is that PASS internally rounds or truncates sample sizes to integer values.

To validate the procedures of Demidenko (2007) and Lyles et al. (2007), we conducted Monte-Carlo simulations of Wald tests for a range of scenarios. In each case 150000 independent cases were used. This large number of cases is necessary to get about 3 digits precision. In our experience, the common practice of using only 5000, 2000, or even 1000 independent cases in simulations (Hsieh, 1989; Lyles et al., 2007; Shieh, 2001) may lead to rather imprecise and thus misleading power estimates.

Table 1 shows the errors in the power estimates for different procedures. The labels "Dem(c)" and "Dem" denote the procedure of Demidenko (2007) with and without variance correction; the labels "LLW(W)" and "LLW(L)" denote the procedure of Lyles et al. (2007) for the Wald test and the likelihood ratio test, respectively. All six predefined distributions were tested (the parameters are given in the table head). The following combinations of Pr(Y=1|X=1) under H0 and sample size were used: (0.5, 200), (0.2, 300), (0.1, 400), (0.05, 600), (0.02, 1000). These values were fully crossed with four odds ratios (1.3, 1.5, 1.7, 2.0) and two alpha values (0.01, 0.05). Max and mean errors were calculated for all power values < 0.999. The results show that the precision of the procedures depends on the X distribution. The procedure of Demidenko (2007) with the variance correction proposed here predicted the simulated power values best.
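Eqn (2) can likewise be sketched in a few lines. This is an illustration of the formula as stated above, not G*Power code; the function name and the use of ceil-rounding are our assumptions, and for a two-sided test z1−α is replaced by z1−α/2:

```python
import math
from statistics import NormalDist

def hsieh_n_binary(p1, p2, B, alpha, power, two_sided=True):
    """Eqn (2) of Hsieh et al. (1998): sample size for a binary covariate.
    p1, p2 are the event rates for X = 0 and X = 1, B = Pr(X = 1),
    q = 1 - p, and pbar/qbar are the overall (weighted) rates."""
    z = NormalDist().inv_cdf
    za = z(1 - alpha / 2) if two_sided else z(1 - alpha)
    zb = z(power)
    q1, q2 = 1 - p1, 1 - p2
    pbar = (1 - B) * p1 + B * p2
    qbar = 1 - pbar
    num = (za * math.sqrt(pbar * qbar / B)
           + zb * math.sqrt(p1 * q1 + p2 * q2 * (1 - B) / B)) ** 2
    return math.ceil(num / ((p1 - p2) ** 2 * (1 - B)))

# Binary-covariate scenario from the examples: p1 = 0.05, p2 = 0.1, balanced
# design B = 0.5, two-sided alpha = 0.05, power 0.95
n = hsieh_n_binary(0.05, 0.1, 0.5, 0.05, 0.95)  # 1437 for these inputs
```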
max error
procedure tails bin(0.3) exp(1) lognorm(0,1) norm(0,1) poisson(1) uni(0,1)
Dem(c) 1 0.0132 0.0279 0.0309 0.0125 0.0185 0.0103
Dem(c) 2 0.0125 0.0326 0.0340 0.0149 0.0199 0.0101
Dem 1 0.0140 0.0879 0.1273 0.0314 0.0472 0.0109
Dem 2 0.0145 0.0929 0.1414 0.0358 0.0554 0.0106
LLW(W) 1 0.0144 0.0878 0.1267 0.0346 0.0448 0.0315
LLW(W) 2 0.0145 0.0927 0.1407 0.0399 0.0541 0.0259
LLW(L) 1 0.0174 0.0790 0.0946 0.0142 0.0359 0.0283
LLW(L) 2 0.0197 0.0828 0.1483 0.0155 0.0424 0.0232
mean error
procedure tails binomial exp lognorm norm poisson uni
Dem(c) 1 0.0045 0.0113 0.0120 0.0049 0.0072 0.0031
Dem(c) 2 0.0064 0.0137 0.0156 0.0061 0.0083 0.0039
Dem 1 0.0052 0.0258 0.0465 0.0069 0.0155 0.0035
Dem 2 0.0049 0.0265 0.0521 0.0080 0.0154 0.0045
LLW(W) 1 0.0052 0.0246 0.0469 0.0078 0.0131 0.0111
LLW(W) 2 0.0049 0.0253 0.0522 0.0092 0.0139 0.0070
LLW(L) 1 0.0074 0.0174 0.0126 0.0040 0.0100 0.0115
LLW(L) 2 0.0079 0.0238 0.0209 0.0047 0.0131 0.0073
Table 1: Results of simulation. Shown are the maximum and mean error in power for different procedures (see text).
Figure 32: Results of a simulation study investigating the power as a function of effect size for logistic regression with covariate X ~ Lognormal(0,1), p1 = 0.2, N = 100, α = 0.05, one-sided. The plot demonstrates that the method of Demidenko (2007), especially if combined with variance correction, predicts the power values in the simulation rather well. Note also that here the power does not increase monotonically with effect size (on the left side from Pr(Y = 1|X = 0) = p1 = 0.2) and that the maximum attainable power may be restricted to values clearly below one.
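The numerical integration used in the implementation notes to obtain Var(β1) can be sketched as follows. This is an illustration with a simple midpoint rule and a standard normal covariate, not G*Power's actual integration routine; the function name is ours:

```python
import math

def var_beta1(beta0, beta1, lo=-12.0, hi=12.0, n=4000):
    """Per-observation variance M11 = I00/(I00*I11 - I01^2) of beta1-hat in
    simple logistic regression with X ~ N(0,1), obtained by numerically
    integrating the Fisher information elements I00, I01, I11 with the
    weight G_X(x) = f_X(x)*exp(b0+b1*x)/(1+exp(b0+b1*x))^2 (midpoint rule).
    The variance of the estimator itself is M11/N."""
    h = (hi - lo) / n
    I00 = I01 = I11 = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        fx = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)  # N(0,1) pdf
        e = math.exp(beta0 + beta1 * x)
        g = fx * e / (1.0 + e) ** 2
        I00 += g * h
        I01 += x * g * h
        I11 += x * x * g * h
    return I00 / (I00 * I11 - I01 ** 2)

# Sanity check: for beta0 = beta1 = 0, G_X(x) = phi(x)/4, hence
# I00 = I11 = 1/4, I01 = 0, and Var(beta1) = 4.
v0 = var_beta1(0.0, 0.0)  # ~4.0
```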
28 Z test: Poisson Regression

A Poisson regression model describes the relationship between a Poisson distributed response variable Y (a count) and one or more independent variables (covariates or predictors) Xi, which are themselves random variables with probability density fX(x). The probability of y events during a fixed 'exposure time' t is:

  Pr(Y = y | λ, t) = e^(−λt) (λt)^y / y!

It is assumed that the parameter λ of the Poisson distribution, the mean incidence rate of an event during exposure time t, is a function of the Xi's. In the Poisson regression model considered here, λ depends on the covariates Xi in the following way:

  λ = exp(β0 + β1X1 + β2X2 + · · · + βmXm)

where β0, . . . , βm denote regression coefficients that are estimated from the data (Frome, 1986).

In a simple Poisson regression with just one covariate X1, the procedure implemented in G*Power estimates the power of the Wald test, applied to decide whether covariate X1 has an influence on the event rate or not. That is, the null and alternative hypotheses for a two-sided test are:

  H0: β1 = 0
  H1: β1 ≠ 0.

The standard normally distributed test statistic of the Wald test is:

  z = β̂1 / √(var(β̂1)/N) = β̂1 / SE(β̂1)

where β̂1 is the maximum likelihood estimator for parameter β1 and var(β̂1) is the variance of this estimate.

In a multiple Poisson regression model λ = exp(β0 + β1X1 + · · · + βmXm), m > 1, the effect of a specific covariate in the presence of other covariates is tested. In this case the null and alternative hypotheses are:

  H0: [β1, β2, . . . , βm] = [0, β2, . . . , βm]
  H1: [β1, β2, . . . , βm] = [β1*, β2, . . . , βm]

where β1* > 0.

28.1 Effect size index

The effect size is specified by the ratio R of λ under H1 to λ under H0:

  R = exp(β0 + β1X1) / exp(β0) = exp(β1X1).

The following additional inputs are needed:

• Exp(b1). This is the value of the λ-ratio R defined above for X = 1, that is, the relative increase of the event rate over the base event rate exp(β0) assumed under H0 if X is increased by one unit. If, for instance, a 10% increase over the base rate is assumed when X is increased by one unit, this value is set to (100+10)/100 = 1.1. An input of exp(β1) = 1 corresponds to "no effect" and must not be used in a priori calculations.

• Base rate exp(b0). This is the mean event rate assumed under H0. It must be greater than 0.

• Mean exposure. This is the time unit during which the events are counted. It must be greater than 0.

• R² other X. In models with more than one covariate, the influence of the other covariates X2, . . . , Xp on the power of the test can be taken into account by using a correction factor. This factor depends on the proportion R² = ρ²₁·₂₃...ₚ of the variance of X1 explained by the regression relationship with X2, . . . , Xp. If N is the sample size considering X1 alone, then the sample size in a setting with additional covariates is N' = N/(1 − R²). This correction for the influence of other covariates is identical to that proposed by Hsieh et al. (1998) for logistic regression. In line with the interpretation of R² as a squared correlation, it must lie in the interval [0, 1].

• X distribution. Distribution of the Xi. There are 7 options:

  1. Binomial [P(k) = (N choose k) π^k (1 − π)^(N−k), where k is the number of successes (X = 1) in N trials of a Bernoulli process with probability of success π, 0 < π < 1]
  2. Exponential [f(x) = (1/λ)e^(−x/λ), exponential distribution with parameter λ > 0]
  3. Lognormal [f(x) = 1/(xσ√(2π)) · exp[−(ln x − µ)²/(2σ²)], lognormal distribution with parameters µ and σ > 0]
  4. Normal [f(x) = 1/(σ√(2π)) · exp[−(x − µ)²/(2σ²)], normal distribution with parameters µ and σ > 0]
  5. Poisson [P(X = k) = (λ^k/k!)e^(−λ), Poisson distribution with parameter λ > 0]
  6. Uniform [f(x) = 1/(b − a) for a ≤ x ≤ b, f(x) = 0 otherwise, continuous uniform distribution in the interval [a, b], a < b]
  7. Manual [allows one to manually specify the variance of β̂ under H0 and H1]

G*Power provides two different procedures, an enumeration procedure and a large sample approximation, to calculate power. The manual mode is only available in the large sample procedure.
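The model just described can be made concrete in a few lines. This sketch simply evaluates the Poisson probability and the rate function; the function names are ours, not G*Power's:

```python
import math

def poisson_pmf(y, lam, t=1.0):
    """Pr(Y = y | lambda, t) = exp(-lambda*t) * (lambda*t)**y / y!"""
    return math.exp(-lam * t) * (lam * t) ** y / math.factorial(y)

def rate(beta0, beta1, x):
    """lambda = exp(beta0 + beta1*x) for a single covariate."""
    return math.exp(beta0 + beta1 * x)

# Effect size: the ratio R of lambda under H1 to lambda under H0 at X = 1
# equals exp(beta1); a 10% rate increase per unit of X means Exp(b1) = 1.1.
beta0 = math.log(0.85)   # base rate 0.85, as in the swimmer example below
beta1 = math.log(1.1)
R = rate(beta0, beta1, 1.0) / rate(beta0, beta1, 0.0)  # = exp(beta1) = 1.1
```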
28.2 Options

G*Power provides two different types of procedures to estimate power: an "enumeration procedure" proposed by Lyles et al. (2007), and large sample approximations. The enumeration procedure seems to provide reasonably accurate results over a wide range of situations, but it can be rather slow and may need large amounts of memory. The large sample approximations are much faster. Results of Monte-Carlo simulations indicate that the accuracy of the procedure based on the work of Demidenko (2007) is comparable to that of the enumeration procedure for N > 200, whereas errors of the procedure proposed by Signorini (1991) can be quite large. We thus recommend using the procedure based on Demidenko (2007) as the standard procedure. The enumeration procedure of Lyles et al. (2007) may be used for small sample sizes and to validate the results of the large sample procedure using an a priori analysis (if the sample size is not too large). It must also be used if one wants to compute the power for likelihood ratio tests. The procedure of Signorini (1991) is problematic and should not be used; it is only included to allow checks of published results referring to this widespread procedure.

1. The enumeration procedure provides power analyses for the Wald test and the likelihood ratio test. The general idea is to construct an exemplary data set with weights that represent response probabilities given the assumed values of the parameters of the X-distribution. Then a fit procedure for the generalized linear model is used to estimate the variance of the regression weights (for Wald tests) or the likelihood ratio under H0 and H1 (for likelihood ratio tests). The size of the exemplary data set increases with N, and the enumeration procedure may thus be rather slow (and may need large amounts of computer memory) for large sample sizes. The procedure is especially slow for analysis types other than "post hoc", which internally call the power routine several times. By specifying a threshold sample size N you can restrict the use of the enumeration procedure to sample sizes < N. For sample sizes ≥ N the large sample approximation selected in the option dialog is used. Note: If a computation takes too long you can abort it by pressing the ESC key.

2. G*Power provides two different large sample approximations for a Wald-type test. Both rely on the asymptotic normal distribution of the maximum likelihood estimator β̂. The accuracy of these approximations increases with sample size, but the deviation from the true power may be quite noticeable for small and moderate sample sizes. This is especially true for X-distributions that are not symmetric about the mean, i.e. the lognormal, exponential, and Poisson distributions, and the binomial distribution with π ≠ 1/2. The procedure proposed by Signorini (1991) and variants of it (Shieh, 2001, 2005) use the "null variance formula", which is not correct for the test statistic assumed here (and that is used in existing software) (Demidenko, 2007, 2008). The other procedure, which is based on the work of Demidenko (2007) on logistic regression, is usually more accurate. For this test, a variance correction option can be selected that compensates for variance distortions that may occur in skewed X distributions (see implementation notes).

28.3 Examples

We replicate the example given on page 449 in Signorini (1991). The number of infections Y of swimmers (X = 1) vs. non-swimmers (X = 0) during a swimming season (exposure time = 1) is tested. The infection rate is modeled as a Poisson distributed random variable. X is assumed to be binomially distributed with π = 0.5 (equal numbers of swimmers and non-swimmers are sampled). The base rate, that is, the infection rate in non-swimmers, is estimated to be 0.85. The significance level is α = 0.05. We want to know the sample size needed to detect a 30% or greater increase in infection rate with a power of 0.95. A 30% increase implies a relative rate of 1.3 ([100% + 30%]/100%).

We first choose to use the procedure of Signorini (1991):

• Select
  Statistical test: Regression: Poisson Regression
  Type of power analysis: A priori

• Input
  Tail(s): One
  Exp(b1): 1.3
  α err prob: 0.05
  Power (1-β err prob): 0.95
  Base rate exp(b0): 0.85
  Mean exposure: 1.0
  R² other X: 0
  X distribution: Binomial
  X parm π: 0.5

• Output
  Critical z: 1.644854
  Total sample size: 697
  Actual power: 0.950121

The result N = 697 replicates the value given in Signorini (1991). The other procedures yield N = 649 (Demidenko (2007) with variance correction, and Lyles et al. (2007) for likelihood ratio tests) and N = 655 (Demidenko (2007) without variance correction, and Lyles et al. (2007) for Wald tests). In Monte-Carlo simulations of the Wald test for this scenario with 150000 independent cases per test, a mean power of 0.94997, 0.95183, and 0.96207 was found for sample sizes of 649, 655, and 697, respectively. These simulation results suggest that in this specific case the procedure based on Demidenko (2007) with variance correction gives the most accurate estimate.

We now assume that we have additional covariates and estimate the squared multiple correlation with these other covariates to be R² = 0.1. All other conditions are identical. The only change needed is to set the input field R² other X to 0.1. Under this condition the necessary sample size increases to a value of 774 (when we use the procedure of Signorini (1991)).

Comparison between procedures  To compare the accuracy of the Wald test procedures, we replicated a number of test cases presented in Table III of Lyles et al. (2007) and conducted several additional tests for X-distributions not
considered in Lyles et al. (2007). In all cases a two-sided test with N = 200, β0 = 0.5, α = 0.05 is assumed.

If we use the procedure of Demidenko (2007), then the complete G*Power input and output for the test in the first row of Table 2 below would be:

• Select
  Statistical test: Regression: Poisson Regression
  Type of power analysis: Post hoc

• Input
  Tail(s): Two
  Exp(b1): =exp(-0.1)
  α err prob: 0.05
  Total sample size: 200
  Base rate exp(b0): =exp(0.5)
  Mean exposure: 1.0
  R² other X: 0
  X distribution: Normal
  X parm µ: 0
  X parm σ: 1

• Output
  Critical z: -1.959964
  Power (1-β err prob): 0.444593

When using the enumeration procedure, the input is exactly the same, but in this case the test is based on the χ²-distribution, which leads to a different output:

• Output
  Noncentrality parameter λ: 3.254068
  Critical χ²: 3.841459
  Df: 1
  Power (1-β err prob): 0.438076

The rows in Table 2 show power values for different test scenarios. The column "Dist. X" indicates the distribution of the predictor X and the parameters used, the column "β1" contains the values of the regression weight β1 under H1, "Sim LLW" contains the simulation results reported by Lyles et al. (2007) for the test (if available), and "Sim" contains the results of a simulation done by us with considerably more cases (150000 instead of 2000). The following columns contain the results of different procedures: "LLW" = Lyles et al. (2007), "Demi" = Demidenko (2007), "Demi(c)" = Demidenko (2007) with variance correction, and "Signorini" = Signorini (1991).

28.4 Related tests

• Logistic regression

28.5 Implementation notes

Both large sample procedures are related to the approach outlined in Whittemore (1981). They get more accurate for larger sample sizes. The correction for additional covariates has been proposed by Hsieh et al. (1998).

The H0 distribution is the standard normal distribution N(0, 1); the H1 distribution is the normal distribution N(µ1, σ1) with:

  µ1 = β1 √(N(1 − R²) · t/v0)    (12)
  σ1 = √(v1/v0)    (13)

in the Signorini (1991) procedure, and

  µ1 = β1 √(N(1 − R²) · t/v1)    (14)
  σ1 = s*    (15)

in the procedure based on Demidenko (2007). In these equations, t denotes the mean exposure time, N the sample size, R² the squared multiple correlation coefficient of the covariate of interest on the other covariates, and v0 and v1 the variance of β̂1 under H0 and H1, respectively. Without variance correction s* = 1; with variance correction s* = √((a·v0* + (1 − a)·v1)/v1), where v0* is the variance under H0 for β0* = log(µ), with µ = ∫ fX(x) exp(β0 + β1x) dx. For the lognormal distribution a = 0.75; in all other cases a = 1. With variance correction, the value of s* is often close to 1, but deviates from 1 for X-distributions that are not symmetric about the mean. Simulations showed that this compensates for variance inflation/deflation in the distribution of β̂1 under H1 that occurs with such distributions at finite sample sizes.

The large sample approximations use the result that the (m + 1) maximum likelihood estimators β0, β1, . . . , βm are asymptotically (multivariate) normally distributed, where the variance-covariance matrix is given by the inverse of the (m + 1) × (m + 1) Fisher information matrix I. The (i,j)th element of I is given by

  Iij = −E[∂² log L / (∂βi ∂βj)] = N · E[Xi Xj e^(β0 + β1X1 + . . . + βmXm)]

Thus, in the case of one continuous predictor, I is a 2 × 2 matrix with elements

  I00 = ∫ f(x) exp(β0 + β1x) dx
  I10 = I01 = ∫ f(x) x exp(β0 + β1x) dx
  I11 = ∫ f(x) x² exp(β0 + β1x) dx

(all integrals run from −∞ to +∞).
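Equations (12)-(15) can be turned into a small sketch. The code below is an illustration, not G*Power's implementation: it uses the closed-form variances listed in the implementation notes for a binary (0/1) predictor and a normal predictor, assumes s* = 1 (no variance correction), and reproduces the a priori result N = 697 of the swimmer example (774 for R² = 0.1) as well as the post hoc power 0.444593 of the first Table 2 scenario. All function names are ours:

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal CDF and quantile function

def binom_variances(pi_, beta0, beta1):
    """v0, v1 for a binary predictor with Pr(X=1) = pi_, using the closed
    forms given below: v0*exp(beta0) = 1/(pi(1-pi)) and
    v1*exp(beta0) = 1/(1-pi) + 1/(pi*exp(beta1))."""
    v0 = (1.0 / (pi_ * (1.0 - pi_))) / math.exp(beta0)
    v1 = (1.0 / (1.0 - pi_) + 1.0 / (pi_ * math.exp(beta1))) / math.exp(beta0)
    return v0, v1

def normal_v1(mu, sigma, beta0, beta1):
    """v1*exp(beta0) = exp(-(beta1*mu + (beta1*sigma)^2/2))/sigma^2 (normal X)."""
    return math.exp(-(beta1 * mu + (beta1 * sigma) ** 2 / 2.0)) / sigma ** 2 / math.exp(beta0)

def signorini_n(beta1, v0, v1, alpha, power, t=1.0, r2=0.0):
    """A priori N, one-sided test with beta1 > 0: under H1 the statistic is
    N(mu1, s1) with mu1 = beta1*sqrt(N(1-R^2)t/v0), s1 = sqrt(v1/v0)
    [Eqns (12), (13)]; solve power = 1 - Phi((z_(1-a) - mu1)/s1) for N."""
    s1 = math.sqrt(v1 / v0)
    mu1 = Z.inv_cdf(1.0 - alpha) + s1 * Z.inv_cdf(power)
    return math.ceil((mu1 / beta1) ** 2 * v0 / ((1.0 - r2) * t))

def demidenko_power(beta1, v1, n, alpha, t=1.0, r2=0.0):
    """Post hoc power, two-sided test, without variance correction (s* = 1):
    mu1 = beta1*sqrt(N(1-R^2)t/v1)  [Eqns (14), (15)]."""
    mu1 = beta1 * math.sqrt(n * (1.0 - r2) * t / v1)
    zc = Z.inv_cdf(1.0 - alpha / 2.0)
    return Z.cdf(-zc - mu1) + 1.0 - Z.cdf(zc - mu1)

# Swimmer example: Exp(b1) = 1.3, base rate 0.85, binomial X with pi = 0.5
v0, v1 = binom_variances(0.5, math.log(0.85), math.log(1.3))
n_sig = signorini_n(math.log(1.3), v0, v1, 0.05, 0.95)             # 697
n_sig_r2 = signorini_n(math.log(1.3), v0, v1, 0.05, 0.95, r2=0.1)  # 774

# Post hoc example: beta1 = -0.1, beta0 = 0.5, N = 200, X ~ N(0,1)
pw = demidenko_power(-0.1, normal_v1(0.0, 1.0, 0.5, -0.1), 200, 0.05)  # ~0.4446
```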
Dist. X      β1     Sim LLW   Sim     LLW     Demi    Demi(c)  Signorini
N(0,1) -0.10 0.449 0.440 0.438 0.445 0.445 0.443
N(0,1) -0.15 0.796 0.777 0.774 0.782 0.782 0.779
N(0,1) -0.20 0.950 0.952 0.953 0.956 0.956 0.954
Unif(0,1) -0.20 0.167 0.167 0.169 0.169 0.169 0.195
Unif(0,1) -0.40 0.478 0.472 0.474 0.475 0.474 0.549
Unif(0,1) -0.80 0.923 0.928 0.928 0.928 0.932 0.966
Logn(0,1) -0.05 0.275 0.298 0.305 0.320 0.291 0.501
Logn(0,1) -0.10 0.750 0.748 0.690 0.695 0.746 0.892
Logn(0,1) -0.15 0.947 0.947 0.890 0.890 0.955 0.996
Poiss(0.5) -0.20 - 0.614 0.599 0.603 0.613 0.701
Poiss(0.5) -0.40 - 0.986 0.971 0.972 0.990 0.992
Bin(0.2) -0.20 - 0.254 0.268 0.268 0.254 0.321
Bin(0.2) -0.40 - 0.723 0.692 0.692 0.716 0.788
Exp(3) -0.40 - 0.524 0.511 0.518 0.521 0.649
Exp(3) -0.70 - 0.905 0.868 0.871 0.919 0.952
Table 2: Power for different test scenarios as estimated with different procedures (see text)
The integrals given above happen to be proportional to derivatives of the moment generating function g(x) := E(e^(xX)) of the X distribution, which is often known in closed form. Given g(x) and its derivatives, the variance of b is calculated as:

  Var(b) = g(b) / (g(b)g''(b) − g'(b)²) · 1/exp(β0)

With this definition, the variance of β̂1 under H0 is v0 = lim(b→0) Var(b), and the variance of β̂1 under H1 is v1 = Var(β1).

The moment generating function for the lognormal distribution does not exist. For the other five predefined distributions this leads to:

1. Binomial distribution with parameter π:
   g(x) = (1 − π + πe^x)^n
   v0 exp(β0) = 1/(π(1 − π))
   v1 exp(β0) = 1/(1 − π) + 1/(π exp(β1))

2. Exponential distribution with parameter λ:
   g(x) = (1 − x/λ)^(−1)
   v0 exp(β0) = λ²
   v1 exp(β0) = (λ − β1)³/λ

3. Normal distribution with parameters (µ, σ):
   g(x) = exp(µx + σ²x²/2)
   v0 exp(β0) = 1/σ²
   v1 exp(β0) = exp(−[β1µ + (β1σ)²/2])/σ²

4. Poisson distribution with parameter λ:
   g(x) = exp(λ(e^x − 1))
   v0 exp(β0) = 1/λ
   v1 exp(β0) = exp(−λ(e^(β1) − 1))/(λ exp(β1))

5. Uniform distribution with parameters (u, v) corresponding to the interval borders:
   g(x) = (e^(xv) − e^(xu))/(x(v − u))
   v0 exp(β0) = 12/(u − v)²
   v1 exp(β0) = β1³(hu − hv)(u − v) / (hu² + hv² − h(u+v)(2 + β1²(u − v)²)),  where hx = exp(β1x)

In the manual mode you have to calculate the variances v0 and v1 yourself. To illustrate the necessary steps, we show the calculation for the exponential distribution (given above) in more detail:

  g(b) = (1 − b/λ)^(−1) = λ/(λ − b)
  g'(b) = λ/(λ − b)²
  g''(b) = 2λ/(λ − b)³

  Var(b) exp(β0) = g(b)/(g''(b)g(b) − g'(b)²) = (λ − b)³/λ

  v0 exp(β0) = Var(0) exp(β0) = λ²
  v1 exp(β0) = Var(β1) exp(β0) = (λ − β1)³/λ

Note, however, that sensitivity analyses, in which the effect size is estimated, are not possible in the manual mode. This is so because in this case the argument b of the function Var(b) is not constant under H1 but is the target of the search.
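The closed forms above can be cross-checked by differentiating the moment generating function numerically. The sketch below (central finite differences; the helper name is ours) recovers the exponential-distribution result Var(b) exp(β0) = (λ − b)³/λ:

```python
import math

def var_from_mgf(g, b, beta0, h=1e-4):
    """Var(b) = g(b)/(g(b)*g''(b) - g'(b)^2) * 1/exp(beta0), with the first
    and second derivatives of the moment generating function g approximated
    by central finite differences of step h."""
    g0 = g(b)
    g1 = (g(b + h) - g(b - h)) / (2.0 * h)
    g2 = (g(b + h) - 2.0 * g0 + g(b - h)) / (h * h)
    return g0 / (g0 * g2 - g1 * g1) / math.exp(beta0)

# Exponential distribution with parameter lam: g(x) = (1 - x/lam)^(-1),
# checked against the closed form derived in the text.
lam, b, beta0 = 3.0, -0.4, 0.5          # matches the Exp(3) scenario of Table 2
numeric = var_from_mgf(lambda x: 1.0 / (1.0 - x / lam), b, beta0)
closed = (lam - b) ** 3 / lam / math.exp(beta0)
```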
...with σ ≠ 1; this error has been corrected in the newest version, PASS 2008). In two-sided tests we found small deviations from the results calculated with PASS. The reason for these deviations is that the (usually small) contribution to the power from the distant tail is ignored in PASS but not in G*Power. Given the generally low accuracy of the Signorini (1991) procedure, these small deviations are of no practical consequence.

The results of the other two procedures were compared to the results of Monte-Carlo simulations and with each other. The results given in Table 2 are quite representative of these more extensive tests. These results indicate that the computed power values may deviate from the true values by ±0.05.
29 Z test: Tetrachoric Correlation

29.0.1 Background

In the model underlying tetrachoric correlation, it is assumed that the frequency data in a 2 × 2 table stem from dichotomizing two continuous random variables X and Y that are bivariate normally distributed with mean m = (0, 0) and covariance matrix:

  Σ = ( 1  ρ
        ρ  1 ).

The exact computation of the tetrachoric correlation coefficient is difficult. One reason is of a computational nature (see implementation notes). A more principal problem is, however, that frequency data are discrete, which implies that the estimation of a cell probability can be no more accurate than 1/(2N). The inaccuracies in estimating the true correlation ρ are especially severe when there are cell frequencies less than 5. In these cases caution is necessary in interpreting the estimated ρ. For a more thorough discussion of these issues see Brown and Benedetti (1977) and Bonett and Price (2005).
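The dichotomization model can be illustrated numerically: with thresholds at the means, the probability of the "high-high" cell is Pr(X > 0, Y > 0), which is 1/4 for ρ = 0 and grows with ρ. The following brute-force sketch (midpoint-rule integration of the bivariate normal density) is an illustration only, not G*Power's computation:

```python
import math

def quadrant_prob(rho, tx=0.0, ty=0.0, hi=8.0, n=400):
    """Pr(X > tx, Y > ty) for standard bivariate normal X, Y with
    correlation rho, via midpoint-rule integration of the density
    f(x,y) = exp(-(x^2 - 2*rho*x*y + y^2)/(2(1-rho^2))) / (2*pi*sqrt(1-rho^2)).
    Crude (truncated at hi), but sufficient for an illustration."""
    c = 1.0 / (2.0 * math.pi * math.sqrt(1.0 - rho * rho))
    d = 2.0 * (1.0 - rho * rho)
    hx = (hi - tx) / n
    hy = (hi - ty) / n
    total = 0.0
    for i in range(n):
        x = tx + (i + 0.5) * hx
        for j in range(n):
            y = ty + (j + 0.5) * hy
            total += c * math.exp(-(x * x - 2.0 * rho * x * y + y * y) / d)
    return total * hx * hy

# With rho = 0 and thresholds at the means, each cell has probability 1/4;
# a positive rho shifts probability mass into the concordant cells.
p_indep = quadrant_prob(0.0)   # ~0.25
p_corr = quadrant_prob(0.24)
```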
Note: The four cell probabilities must sum to 1. It therefore suffices to specify three of them explicitly. If you leave one of the four cells empty, G*Power computes the fourth value as (1 − sum of the three p).

• A second possibility is to compute a confidence interval for the tetrachoric correlation in the population from the results of a previous investigation, and to choose a value from this interval as H1 corr ρ. In this case you specify four observed frequencies, the relative position 0 < k < 1 inside the confidence interval (0, 0.5, and 1 corresponding to the left, central, and right position, respectively), and the confidence level (1 − α) of the confidence interval (see right panel in Fig. 33). From these data G*Power computes the total sample size N = f11 + f12 + f21 + f22 and estimates the cell probabilities pij by pij = (fij + 0.5)/(N + 2). These are used to compute the sample correlation coefficient r, the estimated marginal probabilities, the borders (L, R) of the (1 − α) confidence interval for the population correlation coefficient ρ, and the standard error of r. The value L + (R − L)·k is used as H1 corr ρ. The computed correlation coefficient, the confidence interval, and the standard error of r depend on the choice of the exact (Brown and Benedetti, 1977) vs. the approximate (Bonett and Price, 2005) computation mode made in the option dialog. In the exact mode, the labels of the output fields are Correlation r, C.I. r lwr, C.I. r upr, and Std. error of r; in the approximate mode, an asterisk * is appended after r and ρ.

Clicking on the button Calculate and transfer to main window copies the values given in H1 corr ρ, Margin prob x, Margin prob y, and - in frequency mode - Total sample size to the corresponding input fields in the main window.

29.2 Options

You can choose between the exact approach, in which the procedure proposed by Brown and Benedetti (1977) is used, and the approximation suggested by Bonett and Price (2005).

29.3 Examples

To illustrate the application of the procedure, we refer to example 1 in Bonett and Price (2005): The Yes or No answers of 930 respondents to two questions in a personality inventory are recorded in a 2 × 2 table with the following result: f11 = 203, f12 = 186, f21 = 167, f22 = 374.

First we use the effect size dialog to compute from these data the confidence interval for the tetrachoric correlation in the population. We choose in the effect size drawer From C.I. calculated from observed freq. Next, we insert the above values in the corresponding fields and press Calculate. Using the 'exact' computation mode (selected in the Options dialog in the main window), we get an esti-

We now want to know how many subjects we need to achieve a power of 0.95 in a one-sided test of the H0 that ρ = 0 vs. the H1 that ρ = 0.24, given the same marginal probabilities and α = 0.05.

Clicking on 'Calculate and transfer to main window' copies the computed H1 corr ρ = 0.2399846 and the marginal probabilities px = 0.602 and py = 0.582 to the corresponding input fields in the main window. The complete input and output is as follows:

• Select
  Statistical test: Correlation: Tetrachoric model
  Type of power analysis: A priori

• Input
  Tail(s): One
  H1 corr ρ: 0.2399846
  α err prob: 0.05
  Power (1-β err prob): 0.95
  H0 corr ρ: 0
  Marginal prob x: 0.6019313
  Marginal prob y: 0.5815451

• Output
  Critical z: 1.644854
  Total sample size: 463
  Actual power: 0.950370
  H1 corr ρ: 0.239985
  H0 corr ρ: 0.0
  Critical r lwr: 0.122484
  Critical r upr: 0.122484
  Std err r: 0.074465

This shows that we need at least a sample size of 463 in this case (the Actual power output field shows the power for a sample size rounded to an integer value).

The output also contains the values for ρ under H0 and H1 used in the internal computation procedure. In the exact computation mode, a deviation from the input values would indicate that the internal estimation procedure did not work correctly for the input values (this should only occur at extreme values of r or of the marginal probabilities). In the approximate mode, the output values correspond to the r values resulting from the approximation formula.

The remaining outputs show the critical value(s) for r under H0: In the Wald test assumed here, z = (r − ρ0)/se0(r) is approximately standard normally distributed under H0. The critical values of r under H0 are given (a) as a quantile z1−α/2 of the standard normal distribution, and (b) in the form of critical correlation coefficients r and the standard error se0(r). (In one-sided tests, the single critical value is reported twice, in Critical r lwr and Critical r upr.)

In the example given above, the standard error of r under H0 is 0.074465, and the critical value for r is 0.122484. Thus, (r − ρ0)/se0(r) = (0.122484 − 0)/0.074465 = 1.64485 = z1−α, as expected.

Using G*Power to perform the statistical test of H0
mated correlation r = 0.334, a standard error of r = 0.0482, G * Power may also be used to perform the statistical test
and the 95% confidence interval [0.240, 0.429] for the pop- of H0. Assume that we want to test the H0: r = r0 = 0.4
ulation r. We choose the left border of the C.I. (i.e. relative vs the two-sided alternative H1: r 6= 0.4 for a = 0.05. As-
position 0, corresponding to 0.240) as the value of the tetra- sume further that we observed the frequencies f 11 = 120,
choric correlation coefficient r under H0.
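The correspondence between Critical z, Std err r, and the critical correlation coefficient in this output can be verified with a few lines of Python. This is only a sketch of the relation described in the text, not G*Power code; the numbers are taken from the output above:

```python
from statistics import NormalDist

alpha = 0.05
se0 = 0.074465  # Std err r under H0, from the output above

# One-sided test: the critical z is the (1 - alpha) quantile of N(0, 1).
z_crit = NormalDist().inv_cdf(1 - alpha)

# Rescaling by the standard error of r under H0 (here H0 corr rho = 0)
# gives the critical correlation coefficient reported in the output.
r_crit = z_crit * se0

print(round(z_crit, 6))  # 1.644854 (Critical z)
print(round(r_crit, 6))  # 0.122484 (Critical r lwr/upr)
```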
Figure 33: Effect size drawer to calculate H1 corr ρ and the marginal probabilities (see text).
Using G * Power to perform the statistical test of H0

G * Power may also be used to perform the statistical test of H0. Assume that we want to test H0: ρ = ρ0 = 0.4 against the two-sided alternative H1: ρ ≠ 0.4 at α = 0.05. Assume further that we observed the frequencies f11 = 120, f12 = 45, f21 = 56, and f22 = 89. To perform the test we first use the option "From C.I. calculated from observed freq" in the effect size dialog to compute from the observed frequencies the correlation coefficient r and the estimated marginal probabilities. In the exact mode we find r = 0.513, "Est. marginal prob x" = 0.433, and "Est. marginal prob y" = 0.468. In the main window we then choose a "post hoc" analysis. Clicking on "Calculate and transfer to main window" in the effect size dialog copies the values for marginal x, marginal y, and the sample size 310 to the main window. We now set "H0 corr ρ" to 0.4 and "α err prob" to 0.05. After clicking on "Calculate" in the main window, the output section shows the critical values for the correlation coefficient ([0.244, 0.555]) and the standard error under H0 (0.079366). These values show that the test is not significant at the chosen α-level, because the observed r = 0.513 lies inside the interval [0.244, 0.555]. We then use the G * Power calculator to compute the p-value: inserting z = (0.513-0.4)/0.0794; 1-normcdf(z,0,1) and clicking on the "Calculate" button yields p = 0.077.

If we instead want to use the approximate mode, we first choose this option in the "Options" dialog and then proceed in essentially the same way. In this case we find a very similar value for the correlation coefficient, r* = 0.509. The critical values for r* given in the output section of the main window are [0.233, 0.541] and the standard error for r* is 0.0788. Note: To compute the p-value in the approximate mode, we should use the H0 corr ρ* given in the output and not the H0 corr ρ specified in the input. Accordingly, using the input z = (0.509-0.397)/0.0788; 1-normcdf(z,0,1) in the G * Power calculator yields p = 0.0776, a value very close to that given above for the exact mode.
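The calculator expressions above are ordinary normal-distribution arithmetic and can be reproduced in any language. A Python sketch of both p-value computations, using the values quoted in the text (se0(r) = 0.0794 in the exact mode; H0 corr ρ* = 0.397 and se0(r*) = 0.0788 in the approximate mode):

```python
from statistics import NormalDist

Phi = NormalDist().cdf  # standard normal CDF; normcdf(., 0, 1) in the G*Power calculator

# Exact mode: observed r = 0.513, H0 corr rho = 0.4, se0(r) = 0.0794
z_exact = (0.513 - 0.4) / 0.0794
p_exact = 1 - Phi(z_exact)

# Approximate mode: r* = 0.509, H0 corr rho* = 0.397, se0(r*) = 0.0788
z_approx = (0.509 - 0.397) / 0.0788
p_approx = 1 - Phi(z_approx)

print(round(p_exact, 3))   # 0.077
print(round(p_approx, 4))  # 0.0776
```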
standard error s(r )! To compute s (r ) would require to enu- where c = (1 | p1⇤ p⇤1 |/5 (1/2 pm )2 )/2, with p1⇤ =
merate all possible tables Ti for the given N. If p( Ti ) and ri p11 + p12 , p⇤1 = p11 + p21 , pm = smallest marginal propor-
denote the probability and the correlation coefficient of ta- tion, and w = p11 p22 /( p12 p21 ).
ble i, then s2 (r ) = Âi (ri r)2 p( Ti ) (see Brown & Benedetti, The same formulas are used to compute an estimator r ⇤
1977, p. 349) for details]. The number of possible tables in- from frequency data f ij . The only difference is that esti-
creases rapidly with N and it is therefore in general compu- mates p̂ij = ( f ij + 0.5)/N of the true probabilities are used.
tationally too expensive to compute this exact value. Thus,
‘exact’ does not mean, that the exact standard error is used Confidence Interval The 100 ⇤ (1 a) confidence interval
in the power calculations. for r ⇤ is computed as follows:
In the exact mode it is not necessary to estimate r in order
to calculate power, because it is already given in the input. CI = [cos(p/(1 + Lĉ )), cos(p/(1 + U ĉ ))],
We nevertheless report the r calculated by the routine in
the output to indicate possible limitations in the precision where
of the routine for |r | near 1. Thus, if the r’s reported in the
output section deviate markedly from those given in the L= exp(ln ŵ + za/2 s(ln ŵ ))
input, all results should be interpreted with care. U = exp(ln ŵ za/2 s(ln ŵ ))
To estimate s(r ) the formula based on asymptotic theory ⇢ ✓ ◆ 1/2
1 1 1 1 1
proposed by Pearson in 1901 is used: s(ln ŵ ) = + + +
N p̂11 p̂12 p̂21 p̂22
1
s (r ) = {( a + d)(b + d)/4+ and za/2 is the a/2 quartile of the standard normal distri-
N 3/2 f(z x , zy , r ) bution.
+( a + c)(b + d)F22 + ( a + b)(c + d)F21 +
+2( ad bc)F1 F2 ( ab cd)F2 Asymptotic standard error for r ⇤ The standard error is
1/2 given by:
( ac bd)F1 }
⇢ ✓ ◆ 1/2
or with respect to cell probabilities: ⇤ 1 1 1 1 1
s (r ) = k + + +
N p̂11 p̂12 p̂21 p̂22
1
s (r ) = {( p11 + p22 )( p12 + p22 )/4+
N 1/2 f(z x , zy , r ) with
sin[p/(1 + w)]
+( p11 + p21 )( p12 + p22 )F22 + k = p ĉw
(1 + w )2
+( p11 + p12 )( p21 + +p22 )F21
where w = ŵ ĉ .
+2( p11 p22 p12 p21 )F1 F2
( p11 p12 p21 p22 )F2
29.5.3 Power calculation
( p11 p21 p12 p22 )F1 }1/2
The H0 distribution is the standard normal distribution
where N (0, 1). The H1 distribution is the normal distribution with
✓ ◆ mean N (m1 , s1 ), where:
z x rzy
F1 = f 0.5,
(1 r2 )1/2 m1 = (r r0 )/sr0
✓ ◆
zy rz x s1 = sr1 /sr0
F2 = f 0.5, and
(1 r2 )1/2
The values sr0 and sr1 denote the standard error (approxi-
1
F(z x , zy , r ) = ⇥ mate or exact) under H0 and H1.
2p (1 r2 )1/2
" #
z2x 2rz x zy + z2y )
⇥ exp . 29.6 Validation
2(1 r 2 )
The correctness of the procedures to calc r, s(r ) and r ⇤, s(r ⇤)
Brown and Benedetti (1977) show that this approximation was checked by reproducing the examples in Brown and
is quite good if the minimal cell frequency is at least 5 (see Benedetti (1977) and Bonett and Price (2005), respectively.
tables 1 and 2 in Brown et al.). The soundness of the power routines were checked by
Monte-Carlo simulations, in which we found good agree-
29.5.2 Approximation mode ment between simulated and predicted power.
r⇤ = cos(p/(1 + w c )),
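The approximate mode of section 29.5.2 is straightforward to replicate. The following Python sketch (not G*Power code) applies the Bonett and Price (2005) formulas to the frequencies of the first example above (f11 = 203, f12 = 186, f21 = 167, f22 = 374); the resulting r* and 95% confidence interval come out close to the exact-mode values r = 0.334 and [0.240, 0.429] reported there:

```python
import math
from statistics import NormalDist

def tetrachoric_approx(f11, f12, f21, f22, alpha=0.05):
    """Bonett-Price (2005) approximation r* with a 100*(1-alpha)% CI."""
    N = f11 + f12 + f21 + f22
    # Cell probabilities with the 0.5 continuity correction (section 29.5.2).
    p11, p12, p21, p22 = ((f + 0.5) / N for f in (f11, f12, f21, f22))
    p1_ = p11 + p12                       # row marginal proportion
    p_1 = p11 + p21                       # column marginal proportion
    pm = min(p1_, 1 - p1_, p_1, 1 - p_1)  # smallest marginal proportion
    c = (1 - abs(p1_ - p_1) / 5 - (0.5 - pm) ** 2) / 2
    w = p11 * p22 / (p12 * p21)           # odds ratio omega-hat
    r_star = math.cos(math.pi / (1 + w ** c))
    # Confidence interval on the log odds-ratio scale, mapped back via cos().
    s_lnw = math.sqrt((1 / N) * (1 / p11 + 1 / p12 + 1 / p21 + 1 / p22))
    z = NormalDist().inv_cdf(alpha / 2)   # alpha/2 quantile (negative)
    L = math.exp(math.log(w) + z * s_lnw)
    U = math.exp(math.log(w) - z * s_lnw)
    lo = math.cos(math.pi / (1 + L ** c))
    hi = math.cos(math.pi / (1 + U ** c))
    return r_star, lo, hi

r_star, lo, hi = tetrachoric_approx(203, 186, 167, 374)
print(round(r_star, 3))  # approx. 0.333
print(round(lo, 3), round(hi, 3))
```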
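The power computation of section 29.5.3 is plain normal-distribution arithmetic. A Python sketch for the one-sided case; note that sr0 and sr1 are the H0 and H1 standard errors that G*Power derives internally (exact or approximate mode), so here they are simply free parameters of the sketch:

```python
from statistics import NormalDist

def power_one_sided(rho0, rho1, se0, se1, alpha=0.05):
    """One-sided power: the H0 distribution is N(0, 1), the H1 distribution
    is N(m1, s1) with m1 = (rho1 - rho0)/se0 and s1 = se1/se0."""
    z_crit = NormalDist().inv_cdf(1 - alpha)  # critical value under H0
    m1 = (rho1 - rho0) / se0
    s1 = se1 / se0
    # Probability that the Wald statistic exceeds the critical value under H1.
    return 1 - NormalDist(m1, s1).cdf(z_crit)

# Sanity check: if H1 coincides with H0, the "power" equals alpha.
print(round(power_one_sided(0.0, 0.0, 0.08, 0.08), 3))  # 0.05
```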
References

Armitage, P., Berry, G., & Matthews, J. (2002). Statistical methods in medical research (4th ed.). Blackwell Science Ltd.
Barabesi, L., & Greco, L. (2002). A note on the exact computation of the Student t, Snedecor F and sample correlation coefficient distribution functions. Journal of the Royal Statistical Society: Series D (The Statistician), 51, 105-110.
Benton, D., & Krishnamoorthy, K. (2003). Computing discrete mixtures of continuous distributions: noncentral chisquare, noncentral t and the distribution of the square of the sample multiple correlation coefficient. Computational Statistics & Data Analysis, 43, 249-267.
Bonett, D. G., & Price, R. M. (2005). Inferential method for the tetrachoric correlation coefficient. Journal of Educational and Behavioral Statistics, 30, 213-225.
Brown, M. B., & Benedetti, J. K. (1977). On the mean and variance of the tetrachoric correlation coefficient. Psychometrika, 42, 347-355.
Cohen, J. (1969). Statistical power analysis for the behavioural sciences. New York: Academic Press.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Hillsdale, New Jersey: Lawrence Erlbaum Associates.
Demidenko, E. (2007). Sample size determination for logistic regression revisited. Statistics in Medicine, 26, 3385-3397.
Demidenko, E. (2008). Sample size and optimal design for logistic regression with binary interaction. Statistics in Medicine, 27, 36-46.
Ding, C. G. (1996). On the computation of the distribution of the square of the sample multiple correlation coefficient. Computational Statistics & Data Analysis, 22, 345-350.
Ding, C. G., & Bargmann, R. E. (1991). Algorithm AS 260: Evaluation of the distribution of the square of the sample multiple correlation coefficient. Applied Statistics, 40, 195-198.
Dunlap, W. P., Xin, X., & Myers, L. (2004). Computing aspects of power for multiple regression. Behavior Research Methods, Instruments, & Computers, 36, 695-701.
Dunn, O. J., & Clark, V. A. (1969). Correlation coefficients measured on the same individuals. Journal of the American Statistical Association, 64, 366-377.
Dupont, W. D., & Plummer, W. D. (1998). Power and sample size calculations for studies involving linear regression. Controlled Clinical Trials, 19, 589-601.
Erdfelder, E., Faul, F., & Buchner, A. (1996). GPOWER: A general power analysis program. Behavior Research Methods, Instruments, & Computers, 28, 1-11.
Faul, F., & Erdfelder, E. (1992). GPOWER 2.0. Bonn, Germany: Universität Bonn.
Frome, E. L. (1986). Multiple regression analysis: Applications in the health sciences. In D. Herbert & R. Myers (Eds.), (pp. 84-123). The American Institute of Physics.
Gatsonis, C., & Sampson, A. R. (1989). Multiple correlation: Exact power and sample size calculations. Psychological Bulletin, 106, 516-524.
Hays, W. (1988). Statistics (4th ed.). Orlando, FL: Holt, Rinehart and Winston.
Hettmansperger, T. P. (1984). Statistical inference based on ranks. New York: Wiley.
Hintze, J. (2006). NCSS, PASS, and GESS. Kaysville, Utah: NCSS.
Hsieh, F. Y. (1989). Sample size tables for logistic regression. Statistics in Medicine, 8, 795-802.
Hsieh, F. Y., Bloch, D. A., & Larsen, M. D. (1998). A simple method of sample size calculation for linear and logistic regression. Statistics in Medicine, 17, 1623-1634.
Lee, Y. (1971). Some results on the sampling distribution of the multiple correlation coefficient. Journal of the Royal Statistical Society, Series B (Methodological), 33, 117-130.
Lee, Y. (1972). Tables of the upper percentage points of the multiple correlation coefficient. Biometrika, 59, 179-189.
Lehmann, E. (1975). Nonparametrics: Statistical methods based on ranks. New York: McGraw-Hill.
Lyles, R. H., Lin, H.-M., & Williamson, J. M. (2007). A practical approach to computing power for generalized linear models with nominal, count, or ordinal responses. Statistics in Medicine, 26, 1632-1648.
Mendoza, J., & Stafford, K. (2001). Confidence interval, power calculation, and sample size estimation for the squared multiple correlation coefficient under the fixed and random regression models: A computer program and useful standard tables. Educational & Psychological Measurement, 61, 650-667.
O'Brien, R. (1998). A tour of UnifyPow: A SAS module/macro for sample-size analysis. Proceedings of the 23rd SAS Users Group International Conference, Cary, NC, SAS Institute, 1346-1355.
O'Brien, R. (2002). Sample size analysis in study planning (using UnifyPow.sas). (Available on the WWW: http://www.bio.ri.ccf.org/UnifyPow.all/UnifyPowNotes020)
Sampson, A. R. (1974). A tale of two regressions. Journal of the American Statistical Association, 69, 682-689.
Shieh, G. (2001). Sample size calculations for logistic and Poisson regression models. Biometrika, 88, 1193-1199.
Shieh, G. (2005). On power and sample size calculations for Wald tests in generalized linear models. Journal of Statistical Planning and Inference, 128, 43-59.
Shieh, G., & Kung, C.-F. (2007). Methodological and computational considerations for multiple correlation analysis. Behavior Research Methods, Instruments, & Computers, 39, 731-734.
Signorini, D. F. (1991). Sample size for Poisson regression. Biometrika, 78, 446-450.
Steiger, J. H. (1980). Tests for comparing elements of a correlation matrix. Psychological Bulletin, 87, 245-251.
Steiger, J. H., & Fouladi, R. T. (1992). R2: A computer program for interval estimation, power calculations, sample size estimation, and hypothesis testing in multiple regression. Behavior Research Methods, Instruments, & Computers, 24, 581-582.
Whittemore, A. S. (1981). Sample size for logistic regression with small response probabilities. Journal of the American Statistical Association, 76, 27-32.