Eda Chapters 12 and 13

CHAPTER 12
Design and Analysis of Single Factor Experiments
A completely randomized single factor experiment is an experiment where both:
1. One factor of two or more levels has been manipulated. For example, the experiment may be investigating the effect of different levels of price, or different flavors, or different advertisements. Where two factors are manipulated, such as both price and flavor being varied, it is then a Multifactor Experiment and not a single factor experiment.
2. Each respondent in the survey is shown one and only one of the levels of the factor. For example, each respondent may be shown a single product concept, one of multiple alternative advertisements or one of multiple pricing structures. In the language of statistics, this is referred to as a completely randomized experiment.
In a single factor experiment with a completely randomized design (CRD), the levels of the factor are randomly assigned to the experimental units. Alternatively, we can think of randomly assigning the experimental units to the treatments or, in some cases, randomly selecting experimental units from each level of the factor.

Example - Cotton Tensile Strength
This is an investigation into the formulation of synthetic fibers that are used to make cloth. The response is tensile strength, the strength of the fiber. The experimenter wants to determine the level of cotton, in percent, that achieves the highest tensile strength of the fiber. Therefore, it has a single quantitative factor: the percent of cotton combined with synthetic fabric fibers.
The five treatment levels of percent cotton are evenly spaced from 15% to 35%. The experiment has five replicates, i.e. five runs at each of the five cotton weight percentages.
The box plot of the results shows an indication that strength increases as the cotton percentage increases, and then it seems to drop off rather dramatically after 30%.
The null hypothesis asks: does the cotton percent make a difference? Now, it seems that it doesn't take statistics to answer this question. All we have to do is look at the side by side box plots of the data and there appears to be a difference; however, this difference is not so obvious by looking at the table of raw data. A second question, frequently asked when the factor is quantitative, is: what is the optimal level of cotton if you only want to consider strength?
There is often more than one response measurement that is of interest, so you need to think about multiple responses in any given experiment. In this experiment, for some reason, we are interested in only one response, tensile strength, whereas in practice the manufacturer would also consider comfort, ductility, cost, etc.
This single factor experiment can be described as a completely randomized design (CRD).

COMPLETELY RANDOMIZED DESIGN
- means there is no structure among the experimental units.
- There are 25 runs which differ only in the percent cotton, and these will be done in random order. If there were different machines or operators, or other factors such as the order or batches of material, this would need to be taken into account. We will talk about these kinds of designs later. This is an example of a completely randomized design where there are no other factors that we are interested in other than the treatment factor, percentage of cotton.
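As a minimal sketch of the randomization step described above, the following Python code (standard library only) builds the 25 runs of the cotton experiment, five replicates at each of the five cotton percentages, and shuffles them into a random run order. The function name `crd_run_order` and the fixed seed are illustrative assumptions, not part of the original example.

```python
import random

def crd_run_order(levels, replicates, seed=None):
    """Return a randomized run order for a completely randomized design.

    Each level appears `replicates` times and the whole list of runs is
    shuffled, so there is no structure (blocks, machines, operators) among runs.
    """
    runs = [level for level in levels for _ in range(replicates)]
    rng = random.Random(seed)  # seeded generator so the order is reproducible
    rng.shuffle(runs)
    return runs

# Cotton example: five evenly spaced levels (15% to 35%), five replicates each.
cotton_levels = [15, 20, 25, 30, 35]
order = crd_run_order(cotton_levels, replicates=5, seed=2024)
for run_number, pct in enumerate(order, start=1):
    print(f"run {run_number:2d}: {pct}% cotton")
```

Because all 25 runs are drawn from one shuffled list, any drift over time is spread across the cotton levels instead of being tied to one of them.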
12.1.1 ANALYSIS OF VARIANCE
Analysis of variance (ANOVA)
- is a collection of statistical models and their associated estimation
procedures (such as the "variation" among and between groups) used to
analyze the differences among group means in a sample.
- ANOVA was developed by statistician and evolutionary biologist Ronald
Fisher.
- The ANOVA is based on the law of total variance, where the observed
variance in a particular variable is partitioned into components attributable
to different sources of variation.
- In its simplest form, ANOVA provides a statistical test of whether two or
more population means are equal, and therefore generalizes the t-test
beyond two means.
- This is called the analysis of variance because we are partitioning the total
variation in the response measurements.
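The statement that ANOVA generalizes the t-test can be checked numerically: with exactly two groups, the ANOVA F statistic equals the square of the pooled two-sample t statistic. The sketch below uses only the Python standard library; the data are hypothetical and the helper names `pooled_t` and `anova_f` are our own.

```python
import math

def pooled_t(x, y):
    """Pooled-variance two-sample t statistic."""
    nx, ny = len(x), len(y)
    mx, my = sum(x) / nx, sum(y) / ny
    ssx = sum((v - mx) ** 2 for v in x)
    ssy = sum((v - my) ** 2 for v in y)
    sp2 = (ssx + ssy) / (nx + ny - 2)  # pooled variance estimate
    return (my - mx) / math.sqrt(sp2 * (1 / nx + 1 / ny))

def anova_f(groups):
    """One-way ANOVA F ratio, MST / MSE."""
    a = len(groups)
    n_total = sum(len(g) for g in groups)
    grand = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]
    ss_treat = sum(len(g) * (m - grand) ** 2 for g, m in zip(groups, means))
    ss_error = sum((v - m) ** 2 for g, m in zip(groups, means) for v in g)
    return (ss_treat / (a - 1)) / (ss_error / (n_total - a))

x, y = [1, 2, 3], [4, 5, 7]  # hypothetical two-group data
t = pooled_t(x, y)
f = anova_f([x, y])
print(t ** 2, f)  # the two numbers agree up to floating-point rounding
```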
The Model Statement
Each measured response can be written as the overall mean plus the treatment effect plus a random error, i.e.

$$Y_{ij} = \mu + \tau_i + \varepsilon_{ij}, \qquad i = 1, \ldots, a; \; j = 1, \ldots, n_i$$

Generally we will define our treatment effects so that they sum to 0, a constraint on our definition of our parameters, ∑ τi = 0. This is not the only constraint we could choose: one treatment level could be a reference, such as the zero level for cotton, and then everything else would be a deviation from that. However, generally we will let the effects sum to 0.
The experimental error terms are assumed to be normally distributed with zero mean, and if the experiment has constant variance then there is a single variance parameter σ2. All of these assumptions need to be checked. This is called the effects model.
An alternative way to write the model, besides the effects model, uses the expected value of our observation, E(Yij) = μ + τi, an overall mean plus the treatment effect. Writing μi = μ + τi gives what is called the means model:

$$Y_{ij} = \mu_i + \varepsilon_{ij}$$

In looking ahead there is also the regression model. Regression models can also be employed, but for now we consider the traditional analysis of variance model and focus on the effects of the treatment.
Analysis of variance formulas that you should be familiar with by now are provided in the textbook (Section 3.3).
The total variation is the sum of the squared deviations of the observations from the overall mean, summed over all a × n observations.
The analysis of variance simply takes this total variation and partitions it into the treatment component and the error component.
Treatment component
- is based on the difference between each treatment mean and the overall mean.
Error component
- is based on the difference between each observation and its treatment mean, i.e. the variation not explained by the treatments.
Notice that when you square the deviations there are also cross-product terms (see equation 3-5), but these sum to zero when you sum over the set of observations. The analysis of variance is the partition of the total variation into treatment and error components.

We want to test the hypothesis that the means are equal versus at least one is different, i.e.

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_a \quad \text{versus} \quad H_1: \mu_i \neq \mu_{i'} \text{ for at least one pair } (i, i')$$

Corresponding to the sums of squares (SS) are the degrees of freedom associated with the treatments, a - 1, and the degrees of freedom associated with the error, a × (n - 1) (N - a in general); finally, one degree of freedom is due to the overall mean parameter. These add up to the total N = a × n when the ni are all equal to n, or N = ∑ ni otherwise.
Mean square treatment (MST)
- is the sum of squares due to treatment divided by its degrees of freedom.
Mean square error (MSE)
- is the sum of squares due to error divided by its degrees of freedom.
If the true treatment means are equal to each other, i.e. the μi are all equal, then these two quantities should have the same expectation. If they are different, then the treatment component, MST, will be larger. This is the basis for the F-test.
The basic test statistic for testing the hypothesis that the means are all equal is the F ratio, MST/MSE, with degrees of freedom a - 1 and a × (n - 1), or a - 1 and N - a. We reject H0 if this quantity is greater than the 1 - α quantile (the 100(1 - α)th percentile) of the F distribution.

Back to the example - Cotton Weight Percent
Here is the Analysis of Variance table from the Minitab output:
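The partition and F ratio described above can be sketched in a few lines of Python. This is a minimal, standard-library-only illustration that assumes the data arrive as a list of groups (one list of responses per treatment level); the function name `one_way_anova` and the toy numbers are hypothetical, not the cotton data. It computes the sums of squares, degrees of freedom, mean squares and F = MST/MSE, and it handles unequal group sizes by using N - a error degrees of freedom.

```python
def one_way_anova(groups):
    """One-way (single factor) ANOVA for a list of groups of responses."""
    a = len(groups)                                   # number of treatment levels
    n_total = sum(len(g) for g in groups)             # N = sum of n_i
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Treatment SS: squared deviations of treatment means from the overall mean,
    # weighted by the number of observations in each group.
    ss_treat = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Error SS: squared deviations of observations from their own treatment mean.
    ss_error = sum((y - m) ** 2 for g, m in zip(groups, group_means) for y in g)
    # Total SS: squared deviations of all observations from the overall mean.
    ss_total = sum((y - grand_mean) ** 2 for g in groups for y in g)

    df_treat, df_error = a - 1, n_total - a
    mst = ss_treat / df_treat
    mse = ss_error / df_error
    return {
        "SS_treatment": ss_treat, "SS_error": ss_error, "SS_total": ss_total,
        "df_treatment": df_treat, "df_error": df_error,
        "MST": mst, "MSE": mse, "F": mst / mse,
    }

# Hypothetical toy data: two treatment levels with three replicates each.
table = one_way_anova([[1, 2, 3], [4, 5, 6]])
print(table["F"])  # 13.5, on 1 and 4 degrees of freedom
```

Comparing F with the 1 - α quantile of the F distribution (from a table, or from `scipy.stats.f.ppf` if SciPy is available) completes the test, and checking that `SS_total` equals `SS_treatment + SS_error` up to rounding confirms that the cross-product terms vanish.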
Note the very large F statistic, that is, 14.76. The p-value for this F-statistic is < .0005, which is taken from an F distribution with 4 and 20 degrees of freedom, pictured below.
Figure 12.2 The reference distribution for the test statistic in the example
We can see that most of the distribution lies between zero and about four. Our statistic, 14.76, is far out in the tail, an obvious confirmation of what the data show: indeed the means are not the same. Hence, we reject the null hypothesis.

12.1.2 MULTIPLE COMPARISONS FOLLOWING THE ANOVA
So, we found the means are significantly different. Now what? In general, if we had a qualitative factor rather than a quantitative factor, we would want to know which means differ from which other ones. We would probably want to do t-tests or Tukey maximum range comparisons, or some set of contrasts, to examine the differences in means. There are many multiple comparison procedures.
Two methods in particular are:
1. Fisher's Least Significant Difference (LSD), and
2. the Bonferroni Method.
Both of these are based on the t-test.
Fisher's LSD
- says do an F-test first and, if you reject the null hypothesis, then just do ordinary t-tests between all pairs of means.
Bonferroni method
- is similar, but only requires that you decide in advance how many pairs of means you wish to compare, say g, and then perform the g t-tests, each at a type I error level of α / g. This protects the entire family of g tests: the probability of making at least one type I error is no more than α. For this setting, with a treatments, g = a(a-1)/2 when comparing all pairs of treatments.
All of these multiple comparison procedures are simply aimed at interpreting or understanding the overall F-test: which means are different? They apply to many situations, especially when the factor is qualitative. However, in this case, since cotton percent is a quantitative factor, doing a test between two arbitrary levels, e.g. the 15% and 20% levels, isn't really what you want to know. What you should focus on is the whole response function as you increase the level of the quantitative factor, cotton percent.
Whenever you have a quantitative factor you should be thinking about modeling that relationship with a regression function.

12.1.3 MODEL ASSUMPTION CHECKING
We should check if the data are normal (they should be approximately normal), and they should certainly have constant variance among the groups. Independence is harder to check, but plotting the residuals in the order in which the operations are done can sometimes detect a lack of independence. The question in general is how do we fit the right model to represent the data observed. In this case there's not too much that can go wrong, since we only have one factor and it is a completely randomized design. It is hard to argue with this model.
Let's examine the residuals, which are just the observations minus the predicted values, in this case the treatment means. Hence,

$$e_{ij} = y_{ij} - \bar{y}_{i.}$$

These plots don't look exactly normal, but at least they don't seem to have any wild outliers. The normal scores plot looks reasonable. The residuals versus order plot shows the residuals in the order in which the observations were taken. This looks a little suspect in that the first six data points all have small negative residuals, a pattern not reflected in the following data points. This looks like it might be a start-up problem. These are the kinds of clues that you look for: if you were conducting this experiment you would certainly want to find out what was happening in the beginning.
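A minimal Python sketch of the residual computation eij = yij - ȳi. follows. It assumes the responses are stored as (treatment, response) pairs in the order the runs were made; the helper name `residuals_in_run_order` and the numbers are hypothetical. Keeping the run order lets you list or plot the residuals against time, which is how a start-up pattern like the one described above would show up, and the per-treatment residual variances give a rough look at the constant-variance assumption.

```python
from collections import defaultdict
from statistics import mean, variance

def residuals_in_run_order(runs):
    """runs: list of (treatment, response) pairs, in the order they were run.

    Returns the residuals e_ij = y_ij - (treatment mean), in run order.
    """
    by_treatment = defaultdict(list)
    for trt, y in runs:
        by_treatment[trt].append(y)
    trt_means = {trt: mean(ys) for trt, ys in by_treatment.items()}
    return [y - trt_means[trt] for trt, y in runs]

# Hypothetical runs (treatment label, response) in the order observed.
runs = [("A", 10.0), ("B", 14.0), ("A", 12.0), ("B", 15.0),
        ("A", 11.0), ("B", 16.0)]
res = residuals_in_run_order(runs)
print(res)  # residuals in run order: look for trends or long runs of one sign

# Rough constant-variance check: sample variance of the residuals per treatment.
for trt in sorted({t for t, _ in runs}):
    group_res = [e for (t, _), e in zip(runs, res) if t == trt]
    print(trt, variance(group_res))
```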
12.1.4 Determining Sample Size
An important aspect of designing an experiment is to know how many observations are needed to make conclusions of sufficient accuracy and with sufficient confidence. We review what we mean by this statement.
The sample size needed depends on lots of things, including what type of experiment is being contemplated, how it will be conducted, resources, and desired sensitivity and confidence.
Sensitivity
- refers to the difference in means that the experimenter wishes to detect, i.e., the experiment should be sensitive enough to detect important differences in the means.
Generally, increasing the number of replications increases the sensitivity and makes it easier to detect small differences in the means.
Power and the margin of error
- are a function of n and a function of the error variance.
- Most of this course is about finding techniques to reduce this unexplained residual error variance, thereby improving the power of hypothesis tests and reducing the margin of error in estimation.

HYPOTHESIS TESTING APPROACH TO DETERMINING SAMPLE SIZE
Our usual goal is to test the hypothesis that the means are equal versus the alternative that the means are not equal.
The null hypothesis that the means are all equal implies that the τi's are all equal to 0. Under this framework we want to calculate the power of the F-test in the fixed effects case.
Example - Blood Pressure
Consider the situation where we have four treatment groups that will be using four different blood pressure drugs, a = 4. We want to be able to detect differences between the mean blood pressure for the subjects after using these drugs.
One possible scenario is that two of the drugs are effective and two are not, e.g. two of them result in blood pressure at 110 and two of them at 120. In this case τi = (-5, -5, 5, 5) and thus Σ τi2 = 100.
Another scenario is one drug at 110, two of them at 115 and one at 120. In this case τi = (-5, 0, 0, 5) and thus Σ τi2 = 50.
Considering both of these scenarios, although there is no difference between the minimums and the maximums, the quantities Σ τi2 are very different. Of the two scenarios, the second is the least favorable configuration (LFC).
Least Favorable Configuration (LFC)
- It is the configuration of means for which you get the least power.
The first scenario would be much more favorable. But generally you do not know which situation you are in. The usual approach is not to try to guess exactly what all the values of the τi will be, but simply to specify δ, which is the maximum difference between the true means, or δ = max(τi) - min(τi).
Going back to our LFC scenario, we can calculate Σ τi2 again using Σ τi2 = δ2/2, i.e. the maximum difference squared over 2. This is true for the LFC for any number of treatments: all but the two extreme values of τi are zero under the LFC, so Σ τi2 = (δ/2)2 × 2 = δ2/2.

THE USE OF OPERATING CHARACTERISTIC CURVES
The OC curves for the fixed effects model are given in Appendix V.
The usual way to use these charts is to define the difference in the means, δ = max(μi) - min(μi), that you want to detect, specify the value of σ2, and then for the LFC use

$$\Phi^2 = \frac{n\,\delta^2}{2\,a\,\sigma^2}$$

for various values of n. Appendix V gives β, where 1 - β is the power of the test, with ν1 = a - 1 and ν2 = a(n - 1). Thus after setting n, you must calculate ν1 and ν2 to use the table.
Example: We consider an α = 0.05 level test for a = 4 using δ = 10 and σ2 = 144, and we want to find the sample size n to obtain a test with power = 0.9.
Let's guess at what our n is and see how this works. Say we let n be equal to 20, δ = 10, and σ = 12; then we can calculate the power using Appendix V. Plugging in these values to find Φ, we get Φ = 1.3.
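The chart inputs for this example can be computed directly. The sketch below (Python, standard library only) checks the Σ τi2 values of the two scenarios and evaluates Φ2 = nδ2/(2aσ2) for the LFC together with ν1 and ν2. The function name `oc_curve_inputs` is hypothetical, and reading β off the OC chart in Appendix V is still a manual step, so no power value is computed here.

```python
import math

def oc_curve_inputs(n, a, delta, sigma2):
    """Return (phi, nu1, nu2) for the least favorable configuration (LFC)."""
    sum_tau_sq = delta ** 2 / 2              # under the LFC, sum of tau_i^2 = delta^2 / 2
    phi_sq = n * sum_tau_sq / (a * sigma2)   # same as n * delta^2 / (2 * a * sigma^2)
    return math.sqrt(phi_sq), a - 1, a * (n - 1)

# The two blood pressure scenarios share delta = 10 but differ in sum of tau_i^2.
scenario_1 = [-5, -5, 5, 5]
scenario_2 = [-5, 0, 0, 5]                   # the least favorable configuration
print(sum(t * t for t in scenario_1), sum(t * t for t in scenario_2))  # 100 50

# Blood pressure example: a = 4, delta = 10, sigma^2 = 144.
for n in (20, 30):
    phi, nu1, nu2 = oc_curve_inputs(n, a=4, delta=10, sigma2=144)
    print(f"n = {n}: Phi^2 = {phi ** 2:.3f}, Phi = {phi:.2f}, nu1 = {nu1}, nu2 = {nu2}")
```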
Now go to the chart where ν2 = a(n - 1) = 80 - 4 = 76 and Φ = 1.3. This gives us a Type II error of β = 0.45 and power = 1 - β = 0.55.
It seems that we need a larger sample size.
Well, let's use a sample size of 30. In this case we get Φ2 = 2.604, so Φ = 1.6.
Now with ν2 a bit more at 116, we have β = 0.30 and power = 0.70.
So n = 30 per group is still not enough: we need more than 30 per group to reach power 0.8, and more still to reach the target power of 0.9.

12.2 THE RANDOM EFFECTS MODEL
Random effects model
- also called a variance components model,
- is a statistical model where the model parameters are random variables.
- It is a kind of hierarchical linear model, which assumes that the data being analysed are drawn from a hierarchy of different populations whose differences relate to that hierarchy.
- In econometrics, random effects models are used in the analysis of hierarchical or panel data when one assumes no fixed effects (it allows for individual effects).
- The random effects model is a special case of the fixed effects model.

12.2.1 FIXED VS. RANDOM
THREE CLASSES OF MODELS USED IN THE ANALYSIS OF VARIANCE
Fixed-effects models
- The fixed-effects model (class I) of analysis of variance applies to situations in which the experimenter applies one or more treatments to the subjects of the experiment to see whether the response variable values change. This allows the experimenter to estimate the ranges of response variable values that the treatment would generate in the population as a whole.
Random-effects models
- The random-effects model (class II) is used when the treatments are not fixed. This occurs when the various factor levels are sampled from a larger population. Because the levels themselves are random variables, some assumptions and the method of contrasting the treatments (a multi-variable generalization of simple differences) differ from the fixed-effects model.
A "group" effect is random if we can think of the levels we observe in that group as samples from a larger population.
■ Example: if collecting data from different medical centers, "center" might be thought of as random.
■ Example: if surveying students on different campuses, "campus" may be a random effect.
Mixed-effects models
- A mixed-effects model (class III) contains experimental factors of both fixed and random-effects types, with appropriately different interpretations and analysis for the two types.
Example: Teaching experiments could be performed by a college or university department to find a good introductory textbook, with each text considered a treatment. The fixed-effects model would compare a list of candidate texts. The random-effects model would determine whether important differences exist among a list of randomly selected texts. The mixed-effects model would compare the (fixed) incumbent texts to randomly selected alternatives.
Defining fixed and random effects has proven elusive, with competing definitions arguably leading toward a linguistic quagmire.
Contrast this with the biostatistics definitions, as biostatisticians use "fixed" and "random" effects to refer, respectively, to the population-average and subject-specific effects (where the latter are generally assumed to be unknown, latent variables).
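For a balanced single factor random effects model, Yij = μ + τi + εij with independent τi ~ N(0, στ2) and εij ~ N(0, σ2), the usual ANOVA (method of moments) estimators of the variance components follow from E(MSE) = σ2 and E(MST) = σ2 + n στ2. The Python sketch below assumes balanced data (n observations per randomly chosen level) and uses hypothetical numbers; it truncates a negative στ2 estimate at zero, which is a common convention rather than the only option.

```python
def variance_components(groups):
    """ANOVA estimators for a balanced one-way random effects model.

    groups: list of a lists, each holding the n responses for one randomly
    chosen level. Returns (sigma2_hat, sigma2_tau_hat).
    """
    a = len(groups)
    n = len(groups[0])
    if any(len(g) != n for g in groups):
        raise ValueError("this sketch assumes a balanced design")
    grand = sum(sum(g) for g in groups) / (a * n)
    means = [sum(g) / n for g in groups]
    mst = n * sum((m - grand) ** 2 for m in means) / (a - 1)
    mse = sum((y - m) ** 2 for g, m in zip(groups, means) for y in g) / (a * (n - 1))
    sigma2_tau = max(0.0, (mst - mse) / n)   # from E(MST) = sigma^2 + n * sigma_tau^2
    return mse, sigma2_tau

# Hypothetical data: three randomly selected levels, three observations each.
s2_hat, s2_tau_hat = variance_components([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print(s2_hat, s2_tau_hat)
```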
Under the fixed-effect model we assume that the true effect size for all studies is identical, and the only reason the effect size varies between studies is sampling error (error in estimating the effect size). Therefore, when assigning weights to the different studies we can largely ignore the information in the smaller studies, since we have better information about the same effect size in the larger studies.
By contrast, under the random-effects model the goal is not to estimate one true effect, but to estimate the mean of a distribution of effects. Since each study provides information about a different effect size, we want to be sure that all these effect sizes are represented in the summary estimate. This means that we cannot discount a small study by giving it a very small weight (the way we would in a fixed-effect analysis). The estimate provided by that study may be imprecise, but it is information about an effect that no other study has estimated. By the same logic we cannot give too much weight to a very large study (the way we might in a fixed-effect analysis). Our goal is to estimate the mean effect in a range of studies, and we do not want that overall estimate to be overly influenced by any one of them.
In these graphs, the weight assigned to each study is reflected in the size of the box (specifically, the area) for that study. Under the fixed-effect model there is a wide range of weights (as reflected in the size of the boxes), whereas under the random-effects model the weights fall in a relatively narrow range.
For example, compare the weight assigned to the largest study (Donat) with that assigned to the smallest study (Peck) under the two models. Under the fixed-effect model Donat is given about five times as much weight as Peck. Under the random-effects model Donat is given only 1.8 times as much weight as Peck.

EXTREME EFFECT SIZE IN A LARGE STUDY OR A SMALL STUDY
How will the selection of a model influence the overall effect size? In this example Donat is the largest study, and also happens to have the highest effect size. Under the fixed-effect model Donat was assigned a large share (39%) of the total weight and pulled the mean effect up to 0.41. By contrast, under the random-effects model Donat was assigned a relatively modest share of the weight (23%). It therefore had less pull on the mean, which was computed as 0.36.
Similarly, Carroll is one of the smaller studies and happens to have the smallest effect size. Under the fixed-effect model Carroll was assigned a relatively small proportion of the total weight (12%), and had little influence on the summary effect. By contrast, under the random-effects model Carroll carried a somewhat higher proportion of the total weight (16%) and was able to pull the weighted mean toward the left.
The operating premise, as illustrated in these examples, is that whenever τ2 (the between-studies variance) is nonzero, the relative weights assigned under random effects will be more balanced than those assigned under fixed effects. As we move from fixed effect to random effects, extreme studies will lose influence if they are large, and will gain influence if they are small.

CONFIDENCE INTERVAL
Under the fixed-effect model the only source of uncertainty is the within-study (sampling or estimation) error. Under the random-effects model there is this same source of uncertainty plus an additional source (between-studies variance). It follows that the variance, standard error, and confidence interval for the summary effect will always be larger (or wider) under the random-effects model than under the fixed-effect model (unless τ2 is zero, in which case the two models are the same). In this example, the standard error is 0.064 for the fixed-effect model, and 0.105 for the random-effects model.
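The weighting argument above can be illustrated with inverse-variance weights. In the sketch below (Python, standard library only) the study effects, within-study variances and τ2 are hypothetical, and τ2 is simply taken as known rather than estimated from the data. Fixed-effect weights are 1/vi and random-effects weights are 1/(vi + τ2); with these numbers the large study dominates less under random effects, and the summary standard error is larger.

```python
import math

def pooled_estimate(effects, variances, tau2=0.0):
    """Inverse-variance weighted mean, its standard error, and each study's weight share.

    tau2 = 0 gives the fixed-effect model; tau2 > 0 gives the random-effects
    model with the between-studies variance treated as known.
    """
    weights = [1.0 / (v + tau2) for v in variances]
    total = sum(weights)
    mean = sum(w * y for w, y in zip(weights, effects)) / total
    return mean, math.sqrt(1.0 / total), [w / total for w in weights]

# Hypothetical studies: one large (small variance) and one small (large variance).
effects = [0.5, 0.2]
variances = [0.01, 0.04]

fe_mean, fe_se, fe_share = pooled_estimate(effects, variances)
re_mean, re_se, re_share = pooled_estimate(effects, variances, tau2=0.01)
print("fixed  :", round(fe_mean, 3), round(fe_se, 3), [round(s, 2) for s in fe_share])
print("random :", round(re_mean, 3), round(re_se, 3), [round(s, 2) for s in re_share])
```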
Consider what would happen if we had five studies, and each study had an infinitely large sample size. Under either model the confidence interval for the effect size in each study would have a width approaching zero, since we know the effect size in that study with perfect precision. Under the fixed-effect model the summary effect would also have a confidence interval with a width of zero, since we know the common effect precisely (Figure 13.3). By contrast, under the random-effects model the width of the confidence interval would not approach zero (Figure 13.4). While we know the effect in each study precisely, these effects have been sampled from a universe of possible effect sizes, and provide only an estimate of the mean effect. Just as the error within a study will approach zero only as the sample size approaches infinity, so too the error of these studies as an estimate of the mean effect will approach zero only as the number of studies approaches infinity.
More generally, it is instructive to consider what factors influence the standard error of the summary effect under the two models. The following formulas are based on a meta-analysis of means from k one-group studies, but the conceptual argument applies to all meta-analyses. The within-study variance of each mean depends on the standard deviation (denoted σ) of participants' scores and the sample size of each study (n). For simplicity we assume that all of the studies have the same sample size and the same standard deviation (see Box 13.1 for details).
Under the fixed-effect model the standard error of the summary effect is given by

$$SE_M = \sqrt{\frac{\sigma^2}{k \times n}}$$

It follows that with a large enough sample size the standard error will approach zero, and this is true whether the sample size is concentrated on one or two studies, or dispersed across any number of studies.
Under the random-effects model the standard error of the summary effect is given by

$$SE_{M^*} = \sqrt{\frac{\sigma^2}{k \times n} + \frac{\tau^2}{k}}$$

The first term is identical to that for the fixed-effect model and, again, with a large enough sample size, this term will approach zero. By contrast, the second term (which reflects the between-studies variance) will only approach zero as the number of studies approaches infinity. These formulas do not apply exactly in practice, but the conceptual argument does. Namely, increasing the sample size within studies is not sufficient to reduce the standard error beyond a certain point (where that point is determined by τ2 and k). If there is only a small number of studies, then the standard error could still be substantial even if the total n is in the tens of thousands or higher.

12.2.2 ANOVA AND VARIANCE COMPONENTS
To illustrate the concepts with some simple formulas, let us consider a meta-analysis of studies with the very simplest design, such that each study comprises a single sample of n observations with standard deviation σ. We combine estimates of the mean in a meta-analysis. The variance of each estimate is

$$V_{Y_i} = \frac{\sigma^2}{n}$$

so the (inverse-variance) weight in a fixed-effect meta-analysis is

$$W_i = \frac{1}{\sigma^2 / n} = \frac{n}{\sigma^2}$$

and the variance of the summary effect under the fixed-effect model is

$$V_M = \frac{1}{\sum_{i=1}^{k} W_i} = \frac{\sigma^2}{k \times n}$$

Therefore under the fixed-effect model the (true) standard error of the summary mean is given by

$$SE_M = \sqrt{\frac{\sigma^2}{k \times n}}$$

Under the random-effects model the weight awarded to each study is

$$W_i^* = \frac{1}{\sigma^2 / n + \tau^2}$$

and the (true) standard error of the summary mean turns out to be

$$SE_{M^*} = \sqrt{\frac{\sigma^2}{k \times n} + \frac{\tau^2}{k}}$$
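To see the "certain point" numerically, the sketch below evaluates the two standard-error formulas above for a fixed number of studies k while the per-study sample size n grows. The values σ2 = 1, τ2 = 0.04 and k = 5 are hypothetical. The random-effects standard error levels off near the square root of τ2/k, while the fixed-effect one keeps shrinking.

```python
import math

def se_fixed(sigma2, k, n):
    """Standard error of the summary mean under the fixed-effect model."""
    return math.sqrt(sigma2 / (k * n))

def se_random(sigma2, tau2, k, n):
    """Standard error of the summary mean under the random-effects model."""
    return math.sqrt(sigma2 / (k * n) + tau2 / k)

sigma2, tau2, k = 1.0, 0.04, 5   # hypothetical values
for n in (10, 100, 10_000, 1_000_000):
    print(f"n = {n:>9,}: fixed SE = {se_fixed(sigma2, k, n):.4f}, "
          f"random SE = {se_random(sigma2, tau2, k, n):.4f}")
print("random-effects floor sqrt(tau2 / k) =", round(math.sqrt(tau2 / k), 4))
```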
THE NULL HYPOTHESIS
Often, after computing a summary effect, researchers perform a test of the null hypothesis. Under the fixed-effect model the null hypothesis being tested is that there is zero effect in every study. Under the random-effects model the null hypothesis being tested is that the mean effect is zero. Although some may treat these hypotheses as interchangeable, they are in fact different, and it is imperative to choose the test that is appropriate to the inference a researcher wishes to make.

WHICH MODEL SHOULD WE USE?
The selection of a computational model should be based on our expectation about whether or not the studies share a common effect size and on our goals in performing the analysis.
FIXED EFFECT
It makes sense to use the fixed-effect model if two conditions are met. First, we believe that all the studies included in the analysis are functionally identical. Second, our goal is to compute the common effect size for the identified population, and not to generalize to other populations.
For example, suppose that a pharmaceutical company will use a thousand patients to compare a drug versus placebo. Because the staff can work with only 100 patients at a time, the company will run a series of ten trials with 100 patients in each. The studies are identical in the sense that any variables which can have an impact on the outcome are the same across the ten studies. Specifically, the studies draw patients from a common pool, using the same researchers, dose, measure, and so on (we assume that there is no concern about practice effects for the researchers, nor for the different starting times of the various cohorts).
All the studies are expected to share a common effect, and so the first condition is met. The goal of the analysis is to see if the drug works in the population from which the patients were drawn (and not to extrapolate to other populations), and so the second condition is met as well. In this example the fixed-effect model is a plausible fit for the data and meets the goal of the researchers. It should be clear, however, that this situation is relatively rare.
RANDOM EFFECTS
By contrast, when the researcher is accumulating data from a series of studies that had been performed by researchers operating independently, it would be unlikely that all the studies were functionally equivalent. Typically, the subjects or interventions in these studies would have differed in ways that would have impacted on the results, and therefore we should not assume a common effect size. Therefore, in these cases the random-effects model is more easily justified than the fixed-effect model. Additionally, the goal of this analysis is usually to generalize to a range of scenarios. Therefore, if one did make the argument that all the studies used an identical, narrowly defined population, then it would not be possible to extrapolate from this population to others, and the utility of the analysis would be severely limited.

A CAVEAT
There is one caveat to the above. If the number of studies is very small, then the estimate of the between-studies variance (τ2) will have poor precision. While the random-effects model is still the appropriate model, we lack the information needed to apply it correctly. In this case the reviewer may choose among several options, each of them problematic. One option is to report the separate effects and not report a summary effect. The hope is that the reader will understand that we cannot draw conclusions about the effect size and its confidence interval. The problem is that some readers will revert to vote counting (see Chapter 28) and possibly reach an erroneous conclusion. Another option is to perform a fixed-effect analysis. This approach would yield a descriptive analysis of the included studies, but would not allow us to make inferences about a wider population. The problem with this approach is that (a) we do want to make inferences about a wider population and (b) readers will make these inferences even if they are not warranted. A third option is to take a Bayesian approach, where the estimate of τ2 is based on data from outside of the current set of studies. This is probably the best option, but the problem is that relatively few researchers have expertise in Bayesian meta-analysis. Additionally, some researchers have a philosophical objection to this approach.

12.3 THE RANDOMIZED COMPLETE BLOCK DESIGN (RCBD)
RCBD
- is the standard design for agricultural experiments where similar experimental units are grouped into blocks or replicates.
- It is used to control variation in an experiment by accounting for spatial effects in the field or greenhouse.
- Example: variation in fertility or drainage differences in a field.
The field or space is divided into uniform units to account for any variation, so that observed differences are largely due to true differences between treatments.
Treatments are then assigned at random to the subjects in the blocks, once in each block. The defining feature of the Randomized Complete Block Design is that each block sees each treatment exactly once.
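The randomization rule for an RCBD (every treatment once per block, in a random order within each block) can be sketched as follows. This is a minimal Python illustration for four treatments A to D in four blocks; the function name `rcbd_layout` and the seed are hypothetical.

```python
import random

def rcbd_layout(treatments, n_blocks, seed=None):
    """Randomized complete block layout: an independent random permutation
    of all treatments within each block (randomization only within blocks)."""
    rng = random.Random(seed)
    layout = []
    for _ in range(n_blocks):
        block = list(treatments)   # every treatment appears exactly once per block
        rng.shuffle(block)
        layout.append(block)
    return layout

layout = rcbd_layout(["A", "B", "C", "D"], n_blocks=4, seed=7)
for b, block in enumerate(layout, start=1):
    print(f"block {b}: {' '.join(block)}")
```

Contrast this with a CRD, where all runs would be shuffled together and a treatment could land several times in the same part of the field.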
ADVANTAGES OF THE RCBD
Generally more precise than the completely randomized design (CRD). No restriction on the number of treatments or replicates. Some treatments may be replicated more times than others. Missing plots are easily estimated.
DISADVANTAGES OF THE RCBD
Error degrees of freedom are fewer than for the CRD (a problem with a small number of treatments). Large variation between experimental units within a block may result in a large error term. If there are missing data, an RCBD experiment may be less efficient than a CRD.
NOTE: The most important item to consider when choosing a design is the uniformity of the experimental units.

THE LAYOUT OF THE EXPERIMENT
Choose the number of blocks (minimum 2), e.g. 4.
Choose treatments (assign numbers or letters for each), e.g. 6 treatments: A, B, C, D, E, F.
Example: An experiment with 4 treatments (A, B, C, D) and 4 blocks
Numbers in the upper left-hand corner are plot numbers. Letters are treatments.

ANALYSIS OF VARIANCE
Mathematical Model
Where: symbols are the same as identified previously and j = a particular block.

MODEL AND ANALYSIS FOR RANDOMIZED COMPLETE BLOCK DESIGNS
The randomized complete block design (RCBD) has:
v treatments (They could be treatment combinations.)
b blocks of v units, chosen so that units within a block are alike (or at least similar) and units in different blocks are substantially different. (Thus the total number of experimental units is n = bv.)
The v experimental units within each block are randomly assigned to the v treatments. (So each treatment is assigned one unit per block.)
Model:

$$Y_{hi} = \mu + \theta_h + \tau_i + \varepsilon_{hi}, \qquad h = 1, \ldots, b; \; i = 1, \ldots, v$$

εhi's independent
where
Y_hi is the random variable representing the response for treatment i observed in block h,
µ is a constant (which may be thought of as the overall mean; see below),
θ_h is the (additive) effect of the hth block (h = 1, 2, …, b),
τ_i is the (additive) effect of the ith treatment (i = 1, 2, …, v),
ε_hi is the random error for the ith treatment in the hth block.

Note: This model formally looks just like a two-way main effects model. Remember, however, that there is just one factor plus one block, and the randomization is only within each block. So we do not have the conditions for a two-way analysis of variance.

Like the main-effects model, this is an additive model that does not provide for any interaction between block and treatment level. It assumes that treatments have the same effect in every block, and that the only effect of the block is to shift the mean response up or down. If interaction between block and factor is suspected, there are two options. Either a transformation is needed to remove the interaction before using this model, or a design with more than one observation per block-treatment combination must be used. Trying to add an interaction term in the RCBD would create the same problem as two-way ANOVA with one observation per cell: the error has zero degrees of freedom, so the method of analysis breaks down.

This is an over-specified model. The additional constraints
$$\sum_{h=1}^{b} \theta_h = 0, \qquad \sum_{i=1}^{v} \tau_i = 0$$
are typically added, so that the treatment and block effects are thought of as deviations from the overall mean.

References:
https://newonlinecourses.science.psu.edu/stat503/node/14/
https://www.meta-analysis.com/downloads/Meta-analysis%20Fixed-effect%20vs%20Random-effects%20models.pdf
http://pbgworks.org/sites/pbgworks.org/files/RandomizedCompleteBlockDesignTutorial.pdf

CHAPTER 13

13.1 Factorial Experiments
Factor
- is used in a general sense to denote any feature of the experiment, such as temperature, time, or pressure, that may be varied from trial to trial. We define the levels of a factor to be the actual values used in the experiment.

For each of these cases it is important to determine whether each of the two factors influences the response. It is equally important to determine whether there is a significant interaction between the two factors. In terms of terminology, the experiment described here is a two-factor experiment. The experimental design may be either of two kinds:
- a completely randomized design, in which the various treatment combinations are assigned randomly to all the experimental units, or
- a randomized complete block design, in which factor combinations are assigned randomly within blocks.
In the yeast example under a completely randomized design, the treatment combinations of temperature and drying time would be assigned randomly to the samples of yeast.

A factorial experiment in two factors involves experimental trials (or a single trial) at all factor combinations. Take the temperature-drying-time example with, say, three levels of each and n = 2 runs at each of the nine combinations. This is a two-factor factorial in a completely randomized design. Neither factor is a blocking factor; we are interested in how each influences percent solids in the samples and whether they interact. The biologist would then have 18 physical samples of material, which are the experimental units. These would be assigned randomly to the 18 combinations (nine treatment combinations, each duplicated).

Before we launch into analytical details, sums of squares, and so on, note the obvious connection between this setup and the one-factor problem. Consider the yeast experiment. An explanation of degrees of freedom helps the reader or the analyst visualize the extension. We should initially view the 9 treatment combinations as if they represent one factor with 9 levels (8 degrees of freedom). Thus, an initial look at degrees of freedom gives

Source of Variation        Degrees of Freedom
Treatment combinations     8
Error                      9
Total                      17
13.1.1 MAIN EFFECTS AND INTERACTION
The experiment could be analyzed as described in the above table. However, the F-test for combinations would probably not give the analyst the desired information, namely the roles of temperature and drying time. The three drying times have 2 associated degrees of freedom, and the three temperatures have 2 degrees of freedom. The main factors, temperature and drying time, are called main effects. The main effects represent 4 of the 8 degrees of freedom for factor combinations. The additional 4 degrees of freedom are associated with interaction between the two factors. As a result, the analysis involves

Source of Variation    Degrees of Freedom
Temperature            2
Drying time            2
Interaction            4
Error                  9
Total                  17
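Statistical (effects) Models:
$$y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \varepsilon_{ijk}, \qquad i = 1, \dots, a;\; j = 1, \dots, b;\; k = 1, \dots, n$$
where τ_i is the effect of the ith level of factor A, β_j is the effect of the jth level of factor B, (τβ)_ij is the interaction effect, and ε_ijk is the random error.

Factors in an analysis of variance may be viewed as fixed or random, depending on the type of inference desired and how the levels were chosen. Here one must consider fixed effects, random effects, and even cases where effects are mixed. Most attention will be drawn toward expected mean squares when we advance to these topics.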
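The 2 + 2 + 4 + 9 degrees-of-freedom split for the yeast example comes from partitioning the total sum of squares into main effects, interaction and error. Below is a minimal sketch of that partition for a balanced a × b factorial in a completely randomized design. It assumes fixed effects and n ≥ 2 replicates per cell, with `y[i][j]` holding the n responses at level i of A and level j of B. The helper name `two_factor_anova` and the example data are hypothetical.

```python
from itertools import product


def two_factor_anova(y):
    """Sums of squares, degrees of freedom and F ratios for a balanced a x b factorial.

    y[i][j] is the list of n replicate responses at level i of factor A and level j
    of factor B (completely randomized design, fixed effects, n >= 2).
    """
    a, b, n = len(y), len(y[0]), len(y[0][0])
    if n < 2:
        raise ValueError("need n >= 2 replicates per cell for an error term")
    cells = list(product(range(a), range(b)))
    grand = sum(sum(y[i][j]) for i, j in cells) / (a * b * n)
    cell_mean = {(i, j): sum(y[i][j]) / n for i, j in cells}
    a_mean = [sum(cell_mean[i, j] for j in range(b)) / b for i in range(a)]
    b_mean = [sum(cell_mean[i, j] for i in range(a)) / a for j in range(b)]

    ss = {
        "A": b * n * sum((m - grand) ** 2 for m in a_mean),
        "B": a * n * sum((m - grand) ** 2 for m in b_mean),
        # Interaction: what the cell means show beyond the two main effects.
        "AB": n * sum((cell_mean[i, j] - a_mean[i] - b_mean[j] + grand) ** 2
                      for i, j in cells),
        # Error: spread of the replicates about their own cell mean.
        "Error": sum((v - cell_mean[i, j]) ** 2 for i, j in cells for v in y[i][j]),
    }
    df = {"A": a - 1, "B": b - 1, "AB": (a - 1) * (b - 1),
          "Error": a * b * (n - 1), "Total": a * b * n - 1}
    ms_error = ss["Error"] / df["Error"]
    f = {k: (ss[k] / df[k]) / ms_error for k in ("A", "B", "AB")}
    return ss, df, f


# Hypothetical 3 x 3 layout with n = 2 runs per cell, the same shape as the yeast design.
data = [[[1, 2], [3, 4], [5, 6]],
        [[2, 3], [4, 5], [6, 8]],
        [[0, 1], [2, 2], [4, 5]]]
ss, df, f = two_factor_anova(data)
print(df)  # {'A': 2, 'B': 2, 'AB': 4, 'Error': 9, 'Total': 17}
```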
13.2 TWO-FACTOR FACTORIAL EXPERIMENTS
Two-factor factorial design
- is an experimental design in which data are collected for all possible combinations of the levels of the two factors of interest.

If equal sample sizes are taken for each of the possible factor combinations, then the design is a balanced two-factor factorial design.

A balanced a × b factorial design is a factorial design with a levels of factor A and b levels of factor B. It has n independent replications taken at each of the a × b treatment combinations. The design size is N = abn.

The effect of a factor is defined to be the average change in the response associated with a change in the level of the factor. This is usually called a main effect.

If the average change in response across the levels of one factor is not the same at all levels of the other factor, then we say there is an interaction between the factors.

The design of an experiment plays a major role in the eventual solution of the problem. In a factorial experimental design, experimental trials (or runs) are performed at all combinations of the factor levels. The analysis of variance (ANOVA) will be used as one of the primary tools for statistical data analysis.

13.3 2^K FACTORIAL DESIGN
Factorial designs
- are frequently used in experiments involving several factors where it is necessary to study the joint effect of the factors on a response. Several special cases of the general factorial design are especially important. They are widely employed in research work, and they form the basis of other designs of considerable practical value.

The most important of these special cases is that of k factors, each at only two levels. These levels may be
1. quantitative, or
2. qualitative.
Quantitative levels are, for example, two values of temperature, pressure, or time. Qualitative levels are, for example, two machines, two operators, the "high" and "low" levels of a factor, or perhaps the presence and absence of a factor.

A complete replicate of such a design requires 2 × 2 × ⋯ × 2 = 2^k observations and is called a 2^k factorial design.
2^k design
- is particularly useful in the early stages of experimental work, when many factors are likely to be investigated. It provides the smallest number of runs for which k factors can be studied in a complete factorial design. Because there are only two levels for each factor, we must assume that the response is approximately linear over the range of the factor levels chosen.

2^K DESIGN
The simplest type of 2^k design is the 2^2, that is, two factors A and B, each at two levels. We usually think of these levels as the low and high levels of the factor. The 2^2 design is shown in Fig. 13-a.

Note: the design can be represented geometrically as a square whose corners are the 2^2 = 4 runs, or treatment combinations. In the 2^2 design it is customary to denote the low and high levels of the factors A and B by the signs - and +, respectively. This is sometimes called the geometric notation for the design.

Special notation
- is used to label the treatment combinations. In general, a treatment combination is represented by a series of lowercase letters. If a letter is present, the corresponding factor is run at the high level in that treatment combination; if it is absent, the factor is run at its low level.

For example, treatment combination a indicates that factor A is at the high level and factor B is at the low level. The treatment combination with both factors at the low level is represented by (1). This notation is used throughout the 2^k design series. For example, in a 2^4, the treatment combination with A and C at the high level and B and D at the low level is denoted by ac.

The effects of interest in the 2^2 design are the main effects A and B and the two-factor interaction AB. Let the letters (1), a, b, and ab also represent the totals of all n observations taken at these design points. It is easy to estimate the effects of these factors. To estimate the main effect of A, average the observations on the right side of the square in Fig. 13-a, where A is at the high level. Then subtract the average of the observations on the left side of the square, where A is at the low level, or

Equation 1:
$$A = \bar{y}_{A+} - \bar{y}_{A-} = \frac{a + ab}{2n} - \frac{b + (1)}{2n} = \frac{1}{2n}\left[a + ab - b - (1)\right]$$

Similarly, the main effect of B is found by averaging the observations on the top of the square, where B is at the high level. The average of the observations on the bottom of the square, where B is at the low level, is then subtracted:

Equation 2:
$$B = \bar{y}_{B+} - \bar{y}_{B-} = \frac{b + ab}{2n} - \frac{a + (1)}{2n} = \frac{1}{2n}\left[b + ab - a - (1)\right]$$

Finally, the AB interaction is estimated by taking the difference in the diagonal averages.

Equation 3:
$$AB = \frac{ab + (1)}{2n} - \frac{a + b}{2n} = \frac{1}{2n}\left[ab + (1) - a - b\right]$$

The quantities in brackets in Equations 1, 2, and 3 are called contrasts. For example, the A contrast is
$$\text{Contrast}_A = a + ab - b - (1)$$

In these equations, the contrast coefficients are always either +1 or -1. A table of plus and minus signs, such as Table 13-a, can be used to determine the sign on each treatment combination for a particular effect.

Table 13-a. Signs for effects in the 2^2 factorial design
Treatment combination   I   A   B   AB
(1)                     +   -   -   +
a                       +   +   -   -
b                       +   -   +   -
ab                      +   +   +   +
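Equations 1 to 3 and the sign table generalize to any 2^k design. Each effect estimate is its contrast divided by n·2^(k-1), and its sum of squares is contrast²/(n·2^k). The sketch below applies that rule mechanically. It assumes the treatment totals are supplied in standard order (1), a, b, ab, c, ..., and the helper names `standard_order` and `effects_2k` are hypothetical.

```python
def standard_order(k):
    """Treatment-combination labels of a 2^k design in standard order."""
    letters = "abcdefghijklmnopqrstuvwxyz"[:k]
    # Bit j of idx is 1 when factor j is at its high level, so factor A changes fastest.
    return ["".join(letters[j] for j in range(k) if (idx >> j) & 1) or "(1)"
            for idx in range(2 ** k)]


def effects_2k(totals, k, n=1):
    """Estimate every main effect and interaction of a 2^k design.

    totals[idx] is the total of the n observations at position idx of standard order.
    Returns {effect label: (effect estimate, sum of squares)}.
    """
    names = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[:k]
    result = {}
    for mask in range(1, 2 ** k):  # each nonempty set of factors is one effect
        label = "".join(names[j] for j in range(k) if (mask >> j) & 1)
        contrast = 0
        for idx, total in enumerate(totals):
            # Sign = product over the effect's factors of (+1 if high, -1 if low).
            sign = 1
            for j in range(k):
                if (mask >> j) & 1:
                    sign *= 1 if (idx >> j) & 1 else -1
            contrast += sign * total
        result[label] = (contrast / (n * 2 ** (k - 1)), contrast ** 2 / (n * 2 ** k))
    return result


# Hypothetical single-replicate 2^2 totals for (1), a, b, ab.
print(standard_order(2))                  # ['(1)', 'a', 'b', 'ab']
print(effects_2k([10, 14, 12, 20], k=2))  # {'A': (6.0, 36.0), 'B': (4.0, 16.0), 'AB': (2.0, 4.0)}
```

For k = 2 the A entry reproduces Equation 1: [14 + 20 - 12 - 10]/2 = 6.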
13.3.1 SINGLE REPLICATE FOR THE 2^K DESIGN
As the number of factors in a factorial experiment grows, the number of effects that can be estimated also grows. For example, a 2^4 experiment has 4 main effects, 6 two-factor interactions, 4 three-factor interactions, and 1 four-factor interaction. A 2^6 experiment has 6 main effects, 15 two-factor interactions, 20 three-factor interactions, 15 four-factor interactions, 6 five-factor interactions, and 1 six-factor interaction. In most situations the sparsity of effects principle applies; that is, the system is usually dominated by the main effects and low-order interactions. The three-factor and higher order interactions are usually negligible.

Therefore, when the number of factors is moderately large, say, k ≥ 4 or 5, a common practice is to run only a single replicate of the 2^k design. The higher order interactions are then pooled, or combined, as an estimate of error. Sometimes a single replicate of a 2^k design is called an unreplicated 2^k factorial design.

When analyzing data from unreplicated factorial designs, occasionally real high-order interactions occur. The use of an error mean square obtained by pooling high-order interactions is inappropriate in these cases.

A simple method of analysis can be used to overcome this problem. Construct a plot of the estimates of the effects on a normal probability scale. The effects that are negligible are normally distributed, with mean zero and variance σ², and will tend to fall along a straight line on this plot. Significant effects, by contrast, will have nonzero means and will not lie along the straight line.

13.3.2 ADDITION OF CENTER POINTS TO A 2^K DESIGN
A potential concern in the use of two-level factorial designs is the assumption of linearity in the factor effects. Of course, perfect linearity is unnecessary, and the 2^k system will work quite well even when the linearity assumption holds only approximately. However, one method of replicating certain points in the 2^k factorial provides protection against curvature. It also yields an independent estimate of error. The method consists of adding center points to the 2^k design. These consist of n_C replicates run at the point x_i = 0 (i = 1, 2, ..., k). One important reason for adding the replicate runs at the design center is that center points do not affect the usual effect estimates in a 2^k design. We assume that the k factors are quantitative.

To illustrate the approach, consider a 2^2 design with one observation at each of the factorial points (-, -), (+, -), (-, +), and (+, +) and n_C observations at the center point (0, 0). Figure S14-3 illustrates the situation. Let ȳ_F be the average of the four runs at the four factorial points and let ȳ_C be the average of the n_C runs at the center point.

If the difference ȳ_F - ȳ_C is small, the center points lie on or near the plane passing through the factorial points, and there is no curvature. On the other hand, if ȳ_F - ȳ_C is large, curvature is present. A single degree-of-freedom sum of squares for curvature is given by
$$SS_{\text{Curvature}} = \frac{n_F\, n_C\, (\bar{y}_F - \bar{y}_C)^2}{n_F + n_C}$$
Figure 13-b
where, in general, n_F is the number of factorial design points. This quantity may be compared to the error mean square to test for curvature. Notice that when the equation above is divided by σ̂² = MS_E, the result is like the square of the t statistic used to compare two means. More specifically, when points are added to the center of the 2^k design, the model we may entertain is
$$Y = \beta_0 + \sum_{j=1}^{k} \beta_j x_j + \sum_{i<j} \beta_{ij} x_i x_j + \sum_{j=1}^{k} \beta_{jj} x_j^2 + \epsilon$$
where the β_jj are pure quadratic effects. The test for curvature tests the hypotheses
$$H_0: \sum_{j=1}^{k} \beta_{jj} = 0 \qquad H_1: \sum_{j=1}^{k} \beta_{jj} \neq 0$$

Furthermore, if the factorial points in the design are unreplicated, we may use the n_C center points to construct an estimate of error with n_C - 1 degrees of freedom.
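Below is a minimal sketch of the curvature check, assuming the factorial-point responses and the n_C center-point responses are given as two plain lists. It computes SS_Curvature = n_F n_C (ȳ_F - ȳ_C)²/(n_F + n_C). It takes the pure-error mean square from the center points (n_C - 1 degrees of freedom) as the estimate of σ². It then returns their ratio, to be compared with an F(1, n_C - 1) critical value. The function name `curvature_test` and the example numbers are hypothetical.

```python
from statistics import mean, variance


def curvature_test(factorial, center):
    """Single-degree-of-freedom test for curvature in a 2^k design with center points.

    factorial: responses at the n_F factorial points.
    center: responses at the n_C (>= 2) center points.
    Returns (SS_curvature, MS_E from the center points, F ratio on 1 and n_C - 1 df).
    """
    n_f, n_c = len(factorial), len(center)
    diff = mean(factorial) - mean(center)
    ss_curvature = n_f * n_c * diff ** 2 / (n_f + n_c)
    # Sample variance = SS of center runs about their mean / (n_C - 1): pure error.
    ms_error = variance(center)
    return ss_curvature, ms_error, ss_curvature / ms_error


# Hypothetical 2^2 responses at (-,-), (+,-), (-,+), (+,+) and three center runs.
ss, ms_e, f = curvature_test([10, 12, 14, 16], [10, 11, 12])
print(ss, ms_e, f)
```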
13.4 Blocking and Confounding in the 2^K Design
Blocking factors and nuisance factors
- provide the mechanism for explaining and controlling variation among the experimental units from sources that are not of interest to you and therefore are part of the error or noise aspect of the analysis.

Block designs
- help maintain internal validity by reducing the possibility that the observed effects are due to a confounding factor. They also maintain external validity by allowing the investigator to use less stringent restrictions on the sampling population.

Each set of non-homogeneous conditions defines a block, and each replicate is run in one of the blocks. If there are n replicates of the design, then each replicate is a block. Each replicate is run in one of the blocks (time periods, batches of raw material, etc.). Runs within the block are randomized.

Consider the example:
k = 2 factors, n = 3 replicates
$$SS_{\text{Blocks}} = \sum_{i=1}^{3} \frac{B_i^2}{4} - \frac{y_{\cdots}^2}{12}$$
where B_i is the total of the four runs in block (replicate) i and y... is the grand total of all 12 runs. This is the "usual" method for calculating a block sum of squares.

CONFOUNDING
Confounding
- is a design technique for arranging a complete factorial experiment in blocks, where the block size is smaller than the number of treatment combinations in one replicate.
- It causes information about certain treatment effects to be indistinguishable from (confounded with) blocks.

For example: Consider a 2^2 factorial design in 2 blocks.
Block 1: (1) and ab
Block 2: a and b
Here the AB interaction is confounded with blocks.

Defining contrast:
$$L = \alpha_1 x_1 + \alpha_2 x_2 + \cdots + \alpha_k x_k$$
where x_i is the level of the ith factor appearing in a particular treatment combination and α_i is the exponent appearing on the ith factor in the effect to be confounded. In the 2^k system, x_i = 0 (low) or 1 (high) and α_i = 0 or 1. Treatment combinations that give the same value of L (mod 2) are placed in the same block.
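The defining-contrast rule can be applied mechanically. Set x_i = 1 when factor i is high (0 when low), and set α_i = 1 when factor i appears in the confounded effect (0 otherwise). Treatment combinations with the same value of L mod 2 share a block. Below is a minimal sketch for two blocks. The helper name `assign_blocks` is hypothetical. For the 2^2 design with AB confounded, it reproduces Block 1 = (1), ab and Block 2 = a, b from the example.

```python
def assign_blocks(k, confounded):
    """Split the 2^k treatment combinations into two blocks, confounding one effect.

    Uses the defining contrast L = alpha_1*x_1 + ... + alpha_k*x_k (mod 2), where
    x_i = 1 if factor i is at its high level (0 otherwise) and alpha_i = 1 if factor i
    appears in the confounded effect (0 otherwise).
    """
    names = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"[:k]
    alpha = [1 if names[i] in confounded.upper() else 0 for i in range(k)]
    blocks = {0: [], 1: []}
    for idx in range(2 ** k):  # standard order: (1), a, b, ab, c, ...
        x = [(idx >> i) & 1 for i in range(k)]
        label = "".join(names[i].lower() for i in range(k) if x[i]) or "(1)"
        L = sum(a * xi for a, xi in zip(alpha, x)) % 2
        blocks[L].append(label)
    return blocks


print(assign_blocks(2, "AB"))   # {0: ['(1)', 'ab'], 1: ['a', 'b']}
print(assign_blocks(3, "ABC"))  # {0: ['(1)', 'ab', 'ac', 'bc'], 1: ['a', 'b', 'c', 'abc']}
```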
Estimation of Error

13.5 FRACTIONAL REPLICATION OF THE 2^K DESIGN
Consider the set-up of a complete factorial experiment, say 2^k. If there are four factors, then the total number of plots needed to conduct the experiment is 2^4 = 16. When the number of factors increases to six, the required number of plots becomes 2^6 = 64, and so on.

Moreover, the number of treatment combinations also becomes large when the number of factors increases. Sometimes it is so large that organizing the experiment becomes practically difficult. The experimental material, time, manpower, etc. needed also increase. Sometimes there are simply not enough resources to conduct a complete factorial experiment.

About the degrees of freedom: the 2^6 factorial experiment has 2^6 - 1 = 63 degrees of freedom. These are divided into 6 for main effects, 15 for two-factor interactions, and the remaining 42 for three- or higher-order interactions.

If the higher order interactions are not of much use or importance, they can possibly be ignored. The information on main and lower order interaction effects can then be obtained by conducting a fraction of the complete factorial experiment. Such experiments are called fractional factorial experiments.

Fractional factorial experiments
- The utility of such experiments is greatest when the main and lower order interaction effects, rather than the higher order ones, largely govern the experimental process.
- Fractional factorial experiments need fewer plots and less experimental material than complete factorial experiments. Hence they involve less cost, less manpower, less time, etc.

Examples:
To understand fractional factorials better, consider the setup of a 2^6 factorial experiment. Since the highest order interaction in this case is ABCDEF, we construct the one-half fraction using I = ABCDEF as the defining relation. We then write out the factors of the 2^(6-1) = 2^5 factorial experiment in standard order. Then multiply all the factors with the defining relation.
For example:
One half fraction of 2^6 factorial experiment using I = ABCDEF as defining relation:
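Below is a minimal sketch of one way to write out such a one-half fraction. Take the 2^5 design in A to E in standard order. Then set the level of F so that the product of all six coded levels (±1) equals +1, which is what I = ABCDEF requires (equivalently, the generator F = ABCDE). This generator-based construction is a common textbook alternative and not necessarily the exact procedure of the source notes. The helper name `half_fraction` is hypothetical. Passing `sign=-1` gives the complementary fraction I = -ABCDEF.

```python
def half_fraction(k, sign=1):
    """Runs of the 2^(k-1) half fraction of a 2^k design with I = sign * (ABC...K).

    The first k - 1 factors follow standard order; the last factor's coded level is
    chosen so that x_1 * x_2 * ... * x_k == sign.
    """
    letters = "abcdefghijklmnopqrstuvwxyz"[:k]
    runs = []
    for idx in range(2 ** (k - 1)):
        levels = [1 if (idx >> j) & 1 else -1 for j in range(k - 1)]
        product = 1
        for x in levels:
            product *= x
        # x_k = sign * x_1 ... x_(k-1), so the product of all k levels equals sign.
        levels.append(sign * product)
        runs.append("".join(c for c, x in zip(letters, levels) if x == 1) or "(1)")
    return runs


fraction = half_fraction(6)  # I = +ABCDEF: 32 runs
print(len(fraction), fraction[:8])
```

With I = +ABCDEF every run has an even number of letters, for example (1), af, bf, ab. Multiplying any of these by ABCDEF (mod 2) gives another treatment combination with an even number of letters.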
13.6 RESPONSE SURFACE METHODS
Response surface methodology (RSM)
- is a collection of mathematical and statistical techniques for empirical model building. By careful design of experiments, the objective is to optimize a response (output variable) which is influenced by several independent variables (input variables). An experiment is a series of tests, called runs, in which changes are made in the input variables in order to identify the reasons for changes in the output response.
- dates from the 1950s. Early applications were found in the chemical industry.

Objective of Response Surface Methods (RSM)
- optimization,
- finding the best set of factor levels to achieve some goal.

RSM AS A SEQUENTIAL PROCESS

DESIGN OF EXPERIMENTS
In a traditional DoE, screening experiments are performed in the early stages of the process, when it is likely that many of the design variables initially considered have little or no effect on the response. The purpose is to identify the design variables that have large effects for further investigation.
Genetic Programming has shown good screening properties (Gilbert et al., 1998), as will be demonstrated in Section 6.2. This suggests that both the selection of the relevant design variables and the identification of the model can be carried out at the same time.

The text has a graphic depicting a response surface method in three dimensions. What is actually represented is a four-dimensional space: the three factors occupy 3-dimensional space and the response is the 4th dimension. Instead, let's look at 2 dimensions, which are easier to think about and visualize. There is a response surface, and we will imagine the ideal case where there is actually a 'hill' which has a nice centered peak.

The actual variables in their natural units of measurement are used in the experiment. However, when we design our experiment we will use our coded variables, X1 and X2, which will be centered on 0 and extend +1 and -1 from the center of the region of experimentation. Therefore, we will take our natural units and then center and rescale them to the range from -1 to +1.

a. Screening Response Model
The screening model that we used for the first order situation involves linear effects and a single cross product factor, which represents the linear × linear interaction component:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \varepsilon$$

b. Steepest Ascent Model
The cross product gives an indication of the curvature of the response surface that we are fitting. If we ignore it and look only at the first order model, this is called the steepest ascent model:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \varepsilon$$

c. Optimization Model
Then, when we think that we are somewhere near the 'top of the hill', we will fit a second order model. This adds the two second-order quadratic terms:
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_{12} x_1 x_2 + \beta_{11} x_1^2 + \beta_{22} x_2^2 + \varepsilon$$
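Below is a minimal sketch of two calculations this section relies on. The first converts natural units to the coded scale, x = (ξ - center)/half-range, so the low level maps to -1 and the high level to +1. The second steps along the path of steepest ascent of a fitted first-order model, moving each coded variable in proportion to its estimated coefficient. The function names, the choice of a "base" variable that moves one fixed step at a time, and the coefficients in the example are assumptions for illustration.

```python
def to_coded(natural, low, high):
    """Map a natural-unit value to coded units: low -> -1, center -> 0, high -> +1."""
    center = (low + high) / 2
    half_range = (high - low) / 2
    return (natural - center) / half_range


def to_natural(coded, low, high):
    """Inverse of to_coded."""
    return (low + high) / 2 + coded * (high - low) / 2


def steepest_ascent_path(coefs, base=0, step=1.0, n_steps=5):
    """Coded-unit points along the path of steepest ascent, starting at the design center.

    coefs: fitted first-order coefficients b_1, ..., b_k (intercept excluded).
    Each variable moves in proportion to its coefficient (the gradient direction);
    the variable at index `base` moves `step` coded units per iteration.
    """
    scale = abs(coefs[base])
    return [[i * step * b / scale for b in coefs] for i in range(n_steps + 1)]


# Hypothetical fitted model y_hat = b0 + 0.8 x1 + 0.3 x2; factor 1 was run from 30 to 40.
path = steepest_ascent_path([0.8, 0.3], step=1.0, n_steps=3)
print([round(to_natural(p[0], 30, 40), 2) for p in path])
```

Because the step in x2 is 0.3/0.8 = 0.375 coded units per unit step in x1, the path follows the gradient of the fitted plane. Runs along it would continue until the response stops improving, at which point the second order model becomes appropriate.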
REFERENCES
http://www.just.edu.jo/~haalshraideh/Courses/IE347/Two%20factor%20factorial%20experiments.pdf
http://www.um.edu.ar/math/montgomery.pdf
https://www.csie.ntu.edu.tw/~sdlin/download/Probability%20&%20Statistics.pdf
http://www.stat.ncku.edu.tw/faculty_private/rbchen/experimental_design/ExChapter7.ppt
http://isdl.cau.ac.kr/education.data/DOEO66PT/5.blocking.confounding.pdf
https://newonlinecourses.science.psu.edu/stat503/node/57/
https://www.statease.com/documents/23/rsm_part1_intro.pdf
http://home.iitk.ac.in/~shalab/anova/chapter11-anova-fractional-replications.pdf
https://newonlinecourses.science.psu.edu/stat503/node/18/
