Handout 1 Experiment Design
TERMINOLOGY
1. Design of an experiment
Designing an experiment means planning it so that the information collected is relevant to the
problem under investigation. The design is therefore the complete sequence of steps taken ahead
of time to ensure that the appropriate data will be obtained in a way that permits objective
analysis leading to valid inferences or conclusions about the stated problem. This definition
presupposes that the person formulating the design clearly understands the objectives of the
proposed investigation or experiment.
2. Experimental unit
These are the objects on which measurements are taken, i.e. on which the variable under study is
measured. In an agricultural field experiment, the plot of land is the experimental unit. In a
feeding experiment on cows, the cow is the experimental unit.
3. Factors
4. Level
This is the intensity setting of a factor, e.g. the 3 temperatures 100℃, 200℃, 300℃ represent 3
levels of the quantitative factor temperature. Similarly, Benz, VW, BMW, Corolla represent 4
levels of the qualitative factor type (of car).
5. Treatment
This is a specific combination of factor levels. An experiment involving only a single factor is
analysed by one-way ANOVA. An experiment composed of levels of 2 or more factors is analysed by
two-way ANOVA, ..., n-way ANOVA, where n is the number of factors. In other words, whatever is
done to the experimental unit that makes one population differ from another is called a treatment.
For one-way ANOVA, number of treatments = number of levels of the factor.
For two-way ANOVA, number of treatments = number of combinations (cells) = number of rows x
number of columns.
                 Fertiliser
Variety    F1    F2    F3    F4
V1         *     *     *     *
V2         *     *     *     *
V3         *     *     *     *
Number of treatments = 3 x 4 = 12
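The count of 12 can be checked by listing every variety/fertiliser cell. The short Python sketch below is only an illustration; the variable names are assumptions, and the labels V1 to V3 and F1 to F4 are taken from the table above.

```python
from itertools import product

varieties = ["V1", "V2", "V3"]
fertilisers = ["F1", "F2", "F3", "F4"]

# Each treatment is one (variety, fertiliser) combination, i.e. one cell of the table.
treatments = list(product(varieties, fertilisers))

print(len(treatments))  # number of rows x number of columns
for variety, fertiliser in treatments:
    print(variety, fertiliser)
```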
6. Experimental error
This is the unexplained random part of the total variation and it is caused by a number of factors,
most important of which are the following;
The experimental error provides the basis for the confidence to be placed in conclusions or
inferences about the population, so it is important to estimate and control the experimental error.
7. Replication
Advantages of replicating
1. It makes the experiment more precise.
   Precision = $1/\sigma_{\bar{x}}^2$, the reciprocal of the variance of the mean. As n increases,
   the precision also increases.
2. Replication gives an estimate of the error.
3. It guards against accidents, and with many replications the investigator is in a position to
   identify outliers.
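The link between n and precision can be made concrete. For n independent observations with common variance σ², the variance of their mean is σ²/n, so the precision is n/σ². The sketch below assumes an illustrative variance of 4.0; the function name is hypothetical.

```python
def precision_of_mean(variance, n):
    """Precision of a mean of n independent observations: 1 / (variance / n)."""
    variance_of_mean = variance / n
    return 1.0 / variance_of_mean

sigma2 = 4.0  # assumed error variance, for illustration only
for n in (2, 4, 8):
    # Doubling the number of replicates doubles the precision.
    print(n, precision_of_mean(sigma2, n))
```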
8. Randomisation
Each individual experimental unit has a known probability of being subjected to each treatment
and the advantages include;
It should be noted that in the planning stages of an experiment, the experimenter must decide
whether the levels of the factors considered are to be set at fixed values or are to be chosen at
random from many possible levels. This depends on the objectives of the experiment, and the
question to be asked is:
Are the results to be judged for those fixed levels alone, or are they to be extended to more
levels, of which those in the experiment are a random sample?
In the case of quantitative factors such as time, weight, etc., it is usually desirable to pick
fixed levels, some at the extremes and some at intermediate points, because a random choice might
not cover the range in which the experimenter is interested, e.g. 0℃, 100℃, 200℃, 300℃, 400℃.
Other factors such as days of the week, halls of residence, locations, etc may often be a small
sample of all possible days, possible halls of residence, etc. In such cases the particular day or
particular hall may not be very important. What is important is whether or not days, halls, etc. in
general increase the variability of the experiment.
Once the decision has been made as to whether to consider the levels random or fixed, if random
levels are to be used then they must be chosen from all possible levels by a random process.
When all levels are fixed, the mathematical model of the process is called a fixed model. When
all levels are chosen at random, then we have a random model. When more than one factor is
involved, some factors may be at fixed levels and others at random levels and the model would
be a mixed model. Suppose we have a single factor experiment, the model is;
$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$
where
µ is the general mean about which the observations are supposed to fluctuate,
$\alpha_i$ is the effect of the ith treatment,
$\epsilon_{ij}$ is the experimental error (error term), with $\epsilon_{ij} \sim NID(0, \sigma_\epsilon^2)$,
i.e. normally and independently distributed with mean 0 and variance $\sigma_\epsilon^2$.
Fixed model
1. The $\alpha_i$'s are fixed constants and $\sum_{i=1}^{a} \alpha_i = \sum_{i=1}^{a} (\mu_i - \mu) = 0$,
   where a is the number of levels or treatments, e.g. for fixed temperature levels:
           0℃     100℃    200℃    300℃
           _      _       _       _
           _      _       _       _
           _      _       _       _
   Total   X1     X2      X3      X4
2. Analysis is the same.
3. Hypotheses
   Ho: α1 = α2 = ... = αa = 0
   HA: Some αi ≠ 0
   OR
   HA: Some µi ≠ µ
Random model
1. The $\alpha_i$'s $\sim NID(0, \sigma_\alpha^2)$. They are random variables and are normally and
   independently distributed with mean 0 and variance $\sigma_\alpha^2$, which represents the
   variance among the $\alpha_i$'s or among the treatment means. The $\alpha_i$'s will average to
   zero if averaged over all the possible levels, but for the a levels in the experiment they will
   not average to zero.
2. Analysis is the same.
3. Hypotheses
   Ho: $\sigma_\alpha^2 = 0$
   HA: $\sigma_\alpha^2 \neq 0$
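The difference between the two constraints can be illustrated numerically. The sketch below is a hedged illustration only: the four fixed treatment means are the brand means 11.0, 8.0, 9.0, 6.0 reported later in Example 1, while the value of sigma_alpha, the seed and the sample sizes are arbitrary assumptions.

```python
import random

random.seed(1)  # arbitrary seed so the run is repeatable

# Fixed model: effects are deviations of the treatment means from their average,
# so they sum to zero by construction.
treatment_means = [11.0, 8.0, 9.0, 6.0]
mu = sum(treatment_means) / len(treatment_means)
fixed_effects = [m - mu for m in treatment_means]
print(sum(fixed_effects))

# Random model: the a = 4 effects in the experiment are a sample from N(0, sigma_alpha^2).
sigma_alpha = 2.0  # assumed standard deviation of the treatment effects
sampled_effects = [random.gauss(0.0, sigma_alpha) for _ in range(4)]
print(sum(sampled_effects) / 4)  # generally not zero for only a = 4 levels

# Averaged over a very large number of possible levels, the mean approaches zero.
many_effects = [random.gauss(0.0, sigma_alpha) for _ in range(100000)]
mean_many = sum(many_effects) / len(many_effects)
print(mean_many)
```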
Introduction
Consider the problem of determining whether or not different types of tyres exhibit different
amounts of tread loss after 20,000km of driving.
A manager wishes to consider 4 tyres that are available and make some decision about which
type or brand might show the least amount of tread wear or loss.
The brands to be considered are A, B, C, and D and she wants to try these 4 brands under actual
driving conditions. The variable to be measured is the difference in maximum tread thickness on
the tyre between the time it is mounted on the wheel of a car and after it has completed
20,000km on a car. The measured variable $y_{ij}$ is this difference in thickness in units of
1/1000 cm.
The single factor of interest is brand/tyre type ($\alpha_i$, i = 1, 2, 3, 4).
Since the tyres must be tried on cars and some measure of error is necessary (replication),
more than one tyre of each brand must be used. A set of 4 of each brand seems quite practical,
since a car normally uses 4 tyres.
This means that we are going to have 16 tyres, 4 of each of the 4 brands. If we designate the
four cars as I, II, III, IV, one might put the brand A tyres on car I, brand B on car II and so
on, with the design as shown in the table below.
Table 1: Design 1
Car     I     II    III   IV
        A     B     C     D
        A     B     C     D
        A     B     C     D
        A     B     C     D
Design 1 has a problem: averages for brands are also averages for cars. If the cars travel over
different terrains with different drivers, any apparent brand differences are also car
differences. This design is therefore called completely confounded, since we cannot distinguish
between brands and cars in the analysis.
A second attempt at the design may be to try the completely randomised design (CRD).
COMPLETELY RANDOMISED DESIGN
This design applied to the above example will imply assigning the 16 tyres to the 4 cars in a
completely random manner and might give results as in table 2 below.
I II III IV
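A completely random allocation of this kind can be sketched in Python. The function name, the seed and the resulting layout are illustrative assumptions and do not reproduce the entries of Table 2.

```python
import random

def crd_allocation(brands, reps, cars, seed=None):
    """Randomly allocate reps tyres of each brand to the cars, an equal number per car."""
    rng = random.Random(seed)
    tyres = [brand for brand in brands for _ in range(reps)]
    rng.shuffle(tyres)  # every ordering of the tyres is equally likely
    per_car = len(tyres) // len(cars)
    return {car: tyres[k * per_car:(k + 1) * per_car] for k, car in enumerate(cars)}

# Seed chosen arbitrarily so that the example is repeatable.
layout = crd_allocation(["A", "B", "C", "D"], 4, ["I", "II", "III", "IV"], seed=2024)
for car, tyres in layout.items():
    print(car, tyres)
```

Unlike design 1, a car here may carry several brands, or the same brand more than once, purely by chance.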
Disadvantages
1. Effects of differences among subjects are controlled by random assignment of subjects to
treatment levels. For this to be more effective, subjects should be relatively homogeneous
or a large number of subjects should be used (as many replications as possible).
2. When many treatment levels are included in the experiment, the required sample size
may be prohibitive especially from the point of view of costs.
3. This design does not offer the possibility of evaluating the interaction effect (you can find
out the interaction effect if you have more than one factor).
Analysis
To carry out ANOVA for a one-factor experiment, the total sum of squares (SST) is partitioned
into the sum of squares due to the treatments, which we denote by SSB, and the error sum of
squares (SSE).
SST = SSB + SSE
Randomisation and lay out of the CRD with equal number of observations/treatments
1 2 3 ......... a
The first column represents a random sample of size n for treatment 1, ..., and the ath column
represents a random sample of size n for treatment a.
$y_{io}$ and $\bar{y}_{io}$ are the total and mean respectively of the observations in the ith treatment.
$y_{oo}$ and $\bar{y}_{oo}$ are the grand total and grand mean of all the N = an observations:
$\bar{y}_{oo} = y_{oo} / N$
i.e Total variation = Variation between the treatments + Variation due to error within treatments
SST = SSB + SSE
SSE = SST – SSB
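We can compute
$$SST = \sum_i \sum_j (y_{ij} - \bar{y}_{oo})^2 = \sum_i \sum_j \left[ y_{ij}^2 + \bar{y}_{oo}^2 - 2 y_{ij} \bar{y}_{oo} \right] = \sum_i \sum_j y_{ij}^2 - \frac{y_{oo}^2}{N} \quad \text{(prove)}$$
$\frac{y_{oo}^2}{N}$ is called the correction factor.
$\sum_i \sum_j y_{ij}^2$ is the uncorrected total sum of squares.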
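$$SSB = n \sum_i (\bar{y}_{io} - \bar{y}_{oo})^2 = \frac{1}{n} \sum_i y_{io}^2 - \frac{y_{oo}^2}{N} \quad \text{(prove)}$$
where n is the number of observations per treatment,
$\frac{1}{n} \sum_i y_{io}^2$ is the uncorrected SSB and
$\frac{y_{oo}^2}{N}$ is the correction factor.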
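The two results marked (prove) follow by expanding the squares and substituting the totals for the means. One possible way to set out the steps is sketched below for the balanced case (n observations in each of the a treatments, N = an), using the identities $\bar{y}_{oo} = y_{oo}/N$, $\bar{y}_{io} = y_{io}/n$ and $\sum_i y_{io} = y_{oo}$.

```latex
\begin{align*}
SST &= \sum_i \sum_j (y_{ij} - \bar{y}_{oo})^2
     = \sum_i \sum_j y_{ij}^2 - 2\bar{y}_{oo} \sum_i \sum_j y_{ij} + N \bar{y}_{oo}^2 \\
    &= \sum_i \sum_j y_{ij}^2 - 2\,\frac{y_{oo}}{N}\, y_{oo} + N\,\frac{y_{oo}^2}{N^2}
     = \sum_i \sum_j y_{ij}^2 - \frac{y_{oo}^2}{N} \\[6pt]
SSB &= n \sum_i (\bar{y}_{io} - \bar{y}_{oo})^2
     = n \sum_i \bar{y}_{io}^2 - 2 n \bar{y}_{oo} \sum_i \bar{y}_{io} + a n \bar{y}_{oo}^2 \\
    &= \frac{1}{n} \sum_i y_{io}^2 - 2\,\frac{y_{oo}}{N}\, y_{oo} + N\,\frac{y_{oo}^2}{N^2}
     = \frac{1}{n} \sum_i y_{io}^2 - \frac{y_{oo}^2}{N}
\end{align*}
```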
$$SSE = \sum_i \sum_j y_{ij}^2 - \frac{1}{n} \sum_i y_{io}^2$$
OR
SSE = SST – SSB
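The agreement between the definitional forms and the computing (correction factor) forms can be checked numerically. The sketch below assumes a balanced layout; the function name and the small data set are hypothetical and serve only as an illustration.

```python
def sums_of_squares(groups):
    """Return (SST, SSB, SSE) for a balanced one-factor layout.

    groups is a list of a lists, each holding the n observations of one treatment.
    """
    a = len(groups)
    n = len(groups[0])
    N = a * n
    grand_total = sum(sum(g) for g in groups)      # y_oo
    correction_factor = grand_total ** 2 / N       # y_oo^2 / N

    sst = sum(y ** 2 for g in groups for y in g) - correction_factor
    ssb = sum(sum(g) ** 2 for g in groups) / n - correction_factor
    sse = sst - ssb
    return sst, ssb, sse

# Hypothetical data: 3 treatments with 3 observations each.
data = [[3, 5, 4], [6, 8, 7], [9, 11, 10]]
sst, ssb, sse = sums_of_squares(data)

# Definitional forms, for comparison with the shortcut formulas.
grand_mean = sum(sum(g) for g in data) / 9
sst_def = sum((y - grand_mean) ** 2 for g in data for y in g)
ssb_def = 3 * sum((sum(g) / 3 - grand_mean) ** 2 for g in data)

print(sst, ssb, sse)
print(sst_def, ssb_def)
```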
ANOVA TABLE
Source of variation    df      SS     MS                  F-ratio
Between treatments     a−1     SSB    MSB = SSB/(a−1)     Fc = MSB/MSE
Error                  N−a     SSE    MSE = SSE/(N−a)     -
Total                  N−1     SST    -                   -
Tabulated value
FT = F0.05, (a − 1), (N − a)
(a − 1) – degrees of freedom for the factor of interest.
(N − a) – degrees of freedom for the error term.
Rejection criteria
Fc ≥ FT: Reject Ho => the treatment effects are not all equal to zero (the treatment means are
not all equal).
Fc < FT: Accept (do not reject) Ho => the treatment effects are taken to be zero (the treatment
means are taken to be equal).
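The table and the rejection rule translate directly into a few lines of Python. This is a minimal sketch, not a library routine: the function name is hypothetical, the sums of squares are illustrative values, and the tabulated F is passed in by the user (5.14 is the usual table value of F0.05, 2, 6).

```python
def anova_table(ssb, sse, a, N, f_tabulated):
    """Build the one-way ANOVA quantities and apply the rule 'reject Ho if Fc >= FT'."""
    df_between = a - 1
    df_error = N - a
    msb = ssb / df_between
    mse = sse / df_error
    fc = msb / mse
    return {
        "df": (df_between, df_error, N - 1),
        "SS": (ssb, sse, ssb + sse),
        "MSB": msb,
        "MSE": mse,
        "Fc": fc,
        "reject_Ho": fc >= f_tabulated,
    }

# Illustrative values only: a = 3 treatments, N = 9 observations.
result = anova_table(ssb=54.0, sse=6.0, a=3, N=9, f_tabulated=5.14)
for name, value in result.items():
    print(name, value)
```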
Example 1
The data in the table below gives the number of hours of pain relief provided by 4 different types
of headache tablets administered to 24 people. The 24 experimental units were randomly divided
into 4 groups and each group was treated with a different brand/type. Do the different drug types
give significantly different hours of pain relief?
Brands            1       2       3       4
$\bar{y}_{io}$    11.0    8.0     9.0     6.0
Steps
1. Model
$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$
where $y_{ij}$ is the jth observation under the ith drug type,
µ is the general mean,
$\alpha_i$ is the ith treatment effect (ith drug type effect),
$\epsilon_{ij}$ is the error term.
2. Hypothesis
Ho: α1 = α2 = α3 = α4 = 0    OR    Ho: µ1 = µ2 = µ3 = µ4 = µ (say)
HA: Some αi ≠ 0              OR    HA: Some µi ≠ µ
3. Rejection region
Reject Ho if Fc ≥ FT ( F0.05, 3, 20 ) = 3.10
Accept Ho if Fc < F0.05, 3, 20 = 3.10
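4. Computations
Correction factor (C.F) = $y_{oo}^2 / N = 204^2 / 24 = 1734.0$
$SST = \sum_i \sum_j y_{ij}^2 - y_{oo}^2 / N$
    = [12.2² + 9.5² + ... + 7.7²] − C.F
    = 1912.7 – 1734.0
    = 178.7
$SSB = \frac{1}{n} \sum_i y_{io}^2 - y_{oo}^2 / N$, where n is the number of observations per treatment
    = (1/6) [66.0² + 48.0² + 54.0² + 36.0²] – C.F
    = 1812.0 – 1734.0
    = 78.0
SSE = SST – SSB = 178.7 – 78.0 = 100.7
Source of variation    df    SS       MS       F-ratio
Between treatments     3     78.0     26.0     5.164
Error                  20    100.7    5.035    -
Total                  23    178.7    -        -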
5. Conclusion
Fc = 5.164 > FT = 3.10; reject Ho, implying that the
drug types are significantly different in terms of the hours of pain relief they give.
Note: The sums of squares are NEVER negative.
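The arithmetic of Example 1 can be re-traced from the treatment totals (66.0, 48.0, 54.0, 36.0), the uncorrected total sum of squares (1912.7) and n = 6 quoted in the computations. The sketch below uses only those figures; the variable names are assumptions.

```python
totals = [66.0, 48.0, 54.0, 36.0]   # y_io for brands 1 to 4
n = 6                               # observations per brand
uncorrected_total_ss = 1912.7       # sum of all squared observations, as quoted

a = len(totals)
N = a * n
grand_total = sum(totals)                        # y_oo
cf = grand_total ** 2 / N                        # correction factor

sst = uncorrected_total_ss - cf
ssb = sum(t ** 2 for t in totals) / n - cf
sse = sst - ssb

msb = ssb / (a - 1)
mse = sse / (N - a)
fc = msb / mse

print(round(cf, 1), round(sst, 1), round(ssb, 1), round(sse, 1))
print(fc)

# Every sum of squares should be non-negative; a negative value signals an arithmetic slip.
all_non_negative = min(sst, ssb, sse) >= 0
print(all_non_negative)
```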
Lay out of CRD with unequal number of observations/treatment (ni)
1 2 3 ......... a
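$N = \sum_{i=1}^{a} n_i$
$$SSB = \sum_{i=1}^{a} \frac{y_{io}^2}{n_i} - \frac{y_{oo}^2}{N} \quad \left[ = \sum_{i=1}^{a} \sum_{j=1}^{n_i} (\bar{y}_{io} - \bar{y}_{oo})^2 \right]$$
SST and SSE remain the same as before.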
Example 2
Four groups of students were subjected to different teaching techniques and tested at the end of a
specified period of time. The table below gives the performance in percentages. Are the teaching
techniques significantly different judging from the performance of the students?
                  Teaching techniques
                  1        2        3        4
                  65       75       59       94
                  87       69       78       89
                  73       83       67       80
                  79       81       62       88
                  81       72       83
                  69       79       76
                           90
$n_i$             6        7        6        4
$y_{io}$          454      549      425      351
$\bar{y}_{io}$    75.67    78.43    70.83    87.75
$N = \sum_{i=1}^{a} n_i = 23$
$y_{oo} = 1779$
$\bar{y}_{oo} = 1779 / 23 = 77.35$
1. Model
$y_{ij} = \mu + \alpha_i + \epsilon_{ij}$
where $y_{ij}$ is the jth observation under the ith teaching technique,
µ is the grand mean,
$\alpha_i$ is the ith treatment effect (ith teaching technique effect),
$\epsilon_{ij}$ is the error term.
2. Hypothesis
Ho: α1 = α2 = α3 = α4 = 0    OR    Ho: µ1 = µ2 = µ3 = µ4 = µ (say)
HA: Some αi ≠ 0              OR    HA: Some µi ≠ µ
3. Rejection region
Reject Ho if Fc ≥ FT ( F0.05, 3, 19 ) = 3.13
Accept Ho if Fc < F0.05, 3, 19 = 3.13
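4. Computations
Correction factor (C.F) = $y_{oo}^2 / N = 1779^2 / 23 = 137,601.8$
$SST = \sum_i \sum_j y_{ij}^2 - y_{oo}^2 / N$
    = [65² + 87² + ... + 88²] − C.F
    = 139,511 – 137,601.8
    = 1909.2
$SSB = \sum_i \frac{y_{io}^2}{n_i} - \frac{y_{oo}^2}{N}$, where $n_i$ is the number of observations in the ith treatment
    = [454²/6 + 549²/7 + 425²/6 + 351²/4] – C.F
    = 138,314.4 – 137,601.8
    = 712.6
SSE = SST – SSB = 1909.2 – 712.6 = 1196.6
Source of variation    df    SS        MS       F-ratio
Between treatments     3     712.6     237.5    3.77
Error                  19    1196.6    62.98    -
Total                  22    1909.2    -        -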
5. Conclusion
Fc = 3.77 > FT = 3.13; reject Ho, implying that the teaching techniques are significantly
different judging from the performance of the students.
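The whole of Example 2 can be recomputed from the scores in the data table. The sketch below is only a check on the arithmetic; the variable names are assumptions, the assignment of the short rows to techniques 1 to 3 (and of 90 to technique 2) is inferred from the quoted $n_i$ and totals, and the tabulated value 3.13 is the one used in the rejection region above.

```python
scores = {
    1: [65, 87, 73, 79, 81, 69],
    2: [75, 69, 83, 81, 72, 79, 90],
    3: [59, 78, 67, 62, 83, 76],
    4: [94, 89, 80, 88],
}

a = len(scores)
N = sum(len(g) for g in scores.values())
grand_total = sum(sum(g) for g in scores.values())       # y_oo
cf = grand_total ** 2 / N                                # correction factor

sst = sum(y ** 2 for g in scores.values() for y in g) - cf
# Unequal group sizes: each squared total is divided by its own n_i.
ssb = sum(sum(g) ** 2 / len(g) for g in scores.values()) - cf
sse = sst - ssb

msb = ssb / (a - 1)
mse = sse / (N - a)
fc = msb / mse
reject_ho = fc >= 3.13  # tabulated F0.05, 3, 19 as quoted in the example

print(N, grand_total)
print(round(sst, 1), round(ssb, 1), round(sse, 1))
print(round(msb, 2), round(mse, 2), round(fc, 2), reject_ho)
```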
Estimation of effects
Given the data table (e.g. Example 2) and the ANOVA table, we can make the following
inferences/conclusions concerning the population from which the data were obtained.
1) The grand mean $\bar{y}_{oo}$ is an unbiased estimator of $\mu_{oo}$, the population mean.
2) The treatment mean $\bar{y}_{io}$ is an unbiased estimator of the mean of the ith treatment
population, e.g. $\bar{y}_{1o} = 75.67$.
3) The difference between any 2 treatment effects is estimated unbiasedly by the difference
between the corresponding treatment means, i.e. $\alpha_1 - \alpha_2$ is estimated by
$\bar{y}_{1o} - \bar{y}_{2o}$.
4) The effect of a treatment is estimated unbiasedly by the difference between its treatment
mean and the grand mean, i.e. $\bar{y}_{io} - \bar{y}_{oo}$, and thus can be positive or negative.
5) The mean square error (MSE) is an unbiased estimator of the common variance σ²,
e.g. $\hat{\sigma}^2 = MSE \approx 63$ in Example 2.
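These estimates can be computed for Example 2 from the treatment totals and group sizes given in its data table. The sketch below is an illustration; the dictionary names are assumptions.

```python
totals = {1: 454, 2: 549, 3: 425, 4: 351}   # y_io for each teaching technique
sizes = {1: 6, 2: 7, 3: 6, 4: 4}            # n_i for each teaching technique

grand_mean = sum(totals.values()) / sum(sizes.values())     # estimate of the population mean
means = {i: totals[i] / sizes[i] for i in totals}           # estimates of the treatment means
effects = {i: means[i] - grand_mean for i in totals}        # estimates of the alpha_i

# Estimated difference between the effects of techniques 1 and 2.
diff_1_2 = means[1] - means[2]

print(round(grand_mean, 2))
for i in totals:
    print(i, round(means[i], 2), round(effects[i], 2))
print(round(diff_1_2, 2))
```

The estimated effects take both signs, and weighted by the group sizes $n_i$ they sum to zero.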
Assumptions of ANOVA
In applying the ANOVA techniques, certain assumptions should be kept in mind and these
include;
1. The process is repeatable (can be replicated)
2. The population distribution being sampled is normal.
3. The variances of all the a levels of a factor are homogeneous.
Exercise
1. Analyse the tyre type effect in the Table 2 design.
2. In a biological experiment, 4 concentrations of a certain chemical are used to enhance the
growth of a certain type of plant over a specified period of time.
The following growth data in cm were recorded for the plants that survived.
Concentration
1 2 3 4
6.1
Is there a significant difference in the average growth of these plants for the different
concentrations of the chemical? Use α = 0.01 and α = 0.05. Estimate the concentration
effect.