c measured values of x on the material subject to the treatment i  =⇒  $x_{ij}$,  $i = 1, \dots, r$,  $j = 1, \dots, c$
Are the mean values of the quantity x significantly different for
the various treatments?
Or are they essentially equal, so that different treatments do
not modify the property x in a significant way?
The model
Set of independent random variables with two indices
$x_{ij}$,  $i = 1, \dots, r$,  $j = 1, \dots, c$
All the groups are characterized by the same variance $\sigma^2$,
but they could have different means $\mu_i$
Hypothesis to be tested
$H_0 : \mu_i = \mu \quad \forall\, i = 1, \dots, r$ (equal means)
For a set of data arranged in r groups of c data each, the
1-factor analysis of variance consists in the interpretation (or
“explanation”) of data spread by distinguishing:
• a contribution due to the choice of the group (row)
• an intrinsic contribution of the single groups
datum ⇐⇒ kind of chemical reagent + temperature
group ⇐⇒ kind of chemical reagent + temperature
datum ⇐⇒ kind of chemical reagent + temperature + time
And so forth...
Only 1-factor and 2-factor ANOVA are commonly applied
• 1-factor ANOVA
when the factor is related to the first index
$x_{ij} = \mu + \alpha_i + \varepsilon_{ij}$,  $\sum_i \alpha_i = 0$,  $\varepsilon_{ij}$ i.i.d. $N(0, \sigma)$
and so forth...
Mean of data
the mean of all the data
\[ \bar{x} = \frac{1}{rc} \sum_{i=1}^{r} \sum_{j=1}^{c} x_{ij} = \frac{1}{rc} \sum_{i,j} x_{ij} \]
Mean of a group
the mean of the data in a row, i.e. within a group
\[ \bar{x}_{i\cdot} = \frac{1}{c} \sum_{j=1}^{c} x_{ij} \]
difference between the value of a variable and the mean of data
\[ x_{ij} - \bar{x} \]
“Unexplained” deviation
the remaining component of the deviation, which describes the
spread of the data within the group (i.e., within the row)
\[ x_{ij} - \bar{x} - (\bar{x}_{i\cdot} - \bar{x}) = x_{ij} - \bar{x}_{i\cdot} \]
Total variation
the sum of (deviation) squares
\[ SS_t = \sum_{i,j} (x_{ij} - \bar{x})^2 \]
Explained variation
\[ SS_1 = \sum_{i,j} (\bar{x}_{i\cdot} - \bar{x})^2 = c \sum_{i=1}^{r} (\bar{x}_{i\cdot} - \bar{x})^2 \]
It expresses the data variation due to the fact that the groups
(rows) have unequal means
Also known as variation “between” the groups
Unexplained variation
the sum of squares of unexplained deviations
\[ SS_u = \sum_{i,j} (x_{ij} - \bar{x}_{i\cdot})^2 \]
Stefano Siboni 8
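\[ x_{ij} = \mu + \alpha_i + \varepsilon_{ij} \qquad i = 1, \dots, r \quad j = 1, \dots, c \]
with:
\[ \sum_{i} \alpha_i = 0 \]
$\varepsilon_{ij}$ i.i.d. random variables $N(0, \sigma)$ ∀ i, j
List of symbols:
• $(v_{ij}) = (v_{11}, v_{12}, \dots, v_{rc}) \in \mathbb{R}^{rc}$ (arbitrary vector of $\mathbb{R}^{rc}$)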
“Complement projector to $P_1$”
$I - P_1 : (v_{ij}) \in \mathbb{R}^{rc} \longrightarrow (I - P_1)(v_{ij}) \in V_2$
$V_2$: subspace of $(v_{ij}) \in \mathbb{R}^{rc}$ whose “columns” have zero mean
Vanishing product
$P_1 (I - P_1) = 0$
=⇒ stochastically independent RVs by Craig’s theorem
“Complement projector to $P_2$”
$I - P_2 : (v_{ij}) \in \mathbb{R}^{rc} \longrightarrow (I - P_2)(v_{ij}) \in V_4$
$V_4$: subspace of $(v_{ij}) \in \mathbb{R}^{rc}$ whose “rows” have zero mean
Vanishing product
$P_2 (I - P_2) = 0$
=⇒ stochastically independent RVs by Craig’s theorem
$P_1 P_2 = P_2 P_1$
$I - P_1 P_2$
$P_2 (I - P_1)$
$I - P_2$
$P_2 (I - P_1)(I - P_2) = 0$
• if $\alpha_i = 0 \ \forall\, i$, a balance on the number of d.o.f. is satisfied
$rc - 1 = (r - 1) + r(c - 1)$
$H_0 : \mu_i = \mu \quad \forall\, i = 1, \dots, r$
$H_0 : \alpha_i = 0 \quad \forall\, i = 1, \dots, r$ ,
· the ratio F is a Fisher RV with (r − 1, rc − r) d.o.f.
· the explained variance $SS_1/(r-1)$ is likely close to zero
· the unexplained variance $SS_u/[r(c-1)]$ can be
assumed of the order of $\sigma^2$
$\{F \ge F_{\mathrm{critical}} = F_{[1-\alpha](r-1,\,r(c-1))}\}$
The Excel and Maple tools do not require the number of data
to be the same for all groups; the only exception is the matrix
implementation of Maple’s command OneWayANOVA
In MATLAB the data matrix may contain some NaN entries,
thereby allowing unbalanced one-way ANOVA as well
p is the p-value
(probability of exceeding the calculated F statistic)
group is a cell array of strings with one string for each entry of x,
in the appropriate order, the strings labelling the groups
Elements of x corresponding to the same label belong to
the same group. E.g., if:
x = [ 2 4 1 7 5 6 ]
group = { 'A' 'B' 'B' 'A' 'B' 'A' }
the groups are A{2,7,6} and B{4,1,5}
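The label-based grouping described above can be mimicked outside MATLAB. The following is a minimal Python sketch, not MATLAB's own implementation; the helper name split_by_label is a hypothetical choice made for this illustration.

```python
from collections import defaultdict


def split_by_label(x, group):
    """Collect the entries of x that share the same label in group.

    x and group must have the same length; the order of the data
    inside each group follows their order in x.
    """
    if len(x) != len(group):
        raise ValueError("x and group must have the same length")
    groups = defaultdict(list)
    for value, label in zip(x, group):
        groups[label].append(value)
    return dict(groups)


# Data and labels of the example in the text
x = [2, 4, 1, 7, 5, 6]
group = ["A", "B", "B", "A", "B", "A"]

result = split_by_label(x, group)
print(result)  # {'A': [2, 7, 6], 'B': [4, 1, 5]}
```

Groups obtained in this way may contain different numbers of data, which is the unbalanced case mentioned above.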
On each box:
− the central mark is the median
− the (lower and upper) edges of the box are the 1st and
the 3rd quartiles, q1 and q3 respectively
− the whiskers extend to the most extreme data points
within the chosen maximum whisker length
− points outside the maximum whisker length are considered
outliers, and plotted individually (red cross)
− the notch denotes the 95% CI of the median
The endpoints of the 95% CI for the median are the extremes
of the notch, estimated in the following way:
\[ q_2 - 1.57\,\frac{q_3 - q_1}{\sqrt{n}} \qquad\qquad q_2 + 1.57\,\frac{q_3 - q_1}{\sqrt{n}} \]
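The notch endpoints can be evaluated directly from the quartiles. The sketch below is a plain Python illustration of the formula above; the function name notch_limits is hypothetical, n is assumed to be the number of data in the group, and the quartile values in the example are invented for illustration only.

```python
import math


def notch_limits(q1, q2, q3, n):
    """Return (lower, upper) = q2 -/+ 1.57 (q3 - q1) / sqrt(n)."""
    if n <= 0:
        raise ValueError("n must be positive")
    half_width = 1.57 * (q3 - q1) / math.sqrt(n)
    return q2 - half_width, q2 + half_width


# Hypothetical quartiles of a group of n = 4 data (illustration only)
lower, upper = notch_limits(q1=1.0, q2=2.0, q3=3.0, n=4)
print(lower, upper)
```

How the quartiles themselves are estimated depends on the software convention, so the endpoints may differ slightly between tools.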
group i    mean $\bar{x}_{i\cdot}$    variation $\sum_j (x_{ij} - \bar{x}_{i\cdot})^2$
   1            51.9                  2.08
   2            52.4                  3.48
   3            52.3                  1.00
and the sum of variations over all the groups provides the
unexplained variation, or variation within groups:
\[ SS_u = \sum_{i=1}^{r} \sum_{j=1}^{c} (x_{ij} - \bar{x}_{i\cdot})^2 = 2.08 + 3.48 + 1.00 = 6.56 \]
group i    $\bar{x}_{i\cdot} - \bar{x}$
   1          −0.3
   2           0.2
   3           0.1
\[ \frac{SS_1}{r-1} = \frac{0.70}{2} = 0.35 \]
whereas the unexplained variance is
\[ \frac{SS_u}{r(c-1)} = \frac{6.56}{3 \cdot 4} = 0.546667 \]
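The ratio of the two variances above provides the value of the
test variable:
\[ F = \frac{SS_1/(r-1)}{SS_u/[r(c-1)]} = \frac{0.35}{0.546667} = 0.640244 \]
The rejection region is
\[ \{F \ge F_{[1-\alpha](r-1,\,r(c-1))}\} = \{F \ge F_{[0.95](2,12)}\} = \{F \ge 3.8852938\} \]
and since the calculated value of F does not fall within such
a rejection region, we conclude that the null hypothesis:
\[ H_0 : \mu_i = \mu \quad \forall\, i = 1, 2, 3 \]
must be accepted.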
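As a cross-check of the hand calculation, the variations and the F ratio can be recomputed in a few lines of plain Python. This is only a sketch: the function name one_way_anova and the dictionary keys are illustrative choices, the data are the three groups used in the Maple and MATLAB calculations of this example, and the critical value is copied from the text rather than computed.

```python
def one_way_anova(groups):
    """Balanced 1-factor ANOVA: r groups (rows) of c data each."""
    r = len(groups)
    c = len(groups[0])
    if any(len(g) != c for g in groups):
        raise ValueError("this sketch assumes the same number of data per group")
    grand_mean = sum(sum(g) for g in groups) / (r * c)
    group_means = [sum(g) / c for g in groups]
    # Explained variation ("between groups")
    ss1 = c * sum((m - grand_mean) ** 2 for m in group_means)
    # Unexplained variation ("within groups")
    ssu = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    # Total variation
    sst = sum((x - grand_mean) ** 2 for g in groups for x in g)
    f_ratio = (ss1 / (r - 1)) / (ssu / (r * (c - 1)))
    return {"SS1": ss1, "SSu": ssu, "SSt": sst, "F": f_ratio}


groups = [
    [51.7, 53.0, 52.0, 51.8, 51.0],  # data of group 1
    [52.1, 52.3, 52.9, 53.6, 51.1],  # data of group 2
    [52.8, 51.8, 52.3, 52.8, 51.8],  # data of group 3
]
F_CRITICAL = 3.8852938  # F[0.95](2,12), value quoted in the text

result = one_way_anova(groups)
reject_h0 = result["F"] >= F_CRITICAL
print(result, "reject H0:", reject_h0)
```

The identity SSt = SS1 + SSu holds up to rounding, which is a convenient sanity check on the arithmetic.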
Calculation by Excel
The application of the Excel tool to the previous data, at a
significance level α = 5% results in the following ANOVA table:
· the r = 3 groups are denoted by “Row1, Row2, Row3”
· “Count” is the number of data per group (c = 5)
· “Sum” stands for the sum of data in a row
· “Average” denotes the sample mean of data in a row
· “Variance” is the sample variance in a row
· “Source of variation” refers to the kind of variation:
− SS1 (Between Groups)
− SSu (Within Groups)
− SSt (Total)
· “SS”, “df” and “MS” denote the corresponding values of
variation, d.o.f. and variance, respectively
· F shows the calculated value of the F-test statistic
· F crit is the lower limit $F_{[0.95](2,12)}$ of the critical region
Calculation by Maple
One-way ANOVA is also implemented by the Maple Statistics package.
The data groups are defined as column vectors,
by the commands:
G1 := Vector([51.7, 53.0, 52.0, 51.8, 51.0]);
G2 := Vector([52.1, 52.3, 52.9, 53.6, 51.1]);
G3 := Vector([52.8, 51.8, 52.3, 52.8, 51.8]);
and the command OneWayANOVA is then applied to the list
of column vectors to obtain the desired result:
OneWayANOVA([G1, G2, G3]);
The output always appears as a 3 × 5 matrix:
 2    .700000000    .350000000    .640243902    .5442561085
12    6.56000000    .546666667    NULL          NULL
14    7.26000000    NULL          NULL          NULL
Calculation by MATLAB
In MATLAB the 1-factor ANOVA test can be easily performed
by defining a cell array of the group names:
x = [ ...
51.7 53.0 52.0 51.8 51.0; ... % data of group 1
52.1 52.3 52.9 53.6 51.1; ... % data of group 2
52.8 51.8 52.3 52.8 51.8; ... % data of group 3
];
[p,table,stats] = anova1(x',groups)
The approach is the same as the one that defines the unpaired
t-test for the comparison of the means of two independent
normal populations of equal variance
(by using groups i and h, with the same number c of data)
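Scheffé theorem
For any r-tuple of real coefficients $(C_1, C_2, \dots, C_r)$ with zero sum
\[ \sum_{i=1}^{r} C_i = 0 \]
and for any given $1 - \alpha \in (0, 1)$, the inequality
\[ \Big[ \sum_{i=1}^{r} C_i (\bar{x}_{i\cdot} - \mu_i) \Big]^2 \le F_{[1-\alpha](r-1,\,rc-r)} \, \frac{SS_u}{r(c-1)} \, \frac{r-1}{c} \sum_{i=1}^{r} C_i^2 \]
is verified with probability 1 − α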
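Taking the square root of the right-hand side of the inequality gives the half-width of a simultaneous CI for any zero-sum combination of the means. The Python sketch below evaluates it; the function name scheffe_half_width is hypothetical, and the numerical inputs (SSu = 6.56, r = 3, c = 5, F[0.95](2,12) = 3.8852938, group means 51.9 and 52.4) are borrowed from the first worked example purely as an illustration. The F quantile is passed in as a number, not computed.

```python
import math


def scheffe_half_width(coeffs, ss_u, r, c, f_quantile):
    """Half-width of the Scheffe interval for sum_i C_i mu_i.

    coeffs     : the r coefficients C_i, which must sum to zero
    ss_u       : unexplained variation SSu
    f_quantile : F[1-alpha](r-1, rc-r), supplied by the user
    """
    if len(coeffs) != r:
        raise ValueError("one coefficient per group is required")
    if abs(sum(coeffs)) > 1e-12:
        raise ValueError("the coefficients must sum to zero")
    unexplained_variance = ss_u / (r * (c - 1))
    sum_of_squares = sum(ci ** 2 for ci in coeffs)
    return math.sqrt(
        f_quantile * unexplained_variance * (r - 1) / c * sum_of_squares
    )


# Difference mu_1 - mu_2, i.e. coefficients (1, -1, 0)
half_width = scheffe_half_width(
    [1, -1, 0], ss_u=6.56, r=3, c=5, f_quantile=3.8852938
)
estimate = 51.9 - 52.4  # difference of the two group means
interval = (estimate - half_width, estimate + half_width)
print(interval)
```

With the choice (1, −1, 0, ...) of coefficients the inequality reduces to a CI for the difference of two means, which is the case used for pairwise comparisons.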
The output c is a matrix with one row for each pair of groups,
consisting of:
◦ the number of one group (i)
◦ the number of another group (j > i)
◦ the lower limit of the (1 − α)-CI for µi − µj
◦ the sample estimate of the difference of means $\mu_i - \mu_j$
◦ the upper limit of the (1 − α)-CI for µi − µj
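the hypothesized difference 0 belongs to the (1 − α)-CI of $\mu_i - \mu_h$
⇐⇒  $0 \in [\bar{x}_{i\cdot} - \bar{x}_{h\cdot} - \Delta x,\ \bar{x}_{i\cdot} - \bar{x}_{h\cdot} + \Delta x]$
⇐⇒  $|\bar{x}_{i\cdot} - \bar{x}_{h\cdot}| \le \Delta x$
⇐⇒  the comparison intervals of $\mu_i$ and $\mu_h$ overlap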
where we find:
− the explained variation (“between groups”)
\[ SS_1 = \sum_{i=1}^{r} \sum_{j=1}^{c} (\bar{x}_{i\cdot} - \bar{x})^2 = 135.762475 \]
\[ \{F \ge F_{[1-\alpha](r-1,\,r(c-1))}\} = \{F \ge F_{[0.95](3,12)}\} \]
that is
\[ \{F \ge 3.4902948\} \]
must be rejected.
Equivalently, the same conclusion stems from the very low
P-value:
P-value = 0.0044065 < 0.05 = α
To get an idea of how different the performances of the
various techniques are, we can calculate the CI for all the mean
differences, for instance at a confidence level 1 − α = 66%
Let µ1 , µ2 , µ3 , µ4 be the means related to the techniques A,
B, C, D (i.e., the Mg concentrations theoretically detected by
the various methods on the same sample)
at a 66%-confidence level.
$x_{ij}$,  $i = 1, \dots, r$,  $j = 1, \dots, c$
Do the chemical reagent or the treatment temperature have a
significant effect on the property x of the material?
For any $i = 1, \dots, r$ and $j = 1, \dots, c$ the random variable $x_{ij}$
is normal with mean $\mu_{ij}$ and variance $\sigma^2$
The mean of the variable $x_{ij}$ may depend on both the indices
i and j (on both the chemical reagent and the temperature),
while the variance $\sigma^2$ is the same
Hypothesis to be tested
$x_{ij} - \bar{x}$
$\bar{x}_{i\cdot} - \bar{x}$
$\bar{x}_{\cdot j} - \bar{x}$
“Unexplained” deviation
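Total variation
\[ SS_t = \sum_{i,j} (x_{ij} - \bar{x})^2 \]
\[ SS_1 = \sum_{i,j} (\bar{x}_{i\cdot} - \bar{x})^2 = c \sum_{i=1}^{r} (\bar{x}_{i\cdot} - \bar{x})^2 \]
\[ SS_2 = \sum_{i,j} (\bar{x}_{\cdot j} - \bar{x})^2 = r \sum_{j=1}^{c} (\bar{x}_{\cdot j} - \bar{x})^2 \]
Unexplained variation
\[ SS_u = \sum_{i,j} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2 \]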
\[ x_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij} \qquad i = 1, \dots, r \quad j = 1, \dots, c \]
with:
\[ \sum_{i} \alpha_i = 0 \qquad \sum_{j} \beta_j = 0 \]
$\varepsilon_{ij}$ i.i.d. random variables $N(0, \sigma)$ ∀ i, j
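\[ \frac{1}{\sigma^2} \sum_{i,j} (x_{ij} - \bar{x} - \alpha_i - \beta_j)^2 = \Big\langle \frac{1}{\sigma}(\varepsilon_{ij}) \,\Big|\, (I - P_1 P_2)\, \frac{1}{\sigma}(\varepsilon_{ij}) \Big\rangle = X^2_{rc-1} \]
\[ \frac{1}{\sigma^2} \sum_{i,j} (\bar{x}_{i\cdot} - \bar{x} - \alpha_i)^2 = \Big\langle \frac{1}{\sigma}(\varepsilon_{ij}) \,\Big|\, P_2 (I - P_1)\, \frac{1}{\sigma}(\varepsilon_{ij}) \Big\rangle = X^2_{r-1} \]
\[ \frac{1}{\sigma^2} \sum_{i,j} (\bar{x}_{\cdot j} - \bar{x} - \beta_j)^2 = \Big\langle \frac{1}{\sigma}(\varepsilon_{ij}) \,\Big|\, (I - P_2) P_1\, \frac{1}{\sigma}(\varepsilon_{ij}) \Big\rangle = X^2_{c-1} \]
\[ \frac{1}{\sigma^2} \sum_{i,j} (x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2 = \Big\langle \frac{1}{\sigma}(\varepsilon_{ij}) \,\Big|\, (I - P_2)(I - P_1)\, \frac{1}{\sigma}(\varepsilon_{ij}) \Big\rangle = X^2_{(r-1)(c-1)} \]
$X^2_{r-1}$,  $X^2_{c-1}$,  $X^2_{(r-1)(c-1)}$
stochastically independent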
For any set {xij } the total variation coincides with the sum of
− the variation explained by the factor related to i
− the variation explained by the factor associated to j
− the unexplained variation
SSt = SS1 + SS2 + SSu
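The decomposition can be checked numerically on any r × c table. The Python sketch below does so for the 3 × 5 data matrix used in the MATLAB calculation of this section; the function name two_factor_variations is a hypothetical choice, and the code is only an illustration of the identity, not a replacement for the Excel or MATLAB tools.

```python
def two_factor_variations(x):
    """Return (SSt, SS1, SS2, SSu) for an r x c table without replication."""
    r, c = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (r * c)
    row_mean = [sum(row) / c for row in x]
    col_mean = [sum(x[i][j] for i in range(r)) / r for j in range(c)]
    # Total variation
    ss_t = sum((x[i][j] - grand) ** 2 for i in range(r) for j in range(c))
    # Variation explained by the factor related to i (rows)
    ss_1 = c * sum((m - grand) ** 2 for m in row_mean)
    # Variation explained by the factor related to j (columns)
    ss_2 = r * sum((m - grand) ** 2 for m in col_mean)
    # Unexplained variation
    ss_u = sum(
        (x[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )
    return ss_t, ss_1, ss_2, ss_u


x = [
    [51.7, 53.0, 52.0, 51.8, 51.0],
    [52.1, 52.3, 52.9, 53.6, 51.1],
    [52.8, 51.8, 52.3, 52.8, 51.8],
]

ss_t, ss_1, ss_2, ss_u = two_factor_variations(x)
print(ss_t, ss_1 + ss_2 + ss_u)  # the two numbers agree up to rounding
```

Note that SS1 coincides with the explained variation of the 1-factor analysis of the same rows, while part of the former within-group variation is now attributed to the columns.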
Moreover, provided that the ANOVA model is correct,
we have that:
(i) if $\alpha_i = 0 \ \forall\, i$ and $\beta_j = 0 \ \forall\, j$:
$SS_t/\sigma^2$ is a $X^2$ variable with rc − 1 d.o.f.
(ii) if $\alpha_i = 0 \ \forall\, i$:
$SS_1/\sigma^2$ is a $X^2$ variable with r − 1 d.o.f.
(iii) if $\beta_j = 0 \ \forall\, j$:
$SS_2/\sigma^2$ is a $X^2$ variable with c − 1 d.o.f.
− in case (ii) the RVs $SS_1/\sigma^2$ and $SS_u/\sigma^2$
are stochastically independent
rc − 1 = (r − 1) + (c − 1) + (r − 1)(c − 1)
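Test variable:
\[ F = \frac{SS_1}{r-1} \, \frac{(r-1)(c-1)}{SS_u} = (c-1)\, \frac{SS_1}{SS_u} \]
which, for $H_0$ true, is a Fisher RV with (r − 1, (r − 1)(c − 1)) d.o.f.
If $H_0$ is true, there typically holds
\[ \frac{SS_1}{r-1} \lesssim \frac{SS_u}{(r-1)(c-1)} \simeq \sigma^2 \iff F \text{ small} \]
$F < F_{[1-\alpha](r-1,\,(r-1)(c-1))}$  (acceptance of $H_0$)
$F \ge F_{[1-\alpha](r-1,\,(r-1)(c-1))}$  (rejection of $H_0$)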
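Test variable:
\[ F = \frac{SS_2}{c-1} \, \frac{(r-1)(c-1)}{SS_u} = (r-1)\, \frac{SS_2}{SS_u} \]
which, for $H_0$ true, is a Fisher RV with (c − 1, (r − 1)(c − 1)) d.o.f.
If $H_0$ is true, there typically holds
\[ \frac{SS_2}{c-1} \lesssim \frac{SS_u}{(r-1)(c-1)} \simeq \sigma^2 \iff F \text{ small} \]
$F < F_{[1-\alpha](c-1,\,(r-1)(c-1))}$  (acceptance of $H_0$)
$F \ge F_{[1-\alpha](c-1,\,(r-1)(c-1))}$  (rejection of $H_0$)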
[p,table,stats] = anova2(x,displayopt)
$H_0 : \alpha_i = 0 \quad \forall\, i = 1, 2, 3$
$\{F \ge F_{[1-\alpha](r-1,\,(r-1)(c-1))}\}$
$\{F \ge F_{[0.95](2,8)} = 4.4589701\}$
$\{F \ge F_{[1-\alpha](c-1,\,(r-1)(c-1))}\}$
$\{F \ge F_{[0.95](4,8)} = 3.8378534\}$
Calculation by Excel
The previous calculations can be easily performed by means
of the “Anova: Two-Factor Without Replication” Data
Analysis tool. For α = 5%, we obtain the ANOVA table below:
· the r = 3 levels of the 1st factor are denoted as “Rows”
· the c = 5 levels of the 2nd factor are named “Columns”
· “Count” stands for the number of data in a Row/Column
· “Average” is the sample mean within a Row/Column
· “Variance” is the sample variance within a Row/Column
· “Source of variation” specifies the kind of variation:
· SS1 (Rows) - explained by the 1st factor
· SS2 (Columns) - explained by the 2nd factor
· SSu (Error) - unexplained
· SSt (Total) - total
· “SS”, “df” and “MS” denote the corresponding values of
variation, d.o.f. and variance, respectively
Calculation by MATLAB
In MATLAB the 2-factor ANOVA test without interactions can
be performed by defining the matrix of data:
x = [ ...
51.7 53.0 52.0 51.8 51.0 ; ...
52.1 52.3 52.9 53.6 51.1 ; ...
52.8 51.8 52.3 52.8 51.8 ];
[p,table,stats] = anova2(x)
[p,table,stats] = anova2(x,1,'on') % reps=1
cleaner C1 C2 C3
D1 53 50 59
D2 54 54 60
D3 56 58 62
D4 50 45 57
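\[ X^2_{rc-1} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - \bar{x} - \alpha_i - \beta_j)^2}{\sigma^2} \]
\[ X^2_{r-1} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\bar{x}_{i\cdot} - \bar{x} - \alpha_i)^2}{\sigma^2} \]
\[ X^2_{c-1} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(\bar{x}_{\cdot j} - \bar{x} - \beta_j)^2}{\sigma^2} \]
\[ X^2_{(r-1)(c-1)} = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j} + \bar{x})^2}{\sigma^2} \]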
where we find:
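· the variance explained by the 1st factor (Rows/detergent):
\[ \frac{SS_1}{r-1} = \frac{1}{r-1} \sum_{i=1}^{r} \sum_{j=1}^{c} (\bar{x}_{i\cdot} - \bar{x})^2 = 34.111111 \]
· the unexplained variance:
\[ \frac{SS_u}{(r-1)(c-1)} = \frac{1}{(r-1)(c-1)} \sum_{i=1}^{r} \sum_{j=1}^{c} (x_{ij} - \bar{x}_{\cdot j} - \bar{x}_{i\cdot} + \bar{x})^2 = 3.6944444 \]
\[ \{F \ge F_{[1-\alpha](c-1,\,(r-1)(c-1))}\} = \{F \ge F_{[0.95](2,6)}\} \]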
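The variances quoted above can be reproduced from the 4 × 3 table of the example (rows D1 to D4, columns C1 to C3). The following Python sketch is an illustration only; the function name two_factor_f_ratios and the dictionary keys are hypothetical, and the comparison with the critical values is left to the reader since the F quantiles are not computed here.

```python
def two_factor_f_ratios(x):
    """Variances and F ratios of a 2-factor ANOVA without interaction."""
    r, c = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (r * c)
    row_mean = [sum(row) / c for row in x]
    col_mean = [sum(x[i][j] for i in range(r)) / r for j in range(c)]
    ss_1 = c * sum((m - grand) ** 2 for m in row_mean)
    ss_2 = r * sum((m - grand) ** 2 for m in col_mean)
    ss_u = sum(
        (x[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )
    ms_1 = ss_1 / (r - 1)               # variance explained by the rows
    ms_2 = ss_2 / (c - 1)               # variance explained by the columns
    ms_u = ss_u / ((r - 1) * (c - 1))   # unexplained variance
    return {
        "MS1": ms_1,
        "MS2": ms_2,
        "MSu": ms_u,
        "F_rows": ms_1 / ms_u,
        "F_columns": ms_2 / ms_u,
    }


# Table of the example: one row per D1..D4, one column per C1..C3
x = [
    [53, 50, 59],
    [54, 54, 60],
    [56, 58, 62],
    [50, 45, 57],
]

result = two_factor_f_ratios(x)
print(result)
```

Each F ratio is then compared with the corresponding critical value, with (r − 1, (r − 1)(c − 1)) d.o.f. for the rows and (c − 1, (r − 1)(c − 1)) d.o.f. for the columns.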
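\[ t_i = \sqrt{rc(c-1)}\; \frac{\bar{x}_{i\cdot} - \bar{x} - \alpha_i}{\sqrt{SS_u}} \]
at a confidence level 1 − α;
− ∀ j = 1, . . . , c the RV
\[ t_j = \sqrt{rc(r-1)}\; \frac{\bar{x}_{\cdot j} - \bar{x} - \beta_j}{\sqrt{SS_u}} \]
follows a Student’s distribution with (r − 1)(c − 1) d.o.f.,
and a CI for the coefficient $\beta_j$ thus reads:
\[ \bar{x}_{\cdot j} - \bar{x} \pm t_{[1-\frac{\alpha}{2}]((r-1)(c-1))} \sqrt{\frac{SS_u}{rc(r-1)}} \]
with the same confidence level 1 − α as before.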
$x_{ijk}$,  $i = 1, \dots, r$,  $j = 1, \dots, c$,  $k = 1, \dots, s$
Do the chemical reagent or the treatment temperature
have a significant effect on the property x of the material?
Is there any interaction between the effects of the reagent
and those of the temperature?
For any $i = 1, \dots, r$, $j = 1, \dots, c$ and $k = 1, \dots, s$ the random
variable $x_{ijk}$ is normal with mean $\mu_{ij}$ and variance $\sigma^2$
The mean of the variable $x_{ijk}$ may depend on the indices
i and j (i.e., on the chemical reagent and the temperature),
while the variance $\sigma^2$ is the same
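Mean of data
\[ \bar{x} = \frac{1}{rcs} \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{s} x_{ijk} = \frac{1}{rcs} \sum_{ijk} x_{ijk} \]
$x_{ijk} - \bar{x}$
$\bar{x}_{i\cdot\cdot} - \bar{x}$
$\bar{x}_{\cdot j\cdot} - \bar{x}$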
“Unexplained” deviation
Ph. D. in Materials Engineering Statistical methods
Total variation
\[ SS_t = \sum_{i,j,k} (x_{ijk} - \bar{x})^2 \]
Unexplained variation
\[ SS_u = \sum_{i,j,k} (x_{ijk} - \bar{x}_{ij\cdot})^2 \]
$\varepsilon_{ijk}$ i.i.d. random variables $N(0, \sigma)$ ∀ i, j, k
(ii) if $\alpha_i = 0 \ \forall\, i$:
$SS_1/\sigma^2$ is a $X^2$ variable with r − 1 d.o.f.
(iii) if $\beta_j = 0 \ \forall\, j$:
$SS_2/\sigma^2$ is a $X^2$ variable with c − 1 d.o.f.
(iv) if $\alpha\beta_{ij} = 0 \ \forall\, i, j$:
$SS_{12}/\sigma^2$ is a $X^2$ variable with (r − 1)(c − 1) d.o.f.
is a Fisher random variable with ((r − 1)(c − 1), rc(s − 1)) d.o.f.
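Test variable:
\[ F = \frac{SS_1}{r-1} \, \frac{rc(s-1)}{SS_u} \]
which, for $H_0$ true, is a Fisher RV with (r − 1, rc(s − 1)) d.o.f.
If $H_0$ is true, there typically holds
\[ \frac{SS_1}{r-1} \lesssim \frac{SS_u}{rc(s-1)} \simeq \sigma^2 \iff F \text{ small} \]
$F < F_{[1-\alpha](r-1,\,rc(s-1))}$  (acceptance of $H_0$)
$F \ge F_{[1-\alpha](r-1,\,rc(s-1))}$  (rejection of $H_0$)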
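Test variable:
\[ F = \frac{SS_2}{c-1} \, \frac{rc(s-1)}{SS_u} \]
which, for $H_0$ true, is a Fisher RV with (c − 1, rc(s − 1)) d.o.f.
If $H_0$ is true, there typically holds
\[ \frac{SS_2}{c-1} \lesssim \frac{SS_u}{rc(s-1)} \simeq \sigma^2 \iff F \text{ small} \]
$F < F_{[1-\alpha](c-1,\,rc(s-1))}$  (acceptance of $H_0$)
$F \ge F_{[1-\alpha](c-1,\,rc(s-1))}$  (rejection of $H_0$)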
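Test variable:
\[ F = \frac{SS_{12}}{(r-1)(c-1)} \, \frac{rc(s-1)}{SS_u} \]
a Fisher RV with ((r − 1)(c − 1), rc(s − 1)) d.o.f., for $H_0$ true
If $H_0$ is true, there typically holds
\[ \frac{SS_{12}}{(r-1)(c-1)} \lesssim \frac{SS_u}{rc(s-1)} \simeq \sigma^2 \iff F \text{ small} \]
whereas in the opposite case
\[ \frac{SS_{12}}{(r-1)(c-1)} \gtrsim \frac{SS_u}{rc(s-1)} \simeq \sigma^2 \iff F \text{ large} \]
$F < F_{[1-\alpha]((r-1)(c-1),\,rc(s-1))}$  (acceptance of $H_0$)
$F \ge F_{[1-\alpha]((r-1)(c-1),\,rc(s-1))}$  (rejection of $H_0$)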
[p,table,stats] = anova2(x,reps,displayopt)
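\[ \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{s} \frac{(x_{ijk} - \bar{x} - \alpha_i - \beta_j - \alpha\beta_{ij})^2}{\sigma^2} = X^2_{rcs-1} \]
\[ \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{s} \frac{(\bar{x}_{i\cdot\cdot} - \bar{x} - \alpha_i)^2}{\sigma^2} = X^2_{r-1} \]
\[ \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{s} \frac{(\bar{x}_{\cdot j\cdot} - \bar{x} - \beta_j)^2}{\sigma^2} = X^2_{c-1} \]
\[ \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{s} \frac{(\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x} - \alpha\beta_{ij})^2}{\sigma^2} = X^2_{(r-1)(c-1)} \]
\[ \sum_{i=1}^{r} \sum_{j=1}^{c} \sum_{k=1}^{s} \frac{(x_{ijk} - \bar{x}_{ij\cdot})^2}{\sigma^2} = X^2_{rc(s-1)} \]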
$H_0 : \alpha\beta_{ij} = 0 \quad \forall\, i = 1, 2, 3 ,\ \forall\, j = 1, 2, 3$
$H_0 : \alpha_i = 0 \quad \forall\, i = 1, 2, 3$
$H_0 : \beta_j = 0 \quad \forall\, j = 1, 2, 3$
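\[ \frac{SS_1}{r-1} = \frac{1}{r-1} \sum_{i,j,k} (\bar{x}_{i\cdot\cdot} - \bar{x})^2 = 1.755555 \]
\[ \frac{SS_2}{c-1} = \frac{1}{c-1} \sum_{i,j,k} (\bar{x}_{\cdot j\cdot} - \bar{x})^2 = 4145.0889 \]
\[ \frac{SS_{12}}{(r-1)(c-1)} = \frac{1}{(r-1)(c-1)} \sum_{i,j,k} (\bar{x}_{ij\cdot} - \bar{x}_{i\cdot\cdot} - \bar{x}_{\cdot j\cdot} + \bar{x})^2 = 330.98889 \]
\[ \frac{SS_u}{rc(s-1)} = \frac{1}{rc(s-1)} \sum_{i,j,k} (x_{ijk} - \bar{x}_{ij\cdot})^2 = 121.8222 \]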
$\{F \ge F_{[1-\alpha]((r-1)(c-1),\,rc(s-1))}\}$
$\{F \ge F_{[0.95](4,36)} = 2.63353\}$
$\{F \ge F_{[1-\alpha](c-1,\,rc(s-1))}\}$
$\{F \ge F_{[0.95](2,36)} = 3.25945\}$
Calculation by Excel
All the previous calculations can be easily performed by using
the Excel “Anova: Two-Factor With Replication” Data
Analysis tool, which provides the ANOVA table below:
Calculation by MATLAB
In MATLAB the 2-factor ANOVA test with interactions is
carried out by defining the matrix of data:
x = [ ...
196 214 258 ; ...
208 216 250 ; ...
247 235 264 ; ...
216 240 248 ; ...
221 252 272 ; ...
216 215 246 ; ...
228 217 247 ; ...
240 235 261 ; ...
224 219 250 ; ...
236 241 255 ; ...
230 212 255 ; ...
242 218 251 ; ...
232 216 261 ; ...
244 224 258 ; ...
228 222 247 ...
];
[p,table,stats] = anova2(x,5,'on')
[p,table,stats] = anova2(x,5)
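The four variances of this example can also be recomputed from the same 15 × 3 matrix without MATLAB. The Python sketch below assumes the layout used by the call above, namely reps = 5 consecutive rows for each level of the row factor; the function name two_factor_with_replication and the dictionary keys are hypothetical choices for this illustration.

```python
def two_factor_with_replication(data, reps):
    """Variances of a 2-factor ANOVA with interaction.

    data : list of r*reps rows with c entries each; every block of
           `reps` consecutive rows is one level of the row factor
    """
    s = reps
    c = len(data[0])
    r = len(data) // s
    if r * s != len(data):
        raise ValueError("the number of rows must be a multiple of reps")
    # cells[i][j] holds the s replicated data x_ijk, k = 1..s
    cells = [
        [[data[i * s + k][j] for k in range(s)] for j in range(c)]
        for i in range(r)
    ]
    grand = sum(sum(row) for row in data) / (r * c * s)
    cell_mean = [[sum(cells[i][j]) / s for j in range(c)] for i in range(r)]
    row_mean = [sum(cell_mean[i]) / c for i in range(r)]
    col_mean = [sum(cell_mean[i][j] for i in range(r)) / r for j in range(c)]
    ss_1 = c * s * sum((m - grand) ** 2 for m in row_mean)
    ss_2 = r * s * sum((m - grand) ** 2 for m in col_mean)
    ss_12 = s * sum(
        (cell_mean[i][j] - row_mean[i] - col_mean[j] + grand) ** 2
        for i in range(r)
        for j in range(c)
    )
    ss_u = sum(
        (v - cell_mean[i][j]) ** 2
        for i in range(r)
        for j in range(c)
        for v in cells[i][j]
    )
    return {
        "MS1": ss_1 / (r - 1),
        "MS2": ss_2 / (c - 1),
        "MS12": ss_12 / ((r - 1) * (c - 1)),
        "MSu": ss_u / (r * c * (s - 1)),
    }


data = [
    [196, 214, 258], [208, 216, 250], [247, 235, 264],
    [216, 240, 248], [221, 252, 272], [216, 215, 246],
    [228, 217, 247], [240, 235, 261], [224, 219, 250],
    [236, 241, 255], [230, 212, 255], [242, 218, 251],
    [232, 216, 261], [244, 224, 258], [228, 222, 247],
]

ms = two_factor_with_replication(data, reps=5)
f_interaction = ms["MS12"] / ms["MSu"]
print(ms, f_interaction)
```

Dividing MS1, MS2 and MS12 by MSu gives the three F statistics, to be compared with the critical values with rc(s − 1) = 36 d.o.f. in the denominator.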
[p,table,stats,terms] = anovan(y,group,'name',value,...)
Remember that:
Table of contents
A. F-test for the comparison of means (ANOVA)
A.1 1-factor ANOVA
A.1.1 Basic definitions of 1-factor ANOVA
A.1.2 Statistical model of 1-factor ANOVA
A.1.3 CI for the difference of two means
A.1.4 CIs for all the differences of two means
A.2 2-factor ANOVA without interaction
A.2.1 Basic definitions of 2-factor ANOVA without interaction
A.2.2 Statistical model of 2-factor ANOVA without interaction
A.3 2-factor ANOVA with interaction
A.3.1 Basic definitions of 2-factor ANOVA with interaction
A.3.2 Statistical model of 2-factor ANOVA with interaction
A.4 MATLAB implementation of the general n-factor ANOVA