
Multivariate Analysis of Variance (MANOVA)

Paralleling the univariate reparameterization, we specify the MANOVA model:

MANOVA MODEL FOR COMPARING g POPULATION MEAN VECTORS


$$X_{\ell j} = \mu + \tau_\ell + e_{\ell j}, \qquad j = 1, 2, \ldots, n_\ell \ \text{ and } \ \ell = 1, 2, \ldots, g \tag{6-34}$$

where the $e_{\ell j}$ are independent $N_p(0, \Sigma)$ variables. Here the parameter vector $\mu$ is an overall mean (level), and $\tau_\ell$ represents the $\ell$th treatment effect with $\sum_{\ell=1}^{g} n_\ell \tau_\ell = 0$.

According to the model (6-34), each component of the observation vector $X_{\ell j}$ satisfies the univariate model (6-29). The errors for the components of $X_{\ell j}$ are correlated, but the covariance matrix $\Sigma$ is the same for all populations.

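A vector of observations may be decomposed as suggested by the model. Thus

$$\underset{\text{(observation)}}{X_{\ell j}} \;=\; \underset{\text{(overall sample mean } \hat{\mu}\text{)}}{\bar{X}} \;+\; \underset{\text{(estimated treatment effect } \hat{\tau}_\ell\text{)}}{(\bar{X}_\ell - \bar{X})} \;+\; \underset{\text{(residual } \hat{e}_{\ell j}\text{)}}{(X_{\ell j} - \bar{X}_\ell)} \tag{6-35}$$
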
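The decomposition in (6-35) leads to the multivariate analog of the univariate sum of squares breakup in (6-31). First we note that the product $(X_{\ell j} - \bar{X})(X_{\ell j} - \bar{X})'$ can be written as

$$\begin{aligned}
(X_{\ell j} - \bar{X})(X_{\ell j} - \bar{X})' &= \left[(X_{\ell j} - \bar{X}_\ell) + (\bar{X}_\ell - \bar{X})\right]\left[(X_{\ell j} - \bar{X}_\ell) + (\bar{X}_\ell - \bar{X})\right]' \\
&= (X_{\ell j} - \bar{X}_\ell)(X_{\ell j} - \bar{X}_\ell)' + (X_{\ell j} - \bar{X}_\ell)(\bar{X}_\ell - \bar{X})' \\
&\quad + (\bar{X}_\ell - \bar{X})(X_{\ell j} - \bar{X}_\ell)' + (\bar{X}_\ell - \bar{X})(\bar{X}_\ell - \bar{X})'
\end{aligned}$$

The sum over $j$ of the middle two expressions is the zero matrix, because $\sum_{j=1}^{n_\ell} (X_{\ell j} - \bar{X}_\ell) = 0$. Hence, summing the cross product over $\ell$ and $j$ yields

$$\underbrace{\sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (X_{\ell j} - \bar{X})(X_{\ell j} - \bar{X})'}_{\text{total (corrected) sum of squares and cross products}} = \underbrace{\sum_{\ell=1}^{g} n_\ell (\bar{X}_\ell - \bar{X})(\bar{X}_\ell - \bar{X})'}_{\text{treatment (between) sum of squares and cross products}} + \underbrace{\sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (X_{\ell j} - \bar{X}_\ell)(X_{\ell j} - \bar{X}_\ell)'}_{\text{residual (within) sum of squares and cross products}} \tag{6-36}$$
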
The within sum of squares and cross products matrix can be expressed as

$$\begin{aligned}
W &= \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (X_{\ell j} - \bar{X}_\ell)(X_{\ell j} - \bar{X}_\ell)' \\
&= (n_1 - 1)S_1 + (n_2 - 1)S_2 + \cdots + (n_g - 1)S_g
\end{aligned} \tag{6-37}$$

where $S_\ell$ is the sample covariance matrix for the $\ell$th sample. This matrix is a generalization of the $(n_1 + n_2 - 2)S_{\text{pooled}}$ matrix encountered in the two-sample case. It plays a dominant role in testing for the presence of treatment effects.
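
The identities (6-36) and (6-37) can be checked numerically. The following is a minimal sketch in plain Python, assuming made-up bivariate data for three groups; the data values and the helper names (`mean_vector`, `outer`, `add`, `sub`, `zeros`, `close`) are illustrative and do not come from the text.

```python
# Hypothetical data: g = 3 groups of p = 2 measurements each (values invented
# only to exercise the identities; they are not from the text).
groups = [
    [[2.0, 5.0], [4.0, 3.0], [3.0, 7.0]],
    [[6.0, 1.0], [8.0, 2.0]],
    [[1.0, 9.0], [2.0, 6.0], [3.0, 6.0], [2.0, 7.0]],
]

def mean_vector(rows):
    """Component-wise mean of a list of p-vectors."""
    n = len(rows)
    return [sum(r[k] for r in rows) / n for k in range(len(rows[0]))]

def outer(u, v):
    """Outer product u v' as a p x p nested list."""
    return [[ui * vj for vj in v] for ui in u]

def add(a, b, scale=1.0):
    """Return a + scale * b for two p x p nested lists."""
    return [[x + scale * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def sub(u, v):
    """Difference of two p-vectors."""
    return [a - b for a, b in zip(u, v)]

def zeros(p):
    """p x p matrix of zeros."""
    return [[0.0] * p for _ in range(p)]

def close(a, b, tol=1e-9):
    """True when two matrices agree entry by entry within tol."""
    return all(abs(x - y) < tol for ra, rb in zip(a, b) for x, y in zip(ra, rb))

p = len(groups[0][0])
grand_mean = mean_vector([x for grp in groups for x in grp])

B, W, T, W_from_S = zeros(p), zeros(p), zeros(p), zeros(p)
for grp in groups:
    n_l = len(grp)
    xbar_l = mean_vector(grp)
    d = sub(xbar_l, grand_mean)
    B = add(B, outer(d, d), scale=n_l)            # treatment (between) SSP
    S_l = zeros(p)                                # sample covariance of group l
    for x in grp:
        r = sub(x, xbar_l)
        W = add(W, outer(r, r))                   # residual (within) SSP
        S_l = add(S_l, outer(r, r), scale=1.0 / (n_l - 1))
        t = sub(x, grand_mean)
        T = add(T, outer(t, t))                   # total (corrected) SSP
    W_from_S = add(W_from_S, S_l, scale=n_l - 1)  # accumulates (n_l - 1) S_l

print(close(add(B, W), T))   # identity (6-36): total = treatment + residual
print(close(W, W_from_S))    # identity (6-37): W = sum of (n_l - 1) S_l
```

Nested lists are used so the sketch needs nothing beyond the standard library; with a matrix library the same accumulation would be a few lines.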

Analogous to the univariate result, the hypothesis of no treatment effects,

$$H_0: \tau_1 = \tau_2 = \cdots = \tau_g = 0$$

is tested by considering the relative sizes of the treatment and residual sums of squares and cross products. Equivalently, we may consider the relative sizes of the residual and total (corrected) sum of squares and cross products. Formally, we summarize the calculations leading to the test statistic in a MANOVA table.

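MANOVA TABLE FOR COMPARING POPULATION MEAN VECTORS

| Source of variation | Matrix of sum of squares and cross products (SSP) | Degrees of freedom (d.f.) |
|---|---|---|
| Treatment | $B = \sum_{\ell=1}^{g} n_\ell (\bar{X}_\ell - \bar{X})(\bar{X}_\ell - \bar{X})'$ | $g - 1$ |
| Residual (Error) | $W = \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (X_{\ell j} - \bar{X}_\ell)(X_{\ell j} - \bar{X}_\ell)'$ | $\sum_{\ell=1}^{g} n_\ell - g$ |
| Total (corrected for the mean) | $B + W = \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (X_{\ell j} - \bar{X})(X_{\ell j} - \bar{X})'$ | $\sum_{\ell=1}^{g} n_\ell - 1$ |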

This table is of exactly the same form, component by component, as the ANOVA table, except that squares of scalars are replaced by their vector counterparts. For example, $(\bar{x}_\ell - \bar{x})^2$ becomes $(\bar{X}_\ell - \bar{X})(\bar{X}_\ell - \bar{X})'$. The degrees of freedom correspond to the univariate geometry and also to some multivariate distribution theory involving Wishart densities. (See [1].)

One test of $H_0: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ involves generalized variances. We reject $H_0$ if the ratio of generalized variances

$$\Lambda^* = \frac{|W|}{|B + W|} = \frac{\left| \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (X_{\ell j} - \bar{X}_\ell)(X_{\ell j} - \bar{X}_\ell)' \right|}{\left| \sum_{\ell=1}^{g} \sum_{j=1}^{n_\ell} (X_{\ell j} - \bar{X})(X_{\ell j} - \bar{X})' \right|} \tag{6-38}$$

is too small. The quantity $\Lambda^* = |W| / |B + W|$, proposed originally by Wilks (see [20]), corresponds to the equivalent form (6-23) of the F-test of $H_0$: no treatment effects in the univariate case. Wilks’ lambda has the virtue of being convenient and related to the likelihood ratio criterion. The exact distribution of $\Lambda^*$ can be derived for the special cases listed in Table 6.3. For other cases and large sample sizes, a modification of $\Lambda^*$ due to Bartlett (see [4]) can be used to test $H_0$.

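TABLE 6.3 DISTRIBUTION OF WILKS’ LAMBDA

| No. of variables | No. of groups | Sampling distribution for multivariate normal data |
|---|---|---|
| $p = 1$ | $g \ge 2$ | $\left( \dfrac{\sum n_\ell - g}{g - 1} \right) \left( \dfrac{1 - \Lambda^*}{\Lambda^*} \right) \sim F_{g-1,\ \sum n_\ell - g}$ |
| $p = 2$ | $g \ge 2$ | $\left( \dfrac{\sum n_\ell - g - 1}{g - 1} \right) \left( \dfrac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}} \right) \sim F_{2(g-1),\ 2(\sum n_\ell - g - 1)}$ |
| $p \ge 1$ | $g = 2$ | $\left( \dfrac{\sum n_\ell - p - 1}{p} \right) \left( \dfrac{1 - \Lambda^*}{\Lambda^*} \right) \sim F_{p,\ \sum n_\ell - p - 1}$ |
| $p \ge 1$ | $g = 3$ | $\left( \dfrac{\sum n_\ell - p - 2}{p} \right) \left( \dfrac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}} \right) \sim F_{2p,\ 2(\sum n_\ell - p - 2)}$ |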
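
The special cases of Table 6.3 translate directly into a small lookup. The sketch below is one possible way to code them; the function name `wilks_exact_f` and its argument names are hypothetical, and the function only converts a given $\Lambda^*$ into an F statistic with its degrees of freedom. The critical value still has to come from an F table.

```python
import math

def wilks_exact_f(lam, p, g, n_total):
    """Return (F, df1, df2) for the special cases of Table 6.3.

    lam     : Wilks' lambda, |W| / |B + W|
    p       : number of variables
    g       : number of groups
    n_total : sum of the group sample sizes
    Raises ValueError when (p, g) is not one of the tabulated cases.
    """
    if not 0.0 < lam <= 1.0:
        raise ValueError("Wilks' lambda must lie in (0, 1]")
    root = math.sqrt(lam)
    if p == 1 and g >= 2:
        f = (n_total - g) / (g - 1) * (1 - lam) / lam
        return f, g - 1, n_total - g
    if p == 2 and g >= 2:
        f = (n_total - g - 1) / (g - 1) * (1 - root) / root
        return f, 2 * (g - 1), 2 * (n_total - g - 1)
    if p >= 1 and g == 2:
        f = (n_total - p - 1) / p * (1 - lam) / lam
        return f, p, n_total - p - 1
    if p >= 1 and g == 3:
        f = (n_total - p - 2) / p * (1 - root) / root
        return f, 2 * p, 2 * (n_total - p - 2)
    raise ValueError("no exact distribution tabulated for this (p, g)")

# Values from Example 6.8 later in this section: |W| = 239, |B + W| = 6215,
# p = 2 variables, g = 3 groups, 8 observations in total.
f_stat, df1, df2 = wilks_exact_f(239 / 6215, p=2, g=3, n_total=8)
print(round(f_stat, 2), df1, df2)
```

For the Example 6.8 inputs this gives about 8.2 on (4, 8) d.f., which agrees with the 8.19 reported there up to the rounding of $\Lambda^*$ to .0385.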

Bartlett (see [4]) has shown that if $H_0$ is true and $\sum n_\ell = n$ is large,

$$-\left( n - 1 - \frac{p + g}{2} \right) \ln \Lambda^* = -\left( n - 1 - \frac{p + g}{2} \right) \ln \left( \frac{|W|}{|B + W|} \right) \tag{6-39}$$

has approximately a chi-square distribution with $p(g - 1)$ d.f. Consequently, for $\sum n_\ell = n$ large, we reject $H_0$ at significance level $\alpha$ if

$$-\left( n - 1 - \frac{p + g}{2} \right) \ln \left( \frac{|W|}{|B + W|} \right) > \chi^2_{p(g-1)}(\alpha) \tag{6-40}$$

where $\chi^2_{p(g-1)}(\alpha)$ is the upper $(100\alpha)$th percentile of a chi-square distribution with $p(g - 1)$ d.f.
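
The large-sample rule in (6-39) and (6-40) amounts to a few lines of arithmetic. Here is a minimal sketch; the function name `bartlett_statistic` and all input values are hypothetical, chosen only to show the computation. The critical value is passed in by hand because the Python standard library has no chi-square quantile function.

```python
import math

def bartlett_statistic(det_w, det_total, n, p, g):
    """Bartlett's statistic (6-39) and its chi-square degrees of freedom.

    det_w     : |W|, determinant of the residual SSP matrix
    det_total : |B + W|, determinant of the total (corrected) SSP matrix
    n         : total sample size, assumed large
    p, g      : number of variables and number of groups
    """
    lam = det_w / det_total                       # Wilks' lambda
    stat = -(n - 1 - (p + g) / 2.0) * math.log(lam)
    return stat, p * (g - 1)

# Hypothetical inputs (not from the text).
stat, df = bartlett_statistic(det_w=50.0, det_total=200.0, n=60, p=3, g=4)

# Upper 5% point of a chi-square distribution with 9 d.f., read from a table.
critical_value = 16.92
reject_h0 = stat > critical_value
print(df, reject_h0)
```

Returning the degrees of freedom alongside the statistic keeps the table lookup for $\chi^2_{p(g-1)}(\alpha)$ explicit rather than hiding it inside the function.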

Example 6.8 (A MANOVA table and Wilks’ lambda for testing the
equality of three mean vectors)

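Suppose an additional variable is observed along with the variable introduced in Example 6.6. The sample sizes are $n_1 = 3$, $n_2 = 2$, and $n_3 = 3$. Arranging the observation pairs $X_{\ell j}$ in rows, we obtain

$$\begin{pmatrix}
\begin{bmatrix} 9 \\ 3 \end{bmatrix} & \begin{bmatrix} 6 \\ 2 \end{bmatrix} & \begin{bmatrix} 9 \\ 7 \end{bmatrix} \\[2ex]
\begin{bmatrix} 0 \\ 4 \end{bmatrix} & \begin{bmatrix} 2 \\ 0 \end{bmatrix} & \\[2ex]
\begin{bmatrix} 3 \\ 8 \end{bmatrix} & \begin{bmatrix} 1 \\ 9 \end{bmatrix} & \begin{bmatrix} 2 \\ 7 \end{bmatrix}
\end{pmatrix}
\quad \text{with} \quad
\bar{X}_1 = \begin{bmatrix} 8 \\ 4 \end{bmatrix}, \quad
\bar{X}_2 = \begin{bmatrix} 1 \\ 2 \end{bmatrix}, \quad
\bar{X}_3 = \begin{bmatrix} 2 \\ 8 \end{bmatrix}, \quad \text{and} \quad
\bar{X} = \begin{bmatrix} 4 \\ 5 \end{bmatrix}$$
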

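We have already expressed the observations on the first variable as the sum of an overall mean, treatment effect, and residual in our discussion of univariate ANOVA. We found that

$$\underset{\text{(observation)}}{\begin{pmatrix} 9 & 6 & 9 \\ 0 & 2 & \\ 3 & 1 & 2 \end{pmatrix}} = \underset{\text{(mean)}}{\begin{pmatrix} 4 & 4 & 4 \\ 4 & 4 & \\ 4 & 4 & 4 \end{pmatrix}} + \underset{\text{(treatment effect)}}{\begin{pmatrix} 4 & 4 & 4 \\ -3 & -3 & \\ -2 & -2 & -2 \end{pmatrix}} + \underset{\text{(residual)}}{\begin{pmatrix} 1 & -2 & 1 \\ -1 & 1 & \\ 1 & -1 & 0 \end{pmatrix}}$$

and

SSobs = SSmean + SStr + SSres

216 = 128 + 78 + 10

Total SS (corrected) = SSobs - SSmean = 216 - 128 = 88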

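Repeating this operation for the observations on the second variable, we have

$$\underset{\text{(observation)}}{\begin{pmatrix} 3 & 2 & 7 \\ 4 & 0 & \\ 8 & 9 & 7 \end{pmatrix}} = \underset{\text{(mean)}}{\begin{pmatrix} 5 & 5 & 5 \\ 5 & 5 & \\ 5 & 5 & 5 \end{pmatrix}} + \underset{\text{(treatment effect)}}{\begin{pmatrix} -1 & -1 & -1 \\ -3 & -3 & \\ 3 & 3 & 3 \end{pmatrix}} + \underset{\text{(residual)}}{\begin{pmatrix} -1 & -2 & 3 \\ 2 & -2 & \\ 0 & 1 & -1 \end{pmatrix}}$$

and

SSobs = SSmean + SStr + SSres

272 = 200 + 48 + 24

Total SS (corrected) = SSobs - SSmean = 272 - 200 = 72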

These two single-component analyses must be augmented with the sum of entry-by-entry cross products in order to complete the entries in the MANOVA table. Proceeding row by row in the arrays for the two variables, we obtain the cross product contributions:

Mean: 4(5) + 4(5) + . . . + 4(5) = 8(4) (5) = 160

Treatment: 3(4) (-1) + 2(-3) (-3) + 3(-2) (3) = -12

Residual: 1(-1) + (-2) (-2) + 1(3) + (-1) (2) + . . . + 0(-1) = 1

Total: 9(3) + 6(2) + 9(7) + 0(4) + . . . + 2(7) = 149

Total (corrected) cross product = total cross product - mean cross product = 149 - 160 = -11
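
The sums of squares and cross products above can be reproduced mechanically. The sketch below splits each variable of the Example 6.8 data into mean, treatment, and residual arrays and then takes dot products; the helper names `decompose` and `dot` are illustrative, not from the text.

```python
# Observations from Example 6.8: one list of (x1, x2) pairs per treatment group.
groups = [
    [(9, 3), (6, 2), (9, 7)],
    [(0, 4), (2, 0)],
    [(3, 8), (1, 9), (2, 7)],
]

def decompose(values_by_group):
    """Split every observation into mean, treatment, and residual parts.

    Returns four flat lists (observation, mean, treatment, residual), all in
    the same row-by-row order as the input groups.
    """
    obs = [v for grp in values_by_group for v in grp]
    grand = sum(obs) / len(obs)
    mean, treat, resid = [], [], []
    for grp in values_by_group:
        group_mean = sum(grp) / len(grp)
        for v in grp:
            mean.append(grand)
            treat.append(group_mean - grand)
            resid.append(v - group_mean)
    return obs, mean, treat, resid

def dot(u, v):
    """Sum of entry-by-entry products of two equal-length lists."""
    return sum(a * b for a, b in zip(u, v))

var1 = decompose([[x[0] for x in grp] for grp in groups])
var2 = decompose([[x[1] for x in grp] for grp in groups])

labels = ["obs", "mean", "treatment", "residual"]
ss1 = {lab: dot(a, a) for lab, a in zip(labels, var1)}          # variable 1
ss2 = {lab: dot(a, a) for lab, a in zip(labels, var2)}          # variable 2
cross = {lab: dot(a, b) for lab, a, b in zip(labels, var1, var2)}

print(ss1)     # sums of squares for the first variable
print(ss2)     # sums of squares for the second variable
print(cross)   # cross products between the two variables
print(ss1["obs"] - ss1["mean"],
      ss2["obs"] - ss2["mean"],
      cross["obs"] - cross["mean"])   # corrected totals
```

The treatment and residual entries of `ss1`, `ss2`, and `cross` are exactly the entries of the matrices B and W for this example.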

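Thus, the MANOVA table takes the following form:

| Source of variation | Matrix of sum of squares and cross products (SSP) | Degrees of freedom (d.f.) |
|---|---|---|
| Treatment | $\begin{bmatrix} 78 & -12 \\ -12 & 48 \end{bmatrix}$ | $3 - 1 = 2$ |
| Residual | $\begin{bmatrix} 10 & 1 \\ 1 & 24 \end{bmatrix}$ | $3 + 2 + 3 - 3 = 5$ |
| Total (corrected) | $\begin{bmatrix} 88 & -11 \\ -11 & 72 \end{bmatrix}$ | $7$ |
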
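Equation (6-36) is verified by noting that

$$\begin{bmatrix} 88 & -11 \\ -11 & 72 \end{bmatrix} = \begin{bmatrix} 78 & -12 \\ -12 & 48 \end{bmatrix} + \begin{bmatrix} 10 & 1 \\ 1 & 24 \end{bmatrix}$$
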
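Using (6-38), we get

$$\Lambda^* = \frac{|W|}{|B + W|} = \frac{\begin{vmatrix} 10 & 1 \\ 1 & 24 \end{vmatrix}}{\begin{vmatrix} 88 & -11 \\ -11 & 72 \end{vmatrix}} = \frac{10(24) - (1)^2}{88(72) - (-11)^2} = \frac{239}{6215} = 0.0385$$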
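
As a quick arithmetic check of this value, the determinants can be evaluated directly from the treatment and residual matrices of the example. This is a small sketch for the 2 x 2 case only; `det2` is a hypothetical helper name.

```python
# Treatment and residual SSP matrices from the Example 6.8 MANOVA table.
B = [[78, -12], [-12, 48]]
W = [[10, 1], [1, 24]]

# Total (corrected) SSP matrix, B + W, entry by entry.
T = [[b + w for b, w in zip(row_b, row_w)] for row_b, row_w in zip(B, W)]

def det2(m):
    """Determinant of a 2 x 2 matrix given as nested lists."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

wilks_lambda = det2(W) / det2(T)
print(T)
print(det2(W), det2(T), round(wilks_lambda, 4))
```

For more than two variables the same ratio would need a general determinant routine in place of `det2`.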

Since $p = 2$ and $g = 3$, Table 6.3 indicates that an exact test (assuming normality and equal group covariance matrices) of $H_0: \tau_1 = \tau_2 = \cdots = \tau_g = 0$ (no treatment effects) versus $H_1$: at least one $\tau_\ell \ne 0$ is available. To carry out the test, we compare the test statistic

$$\left( \frac{1 - \sqrt{\Lambda^*}}{\sqrt{\Lambda^*}} \right) \frac{\left( \sum n_\ell - g - 1 \right)}{(g - 1)} = \left( \frac{1 - \sqrt{.0385}}{\sqrt{.0385}} \right) \left( \frac{8 - 3 - 1}{3 - 1} \right) = 8.19$$

with a percentage point of an F-distribution having $\nu_1 = 2(g - 1) = 4$ and $\nu_2 = 2\left( \sum n_\ell - g - 1 \right) = 8$ d.f. Since $8.19 > F_{4,8}(.01) = 7.01$, we reject $H_0$ at the $\alpha = .01$ level and conclude that treatment differences exist.

When the number of variables, $p$, is large, the MANOVA table is usually not constructed. Still, it is good practice to have the computer print the matrices $B$ and $W$ so that especially large entries can be located. Also, the residual vectors

$$\hat{e}_{\ell j} = X_{\ell j} - \bar{X}_\ell$$

should be examined for normality and the presence of outliers using the techniques discussed in Sections 4.6 and 4.7 of Chapter 4.
