Multivariate Statistics Analysis
Exercise 1
The data in the following table, taken from Anderson (1984), correspond to the increase in hours of sleep due
to the use of two medications A and B. The experiment was performed on 10 patients. It is assumed that
the data follow a bivariate normal distribution. Find the maximum likelihood estimators of the mean and
variance-covariance matrix, and the correlation matrix. Do the medications have similar effects on patients?
Sol.
Let's note that
\[
x = \begin{pmatrix}
1.9 & 0.7 \\
0.8 & -1.6 \\
1.1 & -0.2 \\
0.1 & -1.2 \\
-0.1 & -0.1 \\
4.4 & 3.4 \\
5.5 & 3.7 \\
1.6 & 0.8 \\
4.6 & 0.0 \\
3.4 & 2.0
\end{pmatrix}, \qquad p = 2, \qquad n = 10
\]
Let's calculate the sample mean
\[
\bar{x} = \frac{1}{10}\sum_{i=1}^{10} x_i = \begin{pmatrix} 2.33 \\ 0.75 \end{pmatrix}
\]
The value was found using the following R code:
x <- matrix(c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6,
              3.4, 0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0), ncol = 2)
colMeans(x)
Let's calculate
\[
S = \frac{1}{n - 1}\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})'
\]
Using the following R command:
x <- matrix(c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6,
              3.4, 0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0), ncol = 2)
var(x)
We found
\[
S = \begin{pmatrix} 4.009000 & 2.848333 \\ 2.848333 & 3.200556 \end{pmatrix}
\]
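Note that var(x) uses the divisor $n - 1$; the maximum likelihood estimator of the variance-covariance matrix uses the divisor $n$ instead, i.e. $\hat{\Sigma} = \frac{n-1}{n}S$. A minimal R sketch (reusing the matrix x defined above):
n <- nrow(x)                       # n = 10
Sigma_hat <- (n - 1) / n * var(x)  # rescale var(x) from divisor n-1 to divisor n
Sigma_hat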
Now, let's calculate the correlation matrix using the following R command:
x <- matrix(c(1.9, 0.8, 1.1, 0.1, -0.1, 4.4, 5.5, 1.6, 4.6,
              3.4, 0.7, -1.6, -0.2, -1.2, -0.1, 3.4, 3.7, 0.8, 0.0, 2.0), ncol = 2)
cor(x)
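For reference, the correlation between the two variables follows directly from the entries of $S$:
\[
r_{12} = \frac{2.848333}{\sqrt{4.009000 \times 3.200556}} \approx 0.795
\]
The mean increases differ noticeably (2.33 hours for A versus 0.75 hours for B), while the patients' responses to the two medications are strongly positively correlated.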
Exercise 2
A biologist takes measurements of the skulls of two species of mice. Specifically, he observes three variables Y1, Y2 and Y3 on a set of mice, of which 50 belong to species A and the remaining 60 to species B. Suppose the data are ordered so that the first 50 rows (of the matrix Y) correspond to species A and the last 60 to species B.
a) Given
\[
Y_A' 1_{50} = \begin{pmatrix} 25.5 \\ 14.1 \\ 11.3 \end{pmatrix}
\qquad \text{and} \qquad
Y_A' Y_A = \begin{pmatrix} 40.2 & 10.9 & 15.6 \\ 10.9 & 13.7 & 14.5 \\ 15.6 & 14.5 & 20.1 \end{pmatrix},
\]
calculate the mean vector and covariance matrix for this species.
Sol.
Let's calculate the mean vector $\bar{Y}_A$. We have
\[
Y_A' 1_{50} = \begin{pmatrix} 25.5 \\ 14.1 \\ 11.3 \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{50} Y_{i1} \\ \sum_{i=1}^{50} Y_{i2} \\ \sum_{i=1}^{50} Y_{i3} \end{pmatrix}
\]
Then
\[
\bar{Y}_A = \begin{pmatrix} \bar{Y}_{1A} \\ \bar{Y}_{2A} \\ \bar{Y}_{3A} \end{pmatrix}
= \frac{1}{50}\begin{pmatrix} 25.5 \\ 14.1 \\ 11.3 \end{pmatrix}
= \begin{pmatrix} 0.51 \\ 0.282 \\ 0.226 \end{pmatrix}
\]
Now, let's calculate the covariance matrix $S_A$. Note that
\[
Y_A' Y_A = \begin{pmatrix} 40.2 & 10.9 & 15.6 \\ 10.9 & 13.7 & 14.5 \\ 15.6 & 14.5 & 20.1 \end{pmatrix}
= \sum_{i=1}^{50}\begin{pmatrix} Y_{i1}^2 & Y_{i1}Y_{i2} & Y_{i1}Y_{i3} \\ Y_{i2}Y_{i1} & Y_{i2}^2 & Y_{i2}Y_{i3} \\ Y_{i3}Y_{i1} & Y_{i3}Y_{i2} & Y_{i3}^2 \end{pmatrix}
\]
so that $S_A = \frac{1}{49}\left(Y_A'Y_A - 50\,\bar{Y}_A\bar{Y}_A'\right)$, which we compute in R:
Yam <- matrix(c(0.51, 0.282, 0.226), ncol = 1)
YtY <- matrix(c(40.2, 10.9, 15.6, 10.9, 13.7, 14.5, 15.6, 14.5, 20.1), ncol = 3)
Sa  <- 1/49 * (YtY - 50 * Yam %*% t(Yam))
Sa
b) Given
\[
Y_B' 1_{60} = \begin{pmatrix} 25.5 \\ 14.1 \\ 11.3 \end{pmatrix}
\qquad \text{and} \qquad
Y_B' Y_B = \begin{pmatrix} 50.7 & 32.6 & 24.8 \\ 32.6 & 29.0 & 12.6 \\ 24.8 & 12.6 & 35.8 \end{pmatrix},
\]
calculate the mean vector and covariance matrix for this species.
Sol. Let's calculate the mean vector $\bar{Y}_B$. We have
\[
Y_B' 1_{60} = \begin{pmatrix} 25.5 \\ 14.1 \\ 11.3 \end{pmatrix}
= \begin{pmatrix} \sum_{i=1}^{60} Y_{i1} \\ \sum_{i=1}^{60} Y_{i2} \\ \sum_{i=1}^{60} Y_{i3} \end{pmatrix}
\]
Then
\[
\bar{Y}_B = \begin{pmatrix} \bar{Y}_{1B} \\ \bar{Y}_{2B} \\ \bar{Y}_{3B} \end{pmatrix}
= \frac{1}{60}\begin{pmatrix} 25.5 \\ 14.1 \\ 11.3 \end{pmatrix}
= \begin{pmatrix} 0.425 \\ 0.235 \\ 0.1883 \end{pmatrix}
\]
Now, let's calculate the covariance matrix $S_B$. As before,
\[
Y_B' Y_B = \begin{pmatrix} 50.7 & 32.6 & 24.8 \\ 32.6 & 29.0 & 12.6 \\ 24.8 & 12.6 & 35.8 \end{pmatrix}
\]
so that $S_B = \frac{1}{59}\left(Y_B'Y_B - 60\,\bar{Y}_B\bar{Y}_B'\right)$.
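Following the same steps as for species A, $S_B$ can be computed in R:
# covariance matrix for species B from the cross-product matrix and the mean vector
Ybm  <- matrix(c(25.5, 14.1, 11.3) / 60, ncol = 1)
YtYb <- matrix(c(50.7, 32.6, 24.8, 32.6, 29.0, 12.6, 24.8, 12.6, 35.8), ncol = 3)
Sb   <- 1/59 * (YtYb - 60 * Ybm %*% t(Ybm))
Sb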
c) Calculate the mean vector and covariance matrix for all mice.
Sol. Let's calculate the mean vector for all mice
\[
\bar{Y} = \frac{1}{110}\left[\begin{pmatrix} 25.5 \\ 14.1 \\ 11.3 \end{pmatrix}
+ \begin{pmatrix} 25.5 \\ 14.1 \\ 11.3 \end{pmatrix}\right]
= \begin{pmatrix} 0.4636364 \\ 0.2563636 \\ 0.2054545 \end{pmatrix}
\]
and the covariance matrix for all mice. Note that
\[
Y'Y = Y_A'Y_A + Y_B'Y_B
= \begin{pmatrix} 40.2 & 10.9 & 15.6 \\ 10.9 & 13.7 & 14.5 \\ 15.6 & 14.5 & 20.1 \end{pmatrix}
+ \begin{pmatrix} 50.7 & 32.6 & 24.8 \\ 32.6 & 29.0 & 12.6 \\ 24.8 & 12.6 & 35.8 \end{pmatrix}
= \begin{pmatrix} 90.9 & 43.5 & 40.4 \\ 43.5 & 42.7 & 27.1 \\ 40.4 & 27.1 & 55.9 \end{pmatrix}
\]
Applying $S = \frac{1}{109}\left(Y'Y - 110\,\bar{Y}\bar{Y}'\right)$, we get
\[
S = \begin{pmatrix} 0.6170142 & 0.2791326 & 0.2745121 \\ 0.2791326 & 0.3254178 & 0.1954696 \\ 0.2745121 & 0.1954696 & 0.4702452 \end{pmatrix}
\]
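A short R sketch that reproduces these pooled quantities from the sums and cross-product matrices given above:
# pooled mean vector and covariance matrix for all 110 mice
sumA <- matrix(c(25.5, 14.1, 11.3), ncol = 1)   # Y_A' 1_50
sumB <- matrix(c(25.5, 14.1, 11.3), ncol = 1)   # Y_B' 1_60
Ym   <- (sumA + sumB) / 110
YtYA <- matrix(c(40.2, 10.9, 15.6, 10.9, 13.7, 14.5, 15.6, 14.5, 20.1), ncol = 3)
YtYB <- matrix(c(50.7, 32.6, 24.8, 32.6, 29.0, 12.6, 24.8, 12.6, 35.8), ncol = 3)
S    <- 1/109 * (YtYA + YtYB - 110 * Ym %*% t(Ym))
Ym
S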
Exercise 3
i) What is the geometric shape described by the following equation?
\[
(x - \mu)'\Sigma^{-1}(x - \mu) = c^2 \qquad (0.1)
\]
Sol.
We know that a hyperellipsoid is the extension of an ellipse to more than two dimensions. The equation of a $p$-dimensional hyperellipsoid is $(x - \bar{x})'S^{-1}(x - \bar{x}) = c^2$, with axes determined by $S^{-1}$ and centered at $\bar{x}$; it contains a proportion of the observations $x_1, x_2, \ldots, x_n$ of the sample. Notice that in (0.1) $S$ is replaced by $\Sigma$, so $c^2$ can be determined from a chi-squared distribution. Hence (0.1) describes a hyperellipsoid (an ellipse when $p = 2$) centered at $\mu$.
ii) Plot (use R): $(x - \mu)'\Sigma^{-1}(x - \mu) = c^2$, with $c = 1$ and:
a) $\mu' = (1, 1)$; $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 2 \end{pmatrix}$
Sol.
Let's compute
\[
\Sigma^{-1} = \frac{1}{2}\begin{pmatrix} 2 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & \tfrac{1}{2} \end{pmatrix}
\]
Then, (0.1) becomes
\[
\begin{pmatrix} x - 1 & y - 1 \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & \tfrac{1}{2} \end{pmatrix}
\begin{pmatrix} x - 1 \\ y - 1 \end{pmatrix} = 1
\]
\[
\begin{pmatrix} x - 1 & y - 1 \end{pmatrix}
\begin{pmatrix} x - 1 \\ 0.5(y - 1) \end{pmatrix} = 1
\]
\[
(x - 1)^2 + 0.5(y - 1)^2 = 1
\]
where $x = \begin{pmatrix} x \\ y \end{pmatrix}$. Now, with the following code
e1 <- function(x, y) {(x - 1)**2 + 0.5*(y - 1)**2 - 1}
x <- seq(-5, 5, length = 1000)
y <- seq(-5, 5, length = 1000)
z <- outer(x, y, e1)
contour(x, y, z, levels = 0)
b) $\mu' = (0, 0)$; $\Sigma = \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}$
Sol.
Let's compute
\[
\Sigma^{-1} = \frac{1}{1}\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
= \begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\]
Then, (0.1) becomes
\[
\begin{pmatrix} x & y \end{pmatrix}
\begin{pmatrix} 1 & 0 \\ 0 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} = 1
\]
\[
\begin{pmatrix} x & y \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} = 1
\]
\[
x^2 + y^2 = 1
\]
where $x = \begin{pmatrix} x \\ y \end{pmatrix}$. Now, with the following code
e2 <- function(x, y) {x**2 + y**2 - 1}
x <- seq(-1.5, 1.5, length = 1000)
y <- seq(-1.5, 1.5, length = 1000)
z <- outer(x, y, e2)
contour(x, y, z, levels = 0)
Figure 1: Plot 3b
c) $\mu' = (0, 0)$; $\Sigma = \begin{pmatrix} 1 & -1 \\ -1 & 2 \end{pmatrix}$
Sol.
First, let's find the inverse of $\Sigma$:
\[
\Sigma^{-1} = \frac{1}{2 - 1}\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
= \begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
\]
Then
\[
\begin{pmatrix} x & y \end{pmatrix}
\begin{pmatrix} 2 & 1 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} = 1
\]
\[
\begin{pmatrix} x & y \end{pmatrix}
\begin{pmatrix} 2x + y \\ x + y \end{pmatrix} = 1
\]
\[
2x^2 + 2xy + y^2 = 1
\]
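As in parts a) and b), the curve can be plotted in R (a sketch following the same pattern as e1 and e2):
e3 <- function(x, y) {2*x**2 + 2*x*y + y**2 - 1}
x <- seq(-2, 2, length = 1000)
y <- seq(-2, 2, length = 1000)
z <- outer(x, y, e3)
contour(x, y, z, levels = 0)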
Figure 2: Plot 3c
d) $\mu' = (0, 0)$; $\Sigma = \begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix}$
First, let's find the inverse of $\Sigma$:
\[
\Sigma^{-1} = \frac{1}{4 - 1}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}
= \frac{1}{3}\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}
\]
Then
\[
\frac{1}{3}\begin{pmatrix} x & y \end{pmatrix}
\begin{pmatrix} 2 & -1 \\ -1 & 2 \end{pmatrix}
\begin{pmatrix} x \\ y \end{pmatrix} = 1
\]
\[
\frac{1}{3}\begin{pmatrix} x & y \end{pmatrix}
\begin{pmatrix} 2x - y \\ -x + 2y \end{pmatrix} = 1
\]
\[
\frac{1}{3}\left(2x^2 - xy - xy + 2y^2\right) = 1
\]
\[
2x^2 - 2xy + 2y^2 = 3
\]
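Analogously, for part d) (a sketch following the same pattern as the previous parts):
e4 <- function(x, y) {2*x**2 - 2*x*y + 2*y**2 - 3}
x <- seq(-2.5, 2.5, length = 1000)
y <- seq(-2.5, 2.5, length = 1000)
z <- outer(x, y, e4)
contour(x, y, z, levels = 0)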
Figure 3: Plot 3d
Exercise 4.
Let's consider the sample:
\[
A = \begin{pmatrix} 2 & 6 & -3 \\ -4 & 8 & 7 \\ -2 & 9 & 7 \\ -7 & 8 & 2 \end{pmatrix}
\]
Using R, we first test normality with the Shapiro-Wilk test and then compute the mean of each variable:
y <- matrix(c(2, -4, -2, -7, 6, 8, 9, 8, -3, 7, 7, 2), ncol = 3)  # the sample A
shapiro.test(y)       # Shapiro-Wilk normality test
r <- mean(y[, 1])     # mean of the first variable
s <- mean(y[, 2])     # mean of the second variable
t <- mean(y[, 3])     # mean of the third variable
u <- data.frame(r, s, t)
u                     # r = -2.75, s = 7.75, t = 3.25
Results. The covariance matrix computed with divisor $n = 4$ (the maximum likelihood estimate) is
\[
\begin{pmatrix} 10.68750 & -2.18750 & -6.56250 \\ -2.18750 & 1.18750 & 4.06250 \\ -6.56250 & 4.06250 & 17.18750 \end{pmatrix}
\]
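This matrix can be reproduced in R by rescaling cov(y) to divisor $n$ (a short sketch reusing the matrix y defined above):
sigma_ml <- (nrow(y) - 1) / nrow(y) * cov(y)   # rescale from divisor n-1 to divisor n
sigma_ml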
The unbiased sample covariance matrix (divisor $n - 1 = 3$) is obtained with
sigma <- round(cov(y), 3)
sigma
\[
\begin{pmatrix} 14.250 & -2.917 & -8.750 \\ -2.917 & 1.583 & 5.417 \\ -8.750 & 5.417 & 22.917 \end{pmatrix}
\]
Exercise 5.
Let $X$ be a random $p$-dimensional vector with mean $\mu$ and variance-covariance matrix $I$ (the identity matrix). Given a square matrix $A$ of order $p$, consider $Y = X'AX$ and prove that $E(Y) = \operatorname{tr}(A) + \mu'A\mu$.
Sol.
We have
\[
E[X'AX] = E[\operatorname{tr}(X'AX)] = E[\operatorname{tr}(AXX')]
= \operatorname{tr}\left(A\,E[XX']\right)
= \operatorname{tr}\left(A(\Sigma + \mu\mu')\right)
= \operatorname{tr}(A\Sigma) + \mu'A\mu
\]
Since $\Sigma = I$, $\operatorname{tr}(A\Sigma) = \operatorname{tr}(A)$, and therefore $E(Y) = \operatorname{tr}(A) + \mu'A\mu$.
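A quick numerical check of this identity in R (a sketch; the values of p, mu and A below are arbitrary and not part of the exercise):
# Monte Carlo check of E[X'AX] = tr(A) + mu'Amu when Var(X) = I
set.seed(1)
p  <- 3
mu <- c(1, 2, 3)
A  <- matrix(c(2, 1, 0, 1, 3, 1, 0, 1, 4), ncol = 3)
n  <- 100000
X  <- matrix(rnorm(n * p), ncol = p) + matrix(mu, n, p, byrow = TRUE)  # rows ~ N(mu, I)
q  <- rowSums((X %*% A) * X)           # q_i = x_i' A x_i
mean(q)                                # Monte Carlo estimate of E[Y]
sum(diag(A)) + t(mu) %*% A %*% mu      # tr(A) + mu'Amu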
Exercise 6.
If
\[
y = \Sigma^{-1/2}(x - \mu),
\]
prove that
\[
(x - \mu)'\Sigma^{-1}(x - \mu) = y'y
\]
Then,
\[
y' = \left[\Sigma^{-1/2}(x - \mu)\right]' = (x - \mu)'\left(\Sigma^{-1/2}\right)' = (x - \mu)'\Sigma^{-1/2}
\]
since $\Sigma^{-1/2}$ is symmetric.
Therefore,
\[
y'y = (x - \mu)'\Sigma^{-1/2}\Sigma^{-1/2}(x - \mu) = (x - \mu)'\Sigma^{-1}(x - \mu)
\]
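A numerical illustration in R, building $\Sigma^{-1/2}$ from the eigendecomposition (the particular Sigma, mu and x below are arbitrary and not part of the exercise):
# check that y'y equals (x - mu)' Sigma^{-1} (x - mu)
Sigma <- matrix(c(2, 1, 1, 3), ncol = 2)
mu    <- c(1, -1)
x     <- c(0.5, 2)
e     <- eigen(Sigma)
Sinvh <- e$vectors %*% diag(1 / sqrt(e$values)) %*% t(e$vectors)  # Sigma^(-1/2)
y     <- Sinvh %*% (x - mu)
t(y) %*% y                                   # y'y
t(x - mu) %*% solve(Sigma) %*% (x - mu)      # (x - mu)' Sigma^{-1} (x - mu)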
Exercise 7.
Do the data represented in the following multiple scatter diagram come from a multivariate normal distribution?
Sol. The first graph plotted on each row of Figure 1 shows the bivariate normal density. Comparing it with the scatter plots in the same row, none of them shows the elliptical pattern expected under bivariate normality, such as the one shown in Figure 2. Therefore, the data are not normally distributed.
Figure 5: Bivariate normal density graph for multivariate normal distributions.