Analysis Multivariate Statistics

Download as pdf or txt
Download as pdf or txt
You are on page 1of 10

Homework 2

January 29, 2021

Exercise 1
The data in the following table, taken from Anderson (1984), correspond to the increase in hours of sleep due
to the use of two medications A and B. The experiment was performed on 10 patients. It is assumed that
the data follow a bivariate normal distribution. Find the maximum likelihood estimators of the mean and
variance-covariance matrix and the correlations matrix. Do medications have similar effects on patients?

Patient MEDICATION A MEDICATION B


1 1.9 0.7
2 0.8 -1.6
3 1.1 -0.2
4 0.1 -1.2
5 -0.1 -0.1
6 4.4 3.4
7 5.5 3.7
8 1.6 0.8
9 4.6 0.0
10 3.4 2.0

Sol.
Let’s note that ¨ ˛
1.9 0.7
˚ 0.8 ´1.6‹
˚ ‹
˚ 1.1 ´0.2‹
˚ ‹
˚ 0.1 ´1.2‹
˚ ‹
˚´0.1 ´0.1‹
x“˚
˚ 4.4
‹ p“2 n “ 10
˚ 3.4 ‹

˚ 5.5 3.7 ‹
˚ ‹
˚ 1.6 0.8 ‹
˚ ‹
˝ 4.6 0.0 ‚
3.4 2.0
Let’s calculate the sample mean
10 ˆ ˙
1 ÿ 2.33
x̄ “ xi “
10 i“1 0.75
The value was found using the following R code
x = matrix ( c ( 1 . 9 , 0 . 8 , 1 . 1 , 0 . 1 , −0.1 , 4 . 4 , 5 . 5 , 1 . 6 , 4 . 6 ,
3 . 4 , 0 . 7 , −1.6 , −0.2 , −1.2 , −0.1 , 3 . 4 , 3 . 7 , 0 . 8 , 0 . 0 , 2 . 0 ) , n c o l = 2 )
colMeans ( x )

Let’s calculate
n
1 ÿ 1
S“ pxi ´ x̄qpxi ´ x̄q
n ´ 1 i“1
Using the following R command

1
x = matrix ( c ( 1 . 9 , 0 . 8 , 1 . 1 , 0 . 1 , −0.1 , 4 . 4 , 5 . 5 , 1 . 6 , 4 . 6 ,
3 . 4 , 0 . 7 , −1.6 , −0.2 , −1.2 , −0.1 , 3 . 4 , 3 . 7 , 0 . 8 , 0 . 0 , 2 . 0 ) , n c o l = 2 )
var ( x )

We found ˆ ˙
4.009000 2.848333
S“
2.848333 3.200556
Now, let’s calculate the correlation matrix using the following R command
x = matrix ( c ( 1 . 9 , 0 . 8 , 1 . 1 , 0 . 1 , −0.1 , 4 . 4 , 5 . 5 , 1 . 6 , 4 . 6 ,
3 . 4 , 0 . 7 , −1.6 , −0.2 , −1.2 , −0.1 , 3 . 4 , 3 . 7 , 0 . 8 , 0 . 0 , 2 . 0 ) , n c o l = 2 )
cor (x)

So, we have that ˆ ˙


1.0000000 0.7951702
R“
0.7951702 1.0000000
Analyzing the difference in the mean of A and B medications which is 1.58 we can conclude that since is a big
value then do not have similar effects. Moreover, their variance is also different.

Exercise 2
A biologist takes measurements of the skulls of two species of mice. Specifically, he observes three variables Y1,
Y2 and Y3 in a set of mice of which 50 are of species A, and the remaining 60 are of species B. Let us suppose
that the data are ordered in such a way that the first 50 (in the matrix Y) are those of species A and the last
60 are those of species B.

a) Let us denote by YA the matrix of data observed in species A. If

YA1 150 “ p25.5, 14.1, 11.3q1

and ¨ ˛
40.2 10.9 15.6
YA1 YA “ ˝10.9 13.7 14.5‚
15.6 14.5 20.1
Calculate the mean vector and covariance matrix for this species.
Sol.
Let’s calculate the mean vector ȲA
¨ ˛ ¨ř50 ˛
25.5 i“1 Yi1
YA1 150 “ ˝14.1‚ “ ˝ 50
ř
Yi2 ‚
ři“1
50
11.3 i“1 Yi3

Then ¨ ˛ ¨ ˛ ¨ ˛
Ȳ1A 25.5 0.51
1
ȲA “ ˝Ȳ2A ‚ “ ˝14.1‚ “ ˝0.282‚
50
Ȳ3A 11.3 0.226
Now, let’s calculate the covariance matrix SA
¨ ˛ ¨ 2 ˛
40.2 10.9 15.6 ÿ50 Yi1 Yi1 Yi2 Yi1 Yi3
YA1 YA “ ˝10.9 13.7 14.5‚ “ ˝Yi2 Yi1 Yi22 Yi2 Yi3 ‚
15.6 14.5 20.1 i“1 Yi3 Yi1 Yi3 Yi2 Yi32

Finally, we have that


1 “ 1 ‰
SA “ YA YA ´ 50ȲA ȲA1
50 ´ 1
Using the following R code

2
Yam <− matrix ( c ( 0 . 5 1 , 0 . 2 8 2 , 0 . 2 2 6 ) , n c o l =1)
YtY <− matrix ( c ( 4 0 . 2 , 1 0 . 9 , 1 5 . 6 , 1 0 . 9 , 1 3 . 7 , 1 4 . 5 , 1 5 . 6 , 1 4 . 5 , 2 0 . 1 ) , n c o l =3)
Sa <− 1/49 ∗ (YtY−50∗Yam%∗%t (Yam) )
Sa

We get the following ¨ ˛


0.55500000 0.07569388 0.2007551
SA “ ˝0.07569388 0.19844490 0.2308857‚
0.20075510 0.23088571 0.3580857

b) Let us denote by YB the matrix of data observed in species B. If

YB1 160 “ p25.5, 14.1, 11.3q1

and ¨ ˛
50.7 32.6 24.8
YB1 YB “ ˝32.6 29.0 12.6‚
24.8 12.6 35.8
Calculate the mean vector and covariance matrix for this species.
Sol. Let’s calculate the mean vector ȲB
¨ ˛ ¨ř60 ˛
25.5 i“1 Yi1
YB1 160 “ ˝14.1‚ “ ˝ 60
ř
Yi2 ‚
ři“1
60
11.3 i“1 Yi3

Then ¨ ˛ ¨ ˛ ¨ ˛
Ȳ1B 25.5 0.452
1 ˝
ȲB “ ˝Ȳ2B ‚ “ 14.1‚ “ ˝ 0.235 ‚
60
Ȳ3B 11.3 0.1883
Now, let’s calculate the covariance matrix SB
¨ ˛
50.7 32.6 24.8
YB1 YB “ ˝32.6 29.0 12.6‚
24.8 12.6 35.8

Finally, we have that


1 “ 1 ‰
SA “ YB YB ´ 60ȲB ȲB1
60 ´ 1
Using the following R code
YBm <− matrix ( c ( 0 . 4 2 5 , 0 . 2 3 5 , 0 . 1 8 8 3 ) , n c o l =1)
YtY <− matrix ( c ( 5 0 . 7 , 3 2 . 6 , 2 4 . 8 , 3 2 . 6 , 2 9 . 0 , 1 2 . 6 , 2 4 . 8 , 1 2 . 6 , 3 5 . 8 ) , n c o l =3)
SB <− 1/59 ∗ (YtY−60∗YBm%∗%t (YBm) )
SB

We get the following ¨ ˛


0.6756356 0.4509746 0.3389551
SB “ ˝0.4509746 0.4353644 0.1685588‚
0.3389551 0.1685588 0.5707218

c) Calculate the mean vector and covariance matrix for all mice.
Sol. Let’s calculate the mean vector for all mice
»¨ ˛ ¨ ˛fi ¨ ˛
25.5 25.5 0.4636364
1 –˝
Ȳ “ 14.1‚` ˝14.1‚fl “ ˝ 0.2563636 ‚
110
11.3 11.3 0.20555455

3
and the covariance for all mices
“ ‰
Y 1 Y “ YA1 YA ` YB1 YB
»¨ ˛ ¨ ˛fi
40.2 10.9 15.6 50.7 32.6 24.8
“ –˝10.9 13.7 14.5‚` ˝32.6 29.0 12.6‚fl
15.6 14.5 20.1 24.8 12.6 35.8
¨ ˛
90.9 43.5 40.4
“ ˝43.5 42.7 27.1‚
40.4 27.1 55.9

So, for the covariance matrix S we have that


1 “ 1 ‰
S“ Y Y ´ 110Ȳ Ȳ 1
109

Using the following R code


YBm <− matrix ( c ( 0 . 4 2 5 , 0 . 2 3 5 , 0 . 1 8 8 3 ) , n c o l =1)
YtY <− matrix ( c ( 5 0 . 7 , 3 2 . 6 , 2 4 . 8 , 3 2 . 6 , 2 9 . 0 , 1 2 . 6 , 2 4 . 8 , 1 2 . 6 , 3 5 . 8 ) , n c o l =3)
SB <− 1/59 ∗ (YtY−60∗YBm%∗%t (YBm) )
SB

We get ¨ ˛
0.6170142 0.2791326 0.2745121
˝0.2791326 0.3254178 0.1954696‚
0.2745121 0.1954696 0.4702452

Exercise 3
i) What is the geometric shape of the next equation?

px ´ µq1 Σ´1 px ´ µq “ c2 (0.1)

Sol.
We know that the hyperellipsoid is the extension of an elipse of more than 2 dimmensions. So, the equation
of a p-dimensional hyperellipsoid is px ´ x̄q1 S ´1 px ´ x̄ “ c2 with base in S ´1 and centered in x̄. It contains
a proportions of the observations x1 , x2 , ..., xn in the sample. Let’s notice that in (0.1) S is replaced by Σ,
then the c2 could be determined by a chi-square distribution.
ii) Plot (use R): px ´ µq1 Σ´1 px ´ µq “ c2, with c “ 1 and:

ˆ ˙
1 1 0
a) µ “ p1, 1q; Σ “
0 2
Sol.
Let’s compute ˆ ˙ ˆ ˙
1 2 0 1 0
Σ´1 “ “ 1
2 0 1 0 2
Then, (0.1) becomes
ˆ ˙ˆ ˙
1 0 x´1
ppx ´ 1qpy ´ 1qq “1
0 21 y´1
ˆ ˙
x´1
ppx ´ 1qpy ´ 1qq “1
0.5py ´ 1q
px ´ 1q2 ` 0.5py ´ 1q2 “ 1
ˆ ˙
x
where x “ Now, with the following code
y

4
e1 <− f u n c t i o n ( x , y ) { ( x − 1 ) ∗ ∗ 2 + ( ( 0 . 5 ) ∗ ( y −1)∗∗2) −1}
x <− s e q ( −5 ,5 , l e n g t h =1000)
y<− s e q ( −5 ,5 , l e n g t h =1000)
z<− o u t e r ( x , y , e1 )
contour (x , y , z , l e v e l s = 0)

we get the plot

ˆ ˙
1 0
b) µ1 “ p0, 0q; Σ “
0 1
Sol.
Let’s compute ˆ ˙ ˆ ˙
1 1 0 1 0
Σ´1 “ “
1 0 1 0 1
Then, (0.1) becomes
ˆ ˙ˆ ˙
` ˘ 1 0 x
x y “1
0 1 y
ˆ ˙
` ˘ x
x y “1
y
x2 ` y 2 “ 1
ˆ ˙
x
where x “ Now, with the following code
y
e2 <− f u n c t i o n ( x , y ) { ( x ∗∗2)+( y∗∗2) −1}
x <− s e q ( − 1 . 5 , 1 . 5 , l e n g t h =1000)
y<− s e q ( − 1 . 5 , 1 . 5 , l e n g t h =1000)
z<− o u t e r ( x , y , e2 )
contour (x , y , z , l e v e l s = 0)

We get the plot

5
Figure 1: Plot 3b

ˆ ˙
1 1 ´1
c) µ “ p0, 0q; Σ “
´1 2
Sol.
First let’s find the inverse of Σ ˆ ˙ ˆ ˙
´1 1 2 1 2 1
Σ “ “
2´1 1 1 1 2
Then
ˆ ˙ˆ ˙
` ˘ 2 1 x
x y “1
1 2 y
ˆ ˙
` ˘ 2x ` 4
x y “1
x ` 2y
2x2 ` 2xy ` 2y 2 “ 1

Now, we plot this equation and get:


e3 <− f u n c t i o n ( x , y ) { ( 2 ∗ ( x ∗∗2))+(2∗ x∗y )+(2∗( y ∗∗2)) −1}
x <− s e q ( − 2 . 5 , 2 . 5 , l e n g t h =1000)
y<− s e q ( −2 ,2 , l e n g t h =1000)
z<− o u t e r ( x , y , e3 )
contour (x , y , z , l e v e l s = 0)

Figure 2: Plot 3c
ˆ ˙
1 2 1
d) µ “ p0, 0q; Σ “
1 2
First, let’s find the inverse of Σ

6
ˆ ˙ ˆ ˙
1 2 ´1 1 2 ´1
Σ´1 “ “
4´1 ´1 2 3 ´1 2
Then

ˆ ˙ˆ ˙
1 2 ´1 x
px, 4q “1
3 ´1 2 4
ˆ ˙
1 2x ´ 4
px, 4q “1
3 ´x ` 2y

1
p2x2 ´ xy ´ xy ` 2y 2 q “ 1
3

p2x2 ´ 2xy ` 2y 2 q “ 3

Finally, we plot this equation and get:


e4 <− f u n c t i o n ( x , y ) { ( 2 ∗ ( x ∗∗2) −(2∗( y ∗∗2)) −3}
x <− s e q ( − 2 . 5 , 2 . 5 , l e n g t h =1000)
y<− s e q ( −2 ,2 , l e n g t h =1000)
z<− o u t e r ( x , y , e4 )
contour (x , y , z , l e v e l s = 0)

Figure 3: Plot 3d

Exercise 4.
Let’s consider the sample: ¨ ˛
2 6 ´3
˚´4 8 7‹
A“˚ ‹
˝´2 9 7‚
´7 8 2

From N3 pµ, Σq with µ and Σ unknown


i) Calculate the maximum likelihood estimator of µ
Sol.
Code in R

7
shapiro . test (y)
r<−mean ( y [ , 1 ] )
s<−mean ( y [ , 2 ] )
t<−mean ( y [ , 3 ] )
u <− data . frame ( r , s , t ) ;
u
Then :

pµq1 “ Y “ p´2.75; 7.75; 3.25q

ii) Calculate a unbiased estimator of Σ


Sol.
Code in R
sigma<−round ( cov ( y ) , 3 ) ;
sigma
round ( c o r ( y ) , 3 )
n <−l e n g t h ( y [ , 1 ] ) ;
( ( n−1)/n ) ∗ sigma

Results: ¨ ˛
10.68750 ´2.18775 ´6.56250
˝´2.18775 1.18725 4.06275 ‚
´6.56250 4.06275 17.18775

iii) Calculate the sample variance-covariance matrix.


Result.

sigma<−round ( cov ( y ) , 3 ) ;
sigma
¨ ˛
14.250 ´2.917 ´8.750
˝´2.917 1.583 5.417 ‚
´8.750 5.417 22.917

Exercise 5.
Let X be a random p dimensional vector of mean and matrix of variances covariances I (matrix identity). Given
a square matrix of order p, A, consider Y “ X 1 AX and proof that EpY q “ trapAq ` µ1 Aµ Sol.
We have:

ErX 1 AXs “ trapAq ` µ1 ` Aµ


ErpX ´ µq1 Apx ´ µqs “ ErX 1 AXs ´ ErpX 1 ´ Aµq ´ Epµ1 AXq ` Epµ1 Auq
ErpX ´ µq1 Apx ´ µqs “ ErX 1 AXs ´ µ1 Aµ
From there, we get:

ErX 1 AXs “ ErX 1 AXs ` µ1 Aµ


ErX 1 AXs “ ErpX ´ µ1 qApX ´ µqs ` µ1 Aµ
ErX 1 AXs “ ErtrapX ´ µ1 qApX ´ µqs ` µ1 Aµ
ErX 1 AXs “ ErtraApX ´ µ1 qApX ´ µqs ` µ1 Aµ
ErX 1 AXs “ traErApX ´ µ1 qpX ´ µqs ` µ1 Aµ
ErX 1 AXs “ trapAqErpX ´ µ1 qApX ´ µqs ` µ1 Aµ

8
ErX 1 AXs “ trapAqΣ ` µ1 Aµ
Σ “ I, therefore

ErX 1 AXs “ trapAq ` µ1 Aµ


ErY s “ trapAq ` µ1 Aµ

Exercise 6.
If:
y “ Σ´1{2 px ´ µq
proof that
px ´ µq1 Σ´1 px ´ µq “ y 1 y
Then,
y 1 “ rΣ´1{2 px ´ µqs1
y 1 “ pΣ´1{2 q1 px ´ µq1
y 1 “ px ´ µq1 Σ´1{2
Therefore,
y 1 y “ px ´ µq1 Σ´1{2 Σ´1{2 px ´ µq
Finally,
y 1 y “ px ´ µq1 Σ´1 px ´ µq

Exercise 7.
Do the data represented in the following multiple scatter diagram come from a multivariate normal distribution?

Figure 4: Plots for supposed multivariate normal distribution.

The first graph plotted on each row of Figure 1. shows the bivariate normal density. Based on this analysis,
we can infer from Figure 1. that any of the graphs showed on each row follows a multiple normal distribution
as is shown in Figure 2. Therefore, the data in not normally distributed.

9
Figure 5: Bivariate normal density graph for multivariate normal distributions.

10

You might also like