Unconstrained Parameterizations for Variance-Covariance Matrices

José C. Pinheiro and Douglas M. Bates


Department of Biostatistics and Department of Statistics
University of Wisconsin-Madison

The estimation of variance-covariance matrices in situations that involve the optimization of an objective function (e.g. a log-likelihood function) is usually a difficult numerical problem, since the resulting estimates should be positive semi-definite matrices. We can either use constrained optimization, or employ a parameterization that enforces this condition. We describe here five different parameterizations for variance-covariance matrices that ensure positive definiteness, while leaving the estimation problem unconstrained. We compare the parameterizations based on their computational efficiency and statistical interpretability. The results described here are particularly useful in maximum likelihood and restricted maximum likelihood estimation in mixed-effects models, but are also applicable to other areas of statistics.

KEY WORDS: Unconstrained estimation; Variance-covariance components estimation; Cholesky factorization; Matrix logarithm.

1. INTRODUCTION

The estimation of variance-covariance matrices through optimization of an objective function, such as a log-likelihood function, is usually a difficult numerical problem, since one must ensure that the resulting estimate is positive semi-definite. This kind of estimation problem occurs, for example, in the analysis of linear and nonlinear mixed-effects models, when one is interested in obtaining maximum likelihood, or restricted maximum likelihood, estimates of the random effects variance-covariance matrix (Lindstrom and Bates, 1988 and 1990). In these models, because the random effects are unobserved quantities, no sample variance-covariance matrix type of estimator, which would automatically be positive semi-definite, is available. Indirect estimation methods must be used. The estimation of a variance-covariance matrix through optimization of a log-likelihood function may occur even when a sample variance-covariance estimator is available, if, for example, one is interested in using maximum likelihood asymptotic results to assess the variability in the resulting estimates.

Two approaches can be used to ensure positive semi-definiteness of a variance-covariance matrix estimate: constrained optimization, where the natural parameterization for the upper-triangular elements of the variance-covariance matrix is used and the estimates are constrained to be positive semi-definite matrices; and unconstrained optimization, where the upper-triangular elements of the variance-covariance matrix are reparameterized in such a way that the resulting estimate must be positive semi-definite.

The first approach, constrained optimization using the nonredundant entries of the matrix as parameters, would be very difficult. As Dennis, Jr. and Schnabel (1983) point out, attempts to solve a constrained optimization problem usually boil down to repeated unconstrained problems or to solving a nonlinear system of equations. The simplest cases are those where there are simple inequality constraints on the parameters, and even in those cases constrained solutions can require several times as much effort as an unconstrained solution. In addition, the statistical properties of constrained estimates, such as asymptotic properties, can be difficult to characterize. In this case, the constraints themselves are quite complicated to express: verifying that a given symmetric matrix is positive semi-definite is essentially as difficult as employing one of the unconstrained parameterizations we describe later. For these reasons, we recommend the use of unconstrained optimization with a parameterization that enforces the positive semi-definite constraint.

An unconstrained estimation approach for variance-covariance matrices in a Bayesian context using matrix logarithms can be found in Leonard and Hsu (1993). Lindstrom and Bates (1988, 1990) describe the use of Cholesky factors for implementing unconstrained estimation of random effects variance-covariance matrices in linear and nonlinear mixed-effects models using likelihood and restricted likelihood.

Since a variance-covariance matrix is positive semi-definite, but not positive definite, only in the rather degenerate situation of linear combinations of the underlying random variables taking constant values, we restrict ourselves here to positive definite variance-covariance matrices.

In addition to enforcing the positive definiteness constraint, the choice of the parameterization can be influenced by computational efficiency and by the statistical interpretability of the individual elements. In general, we can use numerically or analytically determined second derivatives of the objective function to approximate standard errors and derive confidence intervals for the individual parameters. To assess the variability of the variance and covariance estimates, it is desirable that they can be expressed as simple functions of the unconstrained parameters. More detailed techniques, such as profiling the likelihood (Bates and Watts, 1988, chapter 6), also work best for functions of the variance-covariance matrix that are expressed in the original parameterization.

We describe in Section 2 five different parameterizations for transforming the estimation of unstructured (general) variance-covariance matrices into an unconstrained problem. In Section 3 we compare the parameterizations with respect to their computational efficiency and statistical interpretability. Our conclusions and suggestions for further research are presented in Section 4.
2. PARAMETERIZATIONS

Let $\Sigma$ denote a symmetric positive definite $n \times n$ variance-covariance matrix corresponding to a random vector $X = (X_1, \ldots, X_n)$. We do not assume any further structure for $\Sigma$. Because $\Sigma$ is symmetric, only $n(n+1)/2$ parameters are needed to represent it. We will denote by $\theta$ any such minimal set of parameters determining $\Sigma$. The rationale behind all parameterizations considered in this section is to write
$$\Sigma = L^T L \quad (2.1)$$
where $L = L(\theta)$ is an $n \times n$ matrix of full rank obtained from an $n(n+1)/2$-dimensional vector of unconstrained parameters $\theta$. It is clear that any $\Sigma$ defined from a full rank $L$ as in (2.1) is positive definite.

Different choices of $L$ lead to different parameterizations of $\Sigma$. We will consider here two classes of $L$: one based on the Cholesky factorization of $\Sigma$ (Thisted, 1988) and another based on the spectral decomposition of $\Sigma$ (Rao, 1973). The first three parameterizations presented below use the Cholesky factorization of $\Sigma$, while the last two are based on its spectral decomposition.

In some of the parameterizations there are particular components of the parameter vector $\theta$ that have meaningful statistical interpretations. These can include the eigenvalues of $\Sigma$, which are important in considering whether the matrix is ill-conditioned, the individual variances or standard deviations, and the correlations.

The following variance-covariance matrix will be used throughout this section to illustrate the use of the various parameterizations:
$$A = \begin{bmatrix} 1 & 1 & 1 \\ 1 & 5 & 5 \\ 1 & 5 & 14 \end{bmatrix} \quad (2.2)$$

2.1 Cholesky Parameterization

Because $\Sigma$ is positive definite, it can be factored as $\Sigma = L^T L$, where $L$ is an upper triangular matrix. Setting $\theta$ to be the upper triangular elements of $L$ gives the Cholesky parameterization of $\Sigma$. Lindstrom and Bates (1988) use this parameterization to obtain derivatives of the log-likelihood of a linear mixed-effects model for use in a Newton-Raphson algorithm. They reported that the use of this parameterization dramatically improved the convergence properties of the optimization algorithm, when compared to a constrained estimation approach.

One problem with the Cholesky parameterization is that the Cholesky factor is not unique. In fact, if $L$ is a Cholesky factor of $\Sigma$ then so is any matrix obtained by multiplying a subset of the rows of $L$ by $-1$. This has implications for parameter identification, since up to $2^n$ different $\theta$ may represent the same $\Sigma$. Numerical problems can arise in the optimization of an objective function when different optimal solutions are close together in the parameter space.

Another problem with the Cholesky parameterization is the lack of a straightforward relationship between $\theta$ and the elements of $\Sigma$. This makes it hard to interpret the estimates of $\theta$ and to obtain confidence intervals for the variances and covariances in $\Sigma$ based on confidence intervals for the elements of $\theta$. One exception is $|[L]_{11}| = \sqrt{[\Sigma]_{11}}$, so confidence intervals on $[\Sigma]_{11}$ can be obtained from confidence intervals on $[L]_{11}$, where $[A]_{ij}$ denotes the $ij$th element of the matrix $A$. By appropriately permuting the columns and rows of $\Sigma$ we can in fact derive confidence intervals for all the variance terms based on confidence intervals for the elements of $L$.

The main advantage of this parameterization, apart from the fact that it ensures positive definiteness of the estimate of $\Sigma$, is that it is computationally simple and stable.

The Cholesky factorization of $A$ in (2.2) is
$$A = \begin{bmatrix} 1 & 0 & 0 \\ 1 & 2 & 0 \\ 1 & 2 & 3 \end{bmatrix} \begin{bmatrix} 1 & 1 & 1 \\ 0 & 2 & 2 \\ 0 & 0 & 3 \end{bmatrix}$$
By convention, the components of the upper triangular part of $L$ are listed column-wise to give $\theta = (1, 1, 2, 1, 2, 3)^T$.
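As a concrete illustration, here is a minimal numpy sketch of the Cholesky parameterization (the function names are ours, not from the paper); it recovers the $\theta$ above from $A$ and reconstructs $A$ from it.

```python
import numpy as np

def cholesky_theta_to_sigma(theta, n):
    """Map theta to Sigma = L'L, with L upper triangular and theta
    holding the upper-triangular elements of L column-wise."""
    L = np.zeros((n, n))
    k = 0
    for j in range(n):
        for i in range(j + 1):
            L[i, j] = theta[k]
            k += 1
    return L.T @ L

def sigma_to_cholesky_theta(sigma):
    """Inverse map: list the upper-triangular elements of the
    Cholesky factor column-wise."""
    C = np.linalg.cholesky(sigma)        # lower triangular, sigma = C C'
    L = C.T                              # the upper triangular factor
    n = L.shape[0]
    return np.array([L[i, j] for j in range(n) for i in range(j + 1)])

A = np.array([[1., 1., 1.], [1., 5., 5.], [1., 5., 14.]])
theta = sigma_to_cholesky_theta(A)
print(theta)                                               # [1. 1. 2. 1. 2. 3.]
print(np.allclose(cholesky_theta_to_sigma(theta, 3), A))   # True
```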
2.2 Log-Cholesky Parameterization

If one requires the diagonal elements of $L$ in the Cholesky factorization to be positive, then $L$ is unique. To avoid constrained estimation, one can use the logarithms of the diagonal elements of $L$. We call this the log-Cholesky parameterization. It inherits the good computational properties of the Cholesky parameterization, but has the advantage of being uniquely defined. As in the Cholesky parameterization, the parameters lack direct interpretation in terms of the original variances and covariances, except for $L_{11}$.

The log-Cholesky parameterization of $A$ is
$$\theta = (0, 1, \log(2), 1, 2, \log(3))^T.$$
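A sketch of the corresponding map, under the same column-wise layout as the Cholesky sketch above: only the diagonal entries are transformed, so any real $\theta$ is valid.

```python
import numpy as np

def log_cholesky_theta_to_sigma(theta, n):
    """Same column-wise layout as the Cholesky sketch, but the diagonal
    of L is stored on the log scale, making the map one-to-one."""
    L = np.zeros((n, n))
    k = 0
    for j in range(n):
        for i in range(j + 1):
            L[i, j] = np.exp(theta[k]) if i == j else theta[k]
            k += 1
    return L.T @ L

A = np.array([[1., 1., 1.], [1., 5., 5.], [1., 5., 14.]])
theta = np.array([0., 1., np.log(2.), 1., 2., np.log(3.)])
print(np.allclose(log_cholesky_theta_to_sigma(theta, 3), A))   # True
```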
2.3 Spherical Parameterization

The purpose of this parameterization is to combine the computational efficiency of the Cholesky parameterization with direct interpretation of $\theta$ in terms of the variances and correlations in $\Sigma$.

Let $L_i$ denote the $i$th column of $L$ in the Cholesky factorization of $\Sigma$ and let $l_i$ denote the spherical coordinates of the first $i$ elements of $L_i$. That is,
$$[L_i]_1 = [l_i]_1 \cos([l_i]_2)$$
$$[L_i]_2 = [l_i]_1 \sin([l_i]_2) \cos([l_i]_3)$$
$$\vdots$$
$$[L_i]_{i-1} = [l_i]_1 \sin([l_i]_2) \cdots \cos([l_i]_i)$$
$$[L_i]_i = [l_i]_1 \sin([l_i]_2) \cdots \sin([l_i]_i)$$

It then follows that $\sigma_{ii} = [l_i]_1^2$ and $\rho_{1i} = \cos([l_i]_2)$, $i = 2, \ldots, n$, where $\rho_{ij}$ denotes the correlation coefficient between $X_i$ and $X_j$. The correlations between other variables can be expressed as linear combinations of products of sines and cosines of the elements of $l_1, \ldots, l_n$, but the relationship is not as straightforward as those involving $X_1$.
If confidence intervals are available for the elements of $l_i$, $i = 1, \ldots, n$, then we can also obtain confidence intervals for the variances and the correlations $\rho_{1i}$. By appropriately permuting the rows and columns of $\Sigma$, we can in fact obtain confidence intervals for all the variances and correlations of $X_1, \ldots, X_n$. The exact same reasoning can be applied to derive profile traces and profile contours (Bates and Watts, 1988) for variances and correlations of $X_1, \ldots, X_n$ based on a likelihood function.

To ensure uniqueness of the spherical parameterization we must have
$$[l_i]_1 > 0, \; i = 1, \ldots, n \quad \text{and} \quad [l_i]_j \in (0, \pi), \; i = 2, \ldots, n, \; j = 2, \ldots, i.$$

Unconstrained estimation is obtained by defining $\theta$ as follows:
$$\theta_i = \log([l_i]_1), \quad i = 1, \ldots, n, \quad \text{and}$$
$$\theta_{n+(i-2)(i-1)/2+(j-1)} = \log\left(\frac{[l_i]_j}{\pi - [l_i]_j}\right), \quad i = 2, \ldots, n, \; j = 2, \ldots, i.$$

The spherical parameterization has about the same computational efficiency as the Cholesky and log-Cholesky parameterizations, is uniquely defined, and allows direct interpretability of $\theta$ in terms of the variances and correlations in $\Sigma$.

The spherical parameterization of $A$ is
$$\theta = (0, \log(5)/2, \log(14)/2, -0.608, -0.348, -0.787)^T.$$
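The following sketch implements the spherical map under the conventions above; the angles are recovered from $\theta$ by the inverse of $\log(x/(\pi - x))$, namely $x = \pi/(1 + e^{-\theta})$. The round trip on $A$ and the direct recovery of the variances illustrate the interpretability of $\theta$.

```python
import numpy as np

def spherical_theta_to_sigma(theta, n):
    """Column i+1 of the upper triangular factor L is built from a radius
    exp(theta[i]) and i angles in (0, pi), recovered from the remaining
    components of theta by the inverse of log(x / (pi - x))."""
    L = np.zeros((n, n))
    L[0, 0] = np.exp(theta[0])
    k = n
    for i in range(1, n):
        r = np.exp(theta[i])                              # radius [l_i]_1
        angles = np.pi / (1.0 + np.exp(-np.asarray(theta[k:k + i])))
        k += i
        s = r
        for j in range(i):
            L[j, i] = s * np.cos(angles[j])
            s *= np.sin(angles[j])
        L[i, i] = s
    return L.T @ L

A = np.array([[1., 1., 1.], [1., 5., 5.], [1., 5., 14.]])
theta = np.array([0., np.log(5.)/2, np.log(14.)/2, -0.608, -0.348, -0.787])
print(np.round(spherical_theta_to_sigma(theta, 3), 1))    # ~ A (theta is rounded)
print(np.exp(2 * theta[:3]))                              # variances: 1, 5, 14
```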
2.4 Matrix Logarithm Parameterization

This and the next parameterization are based on the spectral decomposition of $\Sigma$. Because $\Sigma$ is positive definite, it has $n$ positive eigenvalues $\lambda$. Letting $U$ denote the orthogonal matrix of orthonormal eigenvectors of $\Sigma$ and $\Lambda = \operatorname{diag}(\lambda)$, we can write
$$\Sigma = U \Lambda U^T \quad (2.3)$$
By setting
$$L = \Lambda^{1/2} U^T \quad (2.4)$$
in (2.1), where $\Lambda^{1/2}$ denotes the diagonal matrix with $[\Lambda^{1/2}]_{ii} = \sqrt{[\Lambda]_{ii}}$, we get a factorization of $\Sigma$ based on the spectral decomposition.

The matrix logarithm of $\Sigma$ is defined as $\log(\Sigma) = U \log(\Lambda) U^T$, where $\log(\Lambda) = \operatorname{diag}[\log(\lambda)]$. Note that $\Sigma$ and $\log(\Sigma)$ share the same eigenvectors. The matrix $\log(\Sigma)$ can take any value in the space of $n \times n$ symmetric matrices. Letting $\theta$ be equal to its upper triangular elements gives the matrix logarithm parameterization of $\Sigma$.

The matrix logarithm parameterization defines a one-to-one mapping between $\Sigma$ and $\theta$ and therefore does not have the identification problems of the Cholesky factorization. It does involve considerable calculation, as $\theta$ produces $\log(\Sigma)$, whose eigenstructure must be determined before $L$ in (2.4) can be calculated. Similarly to the Cholesky and log-Cholesky parameterizations, the vector $\theta$ in the matrix logarithm parameterization does not have a straightforward interpretation in terms of the original variances and covariances in $\Sigma$. We note that even though the matrix logarithm is based on the spectral decomposition of $\Sigma$, there is not a straightforward relationship between $\theta$ and the eigenvalues and eigenvectors of $\Sigma$.

The matrix logarithm of $A$ is
$$\log(A) = \begin{bmatrix} -0.174 & 0.392 & 0.104 \\ 0.392 & 1.265 & 0.650 \\ 0.104 & 0.650 & 2.492 \end{bmatrix}$$
and therefore the matrix logarithm parameterization of $A$ is
$$\theta = (-0.174, 0.392, 1.265, 0.104, 0.650, 2.492)^T.$$
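A sketch of both directions of this map, using the spectral decomposition as in (2.3); the helper names are ours.

```python
import numpy as np

def matrix_log_theta_to_sigma(theta, n):
    """theta holds the upper triangle (column-wise) of the symmetric matrix
    log(Sigma); Sigma is recovered through the spectral decomposition."""
    M = np.zeros((n, n))
    k = 0
    for j in range(n):
        for i in range(j + 1):
            M[i, j] = M[j, i] = theta[k]
            k += 1
    lam, U = np.linalg.eigh(M)                 # eigenstructure of log(Sigma)
    return U @ np.diag(np.exp(lam)) @ U.T

def sigma_to_matrix_log_theta(sigma):
    lam, U = np.linalg.eigh(sigma)
    logS = U @ np.diag(np.log(lam)) @ U.T      # matrix logarithm of Sigma
    n = sigma.shape[0]
    return np.array([logS[i, j] for j in range(n) for i in range(j + 1)])

A = np.array([[1., 1., 1.], [1., 5., 5.], [1., 5., 14.]])
theta = sigma_to_matrix_log_theta(A)
print(np.round(theta, 3))   # ~ (-0.174, 0.392, 1.265, 0.104, 0.650, 2.492)
print(np.allclose(matrix_log_theta_to_sigma(theta, 3), A))   # True
```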
2.5 Givens Parameterization

The eigenstructure of $\Sigma$ contains valuable information for determining whether some linear combination of $X_1, \ldots, X_n$ could be regarded as nearly constant. This is useful, for example, in model building for mixed-effects models, as near zero eigenvalues may indicate overparameterization (Pinheiro and Bates, 1995). The Givens parameterization uses the eigenvalues of $\Sigma$ directly in the definition of the parameter vector $\theta$.

The Givens parameterization is based on the spectral decomposition of $\Sigma$ given in (2.3) and the fact that the eigenvector matrix $U$ can be represented by $n(n-1)/2$ angles, used to generate a series of Givens rotation matrices (Thisted, 1988) whose product reproduces $U$ as follows:
$$U = G_1 G_2 \cdots G_{n(n-1)/2}, \quad \text{where} \quad
[G_i]_{jk} = \begin{cases}
\cos(\phi_i), & \text{if } j = k = m_1(i) \text{ or } j = k = m_2(i) \\
\sin(\phi_i), & \text{if } j = m_1(i), \; k = m_2(i) \\
-\sin(\phi_i), & \text{if } j = m_2(i), \; k = m_1(i) \\
1, & \text{if } j = k \neq m_1(i) \text{ and } j = k \neq m_2(i) \\
0, & \text{otherwise}
\end{cases}$$
and $m_1(i) < m_2(i)$ are integers taking values in $\{1, \ldots, n\}$ satisfying $i = m_2(i) - m_1(i) + (m_1(i) - 1)(n - m_1(i)/2)$. To ensure uniqueness of the Givens parameterization we must have $\phi_i \in (0, \pi)$, $i = 1, \ldots, n(n-1)/2$.

The spectral decomposition (2.3) is unique up to a reordering of the diagonal elements of $\Lambda$ and columns of $U$, and up to switching of signs in each column of $U$. Uniqueness can be achieved by forcing the eigenvalues to be sorted in ascending order. This can be attained, within an unconstrained estimation framework, by using a parameterization suggested by Jupp (1978) and defining the first $n$ elements of $\theta$ as
$$\theta_i = \log(\lambda_i - \lambda_{i-1}), \quad i = 1, \ldots, n,$$
where $\lambda_i$ denotes the $i$th eigenvalue of $\Sigma$ in ascending order, with the convention that $\lambda_0 = 0$. The remaining elements of $\theta$ in the Givens parameterization are defined in terms of the rotation angles by the relation
$$\theta_{n+i} = \log\left(\frac{\phi_i}{\pi - \phi_i}\right), \quad i = 1, \ldots, n(n-1)/2.$$

The main advantage of this parameterization is that the first $n$ elements of $\theta$ give information about the eigenvalues of $\Sigma$ directly. Another advantage of the Givens parameterization is that it can easily be modified to handle general (not necessarily positive definite) symmetric matrices. The only modification needed is to set $\theta_1 = \lambda_1$ and
$$\lambda_i = \lambda_1 + \sum_{j=2}^{i} \exp(\theta_j), \quad i = 2, \ldots, n.$$

The main disadvantage of this parameterization is that it involves considerable computational effort in the calculation of $\Sigma$ from the parameter vector $\theta$. Another problem with the Givens parameterization is that one cannot relate $\theta$ to the elements of $\Sigma$ in a straightforward manner, so inferences about variances and covariances require indirect methods.

The eigenvector matrix $U$ in (2.3) can also be expressed as a product of a series of Householder reflection matrices (Thisted, 1988), which in turn can be derived from $n(n-1)/2$ parameters used to obtain the directions of the Householder reflections (Pinheiro, 1994). This Householder parameterization is essentially equivalent to the Givens parameterization in terms of statistical interpretability, but it is less efficient, since the derivation of the Householder reflection matrices involves even more computation than the Givens rotations. We have not considered it here.

The Givens parameterization of $A$ is
$$\theta = (-0.275, 0.761, 2.598, -0.265, -0.562, -0.072)^T.$$
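A sketch of the Givens map is given below. The lexicographic ordering of the rotation pairs $(m_1, m_2)$ is one concrete choice consistent with the index relation above, and the published $\theta$ for $A$ depends on such conventions; the first $n$ components, however, are functions of the eigenvalues alone and can be checked directly.

```python
import numpy as np

def givens_theta_to_sigma(theta, n):
    """Ascending eigenvalues from cumulative sums of exp(theta[:n])
    (after Jupp, 1978); eigenvectors from a product of Givens rotations
    whose angles come from theta[n:] via the inverse of log(x/(pi - x))."""
    lam = np.cumsum(np.exp(theta[:n]))
    angles = np.pi / (1.0 + np.exp(-np.asarray(theta[n:])))
    U = np.eye(n)
    k = 0
    for m1 in range(n - 1):                    # pairs (m1, m2), m1 < m2
        for m2 in range(m1 + 1, n):
            G = np.eye(n)
            c, s = np.cos(angles[k]), np.sin(angles[k])
            G[m1, m1] = G[m2, m2] = c
            G[m1, m2], G[m2, m1] = s, -s
            U = U @ G
            k += 1
    return U @ np.diag(lam) @ U.T

A = np.array([[1., 1., 1.], [1., 5., 5.], [1., 5., 14.]])
lam = np.linalg.eigvalsh(A)                           # ascending eigenvalues
print(np.round(np.log(np.diff(lam, prepend=0.)), 3))  # (-0.275, 0.761, 2.598)
# any choice of angles combines with these eigenvalues into a valid Sigma:
theta = np.concatenate([np.log(np.diff(lam, prepend=0.)), [0.3, -0.4, 0.1]])
print(np.allclose(np.linalg.eigvalsh(givens_theta_to_sigma(theta, 3)), lam))  # True
```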


3. COMPARING THE PARAMETERIZATIONS

In this section we compare the parameterizations described in Section 2 in terms of their computational efficiency and the statistical interpretability of the individual parameters.

The computational efficiency of the different parameterizations is assessed by simulation. First we analyze the average time needed to calculate $L(\theta)$ from $\theta$ for each parameterization, for different eigenstructures, and for varying sizes of $\Sigma$. Then we compare the performance of the different parameterizations in computing the maximum likelihood estimate of the variance-covariance matrix in a linear mixed-effects model (Laird and Ware, 1982).

To investigate the effect of the eigenstructure of $\Sigma$ on the computational efficiency of the parameterizations, six different eigenvalue structures, described in Table 1, were considered in the simulation study presented below.

Structure   Eigenvalues
I           $\{1, 1, \ldots, 1, 1\}$
II          $\{1000, 1, 1, \ldots, 1, 1\}$
III         $\{1, 1, \ldots, 1, 0.001\}$
IV          $\{\underbrace{1000, \ldots, 1000}_{n/2}, \; \underbrace{0.001, \ldots, 0.001}_{n/2}\}$
V           $\{1000, 1, \ldots, 1, 0.001\}$
VI          $\{10, 20, 30, \ldots, (n-1) \cdot 10, \; n \cdot 10\}$

Table 1: Different eigenvalue structures for the $n \times n$ matrices $\Sigma$ used in the simulation study.

Random matrices of dimension $n$, for a given eigenvalue structure $(\lambda_1, \ldots, \lambda_n)$, were generated according to the following algorithm (a code sketch follows the list).

1. Select a random $n$-dimensional orthogonal matrix $U$ uniformly on the group of orthogonal matrices, using the algorithm proposed by Anderson, Olkin and Underhill (1987).

2. Generate $n$ independent random variables $X_1, \ldots, X_n$, such that $X_i \sim N(\log(\lambda_i), 0.01)$, and form a diagonal matrix of random eigenvalues, $\Lambda$, with $[\Lambda]_{ii} = \exp(X_i)$. This ensures that the relative variability of the random eigenvalues is the same.

3. Obtain $\Sigma = U \Lambda U^T$.
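A sketch of this generator; for step 1 we substitute the standard QR-based Haar sampler for the Anderson-Olkin-Underhill algorithm, purely for illustration.

```python
import numpy as np

def random_sigma(eigenvalues, rng):
    """Steps 1-3 above, with a QR-based Haar sampler standing in for the
    Anderson-Olkin-Underhill algorithm in step 1."""
    n = len(eigenvalues)
    # 1. random orthogonal matrix, uniform (Haar) on the orthogonal group
    Z = rng.standard_normal((n, n))
    Q, R = np.linalg.qr(Z)
    U = Q * np.sign(np.diag(R))      # sign fix-up so the draw is Haar
    # 2. random eigenvalues: N(log(lambda_i), 0.01) noise on the log scale
    X = np.log(eigenvalues) + 0.1 * rng.standard_normal(n)
    # 3. Sigma = U Lambda U'
    return U @ np.diag(np.exp(X)) @ U.T

rng = np.random.default_rng(0)
S = random_sigma([1000.] + [1.] * 8 + [0.001], rng)   # structure V, n = 10
print(np.round(np.linalg.eigvalsh(S), 4))             # near the target spectrum
```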
To evaluate the average time needed to calculate $L$, we generated, for each of the eigenvalue structures in Table 1, 25 random $n \times n$ matrices according to the above algorithm, with $n$ varying from 6 to 100. For each $\Sigma$ we obtained $\theta$ and recorded the average time to calculate $L$. The time quoted is the time the CPU spent evaluating the user's program for the calculation. Because these user times were too small for accurate evaluation when using matrices of dimension less than 10, we used 5 evaluations of $L$ for each user time calculation. Figure 1 presents the average user time as a function of $n$ for each of the parameterizations of $\Sigma$ and for each of the eigenvalue structures in Table 1.

Figure 1: Average user time to calculate $L$ as a function of $n$, for the different parameterizations and eigenstructures of $\Sigma$. (Panels: Structures I-VI; axes: time in seconds against dimension.)

The computational performances of the parameterizations are essentially the same for all eigenvalue structures considered. The Cholesky, the log-Cholesky, and the spherical parameterizations have similar performances, considerably better than the other two parameterizations. The Cholesky and the log-Cholesky parameterizations have slightly better performances than the spherical parameterization, especially for $n \geq 25$. The matrix logarithm had the worst performance, followed by the Givens parameterization. These results essentially reflect the computational complexity of each parameterization, as described in Section 2.

To compare the different parameterizations in an estimation context, we conducted a small simulation study using the linear mixed-effects model
$$y_i = X_i(\beta + b_i) + \epsilon_i, \quad i = 1, \ldots, M \quad (3.1)$$
where the $b_i$ are independent, identically distributed random effects with common $N(0, \sigma^2 \Sigma)$ distribution and the $\epsilon_i$ are independent, identically distributed error terms with common distribution $N_{n_i}(0, \sigma^2 I)$, independent of the $b_i$, with $n_i$ representing the number of observations on the $i$th cluster. Lindstrom and Bates (1988) have shown that the log-likelihood in (3.1) can be profiled to produce a function of $\theta$ alone. In the simulation, we used $\Sigma$ matrices of dimensions 3 and 6, defined such that the nonzero elements of the $i$th column of the corresponding Cholesky factor were equal to the integers between 1 and $i$. For $n = 3$ we have $\Sigma = A$, as given in (2.2). For $n = 3$ we used $M = 10$, $n_i = 15$, $i = 1, \ldots, 10$, $\sigma^2 = 1$, and $\beta = (10, 1, 2)^T$, while for $n = 6$ we used $M = 50$, $n_i = 25$, $i = 1, \ldots, 50$, $\sigma^2 = 1$, and $\beta = (10, 1, 2, 3, 4, 5)^T$. In both cases, the elements of the first column of $X$ were set equal to 1 and the remaining elements were independently generated according to a $U(1, 20)$ distribution. A total of 300 and 50 samples were generated respectively for $n = 3$ and $n = 6$, and the number of iterations and the user time to calculate the maximum likelihood estimate of $\Sigma$ for each parameterization were recorded.
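Fitting model (3.1) requires the profiled likelihood machinery of Lindstrom and Bates (1988), which is beyond the scope of a short sketch. The smaller example below conveys the same idea: the maximum likelihood estimate of $\Sigma$ from i.i.d. multivariate normal data is obtained by unconstrained optimization over the log-Cholesky parameters, and can be checked against the closed-form sample covariance.

```python
import numpy as np
from scipy.optimize import minimize

def theta_to_sigma(theta, n):
    """Log-Cholesky map of Section 2.2: Sigma = L'L, log-scale diagonal."""
    L = np.zeros((n, n))
    k = 0
    for j in range(n):
        for i in range(j + 1):
            L[i, j] = np.exp(theta[k]) if i == j else theta[k]
            k += 1
    return L.T @ L

def neg_loglik(theta, Y):
    """Negative log-likelihood of i.i.d. N(0, Sigma) rows, up to a constant."""
    N, n = Y.shape
    S = theta_to_sigma(theta, n)
    _, logdet = np.linalg.slogdet(S)
    return 0.5 * (N * logdet + np.trace(Y @ np.linalg.solve(S, Y.T)))

rng = np.random.default_rng(1)
A = np.array([[1., 1., 1.], [1., 5., 5.], [1., 5., 14.]])
Y = rng.multivariate_normal(np.zeros(3), A, size=500)
res = minimize(neg_loglik, x0=np.zeros(6), args=(Y,), method="BFGS")
sigma_hat = theta_to_sigma(res.x, 3)
print(np.abs(sigma_hat - Y.T @ Y / len(Y)).max())   # ~ 0: MLE = sample covariance
```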

Figure 2: Box-plots of user time and number of iterations to convergence for 300 random samples of model (3.1) with $\Sigma$ of dimension 3.

Figure 3: Box-plots of user time and number of iterations to convergence for 50 random samples of model (3.1) with $\Sigma$ of dimension 6.

Figures 2 and 3 present box-plots of the number of iterations and of the user times for the various parameterizations. The Cholesky, the log-Cholesky, the spherical, and the matrix logarithm parameterizations had similar performances for $n = 3$, considerably better than the Givens parameterization. For $n = 6$ the Cholesky and the matrix logarithm parameterizations gave the best performances, followed by the log-Cholesky and spherical parameterizations, all considerably better than the Givens parameterization. Because $\Sigma$ is relatively small in these examples, the numerical complexity of the different parameterizations did not play a major role in their performances. It is interesting to note that even though the matrix logarithm is the least efficient parameterization in terms of numerical complexity, it had the best performance in terms of number of iterations and user time to obtain the maximum likelihood estimate of $\Sigma$, suggesting that this parameterization is the most numerically stable.

Another important aspect in which the parameterizations should be compared is their behavior as $\Sigma$ approaches singularity. All parameterizations described in Section 2 require $\Sigma$ to be positive definite, though the Givens parameterization can be modified to handle general symmetric matrices. It is usually an important statistical issue to test whether $\Sigma$ is of less than full rank, in which case the dimension of the parameter space can be reduced.

As $\Sigma$ approaches singularity its determinant goes to zero, and so at least one of the diagonal elements of its Cholesky factor goes to zero too. The Cholesky parameterization would then become numerically unstable, since equivalent solutions would get closer together in the estimation space. At least one element of $\theta$ in the log-Cholesky parameterization would go to $-\infty$ (the logarithm of the diagonal element of $L$ that goes to zero). In the spherical parameterization we would also have at least one element of $\theta$ going in absolute value to $\infty$: if the first diagonal element of $L$ goes to zero, $\theta_1 \to -\infty$; otherwise at least one angle of the spherical coordinates of the column of $L$ whose diagonal element approaches 0 would either approach 0 or $\pi$, in which cases the corresponding element of $\theta$ would go respectively to $-\infty$ or $\infty$.

Singularity of $\Sigma$ implies that at least one of its eigenvalues is zero. The Givens parameterization would then have at least the first element of $\theta$ going to $-\infty$. To understand what happens with the matrix logarithm parameterization when $\Sigma$ approaches singularity, we note that letting $(\lambda_1, u_1), \ldots, (\lambda_n, u_n)$ represent the eigenvalue-eigenvector pairs of $\Sigma$, we can write $\Sigma = \sum_{i=1}^{n} \lambda_i u_i u_i^T$. As $\lambda_1 \to 0$ all entries of $\log(\Sigma)$ corresponding to nonzero elements of $u_1 u_1^T$ would diverge in absolute value to $\infty$. Hence in the matrix logarithm parameterization we could have all elements of $\theta$ going either to $-\infty$ or $\infty$ as $\Sigma$ approached singularity.
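A brief numerical illustration of the log-Cholesky case (our example, not from the paper):

```python
import numpy as np

# As eps -> 0, Sigma becomes singular and the log-Cholesky parameter for
# the affected diagonal element diverges to -infinity.
for eps in [1e-1, 1e-4, 1e-8]:
    S = np.array([[1., 1.], [1., 1. + eps]])   # nearly rank 1
    L = np.linalg.cholesky(S).T                # upper triangular factor
    print(eps, np.log(L[1, 1]))                # 0.5 * log(eps) -> -infinity
```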
Finally we consider the statistical interpretability of the parameterizations of $\Sigma$. The least interpretable parameterization is the matrix logarithm: none of its elements can be directly related to the individual variances, covariances, or eigenvalues of $\Sigma$. The Cholesky and log-Cholesky parameterizations have the first component directly related to the variance of $X_1$, the first underlying random variable in $\Sigma$. By permuting the order of the random variables in the definition of $\Sigma$, one can derive measures of variability and confidence intervals for all the variances in $\Sigma$ from corresponding quantities obtained for the parameters in the Cholesky or log-Cholesky parameterizations. The Givens parameterization is the only one considered here that uses the eigenvalues of $\Sigma$ directly in the definition of $\theta$. It is a very useful parameterization for identifying ill-conditioning of $\Sigma$. None of its parameters, though, can be directly related to the variances and covariances in $\Sigma$. Finally, the spherical parameterization is the one that gives the largest number of interpretable parameters
of all parameterizations considered here. Measures of variability and confidence intervals for all the variances in $\Sigma$ and the correlations with $X_1$ can be obtained from the corresponding quantities calculated for $\theta$. By permuting the order of the underlying random variables in the definition of $\Sigma$, one can in fact derive measures of variability and confidence intervals for all the variances and correlations in $\Sigma$.

4. CONCLUSIONS

The parameterizations described in Section 2 allow the estimation of variance-covariance matrices using unconstrained optimization. This has numerical and statistical advantages over constrained optimization, since the latter is usually a much harder numerical problem. Furthermore, unconstrained estimates tend to have better inferential properties.

Of the five parameterizations considered here, the spherical parameterization presents the best combination of performance and statistical interpretability of individual parameters. The Cholesky and log-Cholesky parameterizations have comparable performances, similar to the spherical parameterization, but lack direct parameter interpretability. The Givens parameterization is considerably less efficient than these parameterizations, but has the feature of being directly based on the eigenvalues of the variance-covariance matrix. This can be used, for example, to identify nonrandom linear combinations of the underlying random variables. The matrix logarithm parameterization is very inefficient as the dimension of the variance-covariance matrix increases, but seems to be the most stable parameterization. It also lacks direct interpretability of its parameters.

Different parameterizations can be used at different stages of the data analysis. The matrix logarithm parameterization seems to be the most efficient for the optimization step, at least for moderately large $\Sigma$. The spherical parameterization is probably the best one for deriving measures of variability and confidence intervals for the elements of $\Sigma$, while the Givens parameterization is the most convenient for investigating rank deficiency of $\Sigma$.

Only unstructured variance-covariance matrices were considered here, but many situations that involve the optimization of an objective function use structured matrices instead (Jennrich and Schluchter, 1986). It is therefore important to derive parameterizations for structured variance-covariance matrices that allow unconstrained estimation of the associated parameters.

The asymptotic properties of the different parameterizations considered here have not yet been studied and certainly constitute an interesting research topic. It may be that some of the parameterizations give faster rates of convergence to normality than others, and this could be used as a criterion for choosing among them.

ACKNOWLEDGEMENTS

This research was partially supported by the National Science Foundation through grant number DMS-9309101 and by the Coordenação para Aperfeiçoamento de Pessoal de Nível Superior, Brazil. We are grateful to Bruce E. Ankenman for suggesting the Givens parameterization and to two anonymous referees for helpful comments and suggestions.

REFERENCES

Anderson, T. W., Olkin, I. and Underhill, L. G. (1987). Generation of random orthogonal matrices, SIAM Journal on Scientific and Statistical Computing 8(4): 625-629.

Bates, D. M. and Watts, D. G. (1988). Nonlinear Regression Analysis and Its Applications, Wiley, New York.

Dennis, Jr., J. E. and Schnabel, R. B. (1983). Numerical Methods for Unconstrained Optimization and Nonlinear Equations, Prentice-Hall, Englewood Cliffs, NJ.

Jennrich, R. I. and Schluchter, M. D. (1986). Unbalanced repeated measures models with structural covariance matrices, Biometrics 42(4): 805-820.

Jupp, D. L. B. (1978). Approximation to data by splines with free knots, SIAM Journal on Numerical Analysis 15(2): 328-343.

Laird, N. M. and Ware, J. H. (1982). Random-effects models for longitudinal data, Biometrics 38: 963-974.

Leonard, T. and Hsu, J. S. J. (1993). Bayesian inference for a covariance matrix, Annals of Statistics 21: 1-25.

Lindstrom, M. J. and Bates, D. M. (1988). Newton-Raphson and EM algorithms for linear mixed-effects models for repeated-measures data, Journal of the American Statistical Association 83: 1014-1022.

Lindstrom, M. J. and Bates, D. M. (1990). Nonlinear mixed effects models for repeated measures data, Biometrics 46: 673-687.

Pinheiro, J. C. (1994). Topics in Mixed Effects Models, PhD thesis, University of Wisconsin-Madison.

Pinheiro, J. C. and Bates, D. M. (1995). Model building for nonlinear mixed-effects models, Technical Report 91, Department of Biostatistics, University of Wisconsin-Madison.

Rao, C. R. (1973). Linear Statistical Inference and Its Applications, 2nd edn, Wiley, New York.

Thisted, R. A. (1988). Elements of Statistical Computing, Chapman & Hall, London.
