Shrinkage Estimation of the Covariance Matrix

Spreadsheets in Education (eJSiE), Volume 4, Issue 3, Article 6
August 2011
The author wishes to thank Y. Feng and two anonymous reviewers for helpful comments and suggestions.
He also wishes to thank K. Brewer for advice on technical issues pertaining to Visual Basic for Applications
(VBA).
Abstract
Shrinkage estimation of the covariance matrix of asset returns was introduced to the finance profession several years ago. Since then, the approach has also received considerable attention in various life science studies, as a remedial measure for covariance matrix estimation with insufficient observations of the underlying variables. The approach is about taking a weighted average of the sample covariance matrix and a target matrix of the same dimensions. The objective is to reach a weighted average that is closest to the true covariance matrix according to an intuitively appealing criterion. This paper presents, from a pedagogic perspective, an introduction to shrinkage estimation and uses Microsoft Excel™ for its illustration. Further, some related pedagogic issues are discussed and, to enhance the learning experience of students on the topic, some Excel-based exercises are suggested.
Keywords: Shrinkage estimation, sample covariance matrix.
1 Introduction
For a given set of random variables, the corresponding covariance matrix is a symmetric matrix with its diagonal and off-diagonal elements being the individual variances and all pairwise covariances, respectively. When each variable involved is normalized to have a unit variance, such a matrix reduces to a correlation matrix. The usefulness of these matrices for multivariate investigations is well known across various academic disciplines. In finance, for example, the covariance matrix of asset returns is part of the input parameters for portfolio analysis to assist investment decisions. Likewise, in life science research, when statistical techniques are applied to analyze multivariate data from experiments, statistical inference can be made with information deduced from some corresponding covariance and correlation matrices.
As the true values of the individual matrix elements are unknown, they have to be estimated from samples of empirical or experimental data. In finance, for example, historical asset returns are commonly used to estimate the covariance matrix. Under the stationarity assumption of the probability distributions of asset returns, sample estimates of the covariance matrix are straightforward. The computations involved can easily be performed by using some of the built-in matrix functions in Microsoft Excel™.
As explained in Kwan (2010), for a covariance matrix of asset returns to be acceptable for portfolio analysis, it must be positive definite. A positive definite matrix is always invertible, but not vice versa. In the context of portfolio investment, with variance being a measure of risk, an invertible covariance matrix is required for portfolio selection models to reach portfolio allocation results. A positive definite covariance matrix always provides strictly positive variances of portfolio returns, regardless of how investment funds are allocated among the assets considered. This feature ensures not only the presence of portfolio risk, but also the uniqueness of efficient portfolio allocation results as intended.
In case the covariance matrix is estimated with insufficient observations, it will not be positive definite. Thus, for example, to estimate a 100 × 100 covariance matrix of asset returns requires more than 100 monthly return observations, just to ensure that the number of observed returns exceeds the number of the unknown matrix elements. For the estimation errors to be small enough for the sample covariance matrix to be acceptable as part of the input parameters for portfolio analysis, additional observations are required. However, the validity of the stationarity assumption of asset return distributions will become a concern if an overly long sample period is used for the estimation.
The reliance on higher frequency observations over the same sample period, such as the use of weekly or daily asset returns, though effective in easing one's concerns about non-stationary asset return distributions, does have its drawbacks. Higher frequency observations tend to be noisier, thus affecting the quality of the estimated covariance matrix. For covariance matrix estimation that accounts for time-varying volatility of asset returns, some multivariate time series models are available.¹ However, such models are very complicated. Even the simplest versions of such models are well beyond the scope of any investment courses for business students. A challenge for instructors of investment courses that cover empirical issues in covariance matrix estimation, therefore, is whether it is feasible to go beyond sample estimates without using statistical methods that are burdensome for students.
Before seriously contemplating the above challenge, notice that, in life sciences and related fields where multivariate analysis is performed on experimental data, concerns about the adequacy of the sample covariance and correlation matrices are from a very different perspective. As indicated in Dobbin and Simon (2007) and Yao et al. (2008), biological studies often employ only small numbers of observations, because either there are few available experimental data to use or data collection is under budgetary or time constraints. Covariance matrices estimated with insufficient observations are known to be problematic. As explained by Schäfer and Strimmer (2005), in a study of gene association networks using graphical Gaussian models (GGMs), the partial correlation of any two genes, which measures their degree of association after removing the effect of other genes on them, can be deduced from the inverse of a covariance matrix. However, a covariance matrix estimated with insufficient observations is not positive definite and thus is not invertible. The relatively low number of experimental observations has made it necessary for these authors to rely on some remedial measures. The remedial measures as reported by Schäfer and Strimmer (2005), Beerenwinkel et al. (2007), and Yao et al. (2008), as well as some other researchers facing similar challenges, are based on an innovative approach to estimate the covariance matrix, called shrinkage estimation, as introduced to the finance profession by Ledoit and Wolf (2003, 2004a, 2004b).

¹ See, for example, the survey articles by Bauwens, Laurent, and Rombouts (2006) and Silvennoinen and Teräsvirta (2009) for descriptions of various multivariate time series models for covariance matrix estimation.
In investment settings, a weighted average of the sample covariance matrix of asset returns and a structured matrix of the same dimensions is viewed as shrinkage of the sample covariance matrix towards a target matrix. The shrinkage intensity is the weight that the target receives. In the Ledoit-Wolf studies, the alternative targets considered include an identity matrix, a covariance matrix based on the single index model (where the return of each asset is characterized as being linearly dependent on the return of a market index), and a covariance matrix based on the constant correlation model (where the correlation of returns between any two different assets is characterized as being the same). In each case, the corresponding optimal shrinkage intensity has been derived by minimizing an intuitively appealing quadratic loss function.²
For analytical convenience, the Ledoit-Wolf studies have relied on some asymptotic properties of the asset return data in model formulation. Although stationarity of asset return distributions is implicitly assumed, the corresponding analytical results are still based on observations of relatively long time series. Thus, the Ledoit-Wolf shrinkage approach in its original form is not intended to be a remedial measure for insufficient observations. To accommodate each life science case where the number of observations is far fewer than the number of variables involved, Schäfer and Strimmer (2005) have extended the Ledoit-Wolf approach to finite sample settings. The Schäfer-Strimmer study has listed six potential shrinkage targets for covariance and correlation matrices. They include an identity matrix, a covariance matrix based on the constant correlation model, and a diagonal covariance matrix with the individual sample variances being its diagonal elements, as well as three other cases related to these matrices.
The emphasis of the Schäfer-Strimmer shrinkage approach is a special case where the target is a diagonal matrix. Shrinkage estimation of the covariance matrix for this case is relatively simple, from both analytical and computational perspectives. When all variables under consideration are normalized to have unit variances, the same shrinkage approach becomes that for the correlation matrix instead. Analytical complications in the latter case are caused by the fact that normalization of individual variables cannot be based on the true but unknown variances and thus has to be based instead on the sample variances, which inevitably have estimation errors. In order to retain the analytical features pertaining to shrinkage estimation of the covariance matrix, the Schäfer-Strimmer study has assumed away any estimation errors in the variances when the same approach is applied directly to a set of normalized data.

² The word optimal used throughout this paper is in an ex ante context. Whether an analytically determined shrinkage intensity, based on in-sample data, is ex post superior is an empirical issue that can only be assessed with out-of-sample data.
Opgen-Rhein and Strimmer (2006a, 2006b, 2007a, 2007b) have extended the Schäfer-Strimmer approach by introducing a new statistic for gene ranking and by estimating gene association networks in dynamic settings to account for the time path of the data. In view of the analytical simplicity of the Schäfer-Strimmer version of the Ledoit-Wolf shrinkage approach, where the shrinkage target is a diagonal matrix, it has been directly applied to various other settings in life sciences and related fields. Besides the studies by Beerenwinkel et al. (2007) and Yao et al. (2008) as mentioned earlier, Kriegeskorte, Goebel, and Bandettini (2006) have reported that shrinkage estimation with a diagonal target improves the stability of the sample covariance matrix. Dabney and Storey (2007), also with the covariance matrix estimated with shrinkage, have proposed an improved centroid classifier for high-dimensional data and have demonstrated that the new classifier enhances the prediction accuracy for both simulated and actual microarray data. In a study of gene association networks, Tenenhaus et al. (2010) have used the shrinkage GGM approach as one of the major benchmarks to assess partial correlation networks that are based on partial least squares regression.
The research influence of the Ledoit-Wolf shrinkage approach, however, is not confined to life science fields. The approach as reported in Ledoit's working papers well before its journal publications already attracted the attention of other finance researchers. It was among the approaches for risk reduction in large investment portfolios adopted by Jagannathan and Ma (2003). More recently, Disatnik and Benninga (2007) have compared empirically various shrinkage estimators (including portfolios of estimators) of high-dimensional covariance matrices based on monthly stock return data. In an analytical setting, where shrinkage estimation of covariance and correlation matrices are with targets based on the average correlation of asset returns, Kwan (2008) has accounted for estimation errors in all variances when shrinking the sample correlation matrix, thus implicitly allowing the analytical expression of the Schäfer-Strimmer shrinkage intensity to be refined.
In view of the attention that shrinkage estimation has received in the various studies, of particular interest to us educators is whether the topic is now ready for its introduction to the classroom. With the help of Excel tools, this paper shows that it is indeed ready. In order to avoid distractions by analytical complications, this paper has its focus on shrinkage estimation of the covariance matrix, with the target being a diagonal matrix. Specifically, the diagonal elements of the target matrix are the corresponding sample variances of the underlying variables. It is implicit, therefore, that the shrinkage approach here pertains only to the covariances. Readers who are interested in the analytical details of more sophisticated versions of shrinkage estimation can find them directly in Ledoit and Wolf (2003, 2004a, 2004b) and, for extensions to finite sample settings, in Schäfer and Strimmer (2005) and Kwan (2008).
This paper utilizes Excel tools in various ways to help students understand shrinkage estimation better. Before formally introducing optimal shrinkage estimation, we establish in Section 3 that a weighted average of a sample covariance matrix and a structured target, such as a diagonal matrix, is always positive definite. To avoid digressions, analytical support for some materials in Section 3 is provided in Appendix A. An Excel example, with a scroll bar for manually making weight changes, illustrates that, even for a covariance matrix estimated with insufficient observations, a non-zero weight for the target matrix will always result in a positive definite weighted average. The positive definiteness of the resulting matrix is confirmed by the consistently positive sign of its leading principal minors (that is, the determinants of its leading principal submatrices). The Excel function MDETERM, which is for computing the determinants of matrices, is useful for the illustration. As any effects on the leading principal minors due to weight changes are immediately displayed in the worksheet, the idea of shrinkage estimation will become less abstract and more intuitive to students.
Optimal shrinkage is considered next in Section 4, with analytical support provided in Appendix B. As mentioned briefly earlier, the idea is based on minimization of a quadratic loss function. Here, we take a weighted average of the sample covariance matrix, which represents a noisy but unbiased estimate of the true covariance matrix, and a target matrix, which is biased. Loss is defined as the expected value of the sum of all squared deviations of the resulting matrix elements from the corresponding true values. The optimal weighted average is the one that achieves the lowest loss. As there is only one unknown parameter in the quadratic loss function, which is the shrinkage intensity (i.e., the weight that the target matrix receives), its optimal value can easily be found.
We then continue with the same Excel example to illustrate the computational task in shrinkage estimation. In order to accommodate students with different prior experience with Excel, we use alternative ways to compute the optimal shrinkage intensity. The most intuitive approach, which is cumbersome for higher-dimensional cases, is to perform a representative computation for a cell in an Excel worksheet and to use copy-and-paste to duplicate the cell formula elsewhere repeatedly, with minor revisions when required. Students who have at least some rudimentary knowledge of Visual Basic for Applications (VBA) can recognize that the use of a simple user-defined function is more efficient for performing the same computations. As shown in Appendix D, the VBA program here uses three nested For ... Next loops for the intended task. For convenience to users, the function has only two arguments; they are for cell references of the required data for the computations involved. Once defined, the function works just like any built-in Excel functions. Further, to enhance the user-friendliness of the function, a simple Excel Sub procedure that calls the function directly is also included. The procedure uses input boxes for entering cell references of the required data for the computations and the intended location to show the final result.
Section 5 addresses some pedagogic issues and suggests some exercises for students. To enhance the learning experience of students, we consider it important for instructors to assign some discipline-based Excel exercises on shrinkage estimation. Doing so not only will provide students with some hands-on experience with the specific technique, but also will enable them to explore various estimation issues pertaining to the covariance matrix. In addition, some simulation-based exercises are also suggested, with analytical support provided in Appendix C.
As this paper is not meant to provide quick recipes for students to perform shrinkage estimation, it pays special attention to the underlying concepts. It also provides an alternative formulation of the same approach that shrinks the sample correlation matrix towards an identity matrix. To allow students to attempt shrinkage estimation of the correlation matrix, a simplified version that has been used in the various life science studies is suggested. The practical relevance of such an approach, as well as the attendant analytical complications of a more sophisticated version, will also be addressed in Section 5. Finally, Section 6 provides some concluding remarks.
Before proceeding to Section 3, notice that, although this paper is primarily a pedagogic paper, with Excel tools utilized to illustrate shrinkage estimation that has been established elsewhere, it still has a research contribution. As it will become clear in Section 4, the contribution pertains to a refinement in the estimation of the optimal shrinkage intensity, by avoiding an approximation that understates the denominator of its analytical expression.
2 Sample Estimation of the Covariance Matrix

Consider a set of $n$ random variables, labeled as $\tilde{R}_1, \tilde{R}_2, \ldots, \tilde{R}_n$. For each variable $\tilde{R}_i$, where $i = 1, 2, \ldots, n$, we have $T$ observations, labeled as $R_{i1}, R_{i2}, \ldots, R_{iT}$. Each observation $t$ actually consists of the set of observations of $R_{1t}, R_{2t}, \ldots, R_{nt}$, for $t = 1, 2, \ldots, T$. Thus, the set of observations for these random variables can be captured by an $n \times T$ matrix.³ The sample variance of each $\tilde{R}_i$ and the sample covariance of each pair $\tilde{R}_i$ and $\tilde{R}_j$ are

$$s_{ii} = \frac{1}{T-1} \sum_{t=1}^{T} \left(R_{it} - \bar{R}_i\right)^2 \tag{1}$$

and

$$s_{ij} = \frac{1}{T-1} \sum_{t=1}^{T} \left(R_{it} - \bar{R}_i\right)\left(R_{jt} - \bar{R}_j\right), \tag{2}$$

respectively, where $\bar{R}_i = \frac{1}{T}\sum_{t=1}^{T} R_{it}$ and $\bar{R}_j = \frac{1}{T}\sum_{t=1}^{T} R_{jt}$ are the corresponding sample means. Notice that the sample covariance $s_{ii}$ is the same as the sample variance $s_i^2$. The $n \times n$ matrix, with each element $(i,j)$ being $s_{ij}$, for $i, j = 1, 2, \ldots, n$, is the sample covariance matrix, labeled here as $\widehat{V}$. Notice also that $\widehat{V}$ is symmetric, with $s_{ij} = s_{ji}$, for $i, j = 1, 2, \ldots, n$.
3 Shrinkage as a Weighted Average

3.1 Positive Semidefiniteness of the Sample Covariance Matrix

The sample covariance matrix $\widehat{V}$ is positive semidefinite if $x'\widehat{V}x \ge 0$, for any $n$-element column vector $x$, where the prime indicates matrix transposition. For $\widehat{V}$ to be also positive definite, $x'\widehat{V}x$ must be strictly positive for any $x$ with at least one non-zero element. We show in Appendix A that $\widehat{V}$ is always positive semidefinite. For $\widehat{V}$ to be positive definite, some conditions must be satisfied. As shown pedagogically in Kwan (2010), to be positive definite, the sample covariance matrix $\widehat{V}$ must have a positive determinant. We also show in Appendix A that, if $\widehat{V}$ is estimated with insufficient observations (that is, with $T \le n$), its determinant is always zero. If so, it is not positive definite.

Notice that the sample covariance matrix $\widehat{V}$ is not always invertible even if it is estimated with sufficient observations. To ensure its invertibility, the following conditions must hold: First, no $\tilde{R}_i$ can be a constant, as this situation will result in both row $i$ and column $i$ of $\widehat{V}$ being zeros. Second, no $\tilde{R}_i$ can be replicated by a linear combination of any of the remaining $n - 1$ variables. Such a replication will result in row (column) $i$ of $\widehat{V}$ being a linear combination of some other rows (columns), thus causing its determinant to be zero.⁴

³ Here and in what follows, we have assumed that students are already familiar with summation signs and basic matrix operations. For students with inadequate algebraic skills, the materials in this section are best introduced after they have acquired some hands-on experience with Excel functions pertaining to matrix operations.
3.2 A Diagonal Target and the Weighted Average

Now consider a diagonal matrix $\widehat{D}$ of the same dimensions as $\widehat{V}$, with its diagonal elements being the sample variances $s_{11}, s_{22}, \ldots, s_{nn}$. Such a matrix is always positive definite: for any $n$-element column vector $x$ with at least one non-zero element, the matrix product $x'\widehat{D}x$ is always strictly positive. This is because, with $x_i$ being element $i$ of vector $x$, we can write $x'\widehat{D}x$ explicitly as $\sum_{i=1}^{n} x_i^2 s_{ii}$, which is strictly positive, as long as at least one of $x_1, x_2, \ldots, x_n$ is different from zero.

The idea of shrinkage estimation of the covariance matrix is to take a weighted average of $\widehat{V}$ and $\widehat{D}$. With $\lambda$ being the weight assigned to $\widehat{D}$, we can write the weighted average as

$$\widehat{C} = (1 - \lambda)\widehat{V} + \lambda\widehat{D}. \tag{3}$$

For $0 < \lambda \le 1$ and any $x$ with at least one non-zero element, we have

$$x'\widehat{C}x = (1 - \lambda)\,x'\widehat{V}x + \lambda\,x'\widehat{D}x > 0, \tag{4}$$

confirming that $\widehat{C}$ is positive definite.
(4)
shrinkage of all pairwise covariances, simply ignores the existence of any covariances of the random
http://epublications.bond.edu.au/ejsie/vol4/iss3/6
10
< 0 or
< 1: It
will be shown in Section 4 that, under the formulation of quadratic loss minimization, the optimal
shrinkage intensity will always correspond to 0 <
3.3
< 1:
An Excel Example
To illustrate covariance matrix estimation, let us consider a simple Excel example where there are seven random variables ($n = 7$) and six observations ($T = 6$). The example is shown in Figure 1. Although the number of observations is obviously much too low for a meaningful estimation, our purpose here is to illustrate first how insufficient observations, with $T < n$, affect the positive definiteness of the sample covariance matrix. We then illustrate in the same Excel example that a weighted average of the sample covariance matrix and a diagonal matrix (with its diagonal elements containing the corresponding sample variances) is always positive definite.
The $6 \times 7$ block of cells, B3:H8, shows the set of observations, with each column containing the observations for one variable. The mean-removed observations are shown in B14:H19; copy-and-paste operations are also used to generate analogous cell formulas. The individual cell formulas can be found in the supplementary Excel file (shrink.xls). The $7 \times 7$ sample covariance matrix is shown in B22:H28. Notice that, to perform MMULT, as well as other matrix operations, the destination cells must first be selected, with the formula then entered as an array formula (via Ctrl+Shift+Enter). Leading principal minor $i$ of an $n \times n$ matrix is the determinant of the submatrix containing its first $i$ rows and its first $i$ columns, for $i = 1, 2, \ldots, n$. Thus, there are seven leading principal minors in a $7 \times 7$ matrix. As illustrated in Kwan (2010), all leading principal minors of a positive definite matrix are positive; conversely, a matrix with all positive leading principal minors is positive definite.
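For instance, with the mean-removed observations in B14:H19, the sample covariance matrix in B22:H28 can be generated in one step. The exact formulas used are in shrink.xls; a formula consistent with the layout described would be

    =MMULT(TRANSPOSE(B14:H19), B14:H19)/5

entered into the $7 \times 7$ destination range B22:H28 as an array formula, with the divisor $5$ being $T - 1$.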
[Figure 1: An Excel worksheet for the example, showing the 6 × 7 block of observations (Obs\Var, B3:H8), the mean-removed observations, the column means, the sample covariance matrix (CovMat, B22:H28) with its leading principal minors (LeadPrMin), a scroll bar (ScrollBar) setting the shrinkage intensity (ShrinkInt) to 0.200 (scroll-bar value 2000, from 0 to 10000), and the covariance matrix after shrinkage with its leading principal minors.]
As Excel has a function (MDETERM) for computing the determinant, it is easy to find all leading principal minors of a given matrix. A simple way is to use cut-and-paste operations for the task. With the formula for cell J22, which is =MDETERM($B$22:B22), first pasted to cells K23, L24, M25, N26, O27, and P28 diagonally, we can subsequently move these cells back to column J to allow J22:J28 to contain all seven leading principal minors. Not surprisingly, the first five leading principal minors are all positive, indicating that the $5 \times 5$ sample covariance matrix of the first five random variables ($n = 5$) has been estimated with sufficient observations ($T = 6$). When the remaining variables are added successively, we expect the determinants of the corresponding $6 \times 6$ and $7 \times 7$ matrices to be zero. However, due to rounding errors in the computations, two small non-zero values are reached instead. As the product of the seven sample variances in the example is about $8.66 \times 10^{12}$, the computed values of about $1.74 \times 10^{-8}$ and $2.37 \times 10^{-22}$ are, in relative terms, practically zeros.
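For students who prefer to avoid the cut-and-paste chore, a small user-defined function can return any leading principal minor directly. The sketch below is our own (the name LEADPRMIN is not part of the paper's supplementary file); it simply applies Excel's MDETERM to the upper-left k × k block of a given range:

    ' Returns leading principal minor k of the square matrix stored in mat,
    ' i.e., the determinant of the submatrix of its first k rows and k columns.
    Function LEADPRMIN(mat As Range, k As Integer) As Double
        LEADPRMIN = Application.WorksheetFunction.MDeterm(mat.Resize(k, k))
    End Function

For example, =LEADPRMIN($B$22:$H$28, 3) returns the third leading principal minor of the sample covariance matrix.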
To illustrate the effect of shrinkage, we insert a scroll bar to the worksheet from the Insert tab of the Developer menu. With the scroll bar in place, we can adjust the shrinkage intensity manually and observe the corresponding changes to the estimated covariance matrix and the seven leading principal minors. For example, with $\lambda = 0.200$, the seven leading principal minors are all positive, illustrating that shrinkage is effective as a remedial measure for insufficient observations in sample estimation of the covariance matrix.
4 Optimal Shrinkage

For an $n \times n$ covariance matrix, there are $n(n-1)/2$ distinct covariances in total. To introduce the idea of optimal shrinkage, let us start with a simple case where shrinkage is confined to only one of these covariances. Let $s_{ij}$ be the sample covariance of variables $i$ and $j$, and let $\sigma_{ij}$ be the corresponding true covariance. Shrinking $s_{ij}$ towards zero, with weights $\lambda$ and $1 - \lambda$ assigned to the target of zero and to $s_{ij}$, respectively, the squared deviation of the result from the true covariance, which is $[(1-\lambda)s_{ij} - \sigma_{ij}]^2$, can be viewed as a loss. With $s_{ij}$ being a random variable, we are interested in finding a particular $\lambda$ that provides the lowest expected value of the squared deviation, labeled as $E\{[(1-\lambda)s_{ij} - \sigma_{ij}]^2\}$. Here, $E$ is the expected value operator, with $E(\cdot)$ indicating the expected value of the random variable $(\cdot)$ in question. Notice that the reliance on quadratic loss minimization is quite common because of its analytical convenience. A well-known example is linear regression, where the best fit according to the ordinary-least-squares approach is that the sum of squared deviations of the observations from the fitted line is minimized.
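For this single-covariance case, the minimization can be carried out directly; the steps below, a special case of the derivation in Appendix B, rely only on $E(s_{ij}) = \sigma_{ij}$:

$$E\{[(1-\lambda)s_{ij} - \sigma_{ij}]^2\} = (1-\lambda)^2\,Var(s_{ij}) + \lambda^2 \sigma_{ij}^2,$$

which, upon setting its derivative with respect to $\lambda$ to zero, is minimized at

$$\lambda^\ast = \frac{Var(s_{ij})}{Var(s_{ij}) + \sigma_{ij}^2}.$$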
The same idea can be extended to account for all individual covariances. As the covariance matrix is symmetric, we only have to consider the $n(n-1)/2$ covariances in its upper triangle, where $j > i$ (or, equivalently, in its lower triangle, where $j < i$). Analytically, we seek a common weighting factor $\lambda$ that minimizes the sum of the expected squared deviations, with each being $E\{[(1-\lambda)s_{ij} - \sigma_{ij}]^2\}$, over all $j > i$.
As shown in Appendix B, the optimal shrinkage intensity based on minimization of the loss function is

$$\lambda^\ast = \frac{\sum_{j>i} Var(s_{ij})}{\sum_{j>i} \left[Var(s_{ij}) + \sigma_{ij}^2\right]}. \tag{5}$$

Notice that the variance $Var(\cdot)$ of any random variable $(\cdot)$, defined as $E\{[(\cdot) - E(\cdot)]^2\}$, can also be written as $E[(\cdot)^2] - [E(\cdot)]^2$. With $E(s_{ij}) = \sigma_{ij}$, equation (5) is equivalent to

$$\lambda^\ast = \frac{\sum_{j>i} Var(s_{ij})}{\sum_{j>i} E(s_{ij}^2)}. \tag{6}$$
Before addressing the estimation issues, notice that both the numerator and the denominator in the expression of $\lambda^\ast$ in equation (5) are positive. With the denominator being the greater of the two, we must have $0 < \lambda^\ast < 1$ as intended. This analytical feature ensures positive weights on both the sample covariance matrix and the diagonal target. It also ensures that the resulting covariance matrix be positive definite, as illustrated in Section 3.
4.1 Estimation Issues

The estimation of $Var(s_{ij})$ can follow the same approach as described in Kwan (2009), which draws on Schäfer and Strimmer (2005). With $w_{ijt} = (R_{it} - \bar{R}_i)(R_{jt} - \bar{R}_j)$, let

$$\bar{w}_{ij} = \frac{1}{T} \sum_{t=1}^{T} (R_{it} - \bar{R}_i)(R_{jt} - \bar{R}_j), \tag{7}$$

so that

$$s_{ij} = \frac{T}{T-1}\,\bar{w}_{ij}. \tag{8}$$

In view of equation (8), the sampling variances of $s_{ij}$ and $\bar{w}_{ij}$, labeled as $\widehat{Var}(s_{ij})$ and $\widehat{Var}(\bar{w}_{ij})$, are related by

$$\widehat{Var}(s_{ij}) = \frac{T^2}{(T-1)^2}\,\widehat{Var}(\bar{w}_{ij}). \tag{9}$$

As the distribution of the sample mean of a random variable based on $T$ observations has a sampling variance that is only $1/T$ of the sampling variance of the variable, it follows from equation (9) that⁵

$$\widehat{Var}(s_{ij}) = \frac{T^2}{(T-1)^2} \cdot \frac{1}{T} \cdot \frac{1}{T-1} \sum_{t=1}^{T} (w_{ijt} - \bar{w}_{ij})^2 = \frac{T}{(T-1)^3} \sum_{t=1}^{T} (w_{ijt} - \bar{w}_{ij})^2. \tag{10}$$

This equation allows each of the variance terms, $Var(s_{ij})$, in equation (5) to be estimated.

⁵ See, for example, Kwan (2009) for a pedagogic illustration of this statistical concept.
In various studies involving the shrinkage of the sample covariance matrix towards a diagonal target, including those based on the Schäfer-Strimmer approach as referenced in Section 1, each $E(s_{ij}^2)$ in equation (6) has been approximated directly by the square of the corresponding point estimate $s_{ij}$. However, recall that $E(s_{ij}^2) = Var(s_{ij}) + [E(s_{ij})]^2$. Such an approximation, which implicitly assumes the equality of $E(s_{ij}^2)$ and $[E(s_{ij})]^2$, has the effect of understating the denominator in the expression of the shrinkage intensity.

To avoid the above bias, this paper stays with equation (5) instead for any subsequent computations. With $E(s_{ij}) = \sigma_{ij}$, each $\sigma_{ij}^2$ in equation (5) can be approximated by the corresponding $s_{ij}^2$. Letting $\alpha$ be the estimated $\sum_{j>i} Var(s_{ij})$ and $\beta = \sum_{j>i} s_{ij}^2$, the estimated shrinkage intensity according to equation (5) is $\widehat{\lambda} = \alpha/(\alpha + \beta)$, whereas the approximation underlying equation (6) leads to $\widetilde{\lambda} = \alpha/\beta$ instead. As $1/\widehat{\lambda} = 1 + \beta/\alpha = 1 + 1/\widetilde{\lambda}$, it follows that $\widehat{\lambda} < \widetilde{\lambda}$; that is, the approximation always overstates the shrinkage intensity.

4.2 An Excel Example
The same Excel example in Figure 1 is continued in Figure 2. We now illustrate how equation (5) can be used to determine the optimal shrinkage intensity. For this task, we first scale the sample covariance matrix in B22:H28 by the factor $(T-1)/T$ to obtain the $7 \times 7$ matrix of $\bar{w}_{ij}$, shown in B46:H52. Although it suffices to consider only the $n(n-1)/2\ (= 21)$ cases where $j > i$ for computational efficiency, symmetry of the covariance matrix allows us to reach the same shrinkage intensity by considering all $42$ cases of $j \ne i$ instead. As $w_{ijt} = w_{jit}$, for $i, j = 1, 2, \ldots, 7$ and $t = 1, 2, \ldots, 6$, displaying all $42$ cases explicitly will enable us to recognize readily any errors in subsequent cell formulas pertaining to the individual $w_{ijt}$. The individual cases of $(w_{ijt} - \bar{w}_{ij})^2$ are also displayed in the worksheet; notice that each column of six cells there contains $(w_{ijt} - \bar{w}_{ij})^2$, for $t = 1, 2, \ldots, 6$. The cell formula for the optimal shrinkage intensity is =A112/(A112+A115), as stored in A119. The covariance matrix after shrinkage and its leading principal minors, which are shown in B123:H129 and J123:J129, respectively, can be established in the same manner as those in B35:H41 and J35:J41. As expected, the seven leading principal minors are all positive; the covariance matrix after shrinkage is positive definite, notwithstanding the fact that the sample covariance matrix itself is based on insufficient observations.
[Figure 2: Continuation of the Excel worksheet, showing the matrix of w̄ij (B46:H52), the squared covariances (SqCov, B54:H60), the squares of the mean-removed w_ijt for each variable (B62:H109), the sum of all estimated Var(s_ij) (25786.91, in A112), the sum of all s_ij squared (66802.72, in A115), the optimal shrinkage intensity of 0.278508 computed three ways (using the above results in A119, using the user-defined function SHRINK in D119, and using a macro with shortcut Ctrl+s in G119), and the covariance matrix after shrinkage with its leading principal minors (LeadPrMin).]
Although the computations as illustrated above are straightforward, the size of the worksheet will increase drastically if the sample covariance matrix is based on many more observations. The increase is due to the need for a total of $n^2 T$ cells to store the individual values of $(w_{ijt} - \bar{w}_{ij})^2$, for $j \ne i$. A user-defined function with three nested loops can instead accumulate each $(w_{ijt} - \bar{w}_{ij})^2$ as the computations proceed, without storing the individual values. For each case of $i$, we vary $j$ from $i + 1$ to $n$, which is $7$ in the example. For each case of $j$, we vary $t$ from $1$ to $T$, which is $6$ in the example. Starting with $i = 1$, $j = i + 1 = 2$, and $t = 1$, we add $(w_{ijt} - \bar{w}_{ij})^2 = (w_{121} - \bar{w}_{12})^2$ to the initial sum, which is zero, thus resulting in a cumulative sum of $(w_{121} - \bar{w}_{12})^2$. The subsequent terms to be accumulated to the sum are, successively, $(w_{122} - \bar{w}_{12})^2$, $(w_{123} - \bar{w}_{12})^2, \ldots, (w_{126} - \bar{w}_{12})^2$, followed by the corresponding terms for the remaining cases of $j$ and then of $i$.
We can use two of the above three nested loops in the same user-defined function to compute the sum of $s_{ij}^2$, for $i, j = 1, 2, \ldots, n$ and $i \ne j$, as well. The idea is to have a separate cumulative sum, which is also initialized to be zero in the computer memory. By going through all cases of $i$ from $1$ to $n - 1$ and, for each $i$, all cases of $j$ from $i + 1$ to $n$, the cumulative sum will cover all $n(n-1)/2$ cases of $s_{ij}^2$. With such a function in place, there will be no need for B54:H60 and B62:H109 in the worksheet to display the individual values of $s_{ij}^2$ and $(w_{ijt} - \bar{w}_{ij})^2$, respectively.
The VBA code of this user-defined function, named SHRINK, as described above, is provided in Appendix D. The cell formula for D119, which uses the function, is =SHRINK(B14:H19,B46:H52). The two arguments of the function are the cell references of the $T \times n$ matrix of mean-removed observations and the $n \times n$ matrix with each element $(i,j)$ being $\bar{w}_{ij}$. In essence, by retrieving the row numbers and the column numbers of the stored data in the worksheet, we are able to use the information there to provide the cumulative sums of $s_{ij}^2$ and $(w_{ijt} - \bar{w}_{ij})^2$. As expected, the results based on the two alternative approaches, as shown in A119 and D119, are the same.
A more intuitive way to call the function SHRINK is to use a Sub procedure, which is an Excel macro defined by the user, that allows the user to provide, via three input boxes, the cells for the two arguments as required for the function to work, as well as the cell for displaying the end result. The VBA code of this Sub procedure is also provided in Appendix D. Again, as expected, the end result as shown in G119 is the same as those in A119 and D119.
5 Pedagogic Issues and Suggested Exercises

In this paper, we have illustrated that shrinkage estimation for potentially improving the quality of the sample covariance matrix can be covered in classes where estimation issues of the covariance matrix are taught. From a pedagogic perspective, it is useful to address the issue of estimation errors in the sample covariance matrix before introducing shrinkage estimation. Students ought to be made aware of the fact that the sample covariance matrix is only an estimate of the true but unknown covariance matrix. As each of the sample variances and covariances of the set of random variables in question is itself a sample statistic, knowledge of the statistical concept of sampling distribution will enable students to appreciate more fully what $Var(s_{ij})$ is all about. Simply stated, the sampling distribution is the probability distribution of a sample statistic. By recognizing that each $s_{ij}$ is a sample statistic, students will also recognize that it has a distribution and that the corresponding $Var(s_{ij})$ represents the variance (the second central moment) of that distribution.
For a given set of random variables, how well each sample covariance $s_{ij}$ is estimated can be assessed in terms of its standard error, which is the square root of the sampling variance that $\widehat{Var}(s_{ij})$ represents, relative to the point estimate itself. In the worksheet in Figure 2, we have $s_{12} = 47.4$. According to equation (10), the sampling variance of $s_{12}$, which is $\widehat{Var}(s_{12})$, can be computed by multiplying the sum of the six cells in C62:C67 by the factor $T/(T-1)^3$, where $T = 6$. The standard error of $s_{12}$, which is $\sqrt{\widehat{Var}(s_{12})} = \sqrt{1293.29} = 35.96$, is quite large relative to the point estimate of $s_{12} = 47.4$. Given that the covariance estimation here is based on only six observations, a large estimation error is hardly a surprise. However, this example does illustrate the presence of estimation errors in the sample covariance matrix.
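To make the arithmetic explicit (the sum of the six cells in C62:C67 is 26943.5 in the worksheet):

$$\widehat{Var}(s_{12}) = \frac{T}{(T-1)^3} \sum_{t=1}^{6} (w_{12t} - \bar{w}_{12})^2 = \frac{6}{125} \times 26943.5 \approx 1293.29,$$

so that the standard error, $\sqrt{1293.29} \approx 35.96$, is about $76\%$ of the point estimate $s_{12} = 47.4$.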
To enhance the learning experience of students, some exercises are suggested below. The first two types of exercises are intended to help students recognize, in a greater depth, the presence of errors in covariance matrix estimation and the impact of various factors on the magnitudes of such errors. The third type of exercises extends the shrinkage approach to estimating the correlation matrix, which is relevant when the underlying variables are in different measurement units, as often encountered in life science studies.
5.1 Exercises Based on Actual Data

Students can benefit greatly from covariance matrix estimation by utilizing actual observations that are relevant in their fields of studies. Depending on the fields involved, the observations can be in the form of empirical or experimental data. In finance, for example, stock returns can be generated from publicly available stock price and dividend data. Small-scale exercises for students, such as those involving the sample covariance matrix of monthly or weekly returns of the 30 Dow Jones stocks or some subsets of such stocks, estimated with a few years of monthly observations, ought to be manageable for students. The size of the corresponding Excel worksheet for each exercise will be drastically reduced if a user-defined function, similar to the function SHRINK above, is written for the computations of the sampling variances of $s_{ij}$, for $i, j = 1, 2, \ldots, n$, with $n \le 30$; a sketch of such a function follows this paragraph. From such exercises, students will have hands-on experience with how the sampling variance that each $\widehat{Var}(s_{ij})$ represents tends to vary as the number of observations for the estimation increases.
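The function below is our own sketch of such an exercise aid (the name VARSIJ and its argument layout are assumptions, not part of the paper's supplementary file). It estimates Var(s_ij) for one pair (i, j) via equation (10), directly from the range of mean-removed observations:

    ' Estimates Var(s_ij) per equation (10), where nvo is the T-by-n range of
    ' mean-removed observations and i, j are column indices within that range.
    Function VARSIJ(nvo As Range, i As Integer, j As Integer) As Double
        Dim nobs As Integer, t As Integer
        Dim wbar As Double, dev As Double, devsq As Double
        nobs = nvo.Rows.Count
        ' First pass: the mean of w_ijt over t, with w_ijt being the product
        ' of the mean-removed observations t of variables i and j.
        For t = 1 To nobs
            wbar = wbar + nvo.Cells(t, i).Value * nvo.Cells(t, j).Value
        Next t
        wbar = wbar / nobs
        ' Second pass: the sum of squared deviations of w_ijt from its mean.
        For t = 1 To nobs
            dev = nvo.Cells(t, i).Value * nvo.Cells(t, j).Value - wbar
            devsq = devsq + dev * dev
        Next t
        ' Equation (10): T/(T-1)^3 times the sum of squared deviations.
        VARSIJ = nobs * devsq / (nobs - 1) ^ 3
    End Function

For the example in Figure 2, =VARSIJ(B14:H19, 1, 2) would return 1293.29, the sampling variance of $s_{12}$ computed earlier.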
5.2
In view of concerns about the non-stationarity of the underlying probability distributions in empirical studies or budgetary constraints for making experimental observations, the reliance on simulations, though tending to be tedious, is a viable way to generate an abundant amount of usable
data for pedagogic purposes. An attractive feature of using simulated data is that we can focus
on some specic issues without the encumbrance of other confounding factors. For example, with
everything else being the same, the more observations we have, the closer tends to be between the
sample covariance matrix and the true one. The use of a wide range of numbers of random draws
will allow us to assess how well shrinkage estimation really helps in small to large sample situations.
Further, simulated data are useful for examining issues of whether shrinkage estimation is more
eective or less eective if the underlying variables are highly correlated, or if the magnitudes of
the variances of the underlying variables are highly divergent.
To illustrate the use of simulations in the context of covariance matrix estimation, consider a four-variable case, where the true covariance matrix is

$$V = \begin{bmatrix} 36 & 12 & 24 & 36 \\ 12 & 64 & 32 & 24 \\ 24 & 32 & 100 & 48 \\ 36 & 24 & 48 & 144 \end{bmatrix}, \tag{11}$$

with each element $(i,j)$ being $\sigma_{ij}$, for $i, j = 1, 2, 3, 4$. The positive definiteness of $V$ can easily be confirmed with Excel, by verifying the positive sign of each of its four leading principal minors. The analytical details pertaining to the generation of simulated observations for estimating $V$, from the underlying probability distribution, are provided in Appendix C.
In this illustration, where $n = 4$, we consider $T = 5, 6, 7, \ldots, 50$. The lowest $T$ is $5$, because it is the minimum number of observations to ensure that the $4 \times 4$ sample covariance matrix is positive definite. For each $T$, the random draws to generate a set of simulated observations $R_{it}$, for $i = 1, 2, 3, 4$ and $t = 1, 2, \ldots, T$, and the corresponding sample covariance matrix $\widehat{V}$ are repeated $100$ times. The average of the $100$ values of each estimated covariance $s_{ij}$ is taken, for all $j > i$. Let $\bar{s}_{12}, \bar{s}_{13}, \bar{s}_{14}, \bar{s}_{23}, \bar{s}_{24}$, and $\bar{s}_{34}$ be such averages.

For each value of the shrinkage intensity $\lambda$ considered, we compute the difference between $|\bar{s}_{ij} - \sigma_{ij}|$ and $|(1-\lambda)\bar{s}_{ij} - \sigma_{ij}|$, for all $j > i$. We then take the average of all such differences, denoted in general by $D$:

$$D = \frac{\sum_{j>i} \left( |\bar{s}_{ij} - \sigma_{ij}| - |(1-\lambda)\bar{s}_{ij} - \sigma_{ij}| \right)}{n(n-1)/2}. \tag{12}$$

Here, $n(n-1)/2$ is the number of covariances in the upper triangle of the covariance matrix. This average can be interpreted as the difference in the mean absolute deviations for the two estimation methods, with a positive $D$ indicating an improvement by shrinkage estimation.
Figure 3 shows $D$ for three shrinkage intensities over the range of $T$ considered. Particularly with low numbers of observations, a higher shrinkage intensity will tend to result in a greater improvement in the estimated covariances.
[Figure 3: Differences in Mean Absolute Deviations from the True Covariances for Three Shrinkage Intensities from a Simulation Study, with a Positive Difference Indicating an Improvement by Shrinkage Estimation. Horizontal axis: Number of Observations (0 to 50).]

5.3 Shrinkage Estimation of the Correlation Matrix

For analytical convenience, the pedagogic illustration in this paper has been confined to shrinkage estimation of the sample covariance matrix towards a diagonal target matrix with its diagonal elements being the corresponding sample variances. An extension of the same approach to shrinkage estimation of the correlation matrix towards an identity matrix is simple, provided that the simplifying assumption in Schäfer and Strimmer (2005) is also imposed; that is, estimation errors in all sample variances are ignored.
The idea is as follows. If we start with a set of $n$ random variables, $\tilde{R}_1/s_1, \tilde{R}_2/s_2, \ldots, \tilde{R}_n/s_n$, that is, if each of the $n$ original random variables is normalized by the corresponding sample standard deviation $s_i$, the resulting sample covariance matrix is the same as the sample correlation matrix of the original random variables. For this set of normalized random variables, shrinkage estimation of its covariance matrix towards a diagonal target matrix is equivalent to shrinking the sample correlation matrix of the original random variables, $\tilde{R}_1, \tilde{R}_2, \ldots, \tilde{R}_n$, towards an identity matrix.
For shrinkage estimation of the correlation matrix instead, equation (5) can be written as

$$\lambda^\ast = \frac{\sum_{j>i} Var(r_{ij})}{\sum_{j>i} \left[Var(r_{ij}) + \rho_{ij}^2\right]}, \tag{13}$$

where $r_{ij} = s_{ij}/(s_i s_j)$ is the sample correlation of the original random variables $i$ and $j$, with $\rho_{ij}$ being the corresponding true correlation. If the sample variances are treated as being without estimation errors, as in the Schäfer-Strimmer study, each $Var(r_{ij})$ in equation (13) is simply $Var(s_{ij})$, divided by the product of the sample estimates $s_{ii}$ and $s_{jj}$. Likewise, each $\rho_{ij}^2$ can be approximated by $s_{ij}^2/(s_{ii} s_{jj})$, the square of the sample estimate $s_{ij}$, also divided by the product of the sample estimates $s_{ii}$ and $s_{jj}$. In view of the simplicity of this revised formulation of shrinkage estimation, its small-scale implementation with actual observations is also suitable as an exercise for students; a sketch of a corresponding user-defined function is provided below. However, to relax the above assumption by recognizing the presence of estimation errors in the individual sample variances is a tedious analytical exercise.⁶
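As an illustration of this simplified formulation, the function below is our own variant of the SHRINK function in Appendix D (the name SHRINKCORR is an assumption, not part of the paper's supplementary file). Each pairwise term is divided by the product of the diagonal elements w̄ii and w̄jj, which is proportional to s_ii s_jj; as the common proportionality factor cancels across all pairs, the ratio implements equation (13) under the error-free-variance assumption:

    ' A sketch of equation (13), assuming error-free sample variances.
    ' nvo: the T-by-n range of mean-removed observations;
    ' wijbar1: the n-by-n range (in the same columns as nvo, as for SHRINK)
    ' with each element (i, j) being w_ij bar.
    Function SHRINKCORR(nvo As Range, wijbar1 As Range) As Double
        Dim nvar As Integer, nobs As Integer, mrr As Integer
        Dim mrc As Integer, wijbar As Integer
        Dim i As Integer, j As Integer, t As Integer
        Dim d As Double, s1 As Double, sum1 As Double
        Dim s2 As Double, sumt As Double, sum2 As Double
        nvar = nvo.Columns.Count
        nobs = nvo.Rows.Count
        mrr = nvo.Row - 1
        mrc = nvo.Column - 1
        wijbar = wijbar1.Row - 1
        For i = 1 To nvar - 1
            For j = i + 1 To nvar
                ' Scaling factor proportional to s_ii * s_jj.
                d = Cells(wijbar + i, mrc + i).Value _
                    * Cells(wijbar + j, mrc + j).Value
                s1 = Cells(wijbar + i, mrc + j).Value
                sum1 = sum1 + s1 * s1 / d
                sumt = 0
                For t = 1 To nobs
                    s2 = Cells(mrr + t, mrc + i).Value _
                        * Cells(mrr + t, mrc + j).Value _
                        - Cells(wijbar + i, mrc + j).Value
                    sumt = sumt + s2 * s2
                Next t
                sum2 = sum2 + sumt / d
            Next j
        Next i
        SHRINKCORR = sum2 / (sum2 + sum1 * nobs * (nobs - 1))
    End Function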
Why is shrinkage estimation of the sample correlation matrix relevant in practice? In portfolio investment settings, for example, to generate input data for portfolio analysis to guide investment decisions, if the individual expected returns and variances of returns are, in whole or in part, based on the insights of the security analysts involved, the correlation matrix is the remaining input whose estimation requires historical return data. If so, although the correlation matrix can still be deduced from the shrinkage results of the covariance matrix, to estimate the correlation matrix instead is more direct.

⁶ Besides the issue of estimation errors in the sample variances that already complicates the estimation of $Var(r_{ij})$, there is a further analytical issue. As indicated in Zimmerman, Zumbo, and Williams (2003), the sample correlation provides a biased estimate of the true but unknown correlation. However, as Olkin and Pratt (1958) show, the bias can easily be corrected if the underlying random variables are normally distributed. The correction for bias will make equation (13) more complicated. See, for example, Kwan (2008, 2009) for analytical details pertaining to the above issues.
Another justification for directly shrinking the sample correlation matrix does not apply to portfolio investment settings. Rather, it pertains to experimental settings, such as those in the various life science studies, where different measurement units are used for the underlying variables. Measurement units inevitably affect the magnitudes of the elements in each sample covariance matrix. When a quadratic loss function is used to determine the optimal shrinkage intensity, sample covariances with larger magnitudes tend to receive greater attention. Thus, to avoid the undesirable effects of the choice of measurement units on the optimal shrinkage results, it becomes necessary to normalize the random variables involved. However, recognizing the estimation errors in the sample variances used for the normalization will also lead to analytical complications. Nevertheless, such issues ought to be mentioned when shrinkage estimation is introduced to the classroom.
6 Concluding Remarks
This paper has illustrated a novel approach, called shrinkage estimation, for potentially improving the quality of the sample covariance matrix for a given set of random variables. The approach, which was introduced to the finance profession, including its practitioners, only a few years ago, has also received considerable attention in some life science fields where invertible covariance and correlation matrices are used to analyze multivariate experimental data. Although the implementation of the approach can be based on various analytical formulations, with some formulations being analytically cumbersome, the two most common versions as reported in various life science studies are surprisingly simple. Specifically, in one version, an optimal weighted average is sought between the sample covariance matrix and a diagonal matrix with the diagonal elements being the corresponding sample variances. The other version involves the sample correlation matrix and an identity matrix of the same dimensions instead, under some simplifying assumptions.

This paper has considered, from a pedagogic perspective, the former version, which involves the sample covariance matrix. In order to understand shrinkage estimation properly, even for such a simple version, an important concept for students to have is that the sample covariance matrix, which is estimated with observations of the random variables considered, is subject to estimation errors. Once students are aware of this statistical feature and know its underlying reason, they can understand why a sample covariance of two random variables is a sample statistic and what the sampling variance of such a statistic represents.
An Excel-based illustration of shrinkage estimation will allow students to follow the computational steps involved, thus facilitating a better understanding of the underlying principle of the approach.
The role of Excel in this pedagogic illustration is indeed important. As all computational results are displayed on the worksheets involved, students can immediately see how shrinkage estimation improves the quality of the sample covariance matrix. For example, in cases where estimations are based on insufficient observations, students can easily recognize, from the displayed values of the leading principal minors, that the corresponding sample covariance matrix is problematic. They can also recognize that shrinkage estimation is a viable remedial measure. What is attractive about using Excel for illustrative purposes is that, as students will not be distracted by the attendant computational chores, they can focus on understanding the shrinkage approach itself.
As having hands-on experience is important for students to appreciate better what shrinkage estimation can do to improve the quality of the sample covariance matrix, it is useful to assign relevant Excel-based exercises to students. In investment courses, for example, shrinkage estimation of the covariance matrix of asset returns can be in the form of some stand-alone exercises for students. It can also be part of a project for students that compares portfolio investment decisions based on different characterizations of the covariance matrix. From the classroom experience of the author as an instructor of investment courses, the hands-on experience that students have acquired from Excel-based exercises is indeed valuable. Such hands-on experience has enabled students not only to be more proficient in various Excel skills, but also to understand the corresponding course materials better. This paper, which has provided a pedagogic illustration of shrinkage estimation, is intended to make classroom coverage of such a useful analytical tool less technically burdensome for students.
References
Bauwens, L., Laurent, S., and Rombouts, J.V.K., (2006). Multivariate GARCH models: a survey.
Journal of Applied Econometrics, 21(1), 79-109.
Beerenwinkel, N., Antal, T., Dingli, D., Traulsen, A., Kinzler, K.W., Velculescu, V.E., Vogelstein,
B., and Nowak, M.A., (2007). Genetic Progression and the Waiting Time to Cancer. PLoS Computational Biology, 3(11), 2239-2246.
Dabney, A.R., and Storey, J.D., (2007). Optimality Driven Nearest Centroid Classification from Genomic Data. PLoS ONE, 2(10), e1002.
Disatnik, D.J., and Benninga, S., (2007). Shrinking the Covariance Matrix: Simpler Is Better.
Journal of Portfolio Management, Summer, 55-63.
Dobbin, K.K., and Simon, R.M., (2007). Sample Size Planning for Developing Classifiers Using High-Dimensional DNA Microarray Data. Biostatistics, 8(1), 101-117.
Jagannathan, R., and Ma, T., (2003). Risk Reduction in Large Portfolios: Why Imposing the Wrong Constraints Helps. Journal of Finance, 58(4), 1651-1683.
Kriegeskorte, N., Goebel, R., and Bandettini, P., (2006). Information-Based Functional Brain Mapping. PNAS, 103(10), 3863-3868.
Kwan, C.C.Y., (2008). Estimation Error in the Average Correlation of Security Returns and Shrinkage Estimation of Covariance and Correlation Matrices. Finance Research Letters, 5, 236-244.
Kwan, C.C.Y., (2009). Estimation Error in the Correlation of Two Random Variables: A Spreadsheet-Based Exposition. Spreadsheets in Education (eJSiE), 3(2).

Kwan, C.C.Y., (2010). The Requirement of a Positive Definite Covariance Matrix of Security Returns for Mean-Variance Portfolio Analysis: A Pedagogic Illustration. Spreadsheets in Education (eJSiE), 4(1).

Ledoit, O., and Wolf, M., (2003). Improved Estimation of the Covariance Matrix of Stock Returns with an Application to Portfolio Selection. Journal of Empirical Finance, 10(5), 603-621.

Ledoit, O., and Wolf, M., (2004a). A Well-Conditioned Estimator for Large-Dimensional Covariance Matrices. Journal of Multivariate Analysis, 88(2), 365-411.

Ledoit, O., and Wolf, M., (2004b). Honey, I Shrunk the Sample Covariance Matrix. Journal of Portfolio Management, 30(4), 110-119.

Olkin, I., and Pratt, J.W., (1958). Unbiased Estimation of Certain Correlation Coefficients. Annals of Mathematical Statistics, 29(1), 201-211.

Opgen-Rhein, R., and Strimmer, K., (2006a). Inferring Gene Dependency Networks from Genomic Longitudinal Data: A Functional Data Approach. REVSTAT - Statistical Journal, 4(1), 53-65.

Opgen-Rhein, R., and Strimmer, K., (2006b). Using Regularized Dynamic Correlation to Infer Gene Dependency Networks from Time-Series Microarray Data. Proceedings of the 4th International Workshop on Computational Systems Biology (WCSB 2006).
Opgen-Rhein, R., and Strimmer, K., (2007a). Accurate Ranking of Differentially Expressed Genes by a Distribution-Free Shrinkage Approach. Statistical Applications in Genetics and Molecular Biology, 6(1), Article 9.
Opgen-Rhein, R., and Strimmer, K., (2007b). Learning Causal Networks from Systems Biology Time Course Data: An Effective Model Selection Procedure for the Vector Autoregressive Process. BMC Bioinformatics, 8, Supplement 2, S3.
Schäfer, J., and Strimmer, K., (2005). A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics. Statistical Applications in Genetics and Molecular Biology, 4(1), Article 32.
Silvennoinen, A., and Teräsvirta, T., (2009). Multivariate GARCH Models, in Handbook of Financial Time Series, Andersen, T.G., Davis, R.A., Kreiss, J.-P., and Mikosch, T., editors, Springer, 201-232.
Tenenhaus, A., Guillemot, V., Gidrol, X., and Frouin, V., (2010). Gene Association Networks from
Microarray Data Using a Regularized Estimation of Partial Correlation Based on PLS Regression.
IEEE/ACM Transactions on Computational Biology and Bioinformatics, 7(2), 251-262.
Werhli, A.V., Grzegorczyk, M., and Husmeier, D., (2006). Comparative Evaluation of Reverse
Engineering Gene Regulatory Networks with Relevance Networks, Graphical Gaussian Models and
Bayesian Networks. Bioinformatics, 22, 2523-2531.
Yao, J., Chang, C., Salmi, M.L., Hung, Y.S., Loraine, A., and Roux, S.J., (2008). Genome-Scale Cluster Analysis of Replicated Microarrays Using Shrinkage Correlation Coefficient. BMC Bioinformatics, 9:288.
Zimmerman, D.W., Zumbo, B.D., and Williams, R.H., (2003). Bias in Estimation and Hypothesis Testing of Correlation. Psicológica, 24(1), 133-158.
Appendix A

Let $y$ be an $n \times T$ matrix with each element $(i,t)$ being

$$y_{it} = \frac{1}{\sqrt{T-1}}\left(R_{it} - \bar{R}_i\right), \text{ for } i = 1, 2, \ldots, n \text{ and } t = 1, 2, \ldots, T. \tag{A1}$$

As

$$s_{ij} = \sum_{t=1}^{T} y_{it}\, y_{jt}, \tag{A2}$$

which is the same as the product of row $i$ of $y$ and column $j$ of $y'$, we can write

$$\widehat{V} = y\, y'. \tag{A3}$$

Accordingly, we have $x'\widehat{V}x = x'(yy')x = (y'x)'(y'x)$, which is the product of the $T$-element row vector that $(y'x)'$ represents and its transpose that $y'x$ represents. With $v_t$ being element $t$ of this row vector, it follows that $x'\widehat{V}x = \sum_{t=1}^{T} v_t^2 \ge 0$. Thus, $\widehat{V}$ is always positive semidefinite.

We now show that, if $\widehat{V}$ is estimated with insufficient observations, its determinant is zero. If so, $\widehat{V}$ is not invertible and thus is not positive definite. For this task, let us consider separately cases where $T < n$ and $T = n$. For the case where $T < n$, we can append a block of zeros to the $n \times T$ matrix $y$ to make it an $n \times n$ matrix $z = [\,y \;\; 0\,]$, where $0$ is an $n \times (n - T)$ matrix with all zero elements. With $zz' = yy'$, we can also write $\widehat{V} = z\, z'$, a product of two square matrices. The determinant of $\widehat{V}$ is the product of the determinant of $z$ and the determinant of $z'$. As $z$ has $n - T$ columns of zeros, both determinants are zero, and so is the determinant of $\widehat{V}$.

For the case where $T = n$, $y$ is already a square matrix. With each $y_{it}$ being a mean-removed observation of $R_{it}$, scaled by the constant $1/\sqrt{T-1}$, the sum $\sum_{t=1}^{T} y_{it}$ for any $i$ must be zero. Then, for $i = 1, 2, \ldots, n$, each $y_{it}$ can be expressed as the negative of the sum of the remaining $T - 1$ terms among $y_{i1}, y_{i2}, \ldots, y_{iT}$. That is, each column of $y$ can be replicated by the negative of the sum of the remaining $T - 1$ columns. Such linear dependence among the columns implies a zero determinant of $y$ and, in turn, a zero determinant of $\widehat{V}$.
Appendix B

As taking the expected value is like taking a weighted average, we have

$$E\left\{\sum_{j>i} [(1-\lambda)s_{ij} - \sigma_{ij}]^2\right\} = \sum_{j>i} E\{[(1-\lambda)s_{ij} - \sigma_{ij}]^2\}. \tag{B1}$$

As $E\{[(\cdot) - E(\cdot)]^2\} = E[(\cdot)^2] - [E(\cdot)]^2$, we can write

$$E\{[(1-\lambda)s_{ij} - \sigma_{ij}]^2\} = Var[(1-\lambda)s_{ij} - \sigma_{ij}] + \{E[(1-\lambda)s_{ij} - \sigma_{ij}]\}^2. \tag{B2}$$

Here, $Var[(1-\lambda)s_{ij} - \sigma_{ij}]$ reduces to $(1-\lambda)^2 Var(s_{ij})$. Further, with $E(s_{ij}) = \sigma_{ij}$, $E[(1-\lambda)s_{ij} - \sigma_{ij}]$ reduces to $-\lambda\sigma_{ij}$. It follows that

$$\sum_{j>i} E\{[(1-\lambda)s_{ij} - \sigma_{ij}]^2\} = (1-\lambda)^2 \sum_{j>i} Var(s_{ij}) + \lambda^2 \sum_{j>i} \sigma_{ij}^2. \tag{B3}$$

Setting the derivative of this loss function with respect to $\lambda$ to zero yields

$$-2(1-\lambda) \sum_{j>i} Var(s_{ij}) + 2\lambda \sum_{j>i} \sigma_{ij}^2 = 0. \tag{B4}$$

Solving equation (B4) for $\lambda$ gives the optimal shrinkage intensity in equation (5).
Appendix C

It is well known in matrix algebra that a symmetric positive definite matrix can be written as the product of a triangular matrix with zero elements above its diagonal and the transpose of such a triangular matrix; this is the Cholesky decomposition. Let $V$ be an $n \times n$ covariance matrix and $L$ be the corresponding triangular matrix, satisfying the condition that $LL' = V$. To find $L$, let us label the elements in its lower triangle as $L_{ij}$, for all $j \le i$. Implicitly, we have $L_{ij} = 0$, for all $j > i$. Each $L_{ij}$ in the lower triangle of $L$ can be determined iteratively as follows:

$$L_{11} = \sqrt{\sigma_{11}}, \tag{C1}$$

$$L_{i1} = \sigma_{i1}/L_{11}, \text{ for } i = 2, 3, \ldots, n, \tag{C2}$$

$$L_{ii} = \sqrt{\sigma_{ii} - \sum_{k=1}^{i-1} L_{ik}^2}, \text{ for } i = 2, 3, \ldots, n, \tag{C3}$$

$$L_{ij} = \frac{1}{L_{jj}}\left(\sigma_{ij} - \sum_{k=1}^{j-1} L_{ik} L_{jk}\right), \text{ for } i > j > 1. \tag{C4}$$

Now, consider the standardized normal distribution, which is a normal distribution with a zero mean and a unit standard deviation. Let us take $nT$ random draws from this univariate distribution and label them as $u_{it}$, for $i = 1, 2, \ldots, n$ and $t = 1, 2, \ldots, T$. Let $U$ be the $n \times T$ matrix consisting of these random draws. As each $u_{it}$ is a random draw, the sample mean, $\bar{u}_i = \sum_{t=1}^{T} u_{it}/T$, approaches zero as $T$ approaches infinity. The sample variance, $\sum_{t=1}^{T} (u_{it} - \bar{u}_i)^2/(T-1)$, which approaches one as $T$ approaches infinity, can be approximated as $\sum_{t=1}^{T} u_{it}^2/(T-1)$. The sample covariance, $\sum_{t=1}^{T} (u_{it} - \bar{u}_i)(u_{jt} - \bar{u}_j)/(T-1)$, which approaches zero as $T$ approaches infinity, can be approximated as $\sum_{t=1}^{T} u_{it} u_{jt}/(T-1)$, for all $i \ne j$.

Accordingly, $UU'/(T-1)$ approaches an $n \times n$ identity matrix as $T$ approaches infinity. The $n \times T$ matrix $W = LU$ is then such that $WW'/(T-1) = L[UU'/(T-1)]L'$ approaches $LL' = V$. The columns of $W$ can therefore be treated as $T$ random draws from an $n$-variate distribution with covariance matrix $V$, with each column being the result of one random draw. To generate $W = LU$ requires the $n \times n$ triangular matrix $L$ and the $n \times T$ matrix $U$ of random draws.
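A compact way to obtain L in Excel is a user-defined function implementing equations (C1) through (C4). The sketch below is our own (the name CHOLESKY is an assumption; the paper's supplementary file may organize this differently). It returns the lower triangular matrix as an array, to be entered into an n × n destination range as an array formula; the u_it draws can then be generated in the worksheet, for instance with =NORM.S.INV(RAND()), and W = LU computed with MMULT:

    ' Returns the lower triangular matrix L with LL' = V, per equations (C1)-(C4).
    ' v: the n-by-n range containing a symmetric positive definite matrix.
    Function CHOLESKY(v As Range) As Variant
        Dim n As Integer, i As Integer, j As Integer, k As Integer
        Dim L() As Double, s As Double
        n = v.Rows.Count
        ReDim L(1 To n, 1 To n)
        For j = 1 To n
            For i = j To n
                s = v.Cells(i, j).Value
                For k = 1 To j - 1
                    s = s - L(i, k) * L(j, k)
                Next k
                If i = j Then
                    L(i, j) = Sqr(s)        ' diagonal element, (C1) and (C3)
                Else
                    L(i, j) = s / L(j, j)   ' below-diagonal element, (C2) and (C4)
                End If
            Next i
        Next j
        CHOLESKY = L
    End Function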
Appendix D

The code in Visual Basic for Applications (VBA) of a user-defined function procedure and a Sub procedure, for use in the Excel example, is shown below. The same code can also be accessed from the supplementary Excel file (shrink.xls), by opening the Visual Basic window under the Developer tab. Various examples pertaining to the syntax of the programming language can be found in Excel Developer Reference, which is provided under the Help tab there.
Option Explicit
Function SHRINK(nvo As Range, wijbar1 As Range) As Double
Dim nvar As Integer, nobs As Integer, mrr As Integer
Dim mrc As Integer, wijbar As Integer
Dim nvoc As Integer, nvor As Integer
Dim i As Integer, j As Integer, t As Integer
Dim s1 As Double, sum1 As Double, s2 As Double, sum2 As Double
' nvo: the cells containing all mean-removed observations
' nvar: the number of variables
' nobs: the number of observations
' mrr: the row preceding the mean-removed observations
' mrc: the column preceding the mean-removed observations
' wijbar: the row preceding the square matrix of w_ij bar
nvar = nvo.Columns.Count
nobs = nvo.Rows.Count
mrr = nvo.Row - 1
mrc = nvo.Column - 1
wijbar = wijbar1.Row - 1
sum1 = 0
sum2 = 0
For i = 1 To nvar - 1
For j = i + 1 To nvar
s1 = Cells(wijbar + i, mrc + j).Value
sum1 = sum1 + s1 * s1
For t = 1 To nobs
s2 = Cells(mrr + t, mrc + i).Value _
* Cells(mrr + t, mrc + j).Value _
- Cells(wijbar + i, mrc + j).Value
sum2 = sum2 + s2 * s2
Next t
Next j
Next i
SHRINK = sum2 / (sum2 + sum1 * nobs * (nobs - 1))
End Function
Sub ViaFunction()
Dim arg1 As Range, arg2 As Range, out As Range
Set arg1 = Application.InputBox(prompt:= _
"Select the cells for mean-removed observations", Type:=8)
Set arg2 = Application.InputBox(prompt:= _
"Select the cells for the matrix of w ij bar", Type:=8)
Set out = Application.InputBox(prompt:= _
"Select the cell for displaying the output", Type:=8)
Cells(out.Row, out.Column).Value = SHRINK(arg1, arg2)
End Sub