Ledoit Wolf covariance estimator should standardize data #3508


Closed · cbrnr opened this issue Jul 30, 2014 · 17 comments

@cbrnr
Contributor

cbrnr commented Jul 30, 2014

If the features are on different scales, the Ledoit-Wolf algorithm yields an incorrect (suboptimal) shrinkage estimate unless the data are standardized. Therefore, the function ledoit_wolf_shrinkage in shrunk_covariance_.py should standardize the data before computing the shrinkage parameter, and then scale the shrunk covariance matrix back at the end.
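A minimal sketch of the effect, using scikit-learn's public LedoitWolf estimator (which exposes the fitted shrinkage intensity as shrinkage_; the random data are only for illustration):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.RandomState(0)
X = rng.randn(100, 5)

# Same data, but with one feature blown up by three orders of magnitude.
X_scaled = X.copy()
X_scaled[:, 0] *= 1e3

# The estimated shrinkage intensity changes drastically with feature scale.
print(LedoitWolf().fit(X).shrinkage_)
print(LedoitWolf().fit(X_scaled).shrinkage_)
```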

@eickenberg
Contributor

According to the docstring, centering is done by default, but not standardization, just as you say.

def ledoit_wolf_shrinkage(X, assume_centered=False, block_size=1000):
    """Estimates the shrunk Ledoit-Wolf covariance matrix.

    Parameters
    ----------
    X : array-like, shape (n_samples, n_features)
        Data from which to compute the Ledoit-Wolf shrunk covariance shrinkage.

    assume_centered : Boolean
        If True, data are not centered before computation.
        Useful when working with data whose mean is almost, but
        not exactly, zero.
        If False, data are centered before computation.

Maybe you could similarly add a kwarg assume_unit_variance (or something better named)?

Alternatively, adding a line in the docstring hinting at the usefulness of prepending a StandardScaler to this estimator could be helpful, too.
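For example (a sketch; a Pipeline's final step only needs a fit method, so LedoitWolf can be used there directly):

```python
import numpy as np
from sklearn.covariance import LedoitWolf
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
X = rng.randn(100, 5) * [1, 10, 100, 1000, 10000]

# Standardize the features, then estimate the shrunk covariance.
pipe = make_pipeline(StandardScaler(), LedoitWolf()).fit(X)
print(pipe.named_steps["ledoitwolf"].shrinkage_)
```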

Unfortunately I cannot find what I am convinced I saw somewhere: I have a vague recollection that in one of their papers, Ledoit and Wolf just write cov = X.T.dot(X), accompanied by a line saying something along the lines of "assuming zero mean" (and I can't remember whether they also said unit variance). Maybe I am wrong about this.

@agramfort
Member

Centering is necessary, but not standardization, AFAIK. However, I have the feeling that if you have a very noisy feature, the performance will not be good. As LW does not assume standardized data, I would let users apply a StandardScaler beforehand if they need to.

@cbrnr
Contributor Author

cbrnr commented Aug 1, 2014

@eickenberg, I can certainly add a new parameter. Maybe we should call it assume_scaled? IMO, centering and scaling are necessary, but it still makes sense to have parameters to turn this off (people might have centered and scaled their data beforehand, so we don't have to do it twice).

@agramfort, I think the issue is really that if features are on different scales, the estimated shrinkage parameter will not be optimal. I will try to figure out whether standardization is required, but we should add this optional feature anyway (I would argue for scaling by default, though).

I created PR #3521 to address this issue.

@eickenberg
Contributor

If you have time to look closely at the proof in "Honey, I Shrunk the Covariance Matrix", that should provide an answer, right?

@agramfort
Member

@cle1109 I've been using LW for MEG data with gradiometers and magnetometers, and indeed I need to scale to bring both sensor types onto the same order of magnitude, but I use a scalar scaling (one factor per sensor type) and don't standardize the individual features.
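Something like this sketch (the channel counts and scale factors are made up for illustration):

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.RandomState(0)
n_samples = 500

# Hypothetical MEG data: gradiometers and magnetometers live on very
# different physical scales.
grads = rng.randn(n_samples, 4) * 1e-11
mags = rng.randn(n_samples, 4) * 1e-13
X = np.hstack([grads, mags])

# One scalar per sensor type, not one per feature, so the diagonal of
# the resulting covariance matrix is not constant.
X_scaled = X.copy()
X_scaled[:, :4] /= 1e-11
X_scaled[:, 4:] /= 1e-13

print(LedoitWolf().fit(X_scaled).shrinkage_)
```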

cc @dengemann

@mbillingr
Contributor

@agramfort Maybe we got some terms mixed up. By standardization @cle1109 was referring to scaling each feature with its standard deviation. Isn't that scalar scaling too?

@agramfort
Member

> @agramfort Maybe we got some terms mixed up. By standardization @cle1109 was referring to scaling each feature with its standard deviation. Isn't that scalar scaling too?

I meant using one scalar for all features of the same type (e.g., all gradiometers). The diagonal of my matrices is not constant.

@mbillingr
Contributor

This makes sense, but I don't see how it could be applied if you don't have groups or don't know them.
(From a different perspective, standardizing the features amounts to assuming a group size of 1 :) )

@agramfort
Member

:)

Do you agree that there is no reason why LW should only work with a matrix of constant diagonal? To me, standardizing features should really be a preprocessing step outside of LW.

@eickenberg
Contributor

As far as I understand, @agramfort's normalization may in some settings not even be estimated from the data: it is just scalar multiplication with constants that bring the features into the range of unit variance, the emphasis being on getting the two types of features onto the same scale.

@mbillingr
Contributor

@agramfort I agree to some extent. Standardization is probably not strictly required for LW. However, LW fails when feature scales differ by orders of magnitude.
The need to rescale features might be common enough to justify putting this inside LW.

@agramfort
Member

Exactly

@mbillingr
Contributor

I have been reading through the shrinkage literature in the hope of shedding light on this issue.

  1. In their earlier papers, Ledoit and Wolf use a common-variance shrinkage target, which is also what sklearn implements.
  2. Later, in "Honey, I Shrunk the Sample Covariance Matrix", they use a constant-correlation shrinkage target (which is useful for positively correlated variables).
  3. Schäfer and Strimmer briefly discuss different shrinkage targets in "A Shrinkage Approach to Large-Scale Covariance Matrix Estimation and Implications for Functional Genomics", focusing on an unequal-variance target.

It seems that standardization is indeed not part of the LW estimator.
However, the LW estimator is not limited to the common-variance target. Indeed, prior standardization of the data is equivalent to using an unequal-variance target.

Wouldn't it be nice to support different shrinkage targets?
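For instance, a hand-rolled sketch of shrinking toward the unequal-variance (diagonal) target, with the intensity alpha fixed by hand rather than estimated as in LW:

```python
import numpy as np

def shrink_to_diagonal(X, alpha):
    """Shrink the sample covariance toward its own diagonal
    (the unequal-variance target) with fixed intensity alpha."""
    S = np.cov(X, rowvar=False)
    return (1.0 - alpha) * S + alpha * np.diag(np.diag(S))

rng = np.random.RandomState(0)
X = rng.randn(50, 10)
S_shrunk = shrink_to_diagonal(X, alpha=0.2)
```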

@agramfort
Member

> It seems that standardization is indeed not part of the LW estimator.

We agree.

> However, the LW estimator is not limited to the common-variance target. Indeed, prior standardization of the data is equivalent to using an unequal-variance target.
>
> Wouldn't it be nice to support different shrinkage targets?

That would be a way forward indeed. We could have a target_covariance parameter to specify it. To me, however, this is a different aim than improving LDA: you can make LDA + LW work with a StandardScaler.

@cbrnr
Contributor Author

cbrnr commented Aug 5, 2014

I agree that the best solution for now is not to include standardization in the LW estimator. Supporting shrinkage targets other than the diagonal matrix with the mean variance would indeed be great in the future. Specifically, LDA works best with the diagonal matrix of the individual feature variances as the shrinkage target.

I wrote to Olivier Ledoit, and he said that if we precondition our matrix by standardizing our features, we're not really minimizing the standard Frobenius norm but a generalized one. In principle, preconditioning is a good idea if it captures important characteristics of the data (which it does in our case). We only need to be aware of the consequences: since the standard deviations are estimated from the same data as the sample covariance matrix, their errors might interact; also, we're using a loss function that downweights estimation errors in highly variable features.

Most importantly, he says:

> Our 2004 JMVA paper makes it clear that the improvement over the sample covariance matrix is highest when the population eigenvalues are not too dispersed. It also has a proposition saying that the population eigenvalues are at least as dispersed as the variances. Therefore we should expect that this type of pre-conditioning would enhance the performance of the shrinkage estimator when features have wildly differing variances, because it should reduce the cross-sectional dispersion of the eigenvalues of the pre-conditioned population covariance matrix.

In summary, all this can be solved by supporting different shrinkage targets.
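A sketch of that preconditioning (standardize, estimate on the standardized data, then map the covariance back to the original scale), using the public sklearn.covariance.ledoit_wolf function:

```python
import numpy as np
from sklearn.covariance import ledoit_wolf

rng = np.random.RandomState(0)
X = rng.randn(200, 5) * [1, 10, 100, 1000, 10000]

# Precondition: divide each feature by its standard deviation.
std = X.std(axis=0)
cov_std, shrinkage = ledoit_wolf(X / std)

# Undo the preconditioning on the covariance scale.
cov = cov_std * np.outer(std, std)
```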

For now, I'm standardizing the features in the LDA class if we're using shrinkage. I've addressed this in #3523, which will hopefully soon be merged once I figure out the build problems.

cbrnr closed this as completed on Aug 5, 2014
@xiaoxionglin

Standardization leads to nearly zero eigenvalue dispersion, and thus increases the shrinkage systematically and dramatically.
I am not saying whether that is good or bad, but it makes a huge difference.
The problem now is that shrinkage='auto' standardizes while shrinkage=[any constant] does not, which creates a big, unexpected difference.
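A quick sketch to observe the difference in context, assuming the current LinearDiscriminantAnalysis API (random data, only for illustration):

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

rng = np.random.RandomState(0)
X = rng.randn(100, 5) * [1, 10, 100, 1000, 10000]
y = rng.randint(0, 2, size=100)

# shrinkage='auto' (Ledoit-Wolf) and a fixed constant take different
# code paths and can yield very different decision rules.
lda_auto = LinearDiscriminantAnalysis(solver="lsqr", shrinkage="auto").fit(X, y)
lda_fixed = LinearDiscriminantAnalysis(solver="lsqr", shrinkage=0.1).fit(X, y)
print(lda_auto.coef_)
print(lda_fixed.coef_)
```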

@charlesbmi

I know this is an old thread, but I would love to see the diagonal matrix supported as a shrinkage target in the future!
