Ledoit Wolf covariance estimator should standardize data #3508
If you have differently scaled features, the calculation of the shrinkage parameter using the algorithm by Ledoit and Wolf yields an incorrect estimate if the data are not standardized. Therefore, the function `ledoit_wolf_shrinkage` in `shrunk_covariance_.py` should standardize the data before computing the shrinkage parameter, and then scale the shrunk covariance matrix back at the end.
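For illustration, here is a minimal sketch of that proposal built on the public scikit-learn helpers. The wrapper name `ledoit_wolf_standardized` is hypothetical (not part of scikit-learn), and the sketch assumes no feature has zero variance:

```python
import numpy as np
from sklearn.covariance import ledoit_wolf_shrinkage, shrunk_covariance

def ledoit_wolf_standardized(X):
    # Hypothetical wrapper: estimate the LW shrinkage on standardized data,
    # then scale the shrunk covariance back to the original units.
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    std = Xc.std(axis=0)                  # assumes no constant (zero-variance) feature
    X_std = Xc / std
    shrinkage = ledoit_wolf_shrinkage(X_std)             # shrinkage on unit-variance data
    emp_cov = np.dot(X_std.T, X_std) / X_std.shape[0]    # standardized sample covariance
    cov = shrunk_covariance(emp_cov, shrinkage)
    return cov * np.outer(std, std), shrinkage           # undo the standardization
```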
According to the docstring, centering is done by default, but not standardization, just as you say.
Maybe you could similarly add a kwarg. Alternatively, adding a line in the docstring hinting at the usefulness of prepending a StandardScaler could help. Unfortunately I cannot find what I am convinced I saw somewhere: I have the vague recollection that in one of their papers, Ledoit and Wolf just write…
Centering is necessary but not standardization, AFAIK. I have the feeling, however, that if you have a very noisy feature the performance will not be good. As LW does not assume standardized data, I would let users apply a StandardScaler beforehand if they need it.
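A minimal sketch of that workflow, on simulated data with made-up feature scales:

```python
import numpy as np
from sklearn.covariance import LedoitWolf
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Simulated data with badly mismatched feature scales.
X = rng.randn(200, 5) * np.array([1.0, 1.0, 1.0, 100.0, 1000.0])

# Standardize first, then fit the Ledoit-Wolf estimator on the scaled data.
lw = LedoitWolf().fit(StandardScaler().fit_transform(X))
print(lw.shrinkage_)   # shrinkage estimated on unit-variance features
print(lw.covariance_)  # covariance of the standardized data, not the raw data
```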
@eickenberg, I can certainly add a new parameter; we would just need to agree on a name. @agramfort, I think the issue is really that if you have features on different scales, the estimated shrinkage parameter will not be optimal. I will try to figure out whether standardization is required, but we should add this optional feature anyway (I would argue for scaling by default, though). I created PR #3521 to address this issue.
If you have time to look closely at the proof in "Honey, I Shrunk the Covariance Matrix", then that should provide an answer, right?
@cle1109 I've been using LW for MEG data with gradiometers and magnetometers, and indeed I need to scale to bring both sensor types onto the same order of magnitude, but I use a scalar scaling and don't standardize the features. cc @dengemann
@agramfort Maybe we got some terms mixed up. By standardization, @cle1109 was referring to scaling each feature by its standard deviation. Isn't that a scalar scaling too?
This makes sense, but I don't see how this could be applied if you don't have groups or don't know them. |
:) Do you agree that there is no reason why LW should only work with a matrix of standardized features?
As far as I understand, @agramfort's normalization may in some settings not even be estimated from the data: it is just scalar multiplication by constants that bring the features into the range of unit variance, the emphasis being on getting the two types of features onto the same scale.
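A sketch of that idea with a priori constants. The sensor counts and scale factors below are made-up placeholders for illustration, not values from this thread:

```python
import numpy as np
from sklearn.covariance import LedoitWolf

rng = np.random.RandomState(0)
# Simulated "MEG-like" data: 204 gradiometer and 102 magnetometer channels
# whose amplitudes differ by orders of magnitude.
scales = np.r_[np.full(204, 1e-11), np.full(102, 1e-13)]
X = rng.randn(100, 306) * scales

# Fixed constants chosen a priori (not estimated from the data) to bring
# both sensor types into roughly the same, near-unit-variance range.
GRAD_SCALE, MAG_SCALE = 1e11, 1e13
X_scaled = X.copy()
X_scaled[:, :204] *= GRAD_SCALE
X_scaled[:, 204:] *= MAG_SCALE

cov = LedoitWolf().fit(X_scaled).covariance_
```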
@agramfort I agree to some extent. Standardization is probably not strictly required for LW. However, LW fails when features are scaled across different orders of magnitude.
Exactly |
I have been reading through the shrinkage literature in the hope of shedding light on this issue. It seems that standardization is indeed not part of the LW estimator. Wouldn't it be nice to support different shrinkage targets?
We agree.
I agree that the best solution for now is not to include standardization in the LW estimator. Still, supporting shrinkage targets other than the diagonal matrix with the mean variance would be great in the future. Specifically, LDA needs the diagonal matrix with the individual feature variances as a shrinkage target to work best.

I wrote to Olivier Ledoit, and he said that if we precondition our matrix by standardizing our features, we're not really minimizing the standard Frobenius norm, but a generalized one. In principle, preconditioning is a good idea if it captures important characteristics of the data (which it does in our case). We only need to be aware of the consequences: since the standard deviations are estimated from the same data as the sample covariance matrix, their errors might interact; also, we're using a loss function that downweights estimation errors in highly variable features. Most importantly, he says: […]

In summary, all this can be solved by supporting different shrinkage targets. For now, I'm standardizing the features in the LDA class if we're using shrinkage. I've addressed this in #3523, which will hopefully be merged soon once I figure out the build problems.
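For illustration, a hand-rolled sketch of shrinking toward the diagonal target with individual feature variances. The helper name and the fixed `alpha` are made up; unlike LW, no optimal shrinkage intensity is estimated here:

```python
import numpy as np

def shrink_to_diagonal(X, alpha):
    # Hypothetical helper: convex combination of the empirical covariance and
    # a diagonal target that keeps each feature's own variance.
    S = np.cov(X, rowvar=False)
    target = np.diag(np.diag(S))
    return (1.0 - alpha) * S + alpha * target

rng = np.random.RandomState(0)
X = rng.randn(50, 8)
S_shrunk = shrink_to_diagonal(X, alpha=0.2)  # alpha fixed by hand, not estimated
```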
Standardization will lead to eigenvalues with a dispersion close to zero, and will thus increase shrinkage systematically and dramatically.
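One way to probe this claim empirically (a sketch on simulated data, not an excerpt from the thread):

```python
import numpy as np
from sklearn.covariance import ledoit_wolf
from sklearn.preprocessing import StandardScaler

rng = np.random.RandomState(0)
# Features with heterogeneous scales.
X = rng.randn(500, 20) * rng.uniform(0.1, 10.0, size=20)

_, shrinkage_raw = ledoit_wolf(X)                                   # raw data
_, shrinkage_std = ledoit_wolf(StandardScaler().fit_transform(X))   # standardized
print(shrinkage_raw, shrinkage_std)  # compare the estimated shrinkage intensities
```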
I know this is an old thread, but I would love to see the diagonal matrix supported as a shrinkage target in the future!