FIX: Reduce bias of covariance.MinCovDet with consistency correction #32117
Conversation
It seems that in some cases this test fails under the new implementation: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/covariance/tests/test_robust_covariance.py#L35. I suggest that this is not the fault of the implementation, and that the tolerance threshold of the test should be increased so it passes, as argued below.

The test compares the obtained result with the covariance obtained from the true inliers. The specific test that fails has a much smaller tolerance threshold than the other tests, and it would pass in this PR if its threshold were similar to theirs. The threshold seems too low given the variability of the method's results (see #23162). The example code in Issue #23162 shows that the new implementation is much less biased than the original one. The test failure should therefore be attributed to the variability of the results combined with the low threshold of that specific test, not to the new implementation. I suggest increasing the tolerance threshold in that test.
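For concreteness, the kind of comparison that test performs looks roughly like the sketch below; the data setup and the tolerance value are illustrative, not the actual ones used in `test_robust_covariance.py`:

```python
import numpy as np
from numpy.testing import assert_allclose
from sklearn.covariance import MinCovDet

rng = np.random.RandomState(0)
n_inliers, n_outliers, n_features = 80, 20, 5

# Inliers from a standard Gaussian, outliers shifted far away.
X = rng.randn(n_inliers + n_outliers, n_features)
X[n_inliers:] += 10.0

mcd = MinCovDet(random_state=0).fit(X)
pure_cov = np.cov(X[:n_inliers], rowvar=False)  # covariance of the true inliers

# Illustrative tolerance; the actual failing test uses a much tighter
# threshold, which is what this comment proposes to relax.
assert_allclose(mcd.covariance_, pure_cov, atol=0.5)
```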
I went ahead and increased the tolerance of the test, which now passes. I also changed the expected values in the docstring examples to match the new implementation. Interestingly, the results in this PR are closer to the true values than those of the original version.
Reference Issues/PRs
Partially fixes Issue #23162
What does this implement/fix? Explain your changes.
Background:
In brief, the output of the `covariance.MinCovDet` covariance estimator is strongly biased because it lacks a consistency correction. This PR adds the missing correction, reducing the bias. To fully eliminate the bias, an additional correction is needed that should be implemented in the future.

Issue #23162 shows an example of the bias in the output. In my comment on that issue, I provide a thorough explanation of the statistics behind the problem and this fix, and show that the new output obtained with this PR is less biased than the original.
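For reference, the standard consistency factor at the Gaussian model is alpha / P(chi²_{p+2} <= q_alpha), where alpha = h/n is the fraction of observations retained by the MCD and q_alpha is the alpha-quantile of a chi²_p distribution. A minimal sketch follows; the helper name and exact form may differ from the `_consistency_correction` function added in this PR:

```python
from scipy.stats import chi2

def consistency_correction(n_features, h_fraction):
    """Multiplicative factor making the MCD scatter consistent at the Gaussian.

    h_fraction is the fraction h / n of observations retained by the MCD.
    """
    quantile = chi2.ppf(h_fraction, df=n_features)
    return h_fraction / chi2.cdf(quantile, df=n_features + 2)

# e.g. with p = 5 features and h / n = 0.75 the factor is > 1,
# inflating the otherwise shrunken raw MCD covariance estimate.
factor = consistency_correction(5, 0.75)
```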
Changes:
In this PR, I added the function `_consistency_correction` to the `covariance._robust_covariance` submodule. This function calculates the multiplicative correction factor so that the output is consistent at the normal distribution, following the references mentioned in my comment. In these lines I multiply the robust covariance estimate by the new multiplicative factor, reducing the bias.

Also, this correction actually needs to be applied twice. The original implementation did apply the correction once, but using a somewhat ad hoc method from the original paper. For consistency in the code, and because the new correction is more theoretically grounded, I changed the code to use the new `_consistency_correction` for the first application too, in these lines.
Any other comments?

The estimates on Gaussian datasets with no outliers are still not unbiased, because they lack the finite-sample correction. A future PR should add this further correction. The implementation of MinCovDet in R can be used as a template for implementing it: https://rdrr.io/cran/robustbase/src/R/covMcd.R.
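In the spirit of the example code in #23162 (this is a sketch, not the exact snippet from the issue), the remaining finite-sample bias can be checked by fitting on clean Gaussian data and comparing the estimated variances with the known value of 1:

```python
import numpy as np
from sklearn.covariance import MinCovDet

rng = np.random.RandomState(42)
n_repeats, n_samples, n_features = 50, 200, 3

# Average the estimated variances over repeated clean standard-Gaussian samples;
# an unbiased estimator would give diagonal values close to 1 on average.
diag_means = []
for _ in range(n_repeats):
    X = rng.randn(n_samples, n_features)
    mcd = MinCovDet(random_state=0).fit(X)
    diag_means.append(np.diag(mcd.covariance_).mean())

print("mean estimated variance on clean N(0, I) data:", np.mean(diag_means))
```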