Skip to content

Conversation

dherrera1911
Copy link

@dherrera1911 dherrera1911 commented Sep 5, 2025

Reference Issues/PRs

Partially fixes Issue #23162

What does this implement/fix? Explain your changes.

Background:

In brief, the output of the covariance.MinCovDet covariance estimator is strongly biased, because it is lacking a consistency correction. This PR adds the missing correction, reducing the bias. To fully eliminate the bias, an additional correction is needed, that should be implemented in the future.

In the Issue #23162, an example of the bias in the output is shown. In my comment to that Issue, I provide a thorough explanation of the statistics behind the issue and this fix, and show that the new output obtained with this PR is less biased than the original.

Changes:

In this PR, I added the function _consistency_correction to the covariance._robust_covariance submodule. This function calculates the multiplicative correction factor so that the output is consistent at the normal distribution, following the references mentioned in my comment. In these lines I multiply the robust covariance estimate by the new multiplicative factor, reducing the bias.

Also, this correction actually needs to be applied twice. The original implementation did apply the correction once, but using a somewhat adhoc method from the original paper. For consistency in the code, and because the new correction is more theoretically grounded, I changed the code to use the new _consistency_correction for the first application too, in these lines.

Any other comments?

The estimates with Gaussian datasets with no outliers are still not unbiased, because they lack the finite sample correction. A future PR should add this further correction. The implementation of MinCovDet in R can be used as a template to implement this https://rdrr.io/cran/robustbase/src/R/covMcd.R.

Copy link

github-actions bot commented Sep 5, 2025

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: e938beb. Link to the linter CI: here

@dherrera1911 dherrera1911 changed the title Add correction to MinCovDet Add consistency correction to MinCovDet, to reduce bias Sep 5, 2025
@dherrera1911 dherrera1911 changed the title Add consistency correction to MinCovDet, to reduce bias Add consistency correction to covariance.MinCovDet, to reduce bias Sep 5, 2025
@dherrera1911 dherrera1911 changed the title Add consistency correction to covariance.MinCovDet, to reduce bias FIX Reduce bias of covariance.MinCovDet with consistency correction Sep 6, 2025
@dherrera1911 dherrera1911 changed the title FIX Reduce bias of covariance.MinCovDet with consistency correction FIX: Reduce bias of covariance.MinCovDet with consistency correction Sep 6, 2025
@dherrera1911
Copy link
Author

dherrera1911 commented Sep 6, 2025

It seems like in some cases, this test fails under the new implementation: https://github.com/scikit-learn/scikit-learn/blob/main/sklearn/covariance/tests/test_robust_covariance.py#L35. I suggest that this it not to blame on the implementation, and that the tolerance threshold of the test should be increased so it passes, as argued below.

The test compares the obtained result with the covariance obtain from the true inliers. The specific test that fails has a much smaller tolerance threshold than the other tests. The test would pass in the PR if the threshold was similar to the other tests. The threshold seems too low given the variability of the method results (see #23162).

In the example code of the Issue #23162 it is shown that the new implementation is much less biased than the original implementation. The test failure should not be blamed on the new implementation, but rather on the variability of the results + low threshold for that specific test.

I suggest increasing the threshold in that test from 0.02 to 0.1 like the other simulations in that same test function.

@dherrera1911
Copy link
Author

I went ahead and increased the tolerance of the test, which now passes.

Also, I changed the expected values in the Docstring examples to match the new implementation. Interestingly, the results in the PR are more similar to the true values than the results with the original version.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant