Skip to content

ENH Checks n_features_in_ in covariance #19341

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 9 commits into from
Feb 11, 2021

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Related to #19333

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we want to validate data also for score method of the EmpiricalCovariance class?

@@ -163,6 +163,7 @@ def decision_function(self, X):
compatibility with other outlier detection algorithms.
"""
check_is_fitted(self)
X = self._validate_data(X, reset=False)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shall make the validation in score_samples instead since it is call below?
Right now, we don't have any check in this method.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. In this specific case and since predict, decision_function, score all call score_samples, let's move the validation to score_samples.

@thomasjpfan
Copy link
Member Author

Do we want to validate data also for score method of the EmpiricalCovariance class?

The common test does not currently cover score. We most likely need to go through all the modules to add _validate_data to score.

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Feb 11, 2021

But in this specific case, EmpiricalCovariance.score calls predict, it will get validated.

Edit: I looked at the wrong score method.

@lorentzenchr
Copy link
Member

But in this specific case, EmpiricalCovariance.score calls predict, it will get validated.

Shouldn't there be a self._validate_data in def score and def mahalanobis of class EmpiricalCovariance?

@thomasjpfan
Copy link
Member Author

Shouldn't there be a self._validate_data in def score and def mahalanobis of class EmpiricalCovariance?

score makes sense, as for mahalanobis as long as we use config_context(assume_finite=True) to turn off the nan check. Because pairwise_distances calls check_pairwise_arrays, which would validate again.

I think score should be a part of the common test, and mahalaobis needs its own test.

Looking at the code base, adding score to the common test should not be too difficult because most of it comes from the mixin calling predict, where predict already validates.

@thomasjpfan
Copy link
Member Author

Updated PR with adding score to the common test and a test for mahalanobis.

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - again.

Copy link
Member

@glemaitre glemaitre left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@glemaitre glemaitre merged commit 6959532 into scikit-learn:main Feb 11, 2021
@glemaitre
Copy link
Member

Thank you @thomasjpfan

@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants