-
-
Notifications
You must be signed in to change notification settings - Fork 26k
[MRG+2] clean outlier_detection.py #9018
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Open questions:
|
422ce84
to
4d7e6eb
Compare
added WIP in name :) |
yes thanks :) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few corrections/suggestions.
I also did a bit of research to find out why the cubit root of the mahalanobis distance is returned when raw_values=False
. Except this example stating visualization purpose I do not see why we need to return the cubic root.
In the end, I don't think we need this raw_values
parameter and we should always return the mahalanobis distance without the cubic root. This would however require a deprecation warning... I will check that this does not break the cited example (if that's the case, maybe use the cubic root only for the example).
@@ -55,7 +112,7 @@ def decision_function(self, X, raw_values=False): | |||
decision : array-like, shape (n_samples, ) | |||
The values of the decision function for each observations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
each observation.
|
||
Parameters | ||
---------- | ||
store_precision : bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean, optional (default=True)
store_precision : bool | ||
Specify if the estimated precision is stored. | ||
|
||
assume_centered : Boolean |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
boolean, optional (default=False)
If False, the robust location and covariance are directly computed | ||
with the FastMCD algorithm without additional treatment. | ||
|
||
support_fraction : float, 0 < support_fraction < 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float, optional (default=None)
|
||
support_fraction : float, 0 < support_fraction < 1 | ||
The proportion of points to be included in the support of the raw | ||
MCD estimate. Default is ``None``, which implies that the minimum |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be in the interval (0,1).
MCD estimate. Default is ``None``, which implies that the minimum | ||
value of support_fraction will be used within the algorithm: | ||
`[n_sample + n_features + 1] / 2`. | ||
|
||
contamination : float, 0. < contamination < 0.5 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
float, optional (default=0.1)
MCD estimate. Default is ``None``, which implies that the minimum | ||
value of support_fraction will be used within the algorithm: | ||
`[n_sample + n_features + 1] / 2`. | ||
|
||
contamination : float, 0. < contamination < 0.5 | ||
The amount of contamination of the data set, i.e. the proportion | ||
of outliers in the data set. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be in the interval (0, 0.5).
self.contamination = contamination | ||
|
||
def fit(self, X, y=None): | ||
MinCovDet.fit(self, X) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super(EllipticEnveloppe, self).fit(X)
def __init__(self, store_precision=True, assume_centered=False, | ||
support_fraction=None, contamination=0.1, | ||
random_state=None): | ||
MinCovDet.__init__(self, store_precision=store_precision, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
super(EllipticEnveloppe, self).fit(X)
@@ -53,9 +110,9 @@ def decision_function(self, X, raw_values=False): | |||
Returns | |||
------- | |||
decision : array-like, shape (n_samples, ) | |||
The values of the decision function for each observations. | |||
The values of the decision function for each observation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Decision function of the samples.
I also checked that this doesn't break the examples in plot_outlier_detection.py and plot_outlier_detection_housing.py Do we want to try to clarify our position on the |
See Comment in issue #4168 for explanations about the cubic root. See also Cross validated. |
Let's stick to cleaning the code in this PR, and think about API consistency in #9015 |
ping @vene or @agramfort for a review? |
re-ping @agramfort (small PR!) |
LGTM |
LGTM, merging, thanks a lot! |
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
Following up discussion in issue #8693
-remove OutlierDetectionMixin