Skip to content

[MRG+2] clean outlier_detection.py #9018

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
Jun 9, 2017

Conversation

ngoix
Copy link
Contributor

@ngoix ngoix commented Jun 6, 2017

Following up discussion in issue #8693

-remove OutlierDetectionMixin

@ngoix
Copy link
Contributor Author

ngoix commented Jun 6, 2017

Open questions:

  • Should we keep raw_values parameter?
  • What do we do if contamination is None ? (maybe fix a 0.1 default parameter, as no value of the decision function is special?)

@ngoix ngoix force-pushed the clean_covariance_AD branch from 422ce84 to 4d7e6eb Compare June 6, 2017 16:40
@vene vene changed the title clean outlier_detection.py [WIP] clean outlier_detection.py Jun 7, 2017
@vene
Copy link
Member

vene commented Jun 7, 2017

added WIP in name :)

@ngoix
Copy link
Contributor Author

ngoix commented Jun 7, 2017

yes thanks :)

Copy link
Contributor

@albertcthomas albertcthomas left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A few corrections/suggestions.

I also did a bit of research to find out why the cubit root of the mahalanobis distance is returned when raw_values=False. Except this example stating visualization purpose I do not see why we need to return the cubic root.

In the end, I don't think we need this raw_values parameter and we should always return the mahalanobis distance without the cubic root. This would however require a deprecation warning... I will check that this does not break the cited example (if that's the case, maybe use the cubic root only for the example).

@@ -55,7 +112,7 @@ def decision_function(self, X, raw_values=False):
decision : array-like, shape (n_samples, )
The values of the decision function for each observations.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

each observation.


Parameters
----------
store_precision : bool
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boolean, optional (default=True)

store_precision : bool
Specify if the estimated precision is stored.

assume_centered : Boolean
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

boolean, optional (default=False)

If False, the robust location and covariance are directly computed
with the FastMCD algorithm without additional treatment.

support_fraction : float, 0 < support_fraction < 1
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float, optional (default=None)


support_fraction : float, 0 < support_fraction < 1
The proportion of points to be included in the support of the raw
MCD estimate. Default is ``None``, which implies that the minimum
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in the interval (0,1).

MCD estimate. Default is ``None``, which implies that the minimum
value of support_fraction will be used within the algorithm:
`[n_sample + n_features + 1] / 2`.

contamination : float, 0. < contamination < 0.5
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

float, optional (default=0.1)

MCD estimate. Default is ``None``, which implies that the minimum
value of support_fraction will be used within the algorithm:
`[n_sample + n_features + 1] / 2`.

contamination : float, 0. < contamination < 0.5
The amount of contamination of the data set, i.e. the proportion
of outliers in the data set.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be in the interval (0, 0.5).

self.contamination = contamination

def fit(self, X, y=None):
MinCovDet.fit(self, X)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super(EllipticEnveloppe, self).fit(X)

def __init__(self, store_precision=True, assume_centered=False,
support_fraction=None, contamination=0.1,
random_state=None):
MinCovDet.__init__(self, store_precision=store_precision,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

super(EllipticEnveloppe, self).fit(X)

@@ -53,9 +110,9 @@ def decision_function(self, X, raw_values=False):
Returns
-------
decision : array-like, shape (n_samples, )
The values of the decision function for each observations.
The values of the decision function for each observation.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Decision function of the samples.

@albertcthomas
Copy link
Contributor

albertcthomas commented Jun 8, 2017

I also checked that this doesn't break the examples in plot_outlier_detection.py and plot_outlier_detection_housing.py

Do we want to try to clarify our position on the raw_values parameter in this PR on in #9015?

@albertcthomas
Copy link
Contributor

See Comment in issue #4168 for explanations about the cubic root. See also Cross validated.

@ngoix
Copy link
Contributor Author

ngoix commented Jun 8, 2017

Let's stick to cleaning the code in this PR, and think about API consistency in #9015

@ngoix ngoix changed the title [WIP] clean outlier_detection.py [MRG+1] clean outlier_detection.py Jun 8, 2017
@ngoix
Copy link
Contributor Author

ngoix commented Jun 8, 2017

ping @vene or @agramfort for a review?

@ngoix
Copy link
Contributor Author

ngoix commented Jun 9, 2017

re-ping @agramfort (small PR!)

@TomDLT
Copy link
Member

TomDLT commented Jun 9, 2017

LGTM

@ngoix ngoix changed the title [MRG+1] clean outlier_detection.py [MRG+2] clean outlier_detection.py Jun 9, 2017
@lesteve
Copy link
Member

lesteve commented Jun 9, 2017

LGTM, merging, thanks a lot!

@lesteve lesteve merged commit 0fb9a50 into scikit-learn:master Jun 9, 2017
Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request Jun 14, 2017
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
dmohns pushed a commit to dmohns/scikit-learn that referenced this pull request Aug 7, 2017
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
NelleV pushed a commit to NelleV/scikit-learn that referenced this pull request Aug 11, 2017
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
paulha pushed a commit to paulha/scikit-learn that referenced this pull request Aug 19, 2017
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
AishwaryaRK pushed a commit to AishwaryaRK/scikit-learn that referenced this pull request Aug 29, 2017
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
maskani-moh pushed a commit to maskani-moh/scikit-learn that referenced this pull request Nov 15, 2017
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
jwjohnson314 pushed a commit to jwjohnson314/scikit-learn that referenced this pull request Dec 18, 2017
Remove OutlierDetectionMixin, which was only used by by EllipticEnvelope
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants