[MRG+2] clean outlier_detection.py #9018

Merged

lesteve merged 3 commits into scikit-learn:master from ngoix:clean_covariance_AD

Jun 9, 2017

Contributor

ngoix commented Jun 6, 2017

Following up on the discussion in issue #8693:

- remove OutlierDetectionMixin

Contributor Author

ngoix commented Jun 6, 2017

Open questions:

Should we keep the raw_values parameter?
What do we do if contamination is None? (Maybe set a default of 0.1, since no value of the decision function is special?)
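To make the second question concrete, here is a minimal sketch of how a contamination value can be turned into a cutoff on the scores. It is an illustration only, not the code in outlier_detection.py: the helper names `contamination_threshold` and `predict_labels` are hypothetical, and the sketch assumes that larger scores mean "more abnormal" (as with a distance).

```python
import math


def contamination_threshold(scores, contamination=0.1):
    """Return the cutoff that flags roughly `contamination` of the samples.

    Hypothetical helper: assumes larger scores are more abnormal.
    """
    if not 0.0 < contamination < 0.5:
        raise ValueError("contamination must be in the interval (0, 0.5)")
    ordered = sorted(scores)
    # Number of samples to flag, at least one.
    n_outliers = max(1, math.ceil(contamination * len(ordered)))
    # The cutoff is the smallest score among the n_outliers largest ones.
    return ordered[-n_outliers]


def predict_labels(scores, contamination=0.1):
    """Label samples -1 (outlier) or +1 (inlier) using the cutoff above."""
    cutoff = contamination_threshold(scores, contamination)
    return [-1 if s >= cutoff else 1 for s in scores]


scores = [0.2, 0.5, 0.1, 9.0, 0.3, 0.4, 0.6, 0.7, 0.8, 0.25]
labels = predict_labels(scores, contamination=0.1)
print(labels)
```

The cutoff depends only on the ranking of the scores, which is why no particular value of the decision function stands out as a natural threshold when contamination is not given.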


          remove OutlierDetectionMixin

4d7e6eb

ngoix force-pushed the clean_covariance_AD branch from 422ce84 to 4d7e6eb

June 6, 2017 16:40

vene changed the title ~~clean outlier_detection.py~~ [WIP] clean outlier_detection.py

Member

vene commented Jun 7, 2017

added WIP in name :)

Contributor Author

ngoix commented Jun 7, 2017

yes thanks :)

albertcthomas reviewed

Contributor

albertcthomas left a comment

A few corrections/suggestions.

I also did a bit of research to find out why the cubic root of the Mahalanobis distance is returned when raw_values=False. Apart from this example, which mentions visualization purposes, I do not see why we need to return the cubic root.

In the end, I don't think we need this raw_values parameter, and we should always return the Mahalanobis distance without the cubic root. This would however require a deprecation warning... I will check that this does not break the cited example (if it does, maybe use the cubic root only in the example).
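For readers following the raw_values discussion, the sketch below shows the point at stake: taking the cubic root rescales the distances but does not change their ordering. It is a pure-Python illustration for 2-D points, not the library code; `mahalanobis_sq` and `ranking` are hypothetical helpers, and the mean and covariance are made-up values.

```python
def mahalanobis_sq(x, mean, cov):
    """Squared Mahalanobis distance of a 2-D point, given a 2x2 covariance."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    # Closed-form inverse of a 2x2 matrix.
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - mean[0], x[1] - mean[1]
    return (dx * (inv[0][0] * dx + inv[0][1] * dy)
            + dy * (inv[1][0] * dx + inv[1][1] * dy))


def ranking(values):
    """Indices of `values` sorted from smallest to largest."""
    return sorted(range(len(values)), key=values.__getitem__)


# Made-up location and covariance, for illustration only.
mean = (0.0, 0.0)
cov = ((1.0, 0.0), (0.0, 4.0))
points = [(0.5, 0.5), (3.0, 0.0), (0.0, 3.0), (4.0, 4.0)]

raw = [mahalanobis_sq(p, mean, cov) for p in points]
cube_root = [r ** (1.0 / 3.0) for r in raw]

# The cubic root is monotone, so both versions rank the points identically.
print(ranking(raw) == ranking(cube_root))
```

Since the ordering is preserved, any threshold expressed as a quantile of the scores flags the same samples with or without the cubic root; only the scale of the returned values differs.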

sklearn/covariance/outlier_detection.py Outdated

		@@ -55,7 +112,7 @@ def decision_function(self, X, raw_values=False):
		decision : array-like, shape (n_samples, )
		The values of the decision function for each observations.

Contributor

albertcthomas Jun 7, 2017

each observation.

sklearn/covariance/outlier_detection.py Outdated

                   Parameters
                   ----------
+                  store_precision : bool

Contributor

albertcthomas Jun 7, 2017

boolean, optional (default=True)

sklearn/covariance/outlier_detection.py Outdated

+                  store_precision : bool
+                      Specify if the estimated precision is stored.
+                  assume_centered : Boolean

Contributor

albertcthomas Jun 7, 2017

boolean, optional (default=False)

sklearn/covariance/outlier_detection.py Outdated

+                      If False, the robust location and covariance are directly computed
+                      with the FastMCD algorithm without additional treatment.
+                  support_fraction : float, 0 < support_fraction < 1

Contributor

albertcthomas Jun 7, 2017

float, optional (default=None)

sklearn/covariance/outlier_detection.py Outdated

+                  support_fraction : float, 0 < support_fraction < 1
+                      The proportion of points to be included in the support of the raw
+                      MCD estimate. Default is ``None``, which implies that the minimum

Contributor

albertcthomas Jun 7, 2017

Should be in the interval (0,1).

sklearn/covariance/outlier_detection.py Outdated

+                      MCD estimate. Default is ``None``, which implies that the minimum
+                      value of support_fraction will be used within the algorithm:
+                      `[n_sample + n_features + 1] / 2`.
                   contamination : float, 0. < contamination < 0.5

Contributor

albertcthomas Jun 7, 2017

float, optional (default=0.1)

sklearn/covariance/outlier_detection.py Outdated

+                      MCD estimate. Default is ``None``, which implies that the minimum
+                      value of support_fraction will be used within the algorithm:
+                      `[n_sample + n_features + 1] / 2`.
                   contamination : float, 0. < contamination < 0.5
                       The amount of contamination of the data set, i.e. the proportion
                       of outliers in the data set.

Contributor

albertcthomas Jun 7, 2017

Should be in the interval (0, 0.5).

sklearn/covariance/outlier_detection.py Outdated

                       self.contamination = contamination
+                  def fit(self, X, y=None):
+                      MinCovDet.fit(self, X)

Contributor

albertcthomas Jun 7, 2017

super(EllipticEnvelope, self).fit(X)

sklearn/covariance/outlier_detection.py Outdated

+                  def __init__(self, store_precision=True, assume_centered=False,
+                               support_fraction=None, contamination=0.1,
+                               random_state=None):
+                      MinCovDet.__init__(self, store_precision=store_precision,

Contributor

albertcthomas Jun 7, 2017

super(EllipticEnvelope, self).__init__(...)
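The two suggestions above amount to going through `super()` instead of naming the parent class explicitly. A minimal sketch of that pattern follows, using stand-in classes (`MinCovDetSketch` and `EllipticEnvelopeSketch` are hypothetical and only mimic the shape of the real estimators; the parent's `fit` here just records the sample count).

```python
class MinCovDetSketch(object):
    """Stand-in for the parent estimator; not the real MinCovDet."""

    def __init__(self, store_precision=True, random_state=None):
        self.store_precision = store_precision
        self.random_state = random_state

    def fit(self, X):
        # Placeholder for the real fitting logic.
        self.n_samples_ = len(X)
        return self


class EllipticEnvelopeSketch(MinCovDetSketch):
    """Stand-in for the child estimator, delegating through super()."""

    def __init__(self, store_precision=True, contamination=0.1,
                 random_state=None):
        # Delegate to the parent without hard-coding its name.
        super(EllipticEnvelopeSketch, self).__init__(
            store_precision=store_precision, random_state=random_state)
        self.contamination = contamination

    def fit(self, X, y=None):
        # y is accepted for API compatibility and ignored.
        super(EllipticEnvelopeSketch, self).fit(X)
        return self


est = EllipticEnvelopeSketch(contamination=0.2).fit([[0.0], [1.0]])
print(est.contamination, est.n_samples_)
```

The explicit two-argument form `super(Class, self)` is used because it works on both Python 2 and Python 3.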


          take into account albertcthomas review

0b717f9

albertcthomas reviewed

sklearn/covariance/outlier_detection.py Outdated

@@ -53,9 +110,9 @@ def decision_function(self, X, raw_values=False):
                       Returns
                       -------
                       decision : array-like, shape (n_samples, )
-                          The values of the decision function for each observations.
+                          The values of the decision function for each observation.

Contributor

albertcthomas Jun 7, 2017

Decision function of the samples.

Contributor

albertcthomas commented Jun 8, 2017

I also checked that this doesn't break the examples in plot_outlier_detection.py and plot_outlier_detection_housing.py.

Do we want to try to clarify our position on the raw_values parameter in this PR or in #9015?

Contributor

albertcthomas commented Jun 8, 2017

See this comment in issue #4168 for an explanation of the cubic root. See also Cross Validated.

Contributor Author

ngoix commented Jun 8, 2017

Let's stick to cleaning the code in this PR, and think about API consistency in #9015

ngoix changed the title ~~[WIP] clean outlier_detection.py~~ [MRG+1] clean outlier_detection.py

Contributor Author

ngoix commented Jun 8, 2017

ping @vene or @agramfort for a review?

ngoix mentioned this pull request

[MRG+2] Outlier detection algorithms API consistency #9015

Merged

Contributor Author

ngoix commented Jun 9, 2017

re-ping @agramfort (small PR!)

Member

TomDLT commented Jun 9, 2017

LGTM

TomDLT approved these changes


          cosmit

9e5f55c

ngoix changed the title ~~[MRG+1] clean outlier_detection.py~~ [MRG+2] clean outlier_detection.py

Member

lesteve commented Jun 9, 2017

LGTM, merging, thanks a lot!

lesteve merged commit 0fb9a50 into scikit-learn:master

Sundrique pushed a commit to Sundrique/scikit-learn that referenced this pull request


          [MRG+2] clean outlier_detection.py (scikit-learn#9018)

f4a97a0

Remove OutlierDetectionMixin, which was only used by EllipticEnvelope
