-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG+1] iforest backward compatibility #11553
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sklearn/ensemble/iforest.py
Outdated
Assuming the behaviour parameter is set to 'old', we always have | ||
offset_ = -0.5. | ||
|
||
behaviour: str, default == 'old' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be written
behaviour: str, optional (default='old')
sklearn/ensemble/iforest.py
Outdated
@@ -131,6 +141,7 @@ def __init__(self, | |||
n_estimators=100, | |||
max_samples="auto", | |||
contamination="legacy", | |||
behaviour='old', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be positioned after n_jobs param
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We need to update the whatsnew entry that you added in the PR about the outlier detection API. Common tests are also failing. The common tests are written for the new API. We can maybe do check_estimator(IsolationForest(behavior='new'))
in test_iforest.py and skip IsolationForest in the common test until the end of the deprecation cycle? or just do estimator.set_params(behavior='new')
if estimator=IsolationForest in the common tests.
sklearn/ensemble/iforest.py
Outdated
Assuming the behaviour parameter is set to 'old', we always have | ||
offset_ = -0.5. | ||
|
||
behaviour: str, optional (default='old') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should be in the Parameters section of the docstring not Attributes section
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One cosmetic comment, and then this is good.
sklearn/ensemble/iforest.py
Outdated
Default "behaviour" parameter will change to "new" in version 0.22. | ||
Passing behaviour="new" makes the decision_function change to match | ||
other anomaly detection algorithm API, as explained in the offset_ | ||
parameter documentation. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that you should be slightly more explicit as to what switching this parameter entails.
LGTM. Thank you! |
for the common tests you can add if estimator.__class__.__name__ == "IsolationForest":
# XXX to be removed in 0.22.
# this is used because the old IsolationForest does not
# respect the outlier detection API and thus and does not
# pass the outlier detection common tests.
estimator.set_params(behaviour='new') in the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The examples should also be modified so that IsolationForest is instantiated with behaviour=new.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've not checked that the examples render similarly to 0.19, which is what we'd aim for...?
sklearn/ensemble/iforest.py
Outdated
Passing behaviour="new" makes the decision_function change to match | ||
other anomaly detection algorithm API, as explained in details in the | ||
offset_ attribute documentation. Basically, the decision_function | ||
becomes dependent from the contamination parameter, in such a way that |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"dependent from" is bad English. Do you mean "dependent on"?
sklearn/ensemble/iforest.py
Outdated
return self | ||
|
||
# else, define offset_ wrt contamination parameter, so that the | ||
# threshold_ attribute is implicitely 0 and is not needed anymore: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
*implicitly
flake8 errors |
|
Yes we need to check the examples. (Btw the example given in #11337 is not an example from scikit-learn but from a scipy tutorial). |
you have a travis error https://travis-ci.org/scikit-learn/scikit-learn/jobs/404541612 |
rebased on master, fixing conflict with #11561 (same as on 0.19) |
looks like travis will pass in a sec. should we wait for the rest? |
I'm very worried about the appveyor failure
Sent from my phone. Please forgive typos and briefness.
…On Jul 17, 2018, 20:44, at 20:44, Andreas Mueller ***@***.***> wrote:
looks like travis will pass in a sec. should we wait for the rest?
--
You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
#11553 (comment)
|
@ngoix you need to rebase. |
sklearn/ensemble/iforest.py
Outdated
self.contamination = contamination | ||
|
||
if behaviour == 'old': |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be done in fit
You need to catch the warning using the pytest.mark.filterwarings for when behaviour='old' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Basically, we turn warning into errors so you can refer to the following traceback:
================================ test session starts ================================
platform linux -- Python 3.6.5, pytest-3.6.1, py-1.5.3, pluggy-0.6.0 -- /home/lemaitre/miniconda3/envs/dev/bin/python
cachedir: .pytest_cache
rootdir: /home/lemaitre/Documents/code/toolbox/scikit-learn, inifile: setup.cfg
collected 14 items
sklearn/ensemble/tests/test_iforest.py::test_iforest PASSED [ 7%]
sklearn/ensemble/tests/test_iforest.py::test_iforest_sparse FAILED [ 14%]
sklearn/ensemble/tests/test_iforest.py::test_iforest_error FAILED [ 21%]
sklearn/ensemble/tests/test_iforest.py::test_recalculate_max_depth FAILED [ 28%]
sklearn/ensemble/tests/test_iforest.py::test_max_samples_attribute FAILED [ 35%]
sklearn/ensemble/tests/test_iforest.py::test_iforest_parallel_regression FAILED [ 42%]
sklearn/ensemble/tests/test_iforest.py::test_iforest_performance FAILED [ 50%]
sklearn/ensemble/tests/test_iforest.py::test_iforest_works PASSED [ 57%]
sklearn/ensemble/tests/test_iforest.py::test_max_samples_consistency FAILED [ 64%]
sklearn/ensemble/tests/test_iforest.py::test_iforest_subsampled_features FAILED [ 71%]
sklearn/ensemble/tests/test_iforest.py::test_iforest_average_path_length PASSED [ 78%]
sklearn/ensemble/tests/test_iforest.py::test_score_samples FAILED [ 85%]
sklearn/ensemble/tests/test_iforest.py::test_deprecation FAILED [ 92%]
sklearn/ensemble/tests/test_iforest.py::test_behaviour_param FAILED [100%]
===================================== FAILURES ======================================
________________________________ test_iforest_sparse ________________________________
@pytest.mark.filterwarnings('ignore:default contamination')
@pytest.mark.filterwarnings('ignore:threshold_ attribute')
def test_iforest_sparse():
"""Check IForest for various parameter settings on sparse input."""
rng = check_random_state(0)
X_train, X_test, y_train, y_test = train_test_split(boston.data[:50],
boston.target[:50],
random_state=rng)
grid = ParameterGrid({"max_samples": [0.5, 1.0],
"bootstrap": [True, False]})
for sparse_format in [csc_matrix, csr_matrix]:
X_train_sparse = sparse_format(X_train)
X_test_sparse = sparse_format(X_test)
for params in grid:
# Trained on sparse format
sparse_classifier = IsolationForest(
> n_estimators=10, random_state=1, **params).fit(X_train_sparse)
sklearn/ensemble/tests/test_iforest.py:85:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=True, contamination='legacy',
max_features=1.0, max_samples=0.5, n_estimators=10, n_jobs=1,
random_state=1, verbose=0)
X = <37x13 sparse matrix of type '<class 'numpy.float64'>'
with 420 stored elements in Compressed Sparse Column format>
y = None, sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
FutureWarning)
self._contamination = 0.1
else:
self._contamination = self.contamination
if self.behaviour == 'old':
warnings.warn('Default "behaviour" parameter will change to "new" '
'in version 0.22. Passing behaviour="new" makes '
'IsolationForest decision_function change to match '
'other anomaly detection algorithm API.',
> FutureWarning)
E FutureWarning: Default "behaviour" parameter will change to "new" in version 0.22. Passing behaviour="new" makes IsolationForest decision_function change to match other anomaly detection algorithm API.
sklearn/ensemble/iforest.py:209: FutureWarning
________________________________ test_iforest_error _________________________________
@pytest.mark.filterwarnings('ignore:default contamination')
@pytest.mark.filterwarnings('ignore:threshold_ attribute')
def test_iforest_error():
"""Test that it gives proper exception on deficient input."""
X = iris.data
# Test max_samples
assert_raises(ValueError,
> IsolationForest(max_samples=-1).fit, X)
sklearn/ensemble/tests/test_iforest.py:104:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
sklearn/utils/_unittest_backport.py:204: in assertRaises
return context.handle('assertRaises', args, kwargs)
sklearn/utils/_unittest_backport.py:113: in handle
callable_obj(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
max_features=1.0, max_samples=-1, n_estimators=100, n_jobs=1,
random_state=None, verbose=0)
X = array([[5.8, 2.8, 5.1, 2.4],
[6. , 2.2, 4. , 1. ],
[5.5, 4.2, 1.4, 0.2],
[7.3, 2.9, 6.3, 1.8],
...],
[6.3, 2.9, 5.6, 1.8],
[5.8, 2.7, 4.1, 1. ],
[7.7, 3.8, 6.7, 2.2],
[4.6, 3.2, 1.4, 0.2]])
y = None, sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
FutureWarning)
self._contamination = 0.1
else:
self._contamination = self.contamination
if self.behaviour == 'old':
warnings.warn('Default "behaviour" parameter will change to "new" '
'in version 0.22. Passing behaviour="new" makes '
'IsolationForest decision_function change to match '
'other anomaly detection algorithm API.',
> FutureWarning)
E FutureWarning: Default "behaviour" parameter will change to "new" in version 0.22. Passing behaviour="new" makes IsolationForest decision_function change to match other anomaly detection algorithm API.
sklearn/ensemble/iforest.py:209: FutureWarning
____________________________ test_recalculate_max_depth _____________________________
@pytest.mark.filterwarnings('ignore:default contamination')
def test_recalculate_max_depth():
"""Check max_depth recalculation when max_samples is reset to n_samples"""
X = iris.data
> clf = IsolationForest().fit(X)
sklearn/ensemble/tests/test_iforest.py:145:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=1,
random_state=None, verbose=0)
X = array([[5.8, 2.8, 5.1, 2.4],
[6. , 2.2, 4. , 1. ],
[5.5, 4.2, 1.4, 0.2],
[7.3, 2.9, 6.3, 1.8],
...],
[6.3, 2.9, 5.6, 1.8],
[5.8, 2.7, 4.1, 1. ],
[7.7, 3.8, 6.7, 2.2],
[4.6, 3.2, 1.4, 0.2]])
y = None, sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
FutureWarning)
self._contamination = 0.1
else:
self._contamination = self.contamination
if self.behaviour == 'old':
warnings.warn('Default "behaviour" parameter will change to "new" '
'in version 0.22. Passing behaviour="new" makes '
'IsolationForest decision_function change to match '
'other anomaly detection algorithm API.',
> FutureWarning)
E FutureWarning: Default "behaviour" parameter will change to "new" in version 0.22. Passing behaviour="new" makes IsolationForest decision_function change to match other anomaly detection algorithm API.
sklearn/ensemble/iforest.py:209: FutureWarning
____________________________ test_max_samples_attribute _____________________________
@pytest.mark.filterwarnings('ignore:default contamination')
def test_max_samples_attribute():
X = iris.data
> clf = IsolationForest().fit(X)
sklearn/ensemble/tests/test_iforest.py:153:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=1,
random_state=None, verbose=0)
X = array([[5.8, 2.8, 5.1, 2.4],
[6. , 2.2, 4. , 1. ],
[5.5, 4.2, 1.4, 0.2],
[7.3, 2.9, 6.3, 1.8],
...],
[6.3, 2.9, 5.6, 1.8],
[5.8, 2.7, 4.1, 1. ],
[7.7, 3.8, 6.7, 2.2],
[4.6, 3.2, 1.4, 0.2]])
y = None, sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
FutureWarning)
self._contamination = 0.1
else:
self._contamination = self.contamination
if self.behaviour == 'old':
warnings.warn('Default "behaviour" parameter will change to "new" '
'in version 0.22. Passing behaviour="new" makes '
'IsolationForest decision_function change to match '
'other anomaly detection algorithm API.',
> FutureWarning)
E FutureWarning: Default "behaviour" parameter will change to "new" in version 0.22. Passing behaviour="new" makes IsolationForest decision_function change to match other anomaly detection algorithm API.
sklearn/ensemble/iforest.py:209: FutureWarning
_________________________ test_iforest_parallel_regression __________________________
@pytest.mark.filterwarnings('ignore:default contamination')
@pytest.mark.filterwarnings('ignore:threshold_ attribute')
def test_iforest_parallel_regression():
"""Check parallel regression."""
rng = check_random_state(0)
X_train, X_test, y_train, y_test = train_test_split(boston.data,
boston.target,
random_state=rng)
ensemble = IsolationForest(n_jobs=3,
> random_state=0).fit(X_train)
sklearn/ensemble/tests/test_iforest.py:177:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=3,
random_state=0, verbose=0)
X = array([[1.44760e-01, 0.00000e+00, 1.00100e+01, ..., 1.78000e+01,
3.91500e+02, 1.36100e+01],
[3.46600e-0...+02, 6.53000e+00],
[1.40507e+01, 0.00000e+00, 1.81000e+01, ..., 2.02000e+01,
3.50500e+01, 2.12200e+01]])
y = None, sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
FutureWarning)
self._contamination = 0.1
else:
self._contamination = self.contamination
if self.behaviour == 'old':
warnings.warn('Default "behaviour" parameter will change to "new" '
'in version 0.22. Passing behaviour="new" makes '
'IsolationForest decision_function change to match '
'other anomaly detection algorithm API.',
> FutureWarning)
E FutureWarning: Default "behaviour" parameter will change to "new" in version 0.22. Passing behaviour="new" makes IsolationForest decision_function change to match other anomaly detection algorithm API.
sklearn/ensemble/iforest.py:209: FutureWarning
_____________________________ test_iforest_performance ______________________________
@pytest.mark.filterwarnings('ignore:default contamination')
def test_iforest_performance():
"""Test Isolation Forest performs well"""
# Generate train/test data
rng = check_random_state(2)
X = 0.3 * rng.randn(120, 2)
X_train = np.r_[X + 2, X - 2]
X_train = X[:100]
# Generate some abnormal novel observations
X_outliers = rng.uniform(low=-4, high=4, size=(20, 2))
X_test = np.r_[X[100:], X_outliers]
y_test = np.array([0] * 20 + [1] * 20)
# fit the model
> clf = IsolationForest(max_samples=100, random_state=rng).fit(X_train)
sklearn/ensemble/tests/test_iforest.py:208:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
max_features=1.0, max_samples=100, n_estimators=100, n_jobs=1,
random_state=<mtrand.RandomState object at 0x7fe2e0dc3360>,
verbose=0)
X = array([[-1.25027354e-01, -1.68800482e-02],
[-6.40858829e-01, 4.92081243e-01],
[-5.38030676e-01, -2.5252....66428750e-01, 5.41342992e-01],
[ 5.41229421e-02, 1.65949282e-01],
[ 3.09908720e-01, -9.87007304e-02]])
y = None, sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
FutureWarning)
self._contamination = 0.1
else:
self._contamination = self.contamination
if self.behaviour == 'old':
warnings.warn('Default "behaviour" parameter will change to "new" '
'in version 0.22. Passing behaviour="new" makes '
'IsolationForest decision_function change to match '
'other anomaly detection algorithm API.',
> FutureWarning)
E FutureWarning: Default "behaviour" parameter will change to "new" in version 0.22. Passing behaviour="new" makes IsolationForest decision_function change to match other anomaly detection algorithm API.
sklearn/ensemble/iforest.py:209: FutureWarning
___________________________ test_max_samples_consistency ____________________________
@pytest.mark.filterwarnings('ignore:default contamination')
def test_max_samples_consistency():
# Make sure validated max_samples in iforest and BaseBagging are identical
X = iris.data
> clf = IsolationForest().fit(X)
sklearn/ensemble/tests/test_iforest.py:238:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=1,
random_state=None, verbose=0)
X = array([[5.8, 2.8, 5.1, 2.4],
[6. , 2.2, 4. , 1. ],
[5.5, 4.2, 1.4, 0.2],
[7.3, 2.9, 6.3, 1.8],
...],
[6.3, 2.9, 5.6, 1.8],
[5.8, 2.7, 4.1, 1. ],
[7.7, 3.8, 6.7, 2.2],
[4.6, 3.2, 1.4, 0.2]])
y = None, sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
FutureWarning)
self._contamination = 0.1
else:
self._contamination = self.contamination
if self.behaviour == 'old':
warnings.warn('Default "behaviour" parameter will change to "new" '
'in version 0.22. Passing behaviour="new" makes '
'IsolationForest decision_function change to match '
'other anomaly detection algorithm API.',
> FutureWarning)
E FutureWarning: Default "behaviour" parameter will change to "new" in version 0.22. Passing behaviour="new" makes IsolationForest decision_function change to match other anomaly detection algorithm API.
sklearn/ensemble/iforest.py:209: FutureWarning
_________________________ test_iforest_subsampled_features __________________________
@pytest.mark.filterwarnings('ignore:default contamination')
@pytest.mark.filterwarnings('ignore:threshold_ attribute')
def test_iforest_subsampled_features():
# It tests non-regression for #5732 which failed at predict.
rng = check_random_state(0)
X_train, X_test, y_train, y_test = train_test_split(boston.data[:50],
boston.target[:50],
random_state=rng)
clf = IsolationForest(max_features=0.8)
> clf.fit(X_train, y_train)
sklearn/ensemble/tests/test_iforest.py:251:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
max_features=0.8, max_samples='auto', n_estimators=100, n_jobs=1,
random_state=None, verbose=0)
X = array([[5.20177e+00, 0.00000e+00, 1.81000e+01, 1.00000e+00, 7.70000e-01,
6.12700e+00, 8.34000e+01, 2.72270e+00... 6.89700e+00, 5.43000e+01, 6.33610e+00, 6.00000e+00, 3.00000e+02,
1.66000e+01, 3.91250e+02, 1.13800e+01]])
y = array([22.7, 26.2, 17.8, 10.9, 22.2, 18.2, 24.6, 22. , 13.4, 5. , 9.5,
36.4, 21.6, 31.5, 14.9, 33.3, 24. , 16... , 15.1, 14.9,
28.4, 14.4, 16.4, 19. , 14.5, 24.4, 31.1, 23.7, 19.3, 32.9, 33.1,
8.8, 19.9, 26.6, 22. ])
sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
FutureWarning)
self._contamination = 0.1
else:
self._contamination = self.contamination
if self.behaviour == 'old':
warnings.warn('Default "behaviour" parameter will change to "new" '
'in version 0.22. Passing behaviour="new" makes '
'IsolationForest decision_function change to match '
'other anomaly detection algorithm API.',
> FutureWarning)
E FutureWarning: Default "behaviour" parameter will change to "new" in version 0.22. Passing behaviour="new" makes IsolationForest decision_function change to match other anomaly detection algorithm API.
sklearn/ensemble/iforest.py:209: FutureWarning
________________________________ test_score_samples _________________________________
@pytest.mark.filterwarnings('ignore:default contamination')
def test_score_samples():
X_train = [[1, 1], [1, 2], [2, 1]]
> clf1 = IsolationForest(contamination=0.1).fit(X_train)
sklearn/ensemble/tests/test_iforest.py:271:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=False, contamination=0.1,
max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=1,
random_state=None, verbose=0)
X = [[1, 1], [1, 2], [2, 1]], y = None, sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
FutureWarning)
self._contamination = 0.1
else:
self._contamination = self.contamination
if self.behaviour == 'old':
warnings.warn('Default "behaviour" parameter will change to "new" '
'in version 0.22. Passing behaviour="new" makes '
'IsolationForest decision_function change to match '
'other anomaly detection algorithm API.',
> FutureWarning)
E FutureWarning: Default "behaviour" parameter will change to "new" in version 0.22. Passing behaviour="new" makes IsolationForest decision_function change to match other anomaly detection algorithm API.
sklearn/ensemble/iforest.py:209: FutureWarning
_________________________________ test_deprecation __________________________________
def test_deprecation():
X = [[0.0], [1.0]]
clf = IsolationForest()
assert_warns_message(FutureWarning,
'default contamination parameter 0.1 will change '
'in version 0.22 to "auto"',
clf.fit, X)
assert_warns_message(FutureWarning,
'Default "behaviour" parameter will change to "new" '
'in version 0.22',
clf.fit, X)
> clf = IsolationForest().fit(X)
sklearn/ensemble/tests/test_iforest.py:295:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=1,
random_state=None, verbose=0)
X = [[0.0], [1.0]], y = None, sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
> FutureWarning)
E FutureWarning: default contamination parameter 0.1 will change in version 0.22 to "auto". This will change the predict method behavior.
sklearn/ensemble/iforest.py:199: FutureWarning
_______________________________ test_behaviour_param ________________________________
def test_behaviour_param():
X_train = [[1, 1], [1, 2], [2, 1]]
> clf1 = IsolationForest(behaviour='old').fit(X_train)
sklearn/ensemble/tests/test_iforest.py:304:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
self = IsolationForest(behaviour='old', bootstrap=False, contamination='legacy',
max_features=1.0, max_samples='auto', n_estimators=100, n_jobs=1,
random_state=None, verbose=0)
X = [[1, 1], [1, 2], [2, 1]], y = None, sample_weight = None
def fit(self, X, y=None, sample_weight=None):
"""Fit estimator.
Parameters
----------
X : array-like or sparse matrix, shape (n_samples, n_features)
The input samples. Use ``dtype=np.float32`` for maximum
efficiency. Sparse matrices are also supported, use sparse
``csc_matrix`` for maximum efficiency.
sample_weight : array-like, shape = [n_samples] or None
Sample weights. If None, then samples are equally weighted.
Returns
-------
self : object
"""
if self.contamination == "legacy":
warnings.warn('default contamination parameter 0.1 will change '
'in version 0.22 to "auto". This will change the '
'predict method behavior.',
> FutureWarning)
E FutureWarning: default contamination parameter 0.1 will change in version 0.22 to "auto". This will change the predict method behavior.
sklearn/ensemble/iforest.py:199: FutureWarning
======================== 11 failed, 3 passed in 0.96 seconds ========================
PR ready to be merged I think |
LGTM @glemaitre merge if you're happy |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry to be late but I think we should be clearer about the deprecation cycle.
- we are going to remove the
behaviour
parameter at some point (0.24 according to the PR description) right? In that case I think we should mention it somewhere (in whastnew, the docstring or the deprecation warning). I don't see something mentioning this except the description of the PR. - I would also add in whatsnew that the default value of
behaviour
will benew
in 0.22
We have to be careful there. In 0.22, we will not be able to make the old behaviour since a bunch of thing will be deprecated or change from default. Therefore, Does it make sense (or is it make jet-lag speaking)? |
Yes it does. In this case the docstring of the Should we add in the docstring, the warning and the whatsnew entry that behaviour parameter will be removed in 0.22? |
Yes we definitely should, thanks for pointing out! |
Actually I'm not very comfortable in introducing a deprecated parameter. We can't both change the behaviour + remove the parameter in one single version. |
We are doing this right now there: Moreover, here I think that this is even more problematic since we will not be able to maintain |
here this is not a wrong behaviour though, just a different convention...
|
I don't think we should deprecate |
We are supposed to not have |
Right, but we can still allow I'm also ok with @jorisvandenbossche compromise solution, ie remove |
I'm +1 for @jorisvandenbossche 's solution, even if it takes 2 deprecation cycles. We introduce |
And +1 for this also if we choose @jorisvandenbossche 's solution |
fix + cosmit cosmit common test update examples cosmit + fix travis attribute error instead of value + mask warning in test whatsnew
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that we are almost good then. Just a weird thing with the example and I would slightly modify the behaviour docstring.
@@ -0,0 +1,129 @@ | |||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it a new file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example was removed in PR #10700 (merged yesterday) in favor of plot_anomaly_comparison.py. We don’t need it anymore.
sklearn/ensemble/iforest.py
Outdated
offset_ attribute documentation. Basically, the decision_function | ||
becomes dependent on the contamination parameter, in such a way that | ||
0 becomes its natural threshold to detect outliers. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we can use the sphinx syntax to highlight those remarks. I would also remove the optional since that people should probably use 'new' and state that 'new' will be the default in the future.
behaviour : str, default='old'
Behaviour of the ``decision_function`` which can be either 'old' or 'new'.
Passing ``behaviour='new'`` makes the ``decision_function`` change to match
other anomaly detection algorithm API which will be the default behaviour
in the future. As explained in details in the ``offset_`` attribute
documentation, the ``decision_function`` becomes dependent on the
contamination parameter, in such a way that 0 becomes its natural
threshold to detect outliers.
.. versionadded:: 0.20
``behaviour`` is added in 0.20 for back-compatibility purpose.
.. deprecated:: 0.20
``behaviour='old'`` is deprecated in 0.20 and will not be possible in 0.22.
.. deprecated:: 0.22
``behaviour`` parameter will be deprecated in 0.22 and removed in 0.24.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You have the explanation duplicated now
Thanks @ngoix !!! Merging now. |
Reference Issues/PRs
Fixes #11337
What does this implement/fix? Explain your changes.
Add a parameter which switches old/new behaviour. Default value to be changed in 0.22 and parameter to be removed in 0.24