[MRG+1] Option to suppress validation for finiteness #7548

Merged · 20 commits · Jun 8, 2017

Commits
1cbf18b
ENH add suppress validation option
jnothman Oct 2, 2016
f0878d0
TST skip problematic doctest
jnothman Oct 2, 2016
4d35dd8
Rename SUPPRESS_VALIDATION to PRESUME_FINITE
jnothman Oct 5, 2016
2530a47
Change PRESUME_ to ASSUME_ for convention's sake
jnothman Oct 7, 2016
379302c
DOC add note regarding assert_all_finite
jnothman Oct 15, 2016
ac125b9
ENH add set_config context manager for ASSUME_FINITE
jnothman Nov 23, 2016
0b800e5
Make ASSUME_FINITE private and provide get_config
jnothman Nov 23, 2016
c53c8d9
Fix ImportError due to incomplete change in last commit
jnothman Nov 23, 2016
69e58c4
TST/DOC tests and more cautious documentation for set_config
jnothman Nov 24, 2016
bc03588
DOC what's new entry for validation suppression
jnothman Nov 28, 2016
c6664bb
Merge branch 'master' into suppress-validation
jnothman Nov 28, 2016
67188f0
context manager is now config_context; set_config affects global config
jnothman Dec 5, 2016
1ae394b
Rename missed set_config to config_context
jnothman Dec 5, 2016
71f6c23
Fix mis-named test
jnothman Dec 5, 2016
40bfbdb
Mention set_config in narrative docs
jnothman Dec 7, 2016
afbbdda
More explicit about limmited restoration of context
jnothman Dec 7, 2016
a020eac
Merge branch 'master' into suppress-validation
jnothman Dec 12, 2016
9d2eaf9
Handle case where error raised in config_context
jnothman Dec 15, 2016
089339c
Reset all settings after exiting context manager
jnothman Dec 21, 2016
b91c66e
Merge branch 'master' into suppress-validation
jnothman Jun 7, 2017
3 changes: 3 additions & 0 deletions doc/modules/classes.rst
@@ -40,6 +40,9 @@ Functions
:template: function.rst

base.clone
config_context
[Review comment — Member] I like this naming.

set_config
get_config


.. _cluster_ref:
19 changes: 19 additions & 0 deletions doc/modules/computational_performance.rst
@@ -68,6 +68,25 @@ To benchmark different estimators for your case you can simply change the
:ref:`sphx_glr_auto_examples_applications_plot_prediction_latency.py`. This should give
you an estimate of the order of magnitude of the prediction latency.

.. topic:: Configuring Scikit-learn for reduced validation overhead

Scikit-learn does some validation on data that increases the overhead per
[Review comment — Member] do you want to add something on the global setter?

[Reply — Member (Author)] Not really :)

[Reply — Member] Because it's discouraged? I feel like public functionality should be documented... hm...

[Reply — Member] Maybe mention it but say why the context manager is better?

call to ``predict`` and similar functions. In particular, checking that
features are finite (not NaN or infinite) involves a full pass over the
data. If you ensure that your data is acceptable, you may suppress
checking for finiteness by setting the environment variable
``SKLEARN_ASSUME_FINITE`` to a non-empty string before importing
scikit-learn, or configure it in Python with :func:`sklearn.set_config`.
For more control than these global settings, a :func:`config_context`
allows you to set this configuration within a specified context::

>>> import sklearn
>>> with sklearn.config_context(assume_finite=True):
...     pass  # do learning/prediction here with reduced validation

Note that this will affect all uses of
:func:`sklearn.utils.assert_all_finite` within the context.

Influence of the Number of Features
-----------------------------------

5 changes: 5 additions & 0 deletions doc/whats_new.rst
@@ -31,6 +31,11 @@ Changelog
New features
............

- Validation that input data contains no NaN or inf can now be suppressed
using :func:`config_context`, at your own risk. This will save on runtime,
and may be particularly useful for prediction time. :issue:`7548` by
`Joel Nothman`_.

- Added the :class:`neighbors.LocalOutlierFactor` class for anomaly
detection based on nearest neighbors.
:issue:`5279` by `Nicolas Goix`_ and `Alexandre Gramfort`_.
72 changes: 72 additions & 0 deletions sklearn/__init__.py
@@ -15,6 +15,78 @@
import sys
import re
import warnings
import os
from contextlib import contextmanager as _contextmanager

_ASSUME_FINITE = bool(os.environ.get('SKLEARN_ASSUME_FINITE', False))
[Review comment — Member] If we want to add more config in the future, don't we want them to be bundled in a dictionary?


def get_config():
    """Retrieve current values for configuration set by :func:`set_config`

    Returns
    -------
    config : dict
        Keys are parameter names that can be passed to :func:`set_config`.
    """
    return {'assume_finite': _ASSUME_FINITE}


def set_config(assume_finite=None):
    """Set global scikit-learn configuration

    Parameters
    ----------
    assume_finite : bool, optional
        If True, validation for finiteness will be skipped,
        saving time, but leading to potential crashes. If
        False, validation for finiteness will be performed,
        avoiding error.
    """
    global _ASSUME_FINITE
    if assume_finite is not None:
        _ASSUME_FINITE = assume_finite


@_contextmanager
def config_context(**new_config):
    """Context manager for global scikit-learn configuration

    Parameters
    ----------
    assume_finite : bool, optional

[Review comment — Member] bad param docstring

[Reply — Member (Author)] How so? Because I should say that it needs to be specified by name? The point here is that the user always has the option to change, but can leave things unchanged too.

[Reply — Member] forget it

        If True, validation for finiteness will be skipped,
        saving time, but leading to potential crashes. If
        False, validation for finiteness will be performed,
        avoiding error.

    Notes
    -----
    All settings, not just those presently modified, will be returned to
    their previous values when the context manager is exited. This is not
    thread-safe.

    Examples
    --------
    >>> import sklearn
    >>> from sklearn.utils.validation import assert_all_finite
    >>> with sklearn.config_context(assume_finite=True):
    ...     assert_all_finite([float('nan')])
    >>> with sklearn.config_context(assume_finite=True):
    ...     with sklearn.config_context(assume_finite=False):
    ...         assert_all_finite([float('nan')])
    ... # doctest: +ELLIPSIS
    Traceback (most recent call last):
        ...
    ValueError: Input contains NaN, ...
    """
    old_config = get_config().copy()
    set_config(**new_config)

    try:
        yield
    finally:
        set_config(**old_config)


# Make sure that DeprecationWarning within this package always gets printed
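The save/mutate/restore pattern in `config_context` above can be sketched in isolation. Below is a minimal stdlib-only version, using a hypothetical `_config` dict rather than sklearn's actual module globals, showing why the `try`/`finally` matters — the commit "Handle case where error raised in config_context" in this PR addresses exactly that:

```python
from contextlib import contextmanager

# Hypothetical stand-in for sklearn's module-level configuration state.
_config = {'assume_finite': False}


def get_config():
    return dict(_config)


def set_config(assume_finite=None):
    # None means "leave this setting unchanged", as in the PR.
    if assume_finite is not None:
        _config['assume_finite'] = assume_finite


@contextmanager
def config_context(**new_config):
    old_config = get_config()     # snapshot *all* settings on entry
    set_config(**new_config)
    try:
        yield
    finally:
        set_config(**old_config)  # restored even if the body raises


with config_context(assume_finite=True):
    assert get_config() == {'assume_finite': True}
assert get_config() == {'assume_finite': False}

# An exception inside the context still restores the previous settings.
try:
    with config_context(assume_finite=True):
        raise RuntimeError("boom")
except RuntimeError:
    pass
assert get_config() == {'assume_finite': False}
```

Without the `finally`, an exception raised inside the `with` body would leave the global flag permanently set — the same failure mode the PR's exception-handling commit guards against.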
68 changes: 68 additions & 0 deletions sklearn/tests/test_config.py
@@ -0,0 +1,68 @@
from sklearn import get_config, set_config, config_context
from sklearn.utils.testing import assert_equal, assert_raises


def test_config_context():
    assert_equal(get_config(), {'assume_finite': False})

    # Not using as a context manager affects nothing
    config_context(assume_finite=True)
    assert_equal(get_config(), {'assume_finite': False})

    with config_context(assume_finite=True):
        assert_equal(get_config(), {'assume_finite': True})
    assert_equal(get_config(), {'assume_finite': False})

    with config_context(assume_finite=True):
        with config_context(assume_finite=None):
            assert_equal(get_config(), {'assume_finite': True})

        assert_equal(get_config(), {'assume_finite': True})

        with config_context(assume_finite=False):
            assert_equal(get_config(), {'assume_finite': False})

            with config_context(assume_finite=None):
                assert_equal(get_config(), {'assume_finite': False})

                # global setting will not be retained outside of context that
                # did not modify this setting
                set_config(assume_finite=True)
                assert_equal(get_config(), {'assume_finite': True})

            assert_equal(get_config(), {'assume_finite': False})

        assert_equal(get_config(), {'assume_finite': True})

    assert_equal(get_config(), {'assume_finite': False})

    # No positional arguments
    assert_raises(TypeError, config_context, True)
    # No unknown arguments
    assert_raises(TypeError, config_context(do_something_else=True).__enter__)


def test_config_context_exception():
    assert_equal(get_config(), {'assume_finite': False})
    try:
        with config_context(assume_finite=True):
            assert_equal(get_config(), {'assume_finite': True})
            raise ValueError()
    except ValueError:
        pass
    assert_equal(get_config(), {'assume_finite': False})


def test_set_config():
    assert_equal(get_config(), {'assume_finite': False})
    set_config(assume_finite=None)
    assert_equal(get_config(), {'assume_finite': False})
    set_config(assume_finite=True)
    assert_equal(get_config(), {'assume_finite': True})
    set_config(assume_finite=None)
    assert_equal(get_config(), {'assume_finite': True})
    set_config(assume_finite=False)
    assert_equal(get_config(), {'assume_finite': False})

    # No unknown arguments
    assert_raises(TypeError, set_config, do_something_else=True)
12 changes: 11 additions & 1 deletion sklearn/utils/tests/test_validation.py
@@ -30,14 +30,15 @@
    has_fit_parameter,
    check_is_fitted,
    check_consistent_length,
    assert_all_finite,
)
import sklearn

from sklearn.exceptions import NotFittedError
from sklearn.exceptions import DataConversionWarning

from sklearn.utils.testing import assert_raise_message


def test_as_float_array():
    # Test function for as_float_array
    X = np.ones((3, 10), dtype=np.int32)
@@ -526,3 +527,12 @@ def test_check_dataframe_fit_attribute():
        check_consistent_length(X_df)
    except ImportError:
        raise SkipTest("Pandas not found")


def test_suppress_validation():
    X = np.array([0, np.inf])
    assert_raises(ValueError, assert_all_finite, X)
    sklearn.set_config(assume_finite=True)
    assert_all_finite(X)
    sklearn.set_config(assume_finite=False)
    assert_raises(ValueError, assert_all_finite, X)
3 changes: 3 additions & 0 deletions sklearn/utils/validation.py
@@ -16,6 +16,7 @@

from ..externals import six
from ..utils.fixes import signature
from .. import get_config as _get_config
from ..exceptions import NonBLASDotWarning
from ..exceptions import NotFittedError
from ..exceptions import DataConversionWarning
@@ -30,6 +31,8 @@

def _assert_all_finite(X):
    """Like assert_all_finite, but only for ndarray."""
    if _get_config()['assume_finite']:
        return
    X = np.asanyarray(X)
    # First try an O(n) time, O(1) space solution for the common case that
    # everything is finite; fall back to O(n) space np.isfinite to prevent
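The truncated comment at the end of this hunk describes a two-pass strategy: a single O(n)-time, O(1)-space reduction catches the common all-finite case cheaply, because NaN and inf propagate through summation; the exact elementwise check is only a fallback, since the sum can overflow to inf even when every input is finite. A stdlib sketch of the idea — hypothetical function name; sklearn's real code operates on numpy arrays:

```python
import math


def assert_all_finite_sketch(values):
    # Fast path: one O(n)-time, O(1)-space reduction. NaN and inf
    # propagate through summation, so a finite sum proves all inputs
    # are finite.
    if math.isfinite(sum(values)):
        return
    # Slow path: the sum may have merely overflowed for finite inputs,
    # so confirm elementwise before raising.
    if any(not math.isfinite(v) for v in values):
        raise ValueError("Input contains NaN, infinity or a value too large")


assert_all_finite_sketch([1.0, 2.0, 3.0])   # fast path: finite sum
assert_all_finite_sketch([1e308, 1e308])    # sum overflows, inputs still finite
```

The second call is the case that makes the fallback necessary: a non-finite sum does not by itself prove the input is bad.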