[MRG+1] Option to suppress validation for finiteness #7548
@@ -40,6 +40,9 @@ Functions
   :template: function.rst

   base.clone
   config_context
   set_config
   get_config


.. _cluster_ref:

@@ -68,6 +68,25 @@ To benchmark different estimators for your case you can simply change the
:ref:`sphx_glr_auto_examples_applications_plot_prediction_latency.py`. This should give
you an estimate of the order of magnitude of the prediction latency.

.. topic:: Configuring Scikit-learn for reduced validation overhead

  Scikit-learn does some validation on data that increases the overhead per

Review comment: Do you want to add something on the global setter?
Reply: Not really :)
Reply: Because it's discouraged? I feel like public functionality should be documented.... hm...
Reply: Maybe mention it but say why the context manager is better?

  call to ``predict`` and similar functions. In particular, checking that
  features are finite (not NaN or infinite) involves a full pass over the
  data. If you ensure that your data is acceptable, you may suppress
  checking for finiteness by setting the environment variable
  ``SKLEARN_ASSUME_FINITE`` to a non-empty string before importing
  scikit-learn, or configure it in Python with :func:`sklearn.set_config`.
  For more control than these global settings, a :func:`config_context`
  allows you to set this configuration within a specified context::

    >>> import sklearn
    >>> with sklearn.config_context(assume_finite=True):
    ...     pass  # do learning/prediction here with reduced validation

  Note that this will affect all uses of
  :func:`sklearn.utils.assert_all_finite` within the context.
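To illustrate why the finiteness check costs a full pass over the data, and what skipping it means, here is a minimal stdlib-only sketch. The function `check_finite` and its `assume_finite` parameter are hypothetical stand-ins for `sklearn.utils.assert_all_finite` and the global flag; this is not scikit-learn's implementation, which operates on NumPy arrays.

```python
import math


def check_finite(values, assume_finite=False):
    """Raise ValueError if any value is NaN or infinite.

    Hypothetical stand-in for assert_all_finite: the scan touches every
    element, which is the per-call overhead the flag lets you skip.
    """
    if assume_finite:
        return  # the caller has promised clean data, so skip the O(n) scan
    for v in values:
        if not math.isfinite(v):
            raise ValueError("Input contains NaN or infinity")


data = [0.5, 1.0, float('nan')]

try:
    check_finite(data)
except ValueError as exc:
    print("rejected:", exc)

# With the flag set, the bad value passes through unchecked.
check_finite(data, assume_finite=True)
```

The trade-off is the one the docstrings in this PR describe: time is saved, but invalid input is no longer caught at the boundary.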
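One consequence of the "non-empty string" rule is worth spelling out: the default in this PR is computed as `bool(os.environ.get('SKLEARN_ASSUME_FINITE', False))`, so values such as "0" or "false" also switch validation off. The sketch below mirrors that expression in a hypothetical helper, `assume_finite_from_env`, which takes the environment as a plain mapping so it can be tried without modifying the real process environment.

```python
def assume_finite_from_env(environ):
    """Mirror the PR's expression: bool(os.environ.get(NAME, False)).

    `assume_finite_from_env` is a hypothetical helper for illustration;
    the PR evaluates the expression once, at import time.
    """
    return bool(environ.get('SKLEARN_ASSUME_FINITE', False))


# Unset or empty: finiteness validation stays enabled.
print(assume_finite_from_env({}))
print(assume_finite_from_env({'SKLEARN_ASSUME_FINITE': ''}))

# Any non-empty string enables assume_finite, even "0" or "false".
print(assume_finite_from_env({'SKLEARN_ASSUME_FINITE': '1'}))
print(assume_finite_from_env({'SKLEARN_ASSUME_FINITE': '0'}))
```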
Influence of the Number of Features
-----------------------------------

@@ -15,6 +15,78 @@
import sys
import re
import warnings
import os
from contextlib import contextmanager as _contextmanager

_ASSUME_FINITE = bool(os.environ.get('SKLEARN_ASSUME_FINITE', False))

Review comment: If we want to add more config in the future, don't we want them to be bundled in a dictionary?


def get_config():
    """Retrieve current values for configuration set by :func:`set_config`

    Returns
    -------
    config : dict
        Keys are parameter names that can be passed to :func:`set_config`.
    """
    return {'assume_finite': _ASSUME_FINITE}


def set_config(assume_finite=None):
    """Set global scikit-learn configuration

    Parameters
    ----------
    assume_finite : bool, optional
        If True, validation for finiteness will be skipped,
        saving time, but leading to potential crashes. If
        False, validation for finiteness will be performed,
        avoiding error.
    """
    global _ASSUME_FINITE
    if assume_finite is not None:
        _ASSUME_FINITE = assume_finite


@_contextmanager
def config_context(**new_config):
    """Context manager for global scikit-learn configuration

    Parameters
    ----------
    assume_finite : bool, optional

Review comment: Bad param docstring.
Reply: How so? Because I should say that it needs to be specified by name? The point here is that the user always has the option to change, but can leave things unchanged too.
Reply: Forget it.

        If True, validation for finiteness will be skipped,
        saving time, but leading to potential crashes. If
        False, validation for finiteness will be performed,
        avoiding error.

    Notes
    -----
    All settings, not just those presently modified, will be returned to
    their previous values when the context manager is exited. This is not
    thread-safe.

    Examples
    --------
    >>> import sklearn
    >>> from sklearn.utils.validation import assert_all_finite
    >>> with sklearn.config_context(assume_finite=True):
    ...     assert_all_finite([float('nan')])
    >>> with sklearn.config_context(assume_finite=True):
    ...     with sklearn.config_context(assume_finite=False):
    ...         assert_all_finite([float('nan')])
    ... # doctest: +ELLIPSIS
    Traceback (most recent call last):
    ...
    ValueError: Input contains NaN, ...
    """
    old_config = get_config().copy()
    set_config(**new_config)

    try:
        yield
    finally:
        set_config(**old_config)


# Make sure that DeprecationWarning within this package always gets printed

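The review comment above asks whether settings should be bundled in a dictionary so that more options can be added later. The following is one possible sketch of that idea, not the code merged in this PR: the `_config` store and the explicit unknown-key check are assumptions introduced for illustration. Like the PR's version, it treats `None` as "leave unchanged" and is still not thread-safe.

```python
from contextlib import contextmanager

# Hypothetical module-level store; the PR uses a single _ASSUME_FINITE global.
_config = {'assume_finite': False}


def get_config():
    """Return a copy so callers cannot mutate the live settings."""
    return _config.copy()


def set_config(**new_config):
    """Update known settings; a value of None means "leave unchanged"."""
    unknown = set(new_config) - set(_config)
    if unknown:
        # Keep the TypeError that an explicit signature would raise.
        raise TypeError("unknown config option(s): %s" % sorted(unknown))
    for name, value in new_config.items():
        if value is not None:
            _config[name] = value


@contextmanager
def config_context(**new_config):
    """Apply settings inside the block, then restore all previous values."""
    old_config = get_config()
    set_config(**new_config)
    try:
        yield
    finally:
        set_config(**old_config)


with config_context(assume_finite=True):
    print(get_config())  # inside the block: {'assume_finite': True}
print(get_config())      # restored on exit: {'assume_finite': False}
```

With this layout, adding an option would mean adding one key to `_config`; the trade-off is that the accepted keyword names no longer appear in the function signatures.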
@@ -0,0 +1,68 @@
from sklearn import get_config, set_config, config_context
from sklearn.utils.testing import assert_equal, assert_raises


def test_config_context():
    assert_equal(get_config(), {'assume_finite': False})

    # Not using as a context manager affects nothing
    config_context(assume_finite=True)
    assert_equal(get_config(), {'assume_finite': False})

    with config_context(assume_finite=True):
        assert_equal(get_config(), {'assume_finite': True})
    assert_equal(get_config(), {'assume_finite': False})

    with config_context(assume_finite=True):
        with config_context(assume_finite=None):
            assert_equal(get_config(), {'assume_finite': True})

        assert_equal(get_config(), {'assume_finite': True})

        with config_context(assume_finite=False):
            assert_equal(get_config(), {'assume_finite': False})

            with config_context(assume_finite=None):
                assert_equal(get_config(), {'assume_finite': False})

                # global setting will not be retained outside of context that
                # did not modify this setting
                set_config(assume_finite=True)
                assert_equal(get_config(), {'assume_finite': True})

            assert_equal(get_config(), {'assume_finite': False})

        assert_equal(get_config(), {'assume_finite': True})

    assert_equal(get_config(), {'assume_finite': False})

    # No positional arguments
    assert_raises(TypeError, config_context, True)
    # No unknown arguments
    assert_raises(TypeError, config_context(do_something_else=True).__enter__)


def test_config_context_exception():
    assert_equal(get_config(), {'assume_finite': False})
    try:
        with config_context(assume_finite=True):
            assert_equal(get_config(), {'assume_finite': True})
            raise ValueError()
    except ValueError:
        pass
    assert_equal(get_config(), {'assume_finite': False})


def test_set_config():
    assert_equal(get_config(), {'assume_finite': False})
    set_config(assume_finite=None)
    assert_equal(get_config(), {'assume_finite': False})
    set_config(assume_finite=True)
    assert_equal(get_config(), {'assume_finite': True})
    set_config(assume_finite=None)
    assert_equal(get_config(), {'assume_finite': True})
    set_config(assume_finite=False)
    assert_equal(get_config(), {'assume_finite': False})

    # No unknown arguments
    assert_raises(TypeError, set_config, do_something_else=True)
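The subtlest behaviour these tests pin down is that a context restores every setting on exit, so a `set_config` call made inside a context that did not itself modify the setting is still undone. The standalone sketch below reproduces that case. It copies the structure of the three functions from this PR into a self-contained script so that it runs without scikit-learn; it is an illustration, not the library module itself.

```python
from contextlib import contextmanager

_ASSUME_FINITE = False


def get_config():
    return {'assume_finite': _ASSUME_FINITE}


def set_config(assume_finite=None):
    global _ASSUME_FINITE
    if assume_finite is not None:
        _ASSUME_FINITE = assume_finite


@contextmanager
def config_context(**new_config):
    old_config = get_config().copy()
    set_config(**new_config)
    try:
        yield
    finally:
        set_config(**old_config)


# The context itself leaves assume_finite alone (None means "unchanged"),
# but a global change is made while it is active.
with config_context(assume_finite=None):
    set_config(assume_finite=True)
    inside = get_config()

after = get_config()
print(inside)  # {'assume_finite': True}
print(after)   # {'assume_finite': False}: exiting restored every setting
```

If a global change is meant to persist, it would need to be made outside any active `config_context` block.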
Review comment: I like this naming.