-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
ENH Makes global configuration thread local #18736
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
this is great and looks much easier than I anticipated! I think what I'd like to see is to start two threads, change the config in the main thread, check that the threads don't have the config changed, change it in the other thread, ensure that the config isn't changed anywhere else, and then return? or something like that? Not sure if this is overkill, but maybe something like import threading
import time
import sklearn
whos_turn = ['main']
shared_state = {}
def thread1_func():
while True:
time.sleep(0.1)
if whos_turn[0] == 'thread1':
assert shared_state['main'] == 'main ran'
assert not sklearn.get_config()['assume_finite']
sklearn.set_config(assume_finite=True)
assert sklearn.get_config()['assume_finite']
shared_state['thread1'] = 'thread1 ran'
whos_turn[0] = 'thread2'
return
def thread2_func():
while True:
time.sleep(0.1)
if whos_turn[0] == 'thread2':
assert shared_state['thread1'] == 'thread1 ran'
assert not sklearn.get_config()['assume_finite']
sklearn.set_config(assume_finite=True)
shared_state['thread2'] = 'ran!'
return
thread1 = threading.Thread(target=thread1_func)
thread2 = threading.Thread(target=thread2_func)
thread1.start()
thread2.start()
sklearn.set_config(assume_finite=True)
shared_state['main'] = 'main ran'
whos_turn = ['thread1']
thread1.join()
thread2.join()
assert shared_state['thread1'] == 'thread1 ran'
assert shared_state['thread2'] == 'thread2 ran'
assert sklearn.get_config()['assume_finite'] |
Thanks for the PR! Wasn't the issue that, with sklearn.config_context(assume_finite=False):
Parallel(backend=backend, n_jobs=2)(
delayed(function)(value, sleep_dur)
for value, sleep_dur in zip(booleans, sleep_seconds)) doesn't work rather that config cannot be set consistently inside the thread/process? Also the added test passes on master at least for me.. |
Updated PR by adding a test using |
sklearn/tests/test_config.py
Outdated
@@ -74,5 +83,56 @@ def test_set_config(): | |||
assert get_config()['assume_finite'] is False | |||
|
|||
# No unknown arguments | |||
<<<<<<< HEAD |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
merge error
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
if not hasattr(_threadlocal, 'global_config'): | ||
_threadlocal.global_config = _global_config.copy() | ||
return _threadlocal.global_config |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just a nitpick. Why not put that directly in get_config ?
also I think the copy is unecessary here since there will be a copy afterwards anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I thought of _get_threadlocal_config
for returning a threadlocal mutable version of the config, while get_config
returns a immutable copy of the config. This is a safe guard from someone doing the following:
config = get_config()
# Does not change the configuration
config["assume_finite"] = True
which is consistent with the current behavior on main
.
I added comments to describe this reasoning into the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But do we intend to use _get_threadlocal_config
outside of get_config
? If not why don't we do
def get_config():
if not hasattr(_threadlocal, 'global_config'):
_threadlocal.global_config = _global_config
return _threadlocal.global_config.copy()
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_get_threadlocal_config
is used in set_config
to update the threadlocal config.
In any case, for set_config
to not mutable _global_config
, _threadlocal.global_config
needs to be initialize with a copy of _global_config
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_get_threadlocal_config is used in set_config
Yep missed that. Sorry. Forget my comments :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @thomasjpfan !
@rth I think your concerns have been solved. Do you confirm ? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, thanks!
This PR makes it so that the global config is thread local. This means that changing the global config in one thread does not change the global config on another thread.