ENH Makes global configuration thread local #18736


Merged
merged 22 commits, Apr 27, 2021

Conversation

thomasjpfan
Member

This PR makes it so that the global config is thread local. This means that changing the global config in one thread does not change the global config on another thread.
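
For example, a minimal sketch of the intended behavior (a new thread starts from the default configuration, where assume_finite is False):

import threading

import sklearn

# Change the config in the main thread.
sklearn.set_config(assume_finite=True)

seen_in_worker = {}

def worker():
    # A new thread starts from the default config rather than
    # inheriting the main thread's modified value.
    seen_in_worker["assume_finite"] = sklearn.get_config()["assume_finite"]

t = threading.Thread(target=worker)
t.start()
t.join()

assert seen_in_worker["assume_finite"] is False       # worker unaffected
assert sklearn.get_config()["assume_finite"] is True  # main thread keeps its value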

@amueller
Member

This is great and looks much easier than I anticipated!
Maybe we can also add another test that is a bit more direct, using threading?
This one should work, but if joblib for some reason decided to run these in series because the jobs are too small, or something like that, it wouldn't actually test anything.

I think what I'd like to see is to start two threads, change the config in the main thread, check that the threads don't have the config changed, change it in the other thread, ensure that the config isn't changed anywhere else, and then return? or something like that?

Not sure if this is overkill, but maybe something like

import threading
import time
import sklearn

whos_turn = ['main']
shared_state = {}

def thread1_func():
    while True:
        time.sleep(0.1)
        if whos_turn[0] == 'thread1':
            assert shared_state['main'] == 'main ran'
            # main already set assume_finite=True; a thread-local config
            # means this thread still sees the default (False)
            assert not sklearn.get_config()['assume_finite']
            sklearn.set_config(assume_finite=True)
            assert sklearn.get_config()['assume_finite']
            shared_state['thread1'] = 'thread1 ran'
            whos_turn[0] = 'thread2'
            return

def thread2_func():
    while True:
        time.sleep(0.1)
        if whos_turn[0] == 'thread2':
            assert shared_state['thread1'] == 'thread1 ran'
            # thread1's set_config must not leak into this thread either
            assert not sklearn.get_config()['assume_finite']
            sklearn.set_config(assume_finite=True)
            shared_state['thread2'] = 'thread2 ran'
            return

thread1 = threading.Thread(target=thread1_func)
thread2 = threading.Thread(target=thread2_func)

thread1.start()
thread2.start()

sklearn.set_config(assume_finite=True)
shared_state['main'] = 'main ran'
whos_turn[0] = 'thread1'
thread1.join()
thread2.join()
assert shared_state['thread1'] == 'thread1 ran'
assert shared_state['thread2'] == 'thread2 ran'
assert sklearn.get_config()['assume_finite']

@rth
Member

rth commented Nov 26, 2020

Thanks for the PR! Wasn't the issue that,

with sklearn.config_context(assume_finite=False):
    Parallel(backend=backend, n_jobs=2)(
        delayed(function)(value, sleep_dur)
        for value, sleep_dur in zip(booleans, sleep_seconds))

doesn't work, rather than that the config cannot be set consistently inside the thread/process?

Also, the added test passes on master, at least for me.

@thomasjpfan
Member Author

Updated the PR by adding a test that uses ThreadPoolExecutor with two workers to perform the same check as the joblib version.
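
Roughly, the test does something along these lines (a sketch only; the helper name and exact values are illustrative):

import time
from concurrent.futures import ThreadPoolExecutor

from sklearn import get_config, set_config

def set_assume_finite(assume_finite, sleep_duration):
    # Each worker sets its own value, sleeps so that the workers overlap,
    # then reads the config back. With a thread-local config, the other
    # worker's set_config call must not be visible here.
    set_config(assume_finite=assume_finite)
    time.sleep(sleep_duration)
    return get_config()["assume_finite"]

booleans = [False, True, False, True]
sleep_seconds = [0.1, 0.2, 0.3, 0.4]

with ThreadPoolExecutor(max_workers=2) as executor:
    results = list(executor.map(set_assume_finite, booleans, sleep_seconds))

assert results == booleans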

Base automatically changed from master to main January 22, 2021 10:53
@@ -74,5 +83,56 @@ def test_set_config():
    assert get_config()['assume_finite'] is False

    # No unknown arguments
<<<<<<< HEAD
Member


merge error

Member Author


Fixed!

Member

@jjerphan jjerphan left a comment


LGTM.

Comment on lines +19 to +21
if not hasattr(_threadlocal, 'global_config'):
    _threadlocal.global_config = _global_config.copy()
return _threadlocal.global_config
Member

@jeremiedbb jeremiedbb Apr 27, 2021


Just a nitpick: why not put that directly in get_config?
Also, I think the copy is unnecessary here since there will be a copy afterwards anyway.

Member Author


I thought of _get_threadlocal_config as returning a thread-local mutable version of the config, while get_config returns an immutable copy of the config. This is a safeguard against someone doing the following:

config = get_config()

# Does not change the configuration
config["assume_finite"] = True

which is consistent with the current behavior on main.

I added comments describing this reasoning to the PR.

Member


But do we intend to use _get_threadlocal_config outside of get_config? If not, why don't we do

def get_config():
    if not hasattr(_threadlocal, 'global_config'):
        _threadlocal.global_config = _global_config
    return _threadlocal.global_config.copy()

Member Author


_get_threadlocal_config is used in set_config to update the threadlocal config.

In any case, for set_config to not mutate _global_config, _threadlocal.global_config needs to be initialized with a copy of _global_config.
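
For reference, a simplified sketch of the overall pattern being discussed (only assume_finite shown; not the verbatim implementation):

import threading

_global_config = {"assume_finite": False}  # process-wide defaults
_threadlocal = threading.local()

def _get_threadlocal_config():
    """Return a mutable, per-thread view of the configuration."""
    if not hasattr(_threadlocal, "global_config"):
        # The copy() is what keeps set_config from mutating the shared
        # _global_config defaults.
        _threadlocal.global_config = _global_config.copy()
    return _threadlocal.global_config

def get_config():
    # Callers get a copy, so mutating the returned dict changes nothing.
    return _get_threadlocal_config().copy()

def set_config(assume_finite=None):
    # Update the thread-local dict in place; other threads are unaffected.
    local_config = _get_threadlocal_config()
    if assume_finite is not None:
        local_config["assume_finite"] = assume_finite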

Member


_get_threadlocal_config is used in set_config

Yep, missed that. Sorry, forget my comments :)

Member

@jeremiedbb jeremiedbb left a comment


LGTM. Thanks @thomasjpfan !

@jeremiedbb
Member

@rth I think your concerns have been addressed. Can you confirm?

Member

@rth rth left a comment


Yes, thanks!
