-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MAINT Bump threadpoolctl min version for 1.5 #28838
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
MAINT Bump threadpoolctl min version for 1.5 #28838
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sklearn/__init__.py
Outdated
@@ -141,6 +143,10 @@ | |||
except ModuleNotFoundError: | |||
pass | |||
|
|||
# Set a global controller that can be used to locally limit the number of | |||
# threads without looping through all shared libraries every time. | |||
_sklearn_threadpool_controller = ThreadpoolController() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Now inspection of shared libraries happen as soon as scikit-learn is imported, which may have some cost not sure if this matters in practice. Using %timeit
seems to show that it is quite fast on my machine ~0.3 milliseconds.
Also one question: IIUC ThreadpoolController
looks at loaded libraries in __init__
so that means that OpenBLAS and OpenMP needs to be loaded before _sklearn_threadpool_controller
is created.
- maybe adding a note about this would be nice to discourage moving the
ThreadpoolController
creation too early - I am also wondering if adding a test to make sure that BLAS and OpenMP can be seen by
_sklearn_threadpool_controller
is worth it or not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a comment.
The BLAS libs are already loaded in a kind of hidden way. Importing utils will import both numpy and scipy.linalg at some point. The OpenMP lib is already loaded by importing show_versions. I now import numpy and scipy.linalg to make sure that the BLAS libs are loaded first in a more explicit way.
I wouldn't add a test because if threadpoolctl fails to discover the libs (like with the latest scipy), the test would fail without anything that can be done on sklearn's side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wouldn't add a test because if threadpoolctl fails to discover the libs (like with the latest scipy), the test would fail without anything that can be done on sklearn's side.
Fair enough, I guess this is quite unlikely that people will move this code around anyway. I guess you mean in Numpy 2 rc e.g. joblib/threadpoolctl#175.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious BTW, if you create _sklearn_threadpool_controller
too early (before loading OpenBLAS for example) the _sklearn_threadpool_controller.limit(limits=1, user_api="blas")
don't do anything and there is no warning about it, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you mean in Numpy 2
indeed :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious BTW, if you create _sklearn_threadpool_controller too early (before loading OpenBLAS for example) the _sklearn_threadpool_controller.limit(limits=1, user_api="blas") don't do anything and there is no warning about it, right?
Yes, the goal of the controller object it to look for loaded libs once, during instantiation. Then calls to limit
won't refresh the list so it will have no effect if libs were not loaded before instantiation.
It would be hard to find a valid and consistent condition to warn about. For instance OpenMP is not mandatory and threadpoolctl can fail to detect BLAS libs (cf numpy 2) (it's a bug but still it can happen), so it can happen that no libs at all are loaded in the controller and we can't even warn in that situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks reasonable. Updating the threadpoolctl version also seems like the right thing to do.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides small nitpicky remarks, I am fine with this change. I was wondering if we should instead introduce a sklearn._get_threadpool_controller
callable that would lazily instantiate (and cache) the controller singleton when we actually need it, but I don't think the extra complexity is worth it as I suspect it most scikit-learn user would trigger it one way or another.
It least the the code is more direct this way and easier to debug in case of problems.
…into bump-threadpoolctl-min-version
The CI failures are unrelated and already reported in #28868 |
I'd like to bump the min version of
threadpoolctl
from 2.0.0 to 3.1.0.