-
-
Notifications
You must be signed in to change notification settings - Fork 26.2k
FIX Avoid dead-lock with OpenBLAS 0.3.26 on Windows inside pairwise distances calculation #28692
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
FIX Avoid dead-lock with OpenBLAS 0.3.26 on Windows inside pairwise distances calculation #28692
Conversation
@@ -36,8 +37,9 @@ cdef float64_t[::1] _sqeuclidean_row_norms64_dense( | |||
intp_t d = X.shape[1] | |||
float64_t[::1] squared_row_norms = np.empty(n, dtype=np.float64) | |||
|
|||
for idx in prange(n, schedule='static', nogil=True, num_threads=num_threads): | |||
squared_row_norms[idx] = _dot(d, X_ptr + idx * d, 1, X_ptr + idx * d, 1) | |||
with threadpool_limits(limits=1, user_api='blas'): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would add a comment here because it's surprising that it's necessary while _dot
is already single threaded. Maybe with a link to the issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
_dot is already single threaded.
Actually it looks like it's not. see scipy/scipy#20294 (comment). It's weird cause every time I tested it, I only saw a single thread running, no matter the length of the arrays.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should just use nrm2
as suggested here scipy/scipy#20294 (comment), even though it implies take a square root and then square back. From an efficiency point of view, it should not have a visible impact since the computation of the norms is not the most costly part of the pairwise distance reduction compuations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did not know this but ddot
is not necessarily single-threaded, see scipy/scipy#20294 (comment)
Instead of calling scikit-learn/sklearn/metrics/_pairwise_distances_reduction/_argkmin.pyx.tp Lines 59 to 93 in 64ae371
More precisely, move the This will this will reduce the number of successive calls to Note that: in addition to working around the thread-safety problem of recent OpenBLAS on Windows, this change can also protect against oversubscription problems for all platforms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Not sure if we need a changelog entry or not for this. Probably we should (probably as a fix). |
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
I added a short changelog entry. Not sure if it is worth trying to mention all the estimators that can be affected by this. Also another question: should this be back-ported to 1.4.2? |
If you want to keep it simple, you can just say "neighbor based algorithms" :)
I think so since 1.4 users are impacted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I moved the changelog entry to target 1.4.2. LGTM, thanks.
…istances calculation (scikit-learn#28692) Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: jeremie du boisberranger <jeremiedbb@yahoo.fr>
Fix #28625. More context in scipy/scipy#20294.
It is easy to add a test, but for now we only have a single Windows build using MKL. The main question is do we want to add another Windows build with OpenBLAS?