[MRG] Use Scipy cython BLAS API instead of bundled CBLAS #12732
Conversation
Note that all the …

CI is failing because of the Cython version :(
Yes, we did discuss this in #10624 and I think it should be possible. Are you sure this is not affected by the issues from #10624 (comment), though?

No, I don't use const with fused-type memviews. But after all it's not necessary; I found a workaround :)
Maybe it would be worth checking whether some discussion about such wrappers has taken place at scipy and, if not, opening an issue about it. From a user perspective, we might indeed want one Cython function for matrix-matrix multiplication with BLAS irrespective of the C/Fortran alignment or dtype, not four...

There is an old open issue in scipy for that, scipy/scipy#4516. It's been active again this summer, so there may be some progress in that direction.
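Until such wrappers land in scipy, the dtype-based dispatch discussed here can be sketched at the Python level with scipy.linalg.get_blas_funcs, which selects the precision-matched routine (sgemm vs dgemm) from the array dtypes, much as fused types do at the Cython level. This is an illustrative sketch, not code from this PR:

```python
import numpy as np
from scipy.linalg import get_blas_funcs

def matmul_blas(A, B):
    """Multiply two matrices with whichever gemm matches their dtype."""
    # get_blas_funcs picks sgemm for float32 inputs, dgemm for float64.
    gemm, = get_blas_funcs(("gemm",), (A, B))
    return gemm(1.0, A, B)

A32 = np.ones((2, 3), dtype=np.float32)
B32 = np.ones((3, 4), dtype=np.float32)
C32 = matmul_blas(A32, B32)             # dispatched to sgemm

A64 = A32.astype(np.float64)
B64 = B32.astype(np.float64)
C64 = matmul_blas(A64, B64)             # dispatched to dgemm
```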
I had to increase rtol up to …

This is a really high rtol...

So I tested this rtol issue with the bundled CBLAS, and it turns out the problem was already there. Moreover, the tests used data such that the result of gemv (and the others) was often close to zero, meaning it involved differences between nearly equal numbers, which easily leads to numerical precision issues. Using better-conditioned data allows an rtol of 1e-6, which is fine for float32.
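The precision issue described above is easy to reproduce in plain NumPy; the numbers below are illustrative, not taken from the test suite:

```python
import numpy as np

# Catastrophic cancellation: subtracting nearly equal numbers in float32
# can lose every significant digit. 1.0 is below the float32 spacing at
# 1e8 (the ulp there is 8), so it simply vanishes in the sum.
a = np.float32(1e8)
b = np.float32(1.0)
cancelled = (a + b) - a          # 0.0 instead of the true answer 1.0

# Well-conditioned data: all terms positive, no cancellation, so a
# float32 dot product keeps a small relative error vs a float64 reference.
rng = np.random.RandomState(0)
x = rng.uniform(0.5, 1.5, 100)
ref = np.dot(x, x)                                     # float64 reference
got = np.dot(x.astype(np.float32), x.astype(np.float32))
rel_err = abs(got - ref) / ref
```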
Since the minimal version requirements were upgraded in #12746, it's now safe to use scipy's cython BLAS (scipy >= 0.16). Does that need a what's new entry?

I think we should add what's new entries for the individual estimators when their implementations are updated as a consequence of this infrastructure change. For this PR you can already add an entry for the pairwise metrics' sparse Manhattan distance computation.
LGTM. Thanks @jeremiedbb.
return ddot(&n, x, &incx, y, &incy)

cpdef _dot_memview(floating[::1] x, floating[::1] y):
Should we be using const memoryviews to allow read-only input arrays?
Fused-type const memoryviews do not work yet, see #10624.
However, all the _xxx_memview functions are just Python wrappers to be able to test the C functions with pytest. They are not meant to be used in the Python code base (if we want to multiply matrices in Python we just use numpy's dot); we don't want to expose BLAS functions at the Python level.
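For context, the reason const memoryviews matter is that NumPy arrays can be flagged read-only (for instance when memory-mapped by joblib), and a non-const Cython memoryview refuses such buffers. A small NumPy illustration of a read-only array:

```python
import numpy as np

x = np.arange(3.0)
x.setflags(write=False)   # e.g. an array memory-mapped read-only by joblib

# A non-const Cython memoryview such as `floating[::1]` raises
# "buffer source array is read-only" for this input, while a
# `const floating[::1]` memoryview would accept it.
try:
    x[0] = 42.0           # writing to a read-only array raises ValueError
    writable = True
except ValueError:
    writable = False
```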
Ah, of course. But is there a reason we should not be using memview interfaces when that would simplify the call?
There might be the small overhead of an additional function call (and I don't really know how memviews behave versus pointers performance-wise), but I agree it would simplify things.
Currently all BLAS functions are called within functions where we have access to the pointers, so I'm not sure it's worth making interfaces we don't currently need. Maybe we should reconsider when the need arises?
I don't think there's any function call overhead, certainly not Python function call overhead. There will be a cost in accessing members of the memoryview struct, but a minimal one.
No hurry, you're right, but I still find it strange that we are passed order rather than determining it from the memoryview strides.
You convinced me :) I updated the functions to infer the memory layout.
sklearn/utils/_cython_blas.pyx (outdated)
dgemv(&ta_, &m, &n, &alpha, A, &lda, x, &incx, &beta, y, &incy)

cpdef _gemv_memview(BLAS_Order order, BLAS_Trans ta, floating alpha,
Shouldn't we determine order from A.strides[0] == A.itemsize rather than having it passed in?
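The check suggested here can be illustrated in plain NumPy; blas_order is a hypothetical helper name for illustration, not the PR's code:

```python
import numpy as np

def blas_order(A):
    """Infer the BLAS memory layout of a 2-D array from its strides.

    In column-major (Fortran) order, consecutive elements of a column
    are adjacent in memory, so the stride along axis 0 equals itemsize.
    (For a single-column array both layouts coincide, so either answer
    is valid there.)
    """
    return "F" if A.strides[0] == A.itemsize else "C"

A_c = np.zeros((3, 4), order="C")   # row-major: strides (32, 8) for float64
A_f = np.zeros((3, 4), order="F")   # column-major: strides (8, 24)
```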
Merged! Thanks @jeremiedbb!
…kit-learn#12732)" This reverts commit eb9a022.
First step to tackle #11638.

This PR adds a new module in utils, _cython_blas, which contains helpers for the BLAS functions shipped with scipy in scipy.linalg.cython_blas (e.g. sgemm and dgemm for float and double matrix-matrix multiplication). These helpers are fused: sgemv and dgemv are replaced by _xgemv, which avoids redundant code for each dtype.

I still have to fill in the docstrings.
For now the module only contains helpers for the BLAS functions that were already used in sklearn (plus gemm, because I'm using it in another PR :)). We can add other BLAS functions as we need them.

This module comes with a test suite, which tests the helpers for all dtype/layout/transpose configurations.
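The kind of exhaustive configuration sweep such a suite performs can be sketched at the Python level against scipy.linalg.blas and a NumPy reference. This is illustrative only; the real suite exercises the Cython helpers directly:

```python
import itertools
import numpy as np
from scipy.linalg.blas import dgemm, sgemm

rng = np.random.RandomState(0)
n_checked = 0
# Sweep every dtype / memory layout / transpose combination.
for dtype, order, trans in itertools.product(
        (np.float32, np.float64), ("C", "F"), (False, True)):
    gemm = sgemm if dtype == np.float32 else dgemm
    A = np.asarray(rng.uniform(size=(4, 3)), dtype=dtype, order=order)
    B = np.asarray(rng.uniform(size=(4, 5) if trans else (3, 5)),
                   dtype=dtype, order=order)
    expected = (A.T if trans else A) @ B          # NumPy reference
    got = gemm(1.0, A, B, trans_a=trans)
    rtol = 1e-5 if dtype == np.float32 else 1e-10
    assert np.allclose(got, expected, rtol=rtol)
    n_checked += 1
```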
I also added an example of use in sklearn's pairwise_fast with the asum function; see sklearn/metrics/pairwise_fast.pyx and sklearn/metrics/setup.py.

I don't think I'll replace all occurrences of CBLAS use in sklearn in this PR. I think it would be easier to do it in separate PRs. Removing the bundled CBLAS will only be possible once all replacements are done.

Finally, this PR requires an upgrade of the minimal scipy version to >= 0.16, which seems to be on its way (#12184).
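At the Python level, the effect of such a fused helper can be mimicked with explicit dtype dispatch over scipy's BLAS wrappers. xgemv below is a hypothetical name for illustration; the Cython fused-type version resolves the same selection at compile time:

```python
import numpy as np
from scipy.linalg.blas import dgemv, sgemv

def xgemv(alpha, A, x):
    # Dispatch to the precision-matched BLAS gemv; a fused-type Cython
    # helper performs the equivalent selection at compile time.
    gemv = sgemv if A.dtype == np.float32 else dgemv
    return gemv(alpha, A, x)

A = np.arange(6, dtype=np.float64).reshape(2, 3)
x = np.ones(3)
y = xgemv(1.0, A, x)                                   # equals A @ x
y32 = xgemv(1.0, A.astype(np.float32), x.astype(np.float32))
```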