[WIP] PERF Replace np.dot with higher level BLAS _gemv #20396
Conversation
```diff
@@ -506,7 +506,10 @@ def enet_coordinate_descent_gram(floating[::1] w,
     cdef unsigned int n_features = Q.shape[0]

     # initial value "Q w" which will be kept up to date in the iterations
-    cdef floating[:] H = np.dot(Q, w)
+    # cdef floating[:] H = np.dot(Q, w)
+    cdef floating[:] H = np.zeros(n_features, dtype=dtype)
```
Would suggest making this empty, since it doesn't contribute to the computation below and is overwritten:

```diff
-cdef floating[:] H = np.zeros(n_features, dtype=dtype)
+cdef floating[:] H = np.empty(n_features, dtype=dtype)
```
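To make the reviewer's point concrete, here is a minimal NumPy sketch (the names are illustrative, not the actual Cython code): H is fully overwritten before it is ever read, so zero-initialization is wasted work and np.empty suffices.

```python
import numpy as np

n_features = 5
rng = np.random.default_rng(0)
Q = rng.standard_normal((n_features, n_features))
w = rng.standard_normal(n_features)

# H is allocated and then completely overwritten by the matrix-vector
# product, so pre-filling it with zeros buys nothing; np.empty skips it.
H = np.empty(n_features, dtype=np.float64)
np.dot(Q, w, out=H)  # writes every element of H
```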
Thank you @melissawm! Could someone remind me what the motivation is for replacing np.dot with the BLAS call in this particular case? I understand the use case for combining multiple operations in a single BLAS call as in #11507, but that's not the case here. Is it that it would release the GIL, or that there is less pre-processing?
Actually, should we check that the array is C contiguous before applying BLAS functions, since the code can change in the future?
The contiguity is already enforced by the typed declarations floating[::1] w and np.ndarray[floating, ndim=2, mode='c'] Q. That being said, I am not sure we would get a performance improvement in this specific situation. Maybe @jeremiedbb knows? In any case, a small benchmark would give us a definitive answer.
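For illustration, a runtime version of such a guard could look like the sketch below; as_c_contiguous is a hypothetical helper, and in the Cython code the typed signatures above already enforce this at call time.

```python
import numpy as np

def as_c_contiguous(a):
    # Hypothetical guard: BLAS routines assume a contiguous memory layout,
    # and np.ascontiguousarray copies only when the input doesn't qualify.
    return a if a.flags["C_CONTIGUOUS"] else np.ascontiguousarray(a)

Q = np.asfortranarray(np.eye(3))   # Fortran-ordered, hence not C-contiguous
assert not Q.flags["C_CONTIGUOUS"]
assert as_c_contiguous(Q).flags["C_CONTIGUOUS"]
```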
BTW, the GIL is already released at line 540.
The motivation of the mentioned issue is to use higher-level BLAS functions than the ones currently used when possible, i.e. to use gemv instead of dots in a loop for instance, not to replace numpy's dot by directly calling the corresponding BLAS function. I would not expect much speed-up in that case, although a benchmark is welcome because I could be wrong.
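A sketch of the pattern the issue is actually about, using SciPy's Python-level BLAS wrapper dgemv in place of scikit-learn's internal Cython helper: many dots in a loop (level-1 BLAS) collapse into a single level-2 gemv call.

```python
import numpy as np
from scipy.linalg.blas import dgemv

rng = np.random.default_rng(0)
n = 1_000
Q = rng.standard_normal((n, n))
w = rng.standard_normal(n)

# The pattern the issue targets: n small level-1 BLAS calls, one per row.
H_loop = np.empty(n)
for i in range(n):
    H_loop[i] = np.dot(Q[i], w)

# The level-2 replacement: a single gemv computing alpha * (Q @ w).
H_gemv = dgemv(1.0, Q, w)

assert np.allclose(H_loop, H_gemv)
```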
Yeah, I would agree with that. It's possible the list I made (a long time ago) included this line erroneously.
Got it - this is all helpful information, so I'll review all this with benchmarks. I'll mark this one as a draft for completeness and post the benchmarking results here, but you are right that there may not be an improvement. In any case, I can go through the list and find other candidates, and now I know what to look for. Thanks!
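For reference, a rough micro-benchmark of the kind discussed here could look like the following; it uses SciPy's dgemv rather than scikit-learn's Cython _gemv, so it only approximates the in-tree situation.

```python
import timeit

import numpy as np
from scipy.linalg.blas import dgemv

rng = np.random.default_rng(0)
n = 2_000
Q = rng.standard_normal((n, n))
w = rng.standard_normal(n)

# Both calls dispatch to the same underlying BLAS gemv kernel, so any
# difference measured here is mostly Python-level call overhead.
t_dot = timeit.timeit(lambda: np.dot(Q, w), number=200)
t_gemv = timeit.timeit(lambda: dgemv(1.0, Q, w), number=200)
print(f"np.dot: {t_dot:.3f}s   dgemv: {t_gemv:.3f}s")
```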
+1 to implement the other cases from #13210 (comment), except for this one.
#DataUmbrella LATAM sprint |
Sent @melissawm a reminder on the PR.
This PR is stalled and is open to a contributor.
I have relabeled this PR as per this comment.
I see 2 possible ways forward: benchmark this change to see whether it actually helps, or close the PR.
@melissawm IMO, it's up to you if you are still interested - and apologies for the long decision process here :blush:
If there is no measurable improvement, then I prefer to close.
I agree, and as a general rule of thumb, directly calling BLAS should be done only where we don't want any Python interaction at all. Elsewhere, numpy should be preferred, unless a convincing benchmark shows otherwise. I'm also in favor of closing, since it's not benchmarked (and very unlikely to give a speed-up).
So let's close.
Reference Issues/PRs
Partially addresses #13210
What does this implement/fix? Explain your changes.
Replaces np.dot with a BLAS _gemv call.
Any other comments?
This is my first PR so any feedback is welcome, thanks!