
[WIP] PERF Replace np.dot with higher level BLAS _gemv #20396


Closed
wants to merge 1 commit into from

Conversation

melissawm

Reference Issues/PRs

Partially addresses #13210

What does this implement/fix? Explain your changes.

Replaces np.dot with a BLAS _gemv call.

Any other comments?

This is my first PR so any feedback is welcome, thanks!

@@ -506,7 +506,10 @@ def enet_coordinate_descent_gram(floating[::1] w,
cdef unsigned int n_features = Q.shape[0]

# initial value "Q w" which will be kept of up to date in the iterations
cdef floating[:] H = np.dot(Q, w)
# cdef floating[:] H = np.dot(Q, w)
cdef floating[:] H = np.zeros(n_features, dtype=dtype)
Contributor


Would suggest making this empty since it doesn't contribute to the computation below and is overwritten

Suggested change
cdef floating[:] H = np.zeros(n_features, dtype=dtype)
cdef floating[:] H = np.empty(n_features, dtype=dtype)
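As an aside, the reason `np.empty` is safe here can be illustrated outside Cython with SciPy's BLAS wrappers. This is a minimal sketch, assuming (as the review comment indicates) that the subsequent `_gemv` call overwrites `H` with `beta = 0`; per the BLAS specification, `y` is not read when `beta` is zero, so even uninitialized contents are harmless:

```python
import numpy as np
from scipy.linalg.blas import dgemv

rng = np.random.default_rng(0)
Q = rng.standard_normal((5, 5))
w = rng.standard_normal(5)

# With beta=0, gemv writes y without reading it first, so a NaN-filled
# (i.e. effectively uninitialized) y still produces the correct product.
y = np.full(5, np.nan)
H = dgemv(1.0, Q, w, beta=0.0, y=y, overwrite_y=True)

assert np.allclose(H, Q @ w)
```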

@rth
Member

rth commented Jun 27, 2021

Thank you @melissawm !

Could someone remind me what the motivation is for replacing np.dot with the BLAS call in this particular case? I understand the use case for combining multiple operations in a single BLAS call as in #11507, but that's not the case here. Is it that it would release the GIL, or that there is less pre-processing?

@melissawm Could you please

  • benchmark ElasticNet before and after this change
  • confirm that, the way this function is used, Q and w are always C contiguous, meaning they were allocated somewhere earlier in the code and are not, for instance, user input or the result of array slicing.

Actually, should we check that the array is C contiguous before applying BLAS functions? Since code can change in the future.
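Such a guard is cheap to express on the NumPy side. A minimal sketch (in plain Python rather than Cython) of checking C-contiguity before handing an array to BLAS:

```python
import numpy as np

Q = np.arange(12, dtype=np.float64).reshape(3, 4)
assert Q.flags['C_CONTIGUOUS']

# A slice along the second axis is no longer C contiguous.
Q_view = Q[:, ::2]
assert not Q_view.flags['C_CONTIGUOUS']

# np.ascontiguousarray copies only when the input is not already contiguous.
Q_safe = np.ascontiguousarray(Q_view)
assert Q_safe.flags['C_CONTIGUOUS']
```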

@ogrisel
Member

ogrisel commented Jun 28, 2021

Q and w are both contiguous (with the C-layout for Q as declared in the prototype of the enet_coordinate_descent_gram Cython function:

floating[::1] w

and

np.ndarray[floating, ndim=2, mode='c'] Q

that being said, I am not sure we would get a performance improvement in this specific situation. Maybe @jeremiedbb knows? In any case, a small benchmark would give us a definite answer.
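A small benchmark along those lines could be sketched with SciPy's BLAS wrappers (the size below is hypothetical; the real Q is a Gram matrix whose shape depends on the data):

```python
import timeit
import numpy as np
from scipy.linalg.blas import dgemv

rng = np.random.default_rng(0)
n_features = 500  # hypothetical size for illustration only
Q = rng.standard_normal((n_features, n_features))
w = rng.standard_normal(n_features)

# Sanity check: both paths compute the same matrix-vector product.
assert np.allclose(np.dot(Q, w), dgemv(1.0, Q, w))

t_dot = timeit.timeit(lambda: np.dot(Q, w), number=1000)
t_gemv = timeit.timeit(lambda: dgemv(1.0, Q, w), number=1000)
print(f"np.dot: {t_dot:.4f}s, dgemv: {t_gemv:.4f}s")
```

Both routes dispatch to the same underlying BLAS, so any difference would come from Python/NumPy call overhead, not the computation itself.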

@ogrisel
Member

ogrisel commented Jun 28, 2021

BTW, the GIL is already released at line 540. Neither the np.dot version nor the _gemv version needs the GIL in this case.

@jeremiedbb
Member

jeremiedbb commented Jun 28, 2021

The motivation of the mentioned issue is to use higher-level BLAS functions than the ones currently used where possible, i.e. use gemv instead of dots in a loop for instance, not to replace a numpy dot by directly calling the corresponding BLAS function. I would not expect much speed-up in that case, although a benchmark is welcome because I could be wrong.
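To illustrate the distinction: the expected win comes from collapsing a Python-level loop of level-1 dot calls into a single level-2 gemv (written below via the equivalent `Q @ w`), not from swapping one np.dot for a direct BLAS call. A toy sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 6))
w = rng.standard_normal(6)

# Level-1 BLAS in a Python loop: one dot product per row of Q.
H_loop = np.array([np.dot(Q[i], w) for i in range(Q.shape[0])])

# A single level-2 BLAS call (gemv, here spelled Q @ w) does it at once.
H_gemv = Q @ w

assert np.allclose(H_loop, H_gemv)
```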

@jakirkham
Contributor

Yeah, I would agree with that. It's possible the list I made (a long time ago) included this line erroneously.

@melissawm
Author

Got it - this is all helpful information, so I'll review all this with benchmarks. I'll mark this one as a draft for completeness and post the benchmarking results here, but you are right that there may not be an improvement. In any case, it is possible that I can go through the list and find other candidates, and now I know what to look for. Thanks!

@melissawm melissawm marked this pull request as draft June 28, 2021 18:26
@melissawm melissawm changed the title PERF Replace np.dot with higher level BLAS _gemv [ẂIP] PERF Replace np.dot with higher level BLAS _gemv Jun 28, 2021
@rth
Member

rth commented Jun 28, 2021

+1 to implement other cases from #13210 (comment) except for this one.

@melissawm melissawm changed the title [ẂIP] PERF Replace np.dot with higher level BLAS _gemv [WIP] PERF Replace np.dot with higher level BLAS _gemv Jun 28, 2021
@reshamas
Member

reshamas commented Jul 7, 2021

#DataUmbrella LATAM sprint

@reshamas
Member

Sent @melissawm reminder on PR.

@reshamas
Member

This PR is stalled, and is open to a contributor.

@reshamas reshamas added the Sprint label Nov 5, 2021
@cmarmo cmarmo added Needs Decision - Close Requires decision for closing and removed help wanted labels Feb 22, 2022
@cmarmo
Contributor

cmarmo commented Feb 22, 2022

I have relabeled this PR as per this comment.
Feel free to modify if I am wrong.

@lorentzenchr
Member

lorentzenchr commented Mar 26, 2022

I see 2 possible ways forward:

  1. Put the initialization of H just after with nogil and merge - we should not ask for extensive benchmarks here as we do not expect a (well measurable) improvement.
  2. Close this PR (which I prefer over open ends).

@melissawm IMO, it's up to you if you are still interested - and apologies for the long decision process here :blush:

@thomasjpfan
Member

If there is no measurable improvement, then I prefer to close. np.dot is easier for future contributors & maintainers to reason about.

@jeremiedbb
Member

np.dot is easier for future contributors & maintainers to reason about.

I agree, and as a general rule of thumb, directly calling BLAS should be done only where we don't want any Python interaction at all. Elsewhere, numpy should be preferred, unless a convincing benchmark shows otherwise.

I'm also in favor of closing since it's not benchmarked (and very unlikely to give a speed-up).

@lorentzenchr
Member

So let's close.
