-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
PERF Pass buffers via pointers in PairwiseDistancesReductions
routines for sparse data
#26765
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@jjerphan @thomasjpfan In case either of you have interest in the PR. The scope/content is fairly limited. |
Nice one! Those implementations get up to 1.5× speedups! Two questions:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the performance improvement, I think this deserves a changelog entry in v1.4
. Otherwise, LGTM.
Also, should we adjust the title of this PR to be more specific? For instance, how about this one?
|
PairwiseDistancesReductions
routines for sparse data
Almost certainly. Anywhere we don't use the associated attributes of a memory-view and are able to leverage contiguous memory, we could technically use pointers instead. With that being said, this is really only a benefit for function calls in tight loops where the overhead is a dominant factor. I'm not sure where else that pattern is present (maybe somewhere in linear models or trees?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you @Micky774!
LGTM modulo a comment.
As you indicated, other Cython implementations might be adapted similarly. I think we can keep them for first time Cython-contributions. What do you think?
@da-woods: are the optimisations that you have mentioned in #25608 (comment) similar to those ones?
doc/whats_new/v1.4.rst
Outdated
- |Performance| Computing pairwise distances for (CSR x CSR) and (CSR x Dense) | ||
datasets is now 1.5x faster by improving the argument passing strategy used | ||
in the computation routines in :class:`metrics.DistanceMetric`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nitpick: I think that we generally keep changelog entry short without giving too much technical details.
- |Performance| Computing pairwise distances for (CSR x CSR) and (CSR x Dense) | |
datasets is now 1.5x faster by improving the argument passing strategy used | |
in the computation routines in :class:`metrics.DistanceMetric`. | |
- |Performance| Computing pairwise distances via :class:`metrics.DistanceMetric` for CSR × CSR, Dense × CSR, and CSR × Dense datasets is now 1.5x faster. |
Should we also add another one for estimators of module.neighbors
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also add another one for estimators of
module.neighbors
?
I'm not sure -- there are a lot of estimators which at least partly use the DistanceMetric
backend
(the following list is puled from #26329):
- NearestNeighbors
- KNeighborsRegressor
- KNeighborsClassifier
- RadiusNeighborsRegressor
- RadiusNeighborsClassifier
- DBSCAN
- OPTICS
- Isomap
- TSNE (self.method != "exact")
- KernelDensity
- AffinityPropagation
- Birch
- MeanShift
- NearestCentroid
I don't quite know where to draw the line for this if we do include it for other estimators specifically. Most .predict
methods, for example, benefit from this.
I think that's a good idea, though I'm not sure what the best way to discover where in the code these changes ought to be propagated. Perhaps an open ended meta-issue of "If you spot this pattern, feel free to open a PR fixing it" would be appropriate? |
Yes, I guess we can spend some time to read implementations and list in a meta-issue candidate places for those optimisations. |
The one thing I'm slightly nervous of is that you look to be taking these pointers from non-contiguous memoryviews. I suspect practically they are contiguous in practice, but it might be nice to enforce that. Passing a pointer is about as light-weight as you can get, so nothing Cython can do is ever likely to beat it. The changes I mentioned in the linked issue should bring memoryview slicing a lot closer, but pointers are still likely to win. (Essentially all I've done is noticed that taking multiple slices of the same array is a reference counting no-op, so you pay for one reference count at the start of the loop and that's it. From the point of view of this PR, it's a distraction though) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we need to operate on continuous buffers as mentioned by #26765 (comment).
Good point! I've now done so in the attribute declarations and some method signatures. I wanted to as well check if enforcing it in the Let me know if there are any other spots you think we could tighten up our guarantees. Thanks to all 😄 |
@jjerphan Thoughts? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo a suggestion.
This must remove some of the dispatch cost that @Vincent-Maladiere and I typically have observed in the past with #25170.
…nes for sparse data (scikit-learn#26765)
…nes for sparse data (scikit-learn#26765)
Reference Issues/PRs
Discussed a little here
What does this implement/fix? Explain your changes.
The
{r}dist_csr
methods now accept pointers rather than memory views since none of the peripheral data of memory views are used. This significantly decreases call overhead, which is especially beneficial for the tight loops in which these functions are often used.This also includes a formatting fix for
HaversineDistance
This also enforces contiguous layouts wherever possible.
Any other comments?
Benchmarks generated via: https://gist.github.com/Micky774/9daede3d638ebbdbb34bc26f884f2748
Benchmarks