Skip to content

PERF Pass buffers via pointers in PairwiseDistancesReductions routines for sparse data #26765

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jul 21, 2023

Conversation

Micky774
Copy link
Contributor

@Micky774 Micky774 commented Jul 4, 2023

Reference Issues/PRs

Discussed a little here

What does this implement/fix? Explain your changes.

The {r}dist_csr methods now accept pointers rather than memory views since none of the peripheral data of memory views are used. This significantly decreases call overhead, which is especially beneficial for the tight loops in which these functions are often used.

This also includes a formatting fix for HaversineDistance

This also enforces contiguous layouts wherever possible.

Any other comments?

Benchmarks generated via: https://gist.github.com/Micky774/9daede3d638ebbdbb34bc26f884f2748

Benchmarks

208142a6-f381-4552-a45d-466c61a846e6

afa4249a-280e-494b-b65c-06dec4c823b9

@Micky774
Copy link
Contributor Author

Micky774 commented Jul 4, 2023

@jjerphan @thomasjpfan In case either of you have interest in the PR. The scope/content is fairly limited.

@github-actions
Copy link

github-actions bot commented Jul 4, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 92f7c12. Link to the linter CI: here

@Micky774 Micky774 changed the title MNT Pass indices memory-view by reference rather than by value PERF Pass indices memory-view by reference rather than by value Jul 4, 2023
@jjerphan
Copy link
Member

jjerphan commented Jul 4, 2023

Nice one! Those implementations get up to 1.5× speedups!

Two questions:

  • can you add a changelog entry?
  • are there any other places where buffers' pointers could be passed instead of memoryviews?

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given the performance improvement, I think this deserves a changelog entry in v1.4. Otherwise, LGTM.

@jjerphan
Copy link
Member

jjerphan commented Jul 5, 2023

Also, should we adjust the title of this PR to be more specific? For instance, how about this one?

PERF Pass buffers via pointers in `PairwiseDistancesReductions` routines for sparse data

@Micky774 Micky774 changed the title PERF Pass indices memory-view by reference rather than by value PERF Pass buffers via pointers in PairwiseDistancesReductions routines for sparse data Jul 7, 2023
@Micky774
Copy link
Contributor Author

Micky774 commented Jul 7, 2023

are there any other places where buffers' pointers could be passed instead of memoryviews?

Almost certainly. Anywhere we don't use the associated attributes of a memory-view and are able to leverage contiguous memory, we could technically use pointers instead. With that being said, this is really only a benefit for function calls in tight loops where the overhead is a dominant factor. I'm not sure where else that pattern is present (maybe somewhere in linear models or trees?)

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you @Micky774!

LGTM modulo a comment.

As you indicated, other Cython implementations might be adapted similarly. I think we can keep them for first time Cython-contributions. What do you think?

@da-woods: are the optimisations that you have mentioned in #25608 (comment) similar to those ones?

Comment on lines 95 to 97
- |Performance| Computing pairwise distances for (CSR x CSR) and (CSR x Dense)
datasets is now 1.5x faster by improving the argument passing strategy used
in the computation routines in :class:`metrics.DistanceMetric`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nitpick: I think that we generally keep changelog entry short without giving too much technical details.

Suggested change
- |Performance| Computing pairwise distances for (CSR x CSR) and (CSR x Dense)
datasets is now 1.5x faster by improving the argument passing strategy used
in the computation routines in :class:`metrics.DistanceMetric`.
- |Performance| Computing pairwise distances via :class:`metrics.DistanceMetric` for CSR × CSR, Dense × CSR, and CSR × Dense datasets is now 1.5x faster.

Should we also add another one for estimators of module.neighbors?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we also add another one for estimators of module.neighbors?

I'm not sure -- there are a lot of estimators which at least partly use the DistanceMetric backend
(the following list is puled from #26329):

- NearestNeighbors
- KNeighborsRegressor
- KNeighborsClassifier
- RadiusNeighborsRegressor
- RadiusNeighborsClassifier
- DBSCAN
- OPTICS
- Isomap
- TSNE (self.method != "exact")
- KernelDensity
- AffinityPropagation
- Birch
- MeanShift
- NearestCentroid

I don't quite know where to draw the line for this if we do include it for other estimators specifically. Most .predict methods, for example, benefit from this.

@Micky774
Copy link
Contributor Author

Micky774 commented Jul 7, 2023

As you indicated, other Cython implementations might be adapted similarly. I think we can keep them for first time Cython-contributions. What do you think?

I think that's a good idea, though I'm not sure what the best way to discover where in the code these changes ought to be propagated. Perhaps an open ended meta-issue of "If you spot this pattern, feel free to open a PR fixing it" would be appropriate?

@jjerphan
Copy link
Member

jjerphan commented Jul 7, 2023

Yes, I guess we can spend some time to read implementations and list in a meta-issue candidate places for those optimisations.

@da-woods
Copy link

da-woods commented Jul 7, 2023

The one thing I'm slightly nervous of is that you look to be taking these pointers from non-contiguous memoryviews. I suspect practically they are contiguous in practice, but it might be nice to enforce that.

Passing a pointer is about as light-weight as you can get, so nothing Cython can do is ever likely to beat it. The changes I mentioned in the linked issue should bring memoryview slicing a lot closer, but pointers are still likely to win. (Essentially all I've done is noticed that taking multiple slices of the same array is a reference counting no-op, so you pay for one reference count at the start of the loop and that's it. From the point of view of this PR, it's a distraction though)

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we need to operate on continuous buffers as mentioned by #26765 (comment).

@Micky774
Copy link
Contributor Author

I suspect practically they are contiguous in practice, but it might be nice to enforce that.

Good point!

I've now done so in the attribute declarations and some method signatures. I wanted to as well check if enforcing it in the {r}dist_csr signatures would offer any boost as an alternative to passing the raw pointer, but it seems to be a negligible difference wrt main. With the attributes' (e.g. {X, Y}_indices ) contiguity being enforced, we ought to be able to "safely" use the pointers directly now.

Let me know if there are any other spots you think we could tighten up our guarantees. Thanks to all 😄

@Micky774
Copy link
Contributor Author

@jjerphan Thoughts?

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM modulo a suggestion.

This must remove some of the dispatch cost that @Vincent-Maladiere and I typically have observed in the past with #25170.

@jjerphan jjerphan enabled auto-merge (squash) July 21, 2023 20:44
@jjerphan jjerphan merged commit 0486033 into scikit-learn:main Jul 21, 2023
@Micky774 Micky774 deleted the memview_to_ptr branch July 22, 2023 03:14
punndcoder28 pushed a commit to punndcoder28/scikit-learn that referenced this pull request Jul 29, 2023
REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants