-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MAINT Miscellaneous maintenance items the private PairwiseDistancesReductions
submodule
#24542
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Thank you, @MarcoGorelli for working on cython-lint! 🙏
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR. Overall, this is an improvement (after fixing the failing tests). Here are a few suggestions:
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pyx.tp
Outdated
Show resolved
Hide resolved
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pyx.tp
Outdated
Show resolved
Hide resolved
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pyx.tp
Outdated
Show resolved
Hide resolved
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pyx.tp
Outdated
Show resolved
Hide resolved
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR! Although the lines changed is not big, this PR has a lot of scope:
- Renaming of
BaseDistancesReductionDispatcher
and_radius_neighbors.pxd
- Refactoring Tempita files to reduce the number of parameters in each configuration
- Improving comments throughout
- Refactor interface from
_compute_distances_on_chunks
to_compute_dist_middle_terms
. - Supporting
X_norm_squared
inmetric_kwargs
If 1, 2, 3, 4 were their own PRs, I think they would be easy to merge quickly.
#24623, #24624, #24625, #24626 have been opened to isolate orthogonal changes as proposed in #24542 (review). |
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
b29c79d resolves the last discussions about the (in fact useless) NaN guard its associated check via the ternary operator (it's simpler now). |
Once the sub PR are merged and this PR is updated, I would like to see if the change in the handling of |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM besides the following:
sklearn/metrics/_pairwise_distances_reduction/_radius_neighbors.pyx.tp
Outdated
Show resolved
Hide resolved
Also remove the comment not to duplicate and restate information at several place in those implementations. Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
See the reasons here: scikit-learn#17299
This reverts commit f2e917b.
Similarly to what has been observed in #24715, there seems to be some small regressions for configurations using small datasets.
Might this be linked to some CPython process setup or runtime variability? In any case, I do not think that they are significant enough for blocking the merge. What do you think? Full ASV resultsRun on a machine with 128 physical cores using the asv continuous -b PairwiseDistancesR -e main benchmarks/maint/pdr-misc
|
Hey @jjerphan, does using fewer threads (say 8 or even a single one) yield the odd performances for some small datasets you observed on both this PR and #24715? Is there any possibility that using 128 threads on small datasets causes some instabilities? Alternatively, could it be linked to some recent small changes in Cython itself? If so, have you tried fixing the Cython version to a previous one? |
That was it, @Vincent-Maladiere! Using 8 threads, no performance regression are observed using ASV. I think using 128 threads just makes the variability due to CPython interpreter overhead more visible on configuration with at least one small dataset. @ogrisel and @thomasjpfan: having this PR merged would unlock other PRs. Can you have a lot at this one, please? 🙏 Full ASV outputasv continuous -b PairwiseDistancesR -e main HEAD
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
Reference Issues/PRs
Relates to #22587.
What does this implement/fix? Explain your changes.
This suggest various miscellaneous improvements for maintenance using atomic and independent commits.
I would recommend reviewing each commit instead of the full diff.
Any other comments?
Feel free to suggest anything.