Skip to content

MNT Use const memory views in DistanceMetric subclasses #19883

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

jjerphan
Copy link
Member

@jjerphan jjerphan commented Apr 13, 2021

Reference Issues/PRs

Fixes #19875.
See also #10624.

What does this implement/fix? Explain your changes.

Distances computations (cluster.DistanceMetrics) implemented in Cython do not support readonly datasets, which are obtained when using mem-mapping for instance. This makes some interfaces relying on those implementation fail (see #19875).

This PR makes the Cython implementations use const memoryviews to be able computing those distances on memmapped datasets.

Any other comments?

5b61a29 fixes #19875, but maybe we should have the downstream issues be solved independently from this PR.

Also scipy.spatial.distance.{cdist,pdist} on memmapped datasets with 'mahalanobis' as a metric fail.
I am inspecting why; in the meantime 716b330 skips the associated tests.

@thomasjpfan
Copy link
Member

Const fused types has not been released yet. But in this case, DTYPE_t is not a fused type, so I think everything will still work.

@jjerphan jjerphan changed the title MNT Use const with fused types memory views for cluster.DistanceMetrics MNT Use const memory views for cluster.DistanceMetrics Apr 13, 2021
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for fixing #19875 explicitly as part of this PR with a dedicated non-regression test + an entry in the changelog.

Since the DistanceMetric class is part of the documented public API, we should also mention the fix for this class in the changelog.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some minor comments but otherwise LGTM. Is this PR style a DRAFT? Do you plan other changes? I think it's already a net improvement.

@jjerphan jjerphan marked this pull request as ready for review April 14, 2021 08:26
jjerphan and others added 5 commits April 14, 2021 10:28
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@jjerphan
Copy link
Member Author

Is this PR style a DRAFT? Do you plan other changes? I think it's already a net improvement.

I was inspecting if changes made to neighbors.DistanceMetric were impacting other public interfaces, but that's not the case.

@jjerphan jjerphan changed the title MNT Use const memory views for cluster.DistanceMetrics MNT Use const memory views for neighbors.DistanceMetric subclasses Apr 14, 2021
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @jjerphan !

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan changed the title MNT Use const memory views for neighbors.DistanceMetric subclasses MNT Use const memory views in DistanceMetric subclasses Apr 14, 2021
@thomasjpfan thomasjpfan merged commit 138da7e into scikit-learn:main Apr 14, 2021
@jjerphan jjerphan deleted the const_fused_memviews_dist_metrics branch April 14, 2021 15:27
thomasjpfan added a commit to thomasjpfan/scikit-learn that referenced this pull request Apr 19, 2021
…#19883)

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[BUG] cluster.AgglomerativeClustering fails on readonly memmapped datasets
3 participants