MAINT Refactor vector sentinel into utils #22728

thomasjpfan · 2022-03-08T03:13:36Z

Reference Issues/PRs

Follow up to #22320

What does this implement/fix? Explain your changes.

This PR refactors StdVectorSentinel into its own file and moves it into sklearn.utils. With this PR, adding a new vector type entails:

Add new type to _typedef
Add new vector to vector_typed
Implement a StdVectorSentinel*
Add Sentinel to _create_sentinel

The caller of vector_to_nd_array does not need to know anything about sentinels. They can pass in a vector and an ndarray is returned.
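The four steps and the caller-facing contract above can be sketched in plain C++, the language the Cython module is compiled to. This is a hypothetical illustration, not the code in sklearn/utils/_vector_sentinel.pyx: `Sentinel`, `SentinelFloat64`, `SentinelIntp`, `create_sentinel`, `ArrayView` and `vector_to_view` are invented names standing in for StdVectorSentinel*, _create_sentinel and vector_to_nd_array. `ArrayView` stands in for an ndarray that holds the sentinel as its owner, and the type codes are made up rather than NumPy's.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for StdVectorSentinel: owns a buffer and exposes it.
struct Sentinel {
    virtual ~Sentinel() = default;
    virtual void* get_data() = 0;
    virtual int get_typenum() const = 0;  // invented codes, not NumPy typenums
};

// One sentinel per supported element type (the "implement a sentinel" step).
struct SentinelFloat64 : Sentinel {
    std::vector<double> vec;
    void* get_data() override { return vec.data(); }
    int get_typenum() const override { return 0; }
};

struct SentinelIntp : Sentinel {
    std::vector<std::intptr_t> vec;
    void* get_data() override { return vec.data(); }
    int get_typenum() const override { return 1; }
};

// Factory overloads (the "add to _create_sentinel" step). Each overload takes
// over the caller's buffer by swapping it into the sentinel's own vector.
inline std::shared_ptr<Sentinel> create_sentinel(std::vector<double>* v) {
    auto s = std::make_shared<SentinelFloat64>();
    s->vec.swap(*v);
    return s;
}

inline std::shared_ptr<Sentinel> create_sentinel(std::vector<std::intptr_t>* v) {
    auto s = std::make_shared<SentinelIntp>();
    s->vec.swap(*v);
    return s;
}

// Stand-in for an ndarray: a raw data pointer plus the owner keeping it valid.
struct ArrayView {
    void* data;
    std::size_t size;
    int typenum;
    std::shared_ptr<Sentinel> base;
};

// Callers pass a vector and get a view back; sentinels stay an internal detail.
template <typename T>
ArrayView vector_to_view(std::vector<T>* v) {
    std::size_t size = v->size();                      // read before the swap
    std::shared_ptr<Sentinel> s = create_sentinel(v);  // v is empty afterwards
    return ArrayView{s->get_data(), size, s->get_typenum(), s};
}
```

In this sketch, supporting another element type means adding one sentinel struct and one `create_sentinel` overload; `vector_to_view` and its callers stay unchanged.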

Any other comments?

Running the benchmarks from #22320 (review), I do not see any performance difference with this refactor.

I initially had a PR ready to resolve #11540 by using vector[int64_t] + StdVectorSentinelInt64. But I think the refactoring itself deserves its own PR.

CC @jjerphan

thomasjpfan · 2022-03-08T04:01:57Z

sklearn/utils/_vector_sentinel.pyx

+        StdVectorSentinel sentinel =  _create_sentinel(vect_ptr)
+        np.ndarray arr = np.PyArray_SimpleNewFromData(
+            1, &size, sentinel.get_typenum(), sentinel.get_data())


I think this is slightly clearer than the current implementation. The get_data call makes it clear that arr points to the data owned by the sentinel.

The implementation on main defines arr with the buffer from vect_ptr and then the sentinel would set the internal pointer in sentinel.vec to vect_ptr. Because only the pointers were swapped, arr is pointing to the correct place in memory.
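The reason the swap is safe can be checked in isolation. The snippet below is an illustrative C++ sketch, not scikit-learn code, and `steal_buffer` is a hypothetical helper. `std::vector::swap` exchanges the two vectors' internal buffers without copying or moving elements, so a pointer taken from the original vector still refers to live data afterwards; that data is simply owned by the other vector.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: transfers ownership of src's buffer to a new vector
// by swapping, similar to how a sentinel takes over the caller's vector.
template <typename T>
std::vector<T> steal_buffer(std::vector<T>& src) {
    std::vector<T> owner;  // starts empty
    owner.swap(src);       // exchanges buffers; no element is copied or moved
    return owner;          // moved or elided on return; buffer address unchanged
}
```

A pointer obtained from `src.data()` before the call compares equal to `owner.data()` after it, while `src` is left empty. This is why an array created from the original buffer address keeps pointing at valid memory once the sentinel holds the vector.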

jjerphan

Thank you for the refactoring.

I am glad that this set of features helps solve issues and improve other implementations.

This LGTM modulo a minor discussion: should we move coerce_vectors_to_nd_arrays here potentially renaming it to mention ragged arrays?

jjerphan · 2022-03-08T07:23:31Z

sklearn/utils/_vector_sentinel.pyx

+        StdVectorSentinel sentinel =  _create_sentinel(vect_ptr)
+        np.ndarray arr = np.PyArray_SimpleNewFromData(
+            1, &size, sentinel.get_typenum(), sentinel.get_data())


thomasjpfan · 2022-03-09T00:53:51Z

Should we move coerce_vectors_to_nd_arrays here potentially renaming it to mention ragged arrays?

I think the ragged arrays in coerce_vectors_to_nd_arrays and its fused type vector_vector_DITYPE_t are specific to RadiusNeighborhood.

In the future, if there is a need in other parts of the codebase for ragged arrays, I can see placing it into _vector_sentinel. For now, I prefer _vector_sentinel to focus on one task: converting single vectors into ndarrays.

jjerphan

LGTM.

jeremiedbb

LGTM

jeremiedbb · 2022-03-26T00:59:10Z

sklearn/utils/setup.py

+    config.add_extension(
+        "_vector_sentinel",
+        sources=["_vector_sentinel.pyx"],
+        libraries=libraries,
+        language="c++",
+    )


I'm always confused about when include_dirs=[numpy.get_include()] is necessary. I thought it was needed whenever compiling against the NumPy C API.

Yeah, it's strange. I added include_dirs here to be safe.

MAINT Refactor vector sentinel into utils

97826ff

github-actions bot added cython module:metrics module:utils labels Mar 8, 2022

CLN Consistent naming

6e9de8c

thomasjpfan commented Mar 8, 2022

jjerphan reviewed Mar 8, 2022

Merge remote-tracking branch 'upstream/main' into cln_sentinel

1fcb04b

jjerphan approved these changes Mar 9, 2022

jjerphan added the Waiting for Reviewer label Mar 9, 2022

jjerphan added the Quick Review label Mar 16, 2022

jeremiedbb approved these changes Mar 26, 2022

thomasjpfan added 2 commits March 26, 2022 14:48

Merge remote-tracking branch 'upstream/main' into cln_sentinel

ba078be

BLD Add numpy include dir

b0ae5ed

jeremiedbb merged commit b51fbb0 into scikit-learn:main Mar 29, 2022

glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Apr 6, 2022

MAINT Refactor vector sentinel into utils (scikit-learn#22728)

f6932e7
