-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MAINT Refactor vector sentinel into utils #22728
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
StdVectorSentinel sentinel = _create_sentinel(vect_ptr) | ||
np.ndarray arr = np.PyArray_SimpleNewFromData( | ||
1, &size, sentinel.get_typenum(), sentinel.get_data()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is slightly more clear than the current implementation. The get_data
makes it clear that arr
points to the data owned by the sentinel.
The implementation on main
defines arr
with the buffer from vect_ptr
and then the sentinel would set the internal pointer in sentinel.vec
to vect_ptr
. Because only the pointers were swapped, arr
is pointing to the correct place in memory.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the refactoring.
I am glad that this set of features helps solving issues and improving others implementations.
This LGTM modulo a minor discussion: should we move coerce_vectors_to_nd_arrays
here potentially renaming it to mention ragged arrays?
StdVectorSentinel sentinel = _create_sentinel(vect_ptr) | ||
np.ndarray arr = np.PyArray_SimpleNewFromData( | ||
1, &size, sentinel.get_typenum(), sentinel.get_data()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
I think ragged arrays in In the future, if there is a need in other parts of the codebase for ragged arrays, I can see placing it into |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
sklearn/utils/setup.py
Outdated
config.add_extension( | ||
"_vector_sentinel", | ||
sources=["_vector_sentinel.pyx"], | ||
libraries=libraries, | ||
language="c++", | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm always confused about when is include_dirs=[numpy.get_include()]
necessary.. I thought it was whenever compiling against numpy c api.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea, it's strange. I added the include_dirs
here to be safe.
Reference Issues/PRs
Follow up to #22320
What does this implement/fix? Explain your changes.
This PR refactors the
StdVectorSentinel
into it's own file and puts it intosklearn.utils
. With this PR, adding new vectors entail._typedef
vector_typed
StdVectorSentinel*
_create_sentinel
The caller of
vector_to_nd_array
does not need to know anything about sentinels. They can pass in a vector and an ndarray is returned.Any other comments?
Running benchmarks from #22320 (review) I do not see any performance difference with this refactor.
I initially had a PR ready to resolve #11540 by using
vector[int64_t]
+StdVectorSentinelInt64
. But I think the refactoring itself deserves it's own PR.CC @jjerphan