-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
MAINT Fix ctypedef types in tree submodule #27352
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, @adam2392.
I wonder whether we could further unify types to only use the ones we define.
See my comment which I think can be generalised beyond double
.
cdef double weighted_n_samples # Weighted number of samples (in total) | ||
cdef double weighted_n_node_samples # Weighted number of samples in the node | ||
cdef double weighted_n_left # Weighted number of samples in the left node | ||
cdef double weighted_n_right # Weighted number of samples in the right node | ||
cdef double weighted_n_missing # Weighted number of samples that are missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could double
(here and also elsewhere) be changed to float64_t
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second this suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, I can make the change in this PR, or do it in a follow-on? I could do the int
there as well as @lorentzenchr suggested below.
WDYT?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A separate PR would be easier to review.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In that case, I'll leave this as is for now. Would this PR be mergable? I can help test and make the systematic change in the tree submodule once we resolve this PR?
cdef double weighted_n_samples # Weighted number of samples (in total) | ||
cdef double weighted_n_node_samples # Weighted number of samples in the node | ||
cdef double weighted_n_left # Weighted number of samples in the left node | ||
cdef double weighted_n_right # Weighted number of samples in the right node | ||
cdef double weighted_n_missing # Weighted number of samples that are missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I second this suggestion.
@@ -554,24 +554,24 @@ cdef inline int node_split_best( | |||
|
|||
# Sort n-element arrays pointed to by feature_values and samples, simultaneously, | |||
# by the values in feature_values. Algorithm: Introsort (Musser, SP&E, 1997). | |||
cdef inline void sort(DTYPE_t* feature_values, SIZE_t* samples, SIZE_t n) noexcept nogil: | |||
cdef inline void sort(float32_t* feature_values, intp_t* samples, intp_t n) noexcept nogil: | |||
if n == 0: | |||
return | |||
cdef int maxd = 2 * <int>log(n) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cdef int maxd = 2 * <int>log(n) | |
cdef intp_t maxd = 2 * <intp_t>log(n) |
cdef DTYPE_t pivot | ||
cdef SIZE_t i, l, r | ||
cdef void introsort(float32_t* feature_values, intp_t *samples, | ||
intp_t n, int maxd) noexcept nogil: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
intp_t n, int maxd) noexcept nogil: | |
intp_t n, intp_t maxd) noexcept nogil: |
I would treat the replacement for |
Signed-off-by: Adam Li <adam2392@gmail.com>
Reference Issues/PRs
Related to #25572
What does this implement/fix? Explain your changes.
I was playing around with ctypedefs and saw the related GH issue, which is a very nice cleanup. I went ahead and tried it and the code seems to be able to compile and pass unit-tests w/o issue.
This has to implement a change across all the tree submodule simultaneously and the
_gradient_boosting.pyx
file since they all rely on the types defined in thesklearn/tree/_tree.pxd
file.If you think there is a simpler strategy that refactors less LOC, then lmk.
Any other comments?
Hope this helps.