Skip to content

MAINT Fix ctypedef types in tree submodule #27352

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 2 commits into from
Oct 5, 2023

Conversation

adam2392
Copy link
Member

@adam2392 adam2392 commented Sep 12, 2023

Reference Issues/PRs

Related to #25572

What does this implement/fix? Explain your changes.

I was playing around with ctypedefs and saw the related GH issue, which is a very nice cleanup. I went ahead and tried it and the code seems to be able to compile and pass unit-tests w/o issue.

This has to implement a change across all the tree submodule simultaneously and the _gradient_boosting.pyx file since they all rely on the types defined in the sklearn/tree/_tree.pxd file.

If you think there is a simpler strategy that refactors less LOC, then lmk.

Any other comments?

Hope this helps.

Signed-off-by: Adam Li <adam2392@gmail.com>
@github-actions
Copy link

github-actions bot commented Sep 12, 2023

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 74ce8b9. Link to the linter CI: here

Signed-off-by: Adam Li <adam2392@gmail.com>
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you, @adam2392.

I wonder whether we could further unify types to only use the ones we define.

See my comment which I think can be generalised beyond double.

Comment on lines +35 to +39
cdef double weighted_n_samples # Weighted number of samples (in total)
cdef double weighted_n_node_samples # Weighted number of samples in the node
cdef double weighted_n_left # Weighted number of samples in the left node
cdef double weighted_n_right # Weighted number of samples in the right node
cdef double weighted_n_missing # Weighted number of samples that are missing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could double (here and also elsewhere) be changed to float64_t?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second this suggestion.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can make the change in this PR, or do it in a follow-on? I could do the int there as well as @lorentzenchr suggested below.

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A separate PR would be easier to review.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In that case, I'll leave this as is for now. Would this PR be mergable? I can help test and make the systematic change in the tree submodule once we resolve this PR?

@jjerphan jjerphan changed the title [MAINT] Fix ctypedef types in tree submodule MAINT Fix ctypedef types in tree submodule Oct 4, 2023
Comment on lines +35 to +39
cdef double weighted_n_samples # Weighted number of samples (in total)
cdef double weighted_n_node_samples # Weighted number of samples in the node
cdef double weighted_n_left # Weighted number of samples in the left node
cdef double weighted_n_right # Weighted number of samples in the right node
cdef double weighted_n_missing # Weighted number of samples that are missing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I second this suggestion.

@@ -554,24 +554,24 @@ cdef inline int node_split_best(

# Sort n-element arrays pointed to by feature_values and samples, simultaneously,
# by the values in feature_values. Algorithm: Introsort (Musser, SP&E, 1997).
cdef inline void sort(DTYPE_t* feature_values, SIZE_t* samples, SIZE_t n) noexcept nogil:
cdef inline void sort(float32_t* feature_values, intp_t* samples, intp_t n) noexcept nogil:
if n == 0:
return
cdef int maxd = 2 * <int>log(n)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
cdef int maxd = 2 * <int>log(n)
cdef intp_t maxd = 2 * <intp_t>log(n)

cdef DTYPE_t pivot
cdef SIZE_t i, l, r
cdef void introsort(float32_t* feature_values, intp_t *samples,
intp_t n, int maxd) noexcept nogil:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
intp_t n, int maxd) noexcept nogil:
intp_t n, intp_t maxd) noexcept nogil:

@jjerphan
Copy link
Member

jjerphan commented Oct 5, 2023

I would treat the replacement for double with float64_t in another PR as proposed by #27352 (comment).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants