MAINT Fix ctypedef types in tree submodule #27352

adam2392 · 2023-09-12T19:07:26Z

Reference Issues/PRs

Related to #25572

What does this implement/fix? Explain your changes.

I was experimenting with ctypedefs and saw the related GitHub issue, which proposes a very nice cleanup. I went ahead and tried it, and the code appears to compile and pass the unit tests without issue.

The change has to be applied across the whole tree submodule and the _gradient_boosting.pyx file at the same time, since they all rely on the types defined in sklearn/tree/_tree.pxd.

If you think there is a simpler strategy that touches fewer lines of code, let me know.

Any other comments?

Hope this helps.

Signed-off-by: Adam Li <adam2392@gmail.com>

github-actions · 2023-09-12T19:09:15Z

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 74ce8b9.

jjerphan

Thank you, @adam2392.

I wonder whether we could further unify types to only use the ones we define.

See my comment which I think can be generalised beyond double.

jjerphan · 2023-10-04T20:21:25Z

sklearn/tree/_criterion.pxd

+    cdef double weighted_n_samples         # Weighted number of samples (in total)
+    cdef double weighted_n_node_samples    # Weighted number of samples in the node
+    cdef double weighted_n_left            # Weighted number of samples in the left node
+    cdef double weighted_n_right           # Weighted number of samples in the right node
+    cdef double weighted_n_missing         # Weighted number of samples that are missing


Could double (here and also elsewhere) be changed to float64_t?

I second this suggestion.

Sure. I can make the change in this PR, or do it in a follow-up PR. I could also convert int there, as @lorentzenchr suggested below.

What do you think?

A separate PR would be easier to review.

In that case, I'll leave this as is for now. Would this PR be mergeable? I can help test and make the systematic change in the tree submodule once this PR is resolved.
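To make the double versus float64_t question concrete, here is a minimal Python sketch that reports the width of the C floating-point types on the machine running it. It assumes, as the thread implies, that float64_t and float32_t are meant as names for C double and C float; the helper name `c_float_widths` is made up for this illustration and is not part of scikit-learn.

```python
import ctypes
import struct


def c_float_widths():
    """Return the width in bits of C float and C double for this platform's ABI."""
    return {
        "float": ctypes.sizeof(ctypes.c_float) * 8,
        "double": ctypes.sizeof(ctypes.c_double) * 8,
    }


# struct reports the same native size for a C double ("d" format code).
widths = c_float_widths()
print(widths, struct.calcsize("d") * 8)
```

On mainstream platforms double is 64 bits wide, so under that assumption the rename changes how the type is spelled, not how values are stored.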

lorentzenchr · 2023-10-05T08:03:16Z

I second this suggestion.

lorentzenchr · 2023-10-05T08:35:27Z

sklearn/tree/_splitter.pyx

@@ -554,24 +554,24 @@ cdef inline int node_split_best(

 # Sort n-element arrays pointed to by feature_values and samples, simultaneously,
 # by the values in feature_values. Algorithm: Introsort (Musser, SP&E, 1997).
-cdef inline void sort(DTYPE_t* feature_values, SIZE_t* samples, SIZE_t n) noexcept nogil:
+cdef inline void sort(float32_t* feature_values, intp_t* samples, intp_t n) noexcept nogil:
    if n == 0:
        return
    cdef int maxd = 2 * <int>log(n)


Suggested change

-    cdef int maxd = 2 * <int>log(n)
+    cdef intp_t maxd = 2 * <intp_t>log(n)
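For a sense of the values involved, the sketch below mirrors the expression `2 * <int>log(n)` in Python and compares the width of C int with a pointer-sized signed integer. It assumes intp_t corresponds to a pointer-sized signed integer (represented here by `ctypes.c_ssize_t`); `depth_limit` is a name invented for this example.

```python
import ctypes
import math


def depth_limit(n):
    """Twice the truncated natural logarithm of n, like `2 * <int>log(n)`."""
    return 2 * int(math.log(n))


# Widths in bits on the machine running this sketch.
INT_BITS = ctypes.sizeof(ctypes.c_int) * 8
INTP_BITS = ctypes.sizeof(ctypes.c_ssize_t) * 8

for n in (1, 1000, 2**62):
    print(n, depth_limit(n))
print(INT_BITS, INTP_BITS)
```

The depth limit grows only logarithmically with n, so it stays small even for very large arrays. Read that way, the suggested change looks like a matter of using one integer type consistently rather than avoiding overflow, though the thread does not state the motivation explicitly.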

lorentzenchr · 2023-10-05T08:36:00Z

sklearn/tree/_splitter.pyx

-    cdef DTYPE_t pivot
-    cdef SIZE_t i, l, r
+cdef void introsort(float32_t* feature_values, intp_t *samples,
+                    intp_t n, int maxd) noexcept nogil:


Suggested change

-                    intp_t n, int maxd) noexcept nogil:
+                    intp_t n, intp_t maxd) noexcept nogil:

jjerphan · 2023-10-05T16:22:30Z

I would handle the replacement of double with float64_t in another PR, as proposed in #27352 (comment).

Fix ctypedef types in tree submodule

8d2332b

Signed-off-by: Adam Li <adam2392@gmail.com>

github-actions bot added cython module:ensemble module:tree labels Sep 12, 2023

Fix spacing in in-line comments

74ce8b9

Signed-off-by: Adam Li <adam2392@gmail.com>

jjerphan reviewed Oct 4, 2023

jjerphan changed the title from "[MAINT] Fix ctypedef types in tree submodule" to "MAINT Fix ctypedef types in tree submodule" Oct 4, 2023

lorentzenchr reviewed Oct 5, 2023

lorentzenchr approved these changes Oct 5, 2023

jjerphan approved these changes Oct 5, 2023

jjerphan merged commit fb0ab5a into scikit-learn:main Oct 5, 2023

adam2392 deleted the ctypecrit branch October 5, 2023 23:00

This was referenced Oct 6, 2023

MAINT Replace double with float64_t inside tree submodule #27539

Merged

MAINT Convert int to intp_t ctype def in tree/ related code #27546

Merged

tree_xpd threshold changed to float32 #27536

Closed

RFC Guideline for usage of Cython types #25572

Closed

REDVM pushed a commit to REDVM/scikit-learn that referenced this pull request Nov 16, 2023

MAINT Fix ctypedef types in tree submodule (scikit-learn#27352)

2881cc2

Signed-off-by: Adam Li <adam2392@gmail.com>
