Skip to content

FIX Uses log2 in tree building #30557

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 6 commits into from
Jan 2, 2025

Conversation

thomasjpfan
Copy link
Member

Reference Issues/PRs

Fixes #30554

What does this implement/fix? Explain your changes.

This PR replaces log with log2 and explicitly uses that in sklearn/tree/_partitioner.pyx, so it's more clear that we are using base 2.

@thomasjpfan thomasjpfan changed the title Adds whats new BUG: Uses log2 in tree building Dec 30, 2024
@thomasjpfan thomasjpfan changed the title BUG: Uses log2 in tree building FIX Uses log2 in tree building Dec 30, 2024
Copy link

github-actions bot commented Dec 30, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 07c1b23. Link to the linter CI: here

Copy link
Contributor

@OmarManzoor OmarManzoor left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks @thomasjpfan

@OmarManzoor OmarManzoor added Waiting for Second Reviewer First reviewer is done, need a second one! Quick Review For PRs that are quick to review labels Dec 30, 2024
@jeremiedbb
Copy link
Member

The name of the fragment should be 30557.fix.rst. Otherwise LGTM

@glemaitre
Copy link
Member

Do you think that it is worth a minimal non-regression test to acknowledge that we use the log2?

@ogrisel
Copy link
Member

ogrisel commented Dec 30, 2024

We might also want to delete the log function of https://github.com/scikit-learn/scikit-learn/blob/6385b7f7f4395f050cb4c85449fa3987031598c1/sklearn/tree/_utils.pyx#L65C1-L66C27 and update any remaining calls such as sklearn/tree/_criterion.pyx to cimport log2 from libc.math instead.

This change would better be done in another MAINT PR that would not be part of the 1.6.1 released fix.

@ogrisel
Copy link
Member

ogrisel commented Dec 30, 2024

Do you think that it is worth a minimal non-regression test to acknowledge that we use the log2?

I approved the PR but forgot about this point. I am not sure how to do that, but it would be great to find a way.

Copy link
Member Author

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a non-regression test that uses an example that fails on main and passes with this PR.

@@ -702,12 +702,17 @@ cdef inline void shift_missing_values_to_left_if_required(
best.pos += best.n_missing


def _py_sort(float32_t[::1] feature_values, intp_t[::1] samples, intp_t n):
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this function to test sort directly.

@jeremiedbb jeremiedbb merged commit 19ef479 into scikit-learn:main Jan 2, 2025
30 checks passed
jeremiedbb pushed a commit to jeremiedbb/scikit-learn that referenced this pull request Jan 8, 2025
jeremiedbb pushed a commit that referenced this pull request Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
cython module:tree Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

scikit-learn 1.6 changed behavior of growing trees
5 participants