Skip to content

MAINT Simplify node split Cython API #29322

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Jun 26, 2024

Conversation

adam2392
Copy link
Member

@adam2392 adam2392 commented Jun 20, 2024

Reference Issues/PRs

n/a

What does this implement/fix? Explain your changes.

I noticed that there were extra arguments in node_split_best and node_split_random that need not be in the function signatures, and they simply complicate the function signature. The changes are inline with how the rest of the splitter parameters are used (i.e. there is a reference defined within the inline function rather than passing in the parameter explicitly).

This should be a super quick review.

cc: @thomasjpfan

Any other comments?

Note: in an ideal world we would also not pass in Criterion criterion since that is an attribute of the Splitter splitter object. But one cannot define Criterion within the nogil node_split_best and node_split_random functions. That is the following is not possible:

cdef inline int node_split_best(
    Splitter splitter,
    Partitioner partitioner,
    SplitRecord* split,
    ParentInfo* parent_record,
) except -1 nogil:
    """Find the best split on node samples[start:end]

    Returns -1 in case of failure to allocate memory (and raise MemoryError)
    or 0 otherwise.
    """
    cdef Criterion criterion = splitter.criterion

ref: https://stackoverflow.com/questions/68278706/is-nogil-safe-when-accessing-cython-extension-type-members

Signed-off-by: Adam Li <adam2392@gmail.com>
Signed-off-by: Adam Li <adam2392@gmail.com>
Copy link

github-actions bot commented Jun 20, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 7251b2f. Link to the linter CI: here

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks okay to me and should not cause any performance regressions.

Can you do a quick benchmark to check if there is a performance regression?

@adam2392
Copy link
Member Author

My asv benchmarks is not working, but I just ran the bench_tree.py script on my local laptop and it looks fine to me.

scikit-learn_tree_benchmark_results-main
scikit-learn_tree_benchmark_results

Copy link
Member

@lorentzenchr lorentzenchr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@lorentzenchr lorentzenchr merged commit 64f20aa into scikit-learn:main Jun 26, 2024
30 checks passed
@adam2392 adam2392 deleted the simplifynodesplit branch June 26, 2024 12:00
@jeremiedbb jeremiedbb mentioned this pull request Jul 2, 2024
11 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants