MNT Refactor tree splitter to use memoryviews #23273

Merged
merged 7 commits into from
May 13, 2022
7 changes: 7 additions & 0 deletions doc/whats_new/v1.2.rst
@@ -85,6 +85,13 @@ Changelog
   matrices in a variety of estimators and avoid an `EfficiencyWarning`.
   :pr:`23139` by `Tom Dupre la Tour`_.

+:mod:`sklearn.tree`
+...................
+
+- |Fix| Fixed invalid memory access bug during fit in
+  :class:`tree.DecisionTreeRegressor` and :class:`tree.DecisionTreeClassifier`.
+  :pr:`23273` by `Thomas Fan`_.
+
 Code and Documentation Contributors
 -----------------------------------

8 changes: 4 additions & 4 deletions sklearn/tree/_splitter.pxd
@@ -46,13 +46,13 @@ cdef class Splitter:
     cdef object random_state             # Random state
     cdef UINT32_t rand_r_state           # sklearn_rand_r random number state

-    cdef SIZE_t* samples                 # Sample indices in X, y
+    cdef SIZE_t[::1] samples             # Sample indices in X, y
     cdef SIZE_t n_samples                # X.shape[0]
     cdef double weighted_n_samples       # Weighted number of samples
-    cdef SIZE_t* features                # Feature indices in X
-    cdef SIZE_t* constant_features       # Constant features indices
+    cdef SIZE_t[::1] features            # Feature indices in X
+    cdef SIZE_t[::1] constant_features   # Constant features indices
     cdef SIZE_t n_features               # X.shape[1]
-    cdef DTYPE_t* feature_values         # temp. array holding feature values
+    cdef DTYPE_t[::1] feature_values     # temp. array holding feature values

     cdef SIZE_t start                    # Start position for the current node
     cdef SIZE_t end                      # End position for the current node
Member

Do you plan to change sample_weight as well in a future PR?

Member

What do you mean?

Member

Sorry, I was referring to the pointer two lines below: sample_weight.

Member Author

I think @glemaitre means:

cdef DOUBLE_t* sample_weight

Yes, I plan to do it in the future. sample_weight touches multiple files, so I wanted to do it in another PR.

Member

Ah OK, I see: the sample_weight attribute below is still defined as a pointer (cdef DOUBLE_t* sample_weight), and it could also be changed to a memoryview.

+1. I am fine with doing this in a later PR and merging this one.

Member

Works for me.
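
For readers less familiar with the pointer-to-memoryview change discussed in this thread, here is a rough pure-Python analogue. It is only a sketch: it uses Python's built-in `memoryview` over an `array.array`, not Cython typed memoryviews, and the names `samples_buffer`, `samples`, and `swap` are hypothetical stand-ins rather than scikit-learn code. What it tries to illustrate is that a view carries its length and element type along with the data and writes through to the owning buffer, whereas a raw pointer such as `SIZE_t*` carries neither.

```python
from array import array

# Hypothetical stand-in for the splitter's `samples` buffer (not scikit-learn code).
# Type code "q" stores signed 64-bit integers.
samples_buffer = array("q", range(5))

# The view shares memory with `samples_buffer` and knows its own length
# and item type, which a bare pointer does not.
samples = memoryview(samples_buffer)


def swap(view, i, j):
    """Swap two entries in place, as a partitioning step would."""
    view[i], view[j] = view[j], view[i]


swap(samples, 0, 4)

# The write went through to the owning buffer; no copy was made.
shared_contents = list(samples_buffer)

# Roughly comparable to the C-contiguity that `[::1]` declares in Cython.
is_contiguous = samples.contiguous

# An index one past the end is rejected instead of reading stray memory.
try:
    samples[5]
    out_of_bounds_error = None
except IndexError as exc:
    out_of_bounds_error = exc
```

In compiled Cython, whether an out-of-range index on a typed memoryview raises depends on the `boundscheck` directive, so the `IndexError` seen here should not be assumed for the splitter itself.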
