Skip to content

FIX Adjust builder to support a different split record #16

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 3 commits into from

Conversation

thomasjpfan
Copy link

This is an example of how to get builder to work with the different types of split records.

This is functionally but I do not like the malloc + free in the builder.

All the setup.py language="c++" changes were needed to get the PR to compile on my machine

CC @adam2392

@thomasjpfan
Copy link
Author

thomasjpfan commented Mar 14, 2022

The constraint on SplitRecord being a struct really constraints the trees. Also I think I broke something when it comes to the algorithm itself.

I do not have any more bandwidth to think about this for the next few weeks. (Prioritizing other items for the next release)

@thomasjpfan
Copy link
Author

thomasjpfan commented Mar 15, 2022

BTW I think there is a way to make this work with subclassing. We'll have to define a cdef class _Record that is returned by a Splitter and is initialized by the builder (where we are doing malloc). Interally, the splitter would use it's own struct to do splitting, but when the information is passed back up to the TreeBuilder, it will be in the cdef class.

It's very similar to how we do it in HistGradientBoosting:

https://github.com/scikit-learn/scikit-learn/blob/fb4dbfd837483ac3daf06dfc28871bfcfb65c4ab/sklearn/ensemble/_hist_gradient_boosting/splitting.pyx#L32

Note how there is a split_info_struct and a SplitInfo class and how their fields are exactly the same.

One would need to experiment to see if this idea actually works.

@adam2392
Copy link
Collaborator

BTW I think there is a way to make this work with subclassing. We'll have to define a cdef class _Record that is returned by a Splitter. Interally, the splitter would use it's own struct to do splitting, but when the information is passed back up to the TreeBuilder, it will be in the cdef class.
One would need to experiment to see if this idea actually works.

Thanks for the info! I can work on it tomorrow.

Also, your idea of the malloc/free works! I fixed the bug it introduced to the algo and confirmed the performance is the same as before for both RF and OF.

If the subclassing works, then I'll lyk and begin working on unit tests. Otw, I have a working commit in the draft PR using your idea.

My colleague will also work on an in-depth example to append to the existing RF example.

@adam2392
Copy link
Collaborator

adam2392 commented Mar 15, 2022

(Notes for myself) Main challenges I foresee:

@thomasjpfan thomasjpfan closed this Aug 1, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants