Skip to content

TST Add TODO for global_dtype in sklearn/ensemble/tests/test_forest.py #22941

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 4 commits into from

Conversation

adam2392
Copy link
Member

Reference Issues/PRs

See: #22881 (comment)

What does this implement/fix? Explain your changes.

Adds an inline comment for TODO behavior change for global_dtype tests in ensemble trees in the future.

Any other comments?

cc: @jjerphan

@jjerphan
Copy link
Member

Hi @adam2392,

I am following up with this PR which I forgot.

I do not have commit rights on @neurodata's fork of scikit-learn.
Can you merge main in the branch of this PR?

@adam2392
Copy link
Member Author

I do not have commit rights on @neurodata's fork of scikit-learn. Can you merge main in the branch of this PR?

Done!

@jjerphan
Copy link
Member

jjerphan commented Dec 1, 2022

I think it is worth adding float32 support in tree based models before working further on extending tests as data is cast to float64 at the moment:

https://github.com/scikit-learn/scikit-learn/blob/ba5f9e0cd10486f954299a7796c862dd0ddc9470/sklearn/tree/_tree.pyx##L101-L102

@adam2392
Copy link
Member Author

adam2392 commented Dec 6, 2022

I think it is worth adding float32 support in tree based models before working further on extending tests as data is cast to float64 at the moment:
https://github.com/scikit-learn/scikit-learn/blob/ba5f9e0cd10486f954299a7796c862dd0ddc9470/sklearn/tree/_tree.pyx##L101-L102

What's the difficulty here? Could we just re-define the trees and sub the data types for a fused datatype? If so, I think I can submit a straightforward PR. Lmk if you think this is insufficient tho.

E.g.

ctypedef cnp.npy_float32 DTYPE_t          # Type of X
ctypedef cnp.npy_float64 DOUBLE_t       # Type of y, sample_weight

ctypedef fused NEW_DTYPE:
   DTYPE_t
   DOUBLE_t

@jjerphan
Copy link
Member

jjerphan commented Dec 6, 2022

Unfortunately fused-types and Cython extension types can't be used simultaneously.

More information and resolution pathways are given here: #7059 (comment)

@adam2392
Copy link
Member Author

adam2392 commented Dec 6, 2022

Oh yeah forgot bout that unfortunate limitation :p

@jjerphan
Copy link
Member

jjerphan commented Dec 6, 2022

Things would significantly be simpler for scikit-learn if Cython were to support it.

IIRC, supporting this really is complicated because one has to deal with many historical aspects and choices of Pyrex which Cython supersedes.

Edit: cython/cython#3283 tracks discussions on this topic.

@adam2392 adam2392 closed this Mar 8, 2023
@adam2392 adam2392 deleted the test_forest branch March 8, 2023 20:37
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants