-
-
Notifications
You must be signed in to change notification settings - Fork 26k
FIX remove max_samples in RandomTreesEmbedding #15693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I think the CI fails cause:
otherwise LGTM. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough but actually I am not sure what this class should do. Shouldn't we make it possible to enable bootstrapping?
Also by reading the code I saw that we are using a uniformly random y internally, but the same target y for all the trees in the ensemble. Wouldn't it make more sense to use independently resampled ys for each tree to get a less redundant (correlated) while still neighbor preserving, high dimensional sparse representation of the original data?
Any opinion @glouppe or others?
Actually because The |
So bootstrapping and |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unable to understand the failure from Circle CI, otherwise LGTM
doc/whats_new/v0.22.rst
Outdated
:class:`ensemble.ExtraTreesRegressor`, | ||
:class:`ensemble.RandomTreesEmbedding`. :pr:`14682` by | ||
:user:`Matt Hancock <notmatthancock>` and | ||
:class:`ensemble.ExtraTreesRegressor` |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should remove ForestClassifier and ForestRegressor
sklearn/ensemble/_forest.py
Outdated
- If int, then draw `max_samples` samples. | ||
- If float, then draw `max_samples * X.shape[0]` samples. Thus, | ||
`max_samples` should be in the interval `(0, 1)`. | ||
|
||
.. versionadded:: 0.22 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also need to remove this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Apart from Hammond's comment lgtm
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will merge when green
Don't we need a deprecation? We may be break users' code |
I think this was added in 0.22, so removing it is safe |
Reference Issues/PRs
closes #15670
What does this implement/fix? Explain your changes.
Remove the parameter
max_samples
sinceRandomTreesEmbedding
does not allow to turnbootstrap=True
. It should be ported in 0.22.Any other comments?