
[MRG] documentation for random_state in forest.py #15516


Merged (12 commits), Nov 19, 2019


@MDouriez (Contributor) commented Nov 2, 2019

Reference Issues/PRs

part of this issue: #15222
(related to #15264)

What does this implement/fix? Explain your changes.

This PR improves the documentation of the sources of randomness in RandomForestClassifier, RandomForestRegressor, ExtraTreesClassifier and ExtraTreesRegressor. Only the docstrings in sklearn/ensemble/_forest.py were changed.
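The sketch below is a minimal pure-Python model of the two sources of randomness. It is not scikit-learn's implementation, and `sample_tree_inputs` is a hypothetical helper. It assumes the documented behaviour that `random_state` controls both of the following:

- the bootstrapping of the samples used to build each tree (if `bootstrap=True`);
- the sampling of the features considered at each split (if `max_features < n_features`).

```python
import random


def sample_tree_inputs(n_samples, n_features, max_features, bootstrap, seed):
    """Hypothetical helper: draw the random inputs one tree would use.

    Simplified model of the two sources of randomness that random_state
    controls in a forest; this is not scikit-learn's implementation.
    """
    rng = random.Random(seed)  # a single seed drives every random draw below
    if bootstrap:
        # Bootstrap: draw n_samples row indices with replacement.
        rows = [rng.randrange(n_samples) for _ in range(n_samples)]
    else:
        # No bootstrap: every tree sees every row.
        rows = list(range(n_samples))
    # Feature sampling at one split: choose max_features distinct candidates.
    # When max_features == n_features this selects every feature.
    candidates = sorted(rng.sample(range(n_features), max_features))
    return rows, candidates


# Same seed: identical bootstrap rows and candidate features.
print(sample_tree_inputs(10, 5, 2, bootstrap=True, seed=0)
      == sample_tree_inputs(10, 5, 2, bootstrap=True, seed=0))  # True

# bootstrap=False and max_features == n_features: neither source varies.
print(sample_tree_inputs(10, 5, 5, bootstrap=False, seed=1)
      == sample_tree_inputs(10, 5, 5, bootstrap=False, seed=2))  # True
```

Sorting the candidates hides the order in which features are visited. That order still matters when several splits tie on the criterion.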

Any other comments?

@eickenberg (Contributor):

Looks good to me pending CI

@MDouriez MDouriez changed the title [WIP] documentation for random_state in forests [MRG] documentation for random_state in forests Nov 2, 2019
@MDouriez MDouriez changed the title [MRG] documentation for random_state in forests [MRG] documentation for random_state in forest.py Nov 2, 2019
@MDouriez (Contributor, Author):

@NicolasHug Let me know what you think. Thanks!

@cmarmo (Contributor) commented Nov 17, 2019

Also @TomDLT or maybe @adrinjalali (seems to me that you volunteered to be pinged... ;) )... this PR looks pretty ready for merging?

@adrinjalali (Member):

Thanks for the ping @cmarmo

I like the description, but I'm not sure whether the part related to the randomness in trees should be included here. What do you think of adding a link to the tree docstring for that part instead, @MDouriez?

@NicolasHug (Member) left a comment:

This is a clear improvement, thanks @MDouriez. I made some comments, but LGTM once they are addressed.

Comment on lines 967 to 972
Also note that the features are always randomly permuted at each split.
Therefore, the best found split may vary, even with the same training
data, ``max_features=n_features`` and ``bootstrap=False``, if the
improvement of the criterion is identical for several splits enumerated
during the search of the best split. To obtain a deterministic
behaviour during fitting, ``random_state`` has to be fixed.
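The tie-breaking effect described in the quoted note can be illustrated with a simplified splitter. `pick_split_feature` is hypothetical and not a scikit-learn function. The sketch assumes that features are visited in a random order and that only a strictly better improvement replaces the current best. Under that assumption, which of several tied features wins depends on the permutation, and therefore on the seed.

```python
import random


def pick_split_feature(improvements, seed):
    """Hypothetical simplified splitter: return the index of the chosen feature.

    Features are visited in a random order; a candidate replaces the
    current best only if its improvement is strictly greater, so among
    tied features the one visited first wins.
    """
    rng = random.Random(seed)
    order = list(range(len(improvements)))
    rng.shuffle(order)  # features are permuted even when all are considered
    best_feature, best_improvement = None, float("-inf")
    for f in order:
        if improvements[f] > best_improvement:
            best_feature, best_improvement = f, improvements[f]
    return best_feature


# Features 0, 2 and 3 tie for the best improvement.
improvements = [0.5, 0.1, 0.5, 0.5]
# With a fixed seed the choice is reproducible...
print(pick_split_feature(improvements, seed=42) == pick_split_feature(improvements, seed=42))  # True
# ...but across seeds, different tied features can be chosen.
print(sorted({pick_split_feature(improvements, seed=s) for s in range(50)}))
```

This is why fixing `random_state` is needed for deterministic fitting even with `max_features=n_features` and `bootstrap=False`.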
@NicolasHug (Member):

I think we can omit this last part (from "Also note that..." to "has to be fixed").

The rest above is perfect

@MDouriez (Contributor, Author):

This note was already there; I just moved it. Should I still remove it?

@NicolasHug (Member):

I see, maybe just leave it where it was in the notes then

@MDouriez (Contributor, Author):
Should be ready for final review @NicolasHug @adrinjalali.

  • Left the note where it was
  • Added line for rendering
  • Added description for RandomTreesEmbedding as well

@MDouriez (Contributor, Author):

Also, interestingly, RandomTreesEmbedding has a max_samples parameter but no bootstrap parameter. In __init__, bootstrap is set to False, so it looks like max_samples is never used?
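The observation comes down to control flow. If `max_samples` is only read when `bootstrap=True`, an estimator that always sets `bootstrap=False` never uses it. The sketch below assumes that behaviour, as the comment suspects, and also assumes `max_samples` is an integer count or `None`. `draw_training_rows` is a hypothetical helper, not scikit-learn code.

```python
import random


def draw_training_rows(n_samples, bootstrap, max_samples, rng):
    """Hypothetical helper: row indices used to fit one tree.

    Assumes max_samples is an integer count (or None) and that it is
    only read inside the bootstrap branch.
    """
    if bootstrap:
        n_draws = n_samples if max_samples is None else max_samples
        # Draw n_draws row indices with replacement.
        return [rng.randrange(n_samples) for _ in range(n_draws)]
    # bootstrap=False: all rows are used and max_samples is never read.
    return list(range(n_samples))


rng = random.Random(0)
# With bootstrap=False, max_samples has no effect on the rows used.
print(draw_training_rows(6, False, None, rng) == draw_training_rows(6, False, 3, rng))  # True
# With bootstrap=True, max_samples sets the number of draws.
print(len(draw_training_rows(6, True, 3, rng)))  # 3
```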

@NicolasHug (Member) left a comment:

Thanks @MDouriez!

@NicolasHug (Member):

> Also interestingly, RandomTreesEmbedding has a max_samples parameter but no bootstrap. In the init, bootstrap is set to False. Looks like max_samples is never used?

Good catch, could you please open an issue for this?

@adrinjalali (Member) left a comment:

Thanks @MDouriez

@adrinjalali adrinjalali merged commit 663d052 into scikit-learn:master Nov 19, 2019
@MDouriez (Contributor, Author):

> Good catch, could you please open an issue for this?

Filed an issue: #15670

adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Nov 25, 2019
* documentation for random_state in forests

* move note to parameter

* same for RandomForestRegressor

* add doc for ExtraTreesRegressor and ExtraTreesClassifier

* skip line

* lint

* move note back to where it was

* add Glossary in RandomForestRegressor

* adding description for RandomTreesEmbedding

* small fix

* correct description for RandomTreesEmbedding
jnothman pushed a commit that referenced this pull request Nov 28, 2019
panpiort8 pushed a commit to panpiort8/scikit-learn that referenced this pull request Mar 3, 2020
6 participants