
Better documentation for random_state #15222

Closed
NicolasHug opened this issue Oct 12, 2019 · 6 comments
Labels: Documentation, help wanted, Moderate (anything that requires some knowledge of conventions and best practices)

Comments

NicolasHug (Member) commented Oct 12, 2019

Sort of like #14228, but for random_state.

For any public object that accepts a random_state parameter, we should document what parts of the algorithm are randomized. It's not always obvious what is and what isn't randomized. We should also always link to the glossary, where the different possible values of random_state are clearly explained.
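To make the three kinds of values concrete, here is a minimal sketch of how an estimator typically turns `random_state` into a random number generator. It assumes NumPy is installed and uses a small stand-in modeled on scikit-learn's `sklearn.utils.check_random_state` instead of importing scikit-learn. `draw_subsamples` is a hypothetical helper that loosely imitates sample and feature subsampling. It is not how the forest estimators are implemented; they draw candidate features at each split, not once per call.

```python
import numbers

import numpy as np


def check_random_state(seed):
    """Minimal stand-in modeled on sklearn.utils.check_random_state."""
    if seed is None:
        # NumPy's global RandomState singleton: results are not reproducible.
        return np.random.mtrand._rand
    if isinstance(seed, numbers.Integral):
        # An int seeds a brand-new generator, so every call starts from the same state.
        return np.random.RandomState(seed)
    if isinstance(seed, np.random.RandomState):
        # An instance is used as-is, so successive calls share (and advance) one stream.
        return seed
    raise ValueError(f"{seed!r} cannot be used to seed a numpy.random.RandomState instance")


def draw_subsamples(n_samples, n_features, max_features, random_state=None):
    """Hypothetical helper: bootstrap sample indices plus a random feature subset."""
    rng = check_random_state(random_state)
    # Bootstrap: n_samples indices drawn with replacement.
    sample_idx = rng.randint(0, n_samples, size=n_samples)
    # Feature subsampling: max_features distinct feature indices.
    feature_idx = rng.choice(n_features, size=max_features, replace=False)
    return sample_idx, feature_idx


# Same int: identical draws on every call.
a = draw_subsamples(10, 5, 2, random_state=0)
b = draw_subsamples(10, 5, 2, random_state=0)

# One shared RandomState instance: the second call continues the stream.
rng = np.random.RandomState(0)
c = draw_subsamples(10, 5, 2, random_state=rng)
d = draw_subsamples(10, 5, 2, random_state=rng)
```

Passing the same int twice reproduces the draws. Passing one shared `RandomState` instance makes successive calls continue the same stream, so they differ from each other. `None` falls back to NumPy's global generator, so results are not reproducible across runs.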

For example, for the random forest estimators, it would be helpful to indicate that random_state determines, in particular, the subsampling of the samples and the subsampling of the features. Something like:


random_state : int, RandomState instance or None, default=None
    Controls the randomness of the estimator, in particular the subsampling
    of the samples and the subsampling of the features. See
    :term:`random_state` for details.
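
A rough way to audit docstrings against this convention can be sketched using only the standard library. The helper names `random_state_entry` and `check_random_state_doc` are hypothetical: this is not the script from #14228 or any scikit-learn tool. Treating the word "Controls" as a sign that the entry says what is randomized is a crude heuristic borrowed from the wording proposed above.

```python
def random_state_entry(docstring):
    """Return the random_state entry of a numpydoc docstring as one string, or None."""
    lines = docstring.splitlines()
    for i, line in enumerate(lines):
        if line.strip().startswith("random_state :"):
            indent = len(line) - len(line.lstrip())
            parts = [line.strip()]
            for nxt in lines[i + 1:]:
                # A non-blank line indented no deeper than the header starts
                # the next parameter or section, so the entry ends here.
                if nxt.strip() and len(nxt) - len(nxt.lstrip()) <= indent:
                    break
                if nxt.strip():
                    parts.append(nxt.strip())
            return " ".join(parts)
    return None


def check_random_state_doc(docstring):
    """List convention problems in the random_state entry (empty list means OK)."""
    entry = random_state_entry(docstring)
    if entry is None:
        return ["no random_state entry"]
    problems = []
    # Heuristic 1: the entry should link to the glossary.
    if ":term:`random_state`" not in entry and ":term:`Glossary <random_state>`" not in entry:
        problems.append("missing glossary link")
    # Heuristic 2 (crude): the entry should say what it controls.
    if "Controls" not in entry:
        problems.append("does not say what is randomized")
    return problems


GOOD = """
    Parameters
    ----------
    random_state : int, RandomState instance or None, default=None
        Controls the randomness of the estimator, in particular the subsampling
        of the samples and the subsampling of the features. See
        :term:`random_state` for details.

    Returns
    -------
    self : object
"""

BAD = """
    Parameters
    ----------
    random_state : int or None
        Seed.
"""

print(check_random_state_doc(GOOD))  # []
print(check_random_state_doc(BAD))   # ['missing glossary link', 'does not say what is randomized']
```

A string search like this can only flag candidates for review; whether an entry accurately describes what is randomized still needs a human who knows the algorithm.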

NicolasHug added the Documentation, Moderate and help wanted labels on Oct 12, 2019
mschaffenroth (Contributor) commented Oct 12, 2019

The script from #14228 (see here), adapted for the random_state parameter, gave the following results:

@matsmaiwald

Just had a brief look at this.

It seems that appropriate documentation is already in place for some of them, e.g. these two:
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/base.py
https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/linear_model/ridge.py

albertcthomas (Contributor) commented Oct 14, 2019

+1 for this, thanks for the report @mschaffenroth!

This has already been done for sklearn/svm/classes.py (lines 90, 544 and 750); see issue #9497 and PR #9703.

For the svm module, what remains to be done is therefore:

sara-es added a commit to sara-es/scikit-learn that referenced this issue Oct 19, 2019
svm/_base.py 852 and svm/_classes.py 310 as detailed in scikit-learn#15222.
sara-es commented Oct 19, 2019

Hi, I am new to contributing and would like to help out with this. I have updated the two instances in the svm module mentioned by @albertcthomas above and can continue working through the list.

jnothman (Member)

I think it would be helpful, @NicolasHug, to give some examples of what this should look like. Thanks for continuing this work.

glemaitre (Member)

Closing in favour of #10548.

Projects
None yet
Development

No branches or pull requests

7 participants