Skip to content

API Gaussian Process: change default behavior of sample_y #25789

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ominusliticus
Copy link

The random_state argument was set to 0 forceing sample_y to always return the same gaussian process samples.
This is contrary to the expected behavior most people have from working with random number generating libraries such as numpy and scipy. Changing the default parameter to None will make the behavior of sample_y conform to the usual expectation a user.

Reference Issues/PRs

Fixes #25767

What does this implement/fix? Explain your changes.

Changes default parameter random_state from GaussianProcessRegressor from 0 to None to make it behave like a random
number generator (RNG) by default, conforming with all other RNG APIs.

Any other comments?

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR! Can you updated sklearn/gaussian_process/tests/test_gpr.py suc that

  1. All calls to sample_y has random_state=0. This keeps the test deterministic when running on different platforms.
  2. Add a new test to make check that the new warning is raised.

@thomasjpfan thomasjpfan changed the title Gaussian Process: change default behavior of sample_y API Gaussian Process: change default behavior of sample_y Mar 13, 2023
@ominusliticus ominusliticus force-pushed the OL/gaussian_process_edit branch from a764914 to 4472e61 Compare March 14, 2023 04:49
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update!

May you add a change log entry in doc/whats_new/v1.3.rst with tag |API| that describes the change?

Also sklearn/gaussian_process/tests/test_gpr.py still needs to be updated by setting sample_y(..., random_state=0) so the tests are deterministic between platforms.

@ominusliticus
Copy link
Author

Hey, just so I completely understand. You want me to go through sklearn/gaussian_process/test/test_gpr.py and modify all
the calls to sample_y calls such that they explicitly set the the random_state keyword to 0?

@ominusliticus ominusliticus force-pushed the OL/gaussian_process_edit branch from 4472e61 to bc327f5 Compare March 15, 2023 04:01
@ominusliticus
Copy link
Author

I see that the doc building failed.
I will take a look at the wiki these evening to see what I can do to test the doc building locally and work on debugging the problem then.

@ominusliticus ominusliticus force-pushed the OL/gaussian_process_edit branch from bc327f5 to a17a478 Compare March 16, 2023 04:59
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the calls to sample_y calls such that they explicitly set the the random_state keyword to 0?

Yup. It looks like you did it already.

I have some minor comments, otherwise this PR looks ready to go!

The `random_state` argument was set to `0` forceing `sample_y` to always
return the same gaussian process samples.
This is contrary to the expected behavior most people have from working
with random number generating libraries such as numpy and scipy.
Changing the default parameter to `None` will make the behavior of
`sample_y` conform to the usual expectation a user.

Updated `test_gpr.py` so that all calls to `sample_y` are passed
`random_state=0` to enusre deterministic testing.
@ominusliticus ominusliticus force-pushed the OL/gaussian_process_edit branch from a17a478 to 6fe269b Compare March 16, 2023 14:48
@ominusliticus
Copy link
Author

Thanks for all the help in getting me through this PR!

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@thomasjpfan thomasjpfan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Mar 16, 2023
Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, please submit a commit with [doc build] in the commit message so that we make sure there are no warnings as a result of this in the documentation.

Comment on lines +496 to +497
"The default value of random_state will change from"
+ " 0 to None in 1.5",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should tell users how to silence the warning. Something like the following text:

Suggested change
"The default value of random_state will change from"
+ " 0 to None in 1.5",
"The default value of random_state will change from"
" 0 to None in version 1.5. Explicitly set a value for `random_state` "
"to silence this warning",

@ominusliticus
Copy link
Author

Sorry for delay on this.
Since my pull request didn't make it into the last release. Should I add it to the whats_new/v1.4.rst document?

@adrinjalali
Copy link
Member

@ominusliticus sorry, this slipped through and didn't see your message. Our changelog is quite different now, it needs to follow the new template (separate file for each PR). Let me know if you still have time to work on this.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
module:gaussian_process Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Change default behavior of GaussianProcessRegressor.sample_y's random_state variable.
3 participants