Skip to content

CLN Improve doc/error consistency for GaussianProcessRegressor #19687

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 4 commits into from
Apr 12, 2021

Conversation

chrisyeh96
Copy link
Contributor

Improves the documentation for GaussianProcessRegressor.

  • correctly links to Glossary
  • fixes typo (BGFS should be BFGS)
  • consistently uses n_targets instead of a combination of n_targets and n_output_dims

Regarding the last note though, it seems like scikit-learn hasn't standardized on the proper word to describe a multi-target / multi-output regression. Some models use "target," while others use "output." I chose to use n_targets here because it is used by most of the linear models, and no other page used n_output_dims. (Several others use n_outputs.)

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the PR @chrisyeh96 !

Minor comment, otherwise LGTM

@thomasjpfan thomasjpfan changed the title Improve documentation consistency for GaussianProcessRegressor DOC Improve documentation consistency for GaussianProcessRegressor Mar 20, 2021
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
@chrisyeh96
Copy link
Contributor Author

chrisyeh96 commented Mar 22, 2021

I made the suggested changes, but I noticed an additional problem. The function sample_y() itself has a n_samples parameter, and the argument X has shape (n_samples, n_features), but the two n_samples are not the same. This can lead to a lot of confusion.

def sample_y(self, X, n_samples=1, random_state=0):
    """Draw samples from Gaussian process and evaluate at X.
    Parameters
    ----------
    X : array-like of shape (n_samples, n_features) or list of object
        Query points where the GP is evaluated.
    n_samples : int, default=1
        The number of samples drawn from the Gaussian process
     ...

The existing documentation uses n_samples_X in the return value to distinguish between the two n_samples, but that still leaves the two n_samples in the parameters descriptions.

Thoughts on how to best resolve this duplicity? @thomasjpfan

@thomasjpfan
Copy link
Member

What do you think of updating the docstring of X to the following?

         X : array-like of shape (n_samples_X, n_features) or list of object

@chrisyeh96
Copy link
Contributor Author

chrisyeh96 commented Mar 23, 2021

@thomasjpfan: agreed

I also made some other small fixes / clarifications as well.

@chrisyeh96
Copy link
Contributor Author

@thomasjpfan I undid the np.dot -> @ code changes

@chrisyeh96
Copy link
Contributor Author

Bump. Will these changes be merged soon? @thomasjpfan

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PRs with code changes need 2 reviews. I think if we revert the code changes, I would be okay with merging with only the documentation changes.

Comment on lines +214 to +215
raise ValueError("alpha must be a scalar or an array "
"with same number of entries as y. (%d != %d)"
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a code change. Given the title of the PR, I would revert.

@@ -315,8 +315,7 @@ def predict(self, X, return_std=False, return_cov=False):
"""
if return_std and return_cov:
raise RuntimeError(
"Not returning standard deviation of predictions when "
"returning full covariance.")
"At most one of return_std or return_cov can be requested.")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a code change. Given the title of the PR, I would revert.

@chrisyeh96
Copy link
Contributor Author

I'd rather not have to waste time creating a 2nd pull request for the code changes. Is there any way to get a 2nd reviewer on this? Thanks!

@thomasjpfan thomasjpfan changed the title DOC Improve documentation consistency for GaussianProcessRegressor CLN Improve documentation consistency for GaussianProcessRegressor Apr 12, 2021
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll merge because this PR changes to the error messages are small and are an improvement.

@thomasjpfan thomasjpfan changed the title CLN Improve documentation consistency for GaussianProcessRegressor CLN Improve doc/error consistency for GaussianProcessRegressor Apr 12, 2021
@thomasjpfan thomasjpfan merged commit 7b343dd into scikit-learn:main Apr 12, 2021
@chrisyeh96 chrisyeh96 deleted the patch-1 branch April 12, 2021 17:06
thomasjpfan added a commit to thomasjpfan/scikit-learn that referenced this pull request Apr 19, 2021
@glemaitre glemaitre mentioned this pull request Apr 22, 2021
12 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants