Skip to content

DOC improve kernel PCA example #19945

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 23 commits into from
Nov 4, 2021

Conversation

glemaitre
Copy link
Member

@glemaitre glemaitre commented Apr 21, 2021

After discussion in #19732, here comes an improvement of the kernel PCA example.

This example goes more into details regarding the inverse transform of kernel PCA. In addition, we add an example of image denoising. I also added the references.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can make "Application to image denoising" be its own example and PR.

This PR can be smaller by only updating examples/decomposition/plot_kernel_pca.py and docstrings, which will already be an improvement.

glemaitre and others added 2 commits May 31, 2021 14:37
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @glemaitre for this example.

Here are a few suggestions.

@glemaitre
Copy link
Member Author

@thomasjpfan Are the changes fine with you.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @glemaitre !

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Just a few phrasing suggestion.

glemaitre and others added 2 commits June 24, 2021 12:03
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm happy with the content. Thank you @glemaitre !

One minor question about structure.

(arXiv:909)
A randomized algorithm for the decomposition of matrices
Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert
.. [1] `Schölkopf, Bernhard, Alexander Smola, and Klaus-Robert Müller.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Historically, do we have a policy on where to place references? We already reference Schölkopf in the user guide.

I do like having a link to the reference while reading the docstring, maybe we can link the docstring to the reference in the user guide?

CC @NicolasHug

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My main position is that references without backlinks are almost entirely useless and even counterproductive/confusing when there's more than one. Without a backlink, we don't know what the reference references to.

I see all the refs here have backlinks, which is great. I don't mind having them in the docstrings as long as they're properly maintained.

However it does raise the issue of duplicated references and duplicated descriptions between the docstrings and the UG. I feel like this is something we'd want to avoid. My preference here is to only have the ref section in the UG, and have more detailed parameter descriptions there as well. Then we can link to the UG from the docstrings when relevant / needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My preference here is to only have the ref section in the UG, and have more detailed parameter descriptions there as well. Then we can link to the UG from the docstrings when relevant / needed.

I agree with the previous comment apart of this one mainly because, sometimes, I don't want to go in the user guide to get the info. I would expect to get the paper of the algorithm at minima. Here, this is the typical case that it is tricky because the paper does not provide the implementation of inverse_transform so I would also like to know about it :)

For the solver, I might agree that refering to the user guide could be enough because the default exact algorithm is in the main paper.

Copy link
Member

@ogrisel ogrisel Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's useful to have it in the code/docstring, especially when it makes it possible to give more details on the behavior of the parameters by cross-linking from the parameters section when appropriate as done here.

Why is it a problem to have it both in the code/docstring and the user guide?

Copy link
Member

@NicolasHug NicolasHug Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it a problem to have it both in the code/docstring and the user guide?

Duplicated content get out of sync, which is even more harmful to docs than to code. Inevitably, after a while, A says X and B says Y, and readers don't know which one they should trust. Also, twice as much work for us to maintain the same thing in 2 places. We'll forget.

The only way I found to not duplicate content is to write most info in the user guide, and link from the docstring to the user guide.

especially when it makes it possible to give more details on the behavior of the parameters by cross-linking from the parameters section when appropriate as done here

I'm not sure what you mean, but we can cross-link from docstrings to UG too.

Personally, I feel like the only good thing about refs in the docstring is that in theory you can get to the papers from within the code / IDE. But this alone doesn't outweigh the risks, IMHO. And everybody can look at the docs online on a browser. With Latex and links everywhere, our docs are becoming more and more "html only" anyway (and it's for the best, if you ask me).

Copy link
Member

@NicolasHug NicolasHug Jun 29, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As an example of my "strategy" of linking from the docstrings to the UG, and keeping the references in the UG:

categorical_features and monotonic_constraints docstrings of
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html

glemaitre and others added 2 commits June 25, 2021 16:07
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
@ogrisel
Copy link
Member

ogrisel commented Jun 29, 2021

For reference I merged #20248 which is a bit related to this PR.

Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

Here are some minor remarks which might help with reading the example.

glemaitre and others added 4 commits November 3, 2021 18:09
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
@jjerphan jjerphan merged commit d5ce9c4 into scikit-learn:main Nov 4, 2021
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Nov 29, 2021
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
glemaitre added a commit to glemaitre/scikit-learn that referenced this pull request Dec 24, 2021
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
glemaitre added a commit that referenced this pull request Dec 25, 2021
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants