-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC improve kernel PCA example #19945
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
DOC improve kernel PCA example #19945
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can make "Application to image denoising" be its own example and PR.
This PR can be smaller by only updating examples/decomposition/plot_kernel_pca.py
and docstrings, which will already be an improvement.
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @glemaitre for this example.
Here are a few suggestions.
@thomasjpfan Are the changes fine with you. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thanks @glemaitre !
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Just a few phrasing suggestion.
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm happy with the content. Thank you @glemaitre !
One minor question about structure.
(arXiv:909) | ||
A randomized algorithm for the decomposition of matrices | ||
Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert | ||
.. [1] `Schölkopf, Bernhard, Alexander Smola, and Klaus-Robert Müller. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Historically, do we have a policy on where to place references? We already reference Schölkopf in the user guide.
I do like having a link to the reference while reading the docstring, maybe we can link the docstring to the reference in the user guide?
CC @NicolasHug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My main position is that references without backlinks are almost entirely useless and even counterproductive/confusing when there's more than one. Without a backlink, we don't know what the reference references to.
I see all the refs here have backlinks, which is great. I don't mind having them in the docstrings as long as they're properly maintained.
However it does raise the issue of duplicated references and duplicated descriptions between the docstrings and the UG. I feel like this is something we'd want to avoid. My preference here is to only have the ref section in the UG, and have more detailed parameter descriptions there as well. Then we can link to the UG from the docstrings when relevant / needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My preference here is to only have the ref section in the UG, and have more detailed parameter descriptions there as well. Then we can link to the UG from the docstrings when relevant / needed.
I agree with the previous comment apart of this one mainly because, sometimes, I don't want to go in the user guide to get the info. I would expect to get the paper of the algorithm at minima. Here, this is the typical case that it is tricky because the paper does not provide the implementation of inverse_transform
so I would also like to know about it :)
For the solver, I might agree that refering to the user guide could be enough because the default exact algorithm is in the main paper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it's useful to have it in the code/docstring, especially when it makes it possible to give more details on the behavior of the parameters by cross-linking from the parameters section when appropriate as done here.
Why is it a problem to have it both in the code/docstring and the user guide?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it a problem to have it both in the code/docstring and the user guide?
Duplicated content get out of sync, which is even more harmful to docs than to code. Inevitably, after a while, A says X and B says Y, and readers don't know which one they should trust. Also, twice as much work for us to maintain the same thing in 2 places. We'll forget.
The only way I found to not duplicate content is to write most info in the user guide, and link from the docstring to the user guide.
especially when it makes it possible to give more details on the behavior of the parameters by cross-linking from the parameters section when appropriate as done here
I'm not sure what you mean, but we can cross-link from docstrings to UG too.
Personally, I feel like the only good thing about refs in the docstring is that in theory you can get to the papers from within the code / IDE. But this alone doesn't outweigh the risks, IMHO. And everybody can look at the docs online on a browser. With Latex and links everywhere, our docs are becoming more and more "html only" anyway (and it's for the best, if you ask me).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As an example of my "strategy" of linking from the docstrings to the UG, and keeping the references in the UG:
categorical_features
and monotonic_constraints
docstrings of
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
For reference I merged #20248 which is a bit related to this PR. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM.
Here are some minor remarks which might help with reading the example.
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org> Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com> Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>
After discussion in #19732, here comes an improvement of the kernel PCA example.
This example goes more into details regarding the inverse transform of kernel PCA. In addition, we add an example of image denoising. I also added the references.