DOC improve kernel PCA example #19945

glemaitre · 2021-04-21T08:36:03Z

Following the discussion in #19732, here is an improvement of the kernel PCA example.

This example goes into more detail regarding the inverse transform of kernel PCA. In addition, we add an example of image denoising. I also added the references.
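A minimal sketch of the inverse-transform idea the example discusses, assuming scikit-learn's `KernelPCA` with `fit_inverse_transform=True`. Kernel PCA has no closed-form inverse, so scikit-learn additionally fits a kernel ridge regression (regularized by `alpha`) that maps projected points back to the input space. The dataset and hyperparameters (`gamma=10`, `alpha=0.1`) are illustrative assumptions, not necessarily the values used in the PR.

```python
import numpy as np
from sklearn.datasets import make_circles
from sklearn.decomposition import PCA, KernelPCA
from sklearn.model_selection import train_test_split

# Two concentric circles: not linearly separable in the input space.
X, y = make_circles(n_samples=1_000, factor=0.3, noise=0.05, random_state=0)
X_train, X_test = train_test_split(X, stratify=y, random_state=0)

# Linear PCA keeping both components is an exact, invertible rotation.
pca = PCA(n_components=2).fit(X_train)

# Kernel PCA: with fit_inverse_transform=True, a kernel ridge regression
# (ridge penalty `alpha`) is learned to find pre-images of projected points.
kernel_pca = KernelPCA(
    n_components=None,
    kernel="rbf",
    gamma=10,
    fit_inverse_transform=True,
    alpha=0.1,
).fit(X_train)

# Project held-out points, then map them back to the input space.
X_rec_pca = pca.inverse_transform(pca.transform(X_test))
X_rec_kpca = kernel_pca.inverse_transform(kernel_pca.transform(X_test))

# Mean squared reconstruction error on held-out points.
mse_pca = np.mean((X_test - X_rec_pca) ** 2)
mse_kpca = np.mean((X_test - X_rec_kpca) ** 2)
print(f"PCA reconstruction MSE:        {mse_pca:.2e}")
print(f"Kernel PCA reconstruction MSE: {mse_kpca:.2e}")
```

Full-rank linear PCA should reconstruct the test points up to floating-point error, while the kernel PCA reconstruction is only an approximation, since it relies on a learned pre-image map.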

thomasjpfan

I think we can make "Application to image denoising" its own example and PR.

This PR can be smaller by only updating examples/decomposition/plot_kernel_pca.py and docstrings, which will already be an improvement.

examples/decomposition/plot_kernel_pca.py

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

jjerphan

Thanks @glemaitre for this example.

Here are a few suggestions.

examples/decomposition/plot_kernel_pca.py

sklearn/decomposition/_kernel_pca.py

glemaitre · 2021-06-23T16:10:10Z

@thomasjpfan Are the changes fine with you?

jjerphan

LGTM, thanks @glemaitre !

ogrisel

LGTM. Just a few phrasing suggestions.

examples/decomposition/plot_kernel_pca.py

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

thomasjpfan

I'm happy with the content. Thank you @glemaitre !

One minor question about structure.

thomasjpfan · 2021-06-25T13:36:39Z

sklearn/decomposition/_kernel_pca.py

-        (arXiv:909)
-        A randomized algorithm for the decomposition of matrices
-        Per-Gunnar Martinsson, Vladimir Rokhlin and Mark Tygert
+    .. [1] `Schölkopf, Bernhard, Alexander Smola, and Klaus-Robert Müller.


Historically, do we have a policy on where to place references? We already reference Schölkopf in the user guide.

I do like having a link to the reference while reading the docstring, maybe we can link the docstring to the reference in the user guide?

CC @NicolasHug

NicolasHug · 2021-06-25

My main position is that references without backlinks are almost entirely useless, and even counterproductive or confusing when there's more than one. Without a backlink, we don't know what the reference refers to.

I see all the refs here have backlinks, which is great. I don't mind having them in the docstrings as long as they're properly maintained.

However, it does raise the issue of duplicated references and duplicated descriptions between the docstrings and the UG. I feel like this is something we'd want to avoid. My preference here is to only have the ref section in the UG, and have more detailed parameter descriptions there as well. Then we can link to the UG from the docstrings when relevant / needed.

glemaitre · 2021-06-25

> My preference here is to only have the ref section in the UG, and have more detailed parameter descriptions there as well. Then we can link to the UG from the docstrings when relevant / needed.

I agree with the previous comment except for this part, mainly because sometimes I don't want to go to the user guide to get the info. At a minimum, I would expect to find the paper describing the algorithm. This is a typical case where it is tricky, because the paper does not provide the implementation of inverse_transform, so I would also like to know about it :)

For the solver, I might agree that referring to the user guide could be enough, because the default exact algorithm is described in the main paper.

ogrisel · 2021-06-29

I think it's useful to have it in the code/docstring, especially when it makes it possible to give more details on the behavior of the parameters by cross-linking from the parameters section when appropriate as done here.

Why is it a problem to have it both in the code/docstring and the user guide?

NicolasHug · 2021-06-29

> Why is it a problem to have it both in the code/docstring and the user guide?

Duplicated content gets out of sync, which is even more harmful to docs than to code. Inevitably, after a while, A says X and B says Y, and readers don't know which one they should trust. It's also twice as much work for us to maintain the same thing in two places, and we'll forget.

The only way I found to avoid duplicating content is to write most of the info in the user guide and link to it from the docstring.

> especially when it makes it possible to give more details on the behavior of the parameters by cross-linking from the parameters section when appropriate as done here

I'm not sure what you mean, but we can cross-link from docstrings to the UG too.

Personally, I feel like the only good thing about refs in the docstring is that, in theory, you can get to the papers from within the code / IDE. But this alone doesn't outweigh the risks, IMHO. Everybody can read the docs online in a browser, and with LaTeX and links everywhere, our docs are becoming more and more "HTML only" anyway (and it's for the best, if you ask me).

NicolasHug · 2021-06-29

As an example of my "strategy" of linking from the docstrings to the UG while keeping the references in the UG, see the categorical_features and monotonic_cst docstrings of
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.HistGradientBoostingClassifier.html
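To illustrate the docstring-to-user-guide strategy described above, here is a sketch of a hypothetical estimator (`KernelPCASketch` is not a scikit-learn class) whose parameter descriptions stay short and delegate details and citations to the user guide through the Sphinx `:ref:` role. The `kernel_PCA` target is assumed to be the user guide label for kernel PCA, and `kernel_pca_inverse_transform` is a hypothetical label used only for illustration.

```python
class KernelPCASketch:
    """Kernel Principal Component Analysis (hypothetical sketch only).

    Read more in the :ref:`User Guide <kernel_PCA>`.

    Parameters
    ----------
    fit_inverse_transform : bool, default=False
        Learn the inverse transform for non-precomputed kernels. The
        method and its references are described in the
        :ref:`User Guide <kernel_pca_inverse_transform>`.

    alpha : float, default=1.0
        Hyperparameter of the ridge regression that learns the inverse
        transform (when ``fit_inverse_transform=True``).
    """

    def __init__(self, fit_inverse_transform=False, alpha=1.0):
        # Store parameters unchanged, following scikit-learn conventions.
        self.fit_inverse_transform = fit_inverse_transform
        self.alpha = alpha
```

With this layout the docstring has no References section of its own, so each citation lives in a single place. The trade-off raised in the thread is that readers in an IDE have to follow the link to reach the paper.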

examples/decomposition/plot_kernel_pca.py

doc/modules/decomposition.rst

Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>

examples/decomposition/plot_kernel_pca.py

Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>

ogrisel · 2021-06-29T16:19:44Z

For reference I merged #20248 which is a bit related to this PR.

jjerphan

LGTM.

Here are some minor remarks which might help with reading the example.

doc/modules/decomposition.rst

examples/decomposition/plot_kernel_pca.py

sklearn/decomposition/_kernel_pca.py

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>
Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

glemaitre added 3 commits April 21, 2021 09:12

DOC improve example

44cf30a

Merge remote-tracking branch 'origin/main' into improve_example_kernel_pca

411fc09

iter

736ea51

github-actions bot added module:decomposition Documentation labels Apr 21, 2021

glemaitre added 2 commits April 21, 2021 10:41

iter

e61baf6

add info references

a4a9ab2

kstoneriv3 mentioned this pull request Apr 24, 2021

EHN Add transform_inverse to Nystroem #19971

Draft

Merge remote-tracking branch 'origin/main' into improve_example_kernel_pca

84c0900

thomasjpfan reviewed May 29, 2021

glemaitre and others added 2 commits May 31, 2021 14:37

Apply suggestions from code review

e1f0900

Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>

DOC simplify example

b9d6696

glemaitre mentioned this pull request Jun 10, 2021

DOC add image denoising kPCA example #20248

Merged

jjerphan requested changes Jun 23, 2021

glemaitre added 3 commits June 23, 2021 10:53

apply suggestion of julien

a28ae8f

Merge remote-tracking branch 'origin/main' into improve_example_kernel_pca

c8563fa

empty commit

acfa811

jjerphan approved these changes Jun 23, 2021

ogrisel approved these changes Jun 24, 2021

glemaitre and others added 2 commits June 24, 2021 12:03

Apply suggestions from code review

b9f7387

Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>

FIX hyperlink in inverse_transform

93a965a

thomasjpfan reviewed Jun 25, 2021

chkoar reviewed Jun 25, 2021

examples/decomposition/plot_kernel_pca.py

chkoar reviewed Jun 25, 2021

doc/modules/decomposition.rst

glemaitre and others added 2 commits June 25, 2021 16:07

Update examples/decomposition/plot_kernel_pca.py

0540fe5

Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>

Update doc/modules/decomposition.rst

42feb3f

Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>

chkoar reviewed Jun 25, 2021

examples/decomposition/plot_kernel_pca.py

chkoar reviewed Jun 25, 2021

examples/decomposition/plot_kernel_pca.py

Update examples/decomposition/plot_kernel_pca.py

03c9887

Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>

Update examples/decomposition/plot_kernel_pca.py

f470dd9

Co-authored-by: Christos Aridas <chkoar@users.noreply.github.com>

glemaitre added 2 commits October 10, 2021 14:16

Merge branch 'main' into improve_example_kernel_pca

2eb4c5e

Merge remote-tracking branch 'origin/main' into improve_example_kernel_pca

1fc976d

jjerphan approved these changes Nov 3, 2021

glemaitre and others added 4 commits November 3, 2021 18:09

Apply suggestion Julien

8237ea1

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

Update examples/decomposition/plot_kernel_pca.py

005e762

Co-authored-by: Julien Jerphanion <git@jjerphan.xyz>

black

4f301db

Update plot_kernel_pca.py

1a52093

jjerphan merged commit d5ce9c4 into scikit-learn:main Nov 4, 2021

glemaitre commented Apr 21, 2021

thomasjpfan left a comment

jjerphan left a comment

glemaitre commented Jun 23, 2021

jjerphan left a comment

ogrisel left a comment

thomasjpfan left a comment

thomasjpfan Jun 25, 2021

NicolasHug Jun 25, 2021

glemaitre Jun 25, 2021

ogrisel Jun 29, 2021

NicolasHug Jun 29, 2021

NicolasHug Jun 29, 2021

ogrisel commented Jun 29, 2021

jjerphan left a comment
