Skip to content

DOC Display diagram to pipeline example #18758

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 89 commits into from
Oct 13, 2021

Conversation

reshamas
Copy link
Member

@reshamas reshamas commented Nov 4, 2020

Reference Issues/PRs

Fixes #18305

What does this implement/fix? Explain your changes.

Adds set_config(display='diagram') to first example in Pipelines and Composite Estimators

Any other comments?

cc: @Mariam-ke

@reshamas
Copy link
Member Author

reshamas commented Nov 4, 2020

I wonder if we want to show both ways, the default pipeline output and then specifying the diagram.

from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
pipe

output

Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])
# to see a visualization of the pipeline
from sklearn import set_config
set_config(display='diagram')

pipe

output in iPython (Jupyter notebook)

Screen Shot 2020-11-04 at 2 40 14 PM

>>> from sklearn.pipeline import Pipeline
>>> from sklearn.svm import SVC
>>> from sklearn.decomposition import PCA
>>> set_config(display='diagram')
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR @reshamas but I don't think this is the right place for using the diagram rendering: this example is about a pipeline creation. Anything that's not directly related to that is noise and distracts from the original intent.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NicolasHug Where is a good place to put it?
Referring to:

We probably need an example, or a user guide page, showing how different estimators are diagrammed, and perhaps also instructing users to use IPython.display's display function when needed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I answered in the original issue ;)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense. Keep the conversation with the original issue.

Comment on lines 122 to 124
To render estimators as diagrams in notebooks, use the `display='diagram'` option:
>>> with config_context(display='diagram'):
>>> pipe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
To render estimators as diagrams in notebooks, use the `display='diagram'` option:
>>> with config_context(display='diagram'):
>>> pipe
To render estimators as diagrams in notebooks, use the `display='diagram'` option::
>>> with config_context(display='diagram'):
>>> pipe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(we need the :: otherwise this won't be interpreted as code)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@NicolasHug I accepted your suggestion and did "commit suggestion", but when I tried to push changes, I get:

▶ git push origin display-diagram 
To github.com:reshamas/scikit-learn.git
 ! [rejected]            display-diagram -> display-diagram (non-fast-forward)
error: failed to push some refs to 'git@github.com:reshamas/scikit-learn.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
(sklearndev) 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as suggested in the message, you need a git pull ;)

This happens because the branch on your fork (on the cloud) and the one on your local machine have diverged: they both got one commit that the other one doesn't

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I tried both git pull and git pull upstream master. I also tried various permutations, and I am now "Lost in Git"

There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.

    git pull <remote> <branch>

If you wish to set tracking information for this branch you can do so with:

    git branch --set-upstream-to=<remote>/<branch> display-diagram

(sklearndev) 

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the issue isn't with upstream master (which is this repo's main branch) but with this branch on your own fork: origin display-diagram

I'm a bit surprise git pull doesn't work, maybe try git pull origin display-diagram then?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though it looks like you're on the sklearndev branch on your machine? (or is that the name of the virtualenv?)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ok, it went through. (the steps worked, but I needed to commit my changes first)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you have merge conflicts that weren't taken out of the file now

Comment on lines 126 to 127
You may also call `set_config
<https://scikit-learn.org/stable/modules/generated/sklearn.set_config.html>`_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please use a :func: sphinx reference instead.

Also above we'll need an import for config_context.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When I run these lines in Jupyter:

import sklearn
from sklearn import set_config

with sklearn.config_context(display='diagram'):
    pipe

I don't get any output (no errors either).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah, the context manager is such that the cell doesn't return the estimator. That's unfortunate, I wanted to avoid that all subsequent estimator would be rendered as diagrams but that won't be possible.

In this case, I would just remove the note and call set_config at the beginning of the pipeline section code blcok, with a short comment like:

set_config(display='diagram')  # for HTML repr of estimators

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if the code was replaced with what is below? This does render the diagram in Jupyter notebook.

from sklearn import set_config
set_config(display='diagram')

pipe

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, we can keep the note


>>> from sklearn import set_config
>>> set_config(display='diagram')
...
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
...

This seems to create a rendering issue

@reshamas reshamas changed the title [WIP] DOC Display diagram to pipeline example (PCA/SVC) [WIP] DOC Display diagram to pipeline example Nov 5, 2020

>>> from sklearn import set_config
>>> set_config(display='diagram')
>>> pipe
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@thomasjpfan any idea why this does not render as html?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is pure rst, no code is actually ran when the docs are build. We have been linking to a sphinx gallery that properly displays the html repr.

Sidenote: I had a custom sphinx derivative that did just this, but I thought maintaining another sphinx derivative would be too much of a maintenance burden.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should I move the example to another file?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah right, we do run the docs but not when building them :/

I think we should go with a dedicated example then, and not change the getting_started guide (#18305 (comment)). @reshamas I guess we can just merge this PR if you revert the changes made to getting_started.rst? Thanks and sorry for the extra work

@@ -117,6 +117,7 @@ the test data::
>>> accuracy_score(pipe.predict(X_test), y_test)
0.97...


Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change

@reshamas
Copy link
Member Author

@NicolasHug
I am submitting one pipeline example for now, to get it working. After it runs and confirming I am on the right track, I can add the other examples.

Comment on lines 1 to 2
#!/usr/bin/env python
# -*- coding: utf-8 -*-
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for these

Suggested change
#!/usr/bin/env python
# -*- coding: utf-8 -*-

Copy link
Member

@NicolasHug NicolasHug left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @reshamas , yes I think this is on the right track!

`set_config(display='text')`. To visualize the diagram in Jupyter Notebook,
use `set_config(display='diagram')` and then call the pipeline object.

# %%
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't render properly, I think you need to end the docstring before starting the # %% blocks

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure that is the solution.
Example in this file: https://github.com/scikit-learn/scikit-learn/blob/master/examples/compose/plot_compare_reduction.py

When I end the docstring earlier, it doesn't work, unless I need two separate docstrings?

Note that the use of ``memory`` to enable caching becomes interesting when the
fitting of a transformer is costly.

# %%
Illustration of ``Pipeline`` and ``GridSearchCV``
###############################################################################

This section illustrates the use of a ``Pipeline`` with ``GridSearchCV``
"""

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @reshamas indeed you have found a bug in the documentation. You can see here the rendering of the page you referred to. The docstring needs to be ended there too. Would you be interested in proposing a pull request to fix the rendering of that page?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cmarmo
Yes. @Mariam-ke and I try to meet weekly to work on a PR, so we'll continue work on this.

Comment on lines 27 to 28
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())])

or call it "steps" instead of "estimators", since this is the name of the parameter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 I think "steps" is a better name than "estimators". I made that update.

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the follow up @reshamas !

I think this is almost ready to go!

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments on adding custom parameters so the visualization is more interesting when clicked. Otherwise LGTM!

reshamas and others added 2 commits October 4, 2021 18:01
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Copy link
Member

@jjerphan jjerphan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you @reshamas for creating this example.

@reshamas reshamas requested a review from glemaitre October 6, 2021 16:15
Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM (once the nitpick suggestions are accepted)!

@ogrisel
Copy link
Member

ogrisel commented Oct 13, 2021

Merged, thanks @reshamas!

@glemaitre glemaitre mentioned this pull request Oct 23, 2021
10 tasks
glemaitre pushed a commit to glemaitre/scikit-learn that referenced this pull request Oct 23, 2021
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
glemaitre pushed a commit that referenced this pull request Oct 25, 2021
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
@reshamas reshamas deleted the display-diagram branch October 26, 2021 18:12
samronsin pushed a commit to samronsin/scikit-learn that referenced this pull request Nov 30, 2021
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

display='diagram' needs more prominence in documentation
6 participants