-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
DOC Display diagram to pipeline example #18758
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I wonder if we want to show both ways, the default pipeline output and then specifying the diagram. from sklearn.pipeline import Pipeline
from sklearn.svm import SVC
from sklearn.decomposition import PCA
estimators = [('reduce_dim', PCA()), ('clf', SVC())]
pipe = Pipeline(estimators)
pipe outputPipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())]) # to see a visualization of the pipeline
from sklearn import set_config
set_config(display='diagram')
pipe output in iPython (Jupyter notebook) |
doc/modules/compose.rst
Outdated
>>> from sklearn.pipeline import Pipeline | ||
>>> from sklearn.svm import SVC | ||
>>> from sklearn.decomposition import PCA | ||
>>> set_config(display='diagram') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the PR @reshamas but I don't think this is the right place for using the diagram rendering: this example is about a pipeline creation. Anything that's not directly related to that is noise and distracts from the original intent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NicolasHug Where is a good place to put it?
Referring to:
We probably need an example, or a user guide page, showing how different estimators are diagrammed, and perhaps also instructing users to use IPython.display's display function when needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I answered in the original issue ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense. Keep the conversation with the original issue.
doc/getting_started.rst
Outdated
To render estimators as diagrams in notebooks, use the `display='diagram'` option: | ||
>>> with config_context(display='diagram'): | ||
>>> pipe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To render estimators as diagrams in notebooks, use the `display='diagram'` option: | |
>>> with config_context(display='diagram'): | |
>>> pipe | |
To render estimators as diagrams in notebooks, use the `display='diagram'` option:: | |
>>> with config_context(display='diagram'): | |
>>> pipe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(we need the ::
otherwise this won't be interpreted as code)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NicolasHug I accepted your suggestion and did "commit suggestion", but when I tried to push changes, I get:
▶ git push origin display-diagram
To github.com:reshamas/scikit-learn.git
! [rejected] display-diagram -> display-diagram (non-fast-forward)
error: failed to push some refs to 'git@github.com:reshamas/scikit-learn.git'
hint: Updates were rejected because the tip of your current branch is behind
hint: its remote counterpart. Integrate the remote changes (e.g.
hint: 'git pull ...') before pushing again.
hint: See the 'Note about fast-forwards' in 'git push --help' for details.
(sklearndev)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as suggested in the message, you need a git pull
;)
This happens because the branch on your fork (on the cloud) and the one on your local machine have diverged: they both got one commit that the other one doesn't
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried both git pull
and git pull upstream master
. I also tried various permutations, and I am now "Lost in Git"
There is no tracking information for the current branch.
Please specify which branch you want to merge with.
See git-pull(1) for details.
git pull <remote> <branch>
If you wish to set tracking information for this branch you can do so with:
git branch --set-upstream-to=<remote>/<branch> display-diagram
(sklearndev)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the issue isn't with upstream master
(which is this repo's main branch) but with this branch on your own fork: origin display-diagram
I'm a bit surprise git pull
doesn't work, maybe try git pull origin display-diagram
then?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Though it looks like you're on the sklearndev
branch on your machine? (or is that the name of the virtualenv?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ok, it went through. (the steps worked, but I needed to commit my changes first)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you have merge conflicts that weren't taken out of the file now
doc/getting_started.rst
Outdated
You may also call `set_config | ||
<https://scikit-learn.org/stable/modules/generated/sklearn.set_config.html>`_ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please use a :func:
sphinx reference instead.
Also above we'll need an import for config_context
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I run these lines in Jupyter:
import sklearn
from sklearn import set_config
with sklearn.config_context(display='diagram'):
pipe
I don't get any output (no errors either).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah, the context manager is such that the cell doesn't return the estimator. That's unfortunate, I wanted to avoid that all subsequent estimator would be rendered as diagrams but that won't be possible.
In this case, I would just remove the note and call set_config
at the beginning of the pipeline section code blcok, with a short comment like:
set_config(display='diagram') # for HTML repr of estimators
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the code was replaced with what is below? This does render the diagram in Jupyter notebook.
from sklearn import set_config
set_config(display='diagram')
pipe
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, we can keep the note
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com>
doc/getting_started.rst
Outdated
|
||
>>> from sklearn import set_config | ||
>>> set_config(display='diagram') | ||
... |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... |
This seems to create a rendering issue
doc/getting_started.rst
Outdated
|
||
>>> from sklearn import set_config | ||
>>> set_config(display='diagram') | ||
>>> pipe |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@thomasjpfan any idea why this does not render as html?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pure rst, no code is actually ran when the docs are build. We have been linking to a sphinx gallery that properly displays the html repr.
Sidenote: I had a custom sphinx derivative that did just this, but I thought maintaining another sphinx derivative would be too much of a maintenance burden.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should I move the example to another file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, we do run the docs but not when building them :/
I think we should go with a dedicated example then, and not change the getting_started guide (#18305 (comment)). @reshamas I guess we can just merge this PR if you revert the changes made to getting_started.rst
? Thanks and sorry for the extra work
doc/getting_started.rst
Outdated
@@ -117,6 +117,7 @@ the test data:: | |||
>>> accuracy_score(pipe.predict(X_test), y_test) | |||
0.97... | |||
|
|||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@NicolasHug |
#!/usr/bin/env python | ||
# -*- coding: utf-8 -*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No need for these
#!/usr/bin/env python | |
# -*- coding: utf-8 -*- |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @reshamas , yes I think this is on the right track!
`set_config(display='text')`. To visualize the diagram in Jupyter Notebook, | ||
use `set_config(display='diagram')` and then call the pipeline object. | ||
|
||
# %% |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't render properly, I think you need to end the docstring before starting the # %%
blocks
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure that is the solution.
Example in this file: https://github.com/scikit-learn/scikit-learn/blob/master/examples/compose/plot_compare_reduction.py
When I end the docstring earlier, it doesn't work, unless I need two separate docstrings?
Note that the use of ``memory`` to enable caching becomes interesting when the
fitting of a transformer is costly.
# %%
Illustration of ``Pipeline`` and ``GridSearchCV``
###############################################################################
This section illustrates the use of a ``Pipeline`` with ``GridSearchCV``
"""
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@cmarmo
Yes. @Mariam-ke and I try to meet weekly to work on a PR, so we'll continue work on this.
estimators = [('reduce_dim', PCA()), ('clf', SVC())] | ||
pipe = Pipeline(estimators) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
estimators = [('reduce_dim', PCA()), ('clf', SVC())] | |
pipe = Pipeline(estimators) | |
pipe = Pipeline([('reduce_dim', PCA()), ('clf', SVC())]) |
or call it "steps" instead of "estimators", since this is the name of the parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 I think "steps" is a better name than "estimators". I made that update.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the follow up @reshamas !
I think this is almost ready to go!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Minor comments on adding custom parameters so the visualization is more interesting when clicked. Otherwise LGTM!
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, thank you @reshamas for creating this example.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM (once the nitpick suggestions are accepted)!
Merged, thanks @reshamas! |
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Co-authored-by: Nicolas Hug <contact@nicolas-hug.com> Co-authored-by: Thomas J. Fan <thomasjpfan@gmail.com> Co-authored-by: Olivier Grisel <olivier.grisel@ensta.org>
Reference Issues/PRs
Fixes #18305
What does this implement/fix? Explain your changes.
Adds
set_config(display='diagram')
to first example in Pipelines and Composite EstimatorsAny other comments?
cc: @Mariam-ke