display='diagram' needs more prominence in documentation #18305

Closed
jnothman opened this issue Aug 31, 2020 · 24 comments · Fixed by #18758

Comments

@jnothman
Member

The sklearn.set_config(display='diagram') feature highlighted here needs more prominence. At the very least, it should be easy to find when googling "scikit-learn diagram display". It isn't. Searching for "scikit-learn display=diagram" gives much the same results, and the exact-phrase query scikit-learn "display=diagram" is on the mark, but the results are not easy to recognize as relevant.

We probably need an example, or a user guide page, showing how different estimators are diagrammed, and perhaps also instructing users to use IPython.display's display function when needed.

@jnothman added the Documentation, Easy, and help wanted labels Aug 31, 2020

@dishak331

So you want examples to go along with that guide page, showing the diagrams and so on, so that the feature becomes more prominent?

@tash149

tash149 commented Oct 6, 2020

Hey @jnothman, can you please help me get started with open source contributions to machine learning projects? I have worked on research projects, but I have never contributed here. Also, all the issues seem so overwhelming that I can't decide where to begin. It would be wonderful if you could guide me through. Thanks 😁✨

@NicolasHug
Member

I'm not sure the right way to go is to introduce diagram rendering in the UG at random places. In most cases this will distract from the original purpose of the code snippets.

I would prefer to have a dedicated example illustrating the diagrams of different estimators, as Joel proposed above. Note that we already have https://scikit-learn.org/dev/auto_examples/compose/plot_column_transformer_mixed_types.html#html-representation-of-pipeline . Maybe we can add the keyword "Diagram" to this subsection for SEO.

We can also add a short note at the end of the pipeline section of the getting started guide: https://scikit-learn.org/dev/getting_started.html

@jnothman
Member Author

jnothman commented Nov 5, 2020

Or a section (in Getting Started or otherwise) on how to configure Scikit-learn for a Jupyter Notebook.

@reshamas
Member

reshamas commented Nov 5, 2020

How about,

  1. in this file: HTML representation of Pipeline,
    we change the title from "HTML representation of Pipeline" to "HTML Representation of Pipeline (Display Diagram)"

  2. in this file: Getting Started, we can add:

Pipelines: displaying diagrams in Jupyter notebook
---------------------------------------------------

By default, a pipeline is displayed as text, which corresponds to `set_config(display='text')`. To render it as a diagram in a Jupyter notebook, use `set_config(display='diagram')` and then evaluate the pipeline object as the last expression in a cell.

  >>> from sklearn.pipeline import Pipeline
  >>> from sklearn.svm import SVC
  >>> from sklearn.decomposition import PCA
  >>> estimators = [('reduce_dim', PCA()), ('clf', SVC())]
  >>> pipe = Pipeline(estimators)
  >>> pipe
  Pipeline(steps=[('reduce_dim', PCA()), ('clf', SVC())])

  >>> from sklearn import set_config
  >>> set_config(display='diagram')
  >>> pipe
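
A minimal sketch of how the two display modes differ from the notebook's point of view, assuming a scikit-learn release that supports the `display` option. It queries `_repr_mimebundle_`, the hook a Jupyter front end uses to pick a representation; calling it by hand is only for illustration, and the variable names are hypothetical.

```python
from sklearn import config_context
from sklearn.decomposition import PCA
from sklearn.pipeline import Pipeline
from sklearn.svm import SVC

pipe = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])

# In 'text' mode the estimator only offers a plain-text representation.
with config_context(display="text"):
    text_bundle = pipe._repr_mimebundle_()

# In 'diagram' mode an HTML representation is offered as well,
# which is what a notebook renders as the diagram.
with config_context(display="diagram"):
    diagram_bundle = pipe._repr_mimebundle_()

print("text mode offers:   ", sorted(text_bundle))
print("diagram mode offers:", sorted(diagram_bundle))
```

Outside a notebook, such as in a plain Python shell, only the plain-text representation is shown regardless of the setting.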

@NicolasHug
Member

OK for 1

For 2: I would rather not make a new section in the getting started guide. I think a note would be enough. To keep it short, we can just reuse the pipeline already defined in the corresponding section and use the config_context context manager (we can also mention set_config).

@reshamas
Member

reshamas commented Nov 5, 2020

In getting_started.rst, at the end of the section "Pipelines: chaining pre-processors and estimators", we can add:

By default, a pipeline is displayed as text, which corresponds to `set_config(display='text')`. To render it as a diagram in a Jupyter notebook, use `set_config(display='diagram')` and then evaluate the pipeline object as the last expression in a cell.

@NicolasHug
Member

NicolasHug commented Nov 5, 2020

I think users would benefit from seeing the diagram in action:

To render estimators as diagrams in notebooks, use the `display='diagram'` option:

>>> from sklearn import config_context
>>> with config_context(display='diagram'):
...     pipe
<diagram renders here>

You may also call `set_config` (link https://scikit-learn.org/stable/modules/generated/sklearn.set_config.html) at the top of the notebook.

This would ideally be within a `.. note::` directive titled "Diagram rendering of estimators".
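
To make the difference between the two calls concrete, here is a small sketch, assuming only the public `set_config`, `get_config`, and `config_context` functions. `config_context` changes the setting for the duration of the `with` block, while `set_config` changes it until it is set again. The variable names are hypothetical, and the starting value of `display` depends on the installed release.

```python
from sklearn import config_context, get_config, set_config

# Remember the starting value so the session can be restored afterwards.
original = get_config()["display"]

# Scoped change: applies only inside the with block.
with config_context(display="diagram"):
    inside = get_config()["display"]
after_context = get_config()["display"]

# Global change: stays in effect for the rest of the session.
set_config(display="diagram")
after_set = get_config()["display"]

# Put the original value back.
set_config(display=original)

print(inside, after_context, after_set)
```

In a notebook, the global form at the top of the file is usually the more convenient of the two, since every later cell then renders diagrams.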

@NicolasHug
Member

I didn't realize that the code in the UG is not executed by sphinx, and thus we can't see diagrams in the UG. It seems that we should create a dedicated example then, with various pipelines / estimators of different complexity. Would you be interested in doing that @reshamas ? @hongshaoyang also showed interest

Some examples of estimators worth rendering (feel free to deviate):

  • single estimator like SVC
  • basic pipeline
  • complex pipeline with ColumnTransformer (as in the other example linked above)
  • GridSearch within a pipeline, or GridSearch over a pipeline
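
The estimators listed above could be built roughly as follows. This is only a sketch under stated assumptions, not the example that was eventually written: the column names, step names, and parameter grids are hypothetical, and `sklearn.utils.estimator_html_repr` is used to produce the HTML directly so the code also runs outside a notebook.

```python
from sklearn.compose import ColumnTransformer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC
from sklearn.utils import estimator_html_repr

# Single estimator.
single = SVC()

# Basic pipeline.
basic_pipe = Pipeline([("reduce_dim", PCA()), ("clf", SVC())])

# Complex pipeline with a ColumnTransformer (hypothetical column names).
preprocess = ColumnTransformer([
    ("num", StandardScaler(), ["age", "fare"]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), ["sex", "embarked"]),
])
complex_pipe = Pipeline([("preprocess", preprocess), ("clf", LogisticRegression())])

# Grid search over a pipeline, and grid search as a step within a pipeline.
search_over_pipe = GridSearchCV(basic_pipe, param_grid={"clf__C": [0.1, 1, 10]})
search_in_pipe = Pipeline([
    ("scale", StandardScaler()),
    ("search", GridSearchCV(SVC(), param_grid={"C": [0.1, 1, 10]})),
])

estimators = {
    "single": single,
    "basic_pipe": basic_pipe,
    "complex_pipe": complex_pipe,
    "search_over_pipe": search_over_pipe,
    "search_in_pipe": search_in_pipe,
}

# Rendering does not require the estimators to be fitted.
html_by_name = {name: estimator_html_repr(est) for name, est in estimators.items()}
for name, html in html_by_name.items():
    print(f"{name}: {len(html)} characters of HTML")
```

In a gallery example, each estimator would instead be left as the last expression of its own cell so that the diagram is rendered in place.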

@NicolasHug removed the Easy and help wanted labels Nov 13, 2020

@reshamas
Member

@NicolasHug
Would the examples:

  1. be created here: https://github.com/scikit-learn/scikit-learn/tree/master/examples
  2. go in a new sub-folder, "examples/pipelines", OR be split across existing folders:
  • put SVC example under "/svm"
  • put ColumnTransformer under "/preprocessing"
  • put basic pipeline --> ?
  • put GridSearch --> ?

@NicolasHug
Member

NicolasHug commented Nov 13, 2020

I think we can just put it in examples/miscellaneous (we only need 1 example)

@reshamas
Member

You mean we need a few examples, but in one file called /examples/miscellaneous/pipeline.py?

@NicolasHug
Member

NicolasHug commented Nov 13, 2020

yes, though the file needs to be named plot_xyz.py, so I'd suggest plot_display.py

By example I mean one entry in https://scikit-learn.org/stable/auto_examples/index.html. Each of these is one file. In the example, we can illustrate more than one estimator (all of them, actually)

@jnothman
Member Author

jnothman commented Nov 14, 2020 via email

@NicolasHug
Member

where? should this just be the example? I'm not sure we have enough material about scikit-learn in jupyter to have a whole new UG chapter

@jnothman
Member Author

jnothman commented Nov 14, 2020 via email

@NicolasHug
Member

I wouldn't recommend a tutorial because our tutorials page needs some work: #18257. We use the examples nowadays for tutorial-like entries, e.g. this or this

I'm also curious about what else our recommendations are regarding scikit-learn + jupyter?

@reshamas
Member

@NicolasHug @jnothman
Could we have a folder with example notebooks, that people could easily download and use?

@reshamas
Member

@NicolasHug @cmarmo @jnothman
The Modin project has a "jupyter" folder in their /examples/. Here it is:
https://github.com/modin-project/modin/tree/master/examples/jupyter

Is this something we could consider doing for scikit-learn? It seems a bit more flexible.

@NicolasHug
Member

@reshamas why not create a regular example as suggested in #18305 (comment)? The examples from the gallery can be downloaded as .ipynb files if users want to use them as notebooks.

@reshamas
Member

@NicolasHug I recall the problem with creating a regular example was that sphinx did not produce the diagram in the documentation. But, in a Jupyter notebook, the diagram is produced. That way, users can see the code with the diagram.

@NicolasHug
Member

The examples are able to generate the HTML display properly, if that's what you mean: https://scikit-learn.org/stable/auto_examples/compose/plot_column_transformer_mixed_types.html#sphx-glr-auto-examples-compose-plot-column-transformer-mixed-types-py

The issue with the HTML rendering and sphinx, IIRC, was only in the UG because the code isn't executed there

@reshamas
Member

@cmarmo
I haven't had time to work on this, so if someone wants to take it over, it's available.
