Skip to content

[Doc]: Support of notebook format for docs creation #25016

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kolibril13 opened this issue Jan 18, 2023 · 22 comments
Closed

[Doc]: Support of notebook format for docs creation #25016

kolibril13 opened this issue Jan 18, 2023 · 22 comments

Comments

@kolibril13
Copy link
Contributor

kolibril13 commented Jan 18, 2023

Problem

As soon as one example has more than one sub-example, e.g. here
https://matplotlib.org/stable/gallery/text_labels_and_annotations/font_family_rc.html#sphx-glr-gallery-text-labels-and-annotations-font-family-rc-py

it is inconvenient to iterate on that example with python files (*.py), because it's not straight forward to run a python file cell in sections.

Suggested improvement

A good format to support section wise code execution would be .ipynb.
What do you think about supporting .ipynb for creating docs examples?
I think that would make iterating over examples easier, and it would encourage contributors to create examples of shorter code blocks with multiple sub-examples. This could also be very useful in the tutorial section.

In my opinion, a bunch of small examples each showing only one feature is much better than one big example.
Matplotlib has currently many examples that have long and convoluted code (e.g. 1,2,3,4), and I think that might be the case because it's difficult to split examples into smaller chunks with the current infrastructure.

@timhoffm
Copy link
Member

I'm not following. Do you see inconvenience for the autor or consumer of the example?

You can use multiple sections (https://sphinx-gallery.github.io/stable/syntax.html).

For the author in particular:

The #%% and # %% syntax is consistent with the ‘code block’ (or ‘code cell’) separator syntax in Visual Studio Code Python extension, Visual Studio Python Tools, Jupytext, Pycharm Professional, Hydrogen plugin (for Atom) and Spyder.

For a consumer: We have "Download jupyter notebook" buttons at the bottom of the example.

@jklymak
Copy link
Member

jklymak commented Jan 18, 2023

The biggest issue with our docs right now is that the tutorials/examples are built a different way than the narrative docs, and that strongly affects the structure of the docs, in bad ways I would argue. #24987 is an attempt to mitigate that somewhat by including tutorials linked in the narrative sections.

Looking at nbsphinx, it seems to me that it can intersperse rst with Notebooks. If it's true, that means a PR like #24979 that was converting gallery *.py examples to *.rst could instead convert to Notebooks. To me, the biggest problem with #24979 was that the *.rst plot directive has no global state, so it's not analogous to a sphinx-gallery example.

So, I think adding nbsphinx would be a relatively intriguing option, and may indeed allow people to put more documentation in the narrative sections of the docs, particularly if the notebooks can be interspersed with rst and cross linking etc all work seamlessly.

@jklymak
Copy link
Member

jklymak commented Jan 18, 2023

OTOH, a quick test with nbsphinx didn't render in-place for me, so I guess it will take some configuration to get working with our doc build.

@tacaswell
Copy link
Member

I am very much against adding .ipynb files to the docs as they pose significant version control problems (you have to remember to strip the output and diff tools do not understand json).

If we do go that route we would have to use the jupytext format, but as @timhoffm points out sphinx-gallery already support # %% as a separator! I would be in favor of replacing our long # lines with # %% if it makes the files play nicer with a bunch of IDEs.


If we are going to go through another disruptive change to how we encode the docs it will probably be a move to myst (but I think we should let that settle a bit longer).

@jklymak
Copy link
Member

jklymak commented Jan 18, 2023

Good point about diffs, and needing to use jupytext or similar.

I wouldn't personally be in favour if this had to be a wholesale change. But mixing in a different format with the existing rst files (not gallery!) isn't out of the question.

@tacaswell
Copy link
Member

#25021 does the bulk switch to # %%.

@kolibril13
Copy link
Contributor Author

Do you see inconvenience for the autor or consumer of the example?

For the consumer, there is no inconvenience, and the "Download Jupyter notebook" button is amazing!

I see the inconvenience for the author and the reviewer.

For the author, because it's difficult to iterate over an example that is in the end of the example script, as the whole script has to be run for each iteration. I think that the # %% syntax is an excellent solution to introduce block wise execution, thanks for this pr @tacaswell !

For the reviewer, it's not easy to see how the changes are affecting the example image outputs:
Right now, the reviewer has to check firstly the example in the current docs, and secondly the example in the pr artefact docs.
That is inconvenient and quite time-consuming.
Furthermore, the pr artefact docs will only be available after approx. 30 minutes, which slows the pr review process down a LOT.
In my opinion, writing documentation is a highly iterative process, and I think that synchronous communication would make the process much more pleasant for the author and reviewer.

This inconvenience could be solved by another advantage of .ipynb: it allows to not only save the code, but also the image output.
When all examples are saved in notebooks together with their image output, a reviewer would immediately see what changes in the code.

Here is how a GitHub pull request changes tab will look like for.ipynb that contains the rendered images (I've signed up for a GitHub feature beta testing, this notebook div view is not yet publicly available):
image

The main problem with this: The git history would be filled with image data.

This could be overcome with the following strategy: All prs, that iterate on examples, can go to a new branch called docs. This docs branch contains all executed example notebooks with image output.
The contributor makes the pull request with the rendered example.
After the review and some iterations, the reviewer can merge the pr to the docs branch.
Once a week, a GitHub action can run which makes a pull request from the docs branch to the main branch, where all example changes are considered that were made in that past week. This GitHub action pr takes care that the output is cleaned, and it could even convert the notebook to any other format (like e.g. the above-mentioned # %% Jupytext syntax in .pyfiles)

@jklymak
Copy link
Member

jklymak commented Jan 18, 2023

Can you mix rst with nbsphinx? I didn't get that to work in my 5 minutes of playing with it.

@tacaswell
Copy link
Member

The main problem with this: The git history would be filled with image data.

This is a fatal problem. Our repo is already extremely large because we include the test images (which we need to do something about) so including the example images is a non-starter. Even if the examples were on a separate branch, they would still be in the repository and included in history unless we reguarly force-pushed that branch out of existence (which would then lead to chaos on any outstanding PRs against that branch when we did so). We would also need a checker to make sure no changes came in via any method but that GHA (which is doable but annoying). The proposed scheme might work with a separate repo, but at that point we are growing a tremendous amount of complexity to solve a problem which is easy to not make (by sticking with .py, .rst, or (if we must) .md files).

I am also slightly concerned about integrating GHA that closely into our development process. We are getting those free cpu cycles essentially on an advertising budget line and they can go away at any time (see what happened with travis.org). Using it for nice-to-haves (like the new contributor greeter) and CI is a reasonable risk (because if they went away tomorrow we could do without or replace it "easily" (as we already use a diverse set of CI providers)), but for something like "this needs to run to update the examples in our docs" I would be worried. In our history we have moved hosting platforms once (from sourceforge to github which also included a VCS change from svn->git) so I also think we should keep an eye towards that being a possibility in the future.

Here is how a GitHub pull request changes tab will look like for.ipynb that contains the rendered images (I've signed up for a GitHub feature beta testing, this notebook div view is not yet publicly available):

That GitHub has integrated tools is good (even if they are in beta), but it does not solve the problem when using other git tools on local machines. Being able to chase through the history of changes to a file is very valuable to understand why things are the way they are. I am aware of things like nbdime, however I am not sure I have ever managed to get it set up correctly. I think that any improvement to the editing workflow for some contributors (and a major degradation for others (well, me)) would be greatly outweighed by the added complexity for all developers of having to get notebook diffing tools installed.

For the author, because it's difficult to iterate over an example that is in the end of the example script, as the whole script has to be run for each iteration.

As someone who has written a bunch of those longer examples, my tactic was to copy out the section I was working on into its own file (well, emacs buffer), iterate there and then copy back when I was happy. Further, most IDEs also have a "run region interactively" functionality even absent the automatic "cell" detection.


From my verbosity @kolibril13 I hope you can tell I do not like to say "no", but I am pretty firm that we should not start using .ipynb files for the checked-in version of any part of our docs at this point.

@jklymak
Copy link
Member

jklymak commented Jan 18, 2023

As someone who has written a bunch of those longer examples, my tactic was to copy out the section I was working on into its own file (well, emacs buffer), iterate there and then copy back when I was happy. Further, most IDEs also have a "run region interactively" functionality even absent the automatic "cell" detection.

I'll agree strongly with this - I wouldn't do this just for the convenience of the author. The constrained_layout tutorial is the slowest tutorial we have, and it takes 4 s on my machine to make all the figures, which is hardly a barrier to iterating. Or if I'm doing something really fussy, I'll do what Tom suggests above and make a test file.

For me, the only advantage here would be organizational, where long explanations that would be better in the narrative docs but are easier to write in code would have their home where they belong in the narrative.

@anntzer
Copy link
Contributor

anntzer commented Jan 18, 2023

FWIW I'm strongly against ipynb here as well, because...

(and a major degradation for others (well, me))

... I still have no idea of how to set up the matplotlib/notebook interaction at all. (I haven't tried very hard, but I have tried a couple of times.)

@kolibril13
Copy link
Contributor Author

I think the problem with section wise code execution is solved by the PR #25021 so.ipynb won't be necessary for that.
Thank you all for spending your valuable time to participate in this discussion, really appreciate it!
I will close this issue now.

including the example images is a non-starter

It would be really lovely to have all images version controlled, close together with their generating code.
But I totally understand that this is out of scope for the Matplotlib main repo.

I just want to mention that this is can be done for smaller Matplotlib related projects.
Low resolution images with have a size of a few kBs, that means in 100 MB (which is a reasonable repo size), one could easily version control 10.000 images in notebooks. This is how the Git History would look like:
image

@tacaswell
Copy link
Member

Thank you for being understanding @kolibril13 .


This is how the Git History would look like:

That is still on GitHub, what does it look like with git diff on the command line or gitk?

@melissawm
Copy link
Member

Hello, folks - I am just now seeing this. I'd like to point out that it is not necessary to use the .ipynb format for executable docs. In fact, using Jupytext it is possible to have executable .md files (using the MyST Markdown syntax) that will only be executed upon docs build, meaning they don't contain any outputs and so would be straightforward to diff.

I have a PR up to SciPy to do that but unfortunately we are still hanging on a minor detail. Please check scipy/scipy#17322

Happy to answer questions or provide a minimal example if folks are generally in favor of this idea.

@kolibril13
Copy link
Contributor Author

After the bulk switch to # %% in #25021, it is now also possible to open the .py format as an executable notebook. @melissawm : What would be the advantage of executable .md files over executable .py files ?

Also, I want to note that the best workflow I could find to work .py files as notebooks is not optimal, but close enough.
Ideas how to improve this workflow are appreciated! :)
My setup is the Latest VS Code, together with the Jupytext plugin https://github.com/congyiwu/vscode-jupytext
As soon as I right-click a .py file, I can choose "Open as Jupyter Notebook", and I can then edit, run and save the notebook, which is saved as .py with # %% cell dividers.
BUT: Even when I don't change anything, the new saved file now contains a # %% also in the very first cell of the notebook, which is currently not there in the Matplotlib examples. Furthermore, there is now a JupyterText specific header, which I have to delete via hand:

# ---
# jupyter:
#   jupytext:
#     cell_metadata_filter: -all
#     custom_cell_magics: kql
#     text_representation:
#       extension: .py
#       format_name: percent
#       format_version: '1.3'
#       jupytext_version: 1.11.2
#   kernelspec:
#     display_name: matplenv
#     language: python
#     name: python3
# ---

Currently, it's also not possible to change these settings in the Jupytext plugin, but maybe this would be worth making an issue https://github.com/notebookPowerTools/vscode-jupytext

@melissawm
Copy link
Member

It does look like you have a different setup for the .py files to make them more "narrative" that I'm not familiar with. Are these modified .py files also opened as notebooks in Jupyter classic or JupyterLab? Or just VSCode? The advantage of the myst/jupytext approach is that these markdown files are opened as jupyter notebooks in a plain jupyter setup as well.

@melissawm
Copy link
Member

Also - this header you are mentioning is exactly what makes a regular markdown file executable. Remove it, and it is back to a regular static markdown file.

@kolibril13
Copy link
Contributor Author

.py files also opened as notebooks in Jupyter classic or JupyterLab? Or just VSCode?

Considering this example:
foo.py

# %%
a = 1
# %%
b = 2

It's possible to open this in VS Code as a notebook when the vscode-jupytext plugin is installed.
I've just spent 15 minutes trying to open foo.py as a notebook in JupyterLab as well, but I did not find a quick solution and gave up on that.

The advantage of the myst/jupytext approach is that these markdown files are opened as jupyter notebooks in a plain jupyter setup as well

If that works, that would be really nice! I've just tested that with my JupyterLab installation (Version 3.4.8) with the notebook from your pull request: https://github.com/scipy/scipy/blob/b55d32eee1a18db5fd3f06a423b62bb19bc2501b/doc/source/notebooks/interp_transition_guide.md
but it does not open as a notebook, as you can see in this screenshot:
image
is there another package I have to install?

@melissawm
Copy link
Member

You need to install jupytext, and then you can right click this md file in Jupyterlab and "Open as notebook". You can see examples here: https://github.com/numpy/numpy-tutorials (a more comprehensive explanation is here)

@kolibril13
Copy link
Contributor Author

Thanks!
One more thing to consider:
A common way to tweak an example and open a pull request in recent days is via github.dev, the online version of VS Code.
That saves time because one does not have to clone the whole repo.
E.g. https://github.dev/scipy/scipy/pull/17322
In this only version, VS Code Jupytext is not supported at all, which means one can only edit in markdown mode.
That's not a big issue, just wanted to mention that.
image
image

@melissawm
Copy link
Member

@kolibril13 I am pretty sure you can get that to work by using a devcontainer.json that includes an initial configuration for your codespace - check out numpy/numpy#23076

@kolibril13
Copy link
Contributor Author

I'll have a look!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants