Skip to content

[Bug]: Plot directive clutters the output directory with unneeded files #24005

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
MikhailRyazanov opened this issue Sep 25, 2022 · 9 comments

Comments

@MikhailRyazanov
Copy link
Contributor

MikhailRyazanov commented Sep 25, 2022

Bug summary

Using the Matplotlib plot directive in Sphinx documentation results in cluttering the output directory with all the produced image files in all generated formats even when they are not needed (plot_html_show_formats = False). This is especially obvious with the singlehtml builder, where all the necessary images must be inside the _images subdirectory, but in fact, in addition to _images, the whole directory tree with every image in every format is copied to the output directory, although nothing of it is referenced from the HTML file.

What is more interesting, if another builder is executed first, then its cached images are reused properly — only the necessary files are copied into _images, without cluttering the output directory with unneeded files and subdirectories.

Code for reproduction

See a minimal working example at https://github.com/MikhailRyazanov/plot-clutter.

Actual outcome

From step 1 of the example above:

$ tree build/singlehtml
build/singlehtml
├── _images
│   ├── index-1.svg
│   ├── other-1.svg
│   └── script.svg
├── index-1.pdf
├── index-1.svg
├── index.html
├── objects.inv
├── _static
│   <skipped>
└── subdir
    ├── other-1.pdf
    ├── other-1.svg
    ├── script.pdf
    └── script.svg

Expected outcome

From step 2 of the example above:

build/singlehtml
├── _images
│   ├── index-1.svg
│   ├── other-1.svg
│   └── script.svg
├── index.html
├── objects.inv
└── _static
    <skipped>

Additional information

In practice, this undesired behavior directly affects the downloadable documentation at Read the docs — see issue #9068 there.

Operating system

No response

Matplotlib Version

3.5.2

Matplotlib Backend

No response

Python version

No response

Jupyter version

No response

Installation

Linux package manager

@MikhailRyazanov MikhailRyazanov changed the title [Bug]: Plot directive clutter the output directory with unneeded files [Bug]: Plot directive clutters the output directory with unneeded files Sep 26, 2022
@MikhailRyazanov
Copy link
Contributor Author

I think, this is somewhat related to #20814, in which a similar situation with source scripts was addressed. I've tried similarly to wrap the preceding image-copying code with if config.plot_html_show_formats:, and this helped.

But then I've discovered that the overall logic of copying all these files is seriously flawed. Namely, if adding links to images (and scripts) is not disabled, and different builds are done in succession (for example, html followed by singlehtml), then the now necessary files are copied only into the first build's output directory, but the second build has no files in its output directory, resulting in all the image (and script) links broken. See step 3 that I've added to my example.

So either the file-copying code needs to be moved somewhere else, or, if possible, file handling should be delegated to Sphinx itself...

@harupy, @tacaswell and @QuLogic, maybe you can take a closer look at this? Because it also affects #20814. I'm not saying that the PR broke anything, since the lack of needed copying was apparently present before, but this issue is more or less related, and you already have some familiarity with the involved code and use cases.

@MikhailRyazanov
Copy link
Contributor Author

MikhailRyazanov commented Oct 5, 2022

A simple method to delegate all file handling to Sphinx would be to use the :download: role (see the diff). This solves both problems — Sphinx will make sure that all and only needed files are copied. However, this solution is not ideal:

  1. The default-format image file is duplicated in _downloads even though it already exists in _images. However, this can be avoided by simply skipping the image with fmt == default_fmt (all browsers provide the “open image” and “save image” functionality to the user, so the default image file would still be available, by standard means).
  2. The _downloads directory has an ugly structure, with every file placed in its own subdirectory with a long unreadable name (file's hexadecimal hash string). This is probably not a big deal by itself but can lead to other problems.
  3. If there is anything that depends on where the plot directive places its files in the output directory, it will break. Not only immediately, due these files' relocation into _downloads, but also conceptually because these hash paths within _downloads cannot be predicted. I hope that nobody really depended on that (as it was not documented). However, at least the tests need to be modified accordingly.

Does anybody know whether Sphinx provides any relevant API to solve this in a cleaner and more backward-compatible way?

@MikhailRyazanov
Copy link
Contributor Author

Yet another case of broken behavior of the current implementation is that links to files corresponding to documents in subdirectories of the source tree are generated incorrectly in the singlehtml build, see step 4 that I've added to my example. The :download: approach easily avoids this problem as well.

@MikhailRyazanov
Copy link
Contributor Author

Issue #13858 seems to be very similar in terms of incorrect generated links, but for the dirhtml build. And it is also solved by the :download: approach.

@HealthyPear
Copy link

I think I have the same problem, with the difference that in my case I am asking both to show the source code link and downloadable images (but I could do without that to be honest)

I have an examples folder in the root of my repo (where I would like to keep it) and I use the sphinx extension like this,

.. plot:: ../../examples/plot_Sources_and_Areas.py
	  :caption: Result of the ``plot_Sources_and_Areas`` script
	  :width: 600

This creates a new examples folder under docs, but I would expect it to be under docs/build, which is what I expose from my CI job artifact for the final web page.

Interestingly, if we follow the instructions from this page this behaviour doesn't happen (but this forces the developer to keep source code into the docs folder).

This is my work tree after generating the docs,

image

@HealthyPear
Copy link

Do your PRs relate to my case?

@MikhailRyazanov
Copy link
Contributor Author

@HealthyPear, I was able to model your case and can reproduce similar behavior.

I suppose that your conf.py is in docs/samples, your Sphinx build directory is docs/build, and the output directory is docs/build/html. Here is what, I think, happens:

  1. Plot directive extension should cache its generated files in docs/build/plot_directive, but when doing so, it preserves file paths relative to the documentation source root (where conf.py is located). Thus, for example, your ../../examples/plot_Sources_and_Areas.py is cached as docs/build/plot_directive/../../examples/plot_Sources_and_Areas.py, which basically means docs/examples/plot_Sources_and_Areas.py — this is exactly where you see the script and its output stored.
  2. Then plot directive extension should copy the files from its cache to the output directory docs/build/html, but when doing so, it also preserves relative paths. Now your ../../examples/plot_Sources_and_Areas.py becomes docs/build/html/../../examples/plot_Sources_and_Areas.py, which accidentally again means docs/examples/plot_Sources_and_Areas.py (and similarly for the related image files), so no files are in fact copied, but the links in the output HTML are pointing to those files.

My PR #24205 correctly treats the second part — it does copy all the necessary files into docs/build/html/_downloads, so they will be available to your CI (for the final web page, you really need to expose only docs/build/html, not the whole docs/build) and be linked correctly from the HTML files.

It, however, doesn't modify the caching part, so you'll still see docs/examples generated. This is a separate problem, and I'm not sure that Matplotlib maintainers would be willing to do anything about it because they might just claim that plotting scripts must be within the source tree (not its grandparent, as in your case). Nevertheless, a simple solution that I can propose is to make a symlink directory docs/examples pointing to ../../examples and accordingly change

.. plot:: ../../examples/plot_Sources_and_Areas.py

to

.. plot:: examples/plot_Sources_and_Areas.py

Then the cached files will end up in docs/build/plot_directive/examples, and the final output files in docs/build/html/examples (with the current Matplotlib code; or, after the PR is merged, inside the same docs/build/html/_downloads as described above). I'm not sure, although, how portable this solution with a symlink is (it you need to sync your repo between Linux/macOS/Windows/whatever).

@HealthyPear
Copy link

My PR #24205 correctly treats the second part — it does copy all the necessary files into docs/build/html/_downloads, so they will be available to your CI (for the final web page, you really need to expose only docs/build/html, not the whole docs/build) and be linked correctly from the HTML files.

Ok good.

It, however, doesn't modify the caching part, so you'll still see docs/examples generated. This is a separate problem, and I'm not sure that Matplotlib maintainers would be willing to do anything about it because they might just claim that plotting scripts must be within the source tree (not its grandparent, as in your case). Nevertheless, a simple solution that I can propose is to make a symlink directory docs/examples pointing to ../../examples and accordingly change

I think there is an easier cross-platform solution for this part, since the docs/examples folder would be just for temporarely caching:

  • you add the created directory to .gitignore (just to be sure to not clutter the project repo)
  • you update the Makefile to remove that folder at the end of the build when the HTML material is safe and sound.

@HealthyPear
Copy link

By the way, my fina plan is actually to make a sort of gallery from my examples, just like matplotlob has.

they might just claim that plotting scripts must be within the source tree (not its grandparent, as in your case).

Isn't this what they are doing here, though?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants