Skip to content

sphinx-exhibit #11936

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from
Closed

sphinx-exhibit #11936

wants to merge 1 commit into from

Conversation

anntzer
Copy link
Contributor

@anntzer anntzer commented Aug 25, 2018

As hinted in a few other places in the issue tracker, I have been working on a new example runner for the docs. While it is still incomplete, I think I can now put up this PR to showcase some features.

See https://3905-7439715-gh.circle-artifacts.com/0/home/circleci/project/doc/build/html/index.html for an example build.

A (very) brief README for the sphinx extension (boringly named "sphinx-exhibit") is available at https://github.com/anntzer/sphinx-exhibit

What works, especially in comparison with sphinx-gallery:

  • Examples are run and doc pages generated (well, that's the base...).
  • The links in the examples from the names to the corresponding API docs are much more complete than the ones generated by sphinx-gallery. This is because I actually introspect these while running the examples, so I can know that (e.g.) ax.plot is matplotlib.axes.Axes.plot. See e.g. the hillshading demo:
    annotations
  • Conversely, this allows backreferences to also be more complete: they now exist for methods too (see e.g. Axes.pcolorfast (at the very bottom) for something that doesn't have a corresponding function in pyplot).
  • Notebooks are generated and have nicer markup, because we postprocess sphinx's HTML output, so all rst has been processed by sphinx already. Compare e.g. the links to ipython and pillow in the notebooks generated for the image demo by sphinx-gallery
    nb-sg
    and by sphinx-exhibit
    nb-se
    (Yes, you notice the broken image link more in the latter, but note that the image is actually also missing in the former :))
  • I think the way to specify document order is nicer than for sphinx-gallery (see gallery/index.rst and tutorials/index.rst), although it is admittedly less general (could be improved).

What remains to be done:

  • Thumbnails are not generated (should be easy), and lists of links to examples are text-only (need to do some CSS wrangling to include the images).

Note that sphinx-exhibit is not on PyPI (yet) so this build installs it from my github repo. Also, I use sphinx-jinja for templating (see doc/gallery/index.rst), but that project has a few bugs and my PRs are not getting merged (the project seems inactive), so I also install it from my repo (I may ultimately fork that project under a separate name if it remains inactive). (The sphinx-jinja manager merged my PRs and pushed a new version to PyPI, so things are fine now on this front.)

Note that the first commit of this PR is #11931.

attn @ImportanceOfBeingErnest @jklymak @timhoffm (who discussed this in #10974) and @choldgraf (as sphinx-gallery dev).

@timhoffm
Copy link
Member

Wow, this looks really promising, even tough I had just a quick glance at it! 👍

@choldgraf
Copy link
Contributor

choldgraf commented Aug 26, 2018

This looks really neat! I'm curious what's the main difference between this and sphinx-gallery. It seems like this is more lightweight, and also has "deeper" inspection of the classes etc, which is nice!

My main question comes from a sustainability perspective. I feel like the #1 rule in open-source is "only create new things when you absolutely have to", so my question is: why create a new package that we'll need to maintain rather than add features to sphinx-gallery, which already has a fairly strong dev community? One of the big reasons for the initial move to sphinx-gallery was to remove matplotlib-specific gallery code that was becoming burdensome, and instead use a community standard that many other packages were using.

IMO, we should need a very strong reason to move away from a pre-existing, well-vetted, community-wide tool, so I'm just making sure that we've done our due diligence here. I'm not sure what the matplotlib governance rules are on big tech decisions like this, but this seems like a big enough decision (since built examples and the galleries make up most of the online documentation) to warrant getting feedback from the dev community in a rigorous way.

@choldgraf
Copy link
Contributor

cc also @NelleV as she was the main driver of the initial conversion of the docs into sphinx-gallery

@anntzer
Copy link
Contributor Author

anntzer commented Aug 26, 2018

I am glad to hear that you like it. I knew the "why not PR these into SG?" question was coming... :) so here are some thoughts.

  • First of all, my interest in writing Py2-compatible code is close to zero nowadays (this will not be a surprise to core devs here...). Obviously this depends on the kind of tasks at hand, but for this kind of work things like pathlib are obligatory IMO. (Would SG have accepted pathlib as dependency for py<3.4? I don't know, but that's not the point; I'd also have needed to pathlibify the internals of SG as well.)
  • I much prefer adding rst custom directives (e.g. specifying the examples with .. exhibit::, or using .. exhibit-skip:: for skipping the running of an example -- other things that can be added later include e.g. .. exhibit-thumbnail-number::) rather than configuring more and more stuff into conf.py and parsing "special comments" (# sphinx_gallery_thumbnail_number) as SG does -- the former will just cleanly error out if you made a typo. Honestly, I don't think https://github.com/matplotlib/matplotlib/blob/master/doc/sphinxext/gallery_order.py showcases the best part of SG's API (I think gallery/index.rst in this PR is much simpler, but YMMV). Again, using directives could have been PR'd into SG, but see ENH: Allow setting example order sphinx-gallery/sphinx-gallery#275 (comment) (the last paragraph). In other words, I prefer to stick to docutils/sphinx's API rather than work around it. (See Replace :ref:sphx_glr_... by :doc:/.... #11312 for another example of what I consider "SG reinventing the wheel of what sphinx already provided".)
  • The above-mentioned comment notes that sphinx's own internals are moving quite quickly. Actually I don't think this applies to the directives part (which comes from docutils, which is extremely stable), but it is true that sphinx itself moves quite quickly. Again I have approximately zero interest in supporting old versions of sphinx (from what I see SG just dropped support for sphinx<=1.4; SE requires sphinx>=1.7); sphinx is pure-python and trivial to upgrade.
  • Finally, there are some quite "technical" parts in the code of SE (both the runtime introspection part (which relies on AST modification tricks), and the notebook generation part (which inserts raw html tags into the rst source, then extracts the blocks using lxml)), and my experience with this kind of technical code is that it tends to sit in review forever until some dev gives in and says "sure, looks reasonable, let's give it a try". (Or rather, this is my experience as a mpl dev -- can't speak for SG of course.) Perhaps you can also see SE as an attempt to effectively PR ~1000 lines of documentation-generation code into matplotlib, without actually having them reviewed :) (I realize this sentence won't help me getting this merged...)

To be clear, I think SG is a great project and did a lot to improve the state of docs in the scipy community (I also realize that not everyone has the chance of writing Py3-only code...). SE is just "how I would have done things differently", and some of these differences are just questions of taste (directives vs. conf.py+comments).

What happens now?

If the SG devs wants to integrate some of the features I wrote for SE (which I guess would mostly be better backrefs and better notebooks), they should definitely go for it (in a sense, you can take SE just as proof that "it can be done"). In other words, if SE can help making SG better, I'm happy too.
If you, or mpl devs, want to help pushing SE to the finish line (mostly, thumbnail generation and the associated CSS), I'd be happy to get your help too (e.g. for binder notebooks, I know nothing of the binder API but I'm sure it can be added). It may also be worth looking at whether there are longstanding feature requests of SG and see whether they are easier (or not) to implement with SE's design.

@choldgraf
Copy link
Contributor

choldgraf commented Aug 26, 2018

A quick couple of thoughts:

I think you're bringing up a lot of reasonable points. I don't think we really need to go into specific technical details here, my points are more about organizational process than technical pros/cons.

In general, I think a large and well-maintained project like matplotlib should also have a strong preference for other large or well-maintained projects (for non-trivial code, anyway). For example, the goal of moving away from using the matplotlib gallery directives and towards sphinx-gallery was because we were removing custom code that was understood/maintained by very few people, and instead depending on a well-tested/maintained package that was already fairly well-used and maintained (e.g. in the scikit-learn documentation). I think the decision to move away from an established project and towards a newer/smaller project shouldn't be taken lightly.

That said, I'm probably going to put a hold on any sphinx-gallery improvements in matplotlib (such as #11415 ) until we get some clarification on this, since I don't want to waste my time improving stuff if it's going to be replaced later on anyway. I promise this isn't meant as a negative comment, just trying to mindful of time commitments!

To be transparent: Obviously I have strong, strong preference for "don't create a new thing, contribute to a pre-existing thing" in this space. There are a million ways to solve these problems from a technical perspective, the hard thing is building sustainable / maintainable communities around the tools. Sometimes it does make sense for the ecosystem to fracture though!

@NelleV
Copy link
Member

NelleV commented Aug 26, 2018

I share @choldgraf's concern about maintainability. I don't think this approach is sustainable neither in terms of software development nor from a community perspective.

I'm also tagging @tacaswell in this discussion. Dependency changes should be discussed with the release manager and core maintainer, even when they only affect the documentation.

@matthew-brett
Copy link
Contributor

I must say, from my experience on https://matthew-brett.github.com/nb2plots, that I can well imagine that using directives would be more attractive than special comments.

There's surely a risk for maintenance, and a benefit from improved notebooks and introspection. So - how big is the maintenance risk? @anntzer - can you commit to maintaining this for - say - 5 years?

@anntzer
Copy link
Contributor Author

anntzer commented Aug 26, 2018

As noted above, I think both of the most important additional features (better introspection and better notebook generation) could be adopted by SG -- I just won't be the one to write that PR at least until SG drops Py2 support and starts tracking sphinx versions much more aggressively (that's not my decision; I am not familiar with SG's internals either but I doubt that is a problem). In fact, I see that @GaelVaroquaux just mentioned having the same idea for notebooks in sphinx-gallery/sphinx-gallery#402.

The use of directives to specify the galleries would be a bigger change, both backwards incompatible and, at least a year ago, opposed by a SG core dev (sphinx-gallery/sphinx-gallery#275 (comment) as noted in a previous comment). On the other hand, while I think directives are nicer, gallery order is not something you need to fiddle with very often, and the API is not visible to those who actually read the docs, so the incentive to change is smaller. The same point applies for directives vs. special comments.

If SG implemented the additional features (and possibly switched to directives, or not), then there would be no reason to switch to SE. In fact, note that I published this PR as soon as these main points were implemented in SE, so you can think of this as a sort of "technical preview" ("showcase some features", as written at the top of the PR); things such as thumbnails (generation, CSS) are well understood and at least I can't think of major improvements over what SG does (sure, I could just have copypasted/imported SG's code for that, but what's the point? I did import SG for parsing the examples though as that was needed to actually build the Matplotlib docs :-) Or you can think of this PR as a mega-PR of SE into SG, except that I was too lazy to actually write it on top of SG's current implementation and split it into reviewable chunks... Do I want to fragment the community? Of course not. But (while I realize that the discussion can be a source of tension) I do think (given on my unwillingness to write Py2 code) that the community is better off knowing that things that I propose in SE are technically possible (beyond the vaporware state).

SE is incomplete: it lacks thumbnail generation, and tests are... limited (though you can run pytest and get pretty good coverage, but it's just smoketesting), so bugs likely present. If we do decide to adopt it (which, as stated above, is far from being obviously the best choice), I would definitely prefer if some other devs familiarized themselves with it (also that gives me an excuse to look for volunteers to finish the parts I was too lazy to implement :-) of course I can go over the implementation with whoever is interested). While I am a priori happy to maintain SE for as long as needed, I also agree that it makes no sense for Matplotlib to start having a dependency on a library with a bus factor of one.

@choldgraf
Copy link
Contributor

A few points from my end:

  1. I agree that the features that SE implements are cool and useful - I think @anntzer did a really nice job there!
  2. I'd much prefer that we use this as an opportunity to improve sphinx-gallery, maybe as @anntzer suggests the existence of SE can be inspiration for development even if he isn't the one to write the code.
  3. To that extent, it's also totally up to @anntzer what packages etc he'd like to develop :-) If you wanna spend the time writing something like SE then I am all for it...as I mentioned, sometimes there's value in creating an N+1th package that does something.
  4. That said, I think Matplotlib should be fairly conservative about what tools it uses, and I'd recommend that we only depend on tools with (I'll say somewhat arbitrarily) 3+ dedicated devs to a project, fully-formed documentation, complete test coverage, and a fairly large user base.

@anntzer
Copy link
Contributor Author

anntzer commented Aug 26, 2018

Actually, here's another technical point: I just benchmarked SE and it plays quite nicely with sphinx's builtin parallel build (... which doesn't work on Windows but that's sphinx's problem to fix :-)) It is known that sphinx's parallel build doesn't actually scale very well (sphinx-doc/sphinx#4575) but there's still a significant improvement.

SE

no-parallel: 466s
-j4: 199s
-j16: 161s (run on a 32-core machine)

SG

no-parallel: 456s
(haven't tried -j4)
-j16: 338s

Caveat: sphinx complains with a bunch of "image not found" errors during the parallel SE build. This is likely because the some docs refer to some images that haven't been generated yet (because they're in another process) (SG doesn't run into this issue likely because it runs all examples first before reading the docs). But the images it's complaining about are actually present at the end of the build, so I think things are fine (haven't done a careful check though); therefore, I readily admit that the comparison isn't totally fair.

Again, you may see this as a nudge/challenge for SG to scale better with the number of cores :-) (In fact, I think that in that specific case, because SG runs the examples outside of sphinx's main "read" loop, it can more easily implement its own, better parallelization scheme.)

@matthew-brett
Copy link
Contributor

@choldgraf - the requirements seem a bit strict. If SE was obviously better, and feature complete, and @anntzer had committed to maintain it for 5 years, are you really saying you'd insist on two extra core developers before you'd consider it?

@anntzer - in my experience, it very rarely works if you have a new project that is nearly feature-complete, and you are waiting for others to come and help finish it up. For SE to be a contender, it would really have to be a better drop-in replacement for SG, at which point the discussion could be the trade-off between size of developer base, code maintainability and features.

@choldgraf
Copy link
Contributor

@matthew-brett I wasn't trying to say that this was a strict rule (that's why I said it was semi-arbitrary). Just saying that a large package like Matplotlib should err on the more conservative side in its dependencies, especially when there's already a well-developed package out there that it already uses. Though from a maintainability perspective, it does seem like a reasonable concern. It sounds like the main reasoning that @Titan-C was 👎 on directives was because it was a big pain to maintain. If that's true then it doesn't comfort me that a single person decides they want to maintain this functionality on their own...

I also agree with you that maybe it makes sense to adopt a different package if it implements truly crucial functionality that we can't live without, and if we have exhausted attempts at getting these features into the already-used package and this hasn't been successful. My point is that we haven't tried the "contribute to a pre-existing thing" path yet. E.g., I don't believe the sphinx-gallery community has had an explicit conversation about using a directive.

I am all for growing out new pieces of the ecosystem if there is a well-thought-out and strong reasoning behind it (e.g. when repo2docker decided to use its own image building pipeline instead of using s2i, we wrote a whole blog post justifying why). I'm trying to make sure we exhaust those options before considering using a totally new tool that replicates much of the functionality of an already-existing and well-maintained tool.

@anntzer
Copy link
Contributor Author

anntzer commented Aug 27, 2018

in my experience, it very rarely works if you have a new project that is nearly feature-complete, and you are waiting for others to come and help finish it up. For SE to be a contender, it would really have to be a better drop-in replacement for SG, at which point the discussion could be the trade-off between size of developer base, code maintainability and features.

Thanks for the advice. Given the point it's at, I'll probably finish the implementation myself at some point anyways (though on a yet-to-be-defined timeframe) to make it truly usable, but (as stated above) the main priority was really to convince myself and hopefully others that the additional features are indeed possible.

It sounds like the main reasoning that @Titan-C was 👎 on directives was because it was a big pain to maintain. If that's true then it doesn't comfort me that a single person decides they want to maintain this functionality on their own...

I don't think the problem is with the maintenance, it's with keeping compatibility with \infty versions of the dependencies. Again, I think that if you can git clone matplotlib/sklearn/... and possibly even build it :-) then you can pip install the latest version of a pure python package. I can even be generous and guarantee to support the last two minor versions of sphinx at any time, that should give you plenty of time to upgrade it.

My point is that we haven't tried the "contribute to a pre-existing thing" path yet.

On a project like this, the energy barrier for me to write something from scratch that only supports the last version of everything (I added Py3.5 support just recently :-)) is much lower than trying to adapt to an existing codebase, especially when I disagree with certain design decisions in that codebase (again, some of it is just matter of taste). Some may think that this is not the most constructive approach (due to the friction it causes), but as stated above I think it is at least better than not writing anything at all and just discussing vaporware.

E.g., I don't believe the sphinx-gallery community has had an explicit conversation about using a directive.

It is true that there hasn't been an explicit discussion about it on SG's tracker, but I did raise the idea and it was disliked by a core dev and a bit before (same thread) by @larsoner on backcompatibility grounds, even though the feature that would have been broken had been released 1 day before. Perhaps I could have pursued this more (just as I could have opened an issue to ask to switch to Py3 only) but frankly I have little interest in these doing that; I tend to think that at the end it's the implementation that speaks for itself.
To be clear, this shouldn't be read as a critique of the SG tracker/devs -- I have occasionally contributed to sklearn for example (but a self-contained algorithm runs into much fewer such issues than big architectural changes such as the diff between SG and SE); I don't think the ways things panned out for SE/SG would have been different anywhere else.

@Titan-C
Copy link

Titan-C commented Aug 27, 2018 via email

@@ -19,7 +19,6 @@ The easiest way to make a live animation in matplotlib is to use one of the

.. autosummary::
:toctree: _as_gen
:template: autosummary_inher.rst
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems un-related/un-intended? The inherited members on the Animation classes help quite a bit.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's from #11931, see reasoning there (tldr: the default template already shows inheritance now).
Built this PR on top of it so that I only have to rewrite one template, not two.

@anntzer anntzer mentioned this pull request Sep 14, 2018
@jklymak jklymak marked this pull request as draft July 17, 2020 16:09
@story645
Copy link
Member

This is because I actually introspect these while running the examples, so I can know that (e.g.) ax.plot is matplotlib.axes.Axes.plot. See e.g. the hillshading demo:

Given this all works now in sphinx-gallery, can this PR be closed?

@anntzer anntzer closed this May 20, 2021
@anntzer anntzer deleted the sphinx-exhibit branch May 20, 2021 19:30
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants