Skip to content

DOC: Enable nitpicky #14768

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 5 commits into from
Aug 8, 2019
Merged

DOC: Enable nitpicky #14768

merged 5 commits into from
Aug 8, 2019

Conversation

alexrudy
Copy link
Contributor

PR Summary

Enable Sphinx nitpicky mode, which raises a warning when a reference can't be found (and when combined with the -W option, causes an error when a reference can't be found).

There are currently ~2500 references which can't be resolved. I can't fix all of them in this SciPy sprint 🙁 but I did write a sphinx extension, missing_references which:

(a) records all of the missing references from one sphinx run in a JSON file.
(b) ignores those missing references on future runs.

This should mean that we can turn Sphinx nitpicky on for now, and it will emit a warning for new problems.

PR Checklist

  • Has Pytest style unit tests
  • Code is Flake 8 compliant
  • New features are documented, with examples if plot related
  • Documentation is sphinx and numpydoc compliant
  • Added an entry to doc/users/next_whats_new/ if major new feature (follow instructions in README.rst there)
  • Documented in doc/api/api_changes.rst if API changed in a backward-incompatible way

@alexrudy alexrudy force-pushed the enable-nitpicky branch 2 times, most recently from a61c555 to 528c47f Compare July 14, 2019 16:39
@alexrudy
Copy link
Contributor Author

@anntzer I've added a location string to the JSON file – its not perfect (it replaces paths not within matplotlib with <external> and some paths end up as <unknown>).

Unfortunately, nitpicky can't use the location in the docs when suppressing warnings, so it doesn't impact the warning suppression (it will suppress all instances of the broken reference) but the filenames and line numbers should help when fixing these references in the future.

@anntzer
Copy link
Contributor

anntzer commented Jul 16, 2019

  • Would it make sense to publish the sphinx extension separately? (I don't really mind either way)
  • I'm not sure it makes sense to commit the actual json file to the repo, unless we have a way to make sure it doesn't go out of date (forcing contributors to do a full doc build for docstrings edits just to keep the file up to date may raise the bar for contribution a bit :()
  • There's a bunch of entries that I don't actually understand, e.g. towards the top:
  [
    "<unknown>:1",
    "py:class",
    "_FancyAxislineStyle.FilledArrow"
  ],
  [
    "<unknown>:1",
    "py:class",
    "_FancyAxislineStyle.SimpleArrow"
  ],

FilledArrow's docstring is

The artist class that will be returned for SimpleArrow style.

so there's no markup problems there?

@alexrudy
Copy link
Contributor Author

alexrudy commented Jul 16, 2019

  • This could make sense as separate sphinx extension, but I'm not yet totally convinced of its wide-ranging applicability. It really only makes a difference for projects which haven't enabled nitpicky mode yet.
  • I think the point of this PR is the JSON file – it encodes the list of currently broken references. By including it, and not regenerating it again, we enforce that new references must be correct (they are subject to the nitpicky setting) but previously broken references can be fixed progressively over time. We only need to regenerate the JSON file when we fix a previously broken reference. And even then, if a contributor doesn't do so, there isn't much harm done (the JSON file specifies missing references to ignore, but now-correct references will be checked. I suppose a future PR could re-introduce a broken reference in the interim, but I would propose we live with that risk for now).
  • These entries aren't about mis-formatted docstrings, but about incorrect references to those objects. I'm not sure what drives the <unknown> (lots of things can inject RST into the sphinx parser and not set the filename sensibly) but it suggests that somewhere there is a reference of the form :py:class:`_FancyAxislineStyle.FilledArrow` but that reference isn't being resolved correctly. Usually, this is because the reference was underspecified (maybe it should have been ~mpl_toolkits.axisartist.axisline_style._FancyAxislineStyle.FilledArrow for the reference to resolve correctly, I'm not sure)

@anntzer
Copy link
Contributor

anntzer commented Jul 16, 2019

This could make sense as separate sphinx extension, but I'm not yet totally convinced of its wide-ranging applicability. It really only makes a difference for projects which haven't enabled nitpicky mode yet.

Either way is fine, it's not very big and we'd not be exposing it as public API anyways.

I think the point of this PR is the JSON file – it encodes the list of currently broken references. By including it, and not regenerating it again, we enforce that new references must be correct (they are subject to the nitpicky setting) but previously broken references can be fixed progressively over time. We only need to regenerate the JSON file when we fix a previously broken reference. And even then, if a contributor doesn't do so, there isn't much harm done (the JSON file specifies missing references to ignore, but now-correct references will be checked. I suppose a future PR could re-introduce a broken reference in the interim, but I would propose we live with that risk for now).

Ah, I missed that non-listed entries would be subjected to nitpicking, that's good.
Perhaps you could fail the build if there are entries in the json file that are not being used, and list in the output? Then the circleci build would have something like

The following entries in missing-references.json are not used and should be deleted:
...

and then one can just delete them manually from the json file.

These entries aren't about mis-formatted docstrings, but about incorrect references to those objects. I'm not sure what drives the (lots of things can inject RST into the sphinx parser and not set the filename sensibly) but it suggests that somewhere there is a reference of the form :py:class:_FancyAxislineStyle.FilledArrow but that reference isn't being resolved correctly. Usually, this is because the reference was underspecified (maybe it should have been ~mpl_toolkits.axisartist.axisline_style._FancyAxislineStyle.FilledArrow for the reference to resolve correctly, I'm not sure)

Grepping for FilledArrow in the codebase yields

$ git grep -C5 FilledArrow
lib/mpl_toolkits/axisartist/axisline_style.py-            extended_path = self._extend_path(path_in_disp,
lib/mpl_toolkits/axisartist/axisline_style.py-                                              mutation_size=mutation_size)
lib/mpl_toolkits/axisartist/axisline_style.py-            self._path_original = extended_path
lib/mpl_toolkits/axisartist/axisline_style.py-            FancyArrowPatch.draw(self, renderer)
lib/mpl_toolkits/axisartist/axisline_style.py-
lib/mpl_toolkits/axisartist/axisline_style.py:    class FilledArrow(SimpleArrow):
lib/mpl_toolkits/axisartist/axisline_style.py-        """
lib/mpl_toolkits/axisartist/axisline_style.py-        The artist class that will be returned for SimpleArrow style.
lib/mpl_toolkits/axisartist/axisline_style.py-        """
lib/mpl_toolkits/axisartist/axisline_style.py-        _ARROW_STYLE = "-|>"
lib/mpl_toolkits/axisartist/axisline_style.py-
--
lib/mpl_toolkits/axisartist/axisline_style.py-                                           line_mutation_scale=self.size)
lib/mpl_toolkits/axisartist/axisline_style.py-            return axisline
lib/mpl_toolkits/axisartist/axisline_style.py-
lib/mpl_toolkits/axisartist/axisline_style.py-    _style_list["->"] = SimpleArrow
lib/mpl_toolkits/axisartist/axisline_style.py-
lib/mpl_toolkits/axisartist/axisline_style.py:    class FilledArrow(SimpleArrow):
lib/mpl_toolkits/axisartist/axisline_style.py:        ArrowAxisClass = _FancyAxislineStyle.FilledArrow
lib/mpl_toolkits/axisartist/axisline_style.py-
lib/mpl_toolkits/axisartist/axisline_style.py:    _style_list["-|>"] = FilledArrow

so it's not clear what could be done to fix that entry, for example.


Just to be clear, I think this would be a nice feature, I'm just trying to make sure we can use it efficiently :)

@anntzer anntzer mentioned this pull request Jul 16, 2019
6 tasks
@jklymak
Copy link
Member

jklymak commented Jul 24, 2019

As someone who is prone to having malformed references in the docs I'd be happy t see this merged. Is it ready for review?

@alexrudy
Copy link
Contributor Author

I am working on two last small pieces – 

  1. Recording files & line numbers for warnings
  2. Emitting a warning for unused warning suppressions in the .json doc.

@timhoffm
Copy link
Member

@alexrudy Thanks for working on this. I've marked this as "work in progress" so that reviewers don't spend extra time on this until it's ready. Please report back when you think this is ready for review.

@alexrudy
Copy link
Contributor Author

I think I'm ready for some more eyes on this!

Where we stand: @anntzer had a good request – record where warnings came from. That required a bit more work inside the extension, but now we get two things:

  1. A better idea of where the missing references can be found – look through the missing_references.json to understand.
  2. A warning when a reference is no longer missing – meaning that if you fix a reference in the docs, you will also need to remove that reference from the .json file.

@timhoffm – I can't add or remove labels, but removing the WIP label seems appropriate now.

@anntzer
Copy link
Contributor

anntzer commented Jul 25, 2019

This is definitely much better now :) looking at missing-references.json under e.g. the py:obj key, one can easily find fixable references. A few more requests though:

  • looks like intersphinx is not handled properly? for example there is
    "numpy.hamming": [
      "lib/matplotlib/axes/_axes.py:docstring of matplotlib.axes.Axes.angle_spectrum:19",

but https://matplotlib.org/devdocs/api/_as_gen/matplotlib.axes.Axes.angle_spectrum.html actually links correctly to numpy.angle_spectrum. Likewise for some stdlib references.

    "matplotlib.animation.AVConvFileWriter.bin_path": [
      "doc/api/_as_gen/matplotlib.animation.AVConvFileWriter.rst:28:<autosummary>:1"

which likely(?) come from the fact that in https://matplotlib.org/devdocs/api/_as_gen/matplotlib.animation.AVConvFileWriter.html, under the Methods section, autosummary tries to link to AVConvFileWriter but that should go to the parent class' doc, https://matplotlib.org/devdocs/api/_as_gen/matplotlib.animation.MovieWriter.html#matplotlib.animation.MovieWriter.bin_path; not sure whether this should be filed as a sphinx bug?

  • perhaps sort the results alphabetically?

@alexrudy
Copy link
Contributor Author

Hmm – @anntzer I think you have identified something more fundamental. I wonder if some of these references are being resolved by the sphinx missing-references hook via other extensions. I will have to investigate whether sphinx supports the ordering of extension hook calls.

@alexrudy
Copy link
Contributor Author

Alright, one more attempt at this!

Since last time:

  • Record missing references primarily using a logging filter (not the missing-references sphinx event). This is because sphinx calls all event callbacks, even if it only needs the first successful result, so other extensions which used the missing-references callback (e.g. viewcode) to resolve references would report missing references to this extension.
  • Still use the event to ensure that reference is no longer missing – if any ignored reference doesn't trigger missing-references, then it doesn't need to be ignored. We can't use the logging filter in this case, because ignored references are never sent to the logger!

Two major outcomes from this:

  1. The list of ignored references is much easier to understand and fix over time.
  2. The missing references monitoring now depends not only on sphinx events but also on sphinx's logging setup. I think that is okay because (a) this is an internal sphinx extension and (b) sphinx logging has been pretty stable for a while and no class we access is private (though they aren't part of the extension hooks).

If I/we ever want to break this extension out into its own repo, I would want to ensure that upstream sphinx adds better support for this kind of thing.

path = os.path.relpath(path, start=basepath)

if path.startswith(os.path.pardir):
path = os.path.join("<external>", os.path.basename(path))
Copy link
Contributor

@anntzer anntzer Jul 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will give a different result if someone builds on Windows, which we actually don't want? So force the use of "/" here? (I would use pathlib.Path.as_posix because pathlib is nice :), but you don't have to.)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, good catch! I'll switch to pathlib, I like that too.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wanted to go with pathlib, but I decided against it, because the semantics of Path.relative_to aren't the same as os.path.relpath (the former raises an error for paths which can't be constructed as children of the base directory) – so it was easier just to pull out posixpath.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. if you are relying on relpath having different semantics from Path.relative_to (haven't looked in depth) you should leave a comment so that the code doesn't "helpfully" get rewritten to use relative_to in the future...
  2. otoh, you know you can pass pathlib Paths to os.path.relpath? alternatively, something like if basepath in path.parents: ...
  3. I guess if you do use pathlib then PurePosixPath is the way to go to have forward slashes all the time, as you're not going to do any filesystem ops on them.

@anntzer
Copy link
Contributor

anntzer commented Jul 25, 2019

Nice! Just one review point left I think, I'm going to skip style nitpicks unless you want to hear them :)

@alexrudy
Copy link
Contributor Author

Always happy to hear any nitpicks, @anntzer !

Copy link
Contributor

@anntzer anntzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

none of these style comments should be considered too seriously :)


if path:

basepath = os.path.abspath(os.path.join(app.confdir, ".."))
Copy link
Contributor

@anntzer anntzer Jul 26, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

actually I think this will fail if building the docs against a non-editable install? (so that you'd find the wrong matplotlib by going just up once from doc/conf.py) Admittedly a bit theoretical [though note that conda or linux packagers have been known to do stranger stuff], but you could possibly instead normalize the paths relative to Path(matplotlib.__init__.__file__).parent.parent (which helpfully also works for mpl-toolkits) instead (I guess if this was to be a mpl-independent extension you'd need to make this a config option, but note that in such a case your current approach also fails if conf.py is under doc/source/conf.py rather than doc/conf.py, which is a valid sphinx layout).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh – yes, this makes sense. A very good catch.

I'm going to test your suggestion and see what happens. It definitely won't work if this becomes a true sphinx extension, but there are are a few ways around that in the future.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried some things out here –

It's quite difficult to gracefully handle building the docs correctly when matplotlib isn't an editable install. Figuring out what is a "matplotlib" path vs any other module, and producing paths which match the editable install case, is very difficult. e.g. in the source tree mode, we have a path lib/matplotlib/axes/_axes.py, but in installed mode, this is something like site-packages/matplotlib/axes/_axes.py. Although we could normalize those two paths, it is a lot of work for a particular edge case, for functionality that isn't very important to this extension. (The features of this extension are already ballooning beyond what I had envisioned).

Instead, I propose that we detect when the matplotlib source directory doesn't seem to be in the right place relative to the docs directory and in those cases, we don't worry about warning for unused ignores. Everything else should still work fine in those cases – the actual process of ignoring unmatched references doesn't use the file paths at all.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry to have sent you on a wild goose chase... just ignoring warnings if not in a standard layout is perfectly fine.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No worries – it was interesting idea to try. I'm just trying to balance making things useful with making the perfect sphinx extension!

Copy link
Member

@QuLogic QuLogic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some typos and other clarifications.

self.app = app
super().__init__()

def _record_reference(self, record):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure why you need this separate method when it only has one caller, which is also a method on the same class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Its done as a convenience – the filter() method must return True or False, but the _record_reference method doesn't actually do any log filtering (all messages should get logged) so its a bit clearer IMO to return True from the filter method and put the logic somewhere else (so that we don't have to ensure that early returns from _record_reference also return True).

missing_reference_filter = MissingReferenceFilter(app)
for handler in sphinx_logger.handlers[:]:
if (isinstance(handler, sphinx_logging.WarningStreamHandler) and
missing_reference_filter not in handler.filters):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Word-wrapped if should be indented one more level.

for ignored_reference_location in locations:
if (ignored_reference_location not in
missing_reference_locations):
msg = (f"Reference {domain_type} {target} for "
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

indent a bit weird

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed – @QuLogic what were you going for with your earlier comment about this if statement?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This indent is correct now; I don't know what @anntzer finds weird.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Though I suppose I could point out that the if condition could fit on one line.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's the f-string indent which is currently

                msg = (f"Reference {domain_type} {target} for "
                    f"{ignored_reference_location} can be removed"
                    f" from {app.config.missing_references_filename}."
                        "It is no longer a missing reference in the docs.")

but should rather be something like

                msg = (f"Reference {domain_type} {target} for "
                       f"{ignored_reference_location} can be removed"
                       f" from {app.config.missing_references_filename}."
                        "It is no longer a missing reference in the docs.")

Copy link
Contributor

@anntzer anntzer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

enough nitpicking :p

@alexrudy
Copy link
Contributor Author

I'm going to rebase and fix the failing tests on this.

Copy link
Member

@timhoffm timhoffm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, let's give this a try. 😄

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants