Skip to content

Add tools for documentation formatting and validation #8136

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

tgross35
Copy link
Contributor

@tgross35 tgross35 commented Jun 15, 2022

Description

The goal of this PR is to add tools that help with contributing to documentation, from discussion started on #7659. Goals include:

  • Automatic formatting of code blocks in .rst documents and in docstrings (blacken-docs)
  • Flake8 validation of these code blocks (flake8-rst-docstrings, flake8-docstrings)
  • Addition of an optional spellcheck tool for documentation (sphinxcontrib-spelling)
  • Adding documentation on documentation. Seems like there's a little blurb here https://www.sqlalchemy.org/develop.html but I'm not sure where the source lives. I'd be happy to add more to this page, or a separate page on docs.

Checklist

This pull request is:

  • A documentation / typographical error fix
    • Good to go, no issue or tests are needed
  • A short code fix
    • please include the issue number, and create an issue if none exists, which
      must include a complete example of the issue. one line code fixes without an
      issue and demonstration will not be accepted.
    • Please include: Fixes: #<issue number> in the commit message
    • please include tests. one line code fixes without tests will not be accepted.
  • A new feature implementation
    • please include the issue number, and create an issue if none exists, which must
      include a complete example of how the feature would look.
    • Please include: Fixes: #<issue number> in the commit message
    • please include tests.

Have a nice day!

@tgross35 tgross35 marked this pull request as draft June 15, 2022 04:25
@tgross35 tgross35 mentioned this pull request Jun 15, 2022
18 tasks
@tgross35
Copy link
Contributor Author

tgross35 commented Jun 15, 2022

Status as-of dddad32

Spelling

Currently the spelling plugin is installed and working with make spelling (this requires the enchant C lib). Note for docs building docutils 0.17.1 needs to be installed (0.18.1 is their latest). The issue is somewhere in SQLA's tools and not with spelling though, zzzeek is looking at it.

There are some quirks with it

  • To actually run, the enchant C library needs to be installed. I don't think this is a blocker to anything since the requirements can run and
  • The tokenizer pyenchant has some weirdness with tokenization. It splits on ' and - so things like as-hoc or doesn't show up as ad, hoc, doesn, t, which really sucks. I also wouldn't consider this a blocker for making the tool available, I've already gone through and added a lot of split words to the dictionary. There is an open issue to fix this Public API to config arbitrary valid word characters; do not tokenize hyphenated words pyenchant/pyenchant#286

flake8

It looks like flake8-rst-docstrings and flake8-docstrings are already on the pre-commit dependencies, just need to make sure they're running

blacken-docs

So far I like this one a lot.

One note is that it only runs in .. code-block:: python and .. code-block:: pycon, (same in docstrings) and not :: (relevant issue here). This actually kind of provides an easy way to opt documents into formatting - specify the language type whenever it's ready. Only 7 docs get caught right now, could actually fix those and enable this hook by default.

It doesn't pick up the .. sourcecode:: pycon+sql directives, which is big. My first thought on an easy way to fix this is a pretty easy syntax change. Just update zzzeeksphinx to be the lexer for ..code::pycon and accept a syntax like:

.. code:: pycon

    >>> Base.metadata.create_all(engine)
    """{opensql}
    ...
    CREATE TABLE user_account (
        id INTEGER NOT NULL,
        name VARCHAR(30) NOT NULL,
        fullname VARCHAR,
        PRIMARY KEY (id)
    )
    ...
    """

Probably pretty easy to change the identifiers, then the whole block stays as valid (formattable) python. Nice too that then rst syntax highlighters would pick it up.

Side note: a python and sql role might be useful. Would give syntax highlighting to things like

 :python:`session.execute(delete(obj))` will issue :sql:`DELETE FROM user WHERE...`
.. role:: python(code)
   :language: python

.. role:: sql(code)
   :language: sql

@tgross35
Copy link
Contributor Author

Turns out I mixed up the flake8 things. flake8-rst-docstrings and flake8-docstrings don't validate the python within the docstrings, just the docstrings themselves. flake8-rst does validate the docstrings, but it's unmaintained since 2020. I reached out to their maintainer and also the maintainer of pycqa to see if somebody might take it over and maintain it.

Could also add doctest as a precommit hook.

Interestingly, zzzeek was the inspiration for flake8-rst

image

@zzzeek
Copy link
Member

zzzeek commented Jun 15, 2022

id have to look and see what the rationale for "pycon+sql" as a separate thing was. it's very old and lots of things have changed since then, including that I've ditched the [SQL] button thing which seemed so clever 15 years ago. also would need to see how blacken-docs formats python console examples.

visitable
walkthrough
zxJDBC
zxJDBC
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems duplicate

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch - I'll dedupe before marking the PR as non-draft

# spelling_ignore_acronyms = True
# spelling_ignore_python_builtins = True
# spelling_ignore_importable_modules = True
# spelling_ignore_contributor_names = True
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is there a configuration that indicates the location of the wordlist file?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes there's spelling_word_list_filename, I just included the word filter related options. Do you want it to be explicit?

Full options here https://sphinxcontrib-spelling.readthedocs.io/en/latest/customize.html

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would make sense to make it explicit yes, so searching the file name would make it obvious where it's used

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also a comment with the link to the option list is always handy :)

@tgross35
Copy link
Contributor Author

tgross35 commented Jun 15, 2022

@zzzeek tested it out, it does about what you'd expect for pycon, just looks like standard black formatting.

One note is it does remove blank lines in pycon, which are sometimes nice to have. I guess the correct solutions here would be to add a # on these blank lines, or use the plain python format (edit: forgot about doctest), not sure what you'd find acceptable. I mean technically having the blank lines there means the character for character code isn't quite right (since no trailing whitespace for a tab in console), but that's not the biggest issue since sphinx-copybutton seems to remove them.

Did a quick test on the orm quickstart page to show how it formats. _test is the original file, _formatted is post blacken-docs. Had to zip the files because gh doesn't like rst, for no good reason

quickstart.zip

@CaselIT
Copy link
Member

CaselIT commented Jun 15, 2022

Side note: a python and sql role might be useful. Would give syntax highlighting to things like

 :python:`session.execute(delete(obj))` will issue :sql:`DELETE FROM user WHERE...`
.. role:: python(code)
   :language: python

.. role:: sql(code)
   :language: sql

these would be nice to have

@tgross35
Copy link
Contributor Author

Heh, having fun testing the inline roles

If you want to you could :mysql:`set @@system_versioning_alter_history = true`,
or if you feel like it :tsql:`USE sqlalchemy; GO`, or if you don't feel like it
:postgres:`SELECT 'i know my a b c's'::tsvector @@ 'a & b'::tsquery;`. Maybe
even :ccode:`++i; i++; printf("don't be fooled, they're different\n");`, but at
the end of the day you just want to :python:`print([relax for relax in paradise if
relax.temp_c < 28])`

Compared to unformatted ``SELECT 'i know my a b c'::tsvector @@ 'a &
b'::tsquery;``

image

I didn't even know they had sql flavor-specific highlighting, sqlalchemy is probably the best project possible to make use of that

@tgross35
Copy link
Contributor Author

I'm hitting a bug running spelling on the changelogs, (report) so I just excluded that directory for now.

There's also something zzzeeksphinx doesn't like, it does the spellcheck fine but throws an error at the end looking for some css/scss files in the output/spelling directory. I'll look into that a bit more, but it doesn't need to stop anything.

That being said, it runs successfully for the most part and I have a semifinal word list done. A lot of the remaining errors are just unquoted code or variable names that use "" rather than ````. I notice that a ton of docs even backtick quote simple things like jsonb (postgres), I wonder if that's for this reason.

Excluding code quoting, looks like the docs are overall very good for spelling, probably only 100 or so errors.

I think it would be good to merge this PR (when ready) without any fixes, then do a separate formatting PR that can be marked as ignore in .git-blame-ignore-revs. I don't think the flake8 checks will work now - I actually took over the flake8-rst repo from the old maintainer (think he just didn't have time) but it will take a bit of time before that is ready to go.

So the main thing here is probably just deciding how to handle code blocks. Could also put the SQL in a collapsable section like this I guess https://stackoverflow.com/a/25543713/5380651, or do the docstring thing, or just leave those blocks unformatted.

Additionally, I think I'd like to write some docs about writing docs, to help anyone new out (I struggled a bit getting started when I wrote the docs for the temporal thingy). Where's a good place for that?

@zzzeek
Copy link
Member

zzzeek commented Jan 8, 2024

there's a lot happening here....spellcheck is nice but this seems like a lot of new tooling to deal with.

any way we can have separate PRs for individual new checkers and such so we can just do small things one at a time?

if there's no longer interest in working on this I'll close it for now.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants