Skip to content

[MRG] Patches sphinx's autosummary to handle case insensitive file systems #12968

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed

Conversation

thomasjpfan
Copy link
Member

@thomasjpfan thomasjpfan commented Jan 13, 2019

Reference Issues/PRs

Fixes #12712

What does this implement/fix? Explain your changes.

Autogenerated functions that have the same case insensitive name as another generated file will include a hash of its name in its filename.

Any other comments?

This is a weird solution, since it manually replaces the process_generate_options.

There will be a warning:

scikit-learn/doc/modules/generated/sklearn.cluster.optics_lowercase.rst: WARNING: document isn't included in any toctree

Since the new filename doesn't actually appear in any toctree.

@jnothman
Copy link
Member

Hmmmm... Have you tried submitting a patch to sphinx-doc? I'd be more comfortable evaluating this change if I could see that patch directly at least.

@thomasjpfan
Copy link
Member Author

Fundamentally, the only addition to the process_generate_options (line 100 in custom_files_autosummary.py of this PR) is:

try:
    filename = app.config.custom_autosummary_file_map[name] + suffix
except KeyError:
    filename = name + suffix

It is a little crazy this is all that needs to be changed for this PR to work. scikit-learn must be doing something special to be able to link sklearn.cluster.dbscan to sklearn.cluster.dbscan_lowercase.html correctly. When I created a barebones example to debug this issue, I needed to do a little more fine tuning of the references to get links to point correctly.

Anyways, I could try to patch sphinx, but that patch would most likely be on the latest version. Is there something preventing us from using the latest version of sphinx?

@jnothman
Copy link
Member

jnothman commented Jan 14, 2019 via email

@jnothman
Copy link
Member

jnothman commented Jan 14, 2019 via email

@thomasjpfan
Copy link
Member Author

@jnothman I updated this PR with a smaller LOC fix to our issue. It mocks out os.path.join temporarily with a function that maps filenames to our custom filenames. Again, this is not the most elegant solution.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The monkey nature of this is messy, but I'm glad to have documentation working on windows...

return orig_os_path_join(a, name_with_suffix)

os.path.join = custom_os_path_join
process_generate_options(app)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You should use try-finally (or a context manager) when monkey patching

@jnothman
Copy link
Member

jnothman commented Jan 15, 2019 via email

doc/conf.py Outdated
# a upper case module, i.e. `sklearn.cluster.DBSCAN` and
# `sklearn.cluster.dbscan`
custom_autosummary_file_map = {
"sklearn.cluster.dbscan": "sklearn.cluster.dbscan_lowercase",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should probably be using some character that is invalid in identifiers as a delimiter, i.e. -, not _.

doc/conf.py Outdated
custom_autosummary_file_map = {
"sklearn.cluster.dbscan": "sklearn.cluster.dbscan_lowercase",
"sklearn.cluster.optics": "sklearn.cluster.optics_lowercase"
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sklearn.covariance.oas and sklearn.decomposition.fastica need the same treatment

@thomasjpfan
Copy link
Member Author

I updated this PR with zero configuration approach to detecting if a file has already been generated with the same case insensitive name. In these cases the filename will be appended with the hash of the name.

This can be seen here: https://43873-843222-gh.circle-artifacts.com/0/doc/modules/generated/sklearn.cluster.optic35a4574d84b18e6ad28c8c022895e64bb2242daf.html#sklearn.cluster.optics

sklearn.cluster.optic is appended with a sha1 hash of sklearn.cluster.optic.

@jnothman
Copy link
Member

jnothman commented Jan 18, 2019 via email

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Jan 18, 2019

You have valid points. I’ll look into reducing the scope of this PR to prioritize the uppercase filename and add a predefined string to the lowercase filename.

@thomasjpfan
Copy link
Member Author

Here comes another update to this PR. The class names now take priority over the function names. When a function name matches a class name, then a prefix -lowercase will be added to the end of the function's generated filename. This makes an two assumptions that should be valid for sklearn:

  1. Class names always begins with a capitalized letter.
  2. If a function name matches a class name (case-insensitive), then it is the
    only function that matches.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are we assured that class names get processed before lowercase? Or does the algorithm not depend on that and I've misread it?

@thomasjpfan
Copy link
Member Author

You are correct, this implementation still assumes that the class is generated first. It turns out this is always the case because sphinx sorts the names here.

@jnothman
Copy link
Member

jnothman commented Jan 20, 2019 via email

@thomasjpfan
Copy link
Member Author

In creating this PR, I was trying to create a no configuration fix, so future PR's do not need to worry about this.

Regarding a change to sphinx-doc, this problem stems from not being able to trust os.path.isfile on case insensitive file systems. Every solution I come up with to patch sphinx, feels hacky. The least hacky solution was the solution with a configuration dict that maps a name to another name. This dictionary may need to be passed to the autodoc extension to make sure it knows about the mapping.

@jnothman
Copy link
Member

jnothman commented Jan 20, 2019 via email

@thomasjpfan
Copy link
Member Author

I suspect os.path.isfile is used in autosummary.generate because it needs to work with the sphinx-autogen cli. It uses the filesystem to figure out if a file was already generated.

@thomasjpfan
Copy link
Member Author

thomasjpfan commented Jan 21, 2019

I have been wrestling with the following two approaches:

  1. Two configuration options:
custom_autosummary_names_with_new_suffix = {
    "sklearn.cluster.dbscan",
    "sklearn.cluster.optics",
    "sklearn.covariance.oas",
    "sklearn.decomposition.fastica"
}
custom_autosummary_new_suffix = "-lowercase.rst"

This leads to a simpler patch, but future contributors running into this issue would need to know to add their function to custom_autosummary_filenames_with_new_suffix.

  1. No configuration options and take advantage of the fact that classes are generated first. This leads to a slightly more complicated patch.

@jnothman What do you think?

@thomasjpfan
Copy link
Member Author

As a reference, I opened another PR: #13022 that uses the configuration options. It feels less hacky because it does less and allow a user to configure the filenames.

@thomasjpfan
Copy link
Member Author

I am closing this PR in favor of #13022.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Documentation paths are ambiguous on case insensitive file systems
2 participants