[MRG] Patches sphinx's autosummary to handle case insensitive file systems #12968

thomasjpfan · 2019-01-13T03:42:55Z

Reference Issues/PRs

What does this implement/fix? Explain your changes.

Autogenerated function pages whose names match another generated file case-insensitively will include a hash of the name in their filename.

Any other comments?

This is a weird solution, since it manually replaces the process_generate_options.

There will be a warning:

scikit-learn/doc/modules/generated/sklearn.cluster.optics_lowercase.rst: WARNING: document isn't included in any toctree

This is because the new filename doesn't actually appear in any toctree.

jnothman · 2019-01-14T00:21:02Z

Hmmmm... Have you tried submitting a patch to sphinx-doc? I'd be more comfortable evaluating this change if I could see that patch directly at least.

thomasjpfan · 2019-01-14T00:48:29Z

Fundamentally, the only addition to the process_generate_options (line 100 in custom_files_autosummary.py of this PR) is:

try:
    filename = app.config.custom_autosummary_file_map[name] + suffix
except KeyError:
    filename = name + suffix

It is a little crazy this is all that needs to be changed for this PR to work. scikit-learn must be doing something special to be able to link sklearn.cluster.dbscan to sklearn.cluster.dbscan_lowercase.html correctly. When I created a barebones example to debug this issue, I needed to do a little more fine tuning of the references to get links to point correctly.

Anyways, I could try to patch sphinx, but that patch would most likely be on the latest version. Is there something preventing us from using the latest version of sphinx?

jnothman · 2019-01-14T01:44:03Z

I don't think there is anything preventing us from using the latest version.

jnothman · 2019-01-14T01:45:34Z

I don't think it's so crazy that that is all that is needed. But it might be nice to also consider how conflicts can be detected.

thomasjpfan · 2019-01-14T20:08:50Z

@jnothman I updated this PR with a smaller LOC fix to our issue. It mocks out os.path.join temporarily with a function that maps filenames to our custom filenames. Again, this is not the most elegant solution.

jnothman

The monkey-patching nature of this is messy, but I'm glad to have documentation working on Windows...

jnothman · 2019-01-15T05:45:01Z

doc/sphinxext/custom_files_autosummary.py

+        return orig_os_path_join(a, name_with_suffix)
+
+    os.path.join = custom_os_path_join
+    process_generate_options(app)


You should use try-finally (or a context manager) when monkey patching
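A minimal sketch of what that could look like, assuming the patch only needs to remap the last path component. The helper name patched_os_path_join and the file_map argument are hypothetical, not taken from this PR:

```python
import os.path
from contextlib import contextmanager


@contextmanager
def patched_os_path_join(file_map):
    """Temporarily replace os.path.join and always restore the original.

    file_map is a hypothetical dict mapping a generated filename to the
    filename that should be written instead.
    """
    orig_join = os.path.join

    def custom_join(a, *p):
        # Remap only the final component; every other call passes through.
        if p and p[-1] in file_map:
            p = p[:-1] + (file_map[p[-1]],)
        return orig_join(a, *p)

    os.path.join = custom_join
    try:
        yield
    finally:
        # Runs even if the wrapped code raises, so the patch never leaks.
        os.path.join = orig_join
```

The call to process_generate_options(app) would then sit inside a with patched_os_path_join(...) block.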

jnothman · 2019-01-15T07:42:16Z

Is it possible to raise an error when there are conflicts, to make sure this issue doesn't repeat?
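One way such a check could work, as a sketch only: group the names by their lowercased form and fail if any group has more than one member. The function names below are hypothetical and not part of sphinx or this PR:

```python
from collections import defaultdict


def find_case_conflicts(names):
    """Return groups of distinct names that differ only by case."""
    groups = defaultdict(list)
    # dict.fromkeys drops exact duplicates while keeping input order.
    for name in dict.fromkeys(names):
        groups[name.lower()].append(name)
    return [sorted(group) for group in groups.values() if len(group) > 1]


def check_no_case_conflicts(names):
    """Raise ValueError if any two names collide case-insensitively."""
    conflicts = find_case_conflicts(names)
    if conflicts:
        details = "; ".join(", ".join(group) for group in conflicts)
        raise ValueError(
            "Generated names collide on case-insensitive file systems: "
            + details
        )
```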

jnothman · 2019-01-16T12:38:48Z

doc/conf.py

+# a upper case module, i.e. `sklearn.cluster.DBSCAN` and
+# `sklearn.cluster.dbscan`
+custom_autosummary_file_map = {
+    "sklearn.cluster.dbscan": "sklearn.cluster.dbscan_lowercase",


We should probably use a character that is invalid in identifiers as the delimiter, e.g. -, not _.

jnothman · 2019-01-16T20:44:15Z

doc/conf.py

+custom_autosummary_file_map = {
+    "sklearn.cluster.dbscan": "sklearn.cluster.dbscan_lowercase",
+    "sklearn.cluster.optics": "sklearn.cluster.optics_lowercase"
+}


sklearn.covariance.oas and sklearn.decomposition.fastica need the same treatment.

thomasjpfan · 2019-01-18T00:38:27Z

I updated this PR with a zero-configuration approach that detects whether a file has already been generated with the same case-insensitive name. In these cases the hash of the name is appended to the filename.

This can be seen here: https://43873-843222-gh.circle-artifacts.com/0/doc/modules/generated/sklearn.cluster.optic35a4574d84b18e6ad28c8c022895e64bb2242daf.html#sklearn.cluster.optics

sklearn.cluster.optic is appended with a sha1 hash of sklearn.cluster.optic.
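Roughly, the idea can be sketched as follows. This is not the code in the PR: the helper name unique_filename and the seen_lower set are hypothetical, and the sketch appends the hash to the full name.

```python
import hashlib


def unique_filename(name, seen_lower, suffix=".rst"):
    """Pick a filename for name, adding a sha1 hex digest on a collision.

    seen_lower is a set of lowercased names that already have a file;
    it is updated in place.
    """
    key = name.lower()
    if key in seen_lower:
        # Same name up to case was already generated: disambiguate.
        digest = hashlib.sha1(name.encode("utf-8")).hexdigest()
        return name + digest + suffix
    seen_lower.add(key)
    return name + suffix
```

Whichever name is processed first keeps the plain filename, so the result depends on processing order.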

jnothman · 2019-01-18T04:24:26Z

Relying on "already processed" means processing order must be deterministic, and ideally ASCII-ordered so that the capitalised version takes preference. SHA-1 is excessive when we really only care about uppercase versus lowercase; we could produce a binary signature and base64-encode it. But whatever.

thomasjpfan · 2019-01-18T20:51:31Z

You have valid points. I’ll look into reducing the scope of this PR to prioritize the uppercase filename and add a predefined string to the lowercase filename.

thomasjpfan · 2019-01-19T16:55:13Z

Here comes another update to this PR. Class names now take priority over function names. When a function name matches a class name, the suffix -lowercase is added to the end of the function's generated filename. This makes two assumptions that should be valid for sklearn:

Class names always begin with a capitalized letter.
If a function name matches a class name (case-insensitive), then it is the only function that matches.
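Under those two assumptions, the behaviour can be sketched like this. The function assign_filenames is hypothetical, and the explicit sort stands in for whatever ordering the real code relies on:

```python
def assign_filenames(names, suffix=".rst", marker="-lowercase"):
    """Map each name to a filename, giving capitalized names priority.

    Names are visited in ASCII order, where uppercase letters sort before
    lowercase ones, so sklearn.cluster.DBSCAN is seen before
    sklearn.cluster.dbscan.
    """
    taken = set()
    result = {}
    for name in sorted(names):
        key = name.lower()
        if key in taken:
            # A case-insensitive match already owns the plain filename.
            result[name] = name + marker + suffix
        else:
            taken.add(key)
            result[name] = name + suffix
    return result
```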

jnothman

Are we assured that class names get processed before lowercase? Or does the algorithm not depend on that and I've misread it?

thomasjpfan · 2019-01-20T21:29:07Z

You are correct, this implementation still assumes that the class is generated first. It turns out this is always the case because sphinx sorts the names here.

jnothman · 2019-01-20T22:13:02Z

Overall I'm okay with this change as a temporary solution. I think it is good if our docs compile on Windows, etc. but this is quite hacky. We need a change like this to go in sphinx-doc that at least identifies the conflict and allows the user to configure some resolution.

thomasjpfan · 2019-01-20T22:40:42Z

In creating this PR, I was trying to create a no-configuration fix, so future PRs do not need to worry about this.

Regarding a change to sphinx-doc, this problem stems from not being able to trust os.path.isfile on case-insensitive file systems. Every solution I come up with to patch sphinx feels hacky. The least hacky solution was the one with a configuration dict that maps a name to another name. This dictionary may need to be passed to the autodoc extension to make sure it knows about the mapping.
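To illustrate the os.path.isfile point: on a case-insensitive file system it reports dbscan.rst as existing when only DBSCAN.rst is on disk. A sketch of a stricter check that compares against the directory listing instead (the helper name isfile_exact_case is hypothetical, not something sphinx provides):

```python
import os


def isfile_exact_case(path):
    """Return True only if a file with exactly this name, case included, exists."""
    directory, filename = os.path.split(path)
    try:
        # The listing reports names with the case they were stored with.
        entries = os.listdir(directory or ".")
    except OSError:
        return False
    return filename in entries and os.path.isfile(path)
```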

jnothman · 2019-01-20T22:44:20Z

Why do we need os.path.isfile? I think having a way to configure renames is a good start to a solution in sphinx-doc, but keeping a history of generated paths in order to raise a warning when duplicates are found also seems advisable.

thomasjpfan · 2019-01-20T23:24:18Z

I suspect os.path.isfile is used in autosummary.generate because it needs to work with the sphinx-autogen cli. It uses the filesystem to figure out if a file was already generated.

thomasjpfan · 2019-01-21T14:51:49Z

I have been wrestling with the following two approaches:

Two configuration options:

custom_autosummary_names_with_new_suffix = {
    "sklearn.cluster.dbscan",
    "sklearn.cluster.optics",
    "sklearn.covariance.oas",
    "sklearn.decomposition.fastica"
}
custom_autosummary_new_suffix = "-lowercase.rst"

This leads to a simpler patch, but future contributors running into this issue would need to know to add their function to custom_autosummary_names_with_new_suffix.

No configuration options, taking advantage of the fact that classes are generated first. This leads to a slightly more complicated patch.
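For the first approach, the filename lookup would be roughly the following. This is a sketch: the helper generated_filename and its default_suffix parameter are hypothetical, while the two configuration values are the ones listed above.

```python
custom_autosummary_names_with_new_suffix = {
    "sklearn.cluster.dbscan",
    "sklearn.cluster.optics",
    "sklearn.covariance.oas",
    "sklearn.decomposition.fastica",
}
custom_autosummary_new_suffix = "-lowercase.rst"


def generated_filename(name, names_with_new_suffix, new_suffix,
                       default_suffix=".rst"):
    """Return the filename for name, using new_suffix for configured names."""
    if name in names_with_new_suffix:
        return name + new_suffix
    return name + default_suffix
```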

@jnothman What do you think?

thomasjpfan · 2019-01-21T16:28:30Z

As a reference, I opened another PR, #13022, that uses the configuration options. It feels less hacky because it does less and allows a user to configure the filenames.

thomasjpfan · 2019-01-25T21:38:52Z

I am closing this PR in favor of #13022.

thomasjpfan added 8 commits January 12, 2019 13:38

DOC: Fixes issue with case insensitive file systems

911df26

RFC: Mininize changes

3db0200

RFC: Mininize changes

faa91fc

RFC: Link to optics

bfbc540

RFC: Rename extension

8bd89ec

DOC: Improves

c5303e1

BUG: Uses sphinx 1.6.2 api

0cdee74

BUG: Uses sphinx 1.6.2 api

37d54bc

thomasjpfan added 5 commits January 14, 2019 09:17

Merge remote-tracking branch 'upstream/master' into sphinx_case_insensitive

910565b

REV: Minimizes diffs

8903f34

WIP

0e105e5

RFC: Smaller fix

d3279ff

Merge remote-tracking branch 'upstream/master' into sphinx_case_insensitive

1efcca3

jnothman reviewed Jan 15, 2019

jnothman reviewed Jan 16, 2019

thomasjpfan added 6 commits January 17, 2019 10:24

ENH: Uses contextmanager

8d2e3df

Merge remote-tracking branch 'upstream/master' into sphinx_case_insensitive

7984e9f

RFC: Removes print

e006a6b

RFC

9cd7a23

ENH: Removes configuration

3e63960

RFC: Renames patch

e31823d

thomasjpfan added 2 commits January 19, 2019 11:49

ENH: Prioritize classes when for generated files

adee09a

DOC: Reword

0ed820b

jnothman reviewed Jan 20, 2019

DOC: Adds comment about order

25906a3

thomasjpfan mentioned this pull request Jan 21, 2019

[MRG] Patches sphinx.ext.autosummary for case insensitive file systems #13022

Merged

thomasjpfan closed this Jan 25, 2019
