Skip to content

DOC [PST] refactor API structure and improve display #28428

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 32 commits into from
Apr 11, 2024

Conversation

Charlie-XIAO
Copy link
Contributor

@Charlie-XIAO Charlie-XIAO commented Feb 15, 2024

Please note that this PR targets the new_web_theme branch!

Towards #28084.

Here is the list of things left to be done in this PR (if any). Maintainers please feel free to modify.

  • Placeholder.

@Charlie-XIAO Charlie-XIAO marked this pull request as ready for review February 15, 2024 12:19
@Charlie-XIAO Charlie-XIAO marked this pull request as draft February 15, 2024 12:19
Copy link

github-actions bot commented Feb 15, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 1aa8ffa. Link to the linter CI: here

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Feb 16, 2024

Refactor API structure

Previously we have all classes/functions on the same page (i.e., 1-level). The problem is that in pydata-sphinx-theme, the primary sidebar will then show all classes/functions while the secondary sidebar will show the modules (because they are subsection titles), which is pretty strange. See this page for illustration. The new structure is 2-level, i.e., the index page links to module pages, and each module page links to all its classes/functions. See the rendered docs for illustration.

Script for generating API pages (optional)

I created a script for generating API pages (contents in sklearn/_api_reference.py, writer function in doc/conf.py) because I found many inconsistencies between modules. The modifications to some module/submodule docstrings are also for consistency. Please let me know if such a script is not desired, then I may leave the generated pages there and remove the script.

All-in-one template

I replaced all templates with just one templates/base.rst. I also removed the deprecated_xxx.rst templates because: (1) Personally I don't think deprecated classes/functions should have the :robots: noindex metadata. Suppose that I don't know a class/function is deprecated and search by its name, I would want that deprecated page to show up (please correct me if I misunderstood how things work). (2) The warning admonition should be added in the docstring instead of in the template.

Tweaking in-page toc

If you look at a class reference page, you will see that there are no entries for its methods in the secondary sidebar. This is not because they are not there, but because there parent div is not visible (if you inspect the element you will see). As second-level entries, the methods should have shown up when you scroll there, but because of this bootstrap bug scrollspy is not recognizing their IDs because they contain dots. From my perspective it is reasonable that we always show two levels of in-page toc for API pages, but this configuration is global in pydata-sphinx-theme, meaning that other pages would show two levels as well (which I think would be too long). Also the entries would be by default {{ objname }}.{{ method }} which is often too long to fit into the secondary sidebar, so it might be worth creating a script for stripping the object name part and to make the second level visible, which is what I'm currently doing in sphinxext/override_pst_pagetoc.py.

To explain a bit more what this extension does, it first tells from pagename whether it is an API page or not. If it is, it creates a function that calls the pydata-sphinx-theme function that is used to generate the in-page toc HTML for the secondary sidebar, tweaks that HTML, and put it back into the context (replacing the original one). That way the when the pydata-sphinx-theme in-page toc component calls that function from the context, it will actually get our tweaked version.

There was a similar issue in pydata-sphinx-theme, though the proposed solution is to create a new component and reading .py.method divs from the page, which is different from my solution that directly tweaks the generated sidebar HTML. However pydata-sphinx-theme developers seem to be against including such a feature into the theme (see this comment), so we may have to maintain it ourselves.

Search on API index page (new)

Proposed in #28428 (comment), I used DataTables to realize a version of API searching (by names only). The initialization script is in scripts/api-search.js and it involves some CSS tweaking (especially for dark theme) in scss/api-search.scss. In the special case where JS is disabled, all APIs will be displayed in a super long list so that one may still use Ctrl-F to search. If one wants to navigate by modules, then the primary sidebar should suffice.

@Charlie-XIAO Charlie-XIAO marked this pull request as ready for review February 16, 2024 13:08
@adrinjalali
Copy link
Member

I agree the sidebars on the example in your link looks odd, so would be nice to fix that. However:

personally, the API page is pretty much the only page I always have open from our website, which allows me to very quickly do a Ctrl+F and find something and with one single click I'm in the docs of that class/function.

The issue with the new layout is that there's no single page where everything is listed, and this page becomes somewhat useless maybe? One would need to do a search, wait for a page reload, then find the right link, and click. I personally don't know where each class is, especially with a lot of our preprocessing stuff.

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Feb 16, 2024

I didn't notice that I also used it this way until you've said it 😆 Do you think it is a reasonable solution that: we add a search box on this page which only search for class/function names and display the results? This way we will not need to go to the search page, and the search speed will be significantly faster than searching through the whole documentation.

@adrinjalali
Copy link
Member

I think anything requiring a page reload would be bad, but if you find a way to do that w/o a page reload, I'd be game.

@Charlie-XIAO
Copy link
Contributor Author

@adrinjalali I used DataTables to achieve something like the following. Do you think this satisfies your need?

API.Reference.scikit-learn.1.5.dev0.documentation.9.-.-.Microsoft.Edge.2024-02-17.17-18-49.mp4

@adrinjalali
Copy link
Member

Yeah that looks pretty cool!

Copy link
Member

@adrinjalali adrinjalali left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still kinda miss the one page listing all public API though, it'd be nice to have that. I'm not sure if that's me being used to that page or if it's really good to have it though.

Thanks @Charlie-XIAO

@ogrisel
Copy link
Member

ogrisel commented Feb 22, 2024

I still kinda miss the one page listing all public API though, it'd be nice to have that.

I think I agree: I think it's was better to get a high level overview of the library.

On the other hand, the search filtering tool for the table introduced in this PR is very snappy and nice to use.

Maybe we could get the best of both worlds by:

  • displaying 1000 items per page by default (effectively displaying all 558 API entries at once by default?),
  • replacing the module column in the table by the first line of the docstring (as done before in the screenshot below):

image

  • the module name could be displayed in the first column, either above or under the entry name, possibly with a smaller and/or lower contrast font.

WDYT?

@ogrisel
Copy link
Member

ogrisel commented Feb 22, 2024

Also, since the API table/search index page does not need a right-hand side navigation panel, maybe we could use that space to expand the table horizontally.

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Feb 24, 2024

Here are some updates (note that the deprecated things are only to test the rendering).

Dark Theme Light Theme
d8de58a0c370e68526d41532e9afce3 f7d1930817645f146c005360b77651e

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Feb 24, 2024

Sorry for taking quite a while since the last update. The core of the new update is the ShortSummaryDocumenter implemented in sphinxext/autoshortsummary.py, which is used to produce the short summary in the search table.

About module name display. As suggested I used smaller font size and lower contrast color for the module names, placed under the short summaries. Other alternatives that I considered:

  • Directly showing full object name: too long and there would be limited space for the short summaries.
  • Using another column: same reason as above.
  • Placing below object name: module name could be longer than object name even in smaller font; this sometimes looks weird.

About default per-page entries. As suggested I showed all entries by default, but one major concern is the increased delay of initializing the table and searching. You may want to try and see if the delay is acceptable.

Other comments. Since there is no deprecated API at this moment, I intentionally moved some just in order to check the rendering. When finalizing this PR they should be moved back to their right places.

@ogrisel @adrinjalali Please take a look when you have time :)

@adrinjalali
Copy link
Member

I think this is better now, yes!

@ogrisel
Copy link
Member

ogrisel commented Feb 26, 2024

I played with the preview and it looks great to me.

Will try to find the time to do a proper review.

@Charlie-XIAO

This comment was marked as outdated.

Copy link
Member

@ogrisel ogrisel left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM once the following is addressed:

@Charlie-XIAO
Copy link
Contributor Author

Done, moved to under doc/.

@Charlie-XIAO
Copy link
Contributor Author

Charlie-XIAO commented Apr 11, 2024

I'm going to self-merge since the PR has already got 2 approvals :)

@Charlie-XIAO Charlie-XIAO merged commit b9f8f27 into scikit-learn:new_web_theme Apr 11, 2024
@Charlie-XIAO Charlie-XIAO deleted the pst-api branch April 11, 2024 07:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants