DOC use Algolia for the search bar #29666

Open
wants to merge 19 commits into
base: main

Conversation

Charlie-XIAO
Contributor

@Charlie-XIAO Charlie-XIAO commented Aug 13, 2024

Takes over #29138, which was blocked by a cryptic CI error that prevented the artifacts from building. Pinging people who were involved in the previous PR: @glemaitre @adrinjalali @lesteve.

Artifacts

Concerns and Possible Solutions

Versioning

Searching in any version will currently link to results in both stable and dev versions. A possible solution is to use meta tags and facet filters. See #29138 (comment), meta tags docs, and facet filter docs. I've opened pydata/pydata-sphinx-theme#1951 upstream to add the meta tag, and it is also easy to tweak the page.html template as a temporary workaround. I think we need to first push a version with those meta tags (and probably backport to 1.5 as well) and reconfigure the crawler somehow. @glemaitre may know better about this.
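
To make the facet-filter idea concrete, here is a minimal sketch of how the DocSearch search parameters could be restricted to one version. It assumes the crawler records a `version` attribute (for example from a `docsearch:version` meta tag) and declares it as a facet; the helper name `versionedSearchParameters` is hypothetical and not part of this PR.

```javascript
// Hypothetical helper: build DocSearch `searchParameters` limited to one docs version.
// Assumption: the crawler indexes a `version` attribute and declares it as a facet.
function versionedSearchParameters(version, extra = {}) {
  return {
    ...extra,
    // Keep any facet filters passed in and add the version restriction.
    facetFilters: [...(extra.facetFilters || []), `version:${version}`],
  };
}

// Example: restrict results to the dev docs while keeping other parameters.
const params = versionedSearchParameters("dev", {
  attributesToHighlight: ["hierarchy.lvl0"],
});
console.log(JSON.stringify(params));
```

The resulting object would be passed as `searchParameters` in the DocSearch configuration, so that a search from the dev docs only returns dev pages.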

Search context

There were reports that search contexts were missing (#29138 (comment), #29138 (comment)). I think it's not that they are missing. For certain search queries there will be matches in titles instead of in contents, and these will be ordered before matches in contents. Since the per-page search bar only shows the first few results, sometimes we will see no search context.
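
The effect can be illustrated with a toy model. This is not Algolia's actual ranking, only the ordering described above (title matches before content matches, then truncation to the first few results); the sample hits and the function name `visibleHits` are made up for illustration.

```javascript
// Toy model of the explanation above, NOT Algolia's real ranking logic.
// Hits whose `type` is a heading level (lvl0..lvl6) carry `content: null`.
function visibleHits(hits, limit) {
  const isTitleMatch = (hit) => hit.type !== "content";
  // Stable sort: title matches first, original order otherwise preserved.
  const ordered = [...hits].sort(
    (a, b) => Number(isTitleMatch(b)) - Number(isTitleMatch(a))
  );
  // The per-page search box only shows the first few results.
  return ordered.slice(0, limit);
}

// Made-up sample data: one content match and two title matches.
const sampleHits = [
  { type: "content", content: "made-up snippet of matching text" },
  { type: "lvl2", content: null },
  { type: "lvl3", content: null },
];

const shown = visibleHits(sampleHits, 2);
console.log(shown.map((hit) => hit.type));
```

With a limit of 2, only the two title matches are shown, so no result carries a content snippet even though one exists in the index.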

Results page

Algolia DocSearch (i.e., the search bar per page) does not support viewing all results by default. I built an "all results" page powered by Algolia instantsearch that simulates the appearance of the native sphinx search. Improvement suggestions are welcome :)

Search bar navigation

By default, pressing Enter in the per-page search box makes Algolia DocSearch navigate to the page of the corresponding search result, which is somewhat counterintuitive, as mentioned in #29138 (comment) and #29138 (comment). With the navigator API we can override the behavior of Enter, but the hook is per-item, so this is really a hack, and it means we lose the ability to use Enter to navigate to a specific item. I've kept navigateNewTab and navigateNewWindow un-overridden, so the behavior is now: Enter goes to the "all results" page, Ctrl+Enter opens the item page in a new tab, and Shift+Enter opens the item page in a new window.
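
A minimal sketch of such an override, not the exact code in this PR: only `navigate` is defined, so the default `navigateNewTab` and `navigateNewWindow` stay in effect. The results page path `/all-results.html`, the `q` query parameter, and the helper name `makeNavigator` are assumptions for illustration.

```javascript
// Sketch of a navigator override: Enter opens the "all results" page for the
// current query instead of the selected item's page.
// Assumptions: `resultsPageUrl` and the `q` parameter name are hypothetical.
function makeNavigator(resultsPageUrl, win) {
  return {
    // Called per item when Enter is pressed; the item URL is deliberately ignored.
    navigate({ state }) {
      const query = encodeURIComponent(state.query);
      win.location.assign(`${resultsPageUrl}?q=${query}`);
    },
    // navigateNewTab and navigateNewWindow are intentionally left undefined
    // so the library defaults (Ctrl+Enter, Shift+Enter) keep working.
  };
}

// Usage with a stand-in for `window`, so the sketch runs outside a browser.
const visited = [];
const fakeWindow = { location: { assign: (url) => visited.push(url) } };
const nav = makeNavigator("/all-results.html", fakeWindow);
nav.navigate({
  itemUrl: "https://example.org/page.html",
  state: { query: "kernel pca" },
});
console.log(visited[0]);
```

In the browser, the returned object would be passed as the `navigator` option with the real `window`.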

Local build

As mentioned in #29138 (comment), it would be strange to point to remote docs when searching in a local build. This PR implements the switch based on the environment variable SKLEARN_DOC_USE_ALGOLIA_SEARCH. If it is "0" or not set, nothing is changed (you can easily see this because all changes in conf.py are wrapped in if use_algolia), so by default local builds will use the native sphinx search. If it is "1", the navbar changes and the additional Algolia "all results" page is added. In CI I currently set the doc run to use "1" for the final artifacts and the doc_min_dependencies run to use "0" to test that it at least does not cause build errors.

github-actions bot commented Aug 13, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 30921f0. Link to the linter CI: here

@adrinjalali
Member

This is pretty good. Thanks @Charlie-XIAO ❤️

So now my two concerns are:

  • the issue with versions, which is being investigated
  • the actual results, which might require fiddling with the index. For instance, searching for "tsne" results in:

(image: screenshot of the search results for "tsne")

And going all the way down, I couldn't find the API page for TSNE.

@Charlie-XIAO
Contributor Author

Charlie-XIAO commented Aug 14, 2024

Yes I think we need to work on the crawler and index to further refine the results. Just for reference I'm pasting the information available on the client side below (it's very limited):

The structure of a search hit

{
  "url": "https://scikit-learn.org/dev/modules/decomposition.html#kpca-solvers",
  "url_without_anchor": "https://scikit-learn.org/dev/modules/decomposition.html",
  "anchor": "kpca-solvers",
  "content": null,
  "type": "lvl3",
  "hierarchy": {
    "lvl0": "User guide",
    "lvl1": "2.5. Decomposing signals in components (matrix factorization problems)",
    "lvl2": "2.5.2. Kernel Principal Component Analysis (kPCA)",
    "lvl3": "2.5.2.2. Choice of solver for Kernel PCA",
    "lvl4": null,
    "lvl5": null,
    "lvl6": null
  },
  "objectID": "13-https://scikit-learn.org/dev/modules/decomposition.html",
  "_highlightResult": {
    "hierarchy": {
      "lvl0": {
        "value": "User guide",
        "matchLevel": "none",
        "matchedWords": []
      },
      "lvl1": {
        "value": "2.5. Decomposing signals in components (matrix factorization problems)",
        "matchLevel": "none",
        "matchedWords": []
      },
      "lvl2": {
        "value": "2.5.2. <mark>Kernel</mark> Principal Component Analysis (kPCA)",
        "matchLevel": "full",
        "fullyHighlighted": false,
        "matchedWords": [
          "kernel"
        ]
      },
      "lvl3": {
        "value": "2.5.2.2. Choice of solver for <mark>Kernel</mark> PCA",
        "matchLevel": "full",
        "fullyHighlighted": false,
        "matchedWords": [
          "kernel"
        ]
      }
    }
  },
  "__position": 50,
  "__hitIndex": 49
}

I've got no access to the crawler though so I think we need to wait for @glemaitre's reply :)
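
For anyone working with these hits on the client side, here is a small sketch of how the fields above could be read. The helper names `breadcrumb` and `hasContext` are hypothetical, and the sample object is a trimmed copy of the hit pasted above.

```javascript
// Hypothetical helpers for reading a DocSearch hit on the client side.
const LEVELS = ["lvl0", "lvl1", "lvl2", "lvl3", "lvl4", "lvl5", "lvl6"];

// Join the non-null hierarchy levels into a readable path.
function breadcrumb(hit) {
  return LEVELS.map((level) => hit.hierarchy[level])
    .filter((value) => value != null)
    .join(" > ");
}

// A hit only has search context when `content` is a non-empty string.
function hasContext(hit) {
  return typeof hit.content === "string" && hit.content.length > 0;
}

// Trimmed copy of the hit shown above.
const hit = {
  url: "https://scikit-learn.org/dev/modules/decomposition.html#kpca-solvers",
  content: null,
  type: "lvl3",
  hierarchy: {
    lvl0: "User guide",
    lvl1: "2.5. Decomposing signals in components (matrix factorization problems)",
    lvl2: "2.5.2. Kernel Principal Component Analysis (kPCA)",
    lvl3: "2.5.2.2. Choice of solver for Kernel PCA",
    lvl4: null,
    lvl5: null,
    lvl6: null,
  },
};

console.log(breadcrumb(hit));
console.log(hasContext(hit));
```

For this hit, `hasContext` is false because it is a heading match (`type` is `lvl3`) with `content: null`.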

@glemaitre glemaitre self-requested a review September 2, 2024 09:29
@glemaitre
Member

I'll try to have a look at the crawler. Several of us should also have access to it. I'll try to sort that out.

@glemaitre
Member

@Charlie-XIAO From what I intended in the crawler, lvl0 could serve as a section label to order the data. It is already used this way in the instant search.
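
As a rough sketch of using lvl0 as a section, hits can be grouped by `hierarchy.lvl0` on the client side. The function name `groupBySection`, the fallback label "Other", and the sample hits (including the "Examples" section value) are assumptions for illustration, not what the crawler or instant search currently does.

```javascript
// Sketch: group hits by their top-level section (hierarchy.lvl0).
// Sections appear in the order in which they are first seen.
function groupBySection(hits) {
  const groups = new Map();
  for (const hit of hits) {
    // Fallback label "Other" is an assumption for hits without lvl0.
    const section = (hit.hierarchy && hit.hierarchy.lvl0) || "Other";
    if (!groups.has(section)) {
      groups.set(section, []);
    }
    groups.get(section).push(hit);
  }
  return groups;
}

// Made-up sample hits; only "User guide" is a section value seen in this thread.
const sectionHits = [
  { objectID: "a", hierarchy: { lvl0: "User guide" } },
  { objectID: "b", hierarchy: { lvl0: "Examples" } },
  { objectID: "c", hierarchy: { lvl0: "User guide" } },
  { objectID: "d", hierarchy: {} },
];

const grouped = groupBySection(sectionHits);
for (const [section, items] of grouped) {
  console.log(section, items.map((item) => item.objectID));
}
```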

Regarding providing access, I have to contact Algolia because I'm not the owner of the project.

apiKey: SKLEARN_ALGOLIA_API_KEY,
indexName: SKLEARN_ALGOLIA_INDEX_NAME,
placeholder: "Search the docs ...",
searchParameters: { attributesToHighlight: ["hierarchy.lvl0"] },

Apparently there is a bug that explains why I could not get the desired rendering in the instant box:
algolia/docsearch#2294

drammock pushed a commit to pydata/pydata-sphinx-theme that referenced this pull request Sep 20, 2024
Thanks for the amazing theme! Scikit-learn has migrated to
pydata-sphinx-theme for the main website since version 1.5
(scikit-learn/scikit-learn#29038), and the nice
three-column layout has unlocked many potential improvements to our
website UI/UX-wise. This PR adds scikit-learn to the list of `Other
projects using this theme`.

Although scikit-learn is not one of the earliest adopters of
pydata-sphinx-theme, I'm asking if it can fit into the list of `Featured
projects` as we have quite a few customizations that could be
helpful for other users of the theme. Some of them include:

- [Landing page](https://scikit-learn.org/)
- [API references](https://scikit-learn.org/stable/api/index.html) (a
searchable table on the index page containing all APIs while keeping the
hierarchy in the primary sidebar)
- [Each API
page](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html)
(customizing the secondary sidebar)
- [Gallery
examples](https://scikit-learn.org/stable/auto_examples/release_highlights/plot_release_highlights_1_5_0.html)
(secondary sidebar customization with sphinx-gallery)
- [Platform-specific installation
instructions](https://scikit-learn.org/stable/install.html#installing-the-latest-release)
(CSS customizations based on tabs in sphinx-design)
-
[Dropdowns](https://scikit-learn.org/stable/modules/linear_model.html#references)
(anchor link for dropdown blocks; toggle-all button for Ctrl-F
searching)
- Algolia search (WIP):
scikit-learn/scikit-learn#29666
- Many other CSS customizations

Looking forward to your feedback :)

*BTW some of the projects were not placed in alphabetical order as
required so I reordered a bit.*
gabalafou pushed a commit to gabalafou/pydata-sphinx-theme that referenced this pull request Oct 6, 2024
Projects
Status: In Progress
3 participants