Skip to content

DOC use Algolia for the search bar #29138

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 28 commits into from

Conversation

glemaitre
Copy link
Member

Take over #28478

Give another try to the Algolia docsearch as a search bar.

Copy link

github-actions bot commented May 30, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: d62827d. Link to the linter CI: here

@glemaitre
Copy link
Member Author

Apart from the choice of rendering and color, I think that the PR is ready for a review.

Copy link
Contributor

@Charlie-XIAO Charlie-XIAO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @glemaitre, this looks very nice! I left some minor comments on styling:

Copy link
Contributor

@Charlie-XIAO Charlie-XIAO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might also need to take care of algolia/sphinx-docsearch#35. I've tried several ways and the only one working is:

html_theme_options = {
    # ...
    "navbar_start": ["navbar-logo", "searchbox"],
    "navbar_persistent": [],
    # ...
}

All other places either: (1) has two copies for different screen sizes while algolia can add the search input box only for one, or (2) do not persist when scrolling the page thus not proper for a search box.

Remarks regarding navbar_persistent: sphinx_docsearch overwrites this field forcefully, so can simply remove it (REF).

Also a minor typo in conf.py:

Typo:       # Renable built-in search disable by sphinx-docsearch
Should be:  # Renable built-in search disabled by sphinx-docsearch

@Charlie-XIAO
Copy link
Contributor

Charlie-XIAO commented Jun 2, 2024

An additional question: It seems that the search results in the preview docs link to pages in the stable version doc?

@glemaitre
Copy link
Member Author

An additional question: It seems that the search results in the preview docs link to pages in the stable version doc?

Yes. I think that it will be one of the limitation. The Algolia crawler is configured to crawl the stable website to populate the index. I assume that if I had the dev website into the index, we will double reference for each entry. I think this is more reasonable to have the current setting.

Copy link
Contributor

@Charlie-XIAO Charlie-XIAO left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks @glemaitre for integrating Algolia search into the theme! I like the instant search experience which I think is widely available nowadays but missing for most Python documentation websites. Just some final suggestions (styling issues that I overlooked and some simplification) ;)

@adrinjalali
Copy link
Member

I don't see how to make it apart from having different indices (one index per domain). For sure, we cannot have multi-index in the docsearch open-source plan. Having everything in the same index will just show duplicate information.

hmm, that's a pity. How do other projects deal with this then?

For the rest of points, I would think that we would need to modify the docsearch javascript file or even more elaborate solution.

What's your own feeling about these raised issues?

@glemaitre
Copy link
Member Author

hmm, that's a pity. How do other projects deal with this then?

From the project that I found, either they are doing as it is in this PR or they usually have a single version. The only one that do something good is the bootstrap version. But I'm not sure how to find the way they do it. I have to check but it might be in the config of the index generation.

What's your own feeling about these raised issues?

They are justified but as I don't know how to solve those and I have difficulties to envisage what it takes to get there.

@adrinjalali
Copy link
Member

@Charlie-XIAO what's your take on the thoughts here?

@glemaitre
Copy link
Member Author

hmm, that's a pity. How do other projects deal with this then?

Actually I found some "facets" feature that seems to answer this question. I'll try to make work the crawler with it.

@glemaitre
Copy link
Member Author

glemaitre commented Jun 3, 2024

OK, so for the versioning, you need to use meta-tags such that the crawler use it when you do the crawling.

https://docsearch.algolia.com/docs/required-configuration/#introduce-global-information-as-meta-tags

This is the way bootstrap is doing it:

image

(e.g. source from https://getbootstrap.com/docs/5.3/getting-started/introduction/)

It might be something that we want to ask pydata-sphinx-theme to add since they are already halfway: https://github.com/pydata/pydata-sphinx-theme/blob/b45af6cc66e25399880183484eab70baf1767b96/src/pydata_sphinx_theme/theme/pydata_sphinx_theme/layout.html#L27

@adrinjalali
Copy link
Member

Another thing I realize (while working on a very unrelated PR) is how this is going to work in development environment. Right now, we can search in the local build of the docs and the results point to the local build, and I guess this will break that. It also certainly won't work w/o internet, but I guess that's rarely the case.

At least with slow internet, I tested (with firefox tools) and this still works.

@glemaitre
Copy link
Member Author

Another thing I realize (while working on a very unrelated PR) is how this is going to work in development environment. Right now, we can search in the local build of the docs and the results point to the local build, and I guess this will break that. It also certainly won't work w/o internet, but I guess that's rarely the case.

I you build locally, I don't have the credential for Algolia so the docsearch is deactivated. Since we are still building the sphinx index, it would actually be great to switch to it. I don't know if as a side effect right now this is actually the case.

@Charlie-XIAO
Copy link
Contributor

Apart from instance search, are there other benefits we want from Algolia search? If not probably it's easier to implement instant search based on the current sphinx search framework than to do all those customizations on Algolia docsearch, but I haven't investigated yet.

@glemaitre
Copy link
Member Author

Apart from instance search, are there other benefits we want from Algolia search?

More than instant search, this is more about fuzzy search on my side. But both features are really nice.

@Charlie-XIAO
Copy link
Contributor

Okay then I may still investigate how to customize Algolia first. Not sure if it's possible to add search context but an "all results" page is definitely possible.

@Charlie-XIAO
Copy link
Contributor

Charlie-XIAO commented Jul 27, 2024

I made a small demo on how to create an redirect to an Algolia "all results" page.

2024-07-28.121142.mp4

If we want to switch to Algolia, I do think we need to work on refining the search results though.

@Charlie-XIAO
Copy link
Contributor

Charlie-XIAO commented Jul 27, 2024

And responding to some earlier comments:

Loïc: no search summary context, which I find very useful to decide whether a match is likely worth clicking on or not
Adrin: there is no search context

Actually there is short context, if the result type is "content". But they will be ordered after the ones whose title matches the query. For instance if you search "a", then all pages having "a" in titles will appear before those having "a" in contents, and since the search widget only lists the top ones you can see only those with no context.

There is indeed a transformItems API that can be used to reorder the items, but somehow there are complicated caching and other mechanisms in Algolia search and transformItems only applies per search group (instead of globally), thus sometimes messing things up. With the "all results" page #29138 (comment) perhaps we don't need to take this risk?

Adrin: there is no way to see "more results"

See demo in #29138 (comment); this overwrites the default sphinx search page, and can be linked from the footer of the Algolia search widget in each page. It is implemented with Algolia instantsearch.

Adrin: hitting enter after typing the search opens the first link instead of a page showing search results with context, which I would have preferred

This I haven't found a good way. The Algolia docsearch behavior is that "Enter" is "select", so it might be bad (and hard) to override it (in a decent way). I might be able to bind some other key though, e.g. Alt+Enter for going to the "all results" page.

Loïc: dev search would go to stable website is I understand #29138 (comment). Maybe there is a way to tweak this I don't know.
Adrin: the entries forward to /stable, while they should stay in the same space (/dev, /stable, /1.3, ...)

I haven't investigated and don't have access to the indices, but see Guillaume's comment #29138 (comment)

Adrin: Another thing I realize (while working on a very unrelated PR) is how this is going to work in development environment. Right now, we can search in the local build of the docs and the results point to the local build, and I guess this will break that. It also certainly won't work w/o internet, but I guess that's rarely the case.

We can use an environement variable (e.g. SKLEARN_DOC_USE_ALGOLIA) as a switch. If it is set to "1" (in the CI) we change those navbar setups, introduce the JS and CSS, override the search page, etc. otherwise (the variable is not set or "0") we use the default local sphinx search.

@glemaitre
Copy link
Member Author

Thanks @Charlie-XIAO. The full search page is already something that would have take me ages and a lot of frustration to do :)

With this addition, it seems that we are getting closer to the expectations.

@adrinjalali
Copy link
Member

This I haven't found a good way. The Algolia docsearch behavior is that "Enter" is "select", so it might be bad (and hard) to override it (in a decent way). I might be able to bind some other key though, e.g. Alt+Enter for going to the "all results" page.

Ideally by default nothing would be "selected" really, and enter w/o arrow keys would show the page. If the user "selects" something explicitly with arrow keys, then it makes sense for enter to show that page.

@Charlie-XIAO
Copy link
Contributor

Charlie-XIAO commented Aug 1, 2024

Ideally by default nothing would be "selected" really, and enter w/o arrow keys would show the page. If the user "selects" something explicitly with arrow keys, then it makes sense for enter to show that page.

I agree but it's kind of beyond my knowledge how to achieve this - will try to investigate a bit more.

BTW @glemaitre do you mind my directly pushing my updated version into this PR?

@glemaitre
Copy link
Member Author

BTW @glemaitre do you mind my directly pushing my updated version into this PR?

Absolutely not :) Feel free to push any improvement. I wanted just to spin something out with the tool.

@Charlie-XIAO
Copy link
Contributor

I'm not sure why CircleCI failed to checkout: https://app.circleci.com/pipelines/github/scikit-learn/scikit-learn/60621/workflows/0d66d144-1d21-4da8-9094-190a4927aab0/jobs/282498

error: cannot lock ref 'refs/remotes/origin/pull/29138': 'refs/remotes/origin/pull/29138/head' exists; cannot create 'refs/remotes/origin/pull/29138'
From github.com:scikit-learn/scikit-learn
 ! [new ref]             refs/pull/29138/head -> origin/pull/29138  (unable to update local ref)
failed to fetch repo: exit status 1

@adrinjalali
Copy link
Member

That's a very cryptic error @Charlie-XIAO . I re-ran the workflow, but same issue. You could open a new PR and see if that helps (we can close this one then)

@Charlie-XIAO
Copy link
Contributor

So CI is good in #29666, I think we can move the discussion to there so we can see the built artifacts.

@adrinjalali
Copy link
Member

Closing to keep the discussions in the other PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Development

Successfully merging this pull request may close these issues.

4 participants