Large search indexes can cause Lunr to freeze the page while it creates the index. #859
That seems reasonable to me. Presumably the problem occurs due to the large number of pages. On a smaller site, this would not be a problem as the wait time would be much less.
We have 50 files with 18,936 lines of Markdown. I don't know if that's considered large, but even for smaller projects running on slower servers it would provide a better user experience. Thanks for looking into it!
It appears that the RTD theme includes the default content "Sorry, page not found." on the search results page (see search.html#L18). However, the search tool will also replace any content with
This is quite interesting. I assumed the delay was the length of time it takes to download the search index (about 0.5mb). However, there is a long delay after it is downloaded. So is the delay while lunr.js is processing the index? Meaning the slowness is probably more dependent on users' machines than on the server. Looking at the Lunr change log, there are a few performance improvements in
I just cloned and tested with the DataTorrent docs locally, which confirms the slowness is in the JavaScript, as downloading is almost instantaneous. Even worse, my browser (Chrome) is locked up by the JavaScript for a good few seconds, so I can't actually click anything. I did some very rough measurements to try and find what was making it so slow. My timings below are all rounded because I don't have confidence in my measuring. The loading of the JSON into Lunr is the culprit. This for loop takes around 8(!!) seconds. More specifically it is the call to I don't have time to keep looking now, but we need to try to determine what makes these so slow. Is it just because these are long pages? Is there something that causes an issue with Lunr? I tested with both Lunr 0.5.7 and 0.6.0 and got similar results. 0.6.0 might actually be a little slower, but I didn't do enough tests to be confident about that.
I noticed the same behavior when looking into this as well, although I didn't dig any deeper. The point is that I can confirm this behavior is not isolated.
I don't see any reason why we can't do this now. I'll push a PR when I get a few minutes unless someone beats me to it.
The search tool will replace the text with "No results found" if that is the case. No reason to display it here. Also, the default text displays while the search is running (especially if the search takes a long time), so it might as well be accurate and indicate that the search is running. Addresses #859.
The loops in Index.add can easily be called hundreds of thousands of times for large documents. Larger documents can perform particularly badly[1] under Chrome. Removing the usage of reduce and filter in favour of inline native for loops can lead to a 20-30% performance increase in Chrome. Firefox also sees a similar but much smaller improvement (~5%). [1]: mkdocs/mkdocs#859 (comment)
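As an illustrative sketch only (this is not the actual lunr.js source; the function and property names are hypothetical), the kind of rewrite that commit describes looks roughly like this:

```javascript
// Hypothetical "before": a filter + reduce pipeline. Each call allocates an
// intermediate array and invokes a callback per element, which adds up when
// run hundreds of thousands of times.
function totalScoreReduce(tokens) {
  return tokens
    .filter(function (t) { return t.score > 0; })
    .reduce(function (memo, t) { return memo + t.score; }, 0);
}

// Hypothetical "after": the same logic as a single inline native for loop.
// One pass, no callbacks, no intermediate array.
function totalScoreLoop(tokens) {
  var total = 0;
  for (var i = 0; i < tokens.length; i++) {
    if (tokens[i].score > 0) {
      total += tokens[i].score;
    }
  }
  return total;
}
```

Both functions return the same result; the loop form simply avoids the per-element callback overhead that Chrome's engine handled poorly at the time.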
I investigated further. Firefox is much faster than Chrome. I managed to improve the performance under Chrome a bit. See: olivernn/lunr.js#208
I've pushed @d0ugal's changes in 0.7.0 of lunr, so there should hopefully be some improvements from that change. Without knowing anything of how mkdocs makes use of lunr, I have a few suggestions that could improve the performance and user experience further:
As an example, the example lunr app makes use of loading a pre-built index. I think a member of the Angular.js team wrote up how they improved performance using web workers too.
Our usage of Lunr is very simple. I think all the relevant code is less than 100 lines. It is a testament to Lunr that it has gotten us this far before we hit an issue like this. Thanks for the pointers, it sounds like those would both be valuable additions. I'll look into building the index, I think that would be the biggest win.
The main issue for us is that we are Python based, so pre-building the index would be tricky. We would need to shell out to Node and call a file. I would be okay with that if it gracefully fell back to the current approach when Node isn't available. However, this sort of functionality might be better suited for a plugin (so another candidate for #206). That maybe means we should look into web workers for now and revisit this later.
I get the same issue with only a few files. Every time I visit my mkdocs-based web document, the page is frozen. The Firefox warning dialog appeared several times, which allowed me to stop lunr.js. I found that the search_index.json response was about 1.2MB. It is fast to download but quite slow to process in JavaScript.
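A rough, self-contained way to check where the time goes is to time JSON.parse on a synthetic payload of a similar size and shape (the field names below are illustrative, not MkDocs' actual index structure):

```javascript
// Build a synthetic payload roughly shaped like a search index: an array of
// page objects with a location, title, and a body of repeated text.
var docs = [];
for (var i = 0; i < 2000; i++) {
  docs.push({
    location: 'page-' + i + '/',
    title: 'Page ' + i,
    text: new Array(100).join('some repeated body text ')
  });
}
var payload = JSON.stringify({ docs: docs });

// Time the parse itself.
var start = Date.now();
var parsed = JSON.parse(payload);
var parseMs = Date.now() - start;
```

In measurements like this, the parse itself tends to be cheap even for megabyte-scale JSON; consistent with the earlier timings in this thread, the expensive step is feeding the parsed documents into lunr's Index.add, not JSON.parse.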
I had wondered if there was a way to pre-build the index in Python. This is what I found: The index is just JSON, so technically it should be possible to pre-build the index and serve that rather than the JSON file we serve today. However, there is no published spec for the index, although it is versioned.

The index stores the version of lunr.js which was used to create it, and an error is raised if a different version of lunr.js attempts to load the index. This suggests that backward-incompatible changes between versions are possible/expected. However, as the lunr.js lib is shipped with MkDocs, we have full control over which version is being used.

The tricky part is the lack of a spec. Someone would have to go through the JS code and re-implement the index builder in Python, and transitioning from one version to the next has the potential to require a similar amount of work. I suppose if a third party created such a lib, it might make sense to use it (or via a Plugin if/when MkDocs adds support), but until then, calling out to Node is the more sensible approach. But even that feels more like something for a Plugin. In the end, using webworkers to avoid a browser freeze is probably the most sensible short-term solution.
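The version guard described above can be sketched as follows. This is a hedged illustration of the behaviour, not lunr's actual loader code; the field name, error text, and version constant are all assumptions:

```javascript
// Assumed version shipped with the site; a real build would read this from
// the bundled lunr.js library.
var LUNR_VERSION = '0.7.0';

// Sketch of loading a serialised index: the stored version must match the
// running library, otherwise loading is refused. A real loader would also
// rebuild lunr's internal data structures from the serialised fields.
function loadSerialisedIndex(serialised) {
  if (serialised.version !== LUNR_VERSION) {
    throw new Error(
      'index version mismatch: built with ' + serialised.version +
      ', running ' + LUNR_VERSION
    );
  }
  return serialised;
}
```

The practical consequence for MkDocs is the one noted above: because MkDocs ships its own copy of lunr.js, the builder and the loader can always be pinned to the same version.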
Maybe I don't understand it, but regards,
@rickpeters the The ideal solution would be to have
Thanks very much for this clarification!
The lunr.js library includes an example script which could be adapted and run under Node.js to build the index. You would need to call out to Node in your build process (or call JavaScript from Python some way). That would give you a prebuilt index. However, the current lunr.js calling code in MkDocs would need to be updated to accept a prebuilt index (at approx. search.js#L32) similar to this. That code in search.js would need to be modified for a webworker as well. The code would need to be placed in a webworker with a fallback for when a webworker is not available (in some older browsers).
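A minimal sketch of that fallback, assuming a hypothetical worker script name and an inline builder function (neither is the actual MkDocs code):

```javascript
// Feature-detect Web Worker support. If available, hand index construction
// to a worker so the main thread (and the page) stays responsive; otherwise
// fall back to building inline, accepting the UI freeze on large sites.
function initSearchIndex(buildIndexInline) {
  if (typeof Worker !== 'undefined') {
    var worker = new Worker('search-worker.js'); // hypothetical script name
    worker.postMessage({ action: 'build-index' });
    return 'worker';
  }
  buildIndexInline(); // old behaviour: blocks the UI while the index builds
  return 'inline';
}
```

The worker script would build the lunr index off the main thread and post results back via postMessage; only the feature-detection and fallback shape is shown here.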
FYI, I have succeeded in getting the search index to build here: 3009a3a. It was relatively easy using the PyExecJS lib. Still not usable though as search.js has not yet been modified. You can follow my progress here: master...waylan:search
So I thought I had a working solution for pre-building the index (here: master...waylan:search), but I'm getting weird errors I can't make sense of on Windows. It works fine on Linux and I don't have access to a Mac for testing right now. If anyone wants to help, please test (my branch) on your local machine and report back.
The require path must use forward slashes:

```python
lunr_path = os.path.join(
    os.path.dirname(os.path.abspath(__file__)),
    'assets/search/mkdocs/js/lunr.min.js'
).replace('\\', '/')
```
Thanks @facelessuser. That solved one problem. I have Node installed on all my machines and it is working now. However, as I understand it, PyExecJS does not need any dependencies installed on Windows or OS X as it can use JScript or Apple JavaScriptCore, each of which is installed by default on its respective system (Linux will always need an additional dependency (one of Node.js, PyV8, SpiderMonkey, PhantomJS, ...), but that seems like an acceptable compromise). When I force the use of JScript, I'm getting an error I can't make sense of (
Maybe I don't know enough about the details but could the index, once created be saved into localstorage and then accessed from any subsequent page load without requiring it to be built again and again? |
@YoungElPaso, that feature would need to be added to the library we use (olivernn/lunr.js), and if I recall correctly, such a feature request was made in the past and denied for good reason. I don't recall the details now, but perhaps a search of that project would answer your question better.
@YoungElPaso @waylan You can achieve this without any modifications to lunr. There is good support for serialising an index to JSON and then loading an index from JSON; this will be significantly quicker than regenerating the index. Perhaps something like this:

```javascript
if (existsInLocalStorage()) {
  var idx = lunr.Index.load(getJSONFromLocalStorage())
} else {
  var idx = buildIndexFromScratch()
  storeIndexInLocalStorage(JSON.stringify(idx))
}
```
Before we jump down the "store the index in LocalStorage" rabbit hole, we need to address the creation of the index in the first place. Even using LocalStorage, the first visit to a site would be painfully slow and unresponsive for the user (on a large site). You have lost your user on their first visit and they probably won't be coming back. Until we have that problem fixed, there is no point in exploring how to store the index across multiple visits. And if we do fix that problem, then maybe we don't need to worry about storing the index across visits. My point is, even if LocalStorage is possible, it doesn't solve the problem; it just addresses one of the symptoms. As an aside, my current roadmap for this issue is as follows:
Note that I only have item 1 completed. I started item 4 and realized that the above would be a better approach, so I stopped and moved my attention to the Plugin API. The great thing about using a Plugin is that others can create their own search which starts with a different set of assumptions (or different backend tools) to better meet their needs.
@waylan completely agree that pre-building is the best option here. Was just saying that there is nothing in lunr preventing a serialised index being stored client side. The tradeoff made with pre-building the index and serving it to the client is that, even gzipped, it can be quite large. That said, if it was me, I'd still opt for serving a pre-built index. If there is anything I can help with from lunr then please do let me know, I'm more than happy to help.
What would be ideal is a Python library that builds the index. Requiring our users to have both Python and Node installed is a little much (our Python script calls out to a Node script to build the index). As it happens, many users have both installed anyway, but I'm not comfortable making it a hard requirement and having the fallback results in two very different user experiences. Of course, I don't expect you to go build a Python clone of your JavaScript lib, but a publicly published spec of your index file format could go a long way toward someone else creating a Python lib (or Ruby, or ... whatever other language static site generators are written in). |
This is similar to an approach I am hoping to move towards with a new version of lunr, documenting (JSON Schema?) the serialisation format would open up the possibility of using the JavaScript version as a client, with the actual indexing etc being done in any other language that can generate the serialised index. Python would probably be an excellent choice for this given the number of NLP and IR libraries available. My Python is non-existent, so I'm unlikely to be able to produce anything production ready myself, but by documenting the format others could certainly produce an implementation that would be easier to integrate into MkDocs (or other tools). |
This builds an actual lunr.js search index via Node.js in addition to the JSON array of page objects previously used to build the index client side. Note that both are needed, as the array of page objects is used to display search results. If Node fails (not installed or errors out), then fall back to the old behavior by creating an empty index. If the client (browser) receives an empty index, then it builds the index from the array of pages as previously, which is fine for most (small) sites. Large sites will want to make sure Node is installed to prevent the browser from hanging during index creation. Partially addresses mkdocs#859.
Just a note: This may be a perfect case for
Fixes mkdocs#1218, fixes mkdocs#1127, and partially addresses mkdocs#859.
FYI to everyone following this: A refactor of search is available for review in #1418, which addresses all of the concerns raised here and more.
* Use a web worker in the browser with a fallback (fixes #859 & closes #1396).
* Optionally pre-build search index (fixes #859 & closes #1061).
* Upgrade to lunr.js 2.x (fixes #1319).
* Support search in languages other than English (fixes #826).
* Allow the user to define the word separators (fixes #867).
* Only run searches for queries of length > 2 (fixes #1127).
* Remove dependency on require.js, mustache, etc. (fixes #1218).
* Compress the search index (fixes #1128).
When using the readthedocs theme, search shows "no results" for a few seconds while the search is being performed, and then results are loaded. However, a few seconds are enough to convince users that nothing was found, and they move on.
For example, see http://docs.datatorrent.com. Try searching for "test" and you will see the following for a while:
Perhaps an indicator of search in progress can be added?