Infinite loop/hang when a query returns an index for an object that isn't present #1994

@techlover10

Description

  • [✓] Tested with the latest Haystack release
  • [✓] Tested with the current Haystack master branch

When a search query returns an indexed result for which there is no associated model instance, for whatever reason, Haystack can hang indefinitely.

Expected behaviour

The query should gracefully omit the stale index.

Actual behaviour

The result cache invariant gets broken and the query gets stuck in an endless loop.

Steps to reproduce the behaviour

  1. Create some number of objects and index them.
  2. Delete or otherwise hide one of the objects from being returned in search results, without updating the search index.
  3. Perform a search query whose results include the stale index of the hidden or removed object (a minimal reproduction sketch follows this list).
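
For reference, a minimal sketch of the steps above. It assumes a hypothetical myapp.models.Note model with a registered SearchIndex, the Solr backend, and Haystack's default BaseSignalProcessor (which does not update the index on delete); run it from ./manage.py shell.

    from myapp.models import Note  # hypothetical model with a matching SearchIndex
    from haystack.query import SearchQuerySet

    # 1. Create some objects, then index them
    #    (e.g. ./manage.py rebuild_index --noinput).
    for i in range(30):
        Note.objects.create(title="note %d" % i, body="searchable text")

    # 2. Delete one object without updating the search index; with the default
    #    BaseSignalProcessor the stale document stays in Solr.
    Note.objects.get(title="note 10").delete()

    # 3. Iterate a query that matches the stale document; with the bug present,
    #    filling the result cache never terminates.
    for result in SearchQuerySet().models(Note).filter(content="searchable"):
        print(result.pk)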

I was able to identify and resolve the issue in our own internal fork, but I wanted to report it nonetheless because it looks like a clear logic error in the code. At line 193 of query.py, the "else" case breaks the _result_cache invariant of not having None values in the middle of the results.

            else:
                # No objects were returned -- possible due to SQS nesting such as
                # XYZ.objects.filter(id__gt=10) where the amount ignored are
                # exactly equal to the ITERATOR_LOAD_PER_QUERY
                del self._result_cache[: len(results)]
                self._ignored_result_count += len(results)
                break

What this means is that, as you page through results, if a single result fails to match back to a model, this else block kicks in and deletes the first len(results) items from self._result_cache. That not only removes valid results but also mutates self._result_cache so that cache_start and cache_end, which were computed before calling self.post_process_results(results), are no longer valid.

I did not submit a pull request because it is unclear to me what the purpose of this block even is, so I don't really know how to fix it properly (the git blame appears to cite a broken test case as the reason this code was introduced many years ago). Our internal solution was to replace the entire block with code that simply increments self._ignored_result_count (there is no reason it should increment by len(results) in an iterator over results, as far as I can tell) and then continues instead of breaking.
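
For illustration, a rough sketch of that internal replacement in place of the block quoted above. The surrounding fill loop in query.py is assumed; this mirrors the change described here, not a tested upstream patch:

            else:
                # Workaround sketch: count the unmatched result as ignored and keep
                # filling the cache, instead of deleting cached entries and breaking
                # out of the loop.
                self._ignored_result_count += 1
                continue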

As it stands, the result of this mutation is that self._result_cache can end up looking something like this:
[<model>, <model>, ..., <model>, None, None, None, None, None, None, None, None, None, None, <model>, <model>, ..., <model>, None, None, ..., None]

This results in an infinite loop while filling the cache: setting current_cache_max to self._result_cache.index(None) yields an index that is not at the end of the cached results, and therefore an index for which queries to the search backend return a non-empty page of results. Haystack then keeps trying to cache "more" results indefinitely, but never completes, because it is now stuck in a state where the "next page" of results is never empty.
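
As a standalone toy illustration (not Haystack code) of why the loop never terminates: the resume point derived from index(None) stays pinned at the hole in the middle of the cache, so every pass asks the backend for the same non-empty page again.

    # A result cache with a None stuck in the middle, as in the example above.
    result_cache = ["r0", "r1", None, "r3", "r4"]

    for attempt in range(3):  # bounded here; the real fill loop is not
        current_cache_max = result_cache.index(None)  # always 2 -- never advances
        print("attempt", attempt, "-> refill from offset", current_cache_max)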

Configuration

  • Operating system version: AlmaLinux 8.6
  • Search engine version: Solr 6.6.6
  • Python version: 3.9.7
  • Django version: 3.2.7
  • Haystack version: 3.3.0
