Skip to content

Revert changes to CursorPagination that caused serious performance regression #9359

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
kylebebak opened this issue Apr 2, 2024 · 6 comments

Comments

@kylebebak
Copy link

kylebebak commented Apr 2, 2024

Problem

Sorry for not going through the steps in the checklist, I don't have time right now, but I think this issue is important. #8912 introduced a serious performance regression in cursor pagination if you're using Postgres (and likely other DBs).

Let's say you're using a created_at column for cursor pagination. The code changes in the PR I linked to add OR created_at IS NULL to the WHERE clause in the SQL query generated by the ORM, even if created_at is not a nullable column.

In other words, a query like select * from item where created_at < '...' order by created desc limit 100 becomes select * from item where created_at < '...' OR created_at IS NULL order by created desc limit 100.

So why is this a problem? If you're doing cursor pagination on created_at you have a b-tree index on created_at. With the first query, the query planner uses this index, which lets the DB fetch records from a huge table without breaking a sweat. However, with the second query, the query planner doesn't use this index. A query against a table with e.g. 100m records might go from taking 10ms to taking 10s. We just upgraded DRF and this happened to us.

The example above is the standard use case for cursor pagination. CursorPagination now performs terribly for its most common use case. Cursor pagination should almost never use a nullable column, which is why the original authors of this code didn't write it the way it's written now. From the DRF docs:

Cursor based pagination requires that there is a unique, unchanging ordering of items in the result set. This ordering might typically be a creation timestamp on the records, as this presents a consistent ordering to paginate against.

I'm guessing the author of the above PR was not a "database performance person", but cursor pagination is largely motivated by performance concerns... Also from the docs:

With extremely large datasets pagination using offset-based pagination styles may become inefficient or unusable. Cursor based pagination schemes instead have fixed-time properties, and do not slow down as the dataset size increases.

Solution

The best way to fix this issue is roll back the changes introduced in that PR. IMO it would also be good to get more eyes on code changes that affect "performance-critical" code, like CursorPagination.

@auvipy
Copy link
Member

auvipy commented Apr 27, 2024

thanks for the detailed report of the issue. we should revert this.

@realsuayip
Copy link
Contributor

@auvipy Could you consider releasing 3.15.2 for this one?

@auvipy
Copy link
Member

auvipy commented May 26, 2024

@tomchristie is it possible to cut a minor release?

@alexandre-paroissien
Copy link

Hi team, could 3.15.3 be released ? I think this is the last breaking change to be reverted, so releasing 3.15.3 would allow migrating from 3.14, thanks

@tomchristie
Copy link
Member

My understanding is that this has been resolved in 3.15.2.
See item #9381 in the release notes.

Happy to re-open this if I've misunderstood something here.

@alexandre-paroissien
Copy link

ok I see, then there might be another issue, as I have endpoint going from 10 queries to 50+ queries in the unit tests, when upgrading from 3.14.0 to 3.15.2, I dont have a minimal reproducible example to share now, will try to make one

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants