-
Notifications
You must be signed in to change notification settings - Fork 894
chore: fix concurrent CommitQuota
transactions for unrelated users/orgs
#15261
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from 1 commit
dce750d
4fd9760
6215603
9ccd0b2
784a98f
5f021ad
426eeb6
b69dd0f
ec8f799
f613f31
746203d
c0d05db
245f6df
422e694
685b21e
33dbd92
a99882a
6ce8bfe
3a9270d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
- Loading branch information
There are no files selected for viewing
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -18,14 +18,13 @@ INNER JOIN groups ON | |
WITH latest_builds AS ( | ||
SELECT | ||
DISTINCT ON | ||
(wb.workspace_id) wb.id, | ||
wb.workspace_id, | ||
(wb.workspace_id) wb.workspace_id, | ||
wb.daily_cost | ||
FROM | ||
workspace_builds wb | ||
-- This INNER JOIN prevents a seq scan of the workspace_builds table. | ||
-- Limit the rows to the absolute minimum required, which is all workspaces | ||
-- in a given organization for a given user. | ||
-- This INNER JOIN prevents a seq scan of the workspace_builds table. | ||
-- Limit the rows to the absolute minimum required, which is all workspaces | ||
-- in a given organization for a given user. | ||
INNER JOIN | ||
workspaces on wb.workspace_id = workspaces.id | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ok, so in the query plan you have now, it's doing this nested loop where it finds the workspace IDs, then for each workspace ID it's doing this bitmap query:
That is, it first scans the index to find the pages to load, then scans the pages with the Do you know whether this results in page locks for the transaction, or tuple locks (I'm assuming these are row-level locks)? Page locks have a greater likelihood of catching unrelated transactions. And, some suggestions:
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
I was under the impression it was row-level locks, but I admit that was a result of my experiments with very few rows. The docs seem to imply it could take out larger locks.
Originally there was a suspicion we were getting this error from different transactions interring with I thought about moving I think this index + sorting by build number would be a large win: As I look at this query more, I don't see why we need to inner join If we add the index According to the quote I pulled though, is the goal just to reduce the number of rows touched to prevent the lock from having to go more "coarse" to the page lock? The text implies the behavior depends on the memory availability. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I agree that we don't need to join to And yeah, my understanding is that with page locks, updating quota for an unrelated user could cause a serialization error if the builds happen to share a page. So it's undesirable to use page locks if we could get away with finer grained locking. I realize that postgres can use page locks for memory reasons, and maybe there isn't anything we can do about that, but I'm also wondering whether it automatically uses page locks when it does a bitmap query, rather than tuple locks if we were able to use the index. |
||
WHERE | ||
|
@@ -39,7 +38,7 @@ SELECT | |
coalesce(SUM(daily_cost), 0)::BIGINT | ||
FROM | ||
workspaces | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Looks like it's still doing a Seq scan on
Doesn't that mean any update to the workspaces table will still cause a serialization error? However! I think the query planner takes into account cardinality of the columns relative to the rows in the table, so it might make a different plan in a "real" deployment. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, we still do a seq scan on the workspaces.
The serializable lock only affects tables (rows) that it written to. The There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. How do you square this with the observation that a Seq Scan on Maybe it depends on the number of workspaces and the assumed cardinality. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Experimentally, I am running this:
The TX mode that is important here is
What I see my change doing is
And now does:
So we went from a full The With my query changes raw
Before my changes raw
Disclaimer: I recognize the docs suggest these lock patterns can change with table size and memory availability. This is a very small dataset. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Interesting that serializable transactions only interfere with other serializable transactions --- OK for now, but I feel like this puts us on shaky ground. Today we insert new builds with repeatable read isolation, but do not set the Or, if we ever introduce a new update query on Not necessarily for this PR, but I think we should still aim to remove the Seq Scan on There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Yes.
I agree. I will make a follow up PR (or issue if it takes to long) to address all these extra concerns. In this PR I did disable retries, so a unit test is more likely to fail if we hit this (would be a valid flake). I do not see a way to implement forward thinking protections if someone adds a new There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Just to clarify, I specifically meant the error There is another error you can get that is not between 2 serializable transactions.
|
||
JOIN latest_builds ON | ||
INNER JOIN latest_builds ON | ||
latest_builds.workspace_id = workspaces.id | ||
WHERE | ||
NOT deleted AND | ||
|
Uh oh!
There was an error while loading. Please reload this page.