Improve how pruning estimates how much of a table will be removed #6012

Merged
merged 5 commits into from
May 20, 2025

Collaborator

@lutter lutter commented May 15, 2025

Pruning can use one of two strategies to remove unneeded rows from a table: deleting or rebuilding. Rebuilding is better when we remove more than 50% of a table; otherwise, deletion should be used.

Currently, we estimate how much of a table will be removed by taking the percentage of historical blocks we prune, weighted by the ratio of entity versions to entities. This effectively assumes that entity changes are evenly distributed across blocks. In practice, it leads to serious overestimation of how much of a table will be removed. In one case, the estimate was that 58% of a table would be removed, when in reality only 2% would be.
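To make the effect of an overestimate concrete, here is a minimal sketch of the strategy choice and of one possible reading of the old estimate. All names (`PruneStrategy`, `choose_strategy`, `naive_removal_ratio`) and the exact weighting formula are assumptions for illustration, not the actual graph-node code; only the 50% rule comes from the description above.

```rust
/// Hypothetical enum for the two pruning strategies named in the description.
#[derive(Debug, PartialEq)]
pub enum PruneStrategy {
    Rebuild,
    Delete,
}

/// The 50% rule from the description: rebuild only when more than half
/// of the table is expected to be removed.
pub const REBUILD_THRESHOLD: f64 = 0.5;

/// Pick a strategy from an estimated removal ratio in the range 0.0..=1.0.
pub fn choose_strategy(removal_ratio: f64) -> PruneStrategy {
    if removal_ratio > REBUILD_THRESHOLD {
        PruneStrategy::Rebuild
    } else {
        PruneStrategy::Delete
    }
}

/// ASSUMPTION: one plausible form of the old estimate. It multiplies the
/// fraction of history being pruned by the fraction of rows that are
/// historical versions (1 - entities / versions). This is only accurate
/// if entity changes are spread evenly across blocks.
pub fn naive_removal_ratio(
    pruned_blocks: u64,
    total_blocks: u64,
    entities: u64,
    versions: u64,
) -> f64 {
    if total_blocks == 0 || versions == 0 {
        // Nothing to prune, or an empty table.
        return 0.0;
    }
    let history_fraction = pruned_blocks as f64 / total_blocks as f64;
    let historical_rows = 1.0 - entities as f64 / versions as f64;
    history_fraction * historical_rows
}
```

With the figures from the case above, `choose_strategy(0.58)` selects a rebuild, while the true ratio, `choose_strategy(0.02)`, selects deletion, so the overestimate flips the decision to the more expensive strategy.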

With this change, we use database statistics to produce estimates that reflect reality much more closely.

@lutter lutter force-pushed the lutter/prune-estimate branch from e4b8b3c to bb2c164 Compare May 16, 2025 00:05
@lutter lutter requested a review from zorancv May 16, 2025 00:06
@lutter lutter self-assigned this May 16, 2025
@lutter lutter force-pushed the lutter/prune-estimate branch from bb2c164 to 6ee64b4 Compare May 16, 2025 00:22
Contributor

@zorancv zorancv left a comment

I voiced a single concern, though I can't judge how relevant it is. Other than that, the improvement in this PR seems really good and important.

@lutter lutter force-pushed the lutter/prune-estimate branch from 6ee64b4 to 6cd68d8 Compare May 20, 2025 22:26
We are not using those right now and there's therefore no point in storing
them
@lutter lutter merged commit 553b6ac into master May 20, 2025
6 checks passed
@lutter lutter deleted the lutter/prune-estimate branch May 20, 2025 23:06
2 participants