Async pruning #5949

Merged
merged 9 commits into from
Apr 21, 2025

@lutter (Collaborator) commented Apr 16, 2025

Until now, the initial prune of a subgraph had to be done synchronously in a terminal. This PR makes it possible to change the amount of history a subgraph should keep and have the initial and all subsequent prunes happen in the background.

To that end, this PR changes the graphman commands used for pruning. graphman prune now has subcommands:

  • graphman prune run behaves as graphman prune did before this PR
  • graphman prune set just changes how much history a subgraph keeps. The actual pruning happens in the background
  • graphman prune status shows the details of pruning runs

Most of this PR is concerned with keeping the information that graphman prune status shows in the database. The output of the status command looks something like this:

prune Qm...[19] (run #5)
     range: 12834375 - 12835766 (1391 blocks, should keep 500 blocks)
   started: 2025-04-12 18:43:15-07:00
  finished: 2025-04-16 14:39:29-07:00
  duration: 3d 19h 56m

            table              |         status         |   rows   | batch_size  | duration
-------------------------------+------------------------+----------+-------------+---------
circular_buffer                | r/done               ✓ |        8 |         200 |    102ms
financials_daily_snapshot      | d/done               ✓ |      -45 |       20000 |     17ms
poi2$                          | r/copy_final    ( 37%) |       32 |       20000 |  10h 55m
reward_pool_info               | d/done               ✓ |       -4 |       20000 |     17ms
usage_metrics_daily_snapshot   | d/done               ✓ |      -43 |       20000 |     20ms
vault                          | d/done               ✓ |      -62 |        1600 |     43ms
yield_aggregator               | r/done               ✓ |       12 |         200 |     90ms
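
For anyone who wants to post-process output like the sample above, here is a minimal Python sketch. It is not part of this PR or of graphman: `TableStatus`, `parse_row` and `format_duration` are hypothetical names, and the column layout is assumed to match the sample exactly. The description does not spell out what the `r/` and `d/` prefixes mean, so the sketch keeps them as opaque codes.

```python
import re
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class TableStatus:
    table: str
    code: str               # prefix before "/", kept as an opaque code
    phase: str              # text after "/", e.g. "done" or "copy_final"
    done: bool              # True when the status cell ends with a check mark
    percent: Optional[int]  # progress percentage, if the cell shows one
    rows: int
    batch_size: int
    duration: str


def parse_row(line: str) -> TableStatus:
    """Parse one data row of the per-table section of the sample output."""
    cells = [cell.strip() for cell in line.split("|")]
    if len(cells) != 5:
        raise ValueError(f"expected 5 columns, got {len(cells)}")
    table, status, rows, batch_size, duration = cells
    code, _, phase = status.split()[0].partition("/")
    match = re.search(r"\(\s*(\d+)%\)", status)
    return TableStatus(
        table=table,
        code=code,
        phase=phase,
        done=status.endswith("✓"),
        percent=int(match.group(1)) if match else None,
        rows=int(rows),
        batch_size=int(batch_size),
        duration=duration,
    )


def format_duration(started: str, finished: str) -> str:
    """Render the elapsed time in the `3d 19h 56m` style, dropping seconds."""
    delta = datetime.fromisoformat(finished) - datetime.fromisoformat(started)
    minutes = int(delta.total_seconds()) // 60
    days, minutes = divmod(minutes, 24 * 60)
    hours, minutes = divmod(minutes, 60)
    return f"{days}d {hours}h {minutes}m"
```

Applied to the sample, `format_duration("2025-04-12 18:43:15-07:00", "2025-04-16 14:39:29-07:00")` reproduces the `3d 19h 56m` shown in the header, and the block count in the header is the difference of the range endpoints (12835766 - 12834375 = 1391).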

graph-node stores the details of the initial pruning run and of the most recent ongoing pruning runs.
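
The retention rule described above (keep the initial run, plus the latest runs) can be sketched as follows. This is only an illustration in Python, not graph-node's code: the description does not say how many recent runs are kept or how they are stored, so `runs_to_keep` and its `keep_recent` parameter are hypothetical, and the sketch assumes the initial run is the lowest-numbered one.

```python
from typing import Iterable, List


def runs_to_keep(run_numbers: Iterable[int], keep_recent: int) -> List[int]:
    """Return the run numbers whose details would be retained.

    Assumption: the lowest run number is the initial prune; `keep_recent`
    is a made-up knob for how many of the latest runs to keep.
    """
    ordered = sorted(set(run_numbers))
    if not ordered:
        return []
    recent = ordered[-keep_recent:] if keep_recent > 0 else []
    # The initial run is always retained, even when it is not among the recent ones.
    return sorted({ordered[0], *recent})
```

For example, with runs 1 through 5 and `keep_recent=2`, the sketch retains runs 1, 4 and 5.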

@lutter requested a review from zorancv April 16, 2025 21:50
@lutter self-assigned this Apr 16, 2025
@lutter force-pushed the lutter/prune-status branch from 482acc8 to 511070a April 16, 2025 23:05

@zorancv (Contributor) left a comment

A couple of small remarks but otherwise looks great! With all that additional information stored one would wish that pruning could continue after restart of graph-node, but I agree it's too dangerous, and probably not so useful.

@lutter force-pushed the lutter/prune-status branch 2 times, most recently from fc8ef94 to 0397ba0 April 18, 2025 14:58
@lutter force-pushed the lutter/prune-status branch from 0397ba0 to 28fa444 April 21, 2025 15:35
@lutter merged commit 28fa444 into master Apr 21, 2025
6 checks passed
@lutter deleted the lutter/prune-status branch April 21, 2025 17:03