Skip to content

Conversation

dAdAbird
Copy link
Member

Before this commit, we XLogged the provider ID (keyringId) of the old key. Yet, we then attempt to fetch the new key from the old provider during the Redo, which obviously fails and crashes the recovery. So the next steps lead to the recovery stalemate:

  • Create new provider (with new destination - mount_path, url etc).
  • Create new server/global key.
  • Rotate key.
  • <Crash!>

This commit fixes it by Xlogging the new key's provider ID.

For: PG-1895

@dAdAbird dAdAbird changed the title Fix XLogging of rotatated key Fix XLogging of rotated key Aug 26, 2025
@dAdAbird dAdAbird requested a review from AndersAstrand August 26, 2025 13:39
@codecov-commenter
Copy link

codecov-commenter commented Aug 26, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 82.22%. Comparing base (415cb8d) to head (6e47930).
⚠️ Report is 8 commits behind head on release-17.5.3.

❌ Your project status has failed because the head coverage (82.22%) is below the target coverage (90.00%). You can increase the head coverage or adjust the target coverage.

Additional details and impacted files
@@               Coverage Diff               @@
##           release-17.5.3     #552   +/-   ##
===============================================
  Coverage           82.22%   82.22%           
===============================================
  Files                  25       25           
  Lines                3246     3246           
  Branches              512      512           
===============================================
  Hits                 2669     2669           
  Misses                465      465           
  Partials              112      112           
Components Coverage Δ
access 84.70% <100.00%> (ø)
catalog 87.65% <ø> (ø)
common 77.77% <ø> (ø)
encryption 72.97% <ø> (ø)
keyring 73.21% <ø> (ø)
src 94.18% <ø> (ø)
smgr 96.53% <ø> (ø)
transam ∅ <ø> (∅)
🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Collaborator

@AndersAstrand AndersAstrand left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a question about why we use the old key for any of the fields. If there is no good reason we should probably use the new key for all of them.

@@ -319,7 +319,7 @@ pg_tde_perform_rotate_key(const TDEPrincipalKey *principal_key, const TDEPrincip
XLogPrincipalKeyRotate xlrec;

xlrec.databaseId = principal_key->keyInfo.databaseId;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there any reason we don't want to use the new_principal_key as the source of all three fields?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, makes sense. I was on the fence with that and decided in favor "less changes". But given it the second thought after your comment, yes, it's better to change it for consistency. Database will be the same anyway, as we can't rotate key "into another db". I'll do the change

Before this commit, we XLogged the provider ID (keyringId) of the old
key. Yet, we then attempt to fetch the new key from the old provider
during the Redo, which obviously fails and crashes the recovery.
So the next steps lead to the recovery stalemate:
- Create new provider (with new destination - mount_path, url etc).
- Create new server/global key.
- Rotate key.
- <Crash!>

This commit fixes it by Xlogging the new key's provider ID.

For: PG-1895
@dAdAbird dAdAbird merged commit 2364be2 into percona:release-17.5.3 Aug 26, 2025
19 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants