PG-1866 Reset wal cache on shmem init #542
Conversation
Force-pushed from 78215a0 to 2f3f7dc
Codecov Report
❌ Patch coverage is

@@          Coverage Diff           @@
##  TDE_REL_17_STABLE    #542  +/- ##
======================================
  Coverage         ?   82.46%
======================================
  Files            ?       25
  Lines            ?     3228
  Branches         ?      511
======================================
  Hits             ?     2662
  Misses           ?      455
  Partials         ?      111
Force-pushed from 2f3f7dc to 46daae4
Force-pushed from 46daae4 to 767a59e
It's interesting that some memory survives restarts. What is the reason to restart then?
I'm not sure this is the best place for the fix - see #530, and note how the principal key cache initialization works similarly. I assume we also rerun _PG_init during a restart; maybe we should detect the situation there instead? And while it doesn't cause bugs now, shouldn't we also clear the principal key cache?
The principal key cache lives in shared memory and not in the backend, so that one is fine.
I believe we don't rerun _PG_init when the postmaster does a "soft" restart after a backend crash, since the shared preload libraries are already loaded in the postmaster.
Again, the principal key cache lives in shared memory, which is cleared by the postmaster and then reinitialized before any other backends are launched (except on Windows, I believe). I do, however, think we need to look through all of our assumptions about what's done at startup with the lens of how the postmaster does a soft restart; I couldn't find anything glaringly obvious, though. This is not at all related to #530, as that one is about shared memory. The issue here is global memory in each backend, which they inherit from the postmaster as they fork.
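(For context, a minimal sketch of the distinction being discussed: shared memory is re-created by the postmaster on a crash restart, while ordinary process-global variables are inherited by new backends through fork() and keep whatever the postmaster last had in them. The names below, `my_ext_wal_key_cache` and `my_ext_shmem_init`, are hypothetical and not taken from pg_tde.)

```c
/*
 * Illustrative sketch only; names are hypothetical, not from pg_tde.
 * Shared memory starts fresh after the postmaster's crash restart, but
 * process-global variables like the cache below do not.
 */
#include "postgres.h"
#include "nodes/pg_list.h"
#include "storage/shmem.h"

static List *my_ext_wal_key_cache = NIL;	/* process-local, NOT in shmem */

static void
my_ext_shmem_init(void)
{
	bool		found;

	/* Shared state: re-created by the postmaster, so it starts clean. */
	(void) ShmemInitStruct("my_ext_state", 128, &found);

	/*
	 * Process-local state: not re-created, so clear it here, in the same
	 * path the postmaster runs while reinitializing, before any new
	 * backends are forked.
	 */
	list_free_deep(my_ext_wal_key_cache);
	my_ext_wal_key_cache = NIL;
}
```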
I see, I forgot that the wal key cache isn't also in shmem because this call is in
We already do the initialization of the backend-global wal key state in the shmem routines. This cleanup is for the future, I believe. Because the whole thing is quite janky.
> Because the whole thing is quite janky.

That's why it would be nice to keep this also in InitWrite, where we initialize the cache, so as not to increase the chaos even more. But I agree that this is mostly just organization and we have to refactor this, so we can merge it either way.
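(A rough sketch of the alternative placement suggested here, i.e. resetting the cache in the same routine that initializes it so ownership stays in one place. The names are hypothetical, not the actual pg_tde functions.)

```c
/* Hypothetical sketch; not the actual pg_tde code. */
#include "postgres.h"
#include "nodes/pg_list.h"

static List *my_ext_wal_key_cache = NIL;

static void
my_ext_xlog_init_write(void)
{
	/*
	 * Drop any cache inherited from the postmaster before (re)building it,
	 * so the reset lives next to the code that owns the cache rather than
	 * in the generic shmem-init path.
	 */
	list_free_deep(my_ext_wal_key_cache);
	my_ext_wal_key_cache = NIL;

	/* ... cache (re)population would follow here ... */
}
```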
@@ -160,6 +160,8 @@ TDEXLogShmemInit(void)

    Assert(LWLockHeldByMeInMode(AddinShmemInitLock, LW_EXCLUSIVE));

    pg_tde_free_wal_key_cache();
This feels very scary. Is this really the only thing we need to reinitialize?
The one I'm unsure about is the ddlEventStack, but we're focusing on WAL right now so I didn't look into it. The rest looked fine to me: they're either initialized to values that don't change, or are initialized in the shmem_startup_hook already. I may have missed something, though.
It seems like there are cases when the postmaster has "restarted" after a backend crash where the wal cache inherited from the postmaster is wrong. I'm not at all sure exactly how and why this happens, but this patch fixes a bug with this and allows recovery/013_crash_restart to pass with WAL encryption enabled.
I moved it there and added a comment. I agree that it probably fits better there.
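(The hunk above only shows the call site. As a point of reference, a reset routine of this kind typically just walks a process-local structure and releases it; the sketch below assumes a simple singly linked list, which may well differ from the real pg_tde_free_wal_key_cache.)

```c
/*
 * Hypothetical sketch of a WAL key cache reset; the real pg_tde layout
 * and pg_tde_free_wal_key_cache() may differ.
 */
#include "postgres.h"

typedef struct WalKeyCacheEntry
{
	struct WalKeyCacheEntry *next;
	/* decrypted key material would live here */
} WalKeyCacheEntry;

static WalKeyCacheEntry *wal_key_cache_head = NULL;

static void
wal_key_cache_reset(void)
{
	WalKeyCacheEntry *entry = wal_key_cache_head;

	/*
	 * Free every entry and leave the cache empty, so a postmaster that is
	 * reinitializing after a crash hands a clean slate to new backends.
	 */
	while (entry != NULL)
	{
		WalKeyCacheEntry *next = entry->next;

		pfree(entry);
		entry = next;
	}
	wal_key_cache_head = NULL;
}
```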
Force-pushed from 767a59e to a3d2a24
It seems like there are cases when the postmaster has "restarted" after a backend crash where the wal cache inherited from the postmaster is wrong. I'm not at all sure exactly how and why this happens, but this patch fixes a bug with this and allows recovery/013_crash_restart to pass with WAL encryption enabled.
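(For reproducing this, assuming an in-tree PostgreSQL build with TAP tests enabled and pg_tde loaded with WAL encryption turned on, the failing test can usually be run in isolation with something like `make -C src/test/recovery check PROVE_TESTS='t/013_crash_restart.pl'`; pg_tde's own test harness may wrap this differently.)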