Commit 4abcce8

Improve coding around the fsync request queue.
In all branches back to 8.3, this patch fixes a questionable assumption in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are no uninitialized pad bytes in the request queue structs. This would only cause trouble if (a) there were such pad bytes, which could happen in 8.4 and up if the compiler makes enum ForkNumber narrower than 32 bits, but otherwise would require not-currently-planned changes in the widths of other typedefs; and (b) the kernel has not uniformly initialized the contents of shared memory to zeroes. Still, it seems a tad risky, and we can easily remove any risk by pre-zeroing the request array for ourselves. In addition to that, we need to establish a coding rule that struct RelFileNode can't contain any padding bytes, since such structs are copied into the request array verbatim. (There are other places that are assuming this anyway, it turns out.)

In 9.1 and up, the risk was a bit larger because we were also effectively assuming that struct RelFileNodeBackend contained no pad bytes, and with fields of different types in there, that would be much easier to break. However, there is no good reason to ever transmit fsync or delete requests for temp files to the bgwriter/checkpointer, so we can revert the request structs to plain RelFileNode, getting rid of the padding risk and saving some marginal number of bytes and cycles in fsync queue manipulation while we are at it. The savings might be more than marginal during deletion of a temp relation, because the old code transmitted an entirely useless but nonetheless expensive-to-process ForgetRelationFsync request to the background process, and also had the background process perform the file deletion even though that can safely be done immediately.

In addition, make some cleanup of nearby comments and small improvements to the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.
1 parent 3d03c97 commit 4abcce8
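
To make the padding hazard concrete, here is a small standalone C sketch of my own (not part of this commit; the FakeRequest type and its field names are invented). It shows why structs that are hashed or compared byte-for-byte must either contain no pad bytes or sit in pre-zeroed storage, which is what the CheckpointerShmemInit change below guarantees for the shared request array.

#include <stdio.h>
#include <string.h>

/*
 * Hypothetical request struct.  If the compiler picks a representation
 * narrower than 32 bits for the enum (the scenario the commit message
 * describes for ForkNumber), pad bytes appear after that field.
 */
typedef enum { FORK_MAIN, FORK_FSM } SmallFork;

typedef struct
{
    unsigned int relnode;
    SmallFork    forknum;       /* possibly narrower than int, plus padding */
    unsigned int segno;
} FakeRequest;

int
main(void)
{
    FakeRequest a, b;

    /* Two logically identical requests, built field by field... */
    a.relnode = b.relnode = 1234;
    a.forknum = b.forknum = FORK_MAIN;
    a.segno = b.segno = 7;

    /*
     * ...may still differ byte-for-byte, because any pad bytes hold
     * whatever garbage happened to be in that memory.  A hash table keyed
     * on the raw struct bytes could then fail to spot the duplicate.
     */
    printf("raw bytes equal? %s\n",
           memcmp(&a, &b, sizeof(FakeRequest)) == 0 ? "yes" : "no");

    /*
     * Zeroing the storage first (as CheckpointerShmemInit now does for the
     * whole shared request array) makes the raw bytes deterministic, so
     * byte-wise hashing and comparison behave predictably.
     */
    memset(&a, 0, sizeof(a));
    memset(&b, 0, sizeof(b));
    a.relnode = b.relnode = 1234;
    a.forknum = b.forknum = FORK_MAIN;
    a.segno = b.segno = 7;
    printf("after pre-zeroing, raw bytes equal? %s\n",
           memcmp(&a, &b, sizeof(FakeRequest)) == 0 ? "yes" : "no");

    return 0;
}

Whether the padding actually exists depends on the compiler and its options, which is exactly why the patch prefers to pre-zero the array (and to keep RelFileNode padding-free) rather than rely on any particular struct layout.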

6 files changed (+109, -58 lines)

src/backend/postmaster/checkpointer.c

Lines changed: 36 additions & 15 deletions
@@ -105,7 +105,7 @@
  */
 typedef struct
 {
-    RelFileNodeBackend rnode;
+    RelFileNode rnode;
     ForkNumber  forknum;
     BlockNumber segno;          /* see md.c for special values */
     /* might add a real request-type field later; not needed yet */
@@ -927,17 +927,22 @@ CheckpointerShmemSize(void)
 void
 CheckpointerShmemInit(void)
 {
+    Size        size = CheckpointerShmemSize();
     bool        found;
 
     CheckpointerShmem = (CheckpointerShmemStruct *)
         ShmemInitStruct("Checkpointer Data",
-                        CheckpointerShmemSize(),
+                        size,
                         &found);
 
     if (!found)
     {
-        /* First time through, so initialize */
-        MemSet(CheckpointerShmem, 0, sizeof(CheckpointerShmemStruct));
+        /*
+         * First time through, so initialize.  Note that we zero the whole
+         * requests array; this is so that CompactCheckpointerRequestQueue
+         * can assume that any pad bytes in the request structs are zeroes.
+         */
+        MemSet(CheckpointerShmem, 0, size);
         SpinLockInit(&CheckpointerShmem->ckpt_lck);
         CheckpointerShmem->max_requests = NBuffers;
     }
@@ -1094,11 +1099,15 @@ RequestCheckpoint(int flags)
  * Forward a file-fsync request from a backend to the checkpointer
  *
  * Whenever a backend is compelled to write directly to a relation
- * (which should be seldom, if the checkpointer is getting its job done),
+ * (which should be seldom, if the background writer is getting its job done),
  * the backend calls this routine to pass over knowledge that the relation
  * is dirty and must be fsync'd before next checkpoint.  We also use this
  * opportunity to count such writes for statistical purposes.
  *
+ * This functionality is only supported for regular (not backend-local)
+ * relations, so the rnode argument is intentionally RelFileNode not
+ * RelFileNodeBackend.
+ *
  * segno specifies which segment (not block!) of the relation needs to be
  * fsync'd.  (Since the valid range is much less than BlockNumber, we can
  * use high values for special flags; that's all internal to md.c, which
@@ -1115,8 +1124,7 @@ RequestCheckpoint(int flags)
  * let the backend know by returning false.
  */
 bool
-ForwardFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
-                    BlockNumber segno)
+ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 {
     CheckpointerRequest *request;
     bool        too_full;
@@ -1172,6 +1180,7 @@ ForwardFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
 /*
  * CompactCheckpointerRequestQueue
  *      Remove duplicates from the request queue to avoid backend fsyncs.
+ *      Returns "true" if any entries were removed.
  *
  * Although a full fsync request queue is not common, it can lead to severe
  * performance problems when it does happen.  So far, this situation has
@@ -1181,7 +1190,7 @@ ForwardFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
  * gets very expensive and can slow down the whole system.
  *
  * Trying to do this every time the queue is full could lose if there
- * aren't any removable entries.  But should be vanishingly rare in
+ * aren't any removable entries.  But that should be vanishingly rare in
 * practice: there's one queue entry per shared buffer.
  */
 static bool
@@ -1203,18 +1212,20 @@ CompactCheckpointerRequestQueue(void)
     /* must hold CheckpointerCommLock in exclusive mode */
     Assert(LWLockHeldByMe(CheckpointerCommLock));
 
+    /* Initialize skip_slot array */
+    skip_slot = palloc0(sizeof(bool) * CheckpointerShmem->num_requests);
+
     /* Initialize temporary hash table */
     MemSet(&ctl, 0, sizeof(ctl));
     ctl.keysize = sizeof(CheckpointerRequest);
     ctl.entrysize = sizeof(struct CheckpointerSlotMapping);
     ctl.hash = tag_hash;
+    ctl.hcxt = CurrentMemoryContext;
+
     htab = hash_create("CompactCheckpointerRequestQueue",
                        CheckpointerShmem->num_requests,
                        &ctl,
-                       HASH_ELEM | HASH_FUNCTION);
-
-    /* Initialize skip_slot array */
-    skip_slot = palloc0(sizeof(bool) * CheckpointerShmem->num_requests);
+                       HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
 
     /*
      * The basic idea here is that a request can be skipped if it's followed
@@ -1229,19 +1240,28 @@ CompactCheckpointerRequestQueue(void)
      * anyhow), but it's not clear that the extra complexity would buy us
      * anything.
      */
-    for (n = 0; n < CheckpointerShmem->num_requests; ++n)
+    for (n = 0; n < CheckpointerShmem->num_requests; n++)
     {
         CheckpointerRequest *request;
         struct CheckpointerSlotMapping *slotmap;
         bool        found;
 
+        /*
+         * We use the request struct directly as a hashtable key.  This
+         * assumes that any padding bytes in the structs are consistently the
+         * same, which should be okay because we zeroed them in
+         * CheckpointerShmemInit.  Note also that RelFileNode had better
+         * contain no pad bytes.
+         */
         request = &CheckpointerShmem->requests[n];
         slotmap = hash_search(htab, request, HASH_ENTER, &found);
         if (found)
         {
+            /* Duplicate, so mark the previous occurrence as skippable */
             skip_slot[slotmap->slot] = true;
-            ++num_skipped;
+            num_skipped++;
         }
+        /* Remember slot containing latest occurrence of this request value */
         slotmap->slot = n;
     }
 
@@ -1256,7 +1276,8 @@ CompactCheckpointerRequestQueue(void)
     }
 
     /* We found some duplicates; remove them. */
-    for (n = 0, preserve_count = 0; n < CheckpointerShmem->num_requests; ++n)
+    preserve_count = 0;
+    for (n = 0; n < CheckpointerShmem->num_requests; n++)
     {
         if (skip_slot[n])
             continue;
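
For readers who do not want to chase the dynahash details above, the compaction strategy can be sketched in isolation. The following is my own simplified illustration, not PostgreSQL code: the FakeRequest type is invented, and a quadratic scan stands in for the real code's hash table keyed on the whole struct. The idea is the same: mark every entry that is followed by an identical one, then slide the survivors toward the front of the array in their original order.

#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for CheckpointerRequest; only the shape matters here. */
typedef struct
{
    unsigned int relnode;
    int          forknum;
    unsigned int segno;
} FakeRequest;

/*
 * Drop every request that is duplicated later in the array, keeping only
 * the last occurrence of each value and preserving the relative order of
 * the survivors.  Returns the new number of requests.  The byte-wise
 * memcmp has the same requirement as the real code's hash keys: any pad
 * bytes must be consistently zeroed.
 */
static int
compact_requests(FakeRequest *requests, int num_requests)
{
    bool   *skip_slot = calloc(num_requests, sizeof(bool));
    int     preserve_count = 0;

    if (skip_slot == NULL)
        return num_requests;    /* out of memory: leave the queue alone */

    /* Mark each entry that has an identical later occurrence. */
    for (int n = 0; n < num_requests; n++)
    {
        for (int m = n + 1; m < num_requests; m++)
        {
            if (memcmp(&requests[n], &requests[m], sizeof(FakeRequest)) == 0)
            {
                skip_slot[n] = true;
                break;
            }
        }
    }

    /* Slide the surviving entries toward the front of the array. */
    for (int n = 0; n < num_requests; n++)
    {
        if (skip_slot[n])
            continue;
        requests[preserve_count++] = requests[n];
    }

    free(skip_slot);
    return preserve_count;
}

The real function does the duplicate detection in a single pass by remembering, per distinct request value, the last slot seen in a hash table; that linear-time behaviour matters because the whole operation runs while CheckpointerCommLock is held exclusively.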

src/backend/storage/buffer/bufmgr.c

Lines changed: 2 additions & 2 deletions
@@ -2049,7 +2049,7 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
     int         i;
 
     /* If it's a local relation, it's localbuf.c's problem. */
-    if (rnode.backend != InvalidBackendId)
+    if (RelFileNodeBackendIsTemp(rnode))
     {
         if (rnode.backend == MyBackendId)
             DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
@@ -2103,7 +2103,7 @@ DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
     int         i;
 
     /* If it's a local relation, it's localbuf.c's problem. */
-    if (rnode.backend != InvalidBackendId)
+    if (RelFileNodeBackendIsTemp(rnode))
     {
         if (rnode.backend == MyBackendId)
             DropRelFileNodeAllLocalBuffers(rnode.node);
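
The RelFileNodeBackendIsTemp() test used above comes from src/include/storage/relfilenode.h, which is not shown in this excerpt; to the best of my understanding it is simply a readability macro over the same test the old code spelled out by hand, roughly:

/* Presumed definition; see src/include/storage/relfilenode.h */
#define RelFileNodeBackendIsTemp(rnode) \
    ((rnode).backend != InvalidBackendId)

In other words, the two bufmgr.c hunks are pure readability changes: a relation is backend-local ("temp") exactly when its backend field holds a real backend ID rather than InvalidBackendId.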
