
Commit 73b796a

Improve coding around the fsync request queue.
In all branches back to 8.3, this patch fixes a questionable assumption in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue that there are no uninitialized pad bytes in the request queue structs. This would only cause trouble if (a) there were such pad bytes, which could happen in 8.4 and up if the compiler makes enum ForkNumber narrower than 32 bits, but otherwise would require not-currently-planned changes in the widths of other typedefs; and (b) the kernel has not uniformly initialized the contents of shared memory to zeroes. Still, it seems a tad risky, and we can easily remove any risk by pre-zeroing the request array for ourselves. In addition to that, we need to establish a coding rule that struct RelFileNode can't contain any padding bytes, since such structs are copied into the request array verbatim. (There are other places that are assuming this anyway, it turns out.)

In 9.1 and up, the risk was a bit larger because we were also effectively assuming that struct RelFileNodeBackend contained no pad bytes, and with fields of different types in there, that would be much easier to break. However, there is no good reason to ever transmit fsync or delete requests for temp files to the bgwriter/checkpointer, so we can revert the request structs to plain RelFileNode, getting rid of the padding risk and saving some marginal number of bytes and cycles in fsync queue manipulation while we are at it. The savings might be more than marginal during deletion of a temp relation, because the old code transmitted an entirely useless but nonetheless expensive-to-process ForgetRelationFsync request to the background process, and also had the background process perform the file deletion even though that can safely be done immediately.

In addition, make some cleanup of nearby comments and small improvements to the code in CompactCheckpointerRequestQueue/CompactBgwriterRequestQueue.
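To illustrate the padding hazard described above, here is a minimal standalone C sketch (not PostgreSQL code; the DemoRequest struct and its field widths are invented for the example). Two logically equal structs can have different byte images if the compiler inserts pad bytes and the storage is not zeroed before the fields are assigned, which is why the request array is now pre-zeroed and why RelFileNode must contain no padding:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

typedef struct
{
    uint16_t    forknum;    /* imagine an enum narrowed to 16 bits */
    /* most ABIs insert 2 pad bytes here to align the next field */
    uint32_t    segno;
} DemoRequest;

int
main(void)
{
    DemoRequest a;
    DemoRequest b;

    /* a: fields assigned on top of whatever garbage was in the storage */
    memset(&a, 0xAA, sizeof(a));    /* simulate uninitialized memory */
    a.forknum = 1;
    a.segno = 42;

    /* b: storage zeroed first, as CheckpointerShmemInit now does for the
     * whole shared request array */
    memset(&b, 0, sizeof(b));
    b.forknum = 1;
    b.segno = 42;

    /* Field-wise the two are equal, but a byte-wise comparison (the moral
     * equivalent of hashing the struct as an opaque key) may disagree
     * because of the pad bytes. */
    printf("fieldwise equal: %d\n",
           a.forknum == b.forknum && a.segno == b.segno);
    printf("memcmp equal:    %d\n", memcmp(&a, &b, sizeof(a)) == 0);
    return 0;
}

Zeroing the storage up front makes the byte image of each request deterministic, so hash and memcmp-style comparisons see identical keys for identical requests.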
1 parent 71f2dd2 commit 73b796a


6 files changed (+109, -58 lines)


src/backend/postmaster/checkpointer.c

Lines changed: 36 additions & 15 deletions
@@ -105,7 +105,7 @@
  */
 typedef struct
 {
-    RelFileNodeBackend rnode;
+    RelFileNode rnode;
     ForkNumber  forknum;
     BlockNumber segno;          /* see md.c for special values */
     /* might add a real request-type field later; not needed yet */
@@ -924,17 +924,22 @@ CheckpointerShmemSize(void)
 void
 CheckpointerShmemInit(void)
 {
+    Size        size = CheckpointerShmemSize();
     bool        found;
 
     CheckpointerShmem = (CheckpointerShmemStruct *)
         ShmemInitStruct("Checkpointer Data",
-                        CheckpointerShmemSize(),
+                        size,
                         &found);
 
     if (!found)
     {
-        /* First time through, so initialize */
-        MemSet(CheckpointerShmem, 0, sizeof(CheckpointerShmemStruct));
+        /*
+         * First time through, so initialize.  Note that we zero the whole
+         * requests array; this is so that CompactCheckpointerRequestQueue
+         * can assume that any pad bytes in the request structs are zeroes.
+         */
+        MemSet(CheckpointerShmem, 0, size);
         SpinLockInit(&CheckpointerShmem->ckpt_lck);
         CheckpointerShmem->max_requests = NBuffers;
     }
@@ -1091,11 +1096,15 @@ RequestCheckpoint(int flags)
  * Forward a file-fsync request from a backend to the checkpointer
  *
  * Whenever a backend is compelled to write directly to a relation
- * (which should be seldom, if the checkpointer is getting its job done),
+ * (which should be seldom, if the background writer is getting its job done),
  * the backend calls this routine to pass over knowledge that the relation
  * is dirty and must be fsync'd before next checkpoint.  We also use this
  * opportunity to count such writes for statistical purposes.
  *
+ * This functionality is only supported for regular (not backend-local)
+ * relations, so the rnode argument is intentionally RelFileNode not
+ * RelFileNodeBackend.
+ *
  * segno specifies which segment (not block!) of the relation needs to be
  * fsync'd.  (Since the valid range is much less than BlockNumber, we can
  * use high values for special flags; that's all internal to md.c, which
@@ -1112,8 +1121,7 @@ RequestCheckpoint(int flags)
  * let the backend know by returning false.
  */
 bool
-ForwardFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
-                    BlockNumber segno)
+ForwardFsyncRequest(RelFileNode rnode, ForkNumber forknum, BlockNumber segno)
 {
     CheckpointerRequest *request;
     bool        too_full;
@@ -1169,6 +1177,7 @@ ForwardFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
 /*
  * CompactCheckpointerRequestQueue
  *      Remove duplicates from the request queue to avoid backend fsyncs.
+ *      Returns "true" if any entries were removed.
  *
  * Although a full fsync request queue is not common, it can lead to severe
  * performance problems when it does happen.  So far, this situation has
@@ -1178,7 +1187,7 @@ ForwardFsyncRequest(RelFileNodeBackend rnode, ForkNumber forknum,
  * gets very expensive and can slow down the whole system.
 *
 * Trying to do this every time the queue is full could lose if there
- * aren't any removable entries.  But should be vanishingly rare in
+ * aren't any removable entries.  But that should be vanishingly rare in
 * practice: there's one queue entry per shared buffer.
 */
 static bool
@@ -1200,18 +1209,20 @@ CompactCheckpointerRequestQueue(void)
     /* must hold CheckpointerCommLock in exclusive mode */
     Assert(LWLockHeldByMe(CheckpointerCommLock));
 
+    /* Initialize skip_slot array */
+    skip_slot = palloc0(sizeof(bool) * CheckpointerShmem->num_requests);
+
     /* Initialize temporary hash table */
     MemSet(&ctl, 0, sizeof(ctl));
     ctl.keysize = sizeof(CheckpointerRequest);
     ctl.entrysize = sizeof(struct CheckpointerSlotMapping);
     ctl.hash = tag_hash;
+    ctl.hcxt = CurrentMemoryContext;
+
     htab = hash_create("CompactCheckpointerRequestQueue",
                        CheckpointerShmem->num_requests,
                        &ctl,
-                       HASH_ELEM | HASH_FUNCTION);
-
-    /* Initialize skip_slot array */
-    skip_slot = palloc0(sizeof(bool) * CheckpointerShmem->num_requests);
+                       HASH_ELEM | HASH_FUNCTION | HASH_CONTEXT);
 
     /*
      * The basic idea here is that a request can be skipped if it's followed
@@ -1226,19 +1237,28 @@ CompactCheckpointerRequestQueue(void)
      * anyhow), but it's not clear that the extra complexity would buy us
     * anything.
     */
-    for (n = 0; n < CheckpointerShmem->num_requests; ++n)
+    for (n = 0; n < CheckpointerShmem->num_requests; n++)
     {
        CheckpointerRequest *request;
        struct CheckpointerSlotMapping *slotmap;
        bool        found;
 
+        /*
+         * We use the request struct directly as a hashtable key.  This
+         * assumes that any padding bytes in the structs are consistently the
+         * same, which should be okay because we zeroed them in
+         * CheckpointerShmemInit.  Note also that RelFileNode had better
+         * contain no pad bytes.
+         */
        request = &CheckpointerShmem->requests[n];
        slotmap = hash_search(htab, request, HASH_ENTER, &found);
        if (found)
        {
+            /* Duplicate, so mark the previous occurrence as skippable */
            skip_slot[slotmap->slot] = true;
-            ++num_skipped;
+            num_skipped++;
        }
+        /* Remember slot containing latest occurrence of this request value */
        slotmap->slot = n;
     }
 
@@ -1253,7 +1273,8 @@ CompactCheckpointerRequestQueue(void)
     }
 
     /* We found some duplicates; remove them. */
-    for (n = 0, preserve_count = 0; n < CheckpointerShmem->num_requests; ++n)
+    preserve_count = 0;
+    for (n = 0; n < CheckpointerShmem->num_requests; n++)
     {
        if (skip_slot[n])
            continue;

src/backend/storage/buffer/bufmgr.c

Lines changed: 2 additions & 2 deletions
@@ -2049,7 +2049,7 @@ DropRelFileNodeBuffers(RelFileNodeBackend rnode, ForkNumber forkNum,
     int         i;
 
     /* If it's a local relation, it's localbuf.c's problem. */
-    if (rnode.backend != InvalidBackendId)
+    if (RelFileNodeBackendIsTemp(rnode))
     {
        if (rnode.backend == MyBackendId)
            DropRelFileNodeLocalBuffers(rnode.node, forkNum, firstDelBlock);
@@ -2103,7 +2103,7 @@ DropRelFileNodeAllBuffers(RelFileNodeBackend rnode)
     int         i;
 
     /* If it's a local relation, it's localbuf.c's problem. */
-    if (rnode.backend != InvalidBackendId)
+    if (RelFileNodeBackendIsTemp(rnode))
     {
        if (rnode.backend == MyBackendId)
            DropRelFileNodeAllLocalBuffers(rnode.node);
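The RelFileNodeBackendIsTemp() test above replaces the open-coded backend-id comparison. As a sketch of what the macro expresses (it lives in the relfilenode header; the exact form shown here is an assumption, not copied from the tree):

/* Sketch only: assumed shape of the macro used in the hunks above */
#define RelFileNodeBackendIsTemp(rnode) \
    ((rnode).backend != InvalidBackendId)

The point of the substitution is readability: the intent ("is this a backend-local temp relation?") is stated directly instead of through the InvalidBackendId comparison.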
