Skip to content

Commit 00066aa

Browse files
committed
md: Add comment & assert to buffer-zeroing path in md[start]readv()
mdreadv() has a codepath to zero out buffers when a read returns zero bytes, guarded by a check for zero_damaged_pages || InRecovery. The InRecovery codepath to zero out buffers in mdreadv() appears to be unreachable. The only known paths to reach mdreadv()/mdstartreadv() in recovery are XLogReadBufferExtended(), vm_readbuf(), and fsm_readbuf(), each of which takes care to extend the relation if necessary. This looks to either have been the case for a long time, or the code was never reachable. The zero_damaged_pages path is incomplete, as missing segments are not created. Putting blocks into the buffer-pool that do not exist on disk is rather problematic, as such blocks will, at least initially, not be found by scans that rely on smgrnblocks(), as they are beyond EOF. It also can cause weird problems with relation extension, as relation extension does not expect blocks beyond EOF to exist. Therefore we would like to remove that path. mdstartreadv(), which I added in e5fe570, does not implement this zeroing logic. I had started a discussion about that a while ago (linked below), but forgot to act on the conclusion of the discussion, namely to disable the in-memory-zeroing behavior. We could certainly implement equivalent zeroing logic in mdstartreadv(), but it would have to be more complicated due to potential differences in the zero_damaged_pages setting between the definer and completor of IO. Given that we want to remove the logic, that does not seem worth implementing the necessary logic. For now, put an Assert(false) and comments documenting this choice into mdreadv() and comments documenting the deprecation of the path in mdreadv() and the non-implementation of it in mdstartreadv(). If we, during testing, discover that we do need the path, we can implement it at that time. Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/postgr.es/m/20250330024513.ac.nmisch@google.com Discussion: https://postgr.es/m/postgr.es/m/3qxxsnciyffyf3wyguiz4besdp5t5uxvv3utg75cbcszojlz7p@uibfzmnukkbd
1 parent 93bc3d7 commit 00066aa

File tree

1 file changed

+28
-0
lines changed
  • src/backend/storage/smgr

1 file changed

+28
-0
lines changed

src/backend/storage/smgr/md.c

Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -910,9 +910,30 @@ mdreadv(SMgrRelation reln, ForkNumber forknum, BlockNumber blocknum,
910910
* is ON or we are InRecovery, we should instead return zeroes
911911
* without complaining. This allows, for example, the case of
912912
* trying to update a block that was later truncated away.
913+
*
914+
* NB: We think that this codepath is unreachable in recovery
915+
* and incomplete with zero_damaged_pages, as missing segments
916+
* are not created. Putting blocks into the buffer-pool that
917+
* do not exist on disk is rather problematic, as it will not
918+
* be found by scans that rely on smgrnblocks(), as they are
919+
* beyond EOF. It also can cause weird problems with relation
920+
* extension, as relation extension does not expect blocks
921+
* beyond EOF to exist.
922+
*
923+
* Therefore we do not want to copy the logic into
924+
* mdstartreadv(), where it would have to be more complicated
925+
* due to potential differences in the zero_damaged_pages
926+
* setting between the definer and completor of IO.
927+
*
928+
* For PG 18, we are putting an Assert(false) in mdreadv()
929+
* (triggering failures in assertion-enabled builds, but
930+
* continuing to work in production builds). Afterwards we
931+
* plan to remove this code entirely.
913932
*/
914933
if (zero_damaged_pages || InRecovery)
915934
{
935+
Assert(false); /* see comment above */
936+
916937
for (BlockNumber i = transferred_this_segment / BLCKSZ;
917938
i < nblocks_this_segment;
918939
++i)
@@ -1007,6 +1028,13 @@ mdstartreadv(PgAioHandle *ioh,
10071028
/*
10081029
* The error checks corresponding to the post-read checks in mdreadv() are
10091030
* in md_readv_complete().
1031+
*
1032+
* However we chose, at least for now, to not implement the
1033+
* zero_damaged_pages logic present in mdreadv(). As outlined in mdreadv()
1034+
* that logic is rather problematic, and we want to get rid of it. Here
1035+
* equivalent logic would have to be more complicated due to potential
1036+
* differences in the zero_damaged_pages setting between the definer and
1037+
* completor of IO.
10101038
*/
10111039
}
10121040

0 commit comments

Comments
 (0)