Skip to content

Commit 26a79cb

Browse files
committed
Replace durable_rename_excl() by durable_rename(), take two
durable_rename_excl() attempts to avoid overwriting any existing files by using link() and unlink(), and it falls back to rename() on some platforms (aka WIN32), which offers no such overwrite protection. Most callers use durable_rename_excl() just in case there is an existing file, but in practice there shouldn't be one (see below for more details). Furthermore, failures during durable_rename_excl() can result in multiple hard links to the same file. As per Nathan's tests, it is possible to end up with two links to the same file in pg_wal after a crash just before unlink() during WAL recycling. Specifically, the test produced links to the same file for the current WAL file and the next one because the half-recycled WAL file was re-recycled upon restarting, leading to WAL corruption. This change replaces all the calls of durable_rename_excl() to durable_rename(). This removes the protection against accidentally overwriting an existing file, but some platforms are already living without it and ordinarily there shouldn't be one. The function itself is left around in case any extensions are using it. It will be removed on HEAD via a follow-up commit. Here is a summary of the existing callers of durable_rename_excl() (see second discussion link at the bottom), replaced by this commit. First, basic_archive used it to avoid overwriting an archive concurrently created by another server, but as mentioned above, it will still overwrite files on some platforms. Second, xlog.c uses it to recycle past WAL segments, where an overwrite should not happen (origin of the change at f0e37a8) because there are protections about the WAL segment to select when recycling an entry. The third and last area is related to the write of timeline history files. writeTimeLineHistory() will write a new timeline history file at the end of recovery on promotion, so there should be no such files for the same timeline. What remains is writeTimeLineHistoryFile(), that can be used in parallel by a WAL receiver and the startup process, and some digging of the buildfarm shows that EEXIST from a WAL receiver can happen with an error of "could not link file \"pg_wal/xlogtemp.NN\" to \"pg_wal/MM.history\", which would cause an automatic restart of the WAL receiver as it is promoted to FATAL, hence this should improve the stability of the WAL receiver as rename() would overwrite an existing TLI history file already fetched by the startup process at recovery. This is the second time this change is attempted, ccfbd92 being the first one, but this time no assertions are added for the case of a TLI history file written concurrently by the WAL receiver or the startup process because we can expect one to exist (some of the TAP tests are able to trigger with a proper timing). This commit has been originally applied on v16~ as of dac1ff3, and we have received more reports of this issue, where clusters can become corrupted at replay in older stable branches with multiple links pointing to the same physical WAL segment file. This backpatch addresses the problem for the v13~v15 range. Author: Nathan Bossart Reviewed-by: Robert Haas, Kyotaro Horiguchi, Michael Paquier Discussion: https://postgr.es/m/20220407182954.GA1231544@nathanxps13 Discussion: https://postgr.es/m/Ym6GZbqQdlalSKSG@paquier.xyz Discussion: https://postgr.es/m/CAJhEC04tBkYPF4q2uS_rCytauvNEVqdBAzasBEokfceFhF=KDQ@mail.gmail.com
1 parent 2c7887c commit 26a79cb

File tree

2 files changed

+8
-19
lines changed

2 files changed

+8
-19
lines changed

src/backend/access/transam/timeline.c

+5-13
Original file line numberDiff line numberDiff line change
@@ -441,12 +441,8 @@ writeTimeLineHistory(TimeLineID newTLI, TimeLineID parentTLI,
441441
* Now move the completed history file into place with its final name.
442442
*/
443443
TLHistoryFilePath(path, newTLI);
444-
445-
/*
446-
* Perform the rename using link if available, paranoidly trying to avoid
447-
* overwriting an existing file (there shouldn't be one).
448-
*/
449-
durable_rename_excl(tmppath, path, ERROR);
444+
Assert(access(path, F_OK) != 0 && errno == ENOENT);
445+
durable_rename(tmppath, path, ERROR);
450446

451447
/* The history file can be archived immediately. */
452448
if (XLogArchivingActive())
@@ -516,15 +512,11 @@ writeTimeLineHistoryFile(TimeLineID tli, char *content, int size)
516512
errmsg("could not close file \"%s\": %m", tmppath)));
517513

518514
/*
519-
* Now move the completed history file into place with its final name.
515+
* Now move the completed history file into place with its final name,
516+
* replacing any existing file with the same name.
520517
*/
521518
TLHistoryFilePath(path, tli);
522-
523-
/*
524-
* Perform the rename using link if available, paranoidly trying to avoid
525-
* overwriting an existing file (there shouldn't be one).
526-
*/
527-
durable_rename_excl(tmppath, path, ERROR);
519+
durable_rename(tmppath, path, ERROR);
528520
}
529521

530522
/*

src/backend/access/transam/xlog.c

+3-6
Original file line numberDiff line numberDiff line change
@@ -3648,15 +3648,12 @@ InstallXLogFileSegment(XLogSegNo *segno, char *tmppath,
36483648
}
36493649
}
36503650

3651-
/*
3652-
* Perform the rename using link if available, paranoidly trying to avoid
3653-
* overwriting an existing file (there shouldn't be one).
3654-
*/
3655-
if (durable_rename_excl(tmppath, path, LOG) != 0)
3651+
Assert(access(path, F_OK) != 0 && errno == ENOENT);
3652+
if (durable_rename(tmppath, path, LOG) != 0)
36563653
{
36573654
if (use_lock)
36583655
LWLockRelease(ControlFileLock);
3659-
/* durable_rename_excl already emitted log message */
3656+
/* durable_rename already emitted log message */
36603657
return false;
36613658
}
36623659

0 commit comments

Comments
 (0)