Skip to content

Commit 13a8924

Browse files
committed
Increase distance between flush requests during bulk file copies.
copy_file() reads and writes data 64KB at a time (with default BLCKSZ), and historically has issued a pg_flush_data request after each write. This turns out to interact really badly with macOS's new APFS file system: a large file copy takes over 100X longer than it ought to on APFS, as reported by Brent Dearth. While that's arguably a macOS bug, it's not clear whether Apple will do anything about it in the near future, and in any case experimentation suggests that issuing flushes a bit less often can be helpful on other platforms too. Hence, rearrange the logic in copy_file() so that flush requests are issued once per N writes rather than every time through the loop. I set the FLUSH_DISTANCE to 32MB on macOS (any less than that still results in a noticeable speed degradation on APFS), but 1MB elsewhere. In limited testing on Linux and FreeBSD, this seems slightly faster than the previous code, and certainly no worse. It helps noticeably on macOS even with the older HFS filesystem. A simpler change would have been to just increase the size of the copy buffer without changing the loop logic, but that seems likely to trash the processor cache without really helping much. Back-patch to 9.6 where we introduced msync() as an implementation option for pg_flush_data(). The problem seems specific to APFS's mmap/msync support, so I don't think we need to go further back. Discussion: https://postgr.es/m/CADkxhTNv-j2jw2g8H57deMeAbfRgYBoLmVuXkC=YCFBXRuCOww@mail.gmail.com
1 parent 185279d commit 13a8924

File tree

1 file changed

+30
-8
lines changed

1 file changed

+30
-8
lines changed

src/backend/storage/file/copydir.c

Lines changed: 30 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -139,10 +139,24 @@ copy_file(char *fromfile, char *tofile)
139139
int dstfd;
140140
int nbytes;
141141
off_t offset;
142+
off_t flush_offset;
142143

143-
/* Use palloc to ensure we get a maxaligned buffer */
144+
/* Size of copy buffer (read and write requests) */
144145
#define COPY_BUF_SIZE (8 * BLCKSZ)
145146

147+
/*
148+
* Size of data flush requests. It seems beneficial on most platforms to
149+
* do this every 1MB or so. But macOS, at least with early releases of
150+
* APFS, is really unfriendly to small mmap/msync requests, so there do it
151+
* only every 32MB.
152+
*/
153+
#if defined(__darwin__)
154+
#define FLUSH_DISTANCE (32 * 1024 * 1024)
155+
#else
156+
#define FLUSH_DISTANCE (1024 * 1024)
157+
#endif
158+
159+
/* Use palloc to ensure we get a maxaligned buffer */
146160
buffer = palloc(COPY_BUF_SIZE);
147161

148162
/*
@@ -164,11 +178,23 @@ copy_file(char *fromfile, char *tofile)
164178
/*
165179
* Do the data copying.
166180
*/
181+
flush_offset = 0;
167182
for (offset = 0;; offset += nbytes)
168183
{
169184
/* If we got a cancel signal during the copy of the file, quit */
170185
CHECK_FOR_INTERRUPTS();
171186

187+
/*
188+
* We fsync the files later, but during the copy, flush them every so
189+
* often to avoid spamming the cache and hopefully get the kernel to
190+
* start writing them out before the fsync comes.
191+
*/
192+
if (offset - flush_offset >= FLUSH_DISTANCE)
193+
{
194+
pg_flush_data(dstfd, flush_offset, offset - flush_offset);
195+
flush_offset = offset;
196+
}
197+
172198
nbytes = read(srcfd, buffer, COPY_BUF_SIZE);
173199
if (nbytes < 0)
174200
ereport(ERROR,
@@ -186,15 +212,11 @@ copy_file(char *fromfile, char *tofile)
186212
(errcode_for_file_access(),
187213
errmsg("could not write to file \"%s\": %m", tofile)));
188214
}
189-
190-
/*
191-
* We fsync the files later but first flush them to avoid spamming the
192-
* cache and hopefully get the kernel to start writing them out before
193-
* the fsync comes.
194-
*/
195-
pg_flush_data(dstfd, offset, nbytes);
196215
}
197216

217+
if (offset > flush_offset)
218+
pg_flush_data(dstfd, flush_offset, offset - flush_offset);
219+
198220
if (CloseTransientFile(dstfd))
199221
ereport(ERROR,
200222
(errcode_for_file_access(),

0 commit comments

Comments
 (0)