Commit 7acad95

Adopt the GNU convention for handling tar-archive members exceeding 8GB.

The POSIX standard for tar headers requires archive member sizes to be printed in octal with at most 11 digits, limiting the representable file size to 8GB. However, GNU tar and apparently most other modern tars support a convention in which oversized values can be stored in base-256, allowing any practical file to be a tar member. Adopt this convention to remove two limitations:

* pg_dump with -Ft output format failed if the contents of any one table exceeded 8GB.
* pg_basebackup failed if the data directory contained any file exceeding 8GB. (This would be a fatal problem for installations configured with a table segment size of 8GB or more, and it has also been seen to fail when large core dump files exist in the data directory.)

File sizes under 8GB are still printed in octal, so that no compatibility issues are created except in cases that would have failed entirely before.

In addition, this patch fixes several bugs in the same area:

* In 9.3 and later, we'd defined tarCreateHeader's file-size argument as size_t, which meant that on 32-bit machines it would write a corrupt tar header for file sizes between 4GB and 8GB, even though no error was raised. This broke both "pg_dump -Ft" and pg_basebackup for such cases.
* pg_restore from a tar archive would fail on tables of size between 4GB and 8GB, on machines where either "size_t" or "unsigned long" is 32 bits. This happened even with an archive file not affected by the previous bug.
* pg_basebackup would fail if there were files of size between 4GB and 8GB, even on 64-bit machines.
* In 9.3 and later, "pg_basebackup -Ft" failed entirely, for any file size, on 64-bit big-endian machines.

In view of these potential data-loss bugs, back-patch to all supported branches, even though removal of the documented 8GB limit might otherwise be considered a new feature rather than a bug fix.

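
As a rough illustration of the convention described in the commit message, the sketch below encodes and decodes one numeric tar header field: values that fit are written as octal digits, and larger values are stored in base-256 (big-endian bytes) behind a 0x80 marker byte. The names `sketch_print_tar_number` and `sketch_read_tar_number` are hypothetical, and the code is not taken from the patch. Ending the octal form with a single space is an assumption made for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: write val into a numeric header field of len bytes.
 * Octal is used when the value fits in len-1 digits; otherwise base-256.
 */
static void
sketch_print_tar_number(char *s, int len, uint64_t val)
{
    int i;

    assert(len >= 2 && len <= 12);
    if (val < ((uint64_t) 1 << ((len - 1) * 3)))
    {
        /* Octal digits, zero-padded on the left, then a trailing space. */
        s[len - 1] = ' ';
        for (i = len - 2; i >= 0; i--)
        {
            s[i] = (char) ('0' + (val & 7));
            val >>= 3;
        }
    }
    else
    {
        /* Base-256: marker byte 0x80, then the value as big-endian bytes. */
        s[0] = (char) 0x80;
        for (i = len - 1; i >= 1; i--)
        {
            s[i] = (char) (val & 0xFF);
            val >>= 8;
        }
    }
}

/* Hypothetical helper: read a numeric header field written as above. */
static uint64_t
sketch_read_tar_number(const char *s, int len)
{
    uint64_t result = 0;
    int i;

    assert(len >= 2 && len <= 12);
    if ((unsigned char) s[0] == 0x80)
    {
        /* Base-256: accumulate the remaining len-1 bytes. */
        for (i = 1; i < len; i++)
            result = (result << 8) | (unsigned char) s[i];
    }
    else
    {
        /* Octal: stop at the first byte that is not an octal digit. */
        for (i = 0; i < len && s[i] >= '0' && s[i] <= '7'; i++)
            result = (result << 3) | (uint64_t) (s[i] - '0');
    }
    return result;
}
```

With a 12-byte size field, 1024 is written as `00000002000 ` in this sketch, while a 10GB value no longer fits in 11 octal digits and takes the base-256 branch. Readers that only understand octal are unaffected for members under 8GB.
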
1 parent b29a40f commit 7acad95

6 files changed, +125 -116 lines changed

doc/src/sgml/ref/pg_dump.sgml

+5 -14
@@ -266,12 +266,12 @@ PostgreSQL documentation
 <listitem>
 <para>
 Output a <command>tar</command>-format archive suitable for input
-into <application>pg_restore</application>. The tar-format is
-compatible with the directory-format; extracting a tar-format
+into <application>pg_restore</application>. The tar format is
+compatible with the directory format: extracting a tar-format
 archive produces a valid directory-format archive.
-However, the tar-format does not support compression and has a
-limit of 8 GB on the size of individual tables. Also, the relative
-order of table data items cannot be changed during restore.
+However, the tar format does not support compression. Also, when
+using tar format the relative order of table data items cannot be
+changed during restore.
 </para>
 </listitem>
 </varlistentry>
@@ -1087,15 +1087,6 @@ CREATE DATABASE foo WITH TEMPLATE template0;
 catalogs might be left in the wrong state.
 </para>
 
-<para>
-Members of tar archives are limited to a size less than 8 GB.
-(This is an inherent limitation of the tar file format.) Therefore
-this format cannot be used if the textual representation of any one table
-exceeds that size. The total size of a tar archive and any of the
-other output formats is not limited, except possibly by the
-operating system.
-</para>
-
 <para>
 The dump file produced by <application>pg_dump</application>
 does not contain the statistics used by the optimizer to make

src/backend/replication/basebackup.c

+1 -17
@@ -748,7 +748,7 @@ SendBackupHeader(List *tablespaces)
 }
 else
 {
-Size len;
+Size len;
 
 len = strlen(ti->oid);
 pq_sendint(&buf, len, 4);
@@ -1164,13 +1164,6 @@ sendDir(char *path, int basepathlen, bool sizeonly, List *tablespaces)
  */
 
 
-/*
- * Maximum file size for a tar member: The limit inherent in the
- * format is 2^33-1 bytes (nearly 8 GB). But we don't want to exceed
- * what we can represent in pgoff_t.
- */
-#define MAX_TAR_MEMBER_FILELEN (((int64) 1 << Min(33, sizeof(pgoff_t)*8 - 1)) - 1)
-
 /*
  * Given the member, write the TAR header & send the file.
  *
@@ -1199,15 +1192,6 @@ sendFile(char *readfilename, char *tarfilename, struct stat * statbuf,
 errmsg("could not open file \"%s\": %m", readfilename)));
 }
 
-/*
- * Some compilers will throw a warning knowing this test can never be true
- * because pgoff_t can't exceed the compared maximum on their platform.
- */
-if (statbuf->st_size > MAX_TAR_MEMBER_FILELEN)
-ereport(ERROR,
-(errmsg("archive member \"%s\" too large for tar format",
-tarfilename)));
-
 _tarWriteHeader(tarfilename, NULL, statbuf);
 
 while ((cnt = fread(buf, 1, Min(sizeof(buf), statbuf->st_size - len), fp)) > 0)

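The removed MAX_TAR_MEMBER_FILELEN macro capped member sizes at 2^min(33, bits-1) - 1, where 33 bits correspond to 11 octal digits at 3 bits each. The following sketch restates that arithmetic as a function so the two cases are easy to check. The name `old_tar_member_limit` and its parameter (the width of the offset type in bytes) are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical restatement of the removed limit: the smaller of what
 * 11 octal digits can hold (33 bits) and what a signed offset type of
 * off_bytes bytes can hold (off_bytes * 8 - 1 value bits).
 */
static int64_t
old_tar_member_limit(int off_bytes)
{
    int bits;

    assert(off_bytes > 0 && off_bytes <= 8);
    bits = off_bytes * 8 - 1;
    if (bits > 33)
        bits = 33;
    return ((int64_t) 1 << bits) - 1;
}
```

For an 8-byte offset type this gives 8589934591 (2^33 - 1, just under 8GB), and for a 4-byte offset type it gives 2147483647 (2^31 - 1).
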
src/bin/pg_basebackup/pg_basebackup.c

+5 -15
@@ -777,7 +777,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 bool in_tarhdr = true;
 bool skip_file = false;
 size_t tarhdrsz = 0;
-size_t filesz = 0;
+pgoff_t filesz = 0;
 
 #ifdef HAVE_LIBZ
 gzFile ztarfile = NULL;
@@ -1042,7 +1042,7 @@ ReceiveTarFile(PGconn *conn, PGresult *res, int rownum)
 
 skip_file = (strcmp(&tarhdr[0], "recovery.conf") == 0);
 
-sscanf(&tarhdr[124], "%11o", (unsigned int *) &filesz);
+filesz = read_tar_number(&tarhdr[124], 12);
 
 padding = ((filesz + 511) & ~511) - filesz;
 filesz += padding;
@@ -1135,7 +1135,7 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 char current_path[MAXPGPATH];
 char filename[MAXPGPATH];
 const char *mapped_tblspc_path;
-int current_len_left;
+pgoff_t current_len_left = 0;
 int current_padding = 0;
 bool basetablespace;
 char *copybuf = NULL;
@@ -1204,20 +1204,10 @@ ReceiveAndUnpackTarFile(PGconn *conn, PGresult *res, int rownum)
 }
 totaldone += 512;
 
-if (sscanf(copybuf + 124, "%11o", &current_len_left) != 1)
-{
-fprintf(stderr, _("%s: could not parse file size\n"),
-progname);
-disconnect_and_exit(1);
-}
+current_len_left = read_tar_number(&copybuf[124], 12);
 
 /* Set permissions on the file */
-if (sscanf(&copybuf[100], "%07o ", &filemode) != 1)
-{
-fprintf(stderr, _("%s: could not parse file mode\n"),
-progname);
-disconnect_and_exit(1);
-}
+filemode = read_tar_number(&copybuf[100], 8);
 
 /*
  * All files are padded up to 512 bytes

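The hunks above widen the size variables to pgoff_t before computing padding. The sketch below shows the 512-byte padding calculation carried out in 64 bits, and what happens to a size between 4GB and 8GB if it is squeezed through a 32-bit variable. Both function names are hypothetical. The truncation helper illustrates the general hazard only and does not claim to be the exact failure path in the old code.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes of padding needed to round filesz up to a multiple of 512. */
static uint64_t
tar_padding(uint64_t filesz)
{
    /* Guard against wraparound in filesz + 511. */
    assert(filesz <= UINT64_MAX - 511);
    return ((filesz + 511) & ~(uint64_t) 511) - filesz;
}

/* What remains of a size after passing through a 32-bit variable. */
static uint32_t
truncate_to_32_bits(uint64_t filesz)
{
    return (uint32_t) filesz;
}
```

In this sketch a 5GB member (5 * 2^30 bytes) truncates to 1073741824, so a reader tracking the remaining length in 32 bits would stop after 1GB and misinterpret the rest of the member data as headers.
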
src/bin/pg_dump/pg_backup_tar.c

+20 -30
@@ -79,13 +79,6 @@ typedef struct
 ArchiveHandle *AH;
 } TAR_MEMBER;
 
-/*
- * Maximum file size for a tar member: The limit inherent in the
- * format is 2^33-1 bytes (nearly 8 GB). But we don't want to exceed
- * what we can represent in pgoff_t.
- */
-#define MAX_TAR_MEMBER_FILELEN (((int64) 1 << Min(33, sizeof(pgoff_t)*8 - 1)) - 1)
-
 typedef struct
 {
 int hasSeek;
@@ -1050,7 +1043,7 @@ isValidTarHeader(char *header)
 int sum;
 int chk = tarChecksum(header);
 
-sscanf(&header[148], "%8o", &sum);
+sum = read_tar_number(&header[148], 8);
 
 if (sum != chk)
 return false;
@@ -1092,13 +1085,6 @@ _tarAddFile(ArchiveHandle *AH, TAR_MEMBER *th)
 strerror(errno));
 fseeko(tmp, 0, SEEK_SET);
 
-/*
- * Some compilers will throw a warning knowing this test can never be true
- * because pgoff_t can't exceed the compared maximum on their platform.
- */
-if (th->fileLen > MAX_TAR_MEMBER_FILELEN)
-exit_horribly(modulename, "archive member too large for tar format\n");
-
 _tarWriteHeader(th);
 
 while ((cnt = fread(buf, 1, sizeof(buf), tmp)) > 0)
@@ -1223,11 +1209,10 @@ _tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th)
 {
 lclContext *ctx = (lclContext *) AH->formatData;
 char h[512];
-char tag[100];
+char tag[100 + 1];
 int sum,
 chk;
-size_t len;
-unsigned long ullen;
+pgoff_t len;
 pgoff_t hPos;
 bool gotBlock = false;
 
@@ -1250,7 +1235,7 @@ _tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th)
 
 /* Calc checksum */
 chk = tarChecksum(h);
-sscanf(&h[148], "%8o", &sum);
+sum = read_tar_number(&h[148], 8);
 
 /*
  * If the checksum failed, see if it is a null block. If so, silently
@@ -1273,27 +1258,31 @@ _tarGetHeader(ArchiveHandle *AH, TAR_MEMBER *th)
 }
 }
 
-sscanf(&h[0], "%99s", tag);
-sscanf(&h[124], "%12lo", &ullen);
-len = (size_t) ullen;
+/* Name field is 100 bytes, might not be null-terminated */
+strlcpy(tag, &h[0], 100 + 1);
+
+len = read_tar_number(&h[124], 12);
 
 {
-char buf[100];
+char posbuf[32];
+char lenbuf[32];
 
-snprintf(buf, sizeof(buf), INT64_FORMAT, (int64) hPos);
-ahlog(AH, 3, "TOC Entry %s at %s (length %lu, checksum %d)\n",
-tag, buf, (unsigned long) len, sum);
+snprintf(posbuf, sizeof(posbuf), UINT64_FORMAT, (uint64) hPos);
+snprintf(lenbuf, sizeof(lenbuf), UINT64_FORMAT, (uint64) len);
+ahlog(AH, 3, "TOC Entry %s at %s (length %s, checksum %d)\n",
+tag, posbuf, lenbuf, sum);
 }
 
 if (chk != sum)
 {
-char buf[100];
+char posbuf[32];
 
-snprintf(buf, sizeof(buf), INT64_FORMAT, (int64) ftello(ctx->tarFH));
+snprintf(posbuf, sizeof(posbuf), UINT64_FORMAT,
+(uint64) ftello(ctx->tarFH));
 exit_horribly(modulename,
 "corrupt tar header found in %s "
 "(expected %d, computed %d) file position %s\n",
-tag, sum, chk, posbuf);
+tag, sum, chk, posbuf);
 }
 
 th->targetFile = pg_strdup(tag);
@@ -1308,7 +1297,8 @@ _tarWriteHeader(TAR_MEMBER *th)
 {
 char h[512];
 
-tarCreateHeader(h, th->targetFile, NULL, th->fileLen, 0600, 04000, 02000, time(NULL));
+tarCreateHeader(h, th->targetFile, NULL, th->fileLen,
+0600, 04000, 02000, time(NULL));
 
 /* Now write the completed header. */
 if (fwrite(h, 1, 512, th->tarFH) != 512)

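One of the hunks above enlarges `tag` to 101 bytes because the 100-byte name field may be completely full and therefore lack a terminating NUL. A minimal sketch of the same idea in portable C follows. It uses a bounded scan plus `memcpy` rather than `strlcpy`, which is not available in every C library. The function name `copy_tar_name` and the macro `TAR_NAME_FIELD_LEN` are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TAR_NAME_FIELD_LEN 100  /* width of the name field at offset 0 */

/*
 * Hypothetical helper: copy the member name out of a 512-byte header.
 * dst must have room for TAR_NAME_FIELD_LEN + 1 bytes.
 */
static void
copy_tar_name(char *dst, const char *header)
{
    size_t n = 0;

    assert(dst != NULL && header != NULL);

    /* Stop at the first NUL or at the end of the field. */
    while (n < TAR_NAME_FIELD_LEN && header[n] != '\0')
        n++;
    memcpy(dst, header, n);
    dst[n] = '\0';
}
```

A name that fills all 100 bytes comes back as a 100-character string, while shorter names are cut at their own terminator.
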
src/include/pgtar.h

+3 -1
@@ -11,5 +11,7 @@
  *
  *-------------------------------------------------------------------------
  */
-extern void tarCreateHeader(char *h, const char *filename, const char *linktarget, size_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
+extern void tarCreateHeader(char *h, const char *filename, const char *linktarget,
+pgoff_t size, mode_t mode, uid_t uid, gid_t gid, time_t mtime);
+extern uint64 read_tar_number(const char *s, int len);
 extern int tarChecksum(char *header);
