-
-
Notifications
You must be signed in to change notification settings - Fork 31.8k
bpo-45863: tarfile: don't zero out header fields unnecessarily #29693
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
Original file line number | Diff line number | Diff line change |
---|---|---|
|
@@ -1899,6 +1899,61 @@ def test_pax_extended_header(self): | |
finally: | ||
tar.close() | ||
|
||
def test_create_pax_header(self): | ||
# The ustar header should contain values that can be | ||
# represented reasonably, even if a better (e.g. higher | ||
# precision) version is set in the pax header. | ||
# Issue #45863 | ||
|
||
# values that should be kept | ||
t = tarfile.TarInfo() | ||
t.name = "foo" | ||
t.mtime = 1000.1 | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more.
Maybe to avoid any rounding issue in the test, we could use
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I can make that change if you want, but there are going to be quite a few real-world timestamps that don't have an exact representation in floating point. There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. And come to think of it, 1000.5 has the disadvantage of being exactly halfway between two integers, so the result of rounding it changes depending on the prevailing rounding strategy. I think All in all, switching to fixed-point would probably be a good idea. But that's again rather out of scope for what I'm trying to accomplish here. :) |
||
t.size = 100 | ||
t.uid = 123 | ||
t.gid = 124 | ||
info = t.get_info() | ||
header = t.create_pax_header(info, encoding="iso8859-1") | ||
self.assertEqual(info['name'], "foo") | ||
# mtime should be rounded to nearest second | ||
self.assertIsInstance(info['mtime'], int) | ||
self.assertEqual(info['mtime'], 1000) | ||
jmroot marked this conversation as resolved.
Show resolved
Hide resolved
|
||
self.assertEqual(info['size'], 100) | ||
self.assertEqual(info['uid'], 123) | ||
self.assertEqual(info['gid'], 124) | ||
self.assertEqual(header, | ||
b'././@PaxHeader' + bytes(86) \ | ||
+ b'0000000\x000000000\x000000000\x0000000000020\x0000000000000\x00010205\x00 x' \ | ||
+ bytes(100) + b'ustar\x0000'+ bytes(247) \ | ||
+ b'16 mtime=1000.1\n' + bytes(496) + b'foo' + bytes(97) \ | ||
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. info contains mtime rounded to an int, but in the binary header, mtime can be a floatting point number? There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Correct. The mtime is rounded in the info after calling |
||
+ b'0000644\x000000173\x000000174\x0000000000144\x0000000001750\x00006516\x00 0' \ | ||
+ bytes(100) + b'ustar\x0000' + bytes(247)) | ||
|
||
# values that should be changed | ||
t = tarfile.TarInfo() | ||
t.name = "foo\u3374" # can't be represented in ascii | ||
t.mtime = 10**10 # too big | ||
t.size = 10**10 # too big | ||
t.uid = 8**8 # too big | ||
t.gid = 8**8+1 # too big | ||
info = t.get_info() | ||
header = t.create_pax_header(info, encoding="iso8859-1") | ||
# name is kept as-is in info but should be added to pax header | ||
self.assertEqual(info['name'], "foo\u3374") | ||
self.assertEqual(info['mtime'], 0) | ||
self.assertEqual(info['size'], 0) | ||
self.assertEqual(info['uid'], 0) | ||
self.assertEqual(info['gid'], 0) | ||
self.assertEqual(header, | ||
b'././@PaxHeader' + bytes(86) \ | ||
+ b'0000000\x000000000\x000000000\x0000000000130\x0000000000000\x00010207\x00 x' \ | ||
+ bytes(100) + b'ustar\x0000' + bytes(247) \ | ||
+ b'15 path=foo\xe3\x8d\xb4\n16 uid=16777216\n' \ | ||
+ b'16 gid=16777217\n20 size=10000000000\n' \ | ||
+ b'21 mtime=10000000000\n'+ bytes(424) + b'foo?' + bytes(96) \ | ||
+ b'0000644\x000000000\x000000000\x0000000000000\x0000000000000\x00006540\x00 0' \ | ||
+ bytes(100) + b'ustar\x0000' + bytes(247)) | ||
|
||
|
||
class UnicodeTest: | ||
|
||
|
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1 @@ | ||
When the :mod:`tarfile` module creates a pax format archive, it will put an integer representation of timestamps in the ustar header (if possible) for the benefit of older unarchivers, in addition to the existing full-precision timestamps in the pax extended header. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does PAX have a limit for the number of digits after the dot for a floating point number?
For example, mtime=1/7 is formatted as
0.14285714285714285
. The number0.142,857,142,857,142,85
has 17 digits after the dot. A resolution of 10^-17 seconds, whereas the most accurate clocks have a resolution of 1 ns (10^-9) on Linux.I get:
mtime=1644325033.182744228
atime=1644325033.182744228
ctime=1644325033.182744228
9 digits after the dot for mtime, atime and ctime.
Maybe for these 9 header names, round to 9 digits if there are more than 9 digits?
Maybe it's ok to have more digits. I don't know.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The standard doesn't specify a maximum number of digits. It suggests to use enough digits to allow the timestamp to be reconstructed exactly (if extracting on a system with the same timestamp precision as the originating system). Most implementations use nanosecond precision.
Doing additional rounding of the value in the pax extended header would be a behavioural change beyond what I'm setting out to do here. I suspect any potential issues with using more digits than an extractor can handle would be rare anyway, since you would have to deliberately use a made-up number rather than just dividing
st_mtime_ns
by 10^9.