Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

phar:// tar parser and zero-length file header blocks #16695

Closed
hakre opened this issue Nov 4, 2024 · 12 comments
Closed

phar:// tar parser and zero-length file header blocks #16695

hakre opened this issue Nov 4, 2024 · 12 comments

Comments

@hakre
Copy link
Contributor

hakre commented Nov 4, 2024

Description

The following code:

<?php

$reportTar = 'report.tar';

// header block of a file with 0 bytes, from an example "report.tar"
// $ tar -t -vf report.tar
// drwx------ 0/0               0 1970-01-01 01:00 tls
$length = file_put_contents($reportTar, base64_decode('dGxzAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAwMDA3MDAAMDAwMDAwMAAwMDAwMDAwADAwMDAwMDAwMDAwADAwMDAwMDAwMDAwADAwNzcxNgAgNQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB1c3RhcgAwMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwMDAwMDAwADAwMDAwMDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA='));
assert(512 === $length);
$buffer = file_get_contents("phar://$reportTar/tls");
# assert('' === $buffer);

Resulted in this output:

$ <report.php php8.4
PHP Warning:  file_get_contents(phar://report.tar/tls): Failed to open stream: internal corruption of phar "/home/user/report/report.tar" (__HALT_COMPILER(); not found) in Standard input code on line 10

But I expected no diagnostic messages to output instead.

Additional Information:

If there is a file with length in the tar archive beforehand, a different diagnostic message is given:

PHP Warning:  file_get_contents(phar://report.tar/meta.json): Failed to open stream: phar error: "/home/user/report/report.tar" is a corrupted tar file (truncated)

We believe in good faith that the file is NOT truncated and therefore the diagnostic message is wrong. That is, because if we add 512 NUL bytes to the file, the meta.json file contents can be properly obtained.

And then conclude by it, that the more minimal reproducer yields a different diagnostic.

Therefore here the original reproducer as a self-contained php script:

<?php

$reportTar = 'report.tar';

// header block of a file with 0 bytes, from an example "report.tar"
// $ tar -t -vf report.tar
// -rw-r--r-- 0/0             122 1970-01-01 01:00 meta.json
// drwx------ 0/0               0 1970-01-01 01:00 tls
$length = file_put_contents($reportTar, base64_decode('bWV0YS5qc29uAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAwMDA2NDQAMDAwMDAwMAAwMDAwMDAwADAwMDAwMDAwMTcyADAwMDAwMDAwMDAwADAxMTAyNgAgMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB1c3RhcgAwMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwMDAwMDAwADAwMDAwMDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB7Ik5hbWUiOiJkZWZhdWx0IiwiTWV0YWRhdGEiOnt9LCJFbmRwb2ludHMiOnsiZG9ja2VyIjp7Ikhvc3QiOiJ1bml4Oi8vL3J1bi91c2VyLzEwMDAvZG9ja2VyLnNvY2siLCJTa2lwVExTVmVyaWZ5IjpmYWxzZX19fQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAHRscwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwMDAwNzAwADAwMDAwMDAAMDAwMDAwMAAwMDAwMDAwMDAwMAAwMDAwMDAwMDAwMAAwMDc3MTYAIDUAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAdXN0YXIAMDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAMDAwMDAwMAAwMDAwMDAwAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA'));
assert(3*512 === $length);

$buffer = file_get_contents("phar://$reportTar/meta.json");
assert(false === $buffer, 'assert the false negative');

// appending 512 NUL bytes aides the phar stream wrapper to handle the file
$length = file_put_contents($reportTar, str_repeat("\x00", 512), FILE_APPEND);
assert(512 === $length);

$buffer = file_get_contents("phar://$reportTar/meta.json");
assert('{"Name":"default","Metadata":{},"Endpoints":{"docker":{"Host":"unix:///run/user/1000/docker.sock","SkipTLSVerify":false}}}' === $buffer);

Resulted in this output:

$ <report.php php8.4
PHP Warning:  file_get_contents(phar://report.tar/meta.json): Failed to open stream: phar error: "/home/user/report/report.tar" is a corrupted tar file (truncated) in Standard input code on line 12

Command the tar file has been created with (before base64 encoding it for the report):

$ rm -f -- report.tar
$ docker context export "${DOCKER_CONTEXT:-default}" report.tar
$ ls -l report.tar
-rw------- 1 user user 1536 Nov  4 09:26 report.tar
$ tar -vt -f report.tar
-rw-r--r-- 0/0             122 1970-01-01 01:00 meta.json
drwx------ 0/0               0 1970-01-01 01:00 tls

As the tar file format is named ustar, and the tar writer yields an object of 3*512 / 1536 bytes (header, file data, header), and the GNU Tar info pages suggest not to write file data if there is none, and the two terminating 512 blocks of NUL chars are not an error ref, and the file is being read when adding only 512 NUL bytes, we'd like to report the original diagnostic message:

corrupted tar file (truncated)

as we believe the file is not corrupted nor truncated. It is our understanding, that it only contains a zero-length file which suffices to have the file header block alone and must not be preceded by a data block consisting of only NUL chars.

PHP Version

PHP 8.4.0RC3 (and also: 8.3.13, 8.2.25, 8.1.30, 8.0.30, ... 5.3.29)

Operating System

$ lsb_release -a
LSB Version:	core-11.1.0ubuntu4-noarch:printing-11.1.0ubuntu4-noarch:security-11.1.0ubuntu4-noarch
Distributor ID:	Ubuntu
Description:	Ubuntu 22.04.5 LTS
Release:	22.04
Codename:	jammy
@hakre
Copy link
Contributor Author

hakre commented Nov 4, 2024

It looks like that if not at least a header sized (512 bytes) buf block can be read, the diagnostic message is triggered and failure enforced at the very end of the do/while loop:

php-src/ext/phar/tar.c

Lines 594 to 606 in f37fd7f

read = php_stream_read(fp, buf, sizeof(buf));
if (read != sizeof(buf)) {
if (error) {
spprintf(error, 4096, "phar error: \"%s\" is a corrupted tar file (truncated)", fname);
}
php_stream_close(fp);
phar_destroy_phar_data(myphar);
return FAILURE;
}
} while (!php_stream_eof(fp));

as it is then at the end of the file and the actual size to read is 0, the actual correct situation that no bytes should have been to read and not being able to read the "next" 512 bytes is the actual event of end-of-file (not truncation).

As the loop is long (starts 254) still looking for the actual exit condition with the 512 NUL bytes added which works fine, so take this initial analysis with a grain of salt.

@nielsdos
Copy link
Member

nielsdos commented Nov 4, 2024

Well, the first bug seems to be a classic off-by-one check, i.e. if (got > 512) { instead of if (got >= 512) {.
The second issue will take some time for me to understand, as this code is complex.

@nielsdos nielsdos self-assigned this Nov 4, 2024
nielsdos added a commit to nielsdos/php-src that referenced this issue Nov 4, 2024
There are two issues:
1) There's an off-by-one in the check for the minimum file size for a
   tar (i.e. `>` instead of `>=`).
2) The loop in the tar parsing parses a header, and then unconditionally
   reads the next one. However, that doesn't necessarily exist.
   Instead, we remove the loop condition and check for the end of the
   file before reading the next header. Note that we can't use
   php_stream_eof as the flag may not be set yet when we're already at
   the end.
@nielsdos
Copy link
Member

nielsdos commented Nov 4, 2024

Please check #16700, it should fix the issues for you.

@hakre
Copy link
Contributor Author

hakre commented Nov 4, 2024

Well, the first bug seems to be a classic off-by-one check, i.e. if (got > 512) { instead of if (got >= 512) {.

Makes sense, was not so much concerned about that one yet thought as the second issue was my concrete issue (so the first).

The second issue will take some time for me to understand, as this code is complex.

Yes, starring at the code over the day. Started to compile locally to test a bit. Unfortunately having the problem that somehow track_errors directive is set when executing sapi/cli/php and as it turns into a fatal error without telling from where I'm having a hard time to troubleshoot it ... @nielsdos

@nielsdos
Copy link
Member

nielsdos commented Nov 4, 2024

track_errors is one of those obsolete INI entries that indeed causes an E_CORE_ERROR (see main.c)
I suppose it's loading an old php.ini from somewhere. You should remove the track_errors from that INI.
if you can't find the INI, you can check php -i. If that still doesn't help, pass -n to php to disable loading any INI.

@hakre
Copy link
Contributor Author

hakre commented Nov 4, 2024

Wow you already pushed a patch, nice! @nielsdos

For the reproducers, the 3*512 bytes one is doing too much which I learned later. The last 512 bytes are the header of the entry named tls which is a directory. It can be left out for this report, the test is fine with the first 1024 bytes (and should work):

// header block of a file with 122 bytes, from an example "report.tar"
// $ tar -t -vf report.tar
// -rw-r--r-- 0/0             122 1970-01-01 01:00 meta.json
$buffer = base64_decode('bWV0YS5qc29uAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAADAwMDA2NDQAMDAwMDAwMAAwMDAwMDAwADAwMDAwMDAwMTcyADAwMDAwMDAwMDAwADAxMTAyNgAgMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB1c3RhcgAwMAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAwMDAwMDAwADAwMDAwMDAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAB7Ik5hbWUiOiJkZWZhdWx0IiwiTWV0YWRhdGEiOnt9LCJFbmRwb2ludHMiOnsiZG9ja2VyIjp7Ikhvc3QiOiJ1bml4Oi8vL3J1bi91c2VyLzEwMDAvZG9ja2VyLnNvY2siLCJTa2lwVExTVmVyaWZ5IjpmYWxzZX19fQAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA==');

And thinking: A good third test would be a tar consisting of a single header with an empty file, so the overall tar is just 512 bytes long but file_get_contents() of it should work.

@hakre
Copy link
Contributor Author

hakre commented Nov 4, 2024

track_errors is one of those obsolete INI entries that indeed causes an E_CORE_ERROR (see main.c) I suppose it's loading an old php.ini from somewhere. You should remove the track_errors from that INI. if you can't find the INI, you can check php -i. If that still doesn't help, pass -n to php to disable loading any INI.

Thanks! Found the culprit, it is the following setting:

$ sapi/cli/php -n -i | grep ini | head -n 1
Configuration File (php.ini) Path => /usr/local/lib

Have to check buildconf or configure parameters to change it to it's own location, at least assuming so @nielsdos

nielsdos added a commit to nielsdos/php-src that referenced this issue Nov 4, 2024
There are two issues:
1) There's an off-by-one in the check for the minimum file size for a
   tar (i.e. `>` instead of `>=`).
2) The loop in the tar parsing parses a header, and then unconditionally
   reads the next one. However, that doesn't necessarily exist.
   Instead, we remove the loop condition and check for the end of the
   file before reading the next header. Note that we can't use
   php_stream_eof as the flag may not be set yet when we're already at
   the end.
@nielsdos
Copy link
Member

nielsdos commented Nov 4, 2024

For the reproducers, the 3*512 bytes one is doing too much which I learned later. The last 512 bytes are the header of the entry named tls which is a directory. It can be left out for this report, the test is fine with the first 1024 bytes (and should work):

Done, thanks!

And thinking: A good third test would be a tar consisting of a single header with an empty file, so the overall tar is just 512 bytes long but file_get_contents() of it should work.

Indeed, I've modified the first tar file by hand to create an extra test variation. There are now three tests.

Have to check buildconf or configure parameters to change it to it's own location, at least assuming so @nielsdos

If you want to change the INI location of your build, pass --with-config-file-path=PATH to ./configure (see ./configure --help for more info).

@hakre
Copy link
Contributor Author

hakre commented Nov 4, 2024

If you want to change the INI location of your build, pass --with-config-file-path=PATH to ./configure (see ./configure --help for more info).

Found it already via configure.ac and building just went through. No more -n necessary:

$ sapi/cli/php -i | grep -F ini | head -n 1
Configuration File (php.ini) Path => /usr/local/lib/php/master/cli

Thanks a ton @nielsdos

Now double cross fingers make test passes, then this was the only gap. I'll then checkout your branch and test it locally.

@hakre
Copy link
Contributor Author

hakre commented Nov 5, 2024

Local testing looks good so far. Both for compiling and concrete scripts with the previously broken functionality.

I'm only seeing untracked files on the php 8.2 branch, .gitignore in master seems more up-to-date. How are backports done in this regard?

Finally I also have some minor changes for the buildconf script waiting to be cherry-picked. Should I file those again 8.2 first as well?

 
 # Go to project root.
-cd $(CDPATH= cd -- "$(dirname -- "$0")" && pwd -P)
+cd "$(CDPATH='' cd -- "$(dirname -- "$0")" && pwd -P)" || exit
 
-php_extra_version=$(grep '^AC_INIT(' configure.ac)
+php_extra_version=$(grep '^AC_INIT(' configure.ac) || exit
 case "$php_extra_version" in
   *-dev*)

@nielsdos
Copy link
Member

nielsdos commented Nov 5, 2024

I'm only seeing untracked files on the php 8.2 branch, .gitignore in master seems more up-to-date. How are backports done in this regard?

Not sure I fully understand the question. Our pull requests target the lowest supported bugfix branch and then we merge the patch upwards to more recent branches. If you think something is missing in the gitignore feel free to open a PR.

Finally I also have some minor changes for the buildconf script waiting to be cherry-picked. Should I file those again 8.2 first as well?

 
 # Go to project root.
-cd $(CDPATH= cd -- "$(dirname -- "$0")" && pwd -P)
+cd "$(CDPATH='' cd -- "$(dirname -- "$0")" && pwd -P)" || exit
 
-php_extra_version=$(grep '^AC_INIT(' configure.ac)
+php_extra_version=$(grep '^AC_INIT(' configure.ac) || exit
 case "$php_extra_version" in
   *-dev*)

I'm not a build system expert so I'm not entirely sure what this fixes, but yes please file a PR against 8.2 so someone more knowledgeable can have a look.

@hakre
Copy link
Contributor Author

hakre commented Nov 6, 2024

Not sure I fully understand the question. Our pull requests target the lowest supported bugfix branch and then we merge the patch upwards to more recent branches. If you think something is missing in the gitignore feel free to open a PR.

Thanks, that helped clarifying it and also after checking it again, the reason was b/c building on top of master first, then testing on top of PHP 8.2 - but unclean. make clean did also help.

I'm not a build system expert so I'm not entirely sure what this fixes, but yes please file a PR against 8.2 so someone more knowledgeable can have a look.

Now filed PR #16717 (and PR #16724 for PHP-8.4)

nielsdos added a commit that referenced this issue Nov 9, 2024
* PHP-8.2:
  Fix GH-16695: phar:// tar parser and zero-length file header blocks
nielsdos added a commit that referenced this issue Nov 9, 2024
* PHP-8.3:
  Fix GH-16695: phar:// tar parser and zero-length file header blocks
nielsdos added a commit that referenced this issue Nov 9, 2024
* PHP-8.4:
  Fix GH-16695: phar:// tar parser and zero-length file header blocks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants