Skip to content

[Process] Windows: stronger escaping, disabled delayed expansion, deprecated options #21254

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 1 commit into from

Conversation

nicolas-grekas
Copy link
Member

@nicolas-grekas nicolas-grekas commented Jan 12, 2017

Q A
Branch? master
Bug fix? no
New feature? yes
BC breaks? no
Deprecations? yes
Tests pass? yes
Fixed tickets #19993, #10486
License MIT
Doc PR -

Related to #3381.

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Jan 12, 2017

Credits to @johnstevenson

@nicolas-grekas nicolas-grekas force-pushed the shell-args branch 2 times, most recently from 02a6451 to ef747f3 Compare January 12, 2017 20:07
@dunglas
Copy link
Member

dunglas commented Jan 12, 2017

Maybe should you copy the full test suite too? https://github.com/johnstevenson/winbox-args/blob/master/tests/ArgsTest.php

@nicolas-grekas
Copy link
Member Author

@johnstevenson could you please have a look at this PR, and see if the current failures+patch seems legit to you? Also, OK for adding the test suite here?

@johnstevenson
Copy link
Contributor

@nicolas-grekas Will do. No worries copying the test suite

@staabm
Copy link
Contributor

staabm commented Jan 12, 2017

@johnstevenson did you check whether this 'advanced' windows escaping logic should be fixed/applied to php-src?

@nicolas-grekas
Copy link
Member Author

See https://blogs.msdn.microsoft.com/twistylittlepassagesallalike/2011/04/23/everyone-quotes-command-line-arguments-the-wrong-way/ also for reference. Looks like the escaping logic should vary if bypass_shell is on/off.

@johnstevenson
Copy link
Contributor

Yes, that's what the meta param is all about.

@nicolas-grekas nicolas-grekas changed the base branch from 2.7 to master January 18, 2017 09:44
@nicolas-grekas nicolas-grekas force-pushed the shell-args branch 12 times, most recently from a26c700 to ba4c870 Compare January 18, 2017 13:45
@nicolas-grekas nicolas-grekas force-pushed the shell-args branch 2 times, most recently from d85ebec to bdd686d Compare January 18, 2017 13:56
@nicolas-grekas nicolas-grekas changed the title [Process] Make arg escaping on Windows more robust [Process] Windows: stronger escaping, disabled delayed expansion, deprecated options Jan 18, 2017
@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Jan 18, 2017

PR green
Status: needs review

Changelog updated:

  • disabled delayed expansion on Windows
  • added second argument to ProcessUtils::escapeArgument() for controlling the escaping context on Windows
  • deprecated !VAR! expansion allowed by ProcessUtils::escapeArgument()
  • deprecated configuring proc_open() options
  • deprecated configuring enhanced Windows compatibility
  • deprecated configuring enhanced SIGCHLD compatibility

One could consider some of these changes as breaking BC. Yet, any other options/uses of the related default behaviors was putting the component in some "unknown/unpredictable" state. It couldn't work really on Windows.

Eg delayed expansion what enabled as a workaround to allow using !env_vars! in assetic. Yet, delayed expansion adds yet another parsing layer on top of the already too many and clumsy ones on Windows, making argument escaping impossible.
Now that we have %env(var)% in configs, this is not needed anymore. For BC, !env_vars! are now resolved on PHP's side, with a deprecation notice.

@johnstevenson sorry, I did not use your code - it did not fit here :)

* `!VAR!` expansion allowed by ProcessUtils::escapeArgument() is deprecated.
You should resolve these variables before calling `ProcessUtils::escapeArgument()`.

* Configuring `proc_open()` options, Windows compatibility and SIGCHLD
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing I in SIGCHILD (same in other files)

$this->options = array_replace(array('suppress_errors' => true, 'binary_pipes' => true), $options);
if (null !== $options) {
@trigger_error(sprintf('The $options parameter of the %s constructor is deprecated since version 3.3 and will be removed in 4.0.', __CLASS__), E_USER_DEPRECATED);
$this->options = $options + $this->options;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would keep array_replace. It makes the code more readable IMO.

@@ -274,24 +277,23 @@ public function start(callable $callback = null)
if ('\\' === DIRECTORY_SEPARATOR && !empty($this->options['bypass_shell']) && !$this->enhanceWindowsCompatibility) {
throw new LogicException('The "bypass_shell" option must be false to inherit environment variables while enhanced Windows compatibility is off');
}
$escape = '\\' === DIRECTORY_SEPARATOR ? function ($v) { return ProcessUtils::escapeArgument($v, ProcessUtils::ESC_WINDOWS_CMD); } : 'escapeshellarg';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wouldn't this break BC by disabling the expansion of variables due to having 2 args ?

}

return escapeshellarg($argument);
if ((self::ESC_WINDOWS_ARGV & $mode) && (false !== strpbrk($argument, " \t\n\v\"") || !isset($argument[0]))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't the case of the empty string be checked before the strpbrk ? I suspect it is a cheaper call

return escapeshellarg($argument);
if ((self::ESC_WINDOWS_ARGV & $mode) && (false !== strpbrk($argument, " \t\n\v\"") || !isset($argument[0]))) {
$argument = preg_replace('/(\\\\*+)"/', '$1$1\\"', $argument);
$argument = preg_replace('/(\\\\++)$/', '$1$1', $argument);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for the empty string, we could skip these regexes

*/
class ProcessUtils
{
const ESC_WINDOWS_ARGV = 1;
const ESC_WINDOWS_CMD = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should have documentation on these to explain their respective behavior

array('^%path^%', '%path%'),
array('^"^<^|^>\\^" \\^"\'f^"', '<|>" "\'f'),
array('^"^"', ''),
array('^"with\trailing bs\\\\^"', 'with\trailing bs\\'),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also have tests covering the usage of the different constants

@@ -274,24 +277,23 @@ public function start(callable $callback = null)
if ('\\' === DIRECTORY_SEPARATOR && !empty($this->options['bypass_shell']) && !$this->enhanceWindowsCompatibility) {
throw new LogicException('The "bypass_shell" option must be false to inherit environment variables while enhanced Windows compatibility is off');
}
$escape = '\\' === DIRECTORY_SEPARATOR ? function ($v) { return ProcessUtils::escapeArgument($v, ProcessUtils::ESC_WINDOWS_CMD); } : 'escapeshellarg';
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and why applying on the CMD escaping, not the ARGV escaping, while Unix uses escapeshellarg ?

@johnstevenson
Copy link
Contributor

@nicolas-grekas No worries from my end for not using my code. However...

The reason I did not use the code from the blog entry was because it is not reliable (incidentally, it is listed in my References section). Unfortunately, the author missed an extremely important gotcha:

If the argument is the name of the program to execute and it contains whitespace, then it will not be recognized by cmd if it is caret-escaped. This is because the program is identified from the start of the command, delimited by whitespace outside a quoted-string. And because the parser is never inside a quoted-string it will stop when it reaches the first whitespace character.

from Caret Escaping Meta Characters which is part of the Implementing a solution section that ties everything together.

For example C:\Program Files\php\php.exe will be caret-escaped as ^"C:\Program Files\php\php.exe^". But cmd will parse this as "C:\Program which obviously cannot be executed.

On a separate point, I do not understand the bit-masked constants, which appear to make things very brittle. For example the test

if (!(self::ESC_WINDOWS_ARGV_CMD & $mode))

doesn't actually restrict the $mode argument to the range 1-3 as intended. Also, if I were to pass in ESC_WINDOWS_CMD then a crucial part of the escaping is ignored. Perhaps the absence of an explanation has confused me here, but surely all that is needed is a:

  • default true we are using cmd.exe
  • false we are not using cmd.exe

Incidentally, just before you updated the PR I was working through the appveyor test failures to check if the actual output was correct and that the test data needed changing. I am pleased to report that this is the case. Hey ho!

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Jan 18, 2017

Hum. In fact, the author of the MSDN article makes it clear to me:

Windows knows about only one command line string for each process. Because one string is not terribly useful, libraries conspire to provide the illusion of multiple command line arguments.

cmd parses its parts, then for some apps, CommandLineToArgvW splits again the string (as done eg when calling php).
Thus, the path-to-php.exe is not parsed by CommandLineToArgvW - only by cmd. It doesn't take part of that "single string" that CommandLineToArgvW then interprets as command line arguments (the illusion mentioned above.)

All in all, that means there is no solution: escaping on Windows is contextual. If you don't know the context where the escaped value is going to be inserted, you can't escape it correctly...

Eg there is no way to escape the \n character.

Thus, I'm closing. Our enemy is parsing as done by cmd.

I might come up soon again with another radically different approach based on this snippet :)

putenv('LF=
');
putenv('C=SET && php -r "print_r($_SERVER);" "^^^!LF^!a!LF!b!LF!c"');

proc_close(proc_open('cmd /V:ON /E:ON /D /S /C %C%', array(), $p, null, null, array('bypass_shell' => true)));

@nicolas-grekas nicolas-grekas deleted the shell-args branch January 18, 2017 22:47
@johnstevenson
Copy link
Contributor

Thus, the path-to-php.exe is not parsed by CommandLineToArgvW - only by cmd.

I don't understand what you mean by this. Of course the path-to-php.exe is parsed by CommandLineToArgvW. See How Windows parses the command line for the details.

@nicolas-grekas
Copy link
Member Author

nicolas-grekas commented Jan 19, 2017

Continued in #21347 with a radically new approach.

@nicolas-grekas
Copy link
Member Author

Of course the path-to-php.exe is parsed by CommandLineToArgvW.

Example with echo: echo "\"" > the output is "\"" - which means CommandLineToArgvW did not apply.

@johnstevenson
Copy link
Contributor

Sorry, you've lost me! echo is an internal command so CommandLineToArgvW would never apply.

Just to be clear, do you believe that path-to-php.exe is not parsed by CommandLineToArgvW ?

@nicolas-grekas
Copy link
Member Author

We don't care anymore, see #21254 :)

@johnstevenson
Copy link
Contributor

Eh? This is #21254 isn't it?

@Simperfit
Copy link
Contributor

Maybe he was talking about #21347

@nicolas-grekas
Copy link
Member Author

I meant #21347 sorry :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants