Skip to content

[Routing] fix encoding of static static #4205

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
wants to merge 5 commits into from
Closed

Conversation

Tobion
Copy link
Contributor

@Tobion Tobion commented May 5, 2012

BC break: no
Fixes #4166

The first commit shows simply what the problem is: The chars, in this case ", get encoded when part of a placeholder but not as static text. This makes it inconsistent and can produce invalid URLs.

The second commit fixes the issue and further enhances the test with more chars. It also adds chars that do not need to be encoded for better usuability (explained in code). Some people might want to use chars like + in the path and it looks better when its not encoded and it doesn't change anything semantically (it generates the same url, but is easier to read).

The third commit fixed a phpDoc issue (not using inheritDoc and the setter was marked with @api but not the getter).

The forth commit fixes another problem I found out: The path segments "." and ".." are interpreted as relative reference when resolving a URI. So we need to encode them as they are not (and can not be) used for this purpose by a user. Otherwise we would generate a URI that, when followed by a user agent (e.g. browser), does not match this route. See test cases for examples. This issue can potentially be exploited by users.
Example exploit: The developer has defined a route like /blog/{slug} and let the user define the slug title himself. When the user choses ".." as slug and the developer generates a URL for this route with this slug, the resulting URL will be: /blog/...
But a link to this URI basically means / (the parent) and thus points to a complete different URI than intended. So the user could change the target URL which is a security issue.

'%2F' => '/',
// the following chars should not be encoded so someone could generate path parameters
// see http://doriantaylor.com/policy/http-url-path-parameter-syntax
'%3B' => ';',
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@Tobion to make it clear, is it required that all this char list is not encoded or is is only to generate URL that are easier to read ? (I refer to all the chars excepting "/")

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

;, , and = must not be encoded so the routing component can be used to generate path parameters. this feature is defined in the uri specification but I guess most people didn't know about that yet (me including until recently).
The other chars are only left unencoded for readability. But I think it's nice and would otherwise confuse people who for example specify a route like /calcsum/{summand1}+{summand2} that generates something like /calcsum/x%2By. As the example illustrates, these chars are sometimes used by developers as delimiters and encoding them wouldn't look that nice (e.g. in plain text emails).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So:

  • it would be great to link the spec in the comment
  • it would not do more than what is strictly required - btw "left unencoded" is "encoded then decoded" to be exact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

http://doriantaylor.com/policy/http-url-path-parameter-syntax already links to it: http://tools.ietf.org/html/rfc3986#section-3.3
But the mentioned link explains it much better.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Couldn't there be an issue here: An uri would be like path;param1=value2,param2=value2 and if I get your proposal right, ;, , and = would not be encoded in both the path and path parameters so the solution is wrong if the path contains any of these 3 chars.

If this is the case - and provided that parameter path is seldom used - could we pass the path parameters in the $parameters instead (i.e. _path_parameters) ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's still not clear to me, sorry...

I think ;, , and = should be either encoded or raw according to their meaning (part of path or part of the path parameters). I don't think your PR allows this ? If it doesn't I fail to see how "It just allows users to use it".

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A route like /resource/name;{param1};{param2} and you have created a path with path parameters by hand. If we encode ; it wouldn't work anymore.
And it would also break BC somehow as currently this style also works (unintended because no static text is encoded at all). So currently /resource/na%me;{param1};{param2} would create an invalid URL because % is not encoded.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I must admit I hadn't look at the details of this PR, now I understand your point. However I still think this is not the right solution (to a real issue)

How would you differentiate the path "/path;k=v" (URL = "/path%3Bk%3Dv") from the path with parameters (URL = "/path;k=v").

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think I also got a bit confused with the reserved sub-delimiting chars ",;=". IMO it doesn't matter if these chars get encoded or not in terms of URI equivalence. See http://tools.ietf.org/html/rfc3986#section-6.2.2
Using them for delimiting path parameters is only a suggestion in the URI specification for URI producing applications. So they have no real predefined meaning and are therefore treated the same as other sub-delimiters, like "+".

So your example represents the same resource: "/path;k=v" == "/path%3Bk%3Dv". For example you can access such a file "path;k=v" in Apache with both URI requests.

All in all, we could encode all chars (except "/") and will still produce the same uri. That means we could also encode "A-Z" etc because after normalization it's still the same. But it's not recommended that

URI producers percent-encode octets that do not require percent-encoding

I would consider these sub-delimiters as such chars that don't need encoding because it will enhance readability. But after all it's our choice.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will remove the commenatary distinction for ",;=" in the code of this PR.

@vicb
Copy link
Contributor

vicb commented May 6, 2012

I still don't agree with this PR: The "Reserved Characters" could be encoded or not according to the meaning they should have.

Your PR proposes to have some chars de-encoded which according to the previous statement could change the meaning of the URL.

I don't have a good solution either.

Note: you linked the section 6.2.2 which is about normalization however section 2.2 clearly states: "Thus, characters in the reserved set are protected from normalization"

@Tobion
Copy link
Contributor Author

Tobion commented May 6, 2012

I don't understand what you want to achieve differently. If you disagree with this PR, please give a test case that in your opinion should have a different output. Otherwise we are talking in circles.

@Tobion
Copy link
Contributor Author

Tobion commented May 6, 2012

Let's recap. We have three options:

  • encode all special chars except "/" (would slightly break bc for special chars in static text)
  • not encode (= encode and decode) some chars both in static text and variables that developers often use as sub-delimiters in their URLs for better readability (would slightly break bc for special chars in variables)
  • not encode special chars in static text, but encode them in variables (would pretty much not break bc, but is inconsistent because it generates different URLs depending on whether the reserved char is in the text or variable)

And yes, it's correct that it matters if the reserved chars get encoded or not because they are not part of the normalization. So it would generate two different URIs (when comparing URIs in browser caches for example).
It doesn't matter for the symfony matching as rawurldecode() also decodes the reserved chars. And Apache will also find the file regardless of encoding as mentioned above: "/path;k=v" == "/path%3Bk%3Dv"
So encoding or not of reserved chars matters for uniquely identifying resources, but not for retrieving.

The current situation is that static text is not encoded at all, which we agree is a bug, as it can potentially generate invalid, inconsistent and unexcepted URLs. To fix this, I'm highly in favor of option 2 as:

  • it enhances readability of URLs (like in E-Mails where people might find encoded values suspicious and ugly),
  • the chosen chars are safe to be used in the path as explained in code and cannot be misinterpreted by URL parsers,
  • is consistent.

Example: I have a route like /@{id}. Currently and with this PR it would generate for example: /@1234.
If we chose option 1 it would suddenly generate /%401234, which doesn't matter for routing, but all browser caches and proxy caches etc. would interpret this as a new URL and thus miss the cache.
Opposite example: Route /{id} with id="@1234" would with this PR also generate /@1234, whereas previously it generated /%401234. If we chose option 3 both would generate the same as before but is inconsistent and probably not what someone would expect: /@{id} != /{id} (with id having an @ an beginning).

I hope everything is clear now.

@Tobion
Copy link
Contributor Author

Tobion commented May 7, 2012

@vicb: I'm online at symfony-dev and skype.

@travisbot
Copy link

This pull request passes (merged 45fd8844 into 6f6ee95).

@Tobion
Copy link
Contributor Author

Tobion commented May 10, 2012

I fixed another issue with path segments ".." and "." and updated the PR description.
I think this should be merged ASAP as the issue can potentially be exploited by users (see description).

@travisbot
Copy link

This pull request passes (merged ae34dd6 into 906f6f6).

@Tobion
Copy link
Contributor Author

Tobion commented May 10, 2012

ping @fabpot

@travisbot
Copy link

This pull request passes (merged 4ef215e into 906f6f6).

@Tobion
Copy link
Contributor Author

Tobion commented May 17, 2012

@vicb FYI, since you were not really comfortable with not encoding some of these chars: Zends URI package also excludes these chars from encoding in the path. See https://github.com/zendframework/zf2/blob/master/library/Zend/Uri/Uri.php#L912
They even exclude some more chars like &, which I could also add.

This is also true for Ruby on Rails. See https://github.com/rails/journey/blob/master/lib/journey/router/utils.rb#L34

@Tobion Tobion closed this Jun 9, 2012
fabpot added a commit that referenced this pull request Jul 3, 2012
Commits
-------

3466896 [Routing] fix encoding of static static, so UrlGenerator produces valid URLs

Discussion
----------

[Routing] fix encoding of static text

Fixes #4166

As requested by Fabien, I split #4205 into multiple PRs.

---------------------------------------------------------------------------

by stof at 2012-06-09T12:49:32Z

github tells me this PR cannot be merged automatically. Could you rebase it ?

---------------------------------------------------------------------------

by Tobion at 2012-06-09T13:12:55Z

Done

---------------------------------------------------------------------------

by travisbot at 2012-06-09T13:18:18Z

This pull request [passes](http://travis-ci.org/symfony/symfony/builds/1576266) (merged 3466896 into 15b5aa4).
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants