-
-
Notifications
You must be signed in to change notification settings - Fork 9.6k
[Routing] fix encoding of static static #4205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
'%2F' => '/', | ||
// the following chars should not be encoded so someone could generate path parameters | ||
// see http://doriantaylor.com/policy/http-url-path-parameter-syntax | ||
'%3B' => ';', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Tobion to make it clear, is it required that all this char list is not encoded or is is only to generate URL that are easier to read ? (I refer to all the chars excepting "/")
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
;
, ,
and =
must not be encoded so the routing component can be used to generate path parameters. this feature is defined in the uri specification but I guess most people didn't know about that yet (me including until recently).
The other chars are only left unencoded for readability. But I think it's nice and would otherwise confuse people who for example specify a route like /calcsum/{summand1}+{summand2}
that generates something like /calcsum/x%2By
. As the example illustrates, these chars are sometimes used by developers as delimiters and encoding them wouldn't look that nice (e.g. in plain text emails).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So:
- it would be great to link the spec in the comment
- it would not do more than what is strictly required - btw "left unencoded" is "encoded then decoded" to be exact.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
http://doriantaylor.com/policy/http-url-path-parameter-syntax already links to it: http://tools.ietf.org/html/rfc3986#section-3.3
But the mentioned link explains it much better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Couldn't there be an issue here: An uri would be like path;param1=value2,param2=value2
and if I get your proposal right, ;
, ,
and =
would not be encoded in both the path and path parameters so the solution is wrong if the path contains any of these 3 chars.
If this is the case - and provided that parameter path is seldom used - could we pass the path parameters in the $parameters
instead (i.e. _path_parameters
) ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's still not clear to me, sorry...
I think ;
, ,
and =
should be either encoded or raw according to their meaning (part of path or part of the path parameters). I don't think your PR allows this ? If it doesn't I fail to see how "It just allows users to use it".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A route like /resource/name;{param1};{param2}
and you have created a path with path parameters by hand. If we encode ;
it wouldn't work anymore.
And it would also break BC somehow as currently this style also works (unintended because no static text is encoded at all). So currently /resource/na%me;{param1};{param2}
would create an invalid URL because %
is not encoded.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I must admit I hadn't look at the details of this PR, now I understand your point. However I still think this is not the right solution (to a real issue)
How would you differentiate the path "/path;k=v" (URL = "/path%3Bk%3Dv") from the path with parameters (URL = "/path;k=v").
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, I think I also got a bit confused with the reserved sub-delimiting chars ",;=". IMO it doesn't matter if these chars get encoded or not in terms of URI equivalence. See http://tools.ietf.org/html/rfc3986#section-6.2.2
Using them for delimiting path parameters is only a suggestion in the URI specification for URI producing applications. So they have no real predefined meaning and are therefore treated the same as other sub-delimiters, like "+".
So your example represents the same resource: "/path;k=v" == "/path%3Bk%3Dv". For example you can access such a file "path;k=v" in Apache with both URI requests.
All in all, we could encode all chars (except "/") and will still produce the same uri. That means we could also encode "A-Z" etc because after normalization it's still the same. But it's not recommended that
URI producers percent-encode octets that do not require percent-encoding
I would consider these sub-delimiters as such chars that don't need encoding because it will enhance readability. But after all it's our choice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will remove the commenatary distinction for ",;=" in the code of this PR.
I still don't agree with this PR: The "Reserved Characters" could be encoded or not according to the meaning they should have. Your PR proposes to have some chars de-encoded which according to the previous statement could change the meaning of the URL. I don't have a good solution either. Note: you linked the section 6.2.2 which is about normalization however section 2.2 clearly states: "Thus, characters in the reserved set are protected from normalization" |
I don't understand what you want to achieve differently. If you disagree with this PR, please give a test case that in your opinion should have a different output. Otherwise we are talking in circles. |
Let's recap. We have three options:
And yes, it's correct that it matters if the reserved chars get encoded or not because they are not part of the normalization. So it would generate two different URIs (when comparing URIs in browser caches for example). The current situation is that static text is not encoded at all, which we agree is a bug, as it can potentially generate invalid, inconsistent and unexcepted URLs. To fix this, I'm highly in favor of option 2 as:
Example: I have a route like I hope everything is clear now. |
@vicb: I'm online at symfony-dev and skype. |
I fixed another issue with path segments ".." and "." and updated the PR description. |
…id URLs also added test with many special characters
ping @fabpot |
@vicb FYI, since you were not really comfortable with not encoding some of these chars: Zends URI package also excludes these chars from encoding in the path. See https://github.com/zendframework/zf2/blob/master/library/Zend/Uri/Uri.php#L912 This is also true for Ruby on Rails. See https://github.com/rails/journey/blob/master/lib/journey/router/utils.rb#L34 |
Commits ------- 3466896 [Routing] fix encoding of static static, so UrlGenerator produces valid URLs Discussion ---------- [Routing] fix encoding of static text Fixes #4166 As requested by Fabien, I split #4205 into multiple PRs. --------------------------------------------------------------------------- by stof at 2012-06-09T12:49:32Z github tells me this PR cannot be merged automatically. Could you rebase it ? --------------------------------------------------------------------------- by Tobion at 2012-06-09T13:12:55Z Done --------------------------------------------------------------------------- by travisbot at 2012-06-09T13:18:18Z This pull request [passes](http://travis-ci.org/symfony/symfony/builds/1576266) (merged 3466896 into 15b5aa4).
BC break: no
Fixes #4166
The first commit shows simply what the problem is: The chars, in this case
"
, get encoded when part of a placeholder but not as static text. This makes it inconsistent and can produce invalid URLs.The second commit fixes the issue and further enhances the test with more chars. It also adds chars that do not need to be encoded for better usuability (explained in code). Some people might want to use chars like
+
in the path and it looks better when its not encoded and it doesn't change anything semantically (it generates the same url, but is easier to read).The third commit fixed a phpDoc issue (not using inheritDoc and the setter was marked with @api but not the getter).
The forth commit fixes another problem I found out: The path segments "." and ".." are interpreted as relative reference when resolving a URI. So we need to encode them as they are not (and can not be) used for this purpose by a user. Otherwise we would generate a URI that, when followed by a user agent (e.g. browser), does not match this route. See test cases for examples. This issue can potentially be exploited by users.
Example exploit: The developer has defined a route like
/blog/{slug}
and let the user define the slug title himself. When the user choses ".." as slug and the developer generates a URL for this route with this slug, the resulting URL will be:/blog/..
.But a link to this URI basically means
/
(the parent) and thus points to a complete different URI than intended. So the user could change the target URL which is a security issue.