Skip to content

Optional encoder function, in addition to decoder (converter) #70

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
dezlov opened this issue Jul 31, 2015 · 37 comments
Closed

Optional encoder function, in addition to decoder (converter) #70

dezlov opened this issue Jul 31, 2015 · 37 comments
Milestone

Comments

@dezlov
Copy link

dezlov commented Jul 31, 2015

This will allow for ultimate customization and compatibility with other cookie encoding functions.

It could be added as an extension to Cookies.withConverter() to allow specifying an encoding function too.

For example, the incompatibility with PHP cannot be resolved because this JS library will always store "+" signs in plain text, while PHP will always convert them to spaces when read.

In PHP, both $_COOKIE and filter_input(INPUT_COOKIE) will return already decoded value, where all "+" signs will be converted to spaces. There is a setrawcookie, but no getrawcookie function, so no way around it in PHP (without having to manually double encode and decode in both languages).

@dezlov
Copy link
Author

dezlov commented Jul 31, 2015

Related ticket: carhartl/jquery-cookie#13

It only addresses one-way problem (PHP to JS), but not the reverse (JS to PHP).

@FagnerMartinsBrack
Copy link
Member

Are you sure there is no way to do it in PHP? The + sign is a valid character in the RFC, I know .net has some method to workaround it, PHP should have it too.

Did you consider reporting to PHP itself as a bug report due to the fact it handles the RFC incorrectly and fails to provide a method that handles it correctly?

@dezlov
Copy link
Author

dezlov commented Jul 31, 2015

Yes, I'm pretty sure.

PHP reads cookie string and replaces all '+' characters with spaces, for historical reasons (as mentioned in the docs).

I guess one can parse HTTP headers manually and extract all cookie information manually, but the native PHP methods for accessing cookies via $_COOKIE and filter_input(INPUT_COOKIE) have this '+' handling built-in and not adjustable.

To make this library compatible with PHP cookie encoding/decoding: always encode + as %2B. The diff below demonstrates how to apply it in code.

59c59,60
<               value = value.replace(/%(23|24|26|2B|3A|3C|3E|3D|2F|3F|40|5B|5D|5E|60|7B|7D|7C)/g, decodeURIComponent);
---
>               value = value.replace(/%(23|24|26|3A|3C|3E|3D|2F|3F|40|5B|5D|5E|60|7B|7D|7C)/g, decodeURIComponent);
62c63,64
<               key = key.replace(/%(23|24|26|2B|5E|60|7C)/g, decodeURIComponent);
---
>               key = key.replace(/%(23|24|26|5E|60|7C)/g, decodeURIComponent);

While in PHP one must use setrawcookie($name, rawurlencode($value)) instead of setcookie($name, $value) to force it to save + as %2B.

Could that small diff be incorporated into this library perhaps as an option, e.g. Cookie.compatibilityWithPHP = true?

P.S. This should defiantly be reported as a bug in PHP, but even if it will be addressed in some future version, majority of PHP installations won't have that future version for a very very long time.

@FagnerMartinsBrack
Copy link
Member

Could that small diff be incorporated into this library perhaps as an option, e.g. Cookie.compatibilityWithPHP = true

Previously we had raw and json that tried to solve similar problems, that lead to lots of confusion, converters were added in v2 to fix the decoding issue regarding third party cookies set in the server.

Implementing converters that also allows writing the cookie in a different format have crossed my mind when I added it, but I didn't had strong evidence that it would be useful for the majority of cases, where you can just replace the desired characters.

Also, it would be a best practice to use the robustness principle when dealing with this kind of problem:

Be conservative in what you do, be liberal in what you accept from others (often reworded as "Be conservative in what you send, be liberal in what you accept").

In js-cookie case, we are conservative on what we send (we have a default procedure to encode the cookie), and we are liberal in what we accept from others (we allow converters to specify a custom decoding mechanism).

In php case, it would make sense to do the same. Create a function that decodes the cookie in a per application basis (liberal on what you accept), but make it sure that what is written is written correctly using setrawcookie($name, rawurlencode($value)) (be conservative in what you send).

Of course, the problem as I understand is that, if you use the default function to retrieve the cookie, PHP already replaces that without notice. In that case there's no way to handle it unless you parse the headers manually with a custom function.

Unfortunately we can't change the way the "plus" is encoded because that would break backwards compatibility, since the character is permitted by the spec and the docs says that:

All special characters that are not allowed in the cookie-name or cookie-value are encoded with each one's UTF-8 Hex equivalent using percent-encoding.

What we can do is:

  1. Create a php-cookie project that exposes a proper cookie handling API for php (You say this is not feasible because other methods depend on the wrong behavior, but it would be interesting to try it out)
  2. Create a converter for js-cookie that allows changing the method used to write the cookie, the same way this is done now for reading the cookie.

@carhartl

@FagnerMartinsBrack
Copy link
Member

I have created a pull request in case we decide that the option 2 is the way to go.

@dezlov
Copy link
Author

dezlov commented Aug 1, 2015

Your idea with passing converter as an object with read and write properties looks great!

I just checked what PHP does with cookie names, for the reference:

  1. Cookie names do not seem to get encoded in any way
  2. UTF-8 characters get mangled into some kind of ANSI garbage
  3. Does not allow using any of these characters =,; \t\r\n\013\014 displaying the following warning:
Warning: Cookie names cannot contain any of the following '=,; \t\r\n\013\014'

@FagnerMartinsBrack
Copy link
Member

Your idea with passing converter as an object with read and write properties looks great!

Thanks. I understand "idea" has some kind of ethical definitions. But for the literal sense of the word, #71 was not exactly "my idea".

Cookie names do not seem to get encoded in any way

That probably would have no problem as long as browsers are able to interpret it correctly. js-cookie uses decodeURIComponent in the default decoding mechanism for any sequence of /(%[0-9A-Z]{2})+/g characters (the percent-encoding format).

UTF-8 characters get mangled into some kind of ANSI garbage

I didn't understand what you mean by "ANSI garbage", could you elaborate? They don't use percent-encoding?

Does not allow using any of these characters =,; \t\r\n\013\014 displaying the following warning:
Warning: Cookie names cannot contain any of the following '=,; \t\r\n\013\014'

That makes sense. The cookie-name is interpreted as "token" in the RFC 6265. The "token" does not allow ,, ;, SP or CTLs (\t, \n, \r ... ) so it will be treated correctly using js-cookie because we encode those characters.

Thank you very much for your insights regarding PHP behavior. I am pretty sure this is going to be very useful for someone (because I don't use PHP xD).

@dezlov
Copy link
Author

dezlov commented Aug 1, 2015

I understand "idea" has some kind of ethical definitions.

What it meant is that I found the selected "approach" quote interesting, i.e. using an object instead of adding an additional parameter to Cookies.withConverter().

I didn't understand what you mean by "ANSI garbage", could you elaborate? They don't use percent-encoding?

When I tried saving a cookie in PHP with Unicode characters (natively in UTF-8) in cookie name, they were stored in browser (Firefox) in a mangled form. They were not %-encoded. I'm not sure whether there that is problem in PHP or browser or both, I didn't investigate further, but thought I'd mention anyway.

I am pretty sure this is going to be very useful for someone (because I don't use PHP xD).

I can assure you that it will be useful to many developers, after all, PHP is the most widely used language for web development. I have checked few other cookie JS libraries and none of the can work in a PHP-compatible behavior out of the box. You'll have an edge right there!

I think it is very important to add a section in the docs about compatibility with PHP.

@FagnerMartinsBrack
Copy link
Member

I think it is very important to add a section in the docs about compatibility with PHP.

PRs are welcome xD

@dezlov
Copy link
Author

dezlov commented Aug 1, 2015

PHP compatibility

PHP applies non-standard encoding and decoding for cookies, which can create problems when used in conjunction with other cookie libraries.

In PHP, setcookie() function encodes cookie values using urlencode() function, which applies %-encoding but also encodes spaces as + signs, for historical reasons. When cookies are read back via $_COOKIE or filter_input(INPUT_COOKIE), they would go trough a decoding process which decodes %-encoded sequences and also converts + signs back to spaces. However, the plus (+) sign is valid cookie character by itself, which means that libraries that adhere to standards will interpret + signs differently to PHP.

This presents two types of problems:

  1. PHP writes a cookie via setcookie() and all spaces get converted to + signs. JS library read + signs and uses them literally, since it is a valid cookie character.
  2. JavaScript library writes a cookie with a value that contains + signs and stores it as is, since it is a valid cookie character. PHP read a cookie and converts + signs to spaces.

How to make both PHP and this library play nicely together?

In PHP, use setrawcookie() instead of setcookie():

setrawcookie($name, rawurlencode($value));

In JavaScript, use a custom converter:

Cookies.withConverter({
    write: function (value) {
        return encodeURIComponent(value);
    }
    read: function (value) {
        return decodeURIComponent(value);
    }
});

@FagnerMartinsBrack FagnerMartinsBrack added this to the v2.1.0 milestone Aug 16, 2015
@nanaya
Copy link

nanaya commented Aug 19, 2015

By the way, rails rack is doing similar thing (space is converted to plus and plus sign etc are percent-encoded). I just hit this problem when migrating from jquery-cookies to js-cookies.

I just checked, it also applies to decoding cookie (plus is converted to space).

A bit more precise, it encodes/decodes in application/x-www-form-urlencoded (using URI.decode_www_form_component, URI.encode_www_form_component).

@FagnerMartinsBrack
Copy link
Member

@nanaya Is there any workaround for rack? Maybe in a newer version?

@nanaya
Copy link

nanaya commented Aug 19, 2015

@FagnerMartinsBrack nothing from what I can tell (short of making own cookies parser).

@FagnerMartinsBrack
Copy link
Member

@carhartl What do you think about this?

Due to historical reasons, the plus character is converted to space incorrectly by server-side frameworks also when reading:

Can't change now

We can't change the way plus is handled because it would break backwards compatibility, since it is documented that the characters allowed in the spec are not encoded.

According to semver spec, breaking changes should only be introduced in a major release.

Writer converter ignores default encoding

If we add a write converter option we would be encouraging people to write:

Cookies.withConverter({
  write: encodeURIComponent
})

By using this way, the whole encoding mechanism will be ignored, defeating the purpose of encoding only the characters that are not allowed in RFC 6265.

Server-side problem

If we don't add a way to do this, some developers will not be able to add a + in their cookies, because PHP and rack seems to convert the plain plus character to space, despite RFC 6265 saying it is valid.

We could maybe add a way to run the value through the default encoding mechanism inside the write callback to prevent ignoring it completely and make it possible to fix the offending characters, something like this:

Cookies.withConverter({
  write: function( value, key, defaultEncoding ) {
    var encodedValue = defaultEncoding( value );
    var fixedValueForPHP = encodedValue.replace( /+/g, '%2B' );
    return fixedValueForPHP;
  }
});

But then I would argue that we are doing to much and making it too complex to be able to solve the problems of somebody else.

The write converter PR is there, but I am not sure if that is the best way to solve the problem.

@carhartl
Copy link
Member

In short, the problem is that if I write a cookie value "+", a couple of server-side frameworks will convert it to " ", while a developer might have actually meant "+"?

@carhartl
Copy link
Member

Can we somehow piggyback on the fact that we're supporting quoted cookies as defined in the specification? I.e quoted cookie => leave "+" as it is and thus encode it to %2B, preventing serve-side to turn it into white space...

@FagnerMartinsBrack
Copy link
Member

Where is quoted cookies specified? RFC 6265 does not mention any specific rule for quoted cookies, it just says you can use a double quote in both ends of the value:

cookie-value      = *cookie-octet / ( DQUOTE *cookie-octet DQUOTE )

See http://tools.ietf.org/html/rfc6265#section-4.1.1

@carhartl
Copy link
Member

The double quotes enclosed cookie value was what I was referring to. I know there is no special handling defined for such a cookie, hence "piggyback". Whatever.

@FagnerMartinsBrack
Copy link
Member

Got it.

Up to RFC 6265 there was no clear specification for cookies, each browser implemented what they thought it was ok, they didn't seemed to follow the original specs.

As I understand, RFC 6265 came to establish a baseline and make it possible to document the current behavior of browsers, but unfortunately they didn't thought about all those framework and server-side issues 😢

I would love to fix this without having to add a new API. The problem is that I really have no idea of how the behavior of a cookie should be when enclosed in double quotes or even if implementing that would fix the problem. Since there's no spec that I know of, It would demand lots of testing and that might expose several edge case issues... 😫

@FagnerMartinsBrack
Copy link
Member

@dezlov

PHP reads cookie string and replaces all '+' characters with spaces, for historical reasons (as mentioned in the docs).

Can you please link the docs or bug report that talk about the plus being converted to spaces upon decode? I couldn't find it.

@nanaya

Can you please do the same regarding rack, just for reference?


I don't use neither of the languages frameworks you guys mentioned, @carhartl probably doesn't too. As far as I remember, the $.COOKIE api from php is pretty old, probably older than RFC 6265, so that behavior is not supposed to have been added after April 2011.

I have sent an e-mail to the http-state group from ietf asking for the reason why this php peculiarity wasn't addressed when the RFC came out: http://www.ietf.org/mail-archive/web/http-state/current/maillist.html

@dezlov
Copy link
Author

dezlov commented Sep 7, 2015

@FagnerMartinsBrack

setcookie() - http://php.net/manual/en/function.setcookie.php

Note that the value portion of the cookie will automatically be urlencoded when you send the cookie, and when it is received, it is automatically decoded and assigned to a variable by the same name as the cookie name.

urlencode() - http://php.net/manual/en/function.urlencode.php

Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the » RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.

Source code of relevant internal PHP functions (PHP-5.5.29 branch, released on 3 Sep 2015):
php_setcookie() - https://github.com/php/php-src/blob/PHP-5.5.29/ext/standard/head.c#L76
php_url_encode() - https://github.com/php/php-src/blob/PHP-5.5.29/ext/standard/url.c#L487
php_url_decode() - https://github.com/php/php-src/blob/PHP-5.5.29/ext/standard/url.c#L569

@nanaya
Copy link

nanaya commented Sep 7, 2015

Unfortunately I couldn't find documentation for Rack. I tried to track the history but the closest thing I got is it's been parsed similar way as query string (and forms) since beginning and never really changed.

@FagnerMartinsBrack
Copy link
Member

Thanks guys!

It seems php plus replacement on decoding is there since at least Jan 2004.
Rack's plus decode handling is there at least since 2007.

@FagnerMartinsBrack
Copy link
Member

One more edge case in Tomcat:

If send cookie value has ( or ) to tomcat 7.x. the value was cut before ( or ).

-- @mastojun at #92 (comment)

@FagnerMartinsBrack
Copy link
Member

I am planning in documenting all these server-side issues somehow. Maybe provide write and read converters for all the known issues.

FagnerMartinsBrack added a commit that referenced this issue Sep 28, 2015
@FagnerMartinsBrack
Copy link
Member

@nanaya, @dezlov Tell me what you think about this.

@nanaya
Copy link

nanaya commented Sep 30, 2015

Looks fine apart of typos and wrong read function (declared as write) and I believe + must be converted to space first before conversion from %2B to + happens.

read: function (value) {
    return value
        // Decode the plus sign to spaces
        .replace(/\+/g, ' ')
        // Decode all characters according to the "encodeURIComponent" spec
        .replace(/(%[0-9A-Z]{2})+/g, decodeURIComponent)
}

@FagnerMartinsBrack
Copy link
Member

Thanks for the review.

Looks fine apart of typos and wrong read function (declared as write)

Yeah that was the first link that you probably received by e-mail. I edited right after.

I believe + must be converted to space first before conversion from %2B to + happens.

Looks like the order doesn't matter. decodeURIComponent doesn't mess with the plus sign, only percent encoded characters:

"+".replace(/(%[0-9A-Z]{2})+/g, decodeURIComponent).replace(/\+/g, ' '); // => " "
"+".replace(/\+/g, ' ').replace(/(%[0-9A-Z]{2})+/g, decodeURIComponent); // => " "

FagnerMartinsBrack added a commit that referenced this issue Oct 2, 2015
@dezlov
Copy link
Author

dezlov commented Oct 2, 2015

I've left a comment at 8b2a139#commitcomment-13571492 about few issues with that commit.

A suggestion: Instead of supplying very cumbersome Cookies.withConverter({...}) for PHP, Rack, and possibly other platforms - why not built them into the library as say Cookies.withPHPConverter()?

Otherwise people are likely to miss info in SERVERS.md or may find it too cumbersome to use in every project.

@FagnerMartinsBrack
Copy link
Member

A suggestion: Instead of supplying very cumbersome Cookies.withConverter({...}) for PHP, Rack, and possibly other platforms - why not built them into the library as say Cookies.withPHPConverter()?

If the spec allows the character then there's nothing we can do here, this is a bug in PHP or in the spec, not js-cookie.

Adding server-side specific code would break our attempt to make this lib as lightweight as possible and would require a lot of maintenance effort. Also, cookie handling was like Wild West before RFC 6265, so there's still a lot of incompatibilities in the wild, we can't just make it work for all cases. The best we can do is to make it easy for the developer to workaround those server-side incompatibilities by providing hooks and documentation for each specific case.

@dezlov
Copy link
Author

dezlov commented Oct 3, 2015

Adding server-side specific code would break our attempt to make this lib as lightweight as possible and would require a lot of maintenance effort.

Those 15 lines of code will provide invaluable functionality without performance penalty ... from my point of view it's a no-brainer.

we can't just make it work for all cases

This is not about making it work for all cases. There is a very specific case (PHP), and since the solution is established, why not make it permanent instead of burying it in the docs.

These are just my thoughts. If it doesn't resonate, I'll move on.

Thanks for your time spent on this issue.

@FagnerMartinsBrack
Copy link
Member

When I say "working for all cases" I mean that there's no sense making it work only for one situation while refusing to do the same for other similar situations that will come in the future. If we do for PHP, we need to do it for Tomcat, then for JBoss, then for Rails. And then we realize servers implement other RFCs, like 2068, while ignoring the RFC 6265. Then we could realize it would be better to change and provide a code to add flags to treat each RFC in a specific way instead (like how Java Servlet does with cookie setVersion(1), setVersion(2)... etc, to specify the version of the cookie protocol that specific cookie complies with).

To prevent all this edge cases and maintenance, the best we can do is to follow a spec. In this case we choose to follow RFC 6265, the official cookies RFC.

These are just my thoughts. If it doesn't resonate, I'll move on.

I believe this discussion and any on this similar matter is useful, and considering the purpose of the lib, I believe the specifics of how to encode/decode the cookies should not even have been treated by js-cookie in the first place because this is server-side agnostic. It should only serve as an interface do abstract the peculiarities of cookie handling in the browsers.

@carhartl

@nanaya
Copy link

nanaya commented Oct 3, 2015

Looks like the order doesn't matter. decodeURIComponent doesn't mess with the plus sign, only percent encoded characters:

I mean, the server (rack) sends %2B for +. And thus, when reading, conversion from + to space must happen before conversion from %2B to + otherwise both will be converted to space.

  • Server sends
    • before encoding: here be + sign
    • what's actually being sent: here+be+%2B+sign
  • Client reads
    • decode first: "here+be+%2B+sign".replace(/(%[0-9A-Z]{2})+/g, decodeURIComponent).replace(/\+/g, ' ') // => "here be sign" (imagine three spaces there)
    • replace first: "here+be+%2B+sign".replace(/\+/g, ' ').replace(/(%[0-9A-Z]{2})+/g, decodeURIComponent) // => "here be + sign"

@FagnerMartinsBrack
Copy link
Member

@nanaya Oh, now I got it. When I update the snippet I let you know, thanks! :D

FagnerMartinsBrack added a commit that referenced this issue Oct 17, 2015
FagnerMartinsBrack added a commit that referenced this issue Oct 17, 2015
Otherwise "legit" encoded pluses will be replaced incorrectly

This is based on @nanaya's comment at #70 (comment)
@FagnerMartinsBrack
Copy link
Member

@nanaya see 214a23e

@FagnerMartinsBrack
Copy link
Member

The Cookie implementation of servlet 3.0 on JBoss does not allow [ or ] in the cookie value, although they are valid cookie characters:

static {
    if (Boolean.valueOf(System.getProperty(
            "org.glassfish.web.rfc2109_cookie_names_enforced",
            "true")).booleanValue()) {
        TSPECIALS = "/()<>@,;:\\\"[]?={} \t";
    } else {
        TSPECIALS = ",; ";
    }
}

See #100.

@jboss-arquillian
Copy link

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

5 participants