-
Notifications
You must be signed in to change notification settings - Fork 7.8k
hash_equals: Leak less lengths differences. (Bug 67939) #792
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
hash_equals leaks difference in length of compared strings. This implementation ported from Symfony don't.
This has been discussed at length on the mailing list, so a 👎 from me on changing this. |
@cjbj I've read it. I couldn't find a mention about leaking if there is a length difference. @datibbaw The actual function's prototype is confusing and the doc is not really explicit (http://php.net/hash_equals says that both arguments must be of the same length but does not explicitly says that the length will leak with the current implementation). Btw it will allow to use As said on the wiki, an implementation in a low level language is better and it will benefit to the whole PHP ecosystem. |
A far from perfect benchmark about length leak: https://gist.github.com/dunglas/4b33d449fd1b05469077 The results with this implementation (edit, updated with data from the new CPython inspirited version):
The "leaked" information seems to be the size of the user submitted data (already known by the attacker). The results with PHP 5.6.0:
Even if an advanced attacker can discover the length (we can still say in the doc that this function is not robust against length leaking), this is a way harder than with the current implementation for almost no cost. |
@dunglas We determined that we don't have any robust mechanism to prevent length leaks and as such preferred to explicitly not support comparison of strings with different length, rather than provide a (wrong) implication that we protect against this, while only providing something that kinda-sorta works. (The docs should be more explicit about this function not being safe with different lengths - right now that isn't clear...) |
@nikic isn't it better to have something hard to crack (and documented as not robust to avoid a wrong impression of security) than something easy to crack (think "secure" cookie signature for instance). |
If this implementation is still robust for same length strings, of course (if it is not the case, maybe can we introduce a new function or a flag to this one). |
@dunglas A signature always has the same known length. A hash always has the same known length. I don't see why the different-lengths case would be relevant in that case. |
Python use a similar and probably better implementation (thanks to They advertise in their doc that length can potentially leak. I think they took the good decision (robuster than classical comparisons and advertised as not secure in some cases). For a use case, see https://github.com/symfony/symfony/blob/397687f345f4ea83f344a54275f4bc328ac2d4b4/src/Symfony/Component/Security/Csrf/CsrfTokenManager.php#L95 |
@dunglas Why does that use case require timing safety with different length strings? Is the length of CSRF tokens not constant and known? |
@nikic the length can vary. It depends of your configuration and of your CSRF token generator implementation: https://github.com/symfony/symfony/blob/397687f345f4ea83f344a54275f4bc328ac2d4b4/src/Symfony/Component/Security/Csrf/TokenGenerator/UriSafeTokenGenerator.php#L47 |
@dunglas But within an application? As the client needs to submit the CSRF token, he can inspect a sample value for it - and the length of that value should apply to all tokens across the application. Anyway, I don't really have a problem with adding this, if we properly document that it's unsafe. Needs to run by the list though, as this was a point of contention at the time the function was implemented. |
@nikic you're right if the attacker is able to get a first CSRF token, this is not always the case. A custom generator can also generate not constant length token. |
As a side note, Zend Framework also try to hide length of compared strings and this function is not usable to replace the PHP implementation: https://github.com/zendframework/zf2/blob/master/library/Zend/Crypt/Utils.php#L41 |
Hmm... I'm no security expert, but I wonder if this isn't solving the wrong problem. Your function itself won't leak on length, but wouldn't all the memory allocations and so on surrounding it leak instead? |
In Java, the leak on lengths (almost the same algorithm than in PHP 5.6.0) was considered as a security vulnerability and has been fixed: http://codahale.com/a-lesson-in-timing-attacks/ @TazeTSchnitzel I'm not an expert too. What I want to do is not be 100% sure that the function leak anything but make it harder for an attacker to compute useful data. And, btw, increase the security of the whole PHP ecosystem by allowing Sf, Zf, Joomla... to switch to the (better) C version. |
c4329a5
to
dc090a4
Compare
Alright, let's go through this. First off, it's very hard not to leak length information. Very hard. In the general case, it may not even be possible. In the case of Python, there are data-length dependent branches:
In that case, the first two branches will depend on the length difference between the two strings. This is a source of timing information about the two lengths. This is because x86 processors make heavy use of branch prediction. This is able to be leveraged by an attacker. So leveraging the difference between branch prediction misses (which are extremely expensive) and hits, it leaks information that the lengths are not the same. To prove that this difference exists, let's analyze your benchmark results. If we average all of the timings on a per-byte basis, we get the following results:
Hmmm... They are all different? Yes. Well, but the smallest value is the most expensive one. Weird. Well, not so, if we think about it. There's overhead to calling the function, and setting up (the two if statements). And those costs get averaged in and are constant for each size (well, one of them is different for our known length of 1024). But let's look at the difference between the average times. But we need to normalize that difference, because it's spread out over different numbers of bits. So we wind up needing an equation to figure out the overhead (these numbers are simply the average time for a full run:
And since we have 2 unknowns and 3 functions, we can solve. Let's solve for
Then
But wait, that only accounts for the two, so we need to take into account the 2048 run. So let's compute the three permutations of overhead and average them (between 512 and 1024, 512 and 2048, and 1024 and 2048:
Average: So when we subtract that number from our original table, we get:
Notice how the most expensive per byte is 1024... And that's the one which has our user defined string! If you want to try this yourself, rather than trying 512 vs 1024 vs 2048, try it on 1027 vs 1028 vs 1029 (all three not the same as 1024 known). And try it on 1023 vs 1024 vs 1025 (where 1024 is known). And use a lot more than 4 significant digits of timing. And you should be able to see without even all this math, that 1024 is the value. In fact, in general (for arbitrary strings) it's not really possible to prevent leaking the length of the source string. We could hash each string with a secret random key, and verify the hashes match. This is called Double HMAC Verification. This seems like it hides length differences. Right? Well, no. Because all hash functions pad input to a multiple of its block size. So there is branch conditional logic around exact string size. And you can trivially detect how many blocks there are, so you know the scale of the string. Additionally, the link that @dunglas posted about Java's timing attack was because it leaked timing information. The fixed Java implementation is basically identical to PHP's for all practical purposes. HoweverI must question the use-case for protecting arbitrary length strings. The typical use-case for timing safe comparison is validating MACs (Message Authentication Codes. MACs should always have the same length. If it has a different length, something is very wrong. So leaking the length isn't an issue. Especially since you're giving the attacker the original length via the MAC when you append/prepend it to the ciphertext. What are you comparing with a timing safe comparison where lengths could be different? And if it's that critical, you should pre-hash the value in the database (not the same as hashing it before comparison), so that it's reduced to a fixed length by the time the attacker can guess it. That way, even though timing information may be leaked, it's useless as it's not timing information about the data, but about its hash. So in general, I think that PHP's implementation is good enough. Yes, we can try to defend against length leak. But it won't work. So we can either pretend to, or we can simply acknowledge that it won't work, and document it. And tell users that they should design their systems so length isn't a secret. And that will improve security. |
@ircmaxell thanks for the explanation. Here what we want is not to leak nothing, but to leak less, and the Python implementation leak less than the PHP one. If we consider that the PHP implementation is good enough, what about throwing an error (or a warning) when lengths are not equals instead of returning false. It will be more explicit for the end developer. |
@dunglas I'll let Anthony (ircmaxell) say if that's a good idea or not, but if it is, you should use EDIT: Wait, no, that's a terrible idea, silly me. |
I am strongly against adding an error (warning, notice, strict, whatever you want to call it). If one of the inputs is user-controlled, that means that the user can cause you to raise an error? That's not good. And your only defense of that is having the developer add a check for the length (which is adding a timing difference anyway). Point being that it's impossible to defend against length differences in the general case. Raising errors would just make that worse. Or at best lead people to thinking:
Is somehow safer than just using No, different length strings are a valid case. And in the case where you don't want to leak length information, you need to do things like pre-hashing (in a different request/process). But even at that point, it should be extremely rare when you do actually need to protect length as more than just obscurity... |
@ircmaxell, if, by design, |
@dunglas Comparing strings of different lengths is a completely valid and even normal thing, you're just not safe from it leaking the length difference. |
Other way around. And it's not up to the developer to determine if the strings are different lengths. It's the attacker/user that does. They are the one who is submitting the data that's being compared. Normal comparison functions (
Will take measurably less time than
So information is leaked. And if an attacker can knowingly control one of the pieces of data, then they can deduce the entire string, by measuring the timing differences at play. With So yes, And yes, while we could make it harder to spot the difference of length (like python did with its implementation), that will lead to a situation where many think that it's safe, when its not. And lead to developers not designing systems where length is a secret, which reduces overall security. I'd rather not make false promises ("This is mostly resistant to length leaks"). Instead, I'd rather make security where we can, and document and educate where we can't. |
@ircmaxell my proposal here is making harder to spot the difference of length and documenting the function on php.net has leaking the length (without even mentioning a mitigation). It will not give a wrong impression of security. It will also allow using the C implementation (more robust because lower level) in Symfony (they want to keep the mitigation, see symfony/symfony#11797) and other frameworks implementing a mitigation strategy. Many projects will benefit of an increased security (even if far from perfect and badly designed) by this change. |
I commented on that PR. this needs to be fixed with education, not with applying changes that don't work because it makes someone feel safer without actually protecting them. |
For information, the HHVM implementation try to hide the length too: https://github.com/facebook/hhvm/blob/89ede71cd24a8ba901ae7a89142b08feaaf8e8ab/hphp/runtime/ext/ext_hash.cpp#L366 The commit introducing this feature: facebook/hhvm@2ff6f85 cc @sgolemon @apage43 |
@dunglas Facebook doing something silly doesn't mean we should. |
@TazeTSchnitzel I don't want to troll. I think there is a consensus to document the function has leaking length information. But having a similar behavior for security functions in both implementations benefits to all PHP apps (regardless if the behavior is to return early or not). |
@dunglas There's no point given it's not in our docs, and might even be unhelpful as people will gain the impression that length isn't leaked. |
The warning in the Python doc (https://docs.python.org/3.5/library/hmac.html#hmac.compare_digest):
Regardless if we keep the current implementation or not, I propose to improve http://php.net/hash_equals by making clear that the length can leak:
|
@dunglas Yeah, it's bad the docs don't make clear the length can leak. Your function removes this warning from the PHP source, though. |
@@ -730,12 +730,12 @@ PHP_FUNCTION(hash_pbkdf2) | |||
|
|||
/* {{{ proto bool hash_equals(string known_string, string user_string) | |||
Compares two strings using the same time whether they're equal or not. | |||
A difference in length will leak */ | |||
Inspirited by CPython's algorithm: https://github.com/python/cpython/blob/c7688b44387d116522ff53c0927169db45969f0e/Modules/_operator.c#L175 */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Inspired
dc090a4
to
4516782
Compare
@Tyrael you're right. Fixed and updated the PR body reflecting the current state of this proposal. |
@dunglas I agree 100% that the docs should explain the leak. I think we need to be careful though, as the wording should be thought through as to indicate that the function is still safe and should be used. A warning may scare people off and make the overall situation worse. So while it definitely should say that lengths are leaked. Perhaps it should also say that this is expected (and impossible to defend against anyway), with suggestions for how to secure it if necessary... |
Additionally, looking at the HHVM implementation, it most definitely leaks length information (worse than the python implementation). Because it has data-dependent branching, which will leak the length to a side channel attacker. And rather than being a single branch like cpython's version, it's a branch within a loop. And keep in mind that there aren't gradations of leakages. Either it leaks information, or it doesn't. If it leaks information, the difference between a gardenhose and a water main is irrelevant when all you need is a drop. So while I don't think it's worth changing any of the implementations. i think it's worth while documenting the limitations, but the limitations that apply to all of them. Not just one. So HHVM and PHP should both document the leaked length information. And should avoid saying things like "leaks less" or "is not as leaky as", etc. |
@ircmaxell can you write something to add to the doc about how to secure apps with pre-hashing or other applicable strategies? I strongly agree that both PHP.net, HHVM and frameworks docs must make it clear that the length will leak. I tried for Symfony (symfony/symfony#11822). Concerning the Python-like implementation, I don't understand why you introduced a similar mitigation in Symfony and don't want it in PHP (documented as leaking the length anyway). Correct me if I'm wrong: an attacker will take more time / need more ressources and maybe fail to crack a poorly designed "typical" web application (doing only one call to Here is a use case: Shorter: why not merge it if it makes it harder in some cases for an attacker while documented as leaking the length. |
Can you provide an example where leaking the length results in a breach? In what specific cases is the length a protected secret? Additionally, even if the Symfony version was documented to not leak length information, it did. So the issue isn't that PHP's leaks it, it's that Symfony's pretended not to.
Does it? Can you prove that? What edge does knowing the length give an attacker in a real world app? |
I am swayed by the arguments put forward in opposition that changing the implementation achieves nothing of any value. Closing PR. |
hash_equals
leaks difference in length of compared strings (this is a documented behavior). The current implementation prevents from usinghash_equals
as a remplacement of Symfony, Zend Framework and Joomla PHP implementations of constant-time string comparison because they use an algorithm providing a mitigation.This implementation ported from CPython still leak (#792 (comment)), but less. It makes harder (but still possible) for an attacker guessing the secret length, especially over a network.
The proposal is making things harder for an attacker by using this implementation and documenting the function on php.net as leaking lengths (#792 (comment)), in a similar fashion than Python.
The discussion started in a try to use
hash_equals
in Symfony: symfony/symfony#11797(Reopened against the 4.6 branch.)
Edit: updated, reflecting the discussion.