-
-
Notifications
You must be signed in to change notification settings - Fork 32.1k
Excessive hash collisions in IPv4Network and IPv6Network classes #134062
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Comments
…ythonGH-134063) (cherry picked from commit f3fc0c1) Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com> pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network pythongh-134062: Add hash collision regression test
…ythonGH-134063) (cherry picked from commit f3fc0c1) Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com> pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network pythongh-134062: Add hash collision regression test
…ythonGH-134063) (cherry picked from commit f3fc0c1) Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com> pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network pythongh-134062: Add hash collision regression test
…ythonGH-134063) (cherry picked from commit f3fc0c1) Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com> pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network pythongh-134062: Add hash collision regression test
…ythonGH-134063) (cherry picked from commit f3fc0c1) Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com> pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network pythongh-134062: Add hash collision regression test
…ythonGH-134063) (cherry picked from commit f3fc0c1) Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com> pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network pythongh-134062: Add hash collision regression test
I've merged the initial PR as it seems to be an improvement. But does it actually fix it or just move the collision pairs to other easily reliably computable combos? Python's hash randomization likely isn't being used in the hash(int, int) implementation. If we wanted that, hashing the pair of str(hex(int))s would be similarly fast but also get python's per process hash randomization. |
…ythonGH-134063) pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network pythongh-134062: Add hash collision regression test
I don't know enough about the internals of Python hashing to offer much of an answer. All I can really say is that the problem was initially caused by collisions in the inputs to |
Right, randomization isn't used for hashing either ints or tuples.
Luckily for us, I do know a lot about it 😉. The original raw xor was a Bad Idea™, for reasons you discovered the hard way. While there's nothing obvious about this, hashing a two-tuple of the ints instead is far more robust, due to the heavily researched and tested way tuple hashing combines the hashes of the tuple's elements. Not just an xor, but multiplies and bit rotations. Without going into the weeds, the tuple hash algorithm isn't aiming at eliminating collisions (which is,, in general, impossible), but at minimizing "pileup": the number of distinct inputs that all hash to the same value. It's not collisions that damage runtime so much as high pileup. But tuple hashing wasn't designed with a malicious adversary in mind. Is that a potential problem here? The original raw xor was so weak it was quite possible to stumble into bad cases purely by ordinary bad luck. |
Thanks! I'm going to close this one as the PR made the situation a lot better than the old XOR. A future maybe-improvement of introducing a bytes/str into the mix so that python's hash randomization seed is factored is an option, a new issue can be opened for that if anyone demonstrates it still being a practical problem. |
Uh oh!
There was an error while loading. Please reload this page.
Bug report
Bug description:
While working with
IPv4Network
objects, @cakekoa and I discovered that hash collisions were common. Below is a small script that shows a few examples of different networks that have hash collisions.Upon investigating, we discovered CVE-2020-14422, which fixed a similar (albeit much more severe) hash collision in the
IPv4Interface
andIPv6Interface
classes. This CVE was fixed in b98e779.The implementation of
_BaseNetwork.__hash()
looks like this on the main branch:Based on the fix for CVE-2020-14422, the fix for the
_BaseNetwork
class would likely look like:As this method produces far fewer collisions than what caused CVE-2020-14422, the security impact is likely negligible. @sethmlarson from the PSRT team has given us the green light to publicly submit a fix.
CPython versions tested on:
3.12
Operating systems tested on:
Linux, macOS
Linked PRs
The text was updated successfully, but these errors were encountered: