Skip to content

Excessive hash collisions in IPv4Network and IPv6Network classes #134062

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Closed
mssalvatore opened this issue May 15, 2025 · 4 comments
Closed

Excessive hash collisions in IPv4Network and IPv6Network classes #134062

mssalvatore opened this issue May 15, 2025 · 4 comments
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error type-security A security issue

Comments

@mssalvatore
Copy link
Contributor

mssalvatore commented May 15, 2025

Bug report

Bug description:

While working with IPv4Network objects, @cakekoa and I discovered that hash collisions were common. Below is a small script that shows a few examples of different networks that have hash collisions.

from ipaddress import IPv4Network, IPv6Network


def test_hash_collision(network_1, network_2):
    # Shows that the networks are not equivalent.
    assert network_1 != network_2
    assert network_1.num_addresses != network_2.num_addresses

    # Shows a hash collision similar to CVE-2020-14422
    assert hash(network_1) == hash(network_2)


test_hash_collision(IPv4Network("192.168.1.255/32"), IPv4Network("192.168.1.0/24"))
test_hash_collision(IPv4Network("172.24.255.0/24"), IPv4Network("172.24.0.0/16"))
test_hash_collision(IPv4Network("192.168.1.87/32"), IPv4Network("192.168.1.86/31"))
test_hash_collision(
    IPv4Network("10.0.0.0/8"), IPv6Network("ffff:ffff:ffff:ffff:ffff:ffff:aff:0/112")
)

Upon investigating, we discovered CVE-2020-14422, which fixed a similar (albeit much more severe) hash collision in the IPv4Interface and IPv6Interface classes. This CVE was fixed in b98e779.

The implementation of _BaseNetwork.__hash() looks like this on the main branch:

    def __hash__(self):
        return hash(int(self.network_address) ^ int(self.netmask))

Based on the fix for CVE-2020-14422, the fix for the _BaseNetwork class would likely look like:

diff --git a/Lib/ipaddress.py b/Lib/ipaddress.py
index 703fa289dda..d8a84f33264 100644
--- a/Lib/ipaddress.py
+++ b/Lib/ipaddress.py
@@ -729,7 +729,7 @@ def __eq__(self, other):
             return NotImplemented
 
     def __hash__(self):
-        return hash(int(self.network_address) ^ int(self.netmask))
+        return hash((int(self.network_address), int(self.netmask)))
 
     def __contains__(self, other):
         # always false if one is v4 and the other is v6.

As this method produces far fewer collisions than what caused CVE-2020-14422, the security impact is likely negligible. @sethmlarson from the PSRT team has given us the green light to publicly submit a fix.

CPython versions tested on:

3.12

Operating systems tested on:

Linux, macOS

Linked PRs

@mssalvatore mssalvatore added the type-bug An unexpected behavior, bug, or error label May 15, 2025
mssalvatore added a commit to mssalvatore/cpython that referenced this issue May 15, 2025
@picnixz picnixz added the stdlib Python modules in the Lib dir label May 15, 2025
mssalvatore added a commit to mssalvatore/cpython that referenced this issue May 16, 2025
mssalvatore added a commit to mssalvatore/cpython that referenced this issue May 16, 2025
mssalvatore added a commit to mssalvatore/cpython that referenced this issue May 16, 2025
mssalvatore added a commit to mssalvatore/cpython that referenced this issue May 16, 2025
@gpshead gpshead added the type-security A security issue label May 17, 2025
gpshead pushed a commit that referenced this issue May 22, 2025
)

gh-134062: Fix hash collisions in IPv4Network and IPv6Network
gh-134062: Add hash collision regression test
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 22, 2025
…ythonGH-134063)

(cherry picked from commit f3fc0c1)

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network
pythongh-134062: Add hash collision regression test
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 22, 2025
…ythonGH-134063)

(cherry picked from commit f3fc0c1)

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network
pythongh-134062: Add hash collision regression test
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 22, 2025
…ythonGH-134063)

(cherry picked from commit f3fc0c1)

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network
pythongh-134062: Add hash collision regression test
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 22, 2025
…ythonGH-134063)

(cherry picked from commit f3fc0c1)

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network
pythongh-134062: Add hash collision regression test
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 22, 2025
…ythonGH-134063)

(cherry picked from commit f3fc0c1)

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network
pythongh-134062: Add hash collision regression test
miss-islington pushed a commit to miss-islington/cpython that referenced this issue May 22, 2025
…ythonGH-134063)

(cherry picked from commit f3fc0c1)

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network
pythongh-134062: Add hash collision regression test
@gpshead
Copy link
Member

gpshead commented May 22, 2025

I've merged the initial PR as it seems to be an improvement. But does it actually fix it or just move the collision pairs to other easily reliably computable combos?

Python's hash randomization likely isn't being used in the hash(int, int) implementation. If we wanted that, hashing the pair of str(hex(int))s would be similarly fast but also get python's per process hash randomization.

gpshead pushed a commit that referenced this issue May 22, 2025
…H-134063) (#134477)

gh-134062: Fix hash collisions in IPv4Network and IPv6Network (GH-134063)
(cherry picked from commit f3fc0c1)


gh-134062: Fix hash collisions in IPv4Network and IPv6Network
gh-134062: Add hash collision regression test

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
gpshead pushed a commit that referenced this issue May 22, 2025
…H-134063) (#134476)

gh-134062: Fix hash collisions in IPv4Network and IPv6Network (GH-134063)
(cherry picked from commit f3fc0c1)


gh-134062: Fix hash collisions in IPv4Network and IPv6Network
gh-134062: Add hash collision regression test

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
Yhg1s pushed a commit that referenced this issue May 26, 2025
…H-134063) (#134478)

gh-134062: Fix hash collisions in IPv4Network and IPv6Network (GH-134063)
(cherry picked from commit f3fc0c1)


gh-134062: Fix hash collisions in IPv4Network and IPv6Network
gh-134062: Add hash collision regression test

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
lkollar pushed a commit to lkollar/cpython that referenced this issue May 26, 2025
…ythonGH-134063)

pythongh-134062: Fix hash collisions in IPv4Network and IPv6Network
pythongh-134062: Add hash collision regression test
@mssalvatore
Copy link
Contributor Author

I've merged the initial PR as it seems to be an improvement. But does it actually fix it or just move the collision pairs to other easily reliably computable combos?

Python's hash randomization likely isn't being used in the hash(int, int) implementation. If we wanted that, hashing the pair of str(hex(int))s would be similarly fast but also get python's per process hash randomization.

I don't know enough about the internals of Python hashing to offer much of an answer. All I can really say is that the problem was initially caused by collisions in the inputs to hash(). An operation (XOR) was performed on unique properties that define a network. This operation sometimes resolved unique networks to the same value, meaning distinct networks could provide the same input to hash(). In other words, the collisions were occurring before hash() was even called. In the new implementation I don't believe this can happen, but that doesn't necessarily mean that hash collisions are impossible.

@tim-one
Copy link
Member

tim-one commented May 28, 2025

Python's hash randomization likely isn't being used in the hash(int, int) implementation

Right, randomization isn't used for hashing either ints or tuples.

I don't know enough about the internals of Python hashing to offer much of an answer

Luckily for us, I do know a lot about it 😉. The original raw xor was a Bad Idea™, for reasons you discovered the hard way. While there's nothing obvious about this, hashing a two-tuple of the ints instead is far more robust, due to the heavily researched and tested way tuple hashing combines the hashes of the tuple's elements. Not just an xor, but multiplies and bit rotations.

Without going into the weeds, the tuple hash algorithm isn't aiming at eliminating collisions (which is,, in general, impossible), but at minimizing "pileup": the number of distinct inputs that all hash to the same value. It's not collisions that damage runtime so much as high pileup.

But tuple hashing wasn't designed with a malicious adversary in mind. Is that a potential problem here? The original raw xor was so weak it was quite possible to stumble into bad cases purely by ordinary bad luck.

@gpshead
Copy link
Member

gpshead commented May 28, 2025

Thanks! I'm going to close this one as the PR made the situation a lot better than the old XOR.

A future maybe-improvement of introducing a bytes/str into the mix so that python's hash randomization seed is factored is an option, a new issue can be opened for that if anyone demonstrates it still being a practical problem.

@gpshead gpshead closed this as completed May 28, 2025
ambv pushed a commit that referenced this issue Jun 2, 2025
…H-134063) (GH-134479)

(cherry picked from commit f3fc0c1)

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
ambv pushed a commit that referenced this issue Jun 2, 2025
…H-134063) (GH-134480)

(cherry picked from commit f3fc0c1)

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
ambv pushed a commit that referenced this issue Jun 2, 2025
…H-134063) (GH-134481)

(cherry picked from commit f3fc0c1)

Co-authored-by: Mike Salvatore <mike.s.salvatore@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stdlib Python modules in the Lib dir type-bug An unexpected behavior, bug, or error type-security A security issue
Projects
None yet
Development

No branches or pull requests

4 participants