gh-136599: Improve long_hash #136600
Conversation
Lib/test/test_long.py (Outdated)

@@ -1693,5 +1693,11 @@ class MyInt(int):
        # GH-117195 -- This shouldn't crash
        object.__sizeof__(1)

    def test_long_hash(self):
+1 on introducing hash tests in test_long.py, even though there are already

cpython/Lib/test/test_builtin.py, line 1165 in 3dbe02c: def test_hash(self):

and Lib/test/test_hash.py. There are special tests with respect to hashing in many other test modules, e.g.

cpython/Lib/test/test_float.py, line 635 in 9e5cebd: def test_hash(self):

so IMHO this is a good fit 👍
This may not be so on a build with 15-bit digits. In general, this should be conditional, depending on the size of the compact representation and the size of the digit, at compile time.
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
// unroll first two digits
#if ( PyHASH_BITS > PyLong_SHIFT )
    assert(i >= 2);
Maybe just use assert(i >= 1) here and then again in the second #if below?

Otherwise LGTM.
self.assertEqual(hash(-sys.hash_info.modulus - 2), -2)
self.assertEqual(hash(-sys.hash_info.modulus - 1), -2)
self.assertEqual(hash(-sys.hash_info.modulus), 0)
self.assertEqual(hash(-sys.hash_info.modulus + 1), -(sys.hash_info.modulus - 1))
Suggested change:
- self.assertEqual(hash(-sys.hash_info.modulus + 1), - (sys.hash_info.modulus - 1))
+ self.assertEqual(hash(-sys.hash_info.modulus + 1), -sys.hash_info.modulus + 1)
@@ -1693,5 +1693,22 @@ class MyInt(int):
        # GH-117195 -- This shouldn't crash
        object.__sizeof__(1)

    def test_hash(self):
I think that part could be in a separate PR, merged first.

The docs describe the algorithm in detail:
https://docs.python.org/3/library/stdtypes.html#hashing-of-numeric-types

Maybe we could test it against a pure-Python implementation, also using hypothesis?
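Following that idea, here is a minimal sketch of such a pure-Python reference (the function name `int_hash_reference` is mine, not from the PR), implementing the documented modular-reduction algorithm and spot-checking it against the built-in hash():

```python
import sys

def int_hash_reference(n: int) -> int:
    # Documented algorithm: reduce |n| modulo the Mersenne prime
    # sys.hash_info.modulus (2**61 - 1 on 64-bit builds), negate the
    # result for negative n, and remap -1 to -2 since hash() never
    # returns -1 (it is the C-level error sentinel).
    M = sys.hash_info.modulus
    h = abs(n) % M
    if n < 0:
        h = -h
    if h == -1:
        h = -2
    return h

# Spot-check against the built-in hash(), including the modulus boundaries
# exercised by the new test.
for n in (0, 1, -1, 2**64, -(2**64),
          sys.hash_info.modulus - 1, sys.hash_info.modulus,
          -sys.hash_info.modulus - 2, -sys.hash_info.modulus + 1):
    assert int_hash_reference(n) == hash(n), n
```

A hypothesis-based test would simply replace the fixed tuple of values with `@given(st.integers())`.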
x = 0;

// unroll first two digits
#if ( PyHASH_BITS > PyLong_SHIFT )
PyHASH_BITS can be either 31 or 61, while PyLong_SHIFT is either 15 or 30. So this condition is always true.

BTW, it seems the 15-bit digit build is untested by regular CI. Is there at least some buildbot with such settings? I'll open an issue.
#if ( PyHASH_BITS > PyLong_SHIFT )
The case is indeed untested, but I added it because of a comment by Serhiy #136600 (comment)
The case is indeed untested
FYI: #138336
I added it because of a comment by Serhiy
Then fine. Though, an assert might be an option.
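To make the unrolling concrete, here is a standalone Python model of the long_hash digit loop (the macro names mirror CPython's, but the helper functions are hypothetical). Starting from x == 0, the first rotation is a no-op and the running value stays below the modulus through the second digit whenever 2*PyLong_SHIFT <= PyHASH_BITS (true for both 30/61 and 15/31), so the first two digits need no masking or reduction:

```python
PyHASH_BITS = 61
PyHASH_MODULUS = (1 << PyHASH_BITS) - 1   # Mersenne prime 2**61 - 1
PyLong_SHIFT = 30

def hash_digits_loop(digits):
    # Plain loop, mirroring the C code: rotate x left by PyLong_SHIFT
    # bits within a PyHASH_BITS-bit word (multiplication by 2**SHIFT
    # modulo the Mersenne prime), then add the next digit.
    x = 0
    for d in reversed(digits):            # most significant digit first
        x = ((x << PyLong_SHIFT) & PyHASH_MODULUS) | (x >> (PyHASH_BITS - PyLong_SHIFT))
        x += d
        if x >= PyHASH_MODULUS:
            x -= PyHASH_MODULUS
    return x

def hash_digits_unrolled(digits):
    # Same result with the first two digits unrolled: no mask, no
    # rotation carry, no reduction needed for them.
    assert len(digits) >= 2
    x = digits[-1]
    x = (x << PyLong_SHIFT) + digits[-2]
    for d in reversed(digits[:-2]):
        x = ((x << PyLong_SHIFT) & PyHASH_MODULUS) | (x >> (PyHASH_BITS - PyLong_SHIFT))
        x += d
        if x >= PyHASH_MODULUS:
            x -= PyHASH_MODULUS
    return x

# digits are least significant first, each < 2**PyLong_SHIFT
ds = [1, (1 << 30) - 1, 12345, 999999]
assert hash_digits_loop(ds) == hash_digits_unrolled(ds)
# The loop computes the value modulo the Mersenne prime:
value = sum(d << (30 * i) for i, d in enumerate(ds))
assert hash_digits_loop(ds) == value % PyHASH_MODULUS
```

This is a model for reasoning about the invariants only; the actual patch operates on `ob_digit` in C.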
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the long_hash() change is correct, modulo a few nitpicks.

I also confirm a small performance boost with the OP's benchmark:
| Benchmark | ref | patch |
|---|---|---|
| long_hash_small_int | 81.5 us | 82.2 us: 1.01x slower |
| long_hash_one_digit | 136 us | 137 us: 1.00x slower |
| long_hash_two_digit | 186 us | 179 us: 1.04x faster |
| long_hash_multi_digit | 338 us | 320 us: 1.06x faster |
| set(ints) | 132 us | 120 us: 1.10x faster |
| Geometric mean | (ref) | 1.04x faster |
raw data
$ python a.py -q -o ref.json
long_hash_small_int: Mean +- std dev: 81.5 us +- 0.2 us
long_hash_one_digit: Mean +- std dev: 136 us +- 2 us
long_hash_two_digit: Mean +- std dev: 186 us +- 1 us
long_hash_multi_digit: Mean +- std dev: 338 us +- 7 us
set(ints): Mean +- std dev: 132 us +- 3 us
$ python a.py -q -o patch.json
long_hash_small_int: Mean +- std dev: 82.2 us +- 0.1 us
long_hash_one_digit: Mean +- std dev: 137 us +- 1 us
long_hash_two_digit: Mean +- std dev: 179 us +- 2 us
long_hash_multi_digit: Mean +- std dev: 320 us +- 5 us
set(ints): Mean +- std dev: 120 us +- 1 us
I'm +1 for test_hash() and think that it deserves a separate PR.

I'm unsure about the rest. Some performance improvement (up to 10%) comes with much less readable code. Maybe you can reorganize it in some way? I.e. the huge comment, currently in the loop, would come first to explain things, then the code to unroll one or two steps, then the loop. The patch size will be much bigger, of course. In the same pass you could choose consistent naming for PyHASH_BITS and friends (either the private or the public macros).
Co-authored-by: Sergey B Kirpichev <skirpichev@gmail.com>
long_hash: the final check
long_hash: unroll the first two digits

For reviewers: the first commit adds a test, the second and third commits unroll the hash calculation for the digits. The performance gain is not large, so I would be fine with reverting the last two commits. A test script:

On my system I get a fairly consistent performance increase of 10% for the set(ints) test. For the other benchmarks my system (a thermally throttled laptop) is not stable enough (the benchmarks are also somewhat dominated by the creation of new integers).
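The OP's benchmark script is not reproduced in this thread. As a rough stdlib-only sketch of the set(ints) case (a hypothetical reconstruction with made-up sizes, not the original pyperf script), the shape is simply timing set construction over multi-digit ints so that hashing dominates:

```python
import timeit

# Multi-digit ints on a 64-bit build (each value needs several 30-bit digits),
# created once up front so the timed region is mostly hashing, not allocation.
ints = list(range(10**18, 10**18 + 10_000))

number = 200
t = timeit.timeit("set(ints)", globals={"ints": ints}, number=number)
print(f"set(ints): {t / number * 1e6:.1f} us per loop")
```

Timings will of course vary per machine; the original used pyperf JSON output (`python a.py -q -o ref.json`) for the comparison tables above.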