gh-96538: Move some type-checking out of bisect loops #96539

sweeneyde · 2022-09-03T18:31:51Z

Here's a benchmark program:

from pyperf import Runner, perf_counter
from itertools import repeat
from bisect import bisect
from decimal import Decimal

def bench(loops, size, tp):
    a = sorted(map(tp, range(size)))
    k = tp(size//3)

    t0 = perf_counter()
    for _ in repeat(None, loops):
        bisect(a, k); bisect(a, k); bisect(a, k); bisect(a, k); bisect(a, k);
    t1 = perf_counter()

    return t1 - t0


btf = Runner().bench_time_func
for size in (10, 100, 1_000, 10_000, 100_000, 1_000_000):
    for tp in (int, float, Decimal, str):
        btf(f"[{tp.__name__}]*{size:_}", bench, size, tp, inner_loops=5)

And the results:

- [Decimal]*10: 102 ns +- 1 ns -> 89.7 ns +- 1.1 ns: 1.14x faster
- [Decimal]*100: 169 ns +- 17 ns -> 139 ns +- 2 ns: 1.21x faster
- [Decimal]*1_000: 249 ns +- 2 ns -> 202 ns +- 4 ns: 1.23x faster
- [Decimal]*10_000: 312 ns +- 6 ns -> 255 ns +- 2 ns: 1.23x faster
- [Decimal]*100_000: 396 ns +- 5 ns -> 321 ns +- 5 ns: 1.23x faster
- [Decimal]*1_000_000: 459 ns +- 10 ns -> 369 ns +- 4 ns: 1.24x faster

- [float]*10: 62.4 ns +- 0.4 ns -> 50.4 ns +- 0.5 ns: 1.24x faster
- [float]*100: 93.4 ns +- 0.8 ns -> 71.1 ns +- 0.8 ns: 1.31x faster
- [float]*1_000: 149 ns +- 1 ns -> 104 ns +- 2 ns: 1.43x faster
- [float]*10_000: 174 ns +- 2 ns -> 121 ns +- 3 ns: 1.44x faster
- [float]*100_000: 223 ns +- 3 ns -> 162 ns +- 15 ns: 1.37x faster
- [float]*1_000_000: 249 ns +- 5 ns -> 180 ns +- 10 ns: 1.38x faster

- [int]*10: 62.8 ns +- 1.5 ns -> 49.0 ns +- 0.6 ns: 1.28x faster
- [int]*100: 96.0 ns +- 5.7 ns -> 69.8 ns +- 3.6 ns: 1.37x faster
- [int]*1_000: 137 ns +- 1 ns -> 102 ns +- 4 ns: 1.35x faster
- [int]*10_000: 183 ns +- 1 ns -> 122 ns +- 7 ns: 1.51x faster
- [int]*100_000: 218 ns +- 2 ns -> 157 ns +- 3 ns: 1.39x faster
- [int]*1_000_000: 251 ns +- 6 ns -> 179 ns +- 2 ns: 1.41x faster

- [str]*10: 76.0 ns +- 1.0 ns -> 64.7 ns +- 4.4 ns: 1.18x faster
- [str]*100: 119 ns +- 1 ns -> 97.2 ns +- 1.9 ns: 1.22x faster
- [str]*1_000: 173 ns +- 5 ns -> 143 ns +- 4 ns: 1.20x faster
- [str]*10_000: 238 ns +- 10 ns -> 187 ns +- 2 ns: 1.27x faster
- [str]*100_000: 322 ns +- 6 ns -> 222 ns +- 6 ns: 1.45x faster
- [str]*1_000_000: 365 ns +- 7 ns -> 252 ns +- 2 ns: 1.45x faster

Geometric mean: 1.31x faster

Issue: Take advantage of type uniformity in _bisect.bisect() #96538

sweeneyde · 2022-09-03T18:33:59Z

Modules/_bisectmodule.c

+            if (res_obj == Py_NotImplemented) {
+                Py_DECREF(res_obj);
+                compare = NULL;
+                res = PyObject_RichCompareBool(item, litem, Py_LT);


Remark: this could technically call __lt__, __lt__, __gt__ instead of the current behavior, calling __lt__, __gt__.

This is similar to what unsafe_object_compare does though.

rhettinger · 2022-09-04T00:27:28Z

With sorting, which is potentially O(n log n), there is a large payoff for all this added complexity. With bisect, which is O(log n), the payoff is very low. The granularity is different. We call sorting on a full dataset while bisect calls are one per lookup. You would get a better payoff by type specializing min() and max() than for the handful of comparisons made in a typical call to bisect().

I recognize that there has been a shift in the core developer attitudes towards complexity and that many things we would have never considered now are fair game if some clock cycles can be saved. With the interpreter, there was a high payoff that compensated for fewer people being able to understand and maintain the code. There is less of case to be made for complexifying code on the periphery. It would be nice if some simple things remained simple. At one time, the whole module was basically just this code:

        mid = (lo + hi) // 2
        if x < a[mid]:
            hi = mid
        else:
            lo = mid + 1

Tim, what are your thoughts?

tim-one · 2022-09-04T03:58:47Z

Note that sort()'s type specialization is much more involved, writing its own comparison functions for a number of common types. This, in contrast, is more just applying strength reduction to the single "don't know what the type is and don't care - only care whether comparands are the same type, in which case the type-specific comparison function doesn't need to be deduced again" case. This is done on the fly, as needed; no pre-scan required, so no change to O() behavior.

The code seems pretty straightforward to me, and Dennis's timing results do show significant speedups for typical type-homogeneous cases, even for short lists. I think it would help if comments were added to point out which parts are logically necessary, and which just to speed typical cases.

rhettinger · 2022-09-05T04:02:40Z

Tim, thanks for looking at this. It is not aways clear how far we should go.

tim-one · 2022-09-05T04:22:51Z

It is not always clear how far we should go.

No argument here! IMO, "too many" changes go in that add too much complexity for too little gain. The changes here are conceptually simple, though, and the "1.31x faster" overall result looks credible, well-motivated, and more robust than the kinds of changes that make seemingly random changes to C spellings that yield seeming random "1.04x faster" results on some platforms, but also minor slowdowns on others 😉. We're actually taking code off critical paths in this one.

sweeneyde · 2022-09-05T05:02:38Z

Thanks for the reviews!

pablogsal · 2022-09-06T14:55:36Z

Looks like this PR introduced some reference leaks. Looks like there may be some leaks here:

9e35d05

In line 122 and maybe in line 301.

Please, check out or otherwise we will need to revert it if is not fixed in 24 hours as our buildbot policy (if confirmed that indeed the refleaks come from this PR).

Move some type-checking out of bisect loops

f079ac0

sweeneyde requested a review from rhettinger as a code owner September 3, 2022 18:31

bedevere-bot added the awaiting core review label Sep 3, 2022

sweeneyde commented Sep 3, 2022

View reviewed changes

blurb-it bot and others added 3 commits September 3, 2022 18:39

📜🤖 Added by blurb_it.

2cfc70a

XDECREF in error label

b85a132

improve comment

775708f

sweeneyde added performance Performance or resource usage stdlib Python modules in the Lib dir labels Sep 3, 2022

rhettinger requested a review from tim-one September 4, 2022 00:28

tim-one mentioned this pull request Sep 4, 2022

Take advantage of type uniformity in _bisect.bisect() #96538

Closed

Add comments distinguishing fast path from defualt path

083f0d7

rhettinger approved these changes Sep 5, 2022

View reviewed changes

bedevere-bot added awaiting merge and removed awaiting core review labels Sep 5, 2022

sweeneyde merged commit 9e35d05 into python:main Sep 5, 2022

bedevere-bot removed the awaiting merge label Sep 5, 2022

sweeneyde deleted the bisect_faster branch September 5, 2022 05:02

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

gh-96538: Move some type-checking out of bisect loops #96539

gh-96538: Move some type-checking out of bisect loops #96539

sweeneyde commented Sep 3, 2022 •

edited

Loading

sweeneyde Sep 3, 2022

rhettinger commented Sep 4, 2022

tim-one commented Sep 4, 2022

rhettinger commented Sep 5, 2022

tim-one commented Sep 5, 2022

sweeneyde commented Sep 5, 2022

pablogsal commented Sep 6, 2022 •

edited

Loading

gh-96538: Move some type-checking out of bisect loops #96539

gh-96538: Move some type-checking out of bisect loops #96539

Conversation

sweeneyde commented Sep 3, 2022 • edited Loading

sweeneyde Sep 3, 2022

Choose a reason for hiding this comment

rhettinger commented Sep 4, 2022

tim-one commented Sep 4, 2022

rhettinger commented Sep 5, 2022

tim-one commented Sep 5, 2022

sweeneyde commented Sep 5, 2022

pablogsal commented Sep 6, 2022 • edited Loading

sweeneyde commented Sep 3, 2022 •

edited

Loading

pablogsal commented Sep 6, 2022 •

edited

Loading