-
-
Notifications
You must be signed in to change notification settings - Fork 32.6k
bpo-28685: Optimize sorted() list.sort() with type-specialized comparisons #582
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@embg, thanks for your PR! By analyzing the history of the files in this pull request, we identified @DanielStutzbach, @tim-one, @rhettinger, @loewis and @Yhg1s to be potential reviewers. |
Objects/listobject.c
Outdated
@@ -1044,6 +1045,193 @@ sortslice_advance(sortslice *slice, Py_ssize_t n) | |||
#define IFLT(X, Y) if ((k = ISLT(X, Y)) < 0) goto fail; \ | |||
if (k) | |||
|
|||
/* Here we define custom comparison functions to optimize for the cases one commonly | |||
* in practice: homogeneous lists, often of one of the basic types. */ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't there a "encounters" or "finds" missing here, between "commonly" and "in practice"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yup, thanks for pointing this out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C is a "call by value" language. In particular, when you pass a struct, the language has to act as if a physical copy of the struct is passed. The easiest way is for the implementation to actually make a copy every time a struct is passed; while a smart compiler can arrange to do it more efficiently in most cases, it's poor practice to rely on that:
http://www.cs.utsa.edu/~wagner/CS2213/structs/structs_pass.html
That's why you almost never see non-trivial structs passed in C. Instead a pointer is passed.
The uses of the MergeState struct in the existing sort code are typical examples of that. Indeed, I believe we'd be better off if you added the new comparison-function members directly to the MergeState struct, and pass around pointers to ms
(the one MergeState instance) instead. The top layers of functions already pass pointers to that struct around, so their signatures wouldn't need to change at all.
@tim-one I'm aware that the struct would be copied; my reasoning for wanting to pass by value was rather nonsensical. I realize now that I was completely wrong. The reason I chose to define a new struct instead of adding to Last thing: since I've got you here, can I ask you another question? Namely, this entire project was based on the following quote from the documentation:
So I figured I can assume the types will stay the same. I mean, what's the point of checking the types if they're going to change on you? However, ppperry supplied the the following counterexample on the tracker:
My code would work correctly here, since the compare functions are equivalent sans side effects, but in general something like this could break Thank you so much for reviewing my code! |
Yes, I believe it would be best to add the new stuff to the MergeState struct. Why that's not confusing: conceptually, we're building an object to do the sort. All the sort-related helper functions are really methods of that object, and the MergeState struct is that object's private data. Indeed, that's why the functions that do currently take a MergeState argument list it first: in analogy with a Python method taking If CPython were coded in C++ instead, it really would be obvious ;-) About the |
@tim-one Now that you explained it, I'm totally on board; I'll rewrite everything to use About the |
I'm curious about what the patched Python prints for this program:
What it prints now:
|
@tim-one Just ran it; it printed this:
|
@tim-one Actually, I'm getting weirded out here. First of all. The test you just gave should not have worked. What you were trying to get at is that Another example. ppperry proposed the following test:
The idea is that if you cache the richcompare, you're going to mess up the print statements. But my patched interpreter printed the same as the current implementation! Finally, I decided to run the original counterexample ppperry proposed, putting in print statements to see what's going on:
Again, it matched up with the current implementation! So my question is: why is my patch working when it shouldn't be working??? I have no idea; can you figure this out? The only explanation I can think of is that maybe heap types all use the same |
Slamming 3 distinct mysteries into one message doesn't really help ;-) I'll address the one I started: I expect you'd get different results (in the NaN test) if your tuple compare loop:
started at Which is a really weird "reason" to keep that loop starting at
|
@tim-one Ah! That makes sense. What happened was, I figured, if you're already at the loop, it's going to be slow anyway, so I may as well exactly copy-paste from the current implementation to reduce the probability of error. Turns out that saved me! Anyway, your call: should I change it to start at 1 (adding a pointer compare to make it safe)? As for the other mysteries: I'm convinced it must just be that So, basically, the patch has withstood all attempts to break it! Awesome! |
Objects/listobject.c
Outdated
assert(v->ob_type == w->ob_type && | ||
v->ob_type->tp_richcompare != NULL); | ||
#endif | ||
int ok; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Current style guide allows declarations not at the start of the block.
Objects/listobject.c
Outdated
if (res == NULL) | ||
return -1; | ||
|
||
if (PyBool_Check(res)){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a space before {
.
Objects/listobject.c
Outdated
#endif | ||
int ok; | ||
|
||
if (v == w) return 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Split the line before return
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was actually removed -- it turns out you only need to look at addresses when you're doing equality testing.
Objects/listobject.c
Outdated
|
||
/* Latin string compare: safe for any two latin (one byte per char) strings. */ | ||
static int | ||
unsafe_latin_compare(PyObject* v, PyObject* w, MergeState* ms){ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Move {
to separate line.
Objects/listobject.c
Outdated
/* This function is used by unsafe_object_compare to optimize comparisons | ||
* when we know our list is type-homogeneous but we can't assume anything else. | ||
* In the pre-sort check it is set equal to key->ob_type->tp_richcompare */ | ||
PyObject* (*key_richcompare)(PyObject*, PyObject*, int); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is more common to use a space before *
rather than after it.
The same is everywhere in the new code.
Objects/listobject.c
Outdated
Py_SIZE(lo.keys[0]) > 0); | ||
|
||
PyTypeObject* key_type = (keys_are_in_tuples ? | ||
PyTuple_GET_ITEM(lo.keys[0],0)->ob_type : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a space after ,
.
Objects/listobject.c
Outdated
* lo.keys[i], not lo.keys[i] itself! We verify type-homogeneity | ||
* for lists of tuples in the if-statement directly above. */ | ||
PyObject* key = (keys_are_in_tuples ? | ||
PyTuple_GET_ITEM(lo.keys[i],0) : |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add a space after ,
.
Objects/listobject.c
Outdated
break; | ||
} | ||
|
||
else if (key_type == &PyLong_Type && ints_are_bounded && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else
is not needed since an alternate branch is ended by break
.
If ints_are_bounded
or Py_ABS(Py_SIZE(key)) > 1
are false, key_type == &PyUnicode_Type
still is checked.
I would write:
if (key_type == &PyLong_Type) {
if (ints_are_bounded && Py_ABS(Py_SIZE(key)) > 1)
ints_are_bounded = 0;
}
else if (key_type == &PyUnicode_Type ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK!
Objects/listobject.c
Outdated
/* Choose the best compare, given what we now know about the keys. */ | ||
if (keys_are_all_same_type) { | ||
|
||
if (key_type == &PyUnicode_Type && strings_are_latin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since in any case you use empty lines for readability, it would be better to add braces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure
Objects/listobject.c
Outdated
else if ((ms.key_richcompare = key_type->tp_richcompare) != NULL) | ||
ms.key_compare = unsafe_object_compare; | ||
|
||
} else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else
should be at the start of new line.
Sorry, I missed that this PR is closed. Is the proposition rejected? |
@serhiy-storchaka Thanks so much for your review! Tim asked me to write tests in |
Ack! This is the first time I've used github, and I didn't intend to close it. I must have hit a wrong button :-( Here, I'll try to open it again. Sorry! |
@tim-one Haha no problem, it's going to be a couple days till my tests are up to snuff :) |
Comparing bytes and strings yields a warning, not an error, so assertRaises fails.
@serhiy-storchaka Don't know if you saw this or not; I fixed all the issues you raised. As far as I can tell, the patch is 100% finished. |
Lib/test/test_sort.py
Outdated
L = [WackyList1([WackyComparator(i), i]) for i in range(10)] | ||
elem = L[-1] | ||
self.assertRaises(ValueError, L.sort) | ||
self.assertRaises(ValueError, [(x,) for x in L].sort) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test on line 325 doesn't actually work; the first sort of the non-tupled list already did the __class__
modification, so L now contains a WackyList2 already and thus the sort would fail even if there was a bug that made sorting one-tuples not handle mid-sort tp_richcompare changes.
* Use the with-statement form of self.assertRaises. * Make the two assertions independent of one another. The second test was invalid because the first list was already sorted, making the ValueError inevitable.
@rhettinger: Please replace |
Bumps [aiohttp](https://github.com/aio-libs/aiohttp) from 3.8.1 to 3.8.3. - [Release notes](https://github.com/aio-libs/aiohttp/releases) - [Changelog](https://github.com/aio-libs/aiohttp/blob/master/CHANGES.rst) - [Commits](aio-libs/aiohttp@v3.8.1...v3.8.3) --- updated-dependencies: - dependency-name: aiohttp dependency-type: direct:production update-type: version-update:semver-patch ... Signed-off-by: dependabot[bot] <support@github.com> Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Description of the optimization (see also this poster)
The idea is simple: in practice, it's very uncommon to sort type-heterogeneous lists. This is because lists in general tend to be used in a homogeneous way (if you're iterating and the type is changing, your code may break, depending on what you're doing), and because comparison is often not defined in the heterogeneous context ("apples and oranges").
So, instead of checking types during every single compare in the sort (dynamic dispatch), we can simply iterate once in a pre-sort check and see if the list is type-homogeneous. If it is, we can replace
PyObject_RichCompareBool
with whatever compare function would have ended up being dispatched for that type. Since this check is cheap and very unlikely to fail, and checking types every time we compare is expensive, this is a reasonable optimization to consider.This is, however, only the beginning of what's possible. Namely, there are many safety checks that have to be performed during every compare in the common cases (string, int, float, tuple) that one encounters in practice. For example, character width has to be checked for both strings every time two strings are compared. Since these checks almost never fail in practice (because, e.g., non-latin strings are uncommon in practice, etc.), we can move them out of the comparison function and into the pre-sort check, as well. We then write special-case compare functions (I implemented one for each of the four types mentioned above) that are selected iff. the assumptions necessary to use them are verified for each list element.
Benchmarks
I considered two sets of benchmarks: one organized by type (random lists of that type), and one organized by structure. Full benchmark scripts can be found here. The results are below (standard deviations were less than 0.3% of the mean for all measurements):
By type
By structure
These are just the benchmarks described in
Objects/listsort.txt
. The first table is the loss we experience if we sort structured heterogeneous lists (worst case: list is already sorted, we go all the way through doingn
type-checks, and then we only end up doingn
comparisons. Tragic, but extremely unlikely in practice; in practice, we would usually find the first heterogeneous element early on, and break out of the check, but here, the single, lonelyfloat
is hiding all the way at the end of the list ofint
, so we don't find it until we've done alln
checks):The second table is the same benchmark, but on homogeneous lists (int):
Patch summary
Here we describe at a high level what each section of the patch does:
Objects/listobject.c
PyObject_RichCompareBool
. To be selected if all of our pre-checks fail.ob_type->tp_richcompare
, which is stored by the pre-sort check atcompare_funcs.key_richcompare
. This yields modest optimization (neighbourhood of 10%), but we generally hope we can do better.memcmp
s them.PyFloat_Type->tp_richcompare
does a lot of typechecking that we want to move out of the sort loop, it pays to have this optimized compare available.compare_funcs.key_compare
in the pre-sort check, we run the pre-sort check again on the listT = [x[0] for x in L]
(we don't actually run the check twice, but we do something functionally equivalent to this). IfT
is type-homogeneous, or even better, satisfies the requirements for one of our special-case compares, we can replace the call toPyObject_RichCompareBool
for the first tuple element with a call tocompare_funcs.tuple_elem_compare
. This allows us to bypass two levels of wasteful safety checks. If the first elements of the two tuples are equal, of course, we have to callPyObject_RichCompareBool
on subsequent elements; the idea is that this is uncommon in practice.key_type
,keys_are_all_same_type
,ints_are_bounded
,strings_are_latin
, andkeys_are_in_tuples
(which is 1 iff. every list element is a non-empty tuple, in which case all the other variables refer to the list[x[0] for x in L]
).keys_are_in_tuples
andkey_type != &PyTuple_Type
, then use the other variables to selectcompare_funcs.tuple_elem_compare
, and setcompare_funcs.key_compare = unsafe_tuple_compare
.Selected quotes from the python-ideas thread
Terry Reedy:
Tim Peters:
Later in that message, Tim also pointed out a bug, which has been fixed in this version of the patch.
https://bugs.python.org/issue28685