-
-
Notifications
You must be signed in to change notification settings - Fork 32.6k
bpo-36218: Fix handling of heterogeneous values in list.sort #12209
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
bpo-36218: Fix handling of heterogeneous values in list.sort #12209
Conversation
Please add a NEWS entry since this fixes a segfault in a released version. |
Thanks @tirkarthi! |
Author of the bug here: apologies for the error, this patch is absolutely correct. |
To be fair, it was a clever patch and did pass the review from Tim Peters ;) (and the bug took quite a long time to get noticed so I guess lists are actually homogenous in general) |
Hi @embg, thinking back about this bug, I'm not sure this is the best fix. For inputs like |
@remilapeyre You're correct, and according to my measurements from the original PR, the performance gain would be about 40% for the case you mention. How about like this? (See changes below) |
Objects/listobject.c
Outdated
@@ -2305,6 +2305,7 @@ list_sort_impl(PyListObject *self, PyObject *keyfunc, int reverse) | |||
lo.keys[i]); | |||
|
|||
if (key->ob_type != key_type) { | |||
keys_are_in_tuples = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
keys_are_in_tuples = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We remove this line, which over-compensates in the case you mentioned.
Objects/listobject.c
Outdated
if (keys_are_in_tuples) { | ||
/* Make sure we're not dealing with tuples of tuples | ||
* (remember: here, key_type refers list [key[0] for key in keys]) */ | ||
if (key_type == &PyTuple_Type) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if (key_type == &PyTuple_Type) { | |
if ( (keys_are_all_same_type == 0) || (key_type == &PyTuple_Type) ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And instead modify this check, which sets tuple_elem_compare
to safe_object_compare
in the case of nested tuples, to also fall back when the tuple first-elements are heterogeneous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think that's going to work because of the early exit at https://github.com/python/cpython/pull/12209/files#diff-5037da2a492e5903f100107e0506f818R2310 you can't be sure that all keys are tuples anyway.
I think the simplest way forward would be to use two variables key_type
and inner_key_type
to know if it's item[0]
or item
that changed type.
Does this make sense?
@remilapeyre On second thought, I'm not clear how the early exit would impact the change proposed in my review, since the check is outside the loop. Could you walk me through that? Although I agree that two variables is a good solution if we can't patch it by just modifying a single line. |
Unless I missed something, you are suggesting to use this code:
Then for input like If you break from the list before the third element, how can you be sure it will be a tuple? |
OK, I see what you mean now. Couldn't we fix it like this?
That way we break early for the non-tuple case, but for tuples we check all the way through. The full diff would then be two lines. |
@tim-one and @rhettinger, since you were involved in the original change, could you review this please? |
@embg yes, this seems to be the correct condition to exit the loop. |
@ned-deily I haven't chimed in here because it appears to be a shallow bug that's already getting first-class attention. The kinds of bugs we were looking for at the start were far subtler 😉 . |
Serhiy, I would appreciate it if you could go through this as well. |
Co-Authored-By: remilapeyre <remi.lapeyre@henki.fr>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, you have changed too much. This change adds a pessimization. I suggested to fix an indentation, not to change a code.
Sorry, I will fix it. |
@remilapeyre Since |
Thanks for invitation but it was Lyn Levenick that reported the issue on https://bugs.python.org/issue36218#msg337341 |
Ah, I will post on the bugtracker thread then |
Let's see if this can get wrapped up in time for the alpha release this weekend. |
Thanks @remilapeyre for the PR, and @rhettinger for merging it 🌮🎉.. I'm working now to backport this PR to: 3.7. |
GH-12532 is a backport of this pull request to the 3.7 branch. |
https://bugs.python.org/issue36218