-
-
Notifications
You must be signed in to change notification settings - Fork 25.8k
[MRG] MAINT Use set litterals when possible #12667
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
sklearn/tests/test_multiclass.py
Outdated
@@ -239,7 +239,7 @@ def test_ovr_multiclass(): | |||
clf = OneVsRestClassifier(base_clf).fit(X, y) | |||
assert_equal(set(clf.classes_), classes) | |||
y_pred = clf.predict(np.array([[0, 0, 4]]))[0] | |||
assert_equal(set(y_pred), set("eggs")) | |||
assert_equal(set(y_pred), {"eggs"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have no idea how that passed -- it shouldn't have,
>>> set("eggs")
{'g', 'e', 's'}
>>> set(['eggs'])
{'eggs'}
Curious to see this fix will fail..
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, the test was passing because we were creating a set(<string>)
(and creating {'g', 'e', 's'}
) on each side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 since it's faster, though I don't think it's more readable.
sklearn/tests/test_multiclass.py
Outdated
@@ -239,7 +239,7 @@ def test_ovr_multiclass(): | |||
clf = OneVsRestClassifier(base_clf).fit(X, y) | |||
assert_equal(set(clf.classes_), classes) | |||
y_pred = clf.predict(np.array([[0, 0, 4]]))[0] | |||
assert_equal(set(y_pred), set("eggs")) | |||
assert_equal(set([y_pred]), {"eggs"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's happening here? Why two different solutions for two sets? Maybe {y_pred}
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since the dict syntax is much more common, I tend to find set literals only easy to read if they are either very short ({1,2,3}
) or if there is another visual hint that :
is absent, such as putting in newlines between long set items or before for
in comprehensions. But that's a personal taste.
I don't think the speed-up is sufficient justification alone
Yeah, this one is a bit controversial. See related discussion at scipy scipy/scipy#9531 Not really sure why I started this -- stumbled on an issue about it at numpy while looking for something else... OK, will revert set comprehensions that don't include a new line. |
I'll vote +0 (maybe -1) to modify part of the our repo. Seems that readability is indeed a problem (not only for some python beginners like me), so maybe close this one? |
Well, {"binary", "multiclass"} is still the right way to define sets as opposed to, set(["binary", "multiclass"]) Also, that's the mathematical notation for sets https://en.wikipedia.org/wiki/Set_(mathematics)#Describing_sets . Just the fact that dicts are more widespread doesn't change that. Using a notation close to the math notation in scientific code is good IMO. Set comprehension is just a generalization of that notation, again close to math notations https://en.wikipedia.org/wiki/Set_notation#Metaphor_in_denoting_sets For the set comprehension, I kind of agree that it's less readable, but again it's just due to the fact that sets are little as compared to dicts. Dict is a generalization of sets to two elements in terms of notation, not the other way around. Will revert set comprehension. |
I don't think I understand that statement. For small sets I find set literals more readable and was surprised to find the |
Removed set comprehension, and left only the most simple cases. CI is green (apart for the failing test on master). |
sklearn/tests/test_multiclass.py
Outdated
y_pred = clf.predict(np.array([[0, 0, 4]]))[0] | ||
assert_equal(set(y_pred), set("eggs")) | ||
y_pred = clf.predict(np.array([[0, 0, 4]])) | ||
assert_equal(set(y_pred), {"eggs"}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@qinhanmin2014 So set(a)
will iterate on a
and create a set with items of a.
In the original code we iterated over the first element of y_pred
, which was a string, and so the result was set("eggs") == {'g', 'e', 's'}
. The same thing was done on the right. It worked but I'm pretty sure that was not intentional.
Here, I changed the code a bit following your comment. On the left we iterate on y_pred
that only has one element "egg"
and on the right, we create the same set with one element.
Merged master in to fix conflicts. CI should be green. This should be quite easy to review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll vote +1. It's slightly faster and won't hurt readability too much I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shrug. I find these harder to parse than the word set
, but I'm okay with the change.
+2, merging |
This reverts commit 2ee7ede.
This reverts commit 2ee7ede.
Set literals (and set comprehension) were added in Python 2.7. They are a bit more readable and faster than alternative methods for
set
creation. This uses those when possible.