Skip to content

[MRG] MAINT Use set litterals when possible #12667

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jan 6, 2019

Conversation

rth
Copy link
Member

@rth rth commented Nov 24, 2018

Set literals (and set comprehension) were added in Python 2.7. They are a bit more readable and faster than alternative methods for set creation. This uses those when possible.

@@ -239,7 +239,7 @@ def test_ovr_multiclass():
clf = OneVsRestClassifier(base_clf).fit(X, y)
assert_equal(set(clf.classes_), classes)
y_pred = clf.predict(np.array([[0, 0, 4]]))[0]
assert_equal(set(y_pred), set("eggs"))
assert_equal(set(y_pred), {"eggs"})
Copy link
Member Author

@rth rth Nov 24, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have no idea how that passed -- it shouldn't have,

>>> set("eggs")
{'g', 'e', 's'}
>>> set(['eggs'])
{'eggs'}

Curious to see this fix will fail..

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, the test was passing because we were creating a set(<string>) (and creating {'g', 'e', 's'}) on each side.

@rth rth changed the title MAINT Use set litterals when possible [MRG] MAINT Use set litterals when possible Nov 25, 2018
Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 since it's faster, though I don't think it's more readable.

@@ -239,7 +239,7 @@ def test_ovr_multiclass():
clf = OneVsRestClassifier(base_clf).fit(X, y)
assert_equal(set(clf.classes_), classes)
y_pred = clf.predict(np.array([[0, 0, 4]]))[0]
assert_equal(set(y_pred), set("eggs"))
assert_equal(set([y_pred]), {"eggs"})
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's happening here? Why two different solutions for two sets? Maybe {y_pred}?

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since the dict syntax is much more common, I tend to find set literals only easy to read if they are either very short ({1,2,3}) or if there is another visual hint that : is absent, such as putting in newlines between long set items or before for in comprehensions. But that's a personal taste.

I don't think the speed-up is sufficient justification alone

@rth
Copy link
Member Author

rth commented Nov 27, 2018

Yeah, this one is a bit controversial. See related discussion at scipy scipy/scipy#9531

Not really sure why I started this -- stumbled on an issue about it at numpy while looking for something else...

OK, will revert set comprehensions that don't include a new line.

@qinhanmin2014
Copy link
Member

qinhanmin2014 commented Nov 27, 2018

I'll vote +0 (maybe -1) to modify part of the our repo. Seems that readability is indeed a problem (not only for some python beginners like me), so maybe close this one?

@rth
Copy link
Member Author

rth commented Nov 27, 2018

Well,

{"binary", "multiclass"}

is still the right way to define sets as opposed to,

set(["binary", "multiclass"])

Also, that's the mathematical notation for sets https://en.wikipedia.org/wiki/Set_(mathematics)#Describing_sets . Just the fact that dicts are more widespread doesn't change that. Using a notation close to the math notation in scientific code is good IMO. Set comprehension is just a generalization of that notation, again close to math notations https://en.wikipedia.org/wiki/Set_notation#Metaphor_in_denoting_sets

For the set comprehension, I kind of agree that it's less readable, but again it's just due to the fact that sets are little as compared to dicts. Dict is a generalization of sets to two elements in terms of notation, not the other way around.

Will revert set comprehension.

@amueller
Copy link
Member

is still the right way to define sets as opposed to,

I don't think I understand that statement. For small sets I find set literals more readable and was surprised to find the set([...]) syntax in the code

@rth
Copy link
Member Author

rth commented Nov 27, 2018

Removed set comprehension, and left only the most simple cases. CI is green (apart for the failing test on master).

y_pred = clf.predict(np.array([[0, 0, 4]]))[0]
assert_equal(set(y_pred), set("eggs"))
y_pred = clf.predict(np.array([[0, 0, 4]]))
assert_equal(set(y_pred), {"eggs"})
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@qinhanmin2014 So set(a) will iterate on a and create a set with items of a.

In the original code we iterated over the first element of y_pred, which was a string, and so the result was set("eggs") == {'g', 'e', 's'}. The same thing was done on the right. It worked but I'm pretty sure that was not intentional.

Here, I changed the code a bit following your comment. On the left we iterate on y_pred that only has one element "egg" and on the right, we create the same set with one element.

@rth
Copy link
Member Author

rth commented Jan 3, 2019

Merged master in to fix conflicts. CI should be green.

This should be quite easy to review.

Copy link
Member

@qinhanmin2014 qinhanmin2014 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll vote +1. It's slightly faster and won't hurt readability too much I think.

Copy link
Member

@jnothman jnothman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shrug. I find these harder to parse than the word set, but I'm okay with the change.

@qinhanmin2014
Copy link
Member

+2, merging

@qinhanmin2014 qinhanmin2014 merged commit 684d8a2 into scikit-learn:master Jan 6, 2019
@rth rth deleted the set-litteral branch January 6, 2019 17:23
adrinjalali pushed a commit to adrinjalali/scikit-learn that referenced this pull request Jan 7, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
xhluca pushed a commit to xhluca/scikit-learn that referenced this pull request Apr 28, 2019
koenvandevelde pushed a commit to koenvandevelde/scikit-learn that referenced this pull request Jul 12, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants