-
-
Notifications
You must be signed in to change notification settings - Fork 18.5k
ASV: more for str.cat #22652
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASV: more for str.cat #22652
Conversation
Hello @h-vetinari! Thanks for submitting the PR.
|
1c00ed1
to
497a336
Compare
Codecov Report
@@ Coverage Diff @@
## master #22652 +/- ##
=======================================
Coverage 92.17% 92.17%
=======================================
Files 169 169
Lines 50708 50708
=======================================
Hits 46740 46740
Misses 3968 3968
Continue to review full report at Codecov.
|
ping because you managed the CirlceCI transition from 1.0 to 2.0. It seems that new setup does not respect |
d90e164
to
2bccc65
Compare
Last I saw CircleCI had an issue open for ci-skip. Not sure what the status
is.
…On Sun, Sep 9, 2018 at 4:33 PM h-vetinari ***@***.***> wrote:
@TomAugspurger <https://github.com/TomAugspurger>
ping because you managed the CirlceCI transition from 1.0 to 2.0. It seems
that new setup does not respect [ci skip] - I view the CI as a limited
resource (appveyor recently fluctuating up to something like 12h waiting
time), so [ci skip] can be useful for avoiding a CI run where it's
pointless (like this PR, which changes nothing in code/docs; and PEP8 runs
anyway).
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#22652 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/ABQHImB2sLu_oynoGv8HnqLkxQhR88kEks5uZYkRgaJpZM4Wgcb6>
.
|
not sure where you saw it, but https://discuss.circleci.com/search?q=ci%20skip is pretty messy with lots of duplicates (due to their auto-closing policy for issues after a while with no activity). One of the closer ones seems to be https://discuss.circleci.com/t/ci-skip-does-not-work-in-pull-requests/5308 The 2.0 docs (https://circleci.com/docs/2.0/skip-build/) mention that
which is weird, because it did work for 1.0 in this repo (side note: the docs for that don't exist anymore: https://circleci.com/docs/1.0/skip-a-build/). On an unrelated note: would you care to comment about the PR itself? :) |
https://discuss.circleci.com/t/ci-skip-does-not-work-in-pull-requests/5308/10 is the one I saw. Agreed it's unfortunate. If a maintainer is around you can ask them to cancel it :/ No strong thoughts on how the PR, other than a consideration of runtime. It seems like these should run pretty quickly. |
Maintainer time is an even more precious resource than CI time, so I think I won't bother you guys. ;-)
Here's the output of an ASV run against master. Of course,
|
asv_bench/benchmarks/strings.py
Outdated
param_names = ['others', 'sep', 'na_rep', 'na_frac'] | ||
|
||
def setup(self, others, sep, na_rep, na_frac): | ||
N = int(5e5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you see how we do this in other benchmarks? I think we usually just pick 100_000 so for consistency might be nice to reflect that here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, was trying to get the results near goal_time
, but maybe I'm misinterpreting what that is supposed to be
asv_bench/benchmarks/strings.py
Outdated
|
||
def setup(self, others, sep, na_rep, na_frac): | ||
N = int(5e5) | ||
mask_gen = lambda: np.random.choice([True, False], N, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need the lambda here? Figured we could just use the returned array in the subsequent call
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's essential IMO that each column gets a different mask to compound the NaNs (for the na_rep=None
case) and have a reasonable similarity to real-life data.
asv_bench/benchmarks/strings.py
Outdated
self.s = Series(tm.makeStringIndex(N)).where(mask_gen()) | ||
self.others = (DataFrame({i: tm.makeStringIndex(N).where(mask_gen()) | ||
for i in range(others)}) | ||
if others is not None else None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any reason you don't just use 0
instead of None
as the parametrized argument? Would get rid of the need for a condition here
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to 0
for consistency/legibility, but still need the condition, as pd.DataFrame({'i' : ... for i in range(0)})
would yield a DataFrame (but str.cat
needs others=None
to self-concatenate).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review.
asv_bench/benchmarks/strings.py
Outdated
param_names = ['others', 'sep', 'na_rep', 'na_frac'] | ||
|
||
def setup(self, others, sep, na_rep, na_frac): | ||
N = int(5e5) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No problem, was trying to get the results near goal_time
, but maybe I'm misinterpreting what that is supposed to be
asv_bench/benchmarks/strings.py
Outdated
|
||
def setup(self, others, sep, na_rep, na_frac): | ||
N = int(5e5) | ||
mask_gen = lambda: np.random.choice([True, False], N, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it's essential IMO that each column gets a different mask to compound the NaNs (for the na_rep=None
case) and have a reasonable similarity to real-life data.
asv_bench/benchmarks/strings.py
Outdated
self.s = Series(tm.makeStringIndex(N)).where(mask_gen()) | ||
self.others = (DataFrame({i: tm.makeStringIndex(N).where(mask_gen()) | ||
for i in range(others)}) | ||
if others is not None else None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to 0
for consistency/legibility, but still need the condition, as pd.DataFrame({'i' : ... for i in range(0)})
would yield a DataFrame (but str.cat
needs others=None
to self-concatenate).
c91f374
to
67503c6
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Request on readability otherwise lgtm
asv_bench/benchmarks/strings.py
Outdated
self.s = Series(tm.makeStringIndex(N)).where(mask_gen()) | ||
self.others = (DataFrame({i: tm.makeStringIndex(N).where(mask_gen()) | ||
for i in range(other_cols)}) | ||
if other_cols > 0 else None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Better but I think the condition inside the comprehension just makes this tougher to read than it needs to be. Would rather take the condition out of the comprehension and as a subsequent statement evaluate the truthiness of the expression, assigning None where False
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed - hope I understood your request correctly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thx, changed
asv_bench/benchmarks/strings.py
Outdated
self.s = Series(tm.makeStringIndex(N)).where(mask_gen()) | ||
self.others = (DataFrame({i: tm.makeStringIndex(N).where(mask_gen()) | ||
for i in range(other_cols)}) | ||
if other_cols > 0 else None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed - hope I understood your request correctly.
4d84816
to
862abe2
Compare
862abe2
to
ddc130e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm @TomAugspurger
@WillAyd
|
Thanks! |
Planning to do a clean-up of the
Series.str.cat
internals and wanted to expand ASV-coverage before doing so.