
MAINT: Optimize loadtxt usecols. #19618


Merged
merged 2 commits into from
Aug 23, 2021


anntzer (Contributor) commented Aug 5, 2021

7-10% speedup in the `usecols` benchmarks; it appears that even in the
single-usecol case, avoiding the iteration over `usecols` more than
compensates for the cost of the extra function call to `usecols_getter`.

       before           after         ratio
     [cc7f1504]       [649b0461]
     <main>           <loadtxtusecols>
-     6.96±0.03ms      6.46±0.03ms     0.93  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv(2)
-      11.5±0.1ms      10.4±0.04ms     0.90  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3, 5, 7])
-      9.39±0.1ms      8.47±0.05ms     0.90  bench_io.LoadtxtUseColsCSV.time_loadtxt_usecols_csv([1, 3])
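The idea behind the change can be illustrated with a minimal sketch. It assumes the getter is built once, before the row loop, from `operator.itemgetter`; the helper names `make_usecols_getter` and `select_with_comprehension` are hypothetical and are not NumPy's internal code. Because `itemgetter` with a single index returns a bare item rather than a tuple, the single-column case is wrapped so it still yields a sequence, which is the "extra function call" the description refers to.

```python
from operator import itemgetter
from typing import Callable, List, Sequence


def make_usecols_getter(usecols: Sequence[int]) -> Callable[[Sequence[str]], Sequence[str]]:
    """Build a callable that picks the requested columns from a split row.

    Hypothetical helper for illustration; built once, before the row loop.
    """
    if len(usecols) > 1:
        # itemgetter(*idx) selects all requested items in one C-level call
        # and returns them as a tuple.
        return itemgetter(*usecols)
    # itemgetter(i) would return a bare item, so wrap it to keep a sequence.
    col = usecols[0]
    return lambda row: [row[col]]


def select_with_comprehension(row: List[str], usecols: Sequence[int]) -> List[str]:
    # The pre-change approach: a Python-level loop over usecols for every row.
    return [row[j] for j in usecols]


if __name__ == "__main__":
    import timeit

    row = [str(i) for i in range(10)]
    for usecols in ([2], [1, 3], [1, 3, 5, 7]):
        getter = make_usecols_getter(usecols)
        # Both approaches must select the same values.
        assert list(getter(row)) == select_with_comprehension(row, usecols)
        t_comp = timeit.timeit(lambda: select_with_comprehension(row, usecols), number=100_000)
        t_get = timeit.timeit(lambda: getter(row), number=100_000)
        print(f"usecols={usecols}: comprehension {t_comp:.3f}s, getter {t_get:.3f}s")
```

The timings this script prints depend on the machine and are only a rough local check; the figures reported for the PR are the asv results above.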

Comment on lines 1008 to 1010
     if usecols:
-        vals = [vals[j] for j in usecols]
-    if len(vals) != ncols:
+        vals = usecols_getter(vals)
+    elif len(vals) != ncols:
Member:

Can you explain this change to an `elif`?

Contributor Author:

If `usecols` is not None, then we already know that `usecols_getter` will (by construction) return a list with the right number of elements. The performance gain is negligible, but it just seems good practice to skip the unneeded check.
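The author's argument can be sketched as a simplified row loop. This is a hypothetical illustration, not the actual `loadtxt` body: it assumes an `itemgetter`-based getter and that, without `usecols`, the first row fixes the expected width. When `usecols` is given, the getter returns exactly `len(usecols)` items (or raises `IndexError` on a row that is too short), so the width check is only needed on the branch without `usecols`.

```python
from operator import itemgetter
from typing import Iterable, List, Optional, Sequence


def parse_rows(lines: Iterable[str], usecols: Optional[Sequence[int]] = None,
               delimiter: str = ",") -> List[List[float]]:
    """Simplified, hypothetical version of loadtxt's per-row column handling."""
    usecols_getter = None
    ncols = None
    if usecols:
        ncols = len(usecols)
        if ncols > 1:
            usecols_getter = itemgetter(*usecols)
        else:
            # Wrap the single-column case so a list is still returned.
            col = usecols[0]
            usecols_getter = lambda vals: [vals[col]]

    rows = []
    for i, line in enumerate(lines):
        vals = line.strip().split(delimiter)
        if ncols is None:
            # Without usecols, the first row fixes the expected width.
            ncols = len(vals)
        if usecols:
            # By construction this yields exactly len(usecols) items (or
            # raises IndexError if the row is too short), so no length
            # check is needed on this branch.
            vals = usecols_getter(vals)
        elif len(vals) != ncols:
            raise ValueError(f"Wrong number of columns at line {i + 1}")
        rows.append([float(v) for v in vals])
    return rows
```

For example, `parse_rows(["1,2,3", "4,5,6"], usecols=[0, 2])` selects the first and third columns of each row, while a ragged input without `usecols` still hits the `elif` check and raises `ValueError`.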

charris (Member) commented Aug 11, 2021

@anntzer Needs rebase.

anntzer (Contributor, Author) commented Aug 16, 2021

rebased

charris (Member) commented Aug 22, 2021

#19693 is in, everything needs a rebase :)

charris merged commit 6d8eacd into numpy:main on Aug 23, 2021

charris (Member) commented Aug 23, 2021

Thanks @anntzer.

charris changed the title from "PERF: Optimize loadtxt usecols." to "MAINT: Optimize loadtxt usecols." on Aug 23, 2021
anntzer deleted the loadtxtusecols branch on August 23, 2021 04:00