Skip to content

PERF: Simplify some of loadtxt's standard converters. #19620

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Aug 6, 2021

Conversation

anntzer
Copy link
Contributor

@anntzer anntzer commented Aug 5, 2021

Standard converters only ever get called with str inputs (loadtxt
performs the required decoding); saving a bunch of runtime typechecks
(in asstr) results in a 15-20% speedup when loadtxt()ing the
corresponding types.

       before           after         ratio
     [cc7f1504]       [3fd5664b]
     <main>           <_psuhme/loadtxtconvs>
-         753±4μs          716±5μs     0.95  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(200)
-     7.24±0.08ms      6.88±0.03ms     0.95  bench_io.LoadtxtCSVDateTime.time_loadtxt_csv_datetime(2000)
-      50.7±0.5μs       44.7±0.4μs     0.88  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10)
-        47.5±3μs       40.7±0.6μs     0.86  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10)
-      57.6±0.8μs       47.5±0.4μs     0.82  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10)
-        34.2±3ms       28.1±0.4ms     0.82  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10000)
-        29.8±2ms       24.4±0.3ms     0.82  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10000)
-         382±3μs          313±4μs     0.82  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100)
-        315±20μs          256±3μs     0.81  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100)
-       313±100ms          254±4ms     0.81  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100000)
-        368±40μs          292±2μs     0.79  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100)
-        381±20ms        301±0.9ms     0.79  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100000)
-        37.6±1ms      29.1±0.04ms     0.78  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10000)

@anntzer anntzer force-pushed the loadtxtconvstr branch 2 times, most recently from a9d9668 to 78af087 Compare August 6, 2021 12:13
@eric-wieser
Copy link
Member

Would you mind rebasing and re-profiling on main?

@anntzer
Copy link
Contributor Author

anntzer commented Aug 6, 2021

$ python runtests.py --bench-compare HEAD~ Loadtxt
<elided>
       before           after         ratio
     [99eac423]       [63e0dc4e]
     <main>           <loadtxtconvstr>
-       170±0.7ms        159±0.3ms     0.94  bench_io.LoadtxtCSVStructured.time_loadtxt_csv_struct_dtype
-      33.4±0.2μs       30.4±0.2μs     0.91  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10)
-      30.5±0.1μs       27.6±0.2μs     0.91  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10)
-         191±1μs        169±0.6μs     0.88  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100)
-      35.9±0.3μs       31.7±0.5μs     0.88  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10)
-      18.3±0.1ms      16.1±0.09ms     0.88  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 10000)
-         182±1ms          161±1ms     0.88  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('str', 100000)
-     15.9±0.06ms       13.8±0.2ms     0.87  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 10000)
-         169±2μs          146±1μs     0.86  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100)
-       169±0.9ms          145±1ms     0.86  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('object', 100000)
-       213±0.9ms        174±0.5ms     0.81  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100000)
-     20.6±0.07ms      16.7±0.09ms     0.81  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 10000)
-         221±2μs        178±0.4μs     0.81  bench_io.LoadtxtCSVdtypes.time_loadtxt_dtypes_csv('complex128', 100)

Comment on lines +751 to +752
(np.bytes_, methodcaller('encode', 'latin-1')),
(np.unicode_, str),
Copy link
Member

@eric-wieser eric-wieser Aug 6, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it the case that the input to _CONVERTERS is always a str and never bytes? If so, can you add a comment to _getconv saying that the converters are for parsing data from str?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes (see also #19609). Added the comment.

Standard converters only ever get called with str inputs (loadtxt
performs the required decoding); saving a bunch of runtime typechecks
(in `asstr`) results in a 15-20% speedup when loadtxt()ing the
corresponding types.
Copy link
Member

@eric-wieser eric-wieser left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, let's wait for CI to go green just in case.

@charris
Copy link
Member

charris commented Aug 6, 2021

close/reopen

@charris charris closed this Aug 6, 2021
@charris charris reopened this Aug 6, 2021
@charris charris merged commit 2f52098 into numpy:main Aug 6, 2021
@charris
Copy link
Member

charris commented Aug 6, 2021

Thanks @anntzer .

@anntzer anntzer deleted the loadtxtconvstr branch August 6, 2021 18:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants