Less allocation in decomposition and recomposition #32

Closed
djudd wants to merge 2 commits

Conversation

@djudd commented Jan 5, 2019

Replaces Vec/VecDeque with a normally non-allocating buffer in decomposition and recomposition.
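
The PR's actual `Buffer` type isn't reproduced in this thread, but as a rough illustration of the idea, here is a minimal sketch of an inline-first buffer. The type name, element type, and inline capacity of 4 are illustrative assumptions, not taken from this PR:

```rust
// Minimal sketch of an inline-first buffer: elements live in a fixed-size
// array until it fills up, and only then spill into a heap-allocated Vec.
const INLINE_CAP: usize = 4;

enum TinyBuf {
    Inline { data: [char; INLINE_CAP], len: usize },
    Spilled(Vec<char>),
}

impl TinyBuf {
    fn new() -> Self {
        TinyBuf::Inline { data: ['\0'; INLINE_CAP], len: 0 }
    }

    fn push(&mut self, c: char) {
        if let TinyBuf::Inline { data, len } = self {
            if *len < INLINE_CAP {
                // Fast path: still fits inline, no heap allocation.
                data[*len] = c;
                *len += 1;
                return;
            }
            // Inline storage is full: spill the contents to the heap.
            let spilled = data[..*len].to_vec();
            *self = TinyBuf::Spilled(spilled);
        }
        if let TinyBuf::Spilled(v) = self {
            v.push(c);
        }
    }

    fn as_slice(&self) -> &[char] {
        match self {
            TinyBuf::Inline { data, len } => &data[..*len],
            TinyBuf::Spilled(v) => v.as_slice(),
        }
    }
}
```

For typical text, where runs of combining characters are short, only the fast path is taken and nothing is allocated; unusually long runs pay for the Vec.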

In simple benchmarking on my laptop this provides modest but noticeable speed improvements, especially for the ASCII case, where we don't actually change anything. From one pair of before/after runs:

test bench_nfc_ascii     1,247 ns/iter ->     890 ns/iter  (-29%)
test bench_nfc_long    365,786 ns/iter -> 356,578 ns/iter   (-4%)
test bench_nfd_ascii       630 ns/iter ->     394 ns/iter  (-37%)
test bench_nfd_long    206,544 ns/iter -> 175,961 ns/iter  (-15%)
test bench_nfkc_ascii    1,294 ns/iter ->     953 ns/iter  (-26%)
test bench_nfkc_long   378,280 ns/iter -> 356,570 ns/iter   (-6%)
test bench_nfkd_ascii      586 ns/iter ->     396 ns/iter  (-32%)
test bench_nfkd_long   220,095 ns/iter -> 191,294 ns/iter  (-13%)

Normally I'd check in with a crate maintainer before writing this much code, but it was fun and I don't mind if it's just a starting point for discussion. Some alternatives would be:

  • Just use SmallVec (see the sketch after this list) - I didn't do this on the guess that you might want to keep the crate dependency-free
  • Use unsafe code - more optimizations are theoretically possible if we were willing to be unsafe (e.g. skipping bounds checks we've already done, working with uninitialized memory, or using pointers to let a bunch of the code not care whether we've spilled or not)
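
For reference, the SmallVec alternative from the first bullet would look roughly like this; the inline capacity of 4 and the helper function are assumptions for illustration, not code from this PR:

```rust
// Sketch of the "just use SmallVec" alternative: the same inline-then-spill
// behavior, but provided by the smallvec crate instead of hand-rolled code.
use smallvec::SmallVec;

// Hypothetical helper; the crate's real decomposition code is not shown here.
fn collect_chars(input: &str) -> SmallVec<[char; 4]> {
    let mut buf: SmallVec<[char; 4]> = SmallVec::new();
    for c in input.chars() {
        buf.push(c);
    }
    buf
}

fn main() {
    let buf = collect_chars("abc");
    // Three chars fit in the inline capacity of 4, so nothing was allocated.
    assert!(!buf.spilled());
}
```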

@sujayakar (Contributor)

Impressive gains, @djudd!

Before going into the details of the changes, here are some high-level ideas. The new buffer data structure looks very specialized to the problem at hand; the underlying algorithm would be clearer if we could compose more generic data structures while still maintaining performance.

For example, recomposition would be covered entirely by something like a SmallVecDeque. Decomposition's core idea is that it has a FIFO queue of "ready" characters and a buffer of "pending" characters that need more input before they can be sorted. Could we get away with a SmallVecDeque for the ready characters and a SmallVec for the pending ones (rough sketch below)? I think it'd be reasonable to accept a small performance penalty here if the code is more readable.
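
A rough sketch of that split. SmallVecDeque doesn't exist, so a plain VecDeque stands in for it; the struct, field, and method names, the inline capacity, and the flush logic are illustrative assumptions rather than code from the crate:

```rust
// Sketch of the proposed split: a FIFO of "ready" characters plus a buffer of
// "pending" characters still waiting to be canonically ordered.
use smallvec::SmallVec;
use std::collections::VecDeque;

struct DecompositionBuffers {
    // Fully ordered characters, emitted in FIFO order. A SmallVecDeque would
    // replace this to avoid heap allocation for short runs.
    ready: VecDeque<char>,
    // (canonical combining class, character) pairs still being sorted.
    pending: SmallVec<[(u8, char); 4]>,
}

impl DecompositionBuffers {
    // When a starter (combining class 0) arrives, the pending characters can
    // be sorted by combining class and moved over to the ready queue.
    fn flush_pending(&mut self) {
        self.pending.sort_by_key(|&(class, _)| class);
        for &(_, c) in self.pending.iter() {
            self.ready.push_back(c);
        }
        self.pending.clear();
    }
}
```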

I don't have strong opinions on adding SmallVec as a dependency but @Manishearth should make the call here :)

@Manishearth (Member)

Approach looks good, though I guess it would be nice to have a SmallVecDeque abstraction somewhere (maybe in the smallvec crate itself?).

I'm totally okay with having a dependency on smallvec.

r? @SimonSapin

@djudd (Author) commented Jan 10, 2019

I'll give the "SmallVec for pending, SmallVecDeque for ready" idea a shot this weekend, where "SmallVecDeque" is probably a simplified version of Buffer rather than something actually as generic as SmallVec or VecDeque.

Note, though, that the performance gains so far are pretty fragile to added abstraction: for example, I tried a few variants without a mutate_and_push_back (just separate methods) and lost most of the gains each time.
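
The real mutate_and_push_back isn't shown in this thread, so the signature and body below are guesses, sketched only to illustrate the kind of fused operation being described: modify the most recent element and append a new one in a single call, instead of having the caller re-enter the buffer twice.

```rust
// Hypothetical shape of a fused mutate-and-push operation on a pending buffer
// of (combining class, character) pairs; names and types are illustrative.
use smallvec::SmallVec;

struct Pending {
    chars: SmallVec<[(u8, char); 4]>,
}

impl Pending {
    // Separate calls: the caller touches the buffer twice per step.
    fn mutate_last(&mut self, f: impl FnOnce(&mut (u8, char))) {
        if let Some(last) = self.chars.last_mut() {
            f(last);
        }
    }

    fn push_back(&mut self, item: (u8, char)) {
        self.chars.push(item);
    }

    // Fused call: one entry point covering both steps.
    fn mutate_and_push_back(
        &mut self,
        f: impl FnOnce(&mut (u8, char)),
        item: (u8, char),
    ) {
        if let Some(last) = self.chars.last_mut() {
            f(last);
        }
        self.chars.push(item);
    }
}
```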

@djudd (Author) commented Jan 12, 2019

@djudd closed this Jan 12, 2019