Use Smallvec in decomposition & recomposition #35

Merged: 9 commits into unicode-rs:master on Jan 21, 2019

Conversation

@djudd commented Jan 12, 2019

Rewrite of #32 to use SmallVec instead of a custom data structure.

Avoids the need for a "SmallVecDeque" or equivalent by taking advantage of the fact that in both decomposition and recomposition we effectively have a state machine: we are either appending to the buffer or dequeueing from it, and when we dequeue we normally empty the buffer. That means we can keep a pointer to the front and reset it on the state transition, instead of mutating the buffer itself on every pop_front. (It's possible this usage pattern is why the not-actually-a-ring-buffer in #32 worked pretty well.)
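To illustrate the idea (a minimal sketch of the technique, not the exact code in this PR; Buffered and pop_ready are hypothetical names, while the buffer/ready fields mirror the diff discussed below):

use std::ops::Range;

use smallvec::SmallVec;

struct Buffered {
    // (canonical combining class, char) pairs awaiting emission.
    buffer: SmallVec<[(u8, char); 4]>,
    // The prefix of `buffer` that is sorted and safe to emit.
    ready: Range<usize>,
}

impl Buffered {
    // "Dequeue" by advancing a cursor instead of shifting elements.
    fn pop_ready(&mut self) -> Option<char> {
        if self.ready.start < self.ready.end {
            let (_, ch) = self.buffer[self.ready.start];
            self.ready.start += 1;
            Some(ch)
        } else {
            None
        }
    }

    // On the dequeue -> append state transition the emitted prefix has
    // normally consumed the whole buffer, so clearing it and resetting
    // the cursor suffices; no real ring buffer is needed.
    fn reset(&mut self) {
        self.buffer.clear();
        self.ready = 0..0;
    }
}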

New timing comparison:

test bench_nfc_ascii     1,247 ns/iter ->     994 ns/iter  (-21%)
test bench_nfc_long    365,786 ns/iter -> 365,714 ns/iter    (0%)
test bench_nfd_ascii       630 ns/iter ->     441 ns/iter  (-30%)
test bench_nfd_long    206,544 ns/iter -> 170,967 ns/iter  (-17%)
test bench_nfkc_ascii    1,294 ns/iter ->     927 ns/iter  (-28%)
test bench_nfkc_long   378,280 ns/iter -> 360,764 ns/iter   (-5%)
test bench_nfkd_ascii      586 ns/iter ->     393 ns/iter  (-33%)
test bench_nfkd_long   220,095 ns/iter -> 195,658 ns/iter  (-11%)

Pretty similar to last time, probably within the margin of error. Simpler code, though.

Most of the improvement (even in the recomposition case) comes from 2af0c65; the benefit of 09091a3 is quite marginal. While I'm pretty sure it helps in the simple ASCII case, it may actually be a slight regression in the long-string case. It's probably worth somebody else at least running the benchmarks themselves to check that my particular OS/hardware combination isn't a best case or something.

@sujayakar (Contributor) left a comment

This is fantastic @djudd! I left some comments inline but the overall idea looks great.

let (_, ch) = self.buffer[self.ready.start];
@sujayakar (Contributor):

There's a precondition for this block that self.ready.start < self.ready.end, yeah? Is it possible to structure this code (or leave a comment) to make this more clear?

From the loop above, we know that self.ready.end > 0, but it's not immediately clear to me that we don't have self.ready.start == self.ready.end.

One idea would be to change increment_next_ready to something like fn pop_front(&mut self) -> Option<char> and then structure the loop as...

loop {
    if let Some(ch) = self.pop_front() {
        return Some(ch);
    }
    ... 
}

@djudd (Author):

How do you feel about solving this with a comment instead of a restructuring?

  • The pop_front idea is elegant but benchmarks worse. I believe this is because it adds an extra branch at the beginning of every next call in the common case where we buffer only one character at a time and therefore need to advance the underlying iterator on each call.

  • Even changing self.ready.end == 0 to self.ready.end == self.ready.start benchmarks slightly worse on my machine. I'm not sure why, and I'm not much good at reading assembly, but perhaps there's an extra load, or the extra efficiency of a comparison against zero is meaningful (see the sketch below).
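Concretely, the kind of comment I have in mind (a sketch with hypothetical wording, not the final text):

// Invariant: `ready.start` and `ready.end` are reset to zero together
// as soon as the ready range is exhausted, so `ready.end > 0` (which
// the loop above guarantees) implies `ready.start < ready.end` here.
//
// Perf note: the loop condition compares `ready.end` against the
// constant 0 rather than against `ready.start`; the clearer form
// benchmarked slightly slower, so re-run the benches before changing it.
debug_assert!(self.ready.start < self.ready.end);
let (_, ch) = self.buffer[self.ready.start];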

@sujayakar (Contributor):

Yeah, a comment sounds good! In addition to a correctness argument, can you include a note explaining that it's written this way for performance? That way, subsequent editors know to run the benchmarks to check for a regression.

Thanks for giving it a shot @djudd :)

@@ -93,21 +103,28 @@ impl<I: Iterator<Item=char>> Iterator for Decompositions<I> {
 
     #[inline]
     fn next(&mut self) -> Option<char> {
-        while self.ready == 0 && !self.done {
+        while self.ready.end == 0 {
             match (self.iter.next(), &self.kind) {
@sujayakar (Contributor):

Are we potentially calling .next after the iterator is already exhausted and has returned None? For example, say we hit L118 below, where we have multiple pending characters left, sort them, and leave them in the buffer. Then we'll pop off the first sorted character at L125, but the next call to Self::next will call self.iter.next() again at L107.

It's not immediately clear from the docs whether calling .next on an exhausted iterator is okay or not, but it's at best iterator-dependent.

@djudd (Author):

Ah, that's what that done field was doing! Let me see if I can restructure in a way that addresses both this and your other comment on the function.

@djudd (Author):

Actually, while it doesn't look like adding back done adds noticeable cost, I'm not 100% sure it's correct.

The docs here say:

Returns None when iteration is finished. Individual iterator implementations may choose to resume iteration, and so calling next() again may or may not eventually start returning Some(Item) again at some point.

That makes it sound like calling next() on an exhausted iterator is supposed to be supported (and might even be a good idea). I guess we don't want to support resumption out of the worry that an (in my reading, misbehaving) iterator might panic or violate memory safety?

@sujayakar (Contributor):

Yeah, I'm not sure exactly what resumption means in this case. Does it, for example, restart the iterator at its beginning? Or is it for streaming iterators that want to return None and then make subsequent values available?

In either case, our loop here would get really confused by a Some following a None: we'd 1) sort the characters we had left on receiving None, 2) potentially emit some of them, and then 3) push some more decomposed characters on from the subsequent Some. We'd then potentially emit non-starters out of combining class order due to the premature sort at step 1.

FWIW, the existence of the Fuse combinator signals to me that it's not really expected to call next on an exhausted iterator. This combinator takes an arbitrary iterator and, at some runtime cost, ensures it always returns None after returning None the first time.

The state of the docs here is a little disappointing. The Futures ones do a better job: https://doc.rust-lang.org/std/future/trait.Future.html#panics.
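A self-contained demo of the hazard and the fix (Blinky is a contrived iterator made up for illustration):

struct Blinky { n: u32 }

impl Iterator for Blinky {
    type Item = u32;
    // Legal under Iterator's contract, but "resumes" after None.
    fn next(&mut self) -> Option<u32> {
        self.n += 1;
        if self.n % 2 == 0 { None } else { Some(self.n) }
    }
}

fn main() {
    let mut raw = Blinky { n: 0 };
    assert_eq!(raw.next(), Some(1));
    assert_eq!(raw.next(), None);
    assert_eq!(raw.next(), Some(3)); // resumed after None

    // Fuse pins the iterator: once it has returned None, it always will.
    let mut fused = Blinky { n: 0 }.fuse();
    assert_eq!(fused.next(), Some(1));
    assert_eq!(fused.next(), None);
    assert_eq!(fused.next(), None);
}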

@djudd (Author):

Ah, cool, fuse actually seems like exactly what we want here so that there's at least an opportunity for specialization to eliminate the cost of the done check.
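Something like this (a sketch with the buffer fields elided, not necessarily the final code):

use std::iter::Fuse;

struct Decompositions<I: Iterator<Item = char>> {
    iter: Fuse<I>,
    // ... buffer and ready fields as elsewhere in this PR ...
}

impl<I: Iterator<Item = char>> Decompositions<I> {
    fn new(iter: I) -> Self {
        // Fusing once at construction makes every later self.iter.next()
        // safe after exhaustion, and std can specialize Fuse to skip the
        // done check for iterators that are already well-behaved.
        Decompositions { iter: iter.fuse() }
    }
}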

@sujayakar merged commit 6eb631f into unicode-rs:master on Jan 21, 2019
@sujayakar (Contributor):
sorry for the delay @djudd, I was out on vacation. fantastic work!

@sujayakar (Contributor):
@djudd, I've bumped the version to 0.1.8 in f24cb8a and published it to crates.io!

@djudd-stripe:
Thanks!
