Use Smallvec in decomposition & recomposition #35
Conversation
This is fantastic @djudd! I left some comments inline but overall idea looks great.
```rust
let (_, ch) = self.buffer[self.ready.start];
```
There's a precondition for this block that `self.ready.start < self.ready.end`, yeah? Is it possible to structure this code (or leave a comment) to make this more clear?

From the loop above, we know that `self.ready.end > 0`, but it's not immediately clear to me that we don't have `self.ready.start == self.ready.end`.

One idea would be to change `increment_next_ready` to something like `fn pop_front(&mut self) -> Option<char>` and then structure the loop as...

```rust
loop {
    if let Some(ch) = self.pop_front() {
        return Some(ch);
    }
    ...
}
```
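For concreteness, here is a minimal, runnable sketch of what the suggested `pop_front` could look like. The `State` struct is a hypothetical stand-in; only the `buffer` and `ready` field names are taken from the diff, and a buffer of (combining class, char) pairs is assumed:

```rust
use std::ops::Range;

// Hypothetical stand-in for the iterator's internal state; `buffer` and
// `ready` follow the names in the PR's diff.
struct State {
    buffer: Vec<(u8, char)>,
    ready: Range<usize>,
}

impl State {
    // The suggested replacement for `increment_next_ready`: the emptiness
    // check and the read happen in one place, so the precondition
    // `ready.start < ready.end` is enforced rather than assumed.
    fn pop_front(&mut self) -> Option<char> {
        if self.ready.start < self.ready.end {
            let (_, ch) = self.buffer[self.ready.start];
            self.ready.start += 1;
            Some(ch)
        } else {
            None
        }
    }
}

fn main() {
    let mut st = State {
        buffer: vec![(0, 'a'), (230, '\u{301}')],
        ready: 0..2,
    };
    assert_eq!(st.pop_front(), Some('a'));
    assert_eq!(st.pop_front(), Some('\u{301}'));
    assert_eq!(st.pop_front(), None);
}
```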
How do you feel about solving this with a comment instead of a restructuring?
- The `pop_front` idea is elegant but benchmarks worse. I believe this is because we add an extra branch to the beginning of every `next` call in the common case where we are buffering only one character at a time, so that we need to advance the underlying iterator each time.
- Even changing `self.ready.end == 0` to `self.ready.end == self.ready.start` benchmarks slightly worse on my machine; I'm not sure why and am not much good at reading assembly, but perhaps there's an extra load, or the extra efficiency of a comparison to zero is meaningful.
Yeah, a comment sounds good! In addition to a correctness argument, can you include a note explaining that it's written this way for performance? That way, subsequent editors know to run benchmarks to check to see if there's a regression.
Thanks for giving it a shot @djudd :)
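One possible shape for such a comment, in a small runnable harness; the `Buf` type is invented for illustration, with field names taken from the diff, and the invariant wording is a sketch rather than the PR's actual comment:

```rust
use std::ops::Range;

// Hypothetical minimal harness showing where the requested comment would
// live; `buffer`/`ready` follow the PR's field names.
struct Buf {
    buffer: Vec<(u8, char)>,
    ready: Range<usize>,
}

impl Buf {
    fn front(&self) -> char {
        // Correctness: the caller's loop only exits once `ready.end > 0`,
        // and `ready.start` is reset to 0 whenever `ready.end` is reset,
        // so `ready.start < ready.end` holds here.
        // Performance: indexing directly instead of a checked `pop_front`
        // is deliberate; the extra branch benchmarked worse. Re-run the
        // benchmarks before restructuring this.
        debug_assert!(self.ready.start < self.ready.end);
        let (_, ch) = self.buffer[self.ready.start];
        ch
    }
}

fn main() {
    let b = Buf { buffer: vec![(0, 'q')], ready: 0..1 };
    assert_eq!(b.front(), 'q');
}
```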
```diff
@@ -93,21 +103,28 @@ impl<I: Iterator<Item=char>> Iterator for Decompositions<I> {
     #[inline]
     fn next(&mut self) -> Option<char> {
-        while self.ready == 0 && !self.done {
+        while self.ready.end == 0 {
             match (self.iter.next(), &self.kind) {
```
Are we potentially calling `.next` after the iterator is already exhausted and returned `None`? For example, say we hit L118 below, where we have multiple pending characters left, sort them, and leave them in the buffer. Then, we'll pop off the first sorted character at L125, but the next call to `Self::next` will call `self.iter.next()` again at L107.

It's not immediately clear from the docs whether calling `.next` on an exhausted iterator is okay or not, but it's at best iterator-dependent.
Ah, that's what that `done` field was doing! Let me see if I can restructure in a way that addresses both this and your other comment on the function.
Actually, while it doesn't look like adding back `done` adds noticeable cost, I'm not 100% sure it's correct.

The docs here say:

> Returns `None` when iteration is finished. Individual iterator implementations may choose to resume iteration, and so calling `next()` again may or may not eventually start returning `Some(Item)` again at some point.

That makes it sound like calling `next()` on an exhausted iterator is supposed to be supported (and might even be a good idea). I guess we don't want to support resumption out of the worry that an (in my reading, misbehaving) iterator might panic or violate memory safety?
Yeah, I'm not sure exactly what resumption means in this case. Does it, for example, restart the iterator at its beginning? Or is it for streaming iterators that want to return `None` and then make subsequent values available?

In either case, our loop here would get really confused by a `Some` following a `None`: we'd 1) sort the characters we had left on receiving `None`, 2) potentially emit some of them, and then 3) push some more decomposed characters on from the subsequent `Some`. We'd then potentially emit non-starters out of combining class order due to the premature sort at step 1.

FWIW, the existence of the `Fuse` combinator signals to me that it's not really expected to call `next` on an exhausted iterator. This combinator takes an arbitrary iterator and, at some runtime cost, ensures it always returns `None` after returning `None` the first time.

The state of the docs here is a little disappointing. The `Future` ones do a better job: https://doc.rust-lang.org/std/future/trait.Future.html#panics.
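The `Fuse` guarantee can be checked directly; a self-contained sketch with an invented resuming iterator (not the PR's code):

```rust
// An iterator that would resume after None, wrapped in `fuse()`, which
// guarantees None forever after the first None. Invented type, for
// illustration only.
struct Resumes(u8);

impl Iterator for Resumes {
    type Item = u32;
    fn next(&mut self) -> Option<u32> {
        self.0 += 1;
        match self.0 {
            1 => Some(10),
            2 => None,
            3 => Some(20), // never observed through Fuse
            _ => None,
        }
    }
}

fn main() {
    let mut fused = Resumes(0).fuse();
    assert_eq!(fused.next(), Some(10));
    assert_eq!(fused.next(), None);
    // Fuse pins the iterator to None after the first None:
    assert_eq!(fused.next(), None);
}
```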
Ah, cool, `fuse` actually seems like exactly what we want here, so that there's at least an opportunity for specialization to eliminate the cost of the `done` check.
Sorry for the delay @djudd, I was out on vacation. Fantastic work!

Thanks!
Rewrite of #32 to use SmallVec instead of a custom data structure.
Avoids the need for a "SmallVecDeque" or equivalent by taking advantage of the fact that in both decomposition and recomposition, we effectively have a state machine: we are either appending to the buffer or dequeueing, and when we dequeue we normally empty the buffer. So we can keep a pointer to the front and reset it on state transition, instead of mutating the buffer itself on `pop_front`. (It's possible that this usage pattern was why the not-actually-ring-buffer in #32 worked pretty well.)

New timing comparison:
Pretty similar to last time, probably within the margin of error. Simpler code, though.
Most of the improvement (even in the recomposition case) comes from 2af0c65; the benefit of 09091a3 is quite marginal, and while I'm pretty sure it helps in the simple ASCII case, it's possible it is actually a slight regression in the long-string case. Probably worth somebody else at least running the benchmarks themselves, to check that my particular OS/hardware combination isn't a best-case or something.
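The append-then-drain pattern described above can be sketched as follows; `Vec` stands in for `SmallVec` (the push/index/clear calls used here are common to both), and the `FrontQueue` name and `fill` helper are invented for the example:

```rust
use std::ops::Range;

// Sketch of the "pointer to the front" queue: append during the fill
// phase, bump `ready.start` while draining, and reset both buffer and
// range (instead of shifting elements) once the buffer empties.
struct FrontQueue {
    buffer: Vec<char>,
    ready: Range<usize>,
}

impl FrontQueue {
    fn new() -> Self {
        FrontQueue { buffer: Vec::new(), ready: 0..0 }
    }

    // Fill phase: append, then mark everything buffered as ready.
    fn fill<I: IntoIterator<Item = char>>(&mut self, chars: I) {
        self.buffer.extend(chars);
        self.ready = 0..self.buffer.len();
    }

    // Drain phase: no element shifting, just advance the front pointer;
    // the reset on emptying is the state transition described above.
    fn pop_front(&mut self) -> Option<char> {
        if self.ready.start < self.ready.end {
            let ch = self.buffer[self.ready.start];
            self.ready.start += 1;
            if self.ready.start == self.ready.end {
                self.buffer.clear();
                self.ready = 0..0;
            }
            Some(ch)
        } else {
            None
        }
    }
}

fn main() {
    let mut q = FrontQueue::new();
    q.fill("ab".chars());
    assert_eq!(q.pop_front(), Some('a'));
    assert_eq!(q.pop_front(), Some('b'));
    assert_eq!(q.pop_front(), None);
    // The buffer was reset, not shifted, so reuse starts from index 0.
    q.fill("c".chars());
    assert_eq!(q.pop_front(), Some('c'));
}
```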