-
Notifications
You must be signed in to change notification settings - Fork 9
Implement a novel consistent hashing algorithm with replication #80
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
This PR implements a novel consistent hashing algorithm with support for replication that achieves constant time complexity independent of the number of nodes. The implementation provides consistent hash iterators and a "choose-k" variant for selecting multiple distinct nodes.
- Introduces core consistent hashing iterators (
ConsistentHashIterator
,ConsistentHashRevIterator
) with bucket-based enumeration - Implements a consistent choose-k algorithm for selecting multiple replicas with guaranteed uniformity
- Adds comprehensive test coverage for uniformity and consistency properties
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
File | Description |
---|---|
crates/consistent-hashing/src/lib.rs |
Core implementation of the consistent hashing algorithm with iterators and choose-k functionality |
crates/consistent-hashing/README.md |
Documentation explaining the algorithm, complexity analysis, and replication concepts |
crates/consistent-hashing/Cargo.toml |
Package configuration for the new consistent hashing crate |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… into aneubeck/sampling
Co-authored-by: Luke Francl <look@github.com>
Co-authored-by: Luke Francl <look@github.com>
Co-authored-by: Luke Francl <look@github.com>
Co-authored-by: Luke Francl <look@github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An initial batch of comments.
crates/consistent-hashing/src/lib.rs
Outdated
let mut samples = Vec::with_capacity(self.k); | ||
let mut samplers: Vec<_> = (0..self.k) | ||
.map(|i| ConsistentHashRevIterator::new(n - i, self.builder.seq_builder(i)).peekable()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mentioned in Slack but I'll rewrite here.
It might be nice for performance to have k
be a const generic to allow these allocations to be on the stack. There might be some other ways to avoid the allocation of course.
I did some quick benchmarking and removing the vec allocations makes the runtime ~35% lower with k=3
and ~25% for k=5
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added another function which takes a vector as input.
Changed the implementation to get rid of the second vector.
Downside is that we have to do more work now in order to fetch the next consistent hash for a specific sampler 🤷
Generally, performance is actually not so bad...
This algorithm is special, since it's complexity is not dependent on the number of nodes!
The simple implementation is quadratic in the number of desired replication nodes. An r * log(r) complexity could be achieved by utilizing a heap or sorted set data structure. Since r is typically small, this overhead is not worth it though.
rendered