Skip to content

Implement a novel consistent hashing algorithm with replication #80

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 21 commits into
base: main
Choose a base branch
from

Conversation

aneubeck
Copy link
Collaborator

@aneubeck aneubeck commented Aug 11, 2025

This algorithm is special, since it's complexity is not dependent on the number of nodes!
The simple implementation is quadratic in the number of desired replication nodes. An r * log(r) complexity could be achieved by utilizing a heap or sorted set data structure. Since r is typically small, this overhead is not worth it though.

rendered

@aneubeck aneubeck requested a review from a team as a code owner August 11, 2025 12:54
@Copilot Copilot AI review requested due to automatic review settings August 11, 2025 12:54
Copy link
Contributor

@Copilot Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull Request Overview

This PR implements a novel consistent hashing algorithm with support for replication that achieves constant time complexity independent of the number of nodes. The implementation provides consistent hash iterators and a "choose-k" variant for selecting multiple distinct nodes.

  • Introduces core consistent hashing iterators (ConsistentHashIterator, ConsistentHashRevIterator) with bucket-based enumeration
  • Implements a consistent choose-k algorithm for selecting multiple replicas with guaranteed uniformity
  • Adds comprehensive test coverage for uniformity and consistency properties

Reviewed Changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
crates/consistent-hashing/src/lib.rs Core implementation of the consistent hashing algorithm with iterators and choose-k functionality
crates/consistent-hashing/README.md Documentation explaining the algorithm, complexity analysis, and replication concepts
crates/consistent-hashing/Cargo.toml Package configuration for the new consistent hashing crate

aneubeck and others added 4 commits August 15, 2025 10:13
Co-authored-by: Luke Francl <look@github.com>
Co-authored-by: Luke Francl <look@github.com>
Co-authored-by: Luke Francl <look@github.com>
Co-authored-by: Luke Francl <look@github.com>
Copy link
Contributor

@itsibitzi itsibitzi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An initial batch of comments.

Comment on lines 236 to 238
let mut samples = Vec::with_capacity(self.k);
let mut samplers: Vec<_> = (0..self.k)
.map(|i| ConsistentHashRevIterator::new(n - i, self.builder.seq_builder(i)).peekable())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mentioned in Slack but I'll rewrite here.

It might be nice for performance to have k be a const generic to allow these allocations to be on the stack. There might be some other ways to avoid the allocation of course.

I did some quick benchmarking and removing the vec allocations makes the runtime ~35% lower with k=3 and ~25% for k=5.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added another function which takes a vector as input.
Changed the implementation to get rid of the second vector.
Downside is that we have to do more work now in order to fetch the next consistent hash for a specific sampler 🤷

Generally, performance is actually not so bad...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants