Implement a novel consistent hashing algorithm with replication#80
Conversation
There was a problem hiding this comment.
Pull Request Overview
This PR implements a novel consistent hashing algorithm with support for replication that achieves constant time complexity independent of the number of nodes. The implementation provides consistent hash iterators and a "choose-k" variant for selecting multiple distinct nodes.
- Introduces core consistent hashing iterators (
ConsistentHashIterator,ConsistentHashRevIterator) with bucket-based enumeration - Implements a consistent choose-k algorithm for selecting multiple replicas with guaranteed uniformity
- Adds comprehensive test coverage for uniformity and consistency properties
Reviewed Changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.
| File | Description |
|---|---|
crates/consistent-hashing/src/lib.rs |
Core implementation of the consistent hashing algorithm with iterators and choose-k functionality |
crates/consistent-hashing/README.md |
Documentation explaining the algorithm, complexity analysis, and replication concepts |
crates/consistent-hashing/Cargo.toml |
Package configuration for the new consistent hashing crate |
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
… into aneubeck/sampling
Co-authored-by: Luke Francl <look@github.com>
Co-authored-by: Luke Francl <look@github.com>
Co-authored-by: Luke Francl <look@github.com>
Co-authored-by: Luke Francl <look@github.com>
itsibitzi
left a comment
There was a problem hiding this comment.
An initial batch of comments.
jorendorff
left a comment
There was a problem hiding this comment.
OK, if I was going to make one change to this code, I'd add one comment at the top of consistent_hash or on ConsistentHashRevIterator::next saying "this is Algorithm 5 from [JumpBackHash]". Would have saved me some time!
But no point causing conflicts, if we're just going to delete it.
This algorithm is special, since it's complexity is not dependent on the number of nodes!
The simple implementation is quadratic in the number of desired replication nodes. An r * log(r) complexity could be achieved by utilizing a heap or sorted set data structure. Since r is typically small, this overhead is not worth it though.
rendered