CityHash - Fast Hash Functions For Strings
CityHash - Fast Hash Functions For Strings
Geoff Pike
(joint work with Jyrki Alakuijala)
http://code.google.com/p/cityhash/
Introduction
I Who?
I What?
I When?
I Where?
I Why?
Outline
Introduction
Interlude: Testing
CityHash
Conclusion
Recent activity
I My notation
I Discussion of cyclic redundancy checks
I What is a CRC?
I What does the crc32q instruction do?
Traditional String Hashing
a = a - b; a = a - c; a = a ⊕ σ13 (c)
b = b - c; b = b - a; b = b ⊕ σ-8 (a)
c = c - a; c = c - b; c = c ⊕ σ13 (b)
a = a - b; a = a - c; a = a ⊕ σ12 (c)
b = b - c; b = b - a; b = b ⊕ σ-16 (a)
c = c - a; c = c - b; c = c ⊕ σ5 (b)
a = a - b; a = a - c; a = a ⊕ σ3 (c)
b = b - c; b = b - a; b = b ⊕ σ-10 (a)
c = c - a; c = c - b; c = c ⊕ σ15 (b)
Jenkins’ mix
Also around 1996, Bob Jenkins published a hash function with
a 96-bit input and a 96-bit output. Pseudocode with 32-bit
registers:
a = a - b; a = a - c; a = a ⊕ σ13 (c)
b = b - c; b = b - a; b = b ⊕ σ-8 (a)
c = c - a; c = c - b; c = c ⊕ σ13 (b)
a = a - b; a = a - c; a = a ⊕ σ12 (c)
b = b - c; b = b - a; b = b ⊕ σ-16 (a)
c = c - a; c = c - b; c = c ⊕ σ5 (b)
a = a - b; a = a - c; a = a ⊕ σ3 (c)
b = b - c; b = b - a; b = b ⊕ σ-10 (a)
c = c - a; c = c - b; c = c ⊕ σ15 (b)
and
NX
mod 8
mod 8)−i
TailBytes(N) = 256(N · BN − i
i=1
Murmur2
uint64 k = 14313749767032793493
int iters = bN /8c
uint64 hash = seed ⊕ Nk
for (int i = 0; i < iters; i++)
hash = (hash ⊕ (ShiftMix(Wi ·k)·k))·k
if (N mod 8 > 0)
hash = (hash ⊕ TailBytes(N))·k
return ShiftMix(ShiftMix(hash) · k)
Murmur2 Strong Points
I Simple
I Fast (assuming multiplication is fairly cheap)
I Quality is quite good
Questions about Murmur2 (or any other choice)
where f is “Mul-ShiftMix-Mul”
Murmur2 Speed
I f is invertible
I During the loop, diffusion isn’t perfect
Testing
Goals:
I Speed (on Google datacenter hardware or similar)
I Quality
I Excellent diffusion
I Excellent behavior on all contributed test data
I Excellent behavior on basic synthetic test data
I Good internal state diffusion—but not too good,
cf. Rogaway’s Bucket Hashing
Portability
if (N <= 32)
if (N <= 16)
if (N <= 8)
...
else
...
else
...
else if (N <= 64) {
// Handle 33 <= N <= 64
...
} else {
// Handle N > 64
int iters = bN /64c
...
}
The CityHash64 function (2012): preliminaries
let a = u⊕v
a’ = ShiftMix(a · m)
a” = a’ ⊕ v
a”’ = ShiftMix(a00 · m)
in
a”’ · m
The CityHash64 function (2012): preliminaries
let a = u⊕v
a’ = ShiftMix(a · m)
a” = a’ ⊕ v
a”’ = ShiftMix(a00 · m)
in
a”’ · m
Also, k0 , k1 , and k2 are primes near 264 , and K is k2 + 2N.
CityHash64: 1 <= N <= 3
let a = B0
b = BbN /2c
c = BN−1
y = a + 256b
z = N + 4c
in
ShiftMix((y · k2 ) ⊕ (z · k0 ))
CityHash64: 4 <= N <= 8
let a = W0 + k2
b = W−1
c = ρ37 (b) · K + a
d = (ρ25 (a) + b) · K
in
α(c, d, K)
CityHash64: 17 <= N <= 32
let a = W0 · k1
b = W1
c = W−1 · K
d = W−2 · k2
in
α(ρ43 (a + b) + ρ30 (c) + d, a + ρ18 (b + k2 ) + c, K)
CityHash64: 33 <= N <= 64
let a = W0 · k2
e = W2 · k2
f = W3 · 9
h = W−2 · K
u = ρ43 (a + W−1 ) + 9 (ρ30 (W1 ) + c)
v = a + W−1 + f + 1
w = h + β((u + v) · K)
x = ρ42 (e + f ) + W−3 + β(W−4 )
y = (β((v + w) · K) + W−1 ) · K
z = e + f + W−3
r = β((x + z) · K + y ) + W1
t = ShiftMix((r + z) · K + W−4 + h)
in
tK + x
Evaluation for N <= 64
Evaluation for N <= 64
I CityHash128
I same loop body, manually unrolled
I slightly faster for large N
I CityHashCrc128
I totally different function
I uses CRC instruction, but isn’t a CRC
I faster still for large N
Evaluation for N > 64
Evaluation for N > 64
I CityHash32
I Big Endian
I SIMD
The End
The End
Backup Slides
Notation
Sample polynomial:
p = x 32 + x 27 + 1
CRC, part 3
x 50 − x 18 · p = x 50 − x 18 · (x 32 + x 27 + 1)
= x 50 − (x 50 + x 45 + x 18 )
= x 45 + x 18
CRC, part 6
The result:
x 50 ≡ x 30 + x 18 + x 13 + x 8 + x 3
CRC, part 7
More samples:
x 50 ≡ x 30 + x 18 + x 13 + x 8 + x 3
x 50 + 1 ≡ x 30 + x 18 + x 13 + x 8 + x 3 + 1
x 51 ≡ x 31 + x 19 + x 14 + x 9 + x 4
x 51 + x 50 ≡ x 31 + x 30 + x 19 + x 18 + x 14 + x 13 + x 9 + x 8 + x 4 + x 3
x 51 + x 31 ≡ x 19 + x 14 + x 9 + x 4
CRC in Practice
F (0) = 0, F (x 31 ) = 1, F (x 30 ) = 2, F (x 31 + x 30 ) = 3, F (x 29 ) =
4, . . .
How is crc32q used?