6 Hashfunctions

CSE 207B:
Applied Cryptography
Nadia Heninger
UCSD
Fall 2023 Lecture 6

Announcements
1. HW 2 is due today!
2. HW 3 is out, due before class in 1 week

Last time: Message integrity and MACs
This time: Hash functions

Cryptographic hash functions
Definition (Practical definition)

A hash function is an efficiently computable function that maps
arbitrary-length inputs to fixed-length outputs
H(x) : {0, 1}∗ → {0, 1}n
We would like to define H as secure if it is collision resistant.
Collision resistant: No efficient adversary can find messages m0 ,

m1 such that
H(m0 ) = H(m1 )
Cryptographic hash functions
Definition (Practical definition)

A hash function is an efficiently computable function that maps
arbitrary-length inputs to fixed-length outputs
H(x) : {0, 1}∗ → {0, 1}n
We would like to define H as secure if it is collision resistant.
Collision resistant: No efficient adversary can find messages m0 ,

m1 such that
H(m0 ) = H(m1 )
This definition doesn’t work in a theoretical sense: By the
pigeonhole principle there exist such collisions, so there exists an
adversary A that just prints them out.
Security game for a collision-resistant hash function
Fix for theoretical proofs: Define a family of hash functions that
depends on a parameter that the adversary doesn’t know in
advance, and define runtime in terms of a security parameter.
Definition
H is collision-resistant if Pr[A succeeds] ≤ negligible.
Practical hash functions
In actual practice, hash functions are fixed functions where all

parameters were generated when the algorithms were
designed/standardized. (Think MD5, SHA1, SHA256, SHA3.)
So “collision-resistance” means “the best cryptanalytic attacks to

find a collision for this fixed function are expected to have infeasible
runtimes”.
Security notions for hash functions
1. Collision resistance:
Find m0 , m1 such that H(m0 ) = H(m1 ).
2. 2nd preimage resistance:

Input m0 , find m1 s.t. H(m0 ) = H(m1 ).
3. Preimage resistance:
Input t = H(m0 ), find m1 s.t. H(m1 ) = t.
Generic collision-finding attacks on hash functions
Inefficient pigeonhole algorithm

If H has ℓ-bit outputs, then after hashing 2ℓ + 1 distinct inputs
there must be a collision.
We can do much better than this with the birthday “paradox”.

Birthday attacks
Let H be a hash function with N = 2n possible outputs.
Birthday Attack Algorithm

1. Choose k uniformly random messages m1 , . . . , mk .
2. Compute ti = H(mi ) for every i.
3. Search for i, j s.t. mi ̸= mj and H(mi ) = H(mj ).
Theorem
√
Pr[collision after Θ( N) inputs] ≥ 1/2
Birthday collision theorem
Theorem
√
Pr[collision after Θ( N) inputs] ≥ 1/2
Proof.
We throw k balls uniformly at random into N distinct bins.
1 2 k −1
Pr[k balls into N distinct bins] = 1·(1− )·(1− ) . . . (1− )
N N N
1
Use 1 − N ⪅ e −1/N .
k−1
Y Pk−1
Pr[distinct] ≤ e −i/N = e −1/N i=1 i
= e −k(k−1)/2N
i=1
√
For k = N:
2 /2N
Pr[collision] = 1 − Pr[distinct] ≥ 1 − e k ≥ 1 − e −2 ≥ 1/2
Birthday Attack Algorithm Complexity
The previous algorithm requires:

√
• N = 2n/2 hash evaluations
√
• n N = n2n/2 memory
It is possible to find collisions with less memory using the Floyd

cycle-finding algorithm.
Floyd cycle-finding algorithm
Floyd cycle-finding algorithm
1. Start at an arbitrary x0 = y0 .
2. Compute xi = H(xi−1 ), yi = H(H(yi−1 )).
3. If xi = yi , then m0 = xi−1 and m1 = H(yi−1 ) collide.
4. To find collision at beginning of cycle, restart x pointer from
start and single-step both x and y .
√
By previous analysis, there is a cycle of length N assuming H
behaves like a random walk.
√
Algorithm time: O( N)
Memory: O(1)
Side note: Can get parallelization and time/memory tradeoff using

method of distinguished points.
Any collision-resistant hash function needs output at least 256 bits
in practice.
How do you construct a hash function that takes
arbitrary-length inputs?
Assume we have a fixed-length hash function h
Does CBC-hash work?
Strawman CBC-hash construction

1. Set h(m) = AES0 (m).
2. Set H0 = 0000 . . . 0.
3. Let Hi = AES0 (Hi−1 ⊕ mi )
4. Output Hk
How do you construct a hash function that takes
arbitrary-length inputs?
Assume we have a fixed-length hash function h
Does CBC-hash work?
Strawman CBC-hash construction

1. Set h(m) = AES0 (m).
2. Set H0 = 0000 . . . 0.
3. Let Hi = AES0 (Hi−1 ⊕ mi )
4. Output Hk
No: Let m = m1 ||m2 and m′ = m1′ ||m2′ with

m1′ = m2 ⊕ H1 and m2′ = H2 ⊕ m2 ⊕ H1 .
Conclusion: CBC mode on its own does not lead to a secure hash
function.
Merkle-Damgård Construction
Assume we have a fixed-length compression function h.

1. Input message m = m1 ||m2 || . . .
2. Pad final block to ℓ bits: mˆk = mk || pad || len(m).
3. Set t0 = IV .
4. For i = 1 to k do: ti = h(ti−1 , mi )
5. Output tk
Theorem
If h is a fixed-length collision-resistant hash function then H is a
collision-resistant hash function.
Proof.
Assume have a collision (m, m′ ) for H. Construct a collision for h.
Case 1 Different message lengths K ̸= K ′ . At last step,

h(tk−1 , mk || pad ||K ) = h(tk′ ′ −1 , mk′ ′ || pad ||K ′ ) is a collision.
Case 2 Last block identical; K = K ′ . Then collision occurs at some
′ , m′ .
intermediate block. Find last point with ti−1 , mi ̸= ti−1 i
′ , m′ ).
Then h(ti−1 , mi ) = h(ti−1 i
Davies-Meyer compression functions
Can construct collision-resistant compression functions from block

ciphers.
This is collision-resistant.
Requires specialized block ciphers: 128-bit AES is too short to

ensure collision resistance.
Need block ciphers that take 512 or 1024-bit keys.

Hash functions in practice
• MD5: Totally insecure.

• SHA1: Totally insecure.
• SHA2 (SHA256, SHA512): Secure
• SHA3: New hotness standardized by NIST in 2015.
The long saga of MD5
MD5 was a popular hash function using the Merkle-Damgård

construction.
1991 Rivest publishes MD5
1993 “some weaknesses”
1995 SHA1 proposed by NIST
2004 Wang et al. collision for MD5 in Crypto rump session
2009 Rogue CA certificate
2010 NIST recommends replacing SHA1
2012 Flame malware exploits MD5 collision in wild against
Microsoft CA
MD5 collisions can be computed in seconds on a laptop.

The continuing saga of SHA1
SHA1 is a popular hash function using the Merkle-Damgård

construction.
1993 SHA0 published by NIST
1995 NIST adds one instruction to compression function; turns out
to be necessary for collision resistance
2005 Wang, Yao, Yao 263 collision attack
2017 Stevens et al. first collision for SHA1 in 263.1 evaluations;
6500 CPU-years + 110 GPU-years
2019 Chosen-prefix collisions in 268 evaluations
SHA2
The SHA2 hash functions use Merkle-Damgård construction with

Davies-Meyer compression function.
2002 NIST adds SHA256 and SHA512 to SHA family
• SHA224 and SHA384 are truncated versions of these.
• SHA256 uses 512-bit message blocks and 256-bit chaining
variables.
• Recommended for use now.
SHA3
SHA3 uses a new construction, a sponge.

2006 NIST organizes a new hash function competition
• Goal to have a more “diverse” algorithm portfolio
• Intended to be drop-in replacement for SHA2
2012 Keccak algorithm (Bertoni, Daemen, Peeters, Van Aasche)
announced as competition winner
2013 NIST announces decrease to the security parameter for
standard, uproar ensues
• Among other things, algorithm would be less quantum
resistant
2013 NIST walks back change to original security parameter
2015 NIST publishes standard
HW 3 due in 1 week!

6 Hashfunctions

Uploaded by

Copyright:

Available Formats

6 Hashfunctions

Uploaded by

Document Information

Original Title

Copyright

Available Formats

Share this document

Share or Embed Document

Sharing Options

Did you find this document useful?

Is this content inappropriate?

Copyright:

Available Formats

6 Hashfunctions

Uploaded by

Copyright:

Available Formats

CSE 207B:

Fall 2023 Lecture 6

2. HW 3 is out, due before class in 1 week

This time: Hash functions

Definition (Practical definition)

H(x) : {0, 1}∗ → {0, 1}n

We would like to define H as secure if it is collision resistant.

Collision resistant: No efficient adversary can find messages m0 ,

Definition (Practical definition)

H(x) : {0, 1}∗ → {0, 1}n

We would like to define H as secure if it is collision resistant.

Collision resistant: No efficient adversary can find messages m0 ,

In actual practice, hash functions are fixed functions where all

So “collision-resistance” means “the best cryptanalytic attacks to

2. 2nd preimage resistance:

Inefficient pigeonhole algorithm

We can do much better than this with the birthday “paradox”.

Let H be a hash function with N = 2n possible outputs.

Birthday Attack Algorithm

The previous algorithm requires:

It is possible to find collisions with less memory using the Floyd

Side note: Can get parallelization and time/memory tradeoff using

Does CBC-hash work?

Strawman CBC-hash construction

Does CBC-hash work?

Strawman CBC-hash construction

No: Let m = m1 ||m2 and m′ = m1′ ||m2′ with

Assume we have a fixed-length compression function h.

Case 1 Different message lengths K ̸= K ′ . At last step,

Can construct collision-resistant compression functions from block

Requires specialized block ciphers: 128-bit AES is too short to

Need block ciphers that take 512 or 1024-bit keys.

• MD5: Totally insecure.

MD5 was a popular hash function using the Merkle-Damgård

MD5 collisions can be computed in seconds on a laptop.

SHA1 is a popular hash function using the Merkle-Damgård

The SHA2 hash functions use Merkle-Damgård construction with

SHA3 uses a new construction, a sponge.

You might also like