6 Hashfunctions

Download as pdf or txt
Download as pdf or txt
You are on page 1of 25

CSE 207B:

Applied Cryptography
Nadia Heninger
UCSD

Fall 2023 Lecture 6


Announcements

1. HW 2 is due today!

2. HW 3 is out, due before class in 1 week


Last time: Message integrity and MACs

This time: Hash functions


Cryptographic hash functions

Definition (Practical definition)


A hash function is an efficiently computable function that maps
arbitrary-length inputs to fixed-length outputs

H(x) : {0, 1}∗ → {0, 1}n

We would like to define H as secure if it is collision resistant.

Collision resistant: No efficient adversary can find messages m0 ,


m1 such that
H(m0 ) = H(m1 )
Cryptographic hash functions

Definition (Practical definition)


A hash function is an efficiently computable function that maps
arbitrary-length inputs to fixed-length outputs

H(x) : {0, 1}∗ → {0, 1}n

We would like to define H as secure if it is collision resistant.

Collision resistant: No efficient adversary can find messages m0 ,


m1 such that
H(m0 ) = H(m1 )
This definition doesn’t work in a theoretical sense: By the
pigeonhole principle there exist such collisions, so there exists an
adversary A that just prints them out.
Security game for a collision-resistant hash function
Fix for theoretical proofs: Define a family of hash functions that
depends on a parameter that the adversary doesn’t know in
advance, and define runtime in terms of a security parameter.

Definition
H is collision-resistant if Pr[A succeeds] ≤ negligible.
Practical hash functions

In actual practice, hash functions are fixed functions where all


parameters were generated when the algorithms were
designed/standardized. (Think MD5, SHA1, SHA256, SHA3.)

So “collision-resistance” means “the best cryptanalytic attacks to


find a collision for this fixed function are expected to have infeasible
runtimes”.
Security notions for hash functions

1. Collision resistance:
Find m0 , m1 such that H(m0 ) = H(m1 ).

2. 2nd preimage resistance:


Input m0 , find m1 s.t. H(m0 ) = H(m1 ).

3. Preimage resistance:
Input t = H(m0 ), find m1 s.t. H(m1 ) = t.
Generic collision-finding attacks on hash functions

Inefficient pigeonhole algorithm


If H has ℓ-bit outputs, then after hashing 2ℓ + 1 distinct inputs
there must be a collision.

We can do much better than this with the birthday “paradox”.


Birthday attacks

Let H be a hash function with N = 2n possible outputs.

Birthday Attack Algorithm


1. Choose k uniformly random messages m1 , . . . , mk .
2. Compute ti = H(mi ) for every i.
3. Search for i, j s.t. mi ̸= mj and H(mi ) = H(mj ).

Theorem

Pr[collision after Θ( N) inputs] ≥ 1/2
Birthday collision theorem
Theorem

Pr[collision after Θ( N) inputs] ≥ 1/2

Proof.
We throw k balls uniformly at random into N distinct bins.
1 2 k −1
Pr[k balls into N distinct bins] = 1·(1− )·(1− ) . . . (1− )
N N N
1
Use 1 − N ⪅ e −1/N .
k−1
Y Pk−1
Pr[distinct] ≤ e −i/N = e −1/N i=1 i
= e −k(k−1)/2N
i=1

For k = N:
2 /2N
Pr[collision] = 1 − Pr[distinct] ≥ 1 − e k ≥ 1 − e −2 ≥ 1/2
Birthday Attack Algorithm Complexity

The previous algorithm requires:



• N = 2n/2 hash evaluations

• n N = n2n/2 memory

It is possible to find collisions with less memory using the Floyd


cycle-finding algorithm.
Floyd cycle-finding algorithm
Floyd cycle-finding algorithm
1. Start at an arbitrary x0 = y0 .
2. Compute xi = H(xi−1 ), yi = H(H(yi−1 )).
3. If xi = yi , then m0 = xi−1 and m1 = H(yi−1 ) collide.
4. To find collision at beginning of cycle, restart x pointer from
start and single-step both x and y .

By previous analysis, there is a cycle of length N assuming H
behaves like a random walk.


Algorithm time: O( N)
Memory: O(1)

Side note: Can get parallelization and time/memory tradeoff using


method of distinguished points.
Any collision-resistant hash function needs output at least 256 bits
in practice.
How do you construct a hash function that takes
arbitrary-length inputs?
Assume we have a fixed-length hash function h

Does CBC-hash work?

Strawman CBC-hash construction


1. Set h(m) = AES0 (m).
2. Set H0 = 0000 . . . 0.
3. Let Hi = AES0 (Hi−1 ⊕ mi )
4. Output Hk
How do you construct a hash function that takes
arbitrary-length inputs?
Assume we have a fixed-length hash function h

Does CBC-hash work?

Strawman CBC-hash construction


1. Set h(m) = AES0 (m).
2. Set H0 = 0000 . . . 0.
3. Let Hi = AES0 (Hi−1 ⊕ mi )
4. Output Hk

No: Let m = m1 ||m2 and m′ = m1′ ||m2′ with


m1′ = m2 ⊕ H1 and m2′ = H2 ⊕ m2 ⊕ H1 .

Conclusion: CBC mode on its own does not lead to a secure hash
function.
Merkle-Damgård Construction

Assume we have a fixed-length compression function h.


1. Input message m = m1 ||m2 || . . .
2. Pad final block to ℓ bits: mˆk = mk || pad || len(m).
3. Set t0 = IV .
4. For i = 1 to k do: ti = h(ti−1 , mi )
5. Output tk
Theorem
If h is a fixed-length collision-resistant hash function then H is a
collision-resistant hash function.
Proof.
Assume have a collision (m, m′ ) for H. Construct a collision for h.

Case 1 Different message lengths K ̸= K ′ . At last step,


h(tk−1 , mk || pad ||K ) = h(tk′ ′ −1 , mk′ ′ || pad ||K ′ ) is a collision.
Case 2 Last block identical; K = K ′ . Then collision occurs at some
′ , m′ .
intermediate block. Find last point with ti−1 , mi ̸= ti−1 i
′ , m′ ).
Then h(ti−1 , mi ) = h(ti−1 i
Davies-Meyer compression functions

Can construct collision-resistant compression functions from block


ciphers.

This is collision-resistant.

Requires specialized block ciphers: 128-bit AES is too short to


ensure collision resistance.

Need block ciphers that take 512 or 1024-bit keys.


Hash functions in practice

• MD5: Totally insecure.


• SHA1: Totally insecure.
• SHA2 (SHA256, SHA512): Secure
• SHA3: New hotness standardized by NIST in 2015.
The long saga of MD5

MD5 was a popular hash function using the Merkle-Damgård


construction.
1991 Rivest publishes MD5
1993 “some weaknesses”
1995 SHA1 proposed by NIST
2004 Wang et al. collision for MD5 in Crypto rump session
2009 Rogue CA certificate
2010 NIST recommends replacing SHA1
2012 Flame malware exploits MD5 collision in wild against
Microsoft CA

MD5 collisions can be computed in seconds on a laptop.


The continuing saga of SHA1

SHA1 is a popular hash function using the Merkle-Damgård


construction.
1993 SHA0 published by NIST
1995 NIST adds one instruction to compression function; turns out
to be necessary for collision resistance
2005 Wang, Yao, Yao 263 collision attack
2017 Stevens et al. first collision for SHA1 in 263.1 evaluations;
6500 CPU-years + 110 GPU-years
2019 Chosen-prefix collisions in 268 evaluations
SHA2

The SHA2 hash functions use Merkle-Damgård construction with


Davies-Meyer compression function.
2002 NIST adds SHA256 and SHA512 to SHA family
• SHA224 and SHA384 are truncated versions of these.
• SHA256 uses 512-bit message blocks and 256-bit chaining
variables.
• Recommended for use now.
SHA3

SHA3 uses a new construction, a sponge.


2006 NIST organizes a new hash function competition
• Goal to have a more “diverse” algorithm portfolio
• Intended to be drop-in replacement for SHA2
2012 Keccak algorithm (Bertoni, Daemen, Peeters, Van Aasche)
announced as competition winner
2013 NIST announces decrease to the security parameter for
standard, uproar ensues
• Among other things, algorithm would be less quantum
resistant
2013 NIST walks back change to original security parameter
2015 NIST publishes standard
HW 3 due in 1 week!

You might also like