Hash Functions: 6.1 The Hash Function SHA1

Download as pdf or txt
Download as pdf or txt
You are on page 1of 15

Chapter 6

Hash Functions

A hash function usually means a function that compresses, meaning the output is shorter than the
input. Often, such a function takes an input of arbitrary or almost arbitrary length to one whose
length is a fixed number, like 160 bits. Hash functions are used in many parts of cryptography,
and there are many different types of hash functions, with differing security properties. We will
consider them in this chapter.

6.1 The hash function SHA1


The hash function known as SHA1 is a simple but strange function from strings of almost arbitrary
length to strings of 160 bits. The function was finalized in 1995, when a FIPS (Federal Information
Processing Standard) came out from the US National Institute of Standards that specified SHA1.
Let {0, 1}< denote the set of all strings of length strictly less than . The function SHA1:
64
{0, 1}<2 {0, 1}160 is shown in Fig. 6.1. (Since 264 is a very large length, we think of SHA1
as taking inputs of almost arbitrary length.) It begins by padding the message via the function
shapad, and then iterates the compression function sha1 to get its output. The operations used in
the algorithms of Fig. 6.1 are described in Fig. 6.2. (The first input in the call to SHF1 in code for
SHA1 is a 128 bit string written as a sequence of four 32-bit words, each word being consisting of
8 hexadecimal characters. The same convention holds for the initialization of the variable V in the
code of SHF1.)
SHA1 is derived from a function called MD4 that was proposed by Ron Rivest in 1990, and the
key ideas behind SHA1 are already in MD4. Besides SHA1, another well-known child of MD4 is
MD5, which was likewise proposed by Rivest. The MD4, MD5, and SHA11 algorithms are all quite
similar in structure. The first two produce a 128-bit output, and work by chaining a compression
function that goes from 512 + 128 bits to 128 bits, while SHA1 produces a 160 bit output and works
by chaining a compression function from 512 + 160 bits to 160 bits.
So what is SHA1 supposed to do? First and foremost, it is supposed to be the case that nobody
can find distinct strings M and M such that SHA1(M ) = SHA1(M ). This property is called
collision resistance.
Stop for a moment and think about the collision-resistance requirement, for it is really quite
amazing to think that such a thing could be possible. The function SHA1 maps strings of (almost)
any length to strings of 160 bits. So even if you restricted the domain of SHA1 just to short
stringslet us say strings of length 256 bitsthen there must be an enormous number of pairs of
2 HASH FUNCTIONS

algorithm SHA1(M ) // |M | < 264


V SHF1( 5A827999 k 6ED9EBA1 k 8F1BBCDC k CA62C1D6 , M )
return V

algorithm SHF1(K, M ) // |K| = 128 and |M | < 264


y shapad(M )
Parse y as M1 k M2 k k Mn where |Mi | = 512 (1 i n)
V 67452301 k EFCDAB89 k 98BADCFE k 10325476 k C3D2E1F0
for i = 1, . . . , n do
V shf1(K, Mi k V )
return V

algorithm shapad(M ) // |M | < 264


d (447 |M |) mod 512
Let be the 64-bit binary representation of |M |
y M k 1 k 0d k // |y| is a multiple of 512
return y

algorithm shf1(K, B k V ) // |K| = 128, |B| = 512 and |V | = 160


Parse B as W0 k W1 k k W15 where |Wi | = 32 (0 i 15)
Parse V as V0 k V1 k k V4 where |Vi | = 32 (0 i 4)
Parse K as K0 k K1 k K2 k K3 where |Ki | = 32 (0 i 3)
for t = 16 to 79 do
Wt ROTL1 (Wt3 Wt8 Wt14 Wt16 )
A V0 ; B V1 ; C V2 ; D V3 ; E V4
for t = 0 to 19 do
Lt K0 ; Lt+20 K1 ; Lt+40 K2 ; Lt+60 K3
for t = 0 to 79 do
if (0 t 19) then f (B C) ((B) D)
if (20 t 39 OR 60 t 79) then f B C D
if (40 t 59) then f (B C) (B D) (C D)
temp ROTL5 (A) + f + E + Wt + Lt
E D ; D C ; C ROTL30 (B) ; B A ; A temp
V0 V0 + A ; V1 V1 + B ; V2 V2 + C ; V3 V3 + D ; V4 V4 + E
V V0 k V1 k V2 k V3 k V4
return V

Figure 6.1: The SHA1 hash function and the underlying SHF1 family.

strings M and M that hash to the same value. This is just by the pigeonhole principle: if 2256
pigeons (the 256-bit messages) roost in 2160 holes (the 160-bit hash values) then some two pigeons
(two distinct strings) roost in the same hole (have the same hash). Indeed countless pigeons must
share the same hole. The difficult is only that nobody has as yet identified (meaning, explicitly
provided) even two such pigeons (strings).
Bellare and Rogaway 3

X Y bitwise AND of X and Y


X Y bitwise OR of X and Y
X Y bitwise XOR of X and Y
X bitwise complement of X
X +Y integer sum modulo 232 of X and Y
ROTLl (X) circular left shift of bits of X by l positions (0 l 31)

Figure 6.2: Operations on 32-bit words used in sha1.

In trying to define this collision-resistance property of SHA1 we immediately run into foun-
dational problems. We would like to say that it is computationally infeasible to output a pair
of distinct strings M and M that collide under SHA1. But in what sense could it be infeasible?
There is a programindeed a very short an simple one, having just two print statementswhose
output specifies a collision. Its not computationally hard to output a collision; it cant be. The
only difficulty is our human problem of not knowing what this program is.
It seems very hard to make a mathematical definition that captures the idea that human beings
cant find collisions in SHA1. In order to reach a mathematically precise definition we are going to
have to change the very nature of what we conceive to be a hash function. Namely, rather than it
being a single function, it will be a family of functions. This is unfortunate in some ways, because
it distances us from concrete hash functions like SHA1. But no alternative is known.

6.2 Collision-resistant hash functions


A hash function for us is a family of functions H: K D R. Here D is the domain of H and
R is the range of H. As usual, if K K is a particular key then HK : D R is defined for all
M D by HK (M ) = H(K, M ). This is the instance of H defined by key K.
64
An example is SHF1: {0, 1}128 {0, 1}<2 {0, 1}160 , as described in Fig. 6.1. This hash
function takes a 128-bit key and an input M of at most 264 bits and returns a 160-bit output. The
function SHA1 is an instance of this family, namely the one whose associated key is
5A827999 k 6ED9EBA1 k 8F1BBCDC k CA62C1D6 .
Let H: K D R be a hash function. Here is some notation we use in this chapter. For any
key K and y R we let
1
HK (y) = { x D : HK (x) = y }
denote the pre-image set of y under HK . Let
Image(HK ) = { HK (x) : x D }
denote the image of HK .
A collision for a function h: D R is a pair x1 , x2 D of points such that (1) HK (x1 ) =
HK (x2 ) and (2) x1 6= x2 . The most basic security property of a hash function is collision-resistance,
which measures the ability of an adversary to find a collision for an instance of a family H. There
are different notions of collision-resistance, varying in restrictions put on the adversary in its quest
for a collision.
To introduce the different notions, we imagine a game, parameterized by an integer s {0, 1, 2},
4 HASH FUNCTIONS

Pre-key attack phase A selects 2 s points


Key selection phase A key K is selected at random from K
Post-key attack phase A is given K and returns s points
Winning condition The 2 points selected by A form a collision for HK

Figure 6.3: Framework for security notions for collision-resistant hash functions. The three choices
of s {0, 1, 2} give rise to three notions of security.

and involving an adversary A. It consists of a pre-key attack phase, followed by a key-selection


phase, followed by a post-key attack phase. The adversary is attempting to find a collision for
HK , where key K is selected at random from K in the key-selection phase. Recall that a collision
consists of a pair x1 , x2 of (distinct) points in D. The adversary is required to specify 2 s points
in the pre-key attack phase, before it has any information about the key. (The latter has yet to
be selected.) Once the adversary has specified these points and the key has been selected, the
adversary is given the key, and will choose the remaining s points as a function of the key, in the
post-key attack phase. It wins if the 2 = (2 s) + s points it has selected form a collision for HK .
Fig. 6.3 summarizes the framework. The three choices of the parameter s give rise to three
notions of security. The higher the value of s the more power the adversary has, and hence the more
stringent is the corresponding notion of security. Fig. 6.4 provides in more detail the experiments
underlying the three attacks arising from the above framework. We represent by st information
that the adversary wishes to maintain across its attack phases. It will output this information in
the pre-key attack phase, and be provided it at the start of the post-key attack phase.
In a variant of this model that we consider in Section ??, the adversary is not given the key
K in the post-key attack phase, but instead is given an oracle for HK (). To disambiguate, we
refer to our current notions as capturing collision-resistance under known-key attack, and the
notions of Section ?? as capturing collision-resistance under hidden-key attack. The notation in
the experiments of Fig. 6.4 and Definition 6.2.1 reflects this via the use of kk, except that for
CR0, known and hidden key attacks coincide, and hence we just say cr0.
The three types of hash functions we are considering are known by other names in the literature,
as indicated in Fig. 6.5.

Definition 6.2.1 Let H: K D R be a hash function and let A be an algorithm. We let


h i
Advcr2 A
H (A) = Pr CR2H true
h i
Advcr1 A
H (A) = Pr CR1H true
h i
Advcr0 A
H (A) = Pr CR0H true .

In measuring resource usage of an adversary we use our usual conventions. Although there is
formally no definition of a secure hash function, we will talk of a hash function being CR2, CR1
or CR0 with the intended meaning that its associated advantage function is small for all adversaries
of practical running time.
Note that the running time of the adversary is not really relevant for CR0, because we can
always imagine that hardwired into its code is a best choice of distinct points x1 , x2 , meaning a
Bellare and Rogaway 5

Game CR2H procedure Finalize(x1 , x2 )


procedure Initialize Return (x1 6= x2 HK (x1 ) = HK (x2 ))
$
K {0, 1}k
Return K

Game CR1H procedure Submit(x1 )


procedure Initialize $
K {0, 1}k
Return K
procedure Finalize(x2 )
Return (x1 6= x2 HK (x1 ) = HK (x2 ))

Game CR0H procedure Submit(x1 , x2 )


procedure Initialize $
K {0, 1}k
Return K
procedure Finalize()
Return (x1 6= x2 HK (x1 ) = HK (x2 ))

Figure 6.4: Games defining security notions for three kinds of collision-resistant hash functions
under known-key attack.

Type Name(s) in literature


CR2-KK collision-free, collision-resistant, collision-intractable
CR1-KK universal one-way [8] (aka. target-collision resistant [1])
CR0 universal, almost universal

Figure 6.5: Types of hash functions, with names in our framework and corresponding names found
in the literature.

choice for which


h i
$
Pr K K : HK (x1 ) = HK (x2 )
h i
$
= max Pr K K : HK (y1 ) = HK (y2 ) .
y1 6=y2

The above value equals Advcr0H (A) and is the maximum advantage attainable.
Clearly, a CR2 hash function is also CR1 and a CR1 hash function is also CR0. The following
states the corresponding relations formally. The proof is trivial and is omitted.
6 HASH FUNCTIONS

Proposition 6.2.2 Let H: K D R be a hash function. Then for any adversary A0 there
exists an adversary A1 having the same running time as A0 and
Advcr0 (A ) Advcr1-kk (A ) .
H 0 H 1

Also for any adversary A1 there exists an adversary A2 having the same running time as A1 and
cr1-kk -kk (A ) .
AdvH (A1 ) Advcr2
H 2

cr2-kk
We believe that SHF1 is CR2, meaning that there is no practical algorithm A for which AdvH (A)
is appreciably large. This is, however, purely a belief, based on the current inability to find such
an algorithm. Perhaps, later, such an algorithm will emerge.
It is useful, for any integer n, to get SHF1n : {0, 1}n {0, 1}160 denote the restriction of SHF1
to the domain {0, 1}n . Note that a collision for SHF1nK is also a collision for SHF1K , and it is often
convenient to think of attacking SHF1n for some fixed n rather than SHF1 itself.

6.3 Collision-finding attacks


Let us focus on CR2, which is the most important property for the applications we will see later.
We consider different types of CR2-type collision-finding attacks on a family H: K D R where
D, R are finite sets. We assume the family performs some reasonable compression, say |D| 2|R|.
Canonical example families to keep in mind are H = SHF1n for n 161 and shf1, the compression
function of SHF1.
Collision-resistance does not mean it is impossible to find a collision. Analogous to the case of
one-wayness, there is an obvious collision-finding strategy. Let us enumerate the elements of D in
some way, so that D = {D1 , D2 , . . . , Dd } where d = |D|. The following adversary A implements an
exhaustive search collision-finding attack:
Adversary A(K)
$
x1 D ; y HK (x1 )
for i = 1, . . . , q do
if (HK (Di ) = y and x1 6= Di ) then return x1 , Di
return FAIL
We call q the number of trials. Each trial involves one computation of HK , so the number of trials
1
is a measure of the time taken by the attack. To succeed, the attack requires that HK (y) has size
at least two, which happens at least half the time if |D| 2|R|. However, we would still expect
that it would take about q = |D| trials to find a collision, which is prohibitive, for D is usually
large. For example, for F = shf1, the domain has size 2672 , far too large. For SHF1n , we would
choose n as small as possible, but need n 161 to ensure collisions exist, so the attack uses 2161
computations of HK , which is not practical.
Now heres another idea. We pick points at random, hoping that their image under HK equals
the image under HK of an initial target point. Call this the random-input collision-finding attack.
It is implemented like this:
Adversary A(K)
$
x1 D ; y HK (x1 )
for i = 1, . . . , q do
$
x2 D
if (HK (x2 ) = y and x1 6= x2 ) then return x1 , x2
return FAIL
Bellare and Rogaway 7

for i = 1, . . . , q do // q is the number of trials


$
xi D ; yi HK (xi )
if (there exists j < i such that yi = yj but xi 6= xj ) // collision found
then return xi , xj
return FAIL // No collision found

Figure 6.6: Birthday attack on a hash function H: K D R. The attack is successful in finding
a collision if it does not return FAIL.

A particular trial finds a collision with probability (about) 1 in |R|, so we expect to find a collision
in about q = |R| trials. This is much better than the |D| trials used by our first attempt. In
particular, a collision for shf1 would be found in time around 2160 rather than 2672 . But this is still
far from practical. Our conclusion is that as long as the range size of the hash function is large
enough, this attack is not a threat.
We now consider another strategy, called a birthday attack, that turns out to be much better
than the above. It is illustrated in Fig. 6.6. It picks at random q points from the domain, and
applies HK to each of them. If it finds two distinct points yielding the same output, it has found
a collision for HK . The question isphow large q need be to find a collision. The answer may seem
surprising at first. Namely, q = O( |R|) trials suffices.
We will justify this later, but first let us note the impact. Consider SHA1n with n 161. As we
indicated, the random-input collision-finding attack 160 trials to find a collision. The
takes about 2
birthday attack on the other hand takes around 2160 = 280 trials. This is MUCH less than 2160 .
Similarly, the birthday attack finds a collision in shf1 in around 280 trials while while random-input
collision-finding takes about 2160 trials.
To see why the birthday attack performs as well as we claimed, we recall the following game.
Suppose we have q balls. View them as numbered, 1, . . . , q. We also have N bins, where N q.
We throw the balls at random into the bins, one by one, beginning with ball 1. At random means
that each ball is equally likely to land in any of the N bins, and the probabilities for all the balls
are independent. A collision is said to occur if some bin ends up containing at least two balls. We
are interested in C(N, q), the probability of a collision. As shown in the Appendix,
q2
C(N, q) (6.1)
2N

for 1 q 2N . Thus C(N, q) 1 for q 2N .
The relation to birthdays arises from the question of how many people need be in a room before
the probability of there being two people with the same birthday is close to one. We imagine each
person has a birthday that is a random one of the 365 days in a year. This means we can think
of a person as a ball being thrown at random into one of 365 bins, where the i-th bin represents
having birthday the i-th day of the year. So we can apply the Proposition from the Appendix
with N = 365and q the number of people in the room. The Proposition says that when the room
contains q 2 365 27 people, the probability that there are two people with the same birthday
is close to one. This number (27) is quite small and may be hard to believe at first hearing, which
is why this is sometimes called the birthday paradox.
To see how this applies to the birthday attack of Fig. 6.6, let us enumerate the points in the
range as R1 , . . . , RN , where N = |R|. Each such point defines a bin. We view xi as a ball, and
imagine that it is thrown into bin yi , where yi = HK (xi ). Thus, a collision of balls (two balls in
8 HASH FUNCTIONS

the same bin) occurs precisely when two values xi , xj have the same output under HK . We are
interested in the probability that this happens as a function of q. (We ignore the probability that
xi = xj , counting a collision only when HK (xi ) = HK (xj ). It can be argued that since D is larger
than R, the probability that xi = xj is small enough to neglect.)
However, we cannot apply the birthday analysis directly, because the latter assumes that each
ball is equally likely to land in each bin. This is not, in general, true for our attack. Let P (Rj )
denote the probability that a ball lands in bin Rj , namely the probability that HK (x) = Rj taken
over a random choice of x from D. Then
1
|HK (Rj )|
P (y) = .
|D|
In order for P (R1 ) = P (R2 ) = = P (RN ) to be true, as required to apply the birthday analysis,
it must be the case that
1 1 1
|HK (R1 )| = |HK (R2 )| = = |HK (RN )| .
A function HK with this property is called regular, and H is called regular if HK is regular for
every K. Our conclusion is that if H is regular, then the probability that the attack
p succeeds is
roughly C(N, q). So the above says that in this case we need about q 2N = 2 |R| trials to
find a collision with probability close to one.
If H is not regular, it turns out the attack succeeds even faster, telling us that we ought to
design hash functions to be as close to regular as possible [2].
In summary, there is a 2l/2 or better time attack to find collisions in any hash function outputting
l bits. This leads designers to choose l large enough that 2l/2 is prohibitive. In the case of SHF1
and shf1, the choice is l = 160 because 280 is indeed a prohibitive number of trials. These functions
cannot thus be considered vulnerable to birthday attack. (Unless they turn out to be extremely
non-regular, for which there is no evidence so far.)
Ensuring, by appropriate choice of output length, that a function is not vulnerable to a birthday
attack does not, of course, guarantee it is collision resistant. Consider the family H: K{0, 1}161
{0, 1}160 defined as follows. For any K and any x, function HK (x) returns the first 160 bits of x.
The output length is 160, so a birthday attack takes 280 time and is not feasible, but it is still easy
to find collisions. Namely, on input K, an adversary can just pick some 160-bit y and output y0, y1.
This tells us that to ensure collision-resistance it is not only important to have a long enough output
but also design the hash function so that there no clever shortcuts to finding a collision, meaning
no attacks that exploit some weakness in the structure of the function to quickly find collisions.
We believe that shf1 is well-designed in this regard. Nobody has yet found an adversary that
finds a collision in shf1 using less than 280 trials. Even if a somewhat better adversary, say one
finding a collision for shf1 in 265 trials, were found, it would not be devastating, since this is still a
very large number of trials, and we would still consider shf1 to be collision-resistant.
If we believe shf1 is collision-resistant, Theorem 6.5.2 tells us that SHF1, as well as SHF1n , can
also be considered collision-resistant, for all n.

6.4 One-wayness of collision-resistant hash functions


Intuitively, a family H is one-way if it is computationally infeasible, given HK and a range point
y = HK (x), where x was chosen at random from the domain, to find a pre-image of y (whether
x or some other) under HK . Since this definition too has a hidden-key version, we indicate the
known-key in the notation below.
Bellare and Rogaway 9

Definition 6.4.1 Let H: K D R be a family of functions and let A be an algorithm. We


consider the following experiment:
Game OWH
procedure Initialize
$
K {0, 1}k
$
x D ; y HK (x)
Return K, y
procedure Finalize(x )
Return (HK (x ) = y)

We let

Advow A
H (A) = Pr[OWH true] .

We now ask ourselves whether collision-resistance implies one-wayness. It is easy to see, however,
that, in the absence of additional assumptions about the hash function than collision-resistance,
the answer is no. For example, let H be a family of functions every instance of which is the
identity function. Then H is highly collision-resistant (the advantage of an adversary in finding
a collision is zero regardless of its time-complexity since collisions simply dont exist) but is not
one-way.
However, we would expect that genuine hash functions, meaning ones that perform some
non-trivial compression of their data (ie. the size of the range is more than the size of the domain)
are one-way. This turns out to be true, but needs to be carefully quantified. To understand the
issues, it may help to begin by considering the natural argument one would attempt to use to show
that collision-resistance implies one-wayness.
Suppose we have an adversary A that has a significant advantage in attacking the one-wayness
of hash function H. We could try to use A to find a collision via the following strategy. In the
pre-key phase (we consider a type-1 attack) we pick and return a random point x1 from D. In the
post-key phase, having received the key K, we compute y = HK (x1 ) and give K, y to A. The latter
returns some x2 , and, if it was successful, we know that HK (x2 ) = y. So HK (x2 ) = HK (x1 ) and
we have a collision.
Not quite. The catch is that we only have a collision if x2 6= x1 . The probability that this
happens turns out to depend on the quantity:
h i
$ $ 1
PreImH (1) = Pr K K; x D ; y HK (x) : |HK (y)| = 1 .

This is the probability that the size of the pre-image set of y is exactly 1, taken over y generated
as shown. The following Proposition says that a collision-resistant function H is one-way as long
as PreImH (1) is small.

Proposition 6.4.2 Let H: K D R be a hash function. Then for any A there exists a B such
that

Advow -kk (A) 2 Advcr1-kk (B) + PreIm (1) .


H H H

Furthermore the running time of B is that of A plus the time to sample a domain point and compute
H once.
10 HASH FUNCTIONS

The result is about the CR1 type of collision-resistance. However Proposition 6.2.2 implies that
the same is true for CR2.
A general and widely-applicable corollary of the above Proposition is that collision-resistance
implies one-wayness as long as the domain of the hash function is significantly larger than its range.
The following quantifies this.

Corollary 6.4.3 Let H: K D R be a hash function. Then for any A there exists a B such
that
-kk (A) 2 Advcr1-kk (B) + |R|
Advow
H H .
|D|
Furthermore the running time of B is that of A plus the time to sample a domain point and compute
H once.

Proof of Corollary 6.4.3: For any key K, the number of points in the range of HK that have
exactly one pre-image certainly cannot exceed |R|. This implies that
|R|
PreImH (1) .
|D|
The corollary follows from Proposition 6.4.2.

Corollary 6.4.3 says that if H is collision-resistant, and performs enough compression that |R|
is much smaller than |D|, then it is also one-way. Why? Let A be a practical adversary that
attacks the one-wayness of H. Then B is also practical, and since H is collision-resistant we know
Advcr1 -kk (B) is low. Equation (6.2) then tells us that as long as |R|/|D| is small, Advow-kk (A) is
H H
low, meaning H is one-way.
As an example, let H be the compression function shf1. In that case R = {0, 1}160 and D =
{0, 1}672 so |R|/|D| = 2512 , which is tiny. We believe shf1 is collision-resistant, and the above thus
says it is also one-way.
There are some natural hash functions, however, for which Corollary 6.4.3 does not apply.
Consider a hash function H every instance of which is two-to-one. The ratio of range size to
domain size is 1/2, so the right hand side of the equation of Corollary 6.4.3 is 1, meaning the
bound is vacuous. However, such a function is a special case of the one considered in the following
Proposition.

Corollary 6.4.4 Suppose 1 r < d and let H: K {0, 1}d {0, 1}r be a hash function which
1
is regular, meaning |HK (y)| = 2dr for every y {0, 1}r and every K K. Then for any A there
exists a B such that
ow-kk -kk (B) .
AdvH (A) 2 Advcr1
H

Furthermore the running time of B is that of A plus the time to sample a domain point and compute
H once.

Proof of Corollary 6.4.4: The assumption d > r implies that PreImH (1) = 0. Now apply
Proposition 6.4.2.

We now turn to the proof of Proposition 6.4.2.

Proof of Proposition 6.4.2: Heres how B works:


Bellare and Rogaway 11

Pre-key phase Post-key phase

Adversary B() Adversary B(K, st)


$
x1 D ; st x1 Retrieve x1 from st
$
return (x1 , st) y HK (x1 ) ; x2 B(K, y)
return x2

Let Pr [] denote the probability of event in experiment Expcr1-kk (B). For any K K let
H
1
SK = { x D : |HK (HK (x))| = 1 } .

Advcr1 -kk (B) (6.2)


H

= Pr [HK (x2 ) = y x1 6= x2 ] (6.3)


Pr [HK (x2 ) = y x1 6= x2 x1 6 SK ] (6.4)
= Pr [x1 6= x2 | HK (x2 ) = y x1 6 SK ] Pr [HK (x2 ) = y x1 6 SK ] (6.5)
1
Pr [HK (x2 ) = y x1 6 SK ] (6.6)
2
1
(Pr [HK (x2 ) = y] Pr [x1 SK ]) (6.7)
2
1 
-kk (A) PreIm (1)

= Advow
H H . (6.8)
2
Re-arranging terms yields Equation (6.2). Let us now justify the steps above. Equation (6.3) is by
definition of Advcr1 -kk (B) and B. Equation (6.4) is true because Pr[E] Pr[E F ] for any events
H
E, F . Equation (6.5) uses the standard formula Pr[E F ] = P r[E|F ] Pr[F ]. Equation (6.6) is
justified as follows. Adversary A has no information about x1 other than that it is a random point
1 1
in the set HK (y). However if x1 6 SK then |HK (y)| 2. So the probability that x2 6= x1 is
at least 1/2 in this case. Equation (6.7) applies another standard probabilistic inequality, namely
that Pr[E F ] Pr[E] Pr[F ]. Equation (6.8) uses the definitions of the quantities involved.

6.5 The MD transform


We saw above that SHF1 worked by iterating applications of its compression function shf1. The
latter, under any key, compresses 672 bits to 160 bits. SHF1 works by compressing its input 512
bits at a time using shf1.
The iteration method has been chosen carefully. It turns out that if shf1 is collision-resistant,
then SHF1 is guaranteed to be collision-resistant. In other words, the harder task of designing a
collision-resistant hash function taking long and variable-length inputs has been reduced to the
easier task of designing a collision-resistant compression function that only takes inputs of some
fixed length.
This has clear benefits. We need no longer seek attacks on SHF1. To validate it, and be
assured it is collision-resistant, we need only concentrate on validating shf1 and showing the latter
is collision-resistant.
This is one case of an important hash-function design principle called the MD paradigm [10, 3].
This paradigm shows how to transform a compression function into a hash function in such a way
that collision-resistance of the former implies collision-resistance of the latter. We are now going
to take a closer look at this paradigm.
12 HASH FUNCTIONS

H(K, M )
y pad(M )
Parse y as M1 k M2 k k Mn where |Mi | = b (1 i n)
V IV
for i = 1, . . . , n do
V h(K, Mi k V )
Return V

Adversary Ah (K)
Run AH (K) to get its output (x1 , x2 )
y1 pad(x1 ) ; y2 pad(x2 )
Parse y1 as M1,1 k M1,2 k k M1,n[1] where |M1,i | = b (1 i n[1])
Parse y2 as M2,1 k M2,2 k k M2,n[2] where |M2,i | = b (1 i n[2])
V1,0 IV ; V2,0 IV
for i = 1, . . . , n[1] do V1,i h(K, M1,i k V1,i1 )
for i = 1, . . . , n[2] do V2,i h(K, M2,i k V2,i1 )
if (V1,n[1] 6= V2,n[2] OR x1 = x2 ) return FAIL
if |x1 | =
6 |x2 | then return (M1,n[1] k V1,n[1]1 , M2,n[2] k V2,n[2]1 )
n n[1] // n = n[1] = n[2] since |x1 | = |x2 |
for i = n downto 1 do
if M1,i k V1,i1 6= M2,i k V2,i1 then return (M1,i k V1,i1 , M2,i k V2,i1 )

Figure 6.7: Hash function H defined from compression function h via the MD paradigm, and
adversary Ah for the proof of Theorem 6.5.2.

Let b be an integer parameter called the block length, and v another integer parameter called
the chaining-variable length. Let h: K {0, 1}b+v {0, 1}v be a family of functions that we call
the compression function. We assume it is collision-resistant.
Let B denote the set of all strings whose length is a positive multiple of b bits, and let D be
b
some subset of {0, 1}<2 .

Definition 6.5.1 A function pad: D B is called a MD-compliant padding function if it has the
following properties for all M, M1 , M2 D:
(1) M is a prefix of pad(M )
(2) If |M1 | = |M2 | then |pad(M1 )| = |pad(M2 )|
(3) If M1 6= M2 then the last block of pad(M1 ) is different from the last block of pad(M2 ).

A block, above, consists of b bits. Remember that the output of pad is in B, meaning is a sequence
of b-bit blocks. Condition (3) of the definition is saying that if two messages are different then,
when we apply pad to them, we end up with strings that differ in their final blocks.
An example of a MD-compliant padding function is shapad. However, there are other examples
as well.
Now let IV be a v-bit value called the initial vector. We build a family H: K D {0, 1}v
from h and pad as illustrated in Fig. 6.7. Notice that SHF1 is such a family, built from h = shf1
and pad = shapad. The main fact about this method is the following.
Bellare and Rogaway 13

Theorem 6.5.2 Let h: K{0, 1}b+v {0, 1}v be a family of functions and let H: KD {0, 1}v
be built from h as described above. Suppose we are given an adversary AH that attempts to find
collisions in H. Then we can construct an adversary Ah that attempts to find collisions in h, and
Advcr2-kk (A ) Advcr2-kk (A ) .
H H h h (6.9)
Furthermore, the running time of Ah is that of AH plus the time to perform (|pad(x1 )|+|pad(x2 )|)/b
computations of h where (x1 , x2 ) is the collision output by AH .

This theorem says that if h is collision-resistant then so is H. Why? Let AH be a practical adversary
attacking H. Then Ah is also practical, because its running time is that of AH plus the time to
do some extra computations of h. But since h is collision-resistant we know that Advcr2 -kk (A )
h h
cr2-kk
is low. Equation (6.9) then tells us that AdvH (AH ) is low, meaning H is collision-resistant as
well.

Proof of Theorem 6.5.2: Adversary Ah , taking input a key K K, is depicted in Fig. 6.7. It
runs AH on input K to get a pair (x1 , x2 ) of messages in D. We claim that if x1 , x2 is a collision
for HK then Ah will return a collision for hK .
Adversary Ah computes V1,n[1] = HK (x1 ) and V2,n[2] = HK (x2 ). If x1 , x2 is a collision for HK then
we know that V1,n[1] = V2,n[2] . Let us assume this. Now, let us look at the inputs to the application
of hK that yielded these outputs. If these inputs are different, they form a collision for hK .
The inputs in question are M1,n[1] k V1,n[1]1 and M2,n[2] k V2,n[2]1 . We now consider two cases. The
first case is that x1 , x2 have different lengths. Item (3) of Definition 6.5.1 tells us that M1,n[1] 6=
M2,n[2] . This means that M1,n[1] k V1,n[1]1 6= M2,n[2] k V2,n[2]1 , and thus these two points form a
collision for hK that can be output by Ah .
The second case is that x1 , x2 have the same length. Item (2) of Definition 6.5.1 tells us that y1 , y2
have the same length as well. We know this length is a positive multiple of b since the range of pad
is the set B, so we let n be the number of b-bit blocks that comprise y1 and y2 . Let Vn denote the
value V1,n , which by assumption equals V2,n . We compare the inputs M1,n kV1,n1 and M2,n kV2,n1
that under hK yielded Vn . If they are different, they form a collision for hK and can be returned
by Ah . If, however, they are the same, then we know that V1,n1 = V2,n1 . Denoting this value by
Vn1 , we now consider the inputs M1,n1 k V1,n2 and M2,n1 k V2,n2 that under hK yield Vn1 .
The argument repeats itself: if these inputs are different we have a collision for hK , else we can step
back one more time.
Can we get stuck, continually stepping back and not finding our collision? No, because y1 6= y2 .
Why is the latter true? We know that x1 6= x2 . But item (1) of Definition 6.5.1 says that x1 is a
prefix of y1 and x2 is a prefix of y2 . So y1 6= y2 .
We have argued that on any input K, adversary Ah finds a collision in hK exactly when AH finds a
collision in HK . This justifies Equation (6.9). We now justify the claim about the running time of
Ah . The main component of the running time of Ah is the time to run AH . In addition, it performs
a number of computations of h equal to the number of blocks in y1 plus the number of blocks in
y2 . There is some more overhead, but small enough to neglect.
14 HASH FUNCTIONS
Bibliography

[1] M. Bellare and P. Rogaway. Collision-Resistant Hashing: Towards Making UOWHFs


Practical. Advances in Cryptology CRYPTO 97, Lecture Notes in Computer Science
Vol. 1294, B. Kaliski ed., Springer-Verlag, 1997.

[2] M. Bellare and T. Kohno. Hash function balance and its impact on birthday attacks.
Advances in Cryptology EUROCRYPT 04, Lecture Notes in Computer Science Vol. 3027,
C. Cachin and J. Camenisch ed., Springer-Verlag, 2004.

[3] I. Damg ard. A Design Principle for Hash Functions. Advances in Cryptology CRYPTO
89, Lecture Notes in Computer Science Vol. 435, G. Brassard ed., Springer-Verlag, 1989.

[4] B. den Boer and A. Bosselaers, Collisions for the compression function of MD5. Advances
in Cryptology EUROCRYPT 93, Lecture Notes in Computer Science Vol. 765, T. Helleseth
ed., Springer-Verlag, 1993.

[5] H. Dobbertin, Cryptanalysis of MD4. Fast Software EncryptionCambridge Workshop,


Lecture Notes in Computer Science, vol. 1039, D. Gollman, ed., Springer-Verlag, 1996.

[6] H. Dobbertin, Cryptanalysis of MD5. Rump Session of Eurocrypt 96, May 1996,
http://www.iacr.org/conferences/ec96/rump/index.html.

[7] H. Dobbertin, A. Bosselaers, B. Preneel. RIPEMD-160, a strengthened version of


RIPEMD. Fast Software Encryption 96, Lecture Notes in Computer Science Vol. 1039,
D. Gollmann ed., Springer-Verlag, 1996.

[8] M. Naor and M. Yung, Universal one-way hash functions and their cryptographic appli-
cations. Proceedings of the 21st Annual Symposium on the Theory of Computing, ACM,
1989.

[9] National Institute of Standards. FIPS 180-2, Secure hash standard, August 2000. http://
csrc.nist.gov/publications/fips/fips180-2/fips180-2.pdf.

[10] R. Merkle. One way hash functions and DES. Advances in Cryptology CRYPTO 89,
Lecture Notes in Computer Science Vol. 435, G. Brassard ed., Springer-Verlag, 1989.

[11] R. Rivest, The MD4 message-digest algorithm, Advances in Cryptology CRYPTO 90,
Lecture Notes in Computer Science Vol. 537, A. J. Menezes and S. Vanstone ed., Springer-
Verlag, 1990, pp. 303311. Also IETF RFC 1320 (April 1992).

[12] R. Rivest, The MD5 message-digest algorithm. IETF RFC 1321 (April 1992).

You might also like