Dilithium Nist
Dilithium Nist
Dilithium Nist
Shi Bai, Léo Ducas, Eike Kiltz, Tancrède Lepoint, Vadim Lyubashevsky,
Peter Schwabe, Gregor Seiler and Damien Stehlé
1
Florida Atlantic University
2
CWI, The Netherlands
3
Ruhr Universität Bochum, Germany
4
Google, USA
5
IBM Research Europe, Switzerland
6
Max Planck Institute for Security and Privacy, Germany & Radboud University, The
Netherlands
7
IBM Research Europe and ETH, Switzerland
8
ENS de Lyon, France
October 1, 2020
The most up-to-date versions of our specification document and code can be
accessed at:
https://pq-crystals.org/dilithium/
and
https://github.com/pq-crystals/dilithium/tree/round3
2 CRYSTALS-Dilithium
1 Introduction
We present the digital signature scheme Dilithium, whose security is based on the hardness
of finding short vectors in lattices. Our scheme was designed with the following criteria in
mind:
Be conservative with parameters. Since we are aiming for long-term security, we have
analyzed the applicability of lattice attacks from a very favorable, to the attacker, viewpoint.
In particular, we are considering (quantum) algorithms whose space requirements are on
the same order as the time ones. Such algorithms are currently unrealistic, and there seem
to be serious obstacles in removing the space requirement, but we are allowing for the
possibility that improvements may occur in the future.
Minimize the size of public key + signature. Since many applications require the
transmission of both the public key and the signature (e.g. certificate chains), we designed
our scheme to minimize the sum of these parameters. Under the restriction that we avoid
(discrete) Gaussian sampling, to the best of our knowledge, Dilithium has the smallest
combination of signature and public key sizes of any lattice-based scheme with the same
security levels.
Be modular – easy to vary security. The two operations that constitute nearly the
entirety of the signing and verification procedures are expansion of an XOF (we use
SHAKE-128 and SHAKE-256), and multiplication in the polynomial ring Zq [X ]/(X n + 1).
Highly efficient implementations of our algorithm will therefore need to optimize these
operations and make sure that they run in constant time. For all security levels, our
scheme uses the same ring with q = 223 − 213 + 1 and n = 256. Varying security simply
involves doing more/less operations over this ring and doing more/less expansion of the
XOF. In other words, once an optimized implementation is obtained for some security
level, it is easy to obtain an optimized implementation for a higher/lower level.
Gen
01 A ← Rqk×`
02 (s1 , s2 ) ← Sη` × Sηk
03 t := As1 + s2
04 return (pk = (A, t), sk = (A, t, s1 , s2 ))
Sign(sk, M )
05 z := ⊥
06 while z = ⊥ do
07 y ← Sγ`1 −1
08 w1 := HighBits(Ay, 2γ2 )
09 c ∈ Bτ := H(M k w1 )
10 z := y + cs1
11 if kzk∞ ≥ γ1 − β or kLowBits(Ay − cs2 , 2γ2 )k∞ ≥ γ2 − β, then z := ⊥
12 return σ = (z, c)
Figure 1: Template for our signature scheme without public key compression.
Key Generation. The key generation algorithm generates a k × ` matrix A each of whose
entries is a polynomial in the ring Rq = Zq [X ]/(X n + 1). As previously mentioned, we will
always have q = 223 − 213 + 1 and n = 256. Afterwards, the algorithm samples random
secret key vectors s1 and s2 . Each coefficient of these vectors is an element of Rq with
small coefficients of size at most η. Finally, the second part of the public key is computed
as t = As1 + s2 . All algebraic operations in this scheme are assumed to be over the
polynomial ring Rq .
Verification. The verifier first computes w10 to be the high-order bits of Az − ct, and then
accepts if all the coefficients of z are less than γ1 − β and if c is the hash of the message and
w10 . Let us look at why verification works, in particular as to why HighBits(Az − ct, 2γ2 ) =
HighBits(Ay, 2γ2 ). The first thing to notice is that Az − ct = Ay − cs2 . So all we really
need to show is that
The reason for this is that a valid signature will have kLowBits(Ay − cs2 , 2γ2 )k∞ < γ2 − β.
And since we know that the coefficients of cs2 are smaller than β, we know that adding
cs2 is not enough to cause any carries by increasing any low-order coefficient to have
magnitude at least γ2 . Thus Eq. (1) is true and the signature verifies correctly.
1.2 Dilithium
The basic template in Fig. 1 is rather inefficient, as is. The most glaring (but trivially
fixed) inefficiency is that the public key consists of a matrix of k · ` polynomials, which
could have a rather large representation. The fix is simply to have A generated from some
seed ρ using SHAKE-128, and this is a standard technique. The public key is therefore
(ρ, t) and its size is dominated by t. The novelty of Dilithium over the previous schemes
(e.g. [BG14] and qTESLA [ABB+ 19], which is a particular instantiation of the [BG14]
framework) is that we also shrink the bit-representation size of t by a factor slightly larger
than two at the expense of increasing the signature by less than 100 bytes.
The main observation for obtaining this very favorable trade-off is that when the verifier
computes w10 in Line 13, the high-order bits of Az − ct do not depend too much on the
low order bits of t because t is being multiplied by a very low-weight polynomial c. In
our scheme, some low-order bits of t are not included in the public key, and so the verifier
cannot always correctly compute the high-order bits of Az − ct. To make up for this,
the signer includes some “hints” as part of the signature, which are essentially the carries
caused by adding in the product of c with the missing low-order bits of t. With this hint,
the verifier is able to correctly compute w10 .
Additionally, we give an option for making our scheme deterministic using the standard
technique of adding a seed to the secret key and using this seed together with the message
to produce the randomness y in Line 07. The recent result of Kiltz et al. [KLS18] showed
that the fewer different signatures the adversary sees for the same messages, the tighter the
reduction is in the quantum random oracle model between the signature scheme and the
underlying hardness assumptions. While it’s not clear as to whether there is an improved
quantum attack for randomized signatures, we suggest the deterministic version as the
default option except in scenarios where an adversary can mount side-channel attacks that
exploit determinism [SBB+ 18, PSS+ 18]. Our full scheme in Fig. 4 also makes use of basic
optimizations such as pre-hashing the message M so as to not rehash it with every signing
attempt.
r is such an element, then X 256 + 1 = (X − r)(X − r 3 ) · · · (X − r 511 ) and thus one can
equivalently represent any polynomial a ∈ Zq [X ]/(X 256 +1) in its CRT (Chinese Remainder
Theorem) form as (a(r), a(r 3 ), . . . , a(r 2n−1 )). The advantage of this representation is that
the product of two polynomials is coordinate-wise. Therefore the most expensive parts of
polynomial multiplication are the transformations a → â and the inverse â → a – these
are the NTT and inverse NTT operations.
The above-mentioned advantage of using the NTT representation is even more pro-
nounced due to the fact that when doing the multiplication Ay, one needs to compute
much fewer NTTs than the dimension of the matrix. Since the matrix is uniformly random,
one can generate (and store) it already in NTT form. Doing the multiplication of this
matrix by a vector y ∈ Rq5 then only involves doing 5 NTT computations to convert y into
the NTT representation1 , then doing pointwise multiplications to obtain Ay in NTT form,
followed by 6 inverse NTT multiplications.
The other major time-consuming operation is the expansion of a seed ρ into the
polynomial matrix A. The matrix A is needed for both signing and verification, therefore
a good implementation of SHAKE-128 is important for the efficiency of the scheme.
For our AVX2 optimized implementation of the NTT we take the approach of using
integer, rather than floating point, arithmetic. Although we pack only 4 coefficients into
one vector register of 256 bits, which is the same density that is also used by floating
point implementations, we can improve on the multiplication speed by about a factor of
2. We achieved this speed-up by carefully scheduling the instructions and interleaving
the multiplications and reductions during the NTT so that parts of the multiplication
latencies are hidden.
Security. Following the basic approach from [Lyu12], the security of the signature scheme
in Figure 1 can be proved, in the Random Oracle model (ROM), based on the hardness of
two problems. The first is the standard LWE (over polynomial rings) problem which asks
to distinguish (A, t := As1 + s2 ) from (A, u), where u is uniformly random. The other
problem is, what was called
the SelfTargetMSIS problem in [KLS18], which is the problem
z
of finding a vector c with small coefficients and a message (digest) µ satisfying
v
z
H µ k [ A | t | I] · c = c,
v
where A and t are uniformly random and I is the identity matrix. In the ROM, one can
get a (non-tight) reduction using the forking lemma [PS00, BN06] from the standard MSIS
problem of finding a z0 with small coefficients satisfying Az0 = 0 to SelfTargetMSIS. One
can follow this exact approach to prove Dilithium secure in the ROM based on the hardness
of MLWE and SIS.
In the Quantum Random Oracle model (QROM), where the adversary can query H in
superposition, the situation is a little different. It was shown in [KLS18] that Dilithium is
still based on MLWE and SelfTargetMSIS in the QROM, even with a tight reduction when
the scheme is deterministic. But one can no longer directly use the forking lemma (since
it’s a type of rewinding) to give a quantum reduction from MSIS to SelfTargetMSIS. There
are still good reasons to believe that the SelfTargetMSIS problem, and therefore Dilithium,
is secure in the QROM. Firstly, there are no natural signature schemes constructed from
Σ-protocols using the Fiat-Shamir transform that are secure in the ROM but not in the
QROM. Also, it is possible to set parameters of Dilithium (while leaving the design of the
1 The vectors, like y, which we multiply A with contain only coefficients and so cannot be generated
Table 1: Output sizes, security, and performance of Dilithium. The formulas for the sizes
of the public key and signature are given in Section 5.4. The numbers in parentheses for
SIS security are for the strongly-unforgeable (i.e. hard to come up with a second distinct
signature for an already-seen message/signature pair) version of the signature scheme,
where the coefficient length comes from Eq. (8), rather than Eq. (7) for the unparenthesized
version. Due to the rejection sampling, there is a noticeable difference between the median
and average signing times, and so we measure both.
2 Basic Operations
2.1 Ring Operations
We let R and Rq respectively denote the rings Z[X ]/(X n + 1) and Zq [X ]/(X n + 1), for q
an integer. Throughout this document, the value of n will always be 256 and q will be the
prime 8380417 = 223 − 213 + 1. Regular font letters denote elements in R or Rq (which
includes elements in Z and Zq ) and bold lower-case letters represent column vectors with
coefficients in R or Rq . By default, all vectors will be column vectors. Bold upper-case
letters are matrices. For a vector v, we denote by vT its transpose. The boolean operator
JstatementK evaluates to 1 if statement is true, and to 0 otherwise.
Modular reductions. For an even (resp. odd) positive integer α, we define r 0 = r mod± α
to be the unique element r 0 in the range − α2 < r 0 ≤ α2 (resp. − α−1
2 ≤ r ≤ 2 ) such that
0 α−1
r ≡ r mod α. We will sometimes refer to this as a centered reduction modulo α.2 For
0
any positive integer α, we define r 0 = r mod+ α to be the unique element r 0 in the range
0 ≤ r 0 < α such that r 0 ≡ r mod α. When the exact representation is not important, we
simply write r mod α.
We will write Sη to denote all elements w ∈ R such that kwk∞ ≤ η. We will write S̃η , to
denote the set {w mod± 2η : w ∈ R}. Note that Sη and S̃η are very similar except that S̃η
does not contain any polynomials with −η coefficients.
can be computed quickly with the help of the Fast Fourier Transform. Since X 256 + 1 =
X 256 − r 256 = (X 128 − r 128 )(X 128 + r 128 ) one can first compute the map
and then continue separately with the two reduced polynomials of degree less than
128 noting that X 128 + r 128 = X 128 − r 384 . The Fast Fourier Transform is also called
2 We draw the reader’s attention to the fact that for even α, the range includes α/2 but not −α/2. This
is a somewhat less standard choice, but defining things in this way makes some parts of the scheme (in
particular, the bit-packing of the public key) more efficient.
10 CRYSTALS-Dilithium
Number Theory Transform (NTT) in this case where the ground field is a finite field.
Natural fast NTT implementations do not output vectors with coefficients in the order
a(r), a(r 3 ), . . . , a(r 511 ). Therefore we define the NTT domain representation â = NTT(a) ∈
Z256
q of a polynomial a ∈ Rq to have coefficients in the order as output by our reference
NTT. Concretely,
where ri = r brv(128+i) with brv(k) the bitreversal of the 8 bit number k. With this notation,
and because of the isomorphism property, we have ab = NTT−1 (NTT(a)NTT(b)). For vectors
y and matrices A, the representations ŷ = NTT(y) and  = NTT(A) mean that every
polynomial yi and ai,j comprising y and A is in NTT domain representation. We give
further detail about our NTT implementations in Section 5.6.
2.3 Hashing
Our scheme uses several different algorithms that hash strings in {0, 1}∗ onto domains of
various forms. Below we give the high level descriptions of these functions and defer the
details of how exactly they are used in the signature scheme to Section 5.3.
Hashing to a Ball. Let Bτ denote the set of elements of R that have τ coefficients that
are either −1 or 1 and the rest are 0. We have |Bτ | = 2τ · 256
τ . For our signature scheme,
we will need a cryptographic hash function that hashes onto Bτ in two steps. The first
step consists of using a 2nd pre-image resistant cryptographic hash function to map {0, 1}∗
onto the domain {0, 1}256 . The second stage consists of an XOF (e.g. SHAKE-256) that
maps the output of the first stage to an element of Bτ . The reason for this two-step
process is that the output of the first stage allows for a very simple, compact description
of an element in Bτ (under the assumption that the XOF is indistinguishable from a
purely-random function). The algorithm we will use to create a random element in Bτ is
sometimes referred to as an “inside-out” version of the Fisher-Yates shuffle [Knu97], and
its high-level description is in Fig. 2.3 The XOF in the second stage is used to create the
randomness needed for this algorithm (in Steps 03 and 04) using the output of the first
stage as the seed.
SampleInBall(ρ)
01 Initialize c = c0 c1 . . . c255 = 00 . . . 0
02 for i := 256 − τ to 255
03 j ← {0, 1, . . . , i}
04 s ← {0, 1}
05 ci := cj
06 cj := (−1)s
07 return c
Figure 2: Create a random 256-element array with τ ±1’s and 256 − τ 00 s using the input
seed ρ (and an XOF) to generate the randomness needed in Steps 03 and 04.
Expanding the Matrix A. The function ExpandA maps a uniform seed ρ ∈ {0, 1}256 to
a matrix A ∈ Rqk×l in NTT domain representation. The matrix A is only needed for
multiplication. Hence, for the sake of faster implementations, the expansion function
ExpandA does not output A ∈ Rqk×l = (Zq [X ]/(X 256 + 1))k×l . Instead it outputs  ∈ Z256
q ,
3 Normally, the algorithm should begin at i = 0, but since there are 256 − τ 0’s, the first 256 − τ − 1
Sampling the vectors y. The function ExpandMask, used for deterministically generating
the randomness of the signature scheme, maps a seed ρ0 and a nonce κ to y ∈ S̃γl 1 .
Collision resistant hash. The function CRH used in our signature scheme is a collision
resistant hash function mapping to {0, 1}384 .
Lemma 1. Suppose that q and α are positive integers satisfying q > 2α, q ≡ 1 (mod α)
and α even. Let r and z be vectors of elements in Rq where kzk∞ ≤ α/2, and let h, h0
be vectors of bits. Then the HighBitsq , MakeHintq , and UseHintq algorithms satisfy the
following properties:
Lemma 3. Let (r1 , r0 ) = Decomposeq (r, α), (w1 , w0 ) = Decomposeq (r + s, α), and
ksk∞ ≤ β. Then
3 Signature
The Key Generation, Signing, and Verification algorithms for our signature scheme are
presented in Fig. 4. We present the deterministic version of the scheme in which the ran-
domness used in the signing procedure is generated (using SHAKE-256) as a deterministic
function of the message and a small secret key. Since our signing procedure may need to
be repeated several times until a signature is produced, we also append a counter in order
to make the SHAKE-256 output differ with each signing attempt of the same message.
Also due to the fact that each (possibly long) message may require several iterations to
sign, we compute an initial digest of the message using a collision-resistant hash function,
and use this digest in place of the message throughout the signing procedure.
As discussed in Section 1.2, the main design improvement of Dilithium over the scheme
in Fig. 1 is that the public key size is approximately halved at the expense of fewer than a
hundred extra bytes in the signature. To accomplish the size reduction, the key generation
algorithm outputs t1 := Power2Roundq (t, d) as the public key instead of t as in Fig. 1.
This means that instead of dlog qe bits per coefficient, the public key requires dlog qe − d
bits. In our instantiations, q ≈ 223 and d = 13, which means that instead of 23 bits in
each public key coefficient, there are instead 10.
CRYSTALS-Dilithium 13
Gen
01 ζ ← {0, 1}256
02 (ρ, ς, K ) ∈ {0, 1}256×3 := H(ζ) B H is instantiated as SHAKE-256 throughout
03 (s1 , s2 ) ∈ Sη` × Sηk := H(ς)
04 A ∈ Rqk×` := ExpandA(ρ) B A is generated and stored in NTT Representation as Â
05 t := As1 + s2 B Compute As1 as NTT−1 (Â · NTT(s1 ))
06 (t1 , t0 ) := Power2Roundq (t, d)
07 tr ∈ {0, 1}384 := CRH(ρ k t1 )
08 return (pk = (ρ, t1 ), sk = (ρ, K , tr, s1 , s2 , t0 ))
Sign(sk, M )
09 A ∈ Rqk×` := ExpandA(ρ) B A is generated and stored in NTT Representation as Â
10 µ ∈ {0, 1}384 := CRH(tr k M )
11 κ := 0, (z, h) := ⊥
12 ρ0 ∈ {0, 1}384 := CRH(K k µ) (or ρ0 ← {0, 1}384 for randomized signing)
13 while (z, h) = ⊥ do B Pre-compute ŝ1 := NTT(s1 ), ŝ2 := NTT(s2 ), and t̂0 := NTT(t0 )
14 y ∈ S̃γ`1 := ExpandMask(ρ0 , κ)
15 w := Ay B w := NTT−1 (Â · NTT(y))
16 w1 := HighBitsq (w, 2γ2 )
17 c̃ ∈ {0, 1}256 := H(µ k w1 )
18 c ∈ Bτ := SampleInBall(c̃) B Store c in NTT representation as ĉ = NTT(c)
19 z := y + cs1 B Compute cs1 as NTT−1 (ĉ · ŝ1 )
20 r0 := LowBitsq (w − cs2 , 2γ2 ) B Compute cs2 as NTT−1 (ĉ · ŝ2 )
21 if kzk∞ ≥ γ1 − β or kr0 k∞ ≥ γ2 − β, then (z, h) := ⊥
22 else
23 h := MakeHintq (−ct0 , w − cs2 + ct0 , 2γ2 ) B Compute ct0 as NTT−1 (ĉ · t̂0 )
24 if kct0 k∞ ≥ γ2 or the # of 1’s in h is greater than ω, then (z, h) := ⊥
25 κ := κ + `
26 return σ = (z, h, c̃)
Figure 4: The pseudo-code for deterministic and randomized versions of Dilithium. The
only difference between the two versions is in Line 12, where ρ0 is either a function of the
key and message, or is chosen completely at random.
14 CRYSTALS-Dilithium
The main complication due to not having the entire t in the public key is that the
verification algorithm is no longer able to exactly compute w10 in Line 13 in Fig. 1. In
order to do this, the verification algorithm will need the high order bits of Az − ct, but it
can only compute Az − ct1 · 2d = Az − ct + ct0 . Even though the high-order bits of ct0
are 0,its presence in the sum creates “carries” which may have an effect on the higher bits.
The signer thus sends these carries as a hint to the verifier. Heuristically, based on our
parameter choices, there should not be more than ω positions in which a carry is caused.
The signer therefore simply sends the positions in which these carries occur (these are the
extra bytes in Dilithium compared to the signature in Fig. 1), which allows the verifier to
compute the high order bits of Az − ct.
3.3 Correctness
In this section, we prove the correctness of the signature scheme.
If kct0 k∞ < γ2 , then by Lemma 1 we know that
Furthermore, because β is set such that kcs2 k∞ ≤ β and the signer checks in Line 21
that LowBitsq (w − cs2 , 2γ2 ) < γ2 − β, Lemma 2 implies that
HighBitsq (w − cs2 , 2γ2 ) = HighBitsq (w − cs2 + cs2 , 2γ2 ) = HighBitsq (w, 2γ2 ) = w1 . (3)
Therefore, the w10 computed by the verifier in the input to the hash function is the
same as the w1 of the signer. And thus the verification procedure will always accept.
where we used the fact that our values of γ1 are large compared to 1/2.
We now move to computing the probability that we have
If we (heuristically) assume that the low order bits are uniformly distributed modulo 2γ2 ,
then there is a
256·k
2(γ2 − β) − 1
≈ e−256·βk/γ2
2γ2
probability that all the coefficients are in the good range (using the fact that our values
of β are large compared to 1/2). Therefore, the probability that Step 21 passes is
It is more difficult to formally compute the probability that Step 24 results in a restart.
The parameters were set such that heuristically (z, h) = ⊥ with probability between 1 and
2%. Therefore the vast majority of the loop repetitions will be caused by Step 21.
We would like to stress that the expected number of iterations is independent of the
secret key s1 , s2 and therefore no information about them can be gained by an attack that
counts the iterations.
Table 3: Challenge and Extended Security Parameters for Dilithium. The numbers in
parentheses for SIS security are for the strongly-unforgeable (i.e. hard to come up with
a second distinct signature for an already-seen message/signature pair) version of the
signature scheme.
of any symmetric primitives (e.g. PRFs or collision-resistant hash functions) that are
used inside our protocol, but only increase the security of the underlying lattice problems.
The goal of these parameter sets is to show what the parameters would be should there
be a moderate improvement in lattice cryptanalysis. In terms of Core-SVP hardness,
the 5++ level has about twice the required security of NIST level 3, and is therefore a
good representative of what the parameters may look like if moderate improvements in
lattice-based cryptanalysis necessitate a change. Such a change would be recommended
in case some future cryptanalytic effort begins to threaten the security of the level 3
parameter set.
5 Implementation Details
5.1 Alternative way of decomposing and computing the hints
In the pseudo-code for signing in Figure 4, the decomposition function Decompose is called
3 times per iteration of the rejection sampling loop. We have presented the algorithm
in this way for the ease of exposition and for compatibility with the security proofs
in [KLS18, DKL+ 18]. In the implementation, we use an alternative and faster way of
computing the hint vector h and the rejection condition based on the low part w0 of w.
First note that Lemma 3 says that instead of computing (r1 , r0 ) = Decomposeq (w − cs2 , α)
and checking whether kr0 k∞ < γ2 − β and r1 = w1 , it is equivalent to just check that
kw0 − cs2 k∞ < γ2 − β, where w0 is the low part of w. If this check passes, w0 − cs2
is the low part of w − cs2 . Next, recall that by the definition of the MakeHint function,
a coefficient of a polynomial in h as computed in Figure 4 is non-zero precisely if the
high parts of the corresponding coefficients of w − cs2 and w − cs2 + ct0 differ. Now, we
have already computed the full decomposition w = αw1 + w0 of w, and we know that
αw1 + (w0 − cs2 ) is the correct decomposition of w − cs2 . But then, αw1 + (w0 − cs2 + ct0 )
is the correct decomposition of w − cs2 + ct0 (i.e. the high part is w1 ) if and only if each
coefficient of w0 − cs2 + ct0 lies in the interval (−γ2 , γ2 ], or, when some coefficient is −γ2
and the corresponding coefficient of w1 is zero. The last condition is due to the border case
in the Decompose function. On the other hand, if these conditions are not true, then the
high parts must differ and it then follows that for computing the hint vector h it suffices
to just check these conditions on the coefficients of w0 − cs2 + ct0 .
5.2 Bit-packing
We now describe how we encode vectors as byte strings. This is needed for absorbing
them into SHAKE and defining the data layout of the keys and signature. To reduce the
computation time spent on SHAKE and the sizes of keys and signatures, we use bit-packing.
The general rule that we will follow is that if the range what a element x is in consists
exclusively of non-negative integers, then we simply pack the integer x. If x is from a range
[−a, b] that may contain some negative integers, then we pack the positive integer b − x.
We begin with the vector w1 . It consists of k polynomials w1,1 , . . . , w1,k in Rq with
coefficients that are roundings of elements in Zq with respect to α = 2γ2 . It follows that
its coefficients lie in {0, . . . , 15} for NIST levels 3 and 5, and in {0, . . . , 43} for NIST level
2. These can be represented by 4 and, respectively, 6, bits each. This allows w1 to be
packed in a string of k · 256 · 4/8 = k · 128, respectively k · 192 bytes. For NIST security
levels 3 and 5, each byte encodes two consecutive coefficients of a polynomial w1,i in its
low 4 bits and high 4 bits, respectively. See Figure 5 for an explanation of the exact bit
packing. For NIST security level 2, the packing looks as in Figure 6.
Next we turn to the vector t1 , which is the power-of-two rounding of t. Note that
q − 1 = 223 − 213 = (210 − 1)213 , which shows that the coefficients of the k polynomials
18 CRYSTALS-Dilithium
c1 c2 c3 c4 ... c5 c6 c c c 1 2 3
...
c Byte 1c
1 2 c Byte 2c 3 c Byte 3c 4
... 5 6 c Bytec 1 c 1 2 3
...
Figure 5: Bit-packing w for NIST levels 3 and 5. The k polynomials comprising w are
w , . . . , w and
1,1 1,k c we let
Byte 1
1
1 c , . . .c, c Byte
1 be the c
2 coefficients
2 256 Byte 3
. lower
of wc3 (with the . . powers first),
4
1,1 Byte 1 c
1
......
257 512 1,2
c c c c c c c c c c
1
cByte 11
2
c Byte 2 c
3
2
4
Bytec3
...
5
3
. . . 6
4 Byte 1 c
1 2 3 4
Figure 7: Bit-packing t1 . The k polynomials comprising t1 are t1,1 , . . . , t1,k and we let
c1 , . . . , c256 be the coefficients of t1,1 (with the lower powers first), c257 , . . . , c512 be the
coefficients of t1,2 , etc.
The coefficients of the polynomials of t0 can be written in the form 212 − v with
v ∈ {0, . . . , 213 − 1}. These v in little endian byte-order are bit-packed. This results in
256 · 13/8 bytes per polynomial and k · 256 · 13/8 = 416k bytes for t0 . See Figure 8 for an
explanation of the exact bit packing.
The polynomials in s1 and s2 have coefficients with infinity norm at most η. So every
coefficient of these polynomials is equivalent modulo q to η−c with some c ∈ {0, . . . , 2η}. In
the bit packing the values for c are stored so that each polynomial needs 256dlog(2η + 1)e/8
bytes. This amounts to 256 · 3/8 = 96 bytes for NIST levels 2 and 5, and 256 · 4/8 = 128
bytes for level 3. The bit-packing is done similarly to the case of w1 , t1 and t0 . See Figure
9 for an explanation of the exact bit packing.
Finally, z contains polynomials whose coefficients are equivalent modulo q to γ1 − c
with c ∈ {0, . . . , 2γ1 − 1} and these values c are bit packed. For NIST levels 3 and 5,
bit-packing z requires l · 256 · 20/8 = 640l bytes and blocks of 2 coefficients are stored in 5
consecutive bytes. See Figure 10 for an explanation of the exact bit packing. For NIST
level 2, the bit-packing of z requires l · 256 · 18/8 = 576l bytes
... c1 c2 c3 c4 c5 c6 c7 c8 ...
... ...
CRYSTALS-Dilithium
Byte 1 Byte 2 Byte 3 19
... c1 c2 ...
... ...
Byte 1 Byte 2 Byte 3
Figure 8: Bit-packing t0 . The k polynomials comprising t0 are t0,1 , . . . , t0,k with ci defined
... ...
c6 ... cByte
1 1c2 c3 Byte
c4 2 c5 c6 Bytec73 c
8 ...
... ...
3 Byte 1 Byte 2 Byte 3
Figure 9: Bit-packing si . The ` polynomials comprising s1 are s1,1 , . . . , s1,` and we let
4
... η−c1 , . . . , η−c256 be the coefficients of s1,1 (with the lower powers first), η−c257 , . . . , η−c512
be the coefficients of s1,2c,1etc. where ci ∈ {0, . . . , 2η}.cThe ...
k polynomials comprising s2
2
... are s2,1 , . . . , s2,` and we let η − c1 , . . . , η − c256 be the coefficients of s2,1 (with the lower
...
powers first), c257 , . . . , c512 be the coefficients of s2,2 , etc. ci ∈ {0, . . . , 2η}. The above
3 picture is for parameter sets where ci requires three bits per coefficient. Packing four bits
per coefficient is Byte done 1
in the obviousByte 2 and looks
manner Byte
like3the packing of wi in Figure 5.
5.3 Hashing
c3 . Hashing
. . to a Ball. The function
...
c1 H in Figure 4 absorbs the concatenation of µ and w1
... ...
into SHAKE-256, and produces a 32-byte output c̃. We now precisely specify the operation
of the function SampleInBall : c̃ 7→ c ∈ Bτ described in Figure 2 as it is used in our
signature scheme. The SampleInBall routine absorbs the 32 bytes of c̃ into SHAKE-256.
3 Byte 1 Byte 2 Byte 3
Throughout its operations the function squeezes SHAKE-256 in order to obtain a stream
of random bytes of variable length. The first τ bits in the first 8 bytes of this random
stream are interpreted as τ random sign bits si ∈ {0, 1}, i = 0, . . . , τ − 1. The remaining
64 − τ bits are discarded. In each iteration of the for loop in the Algorithm in Figure 2,
we use rejection sampling on elements from {0, . . . , 255} until we obtain a j ∈ {0, . . . , i}.
An element in {0, . . . , 255} is obtained by interpreting the next byte of the random stream
from SHAKE-256 as a number in this set. For the sign s the corresponding si−196 is used.
Expanding the Matrix A. The function ExpandA maps a uniform seed ρ ∈ {0, 1}256 to a
matrix A ∈ Rqk×l in NTT domain representation. It computes each coefficient âi,j ∈ Rq of
 separately. For the coefficient âi,j it absorbs the 32 bytes of ρ immediately followed by
two bytes representing 0 ≤ 256 ∗ i + j < 216 in little-endian byte order into SHAKE-128.
In the AES variant ρ is used as the key and 256i + j is zero-padded to a 12-byte nonce
for AES in counter mode. The output stream of SHAKE-128 or AES256ctr is interpreted
as a sequence of integers between 0 and 223 − 1. This is done by setting the highest bit
of every third byte to zero and interpreting blocks of 3 consecutive bytes in little endian
byte order. So for example the three bytes b0 , b1 and b2 are used to get the integer
0 ≤ b20 · 216 + b1 · 28 + b0 ≤ 223 − 1 where b20 is the logical AND of b2 and 2128 − 1. Finally,
ExpandA performs rejection sampling on these 23-bit integers to sample the 256 coefficients
... c1 c2 ...
... ...
20 Byte 1 Byte 2 Byte 3 CRYSTALS-Dilithium
c3 ... c1 ...
... ...
Byte 1 Byte 2 Byte 3
Figure 10: Bit-packing z. The ` polynomials comprising z are z1 , . . . , z` and we let γ1 −
c1 , . . . , γ1 −c256 be the coefficients of z1 (with the lower powers first), γ1 −c257 , . . . , γ1 −c512
be the coefficients of z2 , etc. In the above, each coefficient of z uses 20 bits. The packing
for level 2, where each coefficient is 18 bits looks analogously.
ai,j (r0 ), ai,j (−r0 ), . . . , ai,j (r127 )ai,j (−r127 ) of âi,j uniformly from the set {0, . . . , q − 1} in
the order of our NTT domain representation.
Sampling the vectors y. The function ExpandMask maps (ρ0 , κ) to y ∈ S̃γl 1 , where κ ≥ 0,
and works as follows. It computes each of the l coefficients of y, which are polynomials
in S̃γ1 , independently. For the i-th polynomial, 0 ≤ i < l, it absorbs the 48 bytes of ρ0
concatenated with the 2 bytes representing κ+i in little endian byte order into SHAKE-256.
In the AES variant the first 32 bytes of ρ0 are used as the key and κ + i is zero-padded
to a 12 byte nonce for AES in counter mode. Then the output bytes are used to create
a positive number in the range {0, . . . , 2γ1 − 1}, and then the integers comprising y are
then obtained by subtracting (γ1 − 1) from this positive number. Because γ1 is a power of
2, no rejection sampling is required. Because γ1 is 17 and 19, for different security levels,
generating the consecutive integers comprising y involves either taking 18 or 20 bits at a
time from the byte stream.
Collision resistant hash. The function CRH in Figure 4 is a collision resistant hash
function. For this purpose 384 bits of the output of SHAKE-256 are used. CRH is called
with 3 different inputs. First it is called with ρ k t1 . The function then absorbs the 32
bytes of ρ followed by the k · 256 · 9/8 bytes of the bit-packed representation of t1 into
SHAKE-256 and takes the first 48 bytes of the first output block of SHAKE-256 as the
output hash. The second input is tr k M . Here the concatenation of the hash tr and the
message string M are absorbed into SHAKE-256 and the first 48 output bytes are used as
the resulting hash. The third input K k µ is handled in the same way.
Secret key. The secret key contains ρ, K , tr, s1 , s2 and t0 and is also stored as the
concatenation of the bit-packed representation of these quantities in the given order.
Consequently, a secret key requires 32 + 32 + 48 + 32((k + `) · dlog(2η + 1)e + 13k) bytes.
As mentioned previously, the signer also has the option of simply storing the 32-byte value
ζ and then simply re-deriving all the other elements of the secret key.
Signature. The signature byte string is the concatenation of a bit packed representation
of z and encodings of h and c in this order. We describe the encoding of h, which needs
ω + k bytes. Together all the polynomials in the vector h have at most ω non-zero
CRYSTALS-Dilithium 21
coefficients. It is sufficient to store the locations of these non-zero coefficients. Each of the
first ω bytes of the byte string representing h is the index i of the next non-zero coefficient
in its polynomial, i.e. 0 ≤ i ≤ 255, or zero if there are no more non-zero coefficients. The
bytes numbers ω up to ω + k − 1 record the k positions j of the polynomial boundaries in
the string of ω coefficient indices, where 0 ≤ j ≤ ω. The challenge c̃ has 32 bytes.
Therefore, a signature requires 32l · (1 + log γ1 ) + ω + k + 32 bytes.
We make heavy use of lazy reduction in our implementation. In the NTT we do not
reduce the results of additions and subtractions at all. For rounding and norm checking
it is important to map to standard representatives. This freezing of the coefficients is
achieved in constant-time by conditionally subtracting q with another instance of the
arithmetic right shift trick.
6 Security Reductions
The standard security notion for digital signatures is UF-CMA security, which is security
under chosen message attacks. In this security model, the adversary gets the public key and
has access to a signing oracle to sign messages of his choice. The adversary’s goal is to come
up with a valid signature of a new message. A slightly stronger security requirement that
is sometimes useful is SUF-CMA (Strong Unforgeability under Chosen Message Attacks),
which also allows the adversary to win by producing a different signature of a message
that he has already seen.
It can be shown that in the (classical) random oracle model, Dilithium is SUF-CMA
secure based on the hardness of the standard MLWE and MSIS lattice problems. The
reduction, however, is not tight. Furthermore, since we also care about quantum attackers,
we need to consider the security of the scheme when the adversary can query the hash
function on a superposition of inputs (i.e. security in the quantum random oracle model –
QROM). Since the classical security proof uses the “forking lemma” (which is essentially
rewinding), the reduction does not transfer over to the quantum setting.
There are no counter-examples of schemes whose security is actually affected by the
non-tightness of the reduction. For example, schemes like Schnorr signatures [Sch89], GQ
signatures [GQ88], etc. all set their parameters ignoring the non-tightness of the reduction.
Furthermore, the only known uses of the additional power of quantum algorithms against
schemes whose security is based on quantum-resistant problems under a classical reduction
involve “Grover-type” algorithms that improve exhaustive search (although it has been
shown that there cannot be a “black-box” proof that the Fiat-Shamir transform is secure
in the QROM [ARU14]).
The reason that there haven’t been any attacks taking advantage of the non-tightness
of the reduction is possibly due to the fact that there is an intermediate problem which is
tightly equivalent, even under quantum reductions, to the UF-CMA security of the signature
scheme. This problem is essentially a “convolution” of the underlying mathematical
problem (such as MSIS or discrete log) with a cryptographic hash function H. It would
appear that as long as there is no relationship between the structure of the math problem
and H, solving this intermediate problem is not easier than solving the mathematical
problem.4
Below, we will introduce the assumptions upon whose hardness the SUF-CMA security
of our scheme is based. The first two assumptions, MLWE and MSIS, are standard lattice
problems which are a generalization of LWE, Ring-LWE, SIS, and Ring-SIS. The third
problem, SelfTargetMSIS is the aforementioned problem that’s based on the combined
hardness of MSIS and the hash function H. In the classical ROM, there is a (non-tight)
reduction from MSIS to SelfTargetMSIS.
6.1 Assumptions
The MLWE Problem. For integers m, k, and a probability distribution D : Rq → [0, 1],
we say that the advantage of algorithm A in solving the decisional MLWEm,k,D problem
over the ring Rq is
m,k,D := Pr[b = 1 | A ← Rq
AdvMLWE m×k
; t ← Rqm ; b ← A(A, t)]
− Pr[b = 1 | A ← Rqm×k ; s1 ← D k ; s2 ← D m ; b ← A(A, As1 + s2 )] .
4 In the ROM, there is indeed a (non-tight) reduction using the forking lemma that states that solving
AdvMSIS
m,k,γ (A) := Pr 0 < kyk∞ ≤ γ ∧ [ I | A ] · y = 0 | A ← Rq
m×k
; y ← A(A) .
AdvSelfTargetMSIS
H,m,k,γ (A) :=
0 ≤ kyk∞ ≤ γ
r
Pr A← Rqm×k ; y := ,µ ← A|H(·)i
(A) .
∧ H(µ k [ I | A ] · y) = c c
Classical Hardness of SelfTargetMSIS. We now sketch the classical reduction from MSIS
to SelfTargetMSIS. In this scenario, the attacker A is classical and only has classical access
to the function H modeled as a random oracle. This reduction is a standard application of
rewinding / forking lemma, and therefore we only give a sketch.
Suppose A makes queries H(µ1 k w1 ), . . . , H0 (µ k k wk ) and receives randomly-chosen
r
responses c1 , . . . , ck , and finally outputs µi , y = satisfying H(µi k [ I | A ] · y) = ci
ci
for some 1 ≤ i ≤ k. The reduction then rewinds A to the point where he made the query
H(µi k wi ) and reprograms the response to another randomly-chosen ci0 . The forking
lemma then states that the successful A has 0 a ≈ 1/k chance to create a forgery on the
r
same i. That is, he will output µi , y = 0 satisfying H(µi k [ I | A ] · y0 ) = ci0 . We
0
ci
therefore have that [ I | A ] · (y − y0 ) = 0 and y − y0 6= 0 because ci 6= ci0 . Furthermore,
all the coefficients of yi − yi0 are small, thus giving a solution to MSIS.
• kzk∞ < γ1 − β
• # of 1’s in h is ≤ ω
where kuk∞ ≤ 2γ2 + 1. Furthermore, only ω coefficients of u will have magnitude greater
than γ2 . If we write t = t1 · 2d + t0 where kt0 k∞ ≤ 2d−1 , then we can rewrite Eq. (9) as
For simplicity of notation, we can define the function H0 such that H(µ k x) = H0 (µ k
2γ2 x), and so Eq. (11) becomes
z
H0 µ k [ A | t | Ik ] · c = c.8 (12)
u0
6 It was also shown in [KLS18] that the “deterministic” part of the requirement can be relaxed. The
security proof simply loses a factor of the number of different signatures produced per message in its
tightness. Thus, for example, if one were to implement the signature scheme (with the same secret key) on
several devices with different random-number generators, the security of the scheme would not be affected
much.
7 In that paper, it is actually proved that the underlying zero-knowledge proof is zero-knowledge and
then the security of the signature scheme follows via black box transformations.
8 Notice that the difference between H and H0 is just a change in the format of the input.
26 CRYSTALS-Dilithium
Since (A, t) is completely random, this is exactly the definition of the SelfTargetMSIS
problem from above.
A standard forking lemma argument sketched in Section 6.1 can be used to show that
an adversary solving the above problem in the (standard) random oracle model can be
used to solve the MSIS problem. While giving a reduction using the forking lemma is a
good “sanity check”, it is not particularly useful for setting parameters due to its lack of
tightness. So how does one set parameters? The Fiat-Shamir transform has been used
for over 3 decades (and we have been aware of the non-tightness of the forking lemma for
two of them), yet the parameter settings for schemes employing it have ignored this loss
in tightness. Implicitly, therefore, these schemes rely on the exact hardness of analogues
(based on various assumptions such as discrete log [Sch89], one-wayness of RSA [GQ88],
etc.) of the problem in Eq. (12).
The intuition for the security of the problem in Eq. (12) (and its discrete log, RSA,
etc. analogues) is as follows: since H is a cryptographic hash function whose structure is
completely independent of the algebraic structure of its inputs, choosing some µ “strategi-
cally” should not help – so the problem would be equally hard if the µ were fixed. Then,
again relying on the independence of H0 and the algebraic structure of its inputs, the only
approach for obtaining a solution appears to be picking some w, computing H0 (µ k w) = c,
and then finding z, u0 such that Az + u0 = w + ct.9 The hardness of finding such z, u0
with `∞ -norms less than in Eq. (7) such that
Az + u0 = t0 (13)
for some t0 is the problem whose concrete security we will be analyzing. Note that this
bound is somehwat conservative because in Eq. (12) kzk∞ < γ1 − β τ · 2d−1 + 2γ2 + 1.
Furthermore, only ω coefficients of u0 can be larger than τ · 2d−1 + γ2 . Thus most of
the coefficients of the solution (z, u0 ) to Eq. (13) generated by a real forgery must have
coefficients noticeably smaller than the maximum in Eq. (7).
By Lemma 1, the above equality can be shown to imply that there exist kzk∞ ≤ 2(γ1 − β)
and kuk∞ ≤ 4γ2 + 2 such that Az + u = 0.
the two instances is slightly different and we analyze the MLWE problem in Appendix C.2
and the MSIS problem (as well as SelfTargetMSIS) in Appendix C.3.
We first follow the core-SVP hardness methodology from [ADPS16], based on the
asymptotic costs of sieving [BDGL16a, Laa15]. It is significantly more conservative than
prior ones used in lattice-based cryptography. We describe how we use it for MLWE and
MSIS in sections C.2 and C.3. However, recent progresses in sieving in practice seems to
mandate more refined estimates; we sketch such refinements in Section C.5 (the full details
are provided in the Kyber Round 3 specification document). It expresses security in terms
of the number of gates required for the computation, in the RAM model; because the
algorithm of [ADPS16] makes random access to an exponentially large amount of memory,
it is quite likely that any realistic implementation will be significantly more expensive.
While the MLWE and MSIS problems are defined over polynomial rings, we do not
currently have any way of exploiting this ring structure, and therefore the best attacks are
mounted by simply viewing these problems as LWE and SIS problems. The LWE and SIS
problems are exactly as in the definitions of MLWE and MSIS in Section 6.1 with the ring
Rq being replaced by Zq .
expense of more repetitions.10 Because the increase in running time is rather dramatic
(e.g. halving both γi would end up squaring the number of required repetitions as per
Eq. (5)), we recommend increasing (k, `) when needing to “substantially” increase security.
Changing the γi should be reserved only for slight “tweaks” in the security levels.
References
[ABB+ 19] Erdem Alkim, Paulo S. L. M. Barreto, Nina Bindel, Patrick Longa, and
Jefferson E. Ricardini. The lattice-based digital signature scheme qtesla.
Cryptology ePrint Archive, Report 2019/085, 2019. https://eprint.iacr.
org/2019/085. 5
[ADH+ 19] Martin R Albrecht, Léo Ducas, Gottfried Herold, Elena Kirshanova, Ea-
monn W Postlethwaite, and Marc Stevens. The general sieve kernel and
new records in lattice reduction. In Annual International Conference on
the Theory and Applications of Cryptographic Techniques, pages 717–746.
Springer, 2019. 37
[ADPS16] Erdem Alkim, Léo Ducas, Thomas Pöppelmann, and Peter Schwabe. Post-
quantum key exchange – a new hope. In Proceedings of the 25th USENIX
Security Symposium, pages 327–343. USENIX Association, 2016. http://
cryptojedi.org/papers/#newhope. 21, 22, 27, 35, 37, 38
[AG11] Sanjeev Arora and Rong Ge. New algorithms for learning in presence of
errors. In ICALP (1), pages 403–415, 2011. 36
[AGVW17] Martin R Albrecht, Florian Göpfert, Fernando Virdia, and Thomas Wunderer.
Revisiting the expected cost of solving uSVP and applications to LWE.
In Tsuyoshi Takagi and Thomas Peyrin, editors, Advances in Cryptology –
ASIACRYPT 2017, volume 10211 of LNCS, pages 65–102. Springer, 2017.
https://eprint.iacr.org/2017/815. 37
[APS15] Martin R. Albrecht, Rachel Player, and Sam Scott. On the concrete hardness
of learning with errors. J. Mathematical Cryptology, 9(3):169–203, 2015.
https://eprint.iacr.org/2015/046. 37
[ARU14] Andris Ambainis, Ansis Rosmanis, and Dominique Unruh. Quantum attacks
on classical proof systems: The hardness of quantum rewinding. In FOCS,
pages 474–483, 2014. 23
[BDGL16a] Anja Becker, Léo Ducas, Nicolas Gama, and Thijs Laarhoven. New directions
in nearest neighbor searching with applications to lattice sieving. In SODA ’16
Proceedings of the twenty-seventh annual ACM-SIAM symposium on Discrete
Algorithms, pages 10–24. SIAM, 2016. https://eprint.iacr.org/2015/
1128. 27, 37
[BDGL16b] Anja Becker, Léo Ducas, Nicolas Gama, and Thijs Laarhoven. New directions
in nearest neighbor searching with applications to lattice sieving. In SODA,
pages 10–24. SIAM, 2016. 33
10 The two changes in this paragraph may require lowering d so as to keep the value kct k
0 ∞ smaller
than γ2 with high probability. Lowering d will increase the size of the public key.
CRYSTALS-Dilithium 29
[BG14] Shi Bai and Steven D. Galbraith. An improved compression technique for
signatures based on learning with errors. In CT-RSA, pages 28–47, 2014. 3,
5, 25, 33
[BHLY16] Leon Groot Bruinderink, Andreas Hülsing, Tanja Lange, and Yuval Yarom.
Flush, gauss, and reload - A cache attack on the BLISS lattice-based signature
scheme. In CHES, pages 323–345, 2016. 3
[BKW03] Avrim Blum, Adam Kalai, and Hal Wasserman. Noise-tolerant learning, the
parity problem, and the statistical query model. J. ACM, 50(4):506–519,
2003. 36
[BN06] Mihir Bellare and Gregory Neven. Multi-signatures in the plain public-key
model and a general forking lemma. In ACM Conference on Computer and
Communications Security, pages 390–399, 2006. 6
[BS16] Jean-François Biasse and Fang Song. Efficient quantum algorithms for com-
puting class groups and solving the principal ideal problem in arbitrary degree
number fields. In SODA, pages 893–902. SIAM, 2016. 36
[CDPR16] Ronald Cramer, Léo Ducas, Chris Peikert, and Oded Regev. Recovering
short generators of principal ideals in cyclotomic rings. In EUROCRYPT (2),
volume 9666 of Lecture Notes in Computer Science, pages 559–585. Springer,
2016. 36
[CDW17] Ronald Cramer, Léo Ducas, and Benjamin Wesolowski. Short Stickelberger
class relations and application to ideal-SVP. In EUROCRYPT (1), volume
10210 of Lecture Notes in Computer Science, pages 324–348, 2017. 36
[CGS14] Peter Campbell, Michael Groves, and Dan Shepherd. Soliloquy: A cautionary
tale. In ETSI 2nd Quantum-Safe Crypto Workshop, pages 1–9, 2014. 36
[CN11a] Yuanmi Chen and Phong Q. Nguyen. BKZ 2.0: Better lattice security
estimates. In ASIACRYPT, volume 7073 of Lecture Notes in Computer
Science, pages 1–20. Springer, 2011. 33
[CN11b] Yuanmi Chen and Phong Q. Nguyen. BKZ 2.0: Better lattice security esti-
mates. In Dong Hoon Lee and Xiaoyun Wang, editors, Advances in Cryptology
– ASIACRYPT 2011, volume 7073 of LNCS, pages 1–20. Springer, 2011. http:
//www.iacr.org/archive/asiacrypt2011/70730001/70730001.pdf. 37
[DDLL13] Léo Ducas, Alain Durmus, Tancrède Lepoint, and Vadim Lyubashevsky.
Lattice signatures and bimodal gaussians. In CRYPTO (1), pages 40–56,
2013. 3
[DFMS19] Jelle Don, Serge Fehr, Christian Majenz, and Christian Schaffner. Security of
the fiat-shamir transformation in the quantum random-oracle model. Cryp-
tology ePrint Archive, Report 2019/190, 2019. https://eprint.iacr.org/
2019/190. 7
[DKL+ 18] Léo Ducas, Eike Kiltz, Tancrède Lepoint, Vadim Lyubashevsky, Peter
Schwabe, Gregor Seiler, and Damien Stehlé. Crystals-dilithium: A lattice-
based digital signature scheme. IACR Trans. Cryptogr. Hardw. Embed. Syst.,
2018(1):238–268, 2018. 7, 17
[DLP14] Léo Ducas, Vadim Lyubashevsky, and Thomas Prest. Efficient identity-based
encryption over NTRU lattices. In ASIACRYPT, pages 22–41, 2014. 3
30 CRYSTALS-Dilithium
[DSDGR20] Dana Dachman-Soled, Léo Ducas, Huijing Gong, and Mélissa Rossi. Lwe with
side information: Attacks and concrete security estimation. 2020. https:
//eprint.iacr.org/2020/292.pdf. 37, 38
[Duc18] Léo Ducas. Shortest vector from lattice sieving: a few dimensions for free. In
Annual International Conference on the Theory and Applications of Crypto-
graphic Techniques, pages 125–145. Springer, 2018. 37, 38
[EFGT17] Thomas Espitau, Pierre-Alain Fouque, Benoît Gérard, and Mehdi Tibouchi.
Side-channel attacks on BLISS lattice-based signatures: Exploiting branch
tracing against strongswan and electromagnetic emanations in microcon-
trollers. In CCS, pages 1857–1874, 2017. 3
[HPS11] Guillaume Hanrot, Xavier Pujol, and Damien Stehlé. Algorithms for the
shortest and closest lattice vector problems. In IWCC, volume 6639 of Lecture
Notes in Computer Science, pages 159–190. Springer, 2011. 33
[KF17] Paul Kirchner and Pierre-Alain Fouque. Revisiting lattice attacks on over-
stretched NTRU parameters. In EUROCRYPT (1), volume 10210 of Lecture
Notes in Computer Science, pages 3–26, 2017. 36
[KLS18] Eike Kiltz, Vadim Lyubashevsky, and Christian Schaffner. A concrete treat-
ment of fiat-shamir signatures in the quantum random-oracle model. In
EUROCRYPT, pages 552–586, 2018. 5, 6, 7, 17, 24, 25
[LZ19] Qipeng Liu and Mark Zhandry. Revisiting post-quantum fiat-shamir. Cryp-
tology ePrint Archive, Report 2019/262, 2019. https://eprint.iacr.org/
2019/262. 7
[MLB17] Artur Mariano, Thijs Laarhoven, and Christian Bischof. A parallel variant
of LDSieve for the SVP on lattices. In 2017 25th Euromicro International
Conference on Parallel, Distributed and Network-based Processing (PDP),
pages 23–30. IEEE, 2017. 37
[PBY17] Peter Pessl, Leon Groot Bruinderink, and Yuval Yarom. To BLISS-B or not
to be: Attacking strongswan’s implementation of post-quantum signatures.
In CCS, pages 1843–1855, 2017. 3
CRYSTALS-Dilithium 31
[PS00] David Pointcheval and Jacques Stern. Security arguments for digital signatures
and blind signatures. J. Cryptology, 13(3):361–396, 2000. 6
[PSS+ 18] Damian Poddebniak, Juraj Somorovsky, Sebastian Schinzel, Manfred Lochter,
and Paul Rösler. Attacking deterministic signature schemes using fault attacks.
In EuroS&P, pages 338–352, 2018. 5, 14
[SBB+ 18] Niels Samwel, Lejla Batina, Guido Bertoni, Joan Daemen, and Ruggero
Susella. Breaking ed25519 in wolfssl. In CT-RSA, pages 1–20, 2018. 5, 14
[Sch89] Claus-Peter Schnorr. Efficient identification and signatures for smart cards.
In CRYPTO, pages 239–252, 1989. 23, 26
[SE94] Claus-Peter Schnorr and M. Euchner. Lattice basis reduction: Improved
practical algorithms and solving subset sum problems. Math. Program.,
66:181–199, 1994. 33
[YD17] Yang Yu and Léo Ducas. Second order statistical behavior of lll and bkz.
In International Conference on Selected Areas in Cryptography, pages 3–22.
Springer, 2017. 37
Proof. The output of Decomposeq is an integer r1 such that 0 ≤ r1 < (q − 1)/α and
another integer r0 such that kr0 k∞ ≤ α/2. Because kzk∞ ≤ α/2, the integer v1 :=
HighBitsq (r + z, α) either stays the same as r1 or becomes r1 ± 1 modulo m = (q − 1)/α.
More precisely, if r0 > 0, then −α/2 < r0 + z ≤ α. This implies that v1 is either r1 or
r1 + 1 mod m. If r0 ≤ 0, then we have −α ≤ r0 + z ≤ α/2. In this case, we have v1 = r1
or r1 − 1 mod m.
The MakeHintq routine checks whether r1 = v1 and outputs 0 if this is so, and 1 if
r1 6= v1 . The UseHintq routine uses the “hint” h to either output r1 (if y = 0) or, depending
on whether r0 > 0 or not, output either r1 + 1 mod+ m or r1 − 1 mod+ m.
The lemma below shows that r is not too far away from the output of the UseHintq
algorithm. This will be necessary for the security of the scheme.
Lemma 5. Let (h, r) ∈ {0, 1} × Zq and let v1 = UseHintq (h, r, α). If h = 0, then
kr − v1 · αk∞ ≤ α/2; else kr − v1 · αk∞ ≤ α + 1.
Proof. Let (r1 , r0 ) := Decomposeq (r, α). We go through all three cases of the UseHintq
procedure.
Case 1 (h = 0): We have v1 = r1 and
r − v1 · α = r1 · α + r0 − r1 · α = r0 ,
r − v1 · α = r1 · α + r0 − (r1 + 1 − κ · (q − 1)/α) · α
= −α + r0 + κ · (q − 1).
32 CRYSTALS-Dilithium
r − v1 · α = r1 · α + r0 − (r1 − 1 + κ · (q − 1)/α) · α
= α + r0 − κ · (q − 1).
w0 = LowBitsq (r + s, α) = r1 · α + r0 + s mod ± α = r0 + s.
r1 · α + r0 + s = r + s = w1 · α + w0 = r1 · α + w0 ,
and therefore r0 + s = w0 .
B Zero-Knowledge Proof
The security of our scheme does not rely on the part of the public key t0 being secret and
so we will be assuming that the public key is t rather than t1 .
We want to first compute the probability that some particular (z, c) is generated in
Step 19 taken over the randomness of y and the random oracle H which is modeled as a
random function. We have
Whenever z has all its coefficients less than γ1 − β then the above probability is
exactly the same for every such tuple (z, c). This is because kcsi k∞ ≤ β, and thus
kz − cs1 k∞ ≤ γ1 − 1, which is a valid value of y. Therefore, if we only output z when all
its coefficients have magnitudes less than γ1 − β, then the resulting distribution will be
uniformly random over Sγ`1 −β−1 × Bτ .
CRYSTALS-Dilithium 33
The simulation of the signature follows [Lyu12, BG14]. The simulator picks a uniformly
random (z, c) in Sγ`1 −β−1 × Bτ , after which it also makes sure that
By Equation (2), we know that w − cs2 = Az − ct, and therefore the simulator can
perfectly simulate this as well.
If z does indeed satisfy kLowBitsq (w−cs2 , 2γ2 )k∞ < γ2 −β, then as long as kcs2 k∞ ≤ β
(which is how we set β), we will have
C Concrete Security
C.1 Lattice Reduction and Core-SVP Hardness
The best known algorithm for finding very short non-zero vectors in Euclidean lattices is
the Block–Korkine–Zolotarev algorithm (BKZ) [SE94], proposed by Schnorr and Euchner
in 1991. More recently, it was proven to quickly converge to its fix-point [HPS11] and
improved in practice [CN11a]. Yet, what it achieves asymptotically remains unchallenged.
BKZ with block-size b makes calls to an algorithm that solves the Shortest lattice
Vector Problem (SVP) in dimension b. The security of our scheme relies on the necessity
to run BKZ with a large block-size b and the fact that the cost of solving SVP is
exponential in pb. The best known classical SVP solver [BDGL16b] runs in time ≈ 2cC ·b
with cC = log2 3/2 ≈ 0.292. The best p known quantum SVP solver [Laa15, Sec. 14.2.10]
runs in time ≈ 2cQ ·b with cQ = log2 13/9 ≈ 0.265. One may hope to improve these run-
times, but going below ≈ 2cP ·b with cP = log2 4/3 ≈ 0.2075 would require a theoretical
p
breakthrough. Indeed, the best known SVP solvers rely on covering the b-dimensional
sphere with cones of center-to-edge angle π/3: this requires 2cP ·b cones. The subscripts C,
Q, P respectively stand for Classical, Quantum and Paranoid.
The strength of BKZ increases with b. More concretely, given as input a basis (c1 , . . . , cn )
of an n-dimensional lattice, BKZ repeatedly uses the b-dimensional SVP-solver on lattices
of the form (ci+1 (i), . . . , cj (i)) where i ≤ n, j = min(n, i + b) and where ck (i) denotes
the projection of ck orthogonally to the vectors (c1 , . . . , ci ). The effect of these calls
is to flatten the curve of the `i = log2 kci (i − 1)k’s (for i = 1, . . . , n). At the start of
the execution, the `i ’s typically decrease fast, at least locally. As BKZ preserves the
determinant of the ci ’s, the sum of the `i ’s remains constant throughout the execution,
and after a (small) polynomial number of SVP calls, BKZ has made the `i ’s decrease less.
It can be heuristically estimated that for sufficiently large b, the local slope of the `i ’s
converges to
1
b
slope(b) = log2 (π · b)1/b
,
b−1 2πe
unless the local input `i ’s are already too small or too large. The quantity slope(b) decreases
with b, implying that the larger b the flatter the output `i ’s.
34 CRYSTALS-Dilithium
`i `i
log2 q log2 q
Zone 1
Zone 3
Zone 1
Zone 2
Zone 3
0 0
0 i 0 i
Before reduction After b-BKZ with small b
`i `i
log2 q Zone 2 log2 q
Zone 3
Zone 2
0 0
0 i 0 i
After b-BKZ with med. b After b-BKZ with large b
Figure 11: Evolution of Gram-Schmidt length in log-scale under BKZ reduction for various
blocksizes. The area under the curves remains constant, and the slope in Zone 2 decrease
with the blocksize. Note that Zone 3 may disappear before Zone 1, depending on the shape
of the input basis.
In our case, the input `i ’s are of the following form (cf. Fig. 11). The first ones
are all equal to log2 q and the last ones are all equal to 0. BKZ will flatten the jump,
decreasing `i ’s with small i’s and increasing `i ’s with large i’s. However, the local slope
slope(b) may not be sufficiently small to make the very first `i ’s decrease and the very
last `i ’s increase. Indeed, BKZ will not increase (resp. increase) some `i ’s if these are
already smaller (resp. larger) than ensured by the local slope guarantee. In our case, the
`i ’s are always of the following form at the end of the execution:
• The first `i ’s are constant equal to log2 q (this is the possibly empty Zone 1).
• Then they decrease linearly, with slope slope(b) (this is the never-empty Zone 2).
• The last `i ’s are constant equal to 0 (this is the possibly empty Zone 3).
The graph is continuous, i.e., if Zone 1 (resp. Zone 3) is not empty, then Zone 2 starts
with `i = log2 q (resp. ends with `i = 0).
Given an LWE instance, there are two lattice-based attacks. The primal attack and
the dual attack. Here, the primal attack consists in finding a short non-zero vector in the
lattice Λ = {x ∈ Zd : Mx = 0 mod q} where M = (rot(A)[1:m] |Im |vec(t)[1:m] ) is an m × d
matrix where d = 256 · ` + m + 1 and m ≤ 256 · k. Indeed, it is sometime not optimal to
use all the given equations in lattice attacks.
We tried all possible number m of rows, and, for each trial, we increased the blocksize of b
until the value `d−b obtained as explained above was deemed sufficiently large. As explained
CRYSTALS-Dilithium 35
in [ADPS16, Sec. 6.3], if 2`d−b is greater than the expected norm of (vec(s1 ), vec(s2 )) after
projection orthogonally to the first d − b vectors, it is likely that the MLWE solution can
be easily extracted from the BKZ output.
The dual attack consists in finding a short non-zero vector in the lattice Λ0 = {(x, y) ∈
Z × Zd : MT x + y = 0 mod q)}, M = (rot(A)[1:m] ) is an m × d matrix where d = 256 · `
m
and m ≤ 256 · k. Again, for each value of m, we increased the value of b until the
value `1 obtained as explained above was deemed sufficiently small according the analysis
of [ADPS16, Sec. 6.3].
norm ≈ 2`i after projection orthogonally to the first i − 1 basis vectors. Now, let us look
closely at the shape of such a vector. As the first i − 1 basis vectors are the first i − 1
canonical unit vectors multiplied by q, projecting orthogonally to these consists in zeroing
the first i − 1 coordinates. The remaining w − i + 1 coordinates have total Euclidean
norm ≈ 2`i ≈ q, and the last w − j coordinates√are 0. We heuristically assume that these
coordinates have similar magnitudes σ ≈ 2`i / j − i + 1; we model each such coordinate
b
as a Gaussian of standard deviation σ. We assume that each one of our 4/3 vectors
p
has its first i − 1 coordinates independently uniformly distributed modulo q, and finally
compute the probability that all coordinates in both ranges [0, i − 1] and [i, j] are less than
B in absolute value. Our cost estimate is the inverse of that probability multiplied by the
run-time of our b-dimensional SVP-solver.
Forgetting q-vectors. For all the parameter sets in Table 1 and Table 2, the best
parametrization of the attack above kept the basis in a shape with a non-trivial Zone 1.
We note that the coordinates in this range have a quite lower probability of passing the
`∞ constraint than coordinates in Zone 2. We therefore considered a strategy consisting
of “forgetting” the q-vectors, by re-randomizing the input basis before running the BKZ
11 Note that a solution to Eq. (13) would require the coefficient in from of t0 to be ±1, while we’re
allowing any small polynomial. Furthermore, as discussed after Eq. (13), some parts of the real solution
are smaller than the bound ζ, but we’re ignoring this for the sake of being conservative with our analysis.
36 CRYSTALS-Dilithium
`i `i
log2 q
Zone 1
Zone 2
Zone 3
Zone 2
Zone 3
0 0
0 i 0 i
Keeping q-vectors Forgetting q-vectors
Figure 12: Effect of forgetting q-vectors by randomization, under the same BKZ-blocksize
b.
algorithm. For the same blocksize b, this makes Zone 1 of the output basis disappear
(BKZ does not find the q-vectors), at the cost of producing a basis with first vectors of
larger Euclidean norms. This is depicted in Fig. 12.
It turns out that this strategy always improves over the previous strategy for the
parameter ranges considered in Table 2. We therefore used this strategy for our security
estimates.
Algebraic attacks. One specificity of our LWE and SIS instances is that they are inherited
from MLWE and MSIS instances. One may wonder whether the extra algebraic structure
of the resulting lattices can be exploited by an attacker. The line of work of [CGS14,
BS16, CDPR16, CDW17] did indeed find new cryptanalytic results on certain algebraic
lattices, but [CDW17] mentions serious obstacles towards breaking cryptographic instances
of Ring-LWE. By switching from Ring-LWE to MLWE, we get even further away from
those weak algebraic lattice problems.
Dense sublattice attacks. Kirchner and Fouque [KF17] showed that the existence of
many linearly independent and unexpectedly short lattice vectors (much shorter than
Minkowski’s bound) helps BKZ run better than expected in some cases. This could
happen for our primal LWE attack, by extending M = (rot(A)[1:m] |Im |vec(t)[1:m] ) to
(rot(A)[1:m] |Im |rot(t)[1:m] ): the associated lattice now has 256 linearly independent short
vectors rather than a single one. The Kirchner-Fouque analysis of BKZ works best if both
q and the ratio between the number of unexpectedly short vectors and the lattice dimension
are high. In the NTRU case, for example, the ratio is 1/2, and, for some schemes derived
from NTRU, the modulus q is also large. We considered this refined analysis of BKZ in
our setup, but, to become relevant for our parameters, it requires a parameter b which
is higher than needed with the usual analysis of BKZ. Note that [KF17] also arrived to
the conclusion that this attack is irrelevant in the small modulus regime, and is mostly a
threat to fully homomorphic encryption schemes and cryptographic multilinear maps.
Note that, once again, the switch from Ring-LWE to MLWE takes us further away
from lattices admitting unconventional attacks. Indeed, the dimension ratio of the dense
sub-lattice is 1/2 in NTRU, at most 1/3 in lattices derived from Ring-LWE, and at most
1/(` + 2) in lattices derived from MLWE.
CRYSTALS-Dilithium 37
Specialized attack against `∞ -SIS. At last, we would like to mention that it is not clear
whether the attack sketched in Appendix C.3 above for SIS in infinity norm is optimal.
Indeed, as we have seen, this approach produces many vectors, with some rather large
uniform coordinates (at indices 1, . . . , i), and smaller Gaussian ones (at indices i, . . . , j). In
our current analysis, we simply hope that one of the vector satisfies the `∞ bound. Instead,
one could combine them in ways that decrease the size of the first (large) coefficients, while
letting the other (small) coefficients grow a little bit.
This situation created by the use of `∞ -SIS (see Remark 1) has — to the best of our
knowledge — not been studied in detail. After a preliminary analysis, we do not consider
such an improved attack a serious threat to our concrete security claims, especially in light
of the approximations already made in favor of the adversary.
• It uses the probabilistic simulation of [DSDGR20] rather the the GSA-intersect model
of [ADPS16, AGVW17] to determine the BKZ blocksize b for a successful attack.
The required blocksize is somewhat larger (by an added term between 10 and 20 for
our parameters), because of a “tail” phenomenon [YD17],
• It accounts for the “few dimensions for free” introduced in [Duc18], which permits
to solve SVP in dimension β while running sieving in a somewhat smaller dimension
b0 = b − o(b),
• It relies on the concrete estimation for the cost of sieving in gates from [AGPS19]
• It dismisses the dual attack as realistically more expensive than the primal one,
noting that the analysis of [ADPS16] (also used here) by assuming that the sieve
provides 2.208b many vectors as short as the shortest one. First, most of those vectors
are larger by a factor 4/3, secondly the trick of exploiting all those vectors is not
p
13 https://github.com/lducas/leaky-LWE-Estimator/tree/NIST-round3