LEDAcrypt v3
Submitters
Marco Baldi (phone: +39 071 220 4894), Università Politecnica delle Marche, Dipartimento di Ingegneria dell’Informazione (DII), Via Brecce Bianche 12, I-60131, Ancona, Italy.
Gerardo Pelosi (phone: +39 02 2399 3476), Politecnico di Milano, Dipartimento di Elettronica, Informazione e Bioingegneria (DEIB), Via G. Ponzio 34/5, I-20133, Milano, Italy.
There are no auxiliary submitters. The principal submitter is the team listed above.
Inventors of the cryptosystem: same as submitters.
Owners of the cryptosystem: same as submitters.
Acronyms

BF Bit Flipping
GE Gate Equivalent
KI Kobara-Imai
NP Nondeterministic-Polynomial
QC Quasi-Cyclic
TM Turing Machine
Foreword

This document provides a complete and detailed specification of the post-quantum cryptographic primitives named LEDAcrypt (Low-dEnsity parity-check coDe-bAsed cryptographic systems), submitted to the 2nd round of the NIST post-quantum contest [1] and resulting from the merger between the LEDAkem and LEDApkc proposals submitted to the 1st round of the contest [54].
LEDAcrypt provides a set of cryptographic primitives based on binary linear error-correcting codes. In particular, the following contributions are provided:
i. We present a theoretical model to provide sound upper bounds to the probability of a decryption failure of the underlying Niederreiter/McEliece encryption schemes, when employing QC-LDPC codes as the code family of choice.
ii. We employ the constructions provided by [39, 45] to attain provable IND-CCA2 guarantees
in the proposed LEDAcrypt-KEM and LEDAcrypt-PKC primitives, respectively.
iii. We propose a method to allow the use of the classic definition of Decoding Failure Rate (DFR)
from coding theory as a proxy for the notion of δ-correctness required by the construction
in [39] for IND-CCA2 KEMs, under the assumption of a preimage resistant extensible output
function (XOF) being available.
iv. We provide three choices for the rate of the underlying QC-LDPC codes, and two choices
of DFR for each security category specified by NIST. We highlight the engineering tradeoffs
between key size, overall transmitted data, and execution time.
v. We employ a reproducible parameter design procedure which relies on finite-regime estimates of the computational effort required to carry out attacks against LEDAcrypt, and jointly optimizes the parameters for security and DFR.
vi. We provide a constant time software implementation for the proposed LEDAcrypt-KEM
and LEDAcrypt-PKC, optimized for the Intel Haswell Instruction Set Architecture, which
exploits the presence of the Intel AVX2 instruction set extension.
The main innovations with respect to the second round specification are as follows:
• We propose, and analyze from the DFR standpoint, two decoding strategies, namely in-place and out-of-place Bit Flipping (BF) decoding. We provide closed-form worst-case estimates for the DFR of both of them, under consolidated assumptions from the literature on iteratively decoded codes. Our worst-case estimates for the DFR allow us to match the requirements for an IND-CCA2 construction with either one of the decoding strategies. Our models for the worst-case behaviour of the DFR consider a finite number of decoder iterations, enabling a constant-time implementation of the decoder. In particular, we choose BF decoders with two iterations for performance reasons.
• We provide an IND-CCA2 construction for LEDAcrypt-KEM which allows employing the
common DFR notion from the coding theory lexicon, matching it with the requirement of
the δ-correctness from the [39] proofs, and we provide a proof of its IND-CCA2 guarantees,
under the (provided) bounded DFR of the employed QC-LDPC code.
• A recent attack [2] highlighted that the product structure of the secret key matrix, HQ, can be exploited to lower the security margin provided by LEDAcrypt. We adopt a conservative solution, that is, we consider Q = I, removing the effects of the product structure altogether. A brief justification of this rationale is provided in Section 2.4; from that point on, the current specification assumes that the secret key is a randomly drawn block circulant matrix H. While this choice reduces the speed advantages coming from the performance-oriented criterion of using a product-based secret key, we consider the security of the scheme paramount with respect to the speed gains.
• New parameterizations for LEDAcrypt instances considering the case Q = I are provided. In addition to the previous specification, we provide parameters for all the code rates (1/2, 2/3, 3/4) also for the case of IND-CCA2 primitives. The parameters are obtained through a joint optimization of the circulant matrix size p, the error weight t and the column weight v of the parity-check matrix. Our previous strategy relied only on enlarging the circulant matrix size to attain the desired DFR, starting from IND-CPA secure parameters. Our new approach allows further reductions of the key size.
• We performed an exploration of the design space for the implementation of the low-level primitives involved in the computation of the LEDAcrypt ciphers, namely the binary polynomial multiplication, the modular binary polynomial inversion and the BF decoder. We motivate the implementation choices made through benchmarking of the different alternatives, providing a constant-time implementation of LEDAcrypt-KEM and LEDAcrypt-PKC, and a speed-oriented, variable-time implementation of LEDAcrypt-KEM-CPA for ephemeral key use.
• Despite forgoing the speedup provided by the use of the product structure, the improvements in the cryptosystem engineering allowed us to improve by 20% to 100% (depending on the NIST category) the computation speed of the LEDAcrypt-KEM-CPA primitive with respect to the one reported at the NIST Second PQC Standardization Conference. Furthermore, despite the additional overhead imposed by the constant-time implementation requirement, we either match, or improve by 30%, the computation speed of LEDAcrypt-KEM with respect to the results reported in the aforementioned venue. Analyzing the obtained results, we provide recommendations on the choice of the code rate, depending on which figure of merit (computation speed, bandwidth, or a combined metric) is preferred.
Chapter 1
This chapter recalls some basic concepts and provides a functional description of the LEDAcrypt
primitives.
1.1 Preliminaries
Let us introduce a set of background notions and nomenclature concerning finite fields and circu-
lant matrix algebra, binary error correcting codes, and code-based cryptosystems, which will be
employed in LEDAcrypt.
Binary circulant v × v matrices, equipped with the usual matrix addition and multiplication, form a ring whose identity element is the v × v identity matrix. The algebra of the polynomial ring F_2[x]/⟨x^v + 1⟩ is isomorphic to the ring of v × v circulant matrices over F_2 with the following map

A ↔ a(x) = \sum_{i=0}^{v-1} a_i x^i .   (1.3)
According to (1.3), any binary circulant matrix is associated with a polynomial in the variable x whose coefficients over F_2 coincide with the entries in the first row of the matrix. In addition, according to (1.3), the all-zero circulant matrix corresponds to the null polynomial and the identity matrix to the unitary polynomial.
The ring of polynomials F_2[x]/⟨x^v + 1⟩ includes elements that are zero divisors: such elements are mapped onto singular circulant matrices over F_2. Avoiding such matrices is important in the computation of the LEDAcrypt primitives, as non-singular v × v circulant matrices are required. However, a proper selection of the size v of a circulant matrix allows invertible instances to be easily generated, as described next.
Polynomial inversion in a finite field. In order to provide efficient execution for the LEDAcrypt primitives, it is crucial to be able to efficiently check the invertibility of a binary circulant matrix, and to generate a non-singular circulant matrix efficiently. To this end, we exploit the isomorphism (1.3) between p × p binary circulant matrices and polynomials in F_2[x]/⟨x^p + 1⟩, turning the problem into providing an efficient criterion for the invertibility of an element of F_2[x]/⟨x^p + 1⟩ and describing an efficient way to generate such invertible polynomials. In the following, we recall some facts from finite field theory, and we derive a necessary and sufficient condition for the invertibility of an element of F_2[x]/⟨x^p + 1⟩, provided p is chosen according to the described criterion. Let F_{q^m} be the finite field of cardinality q^m, with q a prime power and m a positive integer; given an element α ∈ F_{q^m}, the following propositions hold [75]:
(i) The minimal polynomial of α with respect to F_q, i.e., the nonzero monic polynomial f(x) ∈ F_q[x] of the least degree such that f(α) = 0, always exists, is unique, and is also irreducible over F_q.
(ii) If a monic irreducible polynomial g(x) ∈ F_q[x] has α ∈ F_{q^m} as a root, then it is the minimal polynomial of α with respect to F_q.
Definition 1.1.1 Let n be a positive integer and q a prime power such that gcd(n, q) = 1, i.e., n and q are coprime. The cyclotomic coset of q mod n containing the value a ∈ Z_n is defined as

C_a = {a q^j mod n : j = 0, 1, . . .}.

A subset {a_1, . . . , a_s} ⊆ Z_n is named a complete set of representatives of cyclotomic cosets of q mod n if C_{a_i} ∩ C_{a_j} = ∅ for all i ≠ j, and \bigcup_{j=1}^{s} C_{a_j} = Z_n.
It is worth noting that the previous definition allows one to easily infer that two cyclotomic cosets are either equal or disjoint. Indeed, given two cyclotomic cosets C_{a_1} and C_{a_2}, with a_1 ≠ a_2 mod n, if C_{a_1} ∩ C_{a_2} ≠ ∅, two positive integers j and k such that a_1 q^j = a_2 q^k mod n must exist. Assuming (without loss of generality) that k ≥ j, the condition gcd(n, q) = 1 ensures the existence of the multiplicative inverse of q, and consequently that a_1 = a_2 q^{k−j} mod n, which in turn implies that the cyclotomic coset including a_1 is a subset of the coset including a_2, i.e., C_{a_1} ⊆ C_{a_2}. However, as the previous equality can be rewritten as a_2 = a_1 (q^{−1})^{k−j} mod n, it also implies C_{a_2} ⊆ C_{a_1}, leading to the conclusion that C_{a_1} = C_{a_2}. Hence, two cyclotomic cosets sharing an element coincide.
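To make Definition 1.1.1 concrete, the following minimal Python sketch (our own illustration, not part of the reference implementation) computes the cyclotomic cosets of q mod n and a complete set of representatives:

```python
def cyclotomic_coset(a, q, n):
    """Compute the cyclotomic coset C_a = {a*q^j mod n : j = 0, 1, ...}."""
    coset, x = set(), a % n
    while x not in coset:
        coset.add(x)
        x = (x * q) % n
    return frozenset(coset)

def coset_representatives(q, n):
    """Return a complete set of representatives of the cyclotomic cosets of q mod n."""
    seen, reps = set(), []
    for a in range(n):
        if a not in seen:
            reps.append(a)
            seen |= cyclotomic_coset(a, q, n)
    return reps

# Example: q = 2, n = 7 yields the cosets {0}, {1, 2, 4}, {3, 5, 6}.
print([sorted(cyclotomic_coset(a, 2, 7)) for a in coset_representatives(2, 7)])
```

For q = 2 and n = 7 the three cosets {0}, {1, 2, 4}, {3, 5, 6} are obtained, matching (by Corollary 1.1.1 below) the three monic irreducible factors (x + 1)(x^3 + x + 1)(x^3 + x^2 + 1) of x^7 − 1 over F_2.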
Two notable theorems that make use of the cyclotomic coset definition to determine the minimal polynomials of every element in a finite field can be stated as follows [75].

Theorem 1.1.1 Let α be a primitive element of F_{q^m}. The minimal polynomial of α^i in F_q[x] is g^{(i)}(x) = \prod_{j ∈ C_i} (x − α^j), where C_i is the unique cyclotomic coset of q mod q^m − 1 containing i.
Theorem 1.1.2 Given a positive integer n and a prime power q, with gcd(n, q) = 1, let m be a positive integer such that n | (q^m − 1). Let α be a primitive element of F_{q^m} and let g^{(i)}(x) ∈ F_q[x] be the minimal polynomial of α^i ∈ F_{q^m}. Denoting as {a_1, . . . , a_s} ⊆ Z_n a complete set of representatives of cyclotomic cosets of q mod n, the polynomial x^n − 1 ∈ F_q[x] can be factorized as the product of monic irreducible polynomials over F_q, i.e.,

x^n − 1 = \prod_{i=1}^{s} g^{\left( \frac{(q^m − 1) a_i}{n} \right)}(x).
Corollary 1.1.1 Given a positive integer n and a prime power q, with gcd(n, q) = 1, the number of monic irreducible factors of x^n − 1 ∈ F_q[x] is equal to the number of cyclotomic cosets of q mod n.
From the previous propositions on the properties of finite fields, it is possible to derive the following
results.
Corollary 1.1.2 Given an odd prime number p, if 2 is a primitive element in the finite field Z_p, then the irreducible (non-trivial) factors of x^p + 1 ∈ F_2[x] are x + 1 and Φ(x) = x^{p−1} + x^{p−2} + · · · + x + 1.
Proof. Considering the ring of polynomials with binary coefficients F_2[x] and picking the positive integer n as an odd prime number (i.e., n = p), Corollary 1.1.1 ensures that the number of factors of x^p + 1 ∈ F_2[x] equals the number of cyclotomic cosets of 2 mod p.
If 2 is a primitive element of Z_p, its order, ord_p(2), is equal to the order of the (cyclic) multiplicative group of the field, i.e., ord_p(2) = |(Z_p \ {0}, ·)| = p − 1; thus, the said cyclotomic cosets can be listed as: C_0 = {0 · 2^j mod p : j = 0, 1, . . .} = {0} and C_1 = {1 · 2^j mod p : j = 0, 1, . . .} = Z_p \ {0}.
The polynomial x^p − 1 ∈ F_2[x] admits α = 1 as a root, therefore its two (non-trivial) factors can be listed as: x + 1 and (x^p + 1)/(x + 1) = x^{p−1} + x^{p−2} + · · · + x + 1.
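The condition of Corollary 1.1.2 can be checked directly by computing the multiplicative order of 2 modulo p; a small illustrative Python routine (the function names are our own) follows:

```python
def ord2(p):
    """Multiplicative order of 2 modulo the odd prime p."""
    k, x = 1, 2 % p
    while x != 1:
        x = (2 * x) % p
        k += 1
    return k

def two_is_primitive(p):
    """True when 2 is a primitive element of Z_p, i.e., ord_p(2) = p - 1,
    so that x^p + 1 factors over F_2 exactly as (x + 1) * Phi(x)."""
    return ord2(p) == p - 1

print(two_is_primitive(11))  # True:  ord_11(2) = 10
print(two_is_primitive(7))   # False: ord_7(2)  = 3
```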
Theorem 1.1.3 (Invertible elements in F_2[x]/⟨x^p + 1⟩) Let p be a prime number such that ord_p(2) = p − 1 (i.e., 2 is a primitive element of the field Z_p). Let g(x) be a binary polynomial in F_2[x]/⟨x^p + 1⟩, with deg(g(x)) > 0.
g(x) has a multiplicative inverse in F_2[x]/⟨x^p + 1⟩ if and only if it contains an odd number of terms and g(x) ≠ Φ(x), with Φ(x) = x^{p−1} + x^{p−2} + · · · + x + 1.
Proof. If g(x) ∈ F_2[x]/⟨x^p + 1⟩ contains an odd number of terms and g(x) ≠ Φ(x), to prove that it is invertible modulo x^p + 1 we need to consider that gcd(g(x), x^p + 1) = gcd(g(x), (x + 1)Φ(x)).
It is easy to observe that x + 1 does not divide g(x), i.e., (x + 1) ∤ g(x), as g(1) = 1 (the number of terms being odd); thus they are coprime. Considering Φ(x), we know by hypothesis that ord_p(2) = p − 1, therefore Φ(x) is irreducible over F_2[x] (see Corollary 1.1.2); hence gcd(g(x), Φ(x)) is either 1 or Φ(x). The latter case would require Φ(x) | g(x), and since deg(g(x)) ≤ p − 1 = deg(Φ(x)), this would force g(x) = Φ(x), contradicting the hypothesis. We can thus infer that g(x) and Φ(x) are coprime.
Finally, being gcd(g(x), x^p + 1) = gcd(g(x), (x + 1)Φ(x)) = 1, we conclude that g(x) is invertible.
To prove the other implication of the theorem, if g(x) ∈ F_2[x]/⟨x^p + 1⟩, with deg(g(x)) > 0, is invertible, we need to derive that g(x) must have an odd number of terms and be different from Φ(x). Being g(x) invertible, we have gcd(g(x), x^p + 1) = gcd(g(x), (x + 1)Φ(x)) = 1, which in turn means that gcd(g(x), x + 1) = 1 and gcd(g(x), Φ(x)) = 1; this guarantees that g(x) ≠ Φ(x) and that g(1) = 1. To prove that g(x) must have an odd number of terms, assume, by contradiction, that it has an even number of terms. Regardless of which terms are contained in g(x), this means that it admits 1 as a root, i.e., g(1) = 0, which contradicts the premise.
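Representing g(x) by the set of exponents of its nonzero terms, the criterion of Theorem 1.1.3 admits the following straightforward check (an illustrative sketch of ours, not the test used in the actual implementation):

```python
def is_invertible(support, p):
    """Invertibility test of Theorem 1.1.3 for g(x) in F_2[x]/<x^p + 1>,
    with p prime and ord_p(2) = p - 1. `support` is the set of exponents
    of the nonzero terms of g(x)."""
    odd_weight = len(support) % 2 == 1
    is_phi = support == set(range(p))   # g(x) = x^(p-1) + ... + x + 1
    return odd_weight and not is_phi

p = 11
print(is_invertible({0, 1, 3}, p))      # True:  odd weight, not Phi(x)
print(is_invertible({0, 1}, p))         # False: even weight, so (x+1) | g(x)
print(is_invertible(set(range(p)), p))  # False: g(x) = Phi(x)
```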
Invertibility of square quasi-cyclic binary matrices. In the construction of the LEDAcrypt primitives we will need to generate QC binary square matrices that must be invertible. To avoid computing the matrix determinant, which is computationally demanding, we prove and employ the following theorem. We denote as perm(·) the permanent of a matrix, and as w(·) the number of nonzero coefficients of a polynomial, a quantity also known as its weight.
Theorem 1.1.4 Let p > 2 be a prime such that ord_p(2) = p − 1, and let Q be an n_0 × n_0 matrix of elements of F_2[x]/⟨x^p + 1⟩; if perm(w(Q)) is odd and perm(w(Q)) < p, then Q is non-singular. Here w(Q) denotes the integer matrix whose entries are the weights of the corresponding entries of Q.
Proof. Since each block Q_{ij} is isomorphic to a polynomial q_{ij}(x) ∈ F_2[x]/⟨x^p + 1⟩, the determinant of the matrix Q is represented as an element of F_2[x]/⟨x^p + 1⟩ as well. Let us denote by d(x) the polynomial associated with the determinant. If the inverse of d(x) exists, then Q is non-singular. According to Theorem 1.1.3, showing that d(x) has odd weight and d(x) ≠ Φ(x) = x^{p−1} + x^{p−2} + · · · + 1 is enough to guarantee that it is invertible. In general, when we consider two polynomials a(x) and b(x), with w(a(x)) = w_a and w(b(x)) = w_b, the following statements hold:
i. w(a(x)b(x)) = w_a w_b − 2c_1, where c_1 is the number of cancellations due to pairs of monomials with the same exponent arising in the product;
ii. w(a(x) + b(x)) = w_a + w_b − 2c_2, where c_2 is the number of cancellations due to monomials with the same exponent appearing in both polynomials.
The determinant d(x) is obtained through multiplications and sums of the elements q_{ij}(x) and, in case of no cancellations, has weight equal to perm(w(Q)). If some cancellations occur, considering statements i) and ii) above, we have that w(d(x)) = perm(w(Q)) − 2c, where c is the overall number of cancellations. So, even when cancellations occur, d(x) has odd weight only if perm(w(Q)) is odd. In addition, the condition perm(w(Q)) < p guarantees that d(x) ≠ Φ(x), since w(Φ(x)) = p.
With this result, we can guarantee that a QC matrix is non-singular, provided that the weights of its circulant blocks are chosen properly.
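Since n_0 ∈ {2, 3, 4} in LEDAcrypt, the permanent of the weight matrix w(Q) can be computed by brute force over all permutations; the following Python sketch (our illustration, with an arbitrary small example) applies the sufficient condition of Theorem 1.1.4:

```python
from itertools import permutations
from math import prod

def permanent(W):
    """Permanent of a small square integer matrix, directly from its definition."""
    n = len(W)
    return sum(prod(W[i][s[i]] for i in range(n)) for s in permutations(range(n)))

def qc_nonsingular_by_weight(W, p):
    """Sufficient condition of Theorem 1.1.4: the QC matrix whose block-weight
    matrix is W is non-singular when perm(W) is odd and perm(W) < p."""
    pw = permanent(W)
    return pw % 2 == 1 and pw < p

# Toy 2 x 2 block-weight matrix with p = 11 (for which ord_11(2) = 10):
W = [[1, 2],
     [2, 1]]
print(permanent(W), qc_nonsingular_by_weight(W, 11))  # 5 True
```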
Binary error correcting codes rely on a redundant representation of information in the form of binary strings to be able to detect and correct accidental bit errors which may happen during transmission or storage. We employ binary codes acting on a finite binary sequence (the information word) at once, known as block codes. We will refer to them simply as binary codes in the following.
In this setting, let F_2 be the binary finite field with the addition and multiplication operations that correspond to the usual exclusive-or and logical product between two Boolean values. Let F_2^k denote the k-dimensional vector space defined on F_2. A binary code, denoted as C(n, k), is defined as a bijective map C(n, k) : F_2^k → F_2^n, n, k ∈ N, 0 < k < n, between any binary k-tuple (i.e., an information word) and a binary n-tuple (denoted as codeword). The value n is known as the length of the code, while k is denoted as its dimension.
Encoding through C(n, k) means converting an information word u ∈ F_2^k into its corresponding codeword c ∈ F_2^n. Given a codeword ĉ corrupted by an error vector e ∈ F_2^n with Hamming weight t > 0 (ĉ = c + e), decoding aims to recover both the value of the information word u and the value of the error vector e. A code is said to be t-error correcting if, for any value of e with weight at most t, given ĉ there is a decoding procedure to retrieve both the error vector e and the original information word u.
Definition 1.1.2 (Linear Code) The code C(n, k) is linear if and only if the set of its 2^k codewords is a k-dimensional subspace of the vector space F_2^n.

A property of linear block codes that follows from Definition 1.1.2 is that the sum modulo 2, i.e., the component-wise exclusive-or, of two codewords is also a codeword.
Definition 1.1.3 (Minimum distance) Given a linear binary code C(n, k), the minimum distance of C(n, k) is the minimum Hamming distance among all those which can be computed between pairs of its distinct codewords.

If the code is linear, its minimum distance coincides with the minimum Hamming weight of its nonzero codewords.
Definition 1.1.4 (Encoding) Given C(n, k), a linear error correcting code, and Γ ⊂ F_2^n the vector subspace containing its 2^k codewords, it is possible to represent it by choosing k linearly independent codewords {g_0, g_1, . . . , g_{k−1}} ⊂ F_2^n to form a basis of Γ. Any codeword c = [c_0, c_1, . . . , c_{n−1}] can be expressed as a linear combination of the vectors of the basis

c = u_0 g_0 + u_1 g_1 + · · · + u_{k−1} g_{k−1} = \sum_{i=0}^{k−1} u_i g_i,   (1.5)

where the binary coefficients u_i can be thought of as the elements of an information vector u = [u_0, u_1, . . . , u_{k−1}], which the code maps into c. We then say that u is encoded into c.
Equation (1.5) can be rewritten as c = uG, where G is a k × n binary matrix known as the generator matrix of the code C(n, k), i.e.,

G = \begin{bmatrix} g_0 \\ g_1 \\ \vdots \\ g_{k−1} \end{bmatrix}.
Since any set of k linearly independent codewords can be used to form G, a code can be represented
by different generator matrices. Among the possible generator matrices for a linear code, one known
as systematic can always be derived.
Definition 1.1.5 (Systematic Encoding) A linear error correcting code C (n, k) is said to have
systematic encoding, or to be systematic in short, if each one of its codewords contains the infor-
mation vector it is associated to.
A conventional way to express a systematic code is the one where each n-bit codeword, c, is obtained by appending r = n − k redundancy bits (c_k, c_{k+1}, . . . , c_{n−1}) to its corresponding k-bit information word (i.e., c_0, c_1, . . . , c_{k−1}, with c_i = u_i, 0 ≤ i < k): c = [u_0, u_1, . . . , u_{k−1} | c_k, c_{k+1}, . . . , c_{n−1}]. It follows that the associated k × n generator matrix G can be written as G = [I_k | P], where I_k denotes the k × k identity matrix and P is a k × r binary matrix.
Let us consider the set of all n-bit vectors in F_2^n that are orthogonal to any codeword of the code subspace Γ, known as its orthogonal complement Γ^⊥. Its dimension is dim(Γ^⊥) = n − dim(Γ) = n − k = r, and a basis of Γ^⊥ can be arranged as the rows of an r × n binary matrix H.
The r × n matrix H is known as a parity-check matrix of the code C(n, k), while, for any n-bit vector x ∈ F_2^n, the r × 1 vector s = Hx^T, where T denotes transposition, is known as the syndrome of x through H. Given that H is a basis of Γ^⊥, every codeword c ∈ Γ satisfies the equality Hc^T = 0_{r×1}, where 0_{r×1} is the r × 1 zero vector, i.e., a codeword belonging to C(n, k) has a null syndrome through H.
It is easy to show that the generator matrix G and the parity-check matrix H are two equivalent descriptions of a linear code. Indeed, we have that Hc^T = HG^T u^T = 0_{r×1}, ∀ u ∈ F_2^k, yielding in turn that HG^T = 0_{r×k}. Exploiting the aforementioned relation, it is possible to derive H from G and vice versa. Let us consider, for the sake of clarity, the case of a systematic code C(n, k) with G = [I_k | P]. It is possible to obtain the corresponding parity-check matrix H as [P^T | I_r], which satisfies HG^T = P^T + P^T = 0_{r×k}. Finally, considering a generic parity-check matrix H = [A | B], with A an r × k matrix and B an r × r non-singular matrix, a systematic generator matrix of the corresponding code is computed as G = [I_k | (B^{−1}A)^T], being B^{−1} the inverse of matrix B.
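The relations G = [I_k | P], H = [P^T | I_r] and HG^T = 0 are easy to check numerically over F_2; the sketch below (illustrative, with an arbitrary choice of P) uses NumPy with all arithmetic reduced modulo 2:

```python
import numpy as np

k, r = 4, 3
P = np.array([[1, 0, 1],
              [0, 1, 1],
              [1, 1, 0],
              [1, 0, 0]], dtype=np.uint8)          # arbitrary k x r example

G = np.hstack([np.eye(k, dtype=np.uint8), P])      # G = [I_k | P]
H = np.hstack([P.T, np.eye(r, dtype=np.uint8)])    # H = [P^T | I_r]

# Over F_2, H G^T = P^T + P^T = 0 (all arithmetic modulo 2).
print((H @ G.T) % 2)                               # r x k all-zero matrix

u = np.array([1, 0, 1, 1], dtype=np.uint8)         # information word
c = (u @ G) % 2                                    # systematic codeword
print((H @ c) % 2)                                 # null syndrome: c is a codeword
```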
A QC code is defined as a linear block code C (n, k) having information word size k = pk0 and
codeword size n = pn0 , where n0 is denoted as basic block length of the code and each cyclic shift
of a codeword by n0 symbols results in another valid codeword [72].
LEDAcrypt hinges on a QC code C(pn_0, pk_0) having generator and parity-check matrices composed of p × p circulant sub-matrices (blocks).
A Low-Density Parity-Check (LDPC) code C(n, k) is a special type of linear block code characterized by a sparse parity-check matrix H. In particular, the Hamming weight of a column of H, denoted as d_v, is much smaller than its length r and increases sub-linearly with it. In terms of error correction capability, LDPC codes having a non-constant weight for either the rows or the columns of H, hence known as irregular LDPC codes, were proven to approach the channel capacity [50].
Considering the parity-check matrix H of an LDPC code as the incidence matrix of a graph, such
a graph is known as Tanner graph, and it has been shown that keeping the number of short cycles
as small as possible in such a graph is beneficial to the error correction performance of the code.
The peculiar form of the parity-check matrix of LDPC codes allows an efficient decoding procedure to be devised, provided H is known, via algorithms known as BF decoders [33]. Indeed, BF algorithms perform decoding with a fixed-point procedure which exploits the form of H to iteratively deduce which bits of an error-affected codeword should be flipped in order to obtain a zero-valued syndrome for it. If the fixed-point procedure converges to a zero-valued syndrome within a desired number of iterations, the decoding action is deemed successful.
The rationale of BF decoders is to consider the parity-check matrix H as the description of a set of r equations in the codeword bits, yielding the syndrome bits as their results. Such equations are known as parity-check equations, or parity checks in short. In this context, the one-valued coefficients of the i-th column of the parity-check matrix H can be thought of as indicators of which parity checks of the code involve the i-th bit of the received codeword. The result of each one of the said parity checks is a bit in the syndrome; hence a zero-valued syndrome indicates a set of successful parity checks, and thus a correct codeword. The convergence of the fixed-point decoder is influenced by the number of parity checks in which each codeword element is involved: in particular, being involved in a small number of parity checks speeds up the convergence.
An LDPC code may also be a QC code, expressed with a QC parity-check or generator matrix,
hence being named a QC-LDPC code, which is indeed the case of the codes employed in LEDAcrypt.
An efficient BF decoding procedure for QC-LDPC codes can be devised relying on the number of unsatisfied parity checks in which a codeword bit is involved as an estimate of its being affected by an error. We describe such a procedure in Algorithm 1, where the sparse and QC nature of the matrix H is explicitly exploited.
To this end, H is represented as r_0 × n_0 sparse p × p circulant blocks, and only the positions of the asserted coefficients in the first column of each block are stored in Hsparse. Algorithm 1 receives, alongside Hsparse, the error-affected codeword to be corrected x and its syndrome computed as s = Hx^T, and performs the fixed-point decoding procedure for a maximum of imax iterations. The algorithm outputs its best estimate for the correct codeword c and a Boolean variable decodeOk reporting the success, or not, of the decoding procedure. The procedure iterates at fixed-point (loop at lines 4–20) the decoding steps, which start by counting how many unsatisfied parity checks each codeword bit is involved in (lines 5–12). Such a value is obtained considering which are the asserted bits in a given column of H, taking care of accounting for its sparse representation and the cyclic nature of its blocks (line 10). Whenever a bit in the i-th column and assertedHbitPos-th row of H is set, it indicates that the i-th bit of the codeword is involved in the assertedHbitPos-th parity-check equation. Thus, if the assertedHbitPos-th bit of the syndrome is unsatisfied, i.e., equal to 1, the number of unsatisfied parity checks of the i-th bit is incremented (lines 11–12).
Algorithm 1: BF decoding
Input: x: QC-LDPC error-affected codeword as a 1 × pn_0 binary vector.
       s: QC-LDPC syndrome. It is a pr_0 × 1 binary vector obtained as s = Hx^T.
       Hsparse: sparse version of the parity-check matrix H_{pr_0×pn_0}, represented as a d_v × n_0 integer matrix containing, for each of its n_0 columns, the positions in {0, 1, . . . , pr_0 − 1} of the asserted binary coefficients in the first column of each circulant block in the sequence of n_0 submatrices in H = [H_0 | H_1 | . . . | H_{n_0−1}] (each with size p × p).
Output: c: error-free 1 × pn_0 codeword.
        decodeOk: Boolean value denoting the successful outcome of the decoding action.
Data: imax: maximum number of allowed iterations before reporting a decoding failure.
 1 codeword ← x                      // bitvector with size pn_0
 2 syndrome ← s                      // bitvector with size pr_0
 3 iter ← 0                          // counter of the performed iterations
 4 repeat
 5   iter ← iter + 1
 6   unsatParityChecks ← 0_{1×pn_0}  // counters of unsatisfied parity checks
 7   for i = 0 to n_0 − 1 do
 8     for exp = 0 to p − 1 do
 9       for j = 0 to d_v − 1 do
10         assertedHbitPos ← (exp + Hsparse[j][i]) mod p + p · ⌊Hsparse[j][i] / p⌋
11         if syndrome[assertedHbitPos] = 1 then
12           unsatParityChecks[i · p + exp] ← 1 + unsatParityChecks[i · p + exp]
13   threshold ← ThresholdChoice(iter, syndrome)
14   for i = 0 to pn_0 − 1 do
15     if unsatParityChecks[i] ≥ threshold then
16       BitToggle(codeword[i])      // codeword update
17       for j = 0 to d_v − 1 do
18         assertedHbitPos ← ((i mod p) + Hsparse[j][⌊i/p⌋]) mod p + p · ⌊Hsparse[j][⌊i/p⌋] / p⌋
19         BitToggle(syndrome[assertedHbitPos])
20 until syndrome = 0_{1×pr_0} or iter = imax
21 if syndrome = 0_{1×pr_0} then
22   return codeword, true
23 else
24   return codeword, false
Once the computation of the number of unsatisfied parity checks per codeword bit is completed, a decision must be taken on which of them are to be flipped, as they are deemed error affected. The threshold that the number of unsatisfied parity checks should reach can be chosen a priori from the code parameters, or determined taking into account the iteration reached by the decoder and the current weight of the syndrome.
Thus, the procedure toggles the values of all the codeword bits for which the number of unsatisfied parity checks reaches the threshold (lines 14–16). Once this step is completed, the values
of the parity checks should be recomputed according to the new value of the codeword. While
this can be accomplished by pre-multiplying the transposed codeword by H, it is more efficient to
exploit the knowledge of which bits of the codeword were toggled to change only the parity-check
values in the syndrome affected by such toggles. Lines 17–19 of Algorithm 1 update the syndrome
according to the aforementioned procedure, i.e., for a given i-th codeword bit being toggled, all
the syndrome values corresponding to the positions of the asserted coefficients in the i-th column
of H are also toggled. Once either the decoding procedure has reached its intended fixed-point,
i.e., the syndrome is a zero-filled vector, or the maximum number of iterations has been reached,
Algorithm 1 returns its best estimate for the corrected codeword, together with the outcome of the
decoding procedure (lines 21–24).
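For concreteness, a compact Python rendition of Algorithm 1 is sketched below for the case r_0 = 1 (a single block row), with a caller-supplied fixed threshold standing in for the ThresholdChoice policy; this is an illustration of the decoding logic, not the constant-time reference code:

```python
def bf_decode(x, s, Hsparse, p, n0, dv, threshold, imax):
    """Bit-flipping decoding of Algorithm 1 for r0 = 1 (H is a row of n0
    circulant p x p blocks). x: list of p*n0 bits; s: list of p syndrome
    bits; Hsparse[j][i]: position of the j-th asserted bit in the first
    column of the i-th circulant block."""
    codeword, syndrome, it = list(x), list(s), 0
    while any(syndrome) and it < imax:
        it += 1
        # Count the unsatisfied parity checks each codeword bit participates in.
        upc = [0] * (p * n0)
        for i in range(n0):
            for exp in range(p):
                for j in range(dv):
                    if syndrome[(exp + Hsparse[j][i]) % p] == 1:
                        upc[i * p + exp] += 1
        # Flip the bits whose counter reaches the threshold, updating the
        # syndrome incrementally instead of recomputing H * codeword^T.
        for b in range(p * n0):
            if upc[b] >= threshold:
                codeword[b] ^= 1
                i, exp = divmod(b, p)
                for j in range(dv):
                    syndrome[(exp + Hsparse[j][i]) % p] ^= 1
    return codeword, not any(syndrome)
```

For r_0 > 1 the row index derived from a stored position h = Hsparse[j][i] becomes (exp + h) mod p + p·⌊h/p⌋, as in line 10 of Algorithm 1; for r_0 = 1 the term p·⌊h/p⌋ vanishes, since h < p.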
The McEliece cryptosystem is a Public-Key Encryption (PKE) scheme proposed by Robert McEliece
in 1978 [52] and exploiting the hardness of the problem of decoding a random-like linear block code.
In the original proposal, the McEliece cryptosystem used irreducible Goppa codes as secret codes,
but its construction can be generalized to other families of codes. The triple of polynomial-time algorithms Π_McE = (Keygen_McE, Enc_McE, Dec_McE) defining the scheme is as follows:
• The key-generation algorithm considers a binary linear block code C(n, k), with codeword length n and information word length k, and outputs a secret key sk_McE defined as the generator matrix G_{k×n} of a code C(n, k) able to correct t ≥ 1 or fewer bit errors, plus a randomly chosen invertible binary matrix S_{k×k}, named scrambling matrix, and a binary permutation matrix P_{n×n}:

sk_McE ← {S, G, P}.   (1.6)

The corresponding public key pk_McE is computed as the generator matrix G′_{k×n} of a permutation-equivalent code with the same size and correction capability of the original code:

G′ = SGP.   (1.7)
• The encryption algorithm takes as input a public key pk_McE and a message composed as a 1 × k binary vector u, and outputs a ciphertext x_{1×n} ← Enc_McE(pk_McE, u) computed as:

x = uG′ + e,   (1.8)

where e is a 1 × n random binary error vector with weight t (i.e., with exactly t asserted bits).
• The decryption algorithm takes as input the secret key sk_McE and a ciphertext x, and computes xP^{−1} = uSG + eP^{−1}; the decoding algorithm of the secret code is then applied to correct the error vector eP^{−1} and recover uS, from which the original message is obtained as u = (uS)S^{−1}.
As mentioned, in the original McEliece cryptosystem, algebraic code families (namely, Goppa codes) provided with bounded-distance decoders were used. In such a case, since the number of errors correctable by the secret code is t, the correction of the error vector eP^{−1} is ensured by design and the cryptosystem exhibits a zero DFR.
It is also worth noticing that the original McEliece cryptosystem only provides One-Wayness against Chosen Plaintext Attack (OW-CPA) guarantees, which means that, given a ciphertext, it is computationally infeasible to recover the plaintext without knowing the private key. Suitable conversions of the cryptosystem must be exploited in order to achieve Indistinguishability under Adaptive Chosen Ciphertext Attack (IND-CCA2), which means that an adversary with access to a decryption oracle (that knows the private key) cannot distinguish whether a string is the decryption of a legitimate given ciphertext or a random valid plaintext message. The decryption oracle cannot be queried on the given ciphertext. When these conversions are used, some constraints on the public code can be relaxed.
The Niederreiter cryptosystem [55] is a code-based cryptosystem exploiting the same trapdoor introduced in the McEliece PKE [52] with an alternative formulation. The Niederreiter PKE employs syndromes and parity-check matrices in place of codewords and generator matrices, as employed by the algorithms in the McEliece PKE. The first proposal of such a scheme used Generalized Reed-Solomon (GRS) codes, which were proven to make the whole construction vulnerable [67]. However, when the same family of codes is used, the Niederreiter and McEliece cryptosystems exhibit the same cryptographic guarantees [49]. The triple of polynomial-time algorithms Π_Nie = (Keygen_Nie, Enc_Nie, Dec_Nie) defining the scheme is as follows:
• The key-generation algorithm considers a binary linear block code C(n, k), with codeword length n and information word length k, and outputs a secret key sk_Nie defined as the parity-check matrix H_{r×n} of a code C(n, k), with r = n − k, able to correct t ≥ 1 or fewer bit errors, plus a randomly chosen invertible binary matrix S_{r×r}, named scrambling matrix:

sk_Nie ← {S, H}.

The corresponding public key is computed as pk_Nie = H′_{r×n} = SH. Note that the knowledge of H′ is not amenable to be employed with an efficient decoding algorithm, as it actually hides the structure of the selected code.
• The encryption algorithm takes as input a public key pk_Nie and a message composed as a 1 × n binary vector e with exactly t asserted bits, and outputs a ciphertext x_{r×1} ← Enc_Nie(pk_Nie, e) computed as the syndrome of the original message:

x = H′e^T = SHe^T.   (1.12)

• The decryption algorithm takes as input the secret key sk_Nie and a ciphertext x, and computes S^{−1}x = He^T; the syndrome decoding algorithm of the secret code is then applied to recover the original message e.
In the original Niederreiter cryptosystem, algebraic code families provided with bounded-distance decoders were considered. In such a case, the syndrome decoding algorithm deterministically recovers the original error vector with weight t, and the cryptosystem exhibits a zero DFR.
In the following we describe the algorithms of both the McEliece Π_McE = (Keygen_McE, Enc_McE, Dec_McE) and Niederreiter Π_Nie = (Keygen_Nie, Enc_Nie, Dec_Nie) cryptosystems, instantiated with QC-LDPC codes.
• The key-generation algorithms Keygen_Nie in Fig. 1.1(a) and Keygen_McE in Fig. 1.1(b) consider a QC-LDPC code C(n, k), with codeword length n = pn_0 and information word length k = p(n_0 − 1), where n_0 ∈ {2, 3, 4} and p is a prime number such that ord_p(2) = p − 1.
The first step in the key generation procedures is to find two random binary matrices that correspond to the (secret) quasi-cyclic p × pn_0 parity-check matrix H of a QC-LDPC code with the mentioned parameters and a pn_0 × pn_0 quasi-cyclic sparse binary matrix Q. The matrix H is structured as a 1 × n_0 array of circulant blocks, each of which with size p × p and with a fixed odd number of asserted elements per row/column, denoted as d_v (to guarantee invertibility of each block – see Th. 1.1.3). The two secret matrices are then multiplied to obtain

L = HQ   (1.14)
Figure 1.1: Summary of the key generation algorithms of both Niederreiter (a) and McEliece (b)
cryptosystems, instantiated with QC-LDPC codes
with the same rank as H, thus rank(L) = p. This implies that every p × p block of L = [L_0, L_1, . . . , L_{n_0−1}] has the same rank (i.e., rank(L_i) = p) and is therefore invertible. As a consequence, in line 6 of both Fig. 1.1(a) and Fig. 1.1(b), there is no need to check whether L_{n_0−1} admits a multiplicative inverse or not.
It is worth noting that the necessary and sufficient conditions in Th. 1.1.3 state that an invertible binary circulant block must exhibit an odd row/column weight. Since L_j = \sum_{i=0}^{n_0−1} H_i Q_{ij}, the weight of L_j also satisfies wt(L_j) = d_v × m − 2c, where the parameter c ≥ 0 is justified with an argument analogous to the one reported in the proof of Th. 1.1.4. From this, it is easy to conclude that m = \sum_{i=0}^{n_0−1} m_i must also be an odd number.
Lines 4–5 in both Fig. 1.1(a) and Fig. 1.1(b) check that all the blocks of the matrix L have a weight equal to d_v × m, and repeat the generation process until this condition holds. Such a constraint is imposed as the weight of the blocks of L is a security parameter in LEDAcrypt, and must therefore be appropriately constrained.
Starting from the multiplicative inverse of L_{n_0−1}, the following matrix can be computed

M = L_{n_0−1}^{−1} L = [M_0 | M_1 | M_2 | . . . | M_{n_0−2} | I_p] = [M_l | I]   (1.15)

where I denotes the p × p identity matrix. The matrix M in (1.15) is the parity-check matrix of the public code in systematic form, and can easily be converted into the systematic generator matrix of the same code.
The private key for each of the schemes at hand consists of the two matrices H and Q: sk_Nie ← {H, Q}, sk_McE ← {H, Q}. Comparing these with the keypair definitions in the original Niederreiter and McEliece schemes, it is worth noting that the original non-singular scrambling matrix is replaced with the multiplicative inverse of the last circulant block resulting from the multiplication of the secret matrices, i.e., L_{n_0−1}^{−1}, with L = HQ = [L_0, . . . , L_{n_0−1}]. The secret permutation matrix introduced in the original McEliece scheme has no counterpart in our scheme because we use a public generator matrix in systematic form and embed the result into a CCA2 construction, thus making it redundant from a security perspective.
The computation of the public key depends on whether the scheme is Niederreiter-based
or McEliece-based. Specifically, the computation of the public key values will yield either
a parity-check matrix or a generator matrix, both in systematic form, referring to a public
code without a computationally efficient decoding/decryption algorithm (see lines 7–9 in both
Fig. 1.1(a) and Fig. 1.1(b)).
It is worth noting that in practice there is no need to store, in the public key, circulant blocks that are p × p identity matrices. Moreover, since both H and Q are formed by sparse circulant blocks, it is convenient to store each block as the set of integers representing the positions of the non-zero elements in the first row of the block. Each of the said integers requires ⌈log_2(p)⌉ bits to be stored.
If we consider that the circulant blocks in any block row of Q have overall weight m = \sum_{i=0}^{n_0−1} m_i, the size of sk_Nie (equiv., sk_McE) is |sk_Nie| = n_0 (d_v + m) ⌈log_2(p)⌉ bits.
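As a worked instance of the private-key size formula (with purely illustrative parameter values, not an actual LEDAcrypt parameter set):

```python
from math import ceil, log2

def sk_size_bits(n0, dv, m, p):
    """|sk| = n0 * (dv + m) * ceil(log2(p)) bits (sparse-index representation)."""
    return n0 * (dv + m) * ceil(log2(p))

# Purely illustrative values:
print(sk_size_bits(n0=2, dv=17, m=7, p=14939))  # 672 bits, i.e., 84 bytes
```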
We note that, given the fast generation procedure for H and Q, a viable option to reduce
the size of the private key at rest is to store the seed of a cryptographically secure Pseudo
Random Number Generator (PRNG) from which H and Q are generated.
• The encryption algorithm Encrypt_Nie in Fig. 1.2(a) takes as input a public key pk_Nie = [M_0 | . . . | M_j | . . . | M_{n_0−2} | I] and a message composed as a randomly chosen 1 × pn_0 binary vector e with exactly t asserted bits, and outputs a ciphertext that is the p × 1 syndrome of the message, computed by multiplying the sequence of n_0 message blocks of size 1 × p by the QC parity-check matrix of the public code included in the public key:

s = [M_0 | . . . | M_j | . . . | M_{n_0−2} | I] e^T.
Figure 1.2: Encryption algorithm of Niederreiter (a) and McEliece (b) cryptosystems, instantiated
with QC-LDPC codes
The encryption algorithm Encrypt_McE in Fig. 1.2(b) takes three input parameters: a plaintext message composed as a 1 × p(n_0 − 1) binary vector u, an error vector composed as a 1 × pn_0 binary vector e with exactly t asserted bits (supposed to be uniformly and randomly picked from the set of binary vectors with the same weight), and a public key pk_McE structured as a QC generator matrix with size (n_0 − 1) × n_0 blocks, i.e., pk_McE = [Z | ([M_0 | . . . | M_{n_0−2}])^T], where Z = diag(I, . . . , I) is a diagonal block matrix composed of n_0 − 1 replicas of the identity circulant block I (i.e., a p × p identity matrix), which coincides with an (n_0 − 1)p × (n_0 − 1)p identity matrix. The algorithm outputs a ciphertext c that is an error-affected 1 × pn_0
binary vector composed as follows. The sequence of the first n_0 − 1 binary vectors from the plaintext message (each with size 1 × p), followed by the binary vector obtained as the sum of the products between the corresponding 1 × p blocks in u and the ones in the public key portion [M_0 | . . . | M_{n_0−2}], is added to the sequence of n_0 binary vectors of the error vector e:

c = [e_0 | . . . | e_{n_0−2} | e_{n_0−1}] + [u_0 | . . . | u_{n_0−2} | \sum_{j=0}^{n_0−2} u_j M_j].
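Since each p × p circulant block acts as a multiplication in F_2[x]/⟨x^p + 1⟩, the block products u_j M_j above reduce to binary polynomial multiplications modulo x^p + 1; a naive illustrative sketch of one such product (with bit lists as coefficient vectors) is:

```python
def gf2x_mul_mod(a, b, p):
    """Product of a(x), b(x) in F_2[x]/<x^p + 1>, coefficients as bit lists."""
    res = [0] * p
    for i, ai in enumerate(a):
        if ai:
            for j, bj in enumerate(b):
                if bj:
                    res[(i + j) % p] ^= 1
    return res

# One block product u_j * M_j: multiply the 1 x p block u_j by the circulant
# block represented by the polynomial m_j(x).
p = 7
u_j = [1, 0, 1, 0, 0, 0, 0]       # u_j(x) = 1 + x^2
m_j = [0, 1, 1, 0, 0, 0, 0]       # m_j(x) = x + x^2
print(gf2x_mul_mod(u_j, m_j, p))  # (1 + x^2)(x + x^2) = x + x^2 + x^3 + x^4
```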
• The decryption algorithm Decrypt_Nie in Fig. 1.3(a) takes as input a secret key sk_Nie = {H, Q} and a ciphertext that is identified with a p × 1 binary syndrome, supposed to be computed as s = [M_0 | . . . | M_{n_0−2} | I] e^T = L_{n_0−1}^{−1} [L_0, . . . , L_{n_0−1}] e^T. The outcome of the decryption algorithm is the original binary vector e. The initial steps of the algorithm (lines 1–2) consist in taking the secret matrices, re-computing L = HQ and executing the multiplication between the received syndrome s and the last block of L = [L_0, . . . , L_{n_0−1}] to obtain a new p × 1 binary vector s′ = L_{n_0−1} s = (HQ) e^T = H (Q e^T) = H (eQ^T)^T.
Defining the expanded error vector as e′ = eQ^T, the binary vector s′ can be thought of as s′ = H e′^T, to the end of applying a QC-LDPC decoding procedure and recovering both e′ and, subsequently, the original error vector e = e′ (Q^T)^{−1}.
QC-LDPC decoders are not bounded-distance decoders, and some non-zero DFR must be tolerated. The system parameters can be chosen such that the DFR is acceptably small; for this purpose, the average decoding radius of the private code must be sufficiently larger than the Hamming weight of e′, which is t′ ≤ mt and approximately equal to mt, due to the sparsity of Q and e. As shown in line 3 of algorithm Decrypt_Nie in Fig. 1.3(a), the LEDAdecoder algorithm performs the computation of the error vector e, avoiding altogether the computation of both (Q^T)^{−1} and the subsequent vector–matrix multiplication.
The decoder adopted in versions < 2.5 of the LEDAcrypt specification allows the error vector e to be recovered directly from the syndrome s and the secret matrices H and Q, while in the current and future specifications and implementations of LEDAcrypt (i.e., the ones with ver. ≥ 2.5) we take the matrix Q equal to an identity matrix of the same size, while the values of the parameters of the cryptoschemes are chosen accordingly, as per the NIST security requirements (see Chap. 4). As a consequence, the analyses and algorithms detailed in Chap. 3 to describe the LEDAdecoder allow e to be recovered from the syndrome s and the secret matrix L = H.
If the decoding procedure terminates successfully (i.e., the decoder verifies that the syndrome corresponding to the recovered error is equal to zero), the procedure returns a Boolean variable res = true along with the recovered value e (see line 3 of algorithm Decrypt_Nie in Fig. 1.3(a)). On the other hand, when the decoding fails, the algorithm Decrypt_Nie returns the value of the Boolean variable res set to false and a null value for the original message, e = ⊥ (lines 4–5).
The decryption algorithm Decrypt_McE in Fig. 1.3(b) takes as input a secret key sk_McE = {H, Q} and a ciphertext that is identified with an error-affected codeword c = [c_0, . . . , c_{n_0−1}], computed as the element-wise addition between a 1 × pn_0 error vector with exactly t asserted bits and a 1 × pn_0 binary vector [u_0, . . . , u_{n_0−2}, blk], where blk = \sum_{j=0}^{n_0−2} u_j M_j, with [M_0 | . . . | M_{n_0−2} | I] = L_{n_0−1}^{−1} HQ. The outcomes of the decryption algorithm are the original error vector e and message u.
The initial steps of the algorithm (lines 1–2 in Fig. 1.3(b)) consist in taking the secret matrices, re-computing L = HQ and executing the multiplication between the parity-check matrix L and the transposed ciphertext, so that the resulting syndrome can be processed by the QC-LDPC decoding procedure analogously to the Niederreiter case.
Figure 1.3: Decryption algorithm of Niederreiter (a) and McEliece (b) cryptosystems, instantiated
with QC-LDPC codes
In this section we describe the structure of the Key Encapsulation Mechanism of LEDAcrypt, where the QC-LDPC Niederreiter cryptoscheme described in the previous section is employed as a passively secure (a.k.a. one-way CPA secure, or OW-CPA secure) building block¹ to design the LEDAcrypt-KEM scheme with a long-term public/private keypair, and the LEDAcrypt-KEM-CPA scheme with an ephemeral public/private keypair. In particular, each of the said schemes is composed of a triple of algorithms for the generation of the public/private keypair, the encapsulation of a secret session key, and the decapsulation of a secret session key, respectively. The key generation algorithms of both schemes follow the definitions and steps already shown in Figure 1.1(a) as Algorithm 2, while they differ in the size of the generated keypairs and in the numerical values of the specific parameters, as described in the following chapters. An encapsulation algorithm enables a sender to compute a secret session key and the corresponding encrypted (or encapsulated) payload, i.e., a ciphertext, employing the public key of the intended receiver. A decapsulation algorithm enables the intended receiver to employ her private key to decrypt a ciphertext computed with her public key and subsequently derive the secret session key chosen by the sender.

¹ A deterministic (key-generation, encrypt, decrypt) triple of algorithms is said to be one-way (OW) if the probability that a polynomial-time attacker can invert a randomly generated ciphertext c = Encrypt(m, pk) (where m is chosen at random in the plaintext space) is a negligible function of the cryptosystem parameters. The term “chosen plaintext attack” (CPA) is used because the attacker is not allowed to make decryption queries.
The LEDAcrypt-KEM scheme employs long-term keys for scenarios where key reuse is desirable. Assuming the use of a QC-LDPC decoder with appropriately low DFR, the encryption and decryption transformations of the OW-CPA Niederreiter scheme, reported in Figure 1.2(a) and Figure 1.3(a) respectively, are used in the construction reported in Figure 1.4 to obtain the encapsulation and decapsulation algorithms of a Key Encapsulation Mechanism (KEM) with IND-CCA2 guarantees.
The IND-CCA2 security guarantees of LEDAcrypt-KEM are obtained by employing a QC-LDPC decoder with DFR = 2^{−λ} and the construction described by algorithms Encap and Decap in Figure 1.4, ensuring protection with a security level 2^λ against passive and active adversaries. The security against a passive adversary is provided by the OW-CPA property of the underlying Niederreiter scheme and by the fact that the session key is derived through a Key Derivation Function (KDF). The security against active adversaries, including the ones considered in the so-called reaction attacks [56] – which induce decryption failures by choosing plaintext messages in the message space, so as to recover the private key – is provided building on one of the constructions reported in [39], augmented with a message confirmation mechanism to prevent the choice of a specific plaintext message.
In particular, lines 3–4 of Algorithm Encap in Figure 1.4(a), and lines 1–2 and 7–10 of Algorithm Decap in Figure 1.4(b), coincide with the U_m^⊥ construction of an IND-CCA2 KEM defined in [39]. Specifically, Theorem 3.6 in [39] establishes that, in the random oracle model (ROM), the IND-CCA2 security of the construction U_m^⊥ tightly reduces to the OW-CPA security of the underlying deterministic public key cryptoscheme, with a bound on the probability of observing a decryption failure when the OW-CPA scheme decrypts a legitimate ciphertext, which is formalized as the δ-correctness of the OW-CPA public key cryptoscheme. In our case, the said OW-CPA public key cryptoscheme coincides with the Niederreiter scheme described in Section 1.2 (see Figure 1.1(a), Figure 1.2(a), Figure 1.3(a)).

Figure 1.4: Description of the key encapsulation and decapsulation primitives of LEDAcrypt-KEM
Figure 1.2(a), Figure 1.3(a)).
The same construction and the same assumption about the OW-CPA security of the underlying
deterministic cryptosystem were proven to achieve IND-CCA2 also in the quantum oracle model
(QROM) in [41], with a tighter security reduction being reported in [42].
The proofs in [39, 41, 42] define the δ-correctness of the underlying OW-CPA scheme as the probability that a (possibly unbounded) adversary is able to induce a decryption failure on a valid ciphertext, taken as the maximum over all possible plaintexts, and averaged over all the keypairs. The proofs require that δ ≤ 2^{−λ} for the KEM construction to exhibit IND-CCA2 assurances with a security level 2^λ.
The definition of δ-correctness does not match the one usually employed to analyze the failure probability of a Niederreiter public key cryptosystem relying on QC-LDPC codes. Indeed, it is customary, in the coding theory community, to evaluate the decoding failure probability (or decoding failure rate – DFR) of a given code (i.e., Niederreiter keypair) averaged over all the possible plaintexts (i.e., error vectors).
We solve the mismatch between the said definitions by proposing an augmentation of the U_m^⊥ construction, where we add a mechanism to prevent the adversary from exploiting the advantage which he may have in selecting a set of input messages, i.e., error vectors, causing a decryption failure
Figure 1.5: Graphical representation of the IND-CCA2 LEDAcrypt-KEM Encap (a), and
LEDAcrypt-KEM Decap primitives (b)
with probability different from (and possibly higher than) the average over all the error vectors. The said mechanism makes the maximum probability of a decryption failure over all plaintexts coincide with the average failure probability over all plaintexts, by denying the adversary the opportunity to feed the KEM encapsulation function with a plaintext message of her choice.
Figure 1.5 shows a graphical description of the operations of Algorithms Encap and Decap in Figure 1.4, where the gray boxes highlight the operations coinciding with the IND-CCA2 KEM based on the U_m^⊥ construction in [39].
As shown in Figure 1.5(a), our augmentation generates the confidential message (i.e., the error vector) e, to be processed by the underlying OW-CPA Niederreiter encryption function, as the output of an Extensible Output Function (XOF) fed with an ephemeral secret value seed. The error vector e is subsequently encapsulated in the encrypted payload, c, following the U_m^⊥ construction. The rest of the mechanism computes a tag, x, as the bitwise xor between the seed and the hash digest of the error vector itself. The recipient of the message is meant to receive both the encrypted payload c and the tag x, to verify that the error vector was actually generated as the output of an XOF.
As shown in Figure 1.5(b), the recipient of c and x, after recovering the error vector e via the QC-LDPC syndrome decoding performed by the underlying OW-CPA Niederreiter decryption function fed with c and the private key, proceeds to recover the XOF input employed by the sender (seed) by computing the bitwise xor between the received tag x and the hash digest of the recovered error vector. Subsequently, a test error vector e′ is computed as XOF(seed) at the recipient end. If the test error vector e′ matches the one output from the syndrome decoding, e, there are only two possibilities. The first one makes the recipient confident that the recovered error vector e (i.e., the received KEM plaintext message) was actually generated as the output of an XOF by a sender choosing an ephemeral seed value and following the proposed construction. The second possibility is that an adversary is able to obtain a preimage of the XOF for an arbitrary error vector e. Indeed, if the adversary is able to do so, he can choose a plaintext message e for the OW-CPA Niederreiter encapsulation function at his own will and can compute the accompanying tag x as XOF^{−1}(e) ⊕ Hash(e). Therefore, if the computation of a preimage of the XOF is possible, the adversary can induce a decryption failure probability of the OW-CPA Niederreiter scheme, during
the execution of the decapsulation function, that is different from the one averaged over all error
vectors, and the overall construction cannot be claimed to be IND-CCA2.
We summarize the security guarantees of our augmented construction in the following lemma.

Lemma 1.3.1 Given an OW-CPA Niederreiter public key cryptosystem based on QC-LDPC codes, with DFR = δ, and a preimage-resistant XOF, the LEDAcrypt-KEM construction shown in Figure 1.4, or equivalently in Figure 1.5, provides IND-CCA2 guarantees in both the ROM and the QROM.
Proof. In all the trivial cases where an adversary abides by the error vector generation mechanism shown in Figure 1.4, he will pick a random error vector among all the possible ones, therefore making the maximum probability of a decryption failure match the average one over all error vectors.
We now show that being able to derive the value x for a chosen value of e is at least as hard as finding a preimage of XOF, reducing the problem of obtaining an XOF preimage for an output with the form of an error vector to the problem of finding a valid x passing the check expressed as the third clause at line 7 in Figure 1.4. To this end, consider the algorithm A_findtag(e), which computes a tag value x such that e passes the check expressed as the third clause at line 7 in Figure 1.4. Given that the said check validates the equality e = XOF(x ⊕ Hash(e)), the value x computed by A_findtag(e) must be such that x = XOF^{−1}(e) ⊕ Hash(e). We are now able to provide an algorithm A_invXOF(b), which requires only a single call to A_findtag to find a preimage a for a given b through XOF, that is, XOF(a) = b. This, in turn, proves that computing A_invXOF(b) is no harder than computing A_findtag(e), or, in other words, that computing A_findtag(e) is at least as hard as computing A_invXOF(b).
The procedural description of A_invXOF(b) is as follows:
(i) y ← A_findtag(b)
(ii) h ← Hash(b)
(iii) return h ⊕ y
The correctness of the provided A_invXOF(b) is proven observing that y = XOF^{−1}(b) ⊕ Hash(b); therefore h ⊕ y can be rewritten as Hash(b) ⊕ XOF^{−1}(b) ⊕ Hash(b) = XOF^{−1}(b), obtaining a preimage of b through XOF. Therefore, for an attacker to be able to employ a chosen value for the error vector e, instead of a randomly drawn one, it is necessary for him to be able to find a preimage of XOF, which is not feasible by hypothesis. Therefore, the best correctness error δ the attacker may achieve matches the DFR of the underlying OW-CPA Niederreiter scheme, in turn making the proofs in the ROM and QROM models in [39, 41, 42] hold for the steps denoted as U_m^⊥ in our construction.
The LEDAcrypt-KEM-CPA scheme employs ephemeral public/private keypairs for scenarios where the security property of indistinguishability against chosen plaintext attacks (IND-CPA) is usually required and the Perfect Forward Secrecy (PFS) property is desirable.
The encryption and decryption transformations of the LEDAcrypt-KEM-CPA scheme employ the same IND-CCA2 construction defined in the previous section. The differences with respect to the scheme with long-term keys are in terms of DFR and execution times. Indeed, LEDAcrypt-KEM-CPA exhibits, w.r.t. the LEDAcrypt-KEM scheme with long-term keys, a DFR = 10^{−9} per public/private keypair, as well as faster key generation and decryption, due to different choices of the underlying QC-LDPC code parameters and of the decoder applied by the OW-CPA Niederreiter building block.
As the construction reported in Figure 1.5 provides IND-CCA2 guarantees also for the LEDAcrypt-KEM-CPA scheme, IND-CPA guarantees are provided as well. A simple lemma stating the IND-CPA property of the LEDAcrypt-KEM-CPA scheme is as follows.
Lemma 1.3.2 Given an OW-CPA Niederreiter public key cryptosystem based on QC-LDPC codes, and a preimage-resistant KDF, the LEDAcrypt-KEM-CPA construction shown in Figure 1.5 is IND-CPA secure.
Proof. The IND-CPA definition for a KEM states the computational indistinguishability of the two following games: Game1 and Game2.
In Game1 a challenger generates a pair of public/private keys (pk, sk). Subsequently, he picks a confidential message e (i.e., the error vector in our case) uniformly at random, and applies the KEM encapsulation function to get a session key K_0 and the corresponding encrypted payload c (i.e., (c, K_0) ← Encap(e, pk)). Finally, the challenger in Game1 outputs the tuple (pk, c, K_0).
In Game2 a challenger generates a pair of public/private keys (pk, sk). Subsequently, he picks a confidential message e (i.e., the error vector in our case) uniformly at random, and applies the KEM encapsulation function to get a session key K_0 and the corresponding encrypted payload c (i.e., (c, K_0) ← Encap(e, pk)). Finally, the challenger in Game2 outputs the tuple (pk, c, K_1), where K_1 is a session key picked uniformly at random.
In Game1, an honest run of the KEM encapsulation function is executed and the session
key is derived as the output of a KDF fed with the output of an XOF, which is in turn fed with
an ephemeral random seed. Since the outputs of a KDF and an XOF are indistinguishable from
uniformly random sequences, the only way the adversary could distinguish K0 from K1 would be
to derive the error vector e by inverting the KDF and subsequently match c with the outcome of
the KEM encapsulation function fed with the recovered e and the public key. Alternatively, the
adversary could guess the private key, correctly decapsulate the ciphertext c, derive the error
vector, and compute the corresponding session key. Since guessing the private key of the
Niederreiter public key cryptosystem is prevented by the OW-CPA hypothesis, and the KDF is
preimage resistant by hypothesis, the two games are computationally indistinguishable when instantiated with
the key generation and encapsulation functions of the LEDAcrypt-KEM-CPA scheme.
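Operationally, the two games can be sketched as follows; this is an illustrative Python fragment in which keygen and encap are hypothetical stand-ins for the LEDAcrypt-KEM-CPA key generation and encapsulation functions (the latter drawing the error vector e internally):

```python
import secrets

def ind_cpa_game(keygen, encap, real_key: bool):
    """Returns the tuple handed to the adversary: Game1 when real_key is
    True (genuine session key), Game2 otherwise (uniformly random key)."""
    pk, _sk = keygen()
    c, k0 = encap(pk)  # honest encapsulation of a uniformly random e
    k = k0 if real_key else secrets.token_bytes(len(k0))
    # IND-CPA holds if no efficient adversary can tell the two cases apart.
    return pk, c, k
```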
Concerning the additional security guarantees of the LEDAcrypt-KEM-CPA scheme, it is easy to
see that the IND-CCA2 construction of the scheme (see Figure 1.5) is sufficient to foil
the private key or message recovery attempts of active adversaries able to exploit the
accidental or moderate reuse of the ephemeral public/private keypairs. Indeed, in the event of a
keypair reuse, a (possibly active) attacker [56] will not be able to craft messages inducing decoding
failures faster than drawing them randomly, due to the message confirmation tag included in the
IND-CCA2 construction. As a consequence, reusing an ephemeral keypair a number of times
smaller than 1/DFR is unlikely to provide any advantage to an attacker.
In this section we describe the structure of the Public Key Cryptosystem of LEDAcrypt, where the
QC-LDPC McEliece cryptoscheme described in Section 1.2 is employed as a building block in a
construction providing IND-CCA2 guarantees. While it is possible to employ LEDAcrypt-
KEM in combination with a symmetric encryption primitive to obtain a Key Encapsulation Mechanism
+ Data Encapsulation Mechanism (KEM+DEM) scheme, we note that such an approach may lead to a
non-negligible ciphertext expansion in case plaintexts are small in size.
In particular, the LEDAcrypt-PKC scheme is composed of a triple of algorithms for the genera-
tion of the public/private keypair, the encryption transformation of a plaintext message of
arbitrary length, and the decryption transformation of a ciphertext of arbitrary length,
respectively. The key generation algorithm of LEDAcrypt-PKC follows the definitions and steps
already shown in Figure 1.1(b) as Algorithm 3, while the encryption and decryption transforma-
tions are defined augmenting the McEliece encryption and decryption primitives in Figure 1.2(b)
and Figure 1.3(b) as prescribed by the IND-CCA2 γ-construction introduced in [45]. It is worth
noting that the systematic form of the generator matrix of the public code included in the public
key, pk McE , would easily allow any observer to recover the information word u embedded in an
encrypted message c, without recovering the private key of the cipher (i.e., sk McE = {H, Q}). Nev-
ertheless, the conversion proposed by Kobara and Imai in [45], with the purpose of maximizing
the amount of message encrypted via a McEliece encryption transformation, allows IND-CCA2
guarantees to be provided in the Random Oracle Model (ROM). Indeed, one of the elements of
the Kobara and Imai γ-construction [45] is to obfuscate the plaintext message, before encrypting it
with the McEliece transformation, employing a bit-wise xor between the plaintext message and a
secret pseudo-random mask that can be re-computed also at the recipient side. As a consequence,
the confidentiality of the information word as well as the secrecy of the private key remain guaran-
teed by the hardness of the NP-hard syndrome decoding problem that an adversary should face in
recovering the error vector knowing the ciphertext (codeword) and the systematic generator matrix
employed as public key.
Figure 1.6 and Figure 1.7 describe the encryption and decryption transformations of the IND-CCA2
LEDAcrypt-PKC, respectively. As the Kobara-Imai (KI) γ-conversion [45] is based on bit-wise
manipulations, to provide a clear and detailed description, we introduce some specific naming
conventions. Bit-string values will be reported with a teletype font name (e.g., the plaintext to be
encrypted will be denoted as ptx, and the resulting ciphertext will be denoted as ctx), while the
length in bits of a bit-string s will be denoted as ls.
A pseudo-code description of the KI-γ encryption algorithm is shown as Algorithm 10 in Fig. 1.6.
The main intuition of the KI conversion relies on xor-combining a padded plaintext message with
the output of a Deterministic Random Bit Generator (DRBG), seeded with the random bit-string
extracted from a True Random Number Generator (TRNG). To this end, we chose the NIST
standard PRNG CTR-DRBG, instantiated with AES-256 as its core block cipher. Specifically, the
original plaintext message (with size lptx bits) is prefixed with a bit-string named constant, having
lconst bits, and a bit-string named lenField, having llenField bits, containing the value of lptx .
This message is then concatenated, if necessary, with a sequence of null bits to align the overall
length of such an extended plaintext m to a byte multiple for implementation convenience (i.e.,
lextendedPtx = 8 · ⌈(lconst + llenField + lptx)/8⌉). The extended plaintext is then xor-combined with the output
of a DRBG, so that the resulting obfuscated plaintext can be safely enciphered; this ensures that
the ciphertext appears perfectly random.

Figure 1.6: Description of the KI-γ encryption function adopted to define the encryption primitive
of LEDAcrypt PKC
In order for the IND-CCA2 property to hold, a fresh random value (i.e., a seed), encoded with lseed
bits, should be available from a TRNG for each message encryption. In our instance of the KI-γ
construction, the bit-length of the DRBG seed equals 128, 192, 256 bits, depending on the security
level prescribed by NIST for the DRBG primitive, addressing category 1, 3, or 5, respectively. The
lenField bit-string is commonly encoded employing 64 bits, while the arbitrarily chosen value
encoded into the bit-string named constant equals, in our instance of the construction, a sequence
of lseed zero bits, i.e., constant = 0lseed . To be able to successfully decrypt the plaintext, the
seed of the DRBG is also enciphered alongside the message. The secret seed is xor-combined with
the digest computed by a hash function fed with the obfuscated plaintext. The correctness of the
resulting obfuscated seed is guaranteed by concatenating to the original seed value enough zero bits
to match the length of the hash digest in case lseed < lhash , thus obtaining lobfuscatedSeed = lhash .
The extended plaintext bits are subsequently split into two bit-strings, namely iword and leftOver.
The former, having size of liword bits, is employed as the information word u to be encoded by the
QC-LDPC code underlying the processing performed by the McEliece encryption function (i.e.,
liword = p(n0 − 1)), while the latter (with size lleftOver bits) is concatenated with the output of the
McEliece encryption function to yield the final ciphertext (ctx) of the KI-γ construction.

Figure 1.7: Description of the KI-γ decryption function adopted to define the decryption primitive
of LEDAcrypt PKC
The obfuscated secret seed bits are employed as input to the constant-weight encoding function to
obtain a binary error vector e, with exactly t asserted bits, which is in turn fed into the McEliece
encryption function to complete a proper initialization of the encryption process. In case the
number of bits of the obfuscated seed (i.e., lobfuscatedSeed = lhash) is smaller than the number required
by the constant-weight encoding procedure, additional random bits are extracted from the DRBG (or from a
new random source) to match the requirement.
The final ciphertext is composed of a bit-string made of the binary elements of the codeword
computed by the McEliece encryption function, concatenated with the leftOver bit-string. In
case the bit-length of leftOver is not a multiple of 8, leftOver is prefixed with a zero pad up to
the nearest byte for the sake of implementation ease and efficiency.
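Putting the above steps together, the encryption flow can be illustrated with the following byte-level Python sketch; it is a simplified rendition of Algorithm 10, not the reference implementation, and drbg, cw_encode, and mceliece_encrypt are hypothetical stand-ins for the CTR-DRBG, the constant-weight encoder, and the McEliece encryption primitive (byte-aligned lengths and lseed ≤ lhash are assumed throughout):

```python
import hashlib, secrets

def ki_gamma_encrypt(ptx: bytes, pk, l_seed: int, l_iword: int,
                     drbg, cw_encode, mceliece_encrypt) -> bytes:
    """Simplified KI-gamma encryption flow (all lengths in bytes)."""
    seed = secrets.token_bytes(l_seed)                  # fresh TRNG seed
    constant = bytes(l_seed)                            # l_seed zero bytes
    len_field = len(ptx).to_bytes(8, "big")             # 64-bit lenField
    extended_ptx = constant + len_field + ptx           # extended plaintext
    mask = drbg(seed, len(extended_ptx))                # DRBG keystream
    obf_ptx = bytes(a ^ b for a, b in zip(extended_ptx, mask))
    digest = hashlib.sha3_256(obf_ptx).digest()         # Hash(obf. plaintext)
    padded = seed + bytes(len(digest) - len(seed))      # zero-pad the seed
    obf_seed = bytes(a ^ b for a, b in zip(padded, digest))
    e = cw_encode(obf_seed)                             # weight-t error vector
    iword, left_over = obf_ptx[:l_iword], obf_ptx[l_iword:]
    return mceliece_encrypt(iword, e, pk) + left_over   # ctx
```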
It is worth noting that the KI-γ construction allows a plaintext message of arbitrary length to be
encrypted. Employing the LEDAcrypt-PKC construction, plaintext messages longer than
p(n0 − 1) − lconst − llenField bits have a fixed ciphertext expansion of p + lconst + llenField bits.
In contrast, a LEDAcrypt-KEM+DEM approach requires a fixed ciphertext expansion of pn0 bits.
The described procedure employs as configuration parameters the QC-LDPC code parameters of choice
(i.e., the ones of C(pn0, p(n0 − 1)), allowing up to t bit errors to be corrected), and a hash function, Hash(),
having a digest with a byte-aligned bit-length lhash. To instantiate the construction, we chose the
NIST standard SHA-3 hash function.
Algorithm 11 in Fig. 1.7 reports the pseudocode of the LEDAcrypt-PKC decryption procedure. The
algorithm performs, in reverse order, the steps dual to the ones in the encryption procedure. The
value in the first lconst bits of the retrieved extended plaintext message is compared with the fixed
value of the constant considered in the KI-γ to reject the decryption outcome in case they mismatch.
The rationale underlying the design choice of prefixing the plaintext with the constant string is
that the probability of the DRBG generating the same initial lconst bits when fed with distinct
seeds is negligible (2−128, 2−192, 2−256, for the three categories respectively), and the probability of failing to detect
a correct decryption of the de-obfuscated concatenation of constant, lenField, and plaintext by
checking only the value of constant is also negligible.
As the McEliece decryption transformation employs a QC-LDPC code with an intrinsically non-
bounded decoding radius, the decryption process of the KI-γ may have an additional reason to
fail. As shown in the pseudocode, the IND-CCA2 property² is preserved by making a decoding failure
(consider the failed check at line 3 of Algorithm 11) indistinguishable from an accidental error
(consider the failed check at line 11 of Algorithm 11), and by preventing active adversaries from freely
choosing both the seed and the plaintext.
The security guarantees of the LEDAcrypt-PKC scheme are summarized in the following lemma.
Lemma 1.4.1 Given an OW-CPA McEliece cryptosystem based on the QC-LDPC syndrome de-
coding problem as the one in Section 1.2, with DFR=δ, the LEDAcrypt-PKC construction provides
IND-CCA2 guarantees in the ROM model.
Proof. (Sketch)
The Fujisaki-Okamoto (FO) generic conversion presented in [30,31] enhances a correct (i.e., DFR = 0)
OW-CPA public key encryption scheme to exhibit indistinguishability against adaptive chosen ci-
phertext (IND-CCA2) attacks in the random oracle model. In [45], Kobara and Imai introduced
three equivalent constructions (KI-α, KI-β, KI-γ) tailored specifically for the McEliece cryptosystem
rather than a general OW-CPA encryption scheme, showing considerable savings in terms of
ciphertext expansion and the possibility to employ a systematic generator matrix without af-
fecting the IND-CCA2 guarantees. In [39] the authors solve the issues of the FO and KI conversions
related to the need for a perfectly correct OW-CPA public key scheme and for tight security reductions,
leveraging the notion of δ-correctness. The δ-correctness definition in [39] considers the probability
of failure of the decryption primitive on a legitimate ciphertext, averaged over all possible key pairs
and taken as the maximum achievable by an adversary who knows both the public and private key
of the scheme. The decoding failure analysis adopted by the McEliece cryptosystem, underlying
the LEDAcrypt-PKC scheme, considers an upper bound on the decoding failure rate for any given
keypair, assuming that the adversary is randomly picking error vectors. This analysis matches the
requirements of the definitions in [39]. In fact, due to the KI-γ construction, the maximization
of the decoding failure rate over all the plaintexts has no effect, since the plaintext message in
the KI-γ is employed as a codeword of the underlying systematic McEliece scheme, and thus does
not influence the decoding failure probability. Indeed, the error vector in the KI-γ construction is
picked as the output of a hash (encoded with a constant weight encoding procedure) and thus the
attacker is not able to choose a specific error pattern unless he finds a preimage of the said hash.

² given that the decoding failure rate is low enough
A direct approach to implement the constant-weight encoding and decoding procedures may consist
in applying the conversion mandated by a combinatorial number system [44], which bijectively maps
any n-bit string with t asserted bits to an integer value in the range {0, . . . , C(n, t) − 1}
and vice versa, thus building an enumeration of the set of combinations. However, computing the
said bijection requires a computational effort in the same range as computing the binomial coefficient C(n, t),
which adversely impacts the performance of LEDAcrypt when n is in the range of tens of thousands
and t in the range of hundreds.
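For reference, the direct combinadic mapping can be sketched in a few lines of Python; this naive, non-constant-time illustration of the combinatorial number system highlights the large-binomial arithmetic that makes it unattractive, and is not the optimized encoder of [7]:

```python
from math import comb

def int_to_combination(x: int, n: int, t: int) -> list[int]:
    """Maps x in {0, ..., C(n,t)-1} to the x-th weight-t combination
    (positions of the t asserted bits in an n-bit string)."""
    positions = []
    for i in range(t, 0, -1):
        c = i - 1                      # largest c with C(c, i) <= x
        while comb(c + 1, i) <= x:
            c += 1
        positions.append(c)
        x -= comb(c, i)
    return positions[::-1]

def combination_to_int(positions: list[int]) -> int:
    """Inverse mapping: combination back to its index."""
    return sum(comb(c, i + 1) for i, c in enumerate(sorted(positions)))
```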
In [7], we introduced an efficiently computable bijective function which yields the desired mapping
with a computational complexity linear in the length of the string at hand, and delineated how
the said function can be computed in constant time. The details of the implemented construction
and the related experimental campaign showing speed-ups from three to five orders of magnitude
with respect to other state-of-the-art solutions, can be found either in [7] or in the pre-print draft
available at https://www.ledacrypt.org/documents/cf2020.pdf.
Chapter 2
In order to analyze the security of the LEDAcrypt primitives, we start by providing a quantita-
tive assessment of NIST’s security level targets in terms of the number of classical and quantum
elementary operations required to perform an exhaustive key search against AES.
Then, we analyze the best known attacks against LEDAcrypt from the perspective of a passive
attacker, willing to perform either a message or key recovery attack given a ciphertext or the public
key, respectively. Finally, we analyze the best known attacks in the context of an active attacker
with the aim of providing IND-CCA2 guarantees for the LEDAcrypt cryptosystem.
The main known attacks against LEDAcrypt are those applicable against QC-LDPC code-based
cryptosystems [5], which have been studied for twelve years since the first proposal appeared in [4],
plus statistical attacks recently introduced in [27, 38]. We carefully analyze their capabilities and
address the design of parameters for the LEDAcrypt primitives to provide the required security
guarantees taking into account the computational cost reduction following from the use of a quan-
tum computer in the solution of the underlying computationally hard problems.
In addition to the aforementioned attacks, it has recently been pointed out [57] that the occurrence
of some weak keys may follow from the product structure of L in Eq. (1.14). Such a vulnerability
is also taken into account in this chapter, and suitably countered.
The bar to be cleared to design parameters for post-quantum cryptosystems is the computational
effort required on either a classical or a quantum computer to break the AES with a key size of
λ bits, λ ∈ {128, 192, 256}, through an exhaustive key search. The three pairs of computational
efforts required on a classical and quantum computer correspond to NIST Category 1, 3, and 5,
respectively [53]. Throughout the design of the parameters for the LEDA cryptosystems we ignore
Categories 2 and 4: if a cipher matching those security levels is required, we advise to employ the
parameters for Categories 3 and 5, respectively.
The computational worst-case complexity of breaking AES on a classical computer can be estimated
as 2λ CAES , where CAES is the amount of binary operations required to compute AES on a classical
computer on a small set of plaintexts, and match them with a small set of corresponding ciphertexts
to validate the correct key retrieval. Indeed, more than a single plaintext-ciphertext pair is required
to retrieve AES keys [34]. In particular, a validation on three plaintext-ciphertext pairs should be
performed for AES-128, on four pairs for AES-192 and on five for AES-256.
To consider a realistic AES implementation for exhaustive key search purposes, we re-
fer to [73], where the authors survey the state of the art of Application-Specific Integrated Cir-
cuit (ASIC) AES implementations, employing the throughput per Gate Equivalent (GE) as their
figure of merit. The best-performing AES implementations are the ones proposed in [73], and re-
quire around 16ki GEs. We thus deem it reasonable to estimate the computational complexity of an
execution of AES as 16ki ≈ 2^14 binary operations. We are aware that this is still a conservative
estimate, as we ignore the cost of the interconnections required to carry the required data to the
AES cores.
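As a sanity check, the resulting classical cost figures (reported later in Table 2.1) can be reproduced directly from these assumptions, i.e., 2^λ keys, roughly 2^14 binary operations per AES evaluation, and 3, 4, or 5 validation pairs:

```python
from math import log2

for lam, pairs in ((128, 3), (192, 4), (256, 5)):
    # 2^lam keys, ~2^14 binary ops per AES run, `pairs` runs per candidate
    print(f"AES-{lam}: 2^{lam + 14 + log2(pairs):.2f} binary operations")
# prints 2^143.58, 2^208.00, 2^272.32, matching Table 2.1 up to rounding
```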
The computational complexity of performing an AES key retrieval employing a quantum com-
puter was measured first in [34], where a detailed implementation of an AES breaker is provided.
The computation considers an implementation of Grover’s algorithm [35] seeking the zeros of the
function given by the binary comparison of a set of AES ciphertexts with the encryption of their
corresponding plaintexts for all the possible key values. In a more recent work, the authors of [40]
proposed a more optimized version of the Grover oracle function to perform a key recovery attack
on AES, and provided complexity figures for the full circuit depth of the breaker of both their
proposals, and the one by [34], which we summarize in Table 2.1.
The NIST call takes into account the fact that quantum cryptanalytic machines implementing a
Grover-based key finding strategy may be limited more severely by a long sequential computation,
due to the stability of the qubits, and the (square-root) reduced effectiveness of parallelizing Grover’s
algorithm. As a consequence, the call specifies an expected maximum circuit depth for the quantum
codebreaking circuits, and expresses the corresponding AES codebreaking complexities taking the
effects of such a bound on the circuit depth into account.
Currently, no gate level description of an effective oracle function to perform quantum codebreaking
on a code-based cryptosystem is publicly available. It is therefore hard to assess directly how
much a limit on the circuit depth would affect the entire codebreaking circuit. To overcome this
lack of information, we compare the depth of the AES breaker in the ideal case (i.e., without
limiting the maximum depth of the quantum circuit) with a similarly maximum-depth unlimited
circuit to perform a Grover-based attack on LEDAcrypt. We consider this comparison sufficiently
fair, given the current state-of-the-art of the description of quantum circuits to attack code-based
cryptosystems, as we expect a reduction of the maximum effective depth to have a similar
adverse effect on the performance of both the AES and the LEDAcrypt quantum codebreaking
circuits.
In Table 2.1 we summarize the computational cost of performing exhaustive key searches on all three
AES variants (i.e., with 128, 192, and 256 bits long keys), both considering classical and quantum
computers. For the sake of simplicity, in the following we will round up fractional exponents of the
reported complexities to the next integer value.
Table 2.1: Classical and quantum computational costs to perform an exhaustive key search on
AES. The quantum computational cost is expressed as the full depth of the AES circuit, assuming
no limits on the maximum constructible quantum circuit are present

NIST Category | AES Key Size (bits) | Classical Cost (binary operations) | Quantum Cost [34] (quantum gates) | Quantum Cost [40] (quantum gates)
1 | 128 | 2^128 · 2^14 · 3 ≈ 2^143.5 | 1.16 · 2^81 ≈ 2^81.2 | 1.08 · 2^75 ≈ 2^75.1
3 | 192 | 2^192 · 2^14 · 4 = 2^208 | 1.33 · 2^113 ≈ 2^113.4 | 1.33 · 2^107 ≈ 2^107.4
5 | 256 | 2^256 · 2^14 · 5 ≈ 2^272.3 | 1.57 · 2^145 ≈ 2^145.6 | 1.57 · 2^139 ≈ 2^139.6
The set of computational decision problems for which an efficient solution algorithm can be devised
for a non-deterministic Turing Machine (TM) represents a fruitful computational class from which
primitives for asymmetric cryptosystems have been designed. Such a computational class, known
as the Nondeterministic-Polynomial (NP) class, is characterized by problems for which it is efficient
(i.e., there is a polynomial-time algorithm) to verify the correctness of a solution on a deterministic
TM, while finding a solution to the problem does not have in general an efficient algorithm on a
deterministic machine, hence the computational asymmetry required to build a cryptosystem.
When considering a quantum TM, i.e., the abstract computational model for a quantum computer,
the class of problems which can be solved in polynomial time, with the quantum TM providing the
correct answer with probability > 2/3, is known as the Bounded-error Quantum Polynomial (BQP)
time class [15].
In 1997 Peter Shor proved that the integer factoring problem, which has its decisional version in
NP, is effectively in BQP [66], in turn demonstrating that a widely adopted cryptographic trapdoor
function can be broken in polynomial time by a quantum computer. Consequentially, to devise a
proper post-quantum asymmetric primitive, it is crucial to choose a computational problem which
resides outside BQP as its underlying foundation. While there is no current formal proof, a sub-
class of NP, the NP-complete problem class, is widely believed to contain computational problems
not belonging to BQP, thus allowing only a polynomial speedup in their solution with a quantum
TM.
LEDAcrypt is constructed starting from the computational search problems: i) performing the
decoding of a codeword of a random linear code, i.e., finding the error vector affecting a codeword,
known as Syndrome Decoding Problem (SDP), and ii) finding a bounded weight codeword of
a generic random linear code, known as the Codeword Finding Problem (CFP). The decision
problems corresponding to the SDP and CFP were shown to be NP-complete in [12], and, as for
all NP-complete problems, we have a natural search-to-decision reduction, as noted in [3, §2.5].
In addition to SDP and CFP, it is also known that determining the minimum distance of a random
code is also NP-hard, since its corresponding decision problem was proven to be NP-complete
in [74].
Finally, in [11] it was proven that the syndrome decoding problem remains NP-hard, with its decision
version being NP-complete, even in the case of a random quasi-cyclic code.
LEDAcrypt primitives rely on the indistinguishability of their generator matrix G and parity-check
matrix H from the ones of a random, quasi-cyclic linear code. Provided that this indistinguishability
holds, the problems of performing a key recovery attack, or a message recovery attack against
LEDAcrypt are equivalent to solving either a CFP or an SDP respectively. This in turn makes the
Niederreiter and McEliece cryptosystems described in Section 1.2 OW-CPA under the assumption
that no efficient method to solve the CFP, or the SDP exists.
At the moment of writing, the only known technique exploiting the non-random nature of the H
and G matrices of LEDAcrypt relies on the low weight of the rows of the parity-check matrix of the
corresponding secret QC-LDPC code.
The most computationally effective technique known to attack the LEDAcrypt schemes is the same
technique solving the SDP and CFP on random binary linear block codes with the same length,
rate and number of errors. Among the available set of techniques, the ones which are most effective
are known as Information Set Decoding (ISD). ISD was invented as a general efficient decoding
technique for a random binary linear block code by Eugene Prange in [58]. While ISD is indeed
more efficient than guessing the error affected positions in the corrupted codeword, its complexity is
still exponential in the number of errors affecting the codeword.
ISD can thus be employed as a message recovery attack in both McEliece and Niederreiter cryp-
tosystems, solving the corresponding SDPs. In the former case, it will recover the error pattern
from the error affected codeword constituting the ciphertext, allowing to recover the original mes-
sage; in the Niederreiter case, ISD will derive the error vector (which corresponds to the message
itself) from the syndrome constituting the ciphertext. When a message recovery attack of this kind
is performed against a cryptosystem exploiting quasi-cyclic codes, such as the case of LEDAcrypt,
it is known that a polynomial speedup factor equal to the square root of the circulant block size
can be achieved [65].
ISD algorithms have a long development history, dating back to the early ’60s [58], and provide a
way to recover the error pattern affecting a codeword of a generic random linear block code given
a representation of the code in the form of either its generator or parity-check matrix.
Despite the fact that the improvement provided by ISD over the straightforward enumeration of all
the possible error vectors affecting the codeword is only polynomial, employing an ISD technique
provides substantial speedups. It is customary for ISD variant proposers to evaluate the effectiveness
of their attacks considering the improvement in a worst-case scenario as far as the code rate and
the number of corrected errors go (see, for instance, [9]). Such an approach allows deriving the
computational complexity as a function of a single variable, typically taken to be the code length
n, and obtaining asymptotic bounds for the behavior of the algorithms.
In our parameter design, however, we chose to employ non-asymptotic estimates of the compu-
tational complexity of the ISD attacks. Therefore, we explicitly compute the required attack effort
employing a non-asymptotic analysis of the complexity of ISD algorithms, given the candidate pa-
rameters of the code at hand. This approach permits us to retain the freedom to pick rates for our
codes which are different from the worst-case one for decoding, thus exploring different trade-offs
in the choice of the system parameters. It is common for ISD variants to have free parameters,
which should be tuned to achieve optimal performance. We sought the optimal case by explicitly
computing the complexity of the ISD variant for a large region of the parameter space, where the
minimum complexity resides. In particular, we note that all such free parameters are tuning points
for computational tradeoffs.
We consider the ISD variants proposed by Prange [58], Lee and Brickell [47], Leon [48], Stern [69],
Finiasz and Sendrier [29], and Becker, Joux, May and Meurer (BJMM) [9], in our computational
complexity evaluation on classical computers. The reason for considering all of them is to avoid
concerns on whether their computational complexity in the finite-length regime is already well
approximated by their asymptotic behavior. For all the attacks, we consider the complexity as
obtained with a logarithmic memory access cost, considering as the amount of required memory
only the one taken by the lists in the case of collision-based ISD techniques. Such a choice is motivated
by the fact that the size of such lists is exponential in the code parameters, and thus has a small, but
non-negligible, impact on the computational complexity.
In order to estimate the computational
complexity of ISD on quantum computing machines, we consider the results reported in [22], which
are the same employed in the original specification [6]. Since complete and detailed formulas are
available only for the ISD algorithms proposed by Lee and Brickell [47], and Stern [69], we consider
those as our computational complexity bound. While asymptotic bounds show that executing
a quantum ISD derived from the May-Meurer-Thomae (MMT) algorithm [51] is faster than a
quantum version of Stern’s [43], we note that there is no computational complexity formulas showing
this in the finite regime.
We note that solving the CFP problem employing a given ISD variant to search for a codeword
of weight w in a random code with an r × n parity-check matrix is slightly faster than solving
an SDP searching for an error of weight w on the same code. In the following, we will denote
with C-ISDsdp (n, r, t) and Q-ISDsdp (n, r, t) the cost of solving the SDP on a code represented by
an r × n parity-check matrix, finding an error of weight t, with a classical and quantum computer,
respectively. We denote with C-ISDcfp(n, r, t) and Q-ISDcfp(n, r, t) the complexities of solving the
CFP on a code represented by an r × n parity-check matrix, finding a codeword of weight t, with a
classical and quantum computer, respectively. In the following, we will omit the C- and Q- prefixes
whenever the statements apply to both computational costs.
Next, we describe how ISD techniques can be employed to perform a key recovery attack against
the codes employed in LEDAcrypt. More precisely, we consider three different approaches and
discuss their computational complexities.
Finding low weight codewords in C . In LEDAcrypt, the public key is the representation of a
code C whose parity-check matrix L in (1.14) is in the form
L = [ L0 | L1 | · · · | Ln0−1 ] .   (2.1)
Finding low weight codewords in a smaller code C′. Consider the systematic generator
matrix for C, with the form:

Gsys = [ I_{p(n0−1)} | G ],  with G = [G0; G1; . . . ; G_{n0−2}] and Gi = (L_{n0−1}^{−1} Li)^T,   (2.4)

where the i-th block row of Gsys is [0 | · · · | Ip | · · · | 0 | Gi].

Pick a block Gi and form the matrix G′ = [Ip | Gi]. G′ can be thought of as the generator
matrix of a code C′, with length 2p and dimension p, which, as we now show, contains codewords of weight
2v. Indeed, since circulant blocks commute, we have:

L_{n0−1}^T G′ = [ L_{n0−1}^T | L_{n0−1}^T (L_{n0−1}^{−1} Li)^T ] = [ L_{n0−1}^T | Li^T ],   (2.5)

whose rows are codewords of C′ of weight 2v. The low weight codewords of C′ thus reveal enough
information to reconstruct the secret key from the public key. It is therefore possible to apply an
ISD technique to solve the CFP with complexity ISDcfp(2p, p, 2v). Note that there are p codewords
of weight 2v for each block Gi employed to build G′, therefore resulting in a speedup for the ISD
equal to n0 p.
Codeword finding in the dual code C⊥. An attacker can also consider the dual code of C, C⊥. A
valid generator matrix for such a code is the matrix L which, by construction, has low weight
rows; indeed, the rows of L have weight n0 v. It is thus possible to solve the codeword finding
problem on C⊥ with a cost ISDcfp(n0 p, (n0 − 1)p, n0 v). Due to the quasi-cyclic nature of L, the ISD
cost is reduced by a factor p.
We thus have that the cost of the key recovery attacks taken into account in our parameter design
procedure is

min { ISDcfp(n0 p, p, 2v) / (p · C(n0, 2)) ,  ISDcfp(2p, p, 2v) / (n0 p) ,  ISDcfp(n0 p, (n0 − 1)p, n0 v) / p } ,

where C(n0, 2) = n0(n0 − 1)/2 counts the pairs of circulant blocks.
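In code, the resulting selection can be expressed as follows; isd_cfp is a hypothetical callback returning the log2 cost of one ISD run on an r × n parity-check matrix for the target weight, so that the quasi-cyclic speedups are subtracted in the logarithmic domain:

```python
from math import comb, log2

def key_recovery_cost_log2(p: int, n0: int, v: int, isd_cfp) -> float:
    """log2 cost of the cheapest CFP-based key recovery strategy."""
    return min(
        isd_cfp(n0 * p, p, 2 * v) - log2(p * comb(n0, 2)),    # codewords of C
        isd_cfp(2 * p, p, 2 * v) - log2(n0 * p),              # smaller code C'
        isd_cfp(n0 * p, (n0 - 1) * p, n0 * v) - log2(p),      # dual code
    )
```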
A recent attack against LEDAcrypt primitives exploits the product structure of the parity-check
matrix of the private code, i.e., L = HQ to search for weak keys that make decoding easier than
in the general case.
This attack procedure was introduced in [2] and leverages the product structure of the private
code parity-check matrix to make separate guesses on H and Q, projecting them onto H′. This
accelerates the ISD procedures with respect to applying them directly to H′.
According to such an approach, a key is considered weak if it occurs with probability 2−x, requires
the equivalent of 2y AES operations to be recovered via ISD, and x + y < λ, where λ is the claimed
security level. Weak keys of this kind have been found for some LEDAcrypt instances.
This attack has been tested for n0 = 2, and works well when n0 is small and the weights of H
and Q are well balanced. Hence, it can be countered by increasing n0 or choosing unbalanced
weights for H and Q. In order to completely avoid attacks of this type, in the following we focus
on LEDAcrypt instances with m = 1, thus yielding maximally unbalanced weights for H and Q.
This coincides with considering those special instances of LEDAcrypt in which the matrix Q boils
down to a quasi cyclic permutation matrix and, hence, can be neglected. Therefore, from this
point onward, we will simply consider the case L = HI = H. We note that taking Q = I does
not otherwise adversely impact the one-wayness of the Niederreiter and McEliece cryptosystems as
described in Section 1.2, provided that the weight of a column of L = H is chosen taking Q = I
into account (i.e., taking v ≈ dv · m, where dv is the column weight of the old H matrix and m the
column weight of the old Q matrix).
Considering the choice of obtaining the parity-check matrix H as a randomly generated, block
circulant matrix made of 1 × n0 blocks with size p, we note that performing exhaustive key search
attacks will always be more expensive than solving the codeword finding problem via advanced
ISD solvers. Indeed, picking H as a random block circulant matrix with column weight v, and
row weight w = n0 v, does not offer any additional possibility of enumerating its components, as
opposed to obtaining it as the product of two low weight block circulant matrices. This results in
a complexity of performing an exhaustive key search equal to C(n, w) × ctest, where ctest is the cost of
testing a candidate key for correctness. We note that such a computational cost is higher than
the one required by even the simplest ISD techniques. Taking as an example Lee and Brickell's ISD [47],
the correct key is found within C(n, w) / (C(k, x) · C(r, w − x)) × ctest attempts on average, where x is a
free positive integer parameter to be optimized.
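The gap can be checked numerically: the following Python fragment computes the log2 of the expected number of Lee-Brickell attempts, minimized over the free parameter x, to be compared against log2 C(n, w) for plain enumeration (an illustrative count of attempts only, ignoring the difference in per-attempt cost):

```python
from math import comb, log2

def lee_brickell_attempts_log2(n: int, k: int, w: int) -> float:
    """log2 of C(n,w) / (C(k,x) * C(n-k,w-x)), minimized over x."""
    r = n - k
    return min(log2(comb(n, w)) - log2(comb(k, x) * comb(r, w - x))
               for x in range(0, min(k, w) + 1)
               if comb(k, x) * comb(r, w - x) > 0)
```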
In addition to the proper sizing of the parameters of LEDAcrypt so that it withstands the afore-
mentioned attacks, attacks performed by an active attacker should be taken into account.
The best known active attacks against LEDAcrypt and, more in general, against cryptosystems
with a non-null decoding failure probability, are the so-called reaction attacks.
In these attacks, an adversary has access to a decryption oracle, to which he may pose a large
number of queries; when the scheme's DFR is non-zero, decryption failure events leak
information about the secret key. Therefore, the secret key can be retrieved through a statistical
analysis of the decryption outcomes [27, 28, 37, 38, 62]. In particular, these attacks
exploit the relation between the DFR of the code and the supports of the private matrices and
of the error vector used for encryption. Indeed, whenever e and H have pairs of ones placed at the
same distance, the decoder exhibits a DFR smaller than the average one over all the possible error
vectors.
Reaction attacks require collecting the outcome of decoding (success or failure) for a significant
number of ciphertexts for which the attacker knows the distances between the set bits in the error
vector, so as to achieve statistical confidence in the result. The information on the
decoding status is commonly referred to as the reaction of the decoder, hence the name of the
attack. We note that employing an appropriate construction, so that the resulting primitive enjoys
IND-CCA2 guarantees, provably prevents reaction attacks, as the attacker model fits the CCA2
one.
In the following we briefly recall the work of [38]. Given a binary vector a of length p, having
Ψa = {p0, p1, · · · , pw} as its support, i.e., the set of positions of a containing a set bit, we define its
distance spectrum DS(a) as:

DS(a) = { min{ (pi − pj) mod p, (pj − pi) mod p } : pi, pj ∈ Ψa, pi ≠ pj } ,

that is, the set of cyclic distances between pairs of set positions.
For a circulant matrix C, the distance spectrum DS(C) is defined as the distance spectrum of
its first row, since all the rows share the same spectrum. Indeed, it can be easily shown that the
cyclic shift of a vector does not change its distance spectrum. As proven in [27], it is possible to
reconstruct a vector a once its distance spectrum and its number of set symbols are known.
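A direct Python rendition of the distance spectrum definition above is straightforward:

```python
def distance_spectrum(support: set[int], p: int) -> set[int]:
    """Cyclic distance spectrum of a length-p vector given its support:
    the smaller of the two circular distances for every pair of set bits."""
    return {min((a - b) % p, (b - a) % p)
            for a in support for b in support if a != b}
```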
The attacker can aim at recovering the distance spectrum of the circulant blocks in the secret
key; for LEDAcrypt schemes, such information is enough to recover the whole secret key from
the public one. To do this, the adversary generates a large number of valid plaintext/ciphertext
pairs and then queries the oracle with the produced ciphertexts; the oracle, for each received query,
replies with the outcome of the corresponding decryption phase. Then, the statistical analysis
of the oracle's replies (in terms of decoding success or failure) is exploited by Eve to recover the
distance spectra she is interested in. When an IND-CCA2 secure conversion is adopted, the error
vector used during encryption cannot be chosen by Eve, and can be seen as a vector extracted
uniformly at random among all the C(n, t) possible n-tuples with weight t. A critical parameter for these attacks is
T, the number of collected ciphertexts. In fact, after observing T ciphertexts, the average
number of failures observed by Eve is T · DFR, which is the basis for her statistical analysis. For
LEDAcrypt instances, we consider that any key pair has a lifetime equal to T = DFR−1, which
means that, on average, only one decryption failure is observed by Eve during the whole lifetime of
each key pair.
It has been recently pointed out in [56] that some mechanisms exist to generate error vectors able
to artificially increase the DFR of these systems. However, they start from the observation of at
least one failure, which is unlikely when the original DFR is sufficiently low. In addition, these
methods require the manipulation of error vectors, which is not feasible when an IND-CCA2 secure
conversion is adopted.
Side channel attacks against LEDAcrypt may be performed exploiting either the computation time
or the energy required to perform a computation as a valid side channel. Attacks exploiting the
latter information leakage, or any common proxy of the energy information (e.g., electromagnetic
emissions) follow the usual attack methodology, common to all asymmetric ciphers [59, 68]. In this
regard, the highly parallelizable nature of the out-of-place iteration function, or the intrinsically
randomized nature of the described randomized-in-place iteration function provide room for a
natural hiding-in-time countermeasure, at a very limited overhead. Such an approach provides
a useful complement to a masking strategy, which can be employed on the Boolean and simple
arithmetic computations of the decoder itself.
Concerning timing side channel attacks, the decoder iterations are amenable to a constant time
implementation, and the fixed number of iterations of the decoder ensures that no obstacles are
present to carry it out. While the fixed number of iterations suppresses the most evident timing
side channel leakage [26] altogether, we note that more subtle timing leakages due to the number
of flips performed during an iteration itself may yield equally informative content [63]. Indeed,
both the number of flips, and the number of iterations can be exploited by an attacker in the same
fashion the information coming from a decoding failure is. The motivation behind these attacks is
in the fact that many features of the QC-LDPC/MDPC decoding phase (together with the failure
probability) can be related to the number of overlapping ones between the columns of the secret
parity-check matrix that are indexed by the support of the error vector. This number is indeed
strictly correlated to the number of unsatisfied parity-check equations, which is the key quantity
that is used in BF decoders to estimate error affected bits. In the case of QC codes, the number of
overlapping ones between columns is directly related to the distance spectrum of the circulant blocks
in the secret key. Thus, combining such side-channel attacks with the key reconstruction described
in [38] allows for a complete key-recovery attack. To address the aforementioned issues concerning
timing attacks, a consolidated line of work on constant-time implementations of QC-LDPC/QC-MDPC
decoders is present in the open literature, starting with [18] and counting, among the most
recent works, [23, 36].
Chapter 3
In this chapter we provide the rationale of the design and the description of the LEDAdecoder
procedure for the three cryptographic primitives. In this design, our aim is twofold:
• Provide a predictable-DFR, constant time implementable, decoding strategy for the primitives
LEDAcrypt-KEM and LEDAcrypt-PKC where IND-CCA2 guarantees are desired;
• provide a low execution latency optimized decoding strategy, with a DFR low enough to be
acceptable in practical cases (≈ 10−9 ) for LEDAcrypt-KEM-CPA.
The LEDAdecoder decoding strategies are derived from the classic BF-decoder, originally pro-
posed by Gallager [33]. This choice is motivated by both the computational efficiency of this family
of algorithms and the possibility of providing a reliable and simple theoretical model for the DFR,
which becomes significantly involved when more complicated decoders are taken into account.
BF-decoders rely on a simple procedure which iterates the computation of a function employing
the syndrome s and the parity-check matrix H. Such a function starts from an estimate, ê, of the
sought error vector, e, and computes a new estimate of it, ē, from the available data.
In the first iteration, the initial value of the error estimate ê is set to the all-zero vector
of length n. The iteration function outputs an updated estimate ē of the error vector, and the
syndrome s̄ of the vector obtained as the difference between the sought error vector e and ē, that
is, s̄ = H(ē ⊕ e)T.
From now on, we will refer to such a function as an iteration of the BF-decoder.
The BF-decoder computes the iteration function repeatedly, providing as input to the next
execution the estimate of the sought error vector ê and the syndrome s, with values equal to
the updated error vector estimate ē and the updated syndrome s̄ computed at the end of the
current execution (i.e., ê = ē and s = s̄), as well as the parity-check matrix H, until
the computed value of s̄ is the null vector or a pre-defined number of iterations, imax, has been
performed.
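The overall loop structure can be rendered with the following Python sketch of a plain out-of-place BF decoder over a dense 0/1 matrix; it is only an illustration of the iteration scheme just described, with a fixed threshold, not the LEDAdecoder itself:

```python
import numpy as np

def bf_decode(H: np.ndarray, s: np.ndarray, imax: int, th: int):
    """H: p x n 0/1 integer matrix, s: length-p syndrome of the sought e.
    Returns the final error estimate and the residual syndrome."""
    e_hat = np.zeros(H.shape[1], dtype=np.uint8)
    s_cur = s.copy()
    for _ in range(imax):
        if not s_cur.any():                     # null syndrome: stop early
            break
        upc = H.T @ s_cur                       # upc_j = sum_i s_i * h_ij
        e_hat ^= (upc >= th).astype(np.uint8)   # flip all bits at once
        s_cur = (s + H @ e_hat) % 2             # syndrome of e xor e_hat
    return e_hat, s_cur
```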
Different strategies to compute the iteration function can be employed. Indeed, a decoder can be
designed to execute a sequence of iteration functions of different kinds. In our approach to the
design of the LEDAdecoder, we will employ two iteration functions which go by the name of
in-place and out-of-place, and for which we provide:
[Figure 3.1: design flow of the LEDAdecoder: the in-place iteration DFR estimate, the out-of-place iteration DFR estimate, the maximum mismatches correction bound, the threshold decision strategy, and the numerical DFR simulation combine into the LEDAdecoder instances for IND-CCA2 and IND-CPA primitives]

i. a closed form procedure to compute a conservative estimate of the number of mismatches
(discrepancies) between the actual error vector e and the output of the iteration ē, at the
end of an in-place iteration function computation (Section 3.2);
ii. a closed form procedure to compute a conservative estimate of the number of mis-
matches (discrepancies) between the actual error vector e and the output of the iteration ē
(i.e., the Hamming weight of e ⊕ ē: wt (e ⊕ ē)), at the end of an out-of-place iteration
function computation (Section 3.3);
iii. a code-specific closed form procedure to compute the maximum number of mismatches
between e and ē that can be corrected with certainty by a single computation of an iteration
function of any of the two kinds (Section 3.4);
iv. a threshold decision strategy trading off an improvement in the correction capability of
the out-of-place iteration function against the availability of its closed form DFR estimate,
which is replaced by numerical DFR simulations (Section 3.5).
Figure 3.1 summarizes our choices in designing the LEDAdecoder for both the IND-CCA2 prim-
itives and the IND-CPA primitives in LEDAcrypt.
Decoder design for IND-CCA2 LEDAcrypt-KEM and IND-CCA2 LEDAcrypt-PKC
We design a closed form predictable DFR LEDAdecoder combining our ability to derive a closed
form DFR estimate (Section 3.2 and Section 3.3) with the procedure to determine the maximum
number of mismatches which can be corrected with a single computation of an iteration (Sec-
tion 3.4). Indeed, we determine the probability of the decoder computing an
error estimate ē with a number of discrepancies wt (e ⊕ ē) less than or equal to a given amount,
τ, after computing all but the last iteration (employing the closed form procedures stated in i) and
ii) for the in-place and out-of-place iteration functions, respectively).
The last iteration of the LEDAdecoder is then chosen to be either in-place or out-of-place with
the following criterion:
• An in-place iteration allows us to employ the tighter upper bound on the correction capability
provided by item i), at the cost of a higher decoder latency as in-place iterations are less
amenable to parallel implementations.
Table 3.1: Summary of the LEDAdecoder instances depending on the LEDAcrypt primitive
KEM-CPA case) the early exits after an iteration (lines 3 and 6) can be employed to obtain a faster
execution. By contrast, in constant time implementations all the iterations are always computed,
with the extra iterations in practice not altering the value of ê during their computation.
Note that Algorithm 12 returns the value of ê deeming it correct if the value of s is null.
Indeed, there is a probability of the decoder ending up in providing an error vector ē which does not
match the actual one e, even if the syndrome is null. We take into account these failure cases too
in our analysis, as we monitor the number of discrepancies between ē and e, and deem a decoding
successful if and only if there are none.
Finally, we provide a quick summary of how the appropriate choices of imaxIn , imaxOut and the
bit-flipping thresholds result in the LEDAdecoder variant for IND-CPA or IND-CCA2 primitives
in Table 3.1.
In particular, in addition to the choices which have been described, we pick the total number
of iterations for the LEDAdecoder employed in IND-CCA2 primitives to be equal to 2, as it
provides a good engineering tradeoff between key size and execution speed and makes the primitives
amenable to a constant time implementation.
Conventions adopted in this chapter
In the remainder of this chapter, we will describe in detail the contributions i)–iv). To this end, we
recall from Section 2.4 that, from this specification onward, the parity-check
matrix of the LEDAcrypt secret code, L = HQ, is chosen taking Q = I; thus, the Q matrix is neglected
for all practical purposes. As a consequence, the LEDAdecoder procedure is expected to recover
the value of an error vector e of weight t, given the secret parity-check matrix H and a syndrome
s of the error e through H, that is, e ← LEDAdecoder(H, s).
In particular, during the Niederreiter-based decryption procedure of LEDAcrypt-KEM (see Fig-
ure 1.3(a) in Section 1.2), s is identified with the private syndrome s′, obtained multiplying the (pub-
lic) syndrome contained in the ciphertext by the last block of L = H, Ln0−1 = Hn0−1, i.e., s′ = Hn0−1 s.
During the decryption of LEDAcrypt-PKC, s corresponds to the syndrome of the corrupted code-
word c, contained in the ciphertext, computed through the secret parity-check matrix.
In the following we describe the assumptions and derivations at the core of our statistical model
for the behaviour of the iteration functions employed in the LEDAdecoder.
The set of simultaneous parity-check equations, in the unknown error vector e with n0 p bits and
weight t, tackled by the decoder on an input parity-check matrix H, with p rows and n0 p columns,
and a syndrome s with length p, is captured by the following relation HeT = sT .
To find the bit values of the unknown error vector, the decoder starts from an
estimated error vector ê with all bits set to zero and applies a sequence of iteration functions, each
of which evaluates whether or not to flip (i.e., change) the j-th bit of ê, êj, depending on whether the
corresponding number of unsatisfied parity-check equations (upcj) exceeds a predefined threshold th,
updating the syndrome value accordingly. The decoder stops when the updated syndrome value
equals zero. Note that the parity-check equations (i.e., the rows in the set of simultaneous
equations H êT = sT) influenced by the j-th bit of the estimated error vector are the ones having an asserted
bit in their j-th column position, and the subset of unsatisfied parity-check equations includes
the ones whose constant term is also asserted (as it coincides with the non-null value in
the corresponding bit position of the syndrome); therefore, the value of any upcj can be computed
summing si · hi,j over all the equations, that is, for 0 ≤ i ≤ p − 1. The distinguishing point between an
in-place and an out-of-place iteration function lies in when the update of the syndrome value is
executed. In the former, the syndrome is updated just after each test establishing
whether the j-th bit value of the estimated error vector (i.e., êj) should be flipped or not (i.e., whether upcj ≥ th
or not). In the latter, the syndrome is updated only after the test, and possible flip, of
all the bits of the estimated error vector has been performed.
The analyses of the in-place and of the out-of-place iteration functions, reported in the next sub-
sections, are all based on the following statements and Lemma.
Assumption 3.1.1 The probability P^th_{f|1} = Pr[upcj ≥ th | ej ≠ êj], with j ∈ {0, 1, . . . , n−1}, that
the number of unsatisfied parity checks involving the j-th bit of the error vector, i.e., upcj, is
large enough to trigger a flip of êj, given that its current value does not match the value of the
j-th bit of the unknown error vector (i.e., ẽj = ej ⊕ êj = 1), is a function of only the total number
t̂ = wt (ê ⊕ e) of positions over which the estimated error vector ê and the unknown error vector e
differ.
Analogously, the probability P^th_{m|0} = Pr[upcj < th | ej = êj] that the number of unsatisfied
parity checks involving the j-th bit of the error vector, i.e., upcj, is low enough to maintain the
current value êj of the j-th estimated error vector bit, given that its current value matches the value
of the j-th bit of the unknown error vector (i.e., ẽj = ej ⊕ êj = 0), is a function of only the total
number t̂ = wt (ê ⊕ e) of positions over which ê and e differ.
Informally, we are stating that the statistical behaviour of the single given upcj does not depend
on its location j, but only on the number of discrepancies between the estimated error vector and
the actual one, and the fact that the j-th position of ê is in accordance or not with e.
The following probabilities refer to flipping (f) or maintaining (m) the value of each bit of ê.
Note that the assumption on the fact that the syndrome at hand is obtained from a vector ẽ = ê ⊕ e
of weight t̂ is trivially true if the iteration of the decoder being considered is the first one being
computed, since the error estimate ê is null and the error vector e is drawn at random with weight
t = t̂. This in turn means that, when employing Assumption 3.1.2 in estimating the correction
capability of the first iteration of a decoder, we are only relying on the fact that the rows of the
matrix H can be considered independent, neglecting the effects of the quasi-cyclic structure.
In the following lemma we establish how, relying on the previous assumptions, the probabilities
characterizing the choices made on the bits of the estimated error vector by either an in-place
or an out-of-place iteration function can be expressed.
ρ1,u(t̂) = Pr[ hi,: ẽ^T = 1 | ẽz = 1 ] =
    ( Σ_{l=0, l even}^{min{w−1, t̂−1}} C(w−1, l) · C(n0 p − w, t̂ − 1 − l) ) / C(n0 p − 1, t̂ − 1) .

Consequently, the probability P^th_{f|1}(t̂) = Pr[upcz ≥ th | ẽz = êz ⊕ ez = 1] of changing (flipping) the
z-th bit of the estimated error vector êz assuming that ẽz = 1, and the probability P^th_{m|0}(t̂) =
Pr[upcz < th | ẽz = êz ⊕ ez = 0] of maintaining êz assuming that ẽz = 0, are computed as follows:

P^th_{f|1}(t̂) = Σ_{σ=th}^{v} C(v, σ) · ρ1,u(t̂)^σ · (1 − ρ1,u(t̂))^{v−σ} ,

P^th_{m|0}(t̂) = Σ_{σ=0}^{th−1} C(v, σ) · ρ0,u(t̂)^σ · (1 − ρ0,u(t̂))^{v−σ} .
Proof. For the sake of brevity, we consider the case ẽz = 1, deriving the expression of P^th_{f|1}(t̂); the
proof for P^th_{m|0}(t̂) can be carried out with similar arguments. Given a row hi,: of the parity-check
matrix H such that z ∈ Supp(hi,:), the equation ⊕_{j=0}^{n0 p − 1} hi,j ẽj (in the unknown ẽ) yields a non-null
value for the i-th bit of the syndrome, ŝi, (i.e., the equation is unsatisfied) if and only if the support
of ẽ is such that Σ_{j=0}^{n0 p − 1} hi,j ẽj = 2a + 1, a ≥ 0, including the term having j = z, i.e., hi,z ẽz = 1.
This implies that the cardinality of the set obtained intersecting the support of hi,: with the one of
ẽ, |(Supp(hi,:) \ {z}) ∩ (Supp(ẽ) \ {z})|, must be an even number, which in turn cannot be larger
than the minimum between |Supp(hi,:) \ {z}| = w − 1 and |Supp(ẽ) \ {z}| = t̂ − 1.
The probability ρ1,u(t̂) is obtained as the fraction of the number of values of ẽ having an even
number of asserted bits matching the asserted bits in a row of H (noting that, in the z-th bit
position, both the error and the row of H are set) over the number of ẽ values having t̂ − 1 asserted
bits in n0 p − 1 positions, i.e., C(n0 p − 1, t̂ − 1). The numerator of the said fraction is computed
as the sum, over all even values 0 ≤ l ≤ min{w − 1, t̂ − 1}, of the number of ẽ configurations having
l asserted bits in common with the row. For a given value of l, this count is derived as follows. Picking
one vector with l asserted bits over the w − 1 available positions, i.e., one vector out of C(w − 1, l)
possible ones, there are C(n0 p − w, t̂ − 1 − l) possible values of the error vector placing the remaining
t̂ − 1 − l asserted bits in the remaining n0 p − w positions; therefore, the total number of vectors with
overlap l is C(w − 1, l) · C(n0 p − w, t̂ − 1 − l).
Repeating the same line of reasoning for each value of l allows deriving the numerator of the
formula defining ρ1,u(t̂).
From Assumption 3.1.2, the observation of any unsatisfied parity check involving the z-th bit of the
error vector ẽz (i.e., hi,: ẽ^T = 1), given that ẽz = êz ⊕ ez = 1, is modeled as a random variable with
a Bernoulli distribution having parameter (expected value) ρ1,u(t̂), and each of these random
variables is independent of the others. Consequently, the probability that the decoder flips a
bit of the estimated error vector when the corresponding bit of the error vector is asserted and the
count of the unsatisfied parity checks (upc) is above or equal to a given threshold th is derived as
the binomial probability obtained adding the outcomes of v (the column weight of H) i.i.d. Bernoulli
trials.
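Since these expressions only involve binomial coefficients, they can be evaluated exactly with arbitrary-precision integer arithmetic; a minimal Python rendition of the ρ1,u branch is the following sketch:

```python
from math import comb

def rho_1u(n: int, w: int, t_hat: int) -> float:
    """Pr[a parity check touching a mismatched position is unsatisfied],
    for code length n = n0*p, row weight w, t_hat residual mismatches."""
    num = sum(comb(w - 1, l) * comb(n - w, t_hat - 1 - l)
              for l in range(0, min(w - 1, t_hat - 1) + 1, 2))
    return num / comb(n - 1, t_hat - 1)

def p_flip_given_1(n: int, w: int, v: int, t_hat: int, th: int) -> float:
    """P^th_{f|1}(t_hat): binomial tail over the v checks on the bit."""
    rho = rho_1u(n, w, t_hat)
    return sum(comb(v, s) * rho**s * (1 - rho)**(v - s)
               for s in range(th, v + 1))
```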
We consider the in-place iteration function described in Algorithm 13, which takes as input the
p-bit syndrome s, the initial n-bit error vector estimate ê, and the transposed parity-check matrix
of the code represented, for performance reasons, as an n0 × v integer matrix containing in each
row the positions (ranging in {0, 1, . . . , p − 1}) of the set coefficients in the first column of each n0
p × p block of the circulant-block matrix H = [H0 | H1 | . . . | Hn0−1]. The algorithm outputs the
updated n-bit error vector estimate, ē, and the corresponding updated syndrome. In the following
analysis, the number of mismatches between the error estimate ê and the sought error vector e is
denoted as t̂ = wt(ê ⊕ e), while the (exclusive-or) difference between
e and ê is denoted as ẽ = ê ⊕ e (thus, t̂ = wt(ẽ)).
It is worth noting that, if the computation of Algorithm 13 corresponds to the first iteration
performed by the LEDAdecoder (that is, iterIn = 0 in Algorithm 12) then t̂ = t since ê is the
null vector of length n0 p, and there are exactly t mismatches with e.
Our aim in this section is to provide a way to compute the probability of t̂ taking a given value at
the end of the in-place iteration performed by Algorithm 13. Note that the probability of t̂ being
zero is the probability that the iteration completes correctly the error estimation, i.e., that the
algorithm of the in-place iteration function succeeds in retrieving e at the end of its computation.
Algorithm 13 starts by drawing a random permutation π and applying it to the sequence of integer indexes {0, 1, . . . , n_0p − 1} (lines 1–2); it then determines the flipping threshold according to the number of the iteration being computed, by means of a lookup (line 3).
The outer loop of the algorithm at lines 4–16 iterates over the entries of ê, following the order given by the permuted index sequence J.
Each execution of the loop body computes the number of unsatisfied parity-check equations, upc_j, to which the bit ê_j contributes (lines 5–9). If upc_j is greater than or equal to the chosen threshold, th
(line 10), we obtain the output estimate ē_j by flipping ê_j (line 11), while the output syndrome is obtained updating the current one accordingly (lines 12–14). If upc_j is below the chosen threshold, the output estimate ē_j remains equal to ê_j, and the syndrome is likewise left unmodified. Note that, as j denotes a column index of the p × n_0p matrix H, the algorithm needs to compute both the block index, i ← ⌊j/p⌋, and the offset, offset, of the position of the column at hand (j = i · p + offset) to correctly determine the v positions of the set coefficients on the j-th column of H as: (offset + Htr[i][k]) mod p, for all k denoting the position of a set coefficient along the first column of the block i.
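The following Python sketch (ours; the reference implementation is in C and operates on packed machine words) mirrors the behaviour just described, using the same Htr representation of the parity-check matrix:

import random

def randomized_in_place_iteration(s, e_hat, Htr, p, n0, v, th):
    # s: p-bit syndrome as a 0/1 list; e_hat: n0*p-bit error estimate.
    n = n0 * p
    J = list(range(n))
    random.shuffle(J)                      # lines 1-2: permuted index order
    for j in J:                            # outer loop over the estimate
        i, offset = divmod(j, p)           # block index and offset
        rows = [(offset + Htr[i][k]) % p for k in range(v)]
        upc = sum(s[r] for r in rows)      # unsatisfied parity checks
        if upc >= th:
            e_hat[j] ^= 1                  # flip the estimated bit...
            for r in rows:
                s[r] ^= 1                  # ...and update the syndrome
    return e_hat, s                        # (e_bar, updated syndrome)

Note that the syndrome is updated as soon as a bit is flipped, so later decisions within the same iteration observe the effect of earlier ones: this is precisely what makes the iteration in-place.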
Using the probabilities $P^{th}_{f|1}$ and $P^{th}_{m|0}$, we derive a statistical model for one in-place iteration as
described in Algorithm 13. In particular, we consider a worst-case evolution of the iteration from
the correction standpoint, i.e., the computation path leading to a decoding success at the end of
the iteration with the lowest probability. Indeed, the success probability of the in-place decoder
also depends on the order in which the ê bits are processed.
To describe the worst-case computation path (from a correction standpoint), we consider a subset
of the permutations π picked at the beginning of the iteration (line 1 of Algorithm 13).
Let $S_n^* \subseteq S_n$ be the set of all permutations $\pi^* \in S_n^*$ such that¹ the support of $\pi^*(\tilde{e})$ is $\{n-\hat{t}, \ldots, n-1\}$, that is, the set of permutations which cause all the positions having discrepancies between e and ê to be analyzed last by the iteration.
We denote with $\Pr[\bar{e} \neq e \mid \pi \in S_n]$ the probability that the estimated error vector provided as output by the RandomizedInPlaceIteration algorithm is different from e, conditioned on the fact that the permutation π was picked. Similarly, we define $\Pr[\bar{e} \neq e \mid \pi^* \in S_n^*]$.
It can be verified that $P^{th}_{f|1}(\hat{t}) \geq P^{th}_{f|1}(\hat{t}+1)$ and $P^{th}_{m|0}(\hat{t}) \geq P^{th}_{m|0}(\hat{t}+1)$, $\forall \hat{t}$, as increasing the number of currently mis-estimated error bits increases the likelihood of a wrong decoder decision.
By leveraging Assumption 3.1.1, we now prove that the in-place iteration function reaches a correct
decoding at the end of the outer loop with the minimum probability each time a π ∗ ∈ Sn∗ is applied
to the sequence of indices.
Lemma 3.2.1 Let s and ê, with $s = H(e \oplus \hat{e})^T$ and $\hat{t} = \mathrm{wt}(e \oplus \hat{e})$, be the input of Algorithm 13 (RandomizedInPlaceIteration), and let ē be the error vector estimate provided as output. Then
$$\forall \pi \in S_n,\ \forall \pi^* \in S_n^*, \quad \Pr\left[\bar{e} \neq e \mid \pi \in S_n\right] \leq \Pr\left[\bar{e} \neq e \mid \pi^* \in S_n^*\right].$$
¹ The notation Supp(w), where w is a vector, denotes the set of indexes (positions) corresponding to the non-null components of w.
Proof. For the sake of readability, we rewrite $\Pr[\bar{e} \neq e \mid \pi \in S_n]$ as 1 − β(π), where β(π) is the probability that all the positions of ê, evaluated in the order specified by π, are correctly flipped or not, so that ē = e. To visualize the effect of a permutation π* ∈ S_n* on the discrepancies between ê and e, note that π*(ẽ) consists of a run of n − t̂ zeroes followed by a run of t̂ ones.
The in-place iteration will thus analyze first a run of n − t̂ positions where the difference vector ẽ between the permuted error vector π*(e) and π*(ê) contains only zeroes, followed by a run of t̂ positions containing only ones. The expression of β(π*) can be derived, thanks to Assumption 3.1.1, to be the following
$$\beta(\pi^*) = \left(P^{th}_{m|0}(\hat{t})\right)^{n-\hat{t}} \cdot P^{th}_{f|1}(\hat{t}) \cdot P^{th}_{f|1}(\hat{t}-1) \cdots P^{th}_{f|1}(1)$$
To demonstrate this expression, note that the elements in the first n − t̂ positions of π*(e) and π*(ê) match; therefore, the decoder takes the correct decision if it does not change the value of π*(ê). Then, if a sequence of n − t̂ correct decisions is taken in the first n − t̂ evaluations, t̂ will not change, thus each decision will be correct with probability $P^{th}_{m|0}(\hat{t})$. This leads to a probability of taking the first n − t̂ decisions correctly equal to $(P^{th}_{m|0}(\hat{t}))^{n-\hat{t}}$. Through an analogous line of reasoning, observe that the decoder will need to flip the value of the last t̂ positions of ê to complete the iteration with no residual discrepancies. Therefore, at each of these evaluations along the computation path, the value of t̂ decreases by one unit, yielding the remaining part of the expression.
Consider now a generic permutation π, such that the resulting π(e ⊕ ê) has support $\{u_0, \ldots, u_{\hat{t}-1}\}$; visually, the permuted discrepancies vector π(e ⊕ ê) has its t̂ asserted bits scattered at the positions $u_0 < u_1 < \cdots < u_{\hat{t}-1}$.
We now show that we always have β(π) ≥ β(π*). First of all, we observe that $P^{th}_{m|0}(0) = 1$ since the decoder, in case t̂ = 0, will act on a null syndrome and thus will always maintain the same value for the elements of ê. Then, exploiting the monotonicity of $P^{th}_{m|0}$ and $P^{th}_{f|1}$, the required chain of inequalities can be derived.
The expressions of the probabilities i. and ii. are derived in Appendix A, and only depend on the probabilities $P^{th}_{f|1}(\hat{t})$ and $P^{th}_{m|0}(\hat{t})$. Given those, we obtain probability iii. as:
$$\text{for } \omega \geq x, \quad \Pr_{S_n^*}\left[\omega \xrightarrow{1} x\right] = \sum_{\delta=\max\{0,\, x-(n-\omega)\}}^{\omega} \Pr_{S_n^*}\left[\omega \xrightarrow{E_0} x-\delta\right]\, \Pr_{S_n^*}\left[\omega \xrightarrow{E_1} \delta\right]. \qquad (3.2)$$
Being able to compute any $\Pr_{S_n^*}[\omega \xrightarrow{1} x]$ of our choice allows us to extend our analysis to any number of consecutive calls to the RandomizedInPlaceIteration function.
To clarify this point, denote with a ·^{(i)} superscript the fact that a given value pertains to the i-th consecutive execution of Algorithm 13 (i.e., i ≥ 0). For the first iteration (i.e., i = 0), the number of discrepancies between e and ê just before the execution of the RandomizedInPlaceIteration function is expressed as $\hat{t}^{(-1)} = t$.
Since a different permutation π is applied at the beginning of each execution of Algorithm 13, we
have that the worst-case computation path (from a correction capability standpoint) is the one
where a permutation among the ones packing the discrepancies between e and ê(i) , i.e., π ∈ Sn∗ , is
taken at each iteration. Each of these permutations will pack the t̂(i) discrepancies in the positions
which will be considered last by the RandomizedInPlaceIteration function.
Clearly, t̂(i) also expresses the weight of the error vector corresponding to the syndrome which is
given as input to the (i + 1)-th iteration.
We denote with $\Pr_{S_n^*}[t \xrightarrow{imaxIn} x]$ the probability that the number of differences between $\bar{e}^{(imaxIn-1)}$ and e is equal to x, after a maximum of imaxIn ≥ 1 consecutive calls to the RandomizedInPlaceIteration function (each call labeled with iterIn = 0, iterIn = 1, iterIn = 2, . . . , iterIn = imaxIn − 1, respectively – see Algorithm 12 at lines 2–4).
Then, by considering all possible configurations of the residual errors $\hat{t}^{(i)}$, we have
$$\Pr_{S_n^*}\left[t \xrightarrow{imaxIn} x\right] = \sum_{\hat{t}^{(0)}=0}^{n} \cdots \sum_{\hat{t}^{(imaxIn-2)}=0}^{n}\; \prod_{j=0}^{imaxIn-1} \Pr_{S_n^*}\left[\hat{t}^{(j-1)} \xrightarrow{1} \hat{t}^{(j)}\right],$$
with the conventions $\hat{t}^{(-1)} = t$ and $\hat{t}^{(imaxIn-1)} = x$.
The above formula is very simple: essentially, it takes into account all possible transitions starting from an initial number of residual errors equal to t and ending with a final number of residual errors equal to x.
This result allows us to determine the probability that at most x discrepancies are left between e and the error estimate output by the last of a sequence of imaxIn in-place iteration functions. We will exploit this result in two different ways for the LEDAcrypt IND-CCA2 primitives:
i. to provide an upper bound on the DFR for a decoder having imaxOut = 0, simply estimating the probability that imaxIn in-place iterations leave no discrepancies and computing the complement to one as: $\mathrm{DFR} \leq 1 - \Pr_{S_n^*}[t \xrightarrow{imaxIn} 0]$.
ii. together with the result providing a code-specific bound to the maximum number of discrepancies which can be corrected by an iteration function, either in-place or out-of-place (see Section 3.4), to derive the LEDAdecoder DFR.
In this case, we provide a conservative DFR estimation for an imaxOut = 1 decoder, computing the maximum number of discrepancies corrected with certainty, τ, according to Section 3.4, and subsequently determining the probability that the first imaxIn iterations leave at most τ discrepancies as the sum of the probabilities of leaving exactly x discrepancies for all the values of x between 0 and τ: $\mathrm{DFR} \leq 1 - \sum_{x=0}^{\tau} \Pr_{S_n^*}[t \xrightarrow{imaxIn} x]$.
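Assuming the single-iteration transition probabilities are available (e.g., tabulated as a hypothetical row-stochastic matrix T with T[w][x] = $\Pr_{S_n^*}[w \xrightarrow{1} x]$, computed from the expressions of Appendix A), both bounds can be evaluated by iterated distribution propagation, as in the following sketch (ours, with illustrative names):

def iterate_transitions(T, t, imax_in):
    # Distribution of the residual discrepancies after imax_in
    # worst-case in-place iterations, starting from t discrepancies.
    dist = [0.0] * len(T)
    dist[t] = 1.0
    for _ in range(imax_in):
        new = [0.0] * len(T)
        for w, pw in enumerate(dist):
            if pw > 0.0:
                for x, px in enumerate(T[w]):
                    new[x] += pw * px
        dist = new
    return dist

def dfr_bounds(T, t, imax_in, tau):
    dist = iterate_transitions(T, t, imax_in)
    return 1.0 - dist[0], 1.0 - sum(dist[:tau + 1])   # cases i. and ii.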
[Figure 3.2: plot of the DFR versus t (t from 10 to 80, DFR from 10^0 down to 10^−8, logarithmic scale), with four series: the DFR estimation for the worst-case permutation, the DFR estimation for the average-case permutation, the numerical simulation with a worst-case permutation, and the numerical simulation with a random permutation.]
Figure 3.2: Comparison of our worst-case DFR estimation technique with numerical simulations, for a LEDAdecoder with imaxIn = 1, imaxOut = 0, on a QC-LDPC code with parameters n_0 = 2, p = 4801, v = 45, employing th = 27 as the bit-flipping threshold. The numerical DFR simulation is obtained decoding random error vectors of weight t, until either 100 decoding errors are reached, or until 10^8 decoding operations have been performed, whichever happens first.
A bonus point of the proposed analysis is that it also allows us to obtain an estimate for the average DFR of a code, i.e., an estimate of the correction capability whenever a random permutation π is being used. Indeed, let π(ẽ) be the vector obtained by applying the permutation π to ẽ, with support Supp(π(ẽ)).
Considering two consecutive elements of Supp(π(ẽ)), $a_i, a_{i+1}$ with 0 ≤ i ≤ t̂ − 2, we denote the length of the 0-run between them as $a_{i+1} - a_i$. The average zero-run length, i.e., $d = \mathrm{E}[a_{i+1} - a_i]$, $\forall i \in \{0, 1, \ldots, \hat{t}-2\}$, where E[·] denotes the expected value, can be derived to be $\frac{n-\hat{t}}{\hat{t}+1}$. Consequently, we obtain the estimate for the average DFR after the computation of one in-place iteration as:
$$\forall \hat{t} > 0, \quad \mathrm{DFR} = 1 - \prod_{j=1}^{\hat{t}} \left(P^{th}_{m|0}(j)\right)^{d} \prod_{l=1}^{\hat{t}} P^{th}_{f|1}(l). \qquad (3.3)$$
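Under the assumption that $P^{th}_{m|0}(\cdot)$ and $P^{th}_{f|1}(\cdot)$ are available as callables (for instance, built as in the earlier sketch), Eq. (3.3) translates directly into code; the function below is an illustrative sketch of ours, not the specification's tool:

def average_dfr_one_iteration(n, t_hat, p_m0, p_f1):
    d = (n - t_hat) / (t_hat + 1)      # average zero-run length
    ok = 1.0
    for j in range(1, t_hat + 1):
        ok *= p_m0(j) ** d             # d correct no-flip decisions per level
    for l in range(1, t_hat + 1):
        ok *= p_f1(l)                  # one correct flip per level
    return 1.0 - ok                    # Eq. (3.3)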
It is interesting to note that the estimate for the average-case DFR converges to the one of the worst-case DFR after one in-place iteration function execution, as t̂ tends to zero. Indeed, the latter estimate can be obtained as $\mathrm{DFR} \leq 1 - \Pr_{S_n^*}[t \xrightarrow{1} 0]$, where $\Pr_{S_n^*}[t \xrightarrow{1} 0]$ can be derived from Appendix A to be $\prod_{j=1}^{\hat{t}} \left(P^{th}_{m|0}(j)\right)^{n_0 p - \hat{t}} \cdot \prod_{l=1}^{\hat{t}} P^{th}_{f|1}(l)$, and it is easy to note that $n_0 p - \hat{t}$ converges to the same value as $d = \frac{n-\hat{t}}{\hat{t}+1}$ when t̂ tends to zero.
To provide numerical evidence further backing the soundness of our assumptions, we report in Figure 3.2 the results of performing a numerical simulation down to values of the DFR equal to ≈ 10^−8 for a code with p = 4801, v = 45. The code parameters were chosen to obtain an acceptable simulation time (about 2k to 5k core hours per data set). These parameters require the same effort
to solve either the SDP or one of the CFPs reported in Chapter 2, assuming t = 45: ≈ 2^96 classical operations or ≈ 2^64 quantum gates. We depict the results of the worst-case and average-case DFR estimation techniques presented in this section as solid lines. We depict as + signs the results of the simulations of the LEDAdecoder with imaxIn = 1, imaxOut = 0, both employing a randomly picked permutation, and selecting ad hoc a worst-case permutation (since we actually know the value of ẽ during the simulations). As can be seen, our estimation techniques provide a very good fit for the actual behavior of the LEDAdecoder with imaxIn = 1, imaxOut = 0, in particular also matching the fact that the average-case behavior of the decoder tends to the worst-case one as t decreases.
We consider the out-of-place iteration function described in Algorithm 14, which takes as input the
p-bit syndrome s, the initial n-bit error vector estimate ê, and the transposed parity-check matrix
of the code represented, for performance reasons, as an n0 × v integer matrix containing in each
row the positions (ranging in {0, 1, . . . , p − 1}) of the set coefficients in the first column of each n0
p × p block of the circulant-block matrix H = [H_0 | H_1 | . . . | H_{n_0−1}]. The algorithm outputs the updated n-bit error vector estimate, ē, and the corresponding updated syndrome. In the following
analysis, the number of mismatches between the error estimate ê and the sought error vector e that
are going to be corrected is denoted as t̂ = wt(ê ⊕ e), while the (eXclusive-OR) difference between
e and ê is denoted as ẽ = ê ⊕ e (thus, t̂ = wt(ẽ)).
Our aim is to assess the probability that, at the end of the iteration function computation, the
number of mismatches between the output error vector ē and the actual error vector e is below or
equal to a sufficiently small number τ , i.e., that wt(e ⊕ ē) ≤ τ .
Algorithm 14 describes the OutOfPlaceIteration function. The computation starts by keeping
a copy of the input syndrome s, CurrSynd, on which the unsatisfied parity-check (upc) computations
of all the bits of the estimated error vector ê will be based (line 1), and by initializing the output
error estimate ē and the output syndrome with the input estimate ê and the input syndrome, respectively (line 2). Subsequently, the threshold to test whether to flip or not each bit in the
estimated error vector is determined – employing the number of iterations previously computed as
input (for possible use in the LEDAcrypt-KEM and LEDAcrypt-PKC systems), or the syndrome
weight (for possible use in the LEDAcrypt-KEM-CPA system). Each execution of the body of the
nested loops at lines 4–5 considers the j-th element of ê and the j-th column of the parity-check
matrix (0 ≤ j ≤ n0 p − 1), assuming that j = i · p + offset, where i ∈ {0, . . . , n0 − 1} is the block
index of the column at hand, while 0 ≤ offset ≤ p − 1 is the distance of the column at hand from
the first column of the block where it belongs. In particular, the instructions on lines 6–9 compute the number of unsatisfied parity-check equations, upc, to which the bit ê_j contributes. If upc is greater than or equal to the chosen threshold, th (line 10), the j-th bit of the output error estimate ē (which has
been initialized with a copy of ê at line 2) is flipped and the output syndrome updated accordingly
(lines 11–14). The distinguishing behavior w.r.t. the in-place iteration function is determined by the use of the vector CurrSynd to compute the number of unsatisfied parity-check equations, and by the use of s̄ as a separate vector for the computation of the output syndrome; jointly, these make each bit-flip performed in the output estimated error vector independent of the others.
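The following Python sketch (ours, illustrative only) makes the difference with the in-place iteration explicit: all the upc counts are taken against a frozen copy of the input syndrome, so no flip decision can influence another one within the same iteration:

def out_of_place_iteration(s, e_hat, Htr, p, n0, v, th):
    curr_synd = list(s)                   # line 1: frozen syndrome copy
    e_bar, s_bar = list(e_hat), list(s)   # line 2: outputs start as inputs
    for i in range(n0):                   # nested loops over all columns
        for offset in range(p):
            j = i * p + offset
            rows = [(offset + Htr[i][k]) % p for k in range(v)]
            upc = sum(curr_synd[r] for r in rows)   # never reads s_bar
            if upc >= th:
                e_bar[j] ^= 1
                for r in rows:
                    s_bar[r] ^= 1
    return e_bar, s_bar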
We consider Assumption 3.1.1 and Assumption 3.1.2 to hold, and rely on Lemma 3.1.1 to estimate the probabilities that characterize the output error vector estimate of the out-of-place iteration function. Specifically, we use $P^{th}_{f|1}(\hat{t})$ to characterize the probability that the iteration function flips any bit position i of ê which has a discrepancy (i.e., with ẽ_i = e_i ⊕ ê_i = 1), and $P^{th}_{m|0}(\hat{t})$ to characterize the probability that the iteration function does not flip any bit position i which is not affected by a discrepancy between ê and e (i.e., with ẽ_i = e_i ⊕ ê_i = 0).
We model the number of error mismatches at the end of the iteration function computation, that
is wt(e ⊕ ē), as a random variable over the discrete domain of integers {0, . . . , n}, n = n0 p, having
a probability mass function that we determine as follows.
Let us denote with fcorrect the number of bit-flips that are correctly performed by a single run of
the out-of-place iteration function, i.e., the number of positions j such that ej = 1 and ēj = 1;
analogously, let us denote with fwrong the number of bit-flips that are wrongly performed by a
single run of the out-of-place iteration function, i.e., the number of positions j such that ej = 0
and ēj = 1.
Because of Assumption 3.1.1, the bit-flip actions performed by the out-of-place iteration function
are statistically independent, i.e.,
$$\Pr\left[f_{\mathrm{correct}} = x_c,\, f_{\mathrm{wrong}} = x_w\right] = \Pr\left[f_{\mathrm{correct}} = x_c\right] \cdot \Pr\left[f_{\mathrm{wrong}} = x_w\right], \qquad (3.4)$$
while, given Lemma 3.1.1, the probability of the function performing $x_c \in \{0, \ldots, t\}$ correct bit-flips out of t, and the probability of the function performing $x_w \in \{0, \ldots, n-t\}$ wrong bit-flips out of n − t, can be quantified as:
$$\Pr\left[f_{\mathrm{correct}} = x_c\right] = \binom{t}{x_c} \left(P^{th}_{f|1}(\hat{t})\right)^{x_c} \left(1 - P^{th}_{f|1}(\hat{t})\right)^{t-x_c},$$
$$\Pr\left[f_{\mathrm{wrong}} = x_w\right] = \binom{n-t}{x_w} \left(1 - P^{th}_{m|0}(\hat{t})\right)^{x_w} \left(P^{th}_{m|0}(\hat{t})\right)^{n-t-x_w}. \qquad (3.5)$$
Figure 3.3: Comparison of our worst-case DFR estimation technique with numerical simulations, for a LEDAdecoder with imaxIn = 0, imaxOut = 1, on a QC-LDPC code with parameters n_0 = 2, p = 4801, v = 45, employing th = 25 as the bit-flipping threshold. The numerical DFR simulation is obtained decoding random error vectors of weight t, until either 100 decoding errors are reached, or until 10^8 decoding operations have been performed, whichever happens first.
It is easy to see that, in the case of $f_{\mathrm{correct}} = x_c$ and $f_{\mathrm{wrong}} = x_w$, we have $\mathrm{wt}(e \oplus \bar{e}) = t + x_w - x_c$: thus, the probability that the output error vector ē differs from the actual error vector e in x ∈ {0, . . . , t} positions can be expressed as:
$$\Pr\left[\mathrm{wt}(e \oplus \bar{e}) = x\right] = \sum_{x_c = t-x}^{t} \Pr\left[f_{\mathrm{correct}} = x_c\right] \cdot \Pr\left[f_{\mathrm{wrong}} = x + x_c - t\right], \qquad (3.6)$$
where the number of correct bit-flips that must be performed ranges from t minus the number of residual mismatches x up to the initial number of mismatches t. This, in turn, allows us to derive the probability that the total number of discrepancies left at the end of the out-of-place iteration is at most $\tau \in \{0, \ldots, n\}$ as $\sum_{x=0}^{\tau} \Pr[\mathrm{wt}(e \oplus \bar{e}) = x]$.
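A minimal sketch of this computation follows (ours; pf1 and pm0 stand for the values $P^{th}_{f|1}(\hat{t})$ and $P^{th}_{m|0}(\hat{t})$, assumed to be precomputed, e.g., as in the earlier sketch):

from math import comb

def pr_residual(x, n, t, pf1, pm0):
    # Pr[wt(e xor e_bar) = x] for one out-of-place iteration, Eq. (3.6).
    total = 0.0
    for xc in range(t - x, t + 1):        # correct flips performed
        xw = x + xc - t                   # wrong flips performed
        if not 0 <= xw <= n - t:
            continue
        p_c = comb(t, xc) * pf1 ** xc * (1 - pf1) ** (t - xc)
        p_w = comb(n - t, xw) * (1 - pm0) ** xw * pm0 ** (n - t - xw)
        total += p_c * p_w                # independence, Eq. (3.4)
    return total

def dfr_bound_out_of_place(n, t, tau, pf1, pm0):
    return 1.0 - sum(pr_residual(x, n, t, pf1, pm0) for x in range(tau + 1))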
This result allows us to obtain the DFR of a decoder employing one or more out-of-place iterations in a fashion similar to the one for the in-place decoder described in Section 3.2: i.e., we determine the maximum number of discrepancies τ which can be corrected with certainty in a single iteration employing the result from Section 3.4, and subsequently determine the probability that running imaxOut − 1 out-of-place iteration functions leaves at most τ errors, employing the method we just described.
The DFR of the entire LEDAdecoder, in the case where only imaxOut = 2 out-of-place iterations are performed (for the parameter design of the IND-CCA2 primitives), is obtained as the complement to one of the aforementioned probability of leaving at most τ discrepancies, as follows:
$$\mathrm{DFR} \leq 1 - \sum_{x=0}^{\tau} \Pr\left[\mathrm{wt}(e \oplus \bar{e}) = x\right].$$
To provide numerical evidence further backing the soundness of our assumptions, we report in
Figure 3.3 the results of performing a numerical simulation down to values of the DFR equal to ≈ 10^−8 for a code with p = 4801, v = 45. The code parameters were chosen to obtain an acceptable simulation time (about 2k to 5k core hours per data set). These parameters require the same effort to solve either the SDP or one of the CFPs reported in Chapter 2, assuming t = 45: ≈ 2^96 classical operations or ≈ 2^64 quantum gates. We depict the results of the conservative DFR estimation technique presented in this section as a solid line, and depict as + signs the results of the simulations of the LEDAdecoder with imaxIn = 0, imaxOut = 1. As can be seen, our estimation technique provides a very good fit for the actual behavior of the LEDAdecoder with imaxIn = 0, imaxOut = 1.
In the following, we consider the execution of one iteration function (either in-place or out-of-place)
on an input syndrome s corresponding to s = H(e ⊕ ê)T , and an estimated error vector ê. We
denote with ẽ the bitwise difference between ê and e (mismatch), that is ẽ = ê ⊕ e, and with
t̂ = wt (ẽ) = wt (e ⊕ ê) the number of discrepancies between the unknown error vector and the
input estimated one.
Specifically, we recall the results of [61] and provide a way to determine the code-specific maximum admissible value, τ, for t̂, so that the execution of the iteration function outputs an estimated error vector equal to the unknown error vector, i.e., ē = e.
Note that, in determining such a bound, we rely on no assumptions.
We use s and H to denote the syndrome and parity-check matrix lifted into the integers, respectively, while $s_i$ denotes the i-th entry of s and $h_{i,j}$ denotes the entry in the i-th row and j-th column of H. The number of unsatisfied parity checks for the i-th bit of the error vector is denoted as $\mathrm{upc}_i$, and is computed as
$$\mathrm{upc}_i = \sum_{j=0}^{p-1} s_j h_{j,i} = \sum_{j=0}^{p-1} h_{j,i} \left(\bigoplus_{u=0}^{n_0 p - 1} \tilde{e}_u h_{j,u}\right) = \sum_{\substack{j \in \{0,\ldots,p-1\} \\ h_{j,i}=1}} \left(\bigoplus_{u=0}^{n_0 p - 1} \tilde{e}_u h_{j,u}\right). \qquad (3.7)$$
We denote with $h_{:,i}$ the i-th column of H which, in LEDAcrypt instances, has weight equal to v for all 0 ≤ i ≤ pn_0 − 1. Let $H^{(A)}$ denote the matrix composed by the rows of H indexed by a set of integers A ⊆ {0, . . . , p − 1}, and $h^{(A)}_{:,i}$ denote its i-th column, with 0 ≤ i ≤ pn_0 − 1.
We thus have that $H^{(\mathrm{Supp}(h_{:,i}))}$ is a rectangular matrix with v rows of n_0p elements, where the i-th column, $h^{(\mathrm{Supp}(h_{:,i}))}_{:,i}$, is constituted by all ones.
$H^{(\mathrm{Supp}(h_{:,i}))}$ thus corresponds to the set of parity-check equations in which the i-th component of the error vector corresponding to the given syndrome is involved.
Equation (3.7) can thus be rewritten as
$$\mathrm{upc}_i = \mathrm{wt}\left(\bigoplus_{j \in \mathrm{Supp}(\tilde{e})} h^{(\mathrm{Supp}(h_{:,i}))}_{:,j}\right) = \mathrm{wt}\left(H^{(\mathrm{Supp}(h_{:,i}))}\, \tilde{e}^T\right). \qquad (3.8)$$
Recall that the iteration function will change the i-th bit of the input error estimate ê_i if and only if $\mathrm{upc}_i$ is greater than or equal to the threshold th.
Depending on whether ẽ_i = 0 or ẽ_i = 1, the function will make a correct choice if it does not flip the i-th bit, or if it does, respectively. If ẽ_i = 0, the function will make a correct decision if $\mathrm{upc}_i < th$: we thus consider the following upper bound on $\mathrm{upc}_i$, assuming ẽ_i = 0:
$$\mathrm{upc}_i = \mathrm{wt}\left(H^{(\mathrm{Supp}(h_{:,i}))}\, \tilde{e}^T\right) = \mathrm{wt}\left(\bigoplus_{j \in \mathrm{Supp}(\tilde{e})} h^{(\mathrm{Supp}(h_{:,i}))}_{:,j}\right) \leq \sum_{j \in \mathrm{Supp}(\tilde{e})} \mathrm{wt}\left(h^{(\mathrm{Supp}(h_{:,i}))}_{:,j}\right). \qquad (3.9)$$
By contrast, if ẽ_i = 1 the function will make a correct decision if $\mathrm{upc}_i \geq th$.
We thus consider the following lower bound on $\mathrm{upc}_i$, assuming ẽ_i = 1:
$$\mathrm{upc}_i = \mathrm{wt}\left(\bigoplus_{j \in \mathrm{Supp}(\tilde{e})} h^{(\mathrm{Supp}(h_{:,i}))}_{:,j}\right) = \mathrm{wt}\left(h^{(\mathrm{Supp}(h_{:,i}))}_{:,i} \oplus \bigoplus_{j \in \mathrm{Supp}(\tilde{e})\setminus\{i\}} h^{(\mathrm{Supp}(h_{:,i}))}_{:,j}\right)$$
$$= v - \mathrm{wt}\left(\bigoplus_{j \in \mathrm{Supp}(\tilde{e})\setminus\{i\}} h^{(\mathrm{Supp}(h_{:,i}))}_{:,j}\right) \geq v - \sum_{j \in \mathrm{Supp}(\tilde{e})\setminus\{i\}} \mathrm{wt}\left(h^{(\mathrm{Supp}(h_{:,i}))}_{:,j}\right). \qquad (3.10)$$
A relevant quantity in Eq. (3.10) is the Hamming weight of the columns $h^{(\mathrm{Supp}(h_{:,i}))}_{:,j}$ with i ≠ j. Indeed, it represents the amount of interference on the $\mathrm{upc}_i$ value induced by the presence of error bits in positions different from i.
To ease the computation of $\mathrm{wt}(h^{(\mathrm{Supp}(h_{:,i}))}_{:,j})$, we observe that it is equivalent to compute the cardinality of the intersection of the supports of the i-th and j-th columns of H, i.e., $|\mathrm{Supp}(h_{:,i}) \cap \mathrm{Supp}(h_{:,j})|$, since $H^{(\mathrm{Supp}(h_{:,i}))}$ is formed by the rows of H in which the column $h_{:,i}$ contains a one, and only the ones present in such rows of the j-th column of H will contribute to $\mathrm{wt}(h^{(\mathrm{Supp}(h_{:,i}))}_{:,j})$.
We therefore define Γ as an n × n non-negative integer matrix in which the entry in row i and column j, denoted as $\gamma_{i,j}$, amounts to $|\mathrm{Supp}(h_{:,i}) \cap \mathrm{Supp}(h_{:,j})|$.
After this definition, Eq. (3.9) and Eq. (3.10) can be rewritten as follows:
$$\mathrm{upc}_i \leq \sum_{j \in \mathrm{Supp}(\tilde{e})} \gamma_{i,j}, \quad \text{when } \tilde{e}_i = 0 \text{ (i.e., } e_i \oplus \hat{e}_i = 0\text{)}, \qquad (3.11)$$
$$\mathrm{upc}_i \geq v - \sum_{j \in \mathrm{Supp}(\tilde{e})\setminus\{i\}} \gamma_{i,j}, \quad \text{when } \tilde{e}_i = 1 \text{ (i.e., } e_i \oplus \hat{e}_i = 1\text{)}. \qquad (3.12)$$
We have thus provided a way to compute a code-specific (Γ depends on the specific H instance) upper bound to the value of $\mathrm{upc}_i$ in case the corresponding position in ê is correctly estimated (i.e., ẽ_i = 0), and a lower bound to the value of $\mathrm{upc}_i$ in case it is incorrectly estimated (i.e., ẽ_i = 1).
Since we wish to determine the conditions under which the iteration function corrects with certainty the discrepancies in ẽ, we need to consider the case where the upper bound on $\mathrm{upc}_i$ for each position corresponding to a correct ê_i (see Eq. (3.11)) is strictly lower than the lower bound on the value of $\mathrm{upc}_i$ for the case of an incorrect ê_i (see Eq. (3.12)).
Therefore, for every value 0 ≤ i ≤ n_0p − 1, it must hold that:
$$\sum_{j \in \mathrm{Supp}(\tilde{e})} \gamma_{i,j} < v - \sum_{j \in \mathrm{Supp}(\tilde{e})\setminus\{i\}} \gamma_{i,j}, \quad \forall\, \tilde{e} \in \mathbb{F}_2^n \text{ s.t. } \mathrm{wt}(\tilde{e}) = \hat{t}. \qquad (3.13)$$
Indeed, in this case, it is possible to use as the bit-flipping threshold (e.g., th in Algorithm 14 at line 3) any value strictly greater than the left-hand quantity in Eq. (3.13), resulting in the iteration function executing a bit-flip of all and only the incorrectly estimated bits of the ê received as input.
For the sake of a more compact notation, we now define the quantity µ(z), 0 ≤ z ≤ n_0p. Informally, µ(z) is the maximum change in the value of any single $\mathrm{upc}_i$ induced by z discrepancies being present in locations different from the i-th itself. We therefore have that:
$$\mu(z) = \max_{\substack{0 \leq i \leq n_0 p - 1 \\ \tilde{e} \in \mathbb{F}_2^n,\ \mathrm{wt}(\tilde{e})=z,\ \tilde{e}_i = 0}} \; \sum_{j \in \mathrm{Supp}(\tilde{e})} \gamma_{i,j} \qquad (3.14)$$
since the contributions to $\mathrm{upc}_i$ whenever no discrepancy is present in the i-th position (i.e., ẽ_i = 0) only come from the z discrepancies being present in the other locations.
Similarly, observe that, whenever a discrepancy is present in the i-th position (i.e., ẽ_i = 1), the value of $\mathrm{upc}_i$ is obtained subtracting from v (as $\mathrm{wt}(h^{(\mathrm{Supp}(h_{:,i}))}_{:,i}) = v$) the effects of the other t̂ − 1 columns selected by the positions of the asserted bits in the error vector, allowing us to rewrite Eq. (3.12) as:
$$\mathrm{upc}_i \geq v - \mu(\hat{t}-1), \quad \text{when } \tilde{e}_i = 1 \text{ (i.e., } e_i \oplus \hat{e}_i = 1\text{)}. \qquad (3.15)$$
Analogously, Eq. (3.11) becomes $\mathrm{upc}_i \leq \mu(\hat{t})$ when ẽ_i = 0; combining the two bounds, all the discrepancies are corrected with certainty whenever
$$\mu(\hat{t}) < v - \mu(\hat{t}-1). \qquad (3.16)$$
Considering Eq. (3.16), the largest value of t̂ = wt(ẽ) = wt(e ⊕ ê) for which it holds is the maximum number of residual mismatches between ê and e that can be corrected with certainty by the execution of the iteration function at hand, employing a bit-flip threshold th such that:
$$\mu(\hat{t}) + 1 \leq th \leq v - \mu(\hat{t}).$$
Therefore, given a specific code, which means a specific instance of the parity-check matrix H, it is possible to compute the threshold th to be set during the execution of an iteration function in such a way that it is able to correct with certainty a number of mismatches between e and the input ê amounting to t̂. To this end, performance concerns on the design and implementation of the LEDAdecoder mandate finding an efficient strategy to compute both the matrix Γ and t̂, as described in the next subsection.
The computation of the largest number of mismatches between the sought error vector and the input estimated error vector, t̂, that is surely corrected by an out-of-place iteration function requires a significant computational effort for the computation of the matrix Γ, which is approximately cubic in the value of n_0p.
We now prove that Γ is quasi-cyclic, thus allowing a significant reduction in the time required by its computation. Note that Γ is a symmetric matrix, since $\gamma_{i,j} = \gamma_{j,i}$ for all pairs of indexes i, j, due to the symmetry of the column intersection. Consider the indexes i, j ∈ {0, . . . , n_0p − 1} obtained as $i = p\, i_p + i'$ and $j = p\, j_p + j'$, respectively, with $i_p, j_p \in \{0, \ldots, n_0 - 1\}$ and $i', j' \in \{0, \ldots, p-1\}$.
Denoting with P the p × p circulant permutation matrix associated to the cyclic shift by one position, and observing that $P^p = I_p$ and that $(P^a)^T = P^{p-a}$, we have that
$$h_{:,i} = P^{i'} h_{:,\,p\, i_p}, \qquad h_{:,j} = P^{j'} h_{:,\,p\, j_p}, \qquad (3.17)$$
which proves that the matrix Γ is quasi-cyclic with block size p, and can thus be written as
$$\Gamma = \begin{bmatrix} \Gamma_{0,0} & \Gamma_{0,1} & \cdots & \Gamma_{0,n_0-1} \\ \Gamma_{1,0} & \Gamma_{1,1} & \cdots & \Gamma_{1,n_0-1} \\ \vdots & \vdots & \ddots & \vdots \\ \Gamma_{n_0-1,0} & \Gamma_{n_0-1,1} & \cdots & \Gamma_{n_0-1,n_0-1} \end{bmatrix}, \qquad (3.18)$$
where each $\Gamma_{i,j}$ is a p × p circulant integer matrix; in particular, due to the commutativity of the set intersection that defines the $\gamma_{i,j}$ values, we also have that $\Gamma_{i,j}^T = \Gamma_{j,i}$.
Based on the above analysis, it can be easily shown that the matrix Γ is fully described by a subset of only $n_0\frac{p-1}{2} + p\binom{n_0}{2}$ entries. Indeed, all the blocks on the main block-diagonal of Γ, i.e., the $\Gamma_{i,i}$, i ∈ {0, . . . , n_0 − 1} blocks, are both symmetric and circulant and, recalling that we do not need to store the elements on the main diagonal (i.e., the $\gamma_{i,i}$ values), we obtain that $\frac{p-1}{2}$ elements are sufficient to fully describe each $\Gamma_{i,i}$. Since there are n_0 blocks on the diagonal, those can be represented by $n_0\frac{p-1}{2}$ elements of Γ. Thanks to the fact that Γ is symmetric, it is sufficient to represent, besides the blocks on the diagonal, only the blocks $\Gamma_{i,j}$, i ∈ {0, . . . , n_0 − 1}, j ∈ {1, . . . , n_0 − 1}, i < j. Since there are $\binom{n_0}{2}$ such blocks, and each one requires p elements to be fully described, we obtain a total number of elements equal to $n_0\frac{p-1}{2} + p\binom{n_0}{2}$.
This property can be used to obtain an efficient method to evaluate the largest value of t̂ for which
the condition in Eq. (3.16) still holds.
Algorithm 15 provides a procedural description of such an efficient computation, calculating only
the first rows of the circulant blocks of Γ (lines 2–15), and subsequently testing the admissible values
for t̂ (lines 19–30). Note that it is possible to further cut down the computational complexity of
the algorithm by computing directly the histograms containing the occurrences of unique values in
each row of Γ. Such a method is not reported for the sake of brevity, but it allows a significant
speedup (around two orders of magnitude) in practice.
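A simplified Python rendition of this computation follows (a sketch of ours: it exploits the quasi-cyclicity by computing only one representative row per block pair, and omits the histogram-based speedup mentioned above). Here block_supports holds, for each circulant block of H, the sorted support of its first column:

def gamma_first_rows(block_supports, p):
    # gamma[ip][jp][d] = |Supp(h_{:,ip*p}) ∩ Supp(h_{:,jp*p+d})|
    n0 = len(block_supports)
    gamma = [[[0] * p for _ in range(n0)] for _ in range(n0)]
    for ip in range(n0):
        set_i = set(block_supports[ip])
        for jp in range(n0):
            for d in range(p):
                shifted = {(r + d) % p for r in block_supports[jp]}
                gamma[ip][jp][d] = len(set_i & shifted)
    return gamma

def mu(gamma, p, n0, z):
    # Largest total interference on any upc_i from z discrepancies:
    # the sum of the z largest gamma values on a representative row
    # (by quasi-cyclicity, one row per block row suffices).
    best = 0
    for ip in range(n0):
        vals = []
        for jp in range(n0):
            row = list(gamma[ip][jp])
            if jp == ip:
                row[0] = 0        # exclude gamma_{i,i} = v itself
            vals.extend(row)
        vals.sort(reverse=True)
        best = max(best, sum(vals[:z]))
    return best

def max_certainly_corrected(gamma, p, n0, v):
    # Largest t_hat for which the guarantee of Eq. (3.16) holds.
    t_hat = 0
    while mu(gamma, p, n0, t_hat + 1) < v - mu(gamma, p, n0, t_hat):
        t_hat += 1
    return t_hat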
3.5 Efficient choice of out-of-place decoder thresholds for the numerically simulated DFR of LEDAcrypt-KEM-CPA
The choice of the flipping threshold for the out-of-place iteration function influences its error cor-
rection capability. While the previous section described how to determine a threshold such that
a number of discrepancies in the error estimate would be corrected with certainty, the present
section aims at providing a more sophisticated threshold selection criterion. While the criterion
we introduce in this section does not allow for a closed form analysis of the DFR, it practically
provides very fast convergence of a decoding algorithm relying only on out-of-place iterations. We
thus exploit this criterion for LEDAcrypt-KEM-CPA, where ephemeral keys are employed and a
practically low enough DFR is sufficient. This also motivates our choice of employing a decoder
that only exploits out-of-place iterations for LEDAcrypt instances with ephemeral keys. We assess its DFR by means of numerical simulations targeting parameter sets with a DFR = 10^−9, and derive the value of imaxOut as the maximum number of iterations found to be required for achieving the target DFR. These engineering choices allow the decoder to be fully parallelized, as all the upc values can be computed independently, and the flipping decisions applied to ê and s likewise.
In the following, we describe our rationale for the computation of the ComputeThreshold func-
tion required in the OutOfPlaceIteration Algorithm 14 at line 3.
A natural choice is to set the threshold used at the l-th iteration of the whole function equal to the
largest among the upc values. This strategy ensures that only those few bits that have maximum
likelihood of being mismatched between ê and e are flipped during each function execution, thus
achieving the lowest DFR. However, it has some drawbacks in terms of complexity, since the
computation of the maximum upc value, for each column of the H matrix, entails additional memory
storage and some repeated operations. For such reasons, we consider an approach with reduced
complexity, according to which the threshold values are computed on the basis of the syndrome
weight wt(s) at each execution of the iteration function. The function relating the syndrome weight
to the threshold to be chosen can be precomputed and tabulated, allowing an efficient lookup to
take place at runtime.
Coherently with the notation adopted in the previous sections, we denote with ê and s, respec-
tively, the input error vector estimate and the syndrome, and recall that s = H (e ⊕ ê)T , where
wt (e ⊕ ê)=t̂. Let us denote with wt(s)|t̂ the average syndrome weight when wt (e ⊕ ê) = t̂.
To this end, let punsatisfied (t̂) denote the probability that a randomly selected parity-check equation
is unsatisfied. Assuming that e ⊕ ê is distributed as a random vector of weight t̂, we have:
$$p_{\mathrm{unsatisfied}}(\hat{t}) = \sum_{\substack{j=1 \\ j \text{ odd}}}^{\min\{n_0 v,\, \hat{t}\}} \frac{\binom{n_0 v}{j}\binom{n_0 p - n_0 v}{\hat{t}-j}}{\binom{n_0 p}{\hat{t}}}.$$
Then, under the assumption that the parity-check equations are statistically independent, the average syndrome weight is estimated as:
$$\mathrm{wt}(s)|_{\hat{t}} = p \cdot p_{\mathrm{unsatisfied}}(\hat{t}). \qquad (3.19)$$
The aforementioned relation allows us to estimate t̂, the number of set bits in ẽ = e ⊕ ê, as the value t̃ for which $\mathrm{wt}(s)|_{\tilde{t}}$ is closest to the (available) value of wt(s). Relying on this estimate to approximate the value of t̂ as t̃, we now define a strategy to pick a threshold value depending on the estimated number of discrepancies t̃ itself.
We now consider ẽ_i, i.e., the i-th element of the vector of differences between e and ê, ẽ = ê ⊕ e, and assume that the number of discrepancies wt(ẽ) = t̂ matches our estimate t̃.
Denote the number of unsatisfied parity-check equations in which the i-th bit participates as $\mathrm{upc}_i \in \{0, \ldots, v\}$, with 0 ≤ i ≤ n_0p − 1. The probability that such a position contains a discrepancy can be written as:
$$\Pr\left[\tilde{e}_i = 1 \mid \mathrm{upc}_i = \sigma\right] = \frac{\Pr[\tilde{e}_i = 1, \mathrm{upc}_i = \sigma]}{\Pr[\mathrm{upc}_i = \sigma]} = \frac{\Pr[\tilde{e}_i = 1, \mathrm{upc}_i = \sigma]}{\Pr[\tilde{e}_i = 1, \mathrm{upc}_i = \sigma] + \Pr[\tilde{e}_i = 0, \mathrm{upc}_i = \sigma]} = \left(1 + \frac{\Pr[\tilde{e}_i = 0, \mathrm{upc}_i = \sigma]}{\Pr[\tilde{e}_i = 1, \mathrm{upc}_i = \sigma]}\right)^{-1}.$$
By elaborating the previous result, we obtain:
$$\Pr\left[\tilde{e}_i = 1 \mid \mathrm{upc}_i = \sigma\right] = \frac{1}{1 + \frac{n_0 p - \tilde{t}}{\tilde{t}} \left(\frac{\rho_{0,u}(\tilde{t})}{\rho_{1,u}(\tilde{t})}\right)^{\sigma} \left(\frac{1-\rho_{0,u}(\tilde{t})}{1-\rho_{1,u}(\tilde{t})}\right)^{v-\sigma}}, \qquad (3.20)$$
where $\rho_{0,u}(\tilde{t})$ and $\rho_{1,u}(\tilde{t})$ are given in Lemma 3.1.1; note that, as expected intuitively, this probability is a monotonically increasing function of σ over the integers {0, . . . , v}.
Willing to define a threshold for the value of $\mathrm{upc}_i$ such that it is advisable to flip the i-th bit of ê, we require that, for a given value σ taken by $\mathrm{upc}_i$, the probability of a discrepancy being present (ẽ_i = 1) exceeds the probability of its absence (ẽ_i = 0) by a chosen margin, yielding:
$$\begin{cases} \Pr[\tilde{e}_i = 1 \mid \mathrm{upc}_i = \sigma] > (1+\Delta)\Pr[\tilde{e}_i = 0 \mid \mathrm{upc}_i = \sigma] \\ \Pr[\tilde{e}_i = 0 \mid \mathrm{upc}_i = \sigma] = 1 - \Pr[\tilde{e}_i = 1 \mid \mathrm{upc}_i = \sigma] \end{cases} \;\Rightarrow\; \begin{cases} \Pr[\tilde{e}_i = 1 \mid \mathrm{upc}_i = \sigma] > \frac{1+\Delta}{2+\Delta} \\ \Pr[\tilde{e}_i = 0 \mid \mathrm{upc}_i = \sigma] < \frac{1}{2+\Delta} \end{cases}$$
where Δ ≥ 0 represents a margin which can be arbitrarily chosen. The minimum value of σ such that the inequalities in the previous relation are satisfied can be computed as:
$$th = \min\left\{\sigma \in \{0, \ldots, v\} \,:\, \Pr[\tilde{e}_i = 1 \mid \mathrm{upc}_i = \sigma] > \frac{1+\Delta}{2+\Delta}\right\} \qquad (3.21)$$
and used as the decision threshold during the execution of the out-of-place iteration function.
We now combine the results of Eq. (3.19) (linking the weight of the syndrome to the estimated number of discrepancies t̃) and Eq. (3.20) (linking the value of the estimated number of discrepancies t̃ to the correct flipping probability given a certain threshold), to obtain the function mapping a given (available) value for wt(s) onto the corresponding threshold value to be adopted for the out-of-place iteration. Such a function, of which we provide a graphical representation in Figure 3.4, splits the domain of possible values for wt(s) into t regions, according to the value t̃ for which the value $\mathrm{wt}(s)|_{\tilde{t}}$ is closest to the available value of wt(s).
We therefore determine t + 1 delimiting values, $w_s^{(j)}$, j ∈ {0, . . . , t}, as:
$$w_s^{(0)} = 0; \qquad \forall j \in \{1, \ldots, t\},\; w_s^{(j)} = \frac{\mathrm{wt}(s)|_{j-1} + \mathrm{wt}(s)|_{j}}{2}. \qquad (3.22)$$
Each of the regions delimited by a pair $(w_s^{(j)}, w_s^{(j+1)})$, j ∈ {0, . . . , t}, is thus considered to be the one where we assume t̃ = j, and we employ this value of t̃ to derive the value of the threshold to be used via Eq. (3.21).
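The whole precomputation can be sketched as follows (ours, with illustrative names and factoring; rho0 and rho1 are the functions $\rho_{0,u}$ and $\rho_{1,u}$ of Lemma 3.1.1, passed in as callables, and the threshold for the t̃ = 0 region is set conservatively to v):

from math import comb

def p_unsatisfied(n0, p, v, t_hat):
    w, n = n0 * v, n0 * p
    return sum(comb(w, j) * comb(n - w, t_hat - j)
               for j in range(1, min(w, t_hat) + 1, 2)) / comb(n, t_hat)

def pr_e1_given_upc(sigma, v, n, t_tilde, rho0, rho1):
    r0, r1 = rho0(t_tilde), rho1(t_tilde)
    ratio = ((n - t_tilde) / t_tilde) * (r0 / r1) ** sigma \
            * ((1 - r0) / (1 - r1)) ** (v - sigma)        # Eq. (3.20)
    return 1.0 / (1.0 + ratio)

def threshold_table(n0, p, v, t, rho0, rho1, delta=0.0):
    n = n0 * p
    avg_ws = [0.0] + [p * p_unsatisfied(n0, p, v, j)
                      for j in range(1, t + 1)]           # Eq. (3.19)
    table = [(0.0, v)]                                    # w_s^(0) = 0
    for j in range(1, t + 1):
        ws_j = (avg_ws[j - 1] + avg_ws[j]) / 2            # Eq. (3.22)
        th_j = min((s for s in range(v + 1)
                    if pr_e1_given_upc(s, v, n, j, rho0, rho1)
                    > (1 + delta) / (2 + delta)),
                   default=v)                             # Eq. (3.21)
        table.append((ws_j, th_j))
    return table

The post-processing steps described below (flattening the thresholds after their minimum and merging consecutive rows with equal thresholds) can then be applied to the returned table.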
[Figure 3.4: the wt(s) axis spans from 0 = wt(s)|_0 through wt(s)|_1, wt(s)|_2, wt(s)|_3, up to p.]
Figure 3.4: Graphical representation of the relations between wt(s), $\mathrm{wt}(s)|_j$, $w_s^{(j)}$ and $th_j$. The example refers to the case of t = 3.
It can be easily shown that, if j is the largest integer such that $\mathrm{wt}(s) \geq w_s^{(j)}$, then j minimizes $|\mathrm{wt}(s) - \mathrm{wt}(s)|_j|$. The function described by Eq. (3.22) can be efficiently stored as a lookup table containing the pairs $\langle w_s^{(j)}, th_j \rangle$ for all j ∈ {0, . . . , t}.
Therefore, the ComputeThreshold function will start by computing the value of wt(s), and subsequently find the required threshold $th_{\tilde{t}}$. Algorithm 16 provides a procedural description of a constant-time lookup into the table.
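In the same spirit as Algorithm 16, such a constant-time scan can be rendered as follows (a sketch of the coding pattern only; the reference code operates branchlessly on machine words):

def ct_threshold_lookup(table, syndrome_weight):
    # table: list of (ws_j, th_j) pairs sorted by increasing ws_j.
    # Every entry is visited, and the selection uses arithmetic masking
    # instead of a data-dependent early exit.
    th = table[0][1]
    for ws_j, th_j in table:
        mask = int(syndrome_weight >= ws_j)      # 1 iff wt(s) >= ws_j
        th = mask * th_j + (1 - mask) * th
    return th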
We have observed that, moving from the largest values of $w_s^{(j)}$ to the smallest ones, the threshold values computed this way firstly exhibit a decreasing trend, then start to increase. According to numerical simulations, neglecting the final increase has no impact from the performance standpoint. Therefore, in the lookup table we replace the threshold values after the minimum with a constant value equal to the minimum itself. In addition, the so-obtained table usually contains consecutive rows with the same threshold value: we can clearly eliminate all such rows, apart from the first one (i.e., the one containing $w_s^{(0)}$). By means of numerical simulations, we have also observed that the lowest DFR within a reasonable number of iterations, for LEDAcrypt instances, is obtained with the choice Δ = 0.
Chapter 4
In this chapter we describe the reproducible parameter design procedure employed to obtain the
code parameters in LEDAcrypt cryptosystems, provide the parameter sets themselves, and report
our choices for the symmetric primitives required in the cryptosystems. We provide the parameter
generation procedure as public domain software at https://github.com/ledacrypt.
Symmetric primitives. Concerning the IND-CCA2 construction for LEDAcrypt-KEM (Sec-
tion 1.3), we chose to employ as Extendable Output Function (XOF), the NIST standard SHAKE-
128 [25] for NIST Category 1 and SHAKE-256 [25] for Category 3 and Category 5. We chose to
employ as cryptographic hash the NIST standard SHA-3 [25] with digest length, lhash , equal to
256, 384, and 512 bits for NIST Category 1, 3, and 5 of the LEDAcrypt parameters, respectively.
Taking as input the error vector, two distinct instances of such a cryptographic hash are employed
in the IND-CCA2 construction of LEDAcrypt-KEM. Indeed, an instance is employed to implement
the Key Derivation Function (KDF) that outputs the encapsulated secret key, while the other one
is employed to build the confirmation tag to be transmitted along with the ciphertext. To achieve
domain separation between the said instances of the cryptographic hash, we prefix the input to the
former instance (i.e., the one employed for the KDF) with a zero-valued byte, while we prefix the
input to the second instance with the binary encoding of 1 over one byte. This technique, called
prefixing in the recent analysis of many of the IND-CCA2 constructions employed in the NIST PQC
contest [10], allows one to derive in a sound fashion two different hash functions starting from the same primitive.
We note that SHA-3 and SHAKE are already domain separated due to their definition, as they
have a different suffix added to the message by construction.
Concerning the IND-CCA2 construction for LEDAcrypt-PKC (Section 1.4), we chose to employ as
Deterministic Random Bit Generator (DRBG) the NIST recommended CTR-DRBG [8] instantiated
with AES-256 and fed with a TRNG-extracted seed having bit-length, lseed , equal to 192, 256, and
320 bits for the NIST Category 1, 3, 5 of the LEDAcrypt parameters, respectively.
Designing code parameters. The LEDAcrypt design procedure described in this section takes
as input the desired security level λc and λq , expressed as the base-2 logarithm of the number of
operations of the desired computational effort on a classical and quantum computer, respectively. In
addition to λc and λq , the procedure also takes as input the number of circulant blocks, n0 ∈{2, 3, 4},
forming the parity-check matrix H, allowing tuning of the code rate. Finally, the design procedure
takes the maximum acceptable DFR, and the number of in-place, imaxIn, and out-of-place, imaxOut,
decoder iterations.
The parameter design procedure outputs the size of the circulant blocks, p, the weight of a column
of H, v, and the number of intentional errors, t. We recall that Q is chosen as I, as highlighted in
Section 2.4 and is therefore neglected in the entire design procedure, and in the code.
The procedure enforces the following constraints on the parameter choice:
• The minimum cost for a message recovery via Information Set Decoding (ISD) on both quantum and classical computers must exceed $2^{\lambda_q}$ and $2^{\lambda_c}$ operations, respectively. This constraint binds the values of the code length n = n_0p, the code dimension k = (n_0 − 1)p and the number of errors t to be chosen such that an ISD on the public code C(n, k) aiming at decoding a random error vector of weight t requires more than $2^{\lambda_q}$ or $2^{\lambda_c}$ operations on a quantum and a classical computer.
• The minimum cost for a key recovery attack via ISD on both quantum and classical computers must exceed $2^{\lambda_q}$ and $2^{\lambda_c}$ operations, respectively. This constraint binds the values of the code length n = n_0p, the code redundancy r = p and the number of ones in a row of H, w = vn_0, to be chosen such that the effort of recovering the private QC-LDPC/QC-MDPC code, i.e.,
$$\min\left\{\frac{\mathrm{ISD}_{cfp}(n_0 p,\, p,\, 2v)}{p\binom{n_0}{2}},\; \frac{\mathrm{ISD}_{cfp}(2p,\, p,\, 2v)}{n_0 p},\; \frac{\mathrm{ISD}_{cfp}(n_0 p,\, (n_0-1)p,\, n_0 v)}{p}\right\}$$
(see Section 2.3.1), requires more than $2^{\lambda_q}$ or $2^{\lambda_c}$ operations on a quantum and classical computer.
• The choice of the circulant block size, p, should be such that p is a prime number and such
that ord2 (p) = p − 1 (see Theorem 1.1.3).
• The choice of the circulant block size, p, and parity-check matrix density, w = n0 v, must
allow the code to correct the required amount of errors, with the specified DFR, depending
on the IND-CCA2 or IND-CPA requirements of the primitive.
We report a synthetic description of the procedure implemented in the publicly available code as Algorithm 17. The rationale of the procedure¹ is to proceed in refining the choice for p, t, v until a fixed point is reached, considering only values of p respecting ord_2(p) = p − 1.
Since there are cyclic dependencies among the constraints on p, t, and v, the search for the parameter set is structured as a fixed-point solver iterating the computation of appropriate values for the three desired parameters (Algorithm 17, lines 2–18), starting from an arbitrary choice of the value of p, so as to be able to compute the ISD costs.
The loop body starts by deriving the length, n, dimension, k, and redundancy, r = n − k, of the code, assigning them to obtain a code rate equal to 1 − 1/n_0 (line 3). Subsequently, the procedure for the parameter choice proceeds executing a loop (lines 5–7) to determine a value t, with t < r, such that a message recovery attack on a generic code C(n, k), looking for an error vector with weight t, requires more than the specified amount of computational effort on both classical and quantum computers.
1
Note that, in the pseudocode of Algorithm 17, the loop construct while(< condition >) . . . iterates the execution
of instructions in the loop body when the condition is true, while the loop construct Repeat . . . until(< condition >)
iterates the instructions in the loop body when the condition is false.
1   p ← starting prime
2   repeat
3       n ← n_0p, k ← (n_0 − 1)p, r ← p
4       t ← 0
5       repeat
6           t ← t + 1
7       until t ≥ r ∨ (C-ISD_sdp(n, r, t) ≥ 2^{λ_c} ∧ Q-ISD_sdp(n, r, t) ≥ 2^{λ_q})
8       v ← 1
9       repeat
10          v ← v + 2
11          SecureOk ← C-ISD_cfp(n_0p, p, 2v)/(p·$\binom{n_0}{2}$) ≥ 2^{λ_c} ∧ Q-ISD_cfp(n_0p, p, 2v)/(p·$\binom{n_0}{2}$) ≥ 2^{λ_q}
12          SecureOk ← SecureOk ∧ C-ISD_cfp(2p, p, 2v)/(n_0p) ≥ 2^{λ_c} ∧ Q-ISD_cfp(2p, p, 2v)/(n_0p) ≥ 2^{λ_q}
13          SecureOk ← SecureOk ∧ C-ISD_cfp(n_0p, (n_0−1)p, n_0v)/p ≥ 2^{λ_c} ∧ Q-ISD_cfp(n_0p, (n_0−1)p, n_0v)/p ≥ 2^{λ_q}
14      until (SecureOk = true ∨ vn_0 ≥ p)
15      if (SecureOk = true) then
16          p, τ ← ComputepDFRok(n_0, v, t)
17          return (p, t, v)
18  until p, v, t do not change
To determine the weight of a column of H, i.e., v, the procedure moves on searching for a candidate value of v. The loop at lines 9–14 tests the resistance of the cryptosystem against key recovery attacks on both classical and quantum computers. The final value of v is computed as the smallest odd integer such that resistance against such attacks is obtained. We recall that the condition of v being odd is sufficient to guarantee the non-singularity of the circulant blocks of H.
In both the evaluation of the key recovery attacks, which drives the choice of v, and the evaluation of the message recovery attack, which drives the choice of t, we consider the best (i.e., lowest) complexity among the ones of solving the Codeword Finding Problem / Syndrome Decoding Problem (CFP/SDP) with the ISD techniques proposed by Prange [58], Lee and Brickell [47], Leon [48], Stern [69], Finiasz and Sendrier [29], and Becker, Joux, May and Meurer (BJMM) [9] for the classical computing platform, and by Prange [58], Lee and Brickell [47], and Stern [69] for the quantum computing platform, as discussed in Section 2.3. For all the aforementioned algorithms, we perform an exhaustive search for the best ISD parameters, ensuring that the found parameter sets do not lie at the boundary of our exploration range. Since all the ISD parameters are the result of a computational tradeoff, we expect
the found solution to be the actual minimum computation effort for an attack.
When suitable values for the code parameters from a security standpoint are found, the algorithm computes the minimum value of p such that the decoding algorithm is expected to correct t errors with the desired DFR, depending on the guarantees required by the primitive. The same procedure computing the appropriate value for p also derives, in case an estimate for a LEDAdecoder to be used in IND-CCA2 primitives is desired, the number τ of errors to be corrected by the decoder with certainty in a single iteration.
In order to achieve the maximum correction efficiency from the code at hand, the procedure esti-
mating the value of p yielding the desired DFR computes the best (i.e., lowest) DFR value exploring
exhaustively all the values for the decoding thresholds in the decoders.
As a consequence, we employ at key generation phase a rejection sampling procedure, which performs the analysis from Section 3.4 and discards the keypair if the code does not correct the amount of errors τ determined at parameter design time. To prevent this rejection sampling from significantly diminishing the keyspace, we estimate the value of τ at design time by running the analysis from Section 3.4 on 1000 randomly extracted codes, and pick the value τ which is corrected by at least half of the randomly generated codes.
We note that such a fixpoint optimization procedure, with respect to the one proposed during the first round of the NIST post-quantum standardization process, simultaneously optimizes the parameters for achieving the desired security margin and DFR. The previous procedure to obtain long-term keys (i.e., suitable for IND-CCA2 primitives) started from the optimization of the values of t and v to match the desired security margin against ISD attacks, and then gradually increased the value of p until a desirable DFR was achieved. The current procedure, instead, allows exploiting the additional security margin coming from an increase of the value of p, in turn allowing a reduction of the keysizes.
The C++ tool provided relies on Victor Shoup's NTL library (available at https://www.shoup.net/ntl/), in particular for the arbitrary precision integer computations and the tunable precision floating point computations, and requires a compiler supporting the C++11 standard.
The tool follows the computation logic described in Algorithm 17, but it is optimized to reduce the
computational effort as follows:
• The values of p respecting the constraint ord_2(p) = p − 1 are precomputed up to 149,939 and stored in a lookup table.
• The search for the values of p, t and v respecting the constraints is performed by means of a dichotomic search instead of a linear scan of the range (see the sketch after this list).
• The computation of the binomial coefficients employs a precomputation of all the factorials as arbitrary precision integers up to a tunable number. When a factorial is not available as a precomputed value, we switch to a direct, incremental calculation starting from the largest factorial available in the precomputed table. We resort to this technique since Stirling's approximation, considering the series up to its eighth term, was proven to be lacking accuracy in the determination of extremely low DFR values (i.e., ≤ 2^{−100}). Namely, employing Stirling's approximation led to excessively conservative DFR estimates.
• We chose to precompute all the values of the factorials up to 90000!, 150000! and 250000! to
perform the parameter design for Category 1, 3 and 5, and limit the maximum code lengths
accordingly. The values were chosen so that Stirling’s approximation is not used when the
final code parameter results are obtained. We note that this results in a memory occupation
of around 6.9 GiB, 19.7 GiB, and 57.4 GiB per optimization process, respectively. The value
can be tuned to accommodate parameter generation replication for smaller parameter sets with
less demanding hardware.
• We left Stirling's approximation available in the public code, to also provide reproducibility of the previous parameter designs.
• All floating point values are memorized and manipulated as 704-bit mantissas (≈ 200 decimal digits) and 64-bit exponents, as provided by the NTL library.
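As an illustration of the dichotomic search mentioned in the list above, the following sketch (ours, with hypothetical names) finds the smallest admissible p satisfying a monotone requirement, such as the estimated DFR falling below the design target:

def smallest_admissible_p(admissible_primes, meets_requirement):
    # admissible_primes: sorted list of primes with ord_2(p) = p - 1;
    # meets_requirement: monotone predicate, False for small p and
    # True from some index onwards.
    lo, hi = 0, len(admissible_primes) - 1
    if not meets_requirement(admissible_primes[hi]):
        return None                  # no admissible value in the range
    while lo < hi:
        mid = (lo + hi) // 2
        if meets_requirement(admissible_primes[mid]):
            hi = mid                 # requirement met: search lower half
        else:
            lo = mid + 1             # not met: search upper half
    return admissible_primes[lo]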
In this section, we present the code parameters for IND-CCA2 LEDAcrypt-KEM and IND-CCA2
LEDAcrypt-PKC, evaluating the engineering tradeoff given by performing two, one or no out-of-
place iterations in a decoder, preceded by a number of in-place iterations such that the total decoder
iteration number is fixed at two. The choice of such a low number of iterations allows the implementation of the decoder to benefit from a very fast, constant-time execution, and allows us to employ our DFR estimation techniques in a regime where they provide a close, conservative fit to the values which can be obtained through numerical simulations, in turn keeping a reasonable size of the parameter design space.
To provide a numerical confirmation of such a fact, we report the outcome of numerical simulations
on a p = 4801, n0 = 2, v = 45 QC-LDPC/QC-MDPC code in Figure 4.1.
(a) imaxIn = 2, imaxOut = 0, th_0 = 26   (b) imaxIn = 1, imaxOut = 1, th_0 = 27   (c) imaxIn = 0, imaxOut = 2, th_0 = 25
Figure 4.1: Comparison of our worst-case DFR estimation technique with numerical simulations, for the LEDAdecoder choices employed to provide a predictable DFR for the IND-CCA2 primitives (i.e., imaxIn + imaxOut = 2). Results obtained on a QC-LDPC/QC-MDPC code with parameters n_0 = 2, p = 4801, v = 45 (taking t = 45, it exhibits a complexity of ≈ 2^96 classical operations or ≈ 2^64 quantum gates against the best SDP/CFP solvers), employing thresholds th_0 for the first iteration found by computing all the DFR estimates according to Chapter 3 and picking the first-iteration thresholds providing the best DFR for each case. The numerical DFR simulation is obtained decoding random error vectors of weight t, until either 100 decoding errors are reached, or until 10^8 decoding operations have been performed, whichever happens first.
The code parameters were chosen to obtain an acceptable simulation time (from 2·10^3 to 5·10^3 core hours per data set). For the record, these parameters require the same effort to solve either the Syndrome Decoding Problem (SDP) or one of the Codeword Finding Problems (CFPs) reported in Chapter 2, and allow the code to exhibit a resistance of ≈ 2^96 classical operations or ≈ 2^64 quantum gates when t = 45.
In order to choose the bit-flipping thresholds for the first iteration to be used in these numerical simulations, we employed the DFR estimation techniques reported in Chapter 3, computing the DFR for all the possible threshold values in $\{\lceil \frac{v+1}{2} \rceil, \ldots, v\}$, and picking the one providing the best estimated DFR. The second iteration threshold is chosen with the code-specific criterion reported in Section 3.4, for the LEDAdecoder with imaxOut > 0, performing a rejection sampling at key generation time so as to achieve a number of errors corrected with certainty by the last out-of-place iteration equal to τ = 7. For the case of the imaxIn = 2, imaxOut = 0 LEDAdecoder, we simply picked the second iteration threshold equal to the one for the first iteration. As can be seen in Figure 4.1, our worst-case DFR estimation techniques provide a good fit of the decoder behaviour, while retaining a conservative estimate of the actual decoder DFR.
Table 4.1 and Table 4.2 report the values of the parameters of the codes designed to be used with a LEDAdecoder employing (imaxIn = 0, imaxOut = 2) and (imaxIn = 1, imaxOut = 1), respectively. Specifically, for each NIST category, guaranteed DFR value and number of circulant blocks n_0 ∈ {2, 3, 4}, the tables report: the size p of the circulant blocks of H, the weight v of a column of H, the number t of intentional errors, the value of τ (weight of error corrected with certainty in a single iteration) to be employed during the rejection sampling phase in the KeyGeneration procedure, and the ratio of keypairs accepted on average during an execution of the KeyGeneration procedure. Furthermore, for each NIST category of the reported codes, the
tables also show the base-2 logarithm of the estimated computational cost of the best algorithms for
solving the following problems on a classical and a quantum computer (denoted with the prefixes
C- and Q-, respectively), described in Chapter 2: syndrome decoding problem for key recovery
(SDP); codeword finding problem in the hidden LDPC/MDPC code (CFP-1); codeword finding
problem in the length-reduced hidden LDPC/MDPC code (CFP-2); codeword finding in the dual
code of the hidden code (CFP-3).
A noteworthy point is that the third strategy of key recovery attack (i.e., the one denoted as CFP-3) is the most effective one for all the parameter sets having n_0 equal to either 3 or 4, while the key recovery attack complexities are almost all the same for n_0 = 2, bar a single unit of advantage for the codeword finding problem in the length-reduced hidden LDPC/MDPC code (CFP-2), which is easily ascribed to an extra factor 2 in the complexity denominator (see Section 2.3.1). In
Table 4.1, we observe a significant reduction in the size of the p parameter with respect to the parameter sets proposed in the LEDAcrypt specification at the beginning of the second round of the NIST PQC contest. Note that such parameters were proposed only for the case n_0 = 2, employing a LEDAdecoder with imaxIn = 0 and imaxOut = 2. The reason for the ≈ 30% decrease in the size of the p parameter for the sets with DFR equal to 2^−64, and the ≈ 50% decrease in size for the sets with DFR equal to 2^−λ, is a combined effect of the joint optimization of p, t, and v provided by the new parameter design procedure, and of the higher correction power of a sparse code where L = HQ and Q = I. The former effect can be seen from the slight decrease of the values of t and v, which previously were kept unchanged from the ones of the ephemeral-key parameters. The latter effect is evident observing that the values reported for τ in the LEDAcrypt specification provided for the second round of the NIST PQC contest were in the 4–6 range, while the current values for τ are in the 9–18 range, indicating that error patterns
of more than twice the weight can be corrected with certainty by the given code parameters.
Table 4.3 reports the values of the parameters of the codes designed to be used with a LEDAdecoder employing (imaxIn = 2, imaxOut = 0). A notable difference of this table w.r.t. Table 4.1 and Table 4.2 is that it does not show the values of the parameter τ and the percentage of keypairs accepted during the execution of the KeyGeneration procedure, because the DFR of the codes is conservatively estimated for the two executions of the in-place iteration function applying the results of the analysis in Section 3.2.
Comparing the results in Table 4.1 (decoder with imaxIn = 0, imaxOut = 2), Table 4.2 (decoder with imaxIn = 1, imaxOut = 1), and Table 4.3 (decoder with imaxIn = 2, imaxOut = 0), we can observe a steady decrease in the code length due to the use of in-place iteration functions. Such a decrease is more evident in the case of imaxIn = 2, since the DFR estimate in that case does not employ the result from Section 3.4 to estimate the correction power of the last iteration, instead employing the method in Section 3.2. Indeed, when two in-place iterations are considered, instead of assuming that a decoding failure takes place whenever the first in-place iteration leaves more than τ errors, the analysis in Section 3.2 also accounts for the probability of decoding successfully in such a case, yielding a tighter DFR estimate.
Table 4.1: Parameter sizes for IND-CCA2 LEDAcrypt-KEM and IND-CCA2 LEDAcrypt-PKC,
with an imaxIn = 0, imaxOut = 2 LEDAdecoder executing two times the out-of-place iteration
function. Attack complexities are reported as the base-2 logarithm of the estimated computational
cost of the best algorithm for solving the following problems on a classical and a quantum computer
(denoted with the prefixes C- and Q-, respectively).
SDP: syndrome decoding problem for key recovery;
CFP-1: codeword finding problem in the hidden LDPC/MDPC code;
CFP-2: codeword finding problem in the length-reduced hidden LDPC/MDPC code;
CFP-3: codeword finding in the dual code of the hidden code.
All attack complexities are equal to or greater than the ones mandated by the NIST categories as
per Chapter 2, Table 2.1.

NIST Cat. 1 (required: C > 143.5, Q > 81.2)
DFR     n0  p       v    t    τ   key %ok  C-SDP  Q-SDP  C-CFP-1  Q-CFP-1  C-CFP-2  Q-CFP-2  C-CFP-3  Q-CFP-3
2^-64   2   23,371  71   130  10  85.3     144.2  95.7   148.3    94.4     147.3    93.4     148.3    94.4
2^-64   3   16,067  79   83   9   86.5     145.3  96.1   250.8    146.4    161.0    99.7     145.7    93.9
2^-64   4   13,397  83   66   8   93.7     145.4  96.3   329.3    186.1    167.9    102.7    145.3    94.4
2^-128  2   28,277  69   129  11  68.2     144.8  96.4   145.0    92.9     144.0    91.97    145.0    92.9
2^-128  3   19,709  79   82   10  66.5     144.6  96.1   251.3    146.9    161.5    100.2    146.2    94.5
2^-128  4   16,229  83   65   9   63.5     144.2  96.1   329.6    186.5    168.3    103.2    145.8    95.0

NIST Cat. 3 (required: C > 208.0, Q > 113.4)
DFR     n0  p       v    t    τ   key %ok  C-SDP  Q-SDP  C-CFP-1  Q-CFP-1  C-CFP-2  Q-CFP-2  C-CFP-3  Q-CFP-3
2^-64   2   40,787  103  195  13  81.9     208.9  130.2  210.8    127.5    209.8    126.5    210.8    127.5
2^-64   3   28,411  117  124  11  99.0     209.0  130.2  369.8    207.8    235.5    138.8    210.8    128.4
2^-64   4   22,901  123  98   11  61.4     208.0  129.8  487.8    267.3    246.4    143.9    210.2    128.7
2^-192  2   52,667  103  195  15  59.8     208.7  130.6  211.5    128.2    210.5    127.2    211.5    128.2
2^-192  3   36,629  115  123  13  81.9     208.3  130.2  364.0    205.2    232.1    137.5    208.0    127.3
2^-192  4   30,803  123  98   12  75.9     209.0  130.8  488.2    267.9    246.9    144.6    210.8    129.5

NIST Cat. 5 (required: C > 272.3, Q > 145.6)
DFR     n0  p       v    t    τ   key %ok  C-SDP  Q-SDP  C-CFP-1  Q-CFP-1  C-CFP-2  Q-CFP-2  C-CFP-3  Q-CFP-3
2^-64   2   61,717  137  261  17  55.2     273.0  163.8  277.7    162.3    276.7    161.3    277.7    162.3
2^-64   3   42,677  153  165  14  96.4     273.1  163.8  482.9    265.7    306.5    175.7    273.0    160.9
2^-64   4   35,507  163  131  13  95.9     273.2  164.0  646.9    348.2    325.4    184.8    275.5    162.9
2^-256  2   83,579  135  260  18  51.6     273.1  164.4  274.6    161.2    273.6    160.2    274.6    161.2
2^-256  3   58,171  153  165  16  65.2     274.1  164.8  483.5    266.4    307.1    176.5    273.7    161.6
2^-256  4   48,371  161  131  15  78.9     274.3  165.1  647.2    348.9    325.9    185.5    276.2    163.6
Table 4.2: Parameter sizes for IND-CCA2 LEDAcrypt-KEM and IND-CCA2 LEDAcrypt-PKC,
with an imaxIn = 1, imaxOut = 1 LEDAdecoder executing one time the in-place iteration function
followed by one execution of the out-of-place iteration function. Attack complexities are reported
as the base-2 logarithm of the estimated computational cost of the best algorithm for solving the
following problems on a classical and a quantum computer (denoted with the prefixes C- and Q-,
respectively).
SDP: syndrome decoding problem for key recovery;
CFP-1: codeword finding problem in the hidden LDPC/MDPC code;
CFP-2: codeword finding problem in the length-reduced hidden LDPC/MDPC code;
CFP-3: codeword finding in the dual code of the hidden code.
All attack complexities are equal to or greater than the ones mandated by the NIST categories as
per Chapter 2, Table 2.1.

NIST Cat. 1 (required: C > 143.5, Q > 81.2)
DFR     n0  p       v    t    τ   key %ok  C-SDP  Q-SDP  C-CFP-1  Q-CFP-1  C-CFP-2  Q-CFP-2  C-CFP-3  Q-CFP-3
2^-64   2   21,701  71   130  10  75.56    144.0  95.4   148.1    94.2     147.1    93.2     148.1    94.2
2^-64   3   15,053  79   83   9   75.74    145.1  95.9   250.7    146.2    160.9    99.5     145.5    93.7
2^-64   4   12,739  83   66   8   89.69    145.2  96.2   329.2    186.0    167.8    102.6    145.2    94.3
2^-128  2   27,077  69   130  11  62.61    144.7  96.2   148.6    94.8     147.6    93.8     148.6    94.8
2^-128  3   19,541  79   82   10  45.10    144.5  96.0   251.2    146.8    161.4    100.1    146.1    94.4
2^-128  4   15,773  83   65   9   53.33    144.2  96.0   329.6    186.5    168.3    103.2    145.8    94.9

NIST Cat. 3 (required: C > 208.0, Q > 113.4)
DFR     n0  p       v    t    τ   key %ok  C-SDP  Q-SDP  C-CFP-1  Q-CFP-1  C-CFP-2  Q-CFP-2  C-CFP-3  Q-CFP-3
2^-64   2   37,813  103  196  13  70.6     208.6  130.0  210.7    127.3    209.7    126.3    210.7    127.3
2^-64   3   26,597  117  124  11  97.4     208.8  129.9  369.7    207.6    235.4    138.7    210.7    128.2
2^-64   4   22,229  123  98   11  57.0     208.0  129.7  487.8    267.2    246.4    143.8    210.1    128.7
2^-192  2   50,363  103  195  15  50.1     208.7  130.5  211.4    128.1    210.4    127.1    211.4    128.1
2^-192  3   35,419  115  123  13  73.8     208.3  130.1  370.2    208.3    236.0    139.4    211.3    129.0
2^-192  4   29,221  123  98   13  61.2     209.0  130.7  488.2    267.9    246.9    144.5    210.7    129.4

NIST Cat. 5 (required: C > 272.3, Q > 145.6)
DFR     n0  p       v    t    τ   key %ok  C-SDP  Q-SDP  C-CFP-1  Q-CFP-1  C-CFP-2  Q-CFP-2  C-CFP-3  Q-CFP-3
2^-64   2   58,171  137  262  17  53.1     273.8  164.1  277.6    162.2    276.6    161.2    277.6    162.2
2^-64   3   40,387  155  165  14  90.8     274.5  164.3  489.1    268.7    310.3    177.5    276.3    162.4
2^-64   4   33,493  163  131  13  88.8     273.0  163.8  646.8    348.1    325.3    184.7    275.4    162.7
2^-256  2   81,773  135  260  17  87.0     273.0  164.3  274.5    161.1    273.5    160.1    274.5    161.1
2^-256  3   56,731  153  165  16  52.9     274.1  164.8  483.5    266.4    307.1    176.4    273.7    161.6
2^-256  4   47,459  161  131  15  73.0     274.2  165.0  647.2    348.9    325.9    185.5    276.1    163.6
Table 4.3: Parameter sizes for IND-CCA2 LEDAcrypt-KEM and IND-CCA2 LEDAcrypt-PKC,
with an imaxIn = 2, imaxOut = 0 LEDAdecoder executing two times the in-place iteration
function. Attack complexities are reported as the base-2 logarithm of the estimated computational
cost of the best algorithm for solving the following problems on a classical and a quantum computer
(denoted with the prefixes C- and Q-, respectively).
SDP: syndrome decoding problem for key recovery;
CFP-1: codeword finding problem in the hidden LDPC/MDPC code;
CFP-2: codeword finding problem in the length-reduced hidden LDPC/MDPC code;
CFP-3: codeword finding in the dual code of the hidden code.
All attack complexities are equal to or greater than the ones mandated by the NIST categories as
per Chapter 2, Table 2.1.

NIST Cat. 1 (required: C > 143.5, Q > 81.2)
DFR     n0  p       v    t    C-SDP  Q-SDP  C-CFP-1  Q-CFP-1  C-CFP-2  Q-CFP-2  C-CFP-3  Q-CFP-3
2^-64   2   15,461  71   132  144.9  95.4   147.4    93.3     146.4    92.3     147.4    93.3
2^-64   3   10,891  79   83   144.1  94.9   250.3    145.5    160.3    98.8     144.9    93.0
2^-64   4   9,283   83   66   144.3  95.2   329.0    185.3    167.4    101.9    144.7    93.6
2^-128  2   19,813  71   131  144.7  95.7   147.9    94.0     146.9    93.0     147.9    94.0
2^-128  3   14,051  79   83   145.0  95.8   250.6    146.1    160.8    99.4     145.4    93.6
2^-128  4   11,909  83   66   145.1  96.1   329.2    185.9    167.8    102.5    145.2    94.2

NIST Cat. 3 (required: C > 208.0, Q > 113.4)
DFR     n0  p       v    t    C-SDP  Q-SDP  C-CFP-1  Q-CFP-1  C-CFP-2  Q-CFP-2  C-CFP-3  Q-CFP-3
2^-64   2   27,701  103  197  208.7  129.4  210.0    126.5    209.0    125.5    210.0    126.5
2^-64   3   19,949  117  125  209.5  129.8  369.3    207.0    234.9    138.0    210.1    127.6
2^-64   4   16,747  123  99   209.0  129.8  487.6    266.7    246.0    143.2    209.6    128.0
2^-192  2   38,069  103  196  208.7  130.0  210.7    127.4    209.7    126.4    210.7    127.4
2^-192  3   27,179  117  124  209.0  130.1  369.8    207.7    235.5    138.8    210.7    128.4
2^-192  4   22,853  123  98   208.1  129.9  487.9    267.3    246.4    143.9    210.2    128.8

NIST Cat. 5 (required: C > 272.3, Q > 145.6)
DFR     n0  p       v    t    C-SDP  Q-SDP  C-CFP-1  Q-CFP-1  C-CFP-2  Q-CFP-2  C-CFP-3  Q-CFP-3
2^-64   2   43,397  137  263  273.9  163.7  276.9    161.4    275.9    160.4    276.9    161.4
2^-64   3   31,069  155  166  273.7  163.5  488.8    268.2    309.9    177.0    275.8    161.8
2^-64   4   26,251  163  132  274.2  164.0  645.4    347.7    325.0    184.2    275.0    162.2
2^-256  2   61,211  137  261  273.0  163.8  277.7    162.3    276.7    161.3    277.7    162.3
2^-256  3   43,499  153  165  273.2  163.9  483.0    265.8    306.6    175.8    273.1    160.9
2^-256  4   36,877  163  131  273.4  164.2  646.9    348.3    325.5    184.9    275.6    163.0
Table 4.4: Parameter sizes for the LEDAcrypt-KEM-CPA cryptosystem together with the related
attack complexities, tailored for the scenario where ephemeral private/public keypairs are desired.
Each parameter set is chosen targeting a DFR = 10^−9; experimentally, executing the encapsulation
and decapsulation procedures 10^9 times, for a given keypair, exhibits zero decryption/decoding
failures. The column labeled imaxOut shows the maximum number of times the out-of-place
iteration function is executed by the LEDAdecoder.
Attack complexities are reported as the base-2 logarithm of the estimated computational cost of the
best algorithm for solving the following problems on a classical and a quantum computer (denoted
with the prefixes C- and Q-, respectively).
SDP: syndrome decoding problem for key recovery;
CFP-1: codeword finding problem in the hidden LDPC/MDPC code;
CFP-2: codeword finding problem in the length-reduced hidden LDPC/MDPC code;
CFP-3: codeword finding in the dual of the hidden code.
All attack complexities are equal to or greater than the ones mandated by the NIST categories as
per Chapter 2, Table 2.1.

NIST Cat.  n0  p       v    t    imaxOut  C-SDP  Q-SDP  C-CFP-1  Q-CFP-1  C-CFP-2  Q-CFP-2  C-CFP-3  Q-CFP-3
1          2   10,853  71   133  6        144.8  94.7   146.6    92.4     145.6    91.4     146.6    92.4
1          3   8,237   79   84   5        144.8  94.7   250.0    144.8    159.9    98.1     144.4    92.3
1          4   7,187   83   67   4        145.3  95.3   328.2    184.8    167.1    101.3    144.2    92.9
3          2   20,981  103  198  6        208.8  129.0  209.4    125.8    208.4    124.8    209.4    125.8
3          3   15,331  117  125  5        208.7  128.9  369.1    206.4    234.6    137.4    209.6    126.9
3          4   13,109  123  99   4        208.2  128.9  485.1    266.2    245.8    142.7    209.2    127.4
5          2   35,117  137  263  4        273.3  163.0  276.5    160.9    275.5    159.9    276.5    160.9
5          3   25,579  155  166  4        273.1  162.8  488.7    267.8    309.7    176.5    275.4    161.4
5          4   21,611  163  132  4        273.6  163.3  644.9    347.3    324.9    183.7    274.7    161.7
Table 4.4 reports the code parameters obtained for LEDAcrypt-KEM-CPA, together with the work
factors of the four attacks considered, i.e., solving the SDP to recover a message and the three CFPs
for key recovery, with the fastest ISD technique available. We note that, in our complexity
estimation, we found that the quantum variant of Stern's algorithm as described by de Vries [22]
does not achieve an effective speedup when compared against a quantum transposition of Lee and
Brickell's ISD. Such a result can be ascribed to the fact that the advantage of Stern's ISD, which
stems from its reduction in the number of ISD iterations, is eroded once Grover's algorithm is
applied to execute the iterations: the quadratic reduction in the overall number of iterations to run
compresses the iteration-count gap between the two algorithms, while Stern's higher per-iteration
cost is unaffected. This result effectively matches the point made at the end of [13].
The parameters in the table are obtained as the output of the parameter design procedure described
by Algorithm 17, where the value of p is obtained through a simple heuristic. We subsequently
simulate the decoding of 10^9 syndromes obtained from randomly generated error vectors to estimate
the code DFR for each parameter set, and increase the value of p alone by 5% until no more decoding
errors are reported. During this procedure we needed to increase the value of p only for the category
1 parameters, raising it by 5% in the n0 = 2 case, and by 10% in the n0 = 3 and n0 = 4 cases.
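The empirical tuning loop just described can be sketched as follows; count_decoding_failures and next_suitable_prime are hypothetical helper names standing in, respectively, for our simulation harness and for the selection of the next value of p satisfying the required properties:

#include <stdint.h>

/* Hypothetical helpers: simulation harness counting failures over n_trials
 * random decodings, and selection of the next admissible prime p. */
uint64_t count_decoding_failures(uint64_t p, unsigned n0, unsigned v,
                                 unsigned t, uint64_t n_trials);
uint64_t next_suitable_prime(uint64_t candidate);

/* Sketch of the empirical DFR tuning described above: enlarge p by 5%
 * until 10^9 randomly drawn error vectors decode without failures. */
uint64_t tune_p(uint64_t p, unsigned n0, unsigned v, unsigned t) {
    while (count_decoding_failures(p, n0, v, t, 1000000000ULL) > 0)
        p = next_suitable_prime(p + p / 20); /* +5% increase on p */
    return p;
}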
The parameter sets for LEDAcrypt-KEM-CPA provide slightly smaller values of p than the ones
present in the round-2 submission of LEDAcrypt. Indeed, the size improvement of 5% to 25%,
depending on the category and code rate, is due to the greater correction power of the sparse code
when the Q matrix is taken to be an identity, as we do to prevent weak-key attacks.
Table 4.4 also reports the maximum observed number of iterations during decoding, although
we point out that most (> 50%) of the decoding processes terminate before the stated imaxOut.
Chapter 5
Optimized LEDAcrypt Implementation
In this chapter, we report the strategies employed to realize an efficient implementation of the
LEDAcrypt primitives, together with quantitative motivations for their choice. We employ, as an
experimental assessment platform, a desktop machine equipped with an Intel Core i5-6600, clocked
at 3.2 GHz, and endowed with 16 GiB of DDR-4, running Debian GNU/Linux 10.3 for amd64
architectures. The C11 source code is compiled with gcc 8.3.0, specifying the highest optimization
level (-O3). The number of clock cycles is measured via the rdtscp instruction, which provides
an integrated instruction barrier for the instructions in flight in the pipeline. All the timings
reported are the averages of 1,000 executions. All the measured implementations are optimized
employing the Intel-supplied compiler intrinsics for the AVX2 ISA extensions, available in the Intel
Haswell ISA, as per NIST indications. Our aim is to obtain a constant-time codebase for the IND-
CCA2 LEDAcrypt primitives, while the codebase for LEDAcrypt-KEM-CPA is optimized without
considering constant-time constraints, as its ephemeral key use scenario does not allow an attacker
to exploit timing leakage.
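A minimal sketch of such a measurement harness follows (illustrative, not the exact benchmarking code of the submission), assuming a gcc/clang toolchain on x86_64, where the __rdtscp intrinsic is exposed by <x86intrin.h>:

#include <stdint.h>
#include <x86intrin.h> /* __rdtscp intrinsic */

/* Average, over nruns executions, the clock cycles taken by a primitive;
 * rdtscp waits for the in-flight instructions to retire before reading
 * the time-stamp counter, acting as the barrier mentioned above. */
static uint64_t avg_cycles(void (*primitive)(void), uint64_t nruns) {
    unsigned int aux;
    uint64_t start = __rdtscp(&aux);
    for (uint64_t i = 0; i < nruns; i++)
        primitive();
    uint64_t end = __rdtscp(&aux);
    return (end - start) / nruns;
}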
In particular, in this chapter, we analyze:
• the strategies adopted for both constant- and variable-time binary polynomial multiplications,
where we select a Sparse-By-Dense multiplication strategy for variable-time multiplications,
and a Dense-By-Dense multiplication strategy for constant-time execution, as we measured
them to be the most efficient in our implementations;
• the strategy to implement the LEDAdecoder employing the bit-slicing technique, obtaining a
fully constant-time and highly parallel decoder;
• a set of four different modular polynomial inversion strategies, picking the most efficient one
in our scenario, i.e., the approach relying on Fermat's little theorem, specialized for the x^p + 1
modulus;
• the tradeoffs which can be achieved employing a value of p with a low Hamming weight,
speeding up the modular inverse computation at the cost of increased multiplication and
decoding times, in the case of LEDAcrypt-KEM-CPA.
The figures of merit to be considered in the performance evaluation of the LEDAcrypt systems,
with the QC-MDPC code configurations and parameters reported in Tables 4.1, 4.2 and 4.3 for the
IND-CCA2 LEDAcrypt-KEM and IND-CCA2 LEDAcrypt-PKC, as well as the ones in Table 4.4
for LEDAcrypt-KEM-CPA (proposed for use cases where ephemeral public/private keypairs are
employed), must take into account both the execution times of optimized implementations targeting
x86_64 platforms and the byte size of the data exchanged between sender and receiver. Assuming
that sender and receiver are equipped with two instances of the same computational platform, the
definition of the figures of merit must distinguish between the implementations of the IND-CCA2
systems, with their strict requirement of immunity to timing analyses (a.k.a. constant-time
implementations), and the ephemeral-key system, which may also run non-constant-time
implementations of its primitives.
Specifically, for the IND-CCA2 LEDAcrypt-KEM (resp., LEDAcrypt-PKC) system, the first figure
of merit is defined as the sum of the execution timings of the Encap (resp., Encrypt) and Decap
(resp., Decrypt) algorithms, both measured as the required number of machine clock cycles,
i.e.: T_KEM = T_Encap + T_Decap (resp., T_PKC = T_Encrypt + T_Decrypt). For LEDAcrypt-KEM-CPA,
the first figure of merit is defined as the sum of the execution timings of the KeyGeneration,
Encap and Decap algorithms, measured as the required number of machine clock cycles, i.e.:
T_KEM-CPA = T_KeyGen + T_Encap + T_Decap.
A second figure of merit, which also takes into account the computational overhead experienced
at both the receiver and the sender side to process the byte streams exchanged over the network,
is defined by adding to the total execution time (i.e., T_KEM, T_PKC or T_KEM-CPA, respectively)
the total number of transmitted bytes, N_bytes, multiplied by a normalization coefficient α
(measured in machine clock cycles per byte). In particular, for the IND-CCA2 LEDAcrypt-
KEM (resp., LEDAcrypt-PKC) system, the second figure of merit is defined as T_combined-KEM =
T_KEM + α · N_bytes (resp., T_combined-PKC = T_PKC + α · N_bytes), while for LEDAcrypt-KEM-CPA it is
defined as T_combined-KEM-CPA = T_KEM-CPA + α · N_bytes. An empirical value for the coefficient α, which
is usually accepted as well modeling the computational overhead introduced by the execution of
the routines in the software stack employed for communication, is α = 1000.
In every LEDAcrypt instance the foremost role is played by the decoding procedure designed in
Chapter 3 as Algorithm 12. Indeed, for the IND-CCA2 systems, the performance of the KeyGeneration
algorithm is a (relatively) limited concern, as it does not influence the aforementioned figures of
merit. As a consequence, the most performance-demanding algorithm w.r.t. the total execution
time T_KEM (resp., T_PKC) is the QC-MDPC decoder (see Algorithm 12).
In the case of the LEDAcrypt-KEM-CPA system, the parameter design took advantage of the
requirement of only a practically usable DFR (i.e., DFR = 10^−9), which in turn can be determined
via numerical simulations rather than via closed-form formulae. The decoder implemented for the
LEDAcrypt-KEM-CPA system is a variable-iteration-number decoder, where the maximum number
of iterations was capped experimentally to the highest one observed in the numerical simulations.
Figure 5.1: Performance of each LEDAdecoder variant as a function of the error vector sizes
(n0 · p) corresponding to the code configurations reported in Tables 4.1, 4.2, 4.3 in Section 4.1
(for each NIST category and n0 ∈ {2, 3, 4}). Panel (a) reports T_decoder, in clock cycles; panel (b)
reports T_decoder + α · N_syndrome-bytes, with α = 1000.
The LEDAcrypt-KEM-CPA decoder employs only out-of-place iterations (see Section 5.3) to min-
imize the decoder latency, and exploits the syndrome-weight-dependent threshold determination
rule described in Section 3.5. Such a threshold selection criterion is experimentally determined to
attain a higher correction capability w.r.t. a fixed threshold choice depending only on the number
of computed iterations. This, in turn, allows a further reduction of the key size, improving the
performance figures of merit. An implementation analysis that takes into account the aforementioned
figures of merit to perform a further tuning of the LEDAcrypt-KEM-CPA parameters is presented in
Section 5.5.
In the following we analyze the execution timings of the three LEDAdecoder variants designed
and described in Section 4.1 for the IND-CCA2 instances of LEDAcrypt, as well as the figures of
merit obtained by adding to the execution timings the number of bytes encoding the syndrome
multiplied by a normalization coefficient α, with α = 1000. Such a choice better highlights the
performance differences among the three variants of the LEDAdecoder, also accounting for the
impact that the parameter sizes derived from the DFR analyses presented in the previous chapters
have on the size of the transmitted syndrome. It is worth noting that all three variants of the
LEDAdecoder provide strong IND-CCA2 security guarantees, featuring closed-form formulae to
assess the DFR provided by the execution of two out-of-place iteration functions, of one in-place
iteration function followed by one out-of-place iteration function, or of two in-place iteration
functions.
It is useful to note that the in-place decoding strategy does not allow parallelizing the computation
of its intermediate values, namely, the unsatisfied parity-check counts, as the computation of each
one of them depends on the result of the computation of the previous one, in the decoder processing
order.
Figure 5.1(a) compares the execution times of each decoding strategy as a function of the
values of the circulant-block size p of the underlying code, while Figure 5.1(b) also takes into account
the byte size of the syndrome. Conservatively, Figure 5.1 shows the performance of the three
LEDAdecoder variants employing a constant-time implementation of the out-of-place iteration
function, and a non-constant-time (and thus faster) implementation of the in-place iteration function.
As is clear from Figure 5.1, the most efficient LEDAdecoder strategy executes the out-of-place
iteration function two times. Indeed, despite the computational advantage of a non-constant-time
implementation, the strategies employing in-place iterations are significantly slower than the
purely out-of-place one. It is interesting to note, though, that the reduced amount of data they
process (as a result of the smaller values of p, thanks to the higher correction power of an in-place
iteration) effectively reduces the gap in the combined execution time-transmitted data metric.
On the basis of these evaluations, we implement the IND-CCA2 LEDAcrypt-KEM and the
IND-CCA2 LEDAcrypt-PKC systems keeping the code configurations and parameters shown in
Table 4.1 in Section 4.1 and a decoder executing two out-of-place iteration functions.
Algorithm 18: gf2x modular monomial mul CT
Input: g(x): a dense p-bit binary polynomial; bi: the exponent of the monomial x^bi
Output: c(x) = x^bi · g(x) mod (x^p − 1)
Data: m = ⌈log2(p − 1)⌉: number of bits needed to encode an exponent value bi in binary, i.e.,
bi = (β_{m−1}, β_{m−2}, . . . , β_1, β_0)_bin.
The algorithm always executes the same sequence of operations regardless of the value of bi, with a
computational cost of O(p · (log2(p) − log2(w) + 1)) bit operations.
1  begin
2    c(x) ← g(x) ≪p (bi mod w)  // (intra-word) circular left shift of j < w bits (constant-time)
3    for j ← log2(w) to m − 1 do
4      tmp(x) ← c(x) ≪p 2^j  // (inter-word) circular left shift of a public amount 2^j
5      cond ← (β_j == 1)  // cond equals 1 in case the equality test is true, 0 otherwise
6      IF-Constant Time(c(x), cond, tmp(x), c(x))  // if (cond == true) then c(x) ← tmp(x)
7    return c(x)
8  end
9  IF-Constant Time(Z(x), BoolVal, U(x), V(x))
   // output Z(x) = z_{l−1}(x^w)^{l−1} + . . . + z_1(x^w)^1 + z_0, with z_k a w-bit machine word, l = ⌈p/w⌉
   // input U(x) = u_{l−1}(x^w)^{l−1} + . . . + u_1(x^w)^1 + u_0, with u_k a w-bit machine word, l = ⌈p/w⌉
   // input V(x) = v_{l−1}(x^w)^{l−1} + . . . + v_1(x^w)^1 + v_0, with v_k a w-bit machine word, l = ⌈p/w⌉
10   mask ← 0 − (w-bit uint)(BoolVal)  // mask is an all-0s or all-1s w-bit value
11   for j ← 0 to l − 1 do
12     z_j ← (u_j and mask) or (v_j and (not mask))  // and, or, not operations are computed bitwise
13   return Z(x)
bi ∈ {0, 1, . . . , p − 1}, 0 ≤ i < s, and the latter encoded with p bits. From a high-level per-
spective, the algorithm computes the result d(x) ≡ f(x) · g(x) ∈ Z2[x]/(x^p − 1) by initializing it to
the null polynomial and then executing s iterations of a loop where, in each iteration, d(x) is updated
as: d(x) ← d(x) + (x^bi · g(x) mod (x^p − 1)), 0 ≤ i < s. We note that, although the number of loop
iterations coincides with the number of asserted bits in f(x), this is not secret information for
the use of the Sparse-By-Dense Multiplication algorithm in LEDAcrypt. Indeed, the value of s
will be (approximately) equal to either a fixed fraction of the row/column weight v of the secret
p × pn0 parity-check matrix (i.e., s = v/n0, with v ≈ √(p · n0), so that s ≈ √(p/n0)), which is
a publicly known parameter, or the number t of asserted bits in the error vector, which is also a
publicly known parameter of the cryptoscheme.
Concerning the operations performed in the body of the loop executed by the Sparse-By-Dense
Multiplication algorithm, the modular multiplication of a monomial by a dense p-bit polyno-
mial, i.e., c(x) ← x^bi · g(x) mod (x^p − 1), must be computed by a carefully designed subroutine
to guarantee that no information on the exponent bi of the monomial leaks from the timing side
channel. The subsequent modular addition with the partial value of d(x) is trivially implemented
in constant time by performing a bitwise xor between the operands.
Algorithm 18 shows the pseudo-code of the constant-time gf2x modular monomial mul CT al-
gorithm, introduced for the first time in [18]. The exponent value of the input monomial (i.e., the
position of an asserted bit in f(x)) is considered in binary encoding, i.e., bi = (β_{m−1}, β_{m−2}, . . . , β_1, β_0)_bin,
with m = ⌈log2(p − 1)⌉.
Table 5.1: Computational complexities of the polynomial multiplication algorithms.

Algorithm               Implementation   Computational Complexity (bit ops.)
Dense-By-Dense Mul.     constant-time    p^(1+ε), with 0 < ε < 1
Sparse-By-Dense Mul.    constant-time    s·(8·p·(m − log2(w)) + 3·p), with s ≈ √(p/n0), n0 ∈ {2, 3, 4}, m = ⌈log2(p − 1)⌉
Sparse-By-Dense Mul.    variable-time    s·(4·p), with s ≈ √(p/n0), n0 ∈ {2, 3, 4}
To this end, the computation c(x) ← x^bi · g(x) mod (x^p − 1) is realized as the composition of
conditional rotations:

c(x) ← ( ∏_{j=0}^{m−1} (x^{2^j})^{β_j} ) · g(x) mod (x^p − 1).
The computation of the m circular bit shifts (i.e., the rotations by x^{2^j}, denoted as ≪p 2^j in
Algorithm 18, line 4) guarantees no information leakage from the timing side channel, provided
that the conditional update of the partial value of the returned polynomial c(x) with the result of
each of the said circular shifts is also performed in a constant-time fashion, as shown in
Algorithm 18, lines 5–6 and lines 9–12. It is worth noting that the first log2(w) iterations of the loop
at lines 3–6 of Algorithm 18 are effectively saved in our implementation by performing the intra-word
circular bit shift (under the assumption that a single-word bit shift is performed in constant time),
mandated by the log2(w) least significant bits of bi in (β_{m−1}, β_{m−2}, . . . , β_1, β_0)_bin, with a single
scan of the input operand g(x) before the beginning of the loop iterations, which in turn can be
started from the log2(w)-th bit of the binary encoding of bi.
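As a concrete reference for the constant-time selection primitive at lines 9–12 of Algorithm 18, the following C sketch (with illustrative names) computes z ← cond ? u : v over l machine words without any secret-dependent branch:

#include <stddef.h>
#include <stdint.h>

/* Constant-time word-wise select: mask is all-ones iff cond == 1, so each
 * output word picks u[j] when cond == 1 and v[j] otherwise. */
static void if_constant_time(uint64_t *z, uint64_t cond,
                             const uint64_t *u, const uint64_t *v, size_t l) {
    uint64_t mask = (uint64_t)0 - cond;
    for (size_t j = 0; j < l; j++)
        z[j] = (u[j] & mask) | (v[j] & ~mask);
}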
The choice of the most fitting multiplication strategy for each LEDAcrypt cryptosystem, i.e.,
LEDAcrypt-KEM-CPA, LEDAcrypt-KEM and LEDAcrypt-PKC, must take into account the re-
quirement for constant-time execution, mandated for the IND-CCA2 systems LEDAcrypt-KEM
and LEDAcrypt-PKC, and the performance figures exhibited by the Dense-By-Dense Multi-
plication and Sparse-By-Dense Multiplication algorithms. Considering the computational
complexities of the said algorithms, also shown in Table 5.1, the specific values of the LEDAcrypt
parameters, and the fact that different computational platforms feature ISAs with specific oppor-
tunities for optimization, the selection of the most fitting algorithm/implementation calls for an
exhaustive design space exploration.
It is worth noting that, in the implementation of any LEDAcrypt instance, one of the two factors of
every multiplication is always managed with a sparse representation, for global efficiency reasons.
Therefore, to use a Dense-By-Dense Multiplication strategy to realize the multiplications,
an additional constant-time routine, named Densify Polynomial, must be implemented to
convert the sparse representation of a polynomial into a dense one. Specifically, the constant-time
implementation of the Densify Polynomial routine takes as input a set of s integer values, repre-
senting the positions of the asserted bits among the coefficients of a p-bit polynomial, and computes
the i-th coefficient of the output polynomial, for all 0 ≤ i < p, by scanning all the input values and
adding to the zero-initialized value of the i-th coefficient, via an inclusive or, either a set bit or a
zero bit, depending on whether i equals the input value at hand. The inclusive or operation
is performed by means of an If-Constant Time subroutine following a logic similar to the one
applied in Algorithm 18. The computational complexity of the described procedure amounts to
O(s·p) bit operations, and is further reduced by exploiting the possibility to update the coefficients
of the output polynomial in sequential chunks, each of which includes a number of bits/coefficients
equal to the machine word size w, obtaining a running time proportional to O(s·p/w).
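A possible shape of the chunk-wise O(s·p/w) conversion is the following branchless C sketch (names and the w = 64 choice are illustrative; as in Section 5.2, single-word shifts by a secret amount are assumed to execute in constant time on the target):

#include <stddef.h>
#include <stdint.h>

/* Sparse-to-dense conversion: for each 64-bit chunk of the output, scan all
 * s asserted-bit positions and or-in the corresponding bit; the comparison
 * result is turned into an all-ones/all-zeroes mask, so that no branch
 * depends on the (secret) positions. */
static void densify_polynomial(uint64_t *dense, size_t n_words,
                               const uint32_t *pos, size_t s) {
    for (size_t word = 0; word < n_words; word++) {
        uint64_t acc = 0;
        for (size_t k = 0; k < s; k++) {
            uint64_t in_chunk = (uint64_t)((pos[k] >> 6) == word);
            uint64_t mask = (uint64_t)0 - in_chunk;
            acc |= mask & ((uint64_t)1 << (pos[k] & 63));
        }
        dense[word] = acc;
    }
}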
Figure 5.2: Execution times of the polynomial multiplication algorithms plotted against the bit-
lengths (i.e., p values) of a circulant block of the QC-MDPC codes designed for both the IND-CCA2
LEDAcrypt-KEM instance and the LEDAcrypt-KEM-CPA instance. Plus marks (+) refer to prime
sizes employed in LEDAcrypt-KEM-CPA; cross marks (×) refer to prime sizes employed in the IND-
CCA2 LEDAcrypt-KEM with DFR = 2^−64; circular marks (◦) refer to prime sizes employed in the
IND-CCA2 LEDAcrypt-KEM with DFR = 2^−λ, with λ ∈ {128, 192, 256} being the security-equivalent
AES key length.
Figure 5.2 depicts the running time of the described algorithms on our benchmark platform, en-
dowed with the pclmulqdq carry-less multiplication instruction listed in the AVX2 ISA extension
and a machine word size w = 64. It shows, on the same plot, for each multiplication strategy,
the total number of machine cycles needed to execute it with the polynomial sizes and
Hamming weights employed in LEDAcrypt-KEM-CPA and in the IND-CCA2 LEDAcrypt-KEM,
when the cryptosystem parameters guarantee either a DFR = 2^−64 or a DFR = 2^−λ, respectively,
with λ being the security-equivalent AES key length, ranging in {128, 192, 256}. The timings for
the multiplications involved in the IND-CCA2 LEDAcrypt-PKC match the ones of the IND-CCA2
LEDAcrypt-KEM, as they share the values of p.
In Figure 5.2 the total numbers of machine cycles are shown as a function of the bit-length of
the polynomials, corresponding to the bit-length of the circulant-block side of the QC-MDPC
parity-check matrix adopted in each LEDAcrypt instance. Finally, the timings of the Dense-
By-Dense Multiplication in the figure also include the machine cycles required by the Den-
sify Polynomial routine to convert the sparse factor into a dense one.
Counter-intuitively, Figure 5.2 makes clear that the best choice of multiplication strategy, in terms
of execution times, for all LEDAcrypt instances requiring constant-time execution and for all
possible rates of the underlying QC-MDPC code (i.e., n0 ∈ {2, 3, 4}), is to pick the proposed
Dense-By-Dense Multiplication algorithm, even if it mandates the execution of the Densify
Polynomial routine before every operation. Contrariwise, for all scenarios where LEDAcrypt-
KEM-CPA is involved, and therefore no constant-time execution requirement applies, the variable-
time Sparse-By-Dense Multiplication is the most convenient approach.
Algorithm 19: Optimized constant-time out-of-place iteration function of the LEDAdecoder
1  th ← ComputeThreshold(iterOut, s)
2  for i = 0 to n0 − 1 do
3    bs upci ← Replicate and bitslice(−th)
     /* bs upci is initialized as a (⌈log2(v)⌉ + 1) × p bit-matrix having all columns equal
        to the two's complement encoding of the integer −th; the lsb is stored in row 0 */
4    for k = 0 to v − 1 do
5      s0 ← gf2x modular monomial mul CT(H[i][k], s)
6      bs upci ← BitSliced increment(bs upci, s0)
       /* s0 is processed as a (⌈log2(v)⌉ + 1) × p zero bit-matrix, except for the first
          row, which stores each syndrome bit as the lsb of a column; bs upci is updated
          performing an addition between each pair of corresponding columns of bs upci
          and of the s0 bit-matrix; the addition is vectorized to act on the bits stored
          in an entire row of bs upci at once */
7  for i = 0 to n0 − 1 do
8    ẽi ← BitWise NOT(Extract Row with Msb(bs upci))
     /* if the j-th msb (0 ≤ j < p) is clear, the unsatisfied parity-check count
        for ei[j] was greater than or equal to th */
9    ēi ← ẽi ⊕ êi  // update of the input p-bit êi to get the output ēi
10   s̄i ← si ⊕ Dense-By-Dense Multiplication(ẽi, Hi^T)  // s̄i = si ⊕ ẽi · Hi^T
11 return {ē, s̄}
of each of them; the polynomial representation of each block of the transposed parity-check matrix
H^T is also assumed to be available for efficiency reasons. The decoding strategy starts from
an (n0p)-bit error vector estimate ê = (ê0, ê1, . . . , ê_{n0−1}), where êi, 0 ≤ i < n0, is a p-bit polynomial,
and computes, for each bit position 0 ≤ l < pn0 of the error vector, the number of parity-check
equations (êH^T = HêT = s) having both an asserted constant term (i.e., sk = 1, 0 ≤ k < p) and a
term in correspondence with the l-th bit of the error vector. Subsequently, each of the said numbers
of unsatisfied parity-check equations, upcs for short, is compared with a pre-determined threshold
th ∈ {⌈v/2⌉, . . . , v} to determine an auxiliary (n0p)-bit error vector ẽ having an asserted bit when
the upc value is greater than or equal to th and a clear bit otherwise. The auxiliary error vector ẽ
is in turn employed to update the input error vector estimate ê to get the output error vector
estimate ē as ē = ẽ ⊕ ê. Finally, the value of the syndrome is also updated to take into account the
updates on the estimated error vector, and returned as an output s̄, by computing s̄ = s ⊕ ẽH^T (or
equivalently, as shown in Algorithm 14, by adding to the input syndrome all the columns of H
corresponding to upc values greater than or equal to th).
The optimized out-of-place iteration function of the LEDAdecoder shown in Algorithm 19 replaces
the aforementioned array of n0p upc counters (employed to compute the number of unsatisfied
parity-check equations including as a term the l-th bit, 0 ≤ l < n0p, of the error vector and an
asserted syndrome bit as constant term) with a sequence of n0 bit-matrices bs upci, 0 ≤ i < n0,
with p columns, each of which is in one-to-one correspondence with the analogous bit position in
the i-th p-bit portion of the initial error vector estimate êi. The number of rows in the bit-matrix
bs upci equals the number of binary digits needed to encode in two's complement the signed integer
value of each upc counter, i.e., ⌈log2(v)⌉ + 1. The actual implementation of Algorithm 19 assumes,
conventionally, the first row of each bs upci, 0 ≤ i < n0 (i.e., bs upci[0][:]), to store the least
significant bits (lsb) of the upc values.
The computation of the upc values is performed by the steps shown in Algorithm 19, lines 2–6,
considering them as a sequence of n0 groups of p consecutive upcs, where the i-th group is
represented by the p columns of the bit-matrix bs upci. For each bs upci (line 2), the first step
of the algorithm (line 3) initializes each of its columns with the two's complement encoding of
−th, i.e., the opposite of the integer value kept as the threshold to establish whether a given upc
counter should determine a bit-flip in the corresponding position of the estimated error vector. The
idea is to establish whether a bs upci column (counter) is greater than or equal to 0 by looking at the
elements in the last row of the bit-matrix, and to proceed by updating the estimated error vector
and the syndrome value (lines 7–10) accordingly.
The actual computation of the upc values stored in the columns of bs upci proceeds by observing
that, for each position a of an asserted coefficient of the i-th block Hi of the parity-check matrix
(a = H[i][k], 0 ≤ k < v), the constant terms of the parity-check equations including the said
coefficient as a term can be obtained by rotating the p-bit syndrome polynomial as s0 ← x^a · s
mod (x^p − 1). This monomial-by-polynomial multiplication in Z2[x]/(x^p − 1) must be performed
in constant time to prevent the leakage of information related to the secret parity-check matrix
from the timing side channel. To this end, the gf2x modular monomial mul CT routine presented
in Section 5.2 is employed (see Algorithm 19, line 5). As a consequence of the computation of
the rotated syndrome s0, the update of the upc values stored in the j-th column, 0 ≤ j < p, of
bs upci can proceed by incrementing it by either one or zero, depending on the value of the j-th
bit in s0. To perform such an increment, s0 is processed as a (⌈log2(v)⌉ + 1) × p zero bit-matrix,
except for the first row, which stores each bit of s0 as the lsb of the zero or one two's complement
value represented in a column. The update of the j-th column of bs upci is performed by adding
to it the j-th column of the bit-matrix representing s0. The implementation of the BitSliced
increment routine shown in Algorithm 19, line 6, considers the Boolean formulae realizing the
increment operation and parallelizes its computation over multiple upc values by applying the
same Boolean operation simultaneously on all the bits in the same row of the matrices, exploiting
the vector instructions of the computational platform at hand.
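A compact C sketch of this row-wise half-adder chain follows (names are illustrative; rows are stored lsb-first, as described above, each row packed into n_words machine words):

#include <stddef.h>
#include <stdint.h>

/* Bit-sliced increment: add the 0/1 bits of the rotated syndrome s0 to all
 * p counters at once; a half-adder per row ripples the carries upwards,
 * and each XOR/AND acts on a whole row word, i.e., on 64 counters. */
static void bitsliced_increment(uint64_t *bs_upc /* n_rows x n_words */,
                                size_t n_rows, size_t n_words,
                                const uint64_t *s0) {
    for (size_t w = 0; w < n_words; w++) {
        uint64_t carry = s0[w]; /* one increment bit per counter */
        for (size_t r = 0; r < n_rows; r++) {
            uint64_t cur = bs_upc[r * n_words + w];
            bs_upc[r * n_words + w] = cur ^ carry; /* half-adder sum   */
            carry = cur & carry;                   /* half-adder carry */
        }
    }
}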
Once the upc values have been computed as columns of bs upci, 0 ≤ i < n0, the final steps of Algo-
rithm 19 (lines 7–10) extract the last row of the i-th bit-matrix, bs upci[⌈log2(v)⌉][:], and compute
the i-th portion of the estimated error vector corrections ẽi as the bitwise NOT of bs upci[⌈log2(v)⌉][:].
Subsequently, the i-th portion of the input estimated error vector êi is xored with the i-th portion
of the error vector corrections ẽi, to obtain the corresponding portion of the output error vector
estimate ēi (lines 8–9). Finally, the output syndrome is evaluated by subtracting (via bitwise xor)
from the input syndrome the product between the estimated error vector corrections ẽ and the
transposed parity-check matrix (line 10).
A final remark concerns the memory access patterns during the computation of line 6: a memory
representation of bs upci providing greater cache-friendliness differs from the canonical row-wise
linearization of the bs upci matrix, as operated by the C language. Indeed, we found out by
experimental assessment that splitting the bs upci matrix into blocks of columns as wide as the
vector instructions offered by the underlying platform (i.e., 256 bits for the AVX2 instructions in
Intel Haswell CPUs and successors), and linearizing each block by row, yielded a speedup of around
2×. We ascribe this speedup to the fact that, when the BitSliced increment operation is computed
on a portion of 256 columns of the bs upci matrix, the said portion resides in a small set of
contiguous cache lines, therefore minimizing the performance impact of the store operations.
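For reference, the index arithmetic of this block-wise layout can be sketched as follows (purely illustrative; 256-bit blocks stored as four 64-bit words per row):

#include <stddef.h>

/* Map (row, col) of the logical bit-matrix onto the word index of the
 * cache-friendlier layout: columns are grouped into 256-bit blocks, and
 * each block is linearized row by row; the bit within the returned word
 * is col % 64. */
static size_t bs_word_index(size_t row, size_t col, size_t n_rows) {
    size_t block = col / 256;        /* which 256-column block       */
    size_t lane  = (col % 256) / 64; /* which 64-bit word in the row */
    return block * (n_rows * 4) + row * 4 + lane;
}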
Euclid's algorithm, computing the greatest common divisor (gcd) of two polynomials, is commonly
employed to derive a polynomial-time algorithm yielding the multiplicative inverse of an element of
the multiplicative group of a Galois field represented in polynomial basis, e.g., the multiplicative
group of GF(2^m), m ≥ 2, with GF(2^m) ≅ Z2[x]/(f(x)), where f(x) is an irreducible polynomial
with deg(f(x)) = m.
A schoolbook analysis considers a polynomial a(x) and the irreducible polynomial f(x) employed
to represent the field elements (deg(a(x)) < deg(f(x))), and iteratively applies, for a proper number
of iterations z ≥ 2, the equality gcd(f(x), a(x)) = gcd(a(x), f(x) mod a(x)), deriving the
computation path leading to the non-null constant term representing the greatest common divisor d
(e.g., d = 1, when Z2[x] is considered). A close look at the previous derivation shows that two
series of auxiliary polynomials wi(x) and ui(x), 0 ≤ i ≤ z − 1, can be defined to derive the series of
remainders ri(x) as:

ri(x) = ri−2(x) − qi(x) · ri−1(x) = wi(x) · f(x) + ui(x) · a(x),   qi(x) = ⌊ri−2(x)/ri−1(x)⌋,

with r0(x) = f(x) and r1(x) = a(x). Specifically, the computations shown above can be rewritten
as follows:

r0(x) = f(x) = 1 · f(x) + 0 · a(x)
r1(x) = a(x) = 0 · f(x) + 1 · a(x)
r2(x) = r0(x) − q2(x) · r1(x) = w2(x) · f(x) + u2(x) · a(x)
. . .
rz−1(x) = rz−3(x) − qz−1(x) · rz−2(x) = wz−1(x) · f(x) + uz−1(x) · a(x) = d
Finally, the multiplicative inverse of a(x) is obtained from the equality d = wz−1(x) · f(x) +
uz−1(x) · a(x), by computing:

a(x)^−1 ≡ d^−1 · uz−1(x) mod f(x).

In the last derivation of the remainder polynomials, it is worth noting that wi(x) = wi−2(x) − qi(x) ·
wi−1(x) and ui(x) = ui−2(x) − qi(x) · ui−1(x), with w0(x) = 1, u0(x) = 0 and w1(x) = 0, u1(x) = 1.
Restricting ourselves to the case of binary polynomials, Algorithm 20 shows the pseudo-code for
computing an inverse employing only the strictly needed operations. Note that the execution of
an actual division would prevent an efficient implementation of the algorithm; therefore, in the
following section we consider optimizations of this algorithm that circumvent its computation.
The naming conventions adopted in Algorithm 20 denote as R(x) the i-th remainder of the gcd
procedure described before, and as S(x) the (i − 1)-th remainder, with i ≥ 1. Furthermore, the i-th
u(x) value is denoted as U(x), while the value it took at the previous iteration (i.e., the (i − 1)-th
u(x) value) is denoted as V(x), with i ≥ 1.
Note that in the last iteration (the z-th one, counting the first as 1) R(x) = rz(x) = 0, while
S(x) = rz−1(x) = 1.
The rationale for keeping the notation of the pseudo-code variables as polynomials in x lies in the
intent of not specifying the implementation-dependent choice for the endianness of the values
stored in array variables, i.e., whether the least significant coefficient of a polynomial is stored in
the first or last cell of an array.
In the case of LEDAcrypt, where the arithmetic operations are performed in Z2[x]/(x^p − 1), with
ord2(p) = p − 1, it is worth noting that Euclid's algorithm can still be applied to compute the
inverse of the invertible elements. Indeed, the factorization of f(x) = x^p − 1 into irreducible
factors, i.e., f(x) = (x + 1) · (∑_{i=0}^{p−1} x^i), shows that every binary polynomial a(x) with degree
less than p and an odd number of asserted coefficients, which is also different from the said factors
(i.e., any polynomial representing a circulant block in LEDAcrypt), always admits a greatest
common divisor with f(x) equal to one (i.e., gcd(a(x), f(x)) = 1). As a consequence, the steps of
Euclid's algorithm to compute inverses remain the same shown before.
In [17], Brunner et al. optimized the computation performed by Algorithm 20, embedding into
it the evaluation of the quotient and remainder of the polynomial division ⌊S(x)/R(x)⌋. Indeed,
Algorithm 21 performs the division operation by repeated shifts and subtractions, while keeping
track of the difference δ between the degrees of the polynomials in S(x) (which is the dividend, and
starts by containing the highest-degree polynomial, that is, the modulus f(x)) and R(x) (which
is the divisor, and is initialized to the polynomial whose inverse should be computed, a(x)). The
difference between the degrees, δ = deg(S) − deg(R), is updated each time the degree of either S(x)
or R(x) is altered via shifting.
Brunner et al. in [17] observed that, since either a single bit-shift or a subtraction is performed
at each iteration, the number of iterations equals 2m, that is, the number m of single bit-shifts
needed to consider every bit of the element to be inverted, plus the number of subtractions to be
performed after each alignment. The last iteration of Algorithm 21 computes R(x) = 0, S(x) = 1
(i.e., deg(R) = deg(S) = 0), and outputs the result in V(x). The part of Algorithm 21 dealing with
the computations on U(x) and V(x) mimics the same operations, in the same order, that are applied
to R(x) and S(x), respectively, as the schoolbook algorithm (Alg. 20) does.
Kobayashi-Takagi-Takagi Algorithm
Algorithm 21 targets a dedicated hardware implementation; therefore, it simplifies the computations
to be performed, reducing them to single bit-shifts, bitwise xors and single-bit comparisons. While
this approach effectively shortens the combinatorial cones of a hardware circuit, yielding advantages
in timing, when directly transposed into a software implementation it may fail to exploit to the
utmost the wide computation units present in a CPU. In particular, modern desktop and high-end
mobile CPUs are endowed with a so-called carry-less multiplier computation unit. Such a unit
computes the product of two binary polynomials with degree lower than the architecture word size,
w, with a latency which is markedly smaller than computing the same operation with repeated
shifts and xors (3–7 cycles on Intel CPUs [20]). To maximize the exploitation of the available
carry-less multipliers, Kobayashi et al. [46] further optimized the gcd-based Algorithm 21. The
main assumption in this sense is that a w-bit input carry-less multiplier is available on the target
CPU architecture. Algorithm 22 reports the result of the optimization proposed in [46], which still
takes as input an irreducible polynomial of GF(2^m), employed to build a representation for all
its elements, and one of its invertible elements. However, these polynomials, and the ones computed
as intermediate values of the algorithm, are now thought of as polynomials of degree at most M − 1,
with M = ⌈(m + 1)/w⌉, having as coefficients binary polynomials with degree w − 1. For example,
the polynomial S(x) = s_m · x^m + s_{m−1} · x^{m−1} + · · · + s_1 · x + s_0, with s_j ∈ {0, 1} and
0 ≤ j ≤ m, is processed by considering it as S(x) = S_{M−1}(x) · (x^w)^{M−1} + · · · + S_1(x) · (x^w)^1 + S_0(x),
with S_i(x) = s_{iw+w−1} · x^{w−1} + · · · + s_{iw+1} · x + s_{iw}, where s_{iw+l} = 0 if iw + l > m,
with 0 ≤ l ≤ w − 1 and 0 ≤ i ≤ M − 1.
Employing this approach, the loop at lines 5–22 of Algorithm 21 is rewritten to perform fewer itera-
tions, each of them corresponding to the computation made in w iterations of Algorithm 21. The
crucial observation is that the computations performed by Algorithm 21 on U(x), V(x), R(x) and
S(x) at each iteration can be represented as a linear transformation of the vectors (U(x), V(x))^T
and (R(x), S(x))^T. Indeed, the authors represent the computation portions of Algorithm 21, driven
by the values of S(x) and R(x), present at lines 6–7 as

(U(x), V(x))^T ← [x, 0; 0, 1] · (U(x), V(x))^T,   (R(x), S(x))^T ← [x, 0; 0, 1] · (R(x), S(x))^T,

and rewrite similarly also lines 11–12, line 14, lines 16–18 and line 21.
With this representation of the computation, Algorithm 22 selects a portion of R(x) and S(x)
as large as an architectural word (line 5 of Algorithm 22) and makes a copy of them (variables
C(x) and D(x)). Employing these copies, Algorithm 22 computes a single linear transformation,
H, cumulating the effects of w iterations of the main loop of Algorithm 21, to apply them
all at once (line 28, Algorithm 22). While this approach appears more expensive, as polynomial
multiplications are employed to compute the values of U(x), V(x), R(x) and S(x) at the end of each
iteration of the loop at lines 5–33, we recall that such multiplications can be carried out at the cost
of a handful of xors by the carry-less multipliers present on modern CPUs, effectively resulting in
a favourable tradeoff for the strategy described by Algorithm 22. To provide a further speedup,
Algorithm 22 has a shortcut in case entire word-sized portions of R(x) (which is initialized with the
input polynomial to be inverted) are filled with zeroes, a fact which can be efficiently checked in
one or a few CPU operations in software. Indeed, in that case, the resulting transformation to be
applied is given by H = [x^w, 0; 0, 1]; therefore, the authors of Algorithm 22 insert a shortcut (lines
6–9) to skip the expensive execution of the loop body computing a trivial transformation, simply
applying the aforementioned H, which boils down to two word-sized operand shifts (line 7). We note
that this efficiency improvement trick, while effectively making the inversion faster, also provides
an information leakage via the timing side channel. Indeed, due to this shortcut, the inversion
algorithm runs faster when it is consuming words of the polynomial to be inverted which
contain only zeroes. As a result, some information on the length of the zero runs of the operand is
encoded in the timing side channel. We note that, in many code-based cryptoschemes, the positions
of the asserted coefficients of the polynomial to be inverted represent the private key of the scheme.
As a final step, Algorithm 22 returns either the value of U(x) or the value of V(x), detecting which
is the variable containing the actual pseudo-inverse (i.e., a(x)^−1 · x^{wM}), and performing an exact
division by x^{wM} (i.e., a shift by M words).
Bernstein-Yang Algorithm
The last approach to polynomial inversion via improved Euclid algorithms is the one recently
proposed in [14]. Bernstein and Yang provide a comprehensive study [14] of modular inversion and
greatest common divisor (gcd) computation both for integer and polynomial Euclidean domains.
In this document, we specialize the divide-et-impera strategy devised for the computation of gcds
and modular inverses for polynomial rings having coefficients over a generic GF(p) into one which
only performs the computation of modular inverses of binary polynomials. We report the said
specialized algorithm as Algorithm 23. A full proof of the correctness of the algorithmic approach
is provided in [14], and we omit it from the current document for space reasons. However,
to provide an intuition of the inner workings of Algorithm 23, we note that it is possible to obtain
Algorithm 22 as a special case of it, highlighting the conceptual similarities. The key observation
made in [14] is that it is possible to split the operands recursively up to a point where they
become as small as the designer deems useful (a machine word, in our case) and then proceed to
compute a portion of the linear transform over GF(2^m) which operates on the inputs to provide
the modular inverse. Once such a transform is computed for the operand fragments, the transform
can be recombined by means of a multiplication of matrices over GF(2^m), yielding the modular
inverse. Algorithm 23 performs the operand splitting phase in the jumpdivstep function (defined
at lines 17–25), taking two polynomials f(x), g(x) of maximum degree n and the difference δ between
their degrees, and picking a splitting point j (line 20). The splitting point can be chosen as any
non-null portion of the maximum degree n, although splitting the operands in half is likely to be
optimal; indeed, splitting the operands in half was verified to be the optimal choice in our
implementations. Two subsequent recursive calls to the jumpdivstep function are made, splitting
the input polynomials at their j-th degree term (lines 21 and 24), until the maximum degree n is
equal to or smaller than the machine word size (w in the algorithm). When this happens (branch
taken at line 18 of Algorithm 23), the function handling the base case of the recursion, divstep, is
invoked. divstep can be seen as a clever reformulation of n iterations of the loop body of
Algorithm 21, with n being the parameter taken as input by divstep. The computation of the loop
body is decomposed into a conditional swap (lines 5–9), depending only on the difference between
the operand degrees δ being positive and the constant term of g(x) being equal to 1, and a sequence
of steps (lines 10–13) performing the correct operation between the two portions of the source
operands f(x), g(x) and the auxiliary values.
Algorithm 23: Bernstein and Yang (BY) Inverse [14] specialized for GF(2^m)
Input: f(x): irreducible polynomial of GF(2^m), m ≥ 2; a(x): invertible element of GF(2^m)
Output: V(x), such that a(x)^−1 ≡ V(x) ∈ GF(2^m)
Data: w: machine word size
// Base case, solved iteratively on n ≤ w
1  function divstep(n, δ, f(x), g(x)):
2    U(x) ← x^{n−1}; V(x) ← 0
3    Q(x) ← 0; R(x) ← x^{n−1}
4    for i ← 0 to n − 1 do
5      if (δ > 0 ∧ g0 = 1) then
6        δ ← −δ
7        Swap(f(x), g(x))
8        Swap(U(x), Q(x))
9        Swap(R(x), V(x))
10     δ ← δ + 1
11     Q(x) ← (f0 · Q(x) − g0 · U(x))/x  // dropping the remainder
12     R(x) ← (f0 · R(x) − g0 · V(x))/x  // dropping the remainder
13     g(x) ← (f0 · g(x) − g0 · f(x))/x  // dropping the remainder
14   H ← [U(x), V(x); Q(x), R(x)]
15   return δ, H
This approach allows a constant-time implementation by realizing the if construct (lines 5–9) with
Boolean-predicate operations.
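A minimal constant-time C sketch of the scalar flavour of divstep follows, restricted to the (f, g, δ) part for word-sized fragments (the actual Algorithm 23 additionally carries the transition matrix H); the masks replace the if construct exactly as described:

#include <stdint.h>

/* n divstep iterations on 64-bit fragments of f and g. The conditional
 * swap and the remainder-dropping update g <- (f0*g - g0*f)/x are realized
 * branchlessly; over GF(2), with f0 = 1, the update reduces to an xor
 * followed by a one-bit right shift. */
static void divstep_words(int64_t *delta, uint64_t *f, uint64_t *g, int n) {
    for (int i = 0; i < n; i++) {
        /* swap mask: all-ones iff delta > 0 and the constant term of g is 1 */
        uint64_t swap = (uint64_t)0 - ((uint64_t)(*delta > 0) & (*g & 1));
        int64_t d = *delta;
        uint64_t t = swap & (*f ^ *g); /* masked swap of f and g */
        *f ^= t;
        *g ^= t;
        /* delta <- (-delta if swapped else delta) + 1, without branching */
        *delta = (int64_t)(((uint64_t)(-d) & swap) | ((uint64_t)d & ~swap)) + 1;
        /* g <- (g xor (g0 ? f : 0)) / x */
        uint64_t g0 = (uint64_t)0 - (*g & 1);
        *g = (*g ^ (g0 & *f)) >> 1;
    }
}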
To compute the polynomial inverse, Algorithm 23 invokes the jumpdivstep function on the re-
flected representations of both the modulus and the polynomial to be inverted; that is, it considers
the polynomials S(x), R(x), of the same degree as f(x) and a(x), respectively, obtained by swapping
the coefficient of the x^i term with that of the x^{deg(S(x))−i} (resp., x^{deg(R(x))−i}) term. Both
polynomials, represented as degree 2m − 1 ones by adding the appropriate null terms, are processed
by the jumpdivstep call, which returns the final difference in degrees (expected to be null), and the
reflected representation of the polynomial inverse in the second element of the first row of H.
1  exp ← BinEncoding(p − 2)
2  expLength ← ⌈log2(p − 2)⌉
   // scan of exp from right to left; l = 0 ↔ LSB
3  b(x) ← a(x)  // exp_0 = 1, as p − 2 is an odd number
4  c(x) ← a(x)
5  for l ← 1 to expLength − 1 do
6    c(x) ← c(x)^(2^(2^(l−1))) · c(x)  // 2^(l−1) squarings, 1 mul.
7    if exp_l = 1 then
8      b(x) ← b(x)^(2^(2^l))  // 2^l squarings
9      b(x) ← b(x) · c(x)  // 1 multiplication
The permutation which must be computed is fixed, and depends only on the values p and 2^i, which
are both public and fixed, thus avoiding any meaningful information leakage via the timing side
channel. The authors of [24] observe that, since the required permutations, one for each value of
(2^i mod p), with i ∈ {2^1, 2^2, . . . , 2^{⌈log2(p−2)⌉}}, are fixed, they can be precomputed and stored
in lookup tables.
We note that the precomputation of such tables, while feasible, is likely to take a non-negligible
amount of memory. Indeed, such tables require p · (⌈log2(p − 2)⌉ − 1) · ⌈log2(p)⌉ bits to be stored,
assuming optimal packing in memory. This translates into tables ranging between 136 kiB and 2,856
kiB for the recommended prime sizes in LEDAcrypt [6]. While these table sizes are not problematic
for an implementation targeting a desktop platform, more constrained execution environments, such
as microcontrollers, may not have enough memory to store the full tables.
To this end, we introduce and examine a computational tradeoff, where only a small lookup table
comprising the (⌈log2(p − 2)⌉ − 1) values obtained as (2^i mod p), with i ∈ {2^1, 2^2, . . . , 2^{⌈log2(p−2)⌉}},
is precomputed and stored, while the position of the j-th coefficient, 0 ≤ j ≤ p − 1, of a(x)^{2^i} is
computed online via a multiplication and a (single-precision) modulus operation, as (j · (2^i mod p))
mod p. This effectively reduces the tables to a size equal to (⌈log2(p − 2)⌉ − 1) · ⌈log2(p)⌉ bits,
which corresponds to less than 1 kiB for all the LEDAcrypt parameters.
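The online position computation can be sketched as follows (illustrative names; one coefficient per byte for clarity, while the actual code works on packed words, and a gcc-style 128-bit integer is assumed for the product). Since raising to 2^i in Z2[x]/(x^p − 1) maps the coefficient of x^j onto that of x^{(j·2^i) mod p}, the single precomputed value pow2i = (2^i mod p) suffices:

#include <stdint.h>

/* Exponentiation of a(x) to 2^i in Z_2[x]/(x^p - 1) as a coefficient
 * permutation: out[(j * pow2i) mod p] = in[j], with pow2i = 2^i mod p
 * read from the small precomputed table described above. */
static void gf2x_exp_pow2(uint8_t *out, const uint8_t *in,
                          uint64_t p, uint64_t pow2i) {
    for (uint64_t j = 0; j < p; j++) {
        uint64_t dest = (uint64_t)(((unsigned __int128)j * pow2i) % p);
        out[dest] = in[j];
    }
}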
Finally, the authors of [24] also note that, on x86_64 platforms, it is faster to precompute the inverse
permutation with respect to the one corresponding to raising a(x) to 2^i, as it allows the binary
coefficients of the result a(x)^{2^i} to be collected contiguously in their memory representation. This
is done by determining, for each coefficient of a(x)^{2^i}, its corresponding one in a(x) through
the inverse permutation. The said inverse permutation maps the coefficient of the a-th degree term
of a(x)^{2^i}, 0 ≤ a ≤ p − 1, back to the coefficient of the (a · (2^{−i} mod p) mod p)-th term of a(x).
Figure 5.3: Experimental evaluation of the four modular inverse algorithms (curves: BY, KTT,
BCH, Fermat-sq, Fermat-comp, Fermat-tab), as a function of p (kbit). (a) Average running
time, in clock cycles, over 10^3 computations. (b) Absolute value of the t-statistic for a Student's
t-test with two populations of 10^3 execution times for each algorithm; the dashed horizontal line
highlights the upper bound of the range [−4.5, 4.5], below which no data-dependent change in the
timing behaviour of an algorithm is detected, with a confidence of 99.999% (significance level
α = 10^−5).
We note that, also in this case, it is possible to either tabulate the entire set of inverse permutations,
as suggested by [24], or simply tabulate the values of (2^{−i} mod p) and determine each position of
the permutation at hand partially at runtime.
We implemented all the aforementioned inversion algorithms in C11, employing appropriate com-
piler intrinsics to exploit the features of the AVX2 ISA extensions available on the Intel Haswell
microarchitecture, selected by NIST as the standard evaluation platform for desktop applications.
In particular, we employed vector Boolean instructions to speed up the required xor and shift
operations, which make up a significant portion of the reported algorithms.
We exploited the presence of the carry-less multiplication instruction (pclmulqdq), which performs
a polynomial multiplication of two 64-bit elements into a 128-bit one. We implemented the poly-
nomial multiplication primitive applying the Toom-Cook (TC) optimized interpolation proposed
in [16] until the recursive operand-splitting computation path yields operands below fifty 64-bit
machine words. Then we employ a Karatsuba multiplication technique, with all multiplications
involving operands below 9 machine words performed picking the optimal sequence of addition and
multiplication instructions according to [70]. We determined the number of machine words at which
the multiplication strategy is changed (between the TC and Karatsuba strategies) via an exhaustive
exploration of the tradeoff points.
Concerning the exponentiations to $2^i$, as required by the optimized Fermat's little theorem based inverse, we investigated the effects of the tradeoff reported in [24], where it is noted that, for small values of $i$, it can be faster to compute the exponentiation to $2^i$ by repeated squaring, instead of resorting to a bitwise permutation. Indeed, it is possible to obtain a fast squaring exploiting the pclmulqdq instruction, which performs a binary polynomial multiplication of two 64-term polynomials into a 128-term one. While it may appear counterintuitive, this approach is faster than the use of the dedicated Parallel Bits Deposit (pdep) instruction available in the AVX2 instruction set: the throughput obtained with pdep is lower than the one obtained via pclmulqdq. This is due to the possibility of performing two pclmulqdq bit interleavings starting from a 128-bit operand, which can be loaded in a single instruction, as opposed to two pdep invocations, which can only expand a 64-bit wide operand each. This fact, combined with the low latency of the pclmulqdq instruction (5 to 7 cycles, depending on the microarchitecture, with a number of Clock cycles Per Instruction (CPI) of 1 when the pipeline is full, while the pdep instruction has a 3-cycle latency to act on operands having half the size), provides a fast zero-interleaving strategy. We also recall that the implementation of the pdep instruction on AMD CPUs is quite slow (50+ cycles) and non constant-time, as it is implemented in microcode. As a consequence, the use of the pdep instruction would lead to a scenario where the non constant-time execution on an (AVX2 ISA extension compliant) AMD CPU can compromise the security of the implementation. Finally, we note that the pclmulqdq bit-interleaving strategy performs better than the one relying on interleaving by Morton numbers [64], employing the 256-bit registers available via the AVX2 ISA extension, plus the corresponding vector shift and vector or instructions.
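The zero interleaving at the heart of this squaring is simply a carryless self-multiplication: in GF(2)[x] all cross terms of a square cancel, so each coefficient bit lands at twice its original position. A one-line sketch (our own illustration) follows.

#include <wmmintrin.h>

/* Squaring in GF(2)[x]: the square of a degree < 64 binary polynomial,
 * held in the low 64 bits of the operand, is the 128-bit value whose
 * bits are those of the input interleaved with zeroes; this is exactly
 * the carryless product of the operand with itself. */
static __m128i gf2x_square64(__m128i a)
{
    return _mm_clmulepi64_si128(a, a, 0x00);
}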
We computed the polynomial inversions picking the modulus $f(x) = x^p - 1 \in \mathbb{Z}_2[x]$ for all the values of $p$ indicated in use in the ephemeral key exchange LEDAcrypt-KEM-CPA (Table 4.4), and in the IND-CCA2 LEDAcrypt-KEM and LEDAcrypt-PKC with their two-iteration out-of-place decoder (Table 4.1).
Figure 5.3(a) provides a depiction of the average number of clock cycles taken by each polynomial inversion algorithm, computed over $10^3$ runs acting on a randomly drawn, invertible polynomial on the Intel based machine. As can be seen, the inverse computation strategy exploiting large tables and Fermat's little theorem proves to be the most efficient one for the entire range of primes which are employed in the LEDAcrypt specification. A further noteworthy point is that, if a compact memory footprint is desired and the constant-time property is not strictly required, as is the case for ephemeral keypairs, Algorithm 22 provides a valid alternative, for the low range of prime sizes, to the method relying on Fermat's little theorem with compact lookup tables (Fermat-comp in the legend). We also note that an implementation of the Fermat's little theorem based polynomial inverse employing only repeated squarings via bit interleaving, instead of a permutation based computation of the exponentiation to $2^i$, is significantly slower (Fermat-sq in Figure 5.3) than all other methods. Precomputing only the smaller table of the values $2^i \bmod p$, $i \in \{2^1, 2^2, \ldots, 2^{\lceil \log_2(p-2) \rceil}\}$ (the Fermat-comp line in Figure 5.3) still provides a good amount of the speedup obtained by the precomputation (the full speedup being the one obtained by Fermat-tab), while employing very little memory (< 1 kiB). Finally, we note that, if the modular inverse is not being computed on a polynomial ring having a modulus with a peculiar structure, such as $\mathbb{Z}_2[x]/(x^p - 1)$, the benefits of performing the inversion via Fermat's little theorem may be smaller, or nullified, with respect to a completely general purpose inversion algorithm such as the one by Bernstein and Yang, which, from our analysis, provides competitive performance and is constant time.
We note that the implementation of the algorithms expected to be running in constant time (Algorithm 23, BY in the legend, and Algorithm 24, Fermat-sq, Fermat-comp, and Fermat-tab in the legend) relies on their implementation not containing any secret-dependent branches, nor any memory lookups whose address depends on a secret value. To provide an experimental validation of this fact, Figure 5.3(b) reports the result of the validation of the constant-time property by means of a Student's t-test, for each value of $p$, done on two timing populations of $10^3$ samples each, taken one when computing inverses of random invertible polynomials, and the other when computing the inverse of the trinomial $x^2 + x + 1$. The choice of a trinomial is intended to elicit the leakage of the position of the (very sparse) set coefficients, which represents the secret not to be disclosed in cryptoschemes such as LEDAcrypt, as the positions of the few set coefficients are clustered at one end of the polynomial.
The values of the t-statistic, represented in absolute value in Figure 5.3(b), for Algorithm 23 (BY in the legend) and Algorithm 24 (all Fermat variants in the legend) are within the range $[-4.5, 4.5]$ (upper bound indicated with a dashed horizontal line in the figure), in turn implying that the t-test does not detect data-dependent changes in the timing behaviors, with a confidence of 99.999% (significance level $\alpha = 10^{-5}$). By contrast, both the method by Brunner et al. (BCH in the legend) and the method by Kobayashi et al. (KTT in the legend) show t-statistic values far outside the interval $[-4.5, 4.5]$, exposing (as expected) their non constant-time nature.
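For reference, the statistic employed in this kind of leakage assessment is the standard two-population Welch t-statistic over the timing samples; the following minimal C sketch (our own illustration, not part of the submitted codebase) computes it.

#include <math.h>
#include <stddef.h>

/* Welch's t-statistic between two timing populations a[] and b[] of n
 * samples each; absolute values staying within 4.5 across all tested
 * inputs indicate no detectable data-dependent timing at the
 * significance level alpha = 1e-5. */
static double welch_t(const double *a, const double *b, size_t n)
{
    double ma = 0.0, mb = 0.0, va = 0.0, vb = 0.0;
    for (size_t i = 0; i < n; i++) { ma += a[i]; mb += b[i]; }
    ma /= (double)n;
    mb /= (double)n;
    for (size_t i = 0; i < n; i++) {
        va += (a[i] - ma) * (a[i] - ma);
        vb += (b[i] - mb) * (b[i] - mb);
    }
    va /= (double)(n - 1); /* unbiased sample variances */
    vb /= (double)(n - 1);
    return (ma - mb) / sqrt(va / (double)n + vb / (double)n);
}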
Finally, we note that, on a desktop platform, given a polynomial $a(x)$ to be inverted, making the KTT and BCH algorithms immune to the timing side channel by replacing the computation of $a(x)^{-1}$ with $\lambda(x) \cdot (\lambda(x) \cdot a(x))^{-1}$, where $\lambda(x)$ is a randomly chosen unitary polynomial, would further increase their performance penalty w.r.t. the Fermat-tab solution.
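Operationally, such a blinding amounts to one random draw and two extra multiplications around the unprotected inverse. The sketch below is our own illustration: poly_t, gf2x_rand_unit, gf2x_mul_mod and gf2x_inv are hypothetical placeholders, not functions of the LEDAcrypt codebase.

/* Multiplicative blinding of a variable-time inversion (sketch).
 * The variable-time routine only ever sees lambda(x) * a(x), whose
 * timing is decorrelated from the secret a(x). All names below are
 * hypothetical placeholders. */
static void blinded_inverse(poly_t *out, const poly_t *a)
{
    poly_t lambda, masked, masked_inv;
    gf2x_rand_unit(&lambda);                 /* random invertible lambda(x)     */
    gf2x_mul_mod(&masked, &lambda, a);       /* lambda(x) * a(x) mod x^p - 1    */
    gf2x_inv(&masked_inv, &masked);          /* variable-time inversion         */
    gf2x_mul_mod(out, &lambda, &masked_inv); /* lambda * (lambda * a)^-1 = a^-1 */
}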
The LEDAcrypt-KEM-CPA cryptosystem, as described in Chapter 1 and Section 4.2, has been designed with QC-MDPC code parameters targeting use cases featuring ephemeral public/private keypairs. Indeed, these use cases involve a one-round protocol where: i) the sender generates her own public/private keypair and sends the public key to the receiver, together with the other public information on the scheme; ii) the receiver employs the Encapsulation algorithm (Encap), fed with the public key of the sender, to generate both a secret symmetric key and an encrypted payload corresponding to the encryption of the said secret key; the payload is in turn sent back to the sender to complete the protocol and to allow her to derive the same secret key value executing the Decapsulation (Decap) algorithm, fed with her own private key and the encrypted payload.
Assuming both sender and receiver are equipped with two instances of the same computational platform, a first figure of merit assessing the performance of the implementation of the ephemeral-key KEM scheme is defined as the sum of the execution timings of the KeyGeneration ($T_{KeyGen}$), Encap ($T_{Encap}$) and Decap ($T_{Decap}$) algorithms, measured as the required number of machine clock cycles, i.e.: $T_{KEM-CPA} = T_{KeyGen} + T_{Encap} + T_{Decap}$.
A second figure of merit, which we also employed in the decoder performance analysis in Section 5.1, takes into account the computational overhead experienced at both the receiver and sender side to process the byte-streams exchanged over the network to transmit the public key and the encrypted payload, respectively. It is defined by adding to $T_{KEM-CPA}$ the total number of transmitted bytes, $N_{bytes}$, multiplied by a normalization coefficient $\beta$ (measured as machine clock cycles per byte), i.e.: $T_{combined} = T_{KEM-CPA} + \beta \cdot N_{bytes}$. An empirical value for such a coefficient, which is typically accepted as modeling the computational overhead due to the execution of the routines in the software stack employed for communication, is $\beta = 1000$.
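As a purely illustrative computation with made-up round figures (not measured data): a run with $T_{KEM-CPA} = 2 \times 10^6$ clock cycles exchanging $N_{bytes} = 3{,}000$ bytes scores $T_{combined} = 2 \times 10^6 + 1000 \cdot 3000 = 5 \times 10^6$ equivalent clock cycles, showing how the transmission term can dominate the combined metric even when the computation itself is fast.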
The use of a code profiling tool to analyze the software implementation of LEDAcrypt-KEM-CPA reveals that the computation of the single multiplicative inverse required during the execution of the KeyGeneration algorithm takes from 40% to 60% of the total execution time $T_{KEM-CPA}$, as the rate $\frac{n_0-1}{n_0}$, $n_0 \in \{2, 3, 4\}$, of the underlying QC-MDPC code varies among the code configurations reported in Table 4.4 in Section 4.2.
In Section 5.4.2 and Section 5.4.3, we presented the algorithm based on the application of Fermat's little theorem (see Algorithm 24) as the most effective strategy to compute the multiplicative inverse of an element in $\mathbb{Z}_2[x]/(x^p - 1)$, given the LEDAcrypt-KEM-CPA parameters in Section 4.2. The computational complexity of the optimized implementation of Algorithm 24 highlights the need to compute the following operations to invert an element $a(x) \in \mathbb{Z}_2[x]/(x^p - 1)$:

$$(\lceil \log_2(p-2) \rceil - 1 + \mathrm{HammingWeight}(p-2)) \times (\mathrm{PolMul} + \mathrm{ExpPowerTwo}(i)) + 1 \times \mathrm{PolSquaring}$$

where PolMul denotes a polynomial multiplication, PolSquaring denotes a polynomial squaring, while ExpPowerTwo(i) denotes the computation of $a(x)^{2^i}$, with $i \in \{2^1, 2^2, \ldots, 2^{\lceil \log_2(p-2) \rceil}\}$. The optimized implementation of the inversion algorithm features the ExpPowerTwo(i) routine implemented with the aid of either a compact or a full permutation table (refer to Sections 5.4.2 and 5.4.3 for further details), which in turn exhibits the same execution time regardless of the input parameter $i$. Concerning the polynomial multiplication algorithm, the Dense-By-Dense multiplication strategy described in Section 5.2 is applied, as the Sparse-By-Dense multiplication strategy does not fit the operands involved in the inversion algorithms. From these observations it is easy to infer that an inversion algorithm featuring values of the $p$ parameter that exhibit a low HammingWeight(p - 2) may provide better execution timings.
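A quick way to quantify the dependence on HammingWeight(p - 2) is to count the multiplications in the exponentiation chain; the helper below is our own illustration of that count, not code from the implementation. For instance, for $p = 10{,}883$ one has $p - 2 = 10{,}881 = (10101010000001)_2$, hence $\lceil \log_2(p-2) \rceil = 14$, $\mathrm{HammingWeight}(p-2) = 5$, and $14 - 1 + 5 = 18$ multiplications, against 20 for $p = 10{,}853$ ($\mathrm{HammingWeight}(10{,}851) = 7$), consistent with the $-2$ entry reported for this instance in Table 5.2 below.

#include <stdint.h>

/* Illustrative count of the polynomial multiplications performed by
 * the Fermat's little theorem based inversion chain:
 * ceil(log2(p - 2)) - 1 + HammingWeight(p - 2). */
static unsigned inv_mul_count(uint32_t p)
{
    uint32_t e = p - 2;
    unsigned bitlen = 0, hw = 0;
    for (uint32_t v = e; v != 0; v >>= 1) {
        hw += v & 1u;
        bitlen++;        /* bitlen ends up as floor(log2(e)) + 1 */
    }
    /* for e not a power of two, ceil(log2(e)) = floor(log2(e)) + 1 */
    unsigned ceil_log2 = ((e & (e - 1)) == 0) ? bitlen - 1 : bitlen;
    return ceil_log2 - 1 + hw;
}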
It is worth noting that only $p$ values with $\mathrm{ord}_2(p) = p - 1$ (i.e., such that 2 is a primitive element of $\mathbb{Z}_p$) that are larger than the ones shown in Section 4.2 allow to keep the same security guarantees. This consideration, together with the observation that increasing the size of $p$ will inevitably introduce a performance penalty in the execution of both the decoding and the multiplication algorithms, justifies our choice of restricting the search for new values of the $p$ parameters of the distinct LEDAcrypt-KEM-CPA instances to the range $[p, p \times 1.1]$. Indeed, the potential speedup of the $T_{combined}$ and/or $T_{KEM-CPA}$ figures of merit quickly decreases as $p$ increases.
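Candidate primes can be screened with a direct computation of the multiplicative order of 2 modulo p; the naive O(p) sketch below (our own illustration) is entirely adequate for the sizes at hand.

#include <stdbool.h>
#include <stdint.h>

/* Returns true when 2 is a primitive element of Z_p, i.e., when
 * ord_2(p) = p - 1, by stepping through the powers of 2 modulo the
 * (prime) p. Naive O(p) scan, adequate for the LEDAcrypt prime sizes. */
static bool two_is_primitive_mod(uint32_t p)
{
    uint32_t pow = 2 % p;
    for (uint32_t k = 1; k < p - 1; k++) {
        if (pow == 1)
            return false;  /* the order of 2 divides k < p - 1 */
        pow = (2 * pow) % p;
    }
    return pow == 1;       /* pow = 2^(p-1) mod p; 1 iff order is p - 1 */
}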
Figure 5.4: Speedups or slowdowns obtained for the LEDAcrypt-KEM-CPA figures of merit (key generation + encryption + decryption time, $T_{KEM-CPA}$, and the related $T_{combined} = T_{KEM-CPA} + \beta \cdot N_{bytes}$), when employing values of $p$ larger than the ones required by the security requirements alone, as a function of $n_0 p_{old}$ (in kbit). Panel (a) reports the figure of merit $T_{KEM-CPA}$; panel (b) reports $T_{combined}$ with $\beta = 1000$. The four bar series report the best $p_{new}$ within +5% and within 0%-10% from $p_{old}$, with either full or compact permutation tables.
Figure 5.4 shows the speedups/slowdowns of the performance of all LEDAcrypt-KEM-CPA instances, with respect to the figures of merit $T_{KEM-CPA}$ (Figure 5.4(a)) and $T_{combined}$ (Figure 5.4(b)), when the parameter $p$ in each code configuration in Table 4.4 in Section 4.2, denoted as $p_{old}$ from now on, is increased by up to 10%. Specifically, the x-axes report the error vector size $n_0 p_{old}$ for all QC-MDPC code configurations in Table 4.4 (i.e., $n_0 \in \{2, 3, 4\}$ and NIST Category equal to 1, 3, 5), while the y-axes denote the speedups/slowdowns of the figure of merit at hand, computed as $100 \times \frac{T_{p_{new}} - T_{p_{old}}}{T_{p_{old}}}$.
old
The green and blue bars consider LEDAcrypt-KEM-CPA instances featuring an inverse algorithm
implemented with the aid of full permutation tables and refer to pnew values that yield the best
results when picked in a range within 0%-5% and 0%-10% from pold , respectively. The red and
black bars consider LEDAcrypt-KEM-CPA instances featuring an inverse algorithm implemented
with the aid of compact permutation tables (useful to have a portable implementation for platforms
with limited memory requirements), and refer to pnew values that yield the best results when picked
in a range within 0%-5% and 0%-10% from pold , respectively.
The three macro-groups of bars, from left to right, in each figure, refer to QC-MDPC code parameters addressing the requirements of the NIST Categories 1, 3 and 5, respectively. Within each macro-group of bars, from left to right, bundles of four bars (green, red, blue, black) refer to QC-MDPC code parameters with $n_0$ equal to 2, 3 and 4, respectively.

Table 5.2: New p values for the LEDAcrypt-KEM-CPA instances replacing the ones reported in Table 4.4 in Section 4.2

NIST Category   n0    p_old    p_new    p_new - p_old   HW(p_new - 2) - HW(p_old - 2)
1               2     10,853   10,883        30                   -2
1               3      8,237    8,237         0                    0
1               4      7,187    7,187         0                    0
3               2     20,981   21,011        30                   -4
3               3     15,331   15,373        42                   -2
3               4     13,109   13,109         0                    0
5               2     35,117   35,339       222                   -2
5               3     25,579   25,603        24                   -5
5               4     21,611   21,611         0                    0
As is evident from both Figure 5.4(a) and Figure 5.4(b), not all LEDAcrypt-KEM-CPA instances benefit from an enlargement of the parameter $p$. In particular, focusing on the figure of merit $T_{combined}$ shown in Figure 5.4(b), we updated the values of $p$ reported in Table 4.4 in Section 4.2 as shown in Table 5.2, raising the value of $p$ where either a gain above 2.5% in the metric can be achieved by the solution with the compact-tables inverse, or the increase in the value of $p$ is negligible (i.e., < 50).
Chapter 6
In this chapter, we provide an analysis of the overall performance of the LEDAcrypt primitives, and give recommendations on the choice of the code rate to optimize two key figures of merit: the computation speed of the primitive, and the combined execution time-transmitted data metric. The timing results reported for the IND-CCA2 primitives LEDAcrypt-KEM and LEDAcrypt-PKC relate to a constant time implementation.
In this section, we consider the key and ciphertext lengths for the LEDAcrypt primitives. Concerning LEDAcrypt-KEM-CPA, Table 6.1 reports all the key sizes, ciphertext sizes and overall transmitted data during a key agreement, as a function of the NIST category and $n_0$. It can be noticed that the amount of transmitted data grows with $n_0$, while the ciphertext alone of LEDAcrypt-KEM-CPA shrinks. This is due to the shrinking of the value of $p$ for codes sharing the same NIST Category as $n_0$ increases: the ciphertext of LEDAcrypt-KEM-CPA is a $p$-bit long syndrome (plus some negligible overhead), while the entirety of transmitted data also includes the public key (as it is intended for ephemeral use).
The growth trend of the transmitted amount of data is reversed in LEDAcrypt-KEM (data reported in Table 6.2), where there is no need to retransmit the public key at each communication, limiting the ciphertext to the $p$-bit syndrome alone. This in turn allows LEDAcrypt-KEM to transmit smaller amounts of data (for the same NIST Category) than LEDAcrypt-KEM-CPA. However, the transmission overhead of providing the ephemeral key service employing LEDAcrypt-KEM-CPA, in scenarios where perfect forward secrecy is desired, such as TLS 1.3, is limited to 704 extra bytes to be sent.
We note that the configurations of LEDAcrypt-KEM-CPA and LEDAcrypt-KEM keep the amount of transmitted data well below the 20,480 B mark (down to a quantity between a half and a third of it), which was indicated by Stebila et al. in [21] to be causing issues in some implementations of OpenSSL, and definitely below the $2^{16} - 1 = 65{,}535$ B size of the public key field in the TLS 1.3 handshake protocol, as reported in [21].
Table 6.1: Key, ciphertext and transmitted data sizes for LEDAcrypt-KEM-CPA instances
employing the QC-MDPC parameters reported in Table 4.4 in Section 4.2, and partially updated
as shown in Table 5.2 in Section 5.5
Table 6.2: Key, ciphertext and transmitted data sizes for the IND-CCA2 LEDAcrypt-KEM in-
stances employing the QC-MDPC parameters reported in Table 4.1 in Section 4.1 and the LEDAde-
coder designed to execute two out-of-place iteration functions
Table 6.3: Key, ciphertext and transmitted data sizes for the IND-CCA2 LEDAcrypt-PKC in-
stances employing the QC-MDPC parameters reported in Table 4.1 in Section 4.1 and the LEDAde-
coder designed to execute two out-of-place iteration functions
In this section, we report the measured running times for LEDAcrypt-KEM, LEDAcrypt-PKC, and LEDAcrypt-KEM-CPA on our experimental evaluation setup, constituted by an Intel Core i5-6600 clocked at 3.2 GHz, with the turbo-boost feature disabled.
We report that LEDAcrypt-KEM-CPA takes around 600 µs for the computation of an entire ephemeral key exchange (namely, key generation, encapsulation and decapsulation) for NIST category 1, with the best timing achieved by the solution featuring a code with $n_0 = 4$. Indeed, picking $n_0 > 2$ is the favourable solution to minimize the execution time of LEDAcrypt-KEM-CPA for all the NIST categories. To put the computation time of LEDAcrypt-KEM-CPA in perspective, the OpenSSL 1.1.1d benchmark for the elliptic curve Diffie-Hellman key exchange on the NIST-P-384 curve reports an execution time of 1.08 ms on the same experimental workbench, while the computation of the encryption and decryption primitives alone (without key generation) of RSA takes 645 µs with a 2048-bit modulus, and 1.93 ms with a 3072-bit modulus. LEDAcrypt-KEM-CPA thus provides computation times comparable with current ephemeral key exchange primitives.
The execution times reported for LEDAcrypt-KEM (Table 6.5) show that, for parameters meeting the strict requirement of a DFR of $2^{-\lambda}$ (where $2^\lambda$, $\lambda \in \{128, 192, 256\}$, is the expected security margin of the scheme) needed to attain IND-CCA2 guarantees, LEDAcrypt-KEM takes around 1 ms to complete the encapsulation and decapsulation procedures, a time comparable with the RSA-3072 key exchange, which provides the same expected security margin.
We note that, in the case of LEDAcrypt-KEM, the best timings for the overall long term key exchange (encapsulation plus decapsulation) are achieved with choices of $n_0 \in \{2, 3\}$, depending on the NIST category. This change with respect to LEDAcrypt-KEM-CPA is due to the fact that the increase in the code length $n_0 p$ translates into an increased computational requirement for the decoder, which is not compensated by the smaller size of the multiplication operands for the $n_0$ overall multiplications performed in the encryption and decryption primitives.
We note that the entire implementation of the LEDAcrypt-KEM and LEDAcrypt-PKC primitives provides constant time execution, preventing information leakage via the timing side channel altogether. The high variance measured in the LEDAcrypt-KEM and LEDAcrypt-PKC key generation processes is entirely due to the key rejection sampling procedure, which restarts the constant time key generation procedure altogether if the produced key does not meet the requirements described in Chapter 3, effectively discarding all previous key related information. Therefore, such a time variation does not provide any information on the generated keypair.
Finally, we report the execution times for the LEDAcrypt-PKC primitive in Table 6.6, taken when encrypting a random, 1 kiB sized message. A message of this size is always smaller than the information word size ($p n_0$-bit) in input to the encryption primitive. From the reported timings it can be seen that the impact of the Kobara-Imai construction on the overall running time, with respect to the KEM, is around 30%, mostly due to the increase in the encryption time required by the constant-time constant weight encoding primitive, implemented according to [7], as well as to the increase in the decryption time required by the computation of the syndrome to be fed to the decoder (as LEDAcrypt-PKC is a McEliece-based scheme).
Table 6.5: Execution times in milliseconds for the IND-CCA2 LEDAcrypt-KEM instances
Table 6.6: Execution times in milliseconds for the IND-CCA2 LEDAcrypt-PKC instances
In this section, we analyze the running times of the LEDAcrypt primitives, expressed as clock cycle counts on the benchmarking platform, together with the combined time-transmitted data metric, to highlight which choices of the parameter $n_0$ provide the best results.
Concerning the case of LEDAcrypt-KEM-CPA (Table 6.7), we observe that the fastest solutions for the entire ephemeral KEM are the ones relying on $n_0 = 4$ for categories 1 and 3, and on $n_0 = 3$ for category 5. This effect is due to the significant speed gains obtained by reducing the size of the operand of the modular polynomial inverse operation. While evaluating speed alone leads to the aforementioned $n_0$ choices, if the amount of transmitted data is taken into account, combining it with the computation time by multiplying the number of transmitted bytes by $\beta = 1000$ and adding it to the clock cycle count, the most efficient solutions become the ones based on $n_0 = 2$ or $n_0 = 3$. This change is due to the impact of the increase in the transmitted data which, considering a normalization coefficient $\beta = 1000$, overcomes the speed advantage. While this provides a reasonable hint in terms of the best tradeoff when considering the computation time and cost of transmission together, we recall that such a metric may not fit extreme scenarios where the communication link throughputs and latencies are either very good (i.e., high throughput, low latency, as on an intra-device communication bus) or very bad (e.g., over-the-air links with limited bandwidth, such as the ones employed by the LoRa protocol in a low-power wide-area network).
Analyzing the speed and combined metric results for LEDAcrypt-KEM (Table 6.8), where the latency of the key exchange does not consider the key generation action (as it is supposed to be widely amortized over its use), the best, i.e., lowest, computational requirements are obtained picking $n_0 = 2$ or $n_0 = 3$, depending on the NIST category at hand, since the side-effect of increasing the code length, which comes with increasing the code rate, raises the computational load of the decoder, and picking $n_0 > 2$ increases the number of multiplications to be computed during encryption. However, when evaluating the overall efficiency including the amount of transmitted data, following the same metric described before, we obtain that the reduction in the amount of data to be sent, coming from the smaller values of $p$ which are admissible when $n_0 > 2$ is chosen, is enough to offset the additional computational requirements. Indeed, when considering a combined metric, the best choices for LEDAcrypt-KEM are the ones picking $n_0 = 3$ or $n_0 = 4$, depending on the NIST category at hand.
Finally, we provide the computational load required by LEDAcrypt-PKC (Table 6.9), and evaluate the extent of the overhead of computing the PKC with respect to the KEM alone. We observe that, in computing the LEDAcrypt-PKC, the overheads imposed by the computation of the Kobara-Imai construction, as opposed to the simpler one required by LEDAcrypt-KEM, are relatively contained (approximately 30%), especially for the LEDAcrypt instances providing higher security margins.
Table 6.7: Performance figures of merit for LEDAcrypt-KEM-CPA. Execution times are reported in clock cycles (cc), while values for $T_{combined}$ are reported in equivalent cc, computed by increasing the overall execution time by $\beta \times N$, where $\beta = 1000$ and N is the total number of bytes transmitted over the communication channel (i.e., public key + ciphertext)
Table 6.8: Performance figures of merit for the IND-CCA2 LEDAcrypt-KEM instances. Execution times are reported in clock cycles (cc), while values for $T_{combined}$ are reported in equivalent cc, computed by increasing the overall execution time by $\beta \times N$, where $\beta = 1000$ and N is the total number of bytes transmitted over the communication channel (i.e., ciphertext)
Table 6.9: Execution times in clock cycles (cc) for the IND-CCA2 LEDAcrypt-PKC instances
In this section we report our recommendations on the choices of the QC-MDPC code configurations for all the LEDAcrypt cryptosystems. In doing this, we consider three metrics: the amount of transmitted data, the computation time required, and the combined metric made up of the number of clock cycles taken by the computation plus the number of transmitted bytes multiplied by 1000. We do not directly take the size of the public key alone as an optimization metric: our motivation to do so is twofold. First of all, in the scenarios where the public key needs to be communicated often, such as the ephemeral key use in LEDAcrypt-KEM-CPA, its size is already taken into account in the communicated data size. Second, in all three LEDAcrypt systems (LEDAcrypt-KEM, LEDAcrypt-PKC, and LEDAcrypt-KEM-CPA) the practical constraints on the public key size are met by all the public key sizes we propose, as we analyzed in Section 6.1. Indeed, our public key sizes are always smaller than the $2^{16} - 1$ B bound mandated by the public key size field in TLS 1.3, and fit rather comfortably even in the Flash memory of modern (e.g., ARM Cortex-M based) low end microcontrollers.
Table 6.10 reports our recommendations on the optimal choices of the QC-MDPC code configurations for the LEDAcrypt-KEM-CPA instances. If the transmitted data size alone is preferred, the choice of $n_0 = 2$ fits all three NIST categories, while if the computation speed alone is taken into account, picking $n_0 > 2$ is always the most efficient choice. An analysis of the combined metric allows us to observe that picking $n_0 = 2$ for NIST categories 1 and 3, and $n_0 = 3$ for NIST category 5, yields the best results when balancing computation and transmitted data. Indeed, if the transmission of both the public key and the ciphertext of LEDAcrypt-KEM-CPA is required, as it is in an ephemeral key use scenario, the increase in the transmitted data is compensated in a combined metric by the reduction of the prime size only for NIST category 5.
Table 6.11 reports our recommendations on the optimal choices of the QC-MDPC code configurations for the IND-CCA2 version of LEDAcrypt-KEM and, since the parameter choice acts only on the asymmetric components of the PKC system, also for the code configurations of the LEDAcrypt-PKC instances.
Examining the results, our recommendation is to pick either $n_0 = 4$ or $n_0 = 3$ in all cases where the transmitted data or the combined computation-transmission metric is to be optimized. Indeed, the reduction in transmitted data which is achievable raising the code rate $\frac{n_0-1}{n_0}$ is enough to compensate the increase in computational power required during encryption and decryption. Contrariwise, if computation speed alone is the criterion for optimization, $n_0 = 3$ should be chosen for NIST category 1 parameters, and $n_0 = 2$ for NIST category 3 and 5 parameters.
Finally, among our recommended parameter choices, we also point out a parameter selection for which only a DFR of $2^{-64}$ is guaranteed. These parameter sets, while not matching the formal requirement mandated by [39] to attain IND-CCA2 security assurances, provide a practical data point for scenarios where a concrete DFR of $2^{-64}$ is unlikely to give an attacker the opportunity of observing a decoding failure. This includes, for instance, low-end embedded devices, which are unlikely to exceed $2^{24}$ encryptions during their entire lifetime (e.g., in smartcards the number of encryptions is typically capped via a hardware counter), thus making the probability of an attacker observing a decoding failure practically negligible.
In this scenario, the summary of the results, reported in Table 6.12, recommends picking $n_0 = 4$ if minimizing the transmitted data is the end goal, $n_0 = 3$ or $n_0 = 4$ depending on the NIST category if the combined metric is to be optimized, and $n_0 = 2$ or $n_0 = 3$ if the computation speed is the only aim of the optimization.
Bibliography
[11] T. P. Berger, P. Cayrel, P. Gaborit, and A. Otmani, “Reducing key length of the McEliece
cryptosystem,” in Progress in Cryptology - AFRICACRYPT 2009, Second International
Conference on Cryptology in Africa, Gammarth, Tunisia, June 21-25, 2009. Proceedings, ser.
Lecture Notes in Computer Science, B. Preneel, Ed., vol. 5580. Springer, 2009, pp. 77–97.
[Online]. Available: https://doi.org/10.1007/978-3-642-02384-2_6
[12] E. Berlekamp, R. McEliece, and H. van Tilborg, “On the inherent intractability of certain
coding problems,” IEEE Trans. Information Theory, vol. 24, no. 3, pp. 384–386, May 1978.
[13] D. J. Bernstein, “Grover vs. McEliece,” in Post-Quantum Cryptography, Third International Workshop, PQCrypto 2010, Darmstadt, Germany, May 25-28, 2010. Proceedings, ser. Lecture Notes in Computer Science, N. Sendrier, Ed., vol. 6061. Springer, 2010, pp. 73–80. [Online]. Available: https://doi.org/10.1007/978-3-642-12929-2_6
[14] D. J. Bernstein and B. Yang, “Fast constant-time gcd computation and modular inversion,”
IACR Trans. Cryptogr. Hardw. Embed. Syst., vol. 2019, no. 3, pp. 340–398, 2019. [Online].
Available: https://doi.org/10.13154/tches.v2019.i3.340-398
[15] E. Bernstein and U. V. Vazirani, “Quantum complexity theory,” SIAM J. Comput., vol. 26,
no. 5, pp. 1411–1473, Oct. 1997.
[16] M. Bodrato, “Towards Optimal Toom-Cook Multiplication for Univariate and Multivariate Polynomials in Characteristic 2 and 0,” in WAIFI 2007, Madrid, Spain, June 21-22, 2007, Proceedings, ser. Lecture Notes in Computer Science, C. Carlet and B. Sunar, Eds., vol. 4547. Springer, 2007, pp. 116–133. [Online]. Available: https://doi.org/10.1007/978-3-540-73074-3_10
[17] H. Brunner, A. Curiger, and M. Hofstetter, “On Computing Multiplicative Inverses in GF(2^m),” IEEE Trans. Computers, vol. 42, no. 8, pp. 1010–1015, 1993.
[18] T. Chou, “QcBits: Constant-time small-key code-based cryptography,” in Cryptographic Hardware and Embedded Systems - CHES 2016 - 18th International Conference, Santa Barbara, CA, USA, August 17-19, 2016, Proceedings, ser. Lecture Notes in Computer Science, B. Gierlichs and A. Y. Poschmann, Eds., vol. 9813. Springer, 2016, pp. 280–300. [Online]. Available: https://doi.org/10.1007/978-3-662-53140-2_14
[19] J. W. Chung, S. G. Sim, and P. J. Lee, “Fast Implementation of Elliptic Curve Defined over GF(p^m) on CalmRISC with MAC2424 Coprocessor,” in Cryptographic Hardware and Embedded Systems - CHES 2000, Second International Workshop, Worcester, MA, USA, August 17-18, 2000, Proceedings, ser. Lecture Notes in Computer Science, Ç. K. Koç and C. Paar, Eds., vol. 1965. Springer, 2000, pp. 57–70. [Online]. Available: https://doi.org/10.1007/3-540-44499-8_4
[20] Intel Corp., “Intel intrinsics guide,” https://software.intel.com/sites/landingpage/IntrinsicsGuide/, 2020.
[21] E. Crockett, C. Paquin, and D. Stebila, “Prototyping post-quantum and hybrid key exchange
and authentication in TLS and SSH,” Cryptology ePrint Archive, Report 2019/858, 2019,
https://eprint.iacr.org/2019/858.
[22] S. de Vries, “Achieving 128-bit security against quantum attacks in OpenVPN,” Master’s
thesis, University of Twente, Aug. 2016. [Online]. Available: http://essay.utwente.nl/70677/
[23] N. Drucker, S. Gueron, and D. Kostic, “QC-MDPC decoders with several shades of
gray,” IACR Cryptology ePrint Archive, vol. 2019, p. 1423, 2019. [Online]. Available:
https://eprint.iacr.org/2019/1423
[24] ——, “Fast polynomial inversion for post quantum QC-MDPC cryptography,” IACR Cryptology ePrint Archive, vol. 2020, p. 298, 2020. [Online]. Available: https://eprint.iacr.org/2020/298
[25] M. J. Dworkin, “SHA-3 Standard: Permutation-Based Hash and Extendable-Output Func-
tions,” U.S.A. Federal Information Processing Standard. (NIST FIPS). Report number: 202.
Available online at: https://doi.org/10.6028/NIST.FIPS.202, 2015.
[26] E. Eaton, M. Lequesne, A. Parent, and N. Sendrier, “QC-MDPC: A timing attack and a CCA2
KEM,” in PQCrypto, T. Lange and R. Steinwandt, Eds. Fort Lauderdale, FL, USA: Springer
International Publishing, Apr. 2018, pp. 47–76.
[27] T. Fabšič, V. Hromada, P. Stankovski, P. Zajac, Q. Guo, and T. Johansson, “A reaction attack
on the QC-LDPC McEliece cryptosystem,” in Post-Quantum Cryptography: 8th International
Workshop, PQCrypto 2017, T. Lange and T. Takagi, Eds. Utrecht, The Netherlands: Springer
International Publishing, Jun. 2017, pp. 51–68.
[28] T. Fabšič, V. Hromada, and P. Zajac, “A reaction attack on LEDApkc,” Cryptology ePrint
Archive, Report 2018/140, 2018, https://eprint.iacr.org/2018/140.
[29] M. Finiasz and N. Sendrier, “Security bounds for the design of code-based cryptosystems,”
in Advances in Cryptology - ASIACRYPT 2009, 15th International Conference on the Theory
and Application of Cryptology and Information Security, Tokyo, Japan, December 6-10, 2009.
Proceedings, 2009, pp. 88–105.
[30] E. Fujisaki and T. Okamoto, “Secure Integration of Asymmetric and Symmetric Encryption
Schemes,” in Advances in Cryptology - CRYPTO ’99, 19th Annual International Cryptology
Conference, Santa Barbara, California, USA, August 15-19, 1999, Proceedings, ser. Lecture
Notes in Computer Science, M. J. Wiener, Ed., vol. 1666. Springer, 1999, pp. 537–554.
[Online]. Available: https://doi.org/10.1007/3-540-48405-1_34
[31] ——, “Secure integration of asymmetric and symmetric encryption schemes,” J. Cryptology, vol. 26, no. 1, pp. 80–101, 2013. [Online]. Available: https://doi.org/10.1007/s00145-011-9114-1
[32] R. G. Gallager, “Low-density parity-check codes,” IRE Trans. Inform. Theory, vol. IT-8, pp.
21–28, Jan. 1962.
[33] ——, Low-Density Parity-Check Codes. M.I.T. Press, 1963.
[34] M. Grassl, B. Langenberg, M. Roetteler, and R. Steinwandt, “Applying Grover’s algorithm
to AES: quantum resource estimates,” in Post-Quantum Cryptography - 7th International
Workshop, PQCrypto 2016, Fukuoka, Japan, February 24-26, 2016, Proceedings, 2016, pp.
29–43.
[35] L. K. Grover, “A fast quantum mechanical algorithm for database search,” in Proc. 28th Annual
ACM Symposium on the Theory of Computing, Philadephia, PA, May 1996, pp. 212–219.
[36] A. Guimarães, D. de Freitas Aranha, and E. Borin, “Optimized implementation of QC-MDPC
code-based cryptography,” Concurrency and Computation: Practice and Experience, vol. 31,
no. 18, 2019. [Online]. Available: https://doi.org/10.1002/cpe.5089
[37] Q. Guo, T. Johansson, and P. Stankovski Wagner, “A key recovery reaction attack on QC-
MDPC,” IEEE Trans. Information Theory, vol. 65, no. 3, pp. 1845–1861, Mar. 2019.
[38] Q. Guo, T. Johansson, and P. Stankovski, “A key recovery attack on MDPC with CCA security
using decoding errors,” in Advances in Cryptology – ASIACRYPT 2016, ser. Lecture Notes in
Computer Science, J. H. Cheon and T. Takagi, Eds. Springer Berlin Heidelberg, 2016, vol.
10031, pp. 789–815.
[39] D. Hofheinz, K. Hövelmanns, and E. Kiltz, “A modular analysis of the Fujisaki-Okamoto
transformation,” in Theory of Cryptography - 15th International Conference, TCC 2017,
Baltimore, MD, USA, November 12-15, 2017, Proceedings, Part I, ser. Lecture Notes in
Computer Science, Y. Kalai and L. Reyzin, Eds., vol. 10677. Springer, 2017, pp. 341–371.
[Online]. Available: https://doi.org/10.1007/978-3-319-70500-2_12
[40] S. Jaques, M. Naehrig, M. Roetteler, and F. Virdia, “Implementing Grover oracles for
quantum key search on AES and LowMC,” CoRR, vol. abs/1910.01700, 2019. [Online].
Available: http://arxiv.org/abs/1910.01700
[41] H. Jiang, Z. Zhang, L. Chen, H. Wang, and Z. Ma, “IND-CCA-secure key encapsulation
mechanism in the quantum random oracle model, revisited,” in Advances in Cryptology -
CRYPTO 2018 - 38th Annual International Cryptology Conference, Santa Barbara, CA,
USA, August 19-23, 2018, Proceedings, Part III, ser. Lecture Notes in Computer Science,
H. Shacham and A. Boldyreva, Eds., vol. 10993. Springer, 2018, pp. 96–125. [Online].
Available: https://doi.org/10.1007/978-3-319-96878-0_4
[42] H. Jiang, Z. Zhang, and Z. Ma, “Tighter security proofs for generic key encapsulation mech-
anism in the quantum random oracle model,” in Post-Quantum Cryptography, J. Ding and
R. Steinwandt, Eds. Cham: Springer International Publishing, 2019, pp. 227–248.
[43] G. Kachigar and J. Tillich, “Quantum information set decoding algorithms,” in Post-Quantum
Cryptography - 8th International Workshop, PQCrypto 2017, Utrecht, The Netherlands, June
26-28, 2017, Proceedings, 2017, pp. 69–89.
[44] D. E. Knuth, The Art of Computer Programming: Generating All Combinations and Parti-
tions, 1st ed. Addison-Wesley, 2005, vol. 4.
[45] K. Kobara and H. Imai, “Semantically secure McEliece public-key cryptosystems —
conversions for McEliece PKC,” Lecture Notes in Computer Science, vol. 1992, pp. 19–35,
2001. [Online]. Available: citeseer.ist.psu.edu/kobara01semantically.html
[46] K. Kobayashi, N. Takagi, and K. Takagi, “Fast inversion algorithm in GF(2^m) suitable for implementation with a polynomial multiply instruction on GF(2),” IET
Computers & Digital Techniques, vol. 6, no. 3, pp. 180–185, 2012. [Online]. Available:
https://doi.org/10.1049/iet-cdt.2010.0006
[47] P. J. Lee and E. F. Brickell, “An observation on the security of McEliece’s public-key cryp-
tosystem,” in Advances in Cryptology - EUROCRYPT ’88, Workshop on the Theory and Ap-
plication of of Cryptographic Techniques, Davos, Switzerland, May 25-27, 1988, Proceedings,
1988, pp. 275–280.
[48] J. S. Leon, “A probabilistic algorithm for computing minimum weights of large error-correcting
codes,” IEEE Trans. Information Theory, vol. 34, no. 5, pp. 1354–1359, Sep. 1988.
[49] Y. X. Li, R. Deng, and X. M. Wang, “On the equivalence of McEliece’s and Niederreiter’s
public-key cryptosystems,” IEEE Trans. Information Theory, vol. 40, no. 1, pp. 271–273, Jan.
1994.
[50] M. Luby, M. Mitzenmacher, M. Shokrollahi, and D. Spielman, “Improved low-density parity-
check codes using irregular graphs,” IEEE Trans. Information Theory, vol. 47, no. 2, pp.
585–598, Feb. 2001.
[51] A. May, A. Meurer, and E. Thomae, “Decoding random linear codes in Õ(2^{0.054n}),” in Ad-
vances in Cryptology - ASIACRYPT 2011 - 17th International Conference on the Theory and
Application of Cryptology and Information Security, Seoul, South Korea, December 4-8, 2011.
Proceedings, 2011, pp. 107–124.
[52] R. J. McEliece, “A public-key cryptosystem based on algebraic coding theory.” DSN Progress
Report, pp. 114–116, 1978.
[53] National Institute of Standards and Technology. (2016, Dec.) Post-quantum crypto project.
[Online]. Available: http://csrc.nist.gov/groups/ST/post-quantum-crypto/
[54] ——. (2018) Post-quantum cryptography - round 1 submissions. [Online]. Available:
https://csrc.nist.gov/Projects/Post-Quantum-Cryptography/Round-1-Submissions
[55] H. Niederreiter, “Knapsack-type cryptosystems and algebraic coding theory,” Probl. Contr.
and Inform. Theory, vol. 15, pp. 159–166, 1986.
[56] A. Nilsson, T. Johansson, and P. Stankovski Wagner, “Error amplification in code-based cryp-
tography,” IACR Transactions on Cryptographic Hardware and Embedded Systems, vol. 2019,
no. 1, pp. 238–258, Nov. 2018.
[57] R. Perlner, “Attack on LEDA’s product structure,” private communication, Oct. 2019.
[58] E. Prange, “The use of information sets in decoding cyclic codes,” IRE Trans. Information
Theory, vol. 8, no. 5, pp. 5–9, Sep. 1962.
[59] M. Rossi, M. Hamburg, M. Hutter, and M. E. Marson, “A Side-Channel Assisted Cryptanalytic
Attack Against QcBits,” in Cryptographic Hardware and Embedded Systems - CHES 2017
- 19th International Conference, Taipei, Taiwan, September 25-28, 2017, Proceedings, ser.
Lecture Notes in Computer Science, W. Fischer and N. Homma, Eds., vol. 10529. Springer,
2017, pp. 3–23. [Online]. Available: https://doi.org/10.1007/978-3-319-66787-4_1
[60] A. Salomaa, “Chapter II - Finite Non-deterministic and Probabilistic Automata,” in Theory
of Automata, ser. International Series of Monographs on Pure and Applied Mathematics,
A. Salomaa, Ed. Pergamon, 1969, vol. 100, pp. 71 – 113.
[61] P. Santini, M. Battaglioni, M. Baldi, and F. Chiaraluce, “Hard-decision iterative decoding of LDPC codes with bounded error rate,” in ICC 2019 - 2019 IEEE International Conference on Communications (ICC), May 2019, pp. 1–6.
[62] P. Santini, M. Baldi, and F. Chiaraluce, “Assessing and countering reaction attacks against post-quantum public-key cryptosystems based on QC-LDPC codes,” in Cryptology and Network
Security, J. Camenisch and P. Papadimitratos, Eds. Cham: Springer International Publishing,
2018, pp. 323–343.
[63] P. Santini, M. Battaglioni, F. Chiaraluce, and M. Baldi, “Analysis of reaction and timing at-
tacks against cryptosystems based on sparse parity-check codes,” in Code-Based Cryptography,
M. Baldi, E. Persichetti, and P. Santini, Eds. Cham: Springer International Publishing, 2019,
pp. 115–136.
[64] S. E. Anderson, “Bit Twiddling Hacks,” https://graphics.stanford.edu/~seander/bithacks.html, last accessed 7 April 2020, 2020.
[65] N. Sendrier, “Decoding one out of many,” in Post-Quantum Cryptography - 4th International
Workshop, PQCrypto 2011, Taipei, Taiwan, November 29 - December 2, 2011. Proceedings,
2011, pp. 51–67.
[66] P. W. Shor, “Polynomial-time algorithms for prime factorization and discrete logarithms on a
quantum computer,” SIAM J. Comput., vol. 26, no. 5, pp. 1484–1509, Oct. 1997.
[67] V. Sidelnikov and S. Shestakov, “On cryptosystems based on generalized Reed-Solomon codes,”
Diskretnaya Math, vol. 4, pp. 57–63, 1992.
[68] B. Sim, J. Kwon, K. Y. Choi, J. Cho, A. Park, and D. Han, “Novel Side-
Channel Attacks on Quasi-Cyclic Code-Based Cryptography,” IACR Trans. Cryptogr.
Hardw. Embed. Syst., vol. 2019, no. 4, pp. 180–212, 2019. [Online]. Available:
https://doi.org/10.13154/tches.v2019.i4.180-212
[69] J. Stern, “A method for finding codewords of small weight,” in Coding Theory and Applications,
3rd International Colloquium, Toulon, France, November 2-4, 1988, Proceedings, 1988, pp.
106–113.
[70] C. Su and H. Fan, “Impact of Intel’s new instruction sets on software implementation of
GF(2)[x] multiplication,” Inf. Process. Lett., vol. 112, no. 12, pp. 497–502, 2012. [Online].
Available: https://doi.org/10.1016/j.ipl.2012.03.012
[71] J. Tillich, “The decoding failure probability of MDPC codes,” in 2018 IEEE International
Symposium on Information Theory, ISIT 2018, Vail, CO, USA, June 17-22, 2018, 2018, pp.
941–945.
[72] R. Townsend and E. J. Weldon, “Self-orthogonal quasi-cyclic codes,” IEEE Trans. Information
Theory, vol. 13, no. 2, pp. 183–195, Apr. 1967.
[73] R. Ueno, S. Morioka, N. Homma, and T. Aoki, “A high throughput/gate AES hardware
architecture by compressing encryption and decryption datapaths - Toward efficient CBC-
mode implementation,” in Cryptographic Hardware and Embedded Systems - CHES 2016 -
18th International Conference, Santa Barbara, CA, USA, August 17-19, 2016, Proceedings,
2016, pp. 538–558.
[74] A. Vardy, “The intractability of computing the minimum distance of a code,” IEEE
Trans. Information Theory, vol. 43, no. 6, pp. 1757–1766, 1997. [Online]. Available:
https://doi.org/10.1109/18.641542
[75] C. Xing and S. Ling, Coding Theory: A First Course. New York, NY, USA: Cambridge
University Press, 2003.
Appendix A
We consider one call of RandomizedInPlaceIteration, with input error vector estimate $\hat{e}$ and actual error vector $e$. We denote with $E_0$ the set of positions in which $\hat{e}$ and $e$ match, and with $E_1$ the set of positions in which they differ. We define $\hat{t}_0 = |\mathrm{Supp}(e \oplus \hat{e}) \cap E_0|$ and $\hat{t}_1 = |\mathrm{Supp}(e \oplus \hat{e}) \cap E_1|$; clearly, $\hat{t} = \mathrm{wt}(e \oplus \hat{e}) = \hat{t}_0 + \hat{t}_1$.
We now characterize the statistical distribution of $\hat{t}_0$ and $\hat{t}_1$ after the $n$ bit estimates are processed, in the order pointed out by $\pi^* \in S_n^*$, i.e., the permutation which places at the end all the positions $j$ where $\hat{e}_j \neq e_j$.
We characterize the distribution of $\hat{t}_0$ during the computation of the iteration as a function of $P^{\mathrm{th}}_{f|0}(\hat{t})$ and $P^{\mathrm{th}}_{m|0}(\hat{t})$, i.e., the probabilities that an error estimate bit will be flipped or maintained (Assumption 3.1.1).
To model the statistical distribution of $\hat{t}_0$ we employ the framework of Probabilistic Finite State Automata (PFSA) [60]. Informally, a PFSA is a Finite State Automaton (FSA) characterized by transition probabilities for each of the transitions of the FSA. The state of a PFSA is a discrete probability distribution over the set of FSA states, and the probabilities of the transitions starting from the same FSA state, reading the same symbol, must add up to one.
We model the evolution of the statistical distribution of $\hat{t}_0$ during the iteration computation as the state of a PFSA having $n - \hat{t} + 1$ FSA states, each one mapped onto a specific value of $\hat{t}_0$, as depicted in Figure A.1. We consider the underlying FSA to be accepting the input language constituted by the binary strings obtained as the sequences $\pi^*(\hat{e} \oplus e)$.
We therefore have that, for the PFSA modeling the evolution of $\hat{t}_0$ while the R-IP BF decoder acts on the first $n - \hat{t}$ positions specified by $\pi^*$, all the read bits will be equal to 0, as $\pi^*$ sorts the positions of $\hat{e}$ so that the ($n - \hat{t}$, at the first iteration) positions with no discrepancy between $\hat{e}$ and $e$ come first.
Figure A.1: Structure of the probabilistic FSA modeling the evolution of the distribution of the $\hat{t}_0$ variable. Read characters are reported in black, transition probabilities in red; each state has a self-loop reading character 0 with probability $P^{\mathrm{th}}_{m|0}(\cdot)$, evaluated at $\hat{t}$, $\hat{t}+1$, up to $n$.

The transition probability for the PFSA transition from a state $\hat{t}_0 = i$ to $\hat{t}_0 = i + 1$ requires the
R-IP decoder to flip a bit of $\hat{e}$ equal to zero and matching the one in the same position of $e$, causing a discrepancy. Because of Assumption 3.1.1, the probability of such an event is $P^{\mathrm{th}}_{f|0}(\hat{t} + i)$, while the probability of not altering the bit, i.e., of the self-loop transition from $\hat{t}_0 = i$ to $\hat{t}_0 = i$ itself, is $P^{\mathrm{th}}_{m|0}(\hat{t} + i)$.
Note that, during the inner loop iterations of the R-IP BF decoder acting on positions of $\hat{e}$ which have no discrepancies, it is not possible to decrease the value of $\hat{t}_0$, as no reduction in the number of discrepancies between $\hat{e}$ and $e$ can be achieved by changing values of $\hat{e}$ which are already equal to the ones in $e$. Hence, the probability of transitioning from $\hat{t}_0 = i$ to $\hat{t}_0 = i - 1$ is zero.
The evolution of a PFSA can be computed simply taking the current state, represented as the vector $y$ of $n - \hat{t} + 1$ elements, with the $i$-th element containing $\Pr[\hat{t}_0 = i]$, and multiplying it by an appropriate matrix which characterizes the transitions of the PFSA. Such a matrix is derived as the adjacency matrix of the PFSA graph representation, keeping only the edges for which the read character matches the edge label, and substituting the one-values in the adjacency matrix with the probabilities labelling the corresponding edges.
We obtain the transition matrix for the PFSA in Figure A.1 as the $(n - \hat{t} + 1) \times (n - \hat{t} + 1)$ matrix:

$$K_0 = \begin{bmatrix}
P^{\mathrm{th}}_{m|0}(\hat{t}) & P^{\mathrm{th}}_{f|0}(\hat{t}) & 0 & \cdots & 0 & 0 \\
0 & P^{\mathrm{th}}_{m|0}(\hat{t}+1) & P^{\mathrm{th}}_{f|0}(\hat{t}+1) & \cdots & 0 & 0 \\
\vdots & & \ddots & \ddots & & \vdots \\
0 & 0 & 0 & \cdots & P^{\mathrm{th}}_{m|0}(n-1) & P^{\mathrm{th}}_{f|0}(n-1) \\
0 & 0 & 0 & \cdots & 0 & P^{\mathrm{th}}_{m|0}(n)
\end{bmatrix}$$
Since we want to compute the effect on the distribution of $\hat{t}_0$ after the first $n - \hat{t}$ rounds of the loop in the randomized in-place iteration function, we obtain it as $y K_0^{n-\hat{t}}$. Note that the subsequent $\hat{t}$ iterations of the R-IP decoder will not alter the value of $\hat{t}_0$, as they act on positions $j$ such that $e_j = 1$. Since we know that, at the beginning of the first iteration, $y = [\Pr[\hat{t}_0 = 0] = 1, \Pr[\hat{t}_0 = 1] = 0, \Pr[\hat{t}_0 = 2] = 0, \cdots, \Pr[\hat{t}_0 = n - \hat{t}] = 0]$, we are able to compute $\Pr_{S_n^*}\left[\omega \xrightarrow{E_0} x\right]$ as the $(x+1)$-th element of $y K_0^{n-\hat{t}}$.
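Since $K_0$ is upper bidiagonal, the product $y K_0$ never requires materializing the full matrix; one evolution step reduces to a single sweep over the state vector, as in the C sketch below (our own illustration of the computation above, with Pm[i] and Pf[i] standing for tabulated values of $P^{\mathrm{th}}_{m|0}(\hat{t} + i)$ and $P^{\mathrm{th}}_{f|0}(\hat{t} + i)$). Applying it $n - \hat{t}$ times yields $y K_0^{n-\hat{t}}$.

#include <stddef.h>

/* One evolution step of the PFSA state vector y (of length
 * m = n - t_hat + 1) under the upper bidiagonal transition matrix K0:
 *   y'[i] = y[i] * Pm[i] + y[i-1] * Pf[i-1],
 * i.e., state i either keeps its probability mass (self-loop) or
 * receives the mass flipped out of state i - 1. */
static void pfsa_step(double *y, const double *Pm, const double *Pf,
                      size_t m)
{
    double prev = 0.0;          /* holds y_old[i - 1] */
    for (size_t i = 0; i < m; i++) {
        double cur = y[i];
        y[i] = cur * Pm[i];
        if (i > 0)
            y[i] += prev * Pf[i - 1];
        prev = cur;
    }
}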
We now model the distribution of $\hat{t}_1$ during the last $\hat{t}$ rounds of the loop in the randomized in-place iteration function. Note that, to this end, the first $n - \hat{t}$ iterations of the inner loop have no effect on $\hat{t}_1$. Denote with $\tilde{t}$ the number of incorrectly estimated bits, i.e., $\mathrm{wt}(e \oplus \hat{e})$, when the iteration function is about to evaluate the first of the positions $j$ where $\hat{e}_j \neq e_j$.

Figure A.2: Structure of the probabilistic FSA modeling the evolution of the distribution of the $\hat{t}_1$ variable. Read characters are reported in black, transition probabilities in red; each state has a self-loop reading character 1 with probability $P^{\mathrm{th}}_{m|1}(\cdot)$, evaluated at $\tilde{t}$, $\tilde{t}-1$, down to $\tilde{t}-\hat{t}$.
Note that, at the beginning of the iteration function computation, we have $\tilde{t} = \tilde{t}_0 + \hat{t}_1$, where $\tilde{t}_0$ is the number of discrepancies in the first $n - \hat{t}$ positions when the iteration is about to analyze the first position of $\hat{e}$ for which $\hat{e}_j \neq e_j$. Following arguments analogous to the ones employed to model the PFSA describing the evolution of $\hat{t}_0$, we obtain the one modeling the evolution of $\hat{t}_1$, reported in Figure A.2. The initial distribution of the values of $\hat{t}_1$, constituting the initial state of the PFSA in Figure A.2, is such that $\Pr[\hat{t}_1 = \hat{t}] = 1$, corresponding to the $(\hat{t}+1)$-element vector $z = [0, 0, \ldots, 0, 1]$. Computing the transition matrix of the PFSA in Figure A.2 we obtain:
$$K_1 = \begin{bmatrix}
P^{\mathrm{th}}_{m|1}(\tilde{t}-\hat{t}) & 0 & 0 & \cdots & 0 & 0 \\
P^{\mathrm{th}}_{f|1}(\tilde{t}-\hat{t}+1) & P^{\mathrm{th}}_{m|1}(\tilde{t}-\hat{t}+1) & 0 & \cdots & 0 & 0 \\
\vdots & \ddots & \ddots & & & \vdots \\
0 & 0 & \cdots & P^{\mathrm{th}}_{f|1}(\tilde{t}-1) & P^{\mathrm{th}}_{m|1}(\tilde{t}-1) & 0 \\
0 & 0 & \cdots & 0 & P^{\mathrm{th}}_{f|1}(\tilde{t}) & P^{\mathrm{th}}_{m|1}(\tilde{t})
\end{bmatrix}$$
We are thus able to obtain $\Pr_{S_n^*}\left[\omega \xrightarrow{E_1} x\right]$ by computing $z K_1^{\hat{t}}$ and taking the $(x+1)$-th element of it.
Statements
Statement by Each Submitter
I, Marco Baldi, of Università Politecnica delle Marche, Dipartimento di Ingegneria dell’ Infor-
mazione (DII), Via Brecce Bianche 12, I-60131, Ancona, Italy, do hereby declare that the cryp-
tosystem, reference implementation, or optimized implementations that I have submitted, known as
LEDAcrypt, is my own original work, or if submitted jointly with others, is the original work of the
joint submitters. I further declare that I do not hold and do not intend to hold any patent or patent
application with a claim which may cover the cryptosystem, reference implementation, or optimized
implementations that I have submitted, known as LEDAcrypt.
I do hereby acknowledge and agree that my submitted cryptosystem will be provided to the public
for review and will be evaluated by NIST, and that it might not be selected for standardization by
NIST. I further acknowledge that I will not receive financial or other compensation from the U.S.
Government for my submission. I certify that, to the best of my knowledge, I have fully disclosed
all patents and patent applications which may cover my cryptosystem, reference implementation or
optimized implementations. I also acknowledge and agree that the U.S. Government may, during
the public review and the evaluation process, and, if my submitted cryptosystem is selected for stan-
dardization, during the lifetime of the standard, modify my submitted cryptosystem’s specifications
(e.g., to protect against a newly discovered vulnerability).
I acknowledge that NIST will announce any selected cryptosystem(s) and proceed to publish the
draft standards for public comment.
I do hereby agree to provide the statements required by Sections 2.D.2 and 2.D.3, below, for any
patent or patent application identified to cover the practice of my cryptosystem, reference imple-
mentation or optimized implementations and the right to use such implementations for the purposes
of the public review and evaluation process.
I acknowledge that, during the post-quantum algorithm evaluation process, NIST may remove my
cryptosystem from consideration for standardization. If my cryptosystem (or the derived cryptosys-
tem) is removed from consideration for standardization or withdrawn from consideration by all
submitter(s) and owner(s), I understand that rights granted and assurances made under Sections
2.D.1, 2.D.2 and 2.D.3, including use rights of the reference and optimized implementations, may
be withdrawn by the submitter(s) and owner(s), as appropriate.
Signed:
Title:
Date:
Place:
Statement by Reference/Optimized Implementations’ Owner(s)
I, Marco Baldi, of Università Politecnica delle Marche, Dipartimento di Ingegneria dell’ Infor-
mazione (DII), Via Brecce Bianche 12, I-60131, Ancona, Italy, am one of the owners of the sub-
mitted reference implementation and optimized implementations and hereby grant the U.S. Govern-
ment and any interested party the right to reproduce, prepare derivative works based upon, distribute
copies of, and display such implementations for the purposes of the post-quantum algorithm public
review and evaluation process, and implementation if the corresponding cryptosystem is selected for
standardization and as a standard, notwithstanding that the implementations may be copyrighted
or copyrightable.
Signed:
Title:
Date:
Place:
Statement by Each Submitter
I, Franco Chiaraluce, of Università Politecnica delle Marche, Dipartimento di Ingegneria dell’ In-
formazione (DII), Via Brecce Bianche 12, I-60131, Ancona, Italy, do hereby declare that the cryp-
tosystem, reference implementation, or optimized implementations that I have submitted, known as
LEDAcrypt, is my own original work, or if submitted jointly with others, is the original work of the
joint submitters. I further declare that I do not hold and do not intend to hold any patent or patent
application with a claim which may cover the cryptosystem, reference implementation, or optimized
implementations that I have submitted, known as LEDAcrypt.
I do hereby acknowledge and agree that my submitted cryptosystem will be provided to the public
for review and will be evaluated by NIST, and that it might not be selected for standardization by
NIST. I further acknowledge that I will not receive financial or other compensation from the U.S.
Government for my submission. I certify that, to the best of my knowledge, I have fully disclosed
all patents and patent applications which may cover my cryptosystem, reference implementation or
optimized implementations. I also acknowledge and agree that the U.S. Government may, during
the public review and the evaluation process, and, if my submitted cryptosystem is selected for stan-
dardization, during the lifetime of the standard, modify my submitted cryptosystem’s specifications
(e.g., to protect against a newly discovered vulnerability).
I acknowledge that NIST will announce any selected cryptosystem(s) and proceed to publish the
draft standards for public comment.
I do hereby agree to provide the statements required by Sections 2.D.2 and 2.D.3, below, for any
patent or patent application identified to cover the practice of my cryptosystem, reference imple-
mentation or optimized implementations and the right to use such implementations for the purposes
of the public review and evaluation process.
I acknowledge that, during the post-quantum algorithm evaluation process, NIST may remove my
cryptosystem from consideration for standardization. If my cryptosystem (or the derived cryptosys-
tem) is removed from consideration for standardization or withdrawn from consideration by all
submitter(s) and owner(s), I understand that rights granted and assurances made under Sections
2.D.1, 2.D.2 and 2.D.3, including use rights of the reference and optimized implementations, may
be withdrawn by the submitter(s) and owner(s), as appropriate.
Signed:
Title:
Date:
Place:
Statement by Reference/Optimized Implementations’ Owner(s)
I, Franco Chiaraluce, of Università Politecnica delle Marche, Dipartimento di Ingegneria dell'Informazione (DII), Via Brecce Bianche 12, I-60131, Ancona, Italy, am one of the owners of the submitted reference implementation and optimized implementations and hereby grant the U.S. Government and any interested party the right to reproduce, prepare derivative works based upon, distribute copies of, and display such implementations for the purposes of the post-quantum algorithm public review and evaluation process, and implementation if the corresponding cryptosystem is selected for standardization and as a standard, notwithstanding that the implementations may be copyrighted or copyrightable.
Signed:
Title:
Date:
Place:
Statement by Each Submitter
I, Paolo Santini, of Università Politecnica delle Marche, Dipartimento di Ingegneria dell'Informazione (DII), Via Brecce Bianche 12, I-60131, Ancona, Italy, do hereby declare that the cryptosystem, reference implementation, or optimized implementations that I have submitted, known as LEDAcrypt, is my own original work, or if submitted jointly with others, is the original work of the joint submitters. I further declare that I do not hold and do not intend to hold any patent or patent application with a claim which may cover the cryptosystem, reference implementation, or optimized implementations that I have submitted, known as LEDAcrypt.
I do hereby acknowledge and agree that my submitted cryptosystem will be provided to the public for review and will be evaluated by NIST, and that it might not be selected for standardization by NIST. I further acknowledge that I will not receive financial or other compensation from the U.S. Government for my submission. I certify that, to the best of my knowledge, I have fully disclosed all patents and patent applications which may cover my cryptosystem, reference implementation or optimized implementations. I also acknowledge and agree that the U.S. Government may, during the public review and the evaluation process, and, if my submitted cryptosystem is selected for standardization, during the lifetime of the standard, modify my submitted cryptosystem's specifications (e.g., to protect against a newly discovered vulnerability).
I acknowledge that NIST will announce any selected cryptosystem(s) and proceed to publish the draft standards for public comment.
I do hereby agree to provide the statements required by Sections 2.D.2 and 2.D.3, below, for any patent or patent application identified to cover the practice of my cryptosystem, reference implementation or optimized implementations and the right to use such implementations for the purposes of the public review and evaluation process.
I acknowledge that, during the post-quantum algorithm evaluation process, NIST may remove my cryptosystem from consideration for standardization. If my cryptosystem (or the derived cryptosystem) is removed from consideration for standardization or withdrawn from consideration by all submitter(s) and owner(s), I understand that rights granted and assurances made under Sections 2.D.1, 2.D.2 and 2.D.3, including use rights of the reference and optimized implementations, may be withdrawn by the submitter(s) and owner(s), as appropriate.
Signed:
Title:
Date:
Place:
Statement by Reference/Optimized Implementations’ Owner(s)
I, Paolo Santini, of Università Politecnica delle Marche, Dipartimento di Ingegneria dell'Informazione (DII), Via Brecce Bianche 12, I-60131, Ancona, Italy, am one of the owners of the submitted reference implementation and optimized implementations and hereby grant the U.S. Government and any interested party the right to reproduce, prepare derivative works based upon, distribute copies of, and display such implementations for the purposes of the post-quantum algorithm public review and evaluation process, and implementation if the corresponding cryptosystem is selected for standardization and as a standard, notwithstanding that the implementations may be copyrighted or copyrightable.
Signed:
Title:
Date:
Place: