Software Implementation of Pairing-Based Cryptogra
Software Implementation of Pairing-Based Cryptogra
Software Implementation of Pairing-Based Cryptogra
net/publication/221540899
CITATIONS READS
46 1,001
2 authors:
Some of the authors of this publication are also working on these related projects:
All content following this page was uploaded by Conrado Porto Lopes Gouvêa on 01 July 2016.
1 Introduction
Wireless sensor networks (WSN) have been the subject of a lot of research re-
cently due to their vast number of applications. One of the challenges they bring
is how to secure their communication against eavesdropping or malicious ma-
nipulation. These can be addressed through many cryptographic schemes; but
since these nodes are highly constrained environments, these schemes must be
implemented with great efficiency.
The advantages of asymmetric over symmetric cryptography for WSNs is
well established in the literature. For that reason, we chose to implement two
types of asymmetric cryptosystems: pairing-based and elliptic curve cryptog-
raphy. The security levels being considered are the 64/70-bit, being the most
feasible and where most of the work so far has focused; and the 128-bit, which
can be expensive but may be necessary in the coming years and has not been well
explored for WSNs. The main contributions of this work are a platform-specific
optimization to improve the speed of both types of cryptosystems and timings
for computations in those two different security levels.
The remainder of this work is organized as follows. In Section 2 we give an
introduction to the MSP430 microcontroller, describing its features and limita-
tions. Subsequently, in Section 3, the fundamental operations of multiplication
and reduction are described along with our proposed optimization. The imple-
mentation and results of pairing-based cryptography is described in Section 4.
In Section 5, the implementation and results of elliptic curve cryptography is
detailed. Finally, this paper in concluded in Section 6.
Field multiplication over IFp sums about 75% of the running time of point mul-
tiplication and pairing computation. Consequently, it is crucial to implement
it using assembly language since this leads to a speedup greater than two-fold,
according to our experiments. Multiplication in IFp consists of two operations:
the plain multiplication of the operands into a double precision number and its
subsequent reduction modulo a prime.
3.1 Multiplication
The standard algorithm for multiplication is the Comba method [1], which is a
column-wise variant of the row-wise standard schoolbook version that reduces
memory accesses. Recently, it has been suggested a variant of the Comba method,
the Hybrid method [2], that mixes the row-wise and column-wise techniques. It
can be seen as the plain Comba method, with the difference that each “digit”
is now stored in multiple machine integers, and the digit-digit multiplication is
carried out with the row-wise schoolbook technique. Both methods are illustrated
in Figure 1.
The advantage of the Hybrid method is that, in a digit-digit multiplication,
all of the integers of the first digit can be stored in registers, reducing memory
reads. Consequently, this method is appropriate for platforms with a relatively
large number of registers. In [3], the authors present an even more optimized
version of the Hybrid method, using carry-catcher registers in order to simplify
its carry handling. They have also studied its application on many platforms, in-
cluding the MSP430, where they were able to obtain a 15.4% speed improvement
compared to the Comba method.
It appears that the Hybrid method is always superior to the plain Comba
method when there are sufficient registers available, but this fails to take into
account the characteristics of the platform. Analyzing the running time of the
Comba method, it can be concluded that the majority of the time is spent at
one repeated step: multiply and accumulate. For each column of the result, it is
necessary to compute many products and accumulate them in order to obtain the
result of that column and the carries of the next two columns. The importance
of the multiply and accumulate step (which we will refer to as “MulAcc”) was
noted before in [2,4]. However, what has been overlooked so far is the fact that
the MulAcc is exactly what is provided by the MAC (Multiply and Accumulate)
operation of the MSP430 hardware multiplier.
The MulAcc step is illustrated in Figure 2. It consists of the reading of
two integers, one from each operand, followed by their multiplication into a
double precision integer, and finally the addition of those two integers to a triple
precision accumulator (the third only accumulates the carries of those additions).
Fig. 1. Comparison of multiplication methods: Comba to the left, Hybrid to the right.
Fig. 2. The MulAcc step, using as example the step for words a1 and b2. The registers
r14 and r15 hold the pointers to the two 4-word operands.
The pseudo-assembly code for the MulAcc step without using MAC is listed
in Algorithm 1 and using MAC in Algorithm 2. Compared to Algorithm 1,
Algorithm 2 has two less instructions, one less memory read and one less address
in extension words, saving four cycles in total. This leads to a great speedup since
the MulAcc step is repeated n2 times with n being the size of the operands in
machine integers.
The main advantage of using plain Comba with MAC compared to the Hybrid
method is that the latter uses all of the 12 available registers, while the former
leaves 8 free registers. These can be used as a simple cache for the operands.
Additionally, one register can be used to save the address of the SUMEXT in
order to add using the register indirect mode instead of register indexed, saving
one more cycle in each MulAcc step (this requires a reordering of the instruc-
tions since otherwise the SUMEXT is fetched before the two cycle delay of the
hardware multiplier). Table 1 compares the instruction counts of our implemen-
tation and those from [3]. It can be readily seen that the greatest savings come
from the smaller number of add instructions, since the hardware multiplier does
most of the additions by itself. Also, one cycle can be saved in each step due to
the linear nature of the access of the first operand, which can be read with the
register indirect with post-increment addressing mode (mov @reg+,&label).
The multiplication timings are detailed in Table 2, where is clear that the
Comba multiplier using the MAC optimization is indeed effective, and 9.2%
faster than the Hybrid multiplier given in [3]. We have found that using Karat-
suba multiplication with a 128-bit Comba multiplier is a little faster than using
256-bit Comba, and it also requires less code space.
3.2 Reduction
There also are specific algorithms for reduction when the prime modulus has
a special form. For primes of the form 2k − c such as the 160-bit primes from the
SECG standard [8] the algorithm is described in [9]. For “NIST primes” [10],
the algorithm is described in [11].
The reduction timings are presented in Table 3. The reduction timing from
[5] was estimated by subtracting the reported multiplication timing in [3] from
the field multiplication timing in [5]. While an exact comparison may be hard to
make due to this inexact estimate, we notice again that the MAC optimization
is very effective. The Barrett reduction was slower than Montgomery reduction,
but since we have focused on optimizing Montgomery, we believe its speed can be
further improved. As expected, reduction modulo a special form prime is much
faster.
Finally, the running times of algorithms for field multiplication – multiplica-
tion followed by reduction – are given in Table 4. Compared to [5], field multi-
plication using MAC is about 28% faster.
It has been shown recently that identity-based cryptography using bilinear pair-
ings is very appropriate in the wireless sensor network scenario [12]. There are
many identity-based cryptographic schemes, but the most useful in this context
probably is the non-interactive key agreement scheme [13,14,15] that allows two
parties to compute a mutual key without interaction in order to bootstrap a
secure channel using symmetric encryption, and will be described next.
Let e : G1 × G2 → GT be a bilinear pairing with G1 and G2 being additive
groups and GT a multiplicative group, all of them with a prime order r. Let
H1 : {0, 1}∗ → G1 and H2 : {0, 1}∗ → G2 be two hash functions. The master
key generation is done by the key generation center by choosing a random s ∈
{1, ..., r − 1}. The private key distribution is done before the deployment of
the sensors by assigning a sensor A the identity IDA and private keys S1A =
sH1 (IDA ) and S2A = sH2 (IDA ).
Now, suppose sensors A and B wish to compute a shared key. If G1 and G2
were the same group and the pairing was symmetric, then the two hash functions
would be the same and the two private keys of each node would be equal. There-
fore, A could compute e(S1A , H1 (IDB )) and B could compute e(H1 (IDA ), S1B ).
Due to the bilinearity and symmetry, we have
Then both parties can generate the same value, which can be used to derive
a shared key. In our case, though, the pairing is asymmetric since the elliptic
curves used are ordinary. Therefore, we need two private keys for each sensor, the
hash functions are different, and the last step in the equation is not valid. Still,
we have two useful equations which can be easily verified: e(S1A , H2 (IDB )) =
e(H1 (IDA ), S2B ) and e(H1 (IDB ), S2A ) = e(S1B , H2 (IDA )). In [14], it is suggested
that each party should multiply their sides of those two equations in order to
compute the shared key, but this requires two pairing computations. In [5] it is
suggested that the sensors could agree on which equation they should use with
a little amount of communication. Instead, there is a simpler fix that maintains
the non-interactive aspect of the protocol. It can be defined that the sensor with
the smaller ID in lexicographical order should use its first private key in the first
pairing parameter and the other its second private key in the second pairing
parameter, therefore choosing one of the equations without any interaction.
but using the final exponentiation optimization from [23]. Since the Miller loop
runs through the bits of 6x + 2 (or x in Xate), which has low Hamming weight,
the sliding window technique is not appropriate and was not used.
We present the timings for the finite field operations and pairing compu-
tations in Table 6. The pairing computation is much more expensive than in
the MNT curve, and probably unacceptable for the wireless sensor scenario. As
noted in [24], it is important to keep in mind that the pairing computation
scales more-or-less like RSA rather than like elliptic curve cryptography. It is
also worth noticing that the three kind of pairings give almost the same speed,
with the Xate pairing being a little faster. We describe the Xate pairing for BN
curves in Algorithm 3.
The ROM and RAM requirements of the pairing computation program are
listed in Table 7. To put them in perspective, we note that popular sensors have
such as Tmote Sky and TelosB have 48KB of ROM and 10K of RAM. The code
size is still large; though it is only possible to determine its feasibility by analyz-
ing specific applications. The amount of RAM allocated is probably tolerable,
since most of it is allocated from the stack and freed after the computation.
While identity based schemes built with pairings seem ideal for the wireless sen-
sor scenario, they still are expensive, mainly in the higher 128-bit level of security.
For that reason, we have also implemented the cheaper elliptic curve cryptogra-
phy in order to allow comparison with pairing-based cryptography. To illustrate
a concrete use, the ECDSA (Elliptic Curve Digital Signature Algorithm) [10]
was chosen for its popularity and wide standardization. However, it is important
to notice that elliptic curve cryptography still requires the expensive public key
authentication which is outside the scope of this work.
Algorithm 3 Xate pairing for BN curves
Input: x ∈ ZZ (the BN parameter), Q ∈ E 0 (IFp2 ), P ∈ E(IFp )
Ouput: ζ(Q, P ) ∈ IFp12
1: v, xQ ⇐ f|x|,Q (P ) {fr,Q if the Miller function, it also computes rQ}
2: if x > 0 then
3: v ⇐ 1/v
4: xQ = −xQ
5: end if
3 10
6: v ⇐ v 1+p+p +p
7: v, A ⇐ gxQ,pxQ (P ) {gP,Q is the line function from the Miller function, it also
computes P + Q }
8: v, B ⇐ gp3 xQ,p10 xQ (P )
9: v, C ⇐ gA,B (P )
12
10: return v (p −1)/r
Table 6. Timings for field operations and pairing computations on the BN curve
Table 7. ROM and maximum allocated RAM size for pairing programs
Table 8. Timings for field operations and point multiplication for the given curves
secg160r1 P-256
Algorithm Cycles Time (ms) Cycles Time (ms)
Field operations
Multiplication 1,952 0.24 4,327 0.54
Squaring 1,734 0.22 3,679 0.46
Inversion 187,575 19.27 292,170 36.52
Random point multiplication
4NAF 4,417,661 0,552 13,372,271 1,672
5NAF 4,433,104 0,554 13,188,903 1,649
Montgomery ladder 6.319,383 0,790 20,476,234 2,560
Unknown from [25] 0,800
Fixed point multiplication
Comb, w = 4 1,831,063 0,229 5,688,793 0,711
Comb, w = 4 in [27] 0,720
Sliding window, w = 4 in [25] 0,720
Simultaneous point mult.
Interleaved 5,204,544 0,651 15,784,176 1,973
Table 9. Timings for ECDSA
secg160r1 P-256
Algorithm Cycles Time (s) Cycles Time (s)
Key Generation 1,849,903 0.231 5,682,433 0.710
Sign 2,166,906 0.270 5,969,593 0.746
Verify 5,488,568 0.686 16,139,555 2.017
Table 10. ROM and maximum allocated RAM size for elliptic curve programs
6 Conclusion
Implementing efficient cryptographic schemes on wireless sensor networks is a
difficult task, but feasible. It is important to analyze every feature offered by
the platform in order to get the best results, as can be seen with the simple but
effective optimization using the MAC operation from the hardware multiplier of
the MSP430. Still, there is plenty of work to be done. As our implementation has
shown, there is a steep price to be paid in the 128-bit level of security pairing
computation (14.5 seconds). Some relevant future work that we would suggest is
to provide a fast implementation of identity based cryptography in other security
levels and implement in software the recently proposed method to speed up finite
field arithmetic for BN curves [28].
References
1. Comba, P.: Exponentiation cryptosystems on the IBM PC. IBM Systems Journal
29(4) (1990) 526–538
2. Gura, N., Patel, A., Wander, A., Eberle, H., Shantz, S.: Comparing Elliptic Curve
Cryptography and RSA on 8-bit CPUs. In: Cryptographic Hardware and Embed-
ded Systems - CHES 2004. Volume 3156 of Lecture Notes in Computer Science.,
Springer Berlin / Heidelberg (2004) 925–943
3. Scott, M., Szczechowiak, P.: Optimizing multiprecision multiplication for pub-
lic key cryptography. Cryptology ePrint Archive, Report 2007/299 (2007)
http://eprint.iacr.org/.
4. Großschädl, J.: Instruction Set Extension for Long Integer Modulo Arithmetic
on RISC-Based Smart Cards. Symposium on Computer Architecture and High
Performance Computing (2002) 13–19
5. Szczechowiak, P., Kargl, A., Scott, M., Collier, M.: On the application of pairing
based cryptography to wireless sensor networks. In: Proceedings of the second
ACM conference on Wireless network security, ACM New York, NY, USA (2009)
1–12
6. Montgomery, P.: Modular multiplication without trial division. Mathematics of
computation 44(170) (1985) 519–521
7. Barrett, P.: Implementing the Rivest Shamir and Adleman public key encryption
algorithm on a standard digital signal processor. In: Proceedings on Advances
in cryptology—CRYPTO’86 table of contents. Volume 263 of Lecture Notes in
Computer Science., Springer Berlin / Heidelberg (1987) 311–323
8. Certicom Research: SEC 2: Recommended Elliptic Curve Domain Parameters
(2006) http://www.secg.org/.
9. Menezes, A., Van Oorschot, P., Vanstone, S.: Handbook of Applied Cryptography.
CRC Press (1997)
10. National Institute of Standards and Technology: FIPS 186-3: Digital Signature
Standard (DSS) (2009) http://www.itl.nist.gov.
11. Hankerson, D., Vanstone, S., Menezes, A.: Guide to Elliptic Curve Cryptography.
Springer (2004)
12. Oliveira, L., Aranha, D., Morais, E., Daguano, F., Lopez, J., Dahab, R.: TinyTate:
computing the tate pairing in resource-constrained sensor nodes. In: Sixth IEEE
International Symposium on Network Computing and Applications, 2007. NCA
2007. (2007) 318–323
13. Sakai, R., Ohgishi, K., Kasahara, M.: Cryptosystems based on pairing. In: The 2000
Symposium on Cryptography and Information Security, Okinawa, Japan. (2000)
14. Dupont, R., Enge, A.: Provably secure non-interactive key distribution based on
pairings. Discrete Applied Mathematics 154(2) (2006) 270–276
15. Oliveira, L., Scott, M., Lopez, J., Dahab, R.: TinyPBC: Pairings for authenticated
identity-based non-interactive key distribution in sensor networks. In: Networked
Sensing Systems, 2008. INSS 2008. 5th International Conference on. (2008) 173–
180
16. Lenstra, A.K.: Key Lengths. In: Handbook of Information Security. John Wiley
& Sons (2004)
17. Lenstra, A., Verheul, E.: Selecting Cryptographic Key Sizes. Journal of Cryptology
14(4) (2001) 255–293
18. Barreto, P., Naehrig, M.: Pairing-Friendly Elliptic Curves of Prime Order. In: Se-
lected Areas in Cryptography. Volume 3897 of Lecture Notes in Computer Science.,
Springer Berlin / Heidelberg (2006) 319–331
19. Nogami, Y., Akane, M., Sakemi, Y., Kato, H., Morikawa, Y.: Integer Variable
χ-Based Ate Pairing. In: Pairing-Based Cryptography — Pairing 2008. Volume
5209 of Lecture Notes in Computer Science., Springer Berlin / Heidelberg (2008)
178–191
20. Devegili, A., Scott, M., Dahab, R.: Implementing Cryptographic Pairings over
Barreto-Naehrig Curves. In: Pairing-Based Cryptography — Pairing 2007. Volume
4575 of Lecture Notes in Computer Science., Springer Berlin / Heidelberg (2007)
197–207
21. Vercauteren, F.: Optimal pairings. Cryptology ePrint Archive, Report 2008/096
(2008) http://eprint.iacr.org/.
22. Lee, E., Lee, H.S., Park, C.M.: Efficient and generalized pairing computa-
tion on abelian varieties. Cryptology ePrint Archive, Report 2008/040 (2008)
http://eprint.iacr.org/.
23. Scott, M., Benger, N., Charlemagne, M., Perez, L.J.D., Kachisa, E.J.: On the final
exponentiation for calculating pairings on ordinary elliptic curves. Cryptology
ePrint Archive, Report 2008/490 (2008) http://eprint.iacr.org/.
24. Galbraith, S., Paterson, K., Smart, N.: Pairings for cryptographers. Discrete
Applied Mathematics 156(16) (2008) 3113–3121
25. Wang, H., Li, Q.: Efficient Implementation of Public Key Cryptosystems on Mote
Sensors (Short Paper). In: Information and Communications Security. Volume
4307 of Lecture Notes in Computer Science., Springer Berlin / Heidelberg (2006)
519–528
26. Montgomery, P.: Speeding the Pollard and elliptic curve methods of factorization.
Mathematics of Computation 48(177) (1987) 243–264
27. Szczechowiak, P., Oliveira, L., Scott, M., Collier, M., Dahab, R.: NanoECC: Testing
the Limits of Elliptic Curve Cryptography in Sensor Networks. In: Wireless Sensor
Networks. Volume 4913 of Lecture Notes in Computer Science., Springer Berlin /
Heidelberg (2008)
28. Fan, J., Vercauteren, F., Verbauwhede, I.: Faster Fp-arithmetic for Cryptographic
Pairings on Barreto-Naehrig Curves. In: Cryptographic Hardware and Embed-
ded Systems - CHES 2009. Volume 5747 of Lecture Notes in Computer Science.,
Springer Berlin / Heidelberg (2009) 240–253